Enterprise AWS Deployment Guide - Energent.ai

2025-05-28

Energent.ai delivers AI-powered virtual desktop agents that automate complex multi-application workflows for enterprise users. This guide specifies how to deploy the platform on AWS: a cloud-native architecture built on Amazon EKS, with a multi-tenant design and enterprise-grade security controls.

  • Document Classification: Public
  • Version: 3.0
  • Last Updated: 2025-05-28
  • Architecture: AWS EKS + Serverless Hybrid
  • Compliance: SOC 2, AWS Well-Architected Framework

Table of Contents

  1. Architecture Overview
  2. AWS Infrastructure Requirements
  3. EKS Cluster Specifications
  4. Data Layer Architecture
  5. Serverless Components
  6. Security & Compliance
  7. Network Architecture
  8. CI/CD Pipeline
  9. Monitoring & Observability
  10. Deployment Process
  11. Operations & Maintenance
  12. Support & Escalation

1. Architecture Overview

1.1 Cloud-Native Multi-Tenant Architecture

Energent.ai deploys on AWS using a modern, scalable architecture that combines Kubernetes orchestration with serverless components for optimal performance and cost efficiency.

┌──────────────────────────────────────────────────────────────────┐
│                        AWS CLOUD ENVIRONMENT                     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐   │
│  │   EKS CLUSTER   │  │   SERVERLESS    │  │   DATA LAYER    │   │
│  │                 │  │                 │  │                 │   │
│  │ • Multi-tenant  │  │ • Lambda Auth   │  │ • DynamoDB      │   │
│  │ • C5.4xlarge    │  │ • Lambda Billing│  │ • S3 Storage    │   │
│  │ • Auto-scaling  │  │ • API Gateway   │  │ • EFS Shared    │   │
│  │ • Flux GitOps   │  │ • EventBridge   │  │ • Secrets Mgr   │   │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘   │
│           │                     │                    │           │
│           └─────────────────────┼────────────────────┘           │
│                                 │                                │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │                    VPC SECURITY BOUNDARY                    │ │
│  │  • Private Subnets • NAT Gateway   • Security Groups        │ │
│  │  • NACLs           • VPC Endpoints • Transit Gateway        │ │
│  └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘

1.2 Deployment Models

Model             | Description                             | Use Case                             | SLA
Multi-Tenant EKS  | Shared cluster with namespace isolation | Standard enterprise deployment       | 99.9%
Dedicated EKS     | Single-tenant cluster                   | High-security, regulatory compliance | 99.95%
Hybrid Deployment | EKS + customer on-premises integration  | Legacy system integration            | 99.9%
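
To make the SLA column concrete, the following sketch converts each availability percentage into a monthly downtime budget. It assumes a 30-day month, which the guide does not specify; the function name is illustrative only.

```python
# Downtime budget implied by an availability SLA.
# Assumption: a 30-day month (43,200 minutes); the guide does not define the period.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def downtime_budget_minutes(sla_percent: float,
                            period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the maximum downtime in minutes that still meets the SLA."""
    return period_minutes * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    models = [
        ("Multi-Tenant EKS", 99.9),
        ("Dedicated EKS", 99.95),
        ("Hybrid Deployment", 99.9),
    ]
    for model, sla in models:
        print(f"{model}: {downtime_budget_minutes(sla):.1f} minutes per month")
```

Under that assumption, 99.9% allows roughly 43 minutes of downtime per month and 99.95% roughly 22 minutes.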

2. AWS Infrastructure Requirements

2.1 Minimum Infrastructure Specifications

Component                  | Specification                   | Purpose
EKS Cluster Version        | 1.30+                           | Kubernetes orchestration
Node Group Instance Type   | C5.4xlarge (16 vCPU, 32 GB RAM) | Compute-optimized workloads
Minimum Node Configuration | 1 vCPU, 2 GB RAM per tenant     | Resource allocation
EBS Storage                | 100 GB gp3, encrypted           | Pod persistent storage
EFS Storage                | Standard, encrypted             | Shared file system
S3 Buckets                 | Standard-IA, versioning enabled | Object storage
DynamoDB                   | On-demand, encryption at rest   | Metadata and configuration

2.2 AWS Service Dependencies

Service                   | Purpose                  | Configuration
Amazon EKS                | Kubernetes orchestration | Private endpoint, logging enabled
EC2 Auto Scaling          | Dynamic node scaling     | Target tracking, predictive scaling
Application Load Balancer | Traffic distribution     | SSL termination, WAF integration
AWS Lambda                | Serverless functions     | Runtime: Python 3.11, VPC integration
API Gateway               | API management           | REST + WebSocket, throttling enabled
CloudWatch                | Monitoring and logging   | Container Insights, custom metrics
AWS Secrets Manager       | Secrets management       | Automatic rotation, encryption
AWS KMS                   | Key management           | Customer-managed keys, auto-rotation

3. EKS Cluster Specifications

3.1 Cluster Configuration

# EKS Cluster Terraform Configuration
resource "aws_eks_cluster" "energent_cluster" {
  name     = "energent-production"
  role_arn = aws_iam_role.eks_cluster_role.arn
  version  = "1.30"

  vpc_config {
    subnet_ids              = var.private_subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = false
    security_group_ids      = [aws_security_group.eks_cluster.id]
  }

  encryption_config {
    provider {
      key_arn = aws_kms_key.eks_encryption.arn
    }
    resources = ["secrets"]
  }

  enabled_cluster_log_types = [
    "api", "audit", "authenticator", "controllerManager", "scheduler"
  ]

  tags = {
    Environment = "production"
    Product     = "energent-ai"
    Compliance  = "soc2"
  }
}

3.2 Node Group Configuration

# Managed Node Group
resource "aws_eks_node_group" "energent_nodes" {
  cluster_name    = aws_eks_cluster.energent_cluster.name
  node_group_name = "energent-compute-nodes"
  node_role_arn   = aws_iam_role.eks_node_role.arn
  subnet_ids      = var.private_subnet_ids

  instance_types = ["c5.4xlarge"]
  capacity_type  = "ON_DEMAND"

  scaling_config {
    desired_size = 3
    max_size     = 20
    min_size     = 2
  }

  update_config {
    max_unavailable_percentage = 25
  }

  launch_template {
    id      = aws_launch_template.eks_nodes.id
    version = aws_launch_template.eks_nodes.latest_version
  }

  tags = {
    "kubernetes.io/cluster/energent-production" = "owned"
  }
}

3.3 Multi-Tenant Resource Allocation

Tenant Tier | CPU Limit | Memory Limit | Storage | Concurrent Workflows
Basic       | 1 vCPU    | 2 GB         | 10 GB   | 1
Standard    | 2 vCPU    | 4 GB         | 25 GB   | 2
Premium     | 4 vCPU    | 8 GB         | 50 GB   | 4
Enterprise  | 8 vCPU    | 16 GB        | 100 GB  | 8
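
One way the tier limits could be enforced per tenant namespace is a Kubernetes ResourceQuota. The sketch below builds such a manifest as a Python dictionary and estimates how many tenants of one tier fit on a single c5.4xlarge node. The quota name, the namespace argument, the GB-to-Gi mapping, and the decision to ignore system-reserved capacity are all assumptions for illustration, not part of the Energent.ai platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    cpu: int         # vCPU limit
    memory_gb: int   # memory limit
    storage_gb: int  # storage allowance
    workflows: int   # concurrent workflows


# Values taken from the tier table above.
TIERS = {
    "basic": Tier(1, 2, 10, 1),
    "standard": Tier(2, 4, 25, 2),
    "premium": Tier(4, 8, 50, 4),
    "enterprise": Tier(8, 16, 100, 8),
}

# c5.4xlarge capacity; system-reserved resources are ignored (assumption).
NODE_VCPU, NODE_MEMORY_GB = 16, 32


def resource_quota(namespace: str, tier_name: str) -> dict:
    """Build a ResourceQuota manifest for one tenant namespace (hypothetical naming)."""
    tier = TIERS[tier_name]
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{tier_name}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "limits.cpu": str(tier.cpu),
                "limits.memory": f"{tier.memory_gb}Gi",      # GB treated as Gi (assumption)
                "requests.storage": f"{tier.storage_gb}Gi",  # GB treated as Gi (assumption)
            }
        },
    }


def tenants_per_node(tier_name: str) -> int:
    """Upper bound on tenants of one tier per node, limited by CPU or memory."""
    tier = TIERS[tier_name]
    return min(NODE_VCPU // tier.cpu, NODE_MEMORY_GB // tier.memory_gb)


if __name__ == "__main__":
    print(resource_quota("tenant-a", "premium"))
    for name in TIERS:
        print(name, tenants_per_node(name))
```

The per-node figures are upper bounds; real nodes reserve capacity for the kubelet and system pods, so actual density would be lower.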

4. Data Layer Architecture

4.1 Storage Architecture

4.1.1 Amazon S3 Configuration

# S3 Bucket for Object Storage
resource "aws_s3_bucket" "energent_storage" {
  bucket = "energent-${var.environment}-storage-${random_id.bucket_suffix.hex}"

  tags = {
    Environment = var.environment
    Purpose     = "energent-object-storage"
  }
}

# Default SSE-KMS encryption (AWS provider v4+ resource type)
resource "aws_s3_bucket_server_side_encryption_configuration" "energent_storage" {
  bucket = aws_s3_bucket.energent_storage.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.s3_encryption.arn
      sse_algorithm     = "aws:kms"
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_versioning" "energent_storage" {
  bucket = aws_s3_bucket.energent_storage.id

  versioning_configuration {
    status = "Enabled"
  }
}

4.1.2 DynamoDB Configuration

# DynamoDB for Metadata and Configuration
resource "aws_dynamodb_table" "energent_metadata" {
  name         = "energent-metadata-${var.environment}"
  billing_mode = "PAY_PER_REQUEST" # on-demand capacity
  hash_key     = "tenant_id"
  range_key    = "entity_type"

  attribute {
    name = "tenant_id"
    type = "S"
  }

  attribute {
    name = "entity_type"
    type = "S"
  }

  server_side_encryption {
    enabled     = true
    kms_key_arn = aws_kms_key.dynamodb_encryption.arn
  }

  point_in_time_recovery {
    enabled = true
  }

  tags = {
    Environment = var.environment
    Purpose     = "energent-metadata"
  }
}

4.1.3 EFS Shared Storage

# EFS for Shared File System
resource "aws_efs_file_system" "energent_shared" {
  creation_token = "energent-shared-${var.environment}"
  encrypted      = true
  kms_key_id     = aws_kms_key.efs_encryption.arn

  performance_mode                = "generalPurpose"
  throughput_mode                 = "provisioned"
  provisioned_throughput_in_mibps = 500

  tags = {
    Name        = "energent-shared-storage"
    Environment = var.environment
  }
}

5. Serverless Components

5.1 AWS Lambda Functions

5.1.1 Authentication Service

# Lambda for Authentication
resource "aws_lambda_function" "auth_service" {
  filename      = "auth_service.zip"
  function_name = "energent-auth-${var.environment}"
  role          = aws_iam_role.lambda_auth_role.arn
  handler       = "auth.handler"
  runtime       = "python3.11"
  timeout       = 30
  memory_size   = 512

  vpc_config {
    subnet_ids         = var.private_subnet_ids
    security_group_ids = [aws_security_group.lambda_auth.id]
  }

  environment {
    variables = {
      DYNAMODB_TABLE = aws_dynamodb_table.energent_metadata.name
      KMS_KEY_ID     = aws_kms_key.lambda_encryption.key_id
      ENVIRONMENT    = var.environment
    }
  }

  tags = {
    Environment = var.environment
    Service     = "authentication"
  }
}

5.1.2 Billing Service

# Lambda for Billing
resource "aws_lambda_function" "billing_service" {
  filename      = "billing_service.zip"
  function_name = "energent-billing-${var.environment}"
  role          = aws_iam_role.lambda_billing_role.arn
  handler       = "billing.handler"
  runtime       = "python3.11"
  timeout       = 300
  memory_size   = 1024

  environment {
    variables = {
      DYNAMODB_TABLE = aws_dynamodb_table.energent_metadata.name
      S3_BUCKET      = aws_s3_bucket.energent_storage.bucket
    }
  }
}

5.2 API Gateway Configuration

# API Gateway for Serverless Functions
resource "aws_api_gateway_rest_api" "energent_api" {
  name        = "energent-api-${var.environment}"
  description = "Energent.ai Enterprise API"

  endpoint_configuration {
    types            = ["PRIVATE"]
    vpc_endpoint_ids = [aws_vpc_endpoint.api_gateway.id]
  }

  # Resource policy: allow invocation only from the deployment VPC
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = "*"
        Action    = "execute-api:Invoke"
        Resource  = "*"
        Condition = {
          StringEquals = {
            "aws:sourceVpc" = var.vpc_id
          }
        }
      }
    ]
  })
}

6. Security & Compliance

6.1 Network Security

6.1.1 VPC Configuration

# VPC Security Groups
resource "aws_security_group" "eks_cluster" {
  name_prefix = "energent-eks-cluster-"
  vpc_id      = var.vpc_id

  # HTTPS to the cluster API from inside the VPC only
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }

  # All outbound traffic allowed
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "energent-eks-cluster-sg"
  }
}

6.1.2 Network ACLs

Direction | Protocol | Port Range | Source/Destination | Purpose
Inbound   | HTTPS    | 443        | VPC CIDR           | API access
Inbound   | TCP      | 1024-65535 | 0.0.0.0/0          | Return traffic (ephemeral ports)
Outbound  | HTTPS    | 443        | 0.0.0.0/0          | External API calls
Outbound  | TCP/UDP  | 53         | 0.0.0.0/0          | DNS resolution

6.2 Encryption Standards

Data State | Encryption Method | Key Management                 | Compliance
At Rest    | AES-256-GCM       | AWS KMS CMK with auto-rotation | SOC 2, FIPS 140-2 Level 3
In Transit | TLS 1.3           | Certificate Manager            | SOC 2, PCI DSS
In Memory  | Application-level | Hardware Security Module       | SOC 2
Backup     | AES-256           | Cross-region KMS keys          | SOC 2, GDPR

6.3 IAM Roles and Policies

6.3.1 EKS Service Roles

# EKS Cluster Role
resource "aws_iam_role" "eks_cluster_role" {
  name = "energent-eks-cluster-role-${var.environment}"

  # Trust policy: only the EKS service may assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster_role.name
}

7. Network Architecture

7.1 VPC Design

┌─────────────────────────────────────────────────────────────────┐
│                           VPC (10.0.0.0/16)                     │
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐  │
│  │  Public Subnet  │    │  Public Subnet  │    │ Public Sub  │  │
│  │   (10.0.1.0/24) │    │   (10.0.2.0/24) │    │(10.0.3.0/24)│  │
│  │                 │    │                 │    │             │  │
│  │   NAT Gateway   │    │   NAT Gateway   │    │NAT Gateway  │  │
│  │   ALB (public)  │    │   ALB (public)  │    │ALB (public) │  │
│  └─────────────────┘    └─────────────────┘    └─────────────┘  │
│           │                       │                     │       │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐  │
│  │ Private Subnet  │    │ Private Subnet  │    │Private Sub  │  │
│  │  (10.0.11.0/24) │    │  (10.0.12.0/24) │    │(10.0.13.0/24│  │
│  │                 │    │                 │    │             │  │
│  │  EKS Nodes      │    │  EKS Nodes      │    │ EKS Nodes   │  │
│  │  Lambda VPC     │    │  Lambda VPC     │    │ Lambda VPC  │  │
│  └─────────────────┘    └─────────────────┘    └─────────────┘  │
│           │                       │                     │       │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐  │
│  │   Data Subnet   │    │   Data Subnet   │    │ Data Subnet │  │
│  │  (10.0.21.0/24) │    │  (10.0.22.0/24) │    │(10.0.23.0/24│  │
│  │                 │    │                 │    │             │  │
│  │   RDS/DynamoDB  │    │   RDS/DynamoDB  │    │RDS/DynamoDB │  │
│  │   EFS Mount     │    │   EFS Mount     │    │ EFS Mount   │  │
│  └─────────────────┘    └─────────────────┘    └─────────────┘  │
└─────────────────────────────────────────────────────────────────┘
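
Before applying a network layout like the one in the diagram, it can help to check that every subnet sits inside the VPC CIDR and that no two subnets overlap. The sketch below does this with Python's standard `ipaddress` module, using the CIDRs from the diagram; the function and tier names are illustrative. The usable-host figure relies on AWS reserving five addresses in each subnet.

```python
import ipaddress
from itertools import combinations

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

# CIDRs copied from the VPC design diagram (one subnet per availability zone).
SUBNETS = {
    "public": ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"],
    "private": ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"],
    "data": ["10.0.21.0/24", "10.0.22.0/24", "10.0.23.0/24"],
}

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet


def validate_subnets(vpc, subnets_by_tier):
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    nets = [
        (tier, ipaddress.ip_network(cidr))
        for tier, cidrs in subnets_by_tier.items()
        for cidr in cidrs
    ]
    problems = []
    for tier, net in nets:
        if not net.subnet_of(vpc):
            problems.append(f"{tier} subnet {net} is outside VPC {vpc}")
    for (tier_a, net_a), (tier_b, net_b) in combinations(nets, 2):
        if net_a.overlaps(net_b):
            problems.append(f"{tier_a} {net_a} overlaps {tier_b} {net_b}")
    return problems


def usable_hosts(cidr: str) -> int:
    """Addresses available to workloads in one AWS subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    print(validate_subnets(VPC_CIDR, SUBNETS) or "layout is consistent")
    print("usable hosts per /24:", usable_hosts("10.0.11.0/24"))
```

Each /24 private subnet leaves 251 addresses for nodes and pods, which matters when the VPC CNI assigns pod IPs from the subnet range.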

7.2 VPC Endpoints

Service         | Type      | Purpose
S3              | Gateway   | Object storage access
DynamoDB        | Gateway   | Metadata access
EKS             | Interface | Cluster API access
ECR             | Interface | Container registry
CloudWatch      | Interface | Monitoring and logging
Secrets Manager | Interface | Secrets access

8. CI/CD Pipeline

8.1 Infrastructure as Code (Terraform)

8.1.1 Terraform Structure

terraform/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── production/
├── modules/
│   ├── eks/
│   ├── networking/
│   ├── security/
│   └── storage/
├── shared/
│   └── backend.tf
└── global/
    └── iam.tf

8.1.2 Terraform Pipeline (GitHub Actions)

# .github/workflows/terraform.yml
name: Terraform Infrastructure
on:
  push:
    branches: [main, develop]
    paths: ['terraform/**']

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0

      - name: Terraform Plan
        run: |
          terraform init
          terraform plan -var-file="environments/${{ github.ref_name }}.tfvars" -out=tfplan

      # Apply the saved plan so the applied changes match what was planned
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply tfplan

8.2 Kubernetes GitOps (Flux)

8.2.1 Flux Configuration

# flux-system/gotk-sync.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: energent-k8s
  namespace: flux-system
spec:
  interval: 1m
  ref:
    branch: main
  url: https://github.com/energent-ai/k8s-manifests
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: energent-apps
  namespace: flux-system
spec:
  interval: 10m
  path: './apps'
  prune: true
  sourceRef:
    kind: GitRepository
    name: energent-k8s

8.3 Serverless Deployment (Serverless Framework)

8.3.1 Serverless Configuration

# serverless.yml
service: energent-serverless
frameworkVersion: '3'

provider:
  name: aws
  runtime: python3.11
  region: ${opt:region, 'us-east-1'}
  stage: ${opt:stage, 'dev'}

  vpc:
    securityGroupIds:
      - ${cf:energent-infrastructure.LambdaSecurityGroup}
    subnetIds:
      - ${cf:energent-infrastructure.PrivateSubnet1}
      - ${cf:energent-infrastructure.PrivateSubnet2}

functions:
  auth:
    handler: src/auth/handler.main
    timeout: 30
    memorySize: 512
    events:
      - http:
          path: /auth/{proxy+}
          method: ANY

  billing:
    handler: src/billing/handler.main
    timeout: 300
    memorySize: 1024
    events:
      - schedule: rate(1 hour)

plugins:
  - serverless-python-requirements
  - serverless-iam-roles-per-function

9. Monitoring & Observability

9.1 CloudWatch Configuration

9.1.1 Container Insights

# CloudWatch Container Insights DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  selector:
    matchLabels:
      name: cloudwatch-agent
  template:
    metadata:
      labels:
        name: cloudwatch-agent
    spec:
      serviceAccountName: cloudwatch-agent
      containers:
        - name: cloudwatch-agent
          image: amazon/cloudwatch-agent:1.300.0
          env:
            - name: CW_CONFIG_CONTENT
              value: |
                {
                  "metrics": {
                    "namespace": "CWAgent",
                    "metrics_collected": {
                      "cpu": {
                        "measurement": ["cpu_usage_idle", "cpu_usage_iowait"],
                        "metrics_collection_interval": 60
                      },
                      "disk": {
                        "measurement": ["used_percent"],
                        "metrics_collection_interval": 60,
                        "resources": ["*"]
                      },
                      "mem": {
                        "measurement": ["mem_used_percent"],
                        "metrics_collection_interval": 60
                      }
                    }
                  }
                }

9.2 Application Metrics

Metric Category | Metrics                   | Target            | Alert Threshold
Availability    | Uptime, Health Checks     | 99.9%             | < 99.5%
Performance     | Response Time, Throughput | < 2s, > 1000 RPS  | > 5s, < 500 RPS
Resource Usage  | CPU, Memory, Storage      | < 80%             | > 90%
Error Rates     | 4xx, 5xx errors           | < 1%              | > 5%
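
The targets and alert thresholds above can be read as a simple three-state classification. The sketch below encodes them and labels a sample value as ok, warning, or alert. The intermediate "warning" state, the metric key names, and the treatment of a value exactly at its target as ok are assumptions; the table itself defines only the target and the alert threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    target: float
    alert: float
    higher_is_better: bool


# Values from the application metrics table; key names are hypothetical.
THRESHOLDS = {
    "availability_percent": Threshold(99.9, 99.5, True),
    "response_time_seconds": Threshold(2.0, 5.0, False),
    "throughput_rps": Threshold(1000.0, 500.0, True),
    "resource_usage_percent": Threshold(80.0, 90.0, False),
    "error_rate_percent": Threshold(1.0, 5.0, False),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning' (between target and alert), or 'alert'."""
    t = THRESHOLDS[metric]
    if t.higher_is_better:
        if value < t.alert:
            return "alert"
        return "ok" if value >= t.target else "warning"
    if value > t.alert:
        return "alert"
    return "ok" if value <= t.target else "warning"


if __name__ == "__main__":
    samples = [
        ("resource_usage_percent", 85.0),
        ("availability_percent", 99.4),
        ("response_time_seconds", 1.2),
    ]
    for metric, value in samples:
        print(metric, value, classify(metric, value))
```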

9.3 Audit Logging

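# Audit Policy for EKS
# Reference only: EKS manages the control-plane audit policy itself; audit logs are
# turned on through the "audit" entry in enabled_cluster_log_types (section 3.1).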
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    namespaces: ['energent-ai']
    resources:
      - group: ''
        resources: ['secrets', 'configmaps']
      - group: 'rbac.authorization.k8s.io'
        resources: ['roles', 'rolebindings']

10. Deployment Process

10.1 Deployment Timeline

Phase          | Duration | Activities                               | Stakeholders
Pre-Deployment | 2-3 days | Infrastructure planning, security review | Customer IT, Security, Energent Solutions
Infrastructure | 1-2 days | Terraform deployment, VPC setup          | Customer DevOps, Energent Platform
EKS Cluster    | 0.5 day  | Cluster provisioning, node groups        | Customer DevOps, Energent Platform
Application    | 0.5 day  | Flux deployment, application rollout     | Energent Platform Team
Integration    | 1-2 days | SSO, monitoring, testing                 | Customer IT, Energent Support
Go-Live        | 0.5 day  | Production cutover, validation           | All stakeholders
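
Adding up the phase durations gives the overall deployment window. The short sketch below does the arithmetic, assuming the phases run strictly one after another with no overlap (the guide does not say whether any phases run in parallel).

```python
# (phase, minimum days, maximum days) from the deployment timeline table.
PHASES = [
    ("Pre-Deployment", 2.0, 3.0),
    ("Infrastructure", 1.0, 2.0),
    ("EKS Cluster", 0.5, 0.5),
    ("Application", 0.5, 0.5),
    ("Integration", 1.0, 2.0),
    ("Go-Live", 0.5, 0.5),
]


def total_duration(phases):
    """Return (minimum, maximum) total days, assuming sequential phases."""
    return (sum(lo for _, lo, _ in phases), sum(hi for _, _, hi in phases))


if __name__ == "__main__":
    low, high = total_duration(PHASES)
    print(f"End-to-end deployment: {low} to {high} working days")
```

Under that assumption the end-to-end deployment takes between 5.5 and 8.5 days.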

10.2 Deployment Commands

10.2.1 Infrastructure Deployment

# Infrastructure Deployment with Terraform
cd terraform/environments/production
terraform init -backend-config="bucket=energent-terraform-state"
terraform plan -var-file="production.tfvars" -out=tfplan
# Apply exactly the plan that was reviewed
terraform apply tfplan

# Verify EKS cluster
aws eks update-kubeconfig --region us-east-1 --name energent-production
kubectl get nodes

10.2.2 Application Deployment

# Install Flux GitOps
flux bootstrap github \
  --owner=energent-ai \
  --repository=k8s-manifests \
  --branch=main \
  --path=./clusters/production

# Deploy serverless components
cd serverless/
serverless deploy --stage production --region us-east-1

# Verify deployment
kubectl get pods -n energent-ai
kubectl get ingress -n energent-ai

10.3 Deployment Validation

# Health check endpoints (fail on HTTP errors; TLS certificate verification stays enabled)
curl -fsS https://api.energent.example.com/health
curl -fsS https://api.energent.example.com/metrics

# Kubernetes validation
kubectl top nodes
kubectl get hpa -n energent-ai
kubectl logs -n energent-ai -l app=energent-platform

11. Operations & Maintenance

11.1 Backup & Disaster Recovery

11.1.1 Backup Strategy

Component         | Frequency | Retention | RTO          | RPO
EKS Cluster State | Daily     | 30 days   | < 4 hours    | < 24 hours
Application Data  | Real-time | 90 days   | < 1 hour     | < 15 minutes
Configuration     | On change | 1 year    | < 30 minutes | 0
Audit Logs        | Real-time | 7 years   | < 24 hours   | 0
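
As an illustration of how the RTO and RPO columns are applied, the sketch below checks a recovery exercise against the objectives in the table. RPO is measured as the gap between the last recovery point and the failure; RTO as the gap between the failure and service restoration. The component keys and function name are hypothetical, and boundaries are treated as inclusive for simplicity.

```python
from datetime import datetime, timedelta, timezone

# (RTO, RPO) per component, from the backup strategy table.
OBJECTIVES = {
    "eks_cluster_state": (timedelta(hours=4), timedelta(hours=24)),
    "application_data": (timedelta(hours=1), timedelta(minutes=15)),
    "configuration": (timedelta(minutes=30), timedelta(0)),
    "audit_logs": (timedelta(hours=24), timedelta(0)),
}


def meets_objectives(component, last_recovery_point, failure_time, service_restored):
    """Compare observed data loss and downtime with the component's RPO and RTO."""
    rto, rpo = OBJECTIVES[component]
    data_loss_window = failure_time - last_recovery_point
    downtime = service_restored - failure_time
    return {"rpo_met": data_loss_window <= rpo, "rto_met": downtime <= rto}


if __name__ == "__main__":
    failure = datetime(2025, 5, 28, 10, 0, tzinfo=timezone.utc)
    result = meets_objectives(
        "application_data",
        last_recovery_point=failure - timedelta(minutes=10),
        failure_time=failure,
        service_restored=failure + timedelta(minutes=45),
    )
    print(result)
```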

11.1.2 Disaster Recovery Procedures

# EKS cluster backup using Velero
velero backup create energent-cluster-backup \
  --include-namespaces energent-ai \
  --storage-location aws

# DynamoDB point-in-time recovery
aws dynamodb restore-table-to-point-in-time \
  --source-table-name energent-metadata-production \
  --target-table-name energent-metadata-restored \
  --restore-date-time 2025-05-28T10:00:00.000Z

11.2 Scaling & Performance

11.2.1 Auto-Scaling Configuration

# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: energent-platform-hpa
  namespace: energent-ai
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: energent-platform
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
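
For intuition about how this autoscaler behaves, the Horizontal Pod Autoscaler computes a desired replica count per metric as ceil(currentReplicas * currentValue / targetValue), takes the largest result, and clamps it to the configured bounds. The sketch below reproduces that core calculation with the bounds from the manifest above. It deliberately omits the controller's tolerance band, stabilization windows, and handling of unready pods, so it is an approximation rather than the controller's full logic.

```python
import math


def desired_replicas(current_replicas, metrics, min_replicas=3, max_replicas=50):
    """Approximate the HPA replica calculation.

    metrics: list of (current_utilization_percent, target_utilization_percent).
    """
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    # The HPA acts on the metric that asks for the most replicas.
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # 3 replicas at 140% CPU (target 70) and 40% memory (target 80).
    print(desired_replicas(3, [(140, 70), (40, 80)]))
    # 40 replicas at 140% CPU would ask for 80, clamped to maxReplicas.
    print(desired_replicas(40, [(140, 70)]))
```

Because the largest proposal wins, a memory spike alone is enough to trigger a scale-out even when CPU is below its 70% target.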

11.3 Update & Maintenance

11.3.1 Rolling Updates

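# EKS cluster update (control plane upgrades move one minor version at a time;
# 1.31 is an example target for a cluster currently running 1.30)
aws eks update-cluster-version \
  --name energent-production \
  --kubernetes-version 1.31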

# Application rolling update via Flux
git commit -am "Update energent-platform to v2.1.0"
git push origin main
# Flux automatically detects and applies changes

12. Support & Escalation

12.1 Support Tiers

Tier          | Response Time | Channels               | Scope
L1 - Basic    | < 4 hours     | Email, Portal          | General questions, documentation
L2 - Standard | < 2 hours     | Phone, Email, Slack    | Technical issues, integration support
L3 - Premium  | < 1 hour      | Phone, Slack, Video    | Complex technical issues, architecture
L4 - Critical | < 30 minutes  | Phone, SMS, Escalation | Production outages, security incidents

12.2 24/7 Support Coverage

Enterprise Support:

Emergency Escalation:

12.3 Service Level Agreements

Service               | SLA            | Penalty
Platform Availability | 99.9% uptime   | 10% monthly credit per 0.1% shortfall
Response Time (P95)   | < 2 seconds    | 5% monthly credit if > 5 seconds
Support Response      | Per tier above | Escalation to next tier
Data Recovery         | RTO < 4 hours  | 25% monthly credit if exceeded
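
A worked example of the availability penalty: the sketch below computes the monthly credit from a measured uptime figure. It assumes the credit scales linearly with the shortfall and is capped at 100% of the monthly fee; the table states neither, so treat both as assumptions to confirm against the contract. `Decimal` is used to avoid floating-point rounding on values such as 0.1.

```python
from decimal import Decimal

SLA_TARGET = Decimal("99.9")      # platform availability target, percent
CREDIT_PER_STEP = Decimal("10")   # percent of monthly fee per step
STEP = Decimal("0.1")             # shortfall step, percentage points
MAX_CREDIT = Decimal("100")       # cap (assumption, not stated in the table)


def availability_credit_percent(measured_uptime_percent: Decimal) -> Decimal:
    """Monthly credit (percent of fee) for an availability shortfall."""
    shortfall = SLA_TARGET - measured_uptime_percent
    if shortfall <= 0:
        return Decimal("0")
    credit = shortfall / STEP * CREDIT_PER_STEP  # linear scaling (assumption)
    return min(credit, MAX_CREDIT)


if __name__ == "__main__":
    for uptime in ("99.95", "99.7", "98.0"):
        print(uptime, availability_credit_percent(Decimal(uptime)))
```

For example, a month at 99.7% uptime is 0.2 percentage points short of the target and yields a 20% credit under these assumptions.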

Appendices

Appendix A: Security Compliance Checklist

  • VPC with private subnets deployed
  • Security groups with least privilege access
  • KMS encryption for all data at rest
  • TLS 1.3 for all data in transit
  • IAM roles with minimal permissions
  • CloudTrail logging enabled
  • GuardDuty threat detection enabled
  • Config compliance monitoring enabled
  • Secrets Manager for all credentials
  • Regular security scans and assessments

Appendix B: Troubleshooting Guide

Common Issues:

  1. EKS Nodes Not Joining Cluster

    • Verify IAM roles and security groups
    • Check subnet routing and NAT gateway
  2. Application Pods CrashLooping

    • Check resource limits and requests
    • Verify persistent volume claims
  3. Network Connectivity Issues

    • Verify VPC endpoints configuration
    • Check security group rules

  • Next Review: 2025-08-28
  • Contact: support@energent.ai
