Enterprise AWS Deployment Guide - Energent.ai
Energent.ai delivers AI-powered virtual desktop agents that automate complex multi-application workflows for enterprise users. This guide specifies the AWS deployment: a cloud-native architecture built on Amazon EKS, with a multi-tenant design and enterprise-grade security controls.
- Document Classification: Public
- Version: 3.0
- Last Updated: 2025-05-28
- Architecture: AWS EKS + Serverless Hybrid
- Compliance: SOC 2, AWS Well-Architected Framework
Table of Contents
- Architecture Overview
- AWS Infrastructure Requirements
- EKS Cluster Specifications
- Data Layer Architecture
- Serverless Components
- Security & Compliance
- Network Architecture
- CI/CD Pipeline
- Monitoring & Observability
- Deployment Process
- Operations & Maintenance
- Support & Escalation
1. Architecture Overview
1.1 Cloud-Native Multi-Tenant Architecture
Energent.ai deploys on AWS with an architecture that combines Kubernetes orchestration on Amazon EKS with serverless components (Lambda, API Gateway, EventBridge) and a managed data layer (DynamoDB, S3, EFS, Secrets Manager).
┌──────────────────────────────────────────────────────────────────┐
│ AWS CLOUD ENVIRONMENT │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ EKS CLUSTER │ │ SERVERLESS │ │ DATA LAYER │ │
│ │ │ │ │ │ │ │
│ │ • Multi-tenant │ │ • Lambda Auth │ │ • DynamoDB │ │
│ │ • C5.4xlarge │ │ • Lambda Billing│ │ • S3 Storage │ │
│ │ • Auto-scaling │ │ • API Gateway │ │ • EFS Shared │ │
│ │ • Flux GitOps │ │ • EventBridge │ │ • Secrets Mgr │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │ │ │ │
│ └─────────────────────┼────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ VPC SECURITY BOUNDARY │ │
│ │ • Private Subnets • NAT Gateway • Security Groups │ │
│ │ • NACLs • VPC Endpoints • Transit Gateway │ │
│ └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
1.2 Deployment Models
Model | Description | Use Case | SLA |
---|---|---|---|
Multi-Tenant EKS | Shared cluster with namespace isolation | Standard enterprise deployment | 99.9% |
Dedicated EKS | Single-tenant cluster | High-security, regulatory compliance | 99.95% |
Hybrid Deployment | EKS + customer on-premises integration | Legacy system integration | 99.9% |
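To make the SLA column concrete, each availability percentage can be converted into a monthly downtime allowance. The sketch below assumes a 30-day month, which this guide does not specify, and `allowed_downtime_minutes` is an illustrative helper rather than part of any Energent.ai tooling.

```python
from fractions import Fraction

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # assumption: 30-day billing month


def allowed_downtime_minutes(sla_percent: str) -> float:
    """Return the downtime budget, in minutes, for a 30-day month."""
    unavailable = 1 - Fraction(sla_percent) / 100
    return float(unavailable * MINUTES_PER_30_DAY_MONTH)


for model, sla in [("Multi-Tenant EKS", "99.9"),
                   ("Dedicated EKS", "99.95"),
                   ("Hybrid Deployment", "99.9")]:
    print(f"{model}: {allowed_downtime_minutes(sla):.1f} min/month")
```

Under this assumption, 99.9% allows 43.2 minutes of downtime per month and 99.95% allows 21.6 minutes.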
2. AWS Infrastructure Requirements
2.1 Minimum Infrastructure Specifications
Component | Specification | Purpose |
---|---|---|
EKS Cluster Version | 1.30+ | Kubernetes orchestration |
Node Group Instance Type | C5.4xlarge (16 vCPU, 32 GB RAM) | Compute-optimized workloads |
Minimum Tenant Allocation | 1 vCPU, 2 GB RAM per tenant | Resource allocation |
EBS Storage | 100 GB gp3, encrypted | Pod persistent storage |
EFS Storage | Standard, encrypted | Shared file system |
S3 Buckets | Standard-IA, versioning enabled | Object storage |
DynamoDB | On-demand, encryption at rest | Metadata and configuration |
2.2 AWS Service Dependencies
Service | Purpose | Configuration |
---|---|---|
Amazon EKS | Kubernetes orchestration | Private endpoint, logging enabled |
EC2 Auto Scaling | Dynamic node scaling | Target tracking, predictive scaling |
Application Load Balancer | Traffic distribution | SSL termination, WAF integration |
AWS Lambda | Serverless functions | Runtime: Python 3.11, VPC integration |
API Gateway | API management | REST + WebSocket, throttling enabled |
CloudWatch | Monitoring and logging | Container Insights, custom metrics |
AWS Secrets Manager | Secrets management | Automatic rotation, encryption |
AWS KMS | Key management | Customer-managed keys, auto-rotation |
3. EKS Cluster Specifications
3.1 Cluster Configuration
# EKS Cluster Terraform Configuration
resource "aws_eks_cluster" "energent_cluster" {
  name     = "energent-production"
  role_arn = aws_iam_role.eks_cluster_role.arn
  version  = "1.30"
  vpc_config {
    subnet_ids              = var.private_subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = false
    security_group_ids      = [aws_security_group.eks_cluster.id]
  }
  encryption_config {
    provider {
      key_arn = aws_kms_key.eks_encryption.arn
    }
    resources = ["secrets"]
  }
  enabled_cluster_log_types = [
    "api", "audit", "authenticator", "controllerManager", "scheduler"
  ]
  tags = {
    Environment = "production"
    Product     = "energent-ai"
    Compliance  = "soc2"
  }
}
3.2 Node Group Configuration
# Managed Node Group
resource "aws_eks_node_group" "energent_nodes" {
  cluster_name    = aws_eks_cluster.energent_cluster.name
  node_group_name = "energent-compute-nodes"
  node_role_arn   = aws_iam_role.eks_node_role.arn
  subnet_ids      = var.private_subnet_ids
  instance_types  = ["c5.4xlarge"]
  capacity_type   = "ON_DEMAND"
  scaling_config {
    desired_size = 3
    max_size     = 20
    min_size     = 2
  }
  update_config {
    max_unavailable_percentage = 25
  }
  launch_template {
    id      = aws_launch_template.eks_nodes.id
    version = aws_launch_template.eks_nodes.latest_version
  }
  tags = {
    "kubernetes.io/cluster/energent-production" = "owned"
  }
}
3.3 Multi-Tenant Resource Allocation
Tenant Tier | CPU Limit | Memory Limit | Storage | Concurrent Workflows |
---|---|---|---|---|
Basic | 1 vCPU | 2 GB | 10 GB | 1 |
Standard | 2 vCPU | 4 GB | 25 GB | 2 |
Premium | 4 vCPU | 8 GB | 50 GB | 4 |
Enterprise | 8 vCPU | 16 GB | 100 GB | 8 |
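The tier limits above can be related to the c5.4xlarge node size (16 vCPU, 32 GB) and the node group's `max_size` of 20. The following is a rough capacity sketch, assuming the whole node is allocatable to tenants; in practice the kubelet and system pods reserve part of each node, so real capacity is lower. The `Tier` class and `tenants_per_node` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

NODE_VCPU = 16       # c5.4xlarge, from section 2.1
NODE_MEMORY_GB = 32
MAX_NODES = 20       # max_size of the managed node group in section 3.2


@dataclass(frozen=True)
class Tier:
    name: str
    vcpu: int
    memory_gb: int


TIERS = [Tier("Basic", 1, 2), Tier("Standard", 2, 4),
         Tier("Premium", 4, 8), Tier("Enterprise", 8, 16)]


def tenants_per_node(tier: Tier) -> int:
    """Upper bound on tenants of one tier that fit on a single node.

    Assumption: no CPU or memory is reserved for system components.
    """
    return min(NODE_VCPU // tier.vcpu, NODE_MEMORY_GB // tier.memory_gb)


for tier in TIERS:
    per_node = tenants_per_node(tier)
    print(f"{tier.name}: {per_node} per node, "
          f"{per_node * MAX_NODES} at max scale")
```

Because every tier keeps the same 1 vCPU to 2 GB ratio as the node itself, CPU and memory run out at the same time, so neither resource is stranded.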
4. Data Layer Architecture
4.1 Storage Architecture
4.1.1 Amazon S3 Configuration
# S3 Bucket for Object Storage
resource "aws_s3_bucket" "energent_storage" {
  bucket = "energent-${var.environment}-storage-${random_id.bucket_suffix.hex}"
  tags = {
    Environment = var.environment
    Purpose     = "energent-object-storage"
  }
}
# Default encryption with a customer-managed KMS key
resource "aws_s3_bucket_server_side_encryption_configuration" "energent_storage" {
  bucket = aws_s3_bucket.energent_storage.id
  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.s3_encryption.arn
      sse_algorithm     = "aws:kms"
    }
    bucket_key_enabled = true
  }
}
resource "aws_s3_bucket_versioning" "energent_storage" {
  bucket = aws_s3_bucket.energent_storage.id
  versioning_configuration {
    status = "Enabled"
  }
}
4.1.2 DynamoDB Configuration
# DynamoDB for Metadata and Configuration
resource "aws_dynamodb_table" "energent_metadata" {
  name         = "energent-metadata-${var.environment}"
  billing_mode = "PAY_PER_REQUEST" # on-demand capacity
  hash_key     = "tenant_id"
  range_key    = "entity_type"
  attribute {
    name = "tenant_id"
    type = "S"
  }
  attribute {
    name = "entity_type"
    type = "S"
  }
  server_side_encryption {
    enabled     = true
    kms_key_arn = aws_kms_key.dynamodb_encryption.arn
  }
  point_in_time_recovery {
    enabled = true
  }
  tags = {
    Environment = var.environment
    Purpose     = "energent-metadata"
  }
}
4.1.3 EFS Shared Storage
# EFS for Shared File System
resource "aws_efs_file_system" "energent_shared" {
  creation_token                  = "energent-shared-${var.environment}"
  encrypted                       = true
  kms_key_id                      = aws_kms_key.efs_encryption.arn
  performance_mode                = "generalPurpose"
  provisioned_throughput_in_mibps = 500
  throughput_mode                 = "provisioned"
  tags = {
    Name        = "energent-shared-storage"
    Environment = var.environment
  }
}
5. Serverless Components
5.1 AWS Lambda Functions
5.1.1 Authentication Service
# Lambda for Authentication
resource "aws_lambda_function" "auth_service" {
  filename      = "auth_service.zip"
  function_name = "energent-auth-${var.environment}"
  role          = aws_iam_role.lambda_auth_role.arn
  handler       = "auth.handler"
  runtime       = "python3.11"
  timeout       = 30
  memory_size   = 512
  vpc_config {
    subnet_ids         = var.private_subnet_ids
    security_group_ids = [aws_security_group.lambda_auth.id]
  }
  environment {
    variables = {
      DYNAMODB_TABLE = aws_dynamodb_table.energent_metadata.name
      KMS_KEY_ID     = aws_kms_key.lambda_encryption.key_id
      ENVIRONMENT    = var.environment
    }
  }
  tags = {
    Environment = var.environment
    Service     = "authentication"
  }
}
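The `handler = "auth.handler"` setting means Lambda looks for a function named `handler` in a module named `auth`. The guide does not show that module, so the following is only a hypothetical sketch of its shape: it reads the environment variables defined above and returns a response in the API Gateway proxy format. The bearer-token check is a placeholder assumption, not the actual Energent.ai authentication logic.

```python
import json
import os


def handler(event, context):
    """Hypothetical entry point matching the `auth.handler` setting.

    Only checks that a bearer token is present; validating it against
    the metadata table is out of scope for this sketch.
    """
    table = os.environ.get("DYNAMODB_TABLE", "")
    environment = os.environ.get("ENVIRONMENT", "")

    # Header names are case-insensitive, so normalize before lookup.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    token = headers.get("authorization", "")

    if not token.startswith("Bearer "):
        return {"statusCode": 401,
                "body": json.dumps({"error": "missing bearer token"})}

    return {"statusCode": 200,
            "body": json.dumps({"environment": environment, "table": table})}
```

Calling `handler({"headers": {}}, None)` locally returns a 401 response, which is a quick way to exercise the function before packaging it into `auth_service.zip`.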
5.1.2 Billing Service
# Lambda for Billing
resource "aws_lambda_function" "billing_service" {
  filename      = "billing_service.zip"
  function_name = "energent-billing-${var.environment}"
  role          = aws_iam_role.lambda_billing_role.arn
  handler       = "billing.handler"
  runtime       = "python3.11"
  timeout       = 300
  memory_size   = 1024
  environment {
    variables = {
      DYNAMODB_TABLE = aws_dynamodb_table.energent_metadata.name
      S3_BUCKET      = aws_s3_bucket.energent_storage.bucket
    }
  }
}
5.2 API Gateway Configuration
# API Gateway for Serverless Functions
resource "aws_api_gateway_rest_api" "energent_api" {
  name        = "energent-api-${var.environment}"
  description = "Energent.ai Enterprise API"
  endpoint_configuration {
    types            = ["PRIVATE"]
    vpc_endpoint_ids = [aws_vpc_endpoint.api_gateway.id]
  }
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = "*"
        Action    = "execute-api:Invoke"
        Resource  = "*"
        Condition = {
          StringEquals = {
            "aws:sourceVpc" = var.vpc_id
          }
        }
      }
    ]
  })
}
6. Security & Compliance
6.1 Network Security
6.1.1 VPC Configuration
# VPC Security Groups
resource "aws_security_group" "eks_cluster" {
  name_prefix = "energent-eks-cluster-"
  vpc_id      = var.vpc_id
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags = {
    Name = "energent-eks-cluster-sg"
  }
}
6.1.2 Network ACLs
Direction | Protocol | Port Range | Source/Destination | Purpose |
---|---|---|---|---|
Inbound | HTTPS | 443 | VPC CIDR | API access |
Inbound | TCP | 1024-65535 | 0.0.0.0/0 | Return traffic |
Outbound | HTTPS | 443 | 0.0.0.0/0 | External API calls |
Outbound | UDP/TCP | 53 | 0.0.0.0/0 | DNS resolution |
6.2 Encryption Standards
Data State | Encryption Method | Key Management | Compliance |
---|---|---|---|
At Rest | AES-256-GCM | AWS KMS CMK with auto-rotation | SOC 2, FIPS 140-2 Level 3 |
In Transit | TLS 1.3 | Certificate Manager | SOC 2, PCI DSS |
In Memory | Application-level | Hardware Security Module | SOC 2 |
Backup | AES-256 | Cross-region KMS keys | SOC 2, GDPR |
6.3 IAM Roles and Policies
6.3.1 EKS Service Roles
# EKS Cluster Role
resource "aws_iam_role" "eks_cluster_role" {
  name = "energent-eks-cluster-role-${var.environment}"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
      }
    ]
  })
}
resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster_role.name
}
7. Network Architecture
7.1 VPC Design
┌─────────────────────────────────────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ Public Subnet │ │ Public Subnet │ │ Public Sub │ │
│ │ (10.0.1.0/24) │ │ (10.0.2.0/24) │ │(10.0.3.0/24)│ │
│ │ │ │ │ │ │ │
│ │ NAT Gateway │ │ NAT Gateway │ │NAT Gateway │ │
│ │ ALB (public) │ │ ALB (public) │ │ALB (public) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────┘ │
│ │ │ │ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ Private Subnet │ │ Private Subnet │ │Private Sub │ │
│ │ (10.0.11.0/24) │ │ (10.0.12.0/24) │ │(10.0.13.0/24│ │
│ │ │ │ │ │ │ │
│ │ EKS Nodes │ │ EKS Nodes │ │ EKS Nodes │ │
│ │ Lambda VPC │ │ Lambda VPC │ │ Lambda VPC │ │
│ └─────────────────┘ └─────────────────┘ └─────────────┘ │
│ │ │ │ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ Data Subnet │ │ Data Subnet │ │ Data Subnet │ │
│ │ (10.0.21.0/24) │ │ (10.0.22.0/24) │ │(10.0.23.0/24│ │
│ │ │ │ │ │ │ │
│ │ RDS/DynamoDB │ │ RDS/DynamoDB │ │RDS/DynamoDB │ │
│ │ EFS Mount │ │ EFS Mount │ │ EFS Mount │ │
│ └─────────────────┘ └─────────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
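The CIDR plan in the diagram can be sanity-checked with Python's standard `ipaddress` module. This sketch confirms that every subnet sits inside the VPC range and that none overlap, then estimates usable addresses per subnet. The subnet grouping and variable names are illustrative. The figure of five reserved addresses per subnet is standard AWS behavior.

```python
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "public": ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"],
    "private": ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"],
    "data": ["10.0.21.0/24", "10.0.22.0/24", "10.0.23.0/24"],
}
AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet

networks = [ipaddress.ip_network(cidr)
            for cidrs in SUBNETS.values() for cidr in cidrs]

# Every subnet must fall inside the VPC range and none may overlap.
all_inside_vpc = all(net.subnet_of(VPC) for net in networks)
any_overlap = any(a.overlaps(b) for a, b in combinations(networks, 2))

# Each /24 holds 256 addresses before the AWS reservation.
usable_per_subnet = networks[0].num_addresses - AWS_RESERVED_PER_SUBNET

print(f"inside VPC: {all_inside_vpc}, overlap: {any_overlap}, "
      f"usable per /24: {usable_per_subnet}")
```

Each /24 leaves 251 usable addresses. With the default Amazon VPC CNI, every pod takes an address from its node's subnet, so the size of the private subnets is worth weighing against the expected pod count.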
7.2 VPC Endpoints
Service | Type | Purpose |
---|---|---|
S3 | Gateway | Object storage access |
DynamoDB | Gateway | Metadata access |
EKS | Interface | Cluster API access |
ECR | Interface | Container registry |
CloudWatch | Interface | Monitoring and logging |
Secrets Manager | Interface | Secrets access |
8. CI/CD Pipeline
8.1 Infrastructure as Code (Terraform)
8.1.1 Terraform Structure
terraform/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── production/
├── modules/
│   ├── eks/
│   ├── networking/
│   ├── security/
│   └── storage/
├── shared/
│   └── backend.tf
└── global/
    └── iam.tf
8.1.2 Terraform Pipeline (GitHub Actions)
# .github/workflows/terraform.yml
name: Terraform Infrastructure
on:
  push:
    branches: [main, develop]
    paths: ['terraform/**']
jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
      - name: Terraform Plan
        run: |
          terraform init
          # Save the plan so the apply step runs exactly what was planned
          terraform plan -var-file="environments/${{ github.ref_name }}.tfvars" -out=tfplan
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve tfplan
8.2 Kubernetes GitOps (Flux)
8.2.1 Flux Configuration
# flux-system/gotk-sync.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: energent-k8s
  namespace: flux-system
spec:
  interval: 1m
  ref:
    branch: main
  url: https://github.com/energent-ai/k8s-manifests
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: energent-apps
  namespace: flux-system
spec:
  interval: 10m
  path: './apps'
  prune: true
  sourceRef:
    kind: GitRepository
    name: energent-k8s
8.3 Serverless Deployment (Serverless Framework)
8.3.1 Serverless Configuration
# serverless.yml
service: energent-serverless
frameworkVersion: '3'
provider:
  name: aws
  runtime: python3.11
  region: ${opt:region, 'us-east-1'}
  stage: ${opt:stage, 'dev'}
  vpc:
    securityGroupIds:
      - ${cf:energent-infrastructure.LambdaSecurityGroup}
    subnetIds:
      - ${cf:energent-infrastructure.PrivateSubnet1}
      - ${cf:energent-infrastructure.PrivateSubnet2}
functions:
  auth:
    handler: src/auth/handler.main
    timeout: 30
    memorySize: 512
    events:
      - http:
          path: /auth/{proxy+}
          method: ANY
  billing:
    handler: src/billing/handler.main
    timeout: 300
    memorySize: 1024
    events:
      - schedule: rate(1 hour)
plugins:
  - serverless-python-requirements
  - serverless-iam-roles-per-function
9. Monitoring & Observability
9.1 CloudWatch Configuration
9.1.1 Container Insights
# CloudWatch Container Insights DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  selector:
    matchLabels:
      name: cloudwatch-agent
  template:
    metadata:
      labels:
        name: cloudwatch-agent
    spec:
      serviceAccountName: cloudwatch-agent
      containers:
        - name: cloudwatch-agent
          image: amazon/cloudwatch-agent:1.300.0
          env:
            - name: CW_CONFIG_CONTENT
              value: |
                {
                  "metrics": {
                    "namespace": "CWAgent",
                    "metrics_collected": {
                      "cpu": {
                        "measurement": ["cpu_usage_idle", "cpu_usage_iowait"],
                        "metrics_collection_interval": 60
                      },
                      "disk": {
                        "measurement": ["used_percent"],
                        "metrics_collection_interval": 60,
                        "resources": ["*"]
                      },
                      "mem": {
                        "measurement": ["mem_used_percent"],
                        "metrics_collection_interval": 60
                      }
                    }
                  }
                }
9.2 Application Metrics
Metric Category | Metrics | Target | Alert Threshold |
---|---|---|---|
Availability | Uptime, Health Checks | 99.9% | < 99.5% |
Performance | Response Time, Throughput | < 2s, > 1000 RPS | > 5s, < 500 RPS |
Resource Usage | CPU, Memory, Storage | < 80% | > 90% |
Error Rates | 4xx, 5xx errors | < 1% | > 5% |
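The target and alert columns describe two thresholds per metric, which can be expressed as a small classifier. The sketch below is illustrative only: the `classify` function and the intermediate `warn` state (target missed but alert threshold not yet crossed) are assumptions, as is the choice to treat a value exactly on a boundary as meeting it.

```python
def classify(value: float, target: float, alert: float,
             higher_is_better: bool) -> str:
    """Return 'ok', 'warn' (target missed) or 'alert' (threshold crossed)."""
    if not higher_is_better:
        # Negate so that "larger is better" holds for every metric.
        value, target, alert = -value, -target, -alert
    if value < alert:
        return "alert"
    if value < target:
        return "warn"
    return "ok"


# Availability: target 99.9%, alert below 99.5% (higher is better)
print(classify(99.7, target=99.9, alert=99.5, higher_is_better=True))

# Response time: target under 2 s, alert above 5 s (lower is better)
print(classify(6.0, target=2.0, alert=5.0, higher_is_better=False))

# Error rate: target under 1%, alert above 5% (lower is better)
print(classify(0.4, target=1.0, alert=5.0, higher_is_better=False))
```

In a real deployment these thresholds would more likely live in CloudWatch alarms than in application code; the function only makes the table's semantics explicit.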
9.3 Audit Logging
# Audit Policy for EKS
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    namespaces: ['energent-ai']
    resources:
      - group: ''
        resources: ['secrets', 'configmaps']
      - group: 'rbac.authorization.k8s.io'
        resources: ['roles', 'rolebindings']
10. Deployment Process
10.1 Deployment Timeline
Phase | Duration | Activities | Stakeholders |
---|---|---|---|
Pre-Deployment | 2-3 days | Infrastructure planning, security review | Customer IT, Security, Energent Solutions |
Infrastructure | 1-2 days | Terraform deployment, VPC setup | Customer DevOps, Energent Platform |
EKS Cluster | 0.5 day | Cluster provisioning, node groups | Customer DevOps, Energent Platform |
Application | 0.5 day | Flux deployment, application rollout | Energent Platform Team |
Integration | 1-2 days | SSO, monitoring, testing | Customer IT, Energent Support |
Go-Live | 0.5 day | Production cutover, validation | All stakeholders |
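Adding up the phase durations gives an end-to-end estimate. This small calculation assumes the phases run strictly one after another with no overlap or waiting time between them, which the table does not state.

```python
# (shortest, longest) duration in days for each phase of the table above
PHASES = {
    "Pre-Deployment": (2, 3),
    "Infrastructure": (1, 2),
    "EKS Cluster": (0.5, 0.5),
    "Application": (0.5, 0.5),
    "Integration": (1, 2),
    "Go-Live": (0.5, 0.5),
}

# Assumption: phases are sequential, so totals are simple sums.
shortest = sum(low for low, _ in PHASES.values())
longest = sum(high for _, high in PHASES.values())

print(f"Estimated total: {shortest} to {longest} working days")
```

On that assumption a deployment takes between 5.5 and 8.5 working days.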
10.2 Deployment Commands
10.2.1 Infrastructure Deployment
# Infrastructure Deployment with Terraform
cd terraform/environments/production
terraform init -backend-config="bucket=energent-terraform-state"
# Save the plan, then apply exactly that plan so the production.tfvars values carry over
terraform plan -var-file="production.tfvars" -out=tfplan
terraform apply tfplan
# Verify EKS cluster
aws eks update-kubeconfig --region us-east-1 --name energent-production
kubectl get nodes
10.2.2 Application Deployment
# Install Flux GitOps
flux bootstrap github \
--owner=energent-ai \
--repository=k8s-manifests \
--branch=main \
--path=./clusters/production
# Deploy serverless components
cd serverless/
serverless deploy --stage production --region us-east-1
# Verify deployment
kubectl get pods -n energent-ai
kubectl get ingress -n energent-ai
10.3 Deployment Validation
# Health check endpoints (TLS verification stays on; --fail returns non-zero on HTTP errors)
curl --fail https://api.energent.example.com/health
curl --fail https://api.energent.example.com/metrics
# Kubernetes validation
kubectl top nodes
kubectl get hpa -n energent-ai
kubectl logs -n energent-ai -l app=energent-platform
11. Operations & Maintenance
11.1 Backup & Disaster Recovery
11.1.1 Backup Strategy
Component | Frequency | Retention | RTO | RPO |
---|---|---|---|---|
EKS Cluster State | Daily | 30 days | < 4 hours | < 24 hours |
Application Data | Real-time | 90 days | < 1 hour | < 15 minutes |
Configuration | On change | 1 year | < 30 minutes | 0 |
Audit Logs | Real-time | 7 years | < 24 hours | 0 |
11.1.2 Disaster Recovery Procedures
# EKS cluster backup using Velero
velero backup create energent-cluster-backup \
--include-namespaces energent-ai \
--storage-location aws
# DynamoDB point-in-time recovery
aws dynamodb restore-table-to-point-in-time \
--source-table-name energent-metadata-production \
--target-table-name energent-metadata-restored \
--restore-date-time 2025-05-28T10:00:00.000Z
11.2 Scaling & Performance
11.2.1 Auto-Scaling Configuration
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: energent-platform-hpa
  namespace: energent-ai
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: energent-platform
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
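The scaling decision this autoscaler makes can be approximated with the formula from the Kubernetes documentation: the desired replica count is the current count scaled by the ratio of observed to target utilization, rounded up. With several metrics the largest proposal wins. The sketch below is a simplification that ignores the controller's tolerance band and stabilization window, and `desired_replicas` is an illustrative name.

```python
import math

MIN_REPLICAS, MAX_REPLICAS = 3, 50
TARGETS = {"cpu": 70, "memory": 80}  # averageUtilization values from the manifest


def desired_replicas(current_replicas: int, utilization: dict) -> int:
    """Apply the HPA formula per metric and keep the largest result.

    desired = ceil(current_replicas * observed_utilization / target)
    """
    proposals = [
        math.ceil(current_replicas * utilization[name] / target)
        for name, target in TARGETS.items()
    ]
    # Clamp the winning proposal to the configured replica bounds.
    return max(MIN_REPLICAS, min(MAX_REPLICAS, max(proposals)))


# 10 replicas at 90% CPU and 60% memory: CPU drives the scale-up
print(desired_replicas(10, {"cpu": 90, "memory": 60}))
```

In that example CPU proposes 13 replicas and memory proposes 8, so the deployment scales to 13.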
11.3 Update & Maintenance
11.3.1 Rolling Updates
# EKS cluster update (EKS upgrades one minor version at a time, so the target is the version after the current 1.30)
aws eks update-cluster-version \
  --name energent-production \
  --kubernetes-version 1.31
# Application rolling update via Flux
git commit -am "Update energent-platform to v2.1.0"
git push origin main
# Flux automatically detects and applies changes
12. Support & Escalation
12.1 Support Tiers
Tier | Response Time | Channels | Scope |
---|---|---|---|
L1 - Basic | < 4 hours | Email, Portal | General questions, documentation |
L2 - Standard | < 2 hours | Phone, Email, Slack | Technical issues, integration support |
L3 - Premium | < 1 hour | Phone, Slack, Video | Complex technical issues, architecture |
L4 - Critical | < 30 minutes | Phone, SMS, Escalation | Production outages, security incidents |
12.2 24/7 Support Coverage
Enterprise Support:
Emergency Escalation:
12.3 Service Level Agreements
Service | SLA | Penalty |
---|---|---|
Platform Availability | 99.9% uptime | 10% monthly credit per 0.1% shortfall |
Response Time (P95) | < 2 seconds | 5% monthly credit if > 5 seconds |
Support Response | Per tier above | Escalation to next tier |
Data Recovery | RTO < 4 hours | 25% monthly credit if exceeded |
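The availability penalty is a step function of the shortfall, which a short calculation makes explicit. The table leaves two details open, so the sketch assumes that only complete 0.1-point steps earn credit and that the credit is capped at 100% of the monthly fee. `Decimal` is used to avoid floating-point drift when counting steps.

```python
from decimal import Decimal, ROUND_FLOOR

SLA_TARGET = Decimal("99.9")      # platform availability target, percent
STEP = Decimal("0.1")             # shortfall step, percentage points
CREDIT_PER_STEP = Decimal("10")   # percent of the monthly fee per step
MAX_CREDIT = Decimal("100")       # assumption: credit never exceeds the fee


def availability_credit_percent(measured_uptime: str) -> Decimal:
    """Return the monthly credit (percent) for a measured uptime percentage."""
    shortfall = SLA_TARGET - Decimal(measured_uptime)
    if shortfall <= 0:
        return Decimal("0")
    # Assumption: partial steps do not count, so round down.
    full_steps = (shortfall / STEP).to_integral_value(rounding=ROUND_FLOOR)
    return min(full_steps * CREDIT_PER_STEP, MAX_CREDIT)


for uptime in ("99.95", "99.7", "98.0"):
    print(f"{uptime}% uptime -> {availability_credit_percent(uptime)}% credit")
```

Under these assumptions a month at 99.7% uptime yields a 20% credit, and anything at or below 98.9% reaches the cap.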
Appendices
Appendix A: Security Compliance Checklist
Appendix B: Troubleshooting Guide
Common Issues:
- EKS Nodes Not Joining Cluster
  - Verify IAM roles and security groups
  - Check subnet routing and NAT gateway
- Application Pods CrashLooping
  - Check resource limits and requests
  - Verify persistent volume claims
- Network Connectivity Issues
  - Verify VPC endpoints configuration
  - Check security group rules
- Next Review: 2025-08-28
- Contact: support@energent.ai