Enterprise GCP Deployment Guide - Energent.ai

Energent.ai delivers AI-powered virtual desktop agents that automate multi-application workflows for enterprise users. This guide specifies how to deploy the platform on Google Cloud Platform: a multi-tenant GKE cluster, serverless components for authentication and billing, and the security controls that surround them.

Document Classification: Public
Version: 3.0
Last Updated: 2025-05-28
Architecture: GCP GKE + Serverless Hybrid
Compliance: SOC 2, Google Cloud Security Best Practices

Architecture Overview
GCP Infrastructure Requirements
GKE Cluster Specifications
Data Layer Architecture
Serverless Components
Security & Compliance
Network Architecture
CI/CD Pipeline
Monitoring & Observability
Deployment Process
Operations & Maintenance
Support & Escalation

1. Architecture Overview

1.1 Cloud-Native Multi-Tenant Architecture

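Energent.ai deploys on Google Cloud Platform with a private, VPC-native GKE cluster for tenant workloads, Cloud Functions behind API Gateway for authentication and billing, and Firestore, Cloud Storage, and Filestore for data. The Kubernetes side scales with the node pool autoscaler, while the serverless side scales per request.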

┌──────────────────────────────────────────────────────────────────┐
│                        GCP CLOUD ENVIRONMENT                     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐   │
│  │   GKE CLUSTER   │  │   SERVERLESS    │  │   DATA LAYER    │   │
│  │                 │  │                 │  │                 │   │
│  │ • Multi-tenant  │  │ • Functions Auth│  │ • Firestore     │   │
│  │ • n2-standard-4 │  │ • Functions Bill│  │ • Cloud Storage │   │
│  │ • Auto-scaling  │  │ • API Gateway   │  │ • Filestore     │   │
│  │ • Flux GitOps   │  │ • Pub/Sub       │  │ • Secret Mgr    │   │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘   │
│           │                     │                    │           │
│           └─────────────────────┼────────────────────┘           │
│                                 │                                │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │                    VPC SECURITY BOUNDARY                    │ │
│  │  • Private Subnets • Cloud NAT    • Firewall Rules          │ │
│  │  • IAP Tunnels     • VPC Endpoints • Load Balancer          │ │
│  └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘

1.2 Deployment Models

Model	Description	Use Case	SLA
Multi-Tenant GKE	Shared cluster with namespace isolation	Standard enterprise deployment	99.9%
Dedicated GKE	Single-tenant cluster	High-security, regulatory compliance	99.95%
Hybrid Deployment	GKE + customer on-premises integration	Legacy system integration	99.9%
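
The SLA column can be read as a monthly downtime budget. The following is a minimal sketch, not part of the Energent.ai tooling: the function name allowed_downtime_seconds and the 30-day month are assumptions, and the SLA is passed in hundredths of a percent (9990 for 99.90%) so that integer shell arithmetic is enough.

```shell
# Illustrative helper (hypothetical name, not shipped with the platform).
# Usage: allowed_downtime_seconds <SLA in hundredths of a percent>
# Assumes a 30-day month.
allowed_downtime_seconds() (
  sla_bp=$1
  month_seconds=$(( 30 * 24 * 3600 ))
  echo $(( month_seconds * (10000 - sla_bp) / 10000 ))
)

# Budgets for the two SLA levels listed in the table above
for sla in 9990 9995; do
  echo "SLA $sla: $(allowed_downtime_seconds "$sla") seconds of downtime per 30 days"
done
```

Under these assumptions, 99.9% leaves about 43 minutes of downtime per month and 99.95% about 21.6 minutes.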

2. GCP Infrastructure Requirements

2.1 Minimum Infrastructure Specifications

Component	Specification	Purpose
GKE Cluster Version	1.30+	Kubernetes orchestration
Node Pool Instance Type	n2-standard-4 (4 vCPU, 16 GB RAM)	General-purpose workloads
Minimum Tenant Allocation	1 vCPU, 2 GB RAM per tenant	Resource allocation
Persistent Disks	100 GB SSD, encrypted	Pod persistent storage
Filestore	Basic, encrypted	Shared file system
Cloud Storage	Standard, versioning enabled	Object storage
Firestore	Native mode, encryption at rest	Metadata and configuration

2.2 GCP Service Dependencies

Service	Purpose	Configuration
Google Kubernetes Engine (GKE)	Kubernetes orchestration	Private cluster, logging enabled
Compute Engine	Dynamic node scaling	Node pool auto-scaling, on-demand (non-preemptible) instances
Cloud Load Balancing	Traffic distribution	SSL termination, Cloud Armor
Cloud Functions	Serverless functions	Runtime: Python 3.11, VPC connector
API Gateway	API management	Rate limiting, authentication
Cloud Monitoring	Monitoring and logging	GKE monitoring, custom metrics
Secret Manager	Secrets management	Automatic rotation, encryption
Cloud KMS	Key management	Customer-managed keys, auto-rotation

3. GKE Cluster Specifications

3.1 Cluster Configuration

# GKE Cluster Terraform Configuration
resource "google_container_cluster" "energent_cluster" {
  # cluster_telemetry below is only available in the google-beta provider
  provider = google-beta

  name     = "energent-production"
  location = var.gcp_region

  remove_default_node_pool = true
  initial_node_count       = 1

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  networking_mode = "VPC_NATIVE"
  ip_allocation_policy {
    cluster_secondary_range_name  = "k8s-pod-range"
    services_secondary_range_name = "k8s-service-range"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  master_auth {
    client_certificate_config {
      issue_client_certificate = false
    }
  }

  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  addons_config {
    gcp_filestore_csi_driver_config {
      enabled = true
    }

    network_policy_config {
      disabled = false
    }
  }

  # The add-on above only takes effect when network policy is also enabled on the cluster
  network_policy {
    enabled = true
  }

  cluster_telemetry {
    type = "ENABLED"
  }

  logging_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "WORKLOADS",
      "APISERVER"
    ]
  }

  # WORKLOADS is not accepted as a monitoring component on GKE 1.24 and later
  monitoring_config {
    enable_components = [
      "SYSTEM_COMPONENTS"
    ]
  }
}

3.2 Node Pool Configuration

# Primary Node Pool
resource "google_container_node_pool" "energent_nodes" {
  name     = "energent-node-pool"
  location = var.gcp_region
  cluster  = google_container_cluster.energent_cluster.name

  # initial_node_count is used instead of node_count, which should not be combined with autoscaling
  initial_node_count = 3

  autoscaling {
    min_node_count = 2
    max_node_count = 20
  }

  node_config {
    preemptible  = false
    machine_type = "n2-standard-4"
    disk_size_gb = 100
    disk_type    = "pd-ssd"

    service_account = google_service_account.gke_service_account.email
    # cloud-platform already covers the logging and monitoring scopes; access is governed by IAM roles
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    labels = {
      env = "production"
      app = "energent-ai"
    }

    # Pods need a matching toleration to be scheduled onto this pool
    taint {
      key    = "workload"
      value  = "energent-ai"
      effect = "NO_SCHEDULE"
    }
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
}
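
The autoscaling ceiling above has to fit inside the Pod secondary range (k8s-pod-range, a /16 in section 6.1.1). The sketch below checks that arithmetic. It assumes the GKE default of one /24 Pod block per node and a region with three zones (in a regional cluster, max_node_count applies per zone); the function names are hypothetical.

```shell
# Illustrative helpers (hypothetical names, not part of the platform tooling).

# Usage: max_nodes_for_pod_range <pod range prefix length> <per-node prefix length>
# Each node consumes one per-node block out of the Pod range.
max_nodes_for_pod_range() (
  range_prefix=$1
  node_prefix=$2
  echo $(( 1 << (node_prefix - range_prefix) ))
)

# Usage: pool_max_nodes <max_node_count per zone> <number of zones>
pool_max_nodes() (
  echo $(( $1 * $2 ))
)

capacity=$(max_nodes_for_pod_range 16 24)   # /16 Pod range, /24 per node (GKE default)
ceiling=$(pool_max_nodes 20 3)              # three zones is an assumption
echo "Pod range supports $capacity nodes; autoscaler ceiling is $ceiling nodes"
```

If the per-node block size or the zone count differs in a given deployment, the same two calls show whether the Pod range still leaves headroom.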

3.3 Multi-Tenant Resource Allocation

Tenant Tier	CPU Limit	Memory Limit	Storage	Concurrent Workflows
Basic	1 vCPU	2 GB	10 GB	1
Standard	2 vCPU	4 GB	25 GB	2
Premium	4 vCPU	8 GB	50 GB	4
Enterprise	8 vCPU	16 GB	100 GB	8
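
To relate these tiers to the n2-standard-4 node size, the sketch below estimates how many tenants of each tier fit on one node. It is a rough upper bound under stated assumptions: every tenant's CPU and memory limit is reserved in full, and the capacity GKE holds back for system components is ignored. The function names are hypothetical.

```shell
# Illustrative capacity estimate (hypothetical helper, not an Energent.ai tool).
# Usage: tenants_per_node <node vCPU> <node GB> <tier vCPU> <tier GB>
tenants_per_node() (
  by_cpu=$(( $1 / $3 ))
  by_mem=$(( $2 / $4 ))
  # The scarcer resource decides how many tenants fit
  if [ "$by_cpu" -le "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
)

# Usage: report <tier name> <tier vCPU> <tier GB>
# n2-standard-4 offers 4 vCPU and 16 GB before system reservations.
report() (
  echo "$1: $(tenants_per_node 4 16 "$2" "$3") tenant(s) per node"
)

report Basic 1 2
report Standard 2 4
report Premium 4 8
report Enterprise 8 16
```

Under these assumptions an Enterprise tenant does not fit on a single n2-standard-4 node, which suggests a larger machine type or a dedicated node pool for that tier.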

4. Data Layer Architecture

4.1 Storage Architecture

4.1.1 Cloud Storage Configuration

# Cloud Storage Bucket for Object Storage
resource "google_storage_bucket" "energent_storage" {
  name     = "energent-${var.environment}-storage-${random_id.bucket_suffix.hex}"
  location = var.gcp_region

  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  encryption {
    default_kms_key_name = google_kms_crypto_key.storage_key.id
  }

  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }

  retention_policy {
    retention_period = 2592000  # 30 days
  }

  labels = {
    environment = var.environment
    purpose     = "energent-object-storage"
  }
}

resource "google_storage_bucket_iam_member" "storage_admin" {
  bucket = google_storage_bucket.energent_storage.name
  # Object-level access only; bucket administration stays with the deployment identity
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.gke_service_account.email}"
}

4.1.2 Firestore Configuration

# Firestore Database for Metadata and Configuration
resource "google_firestore_database" "energent_metadata" {
  project     = var.project_id
  name        = "energent-metadata-${var.environment}"
  location_id = var.gcp_region
  type        = "FIRESTORE_NATIVE"

  concurrency_mode = "OPTIMISTIC"
  app_engine_integration_mode = "DISABLED"

  point_in_time_recovery_enablement = "POINT_IN_TIME_RECOVERY_ENABLED"
  delete_protection_state = "DELETE_PROTECTION_ENABLED"
}

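# Firestore Security Rules
# Rules are deployed as a Firebase Rules ruleset, not as a second google_firestore_database.
# A google_firebaserules_release is also needed to put the ruleset into effect.
resource "google_firebaserules_ruleset" "security_rules" {
  project    = var.project_id
  depends_on = [google_firestore_database.energent_metadata]

  source {
    files {
      # firestore.rules is a placeholder path; the file implements tenant isolation and access controls
      name    = "firestore.rules"
      content = file("${path.module}/firestore.rules")
    }
  }
}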

4.1.3 Filestore Shared Storage

# Filestore for Shared File System
resource "google_filestore_instance" "energent_shared" {
  name     = "energent-shared-${var.environment}"
  location = var.gcp_zone
  tier     = "BASIC_HDD"

  file_shares {
    capacity_gb = 1024
    name        = "energent-share"
  }

  networks {
    network = google_compute_network.vpc.name
    modes   = ["MODE_IPV4"]
  }

  labels = {
    environment = var.environment
    purpose     = "shared-storage"
  }
}

5. Serverless Components

5.1 Cloud Functions

5.1.1 Authentication Service

# Cloud Function for Authentication
resource "google_cloudfunctions2_function" "auth_service" {
  name     = "energent-auth-${var.environment}"
  location = var.gcp_region

  build_config {
    runtime     = "python311"
    entry_point = "auth_handler"
    source {
      storage_source {
        bucket = google_storage_bucket.functions_source.name
        object = google_storage_bucket_object.auth_source.name
      }
    }
  }

  service_config {
    max_instance_count = 100
    min_instance_count = 1
    available_memory   = "512Mi"
    timeout_seconds    = 60

    environment_variables = {
      FIRESTORE_PROJECT      = var.project_id
      SECRET_MANAGER_PROJECT = var.project_id
      ENVIRONMENT            = var.environment
    }

    vpc_connector                 = google_vpc_access_connector.connector.id
    vpc_connector_egress_settings = "ALL_TRAFFIC"

    service_account_email = google_service_account.functions_service_account.email
  }

  event_trigger {
    trigger_region = var.gcp_region
    event_type     = "google.cloud.pubsub.topic.v1.messagePublished"
    pubsub_topic   = google_pubsub_topic.auth_events.id
  }

  labels = {
    environment = var.environment
    service     = "authentication"
  }
}

5.1.2 Billing Service

# Cloud Function for Billing
resource "google_cloudfunctions2_function" "billing_service" {
  name     = "energent-billing-${var.environment}"
  location = var.gcp_region

  build_config {
    runtime     = "python311"
    entry_point = "billing_handler"
    source {
      storage_source {
        bucket = google_storage_bucket.functions_source.name
        object = google_storage_bucket_object.billing_source.name
      }
    }
  }

  service_config {
    max_instance_count = 50
    min_instance_count = 0
    available_memory   = "1Gi"
    timeout_seconds    = 300

    environment_variables = {
      FIRESTORE_PROJECT = var.project_id
      STORAGE_BUCKET    = google_storage_bucket.energent_storage.name
    }

    service_account_email = google_service_account.functions_service_account.email
  }
}

5.2 API Gateway Configuration

# API Gateway for Serverless Functions
resource "google_api_gateway_api" "energent_api" {
  provider = google-beta
  api_id   = "energent-api-${var.environment}"
  project  = var.project_id

  labels = {
    environment = var.environment
    service     = "api-gateway"
  }
}

resource "google_api_gateway_api_config" "energent_api_config" {
  provider      = google-beta
  api           = google_api_gateway_api.energent_api.api_id
  api_config_id = "energent-config-${var.environment}"
  project       = var.project_id

  openapi_documents {
    document {
      path = "spec.yaml"
      contents = base64encode(templatefile("${path.module}/api-spec.yaml", {
        project_id = var.project_id
        region     = var.gcp_region
      }))
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_api_gateway_gateway" "energent_gateway" {
  provider   = google-beta
  gateway_id = "energent-gateway-${var.environment}"
  api_config = google_api_gateway_api_config.energent_api_config.id
  # The gateway resource takes "region", not "location"
  region  = var.gcp_region
  project = var.project_id

  labels = {
    environment = var.environment
    service     = "api-gateway"
  }
}

6. Security & Compliance

6.1 Network Security

6.1.1 VPC Configuration

# VPC Network and Firewall Rules
resource "google_compute_network" "vpc" {
  name                    = "energent-vpc-${var.environment}"
  auto_create_subnetworks = false
  mtu                     = 1460
}

resource "google_compute_subnetwork" "subnet" {
  name          = "energent-subnet-${var.environment}"
  ip_cidr_range = "10.0.0.0/16"
  region        = var.gcp_region
  network       = google_compute_network.vpc.id

  secondary_ip_range {
    range_name    = "k8s-pod-range"
    ip_cidr_range = "10.1.0.0/16"
  }

  secondary_ip_range {
    range_name    = "k8s-service-range"
    ip_cidr_range = "10.2.0.0/16"
  }

  private_ip_google_access = true
}

resource "google_compute_firewall" "allow_internal" {
  name    = "energent-allow-internal"
  network = google_compute_network.vpc.name

  allow {
    protocol = "tcp"
    ports    = ["0-65535"]
  }

  allow {
    protocol = "udp"
    ports    = ["0-65535"]
  }

  allow {
    protocol = "icmp"
  }

  source_ranges = ["10.0.0.0/8"]
}

resource "google_compute_firewall" "allow_https" {
  name    = "energent-allow-https"
  network = google_compute_network.vpc.name

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }

  source_ranges = ["0.0.0.0/0"]
  target_tags   = ["https-server"]
}
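
The subnet's primary range, its two secondary ranges, and the control plane range from section 3.1 must not overlap. The following is a minimal sketch of a pre-flight check in POSIX shell. The function names are hypothetical, and the check covers IPv4 CIDR notation only.

```shell
# Illustrative pre-flight check (hypothetical helpers, IPv4 only).

# Usage: ip_to_int <dotted quad>
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Usage: cidr_range <a.b.c.d/len>   prints "<first address> <last address>" as integers
cidr_range() (
  ip=${1%/*}
  len=${1#*/}
  base=$(ip_to_int "$ip")
  size=$(( 1 << (32 - $len) ))
  start=$(( $base / $size * $size ))   # clear any host bits
  echo "$start $(( $start + $size - 1 ))"
)

# Usage: cidrs_overlap <cidr> <cidr>   exit status 0 when the ranges intersect
cidrs_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Usage: check_ranges <cidr>...   reports every overlapping pair, fails if any exist
check_ranges() (
  status=0
  while [ "$#" -gt 1 ]; do
    first=$1
    shift
    for other in "$@"; do
      if cidrs_overlap "$first" "$other"; then
        echo "overlap: $first and $other"
        status=1
      fi
    done
  done
  exit "$status"
)

# Primary, Pod, Service, and control plane ranges used in this guide
if check_ranges 10.0.0.0/16 10.1.0.0/16 10.2.0.0/16 172.16.0.0/28; then
  echo "no overlapping ranges"
fi
```

The same check is useful before peering the VPC with a customer network in the Hybrid Deployment model, where the customer's on-premises ranges would be appended to the list.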

6.1.2 Firewall Rules

Direction	Protocol	Port Range	Source/Destination	Purpose
Inbound	TCP (HTTPS)	443	0.0.0.0/0	API access
Inbound	TCP, UDP, ICMP	0-65535 (TCP and UDP)	10.0.0.0/8	Internal traffic
Outbound	TCP (HTTPS)	443	0.0.0.0/0	External API calls
Outbound	TCP, UDP	53	0.0.0.0/0	DNS resolution

6.2 Encryption Standards

Data State	Encryption Method	Key Management	Compliance
At Rest	AES-256-GCM	Cloud KMS with auto-rotation	SOC 2, FIPS 140-2 (Level 3 when keys use Cloud HSM protection)
In Transit	TLS 1.3	Google-managed certificates	SOC 2, PCI DSS
In Memory	Application-level	Hardware Security Module	SOC 2
Backup	AES-256	Cross-region Cloud KMS	SOC 2, GDPR

6.3 IAM and Service Accounts

6.3.1 GKE Service Accounts

# GKE Service Account
resource "google_service_account" "gke_service_account" {
  account_id   = "energent-gke-${var.environment}"
  display_name = "Energent GKE Service Account"
  project      = var.project_id
}

resource "google_project_iam_member" "gke_permissions" {
  for_each = toset([
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
    "roles/storage.objectViewer"
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.gke_service_account.email}"
}

# Workload Identity binding
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.gke_service_account.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[energent-ai/energent-platform]"
}

7. Network Architecture

7.1 VPC Design

┌─────────────────────────────────────────────────────────────────┐
│                    VPC (custom subnet mode)                     │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │             Regional Subnet (one per region)              │  │
│  │                                                           │  │
│  │  Primary range       10.0.0.0/16   GKE nodes              │  │
│  │  k8s-pod-range       10.1.0.0/16   GKE Pods               │  │
│  │  k8s-service-range   10.2.0.0/16   GKE Services           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  GKE control plane: 172.16.0.0/28 (private cluster)             │
│  Internet egress: Cloud NAT                                     │
│  Ingress: Cloud Load Balancing with Cloud Armor                 │
│  Google APIs (Firestore, Cloud Storage): Private Google Access  │
│  Cloud Functions: Serverless VPC Access connector               │
└─────────────────────────────────────────────────────────────────┘

7.2 Private Service Connections

Service	Type	Purpose
Cloud Storage	Private Google Access	Object storage access
Firestore	Private Google Access	Metadata access
GKE	Private cluster	Cluster API access
Artifact Registry	Private Google Access	Container images
Cloud Monitoring	Private Google Access	Monitoring and logging
Secret Manager	Private Google Access	Secrets access

8. CI/CD Pipeline

8.1 Infrastructure as Code (Terraform)

8.1.1 Terraform Structure

terraform/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── production/
├── modules/
│   ├── gke/
│   ├── networking/
│   ├── security/
│   └── storage/
├── shared/
│   └── backend.tf
└── global/
    └── iam.tf

8.1.2 Terraform Pipeline (Cloud Build)

# cloudbuild.yaml
steps:
  # Terraform Init
  - name: 'hashicorp/terraform:1.6.0'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        cd terraform/environments/${_ENVIRONMENT}
        terraform init -backend-config="bucket=${_TF_STATE_BUCKET}"

  # Terraform Plan
  - name: 'hashicorp/terraform:1.6.0'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        cd terraform/environments/${_ENVIRONMENT}
        terraform plan -var-file="${_ENVIRONMENT}.tfvars" -out=tfplan

  # Terraform Apply (only on main branch)
  - name: 'hashicorp/terraform:1.6.0'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        if [ "${BRANCH_NAME}" = "main" ]; then
          cd terraform/environments/${_ENVIRONMENT}
          terraform apply -auto-approve tfplan
        else
          echo "Skipping apply for non-main branch"
        fi

substitutions:
  _ENVIRONMENT: 'production'
  _TF_STATE_BUCKET: 'energent-terraform-state'

options:
  logging: CLOUD_LOGGING_ONLY
  machineType: 'E2_HIGHCPU_8'

timeout: 1200s

8.2 Kubernetes GitOps (Flux)

8.2.1 Flux Configuration

# flux-system/gotk-sync.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: energent-k8s
  namespace: flux-system
spec:
  interval: 1m
  ref:
    branch: main
  url: https://github.com/energent-ai/k8s-manifests
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: energent-apps
  namespace: flux-system
spec:
  interval: 10m
  path: './apps'
  prune: true
  sourceRef:
    kind: GitRepository
    name: energent-k8s

8.3 Serverless Deployment (Cloud Build)

8.3.1 Function Deployment Configuration

# cloudbuild-functions.yaml
steps:
  # Deploy Auth Function
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk:latest'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        cd functions/auth
        gcloud functions deploy energent-auth-${_ENVIRONMENT} \
          --gen2 \
          --runtime python311 \
          --trigger-http \
          --entry-point auth_handler \
          --memory 512MB \
          --timeout 60s \
          --region ${_REGION} \
          --vpc-connector ${_VPC_CONNECTOR} \
          --set-env-vars ENVIRONMENT=${_ENVIRONMENT}

  # Deploy Billing Function
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk:latest'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        cd functions/billing
        gcloud functions deploy energent-billing-${_ENVIRONMENT} \
          --gen2 \
          --runtime python311 \
          --trigger-topic billing-events \
          --entry-point billing_handler \
          --memory 1024MB \
          --timeout 300s \
          --region ${_REGION}

substitutions:
  _ENVIRONMENT: 'production'
  _REGION: 'us-central1'
  _VPC_CONNECTOR: 'energent-vpc-connector'

options:
  logging: CLOUD_LOGGING_ONLY

9. Monitoring & Observability

9.1 Cloud Monitoring Configuration

9.1.1 GKE Monitoring

# Cloud Monitoring for GKE
resource "google_monitoring_dashboard" "gke_dashboard" {
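  dashboard_json = jsonencode({
    displayName = "Energent GKE Dashboard"
    mosaicLayout = {
      columns = 12
      tiles = [
        {
          width  = 6
          height = 4
          widget = {
            title = "GKE Cluster CPU Usage"
            xyChart = {
              dataSets = [{
                timeSeriesQuery = {
                  timeSeriesFilter = {
                    # core_usage_time is reported on k8s_container resources, not k8s_cluster
                    filter = "resource.type=\"k8s_container\" AND metric.type=\"kubernetes.io/container/cpu/core_usage_time\""
                    # Convert the cumulative counter to a rate and sum it across containers
                    aggregation = {
                      alignmentPeriod    = "60s"
                      perSeriesAligner   = "ALIGN_RATE"
                      crossSeriesReducer = "REDUCE_SUM"
                    }
                  }
                }
              }]
            }
          }
        }
      ]
    }
  })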
}

# Log-based Metrics
resource "google_logging_metric" "error_rate" {
  name   = "energent_error_rate"
  filter = "resource.type=\"k8s_container\" AND resource.labels.namespace_name=\"energent-ai\" AND severity=\"ERROR\""

  # Counter-type log-based metrics must be DELTA / INT64
  metric_descriptor {
    metric_kind  = "DELTA"
    value_type   = "INT64"
    display_name = "Energent Error Rate"
  }
}

9.2 Application Metrics

Metric Category	Metrics	Target	Alert Threshold
Availability	Uptime, Health Checks	99.9%	< 99.5%
Performance	Response Time, Throughput	< 2s, > 1000 RPS	> 5s, < 500 RPS
Resource Usage	CPU, Memory, Storage	< 80%	> 90%
Error Rates	4xx, 5xx errors	< 1%	> 5%

9.3 Audit Logging

# Cloud Audit Logs Configuration
resource "google_project_iam_audit_config" "project_audit" {
  project = var.project_id
  service = "allServices"

  audit_log_config {
    log_type = "ADMIN_READ"
  }

  audit_log_config {
    log_type = "DATA_READ"
  }

  audit_log_config {
    log_type = "DATA_WRITE"
  }
}

# Log Sink for Security Events
resource "google_logging_project_sink" "security_sink" {
  name        = "energent-security-sink"
  destination = "storage.googleapis.com/${google_storage_bucket.audit_logs.name}"

  filter = "protoPayload.serviceName=\"container.googleapis.com\" OR protoPayload.serviceName=\"iam.googleapis.com\""

  unique_writer_identity = true
}

10. Deployment Process

10.1 Deployment Timeline

Phase	Duration	Activities	Stakeholders
Pre-Deployment	2-3 days	Infrastructure planning, security review	Customer IT, Security, Energent Solutions
Infrastructure	1-2 days	Terraform deployment, VPC setup	Customer DevOps, Energent Platform
GKE Cluster	0.5 day	Cluster provisioning, node pools	Customer DevOps, Energent Platform
Application	0.5 day	Flux deployment, application rollout	Energent Platform Team
Integration	1-2 days	IAM, monitoring, testing	Customer IT, Energent Support
Go-Live	0.5 day	Production cutover, validation	All stakeholders

10.2 Deployment Commands

10.2.1 Infrastructure Deployment

# Infrastructure Deployment with Terraform
cd terraform/environments/production
terraform init -backend-config="bucket=energent-terraform-state"
terraform plan -var-file="production.tfvars" -out=tfplan
# Review the plan, then apply exactly what was planned
terraform apply tfplan

# Verify GKE cluster
gcloud container clusters get-credentials energent-production --region us-central1
kubectl get nodes

10.2.2 Application Deployment

# Install Flux GitOps
flux bootstrap github \
  --owner=energent-ai \
  --repository=k8s-manifests \
  --branch=main \
  --path=./clusters/production

# Deploy serverless components
gcloud builds submit --config cloudbuild-functions.yaml \
  --substitutions _ENVIRONMENT=production,_REGION=us-central1

# Verify deployment
kubectl get pods -n energent-ai
kubectl get ingress -n energent-ai

10.3 Deployment Validation

# Health check endpoints (-f fails on HTTP errors; TLS verification stays enabled)
curl -fsS https://api.energent.example.com/health
curl -fsS https://api.energent.example.com/metrics

# Kubernetes validation
kubectl top nodes
kubectl get hpa -n energent-ai
kubectl logs -n energent-ai -l app=energent-platform
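
Right after a rollout the health endpoint may not answer on the first request, so validation commands are often wrapped in a retry loop. The following is a minimal sketch of such a wrapper. The name retry and the attempt and delay values are assumptions, not part of the Energent.ai tooling.

```shell
# Illustrative retry wrapper (hypothetical helper).
# Usage: retry <attempts> <delay in seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are used up.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      exit 0
    fi
    echo "attempt $n/$attempts failed: $*" >&2
    # Wait between attempts, but not after the last one
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$(( n + 1 ))
  done
  exit 1
)
```

For example, "retry 10 15 curl -fsS https://api.energent.example.com/health" would poll the health endpoint for roughly two and a half minutes before reporting failure.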

11. Operations & Maintenance

11.1 Backup & Disaster Recovery

11.1.1 Backup Strategy

Component	Frequency	Retention	RTO	RPO
GKE Cluster State	Daily	30 days	< 4 hours	< 24 hours
Application Data	Real-time	90 days	< 1 hour	< 15 minutes
Configuration	On change	1 year	< 30 minutes	0
Audit Logs	Real-time	7 years	< 24 hours	0

11.1.2 Disaster Recovery Procedures

# GKE cluster backup using Velero
velero backup create energent-cluster-backup \
  --include-namespaces energent-ai \
  --storage-location gcp

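# Firestore point-in-time recovery: export the database as of a past timestamp, then import it.
# gs://BACKUP_BUCKET is a placeholder. The timestamp must be a whole minute inside the PITR window,
# and the destination database must already exist.
gcloud firestore export gs://BACKUP_BUCKET/pitr-2025-05-28 \
  --database=energent-metadata-production \
  --snapshot-time=2025-05-28T10:00:00Z
gcloud firestore import gs://BACKUP_BUCKET/pitr-2025-05-28 \
  --database=energent-metadata-restored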

11.2 Scaling & Performance

11.2.1 Auto-Scaling Configuration

# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: energent-platform-hpa
  namespace: energent-ai
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: energent-platform
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
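
For capacity planning it helps to see what replica count the autoscaler would request. The sketch below applies the standard HPA rule, desired = ceil(current replicas x current utilization / target utilization), clamped to minReplicas and maxReplicas, and takes the larger result when several metrics are configured. It is a simplification: the real controller also applies a tolerance band and stabilization windows. The function names are hypothetical.

```shell
# Illustrative estimate of HPA behaviour (hypothetical helpers).

# Usage: hpa_desired_replicas <current replicas> <current util %> <target util %> <min> <max>
hpa_desired_replicas() (
  current=$1
  util=$2
  target=$3
  min=$4
  max=$5
  # Integer ceiling of current * util / target
  desired=$(( ($current * $util + $target - 1) / $target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
)

# Usage: max_of <a> <b>
max_of() (
  if [ "$1" -ge "$2" ]; then echo "$1"; else echo "$2"; fi
)

# Three replicas running at 90% CPU and 60% memory, with the targets from the manifest above
cpu=$(hpa_desired_replicas 3 90 70 3 50)
mem=$(hpa_desired_replicas 3 60 80 3 50)
echo "desired replicas: $(max_of "$cpu" "$mem")"
```

With these inputs the CPU metric asks for four replicas and the memory metric for three, so the estimate is four.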

11.3 Update & Maintenance

11.3.1 Rolling Updates

# GKE cluster update
gcloud container clusters upgrade energent-production \
  --master \
  --cluster-version 1.30 \
  --region us-central1

# Application rolling update via Flux
git commit -am "Update energent-platform to v2.1.0"
git push origin main
# Flux automatically detects and applies changes

12. Support & Escalation

12.1 Support Tiers

Tier	Response Time	Channels	Scope
L1 - Basic	< 4 hours	Email, Portal	General questions, documentation
L2 - Standard	< 2 hours	Phone, Email, Meet	Technical issues, integration support
L3 - Premium	< 1 hour	Phone, Meet, Video	Complex technical issues, architecture
L4 - Critical	< 30 minutes	Phone, SMS, Escalation	Production outages, security incidents

12.2 24/7 Support Coverage

Enterprise Support:

📧 Email: support@energent.ai

Emergency Escalation:

📧 Emergency Email: emergency@energent.ai

12.3 Service Level Agreements

Service	SLA	Penalty
Platform Availability	99.9% uptime	10% monthly credit per 0.1% shortfall
Response Time (P95)	< 2 seconds	5% monthly credit if > 5 seconds
Support Response	Per tier above	Escalation to next tier
Data Recovery	RTO < 4 hours	25% monthly credit if exceeded
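
As a worked example of the availability penalty, the sketch below computes the credit for a measured monthly availability. Two points are assumptions that the table does not state: only whole 0.1% steps of shortfall count, and the credit is capped at 100%. Availability is passed in hundredths of a percent (9970 for 99.70%), and the function name is hypothetical.

```shell
# Illustrative SLA credit calculation (hypothetical helper; rounding and cap are assumptions).
# Usage: availability_credit_percent <measured availability in hundredths of a percent>
availability_credit_percent() (
  actual_bp=$1
  target_bp=9990                                  # 99.9% platform availability SLA
  if [ "$actual_bp" -ge "$target_bp" ]; then
    echo 0
  else
    steps=$(( ($target_bp - $actual_bp) / 10 ))   # whole 0.1% steps of shortfall
    credit=$(( $steps * 10 ))                     # 10% credit per step
    if [ "$credit" -gt 100 ]; then credit=100; fi
    echo "$credit"
  fi
)

echo "99.70% availability: $(availability_credit_percent 9970)% credit"
```

The contract text remains the authority on rounding and caps; the sketch only makes the arithmetic in the table concrete.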

Appendices

Appendix A: GCP Service Costs

Service	Estimated Monthly Cost	Scaling Factor
GKE Cluster	$75	Fixed per cluster
Compute Engine (3x n2-standard-4)	$850	Linear per node
Persistent Disks (300GB)	$60	Linear per GB
Cloud Storage (1TB)	$20	Linear per GB
Firestore	$120	Usage-based
Cloud Functions	$35	Request-based
Total Base Cost	~$1,160/month	For 100 tenants
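
The total can be reproduced from the line items, and dividing by the stated 100 tenants gives a per-tenant base cost. The following is a minimal sketch with hypothetical function names. It uses the estimates from the table as given and works in whole dollars and cents to stay within integer shell arithmetic.

```shell
# Illustrative cost arithmetic (hypothetical helpers; figures are the table's estimates).

# Usage: sum_costs <monthly cost in dollars>...
sum_costs() (
  total=0
  for cost in "$@"; do
    total=$(( $total + $cost ))
  done
  echo "$total"
)

# Usage: cost_per_tenant_cents <total dollars> <tenant count>
cost_per_tenant_cents() (
  echo $(( $1 * 100 / $2 ))
)

# Usage: format_cents <cents>   prints dollars with two decimals
format_cents() (
  printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
)

total=$(sum_costs 75 850 60 20 120 35)
echo "total: \$$total per month"
echo "per tenant: \$$(format_cents "$(cost_per_tenant_cents "$total" 100)") per month"
```

Since most line items scale with nodes, storage, or requests, the per-tenant figure should be recomputed for other tenant counts rather than assumed constant.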

Appendix B: Security Compliance Checklist

Appendix C: Troubleshooting Guide

Common Issues:

GKE Nodes Not Joining Cluster
- Verify service account permissions
- Check subnet routing and Cloud NAT
Application Pods CrashLooping
- Check resource limits and requests
- Verify persistent volume claims
Network Connectivity Issues
- Verify VPC connector configuration
- Check firewall rules

Next Review: 2025-08-28
Contact: support@energent.ai