Changelog
Initial version
Skill Content
# GCP Cloud Architect
Design scalable, cost-effective Google Cloud architectures for startups and enterprises with infrastructure-as-code templates.
---
## Workflow
### Step 1: Gather Requirements
Collect application specifications:
```
- Application type (web app, mobile backend, data pipeline, SaaS)
- Expected users and requests per second
- Budget constraints (monthly spend limit)
- Team size and GCP experience level
- Compliance requirements (GDPR, HIPAA, SOC 2)
- Availability requirements (SLA, RPO/RTO)
```
### Step 2: Design Architecture
Run the architecture designer to get pattern recommendations:
```bash
python scripts/architecture_designer.py --input requirements.json
```
**Example output:**
```json
{
"recommended_pattern": "serverless_web",
"service_stack": ["Cloud Storage", "Cloud CDN", "Cloud Run", "Firestore", "Identity Platform"],
"estimated_monthly_cost_usd": 30,
"pros": ["Low ops overhead", "Pay-per-use", "Auto-scaling", "No cold starts on Cloud Run min instances"],
"cons": ["Vendor lock-in", "Regional limitations", "Eventual consistency with Firestore"]
}
```
Select from recommended patterns:
- **Serverless Web**: Cloud Storage + Cloud CDN + Cloud Run + Firestore
- **Microservices on GKE**: GKE Autopilot + Cloud SQL + Memorystore + Cloud Pub/Sub
- **Serverless Data Pipeline**: Pub/Sub + Dataflow + BigQuery + Looker
- **ML Platform**: Vertex AI + Cloud Storage + BigQuery + Cloud Functions
See `references/architecture_patterns.md` for detailed pattern specifications.
**Validation checkpoint:** Confirm the recommended pattern matches the team's operational maturity and compliance requirements before proceeding to Step 3.
### Step 3: Estimate Cost
Analyze estimated costs and optimization opportunities:
```bash
python scripts/cost_optimizer.py --resources current_setup.json --monthly-spend 2000
```
**Example output:**
```json
{
"current_monthly_usd": 2000,
"recommendations": [
{ "action": "Right-size Cloud SQL db-custom-4-16384 to db-custom-2-8192", "savings_usd": 380, "priority": "high" },
{ "action": "Purchase 1-yr committed use discount for GKE nodes", "savings_usd": 290, "priority": "high" },
{ "action": "Move Cloud Storage objects >90 days to Nearline", "savings_usd": 75, "priority": "medium" }
],
"total_potential_savings_usd": 745
}
```
Output includes:
- Monthly cost breakdown by service
- Right-sizing recommendations
- Committed use discount opportunities
- Sustained use discount analysis
- Potential monthly savings
Use the [GCP Pricing Calculator](https://cloud.google.com/products/calculator) for detailed estimates.
### Step 4: Generate IaC
Create infrastructure-as-code for the selected pattern:
```bash
python scripts/deployment_manager.py --app-name my-app --pattern serverless_web --region us-central1
```
**Example Terraform HCL output (Cloud Run + Firestore):**
```hcl
terraform {
required_providers {
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
}
}
provider "google" {
project = var.project_id
region = var.region
}
variable "project_id" {
description = "GCP project ID"
type = string
}
variable "region" {
description = "GCP region"
type = string
default = "us-central1"
}
resource "google_cloud_run_v2_service" "api" {
name = "${var.environment}-${var.app_name}-api"
location = var.region
template {
containers {
image = "gcr.io/${var.project_id}/${var.app_name}:latest"
resources {
limits = {
cpu = "1000m"
memory = "512Mi"
}
}
env {
name = "FIRESTORE_PROJECT"
value = var.project_id
}
}
scaling {
min_instance_count = 0
max_instance_count = 10
}
}
}
resource "google_firestore_database" "default" {
project = var.project_id
name = "(default)"
location_id = var.region
type = "FIRESTORE_NATIVE"
}
```
**Example gcloud CLI deployment:**
```bash
# Deploy Cloud Run service
gcloud run deploy my-app-api \
--image gcr.io/$PROJECT_ID/my-app:latest \
--region us-central1 \
--platform managed \
--allow-unauthenticated \
--memory 512Mi \
--cpu 1 \
--min-instances 0 \
--max-instances 10
# Create Firestore database
gcloud firestore databases create --location=us-central1
```
> Full templates including Cloud CDN, Identity Platform, IAM, and Cloud Monitoring are generated by `deployment_manager.py` and also available in `references/architecture_patterns.md`.
### Step 5: Configure CI/CD
Set up automated deployment with Cloud Build or GitHub Actions:
```yaml
# cloudbuild.yaml
steps:
- name: 'gcr.io/cloud-builders/docker'
args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA', '.']
- name: 'gcr.io/cloud-builders/docker'
args: ['push', 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA']
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
entrypoint: gcloud
args:
- 'run'
- 'deploy'
- 'my-app-api'
- '--image=gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA'
- '--region=us-central1'
- '--platform=managed'
images:
- 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA'
```
```bash
# Connect repo and create trigger
gcloud builds triggers create github \
--repo-name=my-app \
--repo-owner=my-org \
--branch-pattern="^main$" \
--build-config=cloudbuild.yaml
```
### Step 6: Security Review
Verify security configuration:
```bash
# Review IAM bindings
gcloud projects get-iam-policy $PROJECT_ID --format=json
# Check service account permissions
gcloud iam service-accounts list --project=$PROJECT_ID
# Verify VPC Service Controls (if applicable)
gcloud access-context-manager perimeters list --policy=$POLICY_ID
```
**Security checklist:**
- IAM roles follow least privilege (prefer predefined roles over basic roles)
- Service accounts use Workload Identity for GKE
- VPC Service Controls configured for sensitive APIs
- Cloud KMS encryption keys for customer-managed encryption
- Cloud Audit Logs enabled for all admin activity
- Organization policies restrict public access
- Secret Manager used for all credentials
**If deployment fails:**
1. Check the failure reason:
```bash
gcloud run services describe my-app-api --region us-central1
gcloud logging read "resource.type=cloud_run_revision" --limit=20
```
2. Review Cloud Logging for application errors.
3. Fix the configuration or container image.
4. Redeploy:
```bash
gcloud run deploy my-app-api --image gcr.io/$PROJECT_ID/my-app:latest --region us-central1
```
**Common failure causes:**
- IAM permission errors -- verify service account roles and `--allow-unauthenticated` flag
- Quota exceeded -- request quota increase via IAM & Admin > Quotas
- Container startup failure -- check container logs and health check configuration
- Region not enabled -- enable the required APIs with `gcloud services enable`
---
## Tools
### architecture_designer.py
Recommends GCP services based on workload requirements.
```bash
python scripts/architecture_designer.py --input requirements.json --output design.json
```
**Input:** JSON with app type, scale, budget, compliance needs
**Output:** Recommended pattern, service stack, cost estimate, pros/cons
### cost_optimizer.py
Analyzes GCP resources for cost savings.
```bash
python scripts/cost_optimizer.py --resources inventory.json --monthly-spend 5000
```
**Output:** Recommendations for:
- Idle resource removal
- Machine type right-sizing
- Committed use discounts
- Storage class transitions
- Network egress optimization
### deployment_manager.py
Generates gcloud CLI deployment scripts and Terraform configurations.
```bash
python scripts/deployment_manager.py --app-name my-app --pattern serverless_web --region us-central1
```
**Output:** Production-ready deployment scripts with:
- Cloud Run or GKE deployment
- Firestore or Cloud SQL setup
- Identity Platform configuration
- IAM roles with least privilege
- Cloud Monitoring and Logging
---
## Quick Start
### Web App on Cloud Run (< $100/month)
```
Ask: "Design a serverless web backend for a mobile app with 1000 users"
Result:
- Cloud Run for API (auto-scaling, no cold start with min instances)
- Firestore for data (pay-per-operation)
- Identity Platform for authentication
- Cloud Storage + Cloud CDN for static assets
- Estimated: $15-40/month
```
### Microservices on GKE ($500-2000/month)
```
Ask: "Design a scalable architecture for a SaaS platform with 50k users"
Result:
- GKE Autopilot for containerized workloads
- Cloud SQL (PostgreSQL) with read replicas
- Memorystore (Redis) for session caching
- Cloud CDN for global delivery
- Cloud Build for CI/CD
- Multi-zone deployment
```
### Serverless Data Pipeline
```
Ask: "Design a real-time analytics pipeline for event data"
Result:
- Pub/Sub for event ingestion
- Dataflow (Apache Beam) for stream processing
- BigQuery for analytics and warehousing
- Looker for dashboards
- Cloud Functions for lightweight transforms
```
### ML Platform
```
Ask: "Design a machine learning platform for model training and serving"
Result:
- Vertex AI for training and prediction
- Cloud Storage for datasets and model artifacts
- BigQuery for feature store
- Cloud Functions for preprocessing triggers
- Cloud Monitoring for model drift detection
```
---
## Input Requirements
Provide these details for architecture design:
| Requirement | Description | Example |
|-------------|-------------|---------|
| Application type | What you're building | SaaS platform, mobile backend |
| Expected scale | Users, requests/sec | 10k users, 100 RPS |
| Budget | Monthly GCP limit | $500/month max |
| Team context | Size, GCP experience | 3 devs, intermediate |
| Compliance | Regulatory needs | HIPAA, GDPR, SOC 2 |
| Availability | Uptime requirements | 99.9% SLA, 1hr RPO |
**JSON Format:**
```json
{
"application_type": "saas_platform",
"expected_users": 10000,
"requests_per_second": 100,
"budget_monthly_usd": 500,
"team_size": 3,
"gcp_experience": "intermediate",
"compliance": ["SOC2"],
"availability_sla": "99.9%"
}
```
---
## Output Formats
### Architecture Design
- Pattern recommendation with rationale
- Service stack diagram (ASCII)
- Monthly cost estimate and trade-offs
### IaC Templates
- **Terraform HCL**: Production-ready Google provider configs
- **gcloud CLI**: Scripted deployment commands
- **Cloud Build YAML**: CI/CD pipeline definitions
### Cost Analysis
- Current spend breakdown with optimization recommendations
- Priority action list (high/medium/low) and implementation checklist
---
## Anti-Patterns
| Anti-Pattern | Why It Fails | Better Approach |
|---|---|---|
| Using default VPC for production | No isolation, shared firewall rules | Create custom VPC with private subnets |
| Over-provisioning GKE node pools | Wasted cost on idle capacity | Use GKE Autopilot or cluster autoscaler |
| Storing secrets in environment variables | Visible in Cloud Console, logs | Use Secret Manager with Workload Identity |
| Ignoring sustained use discounts | Missing 20-30% automatic savings | Right-size VMs for consistent baseline usage |
| Single-region deployment for SaaS | One region outage = full downtime | Multi-region with Cloud Load Balancing |
| BigQuery on-demand for heavy workloads | Unpredictable costs at scale | Use BigQuery slots (flat-rate) for consistent workloads |
| Running Cloud Functions for long tasks | 9-minute timeout, cold starts | Use Cloud Run for tasks > 60 seconds |
---
## Cross-References
| Skill | Relationship |
|-------|-------------|
| `engineering-team/aws-solution-architect` | AWS equivalent — same 6-step workflow, different services |
| `engineering-team/azure-cloud-architect` | Azure equivalent — completes the cloud trifecta |
| `engineering-team/senior-devops` | Broader DevOps scope — pipelines, monitoring, containerization |
| `engineering/terraform-patterns` | IaC implementation — use for Terraform modules targeting GCP |
| `engineering/ci-cd-pipeline-builder` | Pipeline construction — automates Cloud Build and deployment |
---
## Reference Documentation
| Document | Contents |
|----------|----------|
| `references/architecture_patterns.md` | 6 patterns: serverless, GKE microservices, three-tier, data pipeline, ML platform, multi-region |
| `references/service_selection.md` | Decision matrices for compute, database, storage, messaging |
| `references/best_practices.md` | Naming, labels, IAM, networking, monitoring, disaster recovery |