{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "gcp-title",
   "metadata": {},
   "source": "# Google Cloud Platform (GCP) Tutorial\n\nThis tutorial demonstrates how to use Clustrix with Google Cloud Platform (GCP) infrastructure for scalable distributed computing.\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextLab/clustrix/blob/master/docs/source/notebooks/gcp_cloud_tutorial.ipynb)\n\n## Overview\n\nGCP provides several services that integrate well with Clustrix:\n\n- **Compute Engine**: Virtual machines for compute clusters\n- **Google Kubernetes Engine (GKE)**: Managed Kubernetes clusters\n- **Batch**: Managed job scheduling service\n- **Cloud Run**: Serverless container platform\n- **Vertex AI**: Machine learning platform\n- **Cloud Storage**: Object storage for data and results\n- **VPC**: Network isolation and security\n- **Spot/preemptible VMs**: Discounted instances that Compute Engine can reclaim at short notice\n\n## Complete Setup Guide from Scratch\n\n### Step 1: Google Cloud Account Setup\n\n1. **Create Google Cloud Account**:\n   - Go to [Google Cloud Console](https://console.cloud.google.com/)\n   - Sign up with your Google account or create a new one\n   - Accept the terms of service\n\n2. **Enable Billing**:\n   - Navigate to Billing in the Google Cloud Console\n   - Create a billing account and add a payment method\n   - **Important**: New customers may be eligible for a free trial that includes $300 in credits\n   - Set up billing alerts to avoid unexpected charges\n\n3. **Create a New Project**:\n   - Go to the Project Selector in the console\n   - Click \"New Project\"\n   - Choose a unique project ID (e.g., `my-clustrix-project-123`)\n   - Enable billing for this project\n\n### Step 2: Install Google Cloud SDK (gcloud CLI)\n\n**On macOS:**\n```bash\n# Using Homebrew (recommended)\nbrew install google-cloud-sdk\n\n# Or download installer\ncurl https://sdk.cloud.google.com | bash\nexec -l $SHELL\n```\n\n**On Linux:**\n```bash\n# Download and install\ncurl https://sdk.cloud.google.com | bash\nexec -l $SHELL\n\n# Or use the package manager (Ubuntu/Debian) after adding Google's apt repository\n# as described at https://cloud.google.com/sdk/docs/install\nsudo apt-get install google-cloud-cli\n```\n\n**On Windows:**\n- Download the installer from [Google Cloud SDK page](https://cloud.google.com/sdk/docs/install)\n- Run the installer and follow instructions\n\nAfter installing, run `gcloud init` to sign in and choose a default project.\n\n### Step 3: Enable Required APIs\n\nEnable the necessary Google Cloud APIs for this tutorial:\n\n```bash\n# Set your project ID\nexport PROJECT_ID=\"your-project-id-here\"\ngcloud config set project $PROJECT_ID\n\n# Enable required APIs\ngcloud services enable compute.googleapis.com\ngcloud services enable container.googleapis.com\ngcloud services enable batch.googleapis.com\ngcloud services enable aiplatform.googleapis.com\ngcloud services enable storage.googleapis.com\n```\n\n## Prerequisites Checklist\n\nBefore proceeding, ensure you have:\n\n- [ ] Google Cloud account with billing enabled\n- [ ] Google Cloud project created\n- [ ] Google Cloud SDK (gcloud) installed locally\n- [ ] Required APIs enabled (compute, container, batch, storage, aiplatform)\n- [ ] SSH key pair for VM access (we'll create this below)\n- [ ] Basic understanding of command line usage"
  },
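  {
   "cell_type": "markdown",
   "id": "check-apis-md",
   "metadata": {},
   "source": "To confirm that Step 3 worked, you can compare the required services against the output of `gcloud services list --enabled --format='value(config.name)'`. The cell below is a hypothetical helper (not part of Clustrix or the gcloud CLI) that only parses text. Paste that command's output into `sample_output`, and the cell lists the APIs that still need enabling, along with one `gcloud services enable` command that covers them."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "check-apis",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: find required APIs missing from the output of\n",
    "#   gcloud services list --enabled --format='value(config.name)'\n",
    "REQUIRED_APIS = [\n",
    "    'compute.googleapis.com',\n",
    "    'container.googleapis.com',\n",
    "    'batch.googleapis.com',\n",
    "    'aiplatform.googleapis.com',\n",
    "    'storage.googleapis.com',\n",
    "]\n",
    "\n",
    "\n",
    "def missing_apis(enabled_output, required=REQUIRED_APIS):\n",
    "    \"\"\"Return the required service names absent from the enabled-services listing.\"\"\"\n",
    "    enabled = set(enabled_output.split())\n",
    "    return [api for api in required if api not in enabled]\n",
    "\n",
    "\n",
    "def enable_command(apis):\n",
    "    \"\"\"Build one gcloud command that enables every API in the list.\"\"\"\n",
    "    return 'gcloud services enable ' + ' '.join(apis) if apis else ''\n",
    "\n",
    "\n",
    "# Sample output; replace it with what the gcloud command prints for your project\n",
    "sample_output = '''compute.googleapis.com\n",
    "storage.googleapis.com'''\n",
    "\n",
    "todo = missing_apis(sample_output)\n",
    "print(todo)\n",
    "print(enable_command(todo))"
   ]
  },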
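  {
   "cell_type": "markdown",
   "id": "4wiyb0urchu",
   "source": "### Step 4: SSH Key Setup\n\nCreate an SSH key pair for access to your GCP instances:\n\n```bash\n# Generate an SSH key pair (skip if you already have one)\nssh-keygen -t rsa -b 4096 -C \"your-email@example.com\" -f ~/.ssh/gcp_key\n\n# Option A: register the key with OS Login (requires enable-oslogin=TRUE metadata;\n# your SSH username is then derived from your Google account, not clustrix)\ngcloud compute os-login ssh-keys add --key-file=\"$HOME/.ssh/gcp_key.pub\"\n\n# Option B: add the key to project metadata for a user named clustrix.\n# Entries use the form USERNAME:KEY, and this command replaces any existing\n# project-wide ssh-keys value, so include existing keys in the file if you have any.\necho \"clustrix:$(cat ~/.ssh/gcp_key.pub)\" > gcp_ssh_keys.txt\ngcloud compute project-info add-metadata --metadata-from-file=ssh-keys=gcp_ssh_keys.txt\n```\n\nThe rest of this tutorial connects as `clustrix`, so Option B is the closer match. If OS Login is enabled on an instance, metadata-based keys are ignored for that instance.\n\n**Note**: If you use Google Cloud Shell or `gcloud compute ssh`, SSH keys are managed automatically.",
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "id": "installation",
   "metadata": {},
   "source": [
    "## Installation and Setup\n",
    "\n",
    "Install Clustrix with GCP dependencies:"
   ]
  },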
  {
   "cell_type": "code",
   "id": "install",
   "metadata": {},
   "outputs": [],
   "source": "# Install Clustrix with GCP support\n!pip install clustrix google-cloud-compute google-cloud-storage google-auth google-auth-oauthlib\n\n# Import required libraries\nimport clustrix\nfrom clustrix import cluster, configure\nfrom google.cloud import compute_v1\nfrom google.cloud import storage\nfrom google.auth import default\nimport os\nimport numpy as np\nimport time\nimport json",
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "gcp-authentication",
   "metadata": {},
   "source": "## GCP Authentication Setup\n\nConfigure your GCP credentials. Choose the method that best fits your environment:\n\n### Option 1: gcloud CLI Authentication (Recommended for Local Development)\n\nThis method uses your personal Google account credentials:"
  },
  {
   "cell_type": "code",
   "id": "gcloud-auth",
   "metadata": {},
   "outputs": [],
   "source": "# Initial authentication and project setup\n!gcloud auth login\n!gcloud auth application-default login\n\n# Set your project ID (replace with your actual project ID)\nPROJECT_ID = \"your-project-id-here\"  # Replace this!\n!gcloud config set project {PROJECT_ID}\n\n# Verify authentication and project setup\n!gcloud auth list\n!gcloud config get-value project\n!gcloud projects describe {PROJECT_ID}",
   "execution_count": null
  },
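  {
   "cell_type": "markdown",
   "id": "check-project-id-md",
   "metadata": {},
   "source": "Most commands below interpolate `PROJECT_ID`, so it is worth catching a placeholder or malformed value before running them. Google Cloud project IDs are 6 to 30 characters long, use only lowercase letters, digits, and hyphens, start with a letter, and cannot end with a hyphen. `check_project_id` is a hypothetical helper that encodes just those rules. It does not contact Google Cloud, so it cannot tell whether the project actually exists."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "check-project-id",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Hypothetical helper: sanity-check PROJECT_ID before it is interpolated into commands.\n",
    "# Format: 6-30 chars of lowercase letters, digits, and hyphens,\n",
    "# starting with a letter and not ending with a hyphen.\n",
    "PROJECT_ID_PATTERN = re.compile(r'^[a-z][a-z0-9-]{4,28}[a-z0-9]$')\n",
    "PLACEHOLDER = 'your-project-id-here'\n",
    "\n",
    "\n",
    "def check_project_id(project_id):\n",
    "    \"\"\"Return a list of problems with project_id; an empty list means it looks valid.\"\"\"\n",
    "    if project_id == PLACEHOLDER:\n",
    "        return ['PROJECT_ID is still the placeholder value']\n",
    "    if not PROJECT_ID_PATTERN.match(project_id):\n",
    "        return ['PROJECT_ID does not match the project ID format']\n",
    "    return []\n",
    "\n",
    "\n",
    "issues = check_project_id(globals().get('PROJECT_ID', PLACEHOLDER))\n",
    "print('PROJECT_ID looks valid' if not issues else issues)"
   ]
  },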
  {
   "cell_type": "markdown",
   "id": "gcp-service-account",
   "metadata": {},
   "source": "### Option 2: Service Account Authentication (Recommended for Production)\n\nFor production environments, create a service account with specific permissions and point Application Default Credentials at its key (replace `YOUR_PROJECT_ID` throughout):\n\n```bash\n# Create service account\ngcloud iam service-accounts create clustrix-service-account \\\n  --description=\"Service account for Clustrix operations\" \\\n  --display-name=\"Clustrix Service Account\"\n\n# Grant necessary permissions\ngcloud projects add-iam-policy-binding YOUR_PROJECT_ID \\\n  --member=\"serviceAccount:clustrix-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com\" \\\n  --role=\"roles/compute.admin\"\n\ngcloud projects add-iam-policy-binding YOUR_PROJECT_ID \\\n  --member=\"serviceAccount:clustrix-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com\" \\\n  --role=\"roles/storage.admin\"\n\n# Create and download a service account key (keep this file private; never commit it)\ngcloud iam service-accounts keys create ~/clustrix-service-account-key.json \\\n  --iam-account=clustrix-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com\n\n# Point Application Default Credentials at the key created above\nexport GOOGLE_APPLICATION_CREDENTIALS=\"$HOME/clustrix-service-account-key.json\"\n```\n\nAn `export` in a terminal does not affect an already running notebook kernel. In the notebook, set `os.environ['GOOGLE_APPLICATION_CREDENTIALS']` instead.\n\n### Verify Your Credentials\n\nWhichever option you used, the next cell checks that credentials are found. It then makes read-only calls to the Compute Engine and Cloud Storage APIs:"
  },
  {
   "cell_type": "code",
   "id": "service-account",
   "metadata": {},
   "outputs": [],
   "source": "# Test the GCP connection with read-only API calls\ntry:\n    credentials, project_id = default()\n    project_id = project_id or PROJECT_ID  # fall back to the ID set in Option 1\n    print(f\"\u2713 Found credentials for project: {project_id}\")\n    \n    # Test Compute Engine API by listing zones\n    compute_client = compute_v1.ZonesClient()\n    first_zone = next(iter(compute_client.list(project=project_id)), None)\n    print(f\"\u2713 Compute Engine API access confirmed (first zone: {first_zone.name if first_zone else 'none'})\")\n    \n    # Test Cloud Storage API by listing at most one bucket\n    storage_client = storage.Client(project=project_id)\n    list(storage_client.list_buckets(max_results=1))\n    print(\"\u2713 Cloud Storage API access confirmed\")\n    \nexcept Exception as e:\n    print(f\"\u274c GCP authentication failed: {e}\")\n    print(\"Please check your authentication setup and try again.\")",
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "eegtd4c0im",
   "source": "**Important**: Make sure you have completed authentication setup and enabled all required APIs before proceeding. \n\nIf authentication fails, double-check that:\n- Your project ID is correct\n- Billing is enabled for your project  \n- Required APIs are enabled\n- Your credentials are properly configured",
   "metadata": {}
  },
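  {
   "cell_type": "markdown",
   "id": "describe-key-md",
   "metadata": {},
   "source": "A frequent cause of authentication failures with the service-account option is `GOOGLE_APPLICATION_CREDENTIALS` pointing at a missing or wrong file. A downloaded service account key is a JSON document whose `type` field is `service_account`, and it also records `client_email` and `project_id`. The hypothetical `describe_key_file` helper below uses only the standard library and reports those fields without printing the private key."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "describe-key",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "\n",
    "\n",
    "def describe_key_file(path=None):\n",
    "    \"\"\"Summarize a service account key file without exposing its private key.\"\"\"\n",
    "    path = path or os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')\n",
    "    if not path:\n",
    "        return {'status': 'GOOGLE_APPLICATION_CREDENTIALS is not set'}\n",
    "    path = os.path.expanduser(path)\n",
    "    if not os.path.isfile(path):\n",
    "        return {'status': f'file not found: {path}'}\n",
    "    with open(path) as f:\n",
    "        info = json.load(f)\n",
    "    is_sa_key = info.get('type') == 'service_account'\n",
    "    # Only non-secret identifying fields are returned\n",
    "    return {\n",
    "        'status': 'ok' if is_sa_key else 'not a service account key',\n",
    "        'client_email': info.get('client_email'),\n",
    "        'project_id': info.get('project_id'),\n",
    "    }\n",
    "\n",
    "\n",
    "print(describe_key_file())"
   ]
  },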
  {
   "cell_type": "markdown",
   "id": "compute-engine-setup",
   "metadata": {},
   "source": [
    "## Method 1: Google Compute Engine Configuration\n",
    "\n",
    "### Create Compute Engine Instance for Clustrix"
   ]
  },
  {
   "cell_type": "code",
   "id": "compute-engine-creation",
   "metadata": {},
   "outputs": [],
   "source": "def create_clustrix_compute_instance(project_id, zone='us-central1-a', machine_type='e2-standard-4'):\n    \"\"\"\n    Generate the startup script and gcloud commands for a Clustrix-ready\n    GCP Compute Engine instance. Nothing is created until you run the commands.\n    \n    Args:\n        project_id: GCP project ID\n        zone: GCP zone for the instance\n        machine_type: Machine type (CPU/memory configuration)\n    \n    Returns:\n        Dictionary with the instance settings, gcloud commands, and startup script\n    \"\"\"\n    \n    # Startup script, run as root on first boot (the shebang must be the first line)\n    startup_script = '''#!/bin/bash\n\n# Update system\napt-get update\napt-get install -y python3 python3-pip git htop curl\n\n# Install clustrix and common packages (google-cloud-storage is used by the Cloud Storage example)\npip3 install clustrix numpy scipy pandas scikit-learn matplotlib google-cloud-storage\n\n# Install uv system-wide so every user, including clustrix, can use it\ncurl -LsSf https://astral.sh/uv/install.sh | env UV_INSTALL_DIR=/usr/local/bin sh\n\n# Create clustrix user (it may already exist if its SSH key is in project metadata)\nid -u clustrix >/dev/null 2>&1 || useradd -m -s /bin/bash clustrix\nusermod -aG sudo clustrix\necho \"clustrix ALL=(ALL) NOPASSWD:ALL\" > /etc/sudoers.d/clustrix\nchmod 440 /etc/sudoers.d/clustrix\n\n# SSH keys for clustrix come from a clustrix:KEY entry in project metadata (Step 4);\n# the guest environment writes them to /home/clustrix/.ssh/authorized_keys\n\n# Create working directory\nmkdir -p /tmp/clustrix\nchown clustrix:clustrix /tmp/clustrix\n\n# Install the gcloud CLI non-interactively if the image does not already include it\ncommand -v gcloud >/dev/null 2>&1 || snap install google-cloud-cli --classic\n\n# Log completion\necho \"Clustrix setup completed at $(date)\" >> /var/log/clustrix-setup.log\n'''\n    \n    # gcloud commands for instance creation\n    gcloud_commands = f\"\"\"\n# Create firewall rule for SSH (if not exists); consider narrowing --source-ranges to your own IP\ngcloud compute firewall-rules create allow-ssh \\\n  --allow tcp:22 \\\n  --source-ranges 0.0.0.0/0 \\\n  --description \"Allow SSH access\" \\\n  --project {project_id} || echo \"SSH rule already exists\"\n\n# Create the instance\ngcloud compute instances create clustrix-instance \\\n  --project={project_id} \\\n  --zone={zone} \\\n  --machine-type={machine_type} \\\n  --network-interface=network-tier=PREMIUM,subnet=default \\\n  --maintenance-policy=MIGRATE \\\n  --provisioning-model=STANDARD \\\n  --service-account=default \\\n  --scopes=https://www.googleapis.com/auth/cloud-platform \\\n  --tags=clustrix,http-server,https-server \\\n  --create-disk=auto-delete=yes,boot=yes,device-name=clustrix-instance,image=projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts,mode=rw,size=50,type=projects/{project_id}/zones/{zone}/diskTypes/pd-balanced \\\n  --no-shielded-secure-boot \\\n  --shielded-vtpm \\\n  --shielded-integrity-monitoring \\\n  --labels=purpose=clustrix,environment=tutorial \\\n  --reservation-affinity=any \\\n  --metadata-from-file startup-script=startup-script.sh\n\n# Get the external IP\ngcloud compute instances describe clustrix-instance \\\n  --project={project_id} \\\n  --zone={zone} \\\n  --format='get(networkInterfaces[0].accessConfigs[0].natIP)'\n\n# SSH to the instance (after startup script completes)\ngcloud compute ssh clustrix-instance \\\n  --project={project_id} \\\n  --zone={zone}\n\"\"\"\n    \n    return {\n        'project_id': project_id,\n        'zone': zone,\n        'machine_type': machine_type,\n        'instance_name': 'clustrix-instance',\n        'gcloud_commands': gcloud_commands,\n        'startup_script': startup_script\n    }\n\n# Example usage - replace with your actual project ID\ninstance_config = create_clustrix_compute_instance(\n    project_id=PROJECT_ID,  # Using the PROJECT_ID variable from above\n    zone='us-central1-a',\n    machine_type='e2-standard-4'  # 4 vCPUs, 16 GB RAM\n)\n\n# Save the startup script where the gcloud command expects it\nwith open('startup-script.sh', 'w') as f:\n    f.write(instance_config['startup_script'])\n\n# Display the configuration results\nprint(\"=== GCP Compute Engine Instance Configuration ===\")\nprint(f\"Project ID: {instance_config['project_id']}\")\nprint(f\"Zone: {instance_config['zone']}\")\nprint(f\"Machine Type: {instance_config['machine_type']}\")\nprint(f\"Instance Name: {instance_config['instance_name']}\")\nprint(\"\\n=== gcloud Commands ===\")\nprint(instance_config['gcloud_commands'])\nprint(\"\\n=== Next Steps ===\")\nprint(\"1. The startup script has been saved to 'startup-script.sh'\")\nprint(\"2. Execute the gcloud commands shown above\")\nprint(\"3. Wait 3-5 minutes for instance initialization\")\nprint(\"4. Get the external IP and configure Clustrix\")",
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "rs47wjva5yi",
   "source": "### GCP Compute Engine Instance Creation\n\nThe code above defines a function that generates everything needed to create a GCP Compute Engine instance for Clustrix workloads. It does not create the instance itself. The function returns:\n\n- **gcloud commands**: CLI commands that create an SSH firewall rule and the instance, look up its external IP, and open an SSH session\n- **startup script**: A setup script that configures the instance on first boot\n\nThe configuration includes:\n- Ubuntu 22.04 LTS base image\n- Pre-installed Python packages and Clustrix\n- A `clustrix` user account with passwordless sudo\n- A working directory at `/tmp/clustrix`\n- A 50 GB balanced persistent disk\n- An SSH firewall rule, labels, and startup-script metadata",
   "metadata": {}
  },
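  {
   "cell_type": "markdown",
   "id": "machine-type-md",
   "metadata": {},
   "source": "The `machine_type` argument fixes the VM's vCPU count and memory, and those in turn limit sensible `default_cores` and `default_memory` values for Clustrix. For the predefined E2 types, `e2-standard-N` has N vCPUs with 4 GB per vCPU, `e2-highmem-N` has 8 GB per vCPU, and `e2-highcpu-N` has 1 GB per vCPU. The hypothetical helper below covers only those three E2 families. The 2 GB of headroom it reserves for the operating system is an assumption. For other machine types, check `gcloud compute machine-types describe`."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "machine-type",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: derive Clustrix-style defaults from an E2 machine type name.\n",
    "# GB of memory per vCPU for the predefined E2 families (other families are not covered).\n",
    "E2_MEMORY_PER_VCPU_GB = {'standard': 4, 'highmem': 8, 'highcpu': 1}\n",
    "\n",
    "\n",
    "def e2_resources(machine_type):\n",
    "    \"\"\"Return (vcpus, memory_gb) for names like 'e2-standard-4'.\"\"\"\n",
    "    parts = machine_type.split('-')\n",
    "    if len(parts) != 3 or parts[0] != 'e2' or parts[1] not in E2_MEMORY_PER_VCPU_GB:\n",
    "        raise ValueError(f'unsupported machine type: {machine_type}')\n",
    "    vcpus = int(parts[2])\n",
    "    return vcpus, vcpus * E2_MEMORY_PER_VCPU_GB[parts[1]]\n",
    "\n",
    "\n",
    "def suggested_defaults(machine_type, headroom_gb=2):\n",
    "    \"\"\"Leave headroom_gb (an assumed OS reserve) free; keys match the configure() arguments above.\"\"\"\n",
    "    vcpus, memory_gb = e2_resources(machine_type)\n",
    "    return {'default_cores': vcpus, 'default_memory': f'{max(memory_gb - headroom_gb, 1)}GB'}\n",
    "\n",
    "\n",
    "print(suggested_defaults('e2-standard-4'))  # {'default_cores': 4, 'default_memory': '14GB'}"
   ]
  },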
  {
   "cell_type": "markdown",
   "id": "nrfay0eolc",
   "source": "**Next Steps**: \n\n1. **Save the startup script** to a file named `startup-script.sh` in your current directory\n2. **Execute the gcloud commands** shown above to create your instance\n3. **Wait for the instance to fully initialize** (startup script takes 3-5 minutes)\n4. **Get the external IP** using the describe command shown above\n5. **Test SSH access** to ensure the instance is ready for Clustrix",
   "metadata": {}
  },
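  {
   "cell_type": "markdown",
   "id": "wait-for-ssh-md",
   "metadata": {},
   "source": "Steps 3 to 5 above boil down to waiting until the new VM accepts SSH connections. One way to script that wait is to poll TCP port 22 until a connection succeeds. `wait_for_port` is a hypothetical helper built on Python's `socket` module. An open port only shows that `sshd` is running, and the startup script may still be working. On the instance, `sudo journalctl -u google-startup-scripts.service` shows its progress."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "wait-for-ssh",
   "metadata": {},
   "outputs": [],
   "source": [
    "import socket\n",
    "import time\n",
    "\n",
    "\n",
    "def wait_for_port(host, port=22, timeout=300, interval=5):\n",
    "    \"\"\"Poll until a TCP connection to host:port succeeds or timeout seconds elapse.\"\"\"\n",
    "    deadline = time.monotonic() + timeout\n",
    "    while True:\n",
    "        try:\n",
    "            # A successful connect means something is listening on the port\n",
    "            with socket.create_connection((host, port), timeout=interval):\n",
    "                return True\n",
    "        except OSError:\n",
    "            if time.monotonic() >= deadline:\n",
    "                return False\n",
    "            time.sleep(min(interval, max(deadline - time.monotonic(), 0)))\n",
    "\n",
    "\n",
    "# Example (uncomment after replacing the IP):\n",
    "# print(wait_for_port('YOUR_INSTANCE_EXTERNAL_IP'))"
   ]
  },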
  {
   "cell_type": "markdown",
   "id": "clustrix-gcp-config",
   "metadata": {},
   "source": [
    "### Configure Clustrix for Compute Engine"
   ]
  },
  {
   "cell_type": "code",
   "id": "config-gcp-compute",
   "metadata": {},
   "outputs": [],
   "source": "# Get the external IP of your created instance\n# Replace with the actual external IP from your instance\nINSTANCE_EXTERNAL_IP = \"YOUR_INSTANCE_EXTERNAL_IP\"  # Replace this!\n\n# Configure Clustrix to use your Compute Engine instance\nconfigure(\n    cluster_type=\"ssh\",\n    cluster_host=INSTANCE_EXTERNAL_IP,\n    username=\"clustrix\",  # or your default user\n    key_file=\"~/.ssh/gcp_key\",  # path to your SSH private key\n    remote_work_dir=\"/tmp/clustrix\",\n    package_manager=\"auto\",  # Will use uv if available, pip otherwise\n    default_cores=4,\n    default_memory=\"8GB\",\n    default_time=\"01:00:00\"\n)\n\n# Verify configuration\nif INSTANCE_EXTERNAL_IP != \"YOUR_INSTANCE_EXTERNAL_IP\":\n    print(f\"\u2713 Clustrix configured for GCP Compute Engine\")\n    print(f\"  Host: {INSTANCE_EXTERNAL_IP}\")\n    print(f\"  SSH Key: ~/.ssh/gcp_key\")\n    print(f\"  Remote Work Dir: /tmp/clustrix\")\nelse:\n    print(\"\u26a0\ufe0f  Please replace INSTANCE_EXTERNAL_IP with your actual IP address\")",
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "6qtk506dlio",
   "source": "**Important Configuration Notes**:\n\n- Replace `YOUR_INSTANCE_EXTERNAL_IP` with the actual external IP address from your Compute Engine instance\n- Use the SSH key path that corresponds to your setup (either `~/.ssh/gcp_key` if you created one following this tutorial, or `~/.ssh/google_compute_engine` for gcloud-generated keys)\n- The `clustrix` user was created by the startup script with appropriate permissions\n- If you encounter connection issues, ensure your firewall rules allow SSH access from your IP address",
   "metadata": {}
  },
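  {
   "cell_type": "markdown",
   "id": "ssh-keys-entry-md",
   "metadata": {},
   "source": "Compute Engine's `ssh-keys` metadata holds one entry per line in the form `USERNAME:KEY`, and the guest environment creates a Linux account for each username it finds there. Prefixing your public key with `clustrix:` is therefore one way to obtain the `clustrix` login used above. This does not apply when OS Login is enabled. `ssh_keys_entry` is a hypothetical helper that formats and sanity-checks such an entry. It makes no API calls, and its list of accepted key types is deliberately short."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ssh-keys-entry",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: format a public key for the Compute Engine ssh-keys metadata.\n",
    "VALID_KEY_TYPES = ('ssh-rsa', 'ssh-ed25519', 'ecdsa-sha2-nistp256')\n",
    "\n",
    "\n",
    "def ssh_keys_entry(username, public_key_text):\n",
    "    \"\"\"Return 'USERNAME:KEY' for one key; raise ValueError on obviously bad input.\"\"\"\n",
    "    key = public_key_text.strip()\n",
    "    if not key.startswith(VALID_KEY_TYPES):\n",
    "        raise ValueError('expected an OpenSSH public key, e.g. the contents of gcp_key.pub')\n",
    "    if not username or ':' in username:\n",
    "        raise ValueError('username must be non-empty and must not contain a colon')\n",
    "    return f'{username}:{key}'\n",
    "\n",
    "\n",
    "# Example usage (reads the key created in the SSH setup step):\n",
    "# import os\n",
    "# with open(os.path.expanduser('~/.ssh/gcp_key.pub')) as f:\n",
    "#     print(ssh_keys_entry('clustrix', f.read()))"
   ]
  },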
  {
   "cell_type": "markdown",
   "id": "gcp-example",
   "metadata": {},
   "source": [
    "### Example: Remote Computation on Compute Engine"
   ]
  },
  {
   "cell_type": "code",
   "id": "gcp-compute-example",
   "metadata": {},
   "outputs": [],
   "source": "# Example: GCP Data Analysis\n@cluster(cores=2, memory=\"4GB\")\ndef gcp_data_analysis(dataset_size=10000, analysis_type='regression'):\n    \"\"\"Perform data analysis on GCP Compute Engine.\"\"\"\n    import numpy as np\n    from sklearn.model_selection import train_test_split\n    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier\n    from sklearn.metrics import mean_squared_error, accuracy_score\n    from sklearn.datasets import make_regression, make_classification\n    import time\n    \n    start_time = time.time()\n    \n    # Generate synthetic dataset\n    if analysis_type == 'regression':\n        X, y = make_regression(\n            n_samples=dataset_size,\n            n_features=20,\n            noise=0.1,\n            random_state=42\n        )\n        model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)\n        metric_name = 'rmse'\n    else:\n        X, y = make_classification(\n            n_samples=dataset_size,\n            n_features=20,\n            n_informative=10,  # 3 classes need more than the default 2 informative features\n            n_classes=3,\n            random_state=42\n        )\n        model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)\n        metric_name = 'accuracy'\n    \n    # Split data\n    X_train, X_test, y_train, y_test = train_test_split(\n        X, y, test_size=0.2, random_state=42\n    )\n    \n    # Train model\n    training_start = time.time()\n    model.fit(X_train, y_train)\n    training_time = time.time() - training_start\n    \n    # Evaluate\n    y_pred = model.predict(X_test)\n    \n    if analysis_type == 'regression':\n        metric_value = np.sqrt(mean_squared_error(y_test, y_pred))\n    else:\n        metric_value = accuracy_score(y_test, y_pred)\n    \n    total_time = time.time() - start_time\n    \n    return {\n        'analysis_type': analysis_type,\n        'dataset_size': dataset_size,\n        'training_time': training_time,\n        'total_time': total_time,\n        metric_name: metric_value,\n        'feature_importance': np.sort(model.feature_importances_)[::-1][:5].tolist(),  # 5 largest importances\n        'training_samples': len(X_train),\n        'test_samples': len(X_test)\n    }\n\n# Example: CPU-bound Monte Carlo computation\n@cluster(cores=4, memory=\"8GB\")\ndef gcp_parallel_computation(n_iterations=1000):\n    \"\"\"Monte Carlo estimate of pi (the loop runs serially on a single core).\"\"\"\n    import numpy as np\n    import time\n    \n    start_time = time.time()\n    \n    # Simulate CPU-intensive work\n    results = []\n    for i in range(n_iterations):\n        # Monte Carlo pi estimation\n        points = np.random.random((1000, 2))\n        inside_circle = np.sum((points**2).sum(axis=1) <= 1)\n        pi_estimate = 4 * inside_circle / 1000\n        results.append(pi_estimate)\n    \n    computation_time = time.time() - start_time\n    final_pi_estimate = float(np.mean(results))\n    \n    return {\n        'iterations': n_iterations,\n        'pi_estimate': final_pi_estimate,\n        'computation_time': computation_time,\n        'abs_error': abs(final_pi_estimate - np.pi)\n    }\n\nprint(\"\u2713 GCP computation examples defined\")\nprint(\"\\n\ud83d\udcdd Example usage:\")\nprint(\"# Data analysis:\")\nprint(\"# result = gcp_data_analysis(dataset_size=50000, analysis_type='classification')\")\nprint(\"# print(f'Accuracy: {result[\\\"accuracy\\\"]:.4f}')\")\nprint(\"#\")\nprint(\"# Monte Carlo computation:\")\nprint(\"# result = gcp_parallel_computation(n_iterations=5000)\")\nprint(\"# print(f'Pi estimate: {result[\\\"pi_estimate\\\"]:.6f}')\")\n\n# Example execution (commented out - uncomment after setup):\n# result = gcp_data_analysis(dataset_size=5000, analysis_type='classification')\n# print(f\"\u2713 Analysis completed: {result['accuracy']:.4f} accuracy\")\n# print(f\"\u23f1\ufe0f  Training time: {result['training_time']:.2f} seconds\")",
   "execution_count": null
  },
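  {
   "cell_type": "markdown",
   "id": "vectorized-pi-md",
   "metadata": {},
   "source": "Despite its name, `gcp_parallel_computation` estimates pi with a Python loop that draws 1,000 points per iteration, one iteration at a time. Because the iterations are independent, the same estimate can be computed from a single vectorized NumPy draw, which avoids the per-iteration Python overhead. The function below is a local sketch of that idea, and its name is hypothetical. It could be decorated with `@cluster` in the same way as the examples above."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "vectorized-pi",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def estimate_pi_vectorized(n_iterations=1000, points_per_iteration=1000, seed=None):\n",
    "    \"\"\"Monte Carlo estimate of pi from one (n_iterations, points, 2) array draw.\"\"\"\n",
    "    rng = np.random.default_rng(seed)\n",
    "    points = rng.random((n_iterations, points_per_iteration, 2))\n",
    "    # True where a point falls inside the unit quarter circle\n",
    "    inside = (points ** 2).sum(axis=2) <= 1.0\n",
    "    per_iteration = 4.0 * inside.mean(axis=1)\n",
    "    estimate = float(per_iteration.mean())\n",
    "    return {'pi_estimate': estimate, 'abs_error': abs(estimate - np.pi)}\n",
    "\n",
    "\n",
    "print(estimate_pi_vectorized(n_iterations=200, seed=0))"
   ]
  },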
  {
   "cell_type": "markdown",
   "id": "gke-setup",
   "metadata": {},
   "source": [
    "## Method 2: Google Kubernetes Engine (GKE) Configuration\n",
    "\n",
    "GKE provides managed Kubernetes clusters ideal for containerized Clustrix workloads:"
   ]
  },
  {
   "cell_type": "code",
   "id": "gke-cluster-setup",
   "metadata": {},
   "outputs": [],
   "source": "def setup_gke_cluster_for_clustrix(project_id, cluster_name='clustrix-cluster', zone='us-central1-a'):\n    \"\"\"\n    Generate setup commands and a Job template for a GKE cluster suited to Clustrix workloads.\n    \"\"\"\n    \n    gke_commands = f\"\"\"\n# Enable required APIs\ngcloud services enable container.googleapis.com \\\n  --project {project_id}\n\n# Create GKE cluster with auto-scaling\ngcloud container clusters create {cluster_name} \\\n  --project {project_id} \\\n  --zone {zone} \\\n  --machine-type e2-standard-4 \\\n  --num-nodes 1 \\\n  --enable-autoscaling \\\n  --min-nodes 0 \\\n  --max-nodes 10 \\\n  --enable-autorepair \\\n  --enable-autoupgrade \\\n  --disk-size 50GB \\\n  --disk-type pd-ssd \\\n  --enable-network-policy \\\n  --enable-ip-alias \\\n  --labels purpose=clustrix,environment=tutorial\n\n# Get cluster credentials\ngcloud container clusters get-credentials {cluster_name} \\\n  --project {project_id} \\\n  --zone {zone}\n\n# Verify cluster access\nkubectl get nodes\n\n# Create clustrix namespace\nkubectl create namespace clustrix\n\n# Set as default namespace\nkubectl config set-context --current --namespace=clustrix\n\"\"\"\n    \n    # Clustrix job template for Kubernetes\n    k8s_job_template = \"\"\"\napiVersion: batch/v1\nkind: Job\nmetadata:\n  name: clustrix-job-${JOB_ID}\n  namespace: clustrix\nspec:\n  template:\n    spec:\n      restartPolicy: Never\n      containers:\n      - name: clustrix-worker\n        image: python:3.11-slim\n        command: [\"bash\", \"-c\"]\n        args:\n        - |\n          pip install clustrix numpy scipy pandas scikit-learn\n          python -c \"\n          import pickle\n          import sys\n          \n          # Load and execute function\n          with open('/data/function_data.pkl', 'rb') as f:\n              data = pickle.load(f)\n          \n          func = pickle.loads(data['function'])\n          args = pickle.loads(data['args'])\n          kwargs = pickle.loads(data['kwargs'])\n          \n          try:\n              result = func(*args, **kwargs)\n              with open('/data/result.pkl', 'wb') as f:\n                  pickle.dump(result, f)\n          except Exception as e:\n              with open('/data/error.pkl', 'wb') as f:\n                  pickle.dump({'error': str(e)}, f)\n              raise\n          \"\n        resources:\n          requests:\n            memory: \"2Gi\"\n            cpu: \"1\"\n          limits:\n            memory: \"4Gi\"\n            cpu: \"2\"\n        volumeMounts:\n        - name: job-data\n          mountPath: /data\n      # Requires an existing PersistentVolumeClaim named clustrix-pvc in the clustrix namespace\n      volumes:\n      - name: job-data\n        persistentVolumeClaim:\n          claimName: clustrix-pvc\n  backoffLimit: 3\n\"\"\"\n    \n    return {\n        'cluster_name': cluster_name,\n        'project_id': project_id,\n        'zone': zone,\n        'setup_commands': gke_commands,\n        'job_template': k8s_job_template\n    }\n\ndef configure_clustrix_for_gke(cluster_endpoint, cluster_name):\n    \"\"\"Configure Clustrix to use a GKE cluster.\"\"\"\n    configure(\n        cluster_type=\"kubernetes\",\n        cluster_host=cluster_endpoint,\n        # For GKE, authentication is handled via kubectl config\n        remote_work_dir=\"/tmp/clustrix\",\n        package_manager=\"pip\",  # Container-based, pip is fine\n        default_cores=2,\n        default_memory=\"4GB\",\n        default_time=\"01:00:00\"\n    )\n    print(f\"\u2713 Configured Clustrix for GKE cluster: {cluster_name}\")\n\n# Create GKE configuration\ngke_config = setup_gke_cluster_for_clustrix(\n    project_id=PROJECT_ID,\n    cluster_name='clustrix-cluster'\n)\n\nprint(\"=== GKE Cluster Setup Commands ===\")\nprint(gke_config['setup_commands'])\nprint(\"\\n=== Kubernetes Job Template ===\")\nprint(gke_config['job_template'])\nprint(\"\\n\ud83d\udcdd Note: GKE integration requires additional implementation in Clustrix.\")\nprint(\"The current Clustrix release supports basic Kubernetes, but GKE-specific features need custom setup.\")",
   "execution_count": null
  },
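  {
   "cell_type": "markdown",
   "id": "gke-pvc-md",
   "metadata": {},
   "source": "The Job template mounts a PersistentVolumeClaim named `clustrix-pvc`, which the setup commands never create. Without it, the Job's Pod stays pending. The sketch below builds a minimal claim as a Python dict and writes it as JSON, which `kubectl apply -f` accepts as well as YAML. The `10Gi` size and `ReadWriteOnce` access mode are assumptions. Omitting `storageClassName` uses the cluster's default StorageClass."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "gke-pvc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "\n",
    "def clustrix_pvc_manifest(name='clustrix-pvc', namespace='clustrix', size='10Gi'):\n",
    "    \"\"\"Minimal PersistentVolumeClaim matching the claimName used in the Job template.\"\"\"\n",
    "    return {\n",
    "        'apiVersion': 'v1',\n",
    "        'kind': 'PersistentVolumeClaim',\n",
    "        'metadata': {'name': name, 'namespace': namespace},\n",
    "        'spec': {\n",
    "            'accessModes': ['ReadWriteOnce'],\n",
    "            'resources': {'requests': {'storage': size}},\n",
    "        },\n",
    "    }\n",
    "\n",
    "\n",
    "with open('clustrix-pvc.json', 'w') as f:\n",
    "    json.dump(clustrix_pvc_manifest(), f, indent=2)\n",
    "print('Wrote clustrix-pvc.json; apply it with: kubectl apply -f clustrix-pvc.json')"
   ]
  },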
  {
   "cell_type": "markdown",
   "id": "gcp-batch",
   "metadata": {},
   "source": [
    "## Method 3: Google Cloud Batch\n",
    "\n",
    "Google Cloud Batch provides managed job scheduling for large-scale workloads:"
   ]
  },
  {
   "cell_type": "code",
   "id": "gcp-batch-setup",
   "metadata": {},
   "outputs": [],
   "source": "def setup_gcp_batch_environment(project_id, region='us-central1'):\n    \"\"\"\n    Generate setup commands and a job configuration for running Clustrix workloads on Google Cloud Batch.\n    \"\"\"\n    \n    batch_setup_commands = f\"\"\"\n# Enable Batch API\ngcloud services enable batch.googleapis.com \\\n  --project {project_id}\n\n# Create a service account for Batch jobs\ngcloud iam service-accounts create clustrix-batch-sa \\\n  --project {project_id} \\\n  --description=\"Service account for Clustrix Batch jobs\" \\\n  --display-name=\"Clustrix Batch Service Account\"\n\n# Grant the roles the job VMs need: report status to Batch, write logs, and read/write job data\ngcloud projects add-iam-policy-binding {project_id} \\\n  --member=\"serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com\" \\\n  --role=\"roles/batch.agentReporter\"\n\ngcloud projects add-iam-policy-binding {project_id} \\\n  --member=\"serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com\" \\\n  --role=\"roles/logging.logWriter\"\n\ngcloud projects add-iam-policy-binding {project_id} \\\n  --member=\"serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com\" \\\n  --role=\"roles/storage.objectAdmin\"\n\n# Create Cloud Storage bucket for job data\ngsutil mb -p {project_id} -l {region} gs://{project_id}-clustrix-batch\n\"\"\"\n    \n    # Batch job configuration template\n    batch_job_config = {\n        \"taskGroups\": [\n            {\n                \"taskSpec\": {\n                    \"runnables\": [\n                        {\n                            \"script\": {\n                                \"text\": f\"\"\"#!/bin/bash\nset -e\n\n# Install required packages\npip3 install clustrix numpy scipy pandas scikit-learn\n\n# Download job data from Cloud Storage\ngsutil cp gs://{project_id}-clustrix-batch/jobs/${{BATCH_JOB_ID}}/function_data.pkl .\n\n# Execute the function; record its exit status so an error report can still be uploaded\nset +e\npython3 -c \"\nimport pickle\nimport traceback\n\ntry:\n    with open('function_data.pkl', 'rb') as f:\n        data = pickle.load(f)\n    \n    func = pickle.loads(data['function'])\n    args = pickle.loads(data['args'])\n    kwargs = pickle.loads(data['kwargs'])\n    \n    result = func(*args, **kwargs)\n    \n    with open('result.pkl', 'wb') as f:\n        pickle.dump(result, f)\n        \nexcept Exception as e:\n    with open('error.pkl', 'wb') as f:\n        pickle.dump({{\n            'error': str(e),\n            'traceback': traceback.format_exc()\n        }}, f)\n    raise\n\"\nSTATUS=$?\nset -e\n\n# Upload the result (or the error report) to Cloud Storage\nif [ -f result.pkl ]; then\n    gsutil cp result.pkl gs://{project_id}-clustrix-batch/jobs/${{BATCH_JOB_ID}}/result.pkl\nelse\n    gsutil cp error.pkl gs://{project_id}-clustrix-batch/jobs/${{BATCH_JOB_ID}}/error.pkl\nfi\nexit $STATUS\n\"\"\"\n                            }\n                        }\n                    ],\n                    \"computeResource\": {\n                        \"cpuMilli\": 2000,  # 2 CPUs\n                        \"memoryMib\": 4096  # 4 GB RAM\n                    },\n                    \"maxRetryCount\": 2,\n                    \"maxRunDuration\": \"3600s\"  # 1 hour\n                },\n                \"taskCount\": 1\n            }\n        ],\n        \"allocationPolicy\": {\n            \"instances\": [\n                {\n                    # The VM shape goes under \"policy\"; \"instanceTemplate\" expects the name of an existing template\n                    \"policy\": {\n                        \"machineType\": \"e2-standard-2\",\n                        \"provisioningModel\": \"STANDARD\"\n                    }\n                }\n            ],\n            # Run the job VMs as the service account created above\n            \"serviceAccount\": {\n                \"email\": f\"clustrix-batch-sa@{project_id}.iam.gserviceaccount.com\"\n            }\n        },\n        \"labels\": {\n            \"purpose\": \"clustrix\",\n            \"environment\": \"tutorial\"\n        },\n        \"logsPolicy\": {\n            \"destination\": \"CLOUD_LOGGING\"\n        }\n    }\n    \n    return {\n        'project_id': project_id,\n        'region': region,\n        'bucket_name': f'{project_id}-clustrix-batch',\n        'service_account': f'clustrix-batch-sa@{project_id}.iam.gserviceaccount.com',\n        'job_config': batch_job_config,\n        'setup_commands': batch_setup_commands\n    }\n\n# Create Batch configuration\nbatch_config = setup_gcp_batch_environment(PROJECT_ID)\n\nprint(\"=== Google Cloud Batch Setup Commands ===\")\nprint(batch_config['setup_commands'])\nprint(\"\\n=== Batch Job Configuration ===\")\nprint(json.dumps(batch_config['job_config'], indent=2))\nprint(\"\\n\ud83d\udca1 Save this configuration as a JSON file to submit it with `gcloud batch jobs submit`.\")",
   "execution_count": null
  },
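  {
   "cell_type": "markdown",
   "id": "batch-submit-md",
   "metadata": {},
   "source": "The cell above only prints the job configuration. To run it, save the dictionary as a JSON file and pass that file to `gcloud batch jobs submit JOB_NAME --location=REGION --config=FILE`. `prepare_batch_submission` is a hypothetical helper that writes the file and returns the command instead of running it. Its job-name check is an assumption to confirm against the Batch documentation. It requires lowercase letters, digits, and hyphens, a leading letter, at most 63 characters, and no trailing hyphen."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "batch-submit",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import re\n",
    "\n",
    "# Hypothetical helper for submitting the Batch configuration built above.\n",
    "# Assumed job-name rule: lowercase letter first, then letters/digits/hyphens, max 63 chars.\n",
    "JOB_NAME_PATTERN = re.compile(r'^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$')\n",
    "\n",
    "\n",
    "def prepare_batch_submission(job_config, job_name, region, path='batch-job.json'):\n",
    "    \"\"\"Write job_config to path and return the gcloud command that would submit it.\"\"\"\n",
    "    if not JOB_NAME_PATTERN.match(job_name):\n",
    "        raise ValueError(f'invalid Batch job name: {job_name}')\n",
    "    with open(path, 'w') as f:\n",
    "        json.dump(job_config, f, indent=2)\n",
    "    return f'gcloud batch jobs submit {job_name} --location={region} --config={path}'\n",
    "\n",
    "\n",
    "# Example usage with the configuration from the previous cell:\n",
    "# print(prepare_batch_submission(batch_config['job_config'], 'clustrix-job-001', batch_config['region']))"
   ]
  },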
  {
   "cell_type": "markdown",
   "id": "cloud-storage",
   "metadata": {},
   "source": [
    "## Data Management with Google Cloud Storage"
   ]
  },
  {
   "cell_type": "code",
   "id": "cloud-storage-integration",
   "metadata": {},
   "outputs": [],
   "source": "@cluster(cores=2, memory=\"4GB\")\ndef process_gcs_data(bucket_name, input_blob, output_blob, project_id=None):\n    \"\"\"Process data from Google Cloud Storage and save results back.\"\"\"\n    from google.cloud import storage\n    import numpy as np\n    import pickle\n    import time\n    \n    # Initialize Cloud Storage client\n    storage_client = storage.Client(project=project_id)\n    bucket = storage_client.bucket(bucket_name)\n    \n    # Download data from Cloud Storage\n    input_blob_obj = bucket.blob(input_blob)\n    data_bytes = input_blob_obj.download_as_bytes()\n    data = pickle.loads(data_bytes)\n    \n    # Process the data\n    processed_data = {\n        'original_shape': data.shape if hasattr(data, 'shape') else len(data) if hasattr(data, '__len__') else 'scalar',\n        'mean': float(np.mean(data)) if hasattr(data, '__iter__') else float(data),\n        'std': float(np.std(data)) if hasattr(data, '__iter__') else 0.0,\n        'max': float(np.max(data)) if hasattr(data, '__iter__') else float(data),\n        'min': float(np.min(data)) if hasattr(data, '__iter__') else float(data),\n        'processing_timestamp': time.time(),\n        'processed_on': 'gcp-compute-engine',\n        'data_type': str(type(data).__name__)\n    }\n    \n    # Additional statistics for 2-D arrays (matrices)\n    if hasattr(data, 'shape') and len(data.shape) == 2:\n        is_square = data.shape[0] == data.shape[1]\n        processed_data.update({\n            'matrix_rank': int(np.linalg.matrix_rank(data)),\n            'frobenius_norm': float(np.linalg.norm(data, 'fro')),\n            'condition_number': float(np.linalg.cond(data)) if is_square else None\n        })\n    \n    # Upload results to Cloud Storage\n    output_bytes = pickle.dumps(processed_data)\n    output_blob_obj = bucket.blob(output_blob)\n    output_blob_obj.upload_from_string(output_bytes)\n    \n    return f\"Processed data saved to gs://{bucket_name}/{output_blob}\"\n\n# Utility functions for Google Cloud Storage (they pickle data locally, so import pickle here)\nimport pickle\n\ndef upload_to_gcs(data, bucket_name, blob_name, project_id=None):\n    \"\"\"Upload data to Google Cloud Storage.\"\"\"\n    storage_client = storage.Client(project=project_id)\n    bucket = storage_client.bucket(bucket_name)\n    blob = bucket.blob(blob_name)\n    \n    data_bytes = pickle.dumps(data)\n    blob.upload_from_string(data_bytes)\n    return f\"gs://{bucket_name}/{blob_name}\"\n\ndef download_from_gcs(bucket_name, blob_name, project_id=None):\n    \"\"\"Download data from Google Cloud Storage.\"\"\"\n    storage_client = storage.Client(project=project_id)\n    bucket = storage_client.bucket(bucket_name)\n    blob = bucket.blob(blob_name)\n    \n    data_bytes = blob.download_as_bytes()\n    return pickle.loads(data_bytes)\n\ndef create_gcs_bucket_for_clustrix(project_id, bucket_name, location='us-central1'):\n    \"\"\"Return gsutil commands that create a Cloud Storage bucket for Clustrix data.\"\"\"\n    gcs_commands = f\"\"\"\n# Create bucket with appropriate settings\ngsutil mb -p {project_id} -l {location} gs://{bucket_name}\n\n# Set lifecycle policy to delete temporary files after 7 days\necho '{{\n  \"lifecycle\": {{\n    \"rule\": [\n      {{\n        \"action\": {{\"type\": \"Delete\"}},\n        \"condition\": {{\n          \"age\": 7,\n          \"matchesPrefix\": [\"temp/\"]\n        }}\n      }}\n    ]\n  }}\n}}' > lifecycle.json\n\ngsutil lifecycle set lifecycle.json gs://{bucket_name}\n\n# Set up proper permissions (if using service account)\ngsutil iam ch serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com:objectAdmin gs://{bucket_name}\n\"\"\"\n    \n    return gcs_commands\n\n# Create bucket configuration\nBUCKET_NAME = f\"{PROJECT_ID}-clustrix-data\"\nbucket_commands = create_gcs_bucket_for_clustrix(PROJECT_ID, BUCKET_NAME)\n\nprint(\"=== Commands to create Cloud Storage bucket ===\")\nprint(bucket_commands)\n\n# Example usage (commented out - uncomment after creating bucket):\n# sample_data = np.random.rand(1000, 100)\n# upload_location = upload_to_gcs(sample_data, BUCKET_NAME, 'input/sample_data.pkl', PROJECT_ID)\n# print(f\"\u2713 Data uploaded to {upload_location}\")\n# \n# result = process_gcs_data(BUCKET_NAME, 'input/sample_data.pkl', 'output/results.pkl', PROJECT_ID)\n# print(f\"\u2713 Processing completed: {result}\")\n\nprint(\"\\n\u2713 Google Cloud Storage integration functions defined.\")\nprint(\"Execute the bucket creation commands above, then uncomment the example usage.\")",
   "execution_count": null
  },
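  {
   "cell_type": "markdown",
   "id": "gcs-python-client-sketch-md",
   "metadata": {},
   "source": [
    "The `gsutil` commands above can also be expressed with the `google-cloud-storage` Python client. The cell below is a minimal sketch and is not part of Clustrix. It makes three assumptions: `create_bucket_with_temp_cleanup` is a hypothetical helper, your `google-cloud-storage` release is recent enough for lifecycle rules to accept a `matches_prefix` condition, and your credentials allow bucket creation in `PROJECT_ID`. The sketch does not reproduce the `gsutil iam ch` service-account grant."
   ]
  },
  {
   "cell_type": "code",
   "id": "gcs-python-client-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_bucket_with_temp_cleanup(project_id, bucket_name, location='us-central1',\n",
    "                                    temp_prefix='temp/', max_age_days=7):\n",
    "    \"\"\"Hypothetical helper: create a bucket whose temp_prefix objects expire automatically.\"\"\"\n",
    "    from google.cloud import storage\n",
    "\n",
    "    client = storage.Client(project=project_id)\n",
    "    # Raises google.api_core.exceptions.Conflict if the bucket name is already taken\n",
    "    bucket = client.create_bucket(bucket_name, location=location)\n",
    "\n",
    "    # Delete objects under temp_prefix once they are older than max_age_days\n",
    "    bucket.add_lifecycle_delete_rule(age=max_age_days, matches_prefix=[temp_prefix])\n",
    "    bucket.patch()  # persist the lifecycle configuration\n",
    "    return bucket\n",
    "\n",
    "# Example usage (commented out - creates a billable resource):\n",
    "# bucket = create_bucket_with_temp_cleanup(PROJECT_ID, BUCKET_NAME)\n",
    "# print(list(bucket.lifecycle_rules))"
   ],
   "execution_count": null
  },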
  {
   "cell_type": "markdown",
   "id": "vertex-ai",
   "metadata": {},
   "source": [
    "## Vertex AI Integration"
   ]
  },
  {
   "cell_type": "code",
   "id": "vertex-ai-setup",
   "metadata": {},
   "outputs": [],
   "source": "def setup_vertex_ai_for_clustrix(project_id, region='us-central1'):\n    \"\"\"\n    Setup Vertex AI for ML workloads with Clustrix.\n    \"\"\"\n    \n    vertex_commands = f\"\"\"\n# Enable Vertex AI API\ngcloud services enable aiplatform.googleapis.com \\\n  --project {project_id}\n\n# Create Vertex AI custom training job\ngcloud ai custom-jobs create \\\n  --region={region} \\\n  --display-name=clustrix-training-job \\\n  --config=training_job_config.yaml\n\n# Create Vertex AI endpoints for model serving\ngcloud ai endpoints create \\\n  --region={region} \\\n  --display-name=clustrix-model-endpoint\n\"\"\"\n    \n    # Vertex AI training job configuration\n    training_config = f\"\"\"\n# training_job_config.yaml\nworkerPoolSpecs:\n- machineSpec:\n    machineType: e2-standard-4\n  replicaCount: 1\n  containerSpec:\n    imageUri: gcr.io/cloud-aiplatform/training/tf-cpu.2-8:latest\n    command:\n    - python3\n    - -c\n    args:\n    - |\n      import subprocess\n      import sys\n      \n      # Install clustrix\n      subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'clustrix', 'numpy', 'pandas', 'scikit-learn'])\n      \n      # Your training code here\n      print(\"Clustrix training job completed on Vertex AI\")\n    env:\n    - name: GOOGLE_CLOUD_PROJECT\n      value: {project_id}\n    - name: AIP_MODEL_DIR\n      value: gs://{project_id}-vertex-models\n\"\"\"\n    \n    return {\n        'project_id': project_id,\n        'region': region,\n        'setup_commands': vertex_commands,\n        'training_config': training_config\n    }\n\n@cluster(cores=4, memory=\"8GB\")\ndef vertex_ai_ml_pipeline(dataset_config, model_config, project_id, bucket_name):\n    \"\"\"ML pipeline that could run on Vertex AI with Clustrix.\"\"\"\n    import numpy as np\n    from sklearn.ensemble import GradientBoostingClassifier\n    from sklearn.model_selection import cross_val_score, GridSearchCV\n    from sklearn.datasets import 
make_classification\n    from google.cloud import storage\n    import pickle\n    import time\n    \n    start_time = time.time()\n    \n    # Generate a synthetic classification dataset\n    X, y = make_classification(\n        n_samples=dataset_config['n_samples'],\n        n_features=dataset_config['n_features'],\n        n_classes=dataset_config['n_classes'],\n        n_informative=dataset_config.get('n_informative', dataset_config['n_features'] // 2),\n        random_state=42\n    )\n    \n    # Hyperparameter grid; override it with model_config['param_grid']\n    # (the default is 27 combinations x 5 folds = 135 model fits)\n    param_grid = model_config.get('param_grid', {\n        'n_estimators': [50, 100, 200],\n        'max_depth': [3, 5, 7],\n        'learning_rate': [0.01, 0.1, 0.2]\n    })\n    \n    # Grid search with cross-validation, using all available cores\n    model = GradientBoostingClassifier(random_state=42)\n    grid_search = GridSearchCV(\n        model, param_grid, cv=5, scoring='accuracy', n_jobs=-1\n    )\n    \n    grid_search.fit(X, y)\n    \n    # Get best model (GridSearchCV refits it on the full dataset)\n    best_model = grid_search.best_estimator_\n    \n    # Re-score the best model with 5-fold CV (same data as the search, so not a held-out estimate)\n    cv_scores = cross_val_score(best_model, X, y, cv=5, scoring='accuracy')\n    \n    # Save model to Cloud Storage\n    storage_client = storage.Client(project=project_id)\n    bucket = storage_client.bucket(bucket_name)\n    \n    model_blob = bucket.blob('models/clustrix_model.pkl')\n    model_bytes = pickle.dumps(best_model)\n    model_blob.upload_from_string(model_bytes)\n    \n    total_time = time.time() - start_time\n    \n    # Indices of the 10 most important features, highest first\n    importances = best_model.feature_importances_\n    top_features = np.argsort(importances)[::-1][:10]\n    \n    return {\n        'best_params': grid_search.best_params_,\n        'best_score': float(grid_search.best_score_),\n        'cv_mean_score': float(cv_scores.mean()),\n        'cv_std_score': float(cv_scores.std()),\n        'training_time': total_time,\n        'model_location': f'gs://{bucket_name}/models/clustrix_model.pkl',\n        'feature_importance': [(int(i), float(importances[i])) for i in top_features],  # (feature index, importance)\n        'dataset_size': len(X)\n    }\n\n# Set up Vertex AI configuration\nvertex_config = setup_vertex_ai_for_clustrix(PROJECT_ID)\n\nprint(\"=== Vertex AI Setup Commands ===\")\nprint(vertex_config['setup_commands'])\nprint(\"\\n=== Training Job Configuration ===\")\nprint(vertex_config['training_config'])\n\n# Example usage (commented out):\n# dataset_params = {'n_samples': 10000, 'n_features': 20, 'n_classes': 3}\n# model_params = {}  # or e.g. {'param_grid': {'n_estimators': [50, 100]}} for a smaller search\n# result = vertex_ai_ml_pipeline(dataset_params, model_params, PROJECT_ID, BUCKET_NAME)\n# print(f\"\u2713 Best model score: {result['best_score']:.4f}\")\n# print(f\"\u2713 Model saved to: {result['model_location']}\")\n\nprint(\"\\n\u2713 Vertex AI integration examples defined.\")",
   "execution_count": null
  },
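  {
   "cell_type": "markdown",
   "id": "vertex-ai-sdk-sketch-md",
   "metadata": {},
   "source": [
    "Instead of running `gcloud ai custom-jobs create`, you can submit the same job from Python with the Vertex AI SDK (`google-cloud-aiplatform`). This is a minimal sketch that assumes the following:\n",
    "\n",
    "- The SDK is installed.\n",
    "- The `gs://{PROJECT_ID}-vertex-models` bucket referenced in the YAML above already exists; it is used here for staging.\n",
    "- The prebuilt training image is still published under that URI.\n",
    "\n",
    "`submit_clustrix_custom_job` is a hypothetical helper, and the inline `args` value is a placeholder for your own training code."
   ]
  },
  {
   "cell_type": "code",
   "id": "vertex-ai-sdk-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "def submit_clustrix_custom_job(project_id, region='us-central1', staging_bucket=None):\n",
    "    \"\"\"Hypothetical helper: submit the custom job described above with the Vertex AI SDK.\"\"\"\n",
    "    from google.cloud import aiplatform\n",
    "\n",
    "    staging_bucket = staging_bucket or f'gs://{project_id}-vertex-models'\n",
    "    aiplatform.init(project=project_id, location=region, staging_bucket=staging_bucket)\n",
    "\n",
    "    # Same structure as workerPoolSpecs in training_job_config.yaml, with snake_case keys\n",
    "    worker_pool_specs = [{\n",
    "        'machine_spec': {'machine_type': 'e2-standard-4'},\n",
    "        'replica_count': 1,\n",
    "        'container_spec': {\n",
    "            'image_uri': 'gcr.io/cloud-aiplatform/training/tf-cpu.2-8:latest',\n",
    "            'command': ['python3', '-c'],\n",
    "            'args': [\"print('Clustrix training job completed on Vertex AI')\"],\n",
    "        },\n",
    "    }]\n",
    "\n",
    "    job = aiplatform.CustomJob(\n",
    "        display_name='clustrix-training-job',\n",
    "        worker_pool_specs=worker_pool_specs,\n",
    "    )\n",
    "    job.run(sync=False)  # return immediately; the job keeps running on Vertex AI\n",
    "    return job\n",
    "\n",
    "# Example usage (commented out - starts a billable training job):\n",
    "# job = submit_clustrix_custom_job(PROJECT_ID)"
   ],
   "execution_count": null
  },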
  {
   "cell_type": "markdown",
   "id": "gcp-security",
   "metadata": {},
   "source": [
    "## Security Best Practices"
   ]
  },
  {
   "cell_type": "code",
   "id": "gcp-security-setup",
   "metadata": {},
   "outputs": [],
   "source": "def setup_gcp_security_for_clustrix(project_id):\n    \"\"\"\n    Return gcloud commands for a locked-down GCP + Clustrix deployment.\n    \"\"\"\n    \n    security_commands = f\"\"\"\n# Create a custom-mode VPC and a subnet with Private Google Access\ngcloud compute networks create clustrix-vpc \\\n  --project {project_id} \\\n  --subnet-mode custom\n\ngcloud compute networks subnets create clustrix-subnet \\\n  --project {project_id} \\\n  --network clustrix-vpc \\\n  --range 10.1.0.0/24 \\\n  --region us-central1 \\\n  --enable-private-ip-google-access\n\n# Create firewall rules (SSH only from your IP; all traffic within the subnet)\ngcloud compute firewall-rules create clustrix-allow-ssh \\\n  --project {project_id} \\\n  --network clustrix-vpc \\\n  --allow tcp:22 \\\n  --source-ranges YOUR_IP/32 \\\n  --target-tags clustrix\n\ngcloud compute firewall-rules create clustrix-internal \\\n  --project {project_id} \\\n  --network clustrix-vpc \\\n  --allow tcp,udp,icmp \\\n  --source-ranges 10.1.0.0/24 \\\n  --target-tags clustrix\n\n# Create a dedicated service account for compute instances\ngcloud iam service-accounts create clustrix-compute \\\n  --project {project_id} \\\n  --description=\"Service account for Clustrix compute instances\" \\\n  --display-name=\"Clustrix Compute Service Account\"\n\n# Grant only the roles it needs (project-wide here; bucket-level grants are narrower)\ngcloud projects add-iam-policy-binding {project_id} \\\n  --member=\"serviceAccount:clustrix-compute@{project_id}.iam.gserviceaccount.com\" \\\n  --role=\"roles/storage.objectAdmin\"\n\ngcloud projects add-iam-policy-binding {project_id} \\\n  --member=\"serviceAccount:clustrix-compute@{project_id}.iam.gserviceaccount.com\" \\\n  --role=\"roles/logging.logWriter\"\n\n# Enable OS Login for centralized SSH key management\ngcloud compute project-info add-metadata \\\n  --project {project_id} \\\n  --metadata enable-oslogin=TRUE\n\n# Enable Cloud KMS, then create a key ring and key for customer-managed encryption\ngcloud services enable cloudkms.googleapis.com \\\n  --project {project_id}\n\ngcloud kms keyrings create clustrix-keyring \\\n  --project {project_id} \\\n  --location global\n\ngcloud kms keys create clustrix-key \\\n  --project {project_id} \\\n  --keyring clustrix-keyring \\\n  --location global \\\n  --purpose encryption\n\"\"\"\n    \n    return {\n        'project_id': project_id,\n        'vpc_name': 'clustrix-vpc',\n        'subnet_name': 'clustrix-subnet',\n        'service_account': f'clustrix-compute@{project_id}.iam.gserviceaccount.com',\n        'security_commands': security_commands\n    }\n\n# Generate security configuration\nsecurity_config = setup_gcp_security_for_clustrix(PROJECT_ID)\n\nprint(\"=== GCP Security Setup Commands ===\")\nprint(security_config['security_commands'])\nprint(f\"\\n\u2713 Security configuration templates generated for project: {PROJECT_ID}\")\nprint(f\"\u2713 VPC: {security_config['vpc_name']}\")\nprint(f\"\u2713 Service Account: {security_config['service_account']}\")\nprint(\"\\n\u26a0\ufe0f  Remember to replace 'YOUR_IP' with your public IPv4 address in the SSH firewall rule!\")",
   "execution_count": null
  },
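  {
   "cell_type": "markdown",
   "id": "gcp-firewall-ip-sketch-md",
   "metadata": {},
   "source": [
    "Rather than editing `YOUR_IP` by hand, you can validate your public address and substitute it into the generated commands. The cell below is a minimal sketch that uses only the standard-library `ipaddress` module. `fill_ssh_source_range` is a hypothetical helper, and you still need to look up your own public IPv4 address."
   ]
  },
  {
   "cell_type": "code",
   "id": "gcp-firewall-ip-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import ipaddress\n",
    "\n",
    "def fill_ssh_source_range(commands, public_ip):\n",
    "    \"\"\"Hypothetical helper: replace the YOUR_IP/32 placeholder with a validated /32 range.\"\"\"\n",
    "    ip = ipaddress.ip_address(public_ip.strip())  # raises ValueError if malformed\n",
    "    # Reject IPv6, private, loopback, and other non-globally-routable addresses\n",
    "    if ip.version != 4 or not ip.is_global:\n",
    "        raise ValueError(f'{public_ip} is not a publicly routable IPv4 address')\n",
    "    return commands.replace('YOUR_IP/32', f'{ip}/32')\n",
    "\n",
    "# Example usage (commented out - set MY_PUBLIC_IP to your own address first):\n",
    "# MY_PUBLIC_IP = 'x.x.x.x'\n",
    "# print(fill_ssh_source_range(security_config['security_commands'], MY_PUBLIC_IP))"
   ],
   "execution_count": null
  },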
  {
   "cell_type": "markdown",
   "id": "7zpjrtwse94",
   "source": "### GCP Security Checklist for Clustrix\n\n\u2713 **Identity and Access**\n- Run workloads under dedicated IAM service accounts that hold only the roles they need\n- Enable OS Login for centralized SSH key management\n- Store credentials and other sensitive configuration in Secret Manager, not in code or job arguments\n\n\u2713 **Network Security**\n- Use a custom-mode VPC instead of the default network\n- Restrict firewall rules to specific source IP ranges and target tags\n- Enable Private Google Access so instances without external IPs can still reach Google APIs\n- Use VPC Service Controls to define a data perimeter around sensitive services\n- Put Cloud Armor in front of internet-facing load balancers for DDoS and web-application protection\n\n\u2713 **Data and Workload Security**\n- Data at rest is encrypted by default; use Cloud KMS keys (CMEK) when you need control over key rotation and access\n- Review Cloud Audit Logs and enable Security Command Center\n- Use Binary Authorization so only trusted container images are deployed\n- Enable vulnerability scanning for container images\n\n\u2713 **Governance and Compliance**\n- Set up budget alerts and restrict who can administer billing accounts\n- Use organization policies to enforce guardrails across projects\n- Review IAM bindings and access logs regularly",
   "metadata": {}
  },
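  {
   "cell_type": "markdown",
   "id": "gcp-secret-manager-sketch-md",
   "metadata": {},
   "source": [
    "To follow the Secret Manager item in the checklist, read credentials at runtime instead of hard-coding them in job code. The cell below is a minimal sketch using the `google-cloud-secret-manager` client. It assumes three things:\n",
    "\n",
    "- The Secret Manager API is enabled.\n",
    "- The secret already exists.\n",
    "- The calling service account (for example `clustrix-compute`) holds `roles/secretmanager.secretAccessor`.\n",
    "\n",
    "`read_secret` and the secret name in the usage line are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "id": "gcp-secret-manager-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_secret(project_id, secret_id, version='latest'):\n",
    "    \"\"\"Hypothetical helper: return a Secret Manager secret as a UTF-8 string.\"\"\"\n",
    "    from google.cloud import secretmanager\n",
    "\n",
    "    client = secretmanager.SecretManagerServiceClient()\n",
    "    # Fully qualified resource name of the secret version to read\n",
    "    name = f'projects/{project_id}/secrets/{secret_id}/versions/{version}'\n",
    "    response = client.access_secret_version(request={'name': name})\n",
    "    return response.payload.data.decode('utf-8')\n",
    "\n",
    "# Example usage (commented out - 'clustrix-db-password' is a hypothetical secret):\n",
    "# db_password = read_secret(PROJECT_ID, 'clustrix-db-password')"
   ],
   "execution_count": null
  },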
  {
   "cell_type": "markdown",
   "id": "cleanup-gcp",
   "metadata": {},
   "source": [
    "## Resource Cleanup"
   ]
  },
  {
   "cell_type": "code",
   "id": "cleanup-gcp-resources",
   "metadata": {},
   "outputs": [],
   "source": "def cleanup_gcp_resources(project_id, zone='us-central1-a', region='us-central1'):\n    \"\"\"\n    Clean up GCP resources to avoid ongoing charges.\n    \n    Args:\n        project_id: GCP project ID\n        zone: Zone where resources were created\n        region: Region where resources were created\n    \"\"\"\n    \n    cleanup_commands = f\"\"\"\n# List all compute instances\ngcloud compute instances list --project {project_id}\n\n# Delete specific instances\ngcloud compute instances delete clustrix-instance \\\n  --project {project_id} \\\n  --zone {zone} \\\n  --quiet\n\n# Delete managed instance groups\ngcloud compute instance-groups managed delete clustrix-preemptible-group \\\n  --project {project_id} \\\n  --zone {zone} \\\n  --quiet\n\n# Delete instance templates\ngcloud compute instance-templates delete clustrix-preemptible-template \\\n  --project {project_id} \\\n  --quiet\n\n# Delete GKE clusters\ngcloud container clusters delete clustrix-cluster \\\n  --project {project_id} \\\n  --zone {zone} \\\n  --quiet\n\n# Delete Cloud Storage buckets (BE CAREFUL - THIS DELETES ALL DATA)\ngsutil -m rm -r gs://{project_id}-clustrix-batch\ngsutil -m rm -r gs://{project_id}-vertex-models\ngsutil -m rm -r gs://{project_id}-clustrix-data\n\n# Delete firewall rules\ngcloud compute firewall-rules delete clustrix-allow-ssh clustrix-internal \\\n  --project {project_id} \\\n  --quiet\n\n# Delete VPC network\ngcloud compute networks subnets delete clustrix-subnet \\\n  --project {project_id} \\\n  --region {region} \\\n  --quiet\n\ngcloud compute networks delete clustrix-vpc \\\n  --project {project_id} \\\n  --quiet\n\n# Delete service accounts\ngcloud iam service-accounts delete clustrix-compute@{project_id}.iam.gserviceaccount.com \\\n  --project {project_id} \\\n  --quiet\n\ngcloud iam service-accounts delete clustrix-batch-sa@{project_id}.iam.gserviceaccount.com \\\n  --project {project_id} \\\n  --quiet\n\n# List remaining billable resources\necho 
\"=== Remaining billable resources ===\"\ngcloud compute instances list --project {project_id}\ngcloud compute disks list --project {project_id}\ngcloud compute addresses list --project {project_id}\ngcloud container clusters list --project {project_id}\ngcloud ai endpoints list --project {project_id} --region {region}\ngsutil ls -p {project_id}\n\"\"\"\n    \n    return {\n        'project_id': project_id,\n        'zone': zone,\n        'region': region,\n        'cleanup_commands': cleanup_commands\n    }\n\n# Generate cleanup commands\ncleanup_info = cleanup_gcp_resources(PROJECT_ID)\n\nprint(f\"=== GCP Resource Cleanup Commands for Project: {PROJECT_ID} ===\")\nprint(cleanup_info['cleanup_commands'])\nprint(\"\\n\u26a0\ufe0f  WARNING: Some commands will permanently delete resources and data!\")\nprint(\"Review each resource before deleting and make backups if needed.\")\nprint(\"\\n\ud83d\udca1 TIP: 'gcloud compute instances stop' halts vCPU and memory charges but keeps the instance; attached persistent disks and reserved static IP addresses are still billed.\")\nprint(\"\\n\u2713 Cleanup commands generated. Always verify resources before deletion!\")",
   "execution_count": null
  },
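  {
   "cell_type": "markdown",
   "id": "cleanup-verify-sketch-md",
   "metadata": {},
   "source": [
    "After running the cleanup commands, a quick programmatic check can confirm that no `clustrix`-prefixed resources remain. The cell below is a minimal sketch that shells out to the `gcloud` CLI, which must be installed and authenticated, and parses its `--format=value(name)` output. `list_leftover_clustrix_resources` is a hypothetical helper. It only covers the resource types listed in the helper and does not check Cloud Storage buckets."
   ]
  },
  {
   "cell_type": "code",
   "id": "cleanup-verify-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import subprocess\n",
    "\n",
    "def list_leftover_clustrix_resources(project_id):\n",
    "    \"\"\"Hypothetical helper: map resource type -> names of remaining clustrix* resources.\"\"\"\n",
    "    list_commands = {\n",
    "        'instances': ['gcloud', 'compute', 'instances', 'list'],\n",
    "        'disks': ['gcloud', 'compute', 'disks', 'list'],\n",
    "        'instance-templates': ['gcloud', 'compute', 'instance-templates', 'list'],\n",
    "        'gke-clusters': ['gcloud', 'container', 'clusters', 'list'],\n",
    "        'networks': ['gcloud', 'compute', 'networks', 'list'],\n",
    "    }\n",
    "    leftovers = {}\n",
    "    for kind, cmd in list_commands.items():\n",
    "        # value(name) prints one bare resource name per line\n",
    "        result = subprocess.run(\n",
    "            cmd + ['--project', project_id, '--format=value(name)'],\n",
    "            capture_output=True, text=True, check=True,\n",
    "        )\n",
    "        names = [n for n in result.stdout.split() if n.startswith('clustrix')]\n",
    "        if names:\n",
    "            leftovers[kind] = names\n",
    "    return leftovers\n",
    "\n",
    "# Example usage (commented out - requires an authenticated gcloud CLI):\n",
    "# print(list_leftover_clustrix_resources(PROJECT_ID) or 'No clustrix resources found')"
   ],
   "execution_count": null
  },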
  {
   "cell_type": "markdown",
   "id": "advanced-gcp-example",
   "metadata": {},
   "source": [
    "## Advanced Example: Distributed Scientific Computing"
   ]
  },
  {
   "cell_type": "code",
   "id": "scientific-computing-example",
   "metadata": {},
   "outputs": [],
   "source": "# Advanced Scientific Computing\n@cluster(cores=4, memory=\"8GB\", time=\"01:00:00\")\ndef gcp_scientific_simulation(simulation_params, storage_config=None):\n    \"\"\"\n    Distributed scientific simulation using GCP infrastructure.\n    \"\"\"\n    import numpy as np\n    from scipy.integrate import odeint\n    from scipy.optimize import minimize\n    import pickle\n    import time\n    import matplotlib\n    matplotlib.use('Agg')  # Use non-interactive backend\n    import matplotlib.pyplot as plt\n    import io\n    \n    # Only import GCP storage if config provided\n    if storage_config:\n        from google.cloud import storage\n    \n    def lorenz_system(state, t, sigma, rho, beta):\n        \"\"\"Lorenz attractor differential equations.\"\"\"\n        x, y, z = state\n        return [\n            sigma * (y - x),\n            x * (rho - z) - y,\n            x * y - beta * z\n        ]\n    \n    def simulate_lorenz(params, time_points):\n        \"\"\"Simulate Lorenz system with given parameters.\"\"\"\n        initial_state = [1.0, 1.0, 1.0]\n        solution = odeint(\n            lorenz_system, initial_state, time_points,\n            args=(params['sigma'], params['rho'], params['beta'])\n        )\n        return solution\n    \n    start_time = time.time()\n    \n    # Parameter sweep\n    parameter_sets = simulation_params['parameter_sets']\n    time_points = np.linspace(0, simulation_params['max_time'], simulation_params['num_points'])\n    \n    results = []\n    \n    for i, params in enumerate(parameter_sets):\n        # Run simulation\n        solution = simulate_lorenz(params, time_points)\n        \n        # Analyze results\n        x, y, z = solution[:, 0], solution[:, 1], solution[:, 2]\n        \n        analysis = {\n            'params': params,\n            'max_x': float(np.max(x)),\n            'min_x': float(np.min(x)),\n            'max_y': float(np.max(y)),\n            'min_y': float(np.min(y)),\n            
'max_z': float(np.max(z)),\n            'min_z': float(np.min(z)),\n            'mean_energy': float(np.mean(x**2 + y**2 + z**2)),\n            'final_state': [float(x[-1]), float(y[-1]), float(z[-1])],\n            'std_x': float(np.std(x)),\n            'std_y': float(np.std(y)),\n            'std_z': float(np.std(z))\n        }\n        \n        results.append(analysis)\n        \n        # Create visualization for first few parameter sets\n        if i < 3:\n            fig = plt.figure(figsize=(12, 4))\n            \n            # Time series\n            plt.subplot(1, 3, 1)\n            plt.plot(time_points, x, label='X', alpha=0.8)\n            plt.plot(time_points, y, label='Y', alpha=0.8)\n            plt.plot(time_points, z, label='Z', alpha=0.8)\n            plt.xlabel('Time')\n            plt.ylabel('State')\n            plt.title(f'Lorenz System (\u03c3={params[\"sigma\"]}, \u03c1={params[\"rho\"]}, \u03b2={params[\"beta\"]})')\n            plt.legend()\n            plt.grid(True, alpha=0.3)\n            \n            # Phase space (X-Y)\n            plt.subplot(1, 3, 2)\n            plt.plot(x, y, alpha=0.7, linewidth=0.8)\n            plt.xlabel('X')\n            plt.ylabel('Y')\n            plt.title('X-Y Phase Space')\n            plt.grid(True, alpha=0.3)\n            \n            # Phase space (X-Z)\n            plt.subplot(1, 3, 3)\n            plt.plot(x, z, alpha=0.7, linewidth=0.8)\n            plt.xlabel('X')\n            plt.ylabel('Z')\n            plt.title('X-Z Phase Space')\n            plt.grid(True, alpha=0.3)\n            \n            plt.tight_layout()\n            \n            # Save plot to Cloud Storage if configured\n            if storage_config:\n                try:\n                    img_buffer = io.BytesIO()\n                    plt.savefig(img_buffer, format='png', dpi=150, bbox_inches='tight')\n                    img_buffer.seek(0)\n                    \n                    storage_client = 
storage.Client(project=storage_config['project_id'])\n                    bucket = storage_client.bucket(storage_config['bucket_name'])\n                    \n                    plot_blob = bucket.blob(f\"plots/lorenz_simulation_{i}.png\")\n                    plot_blob.upload_from_string(img_buffer.getvalue(), content_type='image/png')\n                except Exception as e:\n                    print(f\"Warning: Could not save plot to GCS: {e}\")\n            \n            plt.close()\n    \n    computation_time = time.time() - start_time\n    \n    # Calculate summary statistics\n    energies = [r['mean_energy'] for r in results]\n    summary_stats = {\n        'total_simulations': len(parameter_sets),\n        'computation_time': computation_time,\n        'average_energy': np.mean(energies),\n        'max_energy': max(energies),\n        'min_energy': min(energies),\n        'energy_std': np.std(energies),\n        'time_per_simulation': computation_time / len(parameter_sets)\n    }\n    \n    # Save detailed results to Cloud Storage if configured\n    if storage_config:\n        try:\n            storage_client = storage.Client(project=storage_config['project_id'])\n            bucket = storage_client.bucket(storage_config['bucket_name'])\n            \n            results_blob = bucket.blob(\"results/simulation_results.pkl\")\n            results_data = {\n                'simulation_params': simulation_params,\n                'results': results,\n                'summary_stats': summary_stats,\n                'timestamp': time.time()\n            }\n            results_bytes = pickle.dumps(results_data)\n            results_blob.upload_from_string(results_bytes)\n        except Exception as e:\n            print(f\"Warning: Could not save results to GCS: {e}\")\n    \n    return {\n        'num_simulations': len(parameter_sets),\n        'computation_time': computation_time,\n        'summary_stats': summary_stats,\n        'results_preview': 
results[:2],  # First 2 for brevity\n        'storage_location': f\"gs://{storage_config['bucket_name']}/results/\" if storage_config else None,\n        'plots_saved': min(3, len(parameter_sets))\n    }\n\n# Monte Carlo simulation example\n@cluster(cores=2, memory=\"4GB\")\ndef gcp_monte_carlo_simulation(n_samples=1000000):\n    \"\"\"Monte Carlo pricing of a European call and put under Black-Scholes assumptions.\"\"\"\n    import numpy as np\n    import time\n    \n    start_time = time.time()\n    \n    # Black-Scholes parameters\n    S0 = 100    # Initial stock price\n    K = 105     # Strike price\n    T = 1.0     # Time to expiration (years)\n    r = 0.05    # Risk-free rate\n    sigma = 0.2 # Volatility\n    \n    # Generate standard normal samples with a seeded, local generator\n    rng = np.random.default_rng(42)\n    Z = rng.standard_normal(n_samples)\n    \n    # Simulate stock prices at expiration (risk-neutral geometric Brownian motion)\n    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)\n    \n    # Calculate option payoffs\n    call_payoffs = np.maximum(ST - K, 0)\n    put_payoffs = np.maximum(K - ST, 0)\n    \n    # Discount to present value\n    discount = np.exp(-r * T)\n    call_price = float(discount * np.mean(call_payoffs))\n    put_price = float(discount * np.mean(put_payoffs))\n    \n    # Standard errors of the discounted estimates, used for 95% confidence intervals\n    call_se = float(discount * np.std(call_payoffs, ddof=1) / np.sqrt(n_samples))\n    put_se = float(discount * np.std(put_payoffs, ddof=1) / np.sqrt(n_samples))\n    \n    computation_time = time.time() - start_time\n    \n    return {\n        'n_samples': n_samples,\n        'computation_time': computation_time,\n        'call_price': call_price,\n        'put_price': put_price,\n        'call_confidence_interval': [call_price - 1.96*call_se, call_price + 1.96*call_se],\n        'put_confidence_interval': [put_price - 1.96*put_se, put_price + 1.96*put_se],\n        'parameters': {'S0': S0, 'K': K, 'T': T, 'r': r, 'sigma': sigma}\n    }\n\nprint(\"\u2713 Advanced scientific computing examples defined\")\n\n# Example simulation parameters\nexample_lorenz_params = {\n    'parameter_sets': [\n        {'sigma': 10.0, 'rho': 28.0, 'beta': 8.0/3.0},    # Classic chaotic regime\n        {'sigma': 10.0, 'rho': 24.74, 'beta': 8.0/3.0},   # Near the Hopf bifurcation (rho \u2248 24.74)\n        {'sigma': 10.0, 'rho': 99.65, 'beta': 8.0/3.0},   # High rho\n        {'sigma': 16.0, 'rho': 45.92, 'beta': 4.0},       # Alternative parameter set\n    ],\n    'max_time': 25.0,\n    'num_points': 5000\n}\n\nprint(\"\\n\ud83d\udcdd Example usage:\")\nprint(\"# Lorenz simulation:\")\nprint(\"# result = gcp_scientific_simulation(example_lorenz_params)\")\nprint(\"# print(f'Completed {result[\\\"num_simulations\\\"]} simulations')\")\nprint(\"# print(f'Computation time: {result[\\\"computation_time\\\"]:.2f} seconds')\")\nprint(\"#\")\nprint(\"# Monte Carlo simulation:\")\nprint(\"# mc_result = gcp_monte_carlo_simulation(n_samples=5000000)\")\nprint(\"# print(f'Call option price: ${mc_result[\\\"call_price\\\"]:.2f}')\")\n\nprint(\"\\n\ud83e\uddea These examples demonstrate:\")\nprint(\"  \u2022 Parameter sweeps over an ODE system (run serially within one job)\")\nprint(\"  \u2022 Statistical simulations with confidence intervals\")\nprint(\"  \u2022 Cloud Storage integration for results\")\nprint(\"  \u2022 Visualization generation and storage\")",
   "execution_count": null
  },
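  {
   "cell_type": "markdown",
   "id": "black-scholes-check-sketch-md",
   "metadata": {},
   "source": [
    "The Monte Carlo example prices European options under Black-Scholes assumptions, so you can check its estimates against the closed-form Black-Scholes formula. The cell below is a minimal standard-library sketch that uses `math.erf` for the normal CDF. `black_scholes_prices` is a hypothetical helper whose defaults match the parameters hard-coded in `gcp_monte_carlo_simulation`. The analytic call price should fall inside the Monte Carlo 95% confidence interval in most runs."
   ]
  },
  {
   "cell_type": "code",
   "id": "black-scholes-check-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "def black_scholes_prices(S0=100.0, K=105.0, T=1.0, r=0.05, sigma=0.2):\n",
    "    \"\"\"Closed-form European call and put prices (no dividends).\"\"\"\n",
    "    def norm_cdf(x):\n",
    "        # Standard normal CDF via the error function\n",
    "        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))\n",
    "\n",
    "    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))\n",
    "    d2 = d1 - sigma * math.sqrt(T)\n",
    "    call = S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)\n",
    "    put = K * math.exp(-r * T) * norm_cdf(-d2) - S0 * norm_cdf(-d1)\n",
    "    return call, put\n",
    "\n",
    "call_bs, put_bs = black_scholes_prices()\n",
    "print(f'Black-Scholes call: {call_bs:.4f}, put: {put_bs:.4f}')\n",
    "\n",
    "# Compare with the Monte Carlo result (commented out - runs a cluster job):\n",
    "# mc_result = gcp_monte_carlo_simulation(n_samples=5000000)\n",
    "# low, high = mc_result['call_confidence_interval']\n",
    "# print('Analytic call price inside MC 95% CI:', low <= call_bs <= high)"
   ],
   "execution_count": null
  },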
  {
   "cell_type": "markdown",
   "id": "gcp-summary",
   "metadata": {},
   "source": "## Summary\n\nThis tutorial covered:\n\n1. **Setup**: GCP authentication and Clustrix installation\n2. **Compute Engine**: Direct VM configuration and management\n3. **GKE Integration**: Kubernetes clusters for containerized workloads\n4. **Cloud Batch**: Managed job scheduling for large-scale processing\n5. **Cloud Storage**: Data management and result storage\n6. **Vertex AI**: Machine learning platform integration\n7. **Security**: Best practices for secure deployment\n8. **Resource Management**: Proper cleanup procedures\n\n### Cost Monitoring\n\nFor comprehensive cost monitoring, optimization strategies, and multi-cloud cost comparisons, see the dedicated [Cost Monitoring Tutorial](cost_monitoring_tutorial.ipynb).\n\n### Next Steps\n\n- Set up your GCP credentials and test the basic configuration\n- Start with a simple Compute Engine instance for initial testing\n- Consider GKE for containerized workloads and autoscaling\n- Explore Cloud Batch for large-scale batch processing\n- Implement proper monitoring and access controls\n- Review the Cost Monitoring Tutorial for expense tracking\n\n### GCP-Specific Advantages\n\n- **Spot VMs** (the successor to preemptible VMs): Substantially cheaper than standard VMs for fault-tolerant jobs, in exchange for possible preemption\n- **Google Kubernetes Engine**: Managed Kubernetes with cluster autoscaling and an Autopilot mode that manages nodes for you\n- **Vertex AI**: Managed training, model serving, and AutoML in a single platform\n- **Global VPC**: A single VPC network spans all regions over Google's private backbone\n- **BigQuery Integration**: Results stored in Cloud Storage can be loaded into or queried from BigQuery\n- **Sustained Use Discounts**: Automatic discounts on eligible machine types (for example N1 and N2, but not E2) that run for a large part of the month\n\n### Resources\n\n- [Google Cloud Compute Engine Documentation](https://cloud.google.com/compute/docs)\n- [Google Kubernetes Engine Documentation](https://cloud.google.com/kubernetes-engine/docs)\n- [Google Cloud Batch Documentation](https://cloud.google.com/batch/docs)\n- [Vertex AI Documentation](https://cloud.google.com/vertex-ai/docs)\n- [Google Cloud Storage Documentation](https://cloud.google.com/storage/docs)\n- [GCP Pricing Calculator](https://cloud.google.com/products/calculator)\n- [Clustrix Documentation](https://clustrix.readthedocs.io/)\n- [Clustrix Cost Monitoring Tutorial](cost_monitoring_tutorial.ipynb)\n\n**Remember**: Always monitor your cloud costs and clean up resources when not in use!"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}