{ "cells": [ { "cell_type": "markdown", "id": "gcp-title", "metadata": {}, "source": "# Google Cloud Platform (GCP) Tutorial\n\nThis tutorial demonstrates how to use Clustrix with Google Cloud Platform (GCP) infrastructure for scalable distributed computing.\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextLab/clustrix/blob/master/docs/source/notebooks/gcp_cloud_tutorial.ipynb)\n\n## Overview\n\nGCP provides several services that integrate well with Clustrix:\n\n- **Compute Engine**: Virtual machines for compute clusters\n- **Google Kubernetes Engine (GKE)**: Managed Kubernetes clusters\n- **Batch**: Managed job scheduling service\n- **Cloud Run**: Serverless container platform\n- **Vertex AI**: Machine learning platform\n- **Cloud Storage**: Object storage for data and results\n- **VPC**: Network isolation and security\n- **Preemptible VMs**: Cost-effective compute instances\n\n## Complete Setup Guide from Scratch\n\n### Step 1: Google Cloud Account Setup\n\n1. **Create Google Cloud Account**:\n - Go to [Google Cloud Console](https://console.cloud.google.com/)\n - Sign up with your Google account or create a new one\n - Accept the terms of service\n\n2. **Enable Billing**:\n - Navigate to Billing in the Google Cloud Console\n - Create a billing account and add a payment method\n - **Important**: New users get $300 in free credits\n - Set up billing alerts to avoid unexpected charges\n\n3. 
**Create a New Project**:\n - Go to the Project Selector in the console\n - Click \"New Project\"\n - Choose a unique project ID (e.g., `my-clustrix-project-123`)\n - Enable billing for this project\n\n### Step 2: Install Google Cloud SDK (gcloud CLI)\n\n**On macOS:**\n```bash\n# Using Homebrew (recommended)\nbrew install google-cloud-sdk\n\n# Or download installer\ncurl https://sdk.cloud.google.com | bash\nexec -l $SHELL\n```\n\n**On Linux:**\n```bash\n# Download and install\ncurl https://sdk.cloud.google.com | bash\nexec -l $SHELL\n\n# Or use package manager (Ubuntu/Debian)\nsudo apt-get install google-cloud-sdk\n```\n\n**On Windows:**\n- Download the installer from [Google Cloud SDK page](https://cloud.google.com/sdk/docs/install)\n- Run the installer and follow instructions\n\n### Step 3: Enable Required APIs\n\nEnable the necessary Google Cloud APIs for this tutorial:\n\n```bash\n# Set your project ID\nexport PROJECT_ID=\"your-project-id-here\"\ngcloud config set project $PROJECT_ID\n\n# Enable required APIs\ngcloud services enable compute.googleapis.com\ngcloud services enable container.googleapis.com\ngcloud services enable batch.googleapis.com\ngcloud services enable aiplatform.googleapis.com\ngcloud services enable storage.googleapis.com\n```\n\n## Prerequisites Checklist\n\nBefore proceeding, ensure you have:\n\n- [ ] Google Cloud account with billing enabled\n- [ ] Google Cloud project created\n- [ ] Google Cloud SDK (gcloud) installed locally\n- [ ] Required APIs enabled (compute, container, batch, storage, aiplatform)\n- [ ] SSH key pair for VM access (we'll create this below)\n- [ ] Basic understanding of command line usage" }, { "cell_type": "markdown", "id": "installation", "metadata": {}, "source": [ "## Installation and Setup\n", "\n", "Install Clustrix with GCP dependencies:" ] }, { "cell_type": "markdown", "id": "4wiyb0urchu", "source": "### Step 4: SSH Key Setup\n\nCreate SSH keys for secure access to your GCP instances:\n\n```bash\n# Generate 
SSH key pair (if you don't have one)\nssh-keygen -t rsa -b 4096 -C \"your-email@example.com\" -f ~/.ssh/gcp_key\n\n# Add the public key to GCP\n# $HOME is used here because ~ is not expanded after = inside an option\ngcloud compute os-login ssh-keys add --key-file=$HOME/.ssh/gcp_key.pub\n\n# Or add to project metadata (alternative method)\n# Each entry in the metadata file must have the form USERNAME:PUBLIC_KEY\ngcloud compute project-info add-metadata --metadata-from-file ssh-keys=$HOME/.ssh/gcp_key.pub\n```\n\n**Note**: If you're using Google Cloud Shell, SSH keys are automatically managed.", "metadata": {} }, { "cell_type": "code", "id": "install", "metadata": {}, "outputs": [], "source": "# Install Clustrix with GCP support\n!pip install clustrix google-cloud-compute google-cloud-storage google-auth google-auth-oauthlib\n\n# Import required libraries\nimport clustrix\nfrom clustrix import cluster, configure\nfrom google.cloud import compute_v1\nfrom google.cloud import storage\nfrom google.auth import default\nimport os\nimport pickle  # used by the Cloud Storage helper functions later in this notebook\nimport numpy as np\nimport time\nimport json", "execution_count": null }, { "cell_type": "markdown", "id": "gcp-authentication", "metadata": {}, "source": "## GCP Authentication Setup\n\nConfigure your GCP credentials. 
Choose the method that best fits your environment:\n\n### Option 1: gcloud CLI Authentication (Recommended for Local Development)\n\nThis method uses your personal Google account credentials:" }, { "cell_type": "code", "id": "gcloud-auth", "metadata": {}, "outputs": [], "source": "# Initial authentication and project setup\n!gcloud auth login\n!gcloud auth application-default login\n\n# Set your project ID (replace with your actual project ID)\nPROJECT_ID = \"your-project-id-here\" # Replace this!\n!gcloud config set project {PROJECT_ID}\n\n# Verify authentication and project setup\n!gcloud auth list\n!gcloud config get-value project\n!gcloud projects describe {PROJECT_ID}", "execution_count": null }, { "cell_type": "markdown", "id": "gcp-service-account", "metadata": {}, "source": "### Option 2: Service Account Authentication (Recommended for Production)\n\nFor production environments, authenticate as a service account that holds only the permissions it needs. The cell below checks that Application Default Credentials resolve, whichever option you chose; the service account setup commands follow it." }, { "cell_type": "code", "id": "service-account", "metadata": {}, "outputs": [], "source": "# Test GCP connection\ntry:\n    credentials, project_id = default()\n    print(f\"\u2713 Successfully authenticated with project: {project_id}\")\n\n    # Create a Compute Engine client (constructing it does not call the API)\n    compute_client = compute_v1.InstancesClient()\n    print(\"\u2713 Compute Engine client created\")\n\n    # Create a Cloud Storage client (constructing it does not call the API)\n    storage_client = storage.Client()\n    print(\"\u2713 Cloud Storage client created\")\n\nexcept Exception as e:\n    print(f\"\u274c GCP authentication failed: {e}\")\n    print(\"Please check your authentication setup and try again.\")", "execution_count": null }, { "cell_type": "markdown", "id": "hc7w51mwb54", "source": "**Service Account Setup (Production Environments)**\n\nFor production use, create a service account with specific permissions:\n\n```bash\n# Create service account\ngcloud iam service-accounts create clustrix-service-account \\\n --description=\"Service account for Clustrix operations\" \\\n --display-name=\"Clustrix Service 
Account\"\n\n# Grant necessary permissions\ngcloud projects add-iam-policy-binding YOUR_PROJECT_ID \\\n --member=\"serviceAccount:clustrix-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com\" \\\n --role=\"roles/compute.admin\"\n\ngcloud projects add-iam-policy-binding YOUR_PROJECT_ID \\\n --member=\"serviceAccount:clustrix-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com\" \\\n --role=\"roles/storage.admin\"\n\n# Create and download service account key\ngcloud iam service-accounts keys create ~/clustrix-service-account-key.json \\\n --iam-account=clustrix-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com\n\n# Point the client libraries at the key file created above\nexport GOOGLE_APPLICATION_CREDENTIALS=\"$HOME/clustrix-service-account-key.json\"\n```", "metadata": {} }, { "cell_type": "markdown", "id": "eegtd4c0im", "source": "**Important**: Make sure you have completed authentication setup and enabled all required APIs before proceeding.\n\nIf authentication fails, double-check that:\n- Your project ID is correct\n- Billing is enabled for your project\n- Required APIs are enabled\n- Your credentials are properly configured", "metadata": {} }, { "cell_type": "markdown", "id": "compute-engine-setup", "metadata": {}, "source": [ "## Method 1: Google Compute Engine Configuration\n", "\n", "### Create Compute Engine Instance for Clustrix" ] }, { "cell_type": "code", "id": "compute-engine-creation", "metadata": {}, "outputs": [], "source": "def create_clustrix_compute_instance(project_id, zone='us-central1-a', machine_type='e2-standard-4'):\n \"\"\"\n Build the gcloud commands and startup script for a Clustrix Compute Engine instance.\n \n This function only returns text; it does not create any GCP resources itself.\n \n Args:\n project_id: GCP project ID\n zone: GCP zone for the instance\n machine_type: Machine type (CPU/memory configuration)\n \n Returns:\n Dict with the instance settings, the gcloud commands, and the startup script\n \"\"\"\n \n # Startup script for instance initialization (the shebang must be the first line)\n startup_script = '''#!/bin/bash\n\n# Update system\napt-get update\napt-get install -y python3 python3-pip git htop 
curl\n\n# Install clustrix and common packages\npip3 install clustrix numpy scipy pandas scikit-learn matplotlib\n\n# Install uv for faster package management\ncurl -LsSf https://astral.sh/uv/install.sh | sh\nsource ~/.cargo/env\n\n# Create clustrix user\nuseradd -m -s /bin/bash clustrix\nusermod -aG sudo clustrix\necho \"clustrix ALL=(ALL) NOPASSWD:ALL\" >> /etc/sudoers\n\n# Setup SSH for clustrix user\nmkdir -p /home/clustrix/.ssh\n# Copy SSH keys from default user\nif [ -d \"/home/$(logname)/.ssh\" ]; then\n cp -r /home/$(logname)/.ssh/* /home/clustrix/.ssh/\n chown -R clustrix:clustrix /home/clustrix/.ssh\n chmod 700 /home/clustrix/.ssh\n chmod 600 /home/clustrix/.ssh/authorized_keys 2>/dev/null || true\nfi\n\n# Create working directory\nmkdir -p /tmp/clustrix\nchown clustrix:clustrix /tmp/clustrix\n\n# Install Google Cloud SDK non-interactively\n# Do not run exec -l $SHELL here: it would replace the running shell and skip the logging step below\ncurl https://sdk.cloud.google.com | bash -s -- --disable-prompts\n\n# Log completion\necho \"Clustrix setup completed at $(date)\" >> /var/log/clustrix-setup.log\n'''\n \n # gcloud commands for instance creation\n gcloud_commands = f\"\"\"\n# Create firewall rule for SSH (if not exists)\ngcloud compute firewall-rules create allow-ssh \\\n --allow tcp:22 \\\n --source-ranges 0.0.0.0/0 \\\n --description \"Allow SSH access\" \\\n --project {project_id} || echo \"SSH rule already exists\"\n\n# Create the instance\ngcloud compute instances create clustrix-instance \\\n --project={project_id} \\\n --zone={zone} \\\n --machine-type={machine_type} \\\n --network-interface=network-tier=PREMIUM,subnet=default \\\n --maintenance-policy=MIGRATE \\\n --provisioning-model=STANDARD \\\n --service-account=default \\\n --scopes=https://www.googleapis.com/auth/cloud-platform \\\n --tags=clustrix,http-server,https-server \\\n --create-disk=auto-delete=yes,boot=yes,device-name=clustrix-instance,image=projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts,mode=rw,size=50,type=projects/{project_id}/zones/{zone}/diskTypes/pd-balanced \\\n 
--no-shielded-secure-boot \\\n --shielded-vtpm \\\n --shielded-integrity-monitoring \\\n --labels=purpose=clustrix,environment=tutorial \\\n --reservation-affinity=any \\\n --metadata-from-file startup-script=startup-script.sh\n\n# Get the external IP\ngcloud compute instances describe clustrix-instance \\\n --project={project_id} \\\n --zone={zone} \\\n --format='get(networkInterfaces[0].accessConfigs[0].natIP)'\n\n# SSH to the instance (after startup script completes)\ngcloud compute ssh clustrix-instance \\\n --project={project_id} \\\n --zone={zone}\n\"\"\"\n \n return {\n 'project_id': project_id,\n 'zone': zone,\n 'machine_type': machine_type,\n 'instance_name': 'clustrix-instance',\n 'gcloud_commands': gcloud_commands,\n 'startup_script': startup_script\n }\n\n# Example usage - replace with your actual project ID\ninstance_config = create_clustrix_compute_instance(\n project_id=PROJECT_ID, # Using the PROJECT_ID variable from above\n zone='us-central1-a',\n machine_type='e2-standard-4' # 4 vCPUs, 16 GB RAM\n)\n\n# Display the configuration results\nprint(\"=== GCP Compute Engine Instance Configuration ===\")\nprint(f\"Project ID: {instance_config['project_id']}\")\nprint(f\"Zone: {instance_config['zone']}\")\nprint(f\"Machine Type: {instance_config['machine_type']}\")\nprint(f\"Instance Name: {instance_config['instance_name']}\")\n\n# Print the generated commands so the steps below have something to refer to\nprint(\"\\n=== gcloud Commands ===\")\nprint(instance_config['gcloud_commands'])\nprint(\"\\n=== Next Steps ===\")\nprint(\"1. Save the startup script to 'startup-script.sh'\")\nprint(\"2. Execute the gcloud commands shown above\")\nprint(\"3. Wait 3-5 minutes for instance initialization\")\nprint(\"4. Get the external IP and configure Clustrix\")", "execution_count": null }, { "cell_type": "markdown", "id": "rs47wjva5yi", "source": "### GCP Compute Engine Instance Creation\n\nThe above code defines a function that builds the gcloud commands and startup script for a Compute Engine instance suited to Clustrix workloads; it does not create the instance itself. 
The function returns:\n\n- **gcloud commands**: Complete CLI commands to create the instance\n- **startup script**: Automated setup script that configures the instance\n\nThe configuration includes:\n- Ubuntu 22.04 LTS base image\n- Pre-installed Python packages and Clustrix\n- Clustrix user account with sudo privileges \n- SSH key setup and working directories\n- 50GB balanced persistent disk\n- Appropriate firewall rules and metadata", "metadata": {} }, { "cell_type": "markdown", "id": "nrfay0eolc", "source": "**Next Steps**: \n\n1. **Save the startup script** to a file named `startup-script.sh` in your current directory\n2. **Execute the gcloud commands** shown above to create your instance\n3. **Wait for the instance to fully initialize** (startup script takes 3-5 minutes)\n4. **Get the external IP** using the describe command shown above\n5. **Test SSH access** to ensure the instance is ready for Clustrix", "metadata": {} }, { "cell_type": "markdown", "id": "clustrix-gcp-config", "metadata": {}, "source": [ "### Configure Clustrix for Compute Engine" ] }, { "cell_type": "code", "id": "config-gcp-compute", "metadata": {}, "outputs": [], "source": "# Get the external IP of your created instance\n# Replace with the actual external IP from your instance\nINSTANCE_EXTERNAL_IP = \"YOUR_INSTANCE_EXTERNAL_IP\" # Replace this!\n\n# Configure Clustrix to use your Compute Engine instance\nconfigure(\n cluster_type=\"ssh\",\n cluster_host=INSTANCE_EXTERNAL_IP,\n username=\"clustrix\", # or your default user\n key_file=\"~/.ssh/gcp_key\", # path to your SSH private key\n remote_work_dir=\"/tmp/clustrix\",\n package_manager=\"auto\", # Will use uv if available, pip otherwise\n default_cores=4,\n default_memory=\"8GB\",\n default_time=\"01:00:00\"\n)\n\n# Verify configuration\nif INSTANCE_EXTERNAL_IP != \"YOUR_INSTANCE_EXTERNAL_IP\":\n print(f\"\u2713 Clustrix configured for GCP Compute Engine\")\n print(f\" Host: {INSTANCE_EXTERNAL_IP}\")\n print(f\" SSH Key: 
~/.ssh/gcp_key\")\n print(f\" Remote Work Dir: /tmp/clustrix\")\nelse:\n print(\"\u26a0\ufe0f Please replace INSTANCE_EXTERNAL_IP with your actual IP address\")", "execution_count": null }, { "cell_type": "markdown", "id": "6qtk506dlio", "source": "**Important Configuration Notes**:\n\n- Replace `YOUR_INSTANCE_EXTERNAL_IP` with the actual external IP address from your Compute Engine instance\n- Use the SSH key path that corresponds to your setup (either `~/.ssh/gcp_key` if you created one following this tutorial, or `~/.ssh/google_compute_engine` for gcloud-generated keys)\n- The `clustrix` user was created by the startup script with appropriate permissions\n- If you encounter connection issues, ensure your firewall rules allow SSH access from your IP address", "metadata": {} }, { "cell_type": "markdown", "id": "gcp-example", "metadata": {}, "source": [ "### Example: Remote Computation on Compute Engine" ] }, { "cell_type": "code", "id": "gcp-compute-example", "metadata": {}, "outputs": [], "source": "# Example: GCP Data Analysis\n@cluster(cores=2, memory=\"4GB\")\ndef gcp_data_analysis(dataset_size=10000, analysis_type='regression'):\n    \"\"\"Perform data analysis on GCP Compute Engine.\"\"\"\n    import numpy as np\n    from sklearn.model_selection import train_test_split\n    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier\n    from sklearn.metrics import mean_squared_error, accuracy_score\n    from sklearn.datasets import make_regression, make_classification\n    import time\n\n    start_time = time.time()\n\n    # Generate synthetic dataset\n    if analysis_type == 'regression':\n        X, y = make_regression(\n            n_samples=dataset_size,\n            n_features=20,\n            noise=0.1,\n            random_state=42\n        )\n        model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)\n        metric_name = 'rmse'\n    else:\n        X, y = make_classification(\n            n_samples=dataset_size,\n            n_features=20,\n            n_informative=5,  # 3 classes x 2 clusters per class requires 2**n_informative >= 6\n            n_classes=3,\n            random_state=42\n        )\n        model = RandomForestClassifier(n_estimators=100, 
random_state=42, n_jobs=-1)\n        metric_name = 'accuracy'\n\n    # Split data\n    X_train, X_test, y_train, y_test = train_test_split(\n        X, y, test_size=0.2, random_state=42\n    )\n\n    # Train model\n    training_start = time.time()\n    model.fit(X_train, y_train)\n    training_time = time.time() - training_start\n\n    # Evaluate\n    y_pred = model.predict(X_test)\n\n    if analysis_type == 'regression':\n        metric_value = np.sqrt(mean_squared_error(y_test, y_pred))\n    else:\n        metric_value = accuracy_score(y_test, y_pred)\n\n    total_time = time.time() - start_time\n\n    return {\n        'analysis_type': analysis_type,\n        'dataset_size': dataset_size,\n        'training_time': training_time,\n        'total_time': total_time,\n        metric_name: metric_value,\n        'feature_importance': model.feature_importances_[:5].tolist(),  # First 5 features (not sorted by importance)\n        'training_samples': len(X_train),\n        'test_samples': len(X_test)\n    }\n\n# Example: Parallel Computation\n@cluster(cores=4, memory=\"8GB\")\ndef gcp_parallel_computation(n_iterations=1000):\n    \"\"\"Basic parallel computation example.\"\"\"\n    import numpy as np\n    import time\n\n    start_time = time.time()\n\n    # Simulate CPU-intensive work\n    results = []\n    for i in range(n_iterations):\n        # Monte Carlo pi estimation\n        points = np.random.random((1000, 2))\n        inside_circle = np.sum((points**2).sum(axis=1) <= 1)\n        pi_estimate = 4 * inside_circle / 1000\n        results.append(pi_estimate)\n\n    computation_time = time.time() - start_time\n    final_pi_estimate = np.mean(results)\n\n    return {\n        'iterations': n_iterations,\n        'pi_estimate': final_pi_estimate,\n        'computation_time': computation_time,\n        'accuracy': abs(final_pi_estimate - np.pi)  # absolute error relative to np.pi\n    }\n\nprint(\"\u2713 GCP computation examples defined\")\nprint(\"\\n\ud83d\udcdd Example usage:\")\nprint(\"# Data analysis:\")\nprint(\"# result = gcp_data_analysis(dataset_size=50000, analysis_type='classification')\")\nprint(\"# print(f'Accuracy: {result[\\\"accuracy\\\"]:.4f}')\")\nprint(\"#\")\nprint(\"# Parallel computation:\")\nprint(\"# result = 
gcp_parallel_computation(n_iterations=5000)\")\nprint(\"# print(f'Pi estimate: {result[\\\"pi_estimate\\\"]:.6f}')\")\n\n# Example execution (commented out - uncomment after setup):\n# result = gcp_data_analysis(dataset_size=5000, analysis_type='classification')\n# print(f\"\u2713 Analysis completed: {result['accuracy']:.4f} accuracy\")\n# print(f\"\u23f1\ufe0f Training time: {result['training_time']:.2f} seconds\")", "execution_count": null }, { "cell_type": "markdown", "id": "gke-setup", "metadata": {}, "source": [ "## Method 2: Google Kubernetes Engine (GKE) Configuration\n", "\n", "GKE provides managed Kubernetes clusters ideal for containerized Clustrix workloads:" ] }, { "cell_type": "code", "id": "gke-cluster-setup", "metadata": {}, "outputs": [], "source": "def setup_gke_cluster_for_clustrix(project_id, cluster_name='clustrix-cluster', zone='us-central1-a'):\n \"\"\"\n Setup GKE cluster optimized for Clustrix workloads.\n \"\"\"\n \n gke_commands = f\"\"\"\n# Enable required APIs\ngcloud services enable container.googleapis.com \\\n --project {project_id}\n\n# Create GKE cluster with auto-scaling\ngcloud container clusters create {cluster_name} \\\n --project {project_id} \\\n --zone {zone} \\\n --machine-type e2-standard-4 \\\n --num-nodes 1 \\\n --enable-autoscaling \\\n --min-nodes 0 \\\n --max-nodes 10 \\\n --enable-autorepair \\\n --enable-autoupgrade \\\n --disk-size 50GB \\\n --disk-type pd-ssd \\\n --enable-network-policy \\\n --enable-ip-alias \\\n --labels purpose=clustrix,environment=tutorial\n\n# Get cluster credentials\ngcloud container clusters get-credentials {cluster_name} \\\n --project {project_id} \\\n --zone {zone}\n\n# Verify cluster access\nkubectl get nodes\n\n# Create clustrix namespace\nkubectl create namespace clustrix\n\n# Set as default namespace\nkubectl config set-context --current --namespace=clustrix\n\"\"\"\n \n # Clustrix job template for Kubernetes\n k8s_job_template = \"\"\"\napiVersion: batch/v1\nkind: Job\nmetadata:\n 
name: clustrix-job-${JOB_ID}\n namespace: clustrix\nspec:\n template:\n spec:\n restartPolicy: Never\n containers:\n - name: clustrix-worker\n image: python:3.11-slim\n command: [\"bash\", \"-c\"]\n args:\n - |\n pip install clustrix numpy scipy pandas scikit-learn\n python -c \"\n import pickle\n import sys\n \n # Load and execute function\n with open('/data/function_data.pkl', 'rb') as f:\n data = pickle.load(f)\n \n func = pickle.loads(data['function'])\n args = pickle.loads(data['args'])\n kwargs = pickle.loads(data['kwargs'])\n \n try:\n result = func(*args, **kwargs)\n with open('/data/result.pkl', 'wb') as f:\n pickle.dump(result, f)\n except Exception as e:\n with open('/data/error.pkl', 'wb') as f:\n pickle.dump({'error': str(e)}, f)\n raise\n \"\n resources:\n requests:\n memory: \"2Gi\"\n cpu: \"1\"\n limits:\n memory: \"4Gi\"\n cpu: \"2\"\n volumeMounts:\n - name: job-data\n mountPath: /data\n volumes:\n - name: job-data\n persistentVolumeClaim:\n claimName: clustrix-pvc\n backoffLimit: 3\n\"\"\"\n \n return {\n 'cluster_name': cluster_name,\n 'project_id': project_id,\n 'zone': zone,\n 'setup_commands': gke_commands,\n 'job_template': k8s_job_template\n }\n\ndef configure_clustrix_for_gke(cluster_endpoint, cluster_name):\n \"\"\"Configure Clustrix to use GKE cluster.\"\"\"\n configure(\n cluster_type=\"kubernetes\",\n cluster_host=cluster_endpoint,\n # For GKE, authentication is handled via kubectl config\n remote_work_dir=\"/tmp/clustrix\",\n package_manager=\"pip\", # Container-based, pip is fine\n default_cores=2,\n default_memory=\"4GB\",\n default_time=\"01:00:00\"\n )\n print(f\"\u2713 Configured Clustrix for GKE cluster: {cluster_name}\")\n\n# Create GKE configuration\ngke_config = setup_gke_cluster_for_clustrix(\n project_id=PROJECT_ID,\n cluster_name='clustrix-cluster'\n)\n\nprint(\"=== GKE Cluster Setup Commands ===\")\nprint(gke_config['setup_commands'])\nprint(\"\\n=== Kubernetes Job Template 
===\")\nprint(gke_config['job_template'])\nprint(\"\\n\ud83d\udcdd Note: GKE integration requires additional implementation in Clustrix.\")\nprint(\"Current Clustrix supports basic Kubernetes, but GKE-specific features need custom setup.\")", "execution_count": null }, { "cell_type": "markdown", "id": "gcp-batch", "metadata": {}, "source": [ "## Method 3: Google Cloud Batch\n", "\n", "Google Cloud Batch provides managed job scheduling for large-scale workloads:" ] }, { "cell_type": "code", "id": "gcp-batch-setup", "metadata": {}, "outputs": [], "source": "def setup_gcp_batch_environment(project_id, region='us-central1'):\n \"\"\"\n Setup Google Cloud Batch for Clustrix workloads.\n \"\"\"\n \n batch_setup_commands = f\"\"\"\n# Enable Batch API\ngcloud services enable batch.googleapis.com \\\n --project {project_id}\n\n# Create a service account for Batch jobs\ngcloud iam service-accounts create clustrix-batch-sa \\\n --project {project_id} \\\n --description=\"Service account for Clustrix Batch jobs\" \\\n --display-name=\"Clustrix Batch Service Account\"\n\n# Grant necessary permissions\ngcloud projects add-iam-policy-binding {project_id} \\\n --member=\"serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com\" \\\n --role=\"roles/batch.jobsEditor\"\n\ngcloud projects add-iam-policy-binding {project_id} \\\n --member=\"serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com\" \\\n --role=\"roles/storage.objectAdmin\"\n\n# Create Cloud Storage bucket for job data\ngsutil mb -p {project_id} -l {region} gs://{project_id}-clustrix-batch\n\"\"\"\n \n # Batch job configuration template\n batch_job_config = {\n \"taskGroups\": [\n {\n \"taskSpec\": {\n \"runnables\": [\n {\n \"script\": {\n \"text\": f\"\"\"\n#!/bin/bash\nset -e\n\n# Install required packages\npip3 install clustrix numpy scipy pandas scikit-learn\n\n# Download job data from Cloud Storage\ngsutil cp gs://{project_id}-clustrix-batch/jobs/${{BATCH_JOB_ID}}/function_data.pkl 
.\n\n# Execute the function\npython3 -c \"\nimport pickle\nimport traceback\n\ntry:\n with open('function_data.pkl', 'rb') as f:\n data = pickle.load(f)\n \n func = pickle.loads(data['function'])\n args = pickle.loads(data['args'])\n kwargs = pickle.loads(data['kwargs'])\n \n result = func(*args, **kwargs)\n \n with open('result.pkl', 'wb') as f:\n pickle.dump(result, f)\n \nexcept Exception as e:\n with open('error.pkl', 'wb') as f:\n pickle.dump({{\n 'error': str(e),\n 'traceback': traceback.format_exc()\n }}, f)\n raise\n\"\n\n# Upload results to Cloud Storage\ngsutil cp result.pkl gs://{project_id}-clustrix-batch/jobs/${{BATCH_JOB_ID}}/result.pkl || \\\ngsutil cp error.pkl gs://{project_id}-clustrix-batch/jobs/${{BATCH_JOB_ID}}/error.pkl\n\"\"\"\n }\n }\n ],\n \"computeResource\": {\n \"cpuMilli\": 2000, # 2 CPUs\n \"memoryMib\": 4096 # 4 GB RAM\n },\n \"maxRetryCount\": 2,\n \"maxRunDuration\": \"3600s\" # 1 hour\n },\n \"taskCount\": 1\n }\n ],\n \"allocationPolicy\": {\n \"instances\": [\n {\n \"instanceTemplate\": {\n \"machineType\": \"e2-standard-2\",\n \"provisioningModel\": \"STANDARD\"\n }\n }\n ]\n },\n \"labels\": {\n \"purpose\": \"clustrix\",\n \"environment\": \"tutorial\"\n },\n \"logsPolicy\": {\n \"destination\": \"CLOUD_LOGGING\"\n }\n }\n \n return {\n 'project_id': project_id,\n 'region': region,\n 'bucket_name': f'{project_id}-clustrix-batch',\n 'service_account': f'clustrix-batch-sa@{project_id}.iam.gserviceaccount.com',\n 'job_config': batch_job_config,\n 'setup_commands': batch_setup_commands\n }\n\n# Create Batch configuration\nbatch_config = setup_gcp_batch_environment(PROJECT_ID)\n\nprint(\"=== Google Cloud Batch Setup Commands ===\")\nprint(batch_config['setup_commands'])\nprint(\"\\n=== Batch Job Configuration ===\")\nprint(json.dumps(batch_config['job_config'], indent=2))\nprint(\"\\n\ud83d\udca1 Google Cloud Batch provides excellent integration for large-scale Clustrix workloads.\")", "execution_count": null }, { "cell_type": 
"markdown", "id": "cloud-storage", "metadata": {}, "source": [ "## Data Management with Google Cloud Storage" ] }, { "cell_type": "code", "id": "cloud-storage-integration", "metadata": {}, "outputs": [], "source": "@cluster(cores=2, memory=\"4GB\")\ndef process_gcs_data(bucket_name, input_blob, output_blob, project_id=None):\n \"\"\"Process data from Google Cloud Storage and save results back.\"\"\"\n from google.cloud import storage\n import numpy as np\n import pickle\n import io\n import time\n \n # Initialize Cloud Storage client\n storage_client = storage.Client(project=project_id)\n bucket = storage_client.bucket(bucket_name)\n \n # Download data from Cloud Storage\n input_blob_obj = bucket.blob(input_blob)\n data_bytes = input_blob_obj.download_as_bytes()\n data = pickle.loads(data_bytes)\n \n # Process the data\n processed_data = {\n 'original_shape': data.shape if hasattr(data, 'shape') else len(data) if hasattr(data, '__len__') else 'scalar',\n 'mean': float(np.mean(data)) if hasattr(data, '__iter__') else float(data),\n 'std': float(np.std(data)) if hasattr(data, '__iter__') else 0.0,\n 'max': float(np.max(data)) if hasattr(data, '__iter__') else float(data),\n 'min': float(np.min(data)) if hasattr(data, '__iter__') else float(data),\n 'processing_timestamp': time.time(),\n 'processed_on': 'gcp-compute-engine',\n 'data_type': str(type(data).__name__)\n }\n \n # Advanced processing based on data type\n if hasattr(data, 'shape') and len(data.shape) >= 2:\n # Matrix operations\n processed_data.update({\n 'matrix_rank': int(np.linalg.matrix_rank(data)) if data.shape[0] == data.shape[1] else 'non_square',\n 'frobenius_norm': float(np.linalg.norm(data, 'fro')),\n 'condition_number': float(np.linalg.cond(data)) if data.shape[0] == data.shape[1] else None\n })\n \n # Upload results to Cloud Storage\n output_bytes = pickle.dumps(processed_data)\n output_blob_obj = bucket.blob(output_blob)\n output_blob_obj.upload_from_string(output_bytes)\n \n return f\"Processed 
data saved to gs://{bucket_name}/{output_blob}\"\n\n# Utility functions for Google Cloud Storage\ndef upload_to_gcs(data, bucket_name, blob_name, project_id=None):\n \"\"\"Upload data to Google Cloud Storage.\"\"\"\n storage_client = storage.Client(project=project_id)\n bucket = storage_client.bucket(bucket_name)\n blob = bucket.blob(blob_name)\n \n data_bytes = pickle.dumps(data)\n blob.upload_from_string(data_bytes)\n return f\"gs://{bucket_name}/{blob_name}\"\n\ndef download_from_gcs(bucket_name, blob_name, project_id=None):\n \"\"\"Download data from Google Cloud Storage.\"\"\"\n storage_client = storage.Client(project=project_id)\n bucket = storage_client.bucket(bucket_name)\n blob = bucket.blob(blob_name)\n \n data_bytes = blob.download_as_bytes()\n return pickle.loads(data_bytes)\n\ndef create_gcs_bucket_for_clustrix(project_id, bucket_name, location='us-central1'):\n \"\"\"Create a Cloud Storage bucket for Clustrix data.\"\"\"\n gcs_commands = f\"\"\"\n# Create bucket with appropriate settings\ngsutil mb -p {project_id} -l {location} gs://{bucket_name}\n\n# Set lifecycle policy to delete temporary files after 7 days\necho '{{\n \"lifecycle\": {{\n \"rule\": [\n {{\n \"action\": {{\"type\": \"Delete\"}},\n \"condition\": {{\n \"age\": 7,\n \"matchesPrefix\": [\"temp/\"]\n }}\n }}\n ]\n }}\n}}' > lifecycle.json\n\ngsutil lifecycle set lifecycle.json gs://{bucket_name}\n\n# Set up proper permissions (if using service account)\ngsutil iam ch serviceAccount:clustrix-batch-sa@{project_id}.iam.gserviceaccount.com:objectAdmin gs://{bucket_name}\n\"\"\"\n \n return gcs_commands\n\n# Create bucket configuration\nBUCKET_NAME = f\"{PROJECT_ID}-clustrix-data\"\nbucket_commands = create_gcs_bucket_for_clustrix(PROJECT_ID, BUCKET_NAME)\n\nprint(\"=== Commands to create Cloud Storage bucket ===\")\nprint(bucket_commands)\n\n# Example usage (commented out - uncomment after creating bucket):\n# sample_data = np.random.rand(1000, 100)\n# upload_location = 
upload_to_gcs(sample_data, BUCKET_NAME, 'input/sample_data.pkl', PROJECT_ID)\n# print(f\"\u2713 Data uploaded to {upload_location}\")\n# \n# result = process_gcs_data(BUCKET_NAME, 'input/sample_data.pkl', 'output/results.pkl', PROJECT_ID)\n# print(f\"\u2713 Processing completed: {result}\")\n\nprint(\"\\n\u2713 Google Cloud Storage integration functions defined.\")\nprint(\"Execute the bucket creation commands above, then uncomment the example usage.\")", "execution_count": null }, { "cell_type": "markdown", "id": "vertex-ai", "metadata": {}, "source": [ "## Vertex AI Integration" ] }, { "cell_type": "code", "id": "vertex-ai-setup", "metadata": {}, "outputs": [], "source": "def setup_vertex_ai_for_clustrix(project_id, region='us-central1'):\n \"\"\"\n Setup Vertex AI for ML workloads with Clustrix.\n \"\"\"\n \n vertex_commands = f\"\"\"\n# Enable Vertex AI API\ngcloud services enable aiplatform.googleapis.com \\\n --project {project_id}\n\n# Create Vertex AI custom training job\ngcloud ai custom-jobs create \\\n --region={region} \\\n --display-name=clustrix-training-job \\\n --config=training_job_config.yaml\n\n# Create Vertex AI endpoints for model serving\ngcloud ai endpoints create \\\n --region={region} \\\n --display-name=clustrix-model-endpoint\n\"\"\"\n \n # Vertex AI training job configuration\n training_config = f\"\"\"\n# training_job_config.yaml\nworkerPoolSpecs:\n- machineSpec:\n machineType: e2-standard-4\n replicaCount: 1\n containerSpec:\n imageUri: gcr.io/cloud-aiplatform/training/tf-cpu.2-8:latest\n command:\n - python3\n - -c\n args:\n - |\n import subprocess\n import sys\n \n # Install clustrix\n subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'clustrix', 'numpy', 'pandas', 'scikit-learn'])\n \n # Your training code here\n print(\"Clustrix training job completed on Vertex AI\")\n env:\n - name: GOOGLE_CLOUD_PROJECT\n value: {project_id}\n - name: AIP_MODEL_DIR\n value: gs://{project_id}-vertex-models\n\"\"\"\n \n return {\n 
'project_id': project_id,\n 'region': region,\n 'setup_commands': vertex_commands,\n 'training_config': training_config\n }\n\n@cluster(cores=4, memory=\"8GB\")\ndef vertex_ai_ml_pipeline(dataset_config, model_config, project_id, bucket_name):\n \"\"\"ML pipeline that could run on Vertex AI with Clustrix.\"\"\"\n import numpy as np\n from sklearn.ensemble import GradientBoostingClassifier\n from sklearn.model_selection import cross_val_score, GridSearchCV\n from sklearn.datasets import make_classification\n from sklearn.metrics import classification_report\n from google.cloud import storage\n import pickle\n import time\n \n start_time = time.time()\n \n # Generate or load dataset\n X, y = make_classification(\n n_samples=dataset_config['n_samples'],\n n_features=dataset_config['n_features'],\n n_classes=dataset_config['n_classes'],\n n_informative=dataset_config.get('n_informative', dataset_config['n_features'] // 2),\n random_state=42\n )\n \n # Hyperparameter tuning\n param_grid = {\n 'n_estimators': [50, 100, 200],\n 'max_depth': [3, 5, 7],\n 'learning_rate': [0.01, 0.1, 0.2]\n }\n \n # Grid search with cross-validation\n model = GradientBoostingClassifier(random_state=42)\n grid_search = GridSearchCV(\n model, param_grid, cv=5, scoring='accuracy', n_jobs=-1\n )\n \n grid_search.fit(X, y)\n \n # Get best model\n best_model = grid_search.best_estimator_\n \n # Evaluate with cross-validation\n cv_scores = cross_val_score(best_model, X, y, cv=5, scoring='accuracy')\n \n # Save model to Cloud Storage\n storage_client = storage.Client(project=project_id)\n bucket = storage_client.bucket(bucket_name)\n \n model_blob = bucket.blob('models/clustrix_model.pkl')\n model_bytes = pickle.dumps(best_model)\n model_blob.upload_from_string(model_bytes)\n \n total_time = time.time() - start_time\n \n return {\n 'best_params': grid_search.best_params_,\n 'best_score': grid_search.best_score_,\n 'cv_mean_score': cv_scores.mean(),\n 'cv_std_score': cv_scores.std(),\n 
'training_time': total_time,\n 'model_location': f'gs://{bucket_name}/models/clustrix_model.pkl',\n 'feature_importance': best_model.feature_importances_[:10].tolist(), # First 10 features (not sorted by importance)\n 'dataset_size': len(X)\n }\n\n# Setup Vertex AI configuration\nvertex_config = setup_vertex_ai_for_clustrix(PROJECT_ID)\n\nprint(\"=== Vertex AI Setup Commands ===\")\nprint(vertex_config['setup_commands'])\nprint(\"\\n=== Training Job Configuration ===\")\nprint(vertex_config['training_config'])\n\n# Example usage (commented out):\n# dataset_params = {'n_samples': 10000, 'n_features': 20, 'n_classes': 3}\n# model_params = {}\n# result = vertex_ai_ml_pipeline(dataset_params, model_params, PROJECT_ID, BUCKET_NAME)\n# print(f\"\u2713 Best model score: {result['best_score']:.4f}\")\n# print(f\"\u2713 Model saved to: {result['model_location']}\")\n\nprint(\"\\n\u2713 Vertex AI integration examples defined.\")", "execution_count": null }, { "cell_type": "markdown", "id": "gcp-security", "metadata": {}, "source": [ "## Security Best Practices" ] }, { "cell_type": "code", "id": "gcp-security-setup", "metadata": {}, "outputs": [], "source": "def setup_gcp_security_for_clustrix(project_id):\n \"\"\"\n Security configuration for GCP + Clustrix deployment.\n \"\"\"\n \n security_commands = f\"\"\"\n# Create VPC with private subnets\ngcloud compute networks create clustrix-vpc \\\n --project {project_id} \\\n --subnet-mode custom\n\ngcloud compute networks subnets create clustrix-subnet \\\n --project {project_id} \\\n --network clustrix-vpc \\\n --range 10.1.0.0/24 \\\n --region us-central1 \\\n --enable-private-ip-google-access\n\n# Create firewall rules (restrictive)\ngcloud compute firewall-rules create clustrix-allow-ssh \\\n --project {project_id} \\\n --network clustrix-vpc \\\n --allow tcp:22 \\\n --source-ranges YOUR_IP/32 \\\n --target-tags clustrix\n\ngcloud compute firewall-rules create clustrix-internal \\\n --project {project_id} \\\n --network clustrix-vpc \\\n --allow tcp,udp,icmp 
\\\n --source-ranges 10.1.0.0/24 \\\n --target-tags clustrix\n\n# Create service account with minimal permissions\ngcloud iam service-accounts create clustrix-compute \\\n --project {project_id} \\\n --description=\"Service account for Clustrix compute instances\" \\\n --display-name=\"Clustrix Compute Service Account\"\n\n# Grant only necessary permissions\ngcloud projects add-iam-policy-binding {project_id} \\\n --member=\"serviceAccount:clustrix-compute@{project_id}.iam.gserviceaccount.com\" \\\n --role=\"roles/storage.objectAdmin\"\n\ngcloud projects add-iam-policy-binding {project_id} \\\n --member=\"serviceAccount:clustrix-compute@{project_id}.iam.gserviceaccount.com\" \\\n --role=\"roles/logging.logWriter\"\n\n# Enable OS Login for better SSH key management\ngcloud compute project-info add-metadata \\\n --project {project_id} \\\n --metadata enable-oslogin=TRUE\n\n# Create Cloud KMS key for encryption\ngcloud kms keyrings create clustrix-keyring \\\n --project {project_id} \\\n --location global\n\ngcloud kms keys create clustrix-key \\\n --project {project_id} \\\n --keyring clustrix-keyring \\\n --location global \\\n --purpose encryption\n\"\"\"\n \n return {\n 'project_id': project_id,\n 'vpc_name': 'clustrix-vpc',\n 'subnet_name': 'clustrix-subnet',\n 'service_account': f'clustrix-compute@{project_id}.iam.gserviceaccount.com',\n 'security_commands': security_commands\n }\n\n# Generate security configuration\nsecurity_config = setup_gcp_security_for_clustrix(PROJECT_ID)\n\nprint(\"=== GCP Security Setup Commands ===\")\nprint(security_config['security_commands'])\nprint(f\"\\n\u2713 Security configuration templates generated for project: {PROJECT_ID}\")\nprint(f\"\u2713 VPC: {security_config['vpc_name']}\")\nprint(f\"\u2713 Service Account: {security_config['service_account']}\")\nprint(\"\\n\u26a0\ufe0f Remember to replace 'YOUR_IP' with your actual IP address in the firewall rules!\")", "execution_count": null }, { "cell_type": "markdown", "id": 
"7zpjrtwse94", "source": "### GCP Security Checklist for Clustrix\n\n\u2713 **Authentication and Access**\n- Use IAM service accounts with minimal permissions\n- Enable OS Login for centralized SSH key management\n- Create custom VPC with private subnets\n- Restrict firewall rules to specific IP ranges\n\n\u2713 **Infrastructure Security**\n- Enable private Google access for instances without external IPs\n- Use Cloud KMS for encryption at rest\n- Enable audit logging and Cloud Security Command Center\n- Use Binary Authorization for container security\n\n\u2713 **Network Security**\n- Implement VPC Service Controls for data perimeter\n- Enable DDoS protection and Cloud Armor\n- Use Secret Manager for sensitive configuration\n- Enable vulnerability scanning for container images\n\n\u2713 **Governance and Compliance**\n- Set up budget alerts and billing account security\n- Use organization policies for governance\n- Regular security reviews and access audits", "metadata": {} }, { "cell_type": "markdown", "id": "cleanup-gcp", "metadata": {}, "source": [ "## Resource Cleanup" ] }, { "cell_type": "code", "id": "cleanup-gcp-resources", "metadata": {}, "outputs": [], "source": "def cleanup_gcp_resources(project_id, zone='us-central1-a', region='us-central1'):\n \"\"\"\n Clean up GCP resources to avoid ongoing charges.\n \n Args:\n project_id: GCP project ID\n zone: Zone where resources were created\n region: Region where resources were created\n \"\"\"\n \n cleanup_commands = f\"\"\"\n# List all compute instances\ngcloud compute instances list --project {project_id}\n\n# Delete specific instances\ngcloud compute instances delete clustrix-instance \\\n --project {project_id} \\\n --zone {zone} \\\n --quiet\n\n# Delete managed instance groups\ngcloud compute instance-groups managed delete clustrix-preemptible-group \\\n --project {project_id} \\\n --zone {zone} \\\n --quiet\n\n# Delete instance templates\ngcloud compute instance-templates delete 
clustrix-preemptible-template \\\n --project {project_id} \\\n --quiet\n\n# Delete GKE clusters\ngcloud container clusters delete clustrix-cluster \\\n --project {project_id} \\\n --zone {zone} \\\n --quiet\n\n# Delete Cloud Storage buckets (BE CAREFUL - THIS DELETES ALL DATA)\ngsutil -m rm -r gs://{project_id}-clustrix-batch\ngsutil -m rm -r gs://{project_id}-vertex-models\ngsutil -m rm -r gs://{project_id}-clustrix-data\n\n# Delete firewall rules\ngcloud compute firewall-rules delete clustrix-allow-ssh clustrix-internal \\\n --project {project_id} \\\n --quiet\n\n# Delete VPC network\ngcloud compute networks subnets delete clustrix-subnet \\\n --project {project_id} \\\n --region {region} \\\n --quiet\n\ngcloud compute networks delete clustrix-vpc \\\n --project {project_id} \\\n --quiet\n\n# Delete service accounts\ngcloud iam service-accounts delete clustrix-compute@{project_id}.iam.gserviceaccount.com \\\n --project {project_id} \\\n --quiet\n\ngcloud iam service-accounts delete clustrix-batch-sa@{project_id}.iam.gserviceaccount.com \\\n --project {project_id} \\\n --quiet\n\n# List remaining billable resources\necho \"=== Remaining billable resources ===\"\ngcloud compute instances list --project {project_id}\ngcloud compute disks list --project {project_id}\ngcloud compute addresses list --project {project_id}\ngcloud container clusters list --project {project_id}\n\"\"\"\n \n return {\n 'project_id': project_id,\n 'zone': zone,\n 'region': region,\n 'cleanup_commands': cleanup_commands\n }\n\n# Generate cleanup commands\ncleanup_info = cleanup_gcp_resources(PROJECT_ID)\n\nprint(f\"=== GCP Resource Cleanup Commands for Project: {PROJECT_ID} ===\")\nprint(cleanup_info['cleanup_commands'])\nprint(\"\\n\u26a0\ufe0f WARNING: Some commands will permanently delete resources and data!\")\nprint(\"Review each resource before deleting and ensure you have backups if needed.\")\nprint(\"\\n\ud83d\udca1 TIP: Use 'gcloud compute instances stop' instead of 'delete' to 
preserve instances while stopping compute charges (persistent disks and reserved IP addresses are still billed).\")\nprint(\"\\n\u2713 Cleanup commands generated. Always verify resources before deletion!\")", "execution_count": null }, { "cell_type": "markdown", "id": "advanced-gcp-example", "metadata": {}, "source": [ "## Advanced Example: Distributed Scientific Computing" ] }, { "cell_type": "code", "id": "scientific-computing-example", "metadata": {}, "outputs": [], "source": "# Advanced Scientific Computing\n@cluster(cores=4, memory=\"8GB\", time=\"01:00:00\")\ndef gcp_scientific_simulation(simulation_params, storage_config=None):\n \"\"\"\n Distributed scientific simulation using GCP infrastructure.\n \"\"\"\n import numpy as np\n from scipy.integrate import odeint\n import pickle\n import time\n import matplotlib\n matplotlib.use('Agg') # Use non-interactive backend\n import matplotlib.pyplot as plt\n import io\n \n # Only import GCP storage if config provided\n if storage_config:\n from google.cloud import storage\n \n def lorenz_system(state, t, sigma, rho, beta):\n \"\"\"Lorenz attractor differential equations.\"\"\"\n x, y, z = state\n return [\n sigma * (y - x),\n x * (rho - z) - y,\n x * y - beta * z\n ]\n \n def simulate_lorenz(params, time_points):\n \"\"\"Simulate Lorenz system with given parameters.\"\"\"\n initial_state = [1.0, 1.0, 1.0]\n solution = odeint(\n lorenz_system, initial_state, time_points,\n args=(params['sigma'], params['rho'], params['beta'])\n )\n return solution\n \n start_time = time.time()\n \n # Parameter sweep\n parameter_sets = simulation_params['parameter_sets']\n time_points = np.linspace(0, simulation_params['max_time'], simulation_params['num_points'])\n \n results = []\n \n for i, params in enumerate(parameter_sets):\n # Run simulation\n solution = simulate_lorenz(params, time_points)\n \n # Analyze results\n x, y, z = solution[:, 0], solution[:, 1], solution[:, 2]\n \n analysis = {\n 'params': params,\n 'max_x': float(np.max(x)),\n 'min_x': 
float(np.min(x)),\n 'max_y': float(np.max(y)),\n 'min_y': float(np.min(y)),\n 'max_z': float(np.max(z)),\n 'min_z': float(np.min(z)),\n 'mean_energy': float(np.mean(x**2 + y**2 + z**2)),\n 'final_state': [float(x[-1]), float(y[-1]), float(z[-1])],\n 'std_x': float(np.std(x)),\n 'std_y': float(np.std(y)),\n 'std_z': float(np.std(z))\n }\n \n results.append(analysis)\n \n # Create visualization for first few parameter sets\n if i < 3:\n fig = plt.figure(figsize=(12, 4))\n \n # Time series\n plt.subplot(1, 3, 1)\n plt.plot(time_points, x, label='X', alpha=0.8)\n plt.plot(time_points, y, label='Y', alpha=0.8)\n plt.plot(time_points, z, label='Z', alpha=0.8)\n plt.xlabel('Time')\n plt.ylabel('State')\n plt.title(f'Lorenz System (\u03c3={params[\"sigma\"]}, \u03c1={params[\"rho\"]}, \u03b2={params[\"beta\"]})')\n plt.legend()\n plt.grid(True, alpha=0.3)\n \n # Phase space (X-Y)\n plt.subplot(1, 3, 2)\n plt.plot(x, y, alpha=0.7, linewidth=0.8)\n plt.xlabel('X')\n plt.ylabel('Y')\n plt.title('X-Y Phase Space')\n plt.grid(True, alpha=0.3)\n \n # Phase space (X-Z)\n plt.subplot(1, 3, 3)\n plt.plot(x, z, alpha=0.7, linewidth=0.8)\n plt.xlabel('X')\n plt.ylabel('Z')\n plt.title('X-Z Phase Space')\n plt.grid(True, alpha=0.3)\n \n plt.tight_layout()\n \n # Save plot to Cloud Storage if configured\n if storage_config:\n try:\n img_buffer = io.BytesIO()\n plt.savefig(img_buffer, format='png', dpi=150, bbox_inches='tight')\n img_buffer.seek(0)\n \n storage_client = storage.Client(project=storage_config['project_id'])\n bucket = storage_client.bucket(storage_config['bucket_name'])\n \n plot_blob = bucket.blob(f\"plots/lorenz_simulation_{i}.png\")\n plot_blob.upload_from_string(img_buffer.getvalue(), content_type='image/png')\n except Exception as e:\n print(f\"Warning: Could not save plot to GCS: {e}\")\n \n plt.close()\n \n computation_time = time.time() - start_time\n \n # Calculate summary statistics\n energies = [r['mean_energy'] for r in results]\n summary_stats = {\n 
'total_simulations': len(parameter_sets),\n 'computation_time': computation_time,\n 'average_energy': np.mean(energies),\n 'max_energy': max(energies),\n 'min_energy': min(energies),\n 'energy_std': np.std(energies),\n 'time_per_simulation': computation_time / len(parameter_sets)\n }\n \n # Save detailed results to Cloud Storage if configured\n if storage_config:\n try:\n storage_client = storage.Client(project=storage_config['project_id'])\n bucket = storage_client.bucket(storage_config['bucket_name'])\n \n results_blob = bucket.blob(\"results/simulation_results.pkl\")\n results_data = {\n 'simulation_params': simulation_params,\n 'results': results,\n 'summary_stats': summary_stats,\n 'timestamp': time.time()\n }\n results_bytes = pickle.dumps(results_data)\n results_blob.upload_from_string(results_bytes)\n except Exception as e:\n print(f\"Warning: Could not save results to GCS: {e}\")\n \n return {\n 'num_simulations': len(parameter_sets),\n 'computation_time': computation_time,\n 'summary_stats': summary_stats,\n 'results_preview': results[:2], # First 2 for brevity\n 'storage_location': f\"gs://{storage_config['bucket_name']}/results/\" if storage_config else None,\n 'plots_saved': min(3, len(parameter_sets))\n }\n\n# Monte Carlo simulation example\n@cluster(cores=2, memory=\"4GB\")\ndef gcp_monte_carlo_simulation(n_samples=1000000):\n \"\"\"Monte Carlo simulation for option pricing.\"\"\"\n import numpy as np\n import time\n \n start_time = time.time()\n \n # Black-Scholes parameters\n S0 = 100 # Initial stock price\n K = 105 # Strike price\n T = 1.0 # Time to expiration\n r = 0.05 # Risk-free rate\n sigma = 0.2 # Volatility\n \n # Generate random samples\n np.random.seed(42)\n Z = np.random.standard_normal(n_samples)\n \n # Simulate stock prices at expiration\n ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)\n \n # Calculate option payoffs\n call_payoffs = np.maximum(ST - K, 0)\n put_payoffs = np.maximum(K - ST, 0)\n \n # Discount to 
present value\n call_price = np.exp(-r * T) * np.mean(call_payoffs)\n put_price = np.exp(-r * T) * np.mean(put_payoffs)\n \n # Standard errors of the discounted price estimates (used for 95% confidence intervals)\n call_std = np.exp(-r * T) * np.std(call_payoffs) / np.sqrt(n_samples)\n put_std = np.exp(-r * T) * np.std(put_payoffs) / np.sqrt(n_samples)\n \n computation_time = time.time() - start_time\n \n return {\n 'n_samples': n_samples,\n 'computation_time': computation_time,\n 'call_price': call_price,\n 'put_price': put_price,\n 'call_confidence_interval': [call_price - 1.96*call_std, call_price + 1.96*call_std],\n 'put_confidence_interval': [put_price - 1.96*put_std, put_price + 1.96*put_std],\n 'parameters': {'S0': S0, 'K': K, 'T': T, 'r': r, 'sigma': sigma}\n }\n\nprint(\"\u2713 Advanced scientific computing examples defined\")\n\n# Example simulation parameters\nexample_lorenz_params = {\n 'parameter_sets': [\n {'sigma': 10.0, 'rho': 28.0, 'beta': 8.0/3.0}, # Classic chaotic\n {'sigma': 10.0, 'rho': 24.74, 'beta': 8.0/3.0}, # Near onset of chaos\n {'sigma': 10.0, 'rho': 99.65, 'beta': 8.0/3.0}, # High rho\n {'sigma': 16.0, 'rho': 45.92, 'beta': 4.0}, # Different params\n ],\n 'max_time': 25.0,\n 'num_points': 5000\n}\n\nprint(\"\\n\ud83d\udcdd Example usage:\")\nprint(\"# Lorenz simulation:\")\nprint(\"# result = gcp_scientific_simulation(example_lorenz_params)\")\nprint(\"# print(f'Completed {result[\\\"num_simulations\\\"]} simulations')\")\nprint(\"# print(f'Computation time: {result[\\\"computation_time\\\"]:.2f} seconds')\")\nprint(\"#\")\nprint(\"# Monte Carlo simulation:\")\nprint(\"# mc_result = gcp_monte_carlo_simulation(n_samples=5000000)\")\nprint(\"# print(f'Call option price: ${mc_result[\\\"call_price\\\"]:.2f}')\")\n\nprint(\"\\n\ud83e\uddea These examples demonstrate GCP's computational capabilities:\")\nprint(\" \u2022 Differential equation parameter sweeps\")\nprint(\" \u2022 Statistical simulations with confidence intervals\")\nprint(\" \u2022 Cloud Storage integration for results\")\nprint(\" \u2022 Visualization generation and 
storage\")", "execution_count": null }, { "cell_type": "markdown", "id": "gcp-summary", "metadata": {}, "source": "## Summary\n\nThis tutorial covered:\n\n1. **Setup**: GCP authentication and Clustrix installation\n2. **Compute Engine**: Direct VM configuration and management\n3. **GKE Integration**: Kubernetes clusters for containerized workloads\n4. **Cloud Batch**: Managed job scheduling for large-scale processing\n5. **Cloud Storage**: Data management and result storage\n6. **Vertex AI**: Machine learning platform integration\n7. **Security**: Best practices for secure deployment\n8. **Resource Management**: Proper cleanup procedures\n\n### Cost Monitoring\n\nFor cost monitoring, optimization strategies, and multi-cloud cost comparisons, see the dedicated [Cost Monitoring Tutorial](cost_monitoring_tutorial.ipynb).\n\n### Next Steps\n\n- Set up your GCP credentials and test the basic configuration\n- Start with a simple Compute Engine instance for initial testing\n- Consider GKE for containerized workloads and auto-scaling\n- Explore Cloud Batch for large-scale batch processing\n- Implement proper monitoring and access controls\n- Review the Cost Monitoring Tutorial for expense tracking\n\n### GCP-Specific Advantages\n\n- **Preemptible/Spot VMs**: Discounted compute (up to 80% savings) for fault-tolerant workloads that can handle interruption\n- **Google Kubernetes Engine**: Managed Kubernetes with cluster auto-scaling\n- **Vertex AI**: Managed ML platform covering training, model serving, and AutoML\n- **Global Network**: Google's private network connecting regions worldwide\n- **BigQuery Integration**: Data analytics on results stored in GCP\n- **Sustained Use Discounts**: Automatic discounts on eligible VMs that run for a large part of the billing month\n\n### Resources\n\n- [Google Cloud Compute Engine Documentation](https://cloud.google.com/compute/docs)\n- [Google Kubernetes Engine Documentation](https://cloud.google.com/kubernetes-engine/docs)\n- [Google Cloud Batch Documentation](https://cloud.google.com/batch/docs)\n- [Vertex AI 
Documentation](https://cloud.google.com/vertex-ai/docs)\n- [Google Cloud Storage Documentation](https://cloud.google.com/storage/docs)\n- [GCP Pricing Calculator](https://cloud.google.com/products/calculator)\n- [Clustrix Documentation](https://clustrix.readthedocs.io/)\n- [Clustrix Cost Monitoring Tutorial](cost_monitoring_tutorial.ipynb)\n\n**Remember**: Always monitor your cloud costs and clean up resources when not in use!" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }