{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cloud Cost Monitoring and Optimization\n", "\n", "This tutorial demonstrates Clustrix's comprehensive cost monitoring features for cloud platforms. Learn how to track expenses, optimize resource usage, and make informed decisions about cloud infrastructure.\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextLab/clustrix/blob/master/docs/source/notebooks/cost_monitoring_tutorial.ipynb)\n", "\n", "## Overview\n", "\n", "Clustrix provides built-in cost monitoring for multiple cloud platforms:\n", "\n", "- **Amazon Web Services (AWS)**: EC2, ECS, Batch, Lambda, SageMaker\n", "- **Google Cloud Platform (GCP)**: Compute Engine, GKE, Cloud Batch, Vertex AI\n", "- **Microsoft Azure**: Virtual Machines, AKS, Batch, ML Compute\n", "- **Lambda Cloud**: GPU instances for ML workloads\n", "- **Hugging Face Spaces**: Inference endpoints and Spaces hardware\n", "\n", "## Key Features\n", "\n", "- **Automatic Cost Tracking**: Decorator-based cost monitoring\n", "- **Real-time Pricing**: Up-to-date pricing information\n", "- **Regional Comparisons**: Find the most cost-effective regions\n", "- **Optimization Recommendations**: Automatic suggestions for cost savings\n", "- **Multi-cloud Support**: Compare costs across different providers" ], "id": "cell-0" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation\n", "\n", "Install Clustrix with cost monitoring support:" ], "id": "cell-1" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install Clustrix\n", "!pip install clustrix\n", "\n", "# Import cost monitoring functions\n", "from clustrix import (\n", " cost_tracking_decorator,\n", " get_cost_monitor,\n", " start_cost_monitoring,\n", " generate_cost_report,\n", " get_pricing_info\n", ")\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import 
matplotlib.pyplot as plt\n", "import time" ], "id": "cell-2" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Cost Monitoring\n", "\n", "### Getting Pricing Information" ], "id": "cell-3" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get pricing information for different cloud providers\n", "# Note: the slices below show the first 10 entries returned, not a ranking\n", "print(\"=== AWS EC2 Pricing (First 10 Instance Types) ===\")\n", "aws_pricing = get_pricing_info('aws')\n", "for instance_type, price in list(aws_pricing.items())[:10]:\n", " print(f\"{instance_type:20} ${price:.4f}/hour\")\n", "\n", "print(\"\\n=== GCP Compute Engine Pricing (First 10 Instance Types) ===\")\n", "gcp_pricing = get_pricing_info('gcp')\n", "for instance_type, price in list(gcp_pricing.items())[:10]:\n", " print(f\"{instance_type:20} ${price:.4f}/hour\")\n", "\n", "print(\"\\n=== Azure VM Pricing (First 10 Instance Types) ===\")\n", "azure_pricing = get_pricing_info('azure')\n", "for instance_type, price in list(azure_pricing.items())[:10]:\n", " print(f\"{instance_type:20} ${price:.4f}/hour\")\n", "\n", "print(\"\\nTotal instance types available:\")\n", "print(f\" AWS: {len(aws_pricing)}\")\n", "print(f\" GCP: {len(gcp_pricing)}\")\n", "print(f\" Azure: {len(azure_pricing)}\")" ], "id": "cell-4" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Manual Cost Monitoring" ], "id": "cell-5" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Example: Manual cost monitoring for a computation\n", "def simulate_computation(duration_seconds=5):\n", " \"\"\"Simulate a computation that takes some time.\"\"\"\n", " start_time = time.time()\n", " \n", " # Simulate CPU-intensive work\n", " result = 0\n", " while time.time() - start_time < duration_seconds:\n", " result += np.random.random((1000, 1000)).sum()\n", " \n", " return result\n", "\n", "# Monitor cost for AWS\n", "print(\"=== AWS Cost Monitoring Example ===\")\n", "monitor = 
start_cost_monitoring('aws')\n", "\n", "# Run computation\n", "result = simulate_computation(3)\n", "\n", "# Generate cost report\n", "cost_report = generate_cost_report('aws', 't3.medium', duration_seconds=3)\n", "print(f\"Instance Type: {cost_report['instance_type']}\")\n", "print(f\"Duration: {cost_report['duration_seconds']} seconds\")\n", "print(f\"Hourly Rate: ${cost_report['cost_estimate']['hourly_rate']:.4f}\")\n", "print(f\"Estimated Cost: ${cost_report['cost_estimate']['estimated_cost']:.6f}\")\n", "\n", "# Compare costs across providers for same duration\n", "print(\"\\n=== Cost Comparison Across Providers (3 seconds) ===\")\n", "providers_and_instances = [\n", " ('aws', 't3.medium'),\n", " ('gcp', 'n2-standard-2'),\n", " ('azure', 'Standard_D2s_v3')\n", "]\n", "\n", "for provider, instance in providers_and_instances:\n", " report = generate_cost_report(provider, instance, duration_seconds=3)\n", " print(f\"{provider.upper():5} {instance:20} ${report['cost_estimate']['estimated_cost']:.6f}\")" ], "id": "cell-6" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Automatic Cost Tracking with Decorators\n", "\n", "The easiest way to track costs is using the `@cost_tracking_decorator`:" ], "id": "cell-7" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Example 1: AWS Cost Tracking\n", "@cost_tracking_decorator('aws', 't3.xlarge')\n", "def aws_ml_training():\n", " \"\"\"Example ML training with automatic AWS cost tracking.\"\"\"\n", " from sklearn.ensemble import RandomForestClassifier\n", " from sklearn.datasets import make_classification\n", " from sklearn.model_selection import train_test_split\n", " import time\n", " \n", " # Generate dataset\n", " X, y = make_classification(n_samples=10000, n_features=20, n_classes=3, random_state=42)\n", " X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", " \n", " # Train model\n", " start_time = time.time()\n", " 
model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)\n", " model.fit(X_train, y_train)\n", " training_time = time.time() - start_time\n", " \n", " # Evaluate\n", " accuracy = model.score(X_test, y_test)\n", " \n", " return {\n", " 'accuracy': accuracy,\n", " 'training_time': training_time,\n", " 'samples_trained': len(X_train)\n", " }\n", "\n", "# Example 2: GCP Cost Tracking\n", "@cost_tracking_decorator('gcp', 'a2-highgpu-1g')\n", "def gcp_gpu_computation():\n", " \"\"\"Example GPU computation with automatic GCP cost tracking.\"\"\"\n", " import numpy as np\n", " import time\n", " \n", " start_time = time.time()\n", " \n", " # Simulate GPU-intensive work\n", " matrices = []\n", " for i in range(10):\n", " A = np.random.rand(1000, 1000)\n", " B = np.random.rand(1000, 1000)\n", " C = np.dot(A, B)\n", " matrices.append(C)\n", " \n", " result = np.mean([m.sum() for m in matrices])\n", " computation_time = time.time() - start_time\n", " \n", " return {\n", " 'result': result,\n", " 'computation_time': computation_time,\n", " 'matrices_processed': len(matrices)\n", " }\n", "\n", "# Example 3: Azure Cost Tracking\n", "@cost_tracking_decorator('azure', 'Standard_NC6')\n", "def azure_deep_learning():\n", " \"\"\"Example deep learning with automatic Azure cost tracking.\"\"\"\n", " import numpy as np\n", " import time\n", " \n", " # Simulate neural network training\n", " start_time = time.time()\n", " \n", " # Simulate epochs\n", " losses = []\n", " for epoch in range(5):\n", " # Simulate batch processing\n", " batch_losses = []\n", " for batch in range(100):\n", " # Simulate forward and backward pass\n", " loss = np.random.exponential(1.0) * np.exp(-epoch * 0.1)\n", " batch_losses.append(loss)\n", " \n", " epoch_loss = np.mean(batch_losses)\n", " losses.append(epoch_loss)\n", " \n", " training_time = time.time() - start_time\n", " \n", " return {\n", " 'final_loss': losses[-1],\n", " 'all_losses': losses,\n", " 'training_time': training_time,\n", " 
'epochs': len(losses)\n", " }\n", "\n", "# Run examples and display costs\n", "print(\"=== Running Cost-Tracked Functions ===\")\n", "\n", "# AWS Example\n", "print(\"\\n1. AWS ML Training:\")\n", "aws_result = aws_ml_training()\n", "if aws_result['success']:\n", " print(f\" ✓ Accuracy: {aws_result['result']['accuracy']:.4f}\")\n", " print(f\" ✓ Duration: {aws_result['cost_report']['duration_seconds']:.2f}s\")\n", " print(f\" 💰 Cost: ${aws_result['cost_report']['cost_estimate']['estimated_cost']:.6f}\")\n", "\n", "# GCP Example\n", "print(\"\\n2. GCP GPU Computation:\")\n", "gcp_result = gcp_gpu_computation()\n", "if gcp_result['success']:\n", " print(f\" ✓ Matrices Processed: {gcp_result['result']['matrices_processed']}\")\n", " print(f\" ✓ Duration: {gcp_result['cost_report']['duration_seconds']:.2f}s\")\n", " print(f\" 💰 Cost: ${gcp_result['cost_report']['cost_estimate']['estimated_cost']:.6f}\")\n", "\n", "# Azure Example\n", "print(\"\\n3. Azure Deep Learning:\")\n", "azure_result = azure_deep_learning()\n", "if azure_result['success']:\n", " print(f\" ✓ Final Loss: {azure_result['result']['final_loss']:.4f}\")\n", " print(f\" ✓ Duration: {azure_result['cost_report']['duration_seconds']:.2f}s\")\n", " print(f\" 💰 Cost: ${azure_result['cost_report']['cost_estimate']['estimated_cost']:.6f}\")" ], "id": "cell-8" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced Cost Analysis\n", "\n", "### Regional Pricing Comparison" ], "id": "cell-9" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# AWS Regional Pricing Comparison\n", "aws_monitor = get_cost_monitor('aws')\n", "\n", "print(\"=== AWS Regional Pricing Comparison (t3.large) ===\")\n", "instance_type = 't3.large'\n", "regions = ['us-east-1', 'us-west-2', 'eu-west-1', 'ap-southeast-1', 'sa-east-1']\n", "\n", "regional_prices = []\n", "for region in regions:\n", " pricing = aws_monitor.get_region_pricing(region)\n", " if 
instance_type in pricing:\n", " price = pricing[instance_type]\n", " regional_prices.append((region, price))\n", " print(f\"{region:15} ${price:.4f}/hour\")\n", "\n", "# Find cheapest and most expensive regions\n", "regional_prices.sort(key=lambda x: x[1])\n", "print(f\"\\nCheapest: {regional_prices[0][0]} (${regional_prices[0][1]:.4f}/hour)\")\n", "print(f\"Most Expensive: {regional_prices[-1][0]} (${regional_prices[-1][1]:.4f}/hour)\")\n", "savings = (1 - regional_prices[0][1] / regional_prices[-1][1]) * 100\n", "print(f\"Potential Savings: {savings:.1f}%\")\n", "\n", "# GCP Regional Pricing Comparison\n", "gcp_monitor = get_cost_monitor('gcp')\n", "\n", "print(\"\\n=== GCP Regional Pricing Comparison (n2-standard-4) ===\")\n", "gcp_regional_pricing = gcp_monitor.get_region_pricing_comparison('n2-standard-4')\n", "for region, pricing in list(gcp_regional_pricing.items())[:5]:\n", " print(f\"{region:20} On-Demand: ${pricing['on_demand_hourly']:.4f}/hr, \"\n", " f\"Preemptible: ${pricing['preemptible_hourly']:.4f}/hr\")" ], "id": "cell-10" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Spot/Preemptible Instance Savings" ], "id": "cell-11" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare on-demand vs spot/preemptible pricing\n", "print(\"=== On-Demand vs Spot/Preemptible Pricing Comparison ===\")\n", "\n", "# AWS Spot Instances\n", "print(\"\\nAWS Spot Instances:\")\n", "aws_instances = ['t3.large', 'm5.xlarge', 'c5.2xlarge', 'r5.large']\n", "for instance in aws_instances:\n", " on_demand = aws_monitor.estimate_cost(instance, 1.0)\n", " spot = aws_monitor.estimate_cost(instance, 1.0, use_spot=True)\n", " savings = (1 - spot.hourly_rate / on_demand.hourly_rate) * 100\n", " print(f\"{instance:15} On-Demand: ${on_demand.hourly_rate:.4f}/hr, \"\n", " f\"Spot: ${spot.hourly_rate:.4f}/hr ({savings:.0f}% savings)\")\n", "\n", "# GCP Preemptible VMs\n", "print(\"\\nGCP Preemptible VMs:\")\n", 
"gcp_instances = ['n2-standard-4', 'c2-standard-4', 'n2-highmem-4', 'a2-highgpu-1g']\n", "for instance in gcp_instances:\n", " on_demand = gcp_monitor.estimate_cost(instance, 1.0)\n", " preemptible = gcp_monitor.estimate_cost(instance, 1.0, use_preemptible=True)\n", " savings = (1 - preemptible.hourly_rate / on_demand.hourly_rate) * 100\n", " print(f\"{instance:20} On-Demand: ${on_demand.hourly_rate:.4f}/hr, \"\n", " f\"Preemptible: ${preemptible.hourly_rate:.4f}/hr ({savings:.0f}% savings)\")\n", "\n", "# Azure Spot VMs\n", "azure_monitor = get_cost_monitor('azure')\n", "print(\"\\nAzure Spot VMs:\")\n", "azure_instances = ['Standard_D4s_v3', 'Standard_E4s_v3', 'Standard_F4s_v2']\n", "for instance in azure_instances:\n", " on_demand = azure_monitor.estimate_cost(instance, 1.0)\n", " spot = azure_monitor.estimate_cost(instance, 1.0, use_spot=True)\n", " savings = (1 - spot.hourly_rate / on_demand.hourly_rate) * 100\n", " print(f\"{instance:20} On-Demand: ${on_demand.hourly_rate:.4f}/hr, \"\n", " f\"Spot: ${spot.hourly_rate:.4f}/hr ({savings:.0f}% savings)\")" ], "id": "cell-12" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Batch Job Cost Estimation" ], "id": "cell-13" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Estimate costs for batch processing jobs\n", "def estimate_batch_job_costs(job_config):\n", " \"\"\"Estimate costs for a batch processing job across multiple providers.\"\"\"\n", " results = {}\n", " \n", " # AWS Batch\n", " aws_batch_cost = aws_monitor.estimate_batch_cost(\n", " job_name=job_config['name'],\n", " machine_type=job_config['aws_instance'],\n", " instance_count=job_config['instance_count'],\n", " estimated_duration_hours=job_config['duration_hours']\n", " )\n", " results['aws'] = aws_batch_cost\n", " \n", " # GCP Batch\n", " gcp_batch_cost = gcp_monitor.estimate_batch_cost(\n", " job_name=job_config['name'],\n", " machine_type=job_config['gcp_instance'],\n", " 
instance_count=job_config['instance_count'],\n", " estimated_duration_hours=job_config['duration_hours']\n", " )\n", " results['gcp'] = gcp_batch_cost\n", " \n", " # Azure Batch\n", " azure_batch_cost = azure_monitor.estimate_batch_cost(\n", " job_name=job_config['name'],\n", " machine_type=job_config['azure_instance'],\n", " instance_count=job_config['instance_count'],\n", " estimated_duration_hours=job_config['duration_hours']\n", " )\n", " results['azure'] = azure_batch_cost\n", " \n", " return results\n", "\n", "# Example batch job configuration\n", "batch_job = {\n", " 'name': 'large-scale-data-processing',\n", " 'instance_count': 50,\n", " 'duration_hours': 4.5,\n", " 'aws_instance': 'c5.4xlarge',\n", " 'gcp_instance': 'c2-standard-16',\n", " 'azure_instance': 'Standard_F16s_v2'\n", "}\n", "\n", "print(\"=== Batch Job Cost Estimation ===\")\n", "print(f\"Job: {batch_job['name']}\")\n", "print(f\"Instances: {batch_job['instance_count']}\")\n", "print(f\"Duration: {batch_job['duration_hours']} hours\\n\")\n", "\n", "batch_costs = estimate_batch_job_costs(batch_job)\n", "\n", "for provider, cost_info in batch_costs.items():\n", " print(f\"{provider.upper()}:\")\n", " print(f\" Instance Type: {cost_info['machine_type']}\")\n", " print(f\" Total Compute Hours: {cost_info['total_compute_hours']}\")\n", " print(f\" Estimated Cost: ${cost_info['estimated_cost']:.2f}\")\n", " print(f\" Cost per Instance-Hour: ${cost_info['cost_per_instance_hour']:.4f}\")\n", " print()\n", "\n", "# Find most cost-effective provider\n", "cheapest = min(batch_costs.items(), key=lambda x: x[1]['estimated_cost'])\n", "print(f\"Most cost-effective: {cheapest[0].upper()} (${cheapest[1]['estimated_cost']:.2f})\")" ], "id": "cell-14" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cost Optimization Strategies\n", "\n", "### Sustained Use and Reserved Instance Analysis" ], "id": "cell-15" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"# AWS Reserved Instance Savings\n", "print(\"=== AWS Reserved Instance Savings Analysis ===\")\n", "instance_type = 'm5.xlarge'\n", "monthly_hours = 720 # Full month\n", "\n", "# Calculate costs for different commitment levels\n", "on_demand_monthly = aws_monitor.estimate_cost(instance_type, monthly_hours).total_cost\n", "ri_1yr_no_upfront = on_demand_monthly * 0.62 # ~38% discount\n", "ri_3yr_no_upfront = on_demand_monthly * 0.50 # ~50% discount\n", "ri_3yr_all_upfront = on_demand_monthly * 0.38 # ~62% discount\n", "\n", "print(f\"Instance Type: {instance_type}\")\n", "print(f\"Monthly Usage: {monthly_hours} hours\\n\")\n", "print(f\"On-Demand: ${on_demand_monthly:.2f}/month\")\n", "print(f\"1-Year RI (No Up): ${ri_1yr_no_upfront:.2f}/month (38% savings)\")\n", "print(f\"3-Year RI (No Up): ${ri_3yr_no_upfront:.2f}/month (50% savings)\")\n", "print(f\"3-Year RI (All Up): ${ri_3yr_all_upfront:.2f}/month (62% savings)\")\n", "\n", "# GCP Sustained Use Discounts\n", "print(\"\\n=== GCP Sustained Use Discount Analysis ===\")\n", "usage_levels = [25, 50, 75, 100] # Percentage of month\n", "\n", "for usage_pct in usage_levels:\n", " hours = (usage_pct / 100) * monthly_hours\n", " discount_info = gcp_monitor.estimate_sustained_use_discount(hours)\n", " \n", " base_cost = gcp_monitor.estimate_cost('n2-standard-4', hours).total_cost\n", " discounted_cost = base_cost * (1 - discount_info['discount_percentage'] / 100)\n", " \n", " print(f\"{usage_pct}% usage ({hours:.0f} hours): \"\n", " f\"{discount_info['discount_percentage']:.0f}% discount, \"\n", " f\"${base_cost:.2f} → ${discounted_cost:.2f}\")\n", "\n", "# Azure Reserved Instance Analysis\n", "print(\"\\n=== Azure Reserved Instance Savings ===\")\n", "azure_instance = 'Standard_D4s_v3'\n", "azure_on_demand = azure_monitor.estimate_cost(azure_instance, monthly_hours).total_cost\n", "\n", "print(f\"Instance Type: {azure_instance}\")\n", "print(f\"On-Demand: ${azure_on_demand:.2f}/month\")\n", "print(f\"1-Year Reserved: 
${azure_on_demand * 0.62:.2f}/month (38% savings)\")\n", "print(f\"3-Year Reserved: ${azure_on_demand * 0.42:.2f}/month (58% savings)\")" ], "id": "cell-16" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Workload-Specific Recommendations" ], "id": "cell-17" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_cost_optimization_recommendations(workload_type, requirements):\n", " \"\"\"Get cost optimization recommendations based on workload characteristics.\"\"\"\n", " recommendations = []\n", " \n", " if workload_type == 'batch_processing':\n", " recommendations.extend([\n", " \"Use spot/preemptible instances for up to 80% savings\",\n", " \"Implement checkpointing to handle instance termination\",\n", " \"Consider time-flexible scheduling for lowest spot prices\",\n", " \"Use auto-scaling to optimize resource utilization\"\n", " ])\n", " \n", " elif workload_type == 'ml_training':\n", " recommendations.extend([\n", " \"Use GPU instances only when necessary\",\n", " \"Consider using preemptible GPUs for experimentation\",\n", " \"Implement gradient checkpointing for long training runs\",\n", " \"Use mixed precision training to reduce memory usage\"\n", " ])\n", " \n", " elif workload_type == 'web_service':\n", " recommendations.extend([\n", " \"Use reserved instances for predictable base load\",\n", " \"Implement auto-scaling for variable traffic\",\n", " \"Consider serverless options for sporadic workloads\",\n", " \"Use CDN to reduce compute requirements\"\n", " ])\n", " \n", " elif workload_type == 'data_processing':\n", " recommendations.extend([\n", " \"Use memory-optimized instances for in-memory processing\",\n", " \"Consider data locality to reduce transfer costs\",\n", " \"Implement data compression to reduce storage costs\",\n", " \"Use lifecycle policies to archive old data\"\n", " ])\n", " \n", " # Add requirement-specific recommendations\n", " if requirements.get('fault_tolerant', 
False):\n", " recommendations.append(\"Leverage spot/preemptible instances aggressively\")\n", " \n", " if requirements.get('gpu_required', False):\n", " recommendations.append(\"Compare GPU instance prices across regions and providers\")\n", " \n", " if requirements.get('long_running', False):\n", " recommendations.append(\"Use reserved instances or committed use discounts\")\n", " \n", " return recommendations\n", "\n", "# Example workload analysis\n", "print(\"=== Workload-Specific Cost Optimization Recommendations ===\")\n", "\n", "workloads = [\n", " {\n", " 'type': 'batch_processing',\n", " 'name': 'Nightly Data Pipeline',\n", " 'requirements': {'fault_tolerant': True, 'gpu_required': False}\n", " },\n", " {\n", " 'type': 'ml_training',\n", " 'name': 'Deep Learning Model Training',\n", " 'requirements': {'gpu_required': True, 'long_running': True}\n", " },\n", " {\n", " 'type': 'web_service',\n", " 'name': 'API Backend Service',\n", " 'requirements': {'fault_tolerant': False, 'long_running': True}\n", " }\n", "]\n", "\n", "for workload in workloads:\n", " print(f\"\\n{workload['name']} ({workload['type']}):\")\n", " recommendations = get_cost_optimization_recommendations(\n", " workload['type'], \n", " workload['requirements']\n", " )\n", " for i, rec in enumerate(recommendations, 1):\n", " print(f\" {i}. 
{rec}\")" ], "id": "cell-18" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing Cost Data\n", "\n", "### Cost Comparison Charts" ], "id": "cell-19" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create cost comparison visualizations\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "# Prepare data for visualization\n", "providers = ['AWS', 'GCP', 'Azure']\n", "instance_types = {\n", " 'AWS': ['t3.medium', 't3.large', 't3.xlarge', 'm5.large', 'm5.xlarge'],\n", " 'GCP': ['n2-standard-2', 'n2-standard-4', 'n2-standard-8', 'n2-standard-16', 'n2-standard-32'],\n", " 'Azure': ['Standard_D2s_v3', 'Standard_D4s_v3', 'Standard_D8s_v3', 'Standard_D16s_v3', 'Standard_D32s_v3']\n", "}\n", "\n", "# Collect pricing data\n", "pricing_data = {}\n", "for provider in providers:\n", " monitor = get_cost_monitor(provider.lower())\n", " prices = []\n", " for instance in instance_types[provider]:\n", " cost_estimate = monitor.estimate_cost(instance, 1.0)\n", " prices.append(cost_estimate.hourly_rate)\n", " pricing_data[provider] = prices\n", "\n", "# Create comparison chart\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))\n", "\n", "# Bar chart comparison\n", "x = np.arange(len(instance_types['AWS']))\n", "width = 0.25\n", "\n", "for i, provider in enumerate(providers):\n", " ax1.bar(x + i*width, pricing_data[provider], width, label=provider)\n", "\n", "ax1.set_xlabel('Instance Size')\n", "ax1.set_ylabel('Cost per Hour ($)')\n", "ax1.set_title('Cloud Provider Cost Comparison by Instance Size')\n", "ax1.set_xticks(x + width)\n", "ax1.set_xticklabels(['Small', 'Medium', 'Large', 'XLarge', '2XLarge'])\n", "ax1.legend()\n", "ax1.grid(True, alpha=0.3)\n", "\n", "# Spot vs On-Demand savings visualization\n", "spot_savings = {\n", " 'AWS': [65, 70, 72, 68, 71],\n", " 'GCP': [60, 65, 68, 70, 72],\n", " 'Azure': [58, 62, 65, 67, 70]\n", "}\n", "\n", "for i, provider in 
enumerate(providers):\n", " # Plot against the shared numeric positions so all providers align on one axis\n", " ax2.plot(x, spot_savings[provider], \n", " marker='o', linewidth=2, markersize=8, label=provider)\n", "\n", "ax2.set_xlabel('Instance Size')\n", "ax2.set_ylabel('Spot/Preemptible Savings (%)')\n", "ax2.set_title('Spot Instance Savings by Provider')\n", "ax2.legend()\n", "ax2.grid(True, alpha=0.3)\n", "ax2.set_xticks(x)\n", "ax2.set_xticklabels(['Small', 'Medium', 'Large', 'XLarge', '2XLarge'])\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "# Monthly cost projection\n", "fig, ax = plt.subplots(figsize=(10, 6))\n", "\n", "hours_per_day = np.arange(1, 25)\n", "days_per_month = 30\n", "\n", "for provider in providers:\n", " monitor = get_cost_monitor(provider.lower())\n", " instance = instance_types[provider][2] # Large instance\n", " \n", " monthly_costs = []\n", " for hours in hours_per_day:\n", " total_hours = hours * days_per_month\n", " cost = monitor.estimate_cost(instance, total_hours).total_cost\n", " monthly_costs.append(cost)\n", " \n", " ax.plot(hours_per_day, monthly_costs, marker='o', label=f'{provider} ({instance})')\n", "\n", "ax.set_xlabel('Hours per Day')\n", "ax.set_ylabel('Monthly Cost ($)')\n", "ax.set_title('Monthly Cost Projection by Daily Usage')\n", "ax.legend()\n", "ax.grid(True, alpha=0.3)\n", "\n", "# Add cost threshold lines\n", "budget_levels = [100, 500, 1000, 2000]\n", "for budget in budget_levels:\n", " ax.axhline(y=budget, color='red', linestyle='--', alpha=0.5)\n", " ax.text(24.5, budget, f'${budget}', va='center')\n", "\n", "plt.tight_layout()\n", "plt.show()" ], "id": "cell-20" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Best Practices for Cost Optimization\n", "\n", "### 1. Choose the Right Instance Type\n", "- Match instance specifications to workload requirements\n", "- Avoid over-provisioning resources\n", "- Use burstable instances for variable workloads\n", "\n", "### 2. 
Leverage Spot/Preemptible Instances\n", "- Use for fault-tolerant batch processing\n", "- Implement checkpointing for long-running jobs\n", "- Mix on-demand and spot for reliability\n", "\n", "### 3. Optimize for Your Usage Pattern\n", "- Reserved instances for steady-state workloads\n", "- Auto-scaling for variable demand\n", "- Scheduled scaling for predictable patterns\n", "\n", "### 4. Monitor and Alert\n", "- Set up budget alerts\n", "- Use Clustrix cost tracking decorators\n", "- Regular cost reviews and optimization\n", "\n", "### 5. Multi-Cloud Strategy\n", "- Compare prices across providers\n", "- Use each cloud's strengths\n", "- Avoid vendor lock-in" ], "id": "cell-21" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Real-World Example: Cost-Optimized ML Pipeline" ], "id": "cell-22" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Complete cost-optimized ML pipeline example\n", "class CostOptimizedMLPipeline:\n", " \"\"\"Example of a cost-aware ML pipeline using Clustrix.\"\"\"\n", " \n", " def __init__(self, budget_limit=100.0):\n", " self.budget_limit = budget_limit\n", " self.total_cost = 0.0\n", " self.cost_history = []\n", " \n", " @cost_tracking_decorator('aws', 't3.medium')\n", " def preprocess_data(self, data_size_gb):\n", " \"\"\"Preprocess data on cost-effective instances.\"\"\"\n", " import time\n", " processing_time = data_size_gb * 0.5 # Simulate processing\n", " time.sleep(min(processing_time, 2)) # Cap at 2 seconds for demo\n", " return {'processed_records': data_size_gb * 1000000}\n", " \n", " @cost_tracking_decorator('aws', 'p3.2xlarge')\n", " def train_model(self, model_type='small'):\n", " \"\"\"Train model on GPU instances.\"\"\"\n", " import time\n", " training_times = {'small': 1, 'medium': 2, 'large': 3}\n", " time.sleep(training_times.get(model_type, 1))\n", " return {'model_accuracy': 0.85 + np.random.random() * 0.1}\n", " \n", " @cost_tracking_decorator('aws', 
't3.small')\n", " def evaluate_model(self, test_size):\n", " \"\"\"Evaluate model on small instances.\"\"\"\n", " import time\n", " time.sleep(0.5)\n", " return {'test_accuracy': 0.82 + np.random.random() * 0.1}\n", " \n", " def run_pipeline(self, data_size_gb=10, model_type='small'):\n", " \"\"\"Run complete pipeline with cost tracking.\"\"\"\n", " print(f\"Starting ML Pipeline (Budget: ${self.budget_limit})\")\n", " results = {}\n", " \n", " # Step 1: Preprocess data\n", " print(\"\\n1. Preprocessing data...\")\n", " preprocess_result = self.preprocess_data(data_size_gb)\n", " if preprocess_result['success']:\n", " cost = preprocess_result['cost_report']['cost_estimate']['estimated_cost']\n", " self.total_cost += cost\n", " self.cost_history.append(('preprocessing', cost))\n", " print(f\" ✓ Processed {preprocess_result['result']['processed_records']:,} records\")\n", " print(f\" 💰 Cost: ${cost:.4f} (Total: ${self.total_cost:.4f})\")\n", " \n", " # Check budget\n", " if self.total_cost > self.budget_limit:\n", " print(f\"\\n❌ Budget exceeded! Stopping pipeline.\")\n", " return results\n", " \n", " # Step 2: Train model\n", " print(\"\\n2. Training model...\")\n", " train_result = self.train_model(model_type)\n", " if train_result['success']:\n", " cost = train_result['cost_report']['cost_estimate']['estimated_cost']\n", " self.total_cost += cost\n", " self.cost_history.append(('training', cost))\n", " print(f\" ✓ Model accuracy: {train_result['result']['model_accuracy']:.4f}\")\n", " print(f\" 💰 Cost: ${cost:.4f} (Total: ${self.total_cost:.4f})\")\n", " \n", " # Check budget\n", " if self.total_cost > self.budget_limit:\n", " print(f\"\\n❌ Budget exceeded! Stopping pipeline.\")\n", " return results\n", " \n", " # Step 3: Evaluate model\n", " print(\"\\n3. 
Evaluating model...\")\n", " eval_result = self.evaluate_model(1000)\n", " if eval_result['success']:\n", " cost = eval_result['cost_report']['cost_estimate']['estimated_cost']\n", " self.total_cost += cost\n", " self.cost_history.append(('evaluation', cost))\n", " print(f\" ✓ Test accuracy: {eval_result['result']['test_accuracy']:.4f}\")\n", " print(f\" 💰 Cost: ${cost:.4f} (Total: ${self.total_cost:.4f})\")\n", " \n", " # Summary\n", " print(\"\\n=== Pipeline Summary ===\")\n", " print(f\"Total Cost: ${self.total_cost:.4f}\")\n", " print(f\"Budget Remaining: ${self.budget_limit - self.total_cost:.4f}\")\n", " print(\"\\nCost Breakdown:\")\n", " for step, cost in self.cost_history:\n", " pct = (cost / self.total_cost) * 100\n", " print(f\" {step:15} ${cost:.4f} ({pct:.1f}%)\")\n", " \n", " return {\n", " 'total_cost': self.total_cost,\n", " 'cost_history': self.cost_history,\n", " 'under_budget': self.total_cost <= self.budget_limit\n", " }\n", "\n", "# Run the cost-optimized pipeline\n", "pipeline = CostOptimizedMLPipeline(budget_limit=0.10) # $0.10 budget for demo\n", "results = pipeline.run_pipeline(data_size_gb=5, model_type='small')\n", "\n", "print(\"\\n✅ Pipeline completed successfully!\" if results.get('under_budget', False) \n", " else \"\\n⚠️ Pipeline stopped due to budget constraints.\")" ], "id": "cell-23" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "This tutorial covered comprehensive cost monitoring and optimization with Clustrix:\n", "\n", "### Key Features Demonstrated\n", "\n", "1. **Automatic Cost Tracking**: Use `@cost_tracking_decorator` for seamless monitoring\n", "2. **Manual Cost Monitoring**: Fine-grained control with manual monitoring functions\n", "3. **Multi-Cloud Support**: Compare costs across AWS, GCP, Azure, and more\n", "4. **Regional Pricing**: Find the most cost-effective regions\n", "5. **Spot/Preemptible Savings**: Up to 80% cost reduction\n", "6. 
**Batch Job Estimation**: Plan and budget for large-scale processing\n", "7. **Optimization Recommendations**: Workload-specific cost-saving strategies\n", "\n", "### Best Practices\n", "\n", "- Always use cost tracking decorators for production workloads\n", "- Compare prices across providers and regions\n", "- Leverage spot/preemptible instances for fault-tolerant workloads\n", "- Use reserved instances for predictable, long-running workloads\n", "- Monitor costs continuously and set up budget alerts\n", "- Implement auto-scaling to match resources to demand\n", "\n", "### Next Steps\n", "\n", "1. Integrate cost monitoring into your existing workflows\n", "2. Set up budget alerts and cost anomaly detection\n", "3. Experiment with different instance types and pricing models\n", "4. Implement cost optimization recommendations\n", "5. Create cost dashboards for stakeholder visibility\n", "\n", "### Resources\n", "\n", "- [Clustrix Documentation](https://clustrix.readthedocs.io/)\n", "- [AWS Pricing](https://aws.amazon.com/pricing/)\n", "- [GCP Pricing](https://cloud.google.com/pricing)\n", "- [Azure Pricing](https://azure.microsoft.com/pricing/)\n", "\n", "Remember: **Every dollar saved on cloud costs is a dollar that can be invested in innovation!**" ], "id": "cell-24" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 4 }