{ "cells": [ { "cell_type": "markdown", "id": "aws-title", "metadata": {}, "source": "# Amazon Web Services (AWS) Cloud Tutorial\n\nThis tutorial demonstrates how to use Clustrix with Amazon Web Services (AWS) cloud infrastructure for scalable distributed computing.\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextLab/clustrix/blob/master/docs/source/notebooks/aws_cloud_tutorial.ipynb)\n\n## Overview\n\nAWS provides several services that work well with Clustrix:\n\n- **EC2**: Virtual machines for compute clusters\n- **AWS Batch**: Managed job scheduling service\n- **ECS**: Container orchestration\n- **ParallelCluster**: HPC cluster management\n- **S3**: Object storage for data and results\n- **VPC**: Network isolation and security\n\n## Prerequisites\n\nBefore starting this tutorial, ensure you have:\n\n1. **AWS Account**: Active AWS account with billing enabled\n2. **AWS CLI**: Installed and configured on your local machine\n3. **SSH Key Pair**: Generated and uploaded to AWS EC2 for secure access\n4. **IAM Permissions**: Appropriate permissions for EC2, S3, and other services\n5. **Basic AWS Knowledge**: Understanding of AWS services, regions, and availability zones\n6. **Python Environment**: Python 3.7+ with pip installed\n\n## Complete AWS Setup Guide\n\n### Step 1: Create AWS Account\n1. Go to [aws.amazon.com](https://aws.amazon.com) and create an account\n2. Verify your email and provide payment information\n3. Choose the Basic Support plan (free)\n\n### Step 2: Install AWS CLI\n```bash\n# On macOS\nbrew install awscli\n\n# On Linux/WSL\ncurl \"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip\" -o \"awscliv2.zip\"\nunzip awscliv2.zip\nsudo ./aws/install\n\n# On Windows\n# Download and run the AWS CLI MSI installer from AWS documentation\n```\n\n### Step 3: Create IAM User and Access Keys\n1. Go to AWS Console \u2192 IAM \u2192 Users \u2192 Create User\n2. Create a user with programmatic access\n3. Attach policies: `AmazonEC2FullAccess`, `AmazonS3FullAccess`, `IAMReadOnlyAccess`\n4. Save the Access Key ID and Secret Access Key securely\n\n### Step 4: Generate SSH Key Pair\n```bash\n# Generate SSH key pair locally\nssh-keygen -t rsa -b 4096 -f ~/.ssh/aws-clustrix-key\n\n# Import public key to AWS\naws ec2 import-key-pair --key-name \"clustrix-key\" --public-key-material fileb://~/.ssh/aws-clustrix-key.pub\n```", "outputs": [] }, { "cell_type": "markdown", "id": "installation", "metadata": {}, "source": [ "## Installation and Setup\n", "\n", "Install Clustrix with AWS dependencies:" ] }, { "cell_type": "code", "execution_count": null, "id": "install", "metadata": {}, "outputs": [], "source": [ "# Install Clustrix with AWS support\n", "!pip install clustrix boto3 awscli\n", "\n", "# Import required libraries\n", "import clustrix\n", "from clustrix import cluster, configure\n", "import boto3\n", "import os\n", "import numpy as np\n", "import time" ] }, { "cell_type": "markdown", "id": "aws-credentials", "metadata": {}, "source": "## AWS Credentials Configuration\n\nConfigure your AWS credentials using one of the following methods:\n\n### Option 1: AWS CLI Configuration (Recommended)\n\nRun the following command in your terminal to configure credentials interactively:\n\n```bash\naws configure\n```\n\nYou'll be prompted to enter:\n- AWS Access Key ID\n- AWS Secret Access Key \n- Default region name (e.g., us-east-1)\n- Default output format (json)\n\nThis creates credential files at `~/.aws/credentials` and `~/.aws/config`.", "outputs": [] }, { "cell_type": "code", "execution_count": null, "id": "aws-config", "metadata": {}, "outputs": [], "source": [ "# Configure AWS CLI (run this in terminal)\n", "# aws configure\n", "\n", "# Verify configuration\n", "!aws sts get-caller-identity" ] }, { "cell_type": "markdown", "id": "aws-creds-env", "metadata": {}, "source": [ "### Option 2: Environment Variables" ] }, { "cell_type": "code", "id": "env-vars", "metadata": {}, "outputs": [], "source": "# Option 2: Set AWS credentials as environment variables (if needed)\n# os.environ['AWS_ACCESS_KEY_ID'] = 'your-access-key'\n# os.environ['AWS_SECRET_ACCESS_KEY'] = 'your-secret-key'\n# os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n\n# Test AWS connection\ntry:\n ec2 = boto3.client('ec2')\n regions = ec2.describe_regions()\n print(f\"\u2713 Successfully connected to AWS. Available regions: {len(regions['Regions'])}\")\nexcept Exception as e:\n print(f\"\u2717 AWS connection failed: {e}\")", "execution_count": null }, { "cell_type": "markdown", "id": "ec2-setup", "metadata": {}, "source": "## Method 1: Direct EC2 Instance Configuration\n\n### Prerequisites: Create Security Group\n\nBefore launching an EC2 instance, you need to create a security group that allows SSH access. You can do this through the AWS Console or use the function provided in the Security section below.\n\n**Quick Setup via AWS Console:**\n1. Go to EC2 \u2192 Security Groups \u2192 Create Security Group\n2. Name: `clustrix-sg`\n3. Add inbound rule: SSH (port 22) from your IP address only\n4. Note the Security Group ID (sg-xxxxxxxxx)\n\n### Launch EC2 Instance for Clustrix\n\nThis example shows how to programmatically launch an EC2 instance suitable for Clustrix:", "outputs": [] }, { "cell_type": "code", "id": "ec2-launch", "metadata": {}, "outputs": [], "source": "def launch_clustrix_ec2_instance(key_name, security_group_id, instance_type='t3.large'):\n \"\"\"\n Launch an EC2 instance configured for Clustrix.\n \n Args:\n key_name: Name of your EC2 key pair\n security_group_id: Security group ID that allows SSH access\n instance_type: EC2 instance type\n \n Returns:\n Instance ID and public IP\n \"\"\"\n ec2 = boto3.client('ec2')\n \n # User data script to setup Python environment\n user_data = '''\n#!/bin/bash\nyum update -y\nyum install -y python3 python3-pip git\npip3 install clustrix numpy scipy pandas\n\n# Install uv for faster package management\ncurl -LsSf https://astral.sh/uv/install.sh | sh\nsource $HOME/.cargo/env\n\n# Create clustrix user\nuseradd -m -s /bin/bash clustrix\nmkdir -p /home/clustrix/.ssh\ncp /home/ec2-user/.ssh/authorized_keys /home/clustrix/.ssh/\nchown -R clustrix:clustrix /home/clustrix/.ssh\nchmod 700 /home/clustrix/.ssh\nchmod 600 /home/clustrix/.ssh/authorized_keys\n\n# Setup sudo access\necho \"clustrix ALL=(ALL) NOPASSWD:ALL\" >> /etc/sudoers\n'''\n \n try:\n response = ec2.run_instances(\n ImageId='ami-0c02fb55956c7d316', # Amazon Linux 2 AMI\n MinCount=1,\n MaxCount=1,\n InstanceType=instance_type,\n KeyName=key_name,\n SecurityGroupIds=[security_group_id],\n UserData=user_data,\n TagSpecifications=[\n {\n 'ResourceType': 'instance',\n 'Tags': [\n {'Key': 'Name', 'Value': 'Clustrix-Compute-Node'},\n {'Key': 'Purpose', 'Value': 'Clustrix-Tutorial'}\n ]\n }\n ]\n )\n \n instance_id = response['Instances'][0]['InstanceId']\n \n # Wait for instance to be running\n waiter = ec2.get_waiter('instance_running')\n waiter.wait(InstanceIds=[instance_id])\n \n # Get public IP\n instance_info = ec2.describe_instances(InstanceIds=[instance_id])\n public_ip = instance_info['Reservations'][0]['Instances'][0].get('PublicIpAddress')\n \n return instance_id, public_ip\n \n except Exception as e:\n print(f\"Error launching instance: {e}\")\n return None, None\n\n# Example usage (uncomment and modify with your details)\n# instance_id, public_ip = launch_clustrix_ec2_instance(\n# key_name='clustrix-key',\n# security_group_id='sg-xxxxxxxxx'\n# )\n# \n# if instance_id and public_ip:\n# print(f\"\u2713 Instance launched: {instance_id}\")\n# print(f\"\u2713 Public IP: {public_ip}\")\n# print(\"\u23f3 Wait 2-3 minutes for user data script to complete before connecting.\")\n# else:\n# print(\"\u2717 Failed to launch instance\")", "execution_count": null }, { "cell_type": "markdown", "id": "clustrix-config", "metadata": {}, "source": [ "### Configure Clustrix for EC2" ] }, { "cell_type": "code", "id": "config-ec2", "metadata": {}, "outputs": [], "source": "# Configure Clustrix to use your EC2 instance\nconfigure(\n cluster_type=\"ssh\",\n cluster_host=\"your-ec2-public-ip\", # Replace with actual IP\n username=\"clustrix\", # or \"ec2-user\" if using default user\n key_file=\"~/.ssh/your-key.pem\", # Path to your private key\n remote_work_dir=\"/tmp/clustrix\",\n package_manager=\"auto\", # Will use uv if available, fallback to pip\n default_cores=4,\n default_memory=\"8GB\",\n default_time=\"01:00:00\"\n)", "execution_count": null }, { "cell_type": "markdown", "id": "fms2rlxukv8", "source": "**Configuration Complete!** \n\nYour Clustrix is now configured to use the EC2 instance. Make sure to replace `your-ec2-public-ip` with the actual IP address of your running EC2 instance.", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "id": "example-computation", "metadata": {}, "source": [ "### Example: Remote Computation on EC2" ] }, { "cell_type": "code", "id": "ec2-example", "metadata": {}, "outputs": [], "source": "@cluster(cores=2, memory=\"4GB\")\ndef aws_monte_carlo_pi(n_samples=1000000):\n \"\"\"Estimate \u03c0 using Monte Carlo method on AWS EC2.\"\"\"\n import numpy as np\n \n # Generate random points\n x = np.random.uniform(-1, 1, n_samples)\n y = np.random.uniform(-1, 1, n_samples)\n \n # Count points inside unit circle\n inside_circle = (x**2 + y**2) <= 1\n pi_estimate = 4 * np.sum(inside_circle) / n_samples\n \n return {\n 'pi_estimate': pi_estimate,\n 'n_samples': n_samples,\n 'error': abs(pi_estimate - np.pi)\n }\n\n# Example usage (uncomment to run on your EC2 instance):\n# result = aws_monte_carlo_pi(n_samples=5000000)\n# print(f\"\u03c0 estimate: {result['pi_estimate']:.6f}\")\n# print(f\"Error: {result['error']:.6f}\")\n# print(f\"Samples used: {result['n_samples']:,}\")", "execution_count": null }, { "cell_type": "markdown", "id": "s0rz170o8cq", "source": "**Ready to Run!** \n\nThe Monte Carlo \u03c0 estimation function is now defined and ready to execute on your EC2 instance. Simply uncomment the example usage lines above to run the computation remotely on AWS.", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "id": "aws-batch", "metadata": {}, "source": [ "## Method 2: AWS Batch Configuration\n", "\n", "AWS Batch provides managed job scheduling for more complex workloads:" ] }, { "cell_type": "code", "id": "batch-setup", "metadata": {}, "outputs": [], "source": "def create_aws_batch_environment():\n \"\"\"\n Example of setting up AWS Batch compute environment.\n This is a template - you'll need to adapt it to your specific needs.\n \"\"\"\n batch = boto3.client('batch')\n ec2 = boto3.client('ec2')\n iam = boto3.client('iam')\n \n # This is a simplified example - real setup requires:\n # 1. VPC and subnet configuration\n # 2. IAM roles and policies\n # 3. Security groups\n # 4. Compute environment\n # 5. Job queue\n # 6. Job definition\n \n return {\n 'compute_environment': 'clustrix-batch-env',\n 'job_queue': 'clustrix-queue',\n 'job_definition': 'clustrix-job-def'\n }\n\n# batch_config = create_aws_batch_environment()", "execution_count": null }, { "cell_type": "markdown", "id": "e7wlnigrkda", "source": "**Note on AWS Batch Complexity**\n\nAWS Batch setup is complex and requires careful configuration of networking, IAM, and compute resources. For easier HPC setups, consider using AWS ParallelCluster or EKS instead. The function above provides a template structure for those who want to implement full Batch integration.", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "id": "parallel-cluster", "metadata": {}, "source": [ "## Method 3: AWS ParallelCluster Integration\n", "\n", "AWS ParallelCluster is designed for HPC workloads and integrates well with Clustrix:" ] }, { "cell_type": "code", "id": "parallelcluster", "metadata": {}, "outputs": [], "source": "# Configure Clustrix for ParallelCluster\ndef configure_for_parallelcluster(cluster_name, master_ip):\n \"\"\"Configure Clustrix to use AWS ParallelCluster.\"\"\"\n configure(\n cluster_type=\"slurm\",\n cluster_host=master_ip,\n username=\"ec2-user\",\n key_file=\"~/.ssh/aws-clustrix-key\",\n remote_work_dir=\"/shared/clustrix\", # Use shared storage\n package_manager=\"uv\",\n module_loads=[\"python3\"], # Load required modules\n default_cores=4,\n default_memory=\"8GB\",\n default_time=\"01:00:00\",\n default_partition=\"compute\"\n )\n return f\"Configured Clustrix for ParallelCluster: {cluster_name}\"\n\n# Example usage:\n# result = configure_for_parallelcluster(\"my-cluster\", \"10.0.0.100\")\n# print(result)", "execution_count": null }, { "cell_type": "markdown", "id": "7ipi0is97ue", "source": "### ParallelCluster Configuration Example\n\nHere's a sample ParallelCluster configuration file for use with Clustrix:\n\n```ini\n# Save as ~/.parallelcluster/config\n[aws]\naws_region_name = us-east-1\n\n[global]\ncluster_template = clustrix-template\nupdate_check = false\nsanity_check = true\n\n[cluster clustrix-template]\nkey_name = your-key-name\nvpc_settings = vpc-settings\ncompute_instance_type = c5.xlarge\nmaster_instance_type = t3.medium\ninitial_queue_size = 0\nmax_queue_size = 10\nscheduler = slurm\nplacement_group = DYNAMIC\nplacement = compute\ndisable_hyperthreading = false\npost_install = https://raw.githubusercontent.com/your-repo/clustrix-setup.sh\n\n[vpc vpc-settings]\nvpc_id = vpc-xxxxxxxxx\nmaster_subnet_id = subnet-xxxxxxxxx\ncompute_subnet_id = subnet-xxxxxxxxx\n```", "metadata": {} }, { "cell_type": "markdown", "id": "storage-s3", "metadata": {}, "source": [ "## Data Management with S3\n", "\n", "Integrate S3 for data input/output:" ] }, { "cell_type": "code", "id": "s3-integration", "metadata": {}, "outputs": [], "source": "@cluster(cores=2, memory=\"4GB\")\ndef process_s3_data(bucket_name, input_key, output_key):\n \"\"\"Process data from S3 and save results back to S3.\"\"\"\n import boto3\n import numpy as np\n import pickle\n import io\n \n s3 = boto3.client('s3')\n \n # Download data from S3\n response = s3.get_object(Bucket=bucket_name, Key=input_key)\n data = pickle.loads(response['Body'].read())\n \n # Process the data\n processed_data = {\n 'original_shape': data.shape if hasattr(data, 'shape') else len(data),\n 'mean': np.mean(data) if hasattr(data, '__iter__') else data,\n 'std': np.std(data) if hasattr(data, '__iter__') else 0,\n 'processing_timestamp': time.time()\n }\n \n # Upload results to S3\n output_buffer = io.BytesIO()\n pickle.dump(processed_data, output_buffer)\n output_buffer.seek(0)\n \n s3.put_object(\n Bucket=bucket_name,\n Key=output_key,\n Body=output_buffer.getvalue()\n )\n \n return f\"Processed data saved to s3://{bucket_name}/{output_key}\"\n\n# Example S3 utility functions\ndef upload_to_s3(data, bucket_name, key):\n \"\"\"Upload data to S3.\"\"\"\n s3 = boto3.client('s3')\n buffer = io.BytesIO()\n pickle.dump(data, buffer)\n buffer.seek(0)\n s3.put_object(Bucket=bucket_name, Key=key, Body=buffer.getvalue())\n print(f\"\u2713 Data uploaded to s3://{bucket_name}/{key}\")\n\ndef download_from_s3(bucket_name, key):\n \"\"\"Download data from S3.\"\"\"\n s3 = boto3.client('s3')\n response = s3.get_object(Bucket=bucket_name, Key=key)\n data = pickle.loads(response['Body'].read())\n print(f\"\u2713 Data downloaded from s3://{bucket_name}/{key}\")\n return data\n\n# Example usage:\n# sample_data = np.random.rand(1000, 100)\n# upload_to_s3(sample_data, 'your-bucket', 'input/sample_data.pkl')\n# result = process_s3_data('your-bucket', 'input/sample_data.pkl', 'output/results.pkl')\n# print(result)", "execution_count": null }, { "cell_type": "markdown", "id": "security", "metadata": {}, "source": [ "## Security Best Practices\n", "\n", "### Security Group Configuration" ] }, { "cell_type": "code", "id": "security-group", "metadata": {}, "outputs": [], "source": "def create_clustrix_security_group(vpc_id, your_ip):\n \"\"\"\n Create a security group for Clustrix with minimal required access.\n \n Args:\n vpc_id: VPC ID where to create the security group\n your_ip: Your public IP address (get from https://checkip.amazonaws.com)\n \n Returns:\n Security group ID\n \"\"\"\n ec2 = boto3.client('ec2')\n \n try:\n response = ec2.create_security_group(\n GroupName='clustrix-sg',\n Description='Security group for Clustrix compute nodes',\n VpcId=vpc_id\n )\n \n sg_id = response['GroupId']\n \n # Add SSH access from your IP only\n ec2.authorize_security_group_ingress(\n GroupId=sg_id,\n IpPermissions=[\n {\n 'IpProtocol': 'tcp',\n 'FromPort': 22,\n 'ToPort': 22,\n 'IpRanges': [{'CidrIp': f'{your_ip}/32', 'Description': 'SSH access'}]\n }\n ]\n )\n \n print(f\"\u2713 Created security group: {sg_id}\")\n return sg_id\n \n except Exception as e:\n print(f\"\u2717 Error creating security group: {e}\")\n return None\n\n# Helper function to get your public IP\ndef get_my_public_ip():\n \"\"\"Get your current public IP address.\"\"\"\n import requests\n try:\n response = requests.get('https://checkip.amazonaws.com')\n return response.text.strip()\n except:\n print(\"Could not determine public IP. Please check manually at https://checkip.amazonaws.com\")\n return None\n\n# Example usage:\n# my_ip = get_my_public_ip()\n# if my_ip:\n# print(f\"Your public IP: {my_ip}\")\n# # sg_id = create_clustrix_security_group('vpc-xxxxxxxxx', my_ip)", "execution_count": null }, { "cell_type": "markdown", "id": "or3qhdz81af", "source": "### AWS Security Checklist for Clustrix\n\n\u2713 **Authentication & Access**\n- Use IAM roles instead of access keys when possible\n- Restrict security groups to your IP address only\n- Regularly rotate SSH keys and access credentials\n\n\u2713 **Network Security**\n- Use private subnets for compute nodes when possible\n- Enable VPC Flow Logs for network monitoring\n- Use AWS Systems Manager Session Manager instead of direct SSH when possible\n\n\u2713 **Data Protection**\n- Use encrypted EBS volumes and S3 buckets\n- Enable CloudTrail for API logging\n\n\u2713 **Monitoring & Management**\n- Set up billing alerts to monitor costs\n- Tag all resources for cost tracking and management", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "id": "cost-optimization", "metadata": {}, "source": [ "## Cost Optimization" ] }, { "cell_type": "code", "id": "cost-tips", "metadata": {}, "outputs": [], "source": "# Import Clustrix cost monitoring for AWS\nfrom clustrix import cost_tracking_decorator, get_cost_monitor, generate_cost_report, get_pricing_info\n\n# Example 1: Cost tracking with AWS instances\n@cost_tracking_decorator('aws', 'p3.2xlarge')\n@cluster(cores=8, memory=\"60GB\")\ndef aws_training_with_cost_tracking():\n \"\"\"Example training function with AWS cost tracking.\"\"\"\n import time\n import numpy as np\n \n print(\"Starting AWS training with cost monitoring...\")\n time.sleep(3) # Simulate training\n \n # Simulate GPU workload\n data = np.random.randn(2000, 2000)\n result = np.linalg.svd(data)\n \n print(\"Training completed!\")\n return {'accuracy': 0.92, 'epochs': 50}\n\n# Example 2: Compare AWS pricing\ndef compare_aws_pricing():\n \"\"\"Compare AWS EC2 pricing for different instance types.\"\"\"\n pricing = get_pricing_info('aws')\n if pricing:\n print(\"AWS EC2 On-Demand Pricing (USD/hour):\")\n \n # Group by category\n gpu_instances = {k: v for k, v in pricing.items() if k.startswith(('p3', 'p4d', 'g4dn'))}\n compute_instances = {k: v for k, v in pricing.items() if k.startswith('c5')}\n memory_instances = {k: v for k, v in pricing.items() if k.startswith('r5')}\n \n print(\"\\nGPU Instances:\")\n for instance, price in sorted(gpu_instances.items(), key=lambda x: x[1]):\n print(f\" {instance:<20}: ${price:.3f}/hour\")\n \n print(\"\\nCompute Optimized:\")\n for instance, price in sorted(compute_instances.items(), key=lambda x: x[1]):\n print(f\" {instance:<20}: ${price:.3f}/hour\")\n \n print(\"\\nMemory Optimized:\")\n for instance, price in sorted(memory_instances.items(), key=lambda x: x[1]):\n print(f\" {instance:<20}: ${price:.3f}/hour\")\n\n# Example 3: AWS Spot vs On-Demand cost analysis\ndef aws_spot_cost_analysis():\n \"\"\"Analyze potential savings with AWS Spot instances.\"\"\"\n monitor = get_cost_monitor('aws')\n if monitor:\n print(\"AWS Spot Instance Savings Analysis:\")\n print(\"-\" * 40)\n \n instance_types = ['p3.2xlarge', 'p3.8xlarge', 'g4dn.xlarge', 'c5.large']\n \n for instance in instance_types:\n on_demand = monitor.estimate_cost(instance, 1.0, use_spot=False)\n spot = monitor.estimate_cost(instance, 1.0, use_spot=True)\n savings = ((on_demand.hourly_rate - spot.hourly_rate) / on_demand.hourly_rate) * 100\n \n print(f\"{instance}:\")\n print(f\" On-Demand: ${on_demand.hourly_rate:.3f}/hour\")\n print(f\" Spot: ${spot.hourly_rate:.3f}/hour\")\n print(f\" Savings: {savings:.1f}%\")\n print()\n\n# Example 4: AWS Batch cost estimation\ndef estimate_aws_batch_costs():\n \"\"\"Estimate costs for AWS Batch workloads.\"\"\"\n monitor = get_cost_monitor('aws')\n if monitor:\n batch_estimate = monitor.estimate_batch_cost(\n job_queue=\"clustrix-batch-queue\",\n compute_environment=\"clustrix-compute-env\",\n estimated_jobs=100,\n avg_job_duration_hours=0.25\n )\n \n print(\"AWS Batch Cost Estimation:\")\n print(f\" Job Queue: {batch_estimate['job_queue']}\")\n print(f\" Total Jobs: {batch_estimate['estimated_jobs']}\")\n print(f\" Avg Duration: {batch_estimate['avg_job_duration_hours']} hours\")\n print(f\" Total Compute Hours: {batch_estimate['total_compute_hours']}\")\n print(f\" Estimated Cost: ${batch_estimate['estimated_cost']:.2f}\")\n print(f\" Cost per Job: ${batch_estimate['cost_per_job']:.4f}\")\n\n# Example 5: Regional pricing comparison\ndef compare_aws_regions():\n \"\"\"Compare AWS pricing across different regions.\"\"\"\n monitor = get_cost_monitor('aws')\n if monitor:\n print(\"AWS Regional Pricing Comparison for p3.2xlarge:\")\n print(\"-\" * 50)\n \n regional_pricing = monitor.get_region_pricing_comparison('p3.2xlarge')\n for region, pricing_info in regional_pricing.items():\n print(f\"{region}:\")\n print(f\" On-Demand: ${pricing_info['on_demand_hourly']:.3f}/hour\")\n print(f\" Est. Spot: ${pricing_info['estimated_spot_hourly']:.3f}/hour\")\n print()\n\n# Example 6: Real-time AWS cost monitoring\ndef monitor_aws_costs():\n \"\"\"Monitor current AWS resource usage and costs.\"\"\"\n report = generate_cost_report('aws', 'p3.2xlarge')\n if report:\n print(\"Current AWS Resource Status:\")\n print(f\" CPU Usage: {report['resource_usage']['cpu_percent']:.1f}%\")\n print(f\" Memory Usage: {report['resource_usage']['memory_percent']:.1f}%\")\n if report['resource_usage']['gpu_stats']:\n print(f\" GPU Count: {len(report['resource_usage']['gpu_stats'])}\")\n print(f\" Hourly Rate: ${report['cost_estimate']['hourly_rate']:.3f}\")\n \n if report['recommendations']:\n print(\"\\nCost Optimization Recommendations:\")\n for rec in report['recommendations']:\n print(f\" \u2022 {rec}\")\n\n# Run examples\nprint(\"AWS Cost Monitoring Examples:\")\nprint(\"=\" * 40)\n\nprint(\"\\n1. AWS Pricing Comparison:\")\ncompare_aws_pricing()\n\nprint(\"\\n2. Spot Instance Savings Analysis:\")\naws_spot_cost_analysis()\n\nprint(\"\\n3. AWS Batch Cost Estimation:\")\nestimate_aws_batch_costs()\n\nprint(\"\\n4. Regional Pricing Comparison:\")\ncompare_aws_regions()\n\nprint(\"\\n5. Current AWS Status:\")\nmonitor_aws_costs()\n\nprint(\"\\n\u2705 AWS cost monitoring examples ready!\")\nprint(\"\ud83d\udca1 Use @cost_tracking_decorator('aws', 'instance_type') for automatic cost tracking\")", "execution_count": null }, { "cell_type": "markdown", "id": "gb89uvgkc9", "source": "### AWS Cost Optimization for Clustrix\n\n#### 1. Instance Selection\n- **Use Spot Instances** for non-critical workloads (up to 90% savings)\n- **Choose right-sized instances** (don't over-provision)\n- **Consider AMD instances** (often cheaper than Intel)\n\n#### 2. Storage Optimization\n- Use **S3 Intelligent Tiering** for data\n- Delete temporary files and logs regularly\n- Use **gp3 EBS volumes** instead of gp2\n\n#### 3. Network Efficiency\n- Use same AZ for compute and storage to avoid data transfer costs\n- Minimize cross-region data transfer\n\n#### 4. Smart Scheduling\n- Use scheduled scaling for predictable workloads\n- Terminate instances when not in use\n- Use AWS Lambda for small, short-running tasks\n\n#### 5. Monitoring & Control\n- Set up cost alerts and budgets\n- Use AWS Cost Explorer to analyze spending\n- Monitor with CloudWatch to optimize resource usage", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "id": "cleanup", "metadata": {}, "source": [ "## Resource Cleanup" ] }, { "cell_type": "code", "id": "cleanup-resources", "metadata": {}, "outputs": [], "source": "def cleanup_aws_resources(instance_ids=None, security_group_ids=None):\n \"\"\"\n Clean up AWS resources to avoid ongoing charges.\n \n Args:\n instance_ids: List of EC2 instance IDs to terminate\n security_group_ids: List of security group IDs to delete\n \"\"\"\n ec2 = boto3.client('ec2')\n \n try:\n # Terminate instances\n if instance_ids:\n response = ec2.terminate_instances(InstanceIds=instance_ids)\n print(f\"\u23f3 Terminating instances: {instance_ids}\")\n \n # Wait for termination\n waiter = ec2.get_waiter('instance_terminated')\n waiter.wait(InstanceIds=instance_ids)\n print(\"\u2713 Instances terminated.\")\n \n # Delete security groups\n if security_group_ids:\n for sg_id in security_group_ids:\n try:\n ec2.delete_security_group(GroupId=sg_id)\n print(f\"\u2713 Deleted security group: {sg_id}\")\n except Exception as e:\n print(f\"\u2717 Could not delete security group {sg_id}: {e}\")\n \n print(\"\u2705 Cleanup completed!\")\n \n except Exception as e:\n print(f\"\u2717 Error during cleanup: {e}\")\n\n# Helper function to list your running instances\ndef list_running_instances():\n \"\"\"List all running EC2 instances in your account.\"\"\"\n ec2 = boto3.client('ec2')\n \n try:\n response = ec2.describe_instances(\n Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]\n )\n \n instances = []\n for reservation in response['Reservations']:\n for instance in reservation['Instances']:\n name = next((tag['Value'] for tag in instance.get('Tags', []) if tag['Key'] == 'Name'), 'No Name')\n instances.append({\n 'InstanceId': instance['InstanceId'],\n 'Name': name,\n 'InstanceType': instance['InstanceType'],\n 'PublicIpAddress': instance.get('PublicIpAddress', 'No Public IP')\n })\n \n if instances:\n print(\"Running instances:\")\n for inst in instances:\n print(f\" {inst['InstanceId']} ({inst['Name']}) - {inst['InstanceType']} - {inst['PublicIpAddress']}\")\n else:\n print(\"No running instances found.\")\n \n return instances\n \n except Exception as e:\n print(f\"\u2717 Error listing instances: {e}\")\n return []\n\n# Example cleanup (uncomment and modify as needed)\n# instances = list_running_instances()\n# cleanup_aws_resources(\n# instance_ids=['i-1234567890abcdef0'],\n# security_group_ids=['sg-1234567890abcdef0']\n# )", "execution_count": null }, { "cell_type": "markdown", "id": "5y04rycyarp", "source": "**\u26a0\ufe0f Important: Clean Up Resources**\n\nAlways remember to clean up AWS resources when you're done to avoid ongoing charges! The cleanup function above helps automate this process.", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "id": "advanced-example", "metadata": {}, "source": [ "## Advanced Example: Distributed Machine Learning" ] }, { "cell_type": "code", "id": "ml-example", "metadata": {}, "outputs": [], "source": "@cluster(cores=4, memory=\"8GB\", time=\"00:30:00\")\ndef distributed_model_training(data_params, model_params):\n \"\"\"\n Train a machine learning model on AWS with data from S3.\n \n Args:\n data_params: Dictionary with S3 bucket and key information\n model_params: Dictionary with model hyperparameters\n \n Returns:\n Dictionary with training results and model location\n \"\"\"\n import numpy as np\n import boto3\n import pickle\n import io\n from sklearn.ensemble import RandomForestClassifier\n from sklearn.metrics import accuracy_score\n from sklearn.model_selection import train_test_split\n \n # Download training data from S3\n s3 = boto3.client('s3')\n response = s3.get_object(\n Bucket=data_params['bucket'], \n Key=data_params['training_data_key']\n )\n data = pickle.loads(response['Body'].read())\n \n X, y = data['features'], data['labels']\n X_train, X_test, y_train, y_test = train_test_split(\n X, y, test_size=0.2, random_state=42\n )\n \n # Train model\n model = RandomForestClassifier(**model_params)\n model.fit(X_train, y_train)\n \n # Evaluate\n y_pred = model.predict(X_test)\n accuracy = accuracy_score(y_test, y_pred)\n \n # Save model to S3\n model_buffer = io.BytesIO()\n pickle.dump(model, model_buffer)\n model_buffer.seek(0)\n \n s3.put_object(\n Bucket=data_params['bucket'],\n Key=data_params['model_output_key'],\n Body=model_buffer.getvalue()\n )\n \n return {\n 'accuracy': accuracy,\n 'model_location': f\"s3://{data_params['bucket']}/{data_params['model_output_key']}\",\n 'training_samples': len(X_train),\n 'test_samples': len(X_test)\n }\n\n# Example usage:\n# data_config = {\n# 'bucket': 'your-ml-bucket',\n# 'training_data_key': 'datasets/training_data.pkl',\n# 'model_output_key': 'models/random_forest_model.pkl'\n# }\n# \n# model_config = {\n# 'n_estimators': 100,\n# 'max_depth': 10,\n# 'random_state': 42,\n# 'n_jobs': -1\n# }\n# \n# result = distributed_model_training(data_config, model_config)\n# print(f\"\u2713 Model trained with accuracy: {result['accuracy']:.4f}\")\n# print(f\"\u2713 Model saved to: {result['model_location']}\")\n# print(f\"\u2713 Training samples: {result['training_samples']:,}\")\n# print(f\"\u2713 Test samples: {result['test_samples']:,}\")", "execution_count": null }, { "cell_type": "markdown", "id": "summary", "metadata": {}, "source": [ "## Summary\n", "\n", "This tutorial covered:\n", "\n", "1. **Setup**: AWS credentials and Clustrix installation\n", "2. **EC2 Integration**: Direct instance configuration\n", "3. **AWS Batch**: Managed job scheduling\n", "4. **ParallelCluster**: HPC-optimized clusters\n", "5. **S3 Integration**: Data storage and retrieval\n", "6. **Security**: Best practices for safe deployment\n", "7. **Cost Optimization**: Strategies to minimize expenses\n", "8. **Resource Management**: Proper cleanup procedures\n", "\n", "### Next Steps\n", "\n", "- Set up your AWS credentials and test the basic configuration\n", "- Start with a simple EC2 instance for initial testing\n", "- Consider ParallelCluster for production HPC workloads\n", "- Implement proper monitoring and cost controls\n", "- Explore AWS Spot instances for cost-effective batch processing\n", "\n", "### Resources\n", "\n", "- [AWS ParallelCluster Documentation](https://docs.aws.amazon.com/parallelcluster/)\n", "- [AWS Batch User Guide](https://docs.aws.amazon.com/batch/)\n", "- [AWS HPC Workshops](https://hpc-workshops.com/)\n", "- [Clustrix Documentation](https://clustrix.readthedocs.io/)\n", "\n", "**Remember**: Always monitor your AWS costs and clean up resources when not in use!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }