{ "cells": [ { "cell_type": "markdown", "id": "azure-title", "metadata": {}, "source": "# Microsoft Azure Cloud Tutorial\n\nThis tutorial demonstrates how to use Clustrix with Microsoft Azure cloud infrastructure for scalable distributed computing.\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextLab/clustrix/blob/master/docs/source/notebooks/azure_cloud_tutorial.ipynb)\n\n## Overview\n\nAzure provides several services that integrate well with Clustrix:\n\n- **Azure Virtual Machines**: Scalable compute instances\n- **Azure Batch**: Managed job scheduling service\n- **Azure CycleCloud**: HPC cluster orchestration\n- **Azure Machine Learning Compute**: ML-optimized infrastructure\n- **Azure Container Instances**: Serverless containers\n- **Azure Blob Storage**: Object storage for data and results\n- **Azure Virtual Network**: Network isolation and security\n\n## Prerequisites\n\n### Required Azure Setup\n\n1. **Azure Account**: Active Azure subscription with appropriate permissions\n2. **Azure CLI**: Installed and configured on your local machine\n3. **SSH Key Pair**: For secure VM access\n4. **Resource Quotas**: Sufficient compute quotas in your preferred region\n5. **Billing Setup**: Credit card or other payment method configured\n\n### Local Environment Setup\n\n1. **Python Environment**: Python 3.8+ with pip\n2. **SSH Client**: OpenSSH or equivalent\n3. **Git**: For version control (optional but recommended)\n4. 
**Code Editor**: VS Code, PyCharm, or your preferred editor", "outputs": [] }, { "cell_type": "markdown", "id": "installation", "metadata": {}, "source": "## Step-by-Step Setup Guide\n\n### Step 1: Install Azure CLI\n\nFirst, install the Azure CLI on your local machine:\n\n**Windows (PowerShell):**\n```powershell\nInvoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\\AzureCLI.msi; Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'; rm .\\AzureCLI.msi\n```\n\n**macOS (Homebrew):**\n```bash\nbrew update && brew install azure-cli\n```\n\n**Linux (Ubuntu/Debian):**\n```bash\ncurl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash\n```\n\n### Step 2: Create Azure Account and Subscription\n\n1. Go to [Azure Portal](https://portal.azure.com)\n2. Sign up for a free account (includes $200 credit)\n3. Complete account verification\n4. Note your Subscription ID from the Azure Portal\n\n### Step 3: Install Clustrix with Azure Dependencies", "outputs": [] }, { "cell_type": "code", "execution_count": null, "id": "install", "metadata": {}, "outputs": [], "source": [ "# Install Clustrix with Azure support\n", "!pip install clustrix azure-identity azure-mgmt-compute azure-mgmt-network azure-storage-blob\n", "\n", "# Import required libraries\n", "import clustrix\n", "from clustrix import cluster, configure\n", "from azure.identity import DefaultAzureCredential\n", "from azure.mgmt.compute import ComputeManagementClient\n", "from azure.mgmt.network import NetworkManagementClient\n", "from azure.storage.blob import BlobServiceClient\n", "import os\n", "import numpy as np\n", "import time\n", "import json" ] }, { "cell_type": "markdown", "id": "azure-credentials", "metadata": {}, "source": "## Step 4: Azure Authentication Setup\n\nConfigure your Azure credentials. 
You can do this in several ways:\n\n### Option 1: Azure CLI Authentication (Recommended for Development)\n\nThis is the simplest method for getting started:", "outputs": [] }, { "cell_type": "code", "id": "azure-cli-auth", "metadata": {}, "outputs": [], "source": "# Login with Azure CLI (run this in terminal)\n# az login\n\n# Set your subscription (replace with your actual subscription ID)\n# az account set --subscription \"12345678-1234-1234-1234-123456789012\"\n\n# Verify authentication\n!az account show --output table", "execution_count": null }, { "cell_type": "markdown", "id": "azure-creds-env", "metadata": {}, "source": "### Option 2: Service Principal Authentication (Recommended for Production)\n\nFor production environments, create a service principal with limited permissions:\n\n**Create Service Principal (run in terminal):**\n```bash\n# Create service principal\naz ad sp create-for-rbac --name \"clustrix-service-principal\" --role contributor\n\n# The output will include:\n# - appId (client ID)\n# - password (client secret)\n# - tenant (tenant ID)\n```\n\n**Set Environment Variables:**", "outputs": [] }, { "cell_type": "code", "id": "service-principal", "metadata": {}, "outputs": [], "source": "# Set Azure credentials as environment variables (replace with your actual values)\n# os.environ['AZURE_CLIENT_ID'] = 'your-client-id-from-service-principal'\n# os.environ['AZURE_CLIENT_SECRET'] = 'your-client-secret-from-service-principal' \n# os.environ['AZURE_TENANT_ID'] = 'your-tenant-id-from-service-principal'\n\n# Test Azure connection\ntry:\n credential = DefaultAzureCredential()\n subscription_id = 'your-subscription-id' # Replace with actual ID\n \n compute_client = ComputeManagementClient(credential, subscription_id)\n # Test by listing VM sizes in East US\n vm_sizes = list(compute_client.virtual_machine_sizes.list('eastus'))\n print(f\"Successfully connected to Azure. 
Available VM sizes: {len(vm_sizes)}\")\nexcept Exception as e:\n print(f\"Azure connection failed: {e}\")\n print(\"Make sure you have:\")\n print(\"1. Run 'az login' or set service principal environment variables\")\n print(\"2. Set the correct subscription ID\")\n print(\"3. Have appropriate permissions in your Azure subscription\")", "execution_count": null }, { "cell_type": "markdown", "id": "1hqa6m0oltd", "source": "### Step 5: Generate SSH Key Pair\n\nClustrix requires SSH access to remote VMs. Generate an SSH key pair if you don't have one:\n\n**Generate SSH Key (run in terminal):**\n```bash\n# Generate SSH key pair (press Enter for default location)\nssh-keygen -t rsa -b 4096 -C \"your-email@example.com\"\n\n# Add key to SSH agent\nssh-add ~/.ssh/id_rsa\n\n# Display public key (you'll need this for VM creation)\ncat ~/.ssh/id_rsa.pub\n```\n\n**Important Notes:**\n- Keep your private key (`~/.ssh/id_rsa`) secure and never share it\n- You'll use the public key (`~/.ssh/id_rsa.pub`) when creating Azure VMs\n- Make sure you have set up authentication and have the correct subscription ID", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "id": "vm-setup", "metadata": {}, "source": "## Method 1: Azure Virtual Machines Configuration\n\n### Step 6: Create Resource Group and Azure VM for Clustrix\n\nFirst, create a resource group to organize your Azure resources:", "outputs": [] }, { "cell_type": "code", "id": "vm-creation", "metadata": {}, "outputs": [], "source": "def create_clustrix_vm(resource_group, vm_name, location='eastus', vm_size='Standard_D4s_v3'):\n \"\"\"\n Create an Azure VM configured for Clustrix.\n \n Args:\n resource_group: Azure resource group name\n vm_name: Name for the VM\n location: Azure region\n vm_size: VM size (CPU/memory configuration)\n \n Returns:\n VM details including public IP\n \"\"\"\n # Cloud-init script for VM setup\n cloud_init_script = '''\n#cloud-config\npackage_update: true\npackages:\n - python3\n - python3-pip\n 
- git\n - htop\n\nruncmd:\n # Install clustrix and common packages\n - pip3 install clustrix numpy scipy pandas scikit-learn\n \n # Install uv for faster package management\n - curl -LsSf https://astral.sh/uv/install.sh | sh\n \n # Create clustrix user\n - useradd -m -s /bin/bash clustrix\n - usermod -aG sudo clustrix\n - echo \"clustrix ALL=(ALL) NOPASSWD:ALL\" >> /etc/sudoers\n \n # Setup SSH for clustrix user\n - mkdir -p /home/clustrix/.ssh\n - cp /home/azureuser/.ssh/authorized_keys /home/clustrix/.ssh/\n - chown -R clustrix:clustrix /home/clustrix/.ssh\n - chmod 700 /home/clustrix/.ssh\n - chmod 600 /home/clustrix/.ssh/authorized_keys\n \n # Create working directory\n - mkdir -p /tmp/clustrix\n - chown clustrix:clustrix /tmp/clustrix\n'''\n \n # Azure CLI commands for VM creation\n azure_commands = f\"\"\"\n# Create resource group\naz group create --name {resource_group} --location {location}\n\n# Create VM with cloud-init\naz vm create \\\\\n --resource-group {resource_group} \\\\\n --name {vm_name} \\\\\n --image Ubuntu2204 \\\\\n --size {vm_size} \\\\\n --admin-username azureuser \\\\\n --generate-ssh-keys \\\\\n --custom-data cloud-init.txt \\\\\n --public-ip-sku Standard \\\\\n --tags Purpose=Clustrix Environment=Tutorial\n\n# Get public IP\naz vm show \\\\\n --resource-group {resource_group} \\\\\n --name {vm_name} \\\\\n --show-details \\\\\n --query publicIps \\\\\n --output tsv\n\"\"\"\n \n return {\n 'resource_group': resource_group,\n 'vm_name': vm_name,\n 'location': location,\n 'vm_size': vm_size,\n 'commands': azure_commands,\n 'cloud_init': cloud_init_script\n }\n\n# Example VM configuration\nvm_config = create_clustrix_vm(\n resource_group='clustrix-tutorial-rg',\n vm_name='clustrix-vm-01',\n location='eastus',\n vm_size='Standard_D4s_v3' # 4 vCPUs, 16 GB RAM\n)\n\nprint(\"Save the cloud-init script to a file called 'cloud-init.txt' in your current directory\")\nprint(\"Then execute these Azure CLI commands to create your VM:\")\nprint(\"-\" * 
60)\nprint(vm_config['commands'])", "execution_count": null }, { "cell_type": "markdown", "id": "2qs6d0q31fd", "source": "### Cloud-Init Script\n\nSave this cloud-init script to a file named `cloud-init.txt` in your current directory:", "metadata": {}, "outputs": [] }, { "cell_type": "code", "id": "nbe27b4ha1e", "source": "# Display the cloud-init script content\nprint(vm_config['cloud_init'])", "metadata": {}, "outputs": [], "execution_count": null }, { "cell_type": "markdown", "id": "clustrix-azure-config", "metadata": {}, "source": "### Step 7: Configure Clustrix for Azure VM\n\nAfter your VM is created and you have the public IP address, configure Clustrix to use it:", "outputs": [] }, { "cell_type": "code", "id": "config-azure-vm", "metadata": {}, "outputs": [], "source": "# Configure Clustrix to use your Azure VM\n# Replace 'your-vm-public-ip' with the actual IP from: az vm show --resource-group clustrix-tutorial-rg --name clustrix-vm-01 --show-details --query publicIps --output tsv\n\nconfigure(\n cluster_type=\"ssh\",\n cluster_host=\"your-vm-public-ip\", # Replace with actual IP\n username=\"clustrix\", # or \"azureuser\" if using default user\n key_file=\"~/.ssh/id_rsa\", # Azure CLI generated key\n remote_work_dir=\"/tmp/clustrix\",\n package_manager=\"auto\", # Will use uv if available\n default_cores=4,\n default_memory=\"8GB\",\n default_time=\"01:00:00\"\n)\n\nprint(\"Clustrix configured for Azure VM\")\nprint(\"Make sure to replace 'your-vm-public-ip' with your actual VM's public IP address\")", "execution_count": null }, { "cell_type": "markdown", "id": "kh72n7h6uzp", "source": "### Testing Your Azure VM Connection\n\nBefore running Clustrix jobs, test your SSH connection to the VM:\n\n```bash\n# Test SSH connection (replace with your actual IP)\nssh -i ~/.ssh/id_rsa clustrix@your-vm-public-ip\n\n# Or if using default azureuser:\nssh -i ~/.ssh/id_rsa azureuser@your-vm-public-ip\n```\n\n**Troubleshooting Connection Issues:**\n- Ensure your VM is 
running: `az vm show --resource-group clustrix-tutorial-rg --name clustrix-vm-01 --show-details --query powerState`\n- Check Network Security Group rules allow SSH (port 22)\n- Verify your SSH key is correct and has proper permissions (`chmod 600 ~/.ssh/id_rsa`)", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "id": "azure-example", "metadata": {}, "source": [ "### Example: Remote Computation on Azure VM" ] }, { "cell_type": "code", "id": "azure-vm-example", "metadata": {}, "outputs": [], "source": "@cluster(cores=2, memory=\"4GB\")\ndef azure_numerical_analysis(matrix_size=1000, iterations=10):\n    \"\"\"Perform numerical analysis on Azure VM.\"\"\"\n    import numpy as np\n    import time\n\n    results = []\n\n    for i in range(iterations):\n        # Generate random matrix\n        matrix = np.random.rand(matrix_size, matrix_size)\n\n        # Perform eigenvalue decomposition\n        start_time = time.time()\n        eigenvalues = np.linalg.eigvals(matrix)\n        computation_time = time.time() - start_time\n\n        results.append({\n            'iteration': i + 1,\n            'max_eigenvalue': float(np.max(eigenvalues.real)),\n            'min_eigenvalue': float(np.min(eigenvalues.real)),\n            'computation_time': computation_time\n        })\n\n    return {\n        'matrix_size': matrix_size,\n        'total_iterations': iterations,\n        'average_time': np.mean([r['computation_time'] for r in results]),\n        'results': results\n    }\n\n# Run computation on Azure VM (uncomment after configuring your VM)\n# result = azure_numerical_analysis(matrix_size=500, iterations=5)\n# print(f\"Completed {result['total_iterations']} iterations\")\n# print(f\"Average computation time: {result['average_time']:.3f} seconds\")\n\nprint(\"Example function defined. 
Configure your VM IP address and uncomment the lines above to run.\")", "execution_count": null }, { "cell_type": "markdown", "id": "azure-batch", "metadata": {}, "source": [ "## Method 2: Azure Batch Configuration\n", "\n", "Azure Batch provides managed job scheduling for large-scale parallel workloads:" ] }, { "cell_type": "code", "id": "azure-batch-setup", "metadata": {}, "outputs": [], "source": "def setup_azure_batch_environment():\n \"\"\"\n Template for setting up Azure Batch environment.\n This requires manual setup through Azure portal or CLI.\n \"\"\"\n \n batch_setup_commands = \"\"\"\n# Create Azure Batch account\naz batch account create \\\\\n --name clustrixbatch \\\\\n --resource-group clustrix-tutorial-rg \\\\\n --location eastus\n\n# Create storage account for Batch\naz storage account create \\\\\n --name clustrixstorage \\\\\n --resource-group clustrix-tutorial-rg \\\\\n --location eastus \\\\\n --sku Standard_LRS\n\n# Link storage to Batch account\naz batch account set \\\\\n --name clustrixbatch \\\\\n --resource-group clustrix-tutorial-rg \\\\\n --storage-account clustrixstorage\n\n# Create Batch pool\naz batch pool create \\\\\n --id clustrix-pool \\\\\n --vm-size Standard_D2s_v3 \\\\\n --target-dedicated-nodes 2 \\\\\n --image canonical:0001-com-ubuntu-server-jammy:22_04-lts \\\\\n --node-agent-sku-id \"batch.node.ubuntu 22.04\"\n\n# Create Batch job\naz batch job create \\\\\n --id clustrix-job \\\\\n --pool-id clustrix-pool\n\"\"\"\n \n batch_config = {\n 'account_name': 'clustrixbatch',\n 'account_url': 'https://clustrixbatch.eastus.batch.azure.com',\n 'resource_group': 'clustrix-tutorial-rg',\n 'pool_id': 'clustrix-pool',\n 'job_id': 'clustrix-job'\n }\n \n return batch_config, batch_setup_commands\n\nbatch_config, batch_commands = setup_azure_batch_environment()\n\nprint(\"Azure Batch Configuration:\")\nprint(json.dumps(batch_config, indent=2))\nprint(\"\\nTo set up Azure Batch, run these commands:\")\nprint(\"-\" * 
50)\nprint(batch_commands)", "execution_count": null }, { "cell_type": "markdown", "id": "i6n8kc7lvi", "source": "**Important Notes for Azure Batch:**\n- Azure Batch integration with Clustrix requires custom implementation\n- Consider using Azure CycleCloud for HPC workloads instead\n- Batch is better suited for managed job scheduling at scale", "metadata": {} }, { "cell_type": "markdown", "id": "cyclecloud", "metadata": {}, "source": [ "## Method 3: Azure CycleCloud Integration\n", "\n", "Azure CycleCloud is designed for HPC workloads and provides SLURM integration:" ] }, { "cell_type": "code", "id": "cyclecloud-config", "metadata": {}, "outputs": [], "source": "# Azure CycleCloud cluster template for Clustrix\ncyclecloud_template = \"\"\"\n# CycleCloud SLURM cluster template\n# Save as clustrix-slurm.txt and import into CycleCloud\n\n[cluster clustrix-slurm]\nFormLayout = selectionpanel\nCategory = Schedulers\nIconUrl = static/cloud/cluster/ui/ClusterIcon/slurm.png\n\n [[node defaults]]\n UsePublicNetwork = false\n Credentials = $Credentials\n SubnetId = $SubnetId\n Region = $Region\n KeyPairLocation = ~/.ssh/cyclecloud.pem\n \n # Install clustrix on all nodes\n [[[configuration]]]\n clustrix.version = latest\n \n [[[cluster-init clustrix:default:1.0.0]]]\n \n [[node master]]\n MachineType = $MasterMachineType\n IsReturnProxy = $ReturnProxy\n AdditionalClusterInitSpecs = $MasterClusterInitSpecs\n \n [[[configuration]]]\n slurm.version = $configuration_slurm_version\n \n [[[cluster-init slurm:master:2.7.2]]]\n \n [[[network-interface eth0]]]\n AssociatePublicIpAddress = $UsePublicNetwork\n\n [[nodearray execute]]\n MachineType = $ExecuteMachineType\n MaxCoreCount = $MaxExecuteCoreCount\n Interruptible = $UseLowPrio\n AdditionalClusterInitSpecs = $ExecuteClusterInitSpecs\n \n [[[configuration]]]\n slurm.version = $configuration_slurm_version\n \n [[[cluster-init slurm:execute:2.7.2]]]\n \n [[[network-interface eth0]]]\n AssociatePublicIpAddress = 
false\n\n[parameters About]\nOrder = 1\n\n [[parameters About Clustrix]]\n \n [[[parameter clustrix]]]\n HideLabel = true\n Config.Plugin = pico.widget.HtmlTemplateWidget\n Config.Template = \"Clustrix-enabled SLURM cluster for distributed computing\"\n\n[parameters Required Settings]\nOrder = 10\n\n [[parameters Virtual Machines]]\n Description = \"Configure the VM types and sizes\"\n Order = 20\n\n [[[parameter Region]]]\n Label = Region\n Description = Deployment Location\n ParameterType = Cloud.Region\n DefaultValue = eastus\n\n [[[parameter MasterMachineType]]]\n Label = Master VM Type\n Description = Master node VM type\n ParameterType = Cloud.MachineType\n DefaultValue = Standard_D4s_v3\n\n [[[parameter ExecuteMachineType]]]\n Label = Execute VM Type\n Description = Execute node VM type\n ParameterType = Cloud.MachineType\n DefaultValue = Standard_H16r\n\n\"\"\"\n\ndef configure_for_cyclecloud(master_ip, cluster_name=\"clustrix-slurm\"):\n \"\"\"Configure Clustrix to use Azure CycleCloud SLURM cluster.\"\"\"\n configure(\n cluster_type=\"slurm\",\n cluster_host=master_ip,\n username=\"cyclecloud\", # Default CycleCloud user\n key_file=\"~/.ssh/cyclecloud.pem\",\n remote_work_dir=\"/shared/clustrix\", # Use shared storage\n package_manager=\"uv\",\n module_loads=[\"python3\"],\n environment_variables={\n \"CLUSTRIX_CLUSTER\": cluster_name\n },\n default_cores=8,\n default_memory=\"16GB\",\n default_time=\"02:00:00\",\n default_partition=\"hpc\"\n )\n return f\"Configured Clustrix for CycleCloud cluster: {cluster_name}\"\n\nprint(\"CycleCloud Template (save as clustrix-slurm.txt):\")\nprint(cyclecloud_template)\n\n# Example configuration (uncomment and modify as needed)\n# config_message = configure_for_cyclecloud(\"10.1.0.4\", \"my-clustrix-cluster\")\n# print(config_message)", "execution_count": null }, { "cell_type": "markdown", "id": "88eh6lcuqop", "source": "**Azure CycleCloud Benefits:**\n- Best-in-class HPC cluster management for Azure\n- Native SLURM 
integration works seamlessly with Clustrix\n- Automatic scaling and cost optimization\n- Enterprise-grade security and compliance\n- Hybrid cloud capabilities for on-premises integration", "metadata": {} }, { "cell_type": "markdown", "id": "azure-storage", "metadata": {}, "source": [ "## Data Management with Azure Blob Storage" ] }, { "cell_type": "code", "id": "blob-storage", "metadata": {}, "outputs": [], "source": "@cluster(cores=2, memory=\"4GB\")\ndef process_blob_data(storage_account, container_name, input_blob, output_blob, storage_key=None):\n    \"\"\"Process data from Azure Blob Storage and save results back.\"\"\"\n    from azure.storage.blob import BlobServiceClient\n    from azure.identity import DefaultAzureCredential\n    import numpy as np\n    import pickle\n    import io\n    import time  # needed for the processing timestamp below\n\n    # Initialize Blob Service Client\n    if storage_key:\n        account_url = f\"https://{storage_account}.blob.core.windows.net\"\n        blob_service_client = BlobServiceClient(account_url=account_url, credential=storage_key)\n    else:\n        # Use managed identity or Azure CLI authentication\n        account_url = f\"https://{storage_account}.blob.core.windows.net\"\n        credential = DefaultAzureCredential()\n        blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)\n\n    # Download data from blob storage\n    blob_client = blob_service_client.get_blob_client(container=container_name, blob=input_blob)\n    blob_data = blob_client.download_blob()\n    data = pickle.loads(blob_data.readall())\n\n    # Process the data\n    processed_data = {\n        'original_shape': data.shape if hasattr(data, 'shape') else len(data),\n        'mean': float(np.mean(data)) if hasattr(data, '__iter__') else float(data),\n        'std': float(np.std(data)) if hasattr(data, '__iter__') else 0.0,\n        'max': float(np.max(data)) if hasattr(data, '__iter__') else float(data),\n        'min': float(np.min(data)) if hasattr(data, '__iter__') else float(data),\n        'processing_timestamp': time.time(),\n        'processed_on': 'azure-vm'\n    }\n\n    # Upload results to blob 
storage\n    output_buffer = io.BytesIO()\n    pickle.dump(processed_data, output_buffer)\n    output_buffer.seek(0)\n\n    output_blob_client = blob_service_client.get_blob_client(container=container_name, blob=output_blob)\n    output_blob_client.upload_blob(output_buffer.getvalue(), overwrite=True)\n\n    return f\"Processed data saved to blob: {output_blob}\"\n\n# Utility functions for Azure Blob Storage\ndef upload_to_blob(data, storage_account, container_name, blob_name, storage_key=None):\n    \"\"\"Upload data to Azure Blob Storage.\"\"\"\n    import io\n    import pickle\n\n    if storage_key:\n        account_url = f\"https://{storage_account}.blob.core.windows.net\"\n        blob_service_client = BlobServiceClient(account_url=account_url, credential=storage_key)\n    else:\n        account_url = f\"https://{storage_account}.blob.core.windows.net\"\n        credential = DefaultAzureCredential()\n        blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)\n\n    buffer = io.BytesIO()\n    pickle.dump(data, buffer)\n    buffer.seek(0)\n\n    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)\n    blob_client.upload_blob(buffer.getvalue(), overwrite=True)\n    return f\"Data uploaded to blob: {blob_name}\"\n\ndef download_from_blob(storage_account, container_name, blob_name, storage_key=None):\n    \"\"\"Download data from Azure Blob Storage.\"\"\"\n    import pickle\n\n    if storage_key:\n        account_url = f\"https://{storage_account}.blob.core.windows.net\"\n        blob_service_client = BlobServiceClient(account_url=account_url, credential=storage_key)\n    else:\n        account_url = f\"https://{storage_account}.blob.core.windows.net\"\n        credential = DefaultAzureCredential()\n        blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)\n\n    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)\n    blob_data = blob_client.download_blob()\n    return pickle.loads(blob_data.readall())\n\n# Example usage (uncomment and modify as needed):\n# sample_data = np.random.rand(1000, 
50)\n# upload_result = upload_to_blob(sample_data, 'yourstorageaccount', 'data', 'input/sample.pkl')\n# print(upload_result)\n# \n# process_result = process_blob_data('yourstorageaccount', 'data', 'input/sample.pkl', 'output/results.pkl')\n# print(process_result)\n\nprint(\"Azure Blob Storage integration functions defined.\")", "execution_count": null }, { "cell_type": "markdown", "id": "azure-ml", "metadata": {}, "source": [ "## Azure Machine Learning Compute Integration" ] }, { "cell_type": "code", "id": "azure-ml-compute", "metadata": {}, "outputs": [], "source": "def setup_azure_ml_compute():\n \"\"\"\n Template for setting up Azure ML compute clusters.\n These can be used with Clustrix for ML workloads.\n \"\"\"\n \n aml_setup_commands = \"\"\"\n# Create Azure ML workspace\naz ml workspace create \\\\\n --name clustrix-ml-workspace \\\\\n --resource-group clustrix-tutorial-rg \\\\\n --location eastus\n\n# Create compute cluster\naz ml compute create \\\\\n --name clustrix-compute \\\\\n --type amlcompute \\\\\n --min-instances 0 \\\\\n --max-instances 4 \\\\\n --size Standard_DS3_v2 \\\\\n --workspace-name clustrix-ml-workspace \\\\\n --resource-group clustrix-tutorial-rg\n\n# Create compute instance for development\naz ml compute create \\\\\n --name clustrix-dev-instance \\\\\n --type computeinstance \\\\\n --size Standard_DS3_v2 \\\\\n --workspace-name clustrix-ml-workspace \\\\\n --resource-group clustrix-tutorial-rg\n\"\"\"\n \n return {\n 'workspace': 'clustrix-ml-workspace',\n 'compute_cluster': 'clustrix-compute',\n 'compute_instance': 'clustrix-dev-instance',\n 'commands': aml_setup_commands\n }\n\n@cluster(cores=4, memory=\"8GB\")\ndef azure_ml_training_job(dataset_params, model_params):\n \"\"\"Example ML training job that could run on Azure ML compute.\"\"\"\n import numpy as np\n from sklearn.ensemble import RandomForestClassifier\n from sklearn.metrics import accuracy_score, classification_report\n from sklearn.model_selection import 
train_test_split\n from sklearn.datasets import make_classification\n import time\n \n # Generate synthetic dataset (in real scenario, load from Azure ML datasets)\n X, y = make_classification(\n n_samples=dataset_params['n_samples'],\n n_features=dataset_params['n_features'],\n n_classes=dataset_params['n_classes'],\n random_state=42\n )\n \n X_train, X_test, y_train, y_test = train_test_split(\n X, y, test_size=0.2, random_state=42\n )\n \n # Train model\n start_time = time.time()\n model = RandomForestClassifier(**model_params)\n model.fit(X_train, y_train)\n training_time = time.time() - start_time\n \n # Evaluate\n y_pred = model.predict(X_test)\n accuracy = accuracy_score(y_test, y_pred)\n \n return {\n 'accuracy': accuracy,\n 'training_time': training_time,\n 'training_samples': len(X_train),\n 'test_samples': len(X_test),\n 'feature_importance': model.feature_importances_.tolist()[:10], # Top 10\n 'model_params': model_params,\n 'dataset_params': dataset_params\n }\n\naml_config = setup_azure_ml_compute()\n\nprint(\"Azure ML Setup Commands:\")\nprint(aml_config['commands'])\n\n# Example usage (uncomment to run after setting up Azure ML):\n# dataset_config = {'n_samples': 10000, 'n_features': 20, 'n_classes': 3}\n# model_config = {'n_estimators': 100, 'max_depth': 10, 'random_state': 42, 'n_jobs': -1}\n# result = azure_ml_training_job(dataset_config, model_config)\n# print(f\"Model trained with accuracy: {result['accuracy']:.4f}\")\n\nprint(\"Azure ML integration example defined.\")", "execution_count": null }, { "cell_type": "markdown", "id": "azure-security", "metadata": {}, "source": [ "## Security Best Practices" ] }, { "cell_type": "code", "id": "azure-security-setup", "metadata": {}, "outputs": [], "source": "def setup_azure_security_for_clustrix(resource_group='clustrix-tutorial-rg', location='eastus'):\n \"\"\"\n Security configuration for Azure + Clustrix deployment.\n \"\"\"\n \n security_commands = f\"\"\"\n# Create virtual network with private 
subnets\naz network vnet create \\\\\n --resource-group {resource_group} \\\\\n --name clustrix-vnet \\\\\n --address-prefix 10.1.0.0/16 \\\\\n --subnet-name clustrix-subnet \\\\\n --subnet-prefix 10.1.0.0/24 \\\\\n --location {location}\n\n# Create Network Security Group with restrictive rules\naz network nsg create \\\\\n --resource-group {resource_group} \\\\\n --name clustrix-nsg \\\\\n --location {location}\n\n# Allow SSH only from your IP (replace with your actual IP)\naz network nsg rule create \\\\\n --resource-group {resource_group} \\\\\n --nsg-name clustrix-nsg \\\\\n --name AllowSSHFromMyIP \\\\\n --protocol tcp \\\\\n --priority 1000 \\\\\n --destination-port-range 22 \\\\\n --source-address-prefixes YOUR_IP_ADDRESS/32 \\\\\n --access allow\n\n# Allow internal communication\naz network nsg rule create \\\\\n --resource-group {resource_group} \\\\\n --nsg-name clustrix-nsg \\\\\n --name AllowVnetInbound \\\\\n --protocol '*' \\\\\n --priority 1001 \\\\\n --source-address-prefixes 10.1.0.0/16 \\\\\n --destination-address-prefixes 10.1.0.0/16 \\\\\n --access allow\n\n# Create Key Vault for secrets management (vault names are limited to 24 characters)\naz keyvault create \\\\\n --resource-group {resource_group} \\\\\n --name clustrix-kv-$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -c1-8) \\\\\n --location {location} \\\\\n --enabled-for-disk-encryption \\\\\n --sku standard\n\n# Create managed identity for VMs\naz identity create \\\\\n --resource-group {resource_group} \\\\\n --name clustrix-identity \\\\\n --location {location}\n\n# Create storage account with public blob access disabled, HTTPS only, and TLS 1.2 minimum\naz storage account create \\\\\n --resource-group {resource_group} \\\\\n --name clustrixstorage$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -c1-8) \\\\\n --location {location} \\\\\n --sku Standard_LRS \\\\\n --allow-blob-public-access false \\\\\n --https-only true \\\\\n --min-tls-version TLS1_2\n\n# Enable Azure Security Center\naz security auto-provisioning-setting update \\\\\n --name default \\\\\n --auto-provision 
on\n\"\"\"\n \n return {\n 'resource_group': resource_group,\n 'location': location,\n 'vnet_name': 'clustrix-vnet',\n 'subnet_name': 'clustrix-subnet',\n 'nsg_name': 'clustrix-nsg',\n 'security_commands': security_commands\n }\n\nsecurity_config = setup_azure_security_for_clustrix()\n\nprint(\"Azure Security Setup Commands:\")\nprint(security_config['security_commands'])\nprint(\"\\nIMPORTANT: Replace 'YOUR_IP_ADDRESS' with your actual public IP address!\")\nprint(\"Find your IP with: curl ifconfig.me\")", "execution_count": null }, { "cell_type": "markdown", "id": "8jwgahv9skt", "source": "### Azure Security Checklist for Clustrix\n\n\u2713 **Authentication and Access**\n- Use Azure Active Directory for authentication\n- Enable managed identities instead of service principals when possible\n- Restrict Network Security Groups to your IP address only\n- Use private endpoints for storage accounts\n\n\u2713 **Infrastructure Security**\n- Enable disk encryption for all VMs\n- Use Azure Key Vault for secrets and certificates\n- Enable Azure Security Center recommendations\n- Use Azure Private Link for service connectivity\n\n\u2713 **Monitoring and Compliance**\n- Enable diagnostic logging and monitoring\n- Implement Azure Policy for compliance\n- Use Azure Defender for cloud workload protection\n- Regularly rotate access keys and certificates\n\n\u2713 **Cost and Resource Management**\n- Set up cost alerts and spending limits\n- Tag all resources for governance and cost tracking", "metadata": {} }, { "cell_type": "markdown", "id": "cost-management", "metadata": {}, "source": [ "## Cost Management and Optimization" ] }, { "cell_type": "code", "id": "azure-cost-optimization", "metadata": {}, "outputs": [], "source": "# Import Clustrix cost monitoring for Azure\nfrom clustrix import cost_tracking_decorator, get_cost_monitor, generate_cost_report, get_pricing_info\n\n# Example 1: Cost tracking with Azure VMs\n@cost_tracking_decorator('azure', 
'Standard_NC6s_v3')\n@cluster(cores=6, memory=\"112GB\")\ndef azure_training_with_cost_tracking():\n    \"\"\"Example training function with Azure cost tracking.\"\"\"\n    import time\n    import numpy as np\n\n    print(\"Starting Azure training with cost monitoring...\")\n    time.sleep(2)  # Simulate training\n\n    # Simulate ML workload\n    data = np.random.randn(1500, 1500)\n    result = np.linalg.qr(data)\n\n    print(\"Training completed!\")\n    # Placeholder metrics for illustration only\n    return {'accuracy': 0.89, 'training_time': 2.0}\n\n# Example 2: Compare Azure VM pricing\ndef compare_azure_pricing():\n    \"\"\"Compare Azure VM pricing for different instance types.\"\"\"\n    pricing = get_pricing_info('azure')\n    if pricing:\n        print(\"Azure VM Pay-as-you-go Pricing (USD/hour):\")\n\n        # Group by category\n        gpu_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_NC')}\n        general_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_D')}\n        compute_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_F')}\n        memory_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_E')}\n\n        print(\"\\nGPU VMs:\")\n        for vm, price in sorted(gpu_vms.items(), key=lambda x: x[1]):\n            print(f\" {vm:<25}: ${price:.3f}/hour\")\n\n        print(\"\\nGeneral Purpose:\")\n        for vm, price in sorted(general_vms.items(), key=lambda x: x[1]):\n            print(f\" {vm:<25}: ${price:.3f}/hour\")\n\n        print(\"\\nCompute Optimized:\")\n        for vm, price in sorted(compute_vms.items(), key=lambda x: x[1]):\n            print(f\" {vm:<25}: ${price:.3f}/hour\")\n\n        print(\"\\nMemory Optimized:\")\n        for vm, price in sorted(memory_vms.items(), key=lambda x: x[1]):\n            print(f\" {vm:<25}: ${price:.3f}/hour\")\n\n# Example 3: Azure Spot VM savings analysis\ndef azure_spot_cost_analysis():\n    \"\"\"Analyze potential savings with Azure Spot VMs.\"\"\"\n    monitor = get_cost_monitor('azure')\n    if monitor:\n        print(\"Azure Spot VM Savings Analysis:\")\n        print(\"-\" * 40)\n\n        vm_types = ['Standard_NC6s_v3', 'Standard_D4s_v3', 'Standard_F8s_v2', 'Standard_E8s_v3']\n\n        for vm in vm_types:\n            pay_as_you_go = monitor.estimate_cost(vm, 1.0, use_spot=False)\n            spot = monitor.estimate_cost(vm, 1.0, 
use_spot=True)\n            savings = ((pay_as_you_go.hourly_rate - spot.hourly_rate) / pay_as_you_go.hourly_rate) * 100\n\n            print(f\"{vm}:\")\n            print(f\" Pay-as-you-go: ${pay_as_you_go.hourly_rate:.3f}/hour\")\n            print(f\" Spot: ${spot.hourly_rate:.3f}/hour\")\n            print(f\" Savings: {savings:.1f}%\")\n            print()\n\n# Example 4: Azure Batch cost estimation\ndef estimate_azure_batch_costs():\n    \"\"\"Estimate costs for Azure Batch workloads.\"\"\"\n    monitor = get_cost_monitor('azure')\n    if monitor:\n        batch_estimate = monitor.estimate_batch_cost(\n            pool_name=\"clustrix-batch-pool\",\n            vm_size=\"Standard_D4s_v3\",\n            target_nodes=8,\n            estimated_duration_hours=2.0\n        )\n\n        print(\"Azure Batch Cost Estimation:\")\n        print(f\" Pool Name: {batch_estimate['pool_name']}\")\n        print(f\" VM Size: {batch_estimate['vm_size']}\")\n        print(f\" Target Nodes: {batch_estimate['target_nodes']}\")\n        print(f\" Duration: {batch_estimate['estimated_duration_hours']} hours\")\n        print(f\" Total Compute Hours: {batch_estimate['total_compute_hours']}\")\n        print(f\" Estimated Cost: ${batch_estimate['estimated_cost']:.2f}\")\n        print(f\" Cost per Node-Hour: ${batch_estimate['cost_per_node_hour']:.3f}\")\n\n# Example 5: Regional pricing comparison\ndef compare_azure_regions():\n    \"\"\"Compare Azure pricing across different regions.\"\"\"\n    monitor = get_cost_monitor('azure')\n    if monitor:\n        print(\"Azure Regional Pricing Comparison for Standard_NC6s_v3:\")\n        print(\"-\" * 55)\n\n        regional_pricing = monitor.get_region_pricing_comparison('Standard_NC6s_v3')\n        for region, pricing_info in regional_pricing.items():\n            print(f\"{region}:\")\n            print(f\" Pay-as-you-go: ${pricing_info['pay_as_you_go_hourly']:.3f}/hour\")\n            print(f\" Est. Spot: ${pricing_info['estimated_spot_hourly']:.3f}/hour\")\n            print()\n\n# Example 6: Real-time Azure cost monitoring\ndef monitor_azure_costs():\n    \"\"\"Monitor current Azure resource usage and costs.\"\"\"\n    report = generate_cost_report('azure', 
'Standard_NC6s_v3')\n    if report:\n        print(\"Current Azure Resource Status:\")\n        print(f\" CPU Usage: {report['resource_usage']['cpu_percent']:.1f}%\")\n        print(f\" Memory Usage: {report['resource_usage']['memory_percent']:.1f}%\")\n        if report['resource_usage']['gpu_stats']:\n            print(f\" GPU Count: {len(report['resource_usage']['gpu_stats'])}\")\n        print(f\" Hourly Rate: ${report['cost_estimate']['hourly_rate']:.3f}\")\n\n        if report['recommendations']:\n            print(\"\\nCost Optimization Recommendations:\")\n            for rec in report['recommendations']:\n                print(f\" \u2022 {rec}\")\n\n# Example 7: Spot VM configuration for cost savings\ndef configure_spot_vm():\n    \"\"\"Example configuration for using Azure Spot VMs.\"\"\"\n    configure(\n        cluster_type=\"ssh\",\n        cluster_host=\"your-spot-vm-ip\",\n        username=\"azureuser\",\n        key_file=\"~/.ssh/id_rsa\",\n        remote_work_dir=\"/tmp/clustrix\",\n        # Spot VMs can be evicted, so use shorter timeouts\n        default_time=\"00:30:00\",\n        job_poll_interval=60,  # Seconds between status checks; lower it to notice evictions sooner\n        cleanup_on_success=True  # Clean up quickly\n    )\n    return \"Configured for Azure Spot VMs with appropriate timeouts.\"\n\n# Run examples\nprint(\"Azure Cost Monitoring Examples:\")\nprint(\"=\" * 40)\n\nprint(\"\\n1. Azure VM Pricing Comparison:\")\ncompare_azure_pricing()\n\nprint(\"\\n2. Spot VM Savings Analysis:\")\nazure_spot_cost_analysis()\n\nprint(\"\\n3. Azure Batch Cost Estimation:\")\nestimate_azure_batch_costs()\n\nprint(\"\\n4. Regional Pricing Comparison:\")\ncompare_azure_regions()\n\nprint(\"\\n5. 
Current Azure Status:\")\nmonitor_azure_costs()\n\nprint(\"\\n\u2705 Azure cost monitoring examples ready!\")\nprint(\"\ud83d\udca1 Use @cost_tracking_decorator('azure', 'vm_size') for automatic cost tracking\")\n\n# Example spot VM configuration (uncomment to use)\n# spot_config = configure_spot_vm()\n# print(f\"Configuration result: {spot_config}\")", "execution_count": null }, { "cell_type": "markdown", "id": "hnir1aze4v", "source": "### Azure Cost Optimization for Clustrix\n\n#### Cost Monitoring Commands\n\n```bash\n# Create a monthly cost budget\naz consumption budget create \\\n    --budget-name clustrix-monthly-budget \\\n    --amount 100 \\\n    --category cost \\\n    --time-grain monthly \\\n    --start-date 2025-01-01 \\\n    --end-date 2025-12-31\n\n# Get current costs\naz consumption usage list \\\n    --start-date 2025-01-01 \\\n    --end-date 2025-01-31\n\n# List resource costs by resource group (requires the costmanagement CLI extension)\naz costmanagement query \\\n    --type Usage \\\n    --dataset-aggregation '{\"totalCost\":{\"name\":\"PreTaxCost\",\"function\":\"Sum\"}}' \\\n    --dataset-grouping name=ResourceGroup type=Dimension\n\n# Set up auto-shutdown for VMs (--time is HHMM in UTC)\naz vm auto-shutdown \\\n    --resource-group clustrix-tutorial-rg \\\n    --name clustrix-vm-01 \\\n    --time 1900 \\\n    --email your-email@example.com\n```\n\n#### Cost Optimization Recommendations\n\n1. **Use Spot VMs** for batch processing (up to 90% savings)\n2. **Enable auto-shutdown** for dev resources\n3. **Implement lifecycle policies** for blob storage\n4. **Set up budget alerts** and spending limits\n5. **Review costs regularly** and optimize resources\n6. **Use reserved instances** for predictable workloads\n7. **Choose appropriate VM sizes** based on actual usage", "metadata": {} }, { "cell_type": "markdown", "id": "scseti9hu", "source": "### Azure Cost Optimization Strategies by Category\n\n#### 1. 
Compute Optimization\n- **Use Azure Spot VMs** for non-critical workloads (up to 90% savings)\n- **Choose B-series burstable VMs** for variable workloads\n- **Use reserved instances** for predictable workloads (1- or 3-year terms)\n- **Enable auto-shutdown** for dev/test VMs\n- **Right-size VMs** based on actual usage\n\n#### 2. Storage Optimization\n- **Use appropriate storage tiers** (Hot, Cool, Archive)\n- **Enable lifecycle management** for blob storage\n- **Use managed disks** with appropriate performance tiers\n- **Implement data deduplication** and compression\n\n#### 3. Network Optimization\n- **Minimize data transfer** between regions\n- **Use Azure CDN** for static content\n- **Optimize data transfer** patterns\n\n#### 4. Monitoring and Management\n- **Set up budget alerts** and spending limits\n- **Use Azure Cost Management + Billing**\n- **Implement proper resource tagging**\n- **Review costs regularly** and optimize\n\n#### 5. Service-Specific\n- **Use Azure Functions** for small, event-driven tasks\n- **Consider Azure Container Instances** for short-running jobs\n- **Use Azure Batch** for large-scale parallel processing", "metadata": {} }, { "cell_type": "markdown", "id": "cleanup-azure", "metadata": {}, "source": [ "## Resource Cleanup" ] }, { "cell_type": "code", "id": "cleanup-azure-resources", "metadata": {}, "outputs": [], "source": "def cleanup_azure_resources(resource_group='clustrix-tutorial-rg'):\n    \"\"\"\n    Clean up Azure resources to avoid ongoing charges.\n\n    Args:\n        resource_group: Name of the resource group to clean up\n    \"\"\"\n\n    cleanup_commands = f\"\"\"\n# List all resources in the resource group\naz resource list --resource-group {resource_group} --output table\n\n# Deallocate the VM first (shuts it down gracefully and stops compute billing)\naz vm deallocate --resource-group {resource_group} --name clustrix-vm-01\n\n# Delete specific resources individually (optional - more granular control)\n# az vm delete --resource-group {resource_group} --name clustrix-vm-01 --yes\n# Wildcards are not expanded in --name; look up the exact disk name with 'az disk list'\n# az disk delete --resource-group {resource_group} --name <os-disk-name> --yes\n# az network public-ip delete --resource-group {resource_group} --name clustrix-vm-01PublicIP\n\n# WARNING: Delete the entire resource group (removes ALL resources)\naz group delete --name {resource_group} --yes --no-wait\n\n# Verify deletion (prints 'false' once the asynchronous delete has finished)\naz group exists --name {resource_group}\n\"\"\"\n\n    return {\n        'resource_group': resource_group,\n        
'cleanup_commands': cleanup_commands\n    }\n\ncleanup_info = cleanup_azure_resources()\n\nprint(f\"Azure Resource Cleanup Commands for Resource Group: {cleanup_info['resource_group']}\")\nprint(\"=\" * 70)\nprint(cleanup_info['cleanup_commands'])\nprint(\"\\n\" + \"\u26a0\ufe0f \" * 10 + \" IMPORTANT WARNINGS \" + \"\u26a0\ufe0f \" * 10)\nprint(\"1. The 'az group delete' command will permanently delete ALL resources in the group!\")\nprint(\"2. Review the resources first with 'az resource list' before proceeding\")\nprint(\"3. Make sure to back up any important data before deletion\")\nprint(\"4. Consider deallocating VMs instead of deleting them if you plan to use them again\")\nprint(\"5. 
Deleted resources cannot be recovered - this action is irreversible!\")\nprint(\"=\" * 70)", "execution_count": null }, { "cell_type": "markdown", "id": "advanced-azure-example", "metadata": {}, "source": [ "## Advanced Example: Distributed Image Processing" ] }, { "cell_type": "code", "id": "image-processing-example", "metadata": {}, "outputs": [], "source": "@cluster(cores=4, memory=\"8GB\", time=\"00:45:00\")\ndef azure_image_processing_pipeline(storage_config, processing_params):\n    \"\"\"\n    Distributed image processing pipeline using Azure Blob Storage.\n    \"\"\"\n    from azure.storage.blob import BlobServiceClient\n    from azure.identity import DefaultAzureCredential\n    import numpy as np\n    from PIL import Image\n    import io\n    import time\n\n    # Connect to Azure Blob Storage\n    account_url = f\"https://{storage_config['account_name']}.blob.core.windows.net\"\n    credential = DefaultAzureCredential()\n    blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)\n\n    container_client = blob_service_client.get_container_client(storage_config['container'])\n\n    processed_images = []\n    processing_stats = []\n\n    # List images to process\n    blob_list = container_client.list_blobs(name_starts_with=storage_config['input_prefix'])\n\n    for blob in blob_list:\n        if blob.name.lower().endswith(('.png', '.jpg', '.jpeg')):\n            start_time = time.time()\n\n            try:\n                # Download image\n                blob_client = blob_service_client.get_blob_client(\n                    container=storage_config['container'], blob=blob.name\n                )\n                image_data = blob_client.download_blob().readall()\n\n                # Process image\n                image = Image.open(io.BytesIO(image_data))\n                original_size = image.size  # Record dimensions before any transformation\n\n                # Apply processing operations\n                if processing_params.get('resize'):\n                    image = image.resize(processing_params['resize'])\n\n                if processing_params.get('grayscale'):\n                    image = image.convert('L')\n\n                if processing_params.get('rotate'):\n                    image = image.rotate(processing_params['rotate'])\n\n                # Convert back to bytes\n                output_buffer = io.BytesIO()\n                image.save(output_buffer, 
format='PNG')\n                output_buffer.seek(0)\n\n                # Upload processed image\n                output_blob_name = blob.name.replace(\n                    storage_config['input_prefix'],\n                    storage_config['output_prefix'],\n                    1  # Only rewrite the leading prefix\n                )\n\n                output_blob_client = blob_service_client.get_blob_client(\n                    container=storage_config['container'], blob=output_blob_name\n                )\n                output_blob_client.upload_blob(output_buffer.getvalue(), overwrite=True)\n\n                processing_time = time.time() - start_time\n\n                processed_images.append(output_blob_name)\n                processing_stats.append({\n                    'input_blob': blob.name,\n                    'output_blob': output_blob_name,\n                    'processing_time': processing_time,\n                    'original_size': original_size,\n                    'processed_size': image.size\n                })\n\n            except Exception as e:\n                print(f\"Error processing {blob.name}: {e}\")\n\n    return {\n        'processed_count': len(processed_images),\n        'total_processing_time': sum(stat['processing_time'] for stat in processing_stats),\n        'average_processing_time': np.mean([stat['processing_time'] for stat in processing_stats]) if processing_stats else 0,\n        'processed_images': processed_images[:10],  # First 10 for brevity\n        'processing_stats': processing_stats[:5]  # First 5 for brevity\n    }\n\n# Example usage (uncomment and modify as needed):\n# storage_config = {\n#     'account_name': 'yourstorageaccount',\n#     'container': 'images',\n#     'input_prefix': 'raw/',\n#     'output_prefix': 'processed/'\n# }\n#\n# processing_config = {\n#     'resize': (800, 600),\n#     'grayscale': True,\n#     'rotate': 0\n# }\n#\n# result = azure_image_processing_pipeline(storage_config, processing_config)\n# print(f\"Processed {result['processed_count']} images in {result['total_processing_time']:.2f} seconds\")\n\nprint(\"Advanced image processing pipeline example defined.\")", "execution_count": null }, { "cell_type": "markdown", "id": "azure-summary", "metadata": {}, "source": [ "## Summary\n", "\n", "This tutorial covered:\n", "\n", "1. **Setup**: Azure authentication and Clustrix installation\n", "2. 
**VM Integration**: Direct Azure VM configuration\n", "3. **Azure Batch**: Managed job scheduling\n", "4. **CycleCloud**: HPC-optimized clusters with SLURM\n", "5. **Blob Storage**: Data storage and retrieval\n", "6. **Azure ML**: Machine learning compute integration\n", "7. **Security**: Best practices for safe deployment\n", "8. **Cost Management**: Strategies to minimize expenses\n", "9. **Resource Management**: Proper cleanup procedures\n", "\n", "### Next Steps\n", "\n", "- Set up your Azure credentials and test the basic configuration\n", "- Start with a simple VM for initial testing\n", "- Consider CycleCloud for production HPC workloads\n", "- Implement proper monitoring and cost controls\n", "- Explore Azure Spot VMs for cost-effective batch processing\n", "\n", "### Azure-Specific Advantages\n", "\n", "- **CycleCloud**: Best-in-class HPC cluster management\n", "- **Azure ML**: Integrated machine learning platform\n", "- **Hybrid Cloud**: Seamless integration with on-premises\n", "- **Enterprise Integration**: Active Directory and enterprise tools\n", "- **Compliance**: Strong compliance and security certifications\n", "\n", "### Resources\n", "\n", "- [Azure CycleCloud Documentation](https://docs.microsoft.com/en-us/azure/cyclecloud/)\n", "- [Azure Batch Documentation](https://docs.microsoft.com/en-us/azure/batch/)\n", "- [Azure Machine Learning Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/)\n", "- [Azure HPC Documentation](https://docs.microsoft.com/en-us/azure/architecture/topics/high-performance-computing/)\n", "- [Clustrix Documentation](https://clustrix.readthedocs.io/)\n", "\n", "**Remember**: Always monitor your Azure costs and clean up resources when not in use!" 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }