Microsoft Azure Cloud Tutorial

This tutorial demonstrates how to use Clustrix with Microsoft Azure cloud infrastructure for scalable distributed computing.

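Overview

The following Azure services are relevant when running Clustrix workloads. Some (Virtual Machines, CycleCloud) work directly with Clustrix's SSH and SLURM backends. Others, such as Batch, need extra integration work: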

Azure Virtual Machines: Scalable compute instances
Azure Batch: Managed job scheduling service
Azure CycleCloud: HPC cluster orchestration
Azure Machine Learning Compute: ML-optimized infrastructure
Azure Container Instances: Serverless containers
Azure Blob Storage: Object storage for data and results
Azure Virtual Network: Network isolation and security

Prerequisites

Required Azure Setup

Azure Account: Active Azure subscription with appropriate permissions
Azure CLI: Installed and configured on your local machine
SSH Key Pair: For secure VM access
Resource Quotas: Sufficient compute quotas in your preferred region
Billing Setup: Credit card or other payment method configured

Local Environment Setup

Python Environment: Python 3.8+ with pip
SSH Client: OpenSSH or equivalent
Git: For version control (optional but recommended)
Code Editor: VS Code, PyCharm, or your preferred editor

Step-by-Step Setup Guide

Step 1: Install Azure CLI

First, install the Azure CLI on your local machine:

Windows (PowerShell):

Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi; Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'; rm .\AzureCLI.msi

macOS (Homebrew):

brew update && brew install azure-cli

Linux (Ubuntu/Debian):

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
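
To confirm the installation from the notebook, you can use a minimal standard-library check. It looks for `az` on your PATH and asks it for its version. This is a sketch: `azure_cli_version` is a hypothetical helper, and it relies on `az version` printing a JSON object that contains an `azure-cli` key.

```python
import json
import shutil
import subprocess


def azure_cli_version(executable="az"):
    """Return the installed Azure CLI version string, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # az is not on PATH
    # `az version` prints a JSON object that includes an "azure-cli" key
    completed = subprocess.run([path, "version"], capture_output=True, text=True)
    if completed.returncode != 0:
        return None
    return json.loads(completed.stdout).get("azure-cli")


version = azure_cli_version()
print(f"Azure CLI version: {version}" if version else "Azure CLI not found on PATH")
```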

Step 2: Create Azure Account and Subscription

Go to the Azure Portal (https://portal.azure.com)
Sign up for a free account (new accounts include a USD 200 credit for the first 30 days)
Complete account verification
Note your Subscription ID from the Azure Portal

Step 3: Install Clustrix with Azure Dependencies

# Install Clustrix with Azure support
!pip install clustrix azure-identity azure-mgmt-compute azure-mgmt-network azure-storage-blob

# Import required libraries
import clustrix
from clustrix import cluster, configure
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.network import NetworkManagementClient
from azure.storage.blob import BlobServiceClient
import os
import numpy as np
import time
import json

Step 4: Azure Authentication Setup

Configure your Azure credentials using one of the two methods below:

Option 1: Azure CLI Authentication (Recommended for Development)

This is the simplest method for getting started:

# Login with Azure CLI (run this in terminal)
# az login

# Set your subscription (replace with your actual subscription ID)
# az account set --subscription "12345678-1234-1234-1234-123456789012"

# Verify authentication
!az account show --output table

Option 2: Service Principal Authentication (Recommended for Production)

For production environments, create a service principal and scope its role assignment as narrowly as possible:

Create Service Principal (run in terminal):

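# Create a service principal with Contributor rights on one subscription
# (for tighter limits, scope it to a single resource group once it exists:
#  /subscriptions/your-subscription-id/resourceGroups/clustrix-tutorial-rg)
az ad sp create-for-rbac --name "clustrix-service-principal" --role contributor --scopes /subscriptions/your-subscription-id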

# The output will include:
# - appId (client ID)
# - password (client secret)
# - tenant (tenant ID)

Set Environment Variables:

# Set Azure credentials as environment variables (replace with your actual values)
# os.environ['AZURE_CLIENT_ID'] = 'your-client-id-from-service-principal'
# os.environ['AZURE_CLIENT_SECRET'] = 'your-client-secret-from-service-principal'
# os.environ['AZURE_TENANT_ID'] = 'your-tenant-id-from-service-principal'

# Test Azure connection
try:
    credential = DefaultAzureCredential()
    subscription_id = os.environ.get('AZURE_SUBSCRIPTION_ID', 'your-subscription-id')  # Replace, or set AZURE_SUBSCRIPTION_ID

    compute_client = ComputeManagementClient(credential, subscription_id)
    # Test by listing VM sizes in East US
    vm_sizes = list(compute_client.virtual_machine_sizes.list('eastus'))
    print(f"Successfully connected to Azure. Available VM sizes: {len(vm_sizes)}")
except Exception as e:
    print(f"Azure connection failed: {e}")
    print("Make sure you have:")
    print("1. Run 'az login' or set service principal environment variables")
    print("2. Set the correct subscription ID")
    print("3. Have appropriate permissions in your Azure subscription")
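
Instead of hard-coding the subscription ID, you can read it from an environment variable and check its format before creating management clients. This sketch assumes you store the ID in `AZURE_SUBSCRIPTION_ID`, a name chosen here for illustration. `resolve_subscription_id` is a hypothetical helper. Subscription IDs are GUIDs, so Python's `uuid` module can validate them.

```python
import os
import uuid


def resolve_subscription_id(env=None, var_name="AZURE_SUBSCRIPTION_ID"):
    """Read a subscription ID from the environment and check that it is a GUID."""
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    if not value:
        raise ValueError(f"Set {var_name} to your Azure subscription ID")
    try:
        # uuid.UUID parses the 8-4-4-4-12 hex form; str() normalizes to lowercase
        return str(uuid.UUID(value))
    except ValueError:
        raise ValueError(f"{var_name}={value!r} is not a valid GUID") from None
```

You could then pass `resolve_subscription_id()` as the `subscription_id` argument to `ComputeManagementClient`.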

Step 5: Generate SSH Key Pair

Clustrix requires SSH access to remote VMs. Generate an SSH key pair if you don’t have one:

Generate SSH Key (run in terminal):

# Generate SSH key pair (press Enter for default location)
ssh-keygen -t rsa -b 4096 -C "your-email@example.com"

# Add key to SSH agent
ssh-add ~/.ssh/id_rsa

# Display public key (you'll need this for VM creation)
cat ~/.ssh/id_rsa.pub

Important Notes:

Keep your private key (~/.ssh/id_rsa) secure and never share it
You’ll use the public key (~/.ssh/id_rsa.pub) when creating Azure VMs
Create this key pair before Step 6 so the VM is provisioned with this public key
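
OpenSSH refuses to use a private key that other users can read, and reports "UNPROTECTED PRIVATE KEY FILE". The sketch below flags this before Clustrix tries to connect. It assumes a POSIX system, because Windows file modes do not map onto these permission bits. `check_private_key_permissions` is a hypothetical helper.

```python
import os
import stat


def check_private_key_permissions(key_path="~/.ssh/id_rsa"):
    """Return a list of problems with an SSH private key file (empty if OK)."""
    path = os.path.expanduser(key_path)
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    mode = stat.S_IMODE(os.stat(path).st_mode)
    problems = []
    # All group and other permission bits should be off (e.g. chmod 600)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} has mode {oct(mode)}; run: chmod 600 {path}")
    return problems


for problem in check_private_key_permissions():
    print("Warning:", problem)
```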

Method 1: Azure Virtual Machines Configuration

Step 6: Create Resource Group and Azure VM for Clustrix

The helper below generates the Azure CLI commands that create a resource group and a VM. It also returns a cloud-init script that prepares the VM for Clustrix:

def create_clustrix_vm(resource_group, vm_name, location='eastus', vm_size='Standard_D4s_v3'):
    """
    Create an Azure VM configured for Clustrix.

    Args:
        resource_group: Azure resource group name
        vm_name: Name for the VM
        location: Azure region
        vm_size: VM size (CPU/memory configuration)

    Returns:
        VM details including public IP
    """
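    # Cloud-init script for VM setup. cloud-init requires "#cloud-config" to be
    # the very first line, so the text starts right after the opening quotes.
    cloud_init_script = '''#cloud-config
package_update: true
packages:
  - python3
  - python3-pip
  - git
  - htop

runcmd:
  # Install clustrix, common packages, and the Azure SDKs used by remote functions
  - pip3 install clustrix numpy scipy pandas scikit-learn azure-identity azure-storage-blob

  # Install uv system-wide (pip as root puts it in /usr/local/bin) so the
  # clustrix user can use it for faster package management
  - pip3 install uv

  # Create clustrix user with passwordless sudo via a drop-in file
  - useradd -m -s /bin/bash clustrix
  - usermod -aG sudo clustrix
  - echo "clustrix ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/clustrix
  - chmod 440 /etc/sudoers.d/clustrix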

  # Setup SSH for clustrix user
  - mkdir -p /home/clustrix/.ssh
  - cp /home/azureuser/.ssh/authorized_keys /home/clustrix/.ssh/
  - chown -R clustrix:clustrix /home/clustrix/.ssh
  - chmod 700 /home/clustrix/.ssh
  - chmod 600 /home/clustrix/.ssh/authorized_keys

  # Create working directory
  - mkdir -p /tmp/clustrix
  - chown clustrix:clustrix /tmp/clustrix
'''

    # Azure CLI commands for VM creation
    azure_commands = f"""
# Create resource group
az group create --name {resource_group} --location {location}

# Create VM with cloud-init
az vm create \\
  --resource-group {resource_group} \\
  --name {vm_name} \\
  --image Ubuntu2204 \\
  --size {vm_size} \\
  --admin-username azureuser \\
  --ssh-key-values ~/.ssh/id_rsa.pub \\
  --custom-data cloud-init.txt \\
  --public-ip-sku Standard \\
  --tags Purpose=Clustrix Environment=Tutorial

# Get public IP
az vm show \\
  --resource-group {resource_group} \\
  --name {vm_name} \\
  --show-details \\
  --query publicIps \\
  --output tsv
"""

    return {
        'resource_group': resource_group,
        'vm_name': vm_name,
        'location': location,
        'vm_size': vm_size,
        'commands': azure_commands,
        'cloud_init': cloud_init_script
    }

# Example VM configuration
vm_config = create_clustrix_vm(
    resource_group='clustrix-tutorial-rg',
    vm_name='clustrix-vm-01',
    location='eastus',
    vm_size='Standard_D4s_v3'  # 4 vCPUs, 16 GB RAM
)

print("Save the cloud-init script to a file called 'cloud-init.txt' in your current directory")
print("Then execute these Azure CLI commands to create your VM:")
print("-" * 60)
print(vm_config['commands'])

Cloud-Init Script

Save this cloud-init script to a file named cloud-init.txt in your current directory:

# Display the cloud-init script content
print(vm_config['cloud_init'])
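
You can also write the file from Python instead of copying the script by hand. cloud-init only treats user data as cloud-config when its first line is `#cloud-config`, so this sketch strips leading whitespace and checks that line before saving. `save_cloud_init` is a hypothetical helper; pass it `vm_config['cloud_init']` from the cell above.

```python
from pathlib import Path


def save_cloud_init(script, path="cloud-init.txt"):
    """Write a cloud-config script to disk after checking its header line."""
    content = script.lstrip()  # drop blank lines before the header
    first_line = content.splitlines()[0] if content else ""
    if first_line.strip() != "#cloud-config":
        raise ValueError("cloud-init user data must start with '#cloud-config'")
    out = Path(path)
    out.write_text(content if content.endswith("\n") else content + "\n")
    return out.resolve()


# Usage with the configuration generated above:
# print(f"Wrote {save_cloud_init(vm_config['cloud_init'])}")
```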

Step 7: Configure Clustrix for Azure VM

After your VM is created and you have the public IP address, configure Clustrix to use it:

# Configure Clustrix to use your Azure VM
# Replace 'your-vm-public-ip' with the actual IP from: az vm show --resource-group clustrix-tutorial-rg --name clustrix-vm-01 --show-details --query publicIps --output tsv

configure(
    cluster_type="ssh",
    cluster_host="your-vm-public-ip",  # Replace with actual IP
    username="clustrix",  # or "azureuser" if using default user
    key_file="~/.ssh/id_rsa",  # Private key from Step 5 (matches the public key on the VM)
    remote_work_dir="/tmp/clustrix",
    package_manager="auto",  # Will use uv if available
    default_cores=4,
    default_memory="8GB",
    default_time="01:00:00"
)

print("Clustrix configured for Azure VM")
print("Make sure to replace 'your-vm-public-ip' with your actual VM's public IP address")
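
To avoid pasting the IP address by hand, you can call the same `az vm show` query used above and pass the result to `configure`. The helper `get_vm_public_ip` in this sketch is hypothetical. It takes a `runner` argument, which defaults to `subprocess.run`, so the parsing can be tested without Azure.

```python
import subprocess


def get_vm_public_ip(resource_group, vm_name, runner=subprocess.run):
    """Return a VM's public IP using `az vm show --show-details`."""
    cmd = [
        "az", "vm", "show",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--show-details",
        "--query", "publicIps",
        "--output", "tsv",
    ]
    completed = runner(cmd, capture_output=True, text=True, check=True)
    ip = completed.stdout.strip()
    if not ip:
        raise RuntimeError(f"No public IP reported for {vm_name}; is it running?")
    return ip


# Usage (requires `az login`):
# vm_ip = get_vm_public_ip("clustrix-tutorial-rg", "clustrix-vm-01")
# configure(cluster_type="ssh", cluster_host=vm_ip, username="clustrix",
#           key_file="~/.ssh/id_rsa", remote_work_dir="/tmp/clustrix")
```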

Testing Your Azure VM Connection

Before running Clustrix jobs, test your SSH connection to the VM:

# Test SSH connection (replace with your actual IP)
ssh -i ~/.ssh/id_rsa clustrix@your-vm-public-ip

# Or if using default azureuser:
ssh -i ~/.ssh/id_rsa azureuser@your-vm-public-ip

Troubleshooting Connection Issues:

Ensure your VM is running: az vm show --resource-group clustrix-tutorial-rg --name clustrix-vm-01 --show-details --query powerState
Check Network Security Group rules allow SSH (port 22)
Verify your SSH key is correct and has proper permissions (chmod 600 ~/.ssh/id_rsa)
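
If SSH hangs instead of failing quickly, check whether port 22 is reachable at all. This separates network security group or firewall problems from key problems. The sketch below uses only the standard library. `port_is_open` is a hypothetical helper, and the 5-second timeout is an arbitrary choice.

```python
import socket


def port_is_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False


# Usage (replace with your VM's public IP):
# print(port_is_open("your-vm-public-ip"))
```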

Example: Remote Computation on Azure VM

@cluster(cores=2, memory="4GB")
def azure_numerical_analysis(matrix_size=1000, iterations=10):
    """Perform numerical analysis on Azure VM."""
    import numpy as np
    import time

    results = []

    for i in range(iterations):
        # Generate random matrix
        matrix = np.random.rand(matrix_size, matrix_size)

        # Perform eigenvalue decomposition
        start_time = time.time()
        eigenvalues = np.linalg.eigvals(matrix)
        computation_time = time.time() - start_time

        results.append({
            'iteration': i + 1,
            'max_eigenvalue': float(np.max(eigenvalues.real)),
            'min_eigenvalue': float(np.min(eigenvalues.real)),
            'computation_time': computation_time
        })

    return {
        'matrix_size': matrix_size,
        'total_iterations': iterations,
        'average_time': np.mean([r['computation_time'] for r in results]),
        'results': results
    }

# Run computation on Azure VM (uncomment after configuring your VM)
# result = azure_numerical_analysis(matrix_size=500, iterations=5)
# print(f"Completed {result['total_iterations']} iterations")
# print(f"Average computation time: {result['average_time']:.3f} seconds")

print("Example function defined. Configure your VM IP address and uncomment the lines above to run.")

Method 2: Azure Batch Configuration

Azure Batch provides managed job scheduling for large-scale parallel workloads:

def setup_azure_batch_environment():
    """
    Template for setting up Azure Batch environment.
    This requires manual setup through Azure portal or CLI.
    """

    batch_setup_commands = """
# Create Azure Batch account
az batch account create \\
  --name clustrixbatch \\
  --resource-group clustrix-tutorial-rg \\
  --location eastus

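# Create storage account for Batch (storage account names must be
# globally unique: 3-24 lowercase letters and digits)
az storage account create \\
  --name clustrixstorage \\
  --resource-group clustrix-tutorial-rg \\
  --location eastus \\
  --sku Standard_LRS

# Link storage to Batch account
az batch account set \\
  --name clustrixbatch \\
  --resource-group clustrix-tutorial-rg \\
  --storage-account clustrixstorage

# Authenticate the CLI against the Batch account; the pool and job
# commands below fail without an active Batch account context
az batch account login \\
  --name clustrixbatch \\
  --resource-group clustrix-tutorial-rg \\
  --shared-key-auth

# Create Batch pool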
az batch pool create \\
  --id clustrix-pool \\
  --vm-size Standard_D2s_v3 \\
  --target-dedicated-nodes 2 \\
  --image canonical:0001-com-ubuntu-server-jammy:22_04-lts \\
  --node-agent-sku-id "batch.node.ubuntu 22.04"

# Create Batch job
az batch job create \\
  --id clustrix-job \\
  --pool-id clustrix-pool
"""

    batch_config = {
        'account_name': 'clustrixbatch',
        'account_url': 'https://clustrixbatch.eastus.batch.azure.com',
        'resource_group': 'clustrix-tutorial-rg',
        'pool_id': 'clustrix-pool',
        'job_id': 'clustrix-job'
    }

    return batch_config, batch_setup_commands

batch_config, batch_commands = setup_azure_batch_environment()

print("Azure Batch Configuration:")
print(json.dumps(batch_config, indent=2))
print("\nTo set up Azure Batch, run these commands:")
print("-" * 50)
print(batch_commands)

Important Notes for Azure Batch:

Azure Batch integration with Clustrix requires custom implementation
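Consider Azure CycleCloud (Method 3) for HPC workloads instead, since it provides a SLURM scheduler that Clustrix can target directly
Batch is better suited to large numbers of independent tasks that Batch itself schedules and runs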

Method 3: Azure CycleCloud Integration

Azure CycleCloud is designed for HPC workloads and provides SLURM integration:

# Azure CycleCloud cluster template for Clustrix
cyclecloud_template = """
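# CycleCloud SLURM cluster template (abridged)
# Save as clustrix-slurm.txt and import into CycleCloud.
# Several parameters used below, such as $Credentials, $SubnetId and $UseLowPrio,
# have no [[[parameter ...]]] definition here; add them, or start from the
# stock CycleCloud Slurm template, before importing. The "clustrix"
# cluster-init project referenced below must also be created and uploaded
# to your CycleCloud locker separately.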

[cluster clustrix-slurm]
FormLayout = selectionpanel
Category = Schedulers
IconUrl = static/cloud/cluster/ui/ClusterIcon/slurm.png

    [[node defaults]]
    UsePublicNetwork = false
    Credentials = $Credentials
    SubnetId = $SubnetId
    Region = $Region
    KeyPairLocation = ~/.ssh/cyclecloud.pem

    # Install clustrix on all nodes
    [[[configuration]]]
    clustrix.version = latest

    [[[cluster-init clustrix:default:1.0.0]]]

    [[node master]]
    MachineType = $MasterMachineType
    IsReturnProxy = $ReturnProxy
    AdditionalClusterInitSpecs = $MasterClusterInitSpecs

        [[[configuration]]]
        slurm.version = $configuration_slurm_version

        [[[cluster-init slurm:master:2.7.2]]]

        [[[network-interface eth0]]]
        AssociatePublicIpAddress = $UsePublicNetwork

    [[nodearray execute]]
    MachineType = $ExecuteMachineType
    MaxCoreCount = $MaxExecuteCoreCount
    Interruptible = $UseLowPrio
    AdditionalClusterInitSpecs = $ExecuteClusterInitSpecs

        [[[configuration]]]
        slurm.version = $configuration_slurm_version

        [[[cluster-init slurm:execute:2.7.2]]]

        [[[network-interface eth0]]]
        AssociatePublicIpAddress = false

[parameters About]
Order = 1

    [[parameters About Clustrix]]

        [[[parameter clustrix]]]
        HideLabel = true
        Config.Plugin = pico.widget.HtmlTemplateWidget
        Config.Template = "Clustrix-enabled SLURM cluster for distributed computing"

[parameters Required Settings]
Order = 10

    [[parameters Virtual Machines]]
    Description = "Configure the VM types and sizes"
    Order = 20

        [[[parameter Region]]]
        Label = Region
        Description = Deployment Location
        ParameterType = Cloud.Region
        DefaultValue = eastus

        [[[parameter MasterMachineType]]]
        Label = Master VM Type
        Description = Master node VM type
        ParameterType = Cloud.MachineType
        DefaultValue = Standard_D4s_v3

        [[[parameter ExecuteMachineType]]]
        Label = Execute VM Type
        Description = Execute node VM type
        ParameterType = Cloud.MachineType
        DefaultValue = Standard_H16r

"""

def configure_for_cyclecloud(master_ip, cluster_name="clustrix-slurm"):
    """Configure Clustrix to use Azure CycleCloud SLURM cluster."""
    configure(
        cluster_type="slurm",
        cluster_host=master_ip,
        username="cyclecloud",  # Default CycleCloud user
        key_file="~/.ssh/cyclecloud.pem",
        remote_work_dir="/shared/clustrix",  # Use shared storage
        package_manager="uv",
        module_loads=["python3"],
        environment_variables={
            "CLUSTRIX_CLUSTER": cluster_name
        },
        default_cores=8,
        default_memory="16GB",
        default_time="02:00:00",
        default_partition="hpc"
    )
    return f"Configured Clustrix for CycleCloud cluster: {cluster_name}"

print("CycleCloud Template (save as clustrix-slurm.txt):")
print(cyclecloud_template)

# Example configuration (uncomment and modify as needed)
# config_message = configure_for_cyclecloud("10.1.0.4", "my-clustrix-cluster")
# print(config_message)
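
The template references parameters it does not define, so a quick check before importing can list the gaps. This sketch uses two regular expressions over the template text: `$Name` for references and `[[[parameter Name]]]` for definitions. It is a heuristic, not a CycleCloud template parser. `undefined_template_parameters` is a hypothetical helper.

```python
import re


def undefined_template_parameters(template_text):
    """Return sorted $Parameter names that have no [[[parameter ...]]] block."""
    referenced = set(re.findall(r"\$([A-Za-z_][A-Za-z0-9_]*)", template_text))
    defined = set(
        re.findall(r"\[\[\[parameter\s+([A-Za-z_][A-Za-z0-9_]*)\s*\]\]\]", template_text)
    )
    return sorted(referenced - defined)


# Usage with the template string above:
# print(undefined_template_parameters(cyclecloud_template))
```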

Azure CycleCloud Benefits:

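Purpose-built HPC cluster orchestration on Azure
Native SLURM support, which Clustrix targets with cluster_type="slurm"
Autoscaling of execute nodes up to a configured limit (MaxCoreCount in the template), with idle nodes shut down to save cost
Deployment inside your own virtual network, with nodes kept off the public internet (UsePublicNetwork = false above)
Hybrid cloud capabilities for on-premises integration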

Data Management with Azure Blob Storage

@cluster(cores=2, memory="4GB")
def process_blob_data(storage_account, container_name, input_blob, output_blob, storage_key=None):
    """Process data from Azure Blob Storage and save results back."""
    # Imports run on the remote VM, which needs azure-identity and
    # azure-storage-blob installed alongside numpy
    from azure.storage.blob import BlobServiceClient
    from azure.identity import DefaultAzureCredential
    import numpy as np
    import pickle
    import io
    import time

    # Initialize Blob Service Client
    account_url = f"https://{storage_account}.blob.core.windows.net"
    if storage_key:
        blob_service_client = BlobServiceClient(account_url=account_url, credential=storage_key)
    else:
        # Use managed identity or Azure CLI authentication; the identity needs a
        # data-plane role such as "Storage Blob Data Contributor" on the account
        credential = DefaultAzureCredential()
        blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)

    # Download data from blob storage (only unpickle blobs you wrote yourself:
    # pickle can execute arbitrary code while loading)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=input_blob)
    blob_data = blob_client.download_blob()
    data = pickle.loads(blob_data.readall())

    # Process the data
    processed_data = {
        'original_shape': data.shape if hasattr(data, 'shape') else len(data),
        'mean': float(np.mean(data)) if hasattr(data, '__iter__') else float(data),
        'std': float(np.std(data)) if hasattr(data, '__iter__') else 0.0,
        'max': float(np.max(data)) if hasattr(data, '__iter__') else float(data),
        'min': float(np.min(data)) if hasattr(data, '__iter__') else float(data),
        'processing_timestamp': time.time(),
        'processed_on': 'azure-vm'
    }

    # Upload results to blob storage
    output_buffer = io.BytesIO()
    pickle.dump(processed_data, output_buffer)
    output_buffer.seek(0)

    output_blob_client = blob_service_client.get_blob_client(container=container_name, blob=output_blob)
    output_blob_client.upload_blob(output_buffer.getvalue(), overwrite=True)

    return f"Processed data saved to blob: {output_blob}"

# Utility functions for Azure Blob Storage (run locally in the notebook)
import io
import pickle

def _get_blob_service_client(storage_account, storage_key=None):
    """Create a BlobServiceClient from an account key or DefaultAzureCredential."""
    account_url = f"https://{storage_account}.blob.core.windows.net"
    credential = storage_key if storage_key else DefaultAzureCredential()
    return BlobServiceClient(account_url=account_url, credential=credential)

def upload_to_blob(data, storage_account, container_name, blob_name, storage_key=None):
    """Pickle data and upload it to Azure Blob Storage."""
    blob_service_client = _get_blob_service_client(storage_account, storage_key)
    buffer = io.BytesIO()
    pickle.dump(data, buffer)

    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    blob_client.upload_blob(buffer.getvalue(), overwrite=True)
    return f"Data uploaded to blob: {blob_name}"

def download_from_blob(storage_account, container_name, blob_name, storage_key=None):
    """Download and unpickle data from Azure Blob Storage (trusted blobs only)."""
    blob_service_client = _get_blob_service_client(storage_account, storage_key)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    return pickle.loads(blob_client.download_blob().readall())

# Example usage (uncomment and modify as needed):
# sample_data = np.random.rand(1000, 50)
# upload_result = upload_to_blob(sample_data, 'yourstorageaccount', 'data', 'input/sample.pkl')
# print(upload_result)
#
# process_result = process_blob_data('yourstorageaccount', 'data', 'input/sample.pkl', 'output/results.pkl')
# print(process_result)

print("Azure Blob Storage integration functions defined.")
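
The helpers above use pickle, which can execute arbitrary code while loading, so only use them on blobs you wrote yourself. For NumPy arrays, the `.npy` format with `allow_pickle=False` is a safer alternative. The sketch below covers only the byte conversion. You could pass its output to `upload_blob`, and feed `download_blob().readall()` back into it, in place of the pickle calls. Both helper names are hypothetical.

```python
import io

import numpy as np


def array_to_npy_bytes(array):
    """Serialize a NumPy array to .npy bytes without using pickle."""
    buffer = io.BytesIO()
    np.save(buffer, np.asarray(array), allow_pickle=False)
    return buffer.getvalue()


def npy_bytes_to_array(data):
    """Deserialize .npy bytes, refusing pickled object arrays."""
    return np.load(io.BytesIO(data), allow_pickle=False)


# Usage with a blob client obtained from get_blob_client(...):
# blob_client.upload_blob(array_to_npy_bytes(sample_data), overwrite=True)
# arr = npy_bytes_to_array(blob_client.download_blob().readall())
```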

Azure Machine Learning Compute Integration

def setup_azure_ml_compute():
    """
    Template for setting up Azure ML compute clusters.
    These can be used with Clustrix for ML workloads.
    """

    aml_setup_commands = """
# Install the Azure ML CLI (v2) extension if it is not already present
az extension add --name ml

# Create Azure ML workspace
az ml workspace create \\
  --name clustrix-ml-workspace \\
  --resource-group clustrix-tutorial-rg \\
  --location eastus

# Create compute cluster
az ml compute create \\
  --name clustrix-compute \\
  --type amlcompute \\
  --min-instances 0 \\
  --max-instances 4 \\
  --size Standard_DS3_v2 \\
  --workspace-name clustrix-ml-workspace \\
  --resource-group clustrix-tutorial-rg

# Create compute instance for development
az ml compute create \\
  --name clustrix-dev-instance \\
  --type computeinstance \\
  --size Standard_DS3_v2 \\
  --workspace-name clustrix-ml-workspace \\
  --resource-group clustrix-tutorial-rg
"""

    return {
        'workspace': 'clustrix-ml-workspace',
        'compute_cluster': 'clustrix-compute',
        'compute_instance': 'clustrix-dev-instance',
        'commands': aml_setup_commands
    }

@cluster(cores=4, memory="8GB")
def azure_ml_training_job(dataset_params, model_params):
    """Example ML training job that could run on Azure ML compute (or any cluster set with configure())."""
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_classification
    import time

    # Generate synthetic dataset (in real scenario, load from Azure ML datasets).
    # make_classification requires n_classes * n_clusters_per_class (default 2)
    # <= 2**n_informative, so n_informative defaults to half the features here.
    X, y = make_classification(
        n_samples=dataset_params['n_samples'],
        n_features=dataset_params['n_features'],
        n_informative=dataset_params.get('n_informative', dataset_params['n_features'] // 2),
        n_classes=dataset_params['n_classes'],
        random_state=42
    )

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Train model
    start_time = time.time()
    model = RandomForestClassifier(**model_params)
    model.fit(X_train, y_train)
    training_time = time.time() - start_time

    # Evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    return {
        'accuracy': accuracy,
        'training_time': training_time,
        'training_samples': len(X_train),
        'test_samples': len(X_test),
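        # (index, importance) of the 10 most important features, highest first
        'feature_importance': [
            (int(i), float(model.feature_importances_[i]))
            for i in np.argsort(model.feature_importances_)[::-1][:10]
        ],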
        'model_params': model_params,
        'dataset_params': dataset_params
    }

aml_config = setup_azure_ml_compute()

print("Azure ML Setup Commands:")
print(aml_config['commands'])

# Example usage (uncomment to run after setting up Azure ML):
# dataset_config = {'n_samples': 10000, 'n_features': 20, 'n_classes': 3}
# model_config = {'n_estimators': 100, 'max_depth': 10, 'random_state': 42, 'n_jobs': -1}
# result = azure_ml_training_job(dataset_config, model_config)
# print(f"Model trained with accuracy: {result['accuracy']:.4f}")

print("Azure ML integration example defined.")
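
scikit-learn's `make_classification` rejects some parameter combinations at runtime. On a cluster, that failure only shows up after the job has been queued. This sketch checks the two documented constraints locally before you submit. First, informative, redundant and repeated features must fit within `n_features`. Second, `n_classes * n_clusters_per_class` must not exceed `2**n_informative`. The defaults mirror scikit-learn's: 2 informative features, 2 redundant features and 2 clusters per class. `check_classification_params` is a hypothetical helper.

```python
def check_classification_params(n_features, n_informative=2, n_redundant=2,
                                n_repeated=0, n_classes=2, n_clusters_per_class=2):
    """Return a list of reasons make_classification would reject these settings."""
    problems = []
    used = n_informative + n_redundant + n_repeated
    if used > n_features:
        problems.append(
            f"n_informative + n_redundant + n_repeated = {used} "
            f"exceeds n_features = {n_features}"
        )
    if n_classes * n_clusters_per_class > 2 ** n_informative:
        problems.append(
            f"n_classes * n_clusters_per_class = {n_classes * n_clusters_per_class} "
            f"exceeds 2**n_informative = {2 ** n_informative}"
        )
    return problems


# With scikit-learn's default n_informative=2, three classes are too many:
print(check_classification_params(n_features=20, n_classes=3))
print(check_classification_params(n_features=20, n_informative=10, n_classes=3))  # → []
```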

Security Best Practices

def setup_azure_security_for_clustrix(resource_group='clustrix-tutorial-rg', location='eastus'):
    """
    Security configuration for Azure + Clustrix deployment.
    """

    security_commands = f"""
# Create virtual network with private subnets
az network vnet create \\
  --resource-group {resource_group} \\
  --name clustrix-vnet \\
  --address-prefixes 10.1.0.0/16 \\
  --subnet-name clustrix-subnet \\
  --subnet-prefixes 10.1.0.0/24 \\
  --location {location}

# Create Network Security Group with restrictive rules
az network nsg create \\
  --resource-group {resource_group} \\
  --name clustrix-nsg \\
  --location {location}

# Allow SSH only from your IP (replace with your actual IP)
az network nsg rule create \\
  --resource-group {resource_group} \\
  --nsg-name clustrix-nsg \\
  --name AllowSSHFromMyIP \\
  --protocol tcp \\
  --priority 1000 \\
  --destination-port-ranges 22 \\
  --source-address-prefixes YOUR_IP_ADDRESS/32 \\
  --access allow

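# Allow internal communication (NSGs already include a default AllowVnetInBound
# rule; this explicit rule just makes the intent visible). Without
# --destination-port-ranges the CLI defaults to port 80 only.
az network nsg rule create \\
  --resource-group {resource_group} \\
  --nsg-name clustrix-nsg \\
  --name AllowVnetInbound \\
  --protocol '*' \\
  --priority 1001 \\
  --source-address-prefixes 10.1.0.0/16 \\
  --destination-address-prefixes 10.1.0.0/16 \\
  --destination-port-ranges '*' \\
  --access allow

# Attach the NSG to the subnet so its rules take effect
az network vnet subnet update \\
  --resource-group {resource_group} \\
  --vnet-name clustrix-vnet \\
  --name clustrix-subnet \\
  --network-security-group clustrix-nsg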

# Create Key Vault for secrets management (vault names are limited to
# 3-24 characters, so keep the prefix short)
az keyvault create \\
  --resource-group {resource_group} \\
  --name clustrix-kv-$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -c1-8) \\
  --location {location} \\
  --enabled-for-disk-encryption true \\
  --sku standard

# Create managed identity for VMs
az identity create \\
  --resource-group {resource_group} \\
  --name clustrix-identity \\
  --location {location}

# Create a hardened storage account (HTTPS only, TLS 1.2 minimum, no public
# blob access); a private endpoint must be configured separately
az storage account create \\
  --resource-group {resource_group} \\
  --name clustrixstorage$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -c1-8) \\
  --location {location} \\
  --sku Standard_LRS \\
  --allow-blob-public-access false \\
  --https-only true \\
  --min-tls-version TLS1_2

# Enable Microsoft Defender for Cloud (formerly Azure Security Center) auto-provisioning
az security auto-provisioning-setting update \\
  --name default \\
  --auto-provision on
"""

    return {
        'resource_group': resource_group,
        'location': location,
        'vnet_name': 'clustrix-vnet',
        'subnet_name': 'clustrix-subnet',
        'nsg_name': 'clustrix-nsg',
        'security_commands': security_commands
    }

security_config = setup_azure_security_for_clustrix()

print("Azure Security Setup Commands:")
print(security_config['security_commands'])
print("\nIMPORTANT: Replace 'YOUR_IP_ADDRESS' with your actual public IP address!")
print("Find your IP with: curl ifconfig.me")
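
Several names in these commands follow strict rules, and the `YOUR_IP_ADDRESS` placeholder must become a CIDR prefix. The standard-library sketch below checks both. Storage account names must be 3-24 lowercase letters and digits. Key Vault names must be 3-24 characters of letters, digits and hyphens. They must also start with a letter, end with a letter or digit, and not contain consecutive hyphens. All three helper names are hypothetical.

```python
import ipaddress
import re

STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Starts with a letter, ends with a letter or digit, 3-24 characters in total;
# consecutive hyphens are rejected separately below
KEY_VAULT_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]{1,22}[A-Za-z0-9]")


def is_valid_storage_account_name(name):
    return bool(STORAGE_ACCOUNT_RE.fullmatch(name))


def is_valid_key_vault_name(name):
    return bool(KEY_VAULT_RE.fullmatch(name)) and "--" not in name


def ssh_source_prefix(ip):
    """Turn a single public IPv4 address into the /32 prefix used by the NSG rule."""
    address = ipaddress.IPv4Address(ip.strip())  # raises ValueError if malformed
    return f"{address}/32"
```

For example, `is_valid_key_vault_name("clustrix-keyvault-1a2b3c4d")` returns False because that name is 26 characters long. `ssh_source_prefix` accepts the output of `curl ifconfig.me` directly.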

Azure Security Checklist for Clustrix

✓ Authentication and Access

Use Microsoft Entra ID (formerly Azure Active Directory) for authentication
Prefer managed identities over service principals with client secrets where possible
Restrict inbound SSH in Network Security Groups to your own IP address
Use private endpoints for storage accounts

✓ Infrastructure Security

Enable disk encryption for all VMs
Use Azure Key Vault for secrets and certificates
Review Microsoft Defender for Cloud (formerly Azure Security Center) recommendations
Use Azure Private Link for service connectivity

✓ Monitoring and Compliance

Enable diagnostic logging and monitoring
Implement Azure Policy for compliance
Enable Microsoft Defender for Cloud workload protection plans (formerly Azure Defender)
Regularly rotate access keys and certificates

✓ Cost and Resource Management

Set up cost alerts and spending limits
Tag all resources for governance and cost tracking

Cost Management and Optimization

# Import Clustrix cost monitoring for Azure
from clustrix import cost_tracking_decorator, get_cost_monitor, generate_cost_report, get_pricing_info

# Example 1: Cost tracking with Azure VMs
@cost_tracking_decorator('azure', 'Standard_NC6s_v3')
@cluster(cores=6, memory="112GB")
def azure_training_with_cost_tracking():
    """Example training function with Azure cost tracking."""
    import time
    import numpy as np

    print("Starting Azure training with cost monitoring...")
    time.sleep(2)  # Simulate training

    # Simulate ML workload
    data = np.random.randn(1500, 1500)
    result = np.linalg.qr(data)

    print("Training completed!")
    # Placeholder metrics for illustration; a real job would compute these
    return {'accuracy': 0.89, 'training_time': 2.0}

# Example 2: Compare Azure VM pricing
def compare_azure_pricing():
    """Compare Azure VM pricing for different instance types."""
    pricing = get_pricing_info('azure')
    if pricing:
        print("Azure VM Pay-as-you-go Pricing (USD/hour):")

        # Group by category
        gpu_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_NC')}
        general_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_D')}
        compute_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_F')}
        memory_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_E')}

        print("\nGPU VMs:")
        for vm, price in sorted(gpu_vms.items(), key=lambda x: x[1]):
            print(f"  {vm:<25}: ${price:.3f}/hour")

        print("\nGeneral Purpose:")
        for vm, price in sorted(general_vms.items(), key=lambda x: x[1]):
            print(f"  {vm:<25}: ${price:.3f}/hour")

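        print("\nCompute Optimized:")
        for vm, price in sorted(compute_vms.items(), key=lambda x: x[1]):
            print(f"  {vm:<25}: ${price:.3f}/hour")

        print("\nMemory Optimized:")
        for vm, price in sorted(memory_vms.items(), key=lambda x: x[1]):
            print(f"  {vm:<25}: ${price:.3f}/hour")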

# Example 3: Azure Spot VM savings analysis
def azure_spot_cost_analysis():
    """Analyze potential savings with Azure Spot VMs."""
    monitor = get_cost_monitor('azure')
    if monitor:
        print("Azure Spot VM Savings Analysis:")
        print("-" * 40)

        vm_types = ['Standard_NC6s_v3', 'Standard_D4s_v3', 'Standard_F8s_v2', 'Standard_E8s_v3']

        for vm in vm_types:
            pay_as_you_go = monitor.estimate_cost(vm, 1.0, use_spot=False)
            spot = monitor.estimate_cost(vm, 1.0, use_spot=True)
            savings = ((pay_as_you_go.hourly_rate - spot.hourly_rate) / pay_as_you_go.hourly_rate) * 100

            print(f"{vm}:")
            print(f"  Pay-as-you-go: ${pay_as_you_go.hourly_rate:.3f}/hour")
            print(f"  Spot:          ${spot.hourly_rate:.3f}/hour")
            print(f"  Savings:       {savings:.1f}%")
            print()

# Example 4: Azure Batch cost estimation
def estimate_azure_batch_costs():
    """Estimate costs for Azure Batch workloads."""
    monitor = get_cost_monitor('azure')
    if monitor:
        batch_estimate = monitor.estimate_batch_cost(
            pool_name="clustrix-batch-pool",
            vm_size="Standard_D4s_v3",
            target_nodes=8,
            estimated_duration_hours=2.0
        )

        print("Azure Batch Cost Estimation:")
        print(f"  Pool Name: {batch_estimate['pool_name']}")
        print(f"  VM Size: {batch_estimate['vm_size']}")
        print(f"  Target Nodes: {batch_estimate['target_nodes']}")
        print(f"  Duration: {batch_estimate['estimated_duration_hours']} hours")
        print(f"  Total Compute Hours: {batch_estimate['total_compute_hours']}")
        print(f"  Estimated Cost: ${batch_estimate['estimated_cost']:.2f}")
        print(f"  Cost per Node-Hour: ${batch_estimate['cost_per_node_hour']:.3f}")

# Example 5: Regional pricing comparison
def compare_azure_regions():
    """Compare Azure pricing across different regions."""
    monitor = get_cost_monitor('azure')
    if monitor:
        print("Azure Regional Pricing Comparison for Standard_NC6s_v3:")
        print("-" * 55)

        regional_pricing = monitor.get_region_pricing_comparison('Standard_NC6s_v3')
        for region, pricing_info in regional_pricing.items():
            print(f"{region}:")
            print(f"  Pay-as-you-go: ${pricing_info['pay_as_you_go_hourly']:.3f}/hour")
            print(f"  Est. Spot:     ${pricing_info['estimated_spot_hourly']:.3f}/hour")
            print()

# Example 6: Real-time Azure cost monitoring
def monitor_azure_costs():
    """Monitor current Azure resource usage and costs."""
    report = generate_cost_report('azure', 'Standard_NC6s_v3')
    if report:
        print("Current Azure Resource Status:")
        print(f"  CPU Usage: {report['resource_usage']['cpu_percent']:.1f}%")
        print(f"  Memory Usage: {report['resource_usage']['memory_percent']:.1f}%")
        if report['resource_usage']['gpu_stats']:
            print(f"  GPU Count: {len(report['resource_usage']['gpu_stats'])}")
        print(f"  Hourly Rate: ${report['cost_estimate']['hourly_rate']:.3f}")

        if report['recommendations']:
            print("\nCost Optimization Recommendations:")
            for rec in report['recommendations']:
                print(f"  • {rec}")

# Example 7: Spot VM configuration for cost savings
def configure_spot_vm():
    """Example configuration for using Azure Spot VMs."""
    configure(
        cluster_type="ssh",
        cluster_host="your-spot-vm-ip",
        username="azureuser",
        key_file="~/.ssh/id_rsa",
        remote_work_dir="/tmp/clustrix",
        # Spot VMs can be evicted at short notice, so keep job time limits short
        default_time="00:30:00",
        job_poll_interval=60,  # poll job status every 60 s so an eviction is noticed sooner
        cleanup_on_success=True  # clean up remote job files as soon as a job succeeds
    )
    return "Configured for Azure Spot VMs with appropriate timeouts."

# Run examples
print("Azure Cost Monitoring Examples:")
print("=" * 40)

print("\n1. Azure VM Pricing Comparison:")
compare_azure_pricing()

print("\n2. Spot VM Savings Analysis:")
azure_spot_cost_analysis()

print("\n3. Azure Batch Cost Estimation:")
estimate_azure_batch_costs()

print("\n4. Regional Pricing Comparison:")
compare_azure_regions()

print("\n5. Current Azure Status:")
monitor_azure_costs()

print("\n✅ Azure cost monitoring examples ready!")
print("💡 Use @cost_tracking_decorator('azure', 'vm_size') for automatic cost tracking")

# Example spot VM configuration (uncomment to use)
# spot_config = configure_spot_vm()
# print(f"Configuration result: {spot_config}")
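The Spot configuration above only shortens timeouts. A job running on a Spot VM can also check for an imminent eviction itself: Azure publishes planned events through the Instance Metadata Service (IMDS) Scheduled Events endpoint, and a Spot eviction appears there as an event whose EventType is "Preempt". The sketch below assumes the code runs on the Azure VM itself (169.254.169.254 is only reachable from inside Azure) and uses the 2020-07-01 API version; the helper names are hypothetical and not part of Clustrix.

```python
import json
import urllib.request

# IMDS Scheduled Events endpoint; only reachable from inside an Azure VM
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def find_preempt_events(payload):
    """Return the events of type 'Preempt' (Spot eviction) from a parsed IMDS response."""
    return [event for event in payload.get("Events", []) if event.get("EventType") == "Preempt"]


def eviction_pending(timeout=2.0):
    """Query IMDS once and report whether a Spot eviction is scheduled for this VM."""
    # IMDS rejects requests that do not carry the Metadata: true header
    request = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return bool(find_preempt_events(payload))
```

A long-running loop could call eviction_pending() every few iterations and save its progress when it returns True. Azure documents a minimum notice of 30 seconds for Spot evictions, so the check needs to run frequently to be useful.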

Azure Cost Optimization for Clustrix¶

Cost Monitoring Commands¶

# Create a monthly cost budget
az consumption budget create \
  --budget-name clustrix-monthly-budget \
  --amount 100 \
  --category cost \
  --time-grain monthly \
  --start-date 2025-01-01 \
  --end-date 2025-12-31

# List usage details for January 2025
az consumption usage list \
  --start-date 2025-01-01 \
  --end-date 2025-01-31

# Month-to-date cost grouped by resource group (requires the costmanagement CLI extension)
az costmanagement query \
  --type Usage \
  --timeframe MonthToDate \
  --scope "subscriptions/<subscription-id>" \
  --dataset-aggregation '{"totalCost":{"name":"PreTaxCost","function":"Sum"}}' \
  --dataset-grouping name=ResourceGroup type=Dimension

# Schedule a daily VM auto-shutdown (the --time value is in UTC)
az vm auto-shutdown \
  --resource-group clustrix-tutorial-rg \
  --name clustrix-vm-01 \
  --time 1900 \
  --email your-email@example.com
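A quick local check can complement the budget created above by showing whether current spending is on track to exceed it before the month ends. The sketch below linearly extrapolates month-to-date spend to the end of the calendar month; the month-to-date figure would come from Cost Management (for example a MonthToDate cost query), and the function names are hypothetical. Linear extrapolation assumes spending is roughly even across the month, which bursty batch workloads may not satisfy.

```python
import calendar
from datetime import date


def projected_month_spend(spent_to_date, today):
    """Linearly extrapolate month-to-date spend (counting today) to the full calendar month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spent_to_date / today.day * days_in_month


def budget_status(spent_to_date, budget, today=None):
    """Compare projected month-end spend with a monthly budget amount."""
    today = today or date.today()
    projected = projected_month_spend(spent_to_date, today)
    return {
        "projected": round(projected, 2),
        "percent_of_budget": round(100 * projected / budget, 1),
        "over_budget": projected > budget,
    }
```

For example, budget_status(40.0, 100.0, date(2025, 1, 10)) projects 40 / 10 × 31 = 124.0 for January, so over_budget is True.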

Cost Optimization Recommendations¶

Use Spot VMs for batch processing (up to 90% savings)
Enable auto-shutdown for dev resources
Implement lifecycle policies for blob storage
Set up budget alerts and spending limits
Review costs regularly and remove or resize idle resources
Use reserved instances for predictable workloads
Choose appropriate VM sizes based on actual usage
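Because Spot VMs can be evicted mid-job, batch work on them should be restartable. One common pattern, sketched below under the assumption that work items have unique string IDs, is to record completed IDs in a JSON checkpoint file and skip them on restart. The file is replaced atomically with os.replace, so an eviction during a write cannot leave a half-written checkpoint. The helper names are hypothetical, and the checkpoint must live on storage that outlives the VM (for example a data disk kept under the Deallocate eviction policy, or a copy synced to Blob Storage), because the Delete eviction policy removes the VM's disks.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the set of item IDs already completed, or an empty set if no checkpoint exists."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f)["done"])


def save_checkpoint(path, done):
    """Write the checkpoint atomically: write a temp file, then os.replace() it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"done": sorted(done)}, f)
    os.replace(tmp_path, path)


def process_with_checkpoints(items, work, path):
    """Run work(item) for each item not yet recorded, checkpointing after every item."""
    done = load_checkpoint(path)
    results = {}
    for item in items:
        if item in done:
            continue  # finished before a previous eviction; skip it
        results[item] = work(item)
        done.add(item)
        save_checkpoint(path, done)
    return results
```

Checkpointing after every item keeps lost work to at most one item; for many very small items, saving every N items reduces the write overhead at the cost of redoing up to N items after an eviction.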

1. Compute Optimization¶

Use Azure Spot VMs for non-critical workloads (up to 90% savings)
Choose B-series burstable VMs for variable workloads
Use reserved instances for predictable workloads (1-3 year terms)
Enable auto-shutdown for dev/test VMs
Right-size VMs based on actual usage
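Whether a reservation pays off depends on utilization: a reservation is billed for every hour of its term, while pay-as-you-go is billed only for the hours a VM runs. The break-even fraction of hours is therefore the reservation's effective hourly cost divided by the pay-as-you-go hourly rate. The sketch below computes it from prices you supply; the function names are hypothetical, leap days are ignored, and any figures passed in should come from your own price sheet.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap days


def reservation_break_even(payg_hourly, reserved_total, term_years):
    """Fraction of term hours a VM must run for the reservation to beat pay-as-you-go."""
    if payg_hourly <= 0 or term_years <= 0:
        raise ValueError("payg_hourly and term_years must be positive")
    reserved_hourly = reserved_total / (term_years * HOURS_PER_YEAR)
    return reserved_hourly / payg_hourly


def cheaper_option(payg_hourly, reserved_total, term_years, expected_utilization):
    """Return 'reserved' or 'pay-as-you-go' for an expected utilization between 0 and 1."""
    break_even = reservation_break_even(payg_hourly, reserved_total, term_years)
    return "reserved" if expected_utilization > break_even else "pay-as-you-go"
```

With placeholder prices of 1.00 per hour pay-as-you-go and 5256 for a one-year reservation, the break-even is 5256 / 8760 / 1.00 = 0.6 (up to floating-point rounding): the VM must run more than 60% of the year for the reservation to be cheaper.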

2. Storage Optimization¶

Use appropriate storage tiers (Hot, Cool, Archive)
Enable lifecycle management for blob storage
Use managed disks with appropriate performance tiers
Implement data deduplication and compression
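Blob Storage lifecycle management is configured as a JSON policy on the storage account. The sketch below builds a policy that moves block blobs under one prefix to the Cool tier after 30 days, to Archive after 90 days, and deletes them after 365 days; the thresholds, rule name and the "images/processed/" prefix are assumptions to adjust. prefixMatch values start with the container name, and archived blobs must be rehydrated before they can be read again.

```python
import json


def build_lifecycle_policy(prefix_match, cool_after_days=30, archive_after_days=90,
                           delete_after_days=365):
    """Return a lifecycle management policy dict for block blobs under prefix_match."""
    if not (0 <= cool_after_days < archive_after_days < delete_after_days):
        raise ValueError("expected cool < archive < delete day thresholds")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "clustrixResultsTiering",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        # prefixMatch values have the form "container/path/"
                        "prefixMatch": [prefix_match],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_after_days},
                        }
                    },
                },
            }
        ]
    }


# Write the policy for processed images (container "images", prefix "processed/")
with open("policy.json", "w") as f:
    json.dump(build_lifecycle_policy("images/processed/"), f, indent=2)
```

The saved file can then be applied with az storage account management-policy create --account-name <storage-account> --resource-group clustrix-tutorial-rg --policy @policy.json.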

3. Network Optimization¶

Minimize data transfer between regions
Use Azure CDN for static content
Compress and batch large transfers to reduce egress volume

4. Monitoring and Management¶

Set up budget alerts and spending limits
Use Azure Cost Management + Billing
Implement proper resource tagging
Review costs and optimize resources regularly
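Tagging only helps cost reviews if tags are applied consistently. The sketch below checks a tag dictionary against a required set of keys and formats it for the --tags option that commands such as az group create and az vm create accept (space-separated key=value pairs). The required keys project, owner and environment are a hypothetical convention for this tutorial, not an Azure requirement.

```python
import shlex

# Hypothetical tagging convention; Azure itself does not require specific tags
REQUIRED_TAGS = ("project", "owner", "environment")


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return required tag keys that are absent or empty, in the order of `required`."""
    return [key for key in required if not tags.get(key)]


def tags_to_cli_args(tags):
    """Format tags as shell-quoted key=value tokens for an az --tags option."""
    return " ".join(shlex.quote(f"{key}={value}") for key, value in sorted(tags.items()))
```

For example, tags_to_cli_args({"project": "clustrix", "owner": "data team"}) yields 'owner=data team' project=clustrix, where the quotes keep the value containing a space as a single shell token.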

5. Service-Specific¶

Use Azure Functions for small, event-driven tasks
Consider Azure Container Instances for short-running jobs
Use Azure Batch for large-scale parallel processing

Resource Cleanup¶

[ ]:

def cleanup_azure_resources(resource_group='clustrix-tutorial-rg'):
    """
    Clean up Azure resources to avoid ongoing charges.

    Args:
        resource_group: Name of the resource group to clean up
    """

    cleanup_commands = f"""
# List all resources in the resource group
az resource list --resource-group {resource_group} --output table

# Deallocate all VMs in the group first (stops compute billing; disks are still billed)
az vm deallocate --ids $(az vm list --resource-group {resource_group} --query "[].id" --output tsv)

# Delete specific resources individually (optional - more granular control)
# az vm delete --resource-group {resource_group} --name clustrix-vm-01 --yes
# az disk list --resource-group {resource_group} --output table   # find the OS disk name
# az disk delete --resource-group {resource_group} --name <os-disk-name> --yes
# az network public-ip delete --resource-group {resource_group} --name clustrix-vm-01PublicIP

# WARNING: Delete the entire resource group (removes ALL resources)
az group delete --name {resource_group} --yes --no-wait

# Verify deletion (with --no-wait this can take several minutes; prints false once the group is gone)
az group exists --name {resource_group}
"""

    return {
        'resource_group': resource_group,
        'cleanup_commands': cleanup_commands
    }

cleanup_info = cleanup_azure_resources()

print(f"Azure Resource Cleanup Commands for Resource Group: {cleanup_info['resource_group']}")
print("=" * 70)
print(cleanup_info['cleanup_commands'])
print("\n" + "⚠️ " * 10 + " IMPORTANT WARNINGS " + "⚠️ " * 10)
print("1. The 'az group delete' command will permanently delete ALL resources in the group!")
print("2. Review the resources first with 'az resource list' before proceeding")
print("3. Make sure to back up any important data before deletion")
print("4. Consider deallocating VMs instead of deleting them if you plan to use them again")
print("5. Deleted resources cannot be recovered - this action is irreversible!")
print("=" * 70)
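Before running az group delete, it helps to see exactly what the group contains. The sketch below runs az resource list with JSON output and counts resources by their type field; it assumes the Azure CLI is installed and logged in, and the function names are hypothetical.

```python
import json
import subprocess
from collections import Counter


def list_resources_json(resource_group):
    """Return the raw JSON printed by 'az resource list' for one resource group."""
    result = subprocess.run(
        ["az", "resource", "list", "--resource-group", resource_group, "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_resources(az_json):
    """Count resources by type, e.g. Microsoft.Compute/virtualMachines."""
    return Counter(resource["type"] for resource in json.loads(az_json))


# Example (requires a logged-in Azure CLI):
# summary = summarize_resources(list_resources_json("clustrix-tutorial-rg"))
# for resource_type, count in summary.most_common():
#     print(f"{count:3d}  {resource_type}")
```

Keeping the CLI call and the parsing in separate functions lets the summary be checked against saved JSON without touching a live subscription.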

Advanced Example: Distributed Image Processing¶

[ ]:

@cluster(cores=4, memory="8GB", time="00:45:00")
def azure_image_processing_pipeline(storage_config, processing_params):
    """
    Distributed image processing pipeline using Azure Blob Storage.
    """
    from azure.storage.blob import BlobServiceClient
    from azure.identity import DefaultAzureCredential
    import numpy as np
    from PIL import Image
    import io
    import time

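    # Connect to Azure Blob Storage. On the remote VM, DefaultAzureCredential can use
    # environment-variable credentials or the VM's managed identity.
    account_url = f"https://{storage_config['account_name']}.blob.core.windows.net"
    credential = DefaultAzureCredential()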
    blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)

    container_client = blob_service_client.get_container_client(storage_config['container'])

    processed_images = []
    processing_stats = []

    # List images to process
    blob_list = container_client.list_blobs(name_starts_with=storage_config['input_prefix'])

    for blob in blob_list:
        if blob.name.lower().endswith(('.png', '.jpg', '.jpeg')):
            start_time = time.time()

            try:
                # Download image
                blob_client = blob_service_client.get_blob_client(
                    container=storage_config['container'], blob=blob.name
                )
                image_data = blob_client.download_blob().readall()

                # Process image; record the original dimensions before any transformation
                image = Image.open(io.BytesIO(image_data))
                original_size = image.size

                # Apply processing operations
                if processing_params.get('resize'):
                    image = image.resize(processing_params['resize'])

                if processing_params.get('grayscale'):
                    image = image.convert('L')

                if processing_params.get('rotate'):
                    image = image.rotate(processing_params['rotate'])

                # Convert back to bytes
                output_buffer = io.BytesIO()
                image.save(output_buffer, format='PNG')
                output_buffer.seek(0)

                # Swap only the leading input prefix for the output prefix, and use a
                # .png extension because every output is saved as PNG
                relative_name = blob.name[len(storage_config['input_prefix']):]
                output_blob_name = (
                    storage_config['output_prefix'] + relative_name.rsplit('.', 1)[0] + '.png'
                )

                output_blob_client = blob_service_client.get_blob_client(
                    container=storage_config['container'], blob=output_blob_name
                )
                output_blob_client.upload_blob(output_buffer.getvalue(), overwrite=True)

                processing_time = time.time() - start_time

                processed_images.append(output_blob_name)
                processing_stats.append({
                    'input_blob': blob.name,
                    'output_blob': output_blob_name,
                    'processing_time': processing_time,
                    'original_size': original_size,
                    'processed_size': image.size
                })

            except Exception as e:
                print(f"Error processing {blob.name}: {e}")

    return {
        'processed_count': len(processed_images),
        'total_processing_time': sum(stat['processing_time'] for stat in processing_stats),
        'average_processing_time': np.mean([stat['processing_time'] for stat in processing_stats]) if processing_stats else 0,
        'processed_images': processed_images[:10],  # First 10 for brevity
        'processing_stats': processing_stats[:5]  # First 5 for brevity
    }

# Example usage (uncomment and modify as needed):
# storage_config = {
#     'account_name': 'yourstorageaccount',
#     'container': 'images',
#     'input_prefix': 'raw/',
#     'output_prefix': 'processed/'
# }
#
# processing_config = {
#     'resize': (800, 600),
#     'grayscale': True,
#     'rotate': 0
# }
#
# result = azure_image_processing_pipeline(storage_config, processing_config)
# print(f"Processed {result['processed_count']} images in {result['total_processing_time']:.2f} seconds")

print("Advanced image processing pipeline example defined.")
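As written, azure_image_processing_pipeline lists and processes every matching blob inside a single cluster job, so the images are not actually split across machines. One way to distribute the work, sketched here, is to partition the blob names into shards and submit one job per shard. That assumes a variant of the pipeline that accepts an explicit list of blob names instead of listing the prefix itself; the function below is hypothetical and not part of Clustrix.

```python
import zlib


def shard_blob_names(blob_names, num_shards):
    """Split blob names into num_shards disjoint lists with a stable CRC32 hash."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards = [[] for _ in range(num_shards)]
    for name in blob_names:
        # zlib.crc32 gives the same value in every process, unlike the built-in hash(),
        # whose string hashing is randomized per interpreter run
        shards[zlib.crc32(name.encode("utf-8")) % num_shards].append(name)
    return shards
```

The names could come from [b.name for b in container_client.list_blobs(name_starts_with='raw/')]. A stable hash means a rerun assigns each image to the same shard, so a failed shard can be resubmitted on its own.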

Summary¶

This tutorial covered:

Setup: Azure authentication and Clustrix installation
VM Integration: Direct Azure VM configuration
Azure Batch: Managed job scheduling
CycleCloud: HPC-optimized clusters with SLURM
Blob Storage: Data storage and retrieval
Azure ML: Machine learning compute integration
Security: Best practices for safe deployment
Cost Management: Strategies to minimize expenses
Resource Management: Proper cleanup procedures

Next Steps¶

Set up your Azure credentials and test the basic configuration
Start with a simple VM for initial testing
Consider CycleCloud for production HPC workloads
Implement proper monitoring and cost controls
Explore Azure Spot VMs for cost-effective batch processing

Azure-Specific Advantages¶

CycleCloud: Managed HPC clusters with schedulers such as SLURM
Azure ML: Integrated machine learning platform
Hybrid Cloud: Integration with on-premises infrastructure
Enterprise Integration: Microsoft Entra ID (formerly Azure Active Directory) and other enterprise tools
Compliance: Strong compliance and security certifications

Remember: Always monitor your Azure costs and clean up resources when not in use!