Microsoft Azure Cloud Tutorial
This tutorial demonstrates how to use Clustrix with Microsoft Azure cloud infrastructure for scalable distributed computing.
Overview
Azure provides several services that integrate well with Clustrix:
Azure Virtual Machines: Scalable compute instances
Azure Batch: Managed job scheduling service
Azure CycleCloud: HPC cluster orchestration
Azure Machine Learning Compute: ML-optimized infrastructure
Azure Container Instances: Serverless containers
Azure Blob Storage: Object storage for data and results
Azure Virtual Network: Network isolation and security
Prerequisites
Required Azure Setup
Azure Account: Active Azure subscription with appropriate permissions
Azure CLI: Installed and configured on your local machine
SSH Key Pair: For secure VM access
Resource Quotas: Sufficient compute quotas in your preferred region
Billing Setup: Credit card or other payment method configured
Local Environment Setup
Python Environment: Python 3.8+ with pip
SSH Client: OpenSSH or equivalent
Git: For version control (optional but recommended)
Code Editor: VS Code, PyCharm, or your preferred editor
Step-by-Step Setup Guide
Step 1: Install Azure CLI
First, install the Azure CLI on your local machine:
Windows (PowerShell):
Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi; Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'; rm .\AzureCLI.msi
macOS (Homebrew):
brew update && brew install azure-cli
Linux (Ubuntu/Debian):
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
Step 2: Create Azure Account and Subscription
Go to the Azure Portal (https://portal.azure.com)
Sign up for a free account (includes $200 credit)
Complete account verification
Note your Subscription ID from the Azure Portal (or run az account show --query id --output tsv after logging in)
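Subscription IDs are GUIDs, for example 12345678-1234-1234-1234-123456789012. Before you pass one to the Azure CLI or SDK, a quick local format check can catch copy-paste mistakes. The sketch below uses Python's standard uuid module. `is_valid_subscription_id` is a hypothetical helper for this tutorial. It checks only the format, not whether the subscription exists or whether you can access it.

```python
import uuid


def is_valid_subscription_id(value: str) -> bool:
    """Return True if value is a GUID in the hyphenated 8-4-4-4-12 form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        # Malformed strings raise ValueError; non-strings raise AttributeError/TypeError
        return False
    # uuid.UUID also accepts un-hyphenated and braced forms, so require the canonical layout
    return str(parsed) == value.lower()


print(is_valid_subscription_id("12345678-1234-1234-1234-123456789012"))
print(is_valid_subscription_id("your-subscription-id"))
```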
Step 3: Install Clustrix with Azure Dependencies
# Install Clustrix with Azure support
!pip install clustrix azure-identity azure-mgmt-compute azure-mgmt-network azure-storage-blob
# Import required libraries
import clustrix
from clustrix import cluster, configure
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.network import NetworkManagementClient
from azure.storage.blob import BlobServiceClient
import os
import numpy as np
import time
import json
Step 4: Azure Authentication Setup
Configure your Azure credentials. You can do this in several ways:
Option 1: Azure CLI Authentication (Recommended for Development)
This is the simplest method for getting started:
# Login with Azure CLI (run this in terminal)
# az login
# Set your subscription (replace with your actual subscription ID)
# az account set --subscription "12345678-1234-1234-1234-123456789012"
# Verify authentication
!az account show --output table
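Option 2: Service Principal Authentication (Recommended for Production)
For production environments, create a service principal whose role assignment is scoped as narrowly as your workload allows:
Create Service Principal (run in terminal):
# Create service principal (current Azure CLI versions need an explicit --scopes value to assign a role;
# narrow it to a resource group, e.g. /subscriptions/<subscription-id>/resourceGroups/clustrix-tutorial-rg, once the group exists)
az ad sp create-for-rbac --name "clustrix-service-principal" --role contributor --scopes /subscriptions/<subscription-id>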
# The output will include:
# - appId (client ID)
# - password (client secret)
# - tenant (tenant ID)
Set Environment Variables:
# Set Azure credentials as environment variables (replace with your actual values)
# os.environ['AZURE_CLIENT_ID'] = 'your-client-id-from-service-principal'
# os.environ['AZURE_CLIENT_SECRET'] = 'your-client-secret-from-service-principal'
# os.environ['AZURE_TENANT_ID'] = 'your-tenant-id-from-service-principal'
# Test Azure connection
try:
    credential = DefaultAzureCredential()
    # AZURE_SUBSCRIPTION_ID is this tutorial's convention; DefaultAzureCredential does not read it
    subscription_id = os.environ.get('AZURE_SUBSCRIPTION_ID', 'your-subscription-id')  # Replace with actual ID
    compute_client = ComputeManagementClient(credential, subscription_id)
    # Test by listing VM sizes in East US
    vm_sizes = list(compute_client.virtual_machine_sizes.list('eastus'))
    print(f"Successfully connected to Azure. Available VM sizes: {len(vm_sizes)}")
except Exception as e:
    print(f"Azure connection failed: {e}")
    print("Make sure you have:")
    print("1. Run 'az login' or set service principal environment variables")
    print("2. Set the correct subscription ID")
    print("3. Have appropriate permissions in your Azure subscription")
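DefaultAzureCredential tries a chain of credential sources. The first is EnvironmentCredential, which authenticates as a service principal with a client secret when AZURE_TENANT_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET are all set. If that fails, the chain falls back to other sources such as an `az login` session. The sketch below is a hypothetical pre-flight check that reports which of the three variables are missing. `missing_sp_env_vars` is not part of the Azure SDK.

```python
import os

# Variables EnvironmentCredential needs for client-secret authentication
SP_ENV_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_sp_env_vars(environ=None):
    """Return the service principal variables that are unset or empty.

    Pass a dict to check something other than the real process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in SP_ENV_VARS if not env.get(name)]


missing = missing_sp_env_vars()
if missing:
    print("Service principal variables not set:", ", ".join(missing))
    print("DefaultAzureCredential will try other sources, such as an 'az login' session.")
else:
    print("All service principal variables are set.")
```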
Step 5: Generate SSH Key Pair
Clustrix requires SSH access to remote VMs. Generate an SSH key pair if you don't have one:
Generate SSH Key (run in terminal):
# Generate SSH key pair (press Enter for default location)
ssh-keygen -t rsa -b 4096 -C "your-email@example.com"
# Add key to SSH agent
ssh-add ~/.ssh/id_rsa
# Display public key (you'll need this for VM creation)
cat ~/.ssh/id_rsa.pub
Important Notes:
Keep your private key (~/.ssh/id_rsa) secure and never share it
You'll use the public key (~/.ssh/id_rsa.pub) when creating Azure VMs
Make sure you have set up authentication and have the correct subscription ID
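OpenSSH refuses to use a private key file that other users can read. Checking the permissions up front can save you from a confusing login failure later. The sketch below uses only the standard library. `check_private_key_permissions` is a hypothetical helper, and the check is meaningful only on POSIX systems (macOS, Linux, WSL), not for native Windows ACLs.

```python
import os
import stat


def check_private_key_permissions(path="~/.ssh/id_rsa"):
    """Return (exists, mode, ok) for a private key file.

    ok is True when no group or other permission bits are set,
    which is what `chmod 600` (or 400) produces.
    """
    full_path = os.path.expanduser(path)
    if not os.path.exists(full_path):
        return False, None, False
    mode = stat.S_IMODE(os.stat(full_path).st_mode)
    ok = (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return True, oct(mode), ok


exists, mode, ok = check_private_key_permissions()
if not exists:
    print("No private key found; run ssh-keygen first.")
elif not ok:
    print(f"Key permissions are {mode}; fix them with: chmod 600 ~/.ssh/id_rsa")
else:
    print(f"Key permissions look fine ({mode}).")
```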
Method 1: Azure Virtual Machines Configuration
Step 6: Create Resource Group and Azure VM for Clustrix
The helper below generates the Azure CLI commands that create a resource group for your Azure resources and a VM configured for Clustrix:
def create_clustrix_vm(resource_group, vm_name, location='eastus', vm_size='Standard_D4s_v3'):
    """
    Generate the configuration for an Azure VM set up for Clustrix.
    Args:
        resource_group: Azure resource group name
        vm_name: Name for the VM
        location: Azure region
        vm_size: VM size (CPU/memory configuration)
    Returns:
        Dict with the VM settings, the Azure CLI commands to run, and the cloud-init script
    """
    # Cloud-init script for VM setup; '#cloud-config' must be the very first line of the file
    cloud_init_script = '''#cloud-config
package_update: true
packages:
  - python3
  - python3-pip
  - git
  - htop
runcmd:
  # Install clustrix and common packages
  - pip3 install clustrix numpy scipy pandas scikit-learn
  # Install uv for faster package management
  - curl -LsSf https://astral.sh/uv/install.sh | sh
  # Create clustrix user with passwordless sudo (sudoers.d drop-in instead of appending to /etc/sudoers)
  - useradd -m -s /bin/bash clustrix
  - usermod -aG sudo clustrix
  - echo "clustrix ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/clustrix
  - chmod 440 /etc/sudoers.d/clustrix
  # Setup SSH for clustrix user
  - mkdir -p /home/clustrix/.ssh
  - cp /home/azureuser/.ssh/authorized_keys /home/clustrix/.ssh/
  - chown -R clustrix:clustrix /home/clustrix/.ssh
  - chmod 700 /home/clustrix/.ssh
  - chmod 600 /home/clustrix/.ssh/authorized_keys
  # Create working directory
  - mkdir -p /tmp/clustrix
  - chown clustrix:clustrix /tmp/clustrix
'''
    # Azure CLI commands for VM creation
    azure_commands = f"""
# Create resource group
az group create --name {resource_group} --location {location}
# Create VM with cloud-init
az vm create \\
    --resource-group {resource_group} \\
    --name {vm_name} \\
    --image Ubuntu2204 \\
    --size {vm_size} \\
    --admin-username azureuser \\
    --generate-ssh-keys \\
    --custom-data cloud-init.txt \\
    --public-ip-sku Standard \\
    --tags Purpose=Clustrix Environment=Tutorial
# Get public IP
az vm show \\
    --resource-group {resource_group} \\
    --name {vm_name} \\
    --show-details \\
    --query publicIps \\
    --output tsv
"""
    return {
        'resource_group': resource_group,
        'vm_name': vm_name,
        'location': location,
        'vm_size': vm_size,
        'commands': azure_commands,
        'cloud_init': cloud_init_script
    }
# Example VM configuration
vm_config = create_clustrix_vm(
    resource_group='clustrix-tutorial-rg',
    vm_name='clustrix-vm-01',
    location='eastus',
    vm_size='Standard_D4s_v3'  # 4 vCPUs, 16 GB RAM
)
print("Save the cloud-init script to a file called 'cloud-init.txt' in your current directory")
print("Then execute these Azure CLI commands to create your VM:")
print("-" * 60)
print(vm_config['commands'])
Cloud-Init Script
Save this cloud-init script to a file named cloud-init.txt in your current directory:
# Display the cloud-init script content
print(vm_config['cloud_init'])
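You can write the script to cloud-init.txt directly from the notebook instead of copying the printed output by hand. cloud-init expects cloud-config user data to begin with the `#cloud-config` header, so the sketch below strips any leading whitespace and refuses to write a script that lacks the header. `save_cloud_init` is a hypothetical helper and is not part of Clustrix.

```python
from pathlib import Path


def save_cloud_init(script: str, path: str = "cloud-init.txt") -> Path:
    """Write a cloud-config script to disk with '#cloud-config' as its first line."""
    content = script.lstrip()  # drop blank lines/indentation before the header
    if not content.startswith("#cloud-config"):
        raise ValueError("cloud-init script must start with '#cloud-config'")
    target = Path(path)
    # End with a newline so the last YAML line is terminated
    target.write_text(content if content.endswith("\n") else content + "\n")
    return target


# Usage with the configuration generated above:
# save_cloud_init(vm_config['cloud_init'])
```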
Step 7: Configure Clustrix for Azure VM
After your VM is created and you have the public IP address, configure Clustrix to use it:
# Configure Clustrix to use your Azure VM
# Replace 'your-vm-public-ip' with the actual IP from: az vm show --resource-group clustrix-tutorial-rg --name clustrix-vm-01 --show-details --query publicIps --output tsv
configure(
    cluster_type="ssh",
    cluster_host="your-vm-public-ip",  # Replace with actual IP
    username="clustrix",  # or "azureuser" if using default user
    key_file="~/.ssh/id_rsa",  # Azure CLI generated key
    remote_work_dir="/tmp/clustrix",
    package_manager="auto",  # Will use uv if available
    default_cores=4,
    default_memory="8GB",
    default_time="01:00:00"
)
print("Clustrix configured for Azure VM")
print("Make sure to replace 'your-vm-public-ip' with your actual VM's public IP address")
Testing Your Azure VM Connection
Before running Clustrix jobs, test your SSH connection to the VM:
# Test SSH connection (replace with your actual IP)
ssh -i ~/.ssh/id_rsa clustrix@your-vm-public-ip
# Or if using default azureuser:
ssh -i ~/.ssh/id_rsa azureuser@your-vm-public-ip
Troubleshooting Connection Issues:
Ensure your VM is running: az vm show --resource-group clustrix-tutorial-rg --name clustrix-vm-01 --show-details --query powerState
Check that the Network Security Group rules allow SSH (port 22)
Verify your SSH key is correct and has proper permissions (chmod 600 ~/.ssh/id_rsa)
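If ssh hangs instead of failing immediately, a Network Security Group rule that blocks port 22 is a common cause. The sketch below checks TCP reachability from Python using only the standard library. `ssh_port_open` is a hypothetical helper. A successful connection shows only that the port is reachable; it does not show that your key will be accepted.

```python
import socket


def ssh_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False


# Usage (replace with your VM's public IP):
# print(ssh_port_open("your-vm-public-ip"))
```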
Example: Remote Computation on Azure VM
@cluster(cores=2, memory="4GB")
def azure_numerical_analysis(matrix_size=1000, iterations=10):
    """Perform numerical analysis on Azure VM."""
    import numpy as np
    import time
    results = []
    for i in range(iterations):
        # Generate random matrix
        matrix = np.random.rand(matrix_size, matrix_size)
        # Compute the eigenvalues (not eigenvectors) of the matrix
        start_time = time.time()
        eigenvalues = np.linalg.eigvals(matrix)
        computation_time = time.time() - start_time
        results.append({
            'iteration': i + 1,
            'max_eigenvalue': float(np.max(eigenvalues.real)),
            'min_eigenvalue': float(np.min(eigenvalues.real)),
            'computation_time': computation_time
        })
    return {
        'matrix_size': matrix_size,
        'total_iterations': iterations,
        # Convert to a plain float so the result serializes cleanly
        'average_time': float(np.mean([r['computation_time'] for r in results])),
        'results': results
    }
# Run computation on Azure VM (uncomment after configuring your VM)
# result = azure_numerical_analysis(matrix_size=500, iterations=5)
# print(f"Completed {result['total_iterations']} iterations")
# print(f"Average computation time: {result['average_time']:.3f} seconds")
print("Example function defined. Configure your VM IP address and uncomment the lines above to run.")
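Before sending work to the VM, you can run a small local check on the same math. Every entry of an n × n matrix drawn uniformly from [0, 1) is positive. By the Perron-Frobenius theorem, its dominant eigenvalue is therefore real and lies between the smallest and largest row sums, and each row sums to roughly n/2. The sketch below is plain NumPy with no Clustrix decorator. It checks that the `max_eigenvalue` you would get from a run lands in that interval. `perron_bounds_check` is a hypothetical helper.

```python
import numpy as np


def perron_bounds_check(matrix_size=200, seed=0):
    """Return the dominant eigenvalue of a positive random matrix and its row-sum bounds."""
    rng = np.random.default_rng(seed)
    matrix = rng.random((matrix_size, matrix_size))  # entries in [0, 1)
    eigenvalues = np.linalg.eigvals(matrix)
    # For a positive matrix the Perron root is also the eigenvalue with the largest real part
    max_real = float(np.max(eigenvalues.real))
    row_sums = matrix.sum(axis=1)
    return max_real, float(row_sums.min()), float(row_sums.max())


max_real, lower, upper = perron_bounds_check()
print(f"max eigenvalue {max_real:.2f} lies in [{lower:.2f}, {upper:.2f}]; n/2 = 100")
```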
Method 2: Azure Batch Configuration
Azure Batch provides managed job scheduling for large-scale parallel workloads:
def setup_azure_batch_environment():
    """
    Template for setting up an Azure Batch environment.
    This requires manual setup through the Azure portal or CLI.
    """
    batch_setup_commands = """
# Create Azure Batch account
az batch account create \\
    --name clustrixbatch \\
    --resource-group clustrix-tutorial-rg \\
    --location eastus
# Create storage account for Batch (storage account names must be globally unique)
az storage account create \\
    --name clustrixstorage \\
    --resource-group clustrix-tutorial-rg \\
    --location eastus \\
    --sku Standard_LRS
# Link storage to Batch account
az batch account set \\
    --name clustrixbatch \\
    --resource-group clustrix-tutorial-rg \\
    --storage-account clustrixstorage
# Authenticate the CLI to the Batch account before creating pools and jobs
az batch account login \\
    --name clustrixbatch \\
    --resource-group clustrix-tutorial-rg
# Create Batch pool
az batch pool create \\
    --id clustrix-pool \\
    --vm-size Standard_D2s_v3 \\
    --target-dedicated-nodes 2 \\
    --image canonical:0001-com-ubuntu-server-jammy:22_04-lts \\
    --node-agent-sku-id "batch.node.ubuntu 22.04"
# Create Batch job
az batch job create \\
    --id clustrix-job \\
    --pool-id clustrix-pool
"""
    batch_config = {
        'account_name': 'clustrixbatch',
        'account_url': 'https://clustrixbatch.eastus.batch.azure.com',
        'resource_group': 'clustrix-tutorial-rg',
        'pool_id': 'clustrix-pool',
        'job_id': 'clustrix-job'
    }
    return batch_config, batch_setup_commands
batch_config, batch_commands = setup_azure_batch_environment()
print("Azure Batch Configuration:")
print(json.dumps(batch_config, indent=2))
print("\nTo set up Azure Batch, run these commands:")
print("-" * 50)
print(batch_commands)
Important Notes for Azure Batch:
Azure Batch integration with Clustrix requires custom implementation
Consider using Azure CycleCloud for HPC workloads instead
Batch is better suited for managed job scheduling at scale
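Batch and storage account names must each be 3 to 24 lowercase letters and digits. Storage account names must also be globally unique, so a fixed name like `clustrixstorage` may already be taken. The Batch endpoint in `batch_config` follows the pattern `https://<account>.<region>.batch.azure.com`. The sketch below validates a name against those rules and derives the endpoint. `is_valid_account_name` and `batch_account_url` are hypothetical helpers, not Azure SDK APIs, and they do not check whether a name is available.

```python
import re

# Batch and storage account names: 3-24 characters, lowercase letters and digits only
ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Check an Azure Batch or storage account name against the naming rules."""
    return ACCOUNT_NAME_RE.fullmatch(name) is not None


def batch_account_url(account_name: str, region: str) -> str:
    """Build the Batch account endpoint URL from the account name and region."""
    if not is_valid_account_name(account_name):
        raise ValueError(f"invalid Batch account name: {account_name!r}")
    return f"https://{account_name}.{region}.batch.azure.com"


print(batch_account_url("clustrixbatch", "eastus"))
```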
Method 3: Azure CycleCloud Integration
Azure CycleCloud is designed for HPC workloads and provides SLURM integration:
# Azure CycleCloud cluster template for Clustrix
cyclecloud_template = """
# CycleCloud SLURM cluster template
# Save as clustrix-slurm.txt and import into CycleCloud
[cluster clustrix-slurm]
FormLayout = selectionpanel
Category = Schedulers
IconUrl = static/cloud/cluster/ui/ClusterIcon/slurm.png
[[node defaults]]
UsePublicNetwork = false
Credentials = $Credentials
SubnetId = $SubnetId
Region = $Region
KeyPairLocation = ~/.ssh/cyclecloud.pem
# Install clustrix on all nodes
[[[configuration]]]
clustrix.version = latest
[[[cluster-init clustrix:default:1.0.0]]]
[[node master]]
MachineType = $MasterMachineType
IsReturnProxy = $ReturnProxy
AdditionalClusterInitSpecs = $MasterClusterInitSpecs
[[[configuration]]]
slurm.version = $configuration_slurm_version
[[[cluster-init slurm:master:2.7.2]]]
[[[network-interface eth0]]]
AssociatePublicIpAddress = $UsePublicNetwork
[[nodearray execute]]
MachineType = $ExecuteMachineType
MaxCoreCount = $MaxExecuteCoreCount
Interruptible = $UseLowPrio
AdditionalClusterInitSpecs = $ExecuteClusterInitSpecs
[[[configuration]]]
slurm.version = $configuration_slurm_version
[[[cluster-init slurm:execute:2.7.2]]]
[[[network-interface eth0]]]
AssociatePublicIpAddress = false
[parameters About]
Order = 1
[[parameters About Clustrix]]
[[[parameter clustrix]]]
HideLabel = true
Config.Plugin = pico.widget.HtmlTemplateWidget
Config.Template = "Clustrix-enabled SLURM cluster for distributed computing"
[parameters Required Settings]
Order = 10
[[parameters Virtual Machines]]
Description = "Configure the VM types and sizes"
Order = 20
[[[parameter Region]]]
Label = Region
Description = Deployment Location
ParameterType = Cloud.Region
DefaultValue = eastus
[[[parameter MasterMachineType]]]
Label = Master VM Type
Description = Master node VM type
ParameterType = Cloud.MachineType
DefaultValue = Standard_D4s_v3
[[[parameter ExecuteMachineType]]]
Label = Execute VM Type
Description = Execute node VM type
ParameterType = Cloud.MachineType
DefaultValue = Standard_H16r
"""
def configure_for_cyclecloud(master_ip, cluster_name="clustrix-slurm"):
    """Configure Clustrix to use Azure CycleCloud SLURM cluster."""
    configure(
        cluster_type="slurm",
        cluster_host=master_ip,
        username="cyclecloud",  # Default CycleCloud user
        key_file="~/.ssh/cyclecloud.pem",
        remote_work_dir="/shared/clustrix",  # Use shared storage
        package_manager="uv",
        module_loads=["python3"],
        environment_variables={
            "CLUSTRIX_CLUSTER": cluster_name
        },
        default_cores=8,
        default_memory="16GB",
        default_time="02:00:00",
        default_partition="hpc"
    )
    return f"Configured Clustrix for CycleCloud cluster: {cluster_name}"
print("CycleCloud Template (save as clustrix-slurm.txt):")
print(cyclecloud_template)
# Example configuration (uncomment and modify as needed)
# config_message = configure_for_cyclecloud("10.1.0.4", "my-clustrix-cluster")
# print(config_message)
Azure CycleCloud Benefits:
Purpose-built orchestration of HPC clusters on Azure
Native SLURM support, which matches Clustrix's cluster_type="slurm"
Autoscaling of execute nodes, including low-priority (interruptible) nodes, to control cost
Enterprise security controls such as private networking inside Azure virtual networks
Hybrid cloud capabilities for on-premises integration
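`configure_for_cyclecloud` sets `default_partition="hpc"`, so it is worth confirming that this partition actually exists. On the CycleCloud scheduler node, `sinfo -h -o "%P"` prints one partition name per line and marks the default partition with a trailing `*`. The hypothetical helper below parses that output and ignores duplicate lines. Capturing the output, for example over an SSH session, is left to you. The sample string is illustrative, not captured from a real cluster.

```python
def parse_sinfo_partitions(output: str):
    """Parse `sinfo -h -o "%P"` output into (partitions, default_partition)."""
    partitions = []
    default = None
    for line in output.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.endswith("*"):  # sinfo marks the default partition with '*'
            name = name[:-1]
            default = name
        if name not in partitions:
            partitions.append(name)
    return partitions, default


sample_output = "hpc*\nhtc\n"  # illustrative only
partitions, default = parse_sinfo_partitions(sample_output)
print(partitions, default)
if "hpc" not in partitions:
    print("Partition 'hpc' not found; adjust default_partition in configure_for_cyclecloud")
```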
Data Management with Azure Blob Storage
@cluster(cores=2, memory="4GB")
def process_blob_data(storage_account, container_name, input_blob, output_blob, storage_key=None):
    """Process data from Azure Blob Storage and save results back.

    Requires azure-storage-blob and azure-identity to be installed on the remote VM.
    """
    from azure.storage.blob import BlobServiceClient
    from azure.identity import DefaultAzureCredential
    import numpy as np
    import pickle
    import io
    import time
    # Initialize Blob Service Client: account key if given, otherwise managed identity or Azure CLI login
    account_url = f"https://{storage_account}.blob.core.windows.net"
    credential = storage_key if storage_key else DefaultAzureCredential()
    blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
    # Download data from blob storage
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=input_blob)
    blob_data = blob_client.download_blob()
    data = pickle.loads(blob_data.readall())
    # Process the data
    processed_data = {
        'original_shape': data.shape if hasattr(data, 'shape') else len(data),
        'mean': float(np.mean(data)) if hasattr(data, '__iter__') else float(data),
        'std': float(np.std(data)) if hasattr(data, '__iter__') else 0.0,
        'max': float(np.max(data)) if hasattr(data, '__iter__') else float(data),
        'min': float(np.min(data)) if hasattr(data, '__iter__') else float(data),
        'processing_timestamp': time.time(),
        'processed_on': 'azure-vm'
    }
    # Upload results to blob storage
    output_buffer = io.BytesIO()
    pickle.dump(processed_data, output_buffer)
    output_blob_client = blob_service_client.get_blob_client(container=container_name, blob=output_blob)
    output_blob_client.upload_blob(output_buffer.getvalue(), overwrite=True)
    return f"Processed data saved to blob: {output_blob}"
# Utility functions for Azure Blob Storage
import io
import pickle
def _blob_service_client(storage_account, storage_key=None):
    """Create a BlobServiceClient from an account key, or DefaultAzureCredential if no key is given."""
    account_url = f"https://{storage_account}.blob.core.windows.net"
    credential = storage_key if storage_key else DefaultAzureCredential()
    return BlobServiceClient(account_url=account_url, credential=credential)
def upload_to_blob(data, storage_account, container_name, blob_name, storage_key=None):
    """Pickle data and upload it to Azure Blob Storage."""
    blob_service_client = _blob_service_client(storage_account, storage_key)
    buffer = io.BytesIO()
    pickle.dump(data, buffer)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    blob_client.upload_blob(buffer.getvalue(), overwrite=True)
    return f"Data uploaded to blob: {blob_name}"
def download_from_blob(storage_account, container_name, blob_name, storage_key=None):
    """Download a blob from Azure Blob Storage and unpickle it."""
    blob_service_client = _blob_service_client(storage_account, storage_key)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    return pickle.loads(blob_client.download_blob().readall())
# Example usage (uncomment and modify as needed):
# sample_data = np.random.rand(1000, 50)
# upload_result = upload_to_blob(sample_data, 'yourstorageaccount', 'data', 'input/sample.pkl')
# print(upload_result)
#
# process_result = process_blob_data('yourstorageaccount', 'data', 'input/sample.pkl', 'output/results.pkl')
# print(process_result)
print("Azure Blob Storage integration functions defined.")
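The upload and download helpers pickle data into an in-memory buffer. You can test that serialization step locally, without a storage account. The sketch below mirrors the buffer handling in `upload_to_blob` and `download_from_blob` with the Azure calls removed. `to_blob_bytes` and `from_blob_bytes` are hypothetical names. Keep in mind that `pickle.loads` can execute arbitrary code, so only unpickle blobs from containers you control.

```python
import io
import pickle

import numpy as np


def to_blob_bytes(data) -> bytes:
    """Serialize data the way upload_to_blob does before uploading."""
    buffer = io.BytesIO()
    pickle.dump(data, buffer)
    return buffer.getvalue()


def from_blob_bytes(payload: bytes):
    """Deserialize bytes the way download_from_blob does after downloading."""
    return pickle.loads(payload)


sample_data = np.random.default_rng(0).random((1000, 50))
payload = to_blob_bytes(sample_data)
restored = from_blob_bytes(payload)
print(f"{len(payload)} bytes, round trip equal: {np.array_equal(sample_data, restored)}")
```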
Azure Machine Learning Compute Integration
def setup_azure_ml_compute():
    """
    Template for setting up Azure ML compute clusters.
    These can be used with Clustrix for ML workloads.
    """
    aml_setup_commands = """
# Create Azure ML workspace
az ml workspace create \\
    --name clustrix-ml-workspace \\
    --resource-group clustrix-tutorial-rg \\
    --location eastus
# Create compute cluster
az ml compute create \\
    --name clustrix-compute \\
    --type amlcompute \\
    --min-instances 0 \\
    --max-instances 4 \\
    --size Standard_DS3_v2 \\
    --workspace-name clustrix-ml-workspace \\
    --resource-group clustrix-tutorial-rg
# Create compute instance for development
az ml compute create \\
    --name clustrix-dev-instance \\
    --type computeinstance \\
    --size Standard_DS3_v2 \\
    --workspace-name clustrix-ml-workspace \\
    --resource-group clustrix-tutorial-rg
"""
    return {
        'workspace': 'clustrix-ml-workspace',
        'compute_cluster': 'clustrix-compute',
        'compute_instance': 'clustrix-dev-instance',
        'commands': aml_setup_commands
    }
@cluster(cores=4, memory="8GB")
def azure_ml_training_job(dataset_params, model_params):
    """Example ML training job that could run on Azure ML compute."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_classification
    import time
    # Generate synthetic dataset (in a real scenario, load from Azure ML datasets)
    X, y = make_classification(
        n_samples=dataset_params['n_samples'],
        n_features=dataset_params['n_features'],
        # The default n_informative=2 supports at most 2 classes with the default 2 clusters per class
        n_informative=dataset_params.get('n_informative', 5),
        n_classes=dataset_params['n_classes'],
        random_state=42
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    # Train model
    start_time = time.time()
    model = RandomForestClassifier(**model_params)
    model.fit(X_train, y_train)
    training_time = time.time() - start_time
    # Evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    return {
        'accuracy': float(accuracy),
        'training_time': training_time,
        'training_samples': len(X_train),
        'test_samples': len(X_test),
        'feature_importance': model.feature_importances_.tolist()[:10],  # First 10 features in column order, not ranked
        'model_params': model_params,
        'dataset_params': dataset_params
    }
aml_config = setup_azure_ml_compute()
print("Azure ML Setup Commands:")
print(aml_config['commands'])
# Example usage (uncomment to run after setting up Azure ML):
# dataset_config = {'n_samples': 10000, 'n_features': 20, 'n_classes': 3}
# model_config = {'n_estimators': 100, 'max_depth': 10, 'random_state': 42, 'n_jobs': -1}
# result = azure_ml_training_job(dataset_config, model_config)
# print(f"Model trained with accuracy: {result['accuracy']:.4f}")
print("Azure ML integration example defined.")
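`model.feature_importances_.tolist()[:10]` returns the importances of the first ten features in column order, not the ten most important features. To report the most important ones, sort by importance first. The hypothetical helper below does this with `numpy.argsort`. If you use it inside `azure_ml_training_job`, define it inside that function (or make it importable on the VM) so that it is available remotely.

```python
import numpy as np


def top_k_importances(importances, k=10):
    """Return (feature_index, importance) pairs for the k largest importances, largest first."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:k]  # indices sorted by descending importance
    return [(int(i), float(importances[i])) for i in order]


# Inside azure_ml_training_job you could return:
#     'top_features': top_k_importances(model.feature_importances_, k=10)
print(top_k_importances([0.05, 0.40, 0.15, 0.30, 0.10], k=3))
```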
Security Best Practices
def setup_azure_security_for_clustrix(resource_group='clustrix-tutorial-rg', location='eastus'):
    """
    Security configuration for Azure + Clustrix deployment.
    """
    security_commands = f"""
# Create virtual network with a private subnet
az network vnet create \\
    --resource-group {resource_group} \\
    --name clustrix-vnet \\
    --address-prefix 10.1.0.0/16 \\
    --subnet-name clustrix-subnet \\
    --subnet-prefix 10.1.0.0/24 \\
    --location {location}
# Create Network Security Group with restrictive rules
az network nsg create \\
    --resource-group {resource_group} \\
    --name clustrix-nsg \\
    --location {location}
# Allow SSH only from your IP (replace with your actual IP)
az network nsg rule create \\
    --resource-group {resource_group} \\
    --nsg-name clustrix-nsg \\
    --name AllowSSHFromMyIP \\
    --protocol tcp \\
    --priority 1000 \\
    --destination-port-range 22 \\
    --source-address-prefixes YOUR_IP_ADDRESS/32 \\
    --access allow
# Allow internal communication on all ports (without --destination-port-ranges the rule defaults to port 80)
az network nsg rule create \\
    --resource-group {resource_group} \\
    --nsg-name clustrix-nsg \\
    --name AllowVnetInbound \\
    --protocol '*' \\
    --priority 1001 \\
    --source-address-prefixes 10.1.0.0/16 \\
    --destination-address-prefixes 10.1.0.0/16 \\
    --destination-port-ranges '*' \\
    --access allow
# Attach the NSG to the subnet so its rules take effect
az network vnet subnet update \\
    --resource-group {resource_group} \\
    --vnet-name clustrix-vnet \\
    --name clustrix-subnet \\
    --network-security-group clustrix-nsg
# Create Key Vault for secrets management (vault names are limited to 24 characters)
az keyvault create \\
    --resource-group {resource_group} \\
    --name clustrix-kv-$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -c1-8) \\
    --location {location} \\
    --enabled-for-disk-encryption true \\
    --sku standard
# Create managed identity for VMs
az identity create \\
    --resource-group {resource_group} \\
    --name clustrix-identity \\
    --location {location}
# Create storage account with public blob access disabled and HTTPS/TLS 1.2 enforced
# (a private endpoint must be added separately)
az storage account create \\
    --resource-group {resource_group} \\
    --name clustrixstorage$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -c1-8) \\
    --location {location} \\
    --sku Standard_LRS \\
    --allow-blob-public-access false \\
    --https-only true \\
    --min-tls-version TLS1_2
# Enable auto-provisioning in Microsoft Defender for Cloud (formerly Azure Security Center)
az security auto-provisioning-setting update \\
    --name default \\
    --auto-provision on
"""
    return {
        'resource_group': resource_group,
        'location': location,
        'vnet_name': 'clustrix-vnet',
        'subnet_name': 'clustrix-subnet',
        'nsg_name': 'clustrix-nsg',
        'security_commands': security_commands
    }
security_config = setup_azure_security_for_clustrix()
print("Azure Security Setup Commands:")
print(security_config['security_commands'])
print("\nIMPORTANT: Replace 'YOUR_IP_ADDRESS' with your actual public IP address!")
print("Find your IP with: curl ifconfig.me")
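The SSH rule expects your public address as a CIDR prefix, so a single IPv4 address becomes a `/32`. The sketch below uses Python's standard `ipaddress` module to validate the address returned by `curl ifconfig.me` and format it for `--source-address-prefixes`. `ssh_source_prefix` is a hypothetical helper. It does not check that the address is actually yours.

```python
import ipaddress


def ssh_source_prefix(ip: str) -> str:
    """Turn a single IP address into a host-only CIDR prefix for an NSG rule."""
    address = ipaddress.ip_address(ip.strip())  # raises ValueError for malformed input
    prefix_len = 32 if address.version == 4 else 128
    return f"{address}/{prefix_len}"


# 203.0.113.7 is a documentation address; substitute your own public IP
print(ssh_source_prefix("203.0.113.7"))
```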
Azure Security Checklist for Clustrix
✅ Authentication and Access
Use Azure Active Directory for authentication
Enable managed identities instead of service principals when possible
Restrict Network Security Groups to your IP address only
Use private endpoints for storage accounts
✅ Infrastructure Security
Enable disk encryption for all VMs
Use Azure Key Vault for secrets and certificates
Enable Azure Security Center recommendations
Use Azure Private Link for service connectivity
✅ Monitoring and Compliance
Enable diagnostic logging and monitoring
Implement Azure Policy for compliance
Use Azure Defender for cloud workload protection
Regularly rotate access keys and certificates
✅ Cost and Resource Management
Set up cost alerts and spending limits
Tag all resources for governance and cost tracking
Cost Management and Optimization
# Import Clustrix cost monitoring for Azure
from clustrix import cost_tracking_decorator, get_cost_monitor, generate_cost_report, get_pricing_info
# Example 1: Cost tracking with Azure VMs
@cost_tracking_decorator('azure', 'Standard_NC6s_v3')
@cluster(cores=6, memory="112GB")
def azure_training_with_cost_tracking():
    """Example training function with Azure cost tracking."""
    import time
    import numpy as np
    print("Starting Azure training with cost monitoring...")
    time.sleep(2)  # Simulate training
    # Simulate ML workload
    data = np.random.randn(1500, 1500)
    result = np.linalg.qr(data)
    print("Training completed!")
    # Placeholder metrics for this simulated run
    return {'accuracy': 0.89, 'training_time': 2.0}
# Example 2: Compare Azure VM pricing
def compare_azure_pricing():
    """Compare Azure VM pricing for different instance types."""
    pricing = get_pricing_info('azure')
    if pricing:
        print("Azure VM Pay-as-you-go Pricing (USD/hour):")
        # Group by VM family prefix
        gpu_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_NC')}
        general_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_D')}
        compute_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_F')}
        memory_vms = {k: v for k, v in pricing.items() if k.startswith('Standard_E')}
        groups = [("GPU VMs", gpu_vms), ("General Purpose", general_vms),
                  ("Compute Optimized", compute_vms), ("Memory Optimized", memory_vms)]
        for title, group in groups:
            print(f"\n{title}:")
            for vm, price in sorted(group.items(), key=lambda x: x[1]):
                print(f"  {vm:<25}: ${price:.3f}/hour")
# Example 3: Azure Spot VM savings analysis
def azure_spot_cost_analysis():
    """Analyze potential savings with Azure Spot VMs."""
    monitor = get_cost_monitor('azure')
    if monitor:
        print("Azure Spot VM Savings Analysis:")
        print("-" * 40)
        vm_types = ['Standard_NC6s_v3', 'Standard_D4s_v3', 'Standard_F8s_v2', 'Standard_E8s_v3']
        for vm in vm_types:
            pay_as_you_go = monitor.estimate_cost(vm, 1.0, use_spot=False)
            spot = monitor.estimate_cost(vm, 1.0, use_spot=True)
            savings = ((pay_as_you_go.hourly_rate - spot.hourly_rate) / pay_as_you_go.hourly_rate) * 100
            print(f"{vm}:")
            print(f"  Pay-as-you-go: ${pay_as_you_go.hourly_rate:.3f}/hour")
            print(f"  Spot: ${spot.hourly_rate:.3f}/hour")
            print(f"  Savings: {savings:.1f}%")
            print()
# Example 4: Azure Batch cost estimation
def estimate_azure_batch_costs():
    """Estimate costs for Azure Batch workloads."""
    monitor = get_cost_monitor('azure')
    if monitor:
        batch_estimate = monitor.estimate_batch_cost(
            pool_name="clustrix-batch-pool",
            vm_size="Standard_D4s_v3",
            target_nodes=8,
            estimated_duration_hours=2.0
        )
        print("Azure Batch Cost Estimation:")
        print(f"  Pool Name: {batch_estimate['pool_name']}")
        print(f"  VM Size: {batch_estimate['vm_size']}")
        print(f"  Target Nodes: {batch_estimate['target_nodes']}")
        print(f"  Duration: {batch_estimate['estimated_duration_hours']} hours")
        print(f"  Total Compute Hours: {batch_estimate['total_compute_hours']}")
        print(f"  Estimated Cost: ${batch_estimate['estimated_cost']:.2f}")
        print(f"  Cost per Node-Hour: ${batch_estimate['cost_per_node_hour']:.3f}")
# Example 5: Regional pricing comparison
def compare_azure_regions():
    """Compare Azure pricing across different regions."""
    monitor = get_cost_monitor('azure')
    if monitor:
        print("Azure Regional Pricing Comparison for Standard_NC6s_v3:")
        print("-" * 55)
        regional_pricing = monitor.get_region_pricing_comparison('Standard_NC6s_v3')
        for region, pricing_info in regional_pricing.items():
            print(f"{region}:")
            print(f"  Pay-as-you-go: ${pricing_info['pay_as_you_go_hourly']:.3f}/hour")
            print(f"  Est. Spot: ${pricing_info['estimated_spot_hourly']:.3f}/hour")
            print()
# Example 6: Real-time Azure cost monitoring
def monitor_azure_costs():
    """Monitor current Azure resource usage and costs."""
    report = generate_cost_report('azure', 'Standard_NC6s_v3')
    if report:
        print("Current Azure Resource Status:")
        print(f"  CPU Usage: {report['resource_usage']['cpu_percent']:.1f}%")
        print(f"  Memory Usage: {report['resource_usage']['memory_percent']:.1f}%")
        if report['resource_usage']['gpu_stats']:
            print(f"  GPU Count: {len(report['resource_usage']['gpu_stats'])}")
        print(f"  Hourly Rate: ${report['cost_estimate']['hourly_rate']:.3f}")
        if report['recommendations']:
            print("\nCost Optimization Recommendations:")
            for rec in report['recommendations']:
                print(f"  • {rec}")
# Example 7: Spot VM configuration for cost savings
def configure_spot_vm():
    """Example configuration for using Azure Spot VMs."""
    configure(
        cluster_type="ssh",
        cluster_host="your-spot-vm-ip",
        username="azureuser",
        key_file="~/.ssh/id_rsa",
        remote_work_dir="/tmp/clustrix",
        # Spot VMs can be evicted, so use shorter job time limits
        default_time="00:30:00",
        job_poll_interval=60,  # Poll job status every 60 seconds
        cleanup_on_success=True  # Clean up remote files after successful jobs
    )
    return "Configured for Azure Spot VMs with appropriate timeouts."
# Run examples
print("Azure Cost Monitoring Examples:")
print("=" * 40)
print("\n1. Azure VM Pricing Comparison:")
compare_azure_pricing()
print("\n2. Spot VM Savings Analysis:")
azure_spot_cost_analysis()
print("\n3. Azure Batch Cost Estimation:")
estimate_azure_batch_costs()
print("\n4. Regional Pricing Comparison:")
compare_azure_regions()
print("\n5. Current Azure Status:")
monitor_azure_costs()
print("\n✅ Azure cost monitoring examples ready!")
print("💡 Use @cost_tracking_decorator('azure', 'vm_size') for automatic cost tracking")
# Example spot VM configuration (uncomment to use)
# spot_config = configure_spot_vm()
# print(f"Configuration result: {spot_config}")
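The Spot-savings and Batch estimates above come down to simple arithmetic. The sketch below isolates that arithmetic so you can sanity-check the numbers the cost monitor reports. It is not necessarily how Clustrix's `estimate_cost` or `estimate_batch_cost` work internally. The hourly rates in the example are made-up placeholders, not Azure prices. Batch adds no service charge of its own, so a pool's cost is driven by the VMs it runs.

```python
def spot_savings_percent(pay_as_you_go_hourly: float, spot_hourly: float) -> float:
    """Percentage saved by Spot pricing relative to pay-as-you-go."""
    if pay_as_you_go_hourly <= 0:
        raise ValueError("pay-as-you-go rate must be positive")
    return (pay_as_you_go_hourly - spot_hourly) / pay_as_you_go_hourly * 100


def batch_pool_cost(hourly_rate: float, target_nodes: int, duration_hours: float) -> dict:
    """Estimate a dedicated Batch pool's compute cost as nodes x hours x rate."""
    node_hours = target_nodes * duration_hours
    return {"total_compute_hours": node_hours, "estimated_cost": node_hours * hourly_rate}


# Placeholder rates for illustration only (not real Azure prices)
print(f"{spot_savings_percent(0.20, 0.05):.1f}% savings")
print(batch_pool_cost(hourly_rate=0.20, target_nodes=8, duration_hours=2.0))
```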
Azure Cost Optimization for Clustrix
Cost Monitoring Commands
# Set up a monthly budget
az consumption budget create \
    --budget-name clustrix-monthly-budget \
    --amount 100 \
    --category cost \
    --time-grain monthly \
    --start-date 2025-01-01 \
    --end-date 2025-12-31
# Get current costs
az consumption usage list \
    --start-date 2025-01-01 \
    --end-date 2025-01-31
# List resource costs by resource group (requires the costmanagement CLI extension)
az costmanagement query \
    --scope "subscriptions/<subscription-id>" \
    --timeframe MonthToDate \
    --type Usage \
    --dataset-aggregation '{"totalCost":{"name":"PreTaxCost","function":"Sum"}}' \
    --dataset-grouping name=ResourceGroup type=Dimension
# Set up auto-shutdown for VMs
az vm auto-shutdown \
    --resource-group clustrix-tutorial-rg \
    --name clustrix-vm-01 \
    --time 1900 \
    --email your-email@example.com
Cost Optimization Recommendations
Use Spot VMs for batch processing (up to 90% savings)
Enable auto-shutdown for dev resources
Implement lifecycle policies for blob storage
Set up budget alerts and spending limits
Regular cost reviews and resource optimization
Use reserved instances for predictable workloads
Choose appropriate VM sizes based on actual usage
1. Compute Optimization
Use Azure Spot VMs for non-critical workloads (up to 90% savings)
Choose B-series burstable VMs for variable workloads
Use reserved instances for predictable workloads (1-3 year terms)
Enable auto-shutdown for dev/test VMs
Right-size VMs based on actual usage
2. Storage Optimization
Use appropriate storage tiers (Hot, Cool, Archive)
Enable lifecycle management for blob storage
Use managed disks with appropriate performance tiers
Implement data deduplication and compression
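Tiering and lifecycle management are configured with a JSON policy on the storage account. The sketch below builds such a policy for a single prefix, moving block blobs to Cool, then Archive, then deleting them based on days since last modification. The rule name, prefix (`images/processed/`, i.e. container `images` plus prefix `processed/`, matching the image-pipeline example later in this tutorial) and day thresholds are assumptions to adapt. Saved as `policy.json`, it could be applied with `az storage account management-policy create --account-name <account> --resource-group <rg> --policy @policy.json`.

```python
import json


def build_lifecycle_policy(prefixes, cool_after_days=30, archive_after_days=90,
                           delete_after_days=365):
    """Build an Azure Blob Storage lifecycle management policy document.

    `prefixes` use the "container/blob-prefix" form expected by prefixMatch.
    The rule name and default thresholds are illustrative assumptions.
    """
    if not (cool_after_days < archive_after_days < delete_after_days):
        raise ValueError("expected cool < archive < delete thresholds")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "clustrixResultsTiering",
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": list(prefixes),
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_after_days},
                        }
                    },
                },
            }
        ]
    }


policy = build_lifecycle_policy(["images/processed/"])
print(json.dumps(policy, indent=2))  # save this output as policy.json
```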
3. Network Optimization
Minimize data transfer between regions
Use Azure CDN for static content
Optimize data transfer patterns
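One concrete way to improve transfer patterns for jobs that emit many small result files is to bundle them into a single compressed archive before moving them, which replaces many requests with one and shrinks compressible data. The sketch below does this in memory with the standard library; the resulting bytes could then be uploaded with `upload_blob`, as the image pipeline later in this tutorial does. The file names and contents are hypothetical, and already-compressed formats such as PNG or JPEG gain little from gzip.

```python
import io
import tarfile


def bundle_files(files: dict) -> bytes:
    """Pack {name: bytes} into one gzip-compressed tar archive held in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unbundle_files(payload: bytes) -> dict:
    """Inverse of bundle_files: return {name: bytes} for every regular file."""
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


# Hypothetical job output: 50 small, repetitive CSV files
results = {f"result_{i:02d}.csv": b"x,y\n" + b"1,2\n" * 100 for i in range(50)}
payload = bundle_files(results)
raw_bytes = sum(len(data) for data in results.values())
print(f"{len(results)} files, {raw_bytes} raw bytes -> 1 archive of {len(payload)} bytes")
```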
4. Monitoring and Management
Set up budget alerts and spending limits
Use Azure Cost Management + Billing
Implement proper resource tagging
Review costs regularly and act on the findings
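Tags only help cost reporting if they are applied consistently. The sketch below checks a resource listing (shaped like the JSON from `az resource list --output json`, where `tags` can be null) against a required tag set and reports what each resource is missing. The `REQUIRED_TAGS` policy and the sample resources are hypothetical.

```python
# Hypothetical tagging policy; adjust to your organization's conventions
REQUIRED_TAGS = {"project", "owner", "environment"}


def missing_tags(resources, required=REQUIRED_TAGS):
    """Map each non-compliant resource name to its sorted missing tag keys.

    `resources` is a list of dicts shaped like the JSON printed by
    `az resource list --output json`; "tags" may be null (None) there.
    """
    wanted = {key.lower() for key in required}
    report = {}
    for resource in resources:
        # Compare keys case-insensitively so "Owner" satisfies "owner"
        present = {key.lower() for key in (resource.get("tags") or {})}
        missing = sorted(wanted - present)
        if missing:
            report[resource["name"]] = missing
    return report


# Hypothetical resources, trimmed to the fields used above
resources = [
    {"name": "clustrix-vm-01", "tags": {"Project": "clustrix", "Owner": "alice"}},
    {"name": "clustrixstorage", "tags": None},
    {"name": "clustrix-vnet", "tags": {"project": "clustrix", "owner": "bob", "environment": "dev"}},
]
for name, keys in missing_tags(resources).items():
    print(f"{name}: missing {', '.join(keys)}")
```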
5. Service-Specific
Use Azure Functions for small, event-driven tasks
Consider Azure Container Instances for short-running jobs
Use Azure Batch for large-scale parallel processing
Resource Cleanup
def cleanup_azure_resources(resource_group='clustrix-tutorial-rg'):
    """
    Clean up Azure resources to avoid ongoing charges.

    Args:
        resource_group: Name of the resource group to clean up
    """
    cleanup_commands = f"""
# List all resources in the resource group
az resource list --resource-group {resource_group} --output table
# Deallocate the VM first so it stops accruing compute charges
az vm deallocate --resource-group {resource_group} --name clustrix-vm-01
# Optional: delete specific resources individually for more granular control.
# az disk delete does not expand wildcards; look up the OS disk name with:
# az vm show --resource-group {resource_group} --name clustrix-vm-01 --query storageProfile.osDisk.name --output tsv
# az vm delete --resource-group {resource_group} --name clustrix-vm-01 --yes
# az disk delete --resource-group {resource_group} --name <os-disk-name> --yes
# az network public-ip delete --resource-group {resource_group} --name clustrix-vm-01PublicIP
# WARNING: Delete the entire resource group (removes ALL resources in it)
az group delete --name {resource_group} --yes --no-wait
# Verify deletion (with --no-wait, the group may still be listed while deletion is in progress)
az group list --output table | grep {resource_group}
"""
    return {
        'resource_group': resource_group,
        'cleanup_commands': cleanup_commands
    }
cleanup_info = cleanup_azure_resources()
print(f"Azure Resource Cleanup Commands for Resource Group: {cleanup_info['resource_group']}")
print("=" * 70)
print(cleanup_info['cleanup_commands'])
print("\n" + "⚠️ " * 10 + " IMPORTANT WARNINGS " + "⚠️ " * 10)
print("1. The 'az group delete' command will permanently delete ALL resources in the group!")
print("2. Review the resources first with 'az resource list' before proceeding")
print("3. Make sure to back up any important data before deletion")
print("4. If you plan to reuse VMs, deallocate them instead of deleting them (a stopped but still allocated VM is billed for compute)")
print("5. Deleted resources cannot be recovered - this action is irreversible!")
print("=" * 70)
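The cleanup commands above deallocate only `clustrix-vm-01`. If you created more VMs, the sketch below builds one `az vm deallocate` command per VM found in a resource listing and summarizes what else the group contains before you delete it. It assumes the listing has already been parsed from `az resource list --output json` (as noted in the comments); the sample entries are hypothetical and trimmed to the `name` and `type` fields used here.

```python
import shlex
from collections import Counter

VM_TYPE = "Microsoft.Compute/virtualMachines"


def summarize_by_type(resources):
    """Count resources per Azure resource type."""
    return Counter(resource["type"] for resource in resources)


def deallocate_commands(resources, resource_group):
    """Build one `az vm deallocate` command per VM found in the listing."""
    return [
        f"az vm deallocate --resource-group {shlex.quote(resource_group)} "
        f"--name {shlex.quote(resource['name'])}"
        for resource in resources
        if resource["type"].lower() == VM_TYPE.lower()
    ]


# In practice the listing would come from something like:
#   json.loads(subprocess.run(
#       ["az", "resource", "list", "--resource-group", "clustrix-tutorial-rg", "--output", "json"],
#       capture_output=True, text=True, check=True).stdout)
# The entries below are hypothetical.
listing = [
    {"name": "clustrix-vm-01", "type": "Microsoft.Compute/virtualMachines"},
    {"name": "clustrix-vm-02", "type": "Microsoft.Compute/virtualMachines"},
    {"name": "clustrix-vm-01PublicIP", "type": "Microsoft.Network/publicIPAddresses"},
]
print(summarize_by_type(listing))
for command in deallocate_commands(listing, "clustrix-tutorial-rg"):
    print(command)
```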
Advanced Example: Distributed Image Processing
@cluster(cores=4, memory="8GB", time="00:45:00")
def azure_image_processing_pipeline(storage_config, processing_params):
    """
    Distributed image processing pipeline using Azure Blob Storage.
    """
    import io
    import os
    import time

    import numpy as np
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient
    from PIL import Image

    # Connect to Azure Blob Storage
    account_url = f"https://{storage_config['account_name']}.blob.core.windows.net"
    credential = DefaultAzureCredential()
    blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
    container_client = blob_service_client.get_container_client(storage_config['container'])
    input_prefix = storage_config['input_prefix']
    output_prefix = storage_config['output_prefix']

    processed_images = []
    processing_stats = []

    # List images to process
    blob_list = container_client.list_blobs(name_starts_with=input_prefix)
    for blob in blob_list:
        if blob.name.lower().endswith(('.png', '.jpg', '.jpeg')):
            start_time = time.time()
            try:
                # Download image
                blob_client = blob_service_client.get_blob_client(
                    container=storage_config['container'], blob=blob.name
                )
                image_data = blob_client.download_blob().readall()

                # Decode image and record its size before any transformation
                image = Image.open(io.BytesIO(image_data))
                original_size = image.size

                # Apply processing operations
                if processing_params.get('resize'):
                    image = image.resize(processing_params['resize'])
                if processing_params.get('grayscale'):
                    image = image.convert('L')
                if processing_params.get('rotate'):
                    image = image.rotate(processing_params['rotate'])

                # Re-encode as PNG
                output_buffer = io.BytesIO()
                image.save(output_buffer, format='PNG')

                # Swap only the leading input prefix, and use a .png extension
                # so the blob name matches the encoded format
                relative_name = blob.name[len(input_prefix):]
                output_blob_name = output_prefix + os.path.splitext(relative_name)[0] + '.png'

                # Upload processed image
                output_blob_client = blob_service_client.get_blob_client(
                    container=storage_config['container'], blob=output_blob_name
                )
                output_blob_client.upload_blob(output_buffer.getvalue(), overwrite=True)

                processing_time = time.time() - start_time
                processed_images.append(output_blob_name)
                processing_stats.append({
                    'input_blob': blob.name,
                    'output_blob': output_blob_name,
                    'processing_time': processing_time,
                    'original_size': original_size,
                    'processed_size': image.size
                })
            except Exception as e:
                print(f"Error processing {blob.name}: {e}")

    return {
        'processed_count': len(processed_images),
        'total_processing_time': sum(stat['processing_time'] for stat in processing_stats),
        'average_processing_time': np.mean([stat['processing_time'] for stat in processing_stats]) if processing_stats else 0,
        'processed_images': processed_images[:10],  # First 10 for brevity
        'processing_stats': processing_stats[:5]  # First 5 for brevity
    }
# Example usage (uncomment and modify as needed):
# storage_config = {
# 'account_name': 'yourstorageaccount',
# 'container': 'images',
# 'input_prefix': 'raw/',
# 'output_prefix': 'processed/'
# }
#
# processing_config = {
# 'resize': (800, 600),
# 'grayscale': True,
# 'rotate': 0
# }
#
# result = azure_image_processing_pipeline(storage_config, processing_config)
# print(f"Processed {result['processed_count']} images in {result['total_processing_time']:.2f} seconds")
print("Advanced image processing pipeline example defined.")
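As written, the pipeline walks every blob under `input_prefix` inside a single `@cluster` job. To spread the work over several jobs, one option is to list the blob names locally, split them into batches, and submit one job per batch to a variant of the function that accepts an explicit list of names. That variant (and its hypothetical `blob_names` parameter) is an assumption, not something the function above supports. A minimal round-robin partitioner, which keeps batch sizes within one of each other, could look like this:

```python
def partition_round_robin(items, n_batches):
    """Split items into at most n_batches non-empty lists, dealing them out in turn."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    batches = [[] for _ in range(n_batches)]
    for index, item in enumerate(items):
        batches[index % n_batches].append(item)
    return [batch for batch in batches if batch]


# Hypothetical blob names under the 'raw/' input prefix
blob_names = [f"raw/img_{i:03d}.jpg" for i in range(10)]
for i, batch in enumerate(partition_round_robin(blob_names, 4)):
    print(f"job {i}: {len(batch)} images, first = {batch[0]}")
```

Round-robin balances counts, not bytes; if image sizes vary widely, sorting by blob size before dealing them out gives more even batches.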
Summary
This tutorial covered:
Setup: Azure authentication and Clustrix installation
VM Integration: Direct Azure VM configuration
Azure Batch: Managed job scheduling
CycleCloud: HPC-optimized clusters with SLURM
Blob Storage: Data storage and retrieval
Azure ML: Machine learning compute integration
Security: Best practices for safe deployment
Cost Management: Strategies to minimize expenses
Resource Management: Proper cleanup procedures
Next Steps
Set up your Azure credentials and test the basic configuration
Start with a simple VM for initial testing
Consider CycleCloud for production HPC workloads
Implement proper monitoring and cost controls
Explore Azure Spot VMs for cost-effective batch processing
Azure-Specific Advantages
CycleCloud: Managed HPC cluster orchestration with support for schedulers such as SLURM
Azure ML: Integrated machine learning platform
Hybrid Cloud: Integration with on-premises infrastructure
Enterprise Integration: Microsoft Entra ID (formerly Azure Active Directory) and other enterprise tools
Compliance: Strong compliance and security certifications
Resources
Remember: Always monitor your Azure costs and clean up resources when not in use!