{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SLURM Cluster Tutorial\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextLab/clustrix/blob/master/docs/notebooks/slurm_tutorial.ipynb)\n", "\n", "This tutorial demonstrates how to use Clustrix with SLURM (Simple Linux Utility for Resource Management) clusters. SLURM is one of the most popular workload managers for HPC clusters.\n", "\n", "## Prerequisites\n", "\n", "- Access to a SLURM cluster\n", "- SSH key configured for the cluster\n", "- Clustrix installed: `pip install clustrix`" ], "id": "cell-0" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation and Setup\n", "\n", "First, install Clustrix if you haven't already:" ], "id": "cell-1" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install Clustrix (uncomment if needed)\n", "# !pip install clustrix\n", "\n", "import clustrix\n", "from clustrix import cluster, configure\n", "import numpy as np\n", "import time" ], "id": "cell-2" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic SLURM Configuration\n", "\n", "Configure Clustrix to connect to your SLURM cluster:" ], "id": "cell-3" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Configure for SLURM cluster\n", "configure(\n", " cluster_type=\"slurm\",\n", " cluster_host=\"your-slurm-cluster.edu\", # Replace with your cluster hostname\n", " username=\"your-username\", # Replace with your username\n", " key_file=\"~/.ssh/id_rsa\", # Path to your SSH key\n", " \n", " # Default resource requirements\n", " default_cores=4,\n", " default_memory=\"8GB\",\n", " default_time=\"01:00:00\",\n", " default_partition=\"normal\", # Replace with your default partition\n", " \n", " # Remote work directory\n", " remote_work_dir=\"/scratch/your-username/clustrix\", # Adjust for your 
cluster\n", " \n", " # Optional: Load modules on the cluster\n", " module_loads=[\"python/3.9\", \"gcc/9.3.0\"],\n", " \n", " # Cleanup settings\n", " cleanup_on_success=True,\n", " max_parallel_jobs=20\n", ")\n", "\n", "print(\"SLURM cluster configured successfully!\")" ], "id": "cell-4" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1: Simple Mathematical Computation\n", "\n", "Let's start with a basic example that performs a mathematical computation on the cluster:" ], "id": "cell-5" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@cluster(cores=2, memory=\"4GB\", time=\"00:10:00\")\n", "def calculate_pi_monte_carlo(n_samples=1000000):\n", " \"\"\"\n", " Calculate pi using Monte Carlo method.\n", " This will run on the SLURM cluster.\n", " \"\"\"\n", " import numpy as np\n", " \n", " # Generate random points\n", " x = np.random.uniform(-1, 1, n_samples)\n", " y = np.random.uniform(-1, 1, n_samples)\n", " \n", " # Check if points are inside unit circle\n", " inside_circle = (x**2 + y**2) <= 1\n", " \n", " # Estimate pi\n", " pi_estimate = 4 * np.sum(inside_circle) / n_samples\n", " \n", " return {\n", " 'pi_estimate': pi_estimate,\n", " 'n_samples': n_samples,\n", " 'error': abs(pi_estimate - np.pi)\n", " }\n", "\n", "# Execute on cluster (this will submit a SLURM job)\n", "result = calculate_pi_monte_carlo(5000000)\n", "print(f\"Pi estimate: {result['pi_estimate']:.6f}\")\n", "print(f\"Error: {result['error']:.6f}\")\n", "print(f\"Samples used: {result['n_samples']:,}\")" ], "id": "cell-6" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 2: Machine Learning Model Training\n", "\n", "Train a machine learning model with specific resource requirements:" ], "id": "cell-7" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@cluster(\n", " cores=8, \n", " memory=\"32GB\", \n", " time=\"02:00:00\",\n", " partition=\"gpu\", # Use GPU 
partition if available\n", " gres=\"gpu:1\" # Request 1 GPU (SLURM-specific); scikit-learn's RandomForestClassifier itself runs on CPU only\n", ")\n", "def train_random_forest(n_samples=100000, n_features=50, n_estimators=200):\n", " \"\"\"\n", " Train a Random Forest model on synthetic data.\n", " \"\"\"\n", " from sklearn.ensemble import RandomForestClassifier\n", " from sklearn.datasets import make_classification\n", " from sklearn.model_selection import train_test_split, cross_val_score\n", " from sklearn.metrics import accuracy_score\n", " import numpy as np\n", " \n", " print(f\"Generating dataset with {n_samples:,} samples and {n_features} features...\")\n", " \n", " # Generate synthetic dataset\n", " X, y = make_classification(\n", " n_samples=n_samples,\n", " n_features=n_features,\n", " n_informative=int(n_features * 0.7),\n", " n_redundant=int(n_features * 0.2),\n", " n_clusters_per_class=2,\n", " random_state=42\n", " )\n", " \n", " # Split the data\n", " X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, test_size=0.2, random_state=42\n", " )\n", " \n", " print(f\"Training Random Forest with {n_estimators} estimators...\")\n", " \n", " # Train model\n", " model = RandomForestClassifier(\n", " n_estimators=n_estimators,\n", " max_depth=20,\n", " min_samples_split=5,\n", " n_jobs=-1, # Use all available cores\n", " random_state=42\n", " )\n", " \n", " model.fit(X_train, y_train)\n", " \n", " # Evaluate model\n", " train_accuracy = accuracy_score(y_train, model.predict(X_train))\n", " test_accuracy = accuracy_score(y_test, model.predict(X_test))\n", " \n", " # Cross-validation\n", " cv_scores = cross_val_score(model, X, y, cv=5, n_jobs=-1)\n", " \n", " return {\n", " 'train_accuracy': train_accuracy,\n", " 'test_accuracy': test_accuracy,\n", " 'cv_mean': np.mean(cv_scores),\n", " 'cv_std': np.std(cv_scores),\n", " 'feature_importance': model.feature_importances_.tolist(),\n", " 'n_samples': n_samples,\n", " 'n_features': n_features,\n", " 'n_estimators': n_estimators\n", " }\n", "\n", "# Train model on 
cluster\n", "ml_result = train_random_forest(n_samples=50000, n_features=30, n_estimators=100)\n", "\n", "print(f\"Training Accuracy: {ml_result['train_accuracy']:.4f}\")\n", "print(f\"Test Accuracy: {ml_result['test_accuracy']:.4f}\")\n", "print(f\"Cross-validation: {ml_result['cv_mean']:.4f} ± {ml_result['cv_std']:.4f}\")" ], "id": "cell-8" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 3: Parallel Data Processing with Automatic Loop Distribution\n", "\n", "Process multiple data chunks in parallel using Clustrix's automatic loop parallelization:" ], "id": "cell-9" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@cluster(\n", " cores=16, \n", " memory=\"64GB\", \n", " time=\"01:30:00\",\n", " parallel=True # Enable automatic loop parallelization\n", ")\n", "def process_data_chunks(chunk_size=10000, num_chunks=20):\n", " \"\"\"\n", " Process multiple data chunks in parallel.\n", " The for loop will be automatically distributed across cores.\n", " \"\"\"\n", " import numpy as np\n", " from scipy import stats\n", " \n", " results = []\n", " \n", " # This loop will be automatically parallelized by Clustrix\n", " for chunk_id in range(num_chunks):\n", " # Generate chunk data with different random seed\n", " np.random.seed(chunk_id * 42)\n", " data = np.random.exponential(scale=2.0, size=chunk_size)\n", " \n", " # Perform statistical analysis on chunk\n", " chunk_stats = {\n", " 'chunk_id': chunk_id,\n", " 'mean': np.mean(data),\n", " 'std': np.std(data),\n", " 'median': np.median(data),\n", " 'skewness': stats.skew(data),\n", " 'kurtosis': stats.kurtosis(data),\n", " 'min': np.min(data),\n", " 'max': np.max(data),\n", " 'percentile_95': np.percentile(data, 95)\n", " }\n", " \n", " results.append(chunk_stats)\n", " \n", " # Aggregate results\n", " overall_stats = {\n", " 'num_chunks': len(results),\n", " 'total_samples': num_chunks * chunk_size,\n", " 'mean_of_means': np.mean([r['mean'] for r in 
results]),\n", " 'std_of_means': np.std([r['mean'] for r in results]),\n", " 'chunk_results': results\n", " }\n", " \n", " return overall_stats\n", "\n", "# Process data chunks in parallel\n", "parallel_result = process_data_chunks(chunk_size=5000, num_chunks=10)\n", "\n", "print(f\"Processed {parallel_result['num_chunks']} chunks\")\n", "print(f\"Total samples: {parallel_result['total_samples']:,}\")\n", "print(f\"Mean of chunk means: {parallel_result['mean_of_means']:.4f}\")\n", "print(f\"Std of chunk means: {parallel_result['std_of_means']:.4f}\")\n", "\n", "# Display first few chunk results\n", "print(\"\\nFirst 3 chunk results:\")\n", "for i, chunk in enumerate(parallel_result['chunk_results'][:3]):\n", " print(f\" Chunk {chunk['chunk_id']}: mean={chunk['mean']:.3f}, std={chunk['std']:.3f}\")" ], "id": "cell-10" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 4: Scientific Computing - Numerical Integration\n", "\n", "Perform numerical integration using high-performance computing resources:" ], "id": "cell-11" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@cluster(\n", " cores=32,\n", " memory=\"128GB\",\n", " time=\"03:00:00\",\n", " partition=\"bigmem\" # Use high-memory partition\n", ")\n", "def numerical_integration_adaptive(function_type=\"gaussian\", intervals=1000000, precision_target=1e-8):\n", " \"\"\"\n", " Perform high-precision numerical integration using adaptive methods.\n", " \"\"\"\n", " import numpy as np\n", " from scipy import integrate\n", " import math\n", " \n", " def gaussian_function(x):\n", " \"\"\"Standard Gaussian function\"\"\"\n", " return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)\n", " \n", " def oscillatory_function(x):\n", " \"\"\"Highly oscillatory function\"\"\"\n", " return np.sin(100 * x) * np.exp(-x**2)\n", " \n", " def polynomial_function(x):\n", " \"\"\"High-degree polynomial\"\"\"\n", " return x**10 * np.exp(-x)\n", " \n", " # Select function based on 
type\n", " functions = {\n", " \"gaussian\": (gaussian_function, -5, 5, 0.5 * (math.erf(5/np.sqrt(2)) - math.erf(-5/np.sqrt(2)))), # Analytical: Phi(5) - Phi(-5)\n", " \"oscillatory\": (oscillatory_function, -2, 2, None), # No reference value supplied (the integrand is odd, so the exact integral is 0)\n", " \"polynomial\": (polynomial_function, 0, 10, math.gamma(11) * (1 - math.exp(-10) * sum(10**k / math.factorial(k) for k in range(11)))) # Analytical: lower incomplete gamma at (11, 10); 10! only holds for an infinite upper bound\n", " }\n", " \n", " if function_type not in functions:\n", " raise ValueError(f\"Unknown function type: {function_type}\")\n", " \n", " func, a, b, analytical = functions[function_type]\n", " \n", " print(f\"Integrating {function_type} function from {a} to {b}...\")\n", " print(f\"Target precision: {precision_target}\")\n", " \n", " # High-precision adaptive integration\n", " result, error = integrate.quad(\n", " func, a, b, \n", " epsabs=precision_target,\n", " epsrel=precision_target,\n", " limit=intervals\n", " )\n", " \n", " # Monte Carlo integration for comparison\n", " n_mc = 10000000 # 10 million samples\n", " x_mc = np.random.uniform(a, b, n_mc)\n", " y_mc = func(x_mc)\n", " mc_result = (b - a) * np.mean(y_mc)\n", " mc_error = (b - a) * np.std(y_mc) / np.sqrt(n_mc)\n", " \n", " integration_result = {\n", " 'function_type': function_type,\n", " 'integration_bounds': [a, b],\n", " 'adaptive_result': result,\n", " 'adaptive_error': error,\n", " 'monte_carlo_result': mc_result,\n", " 'monte_carlo_error': mc_error,\n", " 'precision_target': precision_target,\n", " 'mc_samples': n_mc\n", " }\n", " \n", " if analytical is not None:\n", " integration_result['analytical_result'] = analytical\n", " integration_result['adaptive_vs_analytical'] = abs(result - analytical)\n", " integration_result['mc_vs_analytical'] = abs(mc_result - analytical)\n", " \n", " return integration_result\n", "\n", "# Perform numerical integration\n", "integration_results = []\n", "\n", "for func_type in [\"gaussian\", \"polynomial\", \"oscillatory\"]:\n", " result = numerical_integration_adaptive(func_type, precision_target=1e-10)\n", " integration_results.append(result)\n", " \n", " 
print(f\"\\n{func_type.upper()} FUNCTION INTEGRATION:\")\n", " print(f\"Adaptive result: {result['adaptive_result']:.10f} ± {result['adaptive_error']:.2e}\")\n", " print(f\"Monte Carlo result: {result['monte_carlo_result']:.10f} ± {result['monte_carlo_error']:.2e}\")\n", " \n", " if 'analytical_result' in result:\n", " print(f\"Analytical result: {result['analytical_result']:.10f}\")\n", " print(f\"Adaptive error vs analytical: {result['adaptive_vs_analytical']:.2e}\")\n", " print(f\"MC error vs analytical: {result['mc_vs_analytical']:.2e}\")" ], "id": "cell-12" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 5: Bioinformatics - Sequence Analysis\n", "\n", "Analyze biological sequences using cluster computing:" ], "id": "cell-13" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@cluster(\n", " cores=24,\n", " memory=\"96GB\", \n", " time=\"04:00:00\",\n", " partition=\"bioqueue\" # Specialized bioinformatics partition\n", ")\n", "def analyze_genome_sequences(num_sequences=1000, sequence_length=10000):\n", " \"\"\"\n", " Analyze synthetic genome sequences for various biological properties.\n", " \"\"\"\n", " import numpy as np\n", " import random\n", " from collections import Counter\n", " import re\n", " \n", " # DNA bases\n", " bases = ['A', 'T', 'G', 'C']\n", " \n", " # Common biological motifs\n", " motifs = {\n", " 'CpG_sites': 'CG',\n", " 'TATA_box': 'TATAAA',\n", " 'start_codon': 'ATG',\n", " 'stop_codons': ['TAA', 'TAG', 'TGA'],\n", " 'poly_A': 'AAAAAAA', # 7 consecutive A's\n", " 'GC_rich': 'GCGCGC'\n", " }\n", " \n", " def generate_sequence(length, gc_content=0.5):\n", " \"\"\"Generate a random DNA sequence with specified GC content\"\"\"\n", " # Adjust probabilities for GC content\n", " gc_prob = gc_content / 2 # Equal prob for G and C\n", " at_prob = (1 - gc_content) / 2 # Equal prob for A and T\n", " \n", " probs = [at_prob, at_prob, gc_prob, gc_prob] # A, T, G, C\n", " return 
''.join(np.random.choice(bases, size=length, p=probs))\n", " \n", " def analyze_sequence(sequence):\n", " \"\"\"Analyze a single sequence for biological properties\"\"\"\n", " # Basic composition\n", " composition = Counter(sequence)\n", " total_bases = len(sequence)\n", " \n", " gc_content = (composition['G'] + composition['C']) / total_bases\n", " at_content = (composition['A'] + composition['T']) / total_bases\n", " \n", " # Motif analysis\n", " motif_counts = {}\n", " motif_counts['CpG_sites'] = len(re.findall(motifs['CpG_sites'], sequence))\n", " motif_counts['TATA_boxes'] = len(re.findall(motifs['TATA_box'], sequence))\n", " motif_counts['start_codons'] = len(re.findall(motifs['start_codon'], sequence))\n", " motif_counts['poly_A_signals'] = len(re.findall(motifs['poly_A'], sequence))\n", " motif_counts['GC_rich_regions'] = len(re.findall(motifs['GC_rich'], sequence))\n", " \n", " # Stop codons (any of the three)\n", " stop_codon_count = sum(len(re.findall(codon, sequence)) for codon in motifs['stop_codons'])\n", " motif_counts['stop_codons'] = stop_codon_count\n", " \n", " # Calculate complexity (entropy)\n", " entropy = -sum((count/total_bases) * np.log2(count/total_bases) \n", " for count in composition.values() if count > 0)\n", " \n", " # Find longest homopolymer runs\n", " max_runs = {}\n", " for base in bases:\n", " runs = re.findall(f'{base}+', sequence)\n", " max_runs[f'max_{base}_run'] = max(len(run) for run in runs) if runs else 0\n", " \n", " return {\n", " 'length': total_bases,\n", " 'gc_content': gc_content,\n", " 'at_content': at_content,\n", " 'base_composition': dict(composition),\n", " 'entropy': entropy,\n", " 'motif_counts': motif_counts,\n", " 'max_homopolymer_runs': max_runs\n", " }\n", " \n", " print(f\"Generating and analyzing {num_sequences:,} sequences of length {sequence_length:,}...\")\n", " \n", " # Generate sequences with varying GC content\n", " gc_contents = np.random.uniform(0.3, 0.7, num_sequences) # Realistic range\n", " 
\n", " sequence_analyses = []\n", " \n", " for i, gc_content in enumerate(gc_contents):\n", " if i % 100 == 0:\n", " print(f\"Analyzing sequence {i+1}/{num_sequences}...\")\n", " \n", " sequence = generate_sequence(sequence_length, gc_content)\n", " analysis = analyze_sequence(sequence)\n", " analysis['target_gc_content'] = gc_content\n", " analysis['sequence_id'] = i\n", " sequence_analyses.append(analysis)\n", " \n", " # Aggregate statistics\n", " gc_contents_actual = [s['gc_content'] for s in sequence_analyses]\n", " entropies = [s['entropy'] for s in sequence_analyses]\n", " \n", " # Motif statistics\n", " all_motif_counts = {motif: [s['motif_counts'][motif] for s in sequence_analyses] \n", " for motif in sequence_analyses[0]['motif_counts'].keys()}\n", " \n", " aggregate_results = {\n", " 'num_sequences_analyzed': len(sequence_analyses),\n", " 'total_bases_analyzed': len(sequence_analyses) * sequence_length,\n", " 'gc_content_stats': {\n", " 'mean': np.mean(gc_contents_actual),\n", " 'std': np.std(gc_contents_actual),\n", " 'min': np.min(gc_contents_actual),\n", " 'max': np.max(gc_contents_actual)\n", " },\n", " 'entropy_stats': {\n", " 'mean': np.mean(entropies),\n", " 'std': np.std(entropies),\n", " 'min': np.min(entropies),\n", " 'max': np.max(entropies)\n", " },\n", " 'motif_statistics': {\n", " motif: {\n", " 'total_found': sum(counts),\n", " 'mean_per_sequence': np.mean(counts),\n", " 'std_per_sequence': np.std(counts),\n", " 'sequences_with_motif': sum(1 for c in counts if c > 0)\n", " } for motif, counts in all_motif_counts.items()\n", " },\n", " 'individual_analyses': sequence_analyses[:10] # Return first 10 for inspection\n", " }\n", " \n", " return aggregate_results\n", "\n", "# Analyze genome sequences\n", "genome_results = analyze_genome_sequences(num_sequences=500, sequence_length=5000)\n", "\n", "print(f\"\\nGENOME SEQUENCE ANALYSIS COMPLETE\")\n", "print(f\"Sequences analyzed: {genome_results['num_sequences_analyzed']:,}\")\n", "print(f\"Total 
bases: {genome_results['total_bases_analyzed']:,}\")\n", "\n", "print(\"\\nGC Content Statistics:\")\n", "gc_stats = genome_results['gc_content_stats']\n", "print(f\" Mean: {gc_stats['mean']:.3f} ± {gc_stats['std']:.3f}\")\n", "print(f\" Range: {gc_stats['min']:.3f} - {gc_stats['max']:.3f}\")\n", "\n", "print(\"\\nSequence Complexity (Entropy):\")\n", "entropy_stats = genome_results['entropy_stats']\n", "print(f\" Mean: {entropy_stats['mean']:.3f} ± {entropy_stats['std']:.3f}\")\n", "print(f\" Range: {entropy_stats['min']:.3f} - {entropy_stats['max']:.3f}\")\n", "\n", "print(\"\\nMotif Analysis:\")\n", "for motif, stats in genome_results['motif_statistics'].items():\n", " print(f\" {motif}: {stats['total_found']} total, \"\n", " f\"{stats['mean_per_sequence']:.1f}±{stats['std_per_sequence']:.1f} per sequence, \"\n", " f\"{stats['sequences_with_motif']} sequences contain motif\")" ], "id": "cell-14" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced SLURM Features\n", "\n", "### Job Arrays for Parameter Sweeps\n", "\n", "Use SLURM job arrays to efficiently run parameter sweeps:" ], "id": "cell-15" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@cluster(\n", " cores=4,\n", " memory=\"16GB\",\n", " time=\"00:30:00\",\n", " array=\"1-10\" # SLURM job array with 10 tasks\n", ")\n", "def parameter_sweep_simulation(base_params):\n", " \"\"\"\n", " Run simulation with parameter variations using SLURM job arrays.\n", " Each array task will run with different parameters.\n", " \"\"\"\n", " import os\n", " import numpy as np\n", " \n", " # Get SLURM array task ID\n", " task_id = int(os.environ.get('SLURM_ARRAY_TASK_ID', '1'))\n", " \n", " # Define parameter variations\n", " learning_rates = np.logspace(-4, -1, 10) # 10 different learning rates\n", " learning_rate = learning_rates[task_id - 1] # SLURM arrays start from 1\n", " \n", " # Update parameters\n", " params = base_params.copy()\n", " 
params['learning_rate'] = learning_rate\n", " params['task_id'] = task_id\n", " \n", " print(f\"Task {task_id}: Running with learning_rate = {learning_rate:.6f}\")\n", " \n", " # Simulate training process\n", " np.random.seed(task_id * 42) # Reproducible but different per task\n", " \n", " losses = []\n", " current_loss = 10.0 # Starting loss\n", " \n", " for epoch in range(params['epochs']):\n", " # Simulate gradient descent\n", " gradient = np.random.normal(0, 0.1) + 0.1 * current_loss\n", " current_loss -= learning_rate * gradient\n", " current_loss = max(0.01, current_loss) # Prevent negative loss\n", " losses.append(current_loss)\n", " \n", " final_loss = losses[-1]\n", " convergence_epoch = next((i for i, loss in enumerate(losses) if loss < 0.1), len(losses))\n", " \n", " return {\n", " 'task_id': task_id,\n", " 'learning_rate': learning_rate,\n", " 'final_loss': final_loss,\n", " 'convergence_epoch': convergence_epoch,\n", " 'loss_history': losses[::10], # Every 10th loss for brevity\n", " 'converged': final_loss < 0.1\n", " }\n", "\n", "# Run parameter sweep\n", "base_parameters = {\n", " 'epochs': 1000,\n", " 'batch_size': 32,\n", " 'model_size': 'medium'\n", "}\n", "\n", "# This will submit a SLURM job array with 10 tasks\n", "sweep_results = parameter_sweep_simulation(base_parameters)\n", "\n", "print(f\"Parameter sweep completed for task {sweep_results['task_id']}\")\n", "print(f\"Learning rate: {sweep_results['learning_rate']:.6f}\")\n", "print(f\"Final loss: {sweep_results['final_loss']:.4f}\")\n", "print(f\"Converged: {sweep_results['converged']}\")\n", "if sweep_results['converged']:\n", " print(f\"Convergence epoch: {sweep_results['convergence_epoch']}\")" ], "id": "cell-16" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Monitoring and Debugging\n", "\n", "Use Clustrix's built-in monitoring capabilities:" ], "id": "cell-17" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from clustrix 
import ClusterExecutor\n", "\n", "# Get the configured executor\n", "config = clustrix.get_config()\n", "executor = ClusterExecutor(config)\n", "\n", "# Check cluster connectivity\n", "try:\n", " executor.connect()\n", " print(\"✓ Successfully connected to SLURM cluster\")\n", " \n", " # Test basic command execution\n", " stdout, stderr = executor._execute_command(\"sinfo --version\")\n", " print(f\"✓ SLURM version: {stdout.strip()}\")\n", " \n", " # Check available partitions (%P = partition, %A = allocated/idle node counts, %l = time limit)\n", " stdout, stderr = executor._execute_command(\"sinfo -h -o '%P %A %l'\")\n", " print(\"\\nAvailable partitions:\")\n", " for line in stdout.strip().split('\\n')[:5]: # Show first 5 partitions\n", " parts = line.split()\n", " if len(parts) >= 3:\n", " partition, node_counts, timelimit = parts[0], parts[1], parts[2]\n", " print(f\" {partition}: {node_counts} nodes (allocated/idle), time limit: {timelimit}\")\n", " \n", " executor.disconnect()\n", " print(\"\\n✓ Connection test completed successfully\")\n", " \n", "except Exception as e:\n", " print(f\"✗ Connection failed: {e}\")\n", " print(\"Please check your cluster configuration and SSH setup\")" ], "id": "cell-18" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configuration Best Practices\n", "\n", "### 1. 
Environment-Specific Configuration\n", "\n", "Create different configurations for different environments:" ], "id": "cell-19" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Development configuration (smaller resources)\n", "dev_config = {\n", " 'cluster_type': 'slurm',\n", " 'cluster_host': 'dev-cluster.university.edu',\n", " 'username': 'your-username',\n", " 'default_cores': 2,\n", " 'default_memory': '4GB',\n", " 'default_time': '00:15:00',\n", " 'default_partition': 'debug',\n", " 'max_parallel_jobs': 5\n", "}\n", "\n", "# Production configuration (larger resources)\n", "prod_config = {\n", " 'cluster_type': 'slurm',\n", " 'cluster_host': 'hpc-cluster.university.edu',\n", " 'username': 'your-username',\n", " 'default_cores': 16,\n", " 'default_memory': '64GB',\n", " 'default_time': '04:00:00',\n", " 'default_partition': 'normal',\n", " 'max_parallel_jobs': 50\n", "}\n", "\n", "# Choose configuration based on environment\n", "import os\n", "environment = os.environ.get('CLUSTRIX_ENV', 'development')\n", "\n", "if environment == 'production':\n", " clustrix.configure(**prod_config)\n", " print(\"Configured for production environment\")\n", "else:\n", " clustrix.configure(**dev_config)\n", " print(\"Configured for development environment\")" ], "id": "cell-20" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. 
Resource Estimation Guidelines\n", "\n", "Guidelines for choosing appropriate resources:" ], "id": "cell-21" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def estimate_resources(task_type, data_size_mb, complexity='medium'):\n", " \"\"\"\n", " Estimate computational resources needed for different task types.\n", " \"\"\"\n", " \n", " base_configs = {\n", " 'data_processing': {\n", " 'cores': max(2, min(16, data_size_mb // 100)),\n", " 'memory_gb': max(4, min(64, data_size_mb // 10)),\n", " 'time_hours': max(0.5, min(8, data_size_mb / 1000))\n", " },\n", " 'machine_learning': {\n", " 'cores': max(4, min(32, data_size_mb // 50)),\n", " 'memory_gb': max(8, min(128, data_size_mb // 5)),\n", " 'time_hours': max(1, min(12, data_size_mb / 500))\n", " },\n", " 'simulation': {\n", " 'cores': max(8, min(64, data_size_mb // 25)),\n", " 'memory_gb': max(16, min(256, data_size_mb // 2)),\n", " 'time_hours': max(2, min(24, data_size_mb / 100))\n", " },\n", " 'bioinformatics': {\n", " 'cores': max(4, min(24, data_size_mb // 20)),\n", " 'memory_gb': max(16, min(128, data_size_mb // 2)),\n", " 'time_hours': max(1, min(16, data_size_mb / 200))\n", " }\n", " }\n", " \n", " if task_type not in base_configs:\n", " raise ValueError(f\"Unknown task type: {task_type}\")\n", " \n", " config = base_configs[task_type].copy()\n", " \n", " # Adjust for complexity\n", " complexity_multipliers = {\n", " 'low': 0.7,\n", " 'medium': 1.0,\n", " 'high': 1.5,\n", " 'very_high': 2.0\n", " }\n", " \n", " multiplier = complexity_multipliers.get(complexity, 1.0)\n", " \n", " config['cores'] = int(config['cores'] * multiplier)\n", " config['memory_gb'] = int(config['memory_gb'] * multiplier)\n", " config['time_hours'] = config['time_hours'] * multiplier\n", " \n", " # Format time as HH:MM:SS\n", " hours = int(config['time_hours'])\n", " minutes = int((config['time_hours'] - hours) * 60)\n", " config['time_formatted'] = f\"{hours:02d}:{minutes:02d}:00\"\n", 
" \n", " return config\n", "\n", "# Example usage\n", "examples = [\n", " ('machine_learning', 1000, 'high'),\n", " ('data_processing', 5000, 'medium'),\n", " ('simulation', 100, 'very_high'),\n", " ('bioinformatics', 2000, 'high')\n", "]\n", "\n", "print(\"Resource Estimation Examples:\")\n", "print(\"=\" * 80)\n", "\n", "for task_type, data_size, complexity in examples:\n", " resources = estimate_resources(task_type, data_size, complexity)\n", " print(f\"\\n{task_type.replace('_', ' ').title()} ({data_size} MB, {complexity} complexity):\")\n", " print(f\" Cores: {resources['cores']}\")\n", " print(f\" Memory: {resources['memory_gb']} GB\")\n", " print(f\" Time: {resources['time_formatted']} ({resources['time_hours']:.1f} hours)\")" ], "id": "cell-22" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "This tutorial covered:\n", "\n", "1. **Basic SLURM Configuration** - Setting up Clustrix for SLURM clusters\n", "2. **Simple Computations** - Monte Carlo methods and mathematical functions\n", "3. **Machine Learning** - Training models with GPU support\n", "4. **Parallel Processing** - Automatic loop distribution across cores\n", "5. **Scientific Computing** - High-precision numerical integration\n", "6. **Bioinformatics** - Genome sequence analysis\n", "7. **Advanced Features** - Job arrays and parameter sweeps\n", "8. **Monitoring** - Connection testing and debugging\n", "9. 
**Best Practices** - Resource estimation and configuration management\n", "\n", "### Key Takeaways:\n", "\n", "- **Resource Planning**: Always estimate resources based on your data size and complexity\n", "- **Partition Selection**: Choose appropriate SLURM partitions for your workload\n", "- **Time Limits**: Set realistic time limits with some buffer for completion\n", "- **Memory Management**: Monitor memory usage and adjust accordingly\n", "- **Parallel Efficiency**: Use automatic parallelization for loop-heavy computations\n", "- **Error Handling**: Always test connectivity and handle failures gracefully\n", "\n", "### Next Steps:\n", "\n", "- Check out the [PBS Tutorial](pbs_tutorial.ipynb) for Torque/PBS clusters\n", "- Explore [Kubernetes Tutorial](kubernetes_tutorial.ipynb) for containerized computing\n", "- Review the [SSH Setup Guide](../ssh_setup.rst) for secure authentication\n", "- Read the [API Documentation](../api/decorator.rst) for advanced decorator options\n", "\n", "For more information, visit the [Clustrix Documentation](https://clustrix.readthedocs.io)." ], "id": "cell-23" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 4 }