Clustrix Documentation
Clustrix is a Python package that enables seamless distributed computing on clusters. With a simple decorator, you can execute any Python function remotely on cluster resources while automatically handling dependency management, environment setup, and result collection.
Features
Simple Decorator Interface: Just add @cluster to any function
Advanced Function Packaging: AST-based dependency analysis avoids the limitations of pickle
Interactive Jupyter Widget: %%clusterfy magic command with GUI configuration manager
Multiple Cluster Support: SLURM, PBS, SGE, Kubernetes, and SSH
Unified Filesystem Utilities: Work with files seamlessly across local and remote clusters
Shared Storage Optimization: Automatic detection and optimization for HPC shared filesystems
Native Cost Monitoring: Built-in cost tracking for AWS, GCP, Azure, and Lambda Cloud
Automatic Dependency Management: Captures and replicates your exact Python environment
Loop Parallelization: Automatically distributes loops across cluster nodes
Local Parallelization: Multi-core execution for development and testing
Flexible Configuration: Easy setup with config files, environment variables, or interactive widget
Error Handling: Comprehensive error reporting and job monitoring
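To make the "AST-based dependency analysis" feature more concrete, here is a minimal sketch of the general idea: parse a function's source and collect the modules it imports. It uses only the standard-library `ast` module. The helper name `find_imports` is hypothetical, and this is not Clustrix's actual analyzer, which likely tracks much more (local function calls, file references, and so on).

```python
import ast
import textwrap


def find_imports(source):
    """Return the set of top-level module names imported anywhere in `source`.

    Hypothetical helper for illustration; not part of Clustrix.
    """
    tree = ast.parse(textwrap.dedent(source))
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import a.b.c` depends on the top-level package `a`
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a.b import c` also depends on `a`; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return modules


example = '''
def expensive_computation(data):
    import numpy as np
    from os.path import join
    return np.sum(data)
'''

print(sorted(find_imports(example)))  # ['numpy', 'os']
```

Working from source text rather than from a pickled function object is what lets this kind of analysis see imports that only happen inside the function body.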
Quick Start
Installation
pip install clustrix
Basic Usage
import clustrix

# Configure your cluster
clustrix.configure(
    cluster_type='slurm',
    cluster_host='your-cluster.example.com',
    username='your-username',
    default_cores=4,
    default_memory='8GB'
)

# Decorate your function
@clustrix.cluster(cores=8, memory='16GB', time='02:00:00')
def expensive_computation(data, iterations=1000):
    # Import inside the function so the import also runs on the remote node
    import numpy as np
    # Convert to an array: a plain list does not support element-wise **
    data = np.asarray(data)
    result = 0
    for i in range(iterations):
        result += np.sum(data ** 2)
    return result

# Execute on cluster
data = [1, 2, 3, 4, 5]
result = expensive_computation(data, iterations=10000)
print(f"Result: {result}")
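For readers unfamiliar with the pattern, the following toy sketch shows how a parameterized decorator can record resource requests and hand the call to an executor. Everything here is hypothetical: `toy_cluster` is not a Clustrix function, and a local worker thread stands in for the remote node. A real backend would package the function, submit a scheduler job, and fetch the result.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def toy_cluster(cores=1, memory="1GB", time="00:10:00"):
    """Hypothetical stand-in for a remote-execution decorator (not Clustrix)."""
    resources = {"cores": cores, "memory": memory, "time": time}

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A worker thread plays the role of the remote node here.
            with ThreadPoolExecutor(max_workers=1) as pool:
                future = pool.submit(func, *args, **kwargs)
                return future.result()  # block until the "job" finishes

        # Keep the resource request attached to the function for inspection
        wrapper.resources = resources
        return wrapper

    return decorator


@toy_cluster(cores=8, memory="16GB", time="02:00:00")
def sum_of_squares(values):
    return sum(v * v for v in values)


print(sum_of_squares([1, 2, 3, 4, 5]))  # 55
print(sum_of_squares.resources["cores"])  # 8
```

The caller's code is unchanged apart from the decorator line, which is the property the basic usage example above relies on.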
Jupyter Notebook Integration
For Jupyter notebook users, Clustrix provides an interactive configuration widget:
import clustrix # Auto-loads the magic command and displays widget
%%clusterfy
# Interactive widget appears with:
# - Dropdown to select configurations
# - Forms to create/edit cluster setups
# - One-click configuration application
# - Save/load configurations to files
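As a rough illustration of what saving and loading named configurations to a file can look like, here is a minimal sketch using JSON. The file layout, the helper names `save_configs` and `load_configs`, and the `'local'` cluster type value are assumptions made for this example; the field names are borrowed from the `clustrix.configure` call shown earlier. The widget's real file format may differ.

```python
import json
import tempfile
from pathlib import Path


def save_configs(path, configs):
    """Write a mapping of configuration name -> settings as JSON (hypothetical helper)."""
    Path(path).write_text(json.dumps(configs, indent=2))


def load_configs(path):
    """Read the mapping back from disk (hypothetical helper)."""
    return json.loads(Path(path).read_text())


configs = {
    # 'local' as a cluster_type value is an assumption for this sketch
    "Local Single-core": {"cluster_type": "local", "default_cores": 1},
    "My SLURM cluster": {
        "cluster_type": "slurm",
        "cluster_host": "your-cluster.example.com",
        "username": "your-username",
        "default_cores": 4,
        "default_memory": "8GB",
    },
}

with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "clusters.json"
    save_configs(config_file, configs)
    restored = load_configs(config_file)

print(restored == configs)  # True
```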
Interactive Configuration Widget
The Clustrix widget provides a comprehensive GUI for managing cluster configurations directly in Jupyter notebooks.
Default View
When you first import clustrix or use the %%clusterfy magic command, the widget displays with a default "Local Single-core" configuration.
Configuration Templates
The dropdown menu includes pre-built templates for various cluster types and cloud providers.
HPC Cluster Configuration
For traditional HPC clusters like SLURM, the widget provides all essential configuration fields.
The advanced settings accordion reveals additional options for modules, environment variables, and custom commands.
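To show how settings such as cores, memory, time limit, modules, environment variables, and custom commands typically end up in a SLURM job, here is a minimal sketch that renders a batch script. The function `render_slurm_script` and its parameters are hypothetical, and the output is not necessarily what Clustrix generates. The `#SBATCH` options used (`--cpus-per-task`, `--mem`, `--time`) are standard `sbatch` options.

```python
import shlex


def render_slurm_script(command, cores, memory, time,
                        modules=(), env=None, pre_commands=()):
    """Render a SLURM batch script as a string (hypothetical helper, not Clustrix)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --cpus-per-task={cores}",
        f"#SBATCH --mem={memory}",   # e.g. "16G"
        f"#SBATCH --time={time}",    # HH:MM:SS
    ]
    # Environment modules are loaded before anything else runs
    lines += [f"module load {name}" for name in modules]
    # Quote values so spaces or shell metacharacters survive intact
    lines += [f"export {key}={shlex.quote(value)}" for key, value in (env or {}).items()]
    # Custom setup commands run just before the main command
    lines += list(pre_commands)
    lines.append(command)
    return "\n".join(lines) + "\n"


script = render_slurm_script(
    command="python run_job.py",
    cores=8,
    memory="16G",
    time="02:00:00",
    modules=["python/3.11"],
    env={"OMP_NUM_THREADS": "8"},
    pre_commands=["source ~/venv/bin/activate"],
)
print(script)
```

The module name, virtual environment path, and `run_job.py` are placeholders; substitute whatever your cluster provides.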
Cloud Provider Support
For cloud providers, the widget shows only the fields relevant to the selected provider, such as Google Cloud Platform or Lambda Cloud GPU instances.
The widget includes templates for AWS, Google Cloud, Azure, SLURM, Kubernetes, Lambda Cloud, and HuggingFace Spaces.
Table of Contents
User Guide
Tutorials
Interactive Notebooks
- Filesystem Utilities Tutorial
- Clustrix Configuration Manager Example
- Complete Clustrix API Demonstration
- SLURM Cluster Tutorial
- Prerequisites
- Installation and Setup
- Basic SLURM Configuration
- Example 1: Simple Mathematical Computation
- Example 2: Machine Learning Model Training
- Example 3: Parallel Data Processing with Automatic Loop Distribution
- Example 4: Scientific Computing - Numerical Integration
- Example 5: Bioinformatics - Sequence Analysis
- Advanced SLURM Features
- Monitoring and Debugging
- Configuration Best Practices
- Summary
- PBS/Torque Cluster Tutorial
- Prerequisites
- Installation and Setup
- PBS Cluster Configuration
- Example 1: Bioinformatics - DNA Sequence Analysis
- Example 2: Materials Science - Molecular Dynamics Simulation
- Example 3: Environmental Science - Climate Data Analysis
- PBS Queue Management and Resource Selection
- PBS Job Arrays for Parameter Studies
- Monitoring PBS Jobs
- PBS Configuration Best Practices
- Summary
- SGE (Sun Grid Engine) Tutorial
- Kubernetes Tutorial
- Prerequisites
- Kubernetes Configuration
- Example 1: Containerized Machine Learning
- Example 2: Distributed Data Processing
- Example 3: Fault-Tolerant Scientific Computing
- Kubernetes Resource Management and Best Practices
- Clustrix Kubernetes Configuration Examples
- Kubernetes Job Patterns
- Kubernetes Resource Management Guidelines
- Kubernetes Cluster Monitoring
- Summary
- SSH Remote Execution Tutorial
- New: Automated SSH Key Setup
- Prerequisites
- Step 1: Automated SSH Key Setup
- Step 2: Configure Clustrix
- Example 1: Basic Remote Computation
- Example 2: Remote Data Processing with NumPy
- Example 3: Remote File System Analysis
- Example 4: Remote Environment Testing
- SSH Connection Testing and Troubleshooting
- Summary and Best Practices
- Clustrix Basic Usage Tutorial
Cloud Platform Tutorials
- Amazon Web Services (AWS) Cloud Tutorial
- Overview
- Prerequisites
- Complete AWS Setup Guide
- Installation and Setup
- AWS Credentials Configuration
- Method 1: Direct EC2 Instance Configuration
- Method 2: AWS Batch Configuration
- Method 3: AWS ParallelCluster Integration
- Data Management with S3
- Security Best Practices
- Cost Optimization
- Resource Cleanup
- Advanced Example: Distributed Machine Learning
- Summary
- Microsoft Azure Cloud Tutorial
- Overview
- Prerequisites
- Step-by-Step Setup Guide
- Step 4: Azure Authentication Setup
- Method 1: Azure Virtual Machines Configuration
- Method 2: Azure Batch Configuration
- Method 3: Azure CycleCloud Integration
- Data Management with Azure Blob Storage
- Azure Machine Learning Compute Integration
- Security Best Practices
- Cost Management and Optimization
- Resource Cleanup
- Advanced Example: Distributed Image Processing
- Summary
- Google Cloud Platform (GCP) Tutorial
- Overview
- Complete Setup Guide from Scratch
- Prerequisites Checklist
- Installation and Setup
- GCP Authentication Setup
- Method 1: Google Compute Engine Configuration
- Method 2: Google Kubernetes Engine (GKE) Configuration
- Method 3: Google Cloud Batch
- Data Management with Google Cloud Storage
- Vertex AI Integration
- Security Best Practices
- Resource Cleanup
- Advanced Example: Distributed Scientific Computing
- Summary
- HuggingFace Spaces Tutorial
- Lambda Cloud Tutorial
- Overview
- Prerequisites
- Installation and Setup
- Lambda Cloud Authentication and Setup
- Configure Clustrix for Lambda Cloud
- Example 1: Distributed Deep Learning Training
- Example 2: Transformer Model Fine-tuning
- Example 3: Computer Vision with Large Datasets
- Multi-GPU Training on Lambda Cloud
- Cost Optimization Strategies
- Lambda Cloud Cost Optimization
- Best Practices and Troubleshooting
- Lambda Cloud Best Practices
- Instance Management and Cleanup
- Summary
- Cloud Cost Monitoring and Optimization
API Reference
- Decorator API
- Filesystem Utilities
- FileInfo
- DiskUsage
- ClusterFilesystem
- cluster_ls()
- cluster_find()
- cluster_stat()
- cluster_exists()
- cluster_isdir()
- cluster_isfile()
- cluster_glob()
- cluster_du()
- cluster_count_files()
- Overview
- Key Features
- Core Functions
- Data Classes
- Core Implementation
- Usage Examples
- Error Handling
- Best Practices
- See Also
- Dependency Analysis
- FilesystemCall
- ImportInfo
- LocalFunctionCall
- FileReference
- DependencyGraph
- DependencyAnalyzer
- LoopAnalyzer
- analyze_function_dependencies()
- analyze_function_loops()
- Overview
- Key Features
- Core Components
- Data Structures
- Convenience Functions
- Usage Examples
- Error Handling
- Best Practices
- Integration with Packaging
- Limitations
- See Also
- File Packaging System
- PackageInfo
- ExecutionContext
- FilePackager
- create_execution_context()
- package_function_for_execution()
- Overview
- Key Features
- Architecture
- Core Components
- Usage Examples
- Error Handling
- Configuration and Options
- Performance Considerations
- Best Practices
- Integration with @cluster Decorator
- Troubleshooting
- See Also
- Configuration API
- Notebook Magic Commands
- Cost Monitoring
- Local Executor API
Supported Cluster Types
Traditional HPC Schedulers
| Cluster Type | Status | Notes |
|---|---|---|
| SLURM | Full Support | Production ready |
| PBS/Torque | Full Support | Production ready |
| SGE | Full Support | Production ready |
| SSH | Full Support | Direct execution |
Container Orchestration
| Platform | Status | Notes |
|---|---|---|
| Kubernetes | Full Support | Native K8s API with auto-deps |
| AWS EKS | Full Support | Kubernetes + AWS integration |
| Azure AKS | Full Support | Kubernetes + Azure integration |
| Google GKE | Full Support | Kubernetes + GCP integration |
Cloud Computing Platforms
| Platform | Status | Notes |
|---|---|---|
| AWS EC2 | Full Support | Auto-provisioning + cost monitor |
| Azure VMs | Full Support | Auto-provisioning + cost monitor |
| Google Cloud | Full Support | Auto-provisioning + cost monitor |
| Lambda Cloud | Full Support | GPU-optimized instances |
| HF Spaces | Full Support | Hugging Face Spaces integration |
Links
GitHub Repository: https://github.com/ContextLab/clustrix
PyPI Package: https://pypi.org/project/clustrix/
Issue Tracker: https://github.com/ContextLab/clustrix/issues
Discussions: https://github.com/ContextLab/clustrix/discussions