Pipeline overview
Layers in detail
Models
Every supported MLIP is wrapped as an ASE `Calculator` subclass and registered in `mlip_arena/models/registry.yaml`. At import time, `mlip_arena/models/__init__.py` reads the registry, dynamically imports each model class, and builds `MLIPEnum` — a Python Enum where each member’s value is its calculator class.
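The Enum-of-calculator-classes pattern can be sketched with Python’s functional `Enum` API. The calculator classes below are hypothetical stand-ins; the real members come from the YAML registry.

```python
from enum import Enum

# Hypothetical stand-in calculator classes; in MLIP Arena these would be the
# dynamically imported ASE Calculator subclasses.
class MACECalc: ...
class CHGNetCalc: ...

# The functional Enum API maps member names to values — here, calculator
# classes, mirroring how MLIPEnum is assembled at import time.
MLIPDemo = Enum("MLIPDemo", {"MACE": MACECalc, "CHGNet": CHGNetCalc})

calc_cls = MLIPDemo["MACE"].value  # look up a model's calculator class by name
```

Because members carry the class itself, benchmark code can iterate over the Enum and instantiate each calculator uniformly.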
Models fall into two categories:

- External ASE calculators — implemented under `mlip_arena/models/externals/`. These wrap third-party packages (e.g., `mace-torch`, `chgnet`, `matgl`) and expose an ASE `Calculator` interface.
- HuggingFace models — inherit `MLIP` (which extends `nn.Module` and `PyTorchModelHubMixin`), enabling checkpoint upload and download via the Hub.
Tasks
A task is one operation on one input structure, decorated with Prefect’s `@task`. Each task:

- Accepts an `atoms: Atoms` object and a `calculator: BaseCalculator`.
- Uses the `TASK_SOURCE + INPUTS` cache policy so identical work is not repeated.
- Returns a dictionary of results (relaxed structure, energies, trajectory data, etc.).
For example, EOS internally calls OPT for a full relaxation, followed by a series of constrained OPT tasks at different volumes.
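The `TASK_SOURCE + INPUTS` caching behavior can be approximated in plain Python. This is a simplified sketch, not Prefect’s implementation: the task’s bytecode stands in for its source hash, and inputs are serialized to form the rest of the key.

```python
import json

_cache = {}

def cached_task(fn):
    """Simplified sketch of a TASK_SOURCE + INPUTS cache policy: the key
    combines the task's identity (stand-in: name plus bytecode) with a
    stable serialization of its keyword inputs."""
    def wrapper(**inputs):
        key = (fn.__name__, fn.__code__.co_code, json.dumps(inputs, sort_keys=True))
        if key not in _cache:
            _cache[key] = fn(**inputs)   # run only on a cache miss
        return _cache[key]
    return wrapper

calls = []

@cached_task
def relax(symbol, fmax):
    calls.append(symbol)                 # side effect reveals hits vs. misses
    return {"symbol": symbol, "fmax": fmax}

relax(symbol="Cu", fmax=0.05)
relax(symbol="Cu", fmax=0.05)            # identical inputs: served from cache
relax(symbol="Au", fmax=0.05)            # new inputs: runs again
```

Re-running a benchmark with unchanged task code and inputs therefore skips completed work, which is exactly what makes restarting long HPC runs cheap.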
Flows
A flow wraps multiple task calls under a Prefect `@flow` and uses `.submit()` to dispatch them concurrently to workers. Flows are what you run in production, on an HPC cluster or locally with a Prefect agent.
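The `.submit()` fan-out resembles submitting futures to an executor. Below is a stdlib analogy (hypothetical helper names, not Prefect’s API): each (model, structure) pair becomes an independent unit of work, and results are gathered at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(models, structures, run_one):
    """Fan out one task per (model, structure) pair, then gather results —
    a stdlib sketch of what a Prefect flow does with task.submit()."""
    with ThreadPoolExecutor() as pool:
        futures = {
            (m, s): pool.submit(run_one, m, s)
            for m in models
            for s in structures
        }
        return {key: f.result() for key, f in futures.items()}

# Toy stand-in for a real task: just label the pair.
results = run_benchmark(
    ["mace", "chgnet"], ["Cu-fcc", "NaCl"],
    lambda model, structure: f"{model}:{structure}",
)
```

In the real flows, Prefect additionally tracks each submitted task’s state and applies the cache policy per task run.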
Benchmarks
Benchmarks are Python scripts (or Jupyter notebooks) under `benchmarks/` that build a flow over all `MLIPEnum` members, collect results, and upload them to the `atomind/mlip-arena` HuggingFace dataset repository.
Leaderboard
`serve/app.py` is a Streamlit application hosted on Hugging Face Spaces. It reads result data from the dataset repository and renders an interactive benchmark page for each task registered in `mlip_arena/tasks/registry.yaml`.
Prefect workflow orchestration
MLIP Arena uses Prefect as its workflow engine. Prefect provides:

- Task caching — results are cached by the `TASK_SOURCE + INPUTS` policy; re-running a benchmark skips already-completed calculations.
- Parallel execution — `.submit()` dispatches tasks to a Prefect worker pool, enabling concurrent execution across models and structures.
- HPC integration — `dask_jobqueue` integrates with SLURM, PBS, and other schedulers for cluster-scale parallelism.
- Observability — the Prefect UI tracks task states, logs, and failure reasons for every benchmark run.
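A typical way to wire Prefect to a SLURM scheduler is through `prefect-dask` and `dask-jobqueue`. This is a configuration fragment only (it requires those packages and a SLURM environment to run), and the partition name and resource values here are assumptions, not the project’s actual settings.

```python
# Configuration sketch: run a flow's tasks on SLURM via dask_jobqueue.
from prefect_dask import DaskTaskRunner

runner = DaskTaskRunner(
    cluster_class="dask_jobqueue.SLURMCluster",
    cluster_kwargs={
        "queue": "regular",       # SLURM partition (assumed name)
        "cores": 8,               # cores per job
        "memory": "32GB",         # memory per job
        "walltime": "02:00:00",   # job time limit
    },
)
# The runner is then attached to a flow: @flow(task_runner=runner)
```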
HuggingFace integration
MLIP Arena uses three HuggingFace surfaces:

| Surface | Purpose | Key operation |
|---|---|---|
| Model repos | Store pretrained MLIP checkpoints | `MLIP.from_pretrained(repo_id)` |
| Dataset repo (`atomind/mlip-arena`) | Store benchmark results as JSON | `HfApi.upload_file()` |
| Spaces (`atomind/mlip-arena`) | Host the Streamlit leaderboard | `streamlit run serve/app.py` |
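The dataset-repo upload can be sketched with `huggingface_hub`. This fragment is not runnable without the library and write credentials, and the filenames below are hypothetical, not the benchmarks’ actual output paths.

```python
# Sketch: push one benchmark's JSON results to the dataset repository.
from huggingface_hub import HfApi

api = HfApi()
api.upload_file(
    path_or_fileobj="eos_results.json",    # local results file (hypothetical)
    path_in_repo="eos/eos_results.json",   # destination path (hypothetical)
    repo_id="atomind/mlip-arena",
    repo_type="dataset",
)
```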
ASE Calculator abstraction
All models expose a unified interface through ASE’s `Calculator` base class. This means any task written against `BaseCalculator` works with any registered model without modification.
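The value of that shared interface is plain polymorphism: a task calls the same methods regardless of which model backs them. The stub classes below are hypothetical illustrations (real calculators implement the full ASE `Calculator` protocol), but they show why one task body serves every model.

```python
# Hypothetical stubs standing in for two registered calculators; both answer
# the same call, so the task below never needs to know which model it got.
class StubEMT:
    def get_potential_energy(self, atoms):
        return -3.5

class StubMACE:
    def get_potential_energy(self, atoms):
        return -3.7

def energy_task(atoms, calculator):
    # Written once against the shared interface; works for any calculator.
    return {"energy": calculator.get_potential_energy(atoms)}

energies = [energy_task("Cu-fcc", c)["energy"] for c in (StubEMT(), StubMACE())]
```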
Registry pattern
Both models and tasks use a YAML registry as a single source of truth for metadata.

- Model registry — `mlip_arena/models/registry.yaml` stores per-model metadata: Python module path, class name, model family, training datasets, supported tasks, prediction types, and license. At import time, `mlip_arena/models/__init__.py` reads this file and imports each class.
- Task registry — `mlip_arena/tasks/registry.yaml` stores per-task metadata and drives the leaderboard pages.

Adding a new model or benchmark does not require changing Python code in the core library — only the relevant YAML registry needs updating.
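The registry-driven import can be sketched with `importlib` and the functional `Enum` API. To keep the sketch self-contained, the registry dict below uses stdlib classes as stand-ins for calculator classes; in the real library the dict comes from parsing `registry.yaml`.

```python
from enum import Enum
from importlib import import_module

# Stand-in for the parsed registry.yaml (hypothetical entries using stdlib
# classes); real entries map model names to module paths and class names.
REGISTRY = {
    "Counter": {"module": "collections", "class": "Counter"},
    "OrderedDict": {"module": "collections", "class": "OrderedDict"},
}

def build_enum(registry, enum_name="DemoEnum"):
    """Dynamically import each registered class and collect them into an Enum."""
    members = {}
    for name, meta in registry.items():
        module = import_module(meta["module"])          # import by module path
        members[name] = getattr(module, meta["class"])  # resolve the class
    return Enum(enum_name, members)

DemoEnum = build_enum(REGISTRY)
```

Because the Enum is rebuilt from the YAML on every import, registering a new model is purely a data change: add a registry entry pointing at the new class, and it appears as an Enum member.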