NeurIPS 2025 Spotlight: MLIP Arena has been accepted as a Spotlight at the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025).
Why MLIP Arena?
Beyond Error Metrics
Move past static DFT reference comparisons. MLIP Arena reveals failure modes in real-world physical tasks like MD stability and combustion.
Fair Benchmarking
Reproducible, leakage-free benchmarks designed to be agnostic to model architecture and training dataset.
15+ Foundation Models
Unified interface for MACE-MP, CHGNet, M3GNet, SevenNet, ORBv2, eqV2, eSEN, MatterSim, ALIGNN, ANI2x, and more.
HPC-Scale Workflows
Prefect-powered orchestration for parallel benchmark execution on high-throughput computing clusters.
Key Capabilities
Modular Tasks
OPT, EOS, MD, PHONON, NEB, ELASTICITY — composable and reusable across benchmarks.
Physical Soundness Tests
Homonuclear diatomics, energy conservation, force equivariance, equation of state.
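To illustrate what a force-equivariance check involves, here is a minimal sketch in plain Python. It uses a toy Lennard-Jones potential as a stand-in for a real model: `lj_energy_forces`, `rotate_z`, and `equivariance_error` are hypothetical names invented for this example, not part of the MLIP Arena API, and the geometry and rotation angle are arbitrary. The check is that rotating the atoms leaves the energy unchanged and rotates the forces by the same rotation.

```python
import math


def lj_energy_forces(positions, epsilon=1.0, sigma=1.0):
    """Toy Lennard-Jones energy and forces (hypothetical stand-in for an MLIP)."""
    n = len(positions)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # (-dE/dr) / r, so the force on atom i is coeff * d
            coeff = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += coeff * d[k]
                forces[j][k] -= coeff * d[k]
    return energy, forces


def rotate_z(vec, angle):
    """Rotate a 3-vector about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = vec
    return [c * x - s * y, s * x + c * y, z]


def equivariance_error(positions, angle):
    """Return (energy change, worst force component error) under rotation."""
    e0, f0 = lj_energy_forces(positions)
    rotated = [rotate_z(p, angle) for p in positions]
    e1, f1 = lj_energy_forces(rotated)
    # Forces of the rotated structure should equal the rotated original forces.
    expected = [rotate_z(f, angle) for f in f0]
    force_err = max(
        abs(a - b) for fa, fb in zip(f1, expected) for a, b in zip(fa, fb)
    )
    return abs(e1 - e0), force_err


# Arbitrary three-atom geometry and rotation angle, chosen for illustration.
positions = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.3, 1.2, 0.4]]
energy_err, force_err = equivariance_error(positions, 0.7)
print(f"energy change: {energy_err:.2e}, worst force error: {force_err:.2e}")
```

For an analytic pair potential both errors should sit at floating-point noise. With a learned model, the same comparison could be run over several rotations and structures, using a tolerance suited to the model's numerical precision.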
Live Leaderboard
Real-time benchmark results visualized on Hugging Face Spaces with interactive Streamlit dashboards.
Benchmark Suite
MLIP Arena evaluates models across two main categories.
Fundamentals (tests of basic physical correctness):
- Homonuclear Diatomics: dissociation energy curves for elemental pairs
- Equation of State: energy-volume relationships for bulk crystals
- Energy-Volume Scans: WBM dataset energy-volume profiles
- MD Stability: long-timescale NVT/NPT simulation stability
- Combustion: reactive molecular dynamics for combustion reactions
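As a rough picture of what a homonuclear diatomic scan can look for, the sketch below samples an energy curve over bond length and runs a few simple soundness checks: a single energy minimum, repulsion at short range, and forces that agree with the negative derivative of the energy. A Morse curve with made-up parameters stands in for a model's prediction. All function names and thresholds are assumptions for this example; the benchmark's actual metrics are defined in the MLIP Arena paper and code.

```python
import math


def morse_energy(r, depth=4.5, width=1.9, r_eq=1.2):
    """Toy Morse curve standing in for a model's energy (hypothetical parameters)."""
    x = math.exp(-width * (r - r_eq))
    return depth * (x * x - 2.0 * x)


def morse_force(r, depth=4.5, width=1.9, r_eq=1.2):
    """Analytic radial force, -dE/dr, for the same toy curve."""
    x = math.exp(-width * (r - r_eq))
    return 2.0 * depth * width * (x * x - x)


def scan_curve(r_min=0.6, r_max=6.0, n_points=541):
    """Sample energy and force on a uniform grid of bond lengths."""
    step = (r_max - r_min) / (n_points - 1)
    radii = [r_min + i * step for i in range(n_points)]
    energies = [morse_energy(r) for r in radii]
    forces = [morse_force(r) for r in radii]
    return radii, energies, forces


def count_local_minima(energies):
    """Count interior grid points that lie below both neighbours."""
    return sum(
        1
        for i in range(1, len(energies) - 1)
        if energies[i] < energies[i - 1] and energies[i] < energies[i + 1]
    )


def max_force_mismatch(radii, energies, forces):
    """Largest gap between the force and a central finite difference of the energy."""
    worst = 0.0
    for i in range(1, len(radii) - 1):
        fd = -(energies[i + 1] - energies[i - 1]) / (radii[i + 1] - radii[i - 1])
        worst = max(worst, abs(fd - forces[i]))
    return worst


radii, energies, forces = scan_curve()
n_minima = count_local_minima(energies)
r_eq_found = radii[energies.index(min(energies))]
mismatch = max_force_mismatch(radii, energies, forces)

print(f"local minima: {n_minima}")
print(f"equilibrium bond length: {r_eq_found:.2f}")
print(f"repulsive at short range: {forces[0] > 0}")
print(f"max force/energy mismatch: {mismatch:.3f}")
```

A curve with extra minima, an attractive force at very short separation, or forces that disagree with the energy slope would point to the kind of unphysical behaviour that static error metrics can miss.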
Supported Models
| Model | Family | Training Data | Predictions |
|---|---|---|---|
| MACE-MP(M) | MACE | MPTrj | EFS |
| MACE-MPA | MACE | MPTrj, Alexandria | EFS |
| CHGNet | CHGNet | MPTrj | EFSM |
| M3GNet | MatGL | MPF | EFS |
| MatterSim | MatterSim | MPTrj, Alexandria | EFS |
| ORBv2 | ORB | MPTrj, Alexandria | EFS |
| SevenNet | SevenNet | MPTrj | EFS |
| eqV2(OMat) | FairChem | OMat, MPTrj, Alexandria | EFS |
| eSEN | FairChem | OMat, MPTrj, Alexandria | EFS |
| ANI2x | ANI | COMP6 | EFS |
| ALIGNN | ALIGNN | MP22 | EFS |
| DeepMD | DeepMD | MPTrj | EFS |
Predictions key: E = energy, F = forces, S = stress, M = magnetic moments.
Quick Start
Installation
Install from PyPI or build from source with all model dependencies.
Quickstart
Run your first benchmark in minutes.