Physical motivation

The energy-volume (E-V) scan benchmark tests whether a model produces physically consistent energy landscapes across a broad sweep of materials chemistry, using structures drawn from the WBM thermodynamic stability screening dataset. Unlike the Equation of State benchmark, it applies uniaxial strain without relaxing the structure first. It therefore probes the model’s raw energy surface rather than an optimized EOS fit, and is a stricter test of transferability: a model cannot rely on relaxing away an unfavorable starting geometry.

Key questions this benchmark answers:
  • Does the model produce a smooth, bowl-shaped energy well around the reference volume?
  • Does it rank the energies of compressed and expanded structures in the correct order, with energy rising as the structure is strained further from the reference volume?
  • Does the energy profile behave consistently across very different crystal chemistries?

The WBM dataset

The benchmark uses structures from the WBM dataset (named after its authors Wang, Botti, and Marques), stored in benchmarks/wbm_structures.db as an ASE database file. The WBM dataset was constructed by a high-throughput DFT screening workflow that predicts thermodynamic stability relative to the Materials Project convex hull. It contains materials spanning a wide range of compositions and crystal symmetries.
The same wbm_structures.db file is shared between the Energy-Volume Scans benchmark and the Equation of State benchmark. The two benchmarks differ in their strain protocol: EOS uses isotropic strain after relaxation; E-V scans use uniaxial strain on the unrelaxed reference structure.

What is measured

For each WBM structure, the benchmark applies uniaxial strain across a range of ±20% in 21 evenly spaced steps, scaling the unit cell while keeping fractional atomic coordinates fixed. At each strain point, the model evaluates the potential energy. The strain protocol is defined in benchmarks/wbm_ev/run.py:
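import numpy as np
max_abs_strain = 0.2  # scan from -20% to +20% strain
npoints = 21          # odd count, so the grid includes zero strain
energies = []
# cloned and c0 are defined earlier in run.py (presumably a copy of the structure and its reference cell)
for uniaxial_strain in np.linspace(-max_abs_strain, max_abs_strain, npoints):
    scale_factor = uniaxial_strain + 1
    # scale_atoms=True moves the atoms with the cell, keeping fractional coordinates fixed
    cloned.set_cell(c0 * scale_factor, scale_atoms=True)
    energies.append(cloned.get_potential_energy())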
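As a self-contained illustration of the same loop, the sketch below replaces the model calculator with a hypothetical `energy_fn` callable and uses plain NumPy, so it runs without ASE or a trained model. The names `scan_energy_volume` and `toy_energy` are assumptions made for illustration and do not appear in run.py. The sketch also assumes that `c0` is the full 3×3 cell matrix, as the excerpt suggests. Under that assumption, multiplying the cell by a scalar changes the volume by the cube of the scale factor, so the ±20% strain grid would span V/V₀ from roughly 0.51 to 1.73.

```python
import numpy as np

def scan_energy_volume(cell, energy_fn, max_abs_strain=0.2, npoints=21):
    """Scale a reference cell over a strain grid; record volume ratio and energy.

    cell: (3, 3) array of lattice vectors (rows), standing in for c0.
    energy_fn: hypothetical callable mapping a (3, 3) cell to an energy.
               In the benchmark this role is played by the model calculator.
    """
    cell = np.asarray(cell, dtype=float)
    v0 = abs(np.linalg.det(cell))
    strains = np.linspace(-max_abs_strain, max_abs_strain, npoints)
    volume_ratios, energies = [], []
    for strain in strains:
        scaled = cell * (1.0 + strain)  # same scalar applied to every lattice vector
        volume_ratios.append(abs(np.linalg.det(scaled)) / v0)
        energies.append(energy_fn(scaled))
    return strains, np.array(volume_ratios), np.array(energies)

def toy_energy(cell, v_eq=27.0, stiffness=0.01):
    """Toy harmonic well in volume, used only to make the sketch runnable."""
    volume = abs(np.linalg.det(cell))
    return stiffness * (volume - v_eq) ** 2

reference_cell = 3.0 * np.eye(3)  # cubic cell with volume 27 Å^3
strains, volume_ratios, energies = scan_energy_volume(reference_cell, toy_energy)
```

Passing the energy evaluation in as a callable keeps the strain grid logic separate from the model, which makes the grid itself easy to check in isolation.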

Metrics

The leaderboard reports the following per-structure metrics:
| Metric | Description |
| --- | --- |
| volume-ratio (V/V₀) | Strained volume normalized by the reference (unstrained) volume |
| energy-delta-per-atom (ΔE/N) | Energy relative to the minimum, per atom (eV/atom) |
| energy-diff-flip-times | Number of sign changes in dE/dV; measures curve smoothness |
| tortuosity | Total variation of the energy curve divided by its range |
| spearman-compression-energy | Spearman rank correlation between volume and energy under compression |
| spearman-compression-derivative | Spearman rank correlation between volume and dE/dV under compression |
| spearman-tension-energy | Spearman rank correlation between volume and energy under tension |
| missing | Whether the model failed to return a result for this structure |
The y-axis on the leaderboard shows relative energy per atom (eV/atom) normalized to zero at the minimum. This makes curves for different materials directly comparable on the same plot.
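The metric descriptions above can be sketched in a few lines of NumPy. This is an interpretation of the one-line descriptions, not the code in analyze.py: the function names `rank`, `spearman`, and `ev_metrics` are hypothetical, the compression and tension branches are assumed to be split at V/V₀ = 1, volumes are assumed to be sorted in ascending order, and the rank helper does not average ties.

```python
import numpy as np

def rank(values):
    """Ranks 0..n-1 by ascending value (no tie averaging; a simplification)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def ev_metrics(volume_ratio, energy, n_atoms):
    """Per-structure metrics for one E-V curve (volumes assumed ascending)."""
    volume_ratio = np.asarray(volume_ratio, dtype=float)
    energy = np.asarray(energy, dtype=float)

    # Energy relative to the curve minimum, per atom.
    delta_per_atom = (energy - energy.min()) / n_atoms

    # Sign of each finite-difference step; with ascending volumes this is
    # the sign of dE/dV. Exactly flat steps are ignored.
    steps = np.diff(energy)
    signs = np.sign(steps)
    signs = signs[signs != 0]
    flips = int(np.count_nonzero(signs[1:] != signs[:-1]))

    # Total variation divided by the energy range (NaN for a flat curve).
    energy_range = energy.max() - energy.min()
    tortuosity = float(np.abs(steps).sum() / energy_range) if energy_range > 0 else float("nan")

    # Numerical dE/dV on the (non-uniform) volume grid.
    slope = np.gradient(energy, volume_ratio)
    compressed = volume_ratio <= 1.0
    stretched = volume_ratio >= 1.0
    return {
        "energy-delta-per-atom": delta_per_atom,
        "energy-diff-flip-times": flips,
        "tortuosity": tortuosity,
        "spearman-compression-energy": spearman(volume_ratio[compressed], energy[compressed]),
        "spearman-compression-derivative": spearman(volume_ratio[compressed], slope[compressed]),
        "spearman-tension-energy": spearman(volume_ratio[stretched], energy[stretched]),
    }

# Ideal single-well curve on the benchmark's strain grid (4 atoms, toy energies).
strain = np.linspace(-0.2, 0.2, 21)
metrics = ev_metrics((1 + strain) ** 3, 4.0 * strain ** 2, n_atoms=4)
```

Under these definitions, an ideal single-well curve like the toy one above has exactly one flip, an energy correlation of −1 under compression and +1 under tension, and a tortuosity no greater than 2. Noise or spurious extrema push the flip count and tortuosity higher.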

Model support

The following models support this benchmark (gpu-tasks: wbm_ev in the model registry):
| Model | Family | Training data |
| --- | --- | --- |
| MACE-MP(M) | mace-mp | MPTrj |
| MACE-MPA | mace-mp | MPTrj, Alexandria |
| CHGNet | chgnet | MPTrj |
| M3GNet | matgl | MPF |
| MatterSim | mattersim | MPTrj, Alexandria |
| ORBv2 | orb | MPTrj, Alexandria |
| SevenNet | sevennet | MPTrj |
| eqV2(OMat) | fairchem | OMat, MPTrj, Alexandria |
| eSEN | fairchem | OMat, MPTrj, Alexandria |
| ALIGNN | alignn | MP22 |

How to run

1. Configure your cluster

Edit the SLURM settings in benchmarks/wbm_ev/run.py to match your HPC environment. The benchmark uses Dask-JobQueue to dispatch tasks to SLURM workers.
2. Run the scans

python benchmarks/wbm_ev/run.py
Results are saved as Parquet files: benchmarks/wbm_ev/<ModelName>.parquet.
3. Analyze results

python benchmarks/wbm_ev/analyze.py
This generates summary.csv and summary.tex aggregating metrics across all structures per model.
Running the full benchmark across all WBM structures for all models requires significant GPU time. The cluster.adapt(minimum_jobs=25, maximum_jobs=50) setting in run.py assumes a large SLURM cluster. Adjust accordingly for smaller environments.
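For a smaller environment, the cluster setup might look roughly like the sketch below. It is not the code in run.py: the helper names `slurm_settings` and `start_cluster`, the partition name "gpu", and all resource values are placeholders. `SLURMCluster`, its `job_extra_directives` argument (named `job_extra` in older Dask-JobQueue releases), and `cluster.adapt(minimum_jobs=..., maximum_jobs=...)` are part of Dask-JobQueue.

```python
def slurm_settings(queue, cores=1, memory="16GB", walltime="04:00:00", gpus_per_job=1):
    """Keyword arguments for dask_jobqueue.SLURMCluster (all values are placeholders)."""
    return {
        "queue": queue,        # SLURM partition to submit to
        "cores": cores,        # CPU cores per job
        "memory": memory,      # memory per job
        "walltime": walltime,  # wall-clock limit per job
        "job_extra_directives": [f"--gres=gpu:{gpus_per_job}"],  # request GPUs
    }

def start_cluster(settings, minimum_jobs=1, maximum_jobs=4):
    """Create an adaptive SLURM-backed Dask cluster and a client connected to it."""
    # Imported lazily so the settings helper can be used without Dask installed.
    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(**settings)
    # Scale between the two bounds depending on how many tasks are pending.
    cluster.adapt(minimum_jobs=minimum_jobs, maximum_jobs=maximum_jobs)
    return cluster, Client(cluster)

settings = slurm_settings(queue="gpu")  # "gpu" is a hypothetical partition name
```

Calling `start_cluster(settings)` on a login node would then submit between one and four worker jobs. Raising `maximum_jobs` trades queue pressure for throughput.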

Interpreting results

A well-performing model on this benchmark produces E-V profiles that:
  • Have a clear single minimum near V/V₀ = 1 (the reference DFT volume)
  • Are smooth and convex near the minimum, with no energy oscillations
  • Show physically correct asymptotic behavior — energy rises monotonically for large compression and large tension
  • Have a low missing rate — the model handles diverse crystal chemistries without crashing
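The shape criteria in this list can be turned into simple automated checks on a single curve. The sketch below is one possible reading, not part of the benchmark: `check_profile` is a hypothetical helper, and the tolerance on the position of the minimum and the convexity window are arbitrary assumptions.

```python
import numpy as np

def check_profile(volume_ratio, energy, min_tolerance=0.1, window=3):
    """Boolean shape checks for one E-V curve (volumes assumed ascending)."""
    volume_ratio = np.asarray(volume_ratio, dtype=float)
    energy = np.asarray(energy, dtype=float)
    i_min = int(np.argmin(energy))
    steps = np.diff(energy)

    # Finite-difference slopes between neighbouring grid points; a convex
    # curve has slopes that increase from left to right.
    slopes = steps / np.diff(volume_ratio)
    near = slopes[max(0, i_min - window): i_min + window]

    return {
        # Minimum sits close to the reference volume.
        "minimum_near_reference": bool(abs(volume_ratio[i_min] - 1.0) <= min_tolerance),
        # Energy falls monotonically up to the minimum ...
        "falls_toward_minimum": bool(np.all(steps[:i_min] < 0)),
        # ... and rises monotonically after it.
        "rises_after_minimum": bool(np.all(steps[i_min:] > 0)),
        # Slopes increase across the window around the minimum.
        "convex_near_minimum": bool(np.all(np.diff(near) > 0)),
    }

strain = np.linspace(-0.2, 0.2, 21)
volume_ratio = (1 + strain) ** 3

# Smooth toy well versus the same well with an alternating ±0.005 perturbation.
smooth = check_profile(volume_ratio, strain ** 2)
noisy = check_profile(volume_ratio, strain ** 2 + 0.005 * (-1.0) ** np.arange(21))
```

The smooth toy curve passes every check, while the perturbed one fails the monotonicity checks because its steps alternate in sign.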
A high tortuosity score or a high energy-diff-flip-times count indicates that the model’s energy landscape is non-convex or noisy, which correlates with instability in downstream MD simulations.
Compare E-V scan results against Equation of State results for the same model. If a model performs well on EOS (which uses relaxed structures) but poorly on E-V scans (unrelaxed), it may be over-relying on structural relaxation to reach the basin of attraction.