add DPA-ADAPT toolkit for downstream property adaptation #5572
zhaiwenxi wants to merge 160 commits into
for more information, see https://pre-commit.ci
feat: add DeePMD property tools
Add property tools
dpa_tools merge
…t, unify --target-key
…utput parsing
- DPAFineTuner: extract _FrozenSklearnPipeline helper; keep public API unchanged
- MFTFineTuner: defer _read_fitting_net_from_ckpt to first access
- DPATrainer._parse_test_output: single anchored regex per metric, auto-detect format
…perty metrics
- _load_labels: accept str | list[str], stack columns for multi-property
- build_sklearn_head: n_outputs param, wrap RF/Ridge with MultiOutputRegressor
- evaluate: per-property mae/rmse/r2 dict when target_key is a list
- freeze/DPAPredictor: store and load target_key as-is (str or list)
- CLI: --target-key homo,lumo parsed via _maybe_split_list
- 6 new tests covering fit, evaluate, freeze/load round-trip
The old _load_descriptor_model, _validate_type_map, _remap_atom_types, _extract_features_cached, and _extract_features method bodies were left in place alongside the new thin delegators, causing CodeQL 'variable defined multiple times' warnings. Removed the old bodies; kept _extract_features_cached on DPAFineTuner directly so that test patches on DPAFineTuner._extract_features are honoured through the cache wrapper.
… method
- Replace try/except ImportError in _unwrap_multioutput with direct import (sklearn is always available when dpa_tools is loaded)
- Remove _FrozenSklearnPipeline.extract_features_cached (dead code; the caching wrapper lives on DPAFineTuner so test patches work)
The workflow still referenced the deleted deepmd_property_tools/ directory. Updated paths trigger to deepmd/dpa_tools/** and test command to source/tests/dpa_tools/. Added torch to lightweight dependencies.
numpy 2.3+ requires Python>=3.11, but the property_tools_tests workflow runs on Python 3.10. Pin numpy>=1.21,<2.2 to keep the lightweight dependency install working on older Python.
refactor: unify dpa_tools CLI/API and merge deepmd_property_tools
Fix unicode headers in dp test detail output
Signed-off-by: zhaiwenxi <144502730+zhaiwenxi@users.noreply.github.com>
fix: guard _sklearn._device assignment against None
ci: align build wheel workflow with upstream
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff:

| | master | #5572 | +/- |
|---|---|---|---|
| Coverage | 82.14% | 82.14% | -0.01% |
| Files | 900 | 901 | +1 |
| Lines | 104139 | 104139 | |
| Branches | 4471 | 4473 | +2 |
| Hits | 85550 | 85547 | -3 |
| Misses | 17178 | 17181 | +3 |
| Partials | 1411 | 1411 | |
Linking the property-workflow issues that this PR covers under the updated DPA-ADAPT command surface:
Command-name update: this PR implements the workflow as standalone
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
njzjz-bot left a comment
Thanks for putting this together. I think this needs another revision before merge: there are a few correctness issues in the DPA-ADAPT code path, and the example/test material should be trimmed and made portable. I left inline comments on the specific blockers.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
| self._sklearn._model = self._model | ||
| if self._device is not None: | ||
| self._sklearn._device = self._device | ||
| self._sklearn._checkpoint_type_map = self._checkpoint_type_map |
This sync overwrites the pipeline's checkpoint type_map with the parent object's initial []. _FrozenSklearnPipeline.load_descriptor_model() sets self._checkpoint_type_map from the checkpoint, but the parent DPAFineTuner._checkpoint_type_map is never updated, so the next _ensure_sklearn() call clears it again. That disables unsupported-element validation and local-to-checkpoint atom-type remapping; for non-prefix type maps, descriptors can be computed with wrong atom-type indices. Please either sync the loaded value back to the parent or avoid overwriting the pipeline value after it is loaded.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
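One way to read the suggested fix is a two-way sync that never replaces a loaded type map with an empty one. The sketch below uses two hypothetical stand-in classes (`PipelineSketch` and `TunerSketch`), not the PR's actual `_FrozenSklearnPipeline` and `DPAFineTuner`; only the attribute name `_checkpoint_type_map` and the method names are taken from the comment above.

```python
class PipelineSketch:
    """Hypothetical stand-in for the frozen sklearn pipeline."""

    def __init__(self):
        self._checkpoint_type_map = []

    def load_descriptor_model(self, checkpoint_type_map):
        # In the real code this value would be read from the checkpoint.
        self._checkpoint_type_map = list(checkpoint_type_map)


class TunerSketch:
    """Hypothetical stand-in for the parent fine-tuner object."""

    def __init__(self):
        self._checkpoint_type_map = []
        self._sklearn = PipelineSketch()

    def _ensure_sklearn(self):
        if self._checkpoint_type_map:
            # The parent holds a real value: push it down to the pipeline.
            self._sklearn._checkpoint_type_map = self._checkpoint_type_map
        elif self._sklearn._checkpoint_type_map:
            # The pipeline loaded a value: pull it up instead of clearing it.
            self._checkpoint_type_map = self._sklearn._checkpoint_type_map
        return self._sklearn
```

With this shape, calling `_ensure_sklearn()` again after `load_descriptor_model()` leaves the loaded type map in place on both objects.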
| def _per_system_cache_path(system) -> Path: | ||
| """Return the cache path for a single system's descriptors.""" | ||
| fp = _system_fingerprint(system) | ||
| return _cache_dir() / f"{fp}.npy" |
The per-system descriptor cache key only depends on the input system fingerprint, but ensure_per_system_cache() also takes pretrained, model_branch, and pooling. A cache file generated with one checkpoint/branch/pooling will be silently reused for another, which can train/evaluate on stale descriptors. Please include the resolved checkpoint identity/mtime, branch, and pooling in the per-system key; the bulk _cache_key() above should also include model_branch.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
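A minimal sketch of a cache key that covers everything the descriptors depend on, assuming the system fingerprint is already available as a string. The function names `checkpoint_identity` and `per_system_cache_key` are illustrative and not part of the PR.

```python
import hashlib
from pathlib import Path


def checkpoint_identity(pretrained):
    """Describe a checkpoint by resolved path, mtime, and size."""
    path = Path(pretrained).resolve()
    stat = path.stat()
    return f"{path}:{stat.st_mtime_ns}:{stat.st_size}"


def per_system_cache_key(system_fingerprint, checkpoint_id, model_branch, pooling):
    """Hash every input that changes the descriptors, not only the system."""
    parts = [
        str(system_fingerprint),
        str(checkpoint_id),
        str(model_branch),
        str(pooling),
    ]
    # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha256("\0".join(parts).encode("utf-8")).hexdigest()
```

A cache file named after this key is only reused when the system, checkpoint, branch, and pooling all match.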
| stderr=subprocess.STDOUT, | ||
| text=True, | ||
| bufsize=1, | ||
| cwd=self.output_dir, |
Running dp with cwd=self.output_dir breaks the default relative output_dir. Just above, input_json is built as ./dpa_output/mft_input.json (or similar); after changing cwd into ./dpa_output, the command now looks for ./dpa_output/dpa_output/mft_input.json. Relative train/aux paths embedded in the generated config have the same issue. Please use absolute paths in the config/command or run from the original working directory.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
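The absolute-path option can be sketched as follows. `build_train_command` is a hypothetical helper; the `dp --pt train` invocation and the `mft_input.json` file name come from the comment above, everything else is an assumption.

```python
from pathlib import Path


def build_train_command(output_dir, input_name="mft_input.json", dp_cmd=("dp",)):
    """Return (command, cwd) with every path made absolute up front."""
    out = Path(output_dir).resolve()  # resolved against the caller's cwd
    input_json = out / input_name
    command = [*dp_cmd, "--pt", "train", str(input_json)]
    return command, str(out)
```

Because the input path is absolute before the subprocess changes directory, a relative `output_dir` such as `./dpa_output` no longer gets applied twice. Paths embedded in the generated config would need the same treatment.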
| # ADAPT example | ||
| This directory contains a small ready-to-run example for `dpa_adapt`. | ||
| The example uses 50 pre-processed QM9 molecules to fine-tune and evaluate a |
This example currently commits 50 preprocessed QM9 systems (252 files, about 1.5 MB) under examples/dpa_adapt/data. That feels too large and noisy for a repository example, especially since prepare_data.py can regenerate data. Please reduce the checked-in dataset to the minimal number of tiny systems needed to demonstrate the commands (or keep only generated-on-demand data), and leave larger QM9 regeneration to the script/docs.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
| import numpy as np | ||
| # ── paths ────────────────────────────────────────────────────────────────── | ||
| DEMO_DIR = Path("/home/ziren/aisi-intern/deepmd-kit/examples/dpa_adapt/data") |
This test is not portable: it hard-codes a local /home/ziren/... checkout and, below, a local pretrained checkpoint path. It will fail for anyone else running the repository tests locally and should not be merged as-is. Please move this under source/tests/dpa_adapt/ and build paths from the repository root / tmp_path, with any real checkpoint-dependent coverage skipped or mocked.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
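A portable variant could locate the repository root from the test file itself and treat the checkpoint as optional. In this sketch the marker file `pyproject.toml` is assumed to sit at the repository root, and the environment variable `DPA_ADAPT_TEST_CKPT` is a hypothetical name chosen for illustration.

```python
import os
from pathlib import Path


def find_repo_root(start, marker="pyproject.toml"):
    """Walk upwards from `start` until a directory contains `marker`."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no {marker} found above {start}")


def optional_checkpoint(env_var="DPA_ADAPT_TEST_CKPT"):
    """Return a checkpoint path from the environment, or None if unset or missing."""
    value = os.environ.get(env_var)
    if not value:
        return None
    path = Path(value)
    return path if path.is_file() else None
```

In a pytest module this would be used as `find_repo_root(Path(__file__)) / "examples" / "dpa_adapt" / "data"`, with `pytest.skip(...)` when `optional_checkpoint()` returns `None` and all outputs written under the `tmp_path` fixture.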
| - **implements the Deep Potential series models**, which have been successfully applied to finite and extended systems, including organic molecules, metals, semiconductors, insulators, etc. | ||
| - **implements MPI and GPU supports**, making it highly efficient for high-performance parallel and distributed computing. | ||
| - **highly modularized**, easy to adapt to different descriptors for deep learning-based potential energy models. | ||
| - **fine-tunes pre-trained DPA models through a scikit-learn-style Python API**, via [`dpa_adapt`](dpa_adapt/README.md) — construct a `DPAFineTuner`, then `fit` and `predict` to adapt a large pre-trained model to your own property dataset, with no input files to write. |
This link points to dpa_adapt/README.md, but this PR does not add that file (the README lives under doc/dpa_adapt/README.md). As written, the top-level README will contain a broken link. Please either add the package README or link to the documentation path that actually exists.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
| # TODO: replace with dedicated DescriptorExtractor class after refactor. | ||
| # For now, DPAFineTuner is reused purely as a descriptor feature extractor. | ||
| self._extractor = DPAFineTuner( |
The frozen-model predictor loads the saved type_map into self._type_map, but the descriptor extractor is constructed without that map. _extract_and_condition() validates against self._type_map, then _extract_features() uses the extractor's own empty/default type map state. For data without type_map.raw, this can compute descriptors with the wrong checkpoint atom-type indices. Please pass/sync the saved type map into the extractor before validation/extraction.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
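To illustrate why the type map matters here, the sketch below remaps local atom-type indices to checkpoint indices in plain Python. `remap_atom_types` is an illustrative function, not the PR's `_remap_atom_types`, and the element lists are made-up examples.

```python
def remap_atom_types(atom_types, local_type_map, checkpoint_type_map):
    """Translate local type indices into the checkpoint's type indices."""
    ckpt_index = {element: i for i, element in enumerate(checkpoint_type_map)}
    unsupported = [el for el in local_type_map if el not in ckpt_index]
    if unsupported:
        raise ValueError(f"elements not in checkpoint type map: {unsupported}")
    return [ckpt_index[local_type_map[t]] for t in atom_types]
```

With a local map `["O", "H"]` and a checkpoint map `["H", "C", "N", "O"]`, local type 0 (oxygen) must become checkpoint type 3. If the extractor sees an empty type map and skips this step, oxygen would be treated as checkpoint type 0 (hydrogen).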
| t.train_data if isinstance(t.train_data, list) else [t.train_data] | ||
| ) | ||
| training = { |
MFTFineTuner.fit(..., valid_data=...) stores valid_data, but the generated MFT config never emits a validation_data block for either branch. As a result, callers who provide validation data silently train without validation. Please either wire valid_data into the config or reject it explicitly.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
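Wiring validation data into the generated config could look like the sketch below. It assumes the usual DeePMD-kit layout of `training_data` and `validation_data` blocks, each with `systems` and `batch_size`. The helper `build_data_section` is hypothetical, and where the block belongs in the multi-task config is left open.

```python
def build_data_section(train_systems, valid_systems=None, batch_size="auto"):
    """Build the data part of a training section, with optional validation."""
    section = {
        "training_data": {
            "systems": [str(s) for s in train_systems],
            "batch_size": batch_size,
        }
    }
    if valid_systems:
        # Emit the block only when the caller actually supplied validation data.
        section["validation_data"] = {
            "systems": [str(s) for s in valid_systems],
            "batch_size": batch_size,
        }
    return section
```

The alternative named in the comment, rejecting `valid_data` explicitly, would be a single `raise ValueError(...)` in `fit()` instead.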
| ) | ||
| # Paper default 0.5/0.5; aux_prob (default 0.5) controls the split, the | ||
| # downstream share is the complement. Legacy keeps downstream at 1.0. | ||
| downstream_prob = (1.0 - t.aux_prob) if is_property else 1.0 |
aux_prob is not range-validated before using 1.0 - t.aux_prob. Values outside [0, 1] produce negative model sampling probabilities (for example aux_prob=1.2 gives downstream -0.2), which will fail later or train with invalid branch weights. Please validate this in the tuner constructor before building the DeepMD input.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
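A minimal range check, written as free functions for illustration; in the PR it would presumably live in the tuner constructor. The names `validate_aux_prob` and `branch_probs` are assumptions, and the complement rule mirrors the snippet under review.

```python
import math


def validate_aux_prob(aux_prob):
    """Return aux_prob as a float, rejecting values outside [0, 1]."""
    value = float(aux_prob)
    if not math.isfinite(value) or not 0.0 <= value <= 1.0:
        raise ValueError(f"aux_prob must be in [0, 1], got {aux_prob!r}")
    return value


def branch_probs(aux_prob, is_property=True):
    """Return (downstream_prob, aux_prob) after validation."""
    aux = validate_aux_prob(aux_prob)
    downstream = (1.0 - aux) if is_property else 1.0
    return downstream, aux
```

With this check, `aux_prob=1.2` fails immediately with a clear message instead of producing a downstream probability of -0.2.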
| ) | ||
| return str(latest) | ||
| if self.fparam_dim > 0: |
When fparam_dim > 0, this validates only the training systems. Validation systems can still be missing set.*/fparam.npy or have a different fparam width, so dp --pt train will fail later or validate with inconsistent feature dimensions. Please validate valid_systems with the same fparam_dim before writing/running the config.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
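The same check can be applied to both training and validation systems with one helper. This sketch assumes each system directory contains `set.*` subdirectories holding a 2D `fparam.npy` of shape `(n_frames, fparam_dim)`. `check_fparam_width` is a hypothetical name.

```python
from pathlib import Path

import numpy as np


def check_fparam_width(systems, fparam_dim):
    """Raise if any set.* directory lacks fparam.npy or has the wrong width."""
    for system in systems:
        set_dirs = sorted(Path(system).glob("set.*"))
        if not set_dirs:
            raise FileNotFoundError(f"no set.* directories in {system}")
        for set_dir in set_dirs:
            path = set_dir / "fparam.npy"
            if not path.is_file():
                raise FileNotFoundError(f"missing {path}")
            arr = np.load(path)
            if arr.ndim != 2 or arr.shape[1] != fparam_dim:
                raise ValueError(
                    f"{path} has shape {arr.shape}, expected (n_frames, {fparam_dim})"
                )
```

Calling it as `check_fparam_width(list(train_systems) + list(valid_systems), fparam_dim)` before writing the config would surface the problem before `dp --pt train` starts.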
| self._condition_manager = None | ||
| if self.fparam_dim > 0: | ||
| conditions = _read_fparam_from_systems(systems) | ||
| if conditions is not None: |
For frozen sklearn training, requested fparams are silently ignored if _read_fparam_from_systems() returns None. If fparam_dim > 0, missing fparam data should be a hard error, not a fallback to a model without conditions. This also needs to ensure all systems have fparams with the expected width, otherwise a partial read can concatenate condition rows against the wrong descriptor rows.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
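A strict stacking step could guard against the row misalignment described above. This sketch assumes the per-system fparam arrays and frame counts are already in memory. `stack_conditions` is an illustrative name, not the PR's `_read_fparam_from_systems`.

```python
import numpy as np


def stack_conditions(fparams, n_frames, fparam_dim):
    """Concatenate per-system fparams, requiring one row per descriptor row."""
    if len(fparams) != len(n_frames):
        raise ValueError("one fparam array is required per system")
    rows = []
    for i, (fparam, n) in enumerate(zip(fparams, n_frames)):
        if fparam is None:
            # A missing array is a hard error, never a silent fallback.
            raise ValueError(f"system {i} has no fparam data")
        arr = np.asarray(fparam, dtype=float)
        if arr.shape != (n, fparam_dim):
            raise ValueError(
                f"system {i}: fparam shape {arr.shape}, expected {(n, fparam_dim)}"
            )
        rows.append(arr)
    if not rows:
        return np.empty((0, fparam_dim))
    return np.concatenate(rows, axis=0)
```

Checking the shape per system keeps a partial read from shifting condition rows against descriptors that belong to a different system.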
| "fmt is not supported for mft evaluate(); " | ||
| "provide deepmd/npy system directories." | ||
| ) | ||
| result = self._ensure_mft().predict(data) |
The public wrapper's MFT evaluate() always calls MFTFineTuner.predict(), but predict() explicitly rejects downstream_task_type='ener'. MFTFineTuner.evaluate() already supports the energy-mode path, so legacy energy-mode MFT evaluation is unreachable through DPAFineTuner.evaluate(). Please dispatch to MFTFineTuner.evaluate() for energy mode.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
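The dispatch being asked for is small. The sketch below uses a hypothetical stub (`StubMFT`) in place of `MFTFineTuner`, modelling only the behaviour the comment describes: `predict()` rejects energy mode and `evaluate()` supports it.

```python
class StubMFT:
    """Hypothetical stand-in that mimics the described MFTFineTuner behaviour."""

    def __init__(self, downstream_task_type):
        self.downstream_task_type = downstream_task_type

    def predict(self, data):
        if self.downstream_task_type == "ener":
            raise ValueError("predict() does not support downstream_task_type='ener'")
        return {"path": "predict", "data": data}

    def evaluate(self, data):
        return {"path": "evaluate", "data": data}


def run_mft_evaluation(mft, data):
    """Route energy mode to evaluate() and everything else to predict()."""
    if mft.downstream_task_type == "ener":
        return mft.evaluate(data)
    return mft.predict(data)
```

In the real wrapper the property-mode branch would still compute its metrics from the predictions afterwards, which is not shown here.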
| "freeze() was called before fit(). Train the model with fit() first." | ||
| ) | ||
| bundle = { |
After frozen_head, finetune, or mft training, _fitted is set to True, so this freeze() path is allowed even though no sklearn predictor/target metadata was fit. The resulting bundle has predictor=None (and default task metadata) and can be loaded by DPAPredictor only to fail or behave nonsensically. Please restrict this freeze format to the sklearn strategy, or implement separate serialization for the other strategies.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
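Restricting this freeze format to the sklearn strategy could be done with a guard like the one below. `check_freezable` is a hypothetical helper; the strategy names are the ones listed in the PR description, and the first error message is copied from the snippet under review.

```python
def check_freezable(strategy, fitted, predictor):
    """Raise unless the sklearn bundle format can represent this model."""
    if not fitted:
        raise RuntimeError(
            "freeze() was called before fit(). Train the model with fit() first."
        )
    if strategy != "frozen_sklearn":
        raise NotImplementedError(
            f"freeze() bundles are only defined for 'frozen_sklearn', got {strategy!r}"
        )
    if predictor is None:
        raise RuntimeError("no sklearn predictor was fitted; nothing to freeze")
```

Calling this at the top of `freeze()` would stop a bundle with `predictor=None` from ever being written.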
njzjz left a comment
I tested this PR locally on 7111f678f1df3d35679a2b7f49fbe3b686ceda41 with srun --gres=gpu:1 on an RTX 5090. After installing -e .[dpa-adapt] and replacing the CPU torch wheel with torch 2.12.1+cu129, CUDA was visible from the venv (torch.cuda.is_available() == True).
What passed:
- `python -m pytest source/tests/dpa_adapt/ -v --ignore=source/tests/dpa_adapt/test_trainer_dim_case_embd.py`: 293 passed, 12 skipped.
- `python -m pytest source/tests/dpa_adapt/test_backend_contract.py -v`: 7 passed with CUDA torch.
- `srun --gres=gpu:1 ... python examples/dpa_adapt/scripts/run_evaluate_frozen_sklearn.py`: completed, MAE 1.1801 eV, RMSE 1.4642 eV, R2 -0.5223.
- `srun --gres=gpu:1 ... python examples/dpa_adapt/scripts/run_evaluate_frozen_head.py`: completed numerically, but exposed the issue in the inline comment: the spawned `dp --pt train` came from `/home/jzzeng/miniconda3/bin/dp` instead of the active venv's `dp`.
Requesting changes because the dp subprocess resolution can silently run a different DeePMD-kit/torch environment from the one importing dpa_adapt, so the training/evaluation paths are not reliable in common symlinked-venv setups.
| from pathlib import Path as _Path | ||
| exe_name = "dp.exe" if _os.name == "nt" else "dp" | ||
| candidate = _Path(_sys.executable).resolve().parent / exe_name |
This escapes the active virtualenv when sys.executable is a symlink. In my local venv, sys.executable is /home/jzzeng/codes/deepmd-kit/venv/bin/python, but Path(sys.executable).resolve().parent becomes /home/jzzeng/miniconda3/bin, so resolve_dp_command() returns /home/jzzeng/miniconda3/bin/dp even though shutil.which('dp') points at /home/jzzeng/codes/deepmd-kit/venv/bin/dp. The frozen_head example then printed Running: /home/jzzeng/miniconda3/bin/dp --pt train ..., i.e. it trained with a different DeePMD-kit/torch install (deepmd-kit 3.2.0b1.dev42, torch 2.10.0+cu128) than the PR venv (deepmd-kit 3.2.0b1.dev203, torch 2.12.1+cu129).
Please do not dereference the interpreter symlink here. Use the scripts directory for the active environment, e.g. Path(sys.executable).parent / exe_name or sysconfig.get_path('scripts'), before falling back to shutil.which('dp').
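A sketch of the lookup order suggested above, using only the standard library. The function reuses the name `resolve_dp_command` from the comment, but the signature and the `None` return for "not found" are assumptions, not the PR's actual interface.

```python
import os
import shutil
import sys
import sysconfig
from pathlib import Path
from typing import Optional


def resolve_dp_command(exe_name: Optional[str] = None) -> Optional[str]:
    """Find the executable in the active environment without following symlinks."""
    if exe_name is None:
        exe_name = "dp.exe" if os.name == "nt" else "dp"
    candidates = []
    scripts_dir = sysconfig.get_path("scripts")
    if scripts_dir:
        candidates.append(Path(scripts_dir) / exe_name)
    # No .resolve() here: the directory next to sys.executable is the venv's bin.
    candidates.append(Path(sys.executable).parent / exe_name)
    for candidate in candidates:
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
    # Last resort: whatever is first on PATH.
    return shutil.which(exe_name)
```

In a symlinked venv this keeps the search inside the venv's scripts directory instead of jumping to the base interpreter's directory.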
`f"{sys.executable} -m deepmd"` has the same effect.
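Following that suggestion, the subprocess command can be built from the running interpreter so that no executable lookup is needed. `dp_module_command` is a hypothetical helper, and it assumes the `deepmd` package can be run with `-m` as the comment states.

```python
import sys


def dp_module_command(*args):
    """Build a command that runs deepmd with the current interpreter."""
    return [sys.executable, "-m", "deepmd", *args]
```

For example, `dp_module_command("--pt", "train", "input.json")` yields a list suitable for `subprocess.run`, and the child process always uses the same environment that imported `dpa_adapt`.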
Summary
This PR adds DPA-ADAPT, a toolkit for adapting pretrained DPA models to downstream atomistic property prediction tasks.
The new package provides a scikit-learn-style Python API and standalone CLI for fine-tuning, descriptor extraction, prediction, evaluation, cross-validation, and data preparation, without requiring users to manually write DeePMD-kit training input files.
Main changes
- `dpa_adapt` Python package.
- `dpa-adapt` `dpaad`
- `frozen_sklearn`: frozen DPA descriptors with scikit-learn regressors
- `frozen_head`: train a property head on top of a frozen DPA backbone
- `finetune`: end-to-end DPA fine-tuning
- `mft`: multi-task fine-tuning with auxiliary energy/force training
- `fparam.npy`
- `doc/dpa_adapt/`.
- `examples/dpa_adapt/`.
- `dpa-adapt` optional dependencies in `pyproject.toml`.
- `source/tests/dpa_adapt/`.

Co-authored-by: zirenjin <zirenjin@umich.edu>