Repository for the paper: Space Beats Time in Training Machine Learning Models for Carbon Uptake Variability
David Hafezi Rachti, Alexander J. Winkler, and Christian Reimers
Max Planck Institute for Biogeochemistry, Jena, Germany
Machine learning (ML) is widely used to upscale in-situ ecosystem carbon flux observations to the globe. While performing well on seasonal cycles and spatial patterns, these methods notoriously fail to reproduce long-term trends and interannual variability (IAV). Whether this shortcoming stems from low signal-to-noise ratios or insufficient training data remains unclear.
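Trends, seasonal cycles, and IAV are usually separated before a model's skill on each is assessed. The sketch below shows one common way to do this for a monthly flux series. It fits a least-squares linear trend, removes the mean seasonal cycle from the detrended series, and treats the remainder as IAV anomalies. This is an illustrative convention only. The function name decompose_monthly and the assumption that the series spans whole calendar years starting in January are hypothetical and not taken from the paper's code.

```python
import numpy as np


def decompose_monthly(series):
    """Split a monthly series into trend, mean seasonal cycle, and IAV anomalies.

    Hypothetical helper: assumes the series covers whole years, starting in January.
    Returns (trend, seasonal, anomalies, annual_iav).
    """
    x = np.asarray(series, dtype=float)
    if x.size == 0 or x.size % 12 != 0:
        raise ValueError("series length must be a positive multiple of 12 months")
    n_years = x.size // 12
    t = np.arange(x.size)

    # Least-squares linear trend; np.polyfit returns [slope, intercept] for deg=1.
    slope, intercept = np.polyfit(t, x, 1)
    trend = slope * t + intercept
    detrended = x - trend

    # Mean seasonal cycle: average of each calendar month across all years.
    seasonal = detrended.reshape(n_years, 12).mean(axis=0)

    # Whatever remains after removing trend and seasonal cycle is the IAV signal.
    anomalies = detrended - np.tile(seasonal, n_years)

    # Annual means of the anomalies give one value of interannual variability per year.
    annual_iav = anomalies.reshape(n_years, 12).mean(axis=1)
    return trend, seasonal, anomalies, annual_iav
```

Applying the same decomposition to both predicted and observed fluxes lets their trends and anomalies be compared component by component, for example by correlating the two annual_iav series.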
Here, we examine how an interpretable ML framework responds to three progressively larger training datasets with contrasting spatial and temporal properties. Improving spatial representation in the training data increases the ability to predict trends and IAV more than extending the time series (
project/
├── data/
│ ├── pixel_information/
│ │
│ ├── datasets/
│ │ ├── basic_obs/
│ │ ├── space_set/
│ │ ├── time_set/
│ │ ├── timespace_set/
│ │ └── test_sets/
│ │
│ ├── pre_processing_steps/
│ │
│ └── treeFrac.nc
│
├── js_bach_data/
│ └── pre_processing_steps/
│
├── trained_models/
│ └── version1/
│ └── ig_results/
│ └── plots/
│ └── ig_results_anom/
│ └── results/
│ └── model_performance/
│ └── space_set/
│ └── time_set/
│ └── timespace_set/
│
├── data_preprocessing/
│
├── model_evaluation/
│
├── model_training/
│
├── logs/
│ └── slurm_output/
│
├── environment.yml
└── README.md
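Before training or evaluating models, it can help to confirm that the training and test datasets sit where the tree above expects them. The following is a minimal sketch, not part of the repository. It checks only the data/datasets/ subdirectories named in the tree. The function missing_dataset_dirs and the constant DATASET_DIRS are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Subdirectories of data/datasets/ as listed in the project tree.
DATASET_DIRS = ["basic_obs", "space_set", "time_set", "timespace_set", "test_sets"]


def missing_dataset_dirs(project_root):
    """Return the expected data/datasets/ subdirectories that are absent (hypothetical helper)."""
    base = Path(project_root) / "data" / "datasets"
    return [name for name in DATASET_DIRS if not (base / name).is_dir()]
```

Calling missing_dataset_dirs("project") returns an empty list when every dataset folder is present.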
To reproduce the environment used in this project, create the Conda environment from environment.yml:
conda env create -f environment.yml