Model Configuration

Additions to this documentation section are tracked in the issue tracker on GitHub. Please see issue 37.

Configuration File

The batch running script uses a YAML file to parameterize the run. The file is organized into several sections:

  1. dataset: Describes dataset attributes common to all analyses
  2. YATSM: Contains model parameters common to all analyses and declares which change detection algorithm should be run. The algorithm specified should also appear as a section within the configuration file.
  3. ${ALGORITHM}: The section named by the algorithm key in YATSM, describing how the time series analysis should be run (e.g., a section CCDCesque when algorithm: "CCDCesque")
  4. phenology: Describes phenology fitting parameters
  5. classification: Describes classification training data inputs
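
The relationship between the algorithm key and its matching section can be checked with a few lines of Python. The following is only a sketch: check_sections is a hypothetical helper, not part of YATSM, and it assumes the configuration has already been parsed into a dictionary (for example with a YAML loader) and that dataset and YATSM are the sections every run needs.

```python
# Hypothetical helper for illustration; not part of the YATSM API.
# Assumption: "dataset" and "YATSM" are needed for every run.
REQUIRED_SECTIONS = ("dataset", "YATSM")


def check_sections(cfg):
    """Check the section layout and return the declared algorithm name."""
    for name in REQUIRED_SECTIONS:
        if name not in cfg:
            raise KeyError("missing required section: %s" % name)

    algorithm = cfg["YATSM"].get("algorithm")
    if algorithm is None:
        raise KeyError("the YATSM section must declare an 'algorithm'")

    # The declared algorithm must have a section of the same name
    if algorithm not in cfg:
        raise KeyError("no section named %r for the declared algorithm"
                       % algorithm)
    return algorithm


# Stand-in for a parsed configuration file
cfg = {
    "dataset": {"n_bands": 8},
    "YATSM": {"algorithm": "CCDCesque"},
    "CCDCesque": {"init": {"consecutive": 5}},
}
print(check_sections(cfg))
```

A configuration that declares algorithm: "CCDCesque" but has no CCDCesque section would raise a KeyError in this sketch.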

The following tables describe the meaning of the parameters and values used in the YATSM configuration file. Any parameters left blank will be interpreted as None (e.g., cache_line_dir =).

Dataset Parameters


This section is out of date for v0.5.0 and requires re-writing

Note: you can use scripts/ to generate the CSV file for input_file.

YATSM Analysis Parameters


This section is out of date for v0.5.0 and requires re-writing

YATSM Algorithm Parameters

The contents of this section depend on which algorithm is specified in the algorithm key of the YATSM section. For information on the available algorithms, visit the science models guide section.


Phenology

Long-term mean phenology calculation is an optional addition to YATSM. Visit the phenology guide page for its configuration options.


Classification

The scripts included in YATSM that perform classification use a configuration INI file that specifies which algorithm will be used and the parameters for that algorithm. The configuration details specified alongside the dataset and YATSM algorithm options deal with the training data, not the algorithm details. These training data configuration options include:

====================  =========  ===========
Parameter             Data Type  Explanation
====================  =========  ===========
training_data         str        Training data raster image containing labeled pixels
mask_values           list       Values within the training data image to mask or ignore
training_start        str        Earliest date that training data are applicable. Training data labels will be paired with models that begin before this date
training_end          str        Latest date that training data are applicable. Training data labels will be paired with models that end after this date
training_date_format  str        Format specification that maps training_start and training_end to a Python datetime object (e.g., %Y-%m-%d)
cache_xy              str        Filename used for caching paired X features and y training labels
====================  =========  ===========
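
The way training_start, training_end, and training_date_format work together can be sketched in Python. This is an illustration only: parse_window and covers_window are hypothetical names, the segment dates are made up, and the strict comparison at the window boundaries is an assumption rather than confirmed YATSM behavior.

```python
from datetime import datetime


def parse_window(cfg):
    """Turn the training date strings into datetime objects."""
    fmt = cfg["training_date_format"]
    start = datetime.strptime(cfg["training_start"], fmt)
    end = datetime.strptime(cfg["training_end"], fmt)
    if start > end:
        raise ValueError("training_start is later than training_end")
    return start, end


def covers_window(seg_start, seg_end, start, end):
    """True if a model segment begins before the training window starts
    and ends after it finishes (strict comparison is an assumption)."""
    return seg_start < start and seg_end > end


cfg = {
    "training_start": "1999-01-01",
    "training_end": "2001-01-01",
    "training_date_format": "%Y-%m-%d",
}
start, end = parse_window(cfg)

# Made-up model segments as (start, end) pairs
segments = [
    (datetime(1995, 1, 1), datetime(2005, 1, 1)),  # spans the whole window
    (datetime(2000, 6, 1), datetime(2005, 1, 1)),  # begins inside the window
]
usable = [covers_window(s, e, start, end) for s, e in segments]
print(usable)
```

Only the first segment would be paired with training labels here, because the second one begins after training_start.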


An example template of the parameter file is located within examples/p013r030/p013r030.yaml:

# Example configuration file for YATSM line runner
# This configuration includes details about the dataset and how YATSM should
# run

# Version of config
version: "0.7.0"

dataset:
    # Text file containing dates and images
    input_file: "examples/p013r030/images.csv"
    # Input date format
    date_format: "%Y%j"
    # Output location
    output: "landsat_stack/p013r030/subset/YATSM"
    # Output file prefix (e.g., [prefix]_[line].npz)
    output_prefix: "yatsm_r"
    # Total number of bands
    n_bands: 8
    # Mask band (e.g., Fmask)
    mask_band: 8
    # List of integer values to mask within the mask band
    mask_values: [2, 3, 4, 255]
    # Valid range of band data
    # specify 1 range for all bands, or specify ranges for each band
    min_values: 0
    max_values: 10000
    # Use BIP image reader? If not, use GDAL to read in
    use_bip_reader: False
    # Directory location for caching dataset lines
    cache_line_dir: "landsat_stack/p013r030/subset/cache"

# Parameters common to all timeseries analysis models within YATSM package
YATSM:
    algorithm: "CCDCesque"
    prediction: "sklearn_Lasso20"
    design_matrix: "1 + x + harm(x, 1) + harm(x, 2) + harm(x, 3)"
    reverse: False
    # Re-fit each segment, adding new coefficients & RMSE info to record
    refit:
        prefix: [rlm_]
        prediction: [rlm_maxiter10]
        stay_regularized: [True]

# Parameters for CCDCesque algorithm -- referenced by "algorithm" key in YATSM
CCDCesque:
    init:  # hyperparameters
        consecutive: 5
        threshold: 3.5
        min_obs: 24
        min_rmse: 150
        test_indices: [2, 3, 4, 5]
        retrain_time: 365.25
        screening: RLM
        screening_crit: 400.0
        slope_test: False
        remove_noise: True
        dynamic_rmse: False
        # Indices for multi-temporal cloud masking (indexed on 1)
        green_band: 2
        swir1_band: 5

# Section for phenology fitting
phenology:
    enable: True
    init:
        # Specification for dataset indices required for EVI based phenology monitoring
        red_index: 2
        nir_index: 3
        blue_index: 0
        # Scale factor for reflectance bands
        scale: 0.0001
        # You can also specify index of EVI if contained in dataset to override calculation
        # Number of years to group together when normalizing EVI to upper and lower percentiles
        year_interval: 3
        # Upper and lower percentiles of EVI used for max/min scaling
        q_min: 10
        q_max: 90

# Section for training and classification
classification:
    # Training data file
    training_image: "training_data.gtif"
    # Training data masked values
    roi_mask_values: [0, 255]
    # Date range
    training_start: "1999-01-01"
    training_end: "2001-01-01"
    training_date_format: "%Y-%m-%d"
    # Cache X feature input and y labels for training data image into file?
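
To illustrate how a few of the dataset values in the example might be interpreted, here is a short Python sketch. The helper names are hypothetical, the observation values are made up, and treating min_values and max_values as inclusive bounds is an assumption. Only the single-range form of min_values and max_values is handled.

```python
from datetime import datetime

# Values copied from the dataset section of the example configuration
dataset = {
    "date_format": "%Y%j",
    "mask_values": [2, 3, 4, 255],
    "min_values": 0,
    "max_values": 10000,
}


def parse_image_date(text, cfg):
    """Parse a date string using the configured date_format.

    With "%Y%j" the string is a four-digit year followed by the
    day of the year, e.g. "2000123".
    """
    return datetime.strptime(text, cfg["date_format"]).date()


def is_usable(band_values, mask_value, cfg):
    """Reject an observation flagged by the mask band or out of range.

    Assumption: the valid range is inclusive at both ends.
    """
    if mask_value in cfg["mask_values"]:
        return False
    return all(cfg["min_values"] <= v <= cfg["max_values"]
               for v in band_values)


print(parse_image_date("2000123", dataset))      # 2000-05-02
print(is_usable([450, 812, 9100], 0, dataset))   # clear, in range
print(is_usable([450, 812, 9100], 4, dataset))   # flagged by the mask band
print(is_usable([450, 812, 12000], 0, dataset))  # above max_values
```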