Model Configuration

The issue tracker on GitHub is used to track additions to this documentation section. Please see issue 37.

Configuration File

The batch running script uses a YAML file to parameterize the run. The YAML file is organized into several sections:

  1. dataset describes dataset attributes common to all analyses
  2. YATSM describes model parameters common to all analyses and declares which change detection algorithm should be run
  3. classification describes classification training data inputs
  4. phenology describes phenology fitting parameters

The following tables describe the parameters and values used in the YATSM configuration file. Any parameter left blank is interpreted as None (e.g., cache_line_dir:).
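
As a rough illustration of the layout described above, the sketch below inspects a configuration that has already been parsed into a Python dictionary. The helpers check_sections and blank_parameters, and the choice of which sections count as required, are assumptions made for this example and are not part of YATSM. Blank values appear as None, which is what a YAML parser produces for an empty key.

```python
# Hypothetical helpers for illustration only; not part of YATSM.
REQUIRED_SECTIONS = ("dataset", "YATSM")             # assumed to be required
OPTIONAL_SECTIONS = ("classification", "phenology")  # assumed to be optional


def check_sections(config):
    """Return (missing required sections, absent optional sections)."""
    missing = [s for s in REQUIRED_SECTIONS if s not in config]
    absent = [s for s in OPTIONAL_SECTIONS if s not in config]
    return missing, absent


def blank_parameters(config, section):
    """Return the names of parameters in ``section`` that were left blank."""
    values = config.get(section) or {}
    return sorted(key for key, value in values.items() if value is None)


# A dictionary standing in for a parsed configuration file
config = {
    "dataset": {"input_file": "images.csv", "cache_line_dir": None},
    "YATSM": {"algorithm": "CCDCesque", "commission_alpha": None},
}

missing, absent = check_sections(config)
print(missing)                              # []
print(absent)                               # ['classification', 'phenology']
print(blank_parameters(config, "dataset"))  # ['cache_line_dir']
```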

Dataset Parameters

Note

This section is out of date for v0.5.0 and requires re-writing

Note: you can use scripts/gen_date_file.sh to generate the CSV file for input_file.

Model Parameters

Note

This section is out of date for v0.5.0 and requires re-writing

Phenology

Long-term mean phenology calculation is an optional addition to YATSM. See the phenology guide page for its configuration options.

Classification

The classification scripts included in YATSM use a configuration INI file that specifies which algorithm will be used and the parameters for that algorithm. The classification options specified alongside the dataset and YATSM algorithm options describe the training data, not the algorithm. These training data configuration options include:

Parameter             Data Type  Explanation
--------------------  ---------  -----------
training_data         str        Training data raster image containing labeled pixels
mask_values           list       Values within the training data image to mask or ignore
training_start        str        Earliest date that training data are applicable. Training data labels will be paired with models that begin at least before this date
training_end          str        Latest date that training data are applicable. Training data labels will be paired with models that end at least after this date
training_date_format  str        Format specification that maps training_start and training_end to a Python datetime object (e.g., %Y-%m-%d)
cache_xy              str        Filename used for caching paired X features and y training labels
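
The date parameters above can be sketched as follows. This is a minimal illustration, not YATSM's implementation: parse_training_dates and segment_covers_training are hypothetical names, and reading "at least before" and "at least after" as "on or before" and "on or after" is an assumption.

```python
from datetime import datetime


def parse_training_dates(cfg):
    """Convert training_start and training_end to datetime objects."""
    fmt = cfg["training_date_format"]
    start = datetime.strptime(cfg["training_start"], fmt)
    end = datetime.strptime(cfg["training_end"], fmt)
    if start > end:
        raise ValueError("training_start must not be later than training_end")
    return start, end


def segment_covers_training(seg_start, seg_end, train_start, train_end):
    """True if a model segment spans the whole training period.

    Assumption: the segment must begin on or before train_start and end on
    or after train_end for its model to be paired with a training label.
    """
    return seg_start <= train_start and seg_end >= train_end


cfg = {
    "training_start": "1999-01-01",
    "training_end": "2001-01-01",
    "training_date_format": "%Y-%m-%d",
}
train_start, train_end = parse_training_dates(cfg)

# A segment spanning 1995 to 2005 covers the training period
long_segment = (datetime(1995, 6, 1), datetime(2005, 6, 1))
# A segment that ends in 2000 does not
short_segment = (datetime(1995, 6, 1), datetime(2000, 6, 1))

print(segment_covers_training(*long_segment, train_start, train_end))   # True
print(segment_covers_training(*short_segment, train_start, train_end))  # False
```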

Example

An example template of the parameter file is located within examples/p013r030/p013r030.yaml:

# Example configuration file for YATSM line runner
#
# This configuration includes details about the dataset and how YATSM should
# run

# Version of config
version: "0.6.0"

dataset:
    # Text file containing dates and images
    input_file: "/home/ceholden/Documents/yatsm/examples/p013r030/images.csv"
    # Input date format
    date_format: "%Y%j"
    # Output location
    output: "/home/ceholden/Documents/landsat_stack/p013r030/subset/YATSM"
    # Output file prefix (e.g., [prefix]_[line].npz)
    output_prefix: "yatsm_r"
    # Total number of bands
    n_bands: 8
    # Mask band (e.g., Fmask)
    mask_band: 8
    # List of integer values to mask within the mask band
    mask_values: [2, 3, 4, 255]
    # Valid range of band data
    # specify 1 range for all bands, or specify ranges for each band
    min_values: 0
    max_values: 10000
    # Indices for multi-temporal cloud masking (indexed on 1)
    green_band: 2
    swir1_band: 5
    # Use BIP image reader? If not, use GDAL to read in
    use_bip_reader: False
    # Directory location for caching dataset lines
    cache_line_dir: "/home/ceholden/Documents/landsat_stack/p013r030/subset/cache"

# Parameters common to all timeseries analysis models within YATSM package
YATSM:
    algorithm: "CCDCesque"
    prediction: "GLMNET_Lasso20"
    design_matrix: "1 + x + harm(x, 1) + harm(x, 2) + harm(x, 3)"
    reverse: False
    commission_alpha:
    # Re-fit each segment, adding new coefficients & RMSE info to record
    refit:
        prefix: [robust]
        prediction: [RLM]
        stay_regularized: [True]

# Parameters for CCDCesque algorithm -- referenced by "algorithm" key in YATSM
CCDCesque:
    init:  # hyperparameters
        consecutive: 5
        threshold: 3.5
        min_obs: 24
        min_rmse: 150
        test_indices: [2, 3, 4, 5]
        retrain_time: 365.25
        screening: RLM
        screening_crit: 400.0
        slope_test: False
        remove_noise: True
        dynamic_rmse: False

# Regression estimators
LassoCV:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/sklearn_LassoCV_n50.pkl"
    fit:  # optional arguments to the ``fit`` method of the predictor

Lasso20:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/sklearn_Lasso20.pkl"

OLS:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/OLS.pkl"

GLMNET_LassoCV:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/glmnet_LassoCV_n50.pkl"

GLMNET_Lasso20:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/glmnet_Lasso20.pkl"
    fit:
        # 8 penalties for 8 coefficients
        penalties: [1, 0, 1, 1, 1, 1, 1, 1]

RLM:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/rlm_maxiter10.pkl"

# Section for phenology fitting
phenology:
    enable: True
    init:
        # Specification for dataset indices required for EVI based phenology monitoring
        red_index: 2
        nir_index: 3
        blue_index: 0
        # Scale factor for reflectance bands
        scale: 0.0001
        # You can also specify index of EVI if contained in dataset to override calculation
        evi_index:
        evi_scale:
        # Number of years to group together when normalizing EVI to upper and lower percentiles
        year_interval: 3
        # Upper and lower percentiles of EVI used for max/min scaling
        q_min: 10
        q_max: 90

# Section for segmentation
segment:
    # Segmentation image
    segmentation:
    # Resegmentation threshold (0 turns off resegmentation)
    resegment_crit: 0
    # Resegmentation size thresholds
    resegment_minpix: 5
    resegment_maxpix: 50

# Section for training and classification
classification:
    # Training data file
    training_image: "/home/ceholden/Documents/yatsm/examples/training_data.gtif"
    # Training data masked values
    roi_mask_values: [0, 255]
    # Date range
    training_start: "1999-01-01"
    training_end: "2001-01-01"
    training_date_format: "%Y-%m-%d"
    # Cache X feature input and y labels for training data image into file?
    cache_training:
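
The comment "8 penalties for 8 coefficients" in the GLMNET_Lasso20 section follows from the design_matrix formula: an intercept, a slope on the date x, and three harmonic terms that each contribute two columns. The sketch below counts those columns for a single date. It assumes that harm(x, n) expands to a cosine and sine pair with an annual period of 365.25 days and that x is an ordinal date; the transform actually used by YATSM may differ, and design_row is a hypothetical helper written for this example.

```python
import math
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed period of the first harmonic


def harm(x, freq):
    """Assumed form of harm(x, freq): one cosine and one sine column."""
    w = 2.0 * math.pi / DAYS_PER_YEAR
    return [math.cos(freq * w * x), math.sin(freq * w * x)]


def design_row(x, n_harmonics=3):
    """Build one row for "1 + x + harm(x, 1) + ... + harm(x, n_harmonics)"."""
    row = [1.0, float(x)]  # intercept and slope terms
    for freq in range(1, n_harmonics + 1):
        row.extend(harm(x, freq))
    return row


x = date(2000, 1, 1).toordinal()  # ordinal date used as the time variable
row = design_row(x)
penalties = [1, 0, 1, 1, 1, 1, 1, 1]  # copied from the example configuration

print(len(row))                    # 8
print(len(row) == len(penalties))  # True
```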