Model Configuration¶
Additions to this documentation section are tracked on the GitHub issue tracker. Please see issue 37.
Configuration File¶
The batch running script uses a YAML file to parameterize the run. The YAML file is organized into several sections:
dataset
describes dataset attributes common to all analysis
YATSM
describes model parameters common to all analysis and declares what change detection algorithm should be run
classification
describes classification training data inputs
phenology
describes phenology fitting parameters
The following tables describe the meaning of the parameters and values used
in the YATSM configuration file. Any parameter left blank will be
interpreted as None (e.g., cache_line_dir: with no value after the colon).
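Because blank values arrive as None, code that reads the configuration has to tell a blank parameter apart from one that was set. A minimal sketch, assuming the dataset section has already been parsed into a Python dict; the helper get_option and the dict literal are hypothetical and not part of YATSM:

```python
def get_option(section, key, default=None):
    """Return section[key], or default when the key is missing or was left blank (None)."""
    value = section.get(key)
    return default if value is None else value


# Hypothetical stand-in for a parsed "dataset" section: cache_line_dir was left blank
dataset = {"output_prefix": "yatsm_r", "cache_line_dir": None}

cache_dir = get_option(dataset, "cache_line_dir")
use_cache = cache_dir is not None  # False in this sketch because the value was blank
n_bands = get_option(dataset, "n_bands", default=8)  # missing key falls back to the default
```

Comparing against None rather than testing truthiness keeps legitimate values such as 0 or False from being mistaken for blanks.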
Dataset Parameters¶
Note
This section is out of date for v0.5.0 and requires re-writing
Note: you can use scripts/gen_date_file.sh to generate the CSV file for input_file.
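The date_format used in the example configuration, %Y%j, is a four-digit year followed by the day of the year. A minimal sketch of reading such dates from a date file with the standard library; the two-column layout (date, image path) and the file contents are assumptions made for illustration, so check the output of gen_date_file.sh for the real layout:

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%Y%j"  # year + day of year, as in the example configuration

# Hypothetical file contents; the column layout is an assumption
csv_text = "2000123,/path/to/image_a.tif\n2000139,/path/to/image_b.tif\n"

records = []
for date_str, path in csv.reader(io.StringIO(csv_text)):
    # strptime converts "2000123" (day 123 of 2000) into a calendar date
    records.append((datetime.strptime(date_str, DATE_FORMAT).date(), path))

first_date = records[0][0]
```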
Model Parameters¶
Note
This section is out of date for v0.5.0 and requires re-writing
Phenology¶
Long term mean phenology calculation is an optional addition to YATSM. See the phenology guide page for its configuration options.
Classification¶
The classification scripts included in YATSM use a configuration INI file that specifies which algorithm will be used and the parameters for that algorithm. The configuration details specified alongside the dataset and YATSM algorithm options concern the training data, not the algorithm. These training data configuration options include:
Parameter | Data Type | Explanation |
---|---|---|
training_data | str | Training data raster image containing labeled pixels |
mask_values | list | Values within the training data image to mask or ignore |
training_start | str | Earliest date that training data are applicable. Training data labels will be paired with models that begin at least before this date |
training_end | str | Latest date that training data are applicable. Training data labels will be paired with models that end at least after this date |
training_date_format | str | Format specification that maps training_start and training_end to a Python datetime object (e.g., %Y-%m-%d) |
cache_xy | str | Filename used for caching paired X features and y training labels |
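The date rules in the table can be sketched as follows. This is an illustration only: the function names and the sample segment dates are hypothetical, the training dates and format string are taken from the example configuration, and treating the comparisons as inclusive is an assumption.

```python
from datetime import datetime

TRAINING_DATE_FORMAT = "%Y-%m-%d"


def parse_date(text, fmt=TRAINING_DATE_FORMAT):
    """Convert a configuration date string into a datetime.date."""
    return datetime.strptime(text, fmt).date()


def segment_is_trainable(seg_start, seg_end, training_start, training_end):
    """True when a segment begins by training_start and ends no earlier than training_end."""
    return seg_start <= training_start and seg_end >= training_end


training_start = parse_date("1999-01-01")
training_end = parse_date("2001-01-01")

# Hypothetical model segments as (start, end) pairs
spans_window = (parse_date("1995-06-01"), parse_date("2003-08-15"))
starts_late = (parse_date("2000-03-01"), parse_date("2005-01-01"))

usable = segment_is_trainable(*spans_window, training_start, training_end)
rejected = segment_is_trainable(*starts_late, training_start, training_end)
```

The first segment covers the whole training window and would be paired with a label; the second begins after training_start and would not.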
Example¶
An example template of the parameter file is located within examples/p013r030/p013r030.yaml:
```yaml
# Example configuration file for YATSM line runner
#
# This configuration includes details about the dataset and how YATSM should
# run

# Version of config
version: "0.6.0"

dataset:
    # Text file containing dates and images
    input_file: "/home/ceholden/Documents/yatsm/examples/p013r030/images.csv"
    # Input date format
    date_format: "%Y%j"
    # Output location
    output: "/home/ceholden/Documents/landsat_stack/p013r030/subset/YATSM"
    # Output file prefix (e.g., [prefix]_[line].npz)
    output_prefix: "yatsm_r"
    # Total number of bands
    n_bands: 8
    # Mask band (e.g., Fmask)
    mask_band: 8
    # List of integer values to mask within the mask band
    mask_values: [2, 3, 4, 255]
    # Valid range of band data
    # specify 1 range for all bands, or specify ranges for each band
    min_values: 0
    max_values: 10000
    # Indices for multi-temporal cloud masking (indexed on 1)
    green_band: 2
    swir1_band: 5
    # Use BIP image reader? If not, use GDAL to read in
    use_bip_reader: False
    # Directory location for caching dataset lines
    cache_line_dir: "/home/ceholden/Documents/landsat_stack/p013r030/subset/cache"

# Parameters common to all timeseries analysis models within YATSM package
YATSM:
    algorithm: "CCDCesque"
    prediction: "GLMNET_Lasso20"
    design_matrix: "1 + x + harm(x, 1) + harm(x, 2) + harm(x, 3)"
    reverse: False
    commission_alpha:
    # Re-fit each segment, adding new coefficients & RMSE info to record
    refit:
        prefix: [robust]
        prediction: [RLM]
        stay_regularized: [True]

# Parameters for CCDCesque algorithm -- referenced by "algorithm" key in YATSM
CCDCesque:
    init: # hyperparameters
        consecutive: 5
        threshold: 3.5
        min_obs: 24
        min_rmse: 150
        test_indices: [2, 3, 4, 5]
        retrain_time: 365.25
        screening: RLM
        screening_crit: 400.0
        slope_test: False
        remove_noise: True
        dynamic_rmse: False

# Regression estimators
LassoCV:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/sklearn_LassoCV_n50.pkl"
    fit: # optional arguments to the ``fit`` method of the predictor
Lasso20:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/sklearn_Lasso20.pkl"
OLS:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/OLS.pkl"
GLMNET_LassoCV:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/glmnet_LassoCV_n50.pkl"
GLMNET_Lasso20:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/glmnet_Lasso20.pkl"
    fit:
        # 8 penalties for 8 coefficients
        penalties: [1, 0, 1, 1, 1, 1, 1, 1]
RLM:
    pickle: "/home/ceholden/Documents/yatsm/yatsm/regression/pickles/rlm_maxiter10.pkl"

# Section for phenology fitting
phenology:
    enable: True
    init:
        # Specification for dataset indices required for EVI based phenology monitoring
        red_index: 2
        nir_index: 3
        blue_index: 0
        # Scale factor for reflectance bands
        scale: 0.0001
        # You can also specify index of EVI if contained in dataset to override calculation
        evi_index:
        evi_scale:
        # Number of years to group together when normalizing EVI to upper and lower percentiles
        year_interval: 3
        # Upper and lower percentiles of EVI used for max/min scaling
        q_min: 10
        q_max: 90

# Section for segmentation
segment:
    # Segmentation image
    segmentation:
    # Resegmentation threshold (0 turns off resegmentation)
    resegment_crit: 0
    # Resegmentation size thresholds
    resegment_minpix: 5
    resegment_maxpix: 50

# Section for training and classification
classification:
    # Training data file
    training_image: "/home/ceholden/Documents/yatsm/examples/training_data.gtif"
    # Training data masked values
    roi_mask_values: [0, 255]
    # Date range
    training_start: "1999-01-01"
    training_end: "2001-01-01"
    training_date_format: "%Y-%m-%d"
    # Cache X feature input and y labels for training data image into file?
    cache_training:
```
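The comment about 8 penalties for 8 coefficients follows from the design_matrix formula: one intercept, one slope, and a pair of terms for each of the three harmonics. The sketch below counts them in plain Python. It assumes that each harm(x, n) term expands to a cosine and a sine column with an annual period of 365.25 days; the function name design_row and the column order are illustrative, not YATSM's implementation.

```python
import math


def design_row(ordinal_day, n_harmonics=3, period=365.25):
    """Build one design matrix row: intercept, slope, then a cos/sin pair per harmonic."""
    row = [1.0, float(ordinal_day)]
    w = 2.0 * math.pi / period  # angular frequency of the assumed annual cycle
    for n in range(1, n_harmonics + 1):
        row.append(math.cos(n * w * ordinal_day))
        row.append(math.sin(n * w * ordinal_day))
    return row


# Penalty list copied from the GLMNET_Lasso20 "fit" section of the example
penalties = [1, 0, 1, 1, 1, 1, 1, 1]

row = design_row(730120)  # arbitrary ordinal date chosen for illustration
n_coefficients = len(row)
lengths_agree = n_coefficients == len(penalties)
```

If the formula gains or loses a harmonic, the penalties list needs two more or two fewer entries to stay the same length as the row.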