yatsm.algorithms.yatsm module

Yet Another TimeSeries Model baseclass

class yatsm.algorithms.yatsm.YATSM(test_indices=None, estimator={'object': Lasso(alpha=20), 'fit': {}}, **kwargs)

Bases: object

Yet Another TimeSeries Model baseclass

Note

When YATSM objects are fit, the intended order of method calls is:

  1. Set up the model with setup()

  2. Preprocess a time series for one unit area with preprocess()

  3. Fit the time series with the YATSM model using fit()

  4. A fitted model can be used to

    • Predict on additional design matrices with predict()
    • Plot diagnostic information with plot()
    • Return goodness of fit diagnostic metrics with score()

Note

Record structured arrays must contain the following:

  • start (int): starting dates of timeseries segments
  • end (int): ending dates of timeseries segments
  • break (int): break dates of timeseries segments
  • coef (double (n x p shape)): coefficient matrix (number of bands x number of features) used for predictions
  • rmse (double (n length)): Root Mean Squared Error for each band
  • px (int): pixel X coordinate
  • py (int): pixel Y coordinate
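The record layout above maps naturally onto a NumPy structured dtype. The following is a minimal sketch that follows the field list in this note; the concrete dtypes (`i4`, `f8`) and the `(n_series, n_features)` ordering of `coef` are assumptions read off the descriptions above rather than taken from the YATSM source, and `record_dtype` is a hypothetical helper name.

```python
import numpy as np


def record_dtype(n_series, n_features):
    """Hypothetical helper: structured dtype with the documented record fields."""
    return np.dtype([
        ("start", "i4"),                         # starting date of the segment
        ("end", "i4"),                           # ending date of the segment
        ("break", "i4"),                         # break date of the segment
        ("coef", "f8", (n_series, n_features)),  # bands x features coefficients
        ("rmse", "f8", (n_series,)),             # one RMSE value per band
        ("px", "i4"),                            # pixel X coordinate
        ("py", "i4"),                            # pixel Y coordinate
    ])


# A record holding two segments (n_record = 2) for a 3-band, 4-feature model
record = np.zeros(2, dtype=record_dtype(n_series=3, n_features=4))
record["start"] = [730000, 731000]
record["end"] = [730900, 732000]
print(record["coef"].shape)  # one (3, 4) coefficient matrix per segment
print(record["rmse"].shape)  # one length-3 RMSE vector per segment
```

Each row of the array is one time series segment, so `n_record` corresponds to the length of the array.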
Parameters:
  • test_indices (numpy.ndarray) – Test for changes with these indices of Y. If not provided, all series in Y will be used as test indices.
  • estimator (dict) – dictionary containing the scikit-learn estimator object ('object') used to fit and predict time series and, optionally, a dict of options passed to the estimator's fit method ('fit') (default: {'object': Lasso(alpha=20), 'fit': {}})
  • kwargs (dict) – dictionary of additional keyword arguments (for sub-classes)
Variables:
  • record_template (numpy.ndarray) – An empty NumPy structured array that is a template for the model’s record
  • models (numpy.ndarray) – prediction model objects
  • record (numpy.ndarray) – NumPy structured array containing timeseries model attribute information
  • n_record (int) – number of recorded segments in time series model
  • n_series (int) – number of bands in Y
  • px (int) – pixel X location or index
  • n_features (int) – number of coefficients in X design matrix
  • py (int) – pixel Y location or index
fit(X, Y, dates)

Fit timeseries model

Parameters:
  • X (numpy.ndarray) – design matrix (number of observations x number of features)
  • Y (numpy.ndarray) – dependent variable matrix (number of series x number of observations)
  • dates (numpy.ndarray) – ordinal dates for each observation in X/Y
Returns:

NumPy structured array containing timeseries model attribute information

Return type:

numpy.ndarray

fit_models(X, Y, bands=None)

Fit timeseries models for bands within Y for a given X

Updates or initializes fit for self.models

Parameters:
  • X (numpy.ndarray) – design matrix (number of observations x number of features)
  • Y (numpy.ndarray) – dependent variable matrix (number of series x number of observations), with one column per observation (row) in the X design matrix
  • bands (iterable) – Subset of bands of Y to fit. If None, all bands in Y are fit
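A minimal sketch of fitting one model per band, using the `{'object': ..., 'fit': {...}}` estimator dictionary described in the constructor parameters. `OLSEstimator` is a hypothetical stand-in for a scikit-learn estimator such as `Lasso`, and `fit_models_sketch` is not YATSM's implementation; it only illustrates copying the estimator once per band and forwarding the `'fit'` options. Returning a dict keyed by band is also an assumption made for readability.

```python
import copy

import numpy as np


class OLSEstimator:
    """Hypothetical stand-in exposing the scikit-learn fit/predict interface."""

    def fit(self, X, y):
        # Ordinary least squares solution of X @ coef = y
        self.coef_ = np.linalg.lstsq(X, y, rcond=None)[0]
        return self

    def predict(self, X):
        return X @ self.coef_


def fit_models_sketch(estimator, X, Y, bands=None):
    """Fit an independent copy of estimator['object'] to each selected band of Y."""
    if bands is None:
        bands = range(Y.shape[0])  # default: every band (row) of Y
    models = {}
    for b in bands:
        model = copy.deepcopy(estimator["object"])  # fresh, unfitted copy per band
        model.fit(X, Y[b, :], **estimator.get("fit", {}))
        models[b] = model
    return models


X = np.column_stack([np.ones(50), np.arange(50.0)])  # intercept + linear trend
true_coef = np.array([[1.0, 0.5], [2.0, -0.1]])         # 2 bands x 2 features
Y = true_coef @ X.T                                     # (n_series x n_obs)
models = fit_models_sketch({"object": OLSEstimator(), "fit": {}}, X, Y)
```

Deep-copying the estimator keeps each band's fitted state separate, which matters because scikit-learn estimators store their coefficients on the object itself.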
plot(X, Y, dates, **config)

Plot the timeseries model results

Parameters:
  • X (numpy.ndarray) – design matrix (number of observations x number of features)
  • Y (numpy.ndarray) – dependent variable matrix (number of series x number of observations)
  • dates (numpy.ndarray) – ordinal dates for each observation in X/Y
  • config (dict) – YATSM configuration dictionary from user, including ‘dataset’ and ‘YATSM’ sub-configurations
predict(X, dates, series=None)

Return a 2D NumPy array of y-hat predictions for a given X

Predictions are made from an ensemble of timeseries models: the value for each date is predicted using the model from the timeseries segment that intersects that date.

Parameters:
  • X (numpy.ndarray) – Design matrix (number of observations x number of features)
  • dates (int or numpy.ndarray) – A single ordinal date or a np.ndarray of length X.shape[0] specifying the ordinal dates for each prediction
  • series (iterable, optional) – Return prediction for subset of series within timeseries model. If None is provided, returns predictions from all series
Returns:

Prediction for given X (number of series x number of observations)

Return type:

numpy.ndarray
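The segment-wise prediction described above can be sketched as follows: for each date, find the record segment whose `start`/`end` interval contains it and apply that segment's coefficients to the matching row of X. This assumes `coef` is stored as series x features (as in the record note), that dates outside every segment should yield NaN, and that a simplified record with only `start`, `end`, and `coef` is enough for illustration. `predict_sketch` is a hypothetical name, and YATSM's own handling of gaps between segments may differ.

```python
import numpy as np


def predict_sketch(record, X, dates):
    """Return (n_series x n_obs) predictions using the segment covering each date."""
    # Accept a single ordinal date or one date per row of X
    dates = np.broadcast_to(np.asarray(dates), (X.shape[0],))
    n_series = record["coef"].shape[1]
    yhat = np.full((n_series, X.shape[0]), np.nan)  # NaN where no segment applies
    for seg in record:
        in_seg = (dates >= seg["start"]) & (dates <= seg["end"])
        # seg["coef"] is (n_series, n_features); X[in_seg] is (n_in_seg, n_features)
        yhat[:, in_seg] = seg["coef"] @ X[in_seg].T
    return yhat


# Simplified two-segment record for a single series with features [intercept, t]
record = np.zeros(2, dtype=[("start", "i4"), ("end", "i4"), ("coef", "f8", (1, 2))])
record["start"] = [0, 10]
record["end"] = [9, 19]
record["coef"][0] = [[1.0, 0.0]]  # first segment: constant value of 1
record["coef"][1] = [[0.0, 1.0]]  # second segment: y = t
t = np.arange(25.0)
X = np.column_stack([np.ones_like(t), t])
yhat = predict_sketch(record, X, t)
```

Dates 20 through 24 fall outside both segments in this example, so their predictions stay NaN.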

preprocess(X, Y, dates, min_values=None, max_values=None, mask_band=None, mask_values=None, **kwargs)

Preprocess a unit area of data (e.g., pixel, segment, etc.)

This preprocessing step removes all observations that either fall outside the minimum/maximum range of the data or are flagged for masking in the mask_band of Y. If either min_values or max_values is not specified, the range check is skipped. Similarly, masking based on a QA/QC or cloud mask is not performed if either mask_band or mask_values is not provided.

Parameters:
  • X (numpy.ndarray) – design matrix (number of observations x number of features)
  • Y (numpy.ndarray) – dependent variable matrix (number of series x number of observations)
  • dates (numpy.ndarray) – ordinal dates for each observation in X/Y
  • min_values (np.ndarray) – Minimum valid value for each variable in Y (optional)
  • max_values (np.ndarray) – Maximum valid value for each variable in Y (optional)
  • mask_band (int) – The mask band in Y (optional)
  • mask_values (sequence) – A list or np.ndarray of values in the mask_band to mask (optional)
Returns:

X, Y, and dates after being preprocessed and masked

Return type:

tuple (np.ndarray, np.ndarray, np.ndarray)
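A minimal sketch of the masking logic described above. It assumes `min_values`/`max_values` hold one bound per row of Y, that an observation is kept only if every band is within range, and that the mask check is a membership test with `np.isin`. These details, and the name `preprocess_sketch`, are illustrative rather than taken from YATSM.

```python
import numpy as np


def preprocess_sketch(X, Y, dates, min_values=None, max_values=None,
                      mask_band=None, mask_values=None):
    """Drop out-of-range or masked observations from X, Y, and dates."""
    valid = np.ones(Y.shape[1], dtype=bool)
    # Range check: only applied when both bounds are provided
    if min_values is not None and max_values is not None:
        lo = np.asarray(min_values)[:, np.newaxis]
        hi = np.asarray(max_values)[:, np.newaxis]
        valid &= np.all((Y >= lo) & (Y <= hi), axis=0)
    # QA/QC or cloud mask: only applied when both band and values are provided
    if mask_band is not None and mask_values is not None:
        valid &= ~np.isin(Y[mask_band, :], mask_values)
    return X[valid, :], Y[:, valid], dates[valid]


Y = np.array([[100, 200, -50, 300],   # data band: -50 is below the minimum
              [0, 0, 0, 255]])         # mask band: 255 is flagged
X = np.ones((4, 1))
dates = np.array([730000, 730016, 730032, 730048])
X2, Y2, d2 = preprocess_sketch(X, Y, dates,
                               min_values=[0, 0], max_values=[10000, 255],
                               mask_band=1, mask_values=[255])
```

Here the third observation fails the range check and the fourth is removed by the mask, leaving the first two dates.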

score(X, Y, dates)

Return timeseries model performance scores

Parameters:
  • X (numpy.ndarray) – design matrix (number of observations x number of features)
  • Y (numpy.ndarray) – dependent variable matrix (number of series x number of observations)
  • dates (numpy.ndarray) – ordinal dates for each observation in X/Y
Returns:

performance summary statistics

Return type:

namedtuple
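The documented return type is a namedtuple of performance summary statistics, but its fields are not listed here. The sketch below assumes a hypothetical `Scores` namedtuple with per-series RMSE and coefficient of determination (R²), computed from observed values `Y` and predictions `Yhat`; the field names and the choice of statistics are illustrative, not YATSM's.

```python
from collections import namedtuple

import numpy as np

# Hypothetical field names; the statistics returned by YATSM may differ
Scores = namedtuple("Scores", ["rmse", "r2"])


def score_sketch(Y, Yhat):
    """Per-series RMSE and R^2 for (n_series x n_obs) observed/predicted arrays."""
    resid = Y - Yhat
    rmse = np.sqrt(np.mean(resid ** 2, axis=1))
    ss_res = np.sum(resid ** 2, axis=1)
    ss_tot = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return Scores(rmse=rmse, r2=1.0 - ss_res / ss_tot)


Y = np.array([[1.0, 2.0, 3.0, 4.0]])
Yhat = np.array([[1.0, 2.0, 3.0, 5.0]])
s = score_sketch(Y, Yhat)
```

A namedtuple keeps the statistics accessible by name (`s.rmse`) while still unpacking like a plain tuple.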

setup(df, **config)

Set up model for input dataset and (optionally) return design matrix

Parameters:
  • df (pandas.DataFrame) – Pandas dataframe containing dataset attributes (e.g., dates, image ID, path/row, metadata, etc.)
  • config (dict) – YATSM configuration dictionary from user, including ‘dataset’ and ‘YATSM’ sub-configurations
Returns:

design matrix, if used by the algorithm

Return type:

numpy.ndarray or None
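As a hedged illustration of the design matrix setup() may return, the sketch below builds a common choice for seasonal time series models from ordinal dates: an intercept, a linear time trend, and one or more annual harmonics (cosine/sine pairs). The column layout, the 365.25-day period, and the name `design_matrix_sketch` are assumptions; the actual design matrix is determined by the user's YATSM configuration.

```python
import numpy as np


def design_matrix_sketch(dates, n_harmonics=1, period=365.25):
    """Build an (n_obs x n_features) design matrix from ordinal dates."""
    dates = np.asarray(dates, dtype=float)
    w = 2 * np.pi / period  # angular frequency of the annual cycle
    columns = [np.ones_like(dates), dates]  # intercept and linear trend
    for k in range(1, n_harmonics + 1):
        columns += [np.cos(k * w * dates), np.sin(k * w * dates)]
    return np.column_stack(columns)


X = design_matrix_sketch([730000, 730016, 730032])
```

Each harmonic adds two columns, so `n_features` is `2 + 2 * n_harmonics` under this layout.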

record_template

YATSM record template for features in X and series in Y

Record template will set px and py if defined as class attributes. Otherwise px and py coordinates will default to 0.

Returns:

NumPy structured array containing a template of a YATSM record

Return type:

numpy.ndarray
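The px/py defaulting behavior described above can be sketched as a property that reads the coordinates with `getattr` and falls back to 0. `RecordTemplateSketch` and `LocatedModel` are hypothetical classes, and the field dtypes are assumptions based on the record fields listed in the class notes.

```python
import numpy as np


class RecordTemplateSketch:
    """Hypothetical class mirroring the documented record_template behavior."""

    def __init__(self, n_series, n_features):
        self.n_series = n_series
        self.n_features = n_features

    @property
    def record_template(self):
        template = np.zeros(1, dtype=[
            ("start", "i4"), ("end", "i4"), ("break", "i4"),
            ("coef", "f8", (self.n_series, self.n_features)),
            ("rmse", "f8", (self.n_series,)),
            ("px", "i4"), ("py", "i4"),
        ])
        # Use px/py when defined on the class (or instance), otherwise default to 0
        template["px"] = getattr(self, "px", 0)
        template["py"] = getattr(self, "py", 0)
        return template


class LocatedModel(RecordTemplateSketch):
    px, py = 15, 42  # class attributes picked up by the property


template = RecordTemplateSketch(n_series=2, n_features=3).record_template
located = LocatedModel(n_series=2, n_features=3).record_template
```

Because `getattr` also finds class attributes, a subclass that defines `px` and `py` gets those coordinates in every template it produces.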