yatsm.classifiers.diagnostics module

class yatsm.classifiers.diagnostics.SpatialKFold(y, row, col, n_folds=3, shuffle=False, random_state=None)[source]

Bases: object

Spatial cross validation iterator

Training samples located next to test samples are likely to be strongly correlated with them because of spatial autocorrelation. This violation of independence artificially inflates cross-validated measures of algorithm performance.

Provides indices that split the data into training and testing sets. Splits a “Region of Interest” image into k consecutive folds, where k is n_folds. Each fold is used as the validation set once, while the remaining k - 1 folds form the training set.

Parameters:
  • y – Labeled features
  • row – Row (y) pixel location for each y
  • col – Column (x) pixel location for each y
  • n_folds – Number of folds (default: 3)
  • shuffle – Shuffle the unique training data regions before splitting into batches (default: False)
  • random_state – Pseudo-random number generator to use for random sampling. If None, the numpy RNG is used for shuffling (default: None)
shuffle = False
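
The idea of keeping neighbouring samples in the same fold can be sketched with the standard library alone. This is an illustration of the concept, not YATSM's implementation: the helper names `label_regions` and `spatial_kfold` are hypothetical, regions are assumed to be groups of 4-connected pixels, each (row, col) pair is assumed to be unique, and `seed` is a plain integer for Python's `random` module rather than the numpy generator described above.

```python
import random
from collections import deque


def label_regions(row, col):
    """Give every sample a region id; 4-connected neighbours share an id."""
    index_of = {(r, c): i for i, (r, c) in enumerate(zip(row, col))}
    region = [-1] * len(row)
    n_regions = 0
    for start in range(len(row)):
        if region[start] != -1:
            continue  # already reached from an earlier sample
        region[start] = n_regions
        queue = deque([start])
        while queue:
            i = queue.popleft()
            r, c = row[i], col[i]
            for neighbour in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                j = index_of.get(neighbour)
                if j is not None and region[j] == -1:
                    region[j] = n_regions
                    queue.append(j)
        n_regions += 1
    return region, n_regions


def spatial_kfold(row, col, n_folds=3, shuffle=False, seed=None):
    """Yield (train, test) index lists; a region never straddles both."""
    region, n_regions = label_regions(row, col)
    if n_folds > n_regions:
        raise ValueError("n_folds cannot exceed the number of regions")
    region_ids = list(range(n_regions))
    if shuffle:
        # Shuffle whole regions, not individual samples.
        random.Random(seed).shuffle(region_ids)
    start = 0
    for fold in range(n_folds):
        # Near-equal consecutive chunks: the first folds absorb the remainder.
        size = n_regions // n_folds + (1 if fold < n_regions % n_folds else 0)
        held_out = set(region_ids[start:start + size])
        test = [i for i, g in enumerate(region) if g in held_out]
        train = [i for i, g in enumerate(region) if g not in held_out]
        yield train, test
        start += size


# Three separate clumps of pixels, so three regions.
row = [0, 0, 1, 5, 5, 9, 9, 9]
col = [0, 1, 0, 5, 6, 0, 1, 2]
folds = list(spatial_kfold(row, col, n_folds=3))
```

Because whole regions are held out together, no test sample in this sketch has a 4-connected training neighbour, which is the property the class description is concerned with.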
class yatsm.classifiers.diagnostics.SpatialKFold_ROI(roi, n_folds=3, mask_values=[0], shuffle=False, random_state=None)[source]

Bases: object

Spatial cross validation iterator on ROI images

Training samples located next to test samples are likely to be strongly correlated with them because of spatial autocorrelation. This violation of independence artificially inflates cross-validated measures of algorithm performance.

Provides indices that split the data into training and testing sets. Splits a “Region of Interest” image into k consecutive folds, where k is n_folds. Each fold is used as the validation set once, while the remaining k - 1 folds form the training set.

Parameters:
  • roi – “Region of interest” matrix providing training data samples of some class
  • n_folds – Number of folds (default: 3)
  • mask_values – one or more values within roi to ignore from sampling (default: [0])
  • shuffle – Shuffle the unique training data regions before splitting into batches (default: False)
  • random_state – Pseudo-random number generator to use for random sampling. If None, the numpy RNG is used for shuffling (default: None)
shuffle = False
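
One way to read the roi and mask_values parameters is that every pixel whose value is not masked becomes a labeled sample with a row and column location. The sketch below shows that reading with nested lists; `roi_to_samples` is a hypothetical helper written for this illustration, not a function provided by YATSM, and the small `roi` matrix is made-up example data.

```python
def roi_to_samples(roi, mask_values=(0,)):
    """Return (y, row, col) lists for every ROI pixel not in mask_values."""
    masked = set(mask_values)
    y, row, col = [], [], []
    for r, line in enumerate(roi):
        for c, value in enumerate(line):
            if value not in masked:
                y.append(value)   # class label stored in the ROI pixel
                row.append(r)     # row (y) pixel location
                col.append(c)     # column (x) pixel location
    return y, row, col


# 0 marks pixels with no training data; 1, 2 and 3 are class labels.
roi = [
    [1, 1, 0, 0],
    [0, 0, 0, 2],
    [3, 0, 0, 2],
]
y, row, col = roi_to_samples(roi)
```

Under this reading, the three lists correspond to the y, row and col arguments of SpatialKFold.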
yatsm.classifiers.diagnostics.kfold_scores(X, y, algo, kf_generator)[source]

Performs k-fold cross-validation and reports the mean and standard deviation of the scores

Parameters:
  • X – feature input used in classification
  • y – labeled examples
  • algo – scikit-learn classifier to evaluate
  • kf_generator – generator of the training and testing indices used in cross-validation
Returns: mean and standard deviation of cross-validation scores

Return type: (mean, std)
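
The scoring loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `kfold_scores_sketch` and `MajorityClassifier` are hypothetical names, the classifier is a toy stand-in that only mimics the scikit-learn `fit`/`score` convention, the folds are hand-written index pairs, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev


class MajorityClassifier:
    """Hypothetical stand-in exposing scikit-learn style fit() and score()."""

    def fit(self, X, y):
        # Remember the most frequent training label.
        self.label_ = max(set(y), key=y.count)
        return self

    def score(self, X, y):
        # Accuracy of always predicting the remembered label.
        return sum(1 for label in y if label == self.label_) / len(y)


def kfold_scores_sketch(X, y, algo, kf_generator):
    """Fit on each training split, score on its test split, summarize."""
    scores = []
    for train, test in kf_generator:
        algo.fit([X[i] for i in train], [y[i] for i in train])
        scores.append(algo.score([X[i] for i in test], [y[i] for i in test]))
    # Population standard deviation is an assumption made for this sketch.
    return mean(scores), pstdev(scores)


X = [[0.1], [0.2], [0.3], [0.9], [0.8], [0.7]]
y = [0, 0, 0, 1, 0, 0]
folds = [([2, 3, 4, 5], [0, 1]), ([0, 1, 4, 5], [2, 3]), ([0, 1, 2, 3], [4, 5])]
score_mean, score_std = kfold_scores_sketch(X, y, MajorityClassifier(), folds)
```

Any iterable of (train, test) index pairs can play the role of kf_generator in this sketch, including a spatial iterator that keeps neighbouring samples in the same fold.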