data.external.drift

UCSD Drift dataset.

class data.external.drift.DriftDataset(root: str, **kwargs)

Processing logic for the UCSD drift dataset.

Implements the protected _standardize() to extract, transform, and save buffer and labels from the raw .dat files downloaded from the UCI repository.

load(indices: slice = None)

Load drift data and labels into memory from the .pt files saved by _standardize(). Overwrites buffer and descriptors. If indices is given, only the rows it selects are loaded.

Parameters:

indices (slice, optional) – Specific indices to load.

save()

Dump the contents of the buffer and metadata table to disk at destination. This implementation saves the tensors as .pt files.
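The save/load round trip through .pt files can be illustrated with a minimal sketch. It is not the class's actual implementation: the helper names save_tensors and load_tensors and the file names buffer.pt and labels.pt are assumptions made for the example, and only torch.save and torch.load are real library calls.

```python
import os
import tempfile

import torch


def save_tensors(destination, buffer, labels):
    # Hypothetical helper: write each tensor to its own .pt file.
    os.makedirs(destination, exist_ok=True)
    torch.save(buffer, os.path.join(destination, "buffer.pt"))
    torch.save(labels, os.path.join(destination, "labels.pt"))


def load_tensors(destination, indices=None):
    # Hypothetical helper: read the tensors back, optionally keeping
    # only the rows selected by a slice.
    buffer = torch.load(os.path.join(destination, "buffer.pt"))
    labels = torch.load(os.path.join(destination, "labels.pt"))
    if indices is not None:
        buffer, labels = buffer[indices], labels[indices]
    return buffer, labels


with tempfile.TemporaryDirectory() as tmp:
    # Toy data standing in for the drift measurements: 6 samples, 2 features.
    buffer = torch.arange(12, dtype=torch.float32).reshape(6, 2)
    labels = torch.arange(6)
    save_tensors(tmp, buffer, labels)
    sub_buffer, sub_labels = load_tensors(tmp, slice(1, 4))
```

Note that this sketch reads the full file before slicing; whether load() avoids that is not stated here.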

select(conditions: str | list[str], axis: int = 0) → slice

Selects sample indices by descriptor values based on a pandas logical statement applied to one or more AxisDescriptor objects, aggregated into a dataframe using aggregate_descriptors().

Importantly, these filtering operations can be performed prior to loading any data to memory, as they only depend on AxisDescriptor objects (labels) attached to this dataset.

Note

Applying selection criteria with pd.eval syntax is simple:

  • "age > 21" would execute pd.eval("self.table.age > 21", target=df).

  • "lobe.isin(['F', 'P'])" would execute pd.eval("self.table.lobe.isin(['F', 'P'])", target=df).

Here, df is the table attribute of the AxisDescriptor to which the condition is applied.
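To show how such condition strings behave, here is a small standalone sketch that evaluates them against a plain pandas DataFrame with DataFrame.eval. The table contents, the helper rows_matching, and the choice to combine several conditions with a logical AND are all assumptions made for illustration; select() itself may evaluate and combine conditions differently.

```python
import numpy as np
import pandas as pd

# Hypothetical descriptor table, standing in for AxisDescriptor.table.
df = pd.DataFrame({
    "age": [19, 25, 34, 21],
    "lobe": ["F", "O", "P", "T"],
})


def rows_matching(table, conditions):
    # Accept a single condition string or a list of them.
    if isinstance(conditions, str):
        conditions = [conditions]
    mask = np.ones(len(table), dtype=bool)
    for cond in conditions:
        # Each condition is evaluated with the table's columns in scope.
        # Assumption: multiple conditions are combined with AND.
        mask &= table.eval(cond, engine="python").to_numpy(dtype=bool)
    # Return the positions of the rows that satisfy every condition.
    return np.flatnonzero(mask)


older = rows_matching(df, "age > 21")
frontal_or_parietal = rows_matching(df, "lobe.isin(['F', 'P'])")
both = rows_matching(df, ["age > 21", "lobe.isin(['F', 'P'])"])
```

Because only the small descriptor table is touched, this kind of filtering costs nothing in terms of loading the underlying data.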

Warning

This method is general and may eventually be moved to the base Data class.