data.external.drift
UCSD Drift dataset.
- class data.external.drift.DriftDataset(root: str, **kwargs)
Processing logic for the UCSD drift dataset.
Implements the protected _standardize() method to extract, transform, and save the buffer and labels from the raw .dat files downloaded from the UCI repository.
- load(indices: slice = None)
Load drift data and labels to memory from the .pt files saved by _standardize(). Overwrites buffer and descriptors. If specified, selects only the rows at indices.
Parameters:
indices (slice, optional) – Specific indices to load.
- save()
Dump the contents of the buffer and metadata table to disk at destination. This implementation saves the tensors as .pt files.
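The save/load round trip described above can be sketched as follows. This is a minimal illustration and not the DriftDataset implementation: the helper names save_tensors and load_tensors, and the file names buffer.pt and labels.pt, are assumptions made for the example. Only torch.save and torch.load are real PyTorch calls.

```python
import tempfile
from pathlib import Path

import torch


def save_tensors(destination, buffer, labels):
    """Write the two tensors as .pt files (file names are hypothetical)."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    torch.save(buffer, destination / "buffer.pt")
    torch.save(labels, destination / "labels.pt")


def load_tensors(destination, indices=None):
    """Read the tensors back, optionally keeping only the rows at `indices`."""
    destination = Path(destination)
    buffer = torch.load(destination / "buffer.pt")
    labels = torch.load(destination / "labels.pt")
    if indices is not None:
        # Row selection happens after the full file is read in this sketch.
        buffer, labels = buffer[indices], labels[indices]
    return buffer, labels


with tempfile.TemporaryDirectory() as tmp:
    buffer = torch.arange(20, dtype=torch.float32).reshape(5, 4)
    labels = torch.tensor([1, 1, 2, 3, 3])
    save_tensors(tmp, buffer, labels)
    sub_buffer, sub_labels = load_tensors(tmp, indices=slice(1, 3))
```

Note that this sketch reads each file in full before slicing; whether the actual load() avoids that is not stated in this reference.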
- select(conditions: str | list[str], axis: int = 0) -> slice
Selects sample indices by descriptor values, based on a pandas logical statement applied to one or more AxisDescriptor objects aggregated into a dataframe using aggregate_descriptors().
Importantly, these filtering operations can be performed prior to loading any data to memory, as they depend only on the AxisDescriptor objects (labels) attached to this dataset.
Note
Applying selection criteria with pd.eval syntax is simple:
"age > 21" would execute pd.eval("self.table.age > 21", target=df).
"lobe.isin(['F', 'P'])" would execute pd.eval("self.table.lobe.isin(['F', 'P'])", target=df).
Here df is the table attribute of the AxisDescriptor to which the condition is applied.
Warning
This method is general and may eventually be moved to the base Data class.
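As a rough illustration of label-based selection, the sketch below evaluates pandas expression strings against a small descriptor table and returns the matching row positions, without touching any sample data. It is an assumption-laden stand-in, not the library's select(): the function select_rows and the columns batch and gas are hypothetical, it uses DataFrame.eval rather than the pd.eval call quoted above, and it returns an index array instead of a slice.

```python
import numpy as np
import pandas as pd


def select_rows(table: pd.DataFrame, conditions) -> np.ndarray:
    """Return positions of rows that satisfy every condition string."""
    if isinstance(conditions, str):
        conditions = [conditions]
    mask = np.ones(len(table), dtype=bool)
    for cond in conditions:
        # Each condition is evaluated against the table's columns only.
        mask &= table.eval(cond, engine="python").to_numpy(dtype=bool)
    return np.flatnonzero(mask)


# Hypothetical descriptor table: one row of labels per sample.
table = pd.DataFrame({"batch": [1, 1, 2, 3, 3], "gas": [1, 2, 2, 4, 6]})
idx = select_rows(table, ["batch >= 2", "gas < 5"])
```

Because only the label table is consulted, the resulting indices could then be passed on to a loading step so that just the selected rows are kept in memory.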