mbi.Dataset
- class mbi.Dataset(data: ArrayLike | dict[str, ArrayLike], domain: Domain, weights: ndarray | None = None)[source]
Bases: object
Create a Dataset object.
- Parameters:
data – a numpy array (n x d) or a dictionary of 1d arrays (length n), keyed by attribute.
domain – a domain object
weights – weight for each row (optional; defaults to None)
Methods
__init__ – Create a Dataset object.
compress – Compresses the dataset by mapping domain elements to a smaller domain.
datavector – Return the database in vector-of-counts form.
decompress – Decompresses the dataset by reversing the mapping.
Returns a new Dataset with the specified columns removed.
load – Load data into a dataset object.
project – Project the dataset onto a subset of columns.
synthetic – Generate synthetic data conforming to the given domain.
Attributes
records – Returns the number of records (rows) in the dataset.
- property df
- static synthetic(domain: Domain, N: int) Dataset[source]
Generate synthetic data conforming to the given domain
- Parameters:
domain – The domain object
N – the number of individuals
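As a rough illustration of what "conforming to the given domain" means, the sketch below draws N records in plain Python. It is not the library's implementation: the domain is represented as a hypothetical dict mapping attribute names to the number of possible values, the helper name synthetic_records is made up for this example, and uniform sampling is an assumption.

```python
import random


def synthetic_records(domain, n, seed=None):
    """Draw n records; every value of attribute `a` lies in range(domain[a]).

    `domain` is a plain dict {attribute: number of values} (an assumption
    made for this sketch, not the mbi Domain class).
    """
    rng = random.Random(seed)
    return {
        attr: [rng.randrange(size) for _ in range(n)]
        for attr, size in domain.items()
    }


domain = {"age": 5, "sex": 2}
data = synthetic_records(domain, 100, seed=0)
```

Each column has length N, and every value is a valid index into its attribute's domain.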
- static load(path: str, domain: str | Domain) Dataset[source]
Load data into a dataset object
- Parameters:
path – path to a CSV file
domain – path to a JSON file encoding the domain information, or a Domain object
- project(cols: int | str | Sequence[str] | Sequence[int]) Factor[source]
Project the dataset onto a subset of columns.
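Conceptually, projecting onto a subset of columns keeps only those columns and counts how often each combination of their values occurs. The sketch below shows that idea in plain Python; the helper name project_counts and the dict-of-lists data layout are assumptions for illustration, and it returns a Counter rather than the Factor object that the documented method returns.

```python
from collections import Counter


def project_counts(data, cols):
    """Count each combination of values over the selected columns.

    `data` is a dict of equal-length lists keyed by attribute name
    (a stand-in for the real Dataset).
    """
    if isinstance(cols, str):
        cols = [cols]  # accept a single column name as well
    rows = zip(*(data[c] for c in cols))
    return Counter(rows)


data = {
    "age": [0, 1, 1, 2],
    "sex": [0, 1, 1, 0],
    "income": [3, 3, 0, 1],
}
marginal = project_counts(data, ["age", "sex"])
```

Here the "income" column is ignored, and the counts over ("age", "sex") sum to the number of records.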
- property records: int
Returns the number of records (rows) in the dataset.
- datavector(flatten: bool = True) ndarray[tuple[Any, ...], dtype[_ScalarT]][source]
Return the database in vector-of-counts form.
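The vector-of-counts form has one entry per cell of the full domain, holding the number of records that fall in that cell. The following is a minimal sketch of that representation, assuming a dict-based domain and row-major (C-order) flattening; both are assumptions of this example rather than documented behaviour, and count_vector is a hypothetical helper name.

```python
def count_vector(data, domain):
    """Return a flat list with one count per cell of the full domain.

    `domain` is a dict {attribute: number of values}; cells are ordered
    row-major over the attributes in dict order (an assumption).
    """
    attrs = list(domain)
    sizes = [domain[a] for a in attrs]

    total = 1
    for size in sizes:
        total *= size
    counts = [0] * total

    for row in zip(*(data[a] for a in attrs)):
        index = 0
        for value, size in zip(row, sizes):
            index = index * size + value  # row-major flat index
        counts[index] += 1
    return counts


domain = {"a": 2, "b": 3}
data = {"a": [0, 1, 1, 1], "b": [2, 0, 0, 1]}
vec = count_vector(data, domain)  # [0, 0, 1, 2, 1, 0]
```

The vector has 2 x 3 = 6 entries and sums to the number of records.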
- compress(mapping: dict[str, ndarray]) Dataset[source]
Compresses the dataset by mapping domain elements to a smaller domain.
- Parameters:
mapping – A dictionary where keys are attribute names and values are 1D arrays. mapping[attr][i] gives the new value for original value i.
- Returns:
A new Dataset with transformed values and updated domain.
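The mapping semantics described above (mapping[attr][i] gives the new value for original value i) can be sketched in plain Python as follows. The dict-based data and domain, the helper name compress_sketch, and the rule that the new domain size is the largest mapped value plus one are all assumptions made for illustration.

```python
def compress_sketch(data, domain, mapping):
    """Apply mapping[attr][i] -> new value to every mapped column.

    Returns new (data, domain) dicts; unmapped attributes are unchanged.
    """
    new_data = dict(data)
    new_domain = dict(domain)
    for attr, table in mapping.items():
        new_data[attr] = [table[v] for v in data[attr]]
        new_domain[attr] = max(table) + 1  # assumed size of the smaller domain
    return new_data, new_domain


domain = {"age": 5, "sex": 2}
data = {"age": [0, 1, 2, 3, 4, 4], "sex": [0, 1, 0, 1, 0, 1]}
mapping = {"age": [0, 0, 1, 1, 2]}  # merge 5 age values into 3 groups

small_data, small_domain = compress_sketch(data, domain, mapping)
```

In this example original ages 0 and 1 collapse to 0, ages 2 and 3 collapse to 1, and age 4 becomes 2, so the "age" domain shrinks from 5 to 3 while "sex" is untouched.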
- decompress(mapping: dict[str, ndarray]) Dataset[source]
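Decompresses the dataset by reversing the mapping. Because the mapping may send several original values to the same compressed value (it is surjective but generally not injective), the reverse mapping is one-to-many, so each original value is sampled uniformly from the possible candidates.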
- Parameters:
mapping – The same mapping dictionary used for compression.
- Returns:
A new Dataset with restored domain size and sampled values.
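The uniform sampling over possible original values can be illustrated with a short plain-Python sketch. It assumes dict-based data and domain stand-ins, and decompress_sketch is a hypothetical helper, not the library's code.

```python
import random


def decompress_sketch(data, domain, mapping, seed=None):
    """Replace each compressed value by a uniformly sampled original value."""
    rng = random.Random(seed)
    new_data = dict(data)
    new_domain = dict(domain)
    for attr, table in mapping.items():
        # Invert the mapping: compressed value -> list of original values.
        preimage = {}
        for original, compressed in enumerate(table):
            preimage.setdefault(compressed, []).append(original)
        new_data[attr] = [rng.choice(preimage[v]) for v in data[attr]]
        new_domain[attr] = len(table)  # restored domain size
    return new_data, new_domain


mapping = {"age": [0, 0, 1, 1, 2]}
compressed = {"age": [0, 1, 2, 2]}
restored, restored_domain = decompress_sketch(
    compressed, {"age": 3}, mapping, seed=0
)
```

Whatever values are sampled, compressing the restored column with the same mapping gives back the compressed column. The original records themselves are not recovered exactly unless the mapping is one-to-one.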