mbi.Dataset

class mbi.Dataset(data: ArrayLike | dict[str, ArrayLike], domain: Domain, weights: ndarray | None = None)

Bases: object

Create a Dataset object.

Parameters:
  • data – a NumPy array of shape (n, d), or a dictionary of 1-D arrays of length n keyed by attribute name.

  • domain – a Domain object describing the attributes of data.

  • weights – optional weight for each row; defaults to None.
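The two layouts accepted for data carry the same information. As a rough illustration in plain Python (no mbi or NumPy), the sketch below converts an n x d table of rows into a dictionary of length-n columns keyed by attribute. The helper name rows_to_columns and the attribute names are hypothetical and not part of mbi.

```python
def rows_to_columns(rows, attrs):
    """Convert an n x d table (list of rows) into {attribute: column of length n}.

    Hypothetical helper for illustration only; it is not part of mbi.
    """
    if any(len(row) != len(attrs) for row in rows):
        raise ValueError("every row needs exactly one value per attribute")
    return {attr: [row[j] for row in rows] for j, attr in enumerate(attrs)}


rows = [[0, 1], [1, 0], [1, 2]]  # n = 3 records, d = 2 attributes
columns = rows_to_columns(rows, ["a", "b"])
```

Either form describes the same three records; the dictionary form simply makes the attribute names explicit.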

Methods

__init__

Create a Dataset object.

compress

Compresses the dataset by mapping domain elements to a smaller domain.

datavector

Return the dataset in vector-of-counts form.

decompress

Decompresses the dataset by reversing the mapping.

drop

Returns a new Dataset with the specified columns removed.

load

Load data into a Dataset object.

project

Project the dataset onto a subset of columns.

supports

synthetic

Generate synthetic data conforming to the given domain.

to_dict

Attributes

df

records

Returns the number of records (rows) in the dataset.

to_dict() → dict[str, ndarray]

property df

static synthetic(domain: Domain, N: int) → Dataset

Generate synthetic data conforming to the given domain.

Parameters:
  • domain – the Domain object

  • N – the number of individuals (rows) to generate

static load(path: str, domain: str | Domain) → Dataset

Load data into a Dataset object.

Parameters:
  • path – path to the CSV file

  • domain – path to a JSON file encoding the domain information, or a Domain object

project(cols: int | str | Sequence[str] | Sequence[int]) → Factor

Project the dataset onto a subset of columns.
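To make the idea of projection concrete, here is a minimal sketch in plain Python that keeps only the requested columns and the matching domain entries. It is not mbi's implementation: the function project_columns, the dict-based column and domain representations, and the restriction to column names (no integer indices) are all assumptions made for illustration.

```python
def project_columns(columns, domain, cols):
    """Keep only `cols` from a {attribute: values} table and its {attribute: size} domain.

    Hypothetical stand-in for illustration; mbi works with its own Dataset and Domain types.
    """
    if isinstance(cols, str):
        cols = [cols]  # accept a single column name as well as a sequence
    missing = [c for c in cols if c not in columns]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    projected = {c: columns[c] for c in cols}
    projected_domain = {c: domain[c] for c in cols}
    return projected, projected_domain


columns = {"a": [0, 1, 1], "b": [1, 0, 2], "c": [0, 0, 1]}
domain = {"a": 2, "b": 3, "c": 2}
projected, projected_domain = project_columns(columns, domain, ["a", "c"])
```

The number of records is unchanged by projection; only the set of attributes shrinks.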

supports(cols: str | Sequence[str]) → bool

drop(cols: Sequence[str]) → Factor

Returns a new Dataset with the specified columns removed.

property records: int

Returns the number of records (rows) in the dataset.

datavector(flatten: bool = True) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

Return the dataset in vector-of-counts form.
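A vector of counts has one entry per possible combination of attribute values, holding the number (or total weight) of records with that combination. The plain-Python sketch below illustrates the idea under two assumptions that this page does not confirm: cells are ordered row-major (the last attribute varies fastest), and each record contributes its weight to its cell. The name count_vector and the dict-based inputs are hypothetical.

```python
def count_vector(columns, domain, weights=None):
    """Flattened histogram over the full domain (illustrative, not mbi's code).

    columns: {attribute: list of integer codes}, domain: {attribute: size}.
    Assumes row-major cell ordering and additive per-record weights.
    """
    attrs = list(domain)
    sizes = [domain[a] for a in attrs]
    total = 1
    for size in sizes:
        total *= size
    n = len(columns[attrs[0]])
    if weights is None:
        weights = [1.0] * n  # unweighted: every record counts once
    counts = [0.0] * total
    for r in range(n):
        index = 0
        for attr, size in zip(attrs, sizes):
            index = index * size + columns[attr][r]  # row-major flat index
        counts[index] += weights[r]
    return counts


columns = {"a": [0, 1, 1], "b": [1, 0, 2]}
domain = {"a": 2, "b": 3}
vector = count_vector(columns, domain)
```

The vector has 2 x 3 = 6 entries, and its entries sum to the number of records when no weights are given.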

compress(mapping: dict[str, ndarray]) → Dataset

Compresses the dataset by mapping domain elements to a smaller domain.

Parameters:
  • mapping – a dictionary whose keys are attribute names and whose values are 1-D arrays; mapping[attr][i] gives the new value for original value i.

Returns:

A new Dataset with transformed values and updated domain.
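The lookup rule mapping[attr][i] can be illustrated without mbi. The sketch below applies such a mapping to a dict of columns; the function compress_columns is hypothetical, and deriving the new domain size as the largest mapped value plus one is an assumption, not something this page states.

```python
def compress_columns(columns, domain, mapping):
    """Relabel values so that original value i becomes mapping[attr][i].

    Illustrative only. Attributes absent from `mapping` are left unchanged.
    """
    new_columns = dict(columns)
    new_domain = dict(domain)
    for attr, lookup in mapping.items():
        new_columns[attr] = [lookup[value] for value in columns[attr]]
        # Assumption: the compressed domain size is the largest new value plus one.
        new_domain[attr] = max(lookup) + 1
    return new_columns, new_domain


columns = {"age": [0, 1, 2, 3, 3]}
domain = {"age": 4}
mapping = {"age": [0, 0, 1, 1]}  # merge values {0, 1} -> 0 and {2, 3} -> 1
small_columns, small_domain = compress_columns(columns, domain, mapping)
```

Here four original age codes collapse into two, so the records keep their count but lose the distinction between merged values.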

decompress(mapping: dict[str, ndarray]) → Dataset

Decompresses the dataset by reversing the mapping. Because compression can send several original values to the same new value, the reverse mapping is one-to-many; each original value is sampled uniformly at random from the candidates.

Parameters:
  • mapping – the same mapping dictionary used for compression.

Returns:

A new Dataset with restored domain size and sampled values.
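The uniform sampling step can be sketched in plain Python as follows. This is an illustration of the described behavior rather than mbi's implementation: decompress_columns, the dict-based columns, and the explicit rng argument are hypothetical.

```python
import random


def decompress_columns(columns, mapping, rng):
    """Replace each compressed value by a uniformly sampled original value.

    Illustrative only. `rng` is a random.Random instance so results can be seeded.
    """
    restored = dict(columns)
    for attr, lookup in mapping.items():
        # Invert the lookup: compressed value -> list of original values mapped to it.
        preimage = {}
        for original, new in enumerate(lookup):
            preimage.setdefault(new, []).append(original)
        restored[attr] = [rng.choice(preimage[value]) for value in columns[attr]]
    return restored


mapping = {"age": [0, 0, 1, 1]}
compressed = {"age": [0, 0, 1, 1, 1]}
restored = decompress_columns(compressed, mapping, random.Random(0))
```

Re-applying the mapping to the restored values always gives back the compressed values, but the restored values generally differ from the data that was originally compressed, because the merged distinctions cannot be recovered.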