Simulatrex Synthetic Audience Docs

class transforms.autobrew.trft.Subspace.PartitionConfig(group_defining_columns, train_data_defining_columns, partition_at)

Bases: object

Creates a structured schema for subpartition training.

Parameters:
  • group_defining_columns (list) – Names of the columns that define a group member

  • train_data_defining_columns (list[dict]) – Schema of the data columns to train on, i.e. the columns that are not group-defining. These can be question/reply pairs, where the column name is the question and the column value is the reply.

  • partition_at (dict) – An existing column whose values split the members into groups. The column must be listed in group_defining_columns.

Examples

>>> from math import inf
>>> # Defines the group (e.g. demographically split users)
>>> group_defining_columns = ['name', 'gender', 'monthly_income']
>>> # Defines the training data schema
>>> train_data_defining_columns = [{
...     'columnName': 'how much do you spend on skincare products monthly',
...     'dataType': 'string',
...     'valueType': 'enum',
...     'validOptions': ['option a', 'option b', 'option c'],
... }, {
...     'columnName': 'where do you shop for skincare products?',
...     'dataType': 'string',
...     'valueType': 'multipleChoice',
...     'validOptions': ['Brand websites', 'Department stores', 'option c'],
... }, {
...     'columnName': 'tell me something about yourself',
...     'dataType': 'string',
...     'valueType': 'freeText',
... }]
>>> # Divide data into groups
>>> partition_at = {
...     'column': 'monthly_income',
...     'dataType': 'int',
...     'groups': [{
...         'name': 'low_spenders',
...         'lower_bound': 0,
...         'upper_bound': 50,
...     }, {
...         'name': 'mid_spenders',
...         'lower_bound': 51,
...         'upper_bound': 150,
...     }, {
...         'name': 'high_spenders',
...         'lower_bound': 151,
...         'upper_bound': inf,
...     }],
... }
>>> # Groups can be partitioned with a lower and an upper bound (see above).
>>> # String data cannot be compared directly, so each group must list all
>>> # of its members explicitly:
>>> partition_at = {
...     'column': 'monthly_income',
...     'dataType': 'string',
...     'groups': [{
...         'name': 'low_spenders',
...         'members': ['<5€', '5€-25€', '26€ - 50€'],
...     }, {
...         'name': 'mid_spenders',
...         'members': ['51€ - 100€', '100€ - 150€'],
...     }, {
...         'name': 'high_spenders',
...         'members': ['above 150€'],
...     }],
... }
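
The two partition_at shapes above can be read as a rule that maps a single column value to a group name. The following is a minimal sketch of that reading, not the library's implementation: assign_group is a hypothetical helper, and treating both lower_bound and upper_bound as inclusive is an assumption suggested by the 0-50 / 51-150 / 151-inf example.

```python
from math import inf


# Hypothetical helper for illustration; not part of the documented API.
def assign_group(value, partition_at):
    """Return the name of the first group containing `value`, else None."""
    for group in partition_at['groups']:
        if 'members' in group:
            # String data: the value must appear verbatim in 'members'.
            if value in group['members']:
                return group['name']
        elif group['lower_bound'] <= value <= group['upper_bound']:
            # Numeric data: both bounds are assumed to be inclusive.
            return group['name']
    return None


by_range = {
    'column': 'monthly_income',
    'dataType': 'int',
    'groups': [
        {'name': 'low_spenders', 'lower_bound': 0, 'upper_bound': 50},
        {'name': 'mid_spenders', 'lower_bound': 51, 'upper_bound': 150},
        {'name': 'high_spenders', 'lower_bound': 151, 'upper_bound': inf},
    ],
}
by_members = {
    'column': 'monthly_income',
    'dataType': 'string',
    'groups': [
        {'name': 'low_spenders', 'members': ['<5€', '5€-25€', '26€ - 50€']},
        {'name': 'mid_spenders', 'members': ['51€ - 100€', '100€ - 150€']},
        {'name': 'high_spenders', 'members': ['above 150€']},
    ],
}

print(assign_group(120, by_range))              # mid_spenders
print(assign_group('above 150€', by_members))   # high_spenders
```

A value that matches no group yields None in this sketch; how the trainer itself treats such values is not stated here.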

Important

group_defining_columns and partition_at depend on each other. The partition_at column must be present in group_defining_columns, and the definitions in groups must match the data passed to the trainer.
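
The dependency described above can be checked before training. The sketch below is one possible pre-flight check under stated assumptions: find_uncovered_values is a hypothetical helper (not part of the documented API), observed_values stands for the values of the partition column in the training data, and range bounds are assumed to be inclusive.

```python
# Hypothetical pre-flight check; not part of the documented API.
def find_uncovered_values(group_defining_columns, partition_at, observed_values):
    """Return the observed values that no group in partition_at covers.

    Raises ValueError if the partition column is not group-defining.
    """
    column = partition_at['column']
    if column not in group_defining_columns:
        raise ValueError(
            f"partition_at column {column!r} is not in group_defining_columns"
        )

    def covered(value, group):
        if 'members' in group:
            return value in group['members']
        # Assumption: both bounds are inclusive.
        return group['lower_bound'] <= value <= group['upper_bound']

    return [
        value for value in observed_values
        if not any(covered(value, group) for group in partition_at['groups'])
    ]


columns = ['name', 'gender', 'monthly_income']
partition = {
    'column': 'monthly_income',
    'dataType': 'string',
    'groups': [
        {'name': 'low_spenders', 'members': ['<5€', '5€-25€', '26€ - 50€']},
        {'name': 'high_spenders', 'members': ['above 150€']},
    ],
}
print(find_uncovered_values(columns, partition, ['<5€', 'above 150€', 'n/a']))
# ['n/a']
```

An empty result means every observed value falls into some group; anything else points to a mismatch between the groups definition and the data.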

class transforms.autobrew.trft.Subspace.Subspace(config: PartitionConfig, rank: int = 8, via_ranges=False, via_explicit=True)

Bases: object

Divides multiple subgroups into partitions of the trained layer.

create_subspace_partition(via_ranges=False, via_explicit=False) → list

Creates a list of subspace partitions that target layers. The partitions are defined either as layer partition ranges, e.g. [[0, 384], [384, 768]], or as explicitly named partitions, e.g. [[[0,1,3,4]]].

Parameters:
  • via_ranges (bool) – Whether to define the partitions via ranges (see above)

  • via_explicit (bool) – Whether to define the partitions explicitly (see above)

Note

Either via_ranges or via_explicit must be True.
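
To illustrate how the two representations relate, here is a minimal sketch, not the library's code. even_ranges and ranges_to_explicit are hypothetical helpers, and reading each range as half-open [start, stop) is an assumption based on the shared boundary in [[0, 384], [384, 768]]. The sketch does not reproduce the deeper nesting of the explicit example above.

```python
# Hypothetical helpers for illustration; not part of the documented API.
def even_ranges(size, n_groups):
    """Split `size` indices into `n_groups` contiguous [start, stop) ranges."""
    bounds = [i * size // n_groups for i in range(n_groups + 1)]
    return [[bounds[i], bounds[i + 1]] for i in range(n_groups)]


def ranges_to_explicit(ranges):
    """Expand each [start, stop) range into an explicit list of indices."""
    return [list(range(start, stop)) for start, stop in ranges]


print(even_ranges(768, 2))                   # [[0, 384], [384, 768]]
print(ranges_to_explicit([[0, 2], [2, 5]]))  # [[0, 1], [2, 3, 4]]
```

Ranges are compact but can only describe contiguous blocks; explicit index lists are longer but can describe non-contiguous partitions such as [0, 1, 3, 4].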

get_partition(column_name, key='name', groups='groups')

Returns the elements for this named partition

Example

>>> subspace.get_partition('low_spenders')
>>> #[0,1,2,3]
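
One way to picture this lookup is shown below. It is a sketch under stated assumptions rather than the actual implementation: lookup_partition is a hypothetical stand-alone function, and it assumes that the i-th entry of the partition list belongs to the i-th group in partition_at. The key and groups arguments mirror the defaults in the signature above.

```python
# Hypothetical stand-alone version of the lookup; not the library's code.
def lookup_partition(partitions, partition_at, column_name,
                     key='name', groups='groups'):
    """Return the partition whose group has `column_name` under `key`.

    Assumes partitions[i] corresponds to partition_at[groups][i].
    Raises ValueError if no group matches.
    """
    names = [group[key] for group in partition_at[groups]]
    return partitions[names.index(column_name)]


partition_at = {
    'column': 'monthly_income',
    'groups': [
        {'name': 'low_spenders'},
        {'name': 'mid_spenders'},
        {'name': 'high_spenders'},
    ],
}
partitions = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

print(lookup_partition(partitions, partition_at, 'low_spenders'))
# [0, 1, 2, 3]
```
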
mix_subspaces()