The new version of SDV allows me to change and update RDT transformers to my liking. For PII columns, I'd like to use pseudo-anonymization instead of full anonymization.
Whenever I use the PseudoAnonymizedFaker, the synthesizer crashes with a KeyError on the address column when I try to sample.
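In broad terms, full anonymization gives each row an independent fake value, while pseudo-anonymization keeps a consistent mapping so that repeated real values always map to the same fake value, and the mapping can be inverted later. The sketch below is a hypothetical, stdlib-only illustration of that idea. The `PseudoAnonymizer` class and the random 8-letter tokens (standing in for a Faker `address` provider) are assumptions made for illustration, not RDT's implementation of `PseudoAnonymizedFaker`.

```python
import random
import string


class PseudoAnonymizer:
    """Hypothetical illustration of pseudo-anonymization (not RDT's code).

    Each distinct real value gets one fake value, and the mapping is stored
    so repeated real values always map to the same fake value.
    """

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.mapping = {}

    def _fake_value(self):
        # Assumption: a random 8-letter token stands in for a Faker provider.
        return ''.join(self._rng.choice(string.ascii_uppercase) for _ in range(8))

    def fit(self, values):
        used = set(self.mapping.values())
        for value in values:
            if value in self.mapping:
                continue
            fake = self._fake_value()
            while fake in used:  # keep the mapping one-to-one
                fake = self._fake_value()
            used.add(fake)
            self.mapping[value] = fake
        return self

    def transform(self, values):
        # Unseen values raise KeyError, since only fitted values have a mapping.
        return [self.mapping[value] for value in values]

    def reverse_mapping(self):
        # The mapping is one-to-one, so it can be inverted to recover originals.
        return {fake: real for real, fake in self.mapping.items()}


addresses = ['12 Main St', '4 Oak Ave', '12 Main St']
anonymizer = PseudoAnonymizer(seed=42).fit(addresses)
masked = anonymizer.transform(addresses)
print(masked)  # first and third entries are identical; the second differs
```

With full anonymization, the two occurrences of `12 Main St` would generally receive unrelated fake values; keeping that consistency is the reason for preferring pseudo-anonymization here.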
Steps to reproduce
(Note that due to #1206, we have to update the metadata for the address column.)
from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer
from rdt.transformers.pii import PseudoAnonymizedFaker

data, metadata = download_demo(
    modality='single_table',
    dataset_name='student_placements_pii')
# due to issue #1206, we need to update the metadata
metadata.update_column(
    column_name='address',
    sdtype='address',
    pii=True
)
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.auto_assign_transformers(data)

# update address to use pseudo-anonymization
address_transformer = PseudoAnonymizedFaker(provider_name='address', function_name='address')
synthesizer.update_transformers(column_name_to_transformer={
    'address': address_transformer
})
synthesizer.fit(data)
synthesizer.sample(1)
Stack Trace
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
KeyError: 'address'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/sdv/single_table/base.py in _sample_with_progress_bar(self, num_rows, max_tries_per_batch, batch_size, output_file_path, conditions, show_progress_bar)
713 progress_bar.set_description('Sampling rows')
--> 714 sampled = self._sample_in_batches(
715 num_rows=num_rows,
/usr/local/lib/python3.8/dist-packages/sdv/single_table/base.py in _sample_in_batches(self, num_rows, batch_size, max_tries_per_batch, conditions, transformed_conditions, float_rtol, progress_bar, output_file_path)
638 for step in range(math.ceil(num_rows / batch_size)):
--> 639 sampled_rows = self._sample_batch(
640 batch_size=batch_size,
/usr/local/lib/python3.8/dist-packages/sdv/single_table/base.py in _sample_batch(self, batch_size, max_tries, conditions, transformed_conditions, float_rtol, progress_bar, output_file_path)
571 prev_num_valid = num_valid
--> 572 sampled, num_valid = self._sample_rows(
573 num_rows_to_sample,
/usr/local/lib/python3.8/dist-packages/sdv/single_table/base.py in _sample_rows(self, num_rows, conditions, transformed_conditions, float_rtol, previous_rows)
496
--> 497 sampled = self._data_processor.reverse_transform(sampled)
498
/usr/local/lib/python3.8/dist-packages/sdv/data_processing/data_processor.py in reverse_transform(self, data, reset_keys)
672 elif column_name in self._keys:
--> 673 column_data = generated_keys[column_name]
674 else:
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in __getitem__(self, key)
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
KeyError: 'address'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-36-3c01bb2cc760> in <module>
----> 1 synthesizer.sample(1)
/usr/local/lib/python3.8/dist-packages/sdv/single_table/utils.py in handle_sampling_error(is_tmp_file, output_file_path, sampling_error)
78
79 if error_msg:
---> 80 raise type(sampling_error)(error_msg + '\n' + str(sampling_error))
81
82 raise sampling_error
KeyError: "Error: Sampling terminated. Partial results are stored in a temporary file: .sample.csv.temp. This file will be overridden the next time you sample. Please rename the file if you wish to save these results.\n'address'"
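One reading of the trace: inside `reverse_transform`, the `address` column takes the `elif column_name in self._keys:` branch, so the data processor appears to treat it as a key column, and `generated_keys[column_name]` then fails because no values were generated for `address`. The stdlib sketch below reproduces only that lookup pattern. The function name `reverse_transform_columns`, the preceding `if` branch, and the sample data are assumptions for illustration, not SDV's actual code.

```python
def reverse_transform_columns(column_order, key_columns, sampled, generated_keys):
    """Hypothetical simplification of the failing branch in the trace (not SDV's code).

    Assumption: columns produced by the model are read from `sampled`; columns
    registered as keys are read from `generated_keys`, mirroring the traced line
    `column_data = generated_keys[column_name]`.
    """
    output = {}
    for column_name in column_order:
        if column_name in sampled:
            output[column_name] = sampled[column_name]
        elif column_name in key_columns:
            # Raises KeyError if the column is registered as a key
            # but no key values were generated for it.
            output[column_name] = generated_keys[column_name]
        else:
            raise ValueError(f'no data for column {column_name!r}')
    return output


# Hypothetical state: 'address' is registered as a key, but the model did not
# sample it and no key values were generated for it.
sampled = {'gender': ['F']}
generated_keys = {'student_id': [0]}
key_columns = {'student_id', 'address'}

try:
    reverse_transform_columns(['student_id', 'gender', 'address'], key_columns, sampled, generated_keys)
except KeyError as err:
    print(err)  # 'address'
```

If this reading is right, the question for the fix is why assigning `PseudoAnonymizedFaker` to `address` leads the processor to register it among its keys.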