Error Description

When trying to reverse_transform categorical data using rdt.transformers.OneHotEncodingTransformer with more than 500 categories, memory usage increases drastically.
Steps to reproduce
In [1]: import pandas as pd

In [2]: data = pd.DataFrame({
   ...:     'categorical': [str(x) for x in range(600)] * 10
   ...: })

In [3]: from rdt.hyper_transformer import HyperTransformer

In [4]: from rdt.transformers import OneHotEncodingTransformer

In [5]: ht = HyperTransformer(transformers={'categorical': OneHotEncodingTransformer()})
In [6]: ht.fit(data)
In [7]: transform=ht.transform(data)
In [8]: reverse=ht.reverse_transform(transform)
The attached memory profile shows memory consumption increasing sharply while reverse_transform runs.
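For readers without the screenshot, the growth can also be measured directly. Below is a minimal, hypothetical sketch (not part of the original report) that profiles the column-popping pattern with tracemalloc on a synthetic wide frame:

```python
import tracemalloc

import numpy as np
import pandas as pd


def peak_memory_popping_columns(n_columns=600, n_rows=6000):
    """Pop every column of a wide one-hot-style frame and return the
    peak memory (in bytes) traced while doing so."""
    # Synthetic one-hot frame: each row has a single 1.0.
    df = pd.DataFrame(
        np.eye(n_columns)[np.arange(n_rows) % n_columns],
        columns=[str(i) for i in range(n_columns)],
    )
    tracemalloc.start()
    # The pattern under discussion: one pop() call per one-hot column.
    popped = [df.pop(column) for column in list(df.columns)]
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak
```

Running this with a few hundred columns makes the peak visible without rdt installed at all; the numbers will vary by pandas version.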
Cause

The main cause of this memory increase is that pandas.DataFrame.pop duplicates the data frame view and increases the memory used by the application instead of reducing it. Popping more than 500 columns therefore creates 500+ views of the data frame, which drives memory usage up drastically.
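To illustrate the difference, here is a hypothetical sketch (made-up column names, not rdt's actual code) contrasting the memory-hungry pop-per-column pattern with reading the frame once as a single array:

```python
import numpy as np
import pandas as pd


def decode_one_hot_with_pop(df, categories):
    """Problematic pattern: pop each one-hot column in a loop.

    Every pop() forces pandas to rewrite the frame's internals, so
    with 500+ columns this repeatedly copies data."""
    data = df.copy()
    columns = [data.pop(f'value{i}') for i in range(len(categories))]
    indexes = np.argmax(np.column_stack(columns), axis=1)
    return pd.Series([categories[i] for i in indexes])


def decode_one_hot_vectorized(df, categories):
    """Alternative: convert to a single ndarray and take the argmax,
    touching the DataFrame only once."""
    indexes = np.argmax(df.to_numpy(), axis=1)
    return pd.Series([categories[i] for i in indexes])
```

Both functions return the same decoded values; the vectorized version simply avoids creating one intermediate frame per category.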
This issue was originally raised on sdv-dev/SDV#304.