faker-pyspark is a PySpark DataFrame and Schema (StructType) provider for the Faker Python package.
faker-pyspark provides PySpark-based fake data for testing purposes. "Fake" in this context really means "random": the data may look real, but I make no claims about its accuracy, so do not use it as real data!
Install with pip:
pip install faker-pyspark
Add as a provider to your Faker instance:
from faker import Faker
from faker_pyspark import PySparkProvider
fake = Faker()
fake.add_provider(PySparkProvider)
Generate a DataFrame, a schema, and related test data:
>>> df = fake.pyspark_dataframe()
>>> schema = fake.pyspark_schema()
>>> df_updated = fake.pyspark_update_dataframe(df)
>>> column_names = fake.pyspark_column_names()
>>> data = fake.pyspark_data_dict_using_schema(schema)
>>> data = fake.pyspark_data_dict()
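A minimal sketch of inspecting a generated schema, for example inside a test. `make_fake` and `summarize_schema` are hypothetical helper names, not part of faker-pyspark, and the sketch assumes `pyspark_schema()` returns a standard `pyspark.sql.types.StructType`, as the description above suggests.

```python
def make_fake():
    """Return a Faker instance with the PySpark provider registered."""
    # Imported lazily so the helper below can be used (and tested)
    # even where faker, pyspark, or faker-pyspark is not installed.
    from faker import Faker
    from faker_pyspark import PySparkProvider

    fake = Faker()
    fake.add_provider(PySparkProvider)
    return fake


def summarize_schema(schema):
    """Return (column name, type string) pairs for a StructType-like schema.

    Relies only on the public StructType interface: `schema.fields`,
    plus `name` and `dataType.simpleString()` on each field.
    """
    return [(field.name, field.dataType.simpleString()) for field in schema.fields]
```

With faker, pyspark, and faker-pyspark installed, `summarize_schema(make_fake().pyspark_schema())` should return one pair per generated column. Expect the columns to differ between runs, since the data is random.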
Use from the Faker command line:
$ faker pyspark_schema -i faker_pyspark
$ faker pyspark_dataframe -i faker_pyspark
$ faker pyspark_column_names -i faker_pyspark
$ faker pyspark_data_dict -i faker_pyspark