Parquet workshop using the End of Term Web Archive
- Leveraging Parquet Files for Efficient Web Archive Collection Analytics
- An Introduction to Parquet File Format for Data Analytics
The datasets for this tutorial are available at the following URLs.
- EOTNL.cdxj.gz (206 MB)
- EOTNL.parquet (200 MB)
Download both files and place them in the tutorials
folder so that the interactive tutorials work as expected.
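Before opening the tutorials, it can help to confirm that both files ended up in the right place. The following is a minimal sketch using only the Python standard library. It assumes the script is run from the directory that contains the tutorials folder. The helper name missing_datasets is hypothetical and not part of the workshop materials.

```python
from pathlib import Path

# File names as listed above; the folder name "tutorials" comes from the text.
EXPECTED_FILES = ("EOTNL.cdxj.gz", "EOTNL.parquet")


def missing_datasets(folder="tutorials", expected=EXPECTED_FILES):
    """Return the expected dataset file names that are not present in folder."""
    base = Path(folder)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing from tutorials folder:", ", ".join(missing))
    else:
        print("Both dataset files are in place.")
```

The check only looks at file names. It does not verify file sizes or contents, so an interrupted download would still pass.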
If you have questions or comments about this workshop, please feel free to contact us:
- Sawood Alam ([email protected])
- Mark Phillips ([email protected])