This project provides a tool to download, process, and interactively explore statistics about public packages using the Libraries.io API. It fetches data about all public packages associated with Statistics Norway and presents the results in an interactive table format.
If you're just interested in the processed results, visit the GitHub Pages deployment.
- Package Data Fetching: Fetches data about all public packages from Libraries.io.
- Interactive Table: Displays package data in a dynamic, searchable, and sortable table using Tabulator.js.
- CSV Download: Allows users to download the dataset as a CSV file for offline use.
- DuckDB Integration: Easily query and sample the data in the DuckDB Web Shell for further analysis.
To fetch and process data using the Libraries.io API:
- Libraries.io API Key:
  - You'll need a valid API key from Libraries.io. You can sign up for one on the Libraries.io website.
  - Add your API key to the appropriate part of the data-fetching script.
- Python Environment:
  - Install the required Python dependencies:
    pip install pandas requests
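For the API key step above, one option is to read the key from an environment variable rather than writing it into the script. The sketch below is only an illustration: the variable name `LIBRARIES_IO_API_KEY` and the helper `load_api_key` are hypothetical, and fetch_data.py may expect the key in a different place.

```python
import os


def load_api_key(env_var: str = "LIBRARIES_IO_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch;
    fetch_data.py may read the key from somewhere else.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Libraries.io API key."
        )
    return key
```

Keeping the key out of the source file makes it harder to commit it to a public repository by accident.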
Run the data-fetching script to download the package data:
python fetch_data.py
The script will:
- Fetch all public packages associated with Statistics Norway from Libraries.io.
- Save the results as results.csv in the src/ directory.
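The two steps above (fetch every page of results, then write them to a CSV file) might look roughly like the sketch below. It is not the actual fetch_data.py: it uses only the standard library instead of pandas and requests so it runs without extra installs, the `/github/{login}/projects` endpoint and the login `statisticsnorway` are assumptions about how the packages are looked up, and the function names are hypothetical.

```python
import csv
import json
import urllib.parse
import urllib.request

API_ROOT = "https://libraries.io/api"


def fetch_page(api_key, login, page, per_page=100):
    """Fetch one page of packages for a GitHub login (assumed endpoint)."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "page": page, "per_page": per_page}
    )
    url = f"{API_ROOT}/github/{login}/projects?{query}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_all(get_page, max_pages=1000):
    """Call get_page(1), get_page(2), ... until a page comes back empty."""
    rows = []
    for page in range(1, max_pages + 1):
        batch = get_page(page)
        if not batch:
            break
        rows.extend(batch)
    return rows


def save_csv(rows, path):
    """Write a list of dicts to CSV; the header is the union of all keys.

    Missing keys become empty cells. Nested values (lists, dicts) are
    written in their string form, so flatten them first if needed.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Example wiring (performs network requests; the login is an assumption):
# rows = fetch_all(lambda page: fetch_page("YOUR_API_KEY", "statisticsnorway", page))
# save_csv(rows, "src/results.csv")
```

Passing the page fetcher in as a function keeps the pagination loop easy to test without touching the network.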
- Open index.html in your browser to view the interactive table.
- Use the "Download CSV" button to save the data for offline use.
- Use the "Open in DuckDB Web Shell" button to query the dataset directly in the DuckDB Web Shell.
If you don't want to fetch and process the data yourself, you can access the processed results directly on the GitHub Pages deployment.
The DuckDB Web Shell button includes a query to:
- Load the dataset into a table called ssb_packages.
- Sample 10 random rows from the table.
The SQL query used:
-- Load CSV file and create a table
CREATE TABLE ssb_packages AS
SELECT *
FROM read_csv_auto('https://trygu.github.io/ssb-pypi-statistics/results.csv');
-- Sample 10 rows from the table
FROM ssb_packages USING SAMPLE 10;
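For readers who have downloaded the CSV and want a similar random sample without DuckDB, the sampling step can be approximated in plain Python. This is a minimal sketch; the helper name `sample_rows` is hypothetical, and it loads the whole file into memory, which is reasonable only for a dataset of modest size.

```python
import csv
import random


def sample_rows(csv_path, k=10, seed=None):
    """Return up to k randomly chosen rows from a CSV file as dicts."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    rng = random.Random(seed)  # pass a seed for a reproducible sample
    return rng.sample(rows, min(k, len(rows)))
```

Calling `sample_rows("results.csv")` returns a list of at most 10 dictionaries keyed by the CSV column names.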
- Ensure the API key is correctly configured in the fetch_data.py script before running it.
- The data viewer (index.html) is designed to use a preprocessed results.csv. Modify the DuckDB query URL in the HTML if hosting the dataset elsewhere.
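When hosting the dataset elsewhere, the query text needs the new CSV location. The helper below is a hypothetical sketch that regenerates the same load-and-sample SQL for any URL; it only builds the SQL string, and how index.html embeds that string in the DuckDB Web Shell link is not covered here.

```python
def build_duckdb_query(csv_url, table="ssb_packages", sample_size=10):
    """Return the load-and-sample SQL for a CSV hosted at csv_url."""
    if not table.isidentifier():
        raise ValueError(f"Unsafe table name: {table!r}")
    escaped_url = csv_url.replace("'", "''")  # escape single quotes for SQL
    size = int(sample_size)
    return (
        "-- Load CSV file and create a table\n"
        f"CREATE TABLE {table} AS\n"
        "SELECT *\n"
        f"FROM read_csv_auto('{escaped_url}');\n"
        f"-- Sample {size} rows from the table\n"
        f"FROM {table} USING SAMPLE {size};"
    )
```

With the default arguments and the project's own CSV URL, the output matches the query listed earlier in this README.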
This project is licensed under the MIT License.