Skip to content

Commit

Permalink
Merge pull request #1 from shawndeggans/feature/setup
Browse files Browse the repository at this point in the history
Checking in for first test
  • Loading branch information
shawndeggans authored Sep 23, 2024
2 parents f33f1df + b7e7c02 commit 3e51a99
Show file tree
Hide file tree
Showing 5 changed files with 216 additions and 0 deletions.
48 changes: 48 additions & 0 deletions .github/workflows/docs_generation.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
name: Generate Docs

on:
push:
branches:
- main
paths:
- 'notebooks/**'

jobs:
convert-and-publish:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2

- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.x'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
- name: Convert notebooks to markdown
run: |
for notebook in notebooks/*.ipynb; do
jupyter nbconvert --to markdown "$notebook" --output-dir docs/
done
- name: Update index page
run: |
echo "## Available Documentation:" >> docs/index.md
echo "" >> docs/index.md
for file in docs/*.md; do
if [ "$(basename "$file")" != "index.md" ]; then
echo "- [$(basename "$file" .md)]($(basename "$file"))" >> docs/index.md
fi
done
- name: Commit and push changes
run: |
git config --local user.email "[email protected]"
git config --local user.name "GitHub Action"
git add docs
git commit -m "Update documentation" || echo "No changes to commit"
git push
3 changes: 3 additions & 0 deletions docs/_config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
title: notebook_docs
description: a repo to demonstrate how to create GitHub pages documentation from Jupyter notebooks
theme: jekyll-theme-cayman
12 changes: 12 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
---
layout: default
title: Home
---

## Welcome to Notebook Documentation

This site contains documentation for our jupyter notebooks.

## Available Documentation

(This list will be automatically populated)
151 changes: 151 additions & 0 deletions notebooks/testnb.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Import cleaning process\n",
"\n",
"The following notebook is part of our import cleaning process.\n",
"This notebook accomplishes the following:\n",
"- Imports a CSV file\n",
"- Removes extra columns\n",
"- Converts strings to correct data types\n",
"- Saves in the cleansed directory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Pip install\n",
"%pip install pandas\n",
"\n",
"# Here we would import libraries\n",
"import sys\n",
"import pandas as pd\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import file from raw data folder\n",
"Here we import the file from the raw folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Pseudo code for opening a file, importing a CSV, and loading it into pandas\n",
"\n",
"# Define the file path\n",
"file_path = 'path/to/your/csvfile.csv'\n",
"\n",
"# Use pandas to read the CSV file\n",
"df = pd.read_csv(file_path)\n",
"\n",
"# Display the first few rows of the dataframe\n",
"print(df.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Removes extra columns\n",
"Removing the address and phone fields"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# BEGIN: Remove extra columns\n",
"\n",
"# List of columns to remove\n",
"columns_to_remove = ['address', 'phone']\n",
"\n",
"# Remove the specified columns\n",
"df_cleaned = df.drop(columns=columns_to_remove)\n",
"\n",
"# Display the first few rows of the cleaned dataframe\n",
"print(df_cleaned.head())\n",
"\n",
"# END: Remove extra columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set data types\n",
"Sets the correct datatypes for date and identity fields."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Pseudo code for setting data types\n",
"\n",
"# Convert the 'date_field' to datetime\n",
"df_cleaned['date_field'] = pd.to_datetime(df_cleaned['date_field'])\n",
"\n",
"# Convert the 'identity_field' to numeric (integer)\n",
"df_cleaned['identity_field'] = pd.to_numeric(df_cleaned['identity_field'], errors='coerce')\n",
"\n",
"# Display the data types of the dataframe to verify changes\n",
"print(df_cleaned.dtypes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save new data to a cleansed directory\n",
"Write the cleansed data from Pandas to a new CSV file in the cleansed folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Pseudo code to write the cleansed data to a new CSV file in the cleansed folder\n",
"\n",
"# Define the output file path\n",
"output_file_path = 'path/to/cleansed/folder/cleansed_data.csv'\n",
"\n",
"# Use pandas to write the dataframe to a CSV file\n",
"df_cleaned.to_csv(output_file_path, index=False)\n",
"\n",
"# Confirm the file has been written\n",
"print(f\"Cleansed data has been written to {output_file_path}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
2 changes: 2 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
nbconvert
jupyter

0 comments on commit 3e51a99

Please sign in to comment.