read in json from hdfs (#27)
* initial commit for reading in json from hdfs

* code is in a function and pydoop added to requirements.txt

* moved read into main.py

* comment out r and d

* remove old file
robertswh authored Jun 13, 2024
1 parent 9a040a1 commit c5d1f5b
Showing 2 changed files with 14 additions and 0 deletions.
13 changes: 13 additions & 0 deletions main.py
@@ -0,0 +1,13 @@
import pandas as pd

from src.utils.hdfs_mods import hdfs_load_json as read_json

# TODO: read from config
folder_path = "/dapsen/workspace_zone/mbs-results/"
file_name = "snapshot-202212-002-2156d36b-e61f-42f1-a0f1-61d1f8568b8e.json"
file_path = folder_path + file_name

snapshot = read_json(file_path)

contributors = pd.DataFrame(snapshot["contributors"])
responses = pd.DataFrame(snapshot["responses"])
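The `# TODO: read from config` note in main.py could be addressed along the following lines. This is a minimal sketch, not the repository's implementation: `build_snapshot_path`, `load_json`, `check_snapshot`, the `opener` parameter and the config keys `folder_path` and `file_name` are all hypothetical names chosen for illustration, and the real `hdfs_load_json` in `src/utils/hdfs_mods` is not shown in this commit. The sketch uses only the standard library and takes the file opener as an argument, so the same logic can be exercised without an HDFS connection.

```python
import io
import json
import posixpath
from typing import Any, Callable, Dict, IO, Iterable


def build_snapshot_path(config: Dict[str, str]) -> str:
    """Join folder and file name taken from a (hypothetical) config mapping."""
    # posixpath.join copes with a folder given with or without a trailing slash,
    # which plain string concatenation does not.
    return posixpath.join(config["folder_path"], config["file_name"])


def load_json(file_path: str, opener: Callable[[str], IO[str]] = open) -> Dict[str, Any]:
    """Read a JSON document through any callable that returns a text file object."""
    # Assumption: an HDFS-aware opener could be passed here instead of the builtin open.
    with opener(file_path) as handle:
        return json.load(handle)


def check_snapshot(
    snapshot: Dict[str, Any],
    required: Iterable[str] = ("contributors", "responses"),
) -> Dict[str, Any]:
    """Fail early if the snapshot lacks the keys main.py turns into DataFrames."""
    missing = [key for key in required if key not in snapshot]
    if missing:
        raise KeyError(f"snapshot is missing keys: {missing}")
    return snapshot


if __name__ == "__main__":
    # Placeholder values; an in-memory file stands in for the HDFS read.
    config = {"folder_path": "/example/folder/", "file_name": "snapshot.json"}
    fake_opener = lambda path: io.StringIO('{"contributors": [], "responses": []}')
    snapshot = check_snapshot(load_json(build_snapshot_path(config), opener=fake_opener))
    print(sorted(snapshot))
```

With a helper of this shape, main.py would only need to load the config mapping and pass the checked snapshot's `contributors` and `responses` entries to `pd.DataFrame` as it does now.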
1 change: 1 addition & 0 deletions requirements.txt
@@ -9,6 +9,7 @@ black
isort
nbstripout
nbqa
#research_and_development==1.0.0
pre_commit_hooks
flake8
pandas==1.1.5