GitHub - jadesrochers/censusdata: Data pipeline from census sources to mongodb

Census Data Pipeline

This library pull data direct from census and related sources regarding income/population data. The data is transformed using some shell/sed commands, and then streamed into MongoDB databases.

How to use it

While all of the downloading, processing, and inserting into the database is done automatically, it does not create a database. You need a MongoDB instance running on localhost:27017. Also, the password/username for that db must be specified in mongouser.txt and mongopass.txt files in the top level of the project.
If you have that set up correctly, then the data populating can be called using these scripts:

./Census_Scripts/INC_process.sh 
./Census_Scripts/POP_PST_process.sh
./Census_BEA_scripts/BEA_Process.sh

Changing the variables or database layout

Editing the .sh scripts would allow downloading data sets other than the ones I have selected.

## This determines what raw data is being downloaded
curl -O https://www2.census.gov/library/publications/2011/compendia/usa-counties/zip/POP.zip
curl -O  https://www2.census.gov/library/publications/2011/compendia/usa-counties/zip/PST.zip

The sed files show my variable name translations. The Raw data uses variable codes from the census bureau, which I change to the full names so that I know what I am working with using these sed scripts.

## To change the file editing, look to the headeredit_sed and geoidedit_sed files.
sed -r -i.back -f Census_Scripts/POP_headeredit_sed.txt Census_POP_2011/POP01.csv.[0-9]
sed -r -i.oldgeoids -f Census_Scripts/POP_geoidedit_sed.txt Census_POP_2011/POP01.csv.[0-9]

Then, the .sh scripts call the ./csv_todb.js script with the variables to be put in the database and which collection to put them in, which could also be edited.

## This could be edited or called directly if you wanted to put different data
## into the db than I am doing by default.
node csv_todb.js --keyvars "GEO_ID,Areaname" --db 'census_income_data' --collection 'income_summary' --vars 'Areaname,GEO_ID,Median_household_income_1979,Median_household_income_1989,Median_household_income_1999,Median_household_income_2009,Mean_household_income_2009' 'Census_INC_2011/INC01.csv.0'

This grabs only a small amount of data overall

The US Census bureau has many, many more data sets than these. These same scripts should work for many of them, but would require meaningful modification becase I have not configured it to get all possible datasets, just a few I was looking at.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
Census_BEA_scripts		Census_BEA_scripts
Census_Scripts		Census_Scripts
.gitignore		.gitignore
README.md		README.md
csv_todb.js		csv_todb.js
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Census Data Pipeline

How to use it

Changing the variables or database layout

This grabs only a small amount of data overall

About

Releases

Packages

Languages

jadesrochers/censusdata

Folders and files

Latest commit

History

Repository files navigation

Census Data Pipeline

How to use it

Changing the variables or database layout

This grabs only a small amount of data overall

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages