Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
remove pgloader, load directly into postgres using ./reparse.sh
Showing 5 changed files with 64 additions and 81 deletions.
@@ -1,14 +1,9 @@
#!/bin/bash -e

# createdb stabilization 2>/dev/null || :

export PGPASSWORD=docker4data
export PGUSER=postgres
export PGHOST=localhost
export PGPORT=54321
export PGDATABASE=postgres

# need to have pgloader installed
pgloader pgloader.load

psql -f cross-tab-rs-counts.sql
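This script passes no connection flags to `psql`; it relies on libpq's standard environment variables (`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE`), which any libpq client reads when the corresponding options are omitted. The sketch below is a hypothetical helper, not part of this repository, that makes the implicit target visible by printing it as a connection URI (password left out). It assumes `localhost` when `PGHOST` is unset; libpq itself defaults to a Unix-domain socket on most Unix systems.

```shell
#!/bin/bash
# Hypothetical helper (not in this repository): print the server/database that
# psql will connect to when it falls back to libpq environment variables.
# Port 5432, user = OS user and dbname = user follow libpq's defaults;
# "localhost" for an unset PGHOST is an assumption (libpq would use a socket).
pg_target() {
  local host="${PGHOST:-localhost}"
  local port="${PGPORT:-5432}"
  local user="${PGUSER:-$USER}"
  local db="${PGDATABASE:-$user}"
  # PGPASSWORD is deliberately not printed.
  printf 'postgresql://%s@%s:%s/%s\n' "$user" "$host" "$port" "$db"
}
```

With the exports above in effect, `pg_target` prints `postgresql://postgres@localhost:54321/postgres`, i.e. the non-default port 54321 rather than 5432.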
This file was deleted.
@@ -1,4 +1,50 @@
#!/bin/bash

source .env/bin/activate
time python parse.py data/ >data/rawdata.csv 2>data/rawdata.log &

export PGPASSWORD=docker4data
export PGUSER=postgres
export PGHOST=localhost
export PGPORT=54321
export PGDATABASE=postgres

psql -c 'drop table if exists rawdata cascade;'
psql -c 'create table rawdata (
bbl bigint,
activityThrough DATE,
section TEXT,
key TEXT,
dueDate DATE,
activityDate DATE,
value TEXT,
meta TEXT,
apts TEXT
);'
psql -c 'drop table if exists rgb cascade;'
psql -c 'create table rgb (
source VARCHAR,
borough SMALLINT,
year INT,
add_421a INT,
add_421g INT,
add_420c INT,
add_j51 INT,
add_ML_buyout INT,
add_loft INT,
add_former_control REAL,
sub_high_rent_income INT,
sub_high_rent_vacancy INT,
sub_coop_condo_conversion INT,
sub_421a_expiration INT,
sub_j51_expiration INT,
sub_substantial_rehab INT,
sub_commercial_prof_conversion INT,
sub_other INT,
total_sub INT,
total_add REAL,
inflated VARCHAR,
net REAL
);'
time cat data/rgb.csv | psql -c "COPY rgb FROM stdin WITH CSV HEADER NULL '' QUOTE'\"';"

time python parse.py data/ 2>data/rawdata.log | psql -c "COPY rawdata FROM stdin WITH CSV HEADER NULL '' QUOTE '\"';"
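The new script streams `parse.py`'s output straight into `COPY ... FROM stdin` instead of staging it in `data/rawdata.csv`. One caveat with that pattern: without `set -o pipefail`, a pipeline's exit status is that of its last command, so if the producer dies part-way (its traceback going to `data/rawdata.log`), `psql` sees end-of-input and can still report success after loading whatever rows arrived. A minimal sketch, with hypothetical stand-ins for `python parse.py` and `psql` so it runs without a database:

```shell
#!/bin/bash
# Minimal sketch of the `producer | psql -c "COPY ... FROM stdin"` pattern.
# fake_parse and fake_copy are hypothetical stand-ins for
# `python parse.py data/` and `psql -c "COPY rawdata FROM stdin ..."`.

fake_parse() {
  # Emit a header and one row, then exit non-zero, like a parser crashing mid-run.
  printf 'bbl,key,value\n'
  printf '1000010001,rent,100\n'
  return 1
}

fake_copy() {
  # Read stdin until EOF (as COPY FROM stdin does), skip the header line,
  # and report how many data rows arrived.
  local rows
  rows=$(tail -n +2 | wc -l)
  echo "loaded ${rows// /} rows"
}

run_load() {
  # $1 is "on" or "off": whether pipefail is enabled for this run.
  # The subshell keeps the option from leaking into the caller.
  (
    if [ "$1" = on ]; then set -o pipefail; else set +o pipefail; fi
    fake_parse | fake_copy
  )
}
```

`run_load off` prints `loaded 1 rows` and returns 0, so the truncated load looks successful; `run_load on` prints the same line but returns 1. In `reparse.sh` the equivalent change would be a `set -o pipefail` near the top (the first script already opts into `-e` through its `#!/bin/bash -e` shebang).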
48995c8
@talos, wondering what's the motivation behind dropping pgloader...
FYI, there's talk of using it for CKAN: ckan/ideas#150
@jqnatividad `pgloader` is great, but it's unnecessary if the CSV being imported is of perfect quality. In this case, I'm generating the CSV in Python and can ensure it's high quality, so I may as well use Postgres's `COPY` directly. If you're feeding in large CSVs from external sources, `pgloader` is great. Here, it's an unnecessary dependency.
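The argument for plain `COPY` rests on the CSV being trustworthy before it reaches Postgres. Note that with `CSV HEADER`, `COPY FROM` simply discards the first line without comparing it to the table's columns (PostgreSQL 15 added a `HEADER MATCH` option for that). A cheap guard, sketched here as a hypothetical helper and assuming the header uses plain column names with no embedded commas or quotes, is to check the header against the expected column list before loading:

```shell
#!/bin/bash
# Hypothetical pre-flight check (not in this repository): verify that a CSV's
# header line lists exactly the expected columns, in order, before running COPY.
# Assumes column names contain no commas or quotes (no CSV quoting in the header).
check_header() {
  local file="$1"; shift
  local expected actual
  # Join the remaining arguments (expected column names) with commas.
  expected=$(IFS=,; echo "$*")
  # Read only the first line; strip a trailing carriage return from CRLF files.
  IFS= read -r actual < "$file"
  actual="${actual%$'\r'}"
  if [ "$actual" != "$expected" ]; then
    echo "header mismatch in $file: got '$actual', want '$expected'" >&2
    return 1
  fi
}
```

It could gate each load with `&&`, passing the target table's columns in order; whether the generated CSVs actually use the table's column names as their header is not shown in this commit.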