Merge pull request #14 from aaowens/changetests
Add tests, CI
aaowens authored Nov 22, 2019
2 parents cfb905c + 10e3ad3 commit 2edb397
Showing 8 changed files with 61 additions and 19 deletions.
8 changes: 8 additions & 0 deletions .travis.yml
@@ -0,0 +1,8 @@
+language: julia
+os:
+  - linux
+  - windows
+julia:
+  - 1.2
+  - 1.3
+sudo: false
4 changes: 2 additions & 2 deletions Project.toml
@@ -1,6 +1,6 @@
 name = "PSID"
 uuid = "92fd0282-be9c-47fb-a489-f0d0a91db595"
-version = "0.1.3"
+version = "0.2.0"
 
 [deps]
 AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
@@ -16,7 +16,7 @@ SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
 XLSX = "fdbf4ff8-1666-58a4-91e7-1b58723a45e0"
 
 [compat]
-julia = "1"
+julia = "1.1"
 AbstractTrees = "0.2.1"
 CSV = "0.5.16"
 DataDeps = "0.7.0"
6 changes: 4 additions & 2 deletions README.md
@@ -1,5 +1,7 @@
# PSID.jl

[![Build Status](https://travis-ci.com/aaowens/PSID.jl.svg?branch=master)](https://travis-ci.com/aaowens/PSID.jl)

The Panel Study of Income Dynamics (PSID) is a longitudinal public dataset which has been following a collection of families and their descendants since 1968. It provides a breadth of information about labor supply and life-cycle dynamics. More information is available at https://psidonline.isr.umich.edu/.

This package produces a labeled panel of individuals with a consistent individual ID across time. You provide a JSON file describing the variables you want. An example input file can be found at [examples/user_input.json](https://github.com/aaowens/PSID.jl/blob/master/examples/user_input.json). Currently, only variables in the family files can be added, but in the future it should be possible to support variables in the individual files or the supplements.
@@ -37,9 +39,9 @@ The file passed to `makePSID` describes the variables you want.
},
```
There are three fields: `name_user`, `varID`, and `unit`. `name_user` is a name you choose. `varID` is any one of the codes the PSID assigns to this variable. These codes can be looked up in the PSID [cross-year index](https://simba.isr.umich.edu/VS/i.aspx). For example, the hours variable above can be found in the crosswalk at `Family Public Data Index 01>WORK 02>Hours and Weeks 03>annual in prior year 04>head 05>total:`. Clicking on the variable info shows the list of years in which that variable is available, with the associated IDs. It does not matter which of these IDs you choose for `varID`, because `PSID.jl` looks up all available years for that variable in the crosswalk. You must also indicate the unit, which can be `head`, `spouse`, or `family`; this ensures the variable is assigned to the correct individual.
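
The field rules above can be sketched as a small check. This is only an illustration and not part of `PSID.jl`: the helper `check_entry`, the constant `VALID_UNITS`, and the variable ID `V00000` are hypothetical names and placeholder values chosen for the example.

```julia
# Hypothetical helper, not part of PSID.jl: validates one input entry.
const VALID_UNITS = ("head", "spouse", "family")

function check_entry(entry::AbstractDict)
    # Every entry needs the three fields described above.
    for key in ("name_user", "varID", "unit")
        haskey(entry, key) || error("missing field: $key")
    end
    # `unit` decides which individual the variable is attached to.
    if !(entry["unit"] in VALID_UNITS)
        allowed = join(VALID_UNITS, ", ")
        error("unit must be one of: $allowed")
    end
    return true
end

# Placeholder values only; "V00000" is not a real PSID variable ID.
entry = Dict("name_user" => "hours", "varID" => "V00000", "unit" => "head")
check_entry(entry)
```

An entry with any other `unit`, or with one of the three fields missing, makes `check_entry` throw instead of returning `true`.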


# Features

# Features

This package provides the following features:
1. Automatically labels missing values by searching the value labels from the codebook for strings like "NA", "Inap.", or "Missing".
4 changes: 2 additions & 2 deletions src/allfiles_hash.json
@@ -1,5 +1,6 @@
 {
 "J265684_codebook.xml": "6d9a40ef8c61aa359a31aa4d04746b31511893a05e81337aa7e7a365c8452f0a",
+"psid.xlsx": "9f277e239d8483a3c852b7214bee6ac7d54cb101b27b4f95f75058e186b435d2",
 "fam1968.zip": "38292539d020824be3ca3908c04093edf8d3ccef8dc44cb81147a2315e94cd00",
 "fam1969.zip": "7942f14d8c0c3f42c05efa3fd6b90b011721e280f198fe6b78a417bb28dc9335",
 "fam1970.zip": "07bf03d9aa9ca9258aff3546abf60e66447c02031819d5a552f4769ddb8bb90f",
@@ -40,6 +41,5 @@
 "fam2013er.zip": "43a527b834dc31b753881d3ab03fe1a4c4f1dde7eb5aa2d77a7c3bb79095d15c",
 "fam2015er.zip": "726236d3f9d25e804d2605eff6a6a11f322999530d27a2528d0d01cf31af6066",
 "fam2017er.zip": "5ade1a3f42ed84c892fe8ff16365b85b0dc84ac66f4d454e291af12008e9b35d",
-"ind2017er.zip": "7ea5837017603841afeb0d4d0365745d1a592e5c7a77021d4e1d617b7aed486c",
-"psid_crossyear.xlsx": "9f277e239d8483a3c852b7214bee6ac7d54cb101b27b4f95f75058e186b435d2"
+"ind2017er.zip": "7ea5837017603841afeb0d4d0365745d1a592e5c7a77021d4e1d617b7aed486c"
 }
11 changes: 9 additions & 2 deletions src/init.jl
@@ -1,10 +1,17 @@
 function checkhash(filename)
     filename |> read |> sha256 |> bytes2hex
 end
-function verifyfiles(allfilesjson)
+function verifyfiles(allfilesjson; skip = false)
     allfiles_dict = JSON3.read(read(allfilesjson, String), SortedDict{String, String})
     for (f, v) in allfiles_dict
-        isfile(f) || error("$f not found and is required.")
+        if !isfile(f)
+            if !skip
+                error("$f not found and is required.")
+            else
+                @warn "$f not found, skipping"
+                continue
+            end
+        end
         fh = checkhash(f)
         if fh == v
             println("Found file $f, hash OK")
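
The `skip` keyword added to `verifyfiles` in this hunk can be exercised in isolation. The following is a minimal sketch under stated assumptions, not the package's code: `verify_sketch` is a hypothetical name, it takes a `Dict` of expected hashes rather than a JSON path, and it returns the verified file names instead of printing. The hunk does not show what the real function does on a hash mismatch, so the sketch simply leaves a mismatched file out of the result.

```julia
using SHA  # standard library; provides sha256

# Hex-encoded SHA-256 digest of a file's contents (same idea as `checkhash`).
checkhash(filename) = bytes2hex(sha256(read(filename)))

# Hypothetical standalone variant of the skip logic shown in the diff.
function verify_sketch(expected::Dict{String,String}; skip::Bool = false)
    ok = String[]
    for (f, v) in expected
        if !isfile(f)
            # Without `skip`, a missing file is a hard error.
            skip || error("$f not found and is required.")
            @warn "$f not found, skipping"
            continue
        end
        # Assumption: a mismatched file is just left out of the result.
        checkhash(f) == v && push!(ok, f)
    end
    return sort(ok)
end

# Usage: one real temporary file and one file that does not exist.
dir = mktempdir()
present = joinpath(dir, "present.txt")
write(present, "hello")
absent = joinpath(dir, "absent.txt")
expected = Dict(present => checkhash(present), absent => "0"^64)

verified = verify_sketch(expected; skip = true)  # warns about absent.txt
```

With `skip = true` the missing file only produces a warning and `verified` contains the path of the file that exists; calling `verify_sketch(expected)` with the default `skip = false` throws on the missing file.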
2 changes: 1 addition & 1 deletion src/use_codebook.jl
@@ -100,7 +100,7 @@ function process_input(inputjson)
     j2 = jsontable(read("output/codebook.json", String));
     d2 = DataFrame(j2);
     d2.codedict = [Dict(string(x) => y for (x, y) in dt) for dt in d2.codedict]
-    df = DataFrame(XLSX.readtable("psid_crossyear.xlsx", "MATRIX")...)
+    df = DataFrame(XLSX.readtable("psid.xlsx", "MATRIX")...)
     df = mapcols(x -> [xx for xx in x], df)
     ## Need a map from VAR to the right row
     df_vars = df[!, r"^Y.+"]
6 changes: 6 additions & 0 deletions test/Project.toml
@@ -1,2 +1,8 @@
 [deps]
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+DataDeps = "124859b0-ceae-595e-8997-d05f6a7a8dfe"
+JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
+
+[compat]
+DataDeps = "0.7.0"
+JSON3 = "= 0.1.12"
39 changes: 29 additions & 10 deletions test/runtests.jl
@@ -1,14 +1,33 @@
using Test
using PSID
using PSID, DataDeps, JSON3


@show pwd()
#=
x = dirname(pathof(PSID))
fx = "$x/allfiles_hash.json"
@show isfile(fx)
PSID.verifyfiles(fx)
PSID.process_codebook()
PSID.process_input("user_input.json")
famdatas, inddata = PSID.unzip_data()
PSID.construct_alldata(famdatas, inddata)
=#
makePSID("user_input.json")
skipdata = try
    PSID.verifyfiles(fx, skip = skip)
    println("Found all files, running full tests")
    false
catch
    println("Did not find data files, running partial tests")
    true
end

if skipdata
    Base.download("https://raw.githubusercontent.com/aaowens/PSID.jl/master/examples/user_input.json", "user_input.json")
    Base.download("https://drive.google.com/uc?authuser=0&id=1nz1UaVGcj0ur2Bp3ev7a8agJbj0A5JTF&export=download", "J265684_codebook.zip")
    run(DataDeps.unpack_cmd("J265684_codebook.zip", "$(pwd())", ".zip", ""))
    Base.download("https://psidonline.isr.umich.edu/help/xyr/psid.xlsx", "psid.xlsx")
    userinput_json = "user_input.json"
    isfile(userinput_json) || error("$userinput_json not found in current directory")
    isdir("output") || mkdir("output")
    isdir("datafiles") || mkdir("datafiles")
    PSID.process_codebook()
    PSID.process_input("user_input.json")
    JSON3.read(read("output/user_output.json", String), Vector{PSID.VarInfo5})
    #famdatas, inddata = PSID.unzip_data()
    #PSID.construct_alldata(famdatas, inddata)
else
    makePSID("user_input.json")
end
