Skip to content

Commit

Permalink
Merge branch 'master' of github.com:aaowens/PSID.jl
Browse files Browse the repository at this point in the history
  • Loading branch information
aaowens committed Nov 15, 2019
2 parents 7ba8003 + 780b61e commit 2c0d2e7
Showing 1 changed file with 20 additions and 3 deletions.
23 changes: 20 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
# PSID.jl

This package produces a labeled panel of individuals with a consistent individual ID across time. You provide a JSON file describing the variables you want. An example input file can be found at [examples/user_input.json.](https://github.com/aaowens/PSID.jl/blob/master/examples/user_input.json). Currently only variables in the family files can be added, but in the future it should be possible to support variables in the individual files or the supplements.

# Instructions

To add this package, use
```
(v1.2) pkg> add https://github.com/aaowens/PSID.jl
```

This package produces a labeled panel of individuals with a consistent individual ID across time. You provide a JSON file describing the variables you want. An example input file can be found at [examples/user_input.json.](https://github.com/aaowens/PSID.jl/blob/master/examples/user_input.json). Currently only variables in the family files can be added, but in the future it should be possible to support variables in the individual files or the supplements.

Requirements: A list of data files required to be in the current directory can be found [here](https://github.com/aaowens/PSID.jl/blob/master/src/allfiles_hash.json). These files are

1. The PSID codebook in XML format. You can download this from me here https://drive.google.com/open?id=1nz1UaVGcj0ur2Bp3ev7a8agJbj0A5JTF . In the future there will be a way to download this from the PSID directly.
Expand All @@ -20,11 +23,25 @@ julia> makePSID("user_input.json")
```
It will verify the required files exist and then construct the data. If successful, it will print `Finished constructing individual data, saved to output/allinds.csv` after about 5 minutes.

## The input JSON file
The file passed to `makePSID` describes the variables you want.
```
{
"name_user": "hours",
"varID": "V465",
"unit": "head"
},
```
There are three fields, `name_user`, `varID`, and `unit`. `name_user` is a name chosen by you. `varID` is one of the codes assigned by the PSID to this variable. These can be looked up in the PSID [cross-year index](https://simba.isr.umich.edu/VS/i.aspx). For example, hours above can be found in the crosswalk at ` Family Public Data Index 01>WORK 02>Hours and Weeks 03>annual in prior year 04>head 05>total:`. Clicking on the variable info will show the the list of years and associated IDs when that variable is available. Choose any of the IDs for `varID`, it does not matter. `PSID.jl` will look up all available years for that variable in the crosswalk. You must also indicate the unit, which can be `head`, `spouse`, or `family`. This makes sure the variable is assigned to the correct individual.


# Features

This package provides the following features:
1. Automatically labels missing values by searching the value labels from the codebook for strings like "NA", "Inap.", or "Missing".
2. Tries to produce consistent value labels across years for categotical variables. This is difficult because the labels in the PSID sometimes change between years. This package uses an algorithm to try to harmonize the labels when possible by removing common subsets. For example, in one year race is labeled as "Asian" but in the next year it is "Asian, Pacific Islander". The first is a subset of the second, so the final label will be "Asian, Pacific Islander". When this is not possible, the final label will be "A or B or C" for however many incomparable labels were found.
3. Matches the individuals across time to produce a panel with consistent (ID, year) keys and their associated variables.
4. Produces consistent individual or spouse variables for individuals. In the input JSON file, you must indicate whether a variable is family level, household head level, or household spouse level. The final output will have variables of the form VAR_family, VAR_ind, or VAR_spouse. When the individual is a household head, VAR_ind will come from the household head version of that variable, and VAR_spouse will come from the household spouse version. If the individual is a household spouse, it is the reverse.
4. Produces consistent individual or spouse variables for individuals. In the input JSON file, you must indicate whether a variable is family level, household head level, or household spouse level. The final output will have variables of the form `VAR_family`, `VAR_ind`, or `VAR_spouse`. When the individual is a household head, `VAR_ind` will come from the household head version of that variable, and `VAR_spouse` will come from the household spouse version. If the individual is a household spouse, it is the reverse. Both individuals will get all family level variables.
5. It's easiest to track individuals, but this package also produces a consistent family ID by treating a family as a combination of head and spouse (if spouse exists). If you keep only household heads and drop years before 1970, (famid, year) should be an ID.

This package is new and not well tested, please file issues if you find a bug.

0 comments on commit 2c0d2e7

Please sign in to comment.