Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to read JSON reform and assumption files on web #2070

Closed
martinholmer opened this issue Oct 4, 2018 · 16 comments
Closed

Add ability to read JSON reform and assumption files on web #2070

martinholmer opened this issue Oct 4, 2018 · 16 comments

Comments

@martinholmer
Copy link
Collaborator

@andersonfrailey has brought to my attention that Tax-Calculator users would like the ability to specify the URL of a JSON reform or assumption file rather than downloading it from some website and specifying its file name on the local computer). This seems like a reasonable enhancement request.

It seems as if the easiest way to implement this enhancement is to generalize the logic of the Calculator.read_json_param_objects(reform, assump) method so that the two arguments can be strings containing the URL of an online JSON file. So, for example, the URL of the Brown-Khanna Grow American Incomes Now (GAIN) Act of 2017 that is on the Tax-Calculator GitHub website is as follows:

https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/reforms/BrownKhanna.json

The objective of the enhancement would be to make reading the JSON at a given URL something the Calculator.read_json_param_objects(reform, assump) method can do. This enhancement would be available both to those who develop Python scripts using the taxcalc package and to those who use the tc command-line interface to the taxcalc package.

Does this approach make sense as a way to provide this enhancement? Are there other easier approaches? The approach described here provides the flexibility to read any URL; is that ability important?

@MaxGhenis
Copy link
Contributor

See #1644 (comment) where I provided code to do this:

import urllib2

REFORM_PATH = 'https://raw.githubusercontent.com/open-source-economics/Tax-Calculator/master/taxcalc/reforms/'
url = REFORM_PATH + 'TCJA_Reconciliation.json'

reform_text = urllib2.urlopen(url).read()

reform = Calculator.read_json_param_objects(reform_text, 'JCT_Behavior.json')

@feenberg
Copy link
Contributor

feenberg commented Oct 4, 2018 via email

@martinholmer
Copy link
Collaborator Author

@MaxGhenis said in issue #2070:

See #1644 (comment) where I provided code to do this [that is, read JSON reform files directly from the Tax-Calculator website]

@MaxGhenis is absolutely correct about a Tax-Calculator enhancement being unnecessary because this can be done (in Python 2.7 using the urllib2 package) with only a couple Python statements.

A new user might not have found closed issue #1644 (which was open during Nov-Dec 2017), but how to read JSON reform files in Python 3.6 from the Tax-Calculator website is illustrated in recipe 1 of the Cookbook of Tested Recipes for Python Programming with Tax-Calculator. For those who have not read that recipe, here are the first few lines:

from urllib.request import urlopen
from taxcalc import *

# read two "old" reform files from Tax-Calculator website
# ("old" means the reform files are defined relative to pre-TCJA policy)
#   For more about the compound-reform technique used in this recipe,
#   read answer to Question 1 of FAQ at the following URL:
#   https://github.com/open-source-economics/Tax-Calculator/issues/1830
BASE_URL = ('https://raw.githubusercontent.com/'
            'open-source-economics/Tax-Calculator/master/taxcalc/reforms/')

baseline_name = '2017_law.json'  # pre-TCJA policy
baseline_text = urlopen(BASE_URL + baseline_name).read().decode()
baseline = Calculator.read_json_param_objects(baseline_text, None)

reform1_name = 'TCJA_Senate.json'  # TCJA as orginally proposed in Senate
reform1_text = urlopen(BASE_URL + reform1_name).read().decode()
reform2_name = 'TCJA_Reconciliation.json'  # TCJA as passed by Congress
reform2_text = urlopen(BASE_URL + reform2_name).read().decode()

# specify Policy object for static analysis of reform1 relative to pre-TCJA
reform1 = Calculator.read_json_param_objects(reform1_text, None)
policy1 = Policy()
policy1.implement_reform(baseline['policy'], print_warnings=False)
policy1.implement_reform(reform1['policy'], print_warnings=False)

# specify Policy object for static analysis of reform2 relative to pre-TCJA
reform2 = Calculator.read_json_param_objects(reform2_text, None)
policy2 = Policy()
policy1.implement_reform(baseline['policy'], print_warnings=False)
policy2.implement_reform(reform2['policy'], print_warnings=False)

So, given all this, what should we make of the fact that (new?) users are asking @andersonfrailey how to read JSON reform files directly from the Tax-Calculator website? Do these users think the approach illustrated in Cookbook recipe 1 is too difficult to do in a Python script? Or is it the case that these users have never read Cookbook recipe 1? If users have read Cookbook recipe 1, but think it is too difficult, then we have one kind of problem. But if users have never read the Cookbook, then we have a very different kink of problem. @andersonfrailey, which is it?

@andersonfrailey
Copy link
Collaborator

@martinholmer I believe it's both. To my knowledge the users I've talked to haven't read the cookbook, but they're also relatively new to Python and aren't familiar with packages like requests and urllib.

I agree that this enhancement isn't technically necessary because there is a solution as has been stated earlier in this conversation, but I also think providing an easy way for users to access at least the sample reforms we've created using a function of the taxcalc class would be a nice feature especially for inexperienced users. I'm envisioning them having a workflow that looks something like this:

import taxcalc

reform = taxcalc.reforms('2017_law')
params = taxcalc.Calculator.read_json_param_objects(reform, None)
pol = taxcalc.Policy()
pol.implement_reform(params['policy'])
...

The reforms function could work in a couple of ways. First, we could limit it to just accessing the reforms we have stored in taxcalc/reforms. The user would specify by name which one they wanted, then the function would pull the file from GitHub and return JSON text that's ready to be read by Calculator.read_json_param_objects or a dictionary that can be passed directly into Policy.implement_reform. This is what I laid out in the above example.

Second option would be to generalize reforms so that it can take as an argument any URL from which to pull the JSON text whether it's hosted on GitHub or some other site and return either JSON text or a dictionary. This would effectively take this part of the cookbook recipe and put it in taxcalc:

baseline_name = '2017_law.json'  # pre-TCJA policy
baseline_text = urlopen(BASE_URL + baseline_name).read().decode()

By making this an actual feature of taxcalc we could also extend this capability to the command line users by adding an argument that takes either a URL or reform name.

@MaxGhenis
Copy link
Contributor

I provided the code to save some time implementing this feature. I think both of @andersonfrailey's solutions would be really nice additions, i.e. supporting both:

taxcalc.Calculator.read_json_param_objects(taxcalc.reforms('2017_law'), None)
taxcalc.Calculator.read_json_param_objects('http://...path.../2017_law.json'), None)

@hdoupe
Copy link
Collaborator

hdoupe commented Oct 4, 2018

I think option 1 that @andersonfrailey outlines could be a nice feature. I think the limited scope of the first part of the proposal means that this could be added and maintained with a minimal amount of effort.

Many tutorials using scikit-learn make use of a built in "iris" data set. Users can easily follow along and replicate the analyses just by pip/conda installing the packages.

@martinholmer
Copy link
Collaborator Author

martinholmer commented Oct 4, 2018

@andersonfrailey seems to be saying in issue #2070 that users generally aren't reading the Cookbook, but even if they had read the Cookbook they would want an easier way to read JSON reform/assumption files from a website. And @MaxGhenis agrees that an ease-of-use enhancement would be desirable.

If we are going to add such an ease-of-use enhancement, I don't like the idea of restricting access to only JSON files that are on the Tax-Calculator website. The whole idea of the Tax-Calculator code is that it is highly portable, so adding a Calculator method that is Open-Source-Economics centric is not desirable. We want to encourage users to post their own JSON reform files on their personal websites for others to use in their Python scripts. This would be a nice feature that might encourage the growth of an interactive user group. And it is no more difficult to implement what @andersonfrailey and @MaxGhenis called option 2 (see above comments) than it is to implement option 1.

The way to implement option 2 is along the lines I outlined in the first comment in this issue.

Would somebody like to volunteer to prepare a pull request that implements option 2?

@andersonfrailey
Copy link
Collaborator

@martinholmer I can take care of it.

@MattHJensen
Copy link
Contributor

I agree with @martinholmer that we should encourage the development of JSON files on websites other than Tax-Calculator's.

Establishing the Tax-Calculator repo as a central hub for reform files places a significant review and maintenance burden on Tax-Calculator developers, and it creates a fairly serious reputational risk for the project.

As an aside:

At some point we may even want to move the existing JSON reform files out of the Tax-Calculator code base and into separate unaffiliated or semi-affiliated repositories. Yes, I was an advocate for putting them in Tax-Calculator to begin with, and it was probably the right decision at the time, but I am not so sure whether it is going forward. We should have a discussion about this at some point.

@martinholmer
Copy link
Collaborator Author

@andersonfrailey said:

I can take care of it.

Great! Look forward to your PR.

@martinholmer
Copy link
Collaborator Author

martinholmer commented Oct 4, 2018

@MattHJensen said in issue #2070:

we should encourage the development of JSON files on websites other than Tax-Calculator's.

Establishing the Tax-Calculator repo as a central hub for reform files places a significant review and maintenance burden on Tax-Calculator developers, and it creates a fairly serious reputational risk for the project.

As an aside:

At some point we may even want to move the existing JSON reform files out of the Tax-Calculator code base and into separate unaffiliated or semi-affiliated repositories. Yes, I was an advocate for putting them in Tax-Calculator to begin with, and it was probably the right decision at the time, but I am not so sure whether it is going forward. We should have a discussion about this at some point.

@MattHJensen, thanks for making these excellent points. I hadn't thought of this angle, but I agree with your line of thinking.

@martinholmer
Copy link
Collaborator Author

@hdoupe said in issue #2070:

Many tutorials using scikit-learn make use of a built in "iris" data set. Users can easily follow along and replicate the analyses just by pip/conda installing the packages.

Yes and Tax-Calculator makes use of a built in CPS data set via a single statement:

recs = Records.cps_constructor()

So, on the data side Tax-Calculator is even easier to use than the "iris" data set in scikit-learn because users don't have to install a data package (the CPS data is built into the taxcalc package).

Making online JSON reform/assumption files useable with the Calculator.read_json_param_objects() method will help users write their Python programs. Does scikit-learn have that kind of capability?

@hdoupe
Copy link
Collaborator

hdoupe commented Oct 4, 2018

@martinholmer I think it's great that we already have the cps_constructor method. I lopped the reform files into that same "data" category which already includes the cps file. So, I was trying to say that I wanted more data (the reform files) accessible from the python package.

@martinholmer
Copy link
Collaborator Author

@hdoupe said in issue #2070:

I think it's great that we already have the cps_constructor method. I lopped the reform files into that same "data" category which already includes the cps file. So, I was trying to say that I wanted more data (the reform files) accessible from the python package.

OK. That makes sense.

@hdoupe
Copy link
Collaborator

hdoupe commented Oct 4, 2018

Yep, sorry for the misunderstanding. I'm excited to see @andersonfrailey's implementation of the reform file fetcher.

@martinholmer
Copy link
Collaborator Author

This issue was resolved by the merger of pull request #2079.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants