Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tax-Calculator 0.13.0 changes results names #735

Closed
hdoupe opened this issue Nov 9, 2017 · 11 comments
Closed

Tax-Calculator 0.13.0 changes results names #735

hdoupe opened this issue Nov 9, 2017 · 11 comments
Milestone

Comments

@hdoupe
Copy link
Collaborator

hdoupe commented Nov 9, 2017

The names of the results columns were changed in PSLmodels/Tax-Calculator#1605. This is fine for future runs. However, the results from older runs will need to be handled in a special way until a better solution is found.

The best short term solution that I can think of is this: Keep a copy of both lists of names. The taxcalc version used in the old run is saved. So, we can use the old names for taxcalc version < 0.13.0.

In the long-term, we need to do a one-time update for the results database with the new variable names.

@martinholmer @GoFroggyRun do you have any thoughts on how to better handle this?

@martinholmer
Copy link
Contributor

@hdoupe said:

The names of the results columns were changed in PSLmodels/Tax-Calculator#1605. This is fine for future runs. However, the results from older runs will need to be handled in a special way until a better solution is found.

The best short term solution that I can think of is this: Keep a copy of both lists of names. The taxcalc version used in the old run is saved. So, we can use the old names for taxcalc version < 0.13.0.

In the long-term, we need to do a one-time update for the results database with the new variable names.

I don't know much about how TaxBrain saves run results in its database, but the two-step process makes logical sense. But several questions come to mind:

  • Is jumping straight to the second step too much work for the next PolicyBrain release?
  • Would it somehow be easier (if ultimately more time consuming) to pursue the two-step conversion?

I don't know the answers, but these are the kind of questions you need to be asking yourselves.

@GoFroggyRun @MattHJensen

@hdoupe
Copy link
Collaborator Author

hdoupe commented Nov 10, 2017

@martinholmer thanks for the advice. I'm going to try to do the full fix by replacing the variable names in the database. I'm giving myself to the end of the weekend. At that point, I'll have to move to the temporary fix I described above.

I set up postgres on my computer and downloaded the test and production databases. I think this is a matter of cycling through the results and replacing the names. Once I fix the variable names, I should be able to push the edited database to heroku with the push command described in the heroku docs.

@talumbau @brittainhard does this sound sensible?

@MattHJensen
Copy link
Contributor

MattHJensen commented Nov 10, 2017

Would it be easier to build the backwards compatibility into tax-calculator? This is something we'll need to learn how to do eventually and will need to do more often after we release 1.0.0. I don't think it would be unreasonable to think about how to do it now. But if it is easier to deal with this in policybrain, that seems fine too.

@talumbau
Copy link
Member

@hdoupe, can you give an example of what has changed? I'm assuming you mean a record entry has changed going forward and not a change in the name of a model attribute (ie a column name in the database). The latter is handled through migrations in Django, which is a pretty robust system at this point.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Nov 10, 2017

@talumbau The keys in the taxcalc results dictionary changed, not the names of variables.

For example, now when you call taxcalc from PolicyBrain you get:

In [32]:     resdict =tbi.run_nth_year_tax_calc_model(year_n, start_year, #year_n=2
    ...:                                           use_puf_not_cps=False,
    ...:                                           use_full_sample=True,
    ...:                                           user_mods=user_mods,
    ...:                                           return_dict=True)

In [27]: resdict['dist2_xbin'].keys()
Out[27]: dict_keys(['<$10K_2', '$10-20K_2', '$20-30K_2', '$30-40K_2', '$40-50K_2', '$50-75K_2', '$75-100K_2', '$100-200K_2', '$200-500K_2', '$500-1000K_2', '>$1000K_2', 'all_2'])

In previous versions of Tax-Calculator, you would have gotten these keys:

BIN_ROW_NAMES = ['less_than_10', 'ten_twenty', 'twenty_thirty', 'thirty_forty',
                 'forty_fifty', 'fifty_seventyfive', 'seventyfive_hundred',
                 'hundred_twohundred', 'twohundred_fivehundred',
                 'fivehundred_thousand', 'thousand_up', 'all']

This is a problem when we create the final tables in taxbrain/helpers.taxcalc_results_to_tables.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Nov 10, 2017

@MattHJensen asked:

Would it be easier to build the backwards compatibility into tax-calculator?

It probably would be. I think we could do something like this in tbi.py and that would work.

def run_nth_year_tax_calc_model(year_n, start_year,
                                use_puf_not_cps,
                                use_full_sample,
                                user_mods,
                                return_dict=True,
                                decile_row_names=DECILE_ROW_NAMES,
                                webbin_row_names=WEBBIN_ROW_NAMES,
                                agg_row_names=AGG_ROW_NAMES):

@talumbau
Copy link
Member

Ah, I see what you are saying. My preferred solution would just be to create some kind of mapper function that would replace any keys of the old "allowed set" with the equivalent in the new set. You could then just make sure that any access to this attribute on a model was wrapped with this call. Or, perhaps you could even do a decorator on this attribute for the model that needs to be modified. The key point is that the function would only change the old values to the new, leaving the new values unchanged. Further, this solution would only involve doing additional computation when old data is accessed, instead of going through every record in the database to enforce this new policy.

However, if you really wish to change every value in the database from the old keys to the new, what you propose is possible. I would just triple check that you have good backups of the production database, as you would always want to be able to restore to the original state should something go wrong.

@martinholmer
Copy link
Contributor

@talumbau said:

However, if you really wish to change every value in the database from the old keys to the new, what you propose is possible. I would just triple check that you have good backups of the production database, as you would always want to be able to restore to the original state should something go wrong.

Several questions and a suggestion:

  • Where in the PolicyBrain repo is the database schema (that is, the SQL "CREATE TABLE ..." commands)?

  • And where are the statements that store results in the database (that is, SQL "INSERT ..." commands) and
    where are the statements that retrieve results from the database (that is, SQL "SELECT ..." commands)?

The suggestion:

If, for example, 'ten_twenty' is a value in the database, then a simple SQL "UPDATE ... WHERE ..." command will change all the 'ten_twenty' values to '$10-20K'. Isn't this pretty easy to do?

@hdoupe @MattHJensen

@talumbau
Copy link
Member

talumbau commented Nov 10, 2017 via email

@hdoupe
Copy link
Collaborator Author

hdoupe commented Nov 10, 2017

The more I think about this, the more I think that this is a high risk low reward situation. I'm not willing to compromise the database to clean up the codebase during this tax reform push. I think doing a full database update should be added to the ever-growing project and maintenance list that we have slated for the spring.

@talumbau I like the idea of addressing this problem at the source with a decorator for the tax_result attribute.

@talumbau and @martinholmer thank you for your input.

@hdoupe hdoupe mentioned this issue Nov 14, 2017
3 tasks
@hdoupe hdoupe added this to the 1.2.0 Release milestone Nov 14, 2017
@hdoupe
Copy link
Collaborator Author

hdoupe commented Jan 17, 2018

Closed via #738

@hdoupe hdoupe closed this as completed Jan 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants