Consistently use baseline expanded_income to fuzz reform results in dropq tables #1537

Merged on Sep 6, 2017: 46 commits (the file diffs below show changes from 37 of them).

Commits:
- 624b4f6 Streamline dropq and util diff-table code (martinholmer, Aug 29, 2017)
- ae319af Remove unneeded float casts in utilsprvt.py (martinholmer, Aug 29, 2017)
- 1d7fbf6 Move perc_aftertax calc into diff_table_stats utility function. (martinholmer, Aug 29, 2017)
- 6113ebe Move more diff table logic to diff_table_stats function (martinholmer, Aug 29, 2017)
- aa86c8b Move create_difference_table logic to diff_table_stats (martinholmer, Aug 29, 2017)
- 29be90b Use create_difference_table utility in dropq logic (martinholmer, Aug 30, 2017)
- 3a97e38 Rename function in utilsprvt.py (martinholmer, Aug 30, 2017)
- 6a1cdb5 Complete renaming to weighted_perc_cut (martinholmer, Aug 30, 2017)
- 0a45826 Minor change in create_difference_table handling of input (martinholmer, Aug 30, 2017)
- bb15e85 Change current_year ValueError to assert in utils.py (martinholmer, Aug 30, 2017)
- 4a0d609 Merge in recent changes on master branch (martinholmer, Aug 30, 2017)
- 9925a3b Improve diff-table label for per_aftertax column (martinholmer, Aug 30, 2017)
- 8df23d2 Nest diff_table_stats function in create_difference_table utility (martinholmer, Aug 30, 2017)
- df47734 Cosmetic consistency change from 1e99 to 9e99 in bins (martinholmer, Aug 30, 2017)
- 7ce7357 Remove obsolete tests from test_dropq.py (martinholmer, Aug 30, 2017)
- 25c2762 Add new test in test_dropq.py (martinholmer, Aug 30, 2017)
- 18f9a98 Add stronger create_distribution_table tests (martinholmer, Aug 30, 2017)
- 381e858 Remove baseline_obj and diff arguments from create_distribution_table (martinholmer, Aug 30, 2017)
- 49b2f99 Remove obsolete dropq_dist_table tests (martinholmer, Aug 30, 2017)
- 246f41c Revise create_distribution_table arguments (martinholmer, Aug 31, 2017)
- a95f04c Merge branch 'master' into revise-dist-table (martinholmer, Aug 31, 2017)
- 1abdc6b First step in fixing dropq fuzzing logic (martinholmer, Aug 31, 2017)
- a918c1a Second step in fixing dropq fuzzing logic (martinholmer, Aug 31, 2017)
- 90432f0 Third step in fixing dropq fuzzing logic (martinholmer, Sep 1, 2017)
- 542cedb Clarify code in dropq_summary function (martinholmer, Sep 1, 2017)
- 0b62f5a Simplify dropq test_run_tax_calc_model (martinholmer, Sep 1, 2017)
- 23a08a1 Change add_*_bins function arguments (martinholmer, Sep 1, 2017)
- 8b34a32 Rename add_weighted_income_bins as add_quantile_bins (martinholmer, Sep 1, 2017)
- 54d80ae Clarify documentation in dropq.py and dropq_utils.py (martinholmer, Sep 1, 2017)
- 4f69162 Rename and streamline dropq fuzz_records function (martinholmer, Sep 1, 2017)
- ec022ae Clarify suffix logic in dropq_summary function (martinholmer, Sep 1, 2017)
- 6c548b7 Fuzz records used in difference tables using baseline income (martinholmer, Sep 2, 2017)
- a4a243e Revise suffix names in dropq functions (martinholmer, Sep 2, 2017)
- 3b1432f Refactor dropq fuzz_df2_records function (martinholmer, Sep 2, 2017)
- 2b86786 Standardize dropq summary results names (martinholmer, Sep 2, 2017)
- a8594ee Revise a few dropq comments (martinholmer, Sep 2, 2017)
- 3dd22bc Update RELEASES.md info (martinholmer, Sep 2, 2017)
- afa4a55 Merge branch 'master' into revise-dist-table (martinholmer, Sep 5, 2017)
- 537b1c0 Better documentation of what 'fuzzing' results in dropq means (martinholmer, Sep 5, 2017)
- d3ae770 Add doc and asserts to create_di*table utility functions (martinholmer, Sep 5, 2017)
- aed63de Clarify documentation in both dropq files (martinholmer, Sep 5, 2017)
- 7b79c22 Consistently use baseline income measure for binning in dropq logic (martinholmer, Sep 5, 2017)
- c62df24 Add test of new utils.py code (martinholmer, Sep 5, 2017)
- c920116 Simplify nested fuzz function in dropq_utils.py (martinholmer, Sep 5, 2017)
- b1e54ef Edit top docstring in dropq.py (martinholmer, Sep 6, 2017)
- f6b3e8a Construct dropq aggregate table by fuzzing just three records (martinholmer, Sep 6, 2017)
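The PR title and commits 6c548b7 and 7b79c22 describe binning every filing unit by its baseline income, even when tabulating reform results. The sketch below is a hypothetical illustration of why that matters, not the PR's code: the helper name quantile_bins_sketch and the income, weight and tax columns are assumptions (the PR's own helpers, weighted_perc_cut and add_quantile_bins, are not shown here). It assigns weighted quantile bins and then groups the same reform-minus-baseline tax change two ways.

```python
import numpy as np
import pandas as pd


def quantile_bins_sketch(df, income_col, weight_col, num_bins):
    """
    Hypothetical helper: return a Series assigning each record of df to a
    weighted quantile bin (0 .. num_bins - 1) of the income_col column.
    """
    ordered = df.sort_values(income_col)
    # cumulative share of total weight up to and including each record
    cum_share = ordered[weight_col].cumsum() / ordered[weight_col].sum()
    # map each cumulative share onto a bin number; clip guards against
    # zero weights and floating-point rounding at either end
    bins = np.ceil(cum_share * num_bins).astype(int) - 1
    return bins.clip(0, num_bins - 1).sort_index().rename('bin')


# toy results for four equally weighted records; the reform changes
# record 0's income from 10 to 45 and record 3's from 40 to 5
baseline = pd.DataFrame({'income': [10.0, 20.0, 30.0, 40.0],
                         'weight': [1.0, 1.0, 1.0, 1.0],
                         'tax': [1.0, 2.0, 3.0, 4.0]})
reform = pd.DataFrame({'income': [45.0, 20.0, 30.0, 5.0],
                       'weight': [1.0, 1.0, 1.0, 1.0],
                       'tax': [1.5, 2.0, 3.0, 3.0]})
tax_change = reform['tax'] - baseline['tax']

# consistent: bin the reform-minus-baseline change by baseline income
base_bins = quantile_bins_sketch(baseline, 'income', 'weight', 2)
change_by_base_bin = tax_change.groupby(base_bins).sum()

# alternative: bin the same change by reform income
reform_bins = quantile_bins_sketch(reform, 'income', 'weight', 2)
change_by_reform_bin = tax_change.groupby(reform_bins).sum()
```

With baseline-income bins, record 0's 0.5 tax increase lands in the lower bin and record 3's 1.0 tax cut in the upper bin; binning by reform income swaps them, because the reform moves both records across the median.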
RELEASES.md: 6 changes (4 additions, 2 deletions)

```diff
@@ -4,13 +4,15 @@ Go
 [here](https://github.com/open-source-economics/Tax-Calculator/pulls?q=is%3Apr+is%3Aclosed)
 for a complete commit history.
 
-Release 0.Y.Z on 2017-??-??
+Release 0.11.0 on 2017-??-??
 ----------------------------
 (last merged pull request is
 [#xxxx](https://github.com/open-source-economics/Tax-Calculator/pull/xxxx))
 
 **API Changes**
-- None
+- Revise dropq distribution and difference tables used by TaxBrain
+  [[#1537](https://github.com/open-source-economics/Tax-Calculator/pull/1537)
+  by Anderson Frailey and Martin Holmer]
 
 **New Features**
 - None
```
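The release note says the dropq difference tables were revised, and the commit list shows per-bin statistics being consolidated into a diff_table_stats function that also computes a perc_aftertax column (DIFF_COLUMN_TYPES in dropq.py grows from eight to nine entries). The following is a hypothetical pandas sketch of per-bin difference statistics, not the PR's diff_table_stats: the function name, the input columns, and the reading of perc_aftertax as the percent change in after-tax income (holding pre-tax income fixed) are all assumptions.

```python
import pandas as pd


def diff_stats_sketch(df):
    """
    Hypothetical per-bin difference-table statistics.  df must have columns
    bin, weight, tax_change (reform minus baseline) and aftertax_baseline.
    """
    def stats(grp):
        wgt = grp['weight']
        chg = grp['tax_change']
        total_change = (wgt * chg).sum()
        total_aftertax = (wgt * grp['aftertax_baseline']).sum()
        return pd.Series({
            'count': wgt.sum(),                 # weighted number of records
            'tax_cut': wgt[chg < 0].sum(),      # weighted count with a cut
            'tax_inc': wgt[chg > 0].sum(),      # weighted count with a rise
            'tot_change': total_change,
            # assuming unchanged pre-tax income, a tax increase lowers
            # after-tax income one-for-one, hence the minus sign
            'perc_aftertax': -100.0 * total_change / total_aftertax,
        })
    cols = ['weight', 'tax_change', 'aftertax_baseline']
    return df.groupby('bin')[cols].apply(stats)


records = pd.DataFrame({'bin': [0, 0, 1, 1],
                        'weight': [2.0, 1.0, 1.0, 1.0],
                        'tax_change': [-5.0, 30.0, 0.0, -10.0],
                        'aftertax_baseline': [100.0, 200.0, 250.0, 250.0]})
table = diff_stats_sketch(records)
```

Selecting the value columns before apply keeps the grouping column out of each group, so stats sees only the data it needs.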
taxcalc/dropq/dropq.py: 184 changes (99 additions, 85 deletions)

```diff
@@ -1,6 +1,7 @@
 """
 The dropq functions are used by TaxBrain to call Tax-Calculator in order
-to maintain the privacy of the micro data being used by TaxBrain.
+to maintain the privacy of the micro data being used by TaxBrain. This
+is done by adding random "fuzz" to the results in each table cell.
```
Review comment (Contributor):

Possibly: "This is done by adding random "fuzz" to the sample from which the results in each table cell are drawn."

Reply (Collaborator, PR author):

You're right. The description of what "fuzzing" means needs to be improved.
Commit 537b1c0 is an attempt to improve the documentation.
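The docstring and this exchange leave open exactly what gets fuzzed. One scheme in the spirit of the reviewer's wording (perturbing the sample behind each cell) and of commit f6b3e8a ("fuzzing just three records") is sketched below. It is an illustration under assumptions, not the code in dropq_utils.py: the function name, the NUM_TO_FUZZ constant, the changed mask (the diff passes a mask to dropq_summary, but its meaning is not shown), and the rule of resetting selected reform results to their baseline values are all hypothetical. Per the PR title, the bins passed in would be computed from baseline expanded_income.

```python
import numpy as np
import pandas as pd

NUM_TO_FUZZ = 3  # hypothetical; echoes "fuzzing just three records"


def fuzz_reform_sketch(base, reform, bins, changed, rng):
    """
    Return a copy of reform in which, within each bin, up to NUM_TO_FUZZ
    randomly chosen records whose results changed are reset to baseline.
    """
    fuzzed = reform.copy()
    candidates = bins[changed]  # bin labels of records whose results changed
    for label in candidates.unique():
        members = candidates.index[(candidates == label).to_numpy()]
        num = min(NUM_TO_FUZZ, len(members))
        chosen = rng.choice(members.to_numpy(), size=num, replace=False)
        fuzzed.loc[chosen] = base.loc[chosen]
    return fuzzed


# ten records: every reform tax differs from its baseline tax
base = pd.Series([10.0] * 10)
reform = base + 1.0
bins = pd.Series([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])  # e.g. baseline-income bins
fuzzed = fuzz_reform_sketch(base, reform, bins, reform != base,
                            np.random.default_rng(12345))
reset_per_bin = (fuzzed == base).groupby(bins).sum()
```

The records chosen vary with the random seed, but the number reset in each bin does not: three in each of the first two bins and two in the last, which has only two changed records.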

```diff
 """
 # CODING-STYLE CHECKS:
 # pep8 --ignore=E402 dropq.py
@@ -20,7 +21,7 @@
 # specify constants
 PLAN_COLUMN_TYPES = [float] * len(TABLE_LABELS)
 
-DIFF_COLUMN_TYPES = [int, int, int, float, float, str, str, str]
+DIFF_COLUMN_TYPES = [int, int, int, float, float, str, str, str, str]
 
 DECILE_ROW_NAMES = ['perc0-10', 'perc10-20', 'perc20-30', 'perc30-40',
                     'perc40-50', 'perc50-60', 'perc60-70', 'perc70-80',
```
```diff
@@ -102,98 +103,111 @@ def run_nth_year_tax_calc_model(year_n, start_year,
     np.random.seed(seed)  # pylint: disable=no-member
 
     # construct dropq summary results from raw results
-    (m2_dec, m1_dec, df_dec, pdf_dec, cdf_dec,
-     m2_bin, m1_bin, df_bin, pdf_bin, cdf_bin,
-     itax_sumd, ptax_sumd, comb_sumd,
-     itax_sum1, ptax_sum1, comb_sum1,
-     itax_sum2, ptax_sum2, comb_sum2) = dropq_summary(rawres1, rawres2, mask)
-
-    # construct DataFrames containing selected summary results
-    totsd = [itax_sumd, ptax_sumd, comb_sumd]
-    fiscal_tots_diff = pd.DataFrame(data=totsd, index=TOTAL_ROW_NAMES)
-
-    tots1 = [itax_sum1, ptax_sum1, comb_sum1]
-    fiscal_tots_baseline = pd.DataFrame(data=tots1, index=TOTAL_ROW_NAMES)
-
-    tots2 = [itax_sum2, ptax_sum2, comb_sum2]
-    fiscal_tots_reform = pd.DataFrame(data=tots2, index=TOTAL_ROW_NAMES)
-
-    # remove negative incomes from selected summary results
-    df_bin.drop(df_bin.index[0], inplace=True)
-    pdf_bin.drop(pdf_bin.index[0], inplace=True)
-    cdf_bin.drop(cdf_bin.index[0], inplace=True)
-    m2_bin.drop(m2_bin.index[0], inplace=True)
-    m1_bin.drop(m1_bin.index[0], inplace=True)
+    (dist1_dec, dist2_dec,
+     diff_itax_dec, diff_ptax_dec, diff_comb_dec,
+     dist1_bin, dist2_bin,
+     diff_itax_bin, diff_ptax_bin, diff_comb_bin,
+     aggr_itax_d, aggr_ptax_d, aggr_comb_d,
+     aggr_itax_1, aggr_ptax_1, aggr_comb_1,
+     aggr_itax_2, aggr_ptax_2, aggr_comb_2) = dropq_summary(rawres1,
+                                                             rawres2,
+                                                             mask)
+
+    # construct DataFrames containing aggregate tax totals
+    # ... for reform-minus-baseline difference
+    aggrd = [aggr_itax_d, aggr_ptax_d, aggr_comb_d]
+    aggr_d = pd.DataFrame(data=aggrd, index=TOTAL_ROW_NAMES)
+    # ... for baseline
+    aggr1 = [aggr_itax_1, aggr_ptax_1, aggr_comb_1]
+    aggr_1 = pd.DataFrame(data=aggr1, index=TOTAL_ROW_NAMES)
+    # ... for reform
+    aggr2 = [aggr_itax_2, aggr_ptax_2, aggr_comb_2]
+    aggr_2 = pd.DataFrame(data=aggr2, index=TOTAL_ROW_NAMES)
 
     elapsed_time = time.time() - start_time
     print('elapsed time for this run: ', elapsed_time)
 
-    def append_year(tdf):
+    def append_year(pdf):
         """
-        append_year embedded function
+        append_year embedded function revises all column names in pdf
         """
-        tdf.columns = [str(col) + '_{}'.format(year_n) for col in tdf.columns]
-        return tdf
+        pdf.columns = [str(col) + '_{}'.format(year_n) for col in pdf.columns]
+        return pdf
 
     # optionally return non-JSON results
     if not return_json:
-        return (append_year(m2_dec), append_year(m1_dec), append_year(df_dec),
-                append_year(pdf_dec), append_year(cdf_dec),
-                append_year(m2_bin), append_year(m1_bin), append_year(df_bin),
-                append_year(pdf_bin), append_year(cdf_bin),
-                append_year(fiscal_tots_diff),
-                append_year(fiscal_tots_baseline),
-                append_year(fiscal_tots_reform))
-
-    # optionally construct JSON results
-    decile_row_names_i = [x + '_' + str(year_n) for x in DECILE_ROW_NAMES]
-    m2_dec_table_i = create_json_table(m2_dec,
-                                       row_names=decile_row_names_i,
-                                       column_types=PLAN_COLUMN_TYPES)
-    m1_dec_table_i = create_json_table(m1_dec,
-                                       row_names=decile_row_names_i,
-                                       column_types=PLAN_COLUMN_TYPES)
-    df_dec_table_i = create_json_table(df_dec,
-                                       row_names=decile_row_names_i,
-                                       column_types=DIFF_COLUMN_TYPES)
-    pdf_dec_table_i = create_json_table(pdf_dec,
-                                        row_names=decile_row_names_i,
-                                        column_types=DIFF_COLUMN_TYPES)
-    cdf_dec_table_i = create_json_table(cdf_dec,
-                                        row_names=decile_row_names_i,
-                                        column_types=DIFF_COLUMN_TYPES)
-    bin_row_names_i = [x + '_' + str(year_n) for x in BIN_ROW_NAMES]
-    m2_bin_table_i = create_json_table(m2_bin,
-                                       row_names=bin_row_names_i,
-                                       column_types=PLAN_COLUMN_TYPES)
-    m1_bin_table_i = create_json_table(m1_bin,
-                                       row_names=bin_row_names_i,
-                                       column_types=PLAN_COLUMN_TYPES)
-    df_bin_table_i = create_json_table(df_bin,
-                                       row_names=bin_row_names_i,
-                                       column_types=DIFF_COLUMN_TYPES)
-    pdf_bin_table_i = create_json_table(pdf_bin,
-                                        row_names=bin_row_names_i,
-                                        column_types=DIFF_COLUMN_TYPES)
-    cdf_bin_table_i = create_json_table(cdf_bin,
-                                        row_names=bin_row_names_i,
-                                        column_types=DIFF_COLUMN_TYPES)
-    total_row_names_i = [x + '_' + str(year_n) for x in TOTAL_ROW_NAMES]
-    fiscal_yr_total_df = create_json_table(fiscal_tots_diff,
-                                           row_names=total_row_names_i)
-    fiscal_yr_total_df = dict((k, v[0]) for k, v in fiscal_yr_total_df.items())
-    fiscal_yr_total_bl = create_json_table(fiscal_tots_baseline,
-                                           row_names=total_row_names_i)
-    fiscal_yr_total_bl = dict((k, v[0]) for k, v in fiscal_yr_total_bl.items())
-    fiscal_yr_total_rf = create_json_table(fiscal_tots_reform,
-                                           row_names=total_row_names_i)
-    fiscal_yr_total_rf = dict((k, v[0]) for k, v in fiscal_yr_total_rf.items())
+        return (append_year(dist2_dec),
+                append_year(dist1_dec),
+                append_year(diff_itax_dec),
+                append_year(diff_ptax_dec),
+                append_year(diff_comb_dec),
+                append_year(dist2_bin),
+                append_year(dist1_bin),
+                append_year(diff_itax_bin),
+                append_year(diff_ptax_bin),
+                append_year(diff_comb_bin),
+                append_year(aggr_d),
+                append_year(aggr_1),
+                append_year(aggr_2))
+
+    # optionally construct JSON results tables for year n
+    dec_row_names_n = [x + '_' + str(year_n) for x in DECILE_ROW_NAMES]
+    dist2_dec_table_n = create_json_table(dist2_dec,
+                                          row_names=dec_row_names_n,
+                                          column_types=PLAN_COLUMN_TYPES)
+    dist1_dec_table_n = create_json_table(dist1_dec,
+                                          row_names=dec_row_names_n,
+                                          column_types=PLAN_COLUMN_TYPES)
+    diff_itax_dec_table_n = create_json_table(diff_itax_dec,
+                                              row_names=dec_row_names_n,
+                                              column_types=DIFF_COLUMN_TYPES)
+    diff_ptax_dec_table_n = create_json_table(diff_ptax_dec,
+                                              row_names=dec_row_names_n,
+                                              column_types=DIFF_COLUMN_TYPES)
+    diff_comb_dec_table_n = create_json_table(diff_comb_dec,
+                                              row_names=dec_row_names_n,
+                                              column_types=DIFF_COLUMN_TYPES)
+    bin_row_names_n = [x + '_' + str(year_n) for x in BIN_ROW_NAMES]
+    dist2_bin_table_n = create_json_table(dist2_bin,
+                                          row_names=bin_row_names_n,
+                                          column_types=PLAN_COLUMN_TYPES)
+    dist1_bin_table_n = create_json_table(dist1_bin,
+                                          row_names=bin_row_names_n,
+                                          column_types=PLAN_COLUMN_TYPES)
+    diff_itax_bin_table_n = create_json_table(diff_itax_bin,
+                                              row_names=bin_row_names_n,
+                                              column_types=DIFF_COLUMN_TYPES)
+    diff_ptax_bin_table_n = create_json_table(diff_ptax_bin,
+                                              row_names=bin_row_names_n,
+                                              column_types=DIFF_COLUMN_TYPES)
+    diff_comb_bin_table_n = create_json_table(diff_comb_bin,
+                                              row_names=bin_row_names_n,
+                                              column_types=DIFF_COLUMN_TYPES)
+    total_row_names_n = [x + '_' + str(year_n) for x in TOTAL_ROW_NAMES]
+    aggr_d_table_n = create_json_table(aggr_d,
+                                       row_names=total_row_names_n)
+    aggr_d_table_n = dict((k, v[0]) for k, v in aggr_d_table_n.items())
+    aggr_1_table_n = create_json_table(aggr_1,
+                                       row_names=total_row_names_n)
+    aggr_1_table_n = dict((k, v[0]) for k, v in aggr_1_table_n.items())
+    aggr_2_table_n = create_json_table(aggr_2,
+                                       row_names=total_row_names_n)
+    aggr_2_table_n = dict((k, v[0]) for k, v in aggr_2_table_n.items())
 
     # return JSON results
-    return (m2_dec_table_i, m1_dec_table_i, df_dec_table_i, pdf_dec_table_i,
-            cdf_dec_table_i, m2_bin_table_i, m1_bin_table_i, df_bin_table_i,
-            pdf_bin_table_i, cdf_bin_table_i, fiscal_yr_total_df,
-            fiscal_yr_total_bl, fiscal_yr_total_rf)
+    return (dist2_dec_table_n,
+            dist1_dec_table_n,
+            diff_itax_dec_table_n,
+            diff_ptax_dec_table_n,
+            diff_comb_dec_table_n,
+            dist2_bin_table_n,
+            dist1_bin_table_n,
+            diff_itax_bin_table_n,
+            diff_ptax_bin_table_n,
+            diff_comb_bin_table_n,
+            aggr_d_table_n,
+            aggr_1_table_n,
+            aggr_2_table_n)
 
 
 def run_nth_year_gdp_elast_model(year_n, start_year,
```
```diff
@@ -218,10 +232,10 @@ def run_nth_year_gdp_elast_model(year_n, start_year,
     # return gdp_effect results
     if return_json:
         gdp_df = pd.DataFrame(data=[gdp_effect], columns=['col0'])
-        gdp_elast_names_i = [x + '_' + str(year_n)
+        gdp_elast_names_n = [x + '_' + str(year_n)
                              for x in GDP_ELAST_ROW_NAMES]
         gdp_elast_total = create_json_table(gdp_df,
-                                            row_names=gdp_elast_names_i,
+                                            row_names=gdp_elast_names_n,
                                             num_decimals=5)
         gdp_elast_total = dict((k, v[0]) for k, v in gdp_elast_total.items())
         return gdp_elast_total
```
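In the revised run_nth_year_tax_calc_model, each aggregate table is a one-column DataFrame indexed by TOTAL_ROW_NAMES, the nested append_year helper suffixes every column name with the year, and each JSON aggregate table is flattened with dict((k, v[0]) for k, v in ...). The sketch below reproduces those two patterns outside taxcalc. It is hypothetical in three respects: the TOTAL_ROW_NAMES values are assumed, append_year takes year_n as an argument instead of reading it from the enclosing function, and json_table_sketch only stands in for create_json_table by assuming that function maps each row name to a list of cell values.

```python
import pandas as pd

# assumed row names; the real TOTAL_ROW_NAMES values are not shown in the diff
TOTAL_ROW_NAMES = ['ind_tax', 'payroll_tax', 'combined_tax']


def append_year(pdf, year_n):
    """Revise all column names in pdf by appending _<year_n> (in place)."""
    pdf.columns = [str(col) + '_{}'.format(year_n) for col in pdf.columns]
    return pdf


def json_table_sketch(pdf, row_names):
    """Hypothetical stand-in for create_json_table: row name -> row values."""
    return {name: list(row)
            for name, row in zip(row_names, pdf.itertuples(index=False))}


year_n = 2
# one-column DataFrame of three aggregates, built the way aggr_d is built
aggr_d = pd.DataFrame(data=[-10.0, 4.0, -6.0], index=TOTAL_ROW_NAMES)

# non-JSON path: rename columns on a copy so aggr_d itself is unchanged
renamed = append_year(aggr_d.copy(), year_n)

# JSON path: suffix row names with the year, then keep each row's only value
total_row_names_n = [x + '_' + str(year_n) for x in TOTAL_ROW_NAMES]
aggr_d_table_n = json_table_sketch(aggr_d, total_row_names_n)
aggr_d_table_n = dict((k, v[0]) for k, v in aggr_d_table_n.items())
```

Like the nested function in the diff, append_year mutates the DataFrame it receives, which is why the sketch passes it a copy.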