Different test results on pr-261-MH branch #265

Closed

martinholmer
Contributor

I'm getting different PUF test_data results on a local branch that includes the code changes in PR #261.
Here is what I've done:

  1. Include all recent code changes on master branch in my local master branch
  2. Execute make git-pr N=261
  3. Execute git merge master to include recent master changes in the local copy of PR Include Dependent Filers in PUF #261
  4. Execute make puf_data/cps-matched-puf.csv
  5. Execute make puf_data/puf.csv
  6. Execute test_pufcsv_data
  7. Copy puf_agg_actual.txt to puf_agg_expect.txt

@anderson, note the maximum age_spouse difference and others.
Have I done something wrong? Why are the test results different?
Can it be that the float_format='%.2f' used to write the cps-matched-puf.csv file makes a difference?
What version of pandas are you using? I'm using 0.23.0 pandas.
Do the test_pufcsv_data results I'm getting look "fishy" in any way?
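
On the float_format question, here is a minimal sketch of what a '%.2f' format does to a value on a write-then-read round trip. It uses plain Python string formatting as a stand-in for the CSV writer; the sample values and the helper name round_trip are made up for illustration and are not taken from the Matching code.

```python
# Stand-in for writing a float with float_format='%.2f' and reading it back.
# The values below are hypothetical examples, not PUF data.

def round_trip(value, fmt='%.2f'):
    """Format a float the way the CSV would store it, then parse it again."""
    return float(fmt % value)

samples = [20.9, 1234.56789, 0.001]

for original in samples:
    restored = round_trip(original)
    # Values with more than two decimal places do not survive unchanged.
    print(original, restored, original == restored)
```

Rounding of this kind is deterministic, so on its own it should alter the file in the same way on every machine. It could presumably contribute to different results only if the values being written already differ slightly before they reach the writer.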

@andersonfrailey
Collaborator

@martinholmer I'm using version 0.22.0 of pandas, which is causing an error after PR #262 because pd.concat doesn't take sort as an argument. So there might be a difference in Pandas versions causing this discrepancy.

@martinholmer
Contributor Author

@andersonfrailey, one possibility not mentioned earlier in #265 is that the two raw PUF and CPS files I have on my computer are not the same as the ones you have on your computer. Here are the byte size and MD5 checksum for the two raw files:

iMac2:Matching mrh$ ls -l puf2011.csv 
-rw-r--r--@ 1 mrh  staff  91305290 Jul 23 17:44 puf2011.csv

iMac2:Matching mrh$ md5 puf2011.csv 
MD5 (puf2011.csv) = 75f71a76baf2dedaa647c1baeee1b9ff

iMac2:Matching mrh$ ls -l cpsmar2016.csv 
-rw-r--r--@ 1 mrh  staff  296097633 Jul 24 09:20 cpsmar2016.csv

iMac2:Matching mrh$ md5 cpsmar2016.csv 
MD5 (cpsmar2016.csv) = e15da5b3fed60db7ec9ebae4fe59178d

Do you get the same size and checksum values on your computer?
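
For anyone without the md5 command, a minimal Python sketch that reports the same two facts (byte size and MD5 hex digest) on any platform. The function name file_fingerprint and the chunk size are illustrative choices; the two file names are the raw files listed above and are only looked up in the current directory.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex_digest) for the file at path."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        # Read in chunks so large CSV files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


if __name__ == '__main__':
    for name in ('puf2011.csv', 'cpsmar2016.csv'):
        if os.path.exists(name):
            print(name, *file_fingerprint(name))
```

The hex digest should match what md5 (macOS) or md5sum (Linux) prints for the same file.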

@martinholmer
Contributor Author

martinholmer commented Jul 24, 2018

@andersonfrailey said:

I'm using version 0.22.0 of pandas, which is causing an error after PR #262 because pd.concat doesn't take sort as an argument. So there might be a difference in Pandas versions causing this discrepancy.

Can you upgrade to 0.23.0 to see if you get the results I'm getting? Also, look at the concat documentation for 0.23.0 to see what the value of sort should be given what you're doing. My hunch is that before 0.23.0 pandas didn't realize there was an ambiguity when using concat in some instances.

@hdoupe

@andersonfrailey
Collaborator

I get the same md5 results as you, @martinholmer. Just updated my pandas version and am running everything again.

@martinholmer
Contributor Author

martinholmer commented Jul 24, 2018

@andersonfrailey said:

I get the same md5 results [for the raw data files] as you, @martinholmer. Just updated my pandas version and am running everything again.

Here's my report: when I upgraded from pandas 0.23.0 to 0.23.3 I got slightly different results.
I'll post those results in a new commit to #265. Click on this commit f035a94 to see the differences between 0.23.0 and 0.23.3 pandas.

@martinholmer
Contributor Author

@andersonfrailey, continuing to use pandas 0.23.3 (the latest version), I find that make puf_data/puf.csv produces exactly the same results no matter what value is used for the concat function argument sort. Both True and False produce the same puf_agg test results.

@martinholmer
Contributor Author

@andersonfrailey, if you glance through pandas issue 4588, which was opened in 2013 and was just recently "resolved" in 0.23.0, with a bug fix following in 0.23.1, you can see a possible reason why we are getting different results with pandas 0.22 and 0.23.1+. The title of that issue is BUG: concat unwantedly sorts DataFrame column names if they differ.

The discussion of this issue (bug report) eventually came up with the idea of adding the sort parameter to the concat function. That was introduced in 0.23.0, but apparently had a bug that was fixed in the 0.23.1 version.
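
A minimal sketch of what the sort parameter controls, assuming pandas 0.23 or later (where the parameter exists). The two toy frames and their column names are made up for illustration; they are not the frames concatenated in the Matching code.

```python
import pandas as pd

# Two frames whose column names differ, which is the case the issue is about.
left = pd.DataFrame({'b': [1, 2], 'a': [3, 4]})
right = pd.DataFrame({'a': [5, 6], 'c': [7, 8]})

unsorted = pd.concat([left, right], sort=False, ignore_index=True)
sorted_ = pd.concat([left, right], sort=True, ignore_index=True)

print(list(unsorted.columns))  # ['b', 'a', 'c']  (order of first appearance)
print(list(sorted_.columns))   # ['a', 'b', 'c']  (alphabetical)

# Only the column order differs; the values in each column are the same.
same_values = unsorted[sorted_.columns].equals(sorted_)
print(same_values)  # True
```

If downstream code selects columns by name, the two settings should give the same data, which would be consistent with sort making no difference once the output columns are written in a fixed order.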

@hdoupe had an experience like this with pandas last year, when the pandas developers broke our code (and many other people's code) by introducing, without any notice, a change in groupby behavior.

If you can get the same results as I get when you use pandas 0.23.3 (the latest version), then I think we should consider changing the environment and conda recipe to pandas >= 0.23.3 and move on with our work. What do you think?

@andersonfrailey
Collaborator

andersonfrailey commented Jul 25, 2018

@martinholmer said:

If you can get the same results as I get when you use pandas 0.23.3 (the latest version), then I think we should consider changing the environment and conda recipe to pandas >= 0.23.3 and move on with our work. What do you think?

I think that's sensible. I'll update to the latest version of Pandas and compare results.

@martinholmer
Contributor Author

The tip of pull request #265 uses pandas 0.23.3 (the latest version) and changes the concat call from sort=True to sort=False and then does "make all". As mentioned previously, when using 0.23.3 the value of sort makes no difference: True and False produce the same puf.csv file. (But that seemed not to be the case when using pandas 0.23.0, which suggests there was a problem in 0.23.0 that has been resolved.)

The new puf.csv, puf_weights.csv.gz, and puf_ratios.csv files pass all the tests.

This pull request is not for merging, but put on GitHub in order to facilitate comparison of results generated on other local computers.

@andersonfrailey

@andersonfrailey
Collaborator

@martinholmer, even with the new Pandas version I'm getting different results than you, but they haven't changed with regard to PR #261. I'm going to push up all my changes to #261. Maybe there's a commit somewhere that didn't get pushed before you started testing.

I'll also start #261 from scratch and see what happens. Shouldn't take very long to get initial results.

@andersonfrailey
Collaborator

Even starting #261 from scratch I get the same results. @martinholmer can you try updating your branch and running everything again?

@hdoupe
Collaborator

hdoupe commented Jul 26, 2018

[UPDATE: Puts resulting files in a directory on your regular file system]

Here are the results that I got:

HDoupe-MacBook-Pro:taxdata_265 henrydoupe$ md5 artifacts/cps-matched-puf.csv 
MD5 (artifacts/cps-matched-puf.csv) = 246299f18a5688ad8bb6f2dad9ed74a6
HDoupe-MacBook-Pro:taxdata_265 henrydoupe$ md5 artifacts/puf.csv 
MD5 (artifacts/puf.csv) = 92254771dd81331174d96a821aa0d134
HDoupe-MacBook-Pro:taxdata_265 henrydoupe$ ls

Here's the Dockerfile:

FROM continuumio/miniconda3

# install make
RUN apt-get update && apt-get -y install build-essential

WORKDIR /home
RUN git clone https://github.com/open-source-economics/taxdata

COPY ./puf2011.csv /home/taxdata/puf_data/StatMatch/Matching/
COPY ./cpsmar2016.csv /home/taxdata/puf_data/StatMatch/Matching/

WORKDIR /home/taxdata/

RUN git config --global user.email "[email protected]" && git config --global user.name "hdoupe"
RUN git fetch origin
RUN git fetch origin pull/261/head:dsifix
RUN git checkout dsifix
RUN git merge origin/master

# create conda environment
RUN conda env create

# directory where the results will go
RUN mkdir /home/artifacts

CMD ["/bin/bash"]

To build, set your directory like so (artifacts is a directory where the results will go):

HDoupe-MacBook-Pro:taxdata_265 henrydoupe$ ls
Dockerfile	artifacts	cpsmar2016.csv	puf2011.csv
HDoupe-MacBook-Pro:taxdata_265 henrydoupe$ 

Build the image:

docker build -t taxdata:pr261 ./

Run the container:

docker run -v $PWD/artifacts:/home/artifacts -it taxdata:pr261 /bin/bash

Run the following commands:

source activate taxdata-dev
make puf_data/cps-matched-puf.csv
make puf_data/puf.csv
md5sum puf_data/puf.csv 
md5sum puf_data/cps-matched-puf.csv 
cp puf_data/puf.csv /home/artifacts
cp puf_data/cps-matched-puf.csv /home/artifacts/

I hope this helps. I got exactly the same checksums locally. Perhaps I did something wrong in this process. The upside is that everyone can see exactly how the environment and files were set up, and any errors can be fixed relatively painlessly.

@martinholmer
Contributor Author

@hdoupe, Thanks for all the docker work.

The checksums are a good start, but it seems to me the next step is to compare puf.csv records with the same RECID value in the puf.csv files generated on different computers or under different environments on the same computer.
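
A minimal sketch of that record-level comparison, using only the standard library. It assumes both files have a RECID column that uniquely identifies each record; the function names and the idea of comparing raw strings are illustrative choices, not part of the taxdata code.

```python
import csv


def load_by_recid(path, key='RECID'):
    """Read a CSV file into a dict mapping each RECID to its row."""
    with open(path, newline='') as handle:
        return {row[key]: row for row in csv.DictReader(handle)}


def diff_records(path_a, path_b, key='RECID'):
    """Return (diffs, only_in_a, only_in_b) for two puf.csv-style files.

    diffs maps RECID -> {column: (value_in_a, value_in_b)} for every
    column whose text differs between the two files.
    """
    a = load_by_recid(path_a, key)
    b = load_by_recid(path_b, key)
    diffs = {}
    for recid in sorted(a.keys() & b.keys()):
        changed = {col: (a[recid][col], b[recid].get(col))
                   for col in a[recid]
                   if a[recid][col] != b[recid].get(col)}
        if changed:
            diffs[recid] = changed
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    return diffs, only_in_a, only_in_b
```

Because the values are compared as text, "1" and "1.0" count as different; converting to numbers first would be a reasonable variation if formatting differences are not of interest.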

@andersonfrailey

@hdoupe
Collaborator

hdoupe commented Jul 26, 2018

Sure, I'll update the comment above to show how to transfer the files from the container back to your local host file system.

@hdoupe
Collaborator

hdoupe commented Jul 26, 2018

I updated my comment with a command to put files in the regular file system.

@andersonfrailey
Collaborator

I created a version of puf.csv in the Docker container @hdoupe created and the file I got was the exact same as what he got.

@martinholmer
Contributor Author

@andersonfrailey said:

I created a version of puf.csv in the Docker container @hdoupe created and the file I got was the exact same as what he got.

What does that mean? You said this morning that @hdoupe had generated a puf.csv file that was different from what you had generated and from what I had generated.

Is the docker-generated puf.csv you just generated exactly the same as the ones you've been generating over the past few days without using docker?

@hdoupe
Collaborator

hdoupe commented Jul 27, 2018

One way to think about it is that the resulting files generated by docker (I generated the same ones without docker, too) are the "correct" ones if you agree with the steps used to create the docker image and run the file creation scripts. I say "correct" because the process for setting up the environment and running the scripts is much more transparent: you can see each and every step taken. Further, you don't have to worry about other factors that could be affecting your results, like what was in your conda build cache when the environment was built.

The next step that I would take would be to try to recreate the files without docker using a process similar to the docker container process outlined above. If you can't, then there may be some local environment differences that are affecting the results. If you can, and you agree with the steps taken to check out the correct git branch, merge in the correct changes, use the correct files, install the correct packages, and run the correct scripts, then that is probably the correct file.

I'm happy to continue this discussion. If there are any questions or misunderstandings about docker, I am happy to address them or reference the appropriate documentation.

@martinholmer
Contributor Author

@hdoupe and @andersonfrailey, The use of docker to figure out why we all got different puf.csv files without using docker is an excellent idea. But none of the discussion here shows exactly which python/package environment docker is creating. There are many environments (probably in the hundreds) that are consistent with the current taxdata/environment.yml file. Which environment did docker create?
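
One way to record which environment actually produced a file is to print the interpreter and package versions from inside that environment. The sketch below is an assumption about what would be useful to report: the function name environment_report and the default package list are illustrative, and it relies on each package exposing a __version__ attribute.

```python
import importlib
import platform
import sys


def environment_report(packages=('numpy', 'pandas', 'scipy', 'statsmodels')):
    """Return a dict of the Python version and selected package versions."""
    report = {'python': platform.python_version(), 'platform': sys.platform}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, '__version__', 'unknown')
        except ImportError:
            # Record the absence rather than failing, so reports stay comparable.
            report[name] = 'not installed'
    return report


if __name__ == '__main__':
    for key, value in environment_report().items():
        print(key, value)
```

Running this in the docker container and in each local environment would give short listings that can be pasted side by side.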

@martinholmer
Contributor Author

@hdoupe, I don't understand your use of md5sum. On my Mac, I see this:

iMac2:taxdata mrh$ cd puf_data
iMac2:puf_data mrh$ md5 puf.csv
MD5 (puf.csv) = 33797d7ae7fc098fc8df468de7c17ce5
iMac2:puf_data mrh$ md5sum puf.csv
-bash: md5sum: command not found
iMac2:puf_data mrh$ 

What's the md5sum program?

@andersonfrailey
Collaborator

@martinholmer it's my understanding that the Docker container is creating an environment with just the latest versions of the packages specified in environment.yml and their dependencies. Here is what I get when I run conda list in the container:

# Name                    Version                   Build  Channel
atomicwrites              1.1.5                    py36_0    conda-forge
attrs                     18.1.0                     py_1    conda-forge
blas                      1.1                    openblas    conda-forge
bokeh                     0.13.0                   py36_0    conda-forge
bzip2                     1.0.6                h470a237_2    conda-forge
ca-certificates           2018.4.16                     0    conda-forge
certifi                   2018.4.16                py36_0    conda-forge
jinja2                    2.10                       py_1    conda-forge
libffi                    3.2.1                         3    conda-forge
libgcc-ng                 7.2.0                hdf63c60_3  
libgfortran               3.0.0                         1  
libstdcxx-ng              7.2.0                hdf63c60_3  
markupsafe                1.0                      py36_0    conda-forge
more-itertools            4.2.0                    py36_1    conda-forge
ncurses                   6.1                  hfc679d8_1    conda-forge
numpy                     1.14.5          py36_blas_openblashd3ea46f_201  [blas_openblas]  conda-forge
openblas                  0.2.20                        8    conda-forge
openssl                   1.0.2o                        0    conda-forge
packaging                 17.1                       py_0    conda-forge
pandas                    0.23.3                   py36_0    conda-forge
patsy                     0.5.0                      py_1    conda-forge
pip                       18.0                     py36_0    conda-forge
pluggy                    0.6.0                      py_0    conda-forge
pulp                      1.6.8                    py36_0    conda-forge
py                        1.5.4                      py_0    conda-forge
pyparsing                 2.2.0                      py_1    conda-forge
pytest                    3.6.3                    py36_0    conda-forge
python                    3.6.6                h5001a0f_0    conda-forge
python-dateutil           2.7.3                      py_0    conda-forge
pytz                      2018.5                     py_0    conda-forge
pyyaml                    3.12                     py36_1    conda-forge
readline                  7.0                  haf1bffa_1    conda-forge
scipy                     1.1.0           py36_blas_openblashd3ea46f_201  [blas_openblas]  conda-forge
setuptools                40.0.0                   py36_0    conda-forge
six                       1.11.0                   py36_1    conda-forge
sqlite                    3.24.0               h2f33b56_0    conda-forge
statsmodels               0.9.0                    py36_0    conda-forge
tk                        8.6.8                         0    conda-forge
tornado                   5.1                      py36_0    conda-forge
tqdm                      4.23.4                     py_0    conda-forge
wheel                     0.31.1                   py36_0    conda-forge
xz                        5.2.3                         0    conda-forge
yaml                      0.1.7                         0    conda-forge
zlib                      1.2.11               h470a237_3    conda-forge

It's possible that over time the taxdata-dev environments you and I have on our machines became polluted with different versions of packages that are affecting the results. If this is the case, it makes sense that @hdoupe gets the same results as the docker container because, to my knowledge, yesterday was the first time he had ever created the taxdata-dev environment and run the matching scripts, so, like the docker container, he's got a totally clean environment.

I'm going to try deleting and recreating my taxdata-dev environment to see if I get the same output files that were created by @hdoupe and in the docker container. If I do, I think we can say with confidence that the taxdata-dev environments you and I are running are the root cause of our troubles.

@martinholmer
Contributor Author

@andersonfrailey said:

I'm going to try deleting and recreating my taxdata-dev environment to see if I get the same output files that were created by @hdoupe and in the docker container. If I do, I think we can say with confidence that the taxdata-dev environments you and I are running are the root cause of our troubles.

So, you're saying that @hdoupe generated exactly the same puf.csv file without docker and with docker?
Is that right? If so, what is the MD5 checksum of that puf.csv file?

@andersonfrailey
Collaborator

@martinholmer that's correct. Here is the MD5 for that file:
92254771dd81331174d96a821aa0d134

@andersonfrailey
Collaborator

Good news! After deleting and recreating my taxdata-dev environment I produced the exact same puf.csv file as @hdoupe did and the exact same file as what was produced in the docker container.

Now that we've been able to produce the same file across two machines and in a docker container I feel confident in saying the issue was with packaging. @martinholmer would you mind running conda list in the environment you were using to create your files so we can compare versions of everything and try to pin down which package was causing us problems?
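
To make that comparison less error-prone than reading two long listings by eye, here is a minimal sketch that parses two conda list outputs and reports only the packages whose versions differ. The function names are illustrative. The first sample is a three-line excerpt of the container listing above; the second sample is hypothetical and stands in for another machine's output.

```python
def parse_conda_list(text):
    """Map package name -> version from the text printed by `conda list`."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and the header comments
        parts = line.split()
        if len(parts) >= 2:
            # If a name is listed twice, the last entry wins.
            versions[parts[0]] = parts[1]
    return versions


def diff_environments(text_a, text_b):
    """Return {name: (version_a, version_b)} for packages that differ."""
    a, b = parse_conda_list(text_a), parse_conda_list(text_b)
    return {name: (a.get(name), b.get(name))
            for name in sorted(a.keys() | b.keys())
            if a.get(name) != b.get(name)}


container = """\
# Name                    Version                   Build  Channel
pandas                    0.23.3                   py36_0    conda-forge
patsy                     0.5.0                      py_1    conda-forge
python                    3.6.6                h5001a0f_0    conda-forge
"""

# Hypothetical listing from a second machine, for illustration only.
other = """\
# Name                    Version                   Build  Channel
pandas                    0.23.3                   py27_0
patsy                     0.4.1                    py27_0
python                    2.7.15                        0
"""

print(diff_environments(container, other))
# {'patsy': ('0.5.0', '0.4.1'), 'python': ('3.6.6', '2.7.15')}
```

A package present in only one listing shows up with None on the other side, which makes stray extras easy to spot.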

@martinholmer
Contributor Author

@andersonfrailey said:

If I do, I think we can say with confidence that the taxdata-dev environments you and I are running are the root cause of our troubles.

I would not be surprised if this turns out to be true.

But I think you're not giving enough consideration to another possibility: that the Matching code (and there is a lot of it) has been written in a way that does not respect the differences between Python 2.7 and Python 3.6.
I actually have proof, as of this morning, that this is a fact. On my local branch of PR#261, I generate a puf.csv file with this checksum:

iMac2:taxdata mrh$ md5 puf_data/puf*csv
MD5 (puf_data/puf-MH-0724.csv) = 33797d7ae7fc098fc8df468de7c17ce5
MD5 (puf_data/puf-MH.csv) = 33797d7ae7fc098fc8df468de7c17ce5

But when I add this line at the top of each Python file in the Matching subdirectory, I generate a different puf.csv file:

from __future__ import division

Here is the info on my newly generated (under Python 2.7) puf.csv file now that integer division produces floating-point values, as on Python 3.6:

iMac2:taxdata mrh$ md5 puf_data/puf*csv
MD5 (puf_data/puf-MH-0724.csv) = 33797d7ae7fc098fc8df468de7c17ce5
MD5 (puf_data/puf-MH.csv) = 33797d7ae7fc098fc8df468de7c17ce5
MD5 (puf_data/puf.csv) = 6cac486cbec54b5c0d5c378910672e69

Why is the puf.csv file different? Probably because some of the Matching code divides two integers and expects to get a floating-point result (which is what happens in Python 3.6). But under Python 2.7, dividing two integers performs integer division. Here are some examples of lines of code that might fail in this regard:

grep -nH -e " / " *py
add_cps_vars.py:25:    puffile['wt'] = puffile['s006'] / 100
cps_rets.py:355:                record['130'] = wasp / float(totalwas)
cps_rets.py:692:                        if income / float(totincx + income) < 0.5:
cps_rets.py:775:            if indjs == 1 and float(totincx) / income > 0.99:
phase1.py:70:        wageshr = was / tpi
phase1.py:71:        capshr = (intst + texint + dbe) / tpi
phase1.py:159:                                countx['SOI_wgt'] / countx['CPS_wgt'], 0)
phase2.py:21:        factor = SOI['wt'].sum() / CPS['wt'].sum()
soi_rets.py:59:    wt = SOI['s006'] / 100

It looks to me like the S006 values in the puf2011.csv file are all integers. So, the line above from the add_cps_vars.py file gives different results on Python 2.7 (without the future import) and Python 3.6.
For example, if puffile['s006'] is 2090, the Python 3.6 division result is 20.90 while under Python 2.7 it is the integer 20.
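
The arithmetic can be reproduced in Python 3 alone, where the // operator gives the result that / gave for two integers under Python 2.7 without the future import. This is a sketch on a plain integer rather than on the puffile column, and the variable names are made up for illustration.

```python
s006 = 2090  # the example weight value from the text above

true_division = s006 / 100    # Python 3 behavior of "/"
floor_division = s006 // 100  # what "/" returned for two ints in Python 2.7

print(true_division)   # 20.9
print(floor_division)  # 20

# Making one operand a float gives the same answer on both Python versions,
# which is the approach the float(...) calls in cps_rets.py already take.
version_independent = s006 / 100.0
print(version_independent)  # 20.9
```

Writing the divisor as 100.0 (or adding the future import to every module) would remove this particular source of 2.7-versus-3.6 differences.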

The fact that adding from __future__ import division did not produce the same MD5 checksum as from the docker environment suggests, to me, that there are 2.7-vs-3.6 incompatibilities in the Matching code other than integer division.

@hdoupe

@martinholmer
Contributor Author

@andersonfrailey said:

would you mind running conda list in the environment you were using to create your files so we can compare versions of everything and try to pin down which package was causing us problems?

Here is what I have on my computer:

iMac2:taxdata mrh$ conda list
# packages in environment at /Users/mrh/anaconda:
#
# Name                    Version                   Build  Channel
_license                  1.1                      py27_1  
alabaster                 0.7.10           py27h9dd7d6e_0  
anaconda                  custom           py27h2cfa9e9_0  
anaconda-client           1.6.5            py27hc13fba8_0  
anaconda-navigator        1.6.9            py27h103b016_0  
anaconda-project          0.8.0            py27h9e3d455_0  
appnope                   0.1.0            py27hb466136_0  
appscript                 1.0.1            py27h451298e_1  
argcomplete               1.0.0                    py27_1  
asn1crypto                0.22.0           py27h61af4a7_1  
astroid                   1.5.3            py27h96f3fd4_0  
astropy                   2.0.2            py27h87cc2bd_4  
attrs                     17.4.0                   py27_0  
babel                     2.5.0            py27h7311c9e_0  
backports                 1.0              py27hb4f9756_1  
backports.functools_lru_cache 1.4              py27h2aca819_1  
backports.shutil_get_terminal_size 1.0.0            py27hc9115de_2  
backports_abc             0.5              py27h6972548_0  
beautifulsoup4            4.6.0            py27h9416283_1  
bitarray                  0.8.1            py27hd5bfd95_0  
bkcharts                  0.2              py27haafc882_0  
blas                      1.0                         mkl  
blaze                     0.11.3           py27hb49378a_0  
bleach                    2.0.0            py27ha7d1710_0  
bokeh                     0.12.16                  py27_0  
boto                      2.48.0           py27hacdd0fd_1  
bottleneck                1.2.1            py27h71f98a3_0  
bzip2                     1.0.6                h92991f9_1  
ca-certificates           2018.03.07                    0  
cdecimal                  2.3              py27hf5d9fd9_1  
certifi                   2018.4.16                py27_0  
cffi                      1.10.0           py27haac214c_1  
chardet                   3.0.4            py27h2842e91_1  
chest                     0.2.3                    py27_0  
click                     6.7              py27h2b86a94_0  
cloudpickle               0.4.0            py27h665dddb_0  
clyent                    1.2.2            py27hc0ae608_0  
colorama                  0.3.9            py27hbbe92b6_0  
conda                     4.5.8                    py27_0  
conda-build               3.0.10           py27h2b31a83_0  
conda-env                 2.6.0                h36134e3_0  
conda-verify              2.0.0                    py27_0  
configobj                 5.0.6                    py27_0  
configparser              3.5.0            py27hc7edf1b_0  
contextlib2               0.5.5            py27h9cb85f4_0  
coverage                  4.4.1                    py27_0  
cryptography              2.0.3            py27hab69567_1  
curl                      7.55.1               h7601780_3  
cycler                    0.10.0           py27hfc73c78_0  
cylp                      0.7.1                     <pip>
cython                    0.26.1           py27h6a053f9_0  
cytoolz                   0.8.2            py27ha56cda1_0  
dask                      0.15.3           py27h8a1b457_0  
dask-core                 0.15.3           py27h6a32bf4_0  
datashape                 0.5.4            py27hd6a1745_0  
dbus                      1.10.22              h50d9ad6_0  
decorator                 4.1.2            py27h9f877ea_0  
dill                      0.2.5                    py27_0  
distributed               1.19.1           py27h67dd2ec_0  
docutils                  0.14             py27h0befae3_0  
entrypoints               0.2.3            py27hd680fb1_2  
enum34                    1.1.6            py27hf475452_1  
et_xmlfile                1.0.1            py27hc42f929_0  
execnet                   1.3.0                    py27_1  
expat                     2.2.4                h8f26bf8_1  
fastcache                 1.0.2            py27hc4635c7_0  
filelock                  2.0.12           py27h72fe922_0  
flask                     0.12.2           py27h3ac5568_0  
flask-cors                3.0.3            py27h13db576_0  
freetype                  2.9.1                hb4e5f40_0  
funcsigs                  1.0.2            py27hb9f6266_0  
functools32               3.2.3.2          py27h8ceab06_1  
futures                   3.1.1            py27hb02a37a_0  
get_terminal_size         1.0.0                h7520d66_0  
gettext                   0.19.8.1             hb0f4f8b_2  
gevent                    1.2.2            py27hc02608c_0  
glib                      2.53.6               ha08cb78_1  
glob2                     0.5              py27h2f8fe13_0  
gmp                       6.1.2                h4a9834d_0  
gmpy2                     2.0.8            py27h7e2fca4_1  
greenlet                  0.4.12           py27h081ed54_0  
grin                      1.2.1            py27hc43e5f3_1  
h5py                      2.7.0            py27h217cc45_1  
hdf5                      1.10.1               h6090a45_0  
heapdict                  1.0.0            py27hb5e74ad_0  
html5lib                  0.999999999      py27hec7e2bc_0  
icu                       58.2                 hea21ae5_0  
idna                      2.6              py27hedea723_1  
imageio                   2.2.0            py27h37746d9_0  
imagesize                 0.7.1            py27h4f7bcc8_0  
intel-openmp              2018.0.0             h68bdfb3_7  
ipaddress                 1.0.18           py27h5b9a5b9_0  
ipykernel                 4.6.1            py27h1e70a78_0  
ipython                   5.4.1            py27h2b3d779_1  
ipython_genutils          0.2.0            py27h8b9a179_0  
ipywidgets                7.0.0            py27h3e52029_0  
isort                     4.2.15           py27h5bf637f_0  
itsdangerous              0.24             py27h3948ded_1  
jbig                      2.1                  h4d881f8_0  
jdcal                     1.3              py27hfeaf94f_0  
jedi                      0.10.2           py27hbb5dc62_0  
jinja2                    2.9.6            py27h92590e2_1  
jpeg                      9b                   haccd157_1  
jsonschema                2.6.0            py27hd9b497e_0  
jupyter                   1.0.0            py27hec63c99_0  
jupyter_client            5.1.0            py27hfaf569a_0  
jupyter_console           5.2.0            py27h9702a86_1  
jupyter_core              4.3.0            py27hd5161ba_0  
jupyterlab                0.27.0           py27h25d4955_2  
jupyterlab_launcher       0.4.0            py27he518b91_0  
kiwisolver                1.0.1            py27h9856860_0  
lazy-object-proxy         1.3.1            py27h712ce3f_0  
libcxx                    4.0.1                h579ed51_0  
libcxxabi                 4.0.1                hebd6815_0  
libedit                   3.1                  hb4e282d_0  
libffi                    3.2.1                hd939716_3  
libgfortran               3.0.1                h93005f0_2  
libiconv                  1.15                 h99df5da_5  
libopenblas               0.2.20               hdc02c5d_4  
libpng                    1.6.34               he12f830_0  
libsodium                 1.0.13               hba5e272_2  
libssh2                   1.8.0                h1218725_2  
libtiff                   4.0.9                hcb84e12_1  
libxml2                   2.9.4                hbd0960b_5  
libxslt                   1.1.29               h95a2935_5  
llvmlite                  0.22.0           py27h0df46ed_0  
locket                    0.2.0            py27ha10513d_1  
lxml                      4.1.0            py27hcb5f3a6_0  
lzo                       2.10                 hb6b8854_1  
markupsafe                1.0              py27hd3c86fa_1  
matplotlib                2.2.2            py27hbf02d85_2  
mccabe                    0.6.1            py27h1f69e8d_0  
memory_profiler           0.52.0                   py27_0  
mistune                   0.7.4            py27h1658d75_0  
mkl                       2018.0.3                      1  
mkl-service               1.1.2                    py27_3  
mkl_fft                   1.0.1            py27h917ab60_0  
mkl_random                1.0.1            py27h78cc56f_0  
mock                      2.0.0                    py27_0  
mpc                       1.0.3                hc455b36_4  
mpfr                      3.1.5                h7fa3772_1  
mpmath                    0.19             py27h09cdc99_2  
msgpack-python            0.4.8            py27h635ded4_0  
multipledispatch          0.4.9            py27h10993aa_0  
navigator-updater         0.1.0            py27ha63e0b4_0  
nbconvert                 5.3.1            py27h6455e4c_0  
nbformat                  4.4.0            py27hddc86d0_0  
ncurses                   6.0                  ha932d30_1  
networkx                  2.0              py27h2503496_0  
nltk                      3.2.4            py27h1626047_0  
nose                      1.3.7            py27h2ee3cb8_2  
notebook                  5.0.0            py27h5f5981d_2  
numba                     0.37.0          np114py27h6027bcc_0  
numexpr                   2.6.5            py27h057f876_0  
numpy                     1.14.5           py27h648b28d_4  
numpy-base                1.14.5           py27ha9ae307_4  
numpydoc                  0.7.0            py27h022f19e_0  
odo                       0.5.1            py27h992a9f7_0  
olefile                   0.44             py27h73ba740_0  
openpyxl                  2.4.8            py27h70e7ed9_1  
openssl                   1.0.2o               h26aff7b_0  
packaging                 16.8             py27h24b219a_0  
pandas                    0.23.3           py27h6440ff4_0  
pandoc                    1.19.2.1             ha5e8f32_1  
pandocfilters             1.4.2            py27hed78c4e_1  
partd                     0.3.8            py27h7560dbf_0  
path.py                   10.3.1           py27h5e25276_0  
pathlib2                  2.3.0            py27he09da1e_0  
patsy                     0.4.1            py27h40ed276_0  
pbr                       1.10.0                   py27_0  
pcre                      8.41                 h29eefc5_0  
pep8                      1.7.0            py27hff3397c_0  
pexpect                   4.2.1            py27hc4e4961_0  
pickleshare               0.7.4            py27h37e3d41_0  
pillow                    5.1.0            py27hb68e598_0  
pip                       9.0.1            py27h61def0c_3  
pkginfo                   1.4.1            py27ha9221e7_0  
pluggy                    0.6.0            py27had36429_0  
ply                       3.10             py27h6279b8a_0  
policybrain-builder       0.1.0                     <pip>
policybrain-builder       0.0.1                     <pip>
prompt_toolkit            1.0.15           py27h4a7b9c2_0  
psutil                    5.4.3            py27h1de35cc_0  
ptyprocess                0.5.2            py27h70f6364_0  
pulp                      1.6.8                    py27_0    conda-forge
py                        1.5.2            py27he6783ac_0  
pyasn1                    0.1.9                    py27_0  
pyaudio                   0.2.7            py27h3777516_1  
pycodestyle               2.3.1            py27h5b634e0_0  
pycosat                   0.6.3            py27h6c51c7e_0  
pycparser                 2.18             py27h0d28d88_1  
pycrypto                  2.6.1            py27h4efa152_1  
pycurl                    7.43.0           py27h398a7fe_3  
pyflakes                  1.6.0            py27h4446e76_0  
pygments                  2.2.0            py27h1a556bb_0  
pylint                    1.7.4            py27hc678664_0  
pyodbc                    4.0.17           py27hc9de18c_0  
pyopenssl                 17.2.0           py27h732fe57_0  
pyparsing                 2.2.0            py27h5bb6aaf_0  
pyqt                      5.6.0            py27hf21fe59_6  
pysocks                   1.6.7            py27h1cff6a6_1  
pytables                  3.4.2            py27ha4551b8_2  
pytest                    3.4.2                    py27_0  
pytest-forked             0.2              py27hff25d5b_0  
pytest-xdist              1.22.1                   py27_0  
python                    2.7.15               h138c1fe_0  
python-dateutil           2.6.1            py27hd56c96b_1  
python.app                2                py27h84f57d0_6  
pytz                      2017.2           py27hb891d23_1  
pywavelets                0.5.2            py27hd99e88a_0  
pyyaml                    3.12             py27ha7932d0_1  
pyzmq                     16.0.2           py27he61c07e_2  
qt                        5.6.2               h9975529_14  
qtawesome                 0.4.4            py27hdeb2f59_0  
qtconsole                 4.3.1            py27hdc90b4f_0  
qtpy                      1.3.1            py27h39159f8_0  
readline                  7.0                  h81b24a6_3  
redis                     3.2.0                         0  
redis-py                  2.10.5                   py27_0  
requests                  2.18.4           py27h9b2b37c_1  
rope                      0.10.5           py27he855d02_0  
ruamel_yaml               0.11.14          py27h31666c4_2  
scandir                   1.6              py27h97aa1ee_0  
scikit-image              0.13.0           py27h03e84e1_1  
scikit-learn              0.19.1           py27h9788993_0  
scipy                     1.1.0            py27hf1f7d93_0  
seaborn                   0.8.0            py27h92884e4_0  
setuptools                36.5.0           py27h2a45cec_0  
simplegeneric             0.8.1            py27h6db5e31_0  
singledispatch            3.4.0.3          py27he22c18d_0  
sip                       4.18.1           py27h6300f65_2  
six                       1.11.0           py27h7252ba3_1  
snowballstemmer           1.2.1            py27h68ac032_0  
sockjs-tornado            1.0.3                    py27_0  
sortedcollections         0.5.3            py27h8094be4_0  
sortedcontainers          1.5.7            py27h322dbbf_0  
sphinx                    1.6.3            py27h11269f0_0  
sphinxcontrib             1.0              py27hd2ed746_1  
sphinxcontrib-websupport  1.0.1            py27h857890b_1  
spyder                    3.2.4            py27h93d9c3e_0  
sqlalchemy                1.1.13           py27hcbc9ed3_0  
sqlite                    3.23.1               hf1716c9_0  
ssl_match_hostname        3.5.0.1          py27h8780752_2  
statsmodels               0.9.0            py27h917ab60_0  
subprocess32              3.2.7            py27h24b2887_0  
sympy                     1.1.1            py27hce55102_0  
tblib                     1.3.2            py27ha684fc4_0  
terminado                 0.6              py27he40bf16_0  
testpath                  0.3.1            py27h72d81a5_0  
tk                        8.6.7                hcdce994_1  
toolz                     0.8.2            py27h27228c4_0  
tornado                   4.5.2            py27h29aec9e_0  
tqdm                      4.23.4                   py27_0  
traitlets                 4.3.2            py27hcf08151_0  
typing                    3.6.2            py27h646fea0_0  
unicodecsv                0.14.1           py27h170f95c_0  
unixodbc                  2.3.4                h4cb4dde_1  
urllib3                   1.22             py27hc3787e9_0  
wcwidth                   0.1.7            py27h817c265_0  
webencodings              0.5.1            py27h19a9f58_1  
werkzeug                  0.12.2           py27hcac71f8_0  
wheel                     0.29.0           py27h84bd1c0_1  
widgetsnbextension        3.0.2            py27h56f70de_1  
wrapt                     1.10.11          py27hd341262_0  
xlrd                      1.1.0            py27hbd41ed1_1  
xlsxwriter                1.0.2            py27h7f1064a_0  
xlwings                   0.11.4           py27h4d78f01_0  
xlwt                      1.2.0            py27hbeec4ae_0  
xz                        5.2.3                ha24016e_1  
yaml                      0.1.7                hff548bb_1  
zeromq                    4.2.2                h131e0f7_1  
zict                      0.1.3            py27h5fff8b1_0  
zlib                      1.2.11               h60db283_1  
iMac2:taxdata mrh$ 

@andersonfrailey
Collaborator

After a phone conversation with @martinholmer, I modified the matching scripts so that every division in them produces a floating-point result whether you're using Python 2.7 or 3.6. With that change I was able to produce the same file using both 2.7 and 3.6, and each matches what we've been getting from the Docker container.
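For readers unfamiliar with the underlying issue: in Python 2.7, `/` applied to two integers floors the result, while in Python 3.6 it returns a float. The snippet below is only a minimal sketch of that difference and of one way to guard against it. The function names `ratio` and `legacy_ratio` are hypothetical and are not taken from the matching scripts, and the actual change made there may look different.

```python
def ratio(numerator, denominator):
    """Divide so that the result is a float on both Python 2.7 and 3.6."""
    # Casting first means integer inputs can never trigger floor division.
    return float(numerator) / float(denominator)


def legacy_ratio(numerator, denominator):
    """Mimic what Python 2.7's `/` does with two ints: floor the result."""
    return numerator // denominator


half = ratio(1, 2)            # same value under either interpreter
floored = legacy_ratio(1, 2)  # what an unguarded int / int gives on 2.7
print(half, floored)
```

An alternative with the same effect is to put `from __future__ import division` at the top of each script, which makes `/` behave the Python 3 way under 2.7 as well.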

It looks like the combination of working in a clean environment and the changes I made at @martinholmer's suggestion to ensure the same results in 2.7 and 3.6 has solved our replicability problem. For a little more detail, I compared the environment I used with the one @martinholmer uses. Here are the only packages whose versions differ:

attrs                MH: 17.4.0       AF: 18.1.0
blas                 MH: 1.0          AF: 1.1
bokeh                MH: 0.12.16      AF: 0.13.0
ca-certificates      MH: 2018.03.07   AF: 2018.4.16
jinja2               MH: 2.9.6        AF: 2.10
libgfortran          MH: 3.0.1        AF: 3.0.0
ncurses              MH: 6.0          AF: 6.1
packaging            MH: 16.8         AF: 17.1
patsy                MH: 0.4.1        AF: 0.5.0
pip                  MH: 9.0.1        AF: 18.0
py                   MH: 1.5.2        AF: 1.5.4
pytest               MH: 3.4.2        AF: 3.6.3
python               MH: 2.7.15       AF: 3.6.6
python-dateutil      MH: 2.6.1        AF: 2.7.3
pytz                 MH: 2017.2       AF: 2018.5
setuptools           MH: 36.5.0       AF: 40.0.0
sqlite               MH: 3.23.1       AF: 3.24.0
tk                   MH: 8.6.7        AF: 8.6.8
tornado              MH: 4.5.2        AF: 5.1
tqdm                 MH: 4.23.4       AF: 4.24.0
wheel                MH: 0.29.0       AF: 0.31.1
xz                   MH: 5.2.3        AF: 5.2.4
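A comparison like the one above could be scripted along the following lines. This is a sketch, assuming both environments were dumped with `conda list` and that each line starts with the package name followed by its version. The helper names are hypothetical, and the second listing is abbreviated to name and version only, using values from the table above.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list` style output."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comment lines, and lines without a version column.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        # If a package is listed twice, the later entry wins.
        versions[fields[0]] = fields[1]
    return versions


def version_differences(mine, theirs):
    """Return {name: (mine, theirs)} for shared packages whose versions differ."""
    return {
        name: (mine[name], theirs[name])
        for name in sorted(set(mine) & set(theirs))
        if mine[name] != theirs[name]
    }


mh_listing = """
pandas                    0.23.3           py27h6440ff4_0
pytest                    3.4.2                    py27_0
python                    2.7.15               h138c1fe_0
"""
af_listing = """
pandas                    0.23.3
pytest                    3.6.3
python                    3.6.6
"""

diffs = version_differences(parse_conda_list(mh_listing), parse_conda_list(af_listing))
for name, (mh, af) in diffs.items():
    print("{:<20} MH: {:<12} AF: {}".format(name, mh, af))
```

Packages present in only one environment are ignored here; reporting them would need the set difference of the two name sets as well.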

I'm going to run all of the make files overnight to get a new PUF, new weights, and new ratios and will update PR #261 afterwards for others to test.

@martinholmer
Contributor Author

Conversation has continued in #261.


3 participants