Notes:
a) If your security office allows you to run the exe file generated by us,
then the installation is reduced to extracting a zip archive and
changing the config/settings_hasher.py file -- see [Section A](#Section A).
b) If your deployment machine (where you have PHI) has access to the internet, then please follow the steps in [Section B](#Section B).
c) If your deployment machine (where you have PHI) does not have access to the internet, then please follow the steps in [Section C](#Section C).
You will need a separate machine with internet access to be able to download the python libraries listed in the requirements-to-freeze.txt file.
- install git-for-windows
The benefit of this tool is that you get a Linux-lite environment
where you can grep, unzip, and find files easily.
- start the "Git Bash" console
- extract the hasher_software_44e42ba2a056dec3b826c3ebb4d8cb46.zip archive
$ unzip hasher_software_44e42ba2a056dec3b826c3ebb4d8cb46.zip
or copy the folder from the repo
../exe_releases/hasher_md5sum_44e42ba2a056dec3b826c3ebb4d8cb46/
- verify that the extracted folder structure looks like
.
|-- config
|   |-- logs.cfg
|   `-- settings_hasher.py
|-- hasher.exe
|-- logs
`-- phi.csv
- run the software with the sample input file phi.csv
$ hasher.exe
You should get some output indicating that a file was produced:
>> Wrote output file: phi_hashes.csv
The output file should have the following columns: `patid`, `F_L_D_G`, `F_L_D_R`
- replace the phi.csv with actual data and re-run the hasher.exe
- verify that the number of lines in both files is the same
$ wc -l phi.csv
$ wc -l phi_hashes.csv
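The line-count comparison in the last step can also be scripted. The following is a minimal sketch for the "Git Bash" console and is not part of the hasher software; the function name `same_line_count` is made up for illustration.

```shell
# Compare the number of lines in two files.
# Exit status: 0 = counts match, 1 = counts differ, 2 = a file is unreadable.
# NOTE: same_line_count is an illustrative helper, not shipped with the hasher.
same_line_count() {
    local in_file="$1" out_file="$2" f in_lines out_lines
    for f in "$in_file" "$out_file"; do
        if [ ! -r "$f" ]; then
            echo "Cannot read $f" >&2
            return 2
        fi
    done
    # Reading from stdin makes wc print only the number; tr strips any padding
    in_lines=$(wc -l < "$in_file" | tr -d '[:space:]')
    out_lines=$(wc -l < "$out_file" | tr -d '[:space:]')
    if [ "$in_lines" -eq "$out_lines" ]; then
        echo "OK: both files have $in_lines lines"
    else
        echo "MISMATCH: $in_file has $in_lines lines, $out_file has $out_lines lines"
        return 1
    fi
}
```

After pasting the function into the console, it could be called as `same_line_count phi.csv phi_hashes.csv`.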
- install git-for-windows
The benefit of this tool is that you get a Linux-lite
environment
where you can grep, unzip, and find files easily.
- download and install the latest python 3 release (python >= 3.4), for example with the python-3.6.5.exe installer
Note: Make sure that you have the option "Add Python to environment variables" checked when asked during installation.
- start the "Git Bash" console
- create a folder for storing dependencies
$ cd ~
$ mkdir deduper
- install the helper tool for isolating the installation files
$ pip install virtualenv
- create and activate the isolation environment
$ virtualenv venv
$ source venv/Scripts/activate
- verify that the prompt has changed and indicates (venv) as an active python environment
- install the software
$ pip install -U deduper
- create a directory for storing configuration and log files
$ mkdir -p ~/deduper/logs
- create a config file by downloading the config/example/settings_hasher.py.example file as a template
$ wget -O ~/deduper/settings_hasher.py https://raw.githubusercontent.com/ufbmi/onefl-deduper/master/config/example/settings_hasher.py.example
- save the test input file phi.csv
$ wget https://raw.githubusercontent.com/ufbmi/onefl-deduper/master/phi.csv
- display the software version and run it
$ hasher.exe -v
$ hasher.exe -c ~/deduper/settings_hasher.py
You should get some output indicating that a file was produced:
>> Wrote output file: ./phi_hashes.csv
The output file should have the following columns: `patid`, `F_L_D_G`, `F_L_D_R`
- replace the phi.csv with actual data and re-run the hasher.exe
- verify that the number of lines in both files is the same
$ wc -l phi.csv
$ wc -l phi_hashes.csv
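The check that the output file has the `patid`, `F_L_D_G` and `F_L_D_R` columns can be sketched as a small shell function. This is only an illustration: the name `has_columns` is hypothetical, and because the delimiter of the output file is not stated here, the function does a loose substring match on the first line instead of parsing the columns.

```shell
# Check that the first line of a file mentions every expected column name.
# Exit status: 0 = all names found, 1 = at least one missing, 2 = unreadable file.
# NOTE: has_columns is an illustrative helper, not shipped with the hasher.
has_columns() {
    local file="$1" header col missing=0
    shift
    if [ ! -r "$file" ]; then
        echo "Cannot read $file" >&2
        return 2
    fi
    header=$(head -n 1 "$file")
    for col in "$@"; do
        case "$header" in
            *"$col"*) ;;                                   # name appears in the header
            *) echo "Missing column: $col"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

A possible call is `has_columns phi_hashes.csv patid F_L_D_G F_L_D_R`, which prints nothing when all three names are present.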
Steps 1-5 are performed on a machine with internet access to obtain the installation files, which are then transferred to the machine without internet access.
- install git-for-windows
The benefit of this tool is that you get a Linux-lite
environment
where you can grep, unzip, and find files easily.
- download and install the latest python 3 release (python >= 3.4), for example with the python-3.6.5.exe installer
Note: Make sure that you have the option "Add Python to environment variables" checked when asked during installation.
- start the "Git Bash" console
- create a folder for storing installation files and change into it
$ cd ~
$ mkdir my_pypi
$ cd my_pypi
- download the installation files, the config file and the sample input file
$ pip download pandas
$ pip download virtualenv invoke deduper
$ wget -O settings_hasher.py https://raw.githubusercontent.com/ufbmi/onefl-deduper/master/config/example/settings_hasher.py.example
$ wget https://raw.githubusercontent.com/ufbmi/onefl-deduper/master/phi.csv
At this point the contents of the my_pypi folder should look something like:
deduper-0.0.7-py3-none-any.whl
dill-0.2.7.1.tar.gz
invoke-0.22.1-py3-none-any.whl
numpy-1.14.2-cp36-none-win32.whl
pandas-0.22.0-cp36-cp36m-win32.whl
pyodbc-4.0.23-cp36-cp36m-win32.whl
pyreadline-2.1.zip
python_dateutil-2.7.2-py2.py3-none-any.whl
pytz-2018.4-py2.py3-none-any.whl
setuptools_scm-2.0.0-py2.py3-none-any.whl
six-1.11.0-py2.py3-none-any.whl
SQLAlchemy-1.2.7.tar.gz
stevedore-1.28.0-py2.py3-none-any.whl
virtualenv_clone-0.3.0-py2.py3-none-any.whl
virtualenv-15.2.0-py2.py3-none-any.whl
virtualenvwrapper-4.8.2-py2.py3-none-any.whl
---
settings_hasher.py
phi.csv
- Transfer the my_pypi folder to the restricted windows machine
- install git-for-windows on the restricted windows machine
- Start the "Git Bash" executable on the restricted windows machine
- create a folder for storing the installation files
$ mkdir -p ~/deduper/logs
$ cd ~/deduper
Note: the next steps assume that the my_pypi folder is inside the ~/deduper folder
- create and activate the isolation environment
$ pip install --no-index --find-links=~/deduper/my_pypi virtualenv
$ virtualenv venv
$ source venv/Scripts/activate
- verify that the prompt has changed and indicates (venv) as an active python environment
- install the software
$ pip install --no-index --find-links=~/deduper/my_pypi deduper
- create a config file by using the config/example/settings_hasher.py.example file as a template
$ cp ~/deduper/my_pypi/settings_hasher.py ~/deduper/settings_hasher.py
- copy the test input file phi.csv
$ cp ~/deduper/my_pypi/phi.csv .
- display the software version and run it
$ hasher.exe -v
$ hasher.exe -c settings_hasher.py
You should get some output indicating that a file was produced:
>> Wrote output file: ./phi_hashes.csv
The output file should have the following columns: `patid`, `F_L_D_G`, `F_L_D_R`
- replace the phi.csv with actual data and re-run the hasher.exe
- verify that the number of lines in both files is the same
$ wc -l phi.csv
$ wc -l phi_hashes.csv
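Because the restricted machine cannot download anything that was forgotten, it may help to check the my_pypi folder before transferring it. The sketch below is an assumption-laden illustration, not part of the deduper package: the function names are hypothetical, and the check only looks for the explicitly downloaded packages plus the config and sample files, not for every transitive dependency (numpy, pytz and so on).

```shell
# Succeed if at least one existing path under directory $1 matches the glob $2.
# NOTE: has_match and check_offline_bundle are illustrative helpers only.
has_match() {
    local f
    for f in "$1"/$2; do        # $2 is left unquoted on purpose so the glob expands
        [ -e "$f" ] && return 0
    done
    return 1
}

# Report which of the expected downloads are missing from the folder $1.
# Exit status: 0 = everything found, 1 = something is missing.
check_offline_bundle() {
    local dir="$1" pattern missing=0
    for pattern in 'deduper-*' 'pandas-*' 'virtualenv-*' 'invoke-*' \
                   'settings_hasher.py' 'phi.csv'; do
        if ! has_match "$dir" "$pattern"; then
            echo "Missing from $dir: $pattern"
            missing=1
        fi
    done
    return "$missing"
}
```

On the machine with internet access this could be run as `check_offline_bundle ~/my_pypi`; no output means all six items were found.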