Installation Steps - Windows

Notes:

a) If your security office allows you to run the exe file generated by us, then the installation is reduced to extracting a zip archive and changing the config/settings_hasher.py file -- see [Section A](#Section A).

b) If your deployment machine (where you have PHI) has internet access, then please follow the steps from [Section B](#Section B).

c) If your deployment machine (where you have PHI) does not have internet access, then please follow the steps from [Section C](#Section C). In this case you still need a second machine with internet access to be able to download the python libraries listed in the requirements-to-freeze.txt file.

Section A - run a precompiled exe

  • install git-for-windows

The benefit of this tool is that you get a Linux-lite environment where you can grep, unzip, and find files easily.

  • start the "Git Bash" console

  • extract hasher_software_44e42ba2a056dec3b826c3ebb4d8cb46.zip archive

      $ unzip hasher_software_44e42ba2a056dec3b826c3ebb4d8cb46.zip
    

or copy the folder from the repo

../exe_releases/hasher_md5sum_44e42ba2a056dec3b826c3ebb4d8cb46/

  • verify that the extracted folder structure looks like

      .
      |-- config
      |   |-- logs.cfg
      |   `-- settings_hasher.py
      |-- hasher.exe
      |-- logs
      `-- phi.csv
    
  • run the software with the sample input file phi.csv

      $ hasher.exe
    

You should get some output indicating that a file was produced:

    >> Wrote output file: phi_hashes.csv

The output file should have the following columns: patid, F_L_D_G, F_L_D_R

  • replace the phi.csv with actual data and re-run the hasher.exe

  • verify that the number of lines in both files is the same

      $ wc -l phi.csv
      $ wc -l phi_hashes.csv
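
The line-count comparison above can also be scripted so that a mismatch is hard to overlook. The following is a minimal sketch; the function name `same_line_count` is hypothetical and is not part of the hasher software.

```shell
# Compare the number of lines in the input and output files.
# Usage: same_line_count phi.csv phi_hashes.csv
same_line_count() {
    # Reading from stdin makes `wc -l` print only the number, without the file name.
    in_lines=$(wc -l < "$1" | tr -d ' ')
    out_lines=$(wc -l < "$2" | tr -d ' ')
    if [ "$in_lines" -eq "$out_lines" ]; then
        echo "OK: both files have $in_lines lines"
    else
        echo "MISMATCH: $1 has $in_lines lines, $2 has $out_lines lines"
        return 1
    fi
}
```

The function returns a non-zero status on a mismatch, so it can be chained with `&&` in the "Git Bash" console.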
    

Section B - install on a machine with internet access

  1. install git-for-windows

The benefit of this tool is that you get a Linux-lite environment where you can grep, unzip, and find files easily.

  1. download and install the latest python 3 release (python >= 3.4) from python-3.6.5.exe

Note: Make sure that you have the option "Add Python to environment variables" checked when asked during installation.
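
One way to confirm that the option took effect is to check, once the "Git Bash" console is open, whether the `python` and `pip` commands can be found. This is only a sketch; the helper name `on_path` is hypothetical.

```shell
# Report whether a command can be found on the PATH.
on_path() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (not on PATH)"
        return 1
    fi
}

# Usage after installing Python:
#   on_path python && python --version
#   on_path pip && pip --version
```

If either command is reported as missing, re-running the Python installer with the option checked is usually simpler than editing the PATH by hand.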

  1. start the "Git Bash" console

  2. create a folder for storing dependencies

     $ cd ~
     $ mkdir deduper
    
  3. install the helper tool for isolating the installation files

     $ pip install virtualenv
    
  4. create and activate the isolation environment

     $ virtualenv venv
     $ source venv/Scripts/activate
    
  5. verify that the prompt has changed and indicates (venv) as an active python environment

  6. install the software

     $ pip install -U deduper
    
  7. create a directory for storing configuration and log files

     $ mkdir -p ~/deduper/logs
    
  8. create a config file by downloading config/example/settings_hasher.py.example file as a template

    $ wget https://raw.githubusercontent.com/ufbmi/onefl-deduper/master/config/example/settings_hasher.py.example -O ~/deduper/settings_hasher.py
    
  9. save the test input file phi.csv

    $ wget https://raw.githubusercontent.com/ufbmi/onefl-deduper/master/phi.csv
    
  10. display the software version and run it

    $ hasher.exe -v
    $ hasher.exe -c ~/deduper/settings_hasher.py
    

You should get some output indicating that a file was produced:

    >> Wrote output file: ./phi_hashes.csv

The output file should have the following columns: `patid`, `F_L_D_G`, `F_L_D_R`

  11. replace the phi.csv with actual data and re-run the hasher.exe

  12. verify that the number of lines in both files is the same

    $ wc -l phi.csv
    $ wc -l phi_hashes.csv
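
The expected column names mentioned above can be checked from the console as well. The sketch below assumes that the output file is comma-delimited and that the column names appear on its first line; neither detail is stated in this guide, so adjust `expected` if your output differs. The function name `has_expected_header` is hypothetical.

```shell
# Check that the first line of a CSV file matches the expected header.
# Assumption: the header is comma-delimited and is the first line of the file.
has_expected_header() {
    expected="patid,F_L_D_G,F_L_D_R"
    # Strip a trailing carriage return in case the file has Windows line endings.
    actual=$(head -n 1 "$1" | tr -d '\r')
    if [ "$actual" = "$expected" ]; then
        echo "OK: header is $actual"
    else
        echo "UNEXPECTED header: $actual"
        return 1
    fi
}

# Usage:
#   has_expected_header phi_hashes.csv
```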
    

Section C - install on a machine without internet access

Steps 1-5 are performed on a machine with internet access and are necessary to obtain the installation files, which will then be transferred to the machine without internet access.

  1. install git-for-windows

The benefit of this tool is that you get a Linux-lite environment where you can grep, unzip, and find files easily.

  1. download and install the latest python 3 release (python >= 3.4) from python-3.6.5.exe

Note: Make sure that you have the option "Add Python to environment variables" checked when asked during installation.

  1. start the "Git Bash" console

  2. create a folder for storing installation files

     $ cd ~
     $ mkdir my_pypi
     $ cd my_pypi
    
  3. download the installation files, the config file and sample input file

     $ pip download pandas
     $ pip download virtualenv invoke deduper
    
     $ wget https://raw.githubusercontent.com/ufbmi/onefl-deduper/master/config/example/settings_hasher.py.example -O settings_hasher.py
     $ wget https://raw.githubusercontent.com/ufbmi/onefl-deduper/master/phi.csv
    

    At this point the contents of the my_pypi folder should look something like:

     deduper-0.0.7-py3-none-any.whl
     dill-0.2.7.1.tar.gz
     invoke-0.22.1-py3-none-any.whl
     numpy-1.14.2-cp36-none-win32.whl
     pandas-0.22.0-cp36-cp36m-win32.whl
     pyodbc-4.0.23-cp36-cp36m-win32.whl
     pyreadline-2.1.zip
     python_dateutil-2.7.2-py2.py3-none-any.whl
     pytz-2018.4-py2.py3-none-any.whl
     setuptools_scm-2.0.0-py2.py3-none-any.whl
     six-1.11.0-py2.py3-none-any.whl
     SQLAlchemy-1.2.7.tar.gz
     stevedore-1.28.0-py2.py3-none-any.whl
     virtualenv_clone-0.3.0-py2.py3-none-any.whl
     virtualenv-15.2.0-py2.py3-none-any.whl
     virtualenvwrapper-4.8.2-py2.py3-none-any.whl
     ---
     settings_hasher.py
     phi.csv
    
  4. transfer the my_pypi folder to the restricted Windows machine

  5. install git-for-windows on the restricted Windows machine

  6. start the "Git Bash" console on the restricted Windows machine

  7. create a folder for storing the installation files

     $ mkdir -p ~/deduper/logs
     $ cd ~/deduper
    

Note: the next steps assume that the my_pypi folder is inside the ~/deduper folder
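
Before continuing, it may help to confirm that the transferred folder is in place and contains the files that the remaining steps rely on. This is a rough sketch; the function name `check_bundle` is hypothetical, and it only looks for the config file, the sample input file, and a deduper wheel, not for every dependency.

```shell
# Check that the transferred folder contains what the next steps need.
# Usage: check_bundle ~/deduper/my_pypi
check_bundle() {
    dir="$1"
    [ -d "$dir" ] || { echo "missing folder: $dir"; return 1; }
    status=0
    for name in settings_hasher.py phi.csv; do
        [ -f "$dir/$name" ] || { echo "missing file: $dir/$name"; status=1; }
    done
    # The wheel version may differ, so match on the file name prefix only.
    ls "$dir"/deduper-*.whl > /dev/null 2>&1 || { echo "missing deduper wheel in $dir"; status=1; }
    [ "$status" -eq 0 ] && echo "OK: $dir looks complete"
    return "$status"
}
```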

  1. create and activate the isolation environment

    $ pip install --no-index --find-links=~/deduper/my_pypi virtualenv
    $ virtualenv venv
    $ source venv/Scripts/activate
    
  2. verify that the prompt has changed and indicates (venv) as an active python environment

  3. install the software

    $ pip install --no-index --find-links=~/deduper/my_pypi deduper
    
  4. create a config file by using the config/example/settings_hasher.py.example file as a template

    $ cp ~/deduper/my_pypi/settings_hasher.py ~/deduper/settings_hasher.py
    
  5. copy the test input file phi.csv

    $ cp ~/deduper/my_pypi/phi.csv .
    
  6. display the software version and run it

    $ hasher.exe -v
    $ hasher.exe -c settings_hasher.py
    

    You should get some output indicating that a file was produced:

    >> Wrote output file: ./phi_hashes.csv
    

    The output file should have the following columns: patid, F_L_D_G, F_L_D_R

  7. replace the phi.csv with actual data and re-run the hasher.exe

  8. verify that the number of lines in both files is the same

    $ wc -l phi.csv
    $ wc -l phi_hashes.csv
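
Sections B and C both ask you to verify that the prompt shows (venv) after activation. If the prompt is hard to read, the same check can be made through the `VIRTUAL_ENV` environment variable, which the activate script sets. The helper name `venv_is_active` below is hypothetical.

```shell
# Report whether a virtual environment is active in the current shell.
# The activate script exports VIRTUAL_ENV; it is unset otherwise.
venv_is_active() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "no virtual environment is active"
        return 1
    fi
}
```

Running `venv_is_active` right after the `source` command should print the path of the venv folder.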