
Commit

added install.sh to append local repository to python search paths
Peter Habelitz committed Jan 22, 2019
1 parent ec96c12 commit 820026e
Showing 4 changed files with 121 additions and 102 deletions.
52 changes: 27 additions & 25 deletions README.md
Original file line number Diff line number Diff line change
@@ -11,8 +11,11 @@ phs is an ergonomic tool for performing hyperparameter searches on numerous computing resources
+ handy monitor and visualization functions for supervising the progress are already built-in
+ (fault tolerance)

## Installation
In order to use the full functionality of phs just clone this git repository to a local folder. Since phs itself is based on the package called phs you have to simply import it in your script.
## Standalone Installation
To use the full functionality of phs, just clone this git repository to a local folder. Installation then consists of executing the ```install.sh``` script, which permanently appends the absolute path of your git repo to the module search paths by creating a path configuration file (.pth).
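
To make the .pth mechanism concrete, the following sketch reproduces it in isolation. Every path involved is a throwaway temporary directory (a stand-in for the real repository and the real site directory), so nothing on the system is modified permanently:

```python
import os
import site
import sys
import tempfile

# Directory standing in for the cloned repository, containing one module.
repo = tempfile.mkdtemp()
with open(os.path.join(repo, "mymod.py"), "w") as f:
    f.write("VALUE = 42\n")

# Directory standing in for the user site-packages dir
# (what 'python -m site --user-site' would print).
sitedir = tempfile.mkdtemp()
with open(os.path.join(sitedir, "repo.pth"), "w") as f:
    f.write(repo + "\n")

# When python scans a site directory, each line of a .pth file
# found there becomes an entry on sys.path.
site.addsitedir(sitedir)
assert repo in sys.path

import mymod
print(mymod.VALUE)  # 42
```

At interpreter startup the real user site directory is scanned automatically, which is why the .pth file written by ```install.sh``` takes effect in every new python session.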

## Usage on Carme-Cluster
Since this project is a built-in package of [Carme][3] it is usable with ```import CarmeModules.HyperParameterSearch.phs.parallel_hyperparameter_search as phs```.

## Quick Start
The easiest way to become familiar with the tool is to go through the following example of finding minima of the two-dimensional [Griewank function][2], a common test scenario for optimization algorithms. Code with the respective output can be found in the examples folder.
@@ -32,47 +35,45 @@
```python
import math as ma

def test_griewank(parameter):
    x = parameter['x']
    y = parameter['y']
    z = (x*x + y*y)/4000 - ma.cos(x)*ma.cos(y/ma.sqrt(2)) + 1
    return z
```
+ create a new script to define a phs experiment (customize ```repository_root_dir```and the arguments ```working_dir, custom_module_root_dir, custom_module_name``` of the class instantiation):
+ create a new script to define a phs experiment (customize the arguments ```working_dir, custom_module_root_dir, custom_module_name``` of the class instantiation):

**exemplary phs experiment**

```python
repository_root_dir = 'path/to/repository' #(example:'/home/NAME')
import phs.parallel_hyperparameter_search as phs # standalone import
# Make sure that python can import 'phs'.
# One way is to run the 'install.sh' script provided within this project.

import sys
sys.path.append(repository_root_dir + '/parallel_hyperparameter_search/phs')
# import CarmeModules.HyperParameterSearch.phs.parallel_hyperparameter_search as phs # import on Carme

from phs import parallel_hyperparameter_search

hs = parallel_hyperparameter_search.ParallelHyperparameterSearch(
    experiment_name='experiment_griewank_1',
    working_dir='/absolute/path/to/a/folder/your/experiments/should/be/saved',
    repository_root_dir,
    custom_module_root_dir='/absolute/path/to/root/dir/in/which/your/test_function/resides',
    custom_module_name='file_name_with_test_function_definition_(without_extension)',
    custom_function_name='test_griewank',
    parallelization='processes',
    parameter_data_types={'x':float,'y':float})
hs = phs.ParallelHyperparameterSearch(
    experiment_name='experiment_griewank_1',
    working_dir='/absolute/path/to/a/folder/your/experiments/should/be/saved',
    custom_module_root_dir='/absolute/path/to/root/dir/in/which/your/test_function/resides',
    custom_module_name='file_name_with_test_function_definition_(without_extension)',
    custom_function_name='test_griewank',
    parallelization='processes',
    parameter_data_types={'x': float, 'y': float})

for i in range(20):
    hs.add_random_numeric_parameter(parameter_name='x',bounds=[-5,5],distribution='uniform',round_digits=3)
    hs.add_random_numeric_parameter(parameter_name='y',bounds=[-5,5],distribution='uniform',round_digits=3)
    hs.add_random_numeric_parameter(parameter_name='x', bounds=[-5, 5], distribution='uniform', round_digits=3)
    hs.add_random_numeric_parameter(parameter_name='y', bounds=[-5, 5], distribution='uniform', round_digits=3)
    hs.register_parameter_set()

for i in range(10):
    hs.add_bayesian_parameter(parameter_name='x',bounds=[-5,5],round_digits=3)
    hs.add_bayesian_parameter(parameter_name='y',bounds=[-5,5],round_digits=3)
    hs.add_bayesian_parameter(parameter_name='x', bounds=[-5, 5], round_digits=3)
    hs.add_bayesian_parameter(parameter_name='y', bounds=[-5, 5], round_digits=3)
    hs.register_parameter_set(ignore_duplicates=False)

hs.show_parameter_set()

hs.start_execution()
```




## Parallelization Technique
At the moment two general types of parallelization are implemented, a third is under development. All of these share the same functionalities but differ in definition of workers and underlaying technology of task scheduling. On the software side there is one lightweight master. It runs the parameter setup, manages the task scheduling to the workers and gathers the results immediately as they are completed. Some monitoring and visualization possibilities are already build in. This way the user can observe and evaluate the progress. These functionalities can be customized, extended or performed only once after the last task.
At the moment two general types of parallelization are implemented, a third is under development. All of these share the same functionalities but differ in definition of workers and underlying technology of task scheduling. On the software side there is one lightweight master. It runs the parameter setup, manages the task scheduling to the workers and gathers the results immediately as they are completed. Some monitoring and visualization possibilities are already built-in. This way the user can observe and evaluate the progress. These functionalities can be customized, extended or performed only once after the last task.

Each function evaluation is done on a single worker. Even the Bayesian optimization for suggesting new parameter values is done on the workers themselves. By this means the master is relieved and the prerequisites for a solid scaling behavior are ensured.

@@ -82,16 +83,17 @@
The definition and setup of one scheduler and multiple workers is done automatically in the background during the initialization routines of a multinode job on the Carme cluster. Currently two workers are started on each node. Each worker exclusively sees one of the node's two GPUs and owns 4 CPU cores. This static environment will be made customizable to meet different use cases in the future.

### Processes
Beside Dask native processes of the Python build in concurrent.futures module is implemented as an alternative backend. It provides the same functionalities and user experience. Local processes serve as workers which means that the CPU cores of one node can be utilized exclusively but the GPU is always shared among the processes. In this case reserving a multinode job makes no sense. The intention of this kind of computing resources is less on the GPU heavy machine learning domain in terms of productive use but rather on testing and debugging especially when only one node is avaiable. But taking CPU only function evaluations into account the processes version can also be utilized in a meaningfull manner.
Besides Dask, a second back end built on native processes from Python's built-in concurrent.futures module is implemented. It provides the same functionality and user experience. Local processes serve as workers, which means that the CPU cores of one machine can be utilized exclusively. This kind of computing resource is intended less for computation-heavy production use than for testing and debugging, especially when no HPC system is available. But for CPU-only function evaluations the processes version can also be utilized in a meaningful manner.
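
The idea behind this back end can be sketched with plain concurrent.futures (an illustration of the master/worker scheme, not the phs API): a lightweight master submits parameter sets to local worker processes and gathers each result as soon as it completes.

```python
import math as ma
from concurrent.futures import ProcessPoolExecutor, as_completed

def griewank(p):
    # Same test function as in the Quick Start example.
    x, y = p['x'], p['y']
    return (x*x + y*y)/4000 - ma.cos(x)*ma.cos(y/ma.sqrt(2)) + 1

# A small grid of parameter sets standing in for the search.
parameter_sets = [{'x': float(x), 'y': float(y)}
                  for x in range(-2, 3) for y in range(-2, 3)]

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as pool:
        # The 'master' submits every task up front ...
        futures = {pool.submit(griewank, p): p for p in parameter_sets}
        # ... and gathers results in completion order, not submission order.
        best = min(((f.result(), futures[f]) for f in as_completed(futures)),
                   key=lambda t: t[0])
    print(best)  # the smallest z together with its parameter set
```

Because results arrive in completion order, the master can update monitoring output or feed intermediate results to later tasks without waiting for the slowest worker.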


[1]: http://docs.dask.org/en/latest/index.html "DASK"
[2]: https://en.wikipedia.org/wiki/Griewank_function "Griewank"
[3]: http://www.open-carme.org "Carme"

## Author

Peter Michael Habelitz
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM
Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany
Tel: +49 631 31600-4942, Fax: +49 631 31600-5942
<[email protected]>
35 changes: 17 additions & 18 deletions examples/quick_start.py
Original file line number Diff line number Diff line change
@@ -1,29 +1,28 @@
repository_root_dir = 'path/to/repository' #(example:'/home/NAME')
import phs.parallel_hyperparameter_search as phs # standalone import
# Make sure that python can import 'phs'.
# One way is to run the 'install.sh' script provided within this project.
# import CarmeModules.HyperParameterSearch as phs # import on Carme

import sys
sys.path.append(repository_root_dir + '/parallel_hyperparameter_search/phs')

from phs import parallel_hyperparameter_search

hs = parallel_hyperparameter_search.ParallelHyperparameterSearch(
    experiment_name='experiment_griewank_1',
    working_dir='/absolute/path/to/a/folder/your/experiments/should/be/saved',
    repository_root_dir,
    custom_module_root_dir='/absolute/path/to/root/dir/in/which/your/test_function/resides',
    custom_module_name='file_name_with_test_function_definition_(without_extension)',
    custom_function_name='test_griewank',
    parallelization='processes',
    parameter_data_types={'x':float,'y':float})
hs = phs.ParallelHyperparameterSearch(
    experiment_name='experiment_griewank_1',
    working_dir='/absolute/path/to/a/folder/your/experiments/should/be/saved',
    custom_module_root_dir='/absolute/path/to/root/dir/in/which/your/test_function/resides',
    custom_module_name='file_name_with_test_function_definition_(without_extension)',
    custom_function_name='test_griewank',
    parallelization='processes',
    parameter_data_types={'x': float, 'y': float})

for i in range(20):
    hs.add_random_numeric_parameter(parameter_name='x',bounds=[-5,5],distribution='uniform',round_digits=3)
    hs.add_random_numeric_parameter(parameter_name='y',bounds=[-5,5],distribution='uniform',round_digits=3)
    hs.add_random_numeric_parameter(parameter_name='x', bounds=[-5, 5], distribution='uniform', round_digits=3)
    hs.add_random_numeric_parameter(parameter_name='y', bounds=[-5, 5], distribution='uniform', round_digits=3)
    hs.register_parameter_set()

for i in range(10):
    hs.add_bayesian_parameter(parameter_name='x',bounds=[-5,5],round_digits=3)
    hs.add_bayesian_parameter(parameter_name='y',bounds=[-5,5],round_digits=3)
    hs.add_bayesian_parameter(parameter_name='x', bounds=[-5, 5], round_digits=3)
    hs.add_bayesian_parameter(parameter_name='y', bounds=[-5, 5], round_digits=3)
    hs.register_parameter_set(ignore_duplicates=False)

hs.show_parameter_set()

hs.start_execution()
hs.start_execution()
15 changes: 15 additions & 0 deletions install.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
#!/bin/bash

# The purpose of this script is to append the absolute path of this script's
# directory (DIR) to the module search paths where python looks for modules
# when resolving 'import' statements.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

# Instead of manipulating the environment variable PYTHONPATH it is possible
# to create a path configuration file (.pth).
# First we have to find out where python searches for these .pth files:
SITEDIR=$(python -m site --user-site)
# create if it doesn't exist
mkdir -p "$SITEDIR"
# create new .pth file with our path
echo "$DIR" > "$SITEDIR/parallel_hyperparameter_search.pth"
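
The effect of the script can be rehearsed without touching the real user site directory; in this sketch both the "repository" and the site directory are throwaway temp dirs standing in for DIR and SITEDIR:

```shell
#!/bin/sh
# Throwaway stand-ins for the cloned repository and the user site directory.
DIR=$(mktemp -d)
SITEDIR=$(mktemp -d)

# Same step as in install.sh: one line in a .pth file = one sys.path entry.
echo "$DIR" > "$SITEDIR/parallel_hyperparameter_search.pth"

# site.addsitedir() processes .pth files the same way the real user site
# directory is processed at interpreter startup.
python -c "import site, sys; site.addsitedir('$SITEDIR'); print('$DIR' in sys.path)"  # True
```

With the real script, SITEDIR is the directory reported by ```python -m site --user-site```, so the repository path is picked up automatically in every new python session.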
