This guide applies to development within the OpenSearch-py-ml project.
This guide is for any developer who wants a running local development environment where you can make, see, and test changes. It's opinionated to get you running as quickly and easily as possible, but it's not the only way to set up a development environment.
If you're only interested in installing and running this project, you can install it from PyPI.
If you're planning to contribute code (features or fixes) to this repository, great! Make sure to also read the contributing guide.
OpenSearch-py-ml is primarily a Python-based client plugin for machine learning in OpenSearch. To contribute effectively, you should be familiar with Python.
To develop on OpenSearch-py-ml, you'll need:
- A GitHub account
- git for version control
- Python
- A package installer for Python, for example pip
- A code editor of your choice, configured for Python. If you don't have a favorite editor, we suggest PyCharm.
If you already have these installed or have your own preferences for installing them, skip ahead to the Fork and clone OpenSearch-py-ml section.
If you don't already have git installed (check with git --version), we recommend following the git installation guide for your OS.
You can install any version of Python starting from 3.8.
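To confirm that your interpreter meets this requirement, a quick check along the following lines may help. This is only a minimal sketch: the helper name `meets_minimum_version` is made up for illustration and is not part of the project.

```python
import sys

# Minimum version stated in this guide.
MIN_PYTHON = (3, 8)


def meets_minimum_version(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or current) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum_version():
        print(f"Python {current} is supported.")
    else:
        print(f"Python {current} is too old; install 3.8 or later.")
```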
All local development should be done in a forked repository. Fork OpenSearch-py-ml by clicking the "Fork" button at the top of the GitHub repository.
Clone your forked version of OpenSearch-py-ml to your local machine (replace opensearch-project in the command below with your GitHub username):
$ git clone git@github.com:opensearch-project/opensearch-py-ml.git
If you haven't already, change directories to your cloned repository directory:
$ cd opensearch-py-ml
The pip install command will install the project's dependencies and build all internal packages and plugins.
$ pip install -r requirements-dev.txt
OpenSearch-py-ml requires a running instance of OpenSearch (version 2.5 or later) to connect to.
You can install OpenSearch in multiple ways:
- https://opensearch.org/downloads.html#docker-compose
- https://opensearch.org/docs/2.5/install-and-configure/install-opensearch/tar/
Now that you have a development environment to play with, there are a number of different paths you may take next.
After navigating to OpenSearch Dashboards, update the persistent settings for the cluster. These settings change the behavior of the machine learning plugin, ml_commons. ML Commons cluster settings: https://opensearch.org/docs/latest/ml-commons-plugin/cluster-settings/
Paste these settings into the Dev Tools window and run them:
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": false,
    "plugins.ml_commons.native_memory_threshold": 100,
    "plugins.ml_commons.max_model_on_node": 20
  }
}
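If you prefer to apply the same settings from Python instead of Dev Tools, something like the following may work. It is a sketch, not the project's own tooling. It assumes the `opensearch-py` client is installed and that a local cluster is listening on https://localhost:9200 with the default admin/admin credentials and a self-signed certificate. The helper names `make_client` and `apply_ml_commons_settings` are illustrative only.

```python
# Same persistent settings as in the Dev Tools request above.
ML_COMMONS_SETTINGS = {
    "persistent": {
        "plugins.ml_commons.only_run_on_ml_node": False,
        "plugins.ml_commons.native_memory_threshold": 100,
        "plugins.ml_commons.max_model_on_node": 20,
    }
}


def make_client(host="https://localhost:9200", user="admin", password="admin"):
    """Build a client for a local test cluster (assumed host and credentials)."""
    # Imported lazily so the rest of this module works without opensearch-py.
    from opensearchpy import OpenSearch

    return OpenSearch(
        hosts=[host],
        http_auth=(user, password),
        verify_certs=False,  # assumption: local cluster with a self-signed cert
    )


def apply_ml_commons_settings(client, settings=ML_COMMONS_SETTINGS):
    """Send the settings through the cluster settings API and return the response."""
    return client.cluster.put_settings(body=settings)


# Usage against a running cluster:
#   response = apply_ml_commons_settings(make_client())
#   print(response["acknowledged"])
```

Keeping the settings in a plain dictionary makes it easy to review them next to the Dev Tools request and to reuse them in test setup code.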
- These Notebook Examples show you how to use opensearch-py-ml for data exploration and machine learning.
- The API reference provides guidance on using the different functionalities of opensearch-py-ml.
$ nox -s lint
$ nox -s format
$ nox -s test
# New HTML pages will be created in build/html
$ cd docs
$ pip install -r requirements-docs.txt
$ make clean
$ make html
opensearch.hosts: ["https://localhost:9200"]
opensearch.username: "admin" # Default username
opensearch.password: "admin" # Default password
All filenames should use snake_case.
Right: opensearch_py_ml/ml_commons/ml_commons_client.py
Wrong: opensearch_py_ml/mlCommons/mlCommonsClient.py
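The naming rule can be checked mechanically. The sketch below is one possible way to do it and is not an existing project tool. The function name `is_snake_case_path` and the exact pattern are assumptions: the pattern accepts lowercase letters, digits, and underscores, and tolerates leading or trailing underscores so that names such as `__init__.py` pass.

```python
import re
from pathlib import PurePosixPath

# Lowercase words separated by single underscores; optional leading and
# trailing underscores so that names like "__init__" are accepted.
SNAKE_CASE = re.compile(r"^_*[a-z0-9]+(_[a-z0-9]+)*_*$")


def is_snake_case_path(path):
    """Return True if every directory and the file stem in `path` is snake_case."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    *directories, filename = parts
    stem = PurePosixPath(filename).stem  # drop the extension, e.g. ".py"
    return all(SNAKE_CASE.match(name) for name in [*directories, stem])


if __name__ == "__main__":
    for candidate in (
        "opensearch_py_ml/ml_commons/ml_commons_client.py",
        "opensearch_py_ml/mlCommons/mlCommonsClient.py",
    ):
        verdict = "ok" if is_snake_case_path(candidate) else "rename"
        print(f"{verdict}: {candidate}")
```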
We use a version management system. If a line of code is no longer needed, remove it; don't simply comment it out.
These are numbers (or other values) used inline in your code, often called magic numbers. Do not use them; give each one a variable name so it can be understood and changed easily.
# good
min_width = 300
if width < min_width:
    ...

# bad
if width < 300:
    ...
Don't define things globally. Everything should be wrapped in a module that can be depended on by other modules. Even things as simple as a single value should be a module.
Keep your functions short. A good function fits on a slide that the people in the last row of a big room can comfortably read. So don't count on them having perfect vision and limit yourself to ~15 lines of code per function.
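To spot functions that exceed this guideline, a small script based on the standard `ast` module is one option. The following is a rough sketch rather than an enforced project check. The name `find_long_functions` is hypothetical, and the count includes the `def` line, docstrings, and blank lines inside the function, so treat the result as a hint only.

```python
import ast

MAX_FUNCTION_LINES = 15  # the rough limit suggested in this guide


def find_long_functions(source, max_lines=MAX_FUNCTION_LINES):
    """Return (name, line_count) for each function longer than `max_lines`."""
    long_functions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8+; the count includes the def line.
            line_count = node.end_lineno - node.lineno + 1
            if line_count > max_lines:
                long_functions.append((node.name, line_count))
    return long_functions


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py path/to/module.py
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for name, count in find_long_functions(handle.read()):
                print(f"{path}: {name} spans {count} lines")
```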