[Feature] Add Question Answering Model (old) #329

Closed
10 changes: 9 additions & 1 deletion .ci/opensearch/Dockerfile
@@ -1,10 +1,18 @@
-ARG OPENSEARCH_VERSION
+ARG OPENSEARCH_VERSION=latest
 FROM opensearchproject/opensearch:$OPENSEARCH_VERSION

+# OPENSEARCH_VERSION needs to be redefined as any arg before FROM is outside build scope.
+# Reference: https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
+ARG OPENSEARCH_VERSION=latest
 ARG opensearch_path=/usr/share/opensearch
 ARG opensearch_yml=$opensearch_path/config/opensearch.yml

 ARG SECURE_INTEGRATION
 RUN echo "plugins.ml_commons.only_run_on_ml_node: false" >> $opensearch_yml;
 RUN echo "plugins.ml_commons.native_memory_threshold: 100" >> $opensearch_yml;
+RUN if [ "$OPENSEARCH_VERSION" == "2.11.0" ] ; then \
+    echo "plugins.ml_commons.model_access_control_enabled: true" >> $opensearch_yml; \
+    echo "plugins.ml_commons.allow_registering_model_via_local_file: true" >> $opensearch_yml; \
+    echo "plugins.ml_commons.allow_registering_model_via_url: true" >> $opensearch_yml; \
+    fi
 RUN if [ "$SECURE_INTEGRATION" != "true" ] ; then echo "plugins.security.disabled: true" >> $opensearch_yml; fi
1 change: 0 additions & 1 deletion .ci/run-opensearch.sh
@@ -4,7 +4,6 @@
 # to form a cluster suitable for running the REST API tests.
 #
 # Export the NUMBER_OF_NODES variable to start more than 1 node
-
 script_path=$(dirname $(realpath -s $0))
 source $script_path/imports.sh
 set -euo pipefail
5 changes: 5 additions & 0 deletions .github/workflows/integration.yml
@@ -2,6 +2,10 @@ name: Integration tests

 on: [push, pull_request]

+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
 jobs:
   integration:
     name: Integ
@@ -13,6 +17,7 @@ jobs:
         secured: ["true"]
         entry:
           - { opensearch_version: 2.7.0 }
+          - { opensearch_version: 2.11.0 }

     steps:
       - name: Checkout
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 - Add workflow and scripts for automating model listing updating process by @thanawan-atc in ([#210](https://github.com/opensearch-project/opensearch-py-ml/pull/210))
 - Add script to trigger ml-models-release jenkins workflow with generic webhook by @thanawan-atc in ([#211](https://github.com/opensearch-project/opensearch-py-ml/pull/211))
 - Add example notebook for tracing and registering a CLIPTextModel to OpenSearch with the Neural Search plugin by @patrickbarnhart in ([#283](https://github.com/opensearch-project/opensearch-py-ml/pull/283))
+- Add support for train api functionality by @rawwar in ([#310](https://github.com/opensearch-project/opensearch-py-ml/pull/310))
+- Add support for Model Access Control - Register, Update, Search and Delete by @rawwar in ([#332](https://github.com/opensearch-project/opensearch-py-ml/pull/332))

 ### Changed
 - Modify ml-models.JenkinsFile so that it takes model format into account and can be triggered with generic webhook by @thanawan-atc in ([#211](https://github.com/opensearch-project/opensearch-py-ml/pull/211))
@@ -25,6 +27,9 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 - Update model upload history - sentence-transformers/distiluse-base-multilingual-cased-v1 (v.1.0.1)(TORCH_SCRIPT) by @dhrubo-os ([#281](https://github.com/opensearch-project/opensearch-py-ml/pull/281))
 - Update pretrained_models_all_versions.json (2023-09-14 10:28:41) by @dhrubo-os ([#282](https://github.com/opensearch-project/opensearch-py-ml/pull/282))
 - Enable the model upload workflow to add model_content_size_in_bytes & model_content_hash_value to model config automatically @thanawan-atc ([#291](https://github.com/opensearch-project/opensearch-py-ml/pull/291))
+- Update pretrained_models_all_versions.json (2023-10-18 18:11:34) by @dhrubo-os ([#322](https://github.com/opensearch-project/opensearch-py-ml/pull/322))
+- Update model upload history - sentence-transformers/paraphrase-mpnet-base-v2 (v.1.0.0)(BOTH) by @dhrubo-os ([#321](https://github.com/opensearch-project/opensearch-py-ml/pull/321))
+- Replaced usage of `is_datetime_or_timedelta_dtype` with `is_timedelta64_dtype` and `is_datetime64_any_dtype`([#316](https://github.com/opensearch-project/opensearch-py-ml/pull/316))

 ### Fixed
 - Enable make_model_config_json to add model description to model config file by @thanawan-atc in ([#203](https://github.com/opensearch-project/opensearch-py-ml/pull/203))
@@ -37,6 +42,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 - Roll over pretrained_model_listing.json because of ml-commons dependency by @thanawan-atc in ([#252](https://github.com/opensearch-project/opensearch-py-ml/pull/252))
 - Fix pandas dependency issue in nox session by installing pandas package to python directly by @thanawan-atc in ([#266](https://github.com/opensearch-project/opensearch-py-ml/pull/266))
 - Fix conditional job execution issue in model upload workflow by @thanawan-atc in ([#294](https://github.com/opensearch-project/opensearch-py-ml/pull/294))
+- fix bug in `MLCommonClient_client.upload_model` by @rawwar in ([#336](https://github.com/opensearch-project/opensearch-py-ml/pull/336))

 ## [1.1.0]

19 changes: 19 additions & 0 deletions DEVELOPER_GUIDE.md
@@ -89,6 +89,9 @@ After navigating to OpenSearch Dashboards you should update the persistent setti

 You should paste this settings in the `Dev Tools` window and run it:

+
+For OpenSearch versions below 2.7
+
 ```yml
 PUT /_cluster/settings
 {
@@ -101,6 +104,22 @@ You should paste this settings in the `Dev Tools` window and run it:
 }
 ```

+For OpenSearch versions 2.8 or above
+
+```yml
+PUT /_cluster/settings
+{
+  "persistent" : {
+    "plugins.ml_commons.only_run_on_ml_node" : false,
+    "plugins.ml_commons.native_memory_threshold" : 100,
+    "plugins.ml_commons.max_model_on_node": 20,
+    "plugins.ml_commons.enable_inhouse_python_model": true,
+    "plugins.ml_commons.allow_registering_model_via_local_file": true,
+    "plugins.ml_commons.allow_registering_model_via_url": true
+  }
+}
+```
+
 #### Review user tutorials to understand the key features and workflows

 - These [Notebook Examples](https://opensearch-project.github.io/opensearch-py-ml/examples/index.html) will show you how to use opensearch-py-ml for data exploration and machine learning.
2 changes: 1 addition & 1 deletion README.md
@@ -30,7 +30,7 @@ clusters using the [ml-commons](https://github.com/opensearch-project/ml-commons
 For more information, see [opensearch.org](https://opensearch.org/docs/latest/clients/opensearch-py-ml/) and the [API Doc](https://opensearch-project.github.io/opensearch-py-ml/index.html).


-##Installing Opensearch-py-ml
+## Installing Opensearch-py-ml


 Opensearch-py-ml can be installed from [PyPI](https://pypi.org/project/opensearch-py-ml) via pip:
12 changes: 8 additions & 4 deletions opensearch_py_ml/field_mappings.py
@@ -43,10 +43,10 @@
 from pandas.core.dtypes.common import is_bool_dtype  # type: ignore
 from pandas.core.dtypes.common import (
     is_datetime64_any_dtype,
-    is_datetime_or_timedelta_dtype,
     is_float_dtype,
     is_integer_dtype,
     is_string_dtype,
+    is_timedelta64_dtype,
 )
 from pandas.core.dtypes.inference import is_list_like

@@ -91,7 +91,9 @@ def is_numeric(self) -> bool:

     @property
     def is_timestamp(self) -> bool:
-        return is_datetime_or_timedelta_dtype(self.pd_dtype)
+        return is_datetime64_any_dtype(self.pd_dtype) or is_timedelta64_dtype(
+            self.pd_dtype
+        )

     @property
     def is_bool(self) -> bool:
@@ -509,7 +511,7 @@ def _pd_dtype_to_os_dtype(pd_dtype) -> Optional[str]:
             os_dtype = "boolean"
         elif is_string_dtype(pd_dtype):
             os_dtype = "keyword"
-        elif is_datetime_or_timedelta_dtype(pd_dtype):
+        elif is_datetime64_any_dtype(pd_dtype) or is_timedelta64_dtype(pd_dtype):
             os_dtype = "date"
         elif is_datetime64_any_dtype(pd_dtype):
             os_dtype = "date"
@@ -794,7 +796,9 @@ def metric_source_fields(
                     pd_dtypes.append(np.dtype(pd_dtype))
                     os_field_names.append(os_field_name)
                     os_date_formats.append(os_date_format)
-                elif include_timestamp and is_datetime_or_timedelta_dtype(pd_dtype):
+                elif include_timestamp and (
+                    is_datetime64_any_dtype(pd_dtype) or is_timedelta64_dtype(pd_dtype)
+                ):
                     pd_dtypes.append(np.dtype(pd_dtype))
                     os_field_names.append(os_field_name)
                     os_date_formats.append(os_date_format)
28 changes: 26 additions & 2 deletions opensearch_py_ml/ml_commons/ml_commons_client.py
@@ -8,7 +8,7 @@

 import json
 import time
-from typing import Any, List, Union
+from typing import Any, List, Optional, Union

 from deprecated.sphinx import deprecated
 from opensearchpy import OpenSearch
@@ -21,6 +21,7 @@
     MODEL_VERSION_FIELD,
     TIMEOUT,
 )
+from opensearch_py_ml.ml_commons.model_access_control import ModelAccessControl
 from opensearch_py_ml.ml_commons.model_execute import ModelExecute
 from opensearch_py_ml.ml_commons.model_uploader import ModelUploader

@@ -35,6 +36,7 @@ def __init__(self, os_client: OpenSearch):
         self._client = os_client
         self._model_uploader = ModelUploader(os_client)
         self._model_execute = ModelExecute(os_client)
+        self.model_access_control = ModelAccessControl(os_client)

     def execute(self, algorithm_name: str, input_json: dict) -> dict:
         """
@@ -96,7 +98,9 @@ def upload_model(
         :rtype: string
         """
         model_id = self._model_uploader._register_model(
-            model_path, model_config_path, isVerbose
+            model_path=model_path,
+            model_meta_path=model_config_path,
+            isVerbose=isVerbose,
         )

         # loading the model chunks from model index
@@ -105,6 +109,26 @@ def upload_model(

         return model_id

+    def train_model(
+        self, algorithm_name: str, input_json: dict, is_async: Optional[bool] = False
+    ) -> dict:
+        """
+        This method trains an ML Model
+        """
+
+        params = {}
+        if not isinstance(input_json, dict):
+            input_json = json.loads(input_json)
+        if is_async:
+            params["async"] = "true"
+
+        return self._client.transport.perform_request(
+            method="POST",
+            url=f"{ML_BASE_URI}/_train/{algorithm_name}",
+            body=input_json,
+            params=params,
+        )
+
     def register_model(
         self,
         model_path: str,
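Note on `train_model`: the request it builds can be checked without a running cluster. The following is a minimal sketch, not the library's code. `FakeTransport` and the standalone `train_model` function are hypothetical stand-ins that mirror the logic in the hunk above, the `ML_BASE_URI` value is an assumption (the diff uses the constant but does not show its definition), and the `kmeans` payload is illustrative only.

```python
import json

# Assumed value; the diff references ML_BASE_URI without showing its definition.
ML_BASE_URI = "/_plugins/_ml"


class FakeTransport:
    """Hypothetical stand-in for the client's transport; it only records requests."""

    def __init__(self):
        self.calls = []

    def perform_request(self, method, url, body=None, params=None):
        self.calls.append(
            {"method": method, "url": url, "body": body, "params": params}
        )
        return {"acknowledged": True}  # illustrative response, not a real API reply


def train_model(transport, algorithm_name, input_json, is_async=False):
    """Mirrors the request-building steps of the method added in this diff."""
    params = {}
    # A JSON string is accepted as well as a dict, despite the `dict` annotation.
    if not isinstance(input_json, dict):
        input_json = json.loads(input_json)
    # The async flag travels as a query parameter, not in the request body.
    if is_async:
        params["async"] = "true"
    return transport.perform_request(
        method="POST",
        url=f"{ML_BASE_URI}/_train/{algorithm_name}",
        body=input_json,
        params=params,
    )


transport = FakeTransport()
train_model(transport, "kmeans", '{"parameters": {"centroids": 3}}', is_async=True)
last_call = transport.calls[-1]
print(last_call["method"], last_call["url"], last_call["params"])
```

With `is_async` left at its default, `params` stays empty and no `async` query parameter is sent.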
105 changes: 105 additions & 0 deletions opensearch_py_ml/ml_commons/model_access_control.py
@@ -0,0 +1,105 @@
+# SPDX-License-Identifier: Apache-2.0
+# The OpenSearch Contributors require contributions made to
+# this file be licensed under the Apache-2.0 license or a
+# compatible open source license.
+# Any modifications Copyright OpenSearch Contributors. See
+# GitHub history for details.
+
+from typing import List, Optional
+
+from opensearchpy import OpenSearch
+from opensearchpy.exceptions import NotFoundError
+
+from opensearch_py_ml.ml_commons.ml_common_utils import ML_BASE_URI
+from opensearch_py_ml.ml_commons.validators.model_access_control import (
+    validate_create_model_group_parameters,
+    validate_delete_model_group_parameters,
+    validate_search_model_group_parameters,
+    validate_update_model_group_parameters,
+)
+
+
+class ModelAccessControl:
+    API_ENDPOINT = "model_groups"
+
+    def __init__(self, os_client: OpenSearch):
+        self.client = os_client
+
+    def register_model_group(
+        self,
+        name: str,
+        description: Optional[str] = None,
+        access_mode: Optional[str] = "private",
+        backend_roles: Optional[List[str]] = None,
+        add_all_backend_roles: Optional[bool] = False,
+    ):
+        validate_create_model_group_parameters(
+            name, description, access_mode, backend_roles, add_all_backend_roles
+        )
+
+        body = {"name": name, "add_all_backend_roles": add_all_backend_roles}
+        if description:
+            body["description"] = description
+        if access_mode:
+            body["access_mode"] = access_mode
+        if backend_roles:
+            body["backend_roles"] = backend_roles
+
+        return self.client.transport.perform_request(
+            method="POST", url=f"{ML_BASE_URI}/{self.API_ENDPOINT}/_register", body=body
+        )
+
+    def update_model_group(
+        self,
+        update_query: dict,
+        model_group_id: Optional[str] = None,
+    ):
+        validate_update_model_group_parameters(update_query, model_group_id)
+        return self.client.transport.perform_request(
+            method="PUT",
+            url=f"{ML_BASE_URI}/{self.API_ENDPOINT}/{model_group_id}",
+            body=update_query,
+        )
+
+    def search_model_group(self, query: dict):
+        validate_search_model_group_parameters(query)
+        return self.client.transport.perform_request(
+            method="GET", url=f"{ML_BASE_URI}/{self.API_ENDPOINT}/_search", body=query
+        )
+
+    def search_model_group_by_name(
+        self,
+        model_group_name: str,
+        _source: Optional[List] = None,
+        size: Optional[int] = 1,
+    ):
+        query = {"query": {"match": {"name": model_group_name}}, "size": size}
+        if _source:
+            query["_source"] = _source
+        return self.search_model_group(query)
+
+    def get_model_group_id_by_name(self, model_group_name: str):
+        try:
+            res = self.search_model_group_by_name(model_group_name)
+            if res["hits"]["hits"]:
+                return res["hits"]["hits"][0]["_id"]
+            else:
+                raise NotFoundError
+        except NotFoundError:
+            print(f"No model group found with name:{model_group_name}")
+            return None
+        except Exception as ex:
+            print(f"Error in get_model_group_id_by_name: {ex}")
+            return None
+
+    def delete_model_group(self, model_group_id: str):
+        validate_delete_model_group_parameters(model_group_id)
+        return self.client.transport.perform_request(
+            method="DELETE", url=f"{ML_BASE_URI}/{self.API_ENDPOINT}/{model_group_id}"
+        )
+
+    def delete_model_group_by_name(self, model_group_name: str):
+        model_group_id = self.get_model_group_id_by_name(model_group_name)
+        if model_group_id is None:
+            raise NotFoundError(f"Model group {model_group_name} not found")
+        return self.delete_model_group(model_group_id=model_group_id)
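Note on the name lookup in `ModelAccessControl`: `search_model_group_by_name` builds a `match` query and `get_model_group_id_by_name` takes the `_id` of the first hit. The sketch below reproduces just those two steps as plain functions so the request body and the response handling can be inspected offline. `build_name_query`, `first_hit_id`, and `sample_response` are hypothetical names introduced for illustration; the response shape is assumed to follow the `hits.hits[]._id` layout that the diff itself reads.

```python
from typing import List, Optional


def build_name_query(
    model_group_name: str, source: Optional[List[str]] = None, size: int = 1
) -> dict:
    """Builds the same search body as search_model_group_by_name in the diff."""
    query = {"query": {"match": {"name": model_group_name}}, "size": size}
    if source:
        query["_source"] = source
    return query


def first_hit_id(response: dict) -> Optional[str]:
    """Returns the _id of the first hit, or None when the search matched nothing."""
    hits = response["hits"]["hits"]
    return hits[0]["_id"] if hits else None


# Hypothetical response, shaped like the result the diff's code indexes into.
sample_response = {
    "hits": {"hits": [{"_id": "abc123", "_source": {"name": "my_group"}}]}
}

query = build_name_query("my_group", source=["name"])
group_id = first_hit_id(sample_response)
empty_id = first_hit_id({"hits": {"hits": []}})
print(query)
print(group_id, empty_id)
```

Because the lookup uses `match` with `size` 1, it returns whichever group scores highest. Depending on how the `name` field is mapped, that may be a group whose name only partially matches, which matters for `delete_model_group_by_name`.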
6 changes: 6 additions & 0 deletions opensearch_py_ml/ml_commons/validators/__init__.py
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: Apache-2.0
+# The OpenSearch Contributors require contributions made to
+# this file be licensed under the Apache-2.0 license or a
+# compatible open source license.
+# Any modifications Copyright OpenSearch Contributors. See
+# GitHub history for details.