Guardrails for remote model input and output #2209

jngz-es · 2024-03-17T18:05:12Z

Description

Add guardrails for remote model input and output. Two mechanisms are supported: stop words and regex.

Example:

POST /_plugins/_ml/models/LHkGS44BStH5-WXNbXxT/_predict
{
  "parameters": {
    "prompt": "\n\nHuman:this is a test of <stop words>\n\nAssistant:"
  }
}

The response:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "guardrails triggered for user input"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "guardrails triggered for user input"
  },
  "status": 400
}

Issues Resolved

[List any issues this PR will resolve]

Check List

- New functionality includes testing.
  - All tests pass
- New functionality has been documented.
  - New functionality has javadoc added
- Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Jing Zhang <[email protected]>

common/src/main/java/org/opensearch/ml/common/CommonValue.java

common/src/main/java/org/opensearch/ml/common/MLModel.java

common/src/main/java/org/opensearch/ml/common/model/MLGuard.java

common/src/main/java/org/opensearch/ml/common/model/StopWords.java

common/src/main/java/org/opensearch/ml/common/model/Guardrails.java

common/src/main/java/org/opensearch/ml/common/model/Guardrail.java

lezzago

I think we need to create more of a guardrails framework here. Down the road we would add many more different guardrail mechanisms as the LLM space with guardrails is new and constantly evolving. We should think with that in mind.

common/src/main/java/org/opensearch/ml/common/model/Guardrails.java

lezzago · 2024-03-18T23:20:21Z

common/src/main/java/org/opensearch/ml/common/model/Guardrail.java

+public class Guardrail implements ToXContentObject {
+    public static final String STOP_WORDS_FIELD = "stop_words";
+    public static final String REGEX_FIELD = "regex";


Guardrail should be a superclass, with stop words and regex each implemented as a subclass for one type of guardrail. That would make it easier to integrate new types of guardrails and to build a framework around them.
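
A minimal sketch of the class hierarchy suggested above, for illustration only. All class and method names here (`GuardrailSketch`, `RegexGuardrailSketch`, `StopWordsGuardrailSketch`, `validateAll`) are hypothetical and do not come from the PR. The stop-words check is simplified to an in-memory substring test, whereas the PR searches indices, and the regex check assumes a pattern found anywhere in the input should be rejected.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical names throughout; this is not the PR's code.
abstract class GuardrailSketch {
    // Returns true when the input passes this guardrail.
    abstract boolean validate(String input);

    // Input is accepted only if every configured guardrail accepts it.
    static boolean validateAll(List<GuardrailSketch> guardrails, String input) {
        return guardrails.stream().allMatch(g -> g.validate(input));
    }
}

class RegexGuardrailSketch extends GuardrailSketch {
    private final List<Pattern> patterns;

    RegexGuardrailSketch(List<String> regexes) {
        this.patterns = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    @Override
    boolean validate(String input) {
        // Assumption: reject when a pattern occurs anywhere in the input (find()).
        return patterns.stream().noneMatch(p -> p.matcher(input).find());
    }
}

class StopWordsGuardrailSketch extends GuardrailSketch {
    private final List<String> stopWords;

    StopWordsGuardrailSketch(List<String> stopWords) {
        this.stopWords = stopWords.stream()
            .map(w -> w.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList());
    }

    @Override
    boolean validate(String input) {
        // Assumption: an in-memory, case-insensitive substring check
        // stands in for the index search used in the PR.
        String lowered = input.toLowerCase(Locale.ROOT);
        return stopWords.stream().noneMatch(lowered::contains);
    }
}
```

With this shape, a new guardrail type would only need another subclass that overrides `validate`; the caller keeps invoking `validateAll` unchanged.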

lezzago · 2024-03-18T23:30:52Z

common/src/main/java/org/opensearch/ml/common/model/MLGuard.java

+    public Boolean validateRegexList(String input, List<Pattern> regexPatterns) {
+        for (Pattern pattern : regexPatterns) {
+            if (!validateRegex(input, pattern)) {
+                return false;
+            }
+        }
+        return true;
+    }
+
+    public Boolean validateRegex(String input, Pattern pattern) {
+        Matcher matcher = pattern.matcher(input);
+        return !matcher.matches();
+    }
+
+    public Boolean validateStopWords(String input, Map<String, List<String>> stopWordsIndices) {
+        for (Map.Entry entry : stopWordsIndices.entrySet()) {
+            if (!validateStopWordsSingleIndex(input, (String) entry.getKey(), (List<String>) entry.getValue())) {
+                return false;
+            }
+        }
+        return true;
+    }
+
+    public Boolean validateStopWordsSingleIndex(String input, String indexName, List<String> fieldNames) {


These functions should be part of the specific guardrail model class. We would then just call a validate function on each guardrail component, and that component determines how to validate the input correctly.

Good suggestion, will refactor it in the next PR.
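
A side note on the quoted `validateRegex`: it returns `!matcher.matches()`, and in `java.util.regex` `matches()` succeeds only when the entire input matches the pattern. The sketch below contrasts that with `find()`, which looks for the pattern anywhere in the input. The class and method names are hypothetical, and the thread does not say which behavior is intended.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; only the matches()/find() calls mirror real APIs.
class RegexMatchSemantics {
    // Mirrors the quoted validateRegex: valid unless the WHOLE input matches.
    static boolean validateWithMatches(String input, Pattern pattern) {
        return !pattern.matcher(input).matches();
    }

    // Alternative: valid unless the pattern occurs ANYWHERE in the input.
    static boolean validateWithFind(String input, Pattern pattern) {
        return !pattern.matcher(input).find();
    }
}
```

For a pattern such as `badword`, the `matches()` variant rejects the input `badword` but accepts `this has badword inside`, while the `find()` variant rejects both. With `matches()`, a pattern would need a form like `.*badword.*` to catch the second input.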

lezzago · 2024-03-19T00:04:31Z

I think we need to create more of a guardrails framework here. Down the road we would add many more different guardrail mechanisms as the LLM space with guardrails is new and constantly evolving. We should think with that in mind.

I am fine with these changes happening post this PR

common/src/main/java/org/opensearch/ml/common/model/MLGuard.java

Signed-off-by: Jing Zhang <[email protected]>

* guardrails
* update guardrails
* bug fix
* add some UT
* change stop words search to unblocking way
* add more UT
* address comments
* add latch countdown when catching exception

Signed-off-by: Jing Zhang <[email protected]>
(cherry picked from commit 2d401bc)

dhrubo-os · 2024-03-19T20:50:59Z

common/src/main/java/org/opensearch/ml/common/CommonValue.java

@@ -265,7 +265,10 @@ public class CommonValue {
                        + MLModel.CONNECTOR_FIELD
                        + "\": {" + ML_CONNECTOR_INDEX_FIELDS + "    }\n},"
                        + USER_FIELD_MAPPING
-                        + "    }\n"
+                        + "    },\n"


You added a comma here.

ylwu-amzn · 2024-03-25T23:40:44Z

common/src/main/java/org/opensearch/ml/common/model/MLGuard.java

+            searchSourceBuilder.parseXContent(queryParser);
+            searchSourceBuilder.size(1); //Only need 1 doc returned, if hit.
+            searchRequest = new SearchRequest().source(searchSourceBuilder).indices(indexName);
+            context.restore();


Why restore the context here? I see there is already a restore in the action listener at line 147.
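
To illustrate the concern, here is a small sketch of how an explicit restore inside a try-with-resources block can end up restoring twice. It assumes the stored context is restored when the block closes it. `StoredContextSketch` and `RestoreDemo` are hypothetical stand-ins, not the OpenSearch classes.

```java
// Hypothetical stand-in for a stored thread context; not the OpenSearch class.
class StoredContextSketch implements AutoCloseable {
    int restoreCount = 0;

    void restore() {
        restoreCount++;
    }

    // Assumption: closing the stored context restores it.
    @Override
    public void close() {
        restore();
    }
}

class RestoreDemo {
    // Calls restore() explicitly and then lets try-with-resources close the context.
    static int restoresWithExplicitCall() {
        StoredContextSketch context = new StoredContextSketch();
        try (StoredContextSketch c = context) {
            c.restore();
        }
        return context.restoreCount;
    }

    // Relies only on try-with-resources to close the context.
    static int restoresWithoutExplicitCall() {
        StoredContextSketch context = new StoredContextSketch();
        try (StoredContextSketch c = context) {
            // work that needs the stashed context goes here
        }
        return context.restoreCount;
    }
}
```

Under that assumption, the explicit call is redundant: the context is restored once more when the block exits.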

ylwu-amzn · 2024-03-25T23:43:33Z

common/src/main/java/org/opensearch/ml/common/model/MLGuard.java

+                .of("query", Map.of("percolate", Map.of("field", "query", "document", documentMap)));
+        CountDownLatch latch = new CountDownLatch(1);
+
+        try (ThreadContext.StoredContext context = client.threadPool().getThreadContext().stashContext()) {


What if indexName is not a system index? I think we should only stash the context for system indices.
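
One way this could look, as a rough sketch: stash the context only when the target is a system index, and otherwise search under the caller's context. Everything here is hypothetical, including `ContextSketch`, `ConditionalStash`, the `isSystemIndex` check, and the index name. Real system-index detection in OpenSearch works differently. The sketch relies on the Java rule that a `null` resource in try-with-resources is simply not closed.

```java
import java.util.Set;

// Hypothetical stand-in for a stashed context; not the OpenSearch class.
class ContextSketch implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

class ConditionalStash {
    // Assumption: a fixed set of names stands in for real system-index detection.
    private static final Set<String> SYSTEM_INDICES = Set.of(".example-system-index");

    static boolean isSystemIndex(String indexName) {
        return SYSTEM_INDICES.contains(indexName);
    }

    // Returns true when the search ran with a stashed context.
    static boolean searchUsesStashedContext(String indexName) {
        // A null resource is allowed here; Java skips close() for it.
        try (ContextSketch ctx = isSystemIndex(indexName) ? new ContextSketch() : null) {
            // the search request would be issued here
            return ctx != null;
        }
    }
}
```

With this shape, a user-owned stop-words index would be searched with the caller's own permissions rather than with the stashed context.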

jngz-es added 3 commits March 17, 2024 00:00

guardrails

28995fc

Signed-off-by: Jing Zhang <[email protected]>

update guardrails

2a5ab75

Signed-off-by: Jing Zhang <[email protected]>

bug fix

6f2af00

Signed-off-by: Jing Zhang <[email protected]>

jngz-es requested review from b4sjoo, dhrubo-os, model-collapse, rbhavna, ylwu-amzn, zane-neo, Zhangxunmt, austintlee, HenryL27 and sam-herman as code owners March 17, 2024 18:05
