From 693b6197eb13deef4cfcc3f836b5289f762c5093 Mon Sep 17 00:00:00 2001 From: Julia Kroll <75504951+jkroll-aws@users.noreply.github.com> Date: Thu, 18 Aug 2022 11:50:47 -0500 Subject: [PATCH] Fix 'JSONLines' -> 'JSON Lines' (#3554) Co-authored-by: atqy <95724753+atqy@users.noreply.github.com> --- .../fairness_and_explainability_byoc.ipynb | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/sagemaker_processing/fairness_and_explainability/fairness_and_explainability_byoc.ipynb b/sagemaker_processing/fairness_and_explainability/fairness_and_explainability_byoc.ipynb index cfa55ef88d..df31b3060e 100644 --- a/sagemaker_processing/fairness_and_explainability/fairness_and_explainability_byoc.ipynb +++ b/sagemaker_processing/fairness_and_explainability/fairness_and_explainability_byoc.ipynb @@ -422,9 +422,9 @@ "* At container startup, the script initializes an estimator using the model file provided by the client side deploy() method. The model directory and model file name are the same as in the `train` script.\n", "* Once started, the server is ready to serve inference requests. The logic resides in the `predict` method,\n", " * Input validation. The example container supports the same MIME types as Clarify job does, i.e., `text/csv` and `application/jsonlines`.\n", - " * Parse payload. Clarify job may send **batch requests** to the container for better efficiency, i.e., the payload can have multiple lines and each is a sample. So, the method decodes request payload and then split lines, then loads the lines according to the content type. For JSONLines content, the method uses a key \"features\" to extract the list of features from a JSON line. The key shall be the same as the one defined in your Clarify job analysis configuration `predictor.content_template`. It is a **contract** between the Clarify job and the container, here you can change it to something else, like \"attributes\", but remember to update the `predictor.content_template` configuration accordingly.\n", + " * Parse payload. Clarify job may send **batch requests** to the container for better efficiency, i.e., the payload can have multiple lines and each is a sample. So, the method decodes request payload and then split lines, then loads the lines according to the content type. For JSON Lines content, the method uses a key \"features\" to extract the list of features from a JSON line. The key shall be the same as the one defined in your Clarify job analysis configuration `predictor.content_template`. It is a **contract** between the Clarify job and the container, here you can change it to something else, like \"attributes\", but remember to update the `predictor.content_template` configuration accordingly.\n", " * Do prediction. The method gets the probability scores instead of binary labels, because scores are better for feature explainability.\n", - " * Format output. For a **batch request**, Clarify job expects the same number of result lines as the number of samples in the request. So, the method encodes each prediction and then join them by line-break. For JSONLines accept type, the method uses two keys \"predicted_label\" and \"score\" to indicate the prediction. The keys shall be the same as your Clarify job analysis configuration `predictor.label` and `predictor.probability`, and they are used by the Clarify job to extract predictions from container response payload. The keys are **contracts** between the Clarify job and the container, here you can change them to something else, but remember to update the analysis configuration accordingly.\n", + " * Format output. For a **batch request**, Clarify job expects the same number of result lines as the number of samples in the request. So, the method encodes each prediction and then join them by line-break. For JSON Lines accept type, the method uses two keys \"predicted_label\" and \"score\" to indicate the prediction. The keys shall be the same as your Clarify job analysis configuration `predictor.label` and `predictor.probability`, and they are used by the Clarify job to extract predictions from container response payload. The keys are **contracts** between the Clarify job and the container, here you can change them to something else, but remember to update the analysis configuration accordingly.\n", "\n", "Similarly, the script is built from scratch for demonstration purpose. In a real project, you can utilize [SageMaker Inference Toolkit](https://github.com/aws/sagemaker-inference-toolkit) which implements a model serving stack built on [Multi Model Server](https://github.com/awslabs/multi-model-server), and it can serve your own models or those you trained on SageMaker using Machine Learning frameworks with native SageMaker support." ] @@ -473,7 +473,7 @@ "python serve --model_dir \n", "```\n", "\n", - "Upon successful execution, the script should be listening on local host port `8080` for inference requests. The following cell generates a few CURL commands to send inference requests (both CSV and JSONLines) to the port. You can copy&paste them to your local terminal for execution, to hit the port and trigger the inference code. For a single sample request, the command should output only one result, and for a batch request, the command should output the same number of results (lines) as the number of samples in the request." + "Upon successful execution, the script should be listening on local host port `8080` for inference requests. The following cell generates a few CURL commands to send inference requests (both CSV and JSON Lines) to the port. You can copy&paste them to your local terminal for execution, to hit the port and trigger the inference code. For a single sample request, the command should output only one result, and for a batch request, the command should output the same number of results (lines) as the number of samples in the request." ] }, { @@ -923,13 +923,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "There are three scenarios where Clarify handles data types, and they all support both CSV (`text/csv`) and JSONLines (`application/jsonlines`).\n", + "There are three scenarios where Clarify handles data types, and they all support both CSV (`text/csv`) and JSON Lines (`application/jsonlines`).\n", "\n", "* dataset type: the MIME type of the dataset and SHAP baseline.\n", "* content type: the MIME type of the shadow endpoint request payload\n", "* accept type: the MIME type of the shadow endpoint response payload\n", "\n", - "The Clarify jobs in this notebook always uses CSV for dataset type, but you can choose for the other two. The following code chose JSONLines for both, but it is fine if you change one of them or both of them to CSV, because CSV and JSONLines are supported by the customer container as well." + "The Clarify jobs in this notebook always uses CSV for dataset type, but you can choose for the other two. The following code chose JSON Lines for both, but it is fine if you change one of them or both of them to CSV, because CSV and JSON Lines are supported by the customer container as well." ] }, { @@ -991,7 +991,7 @@ "A [ModelConfig](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.ModelConfig) object communicates information about your trained model. To avoid additional traffic to your production models, SageMaker Clarify sets up and tears down a dedicated endpoint when processing.\n", "* `instance_type` and `instance_count` specify your preferred instance type and instance count used to run your model on during SageMaker Clarify's processing. The testing dataset is small so a single standard instance is good enough to run this example. If you have a large complex dataset, you may want to use a better instance type to speed up, or add more instances to enable Spark parallelization.\n", "* `accept_type` denotes the endpoint response payload format, and `content_type` denotes the payload format of request to the endpoint.\n", - "* `content_template` is used by SageMaker Clarify to compose the request payload if the content type is JSONLines. To be more specific, the placeholder `$features` will be replaced by the features list from samples. For example, the first sample of the test dataset is `25,2,226802,1,7,4,6,3,2,1,0,0,40,37`, so the corresponding request payload is `'{\"features\":[25,2,226802,1,7,4,6,3,2,1,0,0,40,37]}'`, which conforms to [SageMaker JSONLines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats)." + "* `content_template` is used by SageMaker Clarify to compose the request payload if the content type is JSON Lines. To be more specific, the placeholder `$features` will be replaced by the features list from samples. For example, the first sample of the test dataset is `25,2,226802,1,7,4,6,3,2,1,0,0,40,37`, so the corresponding request payload is `'{\"features\":[25,2,226802,1,7,4,6,3,2,1,0,0,40,37]}'`, which conforms to [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats)." ] }, { @@ -1017,7 +1017,7 @@ "#### Writing ModelPredictedLabelConfig\n", "\n", "A [ModelPredictedLabelConfig](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.clarify.ModelPredictedLabelConfig) provides information on the format of your predictions.\n", - "* `probability` is used by SageMaker Clarify to locate the probability score in endpoint response if the accept type is JSONLines. In this case, the response payload for a single sample request looks like `'{\"predicted_label\": 0, \"score\": 0.026494730307781475}'`, so SageMaker Clarify can find the score `0.026494730307781475` by JSONPath `'score'`.\n", + "* `probability` is used by SageMaker Clarify to locate the probability score in endpoint response if the accept type is JSON Lines. In this case, the response payload for a single sample request looks like `'{\"predicted_label\": 0, \"score\": 0.026494730307781475}'`, so SageMaker Clarify can find the score `0.026494730307781475` by JSONPath `'score'`.\n", "* `probability_threshold` is used by SageMaker Clarify to convert the probability to binary labels for bias analysis. Prediction above the threshold is interpreted as label value 1 and below or equal as label value 0." ] },