From 6a38bb23adc72022c3c5b422984bd07fd7457991 Mon Sep 17 00:00:00 2001
From: Julia Kroll <75504951+jkroll-aws@users.noreply.github.com>
Date: Thu, 18 Aug 2022 12:33:57 -0500
Subject: [PATCH] Fix 'JSONLines' -> 'JSON Lines' (#3556)

Co-authored-by: atqy <95724753+atqy@users.noreply.github.com>
---
 ...rness_and_explainability_jsonlines_format.ipynb | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/sagemaker_processing/fairness_and_explainability/fairness_and_explainability_jsonlines_format.ipynb b/sagemaker_processing/fairness_and_explainability/fairness_and_explainability_jsonlines_format.ipynb
index 4a7412b295..90b08ec30d 100644
--- a/sagemaker_processing/fairness_and_explainability/fairness_and_explainability_jsonlines_format.ipynb
+++ b/sagemaker_processing/fairness_and_explainability/fairness_and_explainability_jsonlines_format.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Fairness and Explainability with SageMaker Clarify - JSONLines Format"
+    "# Fairness and Explainability with SageMaker Clarify - JSON Lines Format"
    ]
   },
   {
@@ -44,7 +44,7 @@
     "1. Explaining the importance of the various input features on the model's decision\n",
     "1. Accessing the reports through SageMaker Studio if you have an instance set up.\n",
     "\n",
-    "In doing so, the notebook will first train a [SageMaker Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) model using training dataset, then use SageMaker Clarify to analyze a testing dataset in [SageMaker JSONLines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats). SageMaker Clarify also supports analyzing CSV dataset, which is illustrated in [another notebook](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_processing/fairness_and_explainability/fairness_and_explainability.ipynb)."
+    "In doing so, the notebook will first train a [SageMaker Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) model using training dataset, then use SageMaker Clarify to analyze a testing dataset in [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats). SageMaker Clarify also supports analyzing CSV dataset, which is illustrated in [another notebook](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_processing/fairness_and_explainability/fairness_and_explainability.ipynb)."
    ]
   },
   {
@@ -247,7 +247,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Then save the testing dataset to a JSONLines file. The file conforms to [SageMaker JSONLines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats), with an additional field to hold the ground truth label."
+    "Then save the testing dataset to a JSON Lines file. The file conforms to [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats), with an additional field to hold the ground truth label."
    ]
   },
   {
@@ -392,14 +392,14 @@
     "#### Writing DataConfig and ModelConfig\n",
     "A `DataConfig` object communicates some basic information about data I/O to SageMaker Clarify. We specify where to find the input dataset, where to store the output, the target column (`label`), the header names, and the dataset type.\n",
     "\n",
-    "Some special things to note about this configuration for the JSONLines dataset,\n",
+    "Some special things to note about this configuration for the JSON Lines dataset,\n",
     "* Argument `features` or `label` is **NOT** header string. Instead, it is a [JSONPath string](https://jmespath.org/specification.html) to locate the features list or label in the dataset. For example, for a sample like below, `features` should be 'data.features.values', and `label` should be 'data.label'. \n",
     "\n",
     "```\n",
     "{\"data\": {\"features\": {\"values\": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}, \"label\": 0}}\n",
     "```\n",
     "\n",
-    "* SageMaker Clarify will load the JSONLines dataset into tabular representation for further analysis, and argument `headers` is the list of column names. The label header shall be the last one in the headers list, and the order of feature headers shall be the same as the order of features in a sample."
+    "* SageMaker Clarify will load the JSON Lines dataset into tabular representation for further analysis, and argument `headers` is the list of column names. The label header shall be the last one in the headers list, and the order of feature headers shall be the same as the order of features in a sample."
    ]
   },
   {
@@ -426,7 +426,7 @@
     "A `ModelConfig` object communicates information about your trained model. To avoid additional traffic to your production models, SageMaker Clarify sets up and tears down a dedicated endpoint when processing.\n",
     "* `instance_type` and `instance_count` specify your preferred instance type and instance count used to run your model on during SageMaker Clarify's processing. The testing dataset is small so a single standard instance is good enough to run this example. If your have a large complex dataset, you may want to use a better instance type to speed up, or add more instances to enable Spark parallelization.\n",
     "* `accept_type` denotes the endpoint response payload format, and `content_type` denotes the payload format of request to the endpoint.\n",
-    "* `content_template` is used by SageMaker Clarify to compose the request payload if the content type is JSONLines. To be more specific, the placeholder `$features` will be replaced by the features list from samples. The request payload of a sample from the testing dataset happens to be similar to the sample itself, like `'{\"features\": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}'`, because both the dataset and the model input conform to [SageMaker JSONLines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats)."
+    "* `content_template` is used by SageMaker Clarify to compose the request payload if the content type is JSON Lines. To be more specific, the placeholder `$features` will be replaced by the features list from samples. The request payload of a sample from the testing dataset happens to be similar to the sample itself, like `'{\"features\": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}'`, because both the dataset and the model input conform to [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats)."
    ]
   },
   {
@@ -465,7 +465,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "If you are building your own model, then you may choose a different JSONLines format, as long as it has the key elements like label and features list, and request payload built using `content_template` is supported by the model (you can customize the template but the placeholder of features list must be `$features`). Also, `dataset_type`, `accept_type` and `content_type` don't have to be the same, for example, a use case may use CSV dataset and content type, but JSONLines accept type."
+    "If you are building your own model, then you may choose a different JSON Lines format, as long as it has the key elements like label and features list, and request payload built using `content_template` is supported by the model (you can customize the template but the placeholder of features list must be `$features`). Also, `dataset_type`, `accept_type` and `content_type` don't have to be the same, for example, a use case may use CSV dataset and content type, but JSON Lines accept type."
    ]
   },
   {
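The notebook text touched by this patch describes two mechanics. The first is locating `features` and `label` inside each JSON Lines record by path. The second is filling the `$features` placeholder of `content_template` to build the endpoint request. The Python sketch below illustrates both on the sample record quoted in the patch. It is an illustration under assumptions, not SageMaker Clarify's implementation. `resolve_path` is a hypothetical helper that follows plain dotted keys only, while full JMESPath supports much more. `string.Template` only mimics the `$features` substitution. The template `'{"features": $features}'` is assumed.

```python
import json
from string import Template

# Sample record quoted in the notebook: SageMaker JSON Lines dense format
# plus an extra field holding the ground-truth label.
SAMPLE_LINE = (
    '{"data": {"features": {"values": '
    '[25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}, "label": 0}}'
)

# Assumed content template; $features is the placeholder named in the notebook.
CONTENT_TEMPLATE = '{"features": $features}'


def resolve_path(record: dict, path: str):
    """Hypothetical helper: follow a dotted path such as 'data.label'.

    Only plain dotted keys are supported; real JMESPath expressions
    (indexes, filters, projections) are out of scope for this sketch.
    """
    node = record
    for key in path.split("."):
        node = node[key]
    return node


def build_request(features: list, template: str = CONTENT_TEMPLATE) -> str:
    """Replace $features with the JSON-encoded feature list."""
    return Template(template).substitute(features=json.dumps(features))


record = json.loads(SAMPLE_LINE)
features = resolve_path(record, "data.features.values")
label = resolve_path(record, "data.label")

# Tabular row for analysis: features in sample order, label last,
# mirroring the rule that the label header comes last in `headers`.
row = features + [label]

payload = build_request(features)
print(payload)
```

The printed payload has the same shape as the request payload quoted in the notebook text. This is why a dense-format dataset and a dense-format model input line up so closely here.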