Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 'JSONLines' -> 'JSON Lines' #3556

Merged
merged 5 commits into from
Aug 18, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fairness and Explainability with SageMaker Clarify - JSONLines Format"
"# Fairness and Explainability with SageMaker Clarify - JSON Lines Format"
]
},
{
Expand Down Expand Up @@ -44,7 +44,7 @@
"1. Explaining the importance of the various input features on the model's decision\n",
"1. Accessing the reports through SageMaker Studio if you have an instance set up.\n",
"\n",
"In doing so, the notebook will first train a [SageMaker Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) model using training dataset, then use SageMaker Clarify to analyze a testing dataset in [SageMaker JSONLines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats). SageMaker Clarify also supports analyzing CSV dataset, which is illustrated in [another notebook](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_processing/fairness_and_explainability/fairness_and_explainability.ipynb)."
"In doing so, the notebook will first train a [SageMaker Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) model using training dataset, then use SageMaker Clarify to analyze a testing dataset in [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats). SageMaker Clarify also supports analyzing CSV dataset, which is illustrated in [another notebook](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_processing/fairness_and_explainability/fairness_and_explainability.ipynb)."
]
},
{
Expand Down Expand Up @@ -247,7 +247,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then save the testing dataset to a JSONLines file. The file conforms to [SageMaker JSONLines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats), with an additional field to hold the ground truth label."
"Then save the testing dataset to a JSON Lines file. The file conforms to [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats), with an additional field to hold the ground truth label."
]
},
{
Expand Down Expand Up @@ -392,14 +392,14 @@
"#### Writing DataConfig and ModelConfig\n",
"A `DataConfig` object communicates some basic information about data I/O to SageMaker Clarify. We specify where to find the input dataset, where to store the output, the target column (`label`), the header names, and the dataset type.\n",
"\n",
"Some special things to note about this configuration for the JSONLines dataset,\n",
"Some special things to note about this configuration for the JSON Lines dataset,\n",
"* Argument `features` or `label` is **NOT** header string. Instead, it is a [JSONPath string](https://jmespath.org/specification.html) to locate the features list or label in the dataset. For example, for a sample like below, `features` should be 'data.features.values', and `label` should be 'data.label'. \n",
"\n",
"```\n",
"{\"data\": {\"features\": {\"values\": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}, \"label\": 0}}\n",
"```\n",
"\n",
"* SageMaker Clarify will load the JSONLines dataset into tabular representation for further analysis, and argument `headers` is the list of column names. The label header shall be the last one in the headers list, and the order of feature headers shall be the same as the order of features in a sample."
"* SageMaker Clarify will load the JSON Lines dataset into tabular representation for further analysis, and argument `headers` is the list of column names. The label header shall be the last one in the headers list, and the order of feature headers shall be the same as the order of features in a sample."
]
},
{
Expand All @@ -426,7 +426,7 @@
"A `ModelConfig` object communicates information about your trained model. To avoid additional traffic to your production models, SageMaker Clarify sets up and tears down a dedicated endpoint when processing.\n",
"* `instance_type` and `instance_count` specify your preferred instance type and instance count used to run your model on during SageMaker Clarify's processing. The testing dataset is small so a single standard instance is good enough to run this example. If your have a large complex dataset, you may want to use a better instance type to speed up, or add more instances to enable Spark parallelization.\n",
"* `accept_type` denotes the endpoint response payload format, and `content_type` denotes the payload format of request to the endpoint.\n",
"* `content_template` is used by SageMaker Clarify to compose the request payload if the content type is JSONLines. To be more specific, the placeholder `$features` will be replaced by the features list from samples. The request payload of a sample from the testing dataset happens to be similar to the sample itself, like `'{\"features\": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}'`, because both the dataset and the model input conform to [SageMaker JSONLines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats)."
"* `content_template` is used by SageMaker Clarify to compose the request payload if the content type is JSON Lines. To be more specific, the placeholder `$features` will be replaced by the features list from samples. The request payload of a sample from the testing dataset happens to be similar to the sample itself, like `'{\"features\": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}'`, because both the dataset and the model input conform to [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats)."
]
},
{
Expand Down Expand Up @@ -465,7 +465,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are building your own model, then you may choose a different JSONLines format, as long as it has the key elements like label and features list, and request payload built using `content_template` is supported by the model (you can customize the template but the placeholder of features list must be `$features`). Also, `dataset_type`, `accept_type` and `content_type` don't have to be the same, for example, a use case may use CSV dataset and content type, but JSONLines accept type."
"If you are building your own model, then you may choose a different JSON Lines format, as long as it has the key elements like label and features list, and request payload built using `content_template` is supported by the model (you can customize the template but the placeholder of features list must be `$features`). Also, `dataset_type`, `accept_type` and `content_type` don't have to be the same, for example, a use case may use CSV dataset and content type, but JSON Lines accept type."
]
},
{
Expand Down