Support all NeptuneML API command parameters in neptune_ml magics, accept unified JSON blob for parameter input #202

Merged · 7 commits · Oct 1, 2021

Conversation

michaelnchin
Member

@michaelnchin commented Sep 27, 2021

Issue #, if available: #65, #103, #187

Description of changes:

  • Added support for all of the NeptuneML API command parameters listed in the Neptune documentation.
  • Added support for a unified JSON object, passed as a single variable, to supply parameter inputs for each phase (excluding Export).

JSON parameter blob usage:

  1. Define the JSON blob as a notebook variable, following the format below. Note that this example defines the bare minimum set of parameters required for each step; you may add any number of the optional parameters listed in the NeptuneML API documentation.
training_job_name = neptune_ml.get_training_job_name('link-prediction')  # example job ID for substitution

ml_params = {
    "dataprocessing": {
        "id": training_job_name,
        "configFileName": "training-data-configuration.json",
        "inputDataS3Location": export_results['outputS3Uri'],  # ensure that the export job has finished first, or this will fail
        "processedDataS3Location": f"{s3_bucket_uri}/preloading"
    },
    "training": {
        "id": training_job_name,
        "dataProcessingJobId": training_job_name,
        "trainingInstanceType": "ml.p3.2xlarge",
        "trainModelS3Location": f"{s3_bucket_uri}/training",
        "maxHPONumberOfTrainingJobs": 2,
        "maxHPOParallelTrainingJobs": 2
    },
    "endpoint": {
        "id": training_job_name,
        "mlModelTrainingJobId": training_job_name
    },
    "modeltransform": {
        "id": training_job_name,
        "mlModelTrainingJobId": training_job_name,
        "dataProcessingJobId": training_job_name,
        "modelTransformOutputS3Location": f"{s3_bucket_uri}/modeltransform"
    }
}
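Before injecting the blob into a magic cell, it can help to verify that it serializes cleanly. A minimal sketch, using hypothetical placeholder values in place of the notebook variables above (`training_job_name`, `export_results`, `s3_bucket_uri`):

```python
import json

# Hypothetical placeholders standing in for the notebook variables
training_job_name = "link-prediction-2021-09-27-0001"
export_results = {"outputS3Uri": "s3://my-bucket/neptune-export"}
s3_bucket_uri = "s3://my-bucket"

ml_params = {
    "dataprocessing": {
        "id": training_job_name,
        "configFileName": "training-data-configuration.json",
        "inputDataS3Location": export_results["outputS3Uri"],
        "processedDataS3Location": f"{s3_bucket_uri}/preloading",
    },
    "endpoint": {
        "id": training_job_name,
        "mlModelTrainingJobId": training_job_name,
    },
}

# Round-tripping through json confirms the blob is valid JSON
# before it is injected into a %%neptune_ml cell.
serialized = json.dumps(ml_params, indent=2)
assert json.loads(serialized) == ml_params
```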

  2. Pass the JSON variable into the cell body of the desired %%neptune_ml cell magic command, using the cell variable injection syntax. The same variable can be reused with as many steps as needed.
%%neptune_ml dataprocessing start --wait --store-to processing_results 
${ml_params}
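Because each phase reads the section of the blob matching its own command, the same variable can back later steps as well; a hypothetical follow-up cell for the training phase might look like:

```
%%neptune_ml training start --wait --store-to training_results
${ml_params}
```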

New %neptune_ml Arguments:

%neptune_ml dataprocessing:

  • --prev-job-id - The job ID of a completed data processing job run on an earlier version of the data.
  • --instance-type - The type of ML instance used during data processing.
  • --instance-volume-size-in-gb - The disk volume size of the processing instance.
  • --timeout-in-seconds - Timeout in seconds for the data processing job.
  • --model-type - Heterogeneous graph model (heterogeneous) or knowledge graph model (kge).

%neptune_ml training:

  • (REQUIRED) --max-hpo-number - Maximum total number of training jobs to start for the hyperparameter tuning job.
  • (REQUIRED) --max-hpo-parallel - Maximum number of parallel training jobs to start for the hyperparameter tuning job.
  • --prev-job-id - The job ID of a completed model-training job that you want to update incrementally based on updated data.
  • --model-name - The model type for training. If not specified, the model-training job will use the same modelType used in the data processing step.
  • --base-processing-instance-type - The type of ML instance used in preparing and managing training of ML models.
  • --instance-volume-size-in-gb - The disk volume size of the training instance.
  • --timeout-in-seconds - Timeout in seconds for the training job.

%neptune_ml modeltransform:

  • --job-id - A unique identifier for the new job.
  • --s3-output-uri - The URI of the S3 bucket/location to store your transform result.
  • --data-processing-job-id - The job ID of a completed data-processing job.
  • --model-training-job-id - The job ID of a completed model-training job.
  • --training-job-name - The name of a completed SageMaker training job.
  • --base-processing-instance-type - The type of ML instance used in preparing and managing training of ML models.
  • --base-processing-instance-volume-size-in-gb - The disk volume size of the new training instance.

Note that you must now specify either:

a) --data-processing-job-id AND --model-training-job-id, or
b) --training-job-name
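For illustration, the two accepted forms might look like the following (job IDs and bucket are hypothetical placeholders, and the subcommand follows the start pattern used elsewhere in this PR):

```
%neptune_ml modeltransform start --data-processing-job-id my-dp-job --model-training-job-id my-training-job --s3-output-uri s3://my-bucket/modeltransform

%neptune_ml modeltransform start --training-job-name my-sagemaker-training-job --s3-output-uri s3://my-bucket/modeltransform
```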

%neptune_ml endpoint:

  • --model-training-job-id - The job ID of a completed model-training job.
  • --model-transform-job-id - The job ID of a completed model-transform job.
  • --update - Indicates that this is an update request.
  • --model-name - The model type that was used for training.
  • --instance-type - The type of ML instance used.
  • --instance-count - The minimum number of Amazon EC2 instances to deploy to an endpoint for prediction.
  • --neptune-iam-role-arn, --volume-encryption-kms-key - See "Shared security parameters" below.

Note that you must now specify either:

a) --model-training-job-id, or
b) --model-transform-job-id
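Illustrative invocations of the two forms (job IDs are hypothetical placeholders, and the create subcommand name is an assumption):

```
%neptune_ml endpoint create --model-training-job-id my-training-job

%neptune_ml endpoint create --model-transform-job-id my-transform-job
```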

Shared security parameters for dataprocessing, training, and modeltransform:

  • --sagemaker-iam-role-arn - The ARN of an IAM role for SageMaker execution.
  • --neptune-iam-role-arn - The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources.
  • --subnets - The IDs of the subnets in the Neptune VPC.
  • --security-group-ids - The VPC security group IDs.
  • --volume-encryption-kms-key - The Key Management Service (KMS) key used by SageMaker to encrypt data on the storage volume attached to the ML compute instances that run the job.
  • --s3-output-encryption-kms-key - The KMS key that SageMaker uses to encrypt the output of the job.
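These flags can be appended to any of the phases above alongside the JSON blob; for example, a hypothetical training cell with IAM roles supplied (the ARNs are placeholders):

```
%%neptune_ml training start --wait --store-to training_results --sagemaker-iam-role-arn arn:aws:iam::123456789012:role/MySageMakerRole --neptune-iam-role-arn arn:aws:iam::123456789012:role/MyNeptuneRole
${ml_params}
```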

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@michaelnchin michaelnchin changed the title Add all NeptuneML API command parameters as args for %neptune_ml Support all NeptuneML API command parameters in neptune_ml magics, accept unified JSON blob for parameter input Sep 28, 2021