Support all NeptuneML API command parameters in neptune_ml magics, accept unified JSON blob for parameter input #202

Merged · 7 commits · Oct 1, 2021

Conversation

michaelnchin
Member

@michaelnchin commented Sep 27, 2021

Issue #, if available: #65, #103, #187

Description of changes:

  • Added support for all of the NeptuneML API command parameters listed in the Neptune documentation.
  • Added support for a unified JSON object, passed as a single variable, to supply parameter inputs for each phase (excluding Export).

JSON parameter blob usage:

  1. Define the JSON blob as a notebook variable, following the format below. Note that this example defines the bare minimum set of parameters required for each step; you may add any number of the optional parameters listed in the NeptuneML API documentation.
training_job_name = neptune_ml.get_training_job_name('link-prediction')  # example job ID for substitution

ml_params = {
    "dataprocessing": {
        "id": training_job_name,
        "configFileName": "training-data-configuration.json",
        "inputDataS3Location": export_results['outputS3Uri'],  # ensure that the export job has finished first, or this will fail
        "processedDataS3Location": f"{s3_bucket_uri}/preloading"
    },
    "training": {
        "id": training_job_name,
        "dataProcessingJobId": training_job_name,
        "trainingInstanceType": "ml.p3.2xlarge",
        "trainModelS3Location": f"{s3_bucket_uri}/training",
        "maxHPONumberOfTrainingJobs": 2,
        "maxHPOParallelTrainingJobs": 2
    },
    "endpoint": {
        "id": training_job_name,
        "mlModelTrainingJobId": training_job_name
    },
    "modeltransform": {
        "id": training_job_name,
        "mlModelTrainingJobId": training_job_name,
        "dataProcessingJobId": training_job_name,
        "modelTransformOutputS3Location": f"{s3_bucket_uri}/modeltransform"
    }
}
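Before injecting the blob into a magic cell, it can help to verify that it serializes cleanly. A minimal sketch, using hypothetical placeholder values in place of the notebook variables above (`training_job_name`, `export_results`, `s3_bucket_uri`):

```python
import json

# Hypothetical placeholders standing in for the notebook variables
training_job_name = "link-prediction-2021-09-27-0001"
export_results = {"outputS3Uri": "s3://my-bucket/neptune-export"}
s3_bucket_uri = "s3://my-bucket"

ml_params = {
    "dataprocessing": {
        "id": training_job_name,
        "configFileName": "training-data-configuration.json",
        "inputDataS3Location": export_results["outputS3Uri"],
        "processedDataS3Location": f"{s3_bucket_uri}/preloading",
    },
    "endpoint": {
        "id": training_job_name,
        "mlModelTrainingJobId": training_job_name,
    },
}

# Round-tripping through json confirms the blob is valid JSON
# before it is injected into a %%neptune_ml cell.
serialized = json.dumps(ml_params, indent=2)
assert json.loads(serialized) == ml_params
```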

  2. Pass the JSON variable into the cell body of the desired %%neptune_ml cell magic command, using the cell variable injection syntax. The same variable can be reused with as many steps as needed.
%%neptune_ml dataprocessing start --wait --store-to processing_results 
${ml_params}
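Because each phase reads the section of the blob matching its own command, the same variable can back later steps as well; a hypothetical follow-up cell for the training phase might look like:

```
%%neptune_ml training start --wait --store-to training_results
${ml_params}
```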

New %neptune_ml Arguments:

%neptune_ml dataprocessing:

  • --prev-job-id - The job ID of a completed data processing job run on an earlier version of the data.
  • --instance-type - The type of ML instance used during data processing.
  • --instance-volume-size-in-gb - The disk volume size of the processing instance.
  • --timeout-in-seconds - Timeout in seconds for the data processing job.
  • --model-type - Heterogeneous graph model (heterogeneous) or knowledge graph model (kge).

%neptune_ml training:

  • (REQUIRED) --max-hpo-number - Maximum total number of training jobs to start for the hyperparameter tuning job.
  • (REQUIRED) --max-hpo-parallel - Maximum number of parallel training jobs to start for the hyperparameter tuning job.
  • --prev-job-id - The job ID of a completed model-training job that you want to update incrementally based on updated data.
  • --model-name - The model type for training. If not specified, the model-training job will use the same modelType used in the data processing step.
  • --base-processing-instance-type - The type of ML instance used in preparing and managing training of ML models.
  • --instance-volume-size-in-gb - The disk volume size of the training instance.
  • --timeout-in-seconds - Timeout in seconds for the training job.

%neptune_ml modeltransform:

  • --job-id - A unique identifier for the new job.
  • --s3-output-uri - The URI of the S3 bucket/location to store your transform result.
  • --data-processing-job-id - The job ID of a completed data-processing job.
  • --model-training-job-id - The job ID of a completed model-training job.
  • --training-job-name - The name of a completed SageMaker training job.
  • --base-processing-instance-type - The type of ML instance used in preparing and managing training of ML models.
  • --base-processing-instance-volume-size-in-gb - The disk volume size of the new training instance.

Note that you must now specify either:

a) --data-processing-job-id AND --model-training-job-id, or
b) --training-job-name
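For illustration, the two accepted forms might look like the following (job IDs and bucket are hypothetical placeholders, and the subcommand follows the start pattern used elsewhere in this PR):

```
%neptune_ml modeltransform start --data-processing-job-id my-dp-job --model-training-job-id my-training-job --s3-output-uri s3://my-bucket/modeltransform

%neptune_ml modeltransform start --training-job-name my-sagemaker-training-job --s3-output-uri s3://my-bucket/modeltransform
```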

%neptune_ml endpoint:

  • --model-training-job-id - The job ID of a completed model-training job.
  • --model-transform-job-id - The job ID of a completed model-transform job.
  • --update - Indicates that this is an update request.
  • --model-name - The model type that was used for training.
  • --instance-type - The type of ML instance used.
  • --instance-count - The minimum number of Amazon EC2 instances to deploy to an endpoint for prediction.
  • --neptune-iam-role-arn, --volume-encryption-kms-key - See "Shared security parameters" below.

Note that you must now specify either:

a) --model-training-job-id, or
b) --model-transform-job-id
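Illustrative invocations of the two forms (job IDs are hypothetical placeholders, and the create subcommand name is an assumption):

```
%neptune_ml endpoint create --model-training-job-id my-training-job

%neptune_ml endpoint create --model-transform-job-id my-transform-job
```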

Shared security parameters for dataprocessing, training, and modeltransform:

  • --sagemaker-iam-role-arn - The ARN of an IAM role for SageMaker execution.
  • --neptune-iam-role-arn - The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources.
  • --subnets - The IDs of the subnets in the Neptune VPC.
  • --security-group-ids - The VPC security group IDs.
  • --volume-encryption-kms-key - The Key Management Service (KMS) key used by SageMaker to encrypt data on the storage volume attached to the ML compute instances that run the job.
  • --s3-output-encryption-kms-key - The KMS key that SageMaker uses to encrypt the output of the job.
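These flags can be appended to any of the phases above alongside the JSON blob; for example, a hypothetical training cell with IAM roles supplied (the ARNs are placeholders):

```
%%neptune_ml training start --wait --store-to training_results --sagemaker-iam-role-arn arn:aws:iam::123456789012:role/MySageMakerRole --neptune-iam-role-arn arn:aws:iam::123456789012:role/MyNeptuneRole
${ml_params}
```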

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@michaelnchin michaelnchin changed the title Add all NeptuneML API command parameters as args for %neptune_ml Support all NeptuneML API command parameters in neptune_ml magics, accept unified JSON blob for parameter input Sep 28, 2021