LDA-Science.ipynb generate_griffiths_data() error #902

Closed
yibeichenmcdermott opened this issue Oct 11, 2019 · 8 comments

@yibeichenmcdermott

I was running the example and cannot get past this line:

known_alpha, known_beta, documents, topic_mixtures = generate_griffiths_data( num_documents=num_documents, num_topics=10)

Here is the output:


ValueError Traceback (most recent call last)
in ()
2 num_documents = 6000
3 known_alpha, known_beta, documents, topic_mixtures = generate_griffiths_data(
----> 4 num_documents=num_documents, num_topics=10)
5 # num_topics, vocabulary_size = known_beta.shape
6

~/SageMaker/lda_topic_modeling_2019-10-11/generate_example_data.py in generate_griffiths_data(num_documents, average_document_length, num_topics, alpha, eta, seed)
112 topic_index = np.argmax(word_topic)
113 topic_word_distribution = beta[topic_index]
--> 114 word = sp.stats.multinomial.rvs(1, topic_word_distribution, size=1).reshape(vocabulary_size)
115 documents[m] += word
116

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/scipy/stats/_multivariate.py in rvs(self, n, p, size, random_state)
3218 n, p, npcond = self._process_parameters(n, p)
3219 random_state = self._get_random_state(random_state)
-> 3220 return random_state.multinomial(n, p, size)
3221
3222

mtrand.pyx in numpy.random.mtrand.RandomState.multinomial()

common.pyx in numpy.random.common.check_array_constraint()

ValueError: pvals < 0, pvals > 1 or pvals contains NaNs

@lefnire

lefnire commented Oct 18, 2019

Possible lead: numpy.random.mtrand.RandomState.multinomial

The probability inputs should be normalized. As an implementation detail, the value of the last entry is ignored and assumed to take up any leftover probability mass, but this should not be relied on. A biased coin which has twice as much weight on one side as on the other should be sampled like so:

>>> np.random.multinomial(100, [1.0 / 3, 2.0 / 3])  # RIGHT
array([38, 62]) # random

not like:

>>> np.random.multinomial(100, [1.0, 2.0])  # WRONG
Traceback (most recent call last):
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs

If I print(topic_word_distribution) at generate_example_data.py#L114, the distribution seems normalized (see print output).
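When a printed vector only looks normalized, a small diagnostic can help narrow down which of the three conditions in the error message is being hit. This is a minimal sketch that assumes only NumPy. `diagnose_pvals` and its `tol` parameter are hypothetical names, not part of NumPy or generate_example_data.py, and the tolerance is an arbitrary choice rather than NumPy's internal one.

```python
import numpy as np


def diagnose_pvals(pvals, tol=1e-12):
    """Summarize why a probability vector might be rejected (hypothetical helper)."""
    p = np.asarray(pvals, dtype=np.float64)
    total = float(np.nansum(p))  # sum that ignores NaN entries
    return {
        "has_nan": bool(np.isnan(p).any()),
        "min": float(np.nanmin(p)),   # a negative value here means pvals < 0
        "max": float(np.nanmax(p)),   # a value above 1 here means pvals > 1
        "sum": total,
        "sum_within_tol": abs(total - 1.0) <= tol,
    }


# The two vectors from the NumPy documentation excerpt above.
print(diagnose_pvals([1.0 / 3, 2.0 / 3]))
print(diagnose_pvals([1.0, 2.0]))
```

Calling this on `topic_word_distribution` just before the failing line reports the full-precision minimum, maximum, and sum, which the default `print` rounding can hide.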

@Robinspecteur

Robinspecteur commented Jan 7, 2020

I've found a solution here: https://www.e-learn.cn/topic/2637790
From what I gather, this has to do with NumPy implicitly casting the values. I'm not sure about the details, but casting theta and topic_word_distribution to float64 and normalizing them works:

```python
        document_length = document_lengths[m]
        theta = thetas[m]
        theta = np.asarray(theta).astype('float64')
        theta = theta / np.sum(theta)
        topic = sp.stats.multinomial.rvs(1, theta, size=document_length)  # precompute topics for performance
        # generate word counts within document
        for n in range(document_length):
            word_topic = topic[n]
            topic_index = np.argmax(word_topic)
            topic_word_distribution = beta[topic_index]
            topic_word_distribution = np.asarray(topic_word_distribution).astype('float64')
            topic_word_distribution = topic_word_distribution / np.sum(topic_word_distribution)
            word = sp.stats.multinomial.rvs(1, topic_word_distribution, size=1).reshape(vocabulary_size)
            documents[m] += word
```
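The cast-and-renormalize step above is applied twice, so it could be factored into one function. The following is only a sketch under assumptions: `normalize_probabilities` is a hypothetical helper that does not exist in generate_example_data.py, and the input vector is a made-up stand-in for one row of `beta`. It samples with `numpy.random.RandomState.multinomial`, the call at the bottom of the traceback, rather than through SciPy.

```python
import numpy as np


def normalize_probabilities(values):
    """Return values as a float64 vector rescaled to sum to ~1.0 (hypothetical helper)."""
    p = np.asarray(values, dtype=np.float64)
    if p.ndim != 1 or p.size == 0:
        raise ValueError("expected a non-empty 1-D array")
    if np.isnan(p).any() or (p < 0).any():
        raise ValueError("probabilities must be non-negative and not NaN")
    total = p.sum()
    if total <= 0:
        raise ValueError("probabilities must have positive total mass")
    return p / total


# Made-up float32 weights standing in for one topic-word distribution.
raw = np.array([1.0, 2.0, 1.0], dtype=np.float32)
p = normalize_probabilities(raw)

rng = np.random.RandomState(0)  # fixed seed so the draw is repeatable
word = rng.multinomial(1, p)    # one-hot draw over a 3-word vocabulary
print(p.dtype, p.sum(), word)
```

Rejecting NaN and negative entries up front means a bad input fails with a specific message instead of the combined "pvals < 0, pvals > 1 or pvals contains NaNs" error.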

@premsridhar

Hi, I am still getting this error. Is there a solution?

@zorrofox

zorrofox commented Aug 5, 2020

I have a workaround for this issue: normalize the NumPy arrays at line 104 and line 111.

Line 104:

theta = thetas[m] / np.sum(thetas[m])

Line 111:

topic_word_distribution = beta[topic_index] / np.sum(beta[topic_index])

@austinlasseter

@zorrofox -- I attempted the workaround you suggested in generate_example_data.py but I continue to get the same error.

@zorrofox

@austinlasseter Did you restart the Jupyter kernel?

@duderevolucion

For reference, the error noted at the beginning of this thread:
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs

This error is caused by a floating-point precision issue in the script: in some instances the topic-word distributions do not sum to exactly 1.0 before scipy.stats.multinomial.rvs is called.

A fixed version of this script can be found at https://github.com/duderevolucion/aws-sagemaker-lda-example. That repository also contains an updated version of the lda example in https://github.com/awslabs/amazon-sagemaker-examples that works with the Sagemaker Python SDK Version 2.x.
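To illustrate the kind of floating-point drift described above, here is a minimal standard-library sketch. It does not reproduce the script's actual data. The vector is a toy example and `is_normalized` is a hypothetical check with an arbitrary tolerance.

```python
import math

# Ten equal probabilities that sum to 1 on paper.
probs = [0.1] * 10

naive_total = sum(probs)        # left-to-right float addition accumulates rounding error
exact_total = math.fsum(probs)  # fsum compensates for the lost low-order bits


def is_normalized(p, tol=1e-9):
    """Hypothetical check: True if p is a valid probability vector within tol."""
    in_range = all(0.0 <= x <= 1.0 for x in p)  # also False for NaN entries
    return in_range and math.isclose(math.fsum(p), 1.0, abs_tol=tol)


print(naive_total == 1.0)    # False: naive_total is 0.9999999999999999
print(exact_total == 1.0)    # True
print(is_normalized(probs))  # True
```

A vector can therefore print as normalized and still fail an exact comparison against 1.0, which is why a tolerance-based check or an explicit renormalization is the safer test before sampling.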

@haratyma

@duderevolucion thanks works now!

ajaykarpur added a commit that referenced this issue Dec 1, 2020
* GluonCV YoloV3 Darknet53 example training and inference with Neo (#1266)

* upgrade MNIST experiment notebook to SDK v2 (#1576)

* GluonCV YoloV3 Darknet53 example minor fixes (#1582)

* Code cell type corrected. Removed empty cell

* Unzip datasets if not available in the notebook's folder

* fix invalid json in MNIST notetook (#1594)

* Kkoppolu inference examples (#1587)

* Compilation examples changes for new inference containers

Update examples for PyTorch
 - to use the new inference containers
 - Use SageMaker 2.x

* Clear outputs

Clear outputs in the notebook

* Fix typo

Fix typo in text box

* Undo change to iterations in old way

Undo change to iterations in old way

* Code Review feedback

Organize imports

Code Review feedback

* CR

Use new inference containers for both uncompiled and compiled flows.

* CR

Remove incorrect code comments

* Update versions of torch and torchvision

Co-authored-by: EC2 Default User <[email protected]>

* add template notebook (#1570)

* add template notebook

* resolve comments

* Bump tensorflow (#1574)

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.13.1 to 1.15.4.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorflow@v1.13.1...v1.15.4)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* mxnet_mnist.ipynb fix (#1597)

* Update mxnet_mnist.ipynb

Set notebook to default to CPU training

* Update mxnet_mnist.ipynb

* updated birds dataset download source (#1593)

* fix pandas errors in notebooks (#1490)

* Refactor the Debugger detect_stalled_training_job_and_stop.ipynb notebook (#1592)

* publish BYOC with Debugger notebook

* some test change

* revert the kernel names in the metadata

* fix typos

* incorporate feedback

* incorporate comments

* pin to pysdk v1

* remove installation output logs

* refactor the stalled training job notebook

* remove unnecessary module imports / minor fix

* incorporate feedback

* minor fix

* fix typo

* minor fix

* fix unfinished sentence

* incorporate feedback

* minor fix

Co-authored-by: Miyoung Choi <[email protected]>

* Make RL training compatible with PyTorch (#1520)

* Make RLEstimator() PyTorch compatible & modify cartpole notebook

* set use_pytorch to False by default

* minor refactor; check in first unit test

* indent correction

* Verify sagemaker SDK version (#1606)

* updating mxnet_mnist notebook (#1588)

* updating mxnet_mnist notebook

* typo fix

* refactoring

* refactored mnist.py

* updated bucket paths in the notebook for better organization

* notebook updated to handle sdk upgrade

Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>

* fixing Model Package ARNs and removing region specific dependency (#1611)

* fixing Model Package ARNs and removing region specific dependency

* Adding a disclaimer on reference notebooks

Co-authored-by: kwwaikar <[email protected]>

* Fix: add 'import tensorflow as tf' required by _save_tf_model (#1560)

Co-authored-by: Felipe Antunes <[email protected]>

* Update xgboost churn neo example for sagemaker v2 (#1591)

* Update xgboost churn neo example for sagemaker v2

* Remove use of latest version

* Add sagemaker installation command and remove duplicate import

* Use sagemaker pysdk v2

* Add setup and cleanup steps

* clear output

* Revert kernel metadata

Co-authored-by: Nikhil Kulkarni <[email protected]>

* Add integration tests using Papermill library for RL notebooks. List of notebooks covered in the tests: (#1580)

1. rl_cartpole_coach/rl_cartpole_coach_gymEnv.ipynb
2. rl_cartpole_ray/rl_cartpole_ray_gymEnv.ipynb

Co-authored-by: Akash Goel <[email protected]>

* Delete KernelExplainerWrapper and remove importing LogitLink and IdentityLink (#1603)

* update-neo-mxnet-notebooks (#1625)

* update-neo-mxnet-notebooks

* refactoring and typo fixes

* Add Ground Truth Streaming notebooks (#1617)

* Add Ground Truth Streaming notebooks

* Made below changes

* Replace .format with f-strings
* Added pip sagemaker isntall
* Download image from public url
* Minor comments

* Minor f-string updates to chained notebook

Co-authored-by: Gopalakrishna, Priyanka <[email protected]>

* Added downgrade to SDK 1.72 and edited the text. Verified notebook runs through with no errors. (#1633)

* Add SDK version rollback code. (#1634)

* Running tests in parallel for RL notebooks. (#1624)

Co-authored-by: Akash Goel <[email protected]>

* fix: resolve breaking changes of neo container, adding `softmax_label` to `compile_model` (#1635)

* Fixes #902 (#1632)

* fix probability out of bound

* fixed probability out of bound

* cleared the notebook output

* fix of probabilities out of bound

* adding an example for Linear Learner regression use case with abalone dataset and input csv format (#1622)

* infra: add PR buildspec (#1642)

* add notebook instance buildspec

* Update HPO_Analyze_TuningJob_Results.ipynb on where to retrieve a HP job (#1637)

* Update HPO_Analyze_TuningJob_Results.ipynb

Adding instructions on where to find the hyperparameter jobs needed as input.

* Update hyperparameter_tuning/analyze_results/HPO_Analyze_TuningJob_Results.ipynb

Co-authored-by: Aaron Markham <[email protected]>

* infra: update buildspec (#1649)

* update buildspec

* terminate early if no notebooks in PR

* reformat command

* move conditional to build phase as one command

* removing object2vec_multilabel_genre_classification.ipynb (#1648)

* adding preprocessing tabular data notebooks

* incorporating changes

* incorporating changes

* incorporating changes

* incorporating few changes

* minor fix to persist sagemaker version

* minor fix to persist sagemaker version

* removing notebook

Co-authored-by: Ajay Karpur <[email protected]>

* fix: move the Tensorflow import in coach_launcher.py inside the _save_tf_model fn (#1652)

Co-authored-by: Akash Goel <[email protected]>

* delete extra common folder inside rl_game_server_autopilot/sagemaker directory (#1653)

Co-authored-by: Akash Goel <[email protected]>

* Removed pip install, edited for clarity, tested on JupyterLab (#1660)

* doc: fix typos in PyTorch CIFAR-10 notebook (#1650)

* fix typos in PyTorch CIFAR-10 notebook

* deliberately raise error to test PR build

* Revert "deliberately raise error to test PR build"

This reverts commit 7c2bac3.

* Update mm byo (#1663)

* Added note that nb won't run in studio, add note about kernel and sdk version testing details

* changed kernel metadata back to conda_mxnet_p36

* Removed conda command to install s3fs. (#1659)

* change: updated for sagemaker python sdk 2.x (#1667)

* min_df was larger than max_df and outside of the acceptable range of 0.0-1.0 (#1601)

* min_df was larger than max_df and outside of the acceptable range of 0.0 to 1.0. This gave me an error but changing the min_df to 0.2 or 0.02 resolved the error. It is unclear if the author intended min_df to be 0.2 or 0.02.

* Update ntm_20newsgroups_topic_model.ipynb

remove output and changed min_df to a likely better default of 0.2

Co-authored-by: Aaron Markham <[email protected]>

* Neo pytorch inf1 notebook (#1583)

* Add Neo notebook for PT model on Inf1

* Change target to inf1

* resolve comments

* Add revert sm version

* Add multiple cores instruction and fix revert sagemaker version

* polish instructions

* one more polish

* make sm version at least 2.11.0

* change to upgrade only

* remove fixed pytorch version

Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>

* Update generate_example_data.py (#1077)

Added code solution for Bug in the Multinomial 
lines:        theta = np.asarray(theta).astype('float64')
        theta = theta / np.sum(theta)
and lines:             topic_word_distribution = np.asarray(topic_word_distribution).astype('float64')
            topic_word_distribution = topic_word_distribution / np.sum(topic_word_distribution)

Co-authored-by: Aaron Markham <[email protected]>

* Fix boolean argument parsing (#1681)

* Fixed predictions showing as array of False instead of a single True or False value (#1679)

* Fixed predictions matched showing as array of False instead of showing whether prediction is correct (True or False).

* Fixed predictions matched showing as array of False

* Fixed predictions showing as array of False instead of a single True or False

* Dev branch (#1688)

* Adding new project gpt-2

* Reviewed. Reset Kernel.

* made fix to reflect region names in model_package_arns

* Minor notebook content rearrangement

* fixed region-specific arns

* Update README.md

Added description for new project 'creative-writing-using-gpt-2-text-generation' under 'using_model_packages'

* Update README.md

added description for new project 'creative-writing-using-gpt-2-text-generation' under 'aws_marketplace/using_model_packages'

Co-authored-by: Alex Ignatov <[email protected]>

* fix: use image_uris module for retrieval (#1698)

* added autogluon v0.0.14 support, changed the build method (#1640)

* added autogluon v0.0.14 support, changed the build method

* changed the bash execution

Co-authored-by: Eric Johnson <[email protected]>

* added data ingestion notebooks (#1602)

* added data ingestion notebooks

data ingestion notebooks v1

* Added image for Athena and Redshift notebook

Added images displayed in two data ingestion notebooks -- Athena and Redshift

* Text Data Pre-processing Notebook

New notebook added for text data pre-processing, feedback incorporated

* Include Data Aggregation to text data ingestion (S3)

include the text data aggregation content to the text data ingestion notebook

* Modified Data Ingestion Notebooks and Text preprocessing Notebooks

Modified all seven (7) data ingestion and text preprocessing notebooks to incorporate feedback

* Modified the image data ingestion notebook

Added some note to downloading COCO dataset from online resources

* updated all the links in the notebooks

links to notebooks are changed to relative links; links to videos are removed for now and can be added later. Citations to data sources and existing aws notebooks are added.

* modified some links that were not working

modified links that's not working (refer to another folder)

* Modified 012 for running error

Removed a typo in 012

* updated SageMaker SDK, clear output, added data downloading

added data downloading to the beginning of each notebook; update SageMaker SDK at the beginning of each notebook; output cleared.

* Modified packages used in notebooks

modified packages used in 011, 012, 02, 04 and text data pre-processing.

Co-authored-by: ZoeMa <[email protected]>
Co-authored-by: Talia <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>
Co-authored-by: Ajay Karpur <[email protected]>

* * Add framework_version to SKLearn estimator (#1716)

Co-authored-by: Sean Morgan <[email protected]>

* Fix autopilot_customer_churn.ipynb notebook for Sagemaker V2 SDK (#1699)

* Fix notebook for Sagemaker V2 SDK

* revert account change

Co-authored-by: Michele Ricciardi <[email protected]>

* Notebook fixed and cleaned (#1726)

* Notebook fixed and cleaned

* Comment reformatted

* Fixed notebooks for errors due to syntax change and cleaned notebooks (#1723)

* Revert "Fixed notebooks for errors due to syntax change and cleaned notebooks (#1723)" (#1730)

This reverts commit e691349.

* Revert "Notebook fixed and cleaned (#1726)" (#1732)

This reverts commit b68acb4.

* Sample notebook fix 2 (#1675)

* Reducing the random hpo resource values 

We've specified the total number of training jobs to be only 20 and the maximum number of parallel jobs to be 2.

* Edited the text to be consistent with the new parameter values.

With the new parameter values, this notebook now runs without error.

* fixed typo

fixed a typo

* Updated Neo compilation notebook for GluonCV Yolo example (#1638)

* Updated Neo compilation notebook for GluonCV Yolo example

* Minor fixes to comments and logging

Co-authored-by: Eric Johnson <[email protected]>
Co-authored-by: Ajay Karpur <[email protected]>

* Fixed malformed TensorFlow estimator declaration. (#1628)

* Fixed malformed TensorFlow estimator declaration.

* Removed extraneous output.

Co-authored-by: Eric Johnson <[email protected]>

* logx=False plots data as User_Score is <=10 (#1265)

logx=True doesn't seem appropriate since User_Score is <=10 the plot shows nothing

Co-authored-by: Aaron Markham <[email protected]>
Co-authored-by: Ajay Karpur <[email protected]>

* Update detect_stalled_training_job_and_stop.ipynb (#1735)

* Updated sagemaker attribute configurations for V2 SDK support (#1636)

Co-authored-by: Aaron Markham <[email protected]>

* Update Batch Transform - breast cancer prediction with high level SDK.ipynb (#1138)

Fix a small bug.
Before specifying content_type='text/csv' in sm_transformer.transform, I get error that "Loading libsvm data failed with Exception, please ensure data is in libsvm format: <class 'ValueError'>"

Co-authored-by: Aaron Markham <[email protected]>

* Edit xgboost_customer_churn_studio.ipynb (#1060)

Co-authored-by: Aaron Markham <[email protected]>

* added a feature selection notebook (#1664)

* added a feature selection notebook

* addressed comments and renamed files for CI

* used model.model_data to index last trained model in s3

* added pip sagemaker>=2.15.0

* add lineage example notebooks (#90)

* add example notebook skeleton for fairness and explainability (#91)

Co-authored-by: Xinyu Liu <[email protected]>

Co-authored-by: Bartek Pawlik <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Krishna Chaitanya Koppolu <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: IvyBazan <[email protected]>
Co-authored-by: chenonit <[email protected]>
Co-authored-by: Valentin Flunkert <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Miyoung Choi <[email protected]>
Co-authored-by: Anna Luo <[email protected]>
Co-authored-by: Pratyush Bagaria <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Kanchan Waikar <[email protected]>
Co-authored-by: kwwaikar <[email protected]>
Co-authored-by: Felipe Antunes <[email protected]>
Co-authored-by: Felipe Antunes <[email protected]>
Co-authored-by: Nikhil Kulkarni <[email protected]>
Co-authored-by: Nikhil Kulkarni <[email protected]>
Co-authored-by: Akash Goel <[email protected]>
Co-authored-by: Akash Goel <[email protected]>
Co-authored-by: Somnath Sarkar <[email protected]>
Co-authored-by: gopalakp <[email protected]>
Co-authored-by: Gopalakrishna, Priyanka <[email protected]>
Co-authored-by: Laren-AWS <[email protected]>
Co-authored-by: Chuyang <[email protected]>
Co-authored-by: Hongshan Li <[email protected]>
Co-authored-by: moagaber <[email protected]>
Co-authored-by: Roald Bradley Severtson <[email protected]>
Co-authored-by: Paul B <[email protected]>
Co-authored-by: Eric Slesar <[email protected]>
Co-authored-by: PaulC-AWS <[email protected]>
Co-authored-by: Corvus LEE <[email protected]>
Co-authored-by: aserfass <[email protected]>
Co-authored-by: minlu1021 <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: hbono2019 <[email protected]>
Co-authored-by: H. Furkan Bozkurt <[email protected]>
Co-authored-by: Eitan Sela <[email protected]>
Co-authored-by: awsmrud <[email protected]>
Co-authored-by: Alex Ignatov <[email protected]>
Co-authored-by: Eric Johnson <[email protected]>
Co-authored-by: Yohei Nakayama <[email protected]>
Co-authored-by: ZoeMa <[email protected]>
Co-authored-by: ZoeMa <[email protected]>
Co-authored-by: Talia <[email protected]>
Co-authored-by: Sean Morgan <[email protected]>
Co-authored-by: Sean Morgan <[email protected]>
Co-authored-by: Michele Ricciardi <[email protected]>
Co-authored-by: Michele Ricciardi <[email protected]>
Co-authored-by: vivekmadan2 <[email protected]>
Co-authored-by: playphil <[email protected]>
Co-authored-by: Gili Nachum <[email protected]>
Co-authored-by: sdoyle <[email protected]>
Co-authored-by: fyang1234 <[email protected]>
Co-authored-by: annbech <[email protected]>
Co-authored-by: Xinyu <[email protected]>
Co-authored-by: Xinyu Liu <[email protected]>
ajaykarpur added a commit that referenced this issue Dec 1, 2020
* GluonCV YoloV3 Darknet53 example training and inference with Neo (#1266)

* upgrade MNIST experiment notebook to SDK v2 (#1576)

* GluonCV YoloV3 Darknet53 example minor fixes (#1582)

* Code cell type corrected. Removed empty cell

* Unzip datasets if not available in the notebook's folder

* fix invalid json in MNIST notetook (#1594)

* Kkoppolu inference examples (#1587)

* Compilation examples changes for new inference containers

Update examples for PyTorch
 - to use the new inference containers
 - Use SageMaker 2.x

* Clear outputs

Clear outputs in the notebook

* Fix typo

Fix typo in text box

* Undo change to iterations in old way

Undo change to iterations in old way

* Code Review feedback

Organize imports

Code Review feedback

* CR

Use new inference containers for both uncompiled and compiled flows.

* CR

Remove incorrect code comments

* Update versions of torch and torchvision

Co-authored-by: EC2 Default User <[email protected]>

* add template notebook (#1570)

* add template notebook

* resolve comments

* Bump tensorflow (#1574)

Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.13.1 to 1.15.4.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorflow@v1.13.1...v1.15.4)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* mxnet_mnist.ipynb fix (#1597)

* Update mxnet_mnist.ipynb

Set notebook to default to CPU training

* Update mxnet_mnist.ipynb

* updated birds dataset download source (#1593)

* fix pandas errors in notebooks (#1490)

* Refactor the Debugger detect_stalled_training_job_and_stop.ipynb notebook (#1592)

* publish BYOC with Debugger notebook

* some test change

* revert the kernel names in the metadata

* fix typos

* incorporate feedback

* incorporate comments

* pin to pysdk v1

* remove installation output logs

* refactor the stalled training job notebook

* remove unnecessary module imports / minor fix

* incorporate feedback

* minor fix

* fix typo

* minor fix

* fix unfinished sentence

* incorporate feedback

* minor fix

Co-authored-by: Miyoung Choi <[email protected]>

* Make RL training compatible with PyTorch (#1520)

* Make RLEstimator() PyTorch compatible & modify cartpole notebook

* set use_pytorch to False by default

* minor refactor; check in first unit test

* indent correction

* Verify sagemaker SDK version (#1606)

* updating mxnet_mnist notebook (#1588)

* updating mxnet_mnist notebook

* typo fix

* refactoring

* refactored mnist.py

* updated bucket paths in the notebook for better organization

* notebook updated to handle sdk upgrade

Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>

* fixing Model Package ARNs and removing region specific dependency (#1611)

* fixing Model Package ARNs and removing region specific dependency

* Adding a disclaimer on reference notebooks

Co-authored-by: kwwaikar <[email protected]>

* Fix: add 'import tensorflow as tf' required by _save_tf_model (#1560)

Co-authored-by: Felipe Antunes <[email protected]>

* Update xgboost churn neo example for sagemaker v2 (#1591)

* Update xgboost churn neo example for sagemaker v2

* Remove use of latest version

* Add sagemaker installation command and remove duplicate import

* Use sagemaker pysdk v2

* Add setup and cleanup steps

* clear output

* Revert kernel metadata

Co-authored-by: Nikhil Kulkarni <[email protected]>

* Add integration tests using Papermill library for RL notebooks. List of notebooks covered in the tests: (#1580)

1. rl_cartpole_coach/rl_cartpole_coach_gymEnv.ipynb
2. rl_cartpole_ray/rl_cartpole_ray_gymEnv.ipynb

Co-authored-by: Akash Goel <[email protected]>

* Delete KernelExplainerWrapper and remove importing LogitLink and IdentityLink (#1603)

* update-neo-mxnet-notebooks (#1625)

* update-neo-mxnet-notebooks

* refactoring and typo fixes

* Add Ground Truth Streaming notebooks (#1617)

* Add Ground Truth Streaming notebooks

* Made below changes

* Replace .format with f-strings
* Added pip sagemaker isntall
* Download image from public url
* Minor comments

* Minor f-string updates to chained notebook

Co-authored-by: Gopalakrishna, Priyanka <[email protected]>

* Added downgrade to SDK 1.72 and edited the text. Verified notebook runs through with no errors. (#1633)

* Add SDK version rollback code. (#1634)

* Running tests in parallel for RL notebooks. (#1624)

Co-authored-by: Akash Goel <[email protected]>

* fix: resolve breaking changes of neo container, adding `softmax_label` to `compile_model` (#1635)

* Fixes #902 (#1632)

* fix probability out of bound

* fixed probability out of bound

* cleared the notebook output

* fix of probabilities out of bound

* adding an example for Linear Learner regression use case with abalone dataset and input csv format (#1622)

* infra: add PR buildspec (#1642)

* add notebook instance buildspec

* Update HPO_Analyze_TuningJob_Results.ipynb on where to retrieve a HP job (#1637)

* Update HPO_Analyze_TuningJob_Results.ipynb

Adding instructions on where to find the hyperparameter jobs needed as input.

* Update hyperparameter_tuning/analyze_results/HPO_Analyze_TuningJob_Results.ipynb

Co-authored-by: Aaron Markham <[email protected]>

* infra: update buildspec (#1649)

* update buildspec

* terminate early if no notebooks in PR

* reformat command

* move conditional to build phase as one command

* removing object2vec_multilabel_genre_classification.ipynb (#1648)

* adding preprocessing tabular data notebooks

* incorporating changes

* incorporating changes

* incorporating changes

* incorporating few changes

* minor fix to persist sagemaker version

* minor fix to persist sagemaker version

* removing notebook

Co-authored-by: Ajay Karpur <[email protected]>

* fix: move the Tensorflow import in coach_launcher.py inside the _save_tf_model fn (#1652)

Co-authored-by: Akash Goel <[email protected]>

* delete extra common folder inside rl_game_server_autopilot/sagemaker directory (#1653)

Co-authored-by: Akash Goel <[email protected]>

* Removed pip install, edited for clarity, tested on JupyterLab (#1660)

* doc: fix typos in PyTorch CIFAR-10 notebook (#1650)

* fix typos in PyTorch CIFAR-10 notebook

* deliberately raise error to test PR build

* Revert "deliberately raise error to test PR build"

This reverts commit 7c2bac3.

* Update mm byo (#1663)

* Added note that nb won't run in studio, add note about kernel and sdk version testing details

* changed kernel metadata back to conda_mxnet_p36

* Removed conda command to install s3fs. (#1659)

* change: updated for sagemaker python sdk 2.x (#1667)

* min_df was larger than max_df and outside of the acceptable range of 0.0-1.0 (#1601)

* min_df was larger than max_df and outside of the acceptable range of 0.0 to 1.0. This gave me an error but changing the min_df to 0.2 or 0.02 resolved the error. It is unclear if the author intended min_df to be 0.2 or 0.02.

* Update ntm_20newsgroups_topic_model.ipynb

remove output and changed min_df to a likely better default of 0.2

Co-authored-by: Aaron Markham <[email protected]>

* Neo pytorch inf1 notebook (#1583)

* Add Neo notebook for PT model on Inf1

* Change target to inf1

* resolve comments

* Add revert sm version

* Add multiple cores instruction and fix revert sagemaker version

* polish instructions

* one more polish

* make sm version at least 2.11.0

* change to upgrade only

* remove fixed pytorch version

Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>

* Update generate_example_data.py (#1077)

Added code solution for Bug in the Multinomial 
lines:        theta = np.asarray(theta).astype('float64')
        theta = theta / np.sum(theta)
and lines:             topic_word_distribution = np.asarray(topic_word_distribution).astype('float64')
            topic_word_distribution = topic_word_distribution / np.sum(topic_word_distribution)

Co-authored-by: Aaron Markham <[email protected]>

* Fix boolean argument parsing (#1681)

* Fixed predictions showing as array of False instead of a single True or False value (#1679)

* Fixed predictions matched showing as array of False instead of showing whether prediction is correct (True or False).

* Fixed predictions matched showing as array of False

* Fixed predictions showing as array of False instead of a single True or False

* Dev branch (#1688)

* Adding new project gpt-2

* Reviewed. Reset Kernel.

* made fix to reflect region names in model_package_arns

* Minor notebook content rearrangement

* fixed region-specific arns

* Update README.md

Added description for new project 'creative-writing-using-gpt-2-text-generation' under 'using_model_packages'

* Update README.md

added description for new project 'creative-writing-using-gpt-2-text-generation' under 'aws_marketplace/using_model_packages'

Co-authored-by: Alex Ignatov <[email protected]>

* fix: use image_uris module for retrieval (#1698)

* added autogluon v0.0.14 support, changed the build method (#1640)

* added autogluon v0.0.14 support, changed the build method

* changed the bash execution

Co-authored-by: Eric Johnson <[email protected]>

* added data ingestion notebooks (#1602)

* added data ingestion notebooks

data ingestion notebooks v1

* Added image for Athena and Redshift notebook

Added images displayed in two data ingestion notebooks -- Athena and Redshift

* Text Data Pre-processing Notebook

New notebook added for text data pre-processing, feedback incorporated

* Include Data Aggregation to text data ingestion (S3)

include the text data aggregation content to the text data ingestion notebook

* Modified Data Ingestion Notebooks and Text preprocessing Notebooks

Modified all seven (7) data ingestion and text preprocessing notebooks to incorporate feedback

* Modified the image data ingestion notebook

Added a note on downloading the COCO dataset from online resources

* updated all the links in the notebooks

links to notebooks are changed to relative links; links to videos are removed for now and can be added later. Citations to data sources and existing aws notebooks are added.

* modified some links that were not working

modified links that were not working (they referred to another folder)

* Modified 012 for running error

Removed a typo in 012

* updated SageMaker SDK, clear output, added data downloading

added data downloading to the beginning of each notebook; update SageMaker SDK at the beginning of each notebook; output cleared.

* Modified packages used in notebooks

modified packages used in 011, 012, 02, 04 and text data pre-processing.

Co-authored-by: ZoeMa <[email protected]>
Co-authored-by: Talia <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>
Co-authored-by: Ajay Karpur <[email protected]>

* Add framework_version to SKLearn estimator (#1716)

Co-authored-by: Sean Morgan <[email protected]>

* Fix autopilot_customer_churn.ipynb notebook for Sagemaker V2 SDK (#1699)

* Fix notebook for Sagemaker V2 SDK

* revert account change

Co-authored-by: Michele Ricciardi <[email protected]>

* Notebook fixed and cleaned (#1726)

* Notebook fixed and cleaned

* Comment reformatted

* Fixed notebooks for errors due to syntax change and cleaned notebooks (#1723)

* Revert "Fixed notebooks for errors due to syntax change and cleaned notebooks (#1723)" (#1730)

This reverts commit e691349.

* Revert "Notebook fixed and cleaned (#1726)" (#1732)

This reverts commit b68acb4.

* Sample notebook fix 2 (#1675)

* Reducing the random hpo resource values 

We've specified the total number of training jobs to be only 20 and the maximum number of parallel jobs to be 2.

* Edited the text to be consistent with the new parameter values.

With the new parameter values, this notebook now runs without error.

* fixed typo

fixed a typo

* Updated Neo compilation notebook for GluonCV Yolo example (#1638)

* Updated Neo compilation notebook for GluonCV Yolo example

* Minor fixes to comments and logging

Co-authored-by: Eric Johnson <[email protected]>
Co-authored-by: Ajay Karpur <[email protected]>

* Fixed malformed TensorFlow estimator declaration. (#1628)

* Fixed malformed TensorFlow estimator declaration.

* Removed extraneous output.

Co-authored-by: Eric Johnson <[email protected]>

* logx=False plots data as User_Score is <=10 (#1265)

logx=True doesn't seem appropriate: since User_Score is <=10, the plot shows nothing

Co-authored-by: Aaron Markham <[email protected]>
Co-authored-by: Ajay Karpur <[email protected]>

* Update detect_stalled_training_job_and_stop.ipynb (#1735)

* Updated sagemaker attribute configurations for V2 SDK support (#1636)

Co-authored-by: Aaron Markham <[email protected]>

* Update Batch Transform - breast cancer prediction with high level SDK.ipynb (#1138)

Fix a small bug.
Before specifying content_type='text/csv' in sm_transformer.transform, I got the error "Loading libsvm data failed with Exception, please ensure data is in libsvm format: <class 'ValueError'>".

Co-authored-by: Aaron Markham <[email protected]>

* Edit xgboost_customer_churn_studio.ipynb (#1060)

Co-authored-by: Aaron Markham <[email protected]>

* added a feature selection notebook (#1664)

* added a feature selection notebook

* addressed comments and renamed files for CI

* used model.model_data to index last trained model in s3

* added pip sagemaker>=2.15.0

* add lineage example notebooks (#90)

* add example notebook skeleton for fairness and explainability (#91)

Co-authored-by: Xinyu Liu <[email protected]>

Co-authored-by: Bartek Pawlik <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Krishna Chaitanya Koppolu <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: IvyBazan <[email protected]>
Co-authored-by: chenonit <[email protected]>
Co-authored-by: Valentin Flunkert <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Miyoung Choi <[email protected]>
Co-authored-by: Anna Luo <[email protected]>
Co-authored-by: Pratyush Bagaria <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Kanchan Waikar <[email protected]>
Co-authored-by: kwwaikar <[email protected]>
Co-authored-by: Felipe Antunes <[email protected]>
Co-authored-by: Felipe Antunes <[email protected]>
Co-authored-by: Nikhil Kulkarni <[email protected]>
Co-authored-by: Nikhil Kulkarni <[email protected]>
Co-authored-by: Akash Goel <[email protected]>
Co-authored-by: Akash Goel <[email protected]>
Co-authored-by: Somnath Sarkar <[email protected]>
Co-authored-by: gopalakp <[email protected]>
Co-authored-by: Gopalakrishna, Priyanka <[email protected]>
Co-authored-by: Laren-AWS <[email protected]>
Co-authored-by: Chuyang <[email protected]>
Co-authored-by: Hongshan Li <[email protected]>
Co-authored-by: moagaber <[email protected]>
Co-authored-by: Roald Bradley Severtson <[email protected]>
Co-authored-by: Paul B <[email protected]>
Co-authored-by: Eric Slesar <[email protected]>
Co-authored-by: PaulC-AWS <[email protected]>
Co-authored-by: Corvus LEE <[email protected]>
Co-authored-by: aserfass <[email protected]>
Co-authored-by: minlu1021 <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: hbono2019 <[email protected]>
Co-authored-by: H. Furkan Bozkurt <[email protected]>
Co-authored-by: Eitan Sela <[email protected]>
Co-authored-by: awsmrud <[email protected]>
Co-authored-by: Alex Ignatov <[email protected]>
Co-authored-by: Eric Johnson <[email protected]>
Co-authored-by: Yohei Nakayama <[email protected]>
Co-authored-by: ZoeMa <[email protected]>
Co-authored-by: ZoeMa <[email protected]>
Co-authored-by: Talia <[email protected]>
Co-authored-by: Sean Morgan <[email protected]>
Co-authored-by: Sean Morgan <[email protected]>
Co-authored-by: Michele Ricciardi <[email protected]>
Co-authored-by: Michele Ricciardi <[email protected]>
Co-authored-by: vivekmadan2 <[email protected]>
Co-authored-by: playphil <[email protected]>
Co-authored-by: Gili Nachum <[email protected]>
Co-authored-by: sdoyle <[email protected]>
Co-authored-by: fyang1234 <[email protected]>
Co-authored-by: annbech <[email protected]>
Co-authored-by: Xinyu <[email protected]>
Co-authored-by: Xinyu Liu <[email protected]>