consider a better default Request content type #14

Open
ajaykarpur opened this issue Sep 24, 2019 · 0 comments

@ajaykarpur ajaykarpur transferred this issue from aws/sagemaker-containers Jan 6, 2020
satishpasumarthi added a commit to satishpasumarthi/sagemaker-training-toolkit that referenced this issue Oct 13, 2022
* fix: propagate log level to aws services (aws#79)

* fix: propagate log level to aws services

* drop py27 and add py38 support

* update unit test

* recover buildspec

* remove py38 build

* install latest sagemaker 1.x version

* fix: removing py27/py38

* fix arg name

Co-authored-by: Chuyang Deng <[email protected]>

* prepare release v3.6.3

* update development version to v3.6.4.dev0

* doc: fix typo in ENVIRONMENT_VARIABLES.md (aws#81)

Removed typo ')'.

Co-authored-by: Ajay Karpur <[email protected]>

* prepare release v3.6.3.post0

* update development version to v3.6.4.dev0

* infra: use ECR-hosted image for ubuntu:16.04 (aws#87)

* infra: use ECR-hosted image for ubuntu:16.04

* use public ECR repo

* disable prompts in Docker build

* fix: workaround to print stderr when capturing (aws#86)

Co-authored-by: Ajay Karpur <[email protected]>

* prepare release v3.6.4

* update development version to v3.6.5.dev0

* feature: add data parallelism support (aws#3) (aws#8)

* change: use format in place of f-strings and use comment style type annotations (aws#10)

* change: update tox to use sagemaker 2.18.0 for tests

* prepare release v3.7.0

* update development version to v3.7.1.dev0

* fix:decode binary stderr string before dumping it out (aws#89)

* fix:decode binary stderr string before dumping it out

* fix failing test

Co-authored-by: Rui Wang Napieralski <[email protected]>

* prepare release v3.7.1

* update development version to v3.7.2.dev0

* change: set btl_vader_single_copy_mechanism to none (aws#90)

* prepare release v3.7.2

* update development version to v3.7.3.dev0

* change: set btl_vader_single_copy_mechanism to none to avoid Read -1 Warning messages (aws#95)

* prepare release v3.7.3

* update development version to v3.7.4.dev0

* Update Dockerfile to accommodate Rust dependency. (aws#98)

* Update Dockerfile to accommodate Rust dependency.

The cryptography module has added Rust as a dependency. Upgrading pip resolves this dependency.

* pinning to particular version of pip

Pinned to pip version 21.0.1, which resolves the Rust dependency.

* prepare release v3.7.4

* update development version to v3.7.5.dev0

* Change: smdataparallel change FI_PROVIDER to efa from sockets (aws#96)

* prepare release v3.7.5

* update development version to v3.7.6.dev0

* feature: smdataparallel custom mpi options support (aws#99)

* feature: smdataparallel custom mpi options support

* Fixed pylint

* Fixed black-check

* Fixed unit test

* prepare release v3.8.0

* update development version to v3.8.1.dev0

* feature: smdataparallel enable EFA RDMA flag (aws#101)

* feature: smdataparallel enable EFA RDMA flag

* added changes to unit test

* updated the flag to use only for ml.p4d.24xlarge instance

* prepare release v3.9.0

* update development version to v3.9.1.dev0

* change: [smdataparallel] better messages to establish the SSH connection between workers (aws#103)

* change: [smdataparallel] better messages to establish the SSH connection between workers

* python timeout.timeout raises TimeoutError

* Added detailed error message

* prepare release v3.9.1

* update development version to v3.9.2.dev0

* Reverted -x FI_EFA_USE_DEVICE_RDMA=1 to fix a crash on PyTorch Dataloaders for Distributed training (aws#106)

* prepare release v3.9.2

* update development version to v3.9.3.dev0

* Fix logging issues (aws#108)

* Fix logging issues 

Use asyncio to read stdout and stderr streams in realtime
Report Exit code on failures
Convey user informative message if process gets OOM Killed
Filter out stderr to look for error messages and report
Prepend tags to the log files to enable easy filtering in CloudWatch
Update Amazon Licensing
Update SM doc urls
Support - Added Py38, Removed py36 and py27
Added unittests for asyncio APIs
Install libssl1.1 and openssl packages
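For illustration only, the first two items above (reading stdout and stderr in real time with asyncio, and reporting the exit code) can be sketched roughly as follows. This is a minimal sketch, not the toolkit's actual code: the names `_pump` and `run_and_stream` and the `[stdout]`/`[stderr]` tag format are hypothetical.

```python
import asyncio
import sys


async def _pump(stream, tag, sink):
    """Forward each line from `stream` to `sink` as soon as it arrives.

    `tag` is a hypothetical prefix that makes the lines easy to filter later.
    """
    lines = []
    while True:
        raw = await stream.readline()
        if not raw:  # an empty read means the child closed the pipe
            break
        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        lines.append(line)
        print("[{}] {}".format(tag, line), file=sink, flush=True)
    return lines


async def run_and_stream(cmd):
    """Run `cmd`, stream both pipes concurrently, and return the exit code."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Drain both pipes at the same time so neither can fill up and block the child.
    out_lines, err_lines = await asyncio.gather(
        _pump(proc.stdout, "stdout", sys.stdout),
        _pump(proc.stderr, "stderr", sys.stderr),
    )
    return_code = await proc.wait()
    return return_code, out_lines, err_lines
```

A caller could run `asyncio.run(run_and_stream([sys.executable, "train.py"]))` (where `train.py` is a placeholder script name) and inspect the returned code and captured stderr lines to build an error report.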

* prepare release v3.9.3

* update development version to v3.9.4.dev0

* breaking: Add py38, dropped py36 and py2 support. Bump pypi to 4.0.0 (changes from PR aws#108) (aws#109)

* prepare release v4.0.0

* update development version to v4.0.1.dev0

* Fix: Enable custom failure logging (aws#118)

* prepare release v4.0.1

* update development version to v4.0.2.dev0

* feature: add back FI_EFA_USE_DEVICE_RDMA=1 flag, revert 2936f22 (aws#121)

fix: fixed the black lint, upgraded black to version 21.3.0
fix: remove u prefix of strings, as python3 defaults to unicode strings

note: EFA is only available on p3dn or p4dn instances
note: EFA version 1.15.1 and OFI 1.1.5-aws have the issue fixed
note: black format reference on remove u prefix
https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#strings

* prepare release v4.1.0

* update development version to v4.1.1.dev0

* fix: missing args when shell script is used (aws#122)

* prepare release v4.1.1

* update development version to v4.1.2.dev0

* fix: fix flaky issue with incorrect rc being given (aws#124)

* fix: fix flaky issue with incorrect rc being given

* Add logging around proc.wait.

* prepare release v4.1.2

* update development version to v4.1.3.dev0

* Feature: Adding new parameter for TF Multi Worker Mirrored Strategy (aws#130)

* feature: Adding new parameter for TF Multi Worker Mirrored Strategy

* fix: changing variable name for MWMS

* fix: freezing protobuf version and renaming variable for MWMS

* fix: linting

* prepare release v4.1.3

* update development version to v4.1.4.dev0

Co-authored-by: Chuyang <[email protected]>
Co-authored-by: Chuyang Deng <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Pedro Martins <[email protected]>
Co-authored-by: Ajay Karpur <[email protected]>
Co-authored-by: sboshin <[email protected]>
Co-authored-by: ChaiBapchya <[email protected]>
Co-authored-by: Dan <[email protected]>
Co-authored-by: icywang86rui <[email protected]>
Co-authored-by: Rui Wang Napieralski <[email protected]>
Co-authored-by: Eric Johnson <[email protected]>
Co-authored-by: Karan Jariwala <[email protected]>
Co-authored-by: Rajan Singh <[email protected]>
Co-authored-by: Piyush Ghai <[email protected]>
Co-authored-by: Daiming Yang <[email protected]>
Co-authored-by: matherit <[email protected]>
Co-authored-by: Loki <[email protected]>