Process the shards using multiple processes in prepare_train_data #813
Thanks, nice change! Only a few comments.
Could you also bump the minor version (
Thanks for iterating! How about making the max_processes flag in data_io.py a proper int with default 1 instead of None?
sockeye/data_io.py
        shard_sources: List[str], shard_target: str,
        shard_stats: 'DataStatistics', output_prefix: str, keep_tmp_shard_files: bool):
    """
    Load a shard source/target data files into an NDArrays and then save it to desk.
Suggested change:
-    Load a shard source/target data files into an NDArrays and then save it to desk.
+    Load shard source and target data files into NDArrays and save to disk.
5356b15 to c422dd8
Process the shards using multiple processes in prepare_train_data.
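For readers unfamiliar with the pattern, here is a minimal sketch of processing independent shards with a worker pool, written with max_processes as a plain int defaulting to 1 as suggested in the review. It is not the code from this pull request: process_shard and process_shards are hypothetical names, and the per-shard work is a placeholder for loading, converting, and saving a shard.

```python
import multiprocessing
from typing import List, Tuple


def process_shard(shard_idx: int, shard_path: str) -> Tuple[int, str]:
    """Hypothetical stand-in for the per-shard work (load, convert, save)."""
    return shard_idx, shard_path + ".processed"


def process_shards(shard_paths: List[str], max_processes: int = 1) -> List[Tuple[int, str]]:
    """Process all shards, using up to max_processes worker processes."""
    if max_processes < 1:
        raise ValueError("max_processes must be >= 1")
    args = list(enumerate(shard_paths))
    if max_processes == 1:
        # Serial path: no pool is created, which keeps debugging simple.
        return [process_shard(idx, path) for idx, path in args]
    # Parallel path: shards are independent, so starmap can hand them out
    # to the workers and still return results in input order.
    with multiprocessing.Pool(processes=max_processes) as pool:
        return pool.starmap(process_shard, args)
```

With an int default of 1, callers never need a None check, and the serial path stays the default behaviour. When calling with max_processes greater than 1 from a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.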
I tested the change by running the following commands and checking the outputs
Pull Request Checklist
until you can check this box.
pytest
pytest test/system
./style-check.sh
sockeye/__init__.py. Major version bump if this is a backwards incompatible change.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.