Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core Feature] Add .flyteignore for excluding the generated files from fast register package #1437

Closed
yindia opened this issue Sep 6, 2021 · 4 comments
Assignees
Labels
enhancement New feature or request flytekit FlyteKit Python related issue

Comments

@yindia
Copy link
Contributor

yindia commented Sep 6, 2021

Motivation: Why do you think this is important?
Currently, if we have extra files like env, generated tar, and Large CSV for testing then pyflyte includes those files in the package for fast register. In this case size of the source increases and now if you execute the task then you will see an error

[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[wpf0byvhic-n0-0] terminated with exit code (1). Reason [OOMKilled]. Message: 
ta_proxy.py", line 123, in get_data
    proxy.download(remote_path, local_path)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/interfaces/data/s3/s3proxy.py", line 159, in download
    return _update_cmd_config_and_execute(cmd)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/interfaces/data/s3/s3proxy.py", line 44, in _update_cmd_config_and_execute
    return _subprocess.check_call(cmd, env=env)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/tools/subprocess.py", line 25, in check_call
    raise Exception(
Exception: Called process exited with error code: -9.  Stderr dump:

b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 490, in fast_execute_task_cmd
    _download_distribution(additional_distribution, dest_dir)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/tools/fast_registration.py", line 100, in download_distribution
    _data_proxy.Data.get_data(additional_distribution, destination)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/interfaces/data/data_proxy.py", line 125, in get_data
    raise _user_exception.FlyteAssertion(
flytekit.common.exceptions.user.FlyteAssertion: Failed to get data from s3://test/v16-fasta07781c2ea2446c22115febe25d3c704.tar.gz to /root (recursive=False).

Original exception: Called process exited with error code: -9.  Stderr dump:

b''
.

Here the issue is the size of the source, If you delete all those extra files then your size will reduce and it will work, But pyflyte currently do not have any mechanism to identify those files

Goal: What should the final outcome look like, ideally?
user will have a file in his project dir i.e. .flyteignore. It's similar to .gitignore, .dockerignore. Pyflyte will use this file to exclude all extra files that are mention in .flyteignore

Describe alternatives you've considered

  • NA

[Optional] Propose: Link/Inline OR Additional context
If you have ideas about the implementation please propose the change. If inline keep it short, if larger then you link to an external document.

@yindia yindia added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Sep 6, 2021
@wild-endeavor
Copy link
Contributor

Should we just start out by respecting .gitignore and .dockerignore?

@wild-endeavor
Copy link
Contributor

cc @erowan

@wild-endeavor wild-endeavor added flytekit FlyteKit Python related issue and removed untriaged This issues has not yet been looked at by the Maintainers labels Feb 24, 2022
@wild-endeavor wild-endeavor added this to the 1.0.0 - Phoenix! milestone Feb 24, 2022
@EngHabu EngHabu removed this from the 1.0.0 - Phoenix! milestone Mar 2, 2022
@stephen37
Copy link
Contributor

stephen37 commented Mar 21, 2022

Should we just start out by respecting .gitignore and .dockerignore?

I think that makes sense, at least on our side we would see some great benefits from that. Would it be fine to integrate a way to parse the .gitignore and .dockerignore to the function defined here? https://github.com/flyteorg/flytekit/blob/master/flytekit/tools/fast_registration.py#L33-L45
It's the one that is called here https://github.com/flyteorg/flytekit/blob/master/flytekit/clis/sdk_in_container/package.py#L129

cc @bimtauer

@bimtauer
Copy link
Contributor

I'm happy to implement this, in fact already started. @evalsocket please let me know if thats ok for you or if you have own work in the pipeline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request flytekit FlyteKit Python related issue
Projects
None yet
Development

No branches or pull requests

6 participants