Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] function that read and write to file #96

Merged
merged 7 commits into from
Aug 5, 2019

Conversation

djarecka
Copy link
Collaborator

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Summary

we removed all tests with function that operates on files

something is not working properly, can't provide relative paths

Checklist

  • All tests passing
  • I have added tests to cover my changes
  • I have updated documentation (if necessary)
  • My code follows the code style of this project
    (we are using black: you can pip install pre-commit,
    run pre-commit install in the pydra directory
    and black will be run automatically with each commit)

Acknowledgment

  • I acknowledge that this contribution will be available under the Apache 2 license.

@satra
Copy link
Contributor

satra commented Jul 23, 2019

a few notes:

  1. file isolation is still a yet to be implemented feature.
  2. remember that tasks can run both on the existing environment, but also in containers, which would mean files would need to be appropriately mapped between systems
  3. relative paths are not allowed
  4. when functions are annotated, we should use https://github.com/nipype/pydra/blob/master/pydra/engine/specs.py#L5 to ensure the hash function then does a content hash vs an str which would simply hash the string. and File should only be used to represent input files not output paths/filenames (those should be of type str)

@satra
Copy link
Contributor

satra commented Jul 23, 2019

i should clarify that relative paths are allowed within the context of the dir within which the code executes, but you cannot pop out of the directory and into another directory. this is mostly to allow for checksum based look ups in the future.

@djarecka
Copy link
Collaborator Author

but I thought that default should be that we run in one file system and we add the relative path to the output_dir.

How the function and test should look like?

@satra
Copy link
Contributor

satra commented Jul 23, 2019

think of it within a container context, we may mount an output directory into the container, and everything will be expected to reside within that output directory. the output directory may well be the checksum dir.

relative inside is ok, relative outside is not (so, not <output_dir>/../<path>)

@djarecka
Copy link
Collaborator Author

djarecka commented Jul 23, 2019

@satra - ok, but I think for now it doesn't know that if I provide <filepath> it should assume self.output_dir/<filepath>

@satra
Copy link
Contributor

satra commented Jul 23, 2019

the general assumption (like nipype) is that people would provide absolute paths. right now there is no type checking or path checking. for pydra we want to keep assumptions minimum so if you provide and if it is not absolute it should crash per normal python.

without type annotation there is no way to determine if the input is a file or a string, so we simply rely on user rather than internal mechanics.

similar for outputs, the function should return an absolute path, there is no way pydra can determine if it is returning a file or a string without annotation.

once file annotations are in place, we can augment some of these things (just like the hash).

@djarecka
Copy link
Collaborator Author

anyone has experience with pickling new types?

Trying to use specs.File, that uses typing.NewType, in annotations (in order to hash the file content), but have the pickling error. The easiest way to see:

import cloudpickle as cp
import typing as ty

# this works:
aa=ty.List
cp.dump(aa)

# this doesn't
bb=ty.NewType("my_list", ty.List)
cp.dump(bb)

@djarecka djarecka added the help wanted Extra attention is needed label Jul 26, 2019
@effigies
Copy link
Contributor

I might open an issue on cloudpickle.

@djarecka
Copy link
Collaborator Author

@effigies - will do, was just thinking that maybe I'm breaking some basic pickling assumption that I'm not aware of.

@djarecka
Copy link
Collaborator Author

opened the issue: cloudpipe/cloudpickle#301

@effigies
Copy link
Contributor

@djarecka It looks like the problem is that we're using pk.dumps, not cp.dumps.

pydra/pydra/engine/core.py

Lines 137 to 138 in f59083f

state["input_spec"] = pk.dumps(state["input_spec"])
state["output_spec"] = pk.dumps(state["output_spec"])

And these should be cp.loads:

pydra/pydra/engine/core.py

Lines 143 to 144 in f59083f

state["input_spec"] = pk.loads(state["input_spec"])
state["output_spec"] = pk.loads(state["output_spec"])

@satra
Copy link
Contributor

satra commented Jul 29, 2019

@effigies - that's because cloudpickle at that time did not support mappingproxy objects. it may be worthwhile checking if cloudpickle works for everything.

@effigies
Copy link
Contributor

@satra looks like your cloudpipe/cloudpickle#223 was closed by cloudpipe/cloudpickle#245.

I can submit a PR with this fix to make sure no old tests are failing.

@djarecka
Copy link
Collaborator Author

@effigies - sure, this will be helpful! I'm still not completely sure if I understand, since my small example (that used only cloudpickle) was really failing, but happy to see fix

@effigies
Copy link
Contributor

I haven't been able to reproduce a cloudpickle failure, either. What's your cloudpickle.__version__?

@djarecka
Copy link
Collaborator Author

@effigies - i also can't reproduce the error today... I'm not sure if I can explain how did it happen... I was sure that I was testing with these exact lines... But I also missed the fact that we are using two different pickling libraries, so it's possible that I messed up the imports...

Anyway, thank you for help!

@codecov
Copy link

codecov bot commented Jul 29, 2019

Codecov Report

Merging #96 into master will increase coverage by 0.05%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #96      +/-   ##
==========================================
+ Coverage    86.5%   86.55%   +0.05%     
==========================================
  Files          14       14              
  Lines        2178     2179       +1     
  Branches      545      545              
==========================================
+ Hits         1884     1886       +2     
  Misses        205      205              
+ Partials       89       88       -1
Flag Coverage Δ
#unittests 86.55% <ø> (+0.05%) ⬆️
Impacted Files Coverage Δ
pydra/engine/core.py 87.43% <0%> (+0.03%) ⬆️
pydra/engine/helpers.py 83.63% <0%> (+0.9%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69ab220...7d0053b. Read the comment docs.

@satra satra merged commit b4af7fd into nipype:master Aug 5, 2019
satra added a commit that referenced this pull request Aug 5, 2019
Enh/file - allow support for hashing Files - closes #96
@djarecka djarecka deleted the task_files branch December 30, 2022 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants