
Task outputting Torch.Tensor now errors (reports having no out attribute) #767

Open
wilke0818 opened this issue Jan 31, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@wilke0818

What version of Pydra are you using?
0.25.0
Of importance (we believe): this issue comes from upgrading torch to 2.6.0

What were you trying to do?
Use Pydra with torch.Tensors in tasks (splitting list of tensors to functions that input and output individual tensors).

What did you expect will happen?
Test to pass.

What actually happened?
Test no longer passes and errors with:
AttributeError: Task 'test_task_task' has no output attribute 'out', available: '_is_param', 'all_'

Example code:

"""Tests Pydra Helping functions."""

import pydra
import torch


@pydra.mark.task
def pydra_task(test_input: torch.Tensor) -> torch.Tensor:
    """Task function for Pydra workflow to run."""
    return test_input + 2


def test_pydra() -> None:
    """Test simple tensor pydra workflow."""
    wf = pydra.Workflow(name="wf_test", input_spec=["x"])
    wf.split("x", x=[torch.tensor([[3, 4], [5, 6]]), torch.tensor([[0, 1], [1, 2]])])

    wf.add(pydra_task(name="test_task_task", test_input=wf.lzin.x))
    wf.set_output([("wf_out", wf.test_task_task.lzout.out)])

    with pydra.Submitter(plugin="serial", n_procs=1) as sub:
        sub(wf)

    results = wf.result()

    assert results[0].output.wf_out.equal(torch.tensor([[5, 6], [7, 8]]))
    assert results[1].output.wf_out.equal(torch.tensor([[2, 3], [3, 4]]))

Expected: the test passes.
Actual: the AttributeError shown above.

Note: based on #761 this code might be needed for the test to pass:

import numpy as np
from typing import Iterator

from pydra.utils.hash import Cache, bytes_repr_sequence_contents, register_serializer


@register_serializer(torch.Tensor)
def bytes_repr_arraylike(obj: torch.Tensor, cache: Cache) -> Iterator[bytes]:
    """Register a serializer for torch tensors so that Pydra can hash them."""
    yield f"{obj.__class__.__module__}{obj.__class__.__name__}:".encode()
    array = np.asanyarray(obj)
    yield f"{array.size}:".encode()
    if array.dtype == "object":
        yield from bytes_repr_sequence_contents(iter(array.ravel()), cache)
    else:
        yield array.tobytes(order="C")

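The serializer above follows Pydra's byte-stream hashing pattern: yield a type tag, then the element count, then the raw buffer bytes. For readers without torch installed, here is a torch-free sketch of the same idea using Python's stdlib `array` type in place of a tensor (the function names here are illustrative, not Pydra API):

```python
import hashlib
from array import array
from typing import Iterator


def bytes_repr_arraylike(obj: array) -> Iterator[bytes]:
    """Sketch of the hashing pattern above: type tag, element count, raw bytes."""
    yield f"{obj.__class__.__module__}{obj.__class__.__name__}:".encode()
    yield f"{len(obj)}:".encode()
    yield obj.tobytes()


def stable_hash(obj: array) -> str:
    """Fold the byte chunks into a fixed-size digest, as a content hash."""
    h = hashlib.blake2b(digest_size=16)
    for chunk in bytes_repr_arraylike(obj):
        h.update(chunk)
    return h.hexdigest()


a = array("i", [3, 4, 5, 6])
b = array("i", [3, 4, 5, 6])
print(stable_hash(a) == stable_hash(b))  # equal contents hash identically
```

Because the hash is computed from the buffer contents rather than object identity, two tensors with equal values cache to the same result, which is what Pydra needs for its task result caching.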
@wilke0818 wilke0818 added the bug Something isn't working label Jan 31, 2025
@satra
Contributor

satra commented Feb 1, 2025

@wilke0818 - the following change seems sufficient for this test to pass (together with the register_serializer).

@pydra.mark.task
@pydra.mark.annotate({"return": {"out": torch.Tensor}})
def pydra_task(test_input: torch.Tensor) -> torch.Tensor:
    """Task function for Pydra workflow to run."""
    return test_input + 2

Some more info, which seems to indicate that using torch.Tensor interacts in a specific way with output_spec generation. Perhaps @tclose or @djarecka can spell out a quick fix.

without annotate:

In [41]: foo = pydra_task()

In [42]: foo.output_spec
Out[42]: SpecInfo(name='Tensor', fields=[('_is_param', <class 'bool'>)], bases=(<class 'pydra.engine.specs.BaseSpec'>,))

with annotate:

In [46]: foo = pydra_task()

In [47]: foo.output_spec
Out[47]: SpecInfo(name='Output', fields=[('out', <class 'torch.Tensor'>)], bases=(<class 'pydra.engine.specs.BaseSpec'>,))

In comparison, using np.array seems to be fine.

In [48]: import pydra
    ...: import numpy as np
    ...: 
    ...: 
    ...: @pydra.mark.task
    ...: def pydra_task(test_input: np.array) -> np.array:
    ...:     """Task function for Pydra workflow to run."""
    ...:     return test_input + 2
    ...: 

In [49]: foo = pydra_task()

In [50]: foo.output_spec
Out[50]: SpecInfo(name='Output', fields=[('out', <built-in function array>)], bases=(<class 'pydra.engine.specs.BaseSpec'>,))

@satra
Contributor

satra commented Feb 2, 2025

This code chunk is the reason why it behaves differently for numpy and torch:

The torch.Tensor object has class-level annotations ({'_is_param': bool}) while numpy.ndarray does not (@wilke0818 - this was probably the change between torch 2.5 and 2.6). Hence the code block gets executed and we don't get the out field.
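To make that failure mode concrete, here is a minimal, self-contained sketch (not Pydra's actual code) of a spec builder that inspects `__annotations__` on the declared return type. A class carrying class-level annotations, as torch.Tensor does in 2.6, shadows the default `out` field; the stand-in classes below are hypothetical:

```python
def build_output_fields(return_type):
    """Sketch of a naive spec builder: if the declared return type carries
    class-level annotations, treat them as named output fields; otherwise
    fall back to a single 'out' field of that type."""
    annotations = getattr(return_type, "__annotations__", {})
    if annotations:
        # torch.Tensor in torch>=2.6 exposes {'_is_param': bool}, so the
        # fields become [('_is_param', bool)] and 'out' is never created
        return list(annotations.items())
    return [("out", return_type)]


class PlainArray:
    """Stands in for numpy.ndarray: no class-level annotations."""


class AnnotatedTensor:
    """Stands in for torch.Tensor in torch>=2.6."""
    _is_param: bool


print(build_output_fields(PlainArray))
print(build_output_fields(AnnotatedTensor))
```

The second call loses the `out` field entirely, which matches the reported AttributeError: the only attributes left to look up are the ones torch happened to annotate.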

@tclose - why do we assume that any object with annotations provides a meaningful signature for outputs? was there something you came across that provided this?

@tclose
Contributor

tclose commented Feb 3, 2025

Hi @satra, I don't believe I have touched that code (unless git says otherwise). However, #766 completely rewrites/replaces it. It is pretty much ready to go; I'm just working through the unittests and updating them to the new syntax.

@satra
Contributor

satra commented Feb 3, 2025

@tclose - this was the original commit: 7265a37

If you don't remember why, I can try a hack before the new syntax is merged. One of our packages ran into this issue, which is why @wilke0818 posted it. There is a workaround for the moment, but we're looking forward to the new syntax; hopefully it can be merged soon.

@tclose
Contributor

tclose commented Feb 4, 2025

No, sorry, I can't remember my thinking behind that one. It looks like I was just re-enabling something I thought would work after the refactor to use FileFormats. I am working through the unittests for my PR now and hope to have a prototype ready to check out soon, so if you have a short-term fix, that sounds like a good idea.

3 participants