Parallel mapper reset() do not reset the source node #1443

Open
keunwoochoi opened this issue Feb 15, 2025 · 2 comments

@keunwoochoi

🐛 Describe the bug

Currently, in 0.10.1 (https://github.com/pytorch/data/blob/main/torchdata/nodes/map.py#L437), the ParallelMapper class does not appear to reset its source node in reset():


    def reset(self, initial_state: Optional[Dict[str, Any]] = None):
        super().reset(initial_state)
        if initial_state is not None:
            self._it.reset(initial_state[self.IT_STATE_KEY])
        else:
            self._it.reset()


Is this intended behavior?

My understanding is that any node with a source node is supposed to reset that source node, so that resetting any node recursively resets every node upstream of it, all the way back to the original source. Please let me know if I'm misunderstanding.
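
To make the expected contract concrete, here is a minimal sketch of recursive reset propagation. ToyNode, its reset_count field, and reset_counts() are hypothetical names made up for illustration; they are not torchdata classes, and the "source" state key is an assumption as well.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyNode:
    """Hypothetical stand-in for a pipeline node; not torchdata's BaseNode."""

    def __init__(self, name: str, source: Optional["ToyNode"] = None):
        self.name = name
        self.source = source
        self.reset_count = 0

    def reset(self, initial_state: Optional[Dict[str, Any]] = None) -> None:
        self.reset_count += 1
        # Propagate the reset upstream so the whole chain restarts together.
        if self.source is not None:
            upstream_state = None if initial_state is None else initial_state.get("source")
            self.source.reset(upstream_state)


def reset_counts(node: Optional[ToyNode]) -> List[Tuple[str, int]]:
    """Walk from the given node back to the original source, collecting counts."""
    counts = []
    while node is not None:
        counts.append((node.name, node.reset_count))
        node = node.source
    return counts


reader = ToyNode("reader")
mapper = ToyNode("mapper", source=reader)
batcher = ToyNode("batcher", source=mapper)

batcher.reset()  # resetting the last node should reach every upstream node
print(reset_counts(batcher))  # [('batcher', 1), ('mapper', 1), ('reader', 1)]
```

In this sketch a single reset() on the last node reaches every upstream node exactly once, which is the behavior being asked about.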

Versions

(nothing to do with any other packages.)

torchdata version: 0.10.1

@andrewkho
Contributor

You're right: when a node's .reset() is called, it should reset its own source node(s). I believe this is working correctly: ParallelMapper.reset() calls self._it.reset(), so it should reset the source node as well.
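
As a rough illustration of that delegation, the sketch below assumes self._it is an inner node that owns the source, as the reply implies. ToySource, ToyInner, and ToyOuterMapper are hypothetical classes written for this example; they are not torchdata's implementation, and only the shape of reset() mirrors the snippet quoted in the issue.

```python
from typing import Any, Callable, Dict, Iterable, Optional


class ToySource:
    """Hypothetical source node that replays a fixed sequence after each reset."""

    def __init__(self, data: Iterable[int]):
        self.data = list(data)
        self.reset_calls = 0
        self._iter = iter(())

    def reset(self, initial_state: Optional[Dict[str, Any]] = None) -> None:
        self.reset_calls += 1
        self._iter = iter(self.data)

    def __next__(self) -> int:
        return next(self._iter)


class ToyInner:
    """Hypothetical inner node: it holds the source and applies map_fn."""

    def __init__(self, source: ToySource, map_fn: Callable[[int], int]):
        self.source = source
        self.map_fn = map_fn

    def reset(self, initial_state: Optional[Dict[str, Any]] = None) -> None:
        # The source is reset here, not in the outer node.
        self.source.reset(initial_state)

    def __next__(self) -> int:
        return self.map_fn(next(self.source))


class ToyOuterMapper:
    """Hypothetical outer node whose reset() only touches self._it."""

    IT_STATE_KEY = "it"  # assumed key name, chosen for this sketch

    def __init__(self, source: ToySource, map_fn: Callable[[int], int]):
        self._it = ToyInner(source, map_fn)

    def reset(self, initial_state: Optional[Dict[str, Any]] = None) -> None:
        if initial_state is not None:
            self._it.reset(initial_state[self.IT_STATE_KEY])
        else:
            self._it.reset()

    def __iter__(self) -> "ToyOuterMapper":
        return self

    def __next__(self) -> int:
        return next(self._it)


source = ToySource([1, 2, 3])
mapper = ToyOuterMapper(source, lambda x: x * 10)

mapper.reset()
first_epoch = list(mapper)
mapper.reset()
second_epoch = list(mapper)
print(first_epoch, second_epoch, source.reset_calls)  # [10, 20, 30] [10, 20, 30] 2
```

Even though the outer reset() never mentions the source directly, each call reaches it through the inner node, so the second pass yields the full sequence again.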

@keunwoochoi
Author

Oh I see. I had a "patched" version that resets the source node, but I'll try without it again.
