
Occasional randomized test failure in CI #305

Open
eb8680 opened this issue Oct 9, 2023 · 2 comments

Comments

eb8680 (Contributor) commented Oct 9, 2023

The indexing unit test test_scatter_gather_tensor fails occasionally in CI. It's not a big deal, since it usually passes after re-running the CI build, but we should investigate and make sure there's not an underlying bug in scatter and gather causing the failures.

Example of failing CI run with test and error message below: https://github.com/BasisResearch/chirho/actions/runs/6443034118/job/17494507123?pr=241

=================================== FAILURES ===================================
________ test_scatter_gather_tensor[True-(2,)-(2,)-(2, 2, 3)-(2, 3, 2)] ________

enum_shape = (2,), plate_shape = (2,), batch_shape = (2, 2, 3)
event_shape = (2, 3, 2), use_effect = True

    @pytest.mark.parametrize(
        "enum_shape,plate_shape,batch_shape,event_shape", SHAPE_CASES, ids=str
    )
    @pytest.mark.parametrize("use_effect", [True, False])
    def test_scatter_gather_tensor(
        enum_shape, plate_shape, batch_shape, event_shape, use_effect
    ):
        cf_dim = -1 - len(plate_shape)
        name_to_dim = {f"dim_{i}": cf_dim - i for i in range(len(batch_shape))}
    
        full_batch_shape = enum_shape + batch_shape + plate_shape
        value = torch.randn(full_batch_shape + event_shape)
    
        world = IndexSet(
            **{
                name: {full_batch_shape[dim] - 2}
                for name, dim in name_to_dim.items()
                if full_batch_shape[dim] > 1
            }
        )
    
        with contextlib.ExitStack() as stack:
            if use_effect:
                stack.enter_context(IndexPlatesMessenger(cf_dim))
                for name, dim in name_to_dim.items():
                    add_indices(
                        IndexSet(**{name: set(range(max(2, full_batch_shape[dim])))})
                    )
                _name_to_dim = None
            else:
                _name_to_dim = name_to_dim
    
            actual = gather(
                value, world, event_dim=len(event_shape), name_to_dim=_name_to_dim
            )
            actual = scatter(
                actual,
                world,
                result=value.new_zeros(full_batch_shape + event_shape),
                event_dim=len(event_shape),
                name_to_dim=_name_to_dim,
            )
    
        mask = indexset_as_mask(
            world,
            event_dim=len(event_shape),
            name_to_dim_size={
                name: (dim, full_batch_shape[dim]) for name, dim in name_to_dim.items()
            },
        )
        _, mask = torch.broadcast_tensors(value, mask)
    
        expected = value
        assert (actual == expected)[mask].all()
>       assert not (actual == expected)[~mask].any()
E       assert not tensor(True)
E        +  where tensor(True) = <built-in method any of Tensor object at 0x7f4afa5f6f40>()
E        +    where <built-in method any of Tensor object at 0x7f4afa5f6f40> = tensor([False, False, False, False, False, False, False, False, False, False,\n        False, False, False, False, Fals...alse, False, False, False, False, False, False, False,\n        False, False, False, False, False, False, False, False]).any

tests/indexed/test_internals.py:285: AssertionError
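One way to hunt for a local reproduction is to rerun the failing check under a sweep of fixed seeds and record the first seed that triggers the failure. This is a generic sketch, not part of chirho's test suite; `find_failing_seed` and `flaky_check` are hypothetical names for illustration:

```python
import torch


def find_failing_seed(check, n_trials=200):
    """Rerun a randomized check under many fixed torch seeds.

    Returns the first seed for which the check raises AssertionError,
    or None if all trials pass. The returned seed can then be used
    with torch.manual_seed(...) to reproduce the failure on demand.
    """
    for seed in range(n_trials):
        torch.manual_seed(seed)
        try:
            check()
        except AssertionError:
            return seed
    return None


def flaky_check():
    # Stand-in for the body of test_scatter_gather_tensor: any check
    # whose outcome depends on a fresh torch.randn(...) draw.
    x = torch.randn(16)
    assert x.abs().max() < 3.0  # occasionally fails for extreme draws
```

With a failing seed in hand, the test can be rerun deterministically (`torch.manual_seed(seed)` before the test body) while debugging scatter/gather.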
@eb8680 eb8680 added the testing label Oct 9, 2023
SamWitty (Collaborator) commented

I'm having a hard time reproducing this error locally. It's especially perplexing because I don't see any randomness in the test.

eb8680 (Contributor, Author) commented Oct 16, 2023

value is random:

value = torch.randn(full_batch_shape + event_shape)
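Since the `torch.randn` draw is the only apparent source of randomness, a low-risk way to make CI runs reproducible is to pin the global torch seed before each test. A minimal sketch using an autouse pytest fixture; this fixture is an assumption for illustration, not something already in the repository:

```python
import pytest
import torch


@pytest.fixture(autouse=True)
def fixed_torch_seed():
    # Fix the global torch RNG state before each test so that
    # value = torch.randn(...) produces identical draws on every run.
    # Note: this makes failures reproducible but can also hide
    # draws that would have exposed the underlying scatter/gather bug.
    torch.manual_seed(0)
    yield
```

Placing this in the test module (or conftest.py) would make the failing parametrization either always pass or always fail, which is what we want while investigating.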
