
Occasional randomized test failure in CI #305

Open
eb8680 opened this issue Oct 9, 2023 · 2 comments

Comments

eb8680 (Contributor) commented Oct 9, 2023

The indexing unit test test_scatter_gather_tensor fails occasionally in CI. It's not a big deal, since it usually passes after re-running the CI build, but we should investigate and make sure there's not an underlying bug in scatter and gather causing the failures.

Example of failing CI run with test and error message below: https://github.com/BasisResearch/chirho/actions/runs/6443034118/job/17494507123?pr=241

=================================== FAILURES ===================================
________ test_scatter_gather_tensor[True-(2,)-(2,)-(2, 2, 3)-(2, 3, 2)] ________

enum_shape = (2,), plate_shape = (2,), batch_shape = (2, 2, 3)
event_shape = (2, 3, 2), use_effect = True

    @pytest.mark.parametrize(
        "enum_shape,plate_shape,batch_shape,event_shape", SHAPE_CASES, ids=str
    )
    @pytest.mark.parametrize("use_effect", [True, False])
    def test_scatter_gather_tensor(
        enum_shape, plate_shape, batch_shape, event_shape, use_effect
    ):
        cf_dim = -1 - len(plate_shape)
        name_to_dim = {f"dim_{i}": cf_dim - i for i in range(len(batch_shape))}
    
        full_batch_shape = enum_shape + batch_shape + plate_shape
        value = torch.randn(full_batch_shape + event_shape)
    
        world = IndexSet(
            **{
                name: {full_batch_shape[dim] - 2}
                for name, dim in name_to_dim.items()
                if full_batch_shape[dim] > 1
            }
        )
    
        with contextlib.ExitStack() as stack:
            if use_effect:
                stack.enter_context(IndexPlatesMessenger(cf_dim))
                for name, dim in name_to_dim.items():
                    add_indices(
                        IndexSet(**{name: set(range(max(2, full_batch_shape[dim])))})
                    )
                _name_to_dim = None
            else:
                _name_to_dim = name_to_dim
    
            actual = gather(
                value, world, event_dim=len(event_shape), name_to_dim=_name_to_dim
            )
            actual = scatter(
                actual,
                world,
                result=value.new_zeros(full_batch_shape + event_shape),
                event_dim=len(event_shape),
                name_to_dim=_name_to_dim,
            )
    
        mask = indexset_as_mask(
            world,
            event_dim=len(event_shape),
            name_to_dim_size={
                name: (dim, full_batch_shape[dim]) for name, dim in name_to_dim.items()
            },
        )
        _, mask = torch.broadcast_tensors(value, mask)
    
        expected = value
        assert (actual == expected)[mask].all()
>       assert not (actual == expected)[~mask].any()
E       assert not tensor(True)
E        +  where tensor(True) = <built-in method any of Tensor object at 0x7f4afa5f6f40>()
E        +    where <built-in method any of Tensor object at 0x7f4afa5f6f40> = tensor([False, False, False, False, False, False, False, False, False, False,\n        False, False, False, False, Fals...alse, False, False, False, False, False, False, False,\n        False, False, False, False, False, False, False, False]).any

tests/indexed/test_internals.py:285: AssertionError
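One way to hunt for a local reproduction is to rerun the failing check under a sweep of fixed seeds and record the first seed that triggers the failure. This is a generic sketch, not part of chirho's test suite; `find_failing_seed` and `flaky_check` are hypothetical names for illustration:

```python
import torch


def find_failing_seed(check, n_trials=200):
    """Rerun a randomized check under many fixed torch seeds.

    Returns the first seed for which the check raises AssertionError,
    or None if all trials pass. The returned seed can then be used
    with torch.manual_seed(...) to reproduce the failure on demand.
    """
    for seed in range(n_trials):
        torch.manual_seed(seed)
        try:
            check()
        except AssertionError:
            return seed
    return None


def flaky_check():
    # Stand-in for the body of test_scatter_gather_tensor: any check
    # whose outcome depends on a fresh torch.randn(...) draw.
    x = torch.randn(16)
    assert x.abs().max() < 3.0  # occasionally fails for extreme draws
```

With a failing seed in hand, the test can be rerun deterministically (`torch.manual_seed(seed)` before the test body) while debugging scatter/gather.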
@eb8680 eb8680 added the testing label Oct 9, 2023
SamWitty (Collaborator) commented

I'm having a hard time reproducing this error locally. It's especially perplexing because I don't see any randomness in the test.

eb8680 (Contributor, Author) commented Oct 16, 2023

value is random:

value = torch.randn(full_batch_shape + event_shape)
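Since the `torch.randn` draw is the only apparent source of randomness, a low-risk way to make CI runs reproducible is to pin the global torch seed before each test. A minimal sketch using an autouse pytest fixture; this fixture is an assumption for illustration, not something already in the repository:

```python
import pytest
import torch


@pytest.fixture(autouse=True)
def fixed_torch_seed():
    # Fix the global torch RNG state before each test so that
    # value = torch.randn(...) produces identical draws on every run.
    # Note: this makes failures reproducible but can also hide
    # draws that would have exposed the underlying scatter/gather bug.
    torch.manual_seed(0)
    yield
```

Placing this in the test module (or conftest.py) would make the failing parametrization either always pass or always fail, which is what we want while investigating.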
