pre_transform does not work as expected #4663

Closed
oleg-kachan opened this issue May 17, 2022 · 5 comments · Fixed by #4669

oleg-kachan commented May 17, 2022

I want to run an expensive transform that modifies the data on vertices, so I think I should pass it via the pre_transform key: it should run the transform once on every graph in the dataset and cache the processed dataset.
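My understanding of that contract, as a minimal sketch (illustrative only, not the actual library code):

from torch_geometric.datasets import TUDataset
from torch_geometric.transforms import OneHotDegree

# transform=T:     T(data) is applied on the fly, on every indexing access.
# pre_transform=T: T(data) is applied once per graph in process() and the
#                  result is cached to disk under root/MUTAG/processed/.
# So, deleting the dataset folder between runs, I would expect both of
# these to yield the same dataset[0].x:
dataset_a = TUDataset(root="../data/", name="MUTAG", transform=OneHotDegree(5))
dataset_b = TUDataset(root="../data/", name="MUTAG", pre_transform=OneHotDegree(5))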

But it does not work as expected. For example, take the OneHotDegree transform. Note that I delete the dataset folder before every run of the following snippets:

No transform

dataset = TUDataset(root="../data/", name="MUTAG")
dataset[0].x

tensor([[1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0.]])

Transform

dataset = TUDataset(root="../data/", name="MUTAG", transform=OneHotDegree(5))
dataset[0].x

tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]])

Pre-transform

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=OneHotDegree(5))
dataset[0].x

tensor([[0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.]])

I expect the pre-transform (case 3) to produce the same result as the transform (case 2). Is this a bug or a feature?

How should I apply expensive transforms and get the data modified as in case 2?


oleg-kachan commented May 17, 2022

Constant as a pre_transform does not work at all; it even wipes x entirely:

Transform

dataset = TUDataset(root="../data/", name="MUTAG", transform=Constant(2.0))
dataset[0].x
tensor([[1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [0., 1., 0., 0., 0., 0., 0., 2.],
        [0., 0., 1., 0., 0., 0., 0., 2.],
        [0., 0., 1., 0., 0., 0., 0., 2.]])

Pre-transform

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Constant(2.0))
dataset[0].x
tensor([], size=(17, 0))


oleg-kachan commented May 17, 2022

I have discovered some crazy behavior of Constant as a pre-transform.

If passed 0.0, it appends a column of all zeros to the old x:

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Constant(0.0))
dataset[0].x

tensor([[1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0.]])

If passed 1.0, it removes the old x, replacing it with a column of all ones:

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Constant(1.0))
dataset[0].x
tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])

Any other value deletes x completely:

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Constant(2.0))
dataset[0].x
tensor([], size=(17, 0))

oleg-kachan commented May 17, 2022

I have created a simple transform, Replace, which replaces data.x with a column vector of length num_nodes filled with a constant value:

import torch
from torch_geometric.transforms import BaseTransform


class Replace(BaseTransform):
    def __init__(self, value):
        self.value = value

    def __call__(self, data):
        # Replace data.x with a [num_nodes, 1] column filled with `value`.
        data.x = torch.tensor(self.value).repeat(data.num_nodes).reshape(-1, 1)
        return data

As a transform, it works as expected:

dataset = TUDataset(root="../data/", name="MUTAG", transform=Replace(0.0))
dataset[0].x

tensor([[0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.]])
dataset = TUDataset(root="../data/", name="MUTAG", transform=Replace(1.0))
dataset[0].x

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])
dataset = TUDataset(root="../data/", name="MUTAG", transform=Replace(2.0))
dataset[0].x

tensor([[2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.]])

As a pre-transform, it exhibits the same odd behavior; it seems some code that runs after the pre-transform does not accept the modified data unless it is a column of all ones:

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Replace(0.0))
dataset[0].x

tensor([], size=(17, 0))
dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Replace(1.0))
dataset[0].x

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])
dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Replace(2.0))
dataset[0].x

tensor([], size=(17, 0))


oleg-kachan commented May 17, 2022

Moreover, as a pre-transform it accepts only a column of all ones, truncating a matrix of all ones to a single column:

import torch
from torch_geometric.datasets import TUDataset
from torch_geometric.transforms import BaseTransform


class ReplaceBlock(BaseTransform):
    def __init__(self, value, width=2):
        self.value = value
        self.width = width

    def __call__(self, data):
        # Replace data.x with a [num_nodes, width] block filled with `value`.
        data.x = torch.tensor(self.value).reshape(-1, 1).repeat(data.num_nodes, self.width)
        return data

dataset = TUDataset(root="../data/", name="MUTAG", transform=ReplaceBlock(1.0, width=2))
dataset[0].x

tensor([[1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.]])
dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=ReplaceBlock(1.0, width=2))
dataset[0].x

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])
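To check whether the pre-transform itself or some later loading step is at fault, one can inspect the cached file directly (a quick check, assuming the processed file holds the usual (data, slices) pair and the standard path for this root and name):

import torch

# Load the collated graphs exactly as TUDataset cached them after running
# the pre_transform; if x looks correct here, the truncation must happen
# later, when the dataset is loaded and x is re-sliced.
data, slices = torch.load("../data/MUTAG/processed/data.pt")
print(data.x.shape)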


rusty1s commented May 17, 2022

This will be fixed in #4669. It was caused by the weird interplay between TUDataset and the detection of "categorical" features induced by the use_node_attr argument.
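Roughly: with use_node_attr=False (the default), TUDataset slices data.x on load to keep only the trailing columns it detects as a one-hot "node label" block, and that detection runs on the already pre-transformed x. Until the fix is released, passing use_node_attr=True should work around it by skipping the slicing entirely (a sketch, not a guaranteed fix):

from torch_geometric.datasets import TUDataset
from torch_geometric.transforms import OneHotDegree

# Workaround sketch: use_node_attr=True disables the post-hoc slicing of x,
# so the cached pre-transformed features should survive intact.
dataset = TUDataset(root="../data/", name="MUTAG",
                    pre_transform=OneHotDegree(5),
                    use_node_attr=True)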
