pre_transform does not work as expected #4663

Closed
oleg-kachan opened this issue May 17, 2022 · 5 comments · Fixed by #4669

oleg-kachan commented May 17, 2022

I want to run an expensive transform that modifies the data on vertices, so I think I should pass it via the pre_transform key: it should run the transform once on every graph in the dataset and cache the processed dataset.
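My understanding of that contract, as a minimal sketch (illustrative only, not the actual library code):

from torch_geometric.datasets import TUDataset
from torch_geometric.transforms import OneHotDegree

# transform=T:     T(data) is applied on the fly, on every indexing access.
# pre_transform=T: T(data) is applied once per graph in process() and the
#                  result is cached to disk under root/MUTAG/processed/.
# So, deleting the dataset folder between runs, I would expect both of
# these to yield the same dataset[0].x:
dataset_a = TUDataset(root="../data/", name="MUTAG", transform=OneHotDegree(5))
dataset_b = TUDataset(root="../data/", name="MUTAG", pre_transform=OneHotDegree(5))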

But it does not work as expected. For example, take the OneHotDegree transform. Note that I delete the dataset folder before every run of the following snippets:

No transform

dataset = TUDataset(root="../data/", name="MUTAG")
dataset[0].x

tensor([[1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0.]])

Transform

dataset = TUDataset(root="../data/", name="MUTAG", transform=OneHotDegree(5))
dataset[0].x

tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]])

Pre-transform

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=OneHotDegree(5))
dataset[0].x

tensor([[0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.]])

I expect the pre-transform (case 3) to produce the same result as the transform (case 2). Is this a bug or a feature?

How should I apply expensive transforms and get the data modified as in case 2?


oleg-kachan commented May 17, 2022

Constant as a pre_transform does not work at all; it even wipes x entirely:

Transform

dataset = TUDataset(root="../data/", name="MUTAG", transform=Constant(2.0))
dataset[0].x
tensor([[1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [1., 0., 0., 0., 0., 0., 0., 2.],
        [0., 1., 0., 0., 0., 0., 0., 2.],
        [0., 0., 1., 0., 0., 0., 0., 2.],
        [0., 0., 1., 0., 0., 0., 0., 2.]])

Pre-transform

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Constant(2.0))
dataset[0].x
tensor([], size=(17, 0))


oleg-kachan commented May 17, 2022

I have discovered some crazy behavior of Constant as a pre-transform.

If passed 0.0, it appends a column of all zeros to the old x:

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Constant(0.0))
dataset[0].x

tensor([[1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0.]])

If passed 1.0, it removes the old x, replacing it with a column of all ones:

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Constant(1.0))
dataset[0].x
tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])

Any other value deletes x completely:

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Constant(2.0))
dataset[0].x
tensor([], size=(17, 0))

oleg-kachan commented May 17, 2022

I have created a simple transform, Replace, which replaces data.x with a column vector of length num_nodes filled with a constant value:

import torch
from torch_geometric.transforms import BaseTransform


class Replace(BaseTransform):
    def __init__(self, value):
        self.value = value

    def __call__(self, data):
        # Replace data.x with a [num_nodes, 1] column filled with `value`.
        data.x = torch.tensor(self.value).repeat(data.num_nodes).reshape(-1, 1)
        return data

As a transform, it works as expected:

dataset = TUDataset(root="../data/", name="MUTAG", transform=Replace(0.0))
dataset[0].x

tensor([[0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.]])
dataset = TUDataset(root="../data/", name="MUTAG", transform=Replace(1.0))
dataset[0].x

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])
dataset = TUDataset(root="../data/", name="MUTAG", transform=Replace(2.0))
dataset[0].x

tensor([[2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.],
        [2.]])

As a pre-transform, it exhibits the same odd behavior; it seems some code that runs after the pre-transform does not accept the modified data unless it is a column of all ones:

dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Replace(0.0))
dataset[0].x

tensor([], size=(17, 0))
dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Replace(1.0))
dataset[0].x

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])
dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=Replace(2.0))
dataset[0].x

tensor([], size=(17, 0))


oleg-kachan commented May 17, 2022

Moreover, as a pre-transform it accepts only a column of all ones, truncating a matrix of all ones to a single column:

import torch
from torch_geometric.datasets import TUDataset
from torch_geometric.transforms import BaseTransform


class ReplaceBlock(BaseTransform):
    def __init__(self, value, width=2):
        self.value = value
        self.width = width

    def __call__(self, data):
        # Replace data.x with a [num_nodes, width] block filled with `value`.
        data.x = torch.tensor(self.value).reshape(-1, 1).repeat(data.num_nodes, self.width)
        return data

dataset = TUDataset(root="../data/", name="MUTAG", transform=ReplaceBlock(1.0, width=2))
dataset[0].x

tensor([[1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.]])
dataset = TUDataset(root="../data/", name="MUTAG", pre_transform=ReplaceBlock(1.0, width=2))
dataset[0].x

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])
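To check whether the pre-transform itself or some later loading step is at fault, one can inspect the cached file directly (a quick check, assuming the processed file holds the usual (data, slices) pair and the standard path for this root and name):

import torch

# Load the collated graphs exactly as TUDataset cached them after running
# the pre_transform; if x looks correct here, the truncation must happen
# later, when the dataset is loaded and x is re-sliced.
data, slices = torch.load("../data/MUTAG/processed/data.pt")
print(data.x.shape)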


rusty1s commented May 17, 2022

This will be fixed in #4669. It was caused by the weird interplay between TUDataset and the detection of "categorical" features induced by the use_node_attr argument.
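Roughly: with use_node_attr=False (the default), TUDataset slices data.x on load to keep only the trailing columns it detects as a one-hot "node label" block, and that detection runs on the already pre-transformed x. Until the fix is released, passing use_node_attr=True should work around it by skipping the slicing entirely (a sketch, not a guaranteed fix):

from torch_geometric.datasets import TUDataset
from torch_geometric.transforms import OneHotDegree

# Workaround sketch: use_node_attr=True disables the post-hoc slicing of x,
# so the cached pre-transformed features should survive intact.
dataset = TUDataset(root="../data/", name="MUTAG",
                    pre_transform=OneHotDegree(5),
                    use_node_attr=True)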
