Replace NeighborSampler with NeighborLoader in mag240m #382
ogb/lsc/mag240m.py
Outdated
path = osp.join(self.dir, 'processed', 'paper', 'node_label.npy')
data["paper"].y = torch.from_numpy(np.load(path))
path = osp.join(self.dir, 'processed', 'paper', 'node_year.npy')
data["paper"].year = torch.from_numpy(np.load(path, mmap_mode='r'))
We would need to add data['author'].num_nodes = ... and data['institution'].num_nodes = ... to register them as node types.
data['author'].num_nodes = self.__meta__['author']
data['institution'].num_nodes = self.__meta__['institution']
I added these two lines to register author and institution as node types, but the RuntimeError is still there.
def to_pyg_hetero_data(self):
    data = HeteroData()
    path = osp.join(self.dir, 'processed', 'paper', 'node_feat.npy')
    # Current is not in-memory
Do you mean:
```suggestion
# Currently in-memory only
```
data["paper"].x = torch.from_numpy(np.load(path, mmap_mode='r')) comes from @property def paper_label(self)..., which is used when self.in_memory is False. I left the comment there as a reminder to add the in_memory part.
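To make the difference between the two loading modes concrete, here is a small sketch using a temporary stand-in file. The file name mirrors node_feat.npy, but the array is tiny and invented; mapping the two calls onto in_memory=False and in_memory=True is my reading of this thread, not something checked against the dataset class.

```python
import os
import tempfile

import numpy as np

# Write a small stand-in for node_feat.npy; the real file is far larger.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'node_feat.npy')
np.save(path, np.arange(12, dtype=np.float32).reshape(4, 3))

# Memory-mapped load (the in_memory=False case as described above):
# rows are read from disk only when they are indexed.
lazy = np.load(path, mmap_mode='r')

# Plain load (what an in_memory=True branch would presumably do):
# the whole array is read into RAM up front.
eager = np.load(path)

print(type(lazy).__name__, type(eager).__name__)
print(lazy[2])
```

Note that the memory-mapped array is read-only, so wrapping it with torch.from_numpy is fine for reading but will not allow in-place writes.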
ogb/lsc/mag240m.py
Outdated
name = f'{src}___{rel}___{dst}'
path = osp.join(self.dir, 'processed', name, 'edge_index.npy')
return np.load(path)
# def edge_index(self, id1: str, id2: str,
Uncomment back in?
The edge_index function is no longer needed. The edge_index info can be found in data[('author', 'writes', 'paper')].edge_index, data[('author', 'affiliated_with', 'institution')].edge_index and data[('paper', 'cites', 'paper')].edge_index, right?
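As a sketch of how the three edge types line up with the on-disk layout used by the snippet under review: the helper name edge_index_path and the 'dataset_root' directory are hypothetical, while the `___` naming scheme and the edge-type triples are the ones quoted in this thread.

```python
import os.path as osp
from typing import Tuple

EdgeType = Tuple[str, str, str]

# Edge types named in this thread.
EDGE_TYPES = [
    ('author', 'writes', 'paper'),
    ('author', 'affiliated_with', 'institution'),
    ('paper', 'cites', 'paper'),
]


def edge_index_path(root: str, edge_type: EdgeType) -> str:
    """Return where the edge_index array for `edge_type` would live on disk."""
    src, rel, dst = edge_type
    name = f'{src}___{rel}___{dst}'
    return osp.join(root, 'processed', name, 'edge_index.npy')


# 'dataset_root' is a placeholder directory, not a real dataset location.
paths = {et: edge_index_path('dataset_root', et) for et in EDGE_TYPES}
for et, p in paths.items():
    print(et, '->', p)
```

With a mapping like this, each file can be loaded once into data[edge_type].edge_index, which is why a separate accessor may become redundant.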
@@ -163,7 +183,8 @@ def save_test_submission(self, input_dict: Dict, dir_path: str, mode: str):


if __name__ == '__main__':
    dataset = MAG240MDataset()
    dataset = MAG240MDataset('/home/user/yanbing/pyg/ogb/ogb/lsc/dataset')
    data = dataset.to_pyg_hetero_data()
Let's test this separately?
/home/user/yanbing/pyg/ogb/ogb/lsc/dataset is my dev root; I will remove it.
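One possible way to keep a machine-specific root out of the committed file is to take it from the command line. This is only a sketch: the --root flag, the MAG240M_ROOT environment variable and the 'dataset' fallback are hypothetical names chosen for illustration, not part of ogb.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the dataset root from the CLI, falling back to an env variable."""
    parser = argparse.ArgumentParser()
    # Both the flag name and the environment variable are assumptions.
    parser.add_argument(
        '--root',
        default=os.environ.get('MAG240M_ROOT', 'dataset'),
        help='directory that holds the dataset files')
    return parser.parse_args(argv)


# Passing argv explicitly keeps the sketch runnable outside a real CLI call.
args = parse_args(['--root', '/tmp/mag240m'])
print(args.root)
```

The parsed value could then be passed as MAG240MDataset(args.root) in a separate test script, in line with the suggestion to test this separately.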
examples/lsc/mag240m/gnn.py
Outdated
adjs_t=[adj_t.to(*args, **kwargs) for adj_t in self.adjs_t],
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class MAG240M(LightningDataModule):
We could try to make use of torch_geometric.data.LightningNodeDataset for this. This would simplify the construction of neighbor loaders.
Sorry. There is no LightningNodeDataset in pyg.
You mean LightningNodeData? Will try this.
I have updated the code to use LightningNodeData, but it still raises the RuntimeError: Node conv1__paper1 target conv1.author__writes__paper references nonexistent attribute author__writes__paper of conv1.
5e5d0d7 to d6d0fd0
@yanbing-j if not opposed, I can take this over when I find time in the next few weeks and finish this PR, as it is needed for my work.
@puririshi98 Sure. Please go ahead.
Currently, this PR is a draft that still contains many print statements for logging.