MAPLE dataset in graph format #2

Open
HoytWen opened this issue Mar 22, 2023 · 5 comments
Comments

@HoytWen

HoytWen commented Mar 22, 2023

Dear MAPLE authors,

Thanks for your amazing work. I think this dataset could be transformed into graph form, which would benefit research in the graph learning community.
I am eager to try this dataset for graph learning, and I have one question about it.

It seems each field can be regarded as a subgraph of the Microsoft Academic Graph (MAG). I tried to transform the papers in each field into a citation graph and found that many of their references cannot be mapped to papers within the same field. Does this mean the referenced papers in a field may come from other fields? If so, why are there some papers without any reference information?

I am really looking forward to your help to resolve my question.

Best,
Qianlong

@yuzhimanhua
Owner

Dear Qianlong,

Thank you very much for your interest in our work!

We agree with your comment that a graph format of MAPLE may increase its usability. We will try to work on that and release it in several weeks. Thanks for the suggestion!

Regarding your question about paper references, for each paper in MAPLE, we include all of its references (represented by IDs) in our dataset. A considerable proportion of these references may not appear as papers in MAPLE (e.g., they are not published in top journals / conferences); some others, as you said, may appear in MAPLE but in a different field. In our paper, the reference ID is used as an input feature to the paper classifier, so we no longer need to know other information about the reference (e.g., text and metadata). However, if you would like to construct a graph, you may need to remove those references not appearing in MAPLE.
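For readers following along, the filtering step mentioned above can be sketched as follows. This is only a minimal illustration, not the authors' code: it assumes the data has already been loaded into a plain dictionary mapping each paper ID to its list of reference IDs, and the name `build_citation_edges` and the toy IDs are hypothetical.

```python
def build_citation_edges(references):
    """Keep only citation edges whose target is also a paper in the collection.

    `references` is an assumed in-memory layout: paper ID -> list of reference IDs.
    It is not the MAPLE file format.
    """
    known = set(references)  # IDs of papers that exist in the collection
    edges = []
    for paper_id, ref_ids in references.items():
        for ref_id in ref_ids:
            # Drop references that point outside the collection, and self-loops.
            if ref_id in known and ref_id != paper_id:
                edges.append((paper_id, ref_id))
    return edges


# Toy example: "x9" is a reference that does not appear as a paper.
toy = {"p1": ["p2", "x9"], "p2": ["p3"], "p3": []}
edges = build_citation_edges(toy)
print(edges)
```

Papers whose references all fall outside the collection end up as isolated nodes, which is one way a filtered graph becomes sparse.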

Please let us know if you have further questions.

Best,
Yu

@HoytWen
Author

HoytWen commented Mar 22, 2023

Thanks for the further explanation; I really appreciate it.

Yes, removing the references not appearing in MAPLE is certainly an option, but the constructed graph could become overly sparse, since a large portion of the references would be removed (by my own statistics, some fields have 80% to 90% unmapped references). Since MAPLE is constructed from MAG, would it be possible to use the graph structure in MAG directly and split it into per-field subgraphs in the same way as MAPLE?
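A statistic of this kind can be computed along the following lines. This is a rough sketch under an assumed layout (a dictionary from paper ID to its list of reference IDs); `unmapped_fraction` is a hypothetical helper, and the toy numbers are not results from MAPLE.

```python
def unmapped_fraction(references):
    """Fraction of reference entries that point to papers outside the collection.

    `references` is an assumed layout: paper ID -> list of reference IDs.
    """
    known = set(references)
    total = sum(len(ref_ids) for ref_ids in references.values())
    if total == 0:
        return 0.0  # no references at all, so nothing is unmapped
    missing = sum(
        1
        for ref_ids in references.values()
        for ref_id in ref_ids
        if ref_id not in known
    )
    return missing / total


# Toy example: 3 of the 4 reference entries point outside the collection.
toy = {"p1": ["p2", "x1", "x2"], "p2": ["x3"], "p3": []}
ratio = unmapped_fraction(toy)
print(f"{ratio:.0%} of references are unmapped")
```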

Anyway, thanks again for your help, I look forward to you releasing the graph format of MAPLE!

@yuzhimanhua
Owner

Hi Qianlong,

We have created a graph format of MAPLE. The data is available at https://zenodo.org/record/7797563.
You can refer to https://github.com/yuzhimanhua/MAPLE/blob/master/README_Graph.md for more details.

We removed the references not appearing in MAPLE to construct the graph. As you mentioned, in some fields (e.g., Art, History), the graph was sparse. We also tried to add all those missing references to the graph (by retrieving their text and metadata from MAG). In this case, the graph certainly became larger, but it did not become denser because the newly added papers brought even more unmapped neighbors.

We agree that directly splitting MAG may solve the problem. Thank you for the suggestion! We will explore it later.

@HoytWen
Author

HoytWen commented Apr 4, 2023

Thanks for your work and contribution, I really appreciate it!

@HoytWen
Author

HoytWen commented Apr 26, 2023

Dear MAPLE authors,

I recently ran some preliminary experiments on some sub-fields (e.g., CSRankings and Art) of the MAPLE graph dataset and found an interesting phenomenon: MLPs easily outperformed GNNs with the same number of parameters, which was unexpected. Typically, removing the graph structure results in a 10-40% performance drop, but on this dataset the opposite was observed. This suggests that the graph structure used in this task may be detrimental to node classification performance.
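One way to probe this hypothesis is to measure how often connected papers share a label, since GNNs that rely on neighborhood aggregation tend to help less when linked nodes rarely agree. The sketch below is an assumption-laden illustration, not part of the experiments described here: it treats each paper as having a set of labels, counts an edge as agreeing when its endpoints share at least one label, and uses a hypothetical helper name and toy data.

```python
def edge_label_agreement(edges, labels):
    """Fraction of edges whose two endpoints share at least one label.

    `edges` is a list of (source, target) pairs; `labels` maps each node to a
    set of labels. Both layouts are assumptions made for illustration.
    """
    if not edges:
        return 0.0
    agreeing = sum(1 for u, v in edges if labels[u] & labels[v])
    return agreeing / len(edges)


# Toy graph: two of the four edges connect papers with a common label.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
toy_labels = {"a": {"ml"}, "b": {"ml", "nlp"}, "c": {"db"}, "d": {"db"}}
score = edge_label_agreement(toy_edges, toy_labels)
print(score)
```

A low value would be consistent with the graph structure adding noise rather than signal, although it would not establish the cause on its own.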

Could you please help me resolve my question?

Best,
Qianlong
