
end-to-end issue #4

Open
Yide-Qiu opened this issue Oct 30, 2023 · 2 comments

Comments

@Yide-Qiu

Hello. In your paper you mentioned that this work "was unified into an end-to-end framework".
However, in your published code: 1) you directly use ogb-supplied features instead of text attributes; 2) your work includes an inevitable pre-training process.
Do you consider this work as "end-to-end" and why? Looking forward to your reply.

@yueliu1999
Owner

yueliu1999 commented Oct 30, 2023

Hi, thanks for your interest.

Most deep graph clustering methods, and even most graph representation learning methods, directly use the OGB-supplied features instead of the raw text, so we consider it the default setting. Recently, benefiting from the strong general-knowledge understanding capability of LLMs, a few methods [1, 2] process the raw text with LLMs. It is a promising direction.
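
For concreteness, loading the OGB-supplied features looks like this (a minimal sketch with the `ogb` package, using ogbn-arxiv as an example):

```python
from ogb.nodeproppred import NodePropPredDataset

# ogbn-arxiv as an example: the node features here are the
# pre-computed 128-dim embeddings supplied by OGB, not the raw
# paper text.
dataset = NodePropPredDataset(name="ogbn-arxiv")
graph, labels = dataset[0]

x = graph["node_feat"]            # (num_nodes, 128) float features
edge_index = graph["edge_index"]  # (2, num_edges) source/target pairs
```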

The purpose of the pre-training process is to obtain the initialized cluster center embeddings. It is a widely used technique in graph learning, CV, and NLP. The related competitor S3GC [3] first performs graph representation learning and then directly runs k-means on the learned node embeddings; we consider this to separate representation learning from clustering optimization. Therefore, we first pre-train the encoders and then, at the fine-tuning stage, unify representation learning and clustering optimization into an end-to-end framework.
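
As a rough illustration of this recipe (not the exact code in this repo; `encoder`, `ssl_loss`, and `cluster_loss` are placeholders for the encoder and the objectives):

```python
import torch
from sklearn.cluster import KMeans

def pretrain(encoder, data, optimizer, ssl_loss, epochs=400):
    # Stage 1: train the encoder alone, without any clustering signal,
    # e.g. with a self-supervised/contrastive objective.
    for _ in range(epochs):
        optimizer.zero_grad()
        z = encoder(data)
        ssl_loss(z, data).backward()
        optimizer.step()

def init_centers(encoder, data, k):
    # Run k-means once on the pre-trained embeddings; the resulting
    # centers initialize the learnable cluster center embeddings.
    with torch.no_grad():
        z = encoder(data).cpu().numpy()
    centers = KMeans(n_clusters=k, n_init=10).fit(z).cluster_centers_
    return torch.nn.Parameter(torch.tensor(centers, dtype=torch.float32))

def finetune(encoder, centers, data, optimizer, cluster_loss, epochs=400):
    # Stage 2: encoder and centers are optimized jointly, so
    # representation learning and clustering share one objective.
    for _ in range(epochs):
        optimizer.zero_grad()
        cluster_loss(encoder(data), centers).backward()
        optimizer.step()
```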

Without pre-training, i.e., training the whole network from scratch, it is hard to achieve promising performance, especially on the purely unsupervised clustering task. There are some methods [4, 5] that are free from pre-training, but they are similar to S3GC: they first perform representation learning and then run k-means once (see the sketch below). If you have any questions or suggestions, feel free to contact me on WeChat: ly13081857311. Issues and pull requests are also welcome.
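
For contrast, that decoupled two-stage pattern looks roughly like this (again only a sketch; `encoder`, `data`, `k`, and `true_labels` are placeholders):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Two-stage pattern: embeddings are learned first and then frozen;
# k-means runs once and its result never feeds back into the encoder.
z = encoder(data).detach().cpu().numpy()
pred = KMeans(n_clusters=k, n_init=10).fit_predict(z)
print(normalized_mutual_info_score(true_labels, pred))
```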

[1] He X, Bresson X, Laurent T, et al. Explanations as Features: LLM-Based Features for Text-Attributed Graphs[J]. arXiv preprint arXiv:2305.19523, 2023.
[2] Zhao J, Zhuo L, Shen Y, et al. GraphText: Graph reasoning in text space[J]. arXiv preprint arXiv:2310.01089, 2023.
[3] Devvrit F, Sinha A, Dhillon I, et al. S3GC: Scalable self-supervised graph clustering[J]. Advances in Neural Information Processing Systems, 2022, 35: 3248-3261.
[4] Liu Y, Yang X, Zhou S, et al. Simple contrastive graph clustering[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023.
[5] Liu Y, Yang X, Zhou S, et al. Hard sample aware network for contrastive deep graph clustering[C]//Proceedings of the AAAI conference on artificial intelligence. 2023, 37(7): 8914-8922.

@Yide-Qiu
Author

Yide-Qiu commented Nov 1, 2023

Thanks for your detailed reply. Pre-training to obtain the initial cluster center embeddings is indeed a good idea. :)
