Community detection using spectral matrix analysis and clustering #23

Open
henripal opened this issue Feb 1, 2017 · 9 comments

Comments

@henripal
Contributor

henripal commented Feb 1, 2017

The idea here is to treat the graph matrix as a feature matrix and to use traditional dimension reduction/clustering techniques on these features.

An example workflow would be:

A good testing ground is the Twitter #far-right data.

Also check out the great post and tutorial by @alejandrox1: Data4Democracy/discursive#4
https://github.com/Data4Democracy/tutorials

@ashkan-leo
Member

ashkan-leo commented Feb 1, 2017

I can start working on this one. I'm thinking of doing spectral clustering on the graph Laplacian (instead of the adjacency matrix itself). How are we going to test the algorithm, though? Do we have labels? Also, I don't know where to find the #far-right data.
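
To make the Laplacian idea concrete, here is a minimal sketch of spectral bisection in plain Python. It is only an illustration under stated assumptions: the toy graph (two triangles joined by one edge) and the function names `laplacian`, `fiedler_vector` and `spectral_bisect` are made up for this example and are not part of the project. It uses the unnormalized Laplacian L = D - A and a simple shifted power iteration rather than a library eigensolver, so it is only suitable for tiny graphs.

```python
import math


def laplacian(n, edges):
    """Dense unnormalized Laplacian L = D - A of an undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def fiedler_vector(L, iters=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Power iteration on (shift * I - L), with the constant vector projected
    out at every step. shift = 2 * max degree bounds the largest eigenvalue
    of L, so the shifted matrix is positive semidefinite.
    """
    n = len(L)
    shift = 2.0 * max(L[i][i] for i in range(n))
    x = [float(i) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant component
        y = [shift * x[i] - sum(L[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


def spectral_bisect(n, edges):
    """Split nodes into two groups by the sign of the Fiedler vector."""
    vec = fiedler_vector(laplacian(n, edges))
    return [1 if value >= 0 else 0 for value in vec]


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(spectral_bisect(6, toy_edges))
```

On this toy graph the two triangles land in different groups. Which group is labelled 0 or 1 depends on the sign of the eigenvector, so only the grouping is meaningful. For more than two communities, the usual extension is to take several of the smallest eigenvectors and run a clustering step such as k-means on them.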

@bstarling
Contributor

@ashkan-leo I added you to the GitHub org so I can assign this one to you. Please ping me on Slack (@bstarling) to get the far-right data.

@henripal
Contributor Author

henripal commented Feb 3, 2017

@ashkan-leo we don't have labels. How to evaluate the results is a great question. We could rank users (by number of followers or PageRank), then try to manually identify some communities using the top-ranked users as a guideline, and compare those to the algorithmically generated communities?
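
As a rough illustration of the ranking step, the sketch below computes PageRank by power iteration in plain Python. Everything specific in it is an assumption made for the example: the edge direction (follower, followed), the damping factor of 0.85, the hypothetical user names, and the function name `pagerank`. A real run on the Twitter data would more likely use a graph library.

```python
def pagerank(edges, damping=0.85, iters=100):
    """PageRank by power iteration over a list of (source, target) edges.

    Each edge is read as "source follows target", so rank flows towards
    followed accounts. Nodes with no outgoing edges spread their rank
    evenly over all nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out = {node: [] for node in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if not out[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for u in nodes:
            for v in out[u]:
                new_rank[v] += damping * rank[u] / len(out[u])
        rank = new_rank
    return rank


# Hypothetical follower edges: (follower, followed).
follows = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
scores = pagerank(follows)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:2])
```

In this toy example the account followed by everyone ("c") ranks first, and the account it follows back ("a") ranks second. The top of such a ranking could then serve as the seed list for manually labelling communities.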

@Data4Democracy Data4Democracy locked and limited conversation to collaborators Feb 3, 2017
@Data4Democracy Data4Democracy unlocked this conversation Feb 3, 2017
@gvdr

gvdr commented Feb 3, 2017

Hi all.

My personal taste for large-scale linear algebra problems is to first give it a go with Julia. The base svds is as powerful as I need it to be: http://docs.julialang.org/en/stable/stdlib/linalg/#Base.svds

We also have improved algos for large networks through https://github.com/nassarhuda/MatrixNetworks.jl and https://github.com/JuliaGraphs/LightGraphs.jl

@gvdr

gvdr commented Feb 3, 2017

e.g., truncated SVD (10 singular values computed) on a sparse 45600x45600 matrix on my laptop:
16.078000 seconds (3.13 M allocations: 1.117 GB, 0.80% gc time)
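
For anyone unfamiliar with what a truncated SVD computes, here is a small plain-Python sketch. It is not the Julia routine used for the timing above: it is a basic orthogonal (subspace) iteration on A^T A, written for readability and only practical for tiny inputs. The matrix storage format (a dict keyed by (row, column)), the toy matrix, and all function names are assumptions made for this example.

```python
import math


def matvec(entries, x, n_rows):
    """y = A x for a sparse matrix stored as {(row, col): value}."""
    y = [0.0] * n_rows
    for (i, j), a in entries.items():
        y[i] += a * x[j]
    return y


def rmatvec(entries, y, n_cols):
    """x = A^T y for the same sparse storage."""
    x = [0.0] * n_cols
    for (i, j), a in entries.items():
        x[j] += a * y[i]
    return x


def orthonormalize(vectors):
    """Gram-Schmidt; assumes the input vectors are linearly independent."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def top_singular_values(entries, shape, k, iters=200):
    """Approximate the k largest singular values (k must not exceed the rank)."""
    n_rows, n_cols = shape
    # Deterministic, linearly independent starting vectors.
    V = orthonormalize([[1.0 / (1 + i + j) for i in range(n_cols)]
                        for j in range(k)])
    for _ in range(iters):
        V = orthonormalize([rmatvec(entries, matvec(entries, v, n_rows), n_cols)
                            for v in V])
    # sigma_i = ||A v_i|| once v_i has converged to a right singular vector.
    return [math.sqrt(sum(y * y for y in matvec(entries, v, n_rows))) for v in V]


# Hypothetical 4x3 sparse matrix whose singular values are 5, 2 and 1.
toy = {(0, 0): 3.0, (3, 0): 4.0, (1, 1): 2.0, (2, 2): 1.0}
print(top_singular_values(toy, (4, 3), k=2))
```

Only matrix-vector products with A and A^T are needed, which is why this family of methods scales to large sparse matrices. Production solvers typically use Lanczos-type iterations, which converge much faster than this sketch.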

@henripal
Contributor Author

henripal commented Feb 4, 2017

@gvdr I'm prototyping in Julia as well and love it, but it's definitely not a problem if anyone else wants to prototype in their favorite language at this stage, I guess?

@gvdr

gvdr commented Feb 4, 2017

Absolutely! I was thinking in terms of infrastructure: if we end up setting up a virtual environment in which to do analysis (wherever it is), let's make it open to Julia as well, not only Python and R ;-)

@bstarling
Contributor

I reached out to the Eventador folks about adding a Julia kernel to the existing notebook (they already added R). The next question will probably be about packaging. Do you have a list of the most common packages you would want preinstalled? You can post it in the channel or DM me. FYI, Domino, who has donated compute infrastructure, has a Julia kernel as well.

@JoeMcEwen

Curious, is this still being worked on? I am interested in helping.
