Community detection #4

Closed
hadoopjax opened this issue Jan 1, 2017 · 5 comments

Comments

@hadoopjax
Contributor

There are lots of ways to do community detection using Twitter data. We'll want to discuss the nuts and bolts on Slack, but once we select an implementation we like, we can track progress here. There's lots of neat emerging research we could try out, too (e.g. https://arxiv.org/pdf/1608.01771v1.pdf)!

@alejandrox1

I would like to help with this. I see this is a relatively old issue. Is there already something set up?

@hadoopjax
Contributor Author

Hi @alejandrox1, nope, this one's on the list but not yet started. I'll DM you on Slack to talk about getting started!

@alejandrox1

alejandrox1 commented Jan 21, 2017

Hello, time to get started!

There are many ways to get this started, and there is no single consensus on the best tools and methods, or on which data matters most for community detection in social media. I have collected a couple of references I found interesting here:
https://github.com/alejandrox1/References

First of all, I would like to encourage everyone to contribute whatever references they have found interesting so that we can have them all in one place.
I think it would be best if everyone who wants to contribute tried to replicate the results from one of the reference materials. By replicating work that has already been done, we can all learn by doing while having a benchmark for comparison. If we keep communicating through GitHub and Slack, the common issues, benefits, and shortcomings of the different methods will become obvious.

Possible projects within this issue include building networks, visualization, analysis, prediction, and performance.

To get started on any of these topics check out these tutorials:

These tutorials briefly cover how to build networks, how to visualize them, some measures that can be used to analyze a network, and link prediction.
@nick and @grichardson are working on building graphs.

Also, there is the community library (python-louvain), which works on top of networkx and is used for community detection:
https://bitbucket.org/taynaud/python-louvain
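For anyone new to Louvain: the method greedily searches for a partition with high modularity, a score that compares the fraction of edges falling inside communities with what a random graph with the same degrees would give. Here is a minimal plain-Python sketch of that score for an undirected, unweighted graph. The function name `modularity` and the toy two-triangle graph are made up for illustration and are not taken from python-louvain.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping node -> community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    intra = defaultdict(int)       # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(toy_edges, two_groups)   # 5/14, roughly 0.357
q_single = modularity(toy_edges, one_group)   # 0.0
```

Splitting the toy graph along the bridge scores higher than lumping everything together, which is the kind of improvement Louvain looks for at each step. In practice the library does the search for you; this is only meant to show what is being optimized.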

Graph Databases

@acompa

acompa commented Jan 29, 2017

Hey there! I might be able to help out with this -- I worked on both the algorithmic and engineering sides of community detection at Scale Model. I'll share some scattered thoughts below.

We used friendships between users to build graphs (e.g., user A follows user B => A -> B), although we had to drop directionality since, IIRC, Louvain (which we also used) cannot partition directed graphs.
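Dropping directionality can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not the Scale Model pipeline: the helper name `to_undirected`, the sample handles, and the choice to collapse mutual follows into one edge and discard self-follows are all hypothetical.

```python
def to_undirected(follows):
    """Collapse directed (follower, followed) pairs into undirected edges.

    A mutual follow (A -> B and B -> A) becomes a single edge, and
    self-follows are dropped. Returns a sorted list of (u, v) with u < v.
    """
    undirected = set()
    for follower, followed in follows:
        if follower == followed:
            continue  # a self-loop carries no friendship information here
        undirected.add(tuple(sorted((follower, followed))))
    return sorted(undirected)


# Hypothetical follow list: alice and bob follow each other,
# carol follows alice, and dave has a stray self-follow.
follows = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "dave"),
]
edges = to_undirected(follows)
```

The resulting edge list can be fed to whichever graph library you prefer. Note that the direction of each follow is lost, so a one-way follow and a mutual follow look identical afterwards.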

I've used both igraph and networkx for building and partitioning Twitter subgraphs. Note that igraph is actually a C-optimized graph library similar to networkx. You'll find networkx to be easy but slow, while igraph has a more esoteric API that runs way faster.

I actually think the best large-scale graph solution is something like GraphX, while igraph is best for partitioning smaller graphs efficiently (if we don't have money to throw at this problem, like your average startup :) ).

Feel free to message me on D4D Slack (achompas) or on here if you have any specific questions.

@hadoopjax
Contributor Author

I'm closing this issue as it has moved to Assemble.


3 participants