Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exploratory facebook notebook #14 issue #38

Merged
merged 28 commits into from
Nov 2, 2021

Conversation

stef4k
Copy link
Contributor

@stef4k stef4k commented May 14, 2021

I have created a facebook circles notebook as described in #14 . The analysis consists of the graph representation, basic topological attributes, centrality measures, clustering effects, bridges, assortativity and communities. However, I am not sure about:

  • I have used many graph representations in different sections to showcase the findings, which take time to load due to the plethora of nodes and edges

@stef4k stef4k force-pushed the notebook-development branch from 87a62bb to de5ea8b Compare May 15, 2021 14:57
@rossbar
Copy link
Contributor

rossbar commented May 15, 2021

Thanks for the contribution @stef4k ! I've only had time to skim it so far, but this seems like a great addition!

I have used many graph representations in different sections to showcase the findings, which take time to load due to the plethora of nodes and edges

This is indeed something we'll have to iterate on - there is currently a timeout of 30 seconds/cell, which is why the ci is failing. This limit is intentional - the idea being that for a tutorial to be effective as an interactive document, the computation times should be manageable. If it takes several minutes to run the analysis in a single cell, that's unlikely to be as useful to users who are trying to run the notebooks interactively. I haven't investigated in detail yet, so maybe there are opportunities to speed some things up. I think the most effective strategy will likely be to limit long-running analyses to a subset of data to illustrate/test things out (this is a best-practice when it comes to exploratory data analysis), but we will see.

I will aim to start taking a closer look at this ASAP!

@stef4k
Copy link
Contributor Author

stef4k commented May 16, 2021

Thank you for the quick response @rossbar . From what I understand, the main problem is the nx.draw_networkx() command due to the large network size. Some different options I thought are:

  • Uploading the graph representations as images e.g. in a folder called images (That could only solve the problem for the draw_networkx command)
  • Reducing the facebook circles analyzed (currently 10 from the data) to a more managable amount (would require significant changes to the analysis).

Lastly, the other commands with long waiting time are nx.diameter, nx.average_shortest_path_length, nx.shortest_path, nx.centrality.betweenness_centrality and nx.centrality.closeness_centrality

@stef4k
Copy link
Contributor Author

stef4k commented Sep 29, 2021

@rossbar is there anything I can or should do here?

@rossbar
Copy link
Contributor

rossbar commented Sep 30, 2021

Thanks for the ping @stef4k , I let this slip through the cracks "/

I'll aim to take another look at it early next week!

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay @stef4k , I've finally had a chance to take a look at the entire notebook. I think it's a great example and we should definitely aim to get it in!

There are a few high-level things that I think need to be taken care of to polish this up. Fortunately I don't think there are any major blockers, but I do suspect that it'll take quite a few iterations to improve things in a manner that isn't too overwhelming, since the notebook covers so much.

I think the GitHub suggestion mechanism is too cumbersome for some of the changes, so it'd probably be best to work out another workflow... If you're comfortable with it, I'd be happy to open PRs against your notebook-development branch for some of the larger suggestions. Another alternative would be to make minor improvements now, put the PR in, then make the more extensive changes as subsequent PRs. I'm happy with that approach too, though it might need a bit of extra reviewer bandwidth to get changes in. @MridulS do you have a preference (or other ideas)?

@MridulS
Copy link
Member

MridulS commented Oct 9, 2021

Great work @stef4k! We should get this in ASAP. But as @rossbar mentions that this notebook covers a lot so it may take a lot of back and forth. We could merge this in and then make changes as required, and we could also do a "pair-programming-review" over Zoom/Hangouts to reduce the overhead of the back and forth review if @stef4k and @rossbar are okay with it?

@stef4k
Copy link
Contributor Author

stef4k commented Oct 10, 2021

Thank you for the kind words. @rossbar and @MridulS I am okay with any plan you suggest to follow

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote to see how far we can get working on GitHub. I basically see three big steps:

  1. Minor code fixups, like the suggestions in this review
  2. A few larger suggestions for refactoring sections of code, and
  3. Doing a pass over the text.

I think the GitHub suggestion feature is more than adequate for step 1. For step 2, I propose to make some pull requests against your notebook-development branch @stef4k . This may be okay - I'll try to keep things short :). If it gets too cumbersome we can think about a different approach. Finally, step 3 is always tricky because it's so subjective - I would think a single pass of suggestions should do the trick though.

Finally - one thing that would be helpful as a first step would be to merge with (or, if you're comfortable doing so, rebase on) main - there's some configuration there that might get the longer-running CI cells passing, which makes reviewing a bit easier. I'm happy to do this as well if you'd prefer @stef4k , just LMK!

content/exploratory_notebooks/facebook_notebook.md Outdated Show resolved Hide resolved
content/exploratory_notebooks/facebook_notebook.md Outdated Show resolved Hide resolved
content/exploratory_notebooks/facebook_notebook.md Outdated Show resolved Hide resolved
content/exploratory_notebooks/facebook_notebook.md Outdated Show resolved Hide resolved
Fetching upsteam changes to notebook-development branch
@stef4k stef4k force-pushed the notebook-development branch from cd2eebc to 610e159 Compare October 18, 2021 22:14
@stef4k
Copy link
Contributor Author

stef4k commented Oct 18, 2021

@rossbar thank you for the suggestions. I accepted them and fetched the upstream changes both into my main and notebook-development branch of my origin. Also, I invited you as a collaborator in my origin to make things easier. Do you want me to merge the notebook-development branch into the main in my origin?

@rossbar
Copy link
Contributor

rossbar commented Oct 24, 2021

Do you want me to merge the notebook-development branch into the main in my origin?

Nope that shouldn't be necessary. I will open PRs to stef4k/notebook-development instead, which you should see on your fork. I recently opened stef4k#2 with a proposal to reuse the plot layouts to cut the notebook runtime by nearly a factor of 2.

I wanted to open a PR instead of pushing directly to your branch because these are relatively big changes, and I wanted to give you a chance to review them and question/comment rather than just forcing stuff directly into your work! Once you've had a chance to review the PR on your fork, you can merge it (if you're happy with it) and the changes should automatically show up here without the need for any more git acrobatics :)

Suggestion: Reuse layouts for plotting
@stef4k
Copy link
Contributor Author

stef4k commented Oct 28, 2021

@rossbar I totally understand. This way will let me learn and understand more things, while I am also participating.

However, after committing the changes I saw that there was a new error at build ubuntu and in detail at install dependencies, as you can see in the screenshot:

image

@rossbar
Copy link
Contributor

rossbar commented Oct 28, 2021

Ah, thanks for the heads up - time to update the Python version for the CI checks!

@rossbar
Copy link
Contributor

rossbar commented Oct 28, 2021

Ok @stef4k , the CI is fixed now on the main branch which I merged into this branch and pushed up so everything should be running again. Just make sure to git pull the latest changes.

Also it looks like the performance improvements + the more relaxed runtime restrictions on main combined so that the circleCI execution no longer times out 🎉 This means the preview feature should now be working: https://119-40210422-gh.circle-artifacts.com/0/site/_build/html/content/exploratory_notebooks/facebook_notebook.html.

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we've cut the run time in half, I think this is in shape to go in. There are still a few things I'd like to experiment with/maybe some wording reorganization, but all of those things fall firmly into follow-up-PR territory. What do you think @MridulS ?

A huge thanks @stef4k for your patience and perserverance!

Copy link
Member

@MridulS MridulS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too! @rossbar we can do a follow up PR to take care of the reorganisation of this notebook.

Thanks @stef4k!

@rossbar rossbar merged commit 88db175 into networkx:main Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants