
How to host our documentation #12

Closed
rabernat opened this issue Sep 19, 2017 · 22 comments · Fixed by #28

Comments

@rabernat
Member

A major goal of this project is basically producing a ton of documentation. Eventually I envision an online resource consisting of:

Our wiki is useful for getting info online fast, but it might not be the best option for scaling up. It's not scriptable, PR-able, etc. And it can't easily import jupyter notebooks.

Some options for moving forward are:

There are pros and cons to each. I'm curious to hear your thoughts.

@dopplershift
Contributor

I'm a big fan of using sphinx. As much as it can be a pain, I like the workflow of making sure PRs pass a build (using Travis) as well as having Travis handle deploying updated docs to GitHub Pages. You can also use this build step to do a link check and ensure that you don't have any stale links in your materials.
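The build-and-deploy workflow described above might look roughly like the following `.travis.yml` sketch. This is an assumption about the setup, not the project's actual config; `doctr` is one tool of that era for pushing built HTML to GitHub Pages, and the `docs/` layout is hypothetical.

```yaml
language: python
python:
  - "3.6"
install:
  - pip install sphinx doctr
script:
  # Fail the PR build if the docs don't compile cleanly (-W turns warnings into errors)
  - sphinx-build -b html -W docs docs/_build/html
  # Catch stale external links in the materials
  - sphinx-build -b linkcheck docs docs/_build/linkcheck
after_success:
  # Push the built HTML to the gh-pages branch (requires a deploy key set up with doctr)
  - doctr deploy --built-docs docs/_build/html .
```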

There's even sphinx-gallery that streamlines creating a gallery of images from a set of example scripts; these examples can even be downloaded as notebooks. The only downside is that you have to reformat the notebooks as python scripts (with a bit of markup to separate cells). I will say that version controlling python scripts is much easier than trying to do so for notebooks.
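For reference, a sphinx-gallery example script looks roughly like this: an rST module docstring becomes the page text, and comment blocks delimited by lines of `#` characters become text cells between code cells. (The example content itself is illustrative, not from the thread.)

```python
"""
A minimal gallery example
=========================

The module docstring is reStructuredText and becomes the page's intro text.
"""
import numpy as np

###############################################################################
# Comment blocks starting with a line of ``#`` characters are rendered as
# text cells between the code cells; sphinx-gallery splits the script here.

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

###############################################################################
# Printed output (and any figures) from each code block are captured and
# shown in the rendered page, and the whole script can be downloaded as a
# notebook.

print(y.max())
```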

We have an example of this up here (rendered here).

@mrocklin
Member

Any of the options seems fine to me.

@jhamman
Member

jhamman commented Sep 20, 2017

I'll cast my vote for sphinx. I think the seaborn documentation, which uses sphinx, covers a lot of the functionality we are looking for. Those docs are built from a combination of RST, jupyter notebooks (tutorials), and python scripts.

I'd also be okay with using a Jekyll plugin that renders jupyter notebooks and building a notebook gallery into the existing website.

@rabernat
Member Author

Turnout was low, but sounds like sphinx is the winner. (It's also my preference...but I didn't want to tip the scale initially.)

Any volunteers to set it up? @dopplershift...it sounds like you've got some solid experience with this. Maybe we could just fork the Unidata python-gallery repo to get started? I would like to have a system in place to start hosting the use-case notebooks asap.

@dopplershift
Contributor

@rabernat I'm happy to set it up. Is there an existing repo we want this set up on?

And to be clear, the python-gallery solution doesn't actually store notebooks, they have to be formatted python scripts: https://github.com/Unidata/python-gallery/blob/master/examples/xarray_500hPa_map.py
Building the docs turns them into notebooks.

If it would be a better workflow for people just to commit notebooks, we can look into some other sphinx extensions. I just want to make sure we're clear on all this.

@rabernat
Member Author

If it would be a better workflow for people just to commit notebooks, we can look into some other sphinx extensions. I just want to make sure we're clear on all this.

I'm glad you did clarify. I had misunderstood what python-gallery does. In the xgcm docs (conf.py) we use the nbsphinx extension to allow notebooks to be rendered directly. This would probably be a better choice for pangeo docs.
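For context, enabling nbsphinx comes down to a couple of lines in `conf.py`. This is a trimmed sketch, not the actual xgcm configuration:

```python
# Sketch of the conf.py lines relevant to rendering notebooks with nbsphinx.
# A real conf.py has many more settings (project name, theme, etc.).
extensions = [
    "nbsphinx",            # renders .ipynb files directly as documentation pages
    "sphinx.ext.mathjax",  # math in notebook markdown cells
]

# Keep notebook checkpoint copies out of the build
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```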

As for the repo, I was thinking we could just use this one (and rename it to "pangeo-docs"). Alternatively, we could create a new repo. What do people think? I have no problem repurposing this for docs + general discussion, since they are closely related.

@jhamman
Member

jhamman commented Sep 28, 2017

@dopplershift - maybe we should touch base over the phone today. There are sphinx extensions that do notebook conversion. I have played around with these a bit and would be happy to help you put this together.

@dopplershift
Contributor

@jhamman Right, and we can use those instead. I have found, though, that the script approach is significantly easier to manage through GitHub, lint using flake8, and automatically execute to ensure they work. It's also nice not to have to deal with the junk that results when updating images within notebooks. Yes, you can have nbconvert execute the notebooks, but I've run into a variety of challenges there, the worst of which is the fact that errors from running the notebooks can be inscrutable.

Regardless, I'm not passionately against the notebooks, and I'm happy to set up a repo either way. Really, it probably comes down to who's adding notebooks and what the best workflow is for them.

@jhamman
Member

jhamman commented Sep 28, 2017

A few thoughts.

  1. It is possible to lint jupyter notebooks, so we can look into that.
  2. Are the use case notebooks intended to be fully portable? I was assuming they were going to be examples of a workflow but would not be expected to run anywhere other than Cheyenne/GCP. Ideally, we would just convert the notebook content (including figures) into HTML without executing the code. I could be wrong here, though.
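One possible linting path (an assumption on my part, not the thread's settled choice) is to convert the notebook to a plain script with nbconvert and run flake8 on the result; `example.ipynb` is a placeholder name:

```shell
# Convert the notebook's code cells to a script, then lint the script.
jupyter nbconvert --to script example.ipynb   # writes example.py
flake8 example.py
```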

@dopplershift
Contributor

My experience is that notebooks which aren't run regularly bitrot quickly, but I'm also a bit of a fanatic about automating everything using Travis. I'll defer to others on what we want to do here.

@rabernat
Member Author

rabernat commented Oct 4, 2017

Are the use case notebooks intended to be fully portable?

I think that the notebooks will not be fully portable because the underlying data is too big to be portable. The best we can hope for in terms of portability is that, if a user has access to Cheyenne (or, eventually, GCP), then they should be able to execute the notebook. Travis will certainly not be able to download the TB of data associated with the use cases just to build the docs.

My experience is that notebooks which don't run regularly quickly bitrot

@dopplershift, I appreciate your point here. That is certainly important for the primary documentation for a package such as metpy or xarray. However, for the reasons mentioned above, I think it is impractical here.

We can still use travis to automatically build the docs. We can just disable notebook execution within sphinx.
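Assuming nbsphinx is the extension in use, disabling execution is a one-line `conf.py` setting; the option below is real nbsphinx configuration, though whether to also tolerate cell errors is a judgment call:

```python
# Render committed notebook outputs as-is; never execute notebooks at build time.
# Valid values are "always", "auto" (execute only notebooks without outputs),
# and "never".
nbsphinx_execute = "never"

# Optionally, don't fail the build on notebooks whose stored outputs contain errors.
nbsphinx_allow_errors = True
```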

@rabernat
Member Author

rabernat commented Oct 4, 2017

Perhaps it would be possible to set up an automated system to re-execute the notebooks on the supported platforms at some regular interval (or in response to github events). That could help prevent the bitrot issue.
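A scheduled re-execution job on the HPC side could be as simple as the sketch below; the `notebooks/` path is hypothetical, and the schedule would be set via cron or a CI scheduler rather than in this script:

```shell
# Re-execute every use-case notebook in place; nbconvert exits nonzero if
# any cell raises, which is the bitrot signal we want.
for nb in notebooks/*.ipynb; do
  jupyter nbconvert --to notebook --execute --inplace "$nb"
done
```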

@jmunroe

jmunroe commented Oct 4, 2017

For what it's worth, this is exactly the workflow I use for our COSIMA Cookbooks:

  1. Travis (off-site) checks that the code/packages behind the notebooks have not broken.
  2. A Jenkins server (hosted internally) rebuilds the notebooks on a compute node that has file-level access to the data stored on the HPC system.
  3. readthedocs.io (off-site) generates webpages from the committed notebooks using a sphinx plugin with notebook execution disabled.

To satisfy the 'portability problem' I have played around with using data services (e.g. opendap) to allow users to run the notebook even if they do not have their own accounts on the HPC resource.

@dopplershift
Contributor

@rabernat Regardless of end solution, I should be able to make a start on this. Did you have a repo in mind where these should live?

@rabernat
Member Author

rabernat commented Oct 4, 2017

Did you have a repo in mind where these should live?

This repo! Why not?

@rabernat
Member Author

rabernat commented Oct 5, 2017

To clarify, I propose we rename this repo pangeo-docs and host the website as a project page under pangeo-data.github.io/pangeo-docs

@jhamman
Member

jhamman commented Oct 9, 2017

@rabernat, can we just call the repo pangeo? I'll propose putting a few conda environments and docker files in the same repo.

@rabernat
Member Author

rabernat commented Oct 9, 2017

Yes, let's just rename this repo to generic pangeo and start putting actual stuff in it.

@rabernat
Member Author

rabernat commented Oct 9, 2017

renaming done

@dopplershift
Contributor

Feel free to commit whatever notebooks/content (anything not in a notebook should be reStructuredText) and I will reorganize as necessary when I have the cycles to put together the doc system.

@rabernat
Member Author

As long as we are discussing documentation stuff in this thread, I will raise a slightly OT issue.

I am pondering whether notebooks are actually the best format for documenting the use cases. Notebooks are great for interactive development, inline plotting, and quick sharing of the whole package. However, it is not currently possible to comment on a notebook submitted by PR to github (see jupyter/notebook#2727). This could end up being a big problem as we collect use cases.

@naomi-henderson's comment on #1 includes a link to a notebook that I would like to propose some changes to before merging. This would be trivial if it were just a python script in a text file, but impossible with a notebook.

This points back to the earlier python-gallery suggestion after all.

@rabernat
Member Author

Just checking in on this issue. It would be great to get a basic documentation site going soonish.
