How to host our documentation #12

rabernat · 2017-09-19T20:55:49Z

A major goal of this project is basically producing a ton of documentation. Eventually I envision an online resource consisting of:

- Documentation on how to deploy a "pangeo environment" (including distributed cluster) in various contexts such as
  - different clusters (Cheyenne [already started], Gaea, Pleiades, etc.)
  - generic local cluster
  - cloud environment
- Guides for system administrators about how to support a pangeo environment
- Domain-specific tutorials based on the use-cases (e.g. Use Case Notebook for "Atmospheric Moisture Budgets" #1, Use Case Notebook for "Statistical Downscaling" #11)
- Benchmarking results

Our wiki is useful for getting info online fast, but it might not be the best option for scaling up. It's not scriptable, PR-able, etc. And it can't easily import jupyter notebooks.

Some options for moving forward are:

- Just keep the github wiki as the central pangeo documentation resource.
- Use the Pangeo Jekyll site (repo).
- Develop a sphinx site for our documentation.

There are pros and cons to each. I'm curious to hear your thoughts.

dopplershift · 2017-09-19T22:22:32Z

I'm a big fan of using sphinx. As much as it can be a pain, I like the workflow of making sure PRs pass a build (using Travis) as well as having Travis handle deploying updated docs to GitHub Pages. You can also use this build step to do a link check and ensure that you don't have any stale links in your materials.
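
As a rough sketch of the build and link-check steps described here: the `-W` flag and the `linkcheck` builder are standard `sphinx-build` options, while the `docs/` layout and the function names are assumptions made up for illustration.

```python
import subprocess


def sphinx_command(builder, source_dir="docs", build_root="docs/_build"):
    """Return the sphinx-build command line for the given builder.

    The directory names are assumptions for illustration only.
    """
    return [
        "sphinx-build",
        "-W",            # turn warnings into errors so the CI job fails
        "-b", builder,   # "html" builds the site, "linkcheck" tests links
        source_dir,
        f"{build_root}/{builder}",
    ]


def run_checks(builders=("html", "linkcheck")):
    """Run each builder in turn and return the first non-zero exit code."""
    for builder in builders:
        result = subprocess.run(sphinx_command(builder))
        if result.returncode != 0:
            return result.returncode
    return 0
```

A CI job would call `run_checks()` and exit with its return value, so a failed build or a stale link marks the PR as failing.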

There's even sphinx-gallery that streamlines creating a gallery of images from a set of example scripts; these examples can even be downloaded as notebooks. The only downside is that you have to reformat the notebooks as python scripts (with a bit of markup to separate cells). I will say that version controlling python scripts is much easier than trying to do so for notebooks.

We have an example of this up here (rendered here).

mrocklin · 2017-09-19T22:24:59Z

Any of the options seems fine to me.

jhamman · 2017-09-20T05:23:01Z

I'll cast my vote for sphinx. I think the seaborn documentation, which uses sphinx, covers a lot of the functionality we are looking for. Those docs are built from a combination of RST, jupyter notebooks (tutorials), and python scripts.

I'd also be okay with using a Jekyll plugin that renders jupyter notebooks and building a notebook gallery into the existing website.

rabernat · 2017-09-26T22:08:43Z

Turnout was low, but sounds like sphinx is the winner. (It's also my preference...but I didn't want to tip the scale initially.)

Any volunteers to set it up? @dopplershift...it sounds like you've got some solid experience with this. Maybe we could just fork the Unidata python-gallery repo to get started? I would like to have a system in place to start hosting the use-case notebooks asap.

dopplershift · 2017-09-28T15:59:44Z

@rabernat I'm happy to set it up. Is there an existing repo we want this set up on?

And to be clear, the python-gallery solution doesn't actually store notebooks; they have to be formatted python scripts: https://github.com/Unidata/python-gallery/blob/master/examples/xarray_500hPa_map.py
Building the docs turns them into notebooks.
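
For anyone who has not seen the format, here is a minimal sketch of what such a script looks like. The opening docstring supplies the page title and intro, and comment blocks that begin with a line of `#` characters become text cells between the code cells. The content (a toy dew point calculation) is made up for illustration and is not taken from the python-gallery repo.

```python
"""
Toy Example
===========

A made-up example showing the layout sphinx-gallery expects.
"""
import math

###########################################
# Comment blocks that begin with a line of ``#`` characters are rendered as
# text. Everything else is shown (and executed) as code.


def dewpoint_c(temp_c, rel_humidity):
    """Approximate dew point (deg C) with the Magnus formula."""
    a, b = 17.62, 243.12
    gamma = math.log(rel_humidity / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)


###########################################
# A second text cell, followed by the code that uses the function above.

print(round(dewpoint_c(20.0, 50.0), 1))
```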

If it would be a better workflow for people just to commit notebooks, we can look into some other sphinx extensions. I just want to make sure we're clear on all this.

rabernat · 2017-09-28T16:34:55Z

> If it would be a better workflow for people just to commit notebooks, we can look into some other sphinx extensions. I just want to make sure we're clear on all this.

I'm glad you did clarify. I had misunderstood what python-gallery does. In the xgcm docs (conf.py) we use the nbsphinx extension to allow notebooks to be rendered directly. This would probably be a better choice for pangeo docs.
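
For reference, enabling nbsphinx is mostly a matter of adding it to the extension list in `conf.py`. The fragment below is a minimal sketch, not a copy of the xgcm configuration, and it assumes the notebooks sit alongside the RST sources.

```python
# conf.py (fragment): sketch only, not the actual xgcm settings
extensions = [
    "nbsphinx",  # renders .ipynb files as documentation pages
]

# Skip build output and the checkpoint copies Jupyter leaves behind.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```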

As for the repo, I was thinking we could just use this one (and rename it to "pangeo-docs"). Alternatively, we could create a new repo. What do people think? I have no problem repurposing this for docs + general discussion, since they are closely related.

jhamman · 2017-09-28T16:44:40Z

@dopplershift - maybe we should touch base over the phone today. There are sphinx extensions that do notebook conversion. I have played around with these a bit and would be happy to help you put this together.

dopplershift · 2017-09-28T18:48:24Z

@jhamman Right, and we can use those instead. I have found, though, that the script approach is significantly easier to manage through GitHub, lint using flake8, and automatically execute to ensure they work. It's also nice not to have to deal with the junk that results when updating images within notebooks. Yes, you can have nbconvert execute the notebooks, but I've run into a variety of challenges there, the worst of which is the fact that errors from running the notebooks can be inscrutable.

Regardless, I'm not passionately against the notebooks, and I'm happy to set up a repo either way. Really, it probably comes down to who's adding notebooks and what the best workflow is for them.

jhamman · 2017-09-28T19:19:16Z

A few thoughts.

- It is possible to lint jupyter notebooks, so we can look into that.
- Are the use case notebooks intended to be fully portable? I was assuming they were going to be examples of a workflow but would not be required to run apart from Cheyenne/GCP. Ideally, we just convert the content (including figures) into html from the notebook without executing the code. I could be wrong here though.
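
On the linting point, one simple approach is to pull the code cells out of the `.ipynb` JSON and run a check on them. The sketch below only does a syntax check with the built-in `compile`, and it blanks out IPython magics and shell escapes first; a real setup would more likely hand the extracted code to flake8. The function names are hypothetical.

```python
import json


def code_cells(notebook_json):
    """Yield the source of each code cell in a version 4 notebook."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # The format allows either one string or a list of lines.
            yield source if isinstance(source, str) else "".join(source)


def syntax_errors(notebook_json):
    """Return (code_cell_index, message) for each code cell that fails to compile."""
    errors = []
    for index, source in enumerate(code_cells(notebook_json)):
        # Blank out IPython magics and shell escapes, which are not valid Python.
        lines = [
            "" if line.lstrip().startswith(("%", "!")) else line
            for line in source.splitlines()
        ]
        try:
            compile("\n".join(lines), f"<cell {index}>", "exec")
        except SyntaxError as exc:
            errors.append((index, exc.msg))
    return errors
```

Because this only reads the stored source, it works even when the notebook cannot be executed on the machine doing the check.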

dopplershift · 2017-09-28T19:44:28Z

My experience is that notebooks which don't run regularly quickly bitrot, but I'm also a bit of a fanatic about automating everything using Travis. I'll defer to others on what we want to do here.

rabernat · 2017-10-04T00:27:53Z

> Are the use case notebooks intended to be fully portable?

I think that the notebooks will not be fully portable because the underlying data is too big to be portable. The best we can hope for in terms of portability is that, if a user has access to Cheyenne (or, eventually, GCP), then they should be able to execute the notebook. Travis will certainly not be able to download the TB of data associated with the use cases just to build the docs.

> My experience is that notebooks which don't run regularly quickly bitrot

@dopplershift, I appreciate your point here. That is certainly important for the primary documentation for a package such as metpy or xarray. However, for the reasons mentioned above, I think it is impractical here.

We can still use travis to automatically build the docs. We can just disable notebook execution within sphinx.
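
If the site ends up using nbsphinx, as suggested earlier in the thread, turning off execution should be a one-line setting in `conf.py`. A sketch of that fragment:

```python
# conf.py (fragment): render notebooks with their stored outputs
# instead of re-running them during the docs build.
nbsphinx_execute = "never"
```

With this set, the figures and outputs shown on the site are whatever was saved in the committed notebook.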

rabernat · 2017-10-04T00:30:11Z

Perhaps it would be possible to set up an automated system to re-execute the notebooks on the supported platforms at some regular interval (or in response to github events). That could help prevent the bitrot issue.
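
A minimal sketch of what such a job might run on the platform side, assuming the notebooks live in a hypothetical `notebooks/` directory and that `jupyter nbconvert` is installed there. The scheduling itself (cron, a CI server, a webhook) is left out, and the function names and timeout are placeholders.

```python
import subprocess
from pathlib import Path


def execute_command(notebook, timeout_s=3600):
    """Return the nbconvert command that re-runs a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.timeout={timeout_s}",
        str(notebook),
    ]


def rerun_all(notebook_dir="notebooks"):
    """Execute every notebook and return the paths that failed."""
    failed = []
    for notebook in sorted(Path(notebook_dir).glob("*.ipynb")):
        if subprocess.run(execute_command(notebook)).returncode != 0:
            failed.append(notebook)
    return failed
```

The list returned by `rerun_all()` could then be reported back, for example by opening an issue when it is non-empty.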

jmunroe · 2017-10-04T00:42:18Z

For what it's worth, this is exactly the workflow I use for our COSIMA Cookbooks:

- Travis (off-site) checks that the code/packages behind the notebooks have not broken.
- A Jenkins server (hosted internally) rebuilds the notebooks on a compute node that has file-level access to the data stored on the HPC system.
- readthedocs.io (off-site) generates webpages from the notebooks as committed, using a sphinx plugin with notebook execution disabled.

To satisfy the 'portability problem' I have played around with using data services (e.g. opendap) to allow users to run the notebook even if they do not have their own accounts on the HPC resource.

dopplershift · 2017-10-04T15:52:58Z

@rabernat Regardless of end solution, I should be able to make a start on this. Did you have a repo in mind where these should live?

rabernat · 2017-10-04T20:50:28Z

> Did you have a repo in mind where these should live?

This repo! Why not?

rabernat · 2017-10-05T21:26:09Z

To clarify, I propose we rename this repo pangeo-docs and host the website as a project page under pangeo-data.github.io/pangeo-docs

jhamman · 2017-10-09T17:20:45Z

@rabernat, can we just call the repo pangeo? I'll propose putting a few conda environments and docker files in the same repo.

rabernat · 2017-10-09T17:23:17Z

Yes, let's just rename this repo to generic pangeo and start putting actual stuff in it.

rabernat · 2017-10-09T17:23:44Z

renaming done

dopplershift · 2017-10-09T21:30:34Z

Feel free to commit whatever notebooks/content (anything not in a notebook should be reStructuredText) and I will reorganize as necessary when I have the cycles to put together the doc system.

rabernat · 2017-10-10T01:19:37Z

As long as we are discussing documentation stuff in this thread, I will raise a slightly OT issue.

I am pondering whether notebooks are actually the best format for documenting the use cases. Notebooks are great for interactive development, inline plotting, and quick sharing of the whole package. However, it is not currently possible to comment on a notebook submitted by PR to github (see jupyter/notebook#2727). This could end up being a big problem as we collect use cases.

@naomi-henderson's comment on #1 includes a link to a notebook that I would like to propose some changes to before merging. This would be trivial if it were just a python script in a text file, but impossible with a notebook.

This points back to the earlier python-gallery suggestion after all.

rabernat · 2017-10-30T14:25:00Z

Just checking in on this issue. It would be great to get a basic documentation site going soonish.

jhamman mentioned this issue Oct 9, 2017

Use Case Notebook for "Statistical Downscaling" #11

Closed

5 tasks

rabernat mentioned this issue Nov 12, 2017

Sphinx site #28

Merged

rabernat closed this as completed in #28 Nov 13, 2017
