Separate out documentation building and publishing per provider #11423

Closed
potiuk opened this issue Oct 11, 2020 · 22 comments · Fixed by #12892

Comments

@potiuk
Member

potiuk commented Oct 11, 2020

No description provided.

@potiuk
Member Author

potiuk commented Oct 11, 2020

Hey @mik-laj @kaxil - I guess it would be great to separate out the docs per-provider - both generation and publishing. This would only be needed when we go to 2.0, so we have quite some time for that, but I thought you might be the best people to take care of it :)

@mik-laj
Member

mik-laj commented Nov 5, 2020

I worked on this ticket yesterday and today and managed to build documentation for the provider packages.
https://wicked-army.surge.sh/
I haven't migrated all the content yet, but the most difficult case, the Google package, has been fully migrated, along with the reference documentation for the Python API and configuration.
https://wicked-army.surge.sh/google/html/index.html

There are two more serious issues that need to be discussed.

  1. ReadTheDocs: Unfortunately, we will have to abandon ReadTheDocs for building the documentation. It doesn't allow us to run our own build scripts, and we can only have one documentation set per repository. Besides, it causes various problems over which we have little control. For example, the Python API reference documentation currently does not build properly - https://airflow.readthedocs.io/en/latest/_api/index.html
    We will probably be able to solve this problem quickly if we receive financial support for CI. Then we will be able to build the documentation ourselves and publish on S3/GCS or other.

  2. Operators and hooks reference: This page has information from all providers, so it is not possible to divide it.
    https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html
    In its present form, it cannot remain if we want separate docs per provider. I propose that we maintain the same information in YAML files and then reuse it as needed.
    For development purposes, we can generate a markdown file that we store in the repository.
    For production/website, we can also display this data as markdown on the website, or build a richer interface similar to the Terraform Registry. That could be fairly simple if we had all the data in YAML and a contributor with React experience.
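
A minimal sketch of the YAML-to-markdown idea in point 2. The field names (`package-name`, `integrations`, `operators`, `hooks`) are hypothetical, not an agreed schema, and the metadata is written as the Python structure a YAML loader would return so that the example needs no third-party package.

```python
from typing import Dict, List

# Hypothetical metadata, shaped like the result of loading one YAML file per provider.
PROVIDERS: List[Dict] = [
    {
        "package-name": "apache-airflow-providers-google",
        "integrations": [
            {
                "name": "Google Cloud Storage",
                "operators": ["airflow.providers.google.cloud.operators.gcs"],
                "hooks": ["airflow.providers.google.cloud.hooks.gcs"],
            },
        ],
    },
]


def render_markdown(providers: List[Dict]) -> str:
    """Render one markdown table per provider, one row per integration."""
    lines: List[str] = []
    for provider in sorted(providers, key=lambda p: p["package-name"]):
        lines.append(f"## {provider['package-name']}")
        lines.append("")
        lines.append("| Service | Operators | Hooks |")
        lines.append("| --- | --- | --- |")
        for item in provider["integrations"]:
            operators = ", ".join(f"`{m}`" for m in item.get("operators", []))
            hooks = ", ".join(f"`{m}`" for m in item.get("hooks", []))
            lines.append(f"| {item['name']} | {operators} | {hooks} |")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_markdown(PROVIDERS))
```

The same data could feed both the markdown file kept in the repository and a richer website view, which is the point of keeping it in one structured source.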

CC: @ryw @potiuk @iadi7ya @francescomucio @jward-bw @jhtimmins @kaxil @paolaperaza @pcandoalmeida @xinbinhuang

Related issue: apache/airflow-site#301

@potiuk
Member Author

potiuk commented Nov 5, 2020

Will take a look shortly :)

@ryw
Member

ryw commented Nov 9, 2020

Hi @mik-laj - I reviewed this and chatted w/ @kaxil today, looks good structurally for v1.

In the short term, is the idea to build this into the airflow website next to the other docs? Trying to think what is simplest to get v1 out there. Do we want to provide versioned docs for each provider? I don't think so - just "latest".

We could have sublinks across the top "Airflow" and "Airflow Providers" as a way to navigate to the providers docs?

Happy to jump on a call to brainstorm.

@potiuk
Member Author

potiuk commented Nov 9, 2020

I think eventually we might need docs per provider version. We can fully automate it - once we automate it for "latest", it will be almost no effort to automate it for "per-provider-version". And it would be rather confusing for people to read the provider's docs for the latest version while they are using another one.

Just the fact that we agreed to SemVer and agreed that we might have breaking changes pretty much implies that we need "per-version" documentation. Imagine we have a 1.0.0 version of the Google provider and then we release 2.0.0 with breaking changes (for example, after we migrate to the Google 2.0 Python APIs). We need to provide docs for both versions for quite some time. And it's even likely that we will release a 1.0.1 Google provider with bugfixes for 1.0.0 (though this still waits for #11425 to be completed).

I think we have no choice but to implement all of it, including the possibility of choosing version per provider - this IMHO is pretty much sealed when we agreed to allow for breaking changes for each provider. And it's not even difficult - we can (and should) fully automate it.

It does not have to be there for "Day 0" - like when we release 2.0.0 and set of 1.0.0 providers, it can be "no version" but very soon after we have to support versions. And our tooling has to be prepared for that (and have it automated), because keeping it manually updated will be impossible.
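
To illustrate why automating "latest" makes per-version docs nearly free, here is a rough sketch. The directory layout (`<package>/<version>/` plus a `<package>/stable/` alias) and the function names are assumptions for illustration, not the layout the project settled on, and only plain numeric versions are handled.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.0.1' into (1, 0, 1); pre-release suffixes are not handled here."""
    return tuple(int(part) for part in version.split("."))


def docs_paths(releases: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every docs directory to the release it is built from.

    Assumed layout: <package>/<version>/ for each release, plus a
    <package>/stable/ alias that follows the highest released version.
    """
    paths: Dict[str, str] = {}
    for package, versions in releases.items():
        for version in versions:
            paths[f"{package}/{version}/"] = version
        if versions:
            paths[f"{package}/stable/"] = max(versions, key=parse_version)
    return paths


if __name__ == "__main__":
    released = {"apache-airflow-providers-google": ["1.0.0", "1.0.1", "2.0.0"]}
    for path, version in sorted(docs_paths(released).items()):
        print(path, "<-", version)
```

With a mapping like this, adding a new provider release is one more loop iteration rather than a manual publishing step.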

@ryw
Member

ryw commented Nov 9, 2020

Agree - we should ship v1 as "universal docs" since it's the first release for all the providers, but we'll have to address the problem pretty soon as providers start to independently update + release.

@mik-laj
Member

mik-laj commented Nov 14, 2020

In the short term, is the idea to build this into the airflow website next to the other docs? Trying to think what is simplest to get v1 out there. Do we want to provide versioned docs for each provider? I don't think so - just "latest".

We update providers very often, so I think it's worth splitting these docs as soon as possible. If we are going to publish these documents, we must also make archived versions of the documentation available, mainly so that the user can check whether a given operator is available in a given version or whether they need to update to the latest version.

We could have sublinks across the top "Airflow" and "Airflow Providers" as a way to navigate to the providers docs?

I would like us to have an index (at the address: https://airflow.apache.org/docs/ ) that will describe all the products we release. For now, my focus is only on Airflow core and the provider packages, but in the future we may add documentation for the rest of the products we release.
#11152

When the user selects a product, they get a view similar to:
https://airflow.apache.org/docs/stable/#
However, there will be some differences for providers:

  • The search will work for content from the current product and version.
  • The title/breadcrumbs will contain information about the name of the package.

It does not have to be there for "Day 0" - like when we release 2.0.0 and set of 1.0.0 providers, it can be "no version" but very soon after we have to support versions. And our tooling has to be prepared for that (and have it automated), because keeping it manually updated will be impossible.

I think we should be prepared with the documentation for "Day 0". Otherwise we will have mixed content for different products and versions in one documentation. However, this documentation will not be easily updated, e.g. links will still point to out-of-date documents.

If we do not split the documentation, we will have problems publishing some documents on "Day 0", e.g. the changelog for provider packages.

@mik-laj
Member

mik-laj commented Nov 14, 2020

During the split, I would also like to introduce one additional change - migrate the development version of the documentation to the official template. Now all contributors are using documentation that has a different template and sometimes the final documentation is buggy as a result. If everyone used one template, the bugs would be fixed faster.

@potiuk
Member Author

potiuk commented Nov 16, 2020

This is cool! And yeah! If we can make it split from day 0, I'd really love that!

@mik-laj
Member

mik-laj commented Nov 18, 2020

I already have the first successful build of full documentation on S3:
http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/
For now, only the content for the Google provider is migrated, but in a follow-up PR we can migrate the rest of the content. Today I will open a PR with what I have managed to do so far.

@mik-laj
Member

mik-laj commented Nov 19, 2020

Hello.

Today I would like to discuss the next step - the Sphinx theme for our documentation. This theme is currently being developed in the airflow-site repository, but the theme package is not published anywhere for installation. Put simply, if you want to build the production documentation, you have to install this theme on your own. This is reasonably OK if we only build documentation once every few months, but it is far from ideal.

The production and development documentation looks completely different. This means that if there is an error in the theme, we find out about it after publishing the documentation and any change is then much more difficult. This usually means that we have to edit each HTML file individually.

I would like to improve this now: install the theme in Breeze and also provide a way to install it if you want to build the documentation locally. I would not like to publish this package on PyPI, so as not to clutter the public repository with packages that will not be used by other projects.

I think the easiest way is to build the theme with GitHub Actions in airflow-site and then publish it to S3. Then we will be able to install the theme with the command:

pip install airflow-sphinx-theme --extra-index-url https://apache-airflow-pypi.s3-website.eu-central-1.amazonaws.com/

This looks like a simple task if we use https://github.com/novemberfiveco/s3pypi.
I was thinking about installing with pip+git:

pip install git+https://github.com

Unfortunately, this won't work as this theme has a complex build process. We must first build a website to generate the necessary artifacts to build a theme package.
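
For context on the `--extra-index-url` option above: pip expects a PEP 503 "simple" index behind that URL, meaning one HTML page per normalized project name that links to the package files. The sketch below only illustrates that page format; it is not s3pypi's actual code, and the wheel file name in the usage example is made up.

```python
import html
import re
from typing import Iterable


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_index_page(project: str, filenames: Iterable[str]) -> str:
    """Build the HTML page an index serves at <index-url>/<normalized-name>/."""
    links = "\n".join(
        f'    <a href="{html.escape(name, quote=True)}">{html.escape(name)}</a><br>'
        for name in sorted(filenames)
    )
    return (
        "<!DOCTYPE html>\n<html>\n  <body>\n"
        f"    <h1>Links for {html.escape(normalize(project))}</h1>\n"
        f"{links}\n"
        "  </body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(simple_index_page(
        "airflow-sphinx-theme",
        ["airflow_sphinx_theme-0.0.1-py3-none-any.whl"],
    ))
```

Because the index is just static HTML plus the package files, a bucket with static website hosting is enough to serve it, which is why the S3 route needs no running service.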

CC: @potiuk @ryw @kaxil

@ashb
Member

ashb commented Nov 20, 2020

Sounds great!

Another option might be to publish it as a Release on Github, and then we could install it as

pip install \
    https://github.com/apache/airflow-site/releases/download/1.0.0/apache_airflow_docs_theme-1.0.0-py3-none-any.whl

(To test this I uploaded the artifacts to 2.0.0b3 on Airflow: https://github.com/apache/airflow/releases/tag/2.0.0b3)

The advantage of using GitHub is that Actions already has credentials to create releases (I think?), so we don't need to manage keys for S3. The disadvantage is that we could only point at specific releases, and couldn't do airflow-docs-theme>=1.2.3, for instance.
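
A small sketch of the trade-off described above: a release-asset URL is fully determined by the tag and file name, so every consumer has to pin an exact version. GitHub serves files attached to a release under `/releases/download/<tag>/<file>`; the helper names, the tag, and the assumption of a pure-Python `py3-none-any` wheel are illustrative.

```python
def wheel_filename(distribution: str, version: str) -> str:
    """File name of a pure-Python wheel (py3-none-any) for the given project."""
    # Wheel file names use underscores in place of hyphens in the project name.
    return f"{distribution.replace('-', '_')}-{version}-py3-none-any.whl"


def release_asset_url(repo: str, tag: str, filename: str) -> str:
    """URL under which GitHub serves a file attached to a release."""
    return f"https://github.com/{repo}/releases/download/{tag}/{filename}"


if __name__ == "__main__":
    name = wheel_filename("apache-airflow-docs-theme", "1.0.0")
    # The resulting URL can be passed straight to `pip install`.
    print(release_asset_url("apache/airflow-site", "1.0.0", name))
```

Since the version is baked into the URL, a range such as `>=1.2.3` has nothing to resolve against; that only works with a package index.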

@potiuk
Member Author

potiuk commented Nov 21, 2020

Maybe we can do better than that. Why don't we create a separate repository "apache/airflow-doc-theme" and put the whole theme there? Then we can develop it separately and point to the tags/versions of the code (without even releasing it), the same way as we do with airflow now:

pip install https://github.com/apache/airflow-doc-theme/<BRANCH_OR_TAG>.tar.gz#egg=apache-airflow-doc-theme

This will run setup.py locally, to build the theme. But maybe this is not as complex and can be done? The benefit is that if we decide to move it to PyPI, we can publish pre-built binary themes there similarly to NumPy prebuilt packages (PyPI accepts different variants of releases).

@potiuk
Member Author

potiuk commented Nov 21, 2020

And we could combine both - keeping the theme in a separate repo and making it available as a release as well.

Also - releasing it to PyPI is super easy. I think we should also consider simply releasing it via PyPI. Once we have the right set of artifacts, it is as easy as running "twine upload".

I am not sure why we excluded that so easily. Is there any problem with that, @mik-laj, since this is the standard way of distributing packages? To be honest, I do not think there is "clutter" of any kind there if it makes our life easier.

@mik-laj
Member

mik-laj commented Nov 21, 2020

@potiuk The website and theme will share some files, more specifically you must have the site build output files to be able to build the theme. For this reason, moving this theme to a separate repository could be problematic.

We can think about using PyPI, but if this is actually going to be for internal use only and we don't expect users to install this theme, I don't think we should make it easy to find, as long as publishing to a private package repository is not a big problem for us.

Now I even think that publishing on GitHub Releases might be easier for us, as we won't have to provide credentials.

@mik-laj
Member

mik-laj commented Nov 21, 2020

I am also wondering whether publishing on PyPI would mean we have to meet some release requirements. Any user will be able to easily find this theme and install it. If we use a private repository, the files will be available only to the developers of this project. Ideally, we would also be able to configure full CI/CD so that the package is available without any intervention from us. Even a small manual extra step of publishing the artifact would be a pain for us.

@potiuk
Member Author

potiuk commented Nov 21, 2020

@potiuk The website and theme will share some files, more specifically you must have the site build output files to be able to build the theme. For this reason, moving this theme to a separate repository could be problematic.

I see. Fine for me.

Now I even think that publishing on Github Releases might be easier for us as we won't have to provide credentials.

Yep - that's much better. Just remember this will only work from a master merge or workflow_run (the PR token is read-only).

@ashb
Member

ashb commented Nov 21, 2020

Now I even think that publishing on Github Releases might be easier for us as we won't have to provide credentials.

Yep - that's much better. Just remember this will only work from a master merge or workflow_run (the PR token is read-only).

In my head I had us only publishing a new release on tags - but we could probably have a "latest" release too, for which we overwrite the release blob.

And if you want to test out the theme built from a PR, we can use the upload-artifact action.
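
The publishing rules discussed in the last few comments can be summarized in a small decision function. This is only a sketch of the logic: the event names and ref prefixes are the ones GitHub Actions reports, while the function name and the returned action labels are hypothetical.

```python
def publish_action(event_name: str, ref: str) -> str:
    """Pick what a CI run should do with the freshly built theme package."""
    if event_name == "pull_request":
        # Read-only token: the build can only be attached to the run as an artifact.
        return "upload-artifact"
    if event_name == "push" and ref.startswith("refs/tags/"):
        # A tag creates a new, versioned release.
        return "create-release:" + ref[len("refs/tags/"):]
    if event_name == "push" and ref == "refs/heads/master":
        # A merge to master overwrites the file on the rolling "latest" release.
        return "update-release:latest"
    return "skip"


if __name__ == "__main__":
    print(publish_action("pull_request", "refs/pull/308/merge"))
    print(publish_action("push", "refs/tags/1.0.0"))
    print(publish_action("push", "refs/heads/master"))
```

Keeping the PR case separate means contributors can still download and try a theme build without the workflow needing write access.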

@mik-laj
Member

mik-laj commented Nov 23, 2020

I have prepared a PR that publishes the theme package on Github Action.
apache/airflow-site#308

@kaxil
Member

kaxil commented Nov 23, 2020

I have prepared a PR that publishes the theme package on Github Action.
apache/airflow-site#308

LGTM

@potiuk
Member Author

potiuk commented Dec 7, 2020

I guess this is done, @mik-laj? Can we close it?

@mik-laj
Member

mik-laj commented Dec 7, 2020

Last PR: #12892

It would be nice to have this merged before the RC1 release, but it doesn't affect end users, so we can also merge it later and do one more release manually.
