Using stub pages to replace redirects #2147

Open
oraNod opened this issue Nov 13, 2024 · 3 comments · May be fixed by #2266

oraNod commented Nov 13, 2024

This issue outlines a plan to replace redirects from .htaccess configuration files in the ansible/docsite repository with stub pages.

The purpose of this plan is to facilitate the migration of Ansible community documentation from Red Hat managed infrastructure to Read The Docs hosting. Redirects are created when pages move to avoid 404 errors. However, we now have more redirects than we can add to a project on Read The Docs.

Stub pages

Stub pages are RST files that use the orphan metadata field so that the page is generated at build time but not included in the toctree. The result is that the stub page does not appear in the navigation or documentation structure, while still being reachable from a direct external link such as a bookmark or reference in a third-party site.

Stub pages are an alternative to dynamic HTTP redirects that also prevents 404 errors and broken links. They have the following benefits:

  • Avoid degradation of SEO authority compared with 404s.
  • Allow community contributors to restructure documentation when access to server-side redirects is not available.
  • Encourage users to update stale bookmarks and external links. This is arguably a disadvantage because it requires manual intervention.
  • Easier to maintain than redirects because there is no need for regex or special knowledge.

Stub pages also have some drawbacks:

  • Dynamic redirects are more efficient for users because they do not require manual intervention to access pages that have been moved or removed.
  • Stub pages can add clutter to the source repository. A large number of unused files can be confusing for new contributors and more difficult to navigate, although we can put most stubs in a special folder to avoid polluting active directories.
  • Stub pages add to the build time. While they are not included in the toctree, they still need to be generated into HTML as part of the build.
  • Search engines are likely to index stub pages, which potentially dilutes SEO value of the moved pages. To offset this we should ensure that the actual content is not duplicated in the stub page and the new / moved page.
  • URLs for stub pages maintain a 200 status instead of a 301 status. We can, however, set the preferred url with canonical meta tags in the RST files.

Example stub page

Here is what an RST stub page should look like:

:orphan:

.. meta::
   :canonical: https://docs.ansible.com/ansible/latest/guide/page.html

**********
Page Title
**********

This page has moved to :ref:`moved_page_reference`.
Please update your bookmarks or links with the correct page url.
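
If we end up with a few hundred of these files, a small check could catch malformed stubs before they reach a build. The following is only a sketch: the function name `check_stub` and the three rules it enforces are assumptions based on the example above, not an existing tool in the repository.

```python
import re


def check_stub(text: str) -> list[str]:
    """Return a list of problems found in the RST source of a stub page."""
    problems = []
    lines = text.splitlines()

    # The orphan field should be the first thing in the file.
    non_empty = [line for line in lines if line.strip()]
    if not non_empty or non_empty[0].strip() != ":orphan:":
        problems.append("first non-empty line is not ':orphan:'")

    # The stub should point readers at the new location.
    if not re.search(r":ref:`[^`]+`", text):
        problems.append("no :ref: link to the new location")

    # An overlined title needs adornment at least as long as the title text.
    for i in range(1, len(lines) - 1):
        above, title, below = lines[i - 1], lines[i], lines[i + 1]
        if above and set(above) == {"*"} and below == above:
            if len(above) < len(title):
                problems.append("title adornment is shorter than the title")

    return problems
```

An empty list means the stub passed all three checks. The rules are deliberately loose and would need tightening before being used in CI.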

Background

So why do we need to create stub files in the first place?

Ansible community documentation that is available from docs.ansible.com has long been hosted on Red Hat managed infrastructure. This infrastructure includes Apache httpd services with the mod_rewrite module that provides dynamic redirect functionality. When pages in the Ansible package documentation were relocated or removed, a redirect rule was added to the .htaccess configuration file. These files are stored in the ansible/docsite repository.

To migrate hosting to Read The Docs, and provide greater access to community maintainers, we need a strategy to prevent broken links that would result in 404s and degradation of SEO authority. We can create some redirects in the Read The Docs project, but there is a limit of 100 redirects per project.

Unfortunately, the number of existing redirects already exceeds that limit. For moved pages alone, we have more than 200 redirects.

Additionally, a goal is to move away from creating redirects because this adds a lot of maintenance overhead to the project. The reasoning behind many of the redirect rules in the .htaccess configuration files is historical knowledge with little to no documentation.

Looking in the .htaccess file you can see rules such as this one:

RedirectMatch permanent "^/ansible/(devel|latest)/user_guide/playbooks_blocks.html" "/ansible/$1/playbook_guide/playbooks_blocks.html"
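
For anyone not used to reading these rules, here is a rough Python equivalent of what this one does. It is only an illustration of the pattern and the `$1` backreference (written `\1` in Python), not how Apache implements it. The names `RULE`, `TARGET` and `apply_rule` are made up for the example.

```python
import re
from typing import Optional

# Same pattern as the RedirectMatch rule; the group captures "devel" or "latest".
RULE = re.compile(r"^/ansible/(devel|latest)/user_guide/playbooks_blocks.html")
# Apache's $1 corresponds to \1 here.
TARGET = r"/ansible/\1/playbook_guide/playbooks_blocks.html"


def apply_rule(path: str) -> Optional[str]:
    """Return the redirect target for a request path, or None if the rule does not match."""
    match = RULE.search(path)
    return match.expand(TARGET) if match else None
```

A request for `/ansible/latest/user_guide/playbooks_blocks.html` maps to `/ansible/latest/playbook_guide/playbooks_blocks.html`, while a path for a version other than devel or latest is left alone.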

When we migrate the docs.ansible.com subdomain to Read The Docs hosting, we can update these redirects to point to the new project as follows:

RedirectMatch permanent "^/ansible/(devel|latest)/user_guide/playbooks_blocks.html" "https://docs.ansible.com/projects/ansible/$1/playbook_guide/playbooks_blocks.html"

This redirect works for any links that are internal to the Ansible community documentation. For instance, if a page references user_guide/playbooks_blocks.html the redirect will point to the corresponding page on ReadTheDocs.

However, if there is an external link such as a bookmark or a reference from a third-party site, the redirect does not take effect once the docs.ansible.com subdomain has moved to Read The Docs, because requests then go to Read The Docs and never reach the Apache server that holds the rule.

In other words, the following page will not exist on Read The Docs, and no redirect rule applies to it because docs.ansible.com is served by Read The Docs:

https://docs.ansible.com/ansible/latest/user_guide/playbooks_blocks.html

As a result, any external link that points to that page will cause a 404 error.

Stub pages on Read The Docs

One thing to note is that, as long as docs.ansible.com is on Red Hat managed infrastructure, we can keep the redirects in place. We do want to create all the stub pages before we migrate the subdomain over to RTD but they don't really serve a purpose until after the migration.

On ReadTheDocs we will have a top-level redirect in place that handles the new URL structure with the /projects/ subfolder:

Type: Exact Redirect
From URL: /ansible/*
To URL: /projects/ansible/:splat
Force Redirect: True

Here is how that top-level redirect will work:

# Original url
https://docs.ansible.com/ansible/latest/user_guide/playbooks_blocks.html

# Target url after redirection
https://docs.ansible.com/projects/ansible/latest/user_guide/playbooks_blocks.html
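
The mapping can be sketched in a few lines of Python. This only mimics the behaviour described above for a trailing `*` and the `:splat` placeholder. It is not Read The Docs code, and `exact_redirect` is a hypothetical helper name.

```python
from typing import Optional


def exact_redirect(
    path: str,
    from_url: str = "/ansible/*",
    to_url: str = "/projects/ansible/:splat",
) -> Optional[str]:
    """Return the redirected path, or None when the rule does not apply."""
    if from_url.endswith("*"):
        prefix = from_url[:-1]  # "/ansible/"
        if path.startswith(prefix):
            # Whatever the wildcard matched replaces :splat in the target.
            return to_url.replace(":splat", path[len(prefix):])
        return None
    # Without a wildcard, only an exact match redirects.
    return to_url if path == from_url else None
```

Paths that already start with `/projects/ansible/` do not match the `/ansible/` prefix, so in this sketch the rule cannot loop on its own output.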

In this case, the stub page is in place for the target url. You can actually view the user_guide/playbooks_blocks.html stub page on ReadTheDocs today.

We created that stub page when we restructured the user guide. We also created the redirects for docs.ansible.com but, again, we won't be moving the redirects over to Read The Docs because there are too many.

oraNod added the DaWGs label Nov 13, 2024
ansible-documentation-bot added the needs_triage label Nov 13, 2024

oraNod commented Nov 13, 2024

Stubs that do not exist

Stub pages in this section do not exist in the ansible-documentation repository and need to be added if we go ahead with this plan. I've taken these pages from the .htaccess configuration here: https://raw.githubusercontent.com/ansible/docsite/refs/heads/main/.htaccess

Getting started

/ansible/(devel|latest)/intro_getting_started.html
/ansible/(devel|latest)/intro.html
/ansible/(devel|latest)/quickstart.html

Command line tools

/ansible/(devel|latest)/command_line_tools.html
/ansible/(devel|latest)/user_guide/command_line_tools.html
/ansible/(devel|latest)/intro_adhoc.html

CLI guides

/ansible/(devel|latest)/ansible.html
/ansible/(devel|latest)/ansible-vault.html
/ansible/(devel|latest)/ansible-pull.html
/ansible/(devel|latest)/ansible-playbook.html
/ansible/(devel|latest)/ansible-inventory.html
/ansible/(devel|latest)/ansible-galaxy.html
/ansible/(devel|latest)/ansible-doc.html
/ansible/(devel|latest)/ansible-console.html
/ansible/(devel|latest)/ansible-config.html

Vault guide

/ansible/(devel|latest)/playbooks_vault.html
/ansible/(devel|latest)/vault.html

Modules and plugins

/ansible/(devel|latest)/modules.html
/ansible/(devel|latest)/modules_intro.html
/ansible/(devel|latest)/modules_support.html
/ansible/(devel|latest)/plugin_filtering_config.html
/ansible/latest/modules.html

Installation guide

/ansible/(devel|latest)/intro_configuration.html
/ansible/(devel|latest)/intro_installation.html

Inventory guide

/ansible/(devel|latest)/intro_inventory.html
/ansible/(devel|latest)/intro_patterns.html
/ansible/(devel|latest)/intro_dynamic_inventory.html

Module list landing topics

/ansible/(devel|latest)/network_maintained.html
/ansible/(devel|latest)/partner_maintained.html
/ansible/(devel|latest)/community_maintained.html
/ansible/(devel|latest)/core_maintained.html
/ansible/(devel|latest)/list_of_windows_modules.html
/ansible/(devel|latest)/list_of_web_infrastructure_modules.html
/ansible/(devel|latest)/list_of_utilities_modules.html
/ansible/(devel|latest)/list_of_system_modules.html
/ansible/(devel|latest)/list_of_storage_modules.html
/ansible/(devel|latest)/list_of_source_control_modules.html
/ansible/(devel|latest)/list_of_remote_management_modules.html
/ansible/(devel|latest)/list_of_packaging_modules.html
/ansible/(devel|latest)/list_of_notification_modules.html
/ansible/(devel|latest)/list_of_network_modules.html
/ansible/(devel|latest)/list_of_net_tools_modules.html
/ansible/(devel|latest)/list_of_monitoring_modules.html
/ansible/(devel|latest)/list_of_messaging_modules.html
/ansible/(devel|latest)/list_of_inventory_modules.html
/ansible/(devel|latest)/list_of_identity_modules.html
/ansible/(devel|latest)/list_of_files_modules.html
/ansible/(devel|latest)/list_of_database_modules.html
/ansible/(devel|latest)/list_of_crypto_modules.html
/ansible/(devel|latest)/list_of_commands_modules.html
/ansible/(devel|latest)/list_of_clustering_modules.html
/ansible/(devel|latest)/list_of_cloud_modules.html
/ansible/(devel|latest)/list_of_all_modules.html

Networking guide

/ansible/(devel|latest)/network_best_practices_2.5.html
/ansible/(devel|latest)/network_debug_troubleshooting.html
/ansible/(devel|latest)/network_working_with_command_output.html
/ansible/(devel|latest)/network.html
/ansible/(devel|latest)/intro_networking.html

Plugins list

/ansible/(devel|latest)/plugins.html

Porting guide

/ansible/(devel|latest)/porting_guide_2.0.html
/ansible/(devel|latest)/porting_guide_2.3.html
/ansible/(devel|latest)/porting_guide_2.4.html
/ansible/(devel|latest)/porting_guide_2.5.html
/ansible/(devel|latest)/porting_guides.html

Reference and appendices

/ansible/(devel|latest)/common_return_values.html
/ansible/(devel|latest)/config.html
/ansible/(devel|latest)/faq.html
/ansible/(devel|latest)/galaxy.html
/ansible/(devel|latest)/reference_appendices/galaxy.html
/ansible/(devel|latest)/glossary.html
/ansible/(devel|latest)/guide_aci.html
/ansible/(devel|latest)/playbooks_keywords.html
/ansible/(devel|latest)/python_3_support.html
/ansible/(devel|latest)/release_and_maintenance.html
/ansible/(devel|latest)/test_strategies.html
/ansible/(devel|latest)/tower.html
/ansible/(devel|latest)/YAMLSyntax.html

Scenario Guides

/ansible/(devel|latest)/guide_aws.html
/ansible/(devel|latest)/guide_azure.html
/ansible/(devel|latest)/guide_cloudstack.html
/ansible/(devel|latest)/guide_docker.html
/ansible/(devel|latest)/guide_gce.html
/ansible/(devel|latest)/guide_kubernetes.html
/ansible/(devel|latest)/guide_packet.html
/ansible/(devel|latest)/guide_rax.html
/ansible/(devel|latest)/guide_rolling_upgrade.html
/ansible/(devel|latest)/guide_vagrant.html
/ansible/(devel|latest)/guide_vmware.html
/ansible/(devel|latest)/guides.html
/ansible/(devel|latest)/scenario_guides/guide_rolling_upgrade.html
/ansible/(devel|latest)/vmware/scenarios.html
/ansible/(devel|latest)/vmware/faq.html
/ansible/(devel|latest)/vmware/index.html
/ansible/(devel|latest)/vmware/vmware_?(.+)?
/ansible/(devel|latest)/vmware/scenario_?(.+)?

Renamed module reference directory

/ansible/(devel|latest)/modules_by_category.html
/ansible/(devel|latest)/modules/modules_by_category.html

Community page

/ansible/(devel|latest)/community.html

Windows and BSD guide

/ansible/(devel|latest)/windows.html
/ansible/(devel|latest)/intro_bsd.html
/ansible/(devel|latest)/intro_windows.html
/ansible/(devel|latest)/windows_dsc.html
/ansible/(devel|latest)/windows_faq.html
/ansible/(devel|latest)/windows_setup.html
/ansible/(devel|latest)/windows_usage.html
/ansible/(devel|latest)/windows_winrm.html

Not devel or latest version files

Some files in the redirects are non-versioned or specific to older versions.
Maybe these should be ported to ReadTheDocs redirects or just dropped.

/ansible/(developing_[^/]+).html
/ansible/developing.html
/ansible/dev_guide(/)?
/ansible/modules_by_category.html
/ansible/community.html
/ansible/modules.html

Redirects a full directory

/ansible/(devel|latest)/module_docs/?(.+)?

Pages moved in older versions, 2.5 and 2.6

/ansible/([^/]+)/user_guide/playbooks_vault.html
/ansible/([^/]+)/user_guide/quickstart.html
/ansible/([^/]+)/vmware/index.html

Related to PR 74834 (link still exists in 2.9 error messages)

/ansible/playbooks_vault.html
/ansible/network_debug_troubleshooting.html

Playbook guide

/ansible/(devel|latest)/playbooks.html
/ansible/(devel|latest)/become.html
/ansible/(devel|latest)/playbooks_advanced_syntax.html
/ansible/(devel|latest)/playbooks_async.html
/ansible/(devel|latest)/playbooks_best_practices.html
/ansible/(devel|latest)/playbooks_blocks.html
/ansible/(devel|latest)/playbooks_checkmode.html
/ansible/(devel|latest)/playbooks_conditionals.html
/ansible/(devel|latest)/playbooks_debugger.html
/ansible/(devel|latest)/playbooks_delegation.html
/ansible/(devel|latest)/playbooks_environment.html
/ansible/(devel|latest)/playbooks_error_handling.html
/ansible/(devel|latest)/playbooks_filters_ipaddr.html
/ansible/(devel|latest)/playbooks_filters.html
/ansible/(devel|latest)/playbooks_intro.html
/ansible/(devel|latest)/playbooks_lookups.html
/ansible/(devel|latest)/playbooks_loops.html
/ansible/(devel|latest)/playbooks_prompts.html
/ansible/(devel|latest)/playbooks_reuse_includes.html
/ansible/(devel|latest)/playbooks_reuse_roles.html
/ansible/(devel|latest)/playbooks_reuse.html
/ansible/(devel|latest)/playbooks_startnstep.html
/ansible/(devel|latest)/playbooks_strategies.html
/ansible/(devel|latest)/playbooks_tags.html
/ansible/(devel|latest)/playbooks_tests.html
/ansible/(devel|latest)/playbooks_variables.html
/ansible/(devel|latest)/playbook_pathing.html
/ansible/(devel|latest)/playbooks_python_version.html
/ansible/(devel|latest)/playbooks_roles.html
/ansible/(devel|latest)/playbooks_special_topics.html
/ansible/(devel|latest)/playbooks_templating.html
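
Each pattern above that uses only the (devel|latest) alternation stands for two concrete pages. A sketch like the following could turn the list into concrete paths and flag the patterns that need a human decision. The helper name `expand` is hypothetical, and the code handles only simple alternation groups, refusing anything with other regex syntax.

```python
import itertools
import re

# Matches a simple alternation group such as (devel|latest).
GROUP = re.compile(r"\(([^()]*\|[^()]*)\)")


def expand(pattern: str) -> list[str]:
    """Expand simple (a|b) groups in a redirect pattern into concrete paths."""
    # re.split with one capturing group alternates literal text and group contents.
    pieces = GROUP.split(pattern)
    literals = pieces[0::2]
    if any(ch in piece for piece in literals for ch in "()[]?+*"):
        raise ValueError(f"pattern needs manual handling: {pattern}")
    options = [
        [piece] if index % 2 == 0 else piece.split("|")
        for index, piece in enumerate(pieces)
    ]
    return ["".join(combo) for combo in itertools.product(*options)]
```

Patterns such as `/ansible/([^/]+)/vmware/index.html` raise `ValueError` here, because they match an open-ended set of versions and cannot be listed as individual stub pages.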


oraNod commented Nov 13, 2024

Stubs that already exist

For completeness, here is a list of all the stub pages that already exist in the ansible-documentation repository.

User guide

/ansible/(devel|latest)/user_guide/basic_concepts.html
/ansible/(devel|latest)/user_guide/intro_getting_started.html
/ansible/(devel|latest)/user_guide/intro.html
/ansible/(devel|latest)/user_guide/quickstart.html
/ansible/(devel|latest)/user_guide/intro_adhoc.html
/ansible/(devel|latest)/user_guide/cheatsheet.html
/ansible/(devel|latest)/user_guide/vault.html
/ansible/(devel|latest)/user_guide/modules.html
/ansible/(devel|latest)/user_guide/modules_intro.html
/ansible/(devel|latest)/user_guide/modules_support.html
/ansible/(devel|latest)/user_guide/plugin_filtering_config.html
/ansible/(devel|latest)/user_guide/collections_using.html
/ansible/(devel|latest)/user_guide/sample_setup.html
/ansible/(devel|latest)/user_guide/connection_details.html
/ansible/(devel|latest)/user_guide/intro_inventory.html
/ansible/(devel|latest)/user_guide/intro_patterns.html
/ansible/(devel|latest)/user_guide/intro_dynamic_inventory.html
/ansible/(devel|latest)/user_guide/windows_performance.html
/ansible/(devel|latest)/user_guide/windows.html
/ansible/(devel|latest)/user_guide/intro_bsd.html
/ansible/(devel|latest)/user_guide/intro_windows.html
/ansible/(devel|latest)/user_guide/windows_dsc.html
/ansible/(devel|latest)/user_guide/windows_faq.html
/ansible/(devel|latest)/user_guide/windows_setup.html
/ansible/(devel|latest)/user_guide/windows_usage.html
/ansible/(devel|latest)/user_guide/windows_winrm.html
/ansible/(devel|latest)/user_guide/complex_data_manipulation.html
/ansible/(devel|latest)/user_guide/playbooks_module_defaults.html
/ansible/(devel|latest)/user_guide/playbooks_vars_facts.html
/ansible/(devel|latest)/user_guide/playbooks.html
/ansible/(devel|latest)/user_guide/become.html
/ansible/(devel|latest)/user_guide/playbooks_advanced_syntax.html
/ansible/(devel|latest)/user_guide/playbooks_async.html
/ansible/(devel|latest)/user_guide/playbooks_best_practices.html
/ansible/(devel|latest)/user_guide/playbooks_blocks.html
/ansible/(devel|latest)/user_guide/playbooks_checkmode.html
/ansible/(devel|latest)/user_guide/playbooks_conditionals.html
/ansible/(devel|latest)/user_guide/playbooks_debugger.html
/ansible/(devel|latest)/user_guide/playbooks_delegation.html
/ansible/(devel|latest)/user_guide/playbooks_environment.html
/ansible/(devel|latest)/user_guide/playbooks_error_handling.html
/ansible/(devel|latest)/user_guide/playbooks_filters_ipaddr.html
/ansible/(devel|latest)/user_guide/playbooks_filters.html
/ansible/(devel|latest)/user_guide/playbooks_intro.html
/ansible/(devel|latest)/user_guide/playbooks_lookups.html
/ansible/(devel|latest)/user_guide/playbooks_loops.html
/ansible/(devel|latest)/user_guide/playbooks_prompts.html
/ansible/(devel|latest)/user_guide/playbooks_reuse_includes.html
/ansible/(devel|latest)/user_guide/playbooks_reuse_roles.html
/ansible/(devel|latest)/user_guide/playbooks_reuse.html
/ansible/(devel|latest)/user_guide/playbooks_startnstep.html
/ansible/(devel|latest)/user_guide/playbooks_strategies.html
/ansible/(devel|latest)/user_guide/playbooks_tags.html
/ansible/(devel|latest)/user_guide/playbooks_tests.html
/ansible/(devel|latest)/user_guide/playbooks_variables.html
/ansible/(devel|latest)/user_guide/playbook_pathing.html
/ansible/(devel|latest)/user_guide/playbooks_python_version.html
/ansible/(devel|latest)/user_guide/playbooks_roles.html
/ansible/(devel|latest)/user_guide/playbooks_special_topics.html
/ansible/(devel|latest)/user_guide/playbooks_templating.html

Developer

/ansible/(devel|latest)/dev_guide/testing_compile.html

samccann removed the DaWGs and needs_triage labels Nov 19, 2024
@oraNod oraNod self-assigned this Dec 5, 2024

oraNod commented Dec 5, 2024

@samccann Got an update on this one. We can discuss at the next DaWGs but I'll put some notes in here.

I started hacking around with another Sphinx extension to generate stub pages at build time. It works like this:

  1. Define stub page details in a yaml file, for example:
intro_getting_started:
  title: Getting Started
  canonical: https://docs.ansible.com/ansible/latest/getting_started/index.html
  reference: getting_started_index
  path: intro_getting_started
intro:
  title: Introduction
  canonical: https://docs.ansible.com/ansible/latest/getting_started/index.html
  reference: getting_started_index
  path: intro

Here are descriptions of the fields:

title: <- This is the title of the source page.
canonical: <- This is the canonical url that we want to set.
reference: <- This is the RST reference.
path: <- This is the path to the resulting stub file that gets generated. e.g. user_guide/foo

  2. Create a stub page template under docs/docsite/.template such as this one:
:orphan:

.. meta::
   :canonical: {{ item.canonical }}

********************
{{ item.title }}
********************

This page has moved to :ref:`{{ item.reference }}`.
Please update your bookmarks or links with the correct page url.
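
To make the two steps concrete, here is a simplified sketch of the rendering part. It is not the actual extension. It uses `string.Template` from the standard library instead of Jinja, takes the entries as a plain dict that mirrors the yaml above, and keeps the rendered pages in a dict instead of writing files. The names `STUB_TEMPLATE`, `ENTRIES` and `render_stubs` are made up for the example.

```python
from string import Template

STUB_TEMPLATE = Template(""":orphan:

.. meta::
   :canonical: $canonical

$rule
$title
$rule

This page has moved to :ref:`$reference`.
Please update your bookmarks or links with the correct page url.
""")

# Mirrors the structure of the yaml file.
ENTRIES = {
    "intro_getting_started": {
        "title": "Getting Started",
        "canonical": "https://docs.ansible.com/ansible/latest/getting_started/index.html",
        "reference": "getting_started_index",
        "path": "intro_getting_started",
    },
    "intro": {
        "title": "Introduction",
        "canonical": "https://docs.ansible.com/ansible/latest/getting_started/index.html",
        "reference": "getting_started_index",
        "path": "intro",
    },
}


def render_stubs(entries: dict) -> dict:
    """Map each stub's RST path to its rendered source, without touching disk."""
    pages = {}
    for entry in entries.values():
        pages[entry["path"] + ".rst"] = STUB_TEMPLATE.substitute(
            canonical=entry["canonical"],
            title=entry["title"],
            reference=entry["reference"],
            # Size the adornment to the title so long titles stay valid RST.
            rule="*" * len(entry["title"]),
        )
    return pages
```

One difference from the template above is that the adornment is sized to the title instead of using a fixed run of asterisks, which would be too short for any title longer than the run.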

The stub generator extension can go through and generate all the RST files at build time. I've got this to work but it's sort of janky. It's good because we don't need to actually create a bunch more stub files in the repo like we have in the docs/docsite/rst/user_guide directory by hand.

What sucks is that the stub generator actually generates the RST files so we'd either need to add them all to .gitignore or do them all at once and commit the lot. Otherwise we'll start getting hundreds of untracked files every time we do a local build, which is terrible.

Another thing that isn't great is that the resulting HTML pages have two canonical urls:

<meta content="https://docs.ansible.com/ansible/latest/getting_started/index.html" name="canonical" />

<link rel="canonical" href="https://docs.ansible.com/ansible/latest/intro_getting_started.html"/>

The first one (meta) is created by the stub generator. The second is created by Sphinx during the build. I'm not sure what effect this will have on SEO but I'd imagine it isn't great.
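
A quick way to spot this on built pages is to parse the HTML and list every canonical declaration. This is a sketch using only the standard library. `CanonicalFinder` and `find_canonicals` are hypothetical names, and the check looks for exactly the two tag shapes shown above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect canonical URLs declared through meta or link tags."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here by default.
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "canonical":
            self.found.append(("meta", attributes.get("content")))
        elif tag == "link" and attributes.get("rel") == "canonical":
            self.found.append(("link", attributes.get("href")))


def find_canonicals(html: str) -> list:
    """Return (tag, url) pairs for every canonical declaration in the page."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.found
```

If this returns more than one entry with different URLs, the page has the conflict described here. As far as I know, search engines take the canonical URL from the link element rather than from a meta tag named canonical, so it is probably the Sphinx-generated value that counts, but that is worth verifying.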

So this leaves us with two challenges to tackle:

  1. Need a way to generate the stub files in memory instead of actually creating them on disk. This, to me, feels like a blocker. As you know, the package docs build is already a memory hog. I don't think we want to add to that. I guess the alternative is adding hundreds of stub pages and polluting the repo.
  2. Need to figure out how to set the html_use_canonical Sphinx directive while generating the stub pages so that the link rel="canonical" tag is not included in the stub pages. Right now this is beyond me. I also don't like the idea of adding more code to the extension and increasing the complexity.

Good news

Thankfully there's already a Sphinx extension that solves most of our problems. I give you: https://pypi.org/project/sphinx-reredirects/

Using the same two examples from that sample yaml file above, it's as simple as adding this to the conf.py file:

redirects = {
    "intro_getting_started": "getting_started/index.html",
    "intro": "getting_started/index.html",
}

This results in pages under _build/html as follows:

  • intro.html
<html><head><meta http-equiv="refresh" content="0; url=getting_started/index.html"></head></html>
  • intro_getting_started.html
<html><head><meta http-equiv="refresh" content="0; url=getting_started/index.html"></head></html>

When you attempt to access either of those pages, you get automatically redirected to the Getting Started page. It looks promising and a happy medium for us. We don't need any custom code and we can define redirects in conf.py in a straightforward way, instead of more complicated yaml.
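
For the redirects mapping to take effect, the extension also has to be enabled in conf.py. A fuller fragment might look like the following sketch. The module name `sphinx_reredirects` is taken from the extension's own documentation. The `GETTING_STARTED` constant is only a convenience added for this example, and a real conf.py would append to its existing `extensions` list instead of redefining it.

```python
# conf.py (fragment)

# Enable the extension alongside whatever extensions are already configured.
extensions = [
    "sphinx_reredirects",
]

# Target shared by several old pages; the constant is just for readability.
GETTING_STARTED = "getting_started/index.html"

# Keys are old document names without a suffix; values are redirect targets.
redirects = {
    "intro_getting_started": GETTING_STARTED,
    "intro": GETTING_STARTED,
}
```

Keeping shared targets in a constant makes it easier to see which old pages collapse into the same destination once the mapping grows to a few hundred entries.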

Now I don't think this is the absolute best for SEO. As the FAQ page points out, 301 redirects are preferable. At the same time, these redirects are much better than 404 pages.

It also looks super promising because that extension supports wildcards. We can redirect full directories and potentially some of the consolidated redirects from older versions, instead of relying on a limited number of Read The Docs redirects, which might be tricky to set up for subprojects.

@oraNod oraNod linked a pull request Dec 6, 2024 that will close this issue