
Move hosting to top-level navigation #1351

Closed
1 of 4 tasks
esthermmoturi opened this issue Apr 17, 2024 · 9 comments
@esthermmoturi
Contributor

esthermmoturi commented Apr 17, 2024

The hosting section will be consolidated as a top-level navigation section. To enable that, the following steps should be done:

  • URLs should be listed before and after the move is done
  • Broken links should be detected and fixed with aliases
  • Existing documentation should be moved from other existing sections to the new top-level section
  • New documentation should be added to consolidate/introduce undocumented items. (Let's do just the move in this ticket.)
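
For the "fixed with aliases" item, Hugo supports an aliases list in a page's front matter: each alias generates a small redirect page (a meta refresh) at the old URL. As a sketch, if a page were moved out of the hosting path to a hypothetical new location, the moved page's front matter could keep the old URL working like this (assuming YAML front matter; the exact destination path here is illustrative, not decided):

```yaml
---
title: "Requirements"
# Old URL kept alive as a redirect to this page's new location
aliases:
  - /apps/guides/hosting/requirements/
---
```

The checker script later in this thread detects exactly these meta-refresh pages and follows them to confirm the redirect target resolves.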
@mrjones-plip
Contributor

mrjones-plip commented Apr 17, 2024

@esthermmoturi - are you OK with tightening the scope of this to just the "hosting" section? Oops! I see you have that in the title already - sorry!

Also, I recommend tightening it even further to just the move and alias effort, leaving the "New documentation should be added to consolidate/introduce undocumented items" part for another ticket.

I've otherwise made some small adjustments to the language in the body - feel free to undo them if you disagree!

By week's end I hope to have two scripts:

  1. the first will run against the current version of the docs site, find all URLs under the https://docs.communityhealthtoolkit.org/apps/guides/hosting/ path, and output them to a file
  2. the second will take that output file and test that all URLs are redirected correctly on a new version of the site

These will cover the first two checklist items, and you can re-run them as often as you need to.

@mrjones-plip
Contributor

mrjones-plip commented Apr 17, 2024

ok! Here's the scripts (below) and the steps to use them:

  1. copy the scripts into files named per their titles. Make sure to chmod +x get.urls.sh and chmod +x check.urls.sh after you create them
  2. check out the main branch of cht-docs
  3. update these two values to be accurate. Be sure path does not have a trailing slash (/):
    base_url="http://localhost:1313"
    path="/apps/guides/hosting"
  4. gather the URLs for the area you're changing (defaults to a path of /apps/guides/hosting):
    ./get.urls.sh > hosting.urls.txt
  5. switch to your branch, make your edits to move pages, and save
  6. run the checker script and make sure all pages return 200:
    ./check.urls.sh hosting.urls.txt
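
As a small convenience (a sketch, not part of the scripts below), the checker output can be filtered so only failing URLs surface. Here a simulated two-line result stands in for real checker output:

```shell
# In practice you would pipe the checker itself:
#   ./check.urls.sh hosting.urls.txt | grep -vE ' 200 ?$'
# Simulated checker output for illustration:
output='http://localhost:1313/apps/guides/hosting/ 200
http://localhost:1313/apps/guides/hosting/old-page/ 404'

# Keep only lines whose status is not 200 (allowing an optional trailing space)
failures=$(printf '%s\n' "$output" | grep -vE ' 200 ?$')
echo "$failures"
```

An empty result means every URL in the list resolved cleanly.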

get.urls.sh

#!/bin/bash

# Function to crawl URLs recursively
function crawl_urls {
    local base_url="$1"
    local path="$2"
    local url="$3"
    local visited_urls=("${@:4}")

    # Check if the URL has already been visited
    if [[ " ${visited_urls[@]} " =~ " $url " ]]; then
        return
    fi

    # Add the current URL to the visited list
    visited_urls+=("$url")

    # Fetch the HTML content of the URL and suppress all output
    html_content=$(wget -qO- "$url" 2>/dev/null)
    wget_exit_status=$?

    # Bail out if the wget command failed
    if [ $wget_exit_status -ne 0 ]; then
        return
    fi

    # Extract all anchor tags and their href attributes
    local links
    links=$(echo "$html_content" | grep -oE '<a [^>]+>' | grep -oE 'href="([^"#]+)"' | sed -e 's/^href="//' -e 's/"$//')

    # Output each URL found under the current URL
    for link in $links; do
        # Construct an absolute URL if the link is relative
        if [[ $link == /* ]]; then
            link="$base_url$link"
        fi

        # Check if the URL is under the specified path and has not been visited before
        if [[ $link == "$base_url$path/"* && ! " ${visited_urls[@]} " =~ " $link " ]]; then
            echo "$link"
            # Recursively crawl the linked page itself, not the base path again
            crawl_urls "$base_url" "$path" "$link" "${visited_urls[@]}"
        fi
    done
}

# Start crawling from the base URL with the specified path
base_url="http://localhost:1313"
path="/apps/guides/hosting"
declare -a visited_urls=()
crawl_urls "$base_url" "$path" "$base_url$path" "${visited_urls[@]}" | sort -u

sample output:

http://localhost:1313/apps/guides/hosting/
http://localhost:1313/apps/guides/hosting/3.x/
http://localhost:1313/apps/guides/hosting/3.x/app-developer/
http://localhost:1313/apps/guides/hosting/3.x/ec2-setup-guide/
http://localhost:1313/apps/guides/hosting/3.x/offline/
http://localhost:1313/apps/guides/hosting/3.x/self-hosting/
http://localhost:1313/apps/guides/hosting/3.x/ssl-cert-install/
http://localhost:1313/apps/guides/hosting/4.x/
http://localhost:1313/apps/guides/hosting/4.x/adding-tls-certificates/
http://localhost:1313/apps/guides/hosting/4.x/app-developer/
http://localhost:1313/apps/guides/hosting/4.x/backups/
http://localhost:1313/apps/guides/hosting/4.x/data-migration/
http://localhost:1313/apps/guides/hosting/4.x/logs/
http://localhost:1313/apps/guides/hosting/4.x/self-hosting/
http://localhost:1313/apps/guides/hosting/4.x/self-hosting/multiple-nodes/
http://localhost:1313/apps/guides/hosting/4.x/self-hosting/self-hosting-k3s-multinode/
http://localhost:1313/apps/guides/hosting/4.x/self-hosting/single-node/
http://localhost:1313/apps/guides/hosting/monitoring/
http://localhost:1313/apps/guides/hosting/monitoring/integration/
http://localhost:1313/apps/guides/hosting/monitoring/introduction/
http://localhost:1313/apps/guides/hosting/monitoring/postgres-ingest/
http://localhost:1313/apps/guides/hosting/monitoring/production/
http://localhost:1313/apps/guides/hosting/monitoring/setup/
http://localhost:1313/apps/guides/hosting/requirements/
http://localhost:1313/apps/guides/hosting/vertical-vs-horizontal/
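
The href-extraction pipeline in get.urls.sh can be exercised in isolation against an inline HTML sample, as a quick sanity check (this snippet is illustrative and not part of the workflow above):

```shell
# The same grep/sed pipeline as in get.urls.sh, run on a small inline sample
html='<p><a href="/apps/guides/hosting/4.x/">4.x</a> <a href="#top">top</a></p>'

# Anchor-only fragments (href="#top") are excluded by the [^"#]+ character class
links=$(echo "$html" | grep -oE '<a [^>]+>' | grep -oE 'href="([^"#]+)"' | sed -e 's/^href="//' -e 's/"$//')
echo "$links"
```

Running this prints just /apps/guides/hosting/4.x/, confirming that in-page anchors are skipped.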

check.urls.sh

#!/bin/bash

# Function to get HTTP response code of a URL
get_response_code() {
    local url=$1
    local response_code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
    echo "$response_code"
}

# Function to check for a meta refresh tag in HTML content
check_meta_refresh() {
    local html_content=$1
    if grep -q '<meta http-equiv="refresh"' <<< "$html_content"; then
        local redirect_url=$(grep -oP 'url=[^"]+' <<< "$html_content" | cut -d'=' -f2)
        local redirect_response_code=$(get_response_code "$redirect_url")
        echo "    -> $redirect_url $redirect_response_code"
    fi
}

# Check if the user provided a file containing URLs
if [ $# -ne 1 ]; then
    echo "Usage: $0 <file_with_urls>"
    exit 1
fi

# Loop through each URL in the file
while IFS= read -r url; do
    # Get HTTP response code
    response_code=$(get_response_code "$url")
    echo "$url $response_code"

    # If response code is 200, check for meta refresh tag
    if [ "$response_code" -eq 200 ]; then
        html_content=$(curl -s "$url")
        check_meta_refresh "$html_content"
    fi
done < "$1"

sample output for moving /apps/guides/hosting/requirements/ to /apps/guides/requirements/:

http://localhost:1313/apps/guides/hosting/ 200
http://localhost:1313/apps/guides/hosting/3.x/ 200
http://localhost:1313/apps/guides/hosting/3.x/app-developer/ 200
http://localhost:1313/apps/guides/hosting/3.x/ec2-setup-guide/ 200
http://localhost:1313/apps/guides/hosting/3.x/offline/ 200
http://localhost:1313/apps/guides/hosting/3.x/self-hosting/ 200
http://localhost:1313/apps/guides/hosting/3.x/ssl-cert-install/ 200
http://localhost:1313/apps/guides/hosting/4.x/ 200
http://localhost:1313/apps/guides/hosting/4.x/adding-tls-certificates/ 200
http://localhost:1313/apps/guides/hosting/4.x/app-developer/ 200
http://localhost:1313/apps/guides/hosting/4.x/backups/ 200
http://localhost:1313/apps/guides/hosting/4.x/data-migration/ 200
http://localhost:1313/apps/guides/hosting/4.x/logs/ 200
http://localhost:1313/apps/guides/hosting/4.x/self-hosting/ 200
http://localhost:1313/apps/guides/hosting/4.x/self-hosting/multiple-nodes/ 200
http://localhost:1313/apps/guides/hosting/4.x/self-hosting/self-hosting-k3s-multinode/ 200
http://localhost:1313/apps/guides/hosting/4.x/self-hosting/single-node/ 200
http://localhost:1313/apps/guides/hosting/monitoring/ 200
http://localhost:1313/apps/guides/hosting/monitoring/integration/ 200
http://localhost:1313/apps/guides/hosting/monitoring/introduction/ 200
http://localhost:1313/apps/guides/hosting/monitoring/postgres-ingest/ 200
http://localhost:1313/apps/guides/hosting/monitoring/production/ 200
http://localhost:1313/apps/guides/hosting/monitoring/setup/ 200
http://localhost:1313/apps/guides/hosting/requirements/ 200
    -> http://localhost:1313/apps/guides/requirements/ 200 
http://localhost:1313/apps/guides/hosting/vertical-vs-horizontal/ 200
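
The redirect parsing inside check_meta_refresh can likewise be sanity-checked against an inline sample. The script uses GNU grep's -P flag; for this particular pattern plain ERE (-E) behaves identically, so this sketch uses -E for portability:

```shell
# Inline stand-in for the redirect markup Hugo emits for an alias page
html='<meta http-equiv="refresh" content="0; url=http://localhost:1313/apps/guides/requirements/">'

# Same extraction as check_meta_refresh: grab url=... up to the closing quote,
# then drop the leading "url" field with cut
redirect=$(printf '%s' "$html" | grep -oE 'url=[^"]+' | cut -d'=' -f2)
echo "$redirect"
```

This prints the redirect target, which the script then re-requests to confirm it returns 200.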

@mrjones-plip mrjones-plip removed their assignment Apr 17, 2024
@mrjones-plip mrjones-plip moved this from This Week's commitments to Done in Product Team Activities Apr 17, 2024
@mrjones-plip mrjones-plip self-assigned this Apr 17, 2024
@mrjones-plip
Contributor

Over to you, @esthermmoturi - you should be able to re-use these scripts for all the different sections.

@esthermmoturi
Contributor Author

Thanks @mrjones-plip , looking forward to using the scripts and sharing any issues/successes.

@esthermmoturi
Contributor Author

esthermmoturi commented Apr 24, 2024

Hey @mrjones-plip , I am testing the script and have two questions:

  1. Should the .sh files be placed somewhere specific?
  2. Should the base URL and path values be changed/defined in the get.urls.sh file?

@mrjones-plip
Contributor

Should the .sh files be placed somewhere specific?

Nope! They can be anywhere you want.

Should the base URL and path values be changed/defined in the get.urls.sh file?

No, they're fine as is:

base_url="http://localhost:1313"
path="/apps/guides/hosting"

This is because I'm assuming you're running locally with the default Hugo config, which serves on localhost at port 1313. When you run on the main branch you'll get the base list saved to hosting.urls.txt. You can then run against your other branch after you move the files. All of this works without changing base_url or path.

@esthermmoturi
Contributor Author

esthermmoturi commented Apr 26, 2024

@mrjones-plip Thanks for the clarification, I was able to run both commands. The only issue is that my hosting.txt file is empty after running, both while on main and on my branch.

@mrjones-plip
Contributor

interesting!

  1. were you using hosting.txt? The command above outputs to hosting.urls.txt, so I wanted to double-check
  2. what happens when you run get.urls.sh by itself? It should output the correct URLs. Maybe you're not running on port 1313?

@esthermmoturi
Contributor Author

For anyone else getting an empty hosting.urls.txt, confirm that wget is installed.
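
One way to fail fast on that missing dependency is a small guard near the top of get.urls.sh. This is a hypothetical addition (check_dep is not in the scripts above), sketched here for anyone who hits the same empty-file symptom:

```shell
# Hypothetical guard: verify a required command exists before crawling
check_dep() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Error: $1 is required but not installed." >&2
        return 1
    fi
}

# In get.urls.sh you would then call: check_dep wget || exit 1
```

With the guard in place the script exits with a clear error instead of silently producing an empty URL list.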
