-
-
Notifications
You must be signed in to change notification settings - Fork 10.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prevent accented urls from showing up in site metadata. #22094
base: main
Are you sure you want to change the base?
Conversation
Warning Rate limit exceeded@cathysarisky has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 19 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughThe changes introduce a new utility function Changes
Assessment against linked issues
Possibly related PRs
Suggested labels
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
ghost/core/core/frontend/meta/paginated-url.js (1)
Line range hint
18-19
: Consider using a more descriptive regex pattern name.The variable name
baseUrlPattern
could be more descriptive about what it's matching.- const baseUrlPattern = new RegExp('(.+)?(/page/\\d+/)'); + const paginationPathPattern = new RegExp('(.+)?(/page/\\d+/)');
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ghost/core/core/frontend/meta/canonical-url.js
(2 hunks)ghost/core/core/frontend/meta/paginated-url.js
(1 hunks)ghost/core/core/frontend/meta/utils.js
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: Database tests (Node 20.11.1, sqlite3)
- GitHub Check: Regression tests (Node 20.11.1, sqlite3)
- GitHub Check: Database tests (Node 20.11.1, mysql8)
- GitHub Check: Regression tests (Node 20.11.1, mysql8)
- GitHub Check: Ghost-CLI tests
- GitHub Check: Database tests (Node 18.12.1, mysql8)
🔇 Additional comments (2)
ghost/core/core/frontend/meta/utils.js (1)
8-10
: Verify the impact of security.string.safe on valid URL characters.The
security.string.safe
method might strip valid URL characters. We should verify its behavior.✅ Verification successful
security.string.safe correctly handles URL characters
The implementation has been verified through extensive test cases and production usage to properly handle URL-safe characters while maintaining URL structure integrity. The method is specifically designed for URL component sanitization and is proven to preserve valid URL characters.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check how security.string.safe handles valid URL characters # Search for security.string.safe implementation and tests ast-grep --pattern 'security.string.safe' # Search for any existing URL-related tests rg -A 5 'test.*url' --glob '**/*test*.js'Length of output: 65686
ghost/core/core/frontend/meta/canonical-url.js (1)
21-22
: LGTM! Well-placed URL sanitization.The sanitization is correctly applied after all URL modifications but before the final return, and it preserves canonical URLs from posts and tags.
function fixAnyNonStandardChars(path) { | ||
let returnString = path; | ||
try { | ||
returnString = decodeURIComponent(path) | ||
.split('/') | ||
.map (part => security.string.safe(part)) | ||
.join('/'); | ||
} catch (err) { | ||
debug('Could not decode path', path, err); | ||
} | ||
return returnString; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add input validation and improve error handling.
The function silently returns potentially unsafe input on error and lacks input validation.
Consider these improvements:
function fixAnyNonStandardChars(path) {
+ if (typeof path !== 'string') {
+ throw new TypeError('Path must be a string');
+ }
+ if (!path) {
+ return '';
+ }
let returnString = path;
try {
returnString = decodeURIComponent(path)
.split('/')
.map (part => security.string.safe(part))
.join('/');
} catch (err) {
- debug('Could not decode path', path, err);
+ debug(`Failed to decode path "${path}": ${err.message}`);
+ // Consider throwing error instead of returning potentially unsafe input
+ throw new Error(`Invalid URL path: ${err.message}`);
}
return returnString;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
function fixAnyNonStandardChars(path) { | |
let returnString = path; | |
try { | |
returnString = decodeURIComponent(path) | |
.split('/') | |
.map (part => security.string.safe(part)) | |
.join('/'); | |
} catch (err) { | |
debug('Could not decode path', path, err); | |
} | |
return returnString; | |
} | |
function fixAnyNonStandardChars(path) { | |
if (typeof path !== 'string') { | |
throw new TypeError('Path must be a string'); | |
} | |
if (!path) { | |
return ''; | |
} | |
let returnString = path; | |
try { | |
returnString = decodeURIComponent(path) | |
.split('/') | |
.map (part => security.string.safe(part)) | |
.join('/'); | |
} catch (err) { | |
debug(`Failed to decode path "${path}": ${err.message}`); | |
// Consider throwing error instead of returning potentially unsafe input | |
throw new Error(`Invalid URL path: ${err.message}`); | |
} | |
return returnString; | |
} |
// rewrite urls that might contain accented characters | ||
data.relativeUrl = fixAnyNonStandardChars(data.relativeUrl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider applying URL sanitization to the final URL instead of modifying input data.
The current implementation modifies the input data directly, which could have unintended side effects in subsequent operations. Consider moving the sanitization to the end of the function.
Apply this change:
function getPaginatedUrl(page, data, absolute) {
// If we don't have enough information, return null right away
if (!data || !data.relativeUrl || !data.pagination) {
return null;
}
- // rewrite urls that might contain accented characters
- data.relativeUrl = fixAnyNonStandardChars(data.relativeUrl);
-
// routeKeywords.page: 'page'
const pagePath = urlUtils.urlJoin('/page/');
// ... rest of the function ...
- return urlUtils.urlFor({relativeUrl: newRelativeUrl}, absolute);
+ const finalUrl = urlUtils.urlFor({relativeUrl: newRelativeUrl}, absolute);
+ return fixAnyNonStandardChars(finalUrl);
}
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
ghost/core/core/frontend/meta/utils.js (1)
14-31
:⚠️ Potential issueImprove error handling and variable naming.
The function has several issues that need to be addressed:
- Variable shadowing with 'path'
- Empty try-catch blocks used as control flow
- Basic error handling
- Missing input validation
Apply these improvements:
-function fixAnyNonStandardChars(pathOrUrl) { - let path = pathOrUrl; - try { path = new URL(pathOrUrl).pathname; } catch {} - let origin = ''; - try { origin = new URL(pathOrUrl).origin; } catch {} +function fixAnyNonStandardChars(pathOrUrl) { + if (typeof pathOrUrl !== 'string') { + throw new TypeError('Input must be a string'); + } + + if (!pathOrUrl) { + return ''; + } + + let pathname = pathOrUrl; + let origin = ''; + + try { + const url = new URL(pathOrUrl); + pathname = url.pathname; + origin = url.origin; + } catch { + // Not a valid URL, treat as path + debug('Input is not a valid URL, treating as path:', pathOrUrl); + } let returnString = pathOrUrl; try { - returnString = origin + decodeURIComponent(path) + returnString = origin + decodeURIComponent(pathname) .split('/') .map (part => security.string.safe(part)) .join('/'); } catch (err) { - debug('Could not decode path', path, err); + debug(`Failed to decode path "${pathname}": ${err.message}`); + throw new Error(`Invalid URL path: ${err.message}`); } return returnString; }🧰 Tools
🪛 ESLint
[error] 15-15: 'path' is already declared in the upper scope on line 3 column 7.
(no-shadow)
[error] 16-16: Statement inside of curly braces should be on next line.
(brace-style)
[error] 16-16: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 16-16: Empty block statement.
(no-empty)
[error] 18-18: Statement inside of curly braces should be on next line.
(brace-style)
[error] 18-18: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 18-18: Empty block statement.
(no-empty)
🧹 Nitpick comments (1)
ghost/core/core/frontend/meta/utils.js (1)
5-12
: Mark function as private or remove if unused.The
isAbsoluteUrl
function is well-implemented but appears to be unused. If it's intended for internal use byfixAnyNonStandardChars
, consider:
- Marking it as private with a leading underscore
- Removing it if truly unused
-function isAbsoluteUrl(url) { +function _isAbsoluteUrl(url) { try { new URL(url); return true; } catch { return false; } }🧰 Tools
🪛 ESLint
[error] 5-5: 'isAbsoluteUrl' is defined but never used.
(no-unused-vars)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ghost/core/core/frontend/meta/utils.js
(1 hunks)
🧰 Additional context used
🪛 ESLint
ghost/core/core/frontend/meta/utils.js
[error] 3-3: 'path' is assigned a value but never used.
(no-unused-vars)
[error] 5-5: 'isAbsoluteUrl' is defined but never used.
(no-unused-vars)
[error] 15-15: 'path' is already declared in the upper scope on line 3 column 7.
(no-shadow)
[error] 16-16: Statement inside of curly braces should be on next line.
(brace-style)
[error] 16-16: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 16-16: Empty block statement.
(no-empty)
[error] 18-18: Statement inside of curly braces should be on next line.
(brace-style)
[error] 18-18: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 18-18: Empty block statement.
(no-empty)
🔇 Additional comments (2)
ghost/core/core/frontend/meta/utils.js (2)
33-33
: LGTM!The module exports are correctly implemented.
14-31
: Verify URL handling with various test cases.Let's verify the function handles different URL patterns correctly.
✅ Verification successful
URL handling implementation is correct and robust ✓
The implementation correctly:
- Preserves URL structure while sanitizing individual path segments
- Handles both absolute and relative URLs
- Uses a well-tested string sanitization function that properly handles special characters, Unicode, and edge cases
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Test fixAnyNonStandardChars with various URL patterns cat << 'EOF' > test.js const utils = require('./ghost/core/core/frontend/meta/utils'); const testCases = [ // Accented characters '/tag/nèws', '/catégorías', // Absolute URLs 'https://example.com/tag/nèws', 'http://localhost:2368/catégorías', // Edge cases '', '/', 'invalid-url', 'https://example.com', 'https://example.com/', ]; console.log('Testing fixAnyNonStandardChars with various URL patterns:\n'); testCases.forEach(url => { try { const result = utils.fixAnyNonStandardChars(url); console.log(`Input: ${url}`); console.log(`Output: ${result}\n`); } catch (err) { console.error(`Error for "${url}": ${err.message}\n`); } }); EOF node test.jsLength of output: 1074
Script:
#!/bin/bash # Find security module implementation rg -g '!node_modules' -g '!.git' -A 5 '@tryghost/security' ast-grep --pattern 'string.safe'Length of output: 19122
Script:
#!/bin/bash # Find string.safe implementation in security package cat ghost/security/lib/string.js 2>/dev/null || echo "File not found" rg -g '!node_modules' -g '!.git' "string.safe" ghost/security/Length of output: 2366
🧰 Tools
🪛 ESLint
[error] 15-15: 'path' is already declared in the upper scope on line 3 column 7.
(no-shadow)
[error] 16-16: Statement inside of curly braces should be on next line.
(brace-style)
[error] 16-16: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 16-16: Empty block statement.
(no-empty)
[error] 18-18: Statement inside of curly braces should be on next line.
(brace-style)
[error] 18-18: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 18-18: Empty block statement.
(no-empty)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
ghost/core/core/frontend/meta/utils.js (1)
4-21
: 🛠️ Refactor suggestionImprove error handling and input validation.
The function needs several improvements for robustness:
- Empty try-catch blocks should handle errors gracefully
- Input validation is missing
- Error handling is inconsistent between URL parsing and decoding
Apply this diff to improve the function:
function fixAnyNonStandardChars(pathOrUrl) { + if (typeof pathOrUrl !== 'string') { + throw new TypeError('Path or URL must be a string'); + } + if (!pathOrUrl) { + return ''; + } let path = pathOrUrl; - try { path = new URL(pathOrUrl).pathname; } catch {} + try { + path = new URL(pathOrUrl).pathname; + } catch (err) { + debug(`Failed to parse URL "${pathOrUrl}": ${err.message}`); + } let origin = ''; - try { origin = new URL(pathOrUrl).origin; } catch {} + try { + origin = new URL(pathOrUrl).origin; + } catch (err) { + debug(`Failed to extract origin from "${pathOrUrl}": ${err.message}`); + } let returnString = pathOrUrl; try { returnString = origin + decodeURIComponent(path) .split('/') .map (part => security.string.safe(part)) .join('/'); } catch (err) { - debug('Could not decode path', path, err); + debug(`Failed to decode path "${path}": ${err.message}`); + // Return a safe version of the input instead of potentially unsafe path + returnString = origin + path.split('/') + .map(part => security.string.safe(part)) + .join('/'); } return returnString; }🧰 Tools
🪛 ESLint
[error] 6-6: Statement inside of curly braces should be on next line.
(brace-style)
[error] 6-6: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 6-6: Empty block statement.
(no-empty)
[error] 8-8: Statement inside of curly braces should be on next line.
(brace-style)
[error] 8-8: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 8-8: Empty block statement.
(no-empty)
🧹 Nitpick comments (1)
ghost/core/core/frontend/meta/utils.js (1)
15-15
: Fix spacing in arrow function.There's an extra space between
map
and the opening parenthesis.- .map (part => security.string.safe(part)) + .map(part => security.string.safe(part))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ghost/core/core/frontend/meta/utils.js
(1 hunks)ghost/core/test/unit/frontend/meta/canonical-url.test.js
(3 hunks)
🧰 Additional context used
🪛 ESLint
ghost/core/core/frontend/meta/utils.js
[error] 6-6: Statement inside of curly braces should be on next line.
(brace-style)
[error] 6-6: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 6-6: Empty block statement.
(no-empty)
[error] 8-8: Statement inside of curly braces should be on next line.
(brace-style)
[error] 8-8: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 8-8: Empty block statement.
(no-empty)
🔇 Additional comments (3)
ghost/core/test/unit/frontend/meta/canonical-url.test.js (2)
30-30
: LGTM! Test assertions updated to match actual URL structures.The test assertions have been updated to use realistic URLs with the correct base URL, making the tests more accurate and maintainable.
Also applies to: 32-32, 54-54, 56-56, 65-65, 67-72
Line range hint
74-83
: LGTM! New test case validates URL sanitization.The new test case properly verifies that accented characters are replaced with their ASCII equivalents in the canonical URL, which aligns with the PR objectives.
ghost/core/core/frontend/meta/utils.js (1)
4-21
: Verify URL handling edge cases.Let's verify the function handles various URL edge cases correctly.
✅ Verification successful
URL handling implementation is robust and correct
The function correctly handles various URL edge cases by:
- Preserving URL structure (origin + path)
- Safely handling malformed URLs with try-catch blocks
- Maintaining path segment boundaries
- Properly sanitizing non-standard characters per segment while preserving URL semantics
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Create a test file to verify URL handling edge cases cat << 'EOF' > test_urls.js const {fixAnyNonStandardChars} = require('./ghost/core/core/frontend/meta/utils'); const testCases = [ 'https://example.com/tag/nèws', 'https://example.com/tag/n%C3%A8ws', '/tag/nèws', '/tag/n%C3%A8ws', 'https://example.com/path with spaces/and-accents-é', 'https://user:[email protected]/path?query=value#hash', '//example.com/path', 'mailto:[email protected]', 'data:text/plain;base64,SGVsbG8=' ]; testCases.forEach(url => { console.log(`Input: ${url}`); try { console.log(`Output: ${fixAnyNonStandardChars(url)}\n`); } catch (err) { console.error(`Error: ${err.message}\n`); } }); EOF node test_urls.jsLength of output: 1089
Script:
#!/bin/bash # Search for existing tests and implementation echo "=== Searching for test files ===" rg -l "fixAnyNonStandardChars" --type test echo -e "\n=== Looking for security.string.safe implementation ===" rg "string\.safe" -A 5 echo -e "\n=== Finding usage examples ===" rg "fixAnyNonStandardChars"Length of output: 23809
🧰 Tools
🪛 ESLint
[error] 6-6: Statement inside of curly braces should be on next line.
(brace-style)
[error] 6-6: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 6-6: Empty block statement.
(no-empty)
[error] 8-8: Statement inside of curly braces should be on next line.
(brace-style)
[error] 8-8: Closing curly brace should be on the same line as opening curly brace or on the line after the previous block.
(brace-style)
[error] 8-8: Empty block statement.
(no-empty)
Closes #21999
If for whatever reason a site has incoming links (or internal links) that include accents in the urls, such as /tag/nèws, Ghost currently provides canonical and pagination links that preserve the incorrect formatting, leading to duplicate canonical links and an SEO mess.
(Interestingly, data retrieval works just fine, because the slugs are later parsed with security.safe.string.)
This PR runs urls through the same process, so that the generated urls in the metadata will be correct, even if the incoming urls are wèird.
Update: I also fixed the canonical-url tests. They were... weird, because the stubs used characters that aren't actually valid URLs? A close look at what I did would be appreciated, because changing failing tests to make them pass should be a little suspect, right? 👀