
Gatsby does not resolve / find unicode URLs encoded with encodeURI #16765

Closed
eyalroth opened this issue Aug 20, 2019 · 32 comments
Labels
stale? Issue that may be closed soon due to the original author not responding any more.

Comments

@eyalroth
Contributor

eyalroth commented Aug 20, 2019

Description

Gatsby does not support pages whose path contains Unicode characters and has been encoded with encodeURI. The development server (gatsby develop) fails to find these pages, and the production build (gatsby build) fails to find them when the service worker plugin (gatsby-plugin-offline) is enabled.

This was previously discussed in this issue, but it was closed by the Gatsby bot, so I am reopening it, as this bug is crucial for me.

Steps to reproduce

  1. Create a new gatsby project from the default starter.
  2. Add a new page component in src/components/page3.js:
import React from "react"
import { Link } from "gatsby"

import Layout from "./layout"
import SEO from "./seo"

const ThirdPage = () => (
  <Layout>
    <SEO title="Page three" />
    <h1>Hi from the third page</h1>
    <p>Welcome to page 3</p>
    <Link to="/">Go back to the homepage</Link>
  </Layout>
)

export default ThirdPage
  3. Add this to gatsby-node.js:
const path = require('path')

exports.createPages = ({ actions }) => {
    const { createPage } = actions
    createPage({
        path: encodeURI("/page-שלוש/"), // this is "three" in Hebrew
        component: path.resolve('./src/components/page3.js'),
    })
}
  4. Run gatsby develop.
  5. Try navigating to the page via either http://localhost:8000/page-שלוש/ or http://localhost:8000/page-%D7%A9%D7%9C%D7%95%D7%A9/, and you'll see that nothing comes up.
  6. Add gatsby-plugin-offline in gatsby-config.js (simply un-comment it).
  7. Run gatsby build && gatsby serve.
  8. Once again, try navigating to the aforementioned URLs (port 9000), and again you'll see that nothing comes up.
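A quick way to see the mismatch is to print what encodeURI actually hands to createPage. This plain Node snippet (nothing Gatsby-specific) shows that the page is registered under the percent-encoded string, which is the same text as the second URL tried above, while the first URL is its decoded form:

```javascript
// The path passed to createPage in gatsby-node.js above, before and after encodeURI.
const rawPath = "/page-שלוש/"
const encodedPath = encodeURI(rawPath)

// Each Hebrew letter is two UTF-8 bytes, each written out as %XX.
console.log(encodedPath) // /page-%D7%A9%D7%9C%D7%95%D7%A9/

// decodeURI restores the original text, so both URLs name the same page,
// but as strings only the encoded form is identical to the registered path.
console.log(decodeURI(encodedPath) === rawPath) // true
```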

Expected result

Encoded URLs should be resolved and found correctly.

Actual result

URLs are not found.

Environment

gatsby info --clipboard:

  System:
    OS: Linux 4.4 Ubuntu 16.04.6 LTS (Xenial Xerus)
    CPU: (4) x64 Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
    Shell: 4.3.48 - /bin/bash
  Binaries:
    Node: 10.16.0 - ~/n/bin/node
    npm: 6.10.3 - ~/n/bin/npm
  Languages:
    Python: 2.7.12 - /usr/bin/python
  npmPackages:
    gatsby: ^2.13.70 => 2.13.70
    gatsby-image: ^2.2.9 => 2.2.9
    gatsby-plugin-manifest: ^2.2.5 => 2.2.5
    gatsby-plugin-offline: ^2.2.6 => 2.2.6
    gatsby-plugin-react-helmet: ^3.1.3 => 3.1.3
    gatsby-plugin-sharp: ^2.2.12 => 2.2.12
    gatsby-source-filesystem: ^2.1.9 => 2.1.9
    gatsby-transformer-sharp: ^2.2.6 => 2.2.6
  npmGlobalPackages:
    gatsby-cli: 2.6.13

Note that this is a WSL installation on Windows 10.0.17134.799.

@robinzimmer1989

I'm experiencing a similar issue.

I'm querying Japanese and Chinese content from WordPress. The URLs mostly contain foreign characters, so WP encodes them automatically and they look like this:
http://localhost:8000/%E4%BC%9A%E7%A4%BE%E6%A6%82%…%B7%E3%83%BC%E3%83%9D%E3%83%AA%E3%82%B7%E3%83%BC/

When I run gatsby develop all pages return a 404 error except the ones without special characters (i.e. http://localhost:8000/posts).

So I tried decoding the pathname before creating the page, and it worked: the pages loaded fine.

createPage({ path: decodeURIComponent(pathname), component: template })

But for some reason, the Gatsby Link component stopped working: when I navigate via the menu to a different page, I get a white screen (no console errors) and nothing happens. When I refresh the page, everything works fine again. Visiting any page from the generic development 404 page also works fine.
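If you take this approach, one thing worth ruling out (an assumption, not a confirmed cause of the white screen) is that the menu builds its Link targets from the raw, still-encoded WordPress URLs while the pages are registered under decoded paths. Below is a sketch of a single normalization helper used in both places; toPagePath and menuItemPath are hypothetical names, and the sample input is just the first two characters of the URL above:

```javascript
// Hypothetical helper: turn a (possibly percent-encoded) WordPress pathname into
// the decoded form used both for createPage and for <Link to={...}>.
function toPagePath(pathname) {
  try {
    return decodeURIComponent(pathname)
  } catch (err) {
    // Malformed sequences such as a truncated "%E4%BC" throw URIError;
    // keep the original string instead of failing the build.
    if (err instanceof URIError) return pathname
    throw err
  }
}

console.log(toPagePath("/%E4%BC%9A%E7%A4%BE/")) // /会社/
console.log(toPagePath("/posts"))               // /posts

// Usage (sketch): pass the same value to both places, e.g.
//   createPage({ path: toPagePath(pathname), component: template })
//   <Link to={toPagePath(menuItemPath)}>...</Link>   // menuItemPath is hypothetical
```

Routing every path through one function keeps the page registry and the links from drifting apart, whichever decoding policy you settle on.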

@wardpeet
Contributor

wardpeet commented Sep 4, 2019

Could you give us access to a reproduction with a WordPress API?

#15551 says it's working as expected.

@wardpeet wardpeet added status: needs more info Needs triaging and reproducible examples or more information to be resolved status: needs reproduction This issue needs a simplified reproduction of the bug for further troubleshooting. labels Sep 4, 2019
@eyalroth
Contributor Author

eyalroth commented Sep 4, 2019

@wardpeet This has nothing to do with WordPress or any other CMS. Check out the steps to reproduce in the issue description.

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Sep 25, 2019
@gatsbot

gatsbot bot commented Sep 25, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@eyalroth
Contributor Author

@gatsbybot @wardpeet Nope, not stale. Still happening, and the reproduction information is available in this ticket.

@dsegovia90

dsegovia90 commented Oct 1, 2019

Hello! Not sure if this is at all helpful, but I'm experiencing the same issue, although oddly only in production and with double quotes " (%22) in the URL. Also, the link is accessible if it's not the first load (SSR).

If you go to this link directly (server-side rendered), the site throws an error: (index):28 Uncaught SyntaxError: Unexpected string. Inspecting the code, Gatsby inserts the raw double quote into window.pagePath="/instrumentos/Viola-Cremona-SVA-130-708834-15"-1" instead of URL-encoding or escaping it.

But if you go here first, and click on the first instrument, it will take you to the same url above but not throw the error.

Again, not sure if this is at all helpful; let me know if you need more information.
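The failure described here is what you get when a path is pasted into an inline script by plain string concatenation. The snippet below is a hypothetical illustration, not Gatsby's actual HTML template: it contrasts concatenation with JSON.stringify, which emits a valid JavaScript string literal with the quote escaped (replacing "<" additionally guards against a "</script>" sequence in the path).

```javascript
// Hypothetical illustration: embedding a path into an inline <script>.
const pagePath = '/instrumentos/Viola-Cremona-SVA-130-708834-15"-1'

// Naive concatenation: the quote inside the path ends the string literal early,
// which is consistent with the SyntaxError reported above.
const naive = 'window.pagePath="' + pagePath + '"'

// JSON.stringify escapes the quote; the replace keeps "</script>" from closing the tag.
const safe = "window.pagePath=" + JSON.stringify(pagePath).replace(/</g, "\\u003c")

console.log(naive) // window.pagePath="/instrumentos/Viola-Cremona-SVA-130-708834-15"-1"
console.log(safe)  // window.pagePath="/instrumentos/Viola-Cremona-SVA-130-708834-15\"-1"
```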

@davegreig

@dsegovia90 I have a similar but distinct issue to what you're reporting.
See #17556

TL;DR: In my case, I can navigate directly to a page with non-encoded URLs except on MS Edge. On MS Edge, I have the same symptoms as your bug: the page renders for a second, but then goes white and the <div id="___gatsby"> is empty.

@eyalroth
Contributor Author

eyalroth commented Oct 1, 2019

@roadwig I believe it's a problem with Edge, not Gatsby. The whole reason I first started encoding my URLs is because Edge was failing to load them.

Edit: Reading through your issue, perhaps this is indeed a problem with Gatsby. It's hard to say whether Gatsby or Edge is to blame. Regardless, this definitely seems related.

@gatsbot

gatsbot bot commented Oct 12, 2019

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community!

@gatsbot gatsbot bot closed this as completed Oct 12, 2019
@eyalroth
Contributor Author

@gatsbybot @wardpeet Still not stale, still has all the required info, and things have definitely happened on this issue in the past 30 days 😕

@btk
Contributor

btk commented Oct 13, 2019

@eyalroth Your reproduction steps do break.

But what is your actual reason for using encodeURI()? I have been using createPage() with Turkish characters and haven't faced any issues.

I just tried:

createPage({
    path: "/page-שלוש/", // this is "three" in Hebrew
    component: path.resolve('./src/components/page3.js'),
})

This works perfectly in both staging and production.

On this system:

  System:
    OS: Windows 10
    CPU: (8) x64 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  Binaries:
    Yarn: 1.16.0 - C:\Users\ileti\AppData\Roaming\npm\yarn.CMD
    npm: 6.9.0 - C:\Program Files\nodejs\npm.CMD
  Browsers:
    Edge: 44.17763.1.0
  npmPackages:
    gatsby: ^2.15.36 => 2.15.36 
    gatsby-image: ^2.2.27 => 2.2.27 
    gatsby-plugin-manifest: ^2.2.21 => 2.2.21 
    gatsby-plugin-offline: ^3.0.14 => 3.0.14 
    gatsby-plugin-react-helmet: ^3.1.11 => 3.1.11 
    gatsby-plugin-sharp: ^2.2.29 => 2.2.29 
    gatsby-source-filesystem: ^2.1.31 => 2.1.31 
    gatsby-transformer-sharp: ^2.2.21 => 2.2.21 

@vincentpelage

vincentpelage commented Oct 14, 2019

Same issue here:
Our client (Gatsby / WordPress API hosted on Netlify) created a URL with an accent in WordPress, then decided to remove the accent, created a new URL, and asked us to redirect the old one for SEO purposes. We used this in gatsby-node.js:

createRedirect({
  fromPath: "/réseaux-sociaux",
  toPath: "/reseaux-sociaux/",
  isPermanent: true,
})

When we try to access "/réseaux-sociaux", a 404 is displayed for a few ms before being replaced by a blank page. We also noticed that every page whose URL contains a Unicode character displays a blank page instead of a 404.

We have built several websites with Gatsby and did not face this issue a few months ago.
We tried downgrading both gatsby and react, but it did not resolve anything.

Before we found out that this issue wasn't related to the redirects, we also tried the following, which didn't work:

createRedirect({
  fromPath: encodeURI("/réseaux-sociaux"),
  toPath: "/reseaux-sociaux/",
  isPermanent: true,
})
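The commenters later found that the blank page was not caused by the redirects themselves, but accented paths can still be spelled several ways: raw or percent-encoded, and with "é" as one precomposed code point (NFC) or as "e" plus a combining accent (NFD). Assuming the host matches fromPath literally (an assumption, not verified for Netlify), here is a hedged sketch that registers one redirect per distinct spelling; createRedirectVariants is a hypothetical helper that receives Gatsby's createRedirect action as an argument:

```javascript
// Hypothetical helper: call createRedirect once per distinct spelling of fromPath.
// Assumption: the host compares fromPath literally against the requested path.
function createRedirectVariants(createRedirect, { fromPath, ...rest }) {
  const variants = new Set()
  for (const form of ["NFC", "NFD"]) {
    const normalized = fromPath.normalize(form) // precomposed vs. combining accent
    variants.add(normalized)
    variants.add(encodeURI(normalized)) // percent-encoded spelling of the same path
  }
  for (const variant of variants) {
    createRedirect({ fromPath: variant, ...rest })
  }
  return [...variants]
}

// Usage inside createPages in gatsby-node.js (sketch):
// createRedirectVariants(actions.createRedirect, {
//   fromPath: "/réseaux-sociaux",
//   toPath: "/reseaux-sociaux/",
//   isPermanent: true,
// })
```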

@travis-r6s travis-r6s added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Oct 15, 2019
@travis-r6s
Contributor

travis-r6s commented Oct 15, 2019

I have this issue when using locales from Prismic: I have a slug with Cyrillic characters, e.g. /bg/за-нас (shown as /bg/%D0%B7%D0%B0-%D0%BD%D0%B0%D1%81), and when navigating to that page, Gatsby says it cannot be found, even though it lists that exact page URL below.


EDIT: I didn't read the efforts above properly 🤦‍♂️. I tried decodeURI(slug), and that seems to have fixed my issue.
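For slugs like this one, decodeURI and decodeURIComponent give the same result; they only diverge when the slug contains escapes for reserved characters such as # (%23) or / (%2F). A small Node snippet using the slug above:

```javascript
// The slug reported above, in the percent-encoded form Gatsby displayed.
const slug = "/bg/%D0%B7%D0%B0-%D0%BD%D0%B0%D1%81"

// decodeURI restores the Cyrillic letters.
console.log(decodeURI(slug)) // /bg/за-нас

// The decoders differ on reserved characters: decodeURI keeps %23 ("#") and
// %2F ("/") encoded, while decodeURIComponent decodes them.
console.log(decodeURI("/a%23b%2Fc"))          // /a%23b%2Fc
console.log(decodeURIComponent("/a%23b%2Fc")) // /a#b/c
```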

@eyalroth
Contributor Author

eyalroth commented Oct 15, 2019

@btk encodeURI() should not break the page. There shouldn't need to be a special reason to use it, as it is a standard JavaScript method. Moreover, creating a page with Unicode characters (such as Turkish script) without this method fails to load the page on MS Edge (see #17556).

@eyalroth eyalroth removed status: needs more info Needs triaging and reproducible examples or more information to be resolved status: needs reproduction This issue needs a simplified reproduction of the bug for further troubleshooting. labels Oct 15, 2019
@eyalroth eyalroth reopened this Oct 15, 2019
@siavashh

Nothing new on this?

@adamgen

adamgen commented May 21, 2020

Nothing to do with Edge, it happens on Chrome.

@adamgen

adamgen commented May 21, 2020

I made a test with both plain English characters and with Unicode characters, the English only characters work well, the Unicode characters don't work.

Source site http://www.wpexpert.co.il/%D7%91%D7%9C%D7%95%D7%92/

Reproduction repo: https://github.com/adamgen/gatsby-wp-unicode-error-poc


@adamgen

adamgen commented May 21, 2020

Just making sure we're on the same page here: the long, weird %d7%9.... is a 100% legitimate URL; you can see many of these on the example source site I sent.

@adamgen

adamgen commented May 21, 2020

I found a local fix, but I really think it should be fixed on gatsby.

To make a long story short: you should use decodeURIComponent instead of encodeURI, since you're naming a file and not a URL path. Taking the example from above, this should work:

const path = require('path')

exports.createPages = ({ actions }) => {
    const { createPage } = actions
    createPage({
        path: decodeURIComponent("/page-שלוש/"), // this is "three" in Hebrew
        component: path.resolve('./src/components/page3.js'),
    })
}

I think that Gatsby should apply decodeURIComponent to file paths by itself to avoid similar issues.

@github-actions

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@github-actions github-actions bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Jun 22, 2020
@github-actions

github-actions bot commented Jul 2, 2020

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.
Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community! 💪💜

@github-actions github-actions bot closed this as completed Jul 2, 2020
@jeffstahlnecker

Still an issue that should be fixed in Gatsby.

@moreguppy

I'm finding a similar issue with meta tags that link to URLs: Twitter is not parsing HTML entities in URLs where & has become &amp;, so it can't get the meta image.

@djun-kim

Having the same issue here. This is a blocker for enabling multi-lingual sites.

@creotip

creotip commented Sep 20, 2020

Same problem here.
The decodeURIComponent solution is not working for me.

@barbareshet

@adamgen's solution for Hebrew is working well.

@marcus13371337

Having the same issue!

@machineghost

machineghost commented Jan 10, 2021

Still an issue; GitHub REALLY needs a way to better surface falsely-closed issues.

EDIT: At the bare minimum, even if absolutely no code fix is needed (which seems unlikely), a documentation update is needed. https://www.gatsbyjs.com/docs/reference/routing/creating-routes/ doesn't even mention URI encoding/decoding, let alone explain how Gatsby expects you to handle it.

EDIT 2: Similarly https://www.gatsbyjs.com/docs/reference/config-files/actions/#createPage makes no mention of encoding problems, and simply defines the path parameter as:

path string
Any valid URL. Must start with a forward slash

Encoded URLs (which is to say encodeURIComponent-ed strings, e.g. "a:b" => a%3Ab) certainly are "valid", but they break when you use them with createPage.
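The two encoders escape different sets of characters, which is part of why "encoded" is ambiguous here. encodeURIComponent escapes reserved characters such as : and /, so a whole path passed through it no longer even starts with a forward slash, whereas encodeURI leaves reserved characters alone and escapes things like spaces and non-ASCII text. A quick Node comparison:

```javascript
// encodeURIComponent escapes reserved characters such as ":" and "/";
// encodeURI keeps them because they are meaningful in a full URL.
console.log(encodeURIComponent("a:b")) // a%3Ab
console.log(encodeURI("a:b"))          // a:b

// Applied to a whole path, encodeURIComponent also escapes the slashes.
console.log(encodeURIComponent("/blog/a:b/")) // %2Fblog%2Fa%3Ab%2F
console.log(encodeURI("/blog/a:b/"))          // /blog/a:b/
```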

@LekoArts
Contributor

Following the paper trail of #17556 or #15551, some issues around MS Edge and reach/router were solved. However, this issue is far too vague about what the actual problem is now (whether it's a specific browser issue, or a problem with encodeURI or decodeURIComponent), and "having the same issue" comments don't help in resolving it.

So please open a single new bug report that includes a reproduction and outlines where the problem lies now (others can then comment on it with their own reproductions).

  • Is decodeURIComponent in Gatsby itself needed?
  • Does the routing not work with this?
  • Does encodeURI not work?

@seankovacs

While all of the comments above concern actual pages with Unicode characters, a similar and easily testable issue I'm running into is accessing a deliberately invalid URL with a %23 in its path. The 404 page tries to render (the title says 404), but the page is blank and the console shows the error mentioned above: Cannot read property 'page' of undefined.
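This sketch does not show where the "Cannot read property 'page' of undefined" error comes from. Under the assumption that the requested path gets run through decodeURIComponent somewhere along the way, it only illustrates why %23 is special: once decoded it becomes "#", which URL parsing treats as the start of the fragment rather than as part of the path.

```javascript
// A deliberately invalid path containing an encoded "#".
const requested = "/does-not-exist%23oops/"
const decoded = decodeURIComponent(requested) // "/does-not-exist#oops/"

// Parsed as a URL, everything after "#" is the fragment, not the path.
const url = new URL(decoded, "http://localhost:8000")
console.log(url.pathname) // /does-not-exist
console.log(url.hash)     // #oops/
```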

@Auspicus
Contributor

Auspicus commented Aug 3, 2022

@LekoArts

I believe that, due to the decodeURIComponent call in find-path.js, paths with pre-encoded Unicode characters (i.e. /%E2%80%9Chmmm-%E2%80%9D) cannot be used in the createPages API. However, passing raw Unicode characters works fine.

I agree with @machineghost that this should be documented in the createPage API and Routing reference docs. Or the issue should be addressed so that both pre-encoded and regular unicode characters work the same. From my perspective, I was expecting both to just work.

Sorry to re-ignite an old issue, but I think it should be addressed in some capacity. A minimal reproduction would be (with “ being a Unicode character; any Unicode character should trigger this issue):

// works
createPage({
  path: `/“hmmm”`,
  component: require.resolve("./src/templates/some-template.js") /* not relevant */,
  context: {},
})

// doesn't work
createPage({
  path: encodeURI(`/“hmmm”`),
  component: require.resolve("./src/templates/some-template.js") /* not relevant */,
  context: {},
})

TL;DR:

  • /“hmmm” works fine ✔️
  • /%E2%80%9Chmmm-%E2%80%9D or encodeURI("/“hmmm”") is a 404 ❌
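The asymmetry can be reproduced with a toy model. This is a hypothetical sketch, not Gatsby's actual code: it assumes only what is described above, namely that pages are stored under the exact path passed to createPage and that lookups run the requested path through decodeURIComponent first. The names pages, registerPage and findPage are illustrative stand-ins, and the second path gets a "-2" suffix only so the two entries have distinct keys.

```javascript
// Toy model (hypothetical, not Gatsby's code): pages are keyed by the exact path
// given at creation time, and lookups decode the requested path first.
const pages = new Map()
const registerPage = (path) => pages.set(path, { path })
const findPage = (requestPath) => pages.get(decodeURIComponent(requestPath))

registerPage(`/“hmmm”`)              // raw Unicode path
registerPage(encodeURI(`/“hmmm”-2`)) // pre-encoded: "/%E2%80%9Chmmm%E2%80%9D-2"

// A browser requests the percent-encoded form; decoding it matches the raw key...
console.log(findPage(encodeURI(`/“hmmm”`)) !== undefined)   // true
// ...but does not match a key that is itself still encoded, hence the 404.
console.log(findPage(encodeURI(`/“hmmm”-2`)) !== undefined) // false
```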

@Auspicus
Contributor

@machineghost The docs have been updated in those two sections to reflect this limitation. Cheers for helping me find those limitations. I spent A LOT of time wondering why those URLs were coming up as 404.
