Pages with utf-8 name don't work properly under SSR #10084

Open
frei-0xff opened this issue Jan 14, 2020 · 22 comments
@frei-0xff

Bug report

Pages with utf-8 non-ASCII characters in their name don't work properly under SSR

Describe the bug

Pages with utf-8 non-ASCII characters in their name work just fine with client-side navigation,
but when rendered on server side return "404 This page could not be found."

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Create page 'pages/тест.js'
  2. Navigate to http://localhost:3000/тест
  3. See error "404 This page could not be found."

Expected behavior

I'm expecting to see page 'pages/тест.js' rendered

System information

  • OS: Windows
  • Version of Next.js: 9.1.7

Additional context

Minimal repository to reproduce bug: https://github.com/frei-0xff/nextjs-utf8-pagename

@StarpTech
Contributor

What's the purpose of using non-ASCII chars if your page name should be displayed as a valid URL?

http://localhost:3000/тест is converted to http://localhost:3000/%D1%82%D0%B5%D1%81%D1%82 and can't be found.
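
To make the mismatch concrete, here is a minimal sketch of the conversion described above, using only the built-in encodeURI and decodeURI. The variable names are illustrative, and comparing the request path against the page name is an assumption about where the lookup goes wrong, not a trace of Next.js internals.

```javascript
// The page file is pages/тест.js, so the expected route is "/тест".
const pageRoute = "/тест";

// What the server receives in the request line: non-ASCII path characters
// are percent-encoded as UTF-8 bytes.
const requestPath = encodeURI(pageRoute);
console.log(requestPath); // "/%D1%82%D0%B5%D1%81%D1%82"

// A plain string comparison between the two fails...
console.log(requestPath === pageRoute); // false

// ...while decoding the request path first makes them match.
console.log(decodeURI(requestPath) === pageRoute); // true
```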

@kachkaev
Contributor

@StarpTech you might want to have a link like http://яндекс.рф/тест. These display as Cyrillic URLs in modern browser tabs. From my experience, тест turns into %D1%82%D0%B5%D1%81%D1%82 only when you copy the URL to the clipboard.

@frei-0xff
Author

frei-0xff commented Jan 16, 2020

@StarpTech non-ASCII URLs are displayed properly in all modern browsers and are used by popular sites, for example wikipedia.org.

@StarpTech
Contributor

Thanks for the examples. I have never used it.

@Itzik7

Itzik7 commented May 21, 2020

As a workaround, you can use a dynamic page ([page]) and switch on the UTF-8 page name inside it.
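
A minimal sketch of that workaround, assuming the dynamic route parameter is called page. The page names and the returned identifiers are hypothetical; in a real project the switch would pick a component or data source instead of a string.

```javascript
// Resolve the value of the [page] route parameter to a page identifier.
// Returns null when nothing matches, so the caller can render a 404.
function resolvePage(param) {
  let name;
  try {
    // The parameter may arrive percent-encoded or already decoded.
    name = decodeURIComponent(param);
  } catch {
    return null; // malformed escape sequence
  }
  switch (name) {
    case "тест":
      return "TestPage"; // hypothetical identifier
    case "о-нас":
      return "AboutPage"; // hypothetical identifier
    default:
      return null;
  }
}

console.log(resolvePage("%D1%82%D0%B5%D1%81%D1%82")); // "TestPage"
console.log(resolvePage("тест")); // "TestPage"
console.log(resolvePage("unknown")); // null
```

Decoding before the switch lets the same code handle both the encoded and the decoded form of the parameter.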

@frei-0xff
Author

frei-0xff commented Aug 7, 2020

In version 9.2, client-side routing for pages with non-ASCII characters worked just fine. The issue was only with server-side routing, which could be worked around with a custom server.js that calls decodeURI(parsedUrl.pathname).
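
The decoding step of that custom-server workaround can be sketched as follows. This is an illustration under assumptions: the helper name decodedPathname is made up, the global URL class is used in place of whatever produced parsedUrl in the original server.js, and the wiring into Next.js appears only as a comment.

```javascript
// Turn the raw request URL (e.g. "/%D1%82%D0%B5%D1%81%D1%82?x=1") into a
// decoded pathname (e.g. "/тест") that can be compared with page names.
function decodedPathname(rawUrl) {
  // The base is a placeholder; only the path part is used.
  const { pathname } = new URL(rawUrl, "http://localhost");
  try {
    return decodeURI(pathname);
  } catch {
    // Malformed percent-encoding: fall back to the undecoded value.
    return pathname;
  }
}

console.log(decodedPathname("/%D1%82%D0%B5%D1%81%D1%82?x=1")); // "/тест"

// Hypothetical wiring inside a custom server.js, where `app` is the Next.js
// app instance and `query` is the parsed query object:
//   app.render(req, res, decodedPathname(req.url), query);
```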

After updating to version 9.5.1, client-side routing for pages with non-ASCII characters stopped working entirely. In development mode, clicking a link to such a page causes no navigation and no error message. After the routeChangeStart event, neither routeChangeComplete nor routeChangeError is fired; only after clicking another link is routeChangeError fired, with "Error: Route Cancelled".

Edit:
It seems that #14827 was the breaking change, because URLs returned by the WHATWG URL API are URL-encoded, which is inconsistent with other parts of the code.

@jonrh

jonrh commented Nov 15, 2020

Tested v9.5.0, v9.5.5, and v10.0.1 and none of them support statically generated pages with non-ASCII names like /тест and /hæ. It worked as expected in a few versions I tested between v9.0.0 and v9.4.4. I would classify this as a bug or an undocumented breaking change in v9.5.

jonrh added a commit to jonrh/jonogmarteinn.is that referenced this issue Nov 16, 2020
Mostly done but utf-8 page names is currently not supported in Next.js. Most likely a bug. 

See: 
vercel/next.js#19135
vercel/next.js#10084
@fillon

fillon commented Dec 17, 2020

I am experiencing the same issue with routes in the Thai language.

Is there a workaround?

next-10.0.3

kodiakhq bot pushed a commit that referenced this issue Dec 28, 2020
…19135)

This ensures we handle encoding/decoding for SSG prerendered/fallback pages correctly. Since we only encode path delimiters when outputting to the disk we need to match this encoding when building the `ssgCacheKey` to look-up the prerendered pages. This also fixes non-ascii prerendered paths (e.g. 商業日語) not matching correctly. 

This does not resolve 👉  #10084 and further investigation will be needed before addressing non-ascii paths for non-SSG pages. 

The encoding output was tested against https://tst-encoding-l7amu5b9c.vercel.app/ to ensure the values will match correctly on Vercel. 

Closes: #17582
Closes: #17642
x-ref: #14717
@jonrh

jonrh commented Dec 29, 2020

Tested again and found something really peculiar. It works as expected when deployed on Vercel. It does not work locally with either next dev or next build && next start; both return a 404 error.

Sample code: https://github.com/jonrh/next-unicode-bugs
Sample live website: https://next-unicode-bugs.vercel.app/

Video showing it working on Vercel:
https://user-images.githubusercontent.com/58344/103299546-b4da8600-49f4-11eb-8dd9-92ffd8536407.mov

I would also like to clarify that this is only testing static routes, not server side rendering (SSR/SSG) as the title of this issue states.

@tyteen4a03

Can trigger this issue with 9.5+. Any ETA on this as I really want to upgrade to React 17 and webpack 5?

@andreyshedko

In case this helps someone, I fixed this issue the following way:

// Body of getStaticPaths; `headers` and the `Tag` type are defined elsewhere.
const res = await fetch(`${process.env.HOST}/api/tags/read`, headers);
const data = await res.json();
let paths: { params: ParsedUrlQuery }[] = [];
if (Array.isArray(data)) {
  paths = data.map((tag: Tag) => ({
    // Encode each tag name so it matches the encoded request path.
    params: { tag: encodeURI(tag.tagName) },
  }));
}

return {
  paths,
  fallback: false,
};

@aynik

aynik commented Mar 16, 2021

As a workaround, I used rewrites in next.config.js:

  async rewrites() {
    return [
      {
        source: `/${encodeURIComponent('カート')}`,
        destination: '/cart',
      },
      {
        source: `/${encodeURIComponent('アカウント')}`,
        destination: '/account',
      },
    ]
  }
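
When there are more than a couple of localized routes, the same rewrite entries could be generated from a table. A minimal sketch, assuming a hypothetical localizedRoutes mapping and a helper named buildRewrites; neither is part of Next.js.

```javascript
// Hypothetical mapping from localized path segments to ASCII page routes.
const localizedRoutes = {
  "カート": "/cart",
  "アカウント": "/account",
};

// Build the array that a rewrites() function in next.config.js would return.
function buildRewrites(routes) {
  return Object.entries(routes).map(([segment, destination]) => ({
    // Same encoding as in the workaround above: the source is percent-encoded.
    source: `/${encodeURIComponent(segment)}`,
    destination,
  }));
}

console.log(buildRewrites(localizedRoutes));
```

In next.config.js, the rewrites() function would then return buildRewrites(localizedRoutes).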

jonrh added a commit to jonrh/jonogmarteinn.is that referenced this issue Mar 16, 2021
Mostly done but utf-8 page names is currently not supported in Next.js. Most likely a bug. 

See: 
vercel/next.js#19135
vercel/next.js#10084
jonrh added a commit to jonrh/jonogmarteinn.is that referenced this issue Apr 15, 2021
Mostly done but utf-8 page names is currently not supported in Next.js. Most likely a bug. 

See: 
vercel/next.js#19135
vercel/next.js#10084
jonrh added a commit to jonrh/jonogmarteinn.is that referenced this issue Apr 15, 2021
Note that there is a bug in Next.js where utf-8 page names do not work when developing locally. For example the route /málarameistari. However when deployed to Vercel it works as expected.

For now I will deal with local dev pain later, probably by just manually rerouting to ASCII routes via a config.

See: 
vercel/next.js#19135
vercel/next.js#10084
@JanDez

JanDez commented Jun 24, 2021

> Tested v9.5.0, v9.5.5, and v10.0.1 and none of them support statically generated pages with non-ASCII names like /тест and /hæ. It worked as expected in a few versions I tested between v9.0.0 and v9.4.4. I would classify this as a bug or an undocumented breaking change in v9.5.

That happened to me with words containing ñ and acute accents (´).

@ShahriarKh
Contributor

For me, decodeURI is the answer:

export async function getStaticPaths() {
   const { posts } = await request(CMS, POSTS);

   const paths = posts.nodes.map((post) => ({
      params: { slug: decodeURI(post.slug) },
   }));

   return { paths, fallback: false };
}

@nbouvrette
Contributor

nbouvrette commented Sep 12, 2021

We just released a new package that overcomes this issue (and many others): https://github.com/Avansai/next-multilingual

Looking forward to hearing feedback on our approach.

@Tobeyforce

Tobeyforce commented Jul 2, 2022

While this package shows some promise, shouldn't international urls be supported by default? Internationalization is the concept of supporting multiple languages, which has nothing to do (maybe a little) with UTF-8-based urls.

It looks like this package forces every URL to use a language prefix, e.g. /fr/my-international-url.
I think it's quite simple: international URLs should be supported by default.

For example:
I want to use the Swedish characters å, ä, and ö and have a URL called /påsk.

This doesn't work. However, if I name my page p%C3%A5sk it works... until I use getStaticPaths, then it breaks. Not to mention that ISR revalidation doesn't work either.
This really causes a ton of confusion.

Using the approach above with rewrites is also not feasible when you have multiple pages using, e.g., getStaticPaths.
Using rewrites also interferes with packages like next-sitemap.

NicholasLYang added a commit to vercel/turborepo that referenced this issue Jun 20, 2023
### Description

We're moving all paths to UTF-8 for a whole bunch of reasons such as:

- We know it'll be supported everywhere, across platforms, in the
browser, and so on.
- We have no evidence that any user is using non-UTF-8 paths
- It's very very hard to manipulate paths without converting them to
Rust Strings.
- For instance the [only way to add a trailing
slash](https://users.rust-lang.org/t/trailing-in-paths/43166/8) to a
path is by doing `path.push("")`
- `bstr` [implicitly
converts](https://docs.rs/bstr/latest/bstr/#handling-of-invalid-utf-8)
invalid Unicode into replacement characters, which is probably not what
we want
- `bstr` also explicitly notes that the end result of its conversion
functions (which again either error or implicitly convert) is [“you’re
guaranteed to write correct code for Unix, at the cost of getting a
corner case wrong on
Windows”](https://docs.rs/bstr/latest/bstr/#file-paths-and-os-strings)
- Considering we know that we have Windows users and are committed to
supporting Windows, that should be a higher priority than supporting
hypothetical users using non-UTF-8 encodings.
- To quote [camino](https://docs.rs/camino/latest/camino/):
- “Unicode is the common subset of supported paths across Windows and
Unix platforms.”
- “The '[makefile
problem](https://www.mercurial-scm.org/wiki/EncodingStrategy#The_.22makefile_problem.22)' (which
also applies to `Cargo.toml`, and any other metadata file that lists the
names of other files) has *no general, cross-platform solution* in
systems that support non-UTF-8 paths. However, restricting paths to
UTF-8 eliminates this problem.”
- Basically, if we have non-Unicode encodings, you could have
“packages/星巴克” in your turbo.json that does not match to “packages/星巴克”
in your file system because the file system is using big5 and turbo.json
is using Unicode.
- “There are already many systems, such as Cargo, that only support
UTF-8 paths. If your own tool interacts with any such system, you can
assume that paths are valid UTF-8 without creating any additional
burdens on consumers.”
- [npm does not allow even Unicode in package
names](https://github.com/npm/validate-npm-package-name). Only url-safe
characters, i.e. characters, numbers and a few other ASCII characters
- Next has [issues with Unicode paths
too](vercel/next.js#10084)
- How would you even import a non-Unicode JavaScript file? JavaScript
strings are Unicode.
- `path-slash` also only works on `AsRef<str>` or requires a lossy
conversion.
- Glob walking appears to assume UTF-8 as well.
- This simplifies our code significantly since we can drop a lot of
errors on invalid Unicode that are sprinkled throughout the codebase.


---------

Co-authored-by: --global <Nicholas Yang>
@rinarakaki

I keep getting this error when I go to a non-ASCII path in local dev mode (npm run dev) while using Dynamic Routes with the App Router:

TypeError: Cannot convert argument to a ByteString because the character at index 8 has a value of <value> which is greater than 255.
    at Object.fetch (node:internal/deps/undici/undici:16287:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async invokeRequest (/path/to/node_modules/next/dist/server/lib/server-ipc/invoke-request.js:21:12)
    at async invokeRender (/path/to/node_modules/next/dist/server/lib/router-server.js:229:29)
    at async handleRequest (/path/to/node_modules/next/dist/server/lib/router-server.js:422:24)
    at async requestHandler (/path/to/node_modules/next/dist/server/lib/router-server.js:439:13)


joekiller added a commit to joekiller/counter-two that referenced this issue Jan 3, 2024
generateStaticParams behaves differently when doing SSG in dev vs build. So the slug params must have `encodeURI(param.slug)` called before being put into the list of generateStaticParams (even if they are not really encoded) and then any page must call `decodeURI(decodeURI(param.slug))` to make it work on build and dev. Blah.

https://github.com/vercel/next.js/blame/5d5f58560f46b3300d2e5dc7de90025f46730da1/packages/next/src/server/base-server.ts#L2098C7-L2099

seems similar to:
vercel/next.js#10084
vercel/next.js#30007
joekiller added a commit to joekiller/counter-two that referenced this issue Jan 4, 2024
In dev mode incoming `[slug]` requests get the `encodeURI` treatment which then gives you the error that your `[slug]` is not part of the generateStaticParams result. BLAH! SO, to make dev mode work, you have to ensure params created by generateStaticParams that have non-ASCII, UTF-8, characters get the `encodeURI` treatment prior to returning their value so when a request comes in, it'll find the `encodeURI` param BUT of course that makes your `{params: {slug}}` encoded on the page side. So you must `decodeURI` the slug on the page side for it to match it up. Do not encode for build though, or they will be encoded on the filesystem which is undesired.

https://github.com/vercel/next.js/blame/5d5f58560f46b3300d2e5dc7de90025f46730da1/packages/next/src/server/base-server.ts#L2098C7-L2099

seems similar to:
vercel/next.js#10084
vercel/next.js#30007

fix: decode once, always
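
The encode-in-dev, decode-on-the-page approach from these commit messages can be sketched as below. The helper names toStaticParam and readSlug and the isDev flag are assumptions for illustration; the commits themselves are not reproduced here.

```javascript
// Encode slugs only in development, as the commit message describes; for
// builds the raw slug is kept so file names on disk are not encoded.
function toStaticParam(slug, isDev) {
  return { slug: isDev ? encodeURI(slug) : slug };
}

// Decode once on the page side. decodeURI leaves a slug that contains no
// percent escapes unchanged, so the same call serves dev and build.
function readSlug(params) {
  try {
    return decodeURI(params.slug);
  } catch {
    return params.slug; // not valid percent-encoding, use as-is
  }
}

console.log(readSlug(toStaticParam("тест", true))); // "тест"
console.log(readSlug(toStaticParam("тест", false))); // "тест"
```

If a value arrives encoded twice, as the first commit message suggests can happen, a second decodeURI call would be needed. Slugs that contain a literal % are not handled by this sketch.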
@coffeecupjapan
Contributor

coffeecupjapan commented Jan 8, 2024

I only took a quick look at this problem, but it seems that Next.js strips any non-word characters at build time, and I assume that is why you cannot use non-ASCII words.

// replace any non-word characters since they can break
// the named regex
let cleanedKey = key.replace(/\W/g, '')

Is anyone eager to add support for non-ASCII (e.g. UTF-8) words, at least here (and presumably in more places; I cannot track down every dependency, sorry)?

/**
 * Builds a function to generate a minimal routeKey using only a-z and minimal
 * number of characters.
 */
function buildGetSafeRouteKey() {
  let i = 0
  return () => {
    let routeKey = ''
    let j = ++i
    while (j > 0) {
      routeKey += String.fromCharCode(97 + ((j - 1) % 26))
      j = Math.floor((j - 1) / 26)
    }
    return routeKey
  }
}
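
To illustrate what the quoted \W replacement does to non-ASCII input, here is a small sketch. cleanKeyAscii mirrors the quoted line; cleanKeyUnicode is a hypothetical alternative, and whether its output would be usable everywhere Next.js needs these keys is not verified here.

```javascript
// The cleanup quoted above: \W matches every character outside [A-Za-z0-9_],
// which includes all non-ASCII letters.
function cleanKeyAscii(key) {
  return key.replace(/\W/g, "");
}

// Hypothetical Unicode-aware variant: keep letters, combining marks, digits
// and "_" from any script, and strip everything else.
function cleanKeyUnicode(key) {
  return key.replace(/[^\p{L}\p{M}\p{N}_]/gu, "");
}

console.log(cleanKeyAscii("slug")); // "slug"
console.log(cleanKeyAscii("тест")); // ""
console.log(cleanKeyUnicode("тест")); // "тест"
```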

@abdessamadely

In my case, I encountered this issue with Arabic pathnames. After debugging a little, I noticed a mismatch between dev and export (during validation, I think). As a workaround, I did the following:

process.env.NODE_ENV === 'development' ? encodeURI(page) : page
// or
process.env.NODE_ENV === 'development' ? encodeURI(page) : decodeURI(page)

In dev, I encode the pathname so that it matches what the Next server has, but on export/build I pass the value I want for the generated file name.

Full example:

const pages = ['من-نحن', 'سياسة-الخصوصية', 'الشروط-والأحكام']
export async function generateStaticParams() {
  return pages.map((page) => ({
    pathname: process.env.NODE_ENV === 'development' ? encodeURI(page) : page 
  }))
}

@Arctomachine

What exactly is the main source of this problem? Perhaps we could come up with a solution together and fix it by the v15 stable release.

@vncafecode

It's been 4 years and no one else at Next is showing interest in fixing it.
