diff --git a/review-drafts/2023-02.bs b/review-drafts/2023-02.bs new file mode 100644 index 00000000..2d8aa07e --- /dev/null +++ b/review-drafts/2023-02.bs @@ -0,0 +1,4023 @@ +
+Group: WHATWG
+Status: RD
+Date: 2023-02-20
+H1: URL
+Shortname: url
+Text Macro: TWITTER urlstandard
+Text Macro: LATESTRD 2023-02
+Abstract: The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded
format, and their API.
+Translation: ja https://triple-underscore.github.io/URL-ja.html
+Required IDs: application/x-www-form-urlencoded,urlencoded-parsing
+
+
++spec: ECMA-262; url: https://tc39.es/ecma262/#sec-encodeuricomponent-uricomponent; text: "encodeURIComponent() [sic]"; type: method +spec: UTS46; urlPrefix: https://www.unicode.org/reports/tr46/ + type: abstract-op; text: ToASCII; url: #ToASCII + type: abstract-op; text: ToUnicode; url: #ToUnicode ++ + + + + +
The URL standard takes the following approach towards making URLs fully interoperable: + +
Align RFC 3986 and RFC 3987 with contemporary implementations and + obsolete the RFCs in the process. (E.g., spaces, other "illegal" code points, + query encoding, equality, canonicalization, are all concepts not entirely + shared, or defined.) URL parsing needs to become as solid as HTML parsing. + [[RFC3986]] + [[RFC3987]] + +
Standardize on the term URL. URI and IRI are just confusing. In + practice a single algorithm is used for both so keeping them distinct is + not helping anyone. URL also easily wins the + search result popularity contest. + +
Supplanting Origin of a URI [sic]. + [[RFC6454]] + +
Define URL's existing JavaScript API in full detail and add
+ enhancements to make it easier to work with. Add a new URL
+ object as well for URL manipulation without usage of HTML elements. (Useful
+ for JavaScript worker environments.)
+
+
Ensure the combination of parser, serializer, and API guarantees idempotence. For example, a non-failure result of a parse-then-serialize operation will not change with any further parse-then-serialize operations applied to it. Similarly, the result of manipulating a non-failure result through the API will not change when any number of serialize-then-parse operations are applied to it. +
As the editors learn more about the subject matter the goals +might increase in scope somewhat. + + + +
This specification depends on Infra. [[!INFRA]] + +
Some terms used in this specification are defined in the following standards and specifications: + +
To serialize an integer, represent it as the shortest possible decimal +number. + + +
A validation error indicates a mismatch between input and +valid input. User agents, especially conformance checkers, are encouraged to report them somewhere. + +
A validation error does not mean that the parser terminates. Termination of a parser is + always stated explicitly, e.g., through a return statement. + +
It is useful to signal validation errors as error-handling can be non-intuitive, legacy + user agents might not implement correct error-handling, and the intent of what is written might be + unclear to other developers. +
Error type + | Error description + | Failure + + |
---|---|---|
IDNA + | ||
domain-to-ASCII + |
+ Unicode ToASCII records an error or returns the empty string. + [[UTS46]] + If details about Unicode ToASCII errors are + recorded, user agents are encouraged to pass those along. + | Yes + |
domain-to-Unicode + |
+ Unicode ToUnicode records an error. [[UTS46]] + The same considerations as with domain-to-ASCII apply. + | · + |
Host parsing + + | ||
domain-invalid-code-point + |
+ The input's host contains a forbidden domain code point. +
+
+ Hosts are percent-decoded before being processed when the URL
+ is special, which would result in the following host portion becoming
+ " " | Yes + + |
host-invalid-code-point + |
+ An opaque host (in a URL that is not special) contains a + forbidden host code point. + " | Yes + + |
IPv4-empty-part + |
+ An IPv4 address ends with a U+002E (.). + " | · + |
IPv4-too-many-parts + |
+ An IPv4 address does not consist of exactly 4 parts. + " | Yes + |
IPv4-non-numeric-part + |
+ An IPv4 address part is not numeric. + " | Yes + |
IPv4-non-decimal-part + |
+ The IPv4 address contains numbers expressed using hexadecimal or octal digits. + " | · + |
IPv4-out-of-range-part + |
+ An IPv4 address part exceeds 255. + " | Yes (only if applicable to the last part) + + |
IPv6-unclosed + |
+ An IPv6 address is missing the closing U+005D (]). + https://[::1" + | Yes + + |
IPv6-invalid-compression + |
+ An IPv6 address begins with improper compression. + " | Yes + |
IPv6-too-many-pieces + |
+ An IPv6 address contains more than 8 pieces. + " | Yes + |
IPv6-multiple-compression + |
+ An IPv6 address is compressed in more than one spot. + " | Yes + |
IPv6-invalid-code-point + |
+ An IPv6 address contains a code point that is neither an ASCII hex digit + nor a U+003A (:). Or it unexpectedly ends. +
+
+ " " | Yes + |
IPv6-too-few-pieces + |
+ An uncompressed IPv6 address contains fewer than 8 pieces. + " | Yes + |
IPv4-in-IPv6-too-many-pieces + |
+ An IPv6 address with IPv4 address syntax: the IPv6 address has more + than 6 pieces. + " | Yes + |
IPv4-in-IPv6-invalid-code-point + |
+ An IPv6 address with IPv4 address syntax: +
+
+ " " " " " | Yes + |
IPv4-in-IPv6-out-of-range-part + |
+ An IPv6 address with IPv4 address syntax: an IPv4 part exceeds 255. + " | Yes + |
IPv4-in-IPv6-too-few-parts + |
+ An IPv6 address with IPv4 address syntax: an IPv4 address contains + too few parts. + " | Yes + |
URL parsing + + | ||
invalid-URL-unit + |
+ A code point is found that is not a URL unit. +
+
+ " " " " | · + |
special-scheme-missing-following-solidus + |
+ The input's scheme is not followed by "
+
+ " "
+ | · + |
missing-scheme-non-relative-URL + |
+ The input is missing a scheme, because it does not begin with an + ASCII alpha, and either no base URL was provided or the base URL cannot be + used as a base URL because it has an opaque path. +
+
+ Input's scheme is missing and no base URL is given: +
+ Input's scheme is missing, but the base URL has an + opaque path. +
+ | Yes + |
invalid-reverse-solidus + |
+ The URL has a special scheme and it uses U+005C (\) instead of U+002F (/). + " | · + |
invalid-credentials + |
+ The input includes credentials. +
+
+ " " | Yes (only if there is no host) + |
host-missing + |
+ The input has a special scheme, but does not contain a host. +
+
+ " " | · + |
port-out-of-range + |
+ The input's port is too big. + " | Yes + |
port-invalid + |
+ The input's port is invalid. + " | Yes + |
file-invalid-Windows-drive-letter + |
+ The input is a relative-URL string that starts with a Windows drive letter and
+ the base URL's scheme is "file".
+ | · + |
file-invalid-Windows-drive-letter-host + |
+ A " | · + |
The EOF code point is a conceptual code point that signifies the end of a string or +code point stream. + +
A pointer for a string input is an integer that points to a +code point within input. Initially it points to the start of +input. If it is −1 it points nowhere. If it is greater than or equal to +input's code point length, it points to the EOF code point. + +
When a pointer is used, c references the code point the +pointer points to as long as it does not point nowhere. When the pointer points to +nowhere c cannot be used. + +
When a pointer is used, remaining references the +code point substring from the +pointer + 1 to the end of the string, as long as c is not the EOF code point. +When c is the EOF code point remaining cannot be used. + +
If "mailto:username@example
" is a string
+being processed and a pointer points to @, c is U+0040 (@) and remaining is
+"example
".
+
+
If the empty string is being processed and a pointer +points to the start and is then decreased by 1, using c or remaining would be an +error. + + +
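The following minimal Python sketch makes the pointer, c, and remaining conventions concrete. It is illustrative only. The `Pointer` class and the `EOF` sentinel are hypothetical names, and the sketch relies on Python `str` indexing being by code point.

```python
# Hypothetical sentinel standing in for the conceptual EOF code point.
EOF = object()


class Pointer:
    """Sketch of a pointer into a string of code points."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # Initially points to the start of the input.

    @property
    def c(self):
        # A pointer of -1 points nowhere; using c then is an error.
        if self.pos < 0:
            raise IndexError("pointer points nowhere")
        # At or past the code point length, the pointer points to EOF.
        if self.pos >= len(self.text):
            return EOF
        return self.text[self.pos]

    @property
    def remaining(self) -> str:
        # remaining cannot be used when c is EOF (or when pointing nowhere).
        if self.c is EOF:
            raise IndexError("remaining is undefined at EOF")
        return self.text[self.pos + 1:]
```

With `Pointer("mailto:username@example")` positioned at the `@`, `c` is `"@"` and `remaining` is `"example"`, matching the example above.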
A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits. + +
It is generally a good idea for sequences of percent-encoded bytes to be such that, when percent-decoded and then passed to UTF-8 decode without BOM or fail, they do not end up as failure. How important this is depends on where the percent-encoded bytes are used. E.g., for the host parser, not following this advice is fatal, whereas for URL rendering the percent-encoded bytes would not be rendered percent-decoded. + +
To percent-encode a byte byte, +return a string consisting of U+0025 (%), followed by two ASCII upper hex digits +representing byte. +
To percent-decode a +byte sequence input, run these steps: + +
Using anything but UTF-8 decode without BOM when input contains +bytes that are not ASCII bytes might be insecure and is not recommended. + +
Let output be an empty byte sequence. + +
For each byte byte in input: + +
If byte is not 0x25 (%), then append byte to output. + +
Otherwise, if byte is 0x25 (%) and the next two bytes after + byte in input are not in the ranges 0x30 (0) to 0x39 (9), + 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to + output. + +
Otherwise: + +
Let bytePoint be the two bytes after byte in input, + decoded, and then interpreted as hexadecimal number. + + +
Append a byte whose value is bytePoint to + output. + +
Skip the next two bytes in input. +
Return output. +
To percent-decode a scalar value string +input: + +
Let bytes be the UTF-8 encoding of input. + +
Return the percent-decoding of bytes. +
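The byte-level operations above can be sketched in Python as follows. This is a non-normative illustration. The function names are hypothetical, byte sequences are modeled as `bytes`, and scalar value strings are modeled as `str`.

```python
def percent_encode_byte(byte: int) -> str:
    # U+0025 (%) followed by two ASCII upper hex digits.
    return f"%{byte:02X}"


def _is_ascii_hex_byte(b: int) -> bool:
    # 0x30-0x39 (0-9), 0x41-0x46 (A-F), 0x61-0x66 (a-f), all inclusive.
    return 0x30 <= b <= 0x39 or 0x41 <= b <= 0x46 or 0x61 <= b <= 0x66


def percent_decode(data: bytes) -> bytes:
    output = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if (
            byte == 0x25
            and i + 2 < len(data)
            and _is_ascii_hex_byte(data[i + 1])
            and _is_ascii_hex_byte(data[i + 2])
        ):
            # Interpret the two hex digits as a byte value, then skip them.
            output.append(int(data[i + 1 : i + 3].decode("ascii"), 16))
            i += 3
        else:
            # Not a percent-encoded byte: copy it through unchanged.
            output.append(byte)
            i += 1
    return bytes(output)


def percent_decode_string(text: str) -> bytes:
    # UTF-8 encode first, then percent-decode the resulting bytes.
    return percent_decode(text.encode("utf-8"))
```

Malformed sequences such as `%s` or `%1G` are copied through as-is rather than rejected, which is why decoding `` `%25%s%1G` `` yields `` `%%s%1G` ``.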
In general, percent-encoding results in a string with more U+0025 (%) code points than the input, and percent-decoding results in a byte sequence with fewer 0x25 (%) bytes than the input.
The C0 control percent-encode set are the C0 controls +and all code points greater than U+007E (~). + +
The fragment percent-encode set is the C0 control percent-encode set and +U+0020 SPACE, U+0022 ("), U+003C (<), U+003E (>), and U+0060 (`). + +
The query percent-encode set is the C0 control percent-encode set and +U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>). + +
The query percent-encode set cannot be defined in terms of the +fragment percent-encode set due to the omission of U+0060 (`). + +
The special-query percent-encode set is the query percent-encode set and +U+0027 ('). + +
The path percent-encode set is the +query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}). + +
The userinfo percent-encode set is the +path percent-encode set and U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+0040 (@), +U+005B ([) to U+005E (^), inclusive, and U+007C (|). + +
The component percent-encode set is the userinfo percent-encode set and +U+0024 ($) to U+0026 (&), inclusive, U+002B (+), and U+002C (,). + +
This is used by HTML for
+{{NavigatorContentUtils/registerProtocolHandler()}}, and could also be used by other standards to
+percent-encode data that can then be embedded in a URL's path,
+query, or fragment; or in an opaque host. Using it with
+UTF-8 percent-encode gives identical results to JavaScript's
+encodeURIComponent()
[sic]. [[HTML]] [[ECMA-262]]
+
+
The application/x-www-form-urlencoded
percent-encode set is the
+component percent-encode set and U+0021 (!), U+0027 (') to U+0029 RIGHT PARENTHESIS,
+inclusive, and U+007E (~).
+
+
The application/x-www-form-urlencoded
percent-encode set contains
+all code points, except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and
+U+005F (_).
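The percent-encode sets can be modeled in Python as sketched below. The sketch assumes that each set is represented as the C0 control percent-encode set plus a set of extra ASCII code points. The constant and function names are hypothetical.

```python
def _in_c0_control_set(cp: str) -> bool:
    # C0 controls (U+0000 to U+001F) and every code point above U+007E (~).
    return ord(cp) <= 0x1F or ord(cp) > 0x7E


# Extra ASCII code points each set adds on top of the C0 control set.
FRAGMENT = frozenset(' "<>`')
QUERY = frozenset(' "#<>')  # Not derived from FRAGMENT: it omits U+0060 (`).
SPECIAL_QUERY = QUERY | {"'"}
PATH = QUERY | frozenset("?`{}")
USERINFO = PATH | frozenset("/:;=@[\\]^|")  # [ \ ] ^ is U+005B to U+005E.
COMPONENT = USERINFO | frozenset("$%&+,")  # $ % & is U+0024 to U+0026.
FORM_URLENCODED = COMPONENT | frozenset("!'()~")  # ' ( ) is U+0027 to U+0029.


def in_percent_encode_set(cp: str, extra: frozenset) -> bool:
    return _in_c0_control_set(cp) or cp in extra
```

Nesting the frozensets mirrors how each definition above extends the previous one. It also makes the note on the application/x-www-form-urlencoded percent-encode set easy to check mechanically.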
+
+
To percent-encode after encoding, given an encoding +encoding, scalar value string input, a +percentEncodeSet, and an optional boolean spaceAsPlus (default false): + +
Let encoder be the result of getting an encoder from encoding. + +
Let inputQueue be input converted to an I/O queue. + +
Let output be the empty string. + +
Let potentialError be 0. + +
This needs to be a non-null value to initiate the subsequent while loop. + +
While potentialError is non-null: + +
Let encodeOutput be an empty I/O queue. + +
Set potentialError to the result of running encode or fail with + inputQueue, encoder, and encodeOutput. + +
For each byte of encodeOutput converted to a byte sequence: + +
If spaceAsPlus is true and byte is 0x20 (SP), then append + U+002B (+) to output and continue. + +
Let isomorph be a code point whose value + is byte's value. + +
Assert: percentEncodeSet includes all non-ASCII code points. + +
If isomorph is not in percentEncodeSet, then append + isomorph to output. + +
Otherwise, percent-encode byte and append the result to + output. +
If potentialError is non-null, then append "%26%23
", followed by the
+ shortest sequence of ASCII digits representing potentialError in base
+ ten, followed by "%3B
", to output.
+
+
This can happen when encoding is not UTF-8. +
Return output. +
Of the possible values for the percentEncodeSet argument, only two end up
+encoding U+0025 (%) and thus give “roundtripable data”: component percent-encode set and
+application/x-www-form-urlencoded
percent-encode set. The other values for the
+percentEncodeSet argument — which happen to be used by the URL parser — leave
+U+0025 (%) untouched, and as such it needs to be
+percent-encoded first in order to be properly
+represented.
+
+
To UTF-8 percent-encode a +scalar value scalarValue using a percentEncodeSet, return the +result of running percent-encode after encoding with UTF-8, +scalarValue as a string, and percentEncodeSet. +
To UTF-8 percent-encode a scalar value string +input using a percentEncodeSet, return the result of running +percent-encode after encoding with UTF-8, input, and +percentEncodeSet. +
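Below is a minimal Python sketch of percent-encode after encoding and UTF-8 percent-encode. It makes several assumptions:

- The encoding is stateless, so encoding one code point at a time is equivalent to running encode or fail over the whole queue.
- Python's codecs (for example `"cp1252"`) stand in for the Encoding Standard's encoders, which they only approximate.
- Stateful encodings such as ISO-2022-JP are not handled.

The function names and the `in_userinfo_set` predicate are hypothetical.

```python
from typing import Callable

# Extra ASCII code points of the userinfo percent-encode set: query extras
# (space " # < >), path extras (? ` { }), and userinfo extras.
USERINFO_EXTRA = frozenset(' "#<>?`{}/:;=@[\\]^|')


def in_userinfo_set(cp: str) -> bool:
    # C0 control percent-encode set plus the extras above.
    return ord(cp) <= 0x1F or ord(cp) > 0x7E or cp in USERINFO_EXTRA


def percent_encode_after_encoding(
    encoding: str,
    text: str,
    in_set: Callable[[str], bool],
    space_as_plus: bool = False,
) -> str:
    output = []
    for ch in text:
        try:
            encoded = ch.encode(encoding)  # strict: raises if unmappable
        except UnicodeEncodeError:
            # The encoder reported an error: append "%26%23", the code
            # point in decimal, and "%3B".
            output.append(f"%26%23{ord(ch)}%3B")
            continue
        for byte in encoded:
            if space_as_plus and byte == 0x20:
                output.append("+")
                continue
            isomorph = chr(byte)  # code point whose value is the byte's value
            output.append(f"%{byte:02X}" if in_set(isomorph) else isomorph)
    return "".join(output)


def utf8_percent_encode(text: str, in_set: Callable[[str], bool]) -> str:
    return percent_encode_after_encoding("utf-8", text, in_set)
```

As the note above says, U+0025 (%) passes through the userinfo percent-encode set untouched. For example, `utf8_percent_encode("50%", in_userinfo_set)` returns `"50%"`.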
Here is a summary, by way of example, of the operations defined above: + +

| Operation | Input | Output |
|---|---|---|
| Percent-encode input | 0x23 | "%23" |
| Percent-encode input | 0x7F | "%7F" |
| Percent-decode input | `%25%s%1G` | `%%s%1G` |
| Percent-decode input | "‽%25%2E" | 0xE2 0x80 0xBD 0x25 0x2E |
| Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | " " | "%20" |
| Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | "≡" | "%81%DF" |
| Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | "‽" | "%26%238253%3B" |
| Percent-encode after encoding with ISO-2022-JP, input, and the userinfo percent-encode set | "¥" | "%1B(J\%1B(B" |
| Percent-encode after encoding with Shift_JIS, input, the userinfo percent-encode set, and true | "1+1 ≡ 2%20‽" | "1+1+%81%DF+2%20%26%238253%3B" |
| UTF-8 percent-encode input using the userinfo percent-encode set | U+2261 (≡) | "%E2%89%A1" |
| UTF-8 percent-encode input using the userinfo percent-encode set | U+203D (‽) | "%E2%80%BD" |
| UTF-8 percent-encode input using the userinfo percent-encode set | "Say what‽" | "Say%20what%E2%80%BD" |
The security of a URL is a function of its environment. Care is to be +taken when rendering, interpreting, and passing URLs around. + +
When rendering and allocating new URLs, "spoofing" needs to be considered: an attack whereby one host or URL can be confused for another. For instance, consider how 1/l/I, m/rn/rri, 0/O, and а/a can all appear eerily similar. Or worse, consider how U+202A LEFT-TO-RIGHT EMBEDDING and similar code points are invisible. [[UTR36]] + +
When passing a URL from party A to B, both need to +carefully consider what is happening. A might end up leaking data it does not +want to leak. B might receive input it did not expect and take an action that +harms the user. In particular, B should never trust A, as at some +point URLs from A can come from untrusted sources. + + + +
At a high level, a host, valid host string, host parser, and +host serializer relate as follows: + +
The host parser takes an arbitrary scalar value string and returns either + failure or a host. + +
A host can be seen as the in-memory representation. + +
A valid host string defines what input would not trigger a validation error + or failure when given to the host parser. I.e., input that would be considered conforming or + valid. + +
The host serializer takes a host and returns an ASCII string. (If + that string is then parsed, the result will equal the + host that was serialized.) +
A parse-serialize roundtrip gives the following results, depending on the isNotSpecial argument to the host parser: + +

| Input | Output (isNotSpecial = false) | Output (isNotSpecial = true) |
|---|---|---|
| EXAMPLE.COM | example.com (domain) | EXAMPLE.COM (opaque host) |
| example%2Ecom | example.com (domain) | example%2Ecom (opaque host) |
| faß.example | xn--fa-hia.example (domain) | fa%C3%9F.example (opaque host) |
| 0 | 0.0.0.0 (IPv4) | 0 (opaque host) |
| %30 | 0.0.0.0 (IPv4) | %30 (opaque host) |
| 0x | 0.0.0.0 (IPv4) | 0x (opaque host) |
| 0xffffffff | 255.255.255.255 (IPv4) | 0xffffffff (opaque host) |
| [0:0::1] | [::1] (IPv6) | [::1] (IPv6) |
| [0:0::1%5D | Failure | Failure |
| [0:0::%31] | Failure | Failure |
| 09 | Failure | 09 (opaque host) |
| example.255 | Failure | example.255 (opaque host) |
| example^example | Failure | Failure |
A host is a domain, an IP address, an opaque host, or an empty host. Typically a host serves as a network address, but it is sometimes used as an opaque identifier in URLs where a network address is not necessary. + +
A typical URL whose host is
+an opaque host is git://github.com/whatwg/url.git
.
+
+
The RFCs referenced in the paragraphs below are for informative purposes only. They have no influence on host writing, parsing, and serialization unless stated otherwise in the sections that follow. + +
A domain is a non-empty ASCII string that identifies a +realm within a network. +[[RFC1034]] + +
The domain labels of a domain domain are +the result of strictly splitting domain on U+002E (.). + +
The example.com
and example.com.
domains are
+not equivalent and typically treated as distinct.
+
+
An IP address is an IPv4 address or an IPv6 address. + +
An IPv4 address is a 32-bit unsigned integer that identifies a +network address. +[[RFC791]] + +
An IPv6 address is a 128-bit unsigned integer that identifies a +network address. For the purposes of this standard it is represented as a list of eight +16-bit unsigned integers, also known as +IPv6 pieces. +[[RFC4291]] + +
Support for <zone_id>
is
+intentionally omitted.
+
+
An opaque host is a non-empty ASCII string that can be used for further +processing. + +
An empty host is the empty string. + + +
A forbidden host code point is U+0000 NULL, U+0009 TAB, U+000A LF, U+000D CR, +U+0020 SPACE, U+0023 (#), U+002F (/), U+003A (:), U+003C (<), U+003E (>), U+003F (?), U+0040 (@), +U+005B ([), U+005C (\), U+005D (]), U+005E (^), or U+007C (|). + +
A forbidden domain code point is a forbidden host code point, +a C0 control, U+0025 (%), or U+007F DELETE. + +
To obtain the public suffix of a host host, +run these steps. They return null or a domain representing a portion of host +that is included on the Public Suffix List. [[!PSL]] + +
If host is not a domain, then return null. + +
Let trailingDot be ".
" if host
+ ends with ".
"; otherwise the empty string.
+
+
Let publicSuffix be the public suffix determined by running the + Public Suffix List algorithm + with host as domain. [[!PSL]] + +
Assert: publicSuffix is an ASCII string that does not
+ end with ".
".
+
+
Return publicSuffix and trailingDot concatenated. +
To obtain the registrable domain of a host +host, run these steps. They return null or a domain formed by +host's public suffix and the domain label preceding it, if +any. + +
If host's public suffix is null or host's + public suffix equals host, then return null. + +
Let trailingDot be ".
" if host
+ ends with ".
"; otherwise the empty string.
+
+
Let registrableDomain be the registrable domain determined by running the + Public Suffix List algorithm + with host as domain. [[!PSL]] + +
Assert: registrableDomain is an ASCII string that does not
+ end with ".
".
+
+
Return registrableDomain and trailingDot concatenated. +
| Host input | Public suffix | Registrable domain |
|---|---|---|
| com | com | null |
| example.com | com | example.com |
| www.example.com | com | example.com |
| sub.www.example.com | com | example.com |
| EXAMPLE.COM | com | example.com |
| example.com. | com. | example.com. |
| github.io | github.io | null |
| whatwg.github.io | github.io | whatwg.github.io |
| إختبار | xn--kgbechtv | null |
| example.إختبار | xn--kgbechtv | example.xn--kgbechtv |
| sub.example.إختبار | xn--kgbechtv | example.xn--kgbechtv |
| [2001:0db8:85a3:0000:0000:8a2e:0370:7334] | null | null |
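The public suffix and registrable domain steps can be sketched in Python as below, reproducing several rows of the table above. This is a simplification that makes the following assumptions:

- A hypothetical three-entry list stands in for the Public Suffix List, and wildcard and exception rules are ignored.
- The trailing dot is stripped before the lookup and re-appended afterward.
- The input is already a parsed, lowercased domain. IP addresses would yield null before reaching this code.

```python
from typing import Optional, Tuple

# Hypothetical, drastically reduced stand-in for the Public Suffix List.
TOY_PUBLIC_SUFFIXES = frozenset({"com", "github.io", "xn--kgbechtv"})


def _split_trailing_dot(domain: str) -> Tuple[str, str]:
    return (domain[:-1], ".") if domain.endswith(".") else (domain, "")


def _toy_psl_lookup(domain: str) -> str:
    # Longest listed suffix wins; with no match, fall back to the implicit
    # "*" rule (the last label). Wildcard and exception rules are ignored.
    labels = domain.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in TOY_PUBLIC_SUFFIXES:
            return candidate
    return labels[-1]


def public_suffix(domain: str) -> str:
    bare, trailing_dot = _split_trailing_dot(domain)
    return _toy_psl_lookup(bare) + trailing_dot


def registrable_domain(domain: str) -> Optional[str]:
    suffix = public_suffix(domain)
    if suffix == domain:
        return None  # The domain is itself a public suffix.
    bare, trailing_dot = _split_trailing_dot(domain)
    suffix_label_count = len(_split_trailing_dot(suffix)[0].split("."))
    labels = bare.split(".")
    # The public suffix plus the one domain label preceding it.
    return ".".join(labels[-(suffix_label_count + 1):]) + trailing_dot
```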
Specifications should prefer the origin concept for security decisions. The notions of "public suffix" and "registrable domain" cannot be relied upon to provide a hard security boundary, as the public suffix list will diverge from client to client. Specifications that ignore this advice are encouraged to carefully consider whether URLs' schemes ought to be incorporated into any decisions made, i.e. whether to use the same site or schemelessly same site concepts. + + +
The domain to ASCII algorithm, given a string +domain and a boolean beStrict, runs these steps: + +
Let result be the result of running Unicode ToASCII + with domain_name set to domain, UseSTD3ASCIIRules set to + beStrict, CheckHyphens set to false, CheckBidi set to true, + CheckJoiners set to true, Transitional_Processing set to false, + and VerifyDnsLength set to beStrict. [[!UTS46]] + +
If beStrict is false, domain is an ASCII string, and
+ strictly splitting domain on U+002E (.) does not produce any
+ item that starts with an ASCII case-insensitive match for
+ "xn--
", this step is equivalent to ASCII lowercasing domain.
+
+
If result is a failure value, domain-to-ASCII validation error, + return failure. + +
If result is the empty string, domain-to-ASCII validation error, + return failure. + +
Return result. +
This document and the web platform at large use
+Unicode IDNA Compatibility Processing and not IDNA2008. For instance,
+☕.example
becomes xn--53h.example
and not failure. [[UTS46]] [[RFC5890]]
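Full Unicode ToASCII is out of scope for a short example, and Python's built-in `idna` codec implements IDNA 2003 rather than UTS #46. The following hypothetical Python helper therefore sketches only the ASCII shortcut described in the note to domain to ASCII. It returns `None` whenever full UTS #46 processing would still be required.

```python
from typing import Optional


def domain_to_ascii_fast_path(domain: str, be_strict: bool) -> Optional[str]:
    """Return the ASCII-lowercased domain when the shortcut applies, else None.

    None means full UTS #46 ToASCII processing would still be needed; that
    processing is not implemented in this sketch.
    """
    if be_strict or not domain.isascii():
        return None
    # Any label starting with a case-insensitive "xn--" needs real processing.
    for label in domain.split("."):
        if label[:4].lower() == "xn--":
            return None
    return domain.lower()  # ASCII input, so this is ASCII lowercasing.
```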
+
+
The domain to Unicode algorithm, given a domain +domain and a boolean beStrict, runs these steps: + +
Let result be the result of running + Unicode ToUnicode with domain_name set to domain, + CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, + UseSTD3ASCIIRules set to beStrict, and Transitional_Processing set to + false. [[!UTS46]] + +
Signify domain-to-Unicode validation errors for any returned errors, and then, + return result. +
A valid host string must be a valid domain string, a +valid IPv4-address string, or: U+005B ([), followed by a +valid IPv6-address string, followed by U+005D (]). + +
A domain is a valid domain if these steps return success: + +
Let result be the result of running domain to ASCII with domain + and true. + +
If result is failure, then return failure. + +
Set result to the result of running domain to Unicode with + result and true. + +
If result contains any errors, return failure. + +
Return success. +
Ideally we define this in terms of a sequence of code points that make up a +valid domain rather than through a whack-a-mole: +issue 245. + +
A valid domain string must be a string that is a +valid domain. + +
A valid IPv4-address string must be four shortest +possible strings of ASCII digits, representing a decimal number in the range 0 to 255, +inclusive, separated from each other by U+002E (.). + +
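A minimal Python check for a valid IPv4-address string, written directly from the definition above. The function name is hypothetical.

```python
def is_valid_ipv4_address_string(s: str) -> bool:
    parts = s.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # ASCII digits only (str.isdigit() would also accept non-ASCII digits).
        if not part or not all("0" <= ch <= "9" for ch in part):
            return False
        # "Shortest possible": no leading zeros, except for "0" itself.
        if len(part) > 1 and part[0] == "0":
            return False
        if int(part) > 255:
            return False
    return True
```

Note that inputs such as `"127.1"` or `"0x7f.0.0.1"` are accepted by the IPv4 parser but are not valid IPv4-address strings.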
A valid IPv6-address string is defined in the +"Text Representation of Addresses" chapter of IP Version 6 Addressing Architecture. +[[!RFC4291]] + + +
A valid opaque-host string must be one of the following: + +
one or more URL units excluding forbidden host code points +
U+005B ([), followed by a valid IPv6-address string, followed by U+005D (]). +
This is not part of the definition of valid host string as it requires context +to be distinguished. + + +
The host parser takes a +scalar value string input with an optional boolean isNotSpecial +(default false), and then runs these steps. They return failure or a host. + +
If input starts with U+005B ([), then: + +
If input does not end with U+005D (]), IPv6-unclosed + validation error, return failure. + +
Return the result of IPv6 parsing input with its + leading U+005B ([) and trailing U+005D (]) removed. +
If isNotSpecial is true, then return the result of + opaque-host parsing input. + +
Assert: input is not the empty string. + +
Let domain be the result of running UTF-8 decode without BOM on the + percent-decoding of input. + +
Alternatively UTF-8 decode without BOM or fail can be used, coupled with an + early return for failure, as domain to ASCII fails on U+FFFD (�). + +
Let asciiDomain be the result of running domain to ASCII with + domain and false. + +
If asciiDomain is failure, then return failure. + +
If asciiDomain contains a forbidden domain code point, + domain-invalid-code-point validation error, return failure. + +
If asciiDomain ends in a number, then return + the result of IPv4 parsing asciiDomain. + +
Return asciiDomain. +
The ends in a number checker takes an ASCII string input and then +runs these steps. They return a boolean. + +
Let parts be the result of strictly splitting input on + U+002E (.). + +
If the last item in parts is the empty string, then: + +
If parts's size is 1, then return false. + +
Remove the last item from parts. + +
Let last be the last item in parts. + +
If last is non-empty and contains only ASCII digits, then return true. + +
The erroneous input "09
" will be caught by the IPv4 parser at a
+ later stage.
+
+
If parsing last as an IPv4 number does not return + failure, then return true. + +
This is equivalent to checking that last is "0X
" or
+ "0x
", followed by zero or more ASCII hex digits.
+
+
Return false. +
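A Python sketch of the ends in a number checker follows. It is illustrative only. It uses the hexadecimal shortcut from the note above instead of calling the IPv4 number parser. It also assumes that a trailing empty item is removed unless it is the only item, in which case the checker returns false.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def ends_in_a_number(text: str) -> bool:
    # str.split(".") splits strictly: empty items are kept.
    parts = text.split(".")
    if parts[-1] == "":
        # Assumed handling of a trailing empty item (see the lead-in).
        if len(parts) == 1:
            return False
        parts.pop()
    last = parts[-1]
    # Non-empty and only ASCII digits ("09" is rejected later by the IPv4 parser).
    if last and all("0" <= ch <= "9" for ch in last):
        return True
    # Stands in for "parsing last as an IPv4 number does not return failure":
    # "0x" or "0X" followed by zero or more ASCII hex digits.
    return last[:2] in ("0x", "0X") and all(ch in HEX_DIGITS for ch in last[2:])
```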
The IPv4 parser takes an ASCII string input +and then runs these steps. They return failure or an IPv4 address. + +
The IPv4 parser is not to be invoked directly. Instead check that the +return value of the host parser is an IPv4 address. + +
Let parts be the result of strictly splitting input on + U+002E (.). + +
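If the last item in parts is the empty string, then: + +
IPv4-empty-part validation error. + +
If parts's size is greater than 1, then remove the last item from parts. + +
If parts's size is greater than 4, IPv4-too-many-parts validation error, return failure. + +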
Let numbers be an empty list. + +
For each part of parts: + +
Let result be the result of parsing + part. + +
If result is failure, IPv4-non-numeric-part validation error, + return failure. + +
If result[1] is true, IPv4-non-decimal-part validation error. + +
Append result[0] to numbers. +
If any item in numbers is greater than 255, IPv4-out-of-range-part + validation error. + +
If any but the last item in numbers is greater than 255, then + return failure. + +
If the last item in numbers is greater than or equal to 256^(5 − numbers's size), then return failure. + +
Let ipv4 be the last item in numbers. + +
Remove the last item from numbers. + +
Let counter be 0. + +
For each n of numbers: + +
Increment ipv4 by n × 256^(3 − counter). + +
Increment counter by 1. +
Return ipv4. +
The IPv4 number parser takes an ASCII string input and then runs +these steps. They return failure or a tuple of a number and a boolean. + +
If input is the empty string, then return failure. + +
Let validationError be false. + +
Let R be 10. + +
If input contains at least two code points and the first two code points are either
+ "0X
" or "0x
", then:
+
+
Set validationError to true. + +
Remove the first two code points from input. + +
Set R to 16. +
Otherwise, if input contains at least two code points and the first code point is + U+0030 (0), then: + + +
Set validationError to true. + +
Remove the first code point from input. + +
Set R to 8. +
If input is the empty string, then return (0, true). + + +
If input contains a code point that is not a radix-R digit, then + return failure. + + +
Let output be the mathematical integer value that is represented by + input in radix-R notation, using ASCII hex digits for digits with + values 0 through 15. + + +
Return (output, validationError). +
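The IPv4 number parser and the IPv4 parser can be sketched together in Python. This hypothetical sketch makes the following choices:

- Failure is represented as `None`.
- Validation errors are appended by name to a list supplied by the caller.
- A trailing empty part is assumed to record IPv4-empty-part and to be removed when other parts remain.

The final loop adds the last number once and shifts each remaining number into its own higher-order byte.

```python
from typing import List, Optional, Tuple


def parse_ipv4_number(text: str) -> Optional[Tuple[int, bool]]:
    """Return (number, validationError), or None for failure."""
    if text == "":
        return None
    validation_error = False
    radix = 10
    if len(text) >= 2 and text[:2] in ("0x", "0X"):
        validation_error, text, radix = True, text[2:], 16
    elif len(text) >= 2 and text[0] == "0":
        validation_error, text, radix = True, text[1:], 8
    if text == "":
        return (0, True)
    allowed = "0123456789abcdefABCDEF" if radix == 16 else "0123456789"[:radix]
    if not all(ch in allowed for ch in text):
        return None
    return (int(text, radix), validation_error)


def parse_ipv4(text: str, errors: List[str]) -> Optional[int]:
    """Return a 32-bit integer, or None for failure; appends validation errors."""
    parts = text.split(".")
    if parts[-1] == "":
        errors.append("IPv4-empty-part")  # assumed handling, see lead-in
        if len(parts) > 1:
            parts.pop()
    if len(parts) > 4:
        errors.append("IPv4-too-many-parts")
        return None
    numbers = []
    for part in parts:
        result = parse_ipv4_number(part)
        if result is None:
            errors.append("IPv4-non-numeric-part")
            return None
        if result[1]:
            errors.append("IPv4-non-decimal-part")
        numbers.append(result[0])
    if any(n > 255 for n in numbers):
        errors.append("IPv4-out-of-range-part")
    if any(n > 255 for n in numbers[:-1]):
        return None
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return None
    # The last number fills the low-order bytes; each other number takes one
    # byte from the top, so the last number is counted only once.
    ipv4 = numbers[-1]
    for counter, n in enumerate(numbers[:-1]):
        ipv4 += n * 256 ** (3 - counter)
    return ipv4
```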
The IPv6 parser takes a scalar value string +input and then runs these steps. They return failure or an IPv6 address. + +
The IPv6 parser could in theory be invoked directly, but please discuss +actually doing that with the editors of this document first. + +
Let address be a new IPv6 address whose IPv6 pieces are all 0. + +
Let pieceIndex be 0. + +
Let compress be null. + +
Let pointer be a pointer for input. + +
If c is U+003A (:), then: + +
If remaining does not start with U+003A (:), IPv6-invalid-compression + validation error, return failure. + +
Increase pointer by 2. + +
Increase pieceIndex by 1 and then set compress to + pieceIndex. +
While c is not the EOF code point: + +
If pieceIndex is 8, IPv6-too-many-pieces validation error, return + failure. + +
If c is U+003A (:), then: + +
If compress is non-null, IPv6-multiple-compression validation error, return failure. + +
Increase pointer and pieceIndex by 1, set compress to pieceIndex, and then continue. + +
Let value and length be 0. + +
While length is less than 4 and c is an ASCII hex digit, set + value to value × 0x10 + c interpreted as hexadecimal number, + and increase pointer and length by 1. + +
If c is U+002E (.), then: + +
If length is 0, IPv4-in-IPv6-invalid-code-point + validation error, return failure. + +
Decrease pointer by length. + +
If pieceIndex is greater than 6, IPv4-in-IPv6-too-many-pieces + validation error, return failure. + +
Let numbersSeen be 0. + +
While c is not the EOF code point: + +
Let ipv4Piece be null. + +
If numbersSeen is greater than 0, then: + +
If c is a U+002E (.) and numbersSeen is less than 4, then increase pointer by 1. + +
Otherwise, IPv4-in-IPv6-invalid-code-point validation error, return failure. + +
If c is not an ASCII digit, IPv4-in-IPv6-invalid-code-point + validation error, return failure. + + +
While c is an ASCII digit: + +
Let number be c interpreted as decimal number. + +
If ipv4Piece is null, then set ipv4Piece to number. + +
Otherwise, if ipv4Piece is 0, IPv4-in-IPv6-invalid-code-point + validation error, return failure. + +
Otherwise, set ipv4Piece to ipv4Piece × 10 + + number. + +
If ipv4Piece is greater than 255, IPv4-in-IPv6-out-of-range-part + validation error, return failure. + +
Increase pointer by 1. +
Set address[pieceIndex] to + address[pieceIndex] × 0x100 + ipv4Piece. + +
Increase numbersSeen by 1. + +
If numbersSeen is 2 or 4, then increase pieceIndex by 1. +
If numbersSeen is not 4, IPv4-in-IPv6-too-few-parts + validation error, return failure. + +
Break. +
Otherwise, if c is U+003A (:): + +
Increase pointer by 1. + +
If c is the EOF code point, IPv6-invalid-code-point + validation error, return failure. +
Otherwise, if c is not the EOF code point, IPv6-invalid-code-point + validation error, return failure. + +
Set address[pieceIndex] to value. + +
Increase pieceIndex by 1. +
If compress is non-null, then: + +
Let swaps be pieceIndex − compress. + +
Set pieceIndex to 7. + +
While pieceIndex is not 0 and swaps is greater than 0, swap + address[pieceIndex] with + address[compress + swaps − 1], and then decrease both + pieceIndex and swaps by 1. +
Otherwise, if compress is null and pieceIndex is not 8, + IPv6-too-few-pieces validation error, return failure. + +
Return address. +
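The following is a close Python transcription of the IPv6 parser, offered as an unofficial sketch. Failure is `None`, the error names appear only as comments, and a `None` result from the local helper `c()` plays the role of the EOF code point. The sketch also assumes two behaviors:

- A U+003A (:) met inside the main loop (when compress is still null) advances pointer and pieceIndex, sets compress, and continues.
- Any code point other than U+002E (.) between IPv4 numbers is a failure.

```python
from typing import List, Optional

HEX = "0123456789abcdefABCDEF"
DIGITS = "0123456789"


def parse_ipv6(text: str) -> Optional[List[int]]:
    """Return eight 16-bit pieces, or None for failure."""
    address = [0] * 8
    piece_index = 0
    compress = None
    pointer = 0

    def c(offset: int = 0) -> Optional[str]:
        i = pointer + offset
        return text[i] if i < len(text) else None  # None stands for EOF

    if c() == ":":
        if c(1) != ":":
            return None  # IPv6-invalid-compression
        pointer += 2
        piece_index += 1
        compress = piece_index
    while c() is not None:
        if piece_index == 8:
            return None  # IPv6-too-many-pieces
        if c() == ":":
            if compress is not None:
                return None  # IPv6-multiple-compression
            pointer += 1
            piece_index += 1
            compress = piece_index
            continue
        value = length = 0
        while length < 4 and c() is not None and c() in HEX:
            value = value * 0x10 + int(c(), 16)
            pointer += 1
            length += 1
        if c() == ".":
            if length == 0:
                return None  # IPv4-in-IPv6-invalid-code-point
            pointer -= length
            if piece_index > 6:
                return None  # IPv4-in-IPv6-too-many-pieces
            numbers_seen = 0
            while c() is not None:
                ipv4_piece = None
                if numbers_seen > 0:
                    if c() == "." and numbers_seen < 4:
                        pointer += 1
                    else:
                        return None  # IPv4-in-IPv6-invalid-code-point
                if c() is None or c() not in DIGITS:
                    return None  # IPv4-in-IPv6-invalid-code-point
                while c() is not None and c() in DIGITS:
                    number = int(c())
                    if ipv4_piece is None:
                        ipv4_piece = number
                    elif ipv4_piece == 0:
                        return None  # IPv4-in-IPv6-invalid-code-point (leading 0)
                    else:
                        ipv4_piece = ipv4_piece * 10 + number
                    if ipv4_piece > 255:
                        return None  # IPv4-in-IPv6-out-of-range-part
                    pointer += 1
                address[piece_index] = address[piece_index] * 0x100 + ipv4_piece
                numbers_seen += 1
                if numbers_seen in (2, 4):
                    piece_index += 1
            if numbers_seen != 4:
                return None  # IPv4-in-IPv6-too-few-parts
            break
        elif c() == ":":
            pointer += 1
            if c() is None:
                return None  # IPv6-invalid-code-point
        elif c() is not None:
            return None  # IPv6-invalid-code-point
        address[piece_index] = value
        piece_index += 1
    if compress is not None:
        # Move the pieces after the compression point to the end.
        swaps = piece_index - compress
        piece_index = 7
        while piece_index != 0 and swaps > 0:
            j = compress + swaps - 1
            address[piece_index], address[j] = address[j], address[piece_index]
            piece_index -= 1
            swaps -= 1
    elif piece_index != 8:
        return None  # IPv6-too-few-pieces
    return address
```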
The opaque-host parser takes a +scalar value string input, and then runs these steps. They return failure or an +opaque host. + +
If input contains a forbidden host code point, + host-invalid-code-point validation error, return failure. + +
If input contains a code point that is not a URL code point and not + U+0025 (%), invalid-URL-unit validation error. + +
If input contains a U+0025 (%) and the two code points following it are + not ASCII hex digits, invalid-URL-unit validation error. + +
Return the result of running UTF-8 percent-encode on input + using the C0 control percent-encode set. +
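A minimal Python sketch of the opaque-host parser. It records validation errors by name in a list supplied by the caller and returns `None` for failure. It omits the URL-code-point check, because URL code points are not defined in this section; only the check on U+0025 (%) is included.

```python
from typing import List, Optional

FORBIDDEN_HOST_CODE_POINTS = frozenset("\x00\t\n\r #/:<>?@[\\]^|")
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def parse_opaque_host(text: str, errors: List[str]) -> Optional[str]:
    if any(ch in FORBIDDEN_HOST_CODE_POINTS for ch in text):
        errors.append("host-invalid-code-point")
        return None
    # A U+0025 (%) not followed by two ASCII hex digits is only a
    # validation error, not a failure.
    bad_percent = any(
        ch == "%"
        and not (
            i + 2 < len(text)
            and text[i + 1] in HEX_DIGITS
            and text[i + 2] in HEX_DIGITS
        )
        for i, ch in enumerate(text)
    )
    if bad_percent:
        errors.append("invalid-URL-unit")
    # UTF-8 percent-encode using the C0 control percent-encode set.
    output = []
    for ch in text:
        if ord(ch) <= 0x1F or ord(ch) > 0x7E:
            output.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            output.append(ch)
    return "".join(output)
```

Because U+0025 (%) is not in the C0 control percent-encode set, existing percent-encoded bytes such as `%30` are preserved, as in the host roundtrip table.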
The host serializer takes a +host host and then runs these steps. They return an ASCII string. + +
If host is an IPv4 address, return the result of + running the IPv4 serializer on host. + +
Otherwise, if host is an IPv6 address, return U+005B ([), followed by the + result of running the IPv6 serializer on host, followed by U+005D (]). + +
Otherwise, host is a domain, opaque host, or empty host, + return host. +
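The IPv4 serializer takes an IPv4 address address and then runs these steps. They return an ASCII string. + +
Let output be the empty string. + +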
Let n be the value of address. + +
For each i in the range 1 to 4, inclusive: + +
Prepend n % 256, serialized, to + output. + +
If i is not 4, then prepend U+002E (.) to output. + +
Set n to floor(n / 256). +
Return output. +
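The IPv4 serializer steps translate almost line for line into Python. This is a hypothetical helper, with the address modeled as a Python `int`.

```python
def serialize_ipv4(address: int) -> str:
    output = ""
    n = address
    for i in range(1, 5):
        # Prepend the low-order byte, serialized as a decimal number.
        output = str(n % 256) + output
        if i != 4:
            output = "." + output
        n //= 256  # floor(n / 256)
    return output
```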
The IPv6 serializer takes an IPv6 address +address and then runs these steps. They return an ASCII string. + +
Let output be the empty string. + +
Let compress be an index to the first IPv6 piece in the first longest sequence of address's IPv6 pieces that are 0.
In 0:f:0:0:f:f:0:0 it would point to the second 0 (the piece at index 2).
If there is no sequence of address's IPv6 pieces that are 0 that is + longer than 1, then set compress to null. + +
Let ignore0 be false. + +
For each pieceIndex in the range 0 to 7, inclusive: + +
If ignore0 is true and address[pieceIndex] is 0, then + continue. + +
Otherwise, if ignore0 is true, set ignore0 to false. + +
If compress is pieceIndex, then: + +
Let separator be "::" if pieceIndex is 0, and U+003A (:) otherwise.
Append separator to output. + +
Set ignore0 to true and continue. +
Append address[pieceIndex], represented as the shortest possible + lowercase hexadecimal number, to output. + +
If pieceIndex is not 7, then append U+003A (:) to output. +
Return output. +
This algorithm requires the recommendation from +A Recommendation for IPv6 Address Text Representation. +[[RFC5952]] + + + + +
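Here is a Python sketch of the IPv6 serializer, assuming the address is a list of eight integers in the range 0 to 0xFFFF. The helper first_longest_zero_run is hypothetical. It uses a strict "longer than" comparison, so when several runs of zeros are equally long, the first one is chosen.

```python
from typing import List, Optional


def first_longest_zero_run(address: List[int]) -> Optional[int]:
    """Index of the first piece in the first longest run of 0 pieces,
    or None if no run of 0 pieces is longer than 1."""
    best_start: Optional[int] = None
    best_length = 1  # a run must be longer than 1 to qualify
    i = 0
    while i < 8:
        if address[i] != 0:
            i += 1
            continue
        start = i
        while i < 8 and address[i] == 0:
            i += 1
        if i - start > best_length:  # strictly longer, so the first run wins ties
            best_start, best_length = start, i - start
    return best_start


def serialize_ipv6(address: List[int]) -> str:
    """IPv6 serializer sketch; address holds eight 16-bit pieces."""
    output = ""
    compress = first_longest_zero_run(address)
    ignore0 = False
    for piece_index in range(8):
        if ignore0 and address[piece_index] == 0:
            continue
        elif ignore0:
            ignore0 = False
        if compress == piece_index:
            output += "::" if piece_index == 0 else ":"
            ignore0 = True
            continue
        output += format(address[piece_index], "x")  # shortest lowercase hex
        if piece_index != 7:
            output += ":"
    return output
```

With [0, 0xf, 0, 0, 0xf, 0xf, 0, 0], compress is 2 and the output is "0:f::f:f:0:0".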
To determine whether a host A +equals host B, +return true if A is B, and false otherwise. + +
Certificate comparison requires a host equivalence check that ignores the trailing dot of a domain (if any). However, those hosts also have various other facets enforced, such as DNS length, that are not enforced here, as URLs do not enforce them. If anyone has a good suggestion for how to bring these two closer together, or what a good unified model would be, please file an issue.
At a high level, a URL, valid URL string, URL parser, and +URL serializer relate as follows: + +
The URL parser takes an arbitrary scalar value string and returns either + failure or a URL. It might also record zero or more validation errors. + +
A URL can be seen as the in-memory representation. + +
A valid URL string defines what input would not trigger a validation error or + failure when given to the URL parser. I.e., input that would be considered conforming or + valid. + +
The URL serializer takes a URL and returns an ASCII string. (If + that string is then parsed, the result will equal the URL that was serialized.) The output of the + URL serializer is not always a valid URL string. +
| Input | Base | Valid | Output |
|---|---|---|---|
| `https:example.org` | | ❌ | `https://example.org/` |
| `https://////example.com///` | | ❌ | `https://example.com///` |
| `https://example.com/././foo` | | ✅ | `https://example.com/foo` |
| `hello:world` | `https://example.com/` | ✅ | `hello:world` |
| `https:example.org` | `https://example.com/` | ❌ | `https://example.com/example.org` |
| `\example\..\demo/.\` | `https://example.com/` | ❌ | `https://example.com/demo/` |
| `example` | `https://example.com/demo` | ✅ | `https://example.com/example` |
| `file:///C\|/demo` | | ❌ | `file:///C:/demo` |
| `..` | `file:///C:/demo` | ✅ | `file:///C:/` |
| `file://loc%61lhost/` | | ✅ | `file:///` |
| `https://user:password@example.org/` | | ❌ | `https://user:password@example.org/` |
| `https://example.org/foo bar` | | ❌ | `https://example.org/foo%20bar` |
| `https://EXAMPLE.com/../x` | | ✅ | `https://example.com/x` |
| `https://ex ample.org/` | | ❌ | Failure |
| `example` | | ❌, due to lack of base | Failure |
| `https://example.com:demo` | | ❌ | Failure |
| `http://[www.example.com]/` | | ❌ | Failure |
| `https://example.org//` | | ✅ | `https://example.org//` |
| `https://example.com/[]?[]#[]` | | ❌ | `https://example.com/[]?[]#[]` |
| `https://example/%?%#%` | | ❌ | `https://example/%?%#%` |
| `https://example/%25?%25#%25` | | ✅ | `https://example/%25?%25#%25` |
The base and output URL are represented in + serialized form for brevity. +
A URL is a struct that +represents a universal identifier. To disambiguate from a valid URL string it can also be +referred to as a URL record. + +
A URL's scheme is an +ASCII string that identifies the type of URL and can be used to +dispatch a URL for further processing after parsing. +It is initially the empty string. + +
A URL's username is an +ASCII string identifying a username. It is initially the empty string. + +
A URL's password is an +ASCII string identifying a password. It is initially the empty string. + +
A URL's host is null or a +host. It is initially null. + +
The following table lists the allowed combinations of a URL's scheme and host.

| Scheme / host | Domain | IPv4 address | IPv6 address | Opaque host | Empty host | Null |
|---|---|---|---|---|---|---|
| Special schemes excluding "file" | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| "file" | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
| Others | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
A URL's port is either +null or a 16-bit unsigned integer that identifies a networking port. It is initially null. + +
A URL's +path +is either a URL path segment or a list of zero or more URL path segments, +usually identifying a location. It is initially « ». + +
A special URL's path is always a +list, i.e., it is never opaque. + +
A URL's query is either +null or an ASCII string. It is initially null. + +
A URL's fragment is either null or +an ASCII string that can be used for further processing on the resource the +URL's other components identify. It is initially null. + +
A URL also has an associated +blob URL entry that is either null or a +blob URL entry. It is initially null. + +
This is used to support caching the object a "blob" URL refers to, as well as its origin. These must be cached because the URL might be removed from the blob URL store between parsing and fetching, and fetching still needs to succeed.
The following table lists how valid URL strings, when parsed, map + to a URL's components. Username, password, and + blob URL entry are omitted; in the examples below they are the empty string, the + empty string, and null, respectively. + +
| Input | Scheme | Host | Port | Path | Query | Fragment |
|---|---|---|---|---|---|---|
| `https://example.com/` | "https" | "example.com" | null | « the empty string » | null | null |
| `https://localhost:8000/search?q=text#hello` | "https" | "localhost" | 8000 | « "search" » | "q=text" | "hello" |
| `urn:isbn:9780307476463` | "urn" | null | null | "isbn:9780307476463" | null | null |
| `file:///ada/Analytical%20Engine/README.md` | "file" | the empty host | null | « "ada", "Analytical%20Engine", "README.md" » | null | null |
A URL path segment is an ASCII string. It commonly refers to a +directory or a file, but has no predefined meaning. + +
A single-dot URL path segment is a URL path segment that is "." or an ASCII case-insensitive match for "%2e".

A double-dot URL path segment is a URL path segment that is ".." or an ASCII case-insensitive match for ".%2e", "%2e.", or "%2e%2e".
A special scheme is an ASCII string that is listed in the first column +of the following table. The default port for a special scheme is listed in +the second column on the same row. The default port for any other ASCII string is +null. + +
| Special scheme | Default port |
|---|---|
| "ftp" | 21 |
| "file" | null |
| "http" | 80 |
| "https" | 443 |
| "ws" | 80 |
| "wss" | 443 |
A URL is special if its scheme is a +special scheme. A URL is not special if its scheme is +not a special scheme. + +
A URL +includes credentials if its +username or password is not the empty string. + + +
A URL has an opaque path if its path is a +URL path segment. + +
A URL cannot have a username/password/port if its host is null or the empty string, or its scheme is "file".
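To make the component definitions concrete, here is a hypothetical Python representation of a URL record, together with the predicates defined above. Hosts are simplified to already-serialized strings (None for null), and a str path stands for an opaque path. The standard mandates neither choice.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Special schemes and their default ports ("file" has none).
SPECIAL_SCHEMES = {"ftp": 21, "file": None, "http": 80,
                   "https": 443, "ws": 80, "wss": 443}


@dataclass
class URLRecord:
    scheme: str = ""
    username: str = ""
    password: str = ""
    host: Optional[str] = None  # simplification: hosts kept as serialized strings
    port: Optional[int] = None
    path: Union[str, List[str]] = field(default_factory=list)  # str = opaque path
    query: Optional[str] = None
    fragment: Optional[str] = None


def default_port(scheme: str) -> Optional[int]:
    return SPECIAL_SCHEMES.get(scheme)  # None for non-special schemes


def is_special(url: URLRecord) -> bool:
    return url.scheme in SPECIAL_SCHEMES


def includes_credentials(url: URLRecord) -> bool:
    return url.username != "" or url.password != ""


def has_opaque_path(url: URLRecord) -> bool:
    return isinstance(url.path, str)


def cannot_have_username_password_port(url: URLRecord) -> bool:
    return url.host is None or url.host == "" or url.scheme == "file"
```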
A URL can be designated as base URL. + +
A base URL is useful for the URL parser when the input might be a +relative-URL string. + +
A Windows drive letter is two code points, of which the first is an ASCII alpha +and the second is either U+003A (:) or U+007C (|). + +
A normalized Windows drive letter is a Windows drive letter of which the second +code point is U+003A (:). + +
As per the URL writing section, only a +normalized Windows drive letter is conforming. + +
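A string starts with a Windows drive letter if all of the following are true:
its length is greater than or equal to 2;
its first two code points are a Windows drive letter;
its length is 2 or its third code point is U+002F (/), U+005C (\), U+003F (?), or U+0023 (#).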
| String | Starts with a Windows drive letter |
|---|---|
| "c:" | ✅ |
| "c:/" | ✅ |
| "c:a" | ❌ |
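Here is a Python sketch of the Windows drive letter predicates. It assumes the three conditions for "starts with a Windows drive letter": a length of at least 2, a Windows drive letter as the first two code points, and either a length of 2 or a third code point of /, \, ?, or #. These conditions agree with the table above. The function names are hypothetical.

```python
def is_windows_drive_letter(s: str) -> bool:
    # Two code points: an ASCII alpha, then ":" or "|".
    return len(s) == 2 and s[0].isascii() and s[0].isalpha() and s[1] in ":|"


def is_normalized_windows_drive_letter(s: str) -> bool:
    return is_windows_drive_letter(s) and s[1] == ":"


def starts_with_windows_drive_letter(s: str) -> bool:
    return (len(s) >= 2
            and is_windows_drive_letter(s[:2])
            and (len(s) == 2 or s[2] in "/\\?#"))
```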
To shorten a url's path: + +
Assert: url does not have an opaque path. + +
Let path be url's path. + +
If url's scheme is "file", path's size is 1, and path[0] is a normalized Windows drive letter, then return.
Remove path's last item, if any. +
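Here is a Python sketch of shortening a path, assuming a non-opaque path is a list of segment strings that is modified in place. The names are hypothetical.

```python
from typing import List


def is_normalized_windows_drive_letter(s: str) -> bool:
    return len(s) == 2 and s[0].isascii() and s[0].isalpha() and s[1] == ":"


def shorten_path(scheme: str, path: List[str]) -> None:
    """Sketch of "shorten a url's path"; path is modified in place."""
    # A str path would be an opaque path, which must never be shortened.
    assert isinstance(path, list)
    if scheme == "file" and len(path) == 1 and is_normalized_windows_drive_letter(path[0]):
        return  # keep a lone drive letter such as "C:"
    if path:
        path.pop()  # remove the last item, if any
```

With this rule, resolving ".." against file:///C:/demo stops at the drive letter, as the earlier example table shows.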
A valid URL string must be either a +relative-URL-with-fragment string or an absolute-URL-with-fragment string. + +
An +absolute-URL-with-fragment string must be +an absolute-URL string, optionally followed by U+0023 (#) and a URL-fragment string. + +
An absolute-URL string must be one of the following: + +
a URL-scheme string that is an ASCII case-insensitive match for a special scheme and not an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-special-URL string
a URL-scheme string that is not an ASCII case-insensitive match for a + special scheme, followed by U+003A (:) and a relative-URL string +
a URL-scheme string that is an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-file-URL string
any optionally followed by U+003F (?) and a URL-query string. + +
A URL-scheme string must be one ASCII alpha, +followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.). +Schemes should be registered in the +IANA URI [sic] Schemes registry. +[[!IANA-URI-SCHEMES]] +[[RFC7595]] + +
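The URL-scheme string grammar fits a single regular expression. Here is a Python sketch; the names are hypothetical.

```python
import re

# One ASCII alpha, then zero or more ASCII alphanumerics, "+", "-", or ".".
URL_SCHEME_STRING = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def is_url_scheme_string(s: str) -> bool:
    return URL_SCHEME_STRING.fullmatch(s) is not None
```

The grammar is ASCII-only, so the sketch uses explicit ASCII ranges instead of str.isalpha, which also accepts non-ASCII letters.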
A relative-URL-with-fragment string +must be a relative-URL string, optionally followed by U+0023 (#) and a +URL-fragment string. + +
A relative-URL string must be one of the following, +switching on base URL's scheme: + +
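A special scheme that is not "file": a scheme-relative-special-URL string, a path-absolute-URL string, or a path-relative-scheme-less-URL string
"file": a scheme-relative-file-URL string; a path-absolute-URL string if base URL's host is an empty host; a path-absolute-non-Windows-file-URL string if base URL's host is not an empty host; or a path-relative-scheme-less-URL string
Otherwise: a scheme-relative-URL string, a path-absolute-URL string, or a path-relative-scheme-less-URL string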
any optionally followed by U+003F (?) and a URL-query string. + +
A non-null base URL is necessary when parsing a +relative-URL string. + +
A scheme-relative-special-URL string must be "//", followed by a valid host string, optionally followed by U+003A (:) and a URL-port string, optionally followed by a path-absolute-URL string.
A URL-port string must be one of the following: + +
the empty string +
one or more ASCII digits representing a decimal number no greater than 2¹⁶ − 1 (65535).
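Here is a Python sketch of the URL-port string check; the name is hypothetical. It tests digits against an explicit ASCII set, because str.isdigit also accepts non-ASCII digits.

```python
def is_url_port_string(s: str) -> bool:
    if s == "":
        return True  # the empty string is allowed
    # One or more ASCII digits only.
    if not all(c in "0123456789" for c in s):
        return False
    return int(s) <= 2**16 - 1  # leading zeros are permitted
```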
A scheme-relative-URL string must be "//", followed by an opaque-host-and-port string, optionally followed by a path-absolute-URL string.
An opaque-host-and-port string must be either the empty string or: a +valid opaque-host string, optionally followed by U+003A (:) and a URL-port string. + +
A scheme-relative-file-URL string must be "//", followed by one of the following:
a valid host string, optionally followed by a path-absolute-non-Windows-file-URL string
a path-absolute-URL string
A path-absolute-URL string must be U+002F (/) +followed by a path-relative-URL string. + +
A path-absolute-non-Windows-file-URL string +must be a path-absolute-URL string that does not start with: U+002F (/), followed by a +Windows drive letter, followed by U+002F (/). + +
A path-relative-URL string must be zero or more +URL-path-segment strings, separated from each other by U+002F (/), and not start with +U+002F (/). + +
A +path-relative-scheme-less-URL string +must be a path-relative-URL string that does not start with: a URL-scheme string, +followed by U+003A (:). + +
A URL-path-segment string must be one of the +following: + +
zero or more URL units excluding U+002F (/) and U+003F (?), that together are not a single-dot URL path segment or a double-dot URL path segment
a single-dot URL path segment
a double-dot URL path segment
A URL-query string must be zero or more URL units. + +
A URL-fragment string must be zero or more +URL units. + +
The URL code points are +ASCII alphanumeric, +U+0021 (!), +U+0024 ($), +U+0026 (&), +U+0027 ('), +U+0028 LEFT PARENTHESIS, +U+0029 RIGHT PARENTHESIS, +U+002A (*), +U+002B (+), +U+002C (,), +U+002D (-), +U+002E (.), +U+002F (/), +U+003A (:), +U+003B (;), +U+003D (=), +U+003F (?), +U+0040 (@), +U+005F (_), +U+007E (~), +and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and +noncharacters. + + +
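Here is a Python sketch of a URL code point test, assuming the input is a one-code-point str. The noncharacters are U+FDD0 to U+FDEF plus the last two code points of every plane. That definition comes from Unicode rather than from this excerpt. The names are hypothetical.

```python
URL_PUNCTUATION = set("!$&'()*+,-./:;=?@_~")


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_url_code_point(ch: str) -> bool:
    cp = ord(ch)
    if cp < 0x80:
        # ASCII: alphanumerics plus the listed punctuation.
        return ch.isalnum() or ch in URL_PUNCTUATION
    if 0xD800 <= cp <= 0xDFFF:  # surrogates
        return False
    return 0xA0 <= cp <= 0x10FFFD and not is_noncharacter(cp)
```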
Code points greater than U+007F DELETE will be converted to +percent-encoded bytes by the URL parser. + +
In HTML, when the document encoding is a legacy encoding, code points in the +URL-query string that are higher than U+007F DELETE will be converted to +percent-encoded bytes using the document's encoding. This +can cause problems if a URL that works in one document is copied to another document that uses a +different document encoding. Using the UTF-8 encoding everywhere solves this problem. + +
For example, consider this HTML document: + +
+ <!doctype html>
+ <meta charset="windows-1252">
+ <a href="?smörgåsbord">Test</a>
+
+ Since the document encoding is windows-1252, the link's URL's query
+ will be "sm%F6rg%E5sbord
". If the document encoding had been UTF-8, it would instead be "sm%C3%B6rg%C3%A5sbord".
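The byte difference can be reproduced with Python's urllib.parse.quote, which accepts an encoding argument. In Python, cp1252 is the codec name for windows-1252. Note that quote uses Python's own set of safe characters rather than the URL Standard's query percent-encode set. The two agree for this input, but not in general.

```python
from urllib.parse import quote

query = "smörgåsbord"
# Encode the string with each document encoding, then percent-encode the bytes.
as_windows_1252 = quote(query, encoding="cp1252")
as_utf8 = quote(query, encoding="utf-8")
print(as_windows_1252)  # sm%F6rg%E5sbord
print(as_utf8)          # sm%C3%B6rg%C3%A5sbord
```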
The URL units are URL code points and percent-encoded bytes. + +
Percent-encoded bytes can be used to encode code points that are not +URL code points or are excluded from being written. + +
There is no way to express a username or password of a +URL record within a valid URL string. + + +
The URL parser takes a +scalar value string input, with an optional null or base URL +base (default null) and an optional encoding encoding (default +UTF-8), and then runs these steps: + +
Non-web-browser implementations only need to implement the basic URL parser. + +
How user input in the web browser's address bar is converted to a URL record is out of scope for this standard. This standard does include URL rendering requirements as they pertain to trust decisions.
Let url be the result of running the basic URL parser on input + with base and encoding. + +
If url is failure, return failure. + +
If url's scheme is not "blob", return url.
Set url's blob URL entry to the result of + resolving the blob URL url, if that did not return + failure, and null otherwise. + +
Return url. +
The basic URL parser takes a +scalar value string input, with an optional null or base URL +base (default null), an optional encoding encoding (default +UTF-8), an optional URL url, +and an optional state override state override, +and then runs these steps: + +
The encoding argument is a legacy concept only relevant for HTML. The + url and state override arguments are only for use by various APIs. [[HTML]] + + +
When the url and state override arguments are not passed, the + basic URL parser returns either a new URL or failure. If they are passed, the + algorithm modifies the passed url and can terminate without returning anything. +
If url is not given: + +
Set url to a new URL. + +
If input contains any leading or trailing C0 control or space, + invalid-URL-unit validation error. + +
Remove any leading and trailing C0 control or space from input. +
If input contains any ASCII tab or newline, invalid-URL-unit + validation error. + +
Remove all ASCII tab or newline from input. + +
Let state be state override + if given, or scheme start state otherwise. + +
Set encoding to the result of getting an output encoding from + encoding. + +
Let buffer be the empty string. + +
Let atSignSeen, insideBrackets, and passwordTokenSeen be + false. + +
Let pointer be a pointer for input. + +
Keep running the following state machine by switching on state. If after a run + pointer points to the EOF code point, go to the next step. Otherwise, increase + pointer by 1 and continue with the state machine. + +
If c is an ASCII alpha, + append c, lowercased, to buffer, and + set state to scheme state. + +
Otherwise, if state override is not given, set state to + no scheme state and decrease pointer by 1. + +
Otherwise, return failure. + + +
This indication of failure is used exclusively by the {{Location}} object's + {{Location/protocol}} setter. +
If c is an ASCII alphanumeric, U+002B (+), U+002D (-), or U+002E (.), + append c, lowercased, to buffer. + +
Otherwise, if c is U+003A (:), then: + +
If state override is given, then: + +
If url's scheme is a special scheme and + buffer is not a special scheme, then return. + +
If url's scheme is not a special scheme and + buffer is a special scheme, then return. + +
If url includes credentials or has a non-null port, and buffer is "file", then return.
If url's scheme is "file" and its host is an empty host, then return.
Set url's scheme to buffer. + +
If state override is given, then: + +
If url's port is url's scheme's + default port, then set url's port to null. + +
Return. +
Set buffer to the empty string. + +
If url's scheme is "file", then:
If remaining does not start with "//", special-scheme-missing-following-solidus validation error.
Set state to file state. +
Otherwise, if url is special, base is non-null, and + base's scheme is url's scheme: + +
Assert: base is special (and therefore does not have an opaque path).
Set state to special relative or authority state. +
Otherwise, if url is special, set state to + special authority slashes state. + +
Otherwise, if remaining starts with an U+002F (/), set state to + path or authority state and increase pointer by 1. + +
Otherwise, set url's path to the empty string and set + state to opaque path state. +
Otherwise, if state override is not given, set + buffer to the empty string, state to + no scheme state, and start over (from the first code point + in input). + +
Otherwise, return failure. + + +
This indication of failure is used exclusively by the {{Location}} object's + {{Location/protocol}} setter. Furthermore, the non-failure termination earlier in this state + is an intentional difference for defining that setter. +
If base is null, or base has an opaque path and + c is not U+0023 (#), missing-scheme-non-relative-URL validation error, + return failure. + +
Otherwise, if base has an opaque path and c is + U+0023 (#), set url's scheme to + base's scheme, + url's path to + base's path, + url's query to + base's query, + url's fragment to the empty string, and set state to + fragment state. + +
Otherwise, if base's scheme is not "file", set state to relative state and decrease pointer by 1.
Otherwise, set state to file state and decrease pointer + by 1. +
If c is U+002F (/) and remaining starts with U+002F (/), then set + state to special authority ignore slashes state and increase + pointer by 1. + +
Otherwise, special-scheme-missing-following-solidus validation error, set + state to relative state and decrease pointer by 1. +
If c is U+002F (/), then set state to authority state. + +
Otherwise, set state to path state, and decrease pointer + by 1. +
Assert: base's scheme is not "file".
If c is U+002F (/), then set state to relative slash state. + +
Otherwise, if url is special and c is U+005C (\), + invalid-reverse-solidus validation error, set state to + relative slash state. + +
Otherwise: + +
Set url's username to + base's username, + url's password to + base's password, + url's host to + base's host, + url's port to + base's port, + url's path to a clone of + base's path, and + url's query to + base's query. + +
If c is U+003F (?), then set url's query to the empty + string, and state to query state. + +
Otherwise, if c is U+0023 (#), set url's fragment to + the empty string and state to fragment state. + +
Otherwise, if c is not the EOF code point: + +
Set url's query to null. + +
Set state to path state and decrease pointer by 1. +
If url is special and c is U+002F (/) or U+005C (\), then: + +
If c is U+005C (\), invalid-reverse-solidus + validation error. + +
Set state to special authority ignore slashes state. +
Otherwise, if c is U+002F (/), then set state to + authority state. + +
Otherwise, set + url's username to + base's username, + url's password to + base's password, + url's host to + base's host, + url's port to + base's port, + state to path state, and then, decrease pointer by 1. +
If c is U+002F (/) and remaining starts with U+002F (/), then set + state to special authority ignore slashes state and increase + pointer by 1. + +
Otherwise, special-scheme-missing-following-solidus validation error, set + state to special authority ignore slashes state and decrease + pointer by 1. +
If c is neither U+002F (/) nor U+005C (\), then set state to + authority state and decrease pointer by 1. + +
Otherwise, special-scheme-missing-following-solidus validation error. +
If c is U+0040 (@), then: + +
If atSignSeen is true, then prepend "%40" to buffer.
Set atSignSeen to true. + +
For each codePoint in buffer: + +
If codePoint is U+003A (:) and passwordTokenSeen is false, + then set passwordTokenSeen to true and continue. + +
Let encodedCodePoints be the result of running + UTF-8 percent-encode codePoint using the + userinfo percent-encode set. + +
If passwordTokenSeen is true, then append encodedCodePoints to + url's password. + +
Otherwise, append encodedCodePoints to url's + username. +
Set buffer to the empty string. +
Otherwise, if one of the following is true: + +
c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#) +
url is special and c is U+005C (\) +
then: + +
If atSignSeen is true and buffer is the empty string, host-missing validation error, return failure.
Decrease pointer by buffer's + code point length + 1, set buffer to the empty string, and set + state to host state. +
Otherwise, append c to buffer. +
If state override is given and url's scheme is "file", then decrease pointer by 1 and set state to file host state.
Otherwise, if c is U+003A (:) and insideBrackets is false, then: + +
If buffer is the empty string, host-missing validation error, + return failure. + + +
If state override is given and state override is + hostname state, then return. + +
Let host be the result of host parsing buffer with + url is not special. + +
If host is failure, then return failure. + +
Set url's host to + host, buffer to the empty string, + and state to port state. +
Otherwise, if one of the following is true: + +
c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#) +
url is special and c is U+005C (\) +
then decrease pointer by 1, and then: + +
If url is special and buffer is the empty string, + host-missing validation error, return failure. + + +
Otherwise, if state override is given, buffer is the empty + string, and either url includes credentials or url's + port is non-null, return. + + +
Let host be the result of host parsing buffer with + url is not special. + +
If host is failure, then return failure. + +
Set url's host to + host, buffer to the empty string, + and state to path start state. + +
If state override is given, then return. +
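Otherwise:
If c is U+005B ([), then set insideBrackets to true.
If c is U+005D (]), then set insideBrackets to false.
Append c to buffer.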
+If c is an ASCII digit, append c to buffer. + +
Otherwise, if one of the following is true: + +
c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#) +
url is special and c is U+005C (\) +
state override is given +
then: + +
If buffer is not the empty string, then: + +
Let port be the mathematical integer value that is represented + by buffer in radix-10 using ASCII digits for digits with values + 0 through 9. + +
If port is greater than 2¹⁶ − 1, port-out-of-range validation error, return failure.
Set url's port to null, if port is + url's scheme's default port; otherwise to port. + +
Set buffer to the empty string. +
If state override is given, then return. + +
Set state to path start state and decrease pointer by 1. +
Otherwise, port-invalid validation error, return failure. +
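Here is a Python sketch of the port state. It assumes the caller has already split off the code points between ":" and the end of the authority. PortError and parse_port are hypothetical names. Raising PortError stands in for "validation error, return failure", and None stands in for a null port.

```python
from typing import Optional

DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


class PortError(ValueError):
    """Stands in for "validation error, return failure"."""


def parse_port(text: str, scheme: str) -> Optional[int]:
    """Port state sketch.

    text: the code points after ":" up to the next "/", "?", "#",
    backslash (special URLs only), or the end of input.
    Returns the value to store as url's port (None = null).
    """
    buffer = ""
    for c in text:
        if c in "0123456789":  # ASCII digits only
            buffer += c
        else:
            raise PortError("port-invalid")
    if buffer == "":
        return None  # the real parser leaves url's port unchanged here
    port = int(buffer)  # radix-10 value; leading zeros are allowed
    if port > 2**16 - 1:
        raise PortError("port-out-of-range")
    return None if port == DEFAULT_PORTS.get(scheme) else port
```

Storing null for a scheme's default port is why https://example.com:443/ serializes without a port.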
Set url's scheme to "file".
Set url's host to the empty string. + +
If c is U+002F (/) or U+005C (\), then: + +
If c is U+005C (\), invalid-reverse-solidus validation error. + +
Set state to file slash state. +
Otherwise, if base is non-null and base's scheme is "file":
Set url's host to base's host, + url's path to a clone of base's + path, and url's query to base's + query. + +
If c is U+003F (?), then set url's query to the empty + string and state to query state. + +
Otherwise, if c is U+0023 (#), set url's fragment to + the empty string and state to fragment state. + +
Otherwise, if c is not the EOF code point: + +
Set url's query to null. + +
If the + code point substring from + pointer to the end of input does not + start with a Windows drive letter, then shorten url's + path. + +
Otherwise: + +
Set url's path to « ». +
This is a (platform-independent) Windows drive letter quirk. + +
Set state to path state and decrease pointer by 1. +
Otherwise, set state to path state, and decrease pointer + by 1. +
If c is U+002F (/) or U+005C (\), then: + +
If c is U+005C (\), invalid-reverse-solidus validation error. + +
Set state to file host state. +
Otherwise: + +
If base is non-null and base's scheme is "file", then:
If the code point substring + from pointer to the end of input does not + start with a Windows drive letter and base's path[0] is a + normalized Windows drive letter, then append base's + path[0] to url's path. + +
This is a (platform-independent) Windows drive letter quirk. + +
Set state to path state, and decrease pointer by 1. +
If c is the EOF code point, U+002F (/), U+005C (\), U+003F (?), or + U+0023 (#), then decrease pointer by 1 and then: + +
If state override is not given and buffer is a + Windows drive letter, file-invalid-Windows-drive-letter-host + validation error, set state to path state. + +
This is a (platform-independent) Windows drive letter quirk. buffer + is not reset here and instead used in the path state. + +
Otherwise, if buffer is the empty string, then: + +
Set url's host to the empty string. + +
If state override is given, then return. + +
Set state to path start state. +
Otherwise, run these steps: + +
Let host be the result of host parsing buffer with + url is not special. + +
If host is failure, then return failure. + +
If host is "localhost", then set host to the empty string.
Set url's host to host. + +
If state override is given, then return. + +
Set buffer to the empty string and state to + path start state. +
Otherwise, append c to buffer. +
If url is special, then: + +
If c is U+005C (\), invalid-reverse-solidus validation error. + +
Set state to path state. + +
If c is neither U+002F (/) nor U+005C (\), then decrease pointer + by 1. +
Otherwise, if state override is not given and c is U+003F (?), set + url's query to the empty string and state to + query state. + +
Otherwise, if state override is not given and c is U+0023 (#), set + url's fragment to the empty string and state to + fragment state. + +
Otherwise, if c is not the EOF code point: + +
Set state to path state. + +
If c is not U+002F (/), then decrease pointer by 1. +
Otherwise, if state override is given and url's + host is null, append the empty string to url's + path. +
If one of the following is true: + +
c is the EOF code point or U+002F (/) +
url is special and c is U+005C (\) +
state override is not given and c is U+003F (?) or U+0023 (#) +
then: + +
If url is special and c is U+005C (\), + invalid-reverse-solidus validation error. + +
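If buffer is a double-dot URL path segment, then:
Shorten url's path.
If neither c is U+002F (/), nor url is special and c is U+005C (\), append the empty string to url's path.
This means that for input /usr/.. the result is / and not a lack of a path.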
+ +Otherwise, if buffer is a single-dot URL path segment and if neither + c is U+002F (/), nor url is special and c is U+005C (\), + append the empty string to url's path. + +
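Otherwise, if buffer is not a single-dot URL path segment, then:
If url's scheme is "file", url's path is empty, and buffer is a Windows drive letter, then replace the second code point in buffer with U+003A (:).
This is a (platform-independent) Windows drive letter quirk.
Append buffer to url's path.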
+ +Set buffer to the empty string. + +
If c is U+003F (?), then set url's query to the empty + string and state to query state. + +
If c is U+0023 (#), then set url's fragment to the + empty string and state to fragment state. +
Otherwise, run these steps: + +
If c is not a URL code point and not U+0025 (%), + invalid-URL-unit validation error. + +
If c is U+0025 (%) and remaining does not start with two + ASCII hex digits, invalid-URL-unit validation error. + +
UTF-8 percent-encode c using the + path percent-encode set and append the result to buffer. +
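Here is a simplified Python sketch of how the path start and path states split segments and resolve dot segments for a special URL whose scheme is not "file". It assumes the input contains no ? or # and needs no percent-encoding. The function names are hypothetical.

```python
from typing import List, Optional


def is_single_dot(segment: str) -> bool:
    return segment.lower() in (".", "%2e")


def is_double_dot(segment: str) -> bool:
    return segment.lower() in ("..", ".%2e", "%2e.", "%2e%2e")


def parse_special_path(rest: str) -> List[str]:
    """Sketch of the path start and path states (special, non-"file" URL).

    rest: everything after the host and port, assumed free of "?" and "#".
    """
    path: List[str] = []
    if rest[:1] in ("/", "\\"):
        rest = rest[1:]  # the path start state consumes one separator
    buffer = ""
    chars: List[Optional[str]] = list(rest) + [None]  # None plays the EOF code point
    for c in chars:
        if c is not None and c not in "/\\":
            buffer += c
            continue
        at_separator = c is not None
        if is_double_dot(buffer):
            if path:
                path.pop()  # shorten url's path (no drive-letter rule here)
            if not at_separator:
                path.append("")
        elif is_single_dot(buffer):
            if not at_separator:
                path.append("")
        else:
            path.append(buffer)
        buffer = ""
    return path
```

For instance, "/a/./b/../c" yields ["a", "c"]. The input "/usr/.." yields a path with one empty segment, which serializes to "/".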
If c is U+003F (?), then set url's query to the empty + string and state to query state. + +
Otherwise, if c is U+0023 (#), then set url's fragment + to the empty string and state to fragment state. + +
Otherwise: + +
If c is not the EOF code point, not a URL code point, and not + U+0025 (%), invalid-URL-unit validation error. + +
If c is U+0025 (%) and remaining does not start with two + ASCII hex digits, invalid-URL-unit validation error. + +
If c is not the EOF code point, + UTF-8 percent-encode c using the + C0 control percent-encode set and append the result to url's + path. +
If encoding is not UTF-8 and one of the following is true: + +
url is not special +
url's scheme is "ws" or "wss"
then set encoding to UTF-8. + + +
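Here is a Python sketch of the encoding override in the query state. It assumes encodings are represented by their names and uses a hypothetical function name.

```python
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def query_encoding(encoding: str, scheme: str) -> str:
    """Return the encoding used for the query of a URL with this scheme."""
    if encoding != "UTF-8" and (scheme not in SPECIAL_SCHEMES or scheme in ("ws", "wss")):
        return "UTF-8"
    return encoding
```

Only special, non-WebSocket URLs keep a legacy document encoding for their query.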
If one of the following is true: + +
state override is not given and c is U+0023 (#) +
c is the EOF code point +
then: + +
Let queryPercentEncodeSet be the special-query percent-encode set if + url is special; otherwise the query percent-encode set. + +
Percent-encode after encoding, with encoding, + buffer, and queryPercentEncodeSet, and append the result to + url's query. + +
This operation cannot be invoked code-point-for-code-point due to the stateful + ISO-2022-JP encoder. + +
Set buffer to the empty string. + +
If c is U+0023 (#), then set url's fragment to + the empty string and state to fragment state. +
Otherwise, if c is not the EOF code point: + +
If c is not a URL code point and not U+0025 (%), + invalid-URL-unit validation error. + +
If c is U+0025 (%) and remaining does not start with two + ASCII hex digits, invalid-URL-unit validation error. + +
Append c to buffer. +
If c is not the EOF code point, then: + +
If c is not a URL code point and not U+0025 (%), + invalid-URL-unit validation error. + +
If c is U+0025 (%) and remaining does not start with two + ASCII hex digits, invalid-URL-unit validation error. + +
UTF-8 percent-encode c using the + fragment percent-encode set and append the result to url's + fragment. +
Return url. +
To set the username given a url and +username, set url's username to the result of running +UTF-8 percent-encode on username using the +userinfo percent-encode set. + +
To set the password given a url and +password, set url's password to the result of running +UTF-8 percent-encode on password using the +userinfo percent-encode set. + + +
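Here is a Python sketch of setting the username and setting the password. The percent-encode sets are taken from the standard's definitions, which are outside this excerpt. The URL is modeled as a plain dict, and all names are hypothetical.

```python
from typing import Callable, Dict


def in_c0_control_set(cp: int) -> bool:
    return cp <= 0x1F or cp > 0x7E


def in_query_set(cp: int) -> bool:
    return in_c0_control_set(cp) or chr(cp) in ' "#<>'


def in_path_set(cp: int) -> bool:
    return in_query_set(cp) or chr(cp) in "?`{}"


def in_userinfo_set(cp: int) -> bool:
    # Path set plus / : ; = @ |, and U+005B ([) to U+005E (^), inclusive.
    return in_path_set(cp) or chr(cp) in "/:;=@|" or 0x5B <= cp <= 0x5E


def utf8_percent_encode(s: str, in_set: Callable[[int], bool]) -> str:
    out = []
    for ch in s:
        if in_set(ord(ch)):
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def set_username(url: Dict[str, str], username: str) -> None:
    url["username"] = utf8_percent_encode(username, in_userinfo_set)


def set_password(url: Dict[str, str], password: str) -> None:
    url["password"] = utf8_percent_encode(password, in_userinfo_set)
```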
The URL serializer takes a +URL url, with an optional boolean +exclude fragment (default false), and then runs +these steps. They return an ASCII string. + +
Let output be url's scheme and U+003A (:) concatenated. + +
If url's host is non-null: + +
Append "//" to output.
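If url includes credentials, then:
Append url's username to output.
If url's password is not the empty string, then append U+003A (:), followed by url's password, to output.
Append U+0040 (@) to output.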
+ +Append url's host, + serialized, to output. + +
If url's port is non-null, append U+003A (:) followed by + url's port, serialized, to + output. +
If url's host is null, url does not have an + opaque path, url's path's size is greater + than 1, and url's path[0] is the empty string, then append U+002F (/) + followed by U+002E (.) to output. + +
This prevents web+demo:/.//not-a-host/ or web+demo:/path/..//not-a-host/, when parsed and then serialized, from ending up as web+demo://not-a-host/ (they end up as web+demo:/.//not-a-host/).
Append the result of URL path serializing url to output. + +
If url's query is non-null, append + U+003F (?), followed by url's query, to + output. + +
If exclude fragment is false and url's fragment is + non-null, then append U+0023 (#), followed by url's fragment, to + output. + +
Return output. +
The URL path serializer takes a +URL url and then runs these steps. They return an ASCII string. + +
If url has an opaque path, then return url's + path. + +
Let output be the empty string. + +
For each segment of url's path: append + U+002F (/) followed by segment to output. + +
Return output. +
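The URL serializer and URL path serializer above can be sketched in JavaScript as follows. This is a non-normative illustration built on assumptions of our own. The URL record is modelled as a plain object with hypothetical field names. `host` is an already-serialized string or null, `port` is a number or null, and an opaque path is stored as a string while any other path is an array of segments. For the credentials branch, the sketch follows the standard's definition: a URL includes credentials when its username or password is non-empty. It writes the username, then U+003A (:) and the password if the password is non-empty, then U+0040 (@).

```javascript
// Non-normative sketch. Hypothetical plain-object model of a URL record:
//   { scheme, username, password, host, port, path, query, fragment }
// host: serialized host string or null; port: number or null;
// path: a string for an opaque path, otherwise an array of segment strings.

function serializePath(url) {
  // An opaque path is returned unchanged.
  if (typeof url.path === "string") return url.path;
  // Otherwise each segment is preceded by U+002F (/).
  let output = "";
  for (const segment of url.path) output += "/" + segment;
  return output;
}

function serializeURL(url, excludeFragment = false) {
  let output = url.scheme + ":";
  if (url.host !== null) {
    output += "//";
    // A URL includes credentials if its username or password is non-empty.
    if (url.username !== "" || url.password !== "") {
      output += url.username;
      if (url.password !== "") output += ":" + url.password;
      output += "@";
    }
    output += url.host;
    if (url.port !== null) output += ":" + String(url.port);
  }
  // Without a host, a path starting with an empty segment would serialize
  // as "//...", which would be reparsed as a host; prefix "/." to prevent it.
  if (url.host === null && typeof url.path !== "string" &&
      url.path.length > 1 && url.path[0] === "") {
    output += "/.";
  }
  output += serializePath(url);
  if (url.query !== null) output += "?" + url.query;
  if (!excludeFragment && url.fragment !== null) output += "#" + url.fragment;
  return output;
}
```

For instance, a record with scheme "web+demo", a null host, and path ["", "not-a-host", ""] serializes to web+demo:/.//not-a-host/, which is the case the note above describes.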
To determine whether a URL A +equals URL B, with +an optional boolean exclude fragments (default false), +run these steps: + +
Let serializedA be the result of serializing + A, with exclude fragment set to + exclude fragments. + +
Let serializedB be the result of serializing + B, with exclude fragment set to + exclude fragments. + +
Return true if serializedA is serializedB; otherwise false. +
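Here is a minimal sketch of the equality check. It assumes a JavaScript environment that provides the URL class, such as a browser or Node.js, and uses `href` as the URL serializer. To exclude the fragment, the hypothetical helper cuts `href` at its first "#". This relies on our reading that "#" is percent-encoded everywhere else in a serialized URL, so the first one always starts the fragment.

```javascript
// Sketch: URL equality via serialization, using the URL class.
// serializeWithoutFragment and urlEquals are hypothetical helpers.
function serializeWithoutFragment(url) {
  // Assumes the first "#" in href is the fragment delimiter.
  const href = url.href;
  const index = href.indexOf("#");
  return index === -1 ? href : href.slice(0, index);
}

function urlEquals(a, b, excludeFragments = false) {
  const serializedA = excludeFragments ? serializeWithoutFragment(a) : a.href;
  const serializedB = excludeFragments ? serializeWithoutFragment(b) : b.href;
  return serializedA === serializedB;
}

const a = new URL("https://example.com/path#one");
const b = new URL("HTTPS://EXAMPLE.COM/path#two");
urlEquals(a, b);       // false: the fragments differ
urlEquals(a, b, true); // true: the parser lowercases scheme and host
```

The sketch avoids clearing the fragment through the `hash` setter on a copy. The hash setter can strip trailing spaces from an opaque path, so that approach could alter the serialization being compared.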
See origin's definition in HTML for the necessary background +information. [[HTML]] + +
The origin of a URL url +is the origin returned by running these steps, switching on url's +scheme: + +
"blob"
+ If url's blob URL entry is non-null, then return + url's blob URL entry's environment's + origin. + +
Let pathURL be the result of parsing the result of + URL path serializing url. + +
If pathURL is failure, then return a new opaque origin. + +
Return pathURL's origin. + +
The origin of blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f is the tuple origin ("https", "whatwg.org", null, null).
"ftp"
"http"
"https"
"ws"
"wss"
Return the tuple origin (url's scheme, url's host, url's port, null).
"file"
Unfortunate as it is, this is left as an exercise to the reader. When in doubt, return a new opaque origin.
Otherwise
Return a new opaque origin.
This does indeed mean that these URLs cannot be same origin with + themselves. +
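The origin switch can be sketched with the URL class. This is a non-normative illustration under several assumptions. A tuple origin is modelled as a plain object `{ scheme, host, port, domain }` with the serialized host, and an opaque origin is the string "opaque". The blob URL store is not modelled, so the branch for a non-null blob URL entry is omitted. For a blob: URL, the `pathname` getter returns the opaque path, which is what URL path serializing it yields.

```javascript
// Non-normative sketch; originOf and the origin shapes are hypothetical.
function originOf(url) {
  const scheme = url.protocol.slice(0, -1); // drop the trailing ":"
  switch (scheme) {
    case "blob": {
      // Blob URL entries are not modelled here (assumed null).
      let pathURL;
      try {
        pathURL = new URL(url.pathname);
      } catch {
        return "opaque"; // pathURL is failure
      }
      return originOf(pathURL);
    }
    case "ftp":
    case "http":
    case "https":
    case "ws":
    case "wss":
      return {
        scheme,
        host: url.hostname,
        // The URL class reports a default or absent port as "".
        port: url.port === "" ? null : Number(url.port),
        domain: null,
      };
    case "file":
      // Left as an exercise by the standard; fall back to opaque.
      return "opaque";
    default:
      return "opaque";
  }
}

originOf(new URL("blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f"));
// { scheme: "https", host: "whatwg.org", port: null, domain: null }
```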
A URL should be rendered in its serialized form, with +modifications described below, when the primary purpose of displaying a URL is to have the user make +a security or trust decision. For example, users are expected to make trust decisions based on a URL +rendered in the browser address bar. + +
Remove components that can provide opportunities for spoofing or distract from security-relevant +information: + +
Browsers may render only a URL's host in places where it is important for end users to distinguish between the host and other parts of the URL such as the path. Browsers may consider simplifying the host further to draw attention to its registrable domain. For example, browsers may omit a leading www or m domain label to simplify the host, or display its registrable domain only to remove spoofing opportunities posed by subdomains (e.g., https://examplecorp.attacker.com/).
Browsers should not render a URL's username and password, as they can be mistaken for a URL's host (e.g., https://examplecorp.com@attacker.example/).
Browsers may render a URL without its scheme if the display surface only ever permits a single scheme (such as a browser feature that omits https:// because it is only enabled for secure origins). Otherwise, the scheme may be replaced or supplemented with a human-readable string (e.g., "Not secure"), a security indicator icon, or both.
In a space-constrained display, URLs should be elided carefully to avoid misleading the user when +making a security decision: + +
Browsers should ensure that at least the registrable domain can be shown when the URL is rendered (to avoid showing, e.g., ...examplecorp.com when loading https://not-really-examplecorp.com/).
When the full host cannot be rendered, browsers should elide domain labels starting from the lowest-level domain label. For example, examplecorp.com.evil.com should be elided as ...com.evil.com, not examplecorp.com.... (Note that bidirectional text means that the lowest-level domain label may not appear on the left.)
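The elision guidance above can be illustrated with a small display helper. This is a hypothetical sketch, not browser code. It works on the host string in logical order, which is not necessarily visual order for bidirectional text. It also approximates the registrable domain as the last `minLabels` labels. That approximation is an assumption; a real implementation would consult the Public Suffix List.

```javascript
// Hypothetical display helper: elide labels from the lowest-level
// (leftmost in logical order) end, never into the approximated
// registrable domain (the last `minLabels` labels).
function elideHost(host, maxLength, minLabels = 2) {
  const labels = host.split(".");
  let start = 0; // index of the first label still rendered
  const render = () => (start > 0 ? "..." : "") + labels.slice(start).join(".");
  while (render().length > maxLength && labels.length - start > minLabels) {
    start++;
  }
  return render();
}

elideHost("examplecorp.com.evil.com", 15); // "...com.evil.com"
```

If even the approximated registrable domain does not fit, the helper returns it anyway and lets it exceed `maxLength`. This follows the guidance to always show at least the registrable domain.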
Internationalized domain names (IDNs), special characters, and bidirectional text should be +handled with care to prevent spoofing: + +
Browsers should render a URL's host by running + domain to Unicode with the URL's host and false. + +
Various characters can be used in homograph spoofing attacks. Consider detecting + confusable characters and warning when they are in use. [[IDNFAQ]] [[UTS39]] + +
URLs are particularly prone to confusion between host and path when they contain + bidirectional text, so in this case it is particularly advisable to only render a URL's + host. For readability, other parts of the URL, if rendered, should have + their sequences of percent-encoded bytes replaced with code points resulting from running + UTF-8 decode without BOM on the percent-decoding of those sequences, + unless that renders those sequences invisible. Browsers may choose to not decode certain sequences + that present spoofing risks (e.g., U+1F512 (🔒)). + +
Browsers should render bidirectional text as if it were in a left-to-right embedding. [[!BIDI]] + +
Unfortunately, as rendered URLs are strings and can appear anywhere, a + specific bidirectional algorithm for rendered URLs would not see wide adoption. + Bidirectional text interacts with the parts of a URL in ways that can cause the + rendering to be different from the model. Users of bidirectional languages can come to expect + this, particularly in plain text environments. +
application/x-www-form-urlencoded
The application/x-www-form-urlencoded format provides a way to encode a list of tuples, each consisting of a name and a value.
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately, the format is in widespread use due to the prevalence of HTML forms. [[HTML]]
application/x-www-form-urlencoded parsing

A legacy server-oriented implementation might have to support encodings other than UTF-8 as well as have special logic for tuples of which the name is `_charset_`. Such logic is not described here as only UTF-8 is conforming.
The application/x-www-form-urlencoded parser takes a byte sequence input, and then runs these steps:
Let sequences be the result of splitting input on + 0x26 (&). + + +
Let output be an initially empty list of name-value tuples where + both name and value hold a string. + +
For each byte sequence bytes in sequences: + +
If bytes is the empty byte sequence, then continue. + +
If bytes contains a 0x3D (=), then let + name be the bytes from the start of bytes up to but + excluding its first 0x3D (=), and let value be the + bytes, if any, after the first 0x3D (=) up to the end of + bytes. If 0x3D (=) is the first byte, then + name will be the empty byte sequence. If it is the last, then + value will be the empty byte sequence. + +
Otherwise, let name have the value of bytes + and let value be the empty byte sequence. + +
Replace any 0x2B (+) in name and value with 0x20 (SP). + +
Let nameString and valueString be the result of running UTF-8 + decode without BOM on the percent-decoding of + name and value, respectively. + +
Append (nameString, valueString) to + output. +
Return output. +
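The parsing steps above can be sketched in JavaScript over a `Uint8Array`. This is a non-normative sketch, and `splitOnAmpersand`, `percentDecode`, and `parseUrlencoded` are hypothetical names. A `TextDecoder` created with `ignoreBOM: true` stands in for UTF-8 decode without BOM. It keeps a leading U+FEFF rather than stripping it, and it replaces invalid sequences with U+FFFD.

```javascript
// Non-normative sketch of the application/x-www-form-urlencoded parser.
const decoder = new TextDecoder("utf-8", { ignoreBOM: true });

function isHexDigit(byte) {
  return (byte >= 0x30 && byte <= 0x39) || (byte >= 0x41 && byte <= 0x46) ||
         (byte >= 0x61 && byte <= 0x66);
}

// Percent-decode: "%" followed by two hex digits becomes one byte;
// any other byte, including a stray "%", is copied as-is.
function percentDecode(bytes) {
  const output = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x25 && i + 2 < bytes.length &&
        isHexDigit(bytes[i + 1]) && isHexDigit(bytes[i + 2])) {
      output.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2;
    } else {
      output.push(bytes[i]);
    }
  }
  return Uint8Array.from(output);
}

// Split on 0x26 (&), keeping empty sequences (they are skipped later).
function splitOnAmpersand(input) {
  const sequences = [];
  let start = 0;
  for (let i = 0; i <= input.length; i++) {
    if (i === input.length || input[i] === 0x26) {
      sequences.push(input.subarray(start, i));
      start = i + 1;
    }
  }
  return sequences;
}

function parseUrlencoded(input) {
  const output = [];
  for (const bytes of splitOnAmpersand(input)) {
    if (bytes.length === 0) continue;
    const eq = bytes.indexOf(0x3d); // first "="
    let name = eq === -1 ? bytes : bytes.subarray(0, eq);
    let value = eq === -1 ? new Uint8Array(0) : bytes.subarray(eq + 1);
    // Replace 0x2B (+) with 0x20 (SP); map() returns a copy.
    name = name.map((b) => (b === 0x2b ? 0x20 : b));
    value = value.map((b) => (b === 0x2b ? 0x20 : b));
    output.push([decoder.decode(percentDecode(name)), decoder.decode(percentDecode(value))]);
  }
  return output;
}
```

Because "+" is replaced before percent-decoding, an encoded %2B still comes out as a literal "+".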
application/x-www-form-urlencoded serializing

The application/x-www-form-urlencoded serializer takes a list of name-value tuples tuples, with an optional encoding encoding (default UTF-8), and then runs these steps. They return an ASCII string.
Set encoding to the result of getting an output encoding from + encoding. + +
Let output be the empty string. + +
For each tuple of tuples: + +
Assert: tuple's name and tuple's value are + scalar value strings. + +
Let name be the result of running percent-encode after encoding with encoding, tuple's name, the application/x-www-form-urlencoded percent-encode set, and true.
Let value be the result of running percent-encode after encoding with encoding, tuple's value, the application/x-www-form-urlencoded percent-encode set, and true.
If output is not the empty string, then append U+0026 (&) to output.

Append name, followed by U+003D (=), followed by value, to output.

Return output.
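A matching sketch of the serializer follows. It assumes the default UTF-8 encoding only; getting an output encoding and legacy encodings are not modelled. Under the application/x-www-form-urlencoded percent-encode set, only ASCII alphanumerics and U+002A (*), U+002D (-), U+002E (.), and U+005F (_) pass through unchanged. Passing true as the last argument turns U+0020 SPACE into U+002B (+), and every other byte is percent-encoded. The function names are hypothetical.

```javascript
// Non-normative sketch, UTF-8 only.
const encoder = new TextEncoder();

// Bytes outside the application/x-www-form-urlencoded percent-encode set.
function passesThrough(byte) {
  return (byte >= 0x30 && byte <= 0x39) || (byte >= 0x41 && byte <= 0x5a) ||
         (byte >= 0x61 && byte <= 0x7a) ||
         byte === 0x2a || byte === 0x2d || byte === 0x2e || byte === 0x5f;
}

function percentEncodeForm(string) {
  let output = "";
  for (const byte of encoder.encode(string)) {
    if (byte === 0x20) output += "+"; // space as plus
    else if (passesThrough(byte)) output += String.fromCharCode(byte);
    else output += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return output;
}

function serializeUrlencoded(tuples) {
  let output = "";
  for (const [name, value] of tuples) {
    if (output !== "") output += "&";
    output += percentEncodeForm(name) + "=" + percentEncodeForm(value);
  }
  return output;
}

serializeUrlencoded([["a b", "~"], ["x", "*-._"]]); // "a+b=%7E&x=*-._"
```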
The application/x-www-form-urlencoded string parser takes a scalar value string input, UTF-8 encodes it, and then returns the result of application/x-www-form-urlencoded parsing it.
This section uses terminology from Web IDL. Browser user agents must support this +API. JavaScript implementations should support this API. Other user agents or programming languages +are encouraged to use an API suitable to their needs, which might not be this one. [[!WEBIDL]] + + +
+[Exposed=*, + LegacyWindowAlias=webkitURL] +interface URL { + constructor(USVString url, optional USVString base); + + stringifier attribute USVString href; + readonly attribute USVString origin; + attribute USVString protocol; + attribute USVString username; + attribute USVString password; + attribute USVString host; + attribute USVString hostname; + attribute USVString port; + attribute USVString pathname; + attribute USVString search; + [SameObject] readonly attribute URLSearchParams searchParams; + attribute USVString hash; + + USVString toJSON(); +}; ++ +
A {{URL}} object has an associated: + +
To potentially strip trailing spaces from an opaque path given a {{URL}} object +url: + +
If url's URL does not have an opaque path, then + return. + +
Remove all trailing U+0020 SPACE code points from url's + URL's path. +
The new URL(url, base) constructor steps are:
Let parsedBase be null. + +
If base is given, then: + +
Let parsedBase be the result of running the basic URL parser on + base. + +
If parsedBase is failure, then throw a {{TypeError}}. +
Let parsedURL be the result of running the basic URL parser on + url with parsedBase. + +
If parsedURL is failure, then throw a {{TypeError}}. + +
Let query be parsedURL's query, if that is non-null, and the empty string otherwise.

Set this's URL to parsedURL.
Set this's query object to a new {{URLSearchParams}} object. + +
Initialize this's query object with + query. + +
Set this's query object's URL object to + this. +
To parse a string into a URL without using a + base URL, invoke the {{URL}} constructor with a single argument: + +
+var input = "https://example.org/💩",
+ url = new URL(input)
+url.pathname // "/%F0%9F%92%A9"
+
+ This throws an exception if the input is a relative-URL string: + +
+try {
+ var url = new URL("/🍣🍺")
+} catch(e) {
+ // that happened
+}
+
+ For those cases a base URL is necessary: + +
+var input = "/🍣🍺",
+ url = new URL(input, document.baseURI)
+url.href // "https://url.spec.whatwg.org/%F0%9F%8D%A3%F0%9F%8D%BA"
+
+ A {{URL}} object can be used as a base URL (as the IDL requires a string as argument, a + {{URL}} object stringifies to its {{URL/href}} getter return value):
+ +
+var url = new URL("🏳️🌈", new URL("https://pride.example/hello-world"))
+url.pathname // "/%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%F0%9F%8C%88"
The href getter steps and the toJSON() method steps are to return the serialization of this's URL.
The href setter steps are:
Let parsedURL be the result of running the basic URL parser on the given + value. + +
If parsedURL is failure, then throw a {{TypeError}}.

Set this's URL to parsedURL.

Empty this's query object's list.

Let query be this's URL's query.

If query is non-null, then set this's query object's list to the result of parsing query.
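The following example uses the URL class as shipped in browsers and Node.js, with made-up URLs. The object returned by `searchParams` stays the same across `href` assignments, while its list is refreshed from the new query:

```javascript
const url = new URL("https://a.example/?x=1");
const params = url.searchParams; // [SameObject]: always this object

url.href = "https://b.example/?y=2";

params.has("x");            // false: the list was emptied
params.get("y");            // "2": the list was reparsed from the new query
url.searchParams === params; // true
```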
The origin getter steps are to return the serialization of this's URL's origin. [[!HTML]]
The protocol getter steps are to return this's URL's scheme, followed by U+003A (:).
The protocol setter steps are to basic URL parse the given value, followed by U+003A (:), with this's URL as url and scheme start state as state override.
The username getter steps are to return this's URL's username.
The username setter steps are:
If this's URL cannot have a username/password/port, then + return. + +
Set the username given this's URL and the given value. +
The password getter steps are to return this's URL's password.
The password setter steps are:
If this's URL cannot have a username/password/port, then + return. + +
Set the password given this's URL and the given value. +
The host getter steps are:
Let url be this's URL.

If url's host is null, then return the empty string.
If url's port is null, return url's + host, serialized. + +
Return url's host, serialized, + followed by U+003A (:) and url's port, + serialized. +
The host setter steps are:
If this's URL has an opaque path, then return. + +
Basic URL parse the given value with this's + URL as url and host state as + state override. +
If the given value for the host setter lacks a port, this's URL's port will not change. This can be unexpected, as the host getter does return a URL-port string, so one might have assumed the setter to always "reset" both.
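To illustrate the note above, here is a short example using the URL class in browsers and Node.js. The host names are made up:

```javascript
const url = new URL("https://example.com:8080/path");

// The new value has no port, so the existing port is kept.
url.host = "example.org";
url.port; // "8080"
url.href; // "https://example.org:8080/path"
```

To change or clear the port as well, either include it in the value given to `host` or assign to `port` separately.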
The hostname getter steps are:
If this's URL's host is null, then return the empty + string. + +
Return this's URL's host, + serialized. +
The hostname setter steps are:
If this's URL has an opaque path, then return. + +
Basic URL parse the given value with this's + URL as url and hostname state as + state override. +
The port getter steps are:
If this's URL's port is null, then return the empty + string. + +
Return this's URL's port, + serialized. +
The port setter steps are:
If this's URL cannot have a username/password/port, then + return. + +
If the given value is the empty string, then set this's URL's + port to null.
Otherwise, basic URL parse the given value with + this's URL as url and + port state as state override. +
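This example shows the port setter steps through the URL class, with made-up URLs. Setting the scheme's default port stores null. Assigning the empty string also stores null. A URL that cannot have a username/password/port, such as a mailto: URL with a null host, ignores the assignment:

```javascript
const url = new URL("https://example.com:8080/");

url.port = "443"; // the default port of "https" is stored as null
const afterDefault = url.href; // "https://example.com/"

url.port = "9000";
const afterCustom = url.href; // "https://example.com:9000/"

url.port = ""; // empty string: port set to null
const afterEmpty = url.href; // "https://example.com/"

const mailto = new URL("mailto:someone@example.com");
mailto.port = "8080"; // host is null, so the setter returns early
```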
The pathname getter steps are to return the result of URL path serializing this's URL.
The pathname setter steps are:
If this's URL has an opaque path, then return. + +
Basic URL parse the given value with this's + URL as url and path start state as + state override. +
The search getter steps are:
The search setter steps are:
Let url be this's URL.

If the given value is the empty string:
Set url's query to null. + +
Empty this's query object's + list. + +
Potentially strip trailing spaces from an opaque path with this. + +
Return. +
Let input be the given value with a single leading U+003F (?) removed, if any. + +
Set url's query to the empty string. + +
Basic URL parse input with url as + url and query state as + state override. + +
Set this's query object's list to the + result of parsing input. +
The {{URL/search}} setter has the potential to remove trailing U+0020 SPACE code points from this's URL's path. It does this so that running the URL parser on the output of running the URL serializer on this's URL yields a URL that is equal to this's URL.
The searchParams getter steps are to return this's query object.
The hash getter steps are:
The hash setter steps are:
If the given value is the empty string:

Set this's URL's fragment to null.

Potentially strip trailing spaces from an opaque path with this.

Return.

Let input be the given value with a single leading U+0023 (#) removed, if any.

Set this's URL's fragment to the empty string.

Basic URL parse input with this's URL as url and fragment state as state override.
The {{URL/hash}} setter has the potential to change this's URL's +path in a manner equivalent to the {{URL/search}} setter. + + +
+[Exposed=*] +interface URLSearchParams { + constructor(optional (sequence<sequence<USVString>> or record<USVString, USVString> or USVString) init = ""); + + undefined append(USVString name, USVString value); + undefined delete(USVString name); + USVString? get(USVString name); + sequence<USVString> getAll(USVString name); + boolean has(USVString name); + undefined set(USVString name, USVString value); + + undefined sort(); + + iterable<USVString, USVString>; + stringifier; +}; ++ +
Constructing and stringifying a {{URLSearchParams}} object is fairly straightforward: + +
+let params = new URLSearchParams({key: "730d67"})
+params.toString() // "key=730d67"
As a {{URLSearchParams}} object uses the application/x-www-form-urlencoded format underneath, there are some differences in how it encodes certain code points compared to a {{URL}} object (including {{URL/href}} and {{URL/search}}). This can be especially surprising when using {{URL/searchParams}} to operate on a URL's query.
+const url = new URL('https://example.com/?a=b ~');
+console.log(url.href); // "https://example.com/?a=b%20~"
+url.searchParams.sort();
+console.log(url.href); // "https://example.com/?a=b+%7E"
+
+
+const url = new URL('https://example.com/?a=~&b=%7E');
+console.log(url.search); // "?a=~&b=%7E"
+console.log(url.searchParams.get('a')); // "~"
+console.log(url.searchParams.get('b')); // "~"
+
{{URLSearchParams}} objects will percent-encode anything in the application/x-www-form-urlencoded percent-encode set, and will encode U+0020 SPACE as U+002B (+).
Ignoring encodings (use UTF-8), {{URL/search}} will percent-encode anything in the + query percent-encode set or the special-query percent-encode set (depending on + whether or not the URL is special). +
A {{URLSearchParams}} object has an associated: + +
A {{URLSearchParams}} object with a non-null URL object has +the potential to change that object's path in a manner equivalent to the {{URL}} +object's {{URL/search}} and {{URL/hash}} setters. + +
To initialize a +{{URLSearchParams}} object query with init, run these steps: + +
If init is a sequence, then for each innerSequence + of init: + +
+ +Otherwise, if init is a record, then for each + name → value of init, append (name, + value) to query's list. + +
Otherwise: + +
+To update a {{URLSearchParams}} +object query, run these steps: + +
If query's URL object is null, then return. + +
Let serializedQuery be the serialization of + query's list. + +
If serializedQuery is the empty string, then set serializedQuery to + null. + +
Set query's URL object's URL's + query to serializedQuery. + +
If serializedQuery is null, then + potentially strip trailing spaces from an opaque path with query's + URL object. +
The new URLSearchParams(init) constructor steps are:
If init is a string and starts with U+003F (?), then remove the first code point + from init. + +
Initialize this with init. +
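The following example shows the three kinds of init the constructor accepts, using the URLSearchParams class in browsers and Node.js. The names and values are made up:

```javascript
// A sequence of [name, value] pairs can repeat a name.
const fromPairs = new URLSearchParams([["a", "1"], ["a", "2"], ["b", "x y"]]);
// A record maps each name to one value.
const fromRecord = new URLSearchParams({ a: "1", b: "x y" });
// A string is parsed; a single leading "?" is removed first.
const fromString = new URLSearchParams("?a=1&b=x+y");

fromPairs.toString();  // "a=1&a=2&b=x+y"
fromRecord.toString(); // "a=1&b=x+y"
fromString.toString(); // "a=1&b=x+y"
```

Only the sequence form can express repeated names, since a record has at most one value per key.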
The append(name, value) method steps are:
The delete(name) method steps are:
The get(name) method steps are to return the value of the first tuple whose name is name in this's list, if there is such a tuple; otherwise null.
The getAll(name) method steps are to return the values of all tuples whose name is name in this's list, in list order; otherwise the empty sequence.
The has(name) method steps are to return true if there is a tuple whose name is name in this's list; otherwise false.
The set(name, value) method steps are:
If this's list contains any + tuples whose name is name, then set the value of the first such + tuple to value and remove the others. + +
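The following example shows set() through the URLSearchParams class. The first matching tuple keeps its position and receives the new value, and later duplicates are removed. A name that is not yet present is appended:

```javascript
const params = new URLSearchParams("a=1&b=2&a=3");

params.set("a", "4");
const afterReplace = params.toString(); // "a=4&b=2"

params.set("c", "5"); // no "c" tuple yet, so it is appended
const afterAppend = params.toString(); // "a=4&b=2&c=5"
```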
It can be useful to sort the name-value tuples in a {{URLSearchParams}} object, in particular to increase cache hits. This can be accomplished by invoking the {{URLSearchParams/sort()}} method:
+const url = new URL("https://example.org/?q=🏳️🌈&key=e1f7bc78");
+url.searchParams.sort();
+url.search; // "?key=e1f7bc78&q=%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%F0%9F%8C%88"
+
+ To avoid altering the original input, e.g., for comparison purposes, construct a new + {{URLSearchParams}} object: + +
+const sorted = new URLSearchParams(url.search)
+sorted.sort()
The sort() method steps are:
Sort all tuples in this's list, if any, by + their names. Sorting must be done by comparison of code units. The relative order between + tuples with equal names must be preserved. + +
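Here is a minimal sketch of the ordering sort() applies. It operates on a hypothetical array of [name, value] pairs rather than a real URLSearchParams list. In JavaScript, `<` and `>` on strings compare UTF-16 code units, and `Array.prototype.sort` has been required to be stable since ES2019. Together these match both requirements above.

```javascript
// Hypothetical helper: stable sort of [name, value] pairs by name,
// comparing UTF-16 code units (not code points).
function sortTuples(tuples) {
  return [...tuples].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// Equal names keep their relative order.
sortTuples([["b", "1"], ["a", "2"], ["b", "3"]]);
// [["a", "2"], ["b", "1"], ["b", "3"]]

// U+1F308 is stored as the surrogate pair D83C DF08. Its first code unit
// is below 0xFFFD, so it sorts before U+FFFD even though its code point
// is larger.
sortTuples([["\uFFFD", "x"], ["\u{1F308}", "y"]]);
// [["\u{1F308}", "y"], ["\uFFFD", "x"]]
```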
The value pairs to iterate over are this's list's +tuples with the key being the name and the value being the value. + +
The stringification behavior steps are to return the +serialization of this's list. + + +
A standard that exposes URLs should expose the URL as a string (by
serializing an internal URL). A standard should not expose a
URL using a {{URL}} object. {{URL}} objects are meant for URL
manipulation. In IDL the USVString type should be used.
The higher-level notion here is that values are to be exposed as immutable data +structures. + +
If a standard decides to use a variant of the name "URL" for a feature it defines, it should name +such a feature "url" (i.e., lowercase and with an "l" at the end). Names such as "URL", "URI", and +"IRI" should not be used. However, if the name is a compound, "URL" (i.e., uppercase) is preferred, +e.g., "newURL" and "oldURL". + +
The {{EventSource}} and {{HashChangeEvent}} interfaces in HTML are +examples of proper naming. [[HTML]] + + + +
There have been a lot of people that have helped make URLs more interoperable over
the years and thereby furthered the goals of this standard. Likewise, many people have helped make
this standard what it is today.
With that, many thanks to
100の人,
Adam Barth,
Addison Phillips,
Adrián Chaves,
Albert Wiersch,
Alex Christensen,
Alexis Hunt,
Alexandre Morgaut,
Alwin Blok,
Andrew Sullivan,
Arkadiusz Michalski,
Behnam Esfahbod,
Bobby Holley,
Boris Zbarsky,
Brad Hill,
Brandon Ross,
Chris Dumez,
Chris Rebert,
Corey Farwell,
Dan Appelquist,
Daniel Bratell,
Daniel Stenberg,
David Burns,
David Håsäther,
David Sheets,
David Singer,
David Walp,
Domenic Denicola,
Emily Schechter,
Emily Stark,
Eric Lawrence,
Erik Arvidsson,
Gavin Carothers,
Geoff Richards,
Glenn Maynard,
Gordon P. Hemsley,
Henri Sivonen,
Ian Hickson,
Ilya Grigorik,
Italo A. Casas,
Jakub Gieryluk,
James Graham,
James Manger,
James Ross,
Jeff Hodges,
Jeffrey Posnick,
Jeffrey Yasskin,
Joe Duarte,
Joshua Bell,
Jxck,
Karl Wagner,
田村健人 (Kent TAMURA),
Kevin Grandon,
Kornel Lesiński,
Larry Masinter,
Leif Halvard Silli,
Mark Amery,
Mark Davis,
Marcos Cáceres,
Marijn Kruisselbrink,
Martin Dürst,
Mathias Bynens,
Matt Falkenhagen,
Matt Giuca,
Michael Peick,
Michael™ Smith,
Michal Bukovský,
Michel Suignard,
Mikaël Geljić,
Noah Levitt,
Peter Occil,
Philip Jägenstedt,
Philippe Ombredanne,
Prayag Verma,
Rimas Misevičius,
Robert Kieffer,
Rodney Rehm,
Roy Fielding,
Ryan Sleevi,
Sam Ruby,
Sam Sneddon,
Santiago M. Mola,
Sebastian Mayr,
Simon Pieters,
Simon Sapin,
Steven Vachon,
Stuart Cook,
Sven Uhlig,
Tab Atkins,
吉野剛史 (Takeshi Yoshino),
Tantek Çelik,
Tiancheng "Timothy" Gu,
Tim Berners-Lee,
簡冠庭 (Tim Guan-tin Chien),
Titi_Alone,
Tomek Wytrębowicz,
Trevor Rowbotham,
Tristan Seligmann,
Valentin Gosu,
Vyacheslav Matva,
Wei Wang,
Wolf Lammen,
山岸和利 (Yamagishi Kazutoshi),
Yongsheng Zhang,
成瀬ゆい (Yui Naruse), and
zealousidealroll
for being awesome!
This standard is written by Anne van Kesteren (Apple, annevk@annevk.nl).
diff --git a/url.bs b/url.bs
index b8954e48..18cb972b 100644
--- a/url.bs
+++ b/url.bs
@@ -3,7 +3,7 @@ Group: WHATWG
H1: URL
Shortname: url
Text Macro: TWITTER urlstandard
-Text Macro: LATESTRD 2022-08
+Text Macro: LATESTRD 2023-02
Abstract: The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded
format, and their API.
Translation: ja https://triple-underscore.github.io/URL-ja.html
Required IDs: application/x-www-form-urlencoded,urlencoded-parsing