diff --git a/encoding.bs b/encoding.bs
index 43d7e5d..b25b880 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -15,6 +15,8 @@ Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeo spec:infra; type:dfn; text:code point text:ascii case-insensitive +spec:streams; + type:interface; text:ReadableStream
@@ -1038,36 +1040,26 @@ function decodeArrayOfStrings(buffer, encoding) { -

Interface {{TextDecoder}}

+

Interface mixin {{TextDecoderCommon}}

-dictionary TextDecoderOptions {
-  boolean fatal = false;
-  boolean ignoreBOM = false;
-};
-
-dictionary TextDecodeOptions {
-  boolean stream = false;
-};
-
-[Constructor(optional DOMString label = "utf-8", optional TextDecoderOptions options),
- Exposed=(Window,Worker)]
-interface TextDecoder {
+interface mixin TextDecoderCommon {
   readonly attribute DOMString encoding;
   readonly attribute boolean fatal;
   readonly attribute boolean ignoreBOM;
-  USVString decode(optional BufferSource input, optional TextDecodeOptions options);
-};
+}; + -

A {{TextDecoder}} object has an associated encoding, -decoder, stream, -ignore BOM flag (initially unset), -BOM seen flag (initially unset), -error mode (initially "replacement"), and -do not flush flag (initially unset). +

The {{TextDecoderCommon}} interface mixin defines common attributes that are shared between +{{TextDecoder}} and {{TextDecoderStream}} objects. These objects have an associated +encoding, +ignore BOM flag (initially unset), +BOM seen flag (initially unset), and +error mode (initially +"replacement"). -

A {{TextDecoder}} object also has an associated -serialize stream algorithm, that given a +

These objects also have an associated +serialize stream algorithm, that given a stream stream, runs these steps:

    @@ -1077,18 +1069,18 @@ interface TextDecoder {

    While true:

      -
    1. Let token be the result of - reading from stream. +

    2. Let token be the result of reading from stream.

    3. -

      If encoding is UTF-8, UTF-16BE, or UTF-16LE, and - ignore BOM flag and BOM seen flag are unset, then: +

      If encoding is UTF-8, UTF-16BE, or + UTF-16LE, and ignore BOM flag and + BOM seen flag are unset, then:

        -
      1. If token is U+FEFF, then set BOM seen flag. +

      2. If token is U+FEFF, then set BOM seen flag.

      3. Otherwise, if token is not end-of-stream, then set - BOM seen flag and append token to output. + BOM seen flag and append token to output.

      4. Otherwise, return output.

      @@ -1106,6 +1098,44 @@ control.
      +

      The encoding +attribute's getter, when invoked, must return this object's encoding's +name in ASCII lowercase. + +

      The fatal +attribute's getter, when invoked, must return true if this object's +error mode is "fatal", and false otherwise. + +

      The +ignoreBOM +attribute's getter, when invoked, must return true if this object's +ignore BOM flag is set, and false otherwise. + + +
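
As a non-normative illustration, the three getters can be observed on a {{TextDecoder}} object, which includes this mixin. The sketch below assumes a runtime that exposes `TextDecoder` as a global (a browser, or a recent Node.js); the variable names are illustrative and not part of the specification.

```javascript
// The label is resolved to an encoding; the getter returns its name in ASCII lowercase.
const strict = new TextDecoder("UTF-8", { fatal: true, ignoreBOM: true });
const summary = {
  encoding: strict.encoding,   // name of the encoding, lowercased
  fatal: strict.fatal,         // reflects the error mode
  ignoreBOM: strict.ignoreBOM, // reflects the ignore BOM flag
};

// With the ignore BOM flag unset, a leading U+FEFF is dropped from the output.
const bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x68, 0x69]); // BOM followed by "hi"
const withoutBOM = new TextDecoder().decode(bytes);
// With the ignore BOM flag set, the U+FEFF is passed through to the output.
const withBOM = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
```

Because the attributes live on the mixin, the same three getters would be expected to behave identically on a {{TextDecoderStream}} object.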

      Interface {{TextDecoder}}

      + +
      +dictionary TextDecoderOptions {
      +  boolean fatal = false;
      +  boolean ignoreBOM = false;
      +};
      +
      +dictionary TextDecodeOptions {
      +  boolean stream = false;
      +};
      +
      +[Constructor(optional DOMString label = "utf-8", optional TextDecoderOptions options),
      + Exposed=(Window,Worker)]
      +interface TextDecoder {
      +  USVString decode(optional BufferSource input, optional TextDecodeOptions options);
      +};
      +TextDecoder includes TextDecoderCommon;
      +
      + +

      A {{TextDecoder}} object has an associated decoder, +stream, and do not flush flag (initially +unset). +

      decoder = new TextDecoder([label = "utf-8" [, options]])
      @@ -1115,20 +1145,21 @@ control. throws a {{RangeError}}. -
      decoder . encoding -

      Returns encoding's name, lowercased. +

      decoder . encoding +

      Returns encoding's name, lowercased. -

      decoder . fatal -

      Returns true if error mode is "fatal", and false - otherwise. +

      decoder . fatal +

      Returns true if error mode is "fatal", and + false otherwise. -

      decoder . ignoreBOM -

      Returns true if ignore BOM flag is set, and false otherwise. +

      decoder . ignoreBOM +

      Returns true if ignore BOM flag is set, and false + otherwise.

      decoder . decode([input [, options]])
      -

      Returns the result of running encoding's decoder. The - method can be invoked zero or more times with options's stream set to +

      Returns the result of running encoding's decoder. + The method can be invoked zero or more times with options's stream set to true, and then once without options's stream (or set to false), to process a fragmented stream. If the invocation without options's stream (or set to false) has no input, it's clearest to omit both arguments. @@ -1140,9 +1171,9 @@ while(buffer = next_chunk()) { } string += decoder.decode(); // end-of-stream -

      If the error mode is "fatal" and - encoding's decoder returns error, throws a - {{TypeError}}. +

      If the error mode is "fatal" and + encoding's decoder returns error, + throws a {{TypeError}}.
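
A small non-normative sketch of both behaviors described above: a multi-byte sequence split across `decode()` calls, and the difference between the "replacement" and "fatal" error modes. It assumes a global `TextDecoder`; the byte values and variable names are chosen for illustration only.

```javascript
// U+20AC EURO SIGN is three bytes in UTF-8; split them across two chunks.
const euro = new Uint8Array([0xE2, 0x82, 0xAC]);
const streaming = new TextDecoder("utf-8");
let text = "";
text += streaming.decode(euro.subarray(0, 1), { stream: true }); // incomplete sequence is held back
text += streaming.decode(euro.subarray(1), { stream: true });    // sequence completes here
text += streaming.decode();                                      // end-of-stream flush

// In "fatal" error mode an invalid byte raises a TypeError.
let fatalError = null;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array([0xFF]));
} catch (e) {
  fatalError = e;
}

// In the default "replacement" error mode the same byte yields U+FFFD instead.
const replaced = new TextDecoder("utf-8").decode(new Uint8Array([0xFF]));
```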

      The @@ -1156,33 +1187,25 @@ constructor, when invoked, must run these steps:

    4. Let dec be a new {{TextDecoder}} object. -

    5. Set dec's encoding to encoding. +

    6. Set dec's encoding to encoding.

    7. If options's fatal member is true, then set dec's - error mode to "fatal". + error mode to "fatal".

    8. If options's ignoreBOM member is true, then set dec's - ignore BOM flag. + ignore BOM flag.

    9. Return dec.

    -

    The encoding attribute's getter must return -encoding's name in ASCII lowercase. - -

    The fatal attribute's getter must return true -if error mode is "fatal", and false otherwise. - -

    The ignoreBOM attribute's getter must return -true if ignore BOM flag is set, and false otherwise. -

    The decode(input, options) method, when invoked, must run these steps:

    1. If the do not flush flag is unset, set decoder - to a new encoding's decoder, set stream - to a new stream, and unset the BOM seen flag. + to a new encoding's decoder, set + stream to a new stream, and unset the + BOM seen flag.

    2. If options's stream is true, set the do not flush flag, and unset the do not flush flag @@ -1207,7 +1230,8 @@ method, when invoked, must run these steps:

    3. If token is end-of-stream and the do not flush flag - is set, then return output, serialized. + is set, then return output, + serialized.

      The way streaming works is to not handle end-of-stream here when the do not flush flag is set and to not unset that flag. That way in a @@ -1220,10 +1244,10 @@ method, when invoked, must run these steps:

      1. Let result be the result of processing token for decoder, stream, output, and - error mode. + error mode.

      2. If result is finished, then return output, - serialized. + serialized.

      3. Otherwise, if result is error, then throw a {{TypeError}}. @@ -1231,6 +1255,20 @@ method, when invoked, must run these steps:

    +

    Interface mixin {{TextEncoderCommon}}

    + +
    +interface mixin TextEncoderCommon {
    +  readonly attribute DOMString encoding;
    +};
    +
    + +

    The {{TextEncoderCommon}} interface mixin defines common attributes that are shared between +{{TextEncoder}} and {{TextEncoderStream}} objects. + +

    The encoding +attribute's getter, when invoked, must return "utf-8". +

    Interface {{TextEncoder}}

    @@ -1238,9 +1276,10 @@ method, when invoked, must run these steps: [Constructor, Exposed=(Window,Worker)] interface TextEncoder { - readonly attribute DOMString encoding; [NewObject] Uint8Array encode(optional USVString input = ""); -}; +}; +TextEncoder includes TextEncoderCommon; +

    A {{TextEncoder}} object has an associated encoder. @@ -1254,7 +1293,7 @@ requires buffering of scalar values.

    encoder = new TextEncoder()

    Returns a new {{TextEncoder}} object. -

    encoder . encoding +
    encoder . encoding

    Returns "utf-8".

    encoder . encode([input = ""]) @@ -1272,9 +1311,6 @@ constructor, when invoked, must run these steps:
  1. Return enc.

-

The encoding attribute's getter must return -"utf-8". -

The encode(input) method, when invoked, must run these steps: @@ -1305,6 +1341,396 @@ must run these steps: +

Interface mixin {{GenericTransformStream}}

+ +

The {{GenericTransformStream}} interface mixin represents the concept of a +transform stream in IDL. It is not a {{TransformStream}}, though it has the same interface +and it delegates to one. + +

+interface mixin GenericTransformStream {
+  readonly attribute ReadableStream readable;
+  readonly attribute WritableStream writable;
+};
+
+ +

An object that includes {{GenericTransformStream}} has an associated +transform of type {{TransformStream}}. + +

The readable attribute's getter, +when invoked, must return this object's transform.\[[readable]]. + +

The writable attribute's getter, +when invoked, must return this object's transform.\[[writable]]. + + +
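
The delegation pattern can be sketched in JavaScript as follows. This is a non-normative illustration only: `UppercaseStream` is a hypothetical class invented for this example, and it assumes a runtime with a global `TransformStream`. The object is not itself a `TransformStream`; it only forwards `readable` and `writable` to one it holds privately.

```javascript
// Hypothetical example class; not defined by any specification.
class UppercaseStream {
  #transform; // plays the role of the associated "transform"

  constructor() {
    this.#transform = new TransformStream({
      transform(chunk, controller) {
        controller.enqueue(String(chunk).toUpperCase());
      },
    });
  }

  // Both getters simply return the corresponding side of the held transform.
  get readable() { return this.#transform.readable; }
  get writable() { return this.#transform.writable; }
}

const example = new UppercaseStream();
```

An object shaped like this can be passed to `pipeThrough()`, which only requires a `readable` and `writable` pair.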

Interface {{TextDecoderStream}}

+ +
+[Constructor(optional DOMString label = "utf-8", optional TextDecoderOptions options),
+ Exposed=(Window,Worker)]
+interface TextDecoderStream {
+};
+TextDecoderStream includes TextDecoderCommon;
+TextDecoderStream includes GenericTransformStream;
+
+ +

A {{TextDecoderStream}} object has an associated +decoder, and stream. + +

+
decoder = new + TextDecoderStream([label = + "utf-8" [, options]]) +
+

Returns a new {{TextDecoderStream}} object. +

If label is either not a label or is a label for replacement, + throws a {{RangeError}}. + +

decoder . encoding +

Returns encoding's name, lowercased. + +

decoder . fatal +

Returns true if error mode is "fatal", and + false otherwise. + +

decoder . ignoreBOM +

Returns true if ignore BOM flag is set, and false + otherwise. + +

decoder . readable +
+

Returns a readable stream whose chunks are strings resulting from running + encoding's decoder on the chunks written to + {{GenericTransformStream/writable}}. + +

decoder . writable +
+

Returns a writable stream which accepts {{BufferSource}} chunks and runs them through + encoding's decoder before making them available to + {{GenericTransformStream/readable}}. + +

Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a + {{ReadableStream}} source. + +


+var decoder = new TextDecoderStream(encoding);
+byteReadable
+  .pipeThrough(decoder)
+  .pipeTo(textWritable);
+ +

If the error mode is "fatal" and + encoding's decoder returns error, both + {{GenericTransformStream/readable}} and {{GenericTransformStream/writable}} will be errored with a + {{TypeError}}. +

+ +

The +TextDecoderStream(label, +options) constructor, when invoked, must run these steps: + +

    +
  1. Let encoding be the result of getting an encoding from label. + +

  2. If encoding is failure or replacement, then throw a {{RangeError}}. + +

  3. Let dec be a new {{TextDecoderStream}} object. + +

  4. Set dec's encoding to encoding. + +

  5. If options's fatal member is true, then set dec's + error mode to "fatal". + +

  6. If options's ignoreBOM member is true, then set dec's + ignore BOM flag. + +

  7. +

    Set dec's decoder to a new decoder + for dec's encoding, and set dec's + stream to a new stream. + +

  8. Let startAlgorithm be an algorithm that takes no arguments and returns nothing. + +

  9. Let transformAlgorithm be an algorithm which takes a chunk argument + and runs the decode and enqueue a chunk algorithm with dec and + chunk. + +

  10. Let flushAlgorithm be an algorithm which takes no arguments and runs the flush + and enqueue algorithm with dec. + +

  11. Let transform be the result of calling + CreateTransformStream(startAlgorithm, transformAlgorithm, + flushAlgorithm). + +

  12. Set dec's transform to transform. + +

  13. Return dec. +

+ +

The decode and enqueue a chunk algorithm, given a {{TextDecoderStream}} object +dec and a chunk, runs these steps: + +

    +
  1. Let bufferSource be the result of + converting chunk to a {{BufferSource}}. If this + throws an exception, then return a promise rejected with that exception. + +

  2. Push a copy of bufferSource to + dec's stream. If this throws an exception, then return a + promise rejected with that exception. + +

  3. Let controller be dec's + transform.\[[transformStreamController]]. + +

  4. Let output be a new stream. + +

  5. +

    While true, run these steps: + +

      +
    1. Let token be the result of reading from dec's + stream. + +

    2. +

      If token is end-of-stream, run these steps: +

        +
      1. Let outputChunk be output, + serialized. + +

      2. If outputChunk is non-empty, call + TransformStreamDefaultControllerEnqueue(controller, + outputChunk). +

      3. Return a new promise resolved with undefined. +

      + +
    3. Let result be the result of processing token for + dec's decoder, dec's + stream, output, and dec's + error mode. + +

    4. If result is error, then return a new promise rejected with a + {{TypeError}} exception. +

    +
+ +

The flush and enqueue algorithm, which handles the end of data from the input +{{ReadableStream}} object, given a {{TextDecoderStream}} object dec, runs these steps: + +

    +
  1. Let output be a new stream. + +

  2. Let result be the result of processing end-of-stream for + dec's decoder, dec's + stream, output, and dec's + error mode. +

  3. If result is finished, run these steps: +

      +
    1. Let outputChunk be output, + serialized. + +

    2. Let controller be dec's + transform.\[[transformStreamController]]. + +

    3. If outputChunk is non-empty, call + TransformStreamDefaultControllerEnqueue(controller, + outputChunk). + +

    4. Return a new promise resolved with undefined. +

    + +
  4. Otherwise, return a new promise rejected with a {{TypeError}} exception. +

+ + +
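
The constructor, decode and enqueue a chunk, and flush and enqueue steps above can be approximated in script. The sketch below is non-normative and rests on stated assumptions: `SketchTextDecoderStream` and `decodeChunks` are hypothetical names, global `TextDecoder` and `TransformStream` are assumed to exist, and the per-token processing is delegated to `TextDecoder` in streaming mode rather than implemented directly. It differs from the algorithms in small ways, for example an `undefined` chunk is treated as empty input rather than rejected.

```javascript
// Hypothetical stand-in for TextDecoderStream, built from TextDecoder and TransformStream.
class SketchTextDecoderStream {
  #decoder;
  #transform;

  constructor(label = "utf-8", options = {}) {
    // TextDecoder throws a RangeError for an unknown label or for "replacement".
    this.#decoder = new TextDecoder(label, options);
    this.#transform = new TransformStream({
      // Rough counterpart of "decode and enqueue a chunk".
      transform: (chunk, controller) => {
        // decode() throws a TypeError for chunks such as strings that are not a
        // BufferSource, and for invalid input in "fatal" error mode; a throw
        // here errors both sides of the stream.
        const text = this.#decoder.decode(chunk, { stream: true });
        if (text !== "") controller.enqueue(text); // only enqueue non-empty output
      },
      // Rough counterpart of "flush and enqueue".
      flush: (controller) => {
        const text = this.#decoder.decode(); // processes end-of-stream
        if (text !== "") controller.enqueue(text);
      },
    });
  }

  get encoding() { return this.#decoder.encoding; }
  get fatal() { return this.#decoder.fatal; }
  get ignoreBOM() { return this.#decoder.ignoreBOM; }
  get readable() { return this.#transform.readable; }
  get writable() { return this.#transform.writable; }
}

// Usage helper: feed byte chunks in and collect the decoded string chunks.
async function decodeChunks(chunks, label = "utf-8") {
  const stream = new SketchTextDecoderStream(label);
  const writer = stream.writable.getWriter();
  const reader = stream.readable.getReader();
  const written = (async () => {
    for (const chunk of chunks) await writer.write(chunk);
    await writer.close();
  })();
  const read = (async () => {
    const out = [];
    for (;;) {
      const { value, done } = await reader.read();
      if (done) return out;
      out.push(value);
    }
  })();
  const [, out] = await Promise.all([written, read]);
  return out;
}
```

A chunk that ends in the middle of a multi-byte sequence produces no output chunk of its own; the completed character appears with the following chunk, or as U+FFFD (or an error in "fatal" mode) at flush time.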

Interface {{TextEncoderStream}}

+ +
+[Constructor,
+ Exposed=(Window,Worker)]
+interface TextEncoderStream {
+};
+TextEncoderStream includes TextEncoderCommon;
+TextEncoderStream includes GenericTransformStream;
+
+ +

A {{TextEncoderStream}} object has an associated encoder, +and pending high surrogate (initially null). + +

A {{TextEncoderStream}} object offers no label argument as it +only supports UTF-8. + +

+
encoder = new TextEncoderStream() +

Returns a new {{TextEncoderStream}} object. + +

encoder . encoding +

Returns "utf-8". + +

encoder . readable +
+

Returns a readable stream whose chunks are {{Uint8Array}}s resulting from running + UTF-8's encoder on the chunks written to {{GenericTransformStream/writable}}. + +

encoder . writable +
+

Returns a writable stream which accepts string chunks and runs them through + UTF-8's encoder before making them available to + {{GenericTransformStream/readable}}. + +

Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a + {{ReadableStream}} source. + +


+textReadable
+  .pipeThrough(new TextEncoderStream())
+  .pipeTo(byteWritable);
+
+ +

The +TextEncoderStream() +constructor, when invoked, must run these steps: + +

    +
  1. Let enc be a new {{TextEncoderStream}} object. + +

  2. Set enc's encoder to UTF-8's + encoder. + +

  3. Let startAlgorithm be an algorithm that takes no arguments and returns nothing. + +

  4. Let transformAlgorithm be an algorithm which takes a chunk argument + and runs the encode and enqueue a chunk algorithm with enc and chunk. + +

  5. Let flushAlgorithm be an algorithm which runs the encode and flush + algorithm with enc. + +

  6. Let transform be the result of calling + CreateTransformStream(startAlgorithm, transformAlgorithm, + flushAlgorithm). + +

  7. Set enc's transform to transform. + +

  8. Return enc. +

+ +
+ +

The encode and enqueue a chunk algorithm, given a {{TextEncoderStream}} object +enc and chunk, runs these steps: + +

    +
  1. Let input be the result of converting + chunk to a {{DOMString}}. If this throws an exception, then return a promise rejected + with that exception. + +

    {{DOMString}} is used here so that a surrogate pair that is split between chunks can + be reassembled into the appropriate scalar value. The behavior is otherwise identical to + {{USVString}}. In particular, lone surrogates will be replaced with U+FFFD. + +

  2. Convert input to a stream. + +

  3. Let output be a new stream. + +

  4. Let controller be enc's + transform.\[[transformStreamController]]. + +

  5. +

    While true, run these steps: + +

      +
    1. Let token be the result of reading from input. + +

    2. +

      If token is end-of-stream, run these steps: + +

        +
      1. Convert output into a byte sequence. + +

      2. +

        If output is non-empty, run these steps: + +

          +
        1. Let chunk be a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing + output. + +

        2. Call TransformStreamDefaultControllerEnqueue(controller, + chunk). +

        + +
      3. Return a new promise resolved with undefined. +

      + +
    3. Let result be the result of executing the convert code unit to scalar + value algorithm with enc, token and input. + +

    4. If result is not continue, then process result for + enc's encoder, input, and output. +

    +
+ +

The convert code unit to scalar value algorithm, given a {{TextEncoderStream}} object +enc, token, and stream input, runs these steps: + +

    +
  1. +

    If enc's pending high surrogate is non-null, run these steps: + +

      +
    1. Let high surrogate be enc's pending high surrogate. + +

    2. Set enc's pending high surrogate to null. + +

    3. If token is in the range U+DC00 to U+DFFF, inclusive, then return a code point + whose value is 0x10000 + ((high surrogate − 0xD800) << 10) + + (token − 0xDC00). + +

    4. Prepend token to input. + +

    5. Return U+FFFD. +

    + +
  2. If token is in the range U+D800 to U+DBFF, inclusive, then set enc's pending high + surrogate to token and return continue. +

  3. If token is in the range U+DC00 to U+DFFF, inclusive, then return U+FFFD. + +

  4. Return token. +

+ +

This is equivalent to the "convert a JavaScript string into a scalar +value string" algorithm from the Infra Standard, but allows for surrogate pairs that are split +between strings. [[!INFRA]] + +
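
The steps of convert code unit to scalar value, together with the pending high surrogate check performed at flush time, can be sketched as below. This is a non-normative illustration: `SurrogateJoiner`, `convertChunk`, and `flush` are hypothetical names, and the sketch returns arrays of code point values instead of feeding an encoder.

```javascript
// Hypothetical helper that carries a pending high surrogate across chunks.
class SurrogateJoiner {
  pendingHighSurrogate = null;

  // Converts one string chunk into an array of scalar values.
  convertChunk(chunk) {
    const out = [];
    for (let i = 0; i < chunk.length; i++) {
      const unit = chunk.charCodeAt(i);
      if (this.pendingHighSurrogate !== null) {
        const high = this.pendingHighSurrogate;
        this.pendingHighSurrogate = null;
        if (unit >= 0xDC00 && unit <= 0xDFFF) {
          // A matching low surrogate completes the pair.
          out.push(0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00));
          continue;
        }
        // The pending high surrogate was unpaired. The current unit is then
        // handled by the checks below, which mirrors "prepend token to input".
        out.push(0xFFFD);
      }
      if (unit >= 0xD800 && unit <= 0xDBFF) {
        this.pendingHighSurrogate = unit; // corresponds to returning "continue"
      } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
        out.push(0xFFFD); // lone low surrogate
      } else {
        out.push(unit);
      }
    }
    return out;
  }

  // At end of input, a still-pending high surrogate becomes U+FFFD.
  flush() {
    if (this.pendingHighSurrogate === null) return [];
    this.pendingHighSurrogate = null;
    return [0xFFFD];
  }
}

// U+1F600 is the surrogate pair D83D DE00, split here across two chunks.
const joiner = new SurrogateJoiner();
const first = joiner.convertChunk("a\uD83D");  // the high surrogate is held back
const second = joiner.convertChunk("\uDE00b"); // the pair is completed here
```

Keeping the state on the object rather than in the loop is what lets a pair that straddles a chunk boundary be reassembled.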

The encode and flush algorithm, given a {{TextEncoderStream}} object enc, +runs these steps: + +

    +
  1. +

    If enc's pending high surrogate is non-null, run these steps: + +

      +
    1. Let controller be enc's + transform.\[[transformStreamController]]. + +

    2. +

      Let output be the byte sequence 0xEF 0xBF 0xBD. + +

      This is the replacement character U+FFFD encoded as UTF-8. + +

    3. Let chunk be a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing + output. + +

    4. Call TransformStreamDefaultControllerEnqueue(controller, + chunk). +

    + +
  2. Return a new promise resolved with undefined. +

+ +

The encoding

@@ -2747,6 +3173,7 @@ Mark Crispin, Mark Davis, Martin Dürst, Masatoshi Kimura, +Mattias Buelens, Ms2ger, Nigel Megitt, Nigel Tao,