Parse from raw bytes #155

Open
mkmik opened this issue May 5, 2020 · 2 comments · May be fixed by #156

mkmik commented May 5, 2020

The YAML 1.2 spec lists all valid encodings:

On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported.

If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (#x00) characters.
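
To make the deduction rule quoted above concrete, here is a minimal sketch in Rust, using only the standard library. The `Encoding` enum and `detect_encoding` function are hypothetical names made up for illustration; they are not part of yaml-rust. The sketch assumes the stream either starts with a BOM or with an ASCII character, as the spec requires, and falls back to UTF-8 otherwise.

```rust
/// Hypothetical enum naming the encodings a YAML 1.2 processor must accept.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Encoding {
    Utf8,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
}

/// Hypothetical helper: guesses the encoding from the first bytes of the stream.
/// The 4-byte patterns are checked before the 2-byte ones, because the
/// UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
pub fn detect_encoding(input: &[u8]) -> Encoding {
    match input {
        // UTF-32BE: explicit BOM, or an ASCII character preceded by three nulls.
        [0x00, 0x00, 0xFE, 0xFF, ..] => Encoding::Utf32Be,
        [0x00, 0x00, 0x00, _, ..] => Encoding::Utf32Be,
        // UTF-32LE: explicit BOM, or an ASCII character followed by three nulls.
        [0xFF, 0xFE, 0x00, 0x00, ..] => Encoding::Utf32Le,
        [_, 0x00, 0x00, 0x00, ..] => Encoding::Utf32Le,
        // UTF-16BE: explicit BOM, or a null followed by an ASCII character.
        [0xFE, 0xFF, ..] => Encoding::Utf16Be,
        [0x00, _, ..] => Encoding::Utf16Be,
        // UTF-16LE: explicit BOM, or an ASCII character followed by a null.
        [0xFF, 0xFE, ..] => Encoding::Utf16Le,
        [_, 0x00, ..] => Encoding::Utf16Le,
        // UTF-8 BOM (EF BB BF) and everything else: treat as UTF-8.
        _ => Encoding::Utf8,
    }
}
```

This only identifies the encoding; decoding the bytes and stripping the BOM would be separate steps.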

IIUC, the main loader API is load_from_str, which takes a Rust str: a Unicode string whose internal representation is UTF-8.

There are many ways to load such a string from external input (file, network, etc.). Some of them require the user to specify which encodings to support, while others might only support UTF-8.

Callers of the yaml-rust API might not be aware of the subtleties of the YAML 1.2 spec w.r.t. allowed input encodings, and hence might decide to load the external YAML file using a UTF-8 decoder (because "everybody is using UTF-8, right?"). The resulting application will thus not accept all valid YAML 1.2 input byte streams.

If the yaml-rust library offered an API that accepts the raw input stream instead of a pre-decoded string, the user could delegate to the library the task of dealing with the gory details of the encodings.

For this to work, the API must be natural to use with files and other input streams and compete with the simplicity and terseness of:

let s = fs::read_to_string(filename).unwrap();
let docs = YamlLoader::load_from_str(&s).unwrap();

I'm a Rust noob, so I won't detail a proposal here. I have no idea whether the Read trait is idiomatic in such cases.

The underlying parser expects an Iterator<Item = char>, so anything that can produce such a thing should do.
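
As a rough illustration of "anything that can produce such a thing", the sketch below turns UTF-16LE bytes into an `Iterator<Item = char>` using only the standard library. The function name `utf16le_chars` is hypothetical and not part of yaml-rust. The sketch assumes the encoding is already known, replaces invalid surrogates with U+FFFD rather than reporting an error, and silently ignores a trailing odd byte.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Hypothetical helper: lazily decodes UTF-16LE bytes into chars.
/// A leading BOM, if present, is yielded as U+FEFF and not stripped here.
pub fn utf16le_chars(bytes: &[u8]) -> impl Iterator<Item = char> + '_ {
    // Group the bytes into little-endian 16-bit code units.
    let units = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
    // Combine surrogate pairs; map unpaired surrogates to U+FFFD.
    decode_utf16(units).map(|r| r.unwrap_or(REPLACEMENT_CHARACTER))
}
```

A real API would more likely surface decoding errors to the caller instead of substituting U+FFFD. The lossy behaviour here just keeps the sketch short.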

mkmik added a commit to mkmik/yaml-rust that referenced this issue May 5, 2020
Closes chyh1990#155

Also helps in some cases with chyh1990#142, when the BOM is at the beginning of the file (common),
but not in corner case where the BOM is at the start of a document which is not the first one.
mkmik linked a pull request May 5, 2020 that will close this issue

mkmik commented May 5, 2020

Blocked on #139

mkmik added a commit to mkmik/yaml-rust that referenced this issue May 7, 2020

XVilka commented Jul 29, 2020

mkmik added a commit to mkmik/yaml-rust that referenced this issue Jul 29, 2020
mkmik added a commit to mkmik/yaml-rust that referenced this issue Jul 30, 2020
Ethiraric pushed a commit to Ethiraric/yaml-rust2 that referenced this issue Mar 19, 2024
Also helps in some cases with chyh1990#142, when the BOM is at the beginning of
the file (common), but not in corner case where the BOM is at the start
of a document which is not the first one.

Closes: chyh1990#155