Adding position() API to Reader (#654) #657
Hi, thanks for the PR! Some minor questions/comments below.
src/IonReader.ts
Outdated
* @returns a [[number]] type presenting the position of the character the reading is
* currently reading.
*/
position(): number | null;
A few thoughts here:
- What does it mean for a Reader's `position` to be `null`?
- The `number` returned by `position()` needs to make sense for both text and binary Readers. The comment describes only the meaning for a text reader.
- Unfortunately, the definition of a "character" is ambiguous in Unicode terms. It could refer to a code point, code unit, glyph or grapheme cluster (among other possibilities). I think our best option here is to refer to code units:
A code unit is the unit of storage of a part of an encoded code point. In UTF-8 this means 8-bits, in UTF-16 this means 16-bits. A single code unit may represent a full code point, or part of a code point. For example, the snowman glyph (☃) is a single code point but 3 UTF-8 code units, and 1 UTF-16 code unit.
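To make the snowman example concrete, here is a minimal sketch that counts code points, UTF-8 code units, and UTF-16 code units for a JavaScript string. The helper name `countUnits` is hypothetical and is not part of ion-js; the UTF-8 count is computed from code point ranges rather than by calling an encoder, so the sketch has no runtime dependencies.

```typescript
// Hypothetical helper (not part of ion-js): counts the units of a string
// under three different definitions of "length".
function countUnits(s: string): { utf16: number; utf8: number; codePoints: number } {
  let utf8 = 0;
  let codePoints = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    let cp = unit;
    // A high surrogate followed by a low surrogate encodes one code point
    // above U+FFFF using two UTF-16 code units.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = (unit - 0xd800) * 0x400 + (next - 0xdc00) + 0x10000;
        i++; // consume the low surrogate as well
      }
    }
    codePoints++;
    // Number of 8-bit UTF-8 code units needed for this code point.
    utf8 += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  // String.prototype.length is already the number of UTF-16 code units.
  return { utf16: s.length, utf8: utf8, codePoints: codePoints };
}

const snowman = countUnits("\u2603"); // the snowman glyph
const emoji = countUnits("\uD83D\uDE00"); // one code point above U+FFFF
```

The second input shows why the distinction matters for a position API: a single code point outside the Basic Multilingual Plane occupies two UTF-16 code units, so offsets counted in code points and offsets counted in code units diverge.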
Thanks for the input!
- For the default value, I took a second look: once the `TextReader`/`BinaryReader` is initialized, the `position` is initialized to 0 in `StringSpan`/`BinarySpan`, so `null` doesn't actually make sense.
- Comment updated. How about the below:
position(): number | null;
/**
* Return the position of the current reader.
* The position refers to the distance between the code units where the reader
* started (e.g. the first code unit of the file), and the current code unit the
* reader is reading.
*
* A code unit is the unit of storage of a part of an encoded code point.
* Ref: https://stackoverflow.com/a/27331885/109549
*
* @returns a [[number]] type presenting the position of the code unit the reader is
* currently reading.
*/
position(): number;
The doc comment is an improvement, but we still need a couple of tweaks.
- Because the `Reader` interface is used for both text and binary, we need to explain what `position()` means when the `Reader` is a binary reader. Namely, that it will be the byte offset from the start of the input rather than the code unit offset.
- Because code units differ between Unicode encodings (a UTF-8 code unit is 1 byte, so a code point takes 1-4 of them; a UTF-16 code unit is 2 bytes, so a code point takes 1-2 of them), we should specify that this method returns the number of UTF-16 code units, the encoding used by JavaScript strings.
- Even though it's a good explanation, I'd like to avoid using a StackOverflow link in the public documentation. We can point to JavaScript's String length documentation, however. We could also refer users to section 2.5 of the Unicode Standard, "Encoding Forms", but that's a bit dense for casual reference.
- We should warn users that `Reader`s cannot safely skip to a given position in the stream and begin reading as they may be skipping over system values like symbol definitions.
What do you think of:
/**
* Returns the Reader's offset from the beginning of its input.
*
* For binary Readers, the return value is the number of bytes that have
* been processed.
*
* For text Readers, the return value is the number of UTF-16 code units
* that have been processed, regardless of the input's original encoding.
* For more on JavaScript's in-memory representation of text, see:
* https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length#Description
*
* Note that a Reader cannot safely skip to a given position in input without
* processing the stream leading up to that position. This is because there are
* mid-stream system level values that must be processed to guarantee that the
* Reader is in a valid state. It is safe, however, to start at the beginning of a data
* source and call next() until you reach the desired position, as the reader
* will still have the opportunity to process system-level values along the way.
*
* @returns the [[number]] of bytes or UTF-16 code units that the reader has processed.
*/
position(): number;
?
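As an illustration of the "start at the beginning and call next()" advice in the proposed comment, here is a minimal sketch. It does not use ion-js: `PositionedReader`, `ToyTextReader`, and `advanceTo` are hypothetical names, and the toy reader only splits a string on whitespace. The point is the access pattern, in which the caller never seeks and only advances value by value while checking `position()`.

```typescript
// Hypothetical subset of a reader interface, for illustration only.
// This is NOT the ion-js Reader interface.
interface PositionedReader {
  next(): string | null; // advances to the next value; null at end of input
  position(): number; // offset from the beginning of the input
}

// Toy text reader over whitespace-separated tokens. Its position is counted
// in UTF-16 code units, because it indexes directly into a JavaScript string.
class ToyTextReader implements PositionedReader {
  private pos = 0;
  constructor(private readonly input: string) {}

  next(): string | null {
    // Skip any whitespace before the next token.
    while (this.pos < this.input.length && /\s/.test(this.input.charAt(this.pos))) {
      this.pos++;
    }
    if (this.pos >= this.input.length) {
      return null;
    }
    const start = this.pos;
    while (this.pos < this.input.length && !/\s/.test(this.input.charAt(this.pos))) {
      this.pos++;
    }
    return this.input.substring(start, this.pos);
  }

  position(): number {
    return this.pos;
  }
}

// Advances from wherever the reader currently is until its position reaches
// or passes `target`, so every intermediate value is still processed.
function advanceTo(reader: PositionedReader, target: number): number {
  while (reader.position() < target) {
    if (reader.next() === null) {
      break; // ran out of input before reaching the target
    }
  }
  return reader.position();
}

const toyReader = new ToyTextReader("a \u2603 bb");
const reached = advanceTo(toyReader, 2);
```

Because the reader only stops at value boundaries, the position reached may be past the requested target; a caller that needs an exact match would compare the returned offset with the one it recorded earlier.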
Thanks for the detailed reply! This looks a lot better!
I didn't think that much when I started out, and referring to StackOverflow was really a miss 🤦🏻‍♂️
I will update the commit with the latest comment.
Ok, looking good! One other thing that I should've mentioned before (sorry! 😞): we need a couple of unit tests to prove that this works like we think it does. Could you add some tests to
Tests added!
Looking good! One last round of small cleanups.
Co-authored-by: Zack Slayton <[email protected]>
Picked all suggestions! 👍🏼
Looks good, thanks for contributing!
Hi team, can we know when this change can be released? Thanks!
Issue #, if available:
#654
Description of changes:
(Copied from issue description)
Hi,
Description
As a user of ion-js, I would love to be able to get position information from ionReader, so that when using ionReader to read an Ion file, I know which position I'm currently at.
Application scenario
Our team uses Ion to define a format, e.g. paragraph `P` uses a certain format style `S`, and the definition of style `S` is outside of the paragraph itself. See below:
I'm working on a VS Code extension, ion-style-peek, so that in the above Ion file, when users are viewing paragraph `A` and call `go to definition` on `S`, the editor will jump to style `S`. A similar implementation can be found at https://github.com/pranaygp/vscode-css-peek
To implement the above, we need to get the position when parsing a style in the Ion file, hence this request.
We can get position information from the below two paths:
Thanks!
Ethan
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.