Add setOrigRanges() to YAML.parseCST() output #31

eemeli · 2018-08-14T23:39:39Z

This (hopefully) fixes #20 by adding a method setOrigRanges() to the array object returned by YAML.parseCST(). That method sets origStart and origEnd indices for all of the range objects contained within the CST. These point to the corresponding start and end positions in the source as it was before the CR characters got stripped out.

@ikatyang, your comments would be appreciated here, especially regarding the specifics of the API, as you have an actual use case for this. In particular:

- I'm not sure about the new method and member names, and would be interested to consider alternatives.
- The current setOrigRanges() implementation will return a boolean indicating whether it did in fact add the ranges, to allow for a fast return for inputs that did not contain a \r\n string. Does this make its usage more difficult, as it's then not certain that origStart and origEnd are always set after calling?
- If a range end is pointing at the \n character that in the source was preceded by a \r, the corresponding origEnd will then point at the \r character to maintain the same semantic meaning as before. This has particular significance for valueRange values, as this makes sure that a slice of the source using the range does not get a somewhat surprising \r suffix if origEnd is left pointing at the \n.
- The Range class now has a couple of utility methods, apply(src) and applyOrig(src), for slicing the range contents from the source. Should these be published and documented? Atm just applyOrig is used, and even then in just one test case.
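To make the offset mapping described above concrete, here is a minimal sketch of the idea. It is not the code from this PR: stripCR and toOrigRange are hypothetical helper names, and plain objects stand in for the Range class. It only assumes that each \r\n is replaced by \n before parsing and that the positions of those replacements are remembered.

```javascript
// Sketch only: hypothetical helpers, not the library's implementation.

// Replace each \r\n with \n and record where, in the stripped text,
// the \n that lost its \r now sits.
function stripCR(src) {
  const cr = []
  const stripped = src.replace(/\r\n/g, (match, offset) => {
    cr.push(offset - cr.length) // earlier removals shift this offset left
    return '\n'
  })
  return { stripped, cr }
}

// Map a range in the stripped text back to offsets in the original source.
function toOrigRange(range, cr) {
  let i = 0
  // Each \n at or before `start` that lost its \r shifts the start right by one
  while (i < cr.length && cr[i] <= range.start) ++i
  const origStart = range.start + i
  // Strictly-less comparison: an end sitting on such a \n maps to the \r before it
  while (i < cr.length && cr[i] < range.end) ++i
  const origEnd = range.end + i
  return { origStart, origEnd }
}

const source = 'a: 1\r\nb: 2\r\n'
const { stripped, cr } = stripCR(source)
const second = toOrigRange({ start: 8, end: 9 }, cr) // "2" in the stripped text
console.log(JSON.stringify(stripped.slice(8, 9)))                           // "2"
console.log(JSON.stringify(source.slice(second.origStart, second.origEnd))) // "2"
```

Because the end comparison is strict, slicing the original source with the mapped range yields the value without a trailing \r.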

I did try a couple of more automated methods of handling all of this, e.g. mutating the range start and end values automatically after parsing to match what's now origX, but that messes up the value parsing and actually breaks the YAML spec a bit. I also considered calling setOrigRanges() internally before returning from parseCST(), but that would make the presence or non-presence of origX values unclear, as well as adding a useless tree traversal for use cases that don't need the indices into the original source.


ikatyang · 2018-08-15T03:31:33Z

> I'm not sure about the new method and member names, and would be interested to consider alternatives.

I have no opinion on it. 😅

> The current setOrigRanges() implementation will return a boolean indicating whether it did in fact add the ranges, to allow for a fast return for inputs that did not contain a \r\n string. Does this make its usage more difficult, as it's then not certain that origStart and origEnd are always set after calling?

I think it's fine either way, since it's just a matter of range.origStart || range.start. Writing docs for it would be helpful. ~~(It seems this change only affects valueRanges, shouldn't it also affect ranges?)~~
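As an illustration of that fallback, a caller could normalise a range into original-source offsets whether or not setOrigRanges() added anything. This is a minimal sketch: origBounds is a hypothetical helper name, and the range objects are plain stand-ins rather than objects produced by the parser.

```javascript
// Hypothetical helper: prefer origStart/origEnd when present, otherwise
// fall back to start/end, which already index the source if it had no \r\n.
function origBounds(range) {
  const hasOrig = typeof range.origStart === 'number'
  return {
    start: hasOrig ? range.origStart : range.start,
    end: hasOrig ? range.origEnd : range.end
  }
}

// Stand-in range objects, not produced by the real parser
const withOrig = { start: 8, end: 9, origStart: 9, origEnd: 10 }
const withoutOrig = { start: 3, end: 4 }
console.log(origBounds(withOrig))    // { start: 9, end: 10 }
console.log(origBounds(withoutOrig)) // { start: 3, end: 4 }
```

The explicit type check is a design choice for the sketch; it avoids relying on the truthiness of an offset.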

> If a range end is pointing at the \n character that in the source was preceded by a \r, the corresponding origEnd will then point at the \r character to maintain the same semantic meaning as before. This has particular significance for valueRange values, as this makes sure that a slice of the source using the range does not get a somewhat surprising \r suffix if origEnd is left pointing at the \n.

👍

> The Range class now has a couple of utility methods, apply(src) and applyOrig(src), for slicing the range contents from the source. Should these be published and documented? Atm just applyOrig is used, and even then in just one test case.

I actually won't use this functionality, so I'm not sure. The original start/end is not suitable for my use case (e.g., PLAIN usually contains a lot of trailing whitespace), so I have to adjust it and then apply slicing.
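The kind of adjustment mentioned here might look like the following sketch. It assumes the only adjustment needed is dropping trailing whitespace from the end of a range; trimRangeEnd is a hypothetical helper and the offsets are hand-picked for the example string, not produced by the parser.

```javascript
// Hypothetical helper: shrink a range so that it excludes trailing whitespace.
function trimRangeEnd(src, start, end) {
  while (end > start && /\s/.test(src[end - 1])) --end
  return { start, end }
}

const plainSrc = 'key: value   \r\n'
// Hand-picked offsets covering "value" plus its trailing spaces
const trimmed = trimRangeEnd(plainSrc, 5, 13)
console.log(JSON.stringify(plainSrc.slice(5, 13)))                     // "value   "
console.log(JSON.stringify(plainSrc.slice(trimmed.start, trimmed.end))) // "value"
```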

ikatyang · 2018-08-15T02:48:38Z

src/cst/parse.js

+    for (let i = 1; i < cr.length; ++i) cr[i] -= i
+    let crOffset = 0
+    for (let i = 0; i < documents.length; ++i) {
+      documents[i].setOrigRanges(cr, crOffset)


Should this be crOffset = documents[i].setOrigRanges(cr, crOffset)? If so, we should add a test for it.

Oops. And yes, this should indeed be tested.
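The point about carrying the offset forward can be sketched with a self-contained example. This is an assumption-laden illustration rather than the code under review: plain objects stand in for CST documents, setOrigRange and setDocOrigRanges are hypothetical stand-ins, and the offset is treated as a scan cursor into the list of CR positions.

```javascript
// Sketch only: plain objects stand in for CST documents and their ranges.
function setOrigRange(range, cr, offset) {
  let i = offset
  while (i < cr.length && cr[i] <= range.start) ++i
  range.origStart = range.start + i
  const nextOffset = i // later ranges start at or after this one
  while (i < cr.length && cr[i] < range.end) ++i
  range.origEnd = range.end + i
  return nextOffset
}

function setDocOrigRanges(doc, cr, crOffset) {
  for (const range of doc.ranges) crOffset = setOrigRange(range, cr, crOffset)
  return crOffset
}

// Stripped form of 'a: 1\r\n---\r\nb: 2\r\n'; crPositions holds the
// stripped-text offsets of the three \n characters that lost a preceding \r.
const crPositions = [4, 8, 13]
const docs = [
  { ranges: [{ start: 3, end: 4 }] },  // "1"
  { ranges: [{ start: 12, end: 13 }] } // "2"
]

let crOffset = 0
for (let i = 0; i < docs.length; ++i) {
  // Carry the cursor forward so later documents skip CRs already passed
  crOffset = setDocOrigRanges(docs[i], crPositions, crOffset)
}
console.log(docs[1].ranges[0]) // { start: 12, end: 13, origStart: 14, origEnd: 15 }
```

In this sketch, restarting every document from offset 0 would produce the same ranges and only repeat the scan; whether the same holds for the implementation in this PR is not something the sketch establishes.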

eemeli · 2018-08-15T21:57:49Z

> It seems this change only affects valueRanges, shouldn't it also affect ranges?

No, all ranges should get origX values, including all props and even the BlockValue header.

Docs indeed will need to be updated for this change. And it'd probably be better to completely drop the apply and applyOrig methods, as they're a bit too arcane.

eemeli added 3 commits August 14, 2018 16:04

cst/Range: Refactor/extend API, dropping length and adding applyOrig() & setOrigRange()

38f5b75

Add setOrigRanges() to YAML.parseCST() output

2a893b4

Use CR offset in setOrigRanges() for multi-document streams

3255b2a

ikatyang reviewed Aug 15, 2018


eemeli added 6 commits August 15, 2018 18:43

cst/parse: Oops, actually set crOffset during setOrigRanges loop

7cbdbf9

cst/parse: Add tests

dafc1ed

cst/Range: Drop apply() and applyOrig() methods

8e28e9d

Update docs

4b629fb

Merge branch 'master' into set-orig-ranges

7f88e52

Fix cst/parse tests for document range changes

c4e3a62

eemeli merged commit 9d3fc58 into master Aug 23, 2018

eemeli deleted the set-orig-ranges branch August 23, 2018 03:28

eemeli mentioned this pull request Oct 4, 2019

Range info is off for \r\n line endings #127

Closed
