Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add setOrigRanges() to YAML.parseCST() output #31

Merged
merged 9 commits into from
Aug 23, 2018
Merged

Conversation

eemeli
Copy link
Owner

@eemeli eemeli commented Aug 14, 2018

This (hopefully) fixes #20 by adding a method setOrigRanges() to the array object returned by YAML.parseCST(). That method would set origStart and origEnd indices for all of the range objects contained within the CST, which in turn would point to the start and end of the source before the CR characters got stripped out.

@ikatyang, comment from you would be appreciated here esp. regarding the specifics of the API, as you would have an actual use case for this. In particular:

  • I'm not sure about the new method and member names, and would be interested to consider alternatives.
  • The current setOrigRanges() implementation will return a boolean indicating whether it did in fact add the ranges, to allow for a fast return for inputs that did not contain a \r\n string. Does this make its usage more difficult, as it's then not certain that origStart and origEnd are always set after calling?
  • If a range end is pointing at the \n character that in the source was preceded by a \r, the corresponding origEnd will then point at the \r character to maintain the same semantic meaning as before. This has particular significance for valueRange values, as this makes sure that a slice of the source using the range does not get a somewhat surprising \r suffix if origEnd is left pointing at the \n.
  • The Range class now has a couple of utility methods, apply(src) and applyOrig(src), for slicing the range contents from the source. Should these be published and documented? Atm just applyOrig is used, and even then in just one test case.

I did try a couple of more automated methods of handling all of this, e.g. mutating the range start and end values automatically after parsing to match what's now origX, but that messes up the value parsing and actually breaks YAML spec a bit. I also considered calling setOrigRanges() internally before returning from parseCST(), but that would make the presence or non-presence of origX values unclear, as well as adding a useless tree traversal for use cases that don't need the indices into the original source.

@ikatyang
Copy link
Contributor

ikatyang commented Aug 15, 2018

I'm not sure about the new method and member names, and would be interested to consider alternatives.

I have no opinion on it. 😅

The current setOrigRanges() implementation will return a boolean indicating whether it did in fact add the ranges, to allow for a fast return for inputs that did not contain a \r\n string. Does this make its usage more difficult, as it's then not certain that origStart and origEnd are always set after calling?

I think it's fine either way since it's just a matter of range.origStart || range.start, writing docs for it would be helpful. (It seems this change only affect valueRanges, shouldn't it also affect ranges?)

If a range end is pointing at the \n character that in the source was preceded by a \r, the corresponding origEnd will then point at the \r character to maintain the same semantic meaning as before. This has particular significance for valueRange values, as this makes sure that a slice of the source using the range does not get a somewhat surprising \r suffix if origEnd is left pointing at the \n.

👍

The Range class now has a couple of utility methods, apply(src) and applyOrig(src), for slicing the range contents from the source. Should these be published and documented? Atm just applyOrig is used, and even then in just one test case.

I actually won't use this functionality so I'm not sure, the original start/end is not suitable for my use case (e.g., PLAIN usually contains a lot of trailing whitespaces) and I have to adjust it then apply slicing.

src/cst/parse.js Outdated
for (let i = 1; i < cr.length; ++i) cr[i] -= i
let crOffset = 0
for (let i = 0; i < documents.length; ++i) {
documents[i].setOrigRanges(cr, crOffset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crOffset = documents[i].setOrigRanges(cr, crOffset)? If so, we should add a test for it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. And yes, this should indeed be tested.

@eemeli
Copy link
Owner Author

eemeli commented Aug 15, 2018

It seems this change only affect valueRanges, shouldn't it also affect ranges?

No, all ranges should get origX values. Also including all props and even the BlockValue header.

Docs indeed will need to be updated for this change. And it'd probably be better to completely drop the apply and applyOrig methods, as they're a bit too arcane.

@eemeli eemeli merged commit 9d3fc58 into master Aug 23, 2018
@eemeli eemeli deleted the set-orig-ranges branch August 23, 2018 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Maintain the loc info from input text? (CRLF)
2 participants