-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encoding/xml: add Decoder.InputPos #45628
Comments
duplicate of #43589 ? |
@seankhliao my bad. I think I should close this and create a pull request for #43589? |
I have added the code to Gerrit at https://go-review.googlesource.com/c/go/+/311270 |
Change https://golang.org/cl/311270 mentions this issue: |
This proposal has been added to the active column of the proposals project |
Over on #44221, we started with line number but ended up returning (line, column int) where column is 1-based and counts bytes (not runes). It seems like probably the same thing would be worthwhile here? So should the method be
? |
That makes more sense than just returning the line. Good point. |
@rsc Not sure if I can come up with a decent implementation for the column value though. I could increase a column counter every time a byte is read and set it to 0 when Otherwise one could read backwards when calling the new function |
Perhaps you can record the byte offset of the start of the line (using the |
I mistakenly assumed that I have updated the code on Gerrit. |
Based on the discussion above, this proposal seems like a likely accept. |
FWIW, I was against adding line-tracking logic to I don't think #44221 can serve as existing precedence for why this should be added. The CSV format fundamentally requires tracking where the new lines occur since newlines are the delimiter between records. In contrast, neither JSON nor XML have grammars where newlines have semantic significance beyond any other whitespace characters. |
@dsnet The line numbering is already there. So the only addition is the simple counting of the offset which is IMO a very trivial (in the sense of computation complexity / computation time) operation. While you are surely correct that the line number is not inherently used for grammar purposes, it is very helpful to give feedback to the user application in the case of an error. And this does not necessarily have to be a syntax error. (edit:) I also don't think that #44221 is the reason to add the InputPos. It merely suggests a function signature to be used for the reason of symmetry. |
@pgundlach, you are correct. It seems that line information already leaks into the API through the
Line information is useful, but performance is also a good goal. Regardless, the performance ship has already sailed as you pointed out. So my argument no longer has basis in the context of |
@dsnet Of course it will add some operations to the overall process, but when I read "the performance ship has already sailed" I get the impression that you want to say that line counting will dramatically slow down the processing. My (quick and not representative) benchmarks show me rather similar results with and without line counting and with column counting added in.
All numbers are within a range where I can't tell if the difference is due to the algorithm or due to some other things happening on my computer right now. It would need better benchmarks to see how much time these things add to XML parsing. Wich is what I' expect because if b == '\n' {
d.line++
} is absolutely negligible in terms of computation time compared to the rest of the code. |
The existing parsers in both the |
@dsnet ok, fair enough, good point. |
I'm skeptical that the counting fundamentally can't be done efficiently even in a faster JSON. But that's not the issue today. |
No change in consensus, so accepted. 🎉 |
Since this is my first Go proposal, may I humbly ask what would be the next steps? I assume that there should be an implementation of the proposal, so I allow myself to link to the code I have put on the code review: https://go-review.googlesource.com/c/go/+/311270 - this is just an ad hoc implementation and I don't know if it meets the Go standards for inclusion in a standard library. |
@pgundlach, your CL looks fine. We are already in the feature freeze for Go 1.17, so we will wait to review it when Go 1.18 work begins in August. Thanks! |
This includes a copy of Go's XML processing library until the build system has Go version 1.8. See golang/go#45628
Thank you @pgundlach for this issue and congrats on having your proposal accepted, it is also awesome to have you as a new Go contributor! Unfortunately we didn't get to look at your CL during the Go1.18 cycle, and we are already too deep into the freeze. However, I have left comments on your CL, and I shall annotate this as for early Go1.19. Apologies again, and we greatly appreciate your contributions and I am excited for your change to land in Go1.19, soon :-) Thank you also Russ, Sean, Joe, Ian and the proposal review committee for the discussions! |
This issue is currently labeled as early-in-cycle for Go 1.19. |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
I use the package
encoding/xml
for a programming language similar to XSLT. Now I'd like to provide user feedback if something is semantically wrong with the input. For example when the user has forgotten to initialise a variable. In that case the XML syntax is correct.I want to tell the user "you have a problem in line xyz" but I cannot get the current line from the XML decoder. Thus I propose to add a function to the XML package similar to
func (d *Decoder) InputOffset() int64 { return d.offset }
:where a test case would be something like this:
I'd be happy to provide a pull request, but since this function is so trivial, I just post it here as well.
The text was updated successfully, but these errors were encountered: