encoding/json: add line number to SyntaxError #43513
Could you expand a bit on why the line number would help? Naively, an offset is enough to work out the line number if you were parsing a chunk of bytes, a buffer, or a file. Just count the number of newlines up to the offset. If you were parsing a stream of data, then I'm not convinced that a line number would actually help anyone. A line number is usually helpful for a human to go look at the input in the right place, but that is not possible if the stream isn't saved somewhere. Keeping track of extra information like line numbers also has an inherent cost, so I want to make sure the arguments in favor are solid.
I agree: it should be possible to count lines when you have the input at hand, and without the input it does not make much sense. The proposal comes from a use case where the parsing module does not have the input at hand but bubbles the error up: kubernetes/kubernetes#97651
encoding/json's syntax errors aren't very end-user-friendly, so it's entirely expected that a user-facing piece of software would wrap or decorate them to better show the user what's going on. Calculating the line number when there is a syntax error should take just a couple of lines of code, so I agree it's not worth adding straight to SyntaxError.
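As a concrete version of that "couple of lines of code", here is a minimal sketch of such a wrapper; the withLineInfo helper and its message format are invented here, and it assumes the caller still has the full input:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// withLineInfo rewraps a *json.SyntaxError with line:column details
// computed from the full input. The helper name and message format are
// invented for this sketch.
func withLineInfo(data []byte, err error) error {
	var se *json.SyntaxError
	if !errors.As(err, &se) {
		return err
	}
	line, col := 1, 1
	for _, b := range data[:se.Offset] {
		if b == '\n' {
			line++
			col = 1
		} else {
			col++
		}
	}
	return fmt.Errorf("line %d, column %d: %w", line, col, se)
}

func main() {
	data := []byte("{\n\t\"a\": 1,\n\t\"b\": ?\n}")
	var v map[string]interface{}
	if err := json.Unmarshal(data, &v); err != nil {
		fmt.Println(withLineInfo(data, err))
	}
}
```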
It would be nice to have an io.Reader layer that counts lines for this.
@networkimprov As far as I can tell, a line counter would not help in this case, due to the buffering here: go/src/encoding/json/stream.go, lines 146 to 169 at 3e1e13c. That is, the parsing error does not necessarily occur right after the last newline was read.
Indeed, most parsers/decoders read in big chunks, because doing one read for every single byte consumed would be incredibly expensive. So an io.Reader layer like @networkimprov suggests won't work.
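To make the failure mode concrete, here is a minimal runnable sketch (the naiveLineCounter type and test input are made up for illustration) showing that, because the Decoder buffers its reads, a naive line-counting io.Reader has typically counted newlines well past the error by the time Decode returns:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// naiveLineCounter counts every newline it passes through. Since the
// Decoder buffers reads, this count reflects how far the Decoder has
// *read*, not how far it has *parsed*.
type naiveLineCounter struct {
	r     io.Reader
	lines int
}

func (c *naiveLineCounter) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.lines += strings.Count(string(p[:n]), "\n")
	return n, err
}

func main() {
	// The syntax error is on line 2, but the whole input fits in one
	// buffered read, so the counter ends up at the last line.
	input := "{\n\"a\": ?,\n\"b\": 2,\n\"c\": 3\n}\n"
	c := &naiveLineCounter{r: strings.NewReader(input)}
	var v map[string]interface{}
	err := json.NewDecoder(c).Decode(&v)
	fmt.Printf("err: %v\nnaive line guess: %d (actual error line: 2)\n",
		err, c.lines+1)
}
```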
I don't think this needs the attention of the proposal review committee. They already have a backlog of hundreds of proposals as it is. We can make this decision amongst the encoding/json maintainers; CC @dsnet @rsc @bradfitz. My opinion is still that we should reject this idea. If anyone has thoughts on a helper API to "prettify" JSON errors for end users, perhaps we could consider that, but one could always write that in a few lines of simple Go code.
Well, two possible APIs are: …
I'd use this, as my apps import long (hundreds of lines) test sequences from JSON files.
Line counting without the input at hand may be implemented like:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
)

type (
	// lineCounter wraps the reader feeding a json.Decoder and records
	// the byte offset of every newline the Decoder has not yet parsed past.
	lineCounter struct {
		r       io.Reader
		d       *json.Decoder
		bytes   int64 // total bytes read so far
		lines   int   // total newlines read so far
		offsets []offset
	}
	offset struct {
		index  int   // how many newlines precede this one
		offset int64 // byte offset of this newline
	}
)

func (c *lineCounter) Read(data []byte) (n int, err error) {
	// Drop newline offsets the Decoder has already parsed past.
	i := c.search(c.d.InputOffset())
	c.offsets = c.offsets[i:]
	n, err = c.r.Read(data)
	for i, b := range data[:n] {
		if b == '\n' {
			c.offsets = append(c.offsets, offset{c.lines, c.bytes + int64(i)})
			c.lines++
		}
	}
	c.bytes += int64(n)
	return
}

// newlinesBefore reports how many newlines precede the given byte offset.
func (c *lineCounter) newlinesBefore(offset int64) int {
	if len(c.offsets) == 0 {
		return c.lines
	}
	i := c.search(offset)
	if i == len(c.offsets) {
		return c.lines
	}
	return c.offsets[i].index
}

func (c *lineCounter) search(offset int64) int {
	return sort.Search(len(c.offsets), func(i int) bool { return c.offsets[i].offset >= offset })
}

func main() {
	c := &lineCounter{r: os.Stdin}
	dec := json.NewDecoder(c)
	c.d = dec
	for {
		var m map[string]interface{}
		if err := dec.Decode(&m); err == io.EOF {
			break
		} else if se, ok := err.(*json.SyntaxError); ok {
			fmt.Printf("%#v\n", err)
			fmt.Printf("Line: %d\n", c.newlinesBefore(se.Offset)+1)
			break
		} else if err != nil {
			// Bail out on any other persistent error to avoid looping forever.
			fmt.Println(err)
			break
		}
	}
}
```

Not trivial, but it does not require a Decoder change.
I updated my previous comment to suggest encoding/json …
Line counting (given the entire input buffer as a string) is as simple as:

```go
line := 1 + strings.Count(jsonData[:offset], "\n")
column := 1 + offset - (strings.LastIndex(jsonData[:offset], "\n") + len("\n"))
```

Given its simplicity, it calls into question whether we must provide this feature. The main advantage of having … Stepping back, I personally have no problem with the concept of adding line:column information, and I even explored an implementation last month where that's provided [1]. However, as @mvdan mentioned earlier, the issue is the performance cost it imparts onto the …
I don't think another API endpoint just for more verbose errors justifies its addition. Also, it's not clear to me that this fully avoids any performance slowdown: I've been regularly surprised by how simply checking a flag for whether some branch should be taken itself incurs a slowdown. Given the strong desire to avoid slowing down decoding in the common case, I propose two possible approaches: …
[1] This is in reference to an entirely new …
When reading a file or network message, you almost always use json.Decoder, not json.Unmarshal. It would be crazy in many cases to save the whole stream to a buffer just in case of an error. I don't think a performance cost is unreasonable when the feature is triggered by an API: when you really can't afford the cost, you simply don't incur it. Note that I suggested an extensible API which can accommodate other decoding options. Maybe accept_comments? :-)
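No concrete options API was agreed on in this thread; purely as an illustration of "an extensible API which can accommodate other decoding options", a functional-options sketch might look like this (every name here is hypothetical, and nothing like it exists in encoding/json):

```go
package main

import "fmt"

// Hypothetical sketch only: neither DecodeOption nor these constructors
// exist in encoding/json; this just shows what an extensible,
// options-style API could look like.

type decodeConfig struct {
	trackPosition  bool // tabulate line:column for error reporting
	acceptComments bool // e.g. the accept_comments idea above
}

// A DecodeOption configures optional decoder behavior.
type DecodeOption func(*decodeConfig)

// WithPosition opts in to line:column tracking and its extra cost.
func WithPosition() DecodeOption {
	return func(c *decodeConfig) { c.trackPosition = true }
}

// WithComments opts in to allowing // and /* */ comments in the input.
func WithComments() DecodeOption {
	return func(c *decodeConfig) { c.acceptComments = true }
}

func newConfig(opts ...DecodeOption) decodeConfig {
	var cfg decodeConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newConfig(WithPosition())
	fmt.Printf("%+v\n", cfg)
}
```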
That's not quite true. We still need to check a flag for whether the feature is enabled or not, and checking the flag itself is a non-zero cost.
This is outside the scope of this specific issue and has been explored quite a bit in other issues related to …
Worst case, you check the flag once per Decode(), to select a parsing function, no? Is there an issue proposing a decoding options type? If not, this could be turned into one. What other decoding options have been requested to date?
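A toy sketch of that once-per-Decode dispatch (hypothetical structure, not the actual encoding/json internals):

```go
package main

import "fmt"

// Hypothetical internals: the flag is consulted once per Decode call to
// pick a specialized parse function, so the per-byte hot loop itself
// carries no flag checks.

type decoder struct {
	trackPosition bool
}

func (d *decoder) Decode(v interface{}) error {
	if d.trackPosition {
		return d.decodeTrackingPosition(v) // same loop, plus newline counting
	}
	return d.decodeFast(v) // today's loop, unchanged
}

func (d *decoder) decodeFast(v interface{}) error {
	return fmt.Errorf("decodeFast: sketch only")
}

func (d *decoder) decodeTrackingPosition(v interface{}) error {
	return fmt.Errorf("decodeTrackingPosition: sketch only")
}

func main() {
	fmt.Println((&decoder{trackPosition: true}).Decode(nil))
}
```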
I agree with this solution, if we want a solution.
https://github.com/golang/go/issues?q=is%3Aissue+is%3Aopen+encoding+json+in%3Atitle+label%3AProposal is a good start.
Personally, I don't think a GitHub thread is a good place to bikeshed rethinking the encoding/json options API. I think this thread should stay on topic.
The only solution suitable for long JSON streams is the one Alexander suggested here: #43513 (comment). It's not very easy. Perhaps we should tag this as ProposalHold, to be considered when a DecodeOptions type emerges.
Providing line:column information for large inputs is a weak use case. The primary purpose of line:column position information is human consumption. By nature, line:column information is a poor mechanism for programmatically identifying the location of an error, since it fundamentally requires O(n) processing to count the number of preceding lines. In contrast, a byte offset is always O(1): it identifies exactly where in the input a problem occurred, it is suitable for machine consumption, and you can always accurately convert it to a line:column position. Thus, line:column information is generally only useful for inputs that are a few KiB in size. Even many text editors today start choking when trying to open a text file that is a few MiB in size.
That technique assumes that there is a specialized parsing function for each possible combination of features. Since the number of combinations is fundamentally O(2ⁿ) in the number of features, this incurs significant implementation complexity (and associated binary bloat). As someone who would potentially be responsible for maintaining …
I don't believe that is necessary. It presumes that an API just to opt into logic for line-number tabulation is the direction we want to head. Personally, I don't think an option solely to opt into line:column tabulation is justified.
I think it would be good to update json.Decoder to feature an InputPos method, bringing it in line with both xml.Decoder.InputPos and csv.Reader.FieldPos.

It's easy to argue that InputOffset is enough, because JSON is generally streamed and not read from a file. But I believe that ignores an important common usage pattern. The first thing one often does when decoding a large, complex JSON stream that fails somewhere in the middle is to capture/save it to disk as a test/debug case, open the file in a text editor (these are human-readable files, by design), and go look at the exact place where it has failed, to see what's what and understand what one's JSON-based decoder has fallen over on (whether the fault lies in the decoder or in whatever external tool is creating the JSON).

IMO, not having the decoder return line+column info simply adds friction to working with complex JSON in Go. All I want is to easily find out where exactly my JSON stream decoder is failing, i.e. what incoming tag combination it doesn't handle or handles incorrectly. Not having a simple API call for this, when other decoders (XML and CSV) already feature it, seems like a painful oversight. That goes double when the main argument against it is performance: it's not a fast parser anyhow, particularly given that most folks rely on built-in reflection to unmarshal/decode. Personally, I'd rather take a minor performance hit parsing every JSON file than jump through extra hoops when debugging a problem that should be much simpler to debug than it currently is.

Please bring json.Decoder in line with the other text-based decoders here: if adding such functionality to those decoders (ostensibly only for debugging/problem solving) at the cost of some performance was acceptable there, then it should also be perfectly acceptable here.

Related: #45628, where @rsc says: …
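For reference, the precedent this comment appeals to already exists: xml.Decoder.InputPos (added in Go 1.19) and csv.Reader.FieldPos (added in Go 1.17) both report line:column positions. A short usage sketch:

```go
package main

import (
	"encoding/csv"
	"encoding/xml"
	"fmt"
	"strings"
)

func main() {
	// xml.Decoder reports the line:column of its current input point.
	xd := xml.NewDecoder(strings.NewReader("<a>\n  <b/>\n</a>"))
	for {
		if _, err := xd.Token(); err != nil {
			break
		}
		line, col := xd.InputPos()
		fmt.Printf("xml: now at line %d, column %d\n", line, col)
	}

	// csv.Reader reports the position of any field of the last record.
	cr := csv.NewReader(strings.NewReader("a,b\nc,d\n"))
	for {
		rec, err := cr.Read()
		if err != nil {
			break
		}
		line, col := cr.FieldPos(0)
		fmt.Printf("csv: record %v starts at line %d, column %d\n", rec, line, col)
	}
}
```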
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?
Yes

What did you do?
At the moment, SyntaxError provides an offset of the character that caused the parsing error. The proposal is to add the line number as well. I could not find such an issue, therefore I opened this one; please close it if this was already resolved.
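For context, today a *json.SyntaxError carries only a byte offset (plus its message); a minimal demonstration of the current behavior:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	var v interface{}
	err := json.Unmarshal([]byte("[1, 2,]"), &v)
	if se, ok := err.(*json.SyntaxError); ok {
		// Offset reports how many bytes had been read when the error
		// was hit; there is no line or column field today.
		fmt.Printf("%v (offset %d)\n", se, se.Offset)
	}
}
```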