-
-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for CBOR stringref extension (CBORGenerator.Feature.STRINGREF
)
#347
Conversation
Ok, first of all: thank you for contributing this feature! It sounds a useful thing, CBOR implementing support something Smile has had for a while. One thing I was wondering about: since this is a sizable thing, I was wondering if it was possible to split this in two PRs: one dealing with parsing, and second one (building on first) for generation? I realize it may be tricky wrt tests, but it would be easier to code review that way if at all possible. Second thing: regardless of whether it's 1 or 2 PRs, one (and only) process thing we need is CLA. https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf and is only needed once before the first contribution is merged (one is good for all future PRs for all Jackson projects). Thank you once again, looking forward to reviewing this PR and getting it merged for 2.15 |
From what I understand this would require separate branches, which would be very unwieldy to manage and would require the first PR to be submitted before the second could actually be looked at. (since the second one would contain the code from the first) What I can do is split it into two commits, which would allow you to review them separately while being part of the same branch/PR.
Since I'm contributing this on company time I've forwarded this to the OSS department to make sure I get the OK to sign it. Hopefully it won't take too long to get approval. |
Yes, would require 2 branches and I guess that won't play nicely with PRs. Too bad but I guess it makes sense to keep things bundled. And yes, CLA should be fine; there's CCLA available if your company prefers that. |
Just to give a heads up, we have a tentative agreement internally to sign the CCLA, but are trying to find the right people to sign it. I'm hoping we'll get it submitted by mid January. In the meantime I'll be out the next two weeks myself for the holidays. |
@cowtowncoder I finally managed to get a signature and forwarded the CCLA last Friday. I just wanted to make sure that it was received and we can continue with the PR. |
@here-abarany I can't seem to find CCLA. Was it sent to |
@cowtowncoder it was sent to |
@here-abarany Ok odd. I think |
@cowtowncoder I just forwarded it now. |
cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORGenerator.java
Show resolved
Hide resolved
cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java
Show resolved
Hide resolved
cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java
Show resolved
Hide resolved
cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORGenerator.java
Show resolved
Hide resolved
/** | ||
* Stack of text and binary string references. | ||
*/ | ||
protected Stack<StringRefList> _stringRefs = new Stack<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could probably create sub-class that has helper methods, if that simplifies calling code (will add a note where this'd help)
_streamReadContext = _streamReadContext.getParent(); | ||
return (_currToken = JsonToken.END_OBJECT); | ||
} | ||
return (_currToken = _decodePropertyName()); | ||
} | ||
} else { | ||
if (!_streamReadContext.expectMoreValues()) { | ||
_tagValue = -1; | ||
--_nestedDepth; | ||
if (!_stringRefs.empty() && _stringRefs.peek().depth == _nestedDepth) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a simple sub-class of Stack
would allow hiding this logic into single method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a helper class for this. I chose to use containment rather than subclass for this case as the interface is slightly different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@here-abarany Ok yes, that is cleaner approach from OO design perspective. Although being only used internally not a big thing either way.
@@ -837,6 +978,9 @@ protected String _numberToName(int ch, boolean neg) throws IOException | |||
// [dataformats-binary#269] (and earlier [dataformats-binary#30]), | |||
// got some edge case to consider | |||
if (i < 0) { | |||
if (isStringref) { | |||
_reportError("String reference index too large."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove trailing .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got all the added errors, let me know if I missed any.
Ok @here-abarany -- impressive changes! I had some questions etc, although nothing major. My main concern is actually getting this to merge against 3.0 ( I suspect I will need help with |
@cowtowncoder thanks for the review, I hope to push up an update later today based on your comments. I'll have two separate pushes: one to rebase onto the latest 2.15 changes and one to push up the updates. With regards to the whitespace changes, my editor is configured to clear trailing whitespace on save. I might be able to check out to the 2.15 head, save those files, and cherry-pick my changes on top of that to reduce some of the noise. If I'm successful I'll include that with my pushes later today. I can try merging into the master branch locally and see what happens. Hopefully everything merges cleanly, if not I'll note what issues I encountered. |
It's unlikely to merge cleanly, partly due to package renaming, partly due to big API changes. As to whitespace, I'll try to figure out how to do that on my end for 2.15; that way you can just rebase on 2.15 head. No need for you to have to do that now I think. |
@here-abarany Ok I did white-space trimming, merged it to |
1d5d919
to
ff93f22
Compare
@cowtowncoder I have pushed up my changes, I believe I addressed all the comment threads from the review. The first push is just a rebase, so you should be able to view just my latest changes by comparing the latest push just before this comment. |
@@ -236,6 +253,11 @@ public int getMask() { | |||
*/ | |||
protected boolean _bufferRecyclable; | |||
|
|||
/** | |||
* Table of previously referenced text and binary strings. | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... Used if StringRef functionality is enabled. // or something like that
@since 2.15
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if protected fields should be marked, I'll go through and try to add it for the fields I've added. (as well as the comment clarification noted here)
|
||
/** | ||
* Shared string that should be used in place of _textBuffer when a string reference is used. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SInCE 2.15
@@ -47,6 +47,16 @@ public void testJsonSampleDoc() throws IOException | |||
verifyJsonSpecSampleDoc(cborParser(data), false, true); | |||
} | |||
|
|||
public void testJsonSampleDocStringref() throws IOException | |||
{ | |||
byte[] data = cborDoc(new CBORFactory().enable(CBORGenerator.Feature.STRINGREF), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change this to use Builder? This merges cleanly with 3.0 where factories are immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've switched this and the one below to use a builder.
See http://cbor.schmorp.de/stringref for stringref definition. When a stringref namespace is encountered, indices will be generated for each valid binary and text string and used when referenced. Nested tags are now supported. The previous getCurrentTag() function is still supported for backwards compatibility, and returns the first tag. Otherwise a new TagList type is used to efficiently handle multiple tags.
Citm citm0 = MAPPER.readValue(getClass().getResourceAsStream("/data/citm_catalog.json"), | ||
Citm.class); | ||
ObjectMapper mapper = new CBORMapper( | ||
new CBORFactory().enable(CBORGenerator.Feature.STRINGREF)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, builder over factory configuration (I know, many existing tests use this, trying to move over with new code).
@here-abarany Ok, sorry, missed couple of smaller things; now done with reviewing I think. Minor test changes requested to make it more likely things merge less uncleanly. |
When enabled with the STRINGREF feature, a namespace is created for the root element and indices are produced for valid binary and text strings.
Copying tags can be dangerous in two main cases: 1. Jackson creates its own tags, such as for BigNum or the self describe tag. 2. The semantic meaning of an element changes. For example, converting between stringrefs and no stringrefs. In some cases it would result in duplicate tags, while in others it may cause completely incorrect tags, such as a stringref tag applying to a string rather than an integer. Before nested tag support was added this could generate files that Jackson could not parse, though even with nested tag support it may cause undefined behavior.
@cowtowncoder I have pushed up a new revision to fix your latest comments. |
Thank you @here-abarany! I will pick this up again today or tomorrow, get it merged. |
Ok, phew. Merge to I suspect it is since Could you have a look if you can see how to fix this, @here-abarany ? |
CBORGenerator.Feature.STRINGREF
)
@cowtowncoder thanks for getting it merged in, I'll see if I can take a look on Monday for the master issue. |
Thank you in advance! Hopefully it is relatively straight-forward. |
See http://cbor.schmorp.de/stringref for stringref definition. Repeated binary and text strings are assigned an index on write and a tagged integer is written when repeated. Parser respects the stringref IDs when reading data back in.
Nested tags are now supported. The previous getCurrentTag() function is still supported for backwards compatibility, and returns the first tag. Otherwise a new TagList type is used to efficiently handle multiple tags.