fix: parse_vector for f32s with a 48-char representation #531
Howdy, thanks for such a cool project! I found an issue parsing vectors containing values with 48-character representations.

For example, -0.000000000000000000000000000000000000023509886, a valid f32 given by 0x80FFFFFF, has a representation of length 48. I ran into this because I had saved a vector with this value in a prior version of pgvecto.rs (0.2.1) and was running into the `Bad literal` error when importing the same data into version 0.3.0.
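For reference, here is a quick standalone check (not part of this PR) that the value's default Rust representation really is 48 characters long:

```rust
fn main() {
    // f32 with bit pattern 0x80FFFFFF: a tiny negative value near -2.35e-38
    let x = f32::from_bits(0x80FFFFFF);
    let s = format!("{}", x);
    // Rust's default Display prints the shortest round-tripping decimal
    // without exponent notation, which for this value is 48 characters.
    println!("{} ({} chars)", s, s.len());
}
```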
I believe that the tokenizer in `parse_vector` is prepending a superfluous `b'$'` into the token buffer. This PR removes it.

Before
After
Discussion
Taking Chesterton's fence to heart, I really did try squinting every which way to figure out why the `'$'` is pushed to `token` at first, but came up blank. I think it may have been testing code in the initial benchmark rig that @usamoi was using when speeding up the parsing (#316)? At any rate, if it is required, I think that the buffer on line 50 needs to be constructed with len=49 rather than 48 to account for the extra `'$'`. Note that the very similar code in `parse_pgvector_svector` does not seem to be affected by this issue.
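To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the actual pgvecto.rs tokenizer; it only assumes a fixed 48-byte token buffer that rejects overflow with a `Bad literal` error, and `push_byte`/`parse_one` are hypothetical stand-ins for the real parsing code:

```rust
// Hypothetical stand-in for a fixed-size 48-byte token buffer.
fn push_byte(token: &mut Vec<u8>, b: u8) -> Result<(), &'static str> {
    if token.len() == 48 {
        return Err("Bad literal"); // buffer full: the token is rejected
    }
    token.push(b);
    Ok(())
}

// Parse a single number the way the buggy tokenizer would: prepend b'$',
// then copy the token's bytes into the buffer.
fn parse_one(value: &str) -> Result<f32, &'static str> {
    let mut token: Vec<u8> = Vec::with_capacity(48);
    push_byte(&mut token, b'$')?; // the superfluous prefix this PR removes
    for &b in value.as_bytes() {
        // 48 value bytes plus the '$' makes 49, so the last byte overflows
        push_byte(&mut token, b)?;
    }
    std::str::from_utf8(&token[1..]) // skip the '$' before parsing
        .ok()
        .and_then(|s| s.parse::<f32>().ok())
        .ok_or("Bad literal")
}

fn main() {
    let s = format!("{}", f32::from_bits(0x80FFFFFF)); // 48-char representation
    assert_eq!(parse_one(&s), Err("Bad literal")); // fails only because of the '$'
}
```

With the `'$'` removed (or the buffer sized at 49), the same 48-character token fits and parses normally.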