dgraph live RDF parser does not support \u escape sequences in facets #4157

Closed
adg opened this issue Oct 10, 2019 · 1 comment · Fixed by #4175
Labels
area/parsing Issues related to the parser or lexer. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate/work on it.

Comments

adg commented Oct 10, 2019

Dgraph's RDF parser, as used by dgraph live, does not appear to understand \uXXXX escape sequences inside facet values.

The Dgraph documentation says it follows the RDF N-Quads spec, which requires support for \uXXXX escape sequences, but Dgraph's implementation does not appear to honor them.
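For reference, the N-Quads grammar allows two kinds of escapes inside string literals. ECHAR covers the single-character escapes \t \b \n \r \f \" \' \\. UCHAR covers \uXXXX, with exactly four hex digits, and \UXXXXXXXX, with exactly eight. The sketch below decodes both in Go. `unescapeNQuadString` is a hypothetical helper written only to illustrate what spec-conformant decoding of a facet value could look like. It is not part of Dgraph's chunker package, and it is not the fix that was merged.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// echar maps the characters allowed after a backslash by the N-Quads ECHAR
// production to the bytes they stand for.
var echar = map[byte]byte{
	't': '\t', 'b': '\b', 'n': '\n', 'r': '\r', 'f': '\f',
	'"': '"', '\'': '\'', '\\': '\\',
}

// unescapeNQuadString decodes ECHAR and UCHAR escapes in the body of an
// N-Quads string literal (the text between the quotes). Hypothetical helper
// for illustration only; it is not part of dgraph's chunker package.
func unescapeNQuadString(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' {
			b.WriteByte(s[i])
			continue
		}
		if i+1 == len(s) {
			return "", errors.New("dangling backslash at end of literal")
		}
		esc := s[i+1]
		if r, ok := echar[esc]; ok {
			b.WriteByte(r)
			i++ // skip the escaped character
			continue
		}
		// UCHAR: \u takes exactly 4 hex digits, \U exactly 8.
		var n int
		switch esc {
		case 'u':
			n = 4
		case 'U':
			n = 8
		default:
			return "", fmt.Errorf("not a valid escape char: %q", esc)
		}
		if i+2+n > len(s) {
			return "", fmt.Errorf(`\%c needs %d hex digits`, esc, n)
		}
		// Base 16 without a prefix: only hex digits are accepted.
		cp, err := strconv.ParseUint(s[i+2:i+2+n], 16, 32)
		if err != nil || !utf8.ValidRune(rune(cp)) {
			return "", fmt.Errorf("invalid escape %q", s[i:i+2+n])
		}
		b.WriteRune(rune(cp))
		i += 1 + n // the loop's i++ then moves past the last hex digit
	}
	return b.String(), nil
}

func main() {
	v, err := unescapeNQuadString(`hatter \u0045`)
	fmt.Printf("%q %v\n", v, err) // "hatter E" <nil>

	// Fewer than four hex digits after \u must be rejected.
	_, err = unescapeNQuadString(`\u004 wonderland`)
	fmt.Println(err)
}
```

Using strconv.ParseUint with base 16 keeps the hex check strict: signs, "0x" prefixes and underscores are all rejected. utf8.ValidRune additionally refuses surrogates and out-of-range code points.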

I tried adding these test cases to TestLex in the chunker package, and the second one fails:

diff --git a/chunker/rdf_parser_test.go b/chunker/rdf_parser_test.go
index f2c45df5..7c733bbd 100644
--- a/chunker/rdf_parser_test.go
+++ b/chunker/rdf_parser_test.go
@@ -503,6 +503,28 @@ var testNQuads = []struct {
                },
                expectedErr: false,
        },
+       {
+               input: `<alice> <lives> "wonderland" (friend="hatter").`,
+               nq: api.NQuad{
+                       Subject:     "alice",
+                       Predicate:   "lives",
+                       ObjectId:    "",
+                       ObjectValue: &api.Value{Val: &api.Value_DefaultVal{DefaultVal: `wonderland`}},
+                       Facets:      []*api.Facet{{Key: "friend", Value: []byte("hatter"), Tokens: []string{"\001hatter"}}},
+               },
+               expectedErr: false,
+       },
+       {
+               input: `<alice> <lives> "wonderland" (friend="hatter \u0045") .`,
+               nq: api.NQuad{
+                       Subject:     "alice",
+                       Predicate:   "lives",
+                       ObjectId:    "",
+                       ObjectValue: &api.Value{Val: &api.Value_DefaultVal{DefaultVal: `wonderland`}},
+                       Facets:      []*api.Facet{{Key: "friend", Value: []byte("hatter E"), Tokens: []string{"\001hatter E"}}},
+               },
+               expectedErr: false,
+       },
        {
                input:       `<alice> <lives> "\u004 wonderland" .`,
                expectedErr: true, // should have 4 hex values after \u

The failure:

--- FAIL: TestLex (0.00s)
    rdf_parser_test.go:1008: 
        	Error Trace:	rdf_parser_test.go:1008
        	Error:      	Received unexpected error:
        	            	while lexing <alice> <lives> "wonderland" (friend="hatter \u0045"). at line 1 column 37: Not a valid escape char: 'u'
        	            	github.com/dgraph-io/dgraph/lex.(*Lexer).ValidateResult
        	            		/Users/adg/t/dgraph/dgraph/lex/lexer.go:200
        	            	github.com/dgraph-io/dgraph/chunker.ParseRDF
        	            		/Users/adg/t/dgraph/dgraph/chunker/rdf_parser.go:84
        	            	github.com/dgraph-io/dgraph/chunker.TestLex
        	            		/Users/adg/t/dgraph/dgraph/chunker/rdf_parser_test.go:1000
        	            	testing.tRunner
        	            		/Users/adg/go/src/testing/testing.go:909
        	            	runtime.goexit
        	            		/Users/adg/go/src/runtime/asm_amd64.s:1357
        	Test:       	TestLex
        	Messages:   	Got error for input: "<alice> <lives> \"wonderland\" (friend=\"hatter \\u0045\")."
FAIL
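The message "Not a valid escape char: 'u'" suggests that escape validation during lexing checks the character after a backslash and rejects u. A spec-following check could look like the sketch below. Given the index of a backslash, it returns the byte length of that one escape sequence, or an error if the sequence is invalid. `escapeLen` and `isHex` are hypothetical names, and this is not the code in Dgraph's lex package. Because the check requires exactly four hex digits after \u, it also still rejects the existing "\u004 wonderland" test case, which is expected to fail.

```go
package main

import "fmt"

// isHex reports whether c is an ASCII hexadecimal digit.
func isHex(c byte) bool {
	return ('0' <= c && c <= '9') || ('a' <= c && c <= 'f') || ('A' <= c && c <= 'F')
}

// escapeLen assumes s[i] is a backslash inside a quoted value and returns
// how many bytes the escape sequence occupies, following the N-Quads ECHAR
// and UCHAR productions. Hypothetical helper; not dgraph's lexer code.
func escapeLen(s string, i int) (int, error) {
	if i+1 >= len(s) {
		return 0, fmt.Errorf("unterminated escape at offset %d", i)
	}
	switch c := s[i+1]; c {
	case 't', 'b', 'n', 'r', 'f', '"', '\'', '\\':
		return 2, nil
	case 'u', 'U':
		n := 4
		if c == 'U' {
			n = 8
		}
		// Require exactly n hex digits, so "\u004 " is rejected.
		for k := i + 2; k < i+2+n; k++ {
			if k >= len(s) || !isHex(s[k]) {
				return 0, fmt.Errorf(`\%c at offset %d should have %d hex values`, c, i, n)
			}
		}
		return 2 + n, nil
	default:
		return 0, fmt.Errorf("not a valid escape char: %q", c)
	}
}

func main() {
	// Walk a facet value the way a lexer might, skipping over each escape.
	val := `hatter \u0045`
	for i := 0; i < len(val); i++ {
		if val[i] != '\\' {
			continue
		}
		n, err := escapeLen(val, i)
		fmt.Println(n, err) // 6 <nil>
		if err != nil {
			return
		}
		i += n - 1
	}
}
```

This sketch only validates the escape and measures its length. Decoding to the actual character would happen in a later step.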

In dgraph live, this means that if you pass it an RDF line of the form

<alice> <lives> "wonderland" (friend="hatter \u0045") .

it will fail with the above error.

adg commented Oct 10, 2019

Interestingly, the same RDF appears to work when supplied as a mutation in dgraph-ratel.

@manishrjain manishrjain added the priority/P0 Critical issue that requires immediate attention. label Oct 15, 2019
@martinmr martinmr self-assigned this Oct 15, 2019
@martinmr martinmr added area/parsing Issues related to the parser or lexer. status/accepted We accept to investigate/work on it. labels Oct 15, 2019