Bulk Loader On Single Server Not Loading All Predicates #2616

dareneiri · 2018-09-27T20:43:42Z

What version of Dgraph are you using?
v1.0.8
Have you tried reproducing the issue with latest release?
On latest release
What is the hardware spec (RAM, OS)?
macOS High Sierra 10.13.6
16 GB RAM
Docker Version 18.06.0-ce-mac73 (26764)
Steps to reproduce the issue (command/config used to run Dgraph).

Clone https://github.com/MichelDiz/Dgraph-Bulk-Script

After Dgraph is loaded, create test data:

Alter:

predicate_with_no_uid_count: string .
predicate_with_default_type: default .
predicate_with_index_no_uid_count: string @index(exact) .

Mutate:

{
  set {
    _:company1 <predicate_with_default_type> "CompanyABC" .
  }
}

Check the count before and after the mutation. It should be 0 before and 1 after:

{
  q(func: has(predicate_with_default_type)) {
    count(uid)
  }
}

So now we have added three predicates. The first is a string with count(uid)=0. The second has type default with count(uid)=1. The last is a string with an index and count(uid)=0.
Export the data.
Bulk import the exported data.
View the schema in Dgraph Ratel: it shows only 9 of the 12 entries.

Expected behaviour and actual result.
- After exporting with curl localhost:8080/admin/export and bulk loading the export, I would expect to see all three predicates I added to the 1million movie dataset, which makes 12 entries in the schema.
- Instead, I still see only 9 entries after using the Bulk script (after changing the paths of the schema and rdf files).
- Additionally, if I modify the schema file so that predicate_with_default_type is changed from default to string, the predicate is displayed and the schema totals 10 entries.
- I have forked the Dgraph-Bulk-Script repo and included the data I exported (using the steps above):
  https://github.com/dareneiri/Dgraph-Bulk-Script
  - In service/bulk-it-or-not-bulk-it.sh, you can use the .schema file, where I changed the predicate type from default to string; it shows 10 entries. Or you can use .schema.gz, which results in 9 entries instead of the expected 12.
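To make the 12-versus-9 comparison less manual, the entries in an exported schema file can be counted with a short script. This is only a sketch: it assumes each schema line has the shape `predicate:type [directives] .` seen in this thread, and the helper names (`read_schema_text`, `parse_schema`, `missing_predicates`) are made up for illustration and are not part of Dgraph.

```python
import gzip
import re
from pathlib import Path

# Assumed line shape, based on the schema lines quoted in this thread:
#   <predicate>:<type> [directives] .
SCHEMA_LINE = re.compile(r"^\s*<?([^\s:<>]+)>?\s*:\s*(\[?\w+\]?)")


def read_schema_text(path):
    """Return the schema text from a plain or gzip-compressed file."""
    raw = Path(path).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


def parse_schema(text):
    """Map each predicate name to its declared type."""
    predicates = {}
    for line in text.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            predicates[match.group(1)] = match.group(2)
    return predicates


def missing_predicates(expected, loaded):
    """Names present in the expected schema but absent from the loaded one."""
    return sorted(set(expected) - set(loaded))
```

Running `parse_schema(read_schema_text(path))` on the exported schema file gives the entries the bulk loader was handed. Comparing that against the predicates shown in Ratel with `missing_predicates` lists what was dropped.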

manishrjain · 2018-09-28T10:20:22Z

That script is not officially supported by Dgraph. But, @MichelDiz might be able to help you with it.

MichelDiz · 2018-09-28T15:40:27Z

I performed the steps you described and was not able to reproduce the issue.
There are 13 entries in my test.
And the 1million dataset has 10 entries.

dareneiri · 2018-09-28T16:08:23Z

@MichelDiz

Updated 9/28/18 @ 1000 PST

I added additional steps to reproduce the issue. I neglected to specifically mention to export the data, then bulk upload it.

Thank you for your prompt response! Are these 13 entries also showing in Dgraph Ratel? Just to clarify: it's not the *.schema file itself that is missing the entries. They are missing in Ratel after the bulk upload completes.

My *.schema file defines 12 entries so I expect 12 in Dgraph Ratel, but only see 9.

MichelDiz · 2018-09-28T17:13:43Z

In Ratel there are only 12, but we do not count "_share_hash_:string @index(exact) ." because it is created by Dgraph itself.

MichelDiz · 2018-09-28T17:39:05Z

By doing what you said in:

I added additional steps to reproduce the issue. I neglected to specifically mention to export the data, then bulk upload it.

I was able to reproduce the issue. The problem occurs when the bulk loader inserts the schema; the data itself is okay.

You can work around this by adding your schema directly in Alter.

name:default . 
genre:uid . 
starring:uid . 
actor.film:uid . 
_share_hash_:string @index(exact) . 
director.film:uid . 
performance.film:uid . 
performance.actor:uid . 
initial_release_date:default . 
performance.character:uid . 
predicate_with_default_type:default . 
predicate_with_no_uid_count:string . 
predicate_with_index_no_uid_count:string @index(exact) .
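If you prefer to script the workaround rather than paste the schema into Ratel, something like the following might do. It is a sketch that assumes the Dgraph HTTP API accepts a schema as the body of a POST to `/alter` on the same address used for the export (localhost:8080). The function names are hypothetical, so check the endpoint against the docs for your version before relying on it.

```python
import urllib.request


def build_alter_request(schema_text, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request carrying the schema.

    Assumption: the server exposes an /alter endpoint that takes the
    schema text as the raw request body.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/alter",
        data=schema_text.encode("utf-8"),
        method="POST",
    )


def apply_schema(schema_text, base_url="http://localhost:8080"):
    """Send the schema to a running server and return the raw response."""
    request = build_alter_request(schema_text, base_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `apply_schema` with the schema lines above, after the bulk load has finished and the server is up, should restore the entries that were dropped.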

@danielmai Can you help me with this? Maybe I did something wrong. If you can confirm it, please add a bug label.

No need to use my script, just do this:

Bulk load or live load the "1million dataset".
Add the predicates and the mutation from @dareneiri. (This is not strictly necessary; it just keeps the steps the same.)
Run an export.
Then run a bulk load and check in Ratel whether the schema is complete.

In my test only the predicates of type uid were recorded. The others were not.

info:


Dgraph version   : v1.0.9-rc4
Commit SHA-1     : 2e5fab50
Commit timestamp : 2018-09-20 12:47:35 -0700
Branch           : HEAD

./dgraph bulk -s s.schema.gz -r rdf.rdf.gz
{
	"RDFDir": "rdf.rdf.gz",
	"SchemaFile": "s.schema.gz",
	"DgraphsDir": "out",
	"TmpDir": "tmp",
	"NumGoroutines": 16,
	"MapBufSize": 67108864,
	"ExpandEdges": true,
	"SkipMapPhase": false,
	"CleanupTmp": true,
	"NumShufflers": 1,
	"Version": false,
	"StoreXids": false,
	"ZeroAddr": "localhost:5080",
	"HttpAddr": "localhost:8080",
	"MapShards": 1,
	"ReduceShards": 1
}

* Don't skip predicates with value type of default when loading the schema. (#2616)
* Allow running test.sh from another directory.
* Keep all predicates from bulk import schema, not just the ones used.
* Make set of predicates the union of predicates in the schema and rdf.
* Add test for schema after export/bulk load.
* Add more schema test cases.
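The effect of the change list above can be illustrated with a small model. This is not Dgraph's bulk loader code. It is a simplified sketch in which the old behaviour is inferred from the commit message (predicates of type default, and predicates not used in the RDF data, were skipped) and the new behaviour takes the union of schema and RDF predicates. The function name and the `fixed` flag are hypothetical.

```python
def predicates_to_write(schema, rdf_predicates, fixed=True):
    """Return the schema entries a loader would write out.

    schema:         dict mapping predicate name -> declared type
    rdf_predicates: set of predicate names that occur in the RDF data
    fixed:          True models the union described in the commit message;
                    False models the reported behaviour (an assumption).
    """
    if fixed:
        # Union of everything declared in the schema and everything
        # seen in the data.
        names = set(schema) | set(rdf_predicates)
    else:
        # Assumed old behaviour: only predicates that were used in the
        # data and whose type is not default.
        names = {
            name
            for name in rdf_predicates
            if schema.get(name, "default") != "default"
        }
    # Predicates that appear only in the data fall back to type default.
    return {name: schema.get(name, "default") for name in sorted(names)}
```

With the three test predicates from this issue plus one uid predicate that has data, the `fixed=False` model keeps only the uid predicate, while `fixed=True` keeps all four.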

srfrog added the investigate (Requires further investigation) label Oct 12, 2018

codexnull pushed a commit that referenced this issue Nov 16, 2018

Don't skip predicates with value type of default when loading the schema. (#2616)

d0f2217

manishrjain assigned codexnull Nov 26, 2018

manishrjain added the kind/bug (Something is broken.) label and removed the investigate (Requires further investigation) label Nov 26, 2018

codexnull closed this as completed Nov 29, 2018
