Bulk Loader On Single Server Not Loading All Predicates #2616

Closed
dareneiri opened this issue Sep 27, 2018 · 5 comments
Labels: kind/bug (Something is broken.)

Comments

@dareneiri

dareneiri commented Sep 27, 2018

If you suspect this could be a bug, follow the template.

  • What version of Dgraph are you using?
    v1.0.8

  • Have you tried reproducing the issue with latest release?
    On latest release

  • What is the hardware spec (RAM, OS)?
    macOS High Sierra 10.13.6
    16 GB RAM
    Docker Version 18.06.0-ce-mac73 (26764)

  • Steps to reproduce the issue (command/config used to run Dgraph).

  1. Clone https://github.com/MichelDiz/Dgraph-Bulk-Script
  2. After Dgraph is loaded, create test data:
    1. Alter:
     predicate_with_no_uid_count:string  .
     predicate_with_default_type:default  .
     predicate_with_index_no_uid_count:string @index(exact) .  
    
    2. Mutate:
     {
      set {
        _:company1 <predicate_with_default_type> "CompanyABC" .
      }
    }
    
    3. Check the count before and after the mutation. Before, it should be 0; after, it should be 1:
    {
       q(func: has(predicate_with_default_type)) {
            count(uid)
       }
    }
    
  3. We have now added three predicates. The first has count(uid)=0. The second has type default and count(uid)=1. The last has count(uid)=0 and also an index.
  4. Export the data.
  5. Bulk import the exported data.
  6. View the schema in Dgraph Ratel: it shows only 9 of the 12 entries.
  • Expected behaviour and actual result.
    • After exporting with curl localhost:8080/admin/export, I would expect to see all three predicates I added to the 1million movie dataset, for a total of 12 entries in the schema.

    • Instead, I still see only 9 entries after running the bulk script (with the schema and RDF paths changed).

    • Additionally, if I modify the schema file so that predicate_with_default_type is changed from default to string, the predicate is displayed and the schema totals 10 entries.

    • I have forked the Dgraph-Bulk-Script repo and included the data I exported (using the steps above):
      https://github.com/dareneiri/Dgraph-Bulk-Script

      • In service/bulk-it-or-not-bulk-it.sh, you can use the .schema file, in which I changed the predicate type from default to string and which shows 10 entries. Or you can use .schema.gz, which results in 9 entries instead of the expected 12.
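
One way to pin down which entries go missing is to compare the predicate names declared in the exported schema file with the names Ratel shows after the bulk load. The following is only a minimal Python sketch of that comparison. The helper names (schema_predicates, missing_predicates) and the shown_in_ratel list are hypothetical, and the parsing assumes one "name:type ... ." declaration per line, as in the schema lines quoted in this issue.

```python
import re

# Matches the predicate name at the start of a schema line such as
#   predicate_with_index_no_uid_count:string @index(exact) .
# Angle brackets around the name are tolerated.
SCHEMA_LINE = re.compile(r"^\s*<?([^\s:<>]+)>?\s*:")


def schema_predicates(schema_text):
    """Return the set of predicate names declared in a schema text."""
    names = set()
    for line in schema_text.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_predicates(schema_text, loaded):
    """Predicates declared in the schema text but absent from `loaded`."""
    return sorted(schema_predicates(schema_text) - set(loaded))


# A shortened schema, using lines taken from this issue.
exported = """
name:default .
genre:uid .
predicate_with_default_type:default .
predicate_with_no_uid_count:string .
predicate_with_index_no_uid_count:string @index(exact) .
"""

# Hypothetical example input: the names copied by hand from Ratel.
shown_in_ratel = ["name", "genre"]

for name in missing_predicates(exported, shown_in_ratel):
    print("missing after bulk load:", name)
```

For a real export, the schema text would come from the decompressed .schema.gz file, and the second argument from whatever the server actually reports.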
@manishrjain
Contributor

That script is not officially supported by Dgraph. But, @MichelDiz might be able to help you with it.

@MichelDiz
Contributor

I performed the steps mentioned and was not able to reproduce the issue.
There are 13 entries in my test.
And 1million has 10 entries.

[Screenshot: 2018-09-28, 12:36:07]

@dareneiri
Author

dareneiri commented Sep 28, 2018

@MichelDiz

Updated 9/28/18 @ 1000 PST

I added additional steps for reproducing the issue. I had neglected to mention specifically that the data must be exported and then bulk uploaded.


Thank you for your prompt response! Are these 13 entries also showing in Dgraph Ratel? Just to clarify: it's not the *.schema file itself that is missing the entries. They are missing in Ratel after the bulk upload completes.

My *.schema file defines 12 entries so I expect 12 in Dgraph Ratel, but only see 9.

[Screenshot: 2018-09-28, 9:07:39 AM]

@MichelDiz
Contributor

Ratel shows only 12, because "_share_hash_:string @index(exact) ." is not counted: it comes from Dgraph itself.

@MichelDiz
Contributor

MichelDiz commented Sep 28, 2018

By doing what you said in:

I added additional steps for reproducing the issue. I had neglected to mention specifically that the data must be exported and then bulk uploaded.

I was able to reproduce the issue. It happens when the bulk loader inserts the schema; the data itself is okay.

[Screenshot: 2018-09-28, 14:26:35]

You can work around this by adding your schema directly in Alter.

name:default . 
genre:uid . 
starring:uid . 
actor.film:uid . 
_share_hash_:string @index(exact) . 
director.film:uid . 
performance.film:uid . 
performance.actor:uid . 
initial_release_date:default . 
performance.character:uid . 
predicate_with_default_type:default . 
predicate_with_no_uid_count:string . 
predicate_with_index_no_uid_count:string @index(exact) . 

@danielmai Can you help me with this? Maybe I did something wrong. If you can confirm it, please add a Bug flag.

No need to use my script, just do this:

  • Load the "1million dataset" with the bulk loader or the live loader.
  • Add the predicates and mutation from @dareneiri. (This is not necessary; it only keeps the steps the same.)
  • Export the data.
  • Then run a bulk load and check in Ratel whether the schema is there correctly.

In my example, only the predicates with UID were recorded; the others were not.

info:


Dgraph version   : v1.0.9-rc4
Commit SHA-1     : 2e5fab50
Commit timestamp : 2018-09-20 12:47:35 -0700
Branch           : HEAD

./dgraph bulk -s s.schema.gz -r rdf.rdf.gz
{
	"RDFDir": "rdf.rdf.gz",
	"SchemaFile": "s.schema.gz",
	"DgraphsDir": "out",
	"TmpDir": "tmp",
	"NumGoroutines": 16,
	"MapBufSize": 67108864,
	"ExpandEdges": true,
	"SkipMapPhase": false,
	"CleanupTmp": true,
	"NumShufflers": 1,
	"Version": false,
	"StoreXids": false,
	"ZeroAddr": "localhost:5080",
	"HttpAddr": "localhost:8080",
	"MapShards": 1,
	"ReduceShards": 1
}


@srfrog srfrog added the investigate Requires further investigation label Oct 12, 2018
@manishrjain manishrjain added kind/bug Something is broken. and removed investigate Requires further investigation labels Nov 26, 2018
codexnull added a commit that referenced this issue Nov 29, 2018
* Don't skip predicates with value type of default when loading the schema. (#2616)

* Allow running test.sh from another directory.

* Keep all predicates from bulk import schema, not just the ones used.

* Make set of predicates the union of predicates in the schema and rdf.

* Add test for schema after export/bulk load.

* Add more schema test cases.
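
The commit notes above describe keeping the union of the predicates declared in the schema and the predicates that occur in the RDF data, rather than only the ones that are used. The following Python sketch illustrates that idea only; it is not Dgraph's Go implementation. The function names are hypothetical, and the parsing assumes simple one-per-line schema declarations and N-Quad lines whose predicate is written in angle brackets.

```python
import re

# In an N-Quad line the predicate is the second term:
#   _:company1 <predicate_with_default_type> "CompanyABC" .
NQUAD_PREDICATE = re.compile(r"^\s*\S+\s+<([^>]+)>")


def schema_predicates(schema_text):
    """Predicate names declared in schema lines such as 'genre:uid .'."""
    names = set()
    for line in schema_text.splitlines():
        name, sep, _ = line.partition(":")
        name = name.strip().strip("<>")
        if sep and name:
            names.add(name)
    return names


def rdf_predicates(rdf_text):
    """Predicate names that actually occur in the RDF data."""
    names = set()
    for line in rdf_text.splitlines():
        match = NQUAD_PREDICATE.match(line)
        if match:
            names.add(match.group(1))
    return names


def predicates_to_keep(schema_text, rdf_text):
    """Union of declared and used predicates, so unused ones survive."""
    return schema_predicates(schema_text) | rdf_predicates(rdf_text)


schema = """
genre:uid .
predicate_with_no_uid_count:string .
"""
rdf = """
_:m1 <genre> _:g1 .
_:company1 <predicate_with_default_type> "CompanyABC" .
"""

print(sorted(predicates_to_keep(schema, rdf)))
```

Here predicate_with_no_uid_count has no data and predicate_with_default_type has no declaration in the sample schema, yet both remain in the result because the union is taken over both sources.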
dna2github pushed a commit to dna2fork/dgraph that referenced this issue Jul 19, 2019
* Don't skip predicates with value type of default when loading the schema. (dgraph-io#2616)