Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V1.1 Urgent Issue: Cannot Delete Outgoing Edge with N P * pattern After DGraph Upgrade #4182

Closed
limwcj opened this issue Oct 17, 2019 · 23 comments · Fixed by #4204
Closed

V1.1 Urgent Issue: Cannot Delete Outgoing Edge with N P * pattern After DGraph Upgrade #4182

limwcj opened this issue Oct 17, 2019 · 23 comments · Fixed by #4204
Assignees
Labels
kind/bug Something is broken. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate/work on it.

Comments

@limwcj
Copy link

limwcj commented Oct 17, 2019

What version of DGraph are you using?

v1.1.0

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

This issue happens on both our local dev environment (macOS) and server. Local mac is running 8gb RAM SSD, Remove server is running Ubuntu 16.04 and is using Dgraph's recommended i3.xLarge on AWS.

Description

After upgrading to DGraph V1.1, we've experienced a lot of bugs in our app regarding relationships not being removed. It took us a while to figure it out what the issue was, but we finally narrowed down the problem to node's not being able to delete outgoing edge with the delete * mutation. We never use to have this problem, but after the Dgraph upgrade to V1.1 we started experiencing this problem. The weird part about this issue is that it only happens sometimes which is very weird and makes us think it's a race condition issue. We tried manually deleting the relatioship in Ratel as well, but the outgoing edge never get's deleted. We've followed the migration steps and made sure to add all type definitinos for all nodes and to define their one-to-one or one-to-many relationships in the schma file as well but it still doesn't seem to work. The delete mutation fires and returns success but the edge still remains.

Steps to reproduce the issue (command/config used to run Dgraph).

Schema definition for the relationship between cell and column option node.

Schema definition for cell and column option node

Before Delete Query

 Before delete query

Wildcard Delete Mutation with * .

Delete mutation for has_column_option edge

After Wildcard Delete Query

image

As seen above the delete mutation does not delete the has_column_option edge.

We noticed that if we delete the edge explicitly with the ID that it works with the N P N pattern but not N P * pattern.

delete {
  <node> <edge> <node> .
} 

but this does not work

delete {
  <node> <edge> * .
} 

The issue above only happens sometimes which is very weird since only certain nodes/edges seem to exhibit the behavior of not being able to be deleted with the pattern S P *.

Mutation Deleting Edge Explicitly with Outgoing Node ID with .

Deleting explicitly with outgoing node ID

Query after deleting edge explicitly

Query node after deleting explicitly

This is an urgent issue on our end as our app right now is completely broken so any help with this issue would be greatly appreciated. Thanks in advance!

@limwcj limwcj changed the title V1.1 Cannot Delete Edge Predicate with <node> <edge> * . Mutation V1.1 Urgent Issue: Cannot Delete Outgoing Edge with <node> <edge> * Delete Mutation After DGraph Upgrade Oct 17, 2019
@MichelDiz
Copy link
Contributor

MichelDiz commented Oct 17, 2019

Hey limwcj, nice to hear you're using Dgraph in production.

One thing to mention, do never upgrade your production application to any version before testing your codebase (do a "clone" of your application and test it). As you said is the first time you have issues with relationships. That means you come from an old or very old Dgraph version. Be careful, we are always improving Dgraph and that means sometimes that you need to review your whole codebase and let it in testing for some time. Always test first.

So, I can't reproduce your issue. It may be another thing. Did you upgrade in the right way? exporting it and reimporting. If you simply updated the binary over the data in the path (without exporting), this could lead to compatibility issues.

Please try to reproduce the steps below.

Sample

{
  "set": [
    {
      "name": "Parent 1",
      "hasChild": {
      "name": "Child 1"
      }
    }
  ]
}
{
  set {
    _:Parent <name> "Parent 2".
    _:Parent <hasChild> _:Child .
    _:Child <name> "Child 2" .
    }
}

Query

{
  q(func: eq(name, "Parent 1")){
    uid
    name
    hasChild {
      uid
      name
    }
  }
}

Result

{
  "data": {
    "q": [
      {
        "uid": "0x2750",
        "name": "Parent 1",
        "hasChild": [
          {
            "uid": "0x2751",
            "name": "Child 1"
          }
        ]
      }
    ]
  }
}

Delete

{
  "delete": [
    {
      "uid": "0x2750",
      "hasChild": { #Delete the relation
        "uid": "0x2751"
    },
    {
      "uid": "0x2751" #than delete the child
    },
    }
  ]
}
{
  delete {
    <0x2752> <hasChild> * .
    <0x2753> * * .
    }
}

Query Again

{
  q(func: eq(name, "Parent 1")){
    uid
    name
    hasChild {
      uid
      name
    }
  }
}

@limwcj
Copy link
Author

limwcj commented Oct 17, 2019

We're currently testing our codebase on our staging environment, and it's very hard to catch this issue because it only happens sometimes. We tested the delete mutation with S P * . hundres of times and it seems to only occur around 10% of the time. When it does happen, the broken node's outgoing edge cannot be deleted with the S P * pattern and can only be deleted with the explicit definition using the S P S pattern

The most concerning aspect of this is that we are testing this with new data not old data so this bug actually appears locally on completely new generated data.

Is there any logic that has changed the way delete operations work with S P *. that we're not aware of? Or maybe it's the one to one/one-to-many relation that has changed this? Thanks!

@limwcj limwcj changed the title V1.1 Urgent Issue: Cannot Delete Outgoing Edge with <node> <edge> * Delete Mutation After DGraph Upgrade V1.1 Urgent Issue: Cannot Delete Outgoing Edge with N P * pattern After DGraph Upgrade Oct 17, 2019
@limwcj limwcj changed the title V1.1 Urgent Issue: Cannot Delete Outgoing Edge with N P * pattern After DGraph Upgrade V1.1 Urgent Issue: Cannot Delete Outgoing Edge with N P * pattern After DGraph Upgrade Oct 17, 2019
@limwcj
Copy link
Author

limwcj commented Oct 17, 2019

The weirdest thing is that after deleting it with S P *, the mutation returns success and you can clearly see the predicate that has been "deleted" in the response. But when you query the result it's still there.

Success After Deleting

We've also made sure to define the relationship explicitly in the schema between the cell and the column option as seen here.
Cell and Column Option Schema definition

The curious case with this bug is that it only happens sometimes and it's actually hard to replicate even on our side but we have data right now that is "broken" that we can show. If possible, we'd like to demonstrate this on our end. Thanks!

@MichelDiz
Copy link
Contributor

Well, if you can reproduce this with new data, surely I could be able to do it too. Need to know how. I'll try to create crazy tests in JS.

Is there any logic that has changed the way delete operations work with S P *.

Yes, you must have a well-defined type system and scalar/entities types also correctly defined (like one-one, one-many). Only this. As far as I know.

Can you share the whole context config you have? dgraph configs and so on. Can you replicate this in a simple configuration?

@limwcj
Copy link
Author

limwcj commented Oct 17, 2019

Hi @MichelDiz, thank you for your quick response. The issue on our end is it's actually something that happens even when we do the delete mutation in DGraph ratel (not just using the dgraph-js package). My current presumption is that the issue is not an issue with the Dgraph JS package but more of an issue with the delete operation itself. I think it would be easiest for us to show you in this short video

@mangalaman93
Copy link
Contributor

@limwcj Have you tried reproducing this issue on current master, just in case? Happy to provide binaries if needed.

@rickytyho
Copy link

Hi @mangalaman93 , I'm with Lim as well. What do you need us to provide? We are currently using docker to pull Dgraph V1.1. Are you suggesting us clone master? That would require some significant changes on our end to get our setup running? Is it possible for us to export the data set to RDF and provide the relevant queries and mutations to replicate the issue on your end? Thanks!

@mangalaman93
Copy link
Contributor

mangalaman93 commented Oct 17, 2019

oh, yeah! That'd work too. You could share the data (assuming no compliance issue) here or email me on aman [AT] dgraph.io. What I would need is the whole p directories, you could just compress and share the tar.gz flie with me for each node.

@rickytyho
Copy link

rickytyho commented Oct 17, 2019

Yep @mangalaman93 , we are going to export a part of the data set for you and provide the exact mutation and queries to make to replicate the bug. Also did you manage to watch the video that Lim posted above? Do you see any immediate issues with the way he's making the delete mutation/query? Thanks!

@mangalaman93
Copy link
Contributor

I can't see any issues, looks like a bug to me. This is such a common use case that we do not expect a bug in this use case. We have many test cases that test this scenario. Though, only when we could look into it, we'd know more.

@rickytyho
Copy link

rickytyho commented Oct 17, 2019

Hi Aman @mangalaman93 , thank you for your response. I sent an email to [email protected] with our DGraph dataset and replication steps to see the bug. Thanks!

@animesh2049
Copy link
Contributor

I am able to reproduce the issue with the help of data set provided by @rickytyho . I am looking into it.

@animesh2049 animesh2049 self-assigned this Oct 17, 2019
@animesh2049 animesh2049 added kind/bug Something is broken. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate/work on it. labels Oct 17, 2019
@rickytyho
Copy link

Thanks @animesh2049 , let us know if you need any additional information. Happy to provide.

@martinmr
Copy link
Contributor

To clarify S P * deletions didn't change because of the type system (S * * deletions were changed by the type system). So it's probably nothing to do with the type system.

@rickytyho
Copy link

rickytyho commented Oct 18, 2019

We experienced the same bug again but this time we are unable to delete any predicates from the node with the S P *. We recorded another video here demonstrating the issue. We are currently exporting the data set and we will send this data to @animesh2049 and @mangalaman93 for review. Thanks!

@rickytyho
Copy link

@animesh2049 Hi Amimesh, this is happening to us quite frequently and is causing our application to break in many places. Have you narrowed down on the source of the bug?

@rickytyho
Copy link

Thanks @pawanrawal for the patch. This issue first surfaced when we loaded data from the bulk loader, but it also occurred again when we tested it locally with newly generated data. Does this patch address the same issue from newly created data as well (maybe something to do with the immutable layer)? Thanks

pawanrawal added a commit that referenced this issue Oct 23, 2019
… > 0 (#4204)

Don't iterate over the immutable layer when we have a delete marker. Earlier after doing a S P * deletion, we were still returning the first uid from the immutable layer.

Fixes #4182

This bug was introduced in #3105 as part of the PR which introduced multi-part posting lists.
@rickytyho
Copy link

Hi @pawanrawal , is there a planned scheduled release for this update fix?

@campoy
Copy link
Contributor

campoy commented Oct 24, 2019

This should be released with 1.1.1, scheduled to be released in a couple of weeks.

Alternatively, you could compile from source or use the dgraph/dgraph:master Docker image

@rickytyho
Copy link

@campoy Thanks a lot! We'll try using dgraph/dgraph:master for now.

@rickytyho
Copy link

@campoy , Just realized that the image dgraph/dgraph:master was built over a month ago. Is there a more recent image that encompasses the bug fix? Becuase dgraph/dgraph:master still has the same bug for 4182

@martinmr
Copy link
Contributor

We have some issues with our release script because we just moved to to using go modules and the script does not work with it.

But I managed to build and push a new docker image with the master tag. Can you give it a try? It doesn't come with Ratel because I haven't managed to make the script work with it. Should you need ratel you can always run it on your workstatoin. Let me know if you can pull the new image.

@rickytyho
Copy link

Thanks a lot @martinmr . Currently, we've actually built our own image of Dgraph using master repo, but we'll switch over to using dgraph/dgraph:master to see if that works as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something is broken. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate/work on it.
Development

Successfully merging a pull request may close this issue.

7 participants