Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Q: how do I delete a range? #18

Open
glycerine opened this issue Feb 1, 2017 · 6 comments
Open

Q: how do I delete a range? #18

glycerine opened this issue Feb 1, 2017 · 6 comments

Comments

@glycerine
Copy link

Suppose I have a simple set of intervals whose end time (endx) is stored in a btree. Lets say the interval represents the time a user's compute job completed, so I store the userid and a jobid with it too. I'll use the jobid (assume they are all unique) to break ties so that all records stick around.

Then I want to delete all those records with time <= some end time. Here's what I try:

package main

import (
	"fmt"

	"github.com/google/btree"
)

type job struct {
	endx  int64
	user  string
	jobid string
}

func (a *job) Less(than btree.Item) bool {
	b := than.(*job)
	if a.endx == b.endx {
		// simply distinguish the order,
		// so we get to keep both in the tree.
		return a.jobid < b.jobid
	}
	return a.endx < b.endx
}

func main() {
	tree := btree.New(2)

	tree.ReplaceOrInsert(&job{endx: 1, jobid: "hij"})
	tree.ReplaceOrInsert(&job{endx: 1, jobid: "abc"})
	tree.ReplaceOrInsert(&job{endx: 1, jobid: "def"})

	tree.ReplaceOrInsert(&job{endx: 2, jobid: "hij"})
	tree.ReplaceOrInsert(&job{endx: 2, jobid: "abc"})
	tree.ReplaceOrInsert(&job{endx: 2, jobid: "def"})

	tree.ReplaceOrInsert(&job{endx: 3, jobid: "hij"})
	tree.ReplaceOrInsert(&job{endx: 3, jobid: "abc"})
	tree.ReplaceOrInsert(&job{endx: 3, jobid: "def"})

	fmt.Printf("len = %v\n", tree.Len())

	// delete through the 2s
	tree.AscendGreaterOrEqual(&job{endx: 0}, func(item btree.Item) bool {
		j := item.(*job)
		if j.endx <= 2 {
			fmt.Printf("delete pass deletes %#v\n", item)
			tree.Delete(j)
			return true
		}
		fmt.Printf("delete pass ignores %#v\n", item)
		return false
	})

	fmt.Printf("\n\n tree after deleting everything with "+
		"endx <=2, is size %v; should be size 3\n",
		tree.Len())
	tree.AscendGreaterOrEqual(&job{endx: 0}, func(item btree.Item) bool {

		fmt.Printf("%#v\n", item)
		return true
	})

}

here's what I get. This strange result presents in the upstream petar/gollrb as well.

                                                                                                  
go run ex1.go                                                                                      
len = 9                                                                                            
delete pass deletes &main.job{endx:1, user:"", jobid:"abc"}                                        
delete pass deletes &main.job{endx:1, user:"", jobid:"hij"}                                        
delete pass deletes &main.job{endx:2, user:"", jobid:"abc"}                                        
delete pass ignores &main.job{endx:3, user:"", jobid:"abc"}                                        
                                                                                                   
                                                                                                   
 tree after deleting everything with endx <=2, is size 6; should be size 3                         
&main.job{endx:1, user:"", jobid:"def"}                                                            
&main.job{endx:2, user:"", jobid:"def"}                                                            
&main.job{endx:2, user:"", jobid:"hij"}                                                            
&main.job{endx:3, user:"", jobid:"abc"}                                                            
&main.job{endx:3, user:"", jobid:"def"}                                                            
&main.job{endx:3, user:"", jobid:"hij"}                                                            
@glycerine
Copy link
Author

For the sake of generating API ideas, consider how this works in a another red-black tree package. I can advance the iterator before deleting behind me:

package main
import (
	"fmt"
	"github.com/glycerine/rbtree"
)

type job struct {
	endx  int64
	user  string
	jobid string
}

func newTree() *rbtree.Tree {
	return rbtree.NewTree(
		func(a1, b2 rbtree.Item) int {
			a := a1.(*job)
			b := b2.(*job)
			if a.endx == b.endx {
				switch {
				case a.jobid == b.jobid:
					return 0
				case a.jobid < b.jobid:
					return -1
				case a.jobid > b.jobid:
					return 1
				}
			}
			return int(a.endx - b.endx)
		})
}

func main() {
	tree := newTree()

	tree.Insert(&job{endx: 1, jobid: "hij"})
	tree.Insert(&job{endx: 1, jobid: "abc"})
	tree.Insert(&job{endx: 1, jobid: "def"})

	tree.Insert(&job{endx: 2, jobid: "hij"})
	tree.Insert(&job{endx: 2, jobid: "abc"})
	tree.Insert(&job{endx: 2, jobid: "def"})

	tree.Insert(&job{endx: 3, jobid: "hij"})
	tree.Insert(&job{endx: 3, jobid: "abc"})
	tree.Insert(&job{endx: 3, jobid: "def"})

	show(tree)

	// delete through the 2s

	for it := tree.Min(); !it.Limit(); {
		item := it.Item().(*job)
		if item.endx <= 2 {
			fmt.Printf("delete pass deletes %#v\n", item)

			next := it.Next()
			tree.DeleteWithIterator(it)
			it = next
		} else {
			fmt.Printf("delete pass ignores %#v\n", item)
			break // we can stop scanning now.
		}
	}
	show(tree)
}

func show(tree *rbtree.Tree) {
	fmt.Printf("showing tree: len = %v\n", tree.Len())
	for it := tree.Min(); !it.Limit(); it = it.Next() {
		item := it.Item().(*job)
		fmt.Printf("%#v\n", item)
	}

}

output:

showing tree: len = 9                                                                              
&main.job{endx:1, user:"", jobid:"abc"}                                                            
&main.job{endx:1, user:"", jobid:"def"}                                                            
&main.job{endx:1, user:"", jobid:"hij"}                                                            
&main.job{endx:2, user:"", jobid:"abc"}                                                            
&main.job{endx:2, user:"", jobid:"def"}                                                            
&main.job{endx:2, user:"", jobid:"hij"}                                                            
&main.job{endx:3, user:"", jobid:"abc"}                                                            
&main.job{endx:3, user:"", jobid:"def"}                                                            
&main.job{endx:3, user:"", jobid:"hij"}                                                            
delete pass deletes &main.job{endx:1, user:"", jobid:"abc"}                                        
delete pass deletes &main.job{endx:1, user:"", jobid:"def"}                                        
delete pass deletes &main.job{endx:1, user:"", jobid:"hij"}                                        
delete pass deletes &main.job{endx:2, user:"", jobid:"abc"}                                        
delete pass deletes &main.job{endx:2, user:"", jobid:"def"}                                        
delete pass deletes &main.job{endx:2, user:"", jobid:"hij"}                                        
delete pass ignores &main.job{endx:3, user:"", jobid:"abc"}                                        
showing tree: len = 3                                                                              
&main.job{endx:3, user:"", jobid:"abc"}                                                            
&main.job{endx:3, user:"", jobid:"def"}                                                            
&main.job{endx:3, user:"", jobid:"hij"}      

@gconnell
Copy link
Collaborator

gconnell commented Mar 1, 2017

Great question, and sorry I didn't see this sooner.

As you've found, currently our iterator functions aren't resilient to mutations during iteration. We might work on this later (and happy to accept pull requests ;), but to get things going right away, you could do this:

todelete := []btree.Item{}
tree.Ascend(..., func(x btree.Item) bool {
  todelete = append(todelete, x)
  return true
})
for _, x := range todelete {
  tree.Delete(x)
}

It's not as efficient as it could be, and it requires extra storage, but it also actually works right now :)

@glycerine
Copy link
Author

glycerine commented Mar 1, 2017

For efficiency, the DeleteMin() method turned out to work well. This is isn't a general solution if you want to delete a range that starts in the middle of you elements, but it worked for the case above. As in the example above, where you want to delete all elements belong some time threshold, with a callback for each deletion, I'm doing:

func deleteThrough(t *btree, x int64, callme func(goner *mjob, through int64)) {                              
                                                                                                                
    for {                                                                                                       
        min := t.Min()                                                                                          
        if min == nil {                                                                                         
            return                                                                                              
        }                                                                                                       
        cur := min.(*mjob)                                                                                      
        if cur.endx > x {                                                                                       
            return                                                                                              
        }                                                                                                                                                            
        t.DeleteMin()                                                                                           
        if callme != nil {                                                                                      
            callme(cur, x)                                                                                      
        }                                                                                                       
    }                                                                                                           
}  

At least I think (hope) this is efficient. If its not, please let me know! :-) Typically a tree will keep a pointer to its minimum element... but if its log(n) to find the smallest element, this might be a problem...

@glycerine
Copy link
Author

glycerine commented Mar 1, 2017

Actually its worse than I expected.

https://github.com/google/btree/blob/master/btree.go#L153

makes it look like every delete is O(n) in the size of the tree n, as it does a full copy of the entire tree.

This is really, really, not what I would expect from a balanced binary tree implementation. Deleting the entire tree becomes an O(n*n) operation.

This is important. I'll make a separate issue.

@gconnell
Copy link
Collaborator

gconnell commented Mar 1, 2017

Nope, your original supposition was correct: it's O(logn) for a delete.

https://github.com/google/btree/blob/master/btree.go#L153 is the code called to delete an item from a single node, which is O(degree) of the btree (items will be max 'degree' in length, one itemset per node).

Your code is nicely efficient, but could be slightly moreso if you expect to delete more than 2 entries in your loop. In that case, you could blindly issue the delete, then put it back when you get to the one you care about.

var item btree.Item
for item = t.DeleteMin(); item.(*mjob).endx <= x {}
t.Insert(item)

Calling Min, which is O(logn), then Delete, which is O(logn) effectively doubles your processing time, unless you expect to only delete zero or one element each time.

@cstockton
Copy link

I ran into having to clear the tree myself today, created #20 which if it's not accepted it's small enough to copy pasta if it will help you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants