Skip to content

Commit

Permalink
clear_delta_base_cache(): don't modify hashmap while iterating
Browse files Browse the repository at this point in the history
On Thu, Jan 19, 2017 at 03:03:46PM +0100, Ulrich Spörlein wrote:

> > I suspect the patch below may fix things for you. It works around it by
> > walking over the lru list (either is fine, as they both contain all
> > entries, and since we're clearing everything, we don't care about the
> > order).
>
> Confirmed. With the patch applied, I can import the whole 55G in one go
> without any crashes or aborts. Thanks much!

Thanks. Here it is rolled up with a commit message.

-- >8 --
Subject: clear_delta_base_cache(): don't modify hashmap while iterating

Removing entries while iterating causes fast-import to
access an already-freed `struct packed_git`, leading to
various confusing errors.

What happens is that clear_delta_base_cache() drops the
whole contents of the cache by iterating over the hashmap,
calling release_delta_base_cache() on each entry. That
function removes the item from the hashmap. The hashmap code
may then shrink the table, but the hashmap_iter struct
retains an offset from the old table.

As a result, the next call to hashmap_iter_next() may claim
that the iteration is done, even though some items haven't
been visited.

The only caller of clear_delta_base_cache() is fast-import,
which wants to clear the cache because it is discarding the
packed_git struct for its temporary pack. So by failing to
remove all of the entries, we still have references to the
freed packed_git.

To make things even more confusing, this doesn't seem to
trigger with the test suite, because it depends on
complexities like the size of the hash table, which entries
got cleared, whether we try to access them before they're
evicted from the cache, etc.

So I've been able to identify the problem with large
imports like freebsd's svn import, or a fast-export of
linux.git. But nothing that would be reasonable to run as
part of the normal test suite.

We can fix this easily by iterating over the lru linked list
instead of the hashmap. They both contain the same entries,
and we can use the "safe" variant of the list iterator,
which exists for exactly this case.

Let's also add a warning to the hashmap API documentation to
reduce the chances of getting bit by this again.

Reported-by: Ulrich Spörlein <[email protected]>
Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
  • Loading branch information
peff authored and gitster committed Jan 19, 2017
1 parent ad36dc8 commit abd5a00
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 6 deletions.
4 changes: 3 additions & 1 deletion Documentation/technical/api-hashmap.txt
Original file line number Diff line number Diff line change
Expand Up @@ -188,7 +188,9 @@ Returns the removed entry, or NULL if not found.
`void *hashmap_iter_next(struct hashmap_iter *iter)`::
`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::

Used to iterate over all entries of a hashmap.
Used to iterate over all entries of a hashmap. Note that it is
not safe to add or remove entries to the hashmap while
iterating.
+
`hashmap_iter_init` initializes a `hashmap_iter` structure.
+
Expand Down
9 changes: 4 additions & 5 deletions sha1_file.c
Original file line number Diff line number Diff line change
Expand Up @@ -2350,11 +2350,10 @@ static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)

void clear_delta_base_cache(void)
{
struct hashmap_iter iter;
struct delta_base_cache_entry *entry;
for (entry = hashmap_iter_first(&delta_base_cache, &iter);
entry;
entry = hashmap_iter_next(&iter)) {
struct list_head *lru, *tmp;
list_for_each_safe(lru, tmp, &delta_base_cache_lru) {
struct delta_base_cache_entry *entry =
list_entry(lru, struct delta_base_cache_entry, lru);
release_delta_base_cache(entry);
}
}
Expand Down

0 comments on commit abd5a00

Please sign in to comment.