Merge branch 'hashmap_iter_bucket_lock_fix'
Yonghong Song says:

====================
Currently, the bpf hashmap iterator takes the bucket_lock, a spin_lock,
before visiting the elements in a bucket. This causes a deadlock
if the iterator program performs a map update/delete on an element
in the same bucket of the visited map.

To avoid the deadlock, let us just use rcu_read_lock instead of
bucket_lock. This may result in visiting stale elements, missing some elements,
or repeating some elements if a concurrent map delete/update happens on the
same map. I think using rcu_read_lock is a reasonable compromise.
Users who care about stale/missing/repeated elements can use the
bpf map batch access syscall interface instead.

Note that another approach is to check, during the bpf_iter link stage,
whether the iter program might be able to update/delete elements of the
visited map, and to reject the link_create if so. For this approach, the
verifier needs to record, for each map, whether an update/delete operation
happens. I feel this check is too specialized, hence I still prefer the
rcu_read_lock approach.

Patch #1 has the kernel implementation and Patch #2 adds a selftest
which triggers the deadlock without Patch #1.
====================
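
The deadlock described in the message above can be illustrated with a minimal userspace sketch. It is not kernel code: `toy_lock_bucket`, `toy_trylock`, `toy_update_from_iter` and `toy_visit` are hypothetical names, and a C11 `atomic_flag` stands in for the non-recursive bucket spin_lock. A single try-lock attempt replaces the spin loop so the failure can be observed instead of hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hash bucket guarded by a non-recursive lock. */
struct toy_lock_bucket {
	atomic_flag lock;
};

/* One acquisition attempt; a real spin_lock would loop until this succeeds. */
static bool toy_trylock(struct toy_lock_bucket *b)
{
	return !atomic_flag_test_and_set(&b->lock);
}

static void toy_unlock(struct toy_lock_bucket *b)
{
	atomic_flag_clear(&b->lock);
}

/* Models a map update/delete issued by the iterator program on the bucket
 * that is currently being visited. Returns false where a spinning lock
 * would never make progress.
 */
static bool toy_update_from_iter(struct toy_lock_bucket *b)
{
	if (!toy_trylock(b))
		return false;
	toy_unlock(b);
	return true;
}

/* Visit one bucket and run the "program" on it. With hold_bucket_lock set,
 * this mimics the old iterator; without it, the lock-free (rcu-style) one.
 */
static bool toy_visit(struct toy_lock_bucket *b, bool hold_bucket_lock)
{
	bool ok;

	if (hold_bucket_lock && !toy_trylock(b))
		return false;
	ok = toy_update_from_iter(b);
	if (hold_bucket_lock)
		toy_unlock(b);
	return ok;
}
```

Called from a test harness, `toy_visit(&b, true)` reports failure because the update tries to take a lock the visitor already holds, while `toy_visit(&b, false)` succeeds.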

Signed-off-by: Alexei Starovoitov <[email protected]>
Alexei Starovoitov committed Sep 4, 2020
2 parents 21e9ba5 + 4daab71 commit e6135df
Showing 2 changed files with 19 additions and 11 deletions.
15 changes: 4 additions & 11 deletions kernel/bpf/hashtab.c
@@ -1622,7 +1622,6 @@ struct bpf_iter_seq_hash_map_info {
 	struct bpf_map *map;
 	struct bpf_htab *htab;
 	void *percpu_value_buf; // non-zero means percpu hash
-	unsigned long flags;
 	u32 bucket_id;
 	u32 skip_elems;
 };
@@ -1632,7 +1631,6 @@ bpf_hash_map_seq_find_next(struct bpf_iter_seq_hash_map_info *info,
 			   struct htab_elem *prev_elem)
 {
 	const struct bpf_htab *htab = info->htab;
-	unsigned long flags = info->flags;
 	u32 skip_elems = info->skip_elems;
 	u32 bucket_id = info->bucket_id;
 	struct hlist_nulls_head *head;
@@ -1656,27 +1654,26 @@ bpf_hash_map_seq_find_next(struct bpf_iter_seq_hash_map_info *info,

 		/* not found, unlock and go to the next bucket */
 		b = &htab->buckets[bucket_id++];
-		htab_unlock_bucket(htab, b, flags);
+		rcu_read_unlock();
 		skip_elems = 0;
 	}

 	for (i = bucket_id; i < htab->n_buckets; i++) {
 		b = &htab->buckets[i];
-		flags = htab_lock_bucket(htab, b);
+		rcu_read_lock();

 		count = 0;
 		head = &b->head;
 		hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
 			if (count >= skip_elems) {
-				info->flags = flags;
 				info->bucket_id = i;
 				info->skip_elems = count;
 				return elem;
 			}
 			count++;
 		}

-		htab_unlock_bucket(htab, b, flags);
+		rcu_read_unlock();
 		skip_elems = 0;
 	}

@@ -1754,14 +1751,10 @@ static int bpf_hash_map_seq_show(struct seq_file *seq, void *v)

 static void bpf_hash_map_seq_stop(struct seq_file *seq, void *v)
 {
-	struct bpf_iter_seq_hash_map_info *info = seq->private;
-
 	if (!v)
 		(void)__bpf_hash_map_seq_show(seq, NULL);
 	else
-		htab_unlock_bucket(info->htab,
-				   &info->htab->buckets[info->bucket_id],
-				   info->flags);
+		rcu_read_unlock();
 }

 static int bpf_iter_init_hash_map(void *priv_data,
15 changes: 15 additions & 0 deletions tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c
@@ -47,7 +47,10 @@ int dump_bpf_hash_map(struct bpf_iter__bpf_map_elem *ctx)
 	__u32 seq_num = ctx->meta->seq_num;
 	struct bpf_map *map = ctx->map;
 	struct key_t *key = ctx->key;
+	struct key_t tmp_key;
 	__u64 *val = ctx->value;
+	__u64 tmp_val = 0;
+	int ret;

 	if (in_test_mode) {
 		/* test mode is used by selftests to
@@ -61,6 +64,18 @@ int dump_bpf_hash_map(struct bpf_iter__bpf_map_elem *ctx)
 		if (key == (void *)0 || val == (void *)0)
 			return 0;

+		/* update the value and then delete the <key, value> pair.
+		 * it should not impact the existing 'val' which is still
+		 * accessible under rcu.
+		 */
+		__builtin_memcpy(&tmp_key, key, sizeof(struct key_t));
+		ret = bpf_map_update_elem(&hashmap1, &tmp_key, &tmp_val, 0);
+		if (ret)
+			return 0;
+		ret = bpf_map_delete_elem(&hashmap1, &tmp_key);
+		if (ret)
+			return 0;
+
 		key_sum_a += key->a;
 		key_sum_b += key->b;
 		key_sum_c += key->c;
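
The commit message warns that iteration under rcu_read_lock may miss or repeat elements. One way this can happen is sketched below, assuming the iterator resumes a bucket purely by position, as the `count >= skip_elems` check in the hashtab.c hunk suggests. The sketch is a userspace simplification: `toy_chain`, `toy_resume`, `toy_delete_at` and `toy_insert_head` are hypothetical names, a small array stands in for the bucket's list, and inserting at the head of the bucket is an assumption about where new elements land.

```c
#include <stddef.h>

#define TOY_CHAIN_CAP 8

/* Hypothetical flattened bucket: keys in list order, head first. */
struct toy_chain {
	int keys[TOY_CHAIN_CAP];
	size_t len;
};

/* Resume by position, like the skip_elems counter: return the key at index
 * skip_elems, or -1 when the bucket has no element at that position.
 */
static int toy_resume(const struct toy_chain *c, size_t skip_elems)
{
	return skip_elems < c->len ? c->keys[skip_elems] : -1;
}

/* Remove the element at idx, shifting later elements toward the head. */
static void toy_delete_at(struct toy_chain *c, size_t idx)
{
	for (size_t i = idx; i + 1 < c->len; i++)
		c->keys[i] = c->keys[i + 1];
	c->len--;
}

/* Insert at the head, shifting existing elements toward the tail.
 * The caller must keep len below TOY_CHAIN_CAP.
 */
static void toy_insert_head(struct toy_chain *c, int key)
{
	for (size_t i = c->len; i > 0; i--)
		c->keys[i] = c->keys[i - 1];
	c->keys[0] = key;
	c->len++;
}
```

With a bucket holding 10, 20, 30 and one element already visited (`skip_elems` is 1), deleting the first element makes `toy_resume` return 30, so 20 is missed. Inserting a new key at the head instead makes it return 10 again, so 10 is repeated.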
