Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Just to trigger #8

Closed
wants to merge 2 commits into from
Closed

Just to trigger #8

wants to merge 2 commits into from

Conversation

mdiep25
Copy link

@mdiep25 mdiep25 commented Nov 17, 2017

Signed-off-by: Minh Diep [email protected]

@daos-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

@mdiep25 mdiep25 force-pushed the mdiep-test branch 7 times, most recently from 56cc1c0 to a0a5a6d Compare November 17, 2017 23:12
Signed-off-by: Minh Diep <[email protected]>
@mdiep25
Copy link
Author

mdiep25 commented Nov 17, 2017

testing

@daos-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

@mdiep25 mdiep25 force-pushed the mdiep-test branch 8 times, most recently from 5f3f73c to d4ca69b Compare November 18, 2017 00:13
@mdiep25
Copy link
Author

mdiep25 commented Nov 18, 2017

just one

@mdiep25 mdiep25 force-pushed the mdiep-test branch 4 times, most recently from 0a58fdf to 951b00a Compare November 18, 2017 00:23
@mdiep25 mdiep25 force-pushed the mdiep-test branch 2 times, most recently from 41a6dad to d4cc914 Compare November 28, 2017 01:14
Signed-off-by: Minh Diep <[email protected]>
@johannlombardi johannlombardi deleted the mdiep-test branch August 29, 2018 20:53
jolivier23 added a commit that referenced this pull request Mar 23, 2020
#8)

clang-format doesn't work on older clang compilers.   Rather than
skipping the step, instead use fallback of Mozilla if the version
isn't suitable.

Signed-off-by: Jeff Olivier <[email protected]>
jolivier23 pushed a commit that referenced this pull request Apr 21, 2022
#8780)

It is possible that the error is not reset to 0 when a process is retrieving the private event in a multi-threaded environment since another thread might have processed the event completion.

add a lock on daos_event_t for events that are not part of an event queue, including the thread private event. This is required because in a multithreaded environment like dfuse, the event can be completed and the error set non
atomically, while the main thread sees the completion but before the error has actually set, causing the owner of the event to reset the event error, with the other thread setting the error after. this affects future use of the event that launches with ev_error already set to the previous error.

Signed-off-by: Mohamad Chaarawi <[email protected]>
jolivier23 pushed a commit that referenced this pull request Apr 21, 2022
#8779)

It is possible that the error is not reset to 0 when a process is retrieving the private event in a multi-threaded environment since another thread might have processed the event completion.

add a lock on daos_event_t for events that are not part of an event queue, including the thread private event. This is required because in a multithreaded environment like dfuse, the event can be completed and the error set non
atomically, while the main thread sees the completion but before the error has actually set, causing the owner of the event to reset the event error, with the other thread setting the error after. this affects future use of the event that launches with ev_error already set to the previous error.

Signed-off-by: Mohamad Chaarawi <[email protected]>
liw added a commit that referenced this pull request Jun 7, 2022
Makito and Samir observed the following assertion failure after
restarting engines.

  #0  raise () from /lib64/libc.so.6
  #1  abort () from /lib64/libc.so.6
  #2  __assert_fail_base () from /lib64/libc.so.6
  #3  __assert_fail () from /lib64/libc.so.6
  #4  pool_map_get_version (map=0x0) at src/common/pool_map.c:2852
  #5  ds_pool_get_version (pool=0x7f0ca063c690, pool=0x7f0ca063c690) at
      src/include/daos_srv/pool.h:296
  #6  pc=rpc@entry=0x7f0ca0998d30, p_rpt=p_rpt@entry=0x7f0ca83a77b0) at
      src/rebuild/srv.c:2101
  #7  rebuild_tgt_scan_handler (rpc=0x7f0ca0998d30) at
      src/rebuild/scan.c:954
  #8  crt_handle_rpc (arg=0x7f0ca0998d30) at src/cart/crt_rpc.c:1654
  #9  ABTD_ythread_func_wrapper (p_arg=0x7f0ca83a78a0) at
      arch/abtd_ythread.c:21
  #10 make_fcontext () from /usr/lib64/libabt.so.1
  #11 ?? ()

The ds_pool_get_version call passed a NULL map argument to
pool_map_get_version. The ds_pool.sp_map field may be NULL after the
pool is started but before the pool receives the initial pool map from
the pool service. This patch fixes ds_pool_get_version to return 0,
which is less than all valid pool map versions, when sp_map is NULL,
resulting in rebuild retries like this:

  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [completed] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0,[...]
  Target[2] (rank 2 idx 0 status 16 ver 1) is excluded.

Also, this patch removes some rebuild code that handles NULL
ds_pool.sp_group fields. Those can not happen as we always initialize
sp_group (as well as sp_iv_ns) before putting a ds_pool object into the
LRU.

Signed-off-by: Li Wei <[email protected]>
shimizukko pushed a commit that referenced this pull request Jun 28, 2022
Makito and Samir observed the following assertion failure after
restarting engines.

  #0  raise () from /lib64/libc.so.6
  #1  abort () from /lib64/libc.so.6
  #2  __assert_fail_base () from /lib64/libc.so.6
  #3  __assert_fail () from /lib64/libc.so.6
  #4  pool_map_get_version (map=0x0) at src/common/pool_map.c:2852
  #5  ds_pool_get_version (pool=0x7f0ca063c690, pool=0x7f0ca063c690) at
      src/include/daos_srv/pool.h:296
  #6  pc=rpc@entry=0x7f0ca0998d30, p_rpt=p_rpt@entry=0x7f0ca83a77b0) at
      src/rebuild/srv.c:2101
  #7  rebuild_tgt_scan_handler (rpc=0x7f0ca0998d30) at
      src/rebuild/scan.c:954
  #8  crt_handle_rpc (arg=0x7f0ca0998d30) at src/cart/crt_rpc.c:1654
  #9  ABTD_ythread_func_wrapper (p_arg=0x7f0ca83a78a0) at
      arch/abtd_ythread.c:21
  #10 make_fcontext () from /usr/lib64/libabt.so.1
  #11 ?? ()

The ds_pool_get_version call passed a NULL map argument to
pool_map_get_version. The ds_pool.sp_map field may be NULL after the
pool is started but before the pool receives the initial pool map from
the pool service. This patch fixes ds_pool_get_version to return 0,
which is less than all valid pool map versions, when sp_map is NULL,
resulting in rebuild retries like this:

  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [completed] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0,[...]
  Target[2] (rank 2 idx 0 status 16 ver 1) is excluded.

Also, this patch removes some rebuild code that handles NULL
ds_pool.sp_group fields. Those can not happen as we always initialize
sp_group (as well as sp_iv_ns) before putting a ds_pool object into the
LRU.

Skip-test: true
Signed-off-by: Li Wei <[email protected]>
liw added a commit that referenced this pull request Jul 6, 2022
Makito and Samir observed the following assertion failure after
restarting engines.

  #0  raise () from /lib64/libc.so.6
  #1  abort () from /lib64/libc.so.6
  #2  __assert_fail_base () from /lib64/libc.so.6
  #3  __assert_fail () from /lib64/libc.so.6
  #4  pool_map_get_version (map=0x0) at src/common/pool_map.c:2852
  #5  ds_pool_get_version (pool=0x7f0ca063c690, pool=0x7f0ca063c690) at
      src/include/daos_srv/pool.h:296
  #6  pc=rpc@entry=0x7f0ca0998d30, p_rpt=p_rpt@entry=0x7f0ca83a77b0) at
      src/rebuild/srv.c:2101
  #7  rebuild_tgt_scan_handler (rpc=0x7f0ca0998d30) at
      src/rebuild/scan.c:954
  #8  crt_handle_rpc (arg=0x7f0ca0998d30) at src/cart/crt_rpc.c:1654
  #9  ABTD_ythread_func_wrapper (p_arg=0x7f0ca83a78a0) at
      arch/abtd_ythread.c:21
  #10 make_fcontext () from /usr/lib64/libabt.so.1
  #11 ?? ()

The ds_pool_get_version call passed a NULL map argument to
pool_map_get_version. The ds_pool.sp_map field may be NULL after the
pool is started but before the pool receives the initial pool map from
the pool service. This patch fixes ds_pool_get_version to return 0,
which is less than all valid pool map versions, when sp_map is NULL,
resulting in rebuild retries like this:

  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [completed] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0,[...]
  Target[2] (rank 2 idx 0 status 16 ver 1) is excluded.

Also, this patch removes some rebuild code that handles NULL
ds_pool.sp_group fields. Those can not happen as we always initialize
sp_group (as well as sp_iv_ns) before putting a ds_pool object into the
LRU.

Signed-off-by: Li Wei <[email protected]>
jolivier23 pushed a commit that referenced this pull request Jul 19, 2022
Makito and Samir observed the following assertion failure after
restarting engines.

  #0  raise () from /lib64/libc.so.6
  #1  abort () from /lib64/libc.so.6
  #2  __assert_fail_base () from /lib64/libc.so.6
  #3  __assert_fail () from /lib64/libc.so.6
  #4  pool_map_get_version (map=0x0) at src/common/pool_map.c:2852
  #5  ds_pool_get_version (pool=0x7f0ca063c690, pool=0x7f0ca063c690) at
      src/include/daos_srv/pool.h:296
  #6  pc=rpc@entry=0x7f0ca0998d30, p_rpt=p_rpt@entry=0x7f0ca83a77b0) at
      src/rebuild/srv.c:2101
  #7  rebuild_tgt_scan_handler (rpc=0x7f0ca0998d30) at
      src/rebuild/scan.c:954
  #8  crt_handle_rpc (arg=0x7f0ca0998d30) at src/cart/crt_rpc.c:1654
  #9  ABTD_ythread_func_wrapper (p_arg=0x7f0ca83a78a0) at
      arch/abtd_ythread.c:21
  #10 make_fcontext () from /usr/lib64/libabt.so.1
  #11 ?? ()

The ds_pool_get_version call passed a NULL map argument to
pool_map_get_version. The ds_pool.sp_map field may be NULL after the
pool is started but before the pool receives the initial pool map from
the pool service. This patch fixes ds_pool_get_version to return 0,
which is less than all valid pool map versions, when sp_map is NULL,
resulting in rebuild retries like this:

  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [completed] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0,[...]
  Target[2] (rank 2 idx 0 status 16 ver 1) is excluded.

Also, this patch removes some rebuild code that handles NULL
ds_pool.sp_group fields. Those can not happen as we always initialize
sp_group (as well as sp_iv_ns) before putting a ds_pool object into the
LRU.

Signed-off-by: Li Wei <[email protected]>
liw added a commit that referenced this pull request Jul 20, 2022
Makito and Samir observed the following assertion failure after
restarting engines.

  #0  raise () from /lib64/libc.so.6
  #1  abort () from /lib64/libc.so.6
  #2  __assert_fail_base () from /lib64/libc.so.6
  #3  __assert_fail () from /lib64/libc.so.6
  #4  pool_map_get_version (map=0x0) at src/common/pool_map.c:2852
  #5  ds_pool_get_version (pool=0x7f0ca063c690, pool=0x7f0ca063c690) at
      src/include/daos_srv/pool.h:296
  #6  pc=rpc@entry=0x7f0ca0998d30, p_rpt=p_rpt@entry=0x7f0ca83a77b0) at
      src/rebuild/srv.c:2101
  #7  rebuild_tgt_scan_handler (rpc=0x7f0ca0998d30) at
      src/rebuild/scan.c:954
  #8  crt_handle_rpc (arg=0x7f0ca0998d30) at src/cart/crt_rpc.c:1654
  #9  ABTD_ythread_func_wrapper (p_arg=0x7f0ca83a78a0) at
      arch/abtd_ythread.c:21
  #10 make_fcontext () from /usr/lib64/libabt.so.1
  #11 ?? ()

The ds_pool_get_version call passed a NULL map argument to
pool_map_get_version. The ds_pool.sp_map field may be NULL after the
pool is started but before the pool receives the initial pool map from
the pool service. This patch fixes ds_pool_get_version to return 0,
which is less than all valid pool map versions, when sp_map is NULL,
resulting in rebuild retries like this:

  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [completed] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0,[...]
  Target[2] (rank 2 idx 0 status 16 ver 1) is excluded.

Also, this patch removes some rebuild code that handles NULL
ds_pool.sp_group fields. Those can not happen as we always initialize
sp_group (as well as sp_iv_ns) before putting a ds_pool object into the
LRU.

Signed-off-by: Li Wei <[email protected]>
jolivier23 pushed a commit that referenced this pull request Jul 21, 2022
Makito and Samir observed the following assertion failure after
restarting engines.

  #0  raise () from /lib64/libc.so.6
  #1  abort () from /lib64/libc.so.6
  #2  __assert_fail_base () from /lib64/libc.so.6
  #3  __assert_fail () from /lib64/libc.so.6
  #4  pool_map_get_version (map=0x0) at src/common/pool_map.c:2852
  #5  ds_pool_get_version (pool=0x7f0ca063c690, pool=0x7f0ca063c690) at
      src/include/daos_srv/pool.h:296
  #6  pc=rpc@entry=0x7f0ca0998d30, p_rpt=p_rpt@entry=0x7f0ca83a77b0) at
      src/rebuild/srv.c:2101
  #7  rebuild_tgt_scan_handler (rpc=0x7f0ca0998d30) at
      src/rebuild/scan.c:954
  #8  crt_handle_rpc (arg=0x7f0ca0998d30) at src/cart/crt_rpc.c:1654
  #9  ABTD_ythread_func_wrapper (p_arg=0x7f0ca83a78a0) at
      arch/abtd_ythread.c:21
  #10 make_fcontext () from /usr/lib64/libabt.so.1
  #11 ?? ()

The ds_pool_get_version call passed a NULL map argument to
pool_map_get_version. The ds_pool.sp_map field may be NULL after the
pool is started but before the pool receives the initial pool map from
the pool service. This patch fixes ds_pool_get_version to return 0,
which is less than all valid pool map versions, when sp_map is NULL,
resulting in rebuild retries like this:

  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [completed] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0,[...]
  Target[2] (rank 2 idx 0 status 16 ver 1) is excluded.

Also, this patch removes some rebuild code that handles NULL
ds_pool.sp_group fields. Those can not happen as we always initialize
sp_group (as well as sp_iv_ns) before putting a ds_pool object into the
LRU.

Signed-off-by: Li Wei <[email protected]>
mlawsonca pushed a commit that referenced this pull request May 16, 2023
osalyk pushed a commit to osalyk/daos that referenced this pull request Oct 18, 2023
…caffolding

dumps: dtx_update_abort scaffolding
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants