Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put request processing should probably lock the datastore #2

Closed
peckato1 opened this issue Dec 5, 2023 · 1 comment
Closed

Put request processing should probably lock the datastore #2

peckato1 opened this issue Dec 5, 2023 · 1 comment

Comments

@peckato1
Copy link
Contributor

peckato1 commented Dec 5, 2023

In the PUT request writes we first check if the node we want to insert is present. This is needed for deducing the correct HTTP status code.

Consistent with [RFC7231], if the PUT request creates a new resource,
a "201 Created" status-line is returned. If an existing resource is
modified, a "204 No Content" status-line is returned.

These are 2 operations into the DS (first query for existence, then the actual write) while the DS is not locked. This means that somebody could change the the DS after we query but before we actually write into the datastore (i.e., he/she can insert/erase the data node we want to insert) and the status code would be invalid.

@jktjkt
Copy link
Contributor

jktjkt commented Dec 20, 2023

(and it's likely not just PUT, it's any method which validates whether something exists, and then goas ahead and does something.)

jktjkt pushed a commit that referenced this issue Jun 5, 2024
PUT returns HTTP status codes, one if the node was created and a
different one when node was modified. The code implements this using a
query (that is executed before the actual edit) to check if the node was
there before the edit.

This is potentially racy, because the node can be created between the
query and the actual edit.

This patch adds a sysrepo lock so this does not happen. There is one
problem, though. The candidate datastore reset itself to mirror running
when sr_unlock is called [1].
Therefore we do not lock the candidate DS.

[1] https://github.com/sysrepo/sysrepo/blob/4f40f5406f24e5a565d0ee554389fff2aad4bd1b/doc/main.dox#L94

Bug: #2
Change-Id: Ie211018e921b5b2d7a2f78d8cc531d58db6ac167
jktjkt pushed a commit that referenced this issue Jun 5, 2024
There is a possible data race in sysrepo when calling actions. First,
we check if the action node is present and then we executed the action.
There is a possibility that between those two sysrepo calls the action
node can be created or deleted.
Fortunately, sysrepo raises and error when the action node is not
present when performing the rpc call. Unfortunately, it is the same
error as if the input data tree for the rpc was invalid.

I was thinking about removing the check for the action node existence
and rely solely on the sysrepo's exception. This would indeed remove
the data race but the disadvantage is, that the error message is now
very broad (in case of actions) because we can't differentiate between
invalid input data and missing action node.

I don't think this would bring a better behaviour from rousette.
There are two possible scenarios with the current behaviour:

1) The data node gets created between the check and the rpc call.
From the client's point of view the node does not exist in the moment
the request is created and I *think* it is ok to reject it.

2) The data node gets deleted between the check and the rpc call.
From the client's point of view the node was existing and it is expected
to return something. However, sysrepo (and rousette) reports the
validation failure error which is (at least) confusing. Therefore I have
edited the error message to report this possibility as well.

Bug: #2
Change-Id: Iae1a70160a8c5d32ed041ef90c4b67e18cd964c1
jktjkt pushed a commit that referenced this issue Jun 7, 2024
PUT returns HTTP status codes, one if the node was created and a
different one when node was modified. The code implements this using a
query (that is executed before the actual edit) to check if the node was
there before the edit.

This is potentially racy, because the node can be created between the
query and the actual edit.

This patch adds a sysrepo lock so this does not happen. There is one
problem, though. The candidate datastore reset itself to mirror running
when sr_unlock is called [1].
Therefore we do not lock the candidate DS.

[1] https://github.com/sysrepo/sysrepo/blob/4f40f5406f24e5a565d0ee554389fff2aad4bd1b/doc/main.dox#L94

Bug: #2
Change-Id: Ie211018e921b5b2d7a2f78d8cc531d58db6ac167
jktjkt pushed a commit that referenced this issue Jun 7, 2024
There is a possible data race in sysrepo when calling actions. First,
we check if the action node is present and then we executed the action.
There is a possibility that between those two sysrepo calls the action
node can be created or deleted.
Fortunately, sysrepo raises and error when the action node is not
present when performing the rpc call. Unfortunately, it is the same
error as if the input data tree for the rpc was invalid.

I was thinking about removing the check for the action node existence
and rely solely on the sysrepo's exception. This would indeed remove
the data race but the disadvantage is, that the error message is now
very broad (in case of actions) because we can't differentiate between
invalid input data and missing action node.

I don't think this would bring a better behaviour from rousette.
There are two possible scenarios with the current behaviour:

1) The data node gets created between the check and the rpc call.
From the client's point of view the node does not exist in the moment
the request is created and I *think* it is ok to reject it.

2) The data node gets deleted between the check and the rpc call.
From the client's point of view the node was existing and it is expected
to return something. However, sysrepo (and rousette) reports the
validation failure error which is (at least) confusing. Therefore I have
edited the error message to report this possibility as well.

Bug: #2
Change-Id: Iae1a70160a8c5d32ed041ef90c4b67e18cd964c1
jktjkt pushed a commit that referenced this issue Jun 7, 2024
PUT returns HTTP status codes, one if the node was created and a
different one when node was modified. The code implements this using a
query (that is executed before the actual edit) to check if the node was
there before the edit.

This is potentially racy, because the node can be created between the
query and the actual edit.

This patch adds a sysrepo lock so this does not happen. There is one
problem, though. The candidate datastore reset itself to mirror running
when sr_unlock is called [1].
Therefore we do not lock the candidate DS.

[1] https://github.com/sysrepo/sysrepo/blob/4f40f5406f24e5a565d0ee554389fff2aad4bd1b/doc/main.dox#L94

Bug: #2
Change-Id: Ie211018e921b5b2d7a2f78d8cc531d58db6ac167
jktjkt pushed a commit that referenced this issue Jun 7, 2024
There is a possible data race in sysrepo when calling actions. First,
we check if the action node is present and then we executed the action.
There is a possibility that between those two sysrepo calls the action
node can be created or deleted.
Fortunately, sysrepo raises and error when the action node is not
present when performing the rpc call. Unfortunately, it is the same
error as if the input data tree for the rpc was invalid.

I was thinking about removing the check for the action node existence
and rely solely on the sysrepo's exception. This would indeed remove
the data race but the disadvantage is, that the error message is now
very broad (in case of actions) because we can't differentiate between
invalid input data and missing action node.

I don't think this would bring a better behaviour from rousette.
There are two possible scenarios with the current behaviour:

1) The data node gets created between the check and the rpc call.
From the client's point of view the node does not exist in the moment
the request is created and I *think* it is ok to reject it.

2) The data node gets deleted between the check and the rpc call.
From the client's point of view the node was existing and it is expected
to return something. However, sysrepo (and rousette) reports the
validation failure error which is (at least) confusing. Therefore I have
edited the error message to report this possibility as well.

Bug: #2
Change-Id: Iae1a70160a8c5d32ed041ef90c4b67e18cd964c1
jktjkt added a commit that referenced this issue Sep 6, 2024
Since commit ca1ad2c, the HTTP client
which is used in the test for one-shot HTTP queries contained some code
which deferred HTTP error handling. The reason why this code was needed
is that when the request failed on the HTTP level, and if an exception
is thrown (that's what the `client.on_error(...)` call is doing), then
the usual rules of stack unwinding took effect, which meant that the
entire `client` variable went out of scope. Its destructor would then
try to call the request's `on_close(...)`, which was trying to shut down
the HTTP client, but that one was already being destroyed:

  ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f03e8cc60d0 at pc 0x565524c6a158 bp 0x7ffe0c7ab2d0 sp 0x7ffe0c7ab2c8
  READ of size 8 at 0x7f03e8cc60d0 thread T0
    #0 0x565524c6a157 in std::__shared_ptr<nghttp2::asio_http2::client::session, (__gnu_cxx::_Lock_policy)2>::get() const /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/shared_ptr_base.h:1666:16
    #1 0x565524c6a157 in std::__shared_ptr_access<nghttp2::asio_http2::client::session, (__gnu_cxx::_Lock_policy)2, false, false>::_M_get() const /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/shared_ptr_base.h:1363:66
    #2 0x565524c6a157 in std::__shared_ptr_access<nghttp2::asio_http2::client::session, (__gnu_cxx::_Lock_policy)2, false, false>::operator->() const /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/shared_ptr_base.h:1357:9
    #3 0x565524c6a157 in auto auto Response clientRequest<char const*, char const*>(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> const&, boost::posix_time::time_duration)::'lambda'(char const*)::operator()<boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>>(char const*) const::'lambda'(char const*)::operator()<unsigned int>(char const*) const CzechLight/rousette/tests/aux-utils.h:155:17
    #4 0x565524c6a157 in char const* std::__invoke_impl<void, auto Response clientRequest<char const*, char const*>(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> const&, boost::posix_time::time_duration)::'lambda'(char const*)::operator()<boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>>(char const*) const::'lambda'(char const*)&, unsigned int>(std::__invoke_other, char const*&&, unsigned int&&) /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/invoke.h:61:14
    #5 0x565524c6a157 in std::enable_if<is_invocable_r_v<char const*, char const*, unsigned int>, char const*>::type std::__invoke_r<void, auto Response clientRequest<char const*, char const*>(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> const&, boost::posix_time::time_duration)::'lambda'(char const*)::operator()<boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>>(char const*) const::'lambda'(char const*)&, unsigned int>(char const*&&, unsigned int&&) /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/invoke.h:111:2
    #6 0x565524c6a157 in std::_Function_handler<void (unsigned int), auto Response clientRequest<char const*, char const*>(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> const&, boost::posix_time::time_duration)::'lambda'(char const*)::operator()<boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>>(char const*) const::'lambda'(char const*)>::_M_invoke(std::_Any_data const&, unsigned int&&) /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:290:9
    #7 0x7f03ecc55683 in std::function<void (unsigned int)>::operator()(unsigned int) const /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:591:9
    #8 0x7f03ecc55683 in nghttp2::asio_http2::client::request_impl::call_on_close(unsigned int) github/nghttp2/nghttp2-asio/lib/asio_client_request_impl.cc:67:5
    #9 0x7f03ecbf6daa in nghttp2::asio_http2::client::session_impl::~session_impl() github/nghttp2/nghttp2-asio/lib/asio_client_session_impl.cc:63:9
    #10 0x565524c40cd8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release_last_use() /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/shared_ptr_base.h:175:2
    #11 0x7f03ecc0854e in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/shared_ptr_base.h:1071:11
    #12 0x7f03ecc0854e in std::__shared_ptr<nghttp2::asio_http2::client::session_impl, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/shared_ptr_base.h:1524:31
    #13 0x7f03ecc0854e in nghttp2::asio_http2::client::session_impl::do_read()::$_0::~$_0() github/nghttp2/nghttp2-asio/lib/asio_client_session_impl.cc:629:15
    #14 0x7f03ecc0854e in std::_Function_base::_Base_manager<nghttp2::asio_http2::client::session_impl::do_read()::$_0>::_M_destroy(std::_Any_data&, std::integral_constant<bool, false>) /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:175:4
    #15 0x7f03ecc0854e in std::_Function_base::_Base_manager<nghttp2::asio_http2::client::session_impl::do_read()::$_0>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:203:8
    #16 0x7f03ecc0854e in std::_Function_handler<void (boost::system::error_code const&, unsigned long), nghttp2::asio_http2::client::session_impl::do_read()::$_0>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:282:6
    #17 0x7f03ecc1d18c in std::_Function_base::~_Function_base() /nix/store/j00nb8s5mwaxgi77h21i1ycb91yxxqck-gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:244:2
    #18 0x7f03ecc1d18c in boost::asio::detail::binder2<std::function<void (boost::system::error_code const&, unsigned long)>, boost::system::error_code, unsigned long>::~binder2() /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/nghttp2-asio/include/boost/asio/detail/bind_handler.hpp:252:7
    #19 0x7f03ecc1d18c in boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffers_1, std::function<void (boost::system::error_code const&, unsigned long)>, boost::asio::any_io_executor>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/nghttp2-asio/include/boost/asio/detail/reactive_socket_recv_op.hpp:155:3
    #20 0x565524c5b55e in boost::asio::detail::scheduler_operation::destroy() /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/rousette/include/boost/asio/detail/scheduler_operation.hpp:45:5
    #21 0x565524c5b55e in void boost::asio::detail::op_queue_access::destroy<boost::asio::detail::scheduler_operation>(boost::asio::detail::scheduler_operation*) /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/rousette/include/boost/asio/detail/op_queue.hpp:47:8
    #22 0x565524c5b55e in boost::asio::detail::op_queue<boost::asio::detail::scheduler_operation>::~op_queue() /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/rousette/include/boost/asio/detail/op_queue.hpp:81:7
    #23 0x565524c5b55e in boost::asio::detail::scheduler::abandon_operations(boost::asio::detail::op_queue<boost::asio::detail::scheduler_operation>&) /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/rousette/include/boost/asio/detail/impl/scheduler.ipp:447:1
    #24 0x565524c55f1f in boost::asio::detail::epoll_reactor::shutdown() /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/rousette/include/boost/asio/detail/impl/epoll_reactor.ipp:92:14
    #25 0x565524c63f95 in boost::asio::detail::service_registry::shutdown_services() /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/rousette/include/boost/asio/detail/impl/service_registry.ipp:44:14
    #26 0x565524c4a441 in boost::asio::execution_context::shutdown() /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/rousette/include/boost/asio/impl/execution_context.ipp:41:22
    #27 0x565524c4a441 in boost::asio::io_context::~io_context() /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/rousette/include/boost/asio/impl/io_context.ipp:58:3
    #28 0x565524c426c1 in Response clientRequest<char const*, char const*>(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>> const&, boost::posix_time::time_duration) CzechLight/rousette/tests/aux-utils.h:165:1
    #29 0x565524c34d88 in DOCTEST_ANON_FUNC_4() CzechLight/rousette/tests/restconf-nacm.cpp:254:13
    #30 0x565524d65c92 in doctest::Context::run() /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/target/include/doctest/doctest.h:7007:21
    #31 0x565524d6ae96 in main /home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/target/include/doctest/doctest.h:7085:71
    #32 0x7f03ebd5d0cd in __libc_start_call_main (/nix/store/7jiqcrg061xi5clniy7z5pvkc4jiaqav-glibc-2.38-27/lib/libc.so.6+0x280cd) (BuildId: fd8eb9ded07dda31c46fd63a5351b53bac6a4f1d)
    #33 0x7f03ebd5d188 in __libc_start_main@GLIBC_2.2.5 (/nix/store/7jiqcrg061xi5clniy7z5pvkc4jiaqav-glibc-2.38-27/lib/libc.so.6+0x28188) (BuildId: fd8eb9ded07dda31c46fd63a5351b53bac6a4f1d)
    #34 0x565524aeed04 in _start (/home/jkt/work/prog/_build/czechlight-clang17-asan-ubsan/rousette/test-restconf-nacm+0x111d04)

I suspected that this is a bug in nghttp2-asio, but this turned out to
be just a plain bug in interaction between C++'s memory management and
ASIO, and the cure is to use weak_ptr which "sees" that the original
object is already gone.

Change-Id: I72ac8c5618ac339fcdd800398713667b20c17f4c
@jktjkt jktjkt closed this as completed Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants