
[CRASH] OpenSIPS segfaults during startup #3359

Closed

NormB opened this issue Apr 10, 2024 · 10 comments

@NormB
Member

NormB commented Apr 10, 2024

OpenSIPS version you are running

Testing with the master branch

Crash Core Dump

I've tried enabling all the recommended settings in the docker container to generate a core dump, but OpenSIPS still isn't able to generate one. Maybe something is missing in docker-compose.yml or entrypoint.sh.

Running OpenSIPS as user/group = root/root did allow the core dump to be created.

docker-compose.yml

  opensips:
    privileged: true
    ulimits:
      core: 99999999
    volumes:
      - /cores:/cores

entrypoint.sh

COREDIR=/cores
echo "$COREDIR/core.%e.sig%s.%p" > /proc/sys/kernel/core_pattern
echo 1 > /proc/sys/kernel/core_uses_pid
echo 1 > /proc/sys/fs/suid_dumpable
...
SHM_MEMORY=1024
PKG_MEMORY=2048
USER=opensips
GROUP=opensips
PID_FILE=/usr/local/etc/opensips/opensips.pid
WORK_DIR=/cores

if [ "$1" = 'opensips' ]; then
  shift
  exec opensips -F -u $USER -g $GROUP -P $PID_FILE -w $WORK_DIR -m $SHM_MEMORY -M $PKG_MEMORY "$@"
fi
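
A note on the root/root observation above: on Linux, a process that switches its UID/GID after startup has its per-process "dumpable" flag cleared, which suppresses core files even when ulimits and core_pattern are correct; fs.suid_dumpable is one knob for this, and a daemon can also re-enable dumping explicitly. A generic C sketch of the latter (illustrative only, not necessarily what OpenSIPS itself does):

#include <sys/prctl.h>
#include <stdio.h>

/* generic sketch (not OpenSIPS code): call this after setuid()/setgid(),
 * since changing credentials clears the kernel's dumpable flag and
 * silently suppresses core files */
static void reenable_core_dumps(void)
{
        if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) < 0)
                perror("prctl(PR_SET_DUMPABLE)");
}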

Describe the traffic that generated the bug

It looks like the segfault happens while processing a SUBSCRIBE request.

To Reproduce

The segfault does not happen when the commented-out lines in the presence module configuration below are uncommented.

loadmodule "presence.so"
modparam("presence", "db_url", "{DB_TYPE}://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}")
#modparam("presence", "fallback2db", 1)
#modparam("presence", "cluster_id", {PRESENCE_CLUSTER_ID})
modparam("presence", "mix_dialog_presence", 1)
#modparam("presence", "cluster_federation_mode", "full-sharing")
modparam("presence", "contact_user", "presence")
modparam("presence", "end_sub_on_timeout", 1)
modparam("presence", "notify_offline_body", 1)
modparam("presence", "clean_period", 30)
modparam("presence", "db_update_period", 5)

Relevant System Logs

Apr 10 08:28:03 [21] DBG:core:comp_scriptvar: str 20: message-summary
Apr 10 08:28:03 [21] DBG:sipmsgops:has_totag: totag found
Apr 10 08:28:03 [21] DBG:core:parse_headers_aux: flags=ffffffffffffffff
Apr 10 08:28:03 [21] DBG:presence:search_event: start event= [message-summary]
Apr 10 08:28:03 [21] DBG:presence:extract_sdialog_info: 'Expires' header found, value= 30
Apr 10 08:28:03 [21] DBG:presence:extract_sdialog_info: subs->contact= sip:[email protected]:58690 - len = 29
Apr 10 08:28:03 [21] DBG:core:parse_headers_aux: flags=ffffffffffffffff
Apr 10 08:28:03 [21] CRITICAL:core:sig_usr: segfault in process pid: 21, id: 8
Apr 10 08:28:03 [21] DBG:core:restore_segv_handler: restoring SIGSEGV handler...
Apr 10 08:28:03 [21] DBG:core:restore_segv_handler: successfully restored system SIGSEGV handler
Apr 10 08:28:03 [1] DBG:core:handle_sigs: OpenSIPS exit status = 11
Apr 10 08:28:03 [1] INFO:core:handle_sigs: child process 21 exited by a signal 11
Apr 10 08:28:03 [1] INFO:core:handle_sigs: core was not generated
Apr 10 08:28:03 [1] INFO:core:handle_sigs: terminating due to SIGCHLD

backtrace

After changing the calls to handle_subscribe(), the segfaults stopped. The crashing call was:

handle_subscribe(1, "{VIP1_NODE_ID}");
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `opensips -F -u root -g root -P /usr/local/etc/opensips/opensips.pid -w /cores -'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f68f7595603 in handle_subscribe (msg=0x7f693bc7d0d0, force_active_param=0x7ffdc879a6a8, tag=0x7ffdc879a6b0) at subscribe.c:606
#2  0x000055e4d7c8ac1f in do_action (a=0x7f693880e230, msg=0x7f693bc7d0d0) at action.c:1048
#3  0x000055e4d7c86b10 in run_action_list (a=0x7f693880e230, msg=0x7f693bc7d0d0) at action.c:190
#4  0x000055e4d7c89cb9 in do_action (a=0x7f693880e4e0, msg=0x7f693bc7d0d0) at action.c:830
#5  0x000055e4d7c86b10 in run_action_list (a=0x7f693880e4e0, msg=0x7f693bc7d0d0) at action.c:190
#6  0x000055e4d7c89cb9 in do_action (a=0x7f6938810310, msg=0x7f693bc7d0d0) at action.c:830
#7  0x000055e4d7c86b10 in run_action_list (a=0x7f693880b9f8, msg=0x7f693bc7d0d0) at action.c:190
#8  0x000055e4d7c89cb9 in do_action (a=0x7f69388107f8, msg=0x7f693bc7d0d0) at action.c:830
#9  0x000055e4d7c86b10 in run_action_list (a=0x7f69388107f8, msg=0x7f693bc7d0d0) at action.c:190
#10 0x000055e4d7c8691a in run_actions (a=0x7f69388107f8, msg=0x7f693bc7d0d0) at action.c:136
#11 0x000055e4d7c89abb in do_action (a=0x7f69387b41a8, msg=0x7f693bc7d0d0) at action.c:803
#12 0x000055e4d7c86b10 in run_action_list (a=0x7f69387b3138, msg=0x7f693bc7d0d0) at action.c:190
#13 0x000055e4d7c89cb9 in do_action (a=0x7f69387b9a98, msg=0x7f693bc7d0d0) at action.c:830
#14 0x000055e4d7c86b10 in run_action_list (a=0x7f69387abb08, msg=0x7f693bc7d0d0) at action.c:190
#15 0x000055e4d7c8691a in run_actions (a=0x7f69387abb08, msg=0x7f693bc7d0d0) at action.c:136
#16 0x000055e4d7c86da2 in run_top_route (sr=..., msg=0x7f693bc7d0d0) at action.c:250
#17 0x000055e4d7d0e85f in receive_msg (buf=0x55e4d80b4b40 <buf> "SUBSCRIBE sip:[email protected] SIP/2.0\r\nVia: SIP/2.0/UDP 69.172.141.138:58690;branch=z9hG4bK712790379\r\nFrom: \"1000\" <sip:[email protected]:5060>;tag=4291382927\r\nTo: <sip:[email protected]:5060>;tag"..., len=498, rcv_info=0x7ffdc879b770, existing_context=0x0, msg_flags=0) at receive.c:225
#18 0x000055e4d7e610a4 in udp_read_req (si=0x7f69387904e0, bytes_read=0x7ffdc879b868) at net/proto_udp/proto_udp.c:186
#19 0x000055e4d7e51456 in handle_io (fm=0x7f693887c0a0, idx=0, event_type=1) at net/net_udp.c:296
#20 0x000055e4d7e4faaa in io_wait_loop_epoll (h=0x55e4d7f85f80 <_worker_io>, t=1, repeat=0) at net/../io_wait_loop.h:305
#21 0x000055e4d7e52961 in udp_start_processes (chd_rank=0x55e4d7f85228 <chd_rank>, startup_done=0x7f68f8caa018) at net/net_udp.c:530
#22 0x000055e4d7cc379d in main_loop () at main.c:283
#23 0x000055e4d7cc6d81 in main (argc=14, argv=0x7ffdc879bcb8) at main.c:887
(gdb)

OS/environment information

  • Operating System: debian:bookworm
  • OpenSIPS installation: master branch, running in Docker
  • other relevant information:


@bogdan-iancu
Member

@NormB, if you go to frame 1 and print the c_api structure:

f 1
p c_api
p pres_cluster_id

I suspect the API structure (with its function pointers) provided by the clustering module was not validly populated... as if the clusterer module was not loaded by the presence module.

@bogdan-iancu bogdan-iancu self-assigned this Apr 25, 2024
@bogdan-iancu bogdan-iancu added this to the 3.5-dev milestone Apr 25, 2024
@NormB
Member Author

NormB commented Apr 25, 2024

The following docker-compose.yml updates (an unlimited core-file-size ulimit) seem to help get core dumps created, though still only when starting as user/group root, not user/group opensips :(

opensips:
    hostname: opensips
    container_name: opensips
    network_mode: "host"
    privileged: true
    ulimits:
      core:
        soft: -1
        hard: -1

@NormB
Member Author

NormB commented Apr 25, 2024

bt full

(gdb) bt full
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x00007fc00083e243 in handle_subscribe (msg=0x7fc044fb8130, force_active_param=0x7ffdae6a2e08, tag=0x7ffdae6a2e10) at subscribe.c:606
        init_req = 0
        subs = {pres_uri = {s = 0x0, len = 0}, to_user = {
            s = 0x562655bf1c50 <buf+176> "[email protected]:5060>;tag=02b6.c866be3961af44287e2c9f7e83cbfd19\r\nCall-ID: [email protected]\r\nCSeq: 5 SUBSCRIBE\r\nContact: <sip:[email protected]:65477>\r\nAccept: application/dialog-info+xml\r\nMa"..., len = 3}, to_domain = {
            s = 0x562655bf1c54 <buf+180> "planet.net:5060>;tag=02b6.c866be3961af44287e2c9f7e83cbfd19\r\nCall-ID: [email protected]\r\nCSeq: 5 SUBSCRIBE\r\nContact: <sip:[email protected]:65477>\r\nAccept: application/dialog-info+xml\r\nMax-Fo"..., len = 10}, from_user = {
            s = 0x562655bf1c21 <buf+129> "[email protected]:5060>;tag=1112801586\r\nTo: <sip:[email protected]:5060>;tag=02b6.c866be3961af44287e2c9f7e83cbfd19\r\nCall-ID: [email protected]\r\nCSeq: 5 SUBSCRIBE\r\nContact: <sip:[email protected]:"..., len = 4}, from_domain = {
            s = 0x562655bf1c26 <buf+134> "planet.net:5060>;tag=1112801586\r\nTo: <sip:[email protected]:5060>;tag=02b6.c866be3961af44287e2c9f7e83cbfd19\r\nCall-ID: [email protected]\r\nCSeq: 5 SUBSCRIBE\r\nContact: <sip:[email protected]:65477"..., len = 10}, subs_body = {s = 0x562655bf1d82 <buf+482> "",
            len = 0}, event = 0x7fc001ef8238, event_id = {s = 0x0, len = 0}, to_tag = {
            s = 0x562655bf1c69 <buf+201> "02b6.c866be3961af44287e2c9f7e83cbfd19\r\nCall-ID: [email protected]\r\nCSeq: 5 SUBSCRIBE\r\nContact: <sip:[email protected]:65477>\r\nAccept: application/dialog-info+xml\r\nMax-Forwards: 69\r\nUser-Agen"..., len = 37}, from_tag = {
            s = 0x562655bf1c3b <buf+155> "1112801586\r\nTo: <sip:[email protected]:5060>;tag=02b6.c866be3961af44287e2c9f7e83cbfd19\r\nCall-ID: [email protected]\r\nCSeq: 5 SUBSCRIBE\r\nContact: <sip:[email protected]:65477>\r\nAccept: applicatio"..., len = 10}, callid = {
            s = 0x562655bf1c99 <buf+249> "[email protected]\r\nCSeq: 5 SUBSCRIBE\r\nContact: <sip:[email protected]:65477>\r\nAccept: application/dialog-info+xml\r\nMax-Forwards: 69\r\nUser-Agent: Yealink SIP-T54W 96.86.0.70\r\nExpires: 30\r\nEve"..., len = 25}, sockinfo = 0x7fc041abb8b0,
          remote_cseq = 5, local_cseq = 0, contact = {
            s = 0x562655bf1cd1 <buf+305> "sip:[email protected]:65477>\r\nAccept: application/dialog-info+xml\r\nMax-Forwards: 69\r\nUser-Agent: Yealink SIP-T54W 96.86.0.70\r\nExpires: 30\r\nEvent: dialog\r\nContent-Length: 0\r\n\r\n", len = 29}, local_contact = {
            s = 0x7fc000869060 <buf> "sip:[email protected]", len = 27}, record_route = {s = 0x0, len = 0}, expires = 30, status = 0, reason = {
            s = 0x0, len = 0}, version = 0, db_flag = 0, auth_rules_doc = 0x0, internal_update_flag = 0, sh_tag = {s = 0x7fc041b40f88 "1686046221",
            len = 10}, next = 0x0}
        event = 0x7fc001ef8238
        parsed_event = 0x7fc044fba1f0
        ev_param = 0x0
        found = 0
        reason = {s = 0x0, len = 0}
        reply_code = 500
        reply_str = {s = 0x7fc000857bd0 "Server Internal Error", len = 21}
        ret = 0
        __FUNCTION__ = "handle_subscribe"
#2  0x00005626557c7c1f in do_action (a=0x7fc041b41068, msg=0x7fc044fb8130) at action.c:1048
        ret = 0
        v = 22054
        i = 0
        len = 0
        cmatch = 0
        aitem = 0x7fc044fb8130
        adefault = 0xffffffffffffffff
        spec = 0x1
        val = {rs = {s = 0x0, len = 0}, ri = 0, flags = 0}
        start = {tv_sec = 0, tv_usec = 0}
        end_time = 0
        cmd = 0x7fc000867390 <cmds+240>
        acmd = 0x7ffdae6a2fc0
        cmdp = {0x7ffdae6a2e08, 0x7ffdae6a2e10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
        tmp_vals = {{rs = {s = 0x7fc041b40af8 "dialog", len = 6}, ri = 1438588261, flags = 22054}, {rs = {
              s = 0x6 <error: Cannot access memory at address 0x6>, len = 26365904}, ri = 1, flags = 0}, {rs = {s = 0x7fc041b40f88 "1686046221", len = 10},
            ri = 1102623920, flags = 32704}, {rs = {s = 0x7fc041ade8c8 "\004", len = 1}, ri = 1436295929, flags = 22054}, {rs = {
              s = 0x800 <error: Cannot access memory at address 0x800>, len = -1043734496}, ri = -1368772816, flags = 32765}, {rs = {
              s = 0x562655859e73 <eval_elem+4861> "\211E\334\351\223", len = 1435455520}, ri = -1368772560, flags = 32765}, {rs = {
              s = 0x7fc044fb8130 "\001", len = 0}, ri = 0, flags = 0}, {rs = {s = 0x0, len = 0}, ri = 0, flags = 0}}
        route_p = 0x7fc041ade8c8
        sval = {s = 0x400000000 <error: Cannot access memory at address 0x400000000>, len = 1}
        __FUNCTION__ = "do_action"
#3  0x00005626557c3b10 in run_action_list (a=0x7fc041b40d50, msg=0x7fc044fb8130) at action.c:190
        ret = 1
        t = 0x7fc041b41068
#4  0x00005626557c6cb9 in do_action (a=0x7fc041b41318, msg=0x7fc044fb8130) at action.c:830
        ret = 1
        v = 1
        i = 1101541392
        len = 32704
        cmatch = 1157342120
--Type <RET> for more, q to quit, c to continu

@NormB
Member Author

NormB commented Apr 25, 2024

(gdb) f 1
#1  0x00007fc00083e243 in handle_subscribe (msg=0x7fc044fb8130, force_active_param=0x7ffdae6a2e08, tag=0x7ffdae6a2e10)
    at subscribe.c:606
606	subscribe.c: No such file or directory.
(gdb) p c_api
$1 = {get_nodes = 0x7fc000bb94d2 <get_clusterer_nodes>, free_nodes = 0x7fc000bb9496 <free_clusterer_nodes>,
  set_state = 0x7fc000ba1eb7 <cl_set_state>, check_addr = 0x7fc000ba4a52 <clusterer_check_addr>,
  get_my_id = 0x7fc000bb9c11 <cl_get_my_id>, get_my_sip_addr = 0x7fc000bb9c20 <cl_get_my_sip_addr>,
  get_my_index = 0x7fc000bb9fe7 <cl_get_my_index>, send_to = 0x7fc000ba3e2f <cl_send_to>, send_all = 0x7fc000ba3f15 <cl_send_all>,
  send_all_having = 0x7fc000ba3ff6 <cl_send_all_having>, get_next_hop = 0x7fc000bb97d7 <api_get_next_hop>,
  free_next_hop = 0x7fc000bb9bec <api_free_next_hop>, register_capability = 0x7fc000ba9ed5 <cl_register_cap>,
  request_sync = 0x7fc000bc0a28 <cl_request_sync>, sync_chunk_start = 0x7fc000bc121b <cl_sync_chunk_start>,
  sync_chunk_iter = 0x7fc000bc168b <cl_sync_chunk_iter>, shtag_get = 0x7fc000bbcc0a <shtag_get>,
  shtag_activate = 0x7fc000bbdd47 <shtag_activate_api>, shtag_get_all_active = 0x7fc000bbe420 <shtag_get_all_active>,
  shtag_register_callback = 0x7fc000bbafb2 <shtag_register_callback>, shtag_get_sync_status = 0x7fc000bbd1ee <shtag_get_sync_status>,
  shtag_set_sync_status = 0x7fc000bbd58c <shtag_set_sync_status>, shtag_sync_all_backup = 0x7fc000bbd9f7 <shtag_sync_all_backup>}
(gdb) p pres_cluster_id
$2 = 0

@NormB
Member Author

NormB commented Apr 25, 2024

Tested with cluster_id set, and with fallback2db and cluster_federation_mode still commented out. Same segfault, with pres_cluster_id = 0.

#modparam("presence", "fallback2db", 1)
modparam("presence", "cluster_id", 1)
modparam("presence", "mix_dialog_presence", 1)
#modparam("presence", "cluster_federation_mode", "full-sharing")
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) f 1
#1  0x00007fa082d97243 in handle_subscribe (msg=0x7fa0c75114b0, force_active_param=0x7ffdc71bcef8, tag=0x7ffdc71bcf00) at subscribe.c:606
606	subscribe.c: No such file or directory.
(gdb) p c_api
$1 = {get_nodes = 0x7fa0831124d2 <get_clusterer_nodes>, free_nodes = 0x7fa083112496 <free_clusterer_nodes>, set_state = 0x7fa0830faeb7 <cl_set_state>,
  check_addr = 0x7fa0830fda52 <clusterer_check_addr>, get_my_id = 0x7fa083112c11 <cl_get_my_id>, get_my_sip_addr = 0x7fa083112c20 <cl_get_my_sip_addr>,
  get_my_index = 0x7fa083112fe7 <cl_get_my_index>, send_to = 0x7fa0830fce2f <cl_send_to>, send_all = 0x7fa0830fcf15 <cl_send_all>,
  send_all_having = 0x7fa0830fcff6 <cl_send_all_having>, get_next_hop = 0x7fa0831127d7 <api_get_next_hop>,
  free_next_hop = 0x7fa083112bec <api_free_next_hop>, register_capability = 0x7fa083102ed5 <cl_register_cap>,
  request_sync = 0x7fa083119a28 <cl_request_sync>, sync_chunk_start = 0x7fa08311a21b <cl_sync_chunk_start>,
  sync_chunk_iter = 0x7fa08311a68b <cl_sync_chunk_iter>, shtag_get = 0x7fa083115c0a <shtag_get>, shtag_activate = 0x7fa083116d47 <shtag_activate_api>,
  shtag_get_all_active = 0x7fa083117420 <shtag_get_all_active>, shtag_register_callback = 0x7fa083113fb2 <shtag_register_callback>,
  shtag_get_sync_status = 0x7fa0831161ee <shtag_get_sync_status>, shtag_set_sync_status = 0x7fa08311658c <shtag_set_sync_status>,
  shtag_sync_all_backup = 0x7fa0831169f7 <shtag_sync_all_backup>}
(gdb) p pres_cluster_id
$2 = 0
(gdb)

@NormB
Member Author

NormB commented Apr 25, 2024

Uncommenting cluster_federation_mode --> no segfault

@NormB
Member Author

NormB commented Apr 25, 2024

subscribe.c

 603
 604         if (tag) {
 605                 subs.sh_tag = *tag;
 606                 if (c_api.shtag_get( &subs.sh_tag, pres_cluster_id)<0) {
 607                         LM_ERR("failed to lookup the <%.*s> sharing tag\n",
 608                                 subs.sh_tag.len, subs.sh_tag.s);
 609                         goto error;
 610                 }
 611         }
 612

It looks like shtag_get() does not check the cluster_id value before calling shtag_get_unsafe(). Perhaps it should test for cluster_id <= 0 and return a negative value before any locking is performed (see the sketch after the function below).

More importantly, the cluster_id is set as a global parameter, so it's unclear why it's 0 in the core.

modparam("presence", "cluster_id", 1)
int shtag_get(str *tag_name, int cluster_id)
{
        struct sharing_tag *tag;
        int ret;

        lock_start_read(shtags_lock);

        for (tag = *shtags_list;
                tag && (tag->cluster_id!=cluster_id || str_strcmp(&tag->name, tag_name));
                tag = tag->next) ;
        if (!tag) {
                lock_stop_read(shtags_lock);
                lock_start_write(shtags_lock);

                tag = shtag_get_unsafe(tag_name, cluster_id);
                ret = (tag == NULL) ? -1 : tag->state;

                lock_stop_write(shtags_lock);
        } else {
                ret = tag->state;

                lock_stop_read(shtags_lock);
        }

        return ret;
}
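
A minimal sketch of that guard (illustrative only, not an actual patch); with clustering disabled no sharing tag can exist, so the function could fail fast before touching shtags_lock:

int shtag_get(str *tag_name, int cluster_id)
{
        /* sketch: clustering disabled => no sharing tags can exist,
         * so return an error before any locking */
        if (cluster_id <= 0)
                return -1;

        /* ... existing locked lookup, as shown above ... */
}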

@NormB
Member Author

NormB commented Apr 26, 2024

Tested adding a return -1 to shtag_get() if the cluster_id <= 0. The same segfault happened, so perhaps I didn't install the test patch correctly. Otherwise, the root of the problem is somewhere else.

bogdan-iancu added a commit that referenced this issue Apr 26, 2024
Avoid evaluating sharing tags when the clustering support in presence module is disabled.
Related to #3359
@bogdan-iancu
Member

@NormB, if the cluster_federation_mode is not set, the module will auto-disable the clustering support by overriding the cluster_id with 0.
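
For illustration, that auto-disable could look something like the following at module init (a hypothetical sketch; the federation_mode_set flag name is an assumption, not the real code):

/* hypothetical sketch of the auto-disable described above;
 * the actual mod_init logic may differ */
if (pres_cluster_id > 0 && !federation_mode_set) {
        LM_WARN("no cluster_federation_mode configured, "
                "disabling presence clustering\n");
        pres_cluster_id = 0;
}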

Now, your original crash is most probably related to the fact that you have clustering support disabled in the presence module (by not setting the cluster_id) while still passing the "sharing tag" parameter to the handle_subscribe() function. I just pushed a fix for that; please update and try again.
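
The shape of such a fix would presumably be a guard around the sharing-tag handling in handle_subscribe(), consistent with the NULL function pointer seen as frame #0 in the backtraces above. A hypothetical sketch only; the actual commit may differ:

if (tag) {
        /* sketch: refuse the sharing tag when presence clustering is
         * disabled, instead of calling into the clusterer API */
        if (pres_cluster_id <= 0) {
                LM_ERR("sharing tag given, but presence clustering "
                        "is disabled\n");
                goto error;
        }
        subs.sh_tag = *tag;
        if (c_api.shtag_get(&subs.sh_tag, pres_cluster_id) < 0) {
                LM_ERR("failed to lookup the <%.*s> sharing tag\n",
                        subs.sh_tag.len, subs.sh_tag.s);
                goto error;
        }
}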

bogdan-iancu added a commit that referenced this issue Apr 26, 2024
Avoid evaluating sharing tags when the clustering support in presence module is disabled.
Related to #3359

(cherry picked from commit 6af9039)
bogdan-iancu added a commit that referenced this issue Apr 26, 2024
Avoid evaluating sharing tags when the clustering support in presence module is disabled.
Related to #3359

(cherry picked from commit 6af9039)
@NormB
Member Author

NormB commented Apr 26, 2024

Confirmed that the fix works correctly.

Thank you.

@NormB NormB closed this as completed Apr 26, 2024