WL9175 post-push: ASAN memory leak in Query_log_event.

In WL-included builds ASAN run witnessed missed ~Query_log_event invocation. The destruct-or was not called due to the WL's changes in the error propagation that specifically affect LC MTS. The failure is exposed in particular by rpl_trigger as the following stack: #0 0x9ecd98 in __interceptor_malloc (/export/home/pb2/test/sb_2-22611026-1489061390.32/mysql-commercial-8.0.1-dmr-linux-x86_64-asan/bin/mysqld+0x9ecd98) #1 0x2b1a245 in my_raw_malloc(unsigned long, int) obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:209:12 #2 0x2b1a245 in my_malloc obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:72 #3 0x2940590 in Query_log_event::Query_log_event(char const*, unsigned int, binary_log::Format_description_event const*, binary_log::Log_event_type) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:4343:46 #4 0x293d235 in Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*, bool) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:1686:17 #5 0x293b96f in Log_event::read_log_event() #6 0x2a2a1c9 in next_event(Relay_log_info*) Previously before the WL Mts_submode_logical_clock::wait_for_workers_to_finish() had not returned any error even when Coordinator thread is killed. The WL patch needed to refine such behavior, but at doing so it also had to attend log_event.cc::schedule_next_event() to register an error to follow an existing pattern. While my_error() does not take place the killed Coordinator continued scheduling, ineffectively though - no Worker gets engaged (legal case of deferred scheduling), and without noticing its killed status up to a point when it resets the event pointer in apply_event_and_update_pos(): *ptr_ev= NULL; // announcing the event is passed to w-worker The reset was intended for an assigned Worker to perform the event destruction or by Coordinator itself when the event is deferred. As neither is the current case the event gets unattended for its termination. In contrast in the pre-WL sources the killed Coordinator does find a Worker. However such Worker could be already down (errored out and exited), in which case apply_event_and_update_pos() reasonably returns an error and executes delete ev in exec_relay_log_event() error branch. **Fixed** with deploying my_error() call in log_event.cc::schedule_next_event() error branch which fits to the existing pattern. THD::is_error() has been always checked by Coordinator before any attempt to reset *ptr_ev= NULL. In the errored case Coordinator does not reset and destroys the event itself in the exec_relay_log_event() error branch pretty similarly to how the pre-WL sources do. Tested against rpl_trigger and rpl suites to pass. Approved on rb#15667.
percona-ysorokin · Mar 10, 2017 · e39a757 · e39a757
1 parent ba659bf
commit e39a757
Showing 1 changed file with 11 additions and 3 deletions.
diff --git a/sql/log_event.cc b/sql/log_event.cc
@@ -2827,16 +2827,24 @@ static bool schedule_next_event(Log_event* ev, Relay_log_info* rli)
   error= rli->current_mts_submode->schedule_next_event(rli, ev);
   switch (error)
   {
-  case ER_MTS_CANT_PARALLEL:
     char llbuff[22];
+  case ER_MTS_CANT_PARALLEL:
     llstr(rli->get_event_relay_log_pos(), llbuff);
     my_error(ER_MTS_CANT_PARALLEL, MYF(0),
     ev->get_type_str(), rli->get_event_relay_log_name(), llbuff,
              "The master event is logically timestamped incorrectly.");
     return true;
   case ER_MTS_INCONSISTENT_DATA:
-    /* Don't have to do anything. */
-    return true;
+    llstr(rli->get_event_relay_log_pos(), llbuff);
+    {
+      char errfmt[]=
+        "Coordinator experienced an error or was killed while scheduling "
+        "an event at relay-log name %s position %s.";
+      char errbuf[sizeof(errfmt) + FN_REFLEN + sizeof(llbuff)];
+      sprintf(errbuf, errfmt, rli->get_event_relay_log_name(), llbuff);
+      my_error(ER_MTS_INCONSISTENT_DATA, MYF(0), errbuf);
+      return true;
+    }
   default:
     return false;
   }