Missing lastSuccess/lastFailure in GET /v2/subscriptions in some cases #2974

fgalan · 2017-09-14T12:30:01Z

We have tests that in some cases (e.g. 1960_failure_stats_in_db_and_notifications/failure_stats_in_db_and_notifications_stop_start_accumulator.test) the lastSucess/lastFailure field is missing in the GET /v2/subscription operation, although lastNotification and timesSent seem to be ok.

The problem is a bit annoying as it doesn't happend in all systems (e.g. we have a jenkins running regression that is showing the problem... while another jenkins with the same setup is not showing it!). It could be related with cache (in some systems we have observed the problem arises when cache is enabled and doesn't happen with noCache enabled).

Research in progress...

fgalan · 2017-09-14T12:30:06Z

In 1960_failure_stats_in_db_and_notifications/failure_stats_in_db_and_notifications_stop_start_accumulator.test it seems it happens only in the case of initial notification (which we know is special in the sense is synchronous to the subscription operation). In this test step 1 creates the entity, step 2 creates the subscription. If we reverse these steps, the problem doesn't happen.

fgalan · 2017-09-14T13:37:04Z

In one of the jenkins environment I'm testing this I have found that every time a test fails, the following appears in the logs:

time=2017-09-14T13:25:46.279Z | lvl=WARN | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=AlarmManager.cpp[405]:badInput | msg=Raising alarm BadInput : intent to update error status of non-existing subscription

That points out to the subCacheItemNotificationErrorStatus(), which is invoked by startSenderThread() once the notification is sent. Let's have a look to this function:

void subCacheItemNotificationErrorStatus(const std::string& tenant, const std::string& subscriptionId, int errors)
{
  if (noCache)
  {
    <here is the logic that updates directly in the database, so it makes sense that the problem doesn't happens with noCache is enabled>
    return;
  }
  ...
  cacheSemTake(__FUNCTION__, "Looking up an item for lastSuccess/Failure");

  CachedSubscription* subP = subCacheItemLookup(tenant.c_str(), subscriptionId.c_str());

  if (subP == NULL)
  {
    cacheSemGive(__FUNCTION__, "Looking up an item for lastSuccess/Failure");
    const char* errorString = "intent to update error status of non-existing subscription";

    alarmMgr.badInput(clientIp, errorString);
    return;
  }
  ...
}

Note that the notification flow is independent of the create subscription flow (in other words, notifications are asyncrhonous to the subscription creation) as they happen in separate threads. The creation flow is the one that inserts the subscription in the cache. But it may happen that the notifications flow executes faster than the creation flow, thus in that case the subCacheItemLookup() doesn't find the entry in which annotate the fail/sucess status and that is the cause of the problem.

Note I say "it may happen". That would explain the "randomness" observed in this problem, happening in some environments but not in others.

fgalan · 2017-09-14T13:48:04Z

I will try to refactor so csub is inserted in the cache before spawn the notification thread. If the refactor gets to complex, I'll .DISABLE the affected .test in a quick PR in order not blocking CI.

fgalan · 2017-09-14T14:48:19Z

Fixed in PR #2975

fgalan added this to the 1.9.0 milestone Sep 14, 2017

fgalan changed the title ~~Missing lastSuccess/lastFailure in GET /v2/subscripiotions in some cases~~ Missing lastSuccess/lastFailure in GET /v2/subscriptions in some cases Sep 14, 2017

fgalan added the bug label Sep 14, 2017

fgalan mentioned this issue Sep 14, 2017

FIX missing lastsuccess and lastfailure #2975

Merged

fgalan self-assigned this Sep 14, 2017

manucarrace closed this as completed in #2975 Sep 14, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing lastSuccess/lastFailure in GET /v2/subscriptions in some cases #2974

Missing lastSuccess/lastFailure in GET /v2/subscriptions in some cases #2974

fgalan commented Sep 14, 2017

fgalan commented Sep 14, 2017

fgalan commented Sep 14, 2017 •

edited

Loading

fgalan commented Sep 14, 2017

fgalan commented Sep 14, 2017

Missing lastSuccess/lastFailure in GET /v2/subscriptions in some cases #2974

Missing lastSuccess/lastFailure in GET /v2/subscriptions in some cases #2974

Comments

fgalan commented Sep 14, 2017

fgalan commented Sep 14, 2017

fgalan commented Sep 14, 2017 • edited Loading

fgalan commented Sep 14, 2017

fgalan commented Sep 14, 2017

fgalan commented Sep 14, 2017 •

edited

Loading