Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak caused by saving stats counter to persist config #2279

Conversation

lbudai
Copy link
Collaborator

@lbudai lbudai commented Sep 14, 2018

Leak:

  • start syslog-ng with a simple config
@version: 3.17

options {
  time_reopen(5);
  stats-level(3);
};

source s_file {
  file("/tmp/input-logs.txt" log-iw-size(10));
};

destination d_file {
  file("/tmp/output-logs.txt");
};

destination d_net {
 network("localhost" port(5555) 
        );
};

log {
  source (s_file);
  destination (d_net);
};

  • reload syslog-ng
    Each time a memory-usage counter is written into a variable that is dynamically allocated on the heap and not freeing up (on my machine 8 bytes are leaked for each reload).

Sometimes we need to express that something is equal to TRUE (typically
in a unit test). In this cases TRUE is a strict value, 1.

Signed-off-by: Laszlo Budai <[email protected]>
@lbudai lbudai requested a review from furiel September 14, 2018 14:22
@lbudai lbudai changed the title Fix memory leak caused by saving stats counter to persis config Fix memory leak caused by saving stats counter to persist config Sep 14, 2018
@lbudai lbudai force-pushed the fix-memory-leak-caused-by-saving-stats-counter-to-persis-config branch 3 times, most recently from 91b2254 to 531c26e Compare September 14, 2018 15:05
@lbudai
Copy link
Collaborator Author

lbudai commented Sep 14, 2018

@kira-syslogng
Copy link
Contributor

Build FAILURE

1 similar comment
@kira-syslogng
Copy link
Contributor

Build FAILURE

@lbudai
Copy link
Collaborator Author

lbudai commented Sep 15, 2018

hmmm... the reason why Travis/Kira is timeouted is that I have a mutex in a locked state.
The reason of this is that one of the the asserts has failed before the unlock is called... (in c++ this is not an issue as one could use RAII and scope lock). Using cr_expect instead of cr_assert solves this blocking issue.

As set is just an assignment, it breaks SRP.
This is not the only reason of this separation: if we have a setter and an
init method, then it is possible to init the counter only when it is
first registered to stats.

Signed-off-by: Laszlo Budai <[email protected]>
@lbudai lbudai force-pushed the fix-memory-leak-caused-by-saving-stats-counter-to-persis-config branch from 531c26e to 74d74cd Compare September 15, 2018 17:42
@lbudai
Copy link
Collaborator Author

lbudai commented Sep 15, 2018

...and the failing assert was a mistake by myself... now I think it is fixed

@kira-syslogng
Copy link
Contributor

Build SUCCESS

The way how we are saving the values caused a memory leak.
Of course it could be fixed, but there is conceptual issue with this approach:
the stats-counters can be re-used after reload (as they are not removed from
the global stats-counter table).

Signed-off-by: Laszlo Budai <[email protected]>
@lbudai lbudai force-pushed the fix-memory-leak-caused-by-saving-stats-counter-to-persis-config branch from 74d74cd to 10c11d7 Compare September 17, 2018 12:57
@kira-syslogng
Copy link
Contributor

Build SUCCESS

@bazsi
Copy link
Collaborator

bazsi commented Sep 17, 2018

I would depend on this patch in my threaded logthrdestdriver work, as it wasn't easy to resolve the same
in a multi-worker environment.

So this is not just a minor leak, but helps me a big deal. I'll try to review this tonight.

Thanks for fixing this!

@lbudai
Copy link
Collaborator Author

lbudai commented Sep 17, 2018

@bazsi: there are some 'ugly' parts in the PR: I didn't want to lock/unlock the stats twice, so I extended the register_stats methods with assigning the counters to the LogQueue.

@lbudai lbudai added this to the OSE 3.18 milestone Sep 17, 2018
@bazsi
Copy link
Collaborator

bazsi commented Sep 17, 2018

I like the approach in general, I might give some thought to naming. It is not immediately clear, why the existance of a counter should cause something to be initialized or not.

What if we delegated the stats registration to LogQueue and took care of this initialization there?

e.g. instead of log_queue_set_counters(), we would have

log_queue_register_stats() that would get the initialized StatsClusterKey as argument. This way, the
LogQueue could check the existance of the counters internally and do the initialization only the first time. That logic would then not infect 3 call sites: LogThreadedDestDriver, AFSql (which should be the former), and LogWriter.

What do you think?

OTOH, the patch by itself resolves an issue I am facing in the multithreaded driver, so I'd love to see this go in as soon as possible.

@lbudai
Copy link
Collaborator Author

lbudai commented Sep 17, 2018

I’m going to check it on tomorrow and move it to LogQueue.
(when a counter exists, the queue resets the counter... this is what I called init, but maybe reset would be a better name ).

@lbudai
Copy link
Collaborator Author

lbudai commented Sep 18, 2018

@bazsi: in case of dropped counter the ownership is at logwriter/logthrdstdrv/afsql (and I would not touch that part), but the other two counters are owned by the LogQueue, so I'll modify the code.

@lbudai lbudai force-pushed the fix-memory-leak-caused-by-saving-stats-counter-to-persis-config branch from 53e2c00 to 145da81 Compare September 18, 2018 15:10
@lbudai
Copy link
Collaborator Author

lbudai commented Sep 18, 2018

@bazsi: I think (I'm sure...), later the dropped counter can be removed from the LogQueue, so until that I'd keep this log_queue_set_dropped_counter method.
Thanks for suggesting this change, it's better now (IMHO).

note: in afsql module the stats counter registration point changed, but I think it still works.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

Concept: queued_messages and memory_usage counters are used only by the
LogQueue, so they can be owned by the LogQueue instances.
The dropped_messages counter is owned by the LogWriter/LogThrDestDriver,
or AFSqlDestDriver instances, so it won't be registered by this new
method.

Signed-off-by: Laszlo Budai <[email protected]>
@lbudai lbudai force-pushed the fix-memory-leak-caused-by-saving-stats-counter-to-persis-config branch from 145da81 to 6504f33 Compare September 18, 2018 16:23
@kira-syslogng
Copy link
Contributor

Build SUCCESS

@lbudai
Copy link
Collaborator Author

lbudai commented Sep 18, 2018

@kira-syslogng retest this please

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@bazsi bazsi mentioned this pull request Sep 18, 2018
@lbudai
Copy link
Collaborator Author

lbudai commented Sep 19, 2018

@kira-syslogng Do perftest

@gaborznagy gaborznagy requested a review from szemere September 19, 2018 08:42
@kira-syslogng
Copy link
Contributor

Build SUCCESS

@furiel
Copy link
Collaborator

furiel commented Sep 20, 2018

logthrdestdrv: init queue counters only when registered for the first time
and
logqueue: do not initialize counters in set method
patches are reverted in the end, (maybe more?). From these, the first one is especially hard to understand and validate by review. Can you rebuild the patchset without such reverts? That would help the review a lot.

@@ -244,15 +257,15 @@ stats_cluster_is_alive(StatsCluster *self, gint type)
{
g_assert(type < self->counter_group.capacity);

return ((1<<type) & self->live_mask);
return ((1<<type) & self->live_mask) == (1 << type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me: return !!(1<<type) & self->live_mask) would be more readable


stats_register_counter(stats_level, sc_key, SC_TYPE_QUEUED, &self->queued_messages);

if (stats_check_level(STATS_LEVEL1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this check? Can we just omit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we cannot: if we wouldn't check this than in case of stats level 0 the need_to_reset_counters would be set to FALSE which would ended in an unitialized queued counter.
(note that stats_counter_set handles the case when NULL ptr passed to it as counter).

stats_register_counter(stats_level, sc_key, SC_TYPE_QUEUED, &self->queued_messages);

if (stats_check_level(STATS_LEVEL1))
need_to_reset_counters = !stats_contains_counter(sc_key, SC_TYPE_MEMORY_USAGE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could go this check into _reset_counters.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it couldn't.
Why?
Because at that point the counter is registered... so it would return TRUE (of course when stats-level is >= 1).

@@ -208,6 +208,19 @@ stats_cluster_track_counter(StatsCluster *self, gint type)
return &self->counter_group.counters[type];
}

StatsCounterItem *
stats_cluster_get_counter(StatsCluster *self, gint type)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end, this function is not used. Can you drop the patch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep it as I’ll need it later.

@lbudai
Copy link
Collaborator Author

lbudai commented Sep 20, 2018

@furiel: thanks for the review notes.

I’ll check them but... but I’m not sure I want to rebuild the whole patchset. It is true that I’m blocking on this PR, but for me it is just a marginal issue what I had to solve and I want to focus on other things now.

@@ -1349,17 +1345,11 @@ _register_counters(LogWriter *self)
stats_register_counter(self->options->stats_level, &sc_key, SC_TYPE_SUPPRESSED, &self->suppressed_messages);
stats_register_counter(self->options->stats_level, &sc_key, SC_TYPE_DROPPED, &self->dropped_messages);
stats_register_counter(self->options->stats_level, &sc_key, SC_TYPE_PROCESSED, &self->processed_messages);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are also common counters. Cant we move these under _register_common_counters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean under _common_counters?
_common_counter is at LogQueue implementation, a private (static) method for that, and has nothing to do with LogWriter.

Copy link
Collaborator

@furiel furiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, all my questions are answered. Other than that, I had a few optional comments, nothing critical. imho this is ready to go in.

@furiel furiel merged commit cd56c32 into syslog-ng:master Sep 20, 2018
@lbudai
Copy link
Collaborator Author

lbudai commented Sep 20, 2018

I can do some modifications...

  1. modify the predicate function (return !!x)
  2. modifiy the code and use the stats_cluster_get_counter method

but only in if the PR get merged on today.

I still don't want to rebuild the whole patchset.
I think, the first version could be also merged, but totally agreed with Bazsi that this naming is not the best. Changing the LogQueue interface is not part of the original problem, but a make-things-better changset, or, in other words: it could be part of a separate PR... but I'd not open another PR for it.

@lbudai
Copy link
Collaborator Author

lbudai commented Sep 20, 2018

@furiel: you were too fast :)
then I'm going to open a PR for addressing those two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants