Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: journald reader #1852

Merged
merged 13 commits into from
Jan 6, 2021
Merged

Conversation

sempervictus
Copy link

This is absolutely not ready to merge, its scaffold for the badly needed systemd journald reader mechanism. So far all it does, hopefully, is open the journal and start reading messages from the end into our queue. Its late, time's short, so this is very rough and untested so far. Collaboration very welcome.

This seeks to address #1367.

ping @atomicturtle @ddpbsd @dcid

@sempervictus sempervictus force-pushed the feature/reader_journald branch from 2fb9c8b to 0f0a9aa Compare March 5, 2020 06:13
@sempervictus sempervictus force-pushed the feature/reader_journald branch 2 times, most recently from 5fd14b0 to 5828876 Compare March 5, 2020 06:30
src/Makefile Show resolved Hide resolved
@ddpbsd
Copy link
Member

ddpbsd commented Mar 5, 2020

Thanks for jumping on this!
I'm sure some of the comments I'm leaving are already known issues, but my memory is bad enough that I've learned to leave myself notes.

@sempervictus
Copy link
Author

Nice! It works - was worried merge-squash would lose authorship.
Thanks @illuusio.

@sempervictus
Copy link
Author

Prototyped a source selector for the journal reader which should allow us to filter which units (now SYSLOG_IDENTIFIERs) we're reading from. This is the "cheapest" way i could think of to do this in a situation where we may have tons of subscribers to individual journals, doing that check and failing 99% of the time. Not sure what that'll do if nothing's passed in, but in my little console version of this it does perform the filtering correctly and writes out the unit name. Reason for strstr vs direct comparison is that systemd can generate auto-named units with common prefixes, and this should help pull from them even if they're added after the reader is initialized.

@ddpbsd: will add hostname too, thanks as always for the useful suggestions.

@atomicturtle, could you please poke at that snprintf to make the message format OSSEC-friendly as a syslog message or whatever format you'd prefer it to have in the queue?

RageLtMan and others added 8 commits March 9, 2020 17:07
Start process of creating a journald reader using the native
systemd interfaces.

Make blatant assumption that being on Linux means having systemd,
that definition and include directive should be fenced off with a
check for the header file being present.
This is a blind journal reader which just reads messages from the
current tail-end of the system journal, userspace and kernel, with
no consideration for source or position.

Next steps:
  Implement sd_journal_get_cursor and friends for position markers.
  Implement filtering which journals are read - system, uspace, or
specific applications.
  Implement filtering by string, severity, time, etc.

Testing:
  Absolutely none, this probably won't compile, may eat pets, etc.
* Fix compiling and suspend warnings in read_journald.c

* Add READING_JOURNAL #define for debug printing in logcollector

* Add 'journald' as allowed XML config parameter for 'logformat'
Use the attribute passed to sd_read_journal as a filter for string
comparison with `strstr()` of the _SYSTEMD_UNIT unit field data
(after "_SYSTEMD_UNIT=" prefix).

This isn't nearly as complete as PCRE/SREGEX matching for units
where something like "user\d+@.*" would come into play, but its
a lot cheaper, and is really a lazy convenience around having full
service/socket/unit names passed to the reader.

Next Steps:
  Severity filtering
  Cursors/state management
  Do we want to even think about namespaces?
  Lots and lots of testing
As suggested by @ddpbsd, use SYSLOG_IDENTIFIER to filter for log
sources instead of _SYSTEMD_UNIT.
Extract the journal entry _HOSTNAME per @ddpbsd, insert into final
message string.

Clean up message buffers between iterations to avoid matches in
error conditions where data from the prior loop iter is used for
comparison or output.
USE_SYSTEMD defaults to yes and enabled systemd support on Linux.
Compile with USE_SYSTEMD=no to disable it.

endif not end.

The filter thing didn't work, so only worry if USE_SYSTEMD is 'yes'
explicitly.

Ignore this file is no HAVE_SYSTEMD

Add the systemd-dev pacakge for travis.
@sempervictus sempervictus force-pushed the feature/reader_journald branch from bf37101 to 3cc42d5 Compare March 9, 2020 21:09
@sempervictus
Copy link
Author

Awesome, looks like Dan fixed the CI.

So who here is comfortable enough with positional cursors on source data to help figure out how we want to deal with keeping track of where we were for each reader before a restart or other feed interrupting event?

@sempervictus
Copy link
Author

@atomicturtle @ddpbsd @dcid: i think we have hit a small architectural pain point here: our reader and parser code is intertwined, and now that audit logs can come from journald (file or memory) or a flat file (fd), the fact that the audit reader is its own encapsulated file, hard-coded to the fd approach, either means i copy-paste code (same for postgres and others) and make a ton of redundant bloat, or we think about splitting out actual accessors of API/ABI endpoints and file descriptors (the literal "readers") from the parsing and formatting code.
Thoughts? Interns to sacrifice on the altar of clean-code?

else if (strcmp(logff[i].logformat, "journald") == 0) {
#ifdef HAVE_SYSTEMD
verbose(READING_JOURNAL, ARGV0, logff[i].file);
sd_read_journal(logff[i].file);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering is this correct way of handling this as other reader just make logff[i].read = sd_read_something(with args) and journald reader hangs forever?

);
// Strip off "SYSLOG_IDENTIFIER=" prefix for unit comparison inline
if (strstr((char *)(jsrc + 18), unit) == (char *)(jsrc + 18) ) {
// Read hostname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a way to log all journald stuff as reading /var/log/messages as this just logs some units output.

@ddpbsd
Copy link
Member

ddpbsd commented Mar 15, 2020

@sempervictus It depends on how quickly you want journald support. Adding another reader in the same way previous ones were added should be eaasier than refactoring all/most of them.

* If unit name is same as config unit then log message
 * If config unit name is all then log everything (same as syslog reader)
 * If config unit name starts with char '/' then assume one wants all
   but can't configure it correctly
@sempervictus
Copy link
Author

@ddpbsd: makes sense.
@illuusio: thanks for all the commits, merged.
@ALL: so cursors are the remaining item on the list - we need to be able to pick up where we left off during an agent restart, ideally.

@illuusio
Copy link
Contributor

illuusio commented Apr 1, 2020

@sempervictus only thing with cursor is where to store them? If used multiple inputs is there some commong way in OSSEC-HIDS to store such info if there is not we shoud re-invent wheel with JSON or such?

illuusio added 2 commits April 6, 2020 04:25
* Use Syslog-ng time string to pass ossec-analysisd time parsing functions. After that message is corretly separated in parts and rules behave corretly

* Add needed ':' to have program_name parsed as well
)

* Add new variable 'void *ptr' to struct logreader to handle situations like Journald where there is need to keep track of place with something diffrent than 'FILE *'. Currently only user is journald but in future it could be used by other readers

* Change Journald reader behave as every else reader out there. This means that reading loop is handling looping not reader itself. This meas there can be other readers as well with Journald-reader
@illuusio
Copy link
Contributor

illuusio commented Apr 6, 2020

Now this should be Beta quality as I got it crashing couple time but can we get some testers #1367 (I have been testing this on couple servers and it work quit nice).

@sempervictus
Copy link
Author

@illuusio: what're you using for your decoders and rules to see data, or are you seeing data hit the manager?

@illuusio
Copy link
Contributor

illuusio commented Apr 6, 2020

@sempervictus I just hardcoded version that printed on analysid inner things. Servers sends me emails about something is wrong. I have to investigate more to see if rules really work as they should but messages are parsed (and separated) as they are coming from syslog-ng (which it is mimicking).

@illuusio
Copy link
Contributor

illuusio commented Jun 3, 2020

I've been testing this while now in productions and it works fairly well. It needs to have some rules for journald but I think this could be merged and rest of the problems solved after it receives more testing from community.

@sempervictus
Copy link
Author

sempervictus commented Jun 18, 2020

Yeah, i'm good with pushing this into master and wider testing.
One remaining through comes to mind: do we want to permit the reader to restrict which hostname it reads from? There are two competing and completely valid use cases here:

  1. An OSSEC-equipped host logging all of its dockers or what not
  2. An OSSEC-equipped host with OSSEC-equipped LXD/LXC containers. For all we know, they have separate OSSEC masters, and only want their own data.

@sempervictus
Copy link
Author

@illuusio: this effort is going to be the subject of a talk or two, any objections to me giving you a shout-out for all the hard work by the name listed in your GH profile? Any prefered alternative for given name or handle than the ones listed there?

@illuusio
Copy link
Contributor

1. An OSSEC-equipped host logging all of its dockers or what not

2. An OSSEC-equipped host with OSSEC-equipped LXD/LXC containers. For all we know, they have separate OSSEC masters, and only want their own data.

Is there such restrictions from other readers? I'm just asking as this kind of a easy task but should be done as 'OSSEC'-way if there is any and is there some -tag in config for this?

Any prefered alternative for given name or handle than the ones listed there?

Nope just use that if you prefer to give credit but is not necessary as part of security in your life is to play low 😸

@sempervictus
Copy link
Author

@illuusio: There are no such restrictions from other readers AFAIK, but other readers access inodes which are inside the namespace of the container, whereas the host's journal will have entries from journald-aware child containers.
First talk on this is today @ 1530 EST in the OSSEC virtual webinar (COVID replacement for the normal annual conference)

@illuusio
Copy link
Contributor

illuusio commented Jun 24, 2020

Hopefully there is some video as it's kinda late for me. So If I understood correctly the problem is that you have bunch of containers which report to one server and there should be restricting tag to make it choose which it allows to take input like:

<allow-host>megahost</allow-host>
<allow-host>container-host1</allow-host>

Or I'm missing something.

@sempervictus
Copy link
Author

@illuusio: we'll have video, either atomicorp will have a copy or both they and i will.
Since the configurations have to apply to all OSSEC agents, i'd say we do something like:

<journal-namespace>local</journal-namespace>

or

<journal-namespace>all</journal-namespace>

and then have it pull the current NS' hostname to compare

@illuusio
Copy link
Contributor

Ok and local only reports local stuff and all everything. I take a look as I have to get OSSEC working with agents soon and need this one be number one citizen.

@ddpbsd
Copy link
Member

ddpbsd commented Jun 24, 2020

Please add libsystemd-dev to the list of required packages in INSTALL.

@sempervictus
Copy link
Author

@illuusio: thank you. Based on the conference today, sounds like we want to be able to filter host/host+containers, which log source (application/service), and even potentially a substring match in the event output, though i think that can be added later - we just wanna hit master branch with a stable configuration API (in terms of not changing parameters down the line, only potentially adding some if people want more filtering capacity @ the reader).
@ddpbsd: good catch! wouldn't have caught that since Arch has the development libraries right in the main -libs package

@atomicturtle
Copy link
Member

merging this in so we can start more open testing

@atomicturtle atomicturtle merged commit da98b38 into ossec:master Jan 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants