fix(kubernetes_logs source): Start reading logs at checkpoint (#4043)
* Turn start_at_beginning off in the kubernetes_logs source

Signed-off-by: MOZGIII <[email protected]>

* Add extensive comments to motivate the decisions behind the FileServer configuration

Signed-off-by: MOZGIII <[email protected]>
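The change in the first bullet can be modeled with a small function. This is a hypothetical sketch, not the file-source crate's code: it assumes, as the new comment on `start_at_beginning: false` in the diff suggests, that checkpoints are only consulted when `start_at_beginning` is off. The name `start_offset` and the fallback to the end of the file when no checkpoint exists are our own assumptions.

```rust
/// Hypothetical model of how a tailer could choose the byte offset at which
/// to start reading a file. `checkpoint` is the offset persisted in
/// `data_dir` for this file's fingerprint, if there is one.
fn start_offset(start_at_beginning: bool, checkpoint: Option<u64>, file_len: u64) -> u64 {
    if start_at_beginning {
        // Assumed pre-fix behavior: re-read the whole file, even when a
        // checkpoint records that part of it was already shipped.
        0
    } else {
        // Resume where we left off. Without a checkpoint, this sketch
        // assumes only data appended from now on is read.
        checkpoint.unwrap_or(file_len)
    }
}
```

Under this model, restarting the collector with `start_at_beginning: false` continues at the checkpointed offset instead of re-emitting lines that were already delivered.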
MOZGIII authored Sep 24, 2020
1 parent d9e0371 commit b995485
Showing 1 changed file with 37 additions and 1 deletion.
src/sources/kubernetes_logs/mod.rs
@@ -191,20 +191,56 @@ impl Source {
let annotator = PodMetadataAnnotator::new(state_reader, fields_spec);

// TODO: maybe some of the parameters have to be configurable.

// 16 KiB is the maximum payload size of a single line in both the
// Docker and CRI log formats.
// We double that to account for metadata and padding, and to round to
// a power of two. Longer lines are split by the runtime and merged back
// by the parsers; see the `partial_events_merger` logic.
let max_line_bytes = 32 * 1024; // 32 KiB
let file_server = FileServer {
// Use our special paths provider.
paths_provider,
// This is the default value for the read buffer size.
max_read_bytes: 2048,
- start_at_beginning: true,
// We want to use the checkpointing mechanism and resume from where we
// left off.
start_at_beginning: false,
// We're not aware of any use cases that would require specifying a
// point in time from which to start collecting logs, so we disable
// it. If users ask, we can expose it. Users considering this option
// may have other, sounder ways to solve their use case, so consider
// those first.
ignore_before: None,
// Max line length to expect during regular log reads, see the
// explanation above.
max_line_bytes,
// The directory in which to keep the checkpoints.
data_dir,
// Despite its name, this value is not about globbing: it is the
// interval between polls of the `paths_provider` for files to watch.
// Polling is quite efficient, yet it can still put some load on the
// file system, so this value is 10 times larger than the default for
// the file source.
glob_minimum_cooldown: Duration::from_secs(10),
// The shape of the log files is well known in the Kubernetes
// environment, so we pick a fingerprinter specially crafted for
// these log files.
fingerprinter: Fingerprinter::FirstLineChecksum {
// Max line length to expect during fingerprinting, see the
// explanation above.
max_line_length: max_line_bytes,
},
// We don't expect the distribution of files to be a concern because
// of the way we pick files for gathering: for each container, only
// the last log file is currently picked. Thus there's no need for
// ordering: each logical log stream is guaranteed to start with just
// one file, so its lines cannot be interleaved with other relevant
// log lines, since there are none.
oldest_first: false,
// We do not remove the log files, `kubelet` is responsible for it.
remove_after: None,
// The standard emitter.
emitter: FileSourceInternalEventsEmitter,
};
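The sizing rationale in the comment above `max_line_bytes` is plain arithmetic, restated below. `MAX_PAYLOAD_BYTES` and the `chunk_sizes` helper are hypothetical names; `chunk_sizes` only mimics how a runtime splits an over-long line into partial records of at most 16 KiB, which the parsers later merge back.

```rust
/// 16 KiB: the maximum single-line payload in the Docker and CRI log
/// formats, per the comment in the diff.
const MAX_PAYLOAD_BYTES: usize = 16 * 1024;

/// Doubled to leave headroom for metadata and padding; also a power of two.
const MAX_LINE_BYTES: usize = 2 * MAX_PAYLOAD_BYTES;

/// Hypothetical helper: sizes of the partial records a runtime would emit
/// for one logical line of `line_len` bytes.
fn chunk_sizes(line_len: usize) -> Vec<usize> {
    let mut sizes = Vec::new();
    let mut remaining = line_len;
    while remaining > 0 {
        // Each partial record carries at most MAX_PAYLOAD_BYTES of payload.
        let n = remaining.min(MAX_PAYLOAD_BYTES);
        sizes.push(n);
        remaining -= n;
    }
    sizes
}
```

For a 40 KiB logical line, `chunk_sizes(40 * 1024)` yields `[16384, 16384, 8192]`, so no single record exceeds half of the 32 KiB read limit.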

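`glob_minimum_cooldown` acts as a rate limit on re-querying the paths provider for files to watch. A minimal sketch of such a gate, assuming a hypothetical `Cooldown` type that is not part of Vector:

```rust
use std::time::{Duration, Instant};

/// Hypothetical cooldown gate: the paths provider is only re-queried once
/// at least `cooldown` has elapsed since the previous query.
struct Cooldown {
    cooldown: Duration,
    last: Option<Instant>,
}

impl Cooldown {
    fn new(cooldown: Duration) -> Self {
        Cooldown { cooldown, last: None }
    }

    /// Returns true if a poll should happen at `now`, and records it if so.
    fn should_poll(&mut self, now: Instant) -> bool {
        let due = match self.last {
            // Never polled yet: poll immediately.
            None => true,
            Some(last) => now.duration_since(last) >= self.cooldown,
        };
        if due {
            self.last = Some(now);
        }
        due
    }
}
```

Attempting a poll once per second for a minute with a 10-second cooldown lets only 7 of the 61 attempts through (at 0, 10, ..., 60 seconds), which is the kind of load reduction the comment is after.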

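`Fingerprinter::FirstLineChecksum` identifies a file by a checksum of its first line rather than by its path, so a checkpoint can follow a file across renames such as log rotation. The sketch below is our own illustration, not the file-source crate's implementation: it swaps in std's `DefaultHasher` for simplicity, and returning `None` for an incomplete or over-long first line is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::{BufRead, BufReader, Read};

/// Hypothetical first-line fingerprint. Reads at most `max_line_length`
/// bytes and returns None until a complete first line ('\n'-terminated)
/// fits within that limit.
fn first_line_fingerprint<R: Read>(reader: R, max_line_length: usize) -> Option<u64> {
    // Cap how much we read, mirroring `max_line_length: max_line_bytes`.
    let mut limited = BufReader::new(reader).take(max_line_length as u64);
    let mut line = Vec::new();
    limited.read_until(b'\n', &mut line).ok()?;
    if line.last() != Some(&b'\n') {
        // Empty file, partially written line, or line longer than the cap.
        return None;
    }
    let mut hasher = DefaultHasher::new();
    line.hash(&mut hasher);
    Some(hasher.finish())
}
```

`DefaultHasher`'s algorithm is not guaranteed to stay the same across Rust releases, so a real fingerprint that is persisted in checkpoints would need a stable checksum such as a CRC.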