Expose the ability to checksum the first line to the users #2926

MOZGIII · 2020-07-01T08:51:14Z

Now that #2904 is merged, a logical continuation would be to start the discussion on exposing this functionality to the users.

Relevant references:

There are a few details that #2904 doesn't cover, but that we probably want to settle before we expose this functionality to the users:

support for compressed files
skipping file headers (i.e. checksum second line instead of the first line)
how do we teach users about the properties and use cases of this checksum mode and how it compares to the others
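To make the header-skipping point concrete, here is a minimal sketch of a first-line checksum that can skip a number of header lines. It is not the code from #2904; the function name `first_line_checksum` and the parameters `skip_lines` and `max_line_len` are hypothetical, and CRC32 is only an assumed choice of checksum.

```python
import io
import zlib


def first_line_checksum(stream, skip_lines=0, max_line_len=1024):
    """Return a CRC32 of the first line after `skip_lines` header lines.

    Hypothetical helper for illustration only. Returns None when the file
    does not yet contain a complete line to fingerprint.
    """
    # Skip header lines; a header without a trailing newline is incomplete.
    for _ in range(skip_lines):
        header = stream.readline()
        if not header.endswith(b"\n"):
            return None

    # Read at most `max_line_len` bytes of the line to be fingerprinted.
    line = stream.readline(max_line_len)
    complete = line.endswith(b"\n")
    truncated = len(line) == max_line_len
    if not (complete or truncated):
        # The line is still being written; try again later.
        return None
    return zlib.crc32(line)


# Usage: a CSV-like file whose first line is a shared header.
data = io.BytesIO(b"timestamp,message\n2020-07-01,hello\n2020-07-02,world\n")
print(first_line_checksum(data, skip_lines=1))
```

With `skip_lines=0`, two files that share the same header line would get the same fingerprint, which is why skipping headers matters for this mode.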

binarylogic · 2020-08-07T15:36:50Z

I do like this change for the reasons discussed, but I want to think carefully about the UX here. This is exactly the kind of decision I do not want to present to the user. Checkpointing within the file source is already confusing and this would make it even more so. I wish there was a way to combine this strategy with the current checksum strategy so it "just works" for small and large files.

MOZGIII · 2020-08-07T18:12:31Z

As far as I understand, the only use case where we read binary files is when we're processing compressed data. The only meaningful approach I know of for compressed log files works only if the compression algorithm permits streaming the uncompressed data as it arrives.

What if we just do the checksumming on a decompressed stream in case the file is compressed? We can then use the line-aware fingerprinter. I think the resulting solution would just work for any meaningful case - covering all the existing cases, but with less painful tradeoffs.
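As a rough illustration of that idea, the sketch below checksums the first line of the decompressed stream, so a plain file and its gzip-compressed copy get the same fingerprint. It assumes gzip only, detects it by the magic bytes, and uses CRC32; the names `open_maybe_gzip` and `line_fingerprint` are hypothetical and not taken from the file source.

```python
import gzip
import io
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(raw):
    """Return a stream of uncompressed bytes. `raw` must be seekable."""
    magic = raw.read(2)
    raw.seek(0)
    if magic == GZIP_MAGIC:
        # Decompress lazily, so only the beginning of the file is read.
        return gzip.GzipFile(fileobj=raw, mode="rb")
    return raw


def line_fingerprint(raw):
    """CRC32 of the first uncompressed line, or None if none is complete."""
    stream = open_maybe_gzip(raw)
    try:
        line = stream.readline()
    except EOFError:
        # The gzip stream ended before a full line could be decoded.
        return None
    if not line.endswith(b"\n"):
        return None
    return zlib.crc32(line)


# Usage: the same content, plain and gzip-compressed.
plain = b"first line\nsecond line\n"
print(line_fingerprint(io.BytesIO(plain)))
print(line_fingerprint(io.BytesIO(gzip.compress(plain))))
```

Both calls print the same value, since the checksum is computed over the uncompressed first line in either case.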

binarylogic · 2021-02-06T16:09:23Z

This is done. #5215 should have closed this.

MOZGIII mentioned this issue Jul 1, 2020

The ability to checksum by the first line at the file server #2890

Closed

binarylogic mentioned this issue Aug 7, 2020

The problem with fingerprinters #2701

Closed

MOZGIII mentioned this issue Sep 15, 2020

Strategy device_and_inode does not work properly #2163

Closed

binarylogic closed this as completed Feb 6, 2021
