Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libbeat/processors/cache: add file-backed cache #36686

Merged
merged 5 commits into from
Sep 28, 2023

Conversation

efd6
Copy link
Contributor

@efd6 efd6 commented Sep 27, 2023

Proposed commit message

See title.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@efd6 efd6 self-assigned this Sep 27, 2023
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Sep 27, 2023
@efd6 efd6 force-pushed the i2816-cache_processor_file_backed branch from 0afe564 to 8675f2f Compare September 27, 2023 06:43
@efd6 efd6 force-pushed the i2816-cache_processor_file_backed branch from 8675f2f to d6fcef4 Compare September 27, 2023 06:45
@efd6 efd6 marked this pull request as ready for review September 27, 2023 07:57
@efd6 efd6 requested a review from a team as a code owner September 27, 2023 07:57
@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@elasticmachine
Copy link
Collaborator

elasticmachine commented Sep 27, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-09-28T03:13:08.501+0000

  • Duration: 76 min 5 sec

Test stats 🧪

Test Results
Failed 0
Passed 28350
Skipped 2013
Total 30363

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@efd6 efd6 force-pushed the i2816-cache_processor_file_backed branch from 829ffe3 to e0327b6 Compare September 27, 2023 22:10
ID string `config:"id"`
}

type fileConfig struct {
ID string `config:"id"`
WriteOutEvery time.Duration `config:"write_frequency"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look over the rest of the code base to see if there is any consistency for naming this type of setting. My initial thought is that this is an "interval" rather than a "frequency", because an interval is a specified as duration between events.

err = dec.Decode(&e)
if err != nil {
if err != io.EOF {
c.log.Errorw("failed to read state element", "error", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to know the input_offset from the decoder.

return
}
// Try to be atomic.
err = os.Rename(tmp, c.path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot traps with Rename that vary by OS. If you can put the "temporary" file in the same directory at the final file you can avoid most of them. For example, you cannot rename a file if the source and dest are on different volumes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting it in filepath.Dir(c.path) should get us there then.

}
tmp := f.Name()
defer func() {
err = f.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to be aware of is that write operations are cached on Windows. So even though this may have closed the file, it's not guaranteed to be synced to disk. You could incorporate an f.Sync() call to ensure this is flush from the write cache to disk (it does a https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-flushfilebuffers).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is theoretically true on posix as well (I was considering this, but didn't do it).

return
}
// We are writing into tmp, so make sure we are private.
err = os.Chmod(f.Name(), 0o600)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this does nothing for Windows. Does that pose a problem for users? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it goes in the path.data directory, we are at least not making things worse.

@efd6 efd6 requested a review from andrewkroh September 28, 2023 02:11
@efd6 efd6 force-pushed the i2816-cache_processor_file_backed branch from 35de580 to 5d57495 Compare September 28, 2023 02:17
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

libbeat/processors/cache/file_store.go Outdated Show resolved Hide resolved
@efd6 efd6 force-pushed the i2816-cache_processor_file_backed branch 3 times, most recently from fca8d75 to 878dcb4 Compare September 28, 2023 03:01
@efd6 efd6 force-pushed the i2816-cache_processor_file_backed branch from 878dcb4 to 8238e85 Compare September 28, 2023 03:07
@efd6 efd6 requested a review from andrewkroh September 28, 2023 03:13
@efd6 efd6 merged commit 9e503b4 into elastic:main Sep 28, 2023
6 of 8 checks passed
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
This adds a file-backed cache implementation to the cache processor. Caching
between put and get operations is done in-memory using the memory cache, but
the file cache will load previously written cache state on start-up and will
write cache contents to file when the cache is dropped. Depending on user
configuration, the file cache will also periodically write the cache state to
the backing file to reduce state loss in the event of a crash.

For simplicity, the cache state is stored as a JSON stream of objects with
fields for the key, value and expiry timestamp of cached entities.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.11-candidate backport-skip Skip notification from the automated backport with mergify enhancement libbeat
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants