[UMBRELLA] Falco collaboration with CNCF tag-env-sustainability #2435

Open
incertum opened this issue Mar 6, 2023 · 20 comments

@incertum
Contributor

incertum commented Mar 6, 2023

Motivation

Falco would like to partner with https://github.com/cncf/tag-env-sustainability in order to improve Falco's efficiency (reduce compute overhead and resolve resource-constraint limitations). This includes overcoming design challenges with new thinking so that Falco can further extend its threat detection capabilities with resource utilization budgets in mind.

Additional Context

EDIT Dec 19, 2023

@mkorbi

mkorbi commented Mar 26, 2023

Hey @incertum, we would like to support you here.
First, we need to define a baseline so that you will have a measurable outcome in the future. A few days ago I opened the matching request for that method: cncf/tag-env-sustainability#64 (comment)

So we can get started here and then move on. WDYT?

The next step would be to work out how to define the SCI for Falco.
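
For readers new to SCI: the Green Software Foundation's Software Carbon Intensity specification defines the score as SCI = ((E × I) + M) per R, where E is the energy consumed (kWh), I is the carbon intensity of that energy (gCO2eq/kWh), M is the embodied emissions (gCO2eq), and R is a functional unit. A minimal sketch of that arithmetic follows. The function name, the choice of "1,000 monitored events" as the functional unit, and every number are assumptions for illustration, not something defined for Falco in this thread.

```python
def sci_score(energy_kwh, carbon_intensity_g_per_kwh, embodied_g, functional_units):
    """Return SCI = ((E * I) + M) / R in gCO2eq per functional unit."""
    if functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    operational_g = energy_kwh * carbon_intensity_g_per_kwh  # E * I
    return (operational_g + embodied_g) / functional_units


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
score = sci_score(
    energy_kwh=2.0,                    # E: energy used over the measurement window
    carbon_intensity_g_per_kwh=400.0,  # I: assumed grid carbon intensity
    embodied_g=100.0,                  # M: assumed share of hardware emissions
    functional_units=1000,             # R: assumed unit, batches of 1,000 events
)
print(f"SCI: {score:.2f} gCO2eq per functional unit")
```

Picking the functional unit R is the part that would need discussion for a kernel monitoring tool; events processed is only one candidate.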

@incertum
Contributor Author

Hi @mkorbi, amazing ❤️!

SCI scores and anything related to them are new to me. I'm eager to learn how we can define the SCI for Falco. Previously, we focused on traditional resource utilization and health metrics (e.g. CPU and memory usage, event or event drop rates ...).

CC @falcosecurity/core-maintainers

@incertum
Contributor Author

@mkorbi Falco 0.35.0 is out featuring a new metrics option. By Falco 0.36.0 the metrics feature will transition into a stable state.

Following the discussion in cncf/tag-env-sustainability#64, we have a few questions:

cncf/tag-env-sustainability#64 (comment)
cncf/tag-env-sustainability#64 (comment)
However, it would be great to start collecting data on what we can already measure (CPU, GPU, memory), as @TheFoxAtWork said.

This would benefit use cases like Falco. CPU utilization is directly tied to the rate of events collected, which can be influenced by configurations. However, it is also dependent on the workload's nature, which is beyond Falco's control. Falco now supports measuring CPU utilization, event rates, and eBPF rate of tracepoint invocations natively.

cncf/tag-env-sustainability#64 (comment)
... deliverable to be an initial guide in evaluating resource consumption for projects in a default configuration so that interested projects can receive such an evaluation from this TAG ...

What could the expected deliverables for Falco look like? One idea is to provide adopters with a mathematical equation focused on overall CPU and/or memory utilization. This equation would allow them to calculate an approximate cost and observe how the cost changes when adjusting Falco's monitoring configurations. This would enable adopters to make informed decisions about resource allocation and optimize their usage of Falco.

Adopters can choose between measuring CPU and memory of Falco separately or use Falco's native metrics feature.
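
One possible shape for such an equation, as a rough sketch only: assume CPU usage grows roughly linearly with the event rate, fit `cpu ≈ intercept + slope * event_rate` to a few measurements, and use the fit to estimate the cost of a configuration change. The linear form, the function names, and the sample numbers are all hypothetical; real workloads may well not be linear.

```python
from statistics import mean


def fit_linear_cost(event_rates, cpu_percents):
    """Least-squares fit of cpu_percent ≈ intercept + slope * event_rate."""
    if len(event_rates) != len(cpu_percents) or len(event_rates) < 2:
        raise ValueError("need at least two paired measurements")
    x_bar = mean(event_rates)
    y_bar = mean(cpu_percents)
    sxx = sum((x - x_bar) ** 2 for x in event_rates)
    if sxx == 0:
        raise ValueError("event rates must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(event_rates, cpu_percents))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_cpu(intercept, slope, event_rate):
    """Estimated CPU percent at a given event rate under the fitted model."""
    return intercept + slope * event_rate


# Made-up measurements: events per second vs. CPU percent of one core.
rates = [10_000, 20_000, 40_000]
cpu = [1.5, 2.5, 4.5]

intercept, slope = fit_linear_cost(rates, cpu)
print(f"estimated CPU at 30k events/s: {predict_cpu(intercept, slope, 30_000):.2f}%")
```

An adopter could repeat the fit after changing the monitoring configuration and compare the two slopes to see how much the change costs per additional event.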

In addition, Falco follows a strict badging system across its repositories. We could see benefits in including a TAG Environmental Sustainability engagement badge for our project ... WDYT? This badge would recognize our commitment to promoting and incorporating sustainable practices within the Falco community.

@leonardpahlke

leonardpahlke commented Jun 20, 2023

Hey @incertum, congrats on the latest release!

As part of TAG ENV, we are establishing a working group that will first investigate and then guide future projects like Falco and other CNCF projects to track their Cloud Native Sustainability footprint from release to release. The WG charter is currently being discussed, but as soon as it's up, this group will focus on this issue. cc @guidemetothemoon and @nikimanoledaki

--

Regarding your comments and questions. There are two topics we are mixing in this discussion:

  1. First, we want to make sure we incorporate cloud native sustainability in the development of our software. This one is focused on maintainers building the open source software. It's about reporting, possible audits at some point, and enhancing the release process (adding a badge to the repo etc…).
  2. Secondly, we would like to enable transparency to users to check on the cloud native sustainability footprint. This is aimed at the end users of the software to best configure the project for their needs and understand the tradeoffs in configuration and overall application.

Both are important, but we should not mix them in discussions. The TAG scope covers both, and both rely on the same metrics to make assessments. Hearing that your latest release features metrics is great 👍.

The obvious next question is which metrics we care about. That's a larger topic, and the WG will look into it in more detail. In essence, if we talk just about metrics, we care about energy usage. If the space matures further, we will care about natural resources too, but on a system level, so this would not apply to a project like Falco. Energy usage it is. We also need to investigate energy effectiveness (not just energy efficiency, but being “mindful” of energy “invested”). In most cases, we cannot measure the usage directly and need to use correlations such as $ cost, or map it to vCPU usage, etc. The more accurately we can measure, the better our estimates are, right?
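
As an illustration of the correlation approach, a proxy could convert measured CPU time into energy using an assumed average power draw per busy core. This is a sketch under stated assumptions: the per-core wattage is a placeholder rather than a measured value, the helper name is hypothetical, and the model ignores memory, I/O, and idle power.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ


def estimate_energy_kwh(cpu_core_seconds, watts_per_busy_core):
    """Rough energy estimate: busy core-seconds times an assumed per-core power."""
    if cpu_core_seconds < 0 or watts_per_busy_core < 0:
        raise ValueError("inputs must be non-negative")
    joules = cpu_core_seconds * watts_per_busy_core  # W * s = J
    return joules / JOULES_PER_KWH


# Placeholder numbers: one core fully busy for an hour at an assumed 10 W.
energy = estimate_energy_kwh(cpu_core_seconds=3600, watts_per_busy_core=10.0)
print(f"estimated energy: {energy:.4f} kWh")
```

The estimate is only as good as the assumed wattage, which is why direct measurement on a test bench is preferable where it is available.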

Let's circle back: if we “test bench” the project (topic 1 above), we have information on the system underneath. We don't have to go through Falco to measure the energy usage. We just have to record which parameters we adjust in Falco (total events, event kinds, etc.) and map them to the measured energy usage. For end users, this may not be the case, since user experience also comes into play. We may want to split this scope into two initiatives (1. & 2.), which are related (would love to hear your thoughts @TheFoxAtWork).

Since this is the first time the TAG is working with a project to assess their cloud native sustainability footprint, I expect that this will be a great learning experience :D. I am excited!

@catblade

Would there be a possibility of presenting FALCO on one of the TAG meetings, so we can learn more?

@incertum
Contributor Author

Thank you @leonardpahlke and @catblade! Happy to join one of the next TAG meetings.

Meanwhile, you might want to consider exploring this proposal on kernel version testing, which offers additional insights into why a kernel monitoring tool differs from other software. One notable distinction is that resource utilization depends on the actual workload and kernel settings of adopters, both of which are unpredictable factors for Falco developers. Consequently, I agree that enabling ...

@leonardpahlke

"transparency to users to check on the cloud native sustainability footprint. This is aimed at the end users of the software to best configure the project for their needs and understand the tradeoffs in configuration and overall application."

would be particularly beneficial for Falco.

Traditional CPU and memory usage is typically top of mind for SREs. Therefore, if we could derive energy consumption from those measurements, it would be highly appreciated.

That being said, happy to investigate and gather additional or different metrics.

@TheFoxAtWork

There are a few items here worth considering (and indeed Falco is a different sort of cloud native project, which makes this tricky but incredibly worthwhile as a first project to explore this with). Apologies if it's a bit rambly: both points, while generally separate, are more interrelated for projects like Falco due to what they do and less how they do it, but I'd be happy to have this proven otherwise.

  1. This could likely be accomplished by leveraging the testing infrastructure the project has in place and plans to have in place - effectively supporting the right size for their needs. Efficient tracking of the Project in an execution environment with a few types of workloads and common kernel settings would provide good visibility for a baseline. Something like a 2x2 matrix/table to record Low and High interaction workloads and two common kernel settings (evaluated for each) is a good initial start for expressing baseline. Once a baseline is established, next steps may be looking over the ruleset to identify which rulesets are most intensive and which aren't (in testing and when running), then comparing to the value they provide adopters (the latter coming from the Falco team). After which a more concrete discussion on efficient versus valuable rules could be undertaken by the Project and potentially mark rules accordingly for adopters or update the maturity framework to include an "efficient, core-value" set.
  2. Having Falco provide transparency in its utilization for production environments is beneficial and gives adopters a self-service option. Potential future improvements here could be Falco recommending which rules the adopter needs to tune because they are producing excessive noise and burning utilization above an identified threshold.
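
The 2x2 baseline from point 1 could be recorded with something as small as the sketch below. The kernel-setting labels, metric names, and sample values are placeholders, since the thread does not name them.

```python
from itertools import product

WORKLOADS = ("low-interaction", "high-interaction")
KERNEL_SETTINGS = ("kernel-setting-A", "kernel-setting-B")  # hypothetical labels


def empty_baseline():
    """One cell per (workload, kernel setting) pair, unmeasured to start with."""
    return {cell: None for cell in product(WORKLOADS, KERNEL_SETTINGS)}


def record(baseline, workload, kernel_setting, cpu_percent, memory_mb):
    """Store one measurement in its cell; reject cells outside the matrix."""
    key = (workload, kernel_setting)
    if key not in baseline:
        raise KeyError(key)
    baseline[key] = {"cpu_percent": cpu_percent, "memory_mb": memory_mb}


def render(baseline):
    """Return the matrix as plain text, with '-' for unmeasured cells."""
    lines = ["workload | kernel setting | cpu % | memory MB"]
    for (workload, kernel), m in baseline.items():
        cpu = "-" if m is None else f"{m['cpu_percent']:.1f}"
        mem = "-" if m is None else f"{m['memory_mb']:.0f}"
        lines.append(f"{workload} | {kernel} | {cpu} | {mem}")
    return "\n".join(lines)


baseline = empty_baseline()
record(baseline, "low-interaction", "kernel-setting-A", cpu_percent=1.2, memory_mb=80)
print(render(baseline))
```

Re-running the same four cells on each release would give the release-to-release comparison discussed earlier in the thread.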

Let's look at the information available to us that doesn't detail a specific provider or deployment environment, if we can (since utilization/consumption measurements are wildly different), and focus on how the project is developed (primarily test infrastructure) and how it is commonly deployed (harder with Falco).

Some things I expect to have confirmed:

  • Security tools are going to be computationally intensive due to the kinds of interactions they monitor and the rigor by which they are executed - anything we can do to guide adopters into more eco-conscious decisions without compromising security detections will improve the current state.
  • There are a limited number of ways to efficiently detect all the things adopters care about, and these will largely vary from use case to use case.

@nikimanoledaki

@catblade
Would there be a possibility of presenting FALCO on one of the TAG meetings, so we can learn more?

@incertum could you open a new issue using the Presentation template to do a short presentation at one of the upcoming regular meets, please? This will mainly be a discussion for TAG contributors to learn about Falco, get up to speed with the initiative discussed here, and discuss next steps.

Upcoming meets with available time include Wednesday 5th July & Wednesday 19th July. Meeting details can be found in the TAG's repo landing page. Thanks, looking forward to it! 🎉

@incertum
Contributor Author

incertum commented Jul 1, 2023

Great, thank you! July 19th would be best.

@catblade

catblade commented Jul 3, 2023

I'll make sure to add you to the agenda this week if someone else doesn't get to it first. :-)

@incertum
Contributor Author

Updates July 19, 2023:

Here are the meeting notes https://docs.google.com/document/d/1TkmMyXJABC66NfYmivnh7z8Y_vpq9f9foaOuDVQS_Lo/edit#heading=h.5hquk4f1dn95, thanks @catblade!

Action Items on Falco side (ETA before Falco 0.36 release ~Sep 2023):

Tracking tag-env-sustainability progress:

@incertum
Contributor Author

Updates Dec 19, 2023:

Expected ETA for a complete v1 to be "live" by KubeCon EU 2024.

@poiana
Contributor

poiana commented Mar 18, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@leogr
Member

leogr commented Mar 22, 2024

/remove-lifecycle stale

@poiana
Contributor

poiana commented Jun 20, 2024

/lifecycle stale

@leogr
Member

leogr commented Jun 20, 2024

/remove-lifecycle stale

@poiana
Contributor

poiana commented Sep 18, 2024

/lifecycle stale

@leogr
Member

leogr commented Sep 20, 2024

/remove-lifecycle stale

@poiana
Contributor

poiana commented Dec 19, 2024

/lifecycle stale

@leogr
Member

leogr commented Dec 19, 2024

/remove-lifecycle stale
