[serverless] add cspm metering functionality #162019

CohenIdo · 2023-07-17T07:53:13Z

Solves https://github.com/elastic/security-team/issues/7020

First usage of the shared task manager class for the security serverless project:

Registers the CSPM metering callback
Adds logs to the task manager service
Updates the interfaces

How to test it?

Follow the instructions provided in: https://github.com/elastic/security-team/issues/7025 to deploy the development serverless environment.
Check out the branch.
Start Kibana with the following command: yarn start --serverless=security (make sure Elasticsearch is already running first).
Check the Kibana logs.
Look for the specified logger: plugins.securitySolutionServerless.serverless-security:cspm-usage-reporting-task:1
Review the logs associated with the specified logger to gather the required information.

[2023-07-20T17:50:38.552+03:00][INFO ][plugins.securitySolutionServerless.serverless-security:cspm-usage-reporting-task:1] usage records report was sent successfully
[2023-07-20T17:51:11.365+03:00][INFO ][plugins.securitySolutionServerless.serverless-security:cspm-usage-reporting-task:1] received usage records: [{"id":"serverless-security:cspm-usage-report","usage_timestamp":"2023-07-20T10:47:54.506Z","creation_timestamp":"7/20/2023, 5:51:11 PM","usage":{"type":"cspm-resource","quantity":1219,"period_seconds":3600,"cause":"CIS Amazon Web Services Foundations"},"source":{"id":"serverless-security:cspm-usage-reporting-task:1","instance_group_id":"missing project id"}}]
[2023-07-20T17:51:11.572+03:00][INFO ][plugins.securitySolutionServerless.serverless-security:cspm-usage-reporting-task:1] usage records report was sent successfully

joeypoon

looking good, just some small things

joeypoon · 2023-07-20T23:36:10Z

x-pack/plugins/security_solution_serverless/server/constants.ts

@@ -9,4 +9,4 @@
 const namespace = 'elastic-system';
 const USAGE_SERVICE_BASE_API_URL = `http://usage-api.${namespace}/api`;
 const USAGE_SERVICE_BASE_API_URL_V1 = `${USAGE_SERVICE_BASE_API_URL}/v1`;
-export const USAGE_SERVICE_USAGE_URL = `${USAGE_SERVICE_BASE_API_URL_V1}/usage`;
+export const USAGE_SERVICE_USAGE_URL = `${USAGE_SERVICE_BASE_API_URL_V1}`;


is this on purpose? I believe we need the /usage: https://github.com/elastic/usage-api/blob/5414a1c5f170b81552deb2140307e135ad93c7e0/api/user-v1-spec.yml#L50

You're right :)

joeypoon · 2023-07-20T23:39:23Z

x-pack/plugins/security_solution_serverless/server/types.ts

+  meteringCallback: MeteringCallback;
+}
+
+export interface SecurityMetadataTaskStartContract {


let's rename this to: SecurityUsageReportingTaskStartContract

joeypoon · 2023-07-20T23:43:03Z

x-pack/plugins/security_solution_serverless/server/task_manager/usage_reporting_task.ts


    let usageReportResponse: Response | undefined;

    try {
      usageReportResponse = await usageReportingService.reportUsage(usageRecords);
-    } catch (e) {
-      this.logger.warn(JSON.stringify(e));
+      this.logger.info(`usage records report was sent successfully`);


I'm thinking it might make sense to log the response and some of the usage record data on success for compliance and billing debugging.

It's a good idea to add the response. Regarding the usage record data, we already have it in line 127.

joeypoon · 2023-07-20T23:44:31Z

x-pack/plugins/security_solution_serverless/server/task_manager/usage_reporting_task.ts

+      taskId: this.taskId,
+      lastSuccessfulReport,
+    });
+    this.logger.info(`received usage records: ${JSON.stringify(usageRecords)}`);


I worry about the extra resources/cost we would incur here, especially if it's a lot of records. maybe logger.debug instead?
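
One way to address the concern above is to keep only a compact summary at info level and move the full payload to debug level. The sketch below is an illustration, not the PR's code: the `MinimalLogger` interface and the `logUsageRecords` helper are hypothetical names, and only the record fields follow the log sample in the PR description.

```typescript
// Hypothetical minimal logger shape; Kibana's real Logger exposes more methods.
interface MinimalLogger {
  info(message: string): void;
  debug(message: string): void;
}

// Subset of the usage record fields shown in the PR description's log sample.
interface UsageRecordSummaryInput {
  id: string;
  usage: { type: string; quantity: number };
}

// Logs a short summary at info level and the full payload at debug level.
function logUsageRecords(logger: MinimalLogger, records: UsageRecordSummaryInput[]): void {
  const totalQuantity = records.reduce((sum, record) => sum + record.usage.quantity, 0);
  logger.info(`received ${records.length} usage record(s), total quantity: ${totalQuantity}`);
  logger.debug(`received usage records: ${JSON.stringify(records)}`);
}
```

Note that in this sketch `JSON.stringify` still runs when debug logging is disabled, so the saving is in log volume rather than CPU.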

CohenIdo · 2023-07-20T15:12:46Z

x-pack/plugins/security_solution_serverless/server/cloud_security/cspm_metring_task.ts

+      aggs: {
+        unique_resources: {
+          cardinality: {
+            field: 'resource.id',


precision threshold should be added here

Let's add this comment in the code for now

I planned to add a value as part of this PR; I am going to add it today.
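
For context, the cardinality aggregation accepts a `precision_threshold` option (default 3000, maximum 40000); counts below the threshold are expected to be close to exact. A minimal sketch of the query body with the option set follows. The `buildUniqueResourcesQuery` helper and the `index` argument are hypothetical; only the `unique_resources` aggregation on `resource.id` comes from the diff above.

```typescript
// Shape of the search request sketched here; not the PR's actual type.
interface UniqueResourcesQuery {
  index: string;
  size: number;
  aggs: {
    unique_resources: {
      cardinality: { field: string; precision_threshold: number };
    };
  };
}

// Elasticsearch treats any value above 40000 the same as 40000.
const MAX_PRECISION_THRESHOLD = 40000;

// Hypothetical helper: builds a count-only query for distinct resource IDs.
function buildUniqueResourcesQuery(
  index: string,
  precisionThreshold: number = MAX_PRECISION_THRESHOLD
): UniqueResourcesQuery {
  return {
    index,
    size: 0, // only the aggregation result is needed, not the hits
    aggs: {
      unique_resources: {
        cardinality: {
          field: 'resource.id',
          precision_threshold: Math.min(precisionThreshold, MAX_PRECISION_THRESHOLD),
        },
      },
    },
  };
}
```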

eyalkraft

Good Job Ido!
For the first iteration the most important comments are:

Necessary Decoupling of benchmarks and bucket
Necessary documentation of the core of the metering function

eyalkraft · 2023-07-24T07:16:49Z

x-pack/plugins/security_solution_serverless/server/cloud_security/cspm_metring_task.ts

+      aggs: {
+        unique_resources: {
+          cardinality: {
+            field: 'resource.id',


Let's add this comment in the code for now

eyalkraft · 2023-07-24T07:26:46Z

x-pack/plugins/security_solution_serverless/server/cloud_security/cspm_metring_task.ts

+  };
+}
+
+interface BenchmarkBucket {


Currently, it doesn't seem like there's going to be a bucket per benchmark.
For example, KSPM and CSPM belong to the same bucket. Additionally, CNVM isn't benchmark related at all.
I suggest decoupling benchmarks and buckets in terminology and implementation.

KSPM and CSPM belong to the same bucket

wdym? We are going to charge per solution, so we will have a billing bucket each for KSPM, CSPM, and vulnerability management. This PR handles CSPM only.

Here I mean the CSPM benchmark, which for now is cis_aws (soon we will support GCP as well). I did it to provide more context on the charge; it is used in the cause field.

To my understanding, we have a single bucket for KSPM + CSPM + CNVM as @MikePaquette described here https://github.com/elastic/security-team/issues/6497#issuecomment-1627719606

Sounds weird to me. In that case, will a KSPM resource cost the same as an EC2 instance in vulnerability management?

Anyway, this implementation also lets us support the method you mentioned via the billing transform function, and it lets us decouple the collector tasks, which I think will be easier to maintain.

I tend to disagree.
We know Kibana, Kibana tests and so on way better than we know whatever infrastructure that would be provided by the transform function. I think we should keep the transform function as lean and stupid as possible.
Additionally, Bucket is a metering specific term that isn't related to benchmarks.
We have no reason to count separately; doing so complicates the metering for no benefit.

I already updated the code to one single bucket for cloud security

eyalkraft · 2023-07-24T07:40:27Z

x-pack/plugins/security_solution_serverless/server/cloud_security/cspm_metring_task.ts

+    const projectId = cloudSetup?.serverless?.projectId || 'missing project id';
+    logger.error('no project id found');
+
+    const response = await esClient.search<unknown, Benchmarks>(getFindingsByResourceAggQuery());


We've discussed why querying isn't enough, and additional processing in the code is necessary. Please include the reasoning here as a comment with relevant links and a (very short) explanation of why the alternative approach of querying directly isn't used. (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#_counts_are_approximate)

@eyalkraft I researched that and learned that, behind the scenes, the terms aggregation uses the same logic as cardinality (terms-agg-doc-count-error).
So as you suggested in our discussion, querying is indeed enough, without another iteration in Kibana.

Good 👍
Is the code updated then and I didn't realize it? Why are there still iterations over the query results in the code?

For the benchmarks...
https://github.com/elastic/kibana/pull/162019/files#r1271927221

eyalkraft · 2023-07-24T07:42:10Z

x-pack/plugins/security_solution_serverless/server/cloud_security/metering_tasks_configs.ts

+  taskTitle: 'Security Solution - CSPM Metring Periodic Tasks',
+  meteringCallback: cspmMetringCallback,
+  interval: '3600s', // 1 hour
+  periodSeconds: 3600, // equal to task interval


Can we use the actual task interval variable here, to make sure things won't break in case it changes?
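
A minimal sketch of this suggestion, deriving `periodSeconds` from the interval string so the two cannot drift apart. The `intervalToSeconds` helper and the `CSPM_TASK_INTERVAL` constant are hypothetical, and the set of unit suffixes handled here (`s`, `m`, `h`, `d`) is an assumption rather than a statement about what Task Manager accepts.

```typescript
// Single source of truth for the task interval (hypothetical constant name).
const CSPM_TASK_INTERVAL = '3600s';

// Converts an interval such as '3600s' or '5m' to seconds.
function intervalToSeconds(interval: string): number {
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (!match) {
    throw new Error(`unsupported interval format: ${interval}`);
  }
  const secondsPerUnit: Record<string, number> = { s: 1, m: 60, h: 3600, d: 86400 };
  return Number(match[1]) * secondsPerUnit[match[2]];
}

const cspmTaskConfig = {
  taskTitle: 'Security Solution - CSPM Metring Periodic Tasks',
  interval: CSPM_TASK_INTERVAL,
  // Derived from the interval instead of being hard-coded next to it.
  periodSeconds: intervalToSeconds(CSPM_TASK_INTERVAL),
};
```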

eyalkraft · 2023-07-24T07:54:07Z

x-pack/plugins/security_solution_serverless/server/cloud_security/cspm_metring_task.ts

+      ? cspmBenchmarks.reduce(
+          (accumulator, benchmarkBucket: BenchmarkBucket) => {
+            accumulator.benchmarks.push(benchmarkBucket.key);
+            accumulator.sumResources += benchmarkBucket.unique_resources.value;


Again, not sure why we break down by benchmarks only to sum everything together later.

The purpose of this was to provide more context about the charge, which was used in the cause field.

The total number of benchmarks a customer may have will be no more than three or four in the future (today the maximum is ~2), so it will be a very small array iteration and I believe it is worth it.
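
The reduction discussed in this thread can be sketched in isolation as follows. `BenchmarkBucket` mirrors the fields visible in the diff (`key` and `unique_resources.value`); the `summarizeBenchmarks` wrapper and the `BenchmarkSummary` type are hypothetical names, and how the benchmark names are turned into the `cause` field is not shown in this thread, so it is left out.

```typescript
// Fields used by the reduce in the diff above.
interface BenchmarkBucket {
  key: string;
  unique_resources: { value: number };
}

// Hypothetical result type: benchmark names plus the summed resource count.
interface BenchmarkSummary {
  benchmarks: string[];
  sumResources: number;
}

// Collects benchmark names and sums unique resources across all buckets.
function summarizeBenchmarks(buckets: BenchmarkBucket[]): BenchmarkSummary {
  return buckets.reduce<BenchmarkSummary>(
    (accumulator, bucket) => {
      accumulator.benchmarks.push(bucket.key);
      accumulator.sumResources += bucket.unique_resources.value;
      return accumulator;
    },
    { benchmarks: [], sumResources: 0 }
  );
}
```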

eyalkraft · 2023-07-24T08:01:17Z

x-pack/plugins/security_solution_serverless/server/cloud_security/cspm_metring_task.ts

+
+    const usageRecords = {
+      id: `${TASK_TYPE_PREFIX}:${CSPM_METRING_TASK_NAME}`,
+      usage_timestamp: minTimestamp,


I'm not sure this should be the minTimestamp, and not the time when the billing query happens.
What are the implications of this value?
What happens if a user uninstalls all the benchmarks and this value stays the same (the last finding time before the uninstall) for the entire 24 hours after?
What is this value used for?

usage_timestamp
This represents when the usage occurred, or the beginning of the range of usage that this record captures. If the usage is based on an aggregation of other data stored in another system, it should be based on the earliest of those records so we can tie this usage record back to the source data (for troubleshooting or presentation purposes).

taken from usage record schema v2.
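
To make the schema wording concrete, here is a sketch that sets `usage_timestamp` to the earliest source timestamp and `creation_timestamp` to the report time. The record fields follow the log sample in the PR description; the `earliestTimestamp` and `buildUsageRecord` helpers are hypothetical, and how `minTimestamp` is actually computed in the PR is not shown in this thread.

```typescript
// Record shape taken from the log sample in the PR description.
interface UsageRecord {
  id: string;
  usage_timestamp: string;
  creation_timestamp: string;
  usage: { type: string; quantity: number; period_seconds: number; cause: string };
  source: { id: string; instance_group_id: string };
}

// Returns the earliest of the given ISO 8601 timestamps, or undefined if there are none.
function earliestTimestamp(timestamps: string[]): string | undefined {
  if (timestamps.length === 0) return undefined;
  return timestamps.reduce((min, current) =>
    Date.parse(current) < Date.parse(min) ? current : min
  );
}

// Hypothetical builder: usage_timestamp marks when usage began, creation_timestamp when it was reported.
function buildUsageRecord(params: {
  id: string;
  sourceId: string;
  projectId: string;
  minTimestamp: string;
  quantity: number;
  periodSeconds: number;
  cause: string;
  now: Date;
}): UsageRecord {
  return {
    id: params.id,
    usage_timestamp: params.minTimestamp,
    creation_timestamp: params.now.toISOString(),
    usage: {
      type: 'cspm-resource',
      quantity: params.quantity,
      period_seconds: params.periodSeconds,
      cause: params.cause,
    },
    source: { id: params.sourceId, instance_group_id: params.projectId },
  };
}
```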

eyalkraft · 2023-07-24T08:06:03Z

x-pack/plugins/security_solution_serverless/server/task_manager/usage_reporting_task.ts

+              run: async () => {
+                return this.runTask(taskInstance, core, meteringCallback);
+              },
+              // TODO


eyalkraft · 2023-07-24T08:07:11Z

x-pack/plugins/security_solution_serverless/server/task_manager/usage_reporting_task.ts

@@ -109,29 +104,35 @@ export class SecurityUsageReportingTask {
  ) => {
    // if task was not `.start()`'d yet, then exit
    if (!this.wasStarted) {
-      this.logger.debug('[runTask()] Aborted. Task not started yet');
      return;


wouldn't this be worth logging?

CohenIdo · 2023-07-24T08:45:45Z

Necessary documentation of the core of the metering function

@eyalkraft, I planned to document the work. Can you elaborate on what you mean by "the core of the metering function"?

eyalkraft · 2023-07-24T08:53:51Z

Talking about the file cspm_metring_task.ts, and specifically the query and the code responsible for the metering itself.

eyalkraft

Required changes:

On no resources, don't send anything to the metering service (don't send an empty array). Make sure that not sending is equivalent to sending count 0.

Following PR:

tests
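
The required change above (skip the call entirely when there is nothing to report) can be sketched as a small guard. The `sendIfAny` helper and the trimmed-down `UsageRecord` type are hypothetical; the sender is injected so the guard works with a synchronous stub or with an async call that returns a promise.

```typescript
// Trimmed-down record type, enough for the guard.
interface UsageRecord {
  id: string;
  usage: { type: string; quantity: number };
}

// Calls `send` only when there is at least one record; returns undefined otherwise.
function sendIfAny<T>(
  records: UsageRecord[],
  send: (records: UsageRecord[]) => T
): T | undefined {
  if (records.length === 0) {
    // Nothing to report: skip the request instead of sending an empty array.
    return undefined;
  }
  return send(records);
}
```

With an async sender, the caller would await the returned value when it is defined.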

eyalkraft

LGTM, please add the tests ASAP

kibana-ci · 2023-07-25T09:59:03Z

💚 Build Succeeded

Buildkite Build
Commit: 096a4d6

Metrics [docs]

✅ unchanged

History

💔 Build #144345 failed f0ecd32afaacc9651d3ea8c88e784791525e4e23
💔 Build #144300 failed 4ffa868
💔 Build #144223 failed 270abfb4234e54cc44f01107cdb607736390f5ac
💔 Build #143929 failed aae097fd7fe5bed1cfac0a08c19b1f57052ad0ef
💔 Build #143880 failed b015f355b2a86c3cb47721c8a891c6d14874394c

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

add logs and order files

2a6acaf

CohenIdo added the release_note:skip label Jul 17, 2023

working version of fetching resources before checking api

36e17a9

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from f831c5c to 36e17a9 Compare July 18, 2023 11:36

clean comments

619ae1b

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from 8d7fccc to 619ae1b Compare July 18, 2023 14:45

CohenIdo added 2 commits July 18, 2023 17:47

revert changes

cd85db9

revert changes

4d00c37

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from a91b3f1 to 4d00c37 Compare July 18, 2023 15:33

CohenIdo marked this pull request as ready for review July 18, 2023 15:38

CohenIdo requested a review from a team as a code owner July 18, 2023 15:38

CohenIdo requested a review from joeypoon July 18, 2023 15:39

CohenIdo added 2 commits July 19, 2023 13:16

working version woth calling to the usage api

9aaf719

Merge remote-tracking branch 'upstream/main' into serverless-metering-for-cloud-security

e69f8f0

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from e87f7b6 to e69f8f0 Compare July 19, 2023 10:17

working and clean version

23073e3

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from cfe526d to 23073e3 Compare July 20, 2023 12:50

aggergate using cardinallity

3e272d4

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from b015f35 to 3e272d4 Compare July 20, 2023 14:33

CohenIdo requested a review from eyalkraft July 20, 2023 14:38

joeypoon reviewed Jul 20, 2023

CohenIdo commented Jul 21, 2023

eyalkraft requested changes Jul 24, 2023

CohenIdo requested a review from eyalkraft July 24, 2023 08:49

pr comments

3be9a0f

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from aae097f to 3be9a0f Compare July 24, 2023 08:57

ready to review

61b3a9a

CohenIdo requested a review from joeypoon July 24, 2023 11:30

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from 270abfb to 61b3a9a Compare July 24, 2023 12:10

refactor to one task manager for all cloud security solutions

903e6a2

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from cded412 to 903e6a2 Compare July 24, 2023 13:20

CohenIdo added 3 commits July 24, 2023 16:24

update log

4ffa868

.

4525e54

remove debug code

6143c36

eyalkraft requested changes Jul 25, 2023

CohenIdo added 2 commits July 25, 2023 10:52

handle empty response

3438b55

linting

2886e68

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from f0ecd32 to 2886e68 Compare July 25, 2023 08:00

Merge branch 'main' into serverless-metering-for-cloud-security

1e87ed5

CohenIdo requested a review from eyalkraft July 25, 2023 08:00

kibanamachine added 2 commits July 25, 2023 08:06

[CI] Auto-commit changed files from 'node scripts/lint_ts_projects --fix'

e4bfc4b

[CI] Auto-commit changed files from 'node scripts/eslint --no-cache --fix'

096a4d6

eyalkraft approved these changes Jul 25, 2023

CohenIdo merged commit 707ed13 into elastic:main Jul 25, 2023

kibanamachine added the v8.10.0 and backport:skip labels Jul 25, 2023

ThomThomson pushed a commit to ThomThomson/kibana that referenced this pull request Aug 1, 2023

[serverless] add cspm metering functionality (elastic#162019)

1e11b83
