
[serverless] add cspm metering functionality #162019

Merged
20 commits merged on Jul 25, 2023


CohenIdo (Contributor) commented Jul 17, 2023

solves

First usage of the shared task manager class for the security serverless project:

  • Registered the CSPM metering callback
  • Added logs to the task manager service
  • Updated the interfaces

How to test it?

  1. Follow the instructions provided in: https://github.com/elastic/security-team/issues/7025 to deploy the development serverless environment.
  2. Checkout the branch
  3. Start Kibana using the following command: yarn start --serverless=security (make sure Elasticsearch is already running first).
  4. Check the Kibana logs.
    Look for the following logger: plugins.securitySolutionServerless.serverless-security:cspm-usage-reporting-task:1
    Review the logs from this logger to gather the required information.
[2023-07-20T17:50:38.552+03:00][INFO ][plugins.securitySolutionServerless.serverless-security:cspm-usage-reporting-task:1] usage records report was sent successfully
[2023-07-20T17:51:11.365+03:00][INFO ][plugins.securitySolutionServerless.serverless-security:cspm-usage-reporting-task:1] received usage records: [{"id":"serverless-security:cspm-usage-report","usage_timestamp":"2023-07-20T10:47:54.506Z","creation_timestamp":"7/20/2023, 5:51:11 PM","usage":{"type":"cspm-resource","quantity":1219,"period_seconds":3600,"cause":"CIS Amazon Web Services Foundations"},"source":{"id":"serverless-security:cspm-usage-reporting-task:1","instance_group_id":"missing project id"}}]
[2023-07-20T17:51:11.572+03:00][INFO ][plugins.securitySolutionServerless.serverless-security:cspm-usage-reporting-task:1] usage records report was sent successfully
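The second log line above shows the shape of one usage record. As a hedged sketch, the TypeScript below mirrors only the fields visible in that logged JSON; the interface names and the `buildCspmUsageRecord` helper are hypothetical, not the plugin's actual types. One assumption to flag: the sketch writes `creation_timestamp` as an ISO 8601 string, whereas the logged record shows a locale-formatted value (`7/20/2023, 5:51:11 PM`).

```typescript
// Hypothetical types mirroring the fields visible in the logged usage record.
// They are not the plugin's actual definitions.
interface UsageMetrics {
  type: string; // e.g. "cspm-resource"
  quantity: number; // number of unique resources in the period
  period_seconds: number; // length of the metering period
  cause?: string; // human-readable context, e.g. the benchmark name
}

interface UsageSource {
  id: string; // the reporting task id
  instance_group_id: string; // the serverless project id
}

interface UsageRecord {
  id: string;
  usage_timestamp: string;
  creation_timestamp: string;
  usage: UsageMetrics;
  source: UsageSource;
}

// Hypothetical helper that assembles one CSPM record from the metering results.
function buildCspmUsageRecord(
  quantity: number,
  periodSeconds: number,
  usageTimestamp: string,
  cause: string,
  projectId: string
): UsageRecord {
  return {
    id: 'serverless-security:cspm-usage-report',
    usage_timestamp: usageTimestamp,
    // Assumption: ISO 8601 here, unlike the locale string in the log above.
    creation_timestamp: new Date().toISOString(),
    usage: { type: 'cspm-resource', quantity, period_seconds: periodSeconds, cause },
    source: {
      id: 'serverless-security:cspm-usage-reporting-task:1',
      instance_group_id: projectId,
    },
  };
}
```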

CohenIdo added the release_note:skip label (Skip the PR/issue when compiling release notes) Jul 17, 2023
CohenIdo force-pushed the serverless-metering-for-cloud-security branch from f831c5c to 36e17a9 July 18, 2023 11:36
CohenIdo force-pushed the serverless-metering-for-cloud-security branch from 8d7fccc to 619ae1b July 18, 2023 14:45
CohenIdo force-pushed the serverless-metering-for-cloud-security branch from a91b3f1 to 4d00c37 July 18, 2023 15:33
CohenIdo marked this pull request as ready for review July 18, 2023 15:38
CohenIdo requested a review from a team as a code owner July 18, 2023 15:38
CohenIdo requested a review from joeypoon July 18, 2023 15:39
CohenIdo force-pushed the serverless-metering-for-cloud-security branch from e87f7b6 to e69f8f0 July 19, 2023 10:17
CohenIdo force-pushed the serverless-metering-for-cloud-security branch from cfe526d to 23073e3 July 20, 2023 12:50
CohenIdo force-pushed the serverless-metering-for-cloud-security branch from b015f35 to 3e272d4 July 20, 2023 14:33
CohenIdo requested a review from eyalkraft July 20, 2023 14:38
joeypoon (Member) left a comment

looking good, just some small things

@@ -9,4 +9,4 @@
const namespace = 'elastic-system';
const USAGE_SERVICE_BASE_API_URL = `http://usage-api.${namespace}/api`;
const USAGE_SERVICE_BASE_API_URL_V1 = `${USAGE_SERVICE_BASE_API_URL}/v1`;
-export const USAGE_SERVICE_USAGE_URL = `${USAGE_SERVICE_BASE_API_URL_V1}/usage`;
+export const USAGE_SERVICE_USAGE_URL = `${USAGE_SERVICE_BASE_API_URL_V1}`;
Member

Contributor Author

You're right :)

meteringCallback: MeteringCallback;
}

export interface SecurityMetadataTaskStartContract {

Member

Let's rename this to SecurityUsageReportingTaskStartContract.

Contributor Author

done:)


let usageReportResponse: Response | undefined;

try {
usageReportResponse = await usageReportingService.reportUsage(usageRecords);
} catch (e) {
this.logger.warn(JSON.stringify(e));
this.logger.info(`usage records report was sent successfully`);

Member

I'm thinking it might make sense to log the response and some of the usage record data on success for compliance and billing debugging.

Contributor Author

It's a good idea to add the response. Regarding the usage record data, we already log it in line 127.

taskId: this.taskId,
lastSuccessfulReport,
});
this.logger.info(`received usage records: ${JSON.stringify(usageRecords)}`);

Member

I worry about the extra resources/cost we would incur here, especially if there are a lot of records. Maybe logger.debug instead?

Contributor Author

I agree
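The two logging suggestions in this thread (log the response on success, and log the full records only at debug level) could look roughly like the sketch below. `MinimalLogger`, `ReportResponse`, and `reportAndLog` are hypothetical stand-ins rather than the plugin's actual types; the messages reuse the wording of the existing log lines. The status check assumes a fetch-style client, where a non-2xx response resolves instead of throwing.

```typescript
// Minimal logger shape assumed for this sketch; Kibana's Logger also provides
// debug(), info() and warn(), among other methods.
interface MinimalLogger {
  debug(message: string): void;
  info(message: string): void;
  warn(message: string): void;
}

// Hypothetical subset of the HTTP response returned by the usage API call.
interface ReportResponse {
  status: number;
  statusText: string;
}

type ReportUsage = (records: unknown[]) => Promise<ReportResponse>;

async function reportAndLog(
  records: unknown[],
  reportUsage: ReportUsage,
  logger: MinimalLogger
): Promise<ReportResponse | undefined> {
  // Full payloads can be large, so they are only logged at debug level.
  logger.debug(`received usage records: ${JSON.stringify(records)}`);
  try {
    const response = await reportUsage(records);
    const summary = `records: ${records.length}, status: ${response.status} ${response.statusText}`;
    // With fetch-style clients a non-2xx status does not throw, so check it
    // before claiming success.
    if (response.status >= 200 && response.status < 300) {
      logger.info(`usage records report was sent successfully (${summary})`);
    } else {
      logger.warn(`usage records report was rejected (${summary})`);
    }
    return response;
  } catch (e) {
    logger.warn(`failed to send usage records report: ${e instanceof Error ? e.message : String(e)}`);
    return undefined;
  }
}
```

Keeping the info line to a count and a status keeps production logs small while still giving billing investigations a per-run trace.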

aggs: {
unique_resources: {
cardinality: {
field: 'resource.id',

Contributor Author

A precision threshold should be added here.

Contributor

Let's add this comment in the code for now

Contributor Author

CohenIdo Jul 24, 2023

I planned to add a value as part of this PR; I am going to add it today.
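A hedged sketch of where a precision threshold would sit in the cardinality aggregation. The threshold value, the index parameter, the lookback filter, the `rule.benchmark.name` terms field, and the builder name are all assumptions for illustration; only `resource.id` and the `unique_resources` cardinality aggregation come from the excerpt above, and the PR's own getFindingsByResourceAggQuery may differ. Per the Elasticsearch documentation, `precision_threshold` defaults to 3000 and is capped at 40000.

```typescript
// Assumed value for illustration, not taken from the PR. Elasticsearch accepts
// up to 40000; larger values behave like 40000, and the default is 3000.
const CARDINALITY_PRECISION_THRESHOLD = 40000;

interface FindingsAggQuery {
  index: string;
  size: number;
  query: Record<string, unknown>;
  aggs: Record<string, unknown>;
}

// Hypothetical query builder: unique resources per benchmark.
function buildFindingsByResourceAggQuery(index: string, lookback: string): FindingsAggQuery {
  return {
    index,
    size: 0, // only aggregations are needed, not individual hits
    query: {
      bool: { filter: [{ range: { '@timestamp': { gte: `now-${lookback}` } } }] },
    },
    aggs: {
      benchmarks: {
        terms: { field: 'rule.benchmark.name' }, // assumed field name
        aggs: {
          unique_resources: {
            cardinality: {
              field: 'resource.id',
              // Counts below this threshold are expected to be close to exact;
              // above it, the HyperLogLog++ estimate becomes approximate.
              precision_threshold: CARDINALITY_PRECISION_THRESHOLD,
            },
          },
        },
      },
    },
  };
}
```

A higher threshold trades memory for accuracy, which matters here because the count is billed directly.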

eyalkraft (Contributor) left a comment

Good job, Ido!
For the first iteration, the most important comments are:

  1. Necessary decoupling of benchmarks and buckets
  2. Necessary documentation of the core of the metering function

};
}

interface BenchmarkBucket {

Contributor

Currently, it doesn't seem like there's going to be a bucket per benchmark.
For example, KSPM and CSPM belong to the same bucket. Additionally, CNVM isn't benchmark-related at all.
I suggest decoupling benchmarks and buckets in both terminology and implementation.

Contributor Author

Regarding "KSPM and CSPM belong to the same bucket": what do you mean? We are going to charge per solution, so we will have a billing bucket for each of KSPM, CSPM, and vulnerability management. This PR handles CSPM only.

What I'm referring to here is the CSPM benchmark, which for now is cis_aws (and soon we will support GCP as well). I did this to provide more context on the charge; it is used in the cause field.

Contributor

To my understanding, we have a single bucket for KSPM + CSPM + CNVM as @MikePaquette described here https://github.com/elastic/security-team/issues/6497#issuecomment-1627719606

Contributor Author

CohenIdo Jul 24, 2023

That sounds weird to me. In that case, would a KSPM resource be priced the same as an EC2 instance under vulnerability management?

Anyway, this implementation will also let us support the approach you mentioned via the billing transform function, and it lets us decouple the collector tasks, which I think will be easier to maintain.

Contributor

I tend to disagree.
We know Kibana, Kibana tests, and so on much better than whatever infrastructure the transform function would provide. I think we should keep the transform function as lean and stupid as possible.
Additionally, "bucket" is a metering-specific term that isn't related to benchmarks.
We have no reason to count separately; it complicates the metering for no benefit.

Contributor Author

I've already updated the code to use a single bucket for cloud security.

const projectId = cloudSetup?.serverless?.projectId || 'missing project id';
logger.error('no project id found');

const response = await esClient.search<unknown, Benchmarks>(getFindingsByResourceAggQuery());

Contributor

We've discussed why querying isn't enough and why additional processing in the code is necessary. Please include the reasoning here as a comment, with relevant links and a (very short) explanation of why the alternative approach of querying directly isn't used. (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#_counts_are_approximate)

Contributor Author

CohenIdo Jul 24, 2023

@eyalkraft I researched that and learned that, behind the scenes, the terms aggregation uses the same logic as the cardinality aggregation (terms-agg-doc-count-error).
So, as you suggested in our discussion, querying is indeed enough, without another iteration in Kibana.

Contributor

Good 👍
Is the code updated then and I didn't realize it? Why are there still iterations over the query results in the code?

Contributor Author

taskTitle: 'Security Solution - CSPM Metring Periodic Tasks',
meteringCallback: cspmMetringCallback,
interval: '3600s', // 1 hour
periodSeconds: 3600, // equal to task interval

Contributor

Can we use the actual task interval variable here, to make sure things won't break if it changes?

Contributor Author

👍
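One way to act on the interval suggestion is to keep the interval string as the single source of truth and derive `periodSeconds` from it. A minimal sketch, assuming the task interval uses a `<number><unit>` format with `s`, `m`, `h`, or `d` units; `intervalToSeconds`, `CSPM_TASK_INTERVAL`, and `cspmTaskConfig` are hypothetical names, and the supported units should be checked against what the task manager actually accepts.

```typescript
// Assumed unit suffixes; check them against the formats the task manager accepts.
const UNIT_SECONDS: Record<string, number> = { s: 1, m: 60, h: 3600, d: 86400 };

// Hypothetical helper: converts an interval such as '3600s' or '1h' to seconds.
function intervalToSeconds(interval: string): number {
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (!match) {
    throw new Error(`Unsupported interval format: ${interval}`);
  }
  return Number(match[1]) * UNIT_SECONDS[match[2]];
}

// Single source of truth for the schedule.
const CSPM_TASK_INTERVAL = '3600s'; // 1 hour

const cspmTaskConfig = {
  interval: CSPM_TASK_INTERVAL,
  // Derived rather than hard-coded, so it cannot drift from the interval.
  periodSeconds: intervalToSeconds(CSPM_TASK_INTERVAL),
};
```

Throwing on an unknown format surfaces a misconfigured interval at startup instead of silently reporting a wrong period.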

? cspmBenchmarks.reduce(
(accumulator, benchmarkBucket: BenchmarkBucket) => {
accumulator.benchmarks.push(benchmarkBucket.key);
accumulator.sumResources += benchmarkBucket.unique_resources.value;

Contributor

Again, I'm not sure why we break the count down by benchmark only to sum it all together later.

Contributor Author

CohenIdo Jul 24, 2023

The purpose of this was to provide more context about the charge, which is used in the cause field.

In the future, a customer will have no more than three or four benchmarks in total (today it is a maximum of ~2), so iterating over this array is very cheap, and I believe it is worth it.
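The reduce in the excerpt above can be read as the following self-contained sketch. The bucket shape (`key`, `doc_count`, `unique_resources.value`) is assumed from the excerpt and the usual terms/cardinality response format; `BenchmarkSummary` and `summarizeBenchmarks` are hypothetical names.

```typescript
// Shapes assumed from the excerpt: each terms bucket carries the benchmark
// name as `key` and a cardinality sub-aggregation `unique_resources`.
interface BenchmarkBucket {
  key: string;
  doc_count: number;
  unique_resources: { value: number };
}

interface BenchmarkSummary {
  benchmarks: string[]; // kept only to build the `cause` string
  sumResources: number;
}

function summarizeBenchmarks(buckets: BenchmarkBucket[]): BenchmarkSummary {
  return buckets.reduce<BenchmarkSummary>(
    (acc, bucket) => {
      acc.benchmarks.push(bucket.key);
      acc.sumResources += bucket.unique_resources.value;
      return acc;
    },
    { benchmarks: [], sumResources: 0 }
  );
}
```

One property worth keeping in mind for the discussion above: if the same `resource.id` could ever appear under more than one benchmark, summing per-bucket cardinalities counts it once per benchmark, whereas a single top-level cardinality aggregation would count it once.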


const usageRecords = {
id: `${TASK_TYPE_PREFIX}:${CSPM_METRING_TASK_NAME}`,
usage_timestamp: minTimestamp,

Contributor

I'm not sure this should be the minTimestamp rather than the time when the billing query runs.
What are the implications of this value?
What happens if a user uninstalls all the benchmarks and this value stays the same (the last finding time before the uninstall) for the entire 24 hours after?
What is this value used for?

Contributor Author

usage_timestamp
This represents when the usage occurred, or the beginning of the range of usage that this record captures. If the usage is based on an aggregation of other data stored in another system, it should be based on the earliest of those records so we can tie this usage record back to the source data (for troubleshooting or presentation purposes).

taken from usage record schema v2.
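Read against that schema note, `usage_timestamp` is the earliest source record and `creation_timestamp` is when the record is built. A hedged sketch of deriving both from a `min` aggregation over the findings' `@timestamp`: the aggregation itself, the `resolveTimestamps` helper, and returning `undefined` for a period with no findings are assumptions, not the PR's code. Elasticsearch returns `value: null` from a `min` aggregation when no documents match, which is how an empty period (such as the uninstall case raised above) shows up.

```typescript
// Subset of an Elasticsearch `min` aggregation result on a date field:
// epoch milliseconds, or null when no documents matched.
interface MinAggregate {
  value: number | null;
}

// Hypothetical helper: returns undefined when there is nothing to meter.
function resolveTimestamps(
  minAgg: MinAggregate,
  now: Date = new Date()
): { usage_timestamp: string; creation_timestamp: string } | undefined {
  if (minAgg.value === null) {
    return undefined; // no findings in the period, so no usage record
  }
  return {
    usage_timestamp: new Date(minAgg.value).toISOString(), // earliest source record
    creation_timestamp: now.toISOString(), // when this record was built
  };
}
```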

run: async () => {
return this.runTask(taskInstance, core, meteringCallback);
},
// TODO

Contributor

TODO what?

@@ -109,29 +104,35 @@ export class SecurityUsageReportingTask {
) => {
// if task was not `.start()`'d yet, then exit
if (!this.wasStarted) {
this.logger.debug('[runTask()] Aborted. Task not started yet');
return;

Contributor

eyalkraft Jul 24, 2023

Wouldn't this be worth logging?

CohenIdo (Contributor Author)

Regarding "Necessary documentation of the core of the metering function": @eyalkraft, I planned to document the work. Can you elaborate on what you mean by "the core of the metering function"?

CohenIdo requested a review from eyalkraft July 24, 2023 08:49
eyalkraft (Contributor) commented Jul 24, 2023

I'm talking about the file cspm_metring_task.ts, specifically the query and the code responsible for the metering itself.

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from aae097f to 3be9a0f July 24, 2023 08:57
CohenIdo requested a review from joeypoon July 24, 2023 11:30
CohenIdo force-pushed the serverless-metering-for-cloud-security branch from 270abfb to 61b3a9a July 24, 2023 12:10
CohenIdo force-pushed the serverless-metering-for-cloud-security branch from cded412 to 903e6a2 July 24, 2023 13:20
eyalkraft (Contributor) left a comment

Required changes:

  1. When there are no resources, don't send anything to the metering service (don't send an empty array). Make sure that not sending anything is equivalent to sending a count of 0.

Following PR:

  1. tests
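The first required change above amounts to a guard before the HTTP call. A minimal sketch, assuming (as the reviewer asks to verify) that the usage service treats a missing report the same as a report with a count of 0; `reportIfAny`, `MeteredRecord`, and the `quantity > 0` filter are hypothetical.

```typescript
// Hypothetical minimal record shape: only the metered quantity matters here.
interface MeteredRecord {
  quantity: number;
}

// Sends only records with a positive quantity, and makes no request at all
// when nothing is left, instead of posting an empty array.
async function reportIfAny<T extends MeteredRecord>(
  records: T[],
  send: (records: T[]) => Promise<void>
): Promise<boolean> {
  const nonEmpty = records.filter((record) => record.quantity > 0);
  if (nonEmpty.length === 0) {
    return false; // assumed equivalent to reporting 0 for this period
  }
  await send(nonEmpty);
  return true;
}
```

Returning a boolean lets the caller log whether a report actually went out, which also addresses the "wouldn't this be worth logging?" question for the skip path.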

CohenIdo force-pushed the serverless-metering-for-cloud-security branch from f0ecd32 to 2886e68 July 25, 2023 08:00
CohenIdo requested a review from eyalkraft July 25, 2023 08:00
eyalkraft (Contributor) left a comment

LGTM, please add the tests ASAP

kibana-ci (Collaborator)

💚 Build Succeeded

Metrics

✅ unchanged

CohenIdo merged commit 707ed13 into elastic:main Jul 25, 2023
kibanamachine added the v8.10.0 and backport:skip (This commit does not require backporting) labels Jul 25, 2023
ThomThomson pushed a commit to ThomThomson/kibana that referenced this pull request Aug 1, 2023
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes v8.10.0
5 participants