Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: log rate limits #2900

Merged
merged 20 commits into from
Sep 7, 2022
Merged

Conversation

ctlong
Copy link
Member

@ctlong ctlong commented Aug 3, 2022

Introduce log rate limits to org and space quotas, and tasks and processes.

cloudfoundry/capi-release#245

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run CF Acceptance Tests

cc/ @sethboyles

@ctlong ctlong force-pushed the log-rate-limit-rebased branch 2 times, most recently from e42958b to e341e31 Compare August 4, 2022 22:15
@mkocher mkocher force-pushed the log-rate-limit-rebased branch from aea5d28 to 820ebce Compare August 19, 2022 17:43
@mkocher mkocher marked this pull request as ready for review August 19, 2022 17:53
@mkocher
Copy link
Member

mkocher commented Aug 19, 2022

@sethboyles we think this is ready to go! Can you please take a look?

@mkocher mkocher force-pushed the log-rate-limit-rebased branch from 820ebce to 76e9f9e Compare August 19, 2022 18:47
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Aug 19, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@mkocher mkocher marked this pull request as draft August 19, 2022 18:52
ctlong and others added 10 commits August 19, 2022 18:53
Updates the `/v3/organization_quotas/` and
`/v3/space_quotas/` endpoints to allow setting and retrieving of a new
parameter (`log_rate_limit_in_bytes_per_second`). This will eventually
permit the user to set log line production limits in bytes per second,
rather than lines per second.

Updates v3/processes and v3/tasks endpoints to support
`log_rate_limit_in_bytes_per_second`

An unlimited log rate limit is represented as -1

Tracker Story ID: [#182311424]
Tracker Story ID: [#182353823]
Tracker Story ID: [#182311433]
Tracker Story ID: [#182624538]
Tracker Story ID: [#182624530]
Github Issue: cloudfoundry/capi-release#245

Signed-off-by: Carson Long <[email protected]>
Signed-off-by: Kenneth Lakin <[email protected]>
Signed-off-by: Duane May <[email protected]>
Signed-off-by: Matthew Kocher <[email protected]>
Signed-off-by: Ben Fuller <[email protected]>
Signed-off-by: Seth Boyles <[email protected]>
Co-authored-by: Matthew Kocher <[email protected]>
- We accept -1 without a byte suffix to mean unlimited
- We also return -1 in the rendered manifest

[#182624538](https://www.pivotaltracker.com/story/show/182624538)
Github Issue: cloudfoundry/capi-release#245

Co-authored-by: Rebecca Roberts <[email protected]>
Co-authored-by: Andrew Crump <[email protected]>
- We found scaling_operation? to be a misleading name
- Just refer to started? to make it clearer when validation will be
  performed

[#182624538](https://www.pivotaltracker.com/story/show/182624538)

Co-authored-by: Andrew Crump <[email protected]>
- The log rate limit from the application web process is applied to
  the staging task
- The staging log rate limit can be customized when creating a build,
  for consistency with memory and disk limits

[#182311441](https://www.pivotaltracker.com/story/show/182311441)

Co-authored-by: Rebecca Roberts <[email protected]>
- Found that if there is a new version of Diego and an old version of
  cloud controller, the field would not be provided and defaulted to 0.
- Wrapped the integer in a MessageType so it would default to null.

Co-authored-by: Duane May <[email protected]>
* Add Log Rate container metrics. These metrics allow users to see how
  much each application is logging.

* Remove support for getting metrics from Traffic Controller. Log Cache
  has been the default for retreiving metrics for a long time. Removing
  "temporary" config flag and support for the old way of connecting to
  Traffic Controller for metrics

We did the two above items together because we did not want to add
support for getting the new container metrics from Traffic Controller.
The code was manually creating Traffic Controller protobufs and shoving
Log Cache data into them, which we had to move away from to make this
work without delving into Traffic Controller protobufs.

Signed-off-by: Duane May <[email protected]>
Co-authored-by: Duane May <[email protected]>
Signed-off-by: Matthew Kocher <[email protected]>
Co-authored-by: Matthew Kocher <[email protected]>
Signed-off-by: Ben Fuller <[email protected]>
Co-authored-by: Ben Fuller <[email protected]>
All our archeology seems to indicate that dashes should be preferred in
manifests as they are yaml and yaml prefers kebab-case to snake_case.

We also uncovered some oddities around setting process level properties
on the top level app object. We added tests to document that disk and
memory don't show up in the diff when changed on the app instead of the
process. We also discovered that proprties like health-check that
shouldbe kebab-case were not working at the app level and fixed them to
be consistent.

Signed-off-by: Rebecca Roberts <[email protected]>
Co-authored-by: Rebecca Roberts <[email protected]>
@mkocher mkocher force-pushed the log-rate-limit-rebased branch from 76e9f9e to be468d6 Compare August 19, 2022 18:54
@rroberts2222 rroberts2222 force-pushed the log-rate-limit-rebased branch from be468d6 to d1f388c Compare August 19, 2022 20:38
@sethboyles
Copy link
Member

Awesome! I'll go ahead and mark it as 'ready for review'

@sethboyles sethboyles marked this pull request as ready for review August 19, 2022 21:35
Copy link
Member

@moleske moleske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at this in detail but have some questions just from a 5 minute skim

@@ -0,0 +1,5 @@
Sequel.migration do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need five separate migrations? I assume they came to be because of the many commits, but if all five are needed for the feature, it seems like it should be one migration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I pushed a change to combine the migrations.

lib/diego/bbs/models/desired_lrp_pb.rb Show resolved Hide resolved
mkocher and others added 3 commits August 29, 2022 16:41
Signed-off-by: Rebecca Roberts <[email protected]>
Co-authored-by: Rebecca Roberts <[email protected]>

wip: group with app manifest message updates for unlimited
- Adds validation to messages for builds, manifest processes and tasks
- Avoids 'An unknown error occurred' caused by the database insert
  failing due to an out of range error.

[#182969510](https://www.pivotaltracker.com/story/show/182969510)

Co-authored-by: Matthew Kocher <[email protected]>
Fixes:
  Using the `raise_error` matcher without providing a specific error or
  message risks false positives

[#182969510](https://www.pivotaltracker.com/story/show/182969510)

Co-authored-by: Rebecca Roberts <[email protected]>
@mkocher mkocher force-pushed the log-rate-limit-rebased branch from e8afa9c to e30a91f Compare August 29, 2022 16:41
@jdgonzaleza
Copy link

Hello! Please let the cli team when this PR since it's related with cloudfoundry/cli#2303

@sethboyles
Copy link
Member

@Benjamintf1 for the docs, should we mark this as experimental or upcoming? Are the changes in upstream components like Diego merged in?

@ctlong
Copy link
Member Author

ctlong commented Sep 1, 2022

@sethboyles I'd lean towards no, but defer to your process for introducing new features. The Diego work has been merged, and released in v2.66.0. That's the only upstream component.

Note: Ben's on vacation for the next couple weeks.

@sethboyles
Copy link
Member

That's good enough for me. Thanks!

@sethboyles
Copy link
Member

Hey @ctlong ,

We are seeing a couple of errors with manifests.

When supplying -1 or 0 for log-rate-limit-per-second, we are getting 500s from the manifest_diff and apply_manifest endpoints. It seems like strip is being called on the value, which errors because 0 and -1 are integers, not strings.

The other issue we are seeing is if we set disk_quota to 0. Normally this errors with

For application 'dora2': Process "web": Disk quota must use a supported unit: B, K, KB, M, MB, G, GB, T, or TB
FAILED

However, with this branch, we are seeing this error rendered in the manifest_diff:

Pushing app dora to org org / space space as admin...
Applying manifest file dora-manifest2.yml...

Updating with these attributes...
  ---
  applications:
  - name: dora
    processes:
-   - disk_quota: 1024M
+   - disk_quota: 'disk_quota must use a supported unit: B, K, KB, M, MB, G, GB, T, or TB'
      health-check-type: port
      instances: 1
      memory: 256M
      type: web

And then the error is not picked up by apply_manifest and then a deployment continues, despite having an invalid manifest. Weirdly this only happens with 0 and not -1 or other integers. We double checked that this error didn't occur on envs with our latest release.

We looked at the code a little and are not sure how this odd error happened

acrmp and others added 2 commits September 1, 2022 22:18
The cf CLI would provide these values as strings, however directly
curling the endpoint with the values as integers would result in a 500.

Co-authored-by: Carson Long <[email protected]>
@acrmp
Copy link
Member

acrmp commented Sep 1, 2022

@sethboyles Thanks! We pushed some more changes to address these issues.

@sethboyles
Copy link
Member

@acrmp thanks! Confirmed that both issues are fixed.

Sending an int value like 9999 for log-rate-limit-per-second causes a 500. This is the error:

undefined method `strip' for 9999:Integer
byte_converter.convert_to_b(human_readable_byte_value.strip)

I think coercing human_readable_byte_value to a string at the beginning of that method would protect against that.

(oh the joys of yaml...)

@acrmp
Copy link
Member

acrmp commented Sep 2, 2022

@sethboyles Thanks. I pushed a fix.

@sethboyles
Copy link
Member

Noticed a 500 on processes/scale when sending null:

Mysql2::Error: Column 'log_rate_limit' cannot be null

@sethboyles
Copy link
Member

checked the other endpoints:
builds
tasks
space_quotas
org_quotas
--they all seemed fine 👍

@duanemay
Copy link
Member

duanemay commented Sep 7, 2022

@sethboyles While looking at the issue with sending null for log_rate_limit_in_bytes_per_second we also noticed an existing issue with sending null for memory_in_mb

cf curl /v3/apps/$(cf app dora --guid)/processes/web/actions/scale -X POST -d '{ "memory_in_mb": null}'

@sethboyles
Copy link
Member

sethboyles commented Sep 7, 2022

We noticed it with instances, too. I think that is related to the memory issue since it tries to do some math with those two values. I didn't think null is supposed to be valid for instances, not sure about memory_in_mb. Is null valid in the scale endpoint for log_rate_limit_in_bytes_per_second?

@sethboyles
Copy link
Member

that is to say I'm not super concerned if it's not supposed to be valid. If it's not, we can toss a generic story in the backlog to deal with all three of these fields if you'd rather get this PR merged sooner than later

@duanemay
Copy link
Member

duanemay commented Sep 7, 2022

@sethboyles We don't think null is valid in this case. But we have put in a fix so that the error is not produced in this case.

@acrmp & Duane

@sethboyles sethboyles merged commit 5dba613 into cloudfoundry:main Sep 7, 2022
@acrmp acrmp deleted the log-rate-limit-rebased branch September 7, 2022 22:46
moleske added a commit to cloudfoundry/capi-bara-tests that referenced this pull request Sep 8, 2022
- this [pr](cloudfoundry/cloud_controller_ng#2900) added log rate limits which are now part of the expected
manifest

Co-authored-by: Michael Oleske <[email protected]>
Co-authored-by: David Alvarado <[email protected]>
MerricdeLauney added a commit that referenced this pull request Sep 26, 2022
We stopped using it as of #2900

Co-authored-by: Joseph Palermo <[email protected]>
Co-authored-by: Merric de Launey <[email protected]>
MerricdeLauney added a commit that referenced this pull request Sep 27, 2022
We stopped using it as of #2900

Co-authored-by: Joseph Palermo <[email protected]>
Co-authored-by: Merric de Launey <[email protected]>
MerricdeLauney added a commit that referenced this pull request Sep 27, 2022
We stopped using it as of #2900

Co-authored-by: Joseph Palermo <[email protected]>
Co-authored-by: Merric de Launey <[email protected]>

Co-authored-by: Joseph Palermo <[email protected]>
will-gant pushed a commit to sap-contributions/cloud_controller_ng that referenced this pull request Dec 16, 2022
We stopped using it as of cloudfoundry#2900

Co-authored-by: Joseph Palermo <[email protected]>
Co-authored-by: Merric de Launey <[email protected]>

Co-authored-by: Joseph Palermo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants