Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FUN-990] s4 observability improvements #11512

Merged
merged 9 commits into from
Dec 13, 2023

Conversation

agparadiso
Copy link
Contributor

@agparadiso agparadiso commented Dec 7, 2023

Description

This PR addresses ticket FUN-990 Making some observability improvements

How was it tested

  • Changes on orm logic are covered with unit test
  • Changes related to observability where verity locally checking the /metrics endpoint

Copy link
Contributor

github-actions bot commented Dec 7, 2023

I see that you haven't updated any README files. Would it make sense to do so?

@agparadiso agparadiso self-assigned this Dec 7, 2023
@@ -195,6 +205,8 @@ func (h *functionsConnectorHandler) handleSecretsSet(ctx context.Context, gatewa
if err == nil {
response.Success = true
promStorageUserUpdatesCount.WithLabelValues(body.DonId).Inc()
promStorageTotalSize.WithLabelValues(body.DonId).Add(float64(len(record.Payload)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a correct way to calculate total storage because:

  • SecretsSet can overwrite an existing entry, not adding anything to total storage size
  • Entries expire and reduce total size asynchronously.

I suggested earlier to base it on the snapshot size, try to explore that please.

@agparadiso agparadiso requested a review from bolekk December 7, 2023 19:32
@agparadiso agparadiso force-pushed the chore/FUN-990_S4_observability_improv branch from 195350f to c823b16 Compare December 7, 2023 20:17
core/services/functions/connector_handler.go Outdated Show resolved Hide resolved
core/services/ocr2/plugins/s4/plugin.go Outdated Show resolved Hide resolved
@@ -59,13 +74,16 @@ func (c *plugin) Query(ctx context.Context, ts types.ReportTimestamp) (types.Que
return nil, errors.Wrap(err, "failed to GetVersions in Query()")
}

var storageTotalByteSize int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a larger type for extra safety

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 changed to *big.Int

@@ -25,6 +25,7 @@ type SnapshotRow struct {
Version uint64
Expiration int64
Confirmed bool
Payload []byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't include the whole payload in the snapshot. That will make it way too big.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 makes sense, will only fetch the size of the payload with octet_length(payload).

I've evaluated also using pg_column_size(payload) which reflects the actual space disk by including the field size + metadata overhead + padding. But this would difficult testing given that the len(payload) we are doing won't be able to take this ~1byte extra into account.

@agparadiso agparadiso requested a review from bolekk December 8, 2023 11:31
@agparadiso agparadiso marked this pull request as ready for review December 8, 2023 11:32
@agparadiso agparadiso requested review from a team, jmank88 and krehermann as code owners December 8, 2023 11:32
core/services/ocr2/plugins/s4/plugin.go Outdated Show resolved Hide resolved
core/services/ocr2/plugins/s4/plugin.go Show resolved Hide resolved
@@ -1,6 +1,7 @@
package s4

type PluginConfig struct {
DONID string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong when I told you that this is a job spec config. This actually needs to go onchain and be translated to/from protos. For simplicity I suggest to remove it for now. In current deployments one node supports a single DON anyway. When that changes we might have other domain separators coming from LOOPPs.

core/services/ocr2/plugins/s4/plugin.go Outdated Show resolved Hide resolved
core/services/functions/connector_handler.go Outdated Show resolved Hide resolved
core/services/s4/postgres_orm.go Show resolved Hide resolved
@agparadiso agparadiso requested a review from bolekk December 12, 2023 14:47
@agparadiso agparadiso changed the title [FUN-990] s4 observability improv [FUN-990] s4 observability improvements Dec 12, 2023
@agparadiso agparadiso requested a review from bolekk December 12, 2023 18:14
bolekk
bolekk previously approved these changes Dec 12, 2023
@agparadiso agparadiso force-pushed the chore/FUN-990_S4_observability_improv branch 2 times, most recently from e680b76 to c967629 Compare December 12, 2023 18:23
@agparadiso agparadiso force-pushed the chore/FUN-990_S4_observability_improv branch from c967629 to 42dac94 Compare December 13, 2023 10:23
@agparadiso agparadiso force-pushed the chore/FUN-990_S4_observability_improv branch from 42dac94 to 24efcc6 Compare December 13, 2023 10:47
@cl-sonarqube-production
Copy link

@agparadiso agparadiso requested a review from bolekk December 13, 2023 11:28
@agparadiso agparadiso added this pull request to the merge queue Dec 13, 2023
Merged via the queue into develop with commit 6f13447 Dec 13, 2023
81 of 82 checks passed
@agparadiso agparadiso deleted the chore/FUN-990_S4_observability_improv branch December 13, 2023 15:25
momentmaker added a commit that referenced this pull request Dec 13, 2023
* develop: (56 commits)
  [TT-367] [TT-745] Quick and Dirty OCRv2 Soak Test (#11487)
  [FUN-990] s4 observability improvements (#11512)
  fix health monitoring (#11558)
  Removes Optimism Goerli from Scheduled Tests (#11559)
  bump Foundry to the December release (#11540)
  Standardize LP filter logging (#11515)
  Change keepers to use the default contract transmitter (#11308)
  bump toml/v2 and prometheus to latest patch (#11541)
  Remove big from core utils (#11511)
  Handle edge case involving blocks not being found in the db (#11298)
  [DEPLOY-178]: Adds Scroll L2EP Contracts (#11405)
  disable kaniko fallback, increase deploy wait timeout (#11548)
  Use multiple EL clients with ocrv2 median smoke test (#11399)
  Remove core utils dependencies from common (#11425)
  [BCF-2760] Flakey test detection improvements (#11470)
  go.mods: rm libp2p; rm btcd replace (#11502)
  wrap devspace commands (#11530)
  small improvements based on comments (#11491)
  (test): Remove unnecessary fuzzing from Functions OnTokenTransfer tests (#11517)
  core/scripts/common: rm ava-labs/coreth; lint (#11451)
  ...
fbac pushed a commit that referenced this pull request Dec 14, 2023
* chore: count s4 number of updates performed by nodes

* chore: add storage total use and slots occupied

* chore: add plugin side of counting updates

* fix: modify total size counter to use snapshot len

* chore: take into account the payload size for total size

* chore: fix lint errors

* fix: fetch only payload_size, reword metrics help

* chore: remove don_id label

* chore: remove don_id for consistency
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants