[FUN-990] s4 observability improvements #11512
I see that you haven't updated any README files. Would it make sense to do so?
@@ -195,6 +205,8 @@ func (h *functionsConnectorHandler) handleSecretsSet(ctx context.Context, gatewa
if err == nil {
	response.Success = true
	promStorageUserUpdatesCount.WithLabelValues(body.DonId).Inc()
	promStorageTotalSize.WithLabelValues(body.DonId).Add(float64(len(record.Payload)))
This is not a correct way to calculate total storage, because:
- SecretsSet can overwrite an existing entry, which adds nothing to the total storage size.
- Entries expire and reduce the total size asynchronously.
I suggested earlier basing it on the snapshot size; please try to explore that.
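The reviewer's two objections can be illustrated with a minimal sketch, assuming a simplified in-memory store keyed by (address, slot). The names `entry`, `slotKey`, `store`, and `totalFromSnapshot` are hypothetical and are not the actual s4 API. Incrementing a counter on every write double-counts overwrites and never shrinks when entries expire; recomputing the total from the current state avoids both problems.

```go
package main

import "fmt"

// entry is a hypothetical, simplified stand-in for an s4 snapshot row.
type entry struct {
	Address     string
	SlotID      uint
	Expiration  int64 // Unix seconds
	PayloadSize uint64
}

type slotKey struct {
	address string
	slotID  uint
}

// store keeps one entry per (address, slot); a write to an occupied
// slot replaces the previous entry instead of adding to it.
type store map[slotKey]entry

func (s store) set(e entry) {
	s[slotKey{e.Address, e.SlotID}] = e
}

// totalFromSnapshot recomputes the total from the current state,
// skipping entries that have expired by time now.
func totalFromSnapshot(s store, now int64) uint64 {
	var total uint64
	for _, e := range s {
		if e.Expiration > now {
			total += e.PayloadSize
		}
	}
	return total
}

func main() {
	s := store{}
	var naiveCounter uint64 // mimics Add(len(payload)) on every write

	writes := []entry{
		{Address: "0xA", SlotID: 0, Expiration: 100, PayloadSize: 100},
		{Address: "0xA", SlotID: 0, Expiration: 100, PayloadSize: 40}, // overwrite of the same slot
		{Address: "0xB", SlotID: 0, Expiration: 5, PayloadSize: 60},   // expires at t=5
	}
	for _, w := range writes {
		s.set(w)
		naiveCounter += w.PayloadSize
	}

	fmt.Println("naive counter:", naiveCounter)              // 200
	fmt.Println("snapshot total:", totalFromSnapshot(s, 10)) // 40
}
```

At t=10 the naive counter reports 200 bytes, while only the 40-byte overwrite is still live.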
195350f to c823b16
@@ -59,13 +74,16 @@ func (c *plugin) Query(ctx context.Context, ts types.ReportTimestamp) (types.Que
	return nil, errors.Wrap(err, "failed to GetVersions in Query()")
}

var storageTotalByteSize int
Maybe use a larger type for extra safety.
👍🏼 changed to *big.Int
core/services/s4/orm.go
@@ -25,6 +25,7 @@ type SnapshotRow struct {
	Version uint64
	Expiration int64
	Confirmed bool
	Payload []byte
We can't include the whole payload in the snapshot. That will make it way too big.
👍🏼 Makes sense. I'll fetch only the size of the payload, using octet_length(payload).
I also evaluated pg_column_size(payload), which reflects the actual disk space used (field size plus metadata overhead and padding). However, it would make testing harder, because the len(payload) we compute in tests can't account for that extra ~1 byte.
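A sketch of the resulting shape, assuming `Payload` on `SnapshotRow` is replaced by a size field and the snapshot query computes that size in SQL. The field name `PayloadSize`, the table name `s4_table`, the column list, and the helper `expectedPayloadSize` are hypothetical placeholders, not the actual schema. For a `bytea` column, `octet_length` returns the number of bytes, which matches Go's `len` on a `[]byte`. That is why tests can derive the expected size directly from the payload they wrote.

```go
package main

import "fmt"

// Hypothetical query: select only the payload size, not the payload,
// so the snapshot stays small regardless of how large payloads are.
const snapshotQuery = `
SELECT version, expiration, confirmed,
       octet_length(payload) AS payload_size
FROM s4_table`

// SnapshotRow mirrors the fields visible in the diff (others omitted),
// with the payload replaced by its size in bytes (field name assumed).
type SnapshotRow struct {
	Version     uint64
	Expiration  int64
	Confirmed   bool
	PayloadSize uint64
}

// expectedPayloadSize is what a test would compare against: for bytea,
// octet_length(payload) equals the Go byte length of the payload.
func expectedPayloadSize(payload []byte) uint64 {
	return uint64(len(payload))
}

func main() {
	payload := []byte("héllo") // 'é' takes two bytes in UTF-8
	row := SnapshotRow{Version: 1, PayloadSize: expectedPayloadSize(payload)}
	fmt.Println(row.PayloadSize) // 6
}
```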
@@ -1,6 +1,7 @@
package s4

type PluginConfig struct {
	DONID string
I was wrong when I told you that this is a job spec config. It actually needs to go onchain and be translated to and from protos. For simplicity, I suggest removing it for now. In current deployments, one node supports a single DON anyway. When that changes, we might have other domain separators coming from LOOPPs.
e680b76 to c967629
c967629 to 42dac94
42dac94 to 24efcc6
* develop: (56 commits)
  [TT-367] [TT-745] Quick and Dirty OCRv2 Soak Test (#11487)
  [FUN-990] s4 observability improvements (#11512)
  fix health monitoring (#11558)
  Removes Optimism Goerli from Scheduled Tests (#11559)
  bump Foundry to the December release (#11540)
  Standardize LP filter logging (#11515)
  Change keepers to use the default contract transmitter (#11308)
  bump toml/v2 and prometheus to latest patch (#11541)
  Remove big from core utils (#11511)
  Handle edge case involving blocks not being found in the db (#11298)
  [DEPLOY-178]: Adds Scroll L2EP Contracts (#11405)
  disable kaniko fallback, increase deploy wait timeout (#11548)
  Use multiple EL clients with ocrv2 median smoke test (#11399)
  Remove core utils dependencies from common (#11425)
  [BCF-2760] Flakey test detection improvements (#11470)
  go.mods: rm libp2p; rm btcd replace (#11502)
  wrap devspace commands (#11530)
  small improvements based on comments (#11491)
  (test): Remove unnecessary fuzzing from Functions OnTokenTransfer tests (#11517)
  core/scripts/common: rm ava-labs/coreth; lint (#11451)
  ...
* chore: count s4 number of updates performed by nodes
* chore: add storage total use and slots occupied
* chore: add plugin side of counting updates
* fix: modify total size counter to use snapshot len
* chore: take into account the payload size for total size
* chore: fix lint errors
* fix: fetch only payload_size, reword metrics help
* chore: remove don_id label
* chore: remove don_id for consistency
Description
This PR addresses ticket FUN-990 by making some observability improvements to s4.
How was it tested
Checked via the /metrics endpoint.