add uptime metric [SLT-372] #3321

Merged
trajan0x merged 2 commits into master from fix/uptime-metric
Oct 21, 2024

Conversation

trajan0x
Contributor

@trajan0x trajan0x commented Oct 21, 2024

Description

Introduces an uptime metric

Summary by CodeRabbit

  • Documentation

    • Updated the metrics package documentation for clarity on environment variables and their functionality.
    • Corrected spelling errors and refined descriptions, particularly for the METRICS_HANDLER and OTLP exporters.
  • New Features

    • Introduced a new metrics handling system for recording service metrics using OpenTelemetry, including uptime tracking.
  • Refactor

    • Added deprecation notice for the meter field in the baseHandler struct.
    • Organized import statements for improved clarity.

Contributor

coderabbitai bot commented Oct 21, 2024

Walkthrough

The changes in this pull request involve updates to the documentation in core/metrics/README.md, modifications to the baseHandler struct in core/metrics/base.go, and the introduction of a new file core/metrics/standard.go that implements a metrics handling system using OpenTelemetry. The documentation clarifies environment variable configurations, while the code changes include a deprecation notice for a field in base.go and the addition of a new metrics struct and related functions in standard.go.

Changes

File | Change Summary
core/metrics/README.md | Updated documentation for metrics package, clarified METRICS_HANDLER, corrected spelling errors, and enhanced descriptions for OTLP exporters.
core/metrics/base.go | Added deprecation comment for meter field in baseHandler, reorganized imports, and introduced newStandardMetrics(ctx, b) function call.
core/metrics/standard.go | Introduced standardMetrics struct for service metrics, added newStandardMetrics function, and implemented uptimeCallback method for uptime tracking.
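
To make the shape of the change concrete, here is a minimal sketch of the uptime callback pattern using only the standard library. It is not the code from core/metrics/standard.go: the observer interface and recorder type are hypothetical stand-ins for the OpenTelemetry observer, the injectable clock is an assumption added for determinism, and the metric name "process_uptime_seconds" is a guess at the value of processUptimeSecondsMetric.

```go
package main

import (
	"fmt"
	"time"
)

// observer is a hypothetical stand-in for the OpenTelemetry observer
// that an observable-gauge callback reports values to.
type observer interface {
	observe(name string, value float64)
}

// recorder stores observed values so the sketch runs without an SDK.
type recorder map[string]float64

func (r recorder) observe(name string, value float64) { r[name] = value }

// standardMetrics remembers when the process started.
type standardMetrics struct {
	startTime time.Time
	now       func() time.Time // injectable clock (assumption, for determinism)
}

// uptimeCallback reports the seconds elapsed since startTime. It runs at
// collection time, so the gauge always reflects the current uptime.
func (m *standardMetrics) uptimeCallback(o observer) {
	o.observe("process_uptime_seconds", m.now().Sub(m.startTime).Seconds())
}

// uptimeAfter returns the value the callback reports once the given
// number of seconds has elapsed on the fake clock.
func uptimeAfter(elapsedSeconds int) float64 {
	start := time.Unix(0, 0)
	m := &standardMetrics{
		startTime: start,
		now: func() time.Time {
			return start.Add(time.Duration(elapsedSeconds) * time.Second)
		},
	}
	rec := recorder{}
	m.uptimeCallback(rec)
	return rec["process_uptime_seconds"]
}

func main() {
	fmt.Println(uptimeAfter(90))
}
```

The point of the observable (callback-based) form is that nothing has to update the gauge on a timer; the value is computed only when a collection happens.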

Possibly related PRs

  • [SLT-141] feat(metrics): multiple exports #3099: The changes in core/metrics/README.md regarding the configuration of multiple OTLP exporters directly relate to the new feature introduced in this PR, which enables multiple exports within the metrics module.

Suggested labels

size/m

🐇 In the meadow, metrics bloom,
With handlers set to chase the gloom.
A gauge for uptime, clear and bright,
Documented well, it feels just right!
With deprecations, we tidy the space,
In the world of code, we find our place! 🌼


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3fa1d71 and a96ea1d.

📒 Files selected for processing (1)
  • core/metrics/base.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/metrics/base.go

@github-actions github-actions bot added the go (Pull requests that update Go code) and size/s labels Oct 21, 2024
@trajan0x trajan0x changed the title from "add uptime metric" to "add uptime metric [SLT-372]" Oct 21, 2024

codecov bot commented Oct 21, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 10 lines in your changes missing coverage. Please review.

Project coverage is 33.24611%. Comparing base (6415acd) to head (a96ea1d).
Report is 20 commits behind head on master.

Files with missing lines | Patch % | Lines
core/metrics/standard.go | 44.44444% | 8 Missing and 2 partials ⚠️
Additional details and impacted files
@@                 Coverage Diff                 @@
##              master       #3321         +/-   ##
===================================================
+ Coverage   31.92469%   33.24611%   +1.32142%     
===================================================
  Files            238         543        +305     
  Lines          14553       34777      +20224     
  Branches         356          82        -274     
===================================================
+ Hits            4646       11562       +6916     
- Misses          9614       22190      +12576     
- Partials         293        1025        +732     
Flag | Coverage Δ
cctp-relayer | 31.97848% <ø> (?)
core | 58.63676% <50.00000%> (?)
ethergo | 47.31849% <ø> (?)
git-changes-action | 23.48315% <ø> (?)
omnirpc | 32.66914% <ø> (?)
opbot | 0.48870% <ø> (ø)
promexporter | 6.81642% <ø> (ø)
rfq | 24.64558% <ø> (+0.03270%) ⬆️
screener-api | 29.45990% <ø> (?)
scribe | 18.24614% <ø> (?)
solidity | ?
tools | 30.55118% <ø> (?)
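
As a sanity check on the report's numbers, the project coverage figures can be reproduced from the Hits, Misses, and Partials rows of the diff. The sketch below assumes coverage is computed as hits divided by total lines, with total lines equal to hits plus misses plus partials; the function names are illustrative and not part of Codecov.

```go
package main

import (
	"fmt"
	"math"
)

// coveragePercent returns hits as a percentage of all tracked lines.
func coveragePercent(hits, lines int) float64 {
	return 100 * float64(hits) / float64(lines)
}

// matchesReport checks that hits/(hits+misses+partials) agrees with the
// reported percentage to within the report's five-decimal precision.
func matchesReport(hits, misses, partials int, reported float64) bool {
	lines := hits + misses + partials
	return math.Abs(coveragePercent(hits, lines)-reported) < 2e-5
}

func main() {
	// Head (a96ea1d): 11562 hits, 22190 misses, 1025 partials.
	fmt.Println(matchesReport(11562, 22190, 1025, 33.24611))
	// Base (6415acd): 4646 hits, 9614 misses, 293 partials.
	fmt.Println(matchesReport(4646, 9614, 293, 31.92469))
}
```

Under that assumption both rows are consistent with the reported 31.92469% and 33.24611%, and their difference gives the +1.32142% shown in the diff.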



cloudflare-workers-and-pages bot commented Oct 21, 2024

Deploying sanguine-fe with Cloudflare Pages

Latest commit: a96ea1d
Status: ✅  Deploy successful!
Preview URL: https://f99b51a4.sanguine-fe.pages.dev
Branch Preview URL: https://fix-uptime-metric.sanguine-fe.pages.dev


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
core/metrics/base.go (2)

46-47: Correct the casing in the deprecation notice

The deprecation notice for the meter field is a good practice for future maintenance. However, to adhere to Go conventions, please update the casing of the deprecation notice.

Apply this change:

-	// deprecated: will be removed in a future version
+	// Deprecated: will be removed in a future version
 	meter MeterProvider
🧰 Tools
🪛 GitHub Check: Lint (core)

[failure] 46-46:
deprecatedComment: use Deprecated: (note the casing) instead of deprecated: (gocritic)
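
For context on why the linter cares about casing: Go tooling recognizes a deprecation notice only when a doc comment paragraph begins with "Deprecated:" with a capital D. The sketch below shows the corrected field comment and a small illustrative check of the convention. MeterProvider is a placeholder type here, and isDeprecationNotice is a hypothetical helper written for this example, not part of gocritic.

```go
package main

import (
	"fmt"
	"strings"
)

// MeterProvider is a placeholder type for this sketch.
type MeterProvider interface{}

type baseHandler struct {
	// Deprecated: will be removed in a future version.
	meter MeterProvider
}

// isDeprecationNotice reports whether a single comment line follows the
// Go convention of starting with "Deprecated:" (capital D).
func isDeprecationNotice(comment string) bool {
	text := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(comment), "//"))
	return strings.HasPrefix(text, "Deprecated:")
}

func main() {
	_ = baseHandler{}
	fmt.Println(isDeprecationNotice("// Deprecated: will be removed in a future version"))
	fmt.Println(isDeprecationNotice("// deprecated: will be removed in a future version"))
}
```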


Line range hint 1-268: Summary of changes and potential impact

The changes in this file appear to be part of a larger refactoring or enhancement of the metrics system:

  1. The meter field in the baseHandler struct has been marked as deprecated, indicating a potential shift in how metrics are handled.
  2. A new function newStandardMetrics has been introduced, suggesting the implementation of a standardized metrics system.

These changes, while relatively small, may have broader implications for the project's metrics handling. It would be beneficial to:

  1. Ensure that all parts of the codebase using the deprecated meter field are updated accordingly.
  2. Thoroughly test the new metrics system to verify its correctness and performance.
  3. Update any relevant documentation to reflect these changes in the metrics handling approach.

Consider creating a migration plan for transitioning away from the deprecated meter field and adopting the new standardized metrics system across the project.


core/metrics/README.md (4)

Line range hint 13-41: LGTM! Comprehensive OTLP documentation with room for improvement.

The OTLP section provides clear explanations of environment variables and introduces additional variables for multiple exporter backends. This is valuable information for users configuring the metrics package.

Consider addressing the TODO comment to fully document these options in a future update.

🧰 Tools
🪛 LanguageTool

[style] ~64-~64: The words ‘explanation’ and ‘explains’ are quite similar. Consider replacing ‘explains’ with a different word.
Context: ...s come with a # HELP explanation that explains them, for example: ```promql # HELP pr...

(VERB_NOUN_SENT_LEVEL_REP)


[uncategorized] ~74-~74: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...nguine/issues/1521), this is not a good long term solution since the logs are not current...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


Line range hint 54-70: LGTM! Clear explanation of the metrics endpoint with helpful examples.

This section provides comprehensive information about the metrics endpoint, including configuration options and an example of a metric with HELP explanation. The note about server binding behavior is particularly important for users to understand.

Consider adding a brief explanation of what each part of the example metric represents (e.g., what otel_scope_name and otel_scope_version mean) to make it even more informative for users.



Line range hint 72-91: LGTM! Comprehensive overview of the logging situation and future plans.

This section provides valuable information about the current and future logging solutions, including limitations and known issues. The advice on using the logger with context is crucial for correct usage.

Consider adding a brief timeline or roadmap for the planned logging changes to help users prepare for the transition from ipfs/go-log to the new otelzap logger.


Line range hint 1-91: Great job on the comprehensive README! A few minor suggestions for improvement.

The README provides excellent coverage of the metrics package, with clear instructions and explanations for various components. To further enhance its quality:

  1. Fix typos:

    • Line 1: "appplication" should be "application"
    • Line 37: "enviornment" should be "environment"
    • Line 50: "enviornment" should be "environment"
    • Line 89: "beocmes" should be "becomes"
  2. Consider rephrasing the following for clarity:

    • Line 64-65: "Most metrics come with a # HELP explanation that explains them" - The words "explanation" and "explains" are repetitive. Consider rewording to "Most metrics include a # HELP comment that provides an explanation".
  3. Consistency: Ensure consistent capitalization of "environment" throughout the document.

These minor improvements will enhance the overall readability and professionalism of the README.


📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c5c990e and 3fa1d71.

📒 Files selected for processing (3)
  • core/metrics/README.md (2 hunks)
  • core/metrics/base.go (3 hunks)
  • core/metrics/standard.go (1 hunks)
🧰 Additional context used
🪛 LanguageTool
core/metrics/README.md

[style] ~64-~64: The words ‘explanation’ and ‘explains’ are quite similar. Consider replacing ‘explains’ with a different word.
Context: ...s come with a # HELP explanation that explains them, for example: ```promql # HELP pr...

(VERB_NOUN_SENT_LEVEL_REP)

🪛 GitHub Check: Lint (core)
core/metrics/base.go

[failure] 46-46:
deprecatedComment: use Deprecated: (note the casing) instead of deprecated: (gocritic)

🔇 Additional comments (3)
core/metrics/base.go (1)

81-82: Provide details on the new newStandardMetrics function

A new function newStandardMetrics(ctx, b) has been added to the Start method. To ensure a comprehensive review:

  1. Could you provide more information about the purpose and functionality of this new function?
  2. How does it integrate with the existing metrics system?
  3. Are there any potential side effects or performance implications we should be aware of?

To help understand the usage and impact of this new function, please run the following script:

core/metrics/README.md (2)

Line range hint 1-11: LGTM! Clear introduction and informative table.

The introduction and metrics handler table provide a comprehensive overview of the package. The deprecation notice for the Jaeger client library is particularly important for users to be aware of.



Line range hint 43-52: LGTM! Concise and informative sections on Jaeger, Pyroscope, and Rookout.

These sections provide the necessary information for configuring each tool, with appropriate detail levels for each. The additional configuration details for Rookout are particularly helpful.


Comment on lines +27 to +29
	if str.uptimeGauge, err = str.meter.Float64ObservableGauge(processUptimeSecondsMetric, metric.WithDescription("The uptime of the process in seconds"), metric.WithUnit("seconds")); err != nil {
		handler.ExperimentalLogger().Errorf(ctx, "failed to create %s gauge: %v", processUptimeSecondsMetric, err)
	}

🛠️ Refactor suggestion

Initialization errors are not propagated to the caller

If the creation of uptimeGauge or the registration of the callback fails, the function logs the error but does not return it to the caller. This could result in silent failures where metrics are not properly initialized, and the caller remains unaware of the issue.

Consider modifying newStandardMetrics to return an error if critical initialization steps fail, allowing the caller to handle the failure appropriately.

Suggested code change:

-func newStandardMetrics(ctx context.Context, handler Handler) *standardMetrics {
+func newStandardMetrics(ctx context.Context, handler Handler) (*standardMetrics, error) {
	// existing code
	if str.uptimeGauge, err = str.meter.Float64ObservableGauge(processUptimeSecondsMetric, metric.WithDescription("The uptime of the process in seconds"), metric.WithUnit("seconds")); err != nil {
-		handler.ExperimentalLogger().Errorf(ctx, "failed to create %s gauge: %v", processUptimeSecondsMetric, err)
+		return nil, fmt.Errorf("failed to create %s gauge: %w", processUptimeSecondsMetric, err)
	}
	// Register callback
	if _, err = str.meter.RegisterCallback(str.uptimeCallback, str.uptimeGauge); err != nil {
-		handler.ExperimentalLogger().Warnf(ctx, "failed to register callback: %v", err)
+		return nil, fmt.Errorf("failed to register callback: %w", err)
	}
+	return str, nil
}

Also applies to: 32-34
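
The trade-off in this suggestion can be sketched with the standard library alone. In the example below, createGauge is a hypothetical stand-in for the meter's gauge constructor, errGaugeUnavailable is an invented sentinel error, and the metric name is assumed. It shows how a constructor that wraps the failure with %w lets the caller detect the cause with errors.Is instead of finding it only in the logs.

```go
package main

import (
	"errors"
	"fmt"
)

// errGaugeUnavailable is a hypothetical sentinel error for this sketch.
var errGaugeUnavailable = errors.New("gauge unavailable")

type standardMetrics struct{ gaugeName string }

// createGauge stands in for the meter's gauge constructor; it fails on demand.
func createGauge(name string, fail bool) (string, error) {
	if fail {
		return "", errGaugeUnavailable
	}
	return name, nil
}

// newStandardMetrics returns the error to the caller instead of only logging it.
func newStandardMetrics(failGauge bool) (*standardMetrics, error) {
	name, err := createGauge("process_uptime_seconds", failGauge)
	if err != nil {
		// %w keeps the original error in the chain for errors.Is / errors.As.
		return nil, fmt.Errorf("failed to create %s gauge: %w", "process_uptime_seconds", err)
	}
	return &standardMetrics{gaugeName: name}, nil
}

// gaugeInitFailed reports whether construction failed because of the gauge.
func gaugeInitFailed(failGauge bool) bool {
	_, err := newStandardMetrics(failGauge)
	return errors.Is(err, errGaugeUnavailable)
}

func main() {
	if _, err := newStandardMetrics(true); err != nil {
		fmt.Println("startup error:", err)
	}
}
```

Whether a failed uptime gauge should abort startup is a design decision for the service; logging and continuing, as the PR does, is a reasonable choice when metrics are treated as best-effort.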

Comment on lines +19 to +35
func newStandardMetrics(ctx context.Context, handler Handler) {
	str := standardMetrics{
		metrics:   handler,
		meter:     handler.Meter("standard_metrics"),
		startTime: time.Now(),
	}

	var err error
	if str.uptimeGauge, err = str.meter.Float64ObservableGauge(processUptimeSecondsMetric, metric.WithDescription("The uptime of the process in seconds"), metric.WithUnit("seconds")); err != nil {
		handler.ExperimentalLogger().Errorf(ctx, "failed to create %s gauge: %v", processUptimeSecondsMetric, err)
	}

	// Register callback
	if _, err = str.meter.RegisterCallback(str.uptimeCallback, str.uptimeGauge); err != nil {
		handler.ExperimentalLogger().Warnf(ctx, "failed to register callback: %v", err)
	}
}

⚠️ Potential issue

Instance of standardMetrics is not stored or returned

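The newStandardMetrics function initializes a standardMetrics instance str, but it neither returns this instance nor stores it externally. Because str is local to the function, no other code can reach it afterwards, for example to inspect it in tests or to unregister the callback. Go's garbage collector keeps the instance alive for as long as the registered uptimeCallback method value references it, so the concern is the instance's accessibility and lifecycle management rather than freed memory.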

Consider modifying newStandardMetrics to return the *standardMetrics instance and ensure it is stored appropriately to maintain its lifecycle.

Suggested code change:

-func newStandardMetrics(ctx context.Context, handler Handler) {
+func newStandardMetrics(ctx context.Context, handler Handler) *standardMetrics {
-	str := standardMetrics{
+	str := &standardMetrics{
		metrics:   handler,
		meter:     handler.Meter("standard_metrics"),
		startTime: time.Now(),
	}
	// existing code
+	return str
}
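
One detail worth checking when weighing this suggestion is how Go treats a method value passed as a callback: the method value holds a reference to its receiver, so the receiver stays reachable while the callback is registered. The sketch below illustrates this with a hypothetical registry type standing in for the meter and an integer tick counter standing in for the clock; none of these names come from the PR.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for a meter that stores callbacks.
type registry struct{ callbacks []func() int }

func (r *registry) register(cb func() int) { r.callbacks = append(r.callbacks, cb) }

type standardMetrics struct {
	startTick int
	clock     *int // shared tick counter (assumption, replaces time.Now)
}

// uptimeCallback reports ticks elapsed since construction.
func (m *standardMetrics) uptimeCallback() int { return *m.clock - m.startTick }

// newStandardMetrics registers the callback and returns the instance so the
// caller may keep it, for example to inspect it in tests.
func newStandardMetrics(r *registry, clock *int) *standardMetrics {
	m := &standardMetrics{startTick: *clock, clock: clock}
	r.register(m.uptimeCallback) // the method value keeps m reachable
	return m
}

// collectAfter drops the returned instance on purpose, advances the clock,
// and then invokes the registered callback as a collection cycle would.
func collectAfter(ticks int) int {
	clock := 100
	r := &registry{}
	_ = newStandardMetrics(r, &clock)
	clock += ticks
	return r.callbacks[0]()
}

func main() {
	fmt.Println(collectAfter(5))
}
```

In this sketch the callback still works after the constructor's result is discarded, which suggests that returning the pointer is mainly useful for testability and for keeping a handle that can later be used to unregister.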
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-func newStandardMetrics(ctx context.Context, handler Handler) {
-	str := standardMetrics{
-		metrics:   handler,
-		meter:     handler.Meter("standard_metrics"),
-		startTime: time.Now(),
-	}
-	var err error
-	if str.uptimeGauge, err = str.meter.Float64ObservableGauge(processUptimeSecondsMetric, metric.WithDescription("The uptime of the process in seconds"), metric.WithUnit("seconds")); err != nil {
-		handler.ExperimentalLogger().Errorf(ctx, "failed to create %s gauge: %v", processUptimeSecondsMetric, err)
-	}
-	// Register callback
-	if _, err = str.meter.RegisterCallback(str.uptimeCallback, str.uptimeGauge); err != nil {
-		handler.ExperimentalLogger().Warnf(ctx, "failed to register callback: %v", err)
-	}
-}
+func newStandardMetrics(ctx context.Context, handler Handler) *standardMetrics {
+	str := &standardMetrics{
+		metrics:   handler,
+		meter:     handler.Meter("standard_metrics"),
+		startTime: time.Now(),
+	}
+	var err error
+	if str.uptimeGauge, err = str.meter.Float64ObservableGauge(processUptimeSecondsMetric, metric.WithDescription("The uptime of the process in seconds"), metric.WithUnit("seconds")); err != nil {
+		handler.ExperimentalLogger().Errorf(ctx, "failed to create %s gauge: %v", processUptimeSecondsMetric, err)
+	}
+	// Register callback
+	if _, err = str.meter.RegisterCallback(str.uptimeCallback, str.uptimeGauge); err != nil {
+		handler.ExperimentalLogger().Warnf(ctx, "failed to register callback: %v", err)
+	}
+	return str
+}

@trajan0x trajan0x merged commit 112a9ab into master Oct 21, 2024
51 of 52 checks passed
@trajan0x trajan0x deleted the fix/uptime-metric branch October 21, 2024 15:42
Labels
go Pull requests that update Go code needs-go-generate-services/rfq size/s