Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(zetaclient): streamline process initialization #3291

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from

Conversation

swift1337
Copy link
Contributor

@swift1337 swift1337 commented Dec 12, 2024

This PR streamlines the start.go file and introduces the graceful package to facilitate a graceful shutdown. It also includes various other miscellaneous improvements.

However, the diff is quite challenging to navigate. I suggest checking out the branch to see the changes in start.go and pkg/graceful (I also like Github Dev https://github.dev/zeta-chain/node/pull/3291)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a new telemetry endpoint /systemtime in the ZetaClient.
    • Introduced a function to resolve the database path for the chain observer.
  • Bug Fixes

    • Fixed issues with unsupported transaction versions for Solana.
    • Improved error handling for various operations, including configuration loading and logger creation.
  • Improvements

    • Enhanced initialization process for the ZetaClient with graceful shutdown capabilities.
    • Streamlined logging setup and error reporting across multiple components.
  • Tests

    • Updated test suite for the graceful shutdown process, ensuring robust validation of service behaviors.
  • Documentation

    • Updated changelog to reflect recent changes and improvements across features and fixes.

@swift1337 swift1337 self-assigned this Dec 12, 2024
Copy link

gitguardian bot commented Dec 12, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
14567965 Triggered Generic Password 19c089f cmd/zetaclientd/start.go View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Copy link
Contributor

coderabbitai bot commented Dec 12, 2024

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough
📝 Walkthrough

Walkthrough

This pull request introduces significant updates across various components of the zetaclient. Key changes include the addition of a new telemetry endpoint /systemtime, restructuring of the initialization process to enhance error handling and logging, and modifications to the test organization. The TSS package has been revamped, and several fixes have been implemented to improve transaction handling, particularly for Solana. Additionally, a new graceful package has been introduced to manage service shutdowns, along with comprehensive tests to ensure robust functionality.

Changes

File Path Change Summary
changelog.md Updated to reflect changes across features, tests, refactoring, and fixes; added /systemtime telemetry endpoint; modified test organization; revamped TSS package; implemented various fixes.
cmd/zetaclientd/start.go Restructured Start function; introduced promptPasswords function; improved error handling; integrated telemetry server initialization; modified shutdown handling.
cmd/zetaclientd/utils.go Removed createAuthzSigner; integrated its logic into createZetacoreClient; replaced waitForZetaCore with waitForBlocks; added prepareZetacoreClient, isObserverNode, and resolveObserverPubKeyBech32.
pkg/graceful/graceful.go Introduced a new package for graceful shutdowns; defined Process struct and Service interface; added methods for service management and shutdown handling.
pkg/graceful/graceful_test.go Added comprehensive test suite for Process functionality; validated service shutdown scenarios and error handling.
pkg/os/console.go Corrected typographical error in readPassword function; renamed delimitor to delimiter.
zetaclient/chains/base/logger.go Renamed ComplianceLogFile to complianceLogFile; renamed InitLogger to NewLogger; refactored logging initialization logic.
zetaclient/chains/base/logger_test.go Updated test function to reflect renaming of logger initialization function from InitLogger to NewLogger.
zetaclient/chains/interfaces/interfaces.go Removed GetLogger() method from ZetacoreClient interface.
zetaclient/config/config.go Added ResolveDBPath function to resolve the path to the chain observer database.
zetaclient/metrics/metrics.go Updated Start and Stop methods for improved error handling; changed method signatures to include context.Context.
zetaclient/metrics/metrics_test.go Modified SetUpSuite to start m.Start() in a new goroutine with context.Background().
zetaclient/metrics/telemetry.go Updated Start and Stop methods for better control flow and error handling; reduced shutdown timeout from 10 seconds to 5 seconds.
zetaclient/zetacore/client.go Removed GetLogger method from Client struct.

Possibly related PRs

Suggested labels

no-changelog, UPGRADE_LIGHT_TESTS

Suggested reviewers

  • fbac
  • kingpinXD
  • ws4charlie
  • brewmaster012
  • skosito
  • lumtis

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

!!!WARNING!!!
nosec detected in the following files: cmd/zetaclientd/start.go

Be very careful about using #nosec in code. It can be a quick way to suppress security warnings and move forward with development, it should be employed with caution. Suppressing warnings with #nosec can hide potentially serious vulnerabilities. Only use #nosec when you're absolutely certain that the security issue is either a false positive or has been mitigated in another way.

Only suppress a single rule (or a specific set of rules) within a section of code, while continuing to scan for other problems. To do this, you can list the rule(s) to be suppressed within the #nosec annotation, e.g: /* #nosec G401 */ or //#nosec G201 G202 G203
Broad #nosec annotations should be avoided, as they can hide other vulnerabilities. The CI will block you from merging this PR until you remove #nosec annotations that do not target specific rules.

Pay extra attention to the way #nosec is being used in the files listed above.

@github-actions github-actions bot added the nosec label Dec 12, 2024
@swift1337 swift1337 marked this pull request as ready for review December 12, 2024 19:56
@swift1337 swift1337 requested a review from a team as a code owner December 12, 2024 19:56
@swift1337 swift1337 changed the title refactor(zetaclient): streamline start.go refactor(zetaclient): streamline process initialization Dec 12, 2024
pkg/graceful/graceful.go Outdated Show resolved Hide resolved
Copy link

codecov bot commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 46.66667% with 120 lines in your changes missing coverage. Please review.

Project coverage is 61.68%. Comparing base (462fcea) to head (e3c3587).

Files with missing lines Patch % Lines
zetaclient/zetacore/client_start.go 0.00% 83 Missing ⚠️
pkg/graceful/graceful.go 86.31% 12 Missing and 1 partial ⚠️
zetaclient/metrics/metrics.go 16.66% 9 Missing and 1 partial ⚠️
zetaclient/config/config.go 0.00% 8 Missing ⚠️
zetaclient/metrics/telemetry.go 0.00% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3291      +/-   ##
===========================================
- Coverage    61.79%   61.68%   -0.12%     
===========================================
  Files          429      431       +2     
  Lines        30816    30999     +183     
===========================================
+ Hits         19043    19121      +78     
- Misses       10914    11018     +104     
- Partials       859      860       +1     
Files with missing lines Coverage Δ
pkg/os/console.go 76.92% <100.00%> (ø)
zetaclient/chains/base/logger.go 94.00% <100.00%> (-0.45%) ⬇️
zetaclient/zetacore/client.go 58.57% <ø> (+0.68%) ⬆️
zetaclient/metrics/telemetry.go 0.00% <0.00%> (ø)
zetaclient/config/config.go 0.00% <0.00%> (ø)
zetaclient/metrics/metrics.go 69.64% <16.66%> (-5.36%) ⬇️
pkg/graceful/graceful.go 86.31% <86.31%> (ø)
zetaclient/zetacore/client_start.go 0.00% <0.00%> (ø)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (9)
cmd/zetaclientd/utils.go (2)

14-15: Use descriptive aliases for imported authz packages

The alias authz2 for the imported github.com/zeta-chain/node/pkg/authz package may cause confusion with github.com/zeta-chain/node/zetaclient/authz. Consider using more descriptive aliases, such as coreauthz and clientauthz, to enhance code clarity.


Line range hint 69-85: Log error details when retrying block height retrieval

In the waitForBlocks function, when zc.GetBlockHeight(ctx) fails or the block height is insufficient, the error details are not logged. Including the error information in the log message will aid in diagnosing connection issues or other problems.

Proposed change:

- logger.Info().Msgf("Failed to get block number, retry: %d/%d", retryCount, attempts)
+ logger.Info().Err(err).Msgf("Failed to get block number, retry: %d/%d", retryCount, attempts)
pkg/graceful/graceful_test.go (1)

145-147: Remove unreachable code following a panic

In the Panic handling test, the return nil statement after panic("oopsie") is unreachable. Removing it enhances code clarity and avoids confusion.

Proposed change:

 func (ctx context.Context) error {
     panic("oopsie")
-    return nil
 }
zetaclient/chains/base/logger.go (2)

67-72: Consider adding format validation

While the writer configuration is cleaner, consider validating the log format explicitly.

+const (
+    logFormatJSON = "json"
+    logFormatText = "text"
+)
+
 func NewLogger(cfg config.Config) (Logger, error) {
     // ...
+    switch cfg.LogFormat {
+    case logFormatJSON:
+        stdWriter = os.Stdout
+    case logFormatText:
         stdWriter = zerolog.ConsoleWriter{
             Out:        os.Stdout,
             TimeFormat: time.RFC3339,
         }
+    default:
+        return Logger{}, fmt.Errorf("unsupported log format: %s", cfg.LogFormat)
     }

75-77: Consider adding sampling configuration validation

The sampling configuration could benefit from validation of the sampling rate.

+const defaultSampleRate = 5
+
 func NewLogger(cfg config.Config) (Logger, error) {
     // ...
     if cfg.LogSampler {
+        rate := defaultSampleRate
+        if cfg.LogSampleRate > 0 {
+            rate = cfg.LogSampleRate
+        }
-        std = std.Sample(&zerolog.BasicSampler{N: 5})
+        std = std.Sample(&zerolog.BasicSampler{N: rate})
     }
zetaclient/config/config.go (1)

120-130: Consider making the database path configurable

While the implementation is correct, having the database path hardcoded as .zetaclient/chainobserver reduces flexibility. Consider making it configurable through environment variables or configuration files.

 // ResolveDBPath resolves the path to chain observer database
 func ResolveDBPath() (string, error) {
-    const dbpath = ".zetaclient/chainobserver"
+    dbpath := os.Getenv("ZETACLIENT_DB_PATH")
+    if dbpath == "" {
+        dbpath = ".zetaclient/chainobserver"
+    }
 
     userDir, err := os.UserHomeDir()
     if err != nil {
         return "", errors.Wrap(err, "unable to resolve user home directory")
     }
 
     return filepath.Join(userDir, dbpath), nil
 }
zetaclient/metrics/telemetry.go (1)

Line range hint 201-213: Enhance error handling and utilize context

The implementation could be improved by utilizing the context parameter and enhancing error handling.

-func (t *TelemetryServer) Start(_ context.Context) error {
+func (t *TelemetryServer) Start(ctx context.Context) error {
     if t.s == nil {
         return errors.New("invalid http server instance")
     }
 
-    if err := t.s.ListenAndServe(); err != nil {
-        if !errors.Is(err, http.ErrServerClosed) {
-            return fmt.Errorf("fail to start http server: %w", err)
-        }
-    }
+    errCh := make(chan error, 1)
+    go func() {
+        if err := t.s.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
+            errCh <- fmt.Errorf("fail to start http server: %w", err)
+        }
+    }()
+
+    select {
+    case err := <-errCh:
+        return err
+    case <-ctx.Done():
+        return ctx.Err()
+    }
 
     return nil
 }
changelog.md (2)

18-18: Add description for PR #3291

The changelog entry should include a brief description of the changes made in PR #3291 for better clarity and documentation.


Line range hint 1-1589: Standardize bullet point formatting

For better readability and consistency, consider:

  1. Using a single bullet point style throughout (either - or *)
  2. Maintaining consistent spacing after bullet points
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 462fcea and 9f78be0.

📒 Files selected for processing (14)
  • changelog.md (1 hunks)
  • cmd/zetaclientd/start.go (5 hunks)
  • cmd/zetaclientd/utils.go (6 hunks)
  • pkg/graceful/graceful.go (1 hunks)
  • pkg/graceful/graceful_test.go (1 hunks)
  • pkg/os/console.go (1 hunks)
  • zetaclient/chains/base/logger.go (4 hunks)
  • zetaclient/chains/base/logger_test.go (1 hunks)
  • zetaclient/chains/interfaces/interfaces.go (0 hunks)
  • zetaclient/config/config.go (2 hunks)
  • zetaclient/metrics/metrics.go (2 hunks)
  • zetaclient/metrics/metrics_test.go (2 hunks)
  • zetaclient/metrics/telemetry.go (2 hunks)
  • zetaclient/zetacore/client.go (0 hunks)
💤 Files with no reviewable changes (2)
  • zetaclient/chains/interfaces/interfaces.go
  • zetaclient/zetacore/client.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/os/console.go
🧰 Additional context used
📓 Path-based instructions (10)
zetaclient/chains/base/logger_test.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/metrics/metrics_test.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/config/config.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/metrics/telemetry.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/chains/base/logger.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

cmd/zetaclientd/start.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/metrics/metrics.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

pkg/graceful/graceful_test.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

cmd/zetaclientd/utils.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

pkg/graceful/graceful.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

🔇 Additional comments (8)
pkg/graceful/graceful.go (2)

51-55: AddService method effectively integrates service lifecycle management

The AddService method elegantly combines AddStarter and AddStopper, providing a clean and modular way to manage services that implement the Service interface. This enhances code readability and maintainability.


104-145: ShutdownNow method robustly handles graceful termination with timeout

The ShutdownNow method correctly initiates the shutdown sequence, stopping services in reverse order and safely recovering from any panics within stoppers. The implementation of a timeout prevents indefinite hanging during shutdown.

pkg/graceful/graceful_test.go (1)

1-273: Test suite comprehensively validates graceful shutdown behavior

The test cases effectively cover various scenarios, including synchronous and asynchronous service operations, error and panic handling, manual starters and stoppers, and shutdown timeouts. This thorough testing ensures the reliability and robustness of the graceful package.

zetaclient/chains/base/logger_test.go (1)

82-82: LGTM: Function rename properly reflected in test

The test correctly uses the renamed NewLogger function, maintaining consistency with the implementation changes.

zetaclient/chains/base/logger.go (1)

60-64: Well-structured helper function extraction

Good refactoring to extract common logger configuration into augmentLogger. This improves code organization and reduces duplication.

zetaclient/metrics/metrics.go (1)

201-207: LGTM: Clean implementation of graceful shutdown

The implementation correctly handles graceful shutdown with a reasonable timeout.

zetaclient/metrics/telemetry.go (1)

216-222: LGTM: Clean implementation of graceful shutdown

The implementation correctly handles graceful shutdown with a reasonable timeout.

changelog.md (1)

Line range hint 1-1589: Version organization looks good!

The changelog follows semantic versioning and maintains a clear chronological order from newest to oldest versions.

pkg/graceful/graceful.go Show resolved Hide resolved
cmd/zetaclientd/utils.go Outdated Show resolved Hide resolved
cmd/zetaclientd/start.go Outdated Show resolved Hide resolved
zetaclient/metrics/metrics_test.go Show resolved Hide resolved
zetaclient/metrics/metrics.go Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (12)
cmd/zetaclientd/start.go (2)

85-85: Ensure consistent use of logger across the codebase

For uniformity and maintainability, it's important to use the same logger instance throughout the application. Replace direct calls to the global log package with the configured logger instance.

Apply this diff to use logger.Std:

- log.Info().Msgf("Config is updated from zetacore\n %s", cfg.StringMasked())
+ logger.Std.Info().Msgf("Config is updated from zetacore\n %s", cfg.StringMasked())

185-201: Use the configured logger instead of the global logger

In the startTelemetry function, relying on the global log package can lead to inconsistent logging behavior. Opt for the injected logger instance to maintain consistency and respect the configured logging settings.

Apply this diff to pass and use logger.Std:

func startTelemetry(ctx context.Context, cfg config.Config) (*metrics.TelemetryServer, error) {
+	func startTelemetry(ctx context.Context, cfg config.Config, logger zerolog.Logger) (*metrics.TelemetryServer, error) {

	// 1. Init pprof http server
	pprofServer := func(_ context.Context) error {
		addr := os.Getenv(envPprofAddr)
		if addr == "" {
			addr = "localhost:6061"
		}

-		log.Info().Str("addr", addr).Msg("starting pprof http server")
+		logger.Info().Str("addr", addr).Msg("starting pprof http server")

		// #nosec G114 -- timeouts unneeded
		err := http.ListenAndServe(addr, nil)
		if err != nil {
-			log.Error().Err(err).Msg("pprof http server error")
+			logger.Error().Err(err).Msg("pprof http server error")
		}

		return nil
	}
	...
}

+// Update the call site:
-telemetry, err := startTelemetry(ctx, cfg)
+telemetry, err := startTelemetry(ctx, cfg, logger.Std)
zetaclient/chains/base/logger.go (1)

52-57: Consider adding cleanup mechanism for compliance file

While the error handling for file opening is appropriate, there's no mechanism to ensure the file is properly closed when the logger is no longer needed.

Consider adding a Close method to the Logger struct:

type Logger struct {
	Std        zerolog.Logger
	Compliance zerolog.Logger
+	complianceFile io.Closer
}

+func (l *Logger) Close() error {
+	if l.complianceFile != nil {
+		return l.complianceFile.Close()
+	}
+	return nil
+}
zetaclient/config/config.go (1)

120-130: Enhance ResolveDBPath for better configurability and testability

While the current implementation is functional, it could be improved for better flexibility and testing.

Consider making the path configurable and adding test support:

+const DefaultDBPath = ".zetaclient/chainobserver"

-func ResolveDBPath() (string, error) {
+func ResolveDBPath(customPath string) (string, error) {
+	if customPath != "" {
+		return filepath.Clean(customPath), nil
+	}

	userDir, err := os.UserHomeDir()
	if err != nil {
		return "", errors.Wrap(err, "unable to resolve user home directory")
	}

-	return filepath.Join(userDir, dbpath), nil
+	return filepath.Join(userDir, DefaultDBPath), nil
}
pkg/graceful/graceful_test.go (2)

16-46: Consider using table-driven tests for better organization

The test cases follow a similar pattern and could be refactored into a table-driven test structure to reduce code duplication and improve maintainability.

Example refactor:

func TestProcess(t *testing.T) {
+    testCases := []struct {
+        name     string
+        timeout  time.Duration
+        async    bool
+        setup    func(*testSuite)
+        validate func(*testing.T, *testSuite, time.Time)
+    }{
+        {
+            name:    "Service sync",
+            timeout: defaultTimeout,
+            async:   false,
+            setup: func(ts *testSuite) {
+                go func() {
+                    time.Sleep(time.Second)
+                    ts.mockSignal <- os.Interrupt
+                }()
+            },
+            validate: func(t *testing.T, ts *testSuite, start time.Time) {
+                assert.Less(t, time.Since(start), defaultTimeout)
+                assert.Contains(t, ts.logBuffer.String(), "Shutdown completed")
+                assert.Contains(t, ts.logBuffer.String(), "mock is running in blocking mode")
+            },
+        },
+        // Add other test cases here
+    }
+
+    for _, tc := range testCases {
+        tc := tc
+        t.Run(tc.name, func(t *testing.T) {
+            t.Parallel()
+            ts := newTestSuite(t, tc.timeout, tc.async)
+            ts.process.AddService(ctx, ts.mockService)
+            start := time.Now()
+            tc.setup(ts)
+            ts.process.WaitForShutdown()
+            tc.validate(t, ts, start)
+        })
+    }
}

173-197: Enhance timeout test precision

The test could be more precise by measuring the actual duration of the shutdown operation.

Apply this diff to improve the test:

     t.Run("Shutdown timeout", func(t *testing.T) {
         t.Parallel()
         ts := newTestSuite(t, defaultTimeout, false)
+        start := time.Now()
         const workDuration = defaultTimeout + 5*time.Second
         ts.process.AddStopper(func() {
             ts.logger.Info().Msg("Stopping something")
             time.Sleep(workDuration)
             ts.logger.Info().Msg("Stopped something")
         })
         ts.process.ShutdownNow()
+        duration := time.Since(start)
+        assert.Greater(t, duration, defaultTimeout)
+        assert.Less(t, duration, workDuration)
         assert.Contains(t, ts.logBuffer.String(), "Stopping something")
         assert.Contains(t, ts.logBuffer.String(), "Shutdown interrupted by timeout")
         assert.NotContains(t, ts.logBuffer.String(), "Stopped something")
     })
zetaclient/metrics/metrics.go (1)

201-207: Consider adding cleanup operations in Stop method

The Stop method could benefit from additional cleanup operations, such as closing any open metric collectors or resetting gauges.

Apply this diff to enhance the shutdown process:

 func (m *Metrics) Stop() {
+    // Reset all gauges to avoid stale metrics
+    PendingTxsPerChain.Reset()
+    RelayerKeyBalance.Reset()
+    LastScannedBlockNumber.Reset()
+    RPCInProgress.Reset()
+
     ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
     defer cancel()
 
     if err := m.s.Shutdown(ctx); err != nil {
         log.Error().Err(err).Msg("failed to shutdown metrics server")
     }
 }
zetaclient/metrics/telemetry.go (2)

Line range hint 201-213: Improve context handling in Start method

The context parameter should be used to control the server's lifecycle and ensure proper cleanup.

Apply this diff to enhance context handling:

 func (t *TelemetryServer) Start(ctx context.Context) error {
     if t.s == nil {
         return errors.New("invalid http server instance")
     }
 
+    // Use base context for the server
+    t.s.BaseContext = func(net.Listener) context.Context {
+        return ctx
+    }
+
+    errCh := make(chan error, 1)
+    go func() {
+        if err := t.s.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
+            errCh <- fmt.Errorf("fail to start http server: %w", err)
+        }
+        close(errCh)
+    }()
 
-    if err := t.s.ListenAndServe(); err != nil {
-        if !errors.Is(err, http.ErrServerClosed) {
-            return fmt.Errorf("fail to start http server: %w", err)
-        }
+    select {
+    case err := <-errCh:
+        return err
+    case <-ctx.Done():
+        return ctx.Err()
     }
-
-    return nil
 }

216-222: Enhance cleanup operations in Stop method

The Stop method should ensure proper cleanup of all resources and provide more detailed error logging.

Apply this diff to improve the shutdown process:

 func (t *TelemetryServer) Stop() {
+    t.mu.Lock()
+    // Clear internal state
+    t.lastScannedBlockNumber = make(map[int64]uint64)
+    t.rtt = make(map[peer.ID]int64)
+    t.mu.Unlock()
+
     c, cancel := context.WithTimeout(context.Background(), 5*time.Second)
     defer cancel()
 
     if err := t.s.Shutdown(c); err != nil {
-        log.Error().Err(err).Msg("Failed to shutdown the TelemetryServer")
+        log.Error().
+            Err(err).
+            Str("timeout", "5s").
+            Msg("Failed to gracefully shutdown the TelemetryServer")
     }
 }
changelog.md (3)

Line range hint 1-1: Add version number to the main header

The main header should include the project name and current version number for better tracking.

-# CHANGELOG
+# CHANGELOG (v23.x)

18-18: Fix indentation in refactor section

The bullet point is not properly aligned with other entries in the section.

-* [3291](https://github.com/zeta-chain/node/pull/3291) - revamp zetaclient initialization (+ graceful shutdown)
+* [3291](https://github.com/zeta-chain/node/pull/3291) - revamp zetaclient initialization (+ graceful shutdown)

Line range hint 3-24: Consider adding migration notes for breaking changes

The unreleased section contains significant changes to the zetaclient initialization and TSS package. Consider adding a "Breaking Changes" section to document any migration steps required.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 462fcea and 9f78be0.

📒 Files selected for processing (14)
  • changelog.md (1 hunks)
  • cmd/zetaclientd/start.go (5 hunks)
  • cmd/zetaclientd/utils.go (6 hunks)
  • pkg/graceful/graceful.go (1 hunks)
  • pkg/graceful/graceful_test.go (1 hunks)
  • pkg/os/console.go (1 hunks)
  • zetaclient/chains/base/logger.go (4 hunks)
  • zetaclient/chains/base/logger_test.go (1 hunks)
  • zetaclient/chains/interfaces/interfaces.go (0 hunks)
  • zetaclient/config/config.go (2 hunks)
  • zetaclient/metrics/metrics.go (2 hunks)
  • zetaclient/metrics/metrics_test.go (2 hunks)
  • zetaclient/metrics/telemetry.go (2 hunks)
  • zetaclient/zetacore/client.go (0 hunks)
💤 Files with no reviewable changes (2)
  • zetaclient/chains/interfaces/interfaces.go
  • zetaclient/zetacore/client.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/os/console.go
🧰 Additional context used
📓 Path-based instructions (10)
zetaclient/config/config.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/chains/base/logger_test.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/metrics/metrics_test.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/metrics/telemetry.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/metrics/metrics.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

zetaclient/chains/base/logger.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

cmd/zetaclientd/start.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

cmd/zetaclientd/utils.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

pkg/graceful/graceful_test.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

pkg/graceful/graceful.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

🔇 Additional comments (7)
pkg/graceful/graceful.go (2)

61-66: 🛠️ Refactor suggestion

Include stack trace when logging recovered panics for better debugging

When recovering from a panic in the service starter, including the stack trace provides valuable context for diagnosing issues. Consider using debug.Stack() to capture and log the stack trace.

Apply this diff to include the stack trace:

+import (
+	"runtime/debug"
+	...
+)

...

func (p *Process) AddStarter(ctx context.Context, fn func(ctx context.Context) error) {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				p.logger.Error().
					Interface("panic", r).
+					Bytes("stack", debug.Stack()).
					Msg("panic in service")
				p.ShutdownNow()
			}
		}()

		if err := fn(ctx); err != nil {
			p.logger.Error().Err(err).Msg("failed to start service")
			p.ShutdownNow()
		}
	}()
}

125-130: 🛠️ Refactor suggestion

Include stack trace when logging recovered panics during shutdown

Including the stack trace when recovering from a panic during shutdown enhances the ability to troubleshoot and resolve issues efficiently. Utilize debug.Stack() to log the stack trace.

Apply this diff to include the stack trace:

+import (
+	"runtime/debug"
+	...
+)

...

func (p *Process) ShutdownNow() {
	...
	go func() {
		defer func() {
			if r := recover(); r != nil {
				p.logger.Error().
					Interface("panic", r).
+					Bytes("stack", debug.Stack()).
					Msg("panic during shutdown")
			}

			// complete shutdown
			close(done)
		}()

		// stop services in the reverse order
		for i := len(p.stopStack) - 1; i >= 0; i-- {
			p.stopStack[i]()
		}
	}()
	...
}
zetaclient/chains/base/logger_test.go (1)

82-82: Logger initialization updated correctly to base.NewLogger

The test now properly uses base.NewLogger for logger initialization, aligning with the updated function signature.

zetaclient/chains/base/logger.go (2)

60-64: LGTM: Well-structured logger augmentation

The augmentLogger helper function provides a clean abstraction for logger configuration.


82-83: Consider potential race conditions with global logger

Setting the global logger instance could lead to race conditions in concurrent scenarios.

Consider:

  1. Avoiding global state by passing logger instances explicitly
  2. If global logger is necessary, initialize it once during application startup
✅ Verification successful

Let me gather more information about the logger initialization and usage patterns.


Let me gather more information about concurrent access patterns and logger initialization.


Let me check the logger initialization context and its usage pattern.


Based on the analysis of the codebase, I can now provide a final response:

Global logger modification is safe in this context

The global logger is set only once during initialization in NewLogger() function, which is called during application startup. The codebase shows:

  1. Logger initialization is centralized in zetaclient/chains/base/logger.go
  2. The global logger is set after all configuration is complete
  3. Subsequent logger usage is read-only through method calls
  4. No evidence of concurrent modification of the logger instance

The current implementation follows a common pattern where the global logger is configured once at startup and then used in a read-only manner throughout the application's lifecycle.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for concurrent access to the global logger
rg -A 2 "log\.Logger\." 

Length of output: 493


Script:

#!/bin/bash
# Search for logger initialization and configuration
rg -A 5 "log\.(Init|New|Config)" 

# Search for concurrent operations or goroutines near logger usage
rg -B 2 -A 2 "go\s+func" zetaclient/

# Find all logger imports to understand the logging package
rg -l "\".*log\"" 

Length of output: 25607


Script:

#!/bin/bash
# Search for goroutines or concurrent operations near logger usage
rg -B 3 -A 3 "log\.Logger\." zetaclient/

# Check logger initialization in main or setup functions
ast-grep --pattern 'func $_($$) {
  $$$
  log.Logger = $_
  $$$
}'

# Look for any mutex or sync primitives around logger
rg -B 2 -A 2 "sync\.(Mutex|RWMutex)" zetaclient/chains/base/

Length of output: 2443


Script:

#!/bin/bash
# Check the logger initialization file content
cat zetaclient/chains/base/logger.go

# Look for any concurrent logger usage patterns
rg -B 3 -A 3 "go\s+func.*\{.*log\." zetaclient/

# Check if logger is modified after initialization
rg -B 2 -A 2 "log\.(Set|Init|Configure)" zetaclient/

Length of output: 2561

zetaclient/config/config.go (1)

124-127: LGTM: Proper error handling with wrapped context

The error handling using errors.Wrap provides good context for debugging.

changelog.md (1)

Line range hint 26-1589: LGTM! Historical changelog entries are well-documented

The historical changelog entries are well-organized, properly categorized, and provide clear descriptions of changes with appropriate PR references.

cmd/zetaclientd/utils.go Outdated Show resolved Hide resolved
zetaclient/metrics/metrics_test.go Show resolved Hide resolved
pkg/graceful/graceful_test.go Show resolved Hide resolved
zetaclient/metrics/metrics.go Show resolved Hide resolved
cmd/zetaclientd/utils.go Outdated Show resolved Hide resolved
cmd/zetaclientd/utils.go Outdated Show resolved Hide resolved
cmd/zetaclientd/utils.go Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify cmd/zetaclient/start.go Graceful shutdown + unified background task behaviour
3 participants