Try to fix lock contention #1610

Merged
merged 7 commits into from
Dec 12, 2023

Conversation

trajan0x
Contributor

@trajan0x trajan0x commented Dec 11, 2023

Attempt lock contention fix

Summary by CodeRabbit

  • New Features

    • Enhanced GraphQL server with improved caching mechanisms for better performance and reliability.
    • Implemented mutex locks to ensure thread-safe operations in cache handling.
  • Enhancements

    • Updated GraphQL resolvers to utilize new caching strategy for daily statistics queries.
    • Extended functionality in the fetcher to handle multiple bridge events.
  • Refactor

    • Introduced new time-related functions for more robust error handling and fallback mechanisms in database queries.
  • Tests

    • Expanded test suite setup to include new time-related configurations for GraphQL components.
  • Chores

    • Updated CODEOWNERS file to reflect changes in directory ownership assignments.

@github-actions github-actions bot added the go (Pull requests that update Go code) and size/l labels Dec 11, 2023
Contributor

coderabbitai bot commented Dec 11, 2023

Walkthrough

The recent updates involve enhancing the GraphQL service with a new caching mechanism, utilizing mutex locks to manage concurrent access. The resolver function for daily statistics now leverages this cache, ensuring thread-safe operations. Additionally, the parsing and storage of blockchain events have been refined to handle multiple items and ignore non-bridge events. Time-related functionality has been introduced for database querying, with new utility functions for managing a fallback time. Ownership assignments in the CODEOWNERS file have been updated, reflecting changes in responsibility for certain directories.

Changes

File Path | Change Summary
.../graphql/server/gin.go | Added mapmutex import and CacheMutex field in configuration struct for GraphQL.
.../graphql/server/graph/queries.resolvers.go | Updated DailyStatisticsByChain resolver with caching logic.
.../graphql/server/graph/resolver.go | Added CacheMutex field to Resolver struct.
.../graphql/server/graph/fetcher.go | Enhanced event parsing and storage logic.
.../graphql/server/graph/queryutils.go | Added time-related variables and functions for database querying.
services/explorer/api/suite_test.go | Imported graph package and added fallback time setup in tests.
CODEOWNERS | Updated ownership for agents and services/scribe directories.

🐇✨
In the realm of code, where data streams flow,
A rabbit hopped in, with mutex locks in tow.
Cache now guarded, with threads in a dance,
A whimsical leap, for stability's chance. 🌟🔒
🐇✨


@trajan0x
Contributor Author

@coderabbitai review


codecov bot commented Dec 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a5aac41) 51.78571% compared to head (e79fbf1) 51.41925%.

Additional details and impacted files
@@                 Coverage Diff                 @@
##              master       #1610         +/-   ##
===================================================
- Coverage   51.78571%   51.41925%   -0.36647%     
===================================================
  Files            374         366          -8     
  Lines          25648       24837        -811     
  Branches         284         284                 
===================================================
- Hits           13282       12771        -511     
+ Misses         11076       10837        -239     
+ Partials        1290        1229         -61     
Flag Coverage Δ
cctp-relayer ?
explorer 25.43225% <ø> (ø)
promexporter 73.80952% <ø> (ø)
sinner 64.60481% <ø> (ø)


Contributor

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between a5aac41 and 1bc5d57.
Files ignored due to filter (5)
  • contrib/promexporter/go.mod
  • contrib/promexporter/go.sum
  • go.work.sum
  • services/explorer/go.mod
  • services/explorer/go.sum
Files selected for processing (3)
  • services/explorer/graphql/server/gin.go (2 hunks)
  • services/explorer/graphql/server/graph/queries.resolvers.go (1 hunks)
  • services/explorer/graphql/server/graph/resolver.go (2 hunks)
Additional comments: 5
services/explorer/graphql/server/gin.go (2)
  • 11-11: The import of the mapmutex package is correctly placed and follows the convention of grouping third-party imports together.

  • 41-41: The CacheMutex field is properly initialized with mapmutex.NewStringMapMutex(). Ensure that all usages of the graph.Resolver struct across the codebase are updated to handle this new field.


The shell script output indicates that the CacheMutex field is only referenced in the gin.go file where it was added. There are no other instances in the codebase where the graph.Resolver struct is being initialized or the CacheMutex field is being handled. This suggests that the changes are localized and do not affect other parts of the codebase.

Final review comment:

The initialization of the CacheMutex field in services/explorer/graphql/server/gin.go is correct and no further action is required as there are no other references to this field in the codebase.

services/explorer/graphql/server/graph/queries.resolvers.go (1)
  • 189-200: The implementation of caching in the DailyStatisticsByChain resolver function appears to be correct. The use of a mutex lock to prevent race conditions when accessing the cache is a good practice to mitigate lock contention issues. Ensure that the keyGenHandleNilInt function correctly handles nil values to prevent potential cache key collisions.
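The concern about nil values and key collisions can be illustrated with a minimal sketch. The source does not show keyGenHandleNilInt, so the helpers below (intKeyPart, cacheKey) are hypothetical names and are not the PR's implementation. The idea is only that a nil pointer must map to a token that no real integer can produce, otherwise "not set" and "set to 0" would share a cache entry.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// intKeyPart renders an optional int for use in a cache key.
// Hypothetical helper: nil gets its own token so it can never
// collide with a pointer to 0 (or to any other integer).
func intKeyPart(v *int) string {
	if v == nil {
		return "nil"
	}
	return strconv.Itoa(*v)
}

// cacheKey joins a prefix and the optional parts with ':' so that
// adjacent parts cannot run together (1,23 versus 12,3).
func cacheKey(prefix string, parts ...*int) string {
	segs := make([]string, 0, len(parts)+1)
	segs = append(segs, prefix)
	for _, p := range parts {
		segs = append(segs, intKeyPart(p))
	}
	return strings.Join(segs, ":")
}

func main() {
	var none *int
	seven := 7
	fmt.Println(cacheKey("daily", none, &seven)) // prints "daily:nil:7"
}
```

A separator is used because concatenating the parts directly would make the key ambiguous as soon as two numeric arguments sit next to each other.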
services/explorer/graphql/server/graph/resolver.go (2)
  • 4-4: The import of mapmutex is correctly added to use mapmutex.StringMapMutex in the Resolver struct.

  • 22-24: The DB, Fetcher, and Cache fields have been added to the Resolver struct. Ensure that these fields are properly initialized and used within the codebase, and that their addition is intentional and related to the PR's objective of fixing lock contention.


The verification scripts did not produce any output, which suggests that the new fields DB, Fetcher, and Cache in the Resolver struct may not be initialized or used in the codebase. This could indicate that the changes are incomplete or that the patterns used in the scripts did not match the code structure correctly. However, without further evidence, we cannot definitively conclude that there is an issue.

Final review comment:

The verification scripts did not find instances where the new DB, Fetcher, and Cache fields in the Resolver struct are initialized or used. Please ensure these fields are properly integrated into the codebase if they are relevant to the PR's objective.

Comment on lines +22 to +27
DB db.ConsumerDB
Fetcher fetcher.ScribeFetcher
Cache cache.Service
// CacheMutex is a mutex used for caching. It is used to prevent multiple
// stat queries from being made at the same time. In the future, this should be done somewhere else.
CacheMutex mapmutex.StringMapMutex

The addition of the CacheMutex field to the Resolver struct is consistent with the PR's objective to address lock contention. However, the comment on lines 25-26 suggests that this is a temporary solution and should be moved elsewhere in the future. It's important to track this technical debt and ensure it is addressed in subsequent work.

@trajan0x trajan0x marked this pull request as ready for review December 12, 2023 01:59
@trajan0x trajan0x requested a review from nautsimon as a code owner December 12, 2023 01:59
@trajan0x trajan0x merged commit f92d9c3 into master Dec 12, 2023
32 of 34 checks passed
@trajan0x trajan0x deleted the fix/lock-contention branch December 12, 2023 02:00
Contributor

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 1bc5d57 and e79fbf1.
Files selected for processing (4)
  • CODEOWNERS (2 hunks)
  • services/explorer/api/suite_test.go (2 hunks)
  • services/explorer/graphql/server/graph/fetcher.go (3 hunks)
  • services/explorer/graphql/server/graph/queryutils.go (2 hunks)
Files skipped from review due to trivial changes (1)
  • CODEOWNERS
Additional comments: 6
services/explorer/api/suite_test.go (1)
  • 5-15: The addition of the graph and time package imports is consistent with the changes made in the SetupTest function to manage fallback time for tests. Ensure that the graph package's time management functions are well-tested and thread-safe, especially if tests are run in parallel.
services/explorer/graphql/server/graph/fetcher.go (3)
  • 282-284: The loop added to parseAndStoreLog function to store each parsed log could potentially address lock contention by processing logs individually. Ensure that the storeBridgeEvent function is thread-safe to handle concurrent calls without causing new contention issues.

  • 315-317: Similarly, the loop added to parseAndStoreLogCCTP function to store each parsed log should be reviewed for thread safety. Confirm that concurrent execution of storeBridgeEvent does not introduce race conditions or other concurrency issues.

  • 491-491: The comment "// will ignore non-bridge events" has been added, but the provided code snippet does not show any logic for filtering non-bridge events. Ensure that the comment accurately reflects the behavior of the storeBridgeEvent function.
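The fetcher.go hunks are not reproduced in this conversation, so the following is only a sketch of the pattern the comments describe: a parser that can return several events, a loop that stores each one, and a store function that silently skips anything that is not a bridge event. The parsedEvent and store types and the "bridge" kind string are assumptions made for the example.

```go
package main

import "fmt"

// parsedEvent is a hypothetical, simplified result of parsing one log.
type parsedEvent struct {
	Kind   string // "bridge" for bridge events; anything else is ignored
	TxHash string
}

// store is a hypothetical in-memory sink standing in for the database.
type store struct {
	saved []parsedEvent
}

// storeBridgeEvent persists bridge events and ignores everything else,
// so callers can pass every parsed event without filtering first.
func (s *store) storeBridgeEvent(ev parsedEvent) error {
	if ev.Kind != "bridge" {
		return nil // will ignore non-bridge events
	}
	s.saved = append(s.saved, ev)
	return nil
}

// parseAndStore stores each parsed event in turn and stops at the first error.
func parseAndStore(s *store, events []parsedEvent) error {
	for _, ev := range events {
		if err := s.storeBridgeEvent(ev); err != nil {
			return fmt.Errorf("could not store event %s: %w", ev.TxHash, err)
		}
	}
	return nil
}

func main() {
	s := &store{}
	events := []parsedEvent{
		{Kind: "bridge", TxHash: "0x01"},
		{Kind: "swap", TxHash: "0x02"},
		{Kind: "bridge", TxHash: "0x03"},
	}
	if err := parseAndStore(s, events); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(s.saved)) // prints 2
}
```

Putting the filter inside the store function keeps the loop trivial, at the cost of hiding the skip from the caller, which is why a comment at the call site is useful.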

services/explorer/graphql/server/graph/queryutils.go (2)
  • 1638-1639: Creating a context with a timeout based on a potentially mutable global variable can lead to unpredictable behavior if the variable is changed during operation. Ensure that the usage of timeToFallback is safe and consider if there's a need to protect against concurrent modifications when setting the fallback time.

  • 1667-1670: The error handling strategy in GetDestinationBridgeTxBW relies on the bridgeEventMV being nil or having a TChainID of 0 to trigger a fallback. Verify that the fallback function bwDestinationFallback is robust and that this error handling strategy is consistent with the rest of the application.

Comment on lines 212 to 223
func (g *APISuite) SetupTest() {
	g.TestSuite.SetupTest()

	initialFallback := graph.GetFallbackTime()
	graph.UnsafeSetFallbackTime(time.Second * 20)
	g.TestSuite.DeferAfterTest(func() {
		graph.UnsafeSetFallbackTime(initialFallback)
	})

	g.db, g.eventDB, g.gqlClient, g.logIndex, g.cleanup, g.testBackend, g.deployManager = testutil.NewTestEnvDB(g.GetTestContext(), g.T(), g.scribeMetrics)

	cleanup, port, err := clickhouse.NewClickhouseStore("explorer")

The changes in the SetupTest function to set and reset the fallback time are good practice to prevent side effects between tests. However, verify that the GetFallbackTime and UnsafeSetFallbackTime functions are thread-safe to avoid race conditions when tests are run concurrently.

Comment on lines +1619 to +1631
var timeToFallback = time.Second * 0

// GetFallbackTime gets the fallback time for the bridge watcher.
// this is intended only for testing
func GetFallbackTime() time.Duration {
	return timeToFallback
}

// UnsafeSetFallbackTime sets the fallback time for the bridge watcher.
// it is intended for testing. Please remember to reset this value.
func UnsafeSetFallbackTime(ttf time.Duration) {
	timeToFallback = ttf
}

The introduction of a global variable for controlling timeouts (timeToFallback) and the associated functions (GetFallbackTime and UnsafeSetFallbackTime) could lead to race conditions if accessed concurrently. Consider using a more thread-safe approach to manage timeouts, such as encapsulating the timeout within a struct and using methods with proper synchronization mechanisms.

- var timeToFallback = time.Second * 0
+ var timeToFallback = atomic.Value{}

- timeToFallback = ttf
+ timeToFallback.Store(ttf)

- return timeToFallback
+ return timeToFallback.Load().(time.Duration)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
- var timeToFallback = time.Second * 0
- // GetFallbackTime gets the fallback time for the bridge watcher.
- // this is intended only for testing
- func GetFallbackTime() time.Duration {
- 	return timeToFallback
- }
- // UnsafeSetFallbackTime sets the fallback time for the bridge watcher.
- // it is intended for testing. Please remember to reset this value.
- func UnsafeSetFallbackTime(ttf time.Duration) {
- 	timeToFallback = ttf
- }
+ var timeToFallback = atomic.Value{}
+ // GetFallbackTime gets the fallback time for the bridge watcher.
+ // this is intended only for testing
+ func GetFallbackTime() time.Duration {
+ 	return timeToFallback.Load().(time.Duration)
+ }
+ // UnsafeSetFallbackTime sets the fallback time for the bridge watcher.
+ // it is intended for testing. Please remember to reset this value.
+ func UnsafeSetFallbackTime(ttf time.Duration) {
+ 	timeToFallback.Store(ttf)
+ }

Labels
go (Pull requests that update Go code), M-Codeowners, size/l
1 participant