Leptos Axum Handle Server Fn unwrapping Context<ResponseOptions> can panic under load #2112
Comments
When I try to run this on a random example, I do not run into any panics. Although it looks from the text of your steps that maybe it requires an extractor too? I'm open to the idea that there's some actual underlying issue here, but I would need an actual reproduction that is more specific than the steps you've given here. |
I ran into the same issue with
I am completely unable to reproduce the error when running the webapp locally, so I can't give you a minimal example or clear steps to reproduce the bug. |
@WIGGLES-dev @pnmadelaine Now that #2158 is merged, the server fn implementation on the main branch is quite different from the previous implementation. It is available for testing either with a dependency on the main branch or on the new release. Because the underlying implementation has changed significantly, I'm going to close this issue. If you encounter the same problem with the new version, feel free to open a new issue. |
this does still happen on 0.6.5:
I think it originates here and here, respectively. |
@Panaetius I'm happy to look into this at some point if you can open a new issue with some way of reproducing the problem. |
As a note, I haven't seen any way to consistently reproduce this problem, but it is causing issues. Originally I saw it when making ~40 requests at the same time to the same server function. Just now it appeared to be happening under very little load. Added context.
|
More context on this: I added a few strategic logs and found that the issue appears to be that the Runtime has been disposed. Thereafter it ends up calling RuntimeId::default() after exhausting the other options on the code path that gets the current runtime. That happens here. I don't know enough to debug further at this time, but my guess is that the runtime created by default does not have the context that is expected. @gbj I would love any thoughts on how this might be failing or where I might look to debug further. Regardless, I will dig deeper myself as well. |
@glademiller I am not able to share any useful thoughts without a way to reproduce the issue. As I said above, I am happy to look into this more if anyone can open a new issue with a reproducible example. |
@gbj Thank you; if I find a way to reproduce the issue consistently, I will send it over. For now, for anyone else having this problem: if you are not using ResponseOptions to change a server function's response, then the workaround I am using at the moment is to patch leptos-axum by changing this line: https://github.com/leptos-rs/leptos/blob/main/integrations/axum/src/lib.rs#L306. The root of the problem is that, in some scenario I haven't been able to identify, the Runtime is disposed, so a default runtime is returned which does not have the ResponseOptions in the Context as leptos-axum expects. |
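For reference, the patch described above amounts to something like the sketch below; this is illustrative rather than the exact upstream code, and it assumes ResponseOptions implements Default, as it appears to in the 0.6 integration.

```rust
use leptos::use_context;
use leptos_axum::ResponseOptions;

// Sketch of the workaround: fall back to a default ResponseOptions instead of
// panicking when the context is missing because the Runtime was disposed.
// Any status or header changes the server fn made are lost in that case.
fn response_options_or_default() -> ResponseOptions {
    use_context::<ResponseOptions>().unwrap_or_default()
}
```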
Edit: I am not sure what is causing this, but it seems that some requests work and others do not (kind of random). Edit2: Removing mimalloc did not help.
I am facing the same issue after calling a server fn of an app that runs on

```rust
#[cfg(feature = "ssr")]
pub mod ssr {
use crate::leptos_ui::app::NoCustomError;
use crate::middleware::proxy::Route;
use crate::state::AppState;
use leptos::use_context;
use leptos::ServerFnError;
pub fn get_state_route(
route_name: &str,
) -> Result<(Route, AppState), ServerFnError<NoCustomError>> {
let state: AppState =
use_context::<AppState>().ok_or(ServerFnError::ServerError::<NoCustomError>(
"No server state".to_string(),
))?;
let route = state
.proxy_api
.routes
.get(route_name)
.ok_or(ServerFnError::ServerError::<NoCustomError>(format!(
"Could not extract {} API route",
route_name
)))?;
Ok((route.clone(), state))
}
}
#[server(CasaWorkflowV1Details, "/api_leptos")]
pub async fn call_details(id: i64) -> Result<Details, ServerFnError> {
use crate::api::abc::api::workflow::details;
use http_auth_basic::Credentials;
use crate::leptos_ui::app::ssr::get_state_route;
let (route, state) = get_state_route("myapp")?;
let credentials = Credentials::new(
&route.user.clone().unwrap(),
&route.password.clone().unwrap(),
);
details(id, credentials, route.url.clone(), state.proxy_api.client).await
}
pub async fn details(
id: i64,
creds: Credentials,
base_uri: String,
client: HyperClient,
) -> Result<Details, ServerFnError> {
let uri = url::Url::parse(&format!("{base_uri}/api/abc/v1/{id}"))
.expect("base url")
.to_string();
let req = hyper::Request::builder()
.method(hyper::Method::GET)
.header("Authorization", format!("Basic {}", creds.encode()))
.header("content-type", "application/json")
.uri(uri)
.body(axum::body::Body::empty())?;
let resp = client.request(req).await?;
if !resp.status().is_success() {
return Err(ServerFnError::ServerError(
format!("x returned invalid status code '{}'", resp.status()).to_string(),
));
}
let body = resp.into_body();
let body = body.collect().await?;
let body = body.to_bytes().to_vec();
let debug_str = std::str::from_utf8(&body)?;
info!("response: '{}'", debug_str);
Ok(serde_json::from_slice(&body)?)
}
```
|
@glademiller Under which Docker image does your service run? |
I noticed that the problem might happen when the serverfn returns a |
@johnbchron shouldn't the app, in the case that the issue is memory-limit related, exit with an OOM error? Edit: In my case raising the memory did not fix the issue. |
@lcmgh Yeah, raising the memory didn't help, but I later moved from a … I don't have the bandwidth to build the repro at the moment, but testing with only 1 core is probably a helpful clue. |
I am using kubeimages/distroless-cc, so distroless is a common denominator. I am also running on ECS Fargate using 0.25 vCPU, so a low-powered system to be sure. I don't think I have seen anything in the metrics for the service that would indicate a significant memory or CPU usage spike. That said, it does occur for me when I have many simultaneous requests. |
Issue also appeared when using … Can confirm that raising the CPU setting from a fraction of 1 to 2 fixed the issue (Kubernetes). Can also confirm that the issue appears when there are many (in my case 5-10) concurrent requests. The first time such requests are made, it works fine most of the time; subsequent triggers of the same requests lead to more failures. |
2 CPUs improved the situation, but when refreshing often, the same error occurs. |
@gbj I have created a sample app that reproduces the issue. Please see the below repo. https://github.com/jmsolomon2000/leptos-runtime-dispose-repro |
I can confirm this completely resolves the issue. This has been happening on multiple production apps I've built and released to a server; I have never been able to reproduce it locally. @gbj this is the issue I brought up in Discord regarding the 502 errors through a load balancer. I never did get to the bottom of why that was happening, but the above has resolved that also. |
@mjarvis9541 @glademiller I'm concerned that the solution (which I believe is now merged in #2468) mostly hides the problem rather than fixing it: i.e., if someone has a server function that actually sets something in ResponseOptions, those changes will now be silently dropped. I will plan to revisit this to make sure it is solved in 0.7, where the reactive system works significantly differently. |
@gbj I agree my "fix" is a workaround, not a solution. That is why I qualified my comment above with "if you are not using the ResponseOptions to change a server function's response". The real solution is to find out what is causing the Runtime to be disposed and fix that. In my case I don't ever change the ResponseOptions at the moment, so the workaround is okay, but I don't think it should be merged, since it just turns the bug into ResponseOptions changes not being respected in this scenario. |
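To make that trade-off concrete, here is a hypothetical server fn (not from this thread) that relies on ResponseOptions; under the unwrap-or-default workaround, a disposed runtime means these changes are applied to a throwaway ResponseOptions and never reach the HTTP response.

```rust
use leptos::{expect_context, server, ServerFnError};

// Hypothetical example: a server fn that sets a status code and a cookie via
// ResponseOptions. With the workaround, a missing context silently yields a
// default ResponseOptions, so the status and header below would be ignored.
#[server(SetCookieExample, "/api")]
pub async fn set_cookie_example() -> Result<(), ServerFnError> {
    use axum::http::{header, HeaderValue, StatusCode};
    use leptos_axum::ResponseOptions;

    let response = expect_context::<ResponseOptions>();
    response.set_status(StatusCode::CREATED);
    response.insert_header(
        header::SET_COOKIE,
        HeaderValue::from_static("session=abc123; Path=/"),
    );
    Ok(())
}
```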
@gbj @glademiller I'm wondering if this has something to do with the fallback route and providing the correct context upon the server page load. Given:
I recall that when using the custom server fn handler and the custom leptos route handler, the same context was required to be provided for both. I've noticed some strange behaviour with routing recently in how it handles trailing slashes and :? routes; it thinks there are duplicate routes and throws a panic when there are not, and generally gets confused at times. The server was throwing panics when it couldn't access the pg pool via context on routes with a trailing slash at the end. I have quite a large application with many routes, optional path params, multiple nested params, all that fun stuff, so I'm having a great time trying to debug. The following setup seems to have stopped the panics from server requests on paths with a trailing slash not finding the pg pool in context. However I'm unable to use … Any thoughts on this?

```rust
#[tokio::main]
async fn main() {
dotenvy::dotenv().unwrap();
let database_url = env::var("DATABASE_URL").expect("DATABASE_URL should be set");
let pool = PgPoolOptions::new()
.max_connections(5)
.connect(&database_url)
.await
.expect("could not create a database pool");
let conf = get_configuration(None).await.unwrap();
let leptos_options = conf.leptos_options;
let addr = leptos_options.site_addr;
let routes = generate_route_list(App);
let state = Arc::new(AppState { pool: pool.clone() });
let context = move || provide_context(pool.clone());
let app = Router::new()
.nest_service("/pkg", ServeDir::new("target/site/pkg"))
.route_service("/favicon.ico", ServeFile::new("target/site/favicon.ico"))
.leptos_routes_with_context(&leptos_options, routes, context, App)
.fallback(leptos_fallback)
.layer(Extension(state))
.with_state(leptos_options);
let listener = tokio::net::TcpListener::bind(&addr).await.unwrap();
axum::serve(listener, app).await.unwrap();
}
async fn leptos_fallback(
State(options): State<LeptosOptions>,
Extension(state): Extension<Arc<AppState>>,
req: Request<Body>,
) -> impl IntoResponse {
let context = move || provide_context(state.pool.clone());
let handler = leptos_axum::render_app_to_stream_with_context(options, context, App);
handler(req).await.into_response()
}
```
For reference: render_route_with_context:
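A rough sketch of the "same context for both handlers" setup described above, assuming the 0.6-style handle_server_fns_with_context(additional_context, request) signature; AppState, the pool field, and the "/api/*fn_name" route are assumptions that mirror the snippet above.

```rust
use std::sync::Arc;

use axum::{body::Body, extract::Extension, http::Request, response::IntoResponse};
use leptos::provide_context;
use leptos_axum::handle_server_fns_with_context;

// Give the explicit server fn handler the same context closure the page
// routes get, so server fns can also find the pg pool via use_context.
async fn server_fn_handler(
    Extension(state): Extension<Arc<AppState>>,
    request: Request<Body>,
) -> impl IntoResponse {
    handle_server_fns_with_context(
        move || provide_context(state.pool.clone()),
        request,
    )
    .await
}

// Registered alongside the leptos routes, e.g.:
// .route("/api/*fn_name", post(server_fn_handler))
```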
|
This one has plagued us for a while; we've never been able to reproduce it reliably enough for testing. The idea that multiple cores help seems to be a patch/workaround. I'm hoping/believing that this is fixed in 0.7, which we should get a sense of as more people test the beta. |
Describe the bug
In leptos_axum's handle_server_fns_inner there are several calls to unwrap() on values that can evidently be None (though I'm not entirely sure how).
Leptos Dependencies
To Reproduce
Steps to reproduce the behavior:
You'll have to bear with me here because I can only reproduce this when stress testing.
./wrk -t 8 -c 2000 -d 30s --latency http://localhost:3000
error reaching server to call server function: TypeError: NetworkError when attempting to fetch resource.
Expected behavior
The Option::unwrap() call should be removed and replaced with a guard, though I'm not sure what is causing this problem.
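For illustration only (this is not the actual leptos_axum code), a guard along these lines would turn the missing-context case into an explicit error response instead of a panic.

```rust
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use leptos::use_context;
use leptos_axum::ResponseOptions;

// Hypothetical guard: look up ResponseOptions from context and surface an
// explicit 500 if it is missing (e.g. because the Runtime was disposed),
// rather than calling unwrap() and panicking.
fn response_options_guarded() -> Result<ResponseOptions, Response> {
    use_context::<ResponseOptions>().ok_or_else(|| {
        (
            StatusCode::INTERNAL_SERVER_ERROR,
            "ResponseOptions was missing from the server fn context",
        )
            .into_response()
    })
}
```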
Additional context
If you look at the handler you can clearly see that provide_context is called early on, so there's no reason to suspect that getting the context for ResponseOptions wouldn't return Some, yet it comes back None. My best guess is that there are reads and writes going on at the same time.