Long DSS Response Times on Requests with Large AOIs #1121
Running this query times out after a minute on the Wing internal instance (CRDB debugging bundle attached). I believe it's this query:

```sql
SELECT
    scd_operations.id,
    scd_operations.owner,
    scd_operations.version,
    scd_operations.url,
    scd_operations.altitude_lower,
    scd_operations.altitude_upper,
    scd_operations.starts_at,
    scd_operations.ends_at,
    scd_operations.subscription_id,
    scd_operations.updated_at,
    scd_operations.state,
    scd_operations.cells
FROM
    scd_operations
WHERE
    cells && $1
    AND COALESCE(scd_operations.altitude_upper >= $2, true)
    AND COALESCE(scd_operations.altitude_lower <= $3, true)
    AND COALESCE(scd_operations.ends_at >= $4, true)
    AND COALESCE(scd_operations.starts_at <= $5, true)
LIMIT
    $6
```

The culprit is this part: `cells && $1`, where some 25,000+ S2 cells are passed to the intersection query. (On CRDB 21.2.7 and below, the […])
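For rough intuition on where a probe set of 25,000 cells comes from: assuming the DSS indexes at S2 level 13 (an assumption here, not stated in this thread; a level-13 cell averages roughly 1.27 km²), a 100 km radius circle covers cells on that order:

```python
import math

# Back-of-the-envelope estimate of how many S2 cells cover the reported AOI.
# Assumption: indexing at S2 level 13. A level-0 face is ~1/6 of Earth's
# surface (~510.1M km^2 total), and each level subdivides a cell into 4.
EARTH_AREA_KM2 = 510.1e6
LEVEL = 13
avg_cell_area_km2 = (EARTH_AREA_KM2 / 6) / (4 ** LEVEL)  # ~1.27 km^2

radius_km = 100.0
aoi_area_km2 = math.pi * radius_km ** 2  # ~31,400 km^2

approx_cells = aoi_area_km2 / avg_cell_area_km2
print(f"~{approx_cells:,.0f} level-{LEVEL} cells")  # on the order of 25,000
```

This is only an area ratio, ignoring coverer slop at the circle's boundary, but it lands in the same ballpark as the 25,000+ cells observed.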
In fact, looking at the code that was added in cockroachdb/cockroach#77418, I think there's some nested looping behavior going on, such that for every (or some large number of) S2 cells in the inverted index, we're likely iterating over the entirety of the passed-in cell array (of 25,000 items). That's likely leading to n^2 behavior.
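As a toy model of that suspected n^2 behavior (this is illustrative Python, not the CRDB code): probing every indexed cell against the full probe array is quadratic in the cell counts, while a hash-set probe of the same data is linear:

```python
import random
import time

random.seed(0)
# Hypothetical cell IDs; sizes scaled down from 25,000 so the naive
# loop finishes quickly.
probe = random.sample(range(10_000_000), 2_000)
indexed = random.sample(range(10_000_000), 2_000)

# Naive nested loop: O(len(indexed) * len(probe)) comparisons.
t0 = time.time()
naive_hits = sum(1 for c in indexed for p in probe if c == p)
t_naive = time.time() - t0

# Hash-set probe: one O(1) membership test per indexed cell.
probe_set = set(probe)
t0 = time.time()
hashed_hits = sum(1 for c in indexed if c in probe_set)
t_hashed = time.time() - t0

print(f"nested loop: {t_naive:.3f}s, hash set: {t_hashed:.5f}s")
```

Scaling both sizes up to 25,000 multiplies the nested-loop work by ~156x while the hash-set cost grows only ~12.5x, which matches the shape of the slowdown reported here.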
@gsmithCAL The InterUSS DSS is generally designed to support the use cases described in F3548-21. Are you planning a flight that is going to cover 12,000 square miles? If not, could you elaborate on why your planning area is so large (more than 1/4 the size of Ohio)?

I confirm I can reproduce using the code below with a local […]:

```python
import time

import requests

from monitoring.monitorlib import auth

adapter = auth.make_auth_adapter("DummyOAuth(http://localhost:8085/token,big_planning_client)")
token = adapter.issue_token("localhost", ["utm.strategic_coordination"])
body = {
    "area_of_interest": {
        "volume": {
            "outline_circle": {
                "center": {"lng": -95.28, "lat": 32.38},
                "radius": {"value": 100000, "units": "M"},
            },
            "altitude_lower": {"value": 0, "reference": "W84", "units": "M"},
            "altitude_upper": {"value": 400, "reference": "W84", "units": "M"},
        },
        "time_start": {"value": "2024-09-18T18:27:19.420Z", "format": "RFC3339"},
        "time_end": {"value": "2024-09-19T18:27:19.420Z", "format": "RFC3339"},
    }
}
t0 = time.time()
resp = requests.post(
    url="http://localhost:8082/dss/v1/operational_intent_references/query",
    json=body,
    headers={"Authorization": f"Bearer {token}"},
    timeout=300,
)
t1 = time.time()
print(f"Response code {resp.status_code} after {t1 - t0}s")
```

Using similar code, it does look like processing time scales with n^2, where n relates to area:
@BenjaminPelletier I don't specifically have a use case for this. I originally came across this behavior when our USS attempted to create a subscription over a large area. I agree that in a "production" environment you would be unlikely to ever need to request an AOI as large as this.
Given that the nested loops seem to be used only for the […]. For example, there is an […]. I'll give that a try tomorrow: it should be easy to check if it helps or not.
I think it's worth investigating why a notional performance improvement (implementation of inverted indices) resulted in a substantial decrease in performance -- especially determining whether that might hurt us in more normal circumstances as well. But I think support for extremely large planning areas would not be particularly high on the priority list, as our implementation is targeted at USSs following one or more of the ASTM standards (F3411 and/or F3548), and I don't think we've identified a true use case for this kind of query in that context.
On CRDB v21.2.7, this query would have been implemented using a full table scan + post filter, or more likely an index scan (of the altitude & start/end times) + post filter. In both cases (full or index scan), the pathologically-large cell probe set is applied to only a subset of the table. My guess is the other indexes also kick in, so the bad S2 cell set is applied to a much smaller number of rows, leading to the "good" 700ms latency. It's also quite possible that the post-filtering logic was able to do smart things like building a hash map and probing, or sorting the arrays to do a binary search rather than a naive loop. Neither of those techniques is applied in the inverted index logic described above. Post CRDB v21.2.7, the pathologically-large cell probe is applied to every single row in the database, regardless of other filters (because it happens at index consultation time). In more nominal cases, the inefficiencies of the indexing loops are going to be dwarfed by the cost of performing a full table scan, shipping data over the wire, and then doing the filtering. This is at least partly a CRDB bug - it should handle this sort of case better than the current nested loop in the index path. However, I don't foresee it being an issue for "normal" queries - see the huge performance gains in #1070 as proof - it exercises this same inefficiency.
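As a toy illustration of the sorted-array technique mentioned above (again, illustrative Python, not CRDB's actual post-filter code): once both cell-ID arrays are sorted, an intersection test needs only a single two-pointer walk, O(n + m) instead of O(n * m):

```python
def intersects_sorted(a: list[int], b: list[int]) -> bool:
    """Return True if sorted lists a and b share any element.

    Two-pointer merge walk: advance whichever side has the smaller
    head, so each element is visited at most once.
    """
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return False

# Hypothetical cell-ID arrays for demonstration.
print(intersects_sorted([1, 3, 7, 12], [2, 3, 9]))   # True (shares 3)
print(intersects_sorted([1, 4, 10], [2, 3, 9]))      # False
```

With 25,000-element arrays, this is ~50,000 comparisons rather than 625 million for the naive nested loop.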
From a quick try I did, and subsequently looking at the CRDB web UI, it seems that the naive subquery approach results in full table scans. I don't know if this is due to the low number of records that are generated by the qualifier, or if that would still be the case on a real-world deployment with many records.
Describe the bug
When making a request to the DSS that includes an area_of_interest consisting of a circle with a radius value of 100,000 m, the DSS takes upwards of 30 seconds to respond. This behavior is observed on DSS v0.17.0 and CRDB v24.1.3. It is important to note that this behavior is not seen when using DSS v0.17.0 and CRDB v21.2.7.
To Reproduce
Steps to reproduce the behavior:
POST http://localhost:8082/dss/v1/operational_intent_references/query
Body:

```json
{
  "area_of_interest": {
    "volume": {
      "outline_circle": {
        "center": {"lng": -95.28, "lat": 32.38},
        "radius": {"value": 100000, "units": "M"}
      },
      "altitude_lower": {"value": 0, "reference": "W84", "units": "M"},
      "altitude_upper": {"value": 400, "reference": "W84", "units": "M"}
    },
    "time_start": {"value": "2024-09-18T18:27:19.420Z", "format": "RFC3339"},
    "time_end": {"value": "2024-09-19T18:27:19.420Z", "format": "RFC3339"}
  }
}
```
Expected behavior
Using CRDB v21.2.7 these requests usually only take around 700ms to complete.
If anything needs further clarification, don't hesitate to leave a comment here or send a message on slack.