DataProvider routing, lazy deserialization, caching, and overlays #1246
I like this overall plan.
I think cargo feature is the right call here.
The standard pattern in Rust for caches is interior mutability (`Mutex` or `RefCell`). We can also use things like `Weak`/`Rc` to build caches.
That looks good!
I'd go for Mutex. I'm a bit concerned about your snippet at the end with overlays. The design you propose requires loading the payload, modifying it, and then returning it. This differs from what I see as the most important use case, which I'd show as:

```rust
struct MyDataOverlay<P: DataProvider<ErasedDataStruct>> {
    provider: P,
}

impl<P: DataProvider<ErasedDataStruct>> DataProvider<ErasedDataStruct> for MyDataOverlay<P> {
    fn load_payload(&self, req: &DataRequest) -> DataResponse<ErasedDataStruct> {
        if /* data overlay conditional */ {
            load_local_payload(req)
        } else {
            self.provider.load_payload(req)
        }
    }
}
```

and:

```rust
struct MyDataOverlay<P: DataProvider<ErasedDataStruct>> {
    provider: P,
}

impl<P: DataProvider<ErasedDataStruct>> DataProvider<ErasedDataStruct> for MyDataOverlay<P> {
    fn load_payload(&self, req: &DataRequest) -> DataResponse<ErasedDataStruct> {
        let mut res = load_local_payload(req);
        if !res.contains(something) {
            res.extend_with(self.provider.load_payload(req));
        }
        res
    }
}
```
#1369 implements much of the infrastructure for this design to work. I consider the remaining deliverable for this issue to be tests/examples for the remaining constructions in the OP.
Given CrabBake and the fact that the erased data provider needs a more prominent role, and based on further experience with FFI, here is my updated trait structure.

BufferProvider

A data provider that provides blobs.

Function Signature:

Features:

Status: Implemented.

AnyProvider

A data provider that provides Rust objects in memory as

Function Signature:

Features:

Status: Tracked by #1479 and #1494

KeyProvider
To-do: make sure everything here is well documented.
Document the following in the data provider tutorial:
I wanted to put together an updated, comprehensive model of how different types of data providers interact with one another.
I. Routing
A "routing data provider" or "data router" is one that sends a data request to one or more downstream data providers.
Multi-Blob Data Provider
The multi-blob data provider (#1107) is a specific case. Its data model can be a set of ZeroMaps, plus some metadata to determine which ZeroMap to query for a particular key and locale.
General-Purpose Data Router
The more general case requires using `dyn Any` as an intermediate. We already have `ErasedDataStruct` for this purpose. Please note that `ErasedDataStruct` is a different module with a different purpose than the one that uses `erased_serde`. In order to convert from `ErasedDataStruct` to a concrete type, we need lazy deserialization.

II. Lazy Deserialization
In #837, I suggest making a data provider that converts from u8 buffers to concrete data structs. Something like:
where `BufferMarker` is a data struct that has not been parsed yet. `BlobDataProvider`, `FsDataProvider`, `MultiBlobDataProvider`, etc., would all produce `BufferMarker`.

To go one step further, `DataDeserializer` could work on `ErasedDataStruct` as well. It would first attempt to downcast the data struct to the concrete type; if that fails, it then attempts to downcast to a `BufferMarker` and deserialize it. (It is unexpected for both downcasts to fail; in such a case, we would return an error result.)

Open Question: How should we configure the deserializers (JSON, Bincode, Postcard, etc.) that a `DataDeserializer` can operate on? The code we currently use is here, where we essentially have Cargo features that turn on or off the different deserializers. We want to avoid using `erased_serde` in the general case because of the impact on code size that we discovered. The Cargo feature might be the best option for now, because apps should hopefully know which deserializers they need at compile time. We could add an option for `erased_serde` later for apps that don't care as much about code size but want to dynamically load new deserializers at runtime.

III. Caching
The rule of thumb is that there is no such thing as a one-size-fits-all caching solution. Clients have different use cases and resource constraints, which may favor heavy caching, light caching, or no caching at all.
A basic cache would look something like this:
Note that we load from a `DataProvider` but cache a `DataResponse`.

Depending on whether the cache is inserted before or after the deserializer, it could track raw buffers or resolved data. In general, the intent is that the cache is inserted after the deserializer, such that we keep track of resolved data structs that the app has previously requested.
Open Question: The caching data provider needs to mutate itself, but the `DataProvider` trait works on shared references. I think we should use a mutex-like abstraction to make this thread-safe. The alternative would be to make `DataProvider` work on mutable references instead of shared references.
IV. Overlays
One of the main use cases for chained data providers has been the idea of data overlays.
Until we have specialization, data overlays probably need to operate through the `dyn Any` code path, like caches and general-purpose routers. A data overlay would likely take the following form:

Seeking feedback from: