-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data provider trait definition #8
Comments
Option A: Enum returned by valueI have the following solution implemented in https://github.com/sffc/rust-wasm-i18n/blob/master/wasm-no-std/src/intl/util/data.rs: pub struct CompactData {
pub prefix: String,
pub suffix: String,
}
pub struct CurrencyData {
// ...
}
pub struct CompactCurrencyData {
// ...
}
pub struct ScientificData {
// ...
}
pub struct DecimalData {
pub zero_digit: char,
pub decimal_separator: char,
pub grouping_separator: char,
pub primary_group: Option<u8>,
pub secondary_group: Option<u8>,
}
pub enum DataKey {
// TODO: Upgrade Box to something that doesn't require owned data (Cow?)
Compact(Option<Box<CompactData>>),
Currency(Option<Box<CurrencyData>>),
CompactCurrency(Option<Box<CompactCurrencyData>>),
Scientific(Option<Box<ScientificData>>),
Decimal(Option<Box<DecimalData>>),
}
pub trait DataProvider {
fn get_data(&self, key: &DataKey) -> DataKey;
}
// Example implementation
pub struct DummyDataProvider {
}
impl DataProvider for DummyDataProvider {
fn get_data(&self, key: &DataKey) -> DataKey {
match key {
DataKey::Compact(_) => DataKey::Compact(Some(Box::new(CompactData {
prefix: String::from("p"),
suffix: String::from("s"),
}))),
DataKey::Currency(_) => DataKey::Currency(Some(Box::new(CurrencyData {}))),
DataKey::CompactCurrency(_) => DataKey::CompactCurrency(Some(Box::new(CompactCurrencyData {}))),
DataKey::Scientific(_) => DataKey::Scientific(Some(Box::new(ScientificData {}))),
DataKey::Decimal(_) => DataKey::Decimal(Some(Box::new(DecimalData {
zero_digit: '0',
decimal_separator: '.',
grouping_separator: ',',
primary_group: Some(3),
secondary_group: Some(2),
}))),
}
}
} Pros:
Cons:
|
@Manishearth @zbraniecki @hsivonen @hagbard Thoughts on this? I'd like to have a more concrete data provider trait definition soon, so we can start implementing more pieces of it. |
This comment has been minimized.
This comment has been minimized.
Yes, of course. What I'm missing is a list of "vaguely sensible" options, beyond the one option I proposed earlier in the thread. My ask is to compile a list of options, pick one to start with, and we can change later if need be. |
Ahh okay. Then since this is conceptually a "have key, get value" thing, any API that's semantically equivalent to "given URI, get resource" sounds fine. The issues of how to implement wrapped suppliers etc. (as suggested in the linked doc) all boil down to the semantics of the kays and what (if anything) can be assumed about the structure of the returned data. A sketch of what a wrapper might do (i.e. pseudo code for a realistic example that interprets the key in order to fetch from one of several sub-suppliers) would be nice. E.g. how would I write a wrapper than deals with a legacy path, no longer supported in the default supplier but needed by my legacy library code for some bizarre reason (or is that never going to work?). As a related aside, if the key were represented as a URI, what do you think it would look like (I think this would help me understand what properties things like version and the optional data key are meant to have and how they relate to each other). If this isn't something that could be represented as a URI, I think it says something quite interested about the design. |
There is a list of example keys in the draft, and the draft lists which specific types are allowed to be in the returned data.
Sure. I'll give an example of a provider that tries several providers to get data for a particular locale. // I'm writing this in Python because it's for illustration purposes only
class LocaleForkingProvider:
def __init__(self, child_providers):
self.child_providers = child_providers
def get_data(self, key, request_vars):
for provider in self.child_providers:
(data, response_vars) = provider.get_data(key, request_vars)
if response_vars.langid != "und":
return (data, response_vars)
# None of the providers had data other than root; return from the first one
return child_providers[0].get_data(key, request_vars)
My proposal as currently approved has keys as enums instead of string URIs. If there's a reasonable trait definition that is type-safe with string URIs, then that's something we could consider changing in the draft. Just to rewind a bit, I want to make sure we're on the same page that this issue is not for general discussion of the data pipeline proposal. This thread is to answer the specific question of the Rust trait definition for the data provider. I would like to see some concrete options, such that we can discuss their pros and cons. |
Example keys don't tell me what the semantics are. I can't understand issues like how we deal gracefully with deprecated keys (if we can). I'm not questioning whether what's suggested will work now, I'm trying to understand what it feel like to be using this API in 5 years time after a bunch of things have changed. Do clients end up having a mix of versioned keys in their code? I.e. "use v2 or this key, but version 7 of that key because reasons". How does that feel to a maintainer of this client code when (if) keys have mix of versions scattered around? How many keys is a "module" of code going to use? If 1 or 2 it might be fine, but if 10 or 20 it might get unmaintainable. This is what I want to understand.
So what order does it try in? Are suppliers going to advertise some "precedence" for ordering? Do they have version numbers? Or, if multiple suppliers provide data for a key, must they always provide identical results?
You misunderstand, I don't mind what the key is in code, I'm proposing a thought experiment where you say "if we were using a URI, this is what it would be" because a URI has parts which have different semantics (scheme, path, anchor). Know what it would look-like as a URI lets you frame the key data in a way that more easily explains its semantics and properties like stability. E.g. a URI using an anchor has the property that "scheme://path#anchor" implies the existence of the resource "scheme://path", so if there's something in keys for getting "a bit of a data block", then it would be compatible with an anchor. A path is different, but it shows order scheme://foo/bar may imply the existence of scheme://foo (but need not) but it does say how "foo" relates to "bar". In you case, you talk about an optional "key" (e.g. locale ID). It that "path like" or "anchor like"? What about versioning? Would a URI be: scheme://version/path or scheme://path/version ?
I can't answer specifics until I understand the concepts better. An example, even working code, doesn't let me understand the intended semantics (which is what matters most in design). |
My intent is that the section Supported Keys in the explainer addresses this question. Versioning of various pieces of the pipeline (key version, CLDR version, etc) is something into which I put a lot of thought when writing the explainer.
The keys are not something that client code generally needs to see, unless it is implementing a custom data provider (e.g., to override a certain piece of data). OmnICU requests data for a key, and the provider responds with data. The client just passes through the request and response. When designing the data provider trait, a higher emphasis should be put on efficiency and type safety, and perhaps a bit less on ergonomics than usual, since the average OmnICU user won't be needing to call or implement it.
In this example, they're tried in the order in which the user listed them in the constructor. I can imagine other data provider implementations doing some kind of sorting on the list.
If there's a need, I think it would be very reasonable for a data provider to provide metadata about itself, like what version of CLDR it provides, etc.
Data is versioned. The data version is one of the response variables explained in the explainer. It's also worth pointing out that I was reminded in #19 that this API should be async. Since the data provider can be reading data from a file, service, etc., we want it done async. |
So, I think the passthrough here should be handled externally rather than internally. I.e. the provider is able to respond "don't know" (this could simply be a None result, though we may need more variants to distinguish between "don't know" and "bad query", and depending on how we handle async there may be another "ask again later" option where you need to run some async fetcher first). Whoever is using the provider can handle fallback, and we can also write a generic provider that can handle fallback between multiple existing providers. Given that, I'd probably have DataKey not contain any data, and instead have a DataValue or Data for the return type. This can also be typed: we can have |
With function generics, a DataProvider would have to implement
Can you elaborate? If I were writing this in C++ using casts, I would do something like, const void* get_data(DataKey key) { /* ... */ }
auto currency_data = static_cast<const CurrencyData*> get_data(DataKey::CURRENCY); So I want the return type to be linked to the argument type in some way, but I also want this to work such that data providers can pass around keys and data hunks opaquely without generics. |
No, you just write one implementation that's capable of dealing with any type that satisfies
The analog to this would be returning a |
Okay, so to make sure I understand, you're proposing something like this? use std::borrow::Cow;
enum Encoding {
Utf8,
Utf16,
}
struct Request<T2> {
// *Requested* LangID
// Note: use LangID once it is available
langid: String,
encoding: Encoding,
params: T2,
}
struct Response {
// *Supported* LangID
langid: String,
encoding: Encoding,
data_version: String,
}
trait DataProvider {
fn get_data<T1: std::clone::Clone, T2>(req: Request<T2>) -> (Cow<'static, T1>, Response);
} Unfortunately the following usage doesn't compile: struct ForkingDataProvider<'a> {
provider_1: &'a dyn DataProvider,
provider_2: &'a dyn DataProvider,
}
impl<'a> DataProvider for ForkingDataProvider<'a> {
fn get_data<T1: std::clone::Clone, T2>(&self, req: Request<T2>) -> (Cow<'static, T1>, Response) {
if req.langid == "de-CH" {
return self.provider_1.get_data(req);
} else {
return self.provider_2.get_data(req);
}
}
}
/*
error[E0038]: the trait `DataProvider` cannot be made into an object
--> src/main.rs:28:5
|
24 | fn get_data<T1: std::clone::Clone, T2>(&self, req: Request<T2>) -> (Cow<'static, T1>, Response);
| -------- method `get_data` has generic type parameters
...
28 | provider_1: &'a dyn DataProvider,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `DataProvider` cannot be made into an object
*/ |
Right, that's what I meant by "this is tricky to turn into a trait object".
No, your |
If you need |
Fixed by #61 |
In data-pipeline.md, I explain the need for a data provider trait, with an open question being exactly what the definition of this trait should be. The purpose of this thread is to reach a cohesive solution for that trait.
Requirements list (evolving):
The text was updated successfully, but these errors were encountered: