Rework library interface #2582
base: master
a5f87ec to 8b57b35
As always, documentation current to the tip of this branch is available here.
Actually, the integration tests depend on JSON serialization. Should I integrate #2493 into this PR? I think it would be a better idea to merge this first and then rebase that one onto master, because JSON serialization still needs a lot of discussion that is mostly disconnected from the discussion that this PR needs.
8b57b35 to 463d44a
Reviewing this, I think the only thing I still want to change here is for
463d44a to 9265d66
Ok, with this I think it is good enough to stabilize. As always, the up-to-date rendered documentation is here. Also, a reminder that this is just the API skeleton, although most of it is already implemented. This still doesn't have support for residual goals, for example, but it leaves space for them to be implemented later without an API-breaking change. Stabilizing the Rust API will unblock parallel ventures such as #2465 and improving the Wasm interface, because we will all have a stable foundation to build upon. There's still work to be done here, but it will be almost completely independent of these other ventures.
A few notes and suggestions for things we could do while we are at it.
Otherwise this looks good to me.
enum StreamConfigInner {
    Stdio,
    Memory,
}
- enum StreamConfigInner {
-     Stdio,
-     Memory,
- }
+ #[derive(Default)]
+ enum StreamConfigInner {
+     Stdio,
+     #[default]
+     Memory,
+ }
/// Describes how the streams of a [`Machine`](crate::Machine) will be handled.
pub struct StreamConfig {
    inner: StreamConfigInner,
}

impl Default for StreamConfig {
    /// Defaults to using in-memory streams.
    fn default() -> Self {
        Self::in_memory()
    }
}
With the suggested changes for `StreamConfigInner` below, the `Default` impl could be derived.
- /// Describes how the streams of a [`Machine`](crate::Machine) will be handled.
- pub struct StreamConfig {
-     inner: StreamConfigInner,
- }
- impl Default for StreamConfig {
-     /// Defaults to using in-memory streams.
-     fn default() -> Self {
-         Self::in_memory()
-     }
- }
+ /// Describes how the streams of a [`Machine`](crate::Machine) will be handled.
+ #[derive(Default)]
+ pub struct StreamConfig {
+     inner: StreamConfigInner,
+ }
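To make the effect of the two suggestions concrete, here is a minimal, self-contained sketch of the derived `Default` chain. It is an illustration under assumptions, not the PR's code: the `stdio()` constructor and the extra `Debug`/`PartialEq` derives are added here only so that the result can be compared.

```rust
/// Stand-in for the private enum from the suggestion.
/// `#[default]` marks the variant that the derived `Default` returns.
#[derive(Debug, Default, PartialEq)]
enum StreamConfigInner {
    Stdio,
    #[default]
    Memory,
}

/// Opaque configuration type. Its derived `Default` calls
/// `StreamConfigInner::default()`, which yields `Memory`.
#[derive(Debug, Default, PartialEq)]
pub struct StreamConfig {
    inner: StreamConfigInner,
}

impl StreamConfig {
    /// Hypothetical constructor (an assumption for this sketch).
    pub fn stdio() -> Self {
        StreamConfig { inner: StreamConfigInner::Stdio }
    }

    /// Constructor used by the hand-written `Default` impl in the PR.
    pub fn in_memory() -> Self {
        StreamConfig { inner: StreamConfigInner::Memory }
    }
}
```

With these definitions, `StreamConfig::default()` and `StreamConfig::in_memory()` produce the same value, so the hand-written impl becomes redundant.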
/// Describes how a [`Machine`](crate::Machine) will be configured.
pub struct MachineBuilder {
    pub(crate) streams: StreamConfig,
    pub(crate) toplevel: &'static str,
Do we want to require the top level to be a `&'static str`?
I could see it being useful to be able to use, e.g., a file loaded at runtime. In that case we could use a `Cow<'static, str>` to allow either a `&'static str` or a `String` to be used.
We would need to adjust `MachineBuilder::with_toplevel` and `Machine::load_top_level` accordingly.
`Machine::load_top_level` would need to construct `toplevel_stream` depending on the `Cow` variant we have, either via `Stream::from_static_string` or via `Stream::from_owned_string`.
Maybe a `Stream::from_string` taking a `Cow<'static, str>` would make sense? (The name is debatable; other suggestions are `from_cow_string` or `from_static_or_owned_string`.)
I mentioned this above, actually. I decided not to go this way because it seemed like it would need really deep changes to accommodate properly. If it is too complicated, maybe just leaking for now is a good idea, to get the interface right faster.
I would have expected this to be rather straightforward, but it's not unlikely that I missed something which makes this more difficult than is immediately obvious.
I see 4 options here:
- keep the interface as is for now, making a breaking change later when changing to `Cow`
- keep the interface as is, and later add another method, e.g. `with_owned_top_level`, taking a `Cow` or `String`; the current method can then just wrap the reference into a `Cow`
- rename the function to e.g. `with_static_top_level`, so we can add `with_owned_top_level` later (mirroring the functions on `Stream`)
- change to a `Cow` now and leak in the owned case
It's much more likely that I overestimated how much work it would actually be. If it's as easy as you seem to imply I'll probably do it in this PR, although I'm not sure if `Cow<'static, str>` is the best type to put in the API here (it probably is, but I want to think about it a bit more).
As far as I can see, the top_level string only ends up being used to create the `toplevel_stream` in `Machine::load_top_level`.
Unless something unbeknownst to me depends on this being a `Stream` created from a static string, I would assume that all that needs to be changed is the plumbing (i.e. the argument type of `Machine::load_top_level` and the type of the `toplevel` field in `MachineBuilder`) and how the `toplevel_stream` is constructed, i.e.
- let toplevel_stream = Stream::from_static_string(program, &mut self.machine_st.arena);
+ let toplevel_stream = match program {
+ Cow::Borrowed(program) => Stream::from_static_string(program, &mut self.machine_st.arena),
+ Cow::Owned(program) => Stream::from_owned_string(program, &mut self.machine_st.arena)
+ };
Maybe rather than having `Cow` in the API it would make sense to have `with_static_top_level` taking `&'static str` and `with_owned_top_level` taking `String`, rather than a single method that takes `Cow`. Or it could be a single method taking an `impl Into<Cow<'static, str>>`?
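As a rough sketch of the last option discussed here (a single method taking `impl Into<Cow<'static, str>>`), the following self-contained example uses a hypothetical stand-in for `Stream` that only records which constructor was chosen. The real constructors also take an arena argument, which is omitted; `MachineBuilder::new` and `toplevel_stream` are names invented for this illustration.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the real `Stream` type: it only records
/// which of the two constructors was used.
#[derive(Debug, PartialEq)]
pub enum Stream {
    Static(&'static str),
    Owned(String),
}

impl Stream {
    /// Simplified: the real function also takes an arena.
    pub fn from_static_string(s: &'static str) -> Self {
        Stream::Static(s)
    }

    /// Simplified: the real function also takes an arena.
    pub fn from_owned_string(s: String) -> Self {
        Stream::Owned(s)
    }
}

/// Simplified builder holding only the top-level program text.
pub struct MachineBuilder {
    toplevel: Cow<'static, str>,
}

impl MachineBuilder {
    /// Hypothetical constructor with an empty top level.
    pub fn new() -> Self {
        MachineBuilder { toplevel: Cow::Borrowed("") }
    }

    /// Accepts a `&'static str`, a `String`, or a `Cow<'static, str>`.
    pub fn with_toplevel(mut self, toplevel: impl Into<Cow<'static, str>>) -> Self {
        self.toplevel = toplevel.into();
        self
    }

    /// Mirrors the `match` proposed for `Machine::load_top_level`.
    pub fn toplevel_stream(self) -> Stream {
        match self.toplevel {
            Cow::Borrowed(program) => Stream::from_static_string(program),
            Cow::Owned(program) => Stream::from_owned_string(program),
        }
    }
}
```

A caller can then pass either a string literal or a `String` built at runtime to the same method, and no leaking is needed in the owned case.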
@@ -0,0 +1,632 @@
use std::cmp::Ordering;
Why the change from `lib_machine.rs` to `lib_machine/mod.rs`?
My personal preference would be the former.
Depending on the editor I am using, I either have a lot of indistinguishable `mod.rs` editor tabs or `<mod_name>\mod.rs` rather than `<mod_name>.rs` editor tabs (i.e. a longer title), which I find annoying to navigate.
Maybe we should enable either the `clippy::mod_module_files` or `clippy::self_named_module_files` lint to enforce a consistent style for the project?
I did this mostly to have the tests for the module in a different file, which makes compilation of tests much faster and helped me iterate. This doesn't necessarily need a module directory, but it makes it a bit more organized. I can change it back. Personally I like having the tests separated like this, but this would be a really big change for the whole project so I don't think it is appropriate to do it in this PR, if we decide to actually migrate.
Ah, sorry, I missed that this involves splitting `lib_machine.rs` into `lib_machine/mod.rs` and `lib_machine/lib_machine_tests.rs`, not just renaming `lib_machine.rs` to `lib_machine/mod.rs`.
With `mod.rs` apparently being the common style of the project, I think this is fine.
Though this brings me to a new point. Why does `lib_machine_tests.rs` repeat the parent module's name, i.e. why is it `lib_machine/lib_machine_tests.rs` rather than `lib_machine/tests.rs`?
I just forgot to rename it; it should indeed be `lib_machine/tests.rs`.
/// A leaf answer with bindings and residual goals.
LeafAnswer {
    /// The bindings of variables in the query.
    ///
    /// Can be empty.
    bindings: BTreeMap<String, Term>,
    /// Residual goals.
    ///
    /// Can be empty.
    residual_goals: Vec<Term>,
},
You mention in #2582 (comment) that residual goals are not yet implemented. Would it make sense to remove `residual_goals` here for now and mark the variant as `#[non_exhaustive]`, rather than presumably always having an empty residual goal list, which might cause someone to believe residual goals are already working?
- /// A leaf answer with bindings and residual goals.
- LeafAnswer {
-     /// The bindings of variables in the query.
-     ///
-     /// Can be empty.
-     bindings: BTreeMap<String, Term>,
-     /// Residual goals.
-     ///
-     /// Can be empty.
-     residual_goals: Vec<Term>,
- },
+ /// A leaf answer with bindings.
+ ///
+ /// Might in the future also contain residual goals.
+ #[non_exhaustive]
+ LeafAnswer {
+     /// The bindings of variables in the query.
+     ///
+     /// Can be empty.
+     bindings: BTreeMap<String, Term>,
+ },
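For readers unfamiliar with `#[non_exhaustive]` on a single variant, here is a minimal sketch of how the suggested shape could be declared and matched. The simplified `Term` type and the `True`/`False` variants are assumptions made for this illustration, not the PR's definitions. The attribute only restricts other crates (they cannot construct the variant with a struct literal and must match it with `..`); inside the defining crate it changes nothing.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified term type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Atom(String),
    Integer(i64),
}

/// Sketch of the answer type; `True` and `False` are assumed variants.
#[derive(Debug, Clone, PartialEq)]
pub enum LeafAnswer {
    True,
    False,
    /// A leaf answer with bindings. Fields may be added later without
    /// breaking downstream crates, because they must match with `..`.
    #[non_exhaustive]
    LeafAnswer { bindings: BTreeMap<String, Term> },
}

/// Counts the bindings of an answer. The `..` in the pattern is what
/// downstream code would be forced to write.
pub fn binding_count(answer: &LeafAnswer) -> usize {
    match answer {
        LeafAnswer::LeafAnswer { bindings, .. } => bindings.len(),
        _ => 0,
    }
}
```

Adding a `residual_goals` field later would then not break a `match` written this way.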
Interesting. I was getting pretty close to having residual goals implemented here, building on top of #2527. I was going to have that in this PR, but I decided that stabilizing the interface and unblocking the other fronts was higher priority.
I don't think this would stay unimplemented for too long, certainly not until the next release, and I also can't think of another field for `LeafAnswer::LeafAnswer` apart from `bindings` and `residual_goals`. Do you think it's worth having this as `#[non_exhaustive]` for safety and then dropping it when (and if) the residual goals are implemented (if they are implemented before the next release and thus still in this major version bump)?
@Skgland I personally suggest limiting the scope for this. There are a lot of things waiting for this, i.e. (selfishly) #2465 and there are many interlocking PRs/issues.
@bakaq I don't know the level of effort required to implement RGs, but if it's significant I suggest limiting the scope here so we don't get bogged down. Especially if RGs can be positioned as an incremental improvement to this PR.
More small PRs >> less big PRs. That's my 🪙 🪙
I would be fine with this either way. It sounded to me like residual goals would potentially be further out, in which case I think it would have been nicer to make the change, but if residual goals are around the corner I think it's fine the way it is.
Does it make sense to distinguish bindings from other goals that are also reported? Potentially yes. On the other hand, why exactly?
> On the other hand, why exactly?
Mostly for ease of use. Handling a list of `(=)/2` goals is much clunkier than handling a dictionary, and that's the common case. Is that not a good enough reason? I've thought about that a lot and I think it is worth it.
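A small sketch of the ergonomic difference being argued for, using a hypothetical simplified `Term` type (its variants and both helper functions are assumptions for illustration only). Looking a variable up in a map is one call, while a list of `(=)/2` goals has to be scanned and each goal destructured.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified term representation for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Integer(i64),
    Var(String),
    Compound(String, Vec<Term>),
}

/// Dictionary-style bindings: a direct lookup by variable name.
pub fn lookup_in_map<'a>(bindings: &'a BTreeMap<String, Term>, var: &str) -> Option<&'a Term> {
    bindings.get(var)
}

/// Goal-list style: find the first `Var = Value` goal for `var` by
/// checking the functor name, the arity, and the left-hand side.
pub fn lookup_in_goals<'a>(goals: &'a [Term], var: &str) -> Option<&'a Term> {
    goals.iter().find_map(|goal| match goal {
        Term::Compound(name, args) if name == "=" && args.len() == 2 => match &args[0] {
            Term::Var(v) if v == var => Some(&args[1]),
            _ => None,
        },
        _ => None,
    })
}
```

Both functions return the same value for a simple binding, but the second has to know how equality goals are encoded.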

impl Drop for QueryState<'_> {
    fn drop(&mut self) {
        // This may be wrong if the iterator is not fully consumed, but from testing it seems
Should this have a `//TODO` or `//FIXME` comment until someone can check this for certain?
This completely reworks the library interface, moving in the direction of #2490. This already has `#![deny(missing_docs)]`, but I just did very basic documentation. This isn't intended to be a final version, and we probably want to adjust it a little [1], especially in the naming. For now this is mostly about the interface, and I haven't properly implemented a lot of it or even migrated the old tests yet, but I plan to do that soon. After we decide on a good direction for the APIs, implement most of it and migrate the tests, we can write proper documentation, new tests and examples [2].
I have a rendered version of the `cargo doc` output from this PR here so that it's easier to review the interface. I will try to keep it updated with the tip of this branch.
Some things I want to bring attention to:
- `PrologTerm` [3] (the old `Value`) uses `OrderedFloat<f64>`, `Integer` and `Rational`, which are from the `ordered_float` and `dashu` crates. This is a semver hazard, because we would need a breaking change every time one of those crates has a breaking change. It's especially hazardous because neither of them is `1.0` yet, which tends to mean "unstable" in the Rust ecosystem. I don't think this is too bad actually, as the cadence of major versions of both of those crates seems to be very slow. I also think it wouldn't be very wise to wrap a lot of `dashu` in Scryer Prolog to try to avoid this. On the other hand, I don't think `OrderedFloat<f64>` is necessary in `PrologTerm`. Using `f64` instead would mean that we lose `Eq`, which isn't that bad.
- `StreamConfig` and `MachineConfig` are now opaque and (kind of) use the builder pattern, which is a very common thing in the Rust ecosystem and means we can freely change the internal representation of these types (which I plan to do a lot in the future to enable some really cool stuff in Wasm).
- `CompleteAnswer` type. I don't really see a benefit of using it instead of just collecting `QueryState` into a `Vec<LeafAnswer>` or something like that. Please give examples if you think of any.
The following are some things I think I'm already going to do. If someone disagrees with any of them, please let me know!
- `Machine::new_lib()` should just be a `Default` implementation. Maybe also get rid of `Machine::new()` in favor of something like `MachineConfig::build()` to get the full benefits of the builder pattern.
- `MachineConfig::with_toplevel()` to accept non-static strings, so that people can use a runtime-generated toplevel without having to leak. This would probably need some deep changes, so I'm not sure if it's very simple to do.
@mthom @Skgland @triska @lucksus I would appreciate it if you took a look at this.
Footnotes
1. For example, I would really like it if we left space in the API for "lazy" APIs that don't need to allocate. They would be especially useful for the C API.
2. It will also unblock ISSUE-2464: exposing scryer prolog functionality in libscryer_prolog.so for client library consumption #2465, because I think the interface will be mostly stable after that so there will not be many conflicts.
3. I wanted to call it `Term`, but because `Term` already exists in the parser and we have wildcard imports everywhere, there are a lot of conflicts that seem kind of complicated to fix. It seems that the `rebis-dev` branch gets rid of that type, so that's kind of exciting.