Rework library interface #2582

bakaq · 2024-09-30T03:39:49Z

This completely reworks the library interface, moving in the direction of #2490. This already has #![deny(missing_docs)], but I only wrote very basic documentation. This isn't intended to be a final version, and we probably want to adjust it a little¹, especially the naming. For now this is mostly about the interface: I haven't properly implemented a lot of it or even migrated the old tests yet, but I plan to do that soon. After we decide on a good direction for the APIs, implement most of it and migrate the tests, we can write proper documentation, new tests and examples².

I have a rendered version of the cargo doc output from this PR here so that it's easier to review the interface. I will try to keep it updated with the tip of this branch.

Some things I want to bring attention to:

PrologTerm³ (the old Value) uses OrderedFloat<f64>, Integer and Rational, which come from the ordered_float and dashu crates. This is a semver hazard, because we would need a breaking change every time one of those crates has a breaking change. It's especially hazardous because neither of them is 1.0 yet, which tends to mean "unstable" in the Rust ecosystem. I don't think this is too bad actually, as the cadence of major versions of both of those crates seems to be very slow. I also think it wouldn't be very wise to wrap a lot of dashu in Scryer Prolog to try to avoid this. On the other hand, I don't think OrderedFloat<f64> is necessary in PrologTerm. Using f64 instead would mean that we lose Eq, which isn't that bad.
StreamConfig and MachineConfig are now opaque and (kind of) use the builder pattern, which is a very common thing in the Rust ecosystem and means we can freely change the internal representation of these types (which I plan to do a lot in the future to enable some really cool stuff in Wasm).
I don't have a CompleteAnswer type. I don't really see a benefit of using it instead of just collecting QueryState into a Vec<LeafAnswer> or something like that. Please give any examples if you think of any.
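To make the Eq trade-off from the first point concrete, here is a minimal sketch. `SketchTerm` is a hypothetical, simplified stand-in and not the real PrologTerm. With a bare f64 field the type can derive PartialEq but not Eq, because NaN never compares equal to itself.

```rust
/// Hypothetical, simplified stand-in for a term type that stores a bare `f64`.
/// Adding `Eq` to the derive list would not compile: `f64` is only `PartialEq`.
#[derive(Debug, Clone, PartialEq)]
enum SketchTerm {
    Float(f64),
    Atom(String),
}

/// Returns whether a term compares equal to a clone of itself.
fn equals_itself(term: &SketchTerm) -> bool {
    let copy = term.clone();
    *term == copy
}

fn main() {
    // Ordinary floats and atoms behave as expected.
    println!("{}", equals_itself(&SketchTerm::Float(1.5)));
    println!("{}", equals_itself(&SketchTerm::Atom("a".to_string())));
    // A NaN float is the case that `Eq` would have ruled out.
    println!("{}", equals_itself(&SketchTerm::Float(f64::NAN)));
}
```

In practice this means such a type can still be compared with `==`, but it cannot be used where `Eq` is required, for example as a HashMap key.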

The following are some things I think I'm already going to do. If someone disagrees with any of them, please let me know!

I think that Machine::new_lib() should just be a Default implementation. Maybe also get rid of Machine::new() in favor of something like MachineConfig::build() to get the full benefits of the builder pattern.
Change MachineConfig::with_toplevel() to accept non-static strings, so that people can use a runtime generated toplevel without having to leak. This would need some deep changes probably, so I'm not sure if it's very simple to do.

@mthom @Skgland @triska @lucksus I would appreciate it if you took a look at this.

¹ For example, I would really like it if we left space in the API for "lazy" APIs that don't need to allocate. They would be especially useful for the C API.
² It will also unblock ISSUE-2464: exposing scryer prolog functionality in libscryer_prolog.so for client library consumption #2465, because I think the interface will be mostly stable after that so there will not be many conflicts.
³ I wanted to call it Term, but because Term already exists in the parser and we have wildcard imports everywhere there are a lot of conflicts that seem kind of complicated to fix. It seems that the rebis-dev branch gets rid of that type, so that's kind of exciting.

bakaq · 2024-10-12T23:28:31Z

Migrated the tests (except the "integration tests", which I already did in a34996a and will reintegrate into this branch soon), and benches, so the CI passes now.
Renamed MachineConfig to MachineBuilder and removed Machine::new() and similar to solidify the builder pattern. The builder pattern will make it very smooth to add arbitrary additional configuration in the future.
Removed some "extra" helper methods to make the API surface for this MVP (Minimum Viable Product) as small as possible. I believe the current interface is enough for almost all use cases, and future improvements would be to add helper methods for common patterns (like just checking if a query succeeds).
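As a rough illustration of why the builder pattern makes later configuration additions non-breaking, here is a minimal sketch. All names (`SketchMachineBuilder`, `SketchMachine`, `SketchStreams`, the `"default_toplevel"` placeholder) are hypothetical and do not mirror the actual types in this PR.

```rust
/// Hypothetical stream setting used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchStreams {
    Stdio,
    Memory,
}

/// Hypothetical stand-in for the machine that the builder produces.
#[derive(Debug)]
struct SketchMachine {
    streams: SketchStreams,
    toplevel: &'static str,
}

/// Hypothetical builder. Callers only go through methods, so new options
/// can be added later as new fields plus new `with_*` methods.
#[derive(Debug)]
struct SketchMachineBuilder {
    streams: SketchStreams,
    toplevel: &'static str,
}

impl Default for SketchMachineBuilder {
    /// Assumed defaults for the sketch: in-memory streams and a placeholder toplevel.
    fn default() -> Self {
        Self {
            streams: SketchStreams::Memory,
            toplevel: "default_toplevel",
        }
    }
}

impl SketchMachineBuilder {
    /// Overrides the stream setting.
    fn with_streams(mut self, streams: SketchStreams) -> Self {
        self.streams = streams;
        self
    }

    /// Overrides the toplevel program.
    fn with_toplevel(mut self, toplevel: &'static str) -> Self {
        self.toplevel = toplevel;
        self
    }

    /// Consumes the builder and produces the machine.
    fn build(self) -> SketchMachine {
        SketchMachine {
            streams: self.streams,
            toplevel: self.toplevel,
        }
    }
}

fn main() {
    let machine = SketchMachineBuilder::default()
        .with_streams(SketchStreams::Stdio)
        .with_toplevel("custom_toplevel")
        .build();
    println!("{:?}", machine);
}
```

Existing call sites like the one in `main` keep compiling when another `with_*` method is added, which is the property the rename is after.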

As always, documentation current to the tip of this branch is available here.

bakaq · 2024-10-13T00:52:16Z

Actually, the integration tests depend on JSON serialization. Should I integrate #2493 into this PR? I think it would be a better idea to merge this first and then rebase that one onto master, because I think JSON serialization still needs a lot of discussion that is mostly disconnected from the discussion that this PR needs.

bakaq · 2024-12-03T21:34:54Z

Reviewing this, I think the only thing I still want to change here is for <QueryState as Iterator>::Item to be just LeafAnswer instead of Result<LeafAnswer, String>, because LeafAnswer already has an Exception(Term) variant, or maybe Result<LeafAnswer, Term> to special-case errors from other exceptions (I think this may be better).
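A minimal sketch of what the `Result<LeafAnswer, Term>` item type would look like from the caller's side. The types here (`SketchTerm`, `SketchLeafAnswer`, `SketchQueryState`) are hypothetical stand-ins with a canned list of results, not the real implementation.

```rust
/// Hypothetical stand-in for a Prolog term, used here as the error payload.
#[derive(Debug, Clone, PartialEq)]
enum SketchTerm {
    Atom(String),
}

/// Hypothetical stand-in for a leaf answer.
#[derive(Debug, Clone, PartialEq)]
enum SketchLeafAnswer {
    True,
    False,
}

/// Hypothetical query state that just replays canned results.
struct SketchQueryState {
    pending: std::vec::IntoIter<Result<SketchLeafAnswer, SketchTerm>>,
}

impl SketchQueryState {
    fn new(results: Vec<Result<SketchLeafAnswer, SketchTerm>>) -> Self {
        Self {
            pending: results.into_iter(),
        }
    }
}

impl Iterator for SketchQueryState {
    // Errors carry a term instead of a `String`.
    type Item = Result<SketchLeafAnswer, SketchTerm>;

    fn next(&mut self) -> Option<Self::Item> {
        self.pending.next()
    }
}

/// Collects all answers, stopping at the first error.
fn collect_answers(state: SketchQueryState) -> Result<Vec<SketchLeafAnswer>, SketchTerm> {
    state.collect()
}

fn main() {
    let state = SketchQueryState::new(vec![
        Ok(SketchLeafAnswer::True),
        Err(SketchTerm::Atom("boom".to_string())),
    ]);
    println!("{:?}", collect_answers(state));
}
```

With a `Result` item, the standard `collect()` into `Result<Vec<_>, _>` short-circuits on the first error, which also covers the "just collect into a Vec" use case without a dedicated CompleteAnswer type.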

bakaq · 2024-12-09T00:36:12Z

Ok, with this I think it is good enough to stabilize. As always, the up-to-date rendered documentation is here.

Also, a reminder that this is just the API skeleton, although most of it is already implemented. This still doesn't have support for residual goals, for example, but it leaves space for them to be implemented later without a breaking API change.

Stabilizing the Rust API will unblock parallel ventures such as #2465 and improving the Wasm interface because we will all have a stable foundation to build upon. There's still work to be done here, but it will be almost completely independent of these other ventures.

Pinging relevant people @mthom @jjtolton @Skgland

Skgland

A few notes and suggestions for things we could do while we are at it.
Otherwise this looks good to me.

Skgland · 2024-12-10T18:35:34Z

src/machine/config.rs

+enum StreamConfigInner {
    Stdio,
    Memory,
 }


Suggested change

-enum StreamConfigInner {
-    Stdio,
-    Memory,
-}
+#[derive(Default)]
+enum StreamConfigInner {
+    Stdio,
+    #[default]
+    Memory,
+}

Skgland · 2024-12-10T18:36:36Z

src/machine/config.rs

+/// Describes how the streams of a [`Machine`](crate::Machine) will be handled.
+pub struct StreamConfig {
+    inner: StreamConfigInner,
+}
+
+impl Default for StreamConfig {
+    /// Defaults to using in-memory streams.
+    fn default() -> Self {
+        Self::in_memory()
+    }
+}


With the suggested changes for StreamConfigInner below, the Default impl could be derived.

Suggested change

-/// Describes how the streams of a [`Machine`](crate::Machine) will be handled.
-pub struct StreamConfig {
-    inner: StreamConfigInner,
-}
-
-impl Default for StreamConfig {
-    /// Defaults to using in-memory streams.
-    fn default() -> Self {
-        Self::in_memory()
-    }
-}
+/// Describes how the streams of a [`Machine`](crate::Machine) will be handled.
+#[derive(Default)]
+pub struct StreamConfig {
+    inner: StreamConfigInner,
+}
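A small self-contained check of the derive-based approach, assuming simplified stand-in types (`SketchStreamConfig` and `SketchStreamConfigInner` are hypothetical names): the `#[default]` variant attribute on the inner enum lets the outer struct derive `Default` and still default to in-memory streams.

```rust
/// Hypothetical inner enum; `#[default]` marks the variant that
/// `Default::default()` returns.
#[derive(Debug, Default, PartialEq, Eq)]
enum SketchStreamConfigInner {
    Stdio,
    #[default]
    Memory,
}

/// Hypothetical outer config; `Default` is derived from the field's `Default`.
#[derive(Debug, Default, PartialEq, Eq)]
struct SketchStreamConfig {
    inner: SketchStreamConfigInner,
}

impl SketchStreamConfig {
    /// Uses the process's standard streams.
    fn stdio() -> Self {
        Self {
            inner: SketchStreamConfigInner::Stdio,
        }
    }

    /// Uses in-memory streams.
    fn in_memory() -> Self {
        Self {
            inner: SketchStreamConfigInner::Memory,
        }
    }
}

fn main() {
    // The derived default matches the hand-written one it would replace.
    assert_eq!(SketchStreamConfig::default(), SketchStreamConfig::in_memory());
    assert_ne!(SketchStreamConfig::default(), SketchStreamConfig::stdio());
    println!("{:?}", SketchStreamConfig::default());
}
```

One trade-off of deriving is that the doc comment on the manual `default()` ("Defaults to using in-memory streams.") has no place to live, so that fact would need to move to the type-level docs.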


Skgland · 2024-12-10T19:11:26Z

src/machine/config.rs

+/// Describes how a [`Machine`](crate::Machine) will be configured.
+pub struct MachineBuilder {
+    pub(crate) streams: StreamConfig,
+    pub(crate) toplevel: &'static str,


Do we want to require the top level to be a &'static str?

I could see it being useful, e.g. to use a file loaded at runtime as the top level.
In that case we could use a Cow<'static, str> to allow either a &'static str or a String to be used.

We would need to adjust MachineBuilder::with_toplevel and Machine::load_top_level accordingly.

Machine::load_top_level would need to construct toplevel_stream depending on the Cow variant we have, either via Stream::from_static_string or via Stream::from_owned_string.

Maybe a Stream::from_string taking a Cow<'static, str> would make sense? (The name is debatable; other suggestions are from_cow_string or from_static_or_owned_string.)

I mentioned this above actually. I decided not to go this way because it seemed like it would need really deep changes to accommodate properly. Maybe just leaking for now, if it is too complicated, is a good way to get the interface right faster.

I would have expected this to be rather straightforward, but it's not unlikely that I missed something that makes this more difficult than is immediately obvious.

I see 4 options here:

keep the interface as is for now, making a breaking change later when changing to cow

keep the interface as is, later add another e.g. with_owned_top_level taking a cow or string, the current method can then just wrap the reference into a cow

rename the function to e.g. with_static_top_level, so we can add with_owned_top_level later (mirror the functions on Stream)

change to a cow now and leak in the owned case

It's much more likely that I overestimated how much work it would actually be. If it's as easy as you seem to imply I'll probably do it in this PR, although I'm not sure if Cow<'static, str> is the best type to put in the API here (it probably is, but I want to think about it a bit more).

As far as I can see the top_level string only ends up being used to create the toplevel_stream in Machine::load_top_level.

Unless something unbeknownst to me depends on this being a Stream created from a static string I would assume that all that needs to be changed is the plumbing (i.e. argument type to Machine::load_top_level and type of the toplevel field in MachineBuilder) and how the toplevel_stream is constructed i.e.

-let toplevel_stream = Stream::from_static_string(program, &mut self.machine_st.arena);
+let toplevel_stream = match program {
+    Cow::Borrowed(program) => Stream::from_static_string(program, &mut self.machine_st.arena),
+    Cow::Owned(program) => Stream::from_owned_string(program, &mut self.machine_st.arena)
+};

Maybe rather than having Cow in the API it would make sense to have with_static_top_level taking &'static str and with_owned_top_level taking String rather than having a single method that takes Cow or it could be a single method taking an impl Into<Cow<'static, str>>?
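For comparison, a minimal sketch of the single-method variant taking `impl Into<Cow<'static, str>>`. `SketchBuilder` and `describe_source` are hypothetical names for illustration; only the `Cow` conversions from the standard library are real.

```rust
use std::borrow::Cow;

/// Hypothetical builder holding the toplevel program as either
/// a borrowed static string or an owned `String`.
#[derive(Debug)]
struct SketchBuilder {
    toplevel: Cow<'static, str>,
}

impl SketchBuilder {
    fn new() -> Self {
        Self {
            toplevel: Cow::Borrowed(""),
        }
    }

    /// Accepts both `&'static str` and `String` through the standard
    /// `From` impls for `Cow<'static, str>`.
    fn with_toplevel(mut self, toplevel: impl Into<Cow<'static, str>>) -> Self {
        self.toplevel = toplevel.into();
        self
    }
}

/// Shows which branch a loader would take, mirroring the `match` on the
/// `Cow` variant sketched in the diff above.
fn describe_source(program: &Cow<'static, str>) -> &'static str {
    match program {
        Cow::Borrowed(_) => "static",
        Cow::Owned(_) => "owned",
    }
}

fn main() {
    let from_static = SketchBuilder::new().with_toplevel(":- initialization(main).");
    let runtime_program = format!(":- initialization({}).", "main");
    let from_owned = SketchBuilder::new().with_toplevel(runtime_program);
    println!("{}", describe_source(&from_static.toplevel));
    println!("{}", describe_source(&from_owned.toplevel));
}
```

The call sites stay the same for both kinds of input, at the cost of a slightly less obvious signature than two separately named methods.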

Skgland · 2024-12-10T19:19:34Z

src/machine/lib_machine/mod.rs

@@ -0,0 +1,632 @@
+use std::cmp::Ordering;


Why the change from lib_machine.rs to lib_machine/mod.rs?

My personal preference would be the former.
Depending on the editor I am using I either have a lot of indistinguishable mod.rs editor tabs or <mod_name>\mod.rs rather than <mod_name>.rs editor tabs (i.e. a longer title) which I find annoying to navigate.

Maybe we should enable either the clippy::mod_module_files or clippy::self_named_module_files lint to enforce a consistent style for the project?

I did this mostly to have the tests for the module in a different file, which makes compilation of tests much faster and helped me iterate. This doesn't necessarily need a module directory, but it makes it a bit more organized. I can change it back. Personally I like having the tests separated like this, but this would be a really big change for the whole project so I don't think it is appropriate to do it in this PR, if we decide to actually migrate.

Ah, sorry, I missed that this involves splitting lib_machine.rs into lib_machine/mod.rs and lib_machine/lib_machine_tests.rs. Not just renaming lib_machine.rs to lib_machine/mod.rs.
With mod.rs apparently being the common style of the project I think this is fine.
Though this brings me to a new point. Why does lib_machine_tests.rs repeat the parent module's name, i.e. why is it lib_machine/lib_machine_tests.rs rather than lib_machine/tests.rs?

I just forgot to rename it, it should indeed be lib_machine/tests.rs.

Skgland · 2024-12-10T19:24:15Z

src/machine/lib_machine/mod.rs

+    /// A leaf answer with bindings and residual goals.
+    LeafAnswer {
+        /// The bindings of variables in the query.
+        ///
+        /// Can be empty.
+        bindings: BTreeMap<String, Term>,
+        /// Residual goals.
+        ///
+        /// Can be empty.
+        residual_goals: Vec<Term>,
+    },


You mention in #2582 (comment) that residual goals are not yet implemented. Would it make sense to remove residual_goals here for now and mark the variant as #[non_exhaustive], rather than presumably always having an empty residual goal list, which might cause someone to believe residual goals are already working?

Suggested change

-/// A leaf answer with bindings and residual goals.
-LeafAnswer {
-    /// The bindings of variables in the query.
-    ///
-    /// Can be empty.
-    bindings: BTreeMap<String, Term>,
-    /// Residual goals.
-    ///
-    /// Can be empty.
-    residual_goals: Vec<Term>,
-},
+/// A leaf answer with bindings.
+///
+/// Might in the future also contain residual goals.
+#[non_exhaustive]
+LeafAnswer {
+    /// The bindings of variables in the query.
+    ///
+    /// Can be empty.
+    bindings: BTreeMap<String, Term>,
+},
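A minimal sketch of what `#[non_exhaustive]` on the variant means for users. `SketchAnswer` is a hypothetical stand-in with `String` values instead of terms. Inside the defining crate the attribute has no effect; downstream crates cannot build the variant with a struct literal and must match it with `..`, which is what keeps a later `residual_goals` field from being a breaking change.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the answer enum, with `String` values
/// in place of real terms.
#[derive(Debug)]
enum SketchAnswer {
    True,
    False,
    /// Downstream crates must match this variant with `..`.
    #[non_exhaustive]
    LeafAnswer { bindings: BTreeMap<String, String> },
}

/// Counts bindings using the `..` pattern that downstream code would
/// be required to write.
fn binding_count(answer: &SketchAnswer) -> usize {
    match answer {
        SketchAnswer::LeafAnswer { bindings, .. } => bindings.len(),
        SketchAnswer::True | SketchAnswer::False => 0,
    }
}

fn main() {
    let mut bindings = BTreeMap::new();
    bindings.insert("X".to_string(), "a".to_string());
    // Constructing the variant directly is only possible in the defining crate.
    let answer = SketchAnswer::LeafAnswer { bindings };
    println!("{}", binding_count(&answer));
    println!("{}", binding_count(&SketchAnswer::False));
}
```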

Interesting. I was getting pretty close to having residual goals implemented here, building on top of #2527. I was going to have that in this PR, but I decided that stabilizing the interface and unblocking the other fronts was higher priority.

I don't think this would stay unimplemented for too long, certainly not until the next release, and I also can't think of another field for LeafAnswer::LeafAnswer apart from bindings and residual_goals. Do you think it's worth it having this as #[non_exhaustive] for safety and then dropping it when (and if) the residual goals are implemented (if they are implemented before the next release and thus still in this major version bump)?

@Skgland I personally suggest limiting the scope for this. There are a lot of things waiting for this, i.e. (selfishly) #2465 and there are many interlocking PRs/issues.

@bakaq I don't know the level of effort required to implement RGs, but if it's significant I suggest limiting the scope here so we don't get bogged down. Especially if RGs can be positioned as an incremental improvement to this PR.

More small PRs >> less big PRs. That's my 🪙 🪙

I would be fine with this either way. It sounded to me like residual goals would potentially be further out, in which case I think it would have been nicer to make the change, but if residual goals are around the corner I think it's fine the way it is.

Does it make sense to distinguish bindings from other goals that are also reported? Potentially yes. On the other hand, why exactly?

On the other hand, why exactly?

Mostly for ease of use. Handling a list of (=)/2 goals is much clunkier than handling a dictionary, and that's the common case. Is that not a good enough reason? I've thought about that a lot and I think it is worth it.
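To illustrate the ergonomics argument, here is a minimal sketch comparing the two shapes. `SketchGoal` and both lookup functions are hypothetical, and values are plain strings rather than terms.

```rust
use std::collections::BTreeMap;

/// Hypothetical flat representation where bindings are just
/// `Var = Value` goals mixed in with other goals.
#[derive(Debug, Clone, PartialEq)]
enum SketchGoal {
    Unify(String, String),
    Other(String),
}

/// Finds a variable's value by scanning a list of goals.
fn lookup_in_goals<'a>(goals: &'a [SketchGoal], var: &str) -> Option<&'a str> {
    goals.iter().find_map(|goal| match goal {
        SketchGoal::Unify(name, value) if name.as_str() == var => Some(value.as_str()),
        _ => None,
    })
}

/// Finds a variable's value in a dictionary of bindings.
fn lookup_in_bindings<'a>(bindings: &'a BTreeMap<String, String>, var: &str) -> Option<&'a str> {
    bindings.get(var).map(String::as_str)
}

fn main() {
    let goals = vec![
        SketchGoal::Other("dif(A, B)".to_string()),
        SketchGoal::Unify("X".to_string(), "a".to_string()),
    ];
    let mut bindings = BTreeMap::new();
    bindings.insert("X".to_string(), "a".to_string());

    println!("{:?}", lookup_in_goals(&goals, "X"));
    println!("{:?}", lookup_in_bindings(&bindings, "X"));
}
```

Both return the same value, but the goal-list version forces every caller to write the filtering `match`, which is the clunkiness described above.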

Skgland · 2024-12-10T19:25:53Z

src/machine/lib_machine/mod.rs

+
+impl Drop for QueryState<'_> {
+    fn drop(&mut self) {
+        // This may be wrong if the iterator is not fully consumend, but from testing it seems


Should this have a //TODO or //FIXME comment until someone can check this for certain?

jjtolton · 2024-12-11T15:41:28Z

@bakaq should I rebase #2465 on top of this?

bakaq mentioned this pull request Sep 30, 2024

Tracking issue: Rust library interface overhaul #2490

Open

7 tasks

bakaq force-pushed the rework_library_interface branch from a5f87ec to 8b57b35 Compare October 12, 2024 22:29

bakaq mentioned this pull request Oct 13, 2024

Toplevel reimplementation with leaf answer callbacks #2527

Merged

bakaq force-pushed the rework_library_interface branch from 8b57b35 to 463d44a Compare October 13, 2024 19:28

bakaq added 22 commits December 8, 2024 20:18

Rename PrologTerm

8e7dc9d

Rename LeafAnswer

29fc55c

Machine and stream config rework

658e39a

Basic docs and non_exhaustive for PrologTerm

71dec62

Associated functions for creating PrologTerm

4480e7c

Conjunctions, disjunction, and LeafAnswer to PrologTerm

e74fd11

More PrologTerm documentation

0433706

LeafAnswer docs and success checking methods

1375f44

Docs for run_binary()

c13fc2d

Docs for Machine and QueryState

cb040da

Add interfaces for QueryState methods

dc8348b

Document test methods

e6cc408

#[deny(missing_docs)]

cab6173

Fix Machine links

79fbd9e

MachineBuilder

d336cbc

Remove parsed_results.rs

33d8abf

Rename PrologTerm to Term

bb5adba

Shrink MVP API surface

ec6286f

Separate lib_machine tests into separate file

2f82c78

Migrate tests to new API

e21c772

Migrate benches

3d3baee

Handle errors in QueryState

9265d66

bakaq force-pushed the rework_library_interface branch from 463d44a to 9265d66 Compare December 8, 2024 23:18

bakaq marked this pull request as ready for review December 9, 2024 00:26

Skgland reviewed Dec 10, 2024
