Rework library interface #2582

Open
wants to merge 22 commits into
base: master

Contributor

@bakaq bakaq commented Sep 30, 2024

This completely reworks the library interface, moving in the direction of #2490. This already has #![deny(missing_docs)], but I just did very basic documentation. This isn't intended to be a final version, and we probably want to adjust it a little [1], especially in the naming. For now this is mostly about the interface and I haven't properly implemented a lot of it or even migrated the old tests yet, but I plan to do that soon. After we decide on a good direction for the APIs, implement most of it and migrate the tests, we can write proper documentation, new tests and examples [2].

I have a rendered version of the cargo doc output from this PR here so that it's easier to review the interface. I will try to keep it updated with the tip of this branch.

Some things I want to bring attention to:

  • PrologTerm [3] (the old Value) uses OrderedFloat<f64>, Integer and Rational, which are from the ordered_float and dashu crates. This is a semver hazard, because we would need a breaking change every time one of those crates has a breaking change. It's especially hazardous because neither of them is 1.0 yet, which tends to mean "unstable" in the Rust ecosystem. I don't think this is too bad actually, as the cadence of major versions of both of those crates seems to be very slow. I also think it wouldn't be very wise to wrap a lot of dashu in Scryer Prolog to try to avoid this. On the other hand, I don't think OrderedFloat<f64> is necessary in PrologTerm. Using f64 instead would mean that we lose Eq, which isn't that bad.
  • StreamConfig and MachineConfig are now opaque and (kind of) use the builder pattern, which is a very common thing in the Rust ecosystem and means we can freely change the internal representation of these types (which I plan to do a lot in the future to enable some really cool stuff in Wasm).
  • I don't have a CompleteAnswer type. I don't really see a benefit of using it instead of just collecting QueryState into a Vec<LeafAnswer> or something like that. Please give any examples if you think of any.
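A minimal sketch of the opaque, builder-style configuration described in the list above. The names `StreamConfig::stdio`, `with_streams`, `build`, and the stand-in `Machine` struct with its fields are assumptions made for illustration; only `StreamConfig`, `MachineConfig` and `with_toplevel` are named in this discussion, and the real types may look different.

```rust
/// Private representation. Callers never see these variants, so they can
/// change without a breaking change to the public API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StreamConfigInner {
    Stdio,
    Memory,
}

/// Opaque public handle around the private representation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StreamConfig {
    inner: StreamConfigInner,
}

impl StreamConfig {
    /// Hypothetical constructor: use the process's standard streams.
    pub fn stdio() -> Self {
        Self { inner: StreamConfigInner::Stdio }
    }

    /// Use in-memory streams.
    pub fn in_memory() -> Self {
        Self { inner: StreamConfigInner::Memory }
    }
}

/// Builder-style configuration with private fields.
pub struct MachineConfig {
    streams: StreamConfig,
    toplevel: &'static str,
}

impl Default for MachineConfig {
    fn default() -> Self {
        Self { streams: StreamConfig::in_memory(), toplevel: "" }
    }
}

impl MachineConfig {
    /// Hypothetical setter: consumes and returns the builder so calls chain.
    pub fn with_streams(mut self, streams: StreamConfig) -> Self {
        self.streams = streams;
        self
    }

    pub fn with_toplevel(mut self, toplevel: &'static str) -> Self {
        self.toplevel = toplevel;
        self
    }

    /// Hypothetical finalizer producing the machine.
    pub fn build(self) -> Machine {
        Machine {
            uses_stdio: self.streams.inner == StreamConfigInner::Stdio,
            toplevel: self.toplevel,
        }
    }
}

/// Stand-in for the real machine; it only records the chosen configuration.
pub struct Machine {
    pub uses_stdio: bool,
    pub toplevel: &'static str,
}
```

Because every field is private, new options could later be added as further `with_*` methods without breaking callers that chain `MachineConfig::default().with_streams(...).build()`.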

The following are some things I think I'm already going to do. If someone disagrees with any of them, please let me know!

  • I think that Machine::new_lib() should just be a Default implementation. Maybe also get rid of Machine::new() in favor of something like MachineConfig::build() to get the full benefits of the builder pattern.
  • Change MachineConfig::with_toplevel() to accept non-static strings, so that people can use a runtime generated toplevel without having to leak. This would need some deep changes probably, so I'm not sure if it's very simple to do.
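For reference, "having to leak" with a `&'static str` parameter would look roughly like the sketch below. The helper names `leak_toplevel` and `generated_toplevel` are hypothetical; `Box::leak` and `String::into_boxed_str` are standard library functions.

```rust
/// Hypothetical helper: turn a runtime-generated toplevel into a
/// `&'static str` by deliberately leaking its allocation.
/// The memory is never freed for the rest of the process lifetime.
pub fn leak_toplevel(source: String) -> &'static str {
    Box::leak(source.into_boxed_str())
}

/// Hypothetical example of a toplevel assembled at runtime.
pub fn generated_toplevel(goal: &str) -> &'static str {
    leak_toplevel(format!(":- initialization({}).", goal))
}
```

Each call leaks one allocation, which is acceptable for a handful of machines but is exactly the cost that accepting non-static strings would avoid.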

@mthom @Skgland @triska @lucksus I would appreciate it if you took a look at this.

Footnotes

  1. For example, I would really like it if we left space in the API for "lazy" APIs that don't need to allocate. They would be especially useful for the C API.

  2. It will also unblock ISSUE-2464: exposing scryer prolog functionality in libscryer_prolog.so for client library consumption #2465, because I think the interface will be mostly stable after that so there will not be many conflicts.

  3. I wanted to call it Term, but because Term already exists in the parser and we have wildcard imports everywhere there are a lot of conflicts that seem kind of complicated to fix. It seems that the rebis-dev branch gets rid of that type, so that's kind of exciting.

Contributor Author

bakaq commented Oct 12, 2024

  • Migrated the tests (except the "integration tests", which I already did in a34996a and will reintegrate into this branch soon), and benches, so the CI passes now.
  • Renamed MachineConfig to MachineBuilder and removed Machine::new() and similar to solidify the builder pattern. The builder pattern will make it very smooth to add arbitrary additional configuration in the future.
  • Removed some "extra" helper methods to make the API surface for this MVP (Minimum Viable Product) as small as possible. I believe the current interface is enough for almost all use cases, and future improvements would be to add helper methods for common patterns (like just checking if a query succeeds).

As always, documentation current to the tip of this branch is available here.

Contributor Author

bakaq commented Oct 13, 2024

Actually, the integration tests depend on JSON serialization. Should I integrate #2493 into this PR? I think it would be a better idea to merge this first and then rebase that one onto master, because I think JSON serialization still needs a lot of discussion that is mostly disconnected from the discussion that this PR needs.

Contributor Author

bakaq commented Dec 3, 2024

Reviewing this, I think the only thing I still want to change here is for <QueryState as Iterator>::Item to be just LeafAnswer instead of Result<LeafAnswer, String>, because LeafAnswer already has an Exception(Term) variant, or maybe Result<LeafAnswer, Term> to special-case errors from other exceptions (I think this may be better).
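A rough sketch of what the `Result<LeafAnswer, Term>` item type would mean for callers. `Term`, the `True` variant, `Answers` (a stand-in for `QueryState` that yields canned results) and the helper functions are all simplified assumptions, not the PR's actual definitions.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a Prolog term.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Atom(String),
    Integer(i64),
}

/// Simplified stand-in for the answer type.
#[derive(Debug, Clone, PartialEq)]
pub enum LeafAnswer {
    /// Assumed variant: the query succeeded without bindings.
    True,
    /// An answer with variable bindings.
    LeafAnswer { bindings: BTreeMap<String, Term> },
}

/// Stand-in for `QueryState`: iterates over pre-computed results.
pub struct Answers {
    pending: std::vec::IntoIter<Result<LeafAnswer, Term>>,
}

impl Iterator for Answers {
    // An exception is reported as `Err(term)` instead of as a variant.
    type Item = Result<LeafAnswer, Term>;

    fn next(&mut self) -> Option<Self::Item> {
        self.pending.next()
    }
}

/// Hypothetical constructor for the stand-in iterator.
pub fn answers(items: Vec<Result<LeafAnswer, Term>>) -> Answers {
    Answers { pending: items.into_iter() }
}

/// Collects all answers, stopping at the first exception.
pub fn collect_all(query: Answers) -> Result<Vec<LeafAnswer>, Term> {
    query.collect()
}
```

With this shape, `collect()` into `Result<Vec<_>, _>` and the `?` operator work directly on exceptions, which is one argument for keeping the `Result` wrapper.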

@bakaq bakaq force-pushed the rework_library_interface branch from 463d44a to 9265d66 Compare December 8, 2024 23:18
@bakaq bakaq marked this pull request as ready for review December 9, 2024 00:26
Contributor Author

bakaq commented Dec 9, 2024

Ok, with this I think it is good enough to stabilize. As always, the up-to-date rendered documentation is here.

Also, a reminder that this is just the API skeleton, although most of it is already implemented. This still doesn't have support for residual goals, for example, but it leaves space for them to be implemented later without an API breaking change.

Stabilizing the Rust API will unblock parallel ventures such as #2465 and improving the Wasm interface because we will all have a stable foundation to build upon. There's still work to be done here, but it will be almost completely independent of these other ventures.

Pinging relevant people @mthom @jjtolton @Skgland

Contributor

@Skgland Skgland left a comment

A few notes and suggestions for things we could do while we are at it.
Otherwise this looks good to me.

Comment on lines +40 to 43
enum StreamConfigInner {
    Stdio,
    Memory,
}
Contributor

Suggested change
- enum StreamConfigInner {
-     Stdio,
-     Memory,
- }
+ #[derive(Default)]
+ enum StreamConfigInner {
+     Stdio,
+     #[default]
+     Memory,
+ }

Comment on lines +10 to +20
/// Describes how the streams of a [`Machine`](crate::Machine) will be handled.
pub struct StreamConfig {
    inner: StreamConfigInner,
}

impl Default for StreamConfig {
    /// Defaults to using in-memory streams.
    fn default() -> Self {
        Self::in_memory()
    }
}
Contributor

With the suggested changes for StreamConfigInner below, the Default impl could be derived.

Suggested change
- /// Describes how the streams of a [`Machine`](crate::Machine) will be handled.
- pub struct StreamConfig {
-     inner: StreamConfigInner,
- }
- impl Default for StreamConfig {
-     /// Defaults to using in-memory streams.
-     fn default() -> Self {
-         Self::in_memory()
-     }
- }
+ /// Describes how the streams of a [`Machine`](crate::Machine) will be handled.
+ #[derive(Default)]
+ pub struct StreamConfig {
+     inner: StreamConfigInner,
+ }
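A self-contained sketch checking that the two derives together give the same default as the hand-written impl. The `#[default]` attribute on a unit variant is stable since Rust 1.62; the `stdio` constructor and the extra `Debug`/`PartialEq` derives are assumptions added only so the result can be compared.

```rust
/// Private representation; `#[default]` marks the variant that the
/// derived `Default` impl returns.
#[derive(Debug, Default, PartialEq)]
enum StreamConfigInner {
    Stdio,
    #[default]
    Memory,
}

/// Describes how the streams of a machine will be handled.
/// `Default` is derived and delegates to `StreamConfigInner::default()`.
#[derive(Debug, Default, PartialEq)]
pub struct StreamConfig {
    inner: StreamConfigInner,
}

impl StreamConfig {
    /// In-memory streams.
    pub fn in_memory() -> Self {
        Self { inner: StreamConfigInner::Memory }
    }

    /// Hypothetical constructor for standard streams.
    pub fn stdio() -> Self {
        Self { inner: StreamConfigInner::Stdio }
    }
}
```

One trade-off of deriving is that the doc comment "Defaults to using in-memory streams" no longer has an impl to live on, so it would have to move to the type-level documentation.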

src/machine/config.rs
/// Describes how a [`Machine`](crate::Machine) will be configured.
pub struct MachineBuilder {
    pub(crate) streams: StreamConfig,
    pub(crate) toplevel: &'static str,
Contributor

@Skgland Skgland Dec 10, 2024

Do we want to require the top level to be a &'static str?

I could see it being useful, e.g., to use a file loaded at runtime as the top level.
In that case we could use a Cow<'static, str> to allow either a &'static str or a String to be used.

We would need to adjust MachineBuilder::with_toplevel and Machine::load_top_level accordingly.

Machine::load_top_level would need to construct toplevel_stream depending on the Cow variant we have, either via Stream::from_static_string or via Stream::from_owned_string.

Maybe a Stream::from_string taking a Cow<'static, str> would make sense? (The name is debatable; other suggestions are from_cow_string or from_static_or_owned_string.)

Contributor Author

I mentioned this above actually. I decided against going this way because it seemed like it would need really deep changes to accommodate properly. Maybe just leaking for now, if it is too complicated, is a good idea to get the interface right faster.

Contributor

I would have expected this to be rather straightforward, but it's not unlikely that I missed something which makes this more difficult than is immediately obvious.

I see 4 options here:

  • keep the interface as is for now, making a breaking change later when changing to cow
  • keep the interface as is, later add another e.g. with_owned_top_level taking a cow or string, the current method can then just wrap the reference into a cow
  • rename the function to e.g. with_static_top_level, so we can add with_owned_top_level later (mirror the functions on Stream)
  • change to a cow now and leak in the owned case

Contributor Author

It's much more likely that I overestimated how much work it would actually be. If it's as easy as you seem to imply I'll probably do it in this PR, although I'm not sure if Cow<'static, str> is the best type to put in the API here (it probably is, but I want to think about it a bit more).

Contributor

As far as I can see the top_level string only ends up being used to create the toplevel_stream in Machine::load_top_level.

Unless something unbeknownst to me depends on this being a Stream created from a static string I would assume that all that needs to be changed is the plumbing (i.e. argument type to Machine::load_top_level and type of the toplevel field in MachineBuilder) and how the toplevel_stream is constructed i.e.

-         let toplevel_stream = Stream::from_static_string(program, &mut self.machine_st.arena);
+         let toplevel_stream = match program {
+             Cow::Borrowed(program) => Stream::from_static_string(program, &mut self.machine_st.arena),
+             Cow::Owned(program) => Stream::from_owned_string(program, &mut self.machine_st.arena)
+         };

Contributor

Maybe, rather than having Cow in the API, it would make sense to have with_static_top_level taking &'static str and with_owned_top_level taking String, instead of a single method that takes Cow. Or it could be a single method taking an impl Into<Cow<'static, str>>?
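A minimal sketch of the `impl Into<Cow<'static, str>>` option. The standard library provides `From<&'a str>` and `From<String>` for `Cow<'a, str>`, so one method accepts both. The `MachineBuilder` here is a reduced stand-in, and `new`, `toplevel` and `toplevel_is_owned` are hypothetical helpers added for illustration.

```rust
use std::borrow::Cow;

/// Reduced stand-in for the builder, holding only the toplevel source.
pub struct MachineBuilder {
    toplevel: Cow<'static, str>,
}

impl MachineBuilder {
    /// Hypothetical constructor with an empty toplevel.
    pub fn new() -> Self {
        Self { toplevel: Cow::Borrowed("") }
    }

    /// Accepts a `&'static str` (stored borrowed, no allocation) or a
    /// `String` (stored owned, no leak), without `Cow` in the call site.
    pub fn with_toplevel(mut self, toplevel: impl Into<Cow<'static, str>>) -> Self {
        self.toplevel = toplevel.into();
        self
    }

    /// Returns the toplevel source text.
    pub fn toplevel(&self) -> &str {
        &self.toplevel
    }

    /// Reports whether the toplevel was supplied as an owned `String`.
    pub fn toplevel_is_owned(&self) -> bool {
        matches!(self.toplevel, Cow::Owned(_))
    }
}
```

Callers would write `with_toplevel("...")` or `with_toplevel(source_string)` and never name `Cow`, while the machine can still dispatch on `Cow::Borrowed` versus `Cow::Owned` when creating the stream.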

@@ -0,0 +1,632 @@
use std::cmp::Ordering;
Contributor

Why the change from lib_machine.rs to lib_machine/mod.rs?

My personal preference would be the former.
Depending on the editor I am using I either have a lot of indistinguishable mod.rs editor tabs or <mod_name>\mod.rs rather than <mod_name>.rs editor tabs (i.e. a longer title) which I find annoying to navigate.

Contributor

Maybe we should enable either the clippy::mod_module_files or clippy::self_named_module_files lint to enforce a consistent style for the project?

Contributor Author

I did this mostly to have the tests for the module in a different file, which makes compilation of tests much faster and helped me iterate. This doesn't necessarily need a module directory, but it makes it a bit more organized. I can change it back. Personally I like having the tests separated like this, but this would be a really big change for the whole project so I don't think it is appropriate to do it in this PR, if we decide to actually migrate.

Contributor

Ah, sorry, I missed that this involves splitting lib_machine.rs into lib_machine/mod.rs and lib_machine/lib_machine_tests.rs, not just renaming lib_machine.rs to lib_machine/mod.rs.
With mod.rs apparently being the common style of the project, I think this is fine.
Though this brings me to a new point. Why does lib_machine_tests.rs repeat the parent module's name, i.e. why is it lib_machine/lib_machine_tests.rs rather than lib_machine/tests.rs?

Contributor Author

I just forgot to rename it; it should indeed be lib_machine/tests.rs.

Comment on lines +34 to +44
    /// A leaf answer with bindings and residual goals.
    LeafAnswer {
        /// The bindings of variables in the query.
        ///
        /// Can be empty.
        bindings: BTreeMap<String, Term>,
        /// Residual goals.
        ///
        /// Can be empty.
        residual_goals: Vec<Term>,
    },
Contributor

@Skgland Skgland Dec 10, 2024

You mention in #2582 (comment) that residual goals are not yet implemented. Would it make sense to remove residual_goals here for now and mark the variant as #[non_exhaustive], rather than presumably always having an empty residual goal list, which might cause someone to believe residual goals are already working?

Suggested change
-     /// A leaf answer with bindings and residual goals.
-     LeafAnswer {
-         /// The bindings of variables in the query.
-         ///
-         /// Can be empty.
-         bindings: BTreeMap<String, Term>,
-         /// Residual goals.
-         ///
-         /// Can be empty.
-         residual_goals: Vec<Term>,
-     },
+     /// A leaf answer with bindings.
+     ///
+     /// Might in the future also contain residual goals.
+     #[non_exhaustive]
+     LeafAnswer {
+         /// The bindings of variables in the query.
+         ///
+         /// Can be empty.
+         bindings: BTreeMap<String, Term>,
+     },
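A small sketch of how a `#[non_exhaustive]` variant behaves for users of the type. `Term` is a simplified stand-in, and `from_bindings` and `binding_count` are hypothetical helpers. Note that the attribute only restricts code in other crates; inside the defining crate the variant can still be constructed and matched exhaustively.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a Prolog term.
#[derive(Debug, Clone, PartialEq)]
pub struct Term(pub String);

pub enum LeafAnswer {
    /// A leaf answer with bindings.
    ///
    /// Downstream crates cannot build this variant with a struct
    /// expression and must match it with `..`, so adding a field such as
    /// `residual_goals` later would not break them.
    #[non_exhaustive]
    LeafAnswer {
        bindings: BTreeMap<String, Term>,
    },
    /// An exception was thrown.
    Exception(Term),
}

impl LeafAnswer {
    /// Hypothetical constructor, needed by downstream crates because the
    /// variant cannot be constructed directly outside this crate.
    pub fn from_bindings(bindings: BTreeMap<String, Term>) -> Self {
        LeafAnswer::LeafAnswer { bindings }
    }
}

/// Counts bindings using the `..` pattern that downstream code must use.
pub fn binding_count(answer: &LeafAnswer) -> usize {
    match answer {
        LeafAnswer::LeafAnswer { bindings, .. } => bindings.len(),
        LeafAnswer::Exception(_) => 0,
    }
}
```

Dropping the attribute later, once the field set is considered final, is not a breaking change, whereas adding it after a release would be.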

Contributor Author

Interesting. I was getting pretty close to having residual goals implemented here, building on top of #2527. I was going to have that in this PR, but I decided that stabilizing the interface and unblocking the other fronts was higher priority.

I don't think this would stay unimplemented for too long, certainly not until the next release, and I also can't think of another field for LeafAnswer::LeafAnswer apart from bindings and residual_goals. Do you think it's worth it having this as #[non_exhaustive] for safety and then dropping it when (and if) the residual goals are implemented (if they are implemented before the next release and thus still in this major version bump)?

@Skgland I personally suggest limiting the scope for this. There are a lot of things waiting for this, i.e. (selfishly) #2465 and there are many interlocking PRs/issues.

@bakaq I don't know the level of effort required to implement RGs, but if it's significant I suggest limiting the scope here so we don't get bogged down. Especially if RGs can be positioned as an incremental improvement to this PR.

More small PRs >> less big PRs. That's my 🪙 🪙

Contributor

I would be fine with this either way. It sounded to me like residual goals would potentially be further out, in which case I think it would have been nicer to make the change, but if residual goals are around the corner I think it's fine the way it is.

Contributor

Does it make sense to distinguish bindings from other goals that are also reported? Potentially yes. On the other hand, why exactly?

Contributor Author

On the other hand, why exactly?

Mostly for ease of use. Handling a list of (=)/2 goals is much clunkier than handling a dictionary, and that's the common case. Is that not a good enough reason? I've thought about that a lot and I think it is worth it.
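To make the ergonomics argument concrete, here is a sketch comparing the two shapes. The `Term` enum and both lookup functions are hypothetical simplifications, not the PR's types.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a Prolog term.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Atom(String),
    Var(String),
    Compound(String, Vec<Term>),
}

/// Lookup when an answer is a flat list of goals such as `X = a`:
/// every goal has to be inspected and destructured as `(=)/2`.
pub fn lookup_in_goals<'a>(goals: &'a [Term], var: &str) -> Option<&'a Term> {
    goals.iter().find_map(|goal| match goal {
        Term::Compound(name, args) if name == "=" && args.len() == 2 => match &args[0] {
            Term::Var(v) if v == var => Some(&args[1]),
            _ => None,
        },
        _ => None,
    })
}

/// Lookup when bindings are kept in a dictionary: a single call.
pub fn lookup_in_bindings<'a>(
    bindings: &'a BTreeMap<String, Term>,
    var: &str,
) -> Option<&'a Term> {
    bindings.get(var)
}
```

The goal-list version also has to decide what to do with goals that are not of the form `Var = Term`, which is where residual goals would naturally live.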


impl Drop for QueryState<'_> {
    fn drop(&mut self) {
        // This may be wrong if the iterator is not fully consumed, but from testing it seems
Contributor

Should this have a //TODO or //FIXME comment until someone can check this for certain?

@jjtolton
@bakaq should I rebase #2465 on top of this?
