Introduce polymorphic schemas #445

Closed
wants to merge 13 commits into from
35 changes: 35 additions & 0 deletions README.md
@@ -290,6 +290,41 @@ You can also write sequence schemas that expect particular values in specific po
;; (not (instance? java.lang.Number "4"))]
```

### Polymorphic schemas

Macros such as `s/defn` can define functions with polymorphic schemas. At runtime, they will be checked
by expanding polymorphic variables to their most general values. For example, at runtime `identity-mono`
and `identity-poly` are instrumented in the same way:

```clojure
(s/defn identity-mono :- s/Any
[x :- s/Any]
x)

(s/defn :all [T]
identity-poly :- T
[x :- T]
x)
```

The actual value chosen as the "most general" depends on the polymorphic variable's kind and should not be
Member

Sorry, I don't understand from here down. Are you basing this design off an existing language or library that I can read about to get a better understanding? (I'm pretty familiar with Haskell's and Scala's type systems but less so with e.g. dependent types).

Contributor Author

Yes, there's a lot going on here and I prematurely jumped into too many details. I wrote about some of it here, please take a look.

I will work on communicating better in the docs.

Contributor Author

Made a bunch of readability improvements to the gist.

Member

Cool, thanks -- that's very helpful context, and I think I understand it now.

I guess my initial reaction is that it's interesting, but I'm unsure about whether the utility of dotted types (and the corresponding generative testing stuff) pays for its complexity. I generally try to err on the side of keeping things simple and adding complexity only where it's clear there's high value for the user, and in this case I'm unsure both about how many people would use this functionality, and how many bugs would actually be caught by the testing. What are your thoughts about these questions?

I'll also mull it over and read some more. Not saying I'm necessarily opposed, but just want to make sure I understand the tradeoffs. Not sure if anyone else here has an opinion?

Contributor Author

@frenchy64, Aug 15, 2022

Thanks, I know you're focusing on the utility of the dotted variable functionality, but I'll try to step back even further and give my feeling on all the features we've discussed.

> how many people would use this functionality

It's a good question. I'm drawing on several data points.

The inspiration for the "just check a function against its schema" operation comes from clojure.spec, which exposes something similar called clojure.spec.test.alpha/check which will automatically generate test cases for your function based on its spec. This is popular functionality in private code IME and seemingly in open source projects on github.
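To make the comparison concrete, here is a minimal sketch of the `clojure.spec.test.alpha/check` workflow described above. The function `clamp` and its spec are hypothetical, invented only for this illustration, and `org.clojure/test.check` is assumed to be on the classpath.

```clojure
(require '[clojure.spec.alpha :as spec]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical function under test (not part of schema or this PR).
(defn clamp
  "Restricts x to the range [lo, hi]."
  [lo hi x]
  (max lo (min hi x)))

;; The spec describes the arguments, the return value, and a relation
;; between the two.
(spec/fdef clamp
  :args (spec/and (spec/cat :lo int? :hi int? :x int?)
                  (fn [{:keys [lo hi]}] (<= lo hi)))
  :ret int?
  :fn (fn [{:keys [args ret]}] (<= (:lo args) ret (:hi args))))

;; check generates arguments from :args, calls clamp, and verifies
;; :ret and :fn for every generated call.
(def results
  (stest/check `clamp {:clojure.spec.test.check/opts {:num-tests 50}}))

(stest/summarize-results results)
```

No test body is written by hand here: the spec alone drives both generation and checking, which is the property a schema-based `check` would aim to reproduce.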

I don't think something like clojure.spec.test.alpha/check exists for schema beyond schema.generators.fn-schema/check in #444. That alone I think would be a good addition to schema, even without polymorphism. Adding polymorphic types would make it more expressive, and dotted variables more so. Making schemas more expressive means that people can use check on more functions (I'm guessing that would go down well).

From personal experience, I've seen code bases that use schema that add polymorphic variables as comments to better explain what the function does. I'm guessing converting them to schemas and having them verified for "free" would be compelling.

> how many bugs would actually be caught by the testing

It would be interesting to know how effective schema.generators.fn-schema/check (fuzz testing for s/defn functions) might be:

  1. without polymorphism
  2. with (non-dotted) polymorphism
  3. with dotted polymorphism

I'm guessing there are diminishing returns because the number of functions that require these features diminishes quickly (eg., 1 is any CRUD app fn---tons of those, 2 might be an internal library function---maybe 20x less, 3 might be a fleshed out and generalized library function released to the world---maybe 100x less).

But if we treat each function on its own merits, IME it's very often the case that something requiring dotted polymorphism is a manually unrolled function with 4+ arities. Some of these expansions are monstrously complex (check out every-pred and some-fn). I'm not aware of any tool that can fuzz test something like that beyond writing a custom test.check expression. To me at least, that's a compelling use case.

> unsure about whether the utility of dotted types (and the corresponding generative testing stuff) pays for its complexity

It's also worth entertaining the possibility that these features go down great. It might invite an even longer list of complex feature requests for being able to check all kinds of functions (eg., return values per arity, transducers). Is that where we want schema to go? Personally, I think it would be awesome, but I'm not your average Clojure programmer. If I put my CRUD-developer hat on (that's my day job), then yes, perhaps this dotted polymorphic stuff is not that important and the state of things is good enough.

Member

> The inspiration for the "just check a function against its schema" operation comes from clojure.spec, which exposes something similar called clojure.spec.test.alpha/check which will automatically generate test cases for your function based on its spec.

Interesting, I didn't know about that, thanks. I love generative testing in general, but my intuition was that most of the value came from richer invariants between the input and output than can be expressed by types/schemas. For example, when testing every-pred I would want to make sure I actually returned the right value, not just the right type, over all the various arities; I have always assumed that turning on validation for a couple hand-specified unit tests gives most of the value you could hope for from simple schema validation.
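For reference, "turning on validation" for a hand-written test can be sketched with `schema.core/with-fn-validation`. The function `total-price` below is hypothetical and exists only for this illustration.

```clojure
(require '[schema.core :as s])

;; Hypothetical schematized function, used only for illustration.
(s/defn total-price :- s/Num
  [items :- [{:price s/Num :qty s/Int}]]
  (reduce + (map (fn [{:keys [price qty]}] (* price qty)) items)))

;; Inside with-fn-validation, inputs and outputs of s/defn functions
;; are validated against their schemas on every call.
(s/with-fn-validation
  (total-price [{:price 2.5 :qty 4}]))

;; An argument that violates the input schema throws an ExceptionInfo
;; while validation is on.
(try
  (s/with-fn-validation
    (total-price [{:price "2.5" :qty 4}]))
  (catch clojure.lang.ExceptionInfo e
    (ex-message e)))
```

This only checks the inputs a test author thought to write down, which is the gap that generated inputs are meant to fill.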

Since I think the utility of this automatic validation is at the core of the discussion here, I'd like to try to get on the same page and dispel any misunderstandings I might have. I looked through the first few pages of search results you linked to and didn't find any examples that were particularly convincing to me. (One thing I did find is that spec supports invariants that relate the input and output, which definitely provides more power and could make for more meaningful generative testing). Do you remember any specific cases where it's felt especially valuable?

One more question: would it be practical to incubate this feature in a downstream library? That's generally the approach I've tried to take with non-core features like generators when possible. (The generators library might actually be a good fit for this functionality.) I know there are some interactions with richness of function schemas and syntax which could be tricky; happy to discuss that if this seems like a potentially viable path.

If you feel really strongly that we should add this here now, I don't want to block you. I agree this could go down great and am certainly excited about new ideas that could improve the quality of testing and enhance the schema ecosystem overall, but if possible I'd really prefer to have a stronger understanding of the value proposition (e.g. from examples or community support) before adding complexity here.

Hope that makes sense, thanks for bearing with me!

Contributor Author

> Do you remember any specific cases where it's felt especially valuable?

There aren't really situations where this feature (stest/check) competes with hand-crafted properties. They're complementary. IMO stest/check shares similarities with static type checking: just write your code and magic happens to reveal some logical inconsistencies. The more specific your types/specs, the more detailed the checks, and the more you can "lean" on the tool.

It's also interesting to ponder a situation where the programmer is unfamiliar with test.check or even generative testing and just knows Clojure and clojure.spec. Right off the bat, without stest/check they wouldn't use generative testing very effectively (if at all).

But let's say they try stest/check because some tutorial told them to. Eventually, the generator for some argument needs tweaking, so they learn enough about generators to override a custom generator with clojure.spec.alpha/with-gen.

Maybe they use gen/generate to grab values and write one-off (non-shrinking) properties. Then, they might learn that using clojure.test.check.properties/for-all gives you shrinking, so they learn that.
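The progression described here can be sketched in three steps. The spec name `::even-int` and the property are hypothetical, chosen only to keep the example small; `org.clojure/test.check` is assumed to be available.

```clojure
(require '[clojure.spec.alpha :as spec]
         '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; Step 1: attach a custom generator to a spec with with-gen.
;; The second argument is a no-arg fn returning a test.check generator.
(spec/def ::even-int
  (spec/with-gen (spec/and int? even?)
    (fn [] (gen/fmap #(* 2 %) gen/small-integer))))

;; Step 2: grab a one-off sample value by hand (no shrinking involved).
(gen/generate (spec/gen ::even-int))

;; Step 3: wrap the check in for-all, so a failing input is shrunk
;; towards a minimal counterexample.
(def halving-roundtrips
  (prop/for-all [n (spec/gen ::even-int)]
    (= n (* 2 (quot n 2)))))

(tc/quick-check 100 halving-roundtrips)
```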

At this point, they may even use both stest/check and custom properties in tandem. They rely on a baseline beyond "it compiles", or even "it's instrumented": your function is fuzz tested with all combinations of (general) specs.

This is more the narrative I see. Certainly stest/check directly inspired me to learn how to create my own properties.

> One more question, would it be practical to incubate this feature in a downstream library?

Yes I think so. The only drawback would be forcing users to use wrappers for s/defn and s/=> that provide the new syntax.

But the big advantage is that you could try out these features for yourself to get a better feel for them.

> If you feel really strongly that we should add this here now, I don't want to block you.

No, I'd much rather collaborate with you to find something that matches the spirit of this library.

Contributor Author

I've fleshed out the idea of poly schemas + generative testing in a separate place. We can now see how this might look in action (you can try it by cloning the repo and lein repl).

Here's how I'd check clojure.core/comp: https://github.com/frenchy64/schema-incubator/blob/main/test/cljc/com/ambrosebs/schema_incubator/poly/validate_test.cljc#L220-L276

The basic idea is to enumerate all the possible arities and use quick-check to check a good subset of them.
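As a rough sketch of that idea, and not the code in the linked test file, the property below is a hand-written test.check property that exercises several arities of `clojure.core/comp` by generating a variable number of functions. The names `int-fns` and `comp-applies-right-to-left` are made up for this example.

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; A small pool of integer -> integer functions to compose.
(def int-fns [inc dec #(* 2 %) -])

;; Generate between 0 and 5 functions, so the arities of comp from
;; (comp) up to (comp f1 ... f5) are all exercised, then compare
;; against applying the functions by hand from right to left.
(def comp-applies-right-to-left
  (prop/for-all [fs (gen/vector (gen/elements int-fns) 0 5)
                 x  gen/small-integer]
    (= ((apply comp fs) x)
       (reduce (fn [acc f] (f acc)) x (reverse fs)))))

(tc/quick-check 200 comp-applies-right-to-left)
```

A dotted polymorphic schema would aim to derive this kind of arity enumeration from the function's schema instead of from a custom property.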

Here's a WIP of checking that every-pred short-circuits with one arg: https://github.com/frenchy64/schema-incubator/blob/be7f31a16b41dc2d7c8a9fba15da0833c57a85d9/test/cljc/com/ambrosebs/schema_incubator/poly/validate_test.cljc#L169-L218

Same idea, but needs more work.

Member

Awesome, I really appreciate your flexibility and understanding! Sorry I dropped out from this conversation, things got a little hectic at home but I will definitely take a look as soon as I get a chance.

relied on. In the future, polymorphic variables may be instantiated with other values.

Dotted variables have an internal "most general" value which represents a homogeneous sequence of
generalized templates (ie., generalizing variables to the left of the `:..`).
The following two functions are instrumented in the same way.

```clojure
(s/defn :all [S T :..]
rest-args-poly :- S
[& xs :- {:a S :b T} :.. T]
x)

(s/defn rest-args-mono :- s/Any
[& xs :- [{:a s/Any :b s/Any}]]
x)
```

Member

Can you expand on this example please? I'm not following what the dotted variables mean or when I would want to use them.

### Other schema types

[`schema.core`](https://github.com/plumatic/schema/blob/master/src/cljc/schema/core.cljc) provides many more utilities for building schemas, including `maybe`, `eq`, `enum`, `pred`, `conditional`, `cond-pre`, `constrained`, and more. Here are a few of our favorites:
154 changes: 132 additions & 22 deletions src/clj/schema/macros.clj
@@ -147,12 +147,38 @@
(defn extract-arrow-schematized-element
"Take a nonempty seq, which may start like [a ...] or [a :- schema ...], and return
a list of [first-element-with-schema-attached rest-elements]"
[env s]
(assert (seq s))
(let [[f & more] s]
(if (= :- (first more))
[(normalized-metadata env f (second more)) (drop 2 more)]
[(normalized-metadata env f nil) more])))
([env s] (extract-arrow-schematized-element env s false))
([env s rest-arg?]
(assert (seq s))
(let [[f & more] s
[arg more] (if (= :- (first more))
(let [[arg more] [(normalized-metadata env f (second more)) (drop 2 more)]
[arg more] (if (and rest-arg? (= 2 (count more)))
(if (= :.. (first more))
(let [dvar (second more)]
(assert! (and (symbol? dvar)
(not (namespace dvar)))
(str "Bad '&' binding: dotted variable must be simple symbol: " (pr-str dvar)))
[(vary-meta arg update :schema
(fn [template]
`(let [template# (fn [~dvar] ~template)]
(cond
(instance? schema.core.AnyDotted ~dvar)
[(template# (:schema ~dvar))]

(vector? ~dvar)
[(apply s/cond-pre ~dvar)]

:else (throw (ex-info (str ~(format "Unknown value for dotted variable %s: " dvar)
~dvar)
{}))))))
(drop 2 more)]))
[arg more])]
[arg more])
[(normalized-metadata env f nil) more])]
(when rest-arg?
(assert! (empty? more) (str "Extra elements after rest argument: " (pr-str more))))
[arg more])))

(defn process-arrow-schematized-args
"Take an arg vector, in which each argument is followed by an optional :- schema,
@@ -161,8 +187,11 @@
(loop [in args out []]
(if (empty? in)
out
(let [[arg more] (extract-arrow-schematized-element env in)]
(recur more (conj out arg))))))
(if (= '& (first in))
(let [[arg more] (extract-arrow-schematized-element env (next in) true)]
(recur more (conj out (first in) arg)))
(let [[arg more] (extract-arrow-schematized-element env in)]
(recur more (conj out arg)))))))


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -175,7 +204,7 @@
(assert! (or (symbol? rest-arg)
(and (vector? rest-arg)
(not-any? #{'&} rest-arg)))
"Bad & binding form: currently only bare symbols and vectors supported" (vec post-&))
"Bad '&' binding form: currently only bare symbols and vectors supported" (vec post-&))

[(vec pre-&)
(if (vector? rest-arg)
@@ -199,13 +228,14 @@
(if (vector? arg)
(simple-arglist-schema-form true arg)
[`schema.core/Any])
(do (assert! (vector? s) "Expected seq schema for rest args, got %s" s)
s))))
`(let [s# ~s]
(assert! (vector? s#) "Expected seq schema for rest args, got %s" s#)
s#))))

(defn input-schema-form [regular-args rest-arg]
(let [base (simple-arglist-schema-form false regular-args)]
(if rest-arg
(vec (concat base (rest-arg-schema-form rest-arg)))
`(into ~base ~(rest-arg-schema-form rest-arg))
base)))

(defn apply-prepost-conditions
@@ -310,12 +340,43 @@
(cons (into regular-args (when rest-arg ['& rest-arg]))
body))})))

(defn parse-poly-binder [binder]
(loop [binder binder
out []]
(if (empty? binder)
out
(let [[sym & binder] binder
_ (assert (and (symbol? sym)
(not (namespace sym)))
(str "Expected simple symbol in polymorphic binder: " sym))
[opts binder] (loop [binder binder
opts {:kind :schema}]
(if (not (keyword? (first binder)))
[opts binder]
(cond
(= :- (first binder)) (let [[_ kind & binder] binder]
(recur binder (assoc opts :kind kind)))
(= :.. (first binder)) (recur (next binder) (assoc opts :kind :..))
:else (throw (ex-info (str "Unknown keyword in polymorphic binder: " (first binder))
{})))))]
(recur binder (conj out [sym opts]))))))

(defn poly-binder-outer-bindings [binder]
(into [] (mapcat (fn [[sym {:keys [kind]}]]
[sym (case kind
:schema `schema.core/Any
:.. `(schema.core/->AnyDotted schema.core/Any)
(throw (ex-info (str "Unknown kind: " kind)
{})))]))
(parse-poly-binder binder)))

(defn process-fn-
"Process the fn args into a final tag proposal, schema form, schema bindings, and fn form"
[env name fn-body]
(let [compile-validation (compile-fn-validation? env name)
output-schema (extract-schema-form name)
output-schema-sym (gensym "output-schema")
poly-binder (-> name meta ::poly-binder)
bind-meta (or (when-let [t (:tag (meta name))]
(when (primitive-sym? t)
{:tag t}))
@@ -329,29 +390,65 @@
schema-bindings (map :schema-binding processed-arities)
fn-forms (map :arity-form processed-arities)]
{:outer-bindings (vec (concat
(poly-binder-outer-bindings poly-binder)
(when compile-validation
`[~(with-meta ufv-sym {:tag 'java.util.concurrent.atomic.AtomicReference}) schema.utils/use-fn-validation])
[output-schema-sym output-schema]
(apply concat schema-bindings)
(mapcat :more-bindings processed-arities)))
:arglists (map :arglist processed-arities)
:raw-arglists (map :raw-arglist processed-arities)
:schema-form (if (= 1 (count processed-arities))
`(schema.core/->FnSchema ~output-schema-sym ~[(ffirst schema-bindings)])
`(schema.core/make-fn-schema ~output-schema-sym ~(mapv first schema-bindings)))
:schema-form (if poly-binder
;; can't reuse output-schema-sym or schema-bindings since its type variables are instantiated via poly-binder-outer-bindings
`(schema.core/all ~poly-binder
~(if (= 1 (count processed-arities))
`(schema.core/->FnSchema ~output-schema ~[(-> schema-bindings first second)])
`(schema.core/make-fn-schema ~output-schema ~(mapv second schema-bindings))))
(if (= 1 (count processed-arities))
`(schema.core/->FnSchema ~output-schema-sym ~[(ffirst schema-bindings)])
`(schema.core/make-fn-schema ~output-schema-sym ~(mapv first schema-bindings))))
:fn-body fn-forms}))

(defn parse-arity-spec
"Helper for schema.core/=>*."
[spec]
(assert! (vector? spec) "An arity spec must be a vector")
(let [[init more] ((juxt take-while drop-while) #(not= '& %) spec)
fixed (mapv (fn [i s] `(schema.core/one ~s '~(symbol (str "arg" i)))) (range) init)]
(if (empty? more)
fixed
(do (assert! (and (= (count more) 2) (vector? (second more)))
"An arity with & must be followed by a single sequence schema")
(into fixed (second more))))))
init (vec init)
rest-schema? (= '& (first more))
dotted-schema? (and (<= 3 (count init))
(= :.. (-> init pop peek)))
_ (assert! (not (and rest-schema? dotted-schema?))
"Cannot provide both & and :.. to =>.")
[init template dvar] (if dotted-schema?
[(-> init pop pop pop) (-> init pop pop peek) (-> init peek)]
[init])
_ (when dotted-schema?
(assert! (and (symbol? dvar)
(not (namespace dvar)))
"Dotted variable after :.. must be a simple symbol: %s"
(pr-str dvar)))
fixed (mapv (fn [i s] `(schema.core/one ~s '~(symbol (str "arg" i)))) (range) init)
start-dotted-idx (count fixed)]
(cond
rest-schema? (do (assert! (and (= (count more) 2) (vector? (second more)))
"An arity with & must be followed by a single sequence schema")
(into fixed (second more)))
dotted-schema? `(into ~fixed (let [dvar# ~dvar
template# (fn [~dvar] ~template)]
(cond
(instance? schema.core.AnyDotted dvar#)
[(template# (:schema dvar#))]

(vector? dvar#)
(into [] (map-indexed (fn [i# s#] (schema.core/one
(template# s#)
(symbol (str "arg" (+ i# ~start-dotted-idx))))))
dvar#)
:else (throw (ex-info (str ~(format "Unknown value for dotted variable %s: " dvar)
dvar#)
{})))))
:else fixed)))

(defn emit-defrecord
[defrecord-constructor-sym env name field-schema & more-args]
@@ -409,18 +506,31 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Public: helpers for schematized functions

(defn extract-leading-fn-kv-pairs
"Split list into a map of keys allowed at the beginning
of a schema fn macro (eg., :all), and the rest."
[macro-args]
(loop [[k & [v & next-macro-args :as v-provided] :as macro-args] macro-args
leading-opts {}]
(if (= :all k)
(do (assert! v-provided (str "Missing value for key " k))
(recur next-macro-args (assoc leading-opts ::poly-binder v)))
[leading-opts macro-args])))

(defn normalized-defn-args
"Helper for defining defn-like macros with schemas. Env is &env
from the macro body. Reads optional docstring, return type and
attribute-map and normalizes them into the metadata of the name,
returning the normalized arglist. Based on
clojure.tools.macro/name-with-attributes."
[env macro-args]
(let [[name macro-args] (extract-arrow-schematized-element env macro-args)
(let [[leading-opts macro-args] (extract-leading-fn-kv-pairs macro-args)
[name macro-args] (extract-arrow-schematized-element env macro-args)
[maybe-docstring macro-args] (maybe-split-first string? macro-args)
[maybe-attr-map macro-args] (maybe-split-first map? macro-args)]
(cons (vary-meta name merge
(or maybe-attr-map {})
leading-opts
(when maybe-docstring {:doc maybe-docstring}))
macro-args)))
