Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace length(min,max) with range operator #31

Closed
Altair-Bueno opened this issue Apr 2, 2023 · 16 comments
Closed

Replace length(min,max) with range operator #31

Altair-Bueno opened this issue Apr 2, 2023 · 16 comments

Comments

@Altair-Bueno
Copy link
Contributor

Altair-Bueno commented Apr 2, 2023

The range operator .. should be the syntax used by garde to perform HasLength validation. This would also provide more information to HasLength implementations, making them more performant.

Example

struct People {
  #[garde(length(2..))]
  linked_list : LinkedList<Person>
}
  • Using would make the HasLength impl return faster, as it wouldn't be necessary to compute the whole length of the linked list (related to Required rule #5)
  • Range operator is a well know Rust feature
@jprochazk
Copy link
Owner

This would also provide more information to HasLength implementations, making them more performant

Can you expand on this? What information does x..=y have that min=x, max=y doesn't?

Using would make the HasLength impl return faster, as it wouldn't be necessary to compute the whole length of the linked list

LinkedList::len computes in O(1), because it keeps track of its length.

As for the syntax, I politely disagree. I don't think ranges are a good match here! Ranges are primarily used for slicing, which is a kind of indexing operation. length does not index into your collection, it only checks its length against some bounds. I copied the syntax from validator, and I don't think we should try to fix what isn't broken.

@Altair-Bueno
Copy link
Contributor Author

Can you expand on this? What information does x..=y have that min=x, max=y doesn't?

Because it's a type, it contains more information than just the values. This could be passed to the HasLength impl to improve performance (see below)

LinkedList::len computes in O(1), because it keeps track of its length.

I didn't know LinkedList keeps track of the current length :). Nonetheless, it was just an example where the length operation might be O(n). Other data structures will behave differently

Ranges are primarily used for slicing, which is a kind of indexing operation

Ranges are much more powerful than that. They can be used to specify any kind of range, not only "indexes"

  • Character ranges ('A'..'Z').contains('B')
  • String ranges ("hello".."world").contains(&"a");
  • Time periods ...
  • ...

Anything that has either a start or end (or even none of them!) can be represented using ranges. And of course, as expected by the Rust compiler team, Ranges are heavily optimised for performance

I copied the syntax from validator

Validator wasn't designed with a composable design, and it shows on things like this. For instance, see Keats/validator#205

it seems like the derive macro filters what types you can use manually.

Rust has a rich type system we could leverage to our advantage to perform validation. We just need to use it

@jprochazk
Copy link
Owner

jprochazk commented Apr 2, 2023

Anything that has either a start or end (or even none of them!) can be represented using ranges. And of course, as expected by the Rust compiler team, Ranges are heavily optimised for performance

My understanding is that the .contains calls ultimately compile down to comparison ops. We're just using those comparison ops directly instead. But we don't have to guess, godbolt confirms this.

That being said, maybe there's some type system magic niche somewhere that I'm not aware of which could improve performance for length checks in garde. If that's the case, I encourage you to make the change and benchmark it! It shouldn't require any changes to the syntax, though. The derive macro could just emit ranges.

Validator wasn't designed with a composable design, and it shows on things like this

I don't think changing the syntax makes any difference in terms of composability here. Do you have any concrete examples?

Rust has a rich type system we could leverage to our advantage to perform validation. We just need to use it

Yep, I agree, which is why the validation rules in garde are based on traits instead of plain functions.

@Altair-Bueno
Copy link
Contributor Author

My understanding is that the .contains calls ultimately compile down to comparison ops

I had something else on my mind while writing that comment. To my knowledge, there are no performance benefits intrinsic to the Range type. Apologises for the misunderstanding

I don't think changing the syntax makes any difference in terms of composability here. Do you have any concrete examples?

As you noted, the macro could emit ranges and delegate the bound checks to the specific impl. Syntax could be left as it is.

@jprochazk
Copy link
Owner

As you noted, the macro could emit ranges and delegate the bound checks to the specific impl. Syntax could be left as it is.

I see, I think I also misunderstood what you were suggesting. Correct me if I'm wrong, but you're saying that this code:

pub fn apply<T: Bounds>(v: &T, (min, max): (&T, &T)) -> Result<(), Error> {

Should accept some R: RangeBounds<T> instead of (min, max) values, and then delegate the bounds checking to R::contains? If that is the case, then I agree it would be a significant improvement.

@Altair-Bueno
Copy link
Contributor Author

Altair-Bueno commented Apr 2, 2023

delegate the bounds checking to R::contains

Rather delegate the bounds checking to the particular HasLength impl, something like

pub trait HasLength<RangeType> {
    fn check(&self, range: RangeType) -> Result<(),Error>;
}

// Most collections will do this
impl HasLength<T, RangeType: RangeBounds> for Vec<T>{ 
    fn check(&self, range: RangeType) -> Result<(),Error> {
        if range.contains(self.len()) { Ok(()) } else { Err(...) }
    }
}

// Other collections, where `len()` is O(N) would choose any other impl

@jprochazk
Copy link
Owner

jprochazk commented Apr 2, 2023

I see, but in case of length and byte_length, you can already do this by implementing the Length/ByteLength trait directly. HasLength/HasByteLength is only meant for types which compute length in constant time. Seems like this is more of a problem with lacking documentation!

I still think the range rule could be generalized to support any RangeBounds type, which would allow something like:

#[derive(Validate)]
struct Foo {
    #[garde(range(min='a', max='z'))]
    value: char,
}

but that's a separate issue. (#32)

@jprochazk jprochazk reopened this Apr 2, 2023
@Altair-Bueno
Copy link
Contributor Author

Altair-Bueno commented Apr 2, 2023

I still think the range rule could be generalized to support any RangeBounds type,

Yea thats a good one

To be honest, i think everything could be generalised even further using traits. Something like this:

pub trait Validator<T> {
   type Context;
   fn validate(&self, ctx: &Context, value: T) -> garde::Result {
      todo!()
   }
}

// Anything that impls AsRef<str> would support `Email`
impl <T: AsRef<str>> Validator<T> for Email {
   type Context = ();
   fn validate(&self, ctx: &Context, value: T) -> garde::Result {
      todo!()
   }
}
// ...
use garde::validators::Email;
use garde::validators::Regex;
use garde_extensions::SuperComplexValidation;

#[derive(Validate)]
pub struct Person {
  #[validate(Email)]
  name: String,
  // Range impls Validate<Vec<T>>, Validate<LinkedList<T>>,...
  #[validate(18..=20)]
  age: NonZeroUsize,
  // impl AsRef<str>, so it could support `Regex`
  #[validate(Regex(r"[\w/]{1,20}")]
  foo: camino::Path,
  // Combine multiple validators by using comma
  // Allows custom validators, provided by the community
  //               vvvvvvvvvvvvvvvvvvvvvv
  #[validate(10.., SuperComplexValidation { entropy: 100 })]
  another_one: String,
}

This would generate something like this:

impl Validate for Person {
  type Context = ();
  fn validate(&self, context: Context) -> garde::Result {
     Email.validate(&context, &self.name)?;
     (18..=20).validate(&context, &self.age)?;
     // ...
     Ok(())
  }
}

I kinda love this API, as it makes dead simple to reason about the control flow of the program, simplifies the macro A LOT and provides third party crates with a contract (trait) to build upon, instead of relying on Fn

@jprochazk
Copy link
Owner

jprochazk commented Apr 2, 2023

But now Email is a struct, so you couldn't implement that rule for something that isn't AsRef<str>.

You could solve it by having an Email struct (which you pass to validate), and some ValidateEmail trait that would have a blanket impl for AsRef<str> + impl<T: ValidateEmail> Validator<T> for Email... And now you've got basically the same design as this library, only worse, because you don't get the opportunity to have nice error messages for "obviously wrong" usage (like duplicate rules).

Allows custom validators, provided by the community
...
provides third party crates with a contract (trait) to build upon, instead of relying on Fn

In this case, there's no difference between a trait and a Fn, because you couldn't store any state in the validator anyway, it would have to be a unit struct (and a function is already a unit struct that implements the function traits 🙂). Maybe the context API could be improved to support passing any kind of state to custom validators, so libraries could provide reusable validators including the context they require. Not sure how, though!

@jprochazk
Copy link
Owner

But anyway, seeing as the use case:

it wouldn't be necessary to compute the whole length of the linked list

Is addressed by implementing Length/ByteLength directly, I'm going to close this issue.

@Altair-Bueno
Copy link
Contributor Author

But now Email is a struct, so you couldn't implement that rule for something that isn't AsRef.

Given that emails are expected to be valid strings, I don't see how this is an issue. Specially when using AsRef<str> unlocks a lot of types. I can name a few:

  • Box<str>
  • Arc<str>
  • String
  • camino::Path # External crate, for free!
  • camino::PathBuf # External crate, for free!
  • Cow
  • SmallStr # External crate, for free!

Also, the docs list AsRef<str> as one of the implementors of the email rule (https://docs.rs/garde/latest/garde/rules/email/trait.Email.html#impl-Email-for-T)

you couldn't store any state in the validator anyway

See the another_one field on the example. The validator would be configurable and also can receive state (context API). But yes, the validator must be stateless

the context API could be improved to support passing any kind of state to custom validators

https://docs.rs/axum/latest/axum/#sharing-state-with-handlers

Check axum::extract::State, and axum::extract::FromRef


In a nutshell: What the API I described earlier offers is to ditch the macro based design in favour of a trait based one

@jprochazk
Copy link
Owner

jprochazk commented Apr 2, 2023

Given that emails are expected to be valid strings, I don't see how this is an issue.

This example is a bit contrived, but say you want to validate an email stored as a ropey::Rope. It isn't contiguous, so you can't get a &str from it directly without allocating an extra string.

The current email rule is already implemented over any AsRef<str>, but with your design, you couldn't implement it for anything else. As a consequence, the design you propose is strictly a subset of the current design, so as far as I can tell, it wouldn't unlock any new use cases. I encourage you to try using the types you listed in validated fields, because those are already supported!

What the API I described earlier offers is to ditch the macro based design in favour of a trait based one

The rules being deeply integrated into the macro is a feature, one which I don't want to ditch, because a good macro can significantly improve the user experience.

To give a more concrete example, the pattern rule actually attempts to compile the regex during macro expansion, and outputs the resulting error (if there is one), so you know right away if the regex is invalid. You couldn't do this with your design, because the Regex validator may be renamed to anything by the user, so the macro would have no way of knowing that it's being given a regex that it should check. Good error messages are really important!

Check axum::extract::State, and axum::extract::FromRef

I know about this approach, but it's based on storing the state as dyn Any, which is problematic, because you don't discover incorrect usage until you try to extract the state, and it panics. Ideally a better context API would be strongly typed.

@Altair-Bueno
Copy link
Contributor Author

This example is a bit contrived, but say you want to validate an email stored as a ropey::Rope. It isn't contiguous, so you can't get a &str from it directly without allocating an extra string.

The regex crate wouldn't work either, and pretty much anything (that works with strings) expects an str. By convention, we expect str. It's a safe bet that covers most usecases. Nonetheless, keep reading :)

you couldn't implement it for anything else

You wouldn't be implementing an email rule, but a validator for ropey::Rope (Validator<ropey::Rope>)

pub struct RopeyValidator;

impl Validator<ropey::Rope> for RopeyValidator {
  type Context = RopeyRegex; // Or something like that

   fn validate(&self, ctx: &Context, value: ropey::Rope) -> garde::Result {
      if ctx.matches(value) { Ok(()) } else { Err(...) }
   }
}

This API ditches the idea of hardcoded rules in favour of generic Validators (implementors of the trait Validator<T> for some T). And every field with the #[garde] attribute is checked against all the defined set of validators. Nothing more, nothing less

a good macro can significantly improve the user experience.

It can also increase significantly complexity and compile times. Thats one of the main reasons most people ran away from heavy macro frameworks like Rocket. They generate too much code and makes rustc compile too slow!!

To give a more concrete example, the pattern rule actually attempts to compile the regex during macro expansion,

Thats worrisome. I don't see that documented on the docs. And it's definitely a behaviour I would like to disable. I don't need to benchmark it to know it increases compilation times on medium sized projects. Moreover, there are already existing solutions to this problem (https://docs.rs/lazy-regex/latest/lazy_regex/). Why would your implementation be better? I would consider regex compilation out of scope for this crate

You couldn't do this with your design, because the Regex validator may be renamed to anything by the user, so the macro would have no way of knowing that it's being given a regex that it should check

Right! That example is incomplete. Because regex::Regex and regex::bytes::Regex aren't constructed that way.

// Either this or the `lazy_regex` crate I linked earlier. 
// 1. User is able to opt in into regex validation
// 2. Regex can be called elsewhere. 
// 3. No more macro magic behind the curtains
lazy_static::lazy_static! {
   static ref RE: Regex = Regex::new("...").unwrap();
}

#[derive(Validate)]
pub struct Person {
  // Use RE directly, as it impls `Validate<T> where T: AsRef<str>`
  #[validate(RE]
  foo: String,
}

I know about this approach, but it's based on storing the state as dyn Any

Nope, that was axum::extract::Extension. Since the 0.6.0 release, sh*t got serious. State is type safe

@Altair-Bueno
Copy link
Contributor Author

I created a POC of the API. I included an example example/person.rs that showcases the API. It doesn't include a macro, but rather an aproximation of what the macro should produce

https://github.com/Altair-Bueno/garde-poc-validator

Note: For the sake of simplicity, i'm using ranges, but it can be anything else

@jprochazk
Copy link
Owner

jprochazk commented Apr 3, 2023

You wouldn't be implementing an email rule, but a validator for ropey::Rope

As far as I can tell, that would require you to create N*M validators, where N is the number of types, and M is the number of rules. I think a less contrived example would be any string type which does not deref to &str. I still don't see how it's an improvement. By the way, I do have plans to update some of the rules to use custom validation functions that work on byte streams. For example, the url rule uses the url parser from url, which requires allocations, which is not ideal!

It can also increase significantly complexity and compile times. Thats one of the main reasons most people ran away from heavy macro frameworks like Rocket.

A lot of people actually love Rocket's approach, but they "ran away" from it because it was unmaintained. I know at least I did.

Thats worrisome. I don't see that documented on the docs. And it's definitely a behaviour I would like to disable

You aren't disabling compile time validation of your regular expressions, you're just moving that work to a different macro. I plan to replace full regex with just regex_syntax, as I wasn't previously aware of its existence. But it doesn't make sense to allow disabling this feature, as otherwise you would get a panic when the regex is constructed at runtime.

Nope, that was axum::extract::Extension. Since the 0.6.0 release, sh*t got serious. State is type safe

Only for State, but you can't always use it, for the rest there's Extension which is not type safe.

To reiterate, the macro being deeply integrated with rules is a feature, and I have no plans to ditch it. Feel free to build out your POC into a more complete library and then use that instead! :)

@Altair-Bueno
Copy link
Contributor Author

But it doesn't make sense to allow disabling this feature, as otherwise you would get a panic when the regex is constructed at runtime.

Compile times. The more work the macro does, the slower it will compile. Its the same deal with sqlx and other macros that do too much

Only for State, but you can't always use it,

Yes you can, or at least I haven't found any problems whatsoever with FromRef

#[derive(FromContext)
pub ValidatorContext {
  context1: (),
  context2: FooContext,
  // ...
}

To reiterate, the macro being deeply integrated with rules is a feature, and I have no plans to ditch it.

Roger. Thanks for your time :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants