Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spike] Validation and type conversion at REST/Model/Database layer #1057

Closed
dhmlau opened this issue Feb 27, 2018 · 20 comments
Closed

[Spike] Validation and type conversion at REST/Model/Database layer #1057

dhmlau opened this issue Feb 27, 2018 · 20 comments
Assignees
Milestone

Comments

@dhmlau
Copy link
Member

dhmlau commented Feb 27, 2018

Timeboxed to 2 weeks

Overview

Validation

  • 1st tier: property/parameter
    • Validate against types and other constraint
    • For example for string type, other constraints could be regex, length, email format
  • 2nd tier: model
    • i.e. multiple properties/parameters validation
    • For example for REST calls, validate:
      • query/path/headers/form parameters
      • request body - content-type e.g. application/json, application/xml
  • 3rd tier: model collection level
    • For example, to make sure email is unique in Customer model
  • 4th tier: across models/systems
    • For example, when placing order with customerId, need to check if customerId exists.
    • business logics need to be involved in this kind of validation
    • Spike: evaluate the possibility. whether we need to open up extension for this purpose. e.g. in LB3, we have operational hooks. our users use those for validation purpose.

Type Conversion

  • How to serialize (parsing the request from REST) and deserialize (writing response back to REST) from REST to JavaScript and vice versa?
  • How to serialize and deserialize from JavaScript to database
    • Consider cases like storing boolean value - in some databases, it uses yes/no.
  • How to serialize and deserialize between different database type
    • For example, Customer stored in MongoDB where customerId is stored using objectId. Order stored in MySQL which doesn't have this native type. How to store customer using objectId as PK, referencing that from order in MySQL which has to be a string.
  • How does it work with different schema language, e.g. swagger, openapi v3, graphQL, xml schema.
    • how to go from REST call to database call

Acceptance Criteria

  • What is the expected UX? May include pseudo code.
    • for application developers
    • for extension developers
  • High level design - slides/diagram to illustrate how things work
    • what leverage we can get from LB3
  • How does it work in LB3 vs LB4? What should be improved and thrown away?
  • Feasibility
  • Complexity
  • Any open questions?

Questions to Answer

Validation

  • How much we can leverage LB3 juggler?
    The goal is to be able to leverage legacy juggler as much as we can.
  • For 4th tier (across models) validation, evaluate the possibility.
    Do we need to open up extension for this purpose. e.g. in LB3, we have operational hooks that are used by users for additional validation.

Note: Due to the different aspect of this spike, spike owner can go through the user experience and present to the team first before going further.

References

One of the user scenarios in here: https://strongloop.com/strongblog/examples-of-validations-for-loopback-models/

cc @raymondfeng

@shimks
Copy link
Contributor

shimks commented Apr 13, 2018

For validation, I was thinking of letting users provide a JSON schemas to validate their controller parameters and thought about using decorators for that purpose like this:

class Controller{
  createANumber(@validate(numberSchema) num: number) {
    return num;
  }
}

Unfortunately, I don't think it's possible to access the arguments from a parameter decorator, so it needs to be done in a method decorator. Then the question to answer is, do we want this feature to be available at the OpenAPI operation decorators, or should this feature live in its own method decorator?

In other words, how should validation be done from the following two options?:

@get() // incorporate into `openapi-v3`
createANumber(@validate(schema) num: number) {}
@validatable()
createANumber(@validate(schema) num: number) {}

For reference, OpenAPI spec does not seem to support useful fields that are present in JSON Schemas like pattern, so the information probably can't be directly integrated into the OpenAPI spec being generated.


Additionally, a very rough PoC on type coercion is up on #1256. Please let me know if you like the approach or have any questions about it.

Thoughts @strongloop/lb-next-dev @raymondfeng @bajtos?

@b-admike
Copy link
Contributor

@shimks IIUC we are talking about Tier 1 validation. I think your approach sounds reasonable and like the choice of JSON schema representation for the validation. +1 for using method decorators for parameters. I'm not aware of any drawbacks to this. Based on what you said for lack fo support for regex patterns for types in OAI spec https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md#data-types, I'd go with option 2 for having separate method decorator. Have you looked into LoopBack 3 and how it does this type of validation?

@shimks
Copy link
Contributor

shimks commented Apr 16, 2018

@b-admike This is what we have for LoopBack 3 so far https://loopback.io/doc/en/lb2/Validating-model-data.html#using-validation-methods. Based on my short briefing with @raymondfeng, apparently this isn't good enough and we should be looking to replace the validation functions with something more robust.

@jannyHou
Copy link
Contributor

jannyHou commented Apr 16, 2018

@shimks For the validation part, I think the job of parameter decorator and method decorator would be:

  • param decorator: register the validation function for a certain argument in Reflect metadata
  • method decorator: retrieve the validation functions and apply them on arguments

This is not related to the path decorator IMO, so let's keep it as a separate decorator. e.g.

@validate()
createANumber(@validateLength(schema) num: number) {}

And +1 for @b-admike 's comment

IIUC we are talking about Tier 1 validation.

Reading the Overview....is this spike for the whole validation Epic?

I may need more time to catch up the progress of type coercion.

@jannyHou
Copy link
Contributor

@shimks OpenAPI spec supports pattern in SchemaObject, see https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md#schema-object

And could you share more details like what's the usage of pattern in validation?

@shimks
Copy link
Contributor

shimks commented Apr 16, 2018

@jannyHou I can't I believe I missed that (pattern). pattern is essentially just regex for limiting whatever you're validating (see https://spacetelescope.github.io/understanding-json-schema/reference/regular_expressions.html#example).

I think since pattern exists for OpenAPI, it's possible to use openapi schemas to do validation, but I feel it'd be harder to do individual validation of each properties with only a part of OpenAPI spec so I think using JSON Schemas to do validation is the way to go.

@raymondfeng
Copy link
Contributor

Here is how I see OpenAPI parameter validation works:

  1. How to provide the metadata for validation

    • Each parameter has a TypeScript/JavaScript type (from the decorator or inference)
    • The OpenAPI spec for parameter can specify its type and format (subtype)
    • We can either extend OpenAPI spec with more constraints or decorate parameters to supply such metadata
  2. How validation is enforced at runtime

    • The method decorator can return a proxy that does the validation before calling the original method
    • An OpenAPI parameter validation action to perform the checks
    • A parameter resolver to trigger validation for the params

@bajtos
Copy link
Member

bajtos commented Apr 17, 2018

As for Tier1 validation, my understanding is that such validation rules will be declared at controller method level. In that case, I'd prefer to allow developers to use transport-specific decorators like OpenAPI spec @param to define validation rules. If needed, we can extend OpenAPI spec with our custom contraints, as mentioned by @raymondfeng.

Upsides:

  • It's easy to support design-first (top-down) approach, where validations are described directly in the OpenAPI spec/gRPC .proto document.
  • It's easy to generate API spec (e.g. OpenAPI spec or gRPC .proto documents) from the decorated controller methods, because the metadata used in decorators maps 1:1 to the spec.
  • Developers have full control over the API spec, it's easy to use any protocol-specific features.

Downsides:

  • When writing a server that provides multiple API protocols (REST/OpenAPI, gRPC, etc.), developers have to learn validation syntax of each protocol.

Regarding the enforcement of validation at runtime: as long as we are talking about validation of inputs for controller methods (or function handler based routes), I'd personally prefer to implement the validation in the REST layer, most likely as part of parseParams sequence action.

The method decorator can return a proxy that does the validation before calling the original method

While this is doable, I am concerned about the ramifications of generating proxy methods:

  • Performance impacts.
  • How easy it will be to troubleshoot proxied methods.
  • We may loose some flexibility in the controller method API - for example, can proxy methods support optional positional arguments?
  • How are we going to generate proxy methods in JavaScript codebase, where decorators are not available?

@shimks
Copy link
Contributor

shimks commented Apr 17, 2018

Here's what I have down so far for tier 1 validation: 6b40a33. Please look at the test file for the use case.

The PoC comprises of doing validation with JSON Schema, which I feel lines up better with our current support for bottom-up approach. This way we'd also have access to popular validators such as AJV.

I do agree that when looking at validation from the top-down approach, it's much more streamline to do validation with an OpenAPI schema. We'd just need to either use an OpenAPI schema validator or use a OpenAPI schema -> JSON Schema converter and then use a JSON Schema validator. This would take care of both tier 1 and tier 2 validations.

The PoC also shows validation done through a proxy method using decorators. (EDIT: I didn't know there was an actual Proxy object that could be used to do validation. I think the way it's done in my PoC so far serves a similar purpose though)

performance

I don't think using decorators will impact performance more than it already does since all we're doing is running validation on the given arguments and then passing them back to the original method.

troubleshooting

I also don't see problems with troubleshooting the proxied methods unless there are gaps in my knowledge.

flexibility

I may need to investigate more for positional arguments stuff.

JavaScript

I think for JavaScript codebases, we can just offer simple validation functions that the users can incorporate into their controller methods.

@shimks
Copy link
Contributor

shimks commented Apr 18, 2018

I'm about to start investigation for tier 3 validation and was wondering if uniqueness is the only example of it? I've checked https://loopback.io/doc/en/lb2/Validating-model-data.html and it seems like uniqueness is the only one that seem to fall within this tier.

@bajtos
Copy link
Member

bajtos commented Apr 19, 2018

An example that comes to my mind - I guess it would have to be implemented as a custom validator:

Let's say we have a Price model that has a property Currency of type string enum. Let's say we maintain the list of valid currencies in another table, where the currency name is NOT a primary key. In that case we may want to write a validator to ensure price.currency always contains one of the enum values listed in Currency table/model.

Maybe the example above is not a realistic one? Let's change it a bit:

The Price model remains the same. The currency table uses the currency name as the primary key. Conceptually, we want to say that "price.currency" is constrained to values (PKs) of "Currency" table. While this can be modeled by belongsTo relation and enforced at the database level, I think it would be great to have this kind of validation available at Tier 3 level. Few reasons for that:

  • We don't want to expose "belongsTo" relation API neither to JS code nor the REST API.
  • An Tier3 constraint can be converted to SQL constraints by our automigrate/autoupdate layer
  • Not all databases support referential integrity, a LoopBack Tier3 validation could be a good solution for them. A prime example: our memory connector. This feature is even more important if we still want to claim that LoopBack applications can seamlessly switch between backing databases.

I did a bit of googling aroud SQL constraints and referential integrity and could not find any other use cases below these two that we have already identified: uniqueness and referential integrity.

@dhmlau dhmlau added the DP3 label Apr 19, 2018
@shimks
Copy link
Contributor

shimks commented Apr 19, 2018

I think referential integrity can be accomplished at tier 2 (model) level as long as the schema provided to do the validation for the model contains all of the information. Do you think the spike should still investigate implementation for referential integrity at tier 3 validation?

For the case of validation of uniqueness, @raymondfeng and I have found that it should be possible to update the juggler definition to support uniqueness for the DBs that support the constraint. For DBs that don't support it, we think it's ok to defer the validation feature for uniqueness. If the users need it, I think they can use our tier 4 validation feature to do it themselves.

Thinking about these validations in our described 3rd tier, I feel that they can be accomplished at different tiers (2nd tier for referential integrity and 4th tier for non-native uniqueness support)

Thoughts @bajtos and @raymondfeng?

EDIT: we've found that support for UNIQUE constraint cannot be added in without access to autoupdate/automigrate.

@jannyHou
Copy link
Contributor

jannyHou commented Apr 23, 2018

Hope I didn't misunderstand the whole spike, I think we want to apply validator when given the OpenAPI spec, not the other way around, generate the OpenAPI spec according to validating rules.

As for Tier1 validation, my understanding is that such validation rules will be declared at controller method level. In that case, I'd prefer to allow developers to use transport-specific decorators like OpenAPI spec @param to define validation rules.

@bajtos I still feel as the initial implementation, it would be good to have a general validator instead of an OpenAPI specific one. How about using composed decorator? e.g.

async createUser(
  @validateEmailFormat()
  @requestBody()
  newUser: User
): Promise<User>

The OpenAPI decorator is get called first, so the validator decorator can consume the information in the spec. It guarantees the spec generation and validation are targeted on the same parameter, and also easy to switch to other protocols. Moreover, we also have a chance to modify the spec in the validator

I am a little confused with tier 3, what does model collection level mean? What would be the validation that tier 1 and 2 could not do and have to pass to tier 3?

@shimks
Copy link
Contributor

shimks commented Apr 23, 2018

@jannyHou What we want to do is set a baseline for other potential APIs to do their validation on (or at least that was my approach to it). It should be possible to integrate @validate() with @param or @operation to have the validation automatically done.

@shimks
Copy link
Contributor

shimks commented Apr 23, 2018

@raymondfeng @bajtos What kind of UX are we looking for in tier 4 validation? From what it sounds like, we need to be able to give users access to the arguments from the controller method to have them add in their own logic, but the easiest solution to this I feel like is for the users to just do this type of validation in the method itself, both for the users and for us. What am I missing?

Was the purpose to see if we can find a way to implement tools for extension devs to provide users with out-of-the-box validation functions (like uniqueness, presence of FK, etc.)?

@jannyHou
Copy link
Contributor

Had a talk with @shimks regarding tier 3 and 4, take email validation as an example, there could be two kinds of validations:

  • validate format
  • validate uniqueness(need to call backend api)

If we can pass in the model and repository to the decorator function, then those two use cases don't have a difference, and moreover, if we can get access to the related models and repositories in decorator function, that's even better, tier 3 and 4 can be merged into tier 1 and tier 2, we only need two tiers: property level and method level.

So the question would be: given a controller class constructor/prototype, can we get the context like its models and repositories? Or just pass them in as the inputs of decorator function?

@bajtos
Copy link
Member

bajtos commented Apr 26, 2018

For the case of validation of uniqueness, @raymondfeng and I have found that it should be possible to update the juggler definition to support uniqueness for the DBs that support the constraint. For DBs that don't support it, we think it's ok to defer the validation feature for uniqueness. If the users need it, I think they can use our tier 4 validation feature to do it themselves.

Thinking about these validations in our described 3rd tier, I feel that they can be accomplished at different tiers (2nd tier for referential integrity and 4th tier for non-native uniqueness support)

Sounds great 👍

Hope I didn't misunderstand the whole spike, I think we want to apply validator when given the OpenAPI spec, not the other way around, generate the OpenAPI spec according to validating rules.

I think we actually need both. Consider the case where a controller method is accepting a model data. In this case, most (if not all) validation rules are already described in LDL model definition. IMO, repository-json-schema should be able to take LDL/juggler validation rules and convert them to JSON Schema/OpenAPI spec.

Implementation wise, I agree the REST layer should apply validation based on the OpenAPI spec provided by individual endpoints (controller methods, route handlers, etc.)

@raymondfeng
Copy link
Contributor

Some thoughts around the 4 tiers of validation:

Tier 1:

  • Per property of models
  • Per parameter of api method calls

Tier 2:

  • multiple properties within the same model instance
  • multiple parameters of the same api method calls

Tier 3:

  • multiple instances of the same model collection (class/store)

Tier 4:

  • custom validations that relies on business logic with possible async access of external resources outside of LB4's control. Have the ability to apply validation logic at certain extension points such as model persistence and method calls.

Validations can be declarative (declaring constraints) and imperative (writing custom validator functions).

The overlapping between parameter value and model instance validation is interesting:

For example, CustomerController.create(customer) probably hints two types of validations here:

  1. Make sure customer parameter is valid. One use case is to disallow customer.id to be passed in during creation.
  2. Make sure customer is a valid instance of Customer model, such as email is required.

@shimks
Copy link
Contributor

shimks commented Apr 27, 2018

Spike findings

Coercion:

  • link to PoC PR [PoC - WIP] very rough PoC on request parameter type coercion/validation #1256 (ignore the commits on validation)
  • We should leverage types in @loopback/repository to do string/JSON coercion for us.
    • @loopback/types will act as a global typing system for serialization/deserialization/coercion
    • Coercion will be based on the parameter type of the OpenAPI spec that the route corresponds to
    • done in parseParams
  • Coerce according to Data types (NOTE: coercion logic in @loopback/types still needs to be verified)
    • integer/long/float/double -> number
    • string -> string
    • byte/binary -> Buffer
    • boolean -> boolean
    • date/date-time -> Date
    • password -> N/A
    • null -> null (still need to create a LoopBack type for this)
    • any -> any (don't coerce)
    • object and Reference Objects -> object
      • no Model validation should be done here
    • array -> if only one item, coerce accordingly, but don't coerce if array can have more than one items; too ambiguous
// in parseParams
let params: any[];
// ... parameter parsing
let coercedParams: any[];
for (param of params) {
  const lbType = getLoopBackTypeBasedOnParamSpec(param, paramSpec);
  coercedParams.push(lbType.coerce(param, paramSpec));
}
return coercedParams

Validation:

  • link to PoC PR [PoC] feat(validator): add in a JSON Schema validator decorator #1294
  • Use @validate (parameter level decorator) and @validatable (method level decorator) (names can be changeable) to validate the parameters against these two options:
    • JSON Schema: for validation on queries/models alike using AJV
    • functions/async functions: for custom validation using logic; intended for making manual (asynchronous) back-end calls for validation
  • AJV can be used to do coercion as well, but it is not as flexible as we'd like it to be
    • for example, we cannot coerce a string into a Date type
  • can be integrated into openapi-v3 decorators by taking the spec information (manually input) from @operation or @param, converting it into a JSON Schema, and then calling the decorator functions
  • Weighing the approach of property descriptor manipulation through method-level decorator
    • pros:
      • logically clean; validation is done on the parameters as the method is being invoked
      • extension friendly; not tied down to rest, and any other potential future servers can use this
    • cons:
      • need to address edge cases like replacing the function's length property (the parameter is not a rest parameter with a length of 0)
      • with current implementation, all functions that are decorated with @validatable (and by extension @operation) now return promises due to needing to address the possibility of the validator being an asynchronous function. Can be fixed by iterating through the list of validators and checking if any of them return promises.
        • a controller method decorated with an asynchronous function will still return a promise however, no matter what the original method returned
      • no access to Context; validation would be done isolated

Miscellaneous suggestions from the spike

  • Validating uniqueness of a key should be done after a juggler revamp, so that we could leverage the existing validatesUniquenessOf
  • Referential Integrity should be done on the applicable relation decorators without explicitly tying them into the new relation engine

Proposed discussion points regarding the spike results

Validation:

  • decide on method-level decorator approach vs baked-into-router approach
  • decide on shortcut validator decorators if we want any (like @email)

Proposed issues to come out of the spike

  • Implement Finish Fully flesh out the coercion PR (if agreed to be the approach we want to take)
    • implement the http-js layer coercion
      • both deserialization and serialization
    • for queries/paths, string should be coerced into the right javascript runtime type
    • for JSON/xml, reject if given json value is not the right type, except for data types like Date
    • Make @loopback/types extensible so that extension developers can add in their own type converters
      • focus on the conversion case for Date as an extension example to flesh out @loopback/types
    • Come up with a controller that utilizes both geoservice and Date as a form of acceptance test
    • Find a way for users to provide the minimum metadata needed to determine the coercion type; what is the dev exp? What can we provide to the devs?
  • Polish and merge the @validate PR (if agreed to be the approach we want to take)
  • Add in full JSON Schema support for @model and @property

@shimks
Copy link
Contributor

shimks commented May 7, 2018

Follow up issue for this spike has been created in #1306.

@shimks shimks closed this as completed May 7, 2018
@shimks shimks removed the in-progress label May 7, 2018
@shimks shimks added this to the April 2018 milestone May 14, 2018
@shimks shimks mentioned this issue May 29, 2018
7 tasks
@bajtos bajtos added the p1 label Aug 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants