API: Add validation to SubmitJob API #64

Open
bryantrobbins opened this issue Jan 8, 2017 · 3 comments
Before successfully writing to DynamoDB and placing a message on the queue, the SubmitJob API call should validate the parameters of the requested job.

Here is a sample JSON configuration object for a job:

{
  "dataset": "Lahman_Batting",
  "transformations": [
    {
      "type": "columnSelect",
      "columns": [
        "HR",
        "lgID"
      ]
    },
    {
      "type": "rowSelect",
      "column": "yearID",
      "operator": ">=",
      "criteria": "2000"
    },
    {
      "type": "columnDefine",
      "column": "custom",
      "expression": "2*(HR)"
    },
    {
      "type": "rowSum",
      "columns": [
        "playerID",
        "yearID",
        "lgID"
      ]
    }
  ],
  "output": {
    "type": "leaderboard",
    "column": "HR",
    "direction": "desc"
  }
}

Below is a list of required validations.

Dataset:

  • Dataset ID should be from the set of allowed datasets (currently just "Lahman_Batting")

Output:

  • Output parameter "type" should be from allowed set of output types (currently just "leaderboard")
  • Output parameter "column" should be the name of a single column from the set of selected and/or defined columns as of the end of all transformations
  • Output parameter "direction" must be one of "desc" or "asc"
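
A minimal sketch of the dataset and output checks listed above. The function names, the `ValidationError` class, and the idea of passing in the set of columns that survive the transformations are all assumptions made for illustration, not the project's actual API.

```python
ALLOWED_DATASETS = {"Lahman_Batting"}
ALLOWED_OUTPUT_TYPES = {"leaderboard"}
ALLOWED_DIRECTIONS = {"asc", "desc"}


class ValidationError(ValueError):
    """Hypothetical error type; the real validator's exceptions may differ."""


def _require_choice(value, allowed, label):
    # isinstance check first, so unhashable JSON values (lists, dicts) fail cleanly.
    if not isinstance(value, str) or value not in allowed:
        raise ValidationError(f"{label} must be one of {sorted(allowed)}, got {value!r}")


def validate_dataset(config):
    """Check that the job's dataset ID is in the allowed set."""
    _require_choice(config.get("dataset"), ALLOWED_DATASETS, "dataset")


def validate_output(output, available_columns):
    """Check the "output" object against the columns left after all transformations."""
    _require_choice(output.get("type"), ALLOWED_OUTPUT_TYPES, "output type")
    _require_choice(output.get("direction"), ALLOWED_DIRECTIONS, "output direction")
    column = output.get("column")
    if not isinstance(column, str) or column not in available_columns:
        raise ValidationError(f"output column {column!r} is not available")
```

Here `available_columns` is assumed to be whatever the transformation checks report as remaining once the last transformation has run.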

ColumnSelect and RowSum Transformation:

  • Each entry in the "columns" list should be the name of an existing column, with respect to any previously executed transformations.
  • After the ColumnSelect transformation, all columns not present in the "columns" list are lost.
  • After the RowSum transformation, all string-valued columns not present in the "columns" list are lost.
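
One way to track which columns exist after each step is to carry a small schema (column name to type) through the transformations. The sketch below assumes such a schema is available with the type labels "string" and "number"; the labels, the function names, and `ValidationError` are hypothetical.

```python
class ValidationError(ValueError):
    """Hypothetical error type; the real validator's exceptions may differ."""


def _check_columns(schema, columns):
    """Reject a "columns" value that is not a list of existing column names."""
    if not isinstance(columns, list):
        raise ValidationError('"columns" must be a list')
    for name in columns:
        if not isinstance(name, str) or name not in schema:
            raise ValidationError(f"unknown column: {name!r}")


def apply_column_select(schema, columns):
    """Return the schema after ColumnSelect: only the listed columns remain."""
    _check_columns(schema, columns)
    return {name: schema[name] for name in columns}


def apply_row_sum(schema, columns):
    """Return the schema after RowSum: unlisted string-valued columns are lost."""
    _check_columns(schema, columns)
    return {
        name: kind
        for name, kind in schema.items()
        if kind != "string" or name in columns
    }
```

Each function returns a new schema instead of mutating its input, so a failed validation leaves the caller's view of the columns unchanged.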

RowSelect Transformation:

  • "column" should be the name of an existing column, with respect to any previously executed transformations.
  • "operator" should be one of <, >, <=, >=, =, or !=.
  • "criteria" should be either a number or string, and not an expression.
  • The type of the criteria (number or string) should match the type of the corresponding column chosen.
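
A sketch of the RowSelect checks, again assuming a schema that maps each column to "string" or "number". Because the sample job passes the numeric criteria as the JSON string "2000", this sketch assumes that a numeric criteria may arrive either as a JSON number or as a string holding a plain decimal literal; that interpretation, the helper names, and `ValidationError` are assumptions.

```python
import re

ALLOWED_OPERATORS = {"<", ">", "<=", ">=", "=", "!="}
# Plain decimal literals only, so text such as "2000+1" is rejected as an expression.
NUMBER_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


class ValidationError(ValueError):
    """Hypothetical error type; the real validator's exceptions may differ."""


def _looks_numeric(criteria):
    if isinstance(criteria, bool):  # bool is a subclass of int; exclude it
        return False
    if isinstance(criteria, (int, float)):
        return True
    return isinstance(criteria, str) and NUMBER_RE.fullmatch(criteria.strip()) is not None


def validate_row_select(transformation, schema):
    """Check "column", "operator", and "criteria" of a rowSelect transformation."""
    column = transformation.get("column")
    if not isinstance(column, str) or column not in schema:
        raise ValidationError(f"unknown column: {column!r}")
    operator = transformation.get("operator")
    if not isinstance(operator, str) or operator not in ALLOWED_OPERATORS:
        raise ValidationError(f"unsupported operator: {operator!r}")
    criteria = transformation.get("criteria")
    if schema[column] == "number":
        if not _looks_numeric(criteria):
            raise ValidationError(f"criteria {criteria!r} is not a number")
    elif not isinstance(criteria, str):
        raise ValidationError(f"criteria {criteria!r} is not a string")
```

For string-valued columns the sketch cannot tell a literal from text that merely looks like an expression, so it accepts any string there.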

ColumnDefine Transformation:

  • "column" should be a unique name for the new column being defined, and should not conflict with the name of any existing column, with respect to any previously executed transformations
  • "expression" should be a valid mathematical expression using only scalar values (strings or numbers) or the names of existing columns, with respect to any previously executed transformations.
  • "expression" may use the following numerical operators: +, -, *, /, ^
  • After the ColumnDefine transformation, a new column with the given name is added.
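
To illustrate the ColumnDefine rules without extra dependencies, the sketch below swaps in Python's standard `ast` module as the expression parser; it is not a pyparsing grammar. It only checks structure: `^` parses as Python's XOR node, which is enough to accept or reject an expression but says nothing about evaluation order. Allowing unary minus, typing the new column as "number", and the names `validate_expression`, `apply_column_define`, and `ValidationError` are all assumptions.

```python
import ast

# The operators +, -, *, / and ^ (which Python parses as BitXor).
ALLOWED_BINOPS = (ast.Add, ast.Sub, ast.Mult, ast.Div, ast.BitXor)


class ValidationError(ValueError):
    """Hypothetical error type; the real validator's exceptions may differ."""


def _check_node(node, columns):
    if isinstance(node, ast.BinOp) and isinstance(node.op, ALLOWED_BINOPS):
        _check_node(node.left, columns)
        _check_node(node.right, columns)
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
        _check_node(node.operand, columns)  # unary sign: an assumption
    elif isinstance(node, ast.Name):
        if node.id not in columns:
            raise ValidationError(f"unknown column in expression: {node.id!r}")
    elif isinstance(node, ast.Constant):
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float, str)):
            raise ValidationError(f"unsupported literal: {node.value!r}")
    else:
        raise ValidationError(f"unsupported syntax: {type(node).__name__}")


def validate_expression(expression, columns):
    """Accept only scalars, existing column names, and the allowed operators."""
    if not isinstance(expression, str):
        raise ValidationError('"expression" must be a string')
    try:
        tree = ast.parse(expression.strip(), mode="eval")
    except (SyntaxError, ValueError) as exc:
        raise ValidationError(f"cannot parse expression: {expression!r}") from exc
    _check_node(tree.body, columns)


def apply_column_define(schema, transformation):
    """Validate a columnDefine step and return the schema with the new column added."""
    name = transformation.get("column")
    if not isinstance(name, str) or not name:
        raise ValidationError('"column" must be a non-empty string')
    if name in schema:
        raise ValidationError(f"column already exists: {name!r}")
    validate_expression(transformation.get("expression"), schema)
    return {**schema, name: "number"}  # assumed type of the defined column
```

One known gap in this approach: column names that are not valid Python identifiers, such as the Lahman batting columns "2B" and "3B", cannot be referenced, which is one reason a dedicated grammar may be the better fit.
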
@bryantrobbins bryantrobbins self-assigned this Jan 14, 2017
bryantrobbins commented Jan 14, 2017

Checking the column definition expressions is the hardest part of this. I'm using the pyparsing module (http://pyparsing.wikispaces.com/) to write a Python class with the necessary logic.

Check out https://github.com/bryantrobbins/baseball/blob/master/shared/btr3baseball/ExpressionValidator.py

@bryantrobbins
TODO: Add a list here of possible exceptions thrown by the ConfigValidator for consumption by the UI and Worker
