Service-Engine auto provisions REST
, GraphQL
& gRPC
services that support CRUD operations (with full validation) to tables, views and materialized views of several popular databases.
It can be implemented via an NPM package and as a Docker Container.
- Overview
- Key Concepts & Interfaces
- Application Considerations
- Application Configurations
- Application Recommendations
- Key REST Endpoints
- Development Notes
- Related projects:
- Setup & Feature Video Walkthrough
- Versioning
- License
The fastest way to get up and running with this project, is to fork a prebuilt docker app that implements the framework.
This project runs the public Docker container and contains only migrations and related configurations.
The unique features that make your product(s) stand out in the market deserve the lion's share of your bandwidth. As such, it's unlikely you have much sincere interest in dedicating time building REST
endpoints that map 1-to-1 to DB tables.
These tasks are tedious and unchallenging -- as the specs for the work are fully derived from DB DDLs -- but if you desire REST
access... it must be accomplished.
This Framework aims to solve that.
I've worked in multiple shops where some subset of engineers had an interest in utilizing GraphQL
or gRPC
, while others were hesitent as REST
was the office standard and learning any new tech takes time. A primary goal of this project is to support all three so that the REST
needs of today are satisfied, while enabling GraphQL
& gRPC
evaluation/adoption.
The resources provivisioned by the server for the various services (REST
endpoints, GraphQL
resolvers & gRPC
methods) are built based on the results of a query that surveys the DB and returns back a list of all fields within all tables, views or materialized views for all schemas.
A core benefit of implementing this framework is offloading the validation
a given DB request from other backend processes.
This is benefitial for a few reasons, but before we discuss let's consider how a basic request to a REST
endpont would get handled.
- A user calls a
REST
endpoint - The view processing the request will assembles an object from headers, query string and body.
- This object gets validated to ensure it will do no harm and should be executed
- The object is transformed to
SQL
and get's sent to the DB for execution.
The example above show some general processing that occurs before a REST
request gets sent to a DB. The same steps exist in GraphQL and gRPC -- so we'll just focus on #3 as we discuss the value of this feature.
When an server starts, the following tasks get executed:
- Run DB Migrations (if configured to do so)
- Autodetects DB resources (table, view or materialized view) via inspection
- Builds JOI validators for all DB resources (distinct validator for each supported DQL & DML operation)
- Publishes
REST
,GraphQL
&gRPC
services that apply the appropriate validator for differentCRUD
operations
If you've got multiple applications calling the same DB, each application will need implement validation. If you are doing your best to follow the DRY principle, one option would be to place the validators inside a dedicated package, then implement that within each app calling the service (this would also be a fine place to share SQL queries).
While this is a fine strategy, the package holding these validators would be a code dependency (across multiple applications), which would require updates with each modification to the database.
Instead, the approach provided here is to simply offload the validation to the server implementing this service-engine
, which would respond to the caller with either the query results (for valid requests) or a verbose error message (for invalid requests).
Database migrations (a.k.a. Schema Migrations) are an awesome way for managing changes to db state and since this project will act as the DAL for a specific DB, it makes a logical place to also hold migration files.
If implementing this service by forking the Dockerized Template project, you will just be building the migration files manually and placing them in the appropriate directory.
If implementing in node
, you'll be following the knex migration docs.
If the DB powering this service is PostgreSQL
with the postgis
extension enabled, spatial queries will be enabled on geometric fields.
This feature works by identifying any fields of a geometric type (as reported in the initial DB survey on startup) and enabling various spatial type functions (st_*
) via SEARCH methods.
Additionally, any fields of this type are published as GeoJSON (after being transformed from WKT).
Current support for spatial search functions include:
- Radius (ST_Intersects)
- Bounding Box (ST_Intersects & ST_MakeEnvelope)
- Custom Polygon (ST_Intersects & ST_GeomFromText)
By abstracting the DB, you make it easier to manage changes DB versions or introduce optimizations like DB partitioning. This is possible because if applications are calling this service instead of the DB directly, you reduce the number of places where the DB changes need to be introduced.
It may sound absurd to some readers to be support fo,
but if you haven't been a part of a DB to DB migration - you haven't lived. These are complicated projects requiring a fair amount of planning and coordination before finally flipping the switch.
The need for jdbc/odbc drivers, and the packages that leverage them, will not be needed because this application will be exposeing
REST
, GraphQL
& gRPC
Services for interacting with the DB.
As a result, native features (like fetch) or lighter dependencies (like gRPC) can be used instead.
Requests to this server are used to build SQL queries via the SQL Query Builder, knex.js. While the call signatures for REST
, GraphQL
or gRPC
each differ, each received request gets transformed to a common standard before for validation and execution.
And if you understand how each request gets processed after getting standardized, it should help you understand the various interfaces.
Standardized API Requests are comprised of (among other things) a payload
& context
. Below is an example of what this standardized object looks like after it's been standardized.
{
"payload": {
"occupation": "engineer",
"state.in": "NJ|PA",
"handle.like": "sudo%",
},
"context": {
"page": 5,
"limit": 30,
"orderBy": "handle,name_last:desc",
"fields": "id,handle,email,name_first",
"seperator": "|"
},
}
The query
object above would get validated to ensure all fields requested to be returned and all used for ordering exist on the target resource, the keys
in the payload
are fields in the target table, and that the values
in payload
are [A] of the correct data type and [B] the operators used on fields (ex .like
or .in
) have the correct number of args and type if the operator has requirements (range, geoquery, etc. have these kinds of requirements).
If invalid, the application will respond with a meaningful, verbose message indicating what the issue was with the request.
If valid, The query
would get passed to a function that would build SQL
to be executed against the DB.
As an example, the query
object above would produce the SQL
below:
select
id
, handle
, email
, name_first
from
public.some_table -- REST call made to /public_some_table
where
occupation = 'engineer'
and
state in ('NJ', 'PA')
and
handle like 'sudo%'
order by
handle
, name_last desc
limit 30
offset 120
The example above uses three operators (equal
, in
, like
), this Framework supports sixteen operators
. The table below details each supported operator, how it's implemented in REST
, if it will support multiple seperated values and if the operator has a fixed number of arguments.
field.operator |
sql operator | multiple seperated args | # of args |
---|---|---|---|
field | = (default) | false | |
field.equal |
= | false | |
field.gt |
> | false | |
field.gte |
>= | false | |
field.lt |
< | false | |
field.lte |
<= | false | |
field.not |
<> | false | |
field.like |
like | false | |
field.null |
is null | false | |
field.not_null |
is not null | false | |
field.in |
in (...values) | true | |
field.not_in |
not in (...values) | true | |
field.range |
between x and y |
true | 2 |
field.not_range |
not between x and y |
true | 2 |
field.geo_bbox |
geo_bbox | true | 4 |
field.geo_radius |
geo_radius | true | 3 |
field.geo_polygon |
geo_polygon | false |
NOTE 1: Qeoqueries (bbox & radius) us long/lat formatted arguments. I've opened an issue to support a config option to flip that as it is more intuitive.
NOTE 2: Subquery Payload parameters in REST (which are available on defined complexResources) use the Greater-than sign (>
) as a prefix.
Example,
|page
&>state
are the query string parameters for context optionpage
on thetopResourceName
and sub querystate
on thesubResourceName
.
Inbound calls for Search Resources (REST, GraphQL & gRPC) accept a query context that is used to define the sql to be executed. Additionally -- all resources support fields
context, meaning no matter what operation you are executing, you can limit the fields being returned.
Below are all the supported context
keys available for use within a query:
key | description |
---|---|
fields | fields to return from the SQL query |
seperator | seperator used to seperator values submitted in request (default is "," |
orderBy | fields to order results by. can accept multiple values seperated by "," . Format: field:desc (:asc is default so you can omit) |
page | Pagination Page |
limit | Pagination Limit |
distinct | used to select distict records from a resultset. (any truthy value is respected) |
notWhere | used to determine if knex uses WHERE or NOT WHERE when applying filters. NOT IMPLEMENTED |
statementContext | used to determine how filters should be applied together (AND, OR, and NOT operators) NOT IMPLEMENTED |
NOTE 1: Context in REST is always in query string. This is useful for returning fields on CREATE
& UPDATE.
NOTE 2: Context parameters in REST use the pipe (|
) as a prefix. Example, |page
& |limit
are the query string parameters for context options page
& limit
.
There are several standardized components that exist in both REST
& GraphQL
interfaces.
REST
data returns in Response Headers, while GraphQL
data is returned in response types. gRPC
currently does not support these features.
Each request gets a Request ID (uuid) assigned, which is is attached to the response header and also injected into any log statements during the fulfillment of the request. This reqId
should make searching for events related to a single call in your logs trivial.
request-header | value | response-header | description |
---|---|---|---|
N/A | N/A | x-request-id |
UUID assigned to request for injection into logs and troubleshooting of calls. |
Each call (REST
, GraphQL
or gRPC
) ends up building a SQL query that in most cases get's executed (see debug mode). The actual SQL query is always available via a response header on REST
calls (as x-sql
) and available another way via GraphQL.
request-header | value | response-header | description |
---|---|---|---|
x-get-sql |
truthy | x-sql |
SQL built by service |
Executing a paginated search is a standard operation, and in order to save an additional service call to request the count for a search query (in addition to the actual search providing results) -- the unpaginated count is available via the response header.
This way -- you can choose to request the count for the first page, which does result in 2 DB calls -- but then omit that flag for subsequent pages. GraphQL
and gRPC
handles this a bit differently, but they function in very similar manners.
request-header | value | response-header | description |
---|---|---|---|
x-get-count |
truthy | x-count |
unpaginated count for submitted query (even if request was paginated) |
Every resource can be called in a normal mode, which submits valid queries to the DB and debug mode -- which stops at the DB's door. If you are interested in seeing how a given REST/GraphQL query was parsed, validation responses and the SQL query built (before it's executed) -- you can do so via debug mode in REST & GraphQL.
# service_call
http://localhost:8080/sample-app-name/service/${schema}_${table}/?|orderBy=uuid:desc&|limit=3&|page=10&|fields=id,uuid&active=truthy
# debug mode (no db call)
http://localhost:8080/sample-app-name/debug/${schema}_${table}/?|orderBy=uuid:desc&|limit=3&|page=10&|fields=id,uuid&active=falsey
# service call for example given above in
# Standardized Query = Payload + Context
http://localhost:8080/sample-app-name/service/${schema}_${table}/?occupation=engineer&state.in=NJ|PA&handle.like=sudo%&|page=5&|limit=3&|orderBy=handle,name_last:desc&|fields=id,handle,email,name_first&|seperator=|
The service should work out-of-the-box with minimal configuration. There are however a couple key requirements that must be satisfied before this service will function.
All schema names, resource names and field names must adhear to GraphQL SDL -- which limits supported characters to a very small subset of ascii chars ([a-zA-Z0-9-]
). It iss possible your db uses unsupported characters and any differences will need to be resolved before you can get this service to run.
Either update the field names or use the permissions to prohibit publication of resources (as setting a permission to .none()
prohibits the addition of the resource into the GraphQL schema).
Migration support is optional -- however if you want to use it you'll need to ensure the service account being used by the app has appropriate permissions to create objects and write records.
Additionally, if the service account lacks permissions to CRUD to specific objects, the endpoints, resolvers and methods will get created -- but calls to the db will result in 500 level errors in REST
and similar things in GraphQL
or gRPC
.
The supported method for resolving this is to define service permissions in the permissions configuration object, which will prevent the publication of REST endpoints &* resolvers.
This application implements knex.js, which supports a great many popular DBs, but not all DBs support returning fields on INSERT & UPDATE statements.
Postgres does and it's the recommended engine for new projects implemented this library.
Resources provisioned in this application are based upon the tables, views and materialized views -- and the fields and respective data types they contain -- reported on a dbSurveyQuery
.
While this application currently supports three popular database engines -- the dbSurveyQuery
and various Data Type mappings are based upon recent full versions of the engine.
What this means practically is that the application may work on lower versions, the DB data types supported in the application may differ between versions. It's also possible (although not expected) that the dbSurveyQuery
itself would not work for all versions. Anyone noticing an issue is encoraged to open an issue and roll up their sleeves to take the first swing at proposing a resolution.
In time -- its possible that version specific DB Engine support will be provided. But for now the DB Engines supported (including the version used for map development) are outlined below.
- DB Survey Query
- Data Types
- Map: DB Data Type -> JOI Validation
- Map: DB Data Type -> GraphQL Schema
- Map: DB Data Type -> gRPC Proto Scalar
- DB Survey Query
- Data Types
- Map: DB Data Type -> JOI Validation
- Map: DB Data Type -> GraphQL Schema
- Map: DB Data Type -> gRPC Proto Scalar
- DB Survey Query
- Data Types
- Map: DB Data Type -> JOI Validation
- Map: DB Data Type -> GraphQL Schema
- Map: DB Data Type -> gRPC Proto Scalar
- DB Survey Query
- Data Types
- Map: DB Data Type -> JOI Validation
- Map: DB Data Type -> GraphQL Schema
- Map: DB Data Type -> gRPC Proto Scalar
- DB Survey Query
- Data Types
- Map: DB Data Type -> JOI Validation
- Map: DB Data Type -> GraphQL Schema
- Map: DB Data Type -> gRPC Proto Scalar
- DB Survey Query
- Data Types
- Map: DB Data Type -> JOI Validation
- Map: DB Data Type -> GraphQL Schema
- Map: DB Data Type -> gRPC Proto Scalar
- Note: Not all Oracle DB types are supported by the
oracledb
npm drivers (example: BFILE). This is an issue until knex extendsfetchAsString
to support these types. In the meantime, the app will run, but you should simply omit these fields from a response using the|fields
context.
The page limitation used as the default and max for any request to the server.
const { App, logger, grpcService } = await ignite({
db, metadata, paginationLimit: 250,
});
The port that the gRPC
service will listen on.
const { App, logger, grpcService } = await ignite({
db, metadata, grpcPort: 50051,
});
Service permissions are managed via permissions objects defined at the system & resource levels:
systemPermissions
apply to all db resources published on service (REST & GraphQL).resourcePermissions
can be used to modify/overide permissions set for system.
Below is an example of how to configure permissions for the service:
import { ignite, initPostProcessing, permit } from "service-engine";
const systemPermissions = permit().none();
const resourcePermissions = {
'public.some_table': permit().create().read().update().delete(),
'some_schema.some_view_name': permit().read(),
'some_schema.some_mat_view': permit().read(),
// sqlite3 has no schemas
'some_table': permit().create().read().update().delete(),
'some_view_name': permit().read(),
}
const { App, logger, grpcService } = await ignite({
db, metadata,
systemPermissions,
resourcePermissions,
});
Sometimes it can be useful to intercept an inbound query before submitting for processing. To accomplish this, this framework supports middleware -- which are a set of functions that take as input
a query object
and returns a new query object
(that will still pass the validation).
This can be useful for appending submitted queries with additional search criteria deriving from the request on-the-fly -- like adding a partition key to a query or by appending a max bbox for a query using a geo point & zoom level.
Below is an example of how to configure permissions for the service:
import { ignite, initPostProcessing } from "service-engine";
// other setup ...
// object keys are resource endpoints `${schema}_${db_resource}` that are listed in the OpenAPI3 docs at `/openapi`
const resourceSearchMiddleware = {
public_accounts: item => ({
...item,
partition_key: !!item.email ? item.email.toLowerCase()[0] : null,
}),
}
const { App, logger, grpcService } = await ignite({
db,
metadata,
resourceSearchMiddleware
});
# REST call to /public_accounts or
# GRAPHQL query SearchPublicAccounts
# before middleware applied (raw query)
{
'email': '[email protected]'
}
# after middleware applied (transformed query)
{
'email': '[email protected]',
'partition_key': 'c'
}
Subqueries & Aggregate functions in SQL are fully supported in this framework. The configuration of these features are a little clunky, but once setup they support the same common interfaces as all other resources (full validation, middleware support, REST query standards, OpenAPI generation, GraphqL support, etc).
The reason I describe them as clunky is because you will often have to create a new view that matches that data structure of the query result you want to expose. This is because the validation and interface auto provisioning is based on the results of the dbSurveyQuery
, which means if you want access to data in a particular format from a server resource (think REST
endpoint)... it must be represented in that format in a table, view or materialized view.
The way I've built the feature is to define two (2) resources that exist in the dbSurveyQuery
-- which means that validators have been provisioned.
You name the resources as follows:
topResourceName
subResourceName
The subResourceName
is the real DB object that gets queried. The topResourceName
is the result of any grouping && aggregation functions.
Below is an example of how to configure complex resources for the service:
import { ignite, initPostProcessing } from "service-engine";
// other setup ...
const complexResources = [
{
topResourceName: 'cms_providers',
subResourceName: 'cms_providers',
calculatedFields: {
address_city: 'LOWER(address_city)'
},
},
{
topResourceName: 'public_i001_city_state_entity_provider_n',
subResourceName: 'cms_providers',
groupBy: ['address_city','address_state','entity_type','provider_type'],
calculatedFields: {
n: 'count(npi)'
},
}
]
const { App, logger, grpcService } = await ignite({
db,
metadata,
complexResources
});
In the first example cms_providers
(schema: cms
& view: providers
) is named as both the topResourceName
&& subResourceName
, which is fine as there are no aggregations resulting in changes to field names or data types. The use of calculatedFields is only used to transform the data within the confines of the original datatype of the field (notice the key name in calculatedFields
hasn't changed from the original field). You would likely never do this, as a normal view would be a better place to store such a query -- but I've placed it here only to help highlight how the feature works.
In the second example, there are both groupings and aggregation functions that change the name and/or data type of the fields reported in the dbSurveyQuery
. This is a problem because the result structure doesn't exist and won't be automatically provisioned as a validator.
To solve this.... I intentionally create a view that exists only for reference here in this complex query configuration. This resource, referenced as topResourceName, public_i001_city_state_entity_provider_n
is/could be a view I created specifically for the purpose of use in this complex resource (schema: public
& view: i001_city_state_entity_provider_n
). I use the i
+#
prefix to identify DB objects that "are not real".
NOTE: I know this is a bit clunky. I'll buy a beer for the person who comes up with something more elegant. But it works. And that's not nothing. 🔥
In some situations, it may be useful to redact columns from database records -- while still maintaining the ability to support querying that dimension.
This feature works best when paired with a middleware function that would to derive query conditions from a submitted query on the fly.
Below is an example of how to redact fields for a given db resource:
import { ignite, initPostProcessing } from "service-engine";
// other setup ...
// the fields below will be removed from api responses and not published in OpenAPI3, gRPC proto or GraphQL Schema BUT... can be used in queries
const redactedFields = {
public_people: [
'this',
'that',
'the_other',
'partition_key',
],
};
// append the `partition_key` -- which has been redacted and the user doesnt know on the fly based on the query
const resourceSearchMiddleware = {
public_people: query => ({
...query,
partition_key: !!query.last_name
? query.last_name.toLowerCase().substring(0, 3)
: null,
}),
}
const { App, logger, grpcService } = await ignite({
db, metadata,
redactedFields,
resourceSearchMiddleware
});
If you have the option -- I recommend PostgreSQL. Not only does it support PostGIS, but it's got great support for table partitioning and partial indexes. Additionally, a detail relevant to this project, PostgreSQL supports returning on data manipulation (CREATE + UPDATE) -- which means you'll get records back, including fields that the db created (like ids) upon creation.
MySQL & SQLite3 do not support this feature, and as a result REST
Create & Update calls serve 201 no bodys. GraphQL
and gRPC
calls function in a similar manner.
Database migrations (a.k.a. Schema Migrations) are the best way to manage modifications to the DB state in deployed environments.
In environments above development
, I would limit the creation of new db objects to the service account to be used by this service -- and I would remove permissions for destructive activies from standard users.
If engineers want to hack or iterate through some ideas, local is the place to do so. Once things get created and owned by the service account, an entire class of problems disappear.
Removing user data is dangerous.
If you give users features to delete records in bulk -- they'll misuse it. And if you give engineers permission to execute destructive operations in the DB -- they will use them.
For permanently removing user records, I recommend do with via a boolean active flag.
Even to support with GDPR or CCPA requirements, I'd not support deleting via this service, but instead, calling this service to flip flags and using an async worker to execute the purge.
endpoint: /healthz
A health check route is available at this endpoint. The response provides metadata for the service and some information about the DB dialect that is powering the server.
endpoint: /openapi
OpenAPI3 definitions for the REST service are available at this endpoint.
endpoint: /proto
The contents of the .proto
file that is needed to make gRPC calls to this service is available at this endpoint.
endpoint: /schema
The GraphQL schema used by this service are available at this endpoint.
endpoint: /resources
endpoint: /db_resources
endpoint: /db_resources/raw
The resources above were used repeatedly during development to get an idea of that DB resources were being reported by the dbSurveyQuery
and how they are being transformed for use by the Resource objects.
These are the versions of Node & NPM used to build the most recent versions of this package.
# node version
node -v
v14.17.3
# npm version
npm -v
7.20.2
# knex has a sqlite v4 dependency. We are using v5
npm i --legacy-peer-deps
Developing this framework requires using npm link
to add a local branch of this repo as a dependency for another project that implements it.
One one occasion, and for a reason I did't understand, the implementing project lost track of the local dependency. To resolve that I had to unlink and relink the dependency. If you are ever in a similar situation, the these steps should resolve the issue:
- Delete the node_modules in both the dependency and the consumer module.
- Run npm unlink --no-save [dependency-module]
- re-link with the 2-link commands as per npm-link
Now I am able to fully test my unpublished module locally.
REFERENCE: NPM Link Quick Start
I use nodemon when developing locally to contineally restart the server upon saved changes. On occasion, and for unknown reasons, my system would report this error:
Error: ENOSPC: System limit for number of file watchers reached
After a little Sherlocking, I found a solution on this medium post.
echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p
Publishing new versions requires creating a tarball and pushing that NPM. For quick reference, here are the steps.
npm login # enter username & password
npm pack && npm publish
If you would like to see what a node.js implementation looks like, or are looking for something easily forkable, I've built such an application that serves as the basis for the public Docker Image available on Docker Hub.
Instead of having to actually implement this within a node app, you can simply skip that step and run the app as a Docker container, using the public docker image.
The repo above is a minimalistic project that implements the public Docker Container -- containing only the resources unique to an implementation (metadata, migration files, middleware, complex resources, permissions and env vars).
Developing this project required working with multiple DBs and learning to standup and load PostGIS.
There are several guides spread around the internet explaining how to run various DBs via containers, so I've simply aggregated the steps and publsihed them here.
Additionally, the process of loading spatial data into PostGIS was completely new to me. I have worked with these types of systems, benefiting greatly from the work of some awesome data folks, but I'd never been required to dive into that myself. This project changed that.
In this repo
I've included instructions on how to run PostGIS via a container AND I've provided the steps needed to load it. From downloading source material (shapefiles), to converting them to SQL
insert statements and using the CLI
to import the data -- every step is documented.
A series of videos, showing how to configure the application and how several features work, have been published as a playlist on YouTube.
Videos have been produced covering the following topics related to setup & Features:
- GIS DB Setup & Load
- Quick Start
- Insomnia Import
- Key REST Endpoints
- Permissions
- API Response Metadata
- Query Context
- SQL Operators
- CRUD Operations
- Debug Mode
- Complex Resources (subqueries & aggregate queries)
- Middleware & Redactions
- GraphQL Playground and Geoqueries
- gRPC Service (CRUD & Geoqueries)
- DB Schema Migrations
SemVer is used for versioning. For the versions available, see the tags on this repository.
This project is licensed under the MIT License - see the LICENSE.md file for details