Skip to content
This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Tweak syntax for multi-tenant tables #592

Closed
jtaylor-sfdc opened this issue Nov 27, 2013 · 8 comments
Closed

Tweak syntax for multi-tenant tables #592

jtaylor-sfdc opened this issue Nov 27, 2013 · 8 comments
Assignees
Milestone

Comments

@jtaylor-sfdc
Copy link
Contributor

The current syntax relies on a BASE_TABLE property, which is somewhat outside of the SQL syntax. Here's an example of the current syntax:

CREATE TABLE my_base_schema.base_table (
    tenant_id CHAR(15), 
    type_id CHAR(3), 
    entity_id CHAR(15));
CREATE TABLE my_schema.my_table (
    additional_col VARCHAR)
    BASE_TABLE='my_base_schema.base_table',
    TENANT_TYPE_ID='abc';

Here's an example of the proposed new syntax:

CREATE TABLE my_base_schema.base_table (
    tenant_id CHAR(15),
    type_id CHAR(3),
    entity_id CHAR(15)) MULTI_TENANT=true;
CREATE TABLE my_schema.my_table (column1 VARCHAR)
LIKE my_base_schema.base_table AS 'abc';

This would also be allowed

ALTER TABLE my_base_schema.base_table SET MULTI_TENANT=true;

as would this, but only if the table has no tenant-specific tables:

ALTER TABLE my_base_schema.base_table SET MULTI_TENANT=false;

Changes include:

  • Explicitly declaring a table as multi-tenant. This will allow us to do any validation required for the base table and create a second HBase table where the index rows would be stored (see Support secondary indexes on tenant-specific tables #491).
  • Don't allow a tenant connection to create a table from a base table that is not multi-tenant.
  • The tenant type ID is optionally specified in the AS clause. If not supplied, then the automatic filtering we do on tenant-specific tables would be restricted to just the first column (i.e. tenant_id) instead of two columns (i.e. tenant_id and type_id). The column definitions should also be optional and if left off would just create a tenant-specific table without any additional columns being defined.
  • Support using LIKE for non-tenant-specific use cases, as I believe the code would be the same. I think in this case, we wouldn't want to restrict what you can/can't do on the base table, though. Maybe the easiest way to handle this is by just not tracking the base table for these cases - we'd essentially just be making a copy of the metadata to create a table with a new name (that's what Postgres and MySQL support).
@elilevine
Copy link
Contributor

I like the explicit MULTI_TENANT='true' on the base table. We should consider surfacing that in DatabaseMetaData in some form.

@jtaylor-sfdc
Copy link
Contributor Author

Agreed. Just need to return that column (at the end of the select) in the
ResultSet for getTables()

On Thu, Dec 19, 2013 at 12:30 PM, Eli Levine [email protected]:

I like the explicit MULTI_TENANT='true' on the base table. We should
consider surfacing that in DatabaseMetaData in some form.


Reply to this email directly or view it on GitHubhttps://github.com//issues/592#issuecomment-30963643
.

@AakashPradeep
Copy link

Just curious to know that why we have not considered to call this create table as "CREATE TENANT TABLE" or DERIVED TABLE for example:

CREATE [TENANT | DERIVED ] TABLE my_schema.my_table (column1 VARCHAR) LIKE my_base_schema.base_table AS 'abc';

Since CREATE TABLE and LIKE give me the impression that we are creating another table in HBase having
the same schema as BASE table.

@jtaylor-sfdc
Copy link
Contributor Author

Good points, @AakashPradeep. Perhaps using LIKE would be confusing. I'm considering the following syntax:

-- Declare base table as both multi-tenant and multi-type. It will be possible to
-- just declare one or the other as well. For example, for the non multi-tenant
-- scenario, you may want to use a base table to represent multiple types.
CREATE bar (tenant_id CHAR(15), type_id CHAR(3), k INTEGER)
MULTI_TENANT=true, MULTI-TYPE=true

-- Instead of using CREATE ... LIKE, use DERIVE ... FROM which expresses
-- the relationship a bit better. We could still support CREATE ... LIKE pretty
-- easily, but in that case, it'd just make a copy of the source table.
DERIVE foo (aaa VARCHAR)
FROM bar AS 'abc'

@jtaylor-sfdc
Copy link
Contributor Author

I've made the above changes, but the more I think about it, the more I think these concepts fit into the SQL concept of a VIEW. The multi-tenant aspects put a bit of a wrinkle in things, but we can infer a VIEW is multi-tenant based on the connection being a tenant-specific connection.

Here's an example:

CREATE  TABLE product_metrics (
    tenant_id CHAR(15),
    user_id CHAR(15), 
    created_date DATE, 
    metric_id INTEGER,
    category_id VARCHAR,
    CONSTRAINT pk PRIMARY KEY (tenant_id, user_id, created_date, metric_id));

Note that we could potentially still require the MULT_TENANT=true property above to be more explicit about allowing the creation of tenant-specific tables from a given table.

To create a multi-tenant table, you could connect through a tenant-specific connection and do the following:

CREATE VIEW my_product_metrics AS
SELECT * FROM product_metrics;

Since you've connected through a tenant-specific connection, the WHERE tenant_id=? clause would automatically be tacked on. You could optionally add additional columns during the CREATE like this:

CREATE VIEW my_product_metrics(product_name VARCHAR) AS
SELECT * FROM product_metrics;

To create a multi-tenant, multi-type table, you could add a WHERE clause like this:

CREATE VIEW mobile_product_metrics(phone_type TINYINT)  AS
SELECT * FROM product_metrics
WHERE category_id = 'MOBILE';

You could also create a VIEW from a non tenant-specific connection.

You could use any WHERE clause, but for your VIEW to be updateable, you could only use equality expressions. These expressions would then be applied automatically for UPSERT statements.

Initially, we might only support SELECT *, but we could support a list of columns pretty easily and eventually when we have derived tables, even expressions.

@elilevine
Copy link
Contributor

This is really cool. If I read this right, you are proposing:

  1. Use CREATE VIEW statement to create multi-tenant and/or multi-type tables.
  2. Keep multi-tenant logic mostly the same, with the exception of possibly adding MULTI_TENANT=true property when creating base tables that support multi-tenant tables on top of them.
  3. Make multi-type tables more generic by allowing users to specify a WHERE clause in CREATE VIEW. This WHERE clause would automatically be applied to SELECTs and UPSERTs for the view.

I like CREATE VIEW more than DERIVE. The term "view" represents well the fact that these multi-tenant and/or multi-type tables are a sliver of data in a physical HBase table. I also like the explicit MULTI_TENANT=true property on base tables. We might want to consider @AakashPradeep's suggestion of adding the keyword TENANT in there to make things more explicit and allow Phoenix to fail if a tenant-specific table was created over a non-tenant-specific connection and visa versa:

CREATE TENANT VIEW my_product_metrics(product_name VARCHAR) AS
SELECT * FROM product_metrics;

The flexibility of the WHERE clause in defining how multi-type is great. One thing of concern is that currently typeId is required to be in the row key behind tenantId for tenant-specific tables, making for fast scans. Example in previous comment uses category_id, a key-value column.

@jtaylor-sfdc
Copy link
Contributor Author

Having typeId after tenantId is not a requirement. It would perform better, true, but it shouldn't be a requirement. It's more of a design decision.

I'll start down this path and let folks know if I hit any roadblocks.

@jtaylor-sfdc
Copy link
Contributor Author

I will write this up on the wiki too, but the above is now implemented:

  • multi-tenant tables are views on regular tables.
  • regular/non multi-tenant tables can have views too.
  • views can have views.
    You can specify a WHERE clause on your CREATE VIEW statement. It uses standard SQL syntax, with the restriction that you can't yet project a subset of your table columns into your view - you have to project them all. However, I also just added a check that disallows a tenant-specific connection from referring to the tenant id column.

Phoenix figures out if the view is "updatable" based on the WHERE clause in the CREATE VIEW statement. The basic rule is that it's updatable if you only have equality expressions with a column reference and a literal separated by ANDs, like this:

CREATE TABLE t (
    id INTEGER NOT NULL PRIMARY KEY,
    a VARCHAR,
    b INTEGER,
    c DATE);

CREATE VIEW v AS
SELECT * FROM t
WHERE a='foo' AND b=1

In these cases, Phoenix can infer the value that the columns have, so it can allow you to update through the view. For example, you could do this:

UPSERT INTO v(id,c) VALUES(100,CURRENT_DATE());

You get an error, though, if you try to update through a view and change anything that would make the WHERE clause not be satisfied. For example, this would throw an exception, since you're trying to set a to something other than 'foo':

UPSERT INTO v(id,a) VALUES(100,'zzz');

If your view isn't updatable, you get an error if you try to do an UPSERT or DELETE on it. For example, the following would not be updatable:

CREATE VIEW v AS
SELECT * FROM t
WHERE a = 'foo' AND c < CURRENT_DATE() - 7

It's still valid, though, and you can query through it and even derive other views from it, but you just can't use it in an UPSERT or DELETE statement. Most RDBMS have the concept of an updatable view. The restrictions on what is and what isn't updatable is not standardized, though. I thought we'd start simple on this - we could potentially make more views updatable down the road.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants