Allow creating multiple views or derived tables on the same physical HBase table #296
Comments
+1 |
This is a nice idea. If tenants can be segregated to single physical tables, it also opens up possibilities for pinning those tables to regionserver groups (HBASE-6721), a feature I expect to see in an upcoming 0.94 release. |
Ooh, nice suggestion Andrew. +1 |
Unless there are objections, I'm planning on taking a stab at this in the next couple of days. @jtaylor-sfdc, any suggestions on implementation approaches? Thanks. |
Here's my first cut on how this could be implemented:
|
Building tenant-specific indexes on top of these "views" is the next logical step. |
@jtaylor-sfdc, how about using the same table name for both regular and tenant-specific tables? The presence/absence of leading TENANT_ID row key part in SYSTEM.TABLE would distinguish one from another. That way we can omit the extra HBASE_TABLE_NAME column and assume the name of tenant-specific table would be shared by its base table (which would have no TENANT_ID). |
Err... scratch that last comment. We have to have the extra column pointing back to the base table to support multiple "views" on the same base table per tenant. Thanks @ivarley for pointing it out. |
Implemented by @elilevine and pulled into master. Fantastic job! I'll close this issue, as the bulk of the functionality is in and we have these more specific issues for follow up work:
|
A single HBase table can comfortably grow to any size, and there are good reasons to have a relatively small number of them on a cluster (< thousands). Additionally, the fact that HBase can have an arbitrary set of column qualifiers in every row means that one physical table need not be limited to a single schema with a fixed set of columns. The 1-to-1 correspondence of schema <-> table in Phoenix is somewhat artificially limiting; it'd be nice if apps could share large physical tables across many logical Phoenix schemas. (This is especially common in multi-tenant situations, which Phoenix already supports by virtue of accepting a tenant-id in the connection properties.)
I'd like to suggest giving Phoenix the ability to run one or more views (or tenant-specific tables) that actually use a single common physical HBase table. So when that tenant issues the query "SELECT * FROM foo_view", Phoenix translates this to really mean "SELECT c1, c2, c3 FROM foo_table".
The structure of the primary key (row key) of all the views would probably need to be the same. I can think of a couple of ways to structure this from a syntax point of view:
We'd also have to decide whether to automatically enforce row ownership by virtual table (by, for example, transparently including the tenant id in the row key) or just leave that up to the clients. (It might be weird if you get a bunch of rows back from your query that someone else inserted into another logical table, with all nulls for the columns you're expecting ...). This could also be done by including a hidden column in the row, or a bit in the row key, that indicates which virtual table the rows are part of.
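To make the translation concrete, here is a rough Python sketch of the rewrite step. It assumes a hypothetical in-memory registry mapping each view name to its physical table and column list; `VIEWS` and `expand_view_query` are illustrative names, not part of Phoenix, and a real implementation would presumably resolve views in the query compiler rather than by matching strings.

```python
import re

# Hypothetical view registry: view name -> (physical table, visible columns).
# Nothing here is a Phoenix API; it only illustrates the mapping.
VIEWS = {
    "foo_view": ("foo_table", ["c1", "c2", "c3"]),
}

# Only the simplest query shape is recognised in this sketch.
_SELECT_STAR = re.compile(r"^\s*SELECT\s+\*\s+FROM\s+(\w+)\s*$", re.IGNORECASE)


def expand_view_query(sql):
    """Rewrite `SELECT * FROM <view>` against the view's physical table."""
    match = _SELECT_STAR.match(sql)
    if match is None:
        return sql  # unsupported shape, pass through unchanged
    view = VIEWS.get(match.group(1).lower())
    if view is None:
        return sql  # not a registered view, pass through unchanged
    table, columns = view
    return "SELECT {} FROM {}".format(", ".join(columns), table)


print(expand_view_query("SELECT * FROM foo_view"))
```

Several views can point at the same physical table simply by registering more entries with different column lists.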
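As a sketch of the first option (transparently including the tenant id in the row key), the Python below prefixes every row key with the tenant id and a view marker, then restricts reads to that prefix. The separator byte, the key layout, and the names `make_row_key` and `scan_view` are all assumptions made for illustration; a plain dict stands in for the HBase table, and this is not how Phoenix actually encodes keys.

```python
# Assumed delimiter between key parts; tenant ids and view names
# must not contain it for the prefix match to be unambiguous.
SEPARATOR = b"\x00"


def make_row_key(tenant_id, view, key):
    """Build a physical row key: tenant id, view marker, then the logical key."""
    return tenant_id.encode() + SEPARATOR + view.encode() + SEPARATOR + key


def scan_view(rows, tenant_id, view):
    """Return (logical key, row) pairs owned by one tenant's view, in key order."""
    prefix = tenant_id.encode() + SEPARATOR + view.encode() + SEPARATOR
    return [
        (physical_key[len(prefix):], row)
        for physical_key, row in sorted(rows.items())
        if physical_key.startswith(prefix)
    ]


# A dict stands in for the shared physical table.
table = {}
table[make_row_key("t1", "orders", b"001")] = {"c1": "a"}
table[make_row_key("t2", "orders", b"001")] = {"c1": "b"}
table[make_row_key("t1", "invoices", b"001")] = {"c9": "z"}

print(scan_view(table, "t1", "orders"))
```

Because the prefix is applied on both the write and the read path, a client querying one logical table never sees rows written through another, which avoids the all-nulls rows described above.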