-
Notifications
You must be signed in to change notification settings - Fork 600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(E6data): initial implemantation for E6data SQL Analytics platform #9517
base: main
Are you sure you want to change the base?
Conversation
…ince it doesn't support transactions.
- Add support for catalog, secure connection, auto-resume, and cluster UUID - Implement custom table() method to handle catalog and database hierarchy - Modify get_schema() to use E6Data-specific column information - Adjust execute() method for E6Data compatibility - Update _fetch_from_cursor() to handle E6Data result format
- Customize Tokenizer to use double quotes for identifiers - Modify Generator to map VARCHAR, CHAR, and TEXT to STRING - Add custom TRANSFORMS for concat and length functions
- Add E6data dialect - Use E6DataType for type mapping - Retain existing rewrites including custom limit rewrite - Keep other compiler configurations unchanged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Did a first pass, and will add a separate comment addressing some of your questions.
pyproject.toml
Outdated
@@ -87,6 +87,7 @@ shapely = { version = ">=2,<3", optional = true } | |||
# issues with versions <3.0.2 | |||
snowflake-connector-python = { version = ">=3.0.2,<4,!=3.3.0b1", optional = true } | |||
trino = { version = ">=0.321,<1", optional = true } | |||
e6data-python-connector = { version = "2.2.0" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e6data-python-connector = { version = "2.2.0" } | |
e6data-python-connector = { version = ">=2.2.0,<3", optional = true } |
Unless there's a reason to pin to a single version, this should include a version range, and optional = true
is necessary to ensure this is shipped as an extra, and not a required dependency to use any other Ibis backend(s).
ibis/backends/sql/dialects.py
Outdated
@@ -403,6 +403,7 @@ class Generator(Postgres.Generator): | |||
JSON_TYPE_REQUIRED_FOR_EXTRACTION = True | |||
SUPPORTS_UNLOGGED_TABLES = True | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this change.
ibis/backends/e6data/__init__.py
Outdated
from urllib.parse import parse_qs, urlparse | ||
|
||
import numpy as np | ||
import pymysql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you using this somewhere? If not, pleas remove it as an e6data backend dependency and also remove this import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem like the correct implementation for an e6data conftest.py
given that it's not running in a container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the tests until implemented correctly.
pyproject.toml
Outdated
@@ -152,6 +153,7 @@ dask = ["dask", "regex", "packaging"] | |||
datafusion = ["datafusion"] | |||
druid = ["pydruid"] | |||
duckdb = ["duckdb"] | |||
e6data = ["e6data-python-connector","pymysql"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e6data = ["e6data-python-connector","pymysql"] | |
e6data = ["e6data-python-connector"] |
Is pymysql
actually required? It doesn't seem to be used anywhere.
ibis/backends/e6data/compiler.py
Outdated
|
||
|
||
@public | ||
class E6DataCompiler(SQLGlotCompiler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not inherit from MySQLCompiler
?
ibis/backends/e6data/__init__.py
Outdated
import polars as pl | ||
import pyarrow as pa | ||
|
||
class Backend(SQLBackend, CanCreateDatabase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all the copypasting it might be worth considering inheriting from MySQLBackend
.
ibis/backends/e6data/__init__.py
Outdated
return ".".join(matched.groups()) | ||
def _from_url(self, url: str, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure to run your code through just fmt
. There are likely other places where there are formatting inconsistencies.
The best place to start is to try and run
Take a look at the individual backend docs pages in
We'll need to get whatever credentials/auth information is needed to login into a GitHub Actions secret. Let's chat in a DM on Zulip about this. |
- Change Backend class to inherit from MySQLBackend instead of SQLBackend - Remove unnecessary imports and methods duplicated in MySQLBackend - Update connection handling to use E6data_python_connector - Modify schema and table retrieval methods for E6data compatibility - Replace MySQLPandasData with E6DataPandasData for data conversion - Clean up and streamline code, removing print statements and unused methods
Hey @cpcloud , I've addressed most of your comments apart from the tests and documentation. Please let me know if these changes look good. I'll be working on writing the tests now.
I've Dm'ed you on Zulip, kindly need your assistance. |
Description of changes
This pull request aims to integrate E6data, a distributed SQL analytics engine, with Ibis. E6data is also designed for high-performance analytics on large-scale data, and this integration will allow Ibis users to leverage E6data's capabilities seamlessly.
Key changes and considerations:
E6data Backend Implementation:
SQL Dialect Customization:
Compiler Modifications:
Connection String and Authentication:
Schema and Metadata Handling:
Testing:
I need some guidance on how to go about adding tests for the integration.
Documentation:
Could you give me some pointers on how to add relevant documentation?
Dependencies:
We're also working on a Mini-kube-based single-node testing infrastructure, which might make adding testing automation for the CI easier.
I'm new to the Ibis community, and this PR could be much better. I appreciate the time and guidance from maintainers in improving it further. Any comments are welcome; again, I appreciate your time and patience.