ODBC Refactor #3

JohnOmernik · 2024-03-06T16:20:25Z

@nmani has pointed out severe limitations in how we handle ODBC connections. Currently we just rely on ODBC connections defined in the ODBC UI (which we made usable in the bootstrap by templating registry files that can be loaded, taking the configuration work away from the users).

Naveen has raised the idea of making the ODBC configuration both easy and portable (platform independent).

This will require a refactor that will "likely" live here, but regardless of what has to change in which repos, I will use this to walk through some of the challenges.

Here are some high-level items to consider in this project.

jupyter_integration_base

Instances
- Instances are connections to various clusters.
  - For example, you may have an Oracle integration that connects to multiple Oracle clusters.
  - Currently, instances for jupyter_pyodbc-based integrations (tera, oracle, impala, hive, etc.) have an optional dsn argument.
    - For most Windows environments, this allows us to use built-in authentication or pass passwords at connect time.
    - Password handling is an important part: passwords, or Windows auth when available, need to be seamless for the users.
    - Password and OTP prompts are part of jupyter_integration_base. We need to be able to ask for passwords/OTPs in integration_base and pass them to the underlying instance (or flag when it is built-in auth).
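
To make the password/OTP flow above concrete, here is a rough sketch of how integration_base might decide whether to prompt at all. The function name `resolve_credentials` and the instance keys `builtin_auth` and `needs_otp` are made up for illustration; they are not existing jupyter_integration_base options.

```python
import getpass
from typing import Callable, Dict, Optional


def resolve_credentials(
    instance: Dict[str, object],
    prompt: Callable[[str], str] = getpass.getpass,
) -> Dict[str, Optional[str]]:
    """Return the secrets an instance needs, prompting only when required.

    `instance` is a hypothetical dict; `builtin_auth` and `needs_otp`
    are illustrative keys, not real integration options.
    """
    if instance.get("builtin_auth"):
        # Built-in (e.g. Windows) auth: nothing to ask the user for.
        return {"password": None, "otp": None}

    name = instance.get("name", "instance")
    password = prompt(f"Password for {name}: ")
    # Only ask for an OTP when the instance says it needs one.
    otp = prompt(f"OTP for {name}: ") if instance.get("needs_otp") else None
    return {"password": password, "otp": otp}
```

The `prompt` argument defaults to `getpass.getpass` but can be swapped out, so the same logic could sit behind a notebook widget or be exercised without a terminal.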

jupyter_pyodbc

This is the base class that multiple other ODBC-based integrations build on.
Most other ODBC classes are just wrappers for this, although some have custom code. From here on, when I say jupyter_pyodbc I mean "jupyter_pyodbc and child integrations".
jupyter_pyodbc currently uses DSNs through the instance argument ?dsn=JUPINSTNACE
The reason I used this approach (a full DSN vs. options in pyodbc) is that it was not clear to me how to set connection options generically across the various ODBC drivers. So instead, I would create an ODBC DSN in the UI, then copy it out of my registry. This allowed me to set performance items that weren't (apparently) exposed in pyodbc, such as the Teradata settings for strings vs. integers. We need this ability in whatever refactor we do: we need to be able to set EVERY argument of an ODBC connection, and have it set once (in a YAML file or something) in the bootstrap by the admin and applied to all users.
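
One possible way to get "every argument" without a DSN is to build a DSN-less connection string from a plain mapping of driver keywords, which is what a parsed YAML file would give us. This is only a sketch: `build_connection_string` and `odbc_escape` are hypothetical helpers, and `SomeDriverOption` stands in for whatever driver-specific keyword an admin needs. The brace quoting follows the usual ODBC connection string convention as I understand it, so it should be checked against each driver.

```python
from typing import Mapping


def odbc_escape(value: object) -> str:
    """Wrap a value in braces when it contains characters that would
    otherwise break the key=value;key=value layout."""
    text = str(value)
    if any(ch in text for ch in " ;{}"):
        # Inside braces, a closing brace is escaped by doubling it.
        return "{" + text.replace("}", "}}") + "}"
    return text


def build_connection_string(options: Mapping[str, object]) -> str:
    """Join every option into one ODBC connection string.

    `options` is assumed to be the dict loaded from an admin-provided
    YAML file; nothing here is specific to one driver.
    """
    return ";".join(f"{key}={odbc_escape(val)}" for key, val in options.items())


# Example with placeholder values (not a real driver configuration).
example = build_connection_string(
    {"DRIVER": "Example Driver", "UID": "alice", "SomeDriverOption": "a;b"}
)
# The resulting string could then be handed to pyodbc.connect(example).
```

Because the mapping is passed through untouched, any keyword the driver understands can be set, which is the property the registry-copied DSNs gave us.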
One issue with this was that some items depend on the user running the connection. The first and most obvious is the username: the DSN had my username in it, which is why I created a templated DSN and had the bootstrap replace it with the name of the running user. Another is the Teradata driver version, which we have to detect from the user's install.
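
The per-user substitution could work the same way on YAML-style options as it does on the templated reg files. A minimal sketch, assuming `${username}` and `${driver_version}` placeholders (both names are my invention) and leaving the actual Teradata driver detection out of scope:

```python
import getpass
from string import Template
from typing import Dict, Mapping, Optional


def render_options(
    options: Mapping[str, object],
    context: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Fill ${placeholders} in every option value.

    If no context is given, only the current OS username is supplied.
    A placeholder with no matching context entry raises KeyError, so a
    missing value (e.g. an undetected driver version) fails loudly.
    """
    if context is None:
        context = {"username": getpass.getuser()}
    return {
        key: Template(str(value)).substitute(context)
        for key, value in options.items()
    }


# Placeholder values only; the driver name is not a real product string.
rendered = render_options(
    {"UID": "${username}", "DRIVER": "Example Driver ${driver_version}"},
    {"username": "alice", "driver_version": "1.0"},
)
```

One caveat with `string.Template`: a literal `$` in an option value has to be written as `$$`.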
Having YAML definitions that can be referenced by the instances is probably good.
Reg files are NOT portable: they are Windows only, and they assume people can directly run reg files.
I am hesitant to define an instance with EVERY option available in the reg file. We may need to put YAML files in the user's profile under .ipython/integrations. We could create a new folder for ODBC connections. This should be platform independent.
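
For the platform-independent location, something like the following might be enough. The `odbc` subfolder name and both function names are assumptions on my part; only `.ipython/integrations` comes from the current layout.

```python
from pathlib import Path
from typing import List, Optional


def odbc_config_dir(home: Optional[Path] = None) -> Path:
    """Return the (hypothetical) per-user folder for ODBC YAML files.

    Path.home() resolves the profile directory on Windows, macOS and
    Linux, so no platform-specific code is needed.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".ipython" / "integrations" / "odbc"


def list_connection_files(home: Optional[Path] = None) -> List[Path]:
    """List the YAML files an instance could reference, sorted by path."""
    folder = odbc_config_dir(home)
    if not folder.is_dir():
        # Nothing has been bootstrapped for this user yet.
        return []
    return sorted(p for p in folder.iterdir() if p.suffix in {".yaml", ".yml"})
```

An instance could then refer to a connection by file name, and the bootstrap would only need to drop files into that folder.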

jupyter_integrations_bootstrap

Currently this is where an org can define reg files.
We do some templating here for the reg files.
That could likely be moved over to the YAML or whatever is defined.

Trying to get a bunch of this laid out here so we can refactor all three repos (or more) in a way that makes sense and makes them platform independent.

JohnOmernik mentioned this issue Mar 6, 2024

ODBC Indepedence PitterPatterPython/jupyter_integrations_bootstrap#2

Open
