#154: Moved dialects documentation.
redcatbear committed Apr 26, 2019
1 parent 5a75587 commit 9b974bb
Showing 10 changed files with 683 additions and 0 deletions.
77 changes: 77 additions & 0 deletions doc/dialects/db2.md
# DB2 SQL Dialect

DB2 was tested with the IBM DB2 JCC drivers that come with DB2 LUW V10.1 and V11. Since these drivers have not seen any major changes in recent years, any DB2 driver back to V9.1 should work. The driver comes in two implementations, `db2jcc.jar` and `db2jcc4.jar`. All tests were made with `db2jcc4.jar`.

Additionally, there are two license files for the DB2 driver:

* `db2jcc_license_cu.jar` - license file for DB2 on Linux, Unix and Windows
* `db2jcc_license_cisuz.jar` - license file for DB2 on z/OS (mainframe)

Make sure that you upload the necessary license file for the target platform you want to connect to.

## Supported Capabilities

The DB2 dialect applies a number of casts, especially around date and time data types and functions.

### Casting of Data Types

* `TIMESTAMP` and `TIMESTAMP(x)` are cast to `VARCHAR` so that no precision is lost.
* `VARCHAR` and `CHAR` for bit data are cast to a hex string with double the original size.
* `TIME` is cast to `VARCHAR(8)`.
* `XML` is cast to `VARCHAR(DB2_MAX_LENGTH)`.
* `BLOB` is not supported.
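
To illustrate the data type casting (the table and column names below are made up for this example, not part of the dialect), a query selecting a DB2 `TIME` column would be pushed down with an explicit cast, roughly like this:

```sql
-- Sketch of the pushdown the adapter generates for a TIME column.
-- "MY_SCHEMA", "SHOPS" and "OPENING_TIME" are hypothetical names.
SELECT CAST("OPENING_TIME" AS VARCHAR(8)) FROM "MY_SCHEMA"."SHOPS"
```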

### Casting of Functions

* `LIMIT` is replaced by `FETCH FIRST x ROWS ONLY`.
* `OFFSET` is currently not supported, as only DB2 V11 supports it natively.
* `ADD_DAYS`, `ADD_WEEKS` ... are replaced by `COLUMN + DAYS`, `COLUMN + ....`
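
As a sketch of the `LIMIT` rewrite (schema, table and column names here are hypothetical), a query against the virtual schema is pushed down to DB2 along these lines:

```sql
-- Query against the virtual schema in Exasol:
SELECT "COL" FROM "VS_DB2"."TAB" LIMIT 10;

-- Roughly corresponding statement pushed down to DB2:
SELECT "COL" FROM "MY_SCHEMA"."TAB" FETCH FIRST 10 ROWS ONLY
```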


## JDBC Driver

You have to specify the following settings when adding the JDBC driver via EXAOperation:

* Name: `DB2`
* Main: `com.ibm.db2.jcc.DB2Driver`
* Prefix: `jdbc:db2:`

## Adapter Script

```sql
CREATE OR REPLACE JAVA ADAPTER SCRIPT adapter.jdbc_adapter AS

// This is the class implementing the callback method of the adapter script
%scriptclass com.exasol.adapter.jdbc.JdbcAdapter;

// This will add the adapter jar to the classpath so that it can be used inside the adapter script
// Replace the names of the bucketfs and the bucket with the ones you used.
%jar /buckets/bucketfs1/bucket1/virtualschema-jdbc-adapter-dist-1.10.0.jar;

// DB2 Driver files
%jar /buckets/bucketfs1/bucket1/db2jcc4.jar;
%jar /buckets/bucketfs1/bucket1/db2jcc_license_cu.jar;
// uncomment for mainframe connection and upload db2jcc_license_cisuz.jar;
//%jar /buckets/bucketfs1/bucket1/db2jcc_license_cisuz.jar;
/
```

## Creating a Virtual Schema

You can now create a virtual schema as follows:

```sql
CREATE OR REPLACE CONNECTION db2_con TO 'jdbc:db2://host:port/database' USER 'db2-usr' IDENTIFIED BY 'db2-pwd';

CREATE VIRTUAL SCHEMA db2 USING adapter.jdbc_adapter WITH
SQL_DIALECT = 'DB2'
CONNECTION_NAME = 'DB2_CON'
SCHEMA_NAME = '<schemaname>';
```

Replace `<schemaname>` with the actual DB2 schema you want to connect to.

## Running the DB2 Integration Tests

A how-to is included in the [setup SQL file](../../integration-test-data/db2-testdata.sql).
48 changes: 48 additions & 0 deletions doc/dialects/exasol.md
# Exasol SQL Dialect

## Supported Capabilities

The Exasol SQL dialect supports all capabilities that are supported by the virtual schema framework.

## JDBC Driver

Connecting to an Exasol database is the simplest way to start with virtual schemas.
You don't have to install any JDBC driver, because it is already installed in the Exasol database and also included in the jar of the JDBC adapter.

## Adapter Script

After uploading the adapter jar, the adapter script can be created as follows:

```sql
CREATE SCHEMA adapter;
CREATE JAVA ADAPTER SCRIPT adapter.jdbc_adapter AS
%scriptclass com.exasol.adapter.jdbc.JdbcAdapter;
%jar /buckets/your-bucket-fs/your-bucket/virtualschema-jdbc-adapter-dist-1.10.0.jar;
/
```

## Creating a Virtual Schema

```sql
CREATE CONNECTION exasol_conn TO 'jdbc:exa:exasol-host:1234' USER 'user' IDENTIFIED BY 'pwd';

CREATE VIRTUAL SCHEMA virtual_exasol USING adapter.jdbc_adapter WITH
SQL_DIALECT = 'EXASOL'
CONNECTION_NAME = 'EXASOL_CONN'
SCHEMA_NAME = 'default';
```
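
Once created, the virtual schema can be queried like a regular Exasol schema (the table name below is just an example):

```sql
SELECT * FROM virtual_exasol.my_table WHERE id < 100;
```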

## Using IMPORT FROM EXA Instead of IMPORT FROM JDBC

Exasol provides the faster, parallel `IMPORT FROM EXA` command for loading data from Exasol. You can tell the adapter to use this command instead of `IMPORT FROM JDBC` by setting the `IMPORT_FROM_EXA` property. In this case you have to provide the additional `EXA_CONNECTION_STRING` property, which is the connection string used for the internal `IMPORT FROM EXA` command (it also supports ranges like `192.168.6.11..14:8563`). Please note that the `CONNECTION` object must still contain the JDBC connection string in `AT`, because the adapter script uses a JDBC connection to obtain the metadata when a schema is created or refreshed. The internal `IMPORT FROM EXA` statement uses the address from `EXA_CONNECTION_STRING` plus the user name and password from the connection.

```sql
CREATE CONNECTION exasol_conn TO 'jdbc:exa:exasol-host:1234' USER 'user' IDENTIFIED BY 'pwd';

CREATE VIRTUAL SCHEMA virtual_exasol USING adapter.jdbc_adapter WITH
SQL_DIALECT = 'EXASOL'
CONNECTION_NAME = 'EXASOL_CONN'
SCHEMA_NAME = 'default'
IMPORT_FROM_EXA = 'true'
EXA_CONNECTION_STRING = 'exasol-host:1234';
```
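
With `IMPORT_FROM_EXA` enabled, the adapter wraps each pushdown query in an `IMPORT FROM EXA` statement. Internally it generates something roughly like the following (a sketch only; the exact statement the adapter emits may differ):

```sql
-- Sketch of the internally generated statement, using the
-- EXA_CONNECTION_STRING address and the connection's credentials.
IMPORT FROM EXA AT 'exasol-host:1234' USER 'user' IDENTIFIED BY 'pwd'
STATEMENT 'SELECT * FROM "default"."my_table"';
```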
96 changes: 96 additions & 0 deletions doc/dialects/hive.md
# Hive SQL Dialect

## JDBC Driver

The dialect was tested with the Cloudera Hive JDBC driver available on the [Cloudera downloads page](http://www.cloudera.com/downloads). The driver is also available directly from [Simba Technologies](http://www.simba.com/), who developed it.

When you unpack the JDBC driver archive you will see that there are two variants, JDBC 4.0 and 4.1. We tested with the JDBC 4.1 variant.

You have to specify the following settings when adding the JDBC driver via EXAOperation:

* Name: `Hive`
* Main: `com.cloudera.hive.jdbc41.HS2Driver`
* Prefix: `jdbc:hive2:`

Make sure you upload **all files** of the JDBC driver (one at the time of writing) in EXAOperation **and** to the bucket.

## Adapter Script

You have to add all files of the JDBC driver to the classpath using `%jar` as follows (filenames may vary):

```sql
CREATE SCHEMA adapter;
CREATE JAVA ADAPTER SCRIPT jdbc_adapter AS
%scriptclass com.exasol.adapter.jdbc.JdbcAdapter;

%jar /buckets/bucketfs1/bucket1/virtualschema-jdbc-adapter-dist-1.10.0.jar;

%jar /buckets/bucketfs1/bucket1/HiveJDBC41.jar;
/
```

### Creating a Virtual Schema

```sql
CREATE CONNECTION hive_conn TO 'jdbc:hive2://hive-host:10000' USER 'hive-usr' IDENTIFIED BY 'hive-pwd';

CREATE VIRTUAL SCHEMA hive_default USING adapter.jdbc_adapter WITH
SQL_DIALECT = 'HIVE'
CONNECTION_NAME = 'HIVE_CONN'
SCHEMA_NAME = 'default';
```
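
After that, the Hive tables are queryable directly from Exasol (the table name here is hypothetical):

```sql
SELECT * FROM hive_default.sample_table LIMIT 10;
```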

### Connecting to a Kerberos-Secured Hadoop

Connecting to a Kerberos-secured Impala or Hive service only differs in one aspect: you have to create a `CONNECTION` object which contains all the relevant information for the Kerberos authentication. This section describes how Kerberos authentication works and how to create such a `CONNECTION`.

#### Understanding how it Works (Optional)

Both the adapter script and the internally used `IMPORT FROM JDBC` statement support Kerberos authentication. They detect that a connection is a Kerberos connection by a special prefix in the `IDENTIFIED BY` field. In that case, authentication happens using a Kerberos keytab and a Kerberos config file (via the JAAS Java API).

The `CONNECTION` object stores all relevant information and files in its fields:

* The `TO` field contains the JDBC connection string.
* The `USER` field contains the Kerberos principal.
* The `IDENTIFIED BY` field contains the Kerberos configuration file and the keytab file (base64-encoded), along with the internal prefix `ExaAuthType=Kerberos;` that identifies the `CONNECTION` as a Kerberos `CONNECTION`.

#### Generating the CREATE CONNECTION Statement

In order to simplify the creation of Kerberos `CONNECTION` objects, the [`create_kerberos_conn.py`](https://github.com/EXASOL/hadoop-etl-udfs/blob/master/tools/create_kerberos_conn.py) Python script is provided. The script requires five arguments:

* `CONNECTION` name (arbitrary name for the new `CONNECTION`)
* Kerberos principal for Hadoop (i.e., Hadoop user)
* Kerberos configuration file path (e.g., `krb5.conf`)
* Kerberos keytab file path, which contains keys for the Kerberos principal
* JDBC connection string

Example command:

```sh
python tools/create_kerberos_conn.py krb_conn [email protected] /etc/krb5.conf ./krbuser.keytab \
'jdbc:hive2://hive-host.example.com:10000;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=hive-host.example.com;KrbServiceName=hive'
```

Output:

```sql
CREATE CONNECTION krb_conn TO 'jdbc:hive2://hive-host.example.com:10000;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=hive-host.example.com;KrbServiceName=hive' USER '[email protected]' IDENTIFIED BY 'ExaAuthType=Kerberos;enp6Cg==;YWFhCg=='
```

#### Creating the CONNECTION

You have to execute the generated `CREATE CONNECTION` statement directly in Exasol to actually create the Kerberos `CONNECTION` object. For more detailed information about the script, use the help option:

```sh
python tools/create_kerberos_conn.py -h
```

#### Using the Connection When Creating a Virtual Schema

You can now create a virtual schema using the Kerberos connection created before.

```sql
CREATE VIRTUAL SCHEMA hive_default USING adapter.jdbc_adapter WITH
SQL_DIALECT = 'HIVE'
CONNECTION_NAME = 'KRB_CONN'
SCHEMA_NAME = 'default';
```
54 changes: 54 additions & 0 deletions doc/dialects/impala.md
# Impala SQL Dialect

The Impala dialect is similar to the Hive dialect in most aspects. For this reason we only highlight the differences in this section.

## JDBC Driver

You have to specify the following settings when adding the JDBC driver via EXAOperation:

* Name: `Impala`
* Main: `com.cloudera.impala.jdbc41.Driver`
* Prefix: `jdbc:impala:`

Make sure you upload **all files** of the JDBC driver (over 10 at the time of writing) in EXAOperation and to the bucket.

## Adapter Script

The adapter script can be created similarly to the Hive one:

```sql
CREATE SCHEMA adapter;
CREATE JAVA ADAPTER SCRIPT jdbc_adapter AS
%scriptclass com.exasol.adapter.jdbc.JdbcAdapter;

%jar /buckets/bucketfs1/bucket1/virtualschema-jdbc-adapter-dist-1.10.0.jar;

%jar /buckets/bucketfs1/bucket1/hive_metastore.jar;
%jar /buckets/bucketfs1/bucket1/hive_service.jar;
%jar /buckets/bucketfs1/bucket1/ImpalaJDBC41.jar;
%jar /buckets/bucketfs1/bucket1/libfb303-0.9.0.jar;
%jar /buckets/bucketfs1/bucket1/libthrift-0.9.0.jar;
%jar /buckets/bucketfs1/bucket1/log4j-1.2.14.jar;
%jar /buckets/bucketfs1/bucket1/ql.jar;
%jar /buckets/bucketfs1/bucket1/slf4j-api-1.5.11.jar;
%jar /buckets/bucketfs1/bucket1/slf4j-log4j12-1.5.11.jar;
%jar /buckets/bucketfs1/bucket1/TCLIServiceClient.jar;
%jar /buckets/bucketfs1/bucket1/zookeeper-3.4.6.jar;
/
```

## Creating a Virtual Schema

You can now create a virtual schema as follows:

```sql
CREATE CONNECTION impala_conn TO 'jdbc:impala://impala-host:21050' USER 'impala-usr' IDENTIFIED BY 'impala-pwd';

CREATE VIRTUAL SCHEMA impala_default USING adapter.jdbc_adapter WITH
SQL_DIALECT = 'IMPALA'
CONNECTION_NAME = 'IMPALA_CONN'
SCHEMA_NAME = 'default';
```

Connecting to a Kerberos-secured Impala works similarly to Hive and is described in the section [Connecting to a Kerberos-Secured Hadoop](hive.md#connecting-to-a-kerberos-secured-hadoop).
115 changes: 115 additions & 0 deletions doc/dialects/oracle.md
# Oracle SQL Dialect

## Supported Capabilities

The Oracle dialect does not support all capabilities. A complete list can be found in [OracleSqlDialect.getCapabilities()](../../virtualschema-jdbc-adapter/src/main/java/com/exasol/adapter/dialects/impl/OracleSqlDialect.java).

Oracle data types are mapped to their equivalents in Exasol. The following exceptions apply:

- `NUMBER`, `NUMBER with precision > 36` and `LONG` are cast to `VARCHAR` to prevent a loss of precision. <br>
If you want these types to return a `DECIMAL` type instead, you can set the `ORACLE_CAST_NUMBER_TO_DECIMAL_WITH_PRECISION_AND_SCALE` property: <br>
`ORACLE_CAST_NUMBER_TO_DECIMAL_WITH_PRECISION_AND_SCALE='36,20'` <br>
This casts `NUMBER` with precision > 36, `NUMBER` without precision and `LONG` to `DECIMAL(36,20)`.
Keep in mind that this yields errors if the data in the Oracle database does not fit into the specified `DECIMAL` type.
- `DATE` is cast to `TIMESTAMP`. This data type is only supported for positive year values, i.e., years > 0001.
- `TIMESTAMP WITH [LOCAL] TIME ZONE` is cast to `VARCHAR`. Exasol does not support timestamps with time zone information.
- `INTERVAL` is cast to `VARCHAR`.
- `CLOB`, `NCLOB` and `BLOB` are cast to `VARCHAR`.
- `RAW` and `LONG RAW` are not supported.
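
The property can be supplied when creating the virtual schema, or changed afterwards with `ALTER VIRTUAL SCHEMA` (a sketch; `virt_oracle` is the schema name used later in this guide):

```sql
-- Switch NUMBER/LONG mapping to DECIMAL(36,20) on an existing virtual schema.
ALTER VIRTUAL SCHEMA virt_oracle
SET ORACLE_CAST_NUMBER_TO_DECIMAL_WITH_PRECISION_AND_SCALE = '36,20';
```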

## JDBC Driver

To setup a virtual schema that communicates with an Oracle database using JDBC, the JDBC driver, e.g., `ojdbc7-12.1.0.2.jar`, must first be installed in EXAoperation and deployed to BucketFS; see [this article](https://www.exasol.com/support/browse/SOL-179#WhichJDBCdriverforOracleshallIuse?) and [Deploying the Adapter Step By Step](../deploying_the_virtual_schema_adapter.md) for instructions.

## Adapter Script

After uploading the adapter jar we are ready to create an Oracle adapter script. Adapt the following script as indicated.

```sql
CREATE SCHEMA adapter;
CREATE JAVA ADAPTER SCRIPT adapter.jdbc_oracle AS
%scriptclass com.exasol.adapter.jdbc.JdbcAdapter;

// You need to replace `your-bucket-fs` and `your-bucket` to match the actual location
// of the adapter jar.
%jar /buckets/your-bucket-fs/your-bucket/virtualschema-jdbc-adapter-dist-1.10.0.jar;

// Add the Oracle JDBC driver to the classpath.
%jar /buckets/bucketfs1/bucket1/ojdbc7-12.1.0.2.jar;
/
```

## JDBC Connection

Next, create a JDBC connection to your Oracle database. Adjust the properties to match your environment.

```sql
CREATE CONNECTION jdbc_oracle
TO 'jdbc:oracle:thin:@//<host>:<port>/<service_name>'
USER '<user>'
IDENTIFIED BY '<password>';
```

A quick way to test the `JDBC_ORACLE` connection is to run an `IMPORT FROM JDBC` query. The connection works if `42` is returned.

```sql
IMPORT FROM JDBC AT jdbc_oracle
STATEMENT 'SELECT 42 FROM DUAL';
```

### Creating a Virtual Schema

Having created both a JDBC adapter script and a JDBC Oracle connection, we are ready to create a virtual schema. Insert the name of the schema that you want to expose in Exasol.

```sql
CREATE VIRTUAL SCHEMA virt_oracle USING adapter.jdbc_oracle WITH
SQL_DIALECT = 'ORACLE'
CONNECTION_NAME = 'JDBC_ORACLE'
SCHEMA_NAME = '<schema>';
```
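
A quick sanity check is to query one of the mapped tables through the new virtual schema (the table name is a placeholder for one of your Oracle tables):

```sql
SELECT COUNT(*) FROM virt_oracle."<table>";
```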

## Using IMPORT FROM ORA Instead of IMPORT FROM JDBC

Exasol provides the `IMPORT FROM ORA` command for loading data from Oracle. It is possible to create a virtual schema that uses `IMPORT FROM ORA` instead of JDBC to communicate with Oracle. Both options are intended to support the same features. `IMPORT FROM ORA` almost always offers better performance since it is implemented natively.

This behavior is toggled by the Boolean `IMPORT_FROM_ORA` property. Note that a JDBC connection to Oracle is still required to fetch metadata. In addition, a "direct" connection to the Oracle database is needed.

### Deploying the Oracle Instant Client

To be able to communicate with Oracle, you first need to supply Exasol with the Oracle Instant Client, which can be obtained [directly from Oracle](http://www.oracle.com/technetwork/database/database-technologies/instant-client/overview/index.html). Open EXAoperation, visit Software -> "Upload Oracle Instant Client" and select the downloaded package. The latest version of Oracle Instant Client we tested is `instantclient-basic-linux.x64-12.1.0.2.0`.

### Creating an Oracle Connection

Having deployed the Oracle Instant Client, a connection to your Oracle database can be set up.

```sql
CREATE CONNECTION conn_oracle
TO '(DESCRIPTION =
(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)
(HOST = <host>)
(PORT = <port>)))
(CONNECT_DATA = (SERVER = DEDICATED)
(SERVICE_NAME = <service_name>)))'
USER '<username>'
IDENTIFIED BY '<password>';
```

This connection can be tested using, e.g., the following SQL statement:

```sql
IMPORT FROM ORA AT CONN_ORACLE
STATEMENT 'SELECT 42 FROM DUAL';
```

### Creating a Virtual Schema

Assuming you have already set up the JDBC connection `JDBC_ORACLE` as shown in the previous section, you can continue with creating the virtual schema.

```sql
CREATE VIRTUAL SCHEMA virt_import_oracle USING adapter.jdbc_oracle WITH
SQL_DIALECT = 'ORACLE'
CONNECTION_NAME = 'JDBC_ORACLE'
SCHEMA_NAME = '<schema>'
IMPORT_FROM_ORA = 'true'
ORA_CONNECTION_NAME = 'CONN_ORACLE';
```
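
To verify which pushdown the adapter actually generates, you can inspect the rewritten query. Assuming your Exasol version supports `EXPLAIN VIRTUAL` (a sketch; the table name is a placeholder):

```sql
-- Shows the statement the adapter pushes down instead of executing it.
EXPLAIN VIRTUAL SELECT * FROM virt_import_oracle."<table>";
```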
