https://prestosql.io/resources.html is very thin right now.
Proposing an 'unverified' section:
C#/.NET: https://github.com/koralium/EntityFrameworkCore.Presto
Perl: https://github.com/spiritloose/Net-Presto
PHP: https://github.com/360d-io-labs/PhpPrestoClient
Elixir: https://github.com/bbalser/ex_prestodb
Scala: https://github.com/nezihyigitbasi/presto-scala-client
C: https://github.com/easydatawarehousing/prestoclient/tree/master/C
Rust: https://github.com/nooberfsh/prusto
Clojure: https://github.com/metabase/metabase/blob/6a6327646964559e735c3557d8c39f5ceff5dcd8/modules/drivers/presto/src/metabase/driver/presto.clj
ODBC: Simba
GUI: Hue, SQLPad, etc.
Google Cloud: Dataproc
Alibaba Cloud: Data Lake Analytics, E-MapReduce
Tencent Cloud: Elastic MapReduce
Huawei Cloud: MapReduce Service
Unmerged Connectors (data sources):
Generic JDBC #3105
Flexible (csv/excel/txt/raw/html/json/xml/word doc/powerpoint/pdf/outlook email/zip/gzip/bzip2 file from websites or local disk) https://github.com/ebyhr/presto-flex
HTTP Rest API sources https://github.com/prestosql-rocks/presto-rest / https://github.com/cecoppinger/presto-rest / https://github.com/nineinchnick/trino-rest
Sybase (SAP) ASE & IQ #3462 / #2976
Teradata prestodb/presto#12078 / https://github.com/jmrozanec/presto-teradata-connector
DB2 https://github.com/IBM/presto-db2
Snowflake #2551 / https://github.com/rahulbsw/trino-snowflake/tree/main/src/main/java/io/trino/snowflake / https://github.com/awslabs/aws-athena-query-federation/pull/454/files
Hbase https://github.com/harbby/presto-connectors/tree/master/presto-hbase / https://github.com/analysys/presto-hbase-connector / https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-hbase
Riak https://github.com/kuenishi/presto-riak
Vertica #6134 / https://github.com/alexsumin/presto-vertica-connector / https://github.com/lev4ik/presto-vertica-connector / https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-vertica
Carbondata https://github.com/apache/carbondata/tree/master/integration/presto
SAP Hana https://github.com/qq5132834/presto-0.233-hana-connector
Salesforce #2548
DynamoDB https://github.com/buremba/presto-dynamodb / https://github.com/Teradata/presto/tree/c125c3413c19bc6bb69e5f16e4c6f769062ff1cf/presto-dynamo / https://github.com/mugglmenzel/presto/tree/master/presto-dynamo / https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-dynamodb
Athena https://github.com/kutny/presto-athena/tree/master/src/main/java/com/facebook/presto/plugin/athena
DocumentDB https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-docdb
Timestream https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-timestream
Neptune https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-neptune
Sagemaker https://github.com/aws-samples/amazon-athena-train-amazon-sagemaker
Azure CosmosDB (supported by Mongo connector)
Azure SQL DB (supported by SQLServer connector)
Netezza https://github.com/aakashnand/trino-netezza / https://github.com/combineads/plugin-presto-netezza
Lucene https://github.com/totticarter/presto-lucene
Ignite #8323 / https://github.com/prabhuom1/presto-ignite-connector-plugin / https://github.com/emhlbmc/presto-ignite
Databricks Spark Delta lake https://docs.delta.io/0.7.0/presto-integration.html
Azure Synapse https://docs.starburstdata.com/latest/connector/starburst-synapse.html
Pulsar #8020 / https://github.com/apache/pulsar/tree/master/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto (inc. protobuf - https://github.com/apache/pulsar/pull/9841/files)
Ethereum https://github.com/xiaoyao1991/presto-ethereum
Cockroach #8317
Excel https://github.com/phillip2019/trino-plugins/tree/main/spreadsheet-storage-handler
Zookeeper https://github.com/phillip2019/trino-plugins/tree/9df3cdd40a15da6415fd0d0d83f6e1576674b0d6/zookeeper-storage-handler/src/main/java/com/fortitudetec/presto/zookeeper
MapD https://github.com/NVIDIA/presto-mapd-connector
Arrow https://github.com/Praveen2112/presto/tree/arrow_connector/presto-arrow-flight/src/main/java/io/prestosql/plugin/arrow / https://github.com/Parth-Brahmbhatt/presto-1/tree/arrow-flight/presto-arrowflight/src/main/java/io/prestosql/plugin/arrowflight / https://github.com/koralium/Koralium/tree/master/presto/trino-connector-arrowflight
kdb https://github.com/sand-stone/dataswitch/tree/master/presto-kdb/src/main/java/kdb/presto / https://github.com/tuor713/trino-kdb
Kylin https://github.com/poiyyq/presto-kylin/tree/master/src/main/java/com/facebook/presto/plugin/kylin
Exasol https://github.com/blunghamer/presto-plugins/tree/master/presto-exasol/src/main/java/io/prestosql/plugin/exasol
Influx #2397
SnappyData https://github.com/dawsongzhao/snappydata-presto-connector
Greenplum https://github.com/openlookeng/hetu-core/tree/master/hetu-greenplum/src/main/java/io/hetu/core/plugin/greenplum
HiveServer2 https://github.com/WeilerWebServices/Eventbrite/tree/master/presto/presto-hive-jdbc / https://github.com/leolorenzoluis/presto-csv-jdbc
Cloudwatch Logs https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-cloudwatch
Cloudwatch Metrics https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-cloudwatch-metrics
AWS Resource inventory CMDB https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-aws-cmdb
Obiba and REST https://github.com/obiba/presto-obiba
Manta https://github.com/joyent/presto-manta
Hazelcast https://github.com/ajermakovics/presto-hazelcast
TileDB https://github.com/TileDB-Inc/TileDB-Presto
Template https://github.com/jmrozanec/presto-template-connector
BioSamples https://github.com/EBIBioSamples/biosamples-presto-connector
FPGA https://github.com/supermt/presto-fpga-connector
Vitess https://github.com/vitessio/contrib/tree/master/presto-vitess-connector / https://github.com/yuokada/presto-vitess
Loki https://github.com/sdojjy/presto-loki/tree/master/src/main/java/io/prestosql/plugin/loki
LocalCSV https://github.com/dongqianwei/presto-localcsv
Spreadsheets https://github.com/fortitudetec/presto-plugins/tree/master/spreadsheet-storage-handler
AWS metadata? https://github.com/haitaoyao/presto-aws-plugin
Cloudata? https://github.com/justinsb/cloudata/tree/master/cloudata-structured/src/main/java/com/cloudata/structured/sql/provider
Extract/Memory? https://github.com/yoloz/prestoSamples
MD5? https://github.com/Bongss/MD5PasswordCracker/tree/master/presto-cracker/src/main/java/PasswordCracker
Weiwodb/bytecode? https://github.com/photogamerun/weiwodb
Kubernetes https://github.com/xuxinkun/kubesql
TiDB/tikv https://github.com/zhihu/presto-connectors
OpenGauss https://github.com/openlookeng/hetu-core/tree/master/hetu-opengauss/src/main/java/io/hetu/core/plugin/opengauss
Victoria Metrics prestodb/presto#13777
Pravega pravega/presto-connector#7
Twitter https://github.com/bitsondatadev/presto/tree/twitter-hacks / https://github.com/kokosing/trino-rest/tree/master/trino-rest-twitter/src/main/java/rocks/trino/rest/twitter
Apache DataSketches #6643
Slack https://github.com/kokosing/trino-rest/tree/master/trino-rest-slack/src/main/java/rocks/trino/rest/slack / https://aws.amazon.com/blogs/big-data/create-a-custom-data-connector-to-slacks-member-analytics-api-in-amazon-quicksight-with-amazon-athena-federated-query/
Github https://github.com/kokosing/trino-rest/tree/master/trino-rest-github/src/main/java/rocks/trino/rest/github
Katta https://github.com/zhenqin/katta/tree/master/katta-presto/src/main/java/com/ivyft/katta/presto
Monarch Ampool https://github.com/davinash/monarch/tree/21ac4f538fe695fd7481003084fd2f0a8982cd32/Connectors/monarch-presto/src/main/java/io/ampool/presto/connector
Huawei Hetu openLooKeng (DataCenter/VDM) https://github.com/openlookeng/hetu-core
Yugabyte-db #5708 / https://docs.yugabyte.com/latest/develop/ecosystem-integrations/presto/
recordstore https://github.com/PierreZ/record-store/tree/04b399325aeb020eb9816aeb9a18c876a6a7ee27/presto-connector/src/main/java/fr/pierrezemb/recordstore/presto
Pixelsdb https://github.com/pixelsdb/pixels/tree/master/pixels-presto
Kairosdb https://github.com/xuhang1458/presto-kairosdb-connector
Apache IoTDB (Internet of Things) https://github.com/xuhang1458/presto-iotdb-connector
Git https://github.com/nineinchnick/trino-git
Rapid7 Armor https://github.com/rapid7/presto-armor-connector/tree/master/src/main/java/com/rapid7/presto/armor
Aerospike https://www.aerospike.com/docs/connect/access/presto/index.html
Hyena messaging daemon https://github.com/FCG-LLC/presto/tree/master/presto-hyena/src/main/java/co/llective/presto/hyena
Python https://github.com/tooptoop4/presto-python/blob/master/src/test/java/rocks/prestodb/python/TestPythonFunctions.java
EC2 instance https://github.com/haitaoyao/presto-aws-plugin/tree/master/src/main/java/presto/aws
Adding the lists below for SEO purposes:
Merged connectors (data sources):
Microsoft SQL Server
Oracle
Hive
PostgreSQL
MySQL
Amazon Redshift
MongoDB
Cassandra
BigQuery
Elasticsearch
Apache Kafka
Clickhouse
Druid
JMX
Kinesis
Phoenix
Google Sheets
Kudu
Redis
Thrift
Prometheus
Linkedin Pinot
Black Hole
Accumulo
Local File
Memory
MemSQL
System
TPCDS
TPCH
See https://prestosql.io/docs/current/connector.html for the up-to-date list.
Supported file types in Hive Connector:
ORC
Parquet
Uber Hudi/hoodie (already in Hive connector)
Netflix Iceberg (already in Hive connector)
Avro
JSON (using org.apache.hive.hcatalog.data.JsonSerDe)
CSV (using org.apache.hadoop.hive.serde2.OpenCSVSerde)
TextFile
RCText (RCFile using ColumnarSerDe)
RCBinary (RCFile using LazyBinaryColumnarSerDe)
SequenceFile
Supported filesystem storage for Hive Connector:
AWS S3 ("s3", "s3a")
Google Cloud Storage aka GCS ("gs")
Hadoop File System ("hdfs")
Windows Azure Storage Blob (WASB) ("wasb", "wasbs")
Azure Data Lake Storage (ADLS) ("adl")
Azure ADLS Gen2 - Azure Blob File System ("abfs", "abfss")
Aliyun Object Storage Service (OSS) - https://hadoop.apache.org/docs/current/hadoop-aliyun/tools/hadoop-aliyun/index.html
Tencent Cloud Object Storage (COSN) - #4978
IBM Cloud Object Storage (COS) - CODAIT/stocator#218 (comment) / https://docs.starburstdata.com/latest/connector/starburst-hive-ibm-cos.html
??? Oracle Cloud Infrastructure Object Storage (OCI) - oracle/oci-hdfs-connector#51
MinIO
Ceph
Dell EMC
Cloudian?
OpenIO?
SwiftStack?
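The URI schemes quoted in the list above can be turned into a small lookup. This is only an illustrative sketch: the mapping is copied from the list, while the function name `storage_for_location` and the example locations are made up for the illustration and are not part of Presto.

```python
from urllib.parse import urlparse

# Scheme-to-storage mapping copied from the list above.
# Entries the list gives without a scheme (OSS, COSN, MinIO, ...) are left out.
SCHEME_TO_STORAGE = {
    "s3": "AWS S3",
    "s3a": "AWS S3",
    "gs": "Google Cloud Storage",
    "hdfs": "Hadoop File System",
    "wasb": "Windows Azure Storage Blob",
    "wasbs": "Windows Azure Storage Blob",
    "adl": "Azure Data Lake Storage",
    "abfs": "Azure ADLS Gen2",
    "abfss": "Azure ADLS Gen2",
}


def storage_for_location(location: str) -> str:
    """Return the storage system implied by a table location's URI scheme.

    Hypothetical helper for illustration; returns "unknown" for any scheme
    that is not in the list above.
    """
    scheme = urlparse(location).scheme.lower()
    return SCHEME_TO_STORAGE.get(scheme, "unknown")
```

For instance, `storage_for_location("s3a://my-bucket/warehouse/txns")` resolves through the `s3a` entry, while a plain local path has no scheme and falls through to `"unknown"`.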
Caching:
Data: Alluxio or Qubole Rubix
Directory/file listing
Metastore partitions
Performance tips:
Partition on low-cardinality VARCHAR columns that appear in the WHERE clauses of SELECT queries
ANALYZE to gather statistics: https://prestosql.io/docs/current/optimizer/statistics.html
Use compressed ORC/Parquet files between 32MB and 1GB in size, with the columns inside the files sorted so that min/max ranges don't overlap; this reduces network IO and improves predicate pushdown (https://www.slideshare.net/databricks/the-parquet-format-and-performance-optimization-opportunities)
There are no Primary Keys/Foreign Keys/Indexes
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
A high CPU count is important for parallelism/concurrency
High memory is important for joins/GROUP BY/DISTINCT
task.max-worker-threads
task.concurrency
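The tip about sorting columns so that min/max ranges don't overlap can be illustrated with a toy model. The sketch below is not how the ORC or Parquet readers are implemented; it only assumes that each file exposes a min and max value for a column, and that a reader can skip any file whose range cannot match the predicate. The `FileStats` class, the file names and the value ranges are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max statistics for one column."""
    name: str
    min_value: int
    max_value: int


def files_to_read(files, lower, upper):
    """Keep only files whose [min, max] range overlaps lower <= col <= upper."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


# Data written sorted on the column: each file covers a disjoint range.
sorted_layout = [
    FileStats("part-0", 1, 100),
    FileStats("part-1", 101, 200),
    FileStats("part-2", 201, 300),
]

# Same data written unsorted: every file spans almost the whole range.
unsorted_layout = [
    FileStats("part-0", 1, 290),
    FileStats("part-1", 5, 300),
    FileStats("part-2", 2, 295),
]
```

With a predicate such as `col BETWEEN 150 AND 160`, `files_to_read(sorted_layout, 150, 160)` keeps a single file, whereas the unsorted layout forces all three files to be read because every range overlaps the predicate.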
Security:
LDAP/Kerberos/JWT/Cert
TLS
Apache Ranger for column/table/schema/catalog level authorisation, column masking and row level filtering fine grained access
To get the SQL of all views:
from_utf8(from_base64(substr(view_original_text,17,length(view_original_text) - 19))) view_sql
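The expression above strips a fixed-length prefix and suffix and base64-decodes what is left. The same steps can be sketched outside SQL as follows. The wrapper text `/* Presto View: ... */` is an assumption inferred from the offsets 17 and 19 in the expression (a 16-character prefix and a 3-character suffix); the helper names and the sample payload are hypothetical.

```python
import base64

# Assumed wrapper around the encoded view definition; the lengths match the
# offsets used in substr(view_original_text, 17, length(...) - 19).
PREFIX = "/* Presto View: "  # 16 characters
SUFFIX = " */"               # 3 characters


def decode_view_text(view_original_text: str) -> str:
    """Mirror from_utf8(from_base64(substr(text, 17, length(text) - 19)))."""
    start = len(PREFIX)                      # SQL position 17 is Python index 16
    end = len(view_original_text) - len(SUFFIX)
    encoded = view_original_text[start:end]
    return base64.b64decode(encoded).decode("utf-8")


def encode_view_text(payload: str) -> str:
    """Build a sample value in the assumed format, for trying out the decoder."""
    encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    return PREFIX + encoded + SUFFIX
```

A round trip such as `decode_view_text(encode_view_text("select 1"))` returns the original payload, which is a quick way to check the offsets before running the SQL expression against real metastore rows.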
A very useful feature is joining data across any of the connectors.
Disclaimer: while Hive sources are suitable for very large data (i.e. petabytes), JDBC sources are only performant on small tables. For example:
you have an 8 billion row txns table that is indexed/partitioned in Oracle.
The query below takes 38 seconds when run in Oracle directly but 70 minutes when run in Presto:
select custid, count(1) numtxns from oracle.txns
where txnmonth in (to_date('202002','YYYYMM'),to_date('201902','YYYYMM'))
group by custid
having count(1) > 4
Missing in all JDBC connectors:
Aggregate pushdown #6613
Join pushdown #6620
Complex filter pushdown #7994 / #402
ORDER BY pushdown #8093
DISTINCT pushdown #4324
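To see why the missing aggregate pushdown listed above matters, the toy simulation below uses an in-memory SQLite database as a stand-in for the remote JDBC source. It is a sketch under stated assumptions, not a model of Presto internals: it assumes the simple filter reaches the remote side in both cases, and that only the GROUP BY/HAVING step differs. The table contents and function names are invented for the illustration.

```python
import sqlite3
from collections import Counter


def build_remote():
    """Create a small stand-in for the remote txns table (synthetic data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txns (custid INTEGER, txnmonth TEXT)")
    rows = [
        (custid, month)
        for custid in range(100)
        for month in ("201902", "202002")
        for _ in range(custid % 7)  # each customer gets 0 to 6 rows per month
    ]
    conn.executemany("INSERT INTO txns VALUES (?, ?)", rows)
    return conn


def without_aggregate_pushdown(conn):
    """Fetch every filtered row, then aggregate on the engine side."""
    fetched = conn.execute(
        "SELECT custid FROM txns WHERE txnmonth IN ('201902', '202002')"
    ).fetchall()
    counts = Counter(custid for (custid,) in fetched)
    result = {custid: n for custid, n in counts.items() if n > 4}
    return result, len(fetched)


def with_aggregate_pushdown(conn):
    """Let the remote database aggregate; only the grouped rows are fetched."""
    fetched = conn.execute(
        "SELECT custid, COUNT(1) FROM txns "
        "WHERE txnmonth IN ('201902', '202002') "
        "GROUP BY custid HAVING COUNT(1) > 4"
    ).fetchall()
    return dict(fetched), len(fetched)
```

Both functions return the same result together with the number of rows that crossed the connection. Without pushdown every matching row is transferred before being counted; with pushdown only one row per qualifying customer is transferred. On a table with billions of rows that gap in transferred rows is a plausible contributor to a seconds-versus-minutes difference like the one described for the Oracle example.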