
HiveMetastore outputFormat should not be accessed from a null StorageFormat #6972

Closed
chappidim opened this issue Dec 27, 2016 · 10 comments
chappidim commented Dec 27, 2016

With this commit, starting with Presto 0.152, we can't query the System Connector's JDBC/HiveMetastore tables if any Hive table was created via a StorageHandler, where the INPUT/OUTPUT formats may not exist.

Repro:

  1. Create an external table in Hive backed by DynamoDB.
show create table hivet2;

WARNING: Configured write throughput of the dynamodb table SeattleBikes is less than the cluster map capacity. ClusterMapCapacity: 40 WriteThroughput: 5
WARNING: Writes to this table might result in a write outage on the table.
OK

CREATE EXTERNAL TABLE `hivet2`(
  `col1` string COMMENT 'from deserializer', 
  `col2` string COMMENT 'from deserializer', 
  `col3` string COMMENT 'from deserializer', 
  `col4` string COMMENT 'from deserializer', 
  `col5` string COMMENT 'from deserializer', 
  `col6` string COMMENT 'from deserializer', 
  `col7` string COMMENT 'from deserializer', 
  `col8` string COMMENT 'from deserializer', 
  `col9` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.dynamodb.DynamoDBSerDe' 
STORED BY 
  'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
WITH SERDEPROPERTIES ( 
  'serialization.format'='1')
LOCATION
  'hdfs://ip-172-31-21-252.us-west-2.compute.internal:8020/user/hive/warehouse/hivet2'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
  'dynamodb.column.mapping'='col1:BikeId,col2:BikeModel,col3:BikeType,col4:City,col5:Condition,col6:Latitude,col7:Location,col8:Longitude,col9:Manufacturer', 
  'dynamodb.table.name'='SeattleBikes', 
  'numFiles'='0', 
  'numRows'='0', 
  'rawDataSize'='0', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1482875308')
Time taken: 1.076 seconds, Fetched: 27 row(s)
  2. Open the CLI and run a query to pull metastore/table details.
presto-cli --catalog hive --schema default --debug
presto:default> 
presto:default> SELECT TABLE_CAT, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, DATA_TYPE, TYPE_NAME, COLUMN_SIZE, BUFFER_LENGTH, 
    DECIMAL_DIGITS, NUM_PREC_RADIX, NULLABLE, REMARKS, COLUMN_DEF, SQL_DATA_TYPE, SQL_DATETIME_SUB, CHAR_OCTET_LENGTH, 
    ORDINAL_POSITION, IS_NULLABLE, SCOPE_CATALOG, SCOPE_SCHEMA, SCOPE_TABLE, SOURCE_DATA_TYPE, IS_AUTOINCREMENT 
    FROM system.jdbc.columns WHERE TABLE_CAT='hive';

Query 20161227_220527_00074_urbnw, FAILED, 2 nodes
http://ip-172-31-21-252.us-west-2.compute.internal:8889/query.html?20161227_220527_00074_urbnw
Splits: 2 total, 0 done (0.00%)
CPU Time: 0.0s total,     0 rows/s,     0B/s, 33% active
Per Node: 0.0 parallelism,     0 rows/s,     0B/s
Parallelism: 0.0
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20161227_220527_00074_urbnw failed: outputFormat should not be accessed from a null StorageFormat
java.lang.IllegalStateException: outputFormat should not be accessed from a null StorageFormat
        at com.facebook.presto.hive.metastore.StorageFormat.getOutputFormat(StorageFormat.java:59)
        at com.facebook.presto.hive.HiveMetadata.extractHiveStorageFormat(HiveMetadata.java:1554)
        at com.facebook.presto.hive.HiveMetadata.getTableMetadata(HiveMetadata.java:268)
        at com.facebook.presto.hive.HiveMetadata.listTableColumns(HiveMetadata.java:370)
        at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.listTableColumns(ClassLoaderSafeConnectorMetadata.java:169)
        at com.facebook.presto.metadata.MetadataManager.listTableColumns(MetadataManager.java:420)
        at com.facebook.presto.connector.system.jdbc.ColumnJdbcTable.cursor(ColumnJdbcTable.java:125)
        at com.facebook.presto.connector.system.SystemPageSourceProvider$1.cursor(SystemPageSourceProvider.java:126)
        at com.facebook.presto.split.MappedRecordSet.cursor(MappedRecordSet.java:53)
        at com.facebook.presto.spi.RecordPageSource.<init>(RecordPageSource.java:37)
        at com.facebook.presto.connector.system.SystemPageSourceProvider.createPageSource(SystemPageSourceProvider.java:105)
        at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:48)
        at com.facebook.presto.operator.ScanFilterAndProjectOperator.createSourceIfNecessary(ScanFilterAndProjectOperator.java:292)
        at com.facebook.presto.operator.ScanFilterAndProjectOperator.isFinished(ScanFilterAndProjectOperator.java:180)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:375)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:301)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
        at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:529)
        at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:665)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Ran on Presto 0.152.3 / emr-5.2.0 (the Hive metastore can be either external or internal).
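For context, the accessor at the top of the stack trace can be sketched roughly like this (a simplified, hypothetical reconstruction of com.facebook.presto.hive.metastore.StorageFormat, not the exact upstream source): tables created with STORED BY carry no input/output format, so the metastore hands back a "null" storage format and any accessor throws.

```java
// Hypothetical simplification of com.facebook.presto.hive.metastore.StorageFormat.
// A table defined via a StorageHandler (STORED BY ...) has neither an input
// nor an output format, so both fields are null and the accessors throw.
final class StorageFormat
{
    static final StorageFormat NULL_STORAGE_FORMAT = new StorageFormat(null, null);

    private final String inputFormat;
    private final String outputFormat;

    StorageFormat(String inputFormat, String outputFormat)
    {
        this.inputFormat = inputFormat;
        this.outputFormat = outputFormat;
    }

    String getInputFormat()
    {
        if (inputFormat == null) {
            throw new IllegalStateException("inputFormat should not be accessed from a null StorageFormat");
        }
        return inputFormat;
    }

    String getOutputFormat()
    {
        if (outputFormat == null) {
            throw new IllegalStateException("outputFormat should not be accessed from a null StorageFormat");
        }
        return outputFormat;
    }
}
```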

@chappidim chappidim changed the title HiveMetadate outputFormat should not be accessed from a null StorageFormat HiveMetastore outputFormat should not be accessed from a null StorageFormat Dec 27, 2016
@pratimaws

Same issue on EMR. Creating an external Hive table on S3 or DynamoDB works fine, but when trying to access the DynamoDB table's metadata it throws the same error: Query 20170410_121048_00008_sdbf4 failed: outputFormat should not be accessed from a null StorageFormat. The S3-backed table, however, continues to work fine in Presto.

@buremba
Contributor

buremba commented Apr 20, 2017

Any update on this issue? We ran into it with S3-backed tables in Hive on version 0.173.

@ebirukov

I have the same error on EMR Presto 0.166:
presto:information_schema> SELECT table_schema, table_name, column_name FROM information_schema.columns WHERE table_schema IN ('default', 'test');

Query 20170425_112726_00155_nfp6g failed: outputFormat should not be accessed from a null StorageFormat
java.lang.IllegalStateException: outputFormat should not be accessed from a null StorageFormat
at com.facebook.presto.hive.metastore.StorageFormat.getOutputFormat(StorageFormat.java:61)
at com.facebook.presto.hive.HiveMetadata.extractHiveStorageFormat(HiveMetadata.java:1266)
at com.facebook.presto.hive.HiveMetadata.getTableMetadata(HiveMetadata.java:251)
at com.facebook.presto.hive.HiveMetadata.listTableColumns(HiveMetadata.java:329)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.listTableColumns(ClassLoaderSafeConnectorMetadata.java:166)
at com.facebook.presto.metadata.MetadataManager.listTableColumns(MetadataManager.java:397)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.getColumnsList(InformationSchemaPageSourceProvider.java:182)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.buildColumns(InformationSchemaPageSourceProvider.java:156)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.getInformationSchemaTable(InformationSchemaPageSourceProvider.java:135)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.getInternalTable(InformationSchemaPageSourceProvider.java:129)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.createPageSource(InformationSchemaPageSourceProvider.java:87)
at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:222)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:378)
at com.facebook.presto.operator.Driver.processFor(Driver.java:301)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:555)
at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:691)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

but the following queries work correctly:
presto> SELECT table_schema, table_name, column_name FROM information_schema.columns WHERE table_schema IN ('presto');
or
presto> SELECT table_schema, table_name, column_name FROM information_schema.columns WHERE table_schema IN ('test');

@chadjefferies

As of 0.198, I am also experiencing this error pretty regularly against S3 backed Parquet tables.

Most tables have been created through Hive, though we also have tables that have been created through Presto. Offending tables have the following formats:

InputFormat:            org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat 

We have BI tools and data exploration tools that query system.jdbc.columns and information_schema.columns and regularly receive this error.

@rupeshmalladi

rupeshmalladi commented Apr 19, 2018

This issue also shows up for a Hive external table created on AWS EMR with AWS Elasticsearch as the source; this format uses a storage handler as well.

The table below can be queried using Hive on AWS EMR, but cannot be queried using Presto on AWS EMR, and similarly fails on AWS Athena (which appears to use Presto as well).

Hive DDL :

CREATE EXTERNAL TABLE prod_elastic_search.insights (
  reportinstanceid string,
  ip string,
  sg_event_id string,
  sg_message_id string,
  useragent string,
  event string,
  email string,
  timestamp bigint,
  reportsubject string,
  category ARRAY,
  @id string,
  @timestamp timestamp)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.mapping.id' = 'sendgridevents',
  'es.resource' = 'cwl-*',
  'es.nodes' = '',
  'es.port' = '80',
  'es.nodes.wan.only' = 'true',
  'es.index.auto.create' = 'true',
  'es.read.field.as.array.exclude' = 'true'
);

Error Details:

presto> select * from prod_elastic_search.insights limit 10;
Query 20180419_203320_00002_juv3q failed: outputFormat should not be accessed from a null StorageFormat

@vapiravfif

We have tables that use a Hive StorageHandler and live in the same metastore as regular, Presto-queryable tables. External tools like JetBrains DataGrip and BI tools, which use information_schema and system.jdbc.columns to display tables and columns, regularly fail because of the StorageHandler tables, causing assorted problems in those tools.

Presto should handle this exception and ignore unqueryable tables.
Is there a roadmap/timeline for merging the fix #9837 for this issue?
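The skip-and-continue behavior suggested here can be sketched as follows (a minimal sketch with hypothetical names — `listColumnsSkippingBroken` and the `Function` callback are not Presto's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: collect columns for every table, but skip tables whose storage
// format cannot be resolved, instead of failing the entire metadata listing.
final class TolerantListing
{
    static List<String> listColumnsSkippingBroken(List<String> tables, Function<String, List<String>> columnsOf)
    {
        List<String> result = new ArrayList<>();
        for (String table : tables) {
            try {
                result.addAll(columnsOf.apply(table));
            }
            catch (IllegalStateException e) {
                // Table backed by a StorageHandler with no input/output format:
                // skip it so the rest of information_schema.columns still returns.
            }
        }
        return result;
    }
}
```

With this shape, one StorageHandler-backed table no longer fails an entire system.jdbc.columns or information_schema.columns scan.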

@derekennui

Here is a simple makeshift workaround: I found where the error is thrown in com.facebook.presto.hive.metastore.StorageFormat and changed the code:

public String getOutputFormat()
{
    if (outputFormat == null) {
        return "null";
        // throw new IllegalStateException("outputFormat should not be accessed from a null StorageFormat");
    }
    return outputFormat;
}
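A less invasive variant of this patch (a sketch, not the actual upstream fix) is to keep the throwing accessor and add a null-safe companion that metadata-listing callers could use; `getOptionalOutputFormat` and the class name here are hypothetical:

```java
import java.util.Optional;

// Sketch: a null-safe accessor alongside the throwing one, so metadata
// listings can check for a missing output format instead of catching an
// IllegalStateException.
final class StorageFormatSketch
{
    private final String outputFormat;

    StorageFormatSketch(String outputFormat)
    {
        this.outputFormat = outputFormat;
    }

    Optional<String> getOptionalOutputFormat()
    {
        return Optional.ofNullable(outputFormat);
    }
}
```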

@rschlussel
Contributor

We should backport the fix from: trinodb/trino#568

@sjpascual
Contributor

I've opened a PR to backport the fix here: #12973

ktmud added a commit to ktmud/redash that referenced this issue Jun 20, 2019
In lower versions of Presto, the old query sometimes throws the
"outputFormat should not be accessed from a null StorageFormat"
error (see prestodb/presto#6972).
ktmud added a commit to ktmud/redash that referenced this issue Jun 26, 2019
In lower versions of Presto, the old query sometimes throws the
"outputFormat should not be accessed from a null StorageFormat"
error (see prestodb/presto#6972).

Sometimes there can be very many tables; it would be nice
to filter only for certain sources.

We might also add access control based on the table filter.
ktmud added a commit to ktmud/redash that referenced this issue Jul 3, 2019
And load schemas one by one. This should improve performance
for large Presto instances where a single schema may contain
thousands of tables.

Plus, in lower versions of Presto, the old query sometimes throws the
"outputFormat should not be accessed from a null StorageFormat"
error (see prestodb/presto#6972). This change allows us to skip
this error and still return valid results.
@stale

stale bot commented Jun 22, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

@stale stale bot added the stale label Jun 22, 2021
@stale stale bot closed this as completed Jul 21, 2021