
HiveMetastore outputFormat should not be accessed from a null StorageFormat #6972

Closed
chappidim opened this issue Dec 27, 2016 · 10 comments
chappidim commented Dec 27, 2016

With this commit, starting with Presto 0.152, we can't query the System Connector's JDBC/HiveMetastore tables if any Hive table was created via a StorageHandler, where the INPUT/OUTPUT formats may not exist.

Repro:

  1. Create an external table in Hive backed by DynamoDB.
show create table hivet2;

WARNING: Configured write throughput of the dynamodb table SeattleBikes is less than the cluster map capacity. ClusterMapCapacity: 40 WriteThroughput: 5
WARNING: Writes to this table might result in a write outage on the table.
OK

CREATE EXTERNAL TABLE `hivet2`(
  `col1` string COMMENT 'from deserializer', 
  `col2` string COMMENT 'from deserializer', 
  `col3` string COMMENT 'from deserializer', 
  `col4` string COMMENT 'from deserializer', 
  `col5` string COMMENT 'from deserializer', 
  `col6` string COMMENT 'from deserializer', 
  `col7` string COMMENT 'from deserializer', 
  `col8` string COMMENT 'from deserializer', 
  `col9` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.dynamodb.DynamoDBSerDe' 
STORED BY 
  'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
WITH SERDEPROPERTIES ( 
  'serialization.format'='1')
LOCATION
  'hdfs://ip-172-31-21-252.us-west-2.compute.internal:8020/user/hive/warehouse/hivet2'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
  'dynamodb.column.mapping'='col1:BikeId,col2:BikeModel,col3:BikeType,col4:City,col5:Condition,col6:Latitude,col7:Location,col8:Longitude,col9:Manufacturer', 
  'dynamodb.table.name'='SeattleBikes', 
  'numFiles'='0', 
  'numRows'='0', 
  'rawDataSize'='0', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1482875308')
Time taken: 1.076 seconds, Fetched: 27 row(s)
  2. Open the CLI and run a query to pull metastore/table details.
presto-cli --catalog hive --schema default --debug
presto:default> 
presto:default> SELECT TABLE_CAT, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, DATA_TYPE, TYPE_NAME, COLUMN_SIZE, BUFFER_LENGTH, 
    DECIMAL_DIGITS, NUM_PREC_RADIX, NULLABLE, REMARKS, COLUMN_DEF, SQL_DATA_TYPE, SQL_DATETIME_SUB, CHAR_OCTET_LENGTH, 
    ORDINAL_POSITION, IS_NULLABLE, SCOPE_CATALOG, SCOPE_SCHEMA, SCOPE_TABLE, SOURCE_DATA_TYPE, IS_AUTOINCREMENT 
    FROM system.jdbc.columns WHERE TABLE_CAT='hive';

Query 20161227_220527_00074_urbnw, FAILED, 2 nodes
http://ip-172-31-21-252.us-west-2.compute.internal:8889/query.html?20161227_220527_00074_urbnw
Splits: 2 total, 0 done (0.00%)
CPU Time: 0.0s total,     0 rows/s,     0B/s, 33% active
Per Node: 0.0 parallelism,     0 rows/s,     0B/s
Parallelism: 0.0
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20161227_220527_00074_urbnw failed: outputFormat should not be accessed from a null StorageFormat
java.lang.IllegalStateException: outputFormat should not be accessed from a null StorageFormat
        at com.facebook.presto.hive.metastore.StorageFormat.getOutputFormat(StorageFormat.java:59)
        at com.facebook.presto.hive.HiveMetadata.extractHiveStorageFormat(HiveMetadata.java:1554)
        at com.facebook.presto.hive.HiveMetadata.getTableMetadata(HiveMetadata.java:268)
        at com.facebook.presto.hive.HiveMetadata.listTableColumns(HiveMetadata.java:370)
        at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.listTableColumns(ClassLoaderSafeConnectorMetadata.java:169)
        at com.facebook.presto.metadata.MetadataManager.listTableColumns(MetadataManager.java:420)
        at com.facebook.presto.connector.system.jdbc.ColumnJdbcTable.cursor(ColumnJdbcTable.java:125)
        at com.facebook.presto.connector.system.SystemPageSourceProvider$1.cursor(SystemPageSourceProvider.java:126)
        at com.facebook.presto.split.MappedRecordSet.cursor(MappedRecordSet.java:53)
        at com.facebook.presto.spi.RecordPageSource.<init>(RecordPageSource.java:37)
        at com.facebook.presto.connector.system.SystemPageSourceProvider.createPageSource(SystemPageSourceProvider.java:105)
        at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:48)
        at com.facebook.presto.operator.ScanFilterAndProjectOperator.createSourceIfNecessary(ScanFilterAndProjectOperator.java:292)
        at com.facebook.presto.operator.ScanFilterAndProjectOperator.isFinished(ScanFilterAndProjectOperator.java:180)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:375)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:301)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
        at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:529)
        at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:665)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Ran on Presto 0.152.3 / emr-5.2.0 (the Hive metastore can be either external or internal).
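For context, the accessor at the top of the stack trace can be sketched roughly like this (a simplified, hypothetical reconstruction of com.facebook.presto.hive.metastore.StorageFormat, not the exact upstream source): tables created with STORED BY carry no input/output format, so the metastore hands back a "null" storage format and any accessor throws.

```java
// Hypothetical simplification of com.facebook.presto.hive.metastore.StorageFormat.
// A table defined via a StorageHandler (STORED BY ...) has neither an input
// nor an output format, so both fields are null and the accessors throw.
final class StorageFormat
{
    static final StorageFormat NULL_STORAGE_FORMAT = new StorageFormat(null, null);

    private final String inputFormat;
    private final String outputFormat;

    StorageFormat(String inputFormat, String outputFormat)
    {
        this.inputFormat = inputFormat;
        this.outputFormat = outputFormat;
    }

    String getInputFormat()
    {
        if (inputFormat == null) {
            throw new IllegalStateException("inputFormat should not be accessed from a null StorageFormat");
        }
        return inputFormat;
    }

    String getOutputFormat()
    {
        if (outputFormat == null) {
            throw new IllegalStateException("outputFormat should not be accessed from a null StorageFormat");
        }
        return outputFormat;
    }
}
```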

@chappidim chappidim changed the title HiveMetadate outputFormat should not be accessed from a null StorageFormat HiveMetastore outputFormat should not be accessed from a null StorageFormat Dec 27, 2016
@pratimaws

Same issue on EMR. Creating an external Hive table on S3 or DynamoDB works fine, but when trying to access the DynamoDB table's metadata it throws the same error: Query 20170410_121048_00008_sdbf4 failed: outputFormat should not be accessed from a null StorageFormat. The S3-backed table, however, continues to work fine in Presto.

@buremba
Contributor

buremba commented Apr 20, 2017

Any update on this issue? We ran into it with S3-backed tables in Hive on version 0.173.

@ebirukov

I have the same error on EMR Presto 0.166:
presto:information_schema> SELECT table_schema, table_name, column_name FROM information_schema.columns WHERE table_schema IN ('default', 'test');

Query 20170425_112726_00155_nfp6g failed: outputFormat should not be accessed from a null StorageFormat
java.lang.IllegalStateException: outputFormat should not be accessed from a null StorageFormat
at com.facebook.presto.hive.metastore.StorageFormat.getOutputFormat(StorageFormat.java:61)
at com.facebook.presto.hive.HiveMetadata.extractHiveStorageFormat(HiveMetadata.java:1266)
at com.facebook.presto.hive.HiveMetadata.getTableMetadata(HiveMetadata.java:251)
at com.facebook.presto.hive.HiveMetadata.listTableColumns(HiveMetadata.java:329)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.listTableColumns(ClassLoaderSafeConnectorMetadata.java:166)
at com.facebook.presto.metadata.MetadataManager.listTableColumns(MetadataManager.java:397)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.getColumnsList(InformationSchemaPageSourceProvider.java:182)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.buildColumns(InformationSchemaPageSourceProvider.java:156)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.getInformationSchemaTable(InformationSchemaPageSourceProvider.java:135)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.getInternalTable(InformationSchemaPageSourceProvider.java:129)
at com.facebook.presto.connector.informationSchema.InformationSchemaPageSourceProvider.createPageSource(InformationSchemaPageSourceProvider.java:87)
at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:222)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:378)
at com.facebook.presto.operator.Driver.processFor(Driver.java:301)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:555)
at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:691)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

but the following queries work correctly:
presto> SELECT table_schema, table_name, column_name FROM information_schema.columns WHERE table_schema IN ('presto');
or
presto> SELECT table_schema, table_name, column_name FROM information_schema.columns WHERE table_schema IN ('test');

@chadjefferies

As of 0.198, I am also experiencing this error pretty regularly against S3 backed Parquet tables.

Most tables have been created through Hive, though we also have tables that have been created through Presto. Offending tables have the following formats:

InputFormat:            org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat 

We have BI tools and data exploration tools that query system.jdbc.columns and information_schema.columns and regularly receive this error.

@rupeshmalladi

rupeshmalladi commented Apr 19, 2018

This issue also shows up for a Hive external table created on AWS EMR with AWS Elasticsearch as the source; this format uses a storage handler as well.

The table below can be queried using Hive on AWS EMR, but cannot be queried using Presto on AWS EMR, and similarly fails on AWS Athena (which appears to use Presto as well).

Hive DDL :

CREATE EXTERNAL TABLE prod_elastic_search.insights (
  reportinstanceid string,
  ip string,
  sg_event_id string,
  sg_message_id string,
  useragent string,
  event string,
  email string,
  timestamp bigint,
  reportsubject string,
  category ARRAY,
  @id string,
  @timestamp timestamp)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.mapping.id' = 'sendgridevents',
  'es.resource' = 'cwl-*',
  'es.nodes' = '',
  'es.port' = '80',
  'es.nodes.wan.only' = 'true',
  'es.index.auto.create' = 'true',
  'es.read.field.as.array.exclude' = 'true'
);

Error Details:

presto> select * from prod_elastic_search.insights limit 10;
Query 20180419_203320_00002_juv3q failed: outputFormat should not be accessed from a null StorageFormat

@vapiravfif

We have tables that use a Hive StorageHandler and live in the same metastore as regular, Presto-queryable tables. External tools like JetBrains DataGrip and BI tools, which use information_schema and system.jdbc.columns to display tables and columns, regularly fail because of the StorageHandler tables, causing assorted problems in those tools.

Presto should handle this exception and ignore unqueryable tables.
Is there a roadmap/timeline for merging the fix #9837 for this issue?
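The skip-and-continue behavior suggested here can be sketched as follows (a minimal sketch with hypothetical names — `listColumnsSkippingBroken` and the `Function` callback are not Presto's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: collect columns for every table, but skip tables whose storage
// format cannot be resolved, instead of failing the entire metadata listing.
final class TolerantListing
{
    static List<String> listColumnsSkippingBroken(List<String> tables, Function<String, List<String>> columnsOf)
    {
        List<String> result = new ArrayList<>();
        for (String table : tables) {
            try {
                result.addAll(columnsOf.apply(table));
            }
            catch (IllegalStateException e) {
                // Table backed by a StorageHandler with no input/output format:
                // skip it so the rest of information_schema.columns still returns.
            }
        }
        return result;
    }
}
```

With this shape, one StorageHandler-backed table no longer fails an entire system.jdbc.columns or information_schema.columns scan.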

@derekennui

Here is a simple makeshift workaround: I found where the error is thrown in com.facebook.presto.hive.metastore.StorageFormat and changed the code:

public String getOutputFormat()
{
    if (outputFormat == null) {
        return "null";
        // throw new IllegalStateException("outputFormat should not be accessed from a null StorageFormat");
    }
    return outputFormat;
}
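A less invasive variant of this patch (a sketch, not the actual upstream fix) is to keep the throwing accessor and add a null-safe companion that metadata-listing callers could use; `getOptionalOutputFormat` and the class name here are hypothetical:

```java
import java.util.Optional;

// Sketch: a null-safe accessor alongside the throwing one, so metadata
// listings can check for a missing output format instead of catching an
// IllegalStateException.
final class StorageFormatSketch
{
    private final String outputFormat;

    StorageFormatSketch(String outputFormat)
    {
        this.outputFormat = outputFormat;
    }

    Optional<String> getOptionalOutputFormat()
    {
        return Optional.ofNullable(outputFormat);
    }
}
```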

@rschlussel
Contributor

We should backport the fix from: trinodb/trino#568

@sjpascual
Contributor

I've opened a PR to backport the fix here: #12973

ktmud added a commit to ktmud/redash that referenced this issue Jun 20, 2019
In lower versions of Presto, the old query sometimes throws the
"outputFormat should not be accessed from a null StorageFormat"
error (see prestodb/presto#6972).
ktmud added a commit to ktmud/redash that referenced this issue Jun 26, 2019
In lower versions of Presto, the old query sometimes throws the
"outputFormat should not be accessed from a null StorageFormat"
error (see prestodb/presto#6972).

Sometimes there can be very many tables; it would be nice
to filter only for certain sources.

We might also add access control based on the table filter.
ktmud added a commit to ktmud/redash that referenced this issue Jul 3, 2019
And load schemas one by one. This should improve performance
for large Presto instances where a single schema may contain
thousands of tables.

Plus, in lower versions of Presto, the old query sometimes throws the
"outputFormat should not be accessed from a null StorageFormat"
error (see prestodb/presto#6972). This change allows us to skip
this error and still return valid results.
@stale

stale bot commented Jun 22, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

@stale stale bot added the stale label Jun 22, 2021
@stale stale bot closed this as completed Jul 21, 2021