Flaky TestHiveAlluxioCacheFileOperations.testCacheFileOperations #22861

Closed
sopel39 opened this issue Jul 29, 2024 · 11 comments · Fixed by #23605

@sopel39
Member

sopel39 commented Jul 29, 2024

https://github.com/trinodb/trino/actions/runs/10146256016/job/28054908427?pr=22827

 Error:  io.trino.plugin.hive.TestHiveAlluxioCacheFileOperations.testCacheFileOperations -- Time elapsed: 0.314 s <<< ERROR!
io.trino.testing.QueryFailedException: Could not read table schema
	at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:134)
	at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:565)
	at io.trino.testing.DistributedQueryRunner.executeWithPlan(DistributedQueryRunner.java:554)
	at io.trino.testing.QueryAssertions.assertDistributedUpdate(QueryAssertions.java:108)
	at io.trino.testing.QueryAssertions.assertUpdate(QueryAssertions.java:62)
@mosabua
Member

mosabua commented Jul 29, 2024

FYI @JiamingMai @jja725, could you help with this?

@jja725
Member

jja725 commented Jul 29, 2024

Checking. @sopel39, is this test flaky only on GitHub, or is it flaky on your local laptop as well? Knowing that would help with debugging.

@sopel39
Member Author

sopel39 commented Jul 30, 2024

Locally it passes for me

@JiamingMai
Contributor

I can run the test successfully in my local environment.

@ebyhr
Member

ebyhr commented Sep 17, 2024

@jkylling Can you take a look at this issue? This test seems very flaky.

@ebyhr
Member

ebyhr commented Sep 18, 2024

@dekimir
Contributor

dekimir commented Sep 21, 2024

The underlying cause seems to be the same as in #21121, which affects TestHiveConnectorTest.

@pajaks
Member

pajaks commented Sep 30, 2024

This is very likely caused by a cache key collision.
After adding some logging, it's visible that we try to read the metadata file from the cache using a smaller size (the new file's size), so the cached content is cut short and the JSON cannot be parsed.
File read from cache:

{
 "writerVersion" : "testversion",
 "owner" : "hive",
 "tableType" : "MANAGED_TABLE",
 "dataColumns" : [ {
   "name" : "data",
   "type" : "string",
   "properties" : { }
 } ],
 "partitionColumns" : [ {
   "name" : "key",
   "type" : "string",
   "properties" : { }
 } ],
 "parameters" : {
   "trino_version" : "testversion",
   "trino_query_id" : "20240927_114936_00001_nvcjj",
   "transactional" : "false",
   "auto.purge" : "false",
   "numFiles" : "-1",
   "totalSize" : "-1"
 },
 "storageFormat" : "PARQUET",
 "serde

File read directly from storage:

{
  "writerVersion" : "testversion",
  "owner" : "hive",
  "tableType" : "MANAGED_TABLE",
  "dataColumns" : [ {
    "name" : "data",
    "type" : "string",
    "properties" : { }
  } ],
  "partitionColumns" : [ {
    "name" : "key",
    "type" : "string",
    "properties" : { }
  } ],
  "parameters" : {
    "trino_version" : "testversion",
    "trino_query_id" : "20240927_114644_00001_smc5a",
    "transactional" : "false",
    "auto.purge" : "false"
  },
  "storageFormat" : "PARQUET",
  "serdeParameters" : { },
  "columnStatistics" : { }
}

Notice the lack of:

"numFiles" : "-1",
"totalSize" : "-1"

in the updated file.
The length of the old file is 592, and the length of the new one is 545.
And 545 is exactly the place where the old file is cut. Length of file to read from cache is taken from URIStatus which is taken directly from current file AFAIU.
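
To make the failure mode above concrete, here is a minimal toy sketch in plain Java. It is not Trino or Alluxio code: `CacheKeyDemo`, `storage`, `cache`, `read`, and `readAfterRewrite` are all hypothetical names, and the JSON strings are shortened stand-ins for the metadata files. It only assumes what is described above: the cache entry is looked up by path, while the number of bytes to read comes from the current file's length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; none of these names come from Trino or Alluxio.
class CacheKeyDemo {
    // Shortened stand-ins for the old (longer) and new (shorter) metadata files.
    static final String OLD = "{\"a\":\"1\",\"numFiles\":\"-1\"}";
    static final String NEW = "{\"a\":\"2\"}";

    // Simulated storage: path -> current file content.
    static final Map<String, byte[]> storage = new HashMap<>();
    // Simulated cache: cache key -> content captured on the first read.
    static final Map<String, byte[]> cache = new HashMap<>();

    static void write(String path, String content) {
        storage.put(path, content.getBytes(StandardCharsets.UTF_8));
    }

    // Reads through the cache. The read length always comes from the
    // current file in storage, mirroring the behaviour described above.
    static String read(String path, boolean lengthInKey) {
        int currentLength = storage.get(path).length;
        String key = lengthInKey ? path + "#" + currentLength : path;
        byte[] cached = cache.computeIfAbsent(key, k -> storage.get(path).clone());
        int n = Math.min(cached.length, currentLength);
        return new String(Arrays.copyOf(cached, n), StandardCharsets.UTF_8);
    }

    // Writes OLD, reads it (filling the cache), overwrites with NEW, reads again.
    static String readAfterRewrite(boolean lengthInKey) {
        storage.clear();
        cache.clear();
        write("schema.json", OLD);
        read("schema.json", lengthInKey);
        write("schema.json", NEW);
        return read("schema.json", lengthInKey);
    }

    public static void main(String[] args) {
        // Path-only key: stale entry cut at the new length, invalid JSON.
        System.out.println(readAfterRewrite(false));
        // Path + length key: the stale entry is not reused.
        System.out.println(readAfterRewrite(true));
    }
}
```

With the path-only key, the second read returns the old content cut at the new file's length, which is the same shape as the truncated `"serde` output above. In this toy model, putting the length into the key avoids that, although a rewrite that keeps the same length would still hit the stale entry.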

@sopel39
Member Author

sopel39 commented Sep 30, 2024

> And 545 is exactly the place where the old file is cut. Length of file to read from cache is taken from URIStatus which is taken directly from current file AFAIU.

So file size is not part of cache key? cc @raunaqmorarka

@pajaks
Member

pajaks commented Sep 30, 2024

> > And 545 is exactly the place where the old file is cut. Length of file to read from cache is taken from URIStatus which is taken directly from current file AFAIU.
>
> So file size is not part of cache key? cc @raunaqmorarka

It's not; it will be added in #23605.


8 participants