Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: PR 17365 breaks BQ loads in some cases #22372

Closed
steveniemitz opened this issue Jul 20, 2022 · 12 comments · Fixed by #24471
Closed

[Bug]: PR 17365 breaks BQ loads in some cases #22372

steveniemitz opened this issue Jul 20, 2022 · 12 comments · Fixed by #24471

Comments

@steveniemitz
Copy link
Contributor

What happened?

In beam 2.39, this used to work:

BigQueryIO
   .write()
   .optimizedWrites()
   .to(OutputTable)
   .withAvroFormatFunction(abc)
   .withAvroSchemaFactory(xyz)
   .withCreateDisposition(CreateDisposition.CREATE_NEVER)
   .withWriteDisposition(WriteDisposition.WRITE_TRUNCATE))

but now throws an exception if you end up in MultiPartitionsWriteTables:

Caused by: java.lang.IllegalArgumentException: Unless create disposition is CREATE_NEVER, a schema must be specified, i.e. DynamicDestinations.getSchema() may not return null. However, create disposition is CREATE_IF_NEEDED, and org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinationsHelpers$ConstantTableDestinations@52e95e1 returned null for destination tableSpec: <table>
	at org.apache.beam.sdk.util.Preconditions.checkArgumentNotNull(Preconditions.java:436)
	at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.processElement(WriteTables.java:207)

The reason seems like now, the create disposition is set to CreateDisposition.CREATE_IF_NEEDED in WriteTables no matter what the user sets it to.

cc @pabloem @MarcoRob

Issue Priority

Priority: 2

Issue Component

Component: io-java-gcp

@steveniemitz
Copy link
Contributor Author

Additionally, the new UpdateSchemaDestination breaks if the source format is set to avro due to it trying to load an empty file (a 0 length file is not a valid avro file).

@pabloem
Copy link
Member

pabloem commented Jul 22, 2022

@johnjcasey @chamikaramj could you help look into this? I will be on vacation for a couple weeks from today.

@johnjcasey
Copy link
Contributor

ack

@johnjcasey
Copy link
Contributor

Agreed, this looks like that original PR changed how we load dynamic destinations for MultiPartitionsWrite tables. @MarcoRob can you take a look at this?

@MarcoRob
Copy link
Contributor

Agreed, this looks like that original PR changed how we load dynamic destinations for MultiPartitionsWrite tables. @MarcoRob can you take a look at this?

Sure, let me check, tks

@Abacn
Copy link
Contributor

Abacn commented Oct 14, 2022

Hi, just curious about what is the current status of this bug? Is the working example fixed by #22390 ? @steveniemitz @MarcoRob

@ahmedabu98
Copy link
Contributor

Looks like #22390 fixed the Avro source problem referred to in the second comment.

@ahmedabu98
Copy link
Contributor

Looks like the problem is in this deleted code block. The create disposition in WriteTables has always been CreateDisposition.CREATE_IF_NEEDED, because that's necessary when creating temp tables. However, in the case the schema isn't provided, this would cause the problem we're seeing here.

That code block used to handle this case by fetching the schema of the final destination table and using it as the schema for the temp tables. I think bringing that code back would solve this issue, unless there's a particular reason it was deleted @MarcoRob?

@MarcoRob
Copy link
Contributor

Looks like the problem is in this deleted code block. The create disposition in WriteTables has always been CreateDisposition.CREATE_IF_NEEDED, because that's necessary when creating temp tables. However, in the case the schema isn't provided, this would cause the problem we're seeing here.

That code block used to handle this case by fetching the schema of the final destination table and using it as the schema for the temp tables. I think bringing that code back would solve this issue, unless there's a particular reason it was deleted @MarcoRob?

The change was part of a fix made due the following ticket.
I am taking a look at this now to validate the issue and fix the bug. I will be updating the bug with more info.

@Abacn
Copy link
Contributor

Abacn commented Nov 30, 2022

Hi @MarcoRob, what is the status of this issue? Is it safe to restore the deleted block?

@Abacn
Copy link
Contributor

Abacn commented Feb 8, 2023

Decide to reopen this issue because there is still more than one breakage remain

#25355

  • incompatible schema error in copy when dynamicDestination
  • check equality between provided schema and final destination schema does not work (#issuecomment-1423190926)

some fixes already done:
#22624 (fixes #22543) #24471 #24700

@Abacn Abacn removed this from the 2.45.0 Release milestone Feb 8, 2023
@Abacn
Copy link
Contributor

Abacn commented Mar 3, 2023

#25355 is fixed and will be made available in upcoming 2.46.0. Close this for now.

@Abacn Abacn closed this as completed Mar 3, 2023
@github-actions github-actions bot added this to the 2.47.0 Release milestone Mar 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants