
File (CSV, JSON, Excel, Feather, Parquet) connector has issue with nyc taxi parquet #27169

Closed
1 task
alberttwong opened this issue Jun 8, 2023 · 1 comment
Labels
area/connectors Connector related issues autoteam community team/tse Technical Support Engineers type/bug Something isn't working

Comments


alberttwong commented Jun 8, 2023

Connector Name

source-file

Connector Version

0.3.4

What step the error happened?

During the sync

Relevant information

Details can be found at StarRocks/starrocks#24772. I am using source-file with the file https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet from the NYC Yellow Taxi trip data.

Relevant log output

"metadata" : {
    "attemptNumber" : 1,
    "jobId" : 5,
    "connector_command" : "read"
  },
  "stacktrace" : "io.airbyte.workers.internal.exception.SourceException: Source process exited with non-zero exit code 137\n\tat io.airbyte.workers.general.DefaultReplicationWorker.lambda$readFromSrcAndWriteToDstRunnable$5(DefaultReplicationWorker.java:379)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\n",
  "timestamp" : 1686246768331
} ]
2023-06-08 17:52:49 INFO i.a.c.i.LineGobbler(voidCall):149 - 
2023-06-08 17:52:49 INFO i.a.c.i.LineGobbler(voidCall):149 - ----- END REPLICATION -----
2023-06-08 17:52:49 INFO i.a.c.i.LineGobbler(voidCall):149 - 
2023-06-08 17:52:49 INFO i.a.w.t.TemporalAttemptExecution(get):163 - Stopping cancellation check scheduling...
2023-06-08 17:52:49 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicate$3):159 - sync summary: io.airbyte.config.StandardSyncOutput@2e2b0110[standardSyncSummary=io.airbyte.config.StandardSyncSummary@796f1f60[status=failed,recordsSynced=0,bytesSynced=0,startTime=1686246749880,endTime=1686246769528,totalStats=io.airbyte.config.SyncStats@68efefa0[bytesCommitted=0,bytesEmitted=0,destinationStateMessagesEmitted=0,destinationWriteEndTime=1686246769527,destinationWriteStartTime=1686246749961,estimatedBytes=<null>,estimatedRecords=<null>,meanSecondsBeforeSourceStateMessageEmitted=0,maxSecondsBeforeSourceStateMessageEmitted=0,maxSecondsBetweenStateMessageEmittedandCommitted=0,meanSecondsBetweenStateMessageEmittedandCommitted=0,recordsEmitted=0,recordsCommitted=0,replicationEndTime=1686246769528,replicationStartTime=1686246749880,sourceReadEndTime=1686246768315,sourceReadStartTime=1686246749918,sourceStateMessagesEmitted=0,additionalProperties={}],streamStats=[],additionalProperties={}],normalizationSummary=<null>,webhookOperationSummary=<null>,state=<null>,outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@4663685c[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@30b6d201[stream=io.airbyte.protocol.models.AirbyteStream@31235f68[name=nyc,jsonSchema={"$schema":"http://json-schema.org/draft-07/schema#","type":"object","properties":{"DOLocationID":{"type":["number","null"]},"RatecodeID":{"type":["number","null"]},"fare_amount":{"type":["number","null"]},"congestion_surcharge":{"type":["number","null"]},"tpep_dropoff_datetime":{"format":"date-time","type":["string","null"]},"VendorID":{"type":["number","null"]},"passenger_count":{"type":["number","null"]},"tolls_amount":{"type":["number","null"]},"improvement_surcharge":{"type":["number","null"]},"trip_distance":{"type":["number","null"]},"payment_type":{"type":["number","null"]},"store_and_fwd_flag":{"type":["string","null"]},"total_amount":{"type":["number","null"]},"extra":{"type":["number","null"
]},"tip_amount":{"type":["number","null"]},"mta_tax":{"type":["number","null"]},"airport_fee":{"type":["number","null"]},"PULocationID":{"type":["number","null"]},"tpep_pickup_datetime":{"format":"date-time","type":["string","null"]}}},supportedSyncModes=[full_refresh],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[],destinationSyncMode=overwrite,primaryKey=[],additionalProperties={}]],additionalProperties={}],failures=[io.airbyte.config.FailureReason@26262403[failureOrigin=source,failureType=<null>,internalMessage=Source process exited with non-zero exit code 137,externalMessage=Something went wrong within the source connector,metadata=io.airbyte.config.Metadata@7bda9913[additionalProperties={attemptNumber=1, jobId=5, connector_command=read}],stacktrace=io.airbyte.workers.internal.exception.SourceException: Source process exited with non-zero exit code 137
	at io.airbyte.workers.general.DefaultReplicationWorker.lambda$readFromSrcAndWriteToDstRunnable$5(DefaultReplicationWorker.java:379)
	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1589)
,retryable=<null>,timestamp=1686246768331,additionalProperties={}]],commitStateAsap=true,additionalProperties={}]
2023-06-08 17:52:49 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicate$3):164 - Sync summary length: 3459

Contribute

  • Yes, I want to contribute
alberttwong added the area/connectors, needs-triage, and type/bug labels on Jun 8, 2023
alberttwong (Author)

Ahh, the Parquet file is too big! I tried an 11 MB Parquet file and it worked: https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet
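For context, the exit code 137 in the log is consistent with this diagnosis. By shell convention, an exit status above 128 means the process was terminated by signal (status minus 128), so 137 corresponds to signal 9, SIGKILL. In a container, a SIGKILL reported this way is commonly sent by the kernel's OOM killer when the process exceeds its memory limit, though the log alone does not prove that. The helper below is a minimal stdlib sketch of that decoding, assuming a POSIX system where signal 9 is SIGKILL; `describe_exit_code` is a hypothetical name, not part of Airbyte.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status (hypothetical helper, not Airbyte code).

    Statuses above 128 conventionally mean "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a number to its name, e.g. 9 -> SIGKILL on POSIX.
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))  # terminated by SIGKILL (on Linux/macOS)
print(describe_exit_code(1))    # exited with status 1
```

If a sync fails this way, comparing the source container's memory limit with the size of the file being read is a reasonable first check.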
