Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OLR loses connection with Debezium #184

Open
alexisread opened this issue Oct 22, 2024 · 0 comments
Open

OLR loses connection with Debezium #184

alexisread opened this issue Oct 22, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@alexisread
Copy link

A brief description of the bug.
OLR connects to Debezium successfully, and the current SCN is exchanged. On a table insert, OLR loses the connection with Debezium.
Debezium error:
deboracle | [2024-10-21 12:37:55,539] INFO OpenLogReplicator ready, streaming from SCN 464082622521. (io.debezium.connector.oracle.olr.client.OlrNetworkClient)
deboracle | [2024-10-21 12:37:56,567] INFO 172.20.0.4 - - [21/Oct/2024:12:37:56 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 3 (org.apache.kafka.connect.runtime.rest.RestServer)
deboracle | [2024-10-21 12:37:57,768] INFO OpenLogReplicator streaming client started successfully. (io.debezium.connector.oracle.olr.client.OlrNetworkClient)
deboracle | [2024-10-21 12:37:58,649] INFO WorkerSourceTask{id=Q1COTRX_OWNER_REFDB_INSTRUMENT-0} Committing offsets for 31 acknowledged messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
deboracle | [2024-10-21 12:37:58,651] WARN Cannot flush latest offset SCN as no checkpoint event was received. (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource)
deboracle | [2024-10-21 12:38:01,697] INFO 172.20.0.4 - - [21/Oct/2024:12:38:01 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 2 (org.apache.kafka.connect.runtime.rest.RestServer)
deboracle | [2024-10-21 12:38:06,880] INFO 172.20.0.4 - - [21/Oct/2024:12:38:06 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 3 (org.apache.kafka.connect.runtime.rest.RestServer)
deboracle | [2024-10-21 12:38:12,000] INFO 172.20.0.4 - - [21/Oct/2024:12:38:11 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 3 (org.apache.kafka.connect.runtime.rest.RestServer)
deboracle | [2024-10-21 12:38:13,743] INFO Client disconnected. (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource)
deboracle | [2024-10-21 12:38:13,745] ERROR Failed: Connection lost (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource)
deboracle | io.debezium.connector.oracle.olr.client.OlrNetworkClientException: Connection lost
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.fillBuffer(OlrNetworkClient.java:268)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.read(OlrNetworkClient.java:241)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEvent(OlrNetworkClient.java:162)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEventWithStartScnSkip(OlrNetworkClient.java:145)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readEvent(OlrNetworkClient.java:126)
deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:120)
deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:52)
deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:313)
deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:203)
deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:143)
deboracle | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
deboracle | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
deboracle | at java.base/java.lang.Thread.run(Thread.java:829)
deboracle | [2024-10-21 12:38:13,748] ERROR Producer failure (io.debezium.pipeline.ErrorHandler)
deboracle | io.debezium.connector.oracle.olr.client.OlrNetworkClientException: Connection lost
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.fillBuffer(OlrNetworkClient.java:268)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.read(OlrNetworkClient.java:241)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEvent(OlrNetworkClient.java:162)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEventWithStartScnSkip(OlrNetworkClient.java:145)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readEvent(OlrNetworkClient.java:126)
deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:120)
deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:52)
deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:313)
deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:203)
deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:143)
deboracle | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
deboracle | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
deboracle | at java.base/java.lang.Thread.run(Thread.java:829)
deboracle | [2024-10-21 12:38:13,749] INFO Streaming metrics dump: io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSourceMetrics@73d37d04 (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource)
deboracle | [2024-10-21 12:38:13,749] INFO Offsets: OracleOffsetContext [scn=464082622521, commit_scn=[], lcr_position=null] (io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource)
deboracle | [2024-10-21 12:38:13,749] INFO Finished streaming (io.debezium.pipeline.ChangeEventSourceCoordinator)
deboracle | [2024-10-21 12:38:13,749] INFO Connected metrics set to 'false' (io.debezium.pipeline.ChangeEventSourceCoordinator)
deboracle | [2024-10-21 12:38:14,429] ERROR WorkerSourceTask{id=Q1COTRX_OWNER_REFDB_INSTRUMENT-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
deboracle | org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
deboracle | at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:67)
deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:149)
deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:52)
deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:313)
deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:203)
deboracle | at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:143)
deboracle | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
deboracle | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
deboracle | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
deboracle | at java.base/java.lang.Thread.run(Thread.java:829)
deboracle | Caused by: io.debezium.connector.oracle.olr.client.OlrNetworkClientException: Connection lost
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.fillBuffer(OlrNetworkClient.java:268)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.read(OlrNetworkClient.java:241)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEvent(OlrNetworkClient.java:162)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readNextEventWithStartScnSkip(OlrNetworkClient.java:145)
deboracle | at io.debezium.connector.oracle.olr.client.OlrNetworkClient.readEvent(OlrNetworkClient.java:126)
deboracle | at io.debezium.connector.oracle.olr.OpenLogReplicatorStreamingChangeEventSource.execute(OpenLogReplicatorStreamingChangeEventSource.java:120)
deboracle | ... 9 more
deboracle | [2024-10-21 12:38:14,430] INFO Stopping down connector (io.debezium.connector.common.BaseSourceTask)
deboracle | [2024-10-21 12:38:14,432] INFO SignalProcessor stopped (io.debezium.pipeline.signal.SignalProcessor)
deboracle | [2024-10-21 12:38:14,433] INFO Debezium ServiceRegistry stopped. (io.debezium.service.DefaultServiceRegistry)
deboracle | [2024-10-21 12:38:14,537] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection)
deboracle | [2024-10-21 12:38:14,642] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection)
deboracle | [2024-10-21 12:38:14,643] INFO [Producer clientId=Q1COTRX-schemahistory] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer)
deboracle | [2024-10-21 12:38:14,649] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics)
deboracle | [2024-10-21 12:38:14,649] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics)
deboracle | [2024-10-21 12:38:14,649] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics)
deboracle | [2024-10-21 12:38:14,649] INFO App info kafka.producer for Q1COTRX-schemahistory unregistered (org.apache.kafka.common.utils.AppInfoParser)
deboracle | [2024-10-21 12:38:14,650] INFO [Producer clientId=connector-producer-Q1COTRX_OWNER_REFDB_INSTRUMENT-0] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer)
deboracle | [2024-10-21 12:38:14,656] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics)
deboracle | [2024-10-21 12:38:14,656] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics)
deboracle | [2024-10-21 12:38:14,656] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics)
deboracle | [2024-10-21 12:38:14,656] INFO App info kafka.producer for connector-producer-Q1COTRX_OWNER_REFDB_INSTRUMENT-0 unregistered (org.apache.kafka.common.utils.AppInfoParser)
deboracle | [2024-10-21 12:38:17,166] INFO 172.20.0.4 - - [21/Oct/2024:12:38:17 +0000] "GET /connectors HTTP/1.1" 200 34 "-" "curl/7.81.0" 2 (org.apache.kafka.connect.runtime.rest.RestServer)

OLR output:
[ec2-user@ip-10-72-3-42 OpenLogReplicator]$ ./OpenLogReplicator
2024-10-21 12:35:15 INFO 00000 OpenLogReplicator v1.7.0 (C) 2018-2024 by Adam Leszczynski ([email protected]), see LICENSE file for licensing information
2024-10-21 12:35:15 INFO 00000 arch: x86_64, build-arch: , system: Linux, release: 4.18.0-513.18.1.el8_9.x86_64, build: Debug, compiled: 2024-10-15 12:01, modules: Kafka OCI Protobuf
2024-10-21 12:35:15 INFO 00000 adding source: S1
2024-10-21 12:35:15 INFO 00000 adding target: T1
2024-10-21 12:35:15 INFO 00000 writer is starting with Writer: Network:0.0.0.0:5000
2024-10-21 12:35:15 WARN 10003 file: checkpoint/DB1-chkpt.json - get metadata returned: No such file or directory
2024-10-21 12:37:56 INFO 00000 client requested to start from scn: 464082622521
2024-10-21 12:37:56 INFO 00000 connecting to Oracle instance of DB1 to //0.0.0.0:1521/dpcotest.ec2.internal
2024-10-21 12:37:56 INFO 00000 version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production, context: dpcotest, resetlogs: 0, activation: 0, con_id: 0, con_name: dpcotest
2024-10-21 12:37:56 INFO 00000 loading character mapping for AL32UTF8
2024-10-21 12:37:56 INFO 00000 loading character mapping for AL16UTF16
2024-10-21 12:37:56 ERROR 10003 file: /opt/app/oracle/fast_recovery_area/DPCOTEST/archivelog/DPCOTEST/onlinelog/o1_mf_1_m0vkkrnp_.log - get metadata returned: No such file or directory
2024-10-21 12:37:56 INFO 00000 online redo log: /opt/app/oracle/oradata/DPCOTEST/onlinelog/o1_mf_1_m0vkkrkt_.log
2024-10-21 12:37:56 ERROR 10003 file: /opt/app/oracle/fast_recovery_area/DPCOTEST/archivelog/DPCOTEST/onlinelog/o1_mf_2_m0vkkro7_.log - get metadata returned: No such file or directory
2024-10-21 12:37:56 INFO 00000 online redo log: /opt/app/oracle/oradata/DPCOTEST/onlinelog/o1_mf_2_m0vkkrlc_.log
2024-10-21 12:37:56 ERROR 10003 file: /opt/app/oracle/fast_recovery_area/DPCOTEST/archivelog/DPCOTEST/onlinelog/o1_mf_3_m0vkkrom_.log - get metadata returned: No such file or directory
2024-10-21 12:37:56 INFO 00000 online redo log: /opt/app/oracle/oradata/DPCOTEST/onlinelog/o1_mf_3_m0vkkrln_.log
2024-10-21 12:37:56 INFO 00000 timezone: +00:00, db-timezone: +00:00, log-timezone: +00:00, host-timezone: +00:00
2024-10-21 12:37:56 INFO 00000 Oracle Replicator for DB1 in online mode is starting (flags: 4096) from scn: 464082622521
2024-10-21 12:37:56 INFO 00000 current resetlogs is: 1165353144
2024-10-21 12:37:56 INFO 00000 first data SCN: 464082622521
2024-10-21 12:37:56 INFO 00000 starting sequence not found - starting with new batch with seq: 5178
2024-10-21 12:37:56 INFO 00000 reading dictionaries for scn: 464082622521
2024-10-21 12:37:58 INFO 00000 - found: SYS.CCOL$ (dataobj: 29, obj: 32, columns: 11, lobs: 0, lob-idx: 0), part of cluster
2024-10-21 12:37:58 INFO 00000 - found: SYS.CDEF$ (dataobj: 29, obj: 31, columns: 21, lobs: 0, lob-idx: 0), part of cluster
2024-10-21 12:37:58 INFO 00000 - found: SYS.COL$ (dataobj: 2, obj: 21, columns: 37, lobs: 0, lob-idx: 0), part of cluster
2024-10-21 12:37:58 INFO 00000 - found: SYS.DEFERRED_STG$ (dataobj: 81, obj: 81, columns: 25, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: SYS.ECOL$ (dataobj: 152, obj: 152, columns: 4, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: SYS.LOB$ (dataobj: 2, obj: 109, columns: 18, lobs: 0, lob-idx: 0), part of cluster
2024-10-21 12:37:58 INFO 00000 - found: SYS.LOBCOMPPART$ (dataobj: 850, obj: 850, columns: 24, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: SYS.LOBFRAG$ (dataobj: 847, obj: 847, columns: 16, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: SYS.OBJ$ (dataobj: 18, obj: 18, columns: 37, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: SYS.TAB$ (dataobj: 2, obj: 4, columns: 45, lobs: 0, lob-idx: 0), part of cluster
2024-10-21 12:37:58 INFO 00000 - found: SYS.TABPART$ (dataobj: 814, obj: 814, columns: 26, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: SYS.TABCOMPART$ (dataobj: 836, obj: 836, columns: 35, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: SYS.TABSUBPART$ (dataobj: 826, obj: 826, columns: 26, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: SYS.TS$ (dataobj: 6, obj: 16, columns: 32, lobs: 0, lob-idx: 0), part of cluster
2024-10-21 12:37:58 INFO 00000 - found: SYS.USER$ (dataobj: 10, obj: 22, columns: 30, lobs: 0, lob-idx: 0), part of cluster
2024-10-21 12:37:58 INFO 00000 - found: XDB.XDB$TTSET (dataobj: 21073, obj: 21073, columns: 4, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: XDB.X$NM46MP5MDR0M04NE0KWN0SK0K1LN (dataobj: 21986, obj: 21986, columns: 2, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: XDB.X$PT46MP5MDR0M04NE0KWN0SK0K1LN (dataobj: 21988, obj: 21988, columns: 2, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: XDB.X$QN46MP5MDR0M04NE0KWN0SK0K1LN (dataobj: 21987, obj: 21987, columns: 4, lobs: 0, lob-idx: 0)
2024-10-21 12:37:58 INFO 00000 - found: OWNER_ARM.TRANSACTION_REPORT (dataobj: 0, obj: 74738, columns: 137, lobs: 6, lob-idx: 756), partitioned(table: 252, lob: 756)
2024-10-21 12:37:58 INFO 00000 last confirmed scn: 464082622521, starting sequence: 5178, offset: 0
2024-10-21 12:37:58 INFO 00000 resume writer
2024-10-21 12:37:58 INFO 00000 streaming to client
2024-10-21 12:37:58 INFO 00000 found redo log version: 19.0.0, activation: 2125574454, resetlogs: 1165353144, page: 512, sequence: 5178, SID: DPCOTEST, endian: LITTLE
2024-10-21 12:37:58 INFO 00000 processing redo log: group: 3 scn: 464082608971 to 0 seq: 5178 path: /opt/app/oracle/oradata/DPCOTEST/onlinelog/o1_mf_3_m0vkkrln_.log offset: 1024
2024-10-21 12:37:58 INFO 00000 new activation detected: 2125574454

Is the bug present on the latest master branch.
Yes

Describe steps required to reproduce the bug.
Tested with OLR running on the Oracle instance, and as a docker image with sshfs access to the redo logs via mounts at:
/home/main/oracle/oradata
/home/main/oracle/fast_recovery_area

service:
olr:
build:
context: ./open_log_replicator
dockerfile: DockerfileImage
container_name: olr
privileged: true
volumes:
- /home/main/oracle/oradata:/opt/app/oracle/oradata
- /home/main/oracle/fast_recovery_area:/opt/app/oracle/fast_recovery_area
- ./open_log_replicator/scripts:/opt/OpenLogReplicator/scripts
- ./open_log_replicator/checkpoint:/opt/OpenLogReplicator/checkpoint

For the system where OLR is running on the Oracle box, the OS is RHEL8, and OLR is built using the attached build_redhat.sh script
Config for both is attached as OpenLogReplicator.json and connector-product.properties
Checkpoints are attached, redo logs are too big to upload.

For bugs related to Redo Log parse error where reproduction is not possible.
I don't believe this to be a redo log parse error, as OLR does not give any error. This is mainly a query about an OLR-Debezium link error.
olr_log.zip

@bersler bersler added the bug Something isn't working label Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants