Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observations from DMS Replication tests #11

Open
hammerhead opened this issue Aug 9, 2024 · 2 comments
Open

Observations from DMS Replication tests #11

hammerhead opened this issue Aug 9, 2024 · 2 comments

Comments

@hammerhead
Copy link
Member

While testing out the DMS transformation, I made a few observations I want to share. Each issue still needs to be specified further with proper steps to reproduce, this is more of a brain dump meta collection. All observations were made on a PostgreSQL RDS source.

DELETE without primary key

Issuing a DELETE FROM my_table (aka truncate) on the source fails to replicate with Unable to invoke DML operation without primary key information. Currently, a WHERE clause with the primary key is required:

https://github.com/daq-tools/commons-codec/blob/e9f17e90bd4ad044cb6e4d942868dd868af26e1b/src/commons_codec/transform/aws_dms.py#L88

ALTER TABLE ADD COLUMN

I tried an ALTER TABLE my_table ADD COLUMN last_updated TIMESTAMP DEFAULT, which was ignored as being an unsupported operation. Subsequent UPDATE my_table SET last_updated = NOW() WHERE id = 1; therefore also fail.
What can be a strategy to handle such situations? Do we need to manually migrate the CrateDB table structure?

Similarly, if my PostgreSQL table looks like this:

CREATE TABLE public.locations (
   device_id TEXT NOT NULL PRIMARY KEY,
   location TEXT,
   attributes JSONB
);

I also cannot have the addition of a new JSON property replicated (again no primary key): UPDATE locations SET attributes['last_updated'] = TO_JSONB(NOW());

Data types within OBJECT

If there is a JSON in PostgreSQL with a timestamp, such as {"last_updated": "2024-08-09T11:54:36.815606Z"}, the OBJECT in CrateDB recognizes last_updated as TEXT. Since there is no type information in the JSON, there is probably nothing the CDC translator can do. For real-world use cases, we need to think of some strategy (run ALTER TABLE ADD COLUMN in CrateDB upfront?).

@amotl
Copy link
Member

amotl commented Aug 9, 2024

Hi, and thanks for your reports. I think you explained a few cases very well where it becomes clear that the current implementation just provides fundamental mappings of basic INSERT|UPDATE|DELETE statements. We will need to check how corresponding DMS events look like, and whether we can apply their semantics to CrateDB, and how.

@amotl
Copy link
Member

amotl commented Aug 11, 2024

In general, Debezium/PostgreSQL also do not support DDL operations / schema changes?

The [PostgreSQL] connector relies on and reflects the PostgreSQL logical decoding feature, which has the following limitations:

  • Logical decoding does not support DDL changes. This means that the connector is unable to report DDL change events back to consumers.
  • ...

-- https://debezium.io/documentation/reference/connectors/postgresql.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants