
New sink: postgres #15765

Open · jszwedko opened this issue Dec 28, 2022 · 11 comments · May be fixed by #21248
Labels
sink: new A request for a new sink · type: feature A value-adding code addition that introduces new functionality.

Comments

@jszwedko (Member)

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

Users want to write data to Postgres from Vector.

Breaking this off from #939

Attempted Solutions

No response

Proposal

No response

References

Version

vector 0.26.0

jszwedko added the "type: feature" and "sink: new" labels on Dec 28, 2022
@hhromic (Contributor) commented Dec 28, 2022

This is certainly an interesting kind of sink: databases.
I wonder if it would make more sense to implement an ODBC sink instead of database-specific sinks?
That way, any database that provides an ODBC driver (and there are many already) could interface with Vector out of the box. It would also probably be a much smaller maintenance burden for Vector than maintaining multiple database-specific sinks.
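To illustrate the generic approach being suggested, here is a minimal sketch of how a sink could build a driver-agnostic statement: flatten an event's fields into columns and emit an INSERT with `?` placeholders (the ODBC parameter style), which any ODBC driver could then execute. The table and field names here are hypothetical, not from Vector.

```python
def build_insert(table: str, event: dict) -> tuple[str, list]:
    """Return an ODBC-style parameterized INSERT and its parameter list."""
    columns = list(event.keys())
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO {table} ({", ".join(columns)}) VALUES ({placeholders})'
    return sql, [event[c] for c in columns]

event = {"timestamp": "2022-12-28T00:00:00Z", "message": "hello", "host": "web-1"}
sql, params = build_insert("vector_logs", event)
print(sql)     # INSERT INTO vector_logs (timestamp, message, host) VALUES (?, ?, ?)
print(params)  # ['2022-12-28T00:00:00Z', 'hello', 'web-1']
```

Because the SQL itself is standard and the parameters are bound separately, the same statement-building code would work against any database behind an ODBC driver.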

@jszwedko (Member, Author)

Agreed, supporting something like ODBC would let us hit a lot of them at once. https://github.com/pacman82/odbc-api seems promising as a Rust wrapper.

@zamazan4ik (Contributor)

AFAIK, rsyslog implements both approaches: ODBC-based and database-specific. The first allows the same plugin to be used with as many databases as possible, without database-specific features; the second takes advantage of database-specific features.

@spencergilbert (Contributor)

I was also going to suggest a more general sink, at least to start with.

@thomasdba
Any updates?

@vnagendra
Unless I am missing something, what is needed for Postgres specifically is already in the current Cargo.toml (according to the PRs on this repo, anyway):

https://github.com/vectordotdev/vector/pull/18391/files

My suggestion would be to have something that addresses a specific set of (potentially limited) use cases. For me, having the ability to write to a DB (a specific table) avoids having one more thing in the middle like Redis/RabbitMQ/etc.

This is a fair thing to do for a limited volume of data (logs, metrics, messages, whatever). When I say limited, I am talking 100K+/day -- that is enough for a good set of use cases. Hopefully nobody receiving 100K/min is going to write to a DB :)

@jorgehermo9 (Contributor) commented Sep 1, 2024

I would like to give this a try. If I have enough time these days, I could submit a PR addressing this in a couple of weeks. It should be similar to the databend and clickhouse sinks, right?

I see that we could encode the event as JSON (like the clickhouse sink does) and then use the json_populate_record function from Postgres (related docs) to fill the table schema from the JSON.
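To make the suggestion concrete, here is a minimal sketch of the single-event variant: serialize one event as JSON and let Postgres map it onto the table's row type via `json_populate_record`. The statement text uses a `$1` bind parameter for the JSON payload; the table name `logs` and the event fields are hypothetical.

```python
import json

def build_populate_record(table: str, event: dict) -> tuple[str, str]:
    """Return an INSERT ... SELECT using json_populate_record plus its JSON parameter."""
    sql = (
        f"INSERT INTO {table} "
        f"SELECT * FROM json_populate_record(NULL::{table}, $1)"
    )
    return sql, json.dumps(event)

sql, payload = build_populate_record("logs", {"message": "hello", "host": "web-1"})
# sql: INSERT INTO logs SELECT * FROM json_populate_record(NULL::logs, $1)
```

The appeal of this shape is that the sink never has to introspect the table schema itself: Postgres matches JSON keys to column names and ignores keys that have no matching column.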

@jszwedko (Member, Author) commented Sep 3, 2024

> I would like to give this a try. If I have enough time these days, I could submit a PR addressing this in a couple of weeks. It should be similar to the databend and clickhouse sinks, right?
>
> I see that we could encode the event as JSON (like the clickhouse sink does) and then use the json_populate_record function from Postgres (related docs) to fill the table schema from the JSON.

Awesome! It seems like there is significant demand for this feature. I think what you are describing could work. I also think we could leverage https://github.com/apache/opendal to do the inserts. We started using this library with the webhdfs sink as a way to try it out, and we have a PR using it for an SFTP sink: #18076. It seems to be a promising approach to expand Vector's sinks with minimal wrapping.

@jorgehermo9 (Contributor) commented Sep 3, 2024

It seems that OpenDAL uses Postgres as a key-value store:

https://github.com/apache/opendal/blob/main/core/src/services/postgresql/backend.rs#L187

That does not fit what I think is expected from a postgres sink; I would expect behaviour like the databend or clickhouse sinks.

Something like this:
[image attachment]

Hope this makes sense.

@jorgehermo9 (Contributor)

Also, it is worth noting this statement from OpenDAL's documentation:
[image attachment]

@jszwedko (Member, Author) commented Sep 3, 2024

> Also, it is worth noting this statement from OpenDAL's documentation: [image attachment]

Ah, good find 👍 I think what you suggested with json_populate_recordset likely makes the most sense then.
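For the batched variant the thread settles on, here is a minimal sketch: encode a whole batch of events as one JSON array and insert it in a single statement with `json_populate_recordset`, which expands the array into one row per element. As above, the table name `logs` and the event fields are hypothetical.

```python
import json

def build_batch_insert(table: str, events: list[dict]) -> tuple[str, str]:
    """Return a batched INSERT using json_populate_recordset plus its JSON parameter."""
    sql = (
        f"INSERT INTO {table} "
        f"SELECT * FROM json_populate_recordset(NULL::{table}, $1)"
    )
    return sql, json.dumps(events)

events = [
    {"message": "hello", "host": "web-1"},
    {"message": "world", "host": "web-2"},
]
sql, payload = build_batch_insert("logs", events)
# sql: INSERT INTO logs SELECT * FROM json_populate_recordset(NULL::logs, $1)
```

One statement per batch keeps round trips low, which matters for a sink that may be flushing many events at once.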
