latest max column_name instead of sql_last_start #57
Comments
This would be a very useful feature for those who have ordered fields representing IDs. We may add this in the next release of this plugin.
+1 for this feature. I've got a lot of event data that I'm importing from SQL and I can't use sql_last_start (as the query is against a subscriber DB that gets updates shipped to it on a periodic basis). Ideally, being able to specify the column that contains my ID, and have that kept between runs, would be great.
+1 - Will be much more useful than :sql_last_start. Even if you have a timestamp column, you couldn't use it in conjunction with :sql_last_start if logstash runs on a different server and you couldn't guarantee that system times are in sync.
+1 exactly what I meant #46
+1
@igiton @willhughes @alaendle @nitram4 @crazw You've all asked for this feature. Is it preferred to be in addition to
@untergeek hey, thanks for working on this. I don't use the existing sql_last_start functionality, so having it be one or the other is fine by me.
/cc @acchen97 - any thoughts on keeping both options or only one of them?
I like the way the current PR is (either-or). I expect most users to use this to accurately track a particular PK or UUID. Depending on how popular the "tracking by column value" option is, it may even make sense to make it the default in the future. Would love validation from others who have requested this feature though. :)
Exactly, this feature will solve a lot of issues with incremental fetching of data...
At the moment I can't imagine a scenario where it makes sense to use both, so I also vote for the either-or approach. Many thanks for your work!
Test for tracking_column and warn user once per query if it's not there. Add a test to verify this is working properly closes logstash-plugins#57
@igiton @willhughes @alaendle @nitram4 @crazw Feel free to upgrade your jdbc plugin to 3.0.0, which now has this feature. To make use of it, you need to add/set the following options:
Also changed is the built-in metadata. When using timestamps to determine which rows to get, this was called
Please open a new issue if you encounter any problems.
Awesome, can it also be used for timestamp fields?
@nitram4 I don't see why not. It's a stored value. If you can do a timestamp comparison in a query string, then using the stored value should just work.
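To illustrate the point about timestamps being just another stored value, here is a minimal sketch in Python with an in-memory SQLite database. It is not the plugin's implementation; the table `events`, the column `updated_at`, and the helper `fetch_since` are hypothetical names chosen for the example. The key idea is that the next run compares against the largest timestamp seen in the rows themselves, not against the clock of the machine running the query.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO events (updated_at) VALUES (?)",
    [
        ("2016-01-01 10:00:00",),
        ("2016-01-01 10:05:00",),
        ("2016-01-01 10:10:00",),
    ],
)


def fetch_since(conn, last_seen):
    """Return rows newer than last_seen and the new stored value.

    The stored value comes from the data, so clock skew between the
    database server and the client does not affect the comparison.
    """
    rows = conn.execute(
        "SELECT id, updated_at FROM events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    new_last = rows[-1][1] if rows else last_seen
    return rows, new_last


rows, last_seen = fetch_since(conn, "2016-01-01 10:00:00")
```

This relies on the timestamps being stored in a format that sorts correctly under the database's comparison rules (ISO-style strings here); a native timestamp type would behave the same way.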
@untergeek - Tanks and missiles man !!!
Thanks, it's working !!! I needed it so much. Thanks again
Would you please update the guide on the ELK page, where sql_last_start is still used:
@tianchao-haohan please see master for the latest doc version: https://www.elastic.co/guide/en/logstash/master/plugins-inputs-jdbc.html
Got that, thanks!
Any chance a tracking column could be a concat or expression of multiple columns? I.e. (timestamp + some_other_column)?
@geeklisted feel free to open an issue around this specifically. One way to achieve this today is to alias a concatenated field, as is done in these examples: http://www.sqlbook.com/SQL/SQL-CONCATENATE-24.aspx, and use that alias as the tracking column.
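The aliasing workaround can be sketched as follows, again in Python with SQLite rather than in the plugin itself. The table `events`, the alias `tracking_key`, and the `#` separator are assumptions made for the example. One caveat worth stating: a concatenated key is compared as a string, so numeric parts need zero-padding (or an equivalent fixed-width encoding) for the ordering to match the intended one.

```python
import sqlite3

# Hypothetical schema; two rows share a timestamp, so the timestamp
# alone would not be a reliable tracking value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (7, "2016-01-01 10:00:00"),
        (8, "2016-01-01 10:00:00"),
        (9, "2016-01-01 10:05:00"),
    ],
)

# Alias the concatenated expression, then filter and sort on the alias.
# printf zero-pads the id so string comparison matches numeric order.
QUERY = """
SELECT id, updated_at, tracking_key FROM (
    SELECT id, updated_at,
           updated_at || '#' || printf('%010d', id) AS tracking_key
    FROM events
)
WHERE tracking_key > ?
ORDER BY tracking_key
"""

# Stored value from a previous run: the key of the row with id 7.
last_key = "2016-01-01 10:00:00#0000000007"
rows = conn.execute(QUERY, (last_key,)).fetchall()
new_last_key = rows[-1][2] if rows else last_key
```

The concatenation operator differs between databases (`||`, `CONCAT`, `+`), so the inner expression would need adapting to the SQL dialect in use.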
At the moment, it is not possible to query the database with something like "SELECT * FROM TABLE_NAME WHERE id > :last_saved_id"
instead of using the built-in parameter "sql_last_start", e.g. "SELECT * FROM TABLE_NAME WHERE timestamp > :sql_last_start".
I don't have a timestamp in my table, and I want to schedule logstash with a jdbc input and an elasticsearch output.
I don't want to query the whole table. I want to query only new entries from the database, but I don't know how and where to save the "id" from the previous input. This id would be used at the next scheduled run.
If you find this worth implementing in the future, please do.
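The behaviour requested here can be sketched outside logstash as a small Python example against an in-memory SQLite database. This is only an illustration of the idea, not how the plugin works; the table `events`, the column `payload`, and the function `fetch_new_rows` are hypothetical. Each run selects rows with an id above the saved value and then records the highest id it saw.

```python
import sqlite3


def fetch_new_rows(conn, last_saved_id):
    """Return rows with id > last_saved_id and the id to save for next time."""
    rows = conn.execute(
        "SELECT id, payload FROM events "
        "WHERE id > :last_saved_id ORDER BY id",
        {"last_saved_id": last_saved_id},
    ).fetchall()
    # If nothing new arrived, keep the previous value unchanged.
    new_last = rows[-1][0] if rows else last_saved_id
    return rows, new_last


# Hypothetical table standing in for TABLE_NAME.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)", [("a",), ("b",), ("c",)]
)

# First scheduled run: nothing saved yet, so start from 0.
first_batch, last_id = fetch_new_rows(conn, 0)

# A new row arrives between runs.
conn.execute("INSERT INTO events (payload) VALUES ('d')")

# Second scheduled run: only the new row is returned.
second_batch, last_id = fetch_new_rows(conn, last_id)
```

In a real scheduled job the saved id would have to be written somewhere durable (a file, for instance) so that it survives between runs; the sketch keeps it in a variable for brevity.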