latest max column_name instead of sql_last_start #57

igiton · 2015-09-28T12:37:10Z

At the moment, it is not possible to query the database with a statement like "SELECT * FROM TABLE_NAME where id > :last_saved_id"; the only option is the built-in parameter "sql_last_start", e.g. "SELECT * FROM TABLE_NAME where timestamp > :sql_last_start".
I don't have a timestamp in my table, and I want to schedule Logstash to read from a jdbc input and write to an Elasticsearch output.
I don't want to query the whole table; I want to query only new entries from the database, but I don't know how or where to save the "id" from the previous run so that it can be used at the next scheduled time.
If you find this worth implementing in the future, please do.
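The workaround being asked for can be sketched outside Logstash: remember the highest `id` fetched in a small state file, and bind it as the query parameter on the next scheduled run. This is a minimal Python sketch using an in-memory SQLite table; the table `events`, the helpers `load_last_id`, `save_last_id` and `fetch_new_rows`, and the one-number state-file format are all hypothetical and not part of the jdbc plugin.

```python
import os
import sqlite3
import tempfile


def load_last_id(state_path):
    """Return the id saved by the previous run, or 0 if nothing was saved yet."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as f:
        return int(f.read().strip())


def save_last_id(state_path, last_id):
    """Persist the last id so it survives until the next scheduled run."""
    with open(state_path, "w") as f:
        f.write(str(last_id))


def fetch_new_rows(conn, state_path):
    """Fetch only rows whose id is greater than the id saved by the previous run."""
    last_id = load_last_id(state_path)
    rows = conn.execute(
        "SELECT id, msg FROM events WHERE id > :last_saved_id ORDER BY id",
        {"last_saved_id": last_id},
    ).fetchall()
    if rows:
        # Rows are ordered by id, so the last row holds the new maximum.
        save_last_id(state_path, rows[-1][0])
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO events (id, msg) VALUES (?, ?)", [(1, "a"), (2, "b")])
state = os.path.join(tempfile.mkdtemp(), "last_id")

first = fetch_new_rows(conn, state)   # both existing rows
conn.execute("INSERT INTO events (id, msg) VALUES (3, 'c')")
second = fetch_new_rows(conn, state)  # only the row added since the last run
third = fetch_new_rows(conn, state)   # nothing new
```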

talevy · 2015-09-30T09:55:51Z

This would be a very useful feature for those who have ordered fields representing IDs. We may add it in the next release of this plugin.

ghost · 2015-10-29T03:50:45Z

+1 for this feature.

I've got a lot of event data that I'm importing from SQL and I can't use sql_last_start (as the query is against a subscriber DB that gets updates shipped to it on a periodic basis).

Ideally, I would be able to specify the column that contains my ID and have its value kept between runs.

alaendle · 2015-11-13T14:13:00Z

+1. This will be much more useful than :sql_last_start. Even if you have a timestamp column, you can't use it in conjunction with :sql_last_start if Logstash runs on a different server and you can't guarantee that the system clocks are in sync.
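To make the clock-skew concern concrete: with a time-based marker, the cutoff comes from the Logstash host's clock but is compared against timestamps written by the database server. A minimal Python sketch, assuming the Logstash host's clock runs five minutes fast; the skew, timestamps and rows are invented for illustration.

```python
from datetime import datetime, timedelta

SKEW = timedelta(minutes=5)  # assumed: the Logstash host clock runs 5 minutes fast

# Run 1 happens at 12:01 database time and fetches row 1. With a time-based
# marker, Logstash records its *own* clock reading as the last run time.
last_run_marker = datetime(2015, 11, 13, 12, 1) + SKEW  # 12:06 on the Logstash clock
last_seen_id = 1  # an id-based marker records the largest id fetched instead

# Row 2 is inserted at 12:02 database time, after run 1 finished.
rows = [
    (1, datetime(2015, 11, 13, 12, 0)),
    (2, datetime(2015, 11, 13, 12, 2)),
]

# Run 2: compare the stored rows against each kind of marker.
fetched_by_time = [r for r in rows if r[1] > last_run_marker]  # row 2 is skipped
fetched_by_id = [r for r in rows if r[0] > last_seen_id]       # row 2 is fetched
```

The id-based marker only compares values that the database itself stored, so the hosts' clocks never enter the comparison.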

nitram4 · 2015-11-22T20:02:16Z

+1, exactly what I meant in #46.

crazw · 2015-12-02T02:02:34Z

+1


untergeek · 2016-01-05T19:16:33Z

@igiton @willhughes @alaendle @nitram4 @crazw

You've all asked for this feature. Is it preferred to be in addition to sql_last_start, or should it be an either-or thing? The current push in #108 uses an either-or configuration. While I can't imagine that being able to use both would be highly desirable, that is only one user's opinion.

ghost · 2016-01-05T23:23:12Z

@untergeek hey, thanks for working on this.

I don't use the existing sql_last_start functionality, so having it be one or the other is fine by me.

suyograo · 2016-01-05T23:26:41Z

/cc @acchen97 - any thoughts on keeping both the options or either of them?

acchen97 · 2016-01-06T00:36:37Z

I like the way the current PR is (either-or). I expect most users to use this to accurately track a particular PK or UUID. Depending on how popular the "tracking by column value" option is, it may even make sense to make it default in the future.

Would love validation from others who have requested this feature though. :)

nitram4 · 2016-01-06T13:26:19Z

Exactly, this feature will solve a lot of issues with incremental fetching of data.

alaendle · 2016-01-06T15:02:14Z

At the moment I can't imagine a scenario where it makes sense to use both, so I also vote for the either-or approach. Many thanks for your work!

(Commit message referencing this issue: "Test for tracking_column and warn user once per query if it's not there. Add a test to verify this is working properly. Closes logstash-plugins#57")
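The behaviour described in the commit message above, warning once per query rather than once per row when the tracking column is missing from the results, can be sketched as follows. This is a hypothetical Python illustration of the idea, not the plugin's Ruby code; the function name, logger name and warning text are invented. It also assumes the value from the last row returned is the one kept, which is why ordering the statement by the tracking column matters.

```python
import logging

logger = logging.getLogger("jdbc_tracking_sketch")


def update_tracking_value(rows, tracking_column, previous_value):
    """Return (new_value, warned) after processing one query's rows.

    Each row that contains the tracking column overwrites the value, so the
    last such row wins. Rows lacking the column leave the value unchanged,
    and at most one warning is logged for the whole query.
    """
    value = previous_value
    warned = False
    for row in rows:
        if tracking_column in row:
            value = row[tracking_column]
        elif not warned:
            logger.warning("tracking_column %r not found in query results", tracking_column)
            warned = True
    return value, warned
```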

untergeek · 2016-01-06T19:40:59Z

@igiton @willhughes @alaendle @nitram4 @crazw

Feel free to upgrade your jdbc plugin to 3.0.0, which now has this feature.

To make use of it, you need to add/set the following options:

  jdbc {
    # ...other config...
    use_column_value => true
    tracking_column => MY_COLUMN_NAME
    # ...other config, if any...
  }

The built-in metadata has also changed. When timestamps were used to determine which rows to fetch, the value was called sql_last_start. To keep things simple, with a single metadata value, it is now called sql_last_value. With use_column_value enabled, sql_last_value holds the last value seen in tracking_column, so you can set up a query like:

  jdbc {
    statement => "SELECT id, mycolumn1, mycolumn2 FROM my_table WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => id
    # ...other configuration bits
  }
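The semantics described above can be modelled in a few lines: the statement runs with `:sql_last_value` bound, and afterwards the value is replaced by the tracking column of the last row returned. This is a Python/SQLite sketch under stated assumptions, not the plugin's implementation. The starting value of 0 is assumed here, matching the plugin's later documentation for `use_column_value => true`; the persistence the plugin performs between runs (via `last_run_metadata_path`) is reduced to a plain variable; and the `ORDER BY id` is an addition in this sketch so that the last row carries the maximum id.

```python
import sqlite3

STATEMENT = (
    "SELECT id, mycolumn1, mycolumn2 FROM my_table "
    "WHERE id > :sql_last_value ORDER BY id"
)


def run_scheduled(conn, sql_last_value, tracking_column="id"):
    """Run STATEMENT once and return (rows, updated sql_last_value)."""
    cursor = conn.execute(STATEMENT, {"sql_last_value": sql_last_value})
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]
    for row in rows:
        # The value from the last row returned wins, hence ORDER BY id.
        sql_last_value = row[tracking_column]
    return rows, sql_last_value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, mycolumn1 TEXT, mycolumn2 TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [(1, "a", "x"), (2, "b", "y")])

sql_last_value = 0  # assumed starting value when use_column_value => true
rows, sql_last_value = run_scheduled(conn, sql_last_value)
conn.execute("INSERT INTO my_table VALUES (3, 'c', 'z')")
new_rows, sql_last_value = run_scheduled(conn, sql_last_value)
```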

Please open a new issue if you encounter any problems.

nitram4 · 2016-01-07T10:43:27Z

Awesome! Can it also be used for timestamp fields?

untergeek · 2016-01-07T17:47:45Z

@nitram4 I don't see why not. It's a stored value. If you can do a timestamp comparison in a query string, then using the stored value should just work.
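To illustrate the answer above, the same pattern works with a timestamp column as long as the database can compare the stored value against the column. A minimal Python/SQLite sketch; the table `events`, the column `updated_at`, the helper `fetch_since`, and the ISO-8601 text format are assumptions (SQLite stores these as text, and fixed-width ISO-8601 strings compare in chronological order). Note that a strict `>` comparison will skip a row that shares the exact timestamp of the last row already fetched, a trade-off of tracking by a column that is not unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2016-01-07 10:00:00"), (2, "2016-01-07 10:05:00")],
)


def fetch_since(conn, sql_last_value):
    """Return rows newer than the marker, plus the marker for the next run."""
    rows = conn.execute(
        "SELECT id, updated_at FROM events "
        "WHERE updated_at > :sql_last_value ORDER BY updated_at",
        {"sql_last_value": sql_last_value},
    ).fetchall()
    # Keep the timestamp of the last row as the next marker.
    new_value = rows[-1][1] if rows else sql_last_value
    return rows, new_value


rows, marker = fetch_since(conn, "1970-01-01 00:00:00")
conn.execute("INSERT INTO events VALUES (3, '2016-01-07 10:09:00')")
newer, marker = fetch_since(conn, marker)
```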

lovejeet · 2016-01-12T10:53:05Z

@untergeek - Tanks and missiles man !!!
Way to go. I upgraded to 3.0.0 and voila ....

nitram4 · 2016-01-12T19:47:59Z

Thanks, it's working! I needed it so much. Thanks again.

tianchao-haohan · 2016-01-27T07:28:16Z

Could you please update the guide on the Elastic documentation page, where sql_last_start is still used:
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-record_last_run

acchen97 · 2016-01-27T07:31:05Z

@tianchao-haohan please see master for the latest doc version: https://www.elastic.co/guide/en/logstash/master/plugins-inputs-jdbc.html

tianchao-haohan · 2016-01-28T06:27:05Z

Got that, thanks!

geeklisted · 2016-09-21T16:14:20Z

Any chance a tracking column could be a concatenation or expression of multiple columns, e.g. (timestamp + some_other_column)?

talevy · 2016-09-21T19:59:25Z

@geeklisted feel free to open an issue around this specifically.

One way to achieve this today is to alias a concatenated field, as is done in these examples: http://www.sqlbook.com/SQL/SQL-CONCATENATE-24.aspx, and use that field name as the tracking column.
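The aliasing suggestion above can be sketched in SQLite, where `||` concatenates strings: compute the key in a subquery, alias it, and use the alias as the tracking column. The table `events`, the columns `ts` and `seq`, the alias `track_key`, and the padding width are assumptions for illustration. One caveat: a concatenated key is a string and compares lexicographically, so the numeric part is zero-padded with SQLite's `printf` so that, for example, 9 sorts before 10.

```python
import sqlite3

# The subquery lets the WHERE clause refer to the alias portably.
QUERY = """
SELECT ts, seq, track_key FROM (
    SELECT ts, seq, ts || '-' || printf('%05d', seq) AS track_key FROM events
)
WHERE track_key > :sql_last_value
ORDER BY track_key
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2016-09-21 10:00:00", 9), ("2016-09-21 10:00:00", 10)],
)

rows = conn.execute(QUERY, {"sql_last_value": ""}).fetchall()
last_key = rows[-1][2]  # would become sql_last_value for the next run

conn.execute("INSERT INTO events VALUES ('2016-09-21 10:00:00', 11)")
newer = conn.execute(QUERY, {"sql_last_value": last_key}).fetchall()
```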

suyograo added the enhancement label Nov 13, 2015

suyograo assigned untergeek Dec 2, 2015

untergeek added a commit to untergeek/logstash-input-jdbc that referenced this issue on Jan 5, 2016: f08efe0 "Allow tracking by column number, not just time" (closes logstash-plugins#57)

untergeek mentioned this issue on Jan 5, 2016 in #108, "Allow tracking by column number, not just time" (closed)

untergeek closed this as completed in 5c495b4 Jan 6, 2016

lovejeet mentioned this issue on Jan 12, 2016 in #109, "Update jdbc.rb" (closed)

untergeek mentioned this issue on Jan 12, 2016 in #88, "sql_last_start isn't persisted when executing logstash as non-sevice" (closed)

This was referenced on Jan 20, 2016 in #103, "jdbc_fetch_size no effect?" (open), and in #114, "PSQLException: ERROR: column "sql_last_start" does not exist" (closed).
