Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

latest max column_name instead of sql_last_start #57

Closed
igiton opened this issue Sep 28, 2015 · 21 comments
Closed

latest max column_name instead of sql_last_start #57

igiton opened this issue Sep 28, 2015 · 21 comments
Assignees

Comments

@igiton
Copy link

igiton commented Sep 28, 2015

At the moment, it is not possible to query database like: "SELECT * FROM TABLE_NAME where id > : last_saved_id"
and not by using built-in parameter "sql_last_start" e.g. "SELECT * FROM TABLE_NAME where timestamp > : sql_last_start".
I don't have timestamp in my table and I want to schedule logstash to have jdbc input and output in elasticsearch.
I dont want to query whole table, I want to query only new entries from database but I don't know how and where to save the "id" from previous input. This id will be used in next scheduled time.
If you found this worth to be implemented in the future please do.

@talevy
Copy link
Contributor

talevy commented Sep 30, 2015

This would be a very useful feature for some that have ordered fields representing IDs. We may add this into the next release of this plugin

@ghost
Copy link

ghost commented Oct 29, 2015

+1 for this feature.

I've got a lot of event data that I'm importing from SQL and I can't use sql_last_start (as the query is against a subscriber DB that gets updates shipped to it on a periodic basis).

Ideally being able to specify the column that contains my ID, and have that kept between runs would be great.

@alaendle
Copy link

+1 - Will be much more useful than :sql_last_start. Even if you have a timestamp column you couldn't use it in conjunction with :sql_last start if logstash runs on a different server and you couldn't guarantee that system times are in sync.

@nitram4
Copy link

nitram4 commented Nov 22, 2015

+1 exactly what I meant #46

@crazw
Copy link

crazw commented Dec 2, 2015

+1

@untergeek
Copy link
Contributor

@igiton @willhughes @alaendle @nitram4 @crazw

You've all asked for this feature. Is it preferred to be in addition to sql_last_start or should it be an either-or thing? The current push in #108 is in an either-or configuration. While I can't imagine it being a highly desirable feature to be able to use both, I am only one user whose opinion is just that.

@ghost
Copy link

ghost commented Jan 5, 2016

@untergeek hey, thanks for working on this.

I don't use the existing sql_last_start functionality, so having it be one or the other is fine by me.

@suyograo
Copy link
Contributor

suyograo commented Jan 5, 2016

/cc @acchen97 - any thoughts on keeping both the options or either of them?

@acchen97
Copy link

acchen97 commented Jan 6, 2016

I like the way the current PR is (either-or). I expect most users to use this to accurately track a particular PK or UUID. Depending on how popular the "tracking by column value" option is, it may even make sense to make it default in the future.

Would love validation from others who have requested this feature though. :)

@nitram4
Copy link

nitram4 commented Jan 6, 2016

exactly, this feature will solve lot of issues with incremental fetching of data...

@alaendle
Copy link

alaendle commented Jan 6, 2016

At the moment I couldn't imagine a scenario where it makes sense to use both - so I also vote for the either-or approach. Many thanks for your work!

untergeek added a commit to untergeek/logstash-input-jdbc that referenced this issue Jan 6, 2016
Test for tracking_column and warn user once per query if it's not there.
Add a test to verify this is working properly

closes logstash-plugins#57
untergeek added a commit to untergeek/logstash-input-jdbc that referenced this issue Jan 6, 2016
Test for tracking_column and warn user once per query if it's not there.
Add a test to verify this is working properly

closes logstash-plugins#57
@untergeek
Copy link
Contributor

@igiton @willhughes @alaendle @nitram4 @crazw

Feel free to upgrade your jdbc plugin to 3.0.0, which now has this feature.

To make use of it, you need to add/set the following options:

  jdbc {
    # ...other config...
    use_column_value => true
    tracking_column => MY_COLUMN_NAME
    # ...other config, if any...
  }

Also changed is the built-in metadata. When using timestamps to determine which rows to get, this was called sql_last_start. In order to keep it simple, and with one metadata value, this is now sql_last_value. sql_last_value will hold the last column value from tracking_column, so you can set up a query like:

jdbc {
  statement => "SELECT id, mycolumn1, mycolumn2 FROM my_table WHERE id > :sql_last_value"
  use_column_value => true
  tracking_column => id
  # ... other configuration bits
}

Please open a new issue if you encounter any problems.

@nitram4
Copy link

nitram4 commented Jan 7, 2016

awesome, can be also used for timestamp fields?

@untergeek
Copy link
Contributor

@nitram4 I don't see why not. It's a stored value. If you can do a timestamp comparison in a query string, then using the stored value should just work.

@lovejeet
Copy link

@untergeek - Tanks and missiles man !!!
Way to go. I upgraded to 3.0.0 and voila ....

@nitram4
Copy link

nitram4 commented Jan 12, 2016

thanks, its working !!! i needed it so much. Thanks again

@tianchao-haohan
Copy link

Would you please update the guide in elk page, where the sql_last_start was still used:
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-record_last_run

@acchen97
Copy link

@tianchao-haohan please see master for the latest doc version: https://www.elastic.co/guide/en/logstash/master/plugins-inputs-jdbc.html

@tianchao-haohan
Copy link

Got that, thanks!

@geeklisted
Copy link

Any chance a tracking column could be a concat or expression of multiple columns? I.e. (timestamp + some_other_column)?

@talevy
Copy link
Contributor

talevy commented Sep 21, 2016

@geeklisted feel free to open an issue around this specifically.

one way to achieve this today is to alias a concatted field as is done in these examples: http://www.sqlbook.com/SQL/SQL-CONCATENATE-24.aspx. and using that field name as the tracking column

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.