-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reducing the data persited in the full_data columm #185
Reducing the data persited in the full_data columm #185
Conversation
Does this require a data migration for existing full_data columns (put another way...have events already been released, and of so how do you deal with existing ones)? |
8985d7a
to
2b1dda5
Compare
def get_last_cnn_from_events(ems_id) | ||
EventStream.where(:ems_id => ems_id).maximum('ems_ref') || 1 | ||
def get_last_ems_ref(ems_id) | ||
EventStream.where(:ems_id => ems_id).maximum('CAST(ems_ref AS int)') || 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ems_id is already a bigint in the database, so I don't understand the statement about fixing string comparisons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey, the maximum is applied on the ems_ref and it is a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, I just cast the ems_ref to int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like you are doing something similar to what the chain_id column is for, so perhaps it's better to store it there? The purpose of the chain_id is to store a chain of events that make up a sequence, and all events in the same chain will have the same id. Additionally, the chain_id is typically a constantly growing id, which is perfect for finding the maximum of. @agrare Thoughts on using chain_id instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, my other concern here is that every event addition will require a somewhat complex operation. This table is one of our largest tables (multiple millions of rows) and maximum is a table scan when done in this way with casting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I think this is a little different than chaining events, all of these events are independent and it looks like get_last_ems_ref is used when asking the provider for events aka give me all events that occurred since X.
It looks like we call this get_last_cnn_from_events everytime we collect events but probably we should be calling that once when the collector starts up then save the highest cnn as a variable so we aren't hitting the event_streams table on every batch collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrare good point, I will try create a cache to improve performance.
2b1dda5
to
521fe7d
Compare
@Fryguy about you comment I think that refactor impact only the new events, because the old events still gonna have these extra data and I'm only reducing the number of things that be persisted in the full_data column. Also I investigate and we aren't using nothing of the full_data column today, before this PR we were saving all the attributes of the XClarityClient::Event object and now we only saving the interesting to Lenovo in the full_data. The old full_data:
The new full_data:
|
521fe7d
to
412987a
Compare
@CharlleDaniel What is the purpose of the get_last_ems_ref? |
@Fryguy Morning, the method purpose is get the bigger ems_ref and use it to get the new events since this value (ems_ref). I'm just fixing the string comparison and changing the method name to make more sense. |
def parse_events(events) | ||
events.collect do |data| | ||
event = ManageIQ::Providers::Lenovo::PhysicalInfraManager::EventCatcher::Event.new(data).to_hash | ||
ManageIQ::Providers::Lenovo::PhysicalInfraManager::EventParser.event_to_hash(event, @ems.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of splitting these two up? The whole point of the EventParser
is to parse the raw event into a hash so I don't see the utility in adding ManageIQ::Providers::Lenovo::PhysicalInfraManager::EventCatcher::Event
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created this class to filter only the used attributes, but I will refactor this.
@CharlleDaniel what do you think about https://github.com/ManageIQ/manageiq-providers-lenovo/pull/185/files#r192171630 specifically only calling get_last_ems_ref once on startup? |
2b3a6a3
to
331cb3d
Compare
331cb3d
to
28ba18f
Compare
@@ -54,7 +69,7 @@ def create_event_connection(ems) | |||
:port => ems.endpoints.first.port) | |||
end | |||
|
|||
def get_last_cnn_from_events(ems_id) | |||
EventStream.where(:ems_id => ems_id).maximum('ems_ref') || 1 | |||
def bigger_ems_ref(ems_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think last_event_ems_ref
or something would be a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrare It's fine, I changed this name to last_event_ems_ref
, thanks 👍
end | ||
|
||
# Update the @bigger_ems_ref with the new bigger ems_ref if to exist new events | ||
if events.any? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the above loop doesn't filter out any events wouldn't this check be better up at the start of the function?
You can just return if events.blank?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrare so, could you review again ?
a4be0be
to
618d58f
Compare
4a3136e
to
cd1c7bf
Compare
fields | ||
end | ||
|
||
def parse_events(events) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you call this raw_events? Parsing events
and returning events
is confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Checked commits CharlleDaniel/manageiq-providers-lenovo@9e9a560~...cd1c7bf with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0 |
fields | ||
end | ||
|
||
def parse_events(events) | ||
if events.any? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer return if events.blank?
instead of having the whole method in a single conditional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, done.
1846ba6
to
1b4388a
Compare
fields | ||
end | ||
|
||
def raw_events(events) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no other way around, the method is fine I meant change the input and the output variables to have different names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 and parse_events
was a better method name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrare hahaha 😄 now I changed the variables name and back the method name to parse_events.
1c7418f
to
c7f75ab
Compare
@agrare Do you have any request change now? 😰 |
def get_last_cnn_from_events(ems_id) | ||
EventStream.where(:ems_id => ems_id).maximum('ems_ref') || 1 | ||
def last_event_ems_ref(ems_id) | ||
EventStream.where(:ems_id => ems_id).maximum('CAST(ems_ref AS int)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be expensive but now that it is only done once on startup it should be manageable. If this proves to be an issue we can add a column to the events stream table to store this specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I think all the thinks are ok, but if we need in the future add a new column to store this, let me know I can do this. 👍
Reducing the data persited in the full_data columm
This PR is able to:
full_data
column to save only the necessary data.get_last_ems_ref
"8000" > "10000"
.