Reducing the data persisted in the full_data column #185

Merged
merged 2 commits into ManageIQ:master from full_data_columm_cleaning
Jun 5, 2018

Member

CharlleDaniel commented May 23, 2018

This PR:

  • Reduces the data persisted in the full_data column so that only the necessary data is saved.
  • Fixes the method get_last_ems_ref.
    • The method uses maximum on a string column, so the values are compared as strings and a comparison such as "8000" > "10000" returns true.
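
To make the string-comparison problem concrete, here is a minimal plain-Ruby sketch. It does not touch the database, and the sample values are illustrative rather than taken from a real event_streams table.

```ruby
# ems_ref values as they would come back from a string column (sample data).
refs = ["8000", "10000", "9999"]

# Strings compare character by character, so the first character decides here.
string_max = refs.max

# Converting to integers first gives the numeric maximum, which is the effect
# CAST(ems_ref AS int) has on the database side.
numeric_max = refs.map(&:to_i).max

puts "8000" > "10000" # true
puts string_max       # 9999
puts numeric_max      # 10000
```

The string maximum is not even the value a reader might expect from the two-element case: with "9999" in the set, it wins over both "8000" and "10000".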

Member

Fryguy commented May 23, 2018

Does this require a data migration for existing full_data columns (put another way: have events already been released, and if so, how do you deal with existing ones)?

@CharlleDaniel CharlleDaniel force-pushed the full_data_columm_cleaning branch from 8985d7a to 2b1dda5 Compare May 23, 2018 14:31
-  def get_last_cnn_from_events(ems_id)
-    EventStream.where(:ems_id => ems_id).maximum('ems_ref') || 1
+  def get_last_ems_ref(ems_id)
+    EventStream.where(:ems_id => ems_id).maximum('CAST(ems_ref AS int)') || 1
Member

ems_id is already a bigint in the database, so I don't understand the statement about fixing string comparisons.

Member Author

Hey, maximum is applied to ems_ref, which is a string.

Member Author

So I just cast ems_ref to int.

Member

It sounds like you are doing something similar to what the chain_id column is for, so perhaps it's better to store it there? The purpose of the chain_id is to store a chain of events that make up a sequence, and all events in the same chain will have the same id. Additionally, the chain_id is typically a constantly growing id, which is perfect for finding the maximum of. @agrare Thoughts on using chain_id instead?

Member

Additionally, my other concern here is that every event addition will require a somewhat complex operation. This table is one of our largest tables (multiple millions of rows) and maximum is a table scan when done in this way with casting.

Member

Hm, I think this is a little different from chaining events. All of these events are independent, and it looks like get_last_ems_ref is used when asking the provider for events, i.e. "give me all events that occurred since X".

It looks like we call get_last_cnn_from_events every time we collect events, but we should probably call it once when the collector starts up and then keep the highest cnn in a variable, so we aren't hitting the event_streams table on every batch collection.
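
The suggestion above can be sketched roughly as follows. This is only an illustration: EventCollector, FakeStore, max_ems_ref, and the provider lambda are hypothetical names standing in for the real collector, the event_streams table, and the provider API, and the default of 0 for an empty table is an assumption.

```ruby
# Hypothetical collector that reads the highest ems_ref once at startup.
# `store` stands in for the event_streams table; `fetch_since` stands in for
# the provider API call. Both are assumptions for illustration.
class EventCollector
  attr_reader :last_ems_ref

  def initialize(store)
    # The expensive lookup happens exactly once, here.
    @last_ems_ref = store.max_ems_ref || 0
  end

  # Fetch one batch and advance the in-memory marker.
  def collect(fetch_since)
    events = fetch_since.call(@last_ems_ref)
    return [] if events.empty?

    @last_ems_ref = events.map { |e| e[:ems_ref].to_i }.max
    events
  end
end

# Tiny in-memory stand-in that counts how often it is queried.
class FakeStore
  attr_reader :queries

  def initialize(refs)
    @refs = refs
    @queries = 0
  end

  def max_ems_ref
    @queries += 1
    @refs.map(&:to_i).max
  end
end

store = FakeStore.new(%w[8000 10000])
collector = EventCollector.new(store)
provider = ->(since) { [{ :ems_ref => (since + 1).to_s }] }

collector.collect(provider)
collector.collect(provider)

puts collector.last_ems_ref # 10002
puts store.queries          # 1
```

Two batches are collected, but the store is queried only once; after startup the marker lives in memory.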

Member Author

CharlleDaniel Jun 4, 2018

@agrare Good point, I will try to create a cache to improve performance.

@CharlleDaniel CharlleDaniel force-pushed the full_data_columm_cleaning branch from 2b1dda5 to 521fe7d Compare May 23, 2018 14:36
Member Author

CharlleDaniel commented May 23, 2018

@Fryguy About your comment: I think this refactor affects only new events. Old events will still have the extra data; I'm only reducing what gets persisted in the full_data column from now on. I also investigated, and we aren't using anything from the full_data column today. Before this PR we were saving all the attributes of the XClarityClient::Event object; now we save only the ones of interest to Lenovo in full_data.

The old full_data:

{
   "action": 100,
   "args": [],
   "bayText": "Not Available",
   "chassisText": "Not Available",
   "cn": "13",
   "commonEventID": "FQXHMCR0007I",
   "componentID": "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF",
   "eventClass": 800,
   "eventDate": "2017-04-13T13:32:25Z",
   "eventID": "FQXHMCR0007I",
   "eventSourceText": "Management",
   "failFRUs": "",
   "failSNs": "",
   "flags": "",
   "fruSerialNumberText": "Not Available",
   "localLogID": "",
   "localLogSequence": "",
   "location": "",
   "msg": "Management server date and time is synchronized to the NTP server.",
   "msgID": "",
   "mtm": "",
   "originatorUUID": "",
   "parameters": {   },
   "senderUUID": "",
   "serialnum": "",
   "service": 100,
   "serviceabilityText": "Not Required",
   "severity": 200,
   "severityText": "Informational",
   "sourceID": "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF",
   "sourceLogID": "",
   "sourceLogSequence": 0,
   "systemFruNumberText": "Not Available",
   "systemName": "Management Server",
   "systemSerialNumberText": "Not Available",
   "systemText": "Management Server",
   "systemTypeText": "Management",
   "systemTypeModelText": "Not Available",
   "timeStamp": "2017-04-13T13:32:25Z",
   "typeText": "System",
   "userid": "",
   "userIDIndex": 1
}

The new full_data:

{
  "event_type": "snmp_linkUp",
  "ems_ref": "47953",
  "source": "LenovoXclarity",
  "message": "The communication link with ifIndex 100001 is up.",
  "timestamp": "2018-05-18T01:41:35Z",
  "component_id": "00000000000010008000A48CDB984C00",
  "severity": 200,
  "severity_type": "Informational",
  "sender_uuid": "00000000000010008000A48CDB984C00",
  "sender_name": "ThinkAgile-VX-NE1032-SW02",
  "sender_model": "7159-HD1",
  "sender_type": "Switch",
  "type": "Switch"
}
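
As a rough illustration of the reduction shown in the two samples, the sketch below keeps and renames only a handful of keys. The FIELD_MAP pairs and the reduce_event helper are guesses inferred from the samples, not the code in this PR, and the map deliberately leaves out fields whose source is not obvious (for example event_type).

```ruby
# Hypothetical mapping from raw XClarity event keys to the reduced keys.
# These pairs are guesses based on the two samples, not the PR's actual code.
FIELD_MAP = {
  "cn"           => "ems_ref",
  "msg"          => "message",
  "timeStamp"    => "timestamp",
  "componentID"  => "component_id",
  "severity"     => "severity",
  "severityText" => "severity_type",
  "senderUUID"   => "sender_uuid"
}.freeze

# Keep only the mapped keys and rename them; everything else is dropped.
def reduce_event(raw)
  FIELD_MAP.each_with_object({}) do |(raw_key, new_key), reduced|
    reduced[new_key] = raw[raw_key] if raw.key?(raw_key)
  end
end

# A trimmed-down version of the old full_data sample.
raw = {
  "action"       => 100,
  "cn"           => "13",
  "msg"          => "Management server date and time is synchronized to the NTP server.",
  "severity"     => 200,
  "severityText" => "Informational",
  "userIDIndex"  => 1
}

reduced = reduce_event(raw)
puts reduced.keys.inspect
```

Using an explicit allow-list means any attribute the client library adds later is ignored by default instead of silently growing the column.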

@CharlleDaniel CharlleDaniel force-pushed the full_data_columm_cleaning branch from 521fe7d to 412987a Compare May 23, 2018 15:10
@CharlleDaniel CharlleDaniel changed the title Cleaning of the data persited in the full_data columm Reduce of the data persited in the full_data columm May 24, 2018
@CharlleDaniel CharlleDaniel changed the title Reduce of the data persited in the full_data columm Reducing the data persited in the full_data columm May 24, 2018
Member

Fryguy commented May 31, 2018

@CharlleDaniel What is the purpose of the get_last_ems_ref?

Member Author

@Fryguy Morning. The method's purpose is to get the largest ems_ref and use it to fetch the events that are newer than that value. I'm just fixing the string comparison and renaming the method so it makes more sense.

def parse_events(events)
  events.collect do |data|
    event = ManageIQ::Providers::Lenovo::PhysicalInfraManager::EventCatcher::Event.new(data).to_hash
    ManageIQ::Providers::Lenovo::PhysicalInfraManager::EventParser.event_to_hash(event, @ems.id)
Member

What is the point of splitting these two up? The whole point of the EventParser is to parse the raw event into a hash so I don't see the utility in adding ManageIQ::Providers::Lenovo::PhysicalInfraManager::EventCatcher::Event.

Member Author

CharlleDaniel Jun 4, 2018

I created this class to filter only the used attributes, but I will refactor this.

Member

agrare commented Jun 4, 2018

@CharlleDaniel What do you think about https://github.com/ManageIQ/manageiq-providers-lenovo/pull/185/files#r192171630, specifically calling get_last_ems_ref only once on startup?

@CharlleDaniel CharlleDaniel changed the title Reducing the data persited in the full_data columm [WIP] Reducing the data persited in the full_data columm Jun 4, 2018
@miq-bot miq-bot added the wip label Jun 4, 2018
@agrare agrare self-assigned this Jun 4, 2018
@CharlleDaniel CharlleDaniel force-pushed the full_data_columm_cleaning branch from 2b3a6a3 to 331cb3d Compare June 4, 2018 20:39
@CharlleDaniel CharlleDaniel force-pushed the full_data_columm_cleaning branch from 331cb3d to 28ba18f Compare June 4, 2018 20:56
@@ -54,7 +69,7 @@ def create_event_connection(ems)
       :port => ems.endpoints.first.port)
   end

-  def get_last_cnn_from_events(ems_id)
-    EventStream.where(:ems_id => ems_id).maximum('ems_ref') || 1
+  def bigger_ems_ref(ems_id)
Member

I think last_event_ems_ref or something would be a better name.

Member Author

@agrare It's fine, I changed this name to last_event_ems_ref, thanks 👍

end

# Update @bigger_ems_ref with the largest new ems_ref if there are new events
if events.any?
Member

Since the above loop doesn't filter out any events wouldn't this check be better up at the start of the function?
You can just return if events.blank?
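
A minimal sketch of the guard-clause shape being suggested. The method body here is a placeholder rather than the PR's parsing logic, and returning an empty array (instead of nil) is an assumption. events.blank? comes from ActiveSupport; the nil?/empty? check is the plain-Ruby equivalent for arrays.

```ruby
# Guard-clause version: bail out first, then do the real work unindented.
# The body is a placeholder, not the PR's actual parsing logic.
def parse_events(events)
  # In a Rails/ActiveSupport context this line could be `return [] if events.blank?`.
  return [] if events.nil? || events.empty?

  events.map { |event| event.to_s.upcase }
end

puts parse_events(nil).inspect
puts parse_events(["link_up"]).inspect
```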

Member Author

@agrare So, could you review again?

@CharlleDaniel CharlleDaniel force-pushed the full_data_columm_cleaning branch 2 times, most recently from a4be0be to 618d58f Compare June 5, 2018 15:37
@CharlleDaniel CharlleDaniel changed the title [WIP] Reducing the data persited in the full_data columm Reducing the data persited in the full_data columm Jun 5, 2018
@miq-bot miq-bot removed the wip label Jun 5, 2018
@CharlleDaniel CharlleDaniel force-pushed the full_data_columm_cleaning branch 3 times, most recently from 4a3136e to cd1c7bf Compare June 5, 2018 15:47
fields
end

def parse_events(events)
Member

can you call this raw_events? Parsing events and returning events is confusing

Member Author

sure, done.

Member

👍

Member

miq-bot commented Jun 5, 2018

Checked commits CharlleDaniel/manageiq-providers-lenovo@9e9a560~...cd1c7bf with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
3 files checked, 0 offenses detected
Everything looks fine. 🍰

fields
end

def parse_events(events)
if events.any?
Member

prefer return if events.blank? instead of having the whole method in a single conditional

Member Author

ok, done.

@CharlleDaniel CharlleDaniel force-pushed the full_data_columm_cleaning branch 2 times, most recently from 1846ba6 to 1b4388a Compare June 5, 2018 15:59
fields
end

def raw_events(events)
Member

Oh no, the other way around: the method name is fine. I meant change the input and output variables so they have different names.

Member

👍 and parse_events was a better method name

Member Author

@agrare hahaha 😄 I've now changed the variable names and changed the method name back to parse_events.

@CharlleDaniel CharlleDaniel force-pushed the full_data_columm_cleaning branch 2 times, most recently from 1c7418f to c7f75ab Compare June 5, 2018 16:04
Member Author

@agrare Do you have any other change requests now? 😰

-  def get_last_cnn_from_events(ems_id)
-    EventStream.where(:ems_id => ems_id).maximum('ems_ref') || 1
+  def last_event_ems_ref(ems_id)
+    EventStream.where(:ems_id => ems_id).maximum('CAST(ems_ref AS int)')
Member

This is going to be expensive but now that it is only done once on startup it should be manageable. If this proves to be an issue we can add a column to the events stream table to store this specifically.

Member Author

CharlleDaniel Jun 5, 2018

Thank you. I think everything is OK now, but if we need to add a new column to store this in the future, let me know and I can do it. 👍

@agrare agrare merged commit c7f75ab into ManageIQ:master Jun 5, 2018
agrare added a commit that referenced this pull request Jun 5, 2018
Reducing the data persited in the full_data columm
@agrare agrare added this to the Sprint 88 Ending Jun 18, 2018 milestone Jun 5, 2018