Not able to raise memory threshold for MiqEmsRefreshCoreWorker #16945

Closed
tjyang opened this issue Feb 3, 2018 · 2 comments
tjyang commented Feb 3, 2018

Hi,

I am not able to raise the memory threshold to keep workers from exiting when they reach it.

  • MIQ version
    miqwk02 is in region 2 and has the worker role, scanning an OpenStack provider.
[root@miqwk02 vmdb]# git log -1
commit fde3d114e4c232fdf8df8d5af060653e70f02c0c
Merge: f82613e d442c03
Author: Jason Frey <[email protected]>
Date:   Wed Jan 24 14:20:58 2018 -0500

    Merge pull request #16848 from JPrause/spr77_cl

    [CHANGELOG] Update for Sprint 77
[root@miqwk02 vmdb]#

  • Offending log entry in automation.log
[----] I, [2018-02-03T09:17:51.341020 #2015:b6f134]  
INFO -- : MiqAeEvent.build_evm_event >> event=<"evm_worker_memory_exceeded">
 inputs=<{:event_details=>"Worker [MiqEmsRefreshCoreWorker] with ID: [2000000082127], 
PID: [27614], GUID: [91fe327e-6a91-40e7-81e4-31050174fd72] 
process memory usage [607640000] exceeded limit [419430400], requesting worker to exit", 
:type=>"MiqEmsRefreshCoreWorker", "MiqEvent::miq_event"=>2000000167855, 
:miq_event_id=>2000000167855, "EventStream::event_stream"=>2000000167855, 
:event_stream_id=>2000000167855}>
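
For reference, the two byte counts in this event can be converted to MiB to see which limit was actually in force when it was logged. The following is only a minimal sketch: the regular expression and the `parse_memory_exceeded` helper are hypothetical names written for this message text, not part of ManageIQ.

```ruby
# Message text copied from the automation.log entry above, joined onto one line.
line = 'Worker [MiqEmsRefreshCoreWorker] with ID: [2000000082127], PID: [27614], ' \
       'GUID: [91fe327e-6a91-40e7-81e4-31050174fd72] process memory usage [607640000] ' \
       'exceeded limit [419430400], requesting worker to exit'

MIB = 1024 * 1024

# Hypothetical pattern: captures the worker class name, the usage, and the limit (bytes).
PATTERN = /Worker \[(?<worker>\w+)\].*memory usage \[(?<usage>\d+)\] exceeded limit \[(?<limit>\d+)\]/

# Returns a hash with :worker, :usage and :limit, or nil if the line does not match.
def parse_memory_exceeded(line)
  m = PATTERN.match(line)
  return nil unless m

  { worker: m[:worker], usage: m[:usage].to_i, limit: m[:limit].to_i }
end

info = parse_memory_exceeded(line)
puts format('%s: usage %.1f MiB, limit %.1f MiB',
            info[:worker], info[:usage].to_f / MIB, info[:limit].to_f / MIB)
# prints "MiqEmsRefreshCoreWorker: usage 579.5 MiB, limit 400.0 MiB"
```

The limit of 419430400 bytes is exactly 400 MiB, so at the time of this event the worker was still being checked against a 400 MB threshold rather than 1 GB.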

  • Corrective action taken to raise the memory threshold for all workers
    The following are the settings for all workers, from Advanced Settings.
    All 400/500/600 MB memory thresholds were raised to 1 GB.
:workers:
  :worker_base:
    :event_catcher:
      :event_catcher_amazon:
        :poll: 15.seconds
      :event_catcher_azure:
        :poll: 15.seconds
      :event_catcher_google:
        :poll: 15.seconds
      :event_catcher_kubernetes:
        :poll: 1.seconds
      :event_catcher_prometheus:
        :poll: 20.seconds
      :event_catcher_nuage_network:
        :topics:
        - topic/CNAMessages
      :event_catcher_openshift:
        :poll: 1.seconds
      :event_catcher_openstack:
        :poll: 15.seconds
        :topics:
          :nova: notifications.*
          :cinder: notifications.*
          :glance: notifications.*
          :heat: notifications.*
        :duration: 10.seconds
        :capacity: 50
        :amqp_port: 5672
        :amqp_heartbeat: 30
        :amqp_recovery_attempts: 4
        :ceilometer:
          :event_types_regex: "\\A(?!firewall|floatingip|gateway|net|port|router|subnet|security_group|vpn)"
      :event_catcher_openstack_infra:
        :poll: 15.seconds
        :topics:
          :nova: notifications.*
          :cinder: notifications.*
          :glance: notifications.*
          :heat: notifications.*
          :ironic: notifications.*
        :duration: 10.seconds
        :capacity: 50
        :amqp_port: 5672
        :amqp_heartbeat: 30
        :amqp_recovery_attempts: 4
        :ceilometer:
          :event_types_regex: "\\A(?!firewall|floatingip|gateway|net|port|router|subnet|security_group|vpn)"
      :event_catcher_openstack_network:
        :poll: 15.seconds
        :topics:
          :neutron: notifications.*
        :duration: 10.seconds
        :capacity: 50
        :amqp_port: 5672
        :amqp_heartbeat: 30
        :amqp_recovery_attempts: 4
        :ceilometer:
          :event_types_regex: "\\A(firewall|floatingip|gateway|net|port|router|subnet|security_group|vpn)"
      :event_catcher_openstack_service: auto
      :event_catcher_redhat:
        :poll: 15.seconds
      :event_catcher_redhat_network:
        :poll: 15.seconds
        :topics:
          :neutron: notifications.*
        :duration: 10.seconds
        :capacity: 50
        :amqp_port: 5672
        :amqp_heartbeat: 30
        :amqp_recovery_attempts: 4
        :ceilometer:
          :event_types_regex: "\\A(firewall|floatingip|gateway|net|port|router|subnet|security_group|vpn)"
      :event_catcher_vmware:
        :flooding_monitor_enabled: true
        :poll: 1.seconds
        :ems_event_max_wait: 60
      :event_catcher_vmware_cloud:
        :poll: 15.seconds
        :duration: 10.seconds
        :capacity: 50
        :amqp_port: 5672
        :amqp_heartbeat: 30
        :amqp_recovery_attempts: 4
      :defaults:
        :flooding_events_per_minute: 30
        :flooding_monitor_enabled: false
        :ems_event_page_size: 100
        :ems_event_thread_shutdown_timeout: 10.seconds
        :memory_threshold: 2.gigabytes
        :nice_delta: 1
        :poll: 1.seconds
      :event_catcher_ansible_tower:
        :poll: 20.seconds
      :event_catcher_embedded_ansible:
        :poll: 20.seconds
      :event_catcher_lenovo:
        :poll: 15.seconds
      :event_catcher_cinder:
        :poll: 10.seconds
      :event_catcher_swift:
        :poll: 10.seconds
      :memory_threshold: 2.gigabytes
    :queue_worker_base:
      :ems_metrics_collector_worker:
        :ems_metrics_collector_worker_amazon: {}
        :ems_metrics_collector_worker_azure: {}
        :ems_metrics_collector_worker_google: {}
        :ems_metrics_collector_worker_kubernetes:
          :metrics_port: 5000
          :metrics_path: "/hawkular/metrics"
          :prometheus_open_timeout: 5
          :prometheus_request_timeout: 30
        :ems_metrics_collector_worker_openshift: {}
        :ems_metrics_collector_worker_openstack: {}
        :ems_metrics_collector_worker_openstack_infra: {}
        :ems_metrics_collector_worker_openstack_network: {}
        :ems_metrics_openstack_default_service: auto
        :ems_metrics_collector_worker_redhat: {}
        :ems_metrics_collector_worker_redhat_network: {}
        :ems_metrics_collector_worker_vmware: {}
        :defaults:
          :count: 2
          :memory_threshold: 1.gigabytes
          :nice_delta: 3
          :poll_method: :escalate
      :ems_refresh_worker:
        :ems_refresh_worker_amazon: {}
        :ems_refresh_worker_amazon_network: {}
        :ems_refresh_worker_amazon_ebs_storage: {}
        :ems_refresh_worker_amazon_s3: {}
        :ems_refresh_worker_azure: {}
        :ems_refresh_worker_azure_network: {}
        :ems_refresh_worker_google: {}
        :ems_refresh_worker_google_network: {}
        :ems_refresh_worker_kubernetes: {}
        :ems_refresh_worker_kubevirt: {}
        :ems_refresh_worker_openshift: {}
        :ems_refresh_worker_openstack: {}
        :ems_refresh_worker_openstack_infra: {}
        :ems_refresh_worker_openstack_network: {}
        :ems_refresh_worker_redhat: {}
        :ems_refresh_worker_redhat_network: {}
        :ems_refresh_worker_vmware: {}
        :ems_refresh_worker_vmware_cloud: {}
        :ems_refresh_worker_vmware_cloud_network: {}
        :defaults:
          :memory_threshold: 2.gigabytes
          :nice_delta: 7
          :poll: 10.seconds
          :poll_method: :normal
          :queue_timeout: 120.minutes
          :restart_interval: 2.hours
        :ems_refresh_worker_ansible_tower_automation: {}
        :ems_refresh_worker_embedded_ansible_automation: {}
        :ems_refresh_worker_foreman_configuration: {}
        :ems_refresh_worker_foreman_provisioning: {}
        :ems_refresh_worker_lenovo_physical_infra: {}
        :ems_refresh_worker_microsoft: {}
        :ems_refresh_worker_nuage_network: {}
        :ems_refresh_worker_cinder: {}
        :ems_refresh_worker_swift: {}
      :defaults:
        :cpu_usage_threshold: 100.percent
        :dequeue_method: :drb
        :memory_threshold: 1.gigabytes
        :poll_method: :normal
        :queue_timeout: 10.minutes
      :ems_metrics_processor_worker:
        :count: 2
        :memory_threshold: 1.gigabytes
        :nice_delta: 7
        :poll_method: :escalate
      :event_handler:
        :cpu_usage_threshold: 0.percent
        :nice_delta: 7
      :generic_worker:
        :count: 2
        :memory_threshold: 1.gigabytes
      :priority_worker:
        :memory_threshold: 1.gigabytes
        :count: 2
        :nice_delta: 1
        :poll: 1.seconds
      :reporting_worker:
        :count: 2
        :nice_delta: 7
        :memory_threshold: 1.gigabytes
      :smart_proxy_worker:
        :count: 2
        :memory_threshold: 1.gigabytes
        :queue_timeout: 20.minutes
        :restart_interval: 6.hours
        :heartbeat_thread_shutdown_timeout: 10.seconds
    :ems_inventory_collector_worker:
      :ems_inventory_collector_worker_kubernetes:
        :deleted_notices_only: true
        :disabled: true
        :watch_thread_shutdown_timeout: 10.seconds
      :ems_inventory_collector_worker_openshift:
        :deleted_notices_only: true
        :disabled: true
        :watch_thread_shutdown_timeout: 10.seconds
      :disabled: true
      :nice_delta: 1
      :poll: 5.seconds
    :defaults:
      :count: 1
      :gc_interval: 15.minutes
      :heartbeat_freq: 10.seconds
      :heartbeat_timeout: 2.minutes
      :memory_threshold: 1.gigabytes
      :nice_delta: 10
      :parent_time_threshold: 3.minutes
      :poll: 3.seconds
      :poll_escalate_max: 30.seconds
      :poll_method: :normal
      :restart_interval: 0.hours
      :starting_timeout: 10.minutes
      :stopping_timeout: 10.minutes
    :embedded_ansible_worker:
      :starting_timeout: 20.minutes
      :poll: 10.seconds
      :memory_threshold: 0.megabytes
    :agent_coordinator_worker:
      :heartbeat_timeout: 30.minutes
      :poll: 30.seconds
    :ems_refresh_core_worker:
      :poll: 1.seconds
      :nice_delta: 1
      :thread_shutdown_timeout: 10.seconds
    :schedule_worker:
      :container_entities_purge_interval: 1.day
      :binary_blob_purge_interval: 1.hour
      :authentication_check_interval: 1.hour
      :chargeback_generation_interval: 1.day
      :chargeback_generation_time_utc: 3600
      :db_diagnostics_interval: 30.minutes
      :drift_state_purge_interval: 1.day
      :event_streams_purge_interval: 1.day
      :evm_snapshot_delete_delay_for_job_not_found: 1.hour
      :evm_snapshot_interval: 1.hour
      :job_proxy_dispatcher_interval: 15.seconds
      :job_proxy_dispatcher_stale_message_check_interval: 60.seconds
      :job_proxy_dispatcher_stale_message_timeout: 2.minutes
      :job_timeout_interval: 60.seconds
      :load_balancer_retired_interval: 10.minutes
      :log_active_configuration_interval: 1.days
      :log_database_statistics_interval: 1.days
      :memory_threshold: 1.gigabytes
      :nice_delta: 3
      :orchestration_stack_retired_interval: 10.minutes
      :performance_collection_interval: 3.minutes
      :performance_collection_start_delay: 5.minutes
      :performance_realtime_purging_interval: 21.minutes
      :performance_realtime_purging_start_delay: 5.minutes
      :performance_rollup_purging_interval: 4.hours
      :performance_rollup_purging_start_delay: 5.minutes
      :policy_events_purge_interval: 1.day
      :poll: 15.seconds
      :report_result_purge_interval: 1.week
      :server_log_stats_interval: 5.minutes
      :server_stats_interval: 60.seconds
      :service_retired_interval: 10.minutes
      :session_timeout_interval: 30.seconds
      :storage_file_collection_interval: 1.days
      :storage_file_collection_time_utc: 21600
      :task_timeout_check_frequency: 1.hour
      :vim_performance_states_purge_interval: 1.day
      :vm_retired_interval: 10.minutes
      :yum_update_check: 12.hours
    :ui_worker:
      :connection_pool_size: 8
      :memory_threshold: 1.gigabytes
      :nice_delta: 1
      :count: 1
    :vim_broker_worker:
      :memory_threshold: 2.gigabytes
      :nice_delta: 3
      :poll: 1.seconds
      :reconnect_retry_interval: 5.minutes
      :vim_broker_status_interval: 15.minutes
      :vim_broker_update_interval: 0.seconds
      :vim_broker_max_wait: 60
      :vim_broker_max_objects: 250
    :web_service_worker:
      :connection_pool_size: 8
      :memory_threshold: 1.gigabytes
      :nice_delta: 1
      :count: 1
    :websocket_worker:
      :connection_pool_size: 14
      :memory_threshold: 1.gigabytes
      :nice_delta: 1
      :count: 1
    :cockpit_ws_worker:
      :count: 1
jrafanie (Member) commented

@tjyang that should have worked. Are you sure it's saved? It might take a few minutes to honor the settings. You can also try using tools/configure_server_settings.rb with examples here:
#17039
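
As a rough check of what the pasted settings would resolve to, here is a minimal sketch. It assumes that a worker inherits values from the `:defaults` section at each level along its path in the `:workers:` tree, and that the worker's own section overrides them. The `effective_setting` helper and the trimmed `WORKERS` hash are hypothetical illustrations, not ManageIQ code.

```ruby
# Trimmed copy of the pasted :workers: tree; only keys relevant to the lookup are kept.
WORKERS = {
  worker_base: {
    defaults: { memory_threshold: '1.gigabytes' },
    ems_refresh_core_worker: { poll: '1.seconds', nice_delta: 1 },
    queue_worker_base: {
      defaults: { memory_threshold: '1.gigabytes' },
      ems_refresh_worker: {
        defaults: { memory_threshold: '2.gigabytes' },
        ems_refresh_worker_openstack: {}
      }
    }
  }
}.freeze

# Hypothetical helper: walk down `path`, letting each level's :defaults
# override the value inherited so far; the final section's own key wins.
def effective_setting(tree, path, key)
  inherited = nil
  section = tree
  path.each do |name|
    section = section.fetch(name)
    defaults = section[:defaults]
    inherited = defaults[key] if defaults&.key?(key)
  end
  section.fetch(key, inherited)
end

core = effective_setting(WORKERS, %i[worker_base ems_refresh_core_worker], :memory_threshold)
openstack = effective_setting(
  WORKERS,
  %i[worker_base queue_worker_base ems_refresh_worker ems_refresh_worker_openstack],
  :memory_threshold
)
puts "ems_refresh_core_worker:      #{core}"
puts "ems_refresh_worker_openstack: #{openstack}"
```

Under that assumption, `:ems_refresh_core_worker:` has no `:memory_threshold:` of its own in the pasted settings and would pick up `1.gigabytes` from `:worker_base: :defaults:`. The logged limit of 419430400 bytes (400 MB) does not match that, which is consistent with the new settings not yet being saved or applied when the event was logged.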

miq-bot added the stale label Oct 12, 2018

miq-bot commented Oct 12, 2018

This issue has been automatically marked as stale because it has not been updated for at least 6 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions!

tjyang closed this as completed Nov 20, 2018