Integrate healthcheck in netbox #8831

Closed
jcralbino opened this issue Mar 9, 2022 · 9 comments
Labels: status: accepted (This issue has been accepted for implementation); type: feature (Introduction of new functionality to the application)

Comments

@jcralbino

jcralbino commented Mar 9, 2022

NetBox version

v3.1.9

Feature type

New functionality

Proposed functionality

To improve monitoring of the key components of the NetBox infrastructure, we could leverage the django-health-check app.

Using a dedicated URL such as https://netbox.fqdn/healthcheck, we would be able to run health checks against the applications NetBox requires to run, and eventually monitor them.

We should be able to check:

  • status of DB
  • status of Redis and queues
  • status of nginx

  • other relevant metrics
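
As a rough illustration of what enabling django-health-check could look like in a generic Django project, here is a minimal settings sketch. It assumes a 3.x release of the package, where the checks ship as separate apps; the Redis URL is a placeholder, and how this would map onto NetBox's own configuration is not covered here.

```python
# Settings fragment (sketch): enable django-health-check and selected checks.
INSTALLED_APPS = [
    # ... existing apps ...
    "health_check",                # core views and check registry
    "health_check.db",             # database round-trip check
    "health_check.contrib.redis",  # Redis connectivity check (reads REDIS_URL)
]

# Connection string used by the Redis check; host and port are placeholders.
REDIS_URL = "redis://localhost:6379"

# urls.py would then expose the checks under the proposed path, for example:
#     path("healthcheck/", include("health_check.urls"))
```

Nginx sits in front of the application, so it would not be covered by these checks; a successful response through the proxy is the usual indirect signal.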

Use case

Adding a healthcheck endpoint that can validate the status of the database, the Nginx service, and the Redis service would give admins the information they need to confirm that the service is running in good condition.

This is particularly relevant in distributed environments like Kubernetes.

The information provided by this endpoint can then be consumed via the API. Relevant values could be:

  • dBstatus: indicates that the DB is working and replying to queries from the application
  • redisstatus: indicates that Redis is operating well and caching is occurring
  • nginx: indicates whether the service is functional
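
To make the shape of such a response concrete, here is a minimal standard-library sketch of how per-component checks could be aggregated into a JSON body and an HTTP status code. The function names (`run_checks`, `check_db`, `check_redis`) and the placeholder check bodies are hypothetical and only illustrate the idea; they are not part of NetBox or django-health-check.

```python
import json
from typing import Callable, Dict, Tuple


def run_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[int, str]:
    """Run each check; a check signals failure by raising an exception."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # report the failure instead of crashing
            results[name] = f"error: {exc}"
    healthy = all(value == "ok" for value in results.values())
    # 200 when every check passes, 503 (Service Unavailable) otherwise.
    return (200 if healthy else 503), json.dumps(results, sort_keys=True)


def check_db() -> None:
    # Placeholder: a real check might run "SELECT 1" against the database.
    pass


def check_redis() -> None:
    # Placeholder that simulates an unreachable Redis instance.
    raise ConnectionError("connection refused")


if __name__ == "__main__":
    status, body = run_checks({"dBstatus": check_db, "redisstatus": check_redis})
    print(status, body)
```

Returning a non-2xx status when any component fails lets monitoring tools and orchestrators act on the status code alone, while the JSON body tells an admin which component is at fault.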

Database changes

No response

External dependencies

No response

@jcralbino jcralbino added the type: feature Introduction of new functionality to the application label Mar 9, 2022
@jeremystretch
Member

Improve the monitoring of the environment running netbox

We're going to need a more substantial explanation of the use case here, especially if the proposal is to introduce a new dependency. Please elaborate on the specific features you'd like to see and why.

@jeremystretch jeremystretch added the status: revisions needed This issue requires additional information to be actionable label Mar 28, 2022
@okamidash

I'm not the original issue creator, but I want to throw my hat into the ring.
Adding a healthcheck endpoint would allow admins using NetBox in distributed deployments (such as Kubernetes) to monitor the health and readiness of the application in a more realistic manner.

As it currently stands, there is no way of knowing whether NetBox can accept requests, beyond a TCP socket being open. That doesn't inspire much confidence in the application, regardless of its actual state.

Please implement this, if you have the time :3

@jcralbino
Author

Hello,
Thanks for looking into this.
Our setup is a distributed environment where the DB and Redis run on a separate virtual machine, so having this endpoint would simplify our monitoring of this service.

I have updated the description of the use case

@jeremystretch jeremystretch added status: under review Further discussion is needed to determine this issue's scope and/or implementation and removed status: revisions needed This issue requires additional information to be actionable labels Mar 29, 2022
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. NetBox is governed by a small group of core maintainers which means not all opened issues may receive direct feedback. Please see our contributing guide.

@github-actions github-actions bot added the pending closure Requires immediate attention to avoid being closed for inactivity label May 29, 2022
@jeremystretch jeremystretch added needs milestone Awaiting prioritization for inclusion with a future NetBox release and removed status: under review Further discussion is needed to determine this issue's scope and/or implementation pending closure Requires immediate attention to avoid being closed for inactivity labels Jun 7, 2022
@riconem

riconem commented Aug 19, 2022

Hey,
is there any progress on this feature?

This feature would be great for monitoring the status of NetBox and its dependencies.

In our setup we sometimes have connection issues, but it's hard to say where these issues come from. With this feature we could simply see whether the failure is in the NetBox setup or in other network elements.

@arthanson arthanson self-assigned this Nov 2, 2022
@jeremystretch jeremystretch added status: accepted This issue has been accepted for implementation and removed needs milestone Awaiting prioritization for inclusion with a future NetBox release labels Nov 3, 2022
@jeremystretch jeremystretch modified the milestone: v3.4 Nov 3, 2022
@jeremystretch
Member

We discussed this a bit more in today's maintainers meeting, and the consensus was that a plugin would be a slightly better fit. The primary rationale for this decision was that some subset of users inevitably will want to disable the functionality, and our approach to date of providing a toggle in the form of a configuration parameter is something of an anti-pattern.

Packaging this as a plugin (which is now possible in v3.4 per #9880) has the advantage of allowing users to introduce the new functionality as desired. It might also prompt heightened interest versus a core feature, as plugins are much more accessible to the casual contributor.

@arthanson are you still volunteering to own the plugin creation?

@stavros-k

@jeremystretch Continuing from #10825

I'm working on a Helm chart for the TrueNAS Scale catalog.

Healthchecks are needed for:

  • Easy testing on CI when upgrading releases.
  • Make sure pods/containers are healthy, if not restart them (k8s feature).

It does not have to be an http endpoint.
Even a binary or a script in the container that exits with 0 if everything is fine or 1 if not would be enough.

Healthcheck should be available for worker as well.
I guess for the main container you can hit the /login and see if it's up.
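
A minimal sketch of the exit-code style of check described above, using only the Python standard library. The default URL (host, port, and trailing slash) is an assumption for illustration, and `is_healthy` and `main` are hypothetical names, not something shipped with NetBox.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx, or a malformed URL.
        return False


def main(argv: list) -> int:
    """Map the probe result to a process exit code: 0 healthy, 1 unhealthy."""
    # Assumed default: the login page of a locally reachable instance.
    url = argv[1] if len(argv) > 1 else "http://localhost:8080/login/"
    return 0 if is_healthy(url) else 1
```

To use this as a standalone probe script, one would import `sys` and end the file with `sys.exit(main(sys.argv))`, so that the container runtime or orchestrator sees exit code 0 or 1. Unlike a bare TCP check, this only succeeds when the application actually serves a response.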

For housekeeper, I don't think there is currently a way to health check it,
as it's an infinite loop.
If it crashes, the container will exit and k8s will restart it anyway.

@tobiasge
Member

We added some health checks to the docker-compose.yml in the last release.
Those also work in OpenShift, where I use the containers, so you can reuse them in the Helm chart.

@arthanson
Collaborator

Closing, as this has been completed as a healthcheck plugin: https://github.com/netbox-community/netbox-healthcheck-plugin. Please add any feature requests or issues there.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 19, 2023
7 participants