New Nagios UI, with performance data persistence, alerting and Grafana dashboards.
## Components
* Nagios 4 - The industry standard in infrastructure monitoring
* Prometheus - The leader in statistical data gathering
* Grafana - The best known and most flexible graph dashboarding tool
* Alertmanager - Prometheus component that centralises and unifies alerting from various alert sources
# TODO
## Essential
* UI Authentication (ID sharing with Grafana if possible) based on Nagios contacts
* Nagios config editing system?
* External volumes for Prometheus and Grafana
* Add config (if not present)
  * prometheus.yaml
    * Use subdirs to allow additional files
  * grafana.ini
* Grafana default data source
* Grafana default graphs (Hosts/Services)
* Should Grafana allow logins and changes?
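
The "add config (if not present)" item could be handled at container start-up by a small helper along these lines. This is only a sketch: the function name `ensure_default_config` and the file contents passed to it are hypothetical, and the real defaults for `prometheus.yaml` and `grafana.ini` still need to be decided.

```python
from pathlib import Path
import tempfile


def ensure_default_config(path: Path, default_text: str) -> bool:
    """Write default_text to path only if no file exists there yet.

    Returns True if a default file was created, False if an existing
    file (for example one on a mounted external volume) was left alone.
    """
    # Create parent directories so configs can live in subdirs.
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" raises FileExistsError instead of overwriting.
        with path.open("x", encoding="utf-8") as fh:
            fh.write(default_text)
    except FileExistsError:
        return False
    return True


if __name__ == "__main__":
    # Demo in a temporary directory; placeholder content, not a real config.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "prometheus" / "prometheus.yaml"
        print(ensure_default_config(target, "# placeholder default\n"))
        print(ensure_default_config(target, "# would not be written\n"))
```

Opening the file in exclusive-create mode avoids a separate "exists?" check, so a config supplied by the user is never replaced.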
## Next steps - Questions
* Alertmanager?
* Should Prometheus be able to send alerts?
  * We could do this by setting up a rule based on the `nagios_state` label
* Should Nagios be able to send alerts?
  * Should this be delegated to Prometheus?
* Should Grafana be able to send alerts?
* What if one component is failing (e.g. Nagios config fail, or Prometheus DB corrupt)?
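
As a rough illustration of the rule based on the `nagios_state` label mentioned above, the sketch below builds a Prometheus alerting rule group as a Python dict and prints it as JSON. Everything except the `nagios_state` label is an assumption: the metric name `nagios_check`, the state value `CRITICAL`, the alert name and the severity label are placeholders. JSON is also valid YAML, so the output should be loadable as a rule file, but it would be worth confirming with `promtool check rules`.

```python
import json


def nagios_state_rule(metric: str, state: str, hold: str = "5m") -> dict:
    """Build a Prometheus rule group that fires while any series of
    `metric` carries the label nagios_state=<state>.

    `metric` and `state` are hypothetical; use whatever is actually exported.
    """
    expr = f'{metric}{{nagios_state="{state}"}}'
    return {
        "groups": [
            {
                "name": "nagios-state",
                "rules": [
                    {
                        "alert": "NagiosStateAlert",  # placeholder name
                        "expr": expr,
                        "for": hold,  # state must persist this long before firing
                        "labels": {"severity": "critical"},
                        "annotations": {
                            "summary": "Nagios check reports state " + state
                        },
                    }
                ],
            }
        ]
    }


rule_file = nagios_state_rule("nagios_check", "CRITICAL")
print(json.dumps(rule_file, indent=2))
```

If Prometheus owns this rule, Alertmanager would receive the alert from Prometheus rather than from Nagios, which bears on the delegation question above.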
## Hard but useful - Questions
* Clustering?
  * Prometheus (sort of has this functionality)
  * Grafana - need to sync graphs/datasources on change
  * Alertmanager (built in)
  * For Nagios
    * How to heartbeat?
    * Who is master?
    * Should all nodes gather data?
* Should Prometheus support alerting?
* Should Nagios use Alertmanager?
# Notes
https://grafana.com/docs/grafana/latest/http_api/dashboard/
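
The dashboard HTTP API linked above could be used to load the default Hosts/Services graphs. A minimal sketch, assuming a Grafana instance at `http://localhost:3000` and an API token (both placeholders), that prepares a `POST /api/dashboards/db` request without sending it:

```python
import json
import urllib.request


def build_dashboard_request(
    base_url: str, api_token: str, title: str
) -> urllib.request.Request:
    """Prepare a request that creates an empty dashboard called `title`."""
    payload = {
        # id/uid of None asks Grafana to create a new dashboard.
        "dashboard": {"id": None, "uid": None, "title": title, "panels": []},
        "overwrite": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_token,
        },
        method="POST",
    )


# Placeholder URL and token; nothing is sent here.
req = build_dashboard_request("http://localhost:3000", "EXAMPLE_TOKEN", "Hosts")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would submit it to a running Grafana.
```

The panel definitions are left empty here; the actual default graphs would go in the `panels` list.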