# nag-next

New Nagios UI, with performance data persistence, alerting and Grafana dashboards.

## Components
* Nagios 4 - The industry standard in infrastructure monitoring
* Prometheus - The leading open-source system for collecting and storing time-series metrics
* Grafana - The best known and most flexible graph dashboarding tool
* Alertmanager - Prometheus component that centralises and unifies alerting from various alert sources

# TODO

## Essential
* UI Authentication (ID sharing with Grafana if possible) based on Nagios contacts
* Nagios config editing system?
* External volumes for Prometheus and Grafana
* Add default config (if not present)
  * prometheus.yaml
    * Use subdirs to allow additional files
  * grafana.ini
  * Grafana default data source
  * Grafana default graphs (Hosts/Services)
  * Should Grafana allow logins and changes?
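
As a rough sketch of the "use subdirs" idea for the Prometheus config: Prometheus accepts glob patterns in `rule_files` and in `file_sd_configs`, so a default config could pull extra files from subdirectories. The directory paths, the job name and the scrape interval below are assumptions for illustration, not decisions made in this project.

```yaml
# Sketch of a default Prometheus config, assuming it lives in /etc/prometheus.
global:
  scrape_interval: 60s        # assumed value, tune to the Nagios check interval

# Any rule file dropped into this (hypothetical) subdirectory is loaded.
rule_files:
  - /etc/prometheus/rules.d/*.yml

scrape_configs:
  - job_name: nagios          # hypothetical job name
    file_sd_configs:
      # Additional target files can be added here without editing this file.
      - files:
          - /etc/prometheus/targets.d/*.yml
```

With a layout like this, the shipped config stays untouched and users only add files under the two subdirectories.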

## Next steps - Questions
* Alertmanager?
  * Should Prometheus be able to send alerts?
    * We could do this by setting up a rule based on the `nagios_state` label
  * Should Nagios be able to send alerts?
    * Should this be delegated to Prometheus?
  * Should Grafana be able to send alerts?
* What if one component is failing (e.g. a broken Nagios config, or a corrupt Prometheus DB)?
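
For the question of Prometheus sending alerts, a rule based on the `nagios_state` label might look roughly like the sketch below. The metric name `nagios_check_result`, the label value `CRITICAL`, the `for` duration and the `severity` label are all assumptions; only the `nagios_state` label comes from the notes above.

```yaml
# Sketch of a Prometheus alerting rule file; names other than nagios_state are hypothetical.
groups:
  - name: nagios
    rules:
      - alert: NagiosCheckCritical
        # Fires for every series whose nagios_state label is CRITICAL (assumed label value).
        expr: nagios_check_result{nagios_state="CRITICAL"}
        # Wait before firing so a single failed check does not alert (assumed duration).
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Nagios check critical on {{ $labels.instance }}"
```

Prometheus would forward a firing rule like this to Alertmanager, which then handles grouping and notification.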

## Hard but useful - Questions
* Clustering?
  * Prometheus (partially supported: federation, or identical replicas run in parallel)
  * Grafana - Need to sync graphs/datasources on change
  * Alertmanager (clustering is built in)
  * Nagios
    * How to heartbeat?
    * Who is master?
  * Should all nodes gather data?
  * Should Prometheus support alerting?
  * Should Nagios use Alertmanager?

# Notes

https://grafana.com/docs/grafana/latest/http_api/dashboard/