diff --git a/simplexalerts/index.md b/simplexalerts/index.md
index 9373168..811e3dd 100644
--- a/simplexalerts/index.md
+++ b/simplexalerts/index.md
@@ -27,16 +27,20 @@ Simple threshold-based alert are reactive by nature, but their automated monitor
 - Threshold-based: a [SMARTCTL](https://en.wikipedia.org/wiki/Smartctl) alert creating a notification when any hard drive within your infrastructure crosses a pre-failure threshold
+~~~
 smartctl_device_attribute{attribute_flags_long=\~".*prefailure.*", attribute_value_type="value"} <=
 on (device, attribute_id, instance, attribute_name)
 smartctl_device_attribute{attribute_flags_long=\~".*prefailure.*", attribute_value_type="thresh"}
+~~~
 - Statistical (anomaly detection): CPU spike or under-use
+~~~
 cpu_percentage_use
 >
 (avg_over_time(cpu_percentage_use[5m]) + (3* stddev_over_time(cpu_percentage_use[5m])))
 OR
 cpu_percentage_use
 <
 (avg_over_time(cpu_percentage_use[5m]) - (3* stddev_over_time(cpu_percentage_use[5m])))
+~~~
 ## Associated Risks