If you are like many Sensu users, you’ve come to rely on event handlers from the Sensu Plugins project for sending notifications to on-call personnel. These handler plugins automatically apply filtering to reduce alert fatigue and improve the overall signal-to-noise ratio of notifications coming from Sensu. Historically this filtering logic has been implemented in the sensu-plugin framework. This logic will soon be deprecated in the sensu-plugin library and move into the larger Sensu Core platform as first-class features. Read on to learn more about this important change.
It was a good idea at the time…
As a Sensu user, you may have already discovered that there’s a significant difference between using a Sensu Plugins handler and using more generic command line utilities to handle events. Configuring a handler definition which uses a command line utility like mailx to send email notifications will result in a message being sent for each received event, whereas a handler plugin like handler-mailer.rb from the sensu-plugins-mailer gem will generate far fewer messages.
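For illustration, a pipe handler built on a generic utility might look something like the sketch below. The handler name and email address are placeholders rather than values from any real configuration. Sensu passes the event data to the command on STDIN, and nothing in this definition inspects or filters it, so every event that reaches the handler produces a message.

```json
{
  "handlers": {
    "mail": {
      "type": "pipe",
      "command": "mailx -s 'sensu event' oncall@example.com"
    }
  }
}
```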
The reason for this is that handlers from the Sensu Plugins project almost universally build on the Sensu::Handler class provided by the sensu-plugin library. This class applies a series of filters to every event, evaluating whether the event is associated with a check that is silenced, has alerting disabled, or qualifies as a repeated event by way of comparing occurrence attributes (a.k.a. occurrence filtering). The filters in this class are only available to Ruby plugins which use the sensu-plugin library, and even then there are some caveats.
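The attributes consulted by occurrence filtering live on the check definition. A hypothetical check (the name, command, and subscription below are made up for illustration) might set them as follows. Under these assumed values, the library's filtering would be expected to hold back notification until the third consecutive occurrence, and then repeat it roughly every 1800 seconds while the problem persists.

```json
{
  "checks": {
    "check_website": {
      "command": "check-http.rb -u https://example.com",
      "subscribers": ["web"],
      "interval": 60,
      "occurrences": 3,
      "refresh": 1800,
      "handlers": ["mailer"]
    }
  }
}
```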
When filtering was added to sensu-plugin circa 2011, abstracting this behavior into a shared library made sense. Native filters as we know them now had not yet been added to Sensu, and since that time this common filtering behavior has helped handler authors avoid reimplementing the same filtering pattern in each handler.
Fast-forwarding to today, users are increasingly finding that filtering inside the handler plugin itself can be computationally expensive and may lead to unexpected alerting behaviors.
Ideally a Sensu event that will be filtered out should be dropped as early on in processing as possible. Forking a child process to perform event filtering inside the handler plugin requires more compute resources (CPU, RAM) than using a native Sensu filter. Wherever possible, forking a child process to execute a handler plugin should be avoided in order to keep resource usage low.
Without an understanding of sensu-plugin’s internals, it seems reasonable to assume that a handler configured with a set of native Sensu filters ought to notify the on-call operator if an event successfully navigates those filters. Unfortunately, due to aspects of sensu-plugin’s logic which cannot currently be overridden via configuration, the reality is that such events may be unexpectedly filtered. This has led users to open a number of issues (e.g. sensu #1067, #1092, #1097, sensu-plugin #88) and pull requests (e.g. sensu-plugin #91, #107) around this sort of unexpected behavior.
For example, an event which has alerted on-call personnel may resolve itself without human intervention, but under the current occurrence filtering implementation resolution events can be unexpectedly filtered. The on-call personnel who would benefit from knowing the incident is resolved, allowing them to return to sleep, instead must manually review the state of alerts to determine that the event has been resolved.
These are just a few examples of the feedback we’ve received from Sensu users. There are almost certainly many more such issues already closed or simply unreported.
So where do we go from here?
UPDATE: Since this post was first published we’ve implemented many of the changes described herein. See our update below.
As stewards of Sensu we want to ensure success for users by providing functionality that is both obvious and consistent. The current situation leaves much to be desired on both counts, and changing occurrence filtering logic will have a significant impact on how operators experience on-call rotations with Sensu. To remedy these issues we are planning to take the following actions over the coming weeks:
- Publish improved documentation around native Sensu filters, including examples which reproduce the current event filtering logic of sensu-plugin.
- Release an open source Sensu filter extension that reproduces sensu-plugin occurrence filtering logic
- Publish a new minor version release of sensu-plugin (1.4.0) which:
  - Supports disabling the library’s built-in occurrence filtering on a per-check basis
  - Prints deprecation warnings in the Sensu log file when the built-in filtering is used
- Update plugin gems published by the Sensu Plugins project to depend on sensu-plugin 1.4.0, once available
- Subsequently, publish a new major version release of sensu-plugin (2.0.0) which disables occurrence filtering logic by default
- Update plugin gems published by the Sensu Plugins project to depend on sensu-plugin 2.0.0, once available
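As a rough idea of what reproducing occurrence filtering with a native filter could look like, the sketch below defines an inclusive filter for checks that run every 60 seconds. The filter name is arbitrary, and the expression is only an approximation: it passes the first occurrence and then every 60th one (about once an hour), rather than mirroring sensu-plugin's logic exactly.

```json
{
  "filters": {
    "filter_interval_60_hourly": {
      "negate": false,
      "attributes": {
        "check": {
          "interval": 60
        },
        "occurrences": "eval: value == 1 || value % 60 == 0"
      }
    }
  }
}
```

A handler would opt in to a filter like this by listing its name in the handler definition's filters attribute.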
The current situation is such that we can’t move things forward without requiring some work from you, the Sensu operator. We need you to help us spread the word about this change by sharing this post with your friends and colleagues who use Sensu, and validate the documentation and software enhancements described above as they become available by adopting them into your Sensu environment. We believe that by making event filtering more configurable and decoupled from the sensu-plugin framework, native Sensu filters will become more effective for and better understood by all Sensu users—regardless of the executables they choose to use as handlers—which will be a win for our entire community.
For the TL;DR crowd, here’s a condensed summary of the points laid out above:
- By default, handler plugins from the Sensu Plugins project filter events in a way which is relatively resource intensive, non-obvious and often confusing
- Starting with the next minor release of the sensu-plugin library (1.4.0), this default event filtering will generate deprecation warnings in handler output which will be visible in the Sensu server log file
- Documentation will be enhanced to illustrate how filtering currently implemented in sensu-plugin can be reproduced using native filters
- To continue using the current occurrence filtering behavior beyond sensu-plugin 1.4.0, Sensu users will need to update their handler configurations to make use of new filter extensions and native filters
- Future releases of Sensu will include enhancements (e.g. first-class silencing, occurrence filtering extension) to make this transition easier
Update: 6 September 2016
Since this writeup was originally posted, we’ve been busy working on these and other improvements in preparation for the release of Sensu 0.26. This new release includes the following:
- New sensu-plugin version 1.4, which deprecates the library’s built-in filter behavior.
- New first-class silencing API, enabling new approaches to silencing independent of sensu-plugin library.
- New occurrences filter extension, an improved implementation of sensu-plugin’s now-deprecated occurrence filtering.
- Event filter documentation now includes additional eval filter examples, among others.
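A handler might adopt the new occurrences filter along the lines of the sketch below. This assumes the extension is registered under the name occurrences and that it reads the same occurrences and refresh check attributes that sensu-plugin did; the handler name is a placeholder.

```json
{
  "handlers": {
    "mailer": {
      "type": "pipe",
      "command": "handler-mailer.rb",
      "filters": ["occurrences"]
    }
  }
}
```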
Although Sensu 0.26 will be the first release to include this new version of sensu-plugin, those who are interested can install the new version in an existing Sensu environment by running:
sudo /opt/sensu/embedded/bin/gem install sensu-plugin -v '~> 1.4'
With sensu-plugin 1.4.0 or later installed, deprecated filter behavior can be disabled on a per-check basis by setting the value of the enable_deprecated_filtering check attribute to false. The value of enable_deprecated_filtering defaults to true, meaning that sensu-plugin will continue to apply filtering as it has done in previous versions, but will print warnings when events are filtered by the now-deprecated code.
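As a minimal sketch, a check that opts out of the deprecated filtering could be defined as follows. Only the enable_deprecated_filtering attribute comes from the text above; the check name, command, subscription, and handler are hypothetical.

```json
{
  "checks": {
    "check_website": {
      "command": "check-http.rb -u https://example.com",
      "subscribers": ["web"],
      "interval": 60,
      "handlers": ["mailer"],
      "enable_deprecated_filtering": false
    }
  }
}
```

With the attribute set this way, any occurrence filtering for the check has to come from filters configured on the handler itself.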