src/weatherreport/how_to_add_a_check.md
A new check can be added by creating a new file in the src/ directory named
weatherreport_check_NAME.erl where NAME is a short descriptive check name
(e.g. memory_use).
The file must be an Erlang module that implements the weatherreport_check
behaviour. This requires the following four functions to be implemented (see
the documentation of the weatherreport_check module for more details):
- description/0: Return a short description of what the check does. This is
  printed to the console when weatherreport is run with the -l option.
- valid/0: Check that running the diagnostic check is valid. Any preconditions
  required by the check (e.g. cluster connectivity) should be verified here.
  If a check has no prerequisites, this function can simply return true.
- check/0: The function that actually performs the check. Typically this
  involves calls to the local OS (via weatherreport_util:run_command/1),
  calls to the local cluster node (via weatherreport_node:local_command/3), or
  calls to the cluster (via weatherreport_node:cluster_command/3). This
  function should return a list of tuples of the form {LogLevel, Message},
  where LogLevel is an atom that specifies a supported log level (e.g.
  warning or info) and Message is any Erlang term that is matched by the
  format/1 function.
- format/1: This function formats the messages returned by check/0, and its
  clauses must match all possible messages returnable by check/0. It should
  return a tuple of the form {String, Args}, where String is the format
  string and Args is the list of formatting arguments. The format string
  should be a human-readable description of the message.
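As a starting point, the four callbacks can be sketched as a minimal skeleton. The module name weatherreport_check_example and the example_ok message are hypothetical placeholders chosen for illustration; this check has no prerequisites and always reports a single info message.

```erlang
%% Hypothetical skeleton: src/weatherreport_check_example.erl
%% The module name and the example_ok message are illustrative assumptions.
-module(weatherreport_check_example).
-behaviour(weatherreport_check).

-export([description/0, valid/0, check/0, format/1]).

%% One-line summary, shown when weatherreport is run with -l.
-spec description() -> string().
description() ->
    "Example check that always reports success".

%% This check has no prerequisites, so it is always valid.
-spec valid() -> boolean().
valid() ->
    true.

%% Return a list of {LogLevel, Message} tuples.
-spec check() -> [{atom(), term()}].
check() ->
    [{info, example_ok}].

%% One clause for every message form that check/0 can return.
-spec format(term()) -> {io:format(), [term()]}.
format(example_ok) ->
    {"Example check passed", []}.
```

A real check would replace the body of check/0 with calls to the OS or to cluster nodes, and add a format/1 clause for each new message form.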
The following annotated example is based on weatherreport_check_memory_use.erl;
the file header and licence are omitted.
%% @doc Diagnostic that checks the current memory usage. If memory
%% usage is high, a warning message will be sent, otherwise only
%% informational messages.
The module begins with an edoc declaration which provides a full description of
the check. Any relevant details which cannot be communicated in the one-line
string returned by the description/0 function should be included here.
-module(weatherreport_check_memory_use).
-behaviour(weatherreport_check).
-export([description/0,
         valid/0,
         check/0,
         format/1]).
The module name is specified, the weatherreport_check behaviour is set and the
functions required by that behaviour are exported.
-spec description() -> string().
description() ->
    "Measure memory usage".
Define description/0 which returns a concise description for inclusion in
command line output.
-spec valid() -> boolean().
valid() ->
    weatherreport_node:can_connect().
Define valid/0 which is used to check that we can connect to the local cluster
node. Connectivity to the local node is required in this check so that the OS
process ID can be obtained.
-spec check() -> [{atom(), term()}].
check() ->
    Pid = weatherreport_node:pid(),
    Output = weatherreport_util:run_command("ps -o pmem,rss -p " ++ Pid),
    %% Split on spaces and newlines; the first two tokens are the column headers.
    [_, _, Percent, RealSize | _] = string:tokens(Output, " \n"),
    Messages = [{info, {process_usage, Percent, RealSize}}],
    case weatherreport_util:binary_to_float(list_to_binary(Percent)) >= 90 of
        false ->
            Messages;
        true ->
            [{critical, {high_memory, Percent}} | Messages]
    end.
The actual code that carries out the check. Note that an info message is
always returned and a critical message is prepended to the Messages list
only if memory usage reaches a hard-coded threshold (90%). Note also that there
are two message forms: {process_usage, Percent, RealSize} and
{high_memory, Percent}. When format/1 is defined it must match both of
these message forms.
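The parsing and threshold logic can be tried in isolation, without a running node. The sketch below assumes ps prints a header line followed by one line of values; the module memory_use_sketch and its functions parse/1, to_number/1 and messages/2 are hypothetical helpers, not part of weatherreport, and to_number/1 stands in for weatherreport_util:binary_to_float/1.

```erlang
%% Hypothetical helper module for illustration; not part of weatherreport.
-module(memory_use_sketch).
-export([parse/1, to_number/1, messages/2]).

%% Split ps output on spaces and newlines. The first two tokens are the
%% column headers; the next two are the values we want.
-spec parse(string()) -> {string(), string()}.
parse(Output) ->
    [_, _, Percent, RealSize | _] = string:tokens(Output, " \n"),
    {Percent, RealSize}.

%% Convert a string such as "12.5" or "90" to a number.
-spec to_number(string()) -> number().
to_number(Str) ->
    case string:to_float(Str) of
        {Float, _Rest} when is_float(Float) -> Float;
        {error, no_float} -> list_to_integer(Str)
    end.

%% Build the message list: always an info message, plus a critical
%% message at the front when usage is at or above 90 percent.
-spec messages(string(), string()) -> [{atom(), term()}].
messages(Percent, RealSize) ->
    Messages = [{info, {process_usage, Percent, RealSize}}],
    case to_number(Percent) >= 90 of
        false -> Messages;
        true -> [{critical, {high_memory, Percent}} | Messages]
    end.
```

For example, memory_use_sketch:parse("%MEM   RSS\n 1.5 123456\n") yields {"1.5", "123456"}, and memory_use_sketch:messages("95.2", "1000") yields a two-element list with the critical message first.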
-spec format(term()) -> {io:format(), [term()]}.
format({high_memory, Percent}) ->
    {"Memory usage is HIGH: ~s% of available RAM", [Percent]};
format({process_usage, Percent, Real}) ->
    {"Process is using ~s% of available RAM, totalling ~s KB of real memory.",
     [Percent, Real]}.
Finally, format/1 is defined. There are two function clauses, one to match each
of the message forms that can be returned by check/0. The tuple returned by this
function will eventually be used to generate the text displayed in the console
output.
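To see how a {String, Args} tuple becomes console text, it can be expanded with io_lib:format/2. This is only a sketch of the idea: the module format_sketch, the function render/2 and the "[level]" prefix are assumptions made for illustration, and weatherreport's own output code may differ.

```erlang
%% Hypothetical module for illustration; weatherreport's real output
%% routine may format messages differently.
-module(format_sketch).
-export([render/2]).

%% Expand a {String, Args} tuple into a flat string, prefixed with the
%% log level in square brackets.
-spec render(atom(), {io:format(), [term()]}) -> string().
render(Level, {String, Args}) ->
    lists:flatten(io_lib:format("[~p] " ++ String, [Level | Args])).
```

Calling format_sketch:render(critical, {"Memory usage is HIGH: ~s% of available RAM", ["95.2"]}) returns "[critical] Memory usage is HIGH: 95.2% of available RAM".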