Getting started with zztat
Now that you have zztat installed and running, what should you do next?
Doing Things Your Way
Many configuration options let you control how the framework runs on your machines. Each one is available at a global scope, which applies to all targets, and at a local scope, which overrides the global setting for a specific database only. A list of all the configuration options and what they do is available here.
Here are a few of the most important ones you may want to set:
The first two are the Heartbeat Timeout and Heartbeat Task Interval parameters, which control how reactive the framework is. Each time the number of seconds specified as the heartbeat task interval has elapsed, the target database contacts the repository to check whether there are any tasks it needs to perform. These include tasks such as synchronizing metrics, gauges or reactions, or getting details for a new blackout. The lower the interval, the more responsive the framework is, and the sooner the targets perform any tasks created by the repository. The default on a new installation is 120 seconds, or two minutes. You may want to lower this, for example to 30 seconds, particularly while you're setting things up.
The heartbeat timeout specifies when a database is considered unavailable. Target databases send a ping to the repository every 20 seconds. If the repository has not received a ping from a target within the number of seconds specified for the heartbeat timeout, it raises an alert.
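To make the timeout rule concrete, here is a minimal Python sketch of the check described above. It is an illustration only, not zztat's actual implementation: the `TargetStatus` class, the `unavailable_targets` function and the example values are hypothetical names chosen for this example. Only the 20-second ping interval comes from the text.

```python
from dataclasses import dataclass

# From the text: targets ping the repository every 20 seconds.
PING_INTERVAL_SECONDS = 20


@dataclass
class TargetStatus:
    """Hypothetical record of when a target last pinged the repository."""
    name: str
    last_ping: float  # time of the most recent ping, in seconds


def unavailable_targets(targets, now, heartbeat_timeout):
    """Return the names of targets whose last ping is older than the timeout."""
    return [t.name for t in targets if now - t.last_ping > heartbeat_timeout]


# Example values are made up for illustration.
targets = [TargetStatus("db1", last_ping=100.0), TargetStatus("db2", last_ping=30.0)]
overdue = unavailable_targets(targets, now=120.0, heartbeat_timeout=60)
print(overdue)
```

In this sketch a timeout shorter than the ping interval would flag healthy targets between pings, so a sensible timeout is presumably a multiple of 20 seconds.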
You can set these parameters in the zztat UI by navigating to the Configuration screen, where you will find both under General Settings. Alternatively, if you feel more comfortable in a terminal than in a browser, you can use zz$manage.reaction_config().
How much space your zztat repository requires is largely dictated by two factors: how many metrics you are running and how frequently you are running them. There is also a third factor, which you can control through the reaction configuration: data retention. The AUTO_PURGE_INTERVAL parameter defines how many days data is kept on the repository before it is purged.
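The way these three factors multiply can be sketched in a few lines of Python. This is a rough back-of-the-envelope illustration, not a zztat tool: the `retained_runs` function and the example intervals are hypothetical, and the sketch counts metric runs rather than bytes, because the number of rows per run and the row size vary by metric.

```python
SECONDS_PER_DAY = 86_400


def retained_runs(metric_intervals_seconds, retention_days):
    """Estimate how many metric runs the repository holds at steady state.

    metric_intervals_seconds: one entry per enabled metric, giving how often
    it runs (hypothetical input for this sketch).
    retention_days: the number of days data is kept before purging.
    """
    runs_per_day = sum(SECONDS_PER_DAY / interval for interval in metric_intervals_seconds)
    return int(runs_per_day * retention_days)


# Two example metrics: one every minute, one every five minutes, kept 30 days.
runs_30_days = retained_runs([60, 300], retention_days=30)
runs_15_days = retained_runs([60, 300], retention_days=15)
print(runs_30_days, runs_15_days)
```

Under these assumptions, halving the retention period halves the retained data, which is why the purge interval is usually the quickest lever when repository space is tight.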
If you want to know how much space a given metric consumes, you can use the space_usage procedure in the zz$diag package to get a fairly accurate prediction. Note, however, that this procedure requires some metric data to return proper results. You may therefore want to run it after the framework has been running for some time with a set of metrics that represents what you are planning for.
One of the most dreaded alerts a DBA can receive is the warning that a tablespace is about to fill up. Some environments get a dozen of these every day, and they always result in the same boring task: adding a new data file or growing an existing one. It's also one of the most critical alerts, because your applications will start to fail once the space runs out. The framework provides three parameters for you to control how it automatically adds data files. I have written about this in a blog post, which you can find here.
Note that you can also configure this in the zztat UI, under the Configuration / Automatic Datafile Management section.
After a fresh installation, the zztat framework knows two email addresses: the centrally configured warning and critical alert addresses. This may be sufficient in some cases, but perhaps you'd like to divide notifications among team members or even teams. You can add any number of additional email addresses and specify exactly what each will receive.
Under the Configuration / Notification Settings section you can launch the email configuration editor, which gives you granular control over who receives which notification and for which target. If you prefer the console, you can achieve the same thing by calling zz$manage.mail_recipient.add() and zz$manage.mail_recipient.remove().
First, if you haven't already, you should run the zzdiag.sql script and ensure that everything was properly installed and that there are no issues with your installation.
The framework comes with many metrics, gauges and reactions. Several of them are disabled out of the box so that the default install is lightweight and has minimal impact on any database system. Have a look at the default metric list and see which ones you'd like to enable.