CloudWatch

When running in AWS with CloudWatch integration enabled, the proxy publishes a set of key performance indicators (KPIs) that can be used for monitoring and alerting.

KPIs

The following metrics are provided:

  • Available Memory: The percentage of the memory the proxy is allowed to allocate that is still available for use. This value can change dramatically over time, but if it averages below 20% over a five-minute interval, an alert should be generated;
  • CPU Usage: The percentage of CPU time used by this proxy. Multiple proxies may make this metric harder to gauge, but in a single-proxy environment, if it stays above 70% for more than five minutes, an alert should be raised;
  • DB Transaction Percent: The percentage of queries that are considered to be in transactions;
  • Cache Hit Percentage: The percent of queries that are cache hits. This will vary for each customer, but can be used to set an alarm if caching drops below an expected value;
  • DB Query Rate: The rate at which queries are sent to the database, i.e. queries that are not served from the cache;
  • DB Query Time: The average response time from the database. This can be used to generate alerts if above the expected SLA;
  • Average Response Time: The average response time for all queries, cached or not. This also can be used to generate alerts if above an expected SLA;
  • DB Read Percentage: The percent of queries that are reads, which is a prerequisite for caching or read/write split;
  • SQL Exceptions: The number (per second) of SQL exceptions generated as a result of executing queries.

Each of these metrics is reported once per minute, as part of the per-minute logging that is sent, along with other logs, to the CloudWatch log channel.
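
As an illustration, the two alerting thresholds suggested above (Available Memory and CPU Usage) could be expressed as CloudWatch alarms. The sketch below builds the keyword arguments for boto3's put_metric_alarm call. The namespace "ExampleProxy", the alarm names, and the exact metric name strings are assumptions made for this example and are not taken from the product. Check the CloudWatch console for the namespace, metric names, and dimensions that the proxy actually publishes before using anything like this.

```python
# Hypothetical sketch: the namespace, alarm names, and metric name strings
# below are placeholders, not values confirmed by the product documentation.

def build_alarm_params(alarm_name, metric_name, comparison, threshold,
                       period_seconds, evaluation_periods,
                       namespace="ExampleProxy", dimensions=None):
    """Return keyword arguments suitable for CloudWatch put_metric_alarm."""
    params = {
        "AlarmName": alarm_name,
        "Namespace": namespace,        # assumed namespace
        "MetricName": metric_name,     # assumed metric name
        "Statistic": "Average",
        "Period": period_seconds,      # length of one evaluation window, in seconds
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": comparison,
    }
    if dimensions:
        # Only include dimensions when the caller supplies them.
        params["Dimensions"] = dimensions
    return params


# Available Memory averaging below 20% over one five-minute window.
LOW_MEMORY_ALARM = build_alarm_params(
    "proxy-available-memory-low", "Available Memory",
    "LessThanThreshold", 20, period_seconds=300, evaluation_periods=1)

# CPU Usage above 70% for five consecutive one-minute samples.
HIGH_CPU_ALARM = build_alarm_params(
    "proxy-cpu-usage-high", "CPU Usage",
    "GreaterThanThreshold", 70, period_seconds=60, evaluation_periods=5)


def create_alarms(alarms=(LOW_MEMORY_ALARM, HIGH_CPU_ALARM)):
    """Create the alarms in the account and region boto3 is configured for."""
    import boto3  # imported here so the definitions above load without boto3
    cloudwatch = boto3.client("cloudwatch")
    for params in alarms:
        cloudwatch.put_metric_alarm(**params)
```

Calling create_alarms() requires AWS credentials and permission to create alarms. Because the metrics arrive once per minute, a 60-second period with five evaluation periods approximates "above 70% for more than five minutes", while a single 300-second period with the Average statistic matches "on average for a five-minute interval". In a multi-proxy deployment, per-proxy dimensions would likely be needed, and the CPU threshold may need adjusting.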