How to resolve Sprinto check for monitoring ElastiCache datastore CPU utilization

About

Sprinto Check: AWS ElastiCache Datastore CPU Utilization Monitoring

When you integrate your AWS account with Sprinto, the platform automatically retrieves the list of services associated with your account. If you're using AWS ElastiCache and haven't configured a monitoring alert for the CPUUtilization metric, Sprinto activates this check for all ElastiCache clusters under your account.

Note: At this time, our platform does not support AWS CloudWatch Composite Alarms or Math-Based Alarms. This means you cannot create alarms that:
- Combine multiple alarms using logical conditions (e.g., ALARM1 AND ALARM2)
- Use metric math expressions (e.g., calculating averages or deltas across metrics)
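
To check which kind of alarm you already have, the sketch below classifies an alarm description by the fields it carries. It assumes the dictionaries follow the shape returned by the CloudWatch DescribeAlarms API (composite alarms carry an AlarmRule, metric-math alarms carry a Metrics list with an Expression). The function name and sample alarms are hypothetical, and this is not Sprinto's own detection logic.

```python
from typing import Any, Dict


def alarm_kind(alarm: Dict[str, Any]) -> str:
    """Classify an alarm description by the fields it carries.

    Assumption: the dict mirrors a CloudWatch DescribeAlarms entry.
    """
    if "AlarmRule" in alarm:
        # Composite alarms combine other alarms with AND / OR / NOT.
        return "composite"
    if any("Expression" in query for query in alarm.get("Metrics", [])):
        # Metric-math alarms evaluate an expression over one or more metrics.
        return "metric-math"
    if alarm.get("MetricName"):
        # Plain alarm on a single metric, such as CPUUtilization.
        return "single-metric"
    return "unknown"


# Hypothetical sample alarms, for illustration only.
plain = {
    "AlarmName": "cache-cpu-high",
    "Namespace": "AWS/ElastiCache",
    "MetricName": "CPUUtilization",
}
composite = {
    "AlarmName": "cache-any",
    "AlarmRule": "ALARM(cache-cpu-high) AND ALARM(cache-mem-high)",
}
math_based = {
    "AlarmName": "cache-cpu-avg",
    "Metrics": [{"Id": "e1", "Expression": "AVG(METRICS())", "ReturnData": True}],
}

for sample in (plain, composite, math_based):
    print(sample["AlarmName"], "->", alarm_kind(sample))
```

Only the alarm reported as single-metric matches the kind of alarm this article describes.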

Purpose

Monitoring the CPUUtilization metric in your AWS ElastiCache environment helps you maintain performance and allocate resources efficiently. Sprinto sends notifications if CPU utilization exceeds the defined threshold, so you can act before performance degrades.

How to Resolve

To address this check, follow these steps to create an AWS CloudWatch alert to monitor AWS ElastiCache CPU utilization.

Before you Begin

Ensure you have "Admin" access on the AWS account to create alerts.
Confirm the existence of ElastiCache clusters on your AWS account for which you want to set up alerts.
Log in to Sprinto as an administrator.

Create Monitoring Alert

Log in to your AWS console using your credentials.
Navigate to the CloudWatch web service.
Go to Alarms > In Alarm, and click Create Alarm.
Click Select Metric.
On the Select Metric page, choose the AWS/ElastiCache namespace and select the metric CPUUtilization.
Specify the metric and conditions to define the alert trigger point, then click Next.
On the Configure Action page:
- Create New Topic: Enter the topic name and email address for alert notifications.
- Add Notification: Select the created SNS topic and click Add Notification.
Enter a name and description for the created alert, then click Next.
Preview your created alarm, and if necessary, edit any parameters before clicking Create Alarm.

Once the monitoring alert for ElastiCache CPU utilization is set up on CloudWatch, Sprinto retrieves the changes, and the Sprinto check status is marked as "Passing."
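
If you prefer to script the console steps above, the following is a minimal sketch that builds the parameters for the CloudWatch put_metric_alarm call. The helper names, the alarm naming scheme, the cluster ID, the SNS topic ARN, the 5-minute period, and the 90% threshold in the example are all illustrative assumptions. It also assumes the SNS topic already exists and that boto3 is installed and configured with credentials.

```python
from typing import Any, Dict


def build_cpu_alarm(cluster_id: str, topic_arn: str, threshold: float) -> Dict[str, Any]:
    """Build put_metric_alarm parameters for one ElastiCache cluster."""
    return {
        "AlarmName": f"elasticache-{cluster_id}-cpu-utilization",  # hypothetical naming scheme
        "AlarmDescription": f"CPUUtilization above {threshold}% on {cluster_id}",
        "Namespace": "AWS/ElastiCache",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds; an assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that receives the notification
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Send the alarm definition to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Illustrative values only; replace them with your own cluster ID and topic ARN.
params = build_cpu_alarm(
    cluster_id="my-cache-001",
    topic_arn="arn:aws:sns:us-east-1:123456789012:cache-alerts",
    threshold=90.0,
)
print(params["AlarmName"])
# create_alarm(params)  # uncomment to create the alarm in your account
```

Because the alarm targets a single metric with a static threshold, it avoids the composite and math-based alarm types that the note above says are unsupported.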

For additional assistance with the Sprinto check, please reach out to Sprinto Support. We're here to help!

Benefits of Using CloudWatch with ElastiCache

ElastiCache, coupled with CloudWatch, offers enhanced visibility into critical performance metrics associated with your resources. CloudWatch alarms provide the capability to set thresholds on these metrics, ensuring timely notifications to prompt preventive actions when necessary.

Monitoring Workload Trends Over Time

CloudWatch allows you to track trends over an extended period, offering data points available for up to 455 days (15 months). This historical perspective aids in detecting workload growth, providing valuable insights that contribute to forecasting resource utilization effectively.
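
The 455-day window applies to data aggregated at one-hour granularity; finer-grained data points are kept for shorter periods. As a rough sketch based on CloudWatch's published retention schedule for standard-resolution metrics, the hypothetical helper below returns the finest period you can query for data of a given age.

```python
# (maximum age in days, finest available period in seconds)
RETENTION_TIERS = [
    (15, 60),      # 1-minute data points are kept for 15 days
    (63, 300),     # 5-minute data points are kept for 63 days
    (455, 3600),   # 1-hour data points are kept for 455 days (15 months)
]


def finest_period_seconds(age_days: float) -> int:
    """Return the smallest period (in seconds) still available for data this old."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    for max_age_days, period in RETENTION_TIERS:
        if age_days <= max_age_days:
            return period
    raise ValueError("CloudWatch no longer retains data older than 455 days")


for age in (7, 30, 400):
    print(f"{age} days old -> period of {finest_period_seconds(age)} s")
```

When charting a year of CPU trends, for example, request hourly data points rather than per-minute ones.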

Precise Visibility into Redis Process Load

Because Redis processes commands on a single thread, ElastiCache provides the EngineCPUUtilization metric, which reports the load of the Redis process itself rather than of the whole host. This gives you a clearer picture of your Redis workload.

Setting Thresholds and Best Practices

Setting thresholds for EngineCPUUtilization is crucial, and although the tolerance level varies for each use case, a best practice is to ensure that EngineCPUUtilization stays below 90%. Benchmarks correlating EngineCPUUtilization with performance, based on your application and expected workload, can provide valuable insights.

To stay proactive, it's recommended to set multiple CloudWatch alarms at different levels for EngineCPUUtilization (e.g., 65% for WARN and 90% for HIGH) to receive alerts before performance impacts occur.
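
One way to express the two alert levels is sketched below: a small table of levels and thresholds expanded into one alarm definition per level. The field names follow the CloudWatch put_metric_alarm parameters, while the helper name, alarm naming scheme, period, cluster ID, and topic ARN are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Thresholds taken from the example levels in the text above.
ALARM_LEVELS = {"WARN": 65.0, "HIGH": 90.0}


def build_engine_cpu_alarms(cluster_id: str, topic_arn: str) -> List[Dict[str, Any]]:
    """Build one EngineCPUUtilization alarm definition per alert level."""
    alarms = []
    for level, threshold in sorted(ALARM_LEVELS.items(), key=lambda item: item[1]):
        alarms.append({
            "AlarmName": f"{cluster_id}-engine-cpu-{level.lower()}",  # hypothetical naming
            "Namespace": "AWS/ElastiCache",
            "MetricName": "EngineCPUUtilization",
            "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
            "Statistic": "Average",
            "Period": 300,  # seconds; an assumed evaluation window
            "EvaluationPeriods": 1,
            "Threshold": threshold,
            "ComparisonOperator": "GreaterThanThreshold",
            "AlarmActions": [topic_arn],
        })
    return alarms


# Illustrative values only.
alarms = build_engine_cpu_alarms(
    "my-cache-001", "arn:aws:sns:us-east-1:123456789012:cache-alerts"
)
for alarm in alarms:
    print(alarm["AlarmName"], alarm["Threshold"])
```

Each dictionary can be passed to the put_metric_alarm call of a boto3 CloudWatch client. Pointing the WARN and HIGH alarms at different SNS topics lets you route them to different audiences.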

Remediation Steps for High EngineCPUUtilization

Addressing high EngineCPUUtilization involves several considerations:

Identifying Redis Operations: Utilize Redis SLOWLOG to identify commands with longer completion times, addressing potential causes such as excessive usage of commands like KEYS.
Optimizing Data Model: Non-optimal data models can contribute to unnecessary EngineCPUUtilization. Factors like set cardinality and hash size should be considered for efficient performance.
Snapshot Creation: Consider using a replica when running Redis in a node group with multiple nodes to create snapshots. This ensures the primary node remains unaffected during snapshot creation.
Volume of Operations: Analyze the type of operations causing high EngineCPUUtilization. Optimize read operations using read replicas and provide additional compute capacity for write operations.
Monitoring CPUUtilization: Keep an eye on CPUUtilization to monitor the percentage of CPU utilization for the entire host, especially for smaller nodes with two or fewer CPU cores.
Consideration for T2/T3 Cache Nodes: If using T2 or T3 cache nodes, monitor CPUCreditUsage and CPUCreditBalance to ensure performance levels are maintained, considering the burstable nature of these nodes.
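
For the SLOWLOG step, the sketch below totals the time spent per command so that expensive commands such as KEYS stand out. It assumes each entry is a dictionary with a command string (or bytes) and a duration in microseconds, which is roughly the shape a Redis client library returns for SLOWLOG GET. The helper name and the sample entries are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def slowest_commands(entries: Iterable[Dict[str, Any]], top: int = 3) -> List[Tuple[str, int]]:
    """Sum SLOWLOG durations (microseconds) per command name, slowest first."""
    totals: Dict[str, int] = defaultdict(int)
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        # The first token is the command name; the rest are its arguments.
        name = command.split(" ", 1)[0].upper()
        totals[name] += int(entry["duration"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Made-up sample entries, for illustration only.
entries = [
    {"id": 3, "duration": 48000, "command": b"KEYS user:*"},
    {"id": 2, "duration": 12000, "command": b"HGETALL cart:42"},
    {"id": 1, "duration": 52000, "command": "keys session:*"},
]
print(slowest_commands(entries))
```

In this sample, KEYS accounts for most of the slow time, which would suggest replacing it with an incremental alternative such as SCAN.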

By following these best practices and considerations, you can effectively monitor and manage the performance of your ElastiCache cluster, ensuring optimal utilization and preemptive action against potential issues.

Support

Please get in touch with our support team if you have any specific queries related to Sprinto AI or need any assistance.

Troubleshooting

Issue: Sprinto check not passing even after setting up an alert for AWS ElastiCache Datastore instances within a replication group

Help article: Click here