# AWS

Sprinto integrates with AWS services to monitor critical infrastructure metrics across the compute, storage, database, and networking layers. These monitors check that your AWS environment operates within safe thresholds and meets security and compliance expectations.

This article covers the AWS-specific infrastructure monitors in Sprinto, what they check, and how to resolve failing monitors.

***

### Monitored AWS Services

Sprinto evaluates infrastructure compliance and performance across the following AWS services:

1. **EC2** – CPU utilisation
2. **EBS** – Volume health and backup
3. **ECS** – CPU and memory utilisation
4. **DynamoDB** – Write capacity, latency, encryption, point-in-time recovery
5. **SQS** – Monitoring visible messages using CloudWatch
6. **ALB / CLB** – Latency and load balancer metrics
7. **ElastiCache** – CPU utilisation, connection count
8. **CloudWatch** – Alarm configuration and active metric collection

***

### Detailed Monitors and Resolution Steps

#### 1. **EC2: CPU Utilisation Should Be Monitored**

* **What it checks**: A CloudWatch alarm tracks CPU usage (`CPUUtilization`) for EC2 instances
* **How to resolve**:
  * Go to **CloudWatch > Alarms > Create Alarm**
  * Select **EC2 > Per-Instance Metrics** and choose the `CPUUtilization` metric
  * Define threshold (e.g., >80% for 5 minutes)
  * Set action (e.g., SNS notification)
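The same alarm can be scripted. The sketch below assumes the AWS SDK for Python (boto3) and an existing SNS topic; the helper names, alarm name, and threshold are illustrative, not values Sprinto requires. A 300-second period matches EC2 basic monitoring, which publishes `CPUUtilization` every 5 minutes.

```python
def ec2_cpu_alarm_params(instance_id: str, sns_topic_arn: str,
                         threshold: float = 80.0) -> dict:
    """Build PutMetricAlarm arguments for one EC2 instance (hypothetical helper)."""
    return {
        "AlarmName": f"ec2-{instance_id}-cpu-high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,           # one 5-minute datapoint (basic monitoring)
        "EvaluationPeriods": 1,  # alarm as soon as that datapoint breaches
        "Threshold": threshold,  # percent
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify the SNS topic on ALARM
    }


def create_alarm(params: dict) -> None:
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above runs without the SDK

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(ec2_cpu_alarm_params("i-0123456789abcdef0", topic_arn))` with valid credentials creates the alarm. `PutMetricAlarm` updates an existing alarm with the same name, so rerunning the script is safe.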

***

#### 2. **EBS: Volume Health and Backup**

* **What it checks**:
  * Volumes pass their status checks (status is `ok`, not `impaired`)
  * Snapshots or backup policies are active
* **How to resolve**:
  * Go to **EC2 > Volumes** and review each volume's status checks
  * Use AWS Backup or Amazon Data Lifecycle Manager policies to take regular snapshots
  * Tag volumes so the backup or lifecycle policy targets them, and set a snapshot schedule
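To spot-check both conditions across many volumes, a script can evaluate the responses of the EC2 `DescribeVolumeStatus` and `DescribeSnapshots` APIs. This is a minimal sketch under stated assumptions: the function names and the one-day snapshot window are hypothetical, and the helpers only inspect data you pass in. You would fetch that data with boto3 (for example `ec2.describe_volume_status()` and `ec2.describe_snapshots(OwnerIds=["self"])`) and handle pagination yourself.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def volumes_failing_status_check(volume_statuses: list[dict]) -> list[str]:
    """Return IDs of volumes whose status is anything other than 'ok'.

    `volume_statuses` has the shape of the VolumeStatuses list returned by
    EC2 DescribeVolumeStatus, so 'impaired' and 'insufficient-data' are flagged.
    """
    return [
        v["VolumeId"]
        for v in volume_statuses
        if v["VolumeStatus"]["Status"] != "ok"
    ]


def volumes_without_recent_snapshot(volume_ids: list[str], snapshots: list[dict],
                                    max_age_days: int = 1,
                                    now: Optional[datetime] = None) -> list[str]:
    """Return volume IDs with no completed snapshot started in the last `max_age_days`.

    `snapshots` has the shape of the Snapshots list from EC2 DescribeSnapshots;
    StartTime must be a timezone-aware datetime, as boto3 returns it.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    covered = {
        s["VolumeId"]
        for s in snapshots
        if s["State"] == "completed" and s["StartTime"] >= cutoff
    }
    return [vid for vid in volume_ids if vid not in covered]
```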

***

#### 3. **ECS: CPU and Memory Metrics**

* **What it checks**:
  * CloudWatch metrics for ECS services are enabled
  * Thresholds are defined for resource usage
* **How to resolve**:
  * Navigate to **CloudWatch > Metrics > All metrics > ECS**
  * Create alarms for `CPUUtilization` and `MemoryUtilization` (namespace `AWS/ECS`, dimensions `ClusterName` and `ServiceName`)
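A sketch, assuming boto3 and an existing SNS topic, that builds one alarm per metric for a single ECS service. ECS publishes these service-level metrics in the `AWS/ECS` namespace at one-minute intervals; the 80% thresholds, five-minute window, and helper names are placeholders to tune for your workload.

```python
def ecs_service_alarms(cluster: str, service: str, sns_topic_arn: str,
                       cpu_pct: float = 80.0, mem_pct: float = 80.0) -> list[dict]:
    """Build PutMetricAlarm arguments for an ECS service (hypothetical helper)."""
    dims = [
        {"Name": "ClusterName", "Value": cluster},
        {"Name": "ServiceName", "Value": service},
    ]
    alarms = []
    for metric, threshold in (("CPUUtilization", cpu_pct),
                              ("MemoryUtilization", mem_pct)):
        alarms.append({
            "AlarmName": f"ecs-{cluster}-{service}-{metric}",
            "Namespace": "AWS/ECS",
            "MetricName": metric,
            "Dimensions": dims,
            "Statistic": "Average",
            "Period": 60,            # service metrics arrive every minute
            "EvaluationPeriods": 5,  # five consecutive minutes above threshold
            "Threshold": threshold,  # percent
            "ComparisonOperator": "GreaterThanThreshold",
            "AlarmActions": [sns_topic_arn],
        })
    return alarms


def create_alarms(alarms: list[dict]) -> None:
    """Create or update each alarm; needs boto3 and AWS credentials."""
    import boto3  # lazy import so ecs_service_alarms() runs without the SDK

    cloudwatch = boto3.client("cloudwatch")
    for params in alarms:
        cloudwatch.put_metric_alarm(**params)
```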

***

#### 4. **DynamoDB: Write Capacity, Latency, and Backup**

* **What it checks**:
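  * Write capacity and latency metrics (for example, `ConsumedWriteCapacityUnits` and `SuccessfulRequestLatency`)
  * Point-in-time recovery (PITR) is enabled
  * Table encryption status
* **How to resolve**:
  * Go to **DynamoDB > Tables** and select the table
  * For tables in provisioned capacity mode, enable **Auto Scaling** for write capacity (on-demand tables scale automatically)
  * Turn on **PITR** under the **Backups** tab
  * Check **Encryption at rest**: DynamoDB encrypts every table at rest, by default with an AWS owned key; if your controls require it, select an AWS managed (`aws/dynamodb`) or customer managed AWS KMS key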
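The three settings can be audited together from API responses. The function below is a hypothetical helper, not part of any SDK. It expects the output of DynamoDB `DescribeTable` and `DescribeContinuousBackups`, plus the `ScalableTargets` list from Application Auto Scaling `DescribeScalableTargets` (service namespace `dynamodb`), all of which you would fetch with boto3.

```python
def dynamodb_table_findings(table_desc: dict, backups_desc: dict,
                            scalable_targets: list[dict]) -> list[str]:
    """Return human-readable gaps for one table (hypothetical helper)."""
    findings = []

    pitr = (backups_desc["ContinuousBackupsDescription"]
            .get("PointInTimeRecoveryDescription", {})
            .get("PointInTimeRecoveryStatus"))
    if pitr != "ENABLED":
        findings.append("point-in-time recovery is not enabled")

    table = table_desc["Table"]
    # Tables on the default AWS owned key typically have no SSEDescription.
    sse = table.get("SSEDescription", {})
    if sse.get("SSEType") != "KMS" or sse.get("Status") != "ENABLED":
        findings.append("encryption at rest does not use an AWS managed "
                        "or customer managed KMS key")

    # BillingModeSummary can be absent for tables that were always provisioned.
    mode = table.get("BillingModeSummary", {}).get("BillingMode", "PROVISIONED")
    has_write_scaling = any(
        t["ScalableDimension"] == "dynamodb:table:WriteCapacityUnits"
        for t in scalable_targets
    )
    if mode == "PROVISIONED" and not has_write_scaling:
        findings.append("provisioned table has no write capacity auto scaling")
    return findings
```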

***

#### 5. **SQS: Visible Messages Should Be Monitored**

* **What it checks**:
  * A CloudWatch alarm is configured on the queue's message backlog
* **How to resolve**:
  * Go to **CloudWatch > Alarms > Create Alarm**
  * Select SQS → Choose `ApproximateNumberOfMessagesVisible`
  * Set a threshold (e.g., >100 messages)
  * Attach notification or auto-scaling rule
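A minimal sketch of this backlog alarm, assuming boto3 and an SNS topic; the queue name, numbers, and helper names are placeholders. It uses `DatapointsToAlarm` so that 2 of the last 3 five-minute periods must exceed the threshold, which avoids paging on a single short spike.

```python
def sqs_backlog_alarm_params(queue_name: str, sns_topic_arn: str,
                             max_visible: int = 100) -> dict:
    """Build PutMetricAlarm arguments for an SQS backlog alarm (hypothetical helper)."""
    return {
        "AlarmName": f"sqs-{queue_name}-backlog",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 3,   # look at the last three 5-minute periods...
        "DatapointsToAlarm": 2,   # ...and alarm if any two of them breach
        "Threshold": float(max_visible),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify when the backlog builds up
        "OKActions": [sns_topic_arn],     # and again when it drains
    }


def create_alarm(params: dict) -> None:
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # lazy import so the builder above runs without the SDK

    boto3.client("cloudwatch").put_metric_alarm(**params)
```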

***

#### 6. **ALB / CLB: Latency Should Be Monitored**

* **What it checks**:
  * CloudWatch alarms are configured for high latency or 5xx errors
* **How to resolve**:
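  * Go to **CloudWatch > Metrics > All metrics** and select **ApplicationELB** (or **ELB** for Classic Load Balancers)
  * Track `TargetResponseTime` and `HTTPCode_ELB_5XX_Count` for ALBs, or `Latency` and `HTTPCode_ELB_5XX` for CLBs
  * Set alarm thresholds on these metrics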
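A sketch of both ALB alarms, assuming boto3 and an SNS topic; the function names and thresholds are hypothetical. The `LoadBalancer` dimension is the trailing part of the ALB's ARN (`app/<name>/<id>`), `TargetResponseTime` is reported in seconds, and the latency alarm uses a p95 percentile through `ExtendedStatistic` rather than an average. For a CLB you would use the `AWS/ELB` namespace, the `LoadBalancerName` dimension, and the `Latency` and `HTTPCode_ELB_5XX` metrics instead.

```python
def alb_dimension_from_arn(lb_arn: str) -> str:
    """'arn:...:loadbalancer/app/my-alb/50dc...' -> 'app/my-alb/50dc...'."""
    return lb_arn.split(":loadbalancer/", 1)[1]


def alb_alarm_params(lb_dimension: str, sns_topic_arn: str,
                     p95_seconds: float = 1.0, max_5xx: int = 5) -> list[dict]:
    """Build PutMetricAlarm arguments for ALB latency and ELB-generated 5xx errors."""
    name = lb_dimension.split("/")[1]
    common = {
        "Namespace": "AWS/ApplicationELB",
        "Dimensions": [{"Name": "LoadBalancer", "Value": lb_dimension}],
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
    latency = {
        **common,
        "AlarmName": f"alb-{name}-p95-latency",
        "MetricName": "TargetResponseTime",
        "ExtendedStatistic": "p95",  # percentile; do not also set Statistic
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": p95_seconds,    # TargetResponseTime is in seconds
    }
    errors = {
        **common,
        "AlarmName": f"alb-{name}-elb-5xx",
        "MetricName": "HTTPCode_ELB_5XX_Count",
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": float(max_5xx),
        # Count metrics may have no datapoints when there are no errors.
        "TreatMissingData": "notBreaching",
    }
    return [latency, errors]
```

Each dictionary can be passed to boto3 as `cloudwatch.put_metric_alarm(**params)`.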

***

#### 7. **ElastiCache: CPU and Connection Metrics**

* **What it checks**:
  * CPU utilisation and current connection count via CloudWatch
* **How to resolve**:
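  * Confirm node metrics appear under **CloudWatch > Metrics > ElastiCache**; ElastiCache publishes them to CloudWatch automatically
  * Create alarms for `CPUUtilization` (or `EngineCPUUtilization` for Redis OSS and Valkey) and `CurrConnections`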
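A sketch, assuming boto3 and an SNS topic, that builds node-level alarms in the `AWS/ElastiCache` namespace, where metrics are keyed by `CacheClusterId` and `CacheNodeId` (for example `0001`). The helper name and thresholds are placeholders: a sensible connection limit depends on your node type and client pooling, and for Redis OSS or Valkey, `EngineCPUUtilization` is often a better signal than host-level `CPUUtilization` because command processing is largely single-threaded.

```python
def elasticache_node_alarms(cluster_id: str, node_id: str, sns_topic_arn: str,
                            cpu_pct: float = 75.0,
                            max_connections: int = 1000) -> list[dict]:
    """Build PutMetricAlarm arguments for one cache node (hypothetical helper)."""
    common = {
        "Namespace": "AWS/ElastiCache",
        "Dimensions": [
            {"Name": "CacheClusterId", "Value": cluster_id},
            {"Name": "CacheNodeId", "Value": node_id},
        ],
        "Period": 300,
        "EvaluationPeriods": 2,  # two consecutive 5-minute periods
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
    return [
        {**common, "AlarmName": f"elasticache-{cluster_id}-{node_id}-cpu",
         "MetricName": "CPUUtilization", "Statistic": "Average",
         "Threshold": cpu_pct},
        {**common, "AlarmName": f"elasticache-{cluster_id}-{node_id}-connections",
         "MetricName": "CurrConnections", "Statistic": "Maximum",
         "Threshold": float(max_connections)},
    ]
```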

***

#### 8. **CloudWatch: Alarm Configuration**

* **What it checks**:
  * Monitoring is active across key services
  * Alarms are not in `INSUFFICIENT_DATA` state
* **How to resolve**:
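  * Review alarms in the `INSUFFICIENT_DATA` state; outside the first few minutes after creation, this usually means the metric has no recent datapoints (for example, the resource was deleted or a dimension value is wrong)
  * Periodically audit key services for resources that have no alarms
  * Match each alarm's period to how often the metric is published (for example, 5 minutes for EC2 basic monitoring)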
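A sketch for the audit step, assuming boto3. It pages through `DescribeAlarms` filtered to the `INSUFFICIENT_DATA` state, then keeps only alarms that have stayed there for at least a day, since newly created alarms pass through this state briefly. The 24-hour cutoff and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_alarms(metric_alarms: list[dict],
                 min_age: timedelta = timedelta(hours=24),
                 now: Optional[datetime] = None) -> list[str]:
    """Return names of alarms in INSUFFICIENT_DATA since at least `min_age` ago.

    `metric_alarms` has the shape of the MetricAlarms list from CloudWatch
    DescribeAlarms; StateUpdatedTimestamp is a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        a["AlarmName"]
        for a in metric_alarms
        if a["StateValue"] == "INSUFFICIENT_DATA"
        and now - a["StateUpdatedTimestamp"] >= min_age
    )


def fetch_insufficient_data_alarms() -> list[dict]:
    """Page through DescribeAlarms; needs boto3 and AWS credentials."""
    import boto3  # lazy import so stale_alarms() runs without the SDK

    cloudwatch = boto3.client("cloudwatch")
    kwargs = {"StateValue": "INSUFFICIENT_DATA"}
    alarms: list[dict] = []
    while True:
        page = cloudwatch.describe_alarms(**kwargs)
        alarms.extend(page["MetricAlarms"])
        if not page.get("NextToken"):
            return alarms
        kwargs["NextToken"] = page["NextToken"]
```

With credentials configured, `stale_alarms(fetch_insufficient_data_alarms())` lists the alarm names worth investigating.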

***

### Remediating in Sprinto

* For automated checks, Sprinto syncs alarm status through its AWS integration
* For manual checks:
  * Upload screenshots of CloudWatch alarms or service configurations
  * Attach backup policy summaries if applicable
* Use **Mark as Resolved** once the required action is complete

***


### Monitor AWS API Gateway Errors

Sprinto flags this check if your **AWS API Gateway** APIs are not monitored for errors and performance anomalies.\
This check helps ensure your APIs are operational, reliable, and compliant with availability-related controls in frameworks like SOC 2 and ISO 27001.

Monitoring focuses on metrics like:

* `5XXError`: Server-side errors
* `4XXError`: Client-side issues
* `Latency`: Slow response times
* `IntegrationLatency`: Backend integration delays

***

#### Steps to Enable Monitoring

**1. Go to CloudWatch Metrics**

* Open the **AWS Console**
* Navigate to **CloudWatch > Metrics**
* Under **Browse**, select **API Gateway > By API Name**

**2. Select the Relevant Metrics**

Choose:

* `5XXError`
* `4XXError`
* `Latency`
* `IntegrationLatency`

You can apply filters for **API Name**, **Stage**, or **Method** as needed.

**3. (Optional) Create Alarms**

To automate detection:

1. Go to **CloudWatch > Alarms > Create Alarm**
2. Select one of the metrics (e.g., `5XXError`)
3. Set a threshold (e.g., `> 1 error for 5 minutes`)
4. Add an alarm action that notifies an SNS topic; subscribe email addresses or other endpoints to the topic
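The same four metrics can be alarmed on in one pass. This sketch assumes boto3, an SNS topic, and the `ApiName` and `Stage` dimensions used by REST APIs. Only the `5XXError` threshold comes from the example above; the other thresholds and all names are placeholders. `Latency` and `IntegrationLatency` are reported in milliseconds.

```python
# metric name -> (statistic, threshold); only 5XXError mirrors the example above
API_GATEWAY_METRICS = {
    "5XXError": ("Sum", 1.0),                   # more than 1 server error in 5 min
    "4XXError": ("Sum", 50.0),                  # placeholder
    "Latency": ("Average", 2000.0),             # milliseconds, placeholder
    "IntegrationLatency": ("Average", 1500.0),  # milliseconds, placeholder
}


def api_gateway_alarms(api_name: str, stage: str, sns_topic_arn: str) -> list[dict]:
    """Build one PutMetricAlarm argument dict per metric (hypothetical helper)."""
    dims = [{"Name": "ApiName", "Value": api_name},
            {"Name": "Stage", "Value": stage}]
    return [
        {
            "AlarmName": f"apigw-{api_name}-{stage}-{metric}",
            "Namespace": "AWS/ApiGateway",
            "MetricName": metric,
            "Dimensions": dims,
            "Statistic": statistic,
            "Period": 300,
            "EvaluationPeriods": 1,
            "Threshold": threshold,
            "ComparisonOperator": "GreaterThanThreshold",
            # Treat periods without datapoints (e.g. no traffic) as OK.
            "TreatMissingData": "notBreaching",
            "AlarmActions": [sns_topic_arn],
        }
        for metric, (statistic, threshold) in API_GATEWAY_METRICS.items()
    ]


def create_alarms(alarms: list[dict]) -> None:
    """Create or update each alarm; needs boto3 and AWS credentials."""
    import boto3  # lazy import so api_gateway_alarms() runs without the SDK

    cloudwatch = boto3.client("cloudwatch")
    for params in alarms:
        cloudwatch.put_metric_alarm(**params)
```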

***

#### Evidence Guidelines

<table><thead><tr><th width="235.5390625">Evidence Type</th><th width="115.41015625">Accepted?</th><th>Notes</th></tr></thead><tbody><tr><td>CloudWatch metrics graph</td><td>✅</td><td>Must show time series data for API Gateway errors</td></tr><tr><td>Screenshot of alarm setup</td><td>✅</td><td>Include metric, condition, and notification target</td></tr><tr><td>JSON/CSV export of metrics</td><td>✅</td><td>Optional – useful for detailed audits</td></tr></tbody></table>
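For the JSON/CSV export row, one option is to pull datapoints with CloudWatch `GetMetricStatistics` and write them to CSV. A minimal sketch, assuming boto3; the function names are hypothetical, and datapoints are sorted because the API does not guarantee chronological order.

```python
import csv
import io


def datapoints_to_csv(metric_name: str, datapoints: list[dict],
                      statistic: str = "Sum") -> str:
    """Render GetMetricStatistics datapoints as CSV text, oldest first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp", "metric", statistic])
    for dp in sorted(datapoints, key=lambda d: d["Timestamp"]):
        writer.writerow([dp["Timestamp"].isoformat(), metric_name, dp[statistic]])
    return buf.getvalue()


def fetch_datapoints(api_name: str, metric_name: str, start, end,
                     statistic: str = "Sum") -> list[dict]:
    """Fetch 5-minute datapoints for one API; needs boto3 and AWS credentials."""
    import boto3  # lazy import so datapoints_to_csv() runs without the SDK

    resp = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/ApiGateway",
        MetricName=metric_name,
        Dimensions=[{"Name": "ApiName", "Value": api_name}],
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=[statistic],
    )
    return resp["Datapoints"]
```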

{% hint style="info" %}
**Notes:**

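* API Gateway publishes basic metrics per API and stage; per-method metrics require **detailed CloudWatch metrics** to be enabled on the stage. Metrics can take a few minutes to appear after you enable them.
* For private APIs and private integrations, also monitor the VPC endpoints and VPC links they depend on.
{% endhint %}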

***

### AWS Load Balancer Configuration

Sprinto monitors AWS Load Balancer settings to ensure they are configured securely and do not expose resources to public access or operational risks.

***

**What is checked**

<table><thead><tr><th width="224">Check Type</th><th>Description</th></tr></thead><tbody><tr><td><strong>Public access restriction</strong></td><td>Ensures ALBs/CLBs do not accept traffic from unrestricted sources (<code>0.0.0.0/0</code>)</td></tr><tr><td><strong>Health checks configured</strong></td><td>Validates that health checks are correctly defined to identify failing targets</td></tr><tr><td><strong>Version and protocol validation</strong></td><td>Verifies that HTTPS listeners allow only TLS 1.2 or later and use secure listener configurations</td></tr><tr><td><strong>Logging and monitoring</strong></td><td>Checks whether access logs and metrics collection are enabled</td></tr></tbody></table>

***

**When does this fail?**

* ALBs or CLBs allow public traffic without restriction
* No health check is defined, or health checks do not match the expected path or port
* Access logging is disabled
* Older TLS versions (e.g., TLS 1.0/1.1) are in use

***

**How to resolve**

1. Restrict access by updating ALB/CLB listener and security group settings
2. Configure health checks on every target group (ALB) or on the load balancer itself (CLB)
3. Enable access logs, which are delivered to an S3 bucket, in the load balancer's **Attributes** section
4. Upgrade listener protocols to **TLS 1.2 or higher**
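Logging and TLS settings can be checked from the Elastic Load Balancing v2 API, which covers ALBs and NLBs; Classic Load Balancers use the separate `elb` API. The helpers below are hypothetical and only inspect responses you pass in: the `Attributes` list from `DescribeLoadBalancerAttributes`, the `Listeners` list from `DescribeListeners`, and the `SslPolicies` list from `DescribeSSLPolicies`, fetched with boto3.

```python
WEAK_PROTOCOLS = {"SSLv3", "TLSv1", "TLSv1.1"}


def access_logs_enabled(attributes: list[dict]) -> bool:
    """`attributes` is the Attributes list from elbv2 DescribeLoadBalancerAttributes."""
    values = {a["Key"]: a["Value"] for a in attributes}
    return values.get("access_logs.s3.enabled") == "true"  # values are strings


def weak_tls_listeners(listeners: list[dict], ssl_policies: list[dict]) -> list[str]:
    """Return ARNs of HTTPS/TLS listeners whose policy still allows TLS below 1.2.

    listeners: Listeners list from DescribeListeners
    ssl_policies: SslPolicies list from DescribeSSLPolicies
    Listeners whose policy is missing from `ssl_policies` are not flagged.
    """
    protocols_by_policy = {p["Name"]: set(p["SslProtocols"]) for p in ssl_policies}
    return [
        lsn["ListenerArn"]
        for lsn in listeners
        if lsn["Protocol"] in ("HTTPS", "TLS")
        and protocols_by_policy.get(lsn.get("SslPolicy"), set()) & WEAK_PROTOCOLS
    ]
```

A listener flagged here can be moved to a predefined policy that allows only TLS 1.2 and 1.3, such as `ELBSecurityPolicy-TLS13-1-2-2021-06`.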

***

**Restricting Public Access on AWS Application Load Balancer (ALB)**

Sprinto flags this monitor when it detects that **AWS ALBs are accessible from public IP ranges**, especially `0.0.0.0/0` (or `::/0` for IPv6), which means unrestricted internet access. This configuration may expose internal services or applications to external threats.

***

**What is checked:**

* ALBs with listener rules allowing **HTTP/HTTPS traffic from public IP ranges**
* Security groups attached to ALBs that allow **open inbound rules**
* Misconfigured **target groups** without route-based restrictions

***

**How to resolve:**

1. **Update ALB security groups** to allow traffic only from known CIDR blocks or internal subnets
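2. If the ALB only serves internal traffic, use an **internal** ALB (scheme `internal`) in **private subnets**; the scheme cannot be changed after creation, so this means creating a new load balancer
3. Add **AWS WAF rules** (for example, an IP set match) or **listener rule conditions** on source IP as an additional access layer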
4. Document and upload configuration changes or screenshots as evidence in Sprinto
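To find the open inbound rules described above, you can inspect the security groups attached to the ALB. A sketch, assuming boto3; `open_ingress_rules` and `alb_security_groups` are hypothetical helpers, and the first reads the `SecurityGroups` list returned by EC2 `DescribeSecurityGroups`. A world-open rule on port 443 can be intentional for a public-facing ALB, so treat results as candidates to review rather than automatic failures.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(security_groups: list[dict]) -> list[tuple]:
    """Return (group_id, protocol, from_port, to_port, cidr) for world-open rules."""
    findings = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr in OPEN_CIDRS:
                    findings.append((
                        sg["GroupId"],
                        perm["IpProtocol"],    # "-1" means all protocols
                        perm.get("FromPort"),  # absent for all-protocol rules
                        perm.get("ToPort"),
                        cidr,
                    ))
    return findings


def alb_security_groups(lb_arn: str) -> list[dict]:
    """Fetch the security groups attached to one ALB; needs boto3 and credentials."""
    import boto3  # lazy import so open_ingress_rules() runs without the SDK

    lb = boto3.client("elbv2").describe_load_balancers(
        LoadBalancerArns=[lb_arn])["LoadBalancers"][0]
    return boto3.client("ec2").describe_security_groups(
        GroupIds=lb["SecurityGroups"])["SecurityGroups"]
```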

***

### Best Practices

* Tag critical resources and apply monitoring only where needed
* Automate alarm creation using Infrastructure as Code (e.g., Terraform, CloudFormation)
* Enable notifications via SNS or integrate with incident response tools
* Set thresholds based on baselined performance, not fixed values
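One simple way to derive a threshold from a baseline is the mean of historical datapoints plus a multiple of their standard deviation. The sketch below is an illustrative assumption, not how Sprinto evaluates thresholds; `k` and the optional `floor` are hypothetical tuning knobs, and you might feed it, for example, two weeks of hourly `Average` datapoints for the metric. CloudWatch also offers anomaly detection alarms, which fit a baseline band for you.

```python
import statistics
from typing import Optional, Sequence


def baseline_threshold(history: Sequence[float], k: float = 3.0,
                       floor: Optional[float] = None) -> float:
    """Return mean + k * sample standard deviation of historical datapoints.

    `floor` optionally sets a minimum threshold, so a very stable metric
    (tiny standard deviation) does not produce an alarm that fires on noise.
    """
    if len(history) < 2:
        raise ValueError("need at least two datapoints to estimate spread")
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)  # sample standard deviation
    threshold = mean + k * spread
    return threshold if floor is None else max(threshold, floor)
```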
