Syed Jafer K

It's all about Trade-Offs

Learning Notes #43 – Avoiding Insurmountable Queue Backlogs | AWS | Cloud Pattern

Today I got a chance to read https://aws.amazon.com/builders-library/avoiding-insurmountable-queue-backlogs/ where they explain how to deal with backlogged messages in a queue. It's a nice read, and I recommend reading it. In this blog I jot down notes from the AWS article for future reference.

1. Understanding Queue Behavior: While queues enhance system durability and availability by handling asynchronous tasks, they can also lead to increased latency and prolonged recovery times if backlogs occur.

2. Causes of Backlogs: Backlogs can arise from system failures, unexpected load patterns, or when the message arrival rate surpasses the processing rate, leading to increased latency and potential system unavailability.

3. Measuring Availability and Latency: Metrics such as the number of messages in Dead Letter Queues (DLQs) and message age are effective indicators of system availability and latency, respectively.
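To make points 2 and 3 concrete for myself, here is a minimal Python sketch of the arithmetic. It assumes constant rates, and the names (estimate_backlog, oldest_message_age) are my own, not from the AWS article. It shows how a backlog grows when the arrival rate exceeds the processing rate, how long it takes to drain afterwards, and how to compute the age of the oldest waiting message.

```python
from dataclasses import dataclass


@dataclass
class BacklogEstimate:
    backlog_messages: float  # messages still queued when the surge ends
    recovery_seconds: float  # time to drain once arrivals return to normal


def estimate_backlog(surge_rate, normal_rate, processing_rate, surge_seconds):
    """Estimate the backlog built during a surge and the time needed to drain it.

    All rates are in messages per second and are assumed to be constant.
    """
    if normal_rate >= processing_rate:
        raise ValueError("normal_rate must be below processing_rate, or the queue never drains")
    # The backlog grows only while arrivals outpace processing.
    backlog = max(0.0, (surge_rate - processing_rate) * surge_seconds)
    # After the surge, only the spare capacity is available to work off the backlog.
    recovery = backlog / (processing_rate - normal_rate)
    return BacklogEstimate(backlog, recovery)


def oldest_message_age(enqueue_times, now):
    """Age in seconds of the oldest message still waiting (0.0 for an empty queue)."""
    return max((now - t for t in enqueue_times), default=0.0)


est = estimate_backlog(surge_rate=150, normal_rate=80, processing_rate=100, surge_seconds=600)
print(est.backlog_messages, est.recovery_seconds)
```

With these example numbers, a 10-minute surge at 150 msg/s against a capacity of 100 msg/s leaves 30,000 messages queued. With only 20 msg/s of spare capacity afterwards, draining takes 1,500 seconds (25 minutes), which is longer than the surge itself.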

Strategies to Prevent and Manage Backlogs

1. Separate Workloads into Individual Queues: Assigning separate queues for different workloads or customers can prevent one workload from impacting others.

2. Implement Shuffle-Sharding: Hashing each customer to a small subset of a fixed pool of queues limits the impact of one customer's surge to the queues in that subset, instead of letting it overwhelm the whole system.

3. Sidelining Excess Traffic: Redirecting excess traffic to separate ‘spillover’ queues allows the system to handle surges without affecting primary processing.

4. Prioritize Fresh Messages: Processing newer messages first (LIFO approach) ensures timely handling of current data, while older messages can be processed as capacity allows.

5. Set Message Time-to-Live (TTL): Implementing TTL for messages ensures that outdated messages are automatically discarded, preventing unnecessary processing.

6. Limit Resource Allocation per Workload: Controlling the number of processing threads or resources assigned to each workload prevents any single workload from monopolizing system resources.

7. Implement Backpressure Mechanisms: Sending feedback to upstream systems to control the inflow of messages helps manage load and prevent backlogs.

8. Use Delay Queues: Introducing delay queues postpones processing of certain messages, allowing the system to manage workloads more effectively during high-load periods.

9. Avoid Excessive In-Flight Messages: Limiting the number of messages being processed simultaneously prevents system overload and potential failures.

10. Utilize Dead Letter Queues (DLQs): Routing messages that cannot be processed after multiple attempts to DLQs ensures problematic messages don’t hinder overall processing.

11. Ensure Adequate Buffering: Maintaining sufficient buffering capacity in polling threads accommodates workload spikes and enhances system resilience.

12. Implement Heartbeating for Long-Running Messages: Periodically extending the visibility timeout while a long-running message is still being worked on prevents the queue from assuming the consumer failed and redelivering the message.

13. Plan for Cross-Host Debugging: Employing tools like AWS X-Ray and correlation IDs facilitates effective debugging across distributed systems.
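As a note to self, strategy 2 (shuffle-sharding) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the pool of 8 queues, the shard size of 2, and the ranking-by-hash trick are all hypothetical choices, not how any AWS service actually implements it. Each customer is mapped to a stable subset of queues, and a new message goes to the least-loaded queue in that subset.

```python
import hashlib


def shuffle_shard(customer_id, num_queues, shard_size):
    """Deterministically pick `shard_size` distinct queue indexes for a customer."""
    if not 0 < shard_size <= num_queues:
        raise ValueError("shard_size must be between 1 and num_queues")

    # Rank every queue by a hash of (customer, queue) and keep the lowest ranks.
    # This is one simple way to derive a stable, per-customer subset.
    def rank(queue_index):
        return hashlib.sha256(f"{customer_id}:{queue_index}".encode()).digest()

    return sorted(sorted(range(num_queues), key=rank)[:shard_size])


def pick_queue(customer_id, queue_depths, shard_size=2):
    """Choose the shortest queue among the customer's shard.

    `queue_depths[i]` is the current number of messages in queue i.
    """
    shard = shuffle_shard(customer_id, len(queue_depths), shard_size)
    return min(shard, key=lambda q: queue_depths[q])


depths = [0] * 8  # hypothetical pool of 8 queues, all empty
for customer in ("customer-a", "customer-b", "customer-c"):
    print(customer, shuffle_shard(customer, len(depths), 2), pick_queue(customer, depths))
```

Because the subset depends only on the customer ID, a noisy customer keeps hitting the same few queues. Another customer is fully affected only if it happens to be hashed to exactly the same subset, which becomes less likely as the pool grows.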