This was a planned test of our infrastructure alerting and incident management system. In this event, a sudden stress test of our under resourced development environment under production loads caused a significant percentage of app activity over a 5 minute window to hang for several seconds or timeout due to exceeding the 15s response time cutoff.
In addition to ensuring our automating alerting system worked, we wanted to see how our systems would respond to spikes in usage that push them beyond the limits of their allocated resources. While the app became painfully slow, it did not suffer a catastrophic failure or complete outage.