Netflix explains how its error handling system works and it’s impressive
Even with record growth in the midst of the pandemic, Netflix hasn’t suffered an outage in 2020, or at least none that we’ve noticed. On Monday, November 2, the streaming platform explained how a technique it created to make the service more stable works. The technique was recently put to the test during a systemic failure, and no one was left unable to keep binge-watching its series.
The priority-based progressive load reduction technique (which Netflix calls prioritized load shedding) is a way to keep the essential parts of Netflix running during an outage. First, the company defined which requests are less important (such as access logs and other background requests), important (viewing history, language selection, or the pause button), and most important (playback of the content itself, of course).
When a system fails, Netflix applies these priorities and stops your phone, computer, or TV from making lower-priority requests, so that the more important ones can still be served. It all seems obvious now that it is in place, but Netflix had no such mechanism until 2019: its systems were basically either fully up or fully down, with no middle ground.
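To make the three-tier idea concrete, here is a minimal sketch in Python. It is not Netflix’s code: the `Priority` buckets, the request-type names in `REQUEST_PRIORITY`, and the `handle` function are all hypothetical, chosen only to mirror the tiers described above. A request is admitted (HTTP 200) if its bucket is at or above the lowest bucket currently being served, and rejected with 503 otherwise.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical priority buckets mirroring the three tiers described above."""
    NON_CRITICAL = 0   # e.g. access logs, background requests
    DEGRADED = 1       # e.g. viewing history, language selection, pause button
    CRITICAL = 2       # e.g. starting playback


# Hypothetical mapping from request type to priority bucket.
REQUEST_PRIORITY = {
    "log": Priority.NON_CRITICAL,
    "history": Priority.DEGRADED,
    "language": Priority.DEGRADED,
    "pause": Priority.DEGRADED,
    "play": Priority.CRITICAL,
}


def handle(request_type: str, min_priority: Priority) -> int:
    """Return an HTTP status code: 200 if the request is admitted, 503 if shed.

    min_priority is the lowest bucket still being served; during an outage it
    is raised, so lower-priority requests are rejected first.
    """
    # Unknown request types are treated as non-critical (an assumption).
    priority = REQUEST_PRIORITY.get(request_type, Priority.NON_CRITICAL)
    return 200 if priority >= min_priority else 503


# Normal operation: everything is served.
print([handle(r, Priority.NON_CRITICAL) for r in ["log", "history", "play"]])  # → [200, 200, 200]
# Outage: only critical traffic gets through.
print([handle(r, Priority.CRITICAL) for r in ["log", "history", "play"]])  # → [503, 503, 200]
```

Raising a single threshold, instead of switching the whole service off, is what provides the middle ground between "fully up" and "fully down".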
Netflix limits traffic to prevent the entire service from failing
This animation shows how it was still possible to start playing an episode of Cobra Kai even though most requests were being rejected (503 is the standard HTTP status code for “Service Unavailable”; 200 means the request succeeded).
Behind the scenes, all of this is controlled by Zuul, a routing service created by Netflix and even released as open source. Zuul constantly monitors the different services that make up Netflix: if the latency or failure rate of one of them exceeds a predefined limit, Zuul throttles the traffic sent to that service to keep it running.
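A per-service guard of the kind described above can be sketched as follows. This is an illustrative assumption, not Zuul’s actual API: `ServiceGuard`, its thresholds, and the rule “only critical requests get through while the backend is unhealthy” are hypothetical simplifications, and `priority` is a hypothetical integer where 2 means critical. The guard keeps a sliding window of recent latencies and outcomes and flags the backend once either the average latency or the failure rate crosses its limit.

```python
from collections import deque


class ServiceGuard:
    """Hypothetical per-service guard: throttles traffic to a backend once its
    recent latency or failure rate crosses a predefined limit."""

    def __init__(self, max_latency_ms: float, max_failure_rate: float, window: int = 100):
        self.max_latency_ms = max_latency_ms
        self.max_failure_rate = max_failure_rate
        # Most recent (latency_ms, succeeded) samples; old ones fall off automatically.
        self.samples = deque(maxlen=window)

    def record(self, latency_ms: float, succeeded: bool) -> None:
        """Store the outcome of one call to the backend."""
        self.samples.append((latency_ms, succeeded))

    def is_unhealthy(self) -> bool:
        """True if average latency or failure rate over the window exceeds its limit."""
        if not self.samples:
            return False
        n = len(self.samples)
        avg_latency = sum(lat for lat, _ in self.samples) / n
        failure_rate = sum(1 for _, ok in self.samples if not ok) / n
        return avg_latency > self.max_latency_ms or failure_rate > self.max_failure_rate

    def admit(self, priority: int, critical_priority: int = 2) -> bool:
        """While the backend is unhealthy, only critical requests are forwarded."""
        return not self.is_unhealthy() or priority >= critical_priority


guard = ServiceGuard(max_latency_ms=200, max_failure_rate=0.05)
for _ in range(10):
    guard.record(50, True)      # fast, successful calls
print(guard.admit(0))           # → True
for _ in range(10):
    guard.record(900, False)    # slow failures push both metrics over the limit
print(guard.admit(0), guard.admit(2))  # → False True
```

Throttling the struggling service, instead of letting every request pile onto it, gives it room to recover while the most important calls still reach it.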
And if Zuul itself notices that it is becoming overloaded, based on its CPU usage or the number of active connections, it sheds traffic progressively more aggressively to keep Netflix running while the system recovers. Protecting Zuul this way is essential because, obviously, if Zuul goes down, all of Netflix goes down.
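The progressive part can be sketched as a function that turns the proxy’s own load into a shedding threshold. Everything here is hypothetical: the `cpu_target` and `max_connections` defaults, the 0.0 to 1.0 `priority_score`, and the linear ramp are illustrative choices, not Zuul’s real configuration or curve. The key behavior is that nothing is shed below the targets, and the cut-off climbs as load rises, so the least important traffic goes first.

```python
def shed_threshold(cpu_utilization: float, active_connections: int,
                   cpu_target: float = 0.6, max_connections: int = 10_000) -> float:
    """Hypothetical: map the proxy's own overload to a threshold in [0.0, 1.0].

    Below the targets nothing is shed. As load climbs past them, the threshold
    rises linearly until, at twice the target, only top-priority traffic is left.
    """
    # Overload ratio: 1.0 means "exactly at target"; the worse of the two signals wins.
    load = max(cpu_utilization / cpu_target, active_connections / max_connections)
    if load <= 1.0:
        return 0.0
    # Linear ramp between 1x and 2x the target, clamped to 1.0.
    return min(load - 1.0, 1.0)


def admit(priority_score: float, cpu_utilization: float, active_connections: int) -> bool:
    """priority_score is a hypothetical value from 0.0 (least important) to 1.0 (most important)."""
    threshold = shed_threshold(cpu_utilization, active_connections)
    # Top-priority traffic (score 1.0) is always admitted; everything else must beat the threshold.
    return priority_score >= 1.0 or priority_score > threshold


print(admit(0.1, cpu_utilization=0.3, active_connections=1_000))   # healthy → True
print(admit(0.3, cpu_utilization=0.9, active_connections=1_000))   # 1.5x target → False
print(admit(0.8, cpu_utilization=0.9, active_connections=1_000))   # 1.5x target → True
print(admit(1.0, cpu_utilization=0.3, active_connections=20_000))  # 2x connections → True
```

A steeper, non-linear curve would shed less at mild overload and much more near saturation; the linear ramp is used here only because it is the easiest to follow.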
While this is happening behind the scenes, Zuul sends a signal to your TV, app, or browser telling it how many retries it can make and within what time window. This keeps your phone (and those of millions of other users) from repeatedly trying to reconnect to Netflix in a short period of time, which would further overload the company’s servers.
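On the client side, honoring that signal amounts to a retry budget. The sketch below assumes the server’s signal can be represented as a hypothetical `RetryPolicy` (a maximum number of retries per time window); the class names and fields are illustrative, not Netflix’s actual protocol. Timestamps are passed in explicitly so the logic is easy to follow and test.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical server signal: at most `max_retries` retries in any
    `window_seconds`-long period."""
    max_retries: int
    window_seconds: float


class RetryBudget:
    """Client-side tracker that refuses retries once the server's budget is spent."""

    def __init__(self, policy: RetryPolicy):
        self.policy = policy
        self.attempts = deque()  # timestamps (seconds) of recent retries, oldest first

    def update_policy(self, policy: RetryPolicy) -> None:
        # The server may tighten (or relax) the budget while it recovers.
        self.policy = policy

    def try_acquire(self, now: float) -> bool:
        """Return True (and record the retry) if another retry is allowed at time `now`."""
        # Drop retries that have aged out of the sliding window.
        while self.attempts and now - self.attempts[0] >= self.policy.window_seconds:
            self.attempts.popleft()
        if len(self.attempts) >= self.policy.max_retries:
            return False
        self.attempts.append(now)
        return True


budget = RetryBudget(RetryPolicy(max_retries=3, window_seconds=10))
print([budget.try_acquire(t) for t in (0, 1, 2, 3)])  # → [True, True, True, False]
print(budget.try_acquire(10))  # the retry at t=0 has aged out → True
```

With a budget of 3 retries per 10 seconds, the fourth retry inside the window is refused locally, and capacity returns only as older retries age out. Refusing on the device, instead of sending the request and waiting for another 503, is what keeps millions of clients from hammering the servers at the same moment.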