Google Goofs Up Capacity Calculation, Resulting in 2-Hour Gmail Outage
General News, by Ricky on September 1, 2009 at 11:00 pm
Google has concluded that the two-hour Gmail shutdown on Tuesday was the result of a miscalculation of its system's capacity.
Millions of Gmail users checking in for everything from entertainment news to business updates were left stranded. The problem was caused by a classic cascade, in which servers were jammed with traffic in rapid succession.
Google says the problem began when it took several Gmail servers offline for maintenance, a routine procedure that is normally transparent to users. The twist this time was that Google had modified the routers that direct Gmail traffic to servers, with the intention of improving reliability. These modifications backfired and cost Google dearly.
In a statement posted on the official Gmail blog an executive stated, “We had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct web queries to the appropriate Gmail server for response. At about 12:30 pm Pacific a few of the request routers became overloaded and in effect told the rest of the system “stop sending us traffic, we’re too slow!” This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded. As a result, people couldn’t access Gmail via the web interface because their requests couldn’t be routed to a Gmail server. IMAP/POP access and mail processing continued to work normally because these requests don’t use the same routers.”
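The cascade described in the statement can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not a model of Google's actual routing: the function name `simulate_cascade`, the capacity figures, and the rule that load is split evenly across the routers still in service are all hypothetical choices made for illustration.

```python
def simulate_cascade(total_load, capacities):
    """Simulate routers dropping out when their share of the load exceeds capacity.

    Assumption: traffic is split evenly across all routers still in service,
    and any router whose share exceeds its capacity stops accepting traffic.
    Returns (indices of routers still healthy, number of failure rounds).
    """
    healthy = list(range(len(capacities)))
    rounds = 0
    while healthy:
        share = total_load / len(healthy)  # even split across healthy routers
        overloaded = {i for i in healthy if share > capacities[i]}
        if not overloaded:
            break  # every remaining router can carry its share
        # Overloaded routers say "stop sending us traffic"; the rest absorb the load.
        healthy = [i for i in healthy if i not in overloaded]
        rounds += 1
    return healthy, rounds


# Hypothetical capacities (requests per second) for ten request routers.
capacities = [90, 90, 100, 100, 110, 110, 120, 120, 130, 130]

print(simulate_cascade(850, capacities))  # load within every router's capacity
print(simulate_cascade(950, capacities))  # slightly underestimated load
```

With these made-up numbers, a load of 850 leaves every router in service, while a load of 950 first overloads the two weakest routers, and the traffic they shed then overloads the rest in two further rounds. A small underestimate is enough to take out the whole pool, which is the pattern the statement describes.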

Google fixed the problem by redistributing traffic across the rest of its own network. The company has promised to focus on making sure that the request routers have sufficient headroom to handle future spikes in demand, and on finding a way to isolate problems in one sector without bringing down the entire service. “We’ll be hard at work over the next few weeks implementing these and other Gmail reliability improvements,” Treynor wrote. “Gmail remains more than 99.9% available to all users, and we’re committed to keeping events like today’s notable for their rarity.”
Google has apologized to all of its customers.

1 Comment
That sucked real bad for me as I had to access some important documents online. It’s really irresponsible on Google’s part to make such kinda stuff later. What’s use, man?