The company reported problems across a number of regions, including the US, Europe and Asia, that affected Azure cloud platform customers' access to and use of a range of IT services. The "connectivity issues" persisted for between three and approximately five hours before being "mitigated", depending on where in the world the problems arose.
The outages affected customers of its virtual machines and analytics services, and also hit website availability and storage services, among others.
In a company blog post, Jason Zander, corporate vice president of Microsoft Azure, said the outages occurred during a "performance update", despite the update's potential effect on service availability having been tested beforehand.
"As part of a performance update to Azure Storage, an issue was discovered that resulted in reduced capacity across services utilising Azure Storage, including virtual machines, visual studio online, websites, search and other Microsoft services," Zander said. "Prior to applying the performance update, it had been tested over several weeks in a subset of our customer-facing storage service for Azure Tables. We typically call this 'flighting,' as we work to identify issues before we broadly deploy any updates."
"The flighting test demonstrated a notable performance improvement and we proceeded to deploy the update across the storage service. During the rollout we discovered an issue that resulted in storage blob front ends going into an infinite loop, which had gone undetected during flighting. The net result was an inability for the front ends to take on further traffic, which in turn caused other services built on top to experience issues," he said.
Zander said the company reversed the update after identifying the problem and that this led to "availability improvement" for most of its customers. Issues affecting some customers persisted a little longer.
"When we have an incident like this, our main focus is rapid time to recovery for our customers, but we also work to closely examine what went wrong and ensure it never happens again," Zander said. "We will continually work to improve our customers’ experiences on our platform."
Following a previous Microsoft Azure outage in the summer, IT contracts specialist Lindsey Brown of Pinsent Masons, the law firm behind Out-Law.com, said: "Cloud providers trade on reputation and any 'downtime' to their services can be very damaging to their business and ability to attract and retain customers."
"While some businesses will be able to obtain better contractual assurances around service levels from non-cloud suppliers, cloud customers can gain some comfort from knowing that cloud providers are likely to be operating modern, up-to-date systems and will have a pressing business need of their own to restore availability quickly in the event that an outage occurs," she said.