A while back, I was starting up an EC2 instance on the AWS cloud when it entered an endless restart loop. All the application deployment efforts we’d made (installation and service configuration) over two weeks just went down the drain. So we called support. The support rep redirected us to his team leader who simply told us that, as indicated in the SLA, we had to abide by the shared responsibility model and they were not liable for our loss.
Over the last year I have had endless conversations with companies that strive to adopt the cloud – specifically the Amazon cloud. Of those I met, I can say that ClickSoftware is one of the leading traditional ISVs that has managed to adopt the cloud. The Amazon cloud is without doubt the most advanced cloud computing facility, and it leads the market. In my previous job I was involved in the ClickSoftware cloud initiative, from the decision making with regard to the Amazon cloud all the way to taking the initial steps to educate and support the company’s different parties in providing an On-Demand SaaS offering.
The Cloud Service Level Agreement (SLA) discussion puts penalties and compensation on the table. Can we say that the compensation the customer expects is the same as what the Software as a Service (SaaS) vendor’s SLA provides?
A while ago, I experienced issues while starting up a specific instance on the Amazon AWS cloud. I’m still not sure why, but the instance entered an endless restart loop. All the application deployment work (installation and configuration of a service) we had done on it over about two weeks just went down the drain. The discussion with the Amazon AWS support team ended with the support request being escalated to their head of support.
Take a look at the following paragraphs copied from the Amazon AWS EC2 SLA –
Traditionally, delivering high availability often meant replicating everything. However, today, with the option of going to the cloud, we can say that providing two of everything is costly. High availability should be planned and achieved at several levels, including the software, the data center, and geographic redundancy. According to a recent study, the cost of a data center outage ranges from a minimum of $38,969 to a maximum of $1,017,746 per organization, with an overall average of $505,502 per incident.
1 – Total cost of partial and complete outages can be a significant expense for organizations.
2 – Total cost of outages is systematically related to the duration of the outage.
3 – Total cost of outages is systematically related to the size of the data center.
4 – Certain causes of an outage are more expensive than others. Specifically, IT equipment failure is the most expensive root cause. Accidental/human error is the least expensive.
From an attacker’s perspective, cloud providers aggregate access to many victims’ data into a single point of entry. As cloud environments become more and more popular, they will increasingly become the focus of attacks. Some organizations think that liability can be outsourced, but I hope we all understand that it cannot. The contract with your cloud vendor basically means nothing: the ISV, or should I say the “SaaS provider”, still holds the responsibility. So rather than focusing on contracts and limiting liability in cloud services deals, you should focus on controls and auditability.
Amazon AWS: Ten days before the earthquake, AWS launched its fifth “region” in Tokyo, making it the second data center hub in Asia Pacific (APAC) after Singapore. Regina Tan, a spokeswoman for Amazon AWS, said, “The Amazon Web Services Tokyo Region was not affected by the earthquake or tsunami last week. AWS publishes our most up-to-the-minute information on service availability at our Service Health Dashboard on our website which constantly keeps our customers (and anyone who is interested) up-to-date on AWS’ services across our five Regions including the Tokyo Region.”
Google: According to a Google spokesperson, “Google’s networking and data center infrastructures have not been materially affected by the [Japan] earthquake.”
Equinix: “Our two data centers in Tokyo are operating as normal and are running from power from the grid. There is no facility damage or operational impact since the earthquake happened,” said Samuel Lee, president of Equinix Asia Pacific.
High Availability (HA) is an interesting subject that I am tracking as part of my journey in the OD (On-Demand) market of ISVs. A lot of questions are raised when this discussion takes place, for example –
Are there any industry standards? Do 4 nines count as HA, or should the ISV support 5 nines? Does that involve an additional fee? How will an ISV be able to support 5 nines? How many nines does a cloud provider really support? Can it commit to 100% uptime? …etc.
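To put the “nines” in perspective, here is a minimal sketch that converts an availability target into the downtime it still allows. It assumes a plain 365-day year, while a real SLA may measure availability per month or per billing cycle, and the function names are my own, not taken from any provider’s SLA.

```python
# Allowed downtime for a given number of "nines" of availability.
# Assumption: a 365-day year; real SLAs often measure per month or billing cycle.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability_for_nines(nines: int) -> float:
    """Return availability as a fraction, e.g. 4 nines -> 0.9999."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Return the maximum downtime (minutes per year) that still meets the target."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(
            f"{n} nines ({availability_for_nines(n):.5%}): "
            f"{downtime_minutes_per_year(n):.2f} minutes/year"
        )
```

Under these assumptions, 4 nines leaves roughly 52.6 minutes of downtime per year and 5 nines only about 5.3 minutes, which is why the jump from 4 to 5 nines raises the fee and feasibility questions above.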
Click here to read this optimistic article from the NYTimes, which presents insights from Urs Hölzle, senior vice president for operations at Google.