It used to be so simple. If a service failed, it was the service provider’s fault, the customer’s fault, or in rare cases the fault of a natural disaster. But with the cloud, reliability, and ultimately responsibility, are not so simple.
When cloud availability suffers, to put it in frank terms, “whose throat gets choked?” Is it the cloud service provider, the virtualization supplier, the consumer, the network service provider, or a different link in the complex system that makes up the cloud ecosystem?
There is not yet any standard for service-level agreements (SLAs) that defines which link in the chain takes responsibility when cloud availability is compromised. That gap raises important challenges, which are highlighted in a recent TechZine article, “Guide to Cloud Accountability,” by Randee Adams and Eric Bauer at Alcatel-Lucent.
The authors note that “Accountability in the cloud has not yet been clearly defined.” They contend that until there is a standard, “SLAs or other agreements need to clearly provide all parties involved the information they need to know as to who’s responsible for preventing and remedying outages and how are those outages being identified and measured.”
Adams and Bauer define the parties that might bear responsibility for a lapse in reliability. As a starting point, they suggest when each party should be held responsible for a problem with cloud performance or availability, and they outline what those responsibilities entail.
Cloud consumers – should be responsible for properly provisioning, configuring and operating their cloud appliance, according to the article.
End users – should be responsible for some errors in operation or configuration, and for improper use of equipment.
Virtual appliance vendors – should be held accountable for the stability and reliability of the application software.
Infrastructure suppliers (of computing, storage, networking and other platform software) – should be responsible for assuring that equipment is robust and reliable.
Cloud service providers – must take responsibility for the reliable operation of cloud computing infrastructure and facilities, and serving the needs of cloud consumers.
Network service providers – should be held accountable for network availability and any network issues.
“Of course, specific outage responsibilities vary according to the cloud service model and the contract terms agreed to by the cloud consumer and cloud service provider,” Adams and Bauer stressed.
They also note that this is why measurement is almost as important as clearly defining responsibility. Three service measurements are recommended.
The first measurement looks at how each key component in the data center affects service availability. “To eliminate all impairments not associated with the application, this measurement is taken with minimal IP routing, switching and facility infrastructure between the measurement point and the server hosting the application,” Adams and Bauer explained. Separate ratings “can be calculated for routers, security appliances, load balancers and other infrastructure configurations.”
The second measurement considers service availability and how it is affected by the data center environment. The performance of individual application instances is measured within the host data center.
The third necessary measurement looks at service availability “across multiple data centers to mitigate impairment of individual application instances, IP equipment and facilities and data center infrastructure,” according to the article. This third measurement includes the service availability benefits of georedundant application instances at multiple cloud data centers.
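To make the three measurement scopes concrete, here is a minimal sketch of how such ratings might be combined. It is not taken from the Adams and Bauer article: the function names, the sample figures, and the assumption that components fail independently are all illustrative. It relies only on the textbook reliability formulas for components in series (availabilities multiply) and for redundant instances (unavailabilities multiply).

```python
from math import prod


def availability(uptime_minutes, total_minutes):
    """Fraction of the measurement window in which the service was up."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return uptime_minutes / total_minutes


def in_series(*availabilities):
    """Availability of a chain in which every element must be up.

    Assumes the elements fail independently (an illustrative assumption).
    """
    return prod(availabilities)


def redundant(*availabilities):
    """Availability when any one of several redundant instances suffices.

    Assumes independent failures and instantaneous failover, both of which
    are simplifications.
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures for a 30-day window (43,200 minutes).
# First measurement: the application alone, with minimal infrastructure
# between the measurement point and the host server.
app = availability(43_190, 43_200)

# Second measurement: the application within its host data center,
# behind a router, a security appliance and a load balancer.
router, security_appliance, load_balancer = 0.9999, 0.9999, 0.9995
single_data_center = in_series(app, router, security_appliance, load_balancer)

# Third measurement: georedundant instances at two data centers.
georedundant = redundant(single_data_center, single_data_center)
```

Keeping the three figures separate is the point of the exercise: a gap between the first and second numbers points toward data center infrastructure, while the third shows how much of that gap georedundancy recovers.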
The authors make several valid points, not the least of which is the obvious need for a standard, so that companies using the cloud can carefully develop their SLAs and ensure that a failure in cloud reliability does not set off a finger-pointing match. At the end of the day we are all in this together, literally and figuratively, and having clearly understood responsibilities, with accountabilities that can be measured and tracked, is the best way to avoid problems down the road.