Monday, June 23, 2008

What happens when it rains?

Amazon S3 and data corruption. From the thread "S3 data corruption?":

 

We've isolated this issue to a single load balancer that was brought into service at 10:55pm PDT on Friday, 6/20.  It was taken out of service at 11am PDT Sunday, 6/22.  While it was in service it handled a small fraction of Amazon S3's total requests in the US.  Intermittently, under load, it was corrupting single bytes in the byte stream.  When the requests reached Amazon S3, if the Content-MD5 header was specified, Amazon S3 returned an error indicating the object did not match the MD5 supplied.  When no MD5 is specified, we are unable to determine if transmission errors occurred, and Amazon S3 must assume that the object has been correctly transmitted. Based on our investigation with both internal and external customers, the small amount of traffic received by this particular load balancer, and the intermittent nature of the above issue on this one load balancer, this appears to have impacted a very small portion of PUTs during this time frame.
One of the things we'll do is improve our logging of requests with MD5s, so that we can look for anomalies in their 400 error rates.  Doing this will allow us to provide more proactive notification on potential transmission issues in the future, for customers who use MD5s and those who do not.
In addition to taking the actions noted above, we encourage all of our customers to take advantage of mechanisms designed to protect their applications from incorrect data transmission.  For all PUT requests, Amazon S3 computes its own MD5, stores it with the object, and then returns the computed MD5 as part of the PUT response code in the ETag.  By validating the ETag returned in the response, customers can verify that Amazon S3 received the correct bytes even if the Content-MD5 header wasn't specified in the PUT request.  Because network transmission errors can occur at any point between the customer and Amazon S3, we recommend that all customers use the Content-MD5 header and/or validate the ETag returned on a PUT request to ensure that the object was correctly transmitted.  This is a best practice that we'll emphasize more heavily in our documentation to help customers build applications that can handle this situation.
If you have specific questions or concerns about how your application might have been affected, please feel free to e-mail us at aws@amazon.com.
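
The two checks recommended in the statement can be sketched roughly as follows. This is only an illustration, not Amazon's code or any particular client library: the helper names `content_md5_header` and `etag_matches` are made up for this post, no request is actually sent, and it assumes the ETag returned for a plain PUT is the hex MD5 of the object body wrapped in double quotes, as the statement describes. The Content-MD5 value is the base64 encoding of the raw 16-byte MD5 digest.

```python
import base64
import hashlib


def content_md5_header(body: bytes) -> str:
    """Hypothetical helper: value for the Content-MD5 request header.

    Base64 encoding of the raw 16-byte MD5 digest of the body.
    """
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def etag_matches(body: bytes, etag: str) -> bool:
    """Hypothetical helper: compare a PUT response ETag with our own MD5.

    Assumes the ETag is the hex MD5 of the body, usually wrapped in quotes.
    """
    expected = hashlib.md5(body).hexdigest()
    return etag.strip().strip('"').lower() == expected


payload = b"hello world"

# Send this value as the Content-MD5 header so the server can reject a bad body.
print(content_md5_header(payload))

# Simulate the failure in the post: a single byte is corrupted in transit,
# so the server computes its MD5 (and ETag) over different bytes.
corrupted = bytearray(payload)
corrupted[0] ^= 0x01
simulated_etag = '"' + hashlib.md5(bytes(corrupted)).hexdigest() + '"'

# The mismatch tells the client to retry the upload.
print(etag_matches(payload, simulated_etag))
```

Sending the header lets the server refuse a corrupted body up front, while checking the ETag catches the same problem after the fact when no header was sent. Doing both costs one MD5 computation per upload.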

posted by Aaron Fischer on Monday, June 23, 2008 7:21:18 PM (Pacific Standard Time, UTC-08:00)