Archive for September, 2011

Synchronous vs. Asynchronous replication strategy: Which one is better?

Posted in Uncategorized on September 22, 2011 by swaminathans

If you’re a developer who is looking to pick a highly available database or datastore for your application, you’ve to think of various parameters to make the right choice for your application. Some of the questions you need to hash out include “What query model do I want and what does the datastore offer?”, “What is its consistency guarantees?”, “What is its reliability and availability SLA?” and “What are its scaling dimensions?”. While the process of picking the right datastore warrants a post on its own, I wanted to focus on one specific parameter of this question for this post.

The focus of this post is: “How does a datastore replicate data? Is replication done lazily or synchronously?

One might ask: Why should an application developer (or any datastore consumer) care how a datastore replicates data? The reality is that how a system replicates data has great impact on the datastore’s durability and availability.

Traditionally there are two kinds of replication techniques: synchronous replication and asynchronous replication. A datastore using synchronous replication does not acknowledge a write call until it is acknowledged by its replicas (usually a majority of the replicas).  Examples of pratical datastore that does synchronous replication include Amazon Dynamo (and many of its variants like Cassandra, Voldermort and Riak), Amazon SimpleDB, Amazon RDS Multi-AZ MySQL and Google AppEngine’s High Replication Datastore. On the other hand, a datastore that replicates data asynchronously propagates data to its replica, i.e., when one of the replica gets a write call it will acknowledge to the client right away and in the background it will propagate the writes to its replicas. Examples of systems that use this model include MySQL replication, MongoDB and Google AppEngine’s Master-slave datastore.

Which one is better? Well, it really depends on what you’re planning to use it for. Datastores that synchronously replicates data ensures provides higher durability and availability. Why? Consider the failure scenario where the replica that acknowledged that even if the replica that took a write dies as soon as it acknowledged to the clients, the data is not lost. Clients can read their last acknowledged writes by accessing from the other available replicas. However, in an asynchronously replicated systems, it is not possible for clients to know if their writes were propagated to the other replicas before the failure of the replica that coordinated the writes. If the replica has failed permanently (due to a permanent node failure or so), then you can experience a data loss where you lost the latest updates (durability issue). If the replica has failed temporarily, then clients can still access an old version of the data from the other available replicas. However, they will not be able to perform any writes unless they are willing to deal with merging conflicting updates when the replica comes back online.

So, why would anyone run the risk of picking a lower durable datastore and pick an asynchronously replicated datastore? The reason is because asynchronously replicated datastore provides reduced write latency (since a write is acknowledged locally by a single before being written to other replicas, the write latency is lower) and better read scaling (as adding more replicas does not impact your latency like in the case of synchronously replicated systems).

So, in a nutshell, the rule of thumb I tend to use in picking between these two techniques is: If your application requires high durability and availability, pick a synchronously replicated datastore solution. If your application is OK with losing a small % of writes (e.g., a soft-state or caching system) due to a failed replica (usually the master replica), then you are OK with asynchronous replication system. A great example is how MySQL users use read replicas to do read scaling. Another great example for using asynchronous replication is to use your asynchronous replica as the backup copy (used for disaster recovery).

Reading List for Distributed Systems

Posted in Distributed Systems on September 7, 2011 by swaminathans

I quite often get asked by friends, colleagues who are interested in learning about distributed systems saying “Please tell me what are the top papers and books we need to read to learn more about distributed systems”. I used to write one off emails giving a few pointers. Now that, I’ve asked enough I thought it is a worthwhile exercise to put these together in a single post.

Please feel free to comment if you think there are more posts that needs to be added.

Papers:

Distributed systems Theory:

I believe these are some of the foundational theory papers you must read before you go on to build large scale systems.

Distributed Consensus:

Paxos is the gold standard of distributed consensus protocols. Amazingly, this is the simplest of all protocols. There is a huge story behind how Paxos paper delayed getting published as the original paper was written in a non-obvious fashion by Lamport J. Paxos was more approachable for general masses after he wrote an abridged version of the paper.

Original part-time parliament: http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf

If that paper is too abstract for you, I recommend reading the lite version: Paxos made Simple

I always believe a theory is well understood only if you understand how others put in practice. Google has documented their experience with Paxos here:

Paxos made live talks about Google’s experience with using Paxos and Chubby, Google’s Paxos based lock service.

Consistency and Ordering in Distributed Systems:

Lamport, L. Time, clocks, and the ordering of events in a distributed system. ACM Communications, 21(7), pp. 558-565, 1978.

L. Lamport, R. Shostak, and M. Pease, The Byzantine Generals Problem, ACM Transactions on Programming Languages and Systems, July 1982, pages 382-401.

Replication in Distributed Databases:

Demers et al., Epidemic algorithms for replicated database maintenance, PODC 1987.

Jerome H. Saltzer and M. Frans Kaashoek, Principles of Computer System Design, Chapter 10: Consistency.

Lindsay, B.G., et. al., “Notes on Distributed Databases”, Research Report RJ2571(33471), IBM Research, July 1979

Distributed Hash Tables:

Distributed Hash Tables are distributed systems that use hashing techniques for routing and membership.  Most of them use consistent hashing as the foundation for routing.

Seminal work on consistent hashing techniques: Consistent Hashing and Random Trees. Application of consistent hashing to caching: Web caching with consistent hashing.

What followed last decade was researchers using consistent hashing techniques to build P2P systems and routing techniques using them. Examples of such systems include Chord, Pastry and Tapestry.

Real life Distributed systems and data Stores:

The following distributed databases papers are seminal and great examples of distributed systems. A must read for people interested in building distributed systems:

Amazon Dynamo: Amazon’s own key-value store

Google File System: Google’s very own distributed file system.

Google BigTable: Google’s distributed datastore.

Map-Reduce: A seminal piece of work that has powered the Hadoop ecosystem

Autopilot: Automatic Data Center Management Michael Isard April 2007.

Update [ 9/13/2011]: Alex Feinberg pointed out his reading list, which is equally impressive as well and not so surprisingly have a great number of papers in common. Thanks Alex!