Archive for the Distributed Systems Category

Reading List for Distributed Systems

Posted in Distributed Systems on September 7, 2011 by swaminathans

I quite often get asked by friends, colleagues who are interested in learning about distributed systems saying “Please tell me what are the top papers and books we need to read to learn more about distributed systems”. I used to write one off emails giving a few pointers. Now that, I’ve asked enough I thought it is a worthwhile exercise to put these together in a single post.

Please feel free to comment if you think there are more posts that needs to be added.

Papers:

Distributed systems Theory:

I believe these are some of the foundational theory papers you must read before you go on to build large scale systems.

Distributed Consensus:

Paxos is the gold standard of distributed consensus protocols. Amazingly, this is the simplest of all protocols. There is a huge story behind how Paxos paper delayed getting published as the original paper was written in a non-obvious fashion by Lamport J. Paxos was more approachable for general masses after he wrote an abridged version of the paper.

Original part-time parliament: http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf

If that paper is too abstract for you, I recommend reading the lite version: Paxos made Simple

I always believe a theory is well understood only if you understand how others put in practice. Google has documented their experience with Paxos here:

Paxos made live talks about Google’s experience with using Paxos and Chubby, Google’s Paxos based lock service.

Consistency and Ordering in Distributed Systems:

Lamport, L. Time, clocks, and the ordering of events in a distributed system. ACM Communications, 21(7), pp. 558-565, 1978.

L. Lamport, R. Shostak, and M. Pease, The Byzantine Generals Problem, ACM Transactions on Programming Languages and Systems, July 1982, pages 382-401.

Replication in Distributed Databases:

Demers et al., Epidemic algorithms for replicated database maintenance, PODC 1987.

Jerome H. Saltzer and M. Frans Kaashoek, Principles of Computer System Design, Chapter 10: Consistency.

Lindsay, B.G., et. al., “Notes on Distributed Databases”, Research Report RJ2571(33471), IBM Research, July 1979

Distributed Hash Tables:

Distributed Hash Tables are distributed systems that use hashing techniques for routing and membership.  Most of them use consistent hashing as the foundation for routing.

Seminal work on consistent hashing techniques: Consistent Hashing and Random Trees. Application of consistent hashing to caching: Web caching with consistent hashing.

What followed last decade was researchers using consistent hashing techniques to build P2P systems and routing techniques using them. Examples of such systems include Chord, Pastry and Tapestry.

Real life Distributed systems and data Stores:

The following distributed databases papers are seminal and great examples of distributed systems. A must read for people interested in building distributed systems:

Amazon Dynamo: Amazon’s own key-value store

Google File System: Google’s very own distributed file system.

Google BigTable: Google’s distributed datastore.

Map-Reduce: A seminal piece of work that has powered the Hadoop ecosystem

Autopilot: Automatic Data Center Management Michael Isard April 2007.

Update [ 9/13/2011]: Alex Feinberg pointed out his reading list, which is equally impressive as well and not so surprisingly have a great number of papers in common. Thanks Alex!

WebApps 11

Posted in Distributed Systems on October 6, 2010 by swaminathans

As many of you, the past 4 years in Amazon, I’ve spent on building out various parts of Amazon and AWS infrastructure that will enable Amazon and other AWS users to build highly available and scalable applications.

This also gives me immense pleasure to be part of the program committee for Usenix WebApps 2011. This conference is primarily focused on building highly scalable and Web applications. Looking forward to reviewing some great submissions and the conference.

I highly recommend submitting your ideas.

CFP details below:

WebApps ’11 Call for Papers

2nd USENIX Conference on Web Application Development (WebApps ’11)

June 15–16, 2011
Portland, OR

WebApps ’11 will take place during USENIX Federated Conferences Week, June 12–17, 2011.

Important Dates

  • Submissions due: January 21, 2011, 11:59 p.m. PST (hard deadline, no extensions, no exceptions, so don’t ask)
  • Notification to authors: March 17, 2011
  • Final files due: May 3, 2011

Conference Organizers

Program Chair
Armando Fox, University of California, Berkeley

Program Committee
Adam Barth, Google Inc.
Abdur Chowdhury, Twitter
Jon Howell, Microsoft Research
Collin Jackson, Carnegie Mellon University
Bobby Johnson, Facebook
Emre Kıcıman, Microsoft Research
Michael E. (“Max”) Maximilien, IBM Research
Owen O’Malley, Yahoo! Research
John Ousterhout, Stanford University
Swami Sivasubramanian, Amazon Web Services
Geoffrey M. Voelker, University of California, San Diego
Nickolai Zeldovich, Massachusetts Institute of Technology

Overview

The Web is now the dominant platform for delivering interactive applications to hundreds of millions of users. Such applications are now expected to scale effortlessly from tens of users to tens of millions of users in a single day while providing a responsive “always-on” experience. These demands, as well as the new possibilities opened by the proliferation of Web-capable mobile devices, requires that Web apps’ design and operation be elastic, failure-tolerant, and seamlessly scalable, supporting multiple devices and access methods.

Like the inaugural WebApps ’10, WebApps ’11 seeks to attract cutting-edge research that advances the state of the art, not only on novel Web applications but also on infrastructure, tools, and techniques that support the development, analysis/testing, operation, or deployment of those applications.

Topics

Possible topics include but are not limited to:

  • Storage for Web-scale applications
  • Techniques for testing and debugging
  • Novel strategies for fault tolerance or high availability in Web apps
  • The Web as an emerging platform in new application areas
  • HCI techniques related specifically to Web apps
  • Measurement, modeling, workload generation, and other tools to aid experimental research on Web apps
  • New and unusual app features or implementation techniques
  • Media delivery applications and infrastructure
  • Client-side libraries, toolkits, plug-ins
  • Server-side frameworks
  • Languages and language engineering advances relevant to Web app development
  • Deployment substrates and technologies (cloud computing, infrastructure as a service, testing as a service, etc.)

More details:

CONFERENCE HOME PAGE (full info): http://www.usenix.org/events/webapps11/

CFP online and in PDF format:

http://www.usenix.org/events/webapps11/cfp/
http://www.usenix.org/events/webapps11/cfp/webapps11cfp.pdf

Just Launched: Amazon RDS Multi-AZ

Posted in Cloud Computing, Distributed Systems on May 18, 2010 by swaminathans

Today, my colleagues and I launched a major feature in Amazon RDS called Multi-AZ DBInstances. A Multi-AZ DB Instance performs synchronous replication of data across replicas located in different availability zones.

What benefits does Multi-AZ DB Instance provide?

-          Higher Durability: Your data is synchronously replicated across different availability zones. This guarantees higher levels of durability even in the wake of a disk, instance or availability zone failures.

-          Higher Availability: With Multi-AZ instances, we perform master-slave replication and when we detect that the master is unavailable, we automatically failover to the slave replica. This guarantees higher levels of availability. Contrary to MySQL’s asynchronous replication, since the data is synchronously replicated, when the replica fails over to the secondary replica you will experience no data loss.

You can learn more about all the resilient goodness of RDS Multi-AZ deployments here.

Building Scalable Social Gaming Platform using Clouds

Posted in Cloud Computing, Distributed Systems on February 18, 2010 by swaminathans

This week, I ran into many social gaming related posts.  First thing that surprised me was the social games are not played predominantly by teens and the average social gamer is around 40s.  This illustrates why these social games attract tens of millions of players every day.

When I read the high scalability article about How Zynga scaled Farmville to Harvest 75 Million Players a month, I was intrigued by their scaling challenges and requirements:

* Read-write ratio: Interactive games are write heavy unless traditional web applications. Seems intuitive when you think about the fact every move is recorded in a datastore.
* Users are disturbed by high latencies and variability in latencies:  Note that I called high latencies and latency variations as two different entities. As we noted in Dynamo, Amazon also cares about the variability in latencies (percentiles) and build our services to make sure we can constantly keep the variability in control.
* Dealing with latency and failure characteristics of external dependencies: These applications need to deal with external platforms like Facebook which may or may not be available all the time.

I like the way Zynga approached this problem:

* For solving heavy writes, looks like they have partitioned heavily. Seems reasonable – however, I’m curious to see how they handled “hot spots” (wherein the most active users are the ones constantly generating more data) and whether simple hash-based partitioning is good enough to spread the hot spots.

* For handling latency variations, they went for isolating each component and built graceful degradation at each layer. This is a common practice in building large scale systems . The thing what I would be curious to see is how “gracefully” does their datastore degrades and also which datastore they are using for that.

* Finally, to deal with failures of external dependancies and still meet latency SLAs, looks like they cache the responses of external dependancies.

These are very good lessons for building scalable systems.

An interesting followup to this, I saw that Rightscale (which is apparently helping Zynga run on top of AWS) is using its expertise to offer a social gaming platform for other aspiring Zyngas out there! Seems like an exciting internet industry at its really nascent stages.

Amazon RDS

Posted in Cloud Computing, Distributed Systems on October 27, 2009 by swaminathans

Today, my team launched Amazon RDS, a new AWS service that offers relational database in the AWS Cloud. Amazon RDS provides a managed relational database in the cloud and the service does the heavy lifting done by DBAs such as provisioning, monitoring database health, managing automated backups, point-in-time recovery, creating/restoring snapshots, adding storage space on the fly, changing compute power for your database, etc.., all through simple Web service calls. Get started in few minutes!

A Word about Persistence: One size does not fit all.

The scalability, reliability and performance of a platform is heavily dependent on how it manages their data. In Amazon, we believe that when it comes to persistence: one size does not fit all. Pick the right data management platform based on your application’s needs. For people, who require a highly available and highly durable key-value store, we have Amazon S3. For people requiring raw disk in the cloud – we have Amazon EBS. For, people who want to store and query structured data in the cloud and don’t want to actively manage its scalability, we have Amazon SimpleDB.

One of the biggest features our customer has been asking us has been to provide a managed relational database service in the cloud. The need for their relational database can be due to various reasons: programming familiarity, dealing with existing applications already built for relational databases, need for complex transactions and joins which are relational databases’ forte. If you fall in this category, you will be happy with Amazon RDS.

For more details on Amazon RDS, take a look at:

GFS Evolution: Few thoughts…

Posted in Distributed Systems on August 20, 2009 by swaminathans

I finally caught up with the ACM Queue’s interview of Sean Quinlan on GFS Evolution. A small recap of the article for folks who haven’t read the article. GFS, Google’s FileSystem, has been used extensively in Google for more than 10 years (see GFS paper ).

In this interview, Sean talks about some of the shortcomings of original GFS design and what were the challenges they faced when many more applications started using GFS. He talks about some of the biggest issues they ran into GFS was having a single master in charge of a FS cluster where they ran out of how much metadata a single master could keep in its memory thereby having an inherent limit on the number of files a GFS cluster can run.

Later in the interview, he talks about a new version of GFS they are building that uses a distributed master model where they can add more machines to the GFS cluster and the machines will be able to distribute the load of replication, chunking automatically. Clearly, this will handle more load, more files, will be provide higher availability and better performance.

Few things that intrigued me about this interview:

(i) GFS, which was originally built for batch file processing, has evolved to support more online applications like gmail. This introduced new performance, availability and durability requirements.

(ii) How the changing application patterns have driven the design of GFS to a more distributed model that meet the demanding availability and performance needs of online applications.

(iii) The experience with “loose” (eventual) consistency model in GFS and how they handled different failure modes. Looks like their biggest issue was in dealing with client failures as clients were in charge of data reconciliation (which is one of the biggest challenges with eventual consistency). Looks like to avoid these issues, they are moving to a single writer per file model, basically serializing all the writes. Seems like a reasonable approach to provide a tighter bound on consistency (at the expense of possibly reduced “write availability”).

Overall, this was a very insightful interview for me and it is interesting to see how similar some of these problems are what Amazon has seen and solved in the past.

I am really looking forward to read a new SOSP/OSDI paper on GFS v2.

Follow

Get every new post delivered to your Inbox.