Amazon RDS

Posted in Cloud Computing, Distributed Systems on October 27, 2009 by swaminathans

Today, my team launched Amazon RDS, a new AWS service that offers a managed relational database in the AWS Cloud. Amazon RDS does the heavy lifting traditionally done by DBAs, such as provisioning, monitoring database health, managing automated backups, point-in-time recovery, creating and restoring snapshots, adding storage space on the fly, and changing the compute power of your database, all through simple Web service calls. You can get started in a few minutes!
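
To make those Web service calls concrete, here is a minimal sketch using today's boto3 Python SDK (tooling that postdates this post); the instance identifier, credentials, and sizes are placeholder values, not prescriptions:

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Provision a MySQL database instance (placeholder identifier and credentials).
rds.create_db_instance(
    DBInstanceIdentifier="mydb",
    Engine="mysql",
    DBInstanceClass="db.m1.small",
    AllocatedStorage=20,               # GB
    MasterUsername="admin",
    MasterUserPassword="change-me",
)

# Take a snapshot, then grow storage on the fly.
rds.create_db_snapshot(DBSnapshotIdentifier="mydb-snap-1",
                       DBInstanceIdentifier="mydb")
rds.modify_db_instance(DBInstanceIdentifier="mydb",
                       AllocatedStorage=50,
                       ApplyImmediately=True)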

A Word about Persistence: One size does not fit all.

The scalability, reliability, and performance of a platform depend heavily on how it manages its data. At Amazon, we believe that when it comes to persistence, one size does not fit all: pick the right data management platform based on your application’s needs. For people who require a highly available and highly durable key-value store, we have Amazon S3. For people requiring raw disk in the cloud, we have Amazon EBS. And for people who want to store and query structured data in the cloud without actively managing its scalability, we have Amazon SimpleDB.

One of the features our customers have been asking for most is a managed relational database service in the cloud. The need for a relational database can arise for various reasons: programming familiarity, existing applications already built for relational databases, or the need for complex transactions and joins, which are relational databases’ forte. If you fall into this category, you will be happy with Amazon RDS.

For more details on Amazon RDS, take a look at:

Job Openings in Cloud Computing

Posted in Cloud Computing on October 13, 2009 by swaminathans

I’m looking for some really smart people to join my team and work on building large-scale distributed systems. We have various positions open, from beginners to senior technical leaders.

My boss, Werner Vogels, summarized the qualifications he expects his ideal candidates to meet in a blog post he wrote five years ago. My standards are no different :).

I’m looking for candidates who meet these requirements. If you’re a beginner (a recent college grad, say), you should ideally be willing to reach that skill level soon. So, if you’re interested in building such large-scale systems, send your CV to my email: swami-removetheobvious@amazon.com

Expand Your Datacenter to Amazon Cloud

Posted in Cloud Computing on August 26, 2009 by swaminathans

So far in AWS, we have introduced new services to “expand our cloud offering”. Today, with the introduction of Amazon VPC, we allow our customers to expand their datacenters to the cloud by providing a secure and seamless bridge to the AWS cloud.

As Werner mentions in his post, one of the significant challenges for an enterprise moving to the cloud is how to integrate applications running in the cloud into its existing management frameworks.

To put it more simply, CIOs traditionally had to plan “move to the cloud” initiatives as major projects because they could not reuse their existing management software for their EC2 instances. With the introduction of Amazon VPC, that barrier becomes much lower.

Amazon VPC enables enterprises to connect their existing infrastructure to a set of isolated AWS compute resources via a Virtual Private Network (VPN) connection, and to extend their existing management capabilities such as security services, firewalls, and intrusion detection systems to include their AWS resources.
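
As a rough sketch of what setting up that bridge can look like programmatically, here is a minimal example using today's boto3 Python SDK (which postdates this launch); the CIDR block, public IP, and BGP ASN are placeholder values:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# An isolated address range in the AWS cloud (placeholder CIDR).
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]

# Describe the corporate side of the VPN (placeholder public IP and ASN).
cgw = ec2.create_customer_gateway(Type="ipsec.1",
                                  PublicIp="203.0.113.12",
                                  BgpAsn=65000)["CustomerGateway"]

# The AWS side of the VPN, attached to the VPC.
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]
ec2.attach_vpn_gateway(VpnGatewayId=vgw["VpnGatewayId"], VpcId=vpc["VpcId"])

# The IPsec tunnel that bridges the existing datacenter to the VPC.
ec2.create_vpn_connection(Type="ipsec.1",
                          CustomerGatewayId=cgw["CustomerGatewayId"],
                          VpnGatewayId=vgw["VpnGatewayId"])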

We are really proud to launch Amazon VPC and believe this is a true milestone in the field of Cloud Computing!

GFS Evolution: A few thoughts…

Posted in Distributed Systems on August 20, 2009 by swaminathans

I finally caught up with ACM Queue’s interview of Sean Quinlan on GFS evolution. A small recap for folks who haven’t read the article: GFS, the Google File System, has been used extensively at Google for more than 10 years (see the GFS paper).

In this interview, Sean talks about some of the shortcomings of the original GFS design and the challenges they faced as many more applications started using GFS. One of the biggest issues they ran into was having a single master in charge of each filesystem cluster: they ran up against how much metadata a single master could keep in memory, which placed an inherent limit on the number of files a GFS cluster could hold.
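
To get a feel for that ceiling, here is a rough back-of-the-envelope calculation in Python. The per-chunk and per-file metadata sizes are my own assumptions, loosely inspired by the figures in the GFS paper, not numbers from the interview:

# Rough, hypothetical numbers: the GFS paper cites less than 64 bytes of
# master metadata per 64 MB chunk; assume similar cost per file name.
CHUNK_SIZE = 64 * 1024 * 1024          # 64 MB chunks
METADATA_PER_CHUNK = 64                # bytes of master memory per chunk (assumed)
METADATA_PER_FILE = 64                 # bytes per file namespace entry (assumed)

MASTER_MEMORY = 64 * 1024 ** 3         # assume the master can spare 64 GB for metadata

avg_file_size = 1 * 1024 * 1024        # lots of small files, e.g. 1 MB each
chunks_per_file = max(1, avg_file_size // CHUNK_SIZE)
bytes_per_file = METADATA_PER_FILE + chunks_per_file * METADATA_PER_CHUNK

max_files = MASTER_MEMORY // bytes_per_file
print(f"~{max_files / 1e9:.1f} billion small files before the master runs out of memory")

With many small files, the file count, not the total data volume, is what exhausts the single master first.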

Later in the interview, he talks about a new version of GFS they are building that uses a distributed-master model: they can add more master machines to a GFS cluster, and those machines automatically distribute the load of replication and chunk management. Clearly, this will handle more load and more files, and will provide higher availability and better performance.
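
The interview does not spell out how the namespace would be split across masters; one common way to illustrate the idea is simply to hash file paths onto a fleet of master servers. A minimal, purely illustrative sketch (not Google's design):

import hashlib

MASTERS = ["master-0", "master-1", "master-2", "master-3"]  # hypothetical master fleet

def master_for(path: str) -> str:
    """Map a file path to the master that owns its metadata (illustrative only)."""
    digest = hashlib.md5(path.encode()).hexdigest()
    return MASTERS[int(digest, 16) % len(MASTERS)]

print(master_for("/gmail/user123/mbox"))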

A few things that intrigued me about this interview:

(i) GFS, which was originally built for batch file processing, has evolved to support more online applications like Gmail. This introduced new performance, availability, and durability requirements.

(ii) How changing application patterns have driven the design of GFS toward a more distributed model that meets the demanding availability and performance needs of online applications.

(iii) The experience with GFS’s “loose” (eventual) consistency model and how they handled different failure modes. It looks like their biggest issue was dealing with client failures, since clients were in charge of data reconciliation (which is one of the biggest challenges with eventual consistency). To avoid these issues, they appear to be moving to a single-writer-per-file model, basically serializing all writes. That seems like a reasonable approach to provide a tighter bound on consistency (at the expense of possibly reduced “write availability”). A small sketch of the idea follows this list.
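
To make the single-writer idea concrete, here is a minimal sketch of my own (not Google's implementation) that serializes all appends to a file through one exclusive writer:

import threading

class SingleWriterFile:
    """Illustrative only: funnel all appends through one lock, so every
    record gets an offset in a single, total order and readers never see
    conflicting, unreconciled replicas."""

    def __init__(self):
        self._lock = threading.Lock()       # stands in for an exclusive writer lease
        self._records = []

    def append(self, record: bytes) -> int:
        with self._lock:                    # only one writer at a time
            self._records.append(record)
            return len(self._records) - 1   # offset assigned in write order

f = SingleWriterFile()
print(f.append(b"event-1"), f.append(b"event-2"))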

Overall, this was a very insightful interview, and it is interesting to see how similar some of these problems are to ones Amazon has seen and solved in the past.

I am really looking forward to reading a new SOSP/OSDI paper on GFS v2.

From push to pull…

Posted in Cloud Computing on July 14, 2009 by swaminathans

Last week, in an internal talk series, Werner pointed to a paper titled “From push to pull: Emerging models for mobilizing resources” by Hagel and Brown.

That night, I read the paper, and what an intriguing paper it was! For folks who haven’t read it, its fundamental crux is as follows: the traditional resource model is changing from a push model (where resource needs are planned in advance and resources are pushed to consumers) to a pull model (where consumers pull resources on demand).

For instance, in the media industry, people are moving away from traditional sources like television, where content is pushed to the audience. Instead, people are happy to pick and choose what content they want to view and pull it from sources like YouTube. The authors point to other examples, such as university education: the University of Phoenix has introduced a curriculum model that allows students to decide (i.e., “pull”) what subjects they want to study instead of following a traditional “push” curriculum, which has helped make it the largest private university in the USA.

You might be wondering why this is related to “large scale systems”. I just realized the obvious: the IT industry is going through the same shift towards a pull model, and cloud computing is one of the key enablers of it.

In the past, the traditional model for running an IT shop was to plan hardware and software needs ahead of time and make the appropriate buying and provisioning decisions. Typically, companies had to plan for their resource demands at least a year in advance and “push” the resources out.

With cloud computing, IT shops have the option to “pull” resources only when they need them, without worrying about a year-ahead planning cycle. Many may already know the story of the NYTimes digital IT shop that pulled AWS EC2 resources to run its image conversion job (see TimesMachine). Others might not know that Animoto handled its peak demand using the dynamic scaling capabilities of Amazon EC2.
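
As a small sketch of what “pulling” compute looks like in practice, here is how a batch job might launch EC2 instances only when work is queued, using today's boto3 SDK (which postdates this post); the AMI ID and the queue-depth check are hypothetical placeholders:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def pending_jobs() -> int:
    # Hypothetical placeholder: a real system would check its work queue here.
    return 40

# Pull capacity on demand: one instance per 10 queued jobs, instead of
# provisioning a fixed fleet a year in advance.
needed = max(1, pending_jobs() // 10)
ec2.run_instances(
    ImageId="ami-12345678",        # hypothetical worker AMI
    InstanceType="m1.small",
    MinCount=needed,
    MaxCount=needed,
)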

It is interesting how cloud computing has enabled the pull resource model in the IT world.

Anyway, if you get a chance, please read Hagel and Brown’s paper!