Amazon ElastiCache

Posted in Uncategorized on August 23, 2011 by swaminathans

Last night, my sister team in AWS launched a service I’m very excited about: Amazon ElastiCache. Historically, caching has been one of the most widely used techniques for building scalable web applications: caches store the most frequently accessed computation results, which take longer (or are harder) to recompute at the source. In-memory caches are normally used to front databases so that frequently accessed results can be retrieved faster from memory (see examples of how to use MySQL and memcache together, here). However, to ensure that in-memory caches do not become a scalability bottleneck themselves, distributed cache clusters use techniques like distributed hash tables (DHTs) so that the cache cluster can be “scaled out”. As a caching system grows, managing it in a large-scale environment becomes a challenge.
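To make the “scale out” idea concrete, here is a minimal sketch of consistent hashing, one common DHT-style technique for spreading keys across cache nodes. It is an illustration only, not a description of how ElastiCache or any particular Memcached client works; the node names, the `HashRing` class, and the `vnodes` parameter are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Maps cache keys to nodes with consistent hashing (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` points on the ring to even out the load.
        self._ring = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(keys, before, after):
    """Fraction of keys whose owning node differs between two rings."""
    moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
    return moved / len(keys)


# Hypothetical cluster: grow from three cache nodes to four.
keys = [f"user:{i}" for i in range(10_000)]
three = HashRing(["cache-a", "cache-b", "cache-c"])
four = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
print(f"keys remapped after adding a node: {moved_fraction(keys, three, four):.0%}")
```

The point of the design is that adding a node only moves the keys that the new node takes over (roughly a quarter here), whereas a plain `hash(key) % node_count` scheme would remap most keys and empty most of the cache.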

Today, AWS has made the process of running a cache cluster easier with a new managed cache offering called ElastiCache. A quote from the detail page sums it up well:

“Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. Amazon ElastiCache is protocol-compliant with Memcached, a widely adopted memory object caching system, so code, applications, and popular tools that you use today with existing Memcached environments will work seamlessly with the service. Amazon ElastiCache simplifies and offloads the management, monitoring, and operation of in-memory cache environments, enabling you to focus on the differentiating parts of your applications.”

Congratulations team!

Growth of AWS

Posted in Uncategorized on March 4, 2011 by swaminathans

This week, BusinessWeek posted a great article on cloud computing and AWS. The one statement that really caught my eye, and that highlights our growth, is:

“Keeping up with the demand requires frantic expansion: Each day, Jassy’s operation (AWS) adds enough computing muscle to power one whole Amazon.com circa 2000, when it was a $2.8 billion business.”

Please take a look at the article here.

Shameless hiring plug: Do you want to be part of the team that builds such disruptive technologies? Email me: swami-removetheobvious@amazon.com.

IEEE Network Magazine: Cloud Computing Special Issue

Posted in Cloud Computing on January 9, 2011 by swaminathans

I’m co-editing a special issue on cloud computing for IEEE Network Magazine. For folks interested, take a look at the CFP below. The submission deadline is Jan 15, 2011.

Call For Papers

Final submissions due 15 January 2011

Background
Cloud Computing is a recent trend in information technology and scientific computing that moves computing and data away from desktop and portable PCs into large Data Centers. Cloud computing is based on a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud Computing opens new perspectives in internetworking technologies, raising new issues in the architecture, design and implementation of existing networks and data centers. The relevant research has just recently gained momentum and the space of potential ideas and solutions is still far from being widely explored.

Scope
This special issue of the IEEE Network Magazine will feature articles that discuss networking aspects of cloud computing. Specifically, it aims at delivering the state-of-the-art research on current cloud computing networking topics, and at promoting the networking discipline by bringing to the attention of the community novel problems that must be investigated. Areas of interest include, but are not limited to:

  • Data center architecture
    • Server interconnection
    • Routers (switching technology) for data center deployment
    • Centralization of network administration/routing
  • Energy-efficient cloud networking
    • Low energy routing
    • Green data centers
  • Measurement-based network management
    • Traffic engineering
    • Network anomaly detection
    • Usage-based pricing
  • Security issues in clouds
    • Secure routing
    • Security threats and countermeasures
    • Virtual network security
  • Virtual Networking
    • Virtualized network resource management
    • Virtual cloud storage
  • Futuristic topics
    • Interclouds
    • Cloud computing support for mobile users

Guest Editors

Swami Sivasubramanian
Amazon, USA
swami@amazon.com

Dimitrios Katsaros
University of Thessaly, Greece
dkatsar@inf.uth.gr

George Pallis
University of Cyprus, Cyprus
gpallis@cs.ucy.ac.cy

Athena Vakali
Aristotle University of Thessaloniki, Greece
avakali@csd.auth.gr

Datacenter Networks and move towards scalable network architectures

Posted in Cloud Computing on November 2, 2010 by swaminathans

Last week, I was super excited to attend James’s talk on “Datacenter networks are in my way” in our internal talk series called Principals of Amazon. As always, James’s talk was illuminating. I highly encourage everyone to read James’s post and the slides.

A few takeaways from James’s talk worth calling out:

– Contrary to popular belief, power is not the leading driver for datacenter operational cost. It is actually the server cost (which is about 57%).

– It follows that techniques like shutting down servers when they are not being used, while interesting, do not give a big return on the investment. Instead, you are better off doing the exact opposite: utilize your existing server investment to the fullest.

– Traditional DC networks are usually oversubscribed and live in a very vertical world where all network components come from a single vendor and are built more like mainframes, with a “scale up” (get bigger boxes) model instead of a “scale out” model. This is bad for sustainability and reliability.

– To enable higher server utilization, you need your datacenter networks to support full connectivity between hosts and not be oversubscribed.

The above takeaways tell us that we need to build DC networks such that they can be easily scaled (moving from oversubscribed to undersubscribed). To scale DC networks, we need a scale-out DC network architecture, and systems like OpenFlow enable that. It is interesting to see that what we learnt in distributed systems and datastores applies to datacenter networks as well: scale out (horizontal scaling) is, in the long run, better than scale up (vertical scaling).
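For readers unfamiliar with the term, oversubscription is simply the ratio of host-facing capacity to uplink capacity at a given layer of the network. The short calculation below is a sketch with made-up numbers (they are assumptions for illustration and do not come from James’s talk); the function name `oversubscription_ratio` is hypothetical.

```python
def oversubscription_ratio(servers, server_gbps, uplinks, uplink_gbps):
    """Ratio of host-facing capacity to uplink capacity on one switch.

    1.0 means every host can send off-rack at line rate at the same time;
    anything above 1.0 means the uplinks are oversubscribed.
    """
    downlink = servers * server_gbps
    uplink = uplinks * uplink_gbps
    return downlink / uplink


# Hypothetical rack: 40 servers with 1 Gbps NICs behind 2 x 10 Gbps uplinks.
rack = oversubscription_ratio(servers=40, server_gbps=1, uplinks=2, uplink_gbps=10)
print(f"rack oversubscription: {rack:.0f}:1")

# If the aggregation layer above is itself 4:1 oversubscribed, the ratios multiply.
end_to_end = rack * 4
per_server_gbps = 1 / end_to_end
print(f"worst-case cross-rack bandwidth per server: {per_server_gbps:.3f} Gbps")
```

In this made-up example a server with a 1 Gbps NIC can count on only an eighth of that across racks when everyone is busy, which is why oversubscribed networks make it hard to drive servers to high utilization.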

Amazon CloudFront: Awards

Posted in Uncategorized on October 26, 2010 by swaminathans

Amazon CloudFront is a web service for content delivery that allows you to deliver static and streaming content using a network of edge locations. Often, these systems are called Content Delivery Networks or CDNs (for a more general survey of CDN techniques, see here).

I was fortunate to have had the opportunity to work with folks like Brad Marshall and David Richardson during the initial days of CloudFront, and I am even more excited to see what they have done in the past couple of years, such as adding support for streaming media and HTTPS.

Last week, I was excited to see CloudFront win a couple of prestigious awards: Streaming Media – 2010 Editor’s Pick and European Readers’ Choice awards. I’m glad to see CloudFront’s contributions being recognized and to know that our customers find it extremely useful. Congratulations, CloudFront team!

On that note, the CloudFront team is actively looking for extremely smart engineers who are passionate about large-scale distributed systems and networks. If you’re interested, please contact David Richardson.

Taking on a new role in AWS Database Services

Posted in Cloud Computing, NoSQL on October 12, 2010 by swaminathans

As those of you who know me and my work are aware, I’ve always been passionate about building large-scale distributed systems. I’m glad to have had the opportunity to work in teams that built great systems like Amazon Dynamo, Amazon CloudFront, and Amazon RDS. Moreover, I had the opportunity to learn from some exceptionally smart people like Werner Vogels, Al Vermeulen and James Hamilton.

I am personally thrilled to see the momentum Dynamo has created in the NoSQL world and was excited to talk about it at the NoSQL meetups. Similarly, I have been amazed at the rapid adoption of RDS and its Multi-AZ features.

So far, in AWS, I’ve been working as a Principal Engineer (aka Architect) in charge of the architecture and implementation of these systems. Recently, I took on a new role managing and leading the non-relational database services team in AWS. I’ll be leading the SimpleDB team and a couple of other internal teams and looking at ways to build scalable data access primitives. I’m super excited to join this incredibly smart team and deliver some great systems.

On this note, I’m actively hiring for my teams, see here. If you’re interested in joining a team that is building what will be the blueprint for scalable datastores, then please send me a note (swami-removetheobvious@amazon.com). I’m looking for highly talented engineers.

WebApps 11

Posted in Distributed Systems on October 6, 2010 by swaminathans

As many of you know, I’ve spent the past 4 years at Amazon building out various parts of the Amazon and AWS infrastructure that enable Amazon and other AWS users to build highly available and scalable applications.

It also gives me immense pleasure to be part of the program committee for USENIX WebApps 2011. This conference is primarily focused on building highly scalable Web applications. I’m looking forward to reviewing some great submissions and to the conference itself.

I highly recommend submitting your ideas.

CFP details below:

WebApps ’11 Call for Papers

2nd USENIX Conference on Web Application Development (WebApps ’11)

June 15–16, 2011
Portland, OR

WebApps ’11 will take place during USENIX Federated Conferences Week, June 12–17, 2011.

Important Dates

  • Submissions due: January 21, 2011, 11:59 p.m. PST (hard deadline, no extensions, no exceptions, so don’t ask)
  • Notification to authors: March 17, 2011
  • Final files due: May 3, 2011

Conference Organizers

Program Chair
Armando Fox, University of California, Berkeley

Program Committee
Adam Barth, Google Inc.
Abdur Chowdhury, Twitter
Jon Howell, Microsoft Research
Collin Jackson, Carnegie Mellon University
Bobby Johnson, Facebook
Emre Kıcıman, Microsoft Research
Michael E. (“Max”) Maximilien, IBM Research
Owen O’Malley, Yahoo! Research
John Ousterhout, Stanford University
Swami Sivasubramanian, Amazon Web Services
Geoffrey M. Voelker, University of California, San Diego
Nickolai Zeldovich, Massachusetts Institute of Technology

Overview

The Web is now the dominant platform for delivering interactive applications to hundreds of millions of users. Such applications are now expected to scale effortlessly from tens of users to tens of millions of users in a single day while providing a responsive “always-on” experience. These demands, as well as the new possibilities opened by the proliferation of Web-capable mobile devices, require that Web apps’ design and operation be elastic, failure-tolerant, and seamlessly scalable, supporting multiple devices and access methods.

Like the inaugural WebApps ’10, WebApps ’11 seeks to attract cutting-edge research that advances the state of the art, not only on novel Web applications but also on infrastructure, tools, and techniques that support the development, analysis/testing, operation, or deployment of those applications.

Topics

Possible topics include but are not limited to:

  • Storage for Web-scale applications
  • Techniques for testing and debugging
  • Novel strategies for fault tolerance or high availability in Web apps
  • The Web as an emerging platform in new application areas
  • HCI techniques related specifically to Web apps
  • Measurement, modeling, workload generation, and other tools to aid experimental research on Web apps
  • New and unusual app features or implementation techniques
  • Media delivery applications and infrastructure
  • Client-side libraries, toolkits, plug-ins
  • Server-side frameworks
  • Languages and language engineering advances relevant to Web app development
  • Deployment substrates and technologies (cloud computing, infrastructure as a service, testing as a service, etc.)

More details:

CONFERENCE HOME PAGE (full info): http://www.usenix.org/events/webapps11/

CFP online and in PDF format:

http://www.usenix.org/events/webapps11/cfp/
http://www.usenix.org/events/webapps11/cfp/webapps11cfp.pdf

NoSQL meetup

Posted in NoSQL on August 24, 2010 by swaminathans

This week, I’ll be giving a talk to the NoSQL Seattle community about “Building Scalable Distributed Systems”. I’m super excited to meet this really active community, which is passionate about building and running large-scale systems. I’m also curious to hear from Ben about Riak and about the Hue framework. More details about the meetup can be found here.

See you at the NoSQL meetup this Wednesday evening!

Just Launched: Amazon RDS Multi-AZ

Posted in Cloud Computing, Distributed Systems on May 18, 2010 by swaminathans

Today, my colleagues and I launched a major feature in Amazon RDS called Multi-AZ DB Instances. A Multi-AZ DB Instance performs synchronous replication of data across replicas located in different availability zones.

What benefits does Multi-AZ DB Instance provide?

– Higher Durability: Your data is synchronously replicated across different availability zones. This guarantees higher levels of durability even in the event of a disk, instance, or availability zone failure.

– Higher Availability: With Multi-AZ instances, we perform master-slave replication, and when we detect that the master is unavailable, we automatically fail over to the slave replica. This guarantees higher levels of availability. Unlike with MySQL’s asynchronous replication, the data is synchronously replicated, so when the master fails over to the secondary replica you will experience no data loss.

You can learn more about all the resilient goodness of RDS Multi-AZ deployments here.
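The reason synchronous replication avoids data loss on failover can be shown with a toy model. This is only a sketch under simplifying assumptions, not how RDS is implemented: the `Replica` and `SyncReplicatedDB` classes and the zone names `az-1` and `az-2` are hypothetical, and real systems also have to deal with timeouts, fencing, and re-synchronizing the failed replica.

```python
class Replica:
    """One copy of the database living in a single availability zone."""

    def __init__(self, name):
        self.name = name
        self.log = []        # records applied on this replica, in order
        self.healthy = True

    def apply(self, record):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        self.log.append(record)


class SyncReplicatedDB:
    """Toy primary/standby pair with synchronous replication."""

    def __init__(self):
        self.primary = Replica("az-1")
        self.standby = Replica("az-2")

    def write(self, record):
        # Acknowledge only after BOTH replicas have applied the record.
        self.standby.apply(record)
        self.primary.apply(record)
        return "ack"

    def failover(self):
        # Promote the standby; it already holds every acknowledged write.
        self.primary, self.standby = self.standby, self.primary


db = SyncReplicatedDB()
acked = []
for record in ["order-1", "order-2", "order-3"]:
    if db.write(record) == "ack":
        acked.append(record)

db.primary.healthy = False   # simulate losing the primary's zone
db.failover()
print(db.primary.name, db.primary.log == acked)
```

With asynchronous replication the write would be acknowledged before the standby had applied it, so a failover at the wrong moment could promote a replica that is missing the most recent acknowledged writes.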

Building Scalable Social Gaming Platform using Clouds

Posted in Cloud Computing, Distributed Systems on February 18, 2010 by swaminathans

This week, I ran into many social-gaming-related posts. The first thing that surprised me was that social games are not played predominantly by teens: the average social gamer is around 40. This helps explain why these social games attract tens of millions of players every day.

When I read the high scalability article about How Zynga scaled Farmville to Harvest 75 Million Players a month, I was intrigued by their scaling challenges and requirements:

* Read-write ratio: Interactive games are write heavy, unlike traditional web applications. This seems intuitive when you think about the fact that every move is recorded in a datastore.
* Users are disturbed by high latencies and variability in latencies: Note that I call out high latencies and latency variations as two different things. As we noted in Dynamo, Amazon also cares about the variability in latencies (percentiles), and we build our services to make sure we can consistently keep the variability under control.
* Dealing with latency and failure characteristics of external dependencies: These applications need to deal with external platforms like Facebook which may or may not be available all the time.
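To see why high latency and latency variability are treated as separate concerns, compare the mean with a high percentile on the same data. The numbers below are hypothetical and chosen only to make the effect obvious; `percentile` is a small nearest-rank helper written for this sketch, not a library function.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile; `pct` is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: 98 fast requests, 2 slow ones.
latencies_ms = [10] * 98 + [500] * 2

mean = statistics.mean(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"mean={mean} ms, p50={p50} ms, p99={p99} ms")
```

The mean and median both look healthy here, yet one request in fifty is fifty times slower than the rest. Only the high percentile exposes that, which is why services that care about variability are measured against percentiles rather than averages.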

I like the way Zynga approached this problem:

* To handle heavy writes, it looks like they have partitioned heavily. That seems reasonable; however, I’m curious to see how they handled “hot spots” (wherein the most active users are the ones constantly generating more data) and whether simple hash-based partitioning is good enough to spread the hot spots.

* To handle latency variations, they isolated each component and built graceful degradation into each layer. This is a common practice in building large-scale systems. What I would be curious to see is how “gracefully” their datastore degrades and which datastore they are using for that.

* Finally, to deal with failures of external dependencies and still meet latency SLAs, it looks like they cache the responses of external dependencies.
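The last point, caching responses from external dependencies so the application can degrade gracefully, can be sketched as follows. This is an assumption-laden illustration and not Zynga’s actual code: `CachedDependency`, `fetch_friends`, and the TTL value are hypothetical, and the clock is injected so the example does not need to sleep.

```python
import time


class CachedDependency:
    """Wraps an external call with a TTL cache and a stale-data fallback."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, time stored)

    def get(self, key, default=None):
        entry = self._cache.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # fresh hit: no external call at all
        try:
            value = self._fetch(key)
        except Exception:
            # Dependency is down or too slow: serve stale data or a default.
            return entry[0] if entry is not None else default
        self._cache[key] = (value, now)
        return value


state = {"up": True}
calls = []


def fetch_friends(user_id):
    """Hypothetical stand-in for a call to an external social platform."""
    calls.append(user_id)
    if not state["up"]:
        raise TimeoutError("external platform unavailable")
    return [f"friend-of-{user_id}"]


fake_now = [0.0]
friends = CachedDependency(fetch_friends, ttl_seconds=60.0, clock=lambda: fake_now[0])

first = friends.get("alice")              # miss: calls the external service
second = friends.get("alice")             # fresh hit: served from the cache
fake_now[0] = 120.0                       # the cached entry is now past its TTL
state["up"] = False                       # and the external service goes down
stale = friends.get("alice")              # call fails, stale value is returned
missing = friends.get("bob", default=[])  # nothing cached, default is returned
print(first, stale, missing, len(calls))
```

Serving stale data is a deliberate trade-off: the player may see a slightly out-of-date friends list, but the game keeps responding within its latency budget while the external platform is unavailable.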

These are very good lessons for building scalable systems.

As an interesting follow-up, I saw that RightScale (which is apparently helping Zynga run on top of AWS) is using its expertise to offer a social gaming platform for other aspiring Zyngas out there! This seems like an exciting internet industry in its really nascent stages.