Amazon’s First Annual Ph.D. Symposium: Program

As I mentioned earlier, Amazon is holding its first annual Ph.D. symposium, inviting Ph.D. students from around the world to present their research in the form of talks and posters on three major topics: Cloud Computing, Operations Research, and Data Mining.

We received a great list of submissions from Ph.D. candidates based at top schools in the USA, Canada, and Europe, which made our selection process hard and competitive. Our acceptance rate was in the single-digit percentages, which is extremely low for a research symposium held for the first time.

This enabled us to put together a great single-day program with three parallel tracks: Cloud Computing/AWS, Machine Learning/Computer Vision, and Operations Research. Credit goes to Jay Little from Amazon (and his team) for assembling such a great program for this symposium.

I wanted to highlight a few talks that were selected in these tracks.

Cloud Computing 

Justin Debrabant, Brown University

Beyond Main Memory: Anti-Caching in Main Memory Database Systems.

The traditional wisdom for building disk-based relational database management systems (DBMSs) is to organize data in heavily encoded blocks stored on disk, with a main-memory block cache. To improve performance despite high disk latency, these systems use a multi-threaded architecture with dynamic record-level locking that allows multiple transactions to access the database at the same time. Previous research has shown that this results in substantial overhead for on-line transaction processing (OLTP) applications.

Next-generation DBMSs seek to overcome these limitations with an architecture based on main-memory-resident data. To overcome the restriction that all data fit in main memory, we propose a new technique, called anti-caching, where cold data is moved to disk in a transactionally safe manner as the database grows in size. Because data initially resides in memory, an anti-caching architecture inverts the traditional storage hierarchy of disk-based systems: main memory is now the primary storage device.

We implemented a prototype of our anti-caching proposal in a high-performance, main-memory OLTP DBMS and performed a series of experiments across a range of database sizes, workload skews, and read/write mixes. We compared its performance with an open-source, disk-based DBMS optionally fronted by a distributed main-memory cache. Our results show that for highly skewed workloads the anti-caching architecture has a performance advantage over either of the other architectures tested of up to 9× for a data size 8× larger than memory.
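The storage inversion described above can be illustrated in a few lines. The following is a toy Python sketch, not the authors' actual prototype: it evicts least-recently-used tuples to a stand-in "disk" dictionary when memory is full, and "un-evicts" them on access (the real system evicts blocks of cold tuples in a transactionally safe way, not individual entries like this).

```python
from collections import OrderedDict

class AntiCacheTable:
    """Toy anti-caching sketch: memory is the primary store; cold
    tuples spill to 'disk' only when the table outgrows memory."""

    def __init__(self, mem_capacity):
        self.mem = OrderedDict()   # hot data, kept in LRU order
        self.disk = {}             # evicted cold data
        self.cap = mem_capacity

    def put(self, key, value):
        self.mem[key] = value
        self.mem.move_to_end(key)  # most recently used
        self._evict_cold()

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)          # mark as hot again
            return self.mem[key]
        value = self.disk.pop(key)             # un-evict: disk -> memory
        self.put(key, value)
        return value

    def _evict_cold(self):
        while len(self.mem) > self.cap:
            key, value = self.mem.popitem(last=False)  # coldest first
            self.disk[key] = value
```

Note the reversal relative to a traditional buffer pool: data starts in memory and only leaves it under pressure, rather than starting on disk and being cached in.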

In addition, this talk will cover ongoing work on adapting OLTP systems to use non-volatile RAM (NVRAM). With latency profiles in line with current DRAM speeds and persistent writes, NVRAM technologies have the potential to significantly impact current OLTP architectures. We will discuss several possible architectures for NVRAM-based systems, including an extension to the anti-caching architecture originally designed with disk as a backend. Finally, we will present early benchmark results obtained using an NVM emulator constructed by our collaborators at Intel.

Rahul Potharaju, Purdue University

Network failures are a significant contributor to system downtime and service unavailability. To track troubleshooting at various levels (e.g., network, server, software), operators typically deploy trouble ticket systems which log all the steps from opening a ticket (e.g., a customer complaint) until its resolution. Unfortunately, while tickets carry valuable information for network management, analyzing them to do problem inference is extremely difficult — fixed fields are often inaccurate or incomplete, and the free-form text is mostly written in natural language. To analyze this free-form text, we have built NetSieve [NSDI'13], a system that aims to automatically analyze ticket text written in natural language to infer the problem symptoms, troubleshooting activities, and resolution actions.

In this talk, I will present the design philosophy behind NetSieve and the lessons we learnt by applying our system on several large datasets. The goal of my talk is to motivate the research community to focus on these unexplored yet important datasets as they contain actionable data to understand and improve system design. For instance, the lessons learnt by applying these techniques to vulnerability databases can be applied towards improving the security of deployed systems, understanding the common pitfalls of writing secure programs etc. Similarly, lessons learnt from customer review repositories (e.g., Amazon) can be applied towards understanding problems faced by a majority of customers and improving future product designs.
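To give a flavor of the kind of output such inference produces, here is a deliberately tiny Python sketch that maps phrase patterns in ticket text to symptom/activity/resolution labels. The patterns below are invented for illustration; NetSieve's actual pipeline builds its domain knowledge automatically from ticket corpora rather than hard-coding rules like these.

```python
import re

# Invented phrase patterns for illustration only -- not NetSieve's ontology.
PATTERNS = {
    "symptom":    [r"packet loss", r"link (?:down|flap)", r"high latency"],
    "activity":   [r"rebooted?", r"replaced? (?:the )?line card"],
    "resolution": [r"firmware upgrade", r"cable swap"],
}

def infer(ticket_text):
    """Return phrases in free-form ticket text that look like problem
    symptoms, troubleshooting activities, and resolution actions."""
    text = ticket_text.lower()
    found = {}
    for label, patterns in PATTERNS.items():
        hits = []
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                hits.append(match.group(0))
        found[label] = hits
    return found
```

Even this crude version hints at why free-form tickets are valuable: the fixed fields of a ticket rarely say that a line card was replaced, but the narrative text often does.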

Shrutarshi Basu, Cornell University

Merlin: A Unified Framework for Network Management

Merlin is a new high-level network management framework. With Merlin, administrators express network policy using programs in a declarative language based on logical predicates and regular expressions. The Merlin compiler automatically partitions these programs into components that can be placed on a variety of devices including switches, middleboxes, and end hosts. It uses a constraint solver based on parameterizable heuristics to determine allocations of network-wide resources such as paths and bandwidth. To facilitate management of federated networks, Merlin provides mechanisms for delegating management of sub-policies to tenants, along with tools for verifying that delegated sub-policies do not violate global constraints. Overall, Merlin simplifies the task of network administration by providing high-level abstractions for specifying network policy and scalable infrastructure for enforcing them.
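The "predicates plus regular expressions" idea can be made concrete with a small model. This Python sketch is not Merlin's actual surface language (the device names, policy syntax, and bandwidth handling here are all invented for illustration): a predicate selects traffic, and a regular expression over device names constrains the path that traffic may take.

```python
import re

class Policy:
    """Toy Merlin-style policy: a predicate selects packets, a regex
    over device names constrains their path, and a minimum bandwidth
    guarantee is attached. All names/syntax here are illustrative."""

    def __init__(self, predicate, path_regex, min_bw_mbps=0):
        self.predicate = predicate
        self.path_re = re.compile(path_regex)
        self.min_bw = min_bw_mbps

    def allows(self, packet, path, bw_mbps):
        return (self.predicate(packet)
                and self.path_re.fullmatch(" ".join(path)) is not None
                and bw_mbps >= self.min_bw)

# "HTTP traffic must traverse a firewall between ingress and egress,
# with at least 100 Mbps" -- expressed as predicate + path regex.
http_via_fw = Policy(lambda p: p["dst_port"] == 80,
                     r"ingress( \w+)* fw( \w+)* egress",
                     min_bw_mbps=100)
```

The compilation and delegation machinery is where the real work lies: Merlin partitions such policies across devices and lets tenants refine delegated sub-policies without violating the global ones.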

Machine Learning/Computer Vision/Natural Language Processing

Bin Zhao, CMU

With the widespread availability of low-cost devices capable of photo shooting and high-volume video recording, we are facing an explosion of both image and video data. The sheer volume of such visual data poses both challenges and opportunities for machine learning and computer vision research.

In image classification, most previous research has focused on small- to medium-scale data sets containing objects from dozens of categories. Today, however, we can easily access images spanning thousands of categories. Unfortunately, despite the well-known advantages and recent advances of multi-class classification techniques in machine learning, complexity concerns have driven most research on such super-large-scale data sets back to simple methods such as nearest-neighbor search and one-vs-one or one-vs-rest approaches. Facing an image classification problem with such a huge task space, it is no surprise that these classical algorithms, often favored for their simplicity, are brought to their knees — not only because of the training time and storage cost they incur, but also because of their conceptual awkwardness in massive multi-class settings. Our goal is therefore to directly address the bigness of image data: not only the large number of training images and high-dimensional image features, but also the large task space. Specifically, we present algorithms capable of efficiently and effectively training classifiers that can differentiate tens of thousands of image classes.
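A back-of-the-envelope calculation shows why the classical schemes break down at this scale. The comparison below (a generic illustration, not the speaker's algorithm) counts classifier evaluations per test image under three standard reductions; a balanced label tree is one well-known way to get the logarithmic column.

```python
import math

def score_evaluations(num_classes, scheme):
    """Classifier evaluations needed per test image under common
    multi-class reduction schemes (illustrative accounting only)."""
    if scheme == "one-vs-rest":
        return num_classes                        # one scorer per class
    if scheme == "one-vs-one":
        return num_classes * (num_classes - 1) // 2  # every class pair
    if scheme == "label-tree":
        return math.ceil(math.log2(num_classes))  # one split per level
    raise ValueError(f"unknown scheme: {scheme}")
```

At 20,000 classes, one-vs-rest needs 20,000 evaluations and one-vs-one nearly 200 million, while a balanced label tree needs about 15 — which is why sub-linear structures over the task space matter as much as handling large training sets.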

Similar to images, one of the major difficulties in video analysis is the huge amount of data: videos can be hours long or even endless. However, often only a small portion of a video contains important information. Consequently, algorithms that automatically detect unusual events within streaming or archival video would significantly improve the efficiency of video analysis and reserve valuable human attention for only the most salient content. Moreover, given lengthy recorded videos, such as those captured by cameras on mobile phones or by surveillance cameras, most users do not have the time or energy to edit the footage so that only the most salient and interesting parts are kept. To this end, we also develop an algorithm for automatic video summarization.

The goal of our research can therefore be summarized as follows: we aim to design machine learning algorithms that automatically analyze and understand large-scale image and video data. Specifically, we design algorithms to address the bigness of image categorization — not only in the form of a large number of data points and/or high-dimensional features, but also a large task space — and aim to scale them to image collections with tens of thousands of classes. We also propose an algorithm to address the bigness of video streams, hours long or even endless, and automatically distill such videos to identify interesting events and summarize their contents.

Ying Liu, MIT

Gaussian Graphical Models (GGMs), or Gauss-Markov random fields, are widely used in many applications, and the trade-off between modeling capacity and the efficiency of learning and inference has been an important research problem. In this work, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference, such as computing the marginal distributions and the partition function, has complexity $O(k^{2}n)$ using message-passing algorithms, where $k$ is the size of the FVS and $n$ is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of high-degree nodes, or hubs, while the rest of the network is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate in $O(kn^2+n^2\log n)$ if the FVS is known, or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we obtain a learning algorithm using alternating low-rank correction with complexity $O(kn^{2}+n^{2}\log n)$ per iteration. We also perform experiments on both synthetic data and real flight-delay data to demonstrate the modeling capacity of FVSs of various sizes.
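The reason the FVS size $k$ drives the complexity is that conditioning on the $k$ FVS nodes leaves a cycle-free graph, on which message passing is exact and linear in $n$. A prerequisite for that decomposition is verifying that a candidate node set really breaks all cycles, which a union-find pass over the surviving edges can check; the sketch below is a generic graph utility, not the authors' code.

```python
def is_feedback_vertex_set(edges, fvs):
    """True iff removing the nodes in `fvs` leaves an acyclic graph
    (a forest), so exact tree inference can run on the remainder.
    Union-find: an edge whose endpoints are already in the same
    component closes a cycle among the non-FVS nodes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if u in fvs or v in fvs:
            continue                        # edge vanishes with the FVS
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False                    # cycle survives removal
        parent[root_u] = root_v
    return True
```

For a hub-and-tree network of the kind described above, the hubs form a small FVS, and this check runs in near-linear time in the number of edges.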

Hossein Azari Soufiani, Harvard

Rank data can be modeled using random utility theory by assigning each alternative a real-valued score drawn from a parameterized distribution and then ranking the alternatives according to their scores. Our studies have enabled the estimation of parameters for random utility models (RUMs) with exponential families as noise. We also extend RUMs by defining an RUM mixture model with multiple preference types. We identify conditions that enable fast inference through MC-EM, providing identifiability of the models and approximate log-concavity of the likelihood function.
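The generative story is simple enough to simulate directly. The sketch below draws one ranking from an RUM with Gumbel noise — a choice under which the induced distribution over rankings is the well-known Plackett-Luce model; other exponential-family noise distributions give other RUMs. The utilities and seed are illustrative.

```python
import math
import random

def gumbel(rng):
    """Standard Gumbel sample via inverse transform: -log(-log U)."""
    return -math.log(-math.log(rng.random()))

def sample_ranking(utilities, rng):
    """Draw one ranking from a random utility model: score each
    alternative as deterministic utility plus Gumbel noise, then
    rank alternatives by score, best first."""
    scores = {alt: u + gumbel(rng) for alt, u in utilities.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Running this many times and counting how often each alternative lands on top is a quick sanity check on any fitted RUM: with Gumbel noise, the top-choice frequencies should approach the softmax of the utilities.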

Operations Research

Aly Megahed, Georgia Tech

Topics in Supply Chain Planning in Deterministic and Stochastic Environments.

Supply chain planning has become vital to the business practices of most manufacturing and service organizations in today's global marketplace. In this talk, we present three unified supply chain planning problems.

For the first problem, motivated by the planning challenges faced by one of the world's largest manufacturers of wind turbines, we present a comprehensive tactical supply chain planning model for wind turbine manufacturing. The model is multi-period, multi-echelon, and multi-commodity. It explicitly incorporates delay-dependent backorder penalties with any general cost structure for the first time. We present numerical results that show the significant impact of this capability. We also show the effect of backorder penalties on different supply chain costs.

We extend the first problem to include uncertainty, and develop a two-stage stochastic programming model for this second problem. The considered supply uncertainty combines random yield and stochastic lead times, and is thus the most general form of such uncertainty to date. Theoretical and computational results regarding the impact of supplier uncertainty/unreliability on procurement decisions are derived.

In our third problem, we extend the first problem by including procurement with time-aggregated quantity discounts, where the discounts are given on aggregated order quantities while the orders are placed on a periodic basis (e.g., total annual orders that were placed on a monthly basis). We develop a Mixed Integer Programming (MIP) model for the problem. The model is computationally very challenging to solve: leading commercial solvers such as CPLEX take hours to generate good feasible solutions for many tested instances. We develop a pre-solve algorithm that reduces the number of binary variables and an MIP-based local search heuristic that quickly generates good feasible solutions. Preliminary results show that our solution methodology is promising.
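The "time-aggregated" part of the discount structure is what couples the periods together and makes the MIP hard. The cost calculation itself is easy to state, as in this Python sketch (the breakpoints and rates are made-up numbers, and the real model has many more cost terms): the discount tier is chosen by the total annual quantity, not by any single monthly order.

```python
def annual_procurement_cost(monthly_orders, unit_price, tiers):
    """Cost under a time-aggregated quantity discount: orders are
    placed monthly, but the discount rate is set by the TOTAL annual
    quantity. `tiers` is a list of (min_total_qty, discount_rate)
    pairs in ascending order of breakpoint; the highest tier whose
    breakpoint the total reaches applies."""
    total = sum(monthly_orders)
    rate = 0.0
    for breakpoint, tier_rate in tiers:
        if total >= breakpoint:
            rate = tier_rate
    return total * unit_price * (1.0 - rate)
```

Because the applicable rate depends on a sum across all twelve periods, each monthly ordering decision influences every other month's effective price — the coupling that binary tier-selection variables must capture in the MIP.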

Rattachut Tangsucheeva, Penn State

Why Do Firms Go Bankrupt? A Dynamics Perspective

Managing modern supply chains involves dealing with complex dynamics of materials, information, and cash flows around the globe, and is a key determinant of business success today. One of the well-recognized challenges in supply chain management is the inventory bullwhip effect, in which demand forecast errors are amplified as they propagate upstream from the retailer, in part because of lags and errors in information flows. Adverse effects of the bullwhip effect include excessive inventory and wasteful swings in manufacturing production. In this research we postulate that there can be a corresponding bullwhip in cash flow across the supply chain, and that it can be one of many reasons why firms run out of cash and go bankrupt. Furthermore, we explore ways to predict its impact by using a cash flow forecasting model, and identify a potential strategy for controlling working capital by using portfolio optimization theory.
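The amplification mechanism is easy to reproduce in a few lines. The simulation below uses a deliberately stylized ordering rule (trend extrapolation; real bullwhip models typically use order-up-to policies with forecast updating — the rule, parameters, and demand process here are all illustrative assumptions): each tier observes the orders of the tier below and extrapolates, and order variance grows at every step upstream.

```python
import random
import statistics

def orders_from_demand(demand, alpha=0.5):
    """One tier's ordering rule: order current demand plus an
    extrapolation of the recent trend (stylized, for illustration).
    Since order_t = (1+alpha)*d_t - alpha*d_{t-1}, variance is
    amplified at every tier."""
    orders = [demand[0]]
    for prev, cur in zip(demand, demand[1:]):
        orders.append(max(0.0, cur + alpha * (cur - prev)))
    return orders

rng = random.Random(7)
customer   = [100 + rng.gauss(0, 5) for _ in range(200)]  # end demand
retailer   = orders_from_demand(customer)
wholesaler = orders_from_demand(retailer)
factory    = orders_from_demand(wholesaler)

variances = [statistics.pvariance(series)
             for series in (customer, retailer, wholesaler, factory)]
```

Since each order eventually becomes a payment, the same amplification shows up in cash outflows: a firm several tiers upstream can face cash swings far larger than the underlying consumer demand ever exhibits, which is the cash flow bullwhip this work studies.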

Sarah Ahmadian, Waterloo

Mobile Facility Location

I will talk about the Mobile Facility Location problem. The problem is motivated by a model in which a manufacturer wants to set up distribution centers so that it can send products to these centers (in one shipment) and the products needed at each retail center can be shipped from the closest distribution center.
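The closest-center assignment in that model is the easy part, shown in the toy sketch below; the hard part of mobile facility location is deciding where the facilities should move, given movement costs, which this sketch omits entirely. Coordinates and the Euclidean metric are illustrative assumptions.

```python
def assign_to_nearest(retail_centers, distribution_centers):
    """Map each retail center to its closest distribution center
    (Euclidean distance). This is only the assignment subproblem;
    choosing the distribution-center locations under movement costs
    is the actual optimization problem."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    return {r: min(distribution_centers, key=lambda d: dist(r, d))
            for r in retail_centers}
```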
