Middleware 2016

AdaptCache: Adaptive Data Partitioning and Migration for Distributed Object Caches

Omar Asad (McGill University), Bettina Kemme (McGill University)

Caching is an important aspect of today's web enterprise systems, enhancing performance and reducing access frequency to the backend storage server. However, when implemented as an independent component, such as when using Memcached, access to the remote cache can lead to considerable overhead and delay. Instead, in this paper we propose a cooperative and integrated cache framework where each application server has its own local cache but can also access the caches of other servers. Our solution dynamically distributes objects across caches and requests across servers such that the load is equally distributed across servers and servers find the objects they need for request execution in their local cache most of the time. The challenge is to handle the complex workloads of current e-commerce sites where requests can access more than one object and change the objects they access over time, and different request types access overlapping sets of objects. As such, we look at a whole range of distribution mechanisms and analyze their behavior in detail. We evaluated the various alternatives using the YCSB and RUBiS benchmarks, showing that our approach is applicable in different dynamic workload scenarios and is able to autonomously capture workload changes and adapt instantly. Further, the optimization techniques could further lower the data distribution processing times as well as reducing the number of migrated objects.

Atum: Scalable Group Communication Using Volatile Groups

Rachid Guerraoui (École Polytechnique Fédérale de Lausanne), Anne-Marie Kermarrec (Inria Rennes), Matej Pavlovic (École Polytechnique Fédérale de Lausanne), Dragos-Adrian Seredinschi (École Polytechnique Fédérale de Lausanne)

This paper presents Atum, a group communication middleware for a large, dynamic, and hostile environment. At the heart of Atum lies the novel concept of volatile groups: small, dynamic groups of nodes, each executing a replicated state machine protocol, and organized in a flexible overlay. Using volatile groups, Atum scatters faulty nodes evenly among groups, and then masks each individual fault inside its group. To broadcast messages among volatile groups, Atum runs a gossip protocol across the overlay. We report on our synchronous and asynchronous (eventually synchronous) implementations of Atum, as well as on three representative applications that we build on top of it: A publish/subscribe platform, a file sharing service, and a data streaming system. We show that (a) Atum can grow at an exponential rate beyond 1000 nodes and disseminate messages in polylogarithmic time (conveying good scalability); (b) it smoothly copes with 18% of nodes churning every minute; and (c) it is impervious to arbitrary faults, suffering no performance decay despite 5.8% Byzantine nodes in a system of $850$ nodes.

Bifrost - Supporting Continuous Deployment with Automated Enactment of Multi-Phase Live Testing Strategies

Gerald Schermann (University of Zurich), Dominik Schöni (University of Zurich), Philipp Leitner (University of Zurich), Harald C. Gall (University of Zurich)

Live testing is used in the context of continuous delivery and deployment to test changes or new features in the production environment. This includes canary releases, dark launches, A/B tests, and gradual rollouts. Oftentimes, multiple of these live testing practices need to be combined (e.g., running an A/B test after a dark launch). Manually administering such multi-phase live testing strategies is a daunting task for developers or release engineers. In this paper, we introduce a formal model for multi-phase live testing, and present Bifrost as a prototypical Node.js based middleware that allows developers to define and automatically enact complex live testing strategies. We extensively evaluate the runtime behavior of Bifrost in three rollout scenarios of a microservice-based case study application, and conclude that the performance overhead of our prototype is at or below 8 ms for most scenarios. Further, we show that more than 100 parallel strategies can be enacted even on cheap public cloud instances.

Big ideas paper: Policy-driven middleware for a legally-compliant Internet of Things

Jatinder Singh (University of Cambridge), Thomas Pasquier (University of Cambridge), Jean Bacon (University of Cambridge), David Eyers (University of Otago), Raluca Diaconu (University of Cambridge)

Internet of Things (IoT) applications, systems and services are subject to law. We argue that for the IoT to develop, there must be technical mechanisms that allow the enforcement of policy, to ensure that such systems align with legal realities. The audit of policy enforcement must enable the apportionment of liability, demonstrate compliance with regulation, and indicate whether policy correctly captures legal responsibilities. As both systems and obligations evolve dynamically, this cycle must be continuously maintained.

This poses a huge challenge given the global scale of the IoT vision. The IoT entails dynamically creating new services through managed and flexible data exchange. Data management is complex in this dynamic environment, given the need to both control and share information, often across federated domains of administration.

We see middleware playing a key role in managing the IoT. Our vision is for a middleware-enforced, unified policy model that applies end-to-end, throughout the IoT. This is because policy cannot be bound to things, applications, or administrative domains, since functionality is the result of composition, with chains of data flows dynamically formed.

We have investigated the use of Information Flow Control (IFC) to manage and audit data flows in cloud computing; a domain where trust can be well-founded, regulations are more mature and associated responsibilities are clearer. We feel that IFC has great potential in the broader IoT context. However, the scale, dynamic and federated nature of the IoT poses a number of significant research challenges, which this paper explores.

BrowserFlow: Imprecise Data Flow Tracking to Prevent Accidental Data Disclosure

Ioannis Papagiannis (Facebook), Pijika Watcharapichat (Imperial College London), Divya Muthukumaran (Imperial College London), Peter Pietzuch (Imperial College London)

With the use of external cloud services such as Google Docs or Evernote in an enterprise setting, the loss of control over sensitive data becomes a major concern for organisations. It is typical for regular users to violate data disclosure policies accidentally, e.g. when sharing text between documents in browser tabs. Our goal is to help such users comply with data disclosure policies: we want to alert them about potentially unauthorised data disclosure from trusted to untrusted cloud services. This is particularly challenging when users can modify data in arbitrary ways, they employ multiple cloud services, and cloud services cannot be changed.

To track the propagation of text data robustly across cloud services, we introduce imprecise data flow tracking, which identifies data flows implicitly by detecting and quantifying the similarity between text fragments. To reason about violations of data disclosure policies, we describe a new text disclosure model that, based on similarity, associates text fragments in web browsers with security tags and identifies unauthorised data flows to untrusted services. We demonstrate the applicability of imprecise data tracking through BrowserFlow, a browser-based middleware that alerts users when they expose potentially sensitive text to an untrusted cloud service. Our experiments show that BrowserFlow can robustly track data flows and manage security tags for many documents with no noticeable performance impact.

Containers and Virtual Machines at Scale: A Comparative Study

Prateek Sharma (University of Massachusetts Amherst), Lucas Chaufournier (University of Massachusetts Amherst), Prashant Shenoy (University of Massachusetts Amherst), Y.C. Tay (National University of Singapore)

Virtualization is used in data center and cloud environments to decouple applications from the hardware they run on. Hardware virtualization and operating system level virtualization are two prominent technologies that enable this. Containers, which use OS virtualization, have recently surged in interest and deployment. In this paper, we study the differences between the two virtualization technologies. We compare containers and virtual machines in large data center environments along the dimensions of performance, manageability and software development.

We evaluate the performance differences caused by the different virtualization technologies in data center environments where multiple applications are running on the same servers (multi-tenancy). Our results show that co-located applications can cause performance interference, and the degree of interference is higher in the case of containers for certain types of workloads. We also evaluate differences in the management frameworks which control deployment and orchestration of containers and VMs. We show how the different capabilities exposed by the two virtualization technologies can affect the management and development of applications. Lastly, we evaluate novel approaches which combine hardware and OS virtualization.

Dos and Don'ts in Mobile Phone Sensing Middleware: Learning from a Large-Scale Experiment

Valerie Issarny (Inria), Vivien Mallet (Inria), Kinh Nguyen (Inria), Pierre-Guillaume Raverdy (Ambientic), Fadwa Rebhi (Inria), Raphael Ventura (Inria)

Mobile phone sensing contributes to changing the way we approach science: massive amount of data is being contributed across places and time, and paves the way for advanced analyses of numerous phenomena at an unprecedented scale. Still, despite the extensive research work on enabling resource-efficient mobile phone sensing with a very-large crowd, key challenges remain. One challenge is facing the introduction of a new heterogeneity dimension in the traditional middleware research landscape. The middleware must deal with the heterogeneity of the contributing crowd in addition to the system's technical heterogeneities. In order to tackle these two heterogeneity dimensions together, we have been conducting a large-scale empirical study in cooperation with the city of Paris. Our experiment revolves around the public release of a mobile app for urban pollution monitoring that builds upon a dedicated mobile crowd-sensing middleware. In this paper, we report on the empirical analysis of the resulting mobile phone sensing efficiency from both technical and social perspectives, in face of a large and highly heterogeneous population of participants. We concentrate on the data originating from the 20 most popular phone models of our user base, which represent contributions from over 2,000~users with 23~million observations collected over 10 months. Following our analysis, we introduce a few recommendations to overcome -technical and crowd- heterogeneities in the implementation of mobile phone sensing applications and supporting middleware.

Dynamic Load Balancing for Ordered Data-Parallel Regions in Distributed Streaming Systems

Scott Schneider (IBM Research), Joel Wolf (IBM Research), Kirsten Hildrum (IBM Research), Rohit Khandekar (IBM Research), Kun-Lung Wu (IBM Research)

Distributed stream computing has emerged as a technology that can satisfy the low latency, high throughput demands of big data. Stream computing naturally exposes pipeline, task and data parallelism. Meeting the throughput and latency demands of online big data requires exploiting such parallelism across heterogeneous clusters. When a single job is running on a homogeneous cluster, load balancing is important. When multiple jobs are running across a heterogeneous cluster, load balancing becomes critical. The data parallel regions of distributed streaming applications are particularly sensitive to load imbalance, as their overall speed is gated by the slowest performer. We propose a dynamic load balancing technique based on a system artifact: the TCP blocking rate per connection. We build a function for each connection based on this blocking rate, and obtain a balanced load distribution by modeling the problem as a minimax separable resource allocation problem. In other words, we minimize the maximum value of these functions. Our model achieves local load balancing that does not require any global information. We test our model in a real streaming system, and demonstrate that it is able to detect differences in node capacities, determine the correct load distribution for those capacities and dynamically adapt to changes in the system.

GC-assisted JVM Live Migration fo Java Server Application

Rodrigo Bruno (INESC-ID / Instituto Superior Técnico / University of Lisbon), Paulo Ferreira (INESC-ID / Instituto Superior Técnico / University of Lisbon),

Live migration of Java Virtual Machines (JVMs) consumes significant amounts of time and resources, imposing relevant application performance overhead. This problem is specially hard when memory modified by applications changes faster than it can be transferred through the network (to a remote host). Current solutions to this problem resort to several techniques which depend on high-speed networks and application throttling, require lots of CPU time to compress memory, or need explicit assistance from the application. We propose a novel approach, Garbage Collector (GC) assisted JVM Live Migration for Java Server Applications (ALMA). ALMA makes a snapshot to be migrated containing a minimal amount of application state, by taking into account the amount of reachable memory (i.e. live data) detected by the GC. The main novelty of ALMA is the following: ALMA analyzes the JVM heap looking for regions in which a collection phase is advantageous w.r.t. the network bandwidth available (i.e. it pays to collect because a significant amount of memory will not be part of the snapshot). ALMA is implemented on OpenJDK 8 and extends CRIU (a Linux disk-based process checkpoint/restore tool) to support process live migration over the network. We evaluate ALMA not only by using well-known JVM performance benchmarks (SPECjvm2008 and DaCapo), but also by comparing it to other previous approaches. ALMA shows significant performance improvements compared to all previous solutions.

Geo-Distribution of Flexible Business Processes over Publish/Subscribe Paradigm

Martin Jergler (Technical University of Munich (TUM)), Mohammad Sadoghi (IBM T. J. Watson Research Center), Hans-Arno Jacobsen (Technical University of Munich (TUM))

An increasing amount of business processes are inherently knowledge-intense and require ad-hoc decision making. Flexible modeling approaches such as Case Management Model and Notation (CMMN) were designed to support such scenarios. At the same time, many processes involve participants and data from different organizations across the globe. Often, legal regulations such as data privacy render centralized automation systems impractical because data must be processed where it is collected. Instead, distributed approaches to coordinate process and data are necessary for supporting geo-scale execution. In this paper, we present a fully geo-distributed workflow engine that implements the core execution semantics of CMMN, the Guard-Stage-Milestone (GSM) meta-model, and supports locality of process data by distributing data and control-flow management over a loosely-coupled publish/subscribe infrastructure. We present a novel context-aware mapping (CAM) of GSM into Workflow Units (WFUs), representing the unit of distribution in our system. We have developed our distributed workflow execution engine over PADRES, an enterprise-grade event management system. Evaluation results show that our approach scales well with process size and degree of distribution and that CAM improves throughput and latency by up to 4X compared to the baseline.

Locality-Aware Routing in Stateful Streaming Applications

Matthieu Caneill (Université Grenoble Alpes), Ahmed El Rheddane (Université Grenoble Alpes), Vincent Leroy (Université Grenoble Alpes), Noel De Palma (Université Grenoble Alpes)

Distributed stream processing engines continuously execute series of operators on data streams. Horizontal scaling is achieved by deploying multiple instances of each operator in order to process data tuples in parallel. As the application is distributed on an increasingly high number of servers, the likelihood that the stream is sent to a different server for each operator increases. This is particularly important in the case of stateful applications that rely on keys to deterministically route messages to a specific instance of an operator. Since network is a bottleneck for many stream applications, this behavior significantly degrades their performance.

Our objective is to improve stream locality for stateful stream processing applications. We propose to analyse traces of the application to uncover correlations between the keys used in successive routing operations. By assigning correlated keys to instances hosted on the same server, we significantly reduce network consumption and increase performance while preserving load balance. Furthermore, this approach is executed online, so that the assignment can automatically adapt to changes in the characteristics of the data. Data migration is handled seamlessly with each routing configuration update.

We implemented and evaluated our protocol using Apache Storm, with a real workload consisting of geo-tagged Flickr pictures as well as Twitter publications. Our results show a significant improvement in throughput.

Locking Made Easy

Jelena Antic (EPFL), Georgios Chatzopoulos (EPFL), Rachid Guerraoui (EPFL), Vasileios Trigonakis (EPFL)

A priori, locking seems easy: To protect shared data from concurrent accesses, it is sufficient to lock before accessing the data and unlock after. Nevertheless, making locking efficient requires fine-tuning (a) the granularity of locks and (b) the locking strategy for each lock and possibly each workload. As a result, locking can become very complicated to design and debug.

We present GLS, a middleware that makes lock-based programming simple and effective. GLS offers the classic lock-unlock interface of locks. However, in contrast to classic lock libraries, GLS does not require any effort from the programmer for allocating and initializing locks, nor for selecting the appropriate locking strategy. With GLS, all these intricacies of locking are hidden from the programmer. GLS is based on GLK, a generic lock algorithm that dynamically adapts to the contention level on the lock object. GLK is able to deliver the best performance among simple spinlocks, scalable queue-based locks, and blocking locks. Furthermore, GLS offers several debugging options for easily detecting various lock-related issues, such as deadlocks.

We evaluate GLS and GLK on two modern hardware platforms, using several software systems (i.e., HamsterDB, Kyoto Cabinet, Memcached, MySQL, SQLite) and show how GLK improves their performance by 23% on average, compared to their default locking strategies. We illustrate the simplicity of using GLS and its debugging facilities by rewriting the synchronization code for Memcached and detecting two potential correctness issues.

Mitigating performance unpredictability in the IaaS using the Kyoto principle

Alain Tchana (IRIT, Toulouse), Bao Bui (IRIT, Toulouse), Boris Teabe (IRIT, Toulouse), Daniel Hagimont (IRIT, Toulouse)

Performance isolation is enforced in the cloud by setting to each virtual machine (VM) a given fraction of each resource type (physical memory, processor, and IO bandwidth). However, microarchitectural-level resources such as processor’s caches cannot be divided and allocated to VMs: they are globally shared among all VMs which compete for their use, leading to cache contention. There fore, performance isolation and predictability are compro mised. This situation is devastating for HPC applications.

In this paper, we propose a software solution (called Kyoto) to this issue, inspired by the polluters pay principle. A VM is said to pollute the cache if it provokes significant cache replacements which impact the performance of other VMs. Henceforth, using the Kyoto system, the provider can encourage HPC cloud users to book pollution permits for their VMs. We have implemented Kyoto in several virtualization systems including both general purpose systems (Xen and KVM) and specialized HPC systems (Pisces).

Netalytics: Cloud-Scale Application Performance Monitoring with SDN and NFV

Guyue Liu (George Washington University), Michael Trotter (George Washington University), Yuxin Ren (George Washington University), Timothy Wood (George Washington University)

Application performance monitoring in large data centers relies on either deploying expensive and specialized hardware at fixed locations or heavily customizing applications and collecting logs spread across thousands of servers. Such an endeavor makes performance diagnosis a time-consuming task for cloud providers and a problem beyond the control of cloud customers. We address this problem using emerging software defined paradigms such as Software Defined Networking and Network Function Virtualization as well as big data technologies. In this paper, we propose NetAlytics: a non-intrusive distributed performance monitoring system for cloud data centers. NetAlytics deploys customized monitors in the middle of the network which are transparent to end hosts applications, and leverages a real-time big data framework to analyze application behavior in a timely manner. NetAlytics can scale to packet rates of 40Gbps using only four monitoring cores and fifteen processing cores. Its placement algorithm can be tuned to minimize network bandwidth cost or server resources, and can reduce monitoring traffic overheads by a factor of 4.5. We present experiments that demonstrates how NetAlytics can be used to troubleshoot performance problems in load balancers, present comprehensive performance analysis, and provide metrics that drive automation tools, all while providing both low overhead monitors and scalable analytics.

Online Scheduling for Shuffle Grouping in Distributed Stream Processing Systems

Nicolo Rivetti (LINA / Université de Nantes, France - DIAG / Sapienza University of Rome, Italy), Emmanuelle Anceaume (IRISA / CNRS, Rennes, France), Yann Busnel (Crest (Ensai) / Inria, Rennes, France), Leonardo Querzoni (DIAG / Sapienza University of Rome, Italy), Bruno Sericola (Inria, Rennes, France)

Shuffle grouping is a technique used by stream processing frameworks to share input load among parallel instances of stateless operators. With shuffle grouping each tuple of a stream can be assigned to any available operator instance, independently from any previous assignment. A common approach to implement shuffle grouping is to adopt a round robin policy, a simple solution that fares well as long as the tuple execution time is constant. However, such assumption rarely holds in real cases where execution time strongly depends on tuple content. As a consequence, parallel stateless operators within stream processing applications may experience unpredictable unbalance that, in the end, causes undesirable increase in tuple completion times.

In this paper we propose Proactive Online Shuffle Grouping(POSG), a novel approach to shuffle grouping aimed at reducing the overall tuple completion time. POSG estimates the execution time of each tuple, enabling a proactive and online scheduling of input load to the target operator instances. Sketches are used to efficiently store the otherwise large amount of information required to schedule incoming load. We provide a probabilistic analysis and illustrate, through both simulations and a running prototype, its impact on stream processing applications.

ORION: A Framework for GPU Occupancy Tuning

Ari B. Hayes (Rutgers University), Lingda Li (Rutgers University), Daniel Chavarria (Pacific Northwest National Lab), Shuaiwen Song (Pacific Northwest National Lab), Eddy Z. Zhang (Rutgers University)

An important feature of modern GPU architectures is variable occupancy. Occupancy measures the ratio between the number of active threads on a GPU and the maximum number of threads the GPU hardware can schedule. High occupancy allows a large number of threads to run simultaneously and hide memory latency, but may increase resource contention. Low occupancy has less resource contention, but is also less capable of latency hiding. Occupancy tuning is an important and challenge problem. A program running at different occupancy levels can have three to four times difference in running time. There has been limited exploration in GPU program occupancy tuning. We introduce Orion, the first GPU occupancy tuning framework. The Orion framework automatically chooses and generates the best-occupancy code for any given GPU program. It is capable of finding the (near-)optimal occupancy level with static and dynamic tuning techniques. We demonstrate the efficiency of Orion with twelve representative benchmarks from the Rodinia benchmark suite and CUDA SDK evaluated on two different GPU architectures, obtaining up to 1.57 times speedup, 68.5% memory resource saving, and 28.5% energy saving compared to the baseline of highly optimized code compiled by nvcc.

Programming Scalable Cloud Services with AEON

BO SANG (Purdue University), Masoud Saeida Ardekani (Purdue University), Gustavo Petri (University Paris Diderot), Patrick Eugster (Purdue University), Srivatsan Ravi (Purdue University)

Designing distributed Internet-facing applications that are adaptable to unpredictable workloads and efficiently utilize modern cloud computing platforms is hard. The actor model is a popular paradigm that can be used to develop distributed applications: actors encapsulate state and communicate with each other by sending events. Consistency is guaranteed if each event only accesses a single actor, thus eliminating potential data races and deadlocks. However it is nontrivial to provide consistency for concurrent events spanning across multiple actors.

This paper addresses this problem by introducing AEON: a protocol for strongly consistent and truly scalable cloud applications across distributed actors. Concretely AEON provides the following properties: (i) Programmability: programmers need only reason about sequential semantics when reasoning about concurrency resulting from multi-actor events; (ii) Scalability: its runtime protocol guarantees serializable and starvation-free execution of multi-actor events, while maximizing parallel execution; (iii) Elasticity: supports fine-grained elasticity enabling the programmer to transparently migrate individual actors without violating atomicity or entailing significant performance overheads.

We have implemented a highly available and fault-tolerant prototype of AEON in C++. Extensive experiments show several complex cloud applications build atop AEON significantly outperform others built using existing state-of-the-art distributed cloud programming protocols. According to the experiments, AEON is about 3x faster than similar programming models (EventWave and Orleans). And the elasticity of AEON guarantees service quality with minimal cost compared to any static setup.

SDNFV: Flexible and Dynamic Software Defined Control of an Application- and Flow-Aware Data Plane

Wei Zhang (The George Washington University), Guyue Liu (The George Washington University), Ali Mohammadkhan (University of California, Riverside), Jinho Hwang (IBM T. J. Watson Research Center), K. K. Ramakrishnan (University of California, Riverside), Timothy Wood (The George Washington University)

Software Defined Networking (SDN) promises greater flexibility for directing packet flows, and Network Function Virtualization promises to enable dynamic management of software-based network functions. However, the current divide between an intelligent control plane and an overly simple, stateless data plane results in the inability to exploit the flexibility of a software based network. In this paper we propose SDNFV, a framework that expands the capabilities of network processing-and-forwarding elements to flexibly manage packet flows, while retaining both a high performance data plane and an easily managed control plane.

SDNFV proposes a hierarchical control framework where decisions are made across the SDN controller, a host-level manager, and individual VMs to best exploit state available at each level. This increases the network’s flexibility com- pared to existing SDNs where controllers often make decisions solely based on the first packet header of a flow. SDNFV intelligently places network services across hosts and connects them in sequential and parallel chains, giving both the SDN controller and individual network functions the ability to enhance and update flow rules to adapt to changing conditions. Our prototype demonstrates how to efficiently and flexibly reroute flows based on data plane state such as packet payloads and traffic characteristics.

Secure Content-Based Routing Using Intel Software Guard Extensions

Rafael Pires (University of Neuchatel), Marcelo Pasin (University of Neuchatel), Pascal Felber (University of Neuchatel), Christof Fetzer (TU Dresden)

Content-based routing (CBR) is a powerful model that supports scalable asynchronous communication among large sets of geographically distributed nodes. Yet, preserving privacy represents a major limitation for the wide adoption of CBR, notably when the routers are located in public clouds. Indeed, a CBR router must see the content of the messages sent by data producers, as well as the filters (or subscriptions) registered by data consumers. This represents a major deterrent for companies for which data is a key asset, as for instance in the case of financial markets or to conduct sensitive business-to-business transactions. While there exists some techniques for privacy-preserving computation, they are either prohibitively slow or too limited to be usable in real systems.

In this paper, we follow a different strategy by taking advantage of trusted hardware extensions that have just been introduced in off-the-shelf processors and provide a trusted execution environment. We exploit Intel's new software guard extensions (SGX) to implement a CBR engine in a secure enclave. Thanks to the hardware-based trusted execution environment (TEE), the compute-intensive CBR operations can operate on decrypted data shielded by the enclave and leverage efficient matching algorithms. Extensive experimental evaluation shows that SGX adds only limited overhead to insecure plaintext matching outside secure enclaves while providing much better performance and more powerful filtering capabilities than alternative software-only solutions. To the best of our knowledge, this work is the first to demonstrate the practical benefits of SGX for privacy-preserving CBR.

SecureKeeper: Confidential ZooKeeper using Intel SGX

Stefan Brenner (TU Braunschweig), Colin Wulf (TU Braunschweig), Matthias Lorenz (TU Braunschweig), Nico Weichbrodt (TU Braunschweig), David Goltzsche (TU Braunschweig), Christof Fetzer (TU Dresden), Peter Pietzuch (Imperial College London), Rüdiger Kapitza (TU Braunschweig)

Cloud computing, while ubiquitous, still suffers from trust issues, especially for applications managing sensitive data. Third-party coordination services such as Zookeeper and Consul are fundamental building blocks for cloud applications, but are exposed to potentially sensitive application data. Recently, hardware trust mechanisms such as Intel's Software Guard Extensions (SGX) offer Trusted Execution Enviroments (TEEs) to shield application data from untrusted software, including the privileged OS and hypervisors. Such hardware support suggests new options for securing third-party coordination services.

We describe SecureKeeper, an enhanced version of the ZooKeeper coordination service that uses SGX to preserve the confidentiality and basic integrity of ZooKeeper-managed data. SecureKeeper uses multiple small enclaves to ensure that (i) user-provided data in ZooKeeper is always kept encrypted while not residing inside an enclave, and (ii) essential processing steps that demand plaintext access can still be performed securely. SecureKeeper limits the required changes to the ZooKeeper codebase and relies on Java's native code support for accessing enclaves. With an overhead of 11%, the performance of SecureKeeper with SGX is comparable to ZooKeeper with secure communication, while providing stronger security guarantees with a minimal TCB.

Unifying HDFS and GPFS: Enabling Analytics on Software-Defined Storage

Ramya Raghavendra (IBM Research), Pranita Dewan (IBM Research), Mudhakar Srivatsa (IBM Research)

Distributed file systems built for Big Data Analytics and cluster file systems built for traditional applications have very different functionality requirements, resulting in separate storage silos. In enterprises, there is often the need to run analytics on data generated by traditional applications that is stored on cluster file systems. The absence of a single data store that can serve both classes of applications leads to data duplication and hence, increased storage costs, along with the cost of moving data between the two kinds of file systems.

It is difficult to unify these two classes of file systems since the classes of applications that use them have very different requirements, in terms of performance, data layout, consistency and fault tolerance. In this paper, we look at the design differences of two file systems, IBM’s GPFS and the open source Hadoop’s Distributed File System (HDFS) and propose a way to reconcile these design differences. We design and implement a shim layer over GPFS that can be used by analytical applications to efficiently access data stored in GPFS. Through our evaluation, we provide quantitative results that show that our system performs at par with with HDFS, for most Big Data applications while retaining all the guarantees provided by traditional cluster filesystems.

Accepted Papers