Facebook has developed Presto, a distributed SQL query engine optimized for running ad hoc, interactive analytic queries against data sources ranging in size from gigabytes to petabytes. A single Presto query can combine data from multiple sources and return results in anywhere from under a second to several minutes.
The engine was designed around a simple storage abstraction intended to make it easy to run SQL queries against HDFS, other well-known data stores such as HBase, and custom systems such as the Facebook News Feed backend, says Facebook's Martin Traverso. Storage plugins provide interfaces for fetching metadata, getting data locations, and accessing the data itself. "Presto is 10 times better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook," Traverso says.
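To make the storage abstraction more concrete, here is a minimal Python sketch of what such a plugin boundary could look like. It is not Presto's actual connector API; the names `StoragePlugin`, `Split`, `InMemoryPlugin`, and `scan` are hypothetical and chosen for illustration. The sketch assumes one interface method for each role Traverso lists: fetching metadata, getting data locations, and reading the data.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# Hypothetical types for illustration only; not Presto's real plugin API.

@dataclass(frozen=True)
class Split:
    """One chunk of a table plus the host that stores it."""
    table: str
    index: int
    host: str

class StoragePlugin(ABC):
    """Hypothetical plugin interface with one method per role."""

    @abstractmethod
    def get_columns(self, table: str) -> List[str]:
        """Metadata: the column names of a table."""

    @abstractmethod
    def get_splits(self, table: str) -> List[Split]:
        """Data locations: which chunks exist and where they live."""

    @abstractmethod
    def read_split(self, split: Split) -> Iterator[Tuple]:
        """Data access: stream the rows of one chunk."""

class InMemoryPlugin(StoragePlugin):
    """Toy backend: each table maps to (columns, [(host, rows), ...])."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[Tuple[str, List[Tuple]]]]]):
        self.tables = tables

    def get_columns(self, table: str) -> List[str]:
        return list(self.tables[table][0])

    def get_splits(self, table: str) -> List[Split]:
        chunks = self.tables[table][1]
        return [Split(table, i, host) for i, (host, _) in enumerate(chunks)]

    def read_split(self, split: Split) -> Iterator[Tuple]:
        _, rows = self.tables[split.table][1][split.index]
        yield from rows

def scan(plugin: StoragePlugin, table: str, columns: List[str]) -> List[Tuple]:
    """Engine side: projects columns using only the plugin interface."""
    all_cols = plugin.get_columns(table)
    positions = [all_cols.index(c) for c in columns]
    out = []
    # A real engine would schedule each split on or near split.host;
    # this sketch simply reads the splits one after another.
    for split in plugin.get_splits(table):
        for row in plugin.read_split(split):
            out.append(tuple(row[p] for p in positions))
    return out

# Usage: a two-chunk table spread across two hypothetical hosts.
plugin = InMemoryPlugin({
    "events": (["user", "action", "ms"], [
        ("node-1", [("alice", "click", 12), ("bob", "view", 40)]),
        ("node-2", [("carol", "click", 7)]),
    ]),
})
print(scan(plugin, "events", ["user", "ms"]))
# [('alice', 12), ('bob', 40), ('carol', 7)]
```

Because `scan` touches storage only through the three interface methods, supporting HDFS, HBase, or a custom backend would, under this assumed design, mean writing another `StoragePlugin` subclass rather than changing the engine.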
Although Apache Hive has become the most common infrastructure for querying massive data warehouses on Hadoop clusters, Traverso says Presto's execution model is fundamentally different from Hive's, making it better suited to interactive queries. "We are also working on a query 'accelerator' by designing a new data format that is optimized for query processing and avoids unnecessary transformations," he says.
Abstracts Copyright © 2013 Information Inc., Bethesda, Maryland, USA