MapReduce and Its Discontents

Apache Hadoop is the current darling of the “Big Data” world. At its core is the MapReduce computing model for decomposing large data-analysis jobs into smaller tasks and distributing those tasks around a cluster. MapReduce itself was pioneered at Google for indexing the Web and other computations over massive data sets.
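To make the model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the job. It is an illustration only, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real cluster would run the map and reduce calls as separate tasks on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for an input split handled by one map task.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    # Each key group stands in for the input to one reduce call.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

Because each map call looks at only one split and each reduce call at only one key, the work can in principle be spread across many machines, which is where the scalability comes from.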
 
In this talk, I describe MapReduce and discuss its strengths, such as cost-effective scalability, as well as its weaknesses, such as its limited suitability for real-time event stream processing and the relative difficulty of writing MapReduce programs. I briefly show how higher-level languages ease the development burden and provide useful abstractions for the developer.
 
Then I discuss emerging alternatives, such as Google’s Pregel system for graph processing and event stream processing systems like Storm, as well as the role of higher-level languages in improving developer productivity. Finally, I speculate about the future of Big Data technology.
 
