Discusses dimensionality reduction terms such as SVD, LSA, pLSA and MinHash, including what they have in common, how they differ and when each is appropriate to use. The course is relevant for modelers, business intelligence professionals and technical developers.
Daniel Eklund presents a survey of various Big Data relational technologies and discusses how to use theory to dissect the newest technology.
Dean Wampler discusses the strengths and weaknesses of MapReduce, and the newer variants for big data processing: Pregel and Storm.
This PowerPoint presentation covers an introduction to Hadoop and the use of predictive analytics with Storm, Hadoop, and R on AWS.
One of the first roadblocks many developers face when trying to learn about Hadoop is simply getting a working local installation that they can use for testing. This guide shows you how to install Hadoop on a Mac so you can start playing with Hadoop today. It also provides tips and tricks on how to debug and test the distributed MapReduce jobs you create.
The growth of Internet businesses led to a whole new scale of data processing challenges. Companies like Google, Facebook, Yahoo, Twitter, and Quantcast now routinely collect and process hundreds to thousands of terabytes of data on a daily basis. The most important of the storage techniques used by these companies is discussed in this whitepaper.
Previous knowledge of Hadoop is not necessary, but you should be comfortable using R interactively from a command shell in addition to a GUI.
Think Big Analytics provides data science, engineering and training services that quickly help companies meet their business goals. We identify and prioritize the best opportunities for Big Data projects based on your desired business outcomes. We then assemble the right architecture and custom applications that create real value. Our unique and proven methodology ensures our clients begin to see ROI within the first 40 days of a project.


