Demystifying Dimensionality Reduction

Datasets come in all shapes and sizes. Some are tall and skinny (lots of samples), some are short and wide (fewer samples, more variables), and some are both tall and wide (common for network graphs). In many cases, the number of variables becomes too large to manage effectively in memory.

Dimensionality Reduction is a common method of modeling data as a smaller representation, one that preserves item similarity (as measured by the “distance” between two samples).
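
To make that idea concrete, here is a minimal sketch (not from the talk; it uses synthetic data and scikit-learn's PCA, both of which are assumptions on our part) of what "preserving similarity" means: data that really lives in 10 dimensions is embedded in 100, then projected back down to 10, and the pairwise distances between samples survive almost unchanged.

```python
# A minimal sketch: reduce a 100-variable dataset to 10 dimensions
# with PCA and check how well pairwise sample distances are preserved.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 10))        # the data really lives in 10 dimensions
X = latent @ rng.normal(size=(10, 100))    # ...but is observed as 100 variables
X += 0.01 * rng.normal(size=X.shape)       # plus a little measurement noise

X_small = PCA(n_components=10).fit_transform(X)

d_full = pairwise_distances(X).ravel()         # distances in the original space
d_small = pairwise_distances(X_small).ravel()  # distances in the reduced space

# A correlation near 1.0 means samples that were similar (close together)
# in 100 dimensions are still similar in the 10-dimensional representation.
print(np.corrcoef(d_full, d_small)[0, 1])
```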

Unfortunately, the techniques generally employed in Dimensionality Reduction come with intimidating and unwieldy names such as Singular Value Decomposition, Principal Component Analysis, Latent Dirichlet Allocation, K-Means Clustering, Latent Semantic Analysis, Random Projections, etc. Many of the techniques involve the same underlying mathematics; they were simply developed independently for different domains.
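
As one hand-rolled illustration of that shared mathematics (again a sketch of our own, not material from the talk): Principal Component Analysis can be computed as a truncated Singular Value Decomposition of the centered data matrix, so two of those "different" techniques produce the same result.

```python
# Sketch: PCA is a truncated SVD of the centered data matrix,
# so computing it either way gives the same component scores.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
Xc = X - X.mean(axis=0)                  # center each variable

# PCA scores via scikit-learn
scores_pca = PCA(n_components=5).fit_transform(X)

# The same scores via a plain SVD: X_centered = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_svd = U[:, :5] * S[:5]

# Identical up to the arbitrary sign of each component
print(np.allclose(np.abs(scores_pca), np.abs(scores_svd)))
```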

Jeff will attempt to demystify the topic a bit by explaining what it is, why you would use it, and many of the common underlying themes, terms, and approaches in everyday English.

Click here to view this presentation

See the video here.

Questions about this topic? Don’t hesitate to contact Jeff Hansen directly at [email protected]