Statistics (not so) 101 - applications across disciplines
Today I’m going back to fundamentals, and to one of my favorite topics: statistics. Full disclosure: I had almost no exposure to statistics in my undergraduate engineering studies (yikes), but PhD students in UW’s Department of Anthropology are required to take a series of Biostatistics courses before their Comprehensive Exam, and I liked them so much that I added more courses to complete a certificate in Computational Statistics. One of those courses was Spatial Statistics with Dr. Jon Wakefield, where I was introduced to clustering analyses. Britain’s Dr. John Snow was one of the original spatial statisticians / epidemiologists, known for his analysis of how cholera cases clustered around water pumps in London. By looking at the clustering (areas of higher density) of cholera deaths on a map, he was able to associate cholera outbreaks with pump locations.
Dr. John Snow’s map of cholera outbreaks in Soho, London.
So clustering can quantify events that occur in similar locations, and you can apply the same idea to quantifying the similarity of “events” across many dimensions. One tool that helps with this is Principal Component Analysis (PCA). Very roughly, PCA takes a complex, multi-dimensional dataset and simplifies it by finding the combinations of variables that account for the most variation in the data, then re-expresses the dataset in fewer dimensions (often two, which is easy to plot). You can then identify clusters within the data in two dimensions; check out the visual below. This is a super powerful combination of techniques that is used in a variety of fields: in Epidemiology, as I mentioned above, but also in Biological Anthropology for comparing bone morphology (shape) across fossil hominins (ancient humans), and in astrophysics. Dr. Sam used PCA and clustering in searching for new planets, and now he’s applying these techniques to building our Large Movement Models. Everything goes back to statistics 🧠
Reducing a 3-dimensional dataset to 2 dimensions in order to identify clusters.
What our clustering looks like with some of my data!
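If you want to try the reduce-then-cluster idea yourself, here is a minimal sketch in Python using NumPy. It is a toy illustration on made-up data, not the code behind the figures above: the three synthetic 3-D blobs, the function names `pca_2d` and `kmeans`, and the choice of k-means as the clustering step are all assumptions made for this example.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto the two directions of greatest variance."""
    Xc = X - X.mean(axis=0)  # center each variable at zero
    # SVD of the centered data: the rows of Vt are the principal directions
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # share of total variance per component
    return Xc @ Vt[:2].T, explained[:2]


def kmeans(points, k, n_iter=50):
    """Very small k-means (hypothetical helper, not a library function)."""
    # Farthest-point initialisation: start from the first point, then keep
    # adding the point that is farthest from all centers chosen so far.
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign every point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points
        # (assumes no cluster ends up empty, which holds for this toy data)
        new_centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Made-up data: three blobs of 30 points each in 3 dimensions
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0], [0.0, 5.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in true_centers])

projected, explained = pca_2d(X)          # 3 dimensions -> 2 dimensions
labels, centers = kmeans(projected, k=3)  # find clusters in the 2-D view

print("Share of variance kept by the two components:", explained.sum())
print("Points per cluster:", np.bincount(labels))
```

In practice you would more likely reach for a library such as scikit-learn, and you would check how much variance the first two components actually keep before trusting clusters found in the 2-D view.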
Want to know a bit more about my background and inspiration? Check out this segment from my interview on the Tech Business Podcast with Paul Essery!