High-Context Data - the art of labeling
Last week I talked about how an interdisciplinary team is one of our biggest strengths in handling human movement data. This week I’ll talk more about why this data is challenging to work with and how crucial context is for human data.

To start: humans are complicated. While we are generally built the same (two legs, two arms, torso, head, etc.) and move around bipedally, the way each of us moves is unique. The way you move is fundamentally dictated by your physical proportions, mobility, and health, but you also adapt to the environment you’re moving in (uphill, downhill, slippery surfaces, weather conditions, etc.), what shoes you’re wearing, how you’re feeling that day, and even your past - old injuries, activity history, and childhood development.

This is a nightmare for biomechanics research “in the wild” because there are so many variables to control when you want to publish population-level findings based on individual-level data (I’ll talk more about representation in data in the future).

One of the most crucial components of model building and training is applying labels to your data. More variables require more labels to provide context for the data you are processing. This quickly becomes overwhelming when you leave the constraints of a laboratory environment, especially if you’re not familiar with which variables matter to human movement.

While gathering more data is one option (the big data option), we’re focused on contextualized data (thick data) - taking into account an individual’s habitat, among other factors, not just the data from their wearable sensors. This isn’t a problem you can simply throw another AI model at (yet!). First you need humans who understand the inputs to tap into the full potential of the outputs. 🧠