While advances in deep learning and computer vision have enabled markerless pose estimation in individual animals, extending these methods to multiple animals presents unique challenges for studies of social behavior or of animals in their natural environments.
One potential solution is Social LEAP Estimates Animal Poses (SLEAP), a machine learning system for multi-animal pose tracking. The system, developed at the Salk Institute for Biological Studies, aims to enable workflows for data labeling, model training, and inference on previously unseen data.
Here is an exclusive Tech Briefs interview, edited for length and clarity, with Salk Fellow Talmo Pereira.
Tech Briefs: What was the biggest technical challenge you faced while developing SLEAP?
Pereira: One major technical challenge that we had to resolve with SLEAP is that the whole MLOps tooling, and the way deep learning frameworks like TensorFlow, PyTorch, and so forth are structured, typically follows the industry-style model where training is done on the research and product side, and on the client side we're just doing inference. We're taking a model that we trained and running predictions. That might even run on the company's servers, for example.
But with SLEAP, we're building a fundamentally different paradigm. We want to give users the ability to train their own models. That is the whole point of this thing: we're taking a general algorithm that works for a particular computer vision task, pose estimation, or markerless motion capture, and we want to give researchers the tools to train their own models to do so. This continues to be a major engineering roadblock because there are just a lot more dependencies required to do training. Those dependencies don't have cross-platform support for different types of end-user hardware, and, frankly, the whole UX of packaging and distribution is a lot more painful than it is for an inference-only app.
Tech Briefs: Can you explain in simple terms how it works?
Pereira: SLEAP is intended to be a framework for the end-to-end workflow: going from raw images and videos all the way through to trained neural network models and inferred results, and giving you the ability to inspect and proofread those results for pose tracking. The task is detecting the locations of individual landmarks, the body parts of animals, and potentially multiple interacting animals, for any type of animal, any number of animals, in any kind of experimental setting. I say animal, but really it generalizes to pretty much anything you can think of: plants, humans, inanimate objects. It doesn't really matter, right? We're really targeting the use case and workflow of the animal behavior researcher.
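To make the description above concrete, here is a minimal, hedged sketch of the kind of data model a multi-animal pose tool works with: a skeleton of named body parts, per-animal instances of that skeleton in a frame, and frames that can hold any number of interacting animals. The class and field names below are illustrative stand-ins, not SLEAP's actual API (SLEAP's real classes include `Skeleton`, `Instance`, and `LabeledFrame`, but their signatures differ).

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Skeleton:
    """Named body parts plus the edges connecting them."""
    nodes: tuple   # e.g. ("head", "thorax", "abdomen")
    edges: tuple   # pairs of node names defining the body graph

@dataclass
class Instance:
    """One animal's pose in one frame: node name -> (x, y) pixel coords."""
    points: dict

@dataclass
class LabeledFrame:
    """A video frame holding any number of interacting animals."""
    frame_idx: int
    instances: list = field(default_factory=list)

# Hypothetical example: a fly skeleton and a frame with two interacting flies.
fly = Skeleton(nodes=("head", "thorax", "abdomen"),
               edges=(("head", "thorax"), ("thorax", "abdomen")))

frame = LabeledFrame(frame_idx=0, instances=[
    Instance(points={"head": (10.0, 12.0), "thorax": (14.0, 12.5),
                     "abdomen": (18.0, 13.0)}),
    Instance(points={"head": (40.0, 30.0), "thorax": (36.0, 29.0),
                     "abdomen": (32.0, 28.0)}),
])

print(len(frame.instances))  # number of animals in this frame -> 2
```

Frames labeled this way are what the user annotates in the GUI; the same structures then serve as training targets for the network and as containers for its predictions during inference and proofreading.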
Tech Briefs: What are your next steps? Any updates you can share?
Pereira: One of the things that we're doing is leveraging the fact that we can capture and quantify subtle patterning in body language and essentially use that as a marker for the underlying biological processes that give rise to that particular pattern of body language.
In natural language processing, a common preprocessing task, before many recent advances, was to remove all the punctuation, because that makes it much easier to tokenize individual words and so forth. You get rid of the punctuation and, for the most part, still retain much of the same structure. Somebody a while back had the idea to do the opposite: take out all of the non-punctuation and leave in only the punctuation. They did this with a whole bunch of different books and authors, and it's really interesting to see the pattern, the sort of signature, of how different authors punctuate.
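The experiment described above can be sketched in a few lines: strip everything that is not punctuation and treat the counts of what remains as a stylistic signature. The function name and the two sample strings are hypothetical, chosen only to illustrate the idea.

```python
import string
from collections import Counter

def punctuation_signature(text: str) -> Counter:
    """Keep only the punctuation marks and count each kind.

    Illustrative sketch: the non-punctuation characters are discarded,
    and what remains serves as an author's stylistic signature.
    """
    marks = [ch for ch in text if ch in string.punctuation]
    return Counter(marks)

excited = "Wow!! This is great!! Really!!"
hesitant = "Well... I suppose... maybe..."

print(punctuation_signature(excited))   # Counter({'!': 6})
print(punctuation_signature(hesitant))  # Counter({'.': 9})
```

Run on whole books, the same counting (plus the ordering of marks) is what distinguishes one author's punctuation habits from another's.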
So, you can imagine that patterns in movement, the subtleties in how you actually perform your movements, are to a behavior what punctuation is to a writing style. Just as you can infer things from the structure of a text, you can also do so from the patterning of body movements.
Just as somebody in a manic emotional state might switch their punctuation style from lots of exclamation points to lots of ellipses, an organism with a disease may begin to exhibit these subtle changes in its behavioral punctuation.
Using this technology, we can now pull out this sort of structure and use it as a signal to predict disease, possibly even earlier than we could before. So it serves not only as a screening and diagnostic tool, but as something we can use to advance our basic research into the neurobiology of the disease.
Tech Briefs: Do you have any advice for engineers or researchers aiming to bring their ideas to fruition?
Pereira: Definitely invest in your DevOps and MLOps. The practice of machine learning research is, frankly, quite often hampered by a lack of proficient software engineering. If you're going to be building a framework, especially as a researcher, there aren't a lot of incentive structures in academia to put in the time, work, and effort to create clean frameworks, but it really pays dividends. And it really helps ensure that other people will use your work.