Human hand marked with GlowTrack fluorescent tags. (Image: Salk Institute)

Movement offers a window into how the brain operates and controls the body. From clipboard-and-pen observation to modern AI-based techniques, tracking human and animal movement has come a long way. Current methods utilize AI to automatically track parts of the body as they move. However, training these models is still time-intensive and limited by the need for researchers to manually mark each body part hundreds to thousands of times.

Now, Salk Associate Professor Eiman Azim and his team have created GlowTrack, a non-invasive movement tracking method that uses fluorescent dye markers to train AI — capable of tracking a single digit on a mouse’s paw or hundreds of landmarks on a human hand.

Here is an exclusive Tech Briefs interview — edited for length and clarity — with Azim and First Author Daniel Butler, a Salk bioinformatics analyst.

Tech Briefs: What was the inspiration for GlowTrack? What made you try the fluorescent dye markers?

Azim: I'll tell you the inspiration for the types of science we do, and then Dan can chime in on his inspiration for the solution. We study movement. We want to understand how the brain and spinal cord control the body. To do that you need to be able to quantify movement in some detail so that you can relate what the body does to what the nervous system does. Now that said, there are a lot of other people who quantify movement for many different reasons. Ethologists do it to understand how animals operate in their environment, and roboticists do it to understand how a natural body moves so that they can imitate it. So, there are many different people who care about movement, but traditionally the way that you quantify it has been relatively low throughput and pretty biased by the investigator involved.

So, classically, you just watch, and describe what you see. What people did for quite a while, including myself when I was a postdoctoral fellow, was to place physical markers on the body, which is what they're still doing in Hollywood when they're making films like Lord of the Rings. They have fiducial markers all over the body of the subject, which works really well. But it's pretty difficult to do that on animals, especially small ones that don't like having physical markers attached to their bodies. So, we wanted to take advantage of the revolution that's been happening in the research world. Many of the ideas that were developed for human motion tracking and for training artificial neural networks to automatically detect items of interest have revolutionized those fields and have been, in the past five to seven years, brought into research.

The general idea of how they work is that you collect all your video data without needing to attach markers to your animals beforehand. A big problem with physical markers is that you have to pick what you're going to quantify before you run your experiment, and then you're stuck with the markers you tracked. If you can decide after the fact what you want to track, it opens up a lot of flexibility for research. So what several different groups have done, which has changed how nearly all of us track animal movement, is to take advantage of advances in deep learning and train neural networks to find the things you care about.

The way you do that in the lab is you take a subset of all the data you collect. Let's say you've collected 5 million video frames, which is not an infeasible number for the amount of data we collect. You take a small subset, let's say a thousand of those video frames, and you have a human annotator click on the things you care about: the wrist, the fingers, the elbow, the shoulder, the eyes, the nose, you name it. If you do that enough times, you can train a neural network to find all those landmarks on the rest of your video frames. Then you can get the quantification for your experiments. That works really well in isolated environments — environments where nothing changes — the camera angle stays the same, the lighting conditions stay the same, the animals stay the same, the behavior stays the same. The reason is that these networks are hyper-specialized to the environment they were trained in.
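
To make that workflow concrete, here is a minimal sketch of the manual-annotation approach Azim describes: a small labeled subset of frames is used to train a network that regresses landmark coordinates, which is then applied to the rest of the video. The frame data, landmark count, and tiny network below are placeholders standing in for real video frames and human clicks, not the actual models used in the lab.

```python
# Minimal sketch of the manual-annotation workflow: label a small subset
# of frames by hand, then train a network to regress landmark coordinates
# on the rest. Random tensors stand in for real frames and human clicks.
import torch
import torch.nn as nn

NUM_LANDMARKS = 4          # e.g. wrist, elbow, shoulder, nose (placeholder)
IMG_SIZE = 128
NUM_LABELED = 1000         # the "small subset" a human annotator clicks through

# Stand-ins for labeled data: (frame, normalized x/y coordinates per landmark).
frames = torch.rand(NUM_LABELED, 3, IMG_SIZE, IMG_SIZE)
clicks = torch.rand(NUM_LABELED, NUM_LANDMARKS * 2)

# A deliberately tiny CNN that maps a frame to landmark coordinates.
model = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, NUM_LANDMARKS * 2), nn.Sigmoid(),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(5):
    for i in range(0, NUM_LABELED, 64):
        batch_x, batch_y = frames[i:i + 64], clicks[i:i + 64]
        optimizer.zero_grad()
        loss = loss_fn(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()

# The trained model can now predict landmarks on the millions of unlabeled
# frames, but only in the environment it was trained on.
```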

What you really want, though, is neural networks that are more versatile and can generalize to new contexts. Imagine a self-driving car; if it knows how to see a stop sign on a sunny day, you also want it to find a stop sign on a snowy night. So, you need flexible neural networks. The way you generate those networks is to have large amounts of training data. But it's not sufficient just to have quantity; you also need that training data to be visually diverse.
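
As a generic illustration of what "visually diverse training data" means, the snippet below applies standard image augmentations (lighting, scale, rotation, blur) to a single frame. This is a common, purely synthetic way to add diversity and is not GlowTrack's method; as described below, GlowTrack instead generates large amounts of genuinely diverse labeled frames.

```python
# Generic example of injecting visual diversity into training frames with
# standard augmentations. This illustrates the concept of visual diversity,
# not GlowTrack's specific data-generation approach.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(128, scale=(0.5, 1.0)),   # vary apparent size
    transforms.ColorJitter(brightness=0.5, contrast=0.5),  # vary lighting
    transforms.RandomRotation(degrees=30),                 # vary orientation
    transforms.GaussianBlur(kernel_size=5),                # vary focus / blur
])

frame = torch.rand(3, 256, 256)                      # stand-in for one video frame
diverse_views = [augment(frame) for _ in range(8)]   # several varied copies
```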

From left: Daniel Butler and Eiman Azim. (Image: Salk Institute)

So, Dan came up with a really clever idea of how to take the human annotator out of the equation. A human is never going to sit there and label a million frames of data. But Dan can generate a million frames of high-quality, visually diverse training data in one afternoon. When you train a neural network with that, what we show in the paper is that it now generalizes to all kinds of contexts and is much more versatile for doing the kind of quantification that we need.
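
The core of that idea, as described in the interview, is that the fluorescent marker itself supplies the label. The sketch below illustrates the principle: take a frame in which only the dye is visible, find where it glows, and use that location as the training label for the paired visible-light frame. The threshold, centroid step, and array shapes here are illustrative assumptions, not GlowTrack's actual implementation.

```python
# Sketch of replacing the human annotator with the dye: the fluorescence
# image gives the landmark location "for free", and that location becomes
# the training label for the paired visible-light frame.
import numpy as np

def label_from_fluorescence(fluor_frame: np.ndarray, threshold: float = 0.5):
    """Return the (row, col) centroid of pixels where the dye glows."""
    mask = fluor_frame > threshold
    if not mask.any():
        return None                      # dye not visible in this frame
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

# Pretend pair of frames captured under the two illumination conditions.
visible_frame = np.random.rand(256, 256)
fluor_frame = np.zeros((256, 256))
fluor_frame[100:110, 150:160] = 1.0      # glowing dye spot

label = label_from_fluorescence(fluor_frame)
print(label)   # roughly (104.5, 154.5); pairs with visible_frame as training data
```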

Tech Briefs: What was the biggest technical challenge you faced while developing GlowTrack?

Butler: Yeah, there were a lot; there were definitely a lot. [laughs] It was kind of one after the next, but one of the big challenges was that fluorescence isn't exhibited only by the dye; in a typical lab environment that an animal moves around in, you might have a few different objects that fluoresce. So, to start with, when we were trying to implement this idea, the first stumbling block was that we had lots of objects lighting up when we illuminated the animal with UV light.

So, what we eventually figured out was that although the different objects fluoresced, they fluoresced with different decay times. So, if we cut the UV light off and let the objects exhibit their natural decay, we were able to find dyes that decayed just a few milliseconds more slowly than the ambient fluorescence in the environment. What that meant was we could use this trick — which we called tri-phasic illumination — to get only the dye we introduced into the scene to fluoresce and be captured on the video.
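
A toy calculation illustrates the timing logic Butler describes: once the UV source switches off, background fluorescence fades faster than the dye, so a frame captured in a brief window afterwards is dominated by the marker. The decay constants and capture time below are made-up values for illustration only, not measurements from the paper.

```python
# Toy illustration of the timing behind the decay trick: after the UV
# source switches off, ambient fluorescence dies out faster than the dye,
# so a frame captured in the right window contains mostly dye signal.
import numpy as np

tau_ambient_ms = 0.5    # hypothetical decay time of background fluorescence
tau_dye_ms = 3.0        # hypothetical, slower decay time of the dye

def intensity(t_ms: float, tau_ms: float) -> float:
    """Exponential decay of fluorescence after the UV light turns off."""
    return np.exp(-t_ms / tau_ms)

# Capture a frame a couple of milliseconds after UV-off:
t_capture = 2.0
print(f"ambient remaining: {intensity(t_capture, tau_ambient_ms):.3f}")  # ~0.018
print(f"dye remaining:     {intensity(t_capture, tau_dye_ms):.3f}")      # ~0.513
# In that window the dye dominates, so thresholding the captured frame
# isolates the marker without picking up other fluorescent objects.
```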

Tech Briefs: You mentioned pairing GlowTrack’s capabilities with other tracking tools that reconstruct movements in three dimensions and with analysis approaches that can probe these vast movement data sets or patterns. How is that coming along? Do you have any updates you could share?

Azim: The paper came out last week, so we're excited to see how it gets adopted. We've definitely already been talking to colleagues who are in our universe who want to use these kinds of approaches to train models for tracking their own behavioral setups. These could be for different types of animals, but still within the animal behavior sphere.

But we can envision ways this is useful for anybody who wants to train neural networks to identify visual features in a scene without having humans manually label everything. We were thinking about some of the applications where this might be relevant: robotics, for example, or things like virtual surgery environments where surgeons train on artificial virtual models. You can imagine this being useful for any industry where it makes sense to have models trained on huge amounts of training data — whether they're animals or stop signs.

So, we see a lot of potential applications, and the idea of having the fluorescent strobe alternate between visible and invisible can be applied to a lot of different optics. One thing that was very useful but may not be apparent in the paper is that these neural networks get trained at particular scales.

For example, in some of our videos a mouse takes up half of the entire frame, but in other videos a mouse might be much smaller, like a quarter of that size. The neural networks don't love that big a change in scale. Dan developed a really clever, automated way — again, without the human having to intervene — to re-scale the images that you're testing and labeling automatically. There are a lot of different optimization steps throughout the pipeline that we think could be useful to anybody who uses neural networks for tasks like automatically detecting landmarks in an image of interest.
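
As a rough sketch of that kind of scale normalization (not the actual GlowTrack code), the snippet below estimates how large the subject appears from a foreground mask and computes a factor to resize the frame so the subject occupies a consistent fraction of the image. The mask source and target fraction are assumptions for illustration.

```python
# Rough sketch of scale normalization: estimate how much of the frame the
# animal occupies (here from a binary foreground mask) and compute a
# rescaling factor so the subject ends up at a consistent apparent size.
import numpy as np

def scale_factor(foreground_mask: np.ndarray, target_fraction: float = 0.5) -> float:
    """Factor to resize the frame so the subject's bounding box spans
    `target_fraction` of the longer image dimension."""
    rows, cols = np.nonzero(foreground_mask)
    if rows.size == 0:
        return 1.0                                   # nothing detected; leave as-is
    box_extent = max(rows.max() - rows.min() + 1, cols.max() - cols.min() + 1)
    return target_fraction * max(foreground_mask.shape) / box_extent

mask = np.zeros((480, 640), dtype=bool)
mask[100:220, 200:360] = True                        # mouse occupies a small region
print(scale_factor(mask))                            # ~2.0: upscale before labeling
```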

Tech Briefs: Do you have any advice for engineers or researchers aiming to bring their ideas to fruition?

Azim: Stick with it. This project started with a very different approach, something called adversarial networks. What we described in this paper arose from what we learned from that. But it would've been very easy to give up when that approach wasn't generalizing to the extent we needed. So, be tenacious, but also have a good team; Dan really led this entire effort, but there were enough people around in the lab and in neighboring labs to give him the video data he needed and animals that were already trained to do the behaviors he needed, so that he could build his expertise on infrastructure that already existed. So, have a good team and stick with it.

Butler: Building on the idea of sticking with it: when you're working on a technical problem and things aren't working and it's difficult, I think that's an indication that you're working on something important that other people haven't been able to solve yet. It's even more valuable to continue when the problem is difficult to solve.