The difference between smart and dumb headphones is that the smart ones go beyond playing music — they can be physiological monitors and virtual touchscreens. Xiaoran Fan, while a doctoral candidate at Rutgers University, led a team of researchers that developed HeadFi, a method that uses ordinary headphones as sensors.

Tech Briefs: How did this project start?

Xiaoran Fan: I'm an audiophile, so I've always been interested in headphones. Although simple, straightforward headphones are still used for applications like studio mixing and home audio, recently we've been seeing smart headphones from Apple, Samsung, and Microsoft.

Xiaoran Fan

We've always known that the drivers in headphones work, in principle, much like microphones — in some sense they are reciprocal. Since microphones can sense signals, headphones can, by default, also do it. So, although headphones had the potential to be smart, nobody had yet used that IQ. That was the initial incentive that started me in this direction. Plus, my advisor, Rich Howard, has spent his lifetime doing small-signal measurements, so when I talked with him about this idea, he pointed out some methods I could try. Then I dove in, and after a lot of exploration and experimenting, we were able to publish a paper on our work.

Tech Briefs: How are headphones used as microphones?

Fan: The driver itself is complicated — it has resistance, capacitance, and inductance — it's a complex impedance system. The exact technology depends on what type of headphone you have, but basically, all of them are just transducers that convert electrical signals into mechanical signals. These transducers can, in principle, be made to operate in reverse: mechanical signals vibrate the diaphragm, which moves the voice coil back and forth to generate electrical signals; so, in principle, they are reciprocal.

But the problem is that the headphones are optimized for playing music — the music signal is dominant. Although you can record an excitation signal from outside at the same time, it could be 1000 times smaller than the music signal. So, the challenge was how to do a sensing task while the headphone is still playing music. If we cannot do that and the user has to stop the music in order to use the sensing function, then this is not useful.

So, we did something very interesting. Headphones come in pairs, with a left driver and a right driver. We took advantage of the fact that headphones are manufactured so the left and right drivers match each other, which means the sound signals in the two are balanced. Since we know the exact music signal we are playing on both channels, we can use the input signal from the left driver to cancel the input signal from the right driver. If there is then a difference between the outputs of the left and right drivers, subtracting the outputs will produce a difference signal.

Let's say I speak — the two headphones will capture my voice, but the sound from my mouth to the left driver and the sound from my mouth to the right driver propagate through different channels. The physical channel between my voice and my left ear and the one between my voice and my right ear are not the same, because my bones and tissues are structured differently from left to right.

So, if you do a subtraction, there will be a difference between the left and right drivers, which can be captured. This subtraction cancels the music signals but lets us capture the differential sensing signal. We can use that small piece of information to do something.
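
To make the idea concrete, here is a minimal numerical sketch of that subtraction, assuming digitized left and right driver outputs as NumPy arrays; the sample rate, signal levels, and path gains are illustrative, not values from the paper:

```python
import numpy as np

# Illustrative setup: the same known music signal plays on both
# channels, while a sensing signal (voice, heartbeat, ...) reaches
# the two drivers through different physical paths.
fs = 8000                                     # sample rate in Hz (assumed)
t = np.arange(fs) / fs
music = np.sin(2 * np.pi * 440 * t)           # identical on both channels
sensing = 1e-3 * np.sin(2 * np.pi * 3 * t)    # ~1000x weaker than the music

left = music + 1.0 * sensing                  # path gain to the left driver
right = music + 0.6 * sensing                 # a different gain to the right

# Because the matched drivers carry the same music, subtraction
# cancels it and leaves only the small left/right differential.
diff = left - right                           # equals 0.4 * sensing here
```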

“Dumb” headphones can be plugged into a HeadFi device that connects to a cellphone, turning them into intelligent headphones. Engineers are working on a smaller version of the device. (Image Credit: Siddharth Rupavatharam)

Tech Briefs: Could you explain how you use the headphones for tasks like identifying the user, monitoring heart rate, and recognizing gestures?

Fan: We actually presented four applications in our paper: gesture recognition, heart rate monitoring, user identification, and also — the simplest one — voice communication.

Take heart rate monitoring as an example. When you’re hurrying, your heart is pumping — it produces a mechanical vibration throughout your body, which the headphones can capture.

But just as with voice, the channels from your heart to the left headphone and to the right headphone are different. By running the difference signal through our algorithm, we can find the period of your heartbeat.
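
The interview doesn't spell out that algorithm, but a standard way to pull a period out of such a signal is autocorrelation; the sketch below is that generic approach, not necessarily HeadFi's:

```python
import numpy as np

def estimate_bpm(diff, fs):
    """Estimate heart rate (beats per minute) from the difference signal.

    Autocorrelation peaks at multiples of the heartbeat period; we
    search only lags corresponding to plausible rates (40-180 bpm).
    This is a generic textbook method, not the paper's exact algorithm.
    """
    x = diff - diff.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo = int(fs * 60 / 180)          # shortest plausible beat period
    hi = int(fs * 60 / 40)           # longest plausible beat period
    lag = lo + np.argmax(ac[lo:hi])
    return 60.0 * fs / lag
```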

As for gesture recognition, let's say you tap or touch the right enclosure of the headphones. The right driver will receive the touch signal, but the left will receive a much weaker one. After you do the subtraction, you will know the phase: if the rising edge is in one direction, that means it's a touch on the right; if the rising edge is in the other direction, it's a touch on the left.
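
As a rough illustration of that phase test, a sketch along these lines would classify a single tap; the threshold and the sign convention are assumptions, not values from the paper:

```python
import numpy as np

def tap_side(diff, threshold=0.01):
    """Classify a tap from the (left - right) difference signal.

    A left tap couples strongly into the left driver, so the
    difference swings positive first; a right tap swings it
    negative first. Threshold and polarity are illustrative.
    """
    above = np.abs(diff) > threshold
    if not above.any():
        return "no tap"
    first = np.argmax(above)            # index of the first excursion
    return "left" if diff[first] > 0 else "right"
```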

But there are other more advanced ways to define a gesture. For example, the signal produced by a scratch will be more complicated. In that case, you can apply some deep learning methods to learn the signal pattern to identify the gesture.

But I think the most interesting application is user identification. The way it works is that the headphone generates a sweep signal — sweeping up a frequency band and sending the signal into your ear. The signal propagates through your ear canal, reflects back, and is captured by the same headphone driver that generated it. The left and right drivers both capture their echoes, and we do the subtraction. The interesting part is that everyone's ear canal has a different structure — it's like a fingerprint — so the echo received by the headphones will be different for everyone. What makes it even more interesting is that for everyone, the left and right ear canals are different, so if you do the subtraction there's a difference signal. And that difference also varies from person to person — even between identical twins. We did experiments with identical twins and had a success rate of over 95%. I think that's a cool part of the application.
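
One way to picture that identification pipeline — enroll a spectral template of the left/right echo difference, then compare later captures against it — is sketched below; the feature choice and the cosine-similarity threshold are stand-ins, since the interview doesn't give those details:

```python
import numpy as np

def ear_signature(echo_left, echo_right):
    """Spectral signature of the left/right echo difference.

    Subtracting the two echoes removes what the ears have in common;
    the magnitude spectrum of the remainder reflects how this user's
    ear canals differ -- the "fingerprint" Fan describes.
    """
    return np.abs(np.fft.rfft(echo_left - echo_right))

def same_user(signature, template, threshold=0.9):
    """Cosine similarity against an enrolled template (illustrative)."""
    score = signature @ template / (
        np.linalg.norm(signature) * np.linalg.norm(template))
    return score >= threshold
```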

Tech Briefs: Why do you use a high frequency signal?

Fan: The reason we generate a high frequency is that it's sort of like a CT scan — an ultrasonic CT scan. We sweep a range of frequencies because our ear canal has different structures that higher frequencies can explore with better resolution, in order to identify the distinctive shape of a particular ear canal. We sweep the frequencies to find the one that gives us the best results for that ear.

Tech Briefs: How do you generate the sweep signal?

Fan: A chirp generator is included in our software.
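
For reference, a chirp like this is easy to produce with standard tooling; the sketch below uses SciPy's chirp generator, with a sweep band and duration chosen only for illustration (the interview doesn't specify HeadFi's actual parameters):

```python
import numpy as np
from scipy.signal import chirp

fs = 48000                                # sample rate in Hz (assumed)
duration = 0.5                            # sweep length in seconds (assumed)
t = np.arange(int(fs * duration)) / fs

# Linear frequency sweep; the 20 Hz - 20 kHz band is a placeholder,
# not the band HeadFi actually uses.
sweep = chirp(t, f0=20, t1=duration, f1=20000, method="linear")
```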

We have a fully automated process. There are two steps: the first is to detect whether your headphones are on your head; then you run the application.

We use an interesting trick to check whether the headphones are on your head. It's based on the seashell effect. When you're walking on the shore, if you pick up a seashell and hold it close to your ear, you hear something like the noise of the sea. That's because the seashell and your ear canal form a semi-sealed enclosure of space that resonates and amplifies certain frequencies.

It's the same with headphones. When you have the headphones on your ears, certain frequencies are amplified, and we can detect whether or not the headphone is on your ear by simply looking at the signal strength at various frequencies. If the headphone's on your head, we start sending a chirp.
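
A minimal version of that check might look like the following, where both the resonance band and the energy threshold are placeholders that would be tuned empirically:

```python
import numpy as np

def on_head(captured, fs, band=(2000.0, 4000.0), threshold=1e-3):
    """Seashell-effect test: is the headphone sitting on an ear?

    A worn headphone forms a semi-sealed cavity with the ear canal
    that boosts certain frequencies. We compare the average energy
    in an assumed resonance band to a threshold; both values here
    are illustrative, not figures from the paper.
    """
    power = np.abs(np.fft.rfft(captured)) ** 2
    freqs = np.fft.rfftfreq(len(captured), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return power[in_band].mean() > threshold
```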

But we’re not limited to these four applications. We could also do measurements like step counts and respiration monitoring.

Tech Briefs: How would you know whether an input is from steps or from breathing?

Fan: The signals would be different from each other, so you could use a deep learning model to distinguish them.
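
Fan points to a learned model; as a simpler illustration of why the two are separable at all, the dominant low-frequency peak alone goes a long way, since resting respiration sits roughly around 0.2-0.5 Hz while walking cadence is roughly 1.5-2.5 Hz (typical physiological ranges, not figures from the interview):

```python
import numpy as np

def steps_or_breathing(diff, fs):
    """Crude frequency-peak stand-in for the learned classifier.

    Breathing and footsteps occupy different low-frequency ranges,
    so the dominant spectral peak already separates them. The 1 Hz
    cutoff is an illustrative boundary, not a tuned value.
    """
    spectrum = np.abs(np.fft.rfft(diff - diff.mean()))
    freqs = np.fft.rfftfreq(len(diff), d=1.0 / fs)
    peak = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
    return "breathing" if peak < 1.0 else "steps"
```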

This could also be used for eldercare. It could recognize that a person has fallen, so that 911 can be called. We are also planning to work on that.

The core intellectual contribution of our project is that we presented a platform that can make stock headphones smart, which enables an array of possible applications. At a high level, this technology can enable a ubiquitous human/headphone/network interface because so many existing "dumb" headphones out there could benefit from HeadFi.

Tech Briefs: When you say platform, do you mean software?

Fan: It's a software/hardware solution. The hardware is a Wheatstone bridge that does the cancellation between the left and right drivers. After that, we need to do signal processing for tasks like classification, using support vector machines or deep learning frameworks. So, although it's a combined software/hardware solution, the hardware can be extremely simple — as simple as just two resistors.
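
To see why two resistors suffice, consider each driver in series with a resistor, with the output taken between the two midpoints — a classic bridge arrangement. The sketch below assumes purely resistive impedances and illustrative component values:

```python
def bridge_output(v_in, z_left, z_right, r=32.0):
    """Differential output of the two-resistor bridge (simplified).

    Each driver sits in series with a resistor r; the output is the
    voltage between the two midpoints. Treating the drivers as pure
    resistances and using 32-ohm values are both simplifications.
    """
    v_left = v_in * z_left / (r + z_left)      # left midpoint voltage
    v_right = v_in * z_right / (r + z_right)   # right midpoint voltage
    return v_left - v_right

print(bridge_output(1.0, 32.0, 32.0))   # matched drivers -> 0.0 (music nulls)
print(bridge_output(1.0, 32.5, 32.0))   # slight imbalance -> small output
```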

Tech Briefs: You have an adapter that you plug in?

Fan: Yes, that's our current prototype of the hardware.

Tech Briefs: And then you would have to design an app to download into the smartphone?

Fan: Yes, the adapter we are currently using has a USB Type C connection, which is quite common with headphones and cell phones right now.

Tech Briefs: Do you have a rough idea of when this might become commercialized?

Fan: We are using the Rutgers commercialization center website to look for a partner at this moment and are exploring potential partners like Apple, Samsung, or Microsoft. We are also developing upgraded hardware, since our current adapter is just an open printed circuit board. So far, we have been able to shrink the board down to 3 cm × 2 cm. We are aiming to make it plug-and-play, and we're also writing Android software to make it easy to demonstrate.

An edited version of this interview appeared in the May 2021 issue of Tech Briefs.