Capture Movement Like a Pro: How to Implement Motion Detection in JavaScript

Case

The client had a request: Is it possible to determine how energetically people are dancing?

The idea behind it was to measure people’s energy levels. As you may have guessed, it was an ad campaign for an energy drink brand.

The campaign wasn’t designed to run online. The plan was for people to dance on a stage built for the campaign while our software measured their energy level in real time.

But how?

Building

I tried to design a system that didn’t depend on high-speed internet. Once the page had opened and the JavaScript had loaded, the app could operate as intended. Keeping traffic to the server to a minimum meant most of the heavy lifting had to be done on the front end, in JavaScript. (After the technical details, I’ll come back to this approach and explain how it saved our operation big time.)

This is how I designed the system:

https://i.imgur.com/Quex4Xp.png

The goal was for every dancer to see their result in real time on the big screen, so they knew how well they were doing. That way, the performer would know whether to speed up or whether they could slow down a bit to catch their breath.

To achieve this, we had to show a progress bar that changed according to the dancer’s performance. At the end of the performance, we had to send a recording of the dance, together with the progress bar’s fluctuations, to the server for storage.

Last but not least, we had to be able to control the software remotely, because we couldn’t interact directly with the laptop that was mirrored on the big screen.

Once requirements were established, it was time to build the actual system.

To determine and rate the dancer’s energy level, I decided to use motion detection. In other words, if there is a lot of movement in the frame, it’s probably an energetic dance. Detecting motion in JavaScript meant comparing every frame the camera sent with the previous one and counting how many pixels had changed, then using that count to rate the performance.
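The frame comparison described above can be sketched in a few lines. This is only an illustration of the general frame-differencing idea, not the code diff-cam-engine uses internally; the function name `countChangedPixels` and the per-pixel cutoff of 32 are assumptions made for the example.

```javascript
// Hypothetical helper: counts how many pixels changed noticeably between
// two RGBA frames of the same size (4 bytes per pixel: R, G, B, A).
function countChangedPixels(prev, curr, pixelDiffThreshold = 32) {
  if (prev.length !== curr.length) {
    throw new Error('Frames must have the same dimensions');
  }
  let changed = 0;
  for (let i = 0; i < curr.length; i += 4) {
    // Average absolute difference over R, G and B; alpha is ignored.
    const diff =
      (Math.abs(curr[i] - prev[i]) +
        Math.abs(curr[i + 1] - prev[i + 1]) +
        Math.abs(curr[i + 2] - prev[i + 2])) / 3;
    if (diff >= pixelDiffThreshold) {
      changed++;
    }
  }
  return changed;
}

// Two tiny 2-pixel frames: the first pixel is unchanged, the second one
// goes from dark to bright.
const previousFrame = new Uint8ClampedArray([0, 0, 0, 255, 10, 10, 10, 255]);
const currentFrame = new Uint8ClampedArray([0, 0, 0, 255, 200, 200, 200, 255]);
const changedPixels = countChangedPixels(previousFrame, currentFrame);
```

In the browser, the two arrays would typically come from drawing the current video frame onto a canvas with `ctx.drawImage(video, 0, 0, width, height)` and reading it back with `ctx.getImageData(0, 0, width, height).data`.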

To rate the performance, I set a minimum threshold for the number of changed pixels. If the number of changed pixels was greater than the threshold, the person on camera was dancing energetically enough, and I added 1% to their progress. If it was below the threshold, I penalized the dancer by reducing their progress by 1%. This meant that if the dancer stopped for a while to catch their breath, their progress kept declining. The game continued until the dancer hit 100%, which meant they had won.
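As a rough sketch, the scoring rule could look like the function below. The name `updateProgress`, the clamp at 0%, and treating a count exactly equal to the threshold as "not enough" are assumptions for the example; the text only specifies the +1% and -1% steps and the 100% win condition.

```javascript
// Hypothetical scoring step, called once per compared frame.
// progress: current progress in percent (0-100)
// changedPixels: number of pixels that changed since the previous frame
// motionThreshold: minimum changed-pixel count that counts as energetic
function updateProgress(progress, changedPixels, motionThreshold) {
  const step = changedPixels > motionThreshold ? 1 : -1;
  // Keep the value inside 0-100 so the bar never under- or overflows.
  return Math.min(100, Math.max(0, progress + step));
}

// The dancer wins as soon as progress reaches 100.
function hasWon(progress) {
  return progress >= 100;
}
```

Calling `updateProgress` on every frame comparison means the bar drifts down whenever the dancer pauses, which is the behavior described above.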

If you want to dig a little deeper into the technical background of motion detection, I strongly recommend checking out the articles listed at the end of this note. Once you understand how things work under the hood, you won’t feel like you’re shooting in the dark when you use a library. In my case, I used “diff-cam-engine” for motion detection.

With the game mechanics done, I moved on to recording the performance video, merging it with the progress bar animation, and adding the background music that played while the dancer performed.

To do this, I decided to use a hidden canvas, on which I drew the video and the progress bar animation side by side in real time. When the performer finished the act, I added the music to the background using WebRTC and sent the final product to the server. This was the only time I sent a huge chunk of data to the backend. (I’ve listed WebRTC tutorials and code examples at the end of this note.)
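One way to approach this kind of recording is sketched below. It assumes the browser's `canvas.captureStream()` and `MediaRecorder` APIs are used for the capture, which the text does not confirm; the function names, the layout, and the colors are made up for the example.

```javascript
// Draw one composite frame: camera video on the left half, progress bar
// on the right half. `ctx` is the hidden canvas's 2D context.
function drawCompositeFrame(ctx, video, progress, width, height) {
  const half = width / 2;
  ctx.clearRect(0, 0, width, height);
  ctx.drawImage(video, 0, 0, half, height);

  // Vertical bar that fills from the bottom as progress (0-100) grows.
  const barHeight = (height * progress) / 100;
  ctx.fillStyle = '#222';
  ctx.fillRect(half, 0, half, height);
  ctx.fillStyle = '#0c6';
  ctx.fillRect(half, height - barHeight, half, barHeight);
}

// Browser only: record the canvas, plus an optional audio track for the
// music, into a single Blob that can be uploaded afterwards.
function recordCanvas(canvas, audioTrack, fps = 30) {
  const tracks = canvas.captureStream(fps).getVideoTracks();
  if (audioTrack) {
    tracks.push(audioTrack);
  }

  const recorder = new MediaRecorder(new MediaStream(tracks));
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);

  // Resolves with the finished recording once stop() has been called.
  const finished = new Promise((resolve) => {
    recorder.onstop = () =>
      resolve(new Blob(chunks, { type: recorder.mimeType }));
  });

  recorder.start();
  return { stop: () => recorder.stop(), finished };
}
```

In this sketch, `drawCompositeFrame` would be called on every animation frame while the dancer performs, and the Blob from `finished` is what would be sent to the server at the end of the act.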

Example of the canvas recording:

After dealing with the video editing part of the app, it was time to handle the remote control part of the system. If you remember, one of the requirements was that we shouldn’t interact with the monitor screen in any way. To achieve seamless integration with the backend, sockets came to the rescue. The front end of the app listened to the socket to receive the new player’s data and the game’s start and stop commands. These were sent from the app’s control panel, which in turn listened to the socket for updates about what was happening on the front end.

When it comes to the backend, Python is my go-to language. Because of Django’s solid structure and built-in features, I decided to use Django on the backend this time as well. Handling the socket connections was fairly easy, too, thanks to Django Channels. A simple Django setup inside a Docker container did the trick. Here’s the setup example that I used.
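The front-end side of this remote control can be sketched as a small message handler. The message format is an assumption: the `type` values `new_player`, `start`, and `stop`, and the shape of the state object, are hypothetical and only mirror the three kinds of information mentioned above.

```javascript
// Hypothetical handler: takes the current game state and one raw socket
// message (a JSON string) and returns the next state.
function handleControlMessage(state, rawMessage) {
  const message = JSON.parse(rawMessage);
  switch (message.type) {
    case 'new_player':
      // A new dancer steps up: store their data and reset the game.
      return { ...state, player: message.player, progress: 0, running: false };
    case 'start':
      return { ...state, running: true };
    case 'stop':
      return { ...state, running: false };
    default:
      // Unknown messages are ignored so the big screen never breaks.
      return state;
  }
}

// Simulated session: the control panel registers a player, then starts.
let gameState = { player: null, progress: 0, running: false };
gameState = handleControlMessage(
  gameState,
  JSON.stringify({ type: 'new_player', player: { name: 'Dancer 1' } })
);
gameState = handleControlMessage(gameState, JSON.stringify({ type: 'start' }));
```

In the browser, this handler would be wired to a WebSocket with something like `socket.onmessage = (event) => { gameState = handleControlMessage(gameState, event.data); };`, so the laptop driving the big screen never needs direct input.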

https://i.imgur.com/sdPa3tU.png

Conclusion

As you may already know, when you’re in the field, something always goes wrong. In our case, on the second day of the event, the internet connection was so poor that uploading the recorded videos to the server became a headache. Luckily, because of the system’s architecture, with most of the processing handled by JavaScript on the front end, it wasn’t a big deal. We just turned off video recording and uploading from the admin panel, and we were good to go. The app kept running without interruption, and everybody was happy.

I was already familiar with video editing with FFmpeg on the backend and with recording webcams with the help of ActionScript and Nginx RTMP. Still, working on videos with JavaScript turned out surprisingly well too, and I’m looking forward to using it in future projects.

In conclusion, the project was completed successfully.

Below is a list of the articles and libraries that I looked into before starting the project; they may be helpful in your situation as well:

Libraries:

Articles: