This post is about how and why I pieced together my own motion-detection and encoding system. I wanted to produce smooth HD videos whenever motion was detected. Find the code here.
Note: I plan to add videos and stills to make this more than a wall of text, but wanted to get it out before the holidays.
Raspberry Pi computers are powerful despite being small. Did you know the raspivid program can capture 1080p at 30fps without breaking a sweat? Given that, I figured it should be possible to layer on motion detection without impacting the HD video quality too greatly. But like many things, it’s harder than you might think.
By now you’re probably thinking, “have you tried the motion package that’s available in Raspbian?” I have! I’ve used it extensively but couldn’t get it to produce smooth videos at 720p and 15fps. So this was the perfect opportunity to kick off a new project, write some real-world multi-threaded Python code, explore what the Raspberry Pi 3 is capable of, and learn more about opencv and ffmpeg along the way.
- Raspberry Pi 3 B+. It has 4 CPU cores operating at 1.4GHz and 1GB of RAM
- Fast 32GB MicroSD card. The 32GB version of this one.
- A Pi Camera v2
- Python 3 with the picamera module
I started out in August 2018 with a simple Python 3 script that captured frames using the picamera Python module. It captured into bgr format which played nicely with numpy and opencv. I then explored the individual pieces of motion detection:
- Capturing a frame
- Converting to grayscale
- Calculating the pixel differences between that frame and the previous one
- Clamping each pixel to 255 if its difference crosses a threshold, 0 otherwise
- Adding up the pixel values to see what percentage of the image had changed
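The steps above can be sketched roughly as follows. This is a minimal illustration using numpy only, not the code from my project: the function names, the grayscale weights (the common BT.601 luma weights) and the default threshold of 25 are all assumptions picked for the example. With opencv you would normally reach for its own grayscale, diff and threshold helpers instead.

```python
import numpy as np

# Luma weights in B, G, R order (BT.601). Assumption: any reasonable
# grayscale conversion works for motion detection.
BGR_WEIGHTS = np.array([0.114, 0.587, 0.299])


def to_gray(bgr):
    """Convert an (H, W, 3) uint8 BGR frame to an (H, W) uint8 grayscale frame."""
    return (bgr.astype(np.float32) @ BGR_WEIGHTS).astype(np.uint8)


def changed_percent(prev_bgr, curr_bgr, pixel_threshold=25):
    """Return the percentage of pixels that differ by more than pixel_threshold."""
    # Use a signed type so the subtraction cannot wrap around.
    prev_gray = to_gray(prev_bgr).astype(np.int16)
    curr_gray = to_gray(curr_bgr).astype(np.int16)
    diff = np.abs(curr_gray - prev_gray)

    # Clamp: 255 where the difference crosses the threshold, 0 otherwise.
    mask = np.where(diff > pixel_threshold, 255, 0).astype(np.uint8)

    # Add up the mask and express it as a percentage of all pixels.
    changed_pixels = mask.sum() / 255
    return 100.0 * changed_pixels / mask.size
```

A caller would compare the returned percentage against a second, hypothetical "how much of the frame must change" threshold to decide whether motion occurred.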
Once I got that working, I tried using opencv to save the video. IIRC this used ffmpeg libraries behind the scenes, but there was no way to configure it to use the Pi’s hardware h264 encoder, so I decided against having opencv encode for me. Even if I hadn’t abandoned it, doing motion detection and encoding within a single process (and a single CPU core) was not going to work long-term. Instead, I decided to shell out to ffmpeg for maximum control. I passed “-c:v h264_omx” to ffmpeg to use the Raspberry Pi’s built-in hardware encoder, and piped frames to it. Running ffmpeg as a separate process freed it to use a different CPU core. After all, we have 3 more CPU cores waiting to be used.
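Piping frames to ffmpeg can look something like the sketch below. It assumes raw bgr24 frames arriving as bytes, and the helper names (build_ffmpeg_cmd, encode) are made up for illustration. The ffmpeg flags are standard ones, but the exact command line I ended up with may differ.

```python
import subprocess


def build_ffmpeg_cmd(width, height, fps, out_path):
    """Build an ffmpeg command that reads raw BGR frames from stdin."""
    return [
        "ffmpeg",
        "-f", "rawvideo",           # input is headerless raw video
        "-pix_fmt", "bgr24",        # 3 bytes per pixel, B-G-R order
        "-s", f"{width}x{height}",  # raw video carries no size, so state it
        "-r", str(fps),             # input frame rate
        "-i", "-",                  # read from stdin
        "-c:v", "h264_omx",         # the Pi's hardware encoder
        out_path,
    ]


def encode(frames, width, height, fps, out_path):
    """Write an iterable of raw frame bytes to ffmpeg; return its exit code."""
    proc = subprocess.Popen(
        build_ffmpeg_cmd(width, height, fps, out_path),
        stdin=subprocess.PIPE,
    )
    for frame in frames:
        proc.stdin.write(frame)
    proc.stdin.close()  # signals end of input so ffmpeg can finish the file
    return proc.wait()
```

Because ffmpeg runs as its own process, the operating system is free to schedule it on a different core from the Python capture loop.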
At this point I explored the multiprocessing module to find a reliable way to hand frames to another thread. Its data structures all turned out to be overkill: in CPython, appending to a list and popping from a list are each atomic, thread-safe operations. With this knowledge I decided to use plain lists as cross-thread queues and avoid the more complicated multiprocessing data structures.
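Here is a small sketch of a list used as a cross-thread queue. It is an illustration rather than my actual code: the names and the threading.Event used to signal "no more frames" are assumptions. It relies on individual append() and pop() calls being atomic in CPython; anything that needs two steps (such as "check the length, then pop") is not atomic, which is why the consumer simply tries to pop and handles the empty case.

```python
import threading
import time


def producer(queue, count, done):
    """Stand-in for the capture thread: push `count` fake frames."""
    for i in range(count):
        queue.append(i)  # a single append is atomic in CPython
    done.set()


def consumer(queue, received, done):
    """Stand-in for the encoder thread: drain the queue in order."""
    while True:
        # Read the flag BEFORE popping, so an empty queue plus a set flag
        # really means nothing more is coming.
        finished = done.is_set()
        try:
            item = queue.pop(0)  # a single pop is atomic in CPython
        except IndexError:
            if finished:
                return
            time.sleep(0.001)  # queue is empty for now; wait briefly
            continue
        received.append(item)


def run_demo(count=100):
    queue, received = [], []
    done = threading.Event()
    threads = [
        threading.Thread(target=consumer, args=(queue, received, done)),
        threading.Thread(target=producer, args=(queue, count, done)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

For what it's worth, the standard library's queue.Queue and collections.deque cover the same ground with blocking reads or cheaper pops from the front, so they are worth a look if the queue ever grows long.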
In September 2018 I focused on squeezing more performance out of the thread that was doing capture and motion detection. Switching to picamera’s capture_continuous function helped a bit. Then in May 2019 I finally realized I didn’t need to check for motion every frame. Doing so once per second was plenty frequent. Big win here!
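One simple way to sketch the once-per-second idea, assuming frames arrive at a steady rate, is to run detection only on every Nth frame, where N is the frame rate. The function name and parameters below are hypothetical.

```python
def frames_to_check(total_frames, fps, checks_per_second=1):
    """Return the indexes of the frames that should get a motion check."""
    # At 20fps and one check per second, that is every 20th frame.
    interval = max(1, fps // checks_per_second)
    return [i for i in range(total_frames) if i % interval == 0]
```

In a capture loop the same test (`frame_index % interval == 0`) decides whether to run the diff at all; every other frame just gets queued or discarded, depending on whether motion is currently active.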
I then focused on the low quality of the videos that were coming out of my setup. They were very blocky regardless of the framerate, especially at resolutions above 640x480. So I tuned the Constant Rate Factor, which didn’t help.
Perhaps I was mis-understanding what was going on with the frame data behind the scenes … maybe picamera was reusing buffers. In a multi-threaded setup like mine, it could mean the buffer contained pixels from multiple frames, leading to ghosting or other strange artifacts. So I made copies of the frame buffer object before pushing onto the queue. Didn’t help.
Maybe the bgr data coming out of picamera isn’t as high quality as I thought. I read somewhere that the camera encodes to jpeg really well and at high speed, so I switched to that. Didn’t help.
Maybe something is screwy with ffmpeg and jpegs? So I used some sample picamera code to capture 50 jpeg stills. I verified that they were high quality by viewing them in Firefox. Then I found a sample ffmpeg command to turn jpeg files into an mp4. See below:
ffmpeg -r 10 -f image2 -i image%02d.jpg -c:v h264_omx -crf 23 combined.mp4
Hmm, that looks horrible. Maybe it’s the encoder? I tried to change “-crf” to 18 but it didn’t help. Then I noticed something I hadn’t seen before:
Codec AVOption crf (Select the quality for constant quality mode) specified for output file #0 (combined.mp4) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream.
What?! CRF isn’t supported by this encoder! Geez. I did some searching and found that the default bit rate for h264_omx is very low. So I specified it manually:
ffmpeg -r 10 -f image2 -i image%02d.jpg -c:v h264_omx -b:v 800k combined.mp4
So much better! So the cause of the low-quality videos was the h264_omx encoder. Raising the bitrate helped, but it’s not a good long-term solution (as we’ll see later).
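Since the ignored “-crf” warning was so easy to miss, it may be worth scanning ffmpeg’s stderr for it automatically. The sketch below is one hedged way to do that: the regular expression is based only on the wording of the warning quoted earlier, so it may need adjusting for other ffmpeg versions, and the function name is made up.

```python
import re

# Matches ffmpeg's "option was not used" warning and captures the option name.
UNUSED_OPTION = re.compile(
    r"Codec AVOption (\S+) .* has not been used for any stream"
)


def unused_options(stderr_text):
    """Return the names of codec options ffmpeg reported as unused."""
    return UNUSED_OPTION.findall(stderr_text)
```

Feeding it the captured stderr of an ffmpeg run (for example from subprocess.run with stderr=subprocess.PIPE and text=True) gives a list to log or fail on.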
With a hard-coded bitrate of 800k I proceeded to test and noticed the ffmpeg process was maxing out one core of the CPU, and that ffmpeg kept encoding long after motion had stopped. Since the code queues up frames for encoding and stops enqueuing after motion stops, ffmpeg’s run duration seemed legitimate. Encoding was taking a really long time. It didn’t even help to give ffmpeg permission to use more than 1 core by using “-threads 3”. It never made use of additional cores. This is indeed another limitation of the h264_omx encoder.
A hardware encoder is good in theory
So while I can detect motion in high def, doing so is pointless if encoding can’t keep up. The frames will queue in memory, running the risk that we’ll fill up 1GB of RAM and the process will get OOM-killed. I don’t want that.
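To put a rough number on that risk, here is a back-of-the-envelope sketch. It assumes uncompressed BGR frames (3 bytes per pixel) and a hypothetical encoder that only keeps up with part of the capture rate. JPEG frames are far smaller, so treat this as a worst case rather than a measurement from my setup.

```python
def seconds_until_full(width, height, capture_fps, encode_fps,
                       budget_bytes, bytes_per_pixel=3):
    """Estimate how long a growing frame backlog takes to use budget_bytes."""
    frame_bytes = width * height * bytes_per_pixel
    backlog_fps = capture_fps - encode_fps  # frames piling up per second
    if backlog_fps <= 0:
        return float("inf")  # the encoder keeps up; no backlog builds
    return budget_bytes / (frame_bytes * backlog_fps)
```

For example, `seconds_until_full(1280, 720, 20, 10, 1024**3)` models 720p at 20fps with an encoder managing only 10fps and the full 1GB available, which in practice it never is because the OS and the rest of the process need memory too.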
So I switched back to the standard software h264 encoder along with “-crf 18 -threads 3”. Yes! Now we have HD video that gets encoded in realtime, triggered by motion detection.
Since we’re streaming JPEG format straight from the camera hardware, the next logical step was to make them available as MJPEG to a browser for live-viewing. Luckily with a bit of fiddling I got this to work too.
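For the curious, MJPEG over HTTP is just an endless multipart/x-mixed-replace response where each part is one JPEG. Below is a minimal sketch using only the standard library. The boundary name, the StreamHandler class and its get_frame hook are all hypothetical, and this is not necessarily how my implementation is structured.

```python
import time
from http.server import BaseHTTPRequestHandler

BOUNDARY = "frame"  # hypothetical boundary string; any unique token works


def mjpeg_part(jpeg_bytes, boundary=BOUNDARY):
    """Wrap one JPEG image as a single part of a multipart MJPEG stream."""
    header = (
        f"--{boundary}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg_bytes)}\r\n\r\n"
    ).encode("ascii")
    return header + jpeg_bytes + b"\r\n"


class StreamHandler(BaseHTTPRequestHandler):
    # Hypothetical hook: replace with a function returning the latest JPEG.
    get_frame = staticmethod(lambda: b"")

    def do_GET(self):
        self.send_response(200)
        self.send_header(
            "Content-Type",
            f"multipart/x-mixed-replace; boundary={BOUNDARY}",
        )
        self.end_headers()
        try:
            while True:
                self.wfile.write(mjpeg_part(self.get_frame()))
                time.sleep(0.05)  # roughly 20 frames per second
        except (BrokenPipeError, ConnectionResetError):
            pass  # the browser went away; just end the response
```

Serving it would be a matter of passing StreamHandler to http.server.ThreadingHTTPServer on a port of your choosing, with get_frame wired up to whatever holds the most recent frame from the camera.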
So here we are. My project that began in August 2018 now has something to show. Motion detection, encoding to MP4, live streaming, all in 720p at 20fps. There’s plenty of room for improvement but since life is busy for me these days, it’ll probably be something I revisit again next year. In the words of Alf: Ha!
Thanks for reading!