This post is about how and why I pieced together my own motion-detection and encoding system. I wanted to produce smooth HD videos whenever motion was detected. Find the code here.

Note: I plan to add videos and stills to make this more than a wall of text, but wanted to get it out before the holidays.

Raspberry Pi computers are powerful despite being small. Did you know the raspivid program can capture 1080p at 30fps without breaking a sweat? Given that, I figured it should be possible to layer motion detection on top without degrading the HD video quality too much. But like many things, it’s harder than you might think.

I know what you’re thinking … “have you tried the motion package that’s available in Raspbian?” Yes, I have, but I was slightly disappointed. If you know of a way to configure motion to produce smooth videos at 720p/15fps or beyond, please let me know.

Given what I wanted to do, this was the perfect opportunity to kick off a new project, write some real-world multi-threaded Python code, explore what the Raspberry Pi 3 is capable of, and learn more about OpenCV and ffmpeg along the way.

Materials

The journey

I started out in August 2018 with a simple Python 3 script that used the picamera module to capture BGR-formatted frames for use with OpenCV. I then explored the individual pieces of motion detection (see the sketch after this list):

  1. Converting to grayscale
  2. Calculating the pixel differences between two frames
  3. Clamping each pixel to 255 if it crosses a threshold, 0 otherwise
  4. Adding up the pixel values to see what percentage of pixels had changed
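
Here’s a minimal sketch of those four steps using OpenCV; the threshold of 25 is an illustrative choice, not necessarily what the project uses:

import cv2

def motion_fraction(prev_frame, frame, threshold=25):
    """Return the fraction of pixels that changed between two BGR frames."""
    # 1. Convert both frames to grayscale
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 2. Per-pixel absolute difference between the two frames
    diff = cv2.absdiff(prev_gray, gray)
    # 3. Clamp to 255 where the difference crosses the threshold, 0 otherwise
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    # 4. Fraction of pixels that changed
    return cv2.countNonZero(mask) / mask.size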

Once I got that working, I tried using OpenCV to save the frames as MP4 video. IIRC OpenCV uses the ffmpeg libraries behind the scenes. Unfortunately it didn’t support the RasPi’s hardware h264 encoder, and it would have bound everything to a single CPU core. Doing motion detection and video encoding within a single low-power RasPi CPU core was not going to work long-term, so it made sense to move towards a “proper” solution early on.
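
For the record, the abandoned approach looked roughly like this (a sketch; the ‘mp4v’ codec string, resolution, and framerate are illustrative):

import cv2
import numpy as np

# OpenCV's VideoWriter: simple, but there was no way to select the
# RasPi's hardware h264_omx encoder through this API.
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
writer = cv2.VideoWriter('motion.mp4', fourcc, 20.0, (1280, 720))
frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a captured BGR frame
writer.write(frame)  # one frame per call, all on a single CPU core
writer.release()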

The common way to make use of additional CPU cores is to fork a new process and do some form of inter-process communication. So I forked an ffmpeg process with subprocess.Popen, grabbed stdin, stderr and stdout file descriptors, and piped frames to it. I also specified the “-c:v h264_omx” CLI param to use the Raspberry Pi’s built-in hardware encoder.
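
A sketch of what that looks like, assuming raw BGR frames and illustrative resolution/framerate values:

import subprocess

width, height, fps = 1280, 720, 20  # illustrative; match the capture settings

ffmpeg = subprocess.Popen(
    ['ffmpeg',
     '-f', 'rawvideo',            # raw frames arrive on stdin
     '-pix_fmt', 'bgr24',         # picamera's bgr format
     '-s', '%dx%d' % (width, height),
     '-r', str(fps),
     '-i', '-',                   # read input from stdin
     '-c:v', 'h264_omx',          # the RasPi's hardware encoder
     'motion.mp4'],
    stdin=subprocess.PIPE,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL)

# For each captured frame (width*height*3 bytes of BGR data):
#     ffmpeg.stdin.write(frame)
# When motion ends:
#     ffmpeg.stdin.close()
#     ffmpeg.wait()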

At this point I explored the multiprocessing module in search of a reliable way to transfer frames between threads. Its data structures turned out to be overkill: in CPython, appending to and popping from a list is an atomic, thread-safe operation thanks to the GIL. With this knowledge I decided to use plain lists as cross-thread queues and avoid the more complicated multiprocessing machinery.
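
A minimal sketch of the idea, with get_frame and write_frame standing in for the real capture and ffmpeg-piping code:

import time

frame_queue = []  # a plain list shared between the capture and encode threads

def capture_loop(get_frame):
    while True:
        frame_queue.append(get_frame())      # list.append is atomic under the GIL

def encode_loop(write_frame):
    while True:
        if frame_queue:
            write_frame(frame_queue.pop(0))  # list.pop is atomic too
        else:
            time.sleep(0.01)                 # idle briefly instead of busy-waiting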

In September 2018 I focused on squeezing more performance out of the capture and motion detection logic. Switching to picamera’s capture_continuous function helped a bit. Then in May 2019 I finally realized I didn’t need to check for motion every frame. Doing so once per second was plenty frequent. Big win here!
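
Here’s a sketch of that gating logic, assuming a hypothetical detect_motion() built from the steps listed earlier:

import io
import time
import picamera

with picamera.PiCamera(resolution='720p', framerate=20) as camera:
    time.sleep(2)  # give the camera some warm-up time
    stream = io.BytesIO()
    last_check = 0.0
    for _ in camera.capture_continuous(stream, format='bgr',
                                       use_video_port=True):
        frame = stream.getvalue()
        stream.seek(0)
        stream.truncate()
        now = time.time()
        if now - last_check >= 1.0:  # check once per second, not every frame
            last_check = now
            detect_motion(frame)     # hypothetical; see the earlier sketch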

A few months later I turned my attention to the low quality of the videos coming out of my setup. They were very blocky regardless of the framerate, especially at resolutions above 640x480. So I tuned ffmpeg’s Constant Rate Factor (CRF) parameter, which didn’t help.

What’s going on here?!

Investigation #1

Perhaps I was misunderstanding how picamera was handling frame data behind the scenes … was it reusing buffers? In a multi-threaded setup like mine, that could mean a buffer contained pixels from multiple frames, leading to ghosting or other strange artifacts. So I made a copy of the bytearray before pushing it onto the queue (see below). Didn’t help.
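
The attempted fix amounted to one line at enqueue time (frame_queue being the hypothetical plain-list queue from the earlier sketch):

frame_queue.append(bytes(buf))  # copy the bytearray so picamera can't mutate it later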

Investigation #2

Maybe the BGR data coming out of picamera isn’t as high quality as I thought. I read somewhere that the camera can encode to JPEG at high speed, so I switched to that. Didn’t help. However, it streamlined support for MJPEG live-streaming, so it wasn’t a wasted effort.

Investigation #3

Maybe something is screwy with ffmpeg and JPEGs? So I used the following picamera code to capture 50 JPEG stills:

import io
import time
import picamera

class SplitFrames(object):
    def __init__(self):
        self.frame_num = 0
        self.output = None

    def write(self, buf):
        if buf.startswith(b'\xff\xd8'):
            # Start of new frame; close the old one (if any) and
            # open a new output
            if self.output:
                self.output.close()
            self.frame_num += 1
            self.output = io.open('image%02d.jpg' % self.frame_num, 'wb')
        self.output.write(buf)

with picamera.PiCamera(resolution='720p', framerate=30) as camera:
    camera.start_preview()
    # Give the camera some warm-up time
    time.sleep(2)
    output = SplitFrames()
    start = time.time()
    camera.start_recording(output, format='mjpeg')
    camera.wait_recording(2)
    camera.stop_recording()
    finish = time.time()
print('Captured %d frames at %.2ffps' % (
    output.frame_num,
    output.frame_num / (finish - start)))

I then verified that they were high quality by viewing them in Firefox. Then I used ffmpeg to turn the JPEG files into an MP4:

ffmpeg -r 10 -f image2 -i image%02d.jpg -c:v h264_omx -crf 23 combined.mp4

Hmm, that looks horrible. Maybe it’s the encoder? I tried changing “-crf” from 23 to 18, but it didn’t help. Then I noticed something in the ffmpeg output I hadn’t seen before:

Codec AVOption crf (Select the quality for constant quality mode) specified for output file #0 (combined.mp4) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream.

What?! CRF isn’t supported by this encoder! Geez. I did some searching and found that the default bit rate for h264_omx is very low. So I specified it manually:

ffmpeg -r 10 -f image2 -i image%02d.jpg -c:v h264_omx -b:v 800k combined.mp4

This looks so much better!

VIDEO HERE

So the cause of the low-quality videos was the h264_omx encoder. Raising the bitrate helped, but it’s not a good long-term solution (as we’ll see later).

Yay quality!

It’s great that hardcoding the bitrate increased the quality, but we’re not done yet. While encoding, I noticed that ffmpeg was maxing out a CPU core. That’d be fine if it finished in a reasonable amount of time, but it kept running long after motion had stopped. Since the code stops enqueuing frames once motion ends, the long run time wasn’t a bug in my code; ffmpeg genuinely had a backlog. There was simply too much work for one CPU core and the h264_omx encoder to handle. I tried to give ffmpeg permission to use more than one core with “-threads 3”, but it never made use of additional cores. This is another limitation of the h264_omx encoder.

A hardware encoder is good in theory

So while I can detect motion in high def, doing so is pointless if encoding can’t keep up. Frames will queue up in memory, running the risk that we fill the RasPi’s 1GB of RAM and the process gets OOM-killed. I don’t want that.

So I switched back to the standard software h264 encoder (libx264) along with “-crf 18 -threads 3”. This incantation resulted in high quality video, encoded in realtime, triggered by motion detection. Yes!
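
Pieced together from the flags above, the invocation looked roughly like this (a sketch; since frames arrive as JPEGs at this point, the image2pipe input format is my assumption):

ffmpeg -r 20 -f image2pipe -i - -c:v libx264 -crf 18 -threads 3 motion.mp4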

Did you notice that I gave ffmpeg permission to use 3 CPU cores? While ffmpeg does tend to max out all three of those cores while encoding, the fourth core, used by the Python script, tends to hover around 20% utilization. This leaves plenty of headroom to keep the system responsive.

Live viewing

Video files are great and all, but we also need the ability to watch the video stream in realtime. Recall that I asked the camera for JPEG frames while investigating the video quality issues. One positive side effect is that the camera can produce JPEGs faster than other formats. Another is that browsers can natively play “streaming JPEGs” via the MJPEG format. With JPEGs in hand, we’re almost there! It only took a tiny bit of fiddling to make it work. See the code here.
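
For flavor, here’s a minimal sketch of MJPEG streaming over HTTP, along the lines of the picamera web-streaming recipe; get_latest_jpeg() is a hypothetical stand-in for grabbing the newest camera frame:

from http.server import BaseHTTPRequestHandler, HTTPServer

class StreamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # One long-lived multipart response; the browser renders each
        # part as the next frame of the stream.
        self.send_response(200)
        self.send_header('Content-Type',
                         'multipart/x-mixed-replace; boundary=FRAME')
        self.end_headers()
        while True:
            jpeg = get_latest_jpeg()  # hypothetical: newest JPEG from the camera
            self.wfile.write(b'--FRAME\r\n')
            self.send_header('Content-Type', 'image/jpeg')
            self.send_header('Content-Length', str(len(jpeg)))
            self.end_headers()
            self.wfile.write(jpeg)
            self.wfile.write(b'\r\n')

HTTPServer(('', 8000), StreamHandler).serve_forever()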

What’s next?

My project that began in August 2018 now has something to show: motion detection, encoding to MP4, live streaming, all in 720p at 20fps. The raspi-hd-surveillance project is alive on GitHub! And of course, I already have thoughts on how to squeeze out even more pixels. Perhaps I’ll be inspired to do another round of improvements after the holidays. Or maybe you will be?

Check the GitHub issues in a month or so if you’re interested in helping out.

Thanks for reading!