Streaming images and video using the Raspberry Pi Camera module

Sometimes streaming video is not as straightforward as it might seem. In this blog, I will walk you through some ideas on how to stream video using the Raspberry Pi Camera module and some Python libraries. Different techniques built around the picamera module will be investigated, along with some front-end code to facilitate reception in a web browser.

Before we start on this endeavour, let me share the hardware and software stack with you. All of the examples provided can be run on Raspberry Pi SBCs; I run them on the Raspberry Pi 4 and the Raspberry Pi Zero 2 W. I also use Raspberry Pi Camera modules 1 and 2. The software stack is mainly made up of two things. The first is the module allowing us to interface with the camera – picamera. Although this is the first version of the package, the provided code (with some modifications) will also run with picamera2. Finally, I use Django 5 as the backend, mixed with plain HTML and JavaScript.

Streaming images

As strange as it might seem, streaming images rather than video can be a way to emulate video streaming. The general idea is to capture new images continuously and send them to the front-end. This solution is very straightforward; it does not require any special support for decoding a video stream, just displaying images, each of which is replaced once a new one is captured and sent.

First, we need to import some modules.

from io import BytesIO
from PIL import Image
from picamera import PiCamera

Then we create the camera device using PiCamera.

camera = PiCamera()

Let us write a function that captures an image in JPEG format and provides it as a BytesIO stream of data.

def get_image():
  # capture a single still image into an in-memory stream
  stream = BytesIO()
  camera.start_preview()
  camera.capture(stream, format='jpeg')
  camera.stop_preview()
  return stream
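
The Image class imported from PIL comes in handy for quickly verifying what get_image() produced; a minimal sketch (the file name is just an example):

stream = get_image()
stream.seek(0)
image = Image.open(stream)
print(image.size)           # prints the capture resolution, e.g. (1920, 1080)
image.save('snapshot.jpg')  # store the frame on disk for inspection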

To yield JPEG images as an image stream, a wrapper around the raw data is needed. So, instead of just yielding the raw image stream, each picture (frame) has to be wrapped in a multipart boundary. This is necessary for the web browser to distinguish one frame from the next.

def streamer_jpeg():
  while True:
    try:
      stream = get_image()
      if stream is None:
        break

      # rewind the stream so the whole image can be read
      stream.seek(0)

      # wrap each frame in a multipart boundary
      yield(b'--frame\r\n'
        b'Content-Type: image/jpeg\r\n\r\n' + stream.read() + b'\r\n\r\n')
    except Exception:
      print("Video jpeg streaming stopped")
      break

Now, let us move to Django and make it available as a view. In this case, a new function has to be created inside the application's views.py file. For example:

from django.http import StreamingHttpResponse

def stream_jpeg(request):
  streamer = streamer_jpeg()
  return StreamingHttpResponse(streamer, content_type="multipart/x-mixed-replace;boundary=frame")

Here, we take our streamer_jpeg() generator and pass it to StreamingHttpResponse(), which creates a streaming HTTP response. Pay attention to the MIME type. Here, x-mixed-replace is being used, and the boundary string is frame. This matches what the streamer_jpeg() function yields: the generator produces the specially wrapped content, and Django then knows how to separate the frames as they are being produced.
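
On the wire, the resulting HTTP body is simply a sequence of parts, each introduced by the boundary declared in the content type. Schematically (with the binary JPEG bytes abbreviated), it looks like this:

--frame
Content-Type: image/jpeg

<JPEG bytes of frame 1>

--frame
Content-Type: image/jpeg

<JPEG bytes of frame 2>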

To make it available under the Django instance, a route is required. In the application, the following urls.py file should be present:

app_name = "video"

urlpatterns = [
  path('stream_jpeg', views.stream_jpeg, name="stream_jpeg"),
]

Finally, the HTML code will follow:

<div class="container">
  <img src="{% url 'video:stream_jpeg' %}" class="img-fluid" style="width: 100%;">
</div>

We create a div container with an img element inside, whose src attribute points to the JPEG stream provided by the stream_jpeg() view.

The above will allow you to stream images directly in your browser. No additional decoders are needed since the images are in JPEG format, which is widely supported. This solution is straightforward, simple, and quite robust, even if multiple users access the same page at the same time: the frames will be divided more or less equally among them. However, this solution has one important disadvantage: it is slow in terms of frame rate. The picamera takes time to capture a single still image. Thus, this solution would not be advised for situations where a smooth stream is the prime requirement. On the other hand, it is a very good solution for a quick preview in low resolution.
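
For such a low-resolution preview, the capture size and JPEG quality can be reduced; a sketch using standard picamera attributes and capture options (the values are only examples):

camera.resolution = (320, 240)                     # small frames for a quick preview
camera.capture(stream, format='jpeg', quality=20)  # lower JPEG quality, smaller payload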

MJPEG Streaming

Streaming Motion JPEG is quite similar to streaming JPEG images. There are two related formats here: MJPEG and JPEG. The key difference is that MJPEG is a video format in which each frame of a video is stored as a separate JPEG image. There is no inter-frame compression: every frame is compressed independently, which keeps the format simple and robust, at the cost of a higher bitrate than codecs that exploit motion between frames. JPEG itself, on the other hand, is a lossy compression format for still images only; it discards some data to reduce file size and has no notion of video at all.

Let us now focus on the implementation part. We start with the streamer function:

def streamer_mjpeg(delay=0.1):
  stream = BytesIO()
  camera.start_recording(stream, 'mjpeg')
  try:
    while True:
      # give the camera time to write new data into the stream
      camera.wait_recording(delay)
      stream.seek(0)
      yield(b'--frame\r\n'
        b'Content-Type: image/jpeg\r\n\r\n' + stream.read() + b'\r\n\r\n')
      # rewind and truncate so the next frame starts from a clean stream
      stream.seek(0)
      stream.truncate()
  except Exception:
    print("Video MJPEG streaming stopped")

The above constantly feeds the stream in MJPEG format. There are some key differences. Now, the camera uses the video recording method start_recording(), to which the stream and format are given. After starting the recording, we need to wait for the camera to produce some data, hence the wait_recording() method. Once data has been collected, the stream is rewound to its beginning and a data frame is yielded. This part is exactly the same as for streaming images in JPEG format. In order to make room for new frames, we need to rewind the stream to the beginning (since it was read during yielding) and truncate it.

The rest of the MJPEG application is the same. Even the same <img> tag is used to display the data. No additional features are needed for the web browser to handle the MJPEG stream properly. The advantage of this solution is its performance: it is faster than JPEG image streaming. The most significant difference is the fact that the camera module is configured in video streaming mode and is constantly recording data.
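
For completeness, the corresponding Django view mirrors the JPEG one; only the generator changes:

def stream_mjpeg(request):
  streamer = streamer_mjpeg()
  return StreamingHttpResponse(streamer, content_type="multipart/x-mixed-replace;boundary=frame")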

Differences between picamera and picamera2

The aforementioned code snippet is valid for the picamera library. However, when picamera2 (the current standard for capturing images and video on Raspberry Pi OS) is being considered, the main code structure is a bit different and has to be adjusted, but only in the context of creating a streamer.


import time

from picamera2.encoders import MJPEGEncoder
from picamera2.outputs import FileOutput

def streamer_mjpeg(delay=0.1, bitrate=10_000_000):  # example bitrate: 10 Mb/s
  encoder = MJPEGEncoder(bitrate)
  stream = BytesIO()
  stream_output = FileOutput(stream)
  camera.start_recording(encoder, stream_output)
  try:
    while True:
      # picamera2 has no wait_recording(); a plain sleep paces the loop
      time.sleep(delay)
      stream.seek(0)
      yield(b'--frame\r\n'
        b'Content-Type: image/jpeg\r\n\r\n' + stream.read() + b'\r\n\r\n')
      stream.seek(0)
      stream.truncate()
  except Exception:
    print("Video MJPEG streaming stopped")

Notice a few things. First, additional modules need to be imported to use MJPEGEncoder and FileOutput. Second, the stream is initiated differently compared to the first picamera version. An encoder must be created. The encoder consumes parameters like the bitrate, so this parameter is coupled with the encoder, as it should be. In order to still use a BytesIO object to hold the stream, a new instance of the FileOutput class has to be created. It is a wrapper that handles how streams are dealt with in picamera2 (or, generally speaking, in libcamera). FileOutput consumes the previously created stream and provides an output object. The main difference is in how the recording of a video is started in the new library: instead of specifying the stream, the function now consumes two things, the encoder and the output. Other than that, there are no more significant differences; however, the ones that do exist are important from the perspective of the new API provided with picamera2 (and libcamera, since picamera2 is built on it).
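
For reference, the camera object used above is a Picamera2 instance. A minimal setup sketch, assuming a video-oriented configuration (the resolution is only an example):

from picamera2 import Picamera2

camera = Picamera2()
# request a video configuration; adjust the frame size to your needs
video_config = camera.create_video_configuration(main={"size": (1280, 720)})
camera.configure(video_config)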

Streaming H264

Streaming video using the h264 codec is much different from the previous two solutions. H264 is a video codec that includes variable block-size motion compensation. In other words, the current frame does not depend only on itself, but also on the previous frame or frames. Therefore, the motion in the image is represented better while motionless blocks are reused, allowing a lower bitrate. The overall structure of the video capture process is similar to the MJPEG format.

def streamer_h264(delay=0.1):
  stream = BytesIO()
  camera.start_recording(stream, 'h264')
  try:
    while True:
      camera.wait_recording(delay)
      stream.seek(0)
      # raw h264 data, no multipart boundaries needed
      yield stream.read()
      stream.seek(0)
      stream.truncate()
  except Exception:
    print("Video h264 streaming stopped")

As can be seen above, the overall layout is very similar to streaming video using MJPEG. However, there is one significant difference. Instead of dividing frames into fragments separated by the '--frame' boundary, the stream carries clean h264 data. This is the most significant difference in how the stream is produced.

One other difference is on the Django view level, namely:

def stream_h264(request):
  streamer = streamer_h264()
  return StreamingHttpResponse(streamer, content_type="video/h264")

Instead of using a multipart content type, plain video/h264 is used. This informs the browser that what is actually being streamed is video content.

It might be worth mentioning that, from a user perspective, this is not the most significant difference. Since the content is a raw h264 video stream, it might happen that the web browser is unable to process this type of stream and will simply not produce any video out of it. However, downloading the raw h264 video stream with, e.g., wget and storing it inside a file like video_stream.h264 will render it usable. The content of the file can then be used with software designed to work with raw video streams, like VLC (video player) or other specialised software for video content processing and creation.
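
For example, assuming the h264 view is routed like the JPEG one (the host and port below are only examples), the stream can be captured and inspected like this:

wget -O video_stream.h264 http://raspberrypi.local:8000/video/stream_h264
vlc video_stream.h264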

A question might arise: why even bother to stream video as raw h264? Because the implementation for streaming video as MP4 is very similar and, in truth, it is based on h264.

Additionally, it is important to mention what the implementation would look like for picamera2. It is very similar to the MJPEG picamera2 implementation and would look like the following.

from picamera2.encoders import H264Encoder

def streamer_h264(delay=0.1, bitrate=10_000_000):  # example bitrate: 10 Mb/s
  encoder = H264Encoder(bitrate)
  stream = BytesIO()
  stream_output = FileOutput(stream)
  camera.start_recording(encoder, stream_output)
  try:
    while True:
      # picamera2 has no wait_recording(); a plain sleep paces the loop
      time.sleep(delay)
      stream.seek(0)
      yield stream.read()
      stream.seek(0)
      stream.truncate()
  except Exception:
    print("Video h264 streaming stopped")

A keen eye will immediately notice two differences. First, a different encoder is used, H264Encoder, which consumes the desired bitrate. The second difference is, as already mentioned for the h264 video stream, that the streamer function yields raw data from the stream; no additional frame boundaries are necessary. This is the only practical difference between the picamera and picamera2 APIs.

Streaming MP4

Streaming MP4 is not much different from streaming h264, at least from a technical perspective, but as always, the details matter. H264 is a codec, a format in which video data is stored encoded. It could be compared to the JPEG, BMP, and PNG formats that are used to store still images. In turn, MP4 is a multimedia container format that allows one to store video data, audio data, and other types of data, e.g. subtitles. What is more, and particularly important for this case, it is a modern standard that supports streaming content over the Internet.

Let us start with the streamer implementation.

def streamer_mp4(delay=0.1):
  stream = BytesIO()
  camera.start_recording(stream, 'h264')
  try:
    while True:
      camera.wait_recording(delay)
      stream.seek(0)
      data = stream.read()
      stream.seek(0)
      stream.truncate()

      # feed the raw h264 data to ffmpeg and read back the MP4 stream
      ffmpeg_process.stdin.write(data)
      ffmpeg_process.stdin.flush()
      mp4_data = ffmpeg_process.stdout.read()
      if mp4_data:
        yield mp4_data
  except Exception:
    print("Video MP4 streaming stopped")

The structure of the streamer is virtually the same as that for h264. The encoder is also h264, since MP4 can be a container for the h264 video format. The significant difference is related to the encapsulation of the h264 video stream inside MP4. This can be seen in the reference to the ffmpeg process, denoted as the ffmpeg_process object. The raw data coming out of the h264 encoder is sent to the ffmpeg process. In turn, the ffmpeg process provides an MP4 stream that is suitable for on-line video streaming. Finally, the MP4 data is yielded from the streamer. Let us have a closer look at the ffmpeg process that is responsible for encapsulating the h264-encoded data inside the MP4 container.

import os
import threading
import subprocess
import time

def ffmpeg_stream():
  ffmpeg_cmd = [
        "ffmpeg",
        "-f", "h264",
        "-i", "pipe:0",
        "-f", "mp4",
        "-movflags", "frag_keyframe+empty_moov",
        "-vcodec", "copy",
        "pipe:1"
      ]
  ffmpeg_process = subprocess.Popen(ffmpeg_cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE)
  # set stdout and stderr to non-blocking mode so reads never stall the streamer
  fd = ffmpeg_process.stdout.fileno()
  os.set_blocking(fd, False)
  fd = ffmpeg_process.stderr.fileno()
  os.set_blocking(fd, False)

  def drain_stderr(stderr, event):
    # keep reading stderr so ffmpeg never blocks on a full pipe
    while True:
      stderr.readline()
      if event.is_set():
        break
      time.sleep(0.1)  # avoid busy-spinning on the non-blocking pipe

  t_event = threading.Event()
  t = threading.Thread(target=drain_stderr, args=(ffmpeg_process.stderr, t_event))
  t.start()

  return ffmpeg_process, t_event

The ffmpeg_stream() function creates the ffmpeg process responsible for converting the h264 stream into an MP4 stream. This function returns a tuple whose first item is the ffmpeg_process subprocess object created using Popen. Thanks to this, the process can be governed outside this function and, in particular, can be terminated on demand. The second item in the tuple is an event object. It is also related to terminating the ffmpeg process, but indirectly: when the event is set, the helper thread responsible for reading data from ffmpeg's standard error terminates.
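
Based on the returned tuple, a shutdown could look roughly like this (a sketch; the names follow the function above):

ffmpeg_process, stderr_event = ffmpeg_stream()
# ... stream for as long as needed ...
stderr_event.set()            # let the stderr-draining thread finish
ffmpeg_process.stdin.close()  # signal end-of-input to ffmpeg
ffmpeg_process.terminate()    # terminate the process on demand
ffmpeg_process.wait()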

In order to create the ffmpeg process, a set of arguments has to be passed to it to achieve the h264-to-MP4 translation. The command-line equivalent of the argument list would be

ffmpeg -f h264 -i pipe:0 -f mp4 -movflags frag_keyframe+empty_moov -vcodec copy pipe:1

The above command tells the ffmpeg process the following. The input data is encoded as an h264 stream and will be provided via standard input (pipe:0). Then the parameters for the output are specified: the format should be mp4. A new fragment should be started at each video keyframe (frag_keyframe), so the output will be fragmented. Additionally, empty_moov forces an initial moov atom at the very start of the file without placing any samples inside it; it contains the track description but has zero duration. These two flags ensure that the file is properly fragmented and suitable for streaming.

The video codec should be copied, and -vcodec copy ensures it. This prevents transcoding, so no additional load is imposed on the process. Finally, the output is redirected to standard output with pipe:1.

There are a few additional options worth trying if the provided configuration causes trouble.

One movflags option that might be applied is faststart, although, according to the documentation, it could be futile: it forces an additional pass to move the moov atom to the beginning of the file, which is not suitable for streaming. Still, if you encounter issues, it might be worth trying.

Another ffmpeg argument could be -fflags nobuffer. It reduces the time required for the initial analysis of the input stream, used to determine metadata about the input. Using this option might shorten the time it takes ffmpeg to output the first frame. The analysis time can be further influenced with -analyzeduration duration_us, where duration_us defines the analysis time in microseconds. The longer the analysis time, the more accurate the information detected by ffmpeg; however, the longer the time, the longer the latency introduced by the process. By default, the analysis time is 5000000, thus 5 seconds.
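
Put together, a lower-latency variant of the command could look like this (the analysis duration of 1000000, i.e. 1 second, is only an example):

ffmpeg -fflags nobuffer -analyzeduration 1000000 -f h264 -i pipe:0 -f mp4 -movflags frag_keyframe+empty_moov -vcodec copy pipe:1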

The ffmpeg process is created using subprocess.Popen. It allows specifying standard input (the input h264 stream), standard output (the output MP4 stream), and standard error. Standard error provides additional information from ffmpeg, such as the current encoding rate, the number of processed frames, time, etc. If stderr is not given as subprocess.PIPE to Popen, the output will be displayed directly on the screen. To avoid that, I would suggest specifying it as a pipe. However, if you do so, additional action needs to be taken: if the data from stderr is not read, the ffmpeg process might freeze and cease operation. To overcome this problem, an additional thread was created, drain_stderr(). As the name suggests, it drains the pipe and allows the ffmpeg process to continue its operation, or at least not to stop due to a filled standard error pipe :D. The thread can be stopped by setting the event object.

To facilitate working with ffmpeg and its draining thread, the pipes need to be set as non-blocking. This is a crucial property, since by default pipes are in blocking mode. If the pipes for standard output and error were left blocking, it would lead to incorrect operation of the process: reading from the ffmpeg process might freeze during the initial analysis, and no output would be produced. This could be solved with an additional thread, but to keep things simple, it is better to set the pipes to non-blocking mode.

Having covered the ffmpeg part responsible for producing an MP4 stream, it is necessary to dive into the Django part and a suitable view.

def stream_mp4(request):
  streamer = streamer_mp4()
  return StreamingHttpResponse(streamer, content_type='video/mp4')

The view for the streamer is identical to the previous views, with one small exception: the content type has once again changed, now to video/mp4. What is new compared to the other streams is the front-end part.

<video id="cameraVideo" controls autoplay>
  <source src="/path/to/mp4/stream" type="video/mp4">
  Your browser does not support the video tag or the format of this video.
</video>

This will create an HTML5 video widget with some controls that starts playing immediately thanks to the autoplay attribute. If the browser does not support this particular content, a message saying so will be displayed on the screen.

Different Encoders

In this blog post I have covered different types of image/video streaming using both picamera2 and its predecessor picamera. There are other ways to stream video content; however, the four streams I have described have some advantages over other types of video streaming. In particular, naïve JPEG image streaming might be considered a strange way to stream video, yet it has some benefits. If transmission to multiple users is being considered, JPEG image streaming is a good choice that keeps complexity at a minimum: frames can be distributed among different users at the same time. To stream real video content to multiple users, additional mechanisms distributing the streams among them would need to be deployed.

There is also an undeniable benefit of using JPEG image streaming over MP4 or even MJPEG. The issue is not that visible while using picamera; however, with picamera2 it is more noticeable, and if a less powerful SBC is being considered, like the Raspberry Pi Zero 2, then the issue cannot be ignored anymore. All of it is related to RAM. If you try to stream a high-resolution MP4 stream on a RAM-limited device, you will soon find out that you run out of memory. In the best case, you will receive a notification that there is no free memory left to allocate. In the worst case, unfortunately the most common one, the Raspberry Pi will hang irreversibly and all services will become unresponsive. With picamera2, this happens more often when streaming high-resolution MP4 data. To give an overview of the situation, let's consider the Raspberry Pi Zero 2 W. It is equipped with 512MB of RAM. The OS consumes almost 80-90MB. After launching the web server that, among other functionalities, allows one to stream video, the total memory consumption is around 200MB. Launching an MP4 stream in high resolution will instantly suspend the OS, rendering it unusable: the picamera2 module creates additional video buffers that cannot fit in the available memory. This happens when there is no swap file available; once one is in place, you will instead receive information about picamera2 not being able to allocate the memory. In this situation you can recover, since the OS is still operational.

Currently, JPEG image streaming is widely used in one of my projects, remote-lab, where one of the functionalities is to stream video feedback live to students, providing a live view of microcontroller development boards. You can read more about it here, or go straight to my Github repository.

Conclusions

Streaming video is not a complex task when peer-to-peer video transmission is considered. Four different streaming solutions were presented: streaming JPEG images, streaming MJPEG (Motion JPEG), a raw h264 video stream, and an MP4 video stream.

Streaming JPEG is great when multiple users want to use the same resource without introducing complex resource sharing, buffers, and stream-governing processes. JPEG allows you to set the image quality, which can reduce the bandwidth and make the stream more usable. It is not as efficient as MJPEG, and it is slower, but it is a good starting point, giving a lot of freedom to process images on the fly when needed. MJPEG is very similar to streaming JPEG images but follows a more standardized approach. It also allows you to change the image quality, but by setting the bitrate. In addition, it is possible to set the framerate to define how many images should be taken per second. Both picamera and picamera2 offer encoding the video stream in the h264 format, which is widely adopted. As described earlier, a raw h264 stream is not very usable on its own, but when encapsulated in an MP4 stream you can stream video live!

There is one more point to consider, not so much related to the different stream formats as to using ffmpeg. Since MP4 is a media format capable of holding not only video data but audio data as well, it is possible to attach an audio stream to it. However, streaming audio alone is an entirely different matter, which I also hope to describe here. When it is available, I will update this blog to let my readers know.
