Over a year ago, I described a project: an On-Screen Display (OSD) based on the RP2040. It was a simple yet effective application that displayed flight telemetry on screen in real time. The solution was somewhat limited because it was created out of necessity. I have several RC planes with FPV capabilities, and during flights I wanted information such as current position and battery voltage overlaid on the video feed. All this information was displayed using four lines of text, two at the top and two at the bottom of the screen. Since then, the project requirements have evolved, and I needed to add more functionality to keep up with my needs. In the newest update, I’ve focused on enhancing I2C communication and adding a frame buffer.

Overview of New Functionalities
Here’s a short list of what was added and improved in the project:
- Frame buffer to cover the entire display
- Modified SPI and DMA transmission to support the frame buffer
- Font scaling without needing additional font files
- Enhanced I2C communication with CRC8 checksum
- Storing configuration in a structure
- Added support for parameters
Frame Buffer
The frame buffer is a memory area that stores pixel data for the entire display. It allows for more complex graphics and text rendering, as well as smoother animations. The frame buffer is implemented as a 2D array of pixels stored line by line, where each pixel is represented by a single bit: one (white) or zero (black/transparent). Each byte therefore holds eight horizontally adjacent pixels. The contents of the frame buffer are streamed to the display by DMA in the background, while the main program continues to run and update them. This allows for smooth transitions and even animations without flickering or tearing.
The main reason behind implementing the frame buffer was to allow more freedom in placing text on the screen. Previously, the vertical position was hardcoded. Now, text can be placed anywhere on the screen.
The second reason was related to horizontal shifts of text lines. Let’s take a closer look at the previous implementation:
for (int i = 0; i < 4; ++i)
{
    if (line_counter >= line_starts[i] && line_counter < line_starts[i] + 8)
    {
        start_dma_transfer(OSD_spi, (void *)dma_spi_tx, line_buffers[i],
                           line_lengths[i], line_counter, line_starts[i]);
    }
}
The above implementation iterated over four lines of text and started a DMA transfer only for the text line whose 8-pixel-high band contained the current video line. However, every loop iteration preceding the matching one added a small delay before the DMA transfer was triggered. This delay wasn’t large, but it was noticeable as a horizontal shift: text lines checked later in the loop started slightly further to the right. Adding more lines would cause even larger shifts, and the text wouldn’t be aligned properly. The frame buffer solves this problem by requiring only a single check and a single DMA transfer per line. Currently, the implementation looks like this:
if (line_counter >= FRAME_BUFFER_Y_OFFSET_LINES && line_counter < FRAME_BUFFER_Y_OFFSET_LINES + OSD_LINES) {
    int offset = (line_counter - FRAME_BUFFER_Y_OFFSET_LINES) * OSD_LINE_LENGTH;
    start_dma_transfer_line(OSD_spi, (void *)dma_spi_tx, &frame_buffer[offset], OSD_LINE_LENGTH);
}
++line_counter;
This code doesn’t introduce any line-dependent delay: the time needed to start the DMA transfer is the same for every line, so there’s no shift.
The shift can be observed in the image below; it’s most visible in the two bottom lines of text. With the frame buffer, this is a relic of the past: the text is now aligned properly and there’s no shift. Furthermore, the frame buffer allows for more complex text rendering and drawing graphics on the screen.

DMA and SPI Transmission
Now, the DMA start transfer function is simplified:
void start_dma_transfer_line(void *spi, void *dma, uint8_t *buffer, uint32_t length);
It doesn’t need to know the current line or the starting line of the text. It simply transmits a single line of the frame buffer to the display.
Font Scaling
This feature is closely tied to the frame buffer. Font scaling allows for different font sizes without needing additional font files: each 8x8 glyph is enlarged while it’s written into the frame buffer, by duplicating its pixels horizontally (x_scale) and its rows vertically (y_scale). The two scale factors are independent, which allows for more flexibility and an easier implementation of different font sizes. However, there’s a limitation related to performance: the scaling is recomputed every time the function is invoked.
void font_print_line_scale(uint8_t *font, uint8_t *input, uint32_t input_length,
                           uint32_t line_length, uint32_t horizontal_char_offset, uint8_t *buffer,
                           uint8_t x_scale, uint8_t y_scale);
Below is brief pseudocode of the scaled line-printing function:
Function font_print_line_scale(font, input, input_length, line_length, horizontal_char_offset, buffer, x_scale, y_scale):
    Define an array `sign` of size 10 to store one horizontally scaled glyph row
    If x_scale > 10:
        Set x_scale to 10
    For each glyph row `i` from 0 to 7:
        For each character `j` in the input:
            If x_scale == 1:
                Set `sign[0]` to the current input character
                For each vertical scale `y` from 0 to y_scale - 1:
                    Write row `i` of the glyph to buffer line i * y_scale + y
            Else:
                Clear the first x_scale elements of the `sign` array (set them to 0)
                Get row `i` of the current character's bitmap from the font
                For each bit `k` in that row:
                    If the bit is set:
                        Duplicate the bit `x_scale` times horizontally into the `sign` array
                For each horizontal scale `x` from 0 to x_scale - 1:
                    For each vertical scale `y` from 0 to y_scale - 1:
                        Write byte `sign[x]` to buffer line i * y_scale + y, at byte position horizontal_char_offset + j * x_scale + x
As you can see, there are several nested loops, up to four levels deep. The outer loop iterates over the eight rows of the font glyphs, and the second loop iterates over the characters of the input string. Without horizontal scaling (x_scale equal to 1), glyph row i is copied directly into the frame buffer y_scale times. Otherwise, the function first builds a temporary, horizontally stretched copy of the current glyph row in sign by duplicating each set bit x_scale times. Finally, the two innermost loops write each of the x_scale bytes of the stretched row to y_scale consecutive lines of the frame buffer. Every bit of the scaled character has to land at exactly the right position in the frame buffer.
An example can be seen in the image below.

The actual implementation is given below:
void font_print_line_scale(uint8_t *font, uint8_t *input, uint32_t input_length,
                           uint32_t line_length, uint32_t horizontal_char_offset, uint8_t *buffer,
                           uint8_t x_scale, uint8_t y_scale)
{
    uint8_t sign[10]; // one horizontally scaled glyph row; limits x_scale to 10
    if (x_scale > 10)
    {
        x_scale = 10;
    }
    for (int i = 0; i < 8; ++i) // glyph rows
    {
        for (int j = 0; j < input_length; ++j) // input characters
        {
            if (x_scale == 1)
            {
                // no horizontal scaling: copy glyph row i directly, y_scale times
                sign[0] = input[j];
                for (int y = 0; y < y_scale; ++y)
                {
                    *(buffer + (i * y_scale + y) * line_length + j + horizontal_char_offset) = (font + 8 * sign[0])[i];
                }
            }
            else
            {
                // clear scaled sign
                for (int x = 0; x < x_scale; ++x)
                {
                    sign[x] = 0x00;
                }
                // scale horizontally by
                // duplicating given bit exactly x_scale times
                uint8_t character = (font + 8 * input[j])[i];
                for (int k = 0; k < 8; ++k)
                {
                    if (character & (1 << k))
                    {
                        for (int x = 0; x < x_scale; ++x)
                        {
                            // output pixel index counted from the left (0 = MSB of sign[0])
                            uint8_t offset = (x_scale * 8 - 1) - (k * x_scale + x);
                            uint8_t byte_offset = offset / 8;
                            uint8_t bit_offset = offset % 8;
                            sign[byte_offset] |= 1 << (7 - bit_offset);
                        }
                    }
                }
                // write the x_scale bytes of the scaled row on y_scale consecutive lines
                for (int x = 0; x < x_scale; ++x)
                {
                    for (int y = 0; y < y_scale; ++y)
                    {
                        *(buffer + (i * y_scale + y) * line_length + horizontal_char_offset + j * x_scale + x) = sign[x];
                    }
                }
            }
        }
    }
}
This implementation has a small disadvantage: there’s no buffering when identical characters are printed repeatedly. For example, if the user wants to print “AAAAA”, the function creates a scaled copy of “A” five times. This could be improved by buffering scaled characters; however, that would require additional memory and time to fill the buffer. Considering the capabilities of the RP2040, the current implementation is fast enough for most applications and doesn’t introduce any lag when printing strings, so additional buffering isn’t needed.
CRC8 Checksum
The introduction of a CRC8 checksum in the OSD system protects data integrity during I2C communication by letting the receiver detect corrupted transfers. The CRC is computed over the payload data before transmission and stored as part of the osd_data_t structure, specifically in the osd_data->crc field, which is defined by the macro OSD_REG_CRC. The calculation is performed as the last step before the I2C transmission, ensuring that any modifications to the data payload are reflected in the checksum. For efficiency, the CRC8 algorithm uses a precomputed lookup table (osd_crc8_table), as seen in osd-comm.c, where the CRC is generated using the osd_crc function:
uint8_t osd_crc(const uint8_t *buffer, uint8_t length);
This function processes the data buffer and computes the checksum using the lookup table. During I2C transmission, the CRC is appended to the payload, allowing the receiver to validate the integrity of the received data. The placement of the CRC in the osd_data structure ensures that it’s included in the transmitted frame, providing a robust mechanism for error detection in the OSD system’s communication stack.
Additional Remarks on SPI Transfers
SPI transfers are critical for sending video data or overlay information to a screen. The four SPI modes, determined by clock polarity (CPOL) and phase (CPHA), define when data is sampled relative to clock edges; however, they don’t alter the overall transfer mechanism. CPOL sets the idle state of the clock (high or low) and therefore whether the leading edge of each clock cycle is rising or falling, while CPHA selects whether data is sampled on the leading or the trailing edge. For example, SPI_CPOL_1 combined with either SPI_CPHA_0 or SPI_CPHA_1 uses a high-idle clock, but the two settings differ in sampling timing. In analog systems like this OSD, even minor changes in CPHA can cause visible effects: with a high-idle clock, shifting the sampling point from the falling to the rising edge introduces a slight time delay in data latching, which may manifest as flicker, misalignment, or color distortion on screen. This sensitivity highlights why precise phase settings are essential for maintaining image quality and synchronization, ensuring that data is captured at the correct moment during the clock cycle.
If you experience flickering, try switching the clock phase to the other setting. In the previous implementation, the phase was set to 0 (SPI_CPHA_0), which caused some flickering on the screen. The current implementation uses SPI_CPHA_1, which seems to work better with the OSD display.
The effect is also visible when text scaling is enabled: with the previous phase setting, vertical gaps were present, as shown in the image below.

After changing the phase, the gaps disappeared, and the text is displayed correctly.

Conclusions
The journey from a simple four-line telemetry display to a full-fledged frame buffer OSD has been incredibly rewarding. Implementing the frame buffer wasn’t just about adding features; it fundamentally changed how we approach displaying information on the screen, allowing for greater flexibility and visual clarity. The improvements in I2C communication with CRC8 checksums enhance reliability, while font scaling provides a convenient way to adjust text size without bloating the codebase.
This project highlights the power of the RP2040 – its capabilities allow for surprisingly complex tasks within a small footprint. While performance considerations remain important (especially regarding the font scaling algorithm), the current implementation strikes a good balance between functionality and speed.
Looking ahead, potential future enhancements include exploring more advanced graphics rendering techniques, implementing dynamic text wrapping to fit content within display boundaries, and optimizing the font scaling process for even faster performance. The open-source nature of this project encourages community contributions and experimentation. I invite anyone interested to explore the code on GitHub: https://github.com/wdomski/OSD-RP2040. Ultimately, this OSD project serves as a testament to what can be achieved with a little ingenuity and the versatile RP2040 microcontroller.