Optical music recognition

Introduction

The task is to parse the notes contained in a sheet of music given as a run-length encoded string.

RLE images are formatted as follows :

"W {white_pixels_count} B {black_pixels_count} ..."

It simply switches between W and B (for white and black pixels), followed by the number of continuous pixels containing the same value.

The music sheet received as argument must be clean, meaning containing only 100% black or white pixels, notes are separated by at least one pixel, the space between the lines is at least 8 pixels, and at least 4 times the height of a line, lines are parallel between each others, the image doesn't contain noise, all notes have a tail, and notes must either be halves or quarters.

Here's an example of image that the current algorithm is able to parse :

Full description of the task can be found on the coding game puzzle page

Requirements

gcc ^11.3.0
cmake ^3.22.1

Build and run

Build

Step required only if you want to debug images Set the DEBUG_IMAGE variable to TRUE in the CMakeLists.txt file located at the project's root, and set the path in which you want the debu images to be stored.

Required Create a build folder at the project's root.

Open a terminal in the build folder and run

cmake .. && make

Run

You can try the program with the provided examples (taken from here) or encode another image in RLE, as long as it respects the requirements listed above for an image to be clean.

You can store the image in the build folder, or static/input_files but you'll need to rebuild the project so that static files get collected.

Else you can use any other folder as long as you give its absolute path as argument when running the script.

Run the program from the build folder with

./orm $width $height $file_path

For example :

./omr 120 176 in1.txt

Note:

Each image has its height and width written in the static/input_files/README.md file.

Tests

From the build folder, you can run several test scripts :

Unit tests

./grid_test

./region_of_interest_test

./rle_parser_test

End to end tests

./omr_test

Data structures

Grid

A grid is a wrapper to store a 2d binary table extended with util functions (get width and height, get columns and rows, get pixel, get subgrid).

Each cell is true if the pixel is on, else false

Region of interest

A region of interest (ROI) is an area in a picture containing relevant information.

The broadest ROI is the picture itself, the smallest ROIs are the areas in which individual keys can be found.

A ROI is composed of :

one x-y offset vector (computed from the ROI's upper left corner's coordinates, the origin [0, 0] being the global image's upper left corner)
absolute width and height of the region
two histograms, storing the quantity of black pixel in each rows and columns respectively

RLE parser

The RLE parser is in charge of extracting significant data from the RLE string.

It fills in a 2D grid and computes horizontal - vertical distributions as it reads the RLE string.

It uses the generated 2d grid and distributions to create the global region of interest.