
Rules of the HDC2021

The rules and information about the HDC2021 can also be found in this PDF. (Updated 24.06.2021)

How to enter the competition

To enter the HDC2021 competition:

  • Register before 23:59 EET (Eastern European Time) on July 31, 2021, using this electronic form. (We accept late registrations sent by email before 23:59 EET (Eastern European Time) on August 31, 2021.)
  • Send your submission to hdc2021 ("at") fips.fi before September 30, 2021 23:59 EET. What needs to be submitted? See below for instructions.

    Only submissions that fulfill the requirements listed below will be accepted.

    Rules of the competition

    The data is arranged into categories of gradually increasing difficulty, with the hardest being practically impossible to deblur. The HDC2021 competition is structured step-wise accordingly:

    1. All teams start with Category 0 containing only slightly blurred images. If their algorithm leads to 70% or more of the characters in Category 0 being identified correctly by the OCR, and passes the "sanity check", they earn the right to compete in Category 1. (Below we describe the OCR procedure and explain the sanity check.)
    2. Recursion: once a team's algorithm reaches at least 70% of correctly identified characters in Category N, they earn the right to compete in Category N+1. The algorithm is required to have an input parameter specifying the blur category. The sanity check may be applied at all categories.
    3. Denote by Nmax the largest category number that at least one team could enter. If there is only one team, they win. If there are several teams competing in Category Nmax, they are ordered in the leaderboard according to the number of characters correctly identified in Category Nmax material.
    4. In case of a tie in Category Nmax, the previous categories, starting from Nmax-1, will be compared until one of the competitors wins. If there is still a tie, the organizing committee will increase the percentage threshold to make the final decision on the winner. If the tie persists, the organizing committee will make the final decision on the winner.
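As a hedged illustration (not the organizers' code), the ordering rules above can be sketched as a sort key: teams are compared first by the highest category reached, then by the score in that category, then by earlier categories in descending order. All team names and scores below are made up.

```python
# Hypothetical sketch of the leaderboard ordering; not official scoring code.
# scores: percentage of correctly identified characters per category,
# index 0 corresponding to Category 0.

def ranking_key(scores):
    """Key such that a larger value means a better placement: highest
    category reached first, then scores from that category downward."""
    n_max = len(scores) - 1  # highest category the team entered
    return (n_max, tuple(scores[::-1]))

# Example standings (made-up numbers):
teams = {
    "A": [92.0, 85.0, 74.0],  # reached Category 2
    "B": [95.0, 88.0, 74.0],  # ties with A in Category 2, wins on Category 1
    "C": [98.0, 71.0],        # only reached Category 1
}
leaderboard = sorted(teams, key=lambda t: ranking_key(teams[t]), reverse=True)
```

Here the tie between A and B in Category 2 is resolved by their Category 1 scores, matching rule 4; the further tie-break steps (raising the threshold, committee decision) are not modeled.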
    The spirit of the competition is that the same algorithm is used at all blur categories, and that it is a general-purpose algorithm, capable of deblurring other kinds of images besides text images. The organizing committee has the right to disqualify an algorithm trying to violate that spirit.

    Conflict of interest: researchers affiliated with the Department of Mathematics and Statistics of the University of Helsinki will not be added to the leaderboard and cannot win the competition.

    OCR and string matching method

    All deconvolved images will be quantitatively measured by applying the Tesseract OCR engine via the pytesseract Python module, followed by a comparison with the true text using the Levenshtein distance between the strings, as implemented in the FuzzyWuzzy Python module. The same set of parameters in pytesseract and FuzzyWuzzy will be used to assess all deconvolved images.

    Note: Only the middle text line out of the three will be measured.

    The basic code that runs the OCR and quantifies the transcription can be downloaded here.

    The function requires two arguments:

    1. The file name (relative or full path) of the deconvolved image (output from your algorithm).
    2. The file name (relative or full path) of a text file with the expected text contents.

    The code runs in Python 3. Please check requirements.txt in the downloadable zip file for the version numbers of the modules.
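For illustration only (the official evaluation uses the downloadable code above), the Levenshtein distance underlying the FuzzyWuzzy comparison can be sketched in pure Python. The normalized similarity shown here approximates, but is not identical to, FuzzyWuzzy's fuzz.ratio.

```python
# Illustrative pure-Python sketch of the Levenshtein distance and a
# normalized similarity score; the challenge itself uses the FuzzyWuzzy
# module, whose exact ratio may differ slightly from this approximation.

def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(expected, recognized):
    """Percentage of characters effectively matched (100 = identical)."""
    if not expected and not recognized:
        return 100.0
    dist = levenshtein(expected, recognized)
    return 100.0 * (1 - dist / max(len(expected), len(recognized)))
```

With this measure, a transcription scoring 70 or more on a category's text would correspond to the 70% threshold described in the rules.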

    Sanity check

    The sanity check means that the algorithm must work as a deconvolution method for general images, not only for images of text. This prevents approaches that always output text regardless of the input. The sanity-test image material contains some technical targets and some natural images (see examples in Figures 1 and 2 on this page (New 29.07.2021)), and the threshold for passing is very low: it is enough for the algorithm to have some deconvolution effect, even a poor one, instead of always producing images of text.

    Figure 1: Sanity check example of a natural image of a bird with 3 different levels of blur.
    Figure 2: Sanity check example of a natural image of a building with 3 different levels of blur.

    Deadline and what needs to be submitted:

    Deadline: September 30, 2021 23:59 EET (Eastern European Time)

    The algorithms must be shared with us as a private GitHub repository by the deadline at the latest. The code must be written in Matlab or Python 3.

    After the deadline there is a brief period during which we can troubleshoot the code together with the competing teams. This is to ensure that we are able to run it.

    GitHub repository

    Competitors can update the contents of the shared repository as many times as needed before the deadline. We will consider only the latest release of your repository on GitHub.

    Attention: Simple commits to the main branch will not be considered. You MUST also create a release. You can find GitHub's documentation on how to create releases here. If the latest release does not work, we will not accept older versions.

    Your repository must contain a README.md file with at least the following sections:

    • Authors, institution, location.
    • Brief description of your algorithm and a mention of the competition.
    • Installation instructions, including any requirements.
    • Usage instructions.
    • Show a few examples.
    Teams can submit more than one deconvolution algorithm to the challenge; each algorithm must be in a separate repository. The maximum number of algorithms is the number of members of the team. Teams do not need to register multiple times if they decide to submit more than one algorithm. (New 24.06.2021)
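A minimal README.md skeleton covering the required sections might look like the following; the team name, method description, and commands are placeholders, not a prescribed template.

```markdown
# HDC2021: Example deblurring algorithm

## Authors, institution, location
- Jane Doe, Example University, Helsinki, Finland

## Brief description
An entry to the Helsinki Deblur Challenge 2021 (HDC2021). The algorithm
performs ... (one or two sentences on the method).

## Installation
    pip install -r requirements.txt

## Usage
    python3 main.py path/to/input/files path/to/output/files 3

## Examples
Before/after images for a few blur categories go here.
```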

    Your code on GitHub

    The repository must contain a main routine that we can run to apply your algorithm automatically to every image in a given directory. This is the file we will run to evaluate your code. Give it an easy-to-identify name such as main.m or main.py.

    Your main routine must accept three input arguments:

    1. (string) Folder where the input image files are located
    2. (string) Folder where the output images must be stored
    3. (int) Blur category number. Values between 0 and 19
    Below are the expected formats of the main routines in Python and Matlab:



    Matlab: The main function must be a callable function:

    function main(inputFolder, outputFolder, categoryNbr)
        % your code comes here
    end

    Example calling the function:

    >> main('path/to/input/files', 'path/to/output/files', 3)



    Python: The main function must be callable from the command line. To achieve this you can use sys.argv or the argparse module.

    Example calling the function:

    $ python3 main.py path/to/input/files path/to/output/files 3



    The main routine must produce a deconvolved PNG file in the output folder for each image in the input folder. The output PNG images must have the same dimensions as the input files and the same filename apart from the extension. All images in the input directory belong to the same blur category, specified by the input argument.
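As a sketch of what such a main routine might look like in Python: a hypothetical skeleton (not an official template), with an identity placeholder instead of a real deconvolution step, assuming the Pillow module for image I/O.

```python
# Hypothetical skeleton of the required main routine; not the official
# template. Assumes the Pillow module (PIL) is available for image I/O.
import argparse
import os

def parse_args(argv=None):
    """Parse the three required command-line arguments."""
    parser = argparse.ArgumentParser(description="HDC2021 deblurring entry point")
    parser.add_argument("input_folder", help="folder containing the blurred images")
    parser.add_argument("output_folder", help="folder where deblurred PNGs are written")
    parser.add_argument("category", type=int, choices=range(20),
                        help="blur category number (0-19)")
    return parser.parse_args(argv)

def output_path(input_path, output_folder):
    """Same file name as the input, but with a .png extension."""
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(output_folder, stem + ".png")

def deblur(image, category):
    """Placeholder: the actual deconvolution algorithm goes here."""
    return image

def main(input_folder, output_folder, category):
    from PIL import Image  # Pillow, assumed installed
    os.makedirs(output_folder, exist_ok=True)
    for name in sorted(os.listdir(input_folder)):
        src = os.path.join(input_folder, name)
        img = Image.open(src)
        deblur(img, category).save(output_path(src, output_folder))

# Invoked as: python3 main.py path/to/input/files path/to/output/files 3
# e.g. in a __main__ guard:
#   a = parse_args(); main(a.input_folder, a.output_folder, a.category)
```

Note that the output file names keep the input file stems, as required, and that the category argument is restricted to the range 0-19 stated above.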

    The teams are allowed to use freely available Python modules or Matlab toolboxes. Toolboxes, libraries and modules with paid licenses can also be used if the organizing committee also has the license. For example, the most common Matlab toolboxes for image processing and deconvolution can be used (Image Processing Toolbox, Wavelet Toolbox, PDE Toolbox, Computer Vision Toolbox, Deep Learning Toolbox, Optimization Toolbox). The teams can contact us to check if other toolboxes are available.

    Open science spirit of HDC2021

    Finally, the competitors must make their GitHub repositories public by October 31, 2021 at the latest. In the spirit of open science, only public code can win the HDC2021.