To enter the HDC2021 competition:
Only submissions that fulfill the requirements listed below will be accepted.
The data is arranged into categories of gradually increasing difficulty, with the last ones being practically impossible. The HDC2021 competition is structured in steps accordingly:

Conflict of interest: researchers affiliated with the Department of Mathematics and Statistics of the University of Helsinki will not be added to the leaderboard and cannot win the competition.

Note: Only the middle one of the three text lines in each image will be measured. The basic code that runs the OCR and quantifies the transcription can be downloaded here. The function requires two arguments:
The code runs in Python 3. Please check requirements.txt in the downloadable zip file for the version numbers of the modules.

The sanity check means that the algorithm should work as a deconvolution method for general images, not only for images of text.
This is to prevent approaches that always output text, regardless of the input. The sanity test material contains some technical targets and some natural images (see the examples in Figures 1 and 2 on this page, new 29.07.2021), and the threshold for passing is very low.
It is enough to have some sort of deconvolution effect, even a poor one, rather than always producing images of text.

Deadline: September 30, 2021 23:59 EET (Eastern European Time)

The algorithms must be shared with us as a private GitHub repository by the deadline at the latest. The code should be written in Matlab or Python 3. After the deadline there is a brief period during which we can troubleshoot the code together with the competing teams, to ensure that we are able to run it.

Competitors can update the contents of the shared repository as many times as needed before the deadline. We will consider only the latest release of your repository on GitHub. Attention: Simple commits to the main branch will not be considered. You MUST also create a release.
You can find GitHub's documentation on how to create releases here. If the latest release does not work, we will not accept older versions.

Your repository must contain a README.md file with at least the following sections:

The repository must contain a main routine that we can run to apply your algorithm automatically to every image in a given directory. This is the file we will run to evaluate your code. Give it an easy-to-identify name like main.m or main.py. Your main routine must require three input arguments:
1. the folder containing the input image files,
2. the folder where the output image files are written,
3. the blur category number of the images in the input folder.
Rules of the competition
The spirit of the competition is that the same algorithm is used for all blur categories, and that it is a general-purpose algorithm capable of deblurring other kinds of images besides text. The organizing committee has the right to disqualify any algorithm that tries to violate that spirit.
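To make "general-purpose" concrete, here is a minimal sketch of a classical content-agnostic method, Wiener deconvolution, which uses no knowledge that the image contains text. It is not the organizers' reference method. The uniform disk PSF (a common model of defocus blur), its radius and the constant noise-to-signal ratio `nsr` are all assumptions made for illustration; a real entry would have to estimate the blur for each category itself.

```python
import numpy as np


def disk_psf(radius):
    """Uniform disk kernel, a common (assumed) model of out-of-focus blur."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    kernel = (x**2 + y**2 <= radius**2).astype(float)
    return kernel / kernel.sum()  # normalize so the blur preserves mean brightness


def psf_to_otf(psf, shape):
    """Zero-pad an odd-sized PSF to `shape` and move its centre to pixel (0, 0)."""
    padded = np.zeros(shape)
    padded[:psf.shape[0], :psf.shape[1]] = psf
    padded = np.roll(padded, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def wiener_deconvolve(blurred, psf, nsr=1e-3):
    """Wiener filter with a constant noise-to-signal ratio `nsr` (periodic boundaries)."""
    H = psf_to_otf(psf, blurred.shape)
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)  # regularized inverse of the blur
    return np.real(np.fft.ifft2(G * np.fft.fft2(blurred)))
```

Because it treats every pixel the same way, a method of this kind has some deconvolution effect on any input, which is the behaviour the sanity check described above looks for. The sketch assumes periodic boundaries, which typically causes ringing near the edges of real photographs; padding or edge tapering is a common remedy.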
OCR and string matching method
All deconvolved images will be measured quantitatively by applying the Tesseract OCR through the pytesseract Python module and then comparing the result with the true text using the Levenshtein distance between the strings, as implemented in the FuzzyWuzzy Python module. The same set of parameters in pytesseract and FuzzyWuzzy will be used to assess all deconvolved images.
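The official scoring code is the downloadable package mentioned above. The following is only a self-contained sketch of the string-comparison step, written in plain Python so it runs without Tesseract installed. The 0-100 normalization (distance divided by the length of the longer string) is an assumption for illustration: FuzzyWuzzy's `fuzz.ratio` normalizes differently, so these numbers will not match the official scores exactly. In a real pipeline the first string would be the OCR output, for example from `pytesseract.image_to_string`.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr  # keep only one row of the dynamic-programming table
    return prev[-1]


def similarity_score(ocr_text: str, true_text: str) -> float:
    """Hypothetical 0-100 score; 100 means the OCR output equals the true text."""
    ocr_text, true_text = ocr_text.strip(), true_text.strip()
    longest = max(len(ocr_text), len(true_text))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(ocr_text, true_text) / longest)


print(round(similarity_score("Helsinki Deb1ur", "Helsinki Deblur"), 1))  # → 93.3
```

One misread character in a 15-character line costs about 6.7 points under this normalization, so a deblurred image that is almost, but not quite, legible still earns a high score.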
Sanity check
Deadline and what needs to be submitted:
Github repository
The teams can submit more than one deconvolution algorithm to the challenge; each algorithm must be in a separate repository.
The maximum number of algorithms equals the number of team members.
Teams do not need to register more than once if they decide to submit more than one algorithm to the challenge. (New 24.06.2021)
Your code on Github
Below are the expected formats of the main routines in Python and Matlab:
Matlab: The main function must be a callable function:
function main(inputFolder,outputFolder,categoryNbr)
% ...
% your code comes here
% ...
end
Example calling the function:
>> main('path/to/input/files', 'path/to/output/files', 3)
Python: The main routine must be callable from the command line. To achieve this, you can use sys.argv or the argparse module.
Example calling the function:
$ python3 main.py path/to/input/files path/to/output/files 3
The main routine must produce a deconvolved PNG file in the output folder for each image in the input folder.
The output PNG images must have the same dimensions as the input files and the same filename apart from the extension.
All images in the input directory belong to the same blur category, specified by the input argument.
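For Python entries, a minimal main.py skeleton that follows the requirements above might look like the sketch below. It uses argparse for the three arguments and Pillow (an assumption; any library that writes PNG files would do) for image input and output. The function `deblur` is a hypothetical placeholder that returns its input unchanged; a real entry would put its deconvolution method there and choose its parameters from the category number.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the three required arguments: input folder, output folder, blur category."""
    parser = argparse.ArgumentParser(description="HDC2021 entry point (sketch)")
    parser.add_argument("input_folder", type=Path, help="folder with the blurred images")
    parser.add_argument("output_folder", type=Path, help="folder for the deblurred PNG files")
    parser.add_argument("category", type=int, help="blur category shared by all input images")
    return parser.parse_args(argv)


def output_path_for(input_file, output_folder):
    """Same file name as the input; only the extension becomes .png."""
    return Path(output_folder) / (Path(input_file).stem + ".png")


def deblur(image, category):
    """Hypothetical placeholder for the team's own algorithm (identity for now)."""
    return image


def main(argv=None):
    args = parse_args(argv)
    from PIL import Image  # Pillow, assumed to be listed in the team's requirements

    args.output_folder.mkdir(parents=True, exist_ok=True)
    for input_file in sorted(args.input_folder.iterdir()):
        if not input_file.is_file():
            continue  # skip sub-folders
        with Image.open(input_file) as img:
            original_size = img.size
            result = deblur(img.copy(), args.category)
        # The output PNG must have the same dimensions as the input image.
        if result.size != original_size:
            raise ValueError(f"{input_file.name}: output size differs from input size")
        result.save(output_path_for(input_file, args.output_folder), format="PNG")
```

Saved as main.py and finished with the usual `if __name__ == "__main__": main()` guard, it can be called exactly as in the example above: `python3 main.py path/to/input/files path/to/output/files 3`.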
The teams are allowed to use freely available Python modules or Matlab toolboxes. Toolboxes, libraries and modules with paid licenses can also be used if the organizing committee has the license as well. For example, the most common Matlab toolboxes for image processing and deconvolution can be used (Image Processing Toolbox, Wavelet Toolbox, PDE Toolbox, Computer Vision Toolbox, Deep Learning Toolbox, Optimization Toolbox). The teams can contact us to check whether other toolboxes are available.
Finally, the competitors must make their GitHub repositories public by October 31, 2021 at the latest. In the spirit of open science, only public code can win HDC2021.