An Approach to the Estimation of Global and Local Text Skew in Historical Printed Documents

Darko Brodic, Cedomir A. Maluckov, Zoran N. Milivojevic

Abstract


Historical printed documents represent an important part of our heritage. In order to preserve their content, digitization is mandatory. As a possible result of this process, the document image can be generated with some degree of inclination, which can affect the text by creating skewed text lines. This paper proposes a multi-step approach to the estimation of the global and local text skew in historical printed documents. It is based on the analysis of the connected components created by the filled convex hulls around each text element. First, the largest connected component is determined in order to identify the initial skew rate. Accordingly, the connected component is enlarged by oriented morphological erosion implemented on the complementary image. After that, the longest enlarged component is extracted. The global text skew of the document is identified by its orientation. Then, the original document is rotated according to the obtained angle. Furthermore, text line segmentation is established by a strike-through line. In this way, connected components are created whose orientation represents the local text skew, identified by the least-squares method. The efficiency and correctness of the algorithm were examined by testing on a dataset. The results demonstrated the robustness of the algorithm.
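
The final step, reading the skew of a connected component from a least-squares fit, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `skew_angle_least_squares`, the use of NumPy, and the choice to fit a straight line through all foreground pixel coordinates of a binary mask are assumptions made for illustration only.

```python
import numpy as np

def skew_angle_least_squares(mask):
    """Estimate the orientation, in degrees, of the foreground pixels of a
    binary mask (hypothetical helper, assumed for illustration)."""
    rows, cols = np.nonzero(mask)
    # A line fit needs foreground pixels in at least two distinct columns.
    if cols.size < 2 or cols.max() == cols.min():
        raise ValueError("need foreground pixels spanning at least two columns")
    # Ordinary least-squares fit of row = slope * col + intercept.
    slope, _intercept = np.polyfit(cols, rows, 1)
    # Image rows grow downward, so the slope is negated to report a
    # counter-clockwise-positive angle.
    return float(np.degrees(np.arctan(-slope)))
```

Under these assumptions, a component that runs horizontally yields an angle near zero, and a component that descends to the right yields a negative angle. Near-vertical components are poorly served by this form of the fit, since the slope grows without bound.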

Keywords


Binarization; Image analysis; Image processing; Least-squares approximation; Optical character recognition; Text skew estimation.
