Documents that combine text, images, tables, code and other elements in complex layouts are often saved digitally in image format. Analyzing such document images and extracting useful information from them is performed with the help of machine learning. This supervised task is termed Document Image Analysis (DIA). The popular DIA tasks in practical use include:
- Document Image Classification
- Layout Detection
- Table Detection
- Scene Text Detection
- Character Recognition
Task-specific applications such as OCR (Optical Character Recognition) have been in real-world use for decades. However, a library that provides all DIA tasks in one place has become an important need for the document analysis community, including historical researchers and social science analysts. For instance, a screenshot image of an old newspaper page may contain research-relevant content in the form of tables, charts, text and photographs. An OCR reader can extract the text but cannot read the other information. Moreover, an OCR reader may fail to recognize the text layouts and mix text from different layouts in its output. Separate methods are then required to extract information from tables, charts and so on.
The evolution of deep learning-based convolutional neural networks has begun to address the need for an integrated Document Image Analysis system. However, practical implementation of recent successful deep learning models faces some challenges. High-level DIA parameters are not always explicitly processed by deep learning frameworks, which makes customization of pre-trained models difficult. Popular models are trained on a particular set of annotated document images, but documents follow no common template or format and are limited only by human creativity. A custom implementation of a popular model therefore requires collecting task-specific annotated document images, preprocessing them according to the model requirements, and fine-tuning the model with those images. Since the deep learning part and the DIA part are usually developed separately, such customized fine-tuning becomes difficult, tedious, and time-consuming.
To this end, Zejiang Shen of the Allen Institute for AI, Ruochen Zhang of Brown University, Melissa Dell and Jacob Carlson of Harvard University, Benjamin Charles Germain Lee of the University of Washington, and Weining Li of the University of Waterloo have introduced LayoutParser, a Python library for Document Image Analysis. The library ships a Model Zoo, a large collection of pre-trained deep learning models with an off-the-shelf implementation strategy, and a unified architecture for adapting any DIA model. Apart from the pre-trained models, LayoutParser provides tools for customization and fine-tuning as needed. Further, data preparation tools for tasks such as document image annotation and data preprocessing are readily available in the library. The library aims at distributing quality models and pipelines with reproducibility, reusability and extensibility through a continuously improving community platform.
LayoutParser supports the following DIA usage patterns:
- It receives document images as input, offers off-the-shelf tools for any DIA task, performs the tasks in order and yields the output.
- It receives unannotated document images. It provides tools for efficient annotation of layouts and other parts of a document image.
- It supports efficient custom training for user-specific tasks. Once trained, the model can be employed for inference.
- It offers tools for visualization and storage of data, models, weights and checkpoints.
- It provides community sharing, distribution, and documentation.
To store a layout in memory and retrieve it later, LayoutParser offers unified data structures. The three key components of the LayoutParser data structure are Coordinate, TextBlock, and Layout, and the library defines a set of operations that work uniformly on them.
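As a minimal sketch of how these three components fit together (the coordinates, text and score below are made-up values for illustration):

import layoutparser as lp

# Coordinate: a rectangular region given by its top-left and bottom-right corners
rect = lp.Rectangle(x_1=50, y_1=50, x_2=200, y_2=100)

# TextBlock: couples a coordinate with the text it contains and metadata
block = lp.TextBlock(rect, text="Sample Title", type="Title", score=0.98)

# Layout: a list-like container of blocks
layout = lp.Layout([block])

# the library-defined operations work uniformly on these structures
padded = block.pad(left=5, right=5, top=5, bottom=5)  # grow the region
titles = [b for b in layout if b.type == "Title"]     # filter by label
print(layout.get_texts())                             # ['Sample Title']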
We discuss the code implementation and two practical applications of the library in the following sections.
Layout Detection in a Document Image
Install the LayoutParser library and its dependencies; Detectron2 is installed directly from its GitHub repository.
%%bash
pip install -U layoutparser
# install detectron2
pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.4#egg=detectron2'
# install OCR module
pip install layoutparser[ocr]
Import the libraries and modules.
import layoutparser as lp
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
import cv2
Deploy a pre-trained Detectron2 model configured for layout parsing.
model = lp.Detectron2LayoutModel(
    'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',          # config from the LayoutParser model zoo
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],  # keep detections with confidence above 0.8
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}  # PubLayNet class ids to names
)
Now the model is ready for inference. Download the source files from the official repository to obtain a sample image to run inference on.
!git clone https://github.com/Layout-Parser/layout-parser.git
Output:
Change directory to read the example data.
%cd /content/layout-parser/examples/data/
!ls -p
Output:
Read ‘paper-image.jpg’ and display it.
img = cv2.imread("/content/layout-parser/examples/data/paper-image.jpg")
# convert BGR image into RGB format
image = img[..., ::-1]
# display image
plt.figure(figsize=(12, 16))
plt.imshow(image)
plt.xticks([])
plt.yticks([])
plt.show()
Output:
Predict the layouts in the above image using the pre-trained model.
layout = model.detect(image)
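The result is a Layout of TextBlock objects, so the unified data structures and operations described earlier apply to it directly. As a minimal sketch of inspecting the predictions:

# keep only the blocks predicted as body text
text_blocks = lp.Layout([b for b in layout if b.type == "Text"])

# print each block's label, bounding box and confidence score
for block in text_blocks:
    print(block.type, block.coordinates, block.score)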
Display the image with predicted layouts over it.
lp.draw_box(image, layout, box_width=3)
Output:
This Colab Notebook contains the above example code implementations.
OCR from Table Document Image
Install LayoutParser and its dependencies. In addition, install an OCR engine; here, we use the Tesseract OCR engine to recognize text and its location.
%%bash
pip install -U layoutparser
pip install layoutparser[ocr]
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
Import the necessary libraries and modules.
import layoutparser as lp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
import cv2
Load the Tesseract OCR agent, which wraps the engine installed above.
model = lp.TesseractAgent()
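With no arguments, the agent defaults to English. If the document is in another language, a Tesseract language code can be passed instead; a variant sketch, assuming the corresponding Tesseract language pack (here French) is installed:

# recognize French text instead of the default English
model = lp.TesseractAgent(languages='fra')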
Prepare the example data. Download the source files from the official repository and change directory to the example images path.
!git clone https://github.com/Layout-Parser/layout-parser.git
%cd /content/layout-parser/examples/data/
!ls -p
Read the image and display it to have an idea of how it looks.
image = cv2.imread('example-table.jpeg')
# display image
plt.figure(figsize=(12, 16))
plt.imshow(image)
plt.xticks([])
plt.yticks([])
plt.show()
Output:
Detect text with the OCR engine. Collect the recognized text along with its bounding box details for plotting and post-processing.
res = model.detect(image, return_response=True)
# collect text and its bounding boxes at the word level
# (lp.TesseractFeatureType(4) corresponds to word-level features)
ocr = model.gather_data(res, lp.TesseractFeatureType(4))
Plot the original image along with bounding boxes on recognized texts.
lp.draw_text(image, ocr, font_size=12, with_box_on_text=True, text_box_width=1)
Output:
In the output, the recognized texts are redrawn with the engine-specified font and size, and the boxes align with the original text locations. Thus the system has recognized the texts and their positions precisely. Further, we can post-process these texts in a column-wise or row-wise manner as per need.
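As a minimal sketch of such row-wise post-processing using the word coordinates gathered above (the 10-pixel row tolerance is an assumed value to be tuned per image):

# group the recognized words into rows by their vertical centres
rows = {}
for word in ocr:
    y_center = (word.coordinates[1] + word.coordinates[3]) / 2
    key = round(y_center / 10)          # words within ~10 px share a row
    rows.setdefault(key, []).append(word)

# print each row left to right, top to bottom
for key in sorted(rows):
    line = sorted(rows[key], key=lambda w: w.coordinates[0])
    print(" ".join(w.text for w in line))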
This Colab Notebook contains the above example code implementations.
Wrapping Up
In this article, we have discussed the open-source LayoutParser library, its architecture and its capabilities. Further, we walked through two practical Document Image Analysis use cases with hands-on Python code. With more models being added in the near future, LayoutParser is set to earn a prominent place in Document Image Analysis.