Document Analysis and Processing (DAP)


Initiated over 25 years ago, this research project focuses on the proposal and development of algorithms for extracting information from digital documents such as texts, images and videos. The main objective is to develop document analysis techniques applicable to a wide range of fields, including finance, law, medicine, archiving and education. Particular attention is given to images of ancient and medical documents. The research themes of this project continue the work launched at the creation of the laboratory. To this end, we will continue to use natural language processing (NLP), optical character recognition (OCR) and computer vision techniques to extract relevant information from various types of documents. We will also explore machine learning and deep learning techniques to improve algorithm accuracy and the ability to adapt to new or unseen documents. In addition, high priority has been given to the development of fast, accurate and robust approaches for the automatic image analysis of archival documents (AIDA), and more specifically to the analysis of their structure or layout, their transcription and their indexing.

Word retrieval is an important task for understanding and exploiting document content by creating indexes. It is an information retrieval technique that aims to identify all occurrences of a query word in a set of documents (e.g., a book). In the word retrieval task, the input is a set of non-indexed documents and the output is a list of words ranked according to their similarity to the query word. This enables quick and easy online access to cultural heritage documents and opens up further possibilities for studying these resources. In this project, we aim to improve word retrieval performance by developing a conditional generative model based on an adversarial network that produces clean document images from highly degraded ones. This enhancement model handles various types of degradation, such as watermarks and chemical deterioration, with the aim of producing very clean document images and improving retrieval performance on fine details.
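
The ranking step at the heart of word retrieval can be illustrated with a minimal sketch: candidates are ordered by cosine similarity between descriptors. The descriptors and function names below are hypothetical; in practice each word image would be described by a learned embedding.

```python
import numpy as np

def rank_by_similarity(query_vec, candidate_vecs):
    """Rank candidate word images by cosine similarity to the query
    descriptor; the best match comes first."""
    q = query_vec / np.linalg.norm(query_vec)
    C = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = C @ q                  # cosine similarity of each candidate
    order = np.argsort(-scores)     # indices sorted by decreasing score
    return order, scores[order]

# Toy descriptors: candidate 2 points in almost the same direction as the query.
query = np.array([1.0, 0.0, 1.0])
candidates = np.array([[0.0, 1.0, 0.0],
                       [1.0, 1.0, 0.0],
                       [2.0, 0.1, 2.0]])
order, ranked_scores = rank_by_similarity(query, candidates)
print(order[0])  # → 2
```

The output list of ranked occurrences is exactly what an index over a non-indexed collection needs: no transcription is required, only descriptor comparison.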

Transcribing archival documents, particularly Arabic manuscripts, has long remained a tedious and costly task, often carried out manually by administrative staff or archivists. This project includes three major contributions. The first is a new method based on a deep U-Net architecture adapted to identify the central part of each text line. The second contribution is a method based on a deep encoder-decoder architecture: the encoder consists mainly of five octave layers, while the decoder is a succession of five recurrent neural network layers preceded by a layer implementing the deep self-attention mechanism. The third method is also based on a deep encoder-decoder architecture, whose encoder relies mainly on the fusion of the skip-connection technique with a gated mechanism.
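
The gated fusion of a skip connection, as used in the third contribution's encoder, can be sketched with a minimal numpy example. Shapes, weights and function names here are illustrative assumptions, not the actual model: a sigmoid gate learned from the concatenated features blends the skip-connection feature map with the current feature map elementwise.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_skip_fusion(skip, features, W_g, b_g):
    """Fuse a skip-connection feature map with the current feature map
    through a learned gate: g = sigmoid([skip; features] @ W_g + b_g),
    output = g * skip + (1 - g) * features (an elementwise blend)."""
    concat = np.concatenate([skip, features], axis=-1)   # (n, 2*d)
    g = sigmoid(concat @ W_g + b_g)                      # (n, d), in (0, 1)
    return g * skip + (1.0 - g) * features

rng = np.random.default_rng(0)
d = 4
skip = rng.normal(size=(3, d))       # features carried by the skip connection
features = rng.normal(size=(3, d))   # features from the previous layer
W_g = 0.1 * rng.normal(size=(2 * d, d))  # toy, randomly initialised gate weights
b_g = np.zeros(d)
fused = gated_skip_fusion(skip, features, W_g, b_g)
print(fused.shape)  # (3, 4)
```

Because the gate lies strictly in (0, 1), each output value is a convex combination of the two inputs, letting the network decide per position how much encoder detail to pass through.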

The aim of this project is to extract information from visually rich documents (VRDs) using graph neural networks (GNNs). GNNs were chosen because they excel at capturing relationships and dependencies between components, which is particularly useful for VRDs, where elements such as text units and their bounding boxes have complex relationships. To this end, a first model has been proposed, based on a node-classification approach for extracting information from VRDs using a graph-based representation. The approach uses a weighted graph representation of VRDs, where node features combine spatial, textual and visual characteristics extracted from the VRD, and node neighbors are chosen according to a customized edge weight. The document graph is then fed into a multi-layer graph convolutional network (GCN) for node classification, which is able to focus efficiently on important neighboring nodes.
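
The propagation rule of one GCN layer can be sketched on a toy three-node document graph (a minimal illustration with made-up edge weights and identity weights; the real model stacks several layers with learned parameters):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).
    A is a weighted adjacency matrix; self-loops are added so each node
    keeps its own features in the update."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy document graph: 3 text units, edge weights from spatial proximity.
A = np.array([[0.0, 0.8, 0.0],
              [0.8, 0.0, 0.3],
              [0.0, 0.3, 0.0]])
H = np.array([[1.0, 0.0],   # node features (e.g. spatial + textual)
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)               # identity layer weights, for illustration only
H1 = gcn_layer(A, H, W)
print(H1.shape)  # (3, 2)
```

Because the adjacency is weighted, strongly connected neighbors (here the 0.8 edge) contribute more to a node's updated representation than weak ones, which is how the model focuses on important neighboring nodes.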

In this project, we propose to study adversarial attacks in situations relevant to real-world contexts by examining both sides of the issue: the ways in which an adversary can attack an implementation of a deep neural network, such as the sensors in an autonomous vehicle, and the ways in which these sensors can be hardened against adversarial attacks to ensure a reliable outcome. Recent implementations of machine learning-based applications tend to use multi-view and multi-modal data to accomplish their task, as single-view detectors show their limitations, particularly in the face of challenges such as occlusions. However, the majority of research on adversarial attacks focuses on single-view detectors, and adversarial attacks in a multi-view context remain a relatively understudied topic. One of the first aims of our work was to carry out an exploratory study into the transferability of patch-based adversarial attacks to different views of the same scene. This study showed that view angles have a significant effect on the performance of adversarial attacks, which has an impact on our objective: to propose an attack that can target a multi-view object detector. Initial findings show that for an adversarial attack to succeed, the attacker must simultaneously target multiple views. This multi-view adversarial attack will enable us to gain insight into the vulnerabilities specific to a multi-view context, leading us to our next objective: the implementation of an adversarial defense guaranteeing a highly robust deep neural network.
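
The finding that a successful attack must target several views simultaneously can be illustrated with a toy one-step, sign-gradient perturbation shared across two views. Here simple linear scorers stand in for the object detectors; all names, weights and values are illustrative assumptions.

```python
import numpy as np

def multi_view_fgsm(x, view_weights, y, eps):
    """One-step sign-gradient attack shared across several views: the
    perturbation follows the sign of the *summed* gradients, so it has to
    lower the (toy linear) detection score of every view at once rather
    than of a single view in isolation. Loss per view: -y * (w @ x)."""
    grad = np.zeros_like(x)
    for w in view_weights:
        grad += -y * w              # gradient of -y * (w @ x) w.r.t. x
    return x + eps * np.sign(grad)

x = np.array([1.0, -0.5, 2.0])               # toy input shared by both views
views = [np.array([0.5, 1.0, -1.0]),         # toy scorer for view 1
         np.array([1.0, -1.0, 0.5])]         # toy scorer for view 2
x_adv = multi_view_fgsm(x, views, y=1.0, eps=0.1)
before = [float(w @ x) for w in views]
after = [float(w @ x_adv) for w in views]
print(after[0] < before[0], after[1] < before[1])  # True True
```

A perturbation optimized for a single view would follow that view's gradient alone and could easily raise the other view's score; summing the gradients is the simplest way to encode the "attack all views at once" requirement.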

Atlas-based segmentation is a high-level segmentation technique that has become a standard paradigm for exploiting a priori knowledge in image segmentation. Some regions of the human body seen in medical imaging, such as the brain or the female pelvic region, are anatomically complex and highly variable from patient to patient, making segmentation with low-level techniques difficult. An atlas-based automatic segmentation approach using online learning has been developed. It was first applied to the segmentation of the human cerebellum from 2D brain MRI images, then to the segmentation of local regions likely to be affected by cervical cancer from 3D female pelvic MRI images. The proposed segmentation approach is based on a novel registration technique that uses a hybrid optimization procedure combining a particular genetic algorithm design with gradient descent in a multi-resolution strategy. The atlases used in this work were made available to us progressively, in sequential order; the proposed approach is therefore based on an online machine learning method both for the construction of the atlas base and for the segmentation process.
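
The genetic-algorithm side of the hybrid registration can be sketched on a toy 1-D problem: a single integer translation parameter optimized against a sum-of-squared-differences cost. This is only a minimal illustration under simplifying assumptions; the actual method works on images, refines with gradient descent and uses a multi-resolution strategy.

```python
import numpy as np

rng = np.random.default_rng(1)

def ssd(a, b):
    """Sum of squared differences, the registration cost."""
    return float(np.sum((a - b) ** 2))

def shift_signal(sig, t):
    """Shift a 1-D intensity profile by an integer t (zero-padded)."""
    out = np.zeros_like(sig)
    if t >= 0:
        out[t:] = sig[:len(sig) - t]
    else:
        out[:t] = sig[-t:]
    return out

def ga_register(moving, fixed, generations=30, t_range=10):
    """Toy genetic algorithm over one translation parameter: elitist
    selection of the best half plus +/-1 mutations, minimising SSD."""
    pop = np.arange(-t_range, t_range + 1)   # deterministic initial population
    for _ in range(generations):
        fitness = np.array([ssd(shift_signal(moving, int(t)), fixed) for t in pop])
        best = pop[np.argsort(fitness)[:len(pop) // 2]]         # selection (elitist)
        children = best + rng.integers(-1, 2, size=best.shape)  # mutation
        pop = np.concatenate([best, children])
    fitness = np.array([ssd(shift_signal(moving, int(t)), fixed) for t in pop])
    return int(pop[np.argmin(fitness)])

fixed = np.zeros(50); fixed[20:30] = 1.0    # "atlas" intensity profile
moving = np.zeros(50); moving[15:25] = 1.0  # same structure, translated
t_hat = ga_register(moving, fixed)
print(t_hat)  # → 5 (the translation that aligns moving onto fixed)
```

In the hybrid scheme, the population-based search supplies a good global initialization, after which gradient descent refines the transformation parameters locally at each resolution level.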

The aim is to develop a digital model of the thorax, with the necessary meshing, and to simulate the electrical impedance tomography (EIT) system on which image reconstruction is based. EIT scans the organ by passing an electrical current through the body, detecting it on the skin with an electrode belt and generating electrical impedance measurements. EIT is a non-invasive type of medical imaging used to detect lung infections, breast cancer and other conditions. EIT for lung monitoring is based on the repeated measurement of surface voltages resulting from a rotating injection of a high-frequency, low-intensity alternating current flowing between electrodes located around the chest. During monitoring, cyclic injections of electrical currents are performed sequentially, usually between all pairs of adjacent electrodes. Structural information from the human chest reflects the actual shape of the lungs and thorax. Several mesh sizes were studied, each leading to a specific number of nodes and elements in the mesh; the best mesh size was chosen on the basis of the smoothness of the lung contour obtained. This method was used to study the conductivity distribution within a section of the human thorax by varying the current-injection pattern.
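
The rotating adjacent-pair injection protocol can be sketched as follows. The 16-electrode belt is an assumed size for illustration; with adjacent injection and adjacent measurement, each of the N injections yields N-3 usable voltage measurements, since the three pairs touching an injecting electrode are excluded.

```python
def adjacent_injection_pairs(n_electrodes):
    """Cyclic adjacent-pair current-injection pattern for an EIT electrode
    belt: current is injected between each pair of neighbouring electrodes
    in turn, rotating around the chest."""
    return [(i, (i + 1) % n_electrodes) for i in range(n_electrodes)]

def measurement_pairs(inject, n_electrodes):
    """Adjacent electrode pairs on which voltage is measured for one
    injection; pairs touching an injecting electrode are excluded."""
    a, b = inject
    return [(i, (i + 1) % n_electrodes) for i in range(n_electrodes)
            if a not in (i, (i + 1) % n_electrodes)
            and b not in (i, (i + 1) % n_electrodes)]

pairs = adjacent_injection_pairs(16)                  # one full rotation
n_meas = sum(len(measurement_pairs(p, 16)) for p in pairs)
print(len(pairs))   # 16 injections per cycle
print(n_meas)       # 16 * (16 - 3) = 208 voltage measurements per frame
```

These measurement counts are what fixes the size of the inverse problem that the mesh-based reconstruction must solve.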

The main aim of the work carried out in this project is to implement a histological image analysis system capable of meeting all the constraints and difficulties of analyzing very large scenes in the field of medicine. The approach, based on incremental phenotyping, will make it possible to analyze a whole-slide image (WSI) and obtain precise, targeted information. The system to be developed will support clinicians in the analysis and diagnosis of histological images. It will assist medical experts in the morphological recognition of different cell types, enabling quantification and morphometric analysis (e.g., dimensions, circularity, texture) from digitized histopathology slides, with the aim of obtaining reliable quantitative data for correlation studies with the various questions raised by clinicians (diagnostic, prognostic and predictive). The system will be integrated as an essential assistant into the automated professional workflows handled by clinicians.
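
As an example of the morphometric measures mentioned, circularity is commonly computed as 4πA/P², equal to 1 for a perfect disc and smaller for elongated or irregular shapes. This is a minimal sketch; in the real system, area and perimeter would be measured on segmented cells.

```python
import math

def circularity(area, perimeter):
    """Shape circularity 4*pi*A / P^2: 1.0 for a perfect disc, smaller for
    elongated or irregular cells."""
    return 4.0 * math.pi * area / perimeter ** 2

# A disc of radius r (area pi*r^2, perimeter 2*pi*r) has circularity 1.
r = 5.0
disc = circularity(math.pi * r ** 2, 2.0 * math.pi * r)
# A unit square (area 1, perimeter 4) is less circular: pi/4 ~ 0.785.
square = circularity(1.0, 4.0)
print(round(disc, 6), round(square, 6))
```

Such scalar descriptors, computed per cell across a whole slide, are what makes the correlation studies with clinical questions quantitative and reproducible.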