Doctoral dissertation: Enhancing the performance of mathematical expression detection in document images

This doctoral dissertation investigates how to improve the detection of mathematical expressions in document images, develops new methods, and evaluates how effectively they apply in the mathematics domain.

University

Hanoi University of Science and Technology

Major

Computer Science

Type

doctoral dissertation

2021

154

Detailed table of contents

DECLARATION OF AUTHORSHIP

ACKNOWLEDGEMENT

ABSTRACT

1. CONTENT

1.1. DECLARATION OF AUTHORSHIP

1.2. LIST OF TABLES

1.3. LIST OF FIGURES

1.4. Objectives of the thesis

1.5. Introduction of the ME detection and recognition

1.5.1. Introduction of MEs

1.5.2. Introduction of ME detection

1.5.3. Introduction of ME recognition

1.6. Contributions of this thesis

1.7. Structure of this thesis

1.8. ME detection methods in document images

1.8.1. Rule-based detection

1.8.2. Handcrafted feature extraction methods for the ME detection

1.8.3. Deep neural network for ME detection

1.8.3.1. Deep neural networks

1.8.3.2. Deep neural network models for ME detection

1.8.4. Traditional approaches for ME recognition

1.8.5. Neural network approaches for ME recognition

1.8.6. Datasets and evaluation metrics

1.8.7. Existing systems for ME recognition

1.8.8. Summary of the chapter

2. THE DETECTION OF MEs USING THE LATE FUSION OF HANDCRAFTED AND DEEP LEARNING FEATURES

2.1. Overview of the proposed method

2.2. Handcrafted feature extraction for ME detection

2.2.1. Handcrafted feature extraction for isolated ME detection

2.2.2. Handcrafted feature extraction for inline ME detection

2.3. Deep learning method for ME detection

2.4. Late fusion of handcrafted and deep learning features for ME detection

2.5. Post-processing for ME detection

2.6. Performance evaluation of the detection of MEs using different machine learning algorithms

2.7. Performance evaluation of the detection of MEs using the fusion of handcrafted and deep learning features with different operations

2.8. Performance evaluation of the detection of isolated and inline MEs on different public datasets

2.9. Evaluation of the impact of image resolution on the ME detection

2.10. Evaluation of the impact of the post-processing

2.11. Visualization of extracted features of images using the handcrafted and deep learning feature approaches

2.12. Error analysis and discussion

2.13. Measurement of execution time

2.14. Summary of the chapter

3. THE DETECTION OF MEs USING THE COMBINATION OF THE DISTANCE TRANSFORM AND FASTER R-CNN

3.1. Overview of the proposed method for ME detection using the DT and the Faster R-CNN

3.2. The detection of MEs using the DT and the Faster R-CNN

3.2.1. Distance transform of document image

3.2.2. ME detection using a Faster R-CNN

3.2.3. Region proposal network

3.2.4. Fully connected detection network

3.3. Loss function of the training Faster R-CNN

3.3.1. Loss function of the training process of Faster R-CNN

3.4. Evaluation of the impact of the DT and anchor box generation to the performance of the ME detection

3.5. Comparison of Faster R-CNN models in ME detection

3.6. Comparison of the proposed and state-of-the-art methods used in ME detection

3.7. Performance comparison of the proposed method on cross datasets

3.8. Illustration of feature extraction of the Resnet-50

3.9. Error analysis and discussion

3.10. Measurement of execution time

3.11. Summary of the chapter

4. THE DETECTION AND RECOGNITION OF MEs IN DOCUMENT IMAGES

4.1. Overview of the proposed system for the detection and recognition of MEs

4.2. ME recognition using the WAP network

4.2.1. Watcher module of the WAP network

4.2.2. Parser module of the WAP network

4.3. Training the WAP network

4.4. Performance evaluation of the detection and recognition of MEs

4.5. Error analysis and discussion

4.6. Measurement of execution time

4.7. Summary of the chapter

ABBREVIATIONS

LIST OF TABLES

LIST OF FIGURES

Summary

I. Introduction to mathematical expression detection

Mathematical expressions (MEs) play an important role in scientific documents. Detecting and recognizing MEs in document images is an essential step in document digitization. Detection locates the expressions within a document, while recognition converts them from image format into a string. MEs fall into two categories: isolated and inline. An isolated expression is displayed on a separate line, whereas an inline expression is mixed with other components. The accuracy of isolated expression detection has gradually improved, but detecting inline expressions remains a major challenge. Detection accuracy directly affects recognition accuracy: an incorrect detection leads to errors in ME recognition.

1.1. The importance of mathematical expressions in scientific documents

Mathematical expressions are an indispensable part of scientific documents because they express complex concepts clearly and precisely. Digitizing these documents not only preserves knowledge but also makes research and study easier. Demand for detecting and recognizing MEs in documents keeps growing, especially as document digitization accelerates. Technologies such as OCR and image recognition have been applied to improve the detection of mathematical expressions.

II. Methods for detecting mathematical expressions

The study presents three main contributions to the detection and recognition of MEs in scientific document images. The first is a two-stage hybrid method for detecting MEs effectively. The first stage analyzes the layout of the whole document image to improve the accuracy of text line and word segmentation. The second stage detects both isolated and inline MEs in the document image. Features are extracted with both handcrafted and deep learning approaches to improve detection accuracy. The handcrafted approach applies the Fast Fourier Transform (FFT) to text line images to detect isolated MEs, while the Gaussian parameters of the projection profile are used to detect inline MEs.
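The handcrafted features named above can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes a binary image stored as a list of rows (1 for ink, 0 for background), takes the "Gaussian parameters" to be the mean and standard deviation of the projection profile treated as a histogram, and applies a naive discrete Fourier transform to the profile instead of a full FFT over the image. All function names are hypothetical.

```python
import cmath
import math


def column_profile(image):
    """Projection profile: number of ink pixels (value 1) in each column."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def gaussian_parameters(profile):
    """Mean and standard deviation of the profile, treated as a histogram
    over pixel positions (an assumed reading of 'Gaussian parameters')."""
    total = sum(profile)
    if total == 0:
        return 0.0, 0.0
    mean = sum(i * v for i, v in enumerate(profile)) / total
    variance = sum(v * (i - mean) ** 2 for i, v in enumerate(profile)) / total
    return mean, math.sqrt(variance)


def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform magnitudes.
    An FFT computes the same values faster."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n)
    ]


# Toy 3x4 "word" image, for illustration only.
word = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
profile = column_profile(word)
print(profile)
print(gaussian_parameters(profile))
print(dft_magnitudes(profile))
```

Feature vectors of this kind would then be passed to a classifier; the choice of classifier is outside the scope of this sketch.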

2.1. The hybrid method for ME detection

The hybrid method combines handcrafted and deep learning features to maximize detection accuracy. Convolutional neural networks (CNNs) such as AlexNet and ResNet have been optimized for ME detection. Fusing the handcrafted and deep learning features on the basis of their prediction scores proved highly effective. The system operates directly on ME images, with no character recognition step, which improves detection efficiency.
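Score-level (late) fusion can be sketched as follows. This is a minimal illustration, assuming each branch outputs a probability that a region is an ME; the three operations (mean, max, product), the 0.5 threshold, and the function names are assumptions chosen for the example and are not taken from the dissertation.

```python
def fuse_scores(handcrafted, deep, operation="mean"):
    """Combine two lists of ME probabilities element-wise."""
    operations = {
        "mean": lambda h, d: (h + d) / 2,
        "max": max,
        "product": lambda h, d: h * d,
    }
    combine = operations[operation]
    return [combine(h, d) for h, d in zip(handcrafted, deep)]


def label_regions(scores, threshold=0.5):
    """Label a region 'ME' when its fused score reaches the threshold."""
    return ["ME" if s >= threshold else "text" for s in scores]


# Hypothetical scores for three candidate regions.
handcrafted_scores = [0.9, 0.2, 0.6]
deep_scores = [0.7, 0.4, 0.3]

for op in ("mean", "max", "product"):
    fused = fuse_scores(handcrafted_scores, deep_scores, op)
    print(op, label_regions(fused))
```

With these toy scores the third region is labelled differently under "max" than under "mean", which shows why the choice of fusion operation is worth evaluating.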

III. A system for detecting and recognizing mathematical expressions

The proposed system integrates ME detection and recognition in document images. MEs in a document image are detected and recognized, and the recognition results are represented in LaTeX. The application is intended to let end users apply ME detection and recognition to document images conveniently. The integration not only improves accuracy but also gives users a seamless workflow. The system was tested on two public datasets (Marmot and GTDB), with reported accuracies of 92.90% and 91% for isolated and inline expressions, respectively. A performance comparison with conventional methods shows the effectiveness of the proposed method.
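This page reports accuracies without saying how a single detection is scored. The abbreviation list in the excerpt includes IoU (Intersection over Union), a common criterion in object detection, so the sketch below shows how a predicted box could be compared with a ground-truth box. The box format `(x1, y1, x2, y2)`, the function names, and the 0.5 threshold are assumptions, not values taken from the dissertation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - intersection)
    return intersection / union if union > 0 else 0.0


def is_correct_detection(predicted, ground_truth, threshold=0.5):
    """Count a prediction as correct when its IoU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold


# Hypothetical boxes: the prediction covers most of the ground truth.
print(iou((0, 0, 10, 10), (1, 0, 10, 10)))
print(is_correct_detection((0, 0, 10, 10), (1, 0, 10, 10)))
```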

3.1. Integrating detection and recognition

Integrating ME detection and recognition in a single system brings several benefits. The system not only detects MEs accurately but also converts them into a usable format. This matters for digitizing scientific documents, where fast access to information is essential. The system has been shown to work effectively in practice, helping users find and use mathematical information easily.

25/01/2025

Excerpt from the document

MINISTRY OF EDUCATION AND TRAINING

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

BUI HAI PHONG

ENHANCING PERFORMANCE OF MATHEMATICAL EXPRESSION DETECTION IN SCIENTIFIC DOCUMENT IMAGES

Major: Computer Science

Code: 9480101

DOCTORAL DISSERTATION IN COMPUTER SCIENCE

SUPERVISORS: 1. Hoang Manh Thang 2. Le Thi Lan

Hanoi, 2021

DECLARATION OF AUTHORSHIP

I, Bui Hai Phong, declare that the thesis titled "Enhancing performance of mathematical expression detection in scientific document images" has been entirely composed by myself. I assure some points as follows:

This work was done wholly or mainly while in candidature for a Ph.D. research degree at Hanoi University of Science and Technology.

The work has not been submitted for any other degree or qualifications at Hanoi University of Science and Technology or any other institutions.

Appropriate acknowledgement has been given within this thesis where reference has been made to the published work of others.

The thesis submitted is my own, except where work in collaboration has been included. The collaborative contributions have been clearly indicated.

Hanoi, September 2021

PhD Student

SUPERVISORS: 1. Hoang Manh Thang 2. Le Thi Lan

ACKNOWLEDGEMENT

I decided to pursue a PhD in Computer Science at MICA International Research Institute, Hanoi University of Science and Technology (HUST) in 2017. It has been one of the best decisions I could have made. HUST is a really special place where I have accumulated immense knowledge. I would like to thank the Executive Board and all members of MICA Research Institute, HUST for their kind support during the PhD.

I wish to express my deepest gratitude to my supervisors, Assoc. Prof. Hoang Manh Thang and Assoc. Prof. Le Thi Lan, for their continuous instruction, advice and support in the PhD course. The thesis could not have been completed without the specific direction of my supervisors.

I wish to thank all members of the Computer Vision Department, MICA Research Institute, HUST for their frequent support during the PhD. I wish to thank the Executive Board and all members of the School of Graduate Education, the School of Electronics and Telecommunications, and the School of Information and Communication Technology, HUST for their specific comments and suggestions on the thesis. I wish to thank all members of the Faculty of Information Technology, Hanoi Architectural University for their support in my professional work during the completion of the PhD. I wish to thank Professor Akiko Aizawa and the members of Aizawa Laboratory, National Institute of Informatics, Tokyo, Japan, where I gained much scientific experience during the internship of the PhD.

I wish to thank the anonymous reviewers for their valuable comments during the completion of the PhD. I gratefully acknowledge the funding from SAHEP HUST project number T2020-SAHEP-008 and the Domestic Master/PhD Scholarship Programme of Vingroup Innovation Foundation 2019-2021. I wish to express my sincere gratitude to my family and friends for their continuous support and encouragement in the completion of the PhD.

ABSTRACT

Mathematical expressions (MEs) play an important role in scientific documents, and a huge number of scientific documents have been produced over the years.

Therefore, the demand for document digitization for research and study purposes has continuously increased. Detection and recognition of MEs in documents are considered essential steps for document digitization. The detection of expressions aims to locate the position of expressions within documents, while the recognition of MEs aims at converting expressions from image format to string.

In documents, mathematical expressions are classified into two categories: isolated (displayed) and inline (embedded) expressions. An isolated expression is displayed on a separate line, whereas an inline expression is mixed with other components (text). Mathematical expressions may consist of mathematical operators (e.g. …). Large expressions may consist of multiple text lines, while small expressions may consist of one character. The accuracy of the detection of isolated expressions has been gradually improved. However, the detection of inline expressions is still considered a challenging task.

In practice, the detection and recognition of MEs in document images are closely related. Accurate detection makes accurate recognition possible; in contrast, incorrect detection may cause errors in the recognition of MEs.

This thesis presents three main contributions in the detection and recognition of MEs in scientific document images:

(1) First, a hybrid method of two stages has been proposed for the effective detection of MEs. In the first stage, layout analysis of entire document images is introduced to improve the accuracy of text line and word segmentation. In the second stage, both isolated and inline MEs in document images are detected. Both handcrafted and deep learning features are extensively investigated and combined to improve the detection accuracy. In the handcrafted feature extraction approach, the Fast Fourier Transform (FFT) is applied to text line images for the detection of isolated MEs, and the Gaussian parameters of the projection profile are used as features for the detection of inline MEs. After the feature extraction, various machine learning classifiers have been fine-tuned for the detection. In the deep learning approach, the CNNs (AlexNet and ResNet) have been optimized for the detection of MEs. The handcrafted and deep learning features are then fused on the basis of their prediction scores. The merit of the method is that it can operate directly on the ME images without employing character recognition.

(2) Second, an end-to-end framework for mathematical expression detection in scientific document images is proposed that, unlike conventional methods, does not use any Optical Character Recognition (OCR) or document analysis techniques.

The distance transform is first applied to the input document images in order to take advantage of the distinctive spatial layout features of MEs. The transformed images are then fed into a Faster Region-based Convolutional Neural Network (Faster R-CNN) that has been optimized to improve detection accuracy. Specifically, strategies for optimizing and generating the anchor boxes of the Region Proposal Network have been proposed to improve the detection of expressions of various sizes. The proposed methods for the detection of MEs have been tested on two public datasets (Marmot and GTDB).
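To make the first step concrete, here is a minimal sketch of a distance transform on a binary page. It is an illustration under stated assumptions, not the thesis's implementation: the image is a list of rows with 1 for ink and 0 for background, the metric is the city-block (L1) distance computed with the classic two-pass scan, and each pixel receives its distance to the nearest ink pixel. The excerpt does not say which metric or foreground convention the thesis uses.

```python
def distance_transform(binary):
    """City-block distance from every pixel to the nearest ink pixel (value 1),
    computed with a forward and a backward raster scan."""
    height, width = len(binary), len(binary[0])
    far = height + width  # larger than any possible L1 distance in the image
    dist = [[0 if binary[y][x] else far for x in range(width)]
            for y in range(height)]

    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(height):
        for x in range(width):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in reversed(range(height)):
        for x in reversed(range(width)):
            if y < height - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < width - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


# Toy page with a single ink pixel in the centre.
page = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for row in distance_transform(page):
    print(row)
```

The resulting map encodes how far each background pixel lies from the nearest stroke, so widely spaced layouts produce larger values than dense text. An image built from such a map is what would be passed to the detector in place of the raw page.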

The obtained accuracies of isolated and inline expressions in the Marmot dataset are 92.90% while those in the GTDB dataset are 91…. The performance comparison with conventional methods shows the effectiveness of the proposed method.

(3) Finally, the detection and recognition of MEs have been integrated in a system. The MEs in document images have been detected and recognized.

The recognition results are represented in LaTeX. The application aims to support end users to use the detection and recognition of MEs in document images conveniently.

ABBREVIATIONS

No.  Abbreviation   Meaning
1    CNN            Convolutional Neural Network
2    DT             Distance Transform
3    ExpRate        Expression Error Rate
4    FFT            Fast Fourier Transform
5    Faster R-CNN   Faster Region-based Convolutional Neural Network
6    GRU            Gated Recurrent Unit
7    HOG            Histogram of Oriented Gradients
8    HPP            Horizontal Projection Profile
9    IoU            Intersection over Union
10   kNN            k-Nearest Neighbour
11   LSTM           Long Short-Term Memory
12   Mask R-CNN     Mask Region-based Convolutional Neural Network
13   ME             Mathematical Expression
14   OCR            Optical Character Recognition
15   ResNet         Residual Neural Network
16   RF             Random Forest
17   RNN            Recurrent Neural Network
18   ROIs           Regions of Interest
19   RPN            Region Proposal Network
20   SSD            Single Shot Detector
21   SVM            Support Vector Machine
22   t-SNE          t-Distributed Stochastic Neighbor Embedding
23   VPP            Vertical Projection Profile
24   WAP            Watcher Attend Parser Neural Network
25   WER            Word Error Rate
26   YOLO           You Only Look Once

LIST OF TABLES

1.1 Results of document analysis of participating methods in competition 2019
1.2 Summary of significant handcrafted features for isolated ME detection
1.3 Summary of significant handcrafted features for inline ME detection
1.4 Milestones in the development of DNNs
1.5 Parameters of AlexNet
1.6 Parameters of ResNet-18
1.7 Statistics of the Marmot and GTDB datasets
2.1 Features of VPP of variable and word images in Figure 2.…
2.2 Comparison of VPP features between italic and non-italic styles of character "a" of Arial font
2.3 AlexNet architecture and layer parameters
2.4 ResNet-18 architecture and layer parameters
2.5 Performance comparison on isolated expression detection on the Marmot dataset using different machine learning algorithms (highest scores are in bold)
2.6 Performance comparison on inline expression detection on the Marmot dataset using different machine learning algorithms (highest scores are in bold)
2.7 Performance comparison on isolated expression detection on the Marmot dataset using different fusion techniques (highest scores are in bold)
2.8 Performance comparison on inline expression detection on the Marmot dataset using different fusion techniques (highest scores are in bold)

The doctoral dissertation titled "Enhancing the performance of mathematical expression detection in document images" by Bui Hai Phong, supervised by Assoc. Prof. Hoang Manh Thang and Assoc. Prof. Le Thi Lan, was completed at Hanoi University of Science and Technology in 2021. The dissertation focuses on improving the ability to identify mathematical expressions in document images, an important problem in image processing and artificial intelligence. The improvements to the detection method not only raise accuracy but also open up practical applications in education and scientific research.

To broaden your knowledge of related topics, you can consult the following documents:

Thesis on unbounded linear operators - A study of linear operators, related to mathematical methods in data processing.
Doctoral dissertation on nonconvex optimization problems and applications of algorithms - This document gives an in-depth view of optimization algorithms that may be applicable to mathematical expression detection.
Doctoral dissertation on laws of large numbers for multidimensional and triangular arrays of multivalued random variables - This study may add knowledge of statistical and probabilistic methods that support the development of models for recognizing mathematical expressions.

These documents will give you a more complete view of methods and applications in mathematics and information technology.

#artificial intelligence

#performance enhancement

#image processing

#mathematical expression detection

#document images

#mathematical symbol recognition

Topics

Image recognition technology

Applications of artificial intelligence in education

Algorithm development in mathematics

Image data processing and analysis