Hanoi University of Science and Technology
School of Information and Communication Technology

Master Thesis in Data Science

Unified Deep Neural Networks for Anatomical Site Classification and Lesion Segmentation for Upper Gastrointestinal Endoscopy

NGUYEN DUY MANH
manh.nd202657mQ@sis.vn

Supervisor: Dr. Tran Vinh Duc

Hanoi 10-2022

Author’s Declaration

I hereby declare that I am the sole author of this thesis. The results in this work are not complete copies of any other works.

STUDENT
Nguyen Duy Manh

Contents

Abstract
List of Figures
List of Tables
List of Acronyms
1 Introduction
    1.1 General introduction
    1.2 Objectives
    1.4 Outline of the thesis
2 Artificial Intelligence and Machine Learning
    Basic concepts
    Types of learning
    Reinforcement learning
    Deep Learning and Neural Networks
    Recurrent Neural Network
    Deep Convolutional Network
    Training a Neural Network
    Convolutional Neural Network
    The convolution operation
    Activation function
    Pooling
    Fully convolutional network
    Some common convolutional network architectures
    Transformers for Vision
    Multi-task learning
    Transfer learning
    Avoid overfitting
3 Methodology
    3.1 EndoUNet
        Overall architecture
        Encoder
        Compact generalized non-local module
        Squeeze and excitation module
        Feature-aligned pyramid network
    3.3 Metrics and loss functions
    3.4 Multi-task training
4 Experiments
    4.1 Datasets
    4.2 Data preprocessing and data augmentation
    4.3 Implementation details
    4.4 Experimental results
Conclusion and future work
References

Abstract

Image processing is a subfield of computer vision concerned with comprehending and extracting data from digital images. It has applications in many fields, including face recognition, optical character recognition, manufacturing automation inspection, medical diagnostics, and tasks connected to autonomous vehicles, such as pedestrian detection. In recent years, deep neural networks have become one of the most popular image processing approaches due to a number of significant advancements. The use of machine learning in biomedical applications can be structured into three main orientations: (1) computer-aided diagnosis, which helps physicians reach an efficient and early diagnosis with better harmonization and fewer contradictory diagnoses; (2) enhancing the medical care of patients with better-personalized therapies; and (3) improving human wellbeing, for example by analyzing the spread of disease and social behaviors in relation to environmental factors [1]. In this work, I propose models for the first orientation that are capable of handling multiple simultaneous tasks pertaining to the upper gastrointestinal (GI) tract. The models were evaluated on a dataset of 11469 endoscopic images and produced relatively positive results.

List of Acronyms

GI    Gastrointestinal
HP    Helicobacter Pylori
AI    Artificial Intelligence
ML    Machine Learning
DL    Deep Learning
NN    Neural Network
DNN   Deep Neural Network
CNN   Convolutional Neural Network
RNN   Recurrent Neural Network
MTL   Multi-task Learning
RL    Reinforcement Learning

List of Tables

3.1 Detailed settings of MiT-B2 and MiT-B3
4.1 Number of images in each anatomical site and lighting mode
4.2 Accuracy comparison on the three classification tasks
4.3 Dice Score comparison on the segmentation task
4.4 Number of parameters and speed of models

List of Figures

2.1 Reinforcement learning components
2.2 Relationship between AI, ML, and DL
2.4 Illustration of a deep learning model [2]
2.6 Architecture of a CNN
2.7 Example of convolution operation
2.8 Sparse connectivity, viewed from below
2.9 Sparse connectivity, viewed from above
2.10 Common activation functions [5]
2.11 Max pooling
2.13 Architecture of an FCN [6]
2.14 Architecture of VGG16
2.16 DenseNet architecture vs ResNet architecture [9]
2.18 Attention in Neural Machine Translation
2.19 The Transformer - model architecture [11]
2.20 Vision Transformer architecture [12]
2.21 Common form of multi-task learning [2]
2.22 The traditional supervised learning setup
3.1 Architecture of EndoUNet
3.2 VGG19-based shared block
3.3 ResNet50-based shared block
3.4 DenseNet121-based shared block
3.5 EndoUNet decoder configuration
3.7 Grouped compact generalized non-local (CGNL) module [13]
3.8 A Squeeze-and-Excitation block
3.9 Overview comparison between FPN and FaPN [15]
3.10 Feature alignment module [15]
3.11 Feature selection module [15]
4.1 Demonstration of upper GI
4.2 Some samples in anatomical dataset
4.3 Some samples in lesion dataset
4.4 Some samples in HP dataset
4.6 Learning rate in training phase
4.7 EndoUNet - Confusion matrix on anatomical site classification task
4.8 SFMNet - Confusion matrix on anatomical site classification task on a fold
4.9 Confusion matrices on lesion classification task on a fold
4.10 Some examples of the lesion segmentation task