Enhancing licence plate recognition for a robust vehicle re-identification system
- Authors: Boby, Alden
- Date: 2024-10-11
- Subjects: Uncatalogued
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/464322 , vital:76501
- Description: Vehicle security is a growing concern for citizens of South Africa. Law enforcement relies on reports and security camera footage for vehicle identification but struggles to match the increasing number of carjacking incidents and low vehicle recovery rates. Security camera footage offers an accessible means to identify stolen vehicles, yet it often poses hurdles like anamorphic plates and low resolution. Furthermore, depending on human operators proves inefficient, requiring faster processes to improve vehicle recovery rates and trust in law enforcement. The integration of deep learning has revolutionised object detection algorithms, increasing the popularity of vehicle tracking for security purposes. This thesis investigates advanced deep-learning methods for a comprehensive vehicle search and re-identification system. It enhances YOLOv7’s algorithmic capabilities and employs preprocessing techniques like super-resolution and perspective correction via the Improved Warped Planar Object Detection network for more effective licence plate optical character recognition. Key contributions include a specifically annotated dataset for training object detection models, an optical character recognition model based on YOLOv7, and a method for identifying vehicles in unrestricted data. The system detected rectangular and square licence plates without prior shape knowledge, achieving a 98.7% character recognition rate compared to 95.31% in related work. Moreover, it outperformed traditional optical character recognition by 28.25% and deep-learning EasyOCR by 14.18%. Its potential applications in law enforcement, traffic management, and parking systems can improve surveillance and security through automation. , Thesis (MSc) -- Faculty of Science, Computer Science, 2024
- Full Text:
- Date Issued: 2024-10-11
A Practical Use for AI-Generated Images
- Boby, Alden, Brown, Dane L, Connan, James
- Authors: Boby, Alden , Brown, Dane L , Connan, James
- Date: 2023
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/463345 , vital:76401 , xlink:href="https://link.springer.com/chapter/10.1007/978-3-031-43838-7_12"
- Description: Collecting data for research can be costly and time-consuming, and available methods to speed up the process are limited. This research paper compares real data and AI-generated images for training an object detection model. The study aimed to assess how the utilisation of AI-generated images influences the performance of an object detection model. The study used a popular object detection model, YOLO, and trained it on a dataset with real car images as well as a synthetic dataset generated with a state-of-the-art diffusion model. The results showed that while the model trained on real data performed better on real-world images, the model trained on AI-generated images, in some cases, showed improved performance on certain images and was good enough to function as a licence plate detector on its own. The study highlights the potential of using AI-generated images for data augmentation in object detection models and sheds light on the trade-off between real and synthetic data in the training process. The findings of this study can inform future research in object detection and help practitioners make informed decisions when choosing between real and synthetic data for training object detection models.
- Full Text:
- Date Issued: 2023
Enabling Vehicle Search Through Robust Licence Plate Detection
- Boby, Alden, Brown, Dane L, Connan, James, Marais, Marc, Kuhlane, Luxolo L
- Authors: Boby, Alden , Brown, Dane L , Connan, James , Marais, Marc , Kuhlane, Luxolo L
- Date: 2023
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/463372 , vital:76403 , xlink:href="https://ieeexplore.ieee.org/abstract/document/10220508"
- Description: Licence plate recognition has many practical applications for security and surveillance. This paper presents a robust licence plate detection system that uses string-matching algorithms to identify a vehicle in data. Object detection models have had limited application in the character recognition domain. The system utilises the YOLO object detection model to perform character recognition to ensure more accurate character predictions. The model incorporates super-resolution techniques to enhance the quality of licence plate images to increase character recognition accuracy. The proposed system can accurately detect licence plates in diverse conditions and can handle licence plates with varying fonts and backgrounds. The system's effectiveness is demonstrated through experimentation on components of the system, showing promising licence plate detection and character recognition accuracy. The overall system works with all the components to track vehicles by matching a target string with detected licence plates in a scene. The system has potential applications in law enforcement, traffic management, and parking systems and can significantly advance surveillance and security through automation.
- Full Text:
- Date Issued: 2023
Exploring the Incremental Improvements of YOLOv5 on Tracking and Identifying Great White Sharks in Cape Town
- Kuhlane, Luxolo L, Brown, Dane L, Boby, Alden
- Authors: Kuhlane, Luxolo L , Brown, Dane L , Boby, Alden
- Date: 2023
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/464107 , vital:76476 , xlink:href="https://link.springer.com/chapter/10.1007/978-3-031-37963-5_98"
- Description: The information on great white sharks is used by scientists to help better understand the marine organisms and to mitigate any chances of extinction of great white sharks. Sharks play a very important role in the ocean, and this role is under-appreciated by the general public, which results in negative attitudes towards sharks. The tracking and identification of sharks are done using manual labour, which is time-consuming and not very accurate. This paper uses a deep learning approach to help identify and track great white sharks in Cape Town. A popular object detecting system used in this paper is YOLO, which is implemented to help identify the great white shark. In conjunction with YOLO, the paper also uses ESRGAN to help upscale low-quality images from the datasets into higher-quality images before being put into the YOLO system. The main focus of this paper is to help train the system; this includes training the system to identify great white sharks in difficult conditions such as murky water or unclear deep-sea conditions.
- Full Text:
- Date Issued: 2023
Spatiotemporal Convolutions and Video Vision Transformers for Signer-Independent Sign Language Recognition
- Marais, Marc, Brown, Dane L, Connan, James, Boby, Alden
- Authors: Marais, Marc , Brown, Dane L , Connan, James , Boby, Alden
- Date: 2023
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/463478 , vital:76412 , xlink:href="https://ieeexplore.ieee.org/abstract/document/10220534"
- Description: Sign language is a vital tool of communication for individuals who are deaf or hard of hearing. Sign language recognition (SLR) technology can assist in bridging the communication gap between deaf and hearing individuals. However, existing SLR systems are typically signer-dependent, requiring training data from the specific signer for accurate recognition. This presents a significant challenge for practical use, as collecting data from every possible signer is not feasible. This research focuses on developing a signer-independent isolated SLR system to address this challenge. The system implements two model variants on the signer-independent datasets: an R(2+1)D spatiotemporal convolutional block and a Video Vision transformer. These models learn to extract features from raw sign language videos from the LSA64 dataset and classify signs without needing handcrafted features, explicit segmentation or pose estimation. Overall, the R(2+1)D model architecture significantly outperformed the ViViT architecture for signer-independent SLR on the LSA64 dataset. The R(2+1)D model achieved a near-perfect accuracy of 99.53% on the unseen test set, with the ViViT model yielding an accuracy of 72.19%. This shows that spatiotemporal convolutions are effective at signer-independent SLR.
- Full Text:
- Date Issued: 2023
An evaluation of hand-based algorithms for sign language recognition
- Marais, Marc, Brown, Dane L, Connan, James, Boby, Alden
- Authors: Marais, Marc , Brown, Dane L , Connan, James , Boby, Alden
- Date: 2022
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/465124 , vital:76575 , xlink:href="https://ieeexplore.ieee.org/abstract/document/9856310"
- Description: Sign language recognition is an evolving research field in computer vision, assisting communication between hearing disabled people. Hand gestures contain the majority of the information when signing. Focusing on feature extraction methods to obtain the information stored in hand data in sign language recognition may improve classification accuracy. Pose estimation is a popular method for extracting body and hand landmarks. We implement and compare different feature extraction and segmentation algorithms, focusing on the hands only on the LSA64 dataset. To extract hand landmark coordinates, MediaPipe Holistic is implemented on the sign images. Classification is performed using popular CNN architectures, namely ResNet and a Pruned VGG network. A separate 1D-CNN is utilised to classify hand landmark coordinates extracted using MediaPipe. The best performance was achieved on the unprocessed raw images using a Pruned VGG network with an accuracy of 95.50%. However, the more computationally efficient model using the hand landmark data and 1D-CNN for classification achieved an accuracy of 94.91%.
- Full Text:
- Date Issued: 2022
Deep Learning Approach to Image Deblurring and Image Super-Resolution using DeblurGAN and SRGAN
- Kuhlane, Luxolo L, Brown, Dane L, Connan, James, Boby, Alden, Marais, Marc
- Authors: Kuhlane, Luxolo L , Brown, Dane L , Connan, James , Boby, Alden , Marais, Marc
- Date: 2022
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/465157 , vital:76578 , xlink:href="https://www.researchgate.net/profile/Luxolo-Kuhlane/publication/363257796_Deep_Learning_Approach_to_Image_Deblurring_and_Image_Super-Resolution_using_DeblurGAN_and_SRGAN/links/6313b5a01ddd44702131b3df/Deep-Learning-Approach-to-Image-Deblurring-and-Image-Super-Resolution-using-DeblurGAN-and-SRGAN.pdf"
- Description: Deblurring is the task of restoring a blurred image to a sharp one, retrieving the information lost due to the blur of an image. Image deblurring and super-resolution, as representative image restoration problems, have been studied for a decade. Due to their wide range of applications, numerous techniques have been proposed to tackle these problems, inspiring innovations for better performance. Deep learning has become a robust framework for many image processing tasks, including restoration. In particular, generative adversarial networks (GANs), proposed by [1], have demonstrated remarkable performances in generating plausible images. However, training GANs for image restoration is a non-trivial task. This research investigates optimization schemes for GANs that improve image quality by providing meaningful training objective functions. In this paper we use a DeblurGAN and Super-Resolution Generative Adversarial Network (SRGAN) on the chosen dataset.
- Full Text:
- Date Issued: 2022
Exploring the Incremental Improvements of YOLOv7 over YOLOv5 for Character Recognition
- Boby, Alden, Brown, Dane L, Connan, James, Marais, Marc
- Authors: Boby, Alden , Brown, Dane L , Connan, James , Marais, Marc
- Date: 2022
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/463395 , vital:76405 , xlink:href="https://link.springer.com/chapter/10.1007/978-3-031-35644-5_5"
- Description: Technological advances are being applied to aspects of life to improve quality of living and efficiency. This speaks specifically to automation, especially in the industry. The growing number of vehicles on the road has presented a need to monitor more vehicles than ever to enforce traffic rules. One way to identify a vehicle is through its licence plate, which contains a unique string of characters that make it identifiable within an external database. Detecting characters on a licence plate using an object detector has only recently been explored. This paper uses the latest versions of the YOLO object detector to perform character recognition on licence plate images. This paper expands upon existing object detection-based character recognition by investigating how improvements in the framework translate to licence plate character recognition accuracy compared to character recognition based on older architectures. Results from this paper indicate that the newer YOLO models have increased performance over older YOLO-based character recognition models such as CRNET.
- Full Text:
- Date Issued: 2022
Improving licence plate detection using generative adversarial networks
- Authors: Boby, Alden , Brown, Dane L
- Date: 2022
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/464145 , vital:76480 , xlink:href="https://link.springer.com/chapter/10.1007/978-3-031-04881-4_47"
- Description: The information on a licence plate is used for traffic law enforcement, access control, surveillance and parking lot management. Existing licence plate recognition systems work with clear images taken under controlled conditions. In real-world licence plate recognition scenarios, images are not as straightforward as the ‘toy’ datasets used to benchmark existing systems. Real-world data is often noisy as it may contain occlusion and poor lighting, obscuring the information on a licence plate. Cleaning input data before using it for licence plate recognition is a complex problem, and existing literature addressing the issue is still limited. This paper uses two deep learning techniques to improve licence plate visibility towards more accurate licence plate recognition. A one-stage object detector popularly known as YOLO is implemented for locating licence plates under challenging situations. Super-resolution generative adversarial networks are considered for image upscaling and reconstruction to improve the clarity of low-quality input. The main focus involves training these systems on datasets that include difficult-to-detect licence plates, enabling better performance in unfavourable conditions and environments.
- Full Text:
- Date Issued: 2022
Improving signer-independence using pose estimation and transfer learning for sign language recognition
- Marais, Marc, Brown, Dane L, Connan, James, Boby, Alden
- Authors: Marais, Marc , Brown, Dane L , Connan, James , Boby, Alden
- Date: 2022
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/463406 , vital:76406 , xlink:href="https://doi.org/10.1007/978-3-031-35644-5"
- Description: Automated Sign Language Recognition (SLR) aims to bridge the communication gap between the hearing and the hearing disabled. Computer vision and deep learning lie at the forefront in working toward these systems. Most SLR research focuses on signer-dependent SLR and fails to account for variations between signers who gesticulate naturally. This paper investigates signer-independent SLR on the LSA64 dataset, focusing on different feature extraction approaches. Two approaches are proposed: an InceptionV3-GRU architecture, which uses raw images as input, and a pose estimation LSTM architecture. MediaPipe Holistic is implemented to extract pose estimation landmark coordinates. A final third model applies augmentation and transfer learning using the pose estimation LSTM model. The research found that the pose estimation LSTM approach achieved the best performance with an accuracy of 80.22%. MediaPipe Holistic struggled with the augmentations introduced in the final experiment. Thus, looking into introducing more subtle augmentations may improve the model. Overall, the system shows significant promise toward addressing the real-world signer-independence issue in SLR.
- Full Text:
- Date Issued: 2022
Investigating signer-independent sign language recognition on the LSA64 dataset
- Marais, Marc, Brown, Dane L, Connan, James, Boby, Alden, Kuhlane, Luxolo L
- Authors: Marais, Marc , Brown, Dane L , Connan, James , Boby, Alden , Kuhlane, Luxolo L
- Date: 2022
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/465179 , vital:76580 , xlink:href="https://www.researchgate.net/profile/Marc-Marais/publication/363174384_Investigating_Signer-Independ-ent_Sign_Language_Recognition_on_the_LSA64_Dataset/links/63108c7d5eed5e4bd138680f/Investigating-Signer-Independent-Sign-Language-Recognition-on-the-LSA64-Dataset.pdf"
- Description: Conversing with hearing disabled people is a significant challenge; however, advances in computer vision have eased this considerably through automated sign language recognition. A common issue in sign language recognition is signer-dependence, where variations arise between signers, who gesticulate naturally. Using LSA64, a small-scale Argentinian isolated sign language recognition dataset, we investigate signer-independent sign language recognition. An InceptionV3-GRU architecture is employed to extract and classify spatial and temporal information for automated sign language recognition. The signer-dependent approach yielded an accuracy of 97.03%, whereas the signer-independent approach achieved an accuracy of 74.22%. The signer-independent system shows promise towards addressing the common real-world issue of signer-dependence in sign language recognition.
- Full Text:
- Date Issued: 2022
Investigating the Effects of Image Correction Through Affine Transformations on Licence Plate Recognition
- Boby, Alden, Brown, Dane L, Connan, James, Marais, Marc
- Authors: Boby, Alden , Brown, Dane L , Connan, James , Marais, Marc
- Date: 2022
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/465190 , vital:76581 , xlink:href="https://ieeexplore.ieee.org/abstract/document/9856380"
- Description: Licence plate recognition has many real-world applications, most of which fall under security and surveillance. In recent years, deep learning has been adopted for licence plate recognition to improve on existing image-processing techniques. Object detectors are a popular choice for approaching this task. All object detectors are some form of a convolutional neural network. The You Only Look Once (YOLO) framework and Region-Based Convolutional Neural Networks are popular models within this field. A novel architecture called the Warped Planar Object Detector is a recent development by Zou et al. that takes inspiration from YOLO and Spatial Transformer Networks. This paper compares the performance of the Warped Planar Object Detector and YOLO on licence plate recognition by training both models with the same data, passing their output to an Enhanced Super-Resolution Generative Adversarial Network for upscaling, and finally using an Optical Character Recognition engine to classify the characters detected in the upscaled images.
- Full Text:
- Date Issued: 2022
Iterative Refinement Versus Generative Adversarial Networks for Super-Resolution Towards Licence Plate Detection
- Boby, Alden, Brown, Dane L, Connan, James
- Authors: Boby, Alden , Brown, Dane L , Connan, James
- Date: 2022
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/463417 , vital:76407 , xlink:href="https://link.springer.com/chapter/10.1007/978-981-99-1624-5_26"
- Description: Licence plate detection in unconstrained scenarios can be difficult because of the medium used to capture the data. For practical reasons, such data is not captured at very high resolution. Super-resolution can be used to improve the resolution of an image with fidelity beyond that of non-machine-learning image upscaling algorithms such as bilinear or bicubic upscaling. Technological advances have introduced more than one way to perform super-resolution, with the best results coming from generative adversarial networks and iterative refinement with diffusion-based models. This paper pits the two best-performing super-resolution models against each other to see which is best for licence plate super-resolution. Quantitative results favour the generative adversarial network, while qualitative results lean towards the iterative refinement model.
- Full Text:
- Date Issued: 2022