Annals of Emerging Technologies in Computing (AETiC)

 
Paper #2                                                                             

Integration Named Entity Recognition and Latent Dirichlet Allocation to Enhance Topic Modeling

Hawraa Ali Taher, Noralhuda N. Alabid and Bushra Mahdi Hasan


Abstract: Topic modeling is an important task in natural language processing (NLP). It plays a fundamental role in summarizing texts, understanding their content, and surfacing their main ideas, especially given the vast quantity of unstructured text available today. Title extraction is used in a variety of fields, such as news archiving, document classification, and social media content analysis, which makes it an essential tool for information management and effective presentation. In this research, we focus on improving the methodology for extracting titles from texts by integrating two leading techniques: topic modeling with Latent Dirichlet Allocation (LDA) and named entity recognition (NER). The combination aims to balance the identification of a text's general topics via LDA with the extraction of key entities and important information via NER, so that the generated titles are accurate, understandable, and better reflect the actual content of the texts. The combined methodology achieved an accuracy of 71.97%, outperforming each technique used separately: NER alone reached 29.78% and LDA alone 67.80%. These results underscore the value of integrating complementary NLP techniques to improve headline extraction. The approach contributes to more efficient text analysis methods and supports NLP applications such as news analysis, content management, and document summarization, where large volumes of text must be handled and presented clearly.
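The abstract does not say how LDA topics and NER entities are merged into a title, so the following is only a minimal sketch under stated assumptions, not the authors' method. It uses a from-scratch collapsed Gibbs sampler in place of a library LDA implementation, and a crude capitalized-word heuristic (the hypothetical `crude_entities`) in place of a trained NER model. The helper names, the stopword list, the hyperparameters, and the "leading entity + top topic words" title template are all illustrative assumptions.

```python
import random
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for",
             "is", "was", "with", "by", "at", "after", "said"}


def tokenize(text):
    """Lowercase alphabetic tokens minus stopwords (the LDA input)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc-topic, topic-word) counts."""
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})   # vocabulary size
    ndk = [[0] * k for _ in docs]               # document-topic counts
    nkw = [Counter() for _ in range(k)]         # topic-word counts
    nk = [0] * k                                # tokens assigned to each topic
    z = []                                      # topic assignment per token

    def assign(d, w, t, delta):
        ndk[d][t] += delta
        nkw[t][w] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):              # random initialization
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            assign(d, w, t, 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                assign(d, w, z[d][i], -1)       # remove current assignment
                # p(z = j | rest) is proportional to
                # (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                assign(d, w, z[d][i], 1)
    return ndk, nkw


def crude_entities(text):
    """Stand-in for NER (assumption): runs of capitalized words not starting a sentence."""
    entities = Counter()
    for sentence in re.split(r"[.!?]\s*", text):
        run = []
        for i, word in enumerate(sentence.split()):
            clean = word.strip(",;:\"'()")
            if i > 0 and clean[:1].isupper():
                run.append(clean)
            else:
                if run:
                    entities[" ".join(run)] += 1
                run = []
        if run:
            entities[" ".join(run)] += 1
    return entities


def make_title(text, ndk_row, nkw, n_words=3):
    """Hypothetical template: leading entity + top words of the dominant topic."""
    topic = max(range(len(ndk_row)), key=lambda j: ndk_row[j])
    entities = crude_entities(text)
    lead = entities.most_common(1)[0][0] if entities else ""
    skip = set(lead.lower().split())            # avoid repeating the entity
    words = [w for w, c in nkw[topic].most_common() if c > 0 and w not in skip]
    return " ".join(([lead] if lead else []) + [w.capitalize() for w in words[:n_words]])


docs_text = [
    "The central bank raised interest rates. Governor Maria Lopez said inflation remains high.",
    "Stock markets fell after the bank decision on interest rates and inflation.",
    "The national team won the football final. Coach Pedro Silva praised the players.",
    "Football fans celebrated the team victory in the final match.",
]
tokens = [tokenize(t) for t in docs_text]
ndk, nkw = lda_gibbs(tokens, k=2)
for text, row in zip(docs_text, ndk):
    print(make_title(text, row, nkw))
```

On a four-document toy corpus the exact topic words depend on the sampler seed. The sketch shows only the division of labour the abstract describes: LDA contributes the general subject (the dominant topic's top words), while NER contributes the specific names a purely statistical topic model tends to bury.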


Keywords: BERT Model; Cosine Similarity; Latent Dirichlet Allocation (LDA); Named Entity Recognition (NER); Text Analysis; Topic Modeling.


 
This work is licensed under a Creative Commons Attribution 4.0 International License.

 
International Association for Educators and Researchers (IAER), registered in England and Wales - Reg #OC418009. Copyright © IAER 2023