Contextual Urdu Lemmatization Using Recurrent Neural Network Models

Open access. Language: English.

Hafeez, Rabab; Anwar, Muhammad Waqas; Jamal, Muhammad Hasan; Fatima, Tayyaba; Martínez Espinosa, Julio César; Dzul López, Luis Alonso; Bautista Thompson, Ernesto and Ashraf, Imran (2023) Contextual Urdu Lemmatization Using Recurrent Neural Network Models. Mathematics, 11 (2). p. 435. ISSN 2227-7390

Contact: ulio.martinez@unini.edu.mx (Martínez Espinosa), luis.dzul@uneatlantico.es (Dzul López), ernesto.bautista@unini.edu.mx (Bautista Thompson); unspecified for the other authors.

Text: mathematics-11-00435.pdf (1MB)
Available under License Creative Commons Attribution.

Abstract

In natural language processing, machine translation is a rapidly developing research area that helps people communicate more effectively by bridging linguistic gaps. In machine translation, normalization and morphological analysis are the first, and perhaps the most important, modules for information retrieval (IR). Building a morphological analyzer, or completing the normalization process, requires extracting the correct root from different word forms. Stemming and lemmatization are the techniques commonly used to find the correct root words in a language. A few studies on IR systems for the Urdu language have shown that lemmatization is more effective than stemming because of the infixes found in Urdu words. However, lemmatization techniques for resource-scarce languages such as Urdu are not very common. This paper presents a lemmatization algorithm for Urdu based on recurrent neural network models. The proposed model is trained and tested on two datasets, namely, the Urdu Monolingual Corpus (UMC) and the Universal Dependencies Corpus of Urdu (UDU). The Word2Vec model and edit trees are used to generate semantic and syntactic embeddings. Bidirectional long short-term memory (BiLSTM), bidirectional gated recurrent unit (BiGRU), bidirectional gated recurrent neural network (BiGRNN), and attention-free encoder–decoder (AFED) models are trained under defined hyperparameters. Experimental results show that the attention-free encoder–decoder model achieves an accuracy, precision, recall, and F-score of 0.96, 0.95, 0.95, and 0.95, respectively, and outperforms existing models.
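As a rough illustration of the edit trees mentioned in the abstract, the sketch below derives a tree from one (word form, lemma) pair and reapplies it to another word. It is a minimal sketch and not the authors' implementation: the names `Replace`, `Split`, `build_edit_tree`, and `apply_edit_tree` are hypothetical, the romanized word pairs are illustrative assumptions rather than data from UMC or UDU, and the step that turns trees into embeddings is not shown.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional, Union


@dataclass(frozen=True)
class Replace:
    """Leaf node: rewrite the exact string `old` as `new`."""
    old: str
    new: str


@dataclass(frozen=True)
class Split:
    """Inner node: keep the middle of the word, edit its prefix and suffix."""
    left_len: int       # number of leading characters handled by `left`
    right_len: int      # number of trailing characters handled by `right`
    left: "EditTree"
    right: "EditTree"


EditTree = Union[Replace, Split]


def build_edit_tree(word: str, lemma: str) -> EditTree:
    """Recursively split around the longest substring shared by word and lemma."""
    matcher = SequenceMatcher(None, word, lemma, autojunk=False)
    m = matcher.find_longest_match(0, len(word), 0, len(lemma))
    if m.size == 0:
        # Nothing in common: fall back to a plain replacement.
        return Replace(word, lemma)
    return Split(
        left_len=m.a,
        right_len=len(word) - (m.a + m.size),
        left=build_edit_tree(word[:m.a], lemma[:m.b]),
        right=build_edit_tree(word[m.a + m.size:], lemma[m.b + m.size:]),
    )


def apply_edit_tree(tree: EditTree, word: str) -> Optional[str]:
    """Apply a tree to a word; return None when the tree does not fit."""
    if isinstance(tree, Replace):
        return tree.new if word == tree.old else None
    if tree.left_len + tree.right_len > len(word):
        return None
    end = len(word) - tree.right_len
    left = apply_edit_tree(tree.left, word[:tree.left_len])
    right = apply_edit_tree(tree.right, word[end:])
    if left is None or right is None:
        return None
    return left + word[tree.left_len:end] + right


# Illustrative romanized pair (assumed example): plural "larkiyan" -> lemma "larki".
plural_tree = build_edit_tree("larkiyan", "larki")
print(apply_edit_tree(plural_tree, "kursiyan"))
```

Because a tree stores only the edit and not the word itself, the same tree can be shared by many word forms, which is what makes it usable as a class label or as a feature for a lemmatizer.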

Document Type: Article
Keywords: neural networks; natural language processing; inflectional morphology; derivational morphology; MSC: 68T50
Subject classification: Subjects > Engineering
Divisions: Universidad Europea del Atlántico > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Articles and books
Universidad Internacional Iberoamericana México > Research > Scientific Production
Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
Deposited: 01 Feb 2023 23:30
Last Modified: 21 Oct 2024 23:30
URI: https://repositorio.unincol.edu.co/id/eprint/5660

File: /15333/1/nutrients-16-03907.pdf (language: en; access: open)

Youth Healthy Eating Index (YHEI) and Diet Adequacy in Relation to Country-Specific National Dietary Recommendations in Children and Adolescents in Five Mediterranean Countries from the DELICIOUS Project

Background/Objectives: The diet quality of younger individuals is decreasing globally, with alarming trends also in the Mediterranean region. The aim of this study was to assess diet quality and adequacy in relation to country-specific dietary recommendations for children and adolescents living in the Mediterranean area. Methods: A cross-sectional survey was conducted of 2011 parents of the target population participating in the DELICIOUS EU-PRIMA project. Dietary data were assessed through 24 h recalls and food frequency questionnaires, cross-referenced with food-based recommendations, and scored with the youth healthy eating index (YHEI). Results: Adherence to recommendations on plant-based foods was low (less than ∼20%), including fruit and vegetable adequacy in all countries, legume adequacy in all countries except Italy, and cereal adequacy in all countries except Portugal. For animal products and dietary fats, adequacy in relation to the national food-based dietary recommendations was slightly better (∼40% on average) in most countries, although the Eastern countries reported worse rates. Higher YHEI scores predicted adequacy for vegetables (except in Egypt), fruit (except in Lebanon), cereals (except in Spain), and legumes (except in Spain) in most countries. Diet quality was better among younger children (p < 0.005) and among children with 8–10 h of adequate sleep (p < 0.001), <2 h/day of screen time (p < 0.001), and a medium/high physical activity level (p < 0.001). Moreover, respondents who were older (p < 0.001), had a medium/high educational level (p = 0.001), and lived with a partner (p = 0.003) reported that their children had a better diet quality. Conclusions: Plant-based food groups, including fruit, vegetables, legumes, and even (whole-grain) cereals, are underrepresented in the diets of Mediterranean children and adolescents. Moreover, adequate consumption of other important dietary components, such as milk and dairy products, is largely neglected, leading to substantially suboptimal diets and poor adequacy in relation to dietary guidelines.

Scientific Production

Francesca Giampieri (francesca.giampieri@uneatlantico.es), Alice Rosi, Francesca Scazzina, Evelyn Frias-Toral, Osama Abdelkarim, Mohamed Aly, Raynier Zambrano-Villacres, Juancho Pons, Laura Vázquez-Araújo, Sandra Sumalla Cano (sandra.sumalla@uneatlantico.es), Iñaki Elío Pascual (inaki.elio@uneatlantico.es), Lorenzo Monasta, Ana Mata, María Isabel Pardo, Pablo Busó, Giuseppe Grosso

File: /15640/1/s12911-024-02780-0.pdf (language: en; access: open)

Enhanced interpretable thyroid disease diagnosis by leveraging synthetic oversampling and machine learning models

Thyroid illness encompasses a range of disorders affecting the thyroid gland, leading to either hyperthyroidism or hypothyroidism, which can significantly impact metabolism and overall health. Hypothyroidism can slow bodily processes, causing symptoms such as fatigue, weight gain, depression, and cold sensitivity. Hyperthyroidism can increase metabolism, causing symptoms such as rapid weight loss, anxiety, irritability, and heart palpitations. Thyroid illness affects millions worldwide, and prompt diagnosis and appropriate treatment are crucial for managing it and improving patients' quality of life. This research proposes an artificial intelligence-based approach for the early diagnosis of thyroid illness. An open-access thyroid disease dataset of 3,772 male and female patient observations is used for the experiments. The study uses the synthetic minority oversampling technique for nominal and continuous features (SMOTE-NC) to balance the data and handle class imbalance, and a fine-tuned light gradient boosting machine (LGBM) to diagnose thyroid illness. The proposed SNL (SMOTE-NC-LGBM) approach outperformed the state-of-the-art approach, with performance scores of 0.96. Advanced machine learning and deep learning methods are also applied for comparison, and hyperparameter optimization is conducted to enhance diagnostic performance. In addition, an explainable artificial intelligence (XAI) mechanism based on Shapley Additive exPlanations (SHAP) is applied to enhance the transparency and interpretability of the proposed method by analyzing its decision-making process. The proposed approach supports efficient diagnosis of thyroid disorders and helps specialists address them early.
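To make the oversampling step more concrete, the sketch below generates one synthetic minority-class row in the style of SMOTE-NC: continuous features are interpolated between a row and one of its nearest minority neighbours, and nominal features take the most common value among those neighbours. It is a simplified sketch under stated assumptions, not the study's pipeline: the function name `smote_nc_sample`, the toy rows, and the feature layout are hypothetical, and a maintained library implementation would normally be used in practice.

```python
import random
import statistics
from collections import Counter


def smote_nc_sample(minority, cat_idx, k=3, rng=None):
    """Create one synthetic row from minority-class rows (lists of values).

    `cat_idx` is the set of column positions holding nominal features.
    Assumes at least two rows and at least one continuous column.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    rng = rng or random.Random(0)
    num_idx = [i for i in range(len(minority[0])) if i not in cat_idx]

    # Penalty for a nominal mismatch: median of the standard deviations
    # of the continuous features within the minority class.
    med = statistics.median(
        statistics.pstdev([row[i] for row in minority]) for i in num_idx
    )

    def distance(a, b):
        d = sum((a[i] - b[i]) ** 2 for i in num_idx)
        d += sum(med ** 2 for i in cat_idx if a[i] != b[i])
        return d ** 0.5

    base = rng.choice(minority)
    others = [row for row in minority if row is not base]
    neighbours = sorted(others, key=lambda row: distance(base, row))[:k]
    mate = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)

    synthetic = list(base)
    for i in num_idx:
        synthetic[i] = base[i] + gap * (mate[i] - base[i])
    for i in cat_idx:
        # Nominal value: the most frequent value among the nearest neighbours.
        synthetic[i] = Counter(row[i] for row in neighbours).most_common(1)[0][0]
    return synthetic


# Toy rows (hypothetical): [hormone level A, hormone level B, sex]
rows = [[1.0, 10.0, "F"], [2.0, 12.0, "F"], [3.0, 11.0, "M"], [2.5, 10.5, "F"]]
print(smote_nc_sample(rows, cat_idx={2}, rng=random.Random(42)))
```

Repeating the call until the minority class matches the majority class in size gives a balanced training set; only the training split should be oversampled, so that evaluation data stay untouched.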

Scientific Production

Ali Raza, Fatma Eid, Elisabeth Caro Montero (elizabeth.caro@uneatlantico.es), Irene Delgado Noya (irene.delgado@uneatlantico.es), Imran Ashraf

File: /14584/1/s41598-024-73664-6.pdf (language: en; access: open)

Performance of the 4C and SEIMC scoring systems in predicting mortality from onset to current COVID-19 pandemic in emergency departments

The evolution of the COVID-19 pandemic has been associated with variations in clinical presentation and severity. Similarly, the diagnostic accuracy of prediction scores may have changed. The aim of this study was to test the 30-day mortality predictive validity of the 4C and SEIMC scores during the sixth wave of the pandemic and to compare them with those of validation studies. This was a longitudinal retrospective observational study. COVID-19 patients who were admitted to the Emergency Department of a Spanish hospital from December 15, 2021, to January 31, 2022, were selected. A side-by-side comparison with the pivotal validation studies was subsequently performed. The main measures were 30-day mortality and the 4C and SEIMC scores. A total of 27,614 patients were considered in the study, including 22,361 from the 4C validation study, 4,627 from the SEIMC validation study, and 626 from our hospital. The 30-day mortality rate was significantly lower than that reported in the validation studies. The AUCs were 0.931 (95% CI: 0.90–0.95) for 4C and 0.903 (95% CI: 0.86–0.93) for SEIMC, which were significantly greater than those obtained in the first wave. Despite the changes that have occurred during the coronavirus disease 2019 (COVID-19) pandemic, including a reduction in lethality, scoring systems remain useful tools for identifying patients at risk of poor outcomes, and now show better prognostic capacity.
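For readers unfamiliar with the AUC figures quoted above, the sketch below computes an AUC as the probability that a randomly chosen patient who died received a higher risk score than a randomly chosen survivor, with ties counted as one half. It is a minimal illustration only: the function name `auc` and the scores and outcomes are hypothetical, not data from the study, and confidence intervals are not computed.

```python
from typing import Sequence


def auc(scores: Sequence[float], died: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of the two outcome groups."""
    pos = [s for s, d in zip(scores, died) if d]       # scores of patients who died
    neg = [s for s, d in zip(scores, died) if not d]   # scores of survivors
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    # A pair counts 1 if the deceased patient scored higher, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and 30-day outcomes for six patients.
example_scores = [14, 9, 11, 4, 6, 9]
example_died = [True, True, False, False, False, False]
print(auc(example_scores, example_died))
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to a score that ranks every deceased patient above every survivor.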

Scientific Production

Pedro Ángel de Santos Castro, Carlos del Pozo Vegas, Leyre Teresa Pinilla Arribas, Daniel Zalama Sánchez, Ancor Sanz-García, Tony Giancarlo Vásquez del Águila, Pablo González Izquierdo, Sara de Santos Sánchez, Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Irma Dominguez Azpíroz (irma.dominguez@unini.edu.mx), Iñaki Elío Pascual (inaki.elio@uneatlantico.es), Francisco Martín-Rodríguez

File: /14950/1/fmicb-15-1481418.pdf (language: en; access: open)

Evolving epidemiology, clinical features, and genotyping of dengue outbreaks in Bangladesh, 2000–2024: a systematic review

Background: The 2023 dengue outbreak has proven that dengue is not only an endemic disease but also an emerging health threat in Bangladesh. Integrated studies on the epidemiology, clinical characteristics, seasonality, and genotype of dengue are limited. This study was conducted to determine recent trends in the molecular epidemiology, clinical features, and seasonality of dengue outbreaks. Methods: We analyzed data from 41 original studies, extracting epidemiological information from all 41 articles, clinical symptoms from 30 articles, and genotypic diversity from 11 articles. The study adhered to the standards of the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) Statement and Cochrane Collaboration guidelines. Conclusion: This study provides integrated insights into the molecular epidemiology, clinical features, seasonality, and transmission of dengue in Bangladesh and highlights research gaps for future studies.

Scientific Production

Nadim Sharif, Rubayet Rayhan Opu, Tama Saha, Abdullah Ibna Masud, Jannatin Naim, Khalaf F. Alsharif, Khalid J. Alzahrani, Eduardo René Silva Alvarado (eduardo.silva@funiber.org), Irene Delgado Noya (irene.delgado@uneatlantico.es), Isabel De la Torre Díez, Shuvra Kanti Dey
