Real Word Spelling Error Detection and Correction for Urdu Language
Artículo
Materias > Ingeniería
Universidad Europea del Atlántico > Investigación > Producción Científica
Fundación Universitaria Internacional de Colombia > Investigación > Artículos y libros
Universidad Internacional Iberoamericana México > Investigación > Producción Científica
Universidad Internacional Iberoamericana Puerto Rico > Investigación > Producción Científica
Universidad Internacional do Cuanza > Investigación > Producción Científica
Abierto
Inglés
Non-word and real-word errors are generally two types of spelling errors. Non-word errors are misspelled words that are nonexistent in the lexicon while real-word errors are misspelled words that exist in the lexicon but are used out of context in a sentence. Lexicon-based lookup approach is widely used for non-word errors but it is incapable of handling real-word errors as they require contextual information. Contrary to the English language, real-word error detection and correction for low-resourced languages like Urdu is an unexplored area. This paper presents a real-word spelling error detection and correction approach for the Urdu language. We develop an extensive lexicon of 593,738 words and use this lexicon to develop a dataset for real-word errors comprising 125562 sentences and 2,552,735 words. Based on the developed lexicon and dataset, we then develop a contextual spell checker that detects and corrects real-word errors. For the real-word error detection phase, word-gram features are used along with five machine learning classifiers, achieving a precision, recall, and F1-score of 0.84,0.79, and 0.81 respectively. We also test the proposed approach with a 40% error density. For real-word error correction, the Damerau-Levenshtein distance is used along with the n-gram model for further ranking of the suggested candidate words, achieving an accuracy of up to 83.67%.
metadata
Aziz, Romila; Anwar, Muhammad Waqas; Jamal, Muhammad Hasan; Bajwa, Usama Ijaz; Kuc Castilla, Ángel Gabriel; Uc-Rios, Carlos; Bautista Thompson, Ernesto y Ashraf, Imran
mail
SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, carlos.uc@unini.edu.mx, ernesto.bautista@unini.edu.mx, SIN ESPECIFICAR
(2023)
Real Word Spelling Error Detection and Correction for Urdu Language.
IEEE Access.
p. 1.
ISSN 2169-3536
Texto
Real_Word_Spelling_Error_Detection_and_Correction_for_Urdu_Language.pdf Available under License Creative Commons Attribution Non-commercial No Derivatives. Descargar (3MB) |
Resumen
Non-word and real-word errors are generally two types of spelling errors. Non-word errors are misspelled words that are nonexistent in the lexicon while real-word errors are misspelled words that exist in the lexicon but are used out of context in a sentence. Lexicon-based lookup approach is widely used for non-word errors but it is incapable of handling real-word errors as they require contextual information. Contrary to the English language, real-word error detection and correction for low-resourced languages like Urdu is an unexplored area. This paper presents a real-word spelling error detection and correction approach for the Urdu language. We develop an extensive lexicon of 593,738 words and use this lexicon to develop a dataset for real-word errors comprising 125562 sentences and 2,552,735 words. Based on the developed lexicon and dataset, we then develop a contextual spell checker that detects and corrects real-word errors. For the real-word error detection phase, word-gram features are used along with five machine learning classifiers, achieving a precision, recall, and F1-score of 0.84,0.79, and 0.81 respectively. We also test the proposed approach with a 40% error density. For real-word error correction, the Damerau-Levenshtein distance is used along with the n-gram model for further ranking of the suggested candidate words, achieving an accuracy of up to 83.67%.
Tipo de Documento: | Artículo |
---|---|
Palabras Clave: | Real-word errors, spelling correction, spelling detection, spell checker |
Clasificación temática: | Materias > Ingeniería |
Divisiones: | Universidad Europea del Atlántico > Investigación > Producción Científica Fundación Universitaria Internacional de Colombia > Investigación > Artículos y libros Universidad Internacional Iberoamericana México > Investigación > Producción Científica Universidad Internacional Iberoamericana Puerto Rico > Investigación > Producción Científica Universidad Internacional do Cuanza > Investigación > Producción Científica |
Depositado: | 14 Sep 2023 23:30 |
Ultima Modificación: | 14 Sep 2023 23:30 |
URI: | https://repositorio.unincol.edu.co/id/eprint/8800 |
Acciones (logins necesarios)
Ver Objeto |
<a class="ep_document_link" href="/15333/1/nutrients-16-03907.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Background/Objectives: The diet quality of younger individuals is decreasing globally, with alarming trends also in the Mediterranean region. The aim of this study was to assess diet quality and adequacy in relation to country-specific dietary recommendations for children and adolescents living in the Mediterranean area. Methods: A cross-sectional survey was conducted of 2011 parents of the target population participating in the DELICIOUS EU-PRIMA project. Dietary data and cross-references with food-based recommendations and the application of the youth healthy eating index (YHEI) was assessed through 24 h recalls and food frequency questionnaires. Results: Adherence to recommendations on plant-based foods was low (less than ∼20%), including fruit and vegetables adequacy in all countries, legume adequacy in all countries except for Italy, and cereal adequacy in all countries except for Portugal. For animal products and dietary fats, the adequacy in relation to the national food-based dietary recommendations was slightly better (∼40% on average) in most countries, although the Eastern countries reported worse rates. Higher scores on the YHEI predicted adequacy in relation to vegetables (except Egypt), fruit (except Lebanon), cereals (except Spain), and legumes (except Spain) in most countries. Younger children (p < 0.005) reporting having 8–10 h adequate sleep duration (p < 0.001), <2 h/day screen time (p < 0.001), and a medium/high physical activity level (p < 0.001) displayed a better diet quality. Moreover, older respondents (p < 0.001) with a medium/high educational level (p = 0.001) and living with a partner (p = 0.003) reported that their children had a better diet quality. Conclusions: Plant-based food groups, including fruit, vegetables, legumes, and even (whole-grain) cereals are underrepresented in the diets of Mediterranean children and adolescents. Moreover, the adequate consumption of other important dietary components, such as milk and dairy products, is rather disregarded, leading to substantially suboptimal diets and poor adequacy in relation to dietary guidelines.
Francesca Giampieri mail francesca.giampieri@uneatlantico.es, Alice Rosi mail , Francesca Scazzina mail , Evelyn Frias-Toral mail , Osama Abdelkarim mail , Mohamed Aly mail , Raynier Zambrano-Villacres mail , Juancho Pons mail , Laura Vázquez-Araújo mail , Sandra Sumalla Cano mail sandra.sumalla@uneatlantico.es, Iñaki Elío Pascual mail inaki.elio@uneatlantico.es, Lorenzo Monasta mail , Ana Mata mail , María Isabel Pardo mail , Pablo Busó mail , Giuseppe Grosso mail ,
Giampieri
<a href="/15640/1/s12911-024-02780-0.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Thyroid illness encompasses a range of disorders affecting the thyroid gland, leading to either hyperthyroidism or hypothyroidism, which can significantly impact metabolism and overall health. Hypothyroidism can cause a slowdown in bodily processes, leading to symptoms such as fatigue, weight gain, depression, and cold sensitivity. Hyperthyroidism can lead to increased metabolism, causing symptoms like rapid weight loss, anxiety, irritability, and heart palpitations. Prompt diagnosis and appropriate treatment are crucial in managing thyroid disorders and improving patients’ quality of life. Thyroid illness affects millions worldwide and can significantly impact their quality of life if left untreated. This research aims to propose an effective artificial intelligence-based approach for the early diagnosis of thyroid illness. An open-access thyroid disease dataset based on 3,772 male and female patient observations is used for this research experiment. This study uses the nominal continuous synthetic minority oversampling technique (SMOTE-NC) for data balancing and a fine-tuned light gradient booster machine (LGBM) technique to diagnose thyroid illness and handle class imbalance problems. The proposed SNL (SMOTE-NC-LGBM) approach outperformed the state-of-the-art approach with high-accuracy performance scores of 0.96. We have also applied advanced machine learning and deep learning methods for comparison to evaluate performance. Hyperparameter optimizations are also conducted to enhance thyroid diagnosis performance. In addition, we have applied the explainable Artificial Intelligence (XAI) mechanism based on Shapley Additive exPlanations (SHAP) to enhance the transparency and interpretability of the proposed method by analyzing the decision-making processes. The proposed research revolutionizes the diagnosis of thyroid disorders efficiently and helps specialties overcome thyroid disorders early.
Ali Raza mail , Fatma Eid mail , Elisabeth Caro Montero mail elizabeth.caro@uneatlantico.es, Irene Delgado Noya mail irene.delgado@uneatlantico.es, Imran Ashraf mail ,
Raza
<a class="ep_document_link" href="/14584/1/s41598-024-73664-6.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
The evolution of the COVID-19 pandemic has been associated with variations in clinical presentation and severity. Similarly, prediction scores may suffer changes in their diagnostic accuracy. The aim of this study was to test the 30-day mortality predictive validity of the 4C and SEIMC scores during the sixth wave of the pandemic and to compare them with those of validation studies. This was a longitudinal retrospective observational study. COVID-19 patients who were admitted to the Emergency Department of a Spanish hospital from December 15, 2021, to January 31, 2022, were selected. A side-by-side comparison with the pivotal validation studies was subsequently performed. The main measures were 30-day mortality and the 4C and SEIMC scores. A total of 27,614 patients were considered in the study, including 22,361 from the 4C, 4,627 from the SEIMC and 626 from our hospital. The 30-day mortality rate was significantly lower than that reported in the validation studies. The AUCs were 0.931 (95% CI: 0.90–0.95) for 4C and 0.903 (95% CI: 086–0.93) for SEIMC, which were significantly greater than those obtained in the first wave. Despite the changes that have occurred during the coronavirus disease 2019 (COVID-19) pandemic, with a reduction in lethality, scorecard systems are currently still useful tools for detecting patients with poor disease risk, with better prognostic capacity.
Pedro Ángel de Santos Castro mail , Carlos del Pozo Vegas mail , Leyre Teresa Pinilla Arribas mail , Daniel Zalama Sánchez mail , Ancor Sanz-García mail , Tony Giancarlo Vásquez del Águila mail , Pablo González Izquierdo mail , Sara de Santos Sánchez mail , Cristina Mazas Pérez-Oleaga mail cristina.mazas@uneatlantico.es, Irma Dominguez Azpíroz mail irma.dominguez@unini.edu.mx, Iñaki Elío Pascual mail inaki.elio@uneatlantico.es, Francisco Martín-Rodríguez mail ,
de Santos Castro
<a href="/14950/1/fmicb-15-1481418.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Background: The 2023 dengue outbreak has proven that dengue is not only an endemic disease but also an emerging health threat in Bangladesh. Integrated studies on the epidemiology, clinical characteristics, seasonality, and genotype of dengue are limited. This study was conducted to determine recent trends in the molecular epidemiology, clinical features, and seasonality of dengue outbreaks. Methods: We analyzed data from 41 original studies, extracting epidemiological information from all 41 articles, clinical symptoms from 30 articles, and genotypic diversity from 11 articles. The study adhered to the standards of the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) Statement and Cochrane Collaboration guidelines. Conclusion: This study provides integrated insights into the molecular epidemiology, clinical features, seasonality, and transmission of dengue in Bangladesh and highlights research gaps for future studies.
Nadim Sharif mail , Rubayet Rayhan Opu mail , Tama Saha mail , Abdullah Ibna Masud mail , Jannatin Naim mail , Khalaf F. Alsharif mail , Khalid J. Alzahrani mail , Eduardo René Silva Alvarado mail eduardo.silva@funiber.org, Irene Delgado Noya mail irene.delgado@uneatlantico.es, Isabel De la Torre Díez mail , Shuvra Kanti Dey mail ,
Sharif
<a href="/15624/1/s41598-024-73664-6%20%281%29.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
The evolution of the COVID-19 pandemic has been associated with variations in clinical presentation and severity. Similarly, prediction scores may suffer changes in their diagnostic accuracy. The aim of this study was to test the 30-day mortality predictive validity of the 4C and SEIMC scores during the sixth wave of the pandemic and to compare them with those of validation studies. This was a longitudinal retrospective observational study. COVID-19 patients who were admitted to the Emergency Department of a Spanish hospital from December 15, 2021, to January 31, 2022, were selected. A side-by-side comparison with the pivotal validation studies was subsequently performed. The main measures were 30-day mortality and the 4C and SEIMC scores. A total of 27,614 patients were considered in the study, including 22,361 from the 4C, 4,627 from the SEIMC and 626 from our hospital. The 30-day mortality rate was significantly lower than that reported in the validation studies. The AUCs were 0.931 (95% CI: 0.90–0.95) for 4C and 0.903 (95% CI: 086–0.93) for SEIMC, which were significantly greater than those obtained in the first wave. Despite the changes that have occurred during the coronavirus disease 2019 (COVID-19) pandemic, with a reduction in lethality, scorecard systems are currently still useful tools for detecting patients with poor disease risk, with better prognostic capacity.
Pedro Ángel de Santos Castro mail , Carlos del Pozo Vegas mail , Leyre Teresa Pinilla Arribas mail , Daniel Zalama Sánchez mail , Ancor Sanz-García mail , Tony Giancarlo Vásquez del Águila mail , Pablo González Izquierdo mail , Sara de Santos Sánchez mail , Cristina Mazas Pérez-Oleaga mail cristina.mazas@uneatlantico.es, Irma Dominguez Azpíroz mail irma.dominguez@unini.edu.mx, Iñaki Elío Pascual mail inaki.elio@uneatlantico.es, Francisco Martín-Rodríguez mail ,
de Santos Castro