Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble
Artículo
Materias > Ingeniería
Materias > Psicología
Universidad Europea del Atlántico > Investigación > Producción Científica
Fundación Universitaria Internacional de Colombia > Investigación > Artículos y libros
Universidad Internacional Iberoamericana México > Investigación > Producción Científica
Universidad Internacional Iberoamericana Puerto Rico > Investigación > Producción Científica
Universidad Internacional do Cuanza > Investigación > Producción Científica
Abierto
Inglés
Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including predicting early psychological disorders and stress in individuals or the general public. A major challenge in predicting depression using social media posts is that the existing studies do not focus on predicting the intensity of depression in social media texts but rather only perform the binary classification of depression and moreover noisy data makes it difficult to predict the true depression in the social media text. This study intends to begin by collecting relevant Tweets and generating a corpus of 210000 public tweets using Twitter public application programming interfaces (APIs). A strategy is devised to filter out only depression-related tweets by creating a list of relevant hashtags to reduce noise in the corpus. Furthermore, an algorithm is developed to annotate the data into three depression classes: ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Further FastText-based model is applied and fine-tuned with different preprocessing techniques and hyperparameter tuning to produce the tuned model, which significantly increases the depression classification performance to an 84% F1 score and 90% accuracy compared to baselines. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost the model’s performance by combining several other classifiers and assigning weights to individual models according to their individual performances. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 of 89%, 5% higher than FastText alone, and an accuracy of 93%, 3% higher than FastText alone. The proposed model better captures the contextual features of the relatively small sample class and aids in the detection of early depression intensity prediction from tweets with impactful performances.
metadata
Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Rafiq, Maryam; Mehmood, Arif; Diez, Isabel de la Torre; Gracia Villar, Mónica; Garay, Helena y Ashraf, Imran
mail
SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, monica.gracia@uneatlantico.es, helena.garay@uneatlantico.es, SIN ESPECIFICAR
(2024)
Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble.
Computers, Materials & Continua, 78 (2).
pp. 2047-2066.
ISSN 1546-2226
Texto
TSP_CMC_37347.pdf Available under License Creative Commons Attribution. Descargar (861kB) |
Resumen
Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including predicting early psychological disorders and stress in individuals or the general public. A major challenge in predicting depression using social media posts is that the existing studies do not focus on predicting the intensity of depression in social media texts but rather only perform the binary classification of depression and moreover noisy data makes it difficult to predict the true depression in the social media text. This study intends to begin by collecting relevant Tweets and generating a corpus of 210000 public tweets using Twitter public application programming interfaces (APIs). A strategy is devised to filter out only depression-related tweets by creating a list of relevant hashtags to reduce noise in the corpus. Furthermore, an algorithm is developed to annotate the data into three depression classes: ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Further FastText-based model is applied and fine-tuned with different preprocessing techniques and hyperparameter tuning to produce the tuned model, which significantly increases the depression classification performance to an 84% F1 score and 90% accuracy compared to baselines. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost the model’s performance by combining several other classifiers and assigning weights to individual models according to their individual performances. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 of 89%, 5% higher than FastText alone, and an accuracy of 93%, 3% higher than FastText alone. The proposed model better captures the contextual features of the relatively small sample class and aids in the detection of early depression intensity prediction from tweets with impactful performances.
Tipo de Documento: | Artículo |
---|---|
Palabras Clave: | Depression classification; deep learning; FastText; machine learning |
Clasificación temática: | Materias > Ingeniería Materias > Psicología |
Divisiones: | Universidad Europea del Atlántico > Investigación > Producción Científica Fundación Universitaria Internacional de Colombia > Investigación > Artículos y libros Universidad Internacional Iberoamericana México > Investigación > Producción Científica Universidad Internacional Iberoamericana Puerto Rico > Investigación > Producción Científica Universidad Internacional do Cuanza > Investigación > Producción Científica |
Depositado: | 14 Mar 2024 23:30 |
Ultima Modificación: | 14 Mar 2024 23:30 |
URI: | https://repositorio.unincol.edu.co/id/eprint/11264 |
Acciones (logins necesarios)
Ver Objeto |
<a class="ep_document_link" href="/15333/1/nutrients-16-03907.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Background/Objectives: The diet quality of younger individuals is decreasing globally, with alarming trends also in the Mediterranean region. The aim of this study was to assess diet quality and adequacy in relation to country-specific dietary recommendations for children and adolescents living in the Mediterranean area. Methods: A cross-sectional survey was conducted of 2011 parents of the target population participating in the DELICIOUS EU-PRIMA project. Dietary data and cross-references with food-based recommendations and the application of the youth healthy eating index (YHEI) was assessed through 24 h recalls and food frequency questionnaires. Results: Adherence to recommendations on plant-based foods was low (less than ∼20%), including fruit and vegetables adequacy in all countries, legume adequacy in all countries except for Italy, and cereal adequacy in all countries except for Portugal. For animal products and dietary fats, the adequacy in relation to the national food-based dietary recommendations was slightly better (∼40% on average) in most countries, although the Eastern countries reported worse rates. Higher scores on the YHEI predicted adequacy in relation to vegetables (except Egypt), fruit (except Lebanon), cereals (except Spain), and legumes (except Spain) in most countries. Younger children (p < 0.005) reporting having 8–10 h adequate sleep duration (p < 0.001), <2 h/day screen time (p < 0.001), and a medium/high physical activity level (p < 0.001) displayed a better diet quality. Moreover, older respondents (p < 0.001) with a medium/high educational level (p = 0.001) and living with a partner (p = 0.003) reported that their children had a better diet quality. Conclusions: Plant-based food groups, including fruit, vegetables, legumes, and even (whole-grain) cereals are underrepresented in the diets of Mediterranean children and adolescents. Moreover, the adequate consumption of other important dietary components, such as milk and dairy products, is rather disregarded, leading to substantially suboptimal diets and poor adequacy in relation to dietary guidelines.
Francesca Giampieri mail francesca.giampieri@uneatlantico.es, Alice Rosi mail , Francesca Scazzina mail , Evelyn Frias-Toral mail , Osama Abdelkarim mail , Mohamed Aly mail , Raynier Zambrano-Villacres mail , Juancho Pons mail , Laura Vázquez-Araújo mail , Sandra Sumalla Cano mail sandra.sumalla@uneatlantico.es, Iñaki Elío Pascual mail inaki.elio@uneatlantico.es, Lorenzo Monasta mail , Ana Mata mail , María Isabel Pardo mail , Pablo Busó mail , Giuseppe Grosso mail ,
Giampieri
<a class="ep_document_link" href="/15640/1/s12911-024-02780-0.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Thyroid illness encompasses a range of disorders affecting the thyroid gland, leading to either hyperthyroidism or hypothyroidism, which can significantly impact metabolism and overall health. Hypothyroidism can cause a slowdown in bodily processes, leading to symptoms such as fatigue, weight gain, depression, and cold sensitivity. Hyperthyroidism can lead to increased metabolism, causing symptoms like rapid weight loss, anxiety, irritability, and heart palpitations. Prompt diagnosis and appropriate treatment are crucial in managing thyroid disorders and improving patients’ quality of life. Thyroid illness affects millions worldwide and can significantly impact their quality of life if left untreated. This research aims to propose an effective artificial intelligence-based approach for the early diagnosis of thyroid illness. An open-access thyroid disease dataset based on 3,772 male and female patient observations is used for this research experiment. This study uses the nominal continuous synthetic minority oversampling technique (SMOTE-NC) for data balancing and a fine-tuned light gradient booster machine (LGBM) technique to diagnose thyroid illness and handle class imbalance problems. The proposed SNL (SMOTE-NC-LGBM) approach outperformed the state-of-the-art approach with high-accuracy performance scores of 0.96. We have also applied advanced machine learning and deep learning methods for comparison to evaluate performance. Hyperparameter optimizations are also conducted to enhance thyroid diagnosis performance. In addition, we have applied the explainable Artificial Intelligence (XAI) mechanism based on Shapley Additive exPlanations (SHAP) to enhance the transparency and interpretability of the proposed method by analyzing the decision-making processes. The proposed research revolutionizes the diagnosis of thyroid disorders efficiently and helps specialties overcome thyroid disorders early.
Ali Raza mail , Fatma Eid mail , Elisabeth Caro Montero mail elizabeth.caro@uneatlantico.es, Irene Delgado Noya mail irene.delgado@uneatlantico.es, Imran Ashraf mail ,
Raza
<a href="/14584/1/s41598-024-73664-6.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
The evolution of the COVID-19 pandemic has been associated with variations in clinical presentation and severity. Similarly, prediction scores may suffer changes in their diagnostic accuracy. The aim of this study was to test the 30-day mortality predictive validity of the 4C and SEIMC scores during the sixth wave of the pandemic and to compare them with those of validation studies. This was a longitudinal retrospective observational study. COVID-19 patients who were admitted to the Emergency Department of a Spanish hospital from December 15, 2021, to January 31, 2022, were selected. A side-by-side comparison with the pivotal validation studies was subsequently performed. The main measures were 30-day mortality and the 4C and SEIMC scores. A total of 27,614 patients were considered in the study, including 22,361 from the 4C, 4,627 from the SEIMC and 626 from our hospital. The 30-day mortality rate was significantly lower than that reported in the validation studies. The AUCs were 0.931 (95% CI: 0.90–0.95) for 4C and 0.903 (95% CI: 086–0.93) for SEIMC, which were significantly greater than those obtained in the first wave. Despite the changes that have occurred during the coronavirus disease 2019 (COVID-19) pandemic, with a reduction in lethality, scorecard systems are currently still useful tools for detecting patients with poor disease risk, with better prognostic capacity.
Pedro Ángel de Santos Castro mail , Carlos del Pozo Vegas mail , Leyre Teresa Pinilla Arribas mail , Daniel Zalama Sánchez mail , Ancor Sanz-García mail , Tony Giancarlo Vásquez del Águila mail , Pablo González Izquierdo mail , Sara de Santos Sánchez mail , Cristina Mazas Pérez-Oleaga mail cristina.mazas@uneatlantico.es, Irma Dominguez Azpíroz mail irma.dominguez@unini.edu.mx, Iñaki Elío Pascual mail inaki.elio@uneatlantico.es, Francisco Martín-Rodríguez mail ,
de Santos Castro
<a href="/14950/1/fmicb-15-1481418.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Background: The 2023 dengue outbreak has proven that dengue is not only an endemic disease but also an emerging health threat in Bangladesh. Integrated studies on the epidemiology, clinical characteristics, seasonality, and genotype of dengue are limited. This study was conducted to determine recent trends in the molecular epidemiology, clinical features, and seasonality of dengue outbreaks. Methods: We analyzed data from 41 original studies, extracting epidemiological information from all 41 articles, clinical symptoms from 30 articles, and genotypic diversity from 11 articles. The study adhered to the standards of the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) Statement and Cochrane Collaboration guidelines. Conclusion: This study provides integrated insights into the molecular epidemiology, clinical features, seasonality, and transmission of dengue in Bangladesh and highlights research gaps for future studies.
Nadim Sharif mail , Rubayet Rayhan Opu mail , Tama Saha mail , Abdullah Ibna Masud mail , Jannatin Naim mail , Khalaf F. Alsharif mail , Khalid J. Alzahrani mail , Eduardo René Silva Alvarado mail eduardo.silva@funiber.org, Irene Delgado Noya mail irene.delgado@uneatlantico.es, Isabel De la Torre Díez mail , Shuvra Kanti Dey mail ,
Sharif
<a href="/15624/1/s41598-024-73664-6%20%281%29.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
The evolution of the COVID-19 pandemic has been associated with variations in clinical presentation and severity. Similarly, prediction scores may suffer changes in their diagnostic accuracy. The aim of this study was to test the 30-day mortality predictive validity of the 4C and SEIMC scores during the sixth wave of the pandemic and to compare them with those of validation studies. This was a longitudinal retrospective observational study. COVID-19 patients who were admitted to the Emergency Department of a Spanish hospital from December 15, 2021, to January 31, 2022, were selected. A side-by-side comparison with the pivotal validation studies was subsequently performed. The main measures were 30-day mortality and the 4C and SEIMC scores. A total of 27,614 patients were considered in the study, including 22,361 from the 4C, 4,627 from the SEIMC and 626 from our hospital. The 30-day mortality rate was significantly lower than that reported in the validation studies. The AUCs were 0.931 (95% CI: 0.90–0.95) for 4C and 0.903 (95% CI: 086–0.93) for SEIMC, which were significantly greater than those obtained in the first wave. Despite the changes that have occurred during the coronavirus disease 2019 (COVID-19) pandemic, with a reduction in lethality, scorecard systems are currently still useful tools for detecting patients with poor disease risk, with better prognostic capacity.
Pedro Ángel de Santos Castro mail , Carlos del Pozo Vegas mail , Leyre Teresa Pinilla Arribas mail , Daniel Zalama Sánchez mail , Ancor Sanz-García mail , Tony Giancarlo Vásquez del Águila mail , Pablo González Izquierdo mail , Sara de Santos Sánchez mail , Cristina Mazas Pérez-Oleaga mail cristina.mazas@uneatlantico.es, Irma Dominguez Azpíroz mail irma.dominguez@unini.edu.mx, Iñaki Elío Pascual mail inaki.elio@uneatlantico.es, Francisco Martín-Rodríguez mail ,
de Santos Castro