Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble

Open access. Language: English.

Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Rafiq, Maryam; Mehmood, Arif; Diez, Isabel de la Torre; Gracia Villar, Mónica (monica.gracia@uneatlantico.es); Garay, Helena (helena.garay@uneatlantico.es); Ashraf, Imran (2024) Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble. Computers, Materials & Continua, 78 (2). pp. 2047-2066. ISSN 1546-2226

Text: TSP_CMC_37347.pdf (861kB)
Available under License Creative Commons Attribution.

Abstract

Predicting depression intensity from microblogs and social media posts has numerous applications, including early detection of psychological disorders and stress in individuals or the general public. A major challenge is that existing studies perform only binary classification of depression rather than predicting its intensity, and noisy data makes it difficult to identify genuine depression in social media text. This study begins by collecting relevant tweets and generating a corpus of 210,000 public tweets using Twitter's public application programming interfaces (APIs). To reduce noise in the corpus, a list of relevant hashtags is created and used to retain only depression-related tweets. Furthermore, an algorithm is developed to annotate the data into three depression classes, ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on the International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. A FastText-based model is then applied and tuned with different preprocessing techniques and hyperparameter settings, which raises depression classification performance to an 84% F1 score and 90% accuracy, a significant improvement over the baselines. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost performance by combining several other classifiers and weighting each model according to its individual performance. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 score of 89% (5 percentage points higher than FastText alone) and an accuracy of 93% (3 percentage points higher). The proposed model better captures the contextual features of the relatively small sample class and aids early prediction of depression intensity from tweets.
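
The weighted soft voting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state how the weights are derived or which classifiers are combined, so the three models, their probability outputs, and the use of validation F1 scores as weights are all hypothetical values chosen for illustration. Only the class names come from the abstract.

```python
import numpy as np

# Class names taken from the abstract.
CLASSES = ["Mild", "Moderate", "Severe"]


def weighted_soft_vote(probas, weights):
    """Average per-model class probabilities using normalized model weights.

    probas  : list of arrays, each of shape (n_samples, n_classes)
    weights : one non-negative weight per model (assumption: e.g. validation F1)
    Returns the averaged probabilities and the index of the winning class.
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in probas])
    w = np.asarray(weights, dtype=float)
    if w.shape[0] != stacked.shape[0]:
        raise ValueError("need exactly one weight per model")
    w = w / w.sum()  # normalize so the weights sum to 1
    # Contract the model axis: (n_models,) x (n_models, n_samples, n_classes)
    avg = np.tensordot(w, stacked, axes=1)
    return avg, avg.argmax(axis=1)


# Hypothetical probability outputs of three models for two tweets.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
model_c = [[0.3, 0.5, 0.2], [0.2, 0.2, 0.6]]

# Hypothetical weights, assumed here to be each model's validation F1 score.
weights = [0.84, 0.80, 0.70]

avg, idx = weighted_soft_vote([model_a, model_b, model_c], weights)
labels = [CLASSES[i] for i in idx]
print(labels)  # ['Mild', 'Severe']
```

In this toy example the second tweet is labelled ‘Severe’ even though the highest-weighted model prefers ‘Moderate’, because the two other models together carry more weight. This is the behaviour that distinguishes weighted soft voting from simply trusting the single best model.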

Item Type: Article
Uncontrolled Keywords: Depression classification; deep learning; FastText; machine learning
Subjects: Subjects > Engineering
Subjects > Psychology
Divisions: Europe University of Atlantic > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Articles and books
Ibero-american International University > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
Date Deposited: 14 Mar 2024 23:30
Last Modified: 14 Mar 2024 23:30
URI: https://repositorio.unincol.edu.co/id/eprint/11264


Open access. Language: English. File: /15333/1/nutrients-16-03907.pdf

Youth Healthy Eating Index (YHEI) and Diet Adequacy in Relation to Country-Specific National Dietary Recommendations in Children and Adolescents in Five Mediterranean Countries from the DELICIOUS Project

Background/Objectives: The diet quality of younger individuals is decreasing globally, with alarming trends also in the Mediterranean region. The aim of this study was to assess diet quality and adequacy in relation to country-specific dietary recommendations for children and adolescents living in the Mediterranean area. Methods: A cross-sectional survey was conducted of 2011 parents of the target population participating in the DELICIOUS EU-PRIMA project. Dietary data, cross-references with food-based recommendations, and the application of the youth healthy eating index (YHEI) were assessed through 24 h recalls and food frequency questionnaires. Results: Adherence to recommendations on plant-based foods was low (less than ∼20%), including fruit and vegetable adequacy in all countries, legume adequacy in all countries except for Italy, and cereal adequacy in all countries except for Portugal. For animal products and dietary fats, the adequacy in relation to the national food-based dietary recommendations was slightly better (∼40% on average) in most countries, although the Eastern countries reported worse rates. Higher scores on the YHEI predicted adequacy in relation to vegetables (except Egypt), fruit (except Lebanon), cereals (except Spain), and legumes (except Spain) in most countries. Younger children (p < 0.005) and those reporting an adequate sleep duration of 8–10 h (p < 0.001), <2 h/day of screen time (p < 0.001), and a medium/high physical activity level (p < 0.001) displayed a better diet quality. Moreover, older respondents (p < 0.001) with a medium/high educational level (p = 0.001) and living with a partner (p = 0.003) reported that their children had a better diet quality. Conclusions: Plant-based food groups, including fruit, vegetables, legumes, and even (whole-grain) cereals, are underrepresented in the diets of Mediterranean children and adolescents. Moreover, the adequate consumption of other important dietary components, such as milk and dairy products, is rather disregarded, leading to substantially suboptimal diets and poor adequacy in relation to dietary guidelines.

Scientific Production

Francesca Giampieri (francesca.giampieri@uneatlantico.es), Alice Rosi, Francesca Scazzina, Evelyn Frias-Toral, Osama Abdelkarim, Mohamed Aly, Raynier Zambrano-Villacres, Juancho Pons, Laura Vázquez-Araújo, Sandra Sumalla Cano (sandra.sumalla@uneatlantico.es), Iñaki Elío Pascual (inaki.elio@uneatlantico.es), Lorenzo Monasta, Ana Mata, María Isabel Pardo, Pablo Busó, Giuseppe Grosso

Open access. Language: English. File: /14584/1/s41598-024-73664-6.pdf

Performance of the 4C and SEIMC scoring systems in predicting mortality from onset to current COVID-19 pandemic in emergency departments

The evolution of the COVID-19 pandemic has been associated with variations in clinical presentation and severity. Similarly, the diagnostic accuracy of prediction scores may have changed. The aim of this study was to test the 30-day mortality predictive validity of the 4C and SEIMC scores during the sixth wave of the pandemic and to compare them with those of validation studies. This was a longitudinal retrospective observational study. COVID-19 patients who were admitted to the Emergency Department of a Spanish hospital from December 15, 2021, to January 31, 2022, were selected. A side-by-side comparison with the pivotal validation studies was subsequently performed. The main measures were 30-day mortality and the 4C and SEIMC scores. A total of 27,614 patients were considered in the study, including 22,361 from the 4C, 4,627 from the SEIMC and 626 from our hospital. The 30-day mortality rate was significantly lower than that reported in the validation studies. The AUCs were 0.931 (95% CI: 0.90–0.95) for 4C and 0.903 (95% CI: 0.86–0.93) for SEIMC, which were significantly greater than those obtained in the first wave. Despite the changes that have occurred during the coronavirus disease 2019 (COVID-19) pandemic, including a reduction in lethality, these scoring systems remain useful tools for detecting patients at risk of a poor outcome, and their prognostic capacity has improved.

Scientific Production

Pedro Ángel de Santos Castro, Carlos del Pozo Vegas, Leyre Teresa Pinilla Arribas, Daniel Zalama Sánchez, Ancor Sanz-García, Tony Giancarlo Vásquez del Águila, Pablo González Izquierdo, Sara de Santos Sánchez, Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Irma Dominguez Azpíroz (irma.dominguez@unini.edu.mx), Iñaki Elío Pascual (inaki.elio@uneatlantico.es), Francisco Martín-Rodríguez

Open access. Language: English. File: /14950/1/fmicb-15-1481418.pdf

Evolving epidemiology, clinical features, and genotyping of dengue outbreaks in Bangladesh, 2000–2024: a systematic review

Background: The 2023 dengue outbreak has proven that dengue is not only an endemic disease but also an emerging health threat in Bangladesh. Integrated studies on the epidemiology, clinical characteristics, seasonality, and genotype of dengue are limited. This study was conducted to determine recent trends in the molecular epidemiology, clinical features, and seasonality of dengue outbreaks. Methods: We analyzed data from 41 original studies, extracting epidemiological information from all 41 articles, clinical symptoms from 30 articles, and genotypic diversity from 11 articles. The study adhered to the standards of the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) Statement and Cochrane Collaboration guidelines. Conclusion: This study provides integrated insights into the molecular epidemiology, clinical features, seasonality, and transmission of dengue in Bangladesh and highlights research gaps for future studies.

Scientific Production

Nadim Sharif, Rubayet Rayhan Opu, Tama Saha, Abdullah Ibna Masud, Jannatin Naim, Khalaf F. Alsharif, Khalid J. Alzahrani, Eduardo René Silva Alvarado (eduardo.silva@funiber.org), Irene Delgado Noya (irene.delgado@uneatlantico.es), Isabel De la Torre Díez, Shuvra Kanti Dey

Open access. Language: English. File: /14282/1/s40537-024-00959-w.pdf

DiabSense: early diagnosis of non-insulin-dependent diabetes mellitus using smartphone-based human activity recognition and diabetic retinopathy analysis with Graph Neural Network

Non-Insulin-Dependent Diabetes Mellitus (NIDDM) is a chronic health condition caused by high blood sugar levels, and if not treated early, it can lead to serious complications, e.g., blindness. Human Activity Recognition (HAR) offers potential for early NIDDM diagnosis, emerging as a key application for HAR technology. This research introduces DiabSense, a state-of-the-art smartphone-dependent system for early staging of NIDDM. DiabSense combines HAR and Diabetic Retinopathy (DR) analysis using two different Graph Neural Networks (GNNs). HAR uses a comprehensive array of 23 human activities resembling diabetes symptoms, and DR is a prevalent complication of NIDDM. A Graph Attention Network (GAT) for HAR achieved 98.32% accuracy on sensor data, while a Graph Convolutional Network (GCN) scored 84.48% on the APTOS 2019 dataset, surpassing other state-of-the-art models. The trained GCN analyzed retinal images of four experimental human subjects for DR report generation, and the GAT generated their average duration of daily activities over 30 days. The daily activities in non-diabetic periods of diabetic patients were measured and compared with the daily activities of the experimental subjects, which helped generate risk factors. Fusing risk factors with DR conditions enabled early diagnosis recommendations for the experimental subjects despite the absence of any apparent symptoms. The DiabSense system outcome was compared with clinical diagnosis reports for the experimental subjects using the A1C test. The test results confirmed that the system accurately assessed the early diagnosis requirements of the experimental subjects. Overall, DiabSense exhibits significant potential for ensuring early NIDDM treatment, improving millions of lives worldwide.

Scientific Production

Md Nuho Ul Alam, Ibrahim Hasnine, Erfanul Hoque Bahadur, Abdul Kadar Muhammad Masum, Mercedes Briones Urbano (mercedes.briones@uneatlantico.es), Manuel Masías Vergara (manuel.masias@uneatlantico.es), Jia Uddin, Imran Ashraf, Md. Abdus Samad

Open access. Language: English. File: /14278/1/s41746-024-01194-6.pdf

Clinical phenotypes and short-term outcomes based on prehospital point-of-care testing and on-scene vital signs

Emergency medical services (EMSs) face critical situations that require patient risk classification based on analytical and vital signs. We aimed to establish clustering-derived phenotypes based on prehospital analytical and vital signs that allow risk stratification. This was a prospective, multicenter, EMS-delivered, ambulance-based cohort study considering six advanced life support units, 38 basic life support units, and four tertiary hospitals in Spain. Adults with unselected acute diseases managed by the EMS and evacuated with discharge priority to emergency departments were considered between January 1, 2020, and June 30, 2023. Prehospital point-of-care testing and on-scene vital signs were used for the unsupervised machine learning method (clustering) to determine the phenotypes. The phenotypes were then compared with respect to the primary outcome, cumulative all-cause mortality at 2, 7, and 30 days. A total of 7909 patients were included. The median (IQR) age was 64 (51–80) years, 41% were women, and 26% were living in rural areas. Three clusters were identified: alpha 16.2% (1281 patients), beta 28.8% (2279), and gamma 55% (4349). The mortality rates for alpha, beta, and gamma were 18.6%, 4.1%, and 0.8% at 2 days; 24.7%, 6.2%, and 1.7% at 7 days; and 33%, 10.2%, and 3.2% at 30 days, respectively. Based on standard vital signs and blood test biomarkers in the prehospital scenario, three clusters were identified: alpha (high-risk), beta (medium-risk), and gamma (low-risk). This permits the EMS system to quickly identify patients who are potentially compromised and to proactively implement the necessary interventions.

Scientific Production

Raúl López-Izquierdo, Carlos del Pozo Vegas, Ancor Sanz-García, Agustín Mayo Íscar, Miguel A. Castro Villamor, Eduardo René Silva Alvarado (eduardo.silva@funiber.org), Santos Gracia Villar (santos.gracia@uneatlantico.es), Luis Alonso Dzul López (luis.dzul@uneatlantico.es), Silvia Aparicio Obregón (silvia.aparicio@uneatlantico.es), Rubén Calderón Iglesias (ruben.calderon@uneatlantico.es), Joan B. Soriano, Francisco Martín-Rodríguez