Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble
Article
Open access
English
Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including the early prediction of psychological disorders and stress in individuals or the general public. A major challenge is that existing studies do not predict the intensity of depression in social media texts; they perform only binary depression classification. Moreover, noisy data make it difficult to identify genuine depression in social media text. This study begins by collecting a corpus of 210,000 public tweets through the Twitter public application programming interfaces (APIs). To reduce noise, the corpus is filtered down to depression-related tweets using a list of relevant hashtags. An algorithm is then developed to annotate the data into three depression classes, ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on the depression diagnostic criteria of the International Classification of Diseases, 10th Revision (ICD-10). Several baseline classifiers are applied to the annotated dataset to obtain a preliminary estimate of classification performance on the corpus. Next, a FastText-based model is fine-tuned with different preprocessing techniques and hyperparameter tuning; the tuned model raises depression classification performance to an 84% F1 score and 90% accuracy, substantially above the baselines. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to further boost performance by combining several other classifiers and weighting each model according to its individual performance. The proposed WSVE outperforms all baselines as well as FastText alone, reaching an F1 score of 89% (5 percentage points higher than FastText alone) and an accuracy of 93% (3 percentage points higher than FastText alone).
The proposed model better captures the contextual features of the class with relatively few samples and supports the early prediction of depression intensity from tweets.
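Two steps named in the abstract, hashtag-based filtering and weighted soft voting, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hashtag list and the model names `logreg` and `svm` are hypothetical, and the weighting rule (each model's weight proportional to its validation F1 score, normalized to sum to 1) is only one plausible reading of "assigning weights to individual models according to their individual performances". Soft voting averages the class-probability vectors produced by each model rather than counting hard labels.

```python
from typing import Dict, List, Sequence, Set

CLASSES = ("Mild", "Moderate", "Severe")

# Hypothetical hashtag list for illustration; the paper's actual list is not given here.
DEPRESSION_HASHTAGS: Set[str] = {"#depression", "#depressed", "#mentalhealth"}


def is_depression_related(tweet: str, hashtags: Set[str] = DEPRESSION_HASHTAGS) -> bool:
    """Keep a tweet only if it contains at least one listed hashtag (case-insensitive)."""
    tokens = {tok.lower().rstrip(".,!?") for tok in tweet.split()}
    return not tokens.isdisjoint(hashtags)


def performance_weights(f1_scores: Dict[str, float]) -> Dict[str, float]:
    """Assumed rule: weight proportional to each model's validation F1, summing to 1."""
    total = sum(f1_scores.values())
    return {name: score / total for name, score in f1_scores.items()}


def weighted_soft_vote(probas: Dict[str, Sequence[float]],
                       weights: Dict[str, float]) -> List[float]:
    """Weighted average of each model's class-probability vector (ordered as CLASSES)."""
    combined = [0.0] * len(CLASSES)
    for name, p in probas.items():
        for k, pk in enumerate(p):
            combined[k] += weights[name] * pk
    return combined


def predict(probas: Dict[str, Sequence[float]], weights: Dict[str, float]) -> str:
    """Return the class with the highest weighted average probability."""
    combined = weighted_soft_vote(probas, weights)
    best = max(range(len(CLASSES)), key=lambda k: combined[k])
    return CLASSES[best]


if __name__ == "__main__":
    # FastText's 0.84 F1 comes from the abstract; the other two scores are made up.
    weights = performance_weights({"fasttext": 0.84, "logreg": 0.70, "svm": 0.66})
    # Hypothetical per-model probabilities for one tweet: [Mild, Moderate, Severe].
    probas = {
        "fasttext": [0.2, 0.5, 0.3],
        "logreg": [0.1, 0.3, 0.6],
        "svm": [0.2, 0.2, 0.6],
    }
    print(is_depression_related("Can't sleep again #Depression"))  # True
    print(predict(probas, weights))  # Severe
```

In this toy case the ensemble predicts ‘Severe’ even though FastText alone favors ‘Moderate’, because the two other (hypothetical) models place most of their probability mass on ‘Severe’. Using probabilities instead of hard votes lets a confident model outweigh an uncertain one.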
Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Rafiq, Maryam; Mehmood, Arif; Diez, Isabel de la Torre; Gracia Villar, Mónica; Garay, Helena and Ashraf, Imran
Contact emails: Mónica Gracia Villar (monica.gracia@uneatlantico.es), Helena Garay (helena.garay@uneatlantico.es); not specified for the other authors.
(2024)
Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble.
Computers, Materials & Continua, 78 (2).
pp. 2047-2066.
ISSN 1546-2226
Text
TSP_CMC_37347.pdf, available under a Creative Commons Attribution license (861 kB)
Document type: | Article |
---|---|
Keywords: | Depression classification; deep learning; FastText; machine learning |
Subjects: | Subjects > Engineering; Subjects > Psychology |
Divisions: | Universidad Europea del Atlántico > Research > Scientific Output; Fundación Universitaria Internacional de Colombia > Research > Articles and books; Universidad Internacional Iberoamericana México > Research > Scientific Output; Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Output; Universidad Internacional do Cuanza > Research > Scientific Output |
Deposited: | 14 Mar 2024 23:30 |
Last modified: | 14 Mar 2024 23:30 |
URI: | https://repositorio.unincol.edu.co/id/eprint/11264 |
The evolution of the COVID-19 pandemic has been associated with variations in clinical presentation and severity, and prediction scores may likewise change in diagnostic accuracy. The aim of this study was to test the 30-day mortality predictive validity of the 4C and SEIMC scores during the sixth wave of the pandemic and to compare it with that reported in the validation studies. This was a longitudinal retrospective observational study. COVID-19 patients admitted to the Emergency Department of a Spanish hospital from December 15, 2021, to January 31, 2022, were selected, and a side-by-side comparison with the pivotal validation studies was then performed. The main measures were 30-day mortality and the 4C and SEIMC scores. A total of 27,614 patients were considered: 22,361 from the 4C validation cohort, 4,627 from the SEIMC validation cohort, and 626 from our hospital. The 30-day mortality rate was significantly lower than that reported in the validation studies. The AUCs were 0.931 (95% CI: 0.90–0.95) for 4C and 0.903 (95% CI: 0.86–0.93) for SEIMC, significantly greater than those obtained in the first wave. Despite the changes during the coronavirus disease 2019 (COVID-19) pandemic, including reduced lethality, these scoring systems remain useful tools for identifying patients at risk of poor outcomes, with better prognostic capacity than in the first wave.
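The AUC values above summarize how well each score ranks patients by 30-day mortality risk. As an illustration only (toy numbers, not the study's data), the AUC of any risk score can be computed directly from its probabilistic definition, which is equivalent to the normalized Mann-Whitney U statistic: an AUC of 0.931 means that, across all pairs of one non-survivor and one survivor, the non-survivor had the higher score about 93% of the time.

```python
from typing import Sequence


def auc(scores: Sequence[float], died: Sequence[bool]) -> float:
    """ROC AUC as the probability that a randomly chosen non-survivor has a higher
    risk score than a randomly chosen survivor (ties count as half)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy risk scores and 30-day outcomes, not data from the study.
    scores = [3, 8, 12, 15, 12, 13]
    died = [False, False, True, True, False, False]
    print(auc(scores, died))  # 0.8125
```

The pairwise loop is quadratic and meant only to make the definition concrete; statistical packages compute the same quantity from sorted ranks.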
Pedro Ángel de Santos Castro, Carlos del Pozo Vegas, Leyre Teresa Pinilla Arribas, Daniel Zalama Sánchez, Ancor Sanz-García, Tony Giancarlo Vásquez del Águila, Pablo González Izquierdo, Sara de Santos Sánchez, Cristina Mazas Pérez-Oleaga (cristina.mazas@uneatlantico.es), Irma Dominguez Azpíroz (irma.dominguez@unini.edu.mx), Iñaki Elío Pascual (inaki.elio@uneatlantico.es), Francisco Martín-Rodríguez
Non-Insulin-Dependent Diabetes Mellitus (NIDDM) is a chronic health condition characterized by high blood sugar levels; if not treated early, it can lead to serious complications such as blindness. Human Activity Recognition (HAR) offers potential for early NIDDM diagnosis, which is emerging as a key application of HAR technology. This research introduces DiabSense, a smartphone-based system for early staging of NIDDM. DiabSense combines HAR and Diabetic Retinopathy (DR) detection by leveraging two different Graph Neural Networks (GNNs). The HAR component covers 23 human activities that resemble diabetes symptoms, and DR is a prevalent complication of NIDDM. A Graph Attention Network (GAT) for HAR achieved 98.32% accuracy on sensor data, while a Graph Convolutional Network (GCN) scored 84.48% on the Aptos 2019 dataset, surpassing other state-of-the-art models. The trained GCN analyzed retinal images of four experimental human subjects to generate DR reports, and the GAT produced each subject's average duration of daily activities over 30 days. The daily activities of diabetic patients during their non-diabetic periods were measured and compared with those of the experimental subjects to generate risk factors. Fusing these risk factors with DR conditions enabled early diagnosis recommendations for the experimental subjects despite the absence of any apparent symptoms. DiabSense's outcomes for the experimental subjects were compared with their clinical diagnosis reports based on the A1C test, and the results confirmed that the system accurately assessed the subjects' need for early diagnosis. Overall, DiabSense shows significant potential for ensuring early NIDDM treatment and improving millions of lives worldwide.
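The abstract credits a Graph Attention Network for activity recognition. The core update such a layer performs, learning how much each neighboring node contributes to a node's new representation, can be sketched in plain Python. This is a generic single-head attention step on a hypothetical toy graph: the weight matrix `W` and attention vector `a` are fixed here rather than learned, and how DiabSense builds graphs from smartphone sensor data, along with its layer sizes, is not described in this record.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows = output dims) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(h: Dict[int, Vector], adj: Dict[int, List[int]],
              W: Matrix, a: Vector) -> Dict[int, Vector]:
    """One single-head graph attention layer (output nonlinearity omitted).

    For each node i: z_j = W h_j; e_ij = LeakyReLU(a . [z_i || z_j]) for j in
    N(i) plus i itself; alpha_ij = softmax over j of e_ij; h'_i = sum_j alpha_ij z_j.
    """
    z = {i: matvec(W, hi) for i, hi in h.items()}
    out: Dict[int, Vector] = {}
    for i in h:
        neigh = [i] + [j for j in adj.get(i, []) if j != i]  # add self-loop
        # z[i] + z[j] concatenates the two lists, i.e. [z_i || z_j].
        e = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in neigh]
        m = max(e)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in e]
        total = sum(exps)
        alpha = [x / total for x in exps]
        out[i] = [sum(al * z[j][k] for al, j in zip(alpha, neigh)) for k in range(len(W))]
    return out


if __name__ == "__main__":
    # Hypothetical 3-node graph with 2-D features; not DiabSense's sensor graph.
    h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
    adj = {0: [1], 1: [0, 2], 2: [1]}
    W = [[1.0, 0.0], [0.0, 1.0]]
    a = [0.5, -0.5, 1.0, 0.25]
    for node, vec in gat_layer(h, adj, W, a).items():
        print(node, [round(x, 3) for x in vec])
```

With an all-zero attention vector every neighbor gets equal weight and the layer reduces to neighborhood averaging; a trained `a` instead lets the network emphasize the most informative neighbors, which is the property that distinguishes attention from plain graph convolution.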
Md Nuho Ul Alam, Ibrahim Hasnine, Erfanul Hoque Bahadur, Abdul Kadar Muhammad Masum, Mercedes Briones Urbano (mercedes.briones@uneatlantico.es), Manuel Masías Vergara (manuel.masias@uneatlantico.es), Jia Uddin, Imran Ashraf, Md. Abdus Samad
Emergency medical services (EMSs) face critical situations that require classifying patient risk from analytical and vital signs. We aimed to establish clustering-derived phenotypes, based on prehospital analytical and vital signs, that allow risk stratification. This was a prospective, multicenter, EMS-delivered, ambulance-based cohort study involving six advanced life support units, 38 basic life support units, and four tertiary hospitals in Spain. Adults with unselected acute diseases who were managed by the EMS and evacuated with discharge priority to emergency departments were included between January 1, 2020, and June 30, 2023. Prehospital point-of-care testing and on-scene vital signs were used as inputs to an unsupervised machine learning method (clustering) to determine the phenotypes, which were then compared against the primary outcome: cumulative all-cause mortality at 2, 7, and 30 days. A total of 7909 patients were included. The median (IQR) age was 64 (51–80) years, 41% were women, and 26% lived in rural areas. Three clusters were identified: alpha, 16.2% (1281 patients); beta, 28.8% (2279); and gamma, 55% (4349). Mortality rates for alpha, beta, and gamma were 18.6%, 4.1%, and 0.8% at 2 days; 24.7%, 6.2%, and 1.7% at 7 days; and 33%, 10.2%, and 3.2% at 30 days, respectively. Based on standard vital signs and blood test biomarkers in the prehospital setting, the three clusters correspond to high risk (alpha), medium risk (beta), and low risk (gamma). This allows the EMS system to quickly identify potentially compromised patients and to proactively implement the necessary interventions.
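The abstract states only that an unsupervised clustering method was applied to prehospital point-of-care tests and vital signs; it does not name the algorithm. Assuming k-means purely for illustration, the assignment and update steps look like this, followed by a check that the reported cluster sizes reproduce the reported percentages (1281 + 2279 + 4349 = 7909 patients). The feature values below are hypothetical toy numbers, not study data.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means starting from given centroids; returns (labels, centroids)."""
    centroids: List[Point] = [tuple(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids: List[Point] = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, new_labels) if lab == k]
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else c)
        converged = new_labels == labels  # unchanged assignments => centroids are stable
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids


def cluster_shares(counts: Sequence[int]) -> List[float]:
    """Percentage of patients in each cluster, rounded to one decimal place."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


if __name__ == "__main__":
    # Hypothetical standardized features per patient (two toy biomarkers).
    pts = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (2.0, 2.1), (2.2, 1.9), (5.0, 5.2), (5.1, 4.9)]
    labels, _ = kmeans(pts, init=[(0.0, 0.0), (2.0, 2.0), (5.0, 5.0)])
    print(labels)  # [0, 0, 0, 1, 1, 2, 2]
    # Cluster sizes reported in the abstract (alpha, beta, gamma).
    print(cluster_shares([1281, 2279, 4349]))  # [16.2, 28.8, 55.0]
```

A real analysis would standardize each variable and choose the number of clusters with a criterion such as silhouette scores; fixed initial centroids are used here only to keep the toy run deterministic.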
Raúl López-Izquierdo, Carlos del Pozo Vegas, Ancor Sanz-García, Agustín Mayo Íscar, Miguel A. Castro Villamor, Eduardo René Silva Alvarado (eduardo.silva@funiber.org), Santos Gracia Villar (santos.gracia@uneatlantico.es), Luis Alonso Dzul López (luis.dzul@uneatlantico.es), Silvia Aparicio Obregón (silvia.aparicio@uneatlantico.es), Rubén Calderón Iglesias (ruben.calderon@uneatlantico.es), Joan B. Soriano, Francisco Martín-Rodríguez
Novel model to authenticate role-based medical users for blockchain-based IoMT devices
The IoT (Internet of Things) has played a promising role in e-healthcare applications during the last decade. Medical sensors record a variety of data and transmit them over the IoT network to facilitate remote patient monitoring. When a patient visits a hospital, medical devices may need to be connected to or disconnected from the medical healthcare system frequently. In addition, multiple entities (e.g., doctors and medical staff) need access to patient data, and each requires a distinct subset of it. Because medical devices are dynamic and medical users need frequent data access, complex security concerns arise. Granting access to an entire data set creates privacy issues, while granting each medical user access rights to a specific set of medical data individually is a tedious task. To provide role-based access for medical users, this study proposes a blockchain-based framework that authenticates multiple entities based on a trust domain, reducing the administrative burden. The framework is further validated by simulation on the Infura blockchain using Solidity and Python. The results demonstrate that role-based authorization and multi-entity authentication are implemented, that the owner of medical data can control access rights at any time, and that medical users can easily be granted access to a set of data in a healthcare system. The system has lower latency than existing blockchain systems that lack multi-entity authentication and role-based authorization.
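The role-based authorization idea can be illustrated off-chain. The sketch below is a hypothetical Python model of the access rules only: the role names, data categories, and the `PatientRecordACL` class are invented for illustration, and it does not model the blockchain, the trust domain, or the multi-entity authentication described in the study. The data owner assigns each user a role, each role maps to a fixed set of data categories, and the owner can revoke access at any time.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical role -> data-category policy; the study's actual roles and
# data partitions are not given in this abstract.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "doctor": {"vitals", "lab_results", "prescriptions"},
    "nurse": {"vitals"},
    "lab_technician": {"lab_results"},
}


@dataclass
class PatientRecordACL:
    owner: str
    roles: Dict[str, str] = field(default_factory=dict)  # user id -> role name

    def assign_role(self, caller: str, user: str, role: str) -> None:
        """Only the data owner may grant a role to a user."""
        if caller != self.owner:
            raise PermissionError("only the data owner can assign roles")
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self.roles[user] = role

    def revoke(self, caller: str, user: str) -> None:
        """Only the data owner may revoke a user's role (no-op if none)."""
        if caller != self.owner:
            raise PermissionError("only the data owner can revoke access")
        self.roles.pop(user, None)

    def can_read(self, user: str, category: str) -> bool:
        """Owner reads everything; others read only what their role allows."""
        if user == self.owner:
            return True
        role = self.roles.get(user)
        return role is not None and category in ROLE_PERMISSIONS[role]


if __name__ == "__main__":
    acl = PatientRecordACL(owner="patient-1")
    acl.assign_role("patient-1", "dr-a", "doctor")
    acl.assign_role("patient-1", "nurse-b", "nurse")
    print(acl.can_read("nurse-b", "lab_results"))  # False
    acl.revoke("patient-1", "dr-a")
    print(acl.can_read("dr-a", "vitals"))  # False
```

Attaching permissions to roles rather than to individual users is what removes the per-user granting burden the abstract describes: adding a new nurse takes one role assignment, not a separate grant for every data item.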
Shadab Alam, Muhammad Shehzad Aslam, Ayesha Altaf, Faiza Iqbal, Natasha Nigar, Juan Castanedo Galán (juan.castanedo@uneatlantico.es), Daniel Gavilanes Aray (daniel.gavilanes@uneatlantico.es), Isabel de la Torre Díez, Imran Ashraf