Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble

Artículo Materias > Ingeniería
Materias > Psicología Universidad Europea del Atlántico > Investigación > Producción Científica
Fundación Universitaria Internacional de Colombia > Investigación > Artículos y libros
Universidad Internacional Iberoamericana México > Investigación > Producción Científica
Universidad Internacional Iberoamericana Puerto Rico > Investigación > Producción Científica
Universidad Internacional do Cuanza > Investigación > Producción Científica Abierto Inglés Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including predicting early psychological disorders and stress in individuals or the general public. A major challenge in predicting depression using social media posts is that the existing studies do not focus on predicting the intensity of depression in social media texts but rather only perform the binary classification of depression and moreover noisy data makes it difficult to predict the true depression in the social media text. This study intends to begin by collecting relevant Tweets and generating a corpus of 210000 public tweets using Twitter public application programming interfaces (APIs). A strategy is devised to filter out only depression-related tweets by creating a list of relevant hashtags to reduce noise in the corpus. Furthermore, an algorithm is developed to annotate the data into three depression classes: ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Further FastText-based model is applied and fine-tuned with different preprocessing techniques and hyperparameter tuning to produce the tuned model, which significantly increases the depression classification performance to an 84% F1 score and 90% accuracy compared to baselines. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost the model’s performance by combining several other classifiers and assigning weights to individual models according to their individual performances. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 of 89%, 5% higher than FastText alone, and an accuracy of 93%, 3% higher than FastText alone. The proposed model better captures the contextual features of the relatively small sample class and aids in the detection of early depression intensity prediction from tweets with impactful performances. metadata Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Rafiq, Maryam; Mehmood, Arif; Diez, Isabel de la Torre; Gracia Villar, Mónica; Garay, Helena y Ashraf, Imran mail SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, monica.gracia@uneatlantico.es, helena.garay@uneatlantico.es, SIN ESPECIFICAR (2024) Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble. Computers, Materials & Continua, 78 (2). pp. 2047-2066. ISSN 1546-2226

Texto
TSP_CMC_37347.pdf
Available under License Creative Commons Attribution.
Descargar (861kB)

URL Oficial: http://doi.org/10.32604/cmc.2024.037347

Resumen

Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including predicting early psychological disorders and stress in individuals or the general public. A major challenge in predicting depression using social media posts is that the existing studies do not focus on predicting the intensity of depression in social media texts but rather only perform the binary classification of depression and moreover noisy data makes it difficult to predict the true depression in the social media text. This study intends to begin by collecting relevant Tweets and generating a corpus of 210000 public tweets using Twitter public application programming interfaces (APIs). A strategy is devised to filter out only depression-related tweets by creating a list of relevant hashtags to reduce noise in the corpus. Furthermore, an algorithm is developed to annotate the data into three depression classes: ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Further FastText-based model is applied and fine-tuned with different preprocessing techniques and hyperparameter tuning to produce the tuned model, which significantly increases the depression classification performance to an 84% F1 score and 90% accuracy compared to baselines. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost the model’s performance by combining several other classifiers and assigning weights to individual models according to their individual performances. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 of 89%, 5% higher than FastText alone, and an accuracy of 93%, 3% higher than FastText alone. The proposed model better captures the contextual features of the relatively small sample class and aids in the detection of early depression intensity prediction from tweets with impactful performances.

Tipo de Documento:	Artículo
Palabras Clave:	Depression classification; deep learning; FastText; machine learning
Clasificación temática:	Materias > Ingeniería Materias > Psicología
Divisiones:	Universidad Europea del Atlántico > Investigación > Producción Científica Fundación Universitaria Internacional de Colombia > Investigación > Artículos y libros Universidad Internacional Iberoamericana México > Investigación > Producción Científica Universidad Internacional Iberoamericana Puerto Rico > Investigación > Producción Científica Universidad Internacional do Cuanza > Investigación > Producción Científica
Depositado:	14 Mar 2024 23:30
Ultima Modificación:	14 Mar 2024 23:30
URI:	https://repositorio.unincol.edu.co/id/eprint/11264

Acciones (logins necesarios)

Ver Objeto

open

Detecting hate in diversity: a survey of multilingual code-mixed image and video analysis

The proliferation of damaging content on social media in today’s digital environment has increased the need for efficient hate speech identification systems. A thorough examination of hate speech detection methods in a variety of settings, such as code-mixed, multilingual, visual, audio, and textual scenarios, is presented in this paper. Unlike previous research focusing on single modalities, our study thoroughly examines hate speech identification across multiple forms. We classify the numerous types of hate speech, showing how it appears on different platforms and emphasizing the unique difficulties in multi-modal and multilingual settings. We fill research gaps by assessing a variety of methods, including deep learning, machine learning, and natural language processing, especially for complicated data like code-mixed and cross-lingual text. Additionally, we offer key technique comparisons, suggesting future research avenues that prioritize multi-modal analysis and ethical data handling, while acknowledging its benefits and drawbacks. This study attempts to promote scholarly research and real-world applications on social media platforms by acting as an essential resource for improving hate speech identification across various data sources.

Producción Científica

Hafiz Muhammad Raza Ur Rehman mail , Mahpara Saleem mail , Muhammad Zeeshan Jhandir mail , Eduardo René Silva Alvarado mail eduardo.silva@funiber.org, Helena Garay mail helena.garay@uneatlantico.es, Imran Ashraf mail ,

Raza Ur Rehman

open

Ensemble stacked model for enhanced identification of sentiments from IMDB reviews

The emergence of social media platforms led to the sharing of ideas, thoughts, events, and reviews. The shared views and comments contain people’s sentiments and analysis of these sentiments has emerged as one of the most popular fields of study. Sentiment analysis in the Urdu language is an important research problem similar to other languages, however, it is not investigated very well. On social media platforms like X (Twitter), billions of native Urdu speakers use the Urdu script which makes sentiment analysis in the Urdu language important. In this regard, an ensemble model RRLS is proposed that stacks random forest, recurrent neural network, logistic regression (LR), and support vector machine (SVM). The Internet Movie Database (IMDB) movie reviews and Urdu tweets are examined in this study using Urdu sentiment analysis. The Urdu hack library was used to preprocess the Urdu data, which includes preprocessing operations including normalizing individual letters, merging them, including spaces, etc. concerning punctuation. The problem of accurately encoding Urdu characters and replacing Arabic letters with their Urdu equivalents is fixed by the normalization module. Several models are adopted in this study for extensive evaluation of their accuracy for Urdu sentiment analysis. While the results promising, among machine learning models, the SVM and LR attained an accuracy of 87%, according to performance criteria such as F-measure, accuracy, recall, and precision. The accuracy of the long short-term memory (LSTM) and bidirectional LSTM (BiLSTM) was 84%. The suggested ensemble RRLS model performs better than other learning algorithms and achieves a 90% accuracy rate, outperforming current methods. The use of the synthetic minority oversampling technique (SMOTE) is observed to improve the performance and lead to 92.77% accuracy.

Producción Científica

Komal Azim mail , Alishba Tahir mail , Mobeen Shahroz mail , Hanen Karamti mail , Annia A. Vázquez mail annia.almeyda@uneatlantico.es, Angel Olider Rojas Vistorte mail angel.rojas@uneatlantico.es, Imran Ashraf mail ,

Azim

open

Efficient CNN architecture with image sensing and algorithmic channeling for dataset harmonization

The process of image formulation uses semantic analysis to extract influential vectors from image components. The proposed approach integrates DenseNet with ResNet-50, VGG-19, and GoogLeNet using an innovative bonding process that establishes algorithmic channeling between these models. The goal targets compact efficient image feature vectors that process data in parallel regardless of input color or grayscale consistency and work across different datasets and semantic categories. Image patching techniques with corner straddling and isolated responses help detect peaks and junctions while addressing anisotropic noise through curvature-based computations and auto-correlation calculations. An integrated channeled algorithm processes the refined features by uniting local-global features with primitive-parameterized features and regioned feature vectors. Using K-nearest neighbor indexing methods analyze and retrieve images from the harmonized signature collection effectively. Extensive experimentation is performed on the state-of-the-art datasets including Caltech-101, Cifar-10, Caltech-256, Cifar-100, Corel-10000, 17-Flowers, COIL-100, FTVL Tropical Fruits, Corel-1000, and Zubud. This contribution finally endorses its standing at the peak of deep and complex image sensing analysis. A state-of-the-art deep image sensing analysis method delivers optimal channeling accuracy together with robust dataset harmonization performance.

Producción Científica

Khadija Kanwal mail , Khawaja Tehseen Ahmad mail , Aiza Shabir mail , Li Jing mail , Helena Garay mail helena.garay@uneatlantico.es, Luis Eduardo Prado González mail uis.prado@uneatlantico.es, Hanen Karamti mail , Imran Ashraf mail ,

Kanwal

open

Deep image features sensing with multilevel fusion for complex convolution neural networks & cross domain benchmarks

Efficient image retrieval from a variety of datasets is crucial in today's digital world. Visual properties are represented using primitive image signatures in Content Based Image Retrieval (CBIR). Feature vectors are employed to classify images into predefined categories. This research presents a unique feature identification technique based on suppression to locate interest points by computing productive sum of pixel derivatives by computing the differentials for corner scores. Scale space interpolation is applied to define interest points by combining color features from spatially ordered L2 normalized coefficients with shape and object information. Object based feature vectors are formed using high variance coefficients to reduce the complexity and are converted into bag-of-visual-words (BoVW) for effective retrieval and ranking. The presented method encompass feature vectors for information synthesis and improves the discriminating strength of the retrieval system by extracting deep image features including primitive, spatial, and overlayed using multilayer fusion of Convolutional Neural Networks(CNNs). Extensive experimentation is performed on standard image datasets benchmarks, including ALOT, Cifar-10, Corel-10k, Tropical Fruits, and Zubud. These datasets cover wide range of categories including shape, color, texture, spatial, and complicated objects. Experimental results demonstrate considerable improvements in precision and recall rates, average retrieval precision and recall, and mean average precision and recall rates across various image semantic groups within versatile datasets. The integration of traditional feature extraction methods fusion with multilevel CNN advances image sensing and retrieval systems, promising more accurate and efficient image retrieval solutions.

Producción Científica

Jyotismita Chaki mail , Aiza Shabir mail , Khawaja Tehseen Ahmed mail , Arif Mahmood mail , Helena Garay mail helena.garay@uneatlantico.es, Luis Eduardo Prado González mail uis.prado@uneatlantico.es, Imran Ashraf mail ,

Chaki

open

Fundus image classification using feature concatenation for early diagnosis of retinal disease

Background Deep learning models assist ophthalmologists in early detection of diseases from retinal images and timely treatment. Aim Owing to robust and accurate results from deep learning models, we aim to use convolutional neural network (CNN) to provide a non-invasive method for early detection of eye diseases. Methodology We used a hybridized CNN with deep learning (DL) based on two separate CNN blocks, to identify multiple Optic Disc Cupping, Diabetic Retinopathy, Media Haze, and Healthy images. We used the RFMiD dataset, which contains various categories of fundus images representing different eye diseases. Data augmenting, resizing, coping, and one-hot encoding are used among other preprocessing techniques to improve the performance of the proposed model. Color fundus images have been analyzed by CNNs to extract relevant features. Two CCN models that extract deep features are trained in parallel. To obtain more noticeable features, the gathered features are further fused utilizing the Canonical Correlation Analysis fusion approach. To assess the effectiveness, we employed eight classification algorithms: Gradient boosting, support vector machines, voting ensemble, medium KNN, Naive Bayes, COARSE- KNN, random forest, and fine KNN. Results With the greatest accuracy of 93.39%, the ensemble learning performed better than the other algorithms. Conclusion The accuracy rates suggest that the deep learning model has learned to distinguish between different eye disease categories and healthy images effectively. It contributes to the field of eye disease detection through the analysis of color fundus images by providing a reliable and efficient diagnostic system.

Producción Científica

Sara Ejaz mail , Hafiz U Zia mail , Fiaz Majeed mail , Umair Shafique mail , Stefanía Carvajal-Altamiranda mail stefania.carvajal@uneatlantico.es, Vivian Lipari mail vivian.lipari@uneatlantico.es, Imran Ashraf mail ,

Ejaz

Enlaces de interés

Enlaces de interés

Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble

Resumen

Acciones (logins necesarios)

TEMÁTICA

ACCESO

IDIOMA

Filtros

Información