Ensemble Partition Sampling (EPS) for Improved Multi-Class Classification

Open access. Language: English.
Jabir, Brahim; Díez, Isabel De la Torre; Bautista Thompson, Ernesto; Ramírez-Vargas, Debora L. and Kuc Castilla, Ángel Gabriel (2023). Ensemble Partition Sampling (EPS) for Improved Multi-Class Classification. IEEE Access, p. 1. ISSN 2169-3536

Full text not available.

Abstract

Classification is a widely used data-mining technique, applied in fields such as sentiment analysis, fraud detection, and fault diagnosis. Multiclass classification, which involves more than two classes, is more complex than binary classification. There are two main ways to approach it: one extends a binary classifier into a multiclass classifier through various strategies, and the other divides the multiclass problem into multiple binary problems (binarization). Two popular binarization schemes are One vs One (OvO) and One vs All (OvA). OvA needs fewer binary classifiers than OvO (one per class instead of one per pair of classes), which makes aggregating their outputs simpler. However, because each OvA classifier separates a single class from all the others, it creates an imbalance between positive and negative samples, which degrades the performance of each binary classifier. In this article, we contribute to ensemble learning and multiclass classification by proposing a new method, Ensemble Partition Sampling (EPS), which operates within the OvA framework. Its primary goal is to tackle data imbalance by incorporating ensemble learning and preprocessing techniques into each binary dataset. The study found EPS to be the most effective of the compared methods for imbalanced and multiclass imbalanced classification, outperforming OvA, SMOTE, k-means-SMOTE, Bagging-RB, DES-MI, OvO-EASY, and OvO-SMB. With CART, Random Forest, and SVM as classifiers, the results consistently showed EPS outperforming all other algorithms. The findings suggest that EPS is a highly effective method for improving classification performance on imbalanced and multiclass imbalanced datasets.
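This record contains only the abstract, so it does not describe how EPS works internally. The Python sketch below is a hypothetical illustration of the general idea the abstract names. It binarizes with OvA and measures how imbalanced each binary problem is. It then counters that imbalance in three steps: it splits the negatives of each binary problem into partitions roughly the size of the positive class, trains one base learner per balanced partition, and averages the learners' scores. The partitioning rule, the nearest-centroid base learner, the score averaging, and every function name are assumptions made for illustration. This is not the authors' EPS algorithm, and the study itself used CART, Random Forest, and SVM as classifiers.

```python
# Hypothetical sketch of partition-based ensemble sampling inside OvA.
# NOT the authors' EPS algorithm; all names and choices here are assumptions.
import random
from collections import Counter
from statistics import mean


def ova_imbalance(labels):
    """Negative-to-positive ratio of each one-vs-all (OvA) binary problem."""
    counts = Counter(labels)
    n = len(labels)
    return {c: (n - k) / k for c, k in counts.items()}


class NearestCentroid:
    """Tiny stand-in base learner (assumption; the study used CART, RF and SVM)."""

    def fit(self, X, y):
        # One centroid per binary label (1 = positive class, 0 = rest).
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def score_positive(self, x):
        # Score in [0, 1]: the closer x is to the positive centroid, the higher.
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for label, c in self.centroids.items()
        }
        total = dist[1] + dist[0]
        return dist[0] / total if total > 0 else 0.5


def fit_partition_ensemble(X, y, positive, seed=0):
    """Train one learner per balanced partition of a single OvA binary problem."""
    pos = [x for x, t in zip(X, y) if t == positive]
    neg = [x for x, t in zip(X, y) if t != positive]
    random.Random(seed).shuffle(neg)  # random partition of the negative side
    members = []
    for i in range(0, len(neg), len(pos)):
        chunk = neg[i:i + len(pos)]  # never more negatives than positives
        Xb = pos + chunk
        yb = [1] * len(pos) + [0] * len(chunk)
        members.append(NearestCentroid().fit(Xb, yb))
    return members


def fit_ova(X, y):
    """One partition ensemble per class."""
    return {c: fit_partition_ensemble(X, y, c) for c in sorted(set(y))}


def predict(models, x):
    """Average each class's ensemble scores, then pick the highest-scoring class."""
    scores = {c: mean(m.score_positive(x) for m in members)
              for c, members in models.items()}
    return max(scores, key=scores.get)


# Toy 2-D data with three imbalanced classes (made-up values for illustration).
X = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (1, 0), (0, 1),  # class "a"
     (5, 5), (5.5, 5), (5, 5.5),                              # class "b"
     (0, 5), (0.5, 5)]                                        # class "c"
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 2

print(ova_imbalance(y))  # class "c" faces 4.5 negatives per positive
models = fit_ova(X, y)
print(predict(models, (0.1, 5.2)))
```

In this sketch, plain undersampling would discard most negatives. Partitioning them instead lets every negative sample train some ensemble member while keeping each member's training set balanced.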

Document type: Article
Keywords: Ensemble Partition Sampling (EPS); One vs One (OvO); One vs All (OvA); Multi-Class Classification; Imbalanced learning; multiclass imbalanced classification
Subject classification: Subjects > Engineering
Divisions: Universidad Europea del Atlántico > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Scientific Production
Universidad Internacional Iberoamericana México > Research > Scientific Production
Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
Deposited: 09 May 2023 23:30
Last modified: 09 May 2023 23:30
URI: https://repositorio.unincol.edu.co/id/eprint/7028

Human‐based new approach methodologies to accelerate advances in nutrition research

Much of nutrition research has conventionally relied on simplistic in vitro systems or animal models, which have been extensively employed to better understand the relationships between diet and complex diseases and to evaluate food safety. Although these models have undeniably increased our mechanistic understanding of basic biological processes, they do not adequately model complex human physiopathological phenomena, raising concerns about their translatability to humans. During the last decade, extraordinary advances in stem cell culturing, three-dimensional cell cultures, sequencing technologies, and computer science have given rise to a wealth of novel human-based and more physiologically relevant tools. These tools, also known as “new approach methodologies,” comprise patient-derived organoids, organs-on-chip, and multi-omics approaches, along with computational models and analysis. They represent innovative and exciting ways to advance nutrition research from a human-biology-oriented perspective. After considering some shortcomings of conventional in vitro and in vivo approaches, we describe the main available and emerging novel tools that are suitable for designing more human-relevant nutrition research. Our aim is to encourage discussion on exploring innovative paths in nutrition research and to promote a paradigm change toward a more human-biology-focused approach. Such an approach would help to better understand human nutritional pathophysiology, to evaluate novel food products, and to develop more effective targeted preventive or therapeutic strategies, while helping to reduce and replace the use of animals in nutrition research.

Manuela Cassotta (manucassotta@gmail.com), Danila Cianciosi, Maria Elexpuru Zabaleta (maria.elexpuru@uneatlantico.es), Iñaki Elío Pascual (inaki.elio@uneatlantico.es), Sandra Sumalla Cano (sandra.sumalla@uneatlantico.es), Francesca Giampieri (francesca.giampieri@uneatlantico.es), Maurizio Battino (maurizio.battino@uneatlantico.es)

Design and development of patient health tracking, monitoring and big data storage using Internet of Things and real time cloud computing

With the outbreak of the COVID-19 pandemic, social isolation and quarantine have become commonplace across the world. IoT health monitoring solutions eliminate the need for regular doctor visits and for interactions between patients and medical personnel. Many patients in wards or intensive care units require continuous monitoring of their health. Continuous patient monitoring is demanding in hospitals with limited staff. In a pandemic such as COVID-19 it becomes much more difficult, because hospitals are working at full capacity and medical workers still risk infection. In this study, we propose an Internet of Things (IoT)-based patient health monitoring system that collects real-time data on important health indicators such as pulse rate, blood oxygen saturation, and body temperature, and that can be extended to include more parameters. Our system comprises a hardware component that collects data from sensors and transmits it to a cloud-based storage system, where healthcare specialists can access and analyze it. The ESP-32 microcontroller interfaces with the sensors and wirelessly transmits the collected data to the cloud storage system. The system uses a pulse oximeter to measure blood oxygen saturation and body temperature, and a heart rate monitor to measure pulse rate. A web-based interface is also implemented, allowing healthcare practitioners to access and visualize the collected data in real time, which makes remote patient monitoring easier. Overall, our IoT-based patient health monitoring system represents a significant advancement in remote patient monitoring. It gives healthcare practitioners real-time access to important health metrics so that they can detect potential health issues before they escalate.

Md. Milon Islam, Imran Shafi, Sadia Din, Siddique Farooq, Isabel de la Torre Díez, Jose Breñosa (josemanuel.brenosa@uneatlantico.es), Julio César Martínez Espinosa (ulio.martinez@unini.edu.mx), Imran Ashraf

Pneumonia Detection Using Chest Radiographs With Novel EfficientNetV2L Model

Pneumonia is a potentially life-threatening infectious disease that is typically diagnosed through physical examination and diagnostic imaging techniques such as chest X-rays, ultrasound, or lung biopsy. Accurate diagnosis is crucial, because a wrong diagnosis, inadequate treatment, or lack of treatment can have serious and even fatal consequences for patients. Advances in deep learning have significantly helped medical experts diagnose pneumonia by supporting their decision-making. By leveraging deep learning models, healthcare professionals can improve diagnostic accuracy and make informed treatment decisions for patients suspected of having pneumonia. In this study, six deep learning models (CNN, InceptionResNetV2, Xception, VGG16, ResNet50, and EfficientNetV2L) are implemented and evaluated. All models are trained with the Adam optimizer, which adaptively adjusts per-parameter learning rates during training. The models were trained on a dataset of 5856 chest X-ray images. They achieve accuracies of 87.78%, 88.94%, 90.7%, 91.66%, 87.98%, and 94.02% for CNN, InceptionResNetV2, Xception, VGG16, ResNet50, and EfficientNetV2L, respectively. EfficientNetV2L achieves the highest accuracy, indicating its robustness for pneumonia detection. These findings highlight the potential of deep learning models to detect and predict pneumonia accurately from chest X-ray images, providing valuable support for clinical decision-making and patient treatment.

Mudasir Ali, Mobeen Shahroz, Urooj Akram, Muhammad Faheem Mushtaq, Stefanía Carvajal-Altamiranda (stefania.carvajal@uneatlantico.es), Silvia Aparicio Obregón (silvia.aparicio@uneatlantico.es), Isabel De La Torre Díez, Imran Ashraf

Detecting Pragmatic Ambiguity in Requirement Specification Using Novel Concept Maximum Matching Approach Based on Graph Network

Requirements specifications written in natural language enable us to understand a program’s intended functionality, which can then be translated into operational software. Ambiguities emerge at various stages of requirement specification and at several levels, including the syntactic, semantic, domain, lexical, and pragmatic levels. The primary objective of this study is to identify pragmatic ambiguity in requirements. Pragmatic ambiguity occurs when the same set of circumstances can be interpreted in multiple ways, so detecting it requires considering the context in which each requirement is stated. Prior research has developed methods that extract concepts from individual nodes, leaving room for improvement in how requirements are interpreted. This research aims to develop a more effective model for identifying pragmatic ambiguity in requirement definitions. To interpret requirements better, we introduce the Concept Maximum Matching (CMM) technique, which extracts concepts based on edges. CMM significantly improves precision because it interprets requirements more accurately based on the relative weights of their edges. The evaluation results show that CMM substantially improves on the previous method, with an F-measure of 0.754 compared with 0.563 for existing models.

Khadija Aslam, Faiza Iqbal, Ayesha Altaf, Naveed Hussain, Mónica Gracia Villar (monica.gracia@uneatlantico.es), Emmanuel Soriano Flores (emmanuel.soriano@uneatlantico.es), Isabel De La Torre Diez, Imran Ashraf
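The CMM abstract reports its gain as F-measure scores. The abstract does not say which variant was used. Assuming the balanced F-measure (F1, the harmonic mean of precision and recall), the metric can be computed as in this short Python sketch. The function names and the example counts are illustrative, not taken from the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f1_from_counts(tp, fp, fn):
    """F1 computed directly from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall = 8 / (8 + 2), 8 / (8 + 4)  # 0.8 and 0.666...
print(f_measure(precision, recall))  # equals 16/22 up to floating-point rounding
print(f1_from_counts(8, 2, 4))
```

Both functions give the same value for F1. The `beta` parameter lets the precision-recall trade-off be weighted differently if a non-balanced F-measure was intended.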

Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble

Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including the early prediction of psychological disorders and stress in individuals or the general public. A major challenge is that existing studies do not predict the intensity of depression in social media texts; they only perform binary classification of depression. Moreover, noisy data makes it difficult to identify true depression in social media text. This study begins by collecting relevant tweets, building a corpus of 210,000 public tweets using the Twitter public application programming interfaces (APIs). To reduce noise in the corpus, a list of relevant hashtags is used to keep only depression-related tweets. An algorithm is then developed to annotate the data into three depression classes, ‘Mild,’ ‘Moderate,’ and ‘Severe,’ based on International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Next, a FastText-based model is applied and fine-tuned with different preprocessing techniques and hyperparameter tuning. Compared with the baselines, the tuned model significantly increases depression classification performance, reaching an 84% F1 score and 90% accuracy. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost performance by combining several other classifiers and weighting each model according to its individual performance. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 score of 89% (5 percentage points higher than FastText alone) and an accuracy of 93% (3 percentage points higher).
The proposed model better captures the contextual features of the relatively small sample class and supports early prediction of depression intensity from tweets, with strong performance.

Muhammad Rizwan, Muhammad Faheem Mushtaq, Maryam Rafiq, Arif Mehmood, Isabel de la Torre Diez, Mónica Gracia Villar (monica.gracia@uneatlantico.es), Helena Garay (helena.garay@uneatlantico.es), Imran Ashraf
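The depression abstract describes the WSVE as combining several classifiers and weighting each one by its individual performance. The Python sketch below shows weighted soft voting in general. It rests on two assumptions the abstract does not state. First, each model outputs a probability vector over the three classes. Second, each model's weight is its validation score divided by the sum of all scores. The probabilities and scores are made-up numbers, and the function name is hypothetical.

```python
CLASSES = ("Mild", "Moderate", "Severe")


def weighted_soft_vote(prob_rows, scores, classes=CLASSES):
    """Combine per-model class probabilities, weighting each model by its score.

    Assumption: weights are validation scores normalised to sum to 1.
    """
    if len(prob_rows) != len(scores):
        raise ValueError("need exactly one score per model")
    total = sum(scores)
    weights = [s / total for s in scores]  # normalised so the weights sum to 1
    combined = [
        sum(w * row[k] for w, row in zip(weights, prob_rows))
        for k in range(len(classes))
    ]
    best = max(range(len(classes)), key=combined.__getitem__)
    return classes[best], combined


# Made-up class probabilities from three models for one tweet,
# and made-up validation F1 scores used as weights.
probs = [
    [0.2, 0.5, 0.3],  # model 1
    [0.6, 0.3, 0.1],  # model 2
    [0.1, 0.3, 0.6],  # model 3
]
label, combined = weighted_soft_vote(probs, scores=[0.84, 0.70, 0.60])
print(label, [round(p, 3) for p in combined])
```

The weights sum to 1 and each row is a probability vector, so the combined vector is also a probability distribution. A plain (unweighted) soft vote is the special case in which all scores are equal.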