eprintid: 28577
rev_number: 10
eprint_status: archive
userid: 2
dir: disk0/00/02/85/77
datestamp: 2026-05-15 23:30:12
lastmod: 2026-05-15 23:30:13
status_changed: 2026-05-15 23:30:12
type: article
metadata_visibility: show
creators_name: Iman, Eshmal
creators_name: Jabbar, Sohail
creators_name: Ramzan, Shabana
creators_name: Raza, Ali
creators_name: Raoof, Farwa
creators_name: Carvajal-Altamiranda, Stefanía
creators_name: Lipari, Vivian
creators_name: Ashraf, Imran
creators_id:
creators_id:
creators_id:
creators_id:
creators_id:
creators_id: stefania.carvajal@uneatlantico.es
creators_id: vivian.lipari@uneatlantico.es
creators_id:
title: An Integrated Machine Learning and Genomic Framework for Precise Detection of Gastric Cancer
ispublished: pub
subjects: uneat_bm
subjects: uneat_eng
divisions: uneatlantico_produccion_cientifica
divisions: unincol_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: unic_produccion_cientifica
divisions: uniromana_produccion_cientifica
full_text_status: public
keywords: Gastric cancer, histological images, k-means clustering, unsupervised learning, convolutional neural networks, image processing
abstract: This study presents a novel integrative approach for the analysis of high-dimensional gene expression data, leveraging the complementary strengths of unsupervised clustering and supervised classification. Using K-means clustering, the dataset is stratified into three distinct clusters, revealing intrinsic biological patterns and relationships. The resulting cluster assignments are subsequently employed as pseudo-labels to train machine learning models, including support vector machines, random forest, and a stacking ensemble classifier. To validate and enhance the robustness of clustering, complementary methodologies such as hierarchical clustering and DBSCAN are employed, with results visualized through PCA-driven dimensionality reduction. The high predictive accuracy achieved by the classifiers underscores the separability and reliability of the identified clusters. Furthermore, feature importance analysis highlighted key genetic determinants within each cluster, offering actionable insights into potential biomarkers and critical genomic features. This framework bridges the gap between exploratory unsupervised learning and predictive supervised modeling, providing a scalable and interpretable methodology for analyzing complex genomic datasets. Its applicability extends to biomarker discovery, patient stratification, and other precision medicine applications, emphasizing its utility in advancing genomic research and clinical practice.
date: 2026-05
publication: The American Journal of Pathology
id_number: doi:10.1016/j.ajpath.2026.04.014
refereed: TRUE
issn: 00029440
official_url: http://doi.org/10.1016/j.ajpath.2026.04.014
access: open
language: en
citation: Article. Subjects > Biomedicine; Subjects > Engineering; Universidad Europea del Atlántico > Research > Scientific Production; Fundación Universitaria Internacional de Colombia > Research > Articles and Books; Universidad Internacional Iberoamericana México > Research > Scientific Production; Universidad Internacional do Cuanza > Research > Scientific Production; Universidad de La Romana > Research > Scientific Production. Open. English. metadata Iman, Eshmal; Jabbar, Sohail; Ramzan, Shabana; Raza, Ali; Raoof, Farwa; Carvajal-Altamiranda, Stefanía; Lipari, Vivian and Ashraf, Imran mail unspecified, unspecified, unspecified, unspecified, unspecified, stefania.carvajal@uneatlantico.es, vivian.lipari@uneatlantico.es, unspecified (2026) An Integrated Machine Learning and Genomic Framework for Precise Detection of Gastric Cancer. The American Journal of Pathology. ISSN 00029440
document_url: http://repositorio.unincol.edu.co/id/eprint/28577/1/PIIS0002944026001367.pdf