RESEARCH ARTICLE

Predictive Analysis of Methylation Patterns in Oral Squamous Cell Carcinoma (OSCC) Using Machine Learning

Debasree Sarkar
The Open Bioinformatics Journal, 07 Oct 2025. DOI: 10.2174/0118750362423160250929075233

Abstract

Introduction

Oral and oropharyngeal cancers are the most common head and neck cancers, and over 90% of them originate from squamous cells in the mouth and throat. Chronic tobacco and alcohol use, inflammation, viral infections, betel quid chewing, and genetic predisposition are major risk factors for Oral Squamous Cell Carcinoma (OSCC), which kills over 100,000 patients annually. Epigenetic mechanisms such as DNA methylation can silence tumor suppressor genes, contributing to cancer progression and affecting patient outcomes in OSCC. This study aimed to identify prominent methylation signatures that can distinguish OSCC from normal cells.

Methods

Machine learning algorithms, such as Support Vector Machine (SVM), Random Forest (RF), and Multilayer Perceptron (MLP), were implemented using R packages and trained on a balanced dataset of M-values of methylated CpG sites from 46 matched OSCC and normal adjacent tissue samples.
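
For readers unfamiliar with M-values, the sketch below shows the conversion that is commonly used in methylation analysis, M = log2(beta / (1 - beta)), where beta is the methylated fraction at a CpG site. It assumes this standard definition; the abstract does not state how the M-values were computed, and the study itself used R packages rather than Python. The function names and the clamping constant `eps` are illustrative choices, not taken from the article.

```python
import math


def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """Convert a beta value (methylated fraction, 0..1) to an M-value.

    Uses the common definition M = log2(beta / (1 - beta)).
    """
    # Clamp away from exactly 0 and 1 so the logarithm stays finite.
    # The value of eps is an assumption made for this sketch.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m: float) -> float:
    """Inverse transform: recover the beta value from an M-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# A half-methylated site maps to M = 0; M grows as methylation approaches 1.
half_methylated = beta_to_m(0.5)
mostly_methylated = beta_to_m(0.8)
```

M-values are unbounded and roughly symmetric around zero, which is one reason they are often preferred over beta values as input to statistical tests and classifiers.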

Results

The MLP model achieved the highest accuracy: 92% on the training dataset and 100% on the blind dataset, even with a reduced feature set of just 10 significantly differentially methylated CpG sites.

Discussion

Despite the high burden of oral cancer in South America and an alarming rise in the number of cases, research in this area is sorely lacking. This work addresses that gap with a machine learning-based analysis of methylation patterns, an established major factor in oral cancer, using datasets obtained from Brazilian patients. However, the lack of experimental evidence supporting the results is a significant limitation of this study.

Conclusion

A highly accurate and generalizable machine learning model was developed using the Multi-Layer Perceptron with multiple layers (MLP-ml) algorithm, which achieved an accuracy of 95% on an independent validation dataset of 15 OSCC tumors and 7 non-tumor adjacent tissue samples. Machine learning algorithms can therefore provide valuable insights into biological datasets that may be overlooked by regular bioinformatics workflows.
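
As a quick arithmetic check on the reported figure, the sketch below lists how many correct predictions out of the 22 validation samples (15 tumors plus 7 non-tumor samples) would give an accuracy of 95%. It assumes the reported accuracy was rounded to the nearest whole percent, which the abstract does not state; the function name is hypothetical.

```python
from typing import List


def correct_counts_matching(n_samples: int, reported_pct: int) -> List[int]:
    """Return every count of correct predictions whose accuracy,
    rounded to the nearest whole percent, equals reported_pct."""
    return [
        k
        for k in range(n_samples + 1)
        if round(100 * k / n_samples) == reported_pct
    ]


# Validation set size as stated in the text: 15 tumors + 7 non-tumor samples.
n_validation = 15 + 7
matches = correct_counts_matching(n_validation, 95)
print(matches)  # prints [21]
```

Under that rounding assumption, 95% on 22 samples corresponds to 21 correct classifications and a single misclassified sample.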

Keywords: DNA methylation, Oral cancer, Methylome, Machine learning, Random forest, Multilayer perceptron, Support vector machine.