RESEARCH ARTICLE

FastImpute: Development and Validation of a Workflow for Open-source, Reference-Free Genotype Imputation Methods - An Example in Breast Cancer (PRS313_BC)

The Open Bioinformatics Journal, 16 Feb 2026. DOI: 10.2174/0118750362421210250929110508

Abstract

Introduction

Genotype imputation improves the resolution of genetic data, but traditional methods are computationally intensive or compromise privacy. Deep learning alternatives are often too large for client-side deployment. In this study, we developed FastImpute, a workflow for creating lightweight, reference-free imputation models that enable real-time, accessible genetic risk assessment on edge devices.

Methods

Using whole-genome sequencing data from 2,504 individuals in the 1000 Genomes Project, linear and logistic regression models were trained to impute single-nucleotide polymorphisms (SNPs) used in the breast cancer polygenic risk score PRS313_BC. Models used SNPs from commercial genotyping arrays, and performance was evaluated against sequencing data and benchmarked against Beagle.
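As a rough illustration of the regression-based imputation described above, the following minimal sketch fits a linear model that predicts the dosage (0, 1, or 2 alternate alleles) of one untyped target SNP from nearby array SNPs. It is not the FastImpute implementation: the function names `fit_linear_imputer` and `impute_dosage`, the small ridge term, the synthetic genotypes, and the train/test split are all assumptions made for the example.

```python
import numpy as np

def fit_linear_imputer(X_array, y_target, ridge=1e-3):
    """Fit least-squares weights mapping array-SNP dosages to a target-SNP dosage.

    The small ridge term is an assumption of this sketch; it keeps the solve
    stable when array SNPs are highly correlated with each other.
    """
    X = np.column_stack([np.ones(len(X_array)), X_array])  # prepend intercept column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y_target)

def impute_dosage(weights, X_array):
    """Predict target-SNP dosages and clip them to the valid range [0, 2]."""
    X = np.column_stack([np.ones(len(X_array)), X_array])
    return np.clip(X @ weights, 0.0, 2.0)

# Synthetic data (hypothetical): 500 individuals, 5 array SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n = 500
X_array = rng.integers(0, 3, size=(n, 5)).astype(float)
# The target SNP copies the first array SNP 90% of the time (a stand-in for LD).
copy_mask = rng.random(n) < 0.9
y_target = np.where(copy_mask, X_array[:, 0], rng.integers(0, 3, size=n)).astype(float)

train, test = slice(0, 400), slice(400, n)
weights = fit_linear_imputer(X_array[train], y_target[train])
pred = impute_dosage(weights, X_array[test])
r2 = np.corrcoef(pred, y_target[test])[0, 1] ** 2
print(f"held-out dosage R^2: {r2:.2f}")
```

A model of this form reduces to one weight vector per target SNP, which is what makes this kind of approach plausible for client-side use; a logistic variant would predict genotype class probabilities instead of a continuous dosage.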

Results

The polygenic risk score (PRS) calculated with our linear model correlated strongly with the PRS from true sequencing data (R² = 0.86), significantly outperforming no imputation and minor allele frequency imputation (R² = 0.38). Our logistic model correctly identified 4 of 6 individuals in the top 1% of breast cancer risk, matching Beagle’s performance.
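The two evaluation quantities reported above, the R² between PRS values and the number of top-1% individuals recovered, can be sketched as follows. This is a hedged example on synthetic data, not the authors' evaluation code: the helper names, the effect sizes, the noise level, and the cohort size of 600 are hypothetical, and R² is taken here as the squared Pearson correlation, which is an assumption about how the reported value was computed.

```python
import numpy as np

def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele dosages: one score per individual."""
    return dosages @ effect_sizes

def r_squared(a, b):
    """Squared Pearson correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)

def top_fraction_hits(true_scores, est_scores, fraction=0.01):
    """Count how many of the true top-`fraction` individuals are also
    in the top-`fraction` of the estimated scores. Returns (hits, k)."""
    k = max(1, int(round(fraction * len(true_scores))))
    true_top = set(np.argsort(true_scores)[-k:].tolist())
    est_top = set(np.argsort(est_scores)[-k:].tolist())
    return len(true_top & est_top), k

# Synthetic example (hypothetical): 600 individuals, 313 SNPs.
rng = np.random.default_rng(1)
n_individuals, n_snps = 600, 313
true_dosages = rng.integers(0, 3, size=(n_individuals, n_snps)).astype(float)
# Stand-in for imputed dosages: truth plus noise, clipped to [0, 2].
imputed = np.clip(true_dosages + rng.normal(0.0, 0.5, true_dosages.shape), 0.0, 2.0)
effect_sizes = rng.normal(0.0, 0.05, size=n_snps)  # made-up per-SNP weights

prs_true = polygenic_risk_score(true_dosages, effect_sizes)
prs_imputed = polygenic_risk_score(imputed, effect_sizes)

hits, k = top_fraction_hits(prs_true, prs_imputed)
print(f"PRS R^2: {r_squared(prs_true, prs_imputed):.2f}")
print(f"top-1% individuals recovered: {hits} of {k}")
```

Reporting both metrics is useful because a high overall R² does not guarantee that the highest-risk tail is ranked correctly, and the tail is the part that matters for risk stratification.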

Discussion

Our approach balances performance and efficiency, enabling deployment on personal devices and preserving user privacy through local data processing. It also broadens access to genetic risk assessment for users of direct-to-consumer data. However, this proof of concept requires validation across other genomic contexts before clinical use.

Conclusion

The FastImpute pipeline demonstrates that lightweight models can enable real-time genetic risk assessment on edge devices.

Keywords: Genotype imputation, Reference-free methods, FastImpute, Breast cancer, PRS313, Client-side imputation, Web technologies, Polygenic risk score, Direct-to-consumer test.