Compressed Sparse FM-Index:
Fast Sequence Alignment Using Large K-Steps
1Barcelona Supercomputing Center (BSC)
2Universitat Politècnica de Catalunya (UPC)
3ARM Cambridge 4Universidad de Zaragoza, I3A
3ARM Cambridge 4Universidad de Zaragoza, I3A
News
- June 2020: Article available on IEEE Xplore.
- July 2020: Web launched.
- Jan-Feb 2022: Article published on IEEE Xplore (print ISSN).
Abstract
The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound problem. We analyze a recent implementation of the FM-index and highlight existing throughput-memory trade-offs, showing that memory requirements limit implementation of large k-steps. We propose COFI, a COmpressed FM-Index for large K-steps. COFI enables a 15-step FM-index using less than 16~GB for a human genome reference of 3 giga base pairs. An algorithm based on this new layout is evaluated on both a Knights Landing (KNL) and an Skylake-based system (SKX). We achieve average speed-ups of 1.46x and 1.39x, respectively, with respect to an state-of-the-art FM-index implementation that is already well optimized.Downloads
Bibtex
@ARTICLE{9109660, author={Langarita, Rubén and Armejach, Adrià and Setoain, Javier and Ibáñez-Marín, Pablo and Alastruey-Benedé, Jesús and Moretó, Miquel}, journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics}, title={Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps}, year={2022}, volume={19}, number={1}, pages={355-368}, doi={10.1109/TCBB.2020.3000253} }
Related
2018: Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor. Jose M. Herruzo, Sonia Gonzalez-Navarro, Pablo Ibañez, Victor Viñals, Jesús Alastruey-Benedé, and Oscar Plata. Accepted for publication in IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
[paper, pdf]