Compressed Sparse FM-Index:
Fast Sequence Alignment Using Large K-Steps
1Barcelona Supercomputing Center (BSC)
2Universitat Politècnica de Catalunya (UPC)
3ARM Cambridge 4Universidad de Zaragoza, I3A
3ARM Cambridge 4Universidad de Zaragoza, I3A
News
- June 2020: Article available on IEEE Xplore.
- July 2020: Web launched.
- Jan-Feb 2022: Article published on IEEE Xplore (print ISSN).
Abstract
The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound problem. We analyze a recent implementation of the FM-index and highlight existing throughput-memory trade-offs, showing that memory requirements limit implementation of large k-steps. We propose COFI, a COmpressed FM-Index for large K-steps. COFI enables a 15-step FM-index using less than 16~GB for a human genome reference of 3 giga base pairs. An algorithm based on this new layout is evaluated on both a Knights Landing (KNL) and an Skylake-based system (SKX). We achieve average speed-ups of 1.46x and 1.39x, respectively, with respect to an state-of-the-art FM-index implementation that is already well optimized.Downloads
Bibtex
@ARTICLE{9109660,
author={Langarita, Rubén and Armejach, Adrià and Setoain, Javier and Ibáñez-Marín, Pablo and Alastruey-Benedé, Jesús and Moretó, Miquel},
journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
title={Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps},
year={2022},
volume={19},
number={1},
pages={355-368},
doi={10.1109/TCBB.2020.3000253}
}
Related
2018: Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor. Jose M. Herruzo, Sonia Gonzalez-Navarro, Pablo Ibañez, Victor Viñals, Jesús Alastruey-Benedé, and Oscar Plata. Accepted for publication in IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
[paper, pdf]