News

Abstract

The FM-index is a data structure used in genomics for exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory bound problem. We analyze a recent implementation of the FM-index and highlight existing throughput-memory trade-offs, showing that memory requirements limit implementation of large k-steps. We propose COFI, a COmpressed FM-Index for large K-steps. COFI enables a 15-step FM-index using less than 16~GB for a human genome reference of 3 giga base pairs. An algorithm based on this new layout is evaluated on both a Knights Landing (KNL) and an Skylake-based system (SKX). We achieve average speed-ups of 1.46x and 1.39x, respectively, with respect to an state-of-the-art FM-index implementation that is already well optimized.

Downloads

Bibtex

@ARTICLE{9109660, author={R. {Langarita} and A. {Armejach} and J. {Setoain} and P. E. {Ibanez Marin} and J. {Alastruey-Benede} and M. {Moreto Planas}}, journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics}, title={Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps}, year={2020}, volume={}, number={}, pages={1-1},}

Related

Acknowledgements

This work has been partially supported by the Arm-BSC Centre of Excellence initiative, the Spanish Ministry of Economy and Competitiveness (contracts TIN2015-65316-P and TIN2016-76635-C2-1-R (AEI/ERDF, EU)), the Generalitat de Catalunya (contracts 2017-SGR-1328 and 2017-SGR-1414), Gobierno de Aragón (T58_20R research group), ERDF 2014-2020 "Construyendo Europa desde Aragón", and the European Union's Horizon 2020 research and innovation program under the Mont-Blanc 2020 project (grant agreement 779877). M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104. Finally, A. Armejach has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Juan de la Cierva postdoctoral fellowship number IJCI-2017-33945.