IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB 2020)

Accelerating Sequence Alignments
Based on FM-Index
Using the Intel KNL Processor

José-Manuel Herruzo¹, Sonia González-Navarro¹, Pablo Ibáñez²,
Víctor Viñals², Jesús Alastruey-Benedé², Óscar Plata¹
¹Universidad de Málaga ²Universidad de Zaragoza, I3A

Abstract

The FM-index is a compact data structure suited to fast matching of short reads against large reference genomes. The matching algorithm that uses this index exhibits irregular memory access patterns that cause frequent cache misses, making the problem memory bound. This paper analyzes different FM-index versions presented in the literature, focusing on the computing aspects related to data access. As a result of the analysis, we propose a new organization of the FM-index that minimizes the demand for memory bandwidth, substantially improving performance on processors with high-bandwidth memory, such as the second-generation Intel Xeon Phi (Knights Landing, or KNL), which integrates ultra high-bandwidth stacked memory technology. As the roofline model shows, our implementation reaches 95% of the peak random access bandwidth limit when executed on the KNL, and almost all the available bandwidth when executed on other Intel Xeon architectures with conventional DDR memory. In addition, the throughput obtained on the KNL is much higher than the results reported for GPUs in the literature.
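To make the memory-access behavior concrete, the following is a minimal sketch of textbook FM-index backward search in Python. It is not the index organization proposed in the paper: the function names (`build_fm_index`, `count_occurrences`), the naive suffix sorting, and the uncompressed occurrence table are assumptions chosen for readability only.

```python
def build_fm_index(text):
    """Build a naive FM-index (C table and full Occ table) for text.

    Illustrative only: assumes "$" does not occur in text and sorts
    below every other symbol (true for A, C, G, T).
    """
    text += "$"  # sentinel marking the end of the reference
    # Suffix array by plain sorting: fine for a demo, far too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the symbol preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of symbols in text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count_occurrences(pattern, C, occ, n):
    """Count occurrences of pattern using backward search."""
    lo, hi = 0, n  # half-open interval of matching suffix-array rows
    for c in reversed(pattern):
        if c not in C:
            return 0
        # These two lookups land at unrelated positions of the Occ table
        # on every step, which is the source of the irregular accesses.
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    C, occ, n = build_fm_index("ACGTACGTAC")
    for read in ("ACG", "AC", "TT"):
        print(read, count_occurrences(read, C, occ, n))
```

Each pattern symbol costs two dependent lookups at effectively random offsets of the occurrence table, so for a genome-sized index most steps miss in cache. Practical implementations sample the occurrence table rather than storing it in full, which trades memory footprint for extra counting work per step.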

Bibtex

@article{Herruzo_FMindex_KNL, 
author = {J.M. Herruzo and S. González-Navarro and P. Ibáñez and V. Viñals and J. Alastruey and O. Plata}, 
title = {Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor}, 
journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)}, 
year={2020}, 
volume={17}, 
number={4}, 
pages={1093-1104}, 
keywords={FM-index; short-read alignment; high-bandwidth memory; Intel Xeon Phi Knights-Landing}, 
doi={10.1109/TCBB.2018.2884701}, 
ISSN={1557-9964}, 
}

Acknowledgements

We thank the anonymous reviewers for their insightful comments. This work was supported in part by grants (1) TIN2016-76635-C2-1-R from Agencia Estatal de Investigación (AEI) and the European Regional Development Fund (ERDF), and (2) gaZ: T58_17R research group from the Aragón Government and the European Social Fund (ESF).