As a continuation of our previous work on mining graph structures via propositionalization, we needed new methods to speed up pattern mining. Transactions in frequent itemset mining resemble sentences, since both can be seen as sequences of tokens, so we explored the possibility of partitioning the mined databases in an informative way by exploiting the distributional semantics captured by language models. The results supported our hypothesis, and the work was accepted at SIGMOD’24, held this year in Santiago de Chile. #SIGMOD24
Congratulations to my co-authors, Jordi Bernad and Pierre Maillot!