This repository contains the code used to build the models and run the experiments presented in "Is Function Similarity Over-Engineered? Building a Benchmark". The paper can be found here.
Our paper evaluates five models: jTrans, a GNN from Li et al., a naive multi-headed attention Transformer encoder, Ghidra's BSim plugin, and REFuSe, a new model introduced in our paper. These models are assessed against five datasets: Assemblage, MOTIF, CommonLibraries, Marcelli Dataset-1, and BinaryCorp.
data
Contains the recipe for building our Assemblage dataset and the code to run our BSim experiments. A guide to reproducing datasets from recipes can be found here, and more details about BSim can be found in the corresponding folder.
data-processing
Preprocesses data for the experiments.
model-training
Trains models on Assemblage data.
model-evaluation
Evaluates the trained models on the datasets listed above.