- This is the official implementation of "Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision-Based Approach" (USENIX Security '22).
- Existing reference-based phishing detectors:
  - ❌ Prone to false positives because they capture only brand intention
- The contributions of our paper:
  - ✅ We propose a reference-based phishing detection system that captures both brand intention and credential-taking intention. To the best of our knowledge, this is the first work that systematically analyzes both brand intention and credential-taking intention for phishing detection.
  - ✅ We set up a phishing monitoring system that reports phishing webpages daily, with the highest precision among the state-of-the-art phishing detection solutions we compared against.
Input: a screenshot. Output: Phish/Benign, Phishing target.
- Step 1: Run the Abstract Layout detector to get the predicted elements.
- Step 2: Run the Siamese Logo Comparison.
  - If the Siamese model reports no target, return Benign, None.
  - Otherwise (the Siamese model reports a target), go to Step 3: CRP classifier.
- Step 3: Run the CRP classifier.
  - If the CRP classifier reports that the page is a CRP (credential-requiring page), go to Step 5.
  - Else, if the page is not a CRP and the CRP Locator has not been executed before, go to Step 4: CRP Locator.
  - Else (the page is not a CRP but the CRP Locator has already been executed), return Benign, None.
- Step 4: Run the CRP Locator.
  - Find login/signup links and click them. If a CRP page is reached, go back to Step 1 (Abstract Layout detector) with the updated URL and screenshot.
  - Otherwise (no CRP page can be reached), return Benign, None.
- Step 5: Return the result.
  - If a CRP page is reached and the Siamese model reports a target, return Phish, Phishing target.
  - Otherwise, return Benign, None.
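The control flow of Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustrative outline, not the code in phishintention.py: the `Page` class and the four callables (`detect_layout`, `match_logo`, `is_crp`, `locate_crp`) are hypothetical stand-ins for the layout detector, Siamese logo comparison, CRP classifier, and CRP Locator.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Page:
    """Hypothetical container for one webpage under test."""
    url: str
    screenshot: str  # path to the screenshot image


def infer_intention(
    page: Page,
    detect_layout: Callable[[Page], list],
    match_logo: Callable[[Page, list], Optional[str]],
    is_crp: Callable[[Page, list], bool],
    locate_crp: Callable[[Page], Optional[Page]],
) -> Tuple[str, Optional[str]]:
    """Return ("Phish", target) or ("Benign", None) following Steps 1-5."""
    tried_locator = False
    while True:
        elements = detect_layout(page)       # Step 1: predicted elements
        target = match_logo(page, elements)  # Step 2: brand target or None
        if target is None:
            return "Benign", None
        if is_crp(page, elements):           # Step 3: credential-requiring page?
            return "Phish", target           # Step 5: CRP reached + target reported
        if tried_locator:                    # locator already used once
            return "Benign", None
        tried_locator = True
        next_page = locate_crp(page)         # Step 4: follow login/signup links
        if next_page is None:                # no CRP page reachable
            return "Benign", None
        page = next_page                     # back to Step 1 with the new page
```

Because the CRP Locator is allowed to run only once, the loop executes at most twice, which mirrors the "have done CRP Locator before" check in Step 3.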
|_ configs: Configuration files for the object detection models and the global configurations
|_ modules: Inference code for layout detector, CRP classifier, CRP locator, and OCR-aided siamese model
|_ models: the model weights and reference list
|_ ocr_lib: external code for the OCR encoder
|_ utils
|_ configs.py: load configuration files
|_ phishintention.py: main script
Requirements:
- Anaconda installed; please refer to the official installation guide: https://docs.anaconda.com/free/anaconda/install/index.html
- CUDA >= 11
- Create a local clone of PhishIntention
git clone https://github.com/lindsey98/PhishIntention.git
cd PhishIntention
- Setup. This step installs the core dependencies of PhishIntention, such as PyTorch and Detectron2, and downloads the model checkpoints and the brand reference list. It may take some time.
chmod +x setup.sh
export ENV_NAME="phishintention"
./setup.sh
conda activate phishintention
- Run
python phishintention.py --folder <folder you want to test e.g. datasets/test_sites> --output_txt <where you want to save the results e.g. test.txt>
The test folder should have the following structure:
test_site_1
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
|__ html.txt (HTML source code, optional)
test_site_2
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
|__ html.txt (HTML source code, optional)
......
- In our paper, we also implement several phishing detection and identification baselines.
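Before running the main script, it can help to check that every site sub-folder follows the test folder structure described under Run. The snippet below is a minimal sketch using only the Python standard library; the function name `check_test_folder` is an illustrative assumption and is not part of this repository. It treats info.txt and shot.png as required and html.txt as optional, as stated above.

```python
from pathlib import Path
from typing import Dict, List

# Files each site folder must contain; html.txt is optional and not checked.
REQUIRED_FILES = ("info.txt", "shot.png")


def check_test_folder(root: str) -> Dict[str, List[str]]:
    """Map each site sub-folder name to the required files it is missing.

    Sub-folders that contain all required files are omitted from the result,
    so an empty dict means the whole test folder looks well-formed.
    """
    problems: Dict[str, List[str]] = {}
    for site in sorted(Path(root).iterdir()):
        if not site.is_dir():
            continue  # ignore stray files next to the site folders
        missing = [name for name in REQUIRED_FILES if not (site / name).is_file()]
        if missing:
            problems[site.name] = missing
    return problems
```

For example, `check_test_folder("datasets/test_sites")` returns an empty dictionary when every site folder contains both info.txt and shot.png.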
Please consider citing our work :)
@inproceedings{liu2022inferring,
title={Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach},
author={Liu, Ruofan and Lin, Yun and Yang, Xianglin and Ng, Siang Hwee and Divakaran, Dinil Mon and Dong, Jin Song},
booktitle={31st {USENIX} Security Symposium ({USENIX} Security 22)},
year={2022}
}
If you have any issues running our code, you can raise an issue or send an email to [email protected], [email protected], [email protected]