Adds detection for various bots (matomo-org#7930)
* Adds detection for Pigafetta
* Adds detection for Cotoyogi
* Adds detection for SuggestBot

ref matomo-org#7929
liviuconcioiu authored Nov 21, 2024
1 parent 81102ef commit 620714b
Showing 2 changed files with 45 additions and 0 deletions.
24 changes: 24 additions & 0 deletions Tests/fixtures/bots.yml
@@ -8409,3 +8409,27 @@
    producer:
      name: Google Inc.
      url: https://www.google.com/
-
  user_agent: Mozilla 5.0 (compatible; Pigafetta/0.5; +http://visual-seo.com/Pigafetta-Bot)
  bot:
    name: Pigafetta
    category: Crawler
    url: https://visual-seo.com/Pigafetta-Bot
    producer:
      name: aStonish Studio Srl
      url: http://www.astonishstudio.com/
-
  user_agent: Mozilla/5.0 (compatible; Cotoyogi/4.0; +https://ds.rois.ac.jp/center8/crawler/
  bot:
    name: Cotoyogi
    category: Crawler
    url: https://ds.rois.ac.jp/center8/crawler/
    producer:
      name: Joint Support-Center for Data Science Research (ROIS-DS)
      url: https://ds.rois.ac.jp/
-
  user_agent: SuggestBot/1.0
  bot:
    name: SuggestBot
    category: Crawler
    url: https://github.com/nettrom/suggestbot
21 changes: 21 additions & 0 deletions regexes/bots.yml
@@ -4891,6 +4891,27 @@
  category: 'Crawler'
  url: 'https://web.archive.org/web/20151228225429/https://cloudservermarket.com/spider.html'

- regex: 'Pigafetta'
  name: 'Pigafetta'
  category: 'Crawler'
  url: 'https://visual-seo.com/Pigafetta-Bot'
  producer:
    name: 'aStonish Studio Srl'
    url: 'http://www.astonishstudio.com/'

- regex: 'Cotoyogi'
  name: 'Cotoyogi'
  category: 'Crawler'
  url: 'https://ds.rois.ac.jp/center8/crawler/'
  producer:
    name: 'Joint Support-Center for Data Science Research (ROIS-DS)'
    url: 'https://ds.rois.ac.jp/'

- regex: 'SuggestBot'
  name: 'SuggestBot'
  category: 'Crawler'
  url: 'https://github.com/nettrom/suggestbot'

# Generic bots
- regex: 'nuhk|grub-client|Download Demon|SearchExpress|Microsoft URL Control|borg|altavista|dataminr\.com|teoma|oegp|http%20client|htdig|mogimogi|larbin|scrubby|searchsight|semanticdiscovery|snappy|vortex(?!(?: Build|Plus| CM62| HD65))|zeal(?!ot)|dataparksearch|findlinks|BrowserMob|URL2PNG|ZooShot|GomezA|Google SketchUp|Read%20Later|7Siters|centuryb\.o\.t9|InterNaetBoten|EasyBib AutoCite|Bidtellect|tomnomnom/meg|cortex|Re-re Studio|adreview|AHC/|NameOfAgent|Request-Promise|ALittle Client|Hello,? world|wp_is_mobile|0xAbyssalDoesntExist|Anarchy99|^revolt|nvd0rz|xfa1|Hakai|gbrmss|fuck-your-hp|IDBTE4M CODE87|Antoine|Insomania|Hells-Net|b3astmode|Linux Gnu \(cow\)|Test Certificate Info|iplabel|Magellan|TheSafex?Internetx?Search|Searcherx?web|kirkland-signature|LinkChain|survey-security-dot-txt|infrawatch|Time/|r00ts3c-owned-you|nvdorz|Root Slut|NiggaBalls|BotPoke|GlobalWebSearch|xx032_bo9vs83_2a|sslshed|geckotrail|Wordup|Keydrop|^xenu|^(?:chrome|firefox|Abcd|Dark|KvshClient|Node.js|Report Runner|url|Zeus|ZmEu)$'
  name: 'Generic Bot'
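
Reviewer note: a quick way to sanity-check that each new regex matches its fixture user agent. This is only a rough Python sketch, not the project's own test harness. It assumes a plain case-insensitive search, which may differ from how the library actually applies the patterns; `BOT_REGEXES`, `FIXTURE_USER_AGENTS`, and `match_bot` are hypothetical names introduced for illustration.

```python
import re

# (regex, bot name) pairs copied from the three entries added to regexes/bots.yml.
BOT_REGEXES = [
    ("Pigafetta", "Pigafetta"),
    ("Cotoyogi", "Cotoyogi"),
    ("SuggestBot", "SuggestBot"),
]

# User agents copied verbatim from the new entries in Tests/fixtures/bots.yml,
# mapped to the bot name each fixture expects.
FIXTURE_USER_AGENTS = {
    "Mozilla 5.0 (compatible; Pigafetta/0.5; +http://visual-seo.com/Pigafetta-Bot)": "Pigafetta",
    "Mozilla/5.0 (compatible; Cotoyogi/4.0; +https://ds.rois.ac.jp/center8/crawler/": "Cotoyogi",
    "SuggestBot/1.0": "SuggestBot",
}


def match_bot(user_agent):
    """Return the name of the first bot whose regex matches, or None.

    Assumption: a simple case-insensitive search stands in for the
    library's real matching logic.
    """
    for pattern, name in BOT_REGEXES:
        if re.search(pattern, user_agent, re.IGNORECASE):
            return name
    return None


# Every fixture user agent should resolve to its expected bot name.
for ua, expected in FIXTURE_USER_AGENTS.items():
    assert match_bot(ua) == expected, (ua, expected)
```

Because the three patterns are plain literals, each can only match a user agent that contains the bot's name, so an ordinary browser user agent falls through to `None` in this sketch.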