Commit

Pattern updates for better recognition (#271)
omrilotan authored Aug 14, 2024
1 parent 86ee644 commit c1945b2
Showing 7 changed files with 130 additions and 28 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
# Changelog

## [5.1.15](https://github.com/omrilotan/isbot/compare/v5.1.14...v5.1.15)

- [Pattern] Pattern updates for better recognition

## [5.1.14](https://github.com/omrilotan/isbot/compare/v5.1.13...v5.1.14)

- [Pattern] More accurate patterns for some substrings
7 changes: 7 additions & 0 deletions fixtures/browsers.yml
@@ -7,6 +7,8 @@ AAA App User Agent String Examples - Native and Webview based user agent forms f
- Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 535 Dual SIM) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.0.0 Mobile Safari/537.36 Edge/13.0 WebViewApp ExampleApp/1
ABrowse:
- Mozilla/5.0 (compatible; U; ABrowse 0.6; Syllable) AppleWebKit/420+ (KHTML, like Gecko)
AirWatch:
- Mozilla/5.0 (iPhone; CPU iPhone OS 17_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 Version/17.5.1 Safari/605.1.15 (AirWatch Browser v24.06)
Amazon 4K Fire TV:
- Mozilla/5.0 (Linux; Android 5.1; AFTS Build/LMY47O) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/41.99900.2250.0242 Safari/537.36
Amiga:
@@ -189,6 +191,7 @@ Facebook App:
- Mozilla/5.0 (iPhone; CPU iPhone OS 11_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15G77 [FBAN/FBIOS;FBAV/183.0.0.41.81;FBBV/119182652;FBDV/iPhone6,2;FBMD/iPhone;FBSN/iOS;FBSV/11.4.1;FBSS/2;FBCR/VIVO;FBID/phone;FBLC/pt_BR;FBOP/5;FBRV/0]
- Mozilla/5.0 (iPhone; CPU iPhone OS 13_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 [FBAN/FBIOS;FBDV/iPhone11,8;FBMD/iPhone;FBSN/iOS;FBSV/13.3.1;FBSS/2;FBID/phone;FBLC/en_US;FBOP/5;FBCR/]
- Mozilla/5.0 (iPhone; CPU iPhone OS 16_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 LightSpeed [FBAN/MessengerLiteForiOS;FBAV/410.0.0.23.86;FBBV/477357144;FBDV/iPhone14,7;FBMD/iPhone;FBSN/iOS;FBSV/16.2;FBSS/3;FBCR/;FBID/phone;FBLC/en-GB;FBOP/0
- Mozilla/5.0 (iPhone; CPU iPhone OS 17_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/21F90 [FBAN/FBPageAdmin;FBAV/400.0.0.21.108;FBBV/620787505;FBDV/iPhone15,3;FBMD/iPhone;FBSN/iOS;FBSV/17.5.1;FBSS/3;FBID/phone;FBLC/en_GB;FBOP/5;FBDI/D9016741-C75C-4FC3-8A3B-34249A953D07;FBRV/0]
- Mozilla/5.0 (Linux; Android 7.0; SM-G570M Build/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/69.0.3497.100 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/192.0.0.34.85;]
- Mozilla/5.0 (Linux; Android 9; SM-G950U Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/80.0.3987.149 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/263.0.0.46.121;]
- Mozilla/5.0 (Windows NT 10.0.16299.98; osmeta 10.3.3308) AppleWebKit/602.1.1 (KHTML, like Gecko) Version/9.0 Safari/602.1.1 osmeta/10.3.3308 Build/3308 [FBAN/FBW;FBAV/140.0.0.232.179;FBBV/83145113;FBDV/WindowsDevice;FBMD/Predator G9-793;FBSN/Windows;FBSV/10.0.16299.125;FBSS/1;FBCR/;FBID/desktop;FBLC/de_DE;FBOP/45;FBRV/0]
@@ -264,6 +267,8 @@ Google Search App:
- Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/102.0.304944559 Mobile/15E148 Safari/604.1
Hawk QuickBrowser:
- Mozilla/5.0 (Linux; U; Android 7.0; es-us; Moto C Build/NRD90M.063) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/69.0.3497.100 Mobile Safari/537.36 Hawk/QuickBrowser/2.4.8.22800
HiSearch:
- Mozilla/5.0 (Linux; Android 10; STK-L21 Build/HUAWEISTK-L21; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/126.0.6478.186 Mobile Safari/537.36 HiSearch/22.0.6.315
HuaweiBrowser:
- Mozilla/5.0 (Linux; Android 10; MAR-LX3A; HMSCore 6.12.4.311; GMSCore 23.48.16) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.88 HuaweiBrowser/14.0.2.311 Mobile Safari/537.36
- Mozilla/5.0 (Linux; Android 12; HarmonyOS; TET-AN00; HMSCore 6.12.4.312) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.88 HuaweiBrowser/14.0.5.301 Mobile Safari/537.36
@@ -501,6 +506,7 @@ QQ Browser:
- Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.48 Safari/537.36 QQBrowser/8.0.3197.400
QtWebEngine:
- Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.5.1 Chrome/40.0.2214.115 Safari/537.36
- Mozilla/5.0 (X11; Linux x86_64) Miniature.io/6.0 AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.4 Chrome/69.0.3497.128 Safari/537.36
QuickTime:
- QuickTime/7.6.6 (qtver=7.6.6;cpu=IA32;os=Mac 10.6.8)
RadiosNet:
@@ -724,5 +730,6 @@ ZZZ Insignificat bots - Crosswalk project (deprecated):
ZZZ Insignificat bots - These bots have very low appearance rate and are not worth blocking:
- Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322) 360JK yunjiankong 427691
- Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; Banca Caboto s.p.a.)
- Mozilla/5.0 (platform; rv:geckoversion) Gecko/geckotrail Firefox/firefoxversion
- Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) HLB/1.247
- Opera/9.70 (Linux armv7l ; U; turbotabbee/TSV2.0/1.02Q; fr) Presto/2.2
2 changes: 1 addition & 1 deletion fixtures/downloaded/downloaded
Original file line number Diff line number Diff line change
@@ -1 +1 @@
-Tue, 30 Jul 2024 08:14:34 GMT
+Tue, 13 Aug 2024 17:44:56 GMT
54 changes: 50 additions & 4 deletions fixtures/downloaded/matomo-org.json
@@ -95,6 +95,7 @@
"dotbot",
"DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)",
"Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)",
+"DuckAssistBot/1.1; (+http://duckduckgo.com/duckassistbot.html)",
"EMail Exractor",
"Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)",
"Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us; EasouSpider; +http://www.easou.com/search/spider.html)",
@@ -286,7 +287,8 @@
"PritTorrent/1.0",
"QuerySeekerSpider ( http://queryseeker.com/bot.html )",
"Quora Link Preview/1.0 (http://www.quora.com)",
-"Mozilla/5.0 (compatible; Qwantify/2.2w; +https://www.qwant.com/)/*",
+"Mozilla/5.0 (compatible; Qwantify/2.2w; +https://www.qwant.com/)",
+"Mozilla/5.0 (compatible; Qwantify-prod34997/1.0; +https://help.qwant.com/bot/)",
"ROI Hunter; https://api-dev.roihunter.com",
"RSSRadio (Push Notification Scanner;[email protected])",
"Rainmeter WebParser plugin",
@@ -409,7 +411,6 @@
"Mozilla/5.0 (compatible; Linux i686; Yandex.Gazeta Bot/1.0; +http://gazeta.yandex.ru)",
"Mozilla/5.0 (compatible; YaDirectFetcher/1.0; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; YandexAntivirus/2.0; +http://yandex.com/bots)",
-"Mozilla/5.0 (compatible; YandexAntivirus/2.0; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; YandexBlogs/0.99; robot; B; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
"Mozilla/5.0 (compatible; YandexDirect/3.0; +http://yandex.com/bots)",
@@ -613,6 +614,7 @@
"l9explore/1.3.0",
"Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: [email protected]",
"Mozilla/5.0 (compatible; MaCoCu; +https://www.clarin.si/info/macocu-massive-collection-and-curation-of-monolingual-and-bilingual-data/)",
+"Mozilla/5.0 (compatible; CLASSLA-web; +https://www.clarin.si/info/classla-web-crawler/)",
"Electronic Frontier Foundation's Do Not Track Verifier (for questions or concerns email [email protected])",
"Mozilla/5.0 (compatible; InfoTigerBot/1.9; +https://infotiger.com/bot)",
"Mozilla/5.0 (compatible; Birdcrawlerbot/0.5; +https://crawla.de)",
@@ -780,7 +782,7 @@
"Mozilla/5.0 (compatible; Odin; https://docs.getodin.com/)",
"YouBot (+http://www.you.com)",
"Mozilla/5.0 (compatible; YouBot/1.0; +https://about.you.com/youbot/)",
-"SiteScoreBot v20210315",
+"SiteScoreBot v20210315 - https://sitescore.ai",
"Mozilla/5.0 (compatible; AwarioBot/1.0; +https://awario.com/bots.html)",
"MBCrawler/1.0 (https://monitorbacklinks.com/robot)",
"mariadb-mysql-kbs-bot (+https://github.com/williamdes/mariadb-mysql-kbs; [email protected])",
@@ -963,5 +965,49 @@
"acme.sectigo.com/v2/DV",
"acme.sectigo.com/v2/OV",
"acme.sectigo.com/v2/EV",
-"Mozilla/5.0 (compatible; Website-info.net-Robot; https://website-info.net/robot)"
+"Mozilla/5.0 (compatible; Website-info.net-Robot; https://website-info.net/robot)",
+"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36 (compatible; PagePeeker/3.0; +https://pagepeeker.com/robots/)",
+"Mozilla/5.0 (compatible; SemrushBot-SWA/0.1; +http://www.semrush.com/bot.html)",
+"Mozilla/5.0 (compatible; RedekenBot/0.1; +https://www.redeken.com/bot/)",
+"semaltbot/0.1 (+http://semalt.net)",
+"Mozilla/5.0 (compatible; MakeMerryBot/1.0; +https://makemerry.app/bots)",
+"Timpibot/0.9 (+http://www.timpi.io)",
+"Mozilla/5.0 (compatible; Timpibot/0.8; +http://www.timpi.io)",
+"Tublm.com/Bot/fubpdfdotcom/Bot/Bot -❤️- +https://tublm.com/game/2048_merge",
+"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.1 Safari/605.1.15 (compatible; Validbot; +https://www.validbot.com)",
+"NPBot",
+"Mozilla/5.0 (compatible; CuriousCatgirl Research; +https://curiouscatgirl.cynthia.dev)",
+"xx032_bo9vs83_2a",
+"Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20160721-2308 +https://www.domaincodex.com)",
+"Swisscows Favicons",
+"Mozilla/4.0 (compatible; fluid/0.0; +http://www.leak.info/bot.html)",
+"workona-favicon-service/1.0.0",
+"Bloglines/3.1 (http://www.bloglines.com)",
+"shadowforce.io - sslshed/0.1",
+"search.marginalia.nu",
+"Mozilla/5.0 (compatible;vu-server-health-scanner/1.0;https://130.37.198.75/index.html)",
+"Searcherxweb",
+"Mozilla/5.0 (platform; rv:geckoversion) Gecko/geckotrail Firefox/firefoxversion",
+"Report Runner",
+"Node.js",
+"Mozilla/5.0 (X11; Windows x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Functionize",
+"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/W.X.Y.Z Safari/537.36 Prerender (+https://github.com/prerender/prerender)",
+"Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 Prerender (+https://github.com/prerender/prerender)",
+"Prerender (+https://github.com/prerender/prerender)",
+"bl.uk_ldfc_bot/3.4.0-20220727 (+https://www.bl.uk/legal-deposit/web-archiving)",
+"Mozilla/5.0 (Windows NT 6.1; Win64; x64; +http://url-classification.io/wiki/index.php?title=URL_server_crawler) KStandBot/1.0",
+"Wordupindexinfo1/0",
+"wordupsearchengine-1",
+"Wordup-1",
+"Mozilla/5.0 (X11; Linux x86_64) Miniature.io/6.0 AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.4 Chrome/69.0.3497.128 Safari/537.36",
+"Mozilla/5.0 (Linux; Android 9; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.101 Mobile Safari/537.36 Convertify",
+"ZoteroTranslationServer/WMF (mailto:[email protected])",
+"Mozilla/5.0 (compatible; MuckRack/1.0; +https://muckrack.com)",
+"Mozilla/5.0 (compatible; um-IC/1.0; mailto: [email protected]; Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1",
+"Mozilla/5.0 (compatible; um-ANS/1.0; mailto: [email protected]; Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1",
+"Mozilla/5.0 (compatible; um-FC/1.0; mailto: [email protected])",
+"Mozilla/5.0 (compatible; um-CC/1.0; mailto: [email protected])",
+"Mozilla/5.0 (compatible; CyberFind Crawler; +https://cyberfind.net/bot.html)/Nutch-1.20",
+"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/124.0.6367.207 Safari/537.36 WordPress.com mShots",
+"wp.com feedbot/1.0 (+https://wp.com)"
]