PhishingDetection (r2 - 06 Mar 2007 - 12:41:27 - PozycjonowanieStron)

Warning: unless the flag --no-phishing-strict-url-check (or the equivalent clamd/clamav-milter config option) is given, phishing scanning is done only for domains listed in daily.pdb.

If your daily.pdb is empty, then by default NO PHISHING scan is DONE, UNLESS you give --no-phishing-strict-url-check.

[The warning above no longer applies, since there is now an official daily.pdb.
no-phishing-strict-url-check is the new name of the old phish-scan-alldomains option.

This document is slightly outdated. It should be updated before the 0.90 release.
]

phishingCheck() determines whether @displayedLink is a legitimate representation of @realLink.

Steps:

1. if realLink == displayedLink => CLEAN

2. url cleanup (normalization)

  • whitespace elimination
  • (simple) html entity conversion
  • convert hostname to lowercase
  • normalize \ to /

If there is a dot after the last space, then all spaces are replaced with dots; otherwise spaces are stripped. So all of 'Go to yahoo.com', 'Go to e b a y . c o m', and 'Go to ebay. com' will work.
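The cleanup rules above can be sketched in Python as follows. This is a minimal illustration of the rules as literally stated here, not the ClamAV implementation: the function names `normalize_spaces` and `cleanup_url` are hypothetical, only plain spaces are treated as whitespace, and the hostname is lowercased only when the URL carries a scheme that `urlsplit` can parse.

```python
import html
from urllib.parse import urlsplit, urlunsplit


def normalize_spaces(text):
    """Apply the space rule: dots if a dot follows the last space, else strip."""
    text = text.strip()
    last_space = text.rfind(" ")
    if last_space == -1:
        return text  # nothing to do
    if "." in text[last_space + 1:]:
        return text.replace(" ", ".")
    return text.replace(" ", "")


def cleanup_url(url):
    """Hypothetical helper combining the four cleanup steps listed above."""
    url = html.unescape(url)       # (simple) HTML entity conversion
    url = url.replace("\\", "/")   # normalize \ to /
    url = normalize_spaces(url)    # whitespace elimination
    parts = urlsplit(url)
    if parts.netloc:
        # Simplification: lowercases the whole netloc, including any userinfo.
        url = urlunsplit(parts._replace(netloc=parts.netloc.lower()))
    return url


print(cleanup_url("HTTP://WWW.Example.COM\\Path"))  # http://www.example.com/Path
```

Unescaping entities before the other steps means an encoded backslash or space is normalized too; whether the real code uses the same order is an assumption.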

3. The URLs are matched against a whitelist: the (realLink, displayedLink) pair is looked up in it.

The whitelist is a list of pairs of realLink, displayedLink. Any of the elements of those pairs can be a regex.

If the URL pair is found in the whitelist => CLEAN
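As an illustration of pair matching with regexes, here is a small Python sketch. The in-memory `WHITELIST` structure, its single entry, and the function name `is_whitelisted` are assumptions made for this example; they do not reflect the on-disk whitelist format or the actual matching code.

```python
import re

# Hypothetical whitelist: (realLink pattern, displayedLink pattern) pairs.
WHITELIST = [
    (r"https?://([a-z0-9-]+\.)*example\.com(/.*)?", r"(www\.)?example\.org"),
]


def is_whitelisted(real_link, displayed_link, whitelist=WHITELIST):
    """Return True if both links match the same whitelist pair."""
    for real_pattern, displayed_pattern in whitelist:
        if (re.fullmatch(real_pattern, real_link)
                and re.fullmatch(displayed_pattern, displayed_link)):
            return True
    return False


print(is_whitelisted("http://mail.example.com/login", "www.example.org"))  # True
print(is_whitelisted("http://evil.test/", "www.example.org"))              # False
```

Both halves of a pair must match the same entry; a real link that matches one entry and a displayed link that matches another do not count as whitelisted in this sketch.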

4. The URL is looked up in the domainlist, unless this is disabled via flags (--no-phishing-strict-url-check). The domainlist is a list of pairs of realLink, displayedLink (either of which can be a regex). This is the list of domains we do phishing detection for (such as ebay, paypal, chase, ...). We can't yet decide whether to stop processing here, so we just set a flag.

Note(!): the flags are modified by the domainlist checker. If the domain is found, then the flags associated with it filter the default compile-time flags.

5. The hostname is extracted from the displayed URL. It is checked against the whitelist and the domainlist.

6. Now we know if we want to stop processing. If we are only scanning domains in the domainlist (default behaviour), and the url/domain isn't found in it, we return (and mark url as not_list/clean). If we scan all domains, then the domainlist isn't even checked.
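Steps 4 to 6 can be pictured as a lookup that returns both a "continue processing" decision and a filtered set of checks. The sketch below is only an interpretation: the `Check` flag names, the `DOMAINLIST` entry, and the reading of "filter" as a bitwise AND with the defaults are all assumptions, not taken from the ClamAV sources.

```python
import re
from enum import Flag, auto


class Check(Flag):
    """Hypothetical per-check flags."""
    NONE = 0
    CLOAK = auto()
    SSL = auto()
    NUMERIC_IP = auto()


# Stand-in for the compile-time defaults.
DEFAULT_FLAGS = Check.CLOAK | Check.SSL | Check.NUMERIC_IP

# Hypothetical domainlist: (realLink pattern, displayedLink pattern, flags).
DOMAINLIST = [
    (r".*", r"(.*\.)?example-bank\.test", Check.CLOAK | Check.SSL),
]


def lookup_domainlist(real, displayed, scan_all_domains=False):
    """Return (continue_processing, effective_flags)."""
    if scan_all_domains:
        # Scanning all domains: the domainlist isn't even checked.
        return True, DEFAULT_FLAGS
    for real_pattern, displayed_pattern, entry_flags in DOMAINLIST:
        if (re.fullmatch(real_pattern, real)
                and re.fullmatch(displayed_pattern, displayed)):
            # Assumed meaning of "filter": keep only defaults the entry allows.
            return True, DEFAULT_FLAGS & entry_flags
    return False, Check.NONE  # not listed: treated as clean


proceed, flags = lookup_domainlist("http://203.0.113.5/", "www.example-bank.test")
print(proceed, Check.NUMERIC_IP in flags)  # True False
```

With the default behaviour, a displayed host that is not in the list stops processing early, which is why an empty domainlist disables phishing detection altogether.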

7. URL cloak check. Checks for %00 and hex-encoded IPs in the URL.
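A rough Python sketch of such a cloak check is shown below. What exactly counts as a "hex-encoded IP" is not spelled out here, so the `HEX_HOST` pattern (one to four dot-separated `0x...` groups as the whole hostname) and the function name `is_cloaked` are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed shape of a hex-encoded IP host, e.g. 0xC0A80001 or 0xC0.0xA8.0x0.0x1.
HEX_HOST = re.compile(r"0x[0-9a-f]+(\.0x[0-9a-f]+){0,3}", re.IGNORECASE)


def is_cloaked(url):
    """Return True if the URL contains %00 or has a hex-encoded IP as host."""
    if "%00" in url:
        return True
    host = urlsplit(url).hostname or ""
    return HEX_HOST.fullmatch(host) is not None


print(is_cloaked("http://example.com/a%00b"))   # True
print(is_cloaked("http://0xC0A80001/login"))    # True
print(is_cloaked("http://example.com/"))        # False
```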

8. Skip empty displayedLinks.

9. SSL mismatch detection. Checks if realLink is http but displayedLink is https, or vice versa. (By default the SSL detection is done for hrefs only, not for imgs.)
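The scheme comparison can be sketched as follows. The function name `ssl_mismatch` and the `is_href` parameter are hypothetical; the sketch reports a mismatch only when both links carry an explicit scheme and one is http while the other is https.

```python
from urllib.parse import urlsplit


def ssl_mismatch(real_link, displayed_link, is_href=True):
    """Return True if one link is http and the other is https."""
    if not is_href:
        return False  # by default the check applies to hrefs only, not imgs
    schemes = {urlsplit(real_link).scheme, urlsplit(displayed_link).scheme}
    return schemes == {"http", "https"}


print(ssl_mismatch("http://example.com/", "https://example.com/"))  # True
print(ssl_mismatch("https://example.com/", "https://example.com/")) # False
```

A displayed link with no scheme at all (for example plain `www.example.com`) is not treated as a mismatch in this sketch.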

10. Hostname of real URL is extracted.

11. Skip cid: displayedLink urls (images embedded in mails).

12. Numeric IP detection. If the URL's host is a numeric IP => phish. Maybe we should do a DNS lookup? Maybe we should disable numeric IP checks for --phish-scan-alldomains?
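For illustration, a numeric-IP test can be written with Python's standard `ipaddress` module. The function name `is_numeric_ip` is hypothetical, and accepting IPv6 literals is an assumption of this sketch rather than documented behaviour.

```python
import ipaddress
from urllib.parse import urlsplit


def is_numeric_ip(url):
    """Return True if the URL's host is an IP address literal."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
    except ValueError:
        return False
    return True


print(is_numeric_ip("http://192.0.2.10/login"))  # True
print(is_numeric_ip("http://example.com/"))      # False
```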

13. isURL(displayedLink). Checks if displayedLink is really a URL. If not -> clean

14. The hostnames of realLink and displayedLink are compared. If equal -> clean

15. Extract domain names, and compare. If equal -> clean
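Steps 14 and 15 amount to two comparisons of increasing leniency. In the sketch below the helper names are hypothetical, and `naive_domain` simply keeps the last two labels of the hostname; that is a deliberate simplification (it mishandles suffixes such as `co.uk`) and is not how the real domain extraction is claimed to work.

```python
from urllib.parse import urlsplit


def hostname(link):
    """Extract a lowercase hostname, tolerating links without a scheme."""
    if "://" not in link:
        link = "//" + link  # lets urlsplit treat the text as a network location
    return (urlsplit(link).hostname or "").lower()


def naive_domain(host):
    """Simplified domain extraction: keep the last two labels."""
    return ".".join(host.split(".")[-2:])


def compare_links(real_link, displayed_link):
    """Return 'same-host', 'same-domain', or 'different'."""
    real_host, displayed_host = hostname(real_link), hostname(displayed_link)
    if real_host == displayed_host:
        return "same-host"      # step 14: clean
    if naive_domain(real_host) == naive_domain(displayed_host):
        return "same-domain"    # step 15: clean
    return "different"


print(compare_links("http://login.example.com/", "www.example.com"))  # same-domain
print(compare_links("http://evil.test/", "www.example.com"))          # different
```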

16. Do DNS lookups/reverse lookups. Disabled now (too much load/too many lookups).

-- TorokEdwin - 03 Dec 2006


