Most phishing sites follow a known pattern, and people like to flag bad sites. So Google and a few other organizations, such as stopbadware.org, maintain feeds that software vendors can use to warn users about, or flat-out block, potentially fraudulent sites.
If a piece of malware is found on a site, even if it is buried deep within it, the site will likely get picked up by a robot or reported by a user. Robots can pick up a lot, because people who exploit WordPress sites and the like are usually playing a numbers game: they harvest hundreds of thousands of email addresses and send phishing emails, knowing it only takes one person to hand over banking information. Given that they’re often just dropping a file into an open web directory, the attacker might otherwise go months before enough people complained and the web host shut them down.
Google Safe Browsing works much like the realtime blacklists that email has used for a long time: sites are listed and then blocked as needed. But privacy works differently with web browsing, so Google added a number of privacy protections, described at https://safebrowsing.google.com. Basically, though, nearly every computer running Safari, Firefox, Chrome, and so on has some binary files of hashed data about bad sites. These files are updated fairly regularly, and they are combined with signatures of known nastiness and a little machine learning magic so that the systems can react to emerging threats.
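To make the hashing idea a bit more concrete: in Google’s Safe Browsing Update API, clients keep a local list of short SHA-256 hash prefixes of known-bad URL expressions and only ask the server for full hashes when a prefix matches, so the URL you visit isn’t sent up front. The sketch below is a simplified illustration, assuming the URL expression has already been canonicalized (the real protocol defines canonicalization and checks several host/path combinations per URL). The sb_hash_prefix function is hypothetical, and the 4-byte default reflects the common prefix length.

```shell
#!/bin/sh
# Hypothetical helper: print the first N bytes (as hex) of the SHA-256 hash of
# an already-canonicalized URL expression such as "example.com/".
# This skips the canonicalization and expression-generation steps that the real
# Safe Browsing protocol requires; it only illustrates the hash-prefix idea.
sb_hash_prefix() {
  local expression="$1"
  local prefix_bytes="${2:-4}"   # 4-byte prefixes are the common case
  local hash_cmd
  if command -v sha256sum >/dev/null 2>&1; then
    hash_cmd="sha256sum"         # GNU coreutils
  else
    hash_cmd="shasum -a 256"     # macOS
  fi
  # Each byte is two hex characters; cut drops the trailing filename field.
  printf '%s' "$expression" | $hash_cmd | cut -c1-$((prefix_bytes * 2))
}

# A client compares this prefix against its local list and only contacts the
# server for full hashes when there is a match.
sb_hash_prefix "example.com/"
```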
If you’re interested in writing your own tools, Google Safe Browsing has an API, which is described in Google’s Safe Browsing API documentation.
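As a rough illustration, here is a sketch of a request to the v4 Lookup API’s threatMatches:find endpoint using curl. It assumes you have an API key in a SB_API_KEY environment variable; the client ID, the chosen threat types, and the build_lookup_body and check_url function names are illustrative, so check Google’s current documentation before relying on any of it.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON request body for checking a single URL.
# The URL is inserted as-is, so it must not contain double quotes.
build_lookup_body() {
  local url="$1"
  cat <<EOF
{
  "client": {"clientId": "example-tool", "clientVersion": "1.0"},
  "threatInfo": {
    "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
    "platformTypes": ["ANY_PLATFORM"],
    "threatEntryTypes": ["URL"],
    "threatEntries": [{"url": "$url"}]
  }
}
EOF
}

# Hypothetical helper: POST the body. An empty JSON object ({}) in the response
# means no match; a "matches" array lists the threats that were found.
check_url() {
  curl -s -H "Content-Type: application/json" \
    -d "$(build_lookup_body "$1")" \
    "https://safebrowsing.googleapis.com/v4/threatMatches:find?key=${SB_API_KEY}"
}

# Only call the API when a key is configured.
if [ -n "${SB_API_KEY:-}" ]; then
  check_url "http://example.com/"
fi
```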
If you manage a site that gets attacked, you may first learn about it when the site gets blocked. If that happens, remove whatever was put on your site that caused the block, then use the form Google provides to request removal from its list of reported phishing sites.
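When cleaning up, one quick way to hunt for a dropped phishing kit or script is to list the files in the web root that changed recently. This is a minimal sketch, assuming you pass in your site’s directory (the /var/www/html path in the example is hypothetical) and that the attacker didn’t reset file timestamps. It’s a starting point for review, not a malware scan.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a web root that were modified
# within the last N days (default 14), sorted by path for easy review.
# Attackers can forge timestamps, so treat this as a first pass only.
recent_web_files() {
  local webroot="$1"
  local days="${2:-14}"
  find "$webroot" -type f -mtime -"$days" -print | sort
}

# Example: review anything that changed in the last 30 days.
# recent_web_files /var/www/html 30
```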
Safari uses Google Safe Browsing. Safari’s Security preference pane has a “Fraudulent sites” setting: check the box, and Safari warns you when you attempt to open a bad site.
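To check that setting from the command line, a sketch might read it with defaults. This assumes the checkbox is stored in the WarnAboutFraudulentWebsites key of the com.apple.Safari domain, which is the key commonly cited in macOS hardening guides. Depending on the macOS version, Safari may keep its preferences in its sandbox container, in which case you would pass that plist path instead and Terminal may need Full Disk Access. The fraud_warning_status function is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether Safari's fraudulent-site warning is on.
# Assumes the setting lives in the WarnAboutFraudulentWebsites key; pass a
# different domain or plist path as $1 if your Safari stores prefs elsewhere.
fraud_warning_status() {
  local domain="${1:-com.apple.Safari}"
  local value
  value="$(defaults read "$domain" WarnAboutFraudulentWebsites 2>/dev/null)"
  case "$value" in
    1) echo "enabled" ;;
    0) echo "disabled" ;;
    *) echo "unknown" ;;   # key missing or preferences not readable
  esac
}

# Example for a sandboxed Safari (path is an assumption):
# fraud_warning_status ~/Library/Containers/com.apple.Safari/Data/Library/Preferences/com.apple.Safari
if command -v defaults >/dev/null 2>&1; then
  fraud_warning_status
fi
```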
Safe Browsing in Safari works by having Safari pull a new version of the bad-site data from Google every now and then. You can see the date and time of the last pull by using the defaults command to read com.apple.Safari.SafeBrowsing.plist, as follows:
defaults read com.apple.Safari.SafeBrowsing.plist
The output includes the SafeBrowsingRemoteConfigurationLastUpdateDate key from /Users//Library/Preferences/com.apple.Safari.SafeBrowsing.plist.
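If you only want the date, defaults read also accepts a key name after the domain. The sketch below notes that form (using the domain name without the .plist suffix, the usual way to name a defaults domain) and adds a hypothetical sb_last_update helper that pulls the date out of the full dump, assuming the usual Key = "value"; layout that defaults prints. That can be handy when collecting output from many Macs.

```shell
#!/bin/sh
# Reading just the one key:
#   defaults read com.apple.Safari.SafeBrowsing SafeBrowsingRemoteConfigurationLastUpdateDate
#
# Hypothetical helper: extract the last-update date from a full
# "defaults read" dump supplied on stdin.
sb_last_update() {
  awk -F' = ' '/SafeBrowsingRemoteConfigurationLastUpdateDate/ {
    value = $2
    gsub(/[";]/, "", value)   # strip the quotes and trailing semicolon
    print value
  }'
}

if command -v defaults >/dev/null 2>&1; then
  defaults read com.apple.Safari.SafeBrowsing | sb_last_update
fi
```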
Finding the actual bad-site data is trickier. macOS stores a number of temporary, dynamically generated files in /var/folders, inside a hierarchy of GUID-like directory names generated for a given system. There you’ll find a couple of files, including /var/folders/r1/05ns3cqs0cg5c42x38gk0c0w0000gn/C/com.apple.Safari.SafeBrowsing and
These files are binary and can’t be read directly. They appear to be downloaded routinely by the com.apple.Safari.SafeBrowsing.BrowsingDatabases.Update service. Their modification dates, though, give you a good idea of when the last update ran, if you care to find that out.
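Rather than digging through the randomly named hierarchy by hand, a sketch can ask macOS for the per-user cache directory (getconf DARWIN_USER_CACHE_DIR returns the .../C/ path under /var/folders) and list anything named com.apple.Safari.SafeBrowsing with its modification time. The sb_database_files function is hypothetical, and it assumes the Safe Browsing data sits directly in that cache directory, as in the path above.

```shell
#!/bin/sh
# Hypothetical helper: list Safe Browsing entries (and anything inside them)
# with their modification times. Defaults to the per-user cache directory.
sb_database_files() {
  local cache_dir="${1:-$(getconf DARWIN_USER_CACHE_DIR 2>/dev/null)}"
  cache_dir="${cache_dir%/}"   # drop any trailing slash
  [ -d "$cache_dir" ] || { echo "cache directory not found" >&2; return 1; }
  # -path matches the com.apple.Safari.SafeBrowsing entry and its contents.
  find "$cache_dir" -path "$cache_dir/com.apple.Safari.SafeBrowsing*" \
    -exec ls -ld {} + 2>/dev/null
}

if [ "$(uname)" = "Darwin" ]; then
  sb_database_files
fi
```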
You may notice that a lot of the built-in apps can be checked with the same mdls command. There are certainly better ways for some of them, but in terms of runtime cost, Spotlight can respond more quickly than a lot of other tools (other than purpose-built open source tools, of course, which already hold a smaller set of data specific to the task). Third-party software can be checked the same way. Let’s take Microsoft Outlook as an example:
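A sketch of that kind of check, assuming Outlook is installed at /Applications/Microsoft Outlook.app: mdls -name kMDItemVersion -raw prints just the Spotlight version attribute of an app bundle, and the hypothetical app_version helper wraps it so you can loop over several apps. The app paths in the loop are assumptions about where those apps live.

```shell
#!/bin/sh
# Hypothetical helper: print the Spotlight version attribute of an app bundle.
# mdls -raw prints "(null)" when the attribute isn't set; report "unknown".
app_version() {
  local app="$1"
  local version
  version="$(mdls -name kMDItemVersion -raw "$app" 2>/dev/null)"
  if [ -z "$version" ] || [ "$version" = "(null)" ]; then
    echo "unknown"
  else
    echo "$version"
  fi
}

# Example: check a few apps (paths are assumptions).
if command -v mdls >/dev/null 2>&1; then
  for app in "/Applications/Microsoft Outlook.app" "/Applications/Safari.app"; do
    printf '%s: %s\n' "$(basename "$app" .app)" "$(app_version "$app")"
  done
fi
```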
Frameworks work a little differently. To get the WebKit framework version programmatically, I need the system_profiler command with the SPFrameworksDataType data type. Its output shows the version of WebKit, but simply piping it into grep for WebKit won’t find the version, because the version sits on a separate line below the framework’s name. Instead, I need an option I don’t often use with grep: -A, which prints a given number of lines following each line that matches the pattern. So here I’m constraining the output to the WebKit: line plus the next ten lines, then constraining further to just the version number:
system_profiler SPFrameworksDataType | grep -A10 WebKit: | grep Version
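If you’d rather not depend on how many lines follow the WebKit: heading, a slightly more targeted sketch uses awk to start looking only once it reaches the WebKit: line and to stop at the first Version: line after it. This assumes the indented Name: / Version: layout that system_profiler prints for frameworks; the webkit_version function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read system_profiler SPFrameworksDataType output on
# stdin and print only the version listed under the "WebKit:" heading.
webkit_version() {
  awk '
    /^[ \t]*WebKit:[ \t]*$/ { in_webkit = 1; next }       # found the heading
    in_webkit && /^[ \t]*Version:/ { print $2; exit }     # first Version after it
  '
}

if command -v system_profiler >/dev/null 2>&1; then
  system_profiler SPFrameworksDataType | webkit_version
fi
```

Because awk exits at the first match, a Version line belonging to a framework listed after WebKit is never printed.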