Saturday, June 11, 2011

How to understand the Google Safe Browsing Diagnostic report for malicious or hacked websites



When the Google web crawler visits a site and gets attacked by malware, Google flags the site as suspicious with a "This site may harm your computer" warning in search results. The search result links no longer go to the site, but go instead to an explanatory page about the warning. The Firefox browser, which looks up sites in the Google Safe Browsing database, displays a "Reported Attack Site!" warning, with a link to the explanation. By either of these routes, you can end up at a Google Safe Browsing Diagnostic report.  
Another way to view the Safe Browsing Diagnostic, for any site, is to enter this URL in your browser address bar. Replace EXAMPLE.COM with the name of the site:
http://www.google.com/safebrowsing/diagnostic?site=EXAMPLE.COM
The report is short and lacks explanation, but it contains useful information for webmasters who are trying to clean up their legitimate sites that have been turned dangerous by hackers. 
Below are explanations of the sections of the Safe Browsing Diagnostic report, directed toward webmasters who are trying to clean up their websites. This page was previously part of a long article about how to diagnose the reason a site is flagged and how to get the Google warning removed.
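If you check several sites or subsections regularly, the URL pattern shown above can be built with a few lines of Python. This is only a sketch: the function name `diagnostic_url` is made up for illustration, and it assumes the report address still follows the pattern given in this article.

```python
from urllib.parse import urlencode, urlsplit

# Base address taken from the URL pattern shown in this article.
DIAGNOSTIC_BASE = "http://www.google.com/safebrowsing/diagnostic"


def diagnostic_url(site: str) -> str:
    """Return the Safe Browsing Diagnostic URL for a site name or full URL."""
    site = site.strip()
    # Accept a full URL such as "http://example.com/forum" as well as a bare name.
    if "://" in site:
        parts = urlsplit(site)
        site = parts.netloc + parts.path
    site = site.rstrip("/")
    # Keep "/" readable so a subsection like example.com/forum stays recognizable.
    return DIAGNOSTIC_BASE + "?" + urlencode({"site": site}, safe="/")


if __name__ == "__main__":
    for name in ("EXAMPLE.COM", "example.com/forum", "http://blog.example.com/"):
        print(diagnostic_url(name))
```

Passing a subdirectory or subdomain gives you the address of the report for just that part of a site.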

What is the current listing status for _____?

Site is listed as suspicious - visiting this web site may harm your computer.

This tells you whether your site is listed right now as suspicious. If it is, it means that Google has determined that at least one of your pages, by one method or another, under at least some circumstances, is causing visitors to get attacked by malware. There will be warnings (as described above) in Google search results and in the Firefox and Chrome browsers. Internet Explorer does not use the Google Safe Browsing database (it uses a Microsoft database), so IE might not give any warning message. That does not mean the site is clean. If the Google Safe Browsing diagnostic says that visiting a site can cause malicious content to be downloaded to your computer without your permission, you can be almost 100% sure that Google's assessment is correct.
If this report disagrees with what you see in search results (for example, you know that your site currently is flagged in search results, but the diagnostic says it is not listed as suspicious), it's possible your site has more than one diagnostic report and you need to find the other(s). There are at least two situations where you can have more than one diagnostic report: 
  1. Only part of your site, not the whole site, is flagged. In the search results, click the link to one of the pages that is flagged, to get the diagnostic report for that part of the site, such as example.com, example.com/forum, or blog.example.com.
     
  2. In the past, it was possible to have separate diagnostic reports (which sometimes did not agree with each other) for example.com and www.example.com. Google seems to have resolved that problem in most cases, but check out this possibility anyway if the diagnostic does not seem to be accurate for your situation.
If the diagnostic report says your site is not listed as suspicious, but you still get warnings in Firefox, it is due to a delay in Firefox updating from the Google database, and is normally resolved within a day or so. 

Part of this site was listed for suspicious activity 9 time(s) over the past 90 days.

This tells you the recent history. In the example above, the site has been flagged and unflagged 9 separate times, which is a lot. Its webmaster has probably been removing malicious code over and over again but not fixing the site's security vulnerabilities, so the site keeps getting hacked.

What happened when Google visited this site?

Of the 110 pages we tested on the site over the past 90 days, 11 page(s) resulted in malicious software being downloaded and installed without user consent.

This is mostly self-explanatory. It gives you an indication of how widespread the infection is in the pages of your site. You can get a partial listing of the pages Google considers suspicious in Webmaster Tools at Google Webmaster Central.
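To turn the two numbers in that sentence into a rough infection rate, something like the following would do. It is a sketch that assumes the report wording matches the sample line quoted above; the pattern and the function name `infected_fraction` are illustrative only.

```python
import re

# Pattern written against the sample sentence quoted in this article.
PAGES_LINE = re.compile(r"Of the (\d+) pages we tested .*?, (\d+) page\(s\) resulted in")


def infected_fraction(report_line: str) -> float:
    """Return infected pages divided by tested pages for the report sentence."""
    match = PAGES_LINE.search(report_line)
    if match is None:
        raise ValueError("line does not look like the pages-tested sentence")
    tested, infected = int(match.group(1)), int(match.group(2))
    if tested == 0:
        raise ValueError("report says no pages were tested")
    return infected / tested


if __name__ == "__main__":
    sample = ("Of the 110 pages we tested on the site over the past 90 days, "
              "11 page(s) resulted in malicious software being downloaded "
              "and installed without user consent.")
    print(f"{infected_fraction(sample):.0%} of tested pages were infected")
```

For the sample line, 11 of 110 tested pages works out to 10%.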

The last time Google visited this site was on 2009-11-20, and the last time suspicious content was found on this site was on 2009-11-20.

When the first and second dates are the same, it means that the most recent review found malware. The site is still infected.

The last time Google visited this site was on 2009-11-20, and the last time suspicious content was found on this site was on 2009-11-18.

This means that the most recent review did not find malware. If your site is still shown as "suspicious" even though the last scan did not find malware, the status should change to "not suspicious" within approximately 1 day, unless the site has been flagged many times recently. In that case, there might be a several-day delay while Google waits to see if the site stays clean. Another reason for a delay is if you deleted the infected pages instead of cleaning them. Google wants to see cleaned pages. They do not want you to delete pages, get the flag removed, and then put infected pages back online.
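The date comparison described in the last two sections can be written down as a small check. This is a sketch, assuming the sentence is worded like the samples above and the dates are in YYYY-MM-DD form; `last_scan_found_malware` is a made-up name.

```python
import re
from datetime import date

# Pattern written against the sample sentences quoted in this article.
DATES_LINE = re.compile(
    r"visited this site was on (\d{4}-\d{2}-\d{2}).*?"
    r"suspicious content was found on this site was on (\d{4}-\d{2}-\d{2})"
)


def last_scan_found_malware(report_line: str) -> bool:
    """True when the most recent visit is also the most recent malware finding."""
    match = DATES_LINE.search(report_line)
    if match is None:
        raise ValueError("line does not look like the last-visited sentence")
    last_visit = date.fromisoformat(match.group(1))
    last_suspicious = date.fromisoformat(match.group(2))
    # Same date: the latest review found malware. An earlier "suspicious" date
    # means the latest review came back clean. A later date should not occur.
    return last_suspicious >= last_visit


if __name__ == "__main__":
    line = ("The last time Google visited this site was on 2009-11-20, and the last "
            "time suspicious content was found on this site was on 2009-11-18.")
    print("still infected" if last_scan_found_malware(line) else "latest scan was clean")
```

A clean result here only says the latest scan found nothing; as explained above, the "suspicious" listing itself can lag behind.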

Malicious software includes 1 scripting exploit(s), 1 trojan(s). Successful infection resulted in an average of 1 new process(es) on the target machine.

This itemizes the kinds of malware that attacked the Google crawler when it visited your pages.

Malicious software is hosted on 2 domain(s), including gumblar.cn/, beladen.net/.

When your pages cause malware to be loaded into a visitor's browser, it means just that: they cause it to happen. It does not necessarily mean the actual virus code is in your page. It probably isn't, and it probably is not even in some other file on your website. Usually, the virus code is stored at some other site. But if your page contains an iframe that fetches its content from that other site, it will cause the malicious code to be loaded into the visitor's browser.
This line in the diagnostic is the list of sites where the malware is actually hosted (stored). The visitor's browser is fetching the virus code from there. If you are hunting for malicious iframes in your website files, these domain names are ones you should be hunting for. Unfortunately, they might be encoded in a way that makes them hard to find with a text search, and it is also possible that other domains, rather than these, are referenced in your iframes. The reason is that sometimes there is a chain or sequence of events, involving other intermediary websites, that eventually, but not immediately, causes malware from the above sites to be loaded. I will discuss intermediaries in the next section.
This list of hosting domains can be very helpful. In the first of the examples above, the reference to gumblar.cn means that it is certain your site was hacked as the result of a virus infection on the PC of one of your website administrators, which stole the FTP password. In the second example, the reference to beladen.net means that it is not just your website that is compromised; the entire server is infected, and so are all the websites on it. A web search on the domain names you find in this list can help discover what type of infection your website has and also can indicate what type of security vulnerability it has that allowed it to be infected. Unfortunately, it doesn't often lead to such definitive conclusions as it does for gumblar or beladen.   
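As a starting point for hunting these domain names in your website files, here is a rough Python sketch. It looks for each domain in plain text and in three simple encodings (percent-encoding, JavaScript `\x` escapes, and numeric HTML entities) with every character encoded. The choice of encodings and the names `encoded_variants` and `find_domain_references` are my assumptions for illustration; real injected code is often obfuscated in ways this will not catch, so a clean result proves nothing.

```python
from pathlib import Path


def encoded_variants(domain: str) -> dict[str, str]:
    """Return a few simple spellings of a domain that injected code might use."""
    return {
        "plain": domain,
        "percent-encoded": "".join(f"%{ord(c):02x}" for c in domain),
        "js hex escapes": "".join(f"\\x{ord(c):02x}" for c in domain),
        "html entities": "".join(f"&#{ord(c)};" for c in domain),
    }


def find_domain_references(root: str, domains: list[str]) -> list[tuple[str, str, str]]:
    """Walk `root` and report (file path, domain, encoding) for every match."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Read as text, replacing undecodable bytes, and ignore letter case.
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        for domain in domains:
            # The report lists domains with a trailing slash, e.g. "gumblar.cn/".
            needle_base = domain.lower().rstrip("/")
            for label, needle in encoded_variants(needle_base).items():
                if needle in text:
                    hits.append((str(path), domain, label))
    return hits


if __name__ == "__main__":
    # "." means the current directory; point this at a local copy of your site.
    for file_path, domain, label in find_domain_references(".", ["gumblar.cn/", "beladen.net/"]):
        print(f"{file_path}: {domain} ({label})")
```

Run it against a downloaded copy of the site rather than the live server, and treat every hit as a file to inspect by hand.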

3 domain(s) appear to be functioning as intermediaries for distributing malware to visitors of this site, including...  

As mentioned above, when your site is loaded in a browser, elements in your page such as iframes can trigger a chain of events that bring malicious content to the visitor's browser. That chain could involve several hops, through several different websites, before the malicious code gets delivered.
For example, let's say your page contains an iframe that loads a page from site A. That page consists of JavaScript code that fetches and executes a VBScript from site B, which in turn fetches a Trojan downloader from site C. The Trojan downloader is the payload: the first part of the actual malicious software, which might consist of several parts carrying different types of attacks.
In this scenario, your site will certainly be flagged for causing the malicious content to get loaded into the visitor's browser (initiating the sequence). Sites A and B are intermediaries in the chain, and site C is the host of the code that carries out the attack.
When you are searching your code for hidden iframes, search for domains listed in this section of the report as intermediaries, in addition to the ones listed in the previous section as hosts.  
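The roles in the example chain (your site, then A, then B, then C) can be made concrete with a toy model. This only illustrates the terminology; the class, the function, and the site names are invented for the example and have nothing to do with how Google actually builds the report.

```python
from dataclasses import dataclass


@dataclass
class ChainRoles:
    initiator: str             # the flagged site whose page starts the sequence
    intermediaries: list[str]  # sites that only pass the visitor along
    host: str                  # the site that actually stores the malware


def classify_chain(hops: list[str]) -> ChainRoles:
    """Split an ordered fetch chain into the roles the report uses."""
    if len(hops) < 2:
        raise ValueError("a chain needs at least an initiating site and a host")
    return ChainRoles(initiator=hops[0], intermediaries=hops[1:-1], host=hops[-1])


if __name__ == "__main__":
    chain = ["your-site.example", "site-a.example", "site-b.example", "site-c.example"]
    roles = classify_chain(chain)
    print("flagged initiator:", roles.initiator)
    print("intermediaries:   ", ", ".join(roles.intermediaries))
    print("malware host:     ", roles.host)
```

Any of the names in the intermediaries list or the host field is a string worth searching your files for.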

This site was hosted on 1 network(s) including...

This tells you the internet network where your site is hosted. You might recognize the name of your webhost here, or the name of a larger network that your host is part of. This does not seem to be particularly useful information. Any large network will have many compromised websites in it, and no large network will consist entirely of compromised websites.

Has this site acted as an intermediary resulting in further distribution of malware?

Over the past 90 days, _____ appeared to function as an intermediary for the infection of __ site(s) including _____, _____, ...

Is your site one of the intermediaries as described in the previous section? In addition to the general scenario presented above, here are two more specific ones:
  • Let's say you host a PHP script that other sites call to get dynamically generated content from you. Your site gets hacked, and someone injects your PHP code with iframes that point to a third site that hosts malicious code. As long as your own site's pages don't call your own PHP script, you're not causing malware to be loaded into a visitor's browser, and you're not the host of the malware, either, but your PHP script is facilitating the distribution of malware by acting as a middle link. You're an intermediary.
     
  • Let's say you are an advertising distributor. You accept ads submitted to you by companies who want to advertise, and you place those ads on the sites in your publisher network. One of your advertisers submits a malicious ad. When the ad appears on your publisher websites, you're an intermediary. This Safe Browsing report is an example of an advertiser listed as an intermediary at the time of this writing. Note that although they are not flagged as suspicious, and their own pages are not flagged in search results with "This site may harm your computer", they can be causing their publisher network sites to get flagged. 

Has this site hosted malware?

This part of the report usually says No. As mentioned earlier, most sites, even compromised ones, do not actually host (contain) the virus code. The hackers store the virus code at a central location. Then they hack many sites, injecting iframe code that points to the central location. With this arrangement, they can change the virus code quickly and easily. The changes get propagated throughout the internet without their having to re-hack thousands of sites to update the code to the new and improved version.
If your report says Yes, your site is hosting malware, then you are one of the chosen few where they actually are storing the virus code. When web surfers load pages from other sites, those pages contain iframes that point to your site and fetch the virus code from your site. Obviously, you need to find where the virus code is being stored in your website files.
If your report says No, this site has not hosted malware, that does not mean your site is clean. It only means your site is not a central location where the virus code is being stored.

Notes

  • In the scenario described earlier, where your site is flagged because its pages initiate the sequence of malware delivery and "sites A and B are intermediaries, and C is the host", the intermediaries and hosts will not necessarily be flagged as suspicious. This is counterintuitive: intermediaries and hosts are a danger to the internet, either as conduits for the flow of malware or as places where it is stored for use in attacks against web surfers or other websites.

    The internet danger level would be reduced if these sites were flagged as suspicious. It would alert the webmasters (at least the ones who are innocent victims) that their sites need to be cleaned and better secured. Without such warning, many webmasters of sites that are intermediaries or hosts have no idea that they have a problem.

    The best sense I can make of this situation is that the Google search result warning is intended to help protect web surfers by giving them information they can do something about: they can avoid going to a flagged site.

    Intermediary and host sites usually play their part through "orphan" files hidden inside their sites. These are files that have no ordinary hyperlinks pointing to them from anywhere on the internet. Because they are not pages that web surfers can get to by following links, and because Google's intent is to protect web surfers who use its search results (not necessarily to "make the internet safer"), Google does not bother to flag intermediaries and hosts. Once a web surfer visits a site that initiates the delivery of malware, the chain through the intermediaries and hosts is automatic. There is nothing a web surfer could do about it even if they had advance warning, so there is no point in creating such a warning.

    There is something a web surfer can do for protection, however: turn JavaScript off. In many cases, that will prevent you from being redirected into the chain of intermediaries and hosts, and prevent the malware from being delivered to your browser.

Unusual situations observed

1) Firefox blocked access to a "Reported Attack Site", but the site was not flagged in Google search results. The Safe Browsing Diagnostic report said suspicious content was "never found", yet it also said that the suspicious software was hosted on 3 domains, and gave their names. The reason: the website was hacked and did redirect visitors to the sites that Google knew were malicious. However, the malicious sites had already been shut down, so they weren't serving any actual malware.
