Saturday, June 11, 2011

how to avoid SQL Injection vulnerabilities in your MySQL database query code, with examples



A task that people often want to do with server-side programming code is pull data from a database. How to do that can become a rather complicated topic because there are many languages, such as PHP and ASP.NET, that you can use for connecting to many types of databases, such as MySQL and Microsoft SQL Server.
This article is about the security issues of using PHP to query a MySQL database, but the principles of security best practices are the same when querying any database using any language.
The basic act of querying a database is safe. You do it with SQL code that looks like this:
SELECT * FROM `pets` WHERE `owner`='Gwen' AND `species`='cat'
If you always ran the query that way with the search terms hard-coded in the text, there would never be a security issue. However, when writing code for a website, you most often have an SQL code template like this:
SELECT * FROM `pets` WHERE `owner`='something' AND `species`='something'
and it is your site visitors who provide both "somethings" by typing them into text boxes. Your PHP code must combine their input with your SQL template code to create a customized SQL query that you can execute. 
There is a security issue because the most popular ways of creating that combination, used by many people for many years, published in code examples all over the web, are insecure. They allow a malicious visitor to type into the text box, instead of a legitimate "something", SQL code. By clever use of punctuation, the SQL code can corrupt the query template and trick it into doing something other than the simple search that you intended. It can pull secret data out of the database and display it on a web page, insert new malicious data into the database tables, or even delete the database. 
The specifics of how this can occur are described in greater detail and slightly more technical language in the SQL Injection article at Wikipedia. I will not try to improve on their examples.  
From this point forward, I'll assume that you are probably reading this article because your website has already been hit with an SQL Injection attack and you are trying to figure out how to repair your code to prevent it from happening again. 

Example code vulnerable to SQL Injection, and how to repair it

These are the two most popular ways of combining user input with an SQL template to create the final query. If your code looks like this, it is vulnerable to SQL Injection and needs repair: 
$query = 
 "SELECT * FROM `pets` WHERE `owner`='" . 
 $_POST['ownername'] . 
 "' AND species='" . 
 $_POST['species'] . "'";
$query = 
 sprintf("SELECT * FROM `pets` WHERE `owner`='%s' AND `species`='%s'", 
 $_POST['ownername'], 
 $_POST['species']);
The most serious problem with both is that they insert the user-submitted text directly into the query template. If the user-submitted text contains quote characters, it's easy to see that the combined text can end up with the wrong numbers of quote characters, or unmatched quotes. Those are the things that can corrupt the query and hijack it to do something malicious. The solution is to "escape" all the incoming quote characters, using the mysql_real_escape_string() function. This makes the user-submitted quotes look different from the quotes that were already in the template, so MySQL won't get them mixed up with each other. Here are the improved versions:   
$query = 
 "SELECT * FROM `pets` WHERE `owner`='" . 
 mysql_real_escape_string($_POST['ownername']) . 
 "' AND species='" . 
 mysql_real_escape_string($_POST['species']) . "'";
$query = 
 sprintf("SELECT * FROM `pets` WHERE `owner`='%s' AND `species`='%s'", 
 mysql_real_escape_string($_POST['ownername']), 
 mysql_real_escape_string($_POST['species']));
Even if all you do is revise your code to look like the improved versions, that is a big step toward improving its security.
In the longer example code below, I'll add an additional security measure. I'll pre-validate the incoming $_POST variables for legitimacy and not use them at all (not do the query) if they're invalid.

PHP+MySQL Query Example Code

1) Example entire PHP page for processing form input, using the (older) mysql extension methods:

If you are only repairing old code and don't want to switch to completely new database methods, this example shows ways to improve security. You can use it as a guide or template for changes your code might need. Links go to pages in the PHP online manual.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Language" content="en-us">
<title>Pets Database Search Page</title>
</head>

<body>

<?php

$ResultCount = 0;

// ACCUMULATES THE ROWS OF THE HTML OUTPUT TABLE. IT STARTS WITH THE TOP ROW COLUMN HEADINGS.
$ResultTableRows = 
 '<tr style="font-weight:bold;">
  <td>Id</td>
  <td>Name</td>
  <td>Owner</td>
  <td>Species</td>
  <td>Sx</td>
  <td>Birth</td>
  <td>Death</td>
 </tr>
 '; 

/*
The following variable validations have the side-effect of prohibiting the potentially dangerous 
quote (and other) chars, so this validation is a secondary defense against SQL Injection. 

The allowed characters in the preg_match() example are ones legal in regular expressions
because I wanted to use regex for testing queries on the pets database.
For normal use on a web page, the set of allowed characters would usually be much more restrictive.

My validation strategy: each variable starts as unset. That protects against register_globals=On.
It also allows using an isset() test on it, as a flag, later.
The variable only becomes set if the user-provided value for it was acceptable.

Using trim() to remove leading/trailing whitespace is optional, 
but useful for variables where accidental user-submitted whitespace 
could cause a search to fail when it really should have succeeded
except for irrelevant whitespace that caused two strings not to match exactly.
*/

unset($FindOwner);
if(isset($_POST['ownername']))
{
 $_POST['ownername'] = trim($_POST['ownername']);
 if(preg_match('/^[a-zA-Z0-9^$.*+\[\]{,}]{1,32}$/u', $_POST['ownername']))
  $FindOwner = $_POST['ownername'];
}

unset($FindSpecies);
if(isset($_POST['petspecies']))
{
 $_POST['petspecies'] = trim($_POST['petspecies']);
 if(preg_match('/^[a-zA-Z0-9^$.*+\[\]{,}]{1,24}$/u', $_POST['petspecies']))
  $FindSpecies = $_POST['petspecies'];
}

/*
For the numeric variable, this example shows an alternative validation strategy:
Give the variable an initial default value that can be used if the user input is unacceptable.
That can also be done with string variables, but makes no sense for the 2 variables above.

All incoming $_GET/POST variables, even numeric ones, initially arrive as strings.
*/

$RowsLimit = 100;
if(isset($_POST['rowslimit']))
{
 $_POST['rowslimit'] = trim($_POST['rowslimit']);
 if(preg_match('/^[0-9]{1,4}$/u', $_POST['rowslimit']))
 {
  settype($_POST['rowslimit'], 'int');
  if(($_POST['rowslimit'] > 0) && ($_POST['rowslimit'] <= 1000))
   $RowsLimit = (int)$_POST['rowslimit'];
 }
}

// IF WE HAVE LEGAL SEARCH CRITERIA FOR BOTH REQUIRED FIELDS, DO THE SEARCH.
if(isset($FindOwner) && isset($FindSpecies))
{
 require($_SERVER['DOCUMENT_ROOT'] . '/config.php');
 if(mysql_connect($server, $user, $password) && mysql_select_db($database)) 
 {
  $query = sprintf("SELECT * FROM `pets` WHERE `owner` REGEXP '%s' AND `species` REGEXP '%s' LIMIT %d", 
   mysql_real_escape_string($FindOwner), 
   mysql_real_escape_string($FindSpecies),
   (int)$RowsLimit);
  
  if($result = mysql_query($query))
  {
   while($row = mysql_fetch_array($result, MYSQL_ASSOC))
   {
    $ResultCount++;
    $ResultTableRows .=  
     "<tr>\n" .
     '<td>' . htmlentities($row['id'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['name'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['owner'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['species'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['sx'], ENT_QUOTES) . "</td>\n" .  
     '<td>' . htmlentities($row['birth'], ENT_QUOTES) . "</td>\n" . 
     '<td>' . htmlentities($row['death'], ENT_QUOTES) . "</td>\n" . 
     "</tr>\n";
   } 
   mysql_free_result($result);
  }
  mysql_close();
 }
}
?>

<form method="post" action="">
<p>Enter criteria to search for:</p>
<p>Owner name (or partial): 
<input id="ownername" name="ownername" size="50" value="<?php if(isset($FindOwner)) echo $FindOwner; ?>"></p>
<p>Species to search for  : 
<input id="petspecies" name="petspecies" size="50" value="<?php if(isset($FindSpecies)) echo $FindSpecies; ?>"></p>
<p><input type="submit" value="Submit"></p>
</form>

<?php
if(!empty($_POST))
{
 echo '<h2>', $ResultCount, ' results:</h2>
  <table cellpadding="4" cellspacing="0" border="1" align="left" width="100%">', 
  $ResultTableRows, 
  '</table>';
}
?>

</body>
</html>

2) Example to process form input, using the (new) mysqli extension methods:

These mysqli methods are even safer. This block of code replaces the equivalent block in the example page code above.
if(isset($FindOwner) && isset($FindSpecies))
{
 require($_SERVER['DOCUMENT_ROOT'] . '/config.php');
 if($mysqli = new mysqli($server, $user, $password, $database))
 {
  if($stmt = $mysqli->prepare('SELECT * FROM `pets` WHERE `owner` REGEXP ? AND `species` REGEXP ? LIMIT ?')) 
  {
   if($stmt->bind_param('ssi', $FindOwner, $FindSpecies, $RowsLimit))
   {
    if($stmt->execute())
    {
     if($stmt->bind_result($id, $name, $owner, $species, $sx, $birth, $death))
     {
      while($stmt->fetch())
      {
       $ResultCount++;
       $ResultTableRows .=  
        "<tr>\n" .
        '<td>' . htmlentities($id, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($name, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($owner, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($species, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($sx, ENT_QUOTES) . "</td>\n" .  
        '<td>' . htmlentities($birth, ENT_QUOTES) . "</td>\n" . 
        '<td>' . htmlentities($death, ENT_QUOTES) . "</td>\n" . 
        "</tr>\n";
      }
     }
    }
   } 
   $stmt->close();
  }
 }
 $mysqli->close();
}

Other features of the code examples

Validate and/or sanitize all user-submitted values

For every data item that you'll be receiving from your users, make a list of all its possible legitimate values. If such a list isn't possible (it often isn't), define as completely as possible the attributes that all its possible legitimate values would have. For example, if the input is supposed to be an integer, it must consist of only digits. If it is supposed to be a name, it should contain only alphabetic characters and no punctuation except perhaps a hyphen or apostrophe (unfortunately, allowing apostrophes opens a security hole). In all cases, you can apply a length test, too. Incoming values should not be excessively long. Make your rules as restrictive as possible so that only legitimate values can pass the test.
In whatever programming language you use, build a regular expression (or a substitute using another coding method) based on the rules you just created, for testing the incoming value. If the incoming value passes the test (i.e. it's legitimate and reasonable for what your script expects it to be), you can proceed to use the value in your database query. If it doesn't pass the test, don't do the query. Reject the form submission.
That is called data validation. It is a standard best practice for dealing with all data that comes from outside a script. The reasons, and various validation methods, including a more modern method (PHP filter) for validating numbers and other variable types, are described more thoroughly in my article on preventing Remote File Inclusion, another type of security vulnerability.
In addition to having minimal examples of input validation, just enough to show the principle, the example scripts above demonstrate that it can be easier to implement validation if you don't use the incoming $_POST values directly. Instead, there are internal variables for holding the values. An incoming $_POST value is transferred to its internal variable only if it passes the validation tests. This strategy makes it easy to give the internal variable a safe default value which (depending on the application) can be usable when the incoming value doesn't validate. 
Sometimes, if an incoming value doesn't validate, a programmer chooses to do some text processing on it, such as removing or transforming illegal characters, to make it pass the validation. That is called data sanitization, or cleaning, or scrubbing. In some applications, that can be a sensible and reasonable thing to do, but in many applications it's not. After you transform the user's input, the end result is likely not to be what the user intended, anyway, and if the input was malicious, there's no point trying to help them. Just reject the submission.
Data validation is critically important in all languages, when dealing with all database programs. Unfortunately, it's not enough to ensure security by itself. 

Use prepared statements of parameterized queries with placeholders for bound variables

In Example 2) above, that is done by this line:
$mysqli->prepare('SELECT * FROM `pets` WHERE `owner` REGEXP ? AND `species` REGEXP ? LIMIT ?')
The question marks are placeholders for three pieces of not-yet-known data which are the "parameters" in this parameterized query. "Parameterized" means PHP considers the code and data parts of the query to be distinct and separate.
PHP compiles this into a prepared statement (or you could call it a pre-prepared query) even though the data parts are not yet known.
It then binds three variables as data into the locations of the placeholder question marks, but only after casting (converting) them to the specified data types. 'ssi' means string, string, integer:
$stmt->bind_param('ssi', $FindOwner, $FindSpecies, $RowsLimit)
The data items that replace the question marks will be treated as data even if they happen to look like (or are) SQL code. Because PHP knew ahead of time which part of the query is the code, it cannot be confused by data trying to masquerade as code.
Contrast that with the old-school method of creating a query by string concatenation: the query didn't exist until after the code and data were combined into a text string, and PHP had to try to parse the text string to create a query. Obviously, mistakes could be made.
Example 1) above does not have these protections and cannot be revised to have them. It creates the query after code and data have been combined into a text string, so it has that inherent weakness. The protection of having code and data kept separate is provided by the newer mysqli database extension, which is why it is so much better.

Keep database connection data in a separate file

The examples read database connection data (username, password) from a config.php file, for two reasons:
  1. All scripts that need the connection data can read it from that one file. If it is ever necessary to revise the information (such as to change the database password), it only needs to be done in that one file.
     
  2. If a server misconfiguration causes the PHP interpreter to fail, a web page is served with its PHP code still in it. If the connection data is in the file, it gets printed on the web page, an obvious security hazard. By contrast, if the connection data is in a separate file and included with the PHP require() function, the PHP failure causes require() not to execute, and the connection data remains safe.
The config file can be named anything, but it should have a .php extension, not .txt, .inc, or anything else. That is because when a .php file is served (and PHP is functioning properly), the PHP code is stripped out (while being executed) before the page is sent. Thus, even if someone manages to request your config.php file with their browser, it will just be a blank page. However, if your file is called something like config.txt, config.inc, or config.db, it does not have that PHP protection. The full text will be sent to the browser.
Since config.php has the PHP protection against disclosure, additional security measures are optional under normal circumstances, but they include: store config.php outside public_html, or in a password-protected directory, or protect it with an .htaccess rule. On a Linux server that uses suPHP, you can set its file permissions to 0640 or even 0600 to protect it from being read by another user on your shared server.
An example config.php file:
<?php
$server = 'localhost';
$user = 'YourMySQLUser';
$password = '5m#Cnx(6hjNG';
$database = 'testpets';
?>
Example .htaccess protection for it:
<Files config.php>
order allow,deny
deny from all
</Files>

Use the htmlentities() function on any text that will be output to a web page

As the example code retrieves text from the database, it passes it through the htmlentities() function before outputting it to the web page. This is a best practice when handling any text that might contain HTML tags that are not, however, meant to be interpreted as HTML code when a browser receives them. Instead, they are meant to be displayed on the page, as text.
Here is an example of what htmlentities() does. An HTML script tag looks like this:
<script>
If your PHP code pulls that text out of your database and places it on the web page as-is, a browser that receives the web page will not display the text on the page. Instead, it thinks it marks the beginning of some JavaScript it is supposed to run.  
However, if your PHP code uses htmlentities("<script>") to put the text on the web page, the result is this: 
&lt;script&gt;
When a browser receives that in a web page, it knows that it's not a <script> tag. Instead, it knows that it's supposed to put the text "<script>" on the page.
That might not seem like a very big deal, especially if your database data never contains HTML tags, anyway, but there is a situation where it can be quite important: what if, in spite of your best efforts, you do become the victim of an SQL Injection attack, and somebody manages to inject malicious script code into your database tables, where there previously weren't any HTML tags to worry about? 
If your PHP code was putting the text on the page as-is, it will now output the malicious script, as-is, like this:
<script type="text/javascript" src="http://badsite.com/badscript.js"></script>
Suddenly, your page is infected with a malicious JavaScript, and visitors to your site will start getting warnings from their antivirus programs.  
However, if your PHP code was passing the output through htmlentities(), it will output the malicious script to your web page like this:
&lt;script type=&quot;text/javascript&quot; src=&quot;http://badsite.com/badscript.js&quot;&gt;&lt;/script&gt;
Visitors to your site will see that text on the page, but their browsers will not interpret it as JavaScript code as the hackers intended, and they won't get warnings from their antivirus programs. 
htmlentities() rendered the output harmless EVEN AFTER an attack put malware in the database!

More defenses against SQL Injection and database corruption

Limited user privileges

Whenever you connect to a database, you must do it as a "database user". The "user" is one of the pieces of information in your connection data. You don't have to, and should not, connect to all your databases as the same user. Instead, create a different user for each database. You should not connect to any of your databases as your cPanel user. You can create as many MySQL users as you want.
Whenever you need to do a database task in a PHP script, you should do it as a user who is only authorized to access that one database, and who has the least possible privileges required to accomplish the task. For example, if your script only searches a database and outputs results on the page, the user for doing that should have read-only privileges for the database.
The reason is that if an SQL Injection attack succeeds in corrupting a query, it can only do as much damage as the hijacked user is authorized to do. If it tries to insert new data into a table, but the user doesn't have INSERT privileges, it will fail.
How to manage MySQL users in cPanel.

Ban SQL Injection attacks in .htaccess

You can use Apache .htaccess to ban (reject, without processing) incoming requests where the HTTP query string contains SQL code or punctuation symbols that are often used in SQL code. This can help protect you from undiscovered SQL Injection vulnerabilities in applications you use but whose code you have no control over because you didn't write it.

Preventing SQL Injection in other languages and other database programs

As mentioned earlier, the principles of security discussed in this article are applicable when connecting to any database program from any scripting language, but the specifics of creating and using the connection vary widely. I've tried to mention throughout the article key words and concept phrases that would be useful in web searches to find similar articles about other languages and other database programs.
PHP has connection methods for many databases, including Microsoft SQL Server.
Some of the most severe and widespread injection attacks have been against ASP/ASP.NET, especially in combination with MSSQL and IIS. I don't have experience in ASP, ASP.NET, or MSSQL/T-SQL to draw on for creating example code, so instead I'm trying to assemble a list of links that appear to be most useful.
Microsoft offers a free downloadable program for scanning and finding SQL Injection vulnerabilities in Classic ASP VBScript source code, with instructions and links to Microsoft articles. They also describe best practices and example code for ASP.NET (plus another page about the same), and how to use the .NET Regex class for data validation in ASP.NET. 
This is the simplest and thus most understandable example of creating a parameterized query in ASP.NET that I've seen.
This demonstrates creating a parameterized query for a login page with VBScript and dynamic SQL in Classic ASP using ADODB.Connection. This, also Classic ASP, is more complex, but adds typecasting to force variables to their desired types. This does a similar thing with slightly different code.
Code snippets for parameterized queries in several different languages/environments, including Cold Fusion, Delphi, Java.

Notes

  1. My somewhat strange example code is based on the "pets" database in the MySQL Tutorial.
     
  2. You can discover whether your website is receiving SQL Injection attacks at my hack attempt identifier.

0 comments:

Post a Comment