domingo, 15 de enero de 2012

Eliminating SQL Injection Attacks - A Transparent Defense Mechanism

M. Muthuprasanna, Ke Wei, Suraj Kothari
Iowa State University, Ames, IA, USA
Email: fmuthu, weike, kotharig@iastate.edu

Abstract
The widespread adoption of web services as an instant means of information dissemination and various other transactions, has essentially made them a key component of today's Internet infrastructure. Web-based systems comprise
both of infrastructure components and of application specific code. Various organizations have started extensively deploying Intrusion Detection/Prevention Systems and Firewalls as a means of securing their vital installations.
However, very little emphasis is laid on securing the applications that run on these systems, apart from frequent updates and patching. SQL-Injection Attacks are a class of attacks that many of these systems are highly vulnerable to, and there is no known fool-proof defense against such
attacks. In this paper, we propose a technique, which combines static application code analysis with runtime validation to detect the occurrence of such attacks. The deployment of this technique eliminates the need to modify source code of application scripts, additionally allowing seamless
integration with currently-deployed systems. We provide various optimizations improving overall efficiency, and also preliminary evaluation of prototype developed
.

1. Introduction
The World Wide Web has experienced remarkable growth in recent years. Businesses, individuals and governments have found that web applications can offer efficient and reliable solutions to challenges of communicating and conducting commerce in the 21th century. Various corporate bodies whose business model completely focuses on the Web like Google, Yahoo, Amazon etc. have taken web interactions to newer heights. As many enterprise applications dealing with sensitive financial and medical data turn online, the security of such web applications has come under close scrutiny. Compromise of these applications represents a serious threat to organizations that have deployed them, and also to users that trust these systems to store confidential data. The potential downtime and damages that
could easily amount to millions of dollars have also prohibited many mission critical applications, which could greatly benefit users, from going online. Hence, it is crucial to protect these applications from targeted attacks.
However, the current state of application security leaves much to be desired. The 2002 CSI and FBI revealed that, on a yearly basis, over half of all databases experience at least one security breach and an average episode results in close to $4 million in losses [1]. A recent penetration testing study of more than 250 Web applications concluded that at least 92% of Web applications are vulnerable to some form of malicious intrusions [2]. Recent U.S. industry regulations such as Sarbanes-Oxley Act, try to enforce strict security compliance by application vendors [3] and there is an urgent need to find means of satisfying these requirements.
SQL Injection Attacks (SQLIAs) constitute an important class of attacks against web applications. SQLIAs can give attackers direct access to the database underlying an application and allow them to leak/alter confidential information [4] or to even execute any malicious code [5]. There are many examples of SQLIAs with serious consequences, and
the list of victims includes high-profile organizations, such as Travelocity, Tower Records, RIAA etc. [6]. The increasing number of web applications falling prey to these attacks is alarmingly high [7] [8] [9]. In fact, SQLIAs have been included in list of top 10 threats to web applications [10].
2. SQL Injection Attacks
In this section, we present a web application that is vulnerable to a SQLIA and explain how an attacker could exploit this vulnerability. We also discuss various other techniques that can be employed to gain illegitimate access to systems.
Consider a typical web application in which an user on a client machine can access services provided by a web server, having a database backend, like an online email account.
When the user enters a login and a password in the web form and presses the Submit button, an URL is generated (http://foo.com/home.jsp?login=guest&pass=test) and sent to the web server. The user input is interpreted by the servlet home.jsp, which then in turn builds a dynamic SQL query, submits the query to the database and uses the response from the database to generate HTML-pages that are sent back to the user. Suppose query in servlet page is of form;
SELECT * FROM user WHERE login='$login' AND pass='$pass'
If the login and password as provided by the user are used, the query to be submitted to the database takes the form;
SELECT * FROM user WHERE login='guest' AND pass='test'
A web site that uses this servlet would be vulnerable to SQLIAs. If the user were to enter [' OR 1=1 --] and [ ] instead of [guest] and [test], the query would take the form;
SELECT * FROM user WHERE login=' ' OR 1=1 --' AND pass=' '
The characters -- mark the beginning of a SQL comment, and anything beyond is ignored. The query as interpreted by the database now has a tautology and is always satisfied; hence returning information about all users. Thus
an attacker can bypass all authentication modules gaining unrestricted access to critical information on the server.
An SQL Injection Attack (SQLIA) is a subset of the unverified/unsanitized input vulnerability and occurs when an attacker attempts to change the logic, semantics or syntax of a legitimate SQL statement by inserting new SQL keywords or operators into the statement. This definition includes,
but is not limited, to attacks based on tautologies, injected additional statements, exploiting untyped parameters, stored procedures, overly descriptive error messages, alternate encodings, length limits, second-order injections and injection of "UNION SELECT", "ORDER BY" and
"HAVING" clauses. A detailed explanation of different forms of SQLIAs and ways in which they can be exploited are available in the public domain [11] [12].
The widely deployed defense today is to train the programmers and web-developers about the security implications of their code and to teach them corrective measures and good programming practices, as outlined in [13]. However, rewriting or revising the entire lot of existing legacy code is
not an easy process and is not a financially viable option for many organizations. Even this does not guarantee any foolproof defense and hence we need automated processes to detect the vulnerability and eliminate them. Various other techniques like use of stored procedures [14], prohibiting display of database server error messages and use of escape sequences (available in PHP as Magic Quotes) for sanitizing
user inputs are employed as a quick fix solution. Unfortunately,
even these security measures are inadequate against highly sophisticated attacks as outlined in [15]. Recently, better detection strategies like SQLIA signature detection have been proposed by IDS/IPS vendors [16], but their success is still limited to a small subset of the whole range of
attack mechanisms [17]. It is of even greater concern that products like Microsoft SQL Server etc. provide attackers direct access to command line shell, registry using methods like xp_cmdshell, xp_regread etc.

3. Related Work
Various SQLIA detection techniques have been proposed in literature but many of them suffer in terms of immediate usability and deployability. Many existing techniques, such as filtering, information-flow analysis, penetration testing, and defensive coding, can detect and prevent a subset of the vulnerabilities that lead to SQLIAs. Techniques that employ input validation are prone to a large number of false positives and yet there is no guarantee that there are no false negatives. Safe Query Objects [18] and SQLDOM [19] use encapsulation of database queries to provide a safe and
reliable way to access databases but they require developers to learn and use a new programming paradigm. SQLrand [20] provided a radical shift in the way this problem can be approached using query randomization [21]. However, it could be circumvented if the key used for randomization were to be exposed.
Another popular mechanism has been static analysis of the code for vulnerabilities [22]. The Java String Analysis library [23] provides us with a mechanism for generating models for Java strings and can be extended
to generate fairly accurate SQL-query models. JDBCChecker [24] [25] statically checks for the type correctness of dynamically generated SQL queries. Although these techniques are effective, they cannot capture more general forms of SQLIAs that generate syntactically and type correct
queries. The authors use automated reasoning in [26] to detect tautologies in the dynamically generated SQL queries, but other forms of SQLIAs still go undetected.
Recently, researchers have been exploring the use of static analysis in conjunction with runtime validation [27] to detect instances of SQLIAs. In [28], the authors have proposed the use of parse trees to detect malicious user input.
In [29] [30], the authors have used an automaton construction technique to defend against SQLIAs. However, these techniques still require modification of the application source code which may not be preferable to most developers and their organizations in general. Also there is an additional runtime analysis overhead in terms of execution time which cannot be avoided due to the sequential nature of the analysis techniques. Also the element of access control is not captured in such models which can, in the theoretically worst case, still allow SQLIAs to occur for certain poor implementations of the application.

Our Contribution: We propose SQLIA detection combining static & runtime analysis with following features:
1. No code modification required, simple web server patch
required (SQLIAs captured by altered data flow path).
2. Optimized runtime analysis using SQL-graphs, and SQL query validation in parallel, for faster webpage accesses.
3. SQLIAs using access control violations in the script and different character encodings also captured.

4. Proposed SQLIA Defense Mechanism
We propose a SQLIA detection/prevention technique here that addresses all types of SQLIAs, as discussed previously.
The technique works by combining static analysis with runtime validation. The basis of such a technique is that the application source code contains key information that would help us in inferring models of the expected, legitimate SQL queries generated by the application at runtime, and that a
SQLIA would necessarily violate these models. In the static analysis stage, we use program analysis techniques to represent the SQL-queries as Finite State Automata (FSA) and view them as a SQL-graph. In runtime validation stage, we check dynamically generated SQL queries with static data structures for compliance and flag them safe/unsafe.

4.1. Static Analysis
To perform Static Analysis of the application source code, we make use of the Java String Analysis library [23]. It is essentially an inter-procedural data-flow analysis that abstracts the control flow of the program and represents the semantics of string manipulation operations on string variables as a flow graph. To use this analysis, we identify certain
locations of the target program as hotspots. A hotspot is defined as a point in the application code that issues SQL queries to the underlying database. For each hotspot, we build a model that represents all the possible SQL queries that may be generated at that hotspot. In our analysis, we identify all the "execute" methods of the Statement class in
Java as hotspots. Consider a sample web application having an authentication or login script that implements 3 functions
namely, checking whether a user has been blacklisted, checking whether the user is using a weak/insecure password and then authenticating the user. Code 1 represent a small code snippet from such a sample application. We
then identify lines 10, 13 and 17 as hotspots. The Strings login and pass are user inputs and the 3 functions BlackList, WeakPass and Authentic are used to construct SQL queries.
Figure 1. String Analysis

The string analysis uses the SOOT Framework [31] [32] to parse a class file and produce inter-procedural control-flow graphs. Starting with the hotspots in a Java program, it keeps track of the String, StringBuffer and multidimensional String Arrays that get manipulated and thus construct a flow graph. The nodes in flow graph represent expressions, edges represent directed def-use relationships for possible data flow while string manipulation methods are represented as operators.
1. public class Authentic extends HttpServlet {...
2. public void StatusCheck() {...
3. Connection conn = DriverManager.getConnection(connString);
4. Statement stmt = conn.createStatement();
5. ResultSet output 1 = BlackList(stmt, login, pass);
6. ResultSet output 2 = WeakPass(stmt, login, pass);
7. ResultSet output 3 = Authentic(stmt, login, pass); ... }
8. public ResultSet BlackList (Statement stmt, String login, String pass) {
9. String sql = "SELECT * FROM user WHERE login = '" + login + "'";
10. return stmt.executeQuery(sql); }
11. public ResultSet WeakPass (Statement stmt, String login, String pass) {
12. String sql = "SELECT * FROM weak pass WHERE pass = '" + pass + "'";
13. return stmt.executeQuery(sql); }
14. public ResultSet Authentic(Statement stmt, String login, String pass) {
15. String sql = "SELECT * FROM user WHERE ";
16. sql += "login = '" + login + "' AND pass = '" + pass + "'";
17. return stmt.executeQuery(sql); }
18. }
Code 1: 3 Hotspots, 3 SQL Queries, and 2 User Inputs

In Fig. 1, nodes Trim, Replace and Concat represent string operations, while Fragment_1, Fragment_2 and SQL Statement represent variables or expressions.
4.1.1 SQL Finite State Machine
The final result of the string analysis is a Non-Deterministic Finite State Automaton (NDFA) that expresses all the possible values a particular string can assume using single character transitions in the automaton. We now create a SQL Finite State Machine (SQL-FSM) by performing a depth first
traversal of the NDFA for that hotspot and grouping characters as either SQL keywords, operators, or literal values, and creating transitions that are annotated by the literal values (tokens). Each SQL keyword is represented as is, while the user input string variables are represented as VAR, indicating that they can change at runtime. Fig. 2 shows the NDFA
and SQL-FSM for the first hotspot in the sample code. Note that in the general case, both the NDFA and SQL-FSM can have multiple non-looping branches, indicating possible execution of multiple SQL queries at a single hotspot.
4.1.2 SQL-graph Representation
We can thus construct a SQL-FSM for each of the hotspots in the program. These data structures now capture the semantics of the different SQL queries that are to be sent to the database at runtime. Any user input would be compared against this template and any change in the SQL-FSM structure would indicate a possible SQLIA.We note that running each and every query under the scanner at runtime could be an expensive process. Given that the user input would realistically consist of a few strings only but the number of SQL queries that get executed in a program could be very large, we now try to optimize number of queries that need to be
put under the scanner during runtime to ensure the validity of dynamically generated queries, using a SQL-graph.

Figure 2. NDFA and SQL-FSM for Hotspot

The SQL-graph in Fig. 3 represents 4 different SQL queries in the program as nodes within a logical boundary, and 3 different user inputs as being outside the logical boundary.
If a particular user input (I) is used in a SQL query (Q), the relationship (R) between the two nodes is indicated by an undirected link between the 2 nodes. We now define dependencies (D) in the SQL-graph as links that point from one SQL query to another SQL query such that the user inputs used by the former is a proper superset of the user inputs
used by the latter. For SQL queries that use the same set of user inputs, one of them is chosen as a representative query and is made to point to the others. We see the dependencies represented as directed arrows in the SQL-graph. Drawing equivalence to Code 1, Q1, Q2 y Q3 represent the 3 different SQL queries (also the 3 different hotspots in this case), while I1 and I2 represent the user inputs login and pass. Q4 and I3 could possibly correspond to some other hotspot in the program not represented in the code snippet.

Figure 3. SQL-graph Representation


The concept of SQL-graph is used to reduce runtime scanning overhead by restricting the number of queries that need to be scanned along any execution path that is taken in the program. SQL queries that do not use user inputs are not included in the SQL-graph. Only the SQL queries that are manipulations included) are included in the SQL-graph representation.
The choice of such a representation and the resulting benefits in terms of runtime overhead would be explained as part of Runtime Validation.
exposed to the user inputs in some form or the other (string manipulations included) are included in the SQL-graph representation.
The choice of such a representation and the resulting benefits in terms of runtime overhead would be explained as part of Runtime Validation.

4.2. Runtime Validation
During runtime, the SQL queries (with the user inputs embedded) are compared against the corresponding SQLFSMs to check for their validity. If the user inputs cause the dynamically generated SQL queries to not conform to the semantics of the intended SQL queries as in the SQLFSMs, then they are flagged as SQLIAs, else they are passed through. Fig. 4 shows the case where an SQLIA is not caused and the query is passed through. Also, it shows the second example where an SQLIA has been caused and hence gets rejected as a potentially malicious query. The literals along both the static SQL-FSM and the runtime SQL-FSM, as one traverses from the Start node to the End node, should be identical. The other check that can be enforced is that the length of the SQL-FSM chain for a particular instance is exactly the same for the static and runtime SQL-FSMs. Thus SQLIAs employing tautologies and injecting additional statements can be captured by this technique.
The case where alternate encodings like URL Encoding, UTF-8 etc. are used by the attackers can also be addressed by requiring the runtime validation to occur only after all the user input has been converted to a single encoding format as interpreted by SQL Engine in database server.

Figure 4. Possible SQL-FSM Violation

The SQL-graphs for the different programs or applications resident on the application server are computed of ine, using static analysis as described above. These SQL-graphs need to be constantly updated to re ect any changes in the code made by the programmer at any point of time. Hence,
each of these SQL-graphs is associated with a version number that corresponds to the last modi cation timestamp on the original application source code. Every time the SQLgraph is invoked, the version number is checked to see whether it is current, else it is recomputed. Every time a
client request comes in, the user inputs are fed into the SQL graph and the runtime SQL-FSMs of the different SQL queries in the SQL-graph are validated. A Verification Table (VT) is then computed for the different SQL queries indicating whether it can be allowed to pass through or whether it should be dropped before being sent to the database server.
Now verifying the SQL-FSM for all queries in SQL-graph can be computationally intensive and expensive in terms of runtime processing, resulting in degraded Quality of Service(QoS) to the end-users. The concept of directed dependency is thus used to reduce total runtime overhead.
If an user input does not cause any SQLIA in one query, it means that it conforms well to the SQL query semantics as de ned by the SQL language. Then it is implicitly known that the same input in any other query would also not cause a SQLIA. Hence, we see that if this knowledge is not exploited, we would be redundantly verifying the same user input over and over again in multiple SQL queries in the SQL-graph. The directed dependency in the SQL-graph tells us which SQL queries are supersets of which other SQL queries in the SQL-graph. It would suffice to check only those SQL queries that are supersets of other queries and thus implicitly check the other queries encompassed by it. Thus we filter out all those SQL queries that have no directed dependency edges coming into them and verify only the validity of the SQL-FSMs corresponding to those SQL queries. We thus see that we need to validate only a small fraction of the entire SQL-graph and still achieve SQLIA prevention guarantee. In Fig. 3, it would suffice to check SQL-FSMs corresponding to SQL queries Q1 and Q4. In code 1, normally the SQL-FSMs for Q1, Q2 and Q3 would have been invoked, while using a SQL-graph we need to invoke the SQL-FSM for Q1 only.
The worst case number of queries that need to be examined at runtime is at most equal to unique number of user inputs (very few in number), assuming an extreme case of a single user input per SQL query. Without using SQL-graphs, we would have examined all SQL queries generated by that application code, which could realistically be very large in number. Thus using SQL-graphs, we can achieve optimal SQLIA prevention using runtime validation of the abstraction of application code generated by static analysis. See Appendix for correctness proof of SQL-graph reduction.

4.3. Architecture
One of the distinguishing features of the proposed technique as opposed to other known techniques in literature, is its transparent mode of operation requiring no changes to the underlying application source code. The proposed technique can be implemented as a minor modification to the existing web server architecture by use of a middleware, thereby eliminating the need to tamper with the application code. Fig. 5 is a modular representation of the proposed architecture. The client request (HTML or XML request) is directly processed by the Web Server (WS) and in the normal case, the web server forwards the request to the Script Engine (SE) in the Application Server (AS), which then issues SQL queries to the Database Server (DS) and replies back to the client. In the modified architecture, the web server duplicates the request and sends it to the Application Middleware (AM) in addition to sending the request to the script engine. The SQL queries generated dynamically by the script engine pass through the Database Middleware (DM) before it is actually sent to the database server. The AM also sends the Verification Table (VT) to the DM. Thus the new architecture consists of 2 new modules, namely the Application Middleware and the Database Middleware.
These changes can be easily incorporated into current web server architecture by installing few patches (prototype implementation
discussed later) and is a one-time deployment requiring only a single web server daemon restart.

Figure 5. Proposed Architecture

The SQL-graphs and the SQL-FSMs for all the different application programs available for public use, are stored at the AM. Every time the AM receives a request comprising of the user inputs and the program to be invoked, it fetches the cached copy of that program's SQL-graph and corresponding
SQL-FSMs. It validates the version number of the SQL-graph by comparing it with the last modification timestamp on the corresponding program source code. If it is a stale copy, it issues a recompute command and waits for
it to complete. If it is still a fresh copy, it then runs a runtime validation on the SQL-graph by comparing the static and runtime SQL-FSMs. It then generates a Verification Table (VT) indicating which SQL queries can be passed as is and which need to be dropped before it reaches the database server. This VT is sent to the DM through a control channel.
The script engine issues SQL queries which then get filtered at the DM based on the VT before they are sent to the database server. We thus see that using the current architecture, we can prevent all incidents of SQLIAs in a transparent manner. The code independent nature of this scheme makes its deployment commercially viable, and without fear that it would break any existing functionality offered.
As there could be many SQL queries in the program currently being executed, we would need to associate a hotspot with its corresponding SQL-FSM to avoid raising false alarms. We thus need to associate a unique identifier with every hotspot and its corresponding SQL-FSM in the SQLgraph
for every program. The choice of the unique identifier can technically be any parameter, preferably one that is most suitable to the underlying application. For example, we could use the SOOT Framework [31] and use the line number of the hotspot as the unique identifier. We do not impose any restrictions on the choice of this parameter and leave this open as a design issue.
Although the runtime overhead incurred due to the extra SQLIA validation is minimal as verified by the prototype evaluation, we would still ideally like to reduce this overhead to provide the users an enhanced web browsing experience, with added security without any noticeable processing
latency or delay. We try to achieve this by introducing parallelism in the proposed architecture to reduce the overall execution time. If the Application Middleware and the Script Engine were to run sequentially, the total runtime of the application would be the sum total of the original execution time and the extra overhead. But by requiring the web server to feed the request to both of them simultaneously, we let both of them operate in parallel in multi-core systems, thus effectively cutting down the execution time to the maximum of the original execution time and the extra overhead. Experimental verification of the performance improvement due to this parallelism introduced in the architecture is provided in the prototype evaluation section.

5. Enhanced Features
We identify additional pitfalls in defending against SQLIAs, and present safeguards against such advanced attacks.
5.1. Access Control Enforcement
Consider a hypothetical web application maintained by the Patent and Copyright division in some corporation. The company policies require that if an employee is connecting from within the corporate network and is authenticated, then he is given unrestricted access to the entire patent database, using his true passphrase. However, if the user is not connecting
from within the corporate network, he is given a default passphrase to access limited portions of the database.
If both the default passphrase and the true passphrase (giving unrestricted access) are both stored in a single table in the database and the web application is badly implemented as indicated in Code 2, it gives ample scope for a SQLIA employing access control violation to succeed. Let us suppose that the default passphrase is returned on not being authenticated and the unrestricted access passphrase is returned on successful authentication by the application. An employee now connecting from outside the corporate network and providing a login as [ AND pass = '] would get access to the true passphrase as the user input would cause a SQL Injection Attack and authenticate him successfully. Thus the employee can bypass the security mechanisms put in place. It would be of even greater concern if the employee's password has been compromised by some hacker and then used in turn to steal all confidential data. Although the chances of such an implementation are quite rare, there is no guarantee that the programmers would have ever forseen such a situation and designed the application in an intelligent manner.
This SQLIA would not be captured by most known detection strategies including static analysis techniques employing runtime monitoring. Line 8 in the code represents a hotspot and both the SQL queries correspond to
the same hotspot. As the SQL-FSMs are constructed for the hotspots, both these queries are treated as valid at that hotspot. However, the actual execution path in application code that caused that hotspot to be reached could be different from the one that it is validated against during runtime. Thus access control in terms of which execution path triggers the hotspot is not captured by any known technique. Although it represents a very restrictive class of SQLIAs, it could still be a potential pitfall against targeted hackers. Our prototype has been designed to additionally
capture access control semantics of the code during static analysis and runtime validation to prevent such attacks.
1. public class CopyrightedInfo extends HttpServlet { ...
2. public ResultSet AccessKey (String login, String pass) { ...
3. Connection conn = DriverManager.getConnection(connString);
4. Statement stmt = conn.createStatement();
5. String sql = "SELECT * FROM passphrase";
6. if (InCorporateNetwork() == "no") sql += " WHERE login = '" + login + "'";
7. else sql += " WHERE login = '" + login + "' AND pass = '" + pass + "'";
8. return stmt.executeQuery(sql); }
9. }
Code 2: Access Control Violation leading to SQLIA


The main problem that leads to access control violation is that the SQL-FSM associated with a particular hotspot is branched, and that the program execution might continue along a certain branch while the SQLIA validation/
detection is done along a different branch. Thus we need to ensure that the branches in a SQL-FSM that are used for both verification and execution are always the same. It is important to note in this context that all SQLIA-prevention techniques proposed so far in literature employing static
analysis with runtime validation suffer from this vulnerability.
We now additionally associate a Trace Table (TT) with both the Script Engine and the Application Middleware, such that the different SQL-FSMs and the various branches that were processed in both the blocks are stored in order. The Database Middleware checks whether the Trace Tables generated by both the SE and the AM are identical, and in case of a failure ags a SQLIA, and drops the associated database queries while rejecting the connection.
The Trace Tables can be enumerated by using a connection identifier, SQL-FSM branch identifier, or any other identifier, the choice being left open as a design parameter.

5.2. Complete Character Set Inclusion
Most of the SQLIA prevention techniques in literature assume that the user does not enter a quote or tick (') as legitimate user input. Each such attempt would be treated as an SQLIA attempt by the web server, disliked by both developers and users alike. For any technique to gain widespread
acceptance, this issue has to be addressed appropriately. We propose a simple randomization approach, as used in [20], to tackle this problem. The quotation marks as interpreted by the SQL-graphs during runtime would keep varying randomly with each access and the substitute representation
could be any long bitstream or bytestream. The randomness would ensure that there is only a very low probability that a legitimate user input would be falsely agged as a SQLIA.
We additionally need to tackle the issue of compromise of either the random generator seed or the next occurring random number. By ensuring that the random bitstream used for any SQLIA validation does not occur in any user input, we can ensure that any compromise or intelligent guesswork
would not affect the system, and is caught in runtime validation stage. The implementation details can be easily derived and are not presented here due to space constraints.

5.3. Key Differentiators
Various SQLIA detection techniques have been proposed in literature using offline static analysis and runtime attack detection.
Although our technique uses the same basic framework, it has key differentiating features that provide additional performance and security guarantees. The work by Halfond & Orso [29] [30] is the closest and also currently the best known technique for SQLIA detection.
The main differentiator is that the proposed technique can be deployed in a transparent manner, requiring no change to the underlying application source code, whereas the different techniques proposed in literature require significant changes to be made to the source code. Thus the proposed technique is easily deployable to protect legacy codes at
minimal costs requiring little human intervention. Additionally, the SQLIA protection module is an architectural modification that can co-exist and run in parallel to the scripting engine, and hence avoids the sequential execution overhead as in most other techniques in literature. The parallel implementation ensures least QoS loss or service degradation, while providing maximum security benefits.

Also, the use of the SQL-graph data structure reduces the number of SQL queries that need to be analyzed during runtime, and thus minimizes the overall execution time to a great extent. The number of SQL queries validated in now linear in number of user inputs and not in the number
of queries issued by application code to the database. Thus the proposed technique is the fastest SQLIA-resistant codeindependent defense mechanism known.
Additionally, it enables the application to use the entire character set rather than imposing restrictions on the use of certain ASCII symbols. Various advanced attack scenarios such as access control violations have not been studied to date in literature, and the proposed technique defends
against attacks that exploit this vulnerability. All these additional performance and security benefits provided by the proposed SQLIA prevention technique lends well for easy deployment by organizations to protect systems and software against unauthorized access.

6. Prototype Evaluation
In order to evaluate our approach, we developed a prototype of our proposed technique and deployed it on an existing web server. We ran a few trial runs on the web server before the new prototype was deployed and repeated the same tests on the new setup. The performance metrics we use are the number of false positives generated, number of false negatives
generated and the extra processing overhead during runtime as tested on a few representative web applications having varying number of SQL queries and tokens.
6.1. SQLIA Detection Accuracy
We subjected both the protected and unprotected web servers to different types of SQLIAs; namely use of tautologies, inserting additional SQL statements, access control violation, second-order SQL injection [11] and various other SQLIAs known. The proposed technique detected all types of SQLIAs in all cases. The proposed technique is thus a secure and robust solution to defend against SQLIAs.

Table 1. SQLIA Detection Accuracy

6.2. Execution Time - Static Analysis
The proposed technique introduces two types of overhead. The first overhead is due to the static analysis of the application source code to construct SQL-FSMs and SQLgraphs
and the second due to runtime validation. As the static analysis is an offline process, the users do not experience the delay induced due to this one-time operation (until next code modification). Fig. 6 shows the time required for the static analysis to complete when executed on different
programs having varying number of hotspots and tokens in SQL queries. The graph shows the time taken by this offline process, for increasing number of SQL tokens. All experiments were run on a standard dual-core system.
Figure 6. Overhead due to Static Analysis


6.3. Execution Time - Runtime Validation
The runtime validation incurs some overhead in terms of execution time at both the Application Middleware and the Database Middleware. We again consider two different architectures here, namely sequential and parallel depending on whether the Application Middleware and the Script Engine in the Application Server, run concurrently or one after another in a pipelined fashion. These experiments were run on 3 different programs having identical SQL-graphs and SQL-FSMs i.e. the difference between the 3 programs is the extra computation time at the Script Engine alone.
We denote by S, the execution time for the program at the Script Engine. We denote by M, the execution time for the program at the Application Middleware. When tested under the different experimental conditions, namely varying the number of hotspots, number of SQL queries and tokens
in the program, we see that for the programs S1 and S2, S M respectively in all cases, while for S3, the relationship between S and M is variable. Fig. 7 shows the results of running these benchmarking tests on the different systems under varying program conditions.
For the program S1, we see that the Application Middleware always takes longer to execute than the Script Engine.
Hence, the user does experience a negligible delay of the order of a couple of milliseconds, on a parallel architecture.
The delay is much more significant when run on a sequential architecture. For the program S2, the Application Middleware always finishes quicker than the Script Engine and hence the user does not experience any delay at the AppliApplication Server. However, the user would experience a delay
at the Database Middleware, which is of the order of a few hundred microseconds. This delay has been amplified a bit in the graph to distinguish between the different curves in the graph. For the program S3, the Script Engine executes faster when there are fewer tokens in the SQL queries in the program, while the Application Middleware is faster as this count increases. The overall execution time is the higher of the two runtime durations at all times. Note that the parallel architecture performs better than the sequential architecture in all cases. Also, the experiments indicate that when the Database Middleware acts as a blocking device waiting for both the inputs to reach it before execution, the actual execution overhead at the Database Middleware is negligible, amounting to a few microseconds. We thus see that although traditionally people have looked at sequential architectures as filtering modules for SQLIAs and other attacks also, the parallel architecture that we propose actually outperforms it. Also the proposed architecture implements the technique in a transparent manner requiring no change to the source code and the programmers can remain oblivious to the existence of such an attack prevention middleware.
Various other experiments based on the specific use of stored procedures have been studied and presented in our previous work [33], but not here due to space constraints.

7. Conclusions
Most web applications employ a middleware technology (scripting engine) designed to request information from a relational database in SQL parlance. SQL injection is a common techniques hackers employ to attack these webbased applications. These attacks reshape the SQL queries, thus altering the behavior of the program for the benefit of the hacker. In this paper, we present a fully automated technique for detecting, preventing and reporting SQLIA incidents.
The technique abstracts the intended SQL query behavior in an application in the form of a SQL-graph, as a one-time offline procedure using static analysis of the application code. This graph is then validated against all the different user inputs at runtime to capture all malicious SQL queries, before they are sent to the database server for execution.
This graph model helps in capturing all the different types and modes of execution of SQLIAs, in a transparent manner requiring no modification to the underlying application source. We have also provided preliminary evaluation results of prototype developed against various performance
metrics affecting web server performance. Thus addressing these critical security issues in web applications would help transition easily towards next generation web services.

References
[1] Computer Security Institute, Computer crime and security survey,
http://www.gocsi.com/press/20020407.jhtml, 2002
[2] WebCohort Inc., Only 10% Web applications secured against common hacking techniques,
http://www.imperva.com/company/news/2004-feb-02.html, 2004
[3] K. Beaver, Achieving Sarbanes-Oxley compliance for Web applications,
http://www.spidynamics.com/support/whitepapers/, 2003
[4] Cesar Cerrudo, Manipulating Microsoft SQL Server Using SQL Injection, http://www.appsecinc.com/presentations/ Manipulating SQL Server Using SQL Injection.pdf
[5] Steve Friedl, SQL Injection Attacks by Example,
http://www.unixwiz.net/techtips/sql-injection.html
[6] Tower Records site exposes Data, http://www.netsecurity.org/news.php?id=1593, 2002
[7] SANS ISC, @RISK: The Consensus Security Vulnerability Report,
http://www.sans.org/newsletters/risk/display.php?v=5&i=22
[8] CERT Note, http://www.kb.cert.org/vuls/id/982109, 2005
[9] Bugtraq Archive, http://seclists.org/lists/bugtraq/2005/, 2005
[10] Open Web Application Security Project (OWASPD), Top Ten most critical web application vulnerabilities, http://www.owasp.org/documentation/topten.html, 2005
[11] C. Anley, Advanced SQL Injection in SQL Server Applications,
http://www.nextgenss.com/papers/advanced_sql_injection.pdf
[12] C. Anley,(more) Advanced SQL Injection, http://www.nextgenss.com/papers/more_advanced_sql_injection.pdf
[13] M. Howard, D. LeBlanc, Writing Secure Code, Microsoft Press, Redmond, Washington, 2nd Edition, 2003
[14] Statements and Prepared Statements, http://g.bookpool.com/gp/0507apress/159059407X-2367.pdf, White Paper
[15] Kevin Spett, Blind SQL Injection, http://www.spidynamics.com/whitepapers/Blind_SQLInjection.pdf, White Paper
[16] K. Mookhey, N. Burghate, Detection of SQL Injection & Cross-site
Scripting Attacks, http://www.securityfocus.com/infocus/1768
[17] O. Maor, A. Shulman, SQL Injection Signature Evasion, http://www.imperva.com/application_defense_center/white_papers/sql_injection_signatures_evasion.html, White Paper
[18] W. R. Cook, S. Rai, Safe Query Objects: Statically Typed Objects as Remotely Executable Queries, ICSE 2005
[19] R. McClure, I. Kruger, SQL DOM: Compile Time Checking of Dynamic SQL Statements, ICSE 2005
[20] S. W. Boyd, A. D. Keromytis, SQLrand: Preventing SQL Injection Attacks, pp. 292-304, ACNS 2004
[21] G. S. Kc, A. D. Keromytis, V. Prevelakis, Countering Code- Injection Attacks with Instruction-Set Randomization, CCS 2003
[22] V. B. Livshits, M. S. Lam, Finding Security Vulnerabilities in Java Applications with Static Analysis, USENIX Security Symp., 2005
[23] A. S. Christensen, A. Moller, M. I. Schwartzbach, Precise Analysis of String Expressions, pp. 1-18, SAS 2003
[24] C. Gould, Z. Su, P. Devanbu, JDBC Checker: A Static Analysis Tool for SQL/JDBC Applications, ICSE 2004
[25] C. Gould, Z. Su, P. Devanbu, Static Checking of Dynamically Generated Queries in Database Applications, ICSE 2004
[26] G. Wassermann, Z. Su, An Analysis Framework for Security in Web Applications, pp. 70-78, SAVCBS 2004
[27] Y. Huang et. al., Securing Web Application Code by Static Analysis & Runtime Protection, WWW2004
[28] G. T. Buehrer, B. W. Weide, P. A. G. Sivilotti, Using Parse Tree Validation to Prevent SQL Injection Attacks, SEM 2005
[29] W. G. Halfond, A. Orso, Combining Static Analysis & Runtime Monitoring to Counter SQL-Injection Attacks, WODA 2005
[30] W. G. Halfond, A. Orso, AMNESIA: Analysis and Monitoring for NEutralizing SQL Injection Attacks, ASE 2005
[31] R. Vallee-Rai, E. Gagnon, L. Hendren, P. Lam, P. Pominville, V. Sundaresan, Optimizing Java Bytecode Using the Soot Framework: Is It Feasible?, LNCS, Volume 1781, pp. 18, Jan 2000
[32] A. R. Group, Java Architecture for Bytecode Analysis (JABA), http://www.cc.gatech.edu/aristotle/Tools/jaba.html
[33] Ke Wei, M. Muthuprasanna, S. Kothari, Eliminating SQL Injection Attacks in Stored Procedures, pp. 191-198, IEEE ASWEC, 2006

8. APPENDIX
CLAIM: Validating subset of SQL queries derived using superset dependency
from SQL-graph, suffices to ensure that no SQLIA goes undetected.
PROOF: As each user input is treated as a variable in automata construction
for a particular hotspot in the application source, a single placeholder
is allocated for it. The user input (including a NULL), is then associated
with that particular placeholder. Hence, any valid user input results in preserving the length of the SQL-FSM, while an SQLIA results in increasing
the length of the SQL-FSM. Thus a reduction in length of SQL-FSM is
not feasible under any circumstances. Now, any user input that leads to
SQLIA needs to inject a tick (') to limit the variable boundary and then
inject the malicious code. Thus an user input that increases the length of a SQL query by overflowing the variable boundary, always does so in every
query in which it occurs. Similarly, an user input that is safe in a certain SQL query is also safe in every other SQL query in which it occurs. However, these other SQL queries maybe be potentially unsafe due to other user inputs. Thus, individually verifying every user input in every query trivially guarantees correctness or assured SQLIA detection. However, as argued before, it suf ces to check a particular user input in any one query for correctness. Thus to reduce the number of SQL queries that need to be checked at runtime, we choose those queries for validation where many user inputs occur to avoid multiple verifications for each of those user inputs.
The optimal choice of query subset is provided by dependency relations
in SQL-graph, and thus validating these SQL queries alone suffices to guarantee correctness, and avoid false positives and false negatives.

0 comentarios:

Publicar un comentario