During the 1960s a British computer scientist named Edgar Frank Codd worked on theories of data arrangement for IBM. In 1970 Dr. Codd published a paper on the relational data model titled "A Relational Model of Data for Large Shared Data Banks" (Codd, 1970, p. 6). According to documentation, Dr. Codd was unhappy with how slowly IBM was moving to apply his relational data model, which led him to reach out to IBM customers himself as competitors were catching on to the idea and implementing his theories.
Under pressure from its customers, IBM included the idea in one of its upcoming projects, System R. Although IBM included the idea in the System R project, it denied the development team and Dr. Codd access to each other. System R was designed as an experiment to demonstrate the relational data model and to show that it could be beneficial in a system with the complete function and high performance needed for everyday use in a production environment. (Berkeley, 1970's) Since Dr. Codd published his original paper, the relational data model has become widely recognized, but early on there were questions about whether an automatic system could function as effectively as algorithms written by advanced programmers. (Berkeley, 1970's) In the end, System R performed well and was able to compile statements in a data sublanguage known as SQL into machine-level code. (Berkeley, 1970's)
The development team was not familiar with Dr. Codd's thoughts and viewpoints and ended up creating a sublanguage, believed to be SQUARE, "a non-relational language." (Wikipedia, p. Work) SQUARE stands for "Specifying Queries As Relational Expressions" (Collins, 2007) and was developed by Donald D. Chamberlin and Raymond F. Boyce, who also worked for IBM. The language used set theory and predicate mathematics to select data from a database. (Collins, 2007) Around 1974 Chamberlin and Boyce published "SEQUEL", which stands for "Structured English Query Language" and in essence outlined improvements to SQUARE. (Chamberlin, 1974)
SEQUEL is equal in expressive power to the SQUARE language but is aimed at users who prefer an English keyword format to mathematical notation. In "A History and Evaluation of System R" (Berkeley, 1970's), a document by a team of researchers at the IBM Research Laboratory in San Jose, California, you can clearly see how today's SQL language was built up from these early papers. This document precisely outlines E.F. Codd's theories "that relational database systems having two important properties, the first being that information is represented by data values not by connections and two that the system supports high-level language where users place requests for data without utilizing algorithms for processing the requests." (Berkeley, 1970's) SEQUEL was originally designed to pull data from IBM's relational database management system. It was later renamed SQL because the name "SEQUEL" was a trademark of a UK aircraft company, Hawker Siddeley. (Wikipedia)
SQL is a database language created to control data within a relational database management system (RDBMS), and was initially built on the concept of relational algebra (R.F. Boyce), with the first RDBMS being developed at MIT in the early 1970s. In the late 1970s Oracle Corporation, then known as Relational Software, Inc., was one of the first competitors to IBM to see the potential in Codd's theories, and it developed its own version of SQL. In 1979 it released the first commercially available implementation of SQL, Oracle V2, which ran on VAX systems where the FORTRAN language was being used. After Oracle released its commercial version of SQL, IBM decided to jump on the bandwagon and released its own commercial versions between 1979 and 1983. The SQL language covers data queries, updates, schema definition, and data access control. SQL is divided into several elements, including clauses, expressions, predicates, queries, statements, and whitespace.
Recall that SQL was originally developed to query data held in IBM's relational databases; SQL is therefore described as a "set-based declarative query language and not an imperative language like C or BASIC." (culturalview.com/books/sql.pdf) What does set-based mean? Set-based programming means you state the relations, join the tables, add some grouping and criteria, and then leave the database engine to worry about the specifics of "how to do it"; put simply, you tell SQL "what you want". A declarative query language describes what it wants to accomplish rather than how to achieve it, so you express relationships between statements rather than specifying sequences of those statements. For example, JavaFX Script is a declarative language whereas Java is an imperative language. The C and BASIC languages are procedural (imperative) languages, meaning you specify the steps the program takes to get to where it wants to go.
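The contrast between "what you want" and "how to do it" can be sketched with a small example. This is only an illustration, assuming Python's built-in sqlite3 module and a made-up orders table; the table and column names are hypothetical and do not come from any of the cited sources.

```python
import sqlite3

# Hypothetical sample data: an in-memory table of orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("bob", 5), ("ann", 7), ("bob", 1)],
)

# Declarative, set-based: state the grouping and the total you want and
# let the database engine decide how to compute it.
declarative = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
)

# Imperative: spell out every step of the same computation by hand.
imperative = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    imperative[customer] = imperative.get(customer, 0) + amount
```

Both approaches arrive at the same totals, but only the imperative version commits to a particular sequence of steps.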
There are several criticisms of the SQL language. The first is that implementations are inconsistent and, more often than not, incompatible between vendors, for example in date and time syntax and string concatenation. If the WHERE clause of a query is mistyped or omitted, a runaway result set can occur because the join then returns every possible combination of rows (a Cartesian product). (culturalview.com/books/sql.pdf) Some also find the syntax of SQL difficult, and it is believed that parts of it, such as the heavy use of keywords, were taken from the COBOL language. Finally, a user can delete or update more rows in a table than intended because the "WHERE" clause was constructed incorrectly.
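Both WHERE clause problems are easy to reproduce. The following is a minimal sketch, assuming Python's sqlite3 module and two hypothetical tables (customers and orders) invented for the illustration; the ">=" in the second part stands in for a typing mistake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cho');
    INSERT INTO orders VALUES (1, 1, 'pen'), (2, 1, 'ink'), (3, 2, 'pad'), (4, 3, 'cap');
""")

# With the join condition, each order is paired with its one customer.
joined = conn.execute(
    "SELECT COUNT(*) FROM customers c, orders o WHERE o.customer_id = c.id"
).fetchone()[0]

# Without it, every customer is paired with every order (3 x 4 rows).
runaway = conn.execute(
    "SELECT COUNT(*) FROM customers c, orders o"
).fetchone()[0]

# A mistyped condition (">=" instead of "=") deletes every order.
mistyped_deleted = conn.execute(
    "DELETE FROM orders WHERE customer_id >= 1"
).rowcount
conn.rollback()  # undo the accidental delete

# The intended condition removes only the orders of customer 1.
intended_deleted = conn.execute(
    "DELETE FROM orders WHERE customer_id = 1"
).rowcount
```

On real tables with thousands of rows, the unconstrained join grows multiplicatively, which is why the result set is described as runaway.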
"SQL Injection flaws are created when a developer creates software that uses dynamic database queries that includes input supplied by the user. Avoiding SQL injections is easy. The developers of the software need to either stop using dynamic queries and /or prevent the user from inputting code that contains malicious SQL that affects the logic of the executed query." (1) There are 3 primary techniques and 2 additional ways to defend against SQL injections. The 3 primary defenses are Use prepared statements, Use stored Procedures, and Escape all User Supplied Input. The additional Defenses that work best when combined with a primary defense. The two additional Defenses are Least Privilege and White List Input Validation.
The first of the primary defenses, and probably the most important of them all, is the use of prepared statements, also called parameterized queries. "This technique is how all developers should first be taught how to write database queries." (1) Parameterized queries force the developer to define all the SQL code first and then pass in each parameter to the query later. (1) This coding style allows the database to distinguish between code and data regardless of what user input is supplied. (1)
Prepared statements ensure that an attacker is not able to change the intent of the query, even if the attacker tries to insert malicious code. For example, if someone were to enter jak'or'1'='1 in a user ID field, the injection would fail because the parameterized query would treat the entire input as a literal string. It would search the users table for a user ID that literally matches jak'or'1'='1 and would find nothing. Some developers like prepared statements because all the SQL code stays in the application, which keeps the application relatively independent of the database. (1)
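The difference between a dynamic query and a parameterized one can be sketched as follows. This is an illustration only, assuming Python's sqlite3 module, a hypothetical users table, and a spaced-out variant of the malicious input discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

attacker_input = "jak' or '1'='1"

# Vulnerable: the input is pasted into the SQL text, so the quote characters
# close the string literal and "or '1'='1'" becomes part of the query logic.
unsafe_sql = "SELECT user_id FROM users WHERE user_id = '" + attacker_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized: the SQL text is fixed and the input is bound to the "?"
# placeholder, so it is compared as one literal string.
safe_rows = conn.execute(
    "SELECT user_id FROM users WHERE user_id = ?", (attacker_input,)
).fetchall()
```

The dynamic query returns every user in the table, while the parameterized query returns no rows because no user ID literally equals the attacker's string.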
The second primary defense is stored procedures. "This procedure also makes the developer write the SQL code first and makes them pass the parameters last." (1) The only real difference between this and prepared statements is that a stored procedure is defined and stored in the database itself and then called by the application. "A benefit when using stored procedures is that one can restrict user accounts to allow them only access to stored procedures." (1) Another benefit of stored procedures is that in most cases they give better performance because all the SQL code is in one spot. Both are excellent techniques, but there is a potential flaw with stored procedures.
Whenever you use stored procedures, there is a risk that a developer could build a dynamic query inside a stored procedure. This is rare, but nonetheless, if it happens, the procedure is susceptible to SQL injection. If you cannot avoid dynamic queries in your stored procedures, it is best to validate the input or escape it properly. (1)
The third primary technique is escaping all user-supplied input. "This is for people who think that the other 2 techniques would break their applications." (1) This approach is mainly used to retrofit legacy code. "Every Database Management System supports at least one type of character escaping scheme." (1) Escaping marks all special characters in the input so that the database treats what is typed into the fields as data and not as malicious code.
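As a rough illustration of escaping, the sketch below doubles every single quote, which is the standard SQL way of writing a quote inside a string literal. The helper names escape_sql_string and find_user are hypothetical, the example assumes Python's sqlite3 module, and real database systems have additional special characters, so production code should rely on the escaping routine supplied for the specific database rather than on this sketch.

```python
import sqlite3

def escape_sql_string(value):
    # Hypothetical helper: double each single quote so it is read as a
    # literal quote character instead of the end of the string literal.
    return value.replace("'", "''")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("o'brien",)])

def find_user(user_id):
    # Legacy-style dynamic query, made safer by escaping the input first.
    sql = (
        "SELECT user_id FROM users WHERE user_id = '"
        + escape_sql_string(user_id)
        + "'"
    )
    return conn.execute(sql).fetchall()

legit_rows = find_user("o'brien")          # a real name containing a quote
attack_rows = find_user("jak' or '1'='1")  # injection attempt
```

The legitimate name with an apostrophe is still found, and the injection attempt is compared as plain data and matches nothing.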
There is one other, somewhat controversial, method called magic quotes, which is exclusive to the PHP language and is meant mainly to help beginners harden their code. When magic quotes is on, all single quotes, double quotes, backslashes, and null characters are escaped automatically with a backslash. Magic quotes is controversial because it is not portable, it has significant performance costs, and it alters data that does not need to be escaped.
There are three reasons that magic quotes aren't good to use. The first and probably most obvious is that "scripts made with magic quotes don't work if the server doesn't have the feature enabled." (2) The second is performance: magic quotes wastes processing power because it escapes every piece of input, even though not all of that data is ever entered into the database. The last reason is inconvenience: many extra backslashes are added to the data, for example when a form is submitted. Knowing all this, it hardly matters anymore, since magic quotes was slated for removal in PHP 6 (3); in practice it was deprecated in PHP 5.3 and removed in PHP 5.4.
Now we get into the additional defenses. The first one is least privilege. This may be the most obvious defense: a user should only be able to access the information that they need to do their job. "One thing you should not do is assign the DBA or the administrator rights to your applications." (1) When setting up rights to the database, it is best to strip all rights from the account and add rights as you go. This can save time, since you won't have to look through all the rights and decide which ones to take away.
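Account rights are normally granted with the database system's own permission commands, which vary by vendor. As a loose analogy only, the sketch below assumes Python's sqlite3 module, which has no user accounts, and uses a read-only connection to stand in for an application account that has been given nothing beyond read access. The file name and table are hypothetical.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

workdir = tempfile.mkdtemp()
db_path = Path(workdir) / "app.db"

# "Administrator" connection: creates and fills the database.
admin = sqlite3.connect(str(db_path))
admin.execute("CREATE TABLE users (user_id TEXT)")
admin.execute("INSERT INTO users VALUES ('alice')")
admin.commit()
admin.close()

# "Application" connection: opened read-only, the only right it needs.
app = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
rows = app.execute("SELECT user_id FROM users").fetchall()

# Even if an attacker manages to inject a destructive statement,
# the connection is not allowed to carry it out.
try:
    app.execute("DROP TABLE users")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True

app.close()
shutil.rmtree(workdir, ignore_errors=True)
```

The point of the sketch is that reads succeed while the destructive statement is refused, which limits the damage a successful injection can do.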
The second additional defense is white list input validation. It is always a good idea to use white list validation on any field that accepts user input. White list validation defines what is authorized to be put into the input fields; everything that is not on the list is rejected. (1) If you have data with a distinct structure, such as dates, zip codes, email addresses, and social security numbers, the developer should have no problem defining a strong validation pattern using regular expressions. (1) It isn't always easy to use white list validation on free-form input fields such as text areas for blogs. "These can be validated to a degree by excluding all non-printable characters and define a number of maximum characters the text area can hold." (1)
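A white list check for structured fields, plus the looser check for free text, might look like the sketch below. It assumes Python's re module; the patterns, function names, and the 2000-character limit are illustrative choices, not values taken from the cited sources.

```python
import re

# Illustrative patterns: a US zip code (optionally ZIP+4) and a date
# written as YYYY-MM-DD. They check the shape of the value only.
ZIP_CODE = re.compile(r"[0-9]{5}(-[0-9]{4})?")
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def matches_white_list(pattern, value):
    # Accept the value only if the whole string fits the pattern.
    return pattern.fullmatch(value) is not None

def acceptable_free_text(value, max_length=2000):
    # Free text cannot be fully white listed; reject oversized input and
    # non-printable characters, allowing newlines and tabs.
    if len(value) > max_length:
        return False
    return all(ch.isprintable() or ch in "\n\t" for ch in value)
```

For example, matches_white_list(ZIP_CODE, "90210") is accepted, while a value such as "90210' OR '1'='1" is rejected because it does not fit the pattern from start to finish.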
The final prevention technique is restricting the form components themselves, which limits the number of characters that can be entered into a field. For example, a login name field could be limited to between 7 and 12 characters. Doing this limits what an attacker can do. It will not prevent an attack, but it will make it harder for the attacker to significantly damage your database. (2)
In conclusion, there are many ways to protect your database from SQL injection. Nowadays most developers starting a new project go with prepared statements to protect themselves from attack. However, if you are looking to update legacy code, you may want to escape all user-supplied input instead. In addition to the primary defenses against SQL injection, it is recommended that you use white list validation, typically with regular expressions, and/or apply the least privilege rule in your database.