Anyone who deals with large volumes of data every day needs to know whether the data is well protected and what the implications of misuse are. Data stored in a database is susceptible to such misuse. In this paper, I summarize one type of security issue faced by multilevel secure databases: the problem of inference.
Inference is a phenomenon in which low-sensitivity information is collected to infer data of high sensitivity. Various types of inference attacks are explained by example, and the measures used against inference are discussed. In addition, some methods that are implemented to safeguard the database against such attacks are discussed.
Database security consists of different types of control mechanisms implemented to protect the confidentiality, integrity and availability of data. Databases often store data that is sensitive in nature. For example, in the database system of an organization, the salary of an employee should be accessible to managers but hidden from co-workers. In this case, multilevel security is required to define which data is available to whom. The incorrect use or loss of such data may harm business operations.
The goals of database security are:
a. Integrity: Only authorized users should be able to modify data.
b. Availability: Authorized users and applications should have uninterrupted access to the data.
c. Confidentiality: Protection of data from unauthorized disclosure.
The security risks to the database include:
1. Unauthorized access to sensitive data (e.g., a user without the privilege to view or manipulate data misusing the stored data, intentionally or accidentally).
2. Leakage, loss or misuse of data caused by viruses and other malware.
3. Data corruption due to the entry of invalid data and commands.
To control these threats to the database, various measures are implemented. Elmasri and Navathe, the authors of the book 'Fundamentals of Database Systems', classify security measures into four categories:
1. Access Control
2. Inference Control
3. Flow Control
4. Data Encryption
Access Control: A robust database should prevent unauthorized users from accessing the database to view or manipulate data. Creating user accounts, roles and passwords is one way of restricting access to data; these operations all come under access control.
Inference Control: Inference control is used in situations where the user should be permitted to access only summary or aggregate results, not the detailed confidential information of individual records. This type of security control is seen where the database keeps records of research statistics. The results of the research might be available to everyone, but anyone who tries to obtain information about the individuals taking part in the research should be prevented from doing so.
Flow Control: This security measure prevents information from flowing in such a way that it reaches unauthorized users.
Data Encryption: This security measure protects sensitive information transmitted through a communication channel. The data is first encoded with an encryption algorithm into a form that is unreadable to ordinary users. The encoded data is then transferred through the communication channel to the other party, and the receiving party decodes it using a key. An unauthorized user who does not have the key is unable to decode the message.
In a distributed environment, a number of applications are accessed simultaneously by users with different levels of access from various places. Some examples are: a user can access his or her bank account via the internet to perform banking transactions; a user can buy and sell things online by providing credit card information; a doctor can access a patient's record via the internet to view and modify the patient's treatment. These types of operations require a high level of security that guarantees privacy, integrity and confidentiality. Inappropriate disclosure of such information can have several legal consequences.
Inference Problem in database security
A database should allow open and easy access to data, but it also stores sensitive information that must be protected against unauthorized disclosure. Sensitive data can be protected against direct access using usernames and passwords, views and roles, but these techniques cannot prevent indirect access. Indirect access leads us to the inference problem: when prohibited information is inferred from the database using the available information, it is called the inference problem.
Let's assume a table named employees in the database. It holds general information about the employees and their salaries.
If we select the average salary of female employees, the result is not sensitive. But if we want to find out the salary of Sheila Kelly and run a direct query, the DBMS will reject the query because it considers the data sensitive.
Suppose, however, that we list all the female employees of the company and the cities they come from. We can then rewrite the query to ask for the average salary of women from Mankato. If Sheila Kelly is the only female employee from Mankato, the average salary will be her salary.
Select name, city from employees where sex='F';
Select avg(salary) from employees where sex='F' and city='Mankato';
Each of these two queries discloses only non-sensitive data when executed individually, but together they can disclose sensitive information. This is called the inference problem.
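The two-query attack can be reproduced with a small in-memory table. This is only a sketch: it uses Python's sqlite3 module rather than the DBMS the essay has in mind, and all rows other than the names, cities and Bob Englehorn's salary are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample rows: the names "Sheila Kelly" and "Bob Englehorn", the
# cities and Bob's salary appear in the essay; every other value is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, sex TEXT, city TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Sheila Kelly", "F", "Mankato", 52000.0),
        ("Ann Lee", "F", "Minneapolis", 58000.0),
        ("Bob Englehorn", "M", "Minneapolis", 60000.0),
    ],
)

# Query 1: harmless on its own, it lists female employees and their cities.
women = conn.execute("SELECT name, city FROM employees WHERE sex = 'F'").fetchall()

# Query 2: an aggregate that also looks harmless on its own.
(avg_salary,) = conn.execute(
    "SELECT avg(salary) FROM employees WHERE sex = 'F' AND city = 'Mankato'"
).fetchone()

# Combining the two: if exactly one woman comes from Mankato,
# the "average" is simply her individual salary.
mankato_women = [name for name, city in women if city == "Mankato"]
inferred = (mankato_women[0], avg_salary) if len(mankato_women) == 1 else None
print(inferred)
```

Neither query touches an individual salary directly, yet the combination pins one down.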
Row Level Security Using Views:
The inference problem can be overcome by 'Row Level Security' using views. A view can be created for every user in the database so that the user can work only on his view. When updates are made through the view, they are applied to the underlying tables. Using views restricts the user's access at the row level, which means the user can access only the data that is relevant to him.
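A minimal sketch of per-user views, again using sqlite3 with hypothetical sample rows. SQLite has no user accounts or grants, so the assumption here is that a real DBMS would give Bob Englehorn access to the view bob_view (a made-up name) and not to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, sex TEXT, city TEXT, salary REAL);
    INSERT INTO employees VALUES ('Sheila Kelly', 'F', 'Mankato', 52000.0);
    INSERT INTO employees VALUES ('Bob Englehorn', 'M', 'Minneapolis', 60000.0);

    -- One view per user, each exposing only that user's own row.
    CREATE VIEW bob_view AS
        SELECT name, sex, city, salary FROM employees WHERE name = 'Bob Englehorn';
    """
)

# Bob works only against his view, so he sees his own record...
bob_rows = conn.execute("SELECT name, salary FROM bob_view").fetchall()

# ...and the aggregate from the inference example finds no matching rows,
# so avg() yields NULL (None in Python) instead of Sheila Kelly's salary.
(leaked,) = conn.execute(
    "SELECT avg(salary) FROM bob_view WHERE sex = 'F' AND city = 'Mankato'"
).fetchone()
print(bob_rows, leaked)
```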
Creating a view for every employee is not always feasible when there are thousands of employees. Another solution to the inference problem is therefore the Virtual Private Database.
Virtual Private Database:
In a virtual private database (VPD), the submitted query is rewritten by appending a predicate that depends on the user's session attributes and other application context. The VPD is located on the server side of the database. It generates predicates to be appended to the WHERE clause of the submitted query, and the modified query is then executed.
To explain further, the predicate generated by the VPD can be as simple as a string based on the username.
If the user is the DBA (Database Administrator), an empty string (predicate) is appended to the original query before it is executed, whereas if the user is a normal user, the corresponding string is appended to the WHERE clause of the original query before it is executed.
Considering the same example as above, let's see what happens with VPD.
If the user is the DBA and he executes the following query:
Select avg(salary) from employees where sex='F' and city='Mankato'
Since the user is the DBA, no predicate is appended, so the query executes as it is.
If the user 'Bob Englehorn' executes the same query, the query is modified by adding a predicate:
Select avg(salary) from employees where sex='F' and city='Mankato' and user = 'Bob Englehorn';
No rows satisfy the modified query, so no salary is disclosed.
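The rewriting step can be sketched in a few lines. This is not how any particular DBMS implements VPD; policy_predicate and rewrite_query are hypothetical helper names, the predicate filters on the name column (an assumption, where the essay's example writes user = ...), and the rewrite only handles queries that already end in a WHERE clause without OR conditions.

```python
import sqlite3

def policy_predicate(username):
    """Hypothetical policy: no predicate for the DBA, own row only for others."""
    if username == "DBA":
        return "", ()
    return "name = ?", (username,)

def rewrite_query(query, username):
    """Append the policy predicate to a query that already has a WHERE clause."""
    predicate, params = policy_predicate(username)
    if not predicate:
        return query, params  # the DBA's query runs unchanged
    return f"{query} AND {predicate}", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, sex TEXT, city TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Sheila Kelly", "F", "Mankato", 52000.0),  # salary is made-up sample data
        ("Bob Englehorn", "M", "Minneapolis", 60000.0),
    ],
)

QUERY = "SELECT avg(salary) FROM employees WHERE sex = 'F' AND city = 'Mankato'"

def run_as(username):
    """Execute QUERY as the given user after the policy has rewritten it."""
    sql, params = rewrite_query(QUERY, username)
    return conn.execute(sql, params).fetchone()[0]

dba_result = run_as("DBA")            # sees the real average
bob_result = run_as("Bob Englehorn")  # no matching rows, so avg() is NULL (None)
print(dba_result, bob_result)
```

Passing the username as a bound parameter rather than pasting it into the string keeps the appended predicate itself from becoming an injection point.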
Column-wise Masking VPD
The two techniques described above restrict the rows returned by the query. In this technique, certain columns of the database are instead categorized as sensitive, and the values in those columns are masked when a query is executed by particular users.
For the same employees table used above, if the salary column is defined in the database as a sensitive column, then the following query executed by Bob Englehorn would return the result shown after it:
Select name, salary, city from employees;
Name            Salary     City
Sheila Kelly               Mankato
Bob Englehorn   $60,000    Minneapolis
The salary of Sheila Kelly has been masked.
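Column masking can be imitated in application code as follows. This is a sketch under an assumed rule, namely that a user may see the sensitive column only in his own row; SENSITIVE_COLUMNS and mask_rows are hypothetical names, and Sheila Kelly's salary is made-up sample data.

```python
SENSITIVE_COLUMNS = {"salary"}  # columns declared sensitive (assumed configuration)

def mask_rows(rows, columns, current_user, owner_column="name"):
    """Return rows as dicts, blanking sensitive columns in other people's rows."""
    masked = []
    for row in rows:
        record = dict(zip(columns, row))
        if record[owner_column] != current_user:
            for column in SENSITIVE_COLUMNS:
                if column in record:
                    record[column] = None  # masked value
        masked.append(record)
    return masked

rows = [
    ("Sheila Kelly", 52000, "Mankato"),
    ("Bob Englehorn", 60000, "Minneapolis"),
]
result = mask_rows(rows, ["name", "salary", "city"], current_user="Bob Englehorn")
for record in result:
    print(record)
```

Every row is still returned, which is the difference from the row-level techniques: only the sensitive cell is hidden.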
Inference from data combined with metadata
Key Integrity Problem: This type of inference occurs when the data retrieved from the database is combined with the constraints defined in the database. A user in a low security class can use the data returned by a query to deduce information belonging to a higher security class. This situation is explained in the following example.
Suppose a ship transportation system uses a cargo table to keep information about all cargo holds on all outbound ships.
If a user TS in the Top Secret class requests the information, he sees all the cargo. Following the security rules, data in a higher security class is hidden from lower classes. So, if an unclassified user U requests the same information, he sees only the cargo in holds A and B. User U, assuming that hold C is empty, wants to load sugar into hold C and issues an insert command. But the insert statement fails because of the unique constraint. In this case, the DBMS must either delete the existing tuple or inform the user that the tuple cannot be inserted because a tuple with that key already exists. Either way there is a problem. From all this information, user U can infer that ship no. 2001 carries some secret shipment, and can find the source and destination from other tables, obtaining enough information about the secret shipment.
This type of problem can be handled by polyinstantiation. In polyinstantiation, the classification column is included in the unique constraint, which allows records with different classifications but the same key values to exist in the same table. User U will never be aware of the secret shipment, but the shipment of sugar will be left stranded at the port.
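The effect of adding the classification to the key can be seen by comparing two table definitions. The sketch below uses sqlite3; the table and column names and the contents of the secret row are assumptions, since the essay's cargo table is not reproduced here.

```python
import sqlite3

def try_insert(conn, table, row):
    """Return True if the row is accepted, False if a unique constraint rejects it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
# Without polyinstantiation: the key is (ship_no, hold).
conn.execute(
    "CREATE TABLE cargo_plain (ship_no INTEGER, hold TEXT, contents TEXT,"
    " classification TEXT, UNIQUE (ship_no, hold))"
)
# With polyinstantiation: the classification is part of the key.
conn.execute(
    "CREATE TABLE cargo_poly (ship_no INTEGER, hold TEXT, contents TEXT,"
    " classification TEXT, UNIQUE (ship_no, hold, classification))"
)

secret_row = (2001, "C", "classified cargo", "TS")  # hypothetical contents
low_row = (2001, "C", "sugar", "U")                 # user U's insert

for table in ("cargo_plain", "cargo_poly"):
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", secret_row)

plain_ok = try_insert(conn, "cargo_plain", low_row)  # rejected: reveals the hidden row
poly_ok = try_insert(conn, "cargo_poly", low_row)    # accepted: nothing is revealed

# What user U sees afterwards in the polyinstantiated table.
visible_to_u = conn.execute(
    "SELECT hold, contents FROM cargo_poly WHERE classification = 'U'"
).fetchall()
print(plain_ok, poly_ok, visible_to_u)
```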
Inference channels also occur because of functional and multivalued dependency constraints in a relation. If a functional dependency is known to a normal user, the user can use that knowledge to predict secret data.
Suppose a table EMPLOYEE_SAL holds the name, position and salary of the employees of a company. Name and position are non-sensitive data, so they are visible to everyone, but salary is sensitive and is hidden. However, everybody in the company is aware that position determines salary. In this situation, any employee who knows a colleague's position can also determine the salary.
This situation occurs because salary is functionally dependent on position. One way to address it is to classify position as sensitive information as well and hide it. Thus, before security labels are assigned, the functional dependencies between the attributes should also be checked.
The value constraints defined for an attribute limit the values it can take. Value constraints in a database can therefore lead to an inference channel.
Let us assume that A and B are two columns of a table, and that a constraint is defined on the sum of A and B such that A + B < 20.
A is an unclassified attribute and B is a secret attribute. The condition A + B < 20 is also unclassified. In this situation, a user who sees A can bound the value of B: if A is 15, for instance, B must be less than 5.
The solution is never to define a constraint across security levels. If A is also made a secret attribute, the problem is solved. The other solution is to split the condition involving two variables into conditions involving only one variable each. Thus, if A < 10 and B <= 10 are the two conditions defined, A + B < 20 still holds and the inference channel is blocked.
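The difference between the joint constraint and the split constraints can be made concrete with a small calculation. The function names are hypothetical, and the split limits A < 10 and B <= 10 are just one possible choice that preserves A + B < 20.

```python
def bound_on_b_joint(a_value, limit=20):
    """With the public constraint A + B < limit, seeing A gives B < limit - A."""
    return limit - a_value

def bound_on_b_split(a_value, b_limit=10):
    """With separate constraints on A and B, B <= b_limit regardless of A."""
    return b_limit

# Under the joint constraint the bound on B tightens as the visible A grows.
joint_bounds = [bound_on_b_joint(a) for a in (2, 9, 15)]
# Under the split constraints the visible A tells the user nothing new about B.
split_bounds = [bound_on_b_split(a) for a in (2, 9, 15)]
print(joint_bounds, split_bounds)
```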
Detection and Removal of Inference Channel
There are mainly two types of techniques used to detect and remove inference channels.
Design Phase: Some techniques detect inference channels in the design phase. Semantic data modeling is one example. If an inference channel is found in the design phase, the database is remodeled to remove it. As described in Hinke (1995), a graph is constructed to find inference channels in the database. Each attribute in the database is represented by a node, and a relationship between attributes is represented by an edge connecting two nodes. If attribute X implies Y, an edge is drawn from X to Y and another edge from Y to X. If two paths are discovered from X to Y, a possible inference channel is said to be detected. Such a channel is investigated further to see whether it is unsafe; if so, the edges are split into two or more where possible. This technique serves well where there is little interrelated data, but where there is a lot of related data the process becomes very time consuming.
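The two-path test can be sketched as a search over an attribute graph. This is a simplified reading of the idea, not the published algorithm: the graph is treated as directed, the function names are hypothetical, and the example edges reuse the name, position and salary attributes from the EMPLOYEE_SAL discussion.

```python
def count_simple_paths(graph, source, target, visited=frozenset()):
    """Count paths from source to target that never revisit a node."""
    if source == target:
        return 1
    visited = visited | {source}
    total = 0
    for neighbour in graph.get(source, ()):
        if neighbour not in visited:
            total += count_simple_paths(graph, neighbour, target, visited)
    return total

def possible_inference_channels(graph):
    """Return attribute pairs (x, y) connected by two or more distinct paths."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    return sorted(
        (x, y)
        for x in nodes
        for y in nodes
        if x != y and count_simple_paths(graph, x, y) >= 2
    )

# Hypothetical schema graph: name is linked to salary directly (the hidden,
# classified association) and also through position (the public one).
schema_graph = {
    "name": ["position", "salary"],
    "position": ["salary"],
}
channels = possible_inference_channels(schema_graph)
print(channels)
```

Enumerating every simple path grows quickly with the number of related attributes, which matches the remark that the approach becomes time consuming on heavily interrelated data.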
Another technique in semantic data modeling is described in Jajodia (1995) using the PINFER(X,Y) function, which determines the probability that one can infer Y given X. The PINFER function is evaluated by an expert. Fuzzy logic is also used to determine the probability that one can infer Z from X.
Query Phase: Other techniques are designed so that, if they detect the possibility of inference, they stop the query from executing or modify the query. These techniques can detect inference channels at both the data level and the schema level.
Mazumdar (1988) uses a technique to determine the security of a database. The authors propose a theorem that evaluates whether the secrets of the system can be deduced from the constraints of the database, the input to the transaction and the precondition of the transaction.
A different technique has also been suggested: a set of data is classified by a classification constraint according to the level of information that can be inferred from it.
When a query is submitted to the system, the system upgrades the result of the query to the level of the previously determined constraint and returns the result. A history mechanism can also be added, which collects information about previously issued transactions and raises an alarm if a user is trying to gather enough information to infer other data.
For example, suppose X and Y together can be used to infer Z, where Z is a secret attribute but X and Y are not. If a user issues a query to see the value of X and then issues another query to see the value of Y, the user is prevented from seeing Y, on the assumption that he would then be able to deduce Z.
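A history mechanism of this kind can be sketched as a small class that remembers which attributes each user has already been shown. InferenceGuard and its methods are hypothetical names, and the set of known inference channels is assumed to be supplied by the database designer.

```python
class InferenceGuard:
    """Refuse a request if it would complete a known inference channel."""

    def __init__(self, channels):
        # channels maps a secret attribute to the attributes that together reveal it.
        self.channels = {secret: frozenset(parts) for secret, parts in channels.items()}
        self.history = {}  # user -> set of attributes already released to that user

    def request(self, user, attribute):
        """Return True and record the access if it is safe, otherwise False."""
        seen = self.history.setdefault(user, set())
        would_see = seen | {attribute}
        for secret, parts in self.channels.items():
            if attribute == secret or parts <= would_see:
                return False  # direct access to the secret, or channel completed
        seen.add(attribute)
        return True

guard = InferenceGuard({"Z": {"X", "Y"}})
first = guard.request("alice", "X")   # allowed: X alone reveals nothing
second = guard.request("alice", "Y")  # refused: X and Y together reveal Z
other = guard.request("bob", "Y")     # allowed: bob has not seen X
print(first, second, other)
```

The history is kept per user, so the sketch does not protect against two users pooling what they have each been shown.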
Vulnerabilities in Database:
Some guidelines can be followed to make the database robust against inference attacks.
Inconsistent classification of security for replicated data: Although replicated data should be avoided in a database, some of it is hard to remove completely. If replicated data occurs in the database, its security classification should be the same everywhere. The same column should not be unclassified in one table and top secret in another, because the column value can then easily be obtained from the unclassified table.
Inadequately restricted data: The major reason inference attacks occur is that data is inadequately restricted. Data that is vulnerable to inference attacks should be identified by detailed study, and measures should be taken to restrict it so that illegal inferences are not possible.
N-item k-percent rule violation: N represents the number of items returned by the query and k is a percentage. Whenever a query returns N items, N should not exceed the percentage k set in the database. For example, a query whose result is based on only one record may be restricted, because that single record accounts for 100 percent of the result. Enforcing this rule makes it harder to attach a WHERE clause to an aggregate query that narrows the result to one individual, which is what makes inference easy.
Unencrypted Index: Indexes are used on tables to make searching and query execution efficient. If the database is encrypted but an index is left unencrypted, the index can be used to gather information about table and column names, which can lead to inference.
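The N-item k-percent rule mentioned above can be checked before an aggregate is released. The sketch below uses one common formulation, which is an assumption and may differ from the wording above: the result is withheld if the n largest contributing values make up more than k percent of the total. The default values of n and k, the function name and the salary figures other than $60,000 are hypothetical.

```python
def violates_n_k_rule(values, n=1, k=80.0):
    """True if the n largest values account for more than k percent of the total.

    Assumes non-negative values, such as salaries.
    """
    total = sum(values)
    if total == 0:
        return False
    top_n = sum(sorted(values, reverse=True)[:n])
    return 100.0 * top_n / total > k

# An aggregate over a single record: that record is 100 percent of the result.
single = violates_n_k_rule([52000])
# An aggregate over several comparable records: no single record dominates.
several = violates_n_k_rule([52000, 58000, 60000])
print(single, several)
```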
Methods of Inference Attacks
Out of Channel Attack: In this type of attack, information is first gathered from publicly available outside sources and then used to attack the secure database. For example, data mining is performed on numerous publicly available sources to obtain hints about secure data, and the results are used to attack secure sources. It is very hard to control the amount of data that is publicly available, because people use the internet for all their activities, and that information can be collected to predict the behavior of an individual.
Direct Attack: When queries are executed directly on the target database to find secret information, the process is called a direct attack. A database can be safeguarded against direct attack by classifying data by security level and controlling access rights.
Indirect Attack: In an indirect attack, intermediate results are used to derive the final information. Using statistical functions or set theory to derive some information, and then using the result to infer secure information, is one example of an indirect attack.
Current research trends in the security of distributed databases are as follows:
1. Multilevel security and conflict with data consistency
Multilevel security requirements conflict with the requirement of data consistency. Researchers are looking for ways to reconcile the two.
2. Applying security tags and mandatory security policy to provide multilevel security.
3. Security for different views of the database
Researchers are also interested in security concerns regarding different views of the same database.
4. Centralized vs. distributed control
Researchers are looking at the pros and cons of both centralized and distributed control for security purposes.
5. Security and distributed data mining
Distributed databases are vulnerable to unauthorized access to information that a user can infer, by making an educated guess, from data he is allowed to see. Researchers are working on identifying users' motives and blocking users who try to obtain sensitive information by accumulating freely available data.