Software Protection With Function Hiding Using Obfuscation Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Abstract: Application service provider (ASP) is a business that affords computer-based services (small and medium sized businesses) to clients over a network. The usual ASP sells a large application to large enterprises, but also provides a pay-as-you-go model for smaller clients. One of the main problems with ASP is the insufficient security to resist attacks and guarantee pay-as-you-go. Function hiding can be used to achieve protection for algorithms and assure charging clients on a per-usage bases. Encryption functions that can be executed without prior decryption (function hiding protocol) give good solution to the problems of software protection. Function hiding protocol face a problem if the same encryption scheme used for encrypting some data about the function and the output of the encrypted function, in this case, the attacker could reveal the encrypted data easily.

This paper aims to develop software protection system based on function hiding protocol with software obfuscation to overcome function hiding problems. The protocol allows charging clients on a per-usage basis (pay-as-you-go) and satisfy both confidentiality and integrity for ASP and client.

1. Introduction

ASPs has evolved from the increasing costs of dedicated software of small to medium sized businesses. With ASPs, the costs of such software can be lowered and the problem of upgrading have been reduced from the client by placing the services-upgrade responsibility on the ASP. There are several forms of ASP business for instance: functional ASP distributes a single application, such as credit card payment processing or time-sheet services; an enterprise ASP delivers broad spectrum solutions; a local ASP delivers small business services which provides a pay-as-you-go mode. To provide an ASP offering, the vendor must also provide a secure product [IBM red book2001, weight paper]. One of the approaches that could be used to assure charging clients on a per-usage bases and provide certain level of security is through function hiding protocol. The key point of function hiding is to encrypt a special class of functions such that they remain executable and produce encrypted result to prevent clients from copying and using the program without paying anything for it.

In function hiding protocol, the client executes the protected program with encrypted coefficients, he will not get the clear-text results until he sends them to the producer (who charges the client) to decrypt them and send clear-text result back to the client. The encryption technique used is probabilistic (Goldwassr-Micali) with two supporting algorithms (Plus and Mixed-mult) that are used to allow encrypted function to be executed without needing to prior decryption[Sander98].

Function hiding protocol needs to guaranty the secrecy of its coefficients especially when the same key is used for encrypting the coefficients of the function and the output of the encrypted function, which allows the attacker to reveal the encrypted coefficients easily This function hiding protocol might suffer from coefficient attack problem: Instead of sending outputs of the program to the producer, the client (attacker) sends the encrypted coefficients that he finds in the program. The client may even scramble them by multiplication with some random quadratic residue such that producer can not recognize these values as the hidden function coefficients. According to the protocol, the producer has to decrypt them and thus would hand him out the main information that should be kept secret[Sander98]. This is a general problem that function hiding schemes have to face. In order to hide secrets of software implementation (i.e. the hidden function coefficients), an obfuscation technique has been used.

Finally, to provide security to the clear-text results generated by the producer before transmitting them to the client, the clear text results are encrypted using public key (its private key known only to the client), and then encrypted with private key of the producer (to prove the authenticity of the service provider "producer").

In this paper, a detailed description of the implementation of the function hiding process is given, with 10 algorithms are written to build the developed protection system. In addition to the used obfuscation technique. This system is tested with three different applications and proved secure and successful. The tests are carried on stations of a LAN.

2. Software Protection via Function Hiding

Main applications for code privacy is found in the software industry and with service providers that seek for methods to make copying or learning proprietary algorithms technically impossible. For instance, for ASP and mobile software agents (designed to be executed on different hosts with different environmental security conditions), it is important to provide protection against various attacks such as unauthorized access to private data, malicious modification of its code etc. Function hiding can be used to accomplish software protection against disclosure and ensures that only licensed users are able to acquire the clear-text output of the protected software[Hacini 2006v23, Dev 98,redbook]. The basic steps of function hiding protocol is illustrated in figure (1)[Dev 98]:

Figure 1 A basic protocol for executing encrypted functions

Let E be a mechanism to encrypt a function f implemented in a program P where Alice producer and Bob is client:

1) The producer encrypts f and creates a program P(E(f))

2) Producer sends software P(E(f)) to the client

3) Client executes P(E(f)) with input x and sends the result (y) back to the producer

4) Producer decrypts (y), obtains f(x) and sends the result back to the client

Based on the above protocol, software producer can charge clients on a per-usage basis. To implement such a technique, additive homomorphic scheme could be used to enable hiding of a polynomial function in a program.

3. Public-Key and Probabilistic Public-Key Systems [Buc01][Men 97]

Public-Key crypto system is introduced by Diffie and Hellman in 1976. In such system a user A has a public encryption transformation EA with a public key (PA) saved in a public key directory to be used by others to encrypt messages before send them to A; and a private decryption transformation DA used to decipher the received messages, known only to user A. Secrecy and Authenticity are provided by separate transformations.

The public key crypto systems RSA and Knapsack schemes are deterministic in the sense that under a fixed public key, a certain plain text m is always has some or one of the following:

1- The scheme is not secure for all probability distributions of the message space.

2- It is sometimes easy to compute partial information about the plaintext m from the cipher text c.

3- It is easy to detect when the message sent twice.

Public-key encryption scheme is said to be polynomially secure if no passive adversary can, in expected polynomial time, select two plaintext messages m1 and m2 with probability significantly greater than 1/2

Public key encryption scheme is said to be significantly secure if, for all probability distributions over the message space, whatever a passive adversary can compute in expected polynomial time about the plaintext given the cipher text, it can also compute in expected polynomial time without the cipher text.

The probabilistic public-key encryption[Men97] has some differences from the Public key cryptosystems, these are, the encryption decryption operations are performed on binary numbers, quadratic residue principle and Jacobi symbols are used to get the public key and does not produce the same encrypted result when repeating the encryption operation more than once , so it is none deterministic.

4. Mathematical Background

In this section, mathematical principles needed in the implementation of the proposed system are illustrated.

4.1 Quadratic Residue [Men97]

"Let aZ*n, a is said to be a quadratic residue modulo n, or a square modulo n, if there exists an xZ*n such that x2 ≡ a(mod n). If no such x exists, then a is called a quadratic non-residue modulo n. The set of all quadratic residues modulo n is denoted by , and the set of all quadratic non-residue is denoted by ."

4.2 Rings [Joh82]

"A ring <R, +, .> is a set R together with two operations + and ., which is called addition and multiplication respectively, defined on R such that the following axioms are satisfied:

R1: <R, +> is an Abelian group,

R2: multiplication is associative,

R3: for all a, b, c R, the left distribution law, a(b + c) = (ab) + (ac), and the right distributive law, (a+b)c = (ac) + (bc), holds."

4.3 Relatively Prime Numbers[Men97]

Two integers a and b are said to be relatively prime or coprime if gcd(a, b) =1, where gcd is the greatest common divisor.

4.4 Legendre Symbol and Jacobi Symbol [Men97]

The Legendre symbol is a useful tool for keeping track of whether or not an integer a is a quadratic residue modulo a prime number p.

Let p be an odd prime and a is an integer. The Legendre symbol is defined for a≥0 and p odd prime where:

Let n ≥ 3 be odd with prime factorization n = . Then the Jacobi symbol is defined to be:

Observe that if n is prime, then the Jacobi symbol is just the Legendre symbol.

4.6 Additively Homomorphic Encryption [Dav98][2008/378]:

Let R and S be ring function E: R  S is called additively homomorphic if there is an efficient algorithm Plus to compute E(x+y) from E(x) and E(y) that does not reveal x and y.

4.7 Polynomial Rings[Dan96]:

"If R is a commutative ring, then a polynomial in the indeterminate x over the ring R is an expression in the form:

F(x) = a0 + a1x1 + a2x2 + a3x3 + a4x4 + ………..+ anxn

Where each ai .R and n≥ 0. The element ai is called the coefficient of xi in f(x). The largest integer m for which am ≠0 is called the leading coefficient of f(x). If f(x) = a0 ( a constant polynomial) and a0 ≠ 0, , then f(x) has degree 0. If all the coefficients of f(x) are 0, then f(x) is called the zero polynomial and its degree, for mathematical convenience, is defined to be -∞. The polynomial f(x) is said to be monic if its leading coefficient is equal to 1.

Each polynomial is composed of a number of monomials. A monomial in x is an expression of the form: axn. Where a and x are integer numbers. The number a is called the coefficient of the monomial. If a ≠ 0, the degree of the monomial is n.

5. The Developed Function Hiding System

We developed a system based on the model shown in figure 2. The details of our system is illustrated in Figure 2. In the figure, E is the encryption function, F is the function to be protected, E-1 is the decryption function, and R is the result. The two functions Mixed-Mult, and Plus are the functions that are used to support the operation of function hiding.




Key obfuscation

P O(P)




Private key
















Figure 2 the proposed protocol for executing encrypted functions

Let E be a mechanism to encrypt a function f implemented in a program P:

The producer encrypts f and create a program P(E(f))

Producer perform obfuscation on program P and produce O(P)

Producer sends software O(P(E(f))) to the client

Client executes O(P(E(f))) at the input x and sends the encrypted result (E[R]) to the producer

Producer decrypts (E[R]), obtains R

To provide security for client results, encrypt R with public key of the client and produce R`

To prove authenticity of the producer, then encrypt the R` by private key of producer and generate R`` and sends the result back to the client

The main steps that are used to construct the function hiding system are illustrated in Algorithm 1 shown below. Other functions are called within this algorithm in order to accomplish the function hiding process.

Algorithm 1: (Function Hiding Model)

Let F: be the polynomial:

a0 + a1xl + a2x2 + a3x3+ ………+ anxn

In order to hide this polynomial, the following steps are performed:

Step1: Encrypt each coefficient (a1, a2, a3, … , an) using algorithm 7 (Goldwasser-Micali probabilistic public-key encryption) to get E(a1), E(a2), E(a3), …, E(an). Where each element E(ai) represents a set of numbers resulting from encrypting each binary digit of the coefficient ai.

Step2: Compute x1, x2, x3, …, xn.

Step3:Compute the result of each monomial i.e. E(an) xn using algorithm 9 (Mixed-Mult) and store the results in an array M; where each monomial is stored in a single cell of M.

Step4: Add-Up the elements of array M using algorithm 10.

5.1 Encryption- Decryption Moduls:

Step 1 in algorithm 1 encrypts the coefficient of the polynomial. I this section we describe the design of algorithm that implements Goldwasser-Micali encryption method.

Algorithm 2: (Greatest Common divisor calculation GCD)

Input: Two non - negative integers a and n where a ≥ n.

Output: The GCD of a and n.

Step1: While n ≠0, Do

Set γ  a mod n, a  n, n  γ.

Step2: Return (a)

Algorithm 3: ( Z*n Calculation)

Input: n; such that n is an integer

Output: A set of integers such that any integer a  [0, …, n-1],

where gcd(a,n) = 1.

Step1: Specify Zn = [0,…, n-1]

Step2: For each a Zn, Do

If gcd(a,n) = 1, then add a to the set of Z*n

Algorithm 4: (Jacobi and Legendre Symbol Computations)

JACOBI(a, n)

Input: An odd integer n ≥ 3, and an integer a, 0≤ a ≥ n.

Output: The Jacobi symbol ( and hence the Legendre symbol when n is prime)

Step1: If a = 0 then return (0).

Step2: If a = 1 then return (1)

Step3: Write a = 2e a1, where a1 is odd.

Step4: If e is even then set s  1.

Otherwise set s  1 if n ≡ 1 or 7(mod 8),

set s -1 if n ≡3 or 5(mod 8).

Step5: If n ≡ 3(mod 4) and a1 ≡ 3(mod 4) then set s  -s.

Step6: Set n1  n mod a1

Step7: If a1 = 1 then return (s);

otherwise return (sÃ- JACOBI( n1, a1))

Algorithm 5: (Quadratic Residue Modulo n Test)

Input: n, an integer

Output: Set of Quadratic residue Module n numbers.

Step1: Compute Z*n; using algorithm 3 above.

Step2: For each a  Zn Do;

Step3: If(x2 - a) mod n = 0  add a to the quadratic residues modulo n set; where x is any other integer such that a  Zn.

Algorithm 6: (Key Generation for Goldwasser-Micali Probabilistic Public Key encryption.)

Step1: Select two large prime numbers p and q randomly, where they should be roughly the same size (number of digits)

Step2: compute n = pq

Step3 Select an integer y  Zn such that y is a quadratic non-residue modulo n and the Jacobi symbol ….. = 1, using algorithms 4 and 5.

Step4 The public key of user A is (n,y); and the privet key is the pair (p,q).

Algorithm 7: (Goldwasser-Macli Probabilistic Public-Key Encryption )

This algorithm encrypts an integer m into t-tuple, where t is the number of binary digits of the integer m.

User A encrypts an integer m for user B, and then B will decrypt this integer.

A should perform the following steps

Step1: Obtain B's authentic public key(n,y), using algorithm 6.

Step2: Represent the message m as binary string m = m1 m2 ….mt of length t.

Step3: For i= 1 to t Do

a. Evaluate Z*n using algorithm 3

b. Pick an x  Zn at random

c. If mi = 1 then set ci  yx2 mod n; otherwise set ci  x2 mod n

Step4: Send the t-tuple c = (c1, c2, …, ct) to B.

Algorithm 8: (Goldwasser-Micali Probabilistic Public-Key Decryption)

This algorithm takes t-tuple and transforms it back to an integer m; where m is the clear text. To recover the plaintext message m (of length t bits) from c, user A should do the following

Step1: For i= 1 to t Do

a. Compute the Legendre symbol ei = ---- . Using algorithm 4.

b. If ei = 1 then set mi  0; otherwise set mi  1.

Step2: The decrypted message is m = m1 m2 ... mt.

Algorithm 9: (Mixed-Mult Computation)

Input: integer variable x (having b binary digits, such that x = x1...xb) and an encryption of coefficients a; E(a).

Output: list ( M ) of encrypted integers.

Step1: For i = 1 to b Do

a. If xi = 1, then compute E(a2i), using algorithm 3

b. Put the result in list M

Step2: Add-up elements of list M using the plus algorithm (algorithm 10).

Algorithm 10: (Plus Computation)

This algorithm adds up the monomials of the encrypted polynomial:

where each Pi is a list (M) obtained by algorithm 9.

Step1: Pick a random number x from Z*n, let c = x2 mod n.

Step2: For j=1 to b, Do steps 3-5; where b is the number of binary digits of each number a.

Step3: Sum[j] = P1[j] . P2[j] mod n

Step4: Sum[j] = s Sum[j]. c mod n

Step5. If P1[j] and P2[j] ≠ x2 mod n, then c = y.x2 mod n.

Step6: For i = 3 to m, Do steps 7; where m is the number of monomials in the polynomial.

Step7: For j = 1 to b, Do steps 8-10

Step8: Sum[j] = Sum[j] . Pi[j] mod n

Step9: Sum[j] = Sum[j] . c mod n

Step10: if Sum[j] and Pi[j] ≠x2mod n, then c = y . x2mod n.

6. The Realistic Threat Model

When security mechanism is required to achieve security goal, it is important to illustrate the realistic threat model, which points up what a cracker is able to do. Crackers knowledge and resources could be discriminated based on:

Algorithm understanding level of the used protection mechanism: the cracker know the cipher algorithm, but not the secret information such as the secret key.

Level of system observation skill: the cracker owns a binary file, disassembled code, decompiled code of P, as well as a computer system M in which P is executed. The cracker has a debugger with breakpoint functionality that can watch internal states of M, e.g. memory snapshot of M, audio-visual outputs of M and the input and output value of P. The cracker also monitor the execution trace of P, i.e. a history of executed opcodes, operands system control skill [118, 20080910].

system control skill level: when program P is executed on computer system M, the cracker controls the mouse and keyboard inputs of M and run P with an arbitrary input. values. The cracker can change the instructions and the operand values in P, in addition to the memory image of M, before and/or during running P on M.

In this work, the expected threat model based on reverse engineering at which a cracker may have binary program (executable program), understanding the principles of the used algorithm. Also, assume that the cracker has a static analyzer such as a disassembler and a decompiler, as well as a debugger (dynamic analyzer). In other words, the expected cracker has both algorithm understanding and observation skill that allows him to extract the encrypted operands of the hidden function.

In order to hide secrets in software implementation, number 0f obfuscation techniques have been proposed based on the expected threat model.

7.Software Obfuscation

Software obfuscation has become a vital mean to hide secret information that exist in software systems. Obfuscations transform a program P to obfuscation program O(P) so that it is more complex and difficult to understand but it is functionally equivalent to the original program [wbc,118-8]. The most popular obfuscation techniques[20080910, 1152008,1162009 ].:

lexical obfuscations (e.g., comment removal, identifier renaming and debugging info removal, etc.),

data obfuscations (Data obfuscations thoroughly change the data structure of a program and encrypt literals including modifying inheritance relations, restructuring arrays, etc.. They make the obfuscated codes so complicated that it is impossible to recreate the original source code.)

control-flow obfuscation: obfuscates the layout and control flow of binary code. Many obfuscation techniques use opaque predicates to forged infeasible control flow, and then insert fake code that obfuscates the control and data flow

To overcome the expected threat model (illustrated in the previous section), two obfuscation techniques are used: lexical obfuscator, and changing data type obfuscator for a chosen variables (variables that represent the encrypted hidden function operands) from long-term to short-term to make the data obfuscation complicated. The used approach is as follows:

1. Parse the source program (unobfuscated program) to remove comments and find all tokens of the program.

2. find and keep all program variables through analyzing the tokens, perform variable renaming, then

3. Choose the variables that are important to obfuscate. To obfuscate variables, choose splitting or extending method and convert them into array of short term variables [0742004,2008].

The resulting program is obfuscate program O(P). For further security, white-box cryptography [wbc] could also be used.

8. System Testing

To illustrate the main steps of hiding a function, some typical problems that need polynomial evaluation are used for system testing. These test problems are Horner's method for fast polynomial evaluation, A specific polynomial with multi-variables and Surface area calculation for specific shapes.

Horner's method protection: Horner's method computes the result of a polynomial in an efficient and fast way. To solve the polynomial a0+ a1 x+ a2x2+ ……+ anxn, it is transformed into another form to facilitate fast calculation of the polynomial for given x. This form is: a0 + x( a1+x(a2+x(a3 + …..+ x(an-1x(an)))))

Assume that the polynomial to be solved is 7 + 4x + x2 + 6x3 + 3x4. The following steps will be follows:

1- The software user will present the coefficients values to the system (7, 4, 1, 6, 3) and also present the value of the variable x, for example x = 2.

2- Let the value of p= 13, q=23, then p*q = n = 299

3- Algorithm 3 calculates Z*n for n =299, this will generates the set Z*n = {1, …,298}; where {13, 23, 26, 39, 46, 52, 65, 69, 78, 91, 92, 104, 115, 117, 130, 138, 143,156, 161,169, 182, 184, 195, 207, 208, 221, 230, 234, 247, 253, 260, 273, 276, 286}  Z*n.

4. Algorithm 5 used to check Quadratic Residue Modulo n, to find the Quadratic residue set ={1, 3, 4, 9, 12, 16, 25, 27, 29,35, 36, 48, 49, 55, 62, 64, 75, 77, 81, 82, 87, 94, 100, 101, 105, 108, 116, 118, 121, 127, 131, 133, 139, 140, 142, 144, 146, 147, 165, 170, 173, 179, 185, 186, 192, 196, 209, 211, 220, 225, 231, 233, 235, 243, 246, 248, 256, 257, 259, 261, 269, 277, 282, 285, 289}, and hence the Quadratic non-residue set ={x| where x  Z*n-Quadratic Residue modulo n}.

5. The value for the public key generated by algorithm 6, with given no quadratic-residue set. calculated by algorithm 5 and n = 299, is y = 5.

6. Algorithm 7 is used to encrypt the last polynomial coefficient (3), after transforming it into a binary representation (110000000000000), with inputs Z*n, n, and y. The output from the algorithm will be a list of integers (each bit is encrypted separately), an integer for each bit.

7. Table 1 shows the result of applying (Mixed-Mult) and (Plus) algorithms (9 and 10).

Table 1 the encrypted result of Horner's method





{165, 145, 80, 101, 257, 144, 105, 257, 105, 3, 3, 133, 55, 231, 220} = R1

E(3 )Ã- 2

E(a4), x


{35, 116, 7, 249, 9, 261, 87, 144, 55, 289, 185, 25, 284, 105, 29}= R2

R2+ E(6)

R1, E(a3)


{62, 209, 108, 80, 275, 277,186, 173, 139, 1,3, 100, 81, 4,211} = R3

R2 Ã-2

R2, x


{189, 196, 256, 249, 45, 261, 87, 144, 55, 289, 185, 289, 284, 105, 29} = R4

R3 + E(1)

R3, E(a2)


{173, 19, 108, 82, 249, 37, 94,256, 139, 185, 131, 95, 146, 142, 75}= R5

R4 Ã- 2

R4, x


{35, 83, 7, 289, 45, 109, 87, 144, 55, 289, 185, 289, 248, 105, 29}= R6

R5 + E(4)



{269, 289, 176, 122, 196, 19, 63, 144, 101, 94, 231, 146, 256, 131, 257} = R7

R6 Ã- 2

R6, x


{189, 281, 221, 289, 45, 109, 136, 144, 55, 289, 185, 289, 248, 105, 29} = R8

R7 + E(7)

R7, E(a0)


The values in the result field of the last row of the table are the encrypted coefficients, these are sent to the producer to decrypt them and send the result back to the user. Another two examples are illustrated in appendix a

7. Conclusions:

Software piracy is a major financial problem for ASP where small enterprises can sell software on a per-usage basis. This paper concerned with suggesting an approach that makes use of function hiding technique to achieve protection of algorithms against revelation, and guarantee charging clients on a per-usage basis. Moreover, we describe a protocol that ensures, under certain conditions, that only licensed users are able to gain the cleartext output of the program, and also provides secrecy and integrity for both ASP and client. These results are applied to a special class of functions for which secure and computationally feasible solution are to be obtained: the key point is to encrypt functions such that they remain executable. We further improve the secrecy of the system by making reverse engineering a difficult task. This was accomplished by using both lexical obfuscation and changing data type obfuscation (changing the data type of variables, from long-term to short-term or short-term to long-term, to protect important variables of program) methods to hide any secret in a program. As future work, improve obfuscations using a WBC, and evaluation of the proposed framework with other programs.