
Analysis and Detection of Metamorphic Viruses

Chapter 1
1.1 Motivation

Metamorphic viruses are a special type of virus that can reconstruct themselves into entirely new offspring that look completely different from the parent. The main purpose of this rebuilding is to avoid detection by antivirus software. Although some well-known metamorphic viruses are currently detectable, it is predicted that in the future we may face similar viruses that are capable of changing their identity while still performing malicious tasks. Our objective in this thesis is to perform an in-depth analysis of metamorphic code and to evaluate some best practices for the detection of metamorphic viruses.

1.2 Outline

This document is divided into five chapters. The first two chapters are introductory and provide basic information about viruses; in Chapter 2 we give some details about virus evolution and how metamorphic viruses came into existence. Chapter 3 contains detailed information about metamorphic viruses: a formal definition, the core components of their architecture, and some explanations from a virus writer. Chapter 4 deals with some of the techniques used by metamorphic viruses and the advantages these techniques give them. Chapter 5 covers the different detection methodologies used to detect metamorphic viruses. It also contains sample code from different metamorphic viruses for feature comparison.

Chapter 2
Computer Virus Introduction
2.1 Introduction

The term "virus" was first formally described by Dr. Fred Cohen in his PhD thesis in 1986 [1]. Different types of computer malware already existed at that time, but the term itself was introduced by Dr. Cohen, which is why many research papers consider him the father of virus research [2]. His formal definition of a virus is:

"A program that can infect other programs by modifying them to include a possibly evolved copy of itself"[1]

Based on this definition, we reproduce the pseudocode of virus V from his research [25].

program virus:=
{1234567;

subroutine infect-executable:=
{loop:file = get-random-executable-file;
if first-line-of-file = 1234567 then goto loop;
prepend virus to file;
}

subroutine do-damage:=
{whatever damage is to be done}

subroutine trigger-pulled:=
{return true if some condition holds}

main-program:=
{infect-executable;
if trigger-pulled then do-damage;
goto next;}

next:}


This is a typical example of a computer virus. We can divide it into three major parts. The first subroutine, infect-executable, looks for an executable file (or any other target file) to infect; it contains a loop that skips files already carrying the infection marker and prepends the virus body to the target file. The second subroutine, do-damage, is the code for which the virus was written. This is called the virus payload, and upon execution it performs some damage to the system. The third subroutine, trigger-pulled, is a trigger for the payload; it could be a condition based on the date, the system, or a file. The main code of the virus attaches it to a target file and, once the trigger condition is met, runs the payload.

If we apply this definition strictly, some modern viruses would not be considered viruses, because there are several types that do no harm themselves. One example is the “Co-Virus”, whose main purpose is to help the original virus by performing tasks that allow it to execute without being detected. Peter Szor has redefined the term [2] as:

“A computer virus is a program that recursively and explicitly copies a possibly evolved version of itself.”

This definition is largely self-explanatory. As the author suggests, a virus recursively and explicitly searches for target files and then infects them with virus code to make possible copies. A virus is a special kind of malware that always requires user action to propagate, for example when the user accesses or executes an infected file. Grimes [26] extends this definition with boot sector infection and other methods, as viruses are not limited to file infection.

2.1.1 Different Types of Malware

In this section we discuss some types of malware that resemble viruses but are not viruses. This section is for information purposes only. Viruses themselves can be of different kinds, and we can categorize them by their activity: boot sector viruses, file infection viruses, or the more advanced macro viruses, which abuse the macro facility that Microsoft Office documents provide for automating tasks. Basically, all viruses follow the same infection process described by Dr. Fred Cohen in the sample virus V. We will describe some advanced code armoring techniques in Section 2.2.

Trojans

Trojans are well-known backdoor malware. They are sometimes not considered viruses, because their main objective is to let an attacker gain access to the target machine without being noticed by the user. Their purpose is not limited to gaining access; they may also execute some malicious code. Their name comes from the Greek legend in which a giant wooden horse was built to carry soldiers inside the walls of Troy. Trojans use the same trick: they display something harmless on screen while doing something else in the background. A Trojan does not infect files or attach its code to other files. Trojans usually come with a joiner utility that helps users embed their own code or application inside the Trojan. Trojans can be used to gain access to infected systems, mount shared drives, or disrupt network traffic through denial-of-service attacks. Some famous examples are Netbus, Subseven, Deep Throat, and Beast.

Some remote administration Trojans have a client side that is used to communicate with the infected computer. The image above shows the client side of the Beast Trojan, which can perform many operations on the target machine once it is connected.

Spyware and Adware

Spyware is a very common problem for today's Internet users. It is used to gather information about users and monitor their activity, with or without their knowledge. Antivirus companies have so far been unable to settle on a policy for detecting and removing spyware, because some well-known companies sell spyware for monitoring user activity and obtain legal support to prevent antivirus products from removing it. Spyware can, without the user's knowledge, send all of the user's information and activity to a monitoring e-mail address. One kind of spyware only captures key presses, recording whatever the user types, whether an e-mail or a password. Depending on the software settings, the record is sent by e-mail or saved to disk.

Adware is slightly different from other malware. It collects information about the user's Internet activity and uses it to display targeted advertisements, or it installs software on the user's system that displays unwanted advertisements.

Rootkits

Rootkits are specially crafted malware whose main objective is to gain administrative access to the target system. They usually contain a virus or script that executes malicious code on the target machine, enables root-level access for the attacker, and hides the process, giving the attacker full access to the machine without being noticed. Detailed coverage of rootkits is beyond the scope of this document. In terms of functionality, they hijack the target system and monitor all system calls. They are now also capable of patching the kernel, so that the attacker gains a higher level of permissions.

Security researchers have demonstrated a technology called “Blue-Pill” [27], which allowed them to create a super rootkit that causes no performance degradation and requires no system restart. It uses the virtualization support inside the processor to run in a virtual machine mode.

Worms

Worms are considered the most advanced form of malware. Unlike viruses, they do not require any user interaction to propagate, but like viruses they can replicate their code by infecting other target files. They can be combined with Trojan horses to execute on the target machine. Unlike viruses, however, they always depend on some specific software for their execution; without it they cannot perform their actions. They try to exploit vulnerabilities in software or the operating system to perform malicious actions. Love Bug is a famous example of a worm; it used Microsoft e-mail software to distribute copies of itself. CodeRed and Nimda are other examples, which used Microsoft protocols to spread and infect other systems.

2.2 Virus Evolution

Viruses have evolved over time, which is why today we are dealing with the most advanced viruses ever seen. Virus writers constantly challenge researchers to detect their creations and build a vaccine for them. In the following sections we describe some of the techniques that viruses use to satisfy the main objective of the virus writer, which is to “Make Virus Completely Undetectable”. We discuss how these techniques were adopted over time and how they led toward metamorphic viruses.

2.2.1 Encryption

Encryption is a principal means of hiding information and has been used for centuries. Virus writers use it in the same way to avoid detection by antivirus software. A decryptor is attached to the main virus code; it decrypts the virus body and then passes control to it.

lea si, Start ; position to decrypt (dynamically set)
mov sp, 0682 ; length of encrypted body (1666 bytes)
Decrypt:
xor [si],si ; decryption key/counter 1
xor [si],sp ; decryption key/counter 2
inc si ; increment one counter
dec sp ; decrement the other
jnz Decrypt ; loop until all bytes are decrypted
Start: ; Encrypted/Decrypted Virus Body

The code above, from [5], is the decryptor of the Cascade virus. In the same article the author suggests four major reasons why a virus writer would use encryption:

1-Prevention against code analysis: Encryption makes it difficult to disassemble the virus and examine the code for instructions that would interest virus researchers, for example calls to INT 26H or to a specific Crypto API. With encryption, the analyst does not get an idea of the author's intentions, because most of the file contents are encrypted and may contain junk code as well.

2-Making disassembling more difficult: Virus writers can use encryption not only to make disassembly difficult but also to make it more time consuming. They can include junk code or wrong instructions so that researchers cannot perform static analysis and get a misleading picture of the code.

3-Making virus tamper proof: As with commercial products, some virus writers do not want their code to be reused by others under their own name or turned into new variants, since someone could decrypt the virus and generate another one by modifying the code. This is also a form of protection against reverse engineering.

4-Avoid detection: The core objective of the virus writer is to evade detection by antivirus software. New techniques have been developed over time; in the following sections we discuss how some of them use encryption.

Most viruses carry the decryptor within their own code, which has helped virus researchers detect them by the signature of the decryptor. This method is not always reliable, however, because it can raise false alarms when legitimate software uses similar routines to decrypt data. Over time, virus writers developed new techniques. In assembly they most often rely on simple XOR operations to decrypt the virus code. In the Cascade code above, for example, XOR is used to decrypt each byte of the virus body until the whole body is decrypted. XOR has two advantages: it is a very simple operation, and XORing a value with the same key twice yields the original value, so the same routine serves for both encryption and decryption, which also makes static code analysis more confusing. Peter Szor has described several strategies that can make encryption and decryption harder to analyze [2, Chapter 7]. According to him:

* Virus writers are not required to store the decryption key inside the virus body. Some advanced viruses, such as RDA.Fighter, generate their decryption key upon execution. This technique is called Random Decryption Algorithm (RDA). Such viruses use a brute-force method to find the key at run time and are very hard to detect.

* The attacker controls the flow of the decryption algorithm: it can run forward or backward, it can contain multiple loops inside a single body, or it can use multiple layers of encryption. The second important factor is the key size, since a longer key makes decryption more difficult. Obfuscation is another factor. Among metamorphic viruses, Simile.D used non-linear encryption: it decrypts the virus body in semi-random order and, most importantly, accesses each encrypted portion of the virus body only once [3].

* Another factor is the strength of the algorithm. The IDEA virus [9], for example, is encrypted with a very strong algorithm and contains several decryptors. The interesting point is that it is quite easy to detect and remove the virus but extremely difficult to repair the infected file, because the second layer of IDEA uses RDA for key generation.

* The Microsoft Crypto API is part of the Windows operating system and can also be used for malicious purposes. Virus writers can call the Crypto API from virus code to encrypt data with a secret key. This is difficult to flag, because other programs such as Internet Explorer also use the API to encrypt transmissions over secure channels.

* Another variation in decryption was demonstrated by the W95/Silcer virus: the first portion of the virus, which is already decrypted, forces the Windows loader to relocate infected program images when they are loaded into memory. For the purpose of decryption, the virus itself supplies the relocation information.

* There are other possibilities. Some viruses use the file name as their decryption key; if the file name is modified, the virus cannot execute, and it may not be possible to recover the file after infection. Another method is to use the decryptor code itself as the decryption key; then, if someone modifies the code while analyzing it or runs the virus under a debugger, decryption fails and an exception is raised.
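The XOR property discussed before this list can be checked with a few lines of Python. This is only a minimal sketch on harmless sample text: the function name `xor_bytes`, the sample string, and the single-byte keys are assumptions made for illustration and are not taken from any of the cited listings.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same single-byte key."""
    return bytes(b ^ key for b in data)


plain = b"harmless sample text"
key = 0x5A  # arbitrary illustrative key

scrambled = xor_bytes(plain, key)     # first pass hides the content
restored = xor_bytes(scrambled, key)  # second pass with the same key undoes it

# The same content looks different under a different key, which is why
# scanners of that era targeted the constant decryptor, not the body.
other = xor_bytes(plain, 0x21)
```

Because applying the routine twice returns the input, one loop serves as both encryptor and decryptor; the body changes with every key, while the loop itself stays fixed and recognizable.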

2.2.2 Oligomorphism

With an encrypted virus it is quite possible to find the decryption mechanism. To counter this, virus writers implemented a new technique: create multiple decryptors and choose among them at random when infecting other files. The major difference between encryption and oligomorphism is that an encrypted virus always uses the same decryptor, while an oligomorphic virus has multiple decryptors and can use any of them. The Whale virus was the first of this kind to use multiple decryptors. W95/Memorial [7] is a famous example of an oligomorphic virus; it uses 96 different decryptors.

mov ebp,00405000h ; select base
mov ecx,0550h ; this many bytes
lea esi,[ebp+0000002E] ; offset of "Start"
add ecx,[ebp+00000029] ; plus this many bytes
mov al,[ebp+0000002D] ; pick the first key
Decrypt:
nop ; junk
nop ; junk
xor [esi],al ; decrypt a byte
inc esi ; next byte
nop ; junk
inc al ; slide the key
dec ecx ; are there any more bytes to decrypt?
jnz Decrypt ; until all bytes are decrypted
jmp Start ; decryption done, execute body
; Data area
Start:
; encrypted/decrypted virus body

Note the sliding key: the key in AL is incremented after every decrypted byte. Another instance of the same virus shows only small variations: the first two instructions are swapped, the base address differs, and the loop is closed with a single loop instruction instead of dec ecx followed by jnz.

Another Variant of W95 Memorial

mov ecx,0550h ; this many bytes
mov ebp,013BC000h ; select base
lea esi,[ebp+0000002E] ; offset of "Start"
add ecx,[ebp+00000029] ; plus this many bytes
mov al,[ebp+0000002D] ; pick the first key
Decrypt:
nop ; junk
nop ; junk
xor [esi],al ; decrypt a byte
inc esi ; next byte
nop ; junk
inc al ; slide the key
loop Decrypt ; until all bytes are decrypted
jmp Start ; Decryption done, execute body
; Data area
Start:
; Encrypted/decrypted virus body

It has been mentioned [2] that a virus is only called oligomorphic if it can mutate its decryptor slightly. Detecting an oligomorphic virus can be difficult: because the decryptor is chosen at random, a detection mechanism based on decryptor signatures will miss the virus unless it knows every variant, which becomes impractical when the number of decryptors is large.
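From the scanner's side, a finite set of decryptors means one pattern per known variant. The following sketch shows that idea under loud assumptions: the two byte strings are hand-written illustrations of the `dec ecx / jnz` and `loop` endings of the listings above, not verified signatures, and dropping every 0x90 byte is a deliberate simplification (a real scanner would disassemble first, since 0x90 can also occur inside operands). The names `strip_nops`, `match_any`, and `SIGNATURES` are hypothetical.

```python
from typing import Iterable, Optional

NOP = 0x90  # single-byte x86 NOP opcode


def strip_nops(code: bytes) -> bytes:
    """Naively drop every 0x90 byte so junk NOPs cannot split a pattern."""
    return bytes(b for b in code if b != NOP)


def match_any(code: bytes, signatures: Iterable[bytes]) -> Optional[int]:
    """Return the index of the first signature found in code, or None."""
    cleaned = strip_nops(code)
    for index, signature in enumerate(signatures):
        if signature in cleaned:
            return index
    return None


# Illustrative patterns only: one entry per known decryptor variant.
SIGNATURES = [
    bytes.fromhex("30 06 46 FE C0 49 75"),  # loop closed by dec ecx / jnz
    bytes.fromhex("30 06 46 FE C0 E2"),     # loop closed by the loop instruction
]

sample_a = bytes.fromhex("90 90 30 06 46 90 FE C0 49 75 F5")
sample_b = bytes.fromhex("90 90 30 06 46 90 FE C0 E2 F6")

hit_a = match_any(sample_a, SIGNATURES)
hit_b = match_any(sample_b, SIGNATURES)
miss = match_any(b"\x00\x01\x02", SIGNATURES)
```

With 96 decryptors the list simply grows to 96 entries; the approach only breaks down when the number of variants becomes too large to enumerate, which is the step that polymorphism takes.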

2.2.3 Polymorphism

The term polymorphism is of Greek origin: "poly" means many and "morphe" means form. Viruses of this type can therefore take multiple forms. They are much more advanced than their ancestors. Like oligomorphic viruses, they rely on mutating their decryptor, but in such a way that a very large number of variations of the same virus can be generated. The core of their operation is the mutation engine. For each infection, the engine generates a completely new instruction sequence for the decryptor. The result is a new virus with exactly the same functionality as its parent but with an entirely different sequence of instructions [28].

Antivirus software is challenged by this method: every time a new file is infected, the virus generates new encryption code and a new decryptor, so scanners that rely on decryptor signatures cannot detect these viruses, because each offspring has a completely different decryptor signature. Research has shown that a mutation engine can generate several million different decryptors for new viruses [28].

The Dark Avenger Mutation Engine (MtE) is a famous example of a polymorphic engine. The following decryptor generated by it is taken from [2].

mov bp,A16C ; This Block initializes BP
; to "Start"-delta
mov cl,03 ; (delta is 0x0D2B in this example)
ror bp,cl
mov cx,bp
mov bp,856E
or bp,740F
mov si,bp
mov bp,3B92
add bp,si
xor bp,cx
sub bp,B10C ; Huh ... finally BP is set, but remains an
; obfuscated pointer to encrypted body
Decrypt:
mov bx,[bp+0D2B] ; pick next word
; (first time at "Start")
add bx,9D64 ; decrypt it
xchg [bp+0D2B],bx ; put decrypted value to place
mov bx,8F31 ; this block increments BP by 2
sub bx,bp
mov bp,8F33
sub bp,bx ; and controls the length of decryption
jnz Decrypt ; are all bytes decrypted?
Start:
; encrypted/decrypted virus body

The idea behind building a reusable engine was that writing a virus was initially difficult and time consuming, so experienced virus writers helped novices by giving them a code mutation engine. With little modification, novices could use the engine within their own virus code to perform the same operations.

Depending on the virus type and the engine's capabilities, the engine can extend the virus's functionality; several viruses use the Microsoft CryptoAPI in their polymorphic operations. Marburg is another famous polymorphic virus, with an entirely different file infection mechanism. Until then, one could assume that the infection method of a polymorphic virus stays the same and only the decryptor changes, but Marburg introduced new methods: the key length used for encryption can differ, and each infected file uses a different encryption mechanism [8].

Start:
; Encrypted/Decrypted Virus body is placed here
Routine-6:
dec esi ; decrement loop counter
ret
Routine-3:
mov esi,439FE661h ; set loop counter in ESI
ret
Routine-4:
xor byte ptr [edi],6F ; decrypt with a constant byte
ret
Routine-5:
add edi,0001h ; point to next byte to decrypt
ret

call Routine-1 ; set EDI to "Start"
call Routine-3 ; set loop counter
Decrypt:
call Routine-4 ; decrypt
call Routine-5 ; get next
call Routine-6 ; decrement loop register
cmp esi,439FD271h ; is everything decrypted?
jnz Decrypt ; not yet, continue to decrypt
jmp Start ; jump to decrypted start
Routine-1:
call Routine-2 ; Call to POP trick!
Routine-2:
pop edi
sub edi,143Ah ; EDI points to "Start"
ret

There are examples of other viruses which shows that

2.2.4 Metamorphism

After all this evolution, we are now dealing with one of the most advanced forms of virus. Polymorphic viruses were challenging to detect and remove, but it was only a matter of time before researchers built solutions against them. Virus writers then worked on something more ambitious: a virus able to rebuild itself with the same functionality but an entirely different appearance from its parent. This idea was first implemented in W32/Apparition: if it finds a compiler on a machine, it tries to rebuild itself into a completely new shape. The following code, taken from [2], shows two different variants of W95/Regswap. This virus was the first of its kind to implement metamorphism by swapping registers.

5A pop edx
BF04000000 mov edi,0004h
8BF5 mov esi,ebp
B80C000000 mov eax,000Ch
81C288000000 add edx,0088h
8B1A mov ebx,[edx]
899C8618110000 mov [esi+eax*4+00001118],ebx

58 pop eax
BB04000000 mov ebx,0004h
8BD5 mov edx,ebp
BF0C000000 mov edi,000Ch
81C088000000 add eax,0088h
8B30 mov esi,[eax]
89B4BA18110000 mov [edx+edi*4+00001118],esi
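The two Regswap fragments differ only in the bytes that encode register choices, so a scanner can cover both with one pattern that treats those bytes as wildcards. The sketch below is an illustration of that idea using the opcode bytes printed in the listing; the helper names `parse_pattern` and `find`, the `??` wildcard notation, and the choice of which positions to wildcard are assumptions made here, not a signature taken from [2].

```python
from typing import List, Optional, Sequence

Pattern = List[Optional[int]]  # None stands for "any byte"


def parse_pattern(text: str) -> Pattern:
    """Turn 'BF ?? 04' into [0xBF, None, 0x04]."""
    return [None if token == "??" else int(token, 16) for token in text.split()]


def find(pattern: Sequence[Optional[int]], data: bytes) -> int:
    """Return the first offset where pattern matches data, or -1."""
    size = len(pattern)
    for start in range(len(data) - size + 1):
        if all(p is None or p == data[start + i] for i, p in enumerate(pattern)):
            return start
    return -1


# Opcode bytes copied from the two listings above.
variant_a = bytes.fromhex("5ABF040000008BF5B80C00000081C2880000008B1A899C8618110000")
variant_b = bytes.fromhex("58BB040000008BD5BF0C00000081C0880000008B3089B4BA18110000")

# Register-dependent bytes are replaced by ??; constants and fixed opcodes stay.
signature = parse_pattern(
    "?? ?? 04 00 00 00 8B ?? ?? 0C 00 00 00 "
    "81 ?? 88 00 00 00 8B ?? 89 ?? ?? 18 11 00 00"
)

offset_a = find(signature, variant_a)
offset_b = find(signature, b"\x90\x90" + variant_b)
```

An exact byte comparison of the two variants fails, while the wildcard pattern matches both, which is why register swapping alone proved to be a weak form of metamorphism.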

So far no major incident caused by metamorphism has been reported, since ordinary computers do not contain utilities such as compilers or scripting support with which a virus could rebuild itself. The situation could be very dangerous for Linux machines, however, where scripting languages and compilers are enabled by default. Upcoming versions of Microsoft Windows also support .NET and MSIL, which make it very easy to generate such viruses; MSIL/Gastropod is a well-known example of a metamorphic virus. In the next chapter we describe the main architecture of metamorphic viruses.

Chapter 3
Metamorphic Virus Architecture

The idea behind metamorphism comes from biology: parents mutate and produce offspring that look entirely different from them but perform the same actions. Virus writers adopted this idea and implemented it in the form of metamorphic viruses. The power of any virus lies in its ability to bypass the antivirus scanner and perform its actions. Constants in the virus body, specific register allocation, code patterns, and heuristic scanning are some of the common ways to detect a virus.

Metamorphic viruses are capable of transforming their code from one generation to the next: their syntax changes, but their semantics remain the same throughout generations. Polymorphic viruses were difficult to detect, but their main weakness was the decryption mechanism. Once researchers worked out the decryption method and added it as a signature to antivirus products, they were able to detect every generation of a polymorphic virus. With metamorphic viruses this approach fails, because the syntax of the code and the mechanism of operation differ entirely between generations. They are regarded as shape shifters [2], because each generation is entirely different from the others.

Metamorphic engines are mostly buggy; it may be our good luck that no perfect metamorphic engine exists so far. It has been reported that metamorphism is also used as a means of software protection, in the same way that viruses use it to protect themselves. Metamorphic viruses can be stand-alone, in which case they generate new copies by themselves and perform their actions on the target system, or they can take help from the surrounding environment, for example by downloading a plug-in from the Internet to generate their new copies.

Metamorphic viruses are capable of changing the arrangement of their instructions. This gives them the ability to generate new, undetected variants. For example, if a virus contains n subroutines, it can generate n! different arrangements. The BadBoy virus has 8 subroutines and can rearrange them, so it can generate 8! = 40320 different forms. This number grows as the number of subroutines inside the virus body increases.
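The factorial growth is easy to verify numerically. The short sketch below only checks the arithmetic; the function name `arrangement_count` and the placeholder subroutine names are assumptions made for illustration.

```python
import math
from itertools import permutations


def arrangement_count(subroutines: int) -> int:
    """Number of distinct orderings of the given number of subroutines (n!)."""
    return math.factorial(subroutines)


badboy_forms = arrangement_count(8)

# Enumerating the orderings explicitly for a tiny case confirms the formula.
small_case = list(permutations(["s1", "s2", "s3"]))

# Growth for a few sizes, to show why per-layout signatures do not scale.
growth = {n: arrangement_count(n) for n in (4, 8, 12)}
```

Going from 8 to 12 subroutines raises the count from 40320 to 479001600, so a scanner cannot keep one signature per layout and has to look at the subroutines themselves instead.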

The image above shows the code modules of the Badboy virus within a file. The virus only needs to take care of the entry point; regardless of where the remaining subroutines are located, they are reached through jump instructions spread throughout the code.

Zperm is another example of a metamorphic virus. The code sample above is from the Zperm virus and shows how it rearranges its code.

3.1 Formal Definition

This formal definition is presented in [13]. Let φ_P(d,p) denote the function computed by a program P in the current environment (d,p), where p represents the programs stored on the computer and d represents the data processed. D(d,p) and S(p) are two recursive functions, T(d,p) is the trigger (also called the injury condition), and I(d,p) is the infection condition.

Under this definition, a pair (v, v') of recursive functions is a pair of metamorphic viruses if all the conditions X(v, v') are satisfied.

Here T(d,p), I(d,p), and S(d,p) are entirely different from T'(d,p), I'(d,p), and S'(d,p). On that basis we can say that v and v' are metamorphic viruses that perform the same actions. Polymorphic viruses share their kernel, whereas in metamorphic viruses each form has its own kernel.

3.2 Core Architecture

In this section we discuss the major components of a metamorphic virus. Several other breakdowns have been proposed, but the architecture presented in [10] is considered the best. Its authors divide metamorphic viruses into two categories, closed-world and open-world. Open-world viruses are those that interact with the executing environment and perform actions such as downloading spyware. Here we describe the functional architecture of closed-world viruses, most of which perform binary transformation.

3.2.1 Locate Own Code

The virus must be able to locate its own code, whether inside an infected file or in its own body, each time it transforms into a new form or infects a new file. Metamorphic viruses that infect other files and use them as carriers must be capable of locating their code inside the infected file. Mostly they use a predefined location for their startup code; this location remains constant throughout generations. Only in a few cases does the engine use dynamic locations.

3.2.2 Decode

Once the virus code has been located, the metamorphic engine tries to obtain blueprint information about how to transform it. One drawback of metamorphic viruses is that they carry within themselves the description of how they are transformed. This information is critical, because it is encoded again inside the body of the new virus. This unit can also retrieve flags, bit-vectors, markers, and hints that help in building new viruses. Because the virus engine itself needs this information, the virus writer cannot obfuscate this area.

3.2.3 Analyze

Once the core information is gathered, further information is needed for the metamorphic virus to execute properly; without it the transformation cannot be performed. The metamorphic engine must have information about register liveness. If this is not available from the Decode phase, the engine must be capable of constructing it via "def-use" analysis. A control flow graph is also required by the transformation phase, because it helps in rewriting the logic and flow of the program.
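Register liveness is a standard program-analysis notion, and the same computation is what an analyst's tool performs when reasoning about transformed code. The sketch below runs a backward def-use pass over straight-line code. The instruction encoding as (text, defined registers, used registers) triples, the function name `live_before_each`, and the assumption that eax is needed after the sequence are all illustrative choices; the stack pointer and flags are ignored for brevity.

```python
from typing import List, Set, Tuple

# (instruction text, registers it defines, registers it uses)
Instr = Tuple[str, Set[str], Set[str]]


def live_before_each(code: List[Instr], live_out: Set[str]) -> List[Set[str]]:
    """Backward pass over straight-line code.

    For every instruction: live_in = (live_out - defs) | uses.
    """
    live = set(live_out)
    result: List[Set[str]] = []
    for _text, defs, uses in reversed(code):
        live = (live - defs) | uses
        result.append(set(live))
    result.reverse()
    return result


sequence: List[Instr] = [
    ("push eax", set(), {"eax"}),
    ("mov eax, ecx", {"eax"}, {"ecx"}),
    ("mov [ebp+8], eax", set(), {"ebp", "eax"}),
    ("pop eax", {"eax"}, set()),
]

# Assumption for the example: the surrounding code still needs eax afterwards.
liveness = live_before_each(sequence, live_out={"eax"})
```

In this example eax is not live just before `mov eax, ecx`, because its old value has been saved by the push and is restored by the pop; that is exactly the kind of fact a transformation needs before it may use a register as scratch space.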

A control flow graph is required if the malware generates code that can shrink or grow across generations, since jump targets must then be recomputed; it is also needed to process the control flow logic that is later transformed into code. In the following code, the engine has worked out what each instruction is required to do and rewrites it as a different but equivalent sequence of instructions.


mov [esi+4], 9
mov [esi+4], 6
add [esi+4], 3

mov [ebp+8], ecx
push eax
mov eax, ecx
mov [ebp+8], eax
pop eax

push 4
mov eax, 4
push eax

push eax
push eax
mov eax, 2Bh
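Reading each group above as a short instruction followed by a longer equivalent sequence (an interpretation, since the original side-by-side layout did not survive), the equivalence can be checked by executing both forms on a toy machine. The sketch below is such a checker for analysis purposes only: the tuple encoding of instructions, the `run` function, and the treatment of memory operands as named cells are assumptions, and only mov, add, and sub on 32-bit values are modelled.

```python
from typing import Dict, List, Optional, Tuple, Union

Operand = Union[int, str]          # immediate value or name of a register/cell
Instr = Tuple[str, str, Operand]   # (operation, destination, source)
MASK = 0xFFFFFFFF                  # keep results within 32 bits


def run(program: List[Instr], state: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Execute a toy program and return the final register/cell values."""
    values = dict(state or {})
    for op, dst, src in program:
        operand = values.get(src, 0) if isinstance(src, str) else src
        if op == "mov":
            values[dst] = operand & MASK
        elif op == "add":
            values[dst] = (values.get(dst, 0) + operand) & MASK
        elif op == "sub":
            values[dst] = (values.get(dst, 0) - operand) & MASK
        else:
            raise ValueError(f"unsupported operation: {op}")
    return values


short_form = [("mov", "[esi+4]", 9)]
long_form = [("mov", "[esi+4]", 6), ("add", "[esi+4]", 3)]

same_result = run(short_form) == run(long_form)
```

The two byte sequences share no signature, yet they leave the machine in the same state; comparing behaviour rather than bytes is the basic idea behind emulation-based detection.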

3.2.4 Transform

This unit is the most important part of the virus, as it generates the new virus; most of the virus logic resides here. It generates new instruction blocks that are semantically identical to the original code but differ in syntax. Some obfuscation is also performed here: the metamorphic engine renames registers, inserts NOP and garbage instructions, and reorders the execution of blocks.

The following code blocks are taken from the examples in [10].


mov eax, 10
mov eax, 5
add eax,5

mov eax, 5
sub eax, 10
mov eax, 1
add eax, 2
sub eax, 8

mov eax, 5
add eax, 5
mov eax, 10

cmp eax, 5
ja L1
cmp eax, 2
je L2
cmp eax, 5
jb L3
L1 : mov ebx, 3
jmp L4
L2 : mov ebx, 10
jmp L4
L3 : mov ebx, 10
jmp L4

cmp eax, 5
ja L1
cmp eax, 5
jb L2
L1 : mov ebx, 3
jmp L3
L2 : mov ebx, 10
jmp L3
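One way a detector can cope with such rewrites is to normalize code before comparing it, so that different spellings of the same computation collapse to one canonical form. The sketch below folds a constant load followed by constant additions or subtractions into a single load. It is a simplified illustration rather than the method of [10]: the tuple encoding and the function name `fold_constants` are assumptions, and processor flags are ignored, which a real normalizer could not do.

```python
from typing import List, Tuple, Union

Operand = Union[int, str]
Instr = Tuple[str, str, Operand]   # (operation, destination, source)
MASK = 0xFFFFFFFF


def fold_constants(program: List[Instr]) -> List[Instr]:
    """Merge 'mov r, a' followed by 'add/sub r, b' into a single 'mov r, c'."""
    out: List[Instr] = []
    for op, dst, src in program:
        previous = out[-1] if out else None
        if (
            previous is not None
            and op in ("add", "sub")
            and isinstance(src, int)
            and previous[0] == "mov"
            and previous[1] == dst
            and isinstance(previous[2], int)
        ):
            base = previous[2]
            value = base + src if op == "add" else base - src
            out[-1] = ("mov", dst, value & MASK)
        else:
            out.append((op, dst, src))
    return out


form_1 = [("mov", "eax", 5), ("add", "eax", 5)]
form_2 = [("mov", "eax", 5), ("sub", "eax", 10)]
form_3 = [("mov", "eax", 1), ("add", "eax", 2), ("sub", "eax", 8)]

normal_1 = fold_constants(form_1)
normal_2 = fold_constants(form_2)
normal_3 = fold_constants(form_3)
```

After folding, the second and third forms become the same single instruction, so one signature written against the normalized code covers both.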


3.2.5 Attach

The Attach unit is only present in viruses that infect files and use them as a means of replication. The Transform unit transforms not only the virus's own code but also the code of the target file, where it sets an entry point to the main routine of the virus. During attachment the code is also shuffled inside the file, with jump instructions connecting the different subroutines of the virus code.

3.3 Architecture from Underground

In [30] the virus author Benny, a member of the well-known virus writing group 29A, explains what a good metamorphic engine should contain. According to him, it should have the following components.

1. Internal disassembler: This component disassembles the virus code instruction by instruction.

2. Opcode shrinker: This component shrinks the code so that two or more instructions are combined into one.

3. Opcode expander: This component expands one instruction into two or more.

4. Opcode swapper: This component swaps any two instructions.

5. Relocator/recalculator: This component recalculates all relocation information and the flow of the program, typically every pointer and jump instruction and its target.

6. Garbager: This component inserts garbage instructions into the virus code.

Chapter 4
Metamorphic Virus Obfuscation Techniques

In this chapter we evaluate some of the techniques used by metamorphic engines to transform viruses and obfuscate them. All of these techniques are used by metamorphic viruses to avoid detection by antivirus software.

In [32] a comparison of different metamorphic viruses is given; the following table shows that comparison.






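(Table not recoverable: the virus names and the marks in each cell were lost. Surviving row labels: instruction substitution, instruction permutation, variable substitution, dead code insertion, changing the control flow.)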




This table shows which obfuscation techniques are used by which metamorphic virus.

4.1 Instruction Substitution

Metamorphic viruses keep the semantics of their parent while giving their executable code a completely new appearance. With this technique the engine replaces an instruction with a substitute that has the same effect. The technique has a drawback: the execution flow of the program does not change, so the virus can still be detected through behavioral scanning. As a result of this process the virus code expands or shrinks, depending on how many instructions are added to or removed from the parent code.

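Original code (example 1):

push eax

mov [edi], 0x04

jmp label

Transformed code (example 1):

push eax

push ecx

mov ecx, 0x04

mov [edi], ecx

pop ecx

jmp label

Original code (example 2):

push 0x04

mov eax, 0x09

jmp label

Transformed code (example 2):

mov eax, 0x04

push eax

mov eax, 0x09

jmp label

Original code (example 3):

mov eax, 0x04

push eax

jmp label

Transformed code (example 3):

mov eax, 0x04

push eax

mov eax, 0x09

jmp label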

This example is taken from [31] and shows code from W32.Evol, a metamorphic virus. As the example shows, the virus engine substitutes instructions with other, equivalent instructions. Static analysis of each variant yields the same result, but it becomes difficult for a virus scanner to keep a signature for every generated offspring.
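
From the detection side, a substitution like the first pair of listings can be undone by rewriting the expanded pattern back to its short form, in the spirit of the term rewriting approach of [31]. The following is only a minimal sketch in Python: the single rule, the textual instruction format and the name normalize are assumptions made for illustration, and the rule is only valid when the temporary register differs from the address register.

```python
import re

# Hypothetical rewrite rule modeled on the first pair of listings:
#   push TMP ; mov TMP, IMM ; mov [DST], TMP ; pop TMP   ->   mov [DST], IMM
# Assumption: TMP and DST are different registers.
RULE = re.compile(
    r"^push (?P<tmp>\w+)\n"
    r"mov (?P=tmp), (?P<imm>\w+)\n"
    r"mov \[(?P<dst>\w+)\], (?P=tmp)\n"
    r"pop (?P=tmp)\n",
    re.MULTILINE,
)

def normalize(instructions):
    """Rewrite every match of RULE back to its single-instruction form."""
    text = "\n".join(instructions) + "\n"
    text = RULE.sub(lambda m: f"mov [{m['dst']}], {m['imm']}\n", text)
    return text.splitlines()

original = ["push eax", "mov [edi], 0x04", "jmp label"]
mutated = ["push eax", "push ecx", "mov ecx, 0x04",
           "mov [edi], ecx", "pop ecx", "jmp label"]

# Both variants should reduce to the same normal form.
print(normalize(mutated) == normalize(original))
```

A real normalizer would need a rule for every substitution the engine can perform, which is why this approach fits engines with a small, known rule set.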

4.2 Code Mutation

Mutation is another feature of virus code: either the code itself or the target file can mutate and produce something new. When a virus mutates its own code it can keep its functionality, but when it mutates the infected file there is a chance that the file will no longer be usable after the infection is removed. To mutate, the metamorphic engine runs a code morphing routine that replaces the code inside the file. Mutations can, for example, be built from arithmetic or Boolean operations.

mov edi, 2580774443

mov ebx, 467750807

sub ebx, 1745609157

sub edi, 150468176

xor ebx, 875205167

push edi

xor edi, 3761393434

push ebx

push edi

mov ebx, 535699961

mov edx, 1490897411

xor ebx, 2402657826

mov ecx, 3802877865

xor edx, 3743593982

add ecx, 2386458904

push ebx

push edx

push ecx

The above example is taken from [29] and shows mutating code from the Win9x.ZMorph.A virus. Another example of mutation is the Win95.Bistro virus, which also mutates its host file [20].
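
Because the arithmetic and Boolean operations in such a fragment only involve constants, an analyst can fold them and see which values actually reach the stack. The sketch below is a hedged illustration in Python: the function name pushed_values and the small instruction subset (mov, add, sub, xor, push with immediate operands) are assumptions, and only the first part of the listing above is evaluated.

```python
MASK = 0xFFFFFFFF  # 32-bit registers wrap around

def pushed_values(listing):
    """Evaluate mov/add/sub/xor on constants and return the pushed values."""
    ops = {
        "mov": lambda old, imm: imm,
        "add": lambda old, imm: old + imm,
        "sub": lambda old, imm: old - imm,
        "xor": lambda old, imm: old ^ imm,
    }
    regs, stack = {}, []
    for line in listing.strip().splitlines():
        mnemonic, _, operands = line.strip().partition(" ")
        if mnemonic == "push":
            stack.append(regs[operands.strip()])
        else:
            reg, imm = (part.strip() for part in operands.split(","))
            regs[reg] = ops[mnemonic](regs.get(reg, 0), int(imm)) & MASK
    return stack

ZMORPH_FRAGMENT = """
mov edi, 2580774443
mov ebx, 467750807
sub ebx, 1745609157
sub edi, 150468176
xor ebx, 875205167
push edi
xor edi, 3761393434
push ebx
push edi
"""

for value in pushed_values(ZMORPH_FRAGMENT):
    print(f"{value:08X}")
```

Two generations that push the same constants after folding are likely to be mutations of the same code, even though their instruction bytes differ.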

4.3 Permutation

Permutation is another way of achieving obfuscation in metamorphic viruses. W95.Ghost and W95.Zperm are two examples of viruses that use permutation for obfuscation. With this method the virus code remains constant and only the subroutines are reordered: as already explained, if a virus contains n subroutines, it can have n! permutations. In [16] it is mentioned that with this method the code is divided into frames, and those frames are then connected in random order through jump instructions. In [17] it is mentioned that this method has a drawback: the virus can still be detected by signature. The author also suggests that the opcodes be arranged so that they match the opcode region, and that permutation be performed on each sequence of code before the sequences are aligned.

The same example can be used for permutation to show how the virus subroutines are arranged. A virus can make detection more difficult by inserting junk instructions between those frames.
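
To give a feeling for how quickly the number of layouts grows, the n! count can be checked with a few lines of Python. This is only an illustration; the subroutine names are hypothetical placeholders.

```python
from itertools import permutations
from math import factorial

def layouts(subroutines):
    """Return every possible ordering of the given subroutine names."""
    return list(permutations(subroutines))

# Hypothetical subroutine names, used only for illustration.
names = ["sub_a", "sub_b", "sub_c"]
for order in layouts(names):
    print(" -> ".join(order))

print(len(layouts(names)), factorial(len(names)))
print(factorial(8))  # number of layouts for a body with 8 subroutines
```

A scanner that stores one signature per layout therefore does not scale, which is why the detection methods discussed later try to undo the permutation instead.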

4.4 Substitution of Variables

Similar to instruction substitution, a virus can also perform variable substitution. W95.Regswap performs this type of substitution [14]. For example, mov eax,0 could be replaced by mov ecx,0, with every later use of eax renamed accordingly.
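
One way an analyst can neutralize register swapping is to rename registers in order of first appearance, so that two generations map to the same canonical text. The following Python sketch assumes textual disassembly and only handles the 32-bit general purpose registers listed in it; sub-registers such as al or dh are not tracked, and the name canonical_registers is hypothetical.

```python
import re

REGISTERS = r"\b(eax|ebx|ecx|edx|esi|edi|ebp)\b"

def canonical_registers(instructions):
    """Rename registers to R0, R1, ... in order of first appearance."""
    mapping = {}

    def rename(match):
        reg = match.group(1)
        if reg not in mapping:
            mapping[reg] = f"R{len(mapping)}"
        return mapping[reg]

    return [re.sub(REGISTERS, rename, line) for line in instructions]

# Two hypothetical generations that differ only in register choice.
gen_a = ["mov eax,0", "add eax,0088h", "mov ebx,[eax]"]
gen_b = ["mov ecx,0", "add ecx,0088h", "mov esi,[ecx]"]
print(canonical_registers(gen_a))
print(canonical_registers(gen_a) == canonical_registers(gen_b))
```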

4.5 Garbage Insertion

This is considered the most advanced method of obfuscation; it generates junk code to confuse the analyst and the debugger. With this method virus writers camouflage their code so that the scanner is unable to locate the hexadecimal signatures of the virus in it. Inserting garbage instructions does not change the functionality of the virus. Several kinds of code are considered junk, such as code that does not compute anything or code that just exchanges registers again and again. Win32/Evol is considered the first metamorphic virus capable of inserting junk code into its next generation. An earlier generation contains the following code; this sample is taken from [20].

C7060F000055 mov [esi], 5500000Fh

C746048BEC5151 mov [esi+0004], 5151EC8Bh

Once it has transformed into a new generation we find:

BF0F000055 mov edi, 5500000Fh

893E mov [esi], edi

5F pop edi ; garbage

52 push edx ; garbage

B640 mov dh, 40 ; garbage

BA8BEC5151 mov edx, 5151EC8Bh

53 push ebx ; garbage

8BDA mov ebx, edx

895E04 mov [esi+0004], ebx

Static analysis shows that both fragments perform the same operation, but the second is more difficult to understand and makes it harder for an analyst to derive a specific signature for the virus. W95/Bistro is another example of junk code insertion; rather than generating junk code, it just inserts nop instructions inside subroutines. Through these techniques it bypasses the emulation scanning method.
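
The simplest kinds of junk mentioned above (nop instructions, code that computes nothing, repeated register exchanges) can be stripped with a peephole filter before a signature is compared. The Python sketch below is a deliberately small illustration with a hypothetical function name; it does not perform the liveness analysis that would be needed to remove the junk in the Evol listing above.

```python
import re

def strip_simple_junk(instructions):
    """Drop nop, 'mov r, r', and back-to-back identical xchg pairs."""
    cleaned = []
    for ins in instructions:
        ins = ins.strip()
        if ins == "nop":
            continue
        same_reg = re.fullmatch(r"mov (\w+), ?(\w+)", ins)
        if same_reg and same_reg.group(1) == same_reg.group(2):
            continue  # moving a register onto itself changes nothing
        if ins.startswith("xchg ") and cleaned and cleaned[-1] == ins:
            cleaned.pop()  # two identical exchanges cancel each other
            continue
        cleaned.append(ins)
    return cleaned

sample = ["nop", "mov eax, eax", "xchg eax, ebx", "xchg eax, ebx",
          "mov [esi], edi", "nop"]
print(strip_simple_junk(sample))
```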

Unconditional jump instructions are also considered garbage. The following example is taken from [29]:

Original code:

pop edx

mov edi,0004h

mov esi,ebp

mov eax,000Ch

Code after jump insertion:

pop edx

jmp label1

label2:

jmp label3

label1:

mov edi,0004h

mov esi,ebp

jmp label2

label3:

mov eax,000Ch
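
Inserted unconditional jumps can be removed again by following them and collecting the instructions in execution order. The sketch below shows this in Python under stated assumptions: the code has already been split into labeled blocks, only unconditional jmp instructions occur, and the placement of the labels is inferred from the jump targets of the example rather than taken from the source.

```python
def linearize(blocks, entry):
    """Follow unconditional jumps and return instructions in execution order.

    `blocks` maps a label to its instruction list; a block either ends in
    'jmp <label>' or ends the fragment.
    """
    order, label, seen = [], entry, set()
    while label is not None and label not in seen:
        seen.add(label)
        next_label = None
        for ins in blocks[label]:
            if ins.startswith("jmp "):
                next_label = ins.split()[1]
                break
            order.append(ins)
        label = next_label
    return order

# Label placement is an assumption made for this illustration.
blocks = {
    "start":  ["pop edx", "jmp label1"],
    "label2": ["jmp label3"],
    "label1": ["mov edi,0004h", "mov esi,ebp", "jmp label2"],
    "label3": ["mov eax,000Ch"],
}
print(linearize(blocks, "start"))
```

The `seen` set stops the walk if the jumps form a cycle, so the function terminates on malformed input as well.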

4.6 Control Flow Modification

Control flow modification is another important feature of metamorphic viruses. As with permutation, they can rearrange their subroutines, but they are also capable of changing their flow of execution, and they implement different methods for this purpose. Win95/RegSwap was the first of its kind to use different registers in successive generations [16]. The following code, taken from [16], shows two different generations:

First generation:

5A pop edx

BF04000000 mov edi,0004h

8BF5 mov esi,ebp

B80C000000 mov eax,000Ch

81C288000000 add edx,0088h

8B1A mov ebx,[edx]

899C8618110000 mov [esi+eax*4+00001118],ebx

Second generation:

58 pop eax

BB04000000 mov ebx,0004h

8BD5 mov edx,ebp

BF0C000000 mov edi,000Ch

81C088000000 add eax,0088h

8B30 mov esi,[eax]

89B4BA18110000 mov [edx+edi*4+00001118],esi

Another form of control flow modification is jump insertion. In [17] the example of W95/Zperm is given, which is capable of inserting jump instructions in its next generations.

Research has shown that this virus does not have a constant body, and it is virtually impossible to detect it using a normal search string.

Chapter 4
Metamorphic Code Detection

In the following section we evaluate some methodologies for detecting metamorphic viruses, although so far no single method has been defined that detects them all. Because of their architecture and code transformation features we cannot predict the exact specification of a virus, but we can detect metamorphic viruses through a combination of the following methodologies.

4.1 Weakness in Architecture

So far we have been lucky not to face any real problem with metamorphic viruses: the core of their operation lies in the engine, and no perfect metamorphic engine has yet been released. There are several limitations in the architecture of a metamorphic virus. The core objective of the virus writer is to avoid detection and, later, to avoid analysis of the binary. In [33] the authors describe some of the techniques used by metamorphic virus writers to avoid detection, such as:

* Attacking disassembly: The virus prevents its code from being disassembled, or confuses the disassembler by inserting garbage code.

* Attacking procedure abstraction: This method hides the boundaries of procedures from the analyst. It is performed using jump and push instructions.

* Attacking control flow graph generation: Through dynamic analysis of the binary, graphs are generated by detecting jump instructions throughout the code. Unnecessary jump instructions confuse the CFG generating algorithm, which results in unnecessary edges.

* Attacking data flow analysis: Like the previous attack, this one tries to generate unnecessary edges. Usually each edge carries information about data sets and variables. To attack this information the virus tries to store data somewhere outside the boundary of the program.

* Attacking property verification: Some antivirus products detect viruses through API calls: if certain specific API calls are made, there is a probability that a virus is making them. This method can be defeated by using the return address of API calls instead of calling the methods directly. This technique is used in W95/Evol.

Based on these methodologies the authors suggest that we can attack a metamorphic virus by performing the same actions that it performs; this lets us find the weak spots in its architecture [16]. In another study the authors suggest that, with this method, virus writers and researchers face the same theoretical limits.

As already explained in the architecture of metamorphic code, a metamorphic virus needs to decode and analyze its own code before it transforms into a new generation. This is the weakest point of its architecture, and we can exploit it. Regardless of how much obfuscation it uses, it still needs to decode and analyze the blueprint of the virus code somewhere in memory or in a file, and it has to write its transformations in the same sequence as its parent. In [15] it is suggested that if we build some sort of reverse metamorphic software, capable of performing the same code analysis and reversing the transformations, we can detect the virus code from the virus itself. However, this idea is still in a theoretical phase, as virus researchers need multiple generations of the same virus to detect the transformations and the static data.

4.2 Detection Methodologies

In the following section we describe some of the researched methodologies for detecting metamorphic code.

4.2.1 Signature Based Detection

This is a very old methodology that has been used since the first generation of antivirus software. There are different ways of generating signatures for a virus. Most of the time it is used with string searching; for example, the scanner looks for a specific hexadecimal string within the infected file.

There is a proposed method of wildcard and half-byte scanning [16], which is also a form of signature-based detection. Take the example of the W95/Regswap virus, which swaps registers in successive generations.

First generation:

BE04000000 mov esi,000000004

8BDD mov ebx,ebp

B90C000000 mov ecx,00000000C

81C088000000 add eax,000000088

8B38 mov edi,[eax]

89BC8B18110000 mov [ebx][ecx]*4[00001118],edi

2BC6 sub eax,esi

49 dec ecx

Second generation:

BB04000000 mov ebx,000000004

8BCD mov ecx,ebp

BF0C000000 mov edi,00000000C

81C088000000 add eax,000000088

8B30 mov esi,[eax]

89B4B920110000 mov [ecx][edi]*4[00001120],esi

2BC3 sub eax,ebx

4F dec edi
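
Comparing the two generations half-byte by half-byte shows which hex digits survive the register swap. The Python sketch below derives a wildcard signature from the two listings and matches it against a byte string given as hex text. The function names and the use of '?' for a wildcard half-byte are assumptions for illustration, not the notation of any particular scanner.

```python
def nibble_signature(gen_a, gen_b):
    """Keep hex digits shared by both generations; mark the rest with '?'."""
    if len(gen_a) != len(gen_b):
        raise ValueError("this sketch only compares equal-length byte strings")
    return "".join(a if a == b else "?" for a, b in zip(gen_a, gen_b))

def matches(signature, data_hex):
    """Slide the signature over data_hex one byte (two hex digits) at a time."""
    for start in range(0, len(data_hex) - len(signature) + 1, 2):
        window = data_hex[start:start + len(signature)]
        if all(s in ("?", w) for s, w in zip(signature, window)):
            return True
    return False

# Opcode bytes copied from the two generations listed above.
GEN1 = "".join(["BE04000000", "8BDD", "B90C000000", "81C088000000",
                "8B38", "89BC8B18110000", "2BC6", "49"])
GEN2 = "".join(["BB04000000", "8BCD", "BF0C000000", "81C088000000",
                "8B30", "89B4B920110000", "2BC3", "4F"])

signature = nibble_signature(GEN1, GEN2)
print(signature)
print(matches(signature, GEN1), matches(signature, GEN2))
```

A signature built from only two samples may still be too specific or too loose; more generations would be needed before relying on it.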

The two generations above are taken from [16]. If we carefully analyze both code fragments we can find some opcodes that stay constant across generations. We can use those constants as a signature to detect the above virus.

Metamorphic Engine Signature

This method has been proposed theoretically to generate a signature based on the metamorphic engine [12]. The methodology targets a specific engine and checks for instruction substitution. It is only effective for viruses that have a finite set of instructions for transforming their instruction semantics into new generations. It uses the concepts of likelihood and engine-specific scoring, such as the probability and likelihood that a piece of code was generated by the engine. In this detection the software tries to gather information about the instructions that are going to be transformed, or makes assumptions about the specific code they will be transformed into.

The above image shows the probability of friendliness: the more information we have about the code, the more accurately we can detect the virus.

For W95/Evol there is a known set of instruction substitution rules, that is, which instructions can be transformed into which. The author uses the same research methodology to determine the possible offspring.

Limitations

As already described in the architecture, different metamorphic malware uses different techniques to change its shape. We cannot predict any specific string to detect metamorphic code. Due to permutation and obfuscation the inner code is not constant, and these viruses do not even have a constant body. Furthermore, the latest viruses are capable of changing their functionality, so we still cannot have a final signature that detects a single virus and all of its upcoming generations.

4.2.2 Heuristics Detections

Heuristic scanners are very useful in detecting macro viruses. They are also used to detect future viruses or virus families. This type of detection usually rests on assumptions, because the scanner assumes that certain traits indicate a possible infection. In [2] the author describes some examples of heuristic checks by which we can detect whether there is an infection:

* Code Entry point is somewhere else other than the starting point.

* Size of Image is incorrect

* Junk code is present or white spaces between sections of code

* Suspicious section name or jump instructions

* Multiple headers of file.
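
The checks in the list above can be expressed as simple tests on header fields that have already been parsed from the file. The following Python sketch is only an illustration: the FileInfo fields, the list of known section names and the reading of "entry point somewhere else" as "entry point outside the first section" are all assumptions, and no real executable parser is involved.

```python
from dataclasses import dataclass, field

@dataclass
class FileInfo:
    """Hypothetical, already-parsed header fields of an executable."""
    entry_section: int          # index of the section holding the entry point
    declared_image_size: int    # size of image stated in the header
    computed_image_size: int    # size recomputed from the section table
    section_names: list = field(default_factory=list)
    header_count: int = 1

KNOWN_NAMES = {".text", ".data", ".rdata", ".rsrc", ".reloc", ".idata"}

def heuristic_flags(info):
    """Return the list of suspicious traits found in `info`."""
    flags = []
    if info.entry_section != 0:
        flags.append("entry point outside the first section")
    if info.declared_image_size != info.computed_image_size:
        flags.append("incorrect size of image")
    if any(name not in KNOWN_NAMES for name in info.section_names):
        flags.append("suspicious section name")
    if info.header_count > 1:
        flags.append("multiple headers")
    return flags

clean = FileInfo(0, 4096, 4096, [".text", ".data"])
suspect = FileInfo(3, 4096, 36864, [".text", ".data", "xyz"], header_count=2)
print(heuristic_flags(clean))
print(heuristic_flags(suspect))
```

Each flag on its own is weak evidence; a scanner would typically combine several of them before reporting a possible infection.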

With this method a scanner checks for specific virus instructions or signs of a possible infection in a file. In the upcoming section we describe a method to detect metamorphic viruses using heuristic scanning.

Geometric Detection

This method, defined in [16], is a heuristics-based scanning method. With it we evaluate which type of modification has been performed on the infected file. This type of scanning has been given the name "Shape-Heuristics". For example, when W95/Zmist infects a file it increases the size of the data section of the PE file to at least 32K. This gives the scanner an indication of a possible Zmist infection. Another example given in [16] is W95/Bistro, which adds a higher byte value to a lower field.

Limitations

Heuristics-based detection is always prone to false positives. For example, in the case of W95/Zmist the scanner may report a virus when the file size has changed for some other reason. Several other limitations can produce false positive results whenever some legitimate modification has been made to the file structure.

4.2.3 Code Emulation

Code emulation uses the concept of a virtual machine: it simulates a virtual CPU and memory in which the virus is tested for infection, and it executes the virus inside that virtual machine [2]. The virtual machine is completely isolated, so the code does not execute outside the environment. The virus scanner looks for information that is likely to appear in case of infection. In [14] the authors give an example of code emulation using the ACG virus:

mov ax, 65a1

xchg dx, ax

mov ax, dx

mov bp, ax

add ebp, 69bdaa5f

mov bx, bp

xchg bl, dh

mov bl, byte ptr ds:[43a5]

xchg bl, dh

cmp byte ptr gs:[b975], dh

sub dh, byte ptr ds:[6003]

mov ah, dh

int 21

In the above case, when INT 21h is reached the registers hold the values ah=4a and bx=100; using this information we can detect the ACG virus. For a metamorphic virus we can take the example of W95/Evol. The virus hides data inside its body and changes the code that handles it from generation to generation while transforming, but it builds constant data on the stack. Although it contains variable data, it rebuilds its structure on the stack in decrypted form, including the information about which APIs it needs to call and where. Using emulation we can detect this type of information in a virtual machine.

Limitations

The only limitation is that we need constant data about the virus; with dynamic information this method fails. Permutation and obfuscation are ways to defeat this type of detection. For example, if a virus checks some outside link or makes a call that the emulator cannot process, the virus may learn that it is being executed within a virtual environment and then halt its execution.
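
The emulation check described in this section (run the code in a virtual CPU and inspect the registers when INT 21h is reached) can be sketched with a toy interpreter. The Python code below is an assumption-laden illustration: it supports only mov, add and xchg on four 16-bit registers, hexadecimal immediates and straight-line code, and the sample program is hypothetical rather than the ACG listing, whose memory operands cannot be reproduced here.

```python
def run_until_interrupt(program):
    """Interpret a tiny 16-bit instruction subset; stop at the first 'int'.

    Returns the register state at the interrupt, or None if none is reached.
    """
    regs = {"ax": 0, "bx": 0, "cx": 0, "dx": 0}
    for line in program:
        op, _, rest = line.partition(" ")
        if op == "int":
            return dict(regs)
        dst, src = (arg.strip() for arg in rest.split(","))
        value = regs[src] if src in regs else int(src, 16)
        if op == "mov":
            regs[dst] = value & 0xFFFF
        elif op == "add":
            regs[dst] = (regs[dst] + value) & 0xFFFF
        elif op == "xchg":
            regs[dst], regs[src] = regs[src], regs[dst]
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return None

def looks_like_target(state):
    """Behavioural check: AH == 4Ah and BX == 100h at the interrupt."""
    return (state is not None
            and (state["ax"] >> 8) == 0x4A
            and state["bx"] == 0x100)

# Hypothetical obfuscated program that ends in the same register state.
toy = ["mov dx, 4a00", "xchg dx, ax", "mov bx, 00ff", "add bx, 1", "int 21"]
print(looks_like_target(run_until_interrupt(toy)))
```

The check depends only on the state at the interrupt, not on the instructions that produced it, which is what makes emulation robust against substitution and junk insertion.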

4.2.4 Detection by Disassembling

If we can disassemble the virus code perfectly, we can detect many abnormalities that indicate a possible virus infection. Using some sort of heuristic scanning we can detect the virus body, but we still need to look at the executing code.

In [16] the authors describe some possible ways of doing this.

De-permutation of Subroutine

We have seen the process of permutation, where the virus re-arranges its subroutines. W95/Zmist contains RPME, which stands for Real Permutating Engine. This engine can generate possible variants of the virus through rearrangement. Mostly the instructions and subroutines are scattered throughout the file and later linked to each other through jump instructions. We can counter this approach by using some sort of emulation to reconstruct the virus code as it was before it was permuted and rearranged. Expertise is required for this technique, as the researcher must be able to judge when the execution of the code should be halted.

Detecting Dummy Loops

Some viruses use a dummy loop insertion method, called the macho technique, which is usually used against emulation: the emulator is kept busy executing millions of do-nothing and fake loop instructions. By detecting dummy loops through static analysis we can check what the code is actually doing. We need to test what operations the loops perform instead of just looking for jump instructions.

Using Regular Expressions and DFA

With this method we detect a virus by disassembling its code and then matching code patterns using regular expressions and deterministic finite automata (DFA). This technique has shown proven results against obfuscation. Regular expressions are used in programming languages to test user input against a specified pattern; if the pattern is matched, the system is alerted. A DFA is a technique in which a transition table contains the states and, for each input value, the corresponding next state.

This process automates the generation of code. It uses some sort of grammar that contains information about the code, or a set of rules for generating code. The method disassembles the virus code, simulates it, and builds a new generation according to the rules already specified in the grammar; at a later stage it uses regular expressions and the grammar to detect whether that pattern matches existing virus code.

The components of a DFA system may include a normalizer, which can remove garbage instructions, and a depermutator, which can reverse the process of permutation. Once the virus code has been processed and simulated using the DFA, we can use the results to decide whether it is a virus or not.
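
A transition table of the kind described above can be written down directly. The Python sketch below builds a small DFA over instruction mnemonics that accepts a fixed pattern while ignoring junk. The pattern, the treatment of nop as the only junk symbol and the function names are assumptions chosen for illustration.

```python
# Hypothetical pattern and junk alphabet, chosen only for illustration.
PATTERN = ["mov", "add", "mov"]
JUNK = {"nop"}

def build_table(pattern):
    """Transition table: state -> {mnemonic: next state}."""
    return {state: {mnemonic: state + 1}
            for state, mnemonic in enumerate(pattern)}

def dfa_accepts(mnemonics, pattern=PATTERN):
    """Return True if `mnemonics` is exactly the pattern with junk interleaved."""
    table, state = build_table(pattern), 0
    accepting = len(pattern)
    for mnemonic in mnemonics:
        if mnemonic in JUNK:
            continue                  # junk keeps the automaton in its state
        if state == accepting:
            return False              # extra instruction after the pattern
        next_state = table[state].get(mnemonic)
        if next_state is None:
            return False              # no transition defined: reject
        state = next_state
    return state == accepting

print(dfa_accepts(["mov", "nop", "add", "nop", "nop", "mov"]))
print(dfa_accepts(["mov", "sub", "mov"]))
```

The same language could be written as the regular expression `mov (nop )*add (nop )*mov` over a space-separated mnemonic string; the table form makes the states explicit.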

4.2.5 Algebraic Specification

This technique was proposed by Webster and Malcolm in the Journal in Computer Virology [36]. Based on their research they proposed that metamorphic viruses can be detected using an algebraic specification of the IA-32 assembly language. They used OBJ, an algebraic specification formalism and theorem prover based on order-sorted equational logic, to specify the syntax and semantics of a subset of the IA-32 assembly language instruction set. Furthermore, they reduce instruction sequences by eliminating equivalent instructions from them.

They use the term allomorphs for two different generations of the same virus that have entirely different syntax but perform the same operations. With their technique it is possible to prove equivalence and semi-equivalence of different metamorphic generations. The technique can be used in antivirus software as emulation-based dynamic analysis: once the signature of a virus is available, other code segments can be tested for similarity to that signature. It may also generate some false positive results.

4.2.6 Neural Network and Hidden Markov Models

This is experimental research carried out by Wong and Stamp [35]. Hidden Markov Models (HMMs) are used for statistical pattern analysis and are commonly used in speech recognition and biological sequence analysis. A Hidden Markov Model is a state machine in which the transitions between states have fixed probabilities. This gives a very different approach to describing sequences of variations. In their research a model is first trained on virus characteristics; in this model the virus code acts as the states and the opcodes of the instructions are the observations. Such a model can help in detecting different viruses that belong to the same family. As metamorphic viruses of one family use a similar approach, an HMM can be used to detect them based on their similarities.

There are some limitations, since metamorphic viruses change their shape in each generation. Despite that, several similarities remain the same in all generations, so based on probabilities we can analyze files and flag them as viruses if they receive high probabilities. During their experiment the authors disassembled virus code and generated a long sequence of opcodes to train the model; different types of sequences can be used for training. With this approach the authors were able to train an HMM to detect an entire virus family.

Once the model is fully trained, it can be used to compute a log-likelihood score for each file that is scanned. Based on the score, different variants of the virus can be detected. This technique is not fully reliable, however, as ordinary assembly code may also produce some false positive responses.
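
The scoring step can be illustrated with the standard forward algorithm for a discrete HMM. The Python sketch below is not the model of [35]: the two states, the four-opcode alphabet and all probabilities are made-up values, and training is left out entirely. It only shows how a log-likelihood is computed for an opcode sequence once a model exists.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    Assumes `obs` is a non-empty list of symbols present in every emit table.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        total += math.log(scale)
    return total

# Hypothetical model parameters, not taken from any trained model.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [
    {"mov": 0.6, "push": 0.2, "pop": 0.1, "nop": 0.1},
    {"mov": 0.1, "push": 0.4, "pop": 0.4, "nop": 0.1},
]

family_like = ["mov", "push", "pop", "mov"]
unrelated = ["nop", "nop", "nop", "nop"]
print(log_likelihood(family_like, START, TRANS, EMIT))
print(log_likelihood(unrelated, START, TRANS, EMIT))
```

In practice the score is usually divided by the sequence length so that files of different sizes can be compared against one threshold.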

4.2.7 Zeroing Transformation

This methodology was proposed by Lakhotia and Mohammed in [34] as a possible solution for detecting metamorphic viruses. First they try to reduce the number of possible variants that are created due to the transformation of viruses. They propose a method for reordering inner statements and reshaping expressions, along with variable renaming. In their research they explain, with examples, how to reduce instructions in metamorphic viruses. They try to emulate the virus code. Their proposed solution is that antivirus companies can use zero form signatures to detect a full family of metamorphic viruses [34]: instead of keeping a different signature for each variant of the same virus, they can use a single zero form signature.

In their research they transform different generations of a metamorphic virus toward the same form, given different inputs and different arrangements of inner instructions and variables. They used the CodeSurfer software (which is used to perform manual analysis of code) to implement and demonstrate binary transformation techniques of metamorphic viruses.

Based on their research they apply a Zeroing Transformation to a virus, which transforms it into its zero form. In [34] they describe these steps:

1. Create a Program Tree (PT) for each procedure and the instructions inside the program. The control dependence sub-graph can be used for constructing the trees.

2. Once we have the PT, partition the PT nodes into re-orderable sets. Each set should contain only statements that may be reordered without affecting the semantics of the program.

3. Once the sets have been generated, generate isomorphic set sequences by partitioning the re-orderable sets. These isomorphic sets should have the same string representation, independent of the order of statements, the order of variables, or the variable names.

4. Using a depth-first traversal of the Program Tree, number each statement. Statements in re-orderable sets are visited according to their sequence in the isomorphic sets, but statements within an isomorphic set are visited in random order.

5. Based on the arrangement and order of the statements, create a new program. In each expression, every variable name is replaced with the number of the statement where it was first defined.
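
The effect of these steps can be illustrated with a heavily simplified sketch in Python. It is not the tool of [34]: there is no program tree or dependence graph, the caller must state how many leading statements are independent of each other, statements must have the form "name = expression", and the function name zero_form is hypothetical. The sketch only shows how a name-independent ordering followed by renaming maps two variants to the same text.

```python
import re

def zero_form(statements, reorderable):
    """Canonicalize the order of the first `reorderable` statements, then rename.

    Assumes the first `reorderable` statements do not depend on each other.
    """
    def masked(stmt):
        # String representation that ignores the assigned variable's name.
        return stmt.split("=", 1)[1].strip()

    ordered = sorted(statements[:reorderable], key=masked) + statements[reorderable:]

    names, result = {}, []
    for number, stmt in enumerate(ordered, start=1):
        target, expr = (part.strip() for part in stmt.split("=", 1))
        # Replace each variable with the number of its defining statement.
        expr = re.sub(r"[A-Za-z_]\w*",
                      lambda m: names.get(m.group(0), m.group(0)), expr)
        names[target] = f"v{number}"
        result.append(f"{names[target]} = {expr}")
    return result

# Two hypothetical variants: same computation, different order and names.
variant_a = ["x = 5", "y = 7", "z = x + y"]
variant_b = ["q = 7", "p = 5", "r = p + q"]
print(zero_form(variant_a, 2))
print(zero_form(variant_a, 2) == zero_form(variant_b, 2))
```

Reordering of operands inside an expression (for example q + p instead of p + q) is not handled here and would need the expression reshaping that the authors describe.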

Following these steps they created a tool for C statements, which used the Program Dependence Graph generated by CodeSurfer to gather the information about data dependencies required to identify statements and decide how to re-order them. Based on their research they propose that the Zeroing Transformation decreases the number of possible variants of a metamorphic virus. This is achieved by reducing the number of permutations.

4.2.8 Resolution Based Detection

This technique was researched by Ando, Quynh, and Takefuji, who proposed a resolution-based detection method [19]. They demonstrated and theoretically explained a possible way to detect metamorphic viruses. With this method they extract parts of the virus code and reuse those parts to generate a specific signature for the virus. They remove all kinds of redundancy from the virus code, such as junk code and nop instructions.

According to their research, a program infected by a virus is in a state where it can perform malicious operations, such as making certain API calls. To obtain information about these calls, the resolution must have all the information and states about the calls that are going to be generated through transformation.

Looking further into this methodology, we see that it provides two types of reasoning, resolution and demodulation. The first gathers information about the different types of instructions and tries to reduce them to simplified ones; the second removes the junk code. This process helps generate a proper virus signature. The method has a redundancy control strategy, designed to reduce obstacles for the reasoning program. This technique is useful against anti-heuristic viruses.



[1] Fred Cohen. Computer Viruses. PhD thesis, University of Southern California, 1986.

[2] Peter Szor. The Art of Computer Virus Research and Defense. Addison Wesley Professional, 1 edition, February 2005

[3] Peter Szor, Peter Ferrie, and Frederic Perriot , "Striking Similarities," Virus Bulletin, May 2002, pp. 4-6

[4] X. Lai, J. L. Massey , "A Proposal for New Block Encryption Standard," Advances in Cryptology Eurocrypt'90, 1991.

[5] Fridrik Skulason. Virus encryption techniques. Virus Bulletin, pages 13-16, November 1990.

[6] Peter Szor. Junkie memorial. Virus Bulletin, pages 6-8, September 1997.

[7] Fridrik Skulason. 1260 - The variable virus. Virus Bulletin, page 12, March 1990.

[8] Peter Szor. The marburg situation. Virus Bulletin, pages 8-10, November 1998.

[9] Peter Szor , "Bad IDEA," Virus Bulletin, pages 18-19 April 1998

[10] Andrew Walenstein, Rachit Mathur, Mohamed Chouchane, Arun Lakhotia. "The Design Space of Metamorphic Malware". Proceedings of the 2nd International Conference on Information Warfare (Monterey, CA, U.S.A., Mar 8-9), March 2007.

[11] Zhang Qinghua. "Polymorphic and Metamorphic Malware Detection". PHD Thesis. North Carolina State University. 2008

[12] Mohamed R. Chouchane and Arun Lakhotia. Using engine signature to detect metamorphic malware. In WORM '06: Proceedings of the 4th ACM workshop on Recurring malcode, pages 73-78, New York, NY, USA, 2006. ACM Press.

[13] Zhihong Zuo, Qing-xin Zhu, and Ming-tian Zhou. On the time complexity of computer viruses. IEEE Transactions on information theory, 51(8):2962-2966, August 2005.

[14] Peter Szor and Peter Ferrie. Hunting for metamorphic. In Virus Bulletin Conference, September 2001.

[15] Arun Lakhotia, Aditya Kapoor, and Eric Uday Kumar. “Are metamorphic computer viruses really invisible?” part 1. Virus Bulletin, pages 5-7, December 2004.

[16] Rodelio G. Finones and Richard T. Fernandez. "Solving the metamorphic puzzle". Virus Bulletin, pages 14-19, March 2006.

[17] Myles Jordan. Dealing with metamorphism. Virus Bulletin, pages 4-6,October 2002.

[18] Peter Ferrie and Peter Szor. Zmist opportunities. Virus Bulletin, pages 6-7, March 2001.

[19] Ruo Ando, Nguyen Anh Quynh, Yoshiyasu Takefuji. "Resolution based metamorphic computer virus detection using redundancy control strategy". Keio University, Japan.

[20] Peter Szor. The new 32-bit medusa. Virus Bulletin, pages 8-10, December 2000.

[21] X. Gao, Metamorphic software for buffer overflow mitigation, Master's thesis, Department of Computer Science, San Jose State University, 2005.

[22] Z0mbie. About reversing. VX Heavens.

[23] Z0mbie. Some ideas about metamorphism. VX Heavens.

[24] Driller, M. (2002) Metamorphism in practice or "How I made MetaPHOR and what I've learnt". VX Heavens.

[25] F. Cohen. Computer viruses: theory and experiments. Computers & Security, 6(1):22-35, 1987.

[26] Roger A. Grimes. Malicious Mobile Code: Virus Protection for Windows. O'Reilly & Associates, Inc., Sebastopol, CA, USA, 2001.

[27] Joanna Rutkowska and Alexander Tereshkin. Blue pill project., 2007.

[28] Carey Nachenberg. Computer virus-antivirus coevolution. Commun.ACM, 40(1):46-51, 1997.

[29] Webster Matthew Paul."Formal Models of Reproduction from Computer Viruses to Artificial Life". PHD Thesis. University of Liverpool. 2008

[30] Issue 29A-4. "Virus EZine".

[31] A. Walenstein, R. Mathur, M.R. Chouchane and A. Lakhotia, "Normalizing Metamorphic Malware Using Term Rewriting," Proc. Int'l Workshop on Source Code Analysis and Manipulation (SCAM), IEEE CS Press, Sept. 2006. Pages 75-84.

[32] Borello Jean-Marie, Mé Ludovic."Code obfuscation techniques for metamorphic viruses".Springer-Verlag France 2008

[33] Arun Lakhotia and Prabhat K. Singh. Challenges in getting 'formal' with viruses. Virus Bulletin, pages 15-19, September 2003.

[34] Moinuddin Mohammed. Zeroing in on Metamorphic Computer Viruses. Master's Thesis . University of Louisiana .2003

[35] Wing Wong and Mark Stamp. Hunting for metamorphic engines. Journal in Computer Virology, 2(3):211-229, 2006.

[36] Matt Webster and Grant Malcolm. Detection of metamorphic computer viruses using algebraic specification. Journal in Computer Virology, 2(3):149-161, December 2006. DOI: 10.1007/s11416-006-0023-z.

