Analysis of Botnet Security Threats
Disclaimer: This dissertation has been submitted by a student. This is not an example of the work written by our professional dissertation writers.
Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UK Essays.
During the last few decades, we have seen the dramatic rise of the Internet and its applications, to the point where they have become a critical part of our lives. As a result, Internet security has become more and more important to those who use the Internet for work, business, entertainment or education.
Most of the attacks and malicious activities on the Internet are carried out by malicious applications, collectively called malware, which include viruses, trojans, worms and botnets. Botnets have become the main source of many malicious activities across the Internet, such as scanning and distributed denial-of-service (DDoS) attacks.
1.2 Botnets: The Largest Security Threat
A bot is a piece of software code, or malware, that runs automatically on a compromised machine without the user's permission. Bot code is usually written by criminal groups. The term “bot” also refers to the compromised computer itself. A botnet is essentially a network of bots under the control of an attacker (the BotMaster). Figure 1.1 illustrates a typical structure of a botnet.
A bot usually takes advantage of sophisticated malware techniques. For example, a bot may use a keylogger to record private user information such as passwords, and may hide its existence in the system. More importantly, a bot can distribute itself across the Internet to increase its scale and form a bot army. Recently, attackers have used compromised Web servers to infect visitors to the hosted websites through drive-by downloads. A botnet typically contains thousands of bots, but in some cases botnets have contained several million bots.
What distinguishes bots from other kinds of malware, such as worms, is their ability to receive commands from the attacker remotely. The attacker, usually called the botherder, controls the bots through different protocols and structures. The Internet Relay Chat (IRC) protocol is the earliest and still the most commonly used C&C channel. HTTP is also used, because HTTP traffic is permitted in most networks. Botnets with a centralized structure were very successful in the past, but botherders now use decentralized structures to avoid the single-point-of-failure problem.
Unlike earlier malware such as worms, which were often created merely for entertainment, botnets are used for real financial abuse. Botnets can cause many problems, some of which are listed below:
i. Click fraud. A botmaster can easily profit by forcing the bots to click on advertisements for personal or commercial gain.
ii. Spam production. The majority of email on the Internet is spam, and botnets are a major means of sending it.
iii. DDoS attacks. A bot army can be commanded to begin a distributed denial-of-service attack against any machine.
iv. Phishing. Botnets are widely used to host malicious phishing sites. Criminals usually send spam messages to trick users into visiting their forged websites, so that they can obtain critical information such as usernames and passwords.
1.3 Botnets in Depth
Nowadays, the botnet is the most serious manifestation of advanced malware. To distinguish botnets from other kinds of malware, the concepts behind botnets have to be understood. For a better understanding, two important terms, Bot and BotMaster, are defined below from another point of view.
Bot - Bot is short for robot; a bot is also called a Zombie. It is a new type of malware installed on a compromised computer, which can be controlled remotely by the BotMaster to execute orders through received commands. After the Bot code has been installed on a compromised computer, the computer becomes a Bot or Zombie. Unlike existing malware such as viruses and worms, whose main activities focus on attacking the infected host, bots can receive commands from the BotMaster and are used as a distributed attack platform.
BotMaster - The BotMaster, also known as the BotHerder, is a person or a group of people who control remote Bots.
Botnets - Botnets are networks consisting of a large number of Bots. Botnets are created by the BotMaster to set up a private communication infrastructure that can be used for malicious activities such as Distributed Denial-of-Service (DDoS) attacks, sending large amounts of SPAM or phishing mail, and other nefarious purposes [26, 27, 28]. Bots infect a person's computer in many ways.
Bots usually spread across the Internet by looking for vulnerable and unprotected computers to infect. When they find an unprotected computer, they infect it and then send a report to the BotMaster. The Bot stays hidden until it is instructed by the BotMaster to perform an attack or task. Attackers also infect computers by sending email and by using malicious websites, but the most common way is searching the Internet for vulnerable and unprotected computers. The activities associated with a Botnet can be classified into three parts: (1) Searching - searching for vulnerable and unprotected computers; (2) Dissemination - the Bot code is distributed to the computers (targets), so that the targets become Bots; (3) Sign-on - the Bots connect to the BotMaster and become ready to receive command-and-control traffic.
The main difference between Botnets and other kinds of malware is the existence of a Command-and-Control (C&C) infrastructure. The C&C allows Bots to receive commands and malicious capabilities, as directed by the BotMaster. The BotMaster must ensure that the C&C infrastructure is robust enough to manage thousands of Bots distributed across the globe, and to resist any attempts to shut the Botnet down. However, detection and mitigation techniques against Botnets have increased [30,31], and attackers in turn are continually improving their approaches to protecting their Botnets. The first generation of Botnets used IRC (Internet Relay Chat) channels as their Command-and-Control (C&C) centers. The centralized C&C mechanism of such Botnets made them vulnerable to being detected and disabled. Therefore, a new generation of Botnets that can hide their C&C communication has emerged: Peer-to-Peer (P2P) based Botnets. P2P Botnets do not suffer from a single point of failure, because they do not have centralized C&C servers. Attackers have accordingly developed a range of strategies and techniques to protect their C&C infrastructure.
Therefore, examining the C&C function gives a better understanding of Botnets and helps defenders design proper detection or mitigation techniques. According to the C&C channel, we categorize Botnets into three topologies: a) Centralized; b) Decentralized; and c) Hybrid. In Section 1.4, these topologies are analyzed, together with the protocols currently used in each model.
1.4 Botnet Topologies
According to the Command-and-Control (C&C) channel, Botnet topologies are categorized into three models: the Centralized model, the Decentralized model and the Hybrid model.
1.4.1 Centralized Model
The oldest type of topology is the centralized model. In this model, one central point is responsible for exchanging commands and data between the BotMaster and the Bots. The BotMaster chooses a host (usually a high-bandwidth computer) to be the central Command-and-Control (C&C) server for all the Bots. The C&C server runs certain network services such as IRC or HTTP. The main advantage of this model is low message latency, which makes it easy for the BotMaster to organize the Botnet and launch attacks.
Since all connections pass through the C&C server, the C&C server is the critical point, and therefore the weak point, of this model. If somebody manages to discover and eliminate the C&C server, the entire Botnet becomes worthless and ineffective. This is the main drawback of this model. Many modern centralized Botnets therefore carry a list of IP addresses of alternative C&C servers, to be used if a C&C server is discovered and taken offline.
Since IRC and HTTP are the two common protocols that C&C servers use for communication, we consider Botnets in this model based on IRC and on HTTP. Figure 1.2 shows the basic communication architecture of the Centralized model. There are two central points that forward commands and data between the BotMaster and his Bots.
1.4.1.1 Botnets based on IRC
IRC is a type of real-time Internet text messaging, or synchronous conferencing. The IRC protocol is based on the client-server model and can be used on many computers in distributed networks. Several advantages have made the IRC protocol widely used for remote communication in Botnets: (i) low-latency communication; (ii) anonymous real-time communication; (iii) the ability to support group (many-to-many) and private (one-to-one) communication; (iv) simple setup; (v) simple commands, since the basic commands are connecting to servers, joining channels and posting messages in channels; and (vi) great flexibility in communication. Therefore, the IRC protocol is still the most popular protocol used in Botnet communication.
In this model, BotMasters can command all of their Bots at once, or command a few of them using one-to-one communication. The C&C server runs an IRC service that is the same as any other standard IRC service. Usually, the BotMaster creates a channel on the IRC server to which all the bots connect, and commands posted there instruct each connected bot to carry out the BotMaster's orders. Figure 1.3 shows one central IRC server that forwards commands and data between the BotMaster and his Bots.
Puri presented the procedures and mechanism of an IRC-based Botnet, as shown in Figure 1.4.
Bot infection and control process:
i. The attacker tries to infect the targets with Bots.
ii. After the Bot is installed on the target machine, it tries to connect to the IRC server. Meanwhile, a random nickname is generated that identifies the bot in the attacker's private channel.
iii. The Bot sends a request to the DNS server, which dynamically maps the IRC server's name to its IP address.
iv. The Bot joins the private IRC channel set up by the attacker and waits for instructions. Most of these private IRC channels are set to encrypted mode.
v. The attacker connects to the private IRC channel and sends the authentication password.
vi. The attacker sends attack instructions in the private IRC channel.
vii. Bots receive the instructions and launch attacks such as DDoS attacks.
1.4.1.2 Botnets based on HTTP
HTTP is another well-known protocol used by Botnets. As the use of IRC within Botnets became well known, Internet security researchers paid more attention to monitoring IRC traffic to detect Botnets. Consequently, attackers started to use HTTP as a Command-and-Control communication channel to make Botnets more difficult to detect. The main advantage of using HTTP is that Botnet traffic hides within normal web traffic, so it can easily pass firewalls and avoid IDS detection. Firewalls usually block incoming and outgoing traffic on ports that are not needed, which usually include the IRC port.
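The firewall argument can be made concrete with a small sketch. It assumes a hypothetical egress policy that allows only DNS, HTTP and HTTPS (ports 53, 80 and 443) and judges each flow by its destination port alone; the hosts use reserved example domains and private addresses. IRC-based C&C on the default IRC port is blocked, while HTTP-based C&C on port 80 passes exactly as legitimate browsing does.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src: str
    dst: str
    dst_port: int


# Assumed egress policy (hypothetical): only DNS, HTTP and HTTPS may leave the network.
ALLOWED_EGRESS_PORTS = frozenset({53, 80, 443})


def egress_allowed(flow: Flow) -> bool:
    """Port-based filtering: the decision depends only on the destination port."""
    return flow.dst_port in ALLOWED_EGRESS_PORTS


flows = [
    Flow("10.0.0.5", "irc.example.net", 6667),  # IRC-based C&C on the default IRC port
    Flow("10.0.0.6", "www.example.com", 80),    # legitimate web browsing
    Flow("10.0.0.7", "cc.example.org", 80),     # HTTP-based C&C
]

for f in flows:
    verdict = "allow" if egress_allowed(f) else "block"
    print(f"{f.src} -> {f.dst}:{f.dst_port} {verdict}")
```

The second and third flows are indistinguishable to this filter, which is why HTTP-based C&C has to be detected from content or behavior rather than from ports.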
1.4.2 Decentralized model
Because of the major disadvantage of the Centralized model (the central Command-and-Control server), attackers tried to build another Botnet communication topology that is harder to discover and destroy. They therefore sought a model in which the communication system does not depend heavily on a few selected servers, and which keeps working even if a number of Bots are discovered and destroyed.
As a result, attackers took advantage of Peer-to-Peer (P2P) communication as a Command-and-Control (C&C) pattern, which is much harder to shut down. P2P-based C&C is expected to be used increasingly by Botnets in the future, and Botnets that use it pose a much bigger challenge for network defense.
In the P2P model, as shown in Fig. 1.6, there is no centralized point of communication. Each Bot has connections to some of the other Bots of the same Botnet, and Bots act as both clients and servers. A new Bot must know the addresses of some members of the Botnet in order to join it. If some Bots are taken offline, the Botnet can still continue to operate under the control of the BotMaster.
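The difference in resilience between the centralized and P2P models can be illustrated with a toy graph experiment. The sketch assumes a star graph for the centralized model and, purely as a stand-in for a P2P overlay, a ring lattice in which each Bot peers with its two nearest neighbors on each side; neither graph models any real botnet. Removing the single C&C node isolates every Bot, whereas removing three non-adjacent peers leaves the rest of the overlay connected.

```python
from collections import deque


def star(n_bots):
    """Centralized model: node 0 is the C&C server, nodes 1..n_bots are bots."""
    adj = {0: set(range(1, n_bots + 1))}
    for bot in range(1, n_bots + 1):
        adj[bot] = {0}
    return adj


def ring_lattice(n, k):
    """Toy P2P overlay (assumption): each peer links to its k nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected group of nodes left after `removed` go offline."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over surviving nodes only
            node = queue.popleft()
            size += 1
            for peer in adj[node]:
                if peer in alive and peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        best = max(best, size)
    return best


central = star(29)         # 1 C&C server + 29 bots
p2p = ring_lattice(30, 2)  # 30 peers, no server

print(largest_component(central, removed=[0]))      # 1: every bot is cut off
print(largest_component(p2p, removed=[0, 10, 20]))  # 27: the overlay stays connected
```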
P2P Botnets aim to remove or hide the central point of failure, which is the main weakness and vulnerability of the Centralized model. Some P2P Botnets are only partly decentralized, and some are completely decentralized. Completely decentralized Botnets allow a BotMaster to insert a command at any Bot. Since P2P Botnets usually allow commands to be injected at any node in the network, authentication of commands becomes essential to prevent other nodes from injecting false commands.
For a better understanding of this model, some characteristics and important features of well-known P2P Botnets are described below:
Slapper: Allows the routing of commands to distinct nodes. Uses public-key cryptography to authenticate commands: BotMasters sign commands with the private key, and only nodes that hold the corresponding public key can verify them. Two important weak points are: (a) its list of known Bots contains all (or almost all) of the Botnet, so a single captured Bot would expose the entire Botnet to defenders; and (b) its sophisticated communication mechanism produces a lot of traffic, making it vulnerable to monitoring via network flow analysis.
Sinit: This Bot uses random probing to discover other Bots to communicate with. The extensive probing traffic makes it easy to detect.
Nugache: Its weakness lies in its reliance on a seed list of 22 IP addresses during its bootstrap process.
Phatbot: Uses Gnutella cache servers for its bootstrap process, and these can easily be shut down. In addition, its WASTE P2P protocol does not scale well to large networks.
Storm Worm: It uses the P2P Overnet protocol to control compromised hosts. The communication protocol of this Bot can be classified into five steps, as described below:
i. Connect to Overnet - Bots try to join the Overnet network. Each Bot initially has hard-coded binary files that include the IP addresses of P2P-based Botnet nodes.
ii. Search and Download Secondary Injection URL - The Bot uses hard-coded keys to search for and download the URL on the Overnet network.
iii. Decrypt Secondary Injection URL - Compromised hosts use a hard-coded key to decrypt the URL.
iv. Download Secondary Injection - Compromised hosts attempt to download the secondary injection from a server (probably a web server). It may consist of infected files, updated files or a list of P2P nodes.
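The weakness noted for Sinit, that extensive probing traffic makes it easy to detect, can be illustrated with a simple fan-out check. This is a minimal sketch, assuming flow records reduced to (source, destination) pairs from one observation window and a hypothetical threshold of 50 distinct destinations; deployed scan detectors use more refined statistics.

```python
from collections import defaultdict


def scan_suspects(flows, threshold):
    """Return source hosts that contacted more than `threshold` distinct destinations.

    `flows` is an iterable of (source, destination) pairs from one time window.
    """
    fanout = defaultdict(set)
    for src, dst in flows:
        fanout[src].add(dst)
    return {src for src, dsts in fanout.items() if len(dsts) > threshold}


# Hypothetical traffic: 10.0.0.9 probes 200 addresses (documentation range 203.0.113.0/24),
# while the other hosts talk to a few servers.
flows = [("10.0.0.9", f"203.0.113.{i}") for i in range(1, 201)]
flows += [
    ("10.0.0.2", "198.51.100.10"),
    ("10.0.0.2", "198.51.100.11"),
    ("10.0.0.3", "198.51.100.10"),
]

print(scan_suspects(flows, threshold=50))  # {'10.0.0.9'}
```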
1.4.3 Hybrid model
The Bots in a Hybrid Botnet are categorized into two groups:
1) Servant Bots - Bots in the first group are called servant Bots because they behave as both clients and servers; they have static, routable IP addresses and are accessible from the entire Internet.
2) Client Bots - Bots in the second group are called client Bots because they do not accept incoming connections. This group contains the remaining Bots, including: (a) Bots with dynamically assigned IP addresses; (b) Bots with non-routable IP addresses; and (c) Bots behind firewalls that cannot be reached from the global Internet.
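The servant/client split follows directly from the three properties listed above. The sketch below is a hypothetical classifier that applies them to observed peers: it assumes that whether a peer accepts inbound connections and whether its address is static are already known (for example from long-term observation), and uses Python's standard `ipaddress` module to decide whether an address is globally routable. The 1.2.3.x addresses are arbitrary public addresses chosen only for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class PeerObservation:
    address: str
    accepts_inbound: bool  # assumed known, e.g. from observed incoming connections
    is_static: bool        # assumed known from long-term observation


def classify(peer: PeerObservation) -> str:
    """Label a peer 'servant' or 'client' using the Hybrid-model criteria."""
    routable = ipaddress.ip_address(peer.address).is_global
    if routable and peer.is_static and peer.accepts_inbound:
        return "servant"
    return "client"


# 1.2.3.x are arbitrary publicly routable addresses, used only for illustration.
peers = [
    PeerObservation("1.2.3.4", accepts_inbound=True, is_static=True),
    PeerObservation("1.2.3.5", accepts_inbound=True, is_static=False),       # dynamic IP
    PeerObservation("192.168.1.20", accepts_inbound=False, is_static=True),  # non-routable
    PeerObservation("1.2.3.6", accepts_inbound=False, is_static=True),       # behind a firewall
]

for p in peers:
    print(p.address, classify(p))
```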
1.5 Background of the Problem
Botnets, which are controlled remotely by BotMasters, can launch huge denial-of-service attacks and various infiltration attacks, and can be used to spread spam and conduct other malicious activities. While bot army activity has so far been limited to criminal activity, their potential for causing large-scale damage to the entire Internet is enormous. Botnets are therefore one of the most dangerous types of network-based attack today, because they involve very large, synchronized groups of hosts.
Botnets derive their power from their size, both in their growing aggregate bandwidth and in their reach. As mentioned before, Botnets can cause severe network disruption through huge denial-of-service attacks, and the threat of such disruption can be used to extort large sums from enterprises. Botnets are also used to harvest personal, corporate or government sensitive information for sale on a thriving organized-crime market.
1.6 Statement of the Problem
Recently, botnets have begun to use a new type of command-and-control (C&C) communication that is completely decentralized, based on peer-to-peer style communication. Tracking the origin and activity of such botnets is much more complicated because of the peer-to-peer communication infrastructure.
Combating botnets is usually a matter of exploiting their weakness: their central point of command, the C&C server. This is typically an IRC network to which all bots connect; with the P2P method, however, there is no central point of command to find. In a P2P network, each bot searches for other peers to connect to, and any peer can receive or broadcast commands through the network. Therefore, an accurate method of detecting and fighting such botnets is required to prevent or stop these dangerous networks.
1.7 Research Questions
a. What are the main differences between centralized and decentralized botnets?
b. What is the best and most efficient general, extensible solution for detecting non-specific Peer-to-Peer botnets?
1.8 Objectives of the Study
i. To develop a network-based framework for Peer-to-Peer botnet detection based on common behavior in network communication.
ii. To study the behavior of bots and recognize behavioral similarities across multiple bots in order to develop the framework mentioned above.
1.9 Scope of the Study
The project scope is limited to developing algorithms for the proposed framework. These algorithms are used to reduce traffic volume by filtering, to classify the traffic of interest, to monitor traffic and to detect malicious activities.
1.10 Significance of the study
Peer-to-Peer botnets are one of the most sophisticated types of cybercrime today. They give attackers full control of many computers around the world, to be exploited for malicious purposes such as spreading viruses and worms, distributing spam and launching DDoS attacks. Therefore, studying the behavior of P2P botnets and developing a technique that can detect them is important and in high demand.
Understanding botnet Command-and-Control (C&C) is critical to knowing how best to protect against the overall botnet threat. The C&C channels used by a botnet often determine the type and extent of actions an enterprise can take to block or shut down the botnet, and the probability of success.
It is also clear that attackers have been trying for years to move away from centralized C&C channels, and have achieved some success with decentralized (P2P) C&C channels over the last five or so years. Therefore, in this chapter we have defined a classification of Botnet C&C channels into Centralized, Decentralized and Hybrid models, and have tried to evaluate the recognized protocols in each. Understanding the communication topologies of Botnets is essential to precisely identify, detect and mitigate the ever-increasing Botnet threat.
Previously, the majority of botnets used IRC (Internet Relay Chat) as the communication protocol for their Command-and-Control (C&C) mechanism. Therefore, many researchers tried to develop botnet detection schemes based on the analysis of IRC traffic. As a result, attackers developed more sophisticated botnets, such as Storm Worm and Nugache, that use P2P networks for their C&C infrastructure. In response to this movement, researchers have proposed various models for detecting P2P-based botnets.
One key advantage of both IRC and HTTP Botnets is the use of a central Command and Control. This characteristic gives the attacker very well-organized communication. However, the same asset is also the main disadvantage for the attacker: the threat of the Botnet can be reduced, and possibly eliminated, if the central C&C is taken over or taken down. The approach now emerging is a P2P structure for Botnet communication. There is no centralized center in a P2P botnet; every node behaves as both client and server, so if any single point in the network is shut down, the botnet can still continue its operation.
The Storm botnet is one of the main recognized recent P2P botnets. It customized the Overnet P2P file-sharing application, which is based on the Kademlia distributed hash table algorithm, and exploited it for its C&C infrastructure. Recently, many researchers, especially in the anti-virus community, as well as the electronic media, have concentrated on the Storm Worm [56,57].
2.2 Background and History
A peer-to-peer network is a network of computers in which any computer can behave as both a client and a server.
Some definitions of peer-to-peer networks additionally require the absence of any form of centralized coordination. The broader definition above is more convenient here, because attackers may be interested in hybrid architectures.
Table 2.1 summarizes some well-known bots and P2P protocols, covering the period from the first bot, EggDrop, to the recently released Storm Worm P2P bot. The first, non-malicious bot was EggDrop, which appeared many years ago and is known as one of the first IRC bots. GTBot, which has many variants, is another well-known, malicious bot; it is based on the IRC client mIRC.exe.
Later, P2P protocols came to be used for Botnet activities. Napster was one of the first applications to use P2P communication. It built a platform that allowed peers to find each other and share files over the network. In Napster, however, the file index was kept on a centralized server, so it was not a completely P2P system: every peer had to upload an index of its files to the central server, and a peer looking for a file had to search that server. Once a file was found, the peer could connect directly to the peer holding it and download it. Because Napster was shut down after its service was found to be illegal, many later P2P services have focused on avoiding such a central point.
A few years after Napster, the Gnutella protocol appeared as the first completely P2P service. After Gnutella, as shown in Table 2.1, many other P2P protocols were released, such as Kademlia and Chord. These two protocols use a distributed hash table as the method for locating information in peer-to-peer networks.
Agobot is another malicious P2P bot that appeared recently and became widespread because of its good design and modular code base. Many researchers now concentrate on P2P bots, and it is anticipated that P2P bots will reach the stage where centralized botnets are no longer used.
Table 2.1: P2P based Botnets
2.3 Peer-to-Peer Overlay Networks
Overlay networks are divided into two categories: structured and unstructured. In the first category, each node connects to at most X peers, chosen according to conditions on the node identifiers. In the unstructured type, by contrast, there is no specified limit on the number of peers a node can connect to, and no condition on which peers it connects to. Overnet and Chord are good examples of structured P2P networks, and Gnutella is a good example of an unstructured P2P network.
2.3.1 Brief overview of Overnet
Overnet is a popular file-sharing network whose design uses a distributed hash table (DHT) algorithm called Kademlia. Each node generates a 128-bit ID when joining the network and uses it to identify itself to other nodes. Each node also stores information about other nodes in order to route query messages.
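Kademlia measures how close two 128-bit IDs are with the bitwise XOR of the IDs, and a node files each known contact into a bucket according to the highest bit in which their IDs differ. The sketch below shows only this distance metric, the bucket index and the local choice of the k closest contacts that a lookup would query next; the iterative network lookup itself is omitted. Deriving IDs by hashing a label with MD5 is an assumption made purely to obtain 128-bit values for illustration, not a description of how Overnet assigns IDs.

```python
import hashlib

ID_BITS = 128


def node_id(label: str) -> int:
    """Illustration only: derive a 128-bit identifier by hashing a label with MD5."""
    return int.from_bytes(hashlib.md5(label.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the bitwise XOR of two identifiers."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Bucket for `other_id`: position of the highest differing bit (-1 if the IDs are equal)."""
    return xor_distance(own_id, other_id).bit_length() - 1


def k_closest(target: int, known: list[int], k: int) -> list[int]:
    """The k known contacts closest to `target`, which a lookup would query next."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]


contacts = [node_id(f"peer-{i}") for i in range(20)]  # hypothetical contact list
key = node_id("hypothetical-keyword")                  # hypothetical search key

for n in k_closest(key, contacts, k=3):
    print(f"{n:032x}  distance bit length: {xor_distance(n, key).bit_length()}")
```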
2.3.2 Brief overview of Gnutella
Gnutella is an unstructured file-sharing network. In this network, when a node n wants to connect to a node m, it sends a ping message to announce its presence. When node m receives the ping message, it forwards it to its other neighbors and sends a pong message back to the sender of the ping, node n. These exchanges let nodes learn about each other.
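The ping/pong exchange can be sketched as a breadth-first flood limited by a time-to-live (TTL) counter. The sketch assumes that every node drops a ping it has already handled (Gnutella identifies messages by a unique ID for this purpose) and that each pong travels back to the originator along the reverse path, so it only records which nodes answer. The topology and TTL values are hypothetical.

```python
from collections import deque


def ping_discovery(adj, origin, ttl):
    """Flood a ping from `origin` for at most `ttl` hops.

    Every node that receives the ping for the first time replies with a pong;
    the function returns the set of nodes whose pongs reach `origin`.
    """
    seen = {origin}  # nodes that have already handled this ping
    responders = set()
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: do not forward further
        for neighbour in adj[node]:
            if neighbour in seen:
                continue  # duplicate ping is dropped
            seen.add(neighbour)
            responders.add(neighbour)  # neighbour answers with a pong
            queue.append((neighbour, remaining - 1))
    return responders


# Hypothetical overlay: node n joins through its neighbours a and m.
overlay = {
    "n": ["a", "m"],
    "a": ["n", "b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "m": ["n"],
}

print(sorted(ping_discovery(overlay, "n", ttl=2)))  # ['a', 'b', 'm']
```

A larger TTL discovers more of the network at the cost of more flooded traffic, which is the basic trade-off of unstructured overlays.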
2.4 Botnet Detection
To compare existing botnet detection techniques, the different methods are described below, followed by the disadvantages of each.
2.4.1 Honeypot-based tracking
Honeypots can be used to collect bots in order to analyze their behavior and signatures, and to track botnets. However, honeypots have several limitations. The most important is the limited scale of exploit activity they can track. They also cannot capture bots that use propagation methods other than scanning, such as spam. Finally, they can only report infections of the machines deployed in the network as traps; they cannot report other computers in the network that are infected with a bot but are not trap machines. In general, therefore, with this technique we have to wait until a bot infects one of our trap systems before we can track or analyze it.
2.4.2 Intrusion detection systems
Intrusion detection techniques can be divided into two categories: host-based and network-based solutions. Host-based techniques are used to recognize malware binaries such as viruses; a good example of this type is anti-virus software. However, anti-virus software is effective mainly for virus detection. Its most important disadvantage is that bots can easily evade detection by changing their signatures, because detection systems cannot update their signature databases quickly enough. Bots can also disable anti-virus tools on the system to protect themselves from detection.
Network-based intrusion detection systems are another method used in the field of botnet detection. Snort and Bro are two well-known signature-based detection systems in current use. They use a database of signatures of known malicious activities to detect botnets or other malware. To use this technique for botnet detection, we have to keep the database up to date, recognizing new malware quickly, creating a signature for it and adding it to the database. To solve this problem, researchers have recently used anomaly-based IDSs, which detect malicious activity based on the behavior of malware rather than on signatures.
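The maintenance problem of signature-based detection can be seen in a greatly simplified matcher. This sketch assumes a hypothetical signature database that maps a name to a byte pattern which must appear somewhere in the payload; real Snort or Bro rules can also constrain protocol, ports, offsets and connection state. A trivial change to the bot's request is enough to slip past the old signature until the database is updated.

```python
# Hypothetical signature database: name -> byte pattern that must appear in the payload.
SIGNATURES = {
    "hypothetical-bot-a": b"/hypothetical/gate.php?bot_id=",
    "hypothetical-bot-b": b"\xde\xad\xbe\xef",
}


def match_signatures(payload: bytes) -> list[str]:
    """Names of all signatures whose pattern occurs anywhere in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


known_sample = b"GET /hypothetical/gate.php?bot_id=42 HTTP/1.1\r\n"
# The same (hypothetical) bot after a trivial change to its URL.
modified_sample = b"GET /hypothetical/panel.php?uid=42 HTTP/1.1\r\n"

print(match_signatures(known_sample))     # ['hypothetical-bot-a']
print(match_signatures(modified_sample))  # []
```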
2.4.3 BotHunter: Dialog correlation-based Botnet detection
This technique takes an evidence-trail approach to detecting successful bot infections, based on the communication patterns of the infection process. Bot infection patterns are modeled and used to recognize the whole process of botnet infection in the network. All behaviors that occur during bot infection, such as target scanning, C&C establishment, binary downloading and outbound propagation, have to be modeled. The method gathers an evidence trail of related infection events for each internal machine and then looks for a threshold combination of event sequences that satisfies the condition for declaring a bot infection.
BotHunter uses Snort, extended with two anomaly-detection components: SLADE (Statistical payLoad Anomaly Detection Engine) and SCADE (Statistical scan Anomaly Detection Engine). SCADE produces inbound and outbound scan detection warnings that are weighted by their criticality with respect to malware scanning patterns. SLADE performs byte-distribution payload anomaly detection on incoming packets, providing a complementary non-signature approach to inbound exploit detection [32].
SLADE performs n-gram payload analysis of traffic that may contain malware intrusions, while SCADE performs port-scan analysis of incoming and outgoing traffic. BotHunter correlates scan alarms with intrusion alarms to decide that a host has been infected. When a sufficient sequence of alerts is found to match BotHunter's infection dialog model, a comprehensive report is generated that captures all the relevant events and the participants that played a role in the infection dialog. This method provides some important features:
i. The technique focuses on malware detection through IDS-driven dialog correlation. Its model captures the essential network processes that occur during a successful bot infection.
ii. The technique has one IDS-independent dialog correlation engine and three bot-specific sensors. It can automatically produce a report covering the whole bot detection, including the infection agent, the identity of the infected computer and the source of the Command and Control center.
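The idea behind SLADE's byte-distribution payload analysis can be approximated with a simple 1-gram model: learn the average byte-frequency profile of benign traffic, then score new payloads by how far their byte distribution lies from that profile. This is a simplified sketch, not SLADE's actual n-gram algorithm; the training payloads and the binary "exploit-like" sample are hypothetical.

```python
from collections import Counter


def byte_distribution(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in the payload."""
    counts = Counter(payload)
    total = len(payload) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def build_profile(normal_payloads: list[bytes]) -> list[float]:
    """Average byte distribution over payloads assumed to be benign."""
    dists = [byte_distribution(p) for p in normal_payloads]
    return [sum(col) / len(dists) for col in zip(*dists)]


def anomaly_score(payload: bytes, profile: list[float]) -> float:
    """L1 distance between a payload's byte distribution and the profile (0 to 2)."""
    return sum(abs(x - y) for x, y in zip(byte_distribution(payload), profile))


training = [  # hypothetical benign traffic
    b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n",
    b"GET /news.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n",
    b"GET /contact.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n",
]
profile = build_profile(training)

normal = b"GET /about.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
binary_blob = bytes(range(256)) * 2  # stand-in for a binary, exploit-like payload

print(round(anomaly_score(normal, profile), 2))
print(round(anomaly_score(binary_blob, profile), 2))
```

A detector would flag payloads whose score exceeds a threshold chosen on benign traffic; the binary sample scores far higher than the ordinary request.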
2.4.3.1 Bot infection sequences
Understanding the bot infection life cycle is a challenging task for protecting networks in the future. The major difficulty in this area is differentiating between a successful bot infection and background exploit attempts. Doing so requires analysis of the two-way dialog flow between internal hosts and external hosts (the Internet). In a well-designed network that uses filtering at the gateway, the threat of direct exploitation is limited. However, contemporary malware families are highly flexible in their ability to attack vulnerable hosts through email attachments, infected P2P media and drive-by download infections.
2.4.3.2 Modeling the infection dialog process
The bot infection model can be derived from an analysis of outward communication traffic that reveals the behavior of the botnet. Inbound scan and exploit alarms are not sufficient to declare a successful malware infection, since a steady stream of scan and exploit signals is expected to be observed by the egress monitor anyway.
Figure 2.1 shows the bot infection process that BotHunter uses to evaluate network flows, through eight stages. This model is similar to the model that Rajab et al. presented for IRC botnets. In their model, initial scanning is an early precursor of infection, taking the form of sweeps across IP addresses aimed at vulnerable ports. Figure 2.1 is not intended to impose a strict ordering on the events that occur during a bot infection.
The important point is that the analysis of bot dialog must be robust to the absence of some dialog events and must not require strict ordering of the inbound dialog. One way to handle the problem of event order is a weighted event threshold system that identifies the minimal necessary sparse sequences of events under which a bot infection declaration can be triggered. For instance, each event type can be assigned a weight, and a threshold set so that only a sufficiently significant set of events leads to bot detection.
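A weighted event threshold of this kind can be sketched in a few lines. The event labels below echo the stages named earlier (scanning, exploit, binary download, C&C, outbound propagation), but the labels, weights and threshold are hypothetical values chosen for illustration, not BotHunter's actual configuration. Because the score depends only on which event types occurred, missing or reordered events do not prevent detection, while inbound alarms alone can never reach the threshold.

```python
# Hypothetical event labels and weights (not BotHunter's actual values).
WEIGHTS = {
    "inbound_scan": 0.25,
    "inbound_exploit": 0.5,
    "binary_download": 1.0,
    "cnc_communication": 1.0,
    "outbound_scan": 1.0,
}
THRESHOLD = 2.0  # assumed: inbound evidence alone (at most 0.75) can never reach it


def infection_score(events):
    """Sum the weights of the distinct event types seen for one host.

    Order and repetition are ignored, so sparse or reordered dialogs still count.
    """
    return sum(WEIGHTS.get(event, 0.0) for event in set(events))


def declare_infected(events):
    return infection_score(events) >= THRESHOLD


hosts = {
    "10.0.0.4": ["inbound_scan"] * 50 + ["inbound_exploit"],                  # attacked only
    "10.0.0.8": ["cnc_communication", "inbound_exploit", "binary_download"],  # out of order
    "10.0.0.9": ["binary_download", "outbound_scan"],  # e.g. infected via email, no inbound exploit seen
}

for host, events in hosts.items():
    print(host, infection_score(events), declare_infected(events))
```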
2.4.3.3 Design and implementation
This part focuses on designing a passive network monitoring system that can recognize the bidirectional warning signs of internal hosts infected with bots. BotHunter consists of IDS components that fully inspect internal and external traffic flows, and a correlation engine that produces a clear picture of bot infections in the network. A significant point is the placement of BotHunter: it must sit at the network gateway, where it can receive all communication packets exchanged between internal hosts and the Internet. The IDS components are based on Snort, an open-source tool, and the model uses Snort's signature engine.
2.4.4 BotMiner: structure independent Botnet detection
This model focuses mainly on distributed structures such as peer-to-peer, while remaining applicable to centralized structures such as IRC and HTTP. Its detection approach is based on parallel correlation, with the aim of discovering the machines that have joined a botnet. BotMiner defines a botnet as a coordinated group of malware instances that are controlled via C&C channels. According to the cited statistics, almost 53% of botnet commands are related in some way to scanning, which is usually used for propagation, and almost 14.4% are related to binary downloading, which is used for malware updates. The word “controlled” refers to the fact that bots must contact their command and control (C&C) servers to obtain commands for their malicious activities; in other words, there is communication between the bots and the C&C servers. The phrase “coordinated group” means that multiple bots in the same botnet perform similar communication and malicious activities. If a botmaster instead commanded each bot individually, using different commands and channels, the bots would simply be unrelated infected machines.
This model is built on the botnet definition described above. BotMiner monitors two kinds of activity, who is communicating with whom and who is doing what, and looks for coordinated group patterns in both. It clusters similar communication activities in the C-plane (C&C communication traffic), clusters similar malicious activities in the A-plane (activity traffic), and performs cross-plane correlation to identify hosts that share both similar communication patterns and similar malicious activity patterns. The main features of this model are:
i. It is not designed for any particular protocol and applies to both centralized and decentralized structures; more importantly, it requires no prior knowledge of C&C server addresses or botnet signatures.
ii. It is independent of the content of C&C communication and can accurately cluster similar C&C traffic patterns.
Some disadvantages
For a botmaster to direct bots to carry out malicious activities, a C&C channel through which the bots receive commands and launch attacks is indispensable. Indeed, a C&C channel is what turns a set of bots into a botnet. IRC is the oldest protocol used for communication among bots: every bot logs into an IRC channel and waits for commands from the botmaster. Many botnets around the world still use IRC channels as their communication platform.
More recently, botnets have started to use other protocols for communication. Attackers have shifted from IRC to HTTP because HTTP traffic is allowed in most networks. However, although a centralized C&C structure is effective, it has a serious weakness: a single point of failure. If the IRC server or web server that controls the botnet is shut down, for example after being discovered by a detection system, the whole botnet is disabled, because it loses its C&C structure and becomes a group of isolated machines that can no longer attack as before.
To avoid this weakness of centralized structures, botmasters have started to move from centralized to decentralized structures, using peer-to-peer communication among bots. For example, Storm Worm and Nugache are two well-known P2P botnets that have infected many machines around the world.
Based on the botnet definition described previously (“the group of bots in the same botnet will show similar communication and malicious activities”), a botnet is characterized both by its C&C communication channel and by the malicious activities it performs. Other kinds of malware, such as worms, perform malicious activities but are not considered botnets, because they do not connect to a C&C channel. On the other hand, some IRC clients and P2P file-sharing tools such as BitTorrent have communication patterns similar to those of botnets, but they do not perform malicious activities.
Members of the same botnet, whether centralized or decentralized, communicate with each other through the C&C channel. Regardless of the botnet's architecture, the C&C channel plays a key role in botnet communication. In rare cases bots do not need a C&C channel, but they are then better described as isolated bots. This technique relies heavily on the premise that bots in the same botnet perform similar, synchronized activities.
Botnet architecture
In this model, two traffic monitors, the C-plane monitor and the A-plane monitor, are deployed at the edge of the network to inspect traffic between the internal and external networks. They work in parallel. The C-plane monitor logs communication flows in a format that is easy to store and analyze. The A-plane monitor detects suspicious activities that commonly occur in a network, such as scanning, spamming and exploit attempts. Next, the C-plane clustering and A-plane clustering components process the logs produced in the previous stage: each extracts features from its logs and runs clustering algorithms to find groups of hosts with similar communication or activity patterns. Finally, the cross-plane correlator combines the results of C-plane and A-plane clustering and decides which compromised hosts are members of botnets. Ideally, the traffic monitors would be distributed across different parts of the Internet, and the logs they produce would be sent to a central repository for further inspection and analysis.
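The final correlation step can be illustrated with a deliberately simplified Python sketch. BotMiner's real cross-plane correlator is more elaborate (it scores each host by how strongly its A-plane and C-plane clusters overlap). Here, as an assumption for illustration only, a host is flagged when it appears in some A-plane cluster and shares a C-plane cluster with at least one other A-plane-active host. The function name, the `min_group` parameter and the example addresses are hypothetical.

```python
def cross_plane_correlate(c_clusters, a_clusters, min_group=2):
    """Simplified cross-plane correlation (not BotMiner's actual scoring).

    c_clusters: sets of hosts with similar C-plane (communication) patterns.
    a_clusters: sets of hosts with similar A-plane (malicious activity) patterns.
    A host is flagged when it performs A-plane activity and shares a C-plane
    cluster with at least `min_group` A-plane-active hosts (itself included).
    """
    active = set().union(*a_clusters)  # every host seen in any A-plane cluster
    flagged = set()
    for c_cluster in c_clusters:
        # Hosts that communicate alike AND behave maliciously.
        coordinated = c_cluster & active
        if len(coordinated) >= min_group:
            flagged |= coordinated
    return flagged


if __name__ == "__main__":
    c_clusters = [{"10.0.0.1", "10.0.0.2", "10.0.0.3"}, {"10.0.0.4", "10.0.0.5"}]
    a_clusters = [{"10.0.0.1", "10.0.0.2", "10.0.0.9"}, {"10.0.0.4"}]
    print(sorted(cross_plane_correlate(c_clusters, a_clusters)))
```

In this example only 10.0.0.1 and 10.0.0.2 are flagged. Host 10.0.0.9 behaves maliciously but shares no communication pattern with other hosts, like a worm. Host 10.0.0.3 communicates like the others but performs no malicious activity, like a benign P2P client. This matches the botnet definition discussed above.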
Traffic monitors
C-plane Monitor. This component captures network flows and records information about which machines communicate with each other. It is based on the fcapture tool. Each flow record contains fields such as the source IP and the destination IP. The main advantage of this tool is that it runs efficiently on very high-speed networks and produces compact records for further processing by the C-plane clustering module.
A-plane Monitor. This component logs information about what each machine is doing. It mainly examines traffic leaving the internal network and detects malicious activities that internal machines may perform, such as scanning, spamming, binary downloading and exploit attempts.
A P2P botnet is a network of compromised computers (i.e., bots) under the control of an attacker (i.e., a botmaster) through a command and control (C&C) channel that is completely decentralized. A botnet typically contains thousands of bots, and some contain several million. Botnets are now used for many malicious activities, such as DDoS attacks, spam and phishing. Given the scale and strength of the attacks made possible by their combined bandwidth and processing power, botnets, and P2P botnets in particular, are now regarded as a major threat to Internet security. We reviewed previous work on P2P botnet detection, focusing mainly on BotHunter and BotMiner, the most important systems in this area.
After examining a wide range of botnet detection techniques and reviewing previous attempts at botnet detection in the literature, the need for further research on botnets and the conditions under which they operate, at higher-education level, is evident.
This chapter explains the overall plan for developing the new botnet detection framework. Outlining the whole plan is important to ensure that the project does not diverge from its main objective and that the intended outcome is achieved.
3.2 Research Approach
There are two broad approaches for evaluating whether research results meet the requirements: qualitative and quantitative.
3.2.1 Qualitative Approach
The qualitative approach seeks an in-depth understanding of behavior and the reasons that govern it. It investigates the reasons behind various aspects of behavior, and this focus distinguishes it from the quantitative approach: qualitative research examines the why and how of decision making, whereas quantitative research examines the what, where and when (Wikipedia).
3.2.2 Quantitative Approach
Quantitative methods deal with numbers and anything that can be measured, which distinguishes them from qualitative methods. Counting and measuring are common forms of quantitative methods. The outcome of the research is a figure or a series of figures, frequently presented in tables, graphs or other forms of statistics (Wikipedia).
Quantitative research is widely used in both the natural and social sciences, including sociology, physics, psychology, biology, geology, education and journalism. It typically involves (Wikipedia):
1- The generation of models, theories and hypotheses
2- The development of instruments and methods for measurement
3- Experimental control and manipulation of variables
4- Collection of empirical data
5- Modeling and analysis of data
6- Evaluation of results
There are some differences between qualitative and quantitative research approaches, as shown in Table 3.1.
Table 3.1: Qualitative and Quantitative Differences
Develops theory and tests theory as well.
Describes meaning or discovery
Establishes relationship or causation
uses communication and
Uses unstructured data collection
Uses structured data collection
3.3 Research Strategy
In any research, gathering the necessary information and verifying the requirements and methods are essential steps for completing the project and obtaining the expected results. The methods researchers commonly use are:
1- Document retrieval method.
2- Comparative study method.
3.3.1 Document Retrieval Method
Document retrieval is defined as the matching of a stated user query against a set of free-text records. These records can be any type of mainly unstructured text, such as real estate records or paragraphs in a manual (Wikipedia).
A document retrieval system locates information that meets specified criteria by matching text records (documents) against user queries, as opposed to expert systems, which answer questions by inference over a logical knowledge base. A document retrieval system consists of a collection of documents, a classification algorithm that builds a full-text index, and a user interface for accessing the index. A document retrieval system has two main tasks (Wikipedia):
1 - Find documents relevant to user queries.
2 - Evaluate the matching results and rank them by relevance, using ranking algorithms.
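These two tasks can be illustrated with a toy Python sketch. It is a deliberate simplification: relevance is scored as the number of occurrences of query terms in each document, whereas real retrieval systems use full-text indexes and more refined ranking functions such as TF-IDF. The function names and example documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents):
    """Task 1: keep documents sharing at least one query term.
    Task 2: rank them by a simple relevance score, best first.

    Returns a list of (document_index, score) pairs.
    """
    query_terms = set(tokenize(query))
    results = []
    for index, doc in enumerate(documents):
        counts = Counter(tokenize(doc))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            results.append((index, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    docs = [
        "P2P botnet detection using flow clustering",
        "IRC botnet command and control analysis",
        "Spam filtering with DNS blacklists",
    ]
    print(retrieve("botnet detection", docs))
```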
3.3.2 Comparative Study Method
In a comparative study, the items compared are specimens or cases that are alike in some respects (which is what makes them comparable) but may differ in others. These differences become the focus of examination, and the goal is to find out why the cases differ (Wikipedia).
The design of comparative research is simple, and the comparative method does not require any prior model or theory to start with. This makes it well suited to exploratory studies, from which researchers can progress from the preliminary level of case studies to a more sophisticated level of theoretical invariance. In this project, comparative studies based on literature reviews have been used: as discussed in Chapter 2, many similar frameworks and detection algorithms have been analyzed in detail.
3.4 Overview of Methodology
Figure 3.1 shows the structure of our proposed botnet detection method, which consists of four main components, Filtering, Application Classifier, Traffic Monitoring and Malicious Activity Detector, plus an Analyzer. Filtering reduces the volume of traffic flows; its main benefit is lowering the traffic workload so that the Application Classifier works properly. The Application Classifier separates IRC and HTTP traffic from the rest of the traffic. The Malicious Activity Detector analyzes the traffic in detail, tries to detect malicious activities that internal hosts may perform, and passes those hosts to the next stage. Traffic Monitoring detects groups of hosts with similar behavior and communication patterns by inspecting network traffic. The Analyzer is a simple component that compares the results of the previous parts (Traffic Monitoring and Malicious Activity Detector) and finds the hosts that appear in both lists.
The main benefit of this stage is reducing the traffic workload and making the application classifier more efficient. Many solutions could be used for this purpose. For example, we can filter out traffic whose destination IPs belong to well-known servers such as Google, Yahoo, MSN and other popular web servers around the world.
3.4.1 Application classifier
Classifying network traffic by application is a complex problem that many researchers are working on. Traffic classification has traditionally relied on port numbers, which used to be effective. However, with the growth of applications that tunnel through HTTP and the emergence of P2P, port numbers are no longer reliable for this purpose.
Because recognizing suspicious HTTP and IRC traffic is relatively simple, we first separate the traffic that uses these protocols and then send the rest of the traffic to the Traffic Monitoring stage.
3.4.2 Traffic monitoring
The flow logging stage records the packet header information we need. Which information to record from network flows depends mainly on the behaviors we want to monitor and the target we are looking for. Based on some initial findings and an evaluation of similar work, we consider the source IP, destination IP, destination port and, above all, the number of packets and bytes transferred to be good candidates.
3.4.3 Malicious activity detector
This stage analyzes the traffic in detail and tries to detect malicious activities that internal hosts may perform. It is more complex than the other stages. The system must be able to detect at least the best-known malicious activities, such as scanning, binary downloading and spamming.
The Analyzer then has to examine the complete output of the previous stages (Traffic Monitoring and Malicious Activity Detector).
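Because the Analyzer only has to find the hosts that appear in both result lists, its core can be sketched as a set intersection. The function name and example addresses are hypothetical. Sorting with `ipaddress.ip_address` as the key orders addresses numerically rather than as strings, so 10.0.0.9 comes before 10.0.0.10.

```python
import ipaddress


def analyze(traffic_monitoring_hosts, malicious_activity_hosts):
    """Return hosts reported by both Traffic Monitoring and the
    Malicious Activity Detector, in numeric IP order."""
    common = set(traffic_monitoring_hosts) & set(malicious_activity_hosts)
    return sorted(common, key=ipaddress.ip_address)


if __name__ == "__main__":
    similar_behavior = ["10.0.0.5", "10.0.0.10", "10.0.0.9"]
    malicious = ["10.0.0.10", "10.0.0.9", "10.0.0.12"]
    print(analyze(similar_behavior, malicious))
```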
3.5 Limitation of Research
This research has several limitations. The most significant is the lack of access to samples of P2P-based botnets. To inspect botnet behavior in a network, we would need to install a P2P botnet in the network, which is currently impossible because of the high risk of its unexpected behavior. It is also unlikely that we could obtain access to the gateway of a large network, such as a university network, to analyze the incoming traffic.
In this chapter we presented the proposed algorithm for P2P botnet detection and briefly analyzed its main stages. Our proposed method is based on common behavior in network communication. The technique is extensible and could be adapted to detect centralized botnets as well.
RESULT AND ANALYSIS
This chapter completes the related literature review and addresses the objectives stated in Chapter 1. In addition, a scenario-based analysis of the method is presented.
4.2 Botnet Detection Framework and Components
Our proposed framework is based on common behavior in network communication. It builds on the definition of P2P botnets whereby bots in the same botnet exhibit similar command and control communication patterns and similar malicious activity patterns. We adopt essentially the same definition of a botnet as Gu et al. in BotMiner, and many other researchers' detection models are based on the same definition.
Figure 4.1 illustrates the structure of our proposed botnet detection system, which consists of four core components, Filtering, Application Classifier, Traffic Monitoring and Malicious Activity Detector, plus a small Analyzer component.
4.2.1 Filtering
The main objective of this part is to reduce the traffic workload so that the rest of the system performs more efficiently. Filtering must take place at the network gateway; Figure 4.2 shows where our system has to be placed in order to filter traffic efficiently.
In C1, we filter out traffic whose destination IP address belongs to well-known servers that very rarely host botnet C&C servers. For this purpose we used the top 500 websites on the web (Http://www.alexa.com/topsites), of which the top three are google.com, facebook.com and yahoo.com. In C2, we filter out traffic from handshakes (connection establishments) that were never completed. Handshaking is the initial negotiation that automatically sets the parameters of a communication channel between two parties before normal communication begins; it follows the physical establishment of the channel and precedes normal information transfer. A common example in networks is TCP, which uses a three-way handshake to establish a connection. Here we filter out flows in which the TCP handshake was not completed, for example a host sending SYN packets without completing the handshake. In our experience, such flows are mostly caused by scanning activities. Figure 4.3 illustrates the stages of our filtering system.
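The two filtering conditions can be sketched in Python under several assumptions, all hypothetical. Flows are represented by a `Flow` record holding the ordered TCP flag sets seen in the flow. The whitelist of popular-server addresses is assumed to have been resolved beforehand from the top-sites list; the address below is a documentation placeholder, not a real Google or Yahoo server. A handshake counts as completed when SYN, SYN+ACK and ACK appear in that order. This simplification ignores packet direction and sequence numbers, which real connection tracking would check.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    src_ip: str
    dst_ip: str
    protocol: str  # "TCP" or "UDP"
    # Ordered TCP flag sets seen in the flow, e.g. [{"SYN"}, {"SYN", "ACK"}, {"ACK"}]
    tcp_flags: list = field(default_factory=list)


def handshake_completed(flags_sequence):
    """True if SYN, then SYN+ACK, then a bare ACK appear in that order."""
    expected = [{"SYN"}, {"SYN", "ACK"}, {"ACK"}]
    step = 0
    for flags in flags_sequence:
        if step < len(expected) and set(flags) == expected[step]:
            step += 1
    return step == len(expected)


def keep_flow(flow, whitelist):
    # C1: drop flows to well-known servers that rarely host C&C servers.
    if flow.dst_ip in whitelist:
        return False
    # C2: drop TCP flows whose three-way handshake never completed (e.g. scans).
    if flow.protocol == "TCP" and not handshake_completed(flow.tcp_flags):
        return False
    return True


def filter_flows(flows, whitelist):
    """Return only the flows that pass both filtering conditions."""
    return [f for f in flows if keep_flow(f, whitelist)]


if __name__ == "__main__":
    full = [{"SYN"}, {"SYN", "ACK"}, {"ACK"}]
    flows = [
        Flow("10.0.0.5", "192.0.2.10", "TCP", full),               # dropped by C1
        Flow("10.0.0.5", "198.51.100.7", "TCP", [{"SYN"}, {"SYN"}]),  # dropped by C2
        Flow("10.0.0.6", "198.51.100.8", "TCP", full),             # kept
        Flow("10.0.0.6", "203.0.113.4", "UDP"),                    # kept
    ]
    for f in filter_flows(flows, whitelist={"192.0.2.10"}):
        print(f.src_ip, "->", f.dst_ip, f.protocol)
```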
4.2.2 Application Classifier
The Application Classifier separates IRC and HTTP traffic from the rest of the traffic and sends it to the Centralized part. To detect IRC traffic, we can inspect the contents of each packet and match the data against a set of user-defined strings. For this purpose we use payload inspection that examines only the first few bytes of the payload, looking for specific strings. These IRC-specific strings are NICK for the client's nickname, PASS for a password, USER for the username, JOIN for joining a channel, OPER for a regular user requesting IRC operator privileges, and PRIVMSG for sending a message to a user or a channel. Detecting IRC traffic with this strategy is fairly simple, as shown in figure 4.4.
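A minimal Python sketch of this payload check follows, assuming payloads are available as raw bytes from captured packets. The number of inspected bytes (`PREFIX_LEN = 64`) is a hypothetical choice. Testing whether any line in that prefix begins with one of the IRC commands listed above, after skipping an optional `:source` prefix that server-relayed messages carry, is one simple way to implement the string matching.

```python
IRC_COMMANDS = {b"NICK", b"PASS", b"USER", b"JOIN", b"OPER", b"PRIVMSG"}
PREFIX_LEN = 64  # hypothetical number of payload bytes to inspect


def looks_like_irc(payload: bytes, prefix_len: int = PREFIX_LEN) -> bool:
    """Return True if a line in the payload prefix starts with a known IRC command."""
    for line in payload[:prefix_len].splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Messages relayed by a server may begin with a ":source" prefix; skip it.
        if tokens[0].startswith(b":"):
            tokens = tokens[1:]
        if tokens and tokens[0].upper() in IRC_COMMANDS:
            return True
    return False


if __name__ == "__main__":
    print(looks_like_irc(b"NICK bot123\r\nUSER bot 0 * :bot\r\n"))
    print(looks_like_irc(b":irc.example.net PRIVMSG #chan :hello\r\n"))
    print(looks_like_irc(b"GET /index.html HTTP/1.1\r\n"))
```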
Next, we also have to separate HTTP traffic and send it to the Centralized part. Here too we can inspect the first few bytes of each HTTP request and, if they contain certain patterns or strings, separate the flow and send it to the Centralized part. To detect HTTP traffic, we rely on how the HTTP protocol works. HTTP uses the client-server model: an HTTP client opens a connection and sends a request message to an HTTP server (e.g. "Get me the file 'home.html'"); the server returns a response message, usually the requested resource ("Here's the file", followed by the file itself). In HTTP/1.0 the server then closes the connection, whereas HTTP/1.1 keeps connections open for reuse by default.
In the HTTP request message format, we focus on the HTTP methods. Three common HTTP methods are “GET”, “HEAD” and “POST”:
i. GET is the most common HTTP method; it says "give me this file".
ii. A HEAD request is like a GET request, except that it asks the server to return only the response headers.
iii. A POST request is used to send data to the server.
Therefore, we inspect the traffic, and if the first few bytes of a request contain “GET”, “POST” or “HEAD”, this indicates the HTTP protocol; we then separate those flows and send them to the Centralized part, as shown in figure 4.5. After IRC and HTTP traffic have been filtered out, the remaining traffic, which may contain P2P traffic, is sent to Traffic Monitoring and the Malicious Activity Detector. In parallel, other approaches can be used to identify P2P traffic, bearing in mind that P2P is one of the most complex application types to classify. Payload-based classification approaches tailored to P2P traffic have been presented in [109, 110], while identification of P2P traffic through transport-layer characteristics is proposed in . Rather than simply forwarding all remaining traffic, we suggest using a dedicated tool such as BLINC, which can identify P2P traffic in general. BLINC is based on recognizing host behavior at the transport layer and has two notable properties: first, it can operate without any access to packet payloads, and second, it needs no knowledge of port numbers .
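The HTTP check is analogous to the IRC check. The Python sketch below, again assuming raw payload bytes, tests whether a payload begins with one of the three request methods followed by a space, as in a request line such as `GET /index.html HTTP/1.1`. Requiring the trailing space is an assumption added to avoid matching payloads that merely begin with those letters. The `route` function and its labels are hypothetical names for the two destinations described above.

```python
HTTP_METHODS = (b"GET ", b"HEAD ", b"POST ")


def looks_like_http_request(payload: bytes) -> bool:
    """Return True if the payload starts with a GET, HEAD or POST request line."""
    return payload.startswith(HTTP_METHODS)  # bytes.startswith accepts a tuple


def route(payload: bytes) -> str:
    """Send HTTP requests to the Centralized part; everything else is passed on
    to Traffic Monitoring and the Malicious Activity Detector."""
    return "centralized" if looks_like_http_request(payload) else "remaining"


if __name__ == "__main__":
    print(route(b"GET /home.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))
    print(route(b"\x13BitTorrent protocol"))
```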
4.2.3 Traffic Monitoring
Traffic Monitoring detects groups of hosts that have similar behavior and communication patterns by inspecting network traffic. To do this, we capture network flows and record specific information about each flow. We use Wireshark, an open-source tool [ ], to monitor flows and record the information needed in this part. Each flow record contains the following information: source IP (SIP) address, destination IP (DIP) address, source port (SPORT), destination port (DPORT), duration, protocol, number of packets (np) and number of bytes (nb) transmitted.
Table 4.1: Recorded information of network flows using ARGUS
We then store this information in a database like Table 4.1, in which each row is a network flow. Next, we fix a time period of 6 hours. Within each 6-hour period, all n flows that share the same source IP, destination IP, destination port and protocol (TCP or UDP) are marked, and for each of these flows (rows) we calculate the average number of bytes per second and the average number of bytes per packet:
* Average number of bytes per second (nbps) = Number of bytes / Duration
* Average number of bytes per packet (nbpp) = Number of bytes / Number of packets
We then insert these two new values (nbps and nbpp), together with the SIP and DIP of the marked flows, into another database, similar to Table 4.2. As a result, for each 6-hour period we obtain a set of databases, each containing flows with the same SIP, DIP, DPORT and protocol (TCP/UDP). In this part we focus only on the TCP and UDP protocols.
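The grouping and feature computation can be sketched in Python. The following are assumptions for illustration: each flow is a dictionary with the Table 4.1 fields plus a hypothetical `start` timestamp in seconds; 6-hour windows are aligned to multiples of 21,600 seconds; and flows with zero duration or zero packets are skipped, because the averages are undefined for them.

```python
from collections import defaultdict

WINDOW_SECONDS = 6 * 60 * 60  # the 6-hour period


def group_flows(flows):
    """Group TCP/UDP flows by (window, SIP, DIP, DPORT, protocol) and compute
    (nbpp, nbps) for each flow, where nbpp = nb / np and nbps = nb / duration.

    Each group corresponds to one "database" like Table 4.2.
    """
    groups = defaultdict(list)
    for f in flows:
        if f["proto"] not in ("TCP", "UDP"):
            continue  # only TCP and UDP are considered
        if f["duration"] <= 0 or f["np"] <= 0:
            continue  # averages are undefined for these flows
        window = int(f["start"] // WINDOW_SECONDS)
        key = (window, f["sip"], f["dip"], f["dport"], f["proto"])
        nbpp = f["nb"] / f["np"]        # average bytes per packet
        nbps = f["nb"] / f["duration"]  # average bytes per second
        groups[key].append((nbpp, nbps))
    return dict(groups)


if __name__ == "__main__":
    flows = [
        {"start": 100, "sip": "10.0.0.5", "dip": "198.51.100.7", "dport": 8080,
         "proto": "TCP", "duration": 2.0, "np": 10, "nb": 4000},
        {"start": 5000, "sip": "10.0.0.5", "dip": "198.51.100.7", "dport": 8080,
         "proto": "TCP", "duration": 4.0, "np": 20, "nb": 6000},
        {"start": 200, "sip": "10.0.0.6", "dip": "198.51.100.7", "dport": 0,
         "proto": "ICMP", "duration": 1.0, "np": 1, "nb": 84},
    ]
    for key, points in group_flows(flows).items():
        print(key, points)
```

Both TCP flows fall into window 0 and share a key, so they form one database with the points (400.0, 2000.0) and (300.0, 1500.0); the ICMP flow is ignored.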
Table 4.2: Database for analogous flows
As mentioned earlier, bots in the same botnet share characteristics: they have similar behavior and communication patterns, and these similarities are most obvious when they fetch commands from the botmaster or attack a target. The next step is therefore to look for groups of databases that are similar to each other.
We propose a simple method for finding similarities among databases. For each database we draw a graph on x-y axes, where the x-axis is the average number of bytes per packet (nbpp) and the y-axis is the average number of bytes per second (nbps), i.e. (x, y) = (nbpp, nbps).
For example, in database (), each row has an nbpp value that gives the x-coordinate and an nbps value that gives the y-coordinate; together they determine a point (x, y) on the graph. We repeat this procedure for every row (network flow) of each database. As a result, each database yields a set of points that, when connected, form a curve. Figure 4.6 shows an example of two different databases whose graphs, under our assumption, are almost identical.
The next step is to compare the x-y graphs: within each 6-hour period, graphs that are similar to each other are clustered into the same category. The result is a set of groups of similar graphs, each graph corresponding to one of the databases from the previous step. We record the SIP addresses of the corresponding hosts and send this list to the next step for analysis.
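The text does not fix how two graphs are compared, so the Python sketch below substitutes one plausible measure as an explicit assumption. Within one 6-hour window, each database is treated as a set of (nbpp, nbps) points, and both axes are divided by hypothetical scale constants so they are comparable. The distance between two databases is the symmetric mean nearest-neighbour distance between their point sets. Databases are grouped greedily when that distance is at most a hypothetical threshold, and the SIP addresses of the grouped databases form the list passed on for analysis.

```python
import math


def normalize(points, x_scale=1500.0, y_scale=10000.0):
    # Hypothetical scales: roughly a maximum packet size, and a typical byte rate.
    return [(x / x_scale, y / y_scale) for x, y in points]


def point_set_distance(a, b):
    """Symmetric mean nearest-neighbour distance between two point lists."""
    def one_way(p, q):
        return sum(min(math.dist(x, y) for y in q) for x in p) / len(p)
    return (one_way(a, b) + one_way(b, a)) / 2


def cluster_databases(databases, threshold=0.05):
    """Greedily group databases whose graphs are similar.

    databases: dict mapping (sip, dip, dport, proto) -> list of (nbpp, nbps).
    Returns clusters (lists of keys) that have at least two members.
    """
    points = {k: normalize(v) for k, v in databases.items()}
    clusters = []
    for key in databases:
        for cluster in clusters:
            if any(point_set_distance(points[key], points[m]) <= threshold
                   for m in cluster):
                cluster.append(key)
                break
        else:  # no similar cluster found: start a new one
            clusters.append([key])
    return [c for c in clusters if len(c) > 1]


def suspicious_sips(clusters):
    """Collect the SIP addresses of all databases in the clusters."""
    return sorted({key[0] for cluster in clusters for key in cluster})


if __name__ == "__main__":
    dbs = {
        ("10.0.0.5", "198.51.100.7", 6881, "UDP"): [(100.0, 500.0), (120.0, 600.0)],
        ("10.0.0.6", "198.51.100.8", 6881, "UDP"): [(101.0, 505.0), (119.0, 598.0)],
        ("10.0.0.9", "203.0.113.4", 443, "TCP"): [(1400.0, 90000.0)],
    }
    print(suspicious_sips(cluster_databases(dbs)))
```

A greedy pass was chosen for brevity. A proper hierarchical or density-based clustering would be less sensitive to the order in which databases are visited.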
4.2.4 Malicious Activity Detector
In this part we examine outbound network traffic and try to detect potential malicious activities performed by internal machines. Scanning and spamming are the two malicious activities that botnets most commonly perform in a network, although botnets can perform many others.
Botnets use scanning to propagate malware such as worms and viruses, and also as a platform for DoS attacks. As figure 4.7 shows, scanning takes place in many areas around the world. There has been little work on the problem of detecting scan activities. Most scan detection has been based on detecting N events within a time interval of T seconds. This approach has the problem that once the window size is known, attackers can easily evade detection by increasing their scanning interval. Snort also uses this approach, through two preprocessors. The first is packet-oriented and concentrates on detecting the abnormal packets used for “stealth scanning” by tools such as nmap . The second is connection-oriented. Snort suffers from the same drawbacks as Network Security Monitor (NSM) [62], since both rely on the same metrics .
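The fixed-window rule described above (flag a source after N events within T seconds) can be sketched in Python. The values n = 20 and t = 60 seconds are hypothetical, and an "event" is assumed here to be a contact with a distinct destination. The usage lines show the weakness noted in the text: a scanner that learns the window simply slows down and is never flagged.

```python
from collections import defaultdict, deque


class WindowScanDetector:
    """Flag a source that contacts at least n distinct destinations within t seconds."""

    def __init__(self, n=20, t=60.0):  # hypothetical N and T
        self.n = n
        self.t = t
        self.events = defaultdict(deque)  # source -> deque of (timestamp, destination)

    def observe(self, timestamp, src, dst):
        """Record one connection attempt; return True if src is now flagged."""
        window = self.events[src]
        window.append((timestamp, dst))
        # Drop events that have fallen out of the sliding window.
        while window and timestamp - window[0][0] > self.t:
            window.popleft()
        return len({d for _, d in window}) >= self.n


if __name__ == "__main__":
    detector = WindowScanDetector()
    # Fast scanner: 20 destinations in 20 seconds -> flagged on the 20th.
    fast = [detector.observe(float(i), "203.0.113.5", f"10.0.0.{i}") for i in range(20)]
    print(fast[-1])
    # Slow scanner: one destination every 5 seconds -> at most 13 in any window.
    slow = [detector.observe(i * 5.0, "203.0.113.9", f"10.0.1.{i}") for i in range(100)]
    print(any(slow))
```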
Another work on scan detection is Staniford et al.'s Stealthy Probing and Intrusion Correlation Engine (SPICE) . SPICE focuses on detecting stealthy scans, especially scans that are spread across several source IP addresses and executed at very low rates.
Quick response is an important requirement for our system; however, detecting malicious scanners quickly is a difficult task. Another possible solution is Threshold Random Walk (TRW) , an online detection algorithm based on sequential hypothesis testing.
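Threshold Random Walk can be sketched as a sequential probability ratio test over a remote host's first-contact connection outcomes. Each successful connection multiplies a likelihood ratio by θ1/θ0 and each failure by (1 − θ1)/(1 − θ0), where θ0 and θ1 are the success probabilities assumed for benign hosts and for scanners. The host is declared a scanner once the ratio reaches η1 = P_D/P_F, and benign once it falls to η0 = (1 − P_D)/(1 − P_F), where P_D and P_F are the target detection and false-positive probabilities. The parameter values in this Python sketch (θ0 = 0.8, θ1 = 0.2, P_D = 0.99, P_F = 0.01) are illustrative.

```python
class TRW:
    """Sequential hypothesis test on first-contact connection outcomes of one host."""

    def __init__(self, theta0=0.8, theta1=0.2, pd=0.99, pf=0.01):
        self.success_factor = theta1 / theta0              # < 1: pushes toward benign
        self.failure_factor = (1 - theta1) / (1 - theta0)  # > 1: pushes toward scanner
        self.eta1 = pd / pf              # upper threshold: declare scanner
        self.eta0 = (1 - pd) / (1 - pf)  # lower threshold: declare benign
        self.ratio = 1.0
        self.decision = None

    def observe(self, success):
        """Update with one connection outcome; return "scanner", "benign" or None."""
        if self.decision is not None:
            return self.decision
        self.ratio *= self.success_factor if success else self.failure_factor
        if self.ratio >= self.eta1:
            self.decision = "scanner"
        elif self.ratio <= self.eta0:
            self.decision = "benign"
        return self.decision


if __name__ == "__main__":
    scanner = TRW()
    print([scanner.observe(False) for _ in range(4)])
    client = TRW()
    print([client.observe(True) for _ in range(4)])
```

With these values, four consecutive failures (ratio 4^4 = 256 ≥ 99) are enough to declare a scanner. This illustrates why TRW responds quickly, which is the requirement stated above.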
After assessing different approaches for detecting scanning activities, we consider the Statistical sCan Anomaly Detection Engine (SCADE) the best solution for this part. Figure 4.8 shows the architecture of SCADE within the BotHunter framework.
Inbound Scan Detection (ISD): In this part, SCADE focuses on detecting scans that target ports commonly used by malware. An advantage of this procedure is that it is less vulnerable to DoS attacks, mainly because its memory trackers do not maintain state for every external source IP address; SCADE tracks only scans targeted at internal hosts. ISD is based on failed connection attempts. SCADE distinguishes two types of ports: high-severity (HS) ports, which represent highly vulnerable and commonly exploited services, and low-severity (LS) ports. In its current configuration, SCADE designates a set of TCP and UDP ports as high-severity and all other ports as low-severity. Failed scan attempts are weighted differently depending on the type of port.
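The ISD warning for a local host is produced based on an anomaly score calculated with the following formula:
$$s = w_1 F_{hs} + w_2 F_{ls}$$
F_{hs}: the number of failed attempts at high-severity ports.
F_{ls}: the number of failed attempts at low-severity ports.
w_1, w_2: the weights assigned to failed attempts at high-severity and low-severity ports, respectively.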
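The ISD score s = w1·F_hs + w2·F_ls can be computed per internal host from its failed inbound connection attempts. In this Python sketch the set of high-severity ports, the weights w1 = 1.0 and w2 = 0.25, and the alarm threshold are hypothetical values chosen for illustration, not SCADE's actual configuration. The listed ports are well-known targets (Windows RPC, NetBIOS, SMB and MS-SQL), used here only as examples.

```python
from collections import Counter

# Hypothetical high-severity (protocol, port) pairs: commonly exploited services.
HS_PORTS = {("TCP", 135), ("TCP", 139), ("TCP", 445), ("UDP", 1434)}
W1, W2 = 1.0, 0.25      # hypothetical weights for HS and LS failures
ALARM_THRESHOLD = 5.0   # hypothetical


def isd_scores(failed_attempts):
    """failed_attempts: iterable of (internal_host, protocol, port) for failed
    inbound connection attempts. Returns {host: s} with s = W1*F_hs + W2*F_ls."""
    f_hs = Counter()
    f_ls = Counter()
    for host, proto, port in failed_attempts:
        if (proto, port) in HS_PORTS:
            f_hs[host] += 1
        else:
            f_ls[host] += 1
    hosts = set(f_hs) | set(f_ls)
    return {h: W1 * f_hs[h] + W2 * f_ls[h] for h in hosts}


def isd_alarms(failed_attempts, threshold=ALARM_THRESHOLD):
    """Internal hosts whose ISD score reaches the threshold."""
    return sorted(h for h, s in isd_scores(failed_attempts).items() if s >= threshold)


if __name__ == "__main__":
    attempts = [("10.0.0.5", "TCP", 445)] * 5 + [("10.0.0.6", "TCP", 8081)] * 8
    print(isd_scores(attempts))
    print(isd_alarms(attempts))
```

Here five failures on an HS port outweigh eight failures on an LS port (score 5.0 versus 2.0), which reflects the heavier weighting of commonly exploited services.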
Outbound Scan Detection (OSD): In this part, SCADE uses three anomaly detection modules that work together and track all outbound connections of each internal host:
• Outbound scan rate (s1): Detects local hosts that perform high-rate scans for many external addresses.
• Outbound connection failure rate (s2): Detects unusually high connection failure rates, with sensitivity to HS port usage. The anomaly score s2 is calculated as:
$$s_2 = \frac{w_1 F_{hs} + w_2 F_{ls}}{C}$$
F_{hs}: the number of failed attempts at high-severity ports.
F_{ls}: the number of failed attempts at low-severity ports.
w_1, w_2: the weights for failures at high-severity and low-severity ports.
C: the total number of scans from the host within a time window.
• Normalized entropy of scan target distribution (s3): A uniformly distributed scan target pattern indicates a possible outbound scan. An anomaly scoring technique based on normalized entropy is used to identify such candidates:
$$s_3 = \frac{H}{\ln m}, \qquad H = -\sum_{i=1}^{m} p_i \ln p_i$$
H: the entropy of the scan target distribution.
m: the total number of scan targets.
p_i: the percentage of the scans at target i.
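The two OSD formulas, s2 = (w1·F_hs + w2·F_ls)/C and s3 = H/ln m with H = −Σ p_i ln p_i, can be checked numerically. The Python sketch below assumes that the failure counts and scan targets of one internal host within one time window have already been collected. The weights are hypothetical, and returning 0 when C = 0 or when there is only one distinct target (where ln m = 0) is an assumption made to avoid division by zero.

```python
import math
from collections import Counter

W1, W2 = 1.0, 0.25  # hypothetical weights for HS and LS failures


def s2(f_hs, f_ls, total_scans):
    """Outbound connection failure rate score: (W1*F_hs + W2*F_ls) / C."""
    if total_scans == 0:
        return 0.0
    return (W1 * f_hs + W2 * f_ls) / total_scans


def s3(scan_targets):
    """Normalized entropy H / ln(m) of the scan target distribution.

    scan_targets: one entry per scan, naming the target it was sent to.
    Values near 1 mean scans are spread evenly across many targets.
    """
    counts = Counter(scan_targets)
    m = len(counts)  # number of distinct scan targets
    if m < 2:
        return 0.0  # ln(1) = 0: a single target carries no spread
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(m)


if __name__ == "__main__":
    print(s2(f_hs=6, f_ls=4, total_scans=20))        # (6 + 1) / 20
    print(s3(["a", "b", "c", "d"]))                  # even spread, close to 1
    print(s3(["a"] * 9 + ["b"]))                     # concentrated, well below 1
```

The entropy is normalized by ln m so that s3 lies between 0 and 1 regardless of how many targets a host contacted. The choice of logarithm base therefore does not affect the score.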
Spam-related Activities
E-mail spam is the practice of sending unsolicited email messages in large numbers to an indiscriminate set of recipients. More than 95% of email on the Internet is spam, and most of it is sent from botnets. As shown in figure 4.9, many malicious activities are related to email, such as phishing email, spam email and spoofed email.
A common approach to detecting spam is to use a DNS-based Blackhole List (DNSBL), such as those listed at http://www.dnsbl.info/dnsbl-list.php. A DNSBL publishes a list of IP addresses of known spam senders, and SMTP servers block mail according to this list. This method is not effective against bot-infected hosts, because the legitimate IP addresses of hosts in our network may be used to send spam.
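A DNSBL is queried through ordinary DNS. The octets of the IPv4 address are reversed and prepended to the list's zone, and an A record in the answer means the address is listed (the convention described in RFC 5782). The Python sketch below builds the query name and performs the optional lookup with the standard `socket.gethostbyname`. The zone `dnsbl.example.org` is a placeholder, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="dnsbl.example.org"):
    """Build the DNSBL query name, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="dnsbl.example.org"):
    """Return True if the DNSBL answers with an A record (requires network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:  # no record: address is not listed
        return False


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99"))
```

As the paragraph above notes, a freshly infected host inside the monitored network usually has a clean, unlisted address, so this check alone would not flag it.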
Creating or abusing SMTP mail relays for spam is one of the best-known forms of botnet exploitation. User-level mail client applications use SMTP, as shown in figure 4.10, to send messages to a mail server for relaying. For receiving messages, however, client applications usually use the Post Office Protocol (POP) or the Internet Message Access Protocol (IMAP) to access the mailbox on a mail server. Our idea in this part is simple and efficient. Our goal here is not to recognize which email message is spam,