Parsing Operations Based Approach Towards Phishing Attacks

Web attacks are currently among the most common forms of cyber crime. Phishing attacks, SSL attacks, and some other hacking attacks are generally placed in this category. Protecting against these attacks is a major issue in internet security.

This paper presents an approach that analyzes web URLs through parsing operations to provide security against web attacks. The methodology is based on several parsing operations, each of which uses a different technique to detect phishing attacks as well as other web attacks. The approach runs entirely inside the browser, and it therefore also affects browsing speed. It includes a DB-generated query operation, an operation that retrieves and checks the URL details, and others. Using the proposed methodology, a new browser detects phishing attacks, SSL attacks, and some other hacking attacks. In our experiments, the new browser achieved 98.14% security against web attacks.

In the current scenario, cyber crime is a widespread and major issue on the internet. These crimes can be defined as criminal activities that include illegal access to data, illegal interception of data, eavesdropping on data within an information technology infrastructure, data interference (which includes unauthorized damaging, deletion, deterioration, alteration, or suppression of computer data), unethical access to web services, disturbance of social peace, systems interference (interfering with the functioning of a computer system by inputting, transmitting, damaging, deleting, deteriorating, altering, or suppressing computer data), misuse of devices, forgery (ID theft), and electronic fraud. [1][4]

Cyber crime issues have become high-profile, particularly those surrounding hacking, copyright infringement, child pornography and child grooming.

In the field of internet security, phishing is the most common web attack. Phishing can be defined as the criminally fraudulent process of attempting to acquire sensitive user information (such as usernames and passwords) and other confidential information (such as security keys and credit card or debit card details) by masquerading as a trustworthy entity in an electronic communication. Communications purporting to be from popular social websites, auction sites, online payment gateways, or IT administrators are commonly used to lure the unsuspecting public. Phishing attacks are typically carried out by e-mail or instant messaging, and they often direct users to enter details at a fake website whose look and feel are almost identical to the legitimate one. Even when server authentication is used, it may require tremendous skill to detect that the website is fake. Phishing is an example of the social engineering techniques used to fool users. It exploits the poor usability of current web security technologies to break the security of web services and to gain unethical access to protected information. [8]

The security of a system depends upon the following properties: confidentiality, authenticity, integrity, and non-repudiation, which together form the acronym "CAIN". [12]

In this document, we propose a new technique for stopping phishing attacks by parsing the web URL (Uniform Resource Locator) before the URL is visited. Multiple parsers are used, each performing a different operation, to detect phishing attacks. In this methodology the browser takes a more active part in the process of detecting phishing attacks.

Related Work

Many techniques and algorithms have been developed and implemented to prevent phishing and to protect confidential information (usernames, passwords, security keys, and credit card or debit card details) from theft. However, some issues remain open.

Many techniques and schemes have been proposed to provide a secure environment for e-banking services, e-commerce services, and payment gateway services, and to block sniffing, eavesdropping, and similar attacks, so that confidential information is protected in transit and unauthorized personnel cannot access it.

Nevertheless, phishing attacks are increasing day by day. While most phishing attacks target financial transaction websites (banking, e-commerce, e-shopping, and payment gateway websites), more and more phishing incidents targeting online game operators and large ISPs (internet service providers) have also been discovered.

There have been technical approaches (e.g. toolbars) and training approaches (e.g. tips) to mitigate phishing. Anti-phishing toolbars are web browser plug-ins that warn users when they reach a suspected phishing site (an anti-phishing approach that uses training intervention for phishing website detection). Anti-phishing tools use two major methods for identifying phishing sites. The first method is to use heuristics to check the host name and the URL for common spoofing techniques. The second method is to use a blacklist that lists phishing URLs. The heuristics approach is not 100% accurate. It produces false negatives (FN), i.e. a phishing site is mistakenly judged as legitimate, which means it does not correctly identify all phishing sites. The heuristics also often produce a high number of false positives (FP), i.e. a legitimate site is incorrectly identified as fraudulent. Blacklists have a high level of accuracy because they are constructed by paid experts who verify a reported URL and add it to the blacklist if it is considered a phishing website. [1][4][8]
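The two error types described above can be turned into rates once a tool's verdicts are compared with the true status of each site. The following is a minimal sketch, not part of any particular toolbar; the function name `error_rates` and the example counts are assumptions made only for illustration.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (false_negative_rate, false_positive_rate).

    tp: phishing sites correctly flagged
    fp: legitimate sites wrongly flagged as phishing
    tn: legitimate sites correctly passed
    fn: phishing sites wrongly judged legitimate
    """
    # Share of all phishing sites that the tool missed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    # Share of all legitimate sites that the tool wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr


# Hypothetical counts, chosen only to illustrate the arithmetic.
fnr, fpr = error_rates(tp=90, fp=5, tn=95, fn=10)
print(f"FN rate: {fnr:.2%}, FP rate: {fpr:.2%}")
```

A tool with a low FN rate but a high FP rate catches most phishing sites at the cost of many false alarms, which is the trade-off attributed to heuristics above.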

Detecting and identifying phishing websites in real time, particularly for e-banking, is a complex and dynamic problem involving many factors and criteria. Methods such as improving site authenticity, one-time passwords, separate login and transaction passwords, personalized e-mail communication, and user education about phishing are being implemented to prevent phishing attacks. Even so, many phishing detection and prevention tools are not 100% secure.

Proposed Methodology

Our browser-based methodology against phishing attacks starts from two observations: most of the time, phishing websites are newly registered domains, and their names share some portion of the real website's domain. We propose a query-based analyzer approach that builds on these facts and on some others.

Our methodology uses a knowledge base that contains information about web domains previously blacklisted for the particular user. This knowledge base is also used in detecting phishing attacks.

In the proposed methodology, the web URL is passed through several parsers to detect phishing attacks. The proposed browser-based approach follows these steps:

Initially, the web URL is passed to parser-A. During the parsing operation, if parser-A finds 4 or more dot (.) characters in the web URL, it generates a pop-up alert box for that URL, because the URL may be a phishing website URL.

This parsing step is based on the fact that phishing attackers use some fraction of the actual URL, combined with additional dot (.) characters, to build the phishing URL. However, this is not true for every phishing website. The proposed browser methodology therefore also follows some other steps to detect phishing attacks and to provide a secure platform for the transmission of information and confidential data over the internet.
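The dot-count check performed by parser-A can be sketched as follows. This is a minimal illustration, assuming the dots are counted over the whole URL string as the text describes; the function name `parser_a` and the example URLs are hypothetical and are not taken from the AP browser.

```python
DOT_THRESHOLD = 4  # from the text: 4 or more dots triggers an alert


def parser_a(url: str, threshold: int = DOT_THRESHOLD) -> bool:
    """Return True when the URL contains `threshold` or more '.' characters."""
    return url.count(".") >= threshold


# Hypothetical URLs used only for illustration.
for url in ("https://www.example.com/",
            "http://www.paypal.com.secure.login.example.net/"):
    verdict = "ALERT: possible phishing URL" if parser_a(url) else "no alert"
    print(url, "->", verdict)
```

Counting over the whole URL also counts dots in the path (for example in file names), so a variant that counts only the dots in the host name may raise fewer false alarms; that variant is an assumption and is not described in the text.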

After the parser-A operation is complete, the URL is passed to parser-B. This parser retrieves other details of the URL (such as the year of domain registration, the rating of the domain, and the popularity of the domain). Using those details, parser-B declares the URL to be either a phishing website URL or an actual website URL.
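The text does not state how parser-B obtains the domain details or how it combines them, so the following is only a sketch under assumptions. The `DomainDetails` record, the rating scale, the popularity rank, the thresholds, and the decision rule are all hypothetical; only the three kinds of detail (registration year, rating, popularity) come from the text.

```python
from dataclasses import dataclass


@dataclass
class DomainDetails:
    registration_year: int  # year the domain was registered
    rating: float           # hypothetical scale from 0 (worst) to 10 (best)
    popularity_rank: int    # hypothetical rank; lower means more popular


def parser_b(details: DomainDetails, current_year: int,
             min_age_years: int = 1, min_rating: float = 5.0,
             max_rank: int = 100_000) -> bool:
    """Return True when the details look like those of a phishing domain.

    Assumed rule: a newly registered domain that is also poorly rated
    or unpopular is treated as suspicious.
    """
    is_new = current_year - details.registration_year < min_age_years
    low_rating = details.rating < min_rating
    unpopular = details.popularity_rank > max_rank
    return is_new and (low_rating or unpopular)


# Hypothetical records used only for illustration.
fresh = DomainDetails(registration_year=2010, rating=1.0, popularity_rank=5_000_000)
established = DomainDetails(registration_year=1998, rating=9.0, popularity_rank=50)
print(parser_b(fresh, current_year=2010))
print(parser_b(established, current_year=2010))
```

Requiring the domain to be new before the other signals count reflects the observation that phishing sites are usually newly registered; a real implementation would need a source for these details, such as a registration lookup, which is outside this sketch.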

After the above two operations, the URL enters parser-C. The operation of parser-C is a DB-generated query operation. Parser-C relies on the fact that if the web URL has already been visited by that specific user, it is recorded in the history database of that user's web browser. During the parsing operation, it generates a query to find the trusted zone status of that particular URL. If the URL is already present in the trusted zone for that user, parser-C declares the URL safe and secure; otherwise it declares the URL a first-visited URL.

The trusted zone DB of URLs can differ from user to user. This DB depends entirely on the website status that the particular user has already specified.
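The per-user trusted zone lookup of parser-C can be sketched with a small SQLite database. The table name `trusted_zone`, its columns, the function names, and the returned status strings are assumptions made for illustration; the text does not describe the actual schema used by the browser.

```python
import sqlite3


def open_trusted_zone(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the trusted zone database; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trusted_zone ("
        " user_id TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " PRIMARY KEY (user_id, url))"
    )
    return conn


def trust_url(conn: sqlite3.Connection, user_id: str, url: str) -> None:
    """Record that this user has marked the URL as trusted."""
    conn.execute(
        "INSERT OR IGNORE INTO trusted_zone (user_id, url) VALUES (?, ?)",
        (user_id, url),
    )
    conn.commit()


def parser_c(conn: sqlite3.Connection, user_id: str, url: str) -> str:
    """Return 'trusted' if the URL is in this user's trusted zone, else 'first-visit'."""
    row = conn.execute(
        "SELECT 1 FROM trusted_zone WHERE user_id = ? AND url = ?",
        (user_id, url),
    ).fetchone()
    return "trusted" if row else "first-visit"


# Hypothetical users and URL used only for illustration.
conn = open_trusted_zone()
trust_url(conn, "alice", "https://bank.example.com/")
print(parser_c(conn, "alice", "https://bank.example.com/"))
print(parser_c(conn, "bob", "https://bank.example.com/"))
```

Keying the table on both user and URL keeps each user's trusted zone separate, which matches the statement that the DB can differ from user to user.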

After the operations of parsers A, B, and C are finished, the URL enters parser-D. Parser-D is a more analytical parser: it analyzes the URL and the title-tag content of the page, finds other URLs whose pattern resembles the analyzed URL, compares all of these URLs using the URL details (such as the year of domain registration, the rating of the domain, and the popularity of the domain), and displays the results on the browser screen before redirecting to the web page. Parser-D also uses some of the information already obtained by parser-B.
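One part of parser-D, finding URLs whose pattern resembles the analyzed URL, can be sketched with a string-similarity measure. This is an assumption: the text does not say how similarity is measured. The sketch uses `difflib.SequenceMatcher` from the Python standard library, compares host names only, and takes the list of known hosts and the 0.8 threshold as hypothetical inputs. The title-tag analysis and the comparison of domain details are not covered here.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Extract the lower-cased host name from a URL ('' if there is none)."""
    return (urlsplit(url).hostname or "").lower()


def similar_hosts(url: str, known_hosts: list[str],
                  min_ratio: float = 0.8) -> list[tuple[str, float]]:
    """Return (host, similarity) pairs for known hosts that resemble the URL's host.

    An exact match is skipped, because the goal is to find look-alikes.
    Results are sorted from most to least similar.
    """
    target = host_of(url)
    scored = [(h, SequenceMatcher(None, target, h).ratio())
              for h in known_hosts if h != target]
    matches = [(h, r) for h, r in scored if r >= min_ratio]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs used only for illustration.
known = ["paypal.com", "example.org"]
for host, ratio in similar_hosts("http://paypa1.com/login", known):
    print(f"looks like {host} (similarity {ratio:.2f})")
```

Each look-alike found this way could then be compared with the analyzed URL on the details that parser-B already collected, so that the older and more popular domain is shown to the user as the likely genuine one.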

Figure 1: Diagrammatic Representation of Parsing Operation of the URL.

Implementation and Results

We have implemented the proposed methodology in a new browser, the "AP browser", using Java, the Java network APIs, and some web scripts. "AP browser" stands for Anti-Phishing browser. The methodology could also be implemented as add-ons installed in existing web browsers (like other Firefox add-ons).

We have analyzed the URLs visited with the new browser. The new browser provides 98.14% security against phishing attacks and some hacking attacks. We did not use the proposed methodology during December 2009 and January 2010; we used it from February 2010 to April 2010.

The following table presents the recorded activity of web URLs in the other browser and in the new "AP beta version 1.0" browser with respect to phishing attacks and some hacking attacks.

Table 1: URL and some Web Attacks Analysis

Rows of Table 1 (cell values not available):

No. of URLs visited
Phishing attacks
Detected phishing attacks with the browser
SSL attacks
Detected SSL attacks with the browser
Some other hacking attacks
Detected hacking attacks with the browser

Conclusion and Limitation

Our proposed methodology was motivated by the large number of phishing, SSL, and other web attacks that we have encountered. We recorded web URL activity both with and without the proposed methodology over 5 months. From this data, we analyzed the attacks that occurred and the attacks that were detected over time. The experimental results give a full picture of the problem and of security on the web. Our system provided 98.14% security while browsing. Table 1 presents the data recorded over the 5-month period.

The limitation of the proposed method is that, because of the various parsing operations, its time complexity and space complexity are higher. As a result, it often increases the browsing time of the web browser. Because of the slower browsing speed, web users generally avoid this type of stronger web security.