Abstract- Google Hacking uses the Google search engine to locate sensitive information or to find vulnerabilities that may be exploited. This paper explains how Google Hacking works and how serious a threat it poses. It assesses the seriousness of information disclosure through Google Hacking, discusses countermeasures that reduce the risk of becoming a victim of this form of information leakage, and makes recommendations on what can be done to defend against Google hackers.
Keywords- Information security, web security, Google hacks, ethical hacking, google hacking database.
Wikipedia defines Google Hacking as "the art of creating complex search engine queries in order to filter through large amounts of search results for information related to computer security. In its malicious format it can be used to detect websites that are vulnerable to numerous exploits and vulnerabilities as well as locate private, sensitive information about others, such as credit card numbers, social security numbers, and passwords. This filtering is performed by using advanced Google operators."
Google hacking involves using the popular Google search engine to locate sensitive online information which should be protected but is not. Attackers can use Google Hacking to uncover sensitive information about a company or to uncover potential security vulnerabilities. A security professional can use Google Hacking to determine if their websites are disclosing sensitive information. In this paper, we assess the seriousness of information disclosure using Google Hacking and make recommendations of what can be done to defend against Google hackers.
How Google Works
Google uses automated "spiders" or Googlebots to crawl the web and find documents to add to its searchable index. Google grabs a copy of the document and files it away. When one enters a Google query, Google returns a results page with entries that list the name of the site, a summary of the site, the URL of the actual page, the size of the page, a cached link that shows the page as it looked when the Googlebot last visited the page, and a link to pages that have similar content.
Google's search results are dynamic. When a query is submitted through Google's web interface, Google takes the user to a dynamically created results page that can be represented by a single URL, which appears in the address bar of the user's browser. By clicking on either the name of the site or the URL, the user is taken directly to the actual page located at the host's site (if it still exists). In doing so, the web server hosting the page will most likely log the IP address (or hostname) of the machine requesting the page. This is something a malicious hacker would want to avoid. So, rather than accessing the actual page on the targeted web server, one can simply access the copy of the page that Google stored when it last crawled the page. This can be done via the cached link at the bottom of the entry. As a result, a malicious hacker who wants to remain anonymous can read the page without ever visiting the actual site.
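The point that a results page is represented by a single URL can be illustrated with a minimal sketch. It assumes the public search endpoint https://www.google.com/search with the query carried in the q parameter; the function name search_url is hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# Assumption: Google's public web search endpoint, with the query in "q".
SEARCH_ENDPOINT = "https://www.google.com/search"


def search_url(query):
    """Return the results-page URL that encodes the given query string."""
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(search_url('intitle:index.of "parent directory"'))
```

The sketch only builds the URL string; it does not send any request.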
Google stores most of the content of the pages it crawls, but omits images and some other space-consuming media. When a user views a cached page by clicking on the cached link on the results page, the browser still connects to the target's server to fetch the rest of the page content. This might reveal the Google hacker's activity to the target website. When the &strip=1 parameter is added to the cached page's URL, Google returns only the content it crawled, and the user does not connect to the target's server at all.
A system administrator might decide to prevent access to a certain part of the site by moving it, protecting it with a password or simply shutting down the server. What administrators often do not realize is that the information they are trying to protect may still exist on Google's servers and can be accessed through cached pages. This allows hackers to view data that has since been removed from a website.
Using Advanced Operators
Search reduction is the key to successful Google hacking. Advanced queries that return pages containing very specific content can be crafted using Google's advanced operators.
We'll focus on the various basic operators available:
The plus sign (+) is used to force a search for an overly common word. The minus sign (-) is used to exclude a term from a search.
To search for a phrase, the phrase is surrounded by double quotes (" ").
A period (.) serves as a single-character wildcard.
An asterisk (*) is used to represent any word.
Google advanced operators help refine searches. Advanced operators, just like basic Google operators, become part of a Google query and have a very specific syntax that must be followed. Popular advanced operators that are used in Google hacking queries include intitle, inurl, filetype, site and link. Unfortunately, Google does not provide a full list of all of their advanced operators, but by combining the operators that are listed on Google's help page for advanced operators at http://www.google.com/help/operators.html and using the Advanced Search page (available from Google's home page) to see how URL query strings are formed, one can see some advanced operators at work. Advanced operators use a syntax such as the following:
The site: operator instructs Google to restrict a search to a specific web site or domain. The web site to search must be supplied after the colon.
The filetype: operator instructs Google to search only within the text of a particular type of file. The file type to search must be supplied after the colon.
The link: operator instructs Google to find pages that contain hyperlinks pointing to a given URL. The URL must be supplied after the colon.
The cache: operator displays the version of a web page as it appeared when Google crawled the site. The URL of the site must be supplied after the colon.
The intitle: operator instructs Google to search for a term within the title of a document.
The inurl: operator instructs Google to search only within the URL (web address) of a document.
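The operator syntax described above (operator name, colon, value, with no space after the colon) can be sketched as a small query builder. The helper name build_query and its parameters are hypothetical; the sketch only assembles a string and assumes that multi-word values should be wrapped in double quotes, as described for phrase searches.

```python
def build_query(*terms, exclude=(), **operators):
    """Compose a query from plain terms, excluded terms and operator:value pairs."""
    parts = []
    for name, value in operators.items():
        # Quote multi-word values so the operator applies to the whole phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    for term in terms:
        # Multi-word plain terms become quoted phrases.
        parts.append(f'"{term}"' if " " in term else term)
    # The minus sign excludes a term from the search.
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query("parent directory", intitle="index.of", site="example.com"))
```

Operators appear in the order they are passed, followed by the plain terms and then the exclusions.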
By using the basic search techniques combined with Google's advanced operators, hackers can perform information-gathering and vulnerability-searching using Google. Here we discuss a few techniques.
1) Site Mapping :
To find every web page Google has crawled for a specific site, the site: operator is used. Consider the following query:
site:www.microsoft.com microsoft
This query searches for the word microsoft, restricting the search to the http://www.microsoft.com web site. Google searches not only the content of a page, but the title and URL as well. The word microsoft appears in the URL of every page on http://www.microsoft.com. With a single query, an attacker gains a rundown of every web page on a site cached by Google.
There are some exceptions to this rule. If a link on the Microsoft web page points back to the IP address of the Microsoft web server, Google will cache that page as belonging to the IP address, not the http://www.microsoft.com web server. In this special case, an attacker would simply alter the query, replacing the word microsoft with the IP address(es) of the Microsoft web server.
2) Finding Directory Listings :
Directory listings provide a list of files and directories in a browser window instead of the typical text-and graphics mix generally associated with web pages. These pages offer a great environment for deep information gathering.
Locating directory listings with Google is fairly straightforward. As Fig. 1 shows, most directory listings begin with the phrase Index of, which also appears in the title. An obvious query to find this type of page might be intitle:index.of, which may find pages with the term index of in the title of the document.
Several queries provide more accurate results:
intitle:index.of "parent directory"
intitle:index.of name size
Fig. 1 Example of a directory listing
These queries return directory listings by focusing not only on index.of in the title, but also on keywords often found inside directory listings, such as parent directory, name, and size. This search can be combined with other searches to find specific files or directories located in directory listings.
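An administrator can apply the same two signals (an "Index of" title and a "Parent Directory" link) to pages from their own server. The following is a minimal sketch of such a check; the function name and the regular expression are illustrative assumptions, and the heuristic is tuned to listings that look like the one described above.

```python
import re

# Capture the contents of the <title> element, ignoring case and line breaks.
TITLE_RE = re.compile(r"<title>\s*(.*?)\s*</title>", re.IGNORECASE | re.DOTALL)


def looks_like_directory_listing(html):
    """Heuristic: title starts with 'Index of' and the page mentions 'Parent Directory'."""
    match = TITLE_RE.search(html)
    if not match:
        return False
    title_ok = match.group(1).lower().startswith("index of")
    body_ok = "parent directory" in html.lower()
    return title_ok and body_ok
```

A page that passes this check on one's own site is a candidate for disabling directory browsing.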
Google Hacking Database
According to Johnny Long's Google Hacking Database, there are roughly fourteen categories of Google hacks. This paper looks at five of them: Error Messages, Open Directories, Documents & Files, Network Devices, and Personal Information Gathering.
Error messages provide a wealth of information. Developers use these error messages to pinpoint where their code has gone wrong. Unfortunately for web administrators, error messages that are open to the world provide that information to those who know how to look for them. Database error messages can provide information like usernames, passwords, and server names. Here is an example of a query that finds MySQL error messages revealing the username for a MySQL database.
"Warning: mysql_connect(): Access denied for user: '*@*" "on line" -help -forum
Fig. 2 Example for an Error Message
Google's web-bots crawl pages in a site that a web administrator may not want to be catalogued. Most sites stop users from browsing their directory structure, but not all websites are set up correctly.
Directory browsing allows someone to see all the files on a web server. Much important company information is stored in its server directories. Leaving those directories accessible to outsiders can compromise the entire company's line of defense and make hackers' lives far too easy.
A search for intitle:"index of" returns a list of sites that allow directory browsing. Not only does it give a potential hacker access to all files, but index pages often reveal information like the operating system and web server software. This information gives a hacker a roadmap to the vulnerabilities the company may have.
For example, a simple Google search like
intitle:"index of" + solutions
can potentially give students access to solutions. By adding a site search parameter (site:some_university.edu), they can obtain solution manuals for a particular department, potentially allowing them to cheat on class assignments.
One of the most popular hacking techniques used within directory listings is the "directory traversal" technique. This technique refers to modifying parts of the originally found URL in order to access other directories on the server. These may not be accessible to direct Google searches.
For example, if one has found a relative URL /cs/accounting/admin/jerryb, one can strip parts off the original URL in order to access parent directories such as admin or accounting, or replace some parts of the URL with potential directory names, such as hr. Such information might be used by hackers to attack the university. Penetration testers should use this technique to determine whether sensitive company information is being exposed on the web.
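For a penetration tester checking their own site, the step of stripping parts off a found URL can be sketched as follows. The function name parent_paths is hypothetical; the sketch only computes candidate paths and does not request them.

```python
import posixpath


def parent_paths(path):
    """List each ancestor directory of a URL path, nearest first, excluding '/'."""
    parents = []
    current = posixpath.dirname(path.rstrip("/"))
    while current not in ("", "/"):
        parents.append(current)
        current = posixpath.dirname(current)
    return parents


if __name__ == "__main__":
    for candidate in parent_paths("/cs/accounting/admin/jerryb"):
        print(candidate)
```

For the example path above, the candidates are /cs/accounting/admin, /cs/accounting and /cs, each of which can then be checked for an exposed directory listing.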
Documents and Files
1) Office Documents:
Website administrators do not always think of how a search engine will crawl their site when they build it. Companies may store sensitive information, such as financial reporting or human resources documentation, on their websites in spreadsheets. By searching Google using this simple query
site:some_website.com intitle:index.of .xls, one can find Microsoft Excel files stored within directory listings. This information should not be publicly obtainable through a simple Google search.
2) WS_FTP Logs:
Another source of information is log files. By default, WS_FTP creates a WS_FTP.log file on the web server, which records file transfers to and from FTP servers. This file contains a wealth of sensitive information, such as usernames, file directories, file names, times of file uploads and downloads, and web server usage information. This information can save hackers a lot of time in their attempt to attack a company's website.
3) Source Code :
The source code of a computer program can contain large amounts of sensitive information. Source code can show how the system was implemented and how the database is accessed. Code can contain passwords, server names, database tables and field names, and directories. Programmers may back up their code by making copies of their files with extensions such as .bak, .bak2, or .bak3, so web servers may contain files like MyCode.asp.bak. These files may be retrieved from the web server. Web servers decide how to handle a page based on its file extension.
The web server has no idea how to handle these backup files, and will serve them as plain text. That means that all of the code is now exposed to the user, perhaps revealing sensitive information.
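A defensive check for such leftovers can be run against a local copy of the web root before attackers find them through a search engine. The sketch below is one possible approach under stated assumptions: the function name find_leftover_files is hypothetical, and the suffix and file-name sets contain only the examples named in this paper and should be extended for a real site.

```python
from pathlib import Path

# Suffixes and log name taken from the examples in this paper; extend as needed.
BACKUP_SUFFIXES = {".bak", ".bak2", ".bak3"}
LOG_NAMES = {"ws_ftp.log"}


def find_leftover_files(web_root):
    """Return sorted paths under web_root that look like backups or WS_FTP logs."""
    hits = []
    for path in Path(web_root).rglob("*"):
        if not path.is_file():
            continue
        # Compare case-insensitively: MyCode.asp.BAK is as exposed as .bak.
        if path.suffix.lower() in BACKUP_SUFFIXES or path.name.lower() in LOG_NAMES:
            hits.append(path)
    return sorted(hits)
```

Any file the function reports should be removed from the web root or moved outside the directories the server publishes.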
One can find much more than just documents on the Internet. There are also many types of devices, interactive environments, collaboration tools, and social networks. Devices accessible through the Internet are a very popular target for hackers. Being able to control printers, web cameras, and network routers can be useful when planning an attack on a company. It is important that penetration testers understand those threats and protect companies against them.
To provide convenience to their employees, companies may put hardware devices online. With the increase in telecommuting, this is happening more and more. There are countless devices online, and the Google Hacking Database provides users with queries to find them.
1) WebCams :
The first type of device that rookie Google hackers will attempt to find is webcams. Simple searches like
camera linksys inurl:main.cgi
reveal web pages that have Linksys web cameras. Webcam information may not seem very interesting, considering that webcams themselves are designed to be shown on the web. Some webcam owners put their devices online but do not share the URL for the device, except with a certain set of people. This security through obscurity does not hold up very well against Google, because the Google bots crawl all accessible pages indiscriminately. One specific webcam found allows the user to control the camera's direction, tilt, zoom, and display size.
For example, a webcam found at a construction site showed so much detail that one could read license plate numbers.
Fig. 3 Fully Controllable WebCam
2) Routers and Firewalls:
Routers and hardware firewalls are often connected to the Internet to allow remote administration. These devices are almost always password protected by system administrators. Unfortunately, some companies keep the default login and password. The administrative pages of such devices are easily found by using Google queries like these.
intitle:"Main page - SmoothWall Express"
intitle:"Smoothwall Express" inurl:cgi-bin "up * days"
Google uses the information in the title of the SmoothWall Express firewall client to find the administrative login pages for the device.
3) Network Printers :
Finally, network printers are also available online. Many of these are password protected, but often they are available to anyone.
Personal Information Gathering
1) Email Address Harvesting :
A simple search such as site:website.com + @ will return all web pages on that site that contain the @ sign. This query gives a spammer a legal means to gather countless email addresses.
While the Google Terms of Service prohibit users from using tools that automatically query Google, one can create a simple program that uses a simple Google query to return a list of pages that contain email addresses. Using screen scraping and regular expressions, this kind of program can be written in no time.
For example, one can run a simple telnet program and use the Gmail servers to validate email addresses.
open gmail-smtp-in.l.google.com 25
MAIL FROM: <email address>
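The regular-expression step mentioned above is easy to illustrate, and the same check lets an administrator see which addresses their own pages expose. The sketch below is a simplified assumption, not a complete address grammar: the pattern deliberately ignores many address forms the mail standards allow, and the function name exposed_addresses is hypothetical.

```python
import re

# Deliberately simple pattern; it does not cover every valid address form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def exposed_addresses(page_text):
    """Return the unique email-like strings found in page_text, sorted."""
    return sorted(set(EMAIL_RE.findall(page_text)))
```

Running it over the text of one's own published pages gives a quick list of addresses that a harvester would also find.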
2) Shipment Tracking Information :
In the past years, online shipment tracking systems have become very popular. People enjoy checking the status of their shipments online in real time. But how secure is that information?
Consider searching for UPS tracking information using the following Google query, which is posted on Johnny Long's Google Hacking Database:
site:ups.com intitle:"Ups Package tracking" intext:"1Z ### ### ## #### ### #"
This query brings back a substantial number of pages with tracking information for UPS packages that are currently in transit. This information can be used to track all incoming UPS packages for a selected address, perhaps to steal a package. Tracking pages are therefore highly exposed to this kind of search.
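A site that republishes shipping details can reduce this exposure by removing tracking numbers from page text before publishing it. The sketch below assumes only the layout quoted in the query above (1Z followed by sixteen characters, optionally separated by spaces) and treats the characters as digits or upper-case letters; the function name and the replacement text are hypothetical.

```python
import re

# "1Z" followed by 16 digits or upper-case letters, each optionally preceded by a space.
TRACKING_RE = re.compile(r"\b1Z(?: ?[0-9A-Z]){16}\b")


def redact_tracking(text):
    """Replace anything shaped like the quoted 1Z tracking layout with a placeholder."""
    return TRACKING_RE.sub("[tracking number removed]", text)
```

The pattern matches both the spaced form shown in the query and the same number written without spaces.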
Google Automated Scanning
Any user running an automated Google querying tool (with the exception of tools created with Google's extremely limited API) must obtain express permission in advance to do so. Google allows restricted automated access through its Google SOAP Search API service. Developers must first agree to the terms and conditions of the service, after which a license key is e-mailed to the address registered with their Google account. Developers can then write software applications that connect remotely to the Google SOAP Search API service. This communication is performed via SOAP, an XML-based mechanism for exchanging typed information.
1) GooScan :
GooScan is a UNIX (Linux/BSD/Mac OS X) tool that automates queries against Google search appliances (which are not governed by the same automation restrictions as their web-based brethren). For the security professional, gooscan serves as a front end for an external server assessment and aids in the information-gathering phase of a vulnerability assessment. For the web server administrator, gooscan helps discover what the web community may already know about a site.
2) Googledorks :
The term "googledork" means "An inept or foolish person as revealed by Google." After a great deal of media attention, the term came to describe those who "troll the Internet for confidential goods." The term googledork conveys the concept that sensitive stuff is on the web, and Google can help you find it. The official googledorks page lists many different examples of unbelievable things that have been dug up through Google by the maintainer of the page, Johnny Long. Each listing shows the Google search required to find the information, along with a description of why the data found on each page is so interesting.
3) GooPot :
"A honey pot is a computer system on the Internet that is expressly set up to attract and 'trap' people who attempt to penetrate other people's computer systems."
GooPot, the Google honeypot system, uses enticements based on the many techniques outlined in the googledorks collection. In addition, the GooPot more closely resembles the juicy targets that Google hackers typically go after. Johnny Long, the administrator of the googledorks list, utilizes the GooPot to discover new search types and to publicize them in the form of googledorks listings, creating a self-sustaining cycle for learning about and protecting from search engine attacks. To learn how new attacks might be conducted, the maintainers of a honeypot system monitor, dissect, and catalog each attack, focusing on those attacks that seem unique.
Google Hacking is well documented and easy to learn. It is very important for security professionals to protect their companies against Google Hacking. To protect a site against Google Hacking, one needs to establish a solid security policy governing what information can be put on the web. Security professionals should perform Google Hacking against their own website to check for sensitive information disclosure. There is no 100% protection against Google Hacking, but strong policies and testing can improve the security of a site.
Security professionals need to learn Google Hacking to provide a good level of protection for their sites. One can start by using some of the automated Google Hacking tools. These automate the hacks, helping to ensure that every page within the site is checked. Automated tools allow for periodic security checks at a frequency that is simply impossible to achieve with manual hacks.
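A periodic self-audit can start from a fixed list of queries restricted to one's own domain. The sketch below only generates the query strings, using fragments taken from the examples in this paper; the names AUDIT_TEMPLATES and audit_queries are hypothetical, and submitting the queries automatically remains subject to the permission requirement described earlier.

```python
# Query fragments taken from the examples in this paper.
AUDIT_TEMPLATES = [
    'intitle:index.of "parent directory"',
    "intitle:index.of name size",
    "intitle:index.of .xls",
]


def audit_queries(domain):
    """Return one site-restricted query per template for the given domain."""
    return [f"site:{domain} {template}" for template in AUDIT_TEMPLATES]


if __name__ == "__main__":
    for query in audit_queries("example.com"):
        print(query)
```

Keeping the templates in one list makes it simple to add new entries as further queries of interest are published.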
1) Protecting against Google Hackers :
The following list provides some basic methods for protecting oneself from Google hackers:
Keep your sensitive data off the web! Even if one intends to put data on a web site only temporarily, there is a good chance that it will be forgotten there, or that a web crawler will find it. The best approach is to use more secure ways of sharing sensitive data, such as SSH/SCP or encrypted email.
Googledork! Use gooscan from http://johnny.ihackstuff.com to scan the site for exposed material, but first obtain advance express permission from Google. Security professionals should check the official googledorks web site on a regular basis to keep up with the latest tricks and techniques, and should use automated Google tools to verify the security of their sites.
Consider removing your site from Google's index. The Google webmasters FAQ provides invaluable information about ways to properly protect and/or expose a site to Google. From that page: "Please have the webmaster for the page in question contact us with proof that he/she is indeed the webmaster. This proof must be in the form of a root level page on the site in question, requesting removal from Google. Once we receive the URL that corresponds with this root level page, we will remove the offending page from our index." In some cases, one may want to remove individual pages or snippets from Google's index. This is also a straightforward process that can be accomplished by following the steps outlined at http://www.google.com/remove.html.
Use a robots.txt file. Web crawlers are supposed to follow the robots exclusion standard. This standard outlines the procedure for "politely requesting" that web crawlers ignore all or part of a web site. Note that this file is only a suggestion: the major search engines' crawlers honor it, but hackers may not have any such scruples. For examples and suggestions for using a robots.txt file, one may visit http://www.robotstxt.org.
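How a well-behaved crawler interprets such a file can be checked with the robots.txt parser in Python's standard library. The sketch below uses a hypothetical robots.txt that asks all crawlers to skip two example directories; the directory names and the helper is_crawlable are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to skip two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""


def is_crawlable(robots_txt, user_agent, path):
    """Report whether a compliant crawler may fetch path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_crawlable(ROBOTS_TXT, "Googlebot", "/admin/login.html"))
    print(is_crawlable(ROBOTS_TXT, "Googlebot", "/index.html"))
```

The check reflects only what a compliant crawler would do; it does not restrict access to the listed directories.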
While Google Hacking does not necessarily follow the standard definition of hacking, it can prove just as fruitful. By using Google, one can gain access to information that may otherwise be hidden. The information gathered using these hacks may allow an attacker to gain access to systems or devices. Security professionals can address the problem of Google Hacking in a manner similar to addressing other security issues. They can use Google Hacking to test their web sites for sensitive information disclosure. They can educate employees concerning what information should not be put on the Internet. Security professionals need to learn Google Hacking to provide a good level of protection for their sites.