"Spam is the abuse of electronic messaging systems (including most broadcast media, digital delivery systems) to send unsolicited bulk messages indiscriminately."
Preventing spam requires anti-spam techniques to be implemented by the user as well as by the administrator of a network. These techniques are generally embedded in software products to ease the burden on users and administrators. Anti-spam techniques are implemented by users, administrators and senders of email, as well as by researchers and law enforcement agencies.
The techniques implemented on the user's end are generally elementary, and are often simply guidelines or good practices that help in avoiding spam. Some of these guidelines are to share email addresses only among trusted groups, to post anonymously to avoid email address harvesting, and to avoid responding to spam. Apart from these guidelines, various software techniques are also used, such as disposable email addresses and passwords to authenticate unknown senders.
The techniques used by email administrators are of utmost importance. Administrators use specialized software systems to prevent spam from reaching the user. This software draws on technologies such as challenge/response systems, e-mail authentication, blacklisting, filtering and tarpits. Some of these methods are based on statistics, whereas others use machine learning. Most systems combine two or more of these technologies to filter out spam more effectively. This report focuses on those concepts of anti-spam systems that are implemented in software.
Types of Spam
Although spam occurs most widely in e-mail, it also occurs in other media. Some of these are discussed below.
SPIM (Instant messaging spam)
Spam that targets users of instant messengers such as Yahoo Messenger and Windows Live Messenger is known as SPIM. Instant messengers store a directory of users along with their demographic information. Spammers obtain this information and may then send spam or even viruses to the victims.
Newsgroup spam
Newsgroup spam pre-dates e-mail spam. In newsgroup spam, Usenet newsgroups are the target. The attacker posts multiple messages on the newsgroup, and the newsgroup manager must delete these messages in order to keep the spam in check.
Spamdexing
Spamdexing is also known as search engine spam. Search engines use various algorithms to determine relevancy ranking. The attacker repeats unrelated phrases to manipulate the relevancy of resources indexed by search engines, which undermines the indexing mechanism. The search engine then shows the results that the spammer wants it to show.
Mobile spam
In mobile spamming, the attacker sends unsolicited commercial advertisements as SMS messages to mobile users.
Internet Forum spam
An internet forum is an online discussion site. The attacker posts unwanted messages on the forum with the help of spambots.
Anti-Spam Technologies
There are two ways in which software handles spam. In the first method, e-mail from sites known to send spam is rejected (blocking). In the other, the software automatically analyses the content of the e-mail and filters out spam (filtering). Filtering software usually makes use of machine learning algorithms to detect spam.
Various technologies are used in anti-spam software, and some of the most important are explained below. Some anti-spam software relies on one particular technology, while other software integrates multiple technologies to reduce false negatives and improve efficiency.
Here are a few technologies that are prominently used in anti-spam software systems:
Challenge/Response
Challenge/Response is integrated in most anti-spam software. Incoming mail can be of three types:
Mail that is certainly spam.
Mail that is certainly not spam (ham).
Mail that cannot be classified with certainty as either spam or ham.
The mail in this intermediate zone is an ideal candidate for Challenge/Response. The underlying mechanism is simple. When an e-mail arrives from an unknown sender, the email is held and a challenge is sent to the sender to confirm that the person is an authentic sender and not a spammer. On receiving the challenge, the sender responds to it and so authenticates himself. This has to be done only once; the next time, the sender does not have to authenticate before sending an email. Spammers usually use a forged e-mail address to send spam, and hence they never receive a challenge to respond to. Even if a forged e-mail address is not used, spammers would have to respond to many challenges (as spam is sent in bulk), which increases the spammer's overhead. There are a few issues with this technique:
It requires the sender to authenticate the first time.
Legitimate automated emails, such as those containing registration keys, receipts and shipping notices, will be blocked because no one responds to the challenge.
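The hold, challenge and release cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `ChallengeResponseGate`, its methods and the in-memory storage are assumptions made for the example, and a real system would also have to send the challenge by email and expire old entries.

```python
import secrets


class ChallengeResponseGate:
    """Hold mail from unknown senders until they answer a one-time challenge.

    Hypothetical sketch: everything is kept in memory and the challenge is
    only returned to the caller, not actually emailed.
    """

    def __init__(self):
        self.whitelist = set()  # senders that have already authenticated
        self.tokens = {}        # sender -> outstanding challenge token
        self.held = {}          # token -> (sender, list of held messages)

    def receive(self, sender, message):
        """Return ("delivered", None) or ("held", token) for a new message."""
        if sender in self.whitelist:
            return ("delivered", None)
        token = self.tokens.get(sender)
        if token is None:
            # First mail from this sender: issue a fresh challenge token.
            token = secrets.token_hex(8)
            self.tokens[sender] = token
            self.held[token] = (sender, [])
        self.held[token][1].append(message)
        return ("held", token)

    def respond(self, token):
        """Sender answered the challenge: whitelist them, release held mail."""
        entry = self.held.pop(token, None)
        if entry is None:
            return []  # unknown or already used token
        sender, messages = entry
        del self.tokens[sender]
        self.whitelist.add(sender)
        return messages
```

A sender who answers once is added to the whitelist, so later calls to `receive` deliver immediately. A spammer using a forged address never sees the token, so the held messages are never released.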
Email Authentication
In an email authentication system, a pool of sites is maintained. All the sites in this pool that authenticate themselves are allowed to send emails and are listed in the Domain Name System (DNS). Email authentication assumes that all the sites that have authenticated are trustworthy and will not leak out information. Reverse DNS lookup of the connecting IP address is one email authentication scheme. The receiver finds the domain name for the incoming email's IP address using reverse DNS, and then looks up the IP addresses of that domain name in the DNS. If the original IP address is among them, the email is treated as legitimate; otherwise it is treated as spam.
The scheme works because a spammer cannot falsify or forge records belonging to another domain, and does not have access to the connections between external DNS servers (i.e. those not belonging to the spammer's domain).
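The reverse-then-forward lookup described above can be sketched as follows. The function names are hypothetical, and the two resolver arguments are an assumption added so the check can be exercised without network access; by default they use the standard `socket` module, which in this form handles IPv4 only.

```python
import socket


def default_reverse(ip):
    """Reverse (PTR) lookup: IP address -> host name, or None if absent."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def default_forward(hostname):
    """Forward lookup: host name -> list of IPv4 addresses (may be empty)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def passes_reverse_dns_check(ip, reverse=default_reverse, forward=default_forward):
    """True if the name found by reverse DNS resolves back to the same IP."""
    hostname = reverse(ip)
    if hostname is None:
        return False  # no reverse record at all
    return ip in forward(hostname)
```

A mail server could call `passes_reverse_dns_check` with the connecting client's address and treat a `False` result as one signal of spam. In practice such a check is usually combined with other evidence rather than used as the sole verdict.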
DNS Based Blacklists
A Domain Name System Blacklist (DNSBL) is a software mechanism that allows an administrator to block messages from specific systems that have a history of sending spam. It uses the Domain Name System to publish a list of IP addresses linked to spamming. The blacklist is maintained on the basis of experience: if spam is received from a specific server, that server is blacklisted, and all messages from it will be rejected by every site that makes use of the DNSBL. The three basic components that make up a DNS Blacklist are:
a domain name to host it under
a server to host that domain
a list of addresses to publish to the list
DNS Blacklists can vary greatly from one to another. The choice of a DNSBL depends on the specific security needs of the user; a DNSBL that is optimal for one user might not work well for another. The credibility and trustworthiness of a DNSBL depend on the services it provides. DNSBLs can be classified as strict or lenient. An example of a strict DNSBL is one that blocks not only a single IP address, but an entire ISP that harbors spam. DNSBLs can also be classified as automatic or manual. An example of an automatic DNSBL is one that lists sites for a specific amount of time from the date the last spam was received. More lenient lists might allow more spam to get through, but will have fewer false positives (legitimate mail that is blocked). On the other hand, lists that have stricter guidelines will allow less spam through, but may have more false positives. There are a few issues with this technique:
An anti-abuse mechanism needs to be in place.
The abuse has to be reported.
An abuse can be wrongly reported.
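To illustrate how a blacklist published through DNS is usually consulted, the sketch below builds the query name by reversing the octets of the IPv4 address and appending the blacklist's domain; an answer means the address is listed, and a failed lookup means it is not. The zone `dnsbl.example` is a placeholder and not a real blacklist, and the function names and the injectable `resolve` argument are assumptions made for the example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the blacklist zone returns an address record for this IP."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True   # any answer means the address is on the list
    except OSError:
        return False  # lookup failed: the address is not listed
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example")` produces `99.2.0.192.dnsbl.example`. Because the lookup is an ordinary DNS query, any mail server can consult the list without special client software.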
Filtering
Email filters check the content of the email and, depending on their fuzzy logic implementation, decide whether an email is spam. They tend to give false positives when they filter out newsletters or bulk email that the user has subscribed to. This problem can be solved by setting up special privileges for the domains that the user subscribes to.
Content filters analyze the headers, subject and body of the email to determine whether it is spam. They make use of Bayesian filters, which analyze outgoing email to learn the characteristics of a particular organization, such as common words, subjects and recipient servers. By learning from outgoing mail, the filter can better judge which incoming email is authentic and which is spam. Filters compare keyword combinations in incoming and outgoing messages to determine a pattern for authenticity.
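A minimal sketch of a Bayesian content filter is given below. It is a plain word-count naive Bayes classifier with add-one smoothing, which is one common way to build such a filter and not necessarily what any particular product uses. The class and method names are assumptions; the "ham" examples could be drawn from an organization's outgoing mail, as described above. The filter must be trained with at least one message of each kind before use.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny word-based naive Bayes spam filter (illustrative sketch only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message labelled "spam" or "ham"."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        # Prior: fraction of training messages that carry this label.
        score = math.log(self.message_counts[label] / sum(self.message_counts.values()))
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + vocabulary))
        return score

    def is_spam(self, text):
        """Classify by comparing the log-probability under each label."""
        words = text.lower().split()
        return self._log_score(words, "spam") > self._log_score(words, "ham")
```

After training on a few labelled messages, `is_spam` compares how well the words of a new message match each class. Real filters use far more training data and better tokenization than the whitespace split used here.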
Spammers send messages in bulk, so the messages are nearly identical, with only small variations. Checksum-based filtering exploits this similarity. The filter strips out the parts of a message that are likely to vary, reduces what remains to a checksum, and looks that checksum up in a database of known spam. If it is found, the mail is flagged as spam. There are a few issues with this technique:
It takes a lot of time to figure out the patterns of the mail that is coming in as spam.
Each of these patterns also needs to be stored; hence e-mail filtering is time and space consuming.
There is a problem of outdated data.
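The checksum-based approach described above can be sketched as follows. The normalization rule used here (lower-casing and discarding everything except letters) is only an assumption chosen for the example; real systems use more elaborate rules to decide which parts of a message may vary.

```python
import hashlib
import re


def fuzzy_checksum(body):
    """Reduce a message body to a checksum that ignores small variations."""
    text = body.lower()
    # Replace every run of non-letters (digits, punctuation, whitespace)
    # with a single space so tracking codes and spacing do not matter.
    text = re.sub(r"[^a-z]+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical database of checksums taken from mail already known to be spam.
known_spam = {fuzzy_checksum("Buy NOW!!! Offer #12345")}


def is_known_spam(body):
    """True if the normalized message matches a stored spam checksum."""
    return fuzzy_checksum(body) in known_spam
```

With this normalization, "buy now... offer #99881" produces the same checksum as the stored message and is flagged, while an unrelated message is not. The storage and staleness issues listed above correspond to the size and age of the `known_spam` set.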
Tarpits
A tarpit is a software system which resides on the server. It deliberately responds slowly to the commands of connecting clients. When a legitimate sender sends an email, this delay makes little difference, but because spammers send very large volumes, they feel the effect. Another method delays connections only from known spammers, identified using the blacklist method. Yet another method is greylisting, in which the server rejects the first connection attempt from any previously unseen IP address. When the server rejects a connection attempt, a legitimate sender keeps trying, whereas a spammer typically does not. Hence only legitimate email eventually passes through the server.
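The greylisting variant can be sketched as below. The class name, the 300-second minimum delay and the injectable `clock` are assumptions made for the example. The sketch keys on the IP address alone, as in the description above; practical implementations often key on the combination of IP address, sender and recipient.

```python
import time


class Greylist:
    """Temporarily reject mail from IP addresses not seen before."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay  # seconds a new IP must wait before retrying
        self.clock = clock          # injectable time source (eases testing)
        self.first_seen = {}        # ip -> time of the first connection attempt

    def check(self, ip):
        """Return "tempfail" for early attempts, "accept" after the delay."""
        now = self.clock()
        first = self.first_seen.setdefault(ip, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"  # the server would answer with a temporary error
```

A legitimate mail server retries after a temporary failure and is accepted once the delay has passed, while bulk-sending software that never retries is never accepted.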
Honeypots
A honeypot is an interesting concept. It is either an imitation mail transfer agent (MTA) that gives the appearance of being an open mail relay, or an imitation TCP/IP proxy server that gives the appearance of being an open proxy.
Spammers who probe these apparent open relays and proxies get caught and reveal their information and their source. Once the spammer's details are known, they can be combined with blacklisting to block out known spammers. In short, honeypots are traps set up for the spammer.
Spamtraps
In spamtraps, "invalid" email addresses are generated and placed on the internet such that spammers can find them, but legitimate senders cannot. This can be done by embedding these email addresses in the HTML source code of a page.
For example, suppose the address "firstname.lastname@example.org" is used as a spamtrap. When a spammer sends mail to that destination address, the spamtrap knows the sender is a spammer and flags them.
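The flagging step can be sketched in a few lines. The class name, the trap address `trap@example.org` and the choice to flag by sender IP address are all assumptions made for the example; the flagged set could then feed a blacklist as described for honeypots.

```python
class SpamTrap:
    """Flag any sender that delivers mail to a hidden trap address."""

    def __init__(self, trap_addresses):
        # Addresses published only where harvesting software will find them.
        self.trap_addresses = {a.lower() for a in trap_addresses}
        self.flagged_senders = set()

    def observe(self, sender_ip, recipient):
        """Record one delivery attempt; return True if it hit a trap."""
        if recipient.lower() in self.trap_addresses:
            self.flagged_senders.add(sender_ip)
            return True
        return False

    def is_flagged(self, sender_ip):
        """True if this sender has ever mailed a trap address."""
        return sender_ip in self.flagged_senders
```

Since no legitimate sender should know the trap address, a single hit is treated as sufficient evidence in this sketch.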