Privacy in online social networks



1. Social Networking - a niche phenomenon to mass adoption

Social networking has skyrocketed in recent years, from a novel phenomenon used by a small group of people to mass adoption in which people share information with each other publicly. Facebook, a popular social networking site ranked second, has registered approximately 350,000,000 users, active and passive users combined. The purpose of each social networking site differs; according to the Social Software Weblog [1], they can be divided into nine broad categories, which include Common Interests, Business, Dating, Face-to-Face, Facilitation, Friends, Pets and Photos.

People use a social networking service to stay in touch with their friends and to find new ones, often among strangers. A social networking service is supposed to facilitate healthy relationships among friends, but most of the time it also allows a user to meet strangers. Unfortunately, this invites many potential risks, which adversaries exploit with high accuracy and efficiency. This tendency of people to welcome the unknown makes users vulnerable to spam; the problem is born from these unsolicited communications.

Depending on the tastes and interests of the user, the variety of attacks grows into a huge list. A user might be attracted to gaming and become vulnerable to gaming-related malware attacks. Some are attracted to the opposite sex and are deluded by misleading words, especially on dating websites. Many of these attacks arrive via email, malware and various kinds of web attacks. A web attack may be as simple as a redirection to a different website that goes unnoticed by the user, prompts for the username and password, and causes a forced authentication failure; the user might then end up providing personal information. These kinds of attacks not only extract sensitive information from the user but can also result in Denial of Service (DoS). The birth of such simple but highly intimidating problems signifies that social networking users need a strong grasp of the fundamentals.

Spammers normally project themselves in an attractive way to delude users. Novice users are largely targeted, and not surprisingly some experienced users also fall into the trap. Users thus get carried away and accept the spammers into their personal networks. After entering a user's personal network, a spammer can push in a great deal of unnecessary content and unexpected advertisements. This can only be prevented by the users themselves, by making the right decision and identifying the unknown sender as spam.

Other kinds of attacks may involve an external agent using information the user has posted publicly. With the advent of scripting languages like Perl, a great deal of website hacking can be automated. The public information of a user thus becomes exposed to the extent that an adversary can impersonate the user to his friends and might also extract information about the user's friends. It is therefore at the user's own risk that personal information is posted publicly. Even if the identities of the users are not visible and only the overall network structure is available to the adversary, the adversary can still identify the friends of a user by manipulating graphs and eliminating impossible combinations [4]. So merely hiding the identity of a user cannot prevent these kinds of attacks.

This paper analyzes various kinds of privacy threats and how users become vulnerable to basic spam, malware and email attacks. It also discusses the solutions suggested by experts and concludes with the inferences drawn from their research.

2. Neighborhood Attacks

There was a presumption that releasing social network data [4] with anonymous individuals, i.e. without revealing their identities, was good enough to prevent an adversary from manipulating the network. That was proved false by Bin Zhou and Jian Pei [4], who based their work on the following motivating example.

If the adversary has even some local knowledge about individual vertices in the social network, he may be able to identify the neighbors of a particular node almost exactly with some graph manipulation. Bin Zhou and Jian Pei considered the example shown in Figure 1 [4], a piece of a social network of friends, and made the following claim. Each vertex (Ada, Dell, Ed, etc.) represents a person, and the edges represent that person's links: an edge between two vertices signifies that the two people are friends and part of each other's network. The question the authors asked was whether removing the identities of the vertices, as shown in Figure 1b, is enough to prevent adversaries from reconstructing the friendship network. The answer was no, and rightly so. The authors argue that if the adversary has some knowledge about an individual, then merely removing the identities before publishing the social network will not serve the purpose; private information may still be leaked [4]. In this example, if the attacker knows beforehand that Ada has two close friends who know each other and another two friends who are not directly connected, as illustrated in Figure 1c, then Bin Zhou and Jian Pei state that the adversary will be able to identify the vertex 'Ada', since no other vertex has the same 1-neighborhood graph. Bob can be identified in a similar way.
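The re-identification argument can be sketched in code. A crude 1-neighborhood fingerprint (a vertex's degree plus the number of edges among its neighbors) is enough to single out any vertex whose fingerprint is unique in the anonymized graph. The graph below is a hypothetical toy network loosely modeled on the Figure 1 scenario, not the paper's actual data, and the fingerprint is much weaker than the full 1-neighborhood isomorphism the paper considers:

```python
from itertools import combinations

def neighborhood_signature(graph, v):
    """Simple 1-neighborhood fingerprint for vertex v:
    (degree of v, number of edges among v's neighbors)."""
    nbrs = graph[v]
    inner = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in graph[a])
    return (len(nbrs), inner)

def reidentifiable(graph):
    """Vertices whose 1-neighborhood fingerprint is unique in the graph."""
    sigs = {v: neighborhood_signature(graph, v) for v in graph}
    counts = {}
    for s in sigs.values():
        counts[s] = counts.get(s, 0) + 1
    return {v for v, s in sigs.items() if counts[s] == 1}

# Hypothetical anonymized network: vertex 0 (the "Ada" role) has two
# mutually connected friends (1, 2) and two friends (3, 4) who are not
# connected to each other.
graph = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1},
    3: {0, 5},
    4: {0, 5},
    5: {3, 4},
}

print(reidentifiable(graph))  # → {0}
```

Even with identities removed, vertex 0 is the only one with four neighbors of whom exactly one pair is connected, so an adversary who knows that fact about Ada recovers her vertex.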

The authors argue that these kinds of intrusions, which identify individuals from the released social network, endanger the privacy of the users. The inferences made by the adversary can be used efficiently to gather further information about other vertices on which to base an attack. In this example, the authors note that if the adversary knows that Bob and Ada are friends, he can also conclude that they have one friend in common, Cathy. In this way the general structure of the network is revealed and can be exploited by the adversary.

To prevent the adversary from exploiting such simple and readily available information, one approach, suggested by L. Sweeney [6], is the k-anonymity model, which assures that adversaries cannot identify an individual in an anonymized social network with probability higher than 1/k, where k is a user-specified parameter in the spirit of the k-anonymity model. This involves adding a false link between a selected pair of nodes, in this example between Harry and Irene. As a result, the 1-neighborhood graphs of the vertices lose their uniqueness, which prevents the adversary from easily reaching concrete conclusions. Although the adversary can still derive information about the vertices, the authors conclude that he can only do so with a success probability of at most 1/2.
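The 1/k guarantee amounts to a simple check: after anonymization, every 1-neighborhood signature class must contain at least k vertices. The sketch below uses made-up signature labels to show how adding one false edge can merge singleton classes; the names and signatures are illustrative, not taken from the paper:

```python
def is_k_anonymous(signatures, k):
    """True when every signature class has at least k members, so a vertex
    is re-identified from its signature with probability at most 1/k."""
    counts = {}
    for s in signatures.values():
        counts[s] = counts.get(s, 0) + 1
    return all(c >= k for c in counts.values())

# Hypothetical 1-neighborhood signatures before and after adding one
# false edge (the Harry-Irene link in the example): the edge makes
# previously unique signatures coincide.
before = {"Ada": "A", "Bob": "B", "Harry": "C", "Irene": "D"}
after  = {"Ada": "A", "Bob": "A", "Harry": "C", "Irene": "C"}

print(is_k_anonymous(before, 2))  # → False: every signature is unique
print(is_k_anonymous(after, 2))   # → True: each class now has 2 members
```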

Bin Zhou and Jian Pei [4] state that anonymizing social network data is more difficult than anonymizing relational data, for the following reasons:

a. Modeling the background knowledge of adversaries and social network attacks.

b. Measuring the information loss in anonymizing the social network.

c. Devising the anonymization itself: anonymizing social network data is much more challenging than anonymizing relational data in the form of tables, since changing labels and edges may affect the neighborhoods of other vertices and the properties of the network.

The authors formulated the problem as follows:

a. Identification of the privacy information to be preserved.

b. Modeling the background knowledge that an adversary may use to attack the privacy of the social network.

c. Specification of the usage of the published data, so that the anonymization method can do its best to retain utility while preserving privacy.

Considering all the above parameters, an anonymization method [4] was devised, involving two steps:

1. Extract the neighborhoods of all the vertices in the social network. This makes it possible to compare the neighborhoods of different vertices and leads to a neighborhood component coding technique.

2. Greedily organize vertices into groups and anonymize the neighborhoods of the vertices in the same group.
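The grouping step can be illustrated with a minimal sketch. Sorting by signature as a similarity proxy and folding a short tail group into its predecessor are crude simplifications for illustration, not the paper's neighborhood component coding:

```python
def greedy_groups(signatures, k):
    """Greedily organize vertices into groups of size >= k with similar
    1-neighborhood signatures; within each group, the neighborhoods
    would then be anonymized to look identical."""
    ordered = sorted(signatures, key=lambda v: signatures[v])
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # fold a short tail into the previous group
    return groups

# Hypothetical (degree, inner-edge-count) signatures for five vertices.
sigs = {"a": (2, 0), "b": (2, 0), "c": (2, 1), "d": (4, 1), "e": (2, 1)}
print(greedy_groups(sigs, 2))  # → [['a', 'b'], ['c', 'e', 'd']]
```

Each resulting group holds at least k vertices whose signatures are close, so the edits needed to make them indistinguishable stay small.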

The proposed model was then examined empirically using synthetic data sets and a real data set. The authors conclude by stating that the next steps from this point would be to protect d-neighborhoods (d > 1) and to consider analogous mechanisms such as l-diversity when introducing anonymity [7].

3. Spam and Malware Attacks - an overview

Spam classification is highly subjective, as it is up to the user's discretion whether a piece of information is relevant or not. Vaughan-Nichols noted that spam is almost impossible to define: for some, emails about gaming are spam, while for gaming enthusiasts the opposite holds. The spam could be a simple email spam that results in unwanted redirects to malicious websites, causing forced authentication failures that lead the user to reveal private information, or enforcing a Denial of Service (DoS).

Steve Webb et al. [5] believe that the new generation of spam in social networking is the spam profile, which attempts to manipulate user behavior. Adversaries insert these malicious profiles into the social network to learn the private and public attributes of users and their patterns of activity. The authors note that spam profiles can also redirect an innocent user to web spam pages and can be used to infect other users with malicious content, degrading the quality of community-based knowledge.

According to Paul Heymann et al. [8], social web sites have four fundamental characteristics that are crucial for spammers: one controlling entity, well-defined interactions, identity and multiple interfaces. They also classify anti-spam strategies into three main categories, as shown in Figure 2: detection based, demotion based and prevention based. Detection-based strategies attempt [8] to identify the presence of spam and to remove it or reduce its impact as much as possible.

Demotion-based strategies attempt to lower the ranking of spam in ordered lists. Prevention-based strategies attempt to reduce the intensity of spam, or even prevent it, by changing the interfaces in unpredictable ways and limiting user actions, thereby keeping users out of the spammers' traps.

3.1 Social Honeypots

Steve Webb, James Caverlee and Calton Pu introduced social honeypots for tracking and monitoring social spam [5]. Based on their analysis, they conclude that social spam behaviors exhibit recognizable temporal and geographic patterns [5], and therefore that social honeypots will be able to identify social spam automatically. Honeypot profiles are created and introduced into the social network to attract spammer activity, which makes it possible to learn and analyze the characteristics of spam profiles. Steve Webb and his team conducted an experiment, creating 51 honeypot profiles distributed across different geographic locations on MySpace. Based on the information collected over a four-month evaluation period, they reported the following findings:

a. The behaviors of spam profiles follow temporal patterns.

b. The most popular spamming targets are in the Midwest.

c. 52.7 % of the spam profiles obtain their "About me" content from other profiles.

d. Spam profiles use a lot of redirection links.

Steve Webb, James Caverlee and Calton Pu also state that spam profiles stay logged into the social network for long periods. By doing so, the spam profiles make themselves visible, since the search mechanisms of various social networks work that way, and are prominently displayed in the network [5]. The second strategy is to send out friend requests to many users, which the authors consider an aggressive approach.

The authors designed the social honeypots so that all the honeypot profiles were identical except for their geographic locations. Attractive features, such as a relationship status of single and a body type of athletic, were part of every honeypot profile in order to attract spam profiles. The authors conducted this experiment on MySpace, so the results are based on that particular social networking site, but they can be generalized to other social networks since the underlying concept is the same. The honeypot profiles have to be logged on to the social network at all times in order to attract spammers efficiently; to increase the likelihood of attracting spam, special MySpace bots were used [5] to make sure that all the honeypot profiles were logged in 24/7.

More importantly, the MySpace bots checked for new friend requests. When a friend request was received, the bots downloaded the sender's profile and stored a copy of the spam profile information along with a timestamp. The bots then rejected the friend request, for the following two reasons [5]:

1. To identify spam profiles that are repeat offenders (profiles which send out requests until they are accepted).

2. For an obvious reason: the honeypot profiles should not look like spam profiles themselves.

If the honeypot profiles accepted friend requests from spam profiles, they would in effect help the spam profiles propagate through the network. To avoid this situation, the authors designed the honeypot profiles to store the spam profile information and then reject every friend request from the spam profiles.

Most spam profiles are built so that their "About Me" section is displayed in an attractive or curiosity-provoking manner that invites the innocent to visit their pages. It usually contains a link to a web page and, not surprisingly, those links point outside the domain of the social network. There may not be just a single link; a number of redirections to other malicious websites may be involved.

The authors designed the bots to investigate those links in addition to storing the profile information. Immediately after storing a local copy of a spam profile, the bots parse its "About Me" section and extract the URLs [5]. The bots then crawl the web pages associated with those URLs and store the information about these pages along with the profile information of that particular spam profile.

The URLs published by the spam profiles typically involve many redirections. To track this information, the bots were designed to follow the chain of redirections down to the last link, the one with no further redirections: the ultimate location on the web where the spam profile wants the user to land. The bots reach that end point, crawl the JavaScript information and store the redirections involved. In this way the bots end up storing a copy of the spam profile information along with the crawled web page information for almost all of the redirects involved.
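The redirect-chasing behavior described above can be sketched without any network access by modeling the redirects as a lookup table, a stand-in for issuing real HTTP requests; all URLs below are made up:

```python
def resolve_redirects(start_url, redirect_of, max_hops=10):
    """Follow a chain of redirects to its final landing page, recording
    every hop along the way, as the honeypot bots do. `redirect_of`
    maps each URL to the URL it redirects to."""
    chain = [start_url]
    seen = {start_url}
    while chain[-1] in redirect_of and len(chain) <= max_hops:
        nxt = redirect_of[chain[-1]]
        if nxt in seen:  # bail out on redirect loops
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain

# Hypothetical chain of spam redirects.
redirect_of = {
    "http://short.example/x1": "http://tracker.example/go",
    "http://tracker.example/go": "http://landing.example/pills",
}
chain = resolve_redirects("http://short.example/x1", redirect_of)
print(chain[-1])  # → http://landing.example/pills
```

The full chain, not just the end point, is stored, since intermediate trackers are themselves useful evidence when profiling spam campaigns.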

With all the information collected by the MySpace bots over the four-month period, the authors were able to focus their research on automatically detecting spam profiles. They inferred related information such as the temporal distribution of spam friend requests and found that the largest numbers of friend requests were received around Halloween, Thanksgiving and Columbus Day, for the obvious reason that on holidays many more users are on the social network and the spammers have more prey.

Steve Webb et al. also noticed that after a certain period of time the friend requests started to drop off. One speculation is that the spammers got a clue about the honeypots, since all of their friend requests were denied; current research [5] is therefore focused on revisiting the decision to reject all friend requests. The authors were also able to extract some example spam profiles.

Based on the experiment they conducted [5], the authors classify spam profiles into the following broad categories:

1. Click Traps - as the name suggests, the profile contains a background image which redirects the user to a malicious web page.

2. Friend Infiltrators - this kind of spam tries to establish friendly connections with as many profiles as possible and then spams through every available communication channel.

3. Pornographic Storytellers - these profiles contain links which redirect users to pornographic sites.

4. Japanese Pill Pushers - these depict a woman claiming that her boyfriend purchased male enhancement pills at a very low cost, and telling the user that if he acts immediately he can do the same, thereby inviting him to purchase the pills.

5. Winnies - all of these profiles share the name Winnie and redirect the user to a web page with pornographic pictures of that particular woman.

3.2 Anti-Spam Strategies

According to Paul Heymann, Georgia Koutrika and Hector Garcia-Molina [8], anti-spam strategies can be classified into three broad categories: identification based, rank based, and interface or limit based.

3.2.1 Identification based

The authors suggest [8] that identification-based anti-spam strategies work by first identifying the spam, either manually or by pattern recognition, after which the system interfaces account for the likely spam by deleting it completely or alerting the user to spam-like activity. These methods treat the spam carriers as objects with associated attributes: for example, if the spam is web spam, the web pages are the objects and the attributes might be the links or other metadata present in the page. The identification can be done manually, or automatically with a trained set of objects, using source analysis, text analysis or behavior analysis. Source analysis tries to identify an object's contributors [8], text analysis investigates the text content, and behavior analysis looks at the users associated with the objects. There are also several ways to report spam manually. The authors note that this list is not comprehensive.
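A minimal sketch of the text-analysis branch of this strategy follows, with made-up spam terms and weights; real systems would use a trained classifier rather than a hand-tuned keyword list:

```python
# Illustrative spam indicators and weights (invented for this sketch).
SPAM_TERMS = {"pills": 2.0, "click": 1.0, "free": 1.0, "winner": 2.0}

def spam_score(text):
    """Crude text-analysis score: sum of the weights of spam terms
    present in the object's text."""
    words = set(text.lower().split())
    return sum(w for term, w in SPAM_TERMS.items() if term in words)

def classify(text, threshold=2.0):
    """Flag an object as likely spam when its score reaches the threshold."""
    return spam_score(text) >= threshold

print(classify("click here for free pills"))   # → True
print(classify("photos from our hiking trip")) # → False
```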

3.2.2 Rank based

Rank-based methods aim to provide better search results. Search engines return the top k results, and the authors believe that if those k results are optimized to be free of spam links, then spam can be mitigated to a great extent. A rank-based algorithm can thus be applied [8] to determine which popular web pages are returned by the search agent.
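The demotion idea can be sketched as a re-ranking step: multiply each result's relevance by a penalty proportional to an assumed spam-likelihood score, so likely spam slides down the top-k list instead of being deleted outright. Both score sets below are made up for illustration:

```python
def demote_spam(results, spam_scores, penalty=0.5):
    """Re-rank search results: scale each result's base relevance down
    in proportion to how spammy it looks, then sort by the adjusted
    score in descending order."""
    adjusted = {
        url: rel * (1.0 - penalty * spam_scores.get(url, 0.0))
        for url, rel in results.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)

# Hypothetical relevance scores and spam likelihoods (both in 0..1).
results = {"good.example": 0.8, "spammy.example": 0.9, "ok.example": 0.5}
spam_scores = {"spammy.example": 0.9}
print(demote_spam(results, spam_scores))
```

The spammy page started with the highest raw relevance but ends up last, while non-spam pages keep their relative order.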

3.2.3 Interface or Limit-based

As the name suggests, the system is designed to hide certain details [8] from spammers or to limit automated interaction. The authors state that this kind of anti-spam methodology can be implemented in two ways: interface based and limit based. Interface-based methods try to discourage the user, as far as possible, from publishing any content that might attract spam activity. Limit-based systems control the number of times a user can perform an action, either by applying social norms or by charging the user to perform the action.
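A limit-based control can be sketched as a simple sliding-window rate limiter on a per-user action such as sending friend requests; the limit and window values below are arbitrary:

```python
class ActionLimiter:
    """Limit-based control: allow at most `limit` occurrences of an
    action per user within a sliding time window of `window` seconds."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.log = {}  # user -> timestamps of recent actions

    def allow(self, user, now):
        """Record and allow the action, or refuse it if the user has
        already exhausted the window's quota."""
        recent = [t for t in self.log.get(user, []) if now - t < self.window]
        if len(recent) >= self.limit:
            self.log[user] = recent
            return False
        recent.append(now)
        self.log[user] = recent
        return True

limiter = ActionLimiter(limit=3, window=60)  # 3 friend requests per minute
verdicts = [limiter.allow("spammer", t) for t in (0, 1, 2, 3)]
print(verdicts)  # → [True, True, True, False]
```

As the following paragraph notes, a spammer can sidestep such a per-user quota by spreading the activity across many profiles, which is why limits alone are not a comprehensive defense.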

Heymann et al. believe this might not be a comprehensive solution: even when limits are imposed on resources, adversaries will eventually discover the limits and devise a different attack strategy. They might create several spam profiles, each performing the malicious activity within the imposed limits. Malicious users can also sometimes post good content into the social network, making it extremely difficult for spam identifiers to detect them. Moreover, the term spam has many definitions, which adds to the difficulty of identifying it.

Given all these problems in identifying spam, Heymann et al. have proposed spam models for identifying spam in a social network: a synthetic spam model and a trace-driven spam model [8].

All the methods discussed so far deal with controlling user activity individually, but they can also be extended to the community level: if a user is part of a community, that user can be attacked by a spammer who attacks the community. Since the scope of networking has grown from the user level to the group level, enforcing secure communication has become increasingly difficult. There are more ways for a spammer to attack a social network than there are solutions available to prevent spamming. In the end, much depends on each user's behavior, and every user is expected to take basic security measures when accepting friend requests and when navigating through links.

4. Inferences and Conclusion

The privacy problem grows more complicated as more and more data is posted publicly in a social network. It begins with a user's ignorance, and unfortunately a single user's ignorance can affect a whole community of users. The methods discussed above, proposed by various researchers, either introduce an anti-spam strategy that helps users operate in the social network safely, or introduce social honeypots that attract spam profiles and eventually help identify them automatically.

Since spamming is very diverse, I believe the existing methods may only help fight spam for some time. There could be a better way of identifying false email addresses than just sending confirmation emails. Suppose the sign-up email address is hosted by some provider, here called "def"; it would be better if that provider maintained a record of recent user activity and monitored the user's overall activity. The social network could then ask the email service provider whether the email address used to sign up is acceptable. I think it is reasonable to require the user to provide an email address that has had some previous activity or that was created some time ago. This might prevent spammers from easily creating a spam email address and entering the network.
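The email-vetting idea above can be sketched as a simple acceptance heuristic. The thresholds and inputs are entirely hypothetical, since no real provider exposes such an interface:

```python
def email_acceptable(account_age_days, recent_activity_events,
                     min_age=30, min_events=5):
    """Accept a sign-up email only if the address is reasonably old and
    shows some prior activity (thresholds are arbitrary illustrations
    of the policy, not values from any real provider)."""
    return account_age_days >= min_age and recent_activity_events >= min_events

print(email_acceptable(120, 40))  # → True: established, active address
print(email_acceptable(1, 0))     # → False: freshly minted throwaway
```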

As always, prevention is better than cure. The first step would be to prevent spam profiles from signing up to the network at all. The next level would be to prevent users from giving spammers access to their related information.

The social network administration plays a huge role in admitting new sign-ups. I would argue that the restrictions currently imposed by social networks are insufficient and let spam enter the social network too easily. More anti-spam strategies could be devised: rather than just receiving and analyzing messages from spam profiles, anti-spam agents could also send probe messages out into the network and analyze the nature of the responses to speed up the automatic spam detection process.

The anti-spam strategies proposed by Paul Heymann, Georgia Koutrika and Hector Garcia-Molina all deal with tackling spam after it has entered the social network; I would suggest that more work be put into preventing spam from entering the network in the first place. The social honeypots suggested by Steve Webb et al. are aimed at identifying patterns in social spam profiles and ultimately detecting spam profiles automatically, which I think is a good direction for research in the fight against spam in social networks.

As many researchers have shown, social network data is not as simple as traditional relational data when it comes to maintaining privacy. Hence, as those researchers also believe, I would suggest that much research work be focused on creating spam templates, predicting spam activity and eventually preventing spam from entering a social network.

5. References

[1] Ralph Gross, Alessandro Acquisti, H. John Heinz III. Information revelation and privacy in online social networks. In Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society (Alexandria, VA, USA, November 7, 2005). WPES '05. ACM, New York, NY, 71-80.

[2] Elena Zheleva, Lise Getoor. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In Proceedings of the 18th International Conference on World Wide Web (Madrid, Spain, April 20-24, 2009). WWW '09. ACM, New York, NY, 531-540.

[3] Jack Lindamood, Raymond Heatherly, Murat Kantarcioglu, Bhavani Thuraisingham. Inferring private information using social network data. In Proceedings of the 18th International Conference on World Wide Web (Madrid, Spain, April 20-24, 2009). WWW '09. ACM, New York, NY, 1145-1146.

[4] Bin Zhou, Jian Pei. Preserving Privacy in Social Networks Against Neighborhood Attacks. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering (April 7-12, 2008). ICDE. IEEE Computer Society, Washington, DC, 506-515.

[5] Steve Webb, James Caverlee, Calton Pu. Social Honeypots: Making Friends with a Spammer Near You. Presented at the Fifth Conference on Email and Anti-Spam (CEAS 2008), August 21-22, 2008, Mountain View, CA.

[6] L. Sweeney. k-Anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.

[7] A. Machanavajjhala et al. l-Diversity: Privacy beyond k-anonymity. In ICDE '06.

[8] Paul Heymann, Georgia Koutrika, Hector Garcia-Molina (2007). Fighting Spam on Social Web Sites: A Survey of Approaches and Future Challenges. IEEE Internet Computing, 11(6), pp. 36-45.