The way people communicate




In recent years social networks have changed the way people communicate with each other. The growth of social networks has created many opportunities for researchers and businesses to learn about people's behaviors, patterns and interests. However, social networks have also created many privacy concerns for their users. This paper discusses the privacy concerns that arise when sensitive attributes of a user are inferred from the user's profile and from the groups and communities to which the user is linked in the social network. It also discusses current proposals for dealing with these issues.

1 Introduction

According to [9], a social networking site (SNS) can be defined as a service that allows users to create a public or semi-public profile within a bounded system, to create connections with other users and with communities or groups, and to view and traverse all the viewable profiles. To enter an SNS, a user needs to create a profile, which uniquely identifies the user in the network [9]. The user can customize the profile by adding pictures, videos and other information such as interests. Friend connections and membership of communities or groups, such as a university, are usually established through a request-and-accept mechanism. SNS such as Facebook, Orkut and MySpace have millions of users. This tremendous growth of social networking sites has created a large amount of information that researchers can use for various studies and that businesses can use to learn the patterns and interests of a target market. The popularity of social networks has also made them an attractive platform for attackers.

Recently, there have been many attacks on such SNS [10]. One of the major classes of attack on these sites targets the privacy of users. Privacy can be defined as the ability of users to control the amount of information they reveal about themselves. Through various inference techniques, attackers are able to find sensitive information that a user does not want to reveal. Privacy attacks on social networking sites can be broadly classified into two types: identity disclosure and attribute disclosure [2].

In identity disclosure, the attacker is able to link a user profile in the SNS to a real-world entity [2]. Identity disclosure is usually achieved by linking the information the attacker obtains from the social network to a global reference such as postal information or some other repository.

In attribute disclosure, the attacker is able to identify an attribute of the user, such as gender, political interests or educational background, that the user does not want to reveal [2]. In this attack, attributes that the user has hidden from public view can be inferred using learning techniques [2]. Although the attribute is hidden in the user profile, learning algorithms can use friendship links and group or community memberships to reveal it.

This paper focuses on attribute disclosure through inference.

The paper is organized as follows. Section 2 gives background information. Section 3 covers related work on SNS. Section 4 describes various mechanisms for inferring a user's attributes. Section 5 describes proposed solutions for defending against this attack. Section 6 discusses future work and concludes.

2 Background

An SNS is represented as a graph G = (V, E, H), where V is the set of nodes, each representing a user profile, E is the set of edges representing friendship links, and H is the set of groups or communities to which users belong [2]. This paper follows the nomenclature of [2] for explaining the various inference techniques, as described below. Paper [2] classifies user profiles into private profiles and public profiles. A private profile represents a user whose sensitive attribute is unknown or kept hidden by the user [2]. A public profile is one in which the attribute information is known or observed [2].

Social network data can be obtained either by crawling the social network or from data released publicly by the service provider. A simple crawling algorithm picks a user profile, looks up all of that user's friends, adds their information and links to a database, and repeats the process recursively for each friend [7]. Paper [2] discusses various crawling techniques and their pros and cons.
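The simple crawling procedure can be sketched as follows. This is only an illustration under stated assumptions: the `FRIENDS` dictionary and the `get_friends` helper are hypothetical stand-ins for whatever mechanism retrieves a profile's friend list, and the sketch uses an explicit queue rather than recursion so that large networks do not exhaust the call stack.

```python
from collections import deque

# Hypothetical in-memory stand-in for a social network. A real crawler
# would retrieve each profile's friend list from the site instead.
FRIENDS = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob"],
    "erin": [],  # not reachable from alice, so never crawled
}

def get_friends(user):
    """Placeholder for the step that looks up a profile's friends."""
    return FRIENDS.get(user, [])

def crawl(seed, max_profiles=1000):
    """Collect the profiles and friendship links reachable from `seed`."""
    visited = {seed}
    links = set()
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        for friend in get_friends(user):
            if friend not in visited:
                if len(visited) >= max_profiles:
                    continue  # budget reached: skip profiles not yet seen
                visited.add(friend)
                queue.append(friend)
            # Store each undirected link once, regardless of direction.
            links.add(tuple(sorted((user, friend))))
    return visited, links

profiles, links = crawl("alice")
```

The `max_profiles` budget is an added assumption; without some limit, a crawl of a large network would not terminate in reasonable time.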

3 Related works

One major research area in the field of SNS has concentrated on protecting the privacy of published data. A number of papers discuss ways to anonymize user data. Papers [13,14,15] show that it is possible to identify a user even if the data has been modified using conventional anonymization techniques.

Another area of research has been the effect of spam on SNS. Paper [11] discusses various strategies for spam detection, demotion and prevention. Paper [12] discusses a technique for recognizing whether the sender of a friend request is a spammer.

4 Inference techniques

Sensitive attribute inference techniques can be classified into three main categories: attacks without link and group information, attacks using link information, and attacks using both link and group information [2,3,5,6].

4.1 Attacks without links and groups

This attack uses only the attribute values of the nodes; no link or group information is used.


In the BASIC model, the probability of a sensitive attribute value is estimated as the fraction of observed users who have that value [2]:

$$P_{BASIC}(v_s.a = a_i; G) = P(v_s.a = a_i \mid V_o.A) = \frac{|V_o.a_i|}{|V_o|}$$

where $|V_o.a_i|$ is the number of public profiles with sensitive attribute value $a_i$ and $|V_o|$ is the total number of public profiles [2].

This method always picks the most probable value among the attribute values seen in the public profiles [2], so it is biased toward the predominant attribute values. Experiments [2,3] have shown that these attacks are only as good as a random guess.
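As a minimal sketch of the BASIC estimate, the following computes the fraction of public profiles holding each attribute value and returns the most frequent one. The function names and the example attribute values are hypothetical and chosen only for illustration.

```python
from collections import Counter

def basic_distribution(public_values):
    """Fraction of public profiles that hold each sensitive attribute value."""
    counts = Counter(public_values)
    total = len(public_values)
    return {value: count / total for value, count in counts.items()}

def basic_predict(public_values):
    """Most frequent public value; every private profile gets this same guess."""
    dist = basic_distribution(public_values)
    return max(dist, key=dist.get)

# Hypothetical educational-background values of six public profiles.
public_values = ["master", "master", "bachelor", "master", "bachelor", "master"]
dist = basic_distribution(public_values)
prediction = basic_predict(public_values)
```

Because the estimate ignores the target profile entirely, every private profile receives the same prediction, which illustrates the bias toward predominant values.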

4.2 Privacy attacks using links

Link-based privacy attacks use both the node information and the link information [2]. They exploit the correlation between nodes that are linked to each other [2]. Several models have been proposed for inferring the sensitive attribute [2,3]. Experimental results [2,3] have shown that these attacks perform better than attacks without link information.

Friend-aggregate model (AGG)[2]

The AGG model uses the sensitive attribute distribution of the friends that are directly linked to the node whose value is to be inferred [2]. The sensitive attribute value can be estimated as [2]

$$P_{AGG}(v_s.a = a_i; G) = P(v_s.a = a_i \mid V_o.A, E) = \frac{|V_o'.a_i|}{|V_o'|}$$

where $V_o' = \{v_o \in V_o \mid \exists (v_s, v_o) \in E\}$ is the set of public profiles directly connected to $v_s$, and $V_o'.a_i = \{v_o \in V_o' \mid v_o.a = a_i\}$ is the subset of those directly connected friends with attribute value $a_i$ [2].

This model picks the most probable attribute value among the friends directly linked to the node whose value is being inferred [2]. If the friends' attribute values are diverse, the prediction will not be accurate.
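The friend-aggregate estimate can be sketched as below. It assumes the graph is given as a list of undirected edges and that public attribute values are held in a dictionary; both representations, and all names, are hypothetical choices for this illustration.

```python
from collections import Counter

def agg_distribution(target, edges, public_attr):
    """Attribute distribution over the target's directly linked public friends."""
    friends = set()
    for u, v in edges:
        if u == target:
            friends.add(v)
        elif v == target:
            friends.add(u)
    # Only friends with a public (observed) attribute can contribute.
    public_friends = [f for f in friends if f in public_attr]
    if not public_friends:
        return {}  # no observed friends, so no estimate is possible
    counts = Counter(public_attr[f] for f in public_friends)
    return {value: n / len(public_friends) for value, n in counts.items()}

# Hypothetical network: "t" and "p" are private profiles, the rest are public.
edges = [("t", "a"), ("t", "b"), ("c", "t"), ("t", "p"), ("a", "b")]
public_attr = {"a": "master", "b": "master", "c": "bachelor"}

dist = agg_distribution("t", edges, public_attr)
prediction = max(dist, key=dist.get)
```

In this example two of the three public friends share one value, so the estimate is fairly decisive; with an even split among friends the prediction would be little better than a guess.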

Collective classification model (CC) [2]

The CC model learns and infers from private profiles as well as public profiles [2]. It uses an approximate inference algorithm such as the iterative classification algorithm (ICA) or Gibbs sampling to infer the attributes of private profiles. For example, ICA first assigns a label to each private profile based on the attributes of its friends with public profiles, and then iteratively reassigns labels considering the labels of both public-profile and private-profile friends [2].
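The two phases of ICA can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from [2]: the local classifier is a plain majority vote over friends' labels, ties are broken alphabetically, a private profile with no public friends starts with the most common public label, and iteration stops when no label changes. All names and data are hypothetical.

```python
from collections import Counter

def majority(labels, default):
    """Most common label, ties broken alphabetically; `default` if empty."""
    if not labels:
        return default
    counts = Counter(labels)
    return max(sorted(counts), key=counts.get)

def ica(neighbors, public_labels, private_nodes, max_iters=10):
    """Iterative classification with a majority-vote local classifier."""
    default = majority(list(public_labels.values()), None)
    # Bootstrap: label each private profile from its public friends only.
    current = {
        v: majority([public_labels[n] for n in neighbors[v] if n in public_labels],
                    default)
        for v in private_nodes
    }
    # Iterate: relabel using both public labels and current private labels.
    for _ in range(max_iters):
        labels = {**public_labels, **current}
        updated = {
            v: majority([labels[n] for n in neighbors[v] if n in labels], current[v])
            for v in private_nodes
        }
        if updated == current:
            break  # converged: no label changed in this pass
        current = updated
    return current

public_labels = {"a": "master", "b": "master",
                 "c": "bachelor", "d": "bachelor", "e": "bachelor"}
# Friend lists of the private profiles p, q and s (q has only private friends).
neighbors = {"p": ["a", "b", "q"], "q": ["p", "s"], "s": ["a", "b", "q"]}

result = ica(neighbors, public_labels, ["p", "q", "s"])
```

Here `q` has no public friends, so it starts with the overall most common label and is then relabeled in the first iteration from the labels inferred for `p` and `s`. This is the benefit of using private-profile labels as well as public ones.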

Flat-link model (LINK) [2]

This model uses an adjacency matrix to represent the graph, where each row corresponds to a user profile [2]. Each row has a list of binary features whose length equals the size of the network. A feature has the value 1 if the user is a friend of the person corresponding to that feature, and 0 otherwise [2]. Each user instance also has a class label, which is known or unknown depending on whether the profile is public or private [2]. The instances with public profiles form the training data, which can be fed to any traditional classifier such as Naive Bayes, logistic regression or SVM [2]. The learned model can then be applied to predict the labels of the private profiles [2].
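A minimal sketch of the flat-link approach follows. It builds the binary adjacency rows and trains a small hand-written Bernoulli Naive Bayes classifier with Laplace smoothing on the public profiles. The choice of Naive Bayes, the smoothing, and all names and data are assumptions made for illustration; any traditional classifier could take its place.

```python
import math

def adjacency_rows(users, edges):
    """One binary feature vector per user: 1 where a friendship link exists."""
    index = {u: i for i, u in enumerate(users)}
    rows = {u: [0] * len(users) for u in users}
    for u, v in edges:
        rows[u][index[v]] = 1
        rows[v][index[u]] = 1
    return rows

def train_naive_bayes(rows, labels):
    """Fit a Bernoulli Naive Bayes model on the labeled (public) profiles."""
    n_features = len(next(iter(rows.values())))
    model = {}
    for c in sorted(set(labels.values())):
        members = [u for u in labels if labels[u] == c]
        prior = len(members) / len(labels)
        # P(feature j = 1 | class c), smoothed so it is never exactly 0 or 1.
        probs = [(sum(rows[u][j] for u in members) + 1) / (len(members) + 2)
                 for j in range(n_features)]
        model[c] = (prior, probs)
    return model

def predict(model, row):
    """Class with the highest log posterior for one adjacency row."""
    def log_score(prior, probs):
        return math.log(prior) + sum(
            math.log(p if x else 1 - p) for x, p in zip(row, probs))
    return max(model, key=lambda c: log_score(*model[c]))

# Hypothetical network: two friend triangles, and private profile "t"
# linked to two members of the first triangle.
users = ["a", "b", "c", "d", "e", "f", "t"]
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("t", "a"), ("t", "b")]
labels = {"a": "master", "b": "master", "c": "master",
          "d": "bachelor", "e": "bachelor", "f": "bachelor"}

rows = adjacency_rows(users, edges)
model = train_naive_bayes(rows, labels)
prediction = predict(model, rows["t"])
```

Each row has as many features as there are users, so in a real network the feature vectors are very long and sparse.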

Blockmodeling attack (BLOCK) [2]

The basic idea of this model is that users form natural clusters, or blocks, and that the link probability between two users is the same as the link probability between their corresponding blocks. The BLOCK probability can be estimated as [2]

$$P_{BLOCK}(v_s.a = a_i; G) = P(v_s.a = a_i \mid V_o.A, E, \lambda) = \frac{1}{Z}\,\mathrm{sim}(\lambda_i, \lambda(v_s))$$

where $\lambda_i$ is the vector of all link probabilities between block $i$ and every other block, $\lambda(v_s)$ is the vector of link probabilities between the user $v_s$ and each block, $\mathrm{sim}()$ is a vector similarity function and $Z$ is a normalization factor [2].

To predict the attribute of a user, the model first looks for the block to which the user belongs and then predicts the attribute based on that block [2].
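The block-based estimate can be sketched as follows. The sketch makes assumptions beyond what is stated above: each block is taken to be the set of public profiles sharing one attribute value, link probabilities are estimated as observed link fractions, and cosine similarity is used as the similarity function, which is only one possible choice. All names and data are hypothetical.

```python
import math

def user_to_block(user, block, adjacency):
    """Fraction of the block's members (other than the user) linked to the user."""
    others = [m for m in block if m != user]
    if not others:
        return 0.0
    return sum(1 for m in others if m in adjacency[user]) / len(others)

def block_vector(block, blocks, adjacency):
    """Average link probability from one block's members to every block."""
    return [sum(user_to_block(u, other, adjacency) for u in block) / len(block)
            for other in blocks]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def block_distribution(target, public_attr, adjacency):
    """Normalized similarity between the target's link profile and each block's."""
    values = sorted(set(public_attr.values()))
    blocks = [[u for u in public_attr if public_attr[u] == v] for v in values]
    user_vec = [user_to_block(target, b, adjacency) for b in blocks]
    sims = [cosine(block_vector(b, blocks, adjacency), user_vec) for b in blocks]
    z = sum(sims)  # normalization factor
    if z == 0:
        return {v: 1 / len(values) for v in values}  # no evidence: uniform
    return {v: s / z for v, s in zip(values, sims)}

# Hypothetical network: two friend triangles joined by one link (c-d),
# and private profile "t" linked to "a" and "b".
adjacency = {
    "a": {"b", "c", "t"}, "b": {"a", "c", "t"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
    "t": {"a", "b"},
}
public_attr = {"a": "master", "b": "master", "c": "master",
               "d": "bachelor", "e": "bachelor", "f": "bachelor"}

dist = block_distribution("t", public_attr, adjacency)
prediction = max(dist, key=dist.get)
```

The target is assigned to the block whose pattern of links to all blocks most resembles its own, rather than simply to the block containing most of its friends.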

Bayesian inference [3]

To predict the attribute of a node, this model creates a Bayesian network from the node's social network and uses Bayesian inference to obtain the probability of the node having a certain attribute value [3]. The network can be a single-hop network, in which only direct friends are considered, or a multi-hop network, which also uses information about friends of friends [3].

4.3 Privacy attacks using links and group information

This class of inference uses group information in addition to the link information between nodes [2]. Members of the same group usually share common interests, so groups can provide a lot of information to help the inference process [2]. For example, if a user is a member of the group "Fall 2009", this could provide additional information that helps to predict the user's age. The experiments in [2] showed that group information gives better accuracy if it is relevant to the attribute being inferred.

Groupmate-link model (CLIQUE)[2]

This model treats group membership as a friendship link [2]. If two users share at least one group, it creates a friendship link between them [2]. This allows any of the link-based privacy attack algorithms to be used for inference [2]. However, it fails to account for the strength of the relationship between two users, as indicated by the number of groups they have in common [2].
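The conversion of group memberships into friendship links can be sketched as below. The dictionary-of-member-lists representation and the group names are hypothetical.

```python
from itertools import combinations

def groupmate_links(groups):
    """Create one undirected link for every pair of users sharing a group."""
    links = set()
    for members in groups.values():
        for u, v in combinations(sorted(set(members)), 2):
            links.add((u, v))
    return links

# Hypothetical group memberships.
groups = {
    "Fall 2009": ["alice", "bob", "carol"],
    "Chess": ["alice", "bob"],
    "Hiking": ["dave"],
}

links = groupmate_links(groups)
```

Because the links are stored as a set, `alice` and `bob` receive a single link even though they share two groups. This is the loss of relationship-strength information noted above.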

Group-based classification model (GROUP) [2]

The group-based classification approach consists of three main steps [2]. In the first step, the algorithm performs feature selection: because not every group provides useful information for inferring a particular attribute, it selects only the groups that are relevant to the node classification task [2]. In the second step, the algorithm learns a model by training a classifier that takes the relevant groups of the public nodes as features [2]. In the third step, the classifier returns the predicted sensitive attribute for each private profile [2].
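The three steps can be sketched as follows. The selection criterion and the classifier are assumptions made for this illustration and are not taken from [2]: a group is treated as relevant if it has at least `min_public` public members and at least `min_purity` of them share one value, and a simple vote over the relevant groups stands in for a trained classifier. All names and data are hypothetical.

```python
from collections import Counter

def select_groups(groups, public_attr, min_public=2, min_purity=0.7):
    """Step 1: keep groups whose public members mostly share one value."""
    relevant = []
    for name, members in groups.items():
        labels = [public_attr[m] for m in members if m in public_attr]
        if len(labels) < min_public:
            continue  # too few observed members to judge the group
        top_count = Counter(labels).most_common(1)[0][1]
        if top_count / len(labels) >= min_purity:
            relevant.append(name)
    return relevant

def train(groups, public_attr, relevant):
    """Step 2: record the public label counts of each relevant group."""
    return {name: Counter(public_attr[m] for m in groups[name] if m in public_attr)
            for name in relevant}

def predict(user, groups, model):
    """Step 3: pool the label counts of the relevant groups the user is in."""
    votes = Counter()
    for name, counts in model.items():
        if user in groups[name]:
            votes.update(counts)
    return votes.most_common(1)[0][0] if votes else None

public_attr = {"a": "master", "b": "master", "c": "bachelor",
               "d": "bachelor", "e": "master", "f": "master"}
# "t" and "z" are private profiles.
groups = {
    "Grad Seminar": ["a", "b", "t"],
    "Campus News": ["a", "c", "d", "e", "t"],
    "Thesis Writers": ["b", "f", "t"],
    "Chess": ["c", "z"],
}

relevant = select_groups(groups, public_attr)
model = train(groups, public_attr, relevant)
prediction = predict("t", groups, model)
```

In this example "Campus News" is discarded because its public members are evenly split, and a profile that belongs to no relevant group, such as `z`, receives no prediction.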

5 Prevention Techniques

This section discusses various proposed techniques for preventing such attribute inference attacks. One option is to falsify or hide a user's social relations in an SNS [2]. Another option is to configure the privacy settings so that an attacker is given very limited information about a user and the user's social relations, making it hard to predict the user's attributes [2].

5.1 Falsifying or hiding information

Paper [3] discusses different techniques for falsifying or hiding information. These techniques selectively choose the attributes or friend relationships to be hidden or falsified [3]. The techniques are:

Selectively hiding attribute value (SHA) [3]: In this technique the attribute values of appropriately chosen friends are hidden [3]. The friends are chosen by a selection algorithm so as to introduce enough uncertainty to prevent the attacker from inferring the attribute value correctly.

Selectively falsifying attribute value (SFA) [3]: In this technique the attribute values of selected friends are falsified so that the attacker cannot infer the attribute of the user correctly.

Selectively hiding relationships (SHR) [3]: In this technique the friendship links between the user and selected direct friends are hidden in order to alter the inference.

Selectively adding relationships (SAR) [3]: In this technique friendship links with other nodes are selectively added so as to cause incorrect inference of the target profile's attribute.

All these techniques alter the information available to the attacker so that the attacker is forced to infer a wrong attribute value. The experiments [5] showed that the effectiveness of the techniques followed the order SAR > SFA > SHR > SHA [3]. All of them work better than randomly changing the attributes [3]. However, implementing these techniques in current SNS does not seem very feasible. Asking friends to hide or falsify their attributes is impractical. Similarly, adding or hiding a friendship link makes it harder for a new friend to find the friends you have in common. Moreover, given the dynamic nature of SNS, where relations are added and deleted very frequently, managing relations so that sensitive attributes cannot be predicted seems very cumbersome [3].
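The effect of the two relationship-based techniques can be illustrated against a simple friend-majority attacker. This sketch does not implement the selection algorithms of [3]: the links to hide or add are chosen by hand, and the attacker model, names and data are hypothetical.

```python
from collections import Counter

def friend_majority(target, edges, public_attr):
    """Attacker's guess: most common public value among the target's friends."""
    friends = {v if u == target else u for u, v in edges if target in (u, v)}
    counts = Counter(public_attr[f] for f in friends if f in public_attr)
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(counts), key=counts.get) if counts else None

public_attr = {"a": "master", "b": "master", "c": "bachelor",
               "d": "bachelor", "e": "bachelor"}
# The private profile "t" is linked to two "master" friends and one "bachelor".
edges = {("t", "a"), ("t", "b"), ("t", "c")}

before = friend_majority("t", edges, public_attr)

# Selectively hiding relationships: hide the links to the two "master" friends.
hidden = edges - {("t", "a"), ("t", "b")}
after_hiding = friend_majority("t", hidden, public_attr)

# Selectively adding relationships: add links to two more "bachelor" profiles.
added = edges | {("t", "d"), ("t", "e")}
after_adding = friend_majority("t", added, public_attr)
```

In both cases the attacker's guess changes from the value suggested by the original friends to a different one, but only because the user's visible friend list no longer reflects the real relationships. That cost is the one discussed above.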

5.2 Configuring privacy settings

Another proposed solution for preventing such attacks is to control the privacy settings of the SNS. The options include hiding friendship links and group memberships and limiting the profile search options [2]. However, all these settings severely limit the functionality of the social network [2]. One of the main purposes of an SNS is to let people search for and connect with their friends. Hiding friendship links or group memberships, or restricting profile search to friends or to a certain set of people, impairs the ability of others to find you and your friends, which heavily limits the functionality of the SNS.

6 Future work & Conclusion

Even if a user hides an attribute in the profile, this does not ensure that the attribute stays private. The attribute can be inferred from other information obtainable from the social network, such as friendships and group memberships [2,3,5,6].

Preventing attribute inference without hindering the normal functioning of the SNS appears to be an open problem [2,3,5,6]. The proposed solutions seem to prevent the user from benefiting fully from the functionality of the SNS [2,3,5,6].

Until a solution for preventing such attacks is implemented, I believe users should be informed of the privacy risks incurred when they join a group or create a friendship link, so that they can make an informed decision [2]. That is, users should be told how much risk to their sensitive attributes is involved in joining a particular network, so that they have the option to hide or show particular friend or group relations.