In this thesis we focused on one particular social network, Facebook. A central observation of the thesis is that current matching techniques are not sufficient to reach the next generation of the Internet. One line of research, Web 3.0, addresses a clear list of security, mobility, and other challenges that have accumulated since the Internet was first developed, and the latest technologies can already cover several of the requirements of Web 3.0. This Master's thesis also looked at the evolution of the Internet from Web 3.0 towards Web 4.0. To realize a Semantic Web application, concepts such as representing knowledge in a Semantic Web language, a matching algorithm, reasoning, and querying were covered, and extracting data from Facebook user profiles to find the best matches for friend suggestions was discussed. We began by differentiating the Semantic Web from the traditional Web and identifying what makes it more powerful than the previous versions: whereas the old Web is a web of documents, the Semantic Web is a web of data. However, we should note that not all data can be considered part of this web. As discussed in Chapter 1, information retrieval based on Semantic Web technology has become a research focus, and in semantic retrieval the method used to calculate semantic similarity is crucial to both recall and precision.
According to Chapter 2, information retrieval on the Web (Dr. Nolan Brian, 2003) is the process of retrieving data from the huge number of online documents available to users, and it is the routine way of capturing information. In this approach, extraction does not operate over the full body of data in a document; instead, when users search for a specific document, they have to look through a great deal of the Web to find exactly what is relevant to their work. Moreover, the Web contains much unstructured data embedded in mark-up languages, which is difficult for users, and even for machines, to recover. The techniques currently used for retrieval are browsing and keyword searching, in which the words the user types into a search engine are analysed literally. This way of retrieving information has many problems and limitations, which we discussed in the previous chapters.
In this thesis, according to our extraction model, the data gathered from the different layers of the construction is presented in HTML using the FOAF vocabulary; at first, however, it is represented in RDF. RDF is the technology used for combining information from different information sources (Peter Mika et al., 2005), and the very first step in this procedure is to convey all information in a common representation such as RDF (Peter Mika et al., 2005). Personal information and social networks are reported in FOAF, valid email addresses are expressed with a dedicated property, and the rest of the metadata is shown in fully structured form (Peter Mika et al., 2005). The main goal of this thesis was to make a social network such as Facebook semantic, in a logical way. There are many social networks, but the scope of this thesis was limited to one of them, Facebook. All the data collected for building the prototype came from the content of individual Facebook profiles. We should note that Facebook operates by suggesting friends and sending invitations; there is no content-based search in this social network. Furthermore, as Facebook users know, there is no semantic friend suggestion on the site: suggestions are based on Friend-Of-A-Friend (FOAF) relationships. One of my concerns in this thesis was therefore to find a way to make these suggestions more logical. To do this we considered female students with Facebook accounts (Facebook API 2011); 500 of them were active users with complete profile details and information, and from these we took 12 at random for the extraction and comparison. The objects of study were real-world objects.
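As a rough illustration of the RDF/FOAF representation described above, a profile can be expressed as subject–predicate–object triples and printed in a Turtle-like form. This is only a minimal sketch: the profile fields, the example URI, and the serialization helper are hypothetical, and a real implementation would use a proper RDF library rather than hand-built strings.

```python
# Minimal sketch: mapping a (hypothetical) profile onto FOAF-style
# RDF triples and serializing them in a simplified Turtle-like form.
FOAF = "http://xmlns.com/foaf/0.1/"

def profile_to_triples(profile_uri, profile):
    """Map a simple profile dict onto FOAF predicates (subset only)."""
    mapping = {"name": "name", "email": "mbox", "interest": "interest"}
    triples = []
    for key, value in profile.items():
        if key in mapping:
            triples.append((profile_uri, FOAF + mapping[key], value))
    return triples

def to_turtle(triples):
    """Very small Turtle-like serialization, for display purposes only."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)

profile = {"name": "Alice", "email": "alice@example.org", "interest": "tennis"}
triples = profile_to_triples("http://example.org/alice", profile)
print(to_turtle(triples))
```

In a full system each profile URI would identify one crawled Facebook profile, and the triples from all profiles would be merged into a single repository, which is what makes RDF suitable for combining information from different sources.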
With the current Facebook technology there was no logical way of suggesting friends. To fill this gap, in this thesis I extracted the content of every profile in the sample, converted it from HTML to plain text, and then used a Semantic Web algorithm to find the relationships and every connection between the data in the profiles. In this way we could build a system that semantically suggests friends to users based on their similarities in every respect. We first identified the problems that could cause this aim to fail and then found solutions for them; this made it easier to reach the final solution and allowed us to build the system accurately. The main purpose of this work was to build a system that behaves logically.
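The HTML-to-text step mentioned above can be sketched with Python's standard-library HTML parser. The sample page below is hypothetical; a real profile page would of course be far more complex, but the principle, collecting the visible text and discarding markup and scripts, is the same.

```python
from html.parser import HTMLParser

class ProfileTextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping script/style."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    """Convert a profile's HTML content to plain text for comparison."""
    parser = ProfileTextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

sample = "<html><body><h1>Alice</h1><p>Likes tennis</p><script>x=1</script></body></html>"
print(html_to_text(sample))  # Alice Likes tennis
```

The resulting plain text is what the similarity algorithm later compares between pairs of profiles.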
5.2 Methods of Extracting Data
In this section we describe the algorithms and methods used for extracting data from a social network.
Public Listings: A very easy method for gathering information is crawling public profile listings. This method does not require an active account and can be carried out by crawling through search engines, as recommended by Facebook (Joseph Bonneau, 2009).
False Profiles: When not enough public data is in view, false profiles can be created to gather more information (Joseph Bonneau, 2009). This is very easy to do and requires only a valid email address, which can even be temporary (Tor Project, 2009; Joseph Bonneau, 2009). If the profile is created as a "searchable profile", it will show all the details of an account, such as name, address, and photo (Joseph Bonneau, 2009).
5.3 Problems with the Current Web
The WWW is truly remarkable, and the features and benefits developed for it have changed our world completely (Dhingra Vandana, 2011). Nevertheless, current web technologies cannot cover all the needs of today's dynamic, distributed, and robust computing, and many problems remain unsolved (Dhingra Vandana, 2011). Among the requirements for new web technologies are structuring information, improving the current search mechanisms, and exposing the semantics and meaning of information (Dhingra Vandana, 2011). The most important limitations of the current web, which push us to think about a new generation and version of the web, are listed below (Dhingra Vandana, 2011).
1) Single Document Search
The most important limitation of extracting data from the web with current technology is that information can generally be recovered only from a single web page or a single document; it is very hard to do this jointly across several documents and web pages (Dhingra Vandana, 2011).
2) Search Limited to Keywords (No semantics)
As mentioned before, one way of crawling data on the web is keyword search, which confronts the user with problems caused by mismatches and typing errors relative to the original document (Dhingra Vandana, 2011). These failures occur because documents on the web use different terminology and vocabulary, so a search can fail even when the information does exist on the web (Dhingra Vandana, 2011).
3) Irrelevant and Excessive Information
Another problem with keyword search is that it returns a mixture of relevant and irrelevant information (Dhingra Vandana, 2011). Most of the time the user is faced with more irrelevant information than the desired data, and it is very time-consuming to pick out the desired items from the rest (Dhingra Vandana, 2011).
4) Semi structured Information Representation
Our current web is heavily document-centric rather than supporting advanced information representation (Dhingra Vandana, 2011). Most of the information available on the web is unstructured or semi-structured (Dhingra Vandana, 2011). More than half of the information on the current web is in HTML, which is appropriate for direct human use but not at all suitable for automated information exchange, retrieval, and processing by software agents (machines) (Dhingra Vandana, 2011).
5) Research Context and Basic Definitions
Context: research communities. We can identify three communities doing web-service research: the Industrial Web-Services community, the Semantic Web-Services community, and the e-Services community. These communities differ in their basic assumptions, research methods, and state-of-the-art developments. The Industrial Web-Services community performs pragmatic, industry-oriented research driven by practical needs and requirements. This community sees web services as a way of performing remote procedure calls over the Web and performs bottom-up research: the assessment of specific technical problems leads to the construction of representations and systems that can solve them. These systems are tested by implementing numerous case studies and applications.
5.4 Limitations of Current Approach
When we confront the limitations of the current web, we come to the conclusion that a way must be found to improve it (Sophia Alim et al., 2011). Some of these limitations include:
1) With the current technique there is no way to extract the full list of friends for a profile (Sophia Alim et al., 2011); extraction can only be done for a chosen set of top friends (Sophia Alim et al., 2011).
2) There is only one model of traversing the graph: only Breadth First Search was used to explore the online social network (Sophia Alim et al., 2011). This is not sufficient for comparing algorithms and implementing and analysing the results (Sophia Alim et al., 2011).
3) Various profile structures: musician, magazine, and band profiles and fan pages cannot be extracted because their profiles have a different structure (Sophia Alim et al., 2011). This factor makes such profiles hard to extract from (Sophia Alim et al., 2011). Moreover, the friendship between a person and a band or magazine differs from a friendship between two individuals (Sophia Alim et al., 2011): the former is a "fan based" relationship, whereas the latter is a "friend of" relationship (Sophia Alim et al., 2011). Our results suggest that social networks should limit the number of mechanisms with the authority to access user data, in order to control data sharing and prevent data phishing (Joseph Bonneau, 2009). The biggest problem, however, is users' lack of knowledge about the privacy controls available on every site today (Joseph Bonneau, 2009).
4) The LSA website has a factor limitation: at most 75,000 factors can be compared in each text. If a file exceeds this limit the tool returns an error, so the content has to be divided into smaller parts before extraction, which may change the result so that it is no longer accurate.
5) Another limitation of the LSA website is the number of words available in its corpus. In this thesis, the text file used as input to LSA contained some words that were not in the LSA corpus, so we applied an error percentage to make the results more accurate and closer to the real values.
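The text-comparison and chunking steps discussed in limitations 4) and 5) can be illustrated with a small sketch. Note the hedge: real LSA applies a singular value decomposition to a term–document matrix; the cosine similarity over raw term counts below is only a simplified stand-in to show the kind of score the comparison produces, and the `chunked` helper mirrors the workaround for the LSA tool's input-size cap.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity over raw term counts -- a simplified stand-in
    for the SVD-based comparison that LSA actually performs."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def chunked(tokens, limit):
    """Split a token list into pieces no larger than `limit`, mirroring
    the workaround needed for the LSA tool's input-size cap."""
    return [tokens[i:i + limit] for i in range(0, len(tokens), limit)]

score = cosine_similarity("likes tennis and movies", "likes tennis and books")
print(round(score, 2))  # 0.75
```

Splitting a long profile into chunks and scoring the chunks separately is exactly why the results may drift from what a single whole-text comparison would give, as limitation 4) notes.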
5.5 Future Research
As we know, social networks play important roles in our daily lives: people communicate and share information with each other as friends, family, colleagues, collaborators, and business partners. This chapter of the thesis has shown the extraction method as well as the comparison of profiles at different levels of extraction, all carried out with the extraction method and online comparison (Sophia Alim et al., 2011). Automatic extraction of information is the direct approach, and it is possible with semi-structured web pages (Sophia Alim et al., 2011). After extracting the data and running our experiments, we found that social networks like Facebook have more than one profile structure template, and that users can customize the template to their specifications (Sophia Alim et al., 2011). This Master's thesis has opened up the opportunities for future research listed below:
1) Extraction of data from an online social network like Facebook could also be performed with a depth first search; after extraction, the recovered data could be compared with the results of the Breadth First Search.
2) Development of the application to extract all the friends and their attributes from a profile rather than just the top or random friends. This would provide a more accurate graph, and changes in the online social network could be tracked because the application can be run more than once.
3) Projection of profile connections from the repository into a graph. The graph would map the profiles and their relationships with other profiles, and would be a directed weighted multigraph.
4) Automating the data retrieval through the development of an agent.
5) Extracting registered user profiles from other online social networks.
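Future-research item 1) proposes comparing depth first search against the Breadth First Search used in the cited extraction work. The two traversals can be sketched side by side on a toy friendship graph (the profile names and edges below are hypothetical):

```python
from collections import deque

# Toy friendship graph: each profile maps to its list of friends.
graph = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "dave"],
    "carol": ["alice"],
    "dave":  ["bob"],
}

def bfs(start):
    """Breadth-first crawl order, as used in the cited extraction work."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for friend in graph[node]:
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return order

def dfs(start):
    """Depth-first alternative proposed for the future comparison."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so neighbours are visited in listed order.
        for friend in reversed(graph[node]):
            stack.append(friend)
    return order

print(bfs("alice"))  # ['alice', 'bob', 'carol', 'dave']
print(dfs("alice"))  # ['alice', 'bob', 'dave', 'carol']
```

Both traversals visit the same set of profiles but in a different order, which is exactly what would matter when a crawl is cut off early by rate limits or a fixed profile budget.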
This research project proposed an indexing scheme for semantic social networks together with a prototype, using Facebook as the example. In this approach, a semantic social network repository for a site like Facebook would be created and maintained by crawling social network profiles and the other websites connected to them. To realize the prototype, we used a semantic matching algorithm, Latent Semantic Analysis (LSA), which returns a set of results with different numbers of matches; after the comparison, it shows which two profiles are the best match to suggest to each other on Facebook, according to their percentage of similarity. The algorithm finds similarities in text, namely the content of each person's profile in a social network like Facebook. According to the results of this thesis, these final results can add more flexibility to current social networks such as Facebook and give opportunities to both users and providers: friend suggestions on Facebook would become more logical, based on the similarity of profile content (for example, suggesting friends according to the kind of sport or movie they like, or even similar content in their photo albums). In addition, the prototype developed in this thesis showed how improved user profiles with different applications could be added to a social network like Facebook. Instead of using natural language processing to extract the data from every profile in every existing document, we only need to express the knowledge in a representation language (XML, RDF, or OWL), extract the profile content as an HTML file, and convert it to text. Knowledge representation can solve many of today's web problems, but with current research we cannot yet deliver a complete Semantic Web application. All the new extraction methods presented in this thesis depend on the structure of the content.
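The final selection step described above, turning pairwise similarity percentages into friend suggestions, can be sketched as follows. The profile identifiers and similarity scores are hypothetical placeholders, standing in for the percentages the LSA comparison would produce:

```python
# Hypothetical similarity percentages produced by the matching step.
similarity = {
    ("p1", "p2"): 62.0,
    ("p1", "p3"): 41.5,
    ("p2", "p3"): 78.3,
}

def best_match(scores):
    """Return the single pair of profiles with the highest similarity."""
    return max(scores, key=scores.get)

def suggestions(scores):
    """For each profile, pick the other profile it matches best --
    the 'logical' friend suggestion proposed in this thesis."""
    best = {}
    for (a, b), score in scores.items():
        for me, other in ((a, b), (b, a)):
            if score > best.get(me, ("", -1.0))[1]:
                best[me] = (other, score)
    return {who: match for who, (match, _) in best.items()}

print(best_match(similarity))   # ('p2', 'p3')
print(suggestions(similarity))  # {'p1': 'p2', 'p2': 'p3', 'p3': 'p2'}
```

Because the scores are symmetric, each profile simply receives the partner with which it shares the highest similarity percentage, which is the content-based replacement for FOAF-only suggestions that this thesis argues for.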
The technique presented in this thesis permits automatic integration of sources, and in this way we can build more logical social networks. Nevertheless, this thesis focused on just one social network, Facebook, and developed a prototype for it.
This approach could improve knowledge management, especially methods such as extracting, integrating, recovering, implementing, and identifying dynamic content inside an organization. The results of this thesis show that the current protection against data crawling in a social network like Facebook is not a serious obstacle to crawling through search engines, in comparison with global networks. From what we found, both personal data and social graph data can be extracted from social networks with different structures and different privacy options.