Peer-to-peer (P2p) system architecture is widely used because of its file-sharing capabilities over the Internet. Gnutella is one of the most popular file-sharing networks; it allows users to send and receive files and other content over the Internet, and it is a widely used protocol for distributed search and digital content distribution. Although it also supports a traditional client/centralized-server search paradigm, Gnutella's distinction is its peer-to-peer, decentralized model.
The original Gnutella software was intended to be released under the GNU General Public License, which makes the source of a program available to the public; this is why the first three letters of Gnutella are taken from GNU. The rest of the name is taken from Nutella, a chocolate hazelnut spread which, rumor has it, the developers ate a lot of whilst working on the project [http://www.lycos.com/info/gnutella--gnutella-network.html].
As discussed in the previous chapter, in the P2p system model every client is also a server, and vice versa. The Gnutella protocol architecture places a servent component at each user/node. These servents perform tasks generally associated with both clients and servers. A servent has two interfaces: 1) client and 2) server. The client interface is the service through which users/nodes of a Gnutella network send queries and view search results. At the same time, the server interface accepts queries from other servents, checks them for possible matches against its local pool of resources, and responds with the appropriate results to the client interfaces of the querying nodes. Because Gnutella is distributed in nature, the operation of the network is not heavily affected even if a subset of servents goes down. This property makes the Gnutella network highly robust and fault-tolerant.
Gnutella follows a decentralized P2p system model. Participating nodes/users can share resources from their own systems for other nodes/users to see and retrieve, and can find resources shared by others on the network. The shared resources can be of many kinds: mappings or pointers to other resources, cryptographic keys, files of any type, meta-information on keyable resources, and so on. In theory a shared resource can be anything, but the semantics for finding and handling resources may vary: each type of resource may have its own semantics, or a group of resource types may share similar semantics.
In any system model, whether client-server or P2p, the starting point is a user/node attempting to join the network. For example, in a Windows-based client-server system, a Windows client machine joins and logs in to the domain; only then can the client access the services offered by the domain server, subject to the policies and controls in place. Similarly, a user joins a Gnutella network when a Gnutella program is launched on a machine. The program automatically starts to seek out other Gnutella nodes to connect to. The set of already connected nodes carries the Gnutella traffic, which consists of queries, replies to those queries, and control messages that facilitate the discovery of other nodes. Before looking at how these queries, replies and control messages work, we describe the basic building blocks of the Gnutella network architecture and define what a Gnutella network is.
A Gnutella network, also termed GNET, is made up of interconnected hosts that implement the Gnutella protocol.
A servent is a program that participates in a Gnutella network. This program has the abilities of both a server and a client, which is the reason it is called a servent: the term is derived from “SERVer” and “cliENT”.
The terms “peer”, “node” and “host” refer to a network participant, i.e. a user or computer, rather than to a program. Because a servent can take a clear client or server role, a peer is sometimes also called a “client” or “server”, but normally “client” is used as a synonym for servent. Some people use the variant spelling “servant” instead of “servent”.
A message is essentially a data structure used to send control information or data over the P2p network. Different researchers use various terms for a message, such as "packet" or "descriptor", but the concept behind all these terms is the same.
A Globally Unique Identifier (GUID) is a 16-byte value made up of random bytes. It is used to identify servents and messages. It is worth mentioning that a GUID is not a signature but a simple identifier data structure that identifies network entities uniquely.
When the Gnutella protocol was first launched, several permanently known, fixed hosts provided a list of Gnutella nodes to any GNET servent attempting to connect. These were called “hostcaches”. Hostcaches are no longer used because they were not a sufficiently self-adaptive and fault-tolerant approach for a fully distributed P2p system architecture. This problem is known in the P2p research community as the "Initial Connection Point Problem". The protocol therefore evolved to the current specification, which is far more flexible than the hostcache approach. Under the current specification, a servent that wants to connect to the Gnutella network finds and stores host addresses at start-up using a caching system known as the Gnutella Web Caching System (GWCS). The aim of the GWCS, or simply “cache”, is to eliminate the “Initial Connection Point Problem” of an entirely decentralized network.
A GWCS is a component placed on any web server that accumulates the IP addresses of nodes in the Gnutella network and the URLs of other caches. Gnutella client nodes connect to a randomly chosen GWCS, send IP addresses and URLs to it, and receive them from it. The random choice of cache is intended to ensure that all caches eventually learn about each other and that every GWCS holds comparatively fresh node addresses and URLs. The GWCS communication protocol imposes no special requirements: to obtain the addresses of other nodes, a servent simply sends an HTTP request to a GWCS. Because the Gnutella network is highly distributed and decentralized, a node usually needs no more than one such query per session to obtain the addresses of other existing nodes. This makes it easy for a node to join the Gnutella network through a GWCS, without relying on predefined, fixed servers.
Furthermore, the cache approach gives a node several ways to obtain the addresses it needs to join the Gnutella network:
- By querying a GWCS.
- From the X-Try or X-Try-Ultrapeers headers exchanged during a successful handshake. The node addresses they carry can be read, stored and used later to join a GNET.
- From pong messages, which also contain node addresses. These addresses can be stored once a connection to the Gnutella network has been established.
- From QueryHit messages, which also hold node addresses. Note that at least two successfully established connections with the Gnutella network are needed to obtain node addresses from QueryHit messages. This technique also rests on the assumption that the remote servent uses the same port for its upload slots and for its Gnutella network slots.
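As an illustration of the second technique, the sketch below extracts node addresses from an X-Try style header value. It assumes the value is a comma-separated list of `host:port` entries; the function name `parse_x_try` and the sample addresses are hypothetical and chosen only for this example.

```python
def parse_x_try(header_value):
    """Split a comma-separated 'host:port' list into (host, port) tuples.

    Assumption: entries look like '10.0.0.1:6346'; malformed entries are skipped.
    """
    peers = []
    for entry in header_value.split(","):
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        if not host or not sep or not port.isdigit():
            continue  # ignore anything that is not host:port
        peers.append((host, int(port)))
    return peers


# Hypothetical header line as it might appear in a handshake response.
line = "X-Try-Ultrapeers: 10.0.0.1:6346, 10.0.0.2:6347"
_name, _, value = line.partition(":")
print(parse_x_try(value))
```

The returned tuples could then be appended to the servent's locally stored list of hosts.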
The caches were introduced to make the P2p network implementation highly decentralized and scalable. In practice, for the sake of flexibility and scalability, a servent should not query a GWCS more than once in a session. To avoid calling the GWCS frequently during a session, a servent employs the techniques described above to obtain host addresses. The addresses extracted by these techniques, along with the GWCS response to the single query per session, are kept in a cache hosted locally by the servent. The local cache helps, but to ensure that the mean number of GWCS calls does not exceed one, the servent follows the algorithm given below:
- At start-up, the servent loads the cache that it may already have maintained locally.
- The servent sends connection requests to the Gnutella network using the connection information obtained in the previous step.
- If no connection to the Gnutella network has been established at all, the servent sends a query to a randomly selected GWCS.
- The randomly selected GWCS may be slow, or may be down at the time of the request, so the servent waits for a reasonable time. This waiting time can be application or implementation specific, or can be based on other defined factors such as QoS or response time. If the waiting time passes without any response from the GWCS, the servent sends a request to the next randomly chosen GWCS in the same way.
- If the initially selected GWCS do not respond, the servent keeps querying further GWCS in turn until at least one sends a positive response.
- After getting a response from at least one GWCS, the servent does not query any GWCS in subsequent sessions; instead it connects to the Gnutella network using its local cache. The servent's local cache is updated with every successful response from a GWCS.
Using this algorithm, bootstrapping therefore nearly guarantees that a servent calls a GWCS more than once only when it has not yet received a single response.
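The bootstrapping steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `try_connect` and `query_gwcs` are hypothetical callables supplied by the caller, `query_gwcs` is assumed to return `None` on timeout or failure, and the loop stops once the list of known GWCS URLs is exhausted rather than retrying forever.

```python
import random


def bootstrap(local_cache, try_connect, query_gwcs, gwcs_urls, rng=random):
    """Return (connected_hosts, gwcs_calls).

    try_connect(host) -> bool and query_gwcs(url) -> list of hosts or None
    are hypothetical helpers provided by the caller.
    """
    # Load the local cache and try the hosts it contains.
    connected = [host for host in local_cache if try_connect(host)]
    calls = 0
    if connected:
        return connected, calls  # no GWCS query needed this session

    # No connection at all: query randomly ordered caches until one answers.
    candidates = list(gwcs_urls)
    rng.shuffle(candidates)
    for url in candidates:
        calls += 1
        hosts = query_gwcs(url)  # None models a timeout or a cache that is down
        if hosts:
            # Update the local cache with the successful response.
            for host in hosts:
                if host not in local_cache:
                    local_cache.append(host)
            connected = [host for host in hosts if try_connect(host)]
            break
    return connected, calls
```

When the local cache already yields a connection, the call count stays at zero, which is what keeps the mean number of GWCS queries per session low.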
After a servent or node is connected to the GNET, it starts to play its role in the network by communicating with the rest of the servents or hosts. The protocol used to send and receive messages over the GNET is known as the Gnutella message protocol. In the next section we explain the different messages and the protocol architecture.
Gnutella Message Architecture:
A servent can only participate in a Gnutella network once it has successfully established a connection with the network, as described above. The Gnutella network protocol uses a message-based model for communication. Gnutella network messages differ from the messages of the lower OSI layers: GNET protocol messages are application-level messages whose semantics are defined for the specific functional requirements of the P2p system architecture. The message header, the message structure, and the start and end points of a message are explained below. Before explaining the structures, it is useful to define some important features of the Gnutella protocol architecture.
- A Gnutella message does not necessarily fit into a single IP packet, and vice versa. A single IP packet can carry multiple Gnutella messages, and a single Gnutella message can span multiple IP packets. Message boundaries therefore cannot be determined simply from how the data arrives on a socket; they follow from the structure of the Gnutella message itself, as in any other network protocol.
- In the specifications, and therefore in Gnutella implementations, the fields of the structures discussed below are little endian unless their byte order is explicitly stated otherwise.
- Because most current P2p networks, including the Gnutella network, use IPv4, this thesis project assumes IPv4 addresses, each of which is a structure of four bytes.
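To make the last two points concrete, the short sketch below shows the four-byte form of an IPv4 address and the little-endian encoding of a 16-bit value. The address 208.17.50.4 and the value 6346 are illustrative only, and the helper names are hypothetical.

```python
import socket
import struct


def ipv4_to_bytes(dotted):
    """Convert a dotted-quad string to its 4-byte representation."""
    return socket.inet_aton(dotted)


def bytes_to_ipv4(raw):
    """Convert 4 raw bytes back to a dotted-quad string."""
    return socket.inet_ntoa(raw)


def u16_little_endian(value):
    """Encode a 16-bit unsigned integer with the low byte first."""
    return struct.pack("<H", value)


print(ipv4_to_bytes("208.17.50.4").hex())  # d0113204
print(u16_little_endian(6346).hex())       # ca18 (6346 == 0x18CA, low byte first)
```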
The Gnutella message header consists of 23 bytes, which are split into the following fields:
Message or Descriptor ID: A 16-byte ID structure, the GUID, which uniquely identifies a message of the protocol on the network. Bytes 8 and 15 carry special indications: a value of 0xff (all 1's) in byte 8 marks the message as coming from a modern node or servent, and byte 15 is set to 0x00 (all 0's) because it is reserved for future use. The remaining bytes can hold any value, i.e. random numbers that identify the message on the Gnutella network.
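A message ID with these markers could be generated along the following lines. This sketch assumes that bytes are counted from zero, so that byte 15 is the last byte of the GUID; the function names are hypothetical.

```python
import os


def new_message_id():
    """Return a 16-byte message ID with the marker bytes set."""
    guid = bytearray(os.urandom(16))  # random bytes for the free positions
    guid[8] = 0xFF   # marks a modern servent
    guid[15] = 0x00  # reserved for future use
    return bytes(guid)


def looks_modern(guid):
    """Check the two marker bytes of a received message ID."""
    return len(guid) == 16 and guid[8] == 0xFF and guid[15] == 0x00


message_id = new_message_id()
print(message_id.hex(), looks_modern(message_id))
```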
Payload type: The Gnutella network has a fixed set of message types, and the payload type is a single-byte field of the message header that identifies the specific type of a message.
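For illustration, the standard message types can be written as an enumeration. The numeric codes below are the values commonly given in the published Gnutella protocol specifications and are not stated in this text, so they should be checked against the specification in use; the class and function names are hypothetical.

```python
from enum import IntEnum


class PayloadType(IntEnum):
    """Single-byte payload type codes (values taken from the Gnutella specs)."""
    PING = 0x00
    PONG = 0x01
    PUSH = 0x40
    QUERY = 0x80
    QUERY_HIT = 0x81


def describe(type_byte):
    """Return the name of a known payload type, or 'UNKNOWN'."""
    try:
        return PayloadType(type_byte).name
    except ValueError:
        return "UNKNOWN"


print(describe(0x81))  # QUERY_HIT
print(describe(0x33))  # UNKNOWN
```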
Besides the standard message types, other messages are also used in the Gnutella network. Before sending any such message, however, a servent checks whether the remote servent supports it. Support for a new message type is verified through the handshake headers.
TTL (Time To Live): Because every message that is generated must eventually be destroyed, the TTL field gives the number of times a message may still be forwarded by nodes before it must be removed from the network. Whenever a node forwards a message, it decreases the TTL field by one. A node discards a message instead of forwarding it to the next node once the TTL reaches zero.
Hops: This field records how many times a message has already been forwarded, so it moves in the opposite direction to the TTL: each time a message is forwarded, the TTL decreases by one and Hops increases by one.
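The header fields can be tied together in a short sketch that packs a header, splits a received byte stream into complete messages, and applies the TTL/Hops rule. Two assumptions are made here: the 4 bytes left over after the ID, payload type, TTL and Hops fields hold a little-endian payload length (as in the published Gnutella specifications), and a node decrements the TTL first and drops the message if the result is zero. All function names are hypothetical.

```python
import struct

# 16-byte ID, payload type, TTL, hops, payload length (assumed) = 23 bytes.
HEADER = struct.Struct("<16sBBBI")


def pack_message(guid, payload_type, ttl, hops, payload=b""):
    """Build one message: 23-byte header followed by the payload."""
    return HEADER.pack(guid, payload_type, ttl, hops, len(payload)) + payload


def extract_messages(buffer):
    """Split a receive buffer into complete messages.

    Returns (messages, leftover); leftover holds bytes of a message that
    has not fully arrived yet, since messages need not align with packets.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        guid, ptype, ttl, hops, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload incomplete, wait for more data
        payload = bytes(buffer[offset + HEADER.size:end])
        messages.append((guid, ptype, ttl, hops, payload))
        offset = end
    return messages, bytes(buffer[offset:])


def forward(ttl, hops):
    """Return the (ttl, hops) to forward with, or None if the message is dropped."""
    new_ttl = ttl - 1
    if new_ttl <= 0:
        return None
    return new_ttl, hops + 1
```

Note that the sum of TTL and Hops stays constant along the path of a forwarded message, which is a convenient sanity check when debugging.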