
Re-Authoring Web Pages for Mobile Devices

Today the world accesses the Internet more and more often, and it has become a core need of everyday life; because of this, people need to access the web wherever they happen to be. In order to make mobile web browsing more efficient, we have to re-author the web content. Five methods of web page re-authoring are discussed in this article, together with some of their implementations: device-specific authoring, multiple-device authoring, client-side navigation, automatic re-authoring, and web page filtering. The article mainly discusses three automatic re-authoring systems: SmartWeb, RSS feeds, and the Digestor system.

\chapter{Introduction}

Today the world accesses the Internet more and more often, and it has become a core need of everyday life; because of this, people need to access the web wherever they happen to be, and so they increasingly use mobile devices to do it. Figure 1.1 shows how the BBC site has been visited by WAP users. Access to Web applications using hand-held devices is becoming a necessity for gathering information, conducting transactions, and interacting with people and other information systems. Soon many people will retrieve information from the Web using hand-held, palm-sized or even smaller computers. However, developing and deploying Web applications on mobile devices is not as straightforward as it might sound. Web pages and applications for mobile environments pose certain unique requirements and challenges compared to their desktop versions, which primarily arise from the small size of the devices, limited input and interaction capabilities, slower communication, and the need for content tailored to the dynamic context of use. Successful development and deployment of mobile Web applications calls for a better understanding of these requirements and challenges.

%image

\begin{figure}[h]

\centering

\includegraphics[scale=0.50]{bbc.PNG}

\caption{How the BBC site is visited by WAP users}

\cite{5}

\end{figure}

\section{Problems in displaying web in mobile devices}

\begin{enumerate}

\item Display size of the device:

Browsing full-sized Web content on a mobile device is like viewing a desktop screen through a paper towel tube: it is hard to know where the target content is located, and one easily gets lost.

\item Lack of a pointing tool:

When browsing the Internet on a mobile device there is no pointing tool such as a mouse.

\item Data transfer speed of the device:

One of the biggest reasons that users perform poorly on mobile devices is speed. Even people with fast smartphones are bound to encounter situations where their device is just too slow, and most users do not have the patience to wait for long.

\item Memory size of the device:

This is another issue for mobile web browsing. The web page content has to be stored temporarily on the device, but when the memory is small it is difficult to store and retrieve the page.

\item Computation power of the device:

This is about page complexity. Complex components such as images, videos and animations make pages difficult to display on mobile devices, because mobile devices may not be able to render them.

\item Typing ability of the device:

On many mobile devices (feature phones that have reduced keypads, but also some touch-screen devices) typing is hard.

\end{enumerate}

\newpage

\begin{center}

\section*{Outline}

\end{center}

\textbf{Chapter 2 :} Previous Work - an overview of the approaches that have been used so far.\\ \\

\textbf{Chapter 3 :} Re-authoring Systems - a complete discussion of re-authoring techniques.\\ \\

\textbf{Chapter 4 :} Future Directions and Challenges - a discussion of future directions related to the previously explained application areas and the challenges related to them.\\ \\

\textbf{Chapter 5 :} Conclusion - a summarizing discussion of re-authoring systems and their current state.\\

\newpage

\chapter{Previous work}

To re-author web pages so that large pages can be displayed on small and less powerful mobile devices, we can identify five technologies that have been used in the industry: device-specific authoring, multiple-device authoring, client-side navigation, automatic re-authoring, and web page filtering. The most widely used and effective technology is automatic re-authoring. This article describes them step by step.

\newpage

\chapter{Re-authoring systems}

\section{Client side navigation}

In client-side navigation the user is given the ability to interactively navigate a single web page by altering the portion of it that is displayed at any given time. A good example of this is the scroll bars on the document display area. Some others are active outlining, semi-transparent widgets, the Magic Lens system, Pad++ and Collapse-to-zoom.

\newline

\subsection{PAD++}

\paragraph{\newline}

Finding information effectively with a Pad++ interface is important because intuitive navigation through large information spaces is a primary motivation. To do this Pad++ supports visual searching with zooming in addition to traditional mechanisms, such as content-based search. Searching in Pad++ produces smooth animations to the desired objects. Animations interpolate in pan and zoom to bring the view to the specified location. If the end point, however, is more than one screen width away from the starting point, the animation zooms out to a point midway between the starting and ending points, far enough out so that both points are visible. The animation then smoothly zooms in to the destination. This gives a sense of context to the viewer and helps maintain object constancy. In addition it speeds up the animation since most of the panning is performed when zoomed out and thus covers more distance than panning while zoomed in.\cite{1}

\newline

\paragraph{Some other techniques that can be implemented on the client side, in the web browser itself: \newline}

\begin{itemize}

\item Four-way scrolling to present the reader with 80-column text; however, it is very annoying to scroll the display for each line of text you read.

\item Two-way scrolling (only up and down) with only 50-column text. This requires re-clipping the text, which sometimes leads to less beautiful results but is much easier to read than the first possibility.

\item During a negotiation step, the client browser tells the server the maximum number of bytes it is able to receive.

\end{itemize}

\section{Page filtering}

Page filtering lets users view only the content they are interested in. This can be done either on the client side or on a proxy server; by filtering on a proxy server we avoid the cost of downloading unnecessary data. Filter specifications can be based on keyword or regular expression matching, or on page structure navigation and extraction commands, and can be specified either with visual tools or with a scripting language. Filtering can also be combined with other techniques for greater effect, but on its own it is not a complete solution, because content is removed without any interaction with the user, so material the user actually wants may be lost.
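
As a toy illustration of such a filter specification (a hedged sketch in Python, not modeled on any of the systems listed below; the function name and keyword rule are made up), a proxy-side keyword/regular-expression filter could look like this:

\begin{verbatim}
# Sketch only: keep just the paragraphs of a page that match a user's keywords.
import re

def filter_paragraphs(html, keywords):
    # Build one case-insensitive pattern out of all the user's keywords.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    paragraphs = re.findall(r"<p>.*?</p>", html, re.DOTALL)
    return "".join(p for p in paragraphs if pattern.search(p))

page = "<p>Cricket scores today</p><p>Weather report</p>"
print(filter_paragraphs(page, ["cricket"]))   # <p>Cricket scores today</p>
\end{verbatim}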

\newline \\

\begin{itemize}

\item The SPHINX system provides a visual tool that lets users create custom 'personal' web crawlers.

\item Lanacom's Headliner Pro and On Display's CenterStage both provide visual editors that let users specify which structural parts of web pages to extract.

\item LiveAgent Pro and the WebMethods WebAutomation Toolkit both include scripting languages that provide the same ability to designate portions of web pages as Digestor does.

\end{itemize}

\section{Device-specific authoring}

\paragraph{\newline}

In this method, an author creates a set of web pages for a particular display device; as an example, consider a particular cellular phone such as the Nokia 9000. When web pages are developed this way, only that type of display device can access them. Some web applications have been developed using HDML for small-screen devices, so that the pages can be accessed from small-screen devices, but users with large displays may not like the result. Moreover, the desired pages must be pre-defined, and custom information extraction and page formatting software must be written to deliver the information to the small device.

\newline

\section{Multiple device Authoring}

\paragraph{\newline}

In this technique there are multiple versions of the same page for a particular service, and the most suitable version is supplied to the requesting device by mapping between them. One example of this is the StretchText approach, in which portions of the document (potentially down to the word level) can be tagged with a 'level of abstraction' measure. Upon receiving the document, users can specify the level of abstraction they wish to view and are presented with the corresponding detail.

\newline

\subsection{StretchText}

StretchText is a hypertext feature that gives the reader more control in determining what level of detail to read at. Authors write content at several levels of detail within a work, and the current node is replaced with a more detailed one on demand. This "stretching" to increase the amount of text, or contracting to reduce it, gives the feature its name; it is similar to zooming in to get more detail. Conceptually StretchText resembles existing hypertext systems in which a link provides a more descriptive explanation of something, but there is a key difference between a link and a piece of StretchText: a link completely replaces the current piece of hypertext with the destination, whereas StretchText expands or contracts the content in place, so the existing hypertext serves as context. Another example of multiple-device authoring is HTML cascading style sheets (CSS).

\newline

\subsection{CSS }

Style sheets enable documents to remain vendor-, platform-, and device-independent. Style sheets themselves are also vendor- and platform-independent, but CSS2 allows web developers to target a style sheet at a group of devices (e.g., printers). The user agent renders the page by retrieving all style sheets associated with the document that are specified for the target media type. In CSS, a single style sheet defines a set of display attributes for different structural portions of a document. A series of style sheets may be attached to a document, each with a weight describing its desirability to the document's author.

The user can also specify a style sheet, as can the WWW browser through its 'default' style sheet. Although the author's style sheets normally override the user's, the user can selectively enable or disable the author's, which provides the ability to tailor the rendering of the document to a particular display.

\paragraph{\newline}

The problem with this technique is that developers must design multiple versions of the same application. This is an additional workload for developers, which they do not like, and it consumes valuable development time.


\section{Automatic Re-Authoring }

\paragraph{\newline}

In this method, desktop web pages are taken and transformed so that they can be viewed on different displays. This can be done on the server, on an intermediary HTTP proxy server, or on the client side. In developing this technique, a few key difficulties of accessing web sites from mobile devices have to be considered:

\begin{itemize}

\item Mobile users have limited means of input and usually do not like to type words when browsing.

\item Interfaces must be more user friendly and easy to learn.

\item Mobile users usually do not need animations and other decorations, because they are interested in the content; mostly they search for specific information and also expect the search to be fast. In particular, when a page contains images the browser has to download them, which greatly increases cost and response time, and often this is not necessary.

\end{itemize}

\begin{table}[ht]

\centering

\begin{tabular}{|c|c|c|}

\hline \hline

& \bf Elide & \bf Transform \\[0.5ex]

\hline

\bf Syntactic & Section Outlining & Image Reduction \\

\bf Semantic & Removing Irrelevant Content & Text Summarization \\[1ex]

\hline

\end{tabular}

\caption{Automatic re-authoring techniques can be categorized along two dimensions}

\label{table:nonline}

\end{table}

\paragraph{\newline}

Syntactic techniques operate on the structure of the page, while semantic techniques rely on some understanding of the content. Elision techniques basically remove some information, leaving everything else untouched, while transformation techniques involve modifying some aspect of the page's presentation or content.

\subsection{SmartWeb}

\paragraph{\newline}

SmartWeb is an automatic re-authoring technique that downloads the original web page and converts it into a page that can be displayed on the client's device, given its capabilities (display parameters, processor, memory), the available bandwidth and the user's preferences. User preferences mean that users can define what they want and what they do not; for example, if they do not want images to be displayed they can specify that in advance.

\newline

\paragraph{Steps of the Adaptation Process \newline}

\begin{enumerate}

\item Build a specific tree, which we call the transformation tree, from the HTML document.

\item Perform some analyses and modifications on the transformation tree.

\item Assemble the resulting HTML page on the basis of the transformation tree.

\end{enumerate}

\paragraph{\newline}

In order to obtain a well-structured page, the system first converts the HTML page into XHTML, which eases the work to be done with the page later. It then creates a transformation tree from it: every HTML tag becomes a node of the tree, and every attribute becomes a node parameter. During the modification of the transformation tree the system commits changes to it in order to produce a version of the document that best matches the user profile.

In the analysis part the system recognizes semantically coherent parts of the page; this is called the block recognition process. The modifications can be categorized into three techniques:
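
As a rough illustration of this tree-building step (a sketch only, in Python; the names TransformationNode and build\_tree are invented here and are not SmartWeb's actual API), the listing below builds such a tree from an XHTML fragment: every element becomes a node and every attribute becomes a node parameter.

\begin{verbatim}
# Sketch only: build a simple "transformation tree" from an XHTML string.
import xml.etree.ElementTree as ET

class TransformationNode:
    def __init__(self, tag, params):
        self.tag = tag           # the XHTML tag name
        self.params = params     # tag attributes become node parameters
        self.text = ""
        self.children = []

def build_tree(element):
    """Recursively mirror the XHTML element tree as transformation nodes."""
    node = TransformationNode(element.tag, dict(element.attrib))
    node.text = (element.text or "").strip()
    for child in element:
        node.children.append(build_tree(child))
    return node

xhtml = "<html><body><p style='color:red'>Hello <b>world</b></p></body></html>"
root = build_tree(ET.fromstring(xhtml))
print(root.children[0].children[0].params)   # {'style': 'color:red'}
\end{verbatim}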

\newline

\paragraph{1. Transformations \newline}

In this step, transformation tree nodes are modified (simple editing). A transformation can change the attributes of a node and can perform predefined atomic transformations. There are two types of transformations: ones that do not rely on the results of block recognition and ones that do. The first group, called automatic transformations, takes place before block recognition; the second, called block-recognition-based transformations, takes place just after block recognition.

The current version of SmartWeb uses the following transformations; a sketch of two of them appears after the list:

\begin{itemize}

\item Link and URL transformations

\item Removing all the formatting tags of a page

\item Image reduction and image elision

\item Removing the styles of the nodes

\item Applying a new CSS for the document

\item Applying a new CSS for the identified block types

\end{itemize}
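
A minimal sketch of two of the transformations above, assuming the tree structure from the earlier listing (the tag set and the helper names remove\_formatting and elide\_images are illustrative, not SmartWeb's real rules):

\begin{verbatim}
# Sketch only: two simple edits on the transformation tree built earlier.
FORMATTING_TAGS = {"b", "i", "u", "font", "center", "big", "small"}

def remove_formatting(node):
    """Drop formatting elements, keeping their text and hoisting their children."""
    new_children = []
    for child in node.children:
        remove_formatting(child)
        if child.tag in FORMATTING_TAGS:
            node.text += " " + child.text
            new_children.extend(child.children)
        else:
            new_children.append(child)
    node.children = new_children

def elide_images(node):
    """Image elision: remove <img> nodes entirely."""
    node.children = [c for c in node.children if c.tag != "img"]
    for child in node.children:
        elide_images(child)
\end{verbatim}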

\paragraph{2. Elisions \newline}

In this process, nodes (a single node or a whole subtree of the transformation tree) are deleted in order to obtain the desired page. This process relies on block recognition.

\newline

\paragraph{3. Conversions \newline}

This process also relies on block recognition. Here, nodes are replaced with simplified versions of themselves, which makes the document simpler.

\subsection{RSS feeds}

\paragraph{\newline}

RSS stands for Really Simple Syndication. It is a family of XML file formats that summarize the content of a web site. It is mostly used by news sites to publish the latest news in brief: visitors can view the shortened version, and if they are interested they can then view the details. Nowadays most web sites, for example CNN and BBC, provide RSS feeds.

\newline

\paragraph{How it works:\newline}

When a user first comes to the site, he or she is prompted to use RSS if the site provides RSS feeds. If not, the content is adapted by the proxy server, which adds a menu and sends the XHTML-MP content to the browser. If the site does provide feeds, the user can choose to access the content through them. If the user prefers to go through an available RSS feed, the RSS system reads every \textless title\textgreater, \textless link\textgreater{} and \textless description\textgreater{} element from the feed, adds the application's menu and returns the XHTML-MP content to the user.

When users open the browser, all they have to do is enter the URL of the site; they do not have to worry about typing the complete URL (for example typing 'http'), since the proxy server will complete it for them. Figure 3.2 shows the welcome page of the system. Once the proxy server has resolved the URL, the XHTML-MP page is sent to the user. This reduces the use of the keyboard.
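
A rough sketch of the feed-reading step, assuming the proxy has already fetched the RSS XML and that the feed uses the usual RSS 2.0 item layout (the function name is made up for the example):

\begin{verbatim}
# Sketch only: turn an RSS 2.0 feed into a minimal XHTML-MP style fragment.
import xml.etree.ElementTree as ET

def rss_to_xhtml_mp(rss_xml):
    root = ET.fromstring(rss_xml)
    parts = []
    for item in root.iter("item"):                 # every news item in the feed
        title = item.findtext("title", default="")
        link = item.findtext("link", default="#")
        desc = item.findtext("description", default="")
        parts.append('<p><a href="%s">%s</a><br/>%s</p>' % (link, title, desc))
    return "<html><body>" + "".join(parts) + "</body></html>"
\end{verbatim}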

\newline

%the image

\begin{figure}[h]

\centering

\includegraphics[scale=0.50]{chart.PNG}

\caption{The RSS feeds procedure}

\cite{6}

\end{figure}

\paragraph{\newline}

The proxy server has now saved the HTML of the web page. In the next step a parsing algorithm on the proxy removes tags such as \textless script\textgreater, \textless noscript\textgreater, \textless style\textgreater, \textless link\textgreater, \textless iframe\textgreater, \textless object\textgreater{} and \textless embed\textgreater, which must be removed to obtain a mobile-friendly web page. The algorithm also removes comments, because browsers ignore them and sending them to the user wastes valuable bandwidth. It even removes the form tag, although this decreases the usability of many sites, since it disables logging in and searching; those are additional features that could be supported later, but the main target here is fast and cost-effective web browsing. If other behaviour is needed, the provider must offer it through a special proxy. Figure 3.1 shows this procedure.
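
A minimal sketch of such a stripping pass, written here in Python with the standard html.parser module (the actual proxy described in the text is PHP-based; the tag list follows the one above, and comments disappear simply because the parser never emits them):

\begin{verbatim}
# Sketch only: strip tags that are useless on a small device, plus comments.
from html.parser import HTMLParser

STRIP = {"script", "noscript", "style", "link", "iframe", "object", "embed", "form"}

class MobileStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0      # > 0 while inside a stripped element

    def handle_starttag(self, tag, attrs):
        if tag in STRIP:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            attr_text = "".join(' %s="%s"' % (k, v) for k, v in attrs if v is not None)
            self.out.append("<%s%s>" % (tag, attr_text))

    def handle_endtag(self, tag):
        if tag in STRIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)
    # handle_comment is left at its do-nothing default, so comments vanish.

def strip_for_mobile(html):
    parser = MobileStripper()
    parser.feed(html)
    return "".join(parser.out)
\end{verbatim}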

\newline

%image

\begin{figure}[h]

\centering

\includegraphics[scale=0.75]{Mainpage.PNG}

\caption{This is the welcome page}

\cite{6}

\end{figure}

\paragraph{\newline}

There is, however, a problem when displaying tables: the screen cannot show them because of the small display area. The RSS system overcomes this by removing tables and adding a new line for every cell instead; nested tables are handled in the same way. Doing so removes the look and feel of the site. Figures 3.3 and 3.4 show an example of how a page looks before and after this formatting.
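
A rough sketch of that break-per-cell rule, assuming the page fragment is well-formed XHTML (nested tables would need extra care and are not handled here; the helper name is made up):

\begin{verbatim}
# Sketch only: replace every table cell with its text followed by a line break.
import xml.etree.ElementTree as ET

def flatten_table(xhtml):
    root = ET.fromstring(xhtml)
    lines = []
    for cell in root.iter():
        if cell.tag in ("td", "th"):
            text = "".join(cell.itertext()).strip()
            if text:
                lines.append(text + "<br/>")
    return "\n".join(lines)

print(flatten_table("<table><tr><td>BBC</td><td>News</td></tr></table>"))
# BBC<br/>
# News<br/>
\end{verbatim}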

\newline

\begin{figure}[h]

\centering

\includegraphics[scale=0.75]{Beforeformat.PNG}

\caption{How the page looks before formatting}

\cite{6}

\end{figure}

\newline

%2 images

\begin{figure}[h]

\centering

\includegraphics[scale=0.75]{afterformat.PNG}

\caption{How the page looks after formatting}

\cite{6}

\end{figure}

\paragraph{\newline}


Downloading only the pages that are actually needed is a very important factor, because it helps the user to reduce cost. The application can support this by providing options to view only the page content or only the hyperlinks in the page, and it is better if there is also an option to go back to the normal view. When a request comes in for a page that has RSS feeds, the system does not introduce the feeds to the user straight away; it asks the user which action to take. To discover whether the page has any RSS feeds, the system goes through the HTML code and finds the \textless link\textgreater{} tags that contain the value "application/rss+xml", before removing them. The RSS title is taken from the title attribute of the \textless link\textgreater{} tag. If there is more than one RSS reference, the parser returns all the feeds in the order in which they are detected in the web page code.
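
The discovery step can be sketched as follows, scanning the page for \textless link\textgreater{} tags of type "application/rss+xml" before they are stripped (again a rough Python illustration of the rule described above, not the actual PHP implementation):

\begin{verbatim}
# Sketch only: find RSS feed references in a page, in document order.
from html.parser import HTMLParser

class FeedFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feeds = []          # (title, href) pairs in order of appearance

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("type") == "application/rss+xml":
            self.feeds.append((a.get("title", "RSS feed"), a.get("href")))

def discover_feeds(html):
    finder = FeedFinder()
    finder.feed(html)
    return finder.feeds
\end{verbatim}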

To do the things discussed so far, the server needs to support PHP 5 or better, the client browser needs to support WAP 2.0 (today most devices do), and the web pages of the application must be valid XHTML-MP.

Using this technology we can access complex web sites in a cost-effective and clear way, but the method also has some faults. Removing forms is a big issue: many web applications are of little use when there is no way to log in. Removing tables also reduces the clarity of the page. In addition, the proxy server must be powerful enough to transform the page before the request times out.

\newline

\subsection{The Digestor system}

\paragraph{\newline}

This system combines automatic re-authoring with filtering. The Digestor system intercepts the user's request and returns a re-authored page instead of the original page. It takes into account the display size of the client browser and the font size used in the browser, and then returns a suitable web page to the client; it may remove some images or text, or replace them with links. Figure 3.5 shows how the system works.

\newline

%image

\begin{figure}[h]

\centering

\includegraphics[scale=0.75]{digestor.PNG}

\caption{The Digestor system procedure}

\cite{4}

\end{figure}

\paragraph{\newline}

To learn how to do this automatic re-authoring, a manual re-authoring exercise was initially conducted on several web pages from the Xerox corporate web site.

Some of the design heuristics learned during this process were:

\begin{itemize}

\item Keeping at least some of the original images is important to maintain the look and feel of the original document. Common techniques include keeping only the first, or only the first and last image (bookend images) and eliding the rest.

\item Section headers (H1 - H6 tags) are not often used correctly. They are more frequently used to achieve a particular font size and style (e.g., bold), if they are used at all. Thus, they cannot be relied upon to provide a structural outline for most documents. Instead, documents with many text blocks can be reduced by replacing each text block with the first sentence or phrase of each block (first sentence elision).

\item An initial rule of thumb for images is to reduce them all in size by a standard percentage, dictated by the ratio of the display area that the document was authored for to the display area of the target device. Images which contain text or numbers can be reduced by only a small amount before their contents become illegible.

\item Semantic elision can be performed on sidebars which present information which is tangential to the main concepts presented in a page. Many of the Xerox pages had such sidebars which were simply eliminated in the reduced versions.

\item Semantic elision can also be performed on images which do not contribute any information to the page, but serve only to improve its aesthetics.

\item Pages can be categorized, and then re-authored based on their category. Two examples of these are banners and link tables. Banners primarily contain a set of images and a small number of navigation links (often only one) which serve to establish an aesthetic look, but contain little or no content. When space is at a premium, these can usually be omitted entirely. Link table pages consist primarily of a set of hypertext links to other pages, and very little additional content. These pages can usually be re-formatted into a more compact form which just lists the links in a text block.

\item White space, which is taken for granted on a large display, is at a premium on small devices. Several techniques were discovered for reducing the amount of white space in a page. Sequences of paragraphs (P tags) or breaks (BR tags) can be collapsed into one. Lists (UL, OL, and DL tags) take up valuable horizontal space with their indenting and bullets, and can be re-formatted into simple text blocks with breaks between successive items.

\end{itemize}

\paragraph{\newline}

As a result of all this, Digestor succeeded. Digestor currently applies only syntactic transformations, since such transformations are less error-prone and more generally applicable. The following are the transformation techniques currently used.

\paragraph{Outlining transformation}

%iamage

\begin{figure}[h]

\centering

\includegraphics[scale=0.75]{outlining.PNG}

\caption{Outlining}

\cite{4}

\end{figure}

\paragraph{\newline}

The content of each section is dropped from the current document and the header is converted to a hypertext link that refers to the content on another page. There are two approaches in this technique. The first, full outlining, works by keeping only the section headers and eliding all content, so the result looks like the table of contents of a book. In the second approach, top-level outlining, a cutoff level in the section hierarchy is determined and all content below that level (including lower-level section headers) is elided, while all content above that level is kept. Figure 3.6 shows how this looks.
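
A minimal sketch of full outlining, assuming the document has already been split into (header, content) sections by some earlier parsing step (the sub-page naming scheme and helper name are invented for the example):

\begin{verbatim}
# Sketch only: "full outlining" over a pre-split list of sections.
def outline(sections):
    """sections: list of (header_text, body_html) tuples, in document order."""
    toc = []
    subpages = {}
    for i, (header, body) in enumerate(sections):
        name = "section%d.html" % i            # hypothetical sub-page name
        toc.append('<a href="%s">%s</a><br/>' % (name, header))
        subpages[name] = "<h2>%s</h2>%s" % (header, body)
    return "".join(toc), subpages

toc_page, pages = outline([("Introduction", "<p>...</p>"),
                           ("Previous work", "<p>...</p>")])
\end{verbatim}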

\paragraph{\newline First sentence elision transform. \newline}

In this technique each text block is replaced with its first sentence, and that sentence is converted into a hyperlink to the original text block. This greatly reduces the required screen area.
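
A rough sketch of first-sentence elision, with a deliberately naive sentence splitter (a real system would need better sentence detection; the sub-page URL is hypothetical):

\begin{verbatim}
# Sketch only: keep the first sentence of a text block and make it a link
# to a sub-page holding the full text.
import re

def first_sentence_elide(text, full_page_url):
    match = re.search(r"[.!?]", text)          # naive sentence boundary
    first = text[:match.end()] if match else text
    return '<a href="%s">%s</a>' % (full_page_url, first)

print(first_sentence_elide("Digestor re-authors pages. It runs as a proxy.",
                           "block7.html"))
# <a href="block7.html">Digestor re-authors pages.</a>
\end{verbatim}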

\paragraph{\newline Table transform. \newline}

When the page contains a table that cannot be displayed on the client browser, the transform outputs one sub-page per table cell, in top-down, left-to-right order; nested tables are handled in the same manner. The table transform uses heuristics to detect navigation sidebars implemented as table columns, and moves such cells to the end of the table.

\paragraph{\newline Image Reduction and Elision \newline}

Image presentation is the most difficult problem encountered: one has to decide whether to keep, reduce or eliminate each image, and that decision depends on the importance of the image, in terms of its content and its role in the page. The Digestor mechanism handles this by transforming the images in the page using pre-defined scaling factors (25\%, 50\%, and 75\%) and making each reduced image a link to the original image.
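
The reduction can be sketched as a simple attribute rewrite that turns the shrunken image into a link to the full-size original (this sketch only rescales the declared width and height; real pixel resampling is left to an image library, and the helper name is made up):

\begin{verbatim}
# Sketch only: shrink an <img> element by a pre-defined factor and link it
# to the original image.
def reduce_image(attrs, factor=0.5):
    """attrs: dict of <img> attributes containing integer width/height."""
    small = dict(attrs)
    small["width"] = str(int(int(attrs["width"]) * factor))
    small["height"] = str(int(int(attrs["height"]) * factor))
    img = "<img " + " ".join('%s="%s"' % kv for kv in small.items()) + "/>"
    return '<a href="%s">%s</a>' % (attrs["src"], img)

print(reduce_image({"src": "logo.png", "width": "200", "height": "100"}))
# <a href="logo.png"><img src="logo.png" width="100" height="50"/></a>
\end{verbatim}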

\paragraph{\newline Image map transform. \newline}

Some images may be too large to display on a small screen at all; in that case the Digestor system removes the image. But if the image is a link to another page it cannot simply be dropped, so in that situation the system removes the image and adds a text link to the page the image was linking to.

\paragraph{\newline How it works \newline}

When a request comes in, the system fetches the page, creates an abstract syntax tree and labels each node with a unique identifier. The system is basically a heuristic planner: it searches the space of document transformations in a best-first manner, using many heuristics that describe preconditions for transformations and combinations of transformations, and each state carries a number that represents its quality. As soon as a good enough document is created, the search is halted and the page is returned to the client. If there are hard size constraints that are not met by the best document, a more destructive transformation is applied that breaks the documents up in the middle of paragraphs. Heuristic information is used to determine which transformation must be applied, and in general the most minor transformation is applied at each step. To tell the proxy the client device type, the user requests a specific control URL from the proxy, resulting in the delivery of a configuration form.
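
The control flow of that best-first search can be sketched as below; the score and fits functions stand in for Digestor's quality heuristics and the device's size constraint, and the whole listing is an invented illustration rather than the system's actual planner.

\begin{verbatim}
# Sketch only: best-first search over transformed document versions.
import heapq

def best_first_reauthor(doc, transforms, score, fits, max_steps=50):
    # Higher score = better-looking result; heapq is a min-heap, so negate.
    frontier = [(-score(doc), 0, doc)]
    counter = 1                              # tie-breaker for the heap
    best = doc
    for _ in range(max_steps):
        if not frontier:
            break
        neg_quality, _, current = heapq.heappop(frontier)
        if -neg_quality > score(best):
            best = current
        if fits(current):                    # good enough: stop searching
            return current
        for t in transforms:                 # expand with every transformation
            candidate = t(current)
            heapq.heappush(frontier, (-score(candidate), counter, candidate))
            counter += 1
    return best                              # fall back to the best version seen
\end{verbatim}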

\newpage

\chapter{Future Works}

\paragraph{\newline}

Plenty of work has been done to bring the web to mobile devices in a more efficient manner. Apart from automatic re-authoring, the other techniques leave little room for future work, and we cannot rely on them alone. In automatic re-authoring, as discussed above, there is a lot of work still to be done, and that work will bring us good accessibility of the mobile web.

In SmartWeb the current version has just a few transformations. It would be good if the current transformations were extended and new ones added. Future conversions must also be adapted so that an identified block can be hidden behind a link that points to a new page containing the whole block (or a simplified version of it). In the future the system should also be able to process the most common web content formats (e.g. PDF).

The current RSS-feeds technique cannot be applied to systems such as online transactions because it omits forms. Removing all the images of a page is also not the best solution, because future mobile devices will be much more capable of rendering images. Those issues must be considered in future work on the RSS-feeds approach.

In the Digestor system more work, especially in the areas of user testing and navigation design, needs to be performed to perfect the system. In the future the system should be improved to let the user dynamically apply and undo different transformations until they achieve a result they like; this would also allow user studies to determine the quality of the transformations used by Digestor. The current system works fairly well on dense documents with minimal structure, but for documents with a lot of white space, or that use advanced layout techniques (e.g., tables), it works poorly; this case must be handled in the future. Transformations could also be generalized by the use of user-specified parameters so that, for example, a transformation could elide all text blocks containing a user-specified keyword.

\newpage

\chapter{Conclusion}

\paragraph{\newline}

In the early stages device-specific authoring was adopted, but that was not good enough, because with that approach users of such specialty devices only have access to a select set of services, and the pages for these services must all be designed up-front for the device's particular display. Multiple-device authoring overcomes that fault, but it gives the developer a lot of extra work to do. Client-side navigation is not bad, but it has limited capabilities: it can only work with pages that have already been downloaded. Automatic re-authoring is the best solution, and a lot of work has been done around this method; considering the automatic re-authoring systems mentioned above, the best among them is the Digestor system. All of these approaches have their strengths and weaknesses, so a system that combines the techniques that fit well together would be more effective. For example, combining a Digestor-style system with page filtering and good client-side navigation could yield a better solution.

\newpage

\bibliographystyle{IEEEtran}

\bibliography{bibfile}

\nocite{*}

\end{document}
