Document markup is the process of adding codes to a document to identify its structure and the format of its final appearance. Before the printing industry was computerized, markup was done by a copy editor, who wrote instructions on the document for the typesetter to follow. After computerization, text formatting languages were developed, and the typesetter converted the copy editor’s markup into the corresponding formatting-language markup. As computers spread around the world, people began using word processing software to write documents, and each word processing program had its own markup features and techniques. All electronic documents that store text use some form of markup, which may be visible, hidden, or generated automatically.
Document markup is divided into two categories: specific markup and generic markup. Specific markup uses instructions that are specific to the software that produces the document. Generic markup identifies the structure of a document. Generic markup began with macros for typesetting languages. A macro is a single instruction that causes the software to execute a series of commands. A macro can also keep track of things such as chapter numbers, and changing a macro changes the features of the chapters it applies to.
The first presentation on markup languages was given by William Tunnicliffe at the Canadian Government Printing Office in 1967. Later, in 1969, Raymond Lorie, Edward Mosher, and Charles Goldfarb created the Generalized Markup Language (GML), which was based on Tunnicliffe’s ideas. In 1978, the GML committee decided to change the markup and make it generic. They limited database searches to the title by marking the title as
HyperText Markup Language (HTML) is a markup language for creating web pages. Today, nearly all web pages are written in HTML or in a language built on HTML’s basic elements. Since Tim Berners-Lee invented HTML in 1990, the language has advanced steadily, and it is now the primary language of the web. HTML developed from HTML 1.0 to HTML 4.0 in seven years:
In 1994, the first HTML 2.0 document was released.
The first draft of HTML 3.0 was released in 1995.
In 1996, HTML 3.0 developed into HTML 3.2.
HTML 4.0 was released later in 1997. HTML 4.0 is the current version of HTML.
XML was created in 1996 by the XML Working Group, which was led by Jon Bosak. The design goals of XML were:
XML should be easily usable over the internet.
XML should support a wide variety of applications.
XML should be similar to SGML, a markup language that was created before XML. More detailed information about SGML is given below.
The design of XML should be concise and formal.
It should be easy for beginners to create documents in XML.
XML, the Extensible Markup Language, is a standard for document markup supported by the World Wide Web Consortium (W3C). XML is a restricted form of the Standard Generalized Markup Language (SGML) and provides a standard format for electronic documents. This format can easily be customized for areas such as web sites, vector graphics, object serialization, and voice-mail systems. Users can write their own programs to process the data in XML documents. Because libraries that read and write XML are available in many programming languages, users can concentrate on the unique needs of their program. Moreover, XML documents can be edited with ordinary text editors and opened in web browsers.
Data in an XML document is stored as strings of text. XML and HTML documents look similar; however, there are some important differences. First of all, XML is a metamarkup language. That means XML does not prescribe one fixed set of elements and tags; instead, people in different areas of interest can define the elements that work for them. For example, chemists can describe molecules, atoms, and reactions with appropriate elements. Musicians can use elements for notes, lyrics, and whatever else concerns their work. XML is a language that can be adapted to meet different requirements. The basic unit of data in XML is called an element. The XML specification defines how elements are surrounded by tags, what a tag looks like, and which names are acceptable for elements. Documents that satisfy these requirements are called well-formed; documents that contain errors are rejected by XML parsers.
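As a minimal sketch of the well-formedness rule described above, the following Python snippet uses the standard library parser xml.etree.ElementTree. The element names (molecule, atom) and the helper is_well_formed are made up for illustration; note that this parser checks only well-formedness, not validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical chemistry vocabulary: XML lets the author choose the element names.
good = (
    '<molecule name="water">'
    '<atom symbol="H" count="2"/>'
    '<atom symbol="O" count="1"/>'
    '</molecule>'
)

# The <atom> element is never closed, so the parser rejects the document.
bad = '<molecule name="water"><atom symbol="H" count="2"></molecule>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

The first document is accepted and the second is rejected, which mirrors the statement that documents containing errors are not processed further.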
XML documents can also be compared against a specific schema. If a document matches the schema, it is considered valid; a document that does not match is invalid. However, not every document needs to be valid. There are various XML schema languages. The document type definition (DTD) is the most widely supported, and it is the only schema language defined by XML 1.0. A DTD lists all the legal markup and identifies the places where each piece of markup can appear in the document.
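The difference between a well-formed document and a valid one can be illustrated with a small sketch. Python's standard library does not validate against DTDs, so the example below swaps in a hand-written dictionary, ALLOWED_CHILDREN, as a toy stand-in for a schema. The element names and the helper is_valid are assumptions made for illustration, and the check ignores ordering, attributes, and repetition, all of which a real DTD can express.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a schema: each element name maps to the children it may contain.
# This is NOT DTD syntax; it only illustrates the idea of "legal markup in legal places".
ALLOWED_CHILDREN = {
    "song": {"title", "lyrics"},
    "title": set(),
    "lyrics": {"line"},
    "line": set(),
}


def is_valid(element: ET.Element) -> bool:
    """Check recursively that every element is declared and sits in a permitted place."""
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:  # the element name is not declared at all
        return False
    return all(child.tag in allowed and is_valid(child) for child in element)


valid_doc = ET.fromstring(
    "<song><title>Ode</title><lyrics><line>la</line></lyrics></song>"
)
# Well-formed, but <line> is not allowed directly inside <song>.
invalid_doc = ET.fromstring("<song><title>Ode</title><line>la</line></song>")

print(is_valid(valid_doc))
print(is_valid(invalid_doc))
```

Both documents parse without error, so both are well-formed; only the first one also matches the toy schema.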
The Standard Generalized Markup Language (SGML) is an international standard for describing device-independent, system-independent ways of representing texts in electronic form. SGML documents are not limited in size and can form an independent unit that is transported either electronically or in printed form. An SGML document contains interconnected elements, each holding data that serves a specific purpose. A Document Type Definition (DTD) describes a document’s structure in terms of the elements it contains and the order of those elements. A DTD is associated with each SGML document. Within the DTD, each element in the document is given a name that identifies its role. When placed within markup delimiters, these names form the tags that mark the start and the end of elements. An SGML parser must validate files that are placed in the data repository.
An SGML system can be built from interrelated software components, with a data repository as its central part. Before documents are dispatched from the repository, a further validation is needed to guarantee that no referenced data has been omitted from the transmitted data set.
SGML documents can be formatted according to the features the author requires. SGML provides a standard for defining:
the structure of classes of documents;
the characters used for markup and in the text of the document;
common text that will be written;
special markup techniques, such as tag minimization or the identification of different versions of a document.
Some SGML tags are qualified by attributes. Attributes can identify specific tags, recall data that is stored externally, and indicate the role of an element. Attributes can also control how text is presented to readers. SGML not only makes it easy to associate features with text through short tags; it also provides named character references, which can be used to request characters that are not in the word processor’s character set.
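Attributes and character references can be sketched with Python's standard library. Python ships no SGML parser, so this example uses XML and HTML, both descendants of SGML, as stand-ins; the note element and its id and type attributes are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import unescape

# Hypothetical element: "id" identifies this particular tag, "type" indicates its role.
# &#233; is a numeric character reference for the letter e with an acute accent.
fragment = '<note id="n1" type="warning">The caf&#233; closes at 5</note>'
note = ET.fromstring(fragment)

attributes = dict(note.attrib)
text = note.text  # the parser has already resolved the numeric reference

# HTML inherited named character references from SGML; unescape() resolves them.
named = unescape("caf&eacute;")

print(attributes)
print(text)
print(named)
```

The author types only plain ASCII, yet the processed text contains the accented character, which is the point of character references.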
Before an SGML-coded document can be modified in a word processor, it must be converted into a form that the word processor can understand. For a generalized SGML document this is a complex process, but for one SGML application there are tools designed specifically to convert word-processed text into and out of its markup. This application is the HyperText Markup Language (HTML), which is used on the World Wide Web.
HTML and XHTML
HyperText Markup Language (HTML) is the markup language that lets users create their own web pages by formatting text and presenting it to the Internet public. Hypertext and universality are the main qualities of HTML. Hypertext means that a web page can contain links that lead to other web pages, so information on the internet can be reached from many places. HTML is universal because HTML documents can be read on any kind of computer, whether it runs Macintosh, UNIX, or Microsoft Windows. The programs designed to read HTML pages are called browsers; popular ones include Mozilla Firefox, Internet Explorer, Netscape Navigator, and Opera.
XML looks like HTML; however, XML is not only a way to create web pages but also a language with which authors can define their own custom markup languages for formatting documents. With XML, data can be labeled as custom information and reused later in the text. Nevertheless, XML is not as forgiving as HTML. XML requires careful coding, because it is sensitive to upper and lower case, quotation marks, and closing tags. A huge number of HTML pages already exist that any browser can read, so the World Wide Web Consortium (W3C) decided to rewrite HTML in XML. The result, called XHTML, has all the features of HTML together with the power and flexibility of XML.
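The difference in strictness can be sketched with two parsers from Python's standard library: html.parser.HTMLParser, which tolerates sloppy markup, and xml.etree.ElementTree, which applies XML's rules. The class TagCollector and the helper parses_as_xml are names introduced here for illustration only.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def parses_as_xml(text: str) -> bool:
    """Return True if the strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Mixed-case tags and an unquoted attribute value: tolerated in HTML, illegal in XML.
sloppy = "<P class=intro>Hello</p>"
# The same content written the way XHTML requires.
strict = '<p class="intro">Hello</p>'

collector = TagCollector()
collector.feed(sloppy)
collector.close()

print(collector.tags)
print(parses_as_xml(sloppy))
print(parses_as_xml(strict))
```

The HTML parser quietly normalizes the sloppy input, while the XML parser accepts only the carefully coded version.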
There are two categories of text processors:
WYSIWYG (what you see is what you get) systems. The user sees exactly what the document will look like while typing the text.
Markup systems. Authors type text interspersed with formatting commands, but they cannot see the appearance of the document while working on it. To see the final result, the user must run the program.
Because the author can see the final product, WYSIWYG systems have an obvious advantage; however, the author can easily be distracted by the appearance of the document and write less effectively. For this reason, people writing academic papers often prefer markup systems. TEX is an example of a markup system; it was built in the early 1980s, before WYSIWYG systems were widespread.
Advantages of TEX
TEX documents are built with the help of macros that define the format of each component. With macros, users do not have to write repetitive formatting commands. For example, to create a footnote it is enough to use the “footnote” command instead of moving text to the bottom of the page by hand. Macros make TEX flexible: changing the definition of a macro changes the appearance of the document. TEX is also easy to extend, since new commands can be created. To print a document written in TEX, the user passes the output to a driver program, which converts the page descriptions into commands that the printer understands. TEX therefore does not depend on printer technology; when the technology changes, only a new driver has to be written. TEX is freely available over the internet. TEX is well suited to creating academic papers and even books.
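The idea that changing one macro definition changes the appearance of the whole document can be illustrated with a toy macro expander in Python. This is only a sketch of the concept, not how TEX itself processes macros: the function expand, the one-argument form \name{argument}, and the #1 placeholder convention used here are simplifying assumptions, and nested macros are not handled.

```python
import re

# Matches a toy one-argument macro call such as \emph{word}.
MACRO_CALL = re.compile(r"\\(\w+)\{([^{}]*)\}")


def expand(text: str, macros: dict) -> str:
    """Replace each known macro call with its template, substituting #1 by the argument."""

    def substitute(match):
        name, argument = match.group(1), match.group(2)
        template = macros.get(name)
        if template is None:  # unknown macro: leave the call untouched
            return match.group(0)
        return template.replace("#1", argument)

    return MACRO_CALL.sub(substitute, text)


source = r"A \emph{macro} defines the format; see \emph{below}."

# Two alternative definitions of the same macro.
italic_style = {"emph": "<i>#1</i>"}
star_style = {"emph": "*#1*"}

print(expand(source, italic_style))
print(expand(source, star_style))
```

The source text is never edited; swapping the definition of emph is enough to restyle every place where the macro is used.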
Disadvantages of TEX
First of all, beginners have difficulty getting used to TEX programming. Simple macros are easy to write, but writing complex macros is not a beginner’s task. Moreover, the memory and design of TEX are limited: its memory does not grow dynamically, and it uses only its own fonts. The most common complaint is TEX’s lack of interactivity. However, there is a solution to this problem: many programs that add features to TEX are available. For example, Macintosh users can use a program named TEXtures, which provides:
a multi-window editor for editing several texts at the same time;
a previewer for viewing the program’s output on the screen;
a printer driver.
Users of PC clones can edit only one text at a time. The user must first edit the file, then run it through TEX, and then use a previewer to see the output on the screen. If there are mistakes in the text, the user must write them all down and then edit the file again.
Nowadays, the most commonly used systems are WYSIWYG programs that use a TEX “engine”. The user can change the appearance of a document with a few mouse clicks, without any knowledge of TEX.