# Rte And Textual Entailment Computer Science Essay

**Published:** **Last Edited:**

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

In the current section, different approaches for tree edit distance algorithms will be presented. This algorithm represents the core of our work. It is finding the minimum distance between two trees, also the sequence of edit operation to transfer one tree to another.

Edit operations were proposed by (Selkow, 1977). Selkow implements a recursive algorithm for computing the minimum operations' sequence that transformed one tree to another. These edit operations are relabel, delete and insert. Delete and insert operations are enabled on tree leaves solely and relabel at every node. Also, there is a nonnegative cost associate with each edit operation.

Unalike Selkow's approach, (Tai, 1979) proposed general edit operations. Tai uses a dynamic programming algorithm (Cormen, Leiserson, Rivest, & Stein, 2001; Dasgupta, Papadimitriou, & Vazirani, 2006) to solve the tree-to-tree correction problem between two trees. All trees are considered, rooted, ordered (which means a left to right order among siblings in the tree is given) and labeled (which means each node is an assigned a symbol from a fixed set of an alphabet) trees. Also, the root is the first node that visited then the subtrees are traversed in preorder. This algorithm is extended to a notion of string-to-string correction (Wagner & Fischer, 1974). Tai's algorithm uses the following basic operations: changing two nodes label, deleting a non rooted node (which means all the children of a deleted node become the children of the parent of that node) and inserting a node (which means an inserted node become a parent to some or all the sequence in the left to right order of its parent).

Subsequently, Zhang and Shasha (Zhang & Shasha, 1989) present an efficient technique based on dynamic programming to calculate the approximate tree matching for two rooted postordered trees. Approximate tree matching allows us to match tree with solely some parts of another tree not a whole. In this algorithm, for all the offspring of each node, the lowest cost mapping has to be calculated before the node is met. Hence, the least-cost mapping can be chosen right away. To do this, the algorithm pursues of the keyroots of the tree, which are a set that includes all nodes having a left sibling as well as the root of the tree. The number of keyroots must be identical to the sum of leaves in the tree. This algorithm will be surveyed with more circumstances in chapter 4.

Klein suggests a new version of the tree edit distance algorithm to comparing shapes (P. Klein, Tirthapura, Sharvit, & Kimia, 2000). Their dynamic program calculates the edit distance between two rooted trees T1 and T2 according to the following. The authors defining a half-problem that is Hi= (Ti, si), where si is an Euler string (P. Klein, 1998) (si specifies a part of Ti). For each tree, a pair of (H1, H2) is a sub-problem. A dynamic algorithm calculates a type of distance between the indicated portions of the two trees for each sub-problem in several set. Finally, the edit distance is computed by the dynamic algorithm for the sub-problem where both strings totally cover the two trees.

Elsewhere, Garfalakis and Kumar (Garofalakis & Kumar, 2005) develops an algorithm for efficient embedding of rooted and ordered trees as points in a L1 vector space. The authors combine their algorithm with pseudo random sketching approaches to get novel and small space algorithms. Then, these algorithms are used for establishing concise sketch synopses and approximating similarity joins over streaming XML data.

In another study, Akutsu (Akutsu, 2006) judges that there is a relation between the edit distance for two rooted ordered trees and the edit distance of its corresponding Euler strings. The author demonstrates that the edit distance between two trees is at least half the edit distance between its corresponding Euler strings.

Mehdad (Mehdad, 2009) states that selecting a best set of the cost of edit operations has been challenged. The author proposes an automatic technique based on Particle Swarm Optimization (PSO) for estimating the finest set of cost operations in the tree edit distance algorithm. PSO is a stochastic technique that depends on the social behavior of bird flocking and fish schooling (Kennedy, Eberhart, & Shi, 2001).

## Recognizing Textual Entailment challenges

Recognizing Textual Entailment (RTE) is one of the resent challenges of Natural Language Processing (NLP). Therefore, a lot of competitions for organizing RTE have been done in the literature. Both the new characteristics and the quality of training and test data were improved every edition. A summarization for every challenge will be explained in the following sub-sections.

## First RTE (RTE-1) challenge [1]

The first PASCAL RTE Challenge (Dagan, Magini, & Glickman, 2005) supplied the first benchmark for the entailment recognizing task. It is trying to promote an abstract general task that captures major semantic inference requires across applications. The dataset of RTE-1 [2] that manually collected consists of text snippet pairs, termed text-hypothesis (T-H) pairs. The text consists of one or two sentence(s) while the hypothesis consists of solely one sentence. The task requires judging whether H can be inferred (entailed) from T for each given (T-H) pair.

The dataset of RTE-1 divided to development, which manually labeled for entailment (entails or not) by annotators, and test datasets. This dataset was collected to cover seven text processing applications: Information Retrieval (IR), Question Answering (QA), Information Extraction (IE), Machine Translation (MT), Reading Comprehension (RC), Comparable Documents (CD) and Paraphrase Acquisition (PP).

The template for each pair in both development and test sets is formatted, in XML files, as follows:

<pair id="id_num" task="task_acronym" value="TRUE|FALSE">

<t> the text... </t>

<h> the hypothesis... </h>

</pair>

Where:

<pair> and </pair> : the start and the end of each pair.

id: a unique numeral identifier of the T-H pair.

task: the acronym of the application type, like IR, QA, ..etc.

value: the result of entailment (TRUE: T entails H, FALSE: otherwise), this in development set only.

<t> and </t>: the start and the end of the text.

<h> and </h>: the start and the end of the hypothesis.

The RTE-1 development and test datasets consist of 567 and 800 examples.

## Second RTE (RTE-2) challenge [3]

The primary aim of PASCAL RTE-2 (Bar-Haim et al., 2006a) was to support the community of research on textual entailment. As the RTE-1 challenge, the primary task is judging whether H is entailed (inferred) by T. The authors focus to collecting dataset that give more "realistic" T-H examples, which depend mostly on outputs of real systems. Also, these examples represent different levels of entailment reasoning, such as lexical, syntactic, morphological and logical. One of RTE-2 improvement is supplied some preprocessing for the examples, involving sentence splitting, using MXTERMINATOR (Reynar & Ratnaparkhi, 1997) and dependency parsing, using MINIPAR (Lin, 1998a). The RTE-2 follows the basic setting of RTE-1 (1-2 sentences to T and one sentence to H). Development and test datasets in RTE-2 [4] consist of 800 examples for each one. These examples are covered four application area (not seven like in RTE-1): IR, IE, QA and Multi-Documents Summarization (SUM), which equal CD in RTE-1. The authors also define the textual entailment as: "We say that T entails H if, typically, a human reading T would infer that H is most likely true". Furthermore, some extra judgment criteria and guidelines are explained, such as:

H must be entailed from T, while the opposite may not.

T does not entail H, if parts of H cannot be inferred from T.

Judge "entail", when inference is a very probable but not completely certain.

Presupposition of common knowledge is allowed.

## Third RTE challenge [5]

## As in the RTE-2, the RTE-3 dataset [6] consisted of the same number of T-H pairs (800 pairs for the development dataset and the same for the test dataset) and the same four application areas (IE, IR, QA and SUM) (Giampiccolo, Magnini, Dagan, & Dolan, 2007). A new thing in the RTE-3 dataset is that some of longer Ts (up to a paragraph in length) were included to have more realistic scenarios, which stimulate the need for discourse analysis. The longer Ts were marked as L when exceeding 270 bytes and their percentage about 17% of the total of the test dataset. Nevertheless, the majority of examples stayed close to those in preceding challenges, providing pairs with short texts. Furthermore, the US National Institute of Standards and Technology (NIST) suggested an optional pilot task, named "Extending the Evaluation of Inference from Texts". This needed the participating systems:

## To give a more comprehensive judgment, creation a three-way decision ("YES", "NO" and "UKNOWN") versus the same test dataset used in the main task.

## To give good reasons for decisions that made.

## Fourth RTE challenge

In this version of RTE challenge, there is no dataset for development was provided, and the participants solely used the past RTE data for training. As in RTE-2 and RTE-3, the RTE-4 covered the same four area application (IE, IR, QA and SUM). The difference in the current challenge is that 1000 T-H pairs in the RTE-4 (300 pairs for each IE and IR, while 200 pairs for QA and SUM) instead of 800 in the previous tasks. In particular, new relations, such as "X discover Y" and "X win Y" were produced for IE dataset pairs. Moreover, the RTE-4 challenge offered two classification techniques: the classic two-way task (as in RTE-1 and RTE-2) and three-way task (as in RTE-3). The classic two-way RTE task, the (T-H) pair will be marked as ENTAILMENT if T entails H, otherwise marked as NO ENTAILMENT. In contrast, in the three-way RTE task, the (T-H) pair will be marked as ENTAILMENT if T entails H, as CONTRADICTION if T contradicts H and as UNKNOWN the truth of H could not be determined based on T.

In 2008,the fourth RTE challenge in 2008 marked a major

Change with respect to the organization of the

campaign, being proposed for the first time as a

track at the Text Analysis Conference (TAC).

The challenge, jointly organized by CELCT and

NIST

## Fifth RTE challenge

## Textual Entailment related work

Different approaches for modeling textual entailment have been suggested in the literature. These approaches ranging from shallow approaches like measuring lexical overlap to syntactic parsing and the WordNet relations' utilization (Bar-Haim et al., 2006b). Most of these approaches apply lexical matching. Some of the approaches represent the text snippets (T and H) as dependency (or syntactic) tree before apply an actual task. Other systems, for solving the T-H entailment problems use the semantic relation such as logical inference. In recent years, there has been a move towards more structured meaning representations, abstracting away from semantically irrelevant surface.

A bag of words (BoW) method is presented by (Glickman, Dagan, & Koppel, 2005), which based on lexical entailment.

Adams's system (Adams, 2006), which based primarily on the notion of lexical overlapped, is developed to the (Glickman, et al., 2005)'s approach. The sentence is treated as BoW in this system. So, the number of overlapping words between a pair (T-H) could be regarded as features. The system starts with a similarity of BoW overlap measure, derived from a combination of WordNet lexical chains to the lexical entailment probability as in (Glickman, et al., 2005). Then, it looks for negations not found in the mapping, and for the lexical edit distance of the mapping. These items are then entered into a decision tree to determine the overall entailment.

Jijkoun and de Rijke (Jijkoun & de Rijke, 2005) describe a similar method. For every (T-H) pair, each sentence considered as the bag of words and computes directed sentence similarity score. To check for entailment, the score is measured against a threshold. Primarily, for every word in H, the most similar word in T is found according to the word similarity measure (Lin, 1998c). If such a similar word exists, the weighted similarity value is added to the score of total similarity. Otherwise, the weight of the word (using idf [7] ) is subtracted, penalizing words in H matching words in T.

The BLEU (Bilingual Evaluation Understudy) algorithm was proposed by (Papineni, Roukos, Ward, & Zhu, 2002) as a procedure for automatic evaluation of machine translation systems. In sum, the algorithm looks for n-gram coincidences between a candidate text that automatically produced a translation and a set of the human made translations' texts. The BLEU algorithm steps can be summarized as follow:

For different values of n (usually 1-4), find the percentage of n-grams from the candidate translation that show through any of the human translations.

Merge the marks acquired for each value of n, as a weighted linear average.

Apply a brevity factor to penalize short candidate texts.

On the strength of the BLEU output, the entailment is judged as true or false between T-H pairs (Perez & Alfonseca, 2005).

A tree edit distance algorithm that described in (Shasha & Zhang, 1990) is used by (Kouylekov & Magnini, 2005) to calculate the match between two dependency trees of both texts (text and hypothesis). For both text and hypothesis, a MXTerm splitter (Ratnaparkhi, 1996) and a Minipar parser (Lin, 1998b) are used to create a syntactic representation. According to their technique, the entailment between text and hypothesis occurs if an overall cost of a sequence of transformation (deletion, insertion and substitution) applied to the text such that we can get the hypothesis below a certain threshold. Estimation of the cost for editing operations among words is considered a key aspect of this technique. Each dependency relation R between tow nodes (A and B) has been rewritten as a complex node B-R (contain the node B and its relation name R with the node A). This is because that the dependency tree has a label between its nodes, while the algorithm of (Shasha & Zhang, 1990) does not consider them. This system does not use external resources. Furthermore, an updated for the previous technique is presented (Kouylekov, 2006). In this system, different functions for computing the relevance of a word, which is explained in (Kouylekov & Magnini, n.d.), have been experimented. This system aims to compare various resources contributions providing entailment (including lexical) rules like WordNet (Fellbaum, 1998) and UniAlberta thesaurus [8] and syntactic rules, which automatically acquired by the DIRT (Lin & Pantel, 2002) and TEASE (Szpektor, Tanev, Dagan, & Coppola, 2004) systems. In addition, an Answer Validation module for textual entailment based question answering and a relation extraction system have been developed and evaluated in this system.

A complex system is described by (Kozareva & Montoyo, 2006). The authors joined the initial method, which used by (Kouylekov & Magnini, 2005), with machine learning algorithms, which are Support Vector Machines (SVMs) (Collobert & Bengio, 2001).

In (Katrenko & Adriaans, 2006)'s system, the task of textual entailment by using tree mining and matching technique is addressed. The authors used MINIPAR dependency tree to represent the pair of texts. Then, they performed a syntactic matching method, which is proposed by (Zaki, 2005), between the pre-order format for each dependency tree, which resulted by the depth first search algorithm.

Another system is based on probabilistic transformations of the dependency trees (Harmeling, 2007). Possible set of transformations that was successfully applied to a RTE3 challenge data were also described.

Tree edit distance based approach is proposed by (Iftene & Balahur-Dobrescu, 2007). The authors used MINIPAR and LingPipe [9] , which find the named entities, for each (T-H) pair. Then, the same mapping technique that used by (Kouylekov & Magnini, 2005) is used to find the mapping between every an entity in the dependency tree associated with H (called hypothesis tree) and T (called text tree). For every mapping, a local fitness is calculated and then a global fitness score is computed using the extended local fitness. According to the final score, the entailment is determined. This system uses many resources like WordNet, DIRT, background knowledge (an extraction module for Wikipedia) and Acronym [10] database.

Nergi et al. (Negri, Kouylekov, Magnini, Mehdad, & Cabrio, 2009) present a freely open-source software package version 1.0, called EDITS [11] , for recognizing Textual Entailment. This tool is developed by FBK-irst, Trento, Italy. EDITS is the tree edit distance based approach for recognizing textual entailment. EDITS consists of three main components that can be extended by a user. First, an edit distance algorithm that computes the set of edit operations to transform T into H. This component provides three levels of distance algorithms, which are a string edit distance (over a sequence of characters), a token edit distance (over a sequence of tokens) and the tree edit distance (over a sequence of syntactic tree nodes). Second, a cost scheme that defines the cost function for each edit operation. Third, optional rules set for both entailment and contradiction rules. These rules can be manually defined or extracted from any resources such as WordNet, Wikipedia, ...etc. The main motivation underlying our work on EDITS is that, given the complexity of the TE task, due both to the linguistic phenomena that are involved, and to the variability of the application scenarios, TE engines actually need high capabilities to be adapted and optimized.

A PSO based method to improve the tree edit distance algorithm to textual entailment recognition is introduced by (Mehdad & Magnini, 2009). The authors use PSO technique to estimate automatically the cost for each edit operation in the tree edit distance algorithm. Tow fitness functions are used by PSO in this system. These functions are Bhattacharyya distance (Bhattacharyya, 1943), which find a similarity between two discrete probability distributions statistically, and accuracy, which is any performance measure obtained from a TED based system. The authors find that maximizing the accuracy would lead to increase the performance of the system to solve the problem.

A probabilistic model to textual entailment that depend on a calculus of transformations on dependency trees is introduced by (Harmeling, 2009), which is characterized by the fact that derivations in that calculus preserve the truth only with a certain probability. Stanford's parser (D. Klein & Manning, 2003) is used to extract the dependency tree for both T and H. The author also describes a heuristic approach that generates derivations for T-H pairs in such calculus.

The system presented by Pakray et al. (Pakray, Gelbukh, & Bandyopadhyay, 2010), proposed a development of syntactic textual entailment approach. This system uses Stanford's parser to extract T-H pairs syntactic structures. Then, the Hs relations are compared with Ts relations according to the following comparisons: subject-verb, WordNet based subject-verb, subject-subject, WordNet based object-verb, cross subject-object, number, noun, prepositional phrase, determiner and other relation comparisons. Each comparison gives score 1 in case of a complete match for the dependency relation along with all its argument matches in both T and H, otherwise gives score 0.5. The final entailment decision is determined according to the threshold that equal 0.3.

Recently, a Cross-Lingual Textual Entailment (CLTE) as a semantic relation between two different language text portions is investigated by (Mehdad, Negri, Federico, & Trento, 2010). Also, the definition of textual entailment is adapted by define CLTE as follows: "a relation between two natural language portions in different languages, namely a text T (e.g. in English), and a hypothesis H (e.g. in French), that holds if a human after reading T would infer that H is most likely true, or otherwise stated, the meaning of H can be entailed (inferred) from T". So, two main directions for CLTE can be seen: i) simply bring CLTE back to the monolingual case by translating H into the language of T, or vice-versa; ii) try to embed cross-lingual processing techniques inside the TE recognition process.

We argue that cross-lingual textual entailment (CLTE) can be a core technology for several cross-lingual NLP applications and tasks.