Automaton story 0.0.17

5/1/2023

We describe two constraint-based methods that can be used to improve the recall of a shallow discourse parser based on conditional random field chunking. These methods use a set of natural structural constraints as well as others that follow from the annotation guidelines of the Penn Discourse Treebank (PDTB). We evaluated the resulting systems on the standard test set of the PDTB and achieved a rebalancing of precision and recall with improved F-measures across the board. This was especially notable when we used evaluation metrics that take partial matches into account; for these measures, we achieved F-measure improvements of several points.

We also address the problem of optimization in discourse parsing. A good model for discourse structure analysis needs to account both for local dependencies at the token level and for global dependencies and statistics. We present techniques for using inter-sentential or sentence-level (global), data-driven, non-grammatical features in the task of parsing discourse. The parser model builds on a previous approach that used token-level (local) features with conditional random fields for shallow discourse parsing, an approach that lacks structural knowledge of discourse. The parser adopts a two-stage approach in which local constraints are applied first and global constraints are then applied to a reduced, weighted search space ($n$-best). In the second stage we experiment with different rerankers trained on the first-stage $n$-best parses, which are generated using lexico-syntactic local features. The two-stage parser yields significant improvements over the best-performing discourse parser model on the PDTB corpus.
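To make the two-stage idea concrete, here is a minimal sketch of reranking an $n$-best list: each candidate carries a first-stage (local) score, and a second stage adds a score computed from features of the whole tag sequence. The feature names, the hand-set weights and the toy candidates are assumptions made for illustration; the rerankers described above are trained on the $n$-best parses, and their actual features are not given here.

```python
from typing import Dict, List, Tuple

# A candidate parse: (tag sequence, first-stage local score such as a log-probability).
Candidate = Tuple[List[str], float]


def global_features(tags: List[str]) -> Dict[str, float]:
    """Hypothetical sequence-level features that a token-level model cannot see."""
    n_arg1 = sum(t == "B-Arg1" for t in tags)
    n_arg2 = sum(t == "B-Arg2" for t in tags)
    return {
        # A relation is expected to have both arguments.
        "has_both_args": float(n_arg1 >= 1 and n_arg2 >= 1),
        # Count spans beyond the first of each argument type.
        "extra_spans": float(max(0, n_arg1 - 1) + max(0, n_arg2 - 1)),
    }


def rerank(nbest: List[Candidate], weights: Dict[str, float],
           local_weight: float = 1.0) -> List[Candidate]:
    """Sort candidates by local score plus weighted global features, best first."""
    def score(candidate: Candidate) -> float:
        tags, local_score = candidate
        feats = global_features(tags)
        return local_weight * local_score + sum(
            weights.get(name, 0.0) * value for name, value in feats.items())
    return sorted(nbest, key=score, reverse=True)


# Toy 2-best list: the locally best candidate is missing Arg2.
nbest = [
    (["B-Arg1", "I-Arg1", "O", "O", "O"], -1.0),
    (["B-Arg1", "I-Arg1", "O", "B-Arg2", "I-Arg2"], -1.3),
]
# Hand-set weights for illustration only; a real reranker would learn these.
weights = {"has_both_args": 1.0, "extra_spans": -1.0}
best_tags, best_local_score = rerank(nbest, weights)[0]
print(best_tags)
```

In this toy case the global feature promotes the second candidate even though its local score is lower, which is the kind of correction a reranker over a reduced search space is meant to provide.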

We present a novel end-to-end discourse parser that, given a plain-text document as input, identifies the discourse relations in the text, assigns them a semantic label and detects the discourse argument spans. The parsing architecture is based on a cascade of decisions supported by conditional random fields (CRFs). We train and evaluate three different parsers using the PDTB corpus. The three system versions are compared to evaluate their robustness with respect to deep/shallow and automatically extracted syntactic features. We also compare the results of the cascaded pipeline with a non-cascaded structured prediction setting, which shows clearly that cascaded structured prediction is the better-performing method for discourse parsing.
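The cascade of decisions can be pictured as a pipeline in which each stage consumes the output of the previous one. The sketch below is only a structural illustration: the rule-based stage functions, the `Relation` record and the toy sense lookup are hypothetical stand-ins for the trained CRF components, whose details are not given here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


@dataclass
class Relation:
    connective: int                 # token index of the connective
    sense: Optional[str] = None     # semantic label, filled in by stage 2
    arg1: Optional[Span] = None     # argument spans, filled in by stage 3
    arg2: Optional[Span] = None


# Toy lookup standing in for trained models (hypothetical, far from complete).
TOY_SENSES = {"because": "Contingency", "but": "Comparison"}


def detect_connectives(tokens: List[str]) -> List[Relation]:
    """Stage 1: find tokens that signal a discourse relation."""
    return [Relation(i) for i, tok in enumerate(tokens) if tok.lower() in TOY_SENSES]


def label_sense(tokens: List[str], rel: Relation) -> Relation:
    """Stage 2: assign a semantic label to a detected relation."""
    rel.sense = TOY_SENSES[tokens[rel.connective].lower()]
    return rel


def segment_arguments(tokens: List[str], rel: Relation) -> Relation:
    """Stage 3: pick argument spans (naively, everything before and after the connective)."""
    rel.arg1 = (0, rel.connective)
    rel.arg2 = (rel.connective + 1, len(tokens))
    return rel


def parse(tokens: List[str]) -> List[Relation]:
    """Run the stages in order; each decision is conditioned on the earlier ones."""
    relations = detect_connectives(tokens)
    relations = [label_sense(tokens, r) for r in relations]
    return [segment_arguments(tokens, r) for r in relations]


relations = parse("He stayed home because it was raining".split())
print(relations)
```

A non-cascaded setting would instead predict all of these decisions jointly in a single model; the sketch shows only the cascaded variant.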

The comparative error analysis investigates the performance variability over connective types and argument positions.

We design the argument segmentation task as a cascade of decisions based on conditional random fields (CRFs). We train the CRFs on lexical, syntactic and semantic features extracted from the Penn Discourse Treebank and evaluate feature combinations on the commonly used test split. We show that the best combination of features includes syntactic and semantic features.
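As an illustration of what per-token CRF input might look like, the sketch below builds a feature dictionary for one token from a lexical cue (the word form), a syntactic cue (the part-of-speech tag) and a semantic cue (the sense of the connective). The specific feature names, the context window and the `conn_sense` argument are assumptions; the actual feature set is not listed here.

```python
from typing import Dict, List, Union

FeatureValue = Union[str, bool]


def token_features(tokens: List[str], pos_tags: List[str], i: int,
                   conn_sense: str) -> Dict[str, FeatureValue]:
    """Build a feature dict for token i (hypothetical feature names)."""
    feats: Dict[str, FeatureValue] = {
        "word.lower": tokens[i].lower(),   # lexical
        "pos": pos_tags[i],                # syntactic
        "conn_sense": conn_sense,          # semantic: sense of the relation's connective
    }
    # One token of left and right context, with boundary markers.
    if i > 0:
        feats["prev.word.lower"] = tokens[i - 1].lower()
        feats["prev.pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next.word.lower"] = tokens[i + 1].lower()
        feats["next.pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True
    return feats


tokens = "He stayed home because it was raining".split()
pos_tags = ["PRP", "VBD", "NN", "IN", "PRP", "VBD", "VBG"]
sentence_features = [
    token_features(tokens, pos_tags, i, "Contingency") for i in range(len(tokens))
]
print(sentence_features[3])
```

A list of such dictionaries, one per token, is the usual input shape for CRF toolkits that accept dictionary features; comparing feature combinations then amounts to dropping or adding keys.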

Parsing discourse is a challenging natural language processing task. In this work, we first take a data-driven approach to identifying the arguments of explicit discourse connectives. In contrast to previous work, we make no assumptions about the span of arguments and treat parsing as a token-level sequence labeling task.
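Treating argument identification as token-level sequence labeling means that every token receives a tag and argument spans are read off the tag sequence afterwards. The sketch below uses a BIO encoding for this; the tag scheme, the function names and the example sentence are assumptions for illustration, since the exact label set is not specified here.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def spans_to_bio(n_tokens: int, spans: Dict[str, Span]) -> List[str]:
    """Encode labeled spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for label, (start, end) in spans.items():
        for i in range(start, end):
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


def bio_to_spans(tags: List[str]) -> Dict[str, Span]:
    """Decode BIO tags back into spans (simplification: one contiguous span per label)."""
    spans: Dict[str, Span] = {}
    label = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a span at the end
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            spans[label] = (start, i)
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


tokens = "He stayed home because it was raining".split()
gold_spans = {"Arg1": (0, 3), "Arg2": (4, 7)}  # the connective "because" stays outside
tags = spans_to_bio(len(tokens), gold_spans)
print(list(zip(tokens, tags)))
```

Because the spans are recovered from the tags rather than fixed in advance, nothing in this formulation forces an argument to coincide with a clause or a sentence.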
