Skip Nav Destination
Close Modal
Update search
NARROW
Format
Journal
TocHeadingTitle
Date
Availability
1-5 of 5
Weiwei Sun
Close
Follow your search
Access your saved searches in your account
Would you like to receive an alert when new items match your search?
Sort by
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2024) 50 (2): 535–561.
Published: 01 June 2024
FIGURES
| View All (9)
Abstract
View article
PDF
Divergence of languages observed at the surface level is a major challenge encountered by multilingual data representation, especially when typologically distant languages are involved. Drawing inspiration from a formalist Chomskyan perspective towards language universals, Universal Grammar (UG), this article uses deductively pre-defined universals to analyze a multilingually heterogeneous phenomenon, event nominals. In this way, deeper universality of event nominals beneath their huge divergence in different languages is uncovered, which empowers us to break barriers between languages and thus extend insights from some synthetic languages to a non-inflectional language, Mandarin Chinese. Our empirical investigation also demonstrates this UG-inspired schema is effective: With its assistance, the inter-annotator agreement (IAA) for identifying event nominals in Mandarin grows from 88.02% to 94.99%, and automatic detection of event-reading nominalizations on the newly-established data achieves an accuracy of 94.76% and an F 1 score of 91.3%, which significantly surpass those achieved on the pre-existing resource by 9.8% and 5.2%, respectively. Our systematic analysis also sheds light on nominal semantic role labeling. By providing a clear definition and classification on arguments of event nominal, the IAA of this task significantly increases from 90.46% to 98.04%.
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2021) 47 (1): 43–68.
Published: 21 April 2021
FIGURES
| View All (6)
Abstract
View article
PDF
In this work, we present a phenomenon-oriented comparative analysis of the two dominant approaches in English Resource Semantic (ERS) parsing: classic, knowledge-intensive and neural, data-intensive models. To reflect state-of-the-art neural NLP technologies, a factorization-based parser is introduced that can produce Elementary Dependency Structures much more accurately than previous data-driven parsers. We conduct a suite of tests for different linguistic phenomena to analyze the grammatical competence of different parsers, where we show that, despite comparable performance overall, knowledge- and data-intensive models produce different types of errors, in a way that can be explained by their theoretical properties. This analysis is beneficial to in-depth evaluation of several representative parsing techniques and leads to new directions for parser development.
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2019) 45 (1): 95–136.
Published: 01 March 2019
FIGURES
| View All (17)
Abstract
View article
PDF
We report our work on building linguistic resources and data-driven parsers in the grammatical relation (GR) analysis for Mandarin Chinese. Chinese, as an analytic language, encodes grammatical information in a highly configurational rather than morphological way. Accordingly, it is possible and reasonable to represent almost all grammatical relations as bilexical dependencies. In this work, we propose to represent grammatical information using general directed dependency graphs. Both only-local and rich long-distance dependencies are explicitly represented. To create high-quality annotations, we take advantage of an existing TreeBank, namely, Chinese TreeBank (CTB), which is grounded on the Government and Binding theory. We define a set of linguistic rules to explore CTB’s implicit phrase structural information and build deep dependency graphs. The reliability of this linguistically motivated GR extraction procedure is highlighted by manual evaluation. Based on the converted corpus, data-driven, including graph- and transition-based, models are explored for Chinese GR parsing. For graph-based parsing, a new perspective, graph merging, is proposed for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs. Two key problems are discussed in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. For transition-based parsing, we introduce a neural parser based on a list-based transition system. We also discuss several other key problems, including dynamic oracle and beam search for neural transition-based parsing. Evaluation gauges how successful GR parsing for Chinese can be by applying data-driven models. The empirical analysis suggests several directions for future study.
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2016) 42 (3): 391–419.
Published: 01 September 2016
Abstract
View article
PDF
From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism, and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Despite the effectiveness to boost accuracy, computationally expensive parsers make hybrid systems inappropriate for many realistic NLP applications. In this article, we are also concerned with improving tagging efficiency at test time. In particular, we explore unlabeled data to transfer the predictive power of hybrid models to simple sequence models. Specifically, hybrid systems are utilized to create large-scale pseudo training data for cheap models. Experimental results illustrate that the re-compiled models not only achieve high accuracy with respect to per token classification, but also serve as a front-end to a parser well.
Journal Articles
Publisher: Journals Gateway
Computational Linguistics (2016) 42 (3): 353–389.
Published: 01 September 2016
FIGURES
| View All (13)
Abstract
View article
PDF
Derivations under different grammar formalisms allow extraction of various dependency structures. Particularly, bilexical deep dependency structures beyond surface tree representation can be derived from linguistic analysis grounded by CCG , LFG , and HPSG . Traditionally, these dependency structures are obtained as a by-product of grammar-guided parsers. In this article, we study the alternative data-driven, transition-based approach, which has achieved great success for tree parsing, to build general dependency graphs. We integrate existing tree parsing techniques and present two new transition systems that can generate arbitrary directed graphs in an incremental manner. Statistical parsers that are competitive in both accuracy and efficiency can be built upon these transition systems. Furthermore, the heterogeneous design of transition systems yields diversity of the corresponding parsing models and thus greatly benefits parser ensemble. Concerning the disambiguation problem, we introduce two new techniques, namely, transition combination and tree approximation, to improve parsing quality. Transition combination makes every action performed by a parser significantly change configurations. Therefore, more distinct features can be extracted for statistical disambiguation. With the same goal of extracting informative features, tree approximation induces tree backbones from dependency graphs and re-uses tree parsing techniques to produce tree-related features. We conduct experiments on CCG -grounded functor–argument analysis, LFG -grounded grammatical relation analysis, and HPSG -grounded semantic dependency analysis for English and Chinese. Experiments demonstrate that data-driven models with appropriate transition systems can produce high-quality deep dependency analysis, comparable to more complex grammar-driven models. Experiments also indicate the effectiveness of the heterogeneous design of transition systems for parser ensemble, transition combination, as well as tree approximation for statistical disambiguation.