Detecting Verbal Multi-Word Expressions

In April I attended the Multi-Word Expression Workshop in Valencia to present a system that we developed at the ADAPT Centre to identify verbal multi-word expressions.

Simply put, a verbal multi-word expression is a multi-word expression (MWE) that contains a verb. Verbal MWEs include idiomatic expressions such as "kick the bucket" or "spill the beans", as well as light-verb constructions, where a non-verb word describes the action, as in "have a conversation" (the action is described by the noun "conversation"). Phrasal verbs like "to look up" (a word in a dictionary) or "to put up" (with something/someone) are also verbal MWEs. In some languages, like French or Spanish, verbs used in reflexive form also produce verbal MWEs, such as "se trouver" (to be located), "se dérouler" (to unfold) and "se battre" (to fight, to strive).

Our system competed in the PARSEME Verbal MWE Identification Shared Task, ranking second in most categories. We participated in 15 of the 18 languages for which data was provided.

We addressed this verbal MWE identification task as a named-entity recognition problem. More specifically, we applied conditional random fields (CRF), a state-of-the-art sequence labelling algorithm that is very successful at recognising named entities.
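To illustrate what treating MWE identification like named-entity recognition means in practice, here is a minimal sketch of how a sentence could be turned into one label per token, which is the kind of input a sequence labeller such as a CRF is trained on. This is an illustration only, not the code of our system: the function name `to_bio`, the B/I/O scheme and the generic `VMWE` tag are assumptions made for the example, and the label set actually used is described in the paper.

```python
def to_bio(tokens, mwes, tag="VMWE"):
    """Return one B/I/O label per token.

    `mwes` is a list of token-index lists, one per annotated expression.
    Indices need not be contiguous, since a verbal MWE can be split by
    other words (e.g. "put the meeting off").
    """
    labels = ["O"] * len(tokens)
    for indices in mwes:
        ordered = sorted(indices)
        if not ordered:
            continue  # skip empty annotations
        labels[ordered[0]] = f"B-{tag}"   # first word of the expression
        for i in ordered[1:]:
            labels[i] = f"I-{tag}"        # remaining words of the expression
    return labels


# Contiguous idiom: "spill the beans"
idiom_tokens = ["He", "will", "spill", "the", "beans"]
idiom_labels = to_bio(idiom_tokens, [[2, 3, 4]])

# Discontinuous phrasal verb: "put ... off"
phrasal_tokens = ["She", "put", "the", "meeting", "off"]
phrasal_labels = to_bio(phrasal_tokens, [[1, 4]])

for token, label in zip(idiom_tokens, idiom_labels):
    print(f"{token}\t{label}")
```

Once every training sentence is encoded this way, the task has the same shape as named-entity recognition: the model sees a sequence of tokens (plus whatever features are extracted for them) and predicts a sequence of labels.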

We also relied on a second intuition: many verbal MWEs, such as idiomatic expressions, don't have a literal meaning, so the meaning of the full verbal MWE is somewhat unrelated to the meanings of its individual words. We exploited this by computing distributional semantic similarity scores between a vector representing the full verbal MWE and the vectors representing its individual words. Our assumption is that the lower these similarity scores are, the less literal the MWE is. We combined these similarity/literalness scores into a single score using linear regression, and used that score to re-rank the top 10 label sequences from the CRF, achieving 5-10% gains in F1 scores.
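The re-ranking idea described above can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the toy two-dimensional vectors, the choice of minimum, maximum and mean similarity as features, the candidate dictionaries and the hand-picked weights are all hypothetical. In the real system the vectors come from a distributional semantic model and the weights are learned by linear regression, as detailed in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_features(mwe_vec, word_vecs):
    """Compare the vector of the whole MWE with each of its words.

    Returns [min, max, mean] similarity; low values suggest a less
    literal (more idiomatic) expression.
    """
    sims = [cosine(mwe_vec, w) for w in word_vecs]
    return [min(sims), max(sims), sum(sims) / len(sims)]


def rerank(candidates, weights, bias=0.0):
    """Sort candidate label sequences by a linear score, best first."""
    def score(candidate):
        return bias + sum(w * f for w, f in zip(weights, candidate["features"]))
    return sorted(candidates, key=score, reverse=True)


# Toy vectors: the MWE is identical to one word and orthogonal to the other.
features = similarity_features([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])

# Each candidate: [CRF probability, min sim, max sim, mean sim] (made-up values).
candidates = [
    {"name": "A", "features": [0.5, 0.9, 0.9, 0.9]},    # proposed MWE looks literal
    {"name": "B", "features": [0.4, 0.1, 0.2, 0.15]},   # proposed MWE looks idiomatic
]
# Hypothetical weights: reward CRF confidence, penalise high similarity.
weights = [1.0, -0.5, -0.5, -0.5]
ranking = [c["name"] for c in rerank(candidates, weights)]
print(features, ranking)
```

In this toy setting candidate B overtakes candidate A even though the CRF gave it a lower probability, because the expression it proposes is much less similar to its component words.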

You can get all of the details of our approach in our research paper.

You can also see a video recording by Aaron Li-Feng Han of a talk I gave at the Dublin Computational Linguistics Research Seminar on this topic.
