In this paper, we focus on the extraction of social events from web text. We consider social events as complex Named Entities (NEs), i.e., NEs represented by a list of properties that can be simple values (text, number, etc.), "elementary" NEs, and/or other complex NEs. Regarding the extraction of these complex NEs, our contribution addresses the issue of noisy context. We propose an original processing method, based on supervised learning and patterns, that makes it possible to focus property annotation on specific blocks of webpages. This process is generic and independent of the type of NE processed. We experimented with and evaluated it on one example of complex NEs: social events. The results show a clear improvement in the extraction process compared to the state of the art. The work was conducted with the objective of generalizing it to other categories of complex NEs.

Armel Fotsoh, Annig Le Parc-Lacayrelle, Christian Sallaberry
