Departamento de Inteligencia Artificial (Artificial Intelligence Department)
Summary - proposal (PDF, ZIP)
Slides of the tutorial (PDF, 4 MB; ZIP, 1.94 MB)
References (BIB, online searchable bibliography)
The goal of this tutorial is to make the audience familiar with the different ways that text is represented in Automated Text Categorization (ATC). I define Text Categorization, describe its applications, and present the general model for learning-based ATC. Then I describe a number of works that propose alternative text representations, including the use of statistical and linguistic phrases, Information Extraction patterns, and WordNet information. Application-specific text features are also presented, based on stylometry and structural text properties, for tasks such as author, language and genre identification, and spam detection.
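To make the learning-based model concrete, here is a minimal sketch, assuming the simplest representation (a bag of lowercase words) and a multinomial Naive Bayes learner with add-one smoothing. The tutorial does not prescribe this particular classifier; it is used here only as one common example of the train-then-classify scheme. The class name `NaiveBayesCategorizer`, the `tokenize` helper and the toy spam/ham documents are hypothetical, invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Bag-of-words representation: lowercase alphabetic tokens, order discarded."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Hypothetical multinomial Naive Bayes categorizer with Laplace smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.term_counts = defaultdict(Counter)  # category -> term frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_terms = {c: sum(tc.values()) for c, tc in self.term_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log P(c) + sum of log P(t | c), using add-one (Laplace) smoothing
            score = math.log(n_c / self.n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # terms never seen in training are ignored
                score += math.log((self.term_counts[c][t] + 1) / (self.total_terms[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Toy training collection (invented data), in the spirit of spam detection.
train_docs = [
    "cheap pills buy now limited offer",
    "win money now click here offer",
    "meeting agenda for the project review",
    "please review the attached project report",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = NaiveBayesCategorizer().fit(train_docs, train_labels)
print(clf.predict("buy cheap pills now"))     # spam
print(clf.predict("project meeting review"))  # ham
```

Swapping `tokenize` for a function that emits phrases, Information Extraction patterns, WordNet synsets or stylometric features would change the text representation while leaving the learner untouched, which is the separation the tutorial builds on.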
The tutorial is mainly divided into two parts: an overview of ATC, and the discussion of specific text representations. In particular, the tutorial covers the following topics:
A definition of (Automated) Text Categorization is presented, along with a number of considerations. Some links relevant to this part are:
A number of applications are briefly described in this tutorial. Some interesting sites include:
Here I present the learning-based model for ATC. Some interesting links include:
Some links on the topics covered in this part:
For more information, please visit the Tutorials home page at the EACL'03 website.
José María Gómez Hidalgo
Last updated February 28th, 2003