Juergen Hermes - Applying Machine Learning to Text Mining

Department of Linguistics, University of Cologne



Machine learning techniques are widely used in text mining. By text mining (also referred to as text analytics) we mean the process of deriving structured information from (written) natural-language data. A preliminary stage in mining texts is to assign them to the categories of a given classification system on the basis of textual features. In my presentation, I would like to show a real-life example of text classification from our current research activities.


Through our cooperation with the Federal Institute for Vocational Education and Training (Bundesinstitut für Berufsbildung, BIBB, Bonn), we have obtained access to a growing corpus of several million job advertisements, containing the texts of the job ads themselves as well as metadata such as date of publication, region, and sector. Collections of job advertisements are a useful object of research for analyzing the requirements of the rapidly changing German job market. The aim of the cooperation is to extract further details from the full text of each job ad in order to enrich the database.


As a first step, we built a zone analysis application based on machine learning techniques that segments the texts into sections and assigns each section to one of three content classes:


-Self-description of the advertising company

-Description of the advertised job

-Required qualifications of the potential candidates
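
The overall shape of such a zone analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the application described here: splitting at blank lines is an assumed segmentation heuristic, and `keyword_stub` is a hypothetical rule-based placeholder standing in for a trained classifier. All names are invented for the example.

```python
import re
from enum import Enum
from typing import Callable, List, Tuple


class Zone(Enum):
    """The three content classes named in the list above."""
    COMPANY = "self-description of the advertising company"
    JOB = "description of the advertised job"
    QUALIFICATION = "required qualifications of the potential candidates"


def split_into_sections(ad_text: str) -> List[str]:
    """Split an ad into sections at blank lines (an assumed heuristic)."""
    parts = re.split(r"\n\s*\n", ad_text)
    return [p.strip() for p in parts if p.strip()]


def analyze_zones(ad_text: str, classify: Callable[[str], Zone]) -> List[Tuple[Zone, str]]:
    """Label every section with the class returned by `classify`."""
    return [(classify(section), section) for section in split_into_sections(ad_text)]


def keyword_stub(section: str) -> Zone:
    """Hypothetical stand-in for a trained model; this is NOT machine learning."""
    text = section.lower()
    if "you have" in text or "degree" in text:
        return Zone.QUALIFICATION
    if "your tasks" in text or "you will" in text:
        return Zone.JOB
    return Zone.COMPANY
```

Because `analyze_zones` only expects a function from section text to `Zone`, the placeholder can be swapped for any trained classifier without changing the segmentation step.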


To determine whether any one method is superior to the others, we tested several thousand combinations of feature generation, feature weighting, and classification algorithms. Given the excellent evaluation scores, we conclude that job ads are very well suited to zone analysis and that the output of these methods can be used in further text mining applications.
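
To make the idea of testing combinations concrete, the following minimal sketch enumerates every pairing of feature generator, weighting scheme, and classifier and scores each one. Everything in it is an assumption made for illustration: the toy sentences are invented, the two options per stage (word or character-trigram features, raw or logarithmic term frequency, nearest-neighbor or nearest-centroid classification) are simple stand-ins, and plain accuracy on a tiny held-out set replaces whatever evaluation was actually used.

```python
import math
from collections import Counter
from itertools import product

# Invented toy data; labels stand for the three zone classes.
TRAIN = [
    ("we are a leading company in the region", "company"),
    ("our company has offices across germany", "company"),
    ("you will plan and manage projects", "job"),
    ("your tasks include managing customer projects", "job"),
    ("you have a degree and good english skills", "qualification"),
    ("required skills are a degree and experience", "qualification"),
]
TEST = [
    ("our company is a leading employer", "company"),
    ("your tasks include planning projects", "job"),
    ("a degree and language skills are required", "qualification"),
]

# Feature generation: text -> list of features.
def words(text):
    return text.split()

def char_trigrams(text):
    return [text[i:i + 3] for i in range(len(text) - 2)]

# Feature weighting: feature counts -> weighted sparse vector.
def raw_tf(counts):
    return dict(counts)

def log_tf(counts):
    return {f: 1.0 + math.log(c) for f, c in counts.items()}

def cosine(a, b):
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

# Classifiers: (labeled training vectors, query vector) -> label.
def nearest_neighbor(train, query):
    return max(train, key=lambda item: cosine(item[0], query))[1]

def nearest_centroid(train, query):
    # Summed class vectors; cosine similarity ignores their length.
    centroids = {}
    for vec, label in train:
        centroid = centroids.setdefault(label, {})
        for f, w in vec.items():
            centroid[f] = centroid.get(f, 0.0) + w
    return max(centroids, key=lambda label: cosine(centroids[label], query))

def evaluate(generate, weight, classify):
    """Accuracy of one combination on the toy test set."""
    vectorize = lambda text: weight(Counter(generate(text)))
    train = [(vectorize(text), label) for text, label in TRAIN]
    hits = sum(classify(train, vectorize(text)) == label for text, label in TEST)
    return hits / len(TEST)

def run_grid():
    """Score all 2 x 2 x 2 combinations, best first."""
    grid = product([words, char_trigrams], [raw_tf, log_tf], [nearest_neighbor, nearest_centroid])
    results = {(g.__name__, w.__name__, c.__name__): evaluate(g, w, c) for g, w, c in grid}
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `run_grid()` returns the eight combinations ranked by accuracy. Adding an option to any of the three lists multiplies the number of combinations, which is how a modest set of choices per stage grows into thousands of configurations; on realistic data, cross-validation and per-class precision and recall would be more informative than a single accuracy figure.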