MANAGING LARGE TEXT MASSES
Professor: Ari Visa
Information-intensive industries need many management disciplines because information must be turned into knowledge. This conversion needs tools that keep it efficient and effective. Tools that process the information flow must cope with a great deal of noise, which has to be removed or masked. Our project enters this area of knowledge creation by using neural networks as tools to analyse large information flows.
Internet, intranet and extranet applications already make huge volumes of data and text available. With a suitable methodology in mind and workable methods at hand, it is possible to categorise and analyse these documents and to extract meaning from them. This kind of work is very user-centred and needs specific tools. Today there are, for example, businesses that follow the press for articles on a commissioned subject and report on them monthly to their principal.
During the TEKES Impress project we developed a procedure that has brought us closer to this research area. We have gathered databases and text files that can be used in our research. The results so far are promising and encourage us to continue. With the advent of the Internet and large databases, the management, classification and search of information in documents have become topical questions.
Our research project has several general objectives:
First, we define a coherent theoretical framework for our research area, which is multidisciplinary in character.
Second, we formulate the research hypotheses for our different research contexts.
Third, we develop and identify research methods applicable to our different problem formulations. These methods will be created through collaboration among the different methodological disciplines represented in our research team.
All of the above is supported by rich empirical data and text from the databases and text databases of our overall context, i.e. company management.
The main objective of the research project is to develop auxiliary means for finding and managing information in large masses of text documents. The scientific objective is to find out whether the desired items can be found in a large mass of documents by using the self-organising map (SOM) or similar methods combined with linguistic sentence analysis.
Before raw information can be used, it must be converted or filtered into knowledge. The context, the time frame, and the encoding and decoding are important factors that affect the quality of information. At the same time, the conversion process must be very efficient.
The technology is based on multilevel hierarchies of self-organising feature maps and on smart encoding of words. The word encoding is language independent. The levels of the hierarchy are word, sentence and paragraph maps. With them it is possible to find similar documents and to distinguish between different types of documents. The same holds for paragraphs within a document.
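The word, sentence and paragraph levels described above can be illustrated with a minimal sketch. It is not the project's implementation: the character-code word encoding (encode_word), the one-dimensional maps, the use of normalised histograms to pass one level's output to the next, and all names and map sizes are assumptions made only for illustration.

```python
import random

def encode_word(word, width=4):
    """Hypothetical encoding: turn the first `width` characters into a number in [0, 1).
    It uses character codes only, so it needs no dictionary or language model."""
    code = 0.0
    for i in range(width):
        ch = ord(word[i]) % 256 if i < len(word) else 0
        code = code * 256 + ch
    return [code / 256 ** width]

def best_unit(units, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(units)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(units[j], x)))

def train_som(samples, n_units, epochs=50, seed=0):
    """Train a one-dimensional SOM (a chain of units) on a list of vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac                       # learning rate decays linearly
        radius = max(1.0, n_units / 2 * frac)   # neighbourhood shrinks over time
        for x in rng.sample(samples, len(samples)):
            b = best_unit(units, x)
            for j, unit in enumerate(units):
                d = abs(j - b)
                if d <= radius:
                    h = rate * (1.0 - d / (radius + 1.0))
                    for k in range(dim):
                        unit[k] += h * (x[k] - unit[k])
    return units

def histogram(indices, n_units):
    """Normalised count of how often each map unit was the best match."""
    counts = [0.0] * n_units
    for i in indices:
        counts[i] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]

def fit_hierarchy(paragraphs, n_word=8, n_sent=4, n_par=3):
    """paragraphs: list of paragraphs, each a list of sentence strings.
    Returns the paragraph-map unit assigned to each paragraph."""
    sents = [[s.lower().split() for s in p] for p in paragraphs]

    # Level 1: word map trained on the encoded words.
    word_map = train_som([encode_word(w) for p in sents for s in p for w in s], n_word)

    def sent_vec(s):
        return histogram([best_unit(word_map, encode_word(w)) for w in s], n_word)

    # Level 2: sentence map trained on word-map histograms.
    sent_map = train_som([sent_vec(s) for p in sents for s in p], n_sent)

    def par_vec(p):
        return histogram([best_unit(sent_map, sent_vec(s)) for s in p], n_sent)

    # Level 3: paragraph map trained on sentence-map histograms.
    par_map = train_som([par_vec(p) for p in sents], n_par)
    return [best_unit(par_map, par_vec(p)) for p in sents]
```

Calling fit_hierarchy on a list of paragraphs returns one paragraph-map unit per paragraph. Two paragraphs with identical text always receive the same unit, and paragraphs on the same or neighbouring units can be treated as candidates for similarity. A realistic setting would need two-dimensional maps and far more text than a toy example.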
The issues of interest can relate to, e.g., financial, technical or strategic information. In the test cases the desired issues can also be defined in some other way, so that the results can be compared and the method developed further.
The research work is done by
Helsinki University, Department of General Linguistics
Tampere University of Technology, Signal Processing Laboratory
Tampere University of Technology, Pori School of Technology and Economics
Åbo Akademi University, Laboratory of Information Systems
in Finland during 1.1.2000 - 31.12.2002. The budget of this three-year project is roughly 600,000 euros. The work is carried out in close co-operation with the following industrial companies:
Finnish Forest Industries Federation
Kemira Pigments Oy
Ramse Consulting Oy
Teollisuuden Voima Oy
The project manager is Professor Ari Visa.
The following list contains the publications produced so far during the GILTA project.
Refereed International Conference Papers, see List