Newsflash
Version 0.3 of the Recoin Library was released on 20 April 2005 and is available for download at SourceForge. See the Download section for details.
Introduction to Recoin

On this page I will try to explain what Recoin is, how the idea for it came about, and what you can use it for. Since Recoin is still a single-developer project, some of the thoughts conveyed here might make sense only to me. Please feel free to contact me if you have any questions about the software. Feedback on the documentation is also very welcome.

One semester at the University of Hildesheim, a group of young and ambitious men and women came together to design and develop software for a retrieval system that would help evaluate new ideas and algorithms aimed at increasing the quality and performance of information retrieval. I should mention that most of the students taking part in the project were neither experienced programmers nor long-time project managers, but spirits were high nevertheless.
We were not complete newbies, so we planned carefully, divided people into groups with different tasks, had a timeline and a project manager, and spent a lot of time on the software design, which showed that there were ideas in abundance. The natural deadline of the project was the end of the semester, and as it drew near we had to accept that we had not even come close to what we thought we could achieve.

Well, managing software projects is a complex matter and requires a lot of experience, not to mention a great team. But what really struck me when thinking about the reasons for our shortfall was the amount of time we spent on implementing things like user interfaces, database adapters, and other fundamental building blocks of the system. In fact, it took us so long to get a basic system up and running that in the end there was virtually no time to implement and test our ideas for improving the retrieval system.

And so, out of this not disappointing but hard-learned lesson, and because there are numerous similar projects out there, the idea for Recoin was born: a framework to integrate retrieval components, or Retrieval Component Integrator. The basic idea was to have a modular, highly configurable software system based on pluggable components that could be arranged to represent the steps of a larger process. The information retrieval process is one example. It comprises several tasks, each building on the outcome of the previous one: query processing, stopword removal, stemming, document retrieval, result merging, evaluation, etc.
Such a system, in which the overall function or process is established through a chain of exchangeable components, would allow for more effective experiments. For example, by replacing one component of the process with another that has the same function but uses a different algorithm, one could immediately observe the effect the new component has on the overall outcome of the process, given that the input and other factors remain the same.
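
To make the idea of exchangeable components more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names, the tiny stopword list, and the naive stemmer are hypothetical and are not taken from the Recoin Library.

```python
from typing import Callable, List

# Hypothetical component type: takes a list of tokens, returns a list of tokens.
Component = Callable[[List[str]], List[str]]

STOPWORDS = {"the", "of", "a", "for"}  # toy list, for illustration only


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Drop tokens that appear in the toy stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def suffix_stemmer(tokens: List[str]) -> List[str]:
    """Naive stemmer: strip a trailing 's' from tokens longer than 3 characters."""
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def no_stemmer(tokens: List[str]) -> List[str]:
    """Drop-in replacement with the same function slot but no stemming."""
    return list(tokens)


def run_chain(components: List[Component], tokens: List[str]) -> List[str]:
    """Feed each component's output into the next component."""
    for component in components:
        tokens = component(tokens)
    return tokens


query = "the evaluation of retrieval systems".split()

# Same input, same chain, only the stemming component is exchanged.
with_stemming = run_chain([remove_stopwords, suffix_stemmer], query)
without_stemming = run_chain([remove_stopwords, no_stemmer], query)

print(with_stemming)
print(without_stemming)
```

Because only one link of the chain differs between the two runs, any difference in the output can be attributed to that component.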

Today Recoin consists of different software libraries that are released under different conditions. A basic distinction is made between the Recoin Library and the Recoin Plug-Ins. For more details about the licenses see the legal notice.

The Recoin Library provides the building blocks of the framework and also contains the abstract classes on which third-party developers can base their own implementations of software components in order to use them in the framework. The Recoin Plug-Ins represent an implementation of the Recoin framework as a series of plug-ins for the Eclipse platform and provide user interfaces.

Today, Recoin has grown beyond the realm of information retrieval and represents a platform for modelling larger processes based on software components. See also the 'What is Recoin?' page, which tries to give a summary of Recoin's functionality.

Future goals for the project include:
  • Making the platform more stable and eliminating bugs.
  • Continuously implementing and testing new functionality.
  • Creating and making available a set of basic IR tools as components.

The acronym Recoin stands for Retrieval Component Integrator, and although the name reflects the project's relation to information retrieval, the software is abstract enough to be applicable to a wide range of scenarios. Recoin is a framework that offers a set of classes with which it is possible to build a data processing application out of software components that each represent a subtask of the overall process. New components can be declared and loaded into the framework via a reflection mechanism, and the order in which they are addressed can be specified, thus allowing one component's output to serve as another's input.
The framework provides the means to flexibly set up and configure these tasks, for example by changing the order of the components, by substituting one component for another that uses a different algorithm, or simply by changing the processing parameters of the components. The setup and configuration of all components is managed in a persistent repository, while detailed information about which components and which parameters were used for data processing is saved on a per-run basis.
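
The following Python sketch illustrates the general technique of declaring components by name, loading them reflectively, and keeping a per-run record of what was used. It is not Recoin's actual mechanism or file format. The setup structure and the helper names are my own assumptions, and two standard-library functions (`string.capwords` and `textwrap.indent`) merely stand in for real components.

```python
import importlib
import json


def load_component(dotted_name):
    """Resolve a name like 'package.module.attr' to a callable at runtime."""
    module_name, _, attr = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Hypothetical setup: the order of the list is the processing order,
# and each entry carries the parameters for that component.
setup = [
    {"component": "string.capwords", "params": {}},
    {"component": "textwrap.indent", "params": {"prefix": "> "}},
]


def run(setup, data):
    """Run the declared components in order and record what was used."""
    record = []
    for step in setup:
        component = load_component(step["component"])
        data = component(data, **step["params"])
        record.append({"component": step["component"], "params": step["params"]})
    return data, record


output, run_record = run(setup, "information   retrieval")
print(output)
print(json.dumps(run_record, indent=2))  # what one might persist per run
```

Changing the order, swapping a component, or adjusting a parameter then only requires editing the setup data, not the code that runs it.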

The main purpose of the framework is to be able to record and examine the different stages, their parameters, and intermediate results that lead from an initial input to the system to a final output. The goal is to isolate the effects that single components or even single parameters have on the overall result, assuming the quality of the result is measurable.
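
As a rough illustration of this purpose, the sketch below records the intermediate result of every stage and then compares two runs that differ in a single parameter. All names (`run_traced`, `first_divergence`, the two toy stages) are hypothetical and only serve the example.

```python
def run_traced(stages, data):
    """Run stages in order and return a list of (name, params, result) tuples."""
    trace = []
    for name, func, params in stages:
        data = func(data, **params)
        trace.append((name, dict(params), data))
    return trace


def tokenize(text):
    """Toy stage: lowercase the text and split it on whitespace."""
    return text.lower().split()


def drop_short(tokens, min_length):
    """Toy stage: keep only tokens with at least min_length characters."""
    return [t for t in tokens if len(t) >= min_length]


def first_divergence(trace_a, trace_b):
    """Return the name of the first stage whose results differ, or None."""
    for (name, _, result_a), (_, _, result_b) in zip(trace_a, trace_b):
        if result_a != result_b:
            return name
    return None


text = "An IR system ranks documents"

# Two runs with identical input; only the min_length parameter differs.
run_a = run_traced(
    [("tokenize", tokenize, {}), ("drop_short", drop_short, {"min_length": 2})], text
)
run_b = run_traced(
    [("tokenize", tokenize, {}), ("drop_short", drop_short, {"min_length": 3})], text
)

print(first_divergence(run_a, run_b))
print(run_a[-1][2])
print(run_b[-1][2])
```

With the full trace available, the point at which two runs start to differ can be located directly instead of being guessed from the final output.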

Information retrieval research is a natural field of application for Recoin. On the one hand, the process of retrieving information from a large pool of data comprises many very different steps, which can be represented by Recoin's components. On the other hand, there are large collections of data used in international research efforts and campaigns like TREC or CLEF that have been pre-examined by professionals as to which documents are relevant for a given set of questions, thus making it possible to measure the quality of the output of any IR system responding to these questions.
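
To show how such relevance judgments make quality measurable, here is a small Python sketch that computes precision and recall for one question. The document identifiers and judgments are invented toy data, and the function name is my own. Real campaigns use their own file formats and evaluation tools.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one question.

    retrieved: list of document ids returned by the system
    relevant:  set of document ids judged relevant by the assessors
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy relevance judgments and system output (illustrative only).
judgments = {"q1": {"d1", "d3", "d7"}}
results = {"q1": ["d1", "d2", "d3", "d4"]}

p, r = precision_recall(results["q1"], judgments["q1"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.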

By implementing the tools of information retrieval systems as components in Recoin, it becomes possible to ascertain their contribution to the quality of the outcome. Moreover, by learning more about the variables that influence retrieval quality and by using the evaluation data as feedback, Recoin could be trained to decide which setup of components and which parameters are most likely to yield a good retrieval result. The design and implementation of such an intelligent module is one of Recoin's future goals.
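
Such a module does not exist yet, so the following Python sketch is purely my own illustration of the simplest possible form of evaluation feedback: trying every combination of setups on a toy collection and keeping the one with the best score. It is an exhaustive search, not a trained model, and the collection, the judgments, and all names are invented for the example.

```python
from itertools import product

# Toy collection, question, and relevance judgments (illustrative only).
DOCS = {
    "d1": "Retrieval Systems",
    "d2": "system evaluation",
    "d3": "coin collecting",
}
QUERY = "system"
RELEVANT = {"d1", "d2"}


def identity(token):
    return token


def strip_s(token):
    """Naive stemmer: strip a trailing 's' from tokens longer than 3 characters."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


STEMMERS = {"none": identity, "strip_s": strip_s}


def analyze(text, lowercase, stemmer):
    """Turn text into tokens using the given setup."""
    if lowercase:
        text = text.lower()
    return [STEMMERS[stemmer](t) for t in text.split()]


def recall_for(lowercase, stemmer):
    """Retrieve documents sharing a term with the query and score by recall."""
    query_terms = set(analyze(QUERY, lowercase, stemmer))
    retrieved = {
        doc_id
        for doc_id, text in DOCS.items()
        if query_terms & set(analyze(text, lowercase, stemmer))
    }
    return len(retrieved & RELEVANT) / len(RELEVANT)


# Evaluate every combination of the two setup dimensions.
setups = list(product([False, True], ["none", "strip_s"]))
scores = {setup: recall_for(*setup) for setup in setups}
best_setup = max(scores, key=scores.get)

print(scores)
print(best_setup)
```

In this toy case only the combination of lowercasing and stemming finds both relevant documents, which shows how evaluation data can point to a preferable setup.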

