Noël Conruyt, David Grosser
IREMIA, Institut de REcherche en Mathématiques et Informatique Appliquées
University of La Réunion
15, av. René Cassin - 97715 Saint-Denis, Messag. Cedex 9, France
{Conruyt, Grosser}@univ-reunion.fr
Abstract
In many fields dependent upon complex observation, the structuring, depiction and treatment of knowledge can be of great complexity. For example, in Systematics, the scientific discipline that investigates biodiversity, the descriptions of specimens are often highly structured (composite objects, taxonomic attributes), noisy (erroneous or unknown data), and polymorphous (variable or imprecise data).
In this paper, we present IKBS, an Iterative Knowledge Base System for dealing with such complex phenomena. The originality of this system lies in implementing the scientific method in biology: experimenting (learning rules from examples) and testing (identifying new individuals, improving the initial model and descriptions). This methodology is applied in IKBS in the following ways:
- Knowledge is acquired through a descriptive model that meets the semantic requirements of experts,
- Knowledge is processed with an algorithm derived from C4.5 in order to take into account structured knowledge introduced in the previous descriptive model of the domain,
- Knowledge is refined through the use of an iterative process to evaluate the robustness of the descriptive model and descriptions.
The IKBS system is presented here as a life science application facilitating the identification of coral specimens of the family Pocilloporidæ.
1. Introduction
In the natural sciences, data to be processed may be more complex than in other fields. In Systematics, the attributes that describe organisms are numerous (> 100), whereas the number of individuals per class is small (< 10) and mostly not representative: the domain to describe is established deterministically (empirically) rather than probabilistically (statistically) [14]. In such domains, we must take into account diversity and incompleteness, and the exception is the only valid rule.
Learning systems intended to facilitate classification (class definition) and identification of natural organisms must adapt to the representation and processing of such a reality.
To meet the needs of representation, taking into account the structure of biological knowledge [2], [5] is a step forward: it allows useful common-sense background knowledge to be used in order to acquire, manage and process complex knowledge in a more elegant and efficient way.
The identification procedure described in this paper exploits structured descriptions by reducing the number of criteria eligible for the information gain calculation, and manages coherent consultations through a guide to observation (web questionnaire).
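The idea of restricting the criteria that enter the information gain calculation can be illustrated with a minimal sketch. It is not the IKBS implementation: it assumes a flat dictionary per case, a hypothetical OWNER table stating which component each attribute depends on, and invented attribute values and class labels ("A", "B"). It also uses plain information gain rather than the gain ratio of C4.5. An attribute is treated as eligible only when the component it describes is present in every case of the current subset.

```python
import math
from collections import Counter

# Hypothetical descriptive model: each attribute maps to the component it
# depends on (None means it can always be observed). Names are illustrative.
OWNER = {"columella": None, "colony_form": None, "columella_shape": "columella"}

# Invented cases; "species" is the class label, values are placeholders.
CASES = [
    {"species": "A", "columella": "present", "columella_shape": "styliform", "colony_form": "branching"},
    {"species": "A", "columella": "present", "columella_shape": "styliform", "colony_form": "massive"},
    {"species": "B", "columella": "absent", "columella_shape": None, "colony_form": "branching"},
    {"species": "B", "columella": "absent", "columella_shape": None, "colony_form": "massive"},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(cases, attribute, target="species"):
    """Entropy of the class labels minus the weighted entropy after splitting."""
    partitions = {}
    for case in cases:
        partitions.setdefault(case[attribute], []).append(case[target])
    remainder = sum(len(p) / len(cases) * entropy(p) for p in partitions.values())
    return entropy([case[target] for case in cases]) - remainder

def eligible_attributes(cases, owner=OWNER):
    """Keep attributes whose owning component is present in all given cases."""
    return [
        attribute
        for attribute, component in owner.items()
        if component is None or all(c[component] == "present" for c in cases)
    ]

# At the root, columella_shape is skipped because some cases lack a columella.
root_candidates = eligible_attributes(CASES)
best = max(root_candidates, key=lambda a: information_gain(CASES, a))

# Once the columella is known to be present, its attributes become eligible.
with_columella = [c for c in CASES if c["columella"] == "present"]
branch_candidates = eligible_attributes(with_columella)
```

In this toy data set the gain is computed for two criteria at the root instead of three, and the shape of the columella is only considered in the branch where a columella has been observed, which also keeps the sequence of questions coherent for the user.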
Nevertheless, the problem we face in Systematics is more difficult: good identifications depend on good prior classifications by experts, and also on good descriptions by other biologists. Naming organisms can be difficult even for experts (the problem of synonymy), especially when there is great intra-specific variation. This is the case in coral taxonomy, where the number of named species in the world is uncertain [18]. Thus, managing complex knowledge in the natural sciences means coping with such evolving knowledge.
We have designed an Iterative Knowledge Base System that responds to these requirements for building knowledge bases in the natural sciences. The main goal of IKBS is to produce high-quality descriptions, which are a key factor in obtaining better results in the identification process [11] and in avoiding future revisions.