ICCBR'99


2. Methodology

In computer science, knowledge is a controversial term [9]. We therefore adopt a working definition, suited to our purpose in biology, that distinguishes three kinds of knowledge: domain, instantiated and derived.

Domain knowledge (or background knowledge) defines what is observable: it is captured by building a descriptive model, which corresponds to the modeling of data, or metadata [6]. Instantiated knowledge refers to the descriptions of observed instances (case descriptions). Derived knowledge corresponds to the hypotheses (cluster definitions, decision trees, rules, identifications) discovered from domain and instantiated knowledge. Of course, knowledge also resides in the expert's mind, and what is "extracted" is only a small part of his or her experience.
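The three kinds of knowledge can be pictured as three small data structures. The following Python sketch is only an illustration under our own assumptions: the class names (Character, DescriptiveModel, Case, Rule) and the validate and matches methods are hypothetical and are not taken from IKBS.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Character:
    """Domain knowledge: one observable character and its admissible states."""
    name: str
    states: tuple


@dataclass
class DescriptiveModel:
    """Domain knowledge: the set of characters that may be observed."""
    characters: dict = field(default_factory=dict)

    def add(self, name, *states):
        self.characters[name] = Character(name, tuple(states))

    def validate(self, observations):
        """Return the problems found when checking a description against the model."""
        problems = []
        for name, value in observations.items():
            character = self.characters.get(name)
            if character is None:
                problems.append(f"unknown character: {name}")
            elif value not in character.states:
                problems.append(f"invalid state for {name}: {value}")
        return problems


@dataclass
class Case:
    """Instantiated knowledge: one observed instance, pre-classified by the expert."""
    observations: dict
    label: str


@dataclass
class Rule:
    """Derived knowledge: a hypothesis produced from the model and the cases."""
    conditions: dict
    conclusion: str

    def matches(self, observations):
        return all(observations.get(k) == v for k, v in self.conditions.items())
```

In this reading, the descriptive model constrains what a case may contain, and a rule is only meaningful relative to the characters that the model defines.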

Knowledge Discovery methodology views knowledge as the output of a linear process of input data handling [8]. In biological domains, we emphasize a different interpretation, in which knowledge comprises both input (domain and instantiated) and output (derived). This viewpoint is closer to Case-Based Reasoning methodology, i.e., the CBR cycle described in [1], with extensive use of domain knowledge in the processing phase.

In practice, knowledge is extracted with IKBS by a cyclical process, divided into three parts:

Knowledge acquisition:
Acquire a descriptive model (domain knowledge or observable facts),
Acquire descriptions (observed facts or cases),

Knowledge processing:
Generate classification rules with decision tree induction,
Identify new observations (unknown specimens) with case-based reasoning,

Knowledge validation and refinement:
Verify the origin of misidentifications by analyzing differences of interpretation between the expert and the users of the knowledge base,
Iterate on the definition of the descriptive model (characters), update old cases.
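The processing and validation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IKBS implementation: the tree is induced with a plain entropy-based (ID3-style) criterion, identification retrieves the stored case sharing the most character states, and all function names, characters and taxa are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def induce_tree(cases, characters):
    """Knowledge processing: build a decision tree from (observations, label) pairs."""
    labels = [label for _, label in cases]
    if len(set(labels)) == 1 or not characters:
        return Counter(labels).most_common(1)[0][0]

    def remainder(character):
        # Weighted entropy left after splitting on this character.
        groups = {}
        for obs, label in cases:
            groups.setdefault(obs.get(character), []).append(label)
        return sum(len(g) / len(cases) * entropy(g) for g in groups.values())

    best = min(characters, key=remainder)
    rest = [c for c in characters if c != best]
    branches = {}
    for obs, label in cases:
        branches.setdefault(obs.get(best), []).append((obs, label))
    return {"character": best,
            "branches": {state: induce_tree(subset, rest)
                         for state, subset in branches.items()}}


def classify(tree, obs):
    """Follow the tree; returns None when no branch fits the observation."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(obs.get(tree["character"]))
    return tree


def identify(case_base, obs):
    """Knowledge processing: retrieve the label of the most similar stored case."""
    def similarity(case):
        stored, _ = case
        return sum(1 for ch, state in obs.items() if stored.get(ch) == state)
    return max(case_base, key=similarity)[1]


def misidentified(case_base, checked):
    """Knowledge validation: observations whose identification differs from the expert's."""
    return [(obs, label) for obs, label in checked if identify(case_base, obs) != label]


# Hypothetical descriptions (knowledge acquisition), for illustration only.
cases = [
    ({"shape": "branching", "surface": "smooth"}, "taxon A"),
    ({"shape": "branching", "surface": "rough"}, "taxon A"),
    ({"shape": "massive", "surface": "smooth"}, "taxon B"),
    ({"shape": "massive", "surface": "rough"}, "taxon B"),
]
tree = induce_tree(cases, ["shape", "surface"])
```

Any observation returned by misidentified is a candidate for the refinement step: either the character definitions are ambiguous, or the case base needs updating.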

For experts in biology, this approach matches the natural process by which they acquire knowledge (conjecture and test) [16]:
Observe and familiarize oneself,
Represent observations, i.e. make descriptions,
Build hypotheses from pre-classified descriptions, i.e. generate identification keys,
Test them against new observations, i.e. identify new specimens,
Refine their initial knowledge (new characters, cases and classifications).

The last point of the method is fundamental because building a knowledge base in the natural sciences is very difficult. This was our experience in applications such as the diagnosis of plant pathologies [12]. It is hard for experts to define the best representation of reality in a descriptive model at the first attempt. The challenge is to acquire character definitions and illustrations that lead anyone consulting the knowledge base to interpret observations in the same way.