2. Methodology
In computer science, knowledge is a controversial term [9]. We
thus offer a working definition for our purposes in biology, consisting of three kinds
of knowledge: domain, instantiated and derived.
Domain knowledge (or background knowledge) defines what is observable: it is a
descriptive model that corresponds to the modeling of data, or
metadata [6]. Instantiated knowledge refers to the description of observed instances
(case descriptions). Derived knowledge comprises the hypotheses
(cluster definitions, decision trees, rules, identifications) discovered from domain and
instantiated knowledge. Knowledge is, of course, also grounded in the expert's mind, and
what is "extracted" is only a small part of his or her experience.
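As an illustration only, the three kinds of knowledge can be sketched as simple data structures. The class names (DescriptiveModel, Case, Rule) and the example characters are hypothetical and are not taken from IKBS; the sketch assumes that characters take discrete states.

```python
from dataclasses import dataclass


@dataclass
class DescriptiveModel:
    """Domain knowledge: the observable characters and their allowed states."""
    characters: dict  # character name -> set of allowed states (assumed discrete)

    def violations(self, description):
        """Return the (character, state) pairs that the model does not allow."""
        return [
            (ch, st)
            for ch, st in description.items()
            if ch not in self.characters or st not in self.characters[ch]
        ]


@dataclass
class Case:
    """Instantiated knowledge: one observed, pre-classified description."""
    description: dict  # character name -> observed state
    label: str         # class assigned by the expert


@dataclass
class Rule:
    """Derived knowledge: a hypothesis produced from the model and the cases."""
    conditions: dict   # character name -> required state
    conclusion: str

    def matches(self, description):
        """True when every condition of the rule holds in the description."""
        return all(description.get(ch) == st for ch, st in self.conditions.items())


# Hypothetical example values, for illustration only.
model = DescriptiveModel({"shape": {"branching", "massive"}})
case = Case({"shape": "massive"}, "species_b")
rule = Rule({"shape": "massive"}, "species_b")
```

Keeping the descriptive model separate from the cases means that a change to a character definition can be checked against every stored case with violations().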
Knowledge Discovery methodology views knowledge as an output of a linear
process of input data handling [8]. In biological domains, our emphasis is placed on a
different interpretation of knowledge which consists of both input (domain and
instantiated) and output (derived). This viewpoint is closer to Case-Based
Reasoning (CBR) methodology, i.e. the CBR cycle described in [1], with extensive use of
domain knowledge in the processing phase.
In practice, knowledge is extracted with IKBS by a cyclical process, divided into
three parts:
- Knowledge acquisition:
  - Acquire a descriptive model (domain knowledge or observable facts),
  - Acquire descriptions (observed facts or cases),
- Knowledge processing:
  - Generate classification rules with decision tree induction,
  - Identify new observations (unknown specimens) with case-based reasoning,
- Knowledge validation and refinement:
  - Verify the origin of misidentifications by analyzing differences of interpretation between the expert and the users of the knowledge base,
  - Iterate on the definition of the descriptive model (characters), update old cases.
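The processing step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the IKBS implementation: the tree is induced with a generic entropy-based (ID3-style) criterion, the case-based identification uses a simple overlap similarity, and the characters, states and class names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical pre-classified cases: (description, class label).
CASES = [
    ({"shape": "branching", "color": "brown"}, "species_a"),
    ({"shape": "branching", "color": "green"}, "species_a"),
    ({"shape": "massive", "color": "brown"}, "species_b"),
    ({"shape": "massive", "color": "green"}, "species_c"),
]


def entropy(cases):
    """Shannon entropy (in bits) of the class labels of a set of cases."""
    counts = Counter(label for _, label in cases)
    total = len(cases)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def split(cases, character):
    """Group cases by the state they take for the given character."""
    groups = {}
    for desc, label in cases:
        groups.setdefault(desc[character], []).append((desc, label))
    return groups


def induce_tree(cases, characters):
    """Return a class label (leaf) or a (character, {state: subtree}) node."""
    labels = Counter(label for _, label in cases)
    if len(labels) == 1 or not characters:
        return labels.most_common(1)[0][0]

    def remainder(character):
        # Weighted entropy left after splitting on this character.
        groups = split(cases, character).values()
        return sum(len(g) / len(cases) * entropy(g) for g in groups)

    best = min(characters, key=remainder)
    rest = [c for c in characters if c != best]
    return (best, {s: induce_tree(g, rest) for s, g in split(cases, best).items()})


def identify_by_case(observation, cases):
    """Return the label of the stored case most similar to the observation."""
    def similarity(desc):
        # Compare only the characters that are known in both descriptions.
        shared = [c for c in observation if c in desc]
        if not shared:
            return 0.0
        return sum(observation[c] == desc[c] for c in shared) / len(shared)

    return max(cases, key=lambda case: similarity(case[0]))[1]


tree = induce_tree(CASES, ["shape", "color"])
found = identify_by_case({"shape": "massive", "color": "green"}, CASES)
```

The induced tree plays the role of an identification key, while identify_by_case works directly from the stored descriptions and can therefore handle an observation in which some characters are unknown. A misidentification by either route is a signal to revisit the descriptive model or the stored cases.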
For experts in biology, this approach is well suited to the natural process of their knowledge acquisition (conjecture and test) [16]:
- Observe and familiarize oneself,
- Represent observations, i.e. make descriptions,
- Build hypotheses from pre-classified descriptions, i.e. generate identification keys,
- Test these hypotheses on new observations, i.e. identify new specimens,
- Refine the initial knowledge (new characters, cases and classifications).
The last point of the method is fundamental because building a knowledge
base in the natural sciences is very difficult. This was our experience in applications such
as the diagnosis of plant pathologies [12]. It is hard for experts to define the best
representation of reality in a descriptive model at the first attempt. The challenge is to acquire the
best character definitions and illustrations, so that observations are interpreted in the
same way by anyone consulting the knowledge base.