most probable one. However, this does not remove the problem presented above. This problem
is not caused by a flaw of the particular induction algorithm used by KATE since we could have
used another algorithm and encountered a similar problem. It is not a flaw of the decision tree
representation formalism since we could have used production rules generated automatically or
manually and still run into this same problem. It is caused by the fact that we are reasoning
using an abstraction of the training cases and have generalized away and thus lost some
discriminant information. If the consultation system must handle any configuration of unknown
values, as in applications that deal with photo-interpretation of objects whose features
may be hidden in any combination, case-based reasoning will always perform better than
rule-based, decision tree-based or even neural network-based identification systems.
This has been confirmed by a set of experiments conducted using PATDEX. We have
measured its ability to reach a correct solution when the working case is incomplete (i.e.
contains unknown values). Experiments have been conducted with a training set of one
hundred cases. The test set also consists of one hundred cases. For every test case the number
of known symptom values has been stepwise reduced. Classification accuracy is measured
against the reduction of the presented information. The results are shown in table 1. Here, a
reduction of information of 70% means that every case is classified on the basis of only 30% of
its known symptom values (at this level, 60% of the cases were still classified correctly).
Table 2 - Measuring Correctness against Reduction of Information
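The evaluation protocol described before the table can be sketched as follows. This is a minimal illustration, not the code used in the PATDEX experiments: the names `reduce_case` and `accuracy_by_reduction`, the use of `None` as the unknown marker, and the random choice of which values to hide are all assumptions made for the example.

```python
import random

UNKNOWN = None  # assumed marker for a symptom value that has been withheld


def reduce_case(case, reduction, rng):
    """Return a copy of `case` with a fraction `reduction` (0..1) of its known values hidden."""
    known = [attr for attr, value in case.items() if value is not UNKNOWN]
    n_hide = round(len(known) * reduction)
    hidden = set(rng.sample(known, n_hide))  # assumption: hidden values are picked at random
    return {attr: (UNKNOWN if attr in hidden else value) for attr, value in case.items()}


def accuracy_by_reduction(classify, test_set, reductions, seed=0):
    """Map each reduction level to the fraction of test cases classified correctly.

    `classify` is any function that takes a (partially unknown) case and returns a label.
    `test_set` is a list of (case, true_label) pairs.
    """
    rng = random.Random(seed)
    results = {}
    for reduction in reductions:
        correct = sum(
            classify(reduce_case(case, reduction, rng)) == label
            for case, label in test_set
        )
        results[reduction] = correct / len(test_set)
    return results
```

Calling `accuracy_by_reduction(classifier, test_set, [0.0, 0.1, ..., 0.9])` with a case-based and an inductive classifier in turn would give two accuracy curves over the same levels of missing information, which is the kind of comparison the table reports.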
As confirmed by this set of experiments, up to a certain limit, classification accuracy is not
significantly decreased by reducing the number of known attribute values in the current case.
For instance, when half of the values are missing, the system still correctly identifies 90% of the
test cases. With induction, by contrast, a single missing value for an attribute tested in the decision tree
(this corresponds to a 0.5% reduction in the information available) yields a loss of 50% in
accuracy. When a feature is unknown, a case-based reasoning tool looks for alternative features
to identify the current case. CBR reacts dynamically and exploits all the information available. In
addition, a CBR system is more resilient to errors made by the user during consultation since it
computes a similarity measure from the global description of the cases rather than from a minimal
subset of it, as the inductive approach does. It can confirm its conclusions by asking additional questions
that modify the similarity measure accordingly.
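The contrast drawn above can be made concrete with a toy example. The sketch below is an illustration under stated assumptions and does not reproduce the similarity measure of PATDEX or the tree consultation of KATE. The overlap-based `similarity`, the helper `retrieve`, the simplified `tree_classify` (which simply stops at an unknown attribute instead of applying a fallback strategy), and the two-case fault domain are all hypothetical.

```python
UNKNOWN = None  # assumed marker for an unknown feature value


def similarity(query, stored):
    """Fraction of the query's known attributes whose values match the stored case."""
    known = [attr for attr, value in query.items() if value is not UNKNOWN]
    if not known:
        return 0.0
    matches = sum(1 for attr in known if stored.get(attr) == query[attr])
    return matches / len(known)


def retrieve(query, case_base):
    """Return the label of the most similar stored case (the first one wins on ties)."""
    _, best_label = max(case_base, key=lambda entry: similarity(query, entry[0]))
    return best_label


def tree_classify(tree, query):
    """Follow a decision tree; return None as soon as a tested attribute is unknown.

    A tree is either a label (str) or a pair (attribute, {value: subtree}).
    """
    while not isinstance(tree, str):
        attribute, branches = tree
        value = query.get(attribute, UNKNOWN)
        if value is UNKNOWN or value not in branches:
            return None  # the single discriminating test cannot be answered
        tree = branches[value]
    return tree


# Hypothetical two-case domain: the induced tree tests only "noise".
case_base = [
    ({"noise": "yes", "leak": "no", "temp": "high"}, "pump"),
    ({"noise": "no", "leak": "yes", "temp": "low"}, "valve"),
]
tree = ("noise", {"yes": "pump", "no": "valve"})
query = {"noise": UNKNOWN, "leak": "yes", "temp": "low"}

print(tree_classify(tree, query))  # None: the tree depends on the missing attribute
print(retrieve(query, case_base))  # valve: the remaining features still identify the case
```

The tree has generalized the two cases down to a single test, so one unknown value leaves it without an answer, whereas the case base keeps the features "leak" and "temp" that the tree discarded.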
This does not imply that CBR always performs better than induction. During the first year of
INRECA, we defined a catalog of industrial criteria to conduct experiments and compare
the two technologies. Our criteria catalog addresses not only technical issues such as
performance and effectiveness, but also ergonomic and economic aspects such as user
acceptance of the technology (domain specialist, naive end-user, data clerk, case engineer etc.),
ease of building, validating and maintaining the application, and so on. After analysis, we claim that
induction and CBR are complementary techniques and that integrating them will improve on their
standalone capabilities. Our comparison is summarized in the next section. The criteria have