

Induction and Reasoning from Cases

most probable one. However, this does not remove the problem presented above. The problem is not caused by a flaw in the particular induction algorithm used by KATE, since we could have used another algorithm and encountered a similar problem. Nor is it a flaw of the decision tree representation formalism, since we could have used production rules, generated automatically or manually, and still run into the same problem. It is caused by the fact that we are reasoning with an abstraction of the training cases: generalization has discarded some discriminant information. If the consultation system must handle any configuration of unknown values, as in photo-interpretation applications where any combination of an object's features may be hidden, case-based reasoning will always perform better than rule-based, decision tree-based or even neural network-based identification systems.

This has been confirmed by a set of experiments conducted with PATDEX, in which we measured its ability to reach a correct solution when the working case is incomplete (i.e. contains unknown values). The experiments used a training set of one hundred cases and a test set of one hundred cases. For every test case, the number of known symptom values was reduced stepwise, and classification accuracy was measured against the reduction of the presented information. The results are shown in Table 2. Here, a 70% reduction in information means that every case is classified based on 30% of its known symptom values (at this level, 60% of the cases were correctly classified).

IMAGE imgs/Annexe604.gif

Table 2 - Measuring Correctness against Reduction of Information
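The stepwise reduction protocol described above can be sketched as follows. This is a minimal illustration, not the setup actually used with PATDEX: the case representation (a dictionary of symptom values), the random choice of which values to hide, and the overlap-counting nearest-neighbor classifier `make_overlap_classifier` are assumptions introduced here, and the two-case data set is hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Case = Dict[str, str]       # symptom name -> observed value
Labeled = Tuple[Case, str]  # (symptoms, correct diagnosis)


def reduce_case(case: Case, reduction: float, rng: random.Random) -> Case:
    """Hide a fraction `reduction` of the known symptom values (they become unknown)."""
    names = sorted(case)
    n_hidden = round(reduction * len(names))
    hidden = set(rng.sample(names, n_hidden))
    return {k: v for k, v in case.items() if k not in hidden}


def accuracy_under_reduction(test_set: List[Labeled],
                             classify: Callable[[Case], Optional[str]],
                             reduction: float,
                             seed: int = 0) -> float:
    """Fraction of test cases still classified correctly after hiding values."""
    rng = random.Random(seed)
    correct = sum(1 for case, label in test_set
                  if classify(reduce_case(case, reduction, rng)) == label)
    return correct / len(test_set)


def make_overlap_classifier(training: List[Labeled]) -> Callable[[Case], Optional[str]]:
    """Hypothetical 1-nearest-neighbor classifier: count agreeing known values."""
    def classify(query: Case) -> Optional[str]:
        best_label, best_score = None, -1
        for case, label in training:
            score = sum(1 for k, v in query.items() if case.get(k) == v)
            if score > best_score:  # ties keep the earlier stored case
                best_label, best_score = label, score
        return best_label
    return classify


# Hypothetical two-case base, reused as the test set for brevity.
training: List[Labeled] = [
    ({"a": "1", "b": "x", "c": "p"}, "D1"),
    ({"a": "2", "b": "y", "c": "q"}, "D2"),
]
clf = make_overlap_classifier(training)
for r in (0.0, 0.3, 0.5, 0.7, 1.0):
    print(f"reduction {r:.0%}: accuracy {accuracy_under_reduction(training, clf, r):.2f}")
```

Because the classifier only compares the values that remain known, accuracy in this toy setting stays at its maximum until nearly all values are hidden; with real data, overlapping cases would make the curve degrade gradually instead.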

As this set of experiments confirms, up to a certain limit, classification accuracy does not decrease significantly when the number of known attribute values in the current case is reduced. For instance, when half of the values are missing, the system still correctly identifies 90% of the test cases. With induction, by contrast, a single missing value for an attribute tested in the decision tree (corresponding to a 0.5% reduction in the available information) yields a 50% loss in accuracy. When a feature is unknown, a case-based reasoning tool looks for alternative features to identify the current case: CBR reacts dynamically and exploits all the available information. In addition, a CBR system is more resilient to errors made by the user during consultation, since it computes a similarity measure from the global description of the cases rather than from a minimal subset of features, as the inductive approach does. It can also confirm its conclusions by asking additional questions that update the similarity measure accordingly.
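To illustrate the contrast, the sketch below classifies a case whose first tested attribute is unknown. The decision tree has no branch to follow and returns no answer, whereas a nearest-neighbor retrieval over the remaining known features still finds the matching case. The weighted-overlap `similarity` function, the attribute names and the two stored cases are hypothetical; they do not reproduce the similarity measure actually implemented in PATDEX.

```python
from typing import Dict, List, Optional, Tuple, Union

Case = Dict[str, str]  # attribute name -> value; missing keys are unknown values
# A decision-tree node is either a leaf diagnosis (str) or
# a test (attribute, {value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]


def tree_classify(tree: Tree, case: Case) -> Optional[str]:
    """Follow the tree; return None as soon as a tested attribute is unknown."""
    while not isinstance(tree, str):
        attribute, branches = tree
        value = case.get(attribute)
        if value is None or value not in branches:
            return None  # blocked path: the tree has no alternative test here
        tree = branches[value]
    return tree


def similarity(query: Case, stored: Case, weights: Dict[str, float]) -> float:
    """Weighted share of the query's known attributes that agree with a stored case."""
    known = [k for k in query if k in stored]
    total = sum(weights.get(k, 1.0) for k in known)
    if total == 0:
        return 0.0
    agree = sum(weights.get(k, 1.0) for k in known if query[k] == stored[k])
    return agree / total


def cbr_classify(case_base: List[Tuple[Case, str]], query: Case,
                 weights: Dict[str, float]) -> Optional[str]:
    """Return the diagnosis of the most similar stored case (None if the base is empty)."""
    best = max(case_base, key=lambda c: similarity(query, c[0], weights), default=None)
    return None if best is None else best[1]


# Hypothetical diagnostic cases and a tree induced to test "temp" first.
cases = [
    ({"temp": "high", "pressure": "low", "noise": "yes"}, "pump failure"),
    ({"temp": "normal", "pressure": "high", "noise": "no"}, "valve blocked"),
]
tree: Tree = ("temp", {"high": "pump failure", "normal": "valve blocked"})

query = {"pressure": "low", "noise": "yes"}    # "temp" is unknown
print(tree_classify(tree, query))              # -> None
print(cbr_classify(cases, query, weights={}))  # -> pump failure
```

Normalizing by the query's known attributes is one design choice among several; it keeps an incomplete query from being penalized simply because fewer of its features are available to agree.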

This does not imply that CBR always performs better than induction. During the first year of INRECA, we defined a catalog of industrial criteria to conduct experiments and compare the two technologies. Our criteria catalog addresses not merely technical issues such as performance and effectiveness, but also ergonomic and economic aspects such as user acceptance of the technology (by domain specialists, naive end-users, data clerks, case engineers, etc.), ease of building, validating and maintaining the application, and so on. After this analysis, we claim that induction and CBR are complementary techniques and that integrating them will improve on their standalone capabilities. Our comparison is summarized in the next section. The criteria have