
Proceedings Paper

Learning to recognize reusable software by induction
Author(s): Juan Carlos Esteva; Robert G. Reynolds

Paper Abstract

The goal of the Partial Metrics Project is the automatic acquisition of planning knowledge from target code modules in a program library. In the current prototype the system is given a target code module written in Ada as input, and the result is a sequence of generalized transformations that can be used to design a class of related modules. This is accomplished by embedding techniques from Artificial Intelligence into the traditional structure of a compiler. The compiler performs compilation in reverse, starting with detailed code and producing an abstract description of it. The principal task facing the compiler is to find a decomposition of the target code into a collection of syntactic components that are nearly decomposable. Here, nearly decomposable corresponds to the need for each code segment to be nearly independent syntactically from the others. The most independent segments are then the target of the code generalization process. This process can be described as a form of chunking and is implemented here in terms of explanation-based learning. Chunking has been shown to be an important vehicle for learning in other application domains as well. The problem of producing nearly decomposable code components becomes difficult when the target code module is not well structured. The task facing users of the system is to identify, from a library of modules, well-structured code modules that are suitable for input to the system. In this paper we describe the use of inductive learning techniques, namely variations on Quinlan’s ID3 system, that are capable of producing a decision tree that can be used to conceptually distinguish between well and poorly structured code. In order to accomplish that task, a set of high-level concepts used by software engineers to characterize structurally understandable code was identified.
Next, each of these concepts was operationalized in terms of code complexity metrics that can be easily calculated during the compilation process. These metrics are related to various aspects of the program structure including its coupling, cohesion, data structure, control structure, and documentation. Each candidate module was then described in terms of a collection of such metrics. Using a training set of positive and negative examples of well-structured modules, each described in terms of the appointed metrics, a decision tree was produced that was used to recognize other well-structured modules in terms of their metric properties. This approach was applied to modules from existing software libraries in a variety of domains such as vision and numerical methods. The results achieved by the system were then benchmarked against the performance of experienced programmers in terms of recognizing well-structured code. In a test case involving 82 modules, the system was able to discriminate between poor and well-structured code 99% of the time as compared to an 80% average for the 25 programmers sampled. The results suggest that such an inductive system can serve as a practical mechanism for effectively identifying reusable code modules in terms of their structural properties.
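
The inductive step described in the abstract can be illustrated with a minimal sketch of the basic ID3 procedure. The paper uses variations on ID3 whose details are not given here, so this is plain information-gain ID3 and not the authors' system. The metric names (`coupling`, `cohesion`, `documentation`), their discretized values, and the six training modules are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting on one metric."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, features):
    """Build a decision tree as nested dicts; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    tree = {"feature": best, "branches": {}, "default": majority}
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree["branches"][value] = id3([rows[i] for i in idx],
                                      [labels[i] for i in idx],
                                      remaining)
    return tree

def classify(tree, row):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree

# Hypothetical training set: each module is described by discretized metrics.
rows = [
    {"coupling": "low",  "cohesion": "high", "documentation": "rich"},
    {"coupling": "low",  "cohesion": "high", "documentation": "sparse"},
    {"coupling": "low",  "cohesion": "low",  "documentation": "rich"},
    {"coupling": "high", "cohesion": "high", "documentation": "rich"},
    {"coupling": "high", "cohesion": "low",  "documentation": "sparse"},
    {"coupling": "high", "cohesion": "low",  "documentation": "rich"},
]
labels = ["well", "well", "poor", "poor", "poor", "poor"]

tree = id3(rows, labels, ["coupling", "cohesion", "documentation"])
new_module = {"coupling": "low", "cohesion": "high", "documentation": "sparse"}
print(classify(tree, new_module))
```

In this toy data a module is labeled well structured only when coupling is low and cohesion is high, so the induced tree never needs to test the documentation metric. Real metrics are numeric and would need to be discretized or handled with threshold splits before a procedure like this could be applied.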

Paper Details

Date Published: 1 January 1990
PDF: 17 pages
Proc. SPIE 1293, Applications of Artificial Intelligence VIII, (1 January 1990); doi: 10.1117/12.21115
Juan Carlos Esteva, Eastern Michigan Univ. (United States)
Robert G. Reynolds, Wayne State Univ. (United States)

Published in SPIE Proceedings Vol. 1293:
Applications of Artificial Intelligence VIII
Mohan M. Trivedi, Editor(s)
