Contents


List of Figures


List of Tables

Expert Prediction, Symbolic Learning, and Neural Networks:
An Experiment on Greyhound Racing

H. Chen1, P. Buntin, L. She, S. Sutjahjo, C. Sommer, D. Neely

Abstract:

This research examined in detail the characteristics and feasibility of various prediction techniques, including expert prediction, symbolic learning, and neural networks for complex and uncertain game playing scenarios. A decision-tree building algorithm called ID3 and a Backpropagation network were implemented and their performances were compared with those of human experts. ID3 was chosen because of its simplicity and understandable results. The Backpropagation network was chosen for its predictive power and noise-tolerant capability.

We examined a real-life problem domain, greyhound racing prediction, which involved numerous dog racing variables (fastest time, win-lose ratio, etc.) that were complex and sometimes noisy, but had complete histories that may serve as good performance predictors. Our experiment examined 200 training races and 100 testing races. Actual payoffs for each race allowed us to compare the performance levels of the different techniques. Our research showed that the machine learning techniques out-performed three human track experts in making racing predictions. The Backpropagation network, in particular, achieved the best performance. However, both algorithms were good at predicting long shots, which typically resulted in high payoffs.

Introduction

It is inherent in the nature of problem solving or decision making that it involves some degree of uncertainty that decision makers or problem solvers are always striving to reduce in order to outperform their counterparts. Techniques for lessening degree of uncertainty are constantly being developed and tested. One way to reduce problem uncertainty is by seeking the advice of an expert or becoming an expert. An expert is usually considered to be a person who possesses special knowledge or skill in a specific field. In new techniques that involve the utilization of computers to reduce uncertainty, the computer becomes an ``expert'' in a specific field through a variety of methods.

Machine learning is one such technique. The method involves utilizing a computer algorithm to capture hidden knowledge from data. Machine learning usually encompasses different types of solutions, such as decision trees, production rules, and neural networks [27]. ID3 [18] is an example of a decision-tree building algorithm. Backpropagation networks [21], a popular neural network learning algorithm, exhibits a different knowledge structure and learning rules. However, its predictive power and noise-tolerant property have made it one of the most popular methods for machine learning or knowledge discovery [11] [8].

Using a real-life greyhound racing application as an example, we compared the prediction performances of three track experts versus those of ID3 and a Backpropagation network. Greyhound racing prediction is a task which involves a complex problem space and high uncertainty and therefore allowed us to examine in detail the characteristics and performance of various expert-based and machine learning techniques.

This paper is structured as follows. Section 2 discusses expert prediction and machine learning approaches to game playing. Section 3 presents the ID3 algorithm and describes how it was applied to the application. Section 4 discusses the Backpropagation network solution to the prediction task. Section 5 presents our greyhound application, including attribute selection and system implementation details. In Section 6, the prediction accuracy and the payoffs obtained by the various methods are compared. Section 7 presents conclusions and discusses some further research directions.

Expert Prediction, Symbolic Learning, and Neural Networks

In artificial intelligence, various techniques have been developed and tested in the past two decades for the purpose of reducing complexity and/or uncertainty. Among these techniques, systems based on human or expert heuristics and algorithms which exhibit learning capability have gained significant attention of researchers.

Expert Systems Approach

One method of problem solving is through the creation of expert systems or knowledge-based systems [9]. An expert system can be defined as ``a computer program that represents and reasons with knowledge of some specialist subject with a view of solving problems or giving advice'' [10]. The idea is to make an expert-like system by storing large amounts of domain-specific knowledge with ``condition-action'' pairs, coupled with an inference engine which simulates human reasoning. An expert system is essentially the embodiment of a human's expertise in a computer model. The development of expert systems is rooted in cognitive research and human information processing theory, both of which have contributed significantly to important work in artificial intelligence [15]. Some knowledge-based systems may exhibit ``weak'' problem solving methods for various application domains. Examples of such systems are Newell's GPS (General Problem Solver) [15] and Salzberg's Handicapper [24]. These systems adopted human problem solving strategies (e.g., means-end analysis in GPS) or general information processing heuristics (e.g., Occam's Razor heuristic in Handicapper).

Because expert systems aim to solve real-world problems of significant complexity and uncertainty, most of them represent rule-based solutions to real-world problems. Applications of expert systems have ranged from analysis of chemical compounds to diagnosis of diseases and configuration of computer systems.

Despite their usefulness, expert systems or knowledge-based systems are considered performance systems [25] - they only perform what they were programmed to do, i.e., they are without learning ability. Significant effort often is required to acquire knowledge from domain experts and to maintain and update a knowledge base.

Machine Learning Approach

A newer paradigm, generally considered to be the learning approach, has attracted attention of researchers in artificial intelligence, computer science, and other functional disciplines such as engineering, medicine, and business [13] [2] [11] [27] [3] [4]. In contrast to performance systems which acquire knowledge from human experts, learning systems acquire knowledge automatically from data. The most frequently used techniques include symbolic, inductive learning algorithms such as ID3 [16] and multiple-layered, feed-forward neural networks such as Backpropagation networks [21]. We review these two techniques briefly below.

Over the past years there have been several studies which compared the performance of these techniques for different ``toy'' and real-life applications as well as some systems which used hybrid representations and learning techniques. In [14], Mooney et al. found that ID3 was faster than a Backpropagation net, but the Backpropagation net was more adaptive to noisy data sets. The performances of these two techniques were comparable, however. Weiss and Kapouleas [26] [27] suggested using a re-sampling technique such as leave-one-out for evaluation. Discriminant analysis methods, Backpropagation net, and decision tree-based inductive learning methods (ID3-like) were found to achieve comparable performance for several data sets. Fisher and McKusick [6] found that, using batch learning, Backpropagation performed as well as ID3 but was more noise-resistent. They also compared the effect of incremental learning versus batch learning.

Most of the applications that have been tested were in engineering or biomedical domains. These problem domains are complex and interesting, but the data sets used for testing were relatively clean and structured. In this research, we investigated a different problem solving scenario called game playing, which is unstructured, complex, and not well studied previously. We hoped that by studying such an application we would be able to gain insights about the relative strengths and weaknesses of the different AI techniques.

Game Playing

The scenario in most game playing situations involves a single winner and several losers. In games, participants compete with each other, the best performer at the game becomes the winner and everyone else is a loser. Performance, however, is based on a relative scale, i.e., relative to other participants. A weak performer may still win a game if other participants are even weaker.

It is assumed that participants' behaviors or capabilities are relatively stable in the short run, but may improve or degrade in the long run. Furthermore, a game is assumed to be conducted fairly and is free from other external human factors (i.e., cheating). Animal racing games such as horse racing and greyhound racing exhibit these characteristics. Based on these assumptions, it is tempting to ask, given enough historical information, whether human experts or machine learning algorithms will predict more accurately?

In [24], Salzberg first tested an algorithmic solution to the horse racing problem. He used general human heuristics to formulate hypotheses for Handicapper, a horse racing prediction program. Based on nine cognitive psychology-based heuristics, Handicapper predicted a horse to win in any given race. The heuristics used were based on what Salzberg called ``usefulness.'' For example, the ``unusualness'' heuristic focused on a rare or unusual feature of the situation preceding an unexpected event. ``Occam's Razor'' preferred simple hypotheses over complex ones. In several trials, Handicapper was found to perform better than human experts at picking winners. The heuristics designed for Handicapper and the rationalization process used provide a computer system with an excellent way to select a good hypothesis from among a very large set of candidate hypotheses.

In searching for an interesting domain, we consulted several real-life game-playing scenarios (e.g., horse racing, basketball games, etc.) and decided on greyhound racing for the following reasons. Greyhound racing is a complex problem domain which involves about 50 performance variables for eight competing dogs in a race. The large search space for problem solving and prediction poses a special challenge for human experts and machine learning algorithms. This problem domain consists of a large amount of historical information, some accurate and relevant and some noisy and irrelevant, which needs to be filtered, selected, and analyzed in order to assist in making a prediction. For every race, each dog's past history is complete and freely available to bettors. The question then becomes: can machine learning techniques be used to reduce the uncertainty in a complex game playing scenario? Can these methods outperform human experts in prediction? This research sought to answer these questions.

Greyhound Racing Prediction Using ID3 and Its Variant

Tucson Greyhound Park (TGP), located in Tucson, Arizona, is a recreational arena centered around greyhound dog racing. It was built in 1944 and has contracts with 20 kennels that together house between 1,000 and 1,200 dogs. Currently it holds evening races and matinee races on some afternoons. There are 112 races in an average week. TGP enlists the prediction services of three dog racing experts, who are not affiliated with TGP. To our knowledge, their predictions are solely their own thinking and TGP does not compensate them for publishing their advice in the daily programs. These experts base their predictions on the same information available to any bettor in the daily program.

In this section, we describe in detail the ID3 algorithm and our modifications to it for the greyhound application. ID3 is a decision-tree building algorithm developed by Quinlan [16]. It adopts a divide-and-conquer strategy for classification. Its goal is to classify mixed objects into their associated classes, based on the objects' attribute values.

In the greyhound racing application, a dog can be classified in one of two classes, as a winner or a loser. Each greyhound can be described by a set of attributes such as the number of races it has won, its fastest time, etc. The entropy value of an attribute describes how well that attribute can be used to classify a greyhound (as a winner or a loser) - the lower the entropy value, the less uncertainty, the more useful the attribute. Entropy is calculated by using the following function [16]:


\begin{eqnarray*}{\it entropy} & = & - \sum_{j=win}^{lose} p_{j} \log p_{j}
\end{eqnarray*}


where pj is the probability of a dog being classified as winner or loser. ID3 by design is able to deal with both categorical attributes and continuous values. For categorical attributes, attribute values can be easily enumerated. For continuous values, ID3 performs a sweeping analysis of entropy reduction for all possible partition points (between any two consecutive values) and selects the partition point which provides the most entropy reduction [18]. In essence, the algorithm performs a binary partition. In our system implementation, we developed an ID3 program by adopting this original design. In addition to this version, we also developed a variant of the ID3 algorithm which performed ternary partition for continuous variables.

The ternary partition based variant of the ID3 algorithm first sorted the continuous values in ascending order, e.g., (0.12 winner) (0.15 winner) (0.35 loser) (0.36 loser) (0.45 winner) (0.55 loser) (0.66 winner) (0.70 loser) (0.80 loser) (0.88 loser). It then marked the ``clean'' classes on the two ends of the sorted list as unique classes, e.g., (0.12 winner) (0.15 winner) are in the ``winner'' class and (0.70 loser) (0.80 loser) (0.88 loser) are in the ``loser'' class. Other values in the middle of the list were mixed and remained to be classified by using other attributes.

In our experiment, we found that the ternary partition variant tended to reduce more entropy than the original binary partition algorithm, the reason being that the two ends in the three partitions were clean classes and thus did not have any entropy value. This often resulted in a lower overall entropy value after partition than that produced by binary partition, which often generated two large mixed classes. The resulting decision tree was also less complex than that obtained through binary partition - ternary partition produced a simple ternary tree where each level of the tree contained exactly three branches (see Figure 3). Binary partition branched out two ways at each level, which resulted in a large and more complex binary tree. However, both algorithms were equally efficient, especially in comparison with the neural networks learning algorithms. In our implementation (in ANSI C and running on a DECstation 5000/120), it took only several seconds to build a decision tree from 1600 training cases. ID3's result was easy to understand and it was also simple to trace ID3's classification decisions.

A Connectionist Solution: Backpropagation Networks

Neural network systems learn to discriminate among different input patterns in a holistic manner. They are typically presented with training sets of representative instances of some classes, correctly classified, and they learn to recognize and classify other new instances of these classes. Among the computational models of neural networks proposed, the Backpropagation network has been shown to be theoretically sound [21] [20] and it has demonstrated excellent capability for various complex classification and prediction problems [11].


  
Figure 1: A Backpropagation network
\begin{figure}\rule{6in}{.01in}
\par\vspace{3in}
\par\rule{6in}{.01in}
\end{figure}

A Backpropagation network, as shown in Figure 1, is a multiple-layered, feed-forward neural network. Activation of the network flows in one direction only: from the input layer through the hidden layer, then on to the output layer. Each unit in a layer is connected in the forward direction to every unit in the next layer. A Backpropagation network may contain multiple hidden layers. Knowledge of the network is encoded in the (synaptic) weights between units. The activation levels of the units in the output layer determine the output of the whole network. Despite the promising results demonstrated in various applications, Backpropagation network development still requires extensive experimentation, parameters selection, and human judgment.

Testing the Techniques: A Greyhound Racing Example


  
Figure 2: Excerpt from TGP daily program
\begin{figure}\rule{6in}{.01in}
\par\vspace{4in}
\par\rule{6in}{.01in}
\end{figure}

In this section we review the procedure followed to create the greyhound racing databases and identify useful attributes. As previously noted, the Tucson Greyhound Park makes detailed programs available to its patrons (see Figure 2). Each program contains approximately 15 races, the races being of a variety of grades from ``A'' to ``D,'' with A being the most competitive. A few specially designated races with grades such as ``M'' (maiden race) are also included but were ignored in our experiment. Each race program includes information about eight dogs, including each dog's fastest time. Any time any individual dog records a faster time, it is reflected in future programs. In addition to the fastest time, the dog's total races competed and its number of first, second, third and fourth place finishes are included. Directly below the dog's summary data, the program lists its performance for the last seven of its races. The dog's starting position, its position during the first turn (called the break position), the second and third turns and its finishing position comprise the data included in this seven-race history. In addition, its race time and the grade of the race are recorded. Besides the daily program, TGP also publishes the previous day's results. These contain information about how each dog fared and the payoff odds associated with the winning dogs for all the races held the previous day. Deciding the payoffs was simply a matter of matching programs and result sheets to see whether a particular greyhound won or lost.

In a typical race, the TGP program contains about 50 variables that might or might not affect the outcome of that race. Among these variables are the condition of the track, the weather, dog owner and trainer, and the physical attributes associated with each dog. The problem of devising an appropriate encoding for training data has been widely recognized as being central for all inductive learning efforts. In [23], Flann and Diettrich argued that part of the power of the learning systems derived from the choice of representation as well as from the use of a domain theory. They proposed a ``multiple representation strategy'' which bridges the gap between structural representation and performance representation. For greyhound racing, most of the variables shown in the program can be considered structural variables, from which a small set of performance-related variables needs to be identified (either algorithmically or by some domain experts). The use of ``domain model'' [1] and ``abstraction space'' (of the underlying domain) [5] has also been shown to result in substantial speedup (i.e., it provides guidance for the rule formation program in a space of rules that is too large to search exhaustively and in a domain where the trivial or meaningless associations far outnumber the important ones [1]) and produce better classification results.

As is typical of complex problem-solving scenarios, the first task was to reduce the problem complexity or to prune the problem space by deciding on a smaller set of relevant performance attributes. As in [24] (horse racing prediction), the selection of performance attributes in our research was mainly based on the opinions of frequent bettors, track experts, and the management of TGP. These human experts were able to succinctly identify performance variables from the TGP program which they thought were the most crucial in predicting winners. Their domain knowledge helped to reduce the problem space significantly. In total, the TGP experts suggested the following 10 performance variables:

1.
fastest time: fastest time in seconds for a 5/16 mile race

2.
win percentage: number of races won divided by total number of races

3.
place percentage: number of runner-up places divided by total number of races

4.
show percentage: times in third place divided by total number of races

5.
break average: position during the first turn (averaged over the seven most recent races)

6.
finish average: the average finishing position over the previous seven races

7.
time 7 average: the average finishing time of the seven most recent races (TGP daily program, as shown in Figure 2, provides information for the last seven races of each dog.)

8.
time 3 average: the average finishing time of the three most recent races (Many experts indicated that the performance of a greyhound over the last three races was also an important performance indicator.)

9.
grade average: the average grade of the seven most recent races the dog competed in

10.
up grade: weight given to a dog when dropping down to less competitive race grade(s)

As will be discussed in a later section, identification of relevant performance variables (i.e., pruning the problem space) is probably one of the most important and as well as one of the most time-consuming tasks in our experiment. The machine learning algorithms' inability to analyze the initial noisy and fussy problem domain prompted us to rely on human experts' knowledge and heuristics. This may prove to be one of the weaknesses of some of the prevailing machine learning techniques and makes it an area which deserves more research attention.

The initial stage of the project involved data collection and entry. Some of the variables or attributes we selected required manipulating the data in the programs in order for it to be useful. For instance, we averaged the values listed in a greyhound's seven previous races to obtain the average break position. Another attribute requiring the mean computation of a variable was the finish average, a greyhound's finishing position during the previous seven races. Another two attributes, the time 3 average and the time 7 average, were calculated by computing the mean of the three most recent race times and the seven previous race times, respectively. For the race grade average, we assigned values to each type of grade, with the most competitive race grade, A, receiving a value of four. B grade races received the value of three, C races were assigned a value of two and D races, the least competitive race category, received a value of one. All other types of races, such as maiden races or training races, received a value of zero. These values were then averaged to procure the race grade average attribute. Furthermore, previous race grades were used to assign values when a greyhound moved down a race grade category. This was necessary to account for residual effects left from racing in a more competitive race previously. The value assigned for this ``up grade" attribute depended upon how recently the drop in race grade(s) occurred. If the most recent race was of a higher grade, three points were assigned, two points were given if the more competitive race was two races ago, etc., until the value of zero was given for a greyhound that had not raced in a more competitive event in the last three races.

Greyhound racing is essentially a competitive game. That is, the greyhounds are competing against each other in a win-lose scenario. Since the dogs are competing directly against one another, we manipulated the data such that the values were relatively scaled. This means that after making all calculations, such as averaging a greyhound's finish times, we then assigned the lowest value contained in one single race to be zero. The other values were scaled to reflect their difference from that lowest value. For example, a race program entry contains eight pieces of data that represent each of the eight greyhounds' fastest time. The worst fastest time among these dogs, e.g., 32.00 seconds, was assigned a value of zero. If another dog in that race had a fastest time of 31.00 seconds, then its fastest time was assigned a value of 1.00, since that would be the difference between the worst fastest time of 32.00 seconds, and its own time of 31.00 seconds. Relative scaling of the data was necessary due to the different statistics associated with different race grades, i.e., greyhounds in grade A races tended to have faster times.

As suggested by Weiss and Kulikowski [27], the proportions of training and testing were 2/3 and 1/3, respectively. The training stage for both ID3 and the Backpropagation network consisted of running 200 races. Each race was run by eight greyhounds, for a total of 1600 classified greyhounds. Testing consisted of an additional 100 races, or 800 new cases. We did not use any re-sampling technique for testing due to the large testing data set and the computational requirements for evaluating the Backpropagation algorithm.

Evaluating the performance of different techniques was simply a matter of computing the monetary payoffs. We bet $2 for each dog that an algorithm or an expert predicted to be winner. In some races, the algorithms may have predicted more than one winner (so we bet on all predicted winners) or no winner at all (so no bet was placed). Using the payoff odds given in the result sheets, we were able to compute the final payoffs after betting on the 100 races. In this experiment, for performance comparison purposes we only considered the payoff for first-place winners; we did not consider other more elaborate betting systems such as: Place (your greyhound must finish either first or second), Show (your greyhound must finish either first, second, or third), Quiniela (two greyhounds you bet on must finish first and second), etc. It could be argued that the payoff odds are not necessarily correct since each bet on a particular dog will change the payoff odds for that dog. We contend that since there are numerous bets on any given dog, the odds would not change significantly with the additional bet on a particular dog.

Experimental Results: Experts, ID3, and Backpropagation Network

We developed the programs in ANSI C and ran them on a DECstation 5000/120 (a 25 MIPS machine, running ULTRIX). Table 1 summarizes the numbers of races correctly and incorrectly predicted by the experts, ID3, and the Backpropagation network, respectively, and their final payoffs for 100 races. In 26 races, ID3 decided not to bet on any dog.


 
Table 1: Summary of experimental results

Technique Correct Incorrect Did not bet Payoffs ($)
Expert 1 19 81 0 -71.40
Expert 2 17 83 0 -61.20
Expert 3 18 82 0 -70.20
ID3 34 50 26 69.20
Backprop. 20 80 0 124.80

 


A summary table for the one-way analysis of variance (using MINITAB [22])) of the payoffs of the different approaches is provided in Figure 4. The means for the five underlying populations (i.e., payoffs generated by three experts and the two algorithms) were different at the 10% significance level (N = 100, P = 0.083). Two sample t-tests revealed that the Backpropagation network out-performed the best expert (and the other experts) in monetary payoffs at the 10% significance level (P = 0.084). However, ID3 did not out-perform the experts at a statistically significant level (P = 0.15) and there was no statistically significant difference in payoffs between Backpropagation and ID3 (P = 0.68).


  
Figure 4: ANOVA analysis for payoffs
\begin{figure}\rule{6in}{.01in}
\par {\scriptsize\bf
\begin{tex2html_preform}\be...
... 1.5 3.0\end{verbatim}\end{tex2html_preform}}
\par\rule{6in}{.01in}
\end{figure}

Conclusion

In terms of prediction accuracy and monetary payoff, both the Backpropagation network and ID3 performed better than human experts. Both algorithms appeared to be more robust than humans in their ability to analyze the large set of racing data objectively and reach unbiased conclusions. This characteristic is particularly evident from both algorithms' convincing prediction of numerous long shots human experts failed to identify. ID3's decision tree output was more understandable than that of the Backpropagation. In general, ID3 also predicted more conservatively than other approaches. The Backpropagation network, on the other hand, was more computationally expensive (and thus very slow), but made excellent predictions of long shots.

In this experiment, especially during the early stage, experts' heuristics for refining the initial performance variables were critical to the success of the machine learning algorithms. The 50-plus candidate variables were reduced to 10 of the most relevant parameters, which effectively reduced the problem space for this complex and noisy domain. Some variables were also the results of computations, e.g., time 3 average, time 7 average, etc. The machine learning algorithms that we adopted were ``supervised'' learning in which the learner was told the category membership of environmental observations; the sole learning task was to summarize the commonality among members of the same categories and differences among competing ones. These algorithms were not adaptive to unsupervised, self exploration of relevant variables or useful categories in the data using internalized heuristics. Nevertheless, there are other machine learning algorithms which exhibit unsupervised learning capability [7]. This may prove to be one of the most potentially fruitful directions for machine learning or knowledge discovery research [8] [12].

In summary, this research has examined in detail the characteristics and feasibility of various prediction techniques, including expert prediction, symbolic learning (ID3), and a connectionist approach (Backpropagation) for complex and uncertain game playing scenarios. For large real-life databases of historical information, recent machine learning techniques appeared to exhibit unique capabilities for data analysis and knowledge discovery.

Acknowledgments

This project was supported mainly by NSF grant #IRI-9211418 (1992-1994) and University of Arizona Summer Research Grant (1992).

Bibliography

1
B. G. Buchanan and T. M. Mitchell.
Model-directed learning of production rules.
In Pattern-Directed Inference Systems, pages 297-312, Edited by D. A. Waterman and F. Hayes-Roth, Acdemic Press, New York, NY, 1978.

2
J. G. Carbonell, R. S. Michalski, and T. M. Mitchell.
An overview of machine learning.
In Machine Learning, An Artificial Intelligence Approach, Pages 3-23, Michalski, R. S., Carbonell, J. G., and Mitchell, T. M., Editors, Tioga Publishing Company, Palo Alto, CA, 1983.

3
H. Chen, P. Hsu, R. Orwig, L. Hoopes, and J. F. Nunamaker.
Automatic concept classification of text from electronic meetings.
Communications of the ACM, 37(10):56-73, October 1994.

4
H. Chen and K. J. Lynch.
Automatic construction of networks of concepts characterizing document databases.
IEEE Transactions on Systems, Man and Cybernetics, 22(5):885-902, September/October 1992.

5
G. Drastal, G. Czako, and S. Raatz.
Induction in an abstraction space: A form of constructive induction.
In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), pages 708-712, Detroit, MI, August 20-25, 1989.

6
D. H. Fisher and K. B. McKusick.
An empirical comparison of ID3 and back-propagation.
In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), pages 788-793, Detroit, MI, August 20-25, 1989.

7
D. H. Fisher, M. J. Pazzani, and P. Langley.
Concept Formation: Knowledge and Experience in Unsupervised Learning.
Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1991.

8
W. J. Frawley, G. Pietetsky-Shapiro, and C. J. Matheus.
Knowledge discovery in databases: an overview.
In Knowledge Discovery in Databases, pages 1-30, G. Piatetsky-Shapiro and W. J. Frawley, Editors, The MIT Press, Cambridge, MA, 1991.

9
F. Hayes-Roth, D. A. Waterman, and D. Lenat.
Building Expert Systems.
Addison-Wesley, Reading, MA, 1983.

10
P. Jackson.
Introduction to Expert Systems.
Addison-Wesley, Reading, MA, 1990.

11
R. P. Lippmann.
An introduction to computing with neural networks.
IEEE Acoustics Speech and Signal Processing Magazine, 4(2):4-22, April 1987.

12
K. J. Lynch and H. Chen.
Knowledge discovery from historical data: an algorithmic approach.
In Proceedings of the 25th Annual Hawaii International Conference on System Sciences (HICSS-25), Decision Support and Knowledge Based Systems Track, pages 70-79, Kaui, HI, January 7-10 1992.

13
R. S. Michalski.
A theory and methodology of inductive learning.
In Machine Learning, An Artificial Intelligence Approach, Pages 83-134, Michalski, R. S., Carbonell, J. G., and Mitchell, T. M., Editors, Tioga Publishing Company, Palo Alto, CA, 1983.

14
R. Mooney, J. Shavlik, G. Towell, and A. Gove.
An experimental comparison of symbolic and connectionist learning algorithms.
In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), pages 775-780, Detroit, MI, August 20-25, 1989.

15
A. Newell and H. A. Simon.
Human Problem Solving.
Prentice-Hall, Englewood Cliffs, NJ, 1972.

16
J. R. Quinlan.
Learning efficient classification procedures and their application to chess end games.
In Machine Learning, An Artificial Intelligence Approach, Pages 463-482, Michalski, R. S., Carbonell, J. G., and Mitchell, T. M., Editors, Tioga Publishing Company, Palo Alto, CA, 1983.

17
J. R. Quinlan.
Induction of decision trees.
Machine Learning, 1:81-106, 1986.

18
J. R. Quinlan.
C4.5: Programs for Machine Learning.
Morgan Kauffmann, Los Altos, CA, 1993.

19
F. Rosenblatt.
Principles of Perceptrons.
Sparton, Washington, DC, 1962.

20
D. E. Rumelhart, G. E. Hinton, and J. L. McClelland.
A general framework for parallel distributed processing.
In Parallel Distributed Processing, pages 45-76, D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Editors, The MIT Press, Cambridge, MA, 1986.

21
D. E. Rumelhart, G. E. Hinton, and R. J. Williams.
Learning internal representations by error propagation.
In Parallel Distributed Processing, pages 318-362, D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Editors, The MIT Press, Cambridge, MA, 1986.

22
B. F. Ryan, B. L. Joiner, and T. A. Ryan.
MINITAB Handbook, 2nd Edition.
PWS-KENT Publishing Company, Boston, MA, 1985.

23
Flann N. S. and T. G. Dietterich.
Selecting appropriate representations for learning from examples.
In Proceedings of the Fifth National Conference on Artificial Intelligence (AAAI-86), pages 460-466, Philadelphia, PA, August 1986.

24
S. Salzberg.
Pinpointing good hypoetheses with heuristics.
In Artificial Intelligence and Statistics, Pages 133-158, Gale, W. A., Editor, Addison-Wesley Publishing Company, Reading, MA, 1986.

25
H. Simon.
Artificial intelligence: Where has it been, and where is it going?
IEEE Transactions on Knowledge and Data Engineering, 3(2):128-136, June 1991.

26
S. M. Weiss and I. Kapouleas.
An empirical comparison of pattern recognition, neural nets, and machine learning classification methods.
In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), pages 781-787, Detroit, MI, August 20-25, 1989.

27
S. M. Weiss and C. A. Kulikowski.
Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Networks, Machine Learning, and Expert Systems.
Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1991.

hchen@bpa.arizona.edu