GANNET: A machine learning approach to document retrieval

Chen, Hsinchun, and Kim, Jinwoo

Journal of Management Information Systems (JMIS), Volume 11, Issue 3 (Winter 1994/1995), p. 7. ISSN 0742-1222.
Abstract:
Information science
researchers have recently turned to new artificial intelligence-based inductive
learning techniques including neural networks, symbolic learning and genetic
algorithms. An overview of the new techniques and their usage in information
science research is provided. The algorithms adopted for a hybrid system based on genetic algorithms and neural networks, called GANNET, are presented. GANNET performed concept (keyword) optimization for user-selected documents during information retrieval using genetic algorithms. It then used the optimized
concepts to perform concept exploration in a large network of related concepts
through the Hopfield net parallel relaxation procedure. Based on a test
collection of about 3,000 articles from DIALOG and an automatically created
thesaurus, and using Jaccard's score as a performance measure, the experiment
showed that GANNET improved the Jaccard's scores by about 50% and helped
identify the underlying concepts that best describe the user-selected documents.
Copyright M. E. Sharpe Inc. Winter 1994/1995
THE AVAILABILITY OF CHEAP AND
EFFECTIVE STORAGE DEVICES and information systems has prompted the rapid growth
and proliferation of relational, graphical, and textual databases in the past
few decades. Collecting and storing information has become easier, but retrieving relevant information has become significantly more difficult, especially in large-scale databases. This situation is
particularly evident for textual databases, which are widely used in
traditional library science environments, in business applications (e.g.,
manuals, newsletters, and electronic data interchanges), and in scientific
applications (e.g., electronic community systems and scientific databases).
Information stored in these databases often has become voluminous, fragmented,
and unstructured after years of intensive use. Only users with extensive
subject area knowledge, system knowledge, and classification scheme knowledge
[9] are able to maneuver and explore in these textual databases.
Most commercial information
retrieval systems still rely on conventional inverted index and Boolean
querying techniques. Even full-text retrieval produced less than satisfactory
results [3]. In the past few decades, probabilistic retrieval techniques have
been used to improve the retrieval performance of information retrieval systems
[6, 49]. The approach is based on two main parameters, the probability of relevance
and the probability of irrelevance of a document. Despite various extensions,
probabilistic methodology still requires the independence assumption for terms and suffers from the difficulty of estimating term-occurrence parameters correctly [33, 73].
Since the late 1980s,
knowledge-based techniques have been used extensively by information science
researchers. These techniques have attempted to capture searchers' and
information specialists' domain knowledge and classification scheme knowledge,
effective search strategies, and query refinement heuristics in document
retrieval systems design [10]. Despite their usefulness, systems of this type
are considered performance systems [76]--they perform only what they were programmed to do (i.e., they lack learning ability). Significant efforts
are often required to acquire knowledge from domain experts and to maintain and
update the knowledge base.
A newer paradigm, generally
considered to be the learning approach, has attracted the attention of researchers
in artificial intelligence, computer science, and other functional disciplines
such as engineering, medicine, and business [7, 53, 83]. In contrast to
performance systems, which require knowledge from human experts, learning
systems acquire knowledge automatically from examples--that is, from source
data. The most frequently used techniques include symbolic, inductive learning
algorithms such as ID3 [62], multiple-layered, feedforward neural networks such
as Backpropagation net [72], and evolution-based genetic algorithms [30]. Many
information science researchers have started to experiment with these
techniques as well [2, 12, 13, 14, 33, 44].
In this paper, we present the
design and a prototype implementation of a hybrid genetic algorithms and neural
networks based system, called GANNET, for "intelligent" information
retrieval. The next section presents an overview of the learning techniques and
some recent implementations in information retrieval (IR). We then describe the
genetic algorithms and Hopfield neural networks we adopted for the hybrid
system. The implementation details and testing results are presented in the
fourth section, and we conclude the paper and describe future research
directions in the final section.
"Intelligent"
Information Retrieval: An Overview
CREATING COMPUTER SYSTEMS WITH
KNOWLEDGE OR "INTELLIGENCE" has long been the goal of researchers in
artificial intelligence. Many interesting knowledge-based systems have been
developed in the past few decades for such applications as medical diagnosis,
engineering troubleshooting, and business decision making. Most of these
systems have been developed through a manual knowledge acquisition process, a significant bottleneck for knowledge-based systems development.
A recent approach to knowledge
elicitation is referred to as "knowledge mining" or "knowledge
discovery" [26, 60]. Grounded on various machine learning techniques, the
approach is automatic and it acquires knowledge or identifies patterns directly
from some large collection of examples or real-life databases.
Below we review some of the important work on knowledge-based systems in IR and on learning systems in IR.
Knowledge-Based Systems in IR
There have been many attempts to
capture information specialists' domain knowledge, search strategies, and query
refinement heuristics in document retrieval systems design. For example,
CoalSORT [54], a knowledge-based system, facilitates the use of bibliographic
databases in coal technology. A semantic network, representing an expert's domain
knowledge, embodies the system's intelligence. GRANT [18], developed by Cohen
and Kjeldsen, is an expert system for finding sources of funding for given
research proposals. Its search method--constrained spreading activation in a
semantic network--makes inferences about the goals of the user and thus finds
information that the user did not explicitly request but that is likely to be
useful. Fox's CODER system [25] consists of a thesaurus that was generated from
the Handbook of Artificial Intelligence and the Collins Dictionary. In CANSEARCH
[61], a thesaurus is presented as a menu. Users browse and select terms for
their queries from the menu. The "Intelligent Intermediary for Information
Retrieval" (I sup 3 R), developed by Croft [19], consists of a group of "experts"
that communicate via a common data structure, called the blackboard. The system
consists of a user model builder, a query model builder, a thesaurus expert, a
search expert (for suggesting statistics-based search strategies), a browser
expert, and an explainer. The IOTA system, developed by Chiaramella and Defude,
includes natural language processing of queries, deductive capabilities
(related to user modeling, search strategies definition, use of expert and
domain knowledge), management of full-text documents, and relevance evaluation
of answers [17]. Chen and Dhar's METACAT [10] incorporates several human search
strategies and a portion of the Library of Congress Subject Headings (LCSH) for
bibliographic search. The system also includes a branch-and-bound algorithm for
an automatic thesaurus (LCSH) consultation process.
The National Library of
Medicine's thesaurus projects are probably the largest-scale effort that uses
the knowledge in existing thesauri. In one of the projects, Rada and Martin [50,
65] conducted experiments for the automatic addition of concepts to MeSH
(Medical Subject Headings) by including the CMIT (Current Medical Information
and Terminology) and SNOMED (Systematized Nomenclature of Medicine) thesauri.
Access to various sets of documents can be facilitated by using thesauri and
the connections that are made among thesauri. The Unified Medical Language
System (UMLS) project is a long-term effort to build an intelligent automated
system that understands biomedical terms and their interrelationships and uses
this understanding to help users retrieve and organize information from
machine-readable sources [40, 46, 51]. The UMLS includes a Metathesaurus, a
Semantic Network, and an Information Sources Map. The Metathesaurus contains information
about biomedical concepts and their representation in more than ten different
vocabularies and thesauri. The Semantic Network contains information about the
types of terms (e.g., "disease," "virus," etc.) in the
Metathesaurus and the permissible relationships among these types. The
Information Sources Map contains information about the scope, location,
vocabulary, and access conditions of biomedical databases of all kinds.
Another important component of
information retrieval is user modeling capability, which is unique to reference
librarians. During the user-librarian consultation process, the librarian
develops an understanding of the type of user being dealt with on the basis of
verbal and nonverbal clues. Usually, the educational level of the user, the
type of question, the way the question is phrased, the purpose of the search,
and the expected search results all play major roles in helping the librarian
determine the needs of the user. The librarian, in essence, creates models of
the user profile and the task requirements during the consultation process.
User modeling has played a
crucial role in applications such as question-answering systems, intelligent
tutoring systems, and consultation systems [1, 8, 77, 78, 88]. An intelligent
interface for document retrieval systems must also exhibit the user modeling
capability of experienced human intermediaries. Daniels proposed a frame-based
representation for a user model and rules for interacting with the users. She
has shown that user modeling is a necessary function in the presearch
information interaction [21]. Rich's Grundy system builds models of its users,
with the aid of stereotypes, and then uses those models to guide it in its
task, suggesting novels that people may find interesting [67, 68, 69].
Despite their successful and
useful application in numerous domains, the development process for
knowledge-based systems is often slow and painstaking. Knowledge engineers or
system designers need to be able to identify subject and classification knowledge
from some sources (usually some domain experts) and to represent the knowledge
in computer systems. The inference engines of such systems, which mainly
emulate human problem-solving strategies and cognitive processes [10], may not
be applicable across different applications.
Learning Systems: Neural
Networks, Symbolic Learning, and Genetic Algorithms
Contrary to the manual knowledge
acquisition process used in knowledge-based systems design, learning systems
rely on algorithms to extract knowledge or identify patterns in data. Various
statistics-based algorithms have been developed by management scientists and
have been used extensively over the past few decades for quantitative data
analysis. These algorithms examine quantitative data for the purposes of [58]:
(1) clustering descriptors with common characteristics, for example, factor
analysis and principal components analysis; (2) hypothesis testing for
differences among different populations, for example, t-test and analysis of
variance (ANOVA); (3) trend analysis, for example, time series analysis; and
(4) correlation between variables, for example, correlation coefficient and
linear/multiple regression analysis [27, 56]. These analysis techniques often
rely on complex mathematical models, stringent assumptions, or special
underlying distributions. The findings are then presented in mathematical
formulas and parameters.
Learning Systems: An Overview
The symbolic machine learning
technique, the resurgent neural networks approach, and evolution-based genetic
algorithms provide drastically different methods of data analysis and knowledge
discovery. These techniques, which are diverse in their origins and behaviors,
have shown unique capabilities in analyzing qualitative, symbolic data. We
provide a brief overview of these three classes of techniques along with a
representative technique for each class below.
* Symbolic Learning and ID3:
Symbolic machine learning techniques, which can be classified based on such
underlying learning strategies as rote learning, learning by being told,
learning by analogy, learning from examples, and learning from discovery [7],
have been studied extensively by AI researchers over the past two decades.
Among these techniques, learning from examples, a special case of inductive
learning, appears to be the most promising symbolic machine learning technique
for knowledge discovery in real databases. It induces a general concept
description that best describes the positive and negative examples.
Quinlan's ID3 decision-tree
building algorithm and its descendants [63, 64] are simple and yet powerful
algorithms for inductive learning. ID3 takes objects of a known class,
described in terms of a fixed collection of properties or attributes, and produces
a decision tree incorporating these attributes that correctly classifies all
the given objects. It uses an information-economics approach aimed at
minimizing the expected number of tests to classify an object. Its output can
be summarized in terms of IF-THEN rules. ID3 has been used successfully for
various classification and prediction applications.
* Neural Networks and
Backpropagation: The foundation of the neural networks paradigm was laid in the
1950s and has attracted significant attention in the past decade due to the
development of more powerful hardware and neural algorithms. Nearly all
connectionist algorithms have a strong learning component. In symbolic machine
learning, knowledge is represented in the form of symbolic descriptions of the
learned concepts. In connectionist learning, on the other hand, knowledge is
learned and remembered by a network of interconnected neurons, weighted
synapses, and threshold logic units [47, 71]. Learning algorithms such as the Delta
rule can be applied to adjust connection weights so that the network can
predict or classify unknown examples correctly.
Among the numerous artificial
neural networks that have been proposed recently, Backpropagation networks have
been extremely popular for their unique learning capability. Backpropagation
networks [72] are fully connected, layered, feedforward models. Activations
flow from the input layer through the hidden layer, then to the output layer. A
Backpropagation network typically starts out with a random set of weights. The
network adjusts its weights each time it sees an input-output pair. Each pair
is processed at two stages, a forward pass and a backward pass. The forward
pass involves presenting a sample input to the network and letting activations
flow until they reach the output layer. During the backward pass, the network's
actual output is compared with the target output and error estimates are
computed for the output units. The weights connected to the output units are
adjusted in order to reduce the errors (a gradient descent method). The error
estimates of the output units are then used to derive error estimates for the
units in the hidden layer. Finally, errors are propagated back to the
connections stemming from the input units. The Backpropagation network updates
its weights incrementally until the network stabilizes.
* Genetic Algorithms and
Classifier System: During the past decade there has been a growing interest in
algorithms that rely on analogies to natural processes and Darwinian survival
of the fittest. The emergence of massively parallel computers made these
algorithms of practical interest. The best-known algorithm in this class is
probably the classifier system proposed by Holland [5], which represents
knowledge in terms of parallel, interconnected production rules.
Genetic algorithms were
developed based on the principle of evolution [30, 43, 52]. In such algorithms
a population of individuals (potential solutions) undergoes a sequence of unary
(mutation) and higher-order (crossover) transformations. These individuals
strive for survival: a selection (reproduction) scheme, biased toward selecting
fitter individuals, produces the individuals for the next generation. After
some number of generations the program converges--the best individual
represents the optimum solution.
Over the past years there have
been several studies that compare the performance of these techniques for
different applications as well as some systems that use hybrid representations
and learning techniques. We summarize some of these studies below.
Mooney et al. [57] found that
ID3 was faster than a Backpropagation net, but the Backpropagation net was more
adaptive to noisy data sets. The performances of these two techniques were
comparable, however. Weiss and Kapouleas [82, 83] suggested using resampling techniques such as leave-one-out for evaluation instead of using a holdout
testing data set. Discriminant analysis methods, Backpropagation net, and
decision tree-based inductive learning methods (like ID3) were found to achieve
comparable performance for several data sets. Fisher and McKusick [24] found that, using batch learning, Backpropagation performed as well as ID3 but was more noise-resistant. They also compared the effect of incremental learning versus batch learning. Kitano [41] performed systematic, empirical studies on the speed of convergence of Backpropagation networks and genetic algorithms. The results indicated that genetic search is, at best, equally efficient as faster variants of the Backpropagation algorithm in very small-scale networks, but is far less efficient in larger networks. Earlier research by Montana and Davis [55], however, showed that performance was improved by using domain-specific genetic operators to train the Backpropagation network, instead of the conventional Backpropagation Delta learning rule. Harp et al. [38] also achieved good results by using GAs for neural network design.
Systems developed by Kitano [41]
and Harp et al. [38] are also considered hybrid systems (genetic algorithms and
neural networks), as are systems like COGIN [35], which performed symbolic induction using genetic algorithms, and SC-net [37], a fuzzy connectionist expert system. Other hybrid systems developed in recent years
employ symbolic and neural net characteristics. For example, Touretzky and
Hinton [80] and Gallant [29] proposed connectionist production systems, and
Derthick [22] and Shastri [75] developed different connectionist semantic
networks. The system reported here is also a hybrid system that utilizes the
optimization property of the genetic algorithms and the parallel relaxation
capability of the Hopfield neural networks. Details about these methods are
presented in the next section.
Learning Systems in IR
The adaptive learning techniques
cited have also drawn attention from researchers in information science. Neural
networks computing, in particular, seems to fit well with conventional
retrieval models such as the vector space model [73] and the probabilistic
model [49].
Doszkocs et al. [23] provide an
excellent overview of the use of connectionist models in information retrieval.
These models include several related information-processing approaches, such as
artificial neural networks, spreading activation models, associative networks,
and parallel distributed processing. In contrast to more conventional
information-processing models, connectionist models are
"self-processing" in that no external program operates on the
network: the network literally processes itself, with "intelligent behavior"
emerging from the local interactions that occur concurrently between the
numerous network nodes through their synaptic connections. By taking a broader
definition of connectionist models, these authors were able to discuss the
well-known vector space model, cosine measures of similarity, and automatic
clustering and thesaurus generation in the context of network representation. Based on the
network representation, spreading activation methods such as constrained
spreading activation adopted in GRANT [18] and the branch-and-bound algorithm
adopted in METACAT [10] can be considered as variants of connectionist
activation. However, only a few systems are considered classical connectionist
systems that typically consist of weighted, unlabeled links and exhibit some
adaptive learning capabilities.
The work of Belew is probably
the earliest connectionist model adopted in IR. In AIR [2], he developed a
three-layer neural network of authors, index terms, and documents. The system
used relevance feedback from its users to change its representation of authors,
index terms, and documents over time. The result was a representation of the
consensual meaning of key words and documents shared by some group of users.
One of his major contributions was the use of a modified correlational learning
rule. The learning process created many new connections between documents and
index terms. Rose and Belew [70] extended AIR to a hybrid connectionist and
symbolic system called SCALIR which used analogical reasoning to find relevant
documents for legal research. Kwok [44] developed a similar three-layer network
of queries, index terms, and documents. A modified Hebbian learning rule was
used to reformulate probabilistic information retrieval. Wilkinson and Hingston
[84, 85] incorporated the vector space model in a neural network for document
retrieval. Their network also consisted of three layers: queries, terms, and
documents. They have shown that spreading activation through related terms can
help improve retrieval performance.
While the above systems represent
information retrieval applications in terms of their main components of
document, queries, index terms, authors, and so on, other researchers used
different neural networks for more specific tasks. Lin [45] adopted a Kohonen
network for information retrieval. Kohonen's feature map, which produced a
two-dimensional grid representation for N-dimensional features, was applied to
construct a self-organizing (unsupervised learning), visual representation of
the semantic relationships between input documents. In [48], a neural algorithm
developed by MacLeod was used for document clustering. The algorithm exhibited
effectiveness comparable to that of conventional hierarchical clustering
algorithms. Chen et al. [12, 13, 15] reported a series of experiments and system
developments that generated an automatically created weighted network of key
words from large textual databases and integrated it with several existing
man-made thesauri (e.g., LCSH). Instead of using a three-layer design, Chen's
systems developed a single-layer, interconnected, weighted/labeled network of
key words (concepts) for "concept-based" information retrieval. A
blackboard-based design that supported browsing and automatic concept
exploration using the Hopfield neural network's parallel relaxation method was
adopted to facilitate the usage of several thesauri [13].
Despite the popularity of using
neural networks for information retrieval, we see only limited use of other
adaptive learning techniques. In [4], the researchers used discriminant analysis
and a simple symbolic learning technique for document classification. Their
symbolic learning process simply represented the numeric classification results
in terms of IF-THEN rules. We adopted an ID3 decision-tree building algorithm
for information retrieval [16]. For a small test collection of documents, the
inductive learning method did a good job in identifying the concepts (key
words) that best represent the set of documents identified by users as relevant
(positive examples) and irrelevant (negative examples). More testing, however,
is under way to determine the effectiveness of example-based document retrieval
using ID3.
Our literature search revealed
several implementations of genetic algorithms in information retrieval. Gordon
[33] presented a genetic algorithms-based approach for document indexing.
Competing document descriptions (key words) are associated with a document and
altered over time by using genetic mutation and crossover operators. In his
design, a key word represents a gene (a bit pattern), a document's list of key
words represents individuals (a bit string), and a collection of documents
initially judged relevant by a user represents the initial population. Based on
a Jaccard's score matching function (fitness measure), the initial population
evolved through generations and eventually converged to an optimal (improved)
population--a set of key words that best described the documents. Gordon [34]
adopted a similar approach to document clustering. His experiment showed that
after genetically redescribing the subject description of documents,
descriptions of documents found co-relevant to a set of queries bunched together. Redescription improved the relative density of co-relevant documents
by 39.74 percent after twenty generations and 56.61 percent after forty
generations. Raghavan and Agarwal [66] have also studied the genetic algorithms
in connection with document clustering. Petry et al. [59] applied genetic
programming to a weighted information retrieval system. In their research, a
weighted Boolean query was modified in order to improve recall and precision.
They found that the form of the fitness function has a significant effect upon
performance. Yang and his coworkers [86, 87] have developed adaptive retrieval
methods based on genetic algorithms and the vector space model using relevance
feedback. They reported the effect of adopting genetic algorithms in large
databases, the impact of genetic operators, and GA's parallel searching
capability. Frieder and Siegelmann [28] also reported a data placement strategy
for parallel information retrieval systems using a genetic algorithms approach.
Their results compared favorably with pseudo-optimal document allocations.
The connectionist approach to IR
and Gordon's GAs in IR provided great insight for this research. Specifically,
we attempted to investigate in more depth the role of genetic algorithms
adaptation for optimizing concept description in example-based document
retrieval; we also aimed to integrate the optimization capability of the
genetic algorithms with the parallel exploration (search) property of the
Hopfield networks. Details about our design are reported below.
GANNET for IR: An Integrated
Framework
WE DESCRIBE HERE IN ALGORITHMIC
DETAIL THE GENETIC ALGORITHMS and neural networks adopted in our hybrid system
for adaptive information retrieval.
Genetic Algorithms
Genetic algorithms (GAs) [30, 43, 52] are problem-solving systems based on principles of evolution and heredity. A GA maintains a population of individuals, $P(t) = \{x_1, \ldots, x_n\}$, at iteration $t$. Each individual represents a potential solution to the problem at hand and is implemented as some (possibly complex) data structure $S$. Each solution $x_i$ is evaluated to give some measure of fitness. Then a new population at iteration $t+1$ is formed by selecting the fitter individuals. Some members of the new population undergo transformation by means of genetic operators to form new solutions. There are unary transformations $m_i$ (mutation type), which create new individuals by a small change in a single individual, and higher-order transformations $c_j$ (crossover type), which create new individuals by combining parts from several (two or more) individuals. For example, if parents are represented by the five-dimensional vectors $(a_1, a_2, a_3, a_4, a_5)$ and $(b_1, b_2, b_3, b_4, b_5)$, then a crossover of chromosomes after the second gene produces the offspring $(a_1, a_2, b_3, b_4, b_5)$ and $(b_1, b_2, a_3, a_4, a_5)$. The control parameters for genetic operators (the probabilities of crossover and mutation) need to be carefully selected to provide better performance. The intuition behind
the crossover operation is information exchange between different potential
solutions. After some number of generations the program converges--the best
individual hopefully represents the optimum solution. Goldberg [30] presents a
good summary of many recent GA applications in biology, computer science,
engineering, operations research, physical sciences, and social sciences.
Genetic algorithms use a
vocabulary borrowed from natural genetics. We talk about genes (or bits), individuals (chromosomes or bit strings), and populations (of individuals).
Populations evolve through generations. Our genetic algorithm is executed in
the following steps:
1. Initialize population and
evaluate fitness:
To initialize a population, we
need first to decide the number of genes for each individual and the total
number of chromosomes (popsize) in the initial population. When adopting GAs in
IR, each gene (bit) in the chromosome (bit string) represents a certain keyword
or concept. The loci (locations of a certain gene) decide the existence (1, ON)
or nonexistence (0, OFF) of a concept. A chromosome represents a document that
consists of multiple concepts. The initial population contains a set of
documents that are judged relevant by a searcher through relevance feedback.
The goal of the GAs is to find an optimal set of documents that best match the
searcher's needs (expressed in terms of underlying key words or concepts). An
evaluation function for the fitness of each chromosome was selected based on
Jaccard's score matching function as used by Gordon [33] for document indexing.
The Jaccard's score between two sets $X$ and $Y$ is computed as:

$$J(X, Y) = \frac{\#(X \cap Y)}{\#(X \cup Y)}$$

where $\#(S)$ indicates the cardinality of set $S$. The Jaccard's score is a common measure of association in information retrieval [81].
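To make the measure concrete, here is a minimal C sketch of the Jaccard's score over 0/1 keyword bit strings (the five-keyword documents in main() are invented toy data, not drawn from the test collection):

```c
#include <stdio.h>

/* Jaccard's score between two keyword sets held as 0/1 bit arrays:
   #(X intersect Y) / #(X union Y). */
double jaccard(const int *x, const int *y, int m) {
    int inter = 0, uni = 0;
    for (int i = 0; i < m; i++) {
        if (x[i] && y[i]) inter++;   /* keyword in both documents */
        if (x[i] || y[i]) uni++;     /* keyword in either document */
    }
    return uni > 0 ? (double)inter / uni : 0.0;
}

int main(void) {
    int d0[] = {1, 1, 0, 1, 0};      /* toy five-keyword documents */
    int d1[] = {1, 0, 0, 1, 1};
    printf("J = %.3f\n", jaccard(d0, d1, 5));   /* 2/4 = 0.500 */
    return 0;
}
```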
2. Reproduction (selection):
Reproduction is the selection of a new population with respect to the probability distribution based on the fitness values. Fitter individuals have better chances of being selected for reproduction [52]. A roulette wheel with slots sized according to the total fitness $F$ of the population is defined as follows:

$$F = \sum_{i=1}^{popsize} fitness(V_i)$$

where $fitness(V_i)$ indicates the fitness value of chromosome $V_i$ according to the Jaccard's score.
Each chromosome has a certain
number of slots proportional to its fitness value. The selection process is
based on spinning the wheel popsize times; each time we select a single
chromosome for a new population. Obviously, some chromosomes will be selected more
than once. This is in accordance with genetic inheritance: the best chromosomes get more copies, the average stay even, and the worst die off.
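A small C sketch of this selection step follows; the fitness values in main() are invented for illustration, and rand() stands in for the wheel spin:

```c
#include <stdio.h>
#include <stdlib.h>

/* Spin the roulette wheel once: each chromosome owns a slice of the total
   fitness F, so fitter chromosomes are selected proportionally more often. */
int spin_wheel(const double *fitness, int popsize) {
    double total = 0.0;
    for (int i = 0; i < popsize; i++) total += fitness[i];   /* F */
    double r = (double)rand() / RAND_MAX * total;            /* the spin */
    double acc = 0.0;
    for (int i = 0; i < popsize; i++) {
        acc += fitness[i];
        if (r <= acc) return i;
    }
    return popsize - 1;   /* guard against rounding error */
}

int main(void) {
    double fitness[] = {0.29, 0.41, 0.18, 0.33, 0.25};   /* toy values */
    int counts[5] = {0};
    for (int s = 0; s < 1000; s++) counts[spin_wheel(fitness, 5)]++;
    for (int i = 0; i < 5; i++)
        printf("chromosome %d selected %d times\n", i, counts[i]);
    return 0;
}
```

Over many spins, the count for each chromosome approaches its share of the total fitness, so the best chromosomes get more copies while the worst die off.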
3. Recombination (crossover and
mutation):
Now we are ready to apply the first recombination operator, crossover, to the individuals in the new population. The probability of crossover, $p_c$, gives us the expected number $p_c \times popsize$ of chromosomes that should undergo the crossover operation. For each chromosome, we generate a random number $r$ between 0 and 1; if $r < p_c$, the chromosome is selected for crossover. We then mate selected pairs of chromosomes randomly: for each pair of coupled chromosomes, we generate a random integer $pos$ from the range $(1 .. m-1)$, where $m$ is the total number of genes in a chromosome. The number $pos$ indicates the position of the crossing point. The coupled chromosomes exchange genes at the crossing point as described earlier.
The next recombination operator, mutation, is performed on a bit-by-bit basis. The probability of mutation, $p_m$, gives us the expected number of mutated bits $p_m \times m \times popsize$. Every bit in all chromosomes of the whole population has an equal chance to undergo mutation, that is, to change from 0 to 1 or vice versa. For each chromosome in the crossed-over population and for each bit within the chromosome, we generate a random number $r$ from the range $(0..1)$; if $r < p_m$, we mutate the bit.
4. Convergence:
Following reproduction,
crossover, and mutation, the new population is ready for its next generation.
The rest of the evolution consists of cyclic repetitions of the above steps until the system reaches a predetermined number of generations or converges (i.e., shows no improvement in the overall fitness of the population).
However, due to the unique
knowledge representation scheme required of the GAs--binary bit strings--many
problems cannot be represented conveniently using GAs. For many applications
the most difficult problem is to transform the underlying representation into
fixed-length bit strings. Koza [43] observed:
Representation is a key issue in
genetic algorithm work because the representation scheme can severely limit the
window by which the system observes its world. ... String-based representation schemes are difficult and unnatural for many problems and the need for more powerful representations has been recognized for some time.
In summary, GA representation
schemes do not have dynamic variability. The initial selection of string length
limits in advance the number of internal states of the system and limits what
the system can learn. Another consequence of the representation requirement of
GAs is their inability to deal with nontrivial constraints [52].
In recent years, researchers
have suggested using real numbers instead of bit strings for representation
[52]. There have also been attempts to use knowledge-augmented genetic
operators for GAs [36, 55]. Goldberg [31, 32] proposed messy GAs to address
some of the representation problems. In messy GAs, strings are of variable length.
Simple crossover is replaced by two even simpler operators: splice and cut.
When using GAs in IR, the
optimized concepts are also severely constrained by the bit strings available
in the initial population. That is, the system can only use the key words of
the documents supplied by the users to find other relevant documents. Key words
that are relevant, but not in the initial document set, will not be considered
in GA optimization. In order to address this problem, we propose to incorporate
a neural network of related concepts (concept space), which acts as the
(mother) nature for the GAs, constantly providing new, useful genes for GA
evolution. The optimized genes from one round of GA optimization can call the
nature and produce other similar genes. These new genes can then be used in the
next round of GA optimization. By using a network of concepts (and some
underlying search method), GAs can change their underlying representation
dynamically. We hope this will alleviate the fixed-length representation
problem of GAs and improve overall optimization.
Hopfield Networks
An excellent candidate for creating a network of related key words was proposed by Chen et al. [12, 13], who used an asymmetric similarity function to produce automatic thesauri for different underlying databases. The output was a weighted network of keywords, akin to a single-layered neural network. They also adopted a variant of the Hopfield parallel relaxation procedure for network search [13] and concept clustering [11]. We present here an overview of the Hopfield network and sketch the algorithm we adopted for our hybrid system.
The Hopfield net [39, 79] was
introduced as a neural net that can be used as a content-addressable memory.
Knowledge and information can be stored in single-layered interconnected
neurons (nodes) and weighted synapses (links) and can be retrieved based on the
network's parallel relaxation method--nodes are activated in parallel and are
traversed until the network reaches a stable state (convergence). It has been
used for various classification tasks [47] and was also used recently by the
authors in a blackboard-based retrieval system [13].
A weighted network of keywords
generated automatically using 3,000 computing-related articles extracted from
DIALOG was reported in [12]. These articles were used by a group of researchers
specializing in international computing technologies. Most articles were useful
for evaluating foreign computing technologies, especially those in the former
Soviet Union. Key words and user-specific folders were used extensively in this
environment. We adopted this weighted network as the nature for the concepts that may be of interest to users of such a database. Our prototype system contained 1,488 nodes (key words) and 44,486 links (probabilities between 0 and 1), for example, (supercomputing 0.275 parallel programming), (parallel programming 0.225 multiprocessor), (parallel programming 0.124 supercomputing), and so on.
Our implementation incorporated
basic Hopfield net iteration and convergence ideas. However, significant
modification was also made to accommodate the unique characteristics for
information retrieval, for example, asymmetric link weights and the continuous
SIGMOID transformation function. With the optimized output (key words)
generated by GAs and the association of key words captured by the network, the
Hopfield parallel relaxation algorithm activates neighboring terms, combines
weighted links, performs a transformation function (a SIGMOID function, f sub s
), and determines the outputs of newly activated nodes. The process repeats
until node outputs remain unchanged with further iterations. The node outputs
then represent the concepts that are strongly related to the initial optimized
concepts provided by the GAs. With these dynamically increasing genes (key
words), the GAs can perform reproduction and recombination again to produce
even better populations. A sketch of the Hopfield net activation algorithm
follows:
1. Assigning synaptic weights:
The "training" phase
of the Hopfield net is completed when the weights have been computed based on
the similarity function discussed earlier. t sub ij represents the
"synaptic" weight from node i to node j. Each node represents a gene
(key word) in GAs.
2. Initialization with optimized
GA output:
An initial set of optimized key words $\{S_1, S_2, \ldots, S_m\}$ is provided by the GA procedure. Each node in the network that matches one of these key words is initialized to have a weight of 1:

$$\mu_i(0) = x_i, \quad 0 \le i \le n-1$$

where $\mu_i(t)$ is the output of node $i$ at time $t$ and $x_i$, which has a value between 0 and 1, indicates the input pattern for node $i$.
3. Activation, weight computation, and iteration:

$$\mu_j(t+1) = f_s\left[\sum_{i=0}^{n-1} t_{ij}\,\mu_i(t)\right], \quad 0 \le j \le n-1$$

where $f_s$ is the continuous SIGMOID transformation function shown below [20, 42]:

$$f_s(net_j) = \frac{1}{1 + \exp\left(-(net_j - \theta_j)/\theta_0\right)}$$

where $net_j$ is the input to node $j$, $\theta_j$ serves as a threshold or bias, and $\theta_0$ is used to modify the shape of the SIGMOID function.
This formula shows the parallel relaxation property of the Hopfield net. At each iteration, all nodes are activated at the same time. The weight computation scheme,

$$net_j = \sum_{i=0}^{n-1} t_{ij}\,\mu_i(t),$$

is also a unique characteristic of the Hopfield net algorithm. Based on parallel activation, each newly activated node derives its new weight based on the summation of the products of the weights assigned to its neighbors and their synapses.
4. Convergence:

The above process is repeated until there is no change in output between two iterations, which is accomplished by checking:

$$\sum_{j=0}^{n-1}\left|\mu_j(t+1) - \mu_j(t)\right| \le \xi$$

where $\xi$ is the maximal allowable error (a small number). The final output represents the set of terms relevant to the starting key words. Some default threshold values were selected for $(\theta_j, \theta_0)$. More details about threshold selection are discussed later.
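A hedged C sketch of the relaxation loop follows. The four-node network, its weights, and the $(\theta_j, \theta_0, \xi)$ values are all invented for illustration; in GANNET the weights come from the automatically generated thesaurus and the thresholds are set as discussed later:

```c
#include <math.h>
#include <stdio.h>
#include <string.h>

#define N 4   /* toy network of four keyword nodes */

/* Continuous SIGMOID transformation with threshold theta_j;
   theta_0 modifies the shape of the curve. */
double sigmoid(double net, double theta_j, double theta_0) {
    return 1.0 / (1.0 + exp(-(net - theta_j) / theta_0));
}

int main(void) {
    /* asymmetric "synaptic" weights t[i][j] (illustrative values) */
    double t[N][N] = {
        {0.0,   0.275, 0.0,   0.0},
        {0.124, 0.0,   0.225, 0.0},
        {0.0,   0.2,   0.0,   0.3},
        {0.0,   0.0,   0.25,  0.0},
    };
    double mu[N] = {1.0, 0.0, 0.0, 0.0};   /* node 0 seeded by GA output */
    double theta_j = 0.08, theta_0 = 0.5, xi = 1e-4;

    for (int iter = 0; iter < 100; iter++) {
        double next[N], diff = 0.0;
        for (int j = 0; j < N; j++) {        /* all nodes in parallel */
            double net = 0.0;
            for (int i = 0; i < N; i++) net += t[i][j] * mu[i];
            next[j] = sigmoid(net, theta_j, theta_0);
            diff += fabs(next[j] - mu[j]);
        }
        memcpy(mu, next, sizeof mu);
        if (diff <= xi) break;                /* converged */
    }
    for (int j = 0; j < N; j++) printf("node %d: %.3f\n", j, mu[j]);
    return 0;
}
```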
A Hybrid System: GANNET
Gordon [33] applied GAs to indexing by simultaneously supplying alternative descriptions of the same document and then deriving better descriptions based on GA reproduction and recombination. After forty generations, the Jaccard's score improved by about 20
percent. Gordon's research has provided great insight regarding the
representation issues for using GAs in IR and he has shown the potential for
such a technique. However, we believed we could adopt the same approach for
relevance feedback in IR and, even more importantly, that we could combine GAs
with the popular neural networks techniques that have been adopted frequently
in IR. We hoped that by combining the unique optimization capability of the GAs
and the powerful associative exploration property of the neural networks
(Hopfield networks, in particular), we could improve IR performance.
Figure 1 shows the overall
structure of GANNET. (Figure 1 omitted) The GAs manipulate input documents and
their associated key words to generate an initial set of optimized key words--a
process called concept optimization. In the following concept discovery or
concept exploration process, this set of key words is then used by the Hopfield
network to produce other relevant key words (new genes suggested by the
nature). These Hopfield-suggested key words are included in GAs for the next
round of concept optimization. The process repeats until there is no further
improvement in the overall fitness. During the actual implementation, we improved the GAs by adopting a parameter selection procedure for $p_c$ and $p_m$, and we improved the Hopfield net procedure by adopting a dynamic threshold for $\theta_j$.
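The overall control flow of figure 1 can be summarized in a short C skeleton. ga_optimize() and hopfield_explore() are hypothetical stand-ins for the two components detailed above, with toy bodies so the skeleton runs:

```c
#include <stdio.h>

double ga_optimize(void);       /* concept optimization: returns fitness */
void   hopfield_explore(void);  /* concept discovery: adds related genes */

int main(void) {
    double best = 0.0;
    for (;;) {
        double f = ga_optimize();
        if (f <= best) break;    /* no further improvement: converged */
        best = f;
        hopfield_explore();      /* new key words for the next GA round */
    }
    printf("converged fitness: %.4f\n", best);
    return 0;
}

/* toy stand-ins (illustration only) */
double ga_optimize(void) {
    static double f = 0.45;
    if (f < 0.75) f += 0.1;      /* fitness improves, then plateaus */
    return f;
}
void hopfield_explore(void) { /* would activate related concepts */ }
```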
Figure 2 shows a desired
behavior of our hybrid design. (Figure 2 omitted) The unshaded regions show the potential improvement based on the GAs; the shaded regions show the potential improvement from the Hopfield networks. Gordon's research has shown the gain from GAs after a fixed number of generations--the first unshaded GA region. Our iterative GA/HP (HP: Hopfield) cycles aimed to improve the overall
fitness by allowing the nature to alter the underlying representations of the
GAs and to explore a larger and more relevant search space. We envision that
fitness will improve even more significantly using the iterative GANNET cycles
and performance will converge over time. Detailed examples and some benchmark
testing results are provided in the following sections.
System Implementation and
Evaluation
WE DEVELOPED A PROTOTYPE SYSTEM
IN C on a DECStation 5000/120 (25 MIPS workstation, running ULTRIX). A test
database of 3,000 articles from DIALOG was used to generate a Hopfield network
of 1,488 nodes and 44,486 weighted links. We present a sample session, some
implementation details, and the benchmark testing results below.
Sample Session
In our hybrid system, a key word
represents a gene (bit) in GAs or a node in the Hopfield network; a
user-selected document represents a chromosome (individual); and a set of
user-selected documents represents the initial population.
Genetic Algorithms for Concepts
Optimization
The key words used in the set of
user-selected documents were first identified to represent the underlying bit
strings for the initial population. Each bit represented the same unique key
word throughout the complete GAs process. When a key word was present in a
document, the bit was set to 1, otherwise it was 0. Each document could then be
represented in terms of a sequence of 0s and 1s. The key words of five
user-selected documents are presented below. The set of unique concepts present
in these sample documents is also summarized--thirty-three key words (genes) in
total. Notice that some concepts were folder names assigned by the users (in
the format of *.*), such as QUERY.OPT folder for query optimization topics,
DBMS.AI folder for artificial intelligence and databases topics, KEVIN.HOT
folder for "HOT" (current) topics studied by Kevin (a Russian
computing researcher).
(Input Documents and Key Words
omitted)
(Concepts omitted)
(Genetic Pattern omitted)
We computed the fitness of each
document based on its relevance to other documents in the user-selected set. As
discussed earlier, we adopted the same Jaccard's score matching function used
in [33]. A higher Jaccard's score (a value between 0 and 1) indicated stronger
relevance between two documents. For document 0, we computed five different
Jaccard's scores between document 0 and documents 0, 1, 2, 3, and 4,
respectively (shown below). (Jaccard's scores omitted) An average fitness was
then computed for document 0 (0.28774). The same procedure was applied to other
documents to compute their fitnesses. A document that included more concepts
shared by other documents had a higher Jaccard's score.
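In code, the per-document fitness is just the mean of these Jaccard's scores; a small sketch (reusing the jaccard() routine from the earlier fragment) might read:

```c
double jaccard(const int *x, const int *y, int m);   /* from earlier sketch */

/* Average Jaccard-based fitness of document d against every document in the
   user-selected set; the self-comparison is included, as in the text's
   five-score example for document 0. */
double avg_fitness(int d, const int *docs[], int ndocs, int m) {
    double sum = 0.0;
    for (int k = 0; k < ndocs; k++)
        sum += jaccard(docs[d], docs[k], m);
    return sum / ndocs;    /* e.g., 0.28774 for document 0 above */
}
```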
If a user provides documents
that are closely related, the average fitness for the complete document set
will be high. If the user-selected documents are only loosely related, their
overall fitness will be low. Generally, GAs did a good job optimizing a
document set that was initially low in fitness. Using the previous example, the
overall Jaccard's score increased over generations. The optimized population contained only a single chromosome, with an average fitness value of 0.45121.
The optimized chromosome contained six relevant key words that best described
the initial set of documents.
(Chromosomes omitted)
In our initial implementation,
we adopted a standard GA procedure incorporating a fixed number of generations
(40) and empirically selected $p_c$ and $p_m$ probabilities. Because the application of the GAs was basically a random, parallel search process, its performance was strongly affected by the control parameters, especially $p_c$ and $p_m$: $p_c$ exploited the search space by using the fitter individuals, while $p_m$ explored the search space by randomly mutating bits [52]. Figure 3a shows the fitness deterioration and figure 3b shows the fitness improvement when using different initial populations and different $p_c$ and $p_m$ values.
(Figure 3 omitted) In most cases, fitness increased and converged to a certain
level after about ten generations, but in some cases average fitness decreased,
as shown in figure 3a.
According to past research [52,
74], GAs lack robust heuristics to determine good values for their parameters.
Typically $p_m$ is much smaller than $p_c$, and GAs often converge quickly after the first several generations. Based on these observations, we adopted a revised GA that performed short cycles of evolution (each cycle contained ten generations). At each cycle GANNET adopted various combinations of $p_c$ and $p_m$ values to form candidate populations and selected the best population among these candidates for the next cycle. If fitness improved in a given cycle, GANNET incremented $p_m$ for the next cycle in an attempt to explore the search space fully and to avoid a local minimum. The process terminated when the fitness no longer improved in two consecutive processing cycles. In essence, this variant GA performed reproduction using a smaller number of generations and dynamically changing parameters in order to exploit and explore the search space fully.
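The cycle logic might look as follows in C. run_ga_cycle() is a hypothetical stand-in for one ten-generation GA run, the candidate $p_c$ values follow the typical range reported below, and the saturating toy fitness curve exists only to make the skeleton runnable:

```c
#include <stdio.h>

/* Hypothetical: run ten generations with the given (p_c, p_m) and return
   the resulting average fitness (toy saturating curve). */
double run_ga_cycle(double p_c, double p_m) {
    static int calls = 0;
    (void)p_c; (void)p_m;
    calls++;
    return 0.65 - 0.2 / calls;
}

int main(void) {
    const double pc_candidates[] = {0.7, 0.8, 0.9};
    double p_m = 0.01, best = 0.0;
    int stale = 0, cycle = 0;

    while (stale < 2) {                 /* stop after two flat cycles */
        double cycle_best = best;
        for (int i = 0; i < 3; i++) {   /* try candidate (p_c, p_m) pairs */
            double f = run_ga_cycle(pc_candidates[i], p_m);
            if (f > cycle_best) cycle_best = f;
        }
        cycle++;
        if (cycle_best > best + 1e-3) { /* improved: keep best, explore more */
            best = cycle_best;
            p_m += 0.01;
            stale = 0;
        } else {
            stale++;
        }
    }
    printf("stopped after %d cycles, fitness %.4f\n", cycle, best);
    return 0;
}
```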
As shown in figure 4 (each line represents a different $p_c$ and $p_m$ combination), fitness improved significantly in the first cycle. (Figure 4 omitted) The best population produced from Cycle 1 was used in Cycle 2 for further optimization using different $p_c$ and $p_m$ values. The optimization process converged after four cycles. In our experiment, most test cases converged after about six cycles. The typical $p_c$ selected ranged between 0.7 and 0.9, and $p_m$ ranged between 0.01 and 0.03.
Hopfield Net for Concept
Discovery
After concepts were optimized by
GAs, they were used as input to the Hopfield network to activate other relevant
concepts. The overall Hopfield network activation sensitivity was mainly
controlled by two threshold values, $\theta_0$ and $\theta_j$. In our implementation, we fixed the $\theta_0$ value at 0.0, that is, no threshold, and we changed the $\theta_j$ values to obtain different levels of activation. As shown in figure 5 (each line represents a different test case), the number of concepts generated by the Hopfield network was inversely related to the $\theta_j$ threshold. (Figure 5 omitted)
For the concepts identified by the Hopfield network, the ones activated at a higher threshold tended to be more relevant to the initial concepts. In order to prevent dilution of the original concepts by less relevant ones, good threshold values needed to be selected carefully. We adopted some empirical heuristics based on our experimentation. Because the number of activated concepts increased exponentially as we decreased the threshold value, we decreased $\theta_j$ gradually. When a new $\theta_j$ produced more than three new concepts, GANNET terminated the Hopfield network activation process. This ad hoc procedure allowed us to identify a reasonable set of related key words from the Hopfield network.
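A minimal C sketch of this stopping rule follows. activated_concepts() is a hypothetical stand-in whose jump below 0.07 mimics the exponential growth described above; real counts would come from the relaxation procedure:

```c
#include <stdio.h>

/* Hypothetical: how many concepts the Hopfield net activates at a given
   threshold (toy curve; illustration only). */
int activated_concepts(double theta_j) {
    if (theta_j < 0.07) return 30;
    return (int)((0.20 - theta_j) * 100.0 + 0.5);
}

int main(void) {
    double theta = 0.18, step = 0.02;   /* starting value from the example */
    int prev = activated_concepts(theta);

    for (;;) {
        int n = activated_concepts(theta - step);
        if (n - prev > 3) break;   /* >3 new concepts: stop, keep theta */
        theta -= step;
        prev = n;
    }
    printf("selected theta_j = %.2f (%d concepts)\n", theta, prev);
    return 0;
}
```

With these toy numbers the loop stops at $\theta_j = 0.08$, matching the final threshold chosen in the example below.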
As shown below using the same example, GANNET proceeded to the Hopfield network activation process with the output generated from the GAs, that is, the optimized key words: RETRIEVAL, US, INDEXING, KEVIN.HOT, INFORMATION RETRIEVAL SYSTEMS, STAIRS. By decrementing the $\theta_j$ value from 0.18, the final threshold was selected at 0.08. Three other related concepts were identified, namely, INFORMATION, INFORMATION RETRIEVAL, and THESAURI.
(Concepts omitted)
After the concept discovery procedure, we retrieved the set of documents that contained at least one new Hopfield-activated key word. This simple process generated a large number of candidate documents. Our experiment showed that, in order to retain relevant documents and for computational reasons, the appropriate number of documents to include in the next GA procedure was between five and fifteen. In order to adopt only highly relevant documents, we compared the Jaccard's scores of the candidate documents. As shown in the example below, the number of candidate documents generated was 2,628. (examples omitted) GANNET computed the Jaccard's score of each document and used the optimized fitness value from the prior GA process (0.4512) as a cutoff to retrieve documents. Four documents were selected. In order to identify more documents (such that the total number of selected documents was between five and fifteen), GANNET lowered the Jaccard's cutoff by 0.05, thereby identifying more documents with a fitness value greater than 0.3512. When a desired number of documents was obtained, GANNET retrieved these documents and used them as the chromosomes of the population in the next round of concept optimization (GA) and concept exploration (HP). In this example, fourteen documents were retrieved for the next GA/HP process. This GA/HP cycle was repeated until there was no further fitness improvement from the process--that is, until the process converged.
In another test case, the initial fitness of the three user-selected documents was 0.5899. After the first GA procedure, the fitness increased to 0.6349. The optimized concepts were sent to HP, which discovered PARALLEL PROCESSING.
This new key word was then used to identify more relevant documents and the
GA/HP process continued. As shown in figure 6, as a result of optimizing the
output from HP using GA--that is, an HP/GA cycle (HP followed by GA)--fitness improved
over cycles and eventually converged after three HP/GA cycles (converged
fitness: 0.7675). (Figure 6 omitted) This finding was consistent with our
desired level of fitness improvement using the proposed hybrid architecture as
shown in figure 2.
Evaluation Results
Table 1 summarizes the results
from our benchmark testing. (Table 1 omitted) In the testing we randomly
retrieved five test cases of 1-document, 2-document, 3-document, 4-document,
5-document, and 10-document examples, respectively, from the 3,000-document
database. There were thirty test cases in total. For each test case, an initial
fitness based on the Jaccard's score was computed. For 1-document and
2-document test cases, their initial fitness tended to be higher due to the
smaller sample size (see column 2 of Table 1). In Table 1, we also report
important performance measures in terms of the Jaccard's scores for the first
GA and the final HP/GA (converged) processes, the CPU times, and the average
improvements in fitness. The important findings are summarized as follows:
1. Using the basic GA
optimization process without the Hopfield activation component, our system
achieved an average fitness improvement from 5.38 percent to 17.7 percent. This
finding was consistent with the performance improvement for indexing reported
by Gordon [33].
Another interesting observation
was that when more initial documents were present, the initial fitness tended
to be lower, which allowed the system to do a better job in improving the
precision of the initial key words and in identifying other relevant documents.
As shown in Table 1, fitness improvement increased as a function of the number
of initial documents, from 5.38 percent to 17.7 percent. This finding also
suggested that when initial user-supplied documents are fuzzy and not well
articulated, GAs will be able to make a more significant contribution in
suggesting other relevant documents. This could be quite important for complex
information retrieval sessions during which searchers need help in query articulation
and search refinement.
The number of documents
suggested by GANNET after the first GA process was between 9 and 13, with an
average of about 11 documents. The CPU time required for the first GA process was also quite reasonable, with an average of 0.168 seconds.
2. With the repetitive HP/GA
cycles, the overall fitness improved even more. On average, the Jaccard's
score improved from the GA's 0.5511 to a final HP/GA score of about 0.6655. The
average fitness improvement over the initial score went from 7.19 percent for
the initial GA to 47.79 percent for the repetitive HP/GA. This finding
confirmed the power of adopting our hybrid GA-HP architecture. Similar to the
finding in GA optimization, GANNET improved the fitness more when the initial number
of documents was small and/or when the initial documents were not strongly
related.
The average number of documents
suggested by GANNET was about ten, similar to that suggested by the GA alone. But the CPU time required averaged 2.46 seconds, significantly longer than that of the first GA process. This result was expected due to the extensive computational effort required by our design. On average, GANNET performed about
five HP/GA cycles in order to reach a "global" convergence.
The system essentially performed
an incremental, evolutional leap by executing a GA optimization using the genes
presented by one population, calling the nature for more related genes,
executing another GA optimization using the newer set of genes, calling the
nature again, until finally even the nature cannot improve the population. In
figure 7 we present a graphic display of the fitness improvement over HP/GA
cycles for the thirty test cases. (Figure 7 omitted)
Conclusion
INFORMATION RETRIEVAL RESEARCH
HAS BEEN ADVANCING VERY QUICKLY over the past few decades. Researchers have
experimented with techniques ranging from probabilistic models and the vector
space model, to a knowledge-based approach and recent machine learning
techniques. At each stage, significant insights regarding how to design more
useful and "intelligent" information retrieval systems have been
gained.
We have presented an extensive
review of IR research that is based mainly on machine learning techniques.
Connectionist modeling and learning, in particular, has attracted considerable
attention due to its strong resemblance to some existing IR models and
techniques. Symbolic machine learning and genetic algorithms, two popular
candidates for adaptive learning in other applications, on the other hand, have
only been used occasionally. Based on Gordon's research using GAs in adaptive
indexing, we explored the idea of using GAs in adaptive IR and further expanded
the static GA representation to a hybrid architecture that contained both a GA
concept optimization component and a Hopfield network-based concept exploration
component.
Our hybrid prototype system,
GANNET, performed an iterative GA-HP concept optimization and exploration
process in order to evolve in a larger search space. Our benchmark testing
results confirmed the power of using GAs alone, with an average fitness
improvement of about 7 percent; when the complete GANNET cycles were used,
fitness improved even more significantly, to about 48 percent.
We believe this research has
shed light on the usefulness of adopting both genetic algorithms and neural
networks in an integrated framework for adaptive information retrieval. By
combining the robust optimization characteristics of genetic algorithms with
the associative properties of the Hopfield network, an information retrieval
system can analyze user-supplied evidence and "learn" from its users. We
envision the proposed design being incorporated into a truly interactive
retrieval environment in which users continuously provide relevance feedback
and the system adapts to their needs.
For future research, we plan to
examine the feasibility of adopting weighted bit strings (real values in 0..1)
instead of the original binary 0/1 values for the GAs. By assigning a weight to
each key word (gene), e.g., using term frequency and inverse document frequency
[73], a system may be able to better distinguish good genes (concepts, key
words) from bad ones. Another research direction will involve incorporating
more "natures" into the hybrid system. By incorporating more sources of
knowledge (i.e., multiple thesauri) [13], we believe GANNET will be able to
make more significant adjustments to its underlying representation and thus
make a bigger and better evolutional leap.
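As a hint of how such a weighted representation might behave, the sketch below generalizes Jaccard's score to weighted key-word vectors using the standard sum-of-minima over sum-of-maxima form; the tf-idf function shown is one common formulation [73], and the assumption that weights are normalized into 0..1 is ours.

```python
import math

# Illustrative weighted genes: each key word carries a weight in 0..1
# (e.g., a normalized tf x idf value [73]) instead of a 0/1 bit.

def tfidf(tf, df, n_docs):
    """One common tf-idf form; the normalization scheme is left open."""
    return tf * math.log(n_docs / df)

def weighted_jaccard(a, b):
    """Generalized Jaccard's score over weight dicts:
    sum of minima divided by sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0
```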
REFERENCES
1. Appelt, D. The role of user
modelling in language generation and communication planning. In User Modelling
Panel, Proceedings of the Ninth International Joint Conference on Artificial
Intelligence, Los Angeles, August 1985, pp. 1298-1302.
2. Belew, R.K. Adaptive
information retrieval. In Proceedings of the Twelfth Annual International
ACM/SIGIR Conference on Research and Development in Information Retrieval,
Cambridge, MA, June 25-28, 1989, pp. 11-20.
3. Blair, D.C., and Maron, M.E.
An evaluation of retrieval effectiveness for a full-text document-retrieval
system. Communications of the ACM, 28, 3 (1985), 289-299.
4. Blosseville, M.J.; Hebrail,
G.; Monteil, M.G.; and Penot, N. Automatic document classification: natural
language processing, statistical analysis, and expert system techniques used
together. In Proceedings of the Fifteenth Annual International ACM/SIGIR
Conference on Research and Development in Information Retrieval, Copenhagen,
June 21-24, 1992, pp. 51-57.
5. Booker, L.B.; Goldberg, D.E.;
and Holland, J.H. Classifier systems and genetic algorithms. In J.G. Carbonell
(ed.), Machine Learning, Paradigms and Methods. Cambridge: MIT Press, 1990, pp.
235-282.
6. Bookstein, A., and Swanson,
D.R. Probabilistic models for automatic indexing. Journal of the American
Society for Information Science, 26, 1 (January-February 1975), 45-50.
7. Carbonell, J.G.; Michalski,
R.S.; and Mitchell, T.M. An overview of machine learning. In R.S. Michalski,
J.G. Carbonell, and T.M. Mitchell (eds.), Machine Learning, an Artificial
Intelligence Approach. Palo Alto, CA: Tioga Publishing, 1983, pp. 3-23.
8. Chen, H., and Dhar, V.
Reducing indeterminism in consultation: a cognitive model of user/librarian
interaction. In Proceedings of the 6th National Conference on Artificial
Intelligence (AAAI-87), Seattle, July 13-17, 1987, pp. 285-289.
9. Chen, H., and Dhar, V. User
misconceptions of online information retrieval systems. International Journal
of Man-Machine Studies, 32, 6 (June 1990), 673-692.
10. Chen, H. and Dhar, V.
Cognitive process as a basis for intelligent retrieval systems design.
Information Processing and Management, 27, 5 (1991), 405-432.
11. Chen, H.; Hsu, P.; Orwig,
R.; Hoopes, L.; and Nunamaker, J.F. Automatic concept classification of text
from electronic meetings. Communications of the ACM, 37, 10 (October 1994).
12. Chen, H., and Lynch, K.J.
Automatic construction of networks of concepts characterizing document
databases. IEEE Transactions on Systems, Man and Cybernetics, 22, 5
(September-October 1992), 885-902.
13. Chen, H.; Lynch, K.J.; Basu,
K.; and Ng, T. Generating, integrating, and activating thesauri for
concept-based document retrieval. IEEE EXPERT, Special Series on Artificial
Intelligence in Text-Based Information Systems, 8, 2 (April 1993), 25-34.
14. Chen, H., and Mahboob, G.
Example-based document retrieval: an inductive machine learning approach.
Center for Management of Information, College of Business and Public
Administration, University of Arizona, Working Paper, CMI-WPS, 1992.
15. Chen, H., and Ng, T. An
algorithmic approach to concept exploration in a large knowledge network
(automatic thesaurus consultation): symbolic branch-and-bound vs. connectionist
Hopfield net activation. Journal of the American Society for Information
Science (1994), forthcoming.
16. Chen, H., and She, L.
Inductive query by examples (IQBE): a machine learning approach. In Proceedings
of the 27th Annual Hawaii International Conference on System Sciences
(HICSS-27), Information Sharing and Knowledge Discovery Track, Maui, January
4-7, 1994.
17. Chiaramella, Y., and Defude,
B. A prototype of an intelligent system for information retrieval: IOTA.
Information Processing and Management, 23, 4 (1987), 285-303.
18. Cohen, P.R., and Kjeldsen,
R. Information retrieval by constrained spreading activation in semantic
networks. Information Processing and Management 23, 4 (1987), 255-268.
19. Croft, W.B., and Thompson,
R.H. I³R: a new approach to the design of document retrieval systems.
Journal of the American Society for Information Science, 38, 6 (1987), 399-404.
20. Dalton, J., and Deshmane, J.
Artificial neural networks. IEEE Potentials, 10, 2 (April 1991), 33-36.
21. Daniels, P.T. The user
modelling function of an intelligent interface for document retrieval systems.
In B.C. Brookes (ed.), Intelligent Information Systems for the Information
Society. Amsterdam: Elsevier Science Publishers B.V., North-Holland, 1986.
22. Derthick, M. Mundane
Reasoning by Parallel Constraint Satisfaction, Ph.D. dissertation, Carnegie
Mellon University, 1988.
23. Doszkocs, T.E.; Reggia, J.;
and Lin, K. Connectionist models and information retrieval. Annual Review of
Information Science and Technology (ARIST), 25 (1990), 209-260.
24. Fisher, D.H., and McKusick,
K.B. An empirical comparison of ID3 and back-propagation. In Proceedings of the
Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89),
Detroit, August 20-25, 1989, pp. 788-793.
25. Fox, E.A. Development of the
CODER system: A testbed for artificial intelligence methods in information
retrieval. Information Processing and Management, 23, 4 (1987), 341-366.
26. Frawley, W.J.;
Piatetsky-Shapiro, G.; and Matheus, C.J. Knowledge discovery in databases: an
overview. In G. Piatetsky-Shapiro and W.J. Frawley (eds.), Knowledge Discovery
in Databases. Cambridge, MA: MIT Press, 1991, pp. 1-30.
27. Freund, J.E. Mathematical
Statistics. Englewood Cliffs, NJ: Prentice-Hall, 1971.
28. Frieder, O., and Siegelmann,
H.T. On the allocation of documents in multiprocessor information retrieval systems.
In Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on
Research and Development in Information Retrieval, Chicago, October 13-16,
1991, pp. 230-239.
29. Gallant, S.I. Connectionist
expert systems. Communications of the ACM, 31, 2 (1988), 152-169.
30. Goldberg, D.E. Genetic
Algorithms in Search, Optimization, and Machine Learning. Reading, MA:
Addison-Wesley, 1989.
31. Goldberg, D.E. Messy genetic
algorithms: motivation, analysis, and first results. Complex Systems, 3 (1989),
493-530.
32. Goldberg, D.E. Messy genetic
algorithms revisited: studies in mixed size and scale. Complex Systems, 4
(1990), 415-444.
33. Gordon, M. Probabilistic and
genetic algorithms for document retrieval. Communications of the ACM, 31, 10
(October 1988), 1208-1218.
34. Gordon, M. User-based
document clustering by redescribing subject descriptions with a genetic
algorithm. Journal of the American Society for Information Science, 42, 5 (June
1991), 311-322.
35. Greene, D.P., and Smith,
S.F. COGIN: symbolic induction with genetic algorithms. In Proceedings of the
Tenth National Conference on Artificial Intelligence (AAAI-92), San Jose, CA,
July 12-16, 1992, pp. 111-116.
36. Grefenstette, J.J.
Incorporating problem specific knowledge into genetic algorithms. In L. Davis
(ed.), Genetic Algorithms and Simulated Annealing. San Mateo, CA: Morgan
Kaufmann, 1987, pp. 42-60.
37. Hall, L.O., and Romaniuk,
S.G. A hybrid connectionist, symbolic learning system. In Proceedings of the
Eighth National Conference on Artificial Intelligence (AAAI-90), Boston, July
29-August 3, 1990, pp. 783-788.
38. Harp, S.; Samad, T.; and
Guha, A. Towards the genetic synthesis of neural networks. In Proceedings of
the Third International Conference on Genetic Algorithms. San Mateo, CA: Morgan
Kaufmann, 1989.
39. Hopfield, J.J. Neural
networks and physical systems with emergent collective computational abilities.
Proceedings of the National Academy of Sciences, USA, 79, 8 (1982), 2554-2558.
40. Humphreys, B.L., and
Lindberg, D.A. Building the unified medical language system. In Proceedings of
the Thirteenth Annual Symposium on Computer Applications in Medical Care.
Washington, DC: IEEE Computer Society Press, November 5-8, 1989, pp. 475-480.
41. Kitano, H. Empirical studies
on the speed of convergence of neural network training using genetic
algorithms. In Proceedings of the Eighth National Conference on Artificial
Intelligence (AAAI-90), Boston, July 29-August 3, 1990, pp. 789-795.
42. Knight, K. Connectionist
ideas and algorithms. Communications of the ACM, 33, 11 (November 1990), 59-74.
43. Koza, J.R. Genetic
Programming: On the Programming of Computers by Means of Natural Selection.
Cambridge, MA: MIT Press, 1992.
44. Kwok, K.L. A neural network
for probabilistic information retrieval. In Proceedings of the Twelfth Annual
International ACM/SIGIR Conference on Research and Development in Information
Retrieval, Cambridge, MA, June 25-28, 1989, pp. 21-30.
45. Lin, X.; Soergel, D.; and
Marchionini, G. A self-organizing semantic map for information retrieval. In
Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on
Research and Development in Information Retrieval, Chicago, October 13-16,
1991, pp. 262-269.
46. Lindberg, D.A., and
Humphreys, B.L. The UMLS knowledge sources: tools for building better user
interfaces. In Proceedings of the Fourteenth Annual Symposium on Computer
Applications in Medical Care. Los Alamitos, CA: Institute of Electrical and
Electronics Engineers, November 4-7, 1990, pp. 121-125.
47. Lippmann, R.P. An
introduction to computing with neural networks. IEEE Acoustics Speech and
Signal Processing Magazine, 4, 2 (April 1987), 4-22.
48. MacLeod, K.J., and
Robertson, W. A neural algorithm for document clustering. Information
Processing and Management, 27, 4 (1991), 337-346.
49. Maron, M.E., and Kuhns, J.L.
On relevance, probabilistic indexing and information retrieval. Journal of the
ACM, 7, 3 (July 1960), 216-243.
50. Martin, B.K., and Rada, R.
Building a relational data base for a physician document index. Medical
Informatics, 12, 3 (July-September 1987), 187-201.
51. McCray, A.T., and Hole, W.T.
The scope and structure of the first version of the UMLS semantic network. In
Proceedings of the Fourteenth Annual Symposium on Computer Applications in
Medical Care. Los Alamitos, CA: Institute of Electrical and Electronics
Engineers, November 4-7, 1990, pp. 126-130.
52. Michalewicz, Z. Genetic
Algorithms + Data Structures = Evolution Programs. Berlin: Springer-Verlag,
1992.
53. Michalski, R.S. A theory and
methodology of inductive learning. In R.S. Michalski, J.G. Carbonell, and T.M.
Mitchell (eds.), Machine Learning, An Artificial Intelligence Approach. Palo
Alto, CA: Tioga Publishing, 1983, pp. 83-134.
54. Monarch, I., and Carbonell,
J.G. CoalSORT: a knowledge-based interface. IEEE Expert (Spring 1987), 39-53.
55. Montana, D.J., and Davis, L.
Training feedforward neural networks using genetic algorithms. In Proceedings
of the Eleventh International Joint Conference on Artificial Intelligence
(IJCAI-89), Detroit, August 20-25, 1989, pp. 762-767.
56. Montgomery, D.D. Design and
Analysis of Experiments. New York: John Wiley, 1976.
57. Mooney, R.; Shavlik, J.;
Towell, G.; and Gove, A. An experimental comparison of symbolic and
connectionist learning algorithms. In Proceedings of the Eleventh International
Joint Conference on Artificial Intelligence (IJCAI-89), Detroit, August 20-25,
1989, pp. 775-780.
58. Parsaye, K.; Chignell, M.;
Khoshafian, S.; and Wong, W. Intelligent Databases. New York: John Wiley, 1989.
59. Petry, F.; Buckles, B.;
Prabhu, D.; and Kraft, D. Fuzzy information retrieval using genetic algorithms
and relevance feedback. In Proceedings of the ASIS Annual Meeting, 1993, pp.
122-125.
60. Piatetsky-Shapiro, G.
Workshop on knowledge discovery in real databases. In International Joint
Conference on Artificial Intelligence (1989).
61. Pollitt, S. Cansearch: an
expert systems approach to document retrieval. Information Processing and
Management, 23, 2 (1987), 119-138.
62. Quinlan, J.R. Discovering
rules by induction from large collections of examples. In D. Michie (ed.),
Expert Systems in the Micro-electronic Age. Edinburgh: Edinburgh University
Press, 1979, pp. 168-201.
63. Quinlan, J.R. Learning
efficient classification procedures and their application to chess end games.
In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (eds.), Machine Learning,
An Artificial Intelligence Approach. Palo Alto, CA: Tioga Publishing, 1983, pp.
463-482.
64. Quinlan, J.R. Induction of
decision trees. Machine Learning, 1 (1986), 81-106.
65. Rada, R.; Mili, H.;
Bicknell, E.; and Blettner, M. Development and application of a metric on
semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19, 1
(January-February 1989), 17-30.
66. Raghavan, V.V., and Agarwal,
B. Optimal determination of user-oriented clusters: an application for the
reproductive plan. In Proceedings of the Second International Conference on
Genetic Algorithm and Their Applications, Cambridge, MA, July 1987, pp.
241-246.
67. Rich, E. Building and
exploiting user models. In International Joint Conference on Artificial
Intelligence, Tokyo, August 1979, pp. 720-722.
68. Rich, E. User modeling via
stereotypes. Cognitive Science, 3 (1979), 329-354.
69. Rich, E. Users are
individuals: individualizing user models. International Journal of Man-Machine
Studies, 18, 3 (March 1983), 199-214.
70. Rose, D.E., and Belew, R.K.
A connectionist and symbolic hybrid for improving legal research. International
Journal of Man-Machine Studies, 35, 1 (1991), 1-33.
71. Rumelhart, D.E.; Hinton,
G.E.; and McClelland, J.L. A general framework for parallel distributed
processing. In D.E. Rumelhart, J.L. McClelland, and the PDP Research Group
(eds.), Parallel Distributed Processing, Cambridge, MA: MIT Press, 1986, pp.
45-76.
72. Rumelhart, D.E.; Hinton,
G.E.; and Williams, R.J. Learning internal representations by error
propagation. In D.E. Rumelhart, J.L. McClelland, and the PDP Research Group
(eds.), Parallel Distributed Processing, Cambridge, MA: MIT Press, 1986, pp.
318-362.
73. Salton, G. Automatic Text
Processing. Reading, MA: Addison-Wesley, 1989.
74. Schaffer, J.D.; Caruana, R.;
Eshelman, L.; and Das, R. A study of control parameters affecting online
performance of genetic algorithms for function optimization. In Proceedings of
the Third International Conference on Genetic Algorithms. San Mateo, CA: Morgan
Kaufmann, 1989, pp. 51-60.
75. Shastri, L. Why semantic
networks? In J.F. Sowa (ed.), Principles of Semantic Networks: Explorations in
the Representation of Knowledge. San Mateo, CA: Morgan Kaufmann, 1991, pp.
109-136.
76. Simon, H. Artificial
intelligence: where has it been, and where is it going? IEEE Transactions on Knowledge
and Data Engineering, 3, 2 (June 1991), 128-136.
77. Sleeman, D. UMFE: a user
modeling front-end subsystem. International Journal of Man-Machine Studies, 23
(1985), 63-77.
78. Swartout, W. Explanation and
the role of the user model: how much will it help? In User Modelling Panel,
Proceedings of the Ninth International Joint Conference on Artificial
Intelligence, Los Angeles, August 1985, pp. 1298-1302.
79. Tank, D.W., and Hopfield,
J.J. Collective computation in neuronlike circuits. Scientific American, 257, 6
(December 1987), 104-114.
80. Touretzky, D., and Hinton,
G.E. A distributed connectionist production system. Cognitive Science, 12, 3
(1988), 423-466.
81. van Rijsbergen, C.J.
Information Retrieval, 2d ed. London: Butterworths, 1979.
82. Weiss, S.M., and Kapouleas,
I. An empirical comparison of pattern recognition, neural nets, and machine
learning classification methods. In Proceedings of the Eleventh International
Joint Conference on Artificial Intelligence (IJCAI-89), Detroit, August 20-25,
1989, pp. 781-787.
83. Weiss, S.M., and Kulikowski,
C.A. Computer Systems That Learn: Classification and Prediction Methods from
Statistics, Neural Networks, Machine Learning, and Expert Systems. San Mateo,
CA: Morgan Kaufmann, 1991.
84. Wilkinson, R., and Hingston,
P. Using the cosine measure in a neural network for document retrieval. In
Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on
Research and Development in Information Retrieval, Chicago, October 13-16,
1991, pp. 202-210.
85. Wilkinson, R.; Hingston, P.;
and Osborn, T. Incorporating the vector space model in a neural network used
for document retrieval. Library Hi Tech, 10, 12 (1992), 69-75.
86. Yang, J., and Korfhage, R.R.
Effects of query term weights modification in document retrieval: a study based
on a genetic algorithm. In Proceedings of the Second Annual Symposium on
Document Analysis and Information Retrieval, Las Vegas, April 26-28, 1993, pp.
271-285.
87. Yang, J.; Korfhage, R.R.;
and Rasmussen, E. Query improvement in information retrieval using genetic
algorithms: a report on the experiments of the TREC project. In Text Retrieval
Conference (TREC-1), Gaithersburg, MD, November 4-6, 1993, pp. 31-58.
88. Zissos, A.Y., and Witten,
I.H. User modeling for a computer coach: a case study. International Journal of
Man-Machine Studies, 23 (1985), 729-750.
HSINCHUN CHEN received a Ph.D.
in information systems from the Stern School of Business at New York University
in 1989. He is an Assistant Professor of Management Information Systems at the
Karl Eller Graduate School of Business at the University of Arizona. His
research interests include computer-supported cooperative work, human-computer
interactions, text-based information management and retrieval, internet
resource discovery, knowledge acquisition and knowledge discovery, machine
learning, and neural network modeling and classification. He received an NSF
Research Initiation Award in 1992 and the Hawaii International Conference on
System Sciences (HICSS) Best Paper Award in 1994. He was recently awarded a
Digital Library Initiative grant by NSF/NASA/ARPA for a joint project with the
University of Illinois from 1994 to 1998. Dr. Chen has published more than
twenty articles in publications such as Communications of the ACM, IEEE
Computer, IEEE Transactions on Systems, Man, and Cybernetics, IEEE EXPERT,
Journal of the American Society for Information Science, Information Processing
and Management, International Journal of Man-Machine Studies, and Advances in
Computers.
JINWOO KIM received his Ph.D. in
electrical and computer engineering, with a minor in management information
systems, from the University of Arizona in 1994. He is currently a research
associate in the Electrical and Computer Engineering Department at the
University of Arizona. His research interests include evolution programs,
parallel processing, discrete-event modeling and simulation, and machine
learning. Dr. Kim has published in IEEE Transactions on Robotics and
Automation.