Artificial Intelligence
Author: Earl B. Hunt
Paperback: 407 pages
Published: Academic Press, New York, San Francisco, London, 1975
Series Editors: Edward C. Carterette, Morton P. Friedman
ISBN: 0-12-362340-5
Edition Language: English
PREFACE
For the artificial intelligence field, this is a very conservative book. My motivation for writing it developed after a number of observations of how people with one background zestfully approach another area with very little awareness of what has been done before. I also believe that training in the artificial intelligence field, and to a lesser extent, in computer science in general, is too bound to fads. Too much effort is placed on putting the student at the forefront of the field and too little in defining and presenting basic information.

In my university, for instance, the first course in artificial intelligence (obviously not taught by me) deals with the interesting but tangentially relevant theories of Piaget on human cognitive development, and then plunges the first-year graduate students into detailed studies of the latest (often good) doctoral theses from Massachusetts Institute of Technology. The trouble with this approach, as I see it, is that the student is offered a good view of a few trees (chosen by the professor), and no idea either of the forest or of botany. In other words, I think that the professors and the texts should assume more responsibility for analyzing the basic principles and problems. After this is done, it is appropriate to have seminars dealing with the latest, hot-off-the-press research.

So what is to be done? I have tried to gather together, in one place, a reasonably detailed discussion of the basic mathematical and computational approaches to problems in the artificial intelligence field. Emphasis has been on principles, not details. Also, I have tried to focus on the major unsolved conceptual problems, especially in the fields of machine comprehension and perception. You will find no descriptions of robots in this book. Neither will you find a machine playing chess.
You will find discussions which, if you will wade through them, will help you understand the basic approaches of machine pattern recognition, robot perception, deductive question answering, and computer game playing.
The level of discussion is appropriate for seniors and graduate students in any of the computer related sciences, or in experimental psychology. You must be familiar with algebra, and should be passingly familiar with the concepts of probability theory, calculus, and symbolic logic. Persons who do not have some mathematical background, however, probably will not find this text very illuminating.
ACKNOWLEDGMENTS
Dr. J. Ross Quinlan, of the University of Sydney, and I had intended to write this book together, and did work on one or two chapters. Alas, we were defeated by the Pacific Ocean. It simply proved impossible to communicate frequently enough, especially in the light of the rapidly expanding literature. Nevertheless, Ross is responsible for a high proportion of those good ideas I have proposed and probably should be blamed for very few of the bad ones . . . and I certainly don't know which. Thanks, Ross.
My second thanks for scientific collaboration goes to Dr. Sharon Sickel, who forced me to keep my nose to the grindstone of mathematical theorem proving, just to be sure I knew what she was talking about. My discussion of theorem proving owes a great deal to her clear and precise reasoning.
A number of other associates have helped in various ways, either by their assistance in some of the specific research problems described here, or by their comments on various proposals and on earlier drafts of the text. I would like to mention particularly Mark Stickel, David Garnatz, Patrick Russell, Edward Strasbourger, and Michael Delay, all of the University of Washington. Very helpful comments on the text have been received from Edward Carterette of U.C.L.A., Cordell Green of Stanford University, and A. A. Marley of McGill.
Most of the original research reported here was sponsored by the U.S. Air Force, Office of Scientific Research (Air Systems Command). Indeed, the general plan of the book arose from discussions I had with those charged with various Air Force computer applications, when I asked what sort of consulting guidance they could use. It was there that a need for a book of basic techniques was most apparent, and most clearly stated. Dr. Glenn Finch, now retired, but then with the Office of Scientific Research, was a staunch source of encouragement and support for several years.
Many secretaries worked on this manuscript at various stages. I won't try to name them. They all did a good job. Without the courtesy of the various scientists who responded to preprint requests and put me on their mailing lists, I could not have written a book of this nature, nor could anyone else.
During the period in which I wrote this book I held the post of Chairman of the Psychology Department at the University of Washington. My colleagues in that department respected the idea that their administrator should have a scholarly life as well, and regulated their requests for services so that it was possible for me to maintain a scientific career. I appreciate this.
Neither Academic Press nor my family left me during long periods when I failed to communicate and did not seem to have produced any tangible results. Completing
this book provides Academic Press with a manuscript and my family with more of my time. I hope both press and people will be pleased.
Chapter I
THE SCOPE OF ARTIFICIAL INTELLIGENCE
1.0 IS THERE SUCH A THING?
There are over 100 computer science curricula in universities in the United States. Practically every one of them includes a course entitled artificial intelligence. In fact, it appears that more people study programming techniques developed for artificial intelligence research than study techniques for business-data processing (Elliot, 1968). The educational programs are matched by comparable research activity. There is a journal called Artificial Intelligence and a series of scientific conference reports on machine intelligence. What gives rise to this activity?
If you asked physicists or chemists to offer an abstract definition of their field, you would expect to find them in substantial agreement. It is doubtful whether you would find such agreement if you were to gather together the various scientists studying artificial intelligence. Interestingly, however, you probably would find agreement if you examined the details of the courses they teach and the research they do. The Association for Computing Machinery's recommended computer science curriculum contains an outline for course A9, Artificial Intelligence, which lists theorem proving, game playing, pattern recognition, problem solving, adaptive programming, decision making, music composition by computer, learning networks, natural-language data processing, and verbal and concept learning as suitable topics (ACM Curriculum Committee, 1968). Similar veins run through the research literature. No one can question the existence of a scientific field of activity. But what is its significance?
One of the tests that has been proposed to determine the viability of a scientific field asks the question, "Is it having an effect on fields closely related to it?" By this criterion, artificial intelligence rates rather well. Programming techniques originally developed for artificial intelligence research have become widespread in computer programming. Psychology has undeniably been influenced by the concepts of artificial intelligence in a number of fields (Hunt, 1968; Frijda, 1972; Loehlin, 1968; Miller, Galanter, & Pribram, 1960; Weisstein, 1969). In chemistry, artificial intelligence techniques have been applied to the analysis of data from mass spectroscopes (Lederberg & Feigenbaum, 1968) and in planning the synthesis of organic molecules (Corey & Wipke, 1969).
There appears to be a meaningful body of knowledge whose unifying principles are difficult to identify. The problem seems to lie in the definition of intelligence. Psychologists, who have faced the problem of defining intelligence for some time, have adopted the pragmatic approach that intelligence is what the intelligence test measures. I shall do the same. For the first 90% of this book, "artificial intelligence" will simply be the collection of things taught in artificial intelligence courses. I shall select a subset of these topics which appear to me to be most interesting or important, and discuss them in some detail. By taking this approach, I postpone such broad philosophical issues as, "Can a machine think?" and "Is it possible to speak of machine understanding?" until a common background of factual knowledge has been established. To do this, the book has been divided into four sections. The first section contains, in this chapter, an overview of the various fields of artificial intelligence and, in the next chapter, an attempt to connect artificial intelligence problems to some of our notions of computability and abstract computing devices. No attempt will be made to give a comprehensive history, but a book of this nature should contain some overview and at least a few, possibly idiosyncratic, notions of how the field developed historically. Similarly, the chapter on abstract computing is not intended to be a rigorous introduction to the field, but rather an overview of the general notion of computability, with emphasis on the interaction between computability theory and artificial intelligence.
The remaining chapters are divided into sections dealing with substantive knowledge in three areas that are basic to all artificial intelligence: pattern recognition, problem solving, and machine comprehension. No attempt has been made to provide a comprehensive literature review. (There are well over 1500 relevant articles.) The goal has been to explain in detail those approaches and principles which appear to have the greatest promise in each of the fields under discussion. Completeness of coverage has thus been sacrificed; many artificial intelligence projects are simply not mentioned.
When I began work on this book, it soon became apparent that an analysis of previously published reports simply would not provide answers to several important questions which could be raised. Therefore, in conjunction with the discussion, previously unpublished research will be presented. For the most part, this work is concentrated in the fields of theorem proving and pattern recognition, which are the fields most crucial to other artificial intelligence endeavors. Most of the studies can be characterized as attempts to fill gaps, rather than to extend our knowledge of given approaches to a problem.
In the final chapter, I have confronted the philosophical questions which I had been avoiding. My answer to the question "Can a machine think?" will probably be an unsatisfactory one to enthusiasts on both sides.
1.1 Problem Solving
"Problem solving" has a rather restricted meaning in the artificial intelligence lexicon. A problem is said to exist when one is given a present state, a description of the characteristics of a desired state, and the operations that can be used to go from one state to another. There may be constraints specifying that certain states are not to be visited at any point. This formulation is known as the state-space approach. A few examples will show how generally applicable it is.
In the "missionaries and cannibals" problem, three missionaries and three cannibals wish to cross a river from the left bank to the right. They have available a boat which can take two people on a trip. All can row. The problem is to get all six to the right bank, subject to the culinary constraint that at no time may the number of missionaries on either side of the river be exceeded by the number of cannibals on that side. To translate the puzzle into a formal problem, let a state be defined by the number of missionaries and cannibals on the left bank and the position of the boat. The starting position is (3, 3, L) and the goal state (0, 0, R).
The permissible moves of the boat define the operators.

In theorem proving, one is given a set of true statements (premises) and a statement whose truth is not known (the theorem to be proved). By applying a sequence of allowable operations, such as the inference rules of algebra or trigonometry, expand the set of true statements to include the theorem. The states are defined by the statements proven thus far; the inference rules define the operators.
Given a chess position, change it into a position in which the opponent's king is checkmated. En route to this position, avoid any position in which your own king is checkmated or in which a stalemate occurs. The board positions define the states, and the piece moves the operator.
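The state-space formulation is concrete enough to program directly. The sketch below (Python; all names are my own invention, not from the text) solves the missionaries-and-cannibals puzzle by breadth-first search: a state is (missionaries on left, cannibals on left, boat side), the boat loads are the operators, and the culinary constraint filters out forbidden states.

```python
from collections import deque

def legal(m, c):
    # Missionaries are never outnumbered on either bank (zero missionaries is fine).
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def solve():
    # A state is (missionaries on left bank, cannibals on left bank, boat side).
    start, goal = (3, 3, 'L'), (0, 0, 'R')
    crossings = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]  # boat loads (m, c)
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        (m, c, side), path = frontier.popleft()
        if (m, c, side) == goal:
            return path
        sign, nside = (-1, 'R') if side == 'L' else (1, 'L')
        for dm, dc in crossings:
            nm, nc = m + sign * dm, c + sign * dc
            if 0 <= nm <= 3 and 0 <= nc <= 3 and legal(nm, nc):
                nxt = (nm, nc, nside)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [nxt]))

path = solve()
```

Because the search is breadth-first, the first solution found uses the fewest crossings (eleven, hence twelve states including the start).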
It has been said that modern artificial intelligence research began with the efforts of Allen Newell, Herbert Simon, and J. C. Shaw to write problem-solving programs (Feigenbaum, 1968). Their studies were conducted jointly at the RAND Corporation and the Carnegie Institute of Technology (now Carnegie-Mellon University).
Intellectually, the RAND-Carnegie work was important both in its own right and because it set the tone for many other efforts. Technologically, the programming methods developed in the course of the research have become widely used throughout computer science.
Newell et al. (1957) first produced the Logic Theorist (LT), a program designed to prove theorems in the sentential calculus. The LT successfully proved 38 of the 52 theorems of Chapter 2 of Whitehead and Russell's Principia Mathematica.1 As the Principia is regarded as one of the basic works establishing the logical foundations of mathematics, LT's feats were bound to attract interest. It is easy either to overstate or unduly minimize the significance of LT's success in reproducing the Principia results (references are purposely omitted here), so one wants to be aware of precisely what was done.
1 Twelve of the 14 unsolved problems were not completed because of the physical limitations of the computer then available. The others were beyond the capacity of the algorithm for logical reasons. A modified LT operating on a larger computer later solved all 52 theorems (Stefferud, 1963).
The proofs of the theorems in Chapter 2 of Whitehead and Russell would not be considered deep by a mathematician. Bright university students can generally produce most of them. The brilliance of Whitehead and Russell's accomplishment was not in proving the theorems, but in realizing that their proof could be used as the basis for a development which leads from logic to mathematics. This realization was an act of creative insight which Newell, Shaw, and Simon made no attempt to mimic on a machine. On the other hand, the problems of Chapter 2 are not exactly trivial. These proofs probably are beyond the grasp of 60% of humanity, a fact one is likely to overlook if his circle of acquaintances is restricted to present and future Ph.D.'s. Newell et al. have repeatedly stated that the real significance of LT lies in how it produced proofs, not in the proofs it produced. Theorem proving is an example of a large class of problems for which there are solution techniques which are known to work, but which are not practical to execute. For example, the following exhaustive technique is guaranteed to produce a proof for any provable theorem—but it may take a while. . . .
Beginning with the premises, write down all inferences that can be made by combining two or more known true statements in various ways. Examine the set of statements so produced, to see if it contains either the theorem or its negation. In the former case, the theorem is proven; in the latter, it is disproven. If neither case occurs, add the new set of statements to the premises and repeat the procedure. There will be some number, n, such that a proof (or disproof) will be produced on the nth step, but there is no guarantee what n will be.2
Newell, Shaw, and Simon called this procedure the "British Museum algorithm," since it seemed to them as sensible as placing monkeys in front of typewriters in order to reproduce all the books in the British Museum. They suggested instead following a heuristic approach, a term they took from Polya (1954, 1957), who believed that most mathematical proofs are achieved by guessing the nature of a solution, then proving that the guess is correct. Polya contrasted this with the algorithmic technique of mechanically going through steps which are bound, eventually, to result in the correct answer.3 The British Museum algorithm is clearly algorithmic. What Newell and his colleagues set out to do was to write a set of rules (i.e., a program) for generating guesses, then proving that they were correct. The idea caught on quickly, and today "heuristic programming" is spoken of respectfully, and somewhat mysteriously, as an advanced programming technique, even though it is not a programming technique at all, but rather a way of thinking about what a program is supposed to do when it is used to attack a certain problem.
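For concreteness, the exhaustive procedure just described can be sketched in a few lines of Python. The "rules" here are a toy stand-in for real inference rules (each maps a set of antecedent statements to a consequent), and for brevity the sketch checks only for the theorem, not its negation; it illustrates the breadth-first flavor of the method, not the sentential calculus itself. All names are mine.

```python
def british_museum(premises, rules, theorem, max_steps=100):
    # Each rule maps a frozenset of antecedent statements to a consequent.
    known = set(premises)
    for step in range(1, max_steps + 1):
        # Write down everything derivable from the currently known statements.
        new = {concl for ants, concl in rules if ants <= known} - known
        if theorem in known | new:
            return step          # proof produced on the nth step
        if not new:
            return None          # closure reached; theorem not derivable
        known |= new
    return None

# Toy derivation: from p, derive q, then r, then s.
rules = [(frozenset({'p'}), 'q'),
         (frozenset({'q'}), 'r'),
         (frozenset({'p', 'r'}), 's')]
steps = british_museum({'p'}, rules, 's')
```

As the text warns, there is a step number n at which the proof appears, but nothing bounds n in advance; the procedure is exhaustive rather than clever.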
In writing the Logic Theorist, Newell, Shaw, and Simon encountered problems that forced them to develop an important new programming tool, list processing. This is a method of organizing a computer's memory so that ordered sets (lists), instead of variables, are the basic operands. List-processing techniques have found wide application throughout computer science (Knuth, 1969). It can also be argued that in thinking about complex information processing the appropriate language to use is a language for the manipulation of lists, since list processing has proven to be a very important tool in many applications involving symbolic rather than numeric processing.
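The essential idea of list processing is easy to sketch in a modern language. The fragment below (Python; names mine) builds lists out of "cons cells," each pairing a head value with the remainder of the list, in the style that the Newell-Shaw-Simon IPL languages and, later, LISP made standard.

```python
NIL = None  # the empty list

def cons(head, tail):
    """A list cell: a value paired with the rest of the list."""
    return (head, tail)

def car(cell):
    return cell[0]  # first element of the list

def cdr(cell):
    return cell[1]  # remainder of the list

def length(cell):
    return 0 if cell is NIL else 1 + length(cdr(cell))

def append_lists(a, b):
    # Rebuild a's cells in front of b; b itself is shared, not copied.
    return b if a is NIL else cons(car(a), append_lists(cdr(a), b))

lst = cons(1, cons(2, cons(3, NIL)))  # the list (1 2 3)
```

The ordered set, not the individual variable, is the operand: length and append walk the chain of cells rather than index a fixed-size array.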
2 The procedure may not terminate if the "theorem" to be proven is not derivable from the axioms. This point is discussed in more detail in Part III.
3 Polya did not give a formal definition of either algorithm or heuristic, nor did Newell et al. Currently within computer science the term algorithm is used to mean a procedure for operating on strings of input sentences from a specified set of legal strings (Glushkov, 1967). By this definition, any program is an algorithm, and the Newell et al. programs should be thought of as algorithms for generating guesses.
The RAND-Carnegie approach to problem solving was characterized by a strong interest in how humans solve problems. A program is a precisely defined set of rules for attacking a given class of problems. Suppose a reasonable correspondence were to be established between the input and output of a program and the stimuli and responses observed in the psychological laboratory. Then one could say that, at the level of information processing, the program was a model of the man (Newell, Shaw & Simon, 1958; Newell & Simon, 1961, 1972). Simulating human thought, however, is a different goal from creating a good problem solver, since in one case the criterion of success is fidelity to observed human behavior, while in the other it is simply that good problem solving be produced. Newell and Simon (1961) saw no incompatibility in the joint pursuit of these goals, and appear to have used knowledge of human problem solving to suggest the structure of heuristic programs, and vice versa. Intuitively, this can be justified on the grounds that people are the most flexible problem solvers of which we have knowledge; so if we want to construct artificial intelligences, we should first study how the natural ones work.
The argument that programming should mimic human intelligence is weak if you are interested in solving problems within a specialized area in which "inhuman" methods may work well. The Newell et al. work on mathematical theorem proving was criticized on the grounds that more efficient exhaustive search methods than the British Museum algorithm existed, and that by using better exhaustive methods, better results could have been obtained (Wang, 1960). This criticism somewhat misses the point, since Newell and his co-workers were more interested in the generality of their methods than in substantive results in theorem proving. By the same reasoning, it was appropriate for Newell et al. to investigate the theorems of the Principia as an example of a general class of problems, even though group theory might have provided problems that would be more interesting to a mathematician.
The skeptics' retort is that general problem solving skills, divorced from content areas, may not exist. Newell and his associates' next project, the General Problem Solver (GPS) program, attempted to show (a) that such skills do exist and (b) that they can be discussed at the very concrete level of computer programming. Whereas the LT had had built into it the operations used in Whitehead and Russell's formalization of the propositional calculus, the GPS was a program for manipulating states and operators in the abstract. To use the GPS on a specific problem, one first had to define the structure of specific states and operators (e.g., the locations of missionaries and cannibals and the moves of the boat) to the program. The act of specification was called describing the task environment. The GPS program was capable of attacking any problem that could be translated into a suitable task environment. The goal of GPS research has been to show that well specified generally applicable procedures (i.e., programs) lead to the sort of solutions which, when they are produced by humans, are applauded as clever. The list of problems attacked by the GPS and similar programs includes elementary logic, chess, high school algebra word problems, and the answering of questions phrased in somewhat ambiguous English, but confined to a very small data base. In one of the most extensive investigations the GPS was used to attack ten different small problems in fields ranging from symbolic integration to solution of the missionaries and cannibals puzzle (Ernst & Newell, 1969).
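The separation between a domain-independent solver and a task environment can be illustrated as follows. This sketch (Python; names mine) substitutes blind breadth-first search where GPS used means-ends analysis, so it captures only the architectural point: the solver never mentions the domain, which is supplied entirely by the task environment.

```python
from collections import deque

class TaskEnvironment:
    """Everything the solver is told about a specific problem:
    a start state, a goal test, and named operators."""
    def __init__(self, start, is_goal, operators):
        self.start, self.is_goal, self.operators = start, is_goal, operators

def general_solver(env):
    # Domain-independent search: the solver knows nothing about the states.
    frontier, seen = deque([(env.start, [])]), {env.start}
    while frontier:
        state, plan = frontier.popleft()
        if env.is_goal(state):
            return plan
        for name, op in env.operators:
            nxt = op(state)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None

# A tiny task environment: reach 10 from 1 by doubling or adding one.
env = TaskEnvironment(
    start=1,
    is_goal=lambda s: s == 10,
    operators=[('double', lambda s: s * 2 if s * 2 <= 10 else None),
               ('add1',   lambda s: s + 1 if s + 1 <= 10 else None)])
plan = general_solver(env)
```

Describing a new task environment (the missionaries and cannibals, say, or a logic problem) requires no change to the solver itself, which is the sense in which a program of this form embodies "general" problem solving skill.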
On several occasions Newell and Simon have offered comparisons of traces of a theorem proving program's progress toward solution with comments recorded as a person "thinks out loud" as evidence in support of their contention that their program modeled human thought. In 1972, they published a large body of such data and proposed a general technique for designing programs intended as general simulations (Newell & Simon, 1972). About the same time that the RAND-Carnegie group began to attract attention, a closely related and similarly active group formed at the Massachusetts Institute of Technology, under the leadership of Marvin Minsky and John McCarthy.
Subsequently McCarthy and Feigenbaum, a member of the Carnegie group, both moved to Stanford University, which, together with the Stanford Research Institute, has also become a major center of artificial intelligence studies. Both directly through the research of members of the groups, and indirectly because of the prestige of the institutions involved, the M.I.T. and Stanford groups have heavily influenced the American study of computer problem solving.4
In contrast to the early Carnegie-RAND studies, the M.I.T. research was more concerned with formal mathematical representations. Typical problems studied included symbolic integration (Slagle, 1963), question answering using trivial data bases (Raphael, 1964), the combining of deductive arguments and information retrieval (McCarthy, 1959; Slagle, 1965; Green, 1969), and that hardy perennial, chess playing by machine (Greenblatt et al., 1967). McCarthy, Minsky, and their co-workers seem to see artificial intelligence as an extension of mathematics and symbolic logic, rather than as a parallel discipline to psychology. They have been extremely careful to formalize their computing procedures and to relate them to more conventional mathematics, in particular to recursive function theory (McCarthy, 1961; Berkeley & Bobrow, 1964). Attempts by Amarel (1968) and Banerji (1969) to formalize problem solving processes are also worth examination, as they represent independent efforts directed to the same goal. An emphasis on formalization is of great potential value, since an adequate formalization will be necessary if we are to have a precise, viable theory of the problem solving process.
On the other hand, such a theory does not now exist, and it is our feeling that in many specific artificial intelligence projects the informal approach of the Newell-Simon variety would have been just as satisfactory as the often forbidding formalisms that have been presented.
4 Since 1965, the M.I.T. and Stanford groups have devoted a good deal of effort to the design and construction of robots, a problem which does not appear to have involved Newell, Simon, and their colleagues. Similarly, since 1965, the Newell-Simon group seems to have moved more toward the study of psychological problems. Naturally, no such general statement can cover all the members of a group.
A complementary approach to problem solving, based on the ideas of the mathematician J. A. Robinson (1965), has also greatly influenced the study of mechanical thought. Narrowly conceived, Robinson dealt only with a new approach to theorem proving in formal logic; but his methods can be applied to virtually any situation we normally call problem solving. As we shall see, the technical details of the method can be formidable. The basic logic is quite simple. Any statement can be proven true by showing that its negation is false. In theorem proving one must show that the statement
A ⊃ B (1)
is true, where A is the hypothesis of the problem and B the desired conclusion.
Now suppose that A and B are sets of statements. If we were to use the Newell and Simon approach to proving that (1) is true, we would begin with the set A0 of axioms, and then apply some selected rules of inference to produce a new set, A1, of axioms and inferred statements. Rules of inference would be applied to A1 to produce a second set, A2, and then a third, and fourth, until some Ai was found that contained B, the set of statements to be proven, as a subset.
An alternative way to prove (1) would be to show that its negation
¬(A ⊃ B) = ¬(¬A ∪ B) = (A · ¬B) (2)
was false.5 This can be done by showing that the simultaneous assertion of A and ¬B leads to a contradiction. Robinson proposed that this be done by the use of a single rule of inference, called resolution. We introduce the idea with a trivial example. Consider the musical comedy problem:
If the lady is Charley's aunt, then she will visit with Charley. But the lady is never seen with Charley.
Using a rather obvious translation to the formalisms of logic we have the clauses
C1 (Aunt(lady, Charley) ⊃ Present(lady, Charley)) (3)
C2 (¬Present(lady, Charley))
5 Throughout the text "¬" will indicate negation, "∪" disjunction, and "·" conjunction. Note that if B is a set of statements which, taken together, are to be interpreted as a single statement, then ¬B is the disjunction of the negation of each statement in B.
Let A stand for "Aunt(lady, Charley)" and P for "Present(lady, Charley)." The clauses of (3) are the axioms of the problem. The hypothesis is that the lady is not Charley's aunt.6 This would be written (¬A). To prove this conclusion by the resolution method we want to negate it [producing (A)], and add the resulting clauses to the axioms. Doing this, and rewriting clause C1 in disjunctive form, the set of clauses
C1 (¬A ∪ P), C2 (¬P), C3 (A). (4)
is produced.
We now introduce the second inference step in Robinson's method. Suppose two clauses are of the form (A ∪ B) and (¬A ∪ C). It follows that
(A ∪ B) · (¬A ∪ C) ⊃ (B ∪ C) (5)
An inference of this sort is called a resolution of the clauses on the left of the ⊃ to produce the resolvent clause on the right. A contradiction exists if two clauses of the form (A) and (¬A) are simultaneously asserted. Formally, two such clauses can be resolved to produce the empty clause, ( ). Inference of the empty clause is an indication that the original set of clauses contained a (perhaps implied) contradiction.
In view of this background, we now have two ways of proving by resolution that the lady is not Charley's aunt. In one line of proof clauses C1 and C2 are resolved to produce the clause (¬A). This resolves with clause C3 to produce the contradiction. Discovery of the other proof is left to the curious.
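The entire refutation can be mechanized in a few lines. The sketch below (Python; names mine) represents a clause as a set of literals, generates all resolvents of clause pairs, and reports a contradiction when the empty clause appears; applied to the clauses of (4) it finds the proof just described.

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses. A clause is a frozenset of
    literals; a literal is a string, with '~' marking negation."""
    resolvents = []
    for lit in c1:
        comp = lit[1:] if lit.startswith('~') else '~' + lit
        if comp in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return resolvents

def refute(clauses):
    """Saturate the clause set under resolution; True iff the empty
    clause (an explicit contradiction) is ever produced."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:          # the empty clause ( )
                    return True
                new.add(r)
        if new <= clauses:         # nothing further to infer
            return False
        clauses |= new

# Clauses (4): C1 (~A v P), C2 (~P), C3 (A), with the negated conclusion included.
charley = [frozenset({'~A', 'P'}), frozenset({'~P'}), frozenset({'A'})]
contradiction = refute(charley)
```

Since the literals are drawn from a finite set, saturation must terminate in the propositional case; the complications of the predicate-calculus method (unification, for one) are exactly what this sketch omits.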
In practice there are many complications which have not been illustrated. Nevertheless, it can be shown that resolution is a complete proof method. This means that if a contradiction can be obtained from a set of clauses by any valid proof method, then it can be obtained by resolution. In Polya's terminology, therefore, resolution is an algorithm rather than a heuristic. We shall see that heuristic techniques do play a prominent part in practical applications of the resolution method.
There have been very many amplifications of the basic resolution method. The major results are discussed in Part III. Quite apart from the specific results, one of the effects Robinson's work had was to shift the emphasis in artificial intelligence from attempts to mimic human solutions on machines to concern for machine-oriented problem solving methods. A second effect of Robinson's work has been a renewal of interest in practical applications of theorem proving techniques, especially to mechanized information retrieval.
6 "She" was Charley in disguise.
1.2 Pattern Recognition
It may be true that every object is unique, but life would be impossible if we treated this idea too seriously. For many purposes we find it convenient to treat unique beings as typical—typical service station attendants, elevator operators, or collectors of internal revenue. We let our knowledge of the characteristics of the general class determine our reaction to the specific example. To avoid disaster we must learn to make appropriate classifications. It has been suggested that the ability to do this is basic to intelligent behavior (Bourne, 1965; Bruner, Goodnow, & Austin, 1956; Guilford, 1967; Hunt, 1962). People make subtle classifications so routinely that we do not realize how impressive their performance is. A man returning home will instantly recognize his wife, children, and the family dog.
Obviously, the differences between the way the wife, and certainly the dog, looked in the morning and evening are trivial, but this is precisely the point. How do we specify to a computer what is a trivial change in a pattern and what is a change sufficient to require a new classification?
In an influential early paper, Selfridge (1958) proposed that pattern recognition be achieved by computing the weighted sum of a number of "recommended" classifications, each based on different features of the object to be recognized (descriptors). The individual recommendations would need to be only slightly more accurate than chance, but the system might be quite accurate. This idea has been developed into what will be called the parallel method of pattern recognition. Every object can be considered to have a most primitive description, representable as a vector, whose elements serve as the arguments for a number of functions, whose values are always computed. These, in turn, serve as the arguments for a decision function which determines the eventual classification. The preceding description, which does not do full justice to Selfridge's ideas, presents pattern recognition as a problem in classifying vectors, or points in n-dimensional space. This view has been made more explicit by Sebestyen (1962) in a monograph relating pattern recognition problems to mathematical decision theory, and by Meisel (1972) in an excellent discussion of pattern recognition techniques. These aspects of pattern recognition are closely related to classical statistical techniques in multivariate analysis (Anderson, 1952; Tatsuoka, 1971). Pattern recognition may also be approached by analogy to biological processes.
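A minimal version of the parallel scheme is easy to write down. In the Python sketch below (all names illustrative, not Selfridge's), each detector inspects one feature of the primitive description vector and casts a vote; the decision function is a weighted sum compared against a threshold.

```python
def parallel_recognizer(detectors, weights, threshold):
    """Weighted vote over feature detectors, after the parallel method."""
    def classify(x):
        score = sum(w * d(x) for d, w in zip(detectors, weights))
        return 1 if score >= threshold else 0
    return classify

# Toy detectors over a two-element description vector: each one
# inspects a single feature and casts a 0/1 recommendation.
detectors = [lambda x: 1 if x[0] > 0 else 0,
             lambda x: 1 if x[1] > 0 else 0]
classify = parallel_recognizer(detectors, weights=[1.0, 1.0], threshold=2.0)
```

With more detectors and suitably chosen weights, each individual recommendation can be only weakly informative while the weighted total remains accurate, which is the point of Selfridge's proposal.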
In some circumstances animals' capacity for pattern recognition exceeds that of any buildable machine. For simplicity, consider only human capabilities. When we are dealing with classification based on immediate sensory experience, e.g., recognizing faces or spoken words, humans can easily outperform machines. In nonsensory situations human performance is less impressive. For example, people cannot compete with pattern classification programs if the correct classification rule involves logical combinations of abstract properties, such as color, size, and form (Hunt, Marin, & Stone, 1966). To make the problem more complex, there are situations in which people perform erratically, either very well or very poorly, and it is not clear why. What pattern recognition is involved in reasoning by analogy? The topic is worth study.
Since pattern recognition must be a neural function in animals, one might look for the key to biological pattern recognition in the properties of the neuron itself. For many purposes it may be assumed that a neuron is a threshold detection element.
That is, it either produces a constant output when the sum of its inputs reaches a certain point, or it is quiescent. McCulloch and Pitts (1943) have proven that any computable function can be realized by a suitably organized network of idealized neurons, threshold detection elements whose logical properties can reasonably be attributed to an actual neuron. The question, then, is whether any reasonable network reorganization principle can be found which would make it possible for an initially randomly connected group of idealized neurons to organize themselves into a "computing device" capable of solving an arbitrarily selected pattern recognition problem.7 Such a reorganization principle would be a theory of learning applicable at the level of the individual neuron. It is intuitively obvious that such a principle must exist, since animals demonstrably do learn new classification rules, and it is
ridiculous to believe that they are born "prewired" for all the classifications which they might learn.8 A neurological learning theory put forward by the Canadian psychologist, D. O. Hebb (1948), although originally intended strictly as a model for psychology, has proven quite influential in artificial intelligence. A modification of it was used to define the Perceptron pattern recognizer (Rosenblatt, 1958, 1962). The Perceptron, or rather, perceptrons, since what Rosenblatt described was a principle for building programs rather than a single program, exist both as programs and as specially designed analog computers. Substantial effort has been devoted to an analysis of the general class of pattern recognition systems which perceptrons represent. The concepts of linear pattern recognition systems and of threshold logic systems have been developed. The first term refers to the method used for combining the individual judgments of different feature-recognition elements, and the second to the use of devices which give a constant signal once the level of their input signals exceeds a fixed value. A substantial mathematical theory describing such devices has been developed (Nilsson, 1965a), culminating in an analysis of the type of problems which can be solved by linear, threshold logic systems (Minsky & Papert, 1969).
7 Suppose that every object to be classified is completely described by a vector of not more than k binary digits. Therefore, there will be at most 2^k possible distinguishable objects. We desire a function which, upon receiving one of the 2^k possible descriptions, will produce a 1 if and only if the described object is in class "A," and a 0 otherwise. One of the implications of McCulloch and Pitts's work is that such a function can be constructed from a network of idealized neurons.
8 It is not so ridiculous to believe that animals are predisposed toward learning certain types of functions. The nervous system as a whole certainly is not random. On the other hand, the genetic message does not contain enough information to specify all the details of the adult nervous system.
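The idealized neuron of footnote 7 is easy to write down concretely. The sketch below builds threshold detection elements and wires a few of them into a small network; the choice of exclusive-or as the target function is a standard illustration, not drawn from the text.

```python
# A sketch of an idealized neuron: a threshold logic unit that fires (1)
# when the weighted sum of its inputs reaches its threshold, and is
# otherwise quiescent (0). Weights and thresholds are illustrative.

def tlu(weights, threshold):
    """Build a threshold detection element with fixed weights."""
    def fire(*inputs):
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1 if s >= threshold else 0
    return fire

AND = tlu([1, 1], 2)
OR = tlu([1, 1], 1)
NOT = tlu([-1], 0)

# A small network of such units realizes a function (exclusive-or) that
# no single threshold unit can compute by itself:
def XOR(a, b):
    return AND(OR(a, b), NOT(AND(a, b)))

print([XOR(a, b) for a in (0, 1) for b in (0, 1)])  # -> [0, 1, 1, 0]
```

In the spirit of McCulloch and Pitts's result, any Boolean function over k binary inputs can be realized by composing such units, though the construction says nothing about how a randomly connected network might reorganize itself to find the right wiring.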
In research on perceptrons the emphasis has been on adjusting the weights assigned to a fixed set of feature detectors. This is consistent with Selfridge's formulation of the problem. An alternative pattern-recognition technique is to search for "good" features, which discriminate among classes so clearly that an appropriate feature weighting rule will be easy to detect. This approach has been stressed in a number of research projects, notably those of Uhr and his associates (Uhr, 1965; Uhr & Vossler, 1963) and in the Soviet literature (Bongard, 1970). An examination of recent articles in the journal Pattern Recognition indicates that greater attention is being paid to feature detection than to feature weighting.
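The weight-adjustment emphasis can be illustrated by a perceptron-style error-correction sketch. The training task (logical OR), the learning rate, and the epoch count below are chosen only for illustration; the error-correction rule itself is the standard one associated with perceptron research.

```python
# A sketch of perceptron-style learning: the feature set is fixed, and
# only the weights change, by error correction after each example.

def train_perceptron(samples, n_features, epochs=20, rate=1.0):
    """Return weights (last entry is the bias) fitted by the rule
    w <- w + rate * (target - output) * input."""
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for x, target in samples:
            xs = x + [1.0]                   # append constant bias input
            out = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0
            err = target - out               # 0 when the guess was right
            w = [wi + rate * err * xi for wi, xi in zip(w, xs)]
    return w

def predict(w, x):
    xs = x + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0

# Logical OR is linearly separable, so the rule converges.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = train_perceptron(samples, 2)
```

Because the procedure can only adjust weights on the given features, it is limited to linearly separable problems—precisely the limitation analyzed by Minsky and Papert.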
The pattern recognizers mentioned so far are at least analogs to biological pattern recognition. In biology the term "pattern recognition" is implicitly assumed to refer to classification at the sensory level. This orientation is shown by frequent reference to visual pattern-recognition examples. Psychologists have used the term "concept learning" to refer to a task that is mathematically identical to pattern recognition but different at a psychological level. The contrast is easy to illustrate. Imagine that a program has been written to distinguish between pictures of men and women. Now what does "showing a picture to a computer" mean? It means that shades of gray in small areas of a picture are by some method coded into numbers, and the resulting vector used as input to a pattern recognition program. A very simple example is shown in Figure 1-1.
Fig. 1-1. Simple example of how a picture is mapped into a vector for pattern recognition. The large square is divided into four quadrants. If the upper left quadrant is shaded, the first component of the vector is 1; otherwise it is zero. Similar rules apply to other quadrants and vector components.
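The quadrant coding of Fig. 1-1 can be written out directly. The sketch below generalizes the figure slightly by letting each quadrant be a block of pixels and marking it shaded if any pixel in it is dark; that generalization, and the component ordering, are assumptions of this sketch.

```python
# Fig. 1-1 as code: a picture is a grid of 0/1 pixels; each quadrant
# contributes one vector component, 1 if the quadrant is shaded.

def picture_to_vector(picture):
    """Map a 2n x 2n binary picture (list of rows) to a 4-vector,
    one component per quadrant, in the order UL, UR, LL, LR."""
    n = len(picture) // 2
    corners = [(0, 0), (0, n), (n, 0), (n, n)]
    return [1 if any(picture[r + i][c + j]
                     for i in range(n) for j in range(n)) else 0
            for r, c in corners]

picture = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 1, 1]]
print(picture_to_vector(picture))  # -> [1, 0, 0, 1]
```

Whatever a classification program receives, then, is this vector, not the picture itself.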
The program, then, is required to classify vectors, not pictures per se. We can contrast this to a program which is to be used to classify animals into their appropriate species and genus. Training would proceed by showing the program descriptions of the animals and indicating the correct species and genus. At the atomic level, the program would be reading sets of characters describing an animal, and converting the descriptions into ordered sets of codes. The pattern recognition program would again be classifying vectors. Since the problems are identical at the machine level, it seems reasonable that the same program would be used to derive a pattern classification rule in each case. Yet if we think of how a person might go about learning these tasks, we suspect that there could be a difference between perception and cognition. Those scientists interested in the more cognitive pattern recognition tasks have, in fact, developed rather different algorithms than those developed by workers who began with a sensory orientation toward pattern recognition. Instead of making a number of tests in parallel, and then evaluating the joint results, logical classification algorithms tend to proceed by first making a few tests, and then, depending upon the outcome of these, either making a second test or classifying the object. The nature of the second test made will depend upon the outcome of the first. The process can obviously be extended to any number of tests. Classification rules of this nature are called sequential decision procedures. Until recently, there was little development of sequential decision procedures in the artificial intelligence literature, although such procedures were studied in detail in statistics and operations research.
It is interesting to note that in at least one case in which logical pattern recognition was approached from a strictly statistical viewpoint (Morgan & Sonquist, 1963) the resulting computing algorithms were very similar to those developed from independent work in artificial intelligence (Hunt, Marin, & Stone, 1966). In the last few years greater appreciation of the importance of sequential
procedures has been evidenced in the pattern recognition literature (Fu, 1969).
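A sequential decision procedure can be sketched in miniature. The attributes tested and the classification rule below are invented for illustration; the point is only the structure, in which the choice of the next test depends on the outcome of the last.

```python
# A sequential decision procedure: rather than evaluating all features
# in parallel and weighting the results, the rule makes one test at a
# time, and which test comes next depends on earlier outcomes.

def classify(obj):
    if obj["color"] == "red":                       # first test
        # for red objects, size is the deciding second test
        return "A" if obj["size"] == "large" else "B"
    else:
        # for other colors, shape is tested instead; size is never examined
        return "A" if obj["shape"] == "square" else "B"

print(classify({"color": "red", "size": "large", "shape": "circle"}))  # -> A
```

Such a rule never computes most of an object's features; it examines only those on the path its earlier answers select, which is exactly the contrast with the parallel method.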
Human vision may well be the finest pattern recognition system in existence. It is obvious that computers ought to be able to look at a scene and analyze it in the way that a human does. In fact, most science fiction scenarios on television assume that the computer of the future will do this routinely. The truth is that computer vision presents a number of very difficult problems. Analysis of a visual scene appears to be all but impossible unless the analyzing device contains a logical model of the scene being viewed, in order to determine how to resolve ambiguities in the visual input. This conclusion is hardly surprising to those psychologists who have studied the perceptual illusions and constancies, and a number of suggestions about how computers ought to process visual data have been adopted from contemporary research on human vision. Similarly, the efforts to give a computer eyes have produced ideas which may be of use in the construction of theories of human vision, although the nature of the contribution is not clear (Weisstein, 1969). At present we can only present a glimpse of this fast moving field, which may not yet have found its basic techniques.
1.3 Game Playing and Decision Making
Attempts to program computers to play games have been a feature of modern artificial intelligence from its beginning. Undeniably, one of the reasons for this is the sheer challenge of the problem. People play complex games. It is quite easy to show that in principle a computer program can be written to play a perfect board game.9 Somehow, no one seems to be quite able to produce a program which matches the best human play.
A second, more general reason for studying game-playing programs is that they may represent a general class of problems which, at present, computers attack rather poorly. In an excellent article on early attempts at chess playing, Newell, Shaw, and Simon (1963) point out that there are too many alternatives for a computer to examine each move, so an adequate chess playing program must contain heuristics which restrict it to an examination of reasonable moves. Also, to win a game you need not select the best moves, just the satisfactory ones. Hopefully the study of games will teach us how to use computers to search for satisfactory, though sometimes suboptimal, solutions to very complex problems. More pessimistically, it may be that all the efforts in game playing will lead only to a formalization of strategies for the particular games which are studied.
A third reason for studying game playing is that it represents an area in which learning plays an important but not well understood role. Telling a bright person the rules of chess does not make him an expert player. One must have experience, although it is not clear what experience does. If we could define effective game playing programs which profited from their experience then we might have some idea of just what it is to be learned by practicing intellectual tasks. Samuel (1961, 1963, 1965), whose work on checkers is perhaps the most sophisticated work on self-modifying programs for game playing, has stressed that his research is an exploration of machine learning using checkers as a vehicle, and not a study of checkers for its own sake.
Studies of game playing programs have developed almost independently of the mathematical study of "game theory." In part, this may be because the two efforts are really directed at different goals. The mathematical analysis of games concentrates on the strategy to be followed by an idealized perfect player. In programming a game player one tries to produce acceptable play within certain limits on the use of resources. Simon (1956) has summed up the problem neatly by distinguishing between strategies that optimize and those that satisfice. There may be a wide gulf between the description of a perfect player and the construction of an adequate one. Nevertheless, one can reasonably ask why there has not been more coordination between the two fields of study.
We shall treat game playing as part of problem solving and, in fact, shall deal with it only briefly. This approach has been taken for two reasons. In order to write a game playing program one must solve the problem of representing the structure of the game at hand. It is not clear that the solution to a particular game problem tells us anything general about either games or programming. Game playing becomes of interest when aspects of individual games can be shown to be specializations of a definable general class of problems. Our discussion will stress the class of problems and not the details of any one game. We do point out, though, that our approach leads us away from an interesting and developed field. There are regularly scheduled computer chess matches. A number of other games have been studied less intensively. The publication of a volume summarizing this work and the ideas behind it would be of considerable value.
9 More precisely, an algorithm exists for determining the optimal move at any stage of an n-person game in which each person has perfect information and there are a finite number of moves on each play (Luce & Raiffa, 1956).
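The algorithm of footnote 9 is, for the two-person case, the familiar minimax computation: work back from terminal positions, taking the maximum over one's own moves and the minimum over the opponent's. The game tree below is a made-up example; its leaves hold payoffs to the maximizing player.

```python
# Minimax over a finite game tree with perfect information. A node is
# either a leaf payoff (a number) or a list of child positions.

def minimax(node, maximizing=True):
    if not isinstance(node, list):      # terminal position: its payoff
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Two plies: our move (three choices), then the opponent's reply.
tree = [[3, 12], [2, 8], [14, 1]]
print(minimax(tree))  # -> 3
```

The procedure is perfect but exhaustive: it visits every position, which is exactly why, for a game like chess, a practical program must prune the tree heuristically rather than run this algorithm as written.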
1.4 Natural Language and Machine Comprehension
There have been many attempts to process natural language by computer. These studies are motivated by intensely practical considerations. The biggest bottleneck in the use of computers is the time it takes to write programs. Think how much easier it would be to use computers if the machines could understand simple statements in a natural language. It has been suggested that the ideal computing system would be an "advice taker" which would receive its input in natural language, then respond to queries by deducing implications from the facts it already knew (McCarthy, 1958). Such a program would be extremely important from a theoretical viewpoint, since intelligent answers to complex queries would imply "understanding," something which machines are not supposed to be able to do. There are obvious practical applications.
A quite different application requiring analysis of linguistic data is the recurring effort to produce an automatic language translator. In the early 1950s there were many optimists who believed that mechanical translation was largely a matter of creating hardware large enough to hold a dictionary, since the translation process itself would be straightforward. Today the optimists are few and far between, since virtually the only contribution to technology of the various early mechanical translation projects was to show how little we know about linguistics. A case study suggests that the quality of mechanical translation may not have advanced from 1964 to 1971 (Sinaiko, 1971). In fact, if one's goal is to create a common language for scholars and diplomats, it might be easier to go back to teaching classic Latin on a world-wide basis than to build a perfect machine translator! Fortunately, more modest goals are both attainable and useful. A third motivation for computer processing of written data is provided by the sheer volume of words in our society. Although more and more of our records are being placed in machine readable form, most of our transactions are still conducted in a natural language. Through these records society produces a vast traffic of information about itself. In order to control organizations of hundreds of millions of people, there must be a rapid way to record, abstract, and analyze the traffic. In some cases, such as the recording of bank checks, we now use computers to manage the information flow. At the other extreme, we rely on human journalists to abstract and comment upon the week's political news. There is an intermediate area in which we still need humans, but where much of the burden of the task could be assigned to the computer. Automatic abstracting and indexing of technical documents is a case in point.
The power of computers to do this and similar tasks will depend greatly upon our ability to write programs that can master the intricacies of natural human languages.
Efforts to develop natural language processors fall between artificial intelligence and information retrieval. There is no hard and fast rule to distinguish when a particular application is in one field or the other. We can distinguish some end points. Programs that search for a particular sequence of letters in a text, but do nothing more than record the occurrence or nonoccurrence of the sequence, although very useful in indexing and abstracting, are seldom considered artificial intelligence examples. Programs that compose complex and varied replies to a user, even though those replies are keyed by single words in the input stream, are in the artificial intelligence field. Perhaps deciding how to make the distinction will have to await the creation of a good pattern recognizer!
1.5 Self-Organizing Systems
Much of our intellect, and certainly most of our research capital, is spent in the design and production of machines. Could this task itself be relegated to an automaton? If you accept the view that animals are subject to the same laws as machines, the answer to the question must be "yes," since the evolutionary sequence can be considered an example of the action of physically understandable laws.10 In principle, it should be possible to mimic evolution in machine design. There might be other ways of building automata.
Von Neumann (1956) sketched a proof showing that it was logically possible to build a device which could either (a) reproduce itself or (b) reproduce variations of itself. His reasoning was based upon McCulloch and Pitts's previously cited proof that any computable function can be realized by a network of idealized neurons. From the beginning, then, the mathematical argument had a strong biological flavor. Von Neumann's proof showed only that the self-reproducing machine was possible. He did not discuss its structure, nor did he discuss the relative efficiency of different schemes for "evolution." These questions have been explored subsequently, without conclusive results. A slightly different question has also been raised about self-organizing systems. Consider a device that has the capability to make changes in its internal state, and hence the ability to alter its reaction to specific stimuli. Here the analogy is to learning, rather than to evolution. There is also a question of control involved. The total system must contain a mechanism for keeping itself in a "viable" region of its state space, by adjusting its reactions as a function of both environmental and internal conditions. How should such a system be organized?
Wiener (1948, 1952) stressed that the key to building a system capable of adjusting to a changing environment was the proper use of the cybernetic principle. Roughly, a cybernetic device is one which receives as one of its inputs some function of its previous output. More colloquially, if a marksman wants to keep making good shots, he has to know where the bad shots are going; and half the advice to the lovelorn seems to be "watch the reaction you get." We have purposely chosen wide-ranging examples. The concept of systems which reorganize themselves under feedback control can be applied to any organization whose state can be described by specifying mutually dependent variables, from the neurological examples that interested Wiener to international power politics. At such a level of generality, "cybernetics" refers more to a view of the world than to a precisely defined mathematical technique.
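The cybernetic principle in its barest form can be sketched as a feedback loop: the device observes the effect of its last output and corrects accordingly. The proportional correction rule, the gain, and the set point below are all illustrative assumptions, chosen only to show the principle converging.

```python
# A minimal feedback sketch: at each step the system receives a function
# of its previous output (here, the error it produced) and corrects its
# state in proportion to that error.

def feedback_steps(state, setpoint, gain=0.5, steps=20):
    for _ in range(steps):
        error = setpoint - state     # observe the effect of past output
        state += gain * error        # correct in proportion to the error
    return state

print(round(feedback_steps(0.0, 10.0), 3))  # -> 10.0 (to three places)
```

With a suitable gain the state is driven toward the set point regardless of where it starts; with too large a gain the same loop oscillates or diverges, which is why control of the feedback itself is part of the design problem.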
Abstract theories of evolution and learning are topics as broad as one could imagine. Many devices embody the ideas introduced by Von Neumann and Wiener, yet we would hesitate to call them intelligent. An automatic thermostat is a good example. Can anything of content be said about self-organization itself? A number of people think so, but I am not among them. I believe that much of the work done is so general as to border on the vacuous. Because of this prejudice, self-organizing systems will not be discussed.
10 The converse is not true. You could maintain that the evolutionary sequence, or the creation of man, was an event outside natural laws and still agree that principles of self-organization did exist.
1.6 Robotology
From the Golem to astounding science fiction, robots have fascinated humanity. You could look on all the programs we have mentioned as part of the brain of a robot. To complete the design we need "only" to put them together and to attach the resulting program to some effectors and sensors. The "only" is justified because, once we had the program running, adding the necessary hardware would present technological, but not scientific problems. The quotes are justified because there is no assurance that the technological problems could be solved.
Modern robot studies follow along two general lines, which for lack of better names, we shall call "scientific" and "commercial." At times the scientific goal has seemed to be quite literally to shoot at the sky. At the 1965 Fall Joint Computer Conference, McCulloch stated that he was trying to design an object, weighing not more than 5 pounds, which could survive a landing on Mars and would have the perceptual capacities of a frog. Since then "Shakey," a robot (of sorts) designed at Stanford Research Institute, has been featured in the now-defunct Life magazine. Several scientists have implied that, given the necessary support, it would be possible to place some sort of complex automated explorer on the moon or a planet in the near future and, indeed, quite complex devices have already been placed there. Whether these are robots or not depends on your definition of the term. The commercial robot builders have been content to design simpler devices to function closer to home. Let us arbitrarily define the study of robots as the actual building of realistic devices, ruling out studies in which one builds toys to illustrate a certain principle that would be useful in an unspecified robot. We are aware of substantial robot design projects at three institutions—M.I.T., Stanford University, and Stanford Research Institute. Some work has also been done at the University of Edinburgh.
While many people have been involved, to an outsider the work clearly bears the stamp of McCarthy, Minsky, and their colleagues, in that it follows their engineering and mathematically oriented approach to artificial intelligence rather than the more psychological approach of Newell and Simon. The only exception to this approach has been in the design of the robot's visual system. Here psychological and physiological models have made a marked impact on robot design. Particularly noticeable aspects of the robot work are the reliance on list-processing techniques in computer programming and a tendency to view robot perception problems as problems in engineering rather than as simulations of biological perception.
Another characteristic of the scientific approach to robots has been the concern for general solutions to classes of problems, sometimes at the expense of clumsy solutions to specific problems. For example, in designing a robot to pick up blocks,
McCarthy and his group have concentrated on schemes for visual scene analysis and hand-eye coordination, where a commercial robot might have moved light beams of known intensity over a surface of known reflectance, assuming that the scene before it would always obey rigid restrictions. Despite the remarks about the utility of robots in exploring hostile environments, scientific robot projects have displayed only secondary concern for topics such as miniaturization, protective packaging, and economy. In a panel session at the fall 1968 Joint Computer Conference, it was noted that the Stanford University robot had its brains in a PDP-10 computer, its ears in a conventional microphone, its single hand mounted on a fixed shaft, and its cyclops eye in a television camera. One word for such a system was "robot"; the speaker also suggested "kludge."11 This is not a damning criticism since the first
goal is to find general solutions. Practical techniques will come later. . . hopefully.
Commercial robot projects have taken exactly the opposite route. Obviously, it would be nice to have an economical machine that is as general-purpose an object manipulator as man is. This would be particularly true if the machine were less subject to heat, radiation, and overtime pay than is man. Such a device is not extant, nor will it be very soon. On the other hand, there are hundreds of thousands of jobs, now done by man, that can be automated by specifically engineered devices. One could consider these to be "special purpose manipulators," just as an analog computer is a special-purpose information processor. What commercial robot designers are attempting to do is to produce easily programmable general-purpose object manipulators. In doing so, the commercial designer must strike a trade-off among generality, efficiency at specific tasks, and cost. The location of this optimum point will depend on the technology and the application, and will vary from time to time. Presentations at computer conferences in the late 1960s suggested to us that current commercially made robots are fairly general devices which cleverly and properly avoid difficult issues, such as the processing of visual input, by making it easy for a man to program new decisions into them as he "instructs" his robots to do a new task. For these reasons, little will be said about either commercial robots or about the technology needed to build the small computers, flexible mechanical arms, and T.V. cameras capable of surviving the bump of an extraterrestrial landing. When this restriction is placed, the robot problem becomes a specialization of problem solving, pattern recognition, and vision. It will be treated as such.
11 "A more or less functioning system created by mating components whose original designers had never intended to have used together."