People like to invent conflicts within science. I hate this. I am from a totally opposed movement that thinks things are just the opposite of what this other movement says…
I believe most of the time these big disputes are mentioned, there is actually no dispute. Generally it’s just one group saying that there are two groups, and nobody actually fights for the other side. Also, it’s often a false dilemma. There is no real conflict, just orthogonal or unrelated choices.
One such naïve dichotomy that people like to describe is the famous fight between reductionism and holism, about which I wrote here once. There are three others I’ve been dealing with recently: the GOTO and structured programming controversy (wrote about it too), the Perceptrons and neural networks controversy, and finally the probability controversy (classical vs frequentist vs Bayesian). I want to write about all of them, especially because in all cases I defend the view that the conflict is fictitious.
The Perceptron is a very interesting entity that deserves everyone’s attention. Legend has it that it was a great invention that was maliciously dismissed by Marvin Minsky, the merciless, who wanted to become the dominant alpha male of AI. This would have delayed scientific advance for decades. I will explain why I find this story ludicrous.
This is part one of the article. (It is huge, sorry.)
The best way to pin the development of neural networks to a single piece of research is to cite the article by McCulloch and Pitts, A Logical Calculus of the Ideas Immanent in Nervous Activity (1943), an important article that everyone should at least browse. Their work is frequently mentioned en passant in introductory courses on neural networks, but few people know well what they actually did, said and thought.
It is important to situate this work within its context. Turing’s famous article containing his machine and formalizing the idea of computation was published in 1936, and Shannon’s master’s thesis linking electronic circuit design to Boolean logic came in 1937. That was also the period when the first modern computers started to be built. Until then, the most complex computers were differential analysers, such as the one made by Vannevar Bush (1927). Some of the first computers and calculating machines built were the Z machines of Konrad Zuse (1936), the machines by Stibitz (1936), the ABC (1937), the Colossus (1943), the ASCC (1944) and the ENIAC (1944). Notice that many of these inventors did not know of parallel or earlier similar research. For example, I believe only Aiken (ASCC) was aware of the work of Babbage from the previous century.
Returning to Alan Turing, while his machine is important in helping engineers think about the concept of an automated computing machine, this was not the main reason Turing came up with it, and actually, engineers only started to care about machines being “Turing-complete” more recently (when they do at all). Turing and Alonzo Church were concerned with fundamental questions of mathematics. Their work settled the famous Entscheidungsproblem posed by Hilbert in 1928 (by showing that it has no general solution), and was also an exploration of the 1931 findings of Kurt Gödel, the incompleteness theorems. These mathematical works are related to the development of modern logic.
Logic started to be studied in the classical era, especially by the Greek Aristotle. Leibniz pursued some of these ideas in the 17th century, and later Boole and De Morgan gave birth to modern symbolic logic. The development continued through the 19th century with thinkers such as Frege, with his book Begriffsschrift in 1879, and Peirce. Peirce’s work was vast, but remained practically unknown for many years, and many of his findings were later rediscovered by others. The research on symbolic logic had its climax with the Vienna Circle, around the 1920s, and with the publication of Bertrand Russell and Whitehead’s Principia Mathematica (1910–1913). Russell was the main person responsible for the popularization of Frege’s work, and inspired a large number of scientists.
So, computers share some history with logic itself… Now let’s look at neurophysiology.
I don’t want to get into the study of the whole brain or the mind, but just of the neurons, and simple phenomena. The most important moment in the study of neurons was the formulation of the “neuron doctrine”, which states that the neurons are the relevant cells that work the brain’s magic (or at least are the main players). This recognition of the importance of the neurons derived mainly from the works of Camillo Golgi and Santiago Ramón y Cajal, who shared the 1906 Nobel Prize in Physiology or Medicine. One interesting dispute at that time (a real dispute, unlike these false dilemmas I mentioned) is that Golgi devised a pioneering technique to observe neurons, but also proposed that neurons were connected in a reticular “continuum”. Cajal did further investigations, initially using Golgi’s technique, and correctly proposed that neurons were actually separate cells.
Another major landmark of neuron research was the study of the squid giant axon by Alan Hodgkin and Andrew Huxley, published around 1952. This work helped to elucidate how electrical signals are transmitted through neurons. They received the Nobel Prize in 1963. The electric nature of the nervous system had been known since the late 18th century: Galvani studied animal electricity in 1783. Volta started to study the subject in 1791. He was already a researcher of electricity before that, and the battery he developed in 1800, the Voltaic pile, is a result of a dispute with Galvani about the nature of animal electricity. (Another real dispute…)
So, what I find interesting is that the study of the nervous system and the study of electricity walked together more closely than it sometimes seems from how things are taught in school. The electrical nature of the nervous system was discovered together with other important electrical phenomena and devices, such as the battery. Later, around when modern symbolic logic was being devised, the neurons were discovered, and some decades after that, when electrical calculating and “thinking” machines started to be built from the interaction of small logical elements, the logic gates, it was perhaps also the natural moment to consider how neurons work in groups to do their job.
And that’s when I believe we get back to McCulloch and Pitts. That famous article, which I won’t pretend to understand too well, starts with considerations of the physical behaviour observed in neurons, and then borrows some tools from researchers such as Carnap, Hilbert and Russell to show that a certain neuron model could give rise to an interesting logical calculus. The article considers networks with layers, cycles and whatnot.
The article is no easy reading, and has some passages I am not very fond of. But it has great moments, and there is one paragraph that I believe is very important, and that contemporary students and teachers of neural networks should definitely read:
One more thing is to be remarked in conclusion. It is easily shown: first, that every net, if furnished with a tape, scanners connected to afferents, and suitable efferents to perform the necessary motor-operations, can compute only such numbers as can a Turing machine; second, that each of the latter numbers can be computed by such a net; and that nets with circles can compute, without scanners and a tape, some of the numbers the machine can, but no others, and not all of them. This is of interest as affording a psychological justification of the Turing definition of computability and its equivalents, Church’s lambda-definability and Kleene’s primitive recursiveness: if any number can be computed by an organism, it is computable by these definitions, and conversely.
In this paragraph they state that their nets can be used to build systems that compute exactly what Turing machines compute. A feed-forward network can do this with the help of an external memory (the tape), and a network with loops can do part of the job even without one. This is a pretty strong, important and exciting statement!
One important smaller consequence of this is that their nets can also implement simple Boolean functions… I mean, any Boolean function. That includes AND, OR, XOR and every other one of the 2^(2^k) possible functions of k inputs, given a sufficient number of neurons.
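To make this concrete, here is a minimal Python sketch of the idea, not the notation of the 1943 paper. It assumes a simplified threshold unit with integer weights, where a negative weight stands in for inhibition (McCulloch and Pitts treated inhibition differently), and the names mp_unit and truth_table are made up for this illustration. A single unit gives AND, OR and NOT, and a small two-layer net of such units gives XOR.

```python
from itertools import product

def mp_unit(inputs, weights, threshold):
    """Simplified threshold unit: outputs 1 when the weighted sum
    of its binary inputs reaches the threshold, otherwise 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

# Single units are enough for these three functions.
def AND(a, b):
    return mp_unit((a, b), (1, 1), 2)

def OR(a, b):
    return mp_unit((a, b), (1, 1), 1)

def NOT(a):
    # Negative weight plays the role of an inhibitory input (an assumption
    # of this sketch, not the exact mechanism of the original model).
    return mp_unit((a,), (-1,), 0)

def XOR(a, b):
    # A small network of units: (a OR b) AND NOT (a AND b).
    return AND(OR(a, b), NOT(AND(a, b)))

def truth_table(f, k):
    """Outputs of f over all 2**k binary inputs, in counting order."""
    return [f(*bits) for bits in product((0, 1), repeat=k)]

if __name__ == "__main__":
    for name, f in (("AND", AND), ("OR", OR), ("XOR", XOR)):
        print(name, truth_table(f, 2))
    # Number of distinct Boolean functions of k = 2 inputs.
    print(2 ** (2 ** 2))
```

Since every Boolean function can be written with AND, OR and NOT, the same kind of wiring covers the remaining functions too; the price is only the number of units.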
So, they found a simplified physical, or mathematical, model for the neuron that ended up being sufficient to build a Logical Calculus for Nervous Activity. They cared about (or at least were aware of) Turing-completeness, and also carried out all the first obvious analyses one should do when testing a certain element to create a network: they analysed single neurons first, then feed-forward nets, and then nets with circles.
This is very important to accept, and is the centre of my criticism. People talk about this article, and about others from the same period, as if they had done much less. People disregard older scientists, and think that they used to leash dogs with sausages. They didn’t.
To finish this article, I would just like to mention a couple of other important studies on the computational nature of networks built from elements other than simple logic gates. The logic gate was only one of the many kinds of element studied in that period, and it is much more famous than the others today because of its important theoretical rôle.
Shannon wrote with Moore (of Moore machine fame) in 1956 an article called Reliable Circuits Using Less Reliable Relays. Von Neumann also studied a similar question. I have found here this interesting 1958 article on the subject, which mentions not only these two works but also an article by McCulloch analysing von Neumann’s work. It gives an interesting idea of how the researchers back then felt about these questions.
Another researcher who studied such “network architectures” was Alan Turing himself. This work is the subject of a 2001 book that I want for Christmas. Turing’s networks were composed of two-input NAND gates (the powerful ampheck), and the connections could be changed by an external agent to perform learning. Each network is a Boolean function made from smaller gates with feed-forward connections, but he considered random connections and the learning question. He studied this in 1948, but never published it.
I believe this all gives a nice idea of what research on computing and artificial intelligence was like in the 1950s. In the next article I will talk specifically about the controversy involving Rosenblatt’s Perceptrons and Marvin Minsky, who is in my opinion a great researcher who is much criticized for no reason.
This is a bitty review of the historical narrative but I am glad you are bringing it to people’s attention.
I agree that there is much work done before 1950 that merits deeper consideration. Frankly I find papers by logicians and mathematicians from that period (1850-1950) better produced and more clear than contemporary papers. They should not be hard to read, and I think that it is a sad observation that many today think that this is so.
Turing’s work is important because it so shapes our contemporary view, but I think you are overexcited about its implications – compounding half a century of overexcitement about its implications.
There are, in fact, distinctions within the views of logical evaluation, primarily between the integrative view championed by Turing and others, and the “differentiated from the whole” view championed by Carnap et al. This conflict is real and has a profound impact upon the interpretation of neurophysiological behavior.
Despite the mention of Carnap above, McCulloch and Pitts make use of Carnap’s Syntax without appreciating his deeper insights. Carnap himself attempted to solve these broader problems with inductive logic and probability theory, not the deductive atomic logic of the models you suggest.
Hi. Thanks for your attention and comment.
It is certainly bitty. As an enthusiast of information theory, I like to see bits everywhere!... :)
As I intend to write in my next article, my main criticism is towards people who believe that the discovery presented in Perceptrons was that a single McCulloch-Pitts neuron cannot compute some Boolean functions such as XOR, and that because of that neural networks would be useless. Then in the 1980s researchers would have finally started to consider multi-layered units, becoming able to compute the XOR function and then “avenging” the ANNs.
To know this is not true, we don’t even need to read Minsky’s book. All we need to do is look at the work of these scientists back then, and see that they already studied complicated networks, and that they knew very well that if a unit had the power to do a few specific logical functions, then a network of units could do any Boolean function. More than that, plugging such a network into a proper memory can give us a “symbolic computer”, something wrongly considered entirely different from an ANN. So it is very unlikely that a sane scientist ever considered that the inability of a single neuron to implement a certain function could lead to a deficiency in networks of such elements. What Minsky did was consider more complicated efficiency issues that were never countered.
In other words, I want to make people see that before the 1970s people knew much better, and wouldn’t make careless claims about what a network of non-linear modules can do by looking only at a single module, as Minsky is accused of having done. If people knew the older research better, they would be forced to understand the newer research properly.
Now, regarding this difference between Turing and Carnap, with the integrative and “differentiated from the whole” views, I am very excited that you mentioned it, because I don’t have the slightest idea of what you are talking about!! :) At least not with these words. I would be glad if you could point me / us to a nice reference about this subject. You mentioned inductive logic; does it have to do with non-monotonic logic as a whole?
I hope I did not imply somewhere that I have an opinion regarding a subject I don’t know well!!... :)
I apologize that I have not had the time to respond here, and this must be brief.
It’s hardly surprising that an XOR mechanism cannot be identified, but probably not for the reasons you suppose. In terms of conventional mechanisms in any case, you should know that all boolean operators can be constructed from AND and NOT (see Hilbert and Ackermann).
Yes, the point is that not only do I know that e.g. NAND and {AND, NOT} are sufficient operator sets, but people back in the 1950s also knew that, and because of that they would never drop Perceptrons just because a single unit cannot implement XOR. Since they knew these neuronal units are capable of implementing operations such as AND, NAND, NOT and OR, which as a group are powerful enough, they would know how to implement any function by building a “network”, including the infamous XOR.
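A minimal Python sketch of this point, assuming (as an illustration only) that each unit is a simple threshold element with integer weights. The function names are hypothetical. One such unit computes NAND, and four NAND units wired together compute XOR, which is checked here against Python's own ^ operator.

```python
from itertools import product

def nand(a, b):
    """NAND as a single threshold unit: weights (-1, -1), threshold -1.
    It outputs 0 only when both binary inputs are 1."""
    return int(-a - b >= -1)

# The usual operators, built from NAND units only.
def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_from_nand(a, b):
    """Classic four-gate construction of XOR from NAND."""
    m = nand(a, b)
    return nand(nand(a, m), nand(b, m))

if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        # Compare the network's output with the built-in XOR.
        print(a, b, xor_from_nand(a, b), a ^ b)
```

Nothing here needs a single unit to compute XOR; the network does it, which is the whole point.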
The problem could not have been this (making XOR with a single Perceptron), because the generality of Perceptron networks is a bit obvious. Every student should be able to question their teachers when they suggest in class that this was the case… The main problem was that researchers were claiming that they could do wonders with Perceptrons that had “local” inputs, and Minsky’s book proves that for some feed-forward three-layered networks, there were some functions that needed neurons connected to ALL inputs… A much more subtle and complicated issue…