This article continues my series on major conscious and unconscious processes in the brain. In my last two posts I have talked about 8 major unconscious processes in the brain viz sensory, motor, learning , affective, cognitive (deliberative), modelling, communications and attentive systems. Today, I will not talk about brain in particular, but will approach the problem from a slightly different problem domain- that of modelling/implementing an artificial brain/ mind.
I am a computer scientist, so am vaguely aware of the varied approaches used to model/implement the brain. Many of these use computers , though not every approach assumes that the brain is a computer.
Before continuing I would briefly like to digress and link to one of my earlier posts regarding the different traditions of psychological research in personality and how I think they fit an evolutionary stage model . That may serve as a background to the type of sweeping analysis and genralisation that I am going to do. To be fair it is also important to recall an Indian parable of how when asked to describe an elephant by a few blind man each described what he could lay his hands on and thus provided a partial and incorrect picture of the elephant. Some one who grabbed the tail, described it as snake-like and so forth.
With that in mind let us look at the major approaches to modelling/mplementing the brain/intelligence/mind. Also remember that I am most interested in unconscious brain processes till now and sincerely believe that all the unconscious processes can, and will be successfully implemented in machines. I do not believe machines will become sentient (at least any time soon), but that question is for another day.
So, with due thanks to @wildcat2030, I came across this book today and could immediately see how the different major approaches to artificial robot brains are heavily influenced (and follow) the evolutionary first five stages and the first five unconscious processes in the brain.
The book in question is ‘Robot Brains: Circuits and Systems for Conscious Machines’ by Pentti O. Haikonen and although he is most interested in conscious machines I will restrict myself to intelligent but unconscious machines/robots.
The first chapter of the book (which has made to my reading list) is available at Wiley site in its entirety and I quote extensively from there:
Presently there are five main approaches to the modelling of cognition that could be used for the development of cognitive machines: the computational approach (artificial intelligence, AI), the artificial neural networks approach, the dynamical systems approach, the quantum approach and the cognitive approach. Neurobiological approaches exist, but these may be better suited for the eventual explanation of the workings of the biological brain.
The computational approach (also known as artificial intelligence, AI) towards thinking machines was initially worded by Turing (1950). A machine would be thinking if the results of the computation were indistinguishable from the results of human thinking. Later on Newell and Simon (1976) presented their Physical Symbol System Hypothesis, which maintained that general intelligent action can be achieved by a physical symbol system and that this system has all the necessary and sufficient means for this purpose. A physical symbol system was here the computer that operates with symbols (binary words) and attached rules that stipulate which symbols are to follow others. Newell and Simon believed that the computer would be able to reproduce human-like general intelligence, a feat that still remains to be seen. However, they realized that this hypothesis was only an empirical generalization and not a theorem that could be formally proven. Very little in the way of empirical proof for this hypothesis exists even today and in the 1970s the situation was not better. Therefore Newell and Simon pretended to see other kinds of proof that were in those days readily available. They proposed that the principal body of evidence for the symbol system hypothesis was negative evidence, namely the absence of specific competing hypotheses; how else could intelligent activity be accomplished by man or machine? However, the absence of evidence is by no means any evidence of absence. This kind of ‘proof by ignorance’ is too often available in large quantities, yet it is not a logically valid argument. Nevertheless, this issue has not yet been formally settled in one way or another. Today’s positive evidence is that it is possible to create world-class chess-playing programs and these can be called ‘artificial intelligence’. The negative evidence is that it appears to be next to impossible to create real general intelligence via preprogrammed commands and computations.
The original computational approach can be criticized for the lack of a cognitive foundation. Some recent approaches have tried to remedy this and consider systems that integrate the processes of perception, reaction, deliberation and reasoning (Franklin, 1995, 2003; Sloman, 2000). There is another argument against the computational view of the brain. It is known that the human brain is slow, yet it is possible to learn to play tennis and other activities that require instant responses. Computations take time. Tennis playing and the like would call for the fastest computers in existence. How could the slow brain manage this if it were to execute computations?
The artificial neural networks approach, also known as connectionism, had its beginnings in the early 1940s when McCulloch and Pitts (1943) proposed that the brain cells, neurons, could be modelled by a simple electronic circuit. This circuit would receive a number of signals, multiply their intensities by the so-called synaptic weight values and sum these modified values together. The circuit would give an output signal if the sum value exceeded a given threshold. It was realized that these artificial neurons could learn and execute basic logic operations if their synaptic weight values were adjusted properly. If these artificial neurons were realized as hardware circuits then no programs would be necessary and biologically plausible artificial replicas of the brain might be possible. Also, neural networks operate in parallel, doing many things simultaneously. Thus the overall operational speed could be fast even if the individual neurons were slow. However, problems with artificial neural learning led to complicated statistical learning algorithms, ones that could best be implemented as computer programs. Many of today’s artificial neural networks are statistical pattern recognition and classification circuits. Therefore they are rather removed from their original biologically inspired idea. Cognition is not mere classification and the human brain is hardly a computer that executes complicated synaptic weight-adjusting algorithms.
The human brain has some 10 to the power of 11 neurons and each neuron may have tens of thousands of synaptic inputs and input weights. Many artificial neural networks learn by tweaking the synaptic weight values against each other when thousands of training examples are presented. Where in the brain would reside the computing process that would execute synaptic weight adjusting algorithms? Where would these algorithms have come from? The evolutionary feasibility of these kinds of algorithms can be seriously doubted. Complicated algorithms do not evolve via trial and error either. Moreover, humans are able to learn with a few examples only, instead of having training sessions with thousands or hundreds of thousands of examples. It is obvious that the mainstream neural networks approach is not a very plausible candidate for machine cognition although the human brain is a neural network.
Dynamical systems were proposed as a model for cognition by Ashby (1952) already in the 1950s and have been developed further by contemporary researchers (for example Thelen and Smith, 1994; Gelder, 1998, 1999; Port, 2000; Wallace, 2005). According to this approach the brain is considered as a complex system with dynamical interactions with its environment. Gelder and Port (1995) define a dynamical system as a set of quantitative variables, which change simultaneously and interdependently over quantitative time in accordance with some set of equations. Obviously the brain is indeed a large system of neuron activity variables that change over time. Accordingly the brain can be modelled as a dynamical system if the neuron activity can be quantified and if a suitable set of, say, differential equations can be formulated. The dynamical hypothesis sees the brain as comparable to analog feedback control systems with continuous parameter values. No inner representations are assumed or even accepted. However, the dynamical systems approach seems to have problems in explaining phenomena like ‘inner speech’. A would-be designer of an artificial brain would find it difficult to see what kind of system dynamics would be necessary for a specific linguistically expressed thought. The dynamical systems approach has been criticized, for instance by Eliasmith (1996, 1997), who argues that the low dimensional systems of differential equations, which must rely on collective parameters, do not model cognition easily and the dynamicists have a difficult time keeping arbitrariness from permeating their models. Eliasmith laments that there seems to be no clear ways of justifying parameter settings, choosing equations, interpreting data or creating system boundaries. Furthermore, the collective parameter models make the interpretation of the dynamic system’s behaviour difficult, as it is not easy to see or determine the meaning of any particular parameter in the model. Obviously these issues would translate into engineering problems for a designer of dynamical systems.
The quantum approach maintains that the brain is ultimately governed by quantum processes, which execute nonalgorithmic computations or act as a mediator between the brain and an assumed more-or-less immaterial ‘self’ or even ‘conscious energy field’ (for example Herbert, 1993; Hameroff, 1994; Penrose, 1989; Eccles, 1994). The quantum approach is supposed to solve problems like the apparently nonalgorithmic nature of thought, free will, the coherence of conscious experience, telepathy, telekinesis, the immortality of the soul and others. From an engineering point of view even the most practical propositions of the quantum approach are presently highly impractical in terms of actual implementation. Then there are some proposals that are hardly distinguishable from wishful fabrications of fairy tales. Here the quantum approach is not pursued.
The cognitive approach maintains that conscious machines can be built because one example already exists, namely the human brain. Therefore a cognitive machine should emulate the cognitive processes of the brain and mind, instead of merely trying to reproduce the results of the thinking processes. Accordingly the results of neurosciences and cognitive psychology should be evaluated and implemented in the design if deemed essential. However, this approach does not necessarily involve the simulation or emulation of the biological neuron as such, instead, what is to be produced is the abstracted information processing function of the neuron.
A cognitive machine would be an embodied physical entity that would interact with the environment. Cognitive robots would be obvious applications of machine cognition and there have been some early attempts towards that direction. Holland seeks to provide robots with some kind of consciousness via internal models (Holland and Goodman, 2003; Holland, 2004). Kawamura has been developing a cognitive robot with a sense of self (Kawamura, 2005; Kawamura et al., 2005). There are also others. Grand presents an experimentalist’s approach towards cognitive robots in his book (Grand, 2003).
A cognitive machine would be a complete system with processes like perception, attention, inner speech, imagination, emotions as well as pain and pleasure. Various technical approaches can be envisioned, namely indirect ones with programs, hybrid systems that combine programs and neural networks, and direct ones that are based on dedicated neural cognitive architectures. The operation of these dedicated neural cognitive architectures would combine neural, symbolic and dynamic elements.
However, the neural elements here would not be those of the traditional neural networks; no statistical learning with thousands of examples would be implied, no backpropagation or other weight-adjusting algorithms are used. Instead the networks would be associative in a way that allows the symbolic use of the neural signal arrays (vectors). The ‘symbolic’ here does not refer to the meaning-free symbol manipulation system of AI; instead it refers to the human way of using symbols with meanings. It is assumed that these cognitive machines would eventually be conscious, or at least they would reproduce most of the folk psychology hallmarks of consciousness (Haikonen, 2003a, 2005a). The engineering aspects of the direct cognitive approach are pursued in this book.
Now to me these computational approaches are all unidimensional-
- The computational approach is suited for symbol-manipulation and information-represntation and might give good results when used in systems that have mostly ‘sensory’ features like forming a mental represntation of external world, a chess game etc. Here something (stimuli from world) is represented as something else (an internal symbolic represntation).
- The Dynamical Systems approach is guided by interactions with the environment and the principles of feedback control systems and also is prone to ‘arbitrariness’ or ‘randomness’. It is perfectly suited to implement the ‘motor system‘ of brain as one of the common features is apparent unpredictability (volition) despite being deterministic (chaos theory) .
- The Neural networks or connectionsim is well suited for implementing the ‘learning system’ of the brain and we can very well see that the best neural network based systems are those that can categorize and classify things just like ‘the learning system’ of the brain does.
- The quantum approach to brain, I haven’t studied enough to comment on, but the action-tendencies of ‘affective system’ seem all too similar to the superimposed,simultaneous states that exits in a wave function before it is collapsed. Being in an affective state just means having a set of many possible related and relevant actions simultaneously activated and then perhaps one of that decided upon somehow and actualized. I’m sure that if we could ever model emotion in machine sit would have to use quantum principles of wave functions, entanglemnets etc.
- The cognitive approach, again I haven’t go a hang of yet, but it seems that the proposal is to build some design into the machine that is based on actual brain and mind implemntations. Embodiment seems important and so does emulating the information processing functions of neurons. I would stick my neck out and predict that whatever this cognitive approach is it should be best able to model the reasoning and evaluative and decision-making functions of the brain. I am reminded of the computational modelling methods, used to functionally decompose a cognitive process, and are used in cognitive science (whether symbolic or subsymbolic modelling) which again aid in decision making / reasoning (see wikipedia entry)
Overall, I would say there is room for further improvement in the way we build more intelligent machines. They could be made such that they have two models of world – one deterministic , another chaotic and use the two models simulatenously (sixth stage of modelling); then they could communicate with other machines and thus learn language (some simulation methods for language abilities do involve agents communicating with each other using arbitrary tokens and later a language developing) (seventh stage) and then they could be implemented such that they have a spotlight of attention (eighth stage) whereby some coherent systems are amplified and others suppressed. Of course all this is easier said than done, we will need at least three more major approaches to modelling and implementing brain/intelligence before we can model every major unconscious process in the brain. To model consciousness and program sentience is an uphill task from there and would definitely require a leap in our understandings/ capabilities.
Do tell me if you find the above reasonable and do believe that these major approaches to artificial brain implementation are guided and constrained by the major unconscious processes in the brain and that we can learn much about brain from the study of these artificial approaches and vice versa.
In the last post I had wondered about the clustering based solution to categorization and how they may also inform us about how memory (semantic variety) is stored in brain, as semantic memory is best modeled by an associational or confectionist network.
Thus, a semantic memory based on clustering models may consist of associations between clusters or categories of information. For example one cluster may correspond to the names of countries and another to name of cities. A particular type of connection or association between these two clusters may map a relation of —-IS A CAPITAL OF —- type where for example the fact that Paris is the capital of France is stored. For this knowledge to exist, one has to have prior notions of France is a Country and Paris is a City and on top of that an associational relation between the individual entities France and Paris belonging to particular clusters.
Much of this would be more apparent once relational models of categorization are also covered. For now let us assume that (semantic) memory itself may consist of clusters of neurons that are also interconnected. Interestingly one such neural architecture, that has also been able to simulate short-term memory has been the small-world network model. In this a large number of nodes (neurons ) are connected by edges (synapses) as in a typical random graph. These small-world networks are special in the sense that they have high clustering coefficients and low mean path length. Translated in English, this means they exhibit more than chance clustering (to enhance local processing) as well as display a small value of smallest mean path length (reflecting ease of global processing).
It is intriguing thta in the short term memory model using small-world networks simulation, the researchers found that the model could exhibit bistability, which may be crucial for memory formation. In bistability, the cluster or functional region corresponding to a particular memory can be in two states, depending on an input variable. Thus, a pulse (direction of attention) can activate/ deactivate a memory.
Crucially, it can be hypothesized that as the small-world network model of memory/ categorization is good for local-global processing as well as reflective of the actual brain and AI simulation architectures, the entire brain is a small-world network adequately categorizing and representing the sensory, motor and cognitive information and processing them.
A recent MEG based study has established the fact that the small-world network topology exists in functional sphere in the brain at all oscillatory levels (crucial for binding) and that seems very promising.
Artificial Neural Networks have historically focussed on modeling the brain as a collection of interconnected neurons. The individual neurons aggregate inputs and either produce an on/off output based on threshold values or produce a more complex output as a linear or sigmoid function of their inputs. The output of one neuron may go to several other neurons.
Not all inputs are equivalent and the inputs to the neuron are weighed according to a weight assigned to that input connection. This mimics the concept of synaptic strength. The weights can be positive (signifying an Excitatory Post-Synaptic Potential ) or negative (signifying an Inhibitory Post-Synaptic Potential).
Learning consists of the determination of correct weights that need to be assigned to solve the problem; i.e. to produce a desired output, given a particular input. This weight adjustment mimics the increase or decrease of synaptic strengths due to learning. Learning may also be established by manipulating the threshold required by the neuron for firing. This mimics the concept of long term potentiation (LTP).
The model generally consists of an input layer (mimicking sensory inputs to the neurons) , a hidden layer (mimicking the association functions of the neurons in the larger part of the brain) and an output layer ( mimicking the motor outputs for the neurons).
This model is a very nice replication of the actual neurons and neuronal computation, but it ignores some of the other relevant features of actual neurons:
1. Neuronal inputs are added together through the processes of both spatial and temporal summation. Spatial summation occurs when several weak signals are converted into a single large one, while temporal summation converts a rapid series of weak pulses from one source into one large signal. The concept of temporal summation is generally ignored. The summation consists exclusively of summation of signals from other neurons at the same time and does not normally include the concept of summation across a time interval.
2. Not all neuronal activity is due to external ‘inputs’. Many brain regions show spontaneous activity, in the absence of any external stimulus. This is not generally factored in. We need a model of brain that takes into account the spontaneous ‘noise’ that is present in the brain, and how an external ‘signal’ is perceived in this ‘noise’. Moreover, we need a model for what purpose does this ‘noise’ serve?
3. This model mimics the classical conditioning paradigm, whereby learning is conceptualized in terms of input-output relationships or stimulus-response associations. It fails to throw any light on many operant phenomenon and activity, where behavior or response is spontaneously generated and learning consist in the increase\decrease \ extinction of the timing and frequency of that behavior as a result of a history of reinforcement. This type of learning accounts for the majority of behavior in which we are most interested- the behavior that is goal directed and the behavior that is time and context and state-dependent. The fact that a food stimulus, will not always result in a response ‘eat’, but is mediated by factors like the state (hunger) of the organism, time-of-day etc. is not explainable by the current models.
4. The concept of time, durations and how to tune the motor output as per strict timing requirements has largely been an unexplored area. While episodic learning and memory may be relatively easier to model in the existing ANNs, its my hunch that endowing them with a procedural memory would be well nigh impossible using existing models.
Over a series of posts, I would try to tackle these problems by enhancing the existing neural networks by incorporating some new features into it, that are consistent with our existing knowledge about actual neurons.
First, I propose to have a time-threshold in each neural unit. This time-threshold signifies the duration in which temporal summation is applicable and takes place. All inputs signals, that are received within this time duration, either from repeated firing of the same input neuron or from time-displaced firings of different input neurons, are added together as per the normal input weights and if at any time this reaches above the normal threshold-for-firing, then the neuron fires. This has combined both temporal and spatial summation concepts. With temporal summation, we have an extra parameter- the time duration for which the history of inputs needs to be taken into account.
All neurons will also have a very short-term memory, in the sense that they would be able to remember the strengths of the inputs signals that they have received in the near past , that is in the range of the typical time-thresholds that are set for them. This time-threshold can typically be in milliseconds.
Each time a neuron receives an input, it starts a timer. This timer would run for a very small duration encoded as the time-threshold for that neuron. Till the time this timer is running and has not expired, the input signal is available to the neuron for calculation of total input strength and for deciding whether to fire or not. As soon as the timer expires, the memory of the associated input is erased from the neurons memory and that particular input would no longer be able to affect any future firing of the neuron.
All timers as well as the memory of associated input signals are erased after each successful neural firing (every time the neuron generates an action potential). After each firing, the neuron starts from afresh and starts accumulating and aggregating the inputs it receives thereafter in the time-threshold window that is associated with it.
Of course there could be variations to this. Just like spatial aggregation/firing need not be an either/or decision based on a threshold; the temporal aggregation/ firing need not be an either-or decision: one could have liner or sigmoid functions of time that modulate the input signal strength based on the time that has elapsed. One particular candidate mechanism could be a radioactive decay function, that decreases the input signal strength by half after each half-life. Here, the half-life is equivalent to the concept of a time-threshold. While in the case of time-threshold, after a signal arrives, and once the time-threshold has elapsed, then the input signal is not available to the neuron at all, and while the time-threshold had not elapsed the signal was available in its entirety; in the case of radioactive deacy the inpiut signal is available till infinity in theory; but the strength of the signal would get diminisehd by half after each half-life period; thus making the effects of the input signal negligible after a few half-lives. Of course in the radioactive case too, once the neuron has fired, all memory of that input would be erased and any half-life decay computations stopped.
These are not very far-fetched speculations and modeling the neural networks this way can lead to many interesting results.
Second, I propose to have some ‘clocks’ or ‘periodic oscillators’ in the network, that would be generating spontaneous outputs after a pre-determined time and irrespective of any inputs. Even one such clock is sufficient for our discussions. Such a clock or oscillator system is not difficulty to envisage or conceive. We just need a non-random, deterministic delay in the transmission of signals from one neuron to the other. There do exist systems in the brain that delay the signals, but leaving aside such specialized systems, even a normal synaptic transmission along an axon between two neurons, would suffer from some deterministic delay based on the time it takes the signal to travel down the axon length and assuming that no changes in myelination takes place over time, so that the speed of transmission is constant.
In such a scenario, the time it takes for a signal to reach the other neuron, is held constant over time. (Note that this time may be different for different neuron pairs based on both the axon lengths involved and the associated myelination, but would be same for the same neuron pair over time). Suppose that both the neurons have very long, unmyelinated axons and that these axons are equal in length and provide inputs to each other. Further suppose that both the neurons do not have any other inputs , though each may send its output to many other neurons.
Thus, the sole input of the first neuron is the output of the second neuron and vice versa. Suppose that the thresholds of the two neurons are such that each would trigger, if it received a single input signal (from the peer neuron). As there would be a time lag between the firing of neuron one, and its reaching the second neuron, the second neuron would fire only after, say 5 milliseconds, the time it takes for signal to travel, after the first neuron has fired. The first neuron meanwhile will respond to the AP generated by the second neuron -which would reach it after (5+5= 10 ms) the round trip delay- and generate an AP after 10 ms from its initial firing.
We of course have to assume that somehow, the system was first put in motion: someone caused the first neuron to fire initially (this could not be other neurons, as we have assumed that this oscillator pair has no external input signals) and after that it is a self-sustaining clock with neuron 1 and neuron 2 both firing regularly at 10 ms intervals but in opposite phases. We just need GOD to initally fire the first neuron (the park of life) and thereafter we do have a periodic spontaneous activity in the system.
Thirdly, I propose that this ‘clock’, along with the concept of temporal summations, is able to calculate and code any arbitrary time duration and any arbitrary time dependent behavior, but in particular any periodic or sate/ goal based behavior. I’ve already discussed some of this in my previous posts and elaborate more in subsequent posts.
For now, some elementary tantalizing facts.
1. Given a 10 ms clock and a neuron capable of temporal summation over 50 ms duration, we can have a 50 ms clock: The neuron has the sole input as the output of the 10ms clock. After every 50 ms, it would have accumulated 5 signals in its memory. If the threshold-for-firing of the neuron is set such that it only fires if it has received five time the signal strength that is outputted by the 10 ms clock , then this neuron will fire after very 50 ms. This neuron would generate a periodic output after every 50 ms and implements a 50 ms clock.
2. Given a 10 ms clock and a neuron capable of temporal summation over 40 ms, (or lets have the original 50 ms time-threshold neuron, but set its threshold-for-firing to 4 times the output strength of the 10 ms clock neuron) , using the same mechanism as defined above, we can have a 40 ms clock.
3. Given a 40 ms clock, a 50 ms clock and a neuron that does not do temporal summation, we can have a 2000 ms clock. The sole inputs to the neuron implementing the 2000 ms clock are the outputs of the 50 ms and the 40 ms clock. This neuron does not do temporal summation. Its threshold for firing is purely spatial and it fires only if it simultaneously receives a signal strength that is equal to or greater than the combined output signal strength of 50ms and 40 ms neuron. It is easy to see, that if we assume that the 50 ms and 40 ms neurons are firing in phase, then only after every 2000 ms would the signals from the two neurons arrive at the same time for this 2000ms clock. Viola, we have 2000 ms clock. After this, I assume, its clear that the sky is the limit as to the arbitrariness of the duration that we can code for.
Lastly, learning consists of changing the temporal thresholds associated with a neuron, so that any arbitrary schedule can be associated with a behavior, based on the history of reinforcement. After the training phase, the organism would exhibit spontaneous behavior that follows a schedule and could learn novel schedules for novel behaviors (transfer of learning).
To me all this seems very groundbreaking theorizing and I am not aware of how and whether these suggestions/ concepts have been incorporated in existing Neural Networks. Some temporal discussions I could find here. If anyone is aware of such research , do let me know via comments or by dropping a mail. I would be very grateful. I am especially intrigued by this paper (I have access to abstract only) and the application of temporal summation concepts to hypothalamic reward functions.