Category Archives: intelligence

Moral Intuitions (alternate title: Who Framed Roger Rabbit?)

Disclaimer: I haven’t seen the movie “Who Framed Roger Rabbit”, nor do I know the storyline; I just used the alternate title because it is eye-catching :-)

Classical research on moral intuitions has focused on identifying how we arrive at moral conclusions. Kohlberg’s developmental theory is built around identifying the reasoning process by which children arrive at a moral decision regarding a moral dilemma, identify an action that would be ethical in a given situation, or form a moral judgment regarding a given event or outcome.

Much of the discourse is limited by the few example problems around which these dilemmas are framed. A good example is the famous Trolley problem, in which one has to decide whether it would be worth sacrificing a single person to save five or six others. Its variations turn on whether one is in direct contact with the person and actively ‘sacrificing’ him/her by pushing him/her from a footbridge, or is merely a bystander passively (from a distance) pulling a switch that would direct the trolley to a different track. Other variations include whether the person who, if sacrificed, could save five or six others is related to you, or whether he is innocent (a child playing on an unused track) while the others are careless and thus supposedly not worth saving (stupid children playing on running tracks).

While some framings of the Trolley problem are in utilitarian terms (one life versus many others), other framings are in emotional and selfish versus sacrificial and rational terms: your child or your action versus other children and universal action. (By universal action I mean the same action irrespective of whether you are in touch with the person, as in the footbridge case, or are merely pulling a lever.)

The framing involving ‘good/ careful’ vs. ‘bad/careless’ in the good-boy-on-unused-track and bad-boys-on-used-tracks fascinates me the most.

At the outset, let me clarify that with regard to moral dilemmas of this sort, my personal position is reasonably clear. In a discussion some years back with some good friends (not over a cup of coffee, but over an intranet discussion group :-), while we were discussing this dilemma, I surmised that while we may debate endlessly what the action should be, the most reasonable guess one can make is that there would be no action at all. In the Trolley switch case, this means that the person may be so frozen by the decision pressure and the inability to arrive at a conclusion that he/she may not pull the switch at all (the switch that would direct the train/trolley to the unused track). Instead, he/she may just remain frozen, just as one sometimes freezes in times of extreme fear: a third reaction apart from the usual fight-or-flight response. Yet dilemmas such as these, and our ‘hypothetical’ responses to them, may still tell us more about how we reason about moral situations: whether the reasoning is post hoc (just as it is claimed that consciousness is post hoc) and, if so, why we would construct different post-hoc moral reasons for the same dilemma when it is framed in different terms. (Hauser’s research shows that the intuitions are different in the classical trolley (switch) versus the personal contact (footbridge) cases.)

Marc Hauser’s lab is doing some excellent research in this field and though I have taken their Moral Sense Test, I have a feeling that I have stumbled on a new type of framing and dilemma (that was not present in their tests…though one can never be sure:0) that may enable us to reflect a bit more on our moral reasoning process.

I’ll frame it first in neutral terms, and then try to refine it further. Let’s call this the Aeroplane problem. Suppose that you are traveling in an aeroplane, that there is only one doctor on board, and that the air hostesses are not sufficiently trained in first aid. Suppose further that you are high above the ground, with any emergency landing at least 20 minutes away. Suppose that there are two people on the airplane who each start having a third heart attack (both carry medical histories/badges stating that this is their third, and potentially fatal, heart attack; BTW, why is the myth of the third heart attack being fatal so enduring?). The heart attacks are almost simultaneous, and only the lone doctor on board can give them the first aid and resuscitation (CPR) that could ensure that they remain alive until the airplane makes an emergency landing (the emergency landing may itself slightly risk the lives of all passengers). Now, when all other details are unknown, it is potentially futile to ask which one to attend to: you may as well choose one patient and concentrate all efforts on him/her.

Suppose, one of them is an octogenarian, while the other is a teenager. Now, which one should the doctor choose? Suppose one is an old lady, while the other is a young brat, which one should the doctor choose?

Suppose the doctor has asthma, and nobody except the doctor knows how to administer the inhaled medicine correctly; should the doctor then take care of a patient, or should he/she take care of himself/herself? What if there is only one patient and one doctor? What if there is one doctor and many patients? Would the decision be easy?

Suppose further, that out of the two persons, one is faking heart attack symptoms, while the other is genuinely suffering; should the doctor be able to find out who is who? Would this make the dilemma easier? Would we (the airplane travelers) respect the doctor’s decision and let him /her attend to the person s/he thinks is genuinely suffering from heart attack?

Suppose further that both the patients are terrorists and the doctor says that both are faking symptoms, potentially to hijack the plane; would we listen to the doctor and let him/her not attend to either of the potential casualties? Or would we try to help them ourselves, potentially causing bedlam and fulfilling the plans of the terrorists?

I am sure by now you can conceive of other similar scenarios!! (one that comes to my mind is both the doctor and patient are accomplices and terrorists on-board to cause bedlam and mayhem and hijack the plane. Please let’s add as many scenarios in the comments as possible.)

Now let us take a moment to reflect on our moral reasoning process. I believe most of us would be prone to go with our intuitions and would think about rationalizing our decisions later. Thank god, we do have some moral intuitions to guide us in time of indecision/ threat perception.

Suppose that instead of framing the last few scenarios in an anxiety-provoking setting (involving terrorists and whatnot), we framed them in forward-looking, futuristic terms.

Suppose that one of the patients is a very promising child (with an IQ of 200, or a sports prodigy as well known as Sania Mirza), while the other is a famous scientist engaged in ground-breaking research (say Marie Curie, whose work on radioactivity was definitely a very useful discovery); whom should the doctor choose? Should she look at their achievements or their potential? Or should she remain immune to all this and dispassionately ignore all (ir)relevant information? Or should s/he be affected by age, gender, race, achievement, potential, etc.?

Suppose further that, instead of well-known celebrities like Abdul Kalam or Sachin Tendulkar being present on the plane, the younger patient is a product of genetic engineering, destined to become a great scientist/artist/whatever, while the older patient is working on top-secret, classified dual-use research that could potentially help humanity overcome the impending fuel crisis (and the related Arctic melting, ozone hole, etc. crises): she is working on a hydrogen-powered (water as fuel) engine, which could be used in automobiles as well as in outer space, in places like Mars, where only water may be available for refueling. Also, neither of these persons is currently well known or recognizable by the doctor/crew/passengers. The death of the older person would set humanity back by at least 40 years: only after 40 years could someone like the younger patient (the one the doctor saved, in case the doctor let the older patient die) work out the designs for using water as a fuel again. Now which one should the doctor attend to? Should s/he attend to the young one or the old one? The future or the present?

Should she take the time out to see the credentials (the proof that this child is genetically modified to have a good IQ/ whatever and the proof that this scientist is indeed working on classified research that may potentially help millions) of the patients or should she just act on her intuitions? Why is the reasoning different here as compared to the threat-scenario?

What if, instead of the Science frames above, we used frames of Art? (I mean artistic frames, not the frames that visual artists use for paintings :-) Art is much more than visual art :-)

Suppose that one of them (the older one) could become a Paul Gauguin, while the other (the younger one) could become a Van Gogh (again, I mean artists like Van Gogh and Gauguin, not their works of art :-) ). Now which one should the doctor choose? Why does it become irrelevant who should be saved if the frame is of Art, but a question of life and death if the frame is of Science?

Finally, some things to note and think about: the Airplane problem is entirely framed in a life-saving context (a doctor helping save a life), while the Trolley problem is entirely in a death-prevention context (someone acting as a messiah and preventing the death of five vs. one, good vs. careless, etc.). Again, doctors usually give rise to feminine frames, with one assuming a doctor to be female, while the foremen are usually assumed to be male. I hardly believe that framing is all of the problem, or that the framing is done deliberately: the framer of the problems/dilemmas is just as susceptible to framing effects as the readers are. While formulating a problem (a moral dilemma), one may fall prey to the same sorts of frames that we become susceptible to when thinking about the problems. Thus the aphorism that (paraphrasing) “It is as important to ask the right questions as it is to find the answers to the problems”. Translated into the language of the scientific research world, this becomes “it is important to design good experiments/observation-study setups and be very careful about the study designs.”

Returning to the issue of the framing of moral problems: if the frame exists, it is also because of our history. Just as the moral intuitions, which at times help us survive and at times let us fall prey to frames, are due to our shared evolutionary history, so too the frames we use to cast and perceive the moral dilemmas are rooted in our history. (Nothing profound: what I mean by shared history is that someone formulated the problems in those terms, silly!!)

I believe the problem lies more in our inability to detach ourselves from frames, take more reasonable perspectives, and know when to use our intuitions and when to use reason. As the saying goes, “It is by the fortune of God that, in this country, we have three benefits: freedom of speech, freedom of thought, and the wisdom never to use either.” Mark Twain (1835-1910). Another related saying comes to mind (paraphrasing): “God, give us the ability to change what we can, the humility to accept what we cannot, and the wisdom to know what is what”. We perhaps cannot change the historical frames or intuitions that are in place, but we can definitely improve our moral reasoning powers and, following a developmental framework, have compassion and understanding towards those who might not be employing the highest levels of moral reasoning.

Finally, if you are interested in my moral intuitions: I hypothesize that the doctor (in the plane) would not be affected by age, gender, race, potential, achievement, etc., would overcome his/her implicit associations, and would not deliberately try to gather information to determine which life is more valuable. He/she would end up rushing between the patients and helping both at the same time; but an intelligent doctor suffering from asthma would definitely save his/her own life first, so that he/she could take care of the others. This might seem like a rationalization (saving one’s own life so that one can help others in whatever small way), but one should use intelligence even before emotions or moral instincts take center stage.

I believe that in the Airplane scenario described above, there is potential for a histrionic/hysteric reaction from the crew and travelers, as everyone tries to help the patients (especially if no doctor is on board), and that this may be the reverse of the bystander-effect-like phenomenon I have hypothesized might happen in the Trolley problem (freezing and taking no action when a train is approaching five or six humans or a lone human). To make more sense of the preceding line, please read the comments by McEwen on the Mind Hacks post titled “‘Mass Hysteria’ closes school”. Also, a solemn and personal request: please do not jump to conclusions, read or try to correlate things out of context, or try to make sense of psychological concepts based on the everyday usage of terms. If you do not understand any of the concepts mentioned above, read the related literature and focus on that aspect alone, to the exclusion of other distracting eye-catchers. In case of any persisting confusion, feel free to ask your local psychiatrist/psychologist/psychology professor what those concepts mean.

PS: I believe that the post has become difficult to read; this was not intentional. Again, there might be spelling mistakes/grammatical errors. Don’t get alarmed/confused that this reflects racing thoughts etc.; just point them out and I’ll fix them. Most of the time the editorial errors (some of them quite funny) are due to lack of time to revise/lethargy to read. Also, this post is part of my ongoing series, in which I have posited that there may be gender differences in cognitive styles. Some of that may also be required reading.

Schizophrenia, Religion, Autism and the Indian culture (alternate title: Life, The Universe and Everything)

In continuation of my focus on the Schizophrenia-Autism dichotomy, I’d like to highlight two articles that seem to support my view.

The first is a blog post by John Horgan, speculating whether religiosity is the inverse of autism.

The anthropologist Stewart Guthrie proposes that religious experiences—and particularly those involving visions or intuitions of a personal God–may stem from our innate tendency toward anthropomorphism, “the attribution of human characteristics to nonhuman things or events.” Guthrie called his book on this theory Faces in the Clouds, but he could have called it Jesus in the Tortilla.

Recent findings in developmental psychology dovetail with Guthrie’s theory. By the age of three or four all healthy children manifest an apparently innate ability to infer the state of mind of other people.

Psychologists postulate that autism stems from a malfunction of the theory-of-mind module. Autistics have difficulty inferring others’ thoughts, and even see no fundamental distinction between people and inanimate objects, such as chairs or tables. That is why autism is sometimes called “mind-blindness.”

But many of us have the opposite problem—an overactive theory-of-mind capacity, which leads to what the psychologist Justin Barrett calls “hyperactive agent detection.” When we see squares and triangles moving around a screen, we cannot help but see the squares “chasing” the triangles, or vice versa, even when we are told that the movements are random.

This is compatible with this blog’s Schizophrenia-is-the-inverse-of-Autism theory for the following reasons:

1. Too much belief in agency in Schizophrenics (the hyperactive agent detector conceptualized above) vs too little belief in agency in Autistics, characterized by me earlier as a Fantasy/Imagination vs Reality orientation, has a direct bearing on whether one attributes anthropomorphic agency to non-living things and events (and thus Nature or God) or even fails to attribute intention to humans and animals and assumes them to be mere automata. I believe that while a schizophrenic mindset can be characterized by a suspension of disbelief and too much attribution of causality and intention (thus leading to a mindset compatible with religious/spiritual leanings), the autistic mindset would lead to too much skepticism, an even-causal-happenings-are-only-coincidental outlook, and a reductionist, atheistic mindset that attributes no intention to humans, least of all animals, and believes that they are just advanced machines. I guess both are extremes of delusion: in one case one characterizes it as the GOD delusion, but the other extremist, who sees no role for agency or intentionality (even in humans), is hailed as a great scientist!!

2. Another prominent dimension on which Schizophrenics and Autistics differ is the Literal-Metaphor dimension. I would like to frame that in terms of the Reference-Meaning use of a word and the consequent distinction in linguistics between a symbol as referring to something and a symbol as signifying a meaning. For an excellent commentary on this difference, please do read this classic paper.

Meaning, let us remember, is not to be identified with naming. Frege’s example of ‘Evening Star’ and ‘Morning Star’ and Russell’s of ‘Scott’ and ‘the author of Waverly’, illustrate that terms can name the same thing but differ in meaning. The distinction between meaning and naming is no less important at the level of abstract terms. The terms ‘9’ and ‘the number of the planets’ name one and the same abstract entity but presumably must be regarded as unlike in meaning; for astronomical observation was needed, and not mere reflection on meanings, to determine the sameness of the entity in question.

It is my contention that while the Schizophrenics are meaning-obsessed, the Autistics are more reference-obsessed, and thus have problems with metaphorical and figurative speech. From linguistics one can stretch the Meaning-Reference distinction and conceive of too much meaning orientation in schizophrenics (and a meaningful life requires a GOD that gives meaning to our lives) versus a nihilistic orientation in autistics that views life/evolution as purposeless. As many evolutionists famously claim, there is no meaning inherent in evolution, life or humans; rather, the question of meaning is invalid. Life just is.

3. Many schizophrenic delusions can be explained as an extreme manifestation of religiosity/spirituality. As Szasz famously said, “If you talk to God, you are praying; if God talks to you, you have schizophrenia”. Both a belief in GOD and his ability to listen to our prayers (the religious belief) and the converse belief that God can talk to us, many times in symbolic ways but sometimes in the form of actual auditory hallucinations, are manifestations of the same cognitive mechanism that attributes too much agency, causality and meaning. Many schizophrenics do indeed suffer from delusions of grandeur, whereby they think of themselves as GOD-like, or from delusions of persecution and paranoia, whereby they are persecuted by Satan-like evil figures. Thus both the hallucinations and the common delusions are explainable by the religiosity orientation. This time the GOD delusion is different: one believes that one is a godhead. In non-religious cultures, these being-GOD delusions may take the non-religious form of being a famous historical person (who had great agency and effect on human history and is presumably now active via the agency of the deluded schizophrenic), and the persecution delusions may refer not to Satan but to his secular counterparts: the CIA and the government!! Of course, the pathological forms of an Autistic mindset, which may have nihilistic orientations and, out of boredom and feelings of meaninglessness, may resort to meaningless acts of violence like the Columbine massacre, are one direction that needs further study.

I would now like to draw attention to the Cultural differences post, where I speculated on the different incidence rates of Schizophrenia and Autism in East Asian and American cultures, based on their differential emphasis on holistic and contextual versus analytical and local processing and cognition, and also presented some supporting evidence. The well-documented religious/spiritual inclination of Oriental cultures versus the scientific/materialistic orientation of American and Western cultures may be another factor that affects and explains the relative incidence of Schizophrenia and Autism in these cultures.

In a culture like India, in which people believe in 18 crore (180 million) Gods and Deities, believe in reincarnation, and believe that every human being is potentially divine, if a human errs towards an extreme and starts developing funny ideas of being a God herself, that may not ring the alarm bells immediately. Rather, some form of that delusion may even be encouraged (that is why in India people are named after the Gods and Deities; while it’s rare to find the name Jesus in the West, you can find millions of Rams in India). If the same GOD delusion develops in an American, his idea of being Jesus (or an angel) would definitely be detected early, leading to an earlier ‘label’ and an earlier hospitalization.

That said, I would now like to draw attention to an article in today’s Times of India, which pointed me to some more literature that unequivocally shows that not only are the incidence rates of schizophrenia lower in India (and other third-world (Asian) countries), but the prognosis is also many times better in Indian patients than in American patients.

The success story of schizophrenics in India was propagated by mental health professionals based on the WHO research DOSMeD in 1979. This was carried out in 10 countries including developing ones such as India, Nigeria and Columbia. The findings showed striking differences in the prognosis of schizophrenia between developed and developing countries. The underlying causes for the diversity were associated more with family and social variables than clinical determinants. Majority of patients in developing countries showed remission over two years; only 50 per cent of them had a single relapse though around 15 per cent never recovered. Patient outcome in developing countries was superior to that in developed economies.

This difference has been hypothesized to be due to the strong family structure (and I do believe that it is an important factor): the social cushion, support and acceptance that a family provides to the patient shield him/her from stressful situations that may trigger a relapse.

This theory of a family-protective advantage has come under attack recently, but I think the attack is flawed because it groups countries not according to culture, but according to developmental status. Indeed, the other factor that may be contributing to a better outcome in schizophrenic patients may be cultural differences, like the different cognitive/perceptual styles and a greater tolerance for religious/spiritual/mystical ideas. By shielding a person from stigmatization and isolation based on eccentricities exhibited along these dimensions, one may be preventing or delaying relapse, and ensuring a better outcome by not pushing the person over the edge. In the past, it was not infrequent for those who had psychotic experiences to be labeled as shamans and to be treated with respect rather than stigma and isolation, thus ensuring that they were not exposed to social stresses in the future.

I have taken a somewhat deprecating attitude towards the extreme autistic orientation, characterized by no belief in intentionality, causality or the spiritual, but I am a strong believer in the fact that though the extreme manifestation of Autism/Schizophrenia makes one dysfunctional, a pronounced autistic/schizophrenic orientation does endow one with creative faculties: either to understand and manipulate the world (the Sciences) or to understand and manipulate subjective experiences (the Arts). In particular, as the readers of this blog would most likely be scientists, and because I belong to the scientific community and have failed to see how a scientific orientation is incompatible with an artistic/symbolic/spiritual orientation, I have taken a harder line against the extreme atheism and nihilism zealots.

I believe one can, and must, utilize the different types of cognitive abilities that these extreme manifestations and disorders caricature. Do let me know what you think!

Abstract vs Concrete: the two genders? (the categorization debate)

In my previous posts I have focussed on distinctions in cognitive styles based on figure-ground, linear-parallel, routine-novel and literal-metaphorical emphasis.

There is another important dimension on which cognitive styles differ and I think this difference is of a different dimension and mechanism than the figure-ground difference that involves broader and looser associations (more context) vs narrow and intense associations (more focus). One can characterize the figure-ground differences as being detail and part-oriented vs big picture orientation and more broadly as analytical vs synthesizing style.

The other important difference pertains to whether associations, and hence knowledge, are mediated by abstract entities or whether associations, knowledge and behavior are grounded in concrete entities/experiences. One could summarize this as follows: whether the cognitive style is characterized by abstraction or by a particularization bias. One could even go a step further and pit an algorithmic learning mechanism against one based on heuristics and pragmatics.

It is my contention that the bias towards abstraction would be greater for Males and the left hemisphere and the bias towards Particularization would be greater for Females and the right hemisphere.

Before I elaborate on my thesis, the readers of this blog need to get familiar with the literature on categorization and the different categorization/concept formation/ knowledge formation theories.

An excellent resource is a four article series from Mixing Memory. I’ll briefly summarize each post below, but you are strongly advised to read the original posts.

Background: Most categorization efforts are focussed on classifying and categorizing objects, as opposed to relations or activities, and on the representation of such categories (concepts) in the brain. Objects are supposed to be made up of a number of features. An object may have a feature to varying degrees (it’s not necessarily a binary has/doesn’t-have type of association; one feature may be ‘tall’, and the feature strength may vary depending on the actual height).

The first post concerns the classical view of concepts as being definitional or rule-bound in nature. This view proposes that a category is defined by a combination of features and that these features are binary in nature (one either has a feature or does not have it). Only those objects that have all the features of the category belong to the category. The concept (the representation of the category) can be stored as a conjunction rule. Thus, the concept of bachelor may be defined as having the features male, single, human and adult. To determine the classification of a novel object, say, Sandeep Gautam, one would subject that object to the bachelor category rule and calculate the truth value. If all the conditions are satisfied (i.e. Sandeep Gautam has all the features that define the category bachelor), then we may classify the new object as belonging to that category.


Bachelor(x) = truth value of (male(x) AND adult(x) AND single(x) AND human(x))

Thus a concept is nothing but a definitional rule.
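
As a minimal sketch of the classical view, the bachelor rule can be written as a plain conjunction of binary features. The `Thing` class and the two sample objects below are hypothetical, introduced only for illustration; they are not from the Mixing Memory posts.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    # Binary features: an object either has a feature or it does not.
    male: bool
    adult: bool
    single: bool
    human: bool


def is_bachelor(x: Thing) -> bool:
    # Classical view: membership is the truth value of a conjunction rule.
    return x.male and x.adult and x.single and x.human


# Hypothetical objects to classify.
candidate = Thing(male=True, adult=True, single=True, human=True)
married = Thing(male=True, adult=True, single=False, human=True)

print(is_bachelor(candidate))  # True
print(is_bachelor(married))    # False
```

Note that the rule is all-or-nothing: failing a single feature is enough to fall outside the category, and there is no notion of being a "better" or "worse" bachelor.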

The second and third posts concern the similarity-based approaches to categorization. These may also be called the clustering approaches. One visualizes the objects as spread out in a multi-dimensional feature space, with each dimension representing the degree to which one feature is present. The objects in this n-dim space that are close to each other, and so cluster together, are considered to form one category, as they have similar feature values. In these views, the distance between objects in the n-dim feature space represents their degree of similarity: the closer two objects are, the more similar they are and the more likely it is that we can label them as belonging to one category.

To take an example, consider a 3-dim space with one dimension (x) signifying height, another (y) signifying color, and the third (z) signifying attractiveness. Suppose we rate many males along these dimensions and plot them in this 3-d space. Then we may find that some males have high values of height (Tall), color (Dark) and attractiveness (Handsome), cluster in the upper-right corner of the 3-d space, and thus define a category of males that can be characterized as the TDH/cool hunk category (a category that is most common in Mills & Boon novels). Other males may meanwhile cluster around a category that is labeled squats.

There are some more complexities involved, like assigning weights to a feature in relation to a category, thus skewing the similarity-distance relationship by making it dependent on the weight (or importance) of the feature to the category under consideration. In simpler terms, not all dimensions are equal, and the distance within which two objects count as similar (belonging to a cluster) may differ based on the dimension under consideration.
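
To make the role of feature weights concrete, here is a small sketch of a weighted distance in the 3-dim (height, color, attractiveness) space. The weighted Euclidean form, the 0-10 rating scale and the sample ratings are all my assumptions for illustration; the posts do not commit to a particular distance formula.

```python
import math


def weighted_distance(a, b, weights):
    # Weighted Euclidean distance (an assumed form): a larger weight means
    # a difference along that dimension counts for more.
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Hypothetical (height, color, attractiveness) ratings on a 0-10 scale.
p = (9.0, 8.0, 9.0)
q = (9.0, 2.0, 9.0)  # differs from p only in color

equal_weights = weighted_distance(p, q, (1.0, 1.0, 1.0))
color_downweighted = weighted_distance(p, q, (1.0, 0.25, 1.0))

print(equal_weights)       # 6.0
print(color_downweighted)  # 3.0
```

With color down-weighted, the same two males end up half as far apart, so they are more likely to fall into one cluster: the weights decide which differences matter for a given category.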

There are two variations of the similarity-based or clustering approaches. Both have a similar classification and categorization mechanism, but they differ in the representation of the category (concept). The category, it is to be recalled, is in both cases determined by the various objects that have clustered together. Thus, a category is a collection or set of such similar objects. The differences arise in the representation of that set.

One can represent a set of data by its central tendencies. Some central tendencies, like the Mean, represent an average value of the set and are an abstraction in the sense that no particular member may have that particular value. Others, like the Mode or Median, do signify a single member of the set: either the most frequent one or the middle one in an ordered list. When the discussion of central tendencies is extended to pairs or triplets of values, or to n-tuples (signifying an n-dim feature space), the concept of mode or median becomes more problematic, and a measure based on them may also become abstract and no longer remain concrete.

Besides a central tendency, one also needs an idea of the distribution of the set values. With the Mean, we have an associated Variance, again an abstract parameter, which signifies how much the set values are spread around the Mean. In the case of the Median, one can resort to percentile values (10th percentile etc.) and thus have concrete members representing the spread of the data set.

It is my contention that the prototype theories rely on abstraction and averaging of data to represent the data set (categories), while the Exemplar theories rely on particularization and representativeness of some member values to represent the entire data set.

Thus, supposing that in the above TDH Male classification task, we had 100 males belonging to the TDH category, then a prototype theory would store the average values of height, color and attractiveness for the entire 100 TDH category members as representing the TDH male category.

On the other hand, an exemplar theory would store the particular values for the height, color and attractiveness ratings of 3 or 4 Males belonging to the TDH category as representing the TDH category. These 3 or 4 members of the set, would be chosen on their representativeness of the data set (Median values, outliers capturing variance etc).
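The contrast between the two representations can be sketched in a few lines of Python. This is only an illustration under assumptions: the five members and their ratings are invented, and picking the lowest, median and highest member on one feature is just one possible way of choosing "representative" exemplars.

```python
from statistics import mean, median_low

# Hypothetical TDH members as (height_cm, darkness_rating, attractiveness_rating).
# All values are invented for illustration.
members = [
    (178, 6, 7),
    (183, 7, 8),
    (185, 8, 6),
    (190, 7, 9),
    (196, 9, 8),
]


def prototype(items):
    """Prototype view: one abstract point holding the per-feature averages."""
    return tuple(mean(column) for column in zip(*items))


def exemplars(items, feature=0):
    """Exemplar view: a few real members (lowest, median, highest on one feature)."""
    ordered = sorted(items, key=lambda item: item[feature])
    middle_value = median_low([item[feature] for item in items])
    median_member = next(item for item in ordered if item[feature] == middle_value)
    return [ordered[0], median_member, ordered[-1]]


print(prototype(members))   # an averaged point that need not match any member
print(exemplars(members))   # three concrete members of the set
```

The prototype is a single abstract tuple, while the exemplar representation keeps actual members and so preserves some of the spread of the category.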

Thus, the second post of Mixing Memory discusses the Prototype theories of categorization, which posits that we store average values of a category set to represent that category.


Similarity will be determined by a feature match in which the feature weights figure into the similarity calculation, with more salient or frequent features contributing more to similarity. The similarity calculation might be described by an equation like the following:

$S_j = \sum_i w_i \, v(i,j)$

In this equation, Sj represents the similarity of exemplar j to a prototype, wi represents the weight of feature i, and v(i,j) represents the degree to which exemplar j exhibits feature i. Exemplars that reach a required level of similarity with the prototype will be classified as members of the category, and those that fail to reach that level will not.
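As a rough sketch, the weighted feature sum (the similarity of exemplar j as the sum over features i of weight times feature value) and the membership threshold could look like this in Python. The weights, feature values and threshold are hypothetical numbers chosen only to show the mechanics.

```python
def prototype_similarity(weights, values):
    """Weighted feature match: sum over features i of w_i * v(i, j)."""
    return sum(w * v for w, v in zip(weights, values))


def is_member(weights, values, threshold):
    """An exemplar joins the category if its similarity reaches the threshold."""
    return prototype_similarity(weights, values) >= threshold


# Hypothetical feature weights for (tall, dark, handsome) and two candidates
# described by how strongly they exhibit each feature (0.0 to 1.0).
weights = [0.5, 0.3, 0.2]
candidate_a = [1.0, 0.5, 0.0]
candidate_b = [0.2, 0.3, 0.5]

print(is_member(weights, candidate_a, threshold=0.6))
print(is_member(weights, candidate_b, threshold=0.6))
```

More salient features simply carry larger weights, so they move the similarity score further for the same feature value.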

The third post discusses the Exemplar theory of categorization, which posits that we store all, or in milder and more practical versions, some members as exemplars that represent the category. Thus, a category is defined by a set of typical exemplars (say, one at every tenth percentile).

To categorize a new object, one would compare the similarity of that object with all the exemplars belonging to a category, and if this reaches a threshold, the new object is classified as belonging to that category. If two categories are involved, one would compare with exemplars from both categories and, depending on threshold values, either classify the object in both categories or, in a forced single-choice task, classify it in the category that yields the better similarity scores.


We encounter an exemplar, and to categorize it, we compare it to all (or some subset) of the stored exemplars for categories that meet some initial similarity requirement. The comparison is generally considered to be between features, which are usually represented in a multidimensional space defined by various “psychological” dimensions (on which the values of particular features vary). Some features are more salient, or relevant, than others, and are thus given more attention and weight during the comparison. Thus, we can use an equation like the following to determine the similarity of an exemplar:

$\mathrm{dist}(s, m) = \sum_i a_i \, \lvert y_i^{\mathrm{stim}} - y_{mi}^{\mathrm{ex}} \rvert$

Here, the distance in the space between an instance, s, and an exemplar in memory, m, is the sum, over all dimensions (represented individually by i), of the absolute difference between the feature value of the stimulus and the feature value of m on that dimension. Each term in the sum is weighted by a, which represents the saliency of the particular feature.
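A minimal sketch of this attention-weighted distance, together with a forced-choice classification, is given below. Using the smallest average distance to a category's stored exemplars as the decision rule is my assumption for illustration, as are the category names, exemplar values and attention weights.

```python
def weighted_distance(stimulus, exemplar, attention):
    """Attention-weighted sum of absolute feature differences."""
    return sum(a * abs(s - m) for a, s, m in zip(attention, stimulus, exemplar))


def classify(stimulus, categories, attention):
    """Forced choice: pick the category whose exemplars are closest on average."""
    def average_distance(stored):
        return sum(weighted_distance(stimulus, m, attention) for m in stored) / len(stored)
    return min(categories, key=lambda name: average_distance(categories[name]))


# Hypothetical stored exemplars: (height_cm, darkness_rating, attractiveness_rating).
categories = {
    "TDH": [(185, 8, 8), (190, 7, 9)],
    "not_TDH": [(165, 3, 4), (170, 4, 5)],
}
attention = (0.1, 1.0, 1.0)  # height is in cm, so it gets a smaller weight

print(classify((188, 7, 8), categories, attention))
```

A smaller distance means a higher similarity, so the threshold or forced-choice rules described above can be applied to the distances directly.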

There is another interesting clustering approach that becomes available to us, if we use an exemplar model. This is the proximity-based approach. In this, we determine all the exemplars (of different categories) that are lying in a similarity radius (proximity) around the object in consideration. Then we determine the category to which these exemplars belong. The category to which the maximum number of these proximate exemplars belong, is the category to which this new object is classified.
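The proximity-based approach can be sketched as a vote among the stored exemplars that fall inside the similarity radius. The city-block distance, the radius and the labelled exemplars below are hypothetical choices made for the example; when two categories tie, this sketch simply returns the one encountered first.

```python
from collections import Counter


def city_block(a, b):
    """Unweighted sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify_by_proximity(stimulus, labelled_exemplars, radius):
    """Majority vote among the exemplars lying within `radius` of the stimulus."""
    votes = Counter(
        label
        for label, exemplar in labelled_exemplars
        if city_block(stimulus, exemplar) <= radius
    )
    if not votes:
        return None  # nothing is close enough to vote
    return votes.most_common(1)[0][0]


# Hypothetical exemplars as (label, (height_cm, darkness_rating)).
stored = [
    ("TDH", (185, 8)),
    ("TDH", (190, 7)),
    ("not_TDH", (186, 4)),
    ("not_TDH", (165, 3)),
]

print(classify_by_proximity((187, 7), stored, radius=5))
```

Note that the vote uses exemplars of all categories at once, which is what distinguishes it from comparing the object with each category separately.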

The fourth post on Mixing Memory deals with a ‘theory’ theory approach to categorization, and I will not discuss it in detail right now.

I’d like to mention briefly in passing that there are other relevant theories, like the schemata, scripts, frames and situated simulation theories of concept formation, that take into account prior knowledge and context to form concepts.

However, for now, I’d like to return to the prototype and exemplar theories and draw attention to the fact that the prototype theories are more abstracted, rule-type and economical in nature, but also subject to pragmatic deficiencies, based on their inability to take variance, outliers and exceptions into account; while the exemplar theories, being more concrete, memory-based and pragmatic in nature (being able to account for atypical members), suffer from the problems of requiring large storage and unnecessary redundancy. One may even extrapolate these differences to the ones between procedural or implicit memory on the one hand and explicit or episodic memory on the other.

There is a lot of literature on prototypes and exemplars, and research supporting both. One such line of research concerns the visual perception of faces, whereby it is posited that we find average faces attractive, as the average face is closer to a prototype of a face, and thus the similarity calculations needed to classify an average face are minimal. This ease of processing we may subjectively feel as attractiveness of the face. Of course, male and female prototype faces would be different, both perceived as attractive.

Alternately, we may be storing examples of faces, some attractive, some unattractive and one can theorize that we may find even the unattractive faces very fast to recognize/categorize.

With this in mind, I would like to draw attention to a recent study that examined past-tense over-regularization in males and females and showed that not only do females make more over-regularization errors, but also that these errors are clustered around similar-sounding verbs.

Let me explain what over-regularization of the past tense means. While children are developing, they pick up language and start forming concepts like that of a verb and that of a past tense verb. They sort of develop a folk theory of how past tense verbs are formed: the theory is that the past tense is formed by appending an ‘ed’ to a verb. Thus, when they encounter a new verb that they have to use in the past tense (and which, say, is irregular), they will tend to append ‘ed’ to the verb to make the past tense. Thus, instead of learning that ‘hold’ in the past tense becomes ‘held’, they tend to form the past tense as ‘holded’.

Prototype theories suggest, that they have a prototypical concept of a past tense verb as having two features- one that it is a verb (signifies action) and second that it has ‘ed’ in the end.

Exemplar theories, on the other hand, might predict that the past tense verb category is a set of exemplars, with each exemplar representing one type of similar-sounding verbs (based on rhyme, a shared coda etc.). Thus, the past tense verb category would contain some actual past tense verbs like {‘linked’ representing winked, blinked, honked, yanked etc.; ‘folded’ representing molded, scolded etc.}.

Thus, this past tense verb concept, which is based on regular verbs, is also applied while determining the past tense of irregular verbs. On encountering ‘hold’, an irregular verb that one wants to use in the past tense, one may use ‘holded’, as ‘holded’ is a verb, ends in ‘ed’ and is also very similar to ‘folded’. While comparing ‘hold’ with a prototype, one may not have the additional effect of rhyming similarity with exemplars that is present in the exemplar case; and thus females, who are supposed to use an exemplar system predominantly, would be more susceptible to over-regularization effects than boys. Also, this over-regularization would be skewed, with more over-regularization for verbs that rhyme with regular verbs in females. As opposed to this, boys, who use the prototype system predominantly, would not show the skew-towards-rhyming-verbs effect. This is precisely what has been observed in that study.

Developing Intelligence has also commented on the same, though he seems unconvinced by the symbolic rules-and-words or procedural-declarative accounts of language as opposed to the traditional connectionist models. The account given by the authors is entirely in terms of procedural (grammatical rule based) versus declarative (lexicon and pairs of past and present tense verbs based) mechanisms, and I have taken the liberty to reframe that in terms of Prototype versus Exemplar theories, because it is my contention that procedural learning, in its early stages, is prototypical and abstractive in nature, while lexicon-based learning is exemplar-based and particularizing in nature.

This has already become a sufficiently long post, so I will not take much more space now. I will return to this discussion, covering research on prototypes vs. exemplars in other fields of psychology, especially with reference to Gender and Hemisphericity based differences. I’ll finally extend the discussion to categorization of relations, and that should move us into a whole new field, one that is closely related to social psychology and which I believe has been ignored a lot in cognitive accounts of learning, thinking etc.

Artificial Neural Networks: temporal summation, embedded ‘clocks’ and operant learning

Artificial Neural Networks have historically focussed on modeling the brain as a collection of interconnected neurons. The individual neurons aggregate inputs and either produce an on/off output based on threshold values or produce a more complex output as a linear or sigmoid function of their inputs. The output of one neuron may go to several other neurons.

Not all inputs are equivalent and the inputs to the neuron are weighed according to a weight assigned to that input connection. This mimics the concept of synaptic strength. The weights can be positive (signifying an Excitatory Post-Synaptic Potential ) or negative (signifying an Inhibitory Post-Synaptic Potential).

Learning consists of the determination of correct weights that need to be assigned to solve the problem; i.e. to produce a desired output, given a particular input. This weight adjustment mimics the increase or decrease of synaptic strengths due to learning. Learning may also be established by manipulating the threshold required by the neuron for firing. This mimics the concept of long term potentiation (LTP).

The model generally consists of an input layer (mimicking sensory inputs to the neurons), a hidden layer (mimicking the association functions of the neurons in the larger part of the brain) and an output layer (mimicking the motor outputs of the neurons).

This model is a very nice replication of the actual neurons and neuronal computation, but it ignores some of the other relevant features of actual neurons:

1. Neuronal inputs are added together through the processes of both spatial and temporal summation. Spatial summation occurs when several weak signals are converted into a single large one, while temporal summation converts a rapid series of weak pulses from one source into one large signal. The concept of temporal summation is generally ignored. The summation consists exclusively of summation of signals from other neurons at the same time and does not normally include the concept of summation across a time interval.

2. Not all neuronal activity is due to external ‘inputs’. Many brain regions show spontaneous activity, in the absence of any external stimulus. This is not generally factored in. We need a model of brain that takes into account the spontaneous ‘noise’ that is present in the brain, and how an external ‘signal’ is perceived in this ‘noise’. Moreover, we need a model for what purpose does this ‘noise’ serve?

3. This model mimics the classical conditioning paradigm, whereby learning is conceptualized in terms of input-output relationships or stimulus-response associations. It fails to throw any light on many operant phenomena and activities, where behavior or response is spontaneously generated and learning consists in the increase, decrease or extinction of the timing and frequency of that behavior as a result of a history of reinforcement. This type of learning accounts for the majority of behavior in which we are most interested: the behavior that is goal directed and the behavior that is time, context and state dependent. The fact that a food stimulus will not always result in the response ‘eat’, but is mediated by factors like the state (hunger) of the organism, time of day etc., is not explainable by the current models.

4. The concept of time, durations and how to tune the motor output as per strict timing requirements has largely been an unexplored area. While episodic learning and memory may be relatively easier to model in the existing ANNs, it’s my hunch that endowing them with a procedural memory would be well nigh impossible using existing models.

Over a series of posts, I will try to tackle these problems by enhancing the existing neural networks, incorporating into them some new features that are consistent with our existing knowledge about actual neurons.

First, I propose to have a time-threshold in each neural unit. This time-threshold signifies the duration in which temporal summation is applicable and takes place. All input signals that are received within this time duration, either from repeated firing of the same input neuron or from time-displaced firings of different input neurons, are added together as per the normal input weights, and if at any time this sum reaches the normal threshold-for-firing, then the neuron fires. This combines both the temporal and spatial summation concepts. With temporal summation, we have an extra parameter: the time duration for which the history of inputs needs to be taken into account.

All neurons will also have a very short-term memory, in the sense that they would be able to remember the strengths of the input signals that they have received in the near past, that is, in the range of the typical time-thresholds that are set for them. This time-threshold can typically be in milliseconds.

Each time a neuron receives an input, it starts a timer. This timer runs for a very small duration, encoded as the time-threshold for that neuron. As long as this timer is running and has not expired, the input signal is available to the neuron for calculation of total input strength and for deciding whether to fire or not. As soon as the timer expires, the memory of the associated input is erased from the neuron’s memory and that particular input is no longer able to affect any future firing of the neuron.

All timers, as well as the memory of associated input signals, are erased after each successful neural firing (every time the neuron generates an action potential). After each firing, the neuron starts afresh, accumulating and aggregating the inputs it receives thereafter in the time-threshold window that is associated with it.
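The time-threshold rule described in the last few paragraphs can be sketched as a small Python class. The class name `WindowNeuron`, the use of "greater than or equal to" for the firing test and the example numbers are my assumptions; input strengths are taken to be already multiplied by their connection weights.

```python
class WindowNeuron:
    """Neuron that sums weighted inputs received within a time window."""

    def __init__(self, window_ms, threshold):
        self.window_ms = window_ms   # the time-threshold for temporal summation
        self.threshold = threshold   # the usual threshold-for-firing
        self.memory = []             # remembered (arrival_time_ms, strength) pairs

    def receive(self, time_ms, strength):
        """Register an input at time_ms; return True if the neuron fires."""
        # Forget inputs whose timer has expired.
        self.memory = [(t, s) for t, s in self.memory if time_ms - t < self.window_ms]
        self.memory.append((time_ms, strength))
        if sum(s for _, s in self.memory) >= self.threshold:
            self.memory.clear()      # firing erases all timers and stored inputs
            return True
        return False


# Three weak pulses arriving close together add up and trigger a firing ...
fast = WindowNeuron(window_ms=5, threshold=1.0)
print([fast.receive(t, 0.4) for t in (0, 2, 3)])

# ... while the same pulses spread out in time never do.
slow = WindowNeuron(window_ms=5, threshold=1.0)
print([slow.receive(t, 0.4) for t in (0, 10, 20)])
```

Inputs from several source neurons arriving at the same instant are handled by the same sum, so spatial and temporal summation share one mechanism here.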

Of course there could be variations to this. Just as spatial aggregation/firing need not be an either/or decision based on a threshold, the temporal aggregation/firing need not be an either/or decision: one could have linear or sigmoid functions of time that modulate the input signal strength based on the time that has elapsed. One particular candidate mechanism could be a radioactive decay function that decreases the input signal strength by half after each half-life. Here, the half-life is equivalent to the concept of a time-threshold. In the case of the time-threshold, the signal is available in its entirety until the time-threshold has elapsed, and is not available to the neuron at all afterwards. In the case of radioactive decay, the input signal is in theory available till infinity, but the strength of the signal is diminished by half after each half-life period, thus making the effects of the input signal negligible after a few half-lives. Of course, in the radioactive case too, once the neuron has fired, all memory of that input would be erased and any half-life decay computations stopped.
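The half-life variant can be sketched in the same style. Again the class name `DecayNeuron`, the firing test and the numbers are assumptions made for illustration; each remembered input contributes its strength multiplied by 0.5 raised to the number of half-lives elapsed.

```python
class DecayNeuron:
    """Neuron whose remembered inputs halve in strength every half-life."""

    def __init__(self, half_life_ms, threshold):
        self.half_life_ms = half_life_ms
        self.threshold = threshold
        self.memory = []  # (arrival_time_ms, original_strength) pairs

    def total_at(self, time_ms):
        """Sum of all remembered inputs after exponential decay."""
        return sum(
            s * 0.5 ** ((time_ms - t) / self.half_life_ms)
            for t, s in self.memory
        )

    def receive(self, time_ms, strength):
        """Register an input; return True if the decayed total reaches threshold."""
        self.memory.append((time_ms, strength))
        if self.total_at(time_ms) >= self.threshold:
            self.memory.clear()  # firing erases the memory and stops the decay
            return True
        return False


neuron = DecayNeuron(half_life_ms=10, threshold=1.0)
print(neuron.receive(0, 0.8))    # 0.8 alone is below threshold
print(neuron.total_at(10))       # one half-life later the trace has halved
print(neuron.receive(10, 0.7))   # decayed trace plus new input crosses threshold
```

With a hard 10 ms time-threshold the first input would already have been forgotten at 10 ms; with decay it still contributes half of its strength.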

These are not very far-fetched speculations and modeling the neural networks this way can lead to many interesting results.

Second, I propose to have some ‘clocks’ or ‘periodic oscillators’ in the network that would generate spontaneous outputs after a pre-determined time, irrespective of any inputs. Even one such clock is sufficient for our discussion. Such a clock or oscillator system is not difficult to envisage or conceive. We just need a non-random, deterministic delay in the transmission of signals from one neuron to the other. There do exist systems in the brain that delay signals, but leaving aside such specialized systems, even a normal transmission along an axon between two neurons would suffer from some deterministic delay, based on the time it takes the signal to travel down the axon length, assuming that no changes in myelination take place over time, so that the speed of transmission is constant.

In such a scenario, the time it takes for a signal to reach the other neuron, is held constant over time. (Note that this time may be different for different neuron pairs based on both the axon lengths involved and the associated myelination, but would be same for the same neuron pair over time). Suppose that both the neurons have very long, unmyelinated axons and that these axons are equal in length and provide inputs to each other. Further suppose that both the neurons do not have any other inputs , though each may send its output to many other neurons.

Thus, the sole input of the first neuron is the output of the second neuron and vice versa. Suppose that the thresholds of the two neurons are such that each would trigger, if it received a single input signal (from the peer neuron). As there would be a time lag between the firing of neuron one, and its reaching the second neuron, the second neuron would fire only after, say 5 milliseconds, the time it takes for signal to travel, after the first neuron has fired. The first neuron meanwhile will respond to the AP generated by the second neuron -which would reach it after (5+5= 10 ms) the round trip delay- and generate an AP after 10 ms from its initial firing.

We of course have to assume that somehow the system was first put in motion: someone caused the first neuron to fire initially (this could not be other neurons, as we have assumed that this oscillator pair has no external input signals), and after that it is a self-sustaining clock, with neuron 1 and neuron 2 both firing regularly at 10 ms intervals but in opposite phases. We just need GOD to initially fire the first neuron (the spark of life) and thereafter we have periodic spontaneous activity in the system.
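A toy simulation of this two-neuron loop, assuming the 5 ms one-way delay used above and a single external kick to neuron 1 at time zero, might look like this. The function name and the simulation horizon are made up for the example.

```python
def simulate_pair(delay_ms=5, until_ms=40):
    """Firing times of two neurons that are each other's only input.

    Each neuron fires as soon as a single spike from its peer arrives, and a
    spike takes delay_ms to travel along the axon in either direction.
    """
    firings = {1: [], 2: []}
    time_ms, neuron = 0, 1       # the initial, externally caused firing of neuron 1
    while time_ms <= until_ms:
        firings[neuron].append(time_ms)
        time_ms += delay_ms      # the spike travels to the peer neuron
        neuron = 2 if neuron == 1 else 1
    return firings


print(simulate_pair())
```

Each neuron ends up firing every 10 ms (the round-trip delay), with the two neurons half a period apart.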

Thirdly, I propose that this ‘clock’, along with the concept of temporal summation, is able to calculate and code any arbitrary time duration and any arbitrary time-dependent behavior, but in particular any periodic or state/goal based behavior. I’ve already discussed some of this in my previous posts and will elaborate more in subsequent posts.

For now, some elementary tantalizing facts.

1. Given a 10 ms clock and a neuron capable of temporal summation over a 50 ms duration, we can have a 50 ms clock: the neuron has as its sole input the output of the 10 ms clock. After every 50 ms, it would have accumulated 5 signals in its memory. If the threshold-for-firing of the neuron is set such that it only fires if it has received five times the signal strength that is output by the 10 ms clock, then this neuron will fire every 50 ms. This neuron thus generates a periodic output every 50 ms and implements a 50 ms clock.

2. Given a 10 ms clock and a neuron capable of temporal summation over 40 ms (or let’s keep the original 50 ms time-threshold neuron, but set its threshold-for-firing to 4 times the output strength of the 10 ms clock neuron), using the same mechanism as defined above, we can have a 40 ms clock.

3. Given a 40 ms clock, a 50 ms clock and a neuron that does not do temporal summation, we can have a 200 ms clock. The sole inputs to the neuron implementing the 200 ms clock are the outputs of the 50 ms and the 40 ms clocks. This neuron does not do temporal summation. Its threshold for firing is purely spatial, and it fires only if it simultaneously receives a signal strength that is equal to or greater than the combined output signal strength of the 50 ms and 40 ms neurons. It is easy to see that, if we assume that the 50 ms and 40 ms neurons start firing in phase, the signals from the two neurons arrive at the same time only once every 200 ms, the least common multiple of 40 and 50. Voilà, we have a 200 ms clock. After this, I assume, it’s clear that the sky is the limit as to the arbitrariness of the duration that we can code for.
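The three constructions above can be checked with a short simulation. This is a sketch under assumptions: the base clock emits unit pulses at 10, 20, 30 ms and so on, both derived clocks use a 50 ms summation window, and the function name `summing_clock` is made up. The coincidence detector fires whenever both derived clocks fire at the same instant, which happens at the least common multiple of their periods.

```python
from math import gcd


def summing_clock(pulse_times, window_ms, pulses_needed):
    """Firing times of a neuron that counts unit pulses within a sliding window
    and forgets everything each time it fires."""
    memory, fired = [], []
    for t in pulse_times:
        memory = [p for p in memory if t - p < window_ms]  # expire old pulses
        memory.append(t)
        if len(memory) >= pulses_needed:
            fired.append(t)
            memory = []
    return fired


base_clock = range(10, 401, 10)  # ticks of the 10 ms clock up to 400 ms
clock_50 = summing_clock(base_clock, window_ms=50, pulses_needed=5)
clock_40 = summing_clock(base_clock, window_ms=50, pulses_needed=4)

# Purely spatial summation: fire only when both inputs arrive together.
coincidences = sorted(set(clock_50) & set(clock_40))

print(clock_50[:4])
print(clock_40[:4])
print(coincidences)
print(40 * 50 // gcd(40, 50))  # least common multiple of the two periods
```

Choosing periods with no common factor (for example 40 ms and 30 ms clocks would not do, but 40 ms and 70 ms share only a factor of 10) changes how long the coincidence period becomes, so the choice of input clocks matters when coding for a particular long duration.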

Lastly, learning consists of changing the temporal thresholds associated with a neuron, so that any arbitrary schedule can be associated with a behavior, based on the history of reinforcement. After the training phase, the organism would exhibit spontaneous behavior that follows a schedule and could learn novel schedules for novel behaviors (transfer of learning).

To me all this seems very groundbreaking theorizing and I am not aware of how and whether these suggestions/ concepts have been incorporated in existing Neural Networks. Some temporal discussions I could find here. If anyone is aware of such research , do let me know via comments or by dropping a mail. I would be very grateful. I am especially intrigued by this paper (I have access to abstract only) and the application of temporal summation concepts to hypothalamic reward functions.

Zombies, AI and Temporal Lobe Epilepsy : towards a universal consciousness and behavioral grammar?

I was recently reading an article on Zombies about how the Zombie argument has been used against physicalism and in consciousness debates in general, and one quote by Descartes at the beginning of the article captured my attention :

Descartes held that non-human animals are automata: their behavior is explicable wholly in terms of physical mechanisms. He explored the idea of a machine which looked and behaved like a human being. Knowing only seventeenth century technology, he thought two things would unmask such a machine: it could not use language creatively rather than producing stereotyped responses, and it could not produce appropriate non-verbal behavior in arbitrarily various situations (Discourse V). For him, therefore, no machine could behave like a human being. (emphasis mine)

To me this seems like a very reasonable and important speculation: although we have learned a lot about how we are able to generate an infinite variety of creative sentences using the generative grammar theory of Chomsky (I must qualify: we only know how to create a new grammatically valid sentence; the study of semantics has not complemented the study of syntax, so we still do not know why we are also able to create meaningful sentences and not just grammatically correct gibberish like “Colorless green ideas sleep furiously”. The fact that this grammatically correct sentence is still interpretable by using polysemy, homonymy or a metaphorical sense for ‘colorless’, ‘green’ etc. may provide a clue to how we map meanings, as in Conceptual Metaphor Theory, but that discussion is for another day), we still do not have a coherent theory of how and why we are able to produce a variety of behavioral responses in arbitrarily various situations.

If we stick to a physical, brain-based, reductionist, no ghost-in-the-machine, evolved-as-opposed-to-created view of human behavior, then it seems reasonable that we start from the premise of humans as an improvement over the animal models of stimulus-response (classical conditioning) or response-reinforcement (operant conditioning) theories of behavior and build upon them to explain how and what mechanism Humans have evolved to provide a behavioral flexibility as varied, creative and generative as the capacity for grammatically correct language generation. The discussions of behavioral coherence, meaningfulness, appropriateness and integrity can be left for another day, but the questions of behavioral flexibility and creativity need to be addressed and resolved now.

I’ll start by emphasizing the importance of the response-reinforcement type of mechanism and circuitry. Unfortunately, most of the work I am familiar with regarding the modeling of human brain/mind/behavior using Neural Networks focuses on the connectionist model, with the implicit assumption that all response is stimulus driven and that one only needs to train the network and, using feedback, associate a correct response with a stimulus. Thus, we have an input layer for collecting or modeling sensory input, a hidden association layer and an output layer that can be considered as a motor effector system. This division into input acuity and sensitivity, represented by the input layer; output variability and specificity, represented by the output layer; and one or more hidden layers that associate input with output, maps very well to our intuitions of a sensory system, a motor system and an association system in the brain that generate behavior relevant to external stimuli/situations. However, this is simplistic in the sense that it is based solely on stimulus-response types of associations (classical conditioning) and ignores the other relevant type of association, response-reinforcement. Let me clarify that I am not implying that neural network models are behavioristic: in the form of hidden layers they leave enough room for cognitive phenomena; the contention is that they do not take into account the operant conditioning mechanisms. Here it is instructive to note that feedback during training is not equivalent to operant-reinforcement learning: the feedback is necessary to strengthen the stimulus-response associations; the feedback only indicates that a particular response triggered by a particular stimulus was correct.

For operant learning to take place, the behavior has to be spontaneously generated and based on the history of its reinforcement its probability of occurrence manipulated. This takes us to an apparently hard problem of how behavior can be spontaneously generated. All our life we have equated reductionism and physicalism with determinism, so a plea to spontaneous behavior seems almost like begging for a ghost-in-the-machine. Yet on careful thinking the problem of spontaneity (behavior in absence of stimulus) is not that problematic. One could have a random number generator and code for random responses as triggered by that random number generator. One would claim that introducing randomness in no way gives us ‘free will’, but that is a different argument. What we are concerned with is spontaneous action, and not necessarily, ‘free’ or ‘willed’ action.

To keep things simple, consider a periodic oscillator in your neural network. Let us say it takes 12 hours to complete one oscillation (i.e. it is a simple inductor-capacitor pair and it takes 6 hours for the capacitor to discharge and another 6 hours for it to recharge); now we can make connections a priori between this 12 hr clock in the hidden layer and one of the outputs in the output layer, which gets activated whenever the capacitor has fully discharged, i.e. at a periodic interval of 12 hours. Suppose that this output response is labeled ‘eat’. Thus we have coded in our neural network a spontaneous mechanism by which it ‘eats’ at 12 hour intervals.

Till now we haven’t really trained our neural net, and moreover we have assumed circuitry like a periodic oscillator from the very beginning, so you may object that this is not how our brain works. But let us be reminded that, just as normal neurons in the brain form the model for neurons in the neural network, there is also the suprachiasmatic nucleus, which gives rise to circadian rhythms and implements a periodic clock.

As for training, one can assume the existence of just one periodic clock of small granularity in the system, say of 1 second duration, and then, using accumulators that code for how many ticks have elapsed since the last trigger, one can code for any arbitrary periodic response of greater than one second granularity. Moreover, one need not code for such accumulators: they would arise automatically out of training from the other neurons connected to this ‘clock’ and lying between the clock and the output layer. Suppose that initially a one second clock output is connected (via intervening hidden neuron units) to an output marked ‘eat’. Now, we have feedback in this system also. Suppose that while training we provide positive feedback only on trial 60*60*12 (and all its multiples) and provide negative feedback on all other trials. It is not inconceivable that an accumulator neural unit would get formed in the hidden layer and count the number of ticks that come out of the clock: it would send the trigger to the output layer only on every 60*60*12th trial and suppress the output of the clock on every other trial. Voilà! We now have a 12 hour clock (implemented digitally by counting ticks) inside our neural network, coding for a 12 hour periodic response. We just needed to have one ‘innate’ clock mechanism, and using that and the facts of ‘operant conditioning’ or ‘response-reinforcement’ pairing we can create an arbitrary number of such clocks in our body/brain. Also, please notice that we need just one 12 hour clock, but can flexibly code for many different 12 hour periodic behaviors. Thus, if the ‘count’ in the accumulator is zero, we ‘eat’; if the count is midway between 0 and 60*60*12, we ‘sleep’. Thus, though both eating and sleeping follow a 12 hour cycle, they do not occur concurrently, but are separated by a 6 hour gap.
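The end state of such training, an accumulator that counts ticks of the one second clock and gates the outputs, can be sketched as follows. Note the assumption: the counter here is written by hand, standing in for the unit that the text expects to emerge from training, and the function name is made up.

```python
PERIOD = 60 * 60 * 12  # ticks of the 1-second clock in 12 hours


def scheduled_actions(total_ticks, period=PERIOD):
    """Return (tick, action) pairs produced by a tick-counting accumulator."""
    count = 0
    log = []
    for tick in range(total_ticks):
        if count == 0:
            log.append((tick, "eat"))        # start of each 12 hour cycle
        elif count == period // 2:
            log.append((tick, "sleep"))      # midway through the cycle
        count = (count + 1) % period         # the accumulator wraps after 12 hours
    return log


# Simulate 24 hours of one-second ticks.
for tick, action in scheduled_actions(2 * PERIOD):
    print(f"{tick / 3600:5.1f} h: {action}")
```

A single counter drives both behaviors, and the 6 hour offset between them falls out of reading the same count at two different values.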

Suppose further that one reinforcement that one is constantly exposed to, and that one uses for training the clock, is ‘sunlight’. The circadian clock is reinforced, say, only by exposure to the midday sun, and by no other reinforcement. Then we have a mechanism in place for the external tuning of our internal clocks to a 24 hour circadian rhythm. It is conceivable that, for training other periodic operant actions, one need not depend on external reinforcement or feedback, but may implement an internal reinforcement mechanism. To make my point clear: while the ‘eat’ action, i.e. a voluntary operant action, may get generated randomly initially and, in the traditional sense of reinforcement, be accompanied by intake of food, which in the classical sense of the word is a ‘reinforcement’, the intake of food, which is part and parcel of the ‘eat’ action, should not be treated as the ‘feedback’ that is required during training of the clock. During the training phase, though the operant may be activated at different times (and, by the consequent intake of food, be intrinsically reinforced), the feedback should be positive only for the operant activations in line with the periodic training, i.e. only on trials on which the operant is produced as per the periodic training requirement; for all other trials negative feedback should be provided. After the training period, not only would the operant ‘eat’ be associated with the reinforcement ‘food’: it would also occur as per a certain rhythm and periodicity. The goal of training here is not to associate a stimulus with a response (the usual neural network association learning), but to associate an operant (response) with a schedule (or a concept of ‘time’).
It’s not that revolutionary a concept, I hope: after all, an association of a stimulus (or ‘space’) with a response is per se meaningless; it is meaningful only in the sense that the response is reinforced in the presence of the stimulus, and the presence of the stimulus provides us a clue to indulge in a behavior that would result in a reinforcement. On similar lines, an association of a response with a schedule may seem arbitrary and meaningless; it is meaningful in the sense that the response is reinforced in the presence of a scheduled time/event, and the occurrence of the scheduled time/event provides us with a reliable clue to indulge in a behavior that would result in reinforcement.

To clarify by way of an example, ‘shouting’ may be considered a response that is normally reinforcing because, say, it is cathartic in nature. Now, ‘shouting’ on seeing your spouse’s lousy behavior may have had a history of reinforcement, and you may have a strong association between seeing your ‘spouse’s lousy behavior’ and ‘shouting’. You thus have a stimulus-response pair. The reason you don’t shout all the time, or when, say, the stimulus is your ‘boss’s lousy behavior’, is that in those stimulus conditions the response ‘shouting’, though still cathartic, may carry severe costs, and hence in those situations it is not really reinforced. Hence the need for an association between ‘spouse’s lousy behavior’ and ‘shouting’: shouting is reinforcing only in the presence of that specific stimulus, and not in all cases.

Take another example, that of ‘eating’, which again can be considered a normally rewarding and reinforcing response, as it provides us with nutrition. Now, ‘eating’ 2 or 3 times a day may be rewarding; but eating all the time, or only on a 108-hour periodicity, may not be that reinforcing, because such schedules do not match our bodily requirements. While eating on a 108-hour periodicity would impose severe costs on us in terms of undernutrition and survival, eating on a 2-minute periodicity would not be that reinforcing either. Thus, the idea of training spontaneous behaviors as per a schedule is not that problematic.

Having taken a long diversion, arguing for a case for ‘operant conditioning’ based training of neural networks, let me come to my main point.

While the ‘stimulus’ and the input layer represent the external ‘situation’ that the organism is facing, the network comprising the clocks and accumulators represents the internal state and ‘needs’ of the organism. One may even claim, a bit boldly, that they represent the goals or motivations of the organism.

An ‘eat’ clock that is about to trigger an ‘eat’ response may represent a need to eat. This clock need not be a digital clock, with an ‘eating’ act triggered only when the 12-hour cycle is completed to the dot. Rather, it would be a probabilistic, analog clock, with the probability of the eating response getting higher as the 12-hour cycle comes to an end, and with the clock being reset whenever the eating response happens. If the clock is in the early phases of the cycle (just after an eating response), the need for eating (hunger) is low; when the clock is in the last phases of the cycle, the hunger need is strong and makes the ‘eating’ action more and more probable.
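
To make the idea of a probabilistic, analog clock a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class name HungerClock, the power-law shape of the probability curve and the sharpness parameter are all hypothetical choices of mine, not something derived from any particular model of circadian or interval timing.

```python
import random


class HungerClock:
    """Hypothetical analog 'eat' clock: eating becomes more likely late in the cycle."""

    def __init__(self, period_hours=12.0, sharpness=4.0):
        self.period = period_hours      # length of the cycle (12 hours, as in the text)
        self.sharpness = sharpness      # assumed shape parameter, purely illustrative
        self.elapsed = 0.0              # hours since the last 'eat' response

    def tick(self, hours=1.0):
        """Advance the clock by the given number of hours."""
        self.elapsed += hours

    def eat_probability(self):
        """Probability of an 'eat' response: near 0 early in the cycle, 1 at its end."""
        phase = min(self.elapsed / self.period, 1.0)
        return phase ** self.sharpness

    def reset(self):
        """The clock is reset whenever the eating response happens."""
        self.elapsed = 0.0

    def step(self, rng, hours=1.0):
        """Advance time, then probabilistically emit 'eat' and reset if it fires."""
        self.tick(hours)
        if rng.random() < self.eat_probability():
            self.reset()
            return True
        return False


if __name__ == "__main__":
    clock = HungerClock()
    rng = random.Random(0)
    meals = [hour for hour in range(1, 49) if clock.step(rng)]
    print("hours at which 'eat' fired:", meals)
```

Because the probability climbs steeply only towards the end of the cycle, eating tends to cluster near the 12-hour mark without being locked to it, which is the sense in which the clock is analog rather than digital.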

Again, this response-reinforcement system need not be isolated from the stimulus-response system. Say one sees the stimulus ‘food’ while the hunger clock is showing ‘medium hungry’. Seeing the stimulus ‘food’ partially activates the ‘eat’ action (other actions, like ‘throw the food’ or ‘ignore the food’, may also be activated). This partial activation may win over the other competing responses to the stimulus, because the hunger clock is adding a medium level of activation to ‘eat’, and hence one may end up eating. This, however, resets the hunger clock, so a second ‘food’ stimulus may not be able to trigger the ‘eat’ response: the activation of ‘eat’ due to the hunger clock is now minimal, and other competing actions may win over ‘eat’.
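
The interaction just described can be sketched as a simple competition in which the activation each action receives from the stimulus is added to the drive it receives from the internal clock. This is a toy illustration only: the function name select_response, the additive combination rule and all the numbers are assumptions made for the example.

```python
def select_response(stimulus_activation, need_drive):
    """Return the winning action and the combined activations.

    stimulus_activation: activation each action gets from the stimulus alone.
    need_drive: extra activation contributed by internal clocks (hypothetical values).
    """
    combined = {
        action: activation + need_drive.get(action, 0.0)
        for action, activation in stimulus_activation.items()
    }
    winner = max(combined, key=combined.get)
    return winner, combined


# Seeing 'food' partially activates several competing actions (made-up numbers).
food_seen = {"eat": 0.4, "ignore": 0.5, "throw": 0.1}

# The hunger clock reads 'medium hungry' before eating and is reset afterwards.
medium_hungry = {"eat": 0.5}
just_ate = {"eat": 0.0}

if __name__ == "__main__":
    print(select_response(food_seen, medium_hungry)[0])  # first sighting of food
    print(select_response(food_seen, just_ate)[0])       # second sighting, clock reset
```

With these numbers the same ‘food’ stimulus leads to ‘eat’ on the first presentation and to ‘ignore’ on the second, purely because the internal clock has been reset in between.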

To illustrate the interaction between stimulus-response and response-reinforcement in another way: on seeing the written word ‘hunger’ as a stimulus, one consequence of that stimulus could be to manipulate the internal ‘hunger clock’ so that its need for food is increased. This would be a simple operation of increasing the clock count, making the ‘need for food’ stronger and thus increasing the probability of occurrence of the ‘eat’ action.

I’d also like to take a leap here and equate ‘needs’ with goals and motivations. Some of the most motivating factors for humans, like food, sex and sleep, can be explained in terms of underlying needs or drives, which seem to be periodic in nature. It is interesting to note that many of them do have cycles associated with them (we have sleep cycles and eating cycles), and that these cycles are often linked with each other or with the circadian rhythm, so that if the clock goes haywire it has multiple linked effects across the whole spectrum of motivational ‘needs’. In a manic phase one would have a low need to sleep, eat etc., while the opposite may be true in depression.

That brings me finally to Marvin Minsky and his AI attempts to code for human behavioral complexity.

In his analysis of the levels of mental activity, he starts with the traditional if, then rule and then refines it to include both situations and goals in the if part.

To me this seems intuitively appealing: One needs to take into account not only the external ‘situation’, but also the internal ‘goals’ and then come up with a set of possible actions and maybe a single action that is an outcome of the combined ‘situation’ and ‘goals’ input.

However, Minsky does not think that simple if-then rules, even when they take ‘goals’ into consideration, would suffice, so he posits if-then-result rules.

To me it is not clear how introducing a result clause makes any difference: both goals and stimuli may lead to multiple if-then rule matches and multiple action activations. These action activations are nothing but what Minsky has clubbed into the result clause, and we still have the hard problem: given a set of clauses, how do we choose one of them over the others?

Minsky has evidently thought about this and says:

What happens when your situation matches the Ifs of several different rules? Then you’ll need some way to choose among them. One policy might arrange those rules in some order of priority. Another way would be to use the rule that has worked for you most recently. Yet another way would be to choose rules probabilistically.

To me this seems to be not a problem of choosing which rule to use, but one of choosing which response to make, given the several possible responses that result from applying several rules to this situation/goal combination. It is tempting to assume that the ‘needs’ or ‘goals’ would uniquely determine the response when there are ambiguous or competing responses to a stimulus; yet I can imagine a scenario where the ‘needs’ of the body do not provide a reliable cue and one may need the algorithms/heuristics suggested by Minsky to resolve conflicts. Thus, I do see the utility of if-then-result rules. We need a representation not only of the if part of the rule (goals/stimulus), which tells us the set of possible actions that can be triggered by this stimulus/situation/needs combo, but also of the result part of the rule, which tells us what reinforcement values these responses (actions) have for us, so that we can use this value-response association to resolve the conflict and choose one response over another. This response-value association seems very much like the operant-reinforcement association, so I am tempted once more to believe that the value one ascribes to a response may change with bodily needs and is in fact reflective of bodily needs. But I’ll leave that assumption aside for now and instead assume that somehow we do have different priorities assigned to the responses (and not to the rules, as Minsky had originally proposed) and do the selection on the basis of those priorities.
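
The three policies Minsky mentions (a fixed order of priority, whatever worked most recently, and probabilistic choice) can be sketched as three small selection functions. In line with the argument above, I apply them to responses rather than to rules. The function names and the example priorities, timestamps and values are hypothetical and only meant to show how the policies differ.

```python
import random


def by_priority(candidates, priority):
    """Choose the candidate response with the highest fixed priority."""
    return max(candidates, key=lambda r: priority[r])


def most_recently_successful(candidates, last_success):
    """Choose the response that worked most recently (larger timestamp = more recent)."""
    return max(candidates, key=lambda r: last_success.get(r, -1))


def probabilistic(candidates, value, rng):
    """Choose a response at random, weighted by its (non-negative) value."""
    weights = [value[r] for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    responses = ["eat", "ignore", "throw"]
    priority = {"eat": 3, "ignore": 2, "throw": 1}        # made-up priorities
    last_success = {"ignore": 10, "eat": 7}               # made-up timestamps
    value = {"eat": 0.6, "ignore": 0.3, "throw": 0.1}     # made-up values
    rng = random.Random(0)
    print(by_priority(responses, priority))
    print(most_recently_successful(responses, last_success))
    print(probabilistic(responses, value, rng))
```

If the values fed to the probabilistic policy were themselves driven by the state of the internal clocks, the value one ascribes to a response would change with bodily needs, which is the assumption I set aside above.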

Though I have posited a single priority-based probabilistic selection of response, it is possible that a variety of selection mechanisms and algorithms are used and are activated selectively based on the problem at hand.

This brings me to the critic-selector model of mind by Minsky. As per this model, one needs both critical thinking and problem solving abilities to act adaptively. One need not just be good at solving problems; one also has to understand and frame the right problem, and then use the problem solving approach that is best suited to the problem.

Thus, the first task is to recognize the problem type correctly. After recognizing a problem correctly, we may apply different selectors, or problem solving strategies, to different problems.

He also posits that most of our problem solving is analogical and not logical. Thus, the recognizing problem is more like recognizing a past analogical problem; and the selecting is then applying the methods that worked in that case onto this problem.

How does that relate to our discussion of behavioral flexibility? I believe that every time we are presented with a stimulus, or have to decide how to behave in response to that stimulus, we are faced with a problem: that of choosing one response over all others. We need to activate a selection mechanism, and that selection mechanism may differ based on the critics we have used to define the problem. If the selection mechanism were fixed and hard-wired, we wouldn’t have behavioral flexibility. Because the selection mechanism may differ based on our framing of the problem in terms of the appropriate critics, our behavioral response may be varied and flexible. At times, we may use a selector that takes into account only the priorities of different responses in terms of the needs of the body; at other times the selector may be guided by selection mechanisms that have emotions and values as the driving factors.

Minsky has also built a hierarchy of critics-selector associations and I will discuss them in the context of developmental unfolding in a subsequent post. For now, it is sufficient to note that different types of selection mechanisms would be required to narrow the response set, under different critical appraisal of the initial problem.

To recap, a stimulus may trigger different responses simultaneously, and a selection mechanism would be involved that selects the appropriate response based on the values associated with the responses and on the selection algorithm that has been activated by our appraisal of the reason for the conflicting and competing responses. While critics help us formulate the reason for multiple responses to the same stimulus, the selector helps us apply different selection strategies to the response set, based on what selection strategy worked on an earlier problem that involved analogous critics.
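
As a rough sketch of this recap, one can think of each critic as a test on the current context that, when it matches, recruits a particular selector. The code below is my own toy illustration and not Minsky’s formulation: the two critics, the social_setting flag and both selectors are hypothetical.

```python
def needs_selector(responses, needs, values):
    """Pick the response that best serves current bodily needs."""
    return max(responses, key=lambda r: needs.get(r, 0.0))


def values_selector(responses, needs, values):
    """Pick the response that best fits emotions/values, ignoring bodily needs."""
    return max(responses, key=lambda r: values.get(r, 0.0))


# Hypothetical critics: (label, test on the context, selector it recruits).
CRITICS = [
    ("bodily need conflict", lambda ctx: not ctx.get("social_setting", False), needs_selector),
    ("social conflict", lambda ctx: ctx.get("social_setting", False), values_selector),
]


def respond(responses, context, needs, values):
    """Let the first matching critic frame the problem, then apply its selector."""
    for label, applies, selector in CRITICS:
        if applies(context):
            return label, selector(responses, needs, values)
    raise ValueError("no critic recognised the problem")


if __name__ == "__main__":
    responses = ["eat", "wait"]
    needs = {"eat": 0.9, "wait": 0.1}     # made-up need-based priorities
    values = {"eat": 0.2, "wait": 0.8}    # made-up value-based priorities
    print(respond(responses, {"social_setting": False}, needs, values))
    print(respond(responses, {"social_setting": True}, needs, values))
```

The same response set and the same stimulus yield different behavior depending on which critic frames the problem, which is the flexibility argued for above.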

One can further dissociate this into two processes. One is grammar-based and syntactical: it uses rules for generating a valid behavioral action based on the critic and selector predicates and on the particular response sets and strategies that make up the critic and selector clauses respectively. By combining and recombining the different critics and selectors one can make an infinite number of rules for how to respond to a given situation. Each such rule application may potentially lead to a different action. The other process is that of semantics: how the critics are mapped onto the response sets and how the selectors are mapped onto different value preferences.

Returning to response selection: given a stimulus, clearly there are two processes at work. One uses the stored if-then rules (the stimulus-response associations) to make available to us the set of all actions that are a valid response to the situation; the other uses the then-result rules (and the response-value associations, which I believe are dynamic in nature and keep changing) to choose one of the responses from that set as per the ‘subjective’ value that it prefers at the moment. This may be the foundation for the ‘memory’ and ‘attention’ dissociations in the working memory abilities used in the stroop task, and it is tempting to think that while the DLPFC and the executive centers determine the set of all possible actions (utilizing memory) given a particular situation, the ACC selects among the competing responses based on the associated values and by selectively directing attention to the selected response/stimulus/rule.

Also, it seems evident that one way to increase adaptive responses would be to become proficient in discriminating stimuli and perceiving the subjective world accurately; the other way would be to become more and more proficient in directing attention to a particular stimulus/ response over others and directing attention to our internal representations of them so that we can discriminate between the different responses that are available and choose between them based on an accurate assessment of our current needs/ goals.

This takes me finally to the two types of consciousness that Hughlings-Jackson had proposed: subject consciousness and object consciousness.

Using his ideas of sensorimotor function, Hughlings-Jackson described two “halves” of consciousness, a subject half (representations of sensory function) and an object half (representations of motor function). To describe subject consciousness, he used the example of sensory representations when visualizing an object . The object is initially perceived at all sensory levels. This produced a sensory representation of the object at all sensory levels. The next day, one can think of the object and have a mental idea of it, without actually seeing the object. This mental representation is the sensory or subject consciousness for the object, based on the stored sensory information of the initial perception of it.

What enables one to think of the object? This is the other half of consciousness, the motor side of consciousness, which Hughlings-Jackson termed “object consciousness.” Object consciousness is the faculty of “calling up” mental images into consciousness, the mental ability to direct attention to aspects of subject consciousness. Hughlings-Jackson related subject and object consciousness as follows:

The substrata of consciousness are double, as we might infer from the physical duality and separateness of the highest nervous centres. The more correct expression is that there are two extremes. At the one extreme the substrata serve in subject consciousness. But it is convenient to use the word “double.”

Hughlings-Jackson saw the two halves of consciousness as constantly interacting with each other, the subjective half providing a store of mental representations of information that the objective half used to interact with the environment.


The term “subjective” answers to what is physically the effect of the environment on the organism; the term “objective” to what is physically the reacting of the organism on the environment.

Hughlings-Jackson’s concept of subjective consciousness is akin to the if-then representation of mental rules. One needs to perceive the stimuli as clearly as possible and to represent them along with their associated actions, so that an appropriate response set can be activated to respond to the environment. His object consciousness is the attentional mechanism that is needed to narrow down the options and focus on those mental representations and responses that are to be selected and used for interacting with the environment.

As per him, subject and object consciousness arise from a need to represent the sensations (stimuli) and movements (responses) respectively, and this need is apparent if our stimulus-response and response-reinforcement mappings have to be taken into account for determining the appropriate action.

All nervous centres represent or re-represent impressions and movements. The highest centres are those which form the anatomical substrata of consciousness, and they differ from the lower centres in compound degree only. They represent over again, but in more numerous combinations, in greater complexity, specialty, and multiplicity of associations, the very same impressions and movements which the lower, and through them the lowest, centres represent.

He postulated that temporal lobe epilepsy involves a loss of objective consciousness (leading to automatic movements, as opposed to voluntary movements that are as per a schedule and do not happen continuously) and an increase in subjective consciousness (leading to feelings like deja-vu or over-consciousness, in which every stimulus seems familiar and triggers the same response set and nothing seems novel: the dreamy state). These he described as the positive and negative symptoms or deficits associated with an epileptic episode.

It is interesting to note that one of the positive symptoms of epilepsy he describes, associated with subjective consciousness of the third degree, is ‘Mania’: the same label that Minsky uses for a Critic in his sixth level of thinking, the self-conscious level. The Critics Minsky lists are:

Self-Conscious Critics. Some assessments may even affect one’s current image of oneself, and this can affect one’s overall state:

None of my goals seem valuable. (Depression.)
I’m losing track of what I am doing. (Confusion.)
I can achieve any goal I like! (Mania.)
I could lose my job if I fail at this. (Anxiety.)
Would my friends approve of this? (Insecurity.)

Interesting to note that this Critic or subjective appraisal of the problem in terms of Mania can lead to a subjective consciousness that is characterized as Mania.

If Hughlings-Jackson has been able to study epilepsy correctly and has been able to make some valid inferences, then this may tell us a lot about how we respond flexibly to novel/ familiar situations and how the internal complexity that is required to ensure flexible behavior, leads to representational needs in brain, that might lead to the necessity of consciousness.

Incongruence perception and linguistic specificity: a case for a non-verbal stroop test

In a follow-up to my last post on color memory and how it affects actual color perception, I would like to highlight a classic psychological study by Bruner and Postman, which showed that even for non-natural artifacts like the suits in a deck of playing cards, our expectation of the normal color or shape of a suit affects our perception of a stimulus that is incongruent with our expectations.

In a nutshell, in this study incongruent stimuli, like a red spade card or a black heart card, were presented for brief durations and the subjects were asked to identify the stimuli completely: the form or shape (heart/spade/club/diamond), the color (red/black) and the number (1 to 10; face cards were not used).

The trials used both congruent stimuli (e.g. a red heart, a black club) and incongruent stimuli (a black heart, a red spade).

To me this appears to be a form of stroop task , in which, if one assumes that form is a more salient stimulus than color, then a presentation of a spade figure would automatically activate the black color perception and the prepotent color naming response would be black, despite the fact that the spade was presented in red color. This prepotent ‘black’ verbal response would, as per standard stroop effect explanations, be inhibited for the successful ‘red’ verbal response to happen. I am making an analogy here that the form of a suit is equivalent to the linguistic color-term and that this triggers a prepotent response.

In this light, the results of the experiment do seem to suggest a stroop effect in this playing-card task, with subjects taking more trials to recognize incongruent stimuli as compared to congruent stimuli.

Perhaps the most central finding is that the recognition threshold for the incongruous playing cards (those with suit and color reversed) is significantly higher than the threshold for normal cards. While normal cards on the average were recognized correctly (here defined as a correct response followed by a second correct response) at 28 milliseconds, the incongruous cards required 114 milliseconds. The difference, representing a fourfold increase in threshold, is highly significant statistically, t being 3.76 (confidence level < .01).

Of further interest is the fact that this incongruence threshold decreases if one or more incongruent trials precede the incongruent trial in question, and increases if the preceding trials are with normal cards. This is in line with current theories of the stroop effect as involving both memory and attention, whereby the active maintenance of the goal (ignore form and focus on color while naming color) affects performance on all trials and also affects the errors, while the attentional mechanism to resolve incongruence affects only reaction times (and leads to RT interference).

In the playing card study no reaction time measures were taken; only the threshold reached for correct recognition of the stimuli was used. So we don’t have any RT measures, but a high threshold is indicative of, and roughly equivalent to, an error on a trial. The higher thresholds on incongruent trials mean that there were more errors on incongruent trials than on congruent trials. The increase in threshold when normal cards precede, and the decrease when incongruent cards precede, is analogous to the high-congruency and low-congruency trials described in the Kane and Engle study and analyzed in my previous posts as well as in a Developing Intelligence post. It is intuitive that when incongruent trials precede, the goal (ignore form and focus on color while naming color) becomes more salient; when normal cards precede one may have RT facilitation, and the (implicit) goal to ignore form may become less salient.

Experience with an incongruity is effective in so far as it modifies the set of the subject to prepare him for incongruity. To take an example, the threshold recognition time for incongruous cards presented before the subject has had anything else in the tachistoscope — normal or incongruous — is 360 milliseconds. If he has had experience in the recognition of one or more normal cards before being presented an incongruous stimulus, the threshold rises slightly but insignificantly to 420 milliseconds. Prior experience with normal cards does not lead to better recognition performance with incongruous cards (see attached Table ). If, however, an observer has had to recognize one incongruous card, the threshold for the next trick card he is presented drops to 230 milliseconds. And if, finally, the incongruous card comes after experience with two or three previously exposed trick cards, threshold drops still further to 84 milliseconds.

Thus the goal maintenance part of the stroop effect is clearly in play in the playing-card task and affects the threshold for correct recognition.

The second part of the explanation of the stroop task is usually based on directed inhibition and an attentional process that inhibits the prepotent response. This effect comes into play only on incongruent trials. An alternative explanation is that there is increased competition between representations on incongruent trials and that, instead of any top-down directed inhibition in line with the goal/expectation, there is only localized inhibition. The dissociation of a top-down goal maintenance mechanism and another attentional selection mechanism seems to be more in line with the new model, wherein inhibition is local and not directed from the top.

While RT measures are not available, it is interesting to take a look at some of the qualitative data that supports a local inhibition and attentional mechanism involved in reacting to incongruent stimuli. The authors present evidence that the normal course of responses generated by the subjects to (incongruent) stimuli is dominance, compromise, disruption and finally recognition.

Generally speaking, there appear to be four kinds of reaction to rapidly presented incongruities. The first of these we have called the dominance reaction. It consists, essentially, of a “perceptual denial” of the incongruous elements in the stimulus pattern. Faced with a red six of spades, for example, a subject may report with considerable assurance, “the six of spades” or the “six of hearts,” depending upon whether he is color or form bound (vide infra). In the one case the form dominates and the color is assimilated to it; in the other the stimulus color dominates and form is assimilated to it. In both instances the perceptual resultant conforms with past expectations about the “normal” nature of playing cards.

A second technique of dealing with incongruous stimuli we have called compromise. In the language of Egon Brunswik , it is the perception of a Zwischengegenstand or compromise object which composes the potential conflict between two or more perceptual intentions. Three examples of color compromise: (a) the red six of spades is reported as either the purple six of hearts or the purple six of spades; (b) the black four of hearts is reported as a “grayish” four of spades; (c) the red six of clubs is seen as “the six of clubs illuminated by red light.”

A third reaction may be called disruption. A subject fails to achieve a perceptual organization at the level of coherence normally attained by him at a given exposure level. Disruption usually follows upon a period in which the subject has failed to resolve the stimulus in terms of his available perceptual expectations. He has failed to confirm any of his repertory of expectancies. Its expression tends to be somewhat bizarre: “I don’t know what the hell it is now, not even for sure whether it’s a playing card,” said one frustrated subject after an exposure well above his normal threshold.

Finally, there is recognition of incongruity, the fourth, and viewed from the experimenter’s chair, most successful reaction. It too is marked by some interesting psychological by-products, of which more in the proper place.

This sequence points towards a local inhibition mechanism. Either one of the responses is selected and dominates the other; or both responses mix and yield a mixed percept, such as a blackish-red perception of suit color (this is also why a gray banana may appear yellowish, and why a banana matched to a gray background by subjects may actually be made bluish); or, in some cases, there may be frustration when the incongruent stimulus cannot be adequately reconciled with expectations, leading to disruption. In the classical stroop task this may explain the skew in RT for some incongruent trials: some take a lot of time, maybe because one has just suffered a disruption. Finally, one may respond correctly, but only after a considerable delay. This sequence is difficult to explain in terms of a top-down expectation model and directed inhibition.

Finally, although we have been discussing the playing card task in terms of the stroop effect, one obvious difference is striking. In the playing cards and the pink-banana experiments the colors and the forms or objects are tightly coupled: we have normally only seen a yellow banana or a red heart suit. This is not so for printed graphemes and linguistic color terms: we have viewed them in all colors, mostly in black/gray, but the strong hue association that we still have with those color terms is at a supposedly higher layer of abstraction.

Thus, when an incongruent stimulus like a red spade is presented, any of the features of the object may take prominence and induce incongruence in the other feature. For example, we may give more salience to form and identify it as a black spade; alternatively, we may identify the object using color and perceive incongruence in shape, and thus identify it as a red heart. Interestingly, both kinds of errors were observed in the Bruner study. To date, not much attention has been paid to the reverse stroop test, whereby one asks people to name the color word and ignore the actual ink color. This seems to be an easy task, as the linguistic graphemes are not tied to any color in particular; the only exception is the black hue, which might reasonably be said to be associated with all graphemes (it is the most popular ink). Consistent with this, in the reverse stroop test subjects may sometimes respond ‘black’ when viewing the linguistic term ‘red’ in black ink. This effect would hold for the ‘black’ word response and black ink only, and for no other ink color. Also, the response time for the ‘black’ response may be facilitated when the ink color is black (and the linguistic term is also ‘black’), compared to other ink colors and other color terms. As far as I know, no one has conducted such an experiment, but one could run it and see whether there is a small stroop effect in the reverse direction too.

Also, another important question of prime concern is whether the stroop interference in both cases, the normal stroop test, and the playing card test, is due to a similar underlying mechanism, whereby due to past sensory (in case of playing cards) or semantic associations (in case of linguistic color terms) the color terms or forms (bananas/ suits) get associated with a hue and seeing that stimulus feature automatically activates a sensory or semantic activation of the corresponding hue. This prepotent response then competes with the response that is triggered by the actual hue of the presented stimulus and this leads to local inhibition and selection leading to stroop interference effects.

If a non-verbal stroop test, comprising natural or man-made objects with strong color associations, yields results similar to those observed in the classical stroop test, then this may be a strong argument for domain-general associationist/connectionist models of language semantics, and imply that linguistic specificity may be overhyped and that at least the semantics part of language acquisition is mostly a domain-general process. On the other hand, dissimilar results on the non-verbal stroop test and the normal stroop test may indicate that the binding of features in objects during perception and the binding of abstract meaning to words in a language have different underlying mechanisms, and that there is much room for linguistic specificity. Otherwise, it would appear that the binding of abstract meaning to terms is not that different a problem from the binding of different visual features to represent and perceive an object, and one may use methods and results from one field and apply them in the other.

To me this seems extremely interesting and promising. Evidence that the stroop effect is due to two processes, one attentional and the other mediated by goal maintenance/memory, and its replication in a non-verbal stroop test, would help us a lot by focusing research on the common cognitive mechanisms underlying working memory: one dependent on memory of past associations and their active maintenance, whether verbal/abstract or visual/sensory, and the other dependent on a real-time resolution of incongruity/ambiguity by focusing attention on one response to the exclusion of the other. This may well correspond to the Gc and Gf measures of intelligence, one reflecting how good we are at handling and using existing knowledge, the other how well we are able to take into account new information and respond to novel situations. One may even extend this to the two dissociated memory mechanisms that have been observed in parahippocampal regions, one used when encountering familiar situations/stimuli and the other when encountering novel stimuli: one essentially a process of assimilation as per existing schemas/conceptual metaphors; the other a process of accommodation, involving perhaps an appreciation/formation of novel metaphors and constructs.

Enough theorizing and speculations for now. Maybe I should act on this and make an online non-verbal stroop test instead to test my theories!

Endgame: Another interesting twist to the playing cards experiment could be in terms of motivated perception. Mixing Memory discusses another classic study by Bruner in this regard. Suppose that we manipulate the motivations of people so that they are expecting to see either a heart or a red color as the next stimulus, because only this desired stimulus would yield them a desired outcome, say, orange juice. When then presented with an incongruent stimulus, a red spade, would we be able to differentially manipulate the resolution of incongruence? That is, would those motivated to see red report seeing a ‘red spade’ and those motivated to see a heart report a ‘black heart’? Or is the effect modality specific, with effects on color more salient than on form? Is it easier to see a different color than it is to see a different form? And is this related to the modality-specific Shams’ visual illusion, which is asymmetric in the sense that two beeps with one flash easily lead to the perception of two flashes, but not vice versa?

Gender bias in Math skills : a case of Traits Vs. Environment/Effort feedback?

A recent news article reports on a study demonstrating that the gender bias in Math abilities may be due to environmental and cultural effects – specifically, the negative self-perception produced by activating the negative stereotype that women have grossly inferior mathematical abilities to men.

The experiment involved giving 220 female study participants bogus scientific explanations for alleged sex differences in math and then having them write math tests. Those who were given a ‘nature’ explanation – that women have a different genetic composition from men and that the cause of their low math abilities was genetic and gender based – performed poorly on the Math tests compared to the group that was given a ‘nurture’ explanation: that their math skills depended on how they were raised, with an experiential account of the sex differences, such as math teachers treating boys preferentially during the first years of math education.

In the control condition some females were told that no sex differences exist while another group was reminded (primed) of the stereotype about female math under-achievement.

The worst performance was for genetic explanation females, followed by ‘stereotype primed’ females. Those who were given an experiential explanation performed as well or better than the control group that received the feedback that there were no sex differences in Math abilities.

While the authors analyze and explain the results in terms of the ‘Stereotype theory’ – that genetic explanations lead to more negative stereotypes and that activation of the negative stereotype affects performance – a more parsimonious explanation is that the differences arise from the same differential outcomes that are observed in people who have a genetic or trait-like versus an effort-driven or skill-like view of abilities. I have discussed previously how these differential views of abilities may develop, and the experiment above has just the right conditions to induce such a differential view.

Those who were given a genetic explanation of sex differences in math abilities, may have formed a trait-like view of Math ability and were prone to see the ability as stable, genetic and immutable. This is the same view of math ability that would be formed if they had been given generic feedback – like “you are a math prodigy”.

Those who had been given experiential explanations of sex differences would have been more prone to form a skill-like view of math abilities and assume that the ability could be improved and honed based on environmental inputs like proper teaching, guidance, strategy or efforts. This would have been the case if they had been given ‘specific’ feedback – like “you solved this math problem very well this time”.

It is evident that a large part of the difference in the math test results observed in the genetic vs. experiential explanation conditions can be explained by the different views about math abilities that these conditions had induced. Those with the trait-like view of math ability would get frustrated while tackling a difficult problem and would be less resilient and put in less effort while tackling the later, easier problems on the test, as they would have formed a negative self-perception as someone with little mathematical talent. On the other hand, those who had been induced to form a skill-like view of math ability would have been more resilient and put in more effort when tackling later problems, despite some early failures, as a failure would not have led to a resigned state of mind, but would only have resulted in a belief that the strategies, effort or earlier training had not been sufficient to solve the particular problem.

It is not my contention that negative stereotype activation has no role to play – priming with stereotype words does lead to measurable effects on performance – but in this case, even if stereotype activation is involved, the stereotype may work by activating the differential view of mathematical abilities, with its effect on performance mediated by the effects that such views have on test-taking.

Belief about Intelligence : how it affects performance and how it is formed

Affective Teaching keeps posting some interesting basic cognitive tutorials and their latest one deals with the different concepts people have regarding intelligence and how that affects performance and attitudes.

As per that tutorial, people can have either a fixed (entity), trait-like view of intelligence/abilities or a changeable (incremental), skill-like view of intelligence/abilities. Interestingly, those with the fixed view are more prone to learned helplessness, an external locus of control, less persistence and a lack of use of learning strategies. On the other hand, those with the changeable view of intelligence are more persistent, have a mastery goal or orientation, and are apt to use learning strategies and credit success to effort and strategy.

This same difference in attitudes and outcomes was predicted by my recent blog post, where I analyzed the differential effects of providing generic (person based) versus specific (outcome based) feedback and praise. It was surmised that this would lead to a differential view of intelligence/abilities as being trait-like or skill-like in nature. It is heartening to note that existing research supports such a differentiation in how individuals conceptualize intelligence, and also accurately predicts the different outcomes that follow from the different underlying conceptualizations.

It should thus be clear that providing the right sort of feedback to children is very important, so that they hook on to the right conceptualization of intelligence early on. This may also go a long way in settling the expertise debate: geniuses have a mastery orientation and an incremental view of intelligence, which is different from the normal trait-like view held by most people. Thus, it is not just the case that they are either more talented or just better learners (although they are both); they also have a different attitude, and a different underlying concept of intelligence/ability, which is very much a result of the environmental feedback they received in childhood and is instrumental in making them what they are.

Is low IQ the cause of income inequality and low life expectancy or is it the other way round?

As per this post from the BPS research digest, Kanazawa of LSE has made a controversial claim that economic inequality is not the cause of low life expectancy, but that both low life expectancy and economic inequality are a result of the low IQ of poor people. The self-righteous reasoning is that people with low IQ are not able to adapt successfully to the stresses presented by modern civilization and hence perish. He thinks he has data on his side when he claims that IQ is eight times more strongly related to life expectancy than is socioeconomic status. What he forgets to mention (or deliberately ignores) is the growing evidence that IQ depends very much on the socioeconomic environment for its full flowering, and that a low IQ has two components: a low genetic IQ inherited from the parents, plus a stunted growth of IQ/intelligence due to the impoverished environment that comes with the low socio-economic status of the parents.

A series of studies that I have discussed earlier clearly indicates that, in the absence of good socioeconomic conditions, IQ can be stunted by as much as 20 IQ points. Also discussed there is the fact that modern civilization as a whole has been successful in achieving a state of socioeconomic prosperity that is sufficient for the full flowering of a child’s inherent genetic IQ, and as such the increments in IQ as we progress in years and achieve more and more prosperity (the Flynn effect) have started to become less prominent. This fact also explains the Kanazawa finding that in ‘uncivilized’ sub-Saharan countries IQ is not related to life expectancy, but socio-economic status is. Although he puts his own spin on this data, a more parsimonious (and accurate) reason for this is that in the sub-Saharan countries even the well-off don’t have the socio-economic conditions necessary for the full flowering of IQ, and thus the IQ of both well-off and poor parents in these countries is stunted equally. Thus, the well-off (who are not really that well-off in comparison to their counterparts in the western countries) are not in any more advantageous position (with respect to IQ) than the poor in these countries. The resultant life expectancy effect is thus limited to that directly due to economic inequality, and the IQ-mediated effect of economic inequality is not visible.

What Kanazawa deduces from the same data, and how he chooses to present these findings, just goes to show the self-righteous WASP attitude that many economists assume. After reading Freakonomics, and discovering how the authors twist facts and present statistics in a biased manner to push their idiosyncratic theories and agendas, it hardly seems surprising that another economist has resorted to similar dishonest tactics – shocking people by supposedly providing hard data to prove how conventional wisdom is wrong. Surprisingly, his own highlighting of the sub-Saharan countries data, which shows that life expectancy is highly dependent on socio-economic conditions in these countries, is highly suggestive of the fact that in cultures where the effects of economic inequality are not mediated via IQ effects, economic inequality is the strongest predictor of low life expectancy.

Instead of just blaming people for their genes/stupidity, it would be better to address the reasons that lead to low IQs and, once they are tackled, to directly address the social inequality problem. In the author’s own findings, when IQ is not to blame for the low life expectancy, the blame falls squarely on economic inequality (as in the sub-Saharan countries data).

IQ variations across time and space : the why and wherefore?

Mind Hacks has two posts on IQ: one focusing on IQ variations across time and discussing the Flynn effect, and the other focusing on variation across space (different population groups!), discussing variation in the IQ of identical and fraternal twins and drawing on adoption studies, with special focus on the economic background of biological and adoptive parents.

I’ll discuss the second posting first which is based on this NYT article.

This article mentions a few observations based on a meta-analysis of data from twin studies and also on a study of adopted children raised in environments (adoptive homes) that are either of the same socio-economic status as that of their biological parents or of a different one.

Some of these observations are (the first six are from adoption studies and the seventh is from twin studies):

  1. Children of well-off biological parents, whether reared by poor or well-off adoptive parents, have an average IQ about 16 points higher than children of poor biological parents
  2. Children of well-off biological parents reared by well-off adoptive parents had average IQ scores of 119.6
  3. Children of well-off biological parents reared by poor adoptive parents had average IQ scores of 107.5 – 12 points lower
  4. Children of poor biological parents reared by well-off adoptive parents had average IQ scores of 103.6
  5. Children of poor biological parents reared by poor adoptive parents had average IQ scores of 92.4
  6. In another study, the average I.Q. scores of youngsters (from an orphanage at ages 4-6)placed in well-to-do homes climbed more than 20 points, to 98 – a jump from borderline retardation to a whisker below average , when measured after 9 years of placement in the well-off home. That is a huge difference – a person with an I.Q. of couldn’t explain the rules of baseball, while an individual with a 98 I.Q. could actually manage a baseball team – and it can only be explained by pointing to variations in family circumstances.
  7. In a meta-analysis, it was found that for twins raised in the poorest families, the IQs of identical twins vary just as much as the IQs of fraternal twins, while in rich families the IQs of identical twins are more alike than the IQs of fraternal twins.

First let us discuss the twin studies (observation 7). If some trait A is found to co-occur, say, 80% of the time in identical twins (who have identical genotypes) raised apart, and the same trait A is found to co-occur only 40% of the time in fraternal twins (who have only 50% of their genes in common) raised apart, then one can conclude that trait A is highly heritable and genetics dependent, with environmental influence limited to, say, about 20% of the variation in the trait.

The premise is this: take two organisms (identical twins) that share more genes (twice as many as fraternal twins do) than a control pair of organisms (the fraternal twins), and subtract the effect of environment by letting the two organisms live in dissimilar environments, one in an adoptive home and the other in the biological home. If some trait A is then found to co-occur more often in the identical twins than in the control pair, that trait must have a genetic component and is heavily influenced by genetic factors as opposed to environmental factors. So far so good.
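The arithmetic behind the 80% / 40% example can be sketched in a few lines. This is only an illustration and not something taken from the NYT article or the studies: it assumes that "co-occurs 80% of the time" can be read as a twin correlation of 0.8, and it uses Falconer's formula, a standard textbook shortcut (usually applied to twins reared together) that estimates heritability as twice the difference between the identical-twin and fraternal-twin correlations. The function name is made up for this sketch.

```python
def falconer_heritability(r_identical, r_fraternal):
    """Falconer's estimate: h^2 = 2 * (r_identical - r_fraternal).

    The inputs are twin-pair correlations on the trait. The result is
    clipped to the range [0, 1] so that sampling noise cannot produce an
    impossible share of variance.
    """
    h2 = 2.0 * (r_identical - r_fraternal)
    return max(0.0, min(1.0, h2))


# The hypothetical trait A from the text: 0.8 vs 0.4 (assumed to be correlations).
heritable_share = falconer_heritability(0.80, 0.40)
environment_share = 1.0 - heritable_share
print(f"heritable share:     {heritable_share:.2f}")
print(f"environmental share: {environment_share:.2f}")

# Observation 7 for poor families: identical twins vary as much as fraternal
# twins, i.e. the two correlations are equal, so the estimate drops to zero.
# The value 0.5 is an arbitrary placeholder, not a figure from the studies.
print(f"equal correlations:  {falconer_heritability(0.50, 0.50):.2f}")
```

Under these assumptions the example trait comes out as about 80% heritable, with about 20% left to environment, which matches the figures used above.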

In the normal twin studies, the adopted twin generally belongs to the same socio-economic status as the one reared by the biological parents.

The normal observation that identical twins belonging to well-off/middle class families have more similar IQs than fraternal twins thus indicates that for children from a well-off background (biological/adopted), the IQ (observed phenotype) is mostly due to genetic factors (underlying genotype), and environmental factors are not a big determinant.

The paradoxical observation that identical twins belonging to poor families have IQs that vary as much as those of fraternal twins should indicate that for children from a poor background (biological/adopted), the IQ (observed phenotype) is mostly due to environmental factors, and genetic factors (the underlying genotype) are not a big determinant.

How do we reconcile the two observations? The paradox becomes a non-issue when one shifts focus from either-or thinking about gene-environment influences and moves towards an interactionist viewpoint, viz. Nature via Nurture, as outlined by Matt Ridley amongst others, using the genotype-phenotype distinction. As per this viewpoint, any genotype is a potentiality, and only if the proper environmental factors are available can it lead to the desired (adaptive) phenotype. In the absence of the required environmental factors, the genotype may not lead to the phenotype or may lead to sub-optimal phenotype expression. In a typical example, a fish may not show the color that its genotype codes for if the environment in which it is developing provides little incentive to exhibit that color for reproductive/survival fitness. In a less dramatic example, one may have the genotype for above-average height, but if proper nutrition is not provided during a critical phase of development, that height may not be attained.

Returning to our discussion, it is apparent that IQ, though highly heritable (and genotype based), remains a potentiality. It results in appropriate IQ scores and intelligence (the observed phenotype) only when the right environmental influences, ranging from nutrition to socio-economic factors such as the number of words a child is exposed to, are present during critical stages of development. Thus, while children of and raised by well-off parents could have a high correlation between genotype and phenotype (and thus show a higher correlation in IQ across identical twins than across fraternal twins), the same would not be true for children of poor parents, where environmental factors limit the observed IQ scores. In their case, though the genotypes of identical twins are still more similar than those of fraternal twins, the variation would be greater, as the genetic influence has been subdued by the (negative) environmental influence.
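To make the "genotype as potentiality" argument concrete, here is a toy simulation. Everything in it is an assumption made up for illustration, not a model from the studies discussed: each child gets a genetic potential, each child's environment independently sets a ceiling, and the expressed IQ is whichever of the two is lower. The ceiling parameters (160 ± 5 for well-off homes, 85 ± 10 for poor homes) are arbitrary.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def twin_correlation(shared, ceiling_mean, ceiling_sd, n_pairs=4000, seed=0):
    """Correlation of expressed IQ across simulated twin pairs.

    shared: fraction of genetic variance the two twins have in common
            (1.0 for identical twins, 0.5 for fraternal twins).
    Expressed IQ = min(genetic potential, environmental ceiling), where
    every child draws an independent ceiling (hypothetical toy assumption).
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_pairs):
        common = rng.gauss(0, 1)  # genetic component shared by the pair
        expressed = []
        for _ in range(2):
            own = rng.gauss(0, 1)  # genetic component unique to this twin
            z = math.sqrt(shared) * common + math.sqrt(1 - shared) * own
            potential = 100 + 15 * z
            ceiling = rng.gauss(ceiling_mean, ceiling_sd)
            expressed.append(min(potential, ceiling))
        first.append(expressed[0])
        second.append(expressed[1])
    return pearson(first, second)


# Well-off homes: the ceiling is so high that it almost never binds.
rich_identical = twin_correlation(1.0, 160, 5)
rich_fraternal = twin_correlation(0.5, 160, 5)
# Poor homes: the ceiling is low and varies from child to child.
poor_identical = twin_correlation(1.0, 85, 10)
poor_fraternal = twin_correlation(0.5, 85, 10)

rich_gap = rich_identical - rich_fraternal
poor_gap = poor_identical - poor_fraternal
print(f"well-off: identical {rich_identical:.2f}, fraternal {rich_fraternal:.2f}")
print(f"poor:     identical {poor_identical:.2f}, fraternal {poor_fraternal:.2f}")
```

In this toy setup the gap between identical and fraternal twins shrinks in the poor condition, because the environmental ceiling rather than the genotype decides most of the expressed scores. How far it shrinks depends entirely on the made-up ceiling parameters, so the numbers should not be read as estimates of anything real.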

Now to the first observation, viz. that children of well-off biological parents have a higher average IQ than children of poor/working parents. This can be explained by the fact that, as a group, well-off parents would have higher IQs than poor/working parents, as intelligence would be one of the major factors predicting who would be well-off and who would be poor in a fair world. Thus, it is no surprise that their children would also have higher intelligence than poor children, as a rich parent’s child would on average get better IQ genes than a poor child would get from its poor (and lower-IQ) parent.

Observations 2 and 3 taken together corroborate the fact that IQ flowers only under the right environment. When two children start with similar average IQ potentialities (as they are both from well-off parents), they nevertheless end up with different final exhibitions of intelligence (as measured by IQ scores), based on the limiting influence of the environment on the genes.

Observations 4 and 5 taken together yield a similar interpretation.
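The four adoption-study averages can be laid out as a two-by-two table to check the arithmetic behind observations 1 to 5. This is just a sketch using the numbers quoted in the list; it assumes the 92.4 figure belongs to children of poor biological parents reared by poor adoptive parents, which is the only cell of the table not otherwise accounted for. The variable and function names are made up for illustration.

```python
# (biological parents' status, adoptive parents' status) -> average IQ
mean_iq = {
    ("well-off", "well-off"): 119.6,
    ("well-off", "poor"): 107.5,
    ("poor", "well-off"): 103.6,
    ("poor", "poor"): 92.4,  # assumed cell: poor biological, poor adoptive
}


def average(values):
    values = list(values)
    return sum(values) / len(values)


def gap_by(position):
    """Average IQ gap between 'well-off' and 'poor' along one axis.

    position 0 compares by biological parents, position 1 by adoptive parents.
    """
    rich = average(iq for key, iq in mean_iq.items() if key[position] == "well-off")
    poor = average(iq for key, iq in mean_iq.items() if key[position] == "poor")
    return rich - poor


biological_gap = gap_by(0)  # what the birth parents' status is associated with
adoptive_gap = gap_by(1)    # what the rearing environment is associated with
print(f"gap by biological parents: {biological_gap:.2f} IQ points")
print(f"gap by adoptive parents:   {adoptive_gap:.2f} IQ points")
```

On these figures the biological-parent gap comes to roughly 16 points (observation 1) and the rearing-environment gap to roughly 12 points (observations 2 to 5), so both the inherited potential and the environment show up as sizeable, separate contributions.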

Observation 6 is a stark example of how providing a proper environment can lead to drastic improvements in the exhibited phenotype and let the phenotype attain the maximum potentiality present in its genotype.

It is clear that affirmative action is needed to ensure that environmental influences do not lead to sub-optimal flowering of intelligence. These affirmative actions should be based on reducing poverty and focused on that alone. Other options like Mandal commission reservation of jobs (after the child has already got a sub-optimal IQ due to early socio-economic environment) are clearly counter-productive and unfair. Poverty is the only evil to be tackled.

Returning to the first post on Mind Hacks, related to the Flynn effect and based on this American Scientist article: to me, it seems apparent that biological evolution is very slow in comparison to the social and environmental evolution that we humans have managed to achieve. I believe that, given our current genotypes for intelligence, we have reached a plateau in terms of providing the socio-economic environmental conditions necessary for the full flowering of intelligence. Thus, we seem to be reaching a plateau in terms of increases in IQ scores from one generation to the next. The Flynn effect, in my opinion, was not a change in genotype, but in exhibited phenotypes, due to the availability of proper environmental conditions.

For IQ to change within generations due to an underlying change in genotype, one would have to assume heavy and continuous selection pressure on the genes responsible for IQ. I believe that IQ (and intelligence) would keep on improving, as it may be part of runaway selection due to other-sex mate preference (reproductive advantage), like the evolution of beauty or the peacock’s tail, or it may keep evolving because intelligence confers a survival advantage too; but such increases would not be as dramatically observable as the Flynn effect.