Category Archives: learning

Questions make learning easier

It is a well-established finding that answering questions about the material you have just studied makes learning easier and more durable. Although I know that readers of this blog are already very learned, without offense to anyone I would like to introduce a new feature of this blog: the weekly question. This question will be visible on the left sidebar (at the top) and is implemented using the poll feature of Google Blogger. Each week's question will relate to the area that I cover in my blog posts for that week.

I encourage all of you to take the poll and answer the question. The correct answer, with an explanation, will be posted every Friday. This week’s question is related to altruism, as I have already covered two studies on that topic. I also encourage readers to contribute questions that I can put on my blog. My mail id is sandygautamATyahooDOTcom.


Theories of Intelligence: Entity vs Incremental theory

I have blogged previously about Carol Dweck’s work on how beliefs about intelligence affect performance outcomes. A new paper from her lab demonstrates that students who hold a fixed, entity-like belief about intelligence (talent based) show poorer academic achievement than students who hold an incremental or malleable concept of intelligence (effort and skill based). I’ll let the authors themselves describe the two frameworks:

In this model, students may hold different ‘‘theories’’ about the nature of intelligence. Some believe that intelligence is more of an unchangeable, fixed ‘‘entity’’ (an entity theory). Others think of intelligence as a malleable quality that can be developed (an incremental theory). Research has shown that, even when students on both ends of the continuum show equal intellectual ability, their theories of intelligence shape their responses to academic challenge. For those endorsing more of an entity theory, the belief in a fixed, uncontrollable intelligence, a ‘‘thing’’ they have a lot or a little of, orients them toward measuring that ability and giving up or withdrawing effort if the verdict seems negative. In contrast, the belief that ability can be developed through their effort orients those endorsing a more incremental theory toward challenging tasks that promote skill acquisition and toward using effort to overcome difficulty.

Relative to entity theorists, incremental theorists have been found (a) to focus more on learning goals (goals aimed at increasing their ability) versus performance goals (goals aimed at documenting their ability); (b) to believe in the utility of effort versus the futility of effort given difficulty or low ability; (c) to make low-effort, mastery-oriented versus low-ability, helpless attributions for failure; and (d) to display mastery-oriented strategies (effort escalation or strategy change) versus helpless strategies (effort withdrawal or strategy perseveration) in the face of setbacks. Thus, these two ways of thinking about intelligence are associated with two distinct frameworks, or ‘‘meaning systems’’, that can have important consequences for students who are facing a sustained challenge at a critical point in their lives. It is important to recognize that believing intelligence to be malleable does not imply that everyone has exactly the same potential in every domain, or will learn everything with equal ease. Rather, it means that for any given individual, intellectual ability can always be further developed.

The paper presents two studies. In the first study, children entering 7th grade were measured on their theories of intelligence and assessed on different motivational factors. Their performance was monitored for a couple of years, and the data were analysed to find the relationships between theory of intelligence and performance outcomes, and also to determine the mediating motivational factors. The results are as follows:

The process model suggests multiple mediational pathways. That is, it suggests that

(a) learning goals mediate the relation between incremental theory and positive strategies,
(b) positive strategies mediate the relation between learning goals and increasing grades,
(c) effort beliefs mediate the relation between incremental theory and helpless attributions,
(d) effort beliefs mediate the relation between incremental theory and positive strategies,
(e) helpless attributions mediate the relation between effort beliefs and positive strategies,
(f) positive strategies mediate the relation between effort beliefs and increasing grades, and
(g) positive strategies mediate the relation between helpless attributions and increasing grades.

The second study took an experimental, intervention-based approach. Students who had declining grades were divided into two groups: an experimental group, which received interventions that taught a malleable, incremental theory of intelligence, and a control group. The study found that grades improved for those in the experimental condition. Overall, quite a cool research paradigm, which has tremendous potential to affect education as well as achievement outside of academics.

Praise: how to hand it and when to hand it

The traditional press seems to be catching up. New York Magazine has an article on how praising children for their innate intelligence can backfire, but praising them for their efforts can be redeeming. We, at the Mouse Trap, have already covered the studies of Prof Carol Dweck here, here and here, and had come to the same conclusion: giving positive, specific and outcome-based praise is better than giving general, innate/ trait/ talent-based praise.

Much of the literature on praise that the New York Magazine author discounts and dismisses needs to be reviewed with the specific-praise vs talent-praise variable taken into account. Throwing praise out with the ‘talent’ myth would be throwing the baby out with the bathwater. So the only quibble I have with the article is its leaning towards the elimination of all praise for children, a quibble I share with Mind Hacks, through which I discovered this article.

The Mouse is dreaming that it is in a Trap!!

New research has established that mice dream, and that during their sleep there is a two-way dialog between the hippocampus, which holds memories of the recent day, and the neocortex, which is believed to be involved in long-term memory.

The content of the mice’s dreams is also no longer secret. In their sleep they replay the sequence of steps that they executed in a maze, but in reverse order and in less time, and in general they rehearse the structure of the maze (the mouse trap). Learning, it is to be remembered, arises from these fast-rewind replays, and sleep, it seems, is necessary for learning.

Some quotes from the article:

During nondreaming sleep, the neurons of both the hippocampus and the neocortex replayed memories — in repeated simultaneous bursts of electrical activity — of a task the rat learned the previous day.

Earlier this year Dr. Wilson reported that after running a maze, rats would replay their route during idle moments, as if to consolidate the memory, although the replay, surprisingly, was in reverse order of travel. These fast rewinds lasted a small fraction of the actual time spent on the journey.
In the findings reported today, the M.I.T. researchers say they detected the same replays occurring in the neocortex as well as in the hippocampus as the rats slept.

The rewinds appeared as components of repeated cycles of neural activity, each of which lasted just under a second. Because the cycles in the hippocampus and neocortex were synchronized, they seemed to be part of a dialogue between the two regions.

Because the fast rewinds in the neocortex tended to occur fractionally sooner than their counterparts in the hippocampus, the dialogue is probably being initiated by the neocortex, and reflects a querying of the hippocampus’s raw memory data, Dr. Wilson said.

“The neocortex is essentially asking the hippocampus to replay events that contain a certain image, place or sound,” he said. “The neocortex is trying to make sense of what is going on in the hippocampus and to build models of the world, to understand how and why things happen.”

PS: My blog post has deliberately used words like ‘dream’, ‘mouse’ and ‘traps’ instead of the correct ‘sleep’, ‘rats’ and ‘mazes’: just to come up with a juicy headline!!

Neurogenesis, learning and small-world networks

Continuing this blog’s recent focus on categorization, the way new items are classified has been hypothesized to be either assimilation (adding the item to an existing schema in the feature space) or accommodation (adding a new schema around the item in the feature space). We’ll leave aside the newly introduced concept of Restructuring for this particular discussion.

Schemata, it is to be remembered, are conceptualized as nothing but named clusters in the feature space. If we become a bit more audacious, we can posit that the clustering in the feature space is mimicked by the actual clustering/ connectivity of neurons in the hippocampus (or the appropriate semantic memory brain module), with each neuron representing a particular item, say a neuron being a Halle Berry neuron. These neurons would not be randomly distributed; they form a small-world network with local clustering and bistability. Whenever a group of neurons gets activated together (and also belongs to a cluster or clique), we can say that the memory of that category is activated.

Further suppose that learning and memory are crucially dependent on neurogenesis, and that new learning (of concepts) happens by inserting a new node (a neuron in the small-world network of the brain) and connecting it appropriately with other neurons.

As an example, consider that all face recognition cells cluster together in the brain and that the concept of face is activated by simultaneous activation of all cells of this cluster. Whether a new visual stimulus (the novel face of a stranger) is a face is determined by calculating the stimulus features and their difference from the features of the prototypical/ exemplar face neurons. A match so determined not only enables us to say that this new stimulus is a face (as this input would activate the face clique), but would also give us an idea of where to place a new neuron that may encode this new face, and of how, and with which other neurons, to connect it.

Now whenever we encounter a novel stimulus we have two possibilities. If it matches some existing cluster/ category, we encode this new memory by placing a new neuron coding for it in the region of that category in the feature space and (crucially), following preferential attachment, attach it in such a manner that the probability of its linking to any neighboring neuron is proportional to the number of links that older neuron already has. (This can be readily implemented in brains, as axonal connections will wither if not much functional activity happens at the synapse formed between the new neuron and the older one.) This is akin to assimilation of a new memory/ learning neuron. This method of insertion still keeps the neural net a small-world network.
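The preferential-attachment step described above can be sketched in a few lines of Python. This is only a toy illustration, not a model of real neurons: the function name `attach_preferentially`, the dictionary-of-sets representation and the node labels are all my own assumptions.

```python
import random

def attach_preferentially(adjacency, new_node, n_links, rng=random):
    """Link new_node to n_links existing nodes, favouring well-connected ones.

    adjacency maps each node to the set of its neighbours (undirected links).
    """
    # Listing every node once per link it already has means that a uniform
    # draw from this pool is a draw proportional to current degree.
    pool = [node for node, nbrs in adjacency.items() for _ in nbrs]
    n_targets = min(n_links, len(set(pool)))
    targets = set()
    while len(targets) < n_targets:
        targets.add(rng.choice(pool))
    adjacency[new_node] = set(targets)
    for target in targets:
        adjacency[target].add(new_node)
    return targets


# Toy "category": one well-connected hub neuron and three peripheral ones.
network = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
linked = attach_preferentially(network, "new_face", 2, random.Random(0))
print(linked)
```

In this toy network the hub holds half of all link ends, so it is the single most likely partner for the new node, which is the "rich get richer" behaviour in miniature.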

Now consider the second case, when the novel stimulus matches no older category but necessitates that we form a new category if we are to represent that new item in the feature space. We need accommodation here. On the neural level this is still accomplished by inserting a new neuron, but this time the new node is not peripheral: the new neuron is a hub (category) neuron. So we use the copy method to insert the new element. We copy (partially) the links of a neighboring hub (cluster center/ category label neuron) and use that link structure to link the newly introduced neuron into the small-world network. The network still remains scale-free, and we have introduced a hub, or a new category, in this case.
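The copy step can be sketched in the same toy style. Again this is a hedged illustration under my own assumptions: `insert_hub_by_copy`, the `fraction` parameter and the choice to pass the neighboring hub in explicitly are hypothetical, chosen only to make the idea concrete.

```python
import math
import random

def insert_hub_by_copy(adjacency, new_hub, template_hub, fraction=0.5, rng=random):
    """Add new_hub by copying a fraction of template_hub's links.

    adjacency maps each node to the set of its neighbours (undirected links).
    """
    neighbours = sorted(adjacency[template_hub])
    n_copy = math.ceil(fraction * len(neighbours))
    copied = set(rng.sample(neighbours, n_copy))
    adjacency[new_hub] = set(copied)
    for node in copied:
        adjacency[node].add(new_hub)
    return copied


network = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
copied = insert_hub_by_copy(network, "new_category", "hub", 0.5, random.Random(1))
print(copied)
```

The new node starts life with several links rather than one or two, which is what lets it act as a category label rather than as a peripheral member.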

All this seems very exciting. Some snippets from the Wikipedia article on scale-free networks are very relevant.

The most widely known generative model for a subset of scale-free networks is Barabási and Albert’s (1999) rich get richer generative model in which each new Web page creates links to existent Web pages with a probability distribution which is not uniform, but proportional to the current in-degree of Web pages.

A different generative model is the copy model studied by Kumar et al. (2000), in which new nodes choose an existent node at random and copy a fraction of the links of the existent node. This also generates a power law.

Recently, Manev and Manev (Med. Hypotheses, 2005) proposed that small world networks may be operative in adult brain neurogenesis. Adult neurogenesis has been observed in mammalian brains, including those of humans, but a question remains: how do new neurons become functional in the adult brain? It is proposed that the random addition of only a few new neurons functions as a maintenance system for the brain’s “small-world” networks. Randomly added to an orderly network, new links enhance signal propagation speed and synchronizability. Newly generated neurons are ideally suited to become such links: they are immature, form more new connections compared to mature ones, and their number but not their precise location may be maintained by continuous proliferation and dying off.

I am excited, what about you?

Categorisation: how to bookmark the interesting pages on the web!

In an earlier post, I had touched upon the different categorization theories that are prevalent. One that was discussed in detail was the prototype vs exemplar distinction, which is based on clustering and involves different methods of representing the categories thus derived.

This post is about how a new item is allocated to a pre-existing category. Simplistically, and in the last post this was the position I had taken, it seems apparent that by calculating the distance of a new item in feature space from the central tendencies of the neighboring clusters (the prototypes/ exemplars) one can find a best fit with one of the clusters and allocate the new item to that category.

This is simplistic as it explains fitting of new items to existing categories, but does not include any mechanisms for formation of new categories.

The analogy I use here is how I decide in which folder to add a new bookmark for an interesting page found on the web. Most probably the names I have chosen for my bookmark folders reflect the central tendencies (common prominent features) of all pages bookmarked in each folder. I would normally look at the new page, and also at my existing folders, and see if there is a best fit. If so, I just file the new bookmark under the best-fit existing folder. Slightly extending the concept of categorization to resemble that of a schema, this is the classical case of assimilation into a schema.

However, in case the new web page cannot be filed under any existing bookmark folder, I would usually create a new folder (with an adequately descriptive name based on the location of the web page in the feature space) and file the new bookmark under that new folder. This is akin to trying to fit a novel item into existing clusters in the feature space, only to discover that it doesn’t fit well with any cluster but is an outlier. The best way to accommodate such an outlier, in my opinion, is to create a new cluster around it. Extending this to schemata, it is not hard to see that this is the classical case of accommodation: the formation of a new schema to incorporate a novel item that cannot be assimilated into an existing schema.
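The two filing decisions described above (best-fit folder, or a new folder for an outlier) can be written as a small sketch. Everything specific here is an assumption made for illustration: pages are points in a two-dimensional feature space, a folder is summarized by the average of its pages, and `max_distance` is a made-up cut-off for "fits well enough".

```python
import math

def centroid(members):
    """Average position of a folder's pages in the feature space."""
    return tuple(sum(dim) / len(members) for dim in zip(*members))

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def file_bookmark(page, folders, max_distance):
    """File page in the nearest folder, or open a new folder for an outlier."""
    best_name, best_dist = None, math.inf
    for name, members in folders.items():
        d = distance(page, centroid(members))
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist <= max_distance:
        folders[best_name].append(page)          # assimilation
        return best_name, "assimilated"
    new_name = f"folder_{len(folders) + 1}"      # accommodation
    folders[new_name] = [page]
    return new_name, "accommodated"


folders = {"psychology": [(0.0, 0.0), (1.0, 0.0)], "cooking": [(10.0, 10.0)]}
first = file_bookmark((0.5, 0.5), folders, max_distance=2.0)
second = file_bookmark((50.0, 50.0), folders, max_distance=2.0)
print(first)   # ('psychology', 'assimilated')
print(second)  # ('folder_3', 'accommodated')
```

Note that the sketch only ever adds pages and folders; nothing in it renames, merges or moves anything.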

Piaget, of course, stopped here (and so do I, sometimes, when managing my bookmarks!). But I would like to venture forth and discuss the other process that I engage in, very infrequently, to keep my bookmarks in good shape. This is what I would call reorganization or restructuring. When I restructure my bookmarks, I change the names, I move bookmarks from one folder to another, I merge bookmarks and at times also create more than a few sub-folders. Also, interestingly, I delete some of the old bookmarks, while I am captivated by some others and even forget to complete the restructuring.

I believe that we too indulge in restructuring of our schemata/ categories periodically (it may be as frequent as daily, during REM sleep) and that a crucial form of learning is not just Assimilation and Accommodation, but also Restructuring. It is also my contention that we consciously remember anything only because we have actively restructured that information and embedded it in a contextual narrative. In the absence of restructuring, there can be information that can be used, but no conscious knowledge.

I plan to tie this up with the 3-factor model of memory that is emerging. One factor of the memory system uses familiarity detection (assimilation), another uses novelty detection (accommodation), while the third involves conscious and contextual recollection (restructuring).

I also propose that these three factors are behind the three kinds of memory (content-wise, not duration-wise). The first type of memory is semantic (or noetic): facts like France’s capital is Paris. The second is procedural (or anoetic), such as learning how to drive, and is unconscious. The third is episodic (or autonoetic): personally remembered events and feelings. Of course memories would also differ along the time dimension (working memory, long-term memory etc.), but that discussion is for another day.

Also a brief note to myself: how this may be linked with Hughlings Jackson’s theory of 3 states of consciousness and how they are differentially affected in dissociation. The autonoetic memory would be affected first, the noetic second, and the anoetic or unconscious memory last.

Returning to categorization, this approach of adding new items by assimilation, accommodation or restructuring is guided more by the Mind-Is-A-Container metaphor. Other metaphors of mind (treating the mind as theory-like, say) may yield new and interesting views of how we form a theory-like theory of categorization. A minor variation on the mind-is-a-container metaphor may be using labels for bookmarks (instead of folders); this is what Google Bookmarks and del.icio.us are using. I haven’t experimented with that approach to bookmarking extensively, so I am not sure what new insights can be gained from it. Readers who use labels to organize bookmarks are welcome to share their insights as comments.

History in the making – the neurogenesis discovery

There is an old article by Jonah Lehrer in Seed magazine about the historical process by which neurogenesis in the human brain was discovered and established.

One of the findings relates to the linkage between stress/depression and the lack of neurogenesis, and to the underlying mechanisms involved (including serotonergic triggering of cascade reactions that lead to an increase in trophic factors). A corollary finding was that enriched environments also lead to more neurogenesis, and can help heal the scars formed by depression/stress by stimulating neurogenesis in the adult brain. How neurogenesis (in areas like the hippocampus, and of dopaminergic neurons) leads to recovery from depression/ stress is still not clear.

To briefly summarize the findings (though it is highly recommended that you read the original article which is very well written):

  1. Neurogenesis happens in adult brains (rats, primates and even humans).
  2. Stress reduces neurogenesis.
  3. Depression and reduced neurogenesis have been found to co-occur.
  4. Enriched environments lead to an increase in neurogenesis (in rats and marmoset monkeys).
  5. Serotonin-based antidepressants primarily work by increasing neurogenesis.

Hence inductively it seems probable that Low IQ is caused by Lower SES. (OK, this may seem like a joke…but do go and read the article and Gould’s views on the stress and poverty relationships- and I find her views (and her supporting experimental and observational facts) quite plausible.)

The scientists profiled in the article, at that time, were still wondering (and actively exploring) the exact mechanism between neurogenesis and depression/ stress.

My hypothesis of why depression leads to less neurogenesis in the hippocampus relates to the role of the hippocampus in memory and learning and to how, for example, repeated exposure to shocks leads rats to exhibit a phenomenon known as ‘learned helplessness’. Once the memory of a distressing, repetitive shock experience is entrenched in the rat’s memory, in the hippocampal region, she may not explore the environment much to discover and learn what has changed about it, and whether the stressful conditions are over. This may lead to reduced neurogenesis as the rat’s brain resigns itself to fate. This inability to learn, or ‘learning helplessness’ (my slightly changed term for the same behavioral description), may lead to a vicious downward cycle ending in depression.

Once neurogenesis is re-triggered, either by administration of Prozac or other antidepressants or by cognitive behavioral therapy (brain scans have found that these two approaches seem to converge, the therapy working in a top-down fashion on expectations and beliefs, the drugs in a molecular, bottom-up fashion), the increased neurogenesis leads to an enhanced ability to learn and adapt, and thus to overcome the depressive episode and get rid of the symptoms. In both cases, the brunt of the effort to get out of depression is still borne by the individual who is affected.

The other piece of information that caught my fancy was that of dopaminergic neurogenesis and the potential cure for Parkinson’s disease based on targeting this pathway. Whether neurogenesis is limited to hippocampal regions, or also happens in the substantia nigra/ VTA region (where I guess all the dopaminergic neurons reside), is an important question and may lead to more insight into which areas of the brain (or whether all areas) are susceptible to neurogenesis.

Abstract vs Concrete: the two genders? (the categorization debate)

In my previous posts I have focussed on distinctions in cognitive styles based on figure-ground, linear-parallel, routine-novel and literal-metaphorical emphasis.

There is another important dimension on which cognitive styles differ and I think this difference is of a different dimension and mechanism than the figure-ground difference that involves broader and looser associations (more context) vs narrow and intense associations (more focus). One can characterize the figure-ground differences as being detail and part-oriented vs big picture orientation and more broadly as analytical vs synthesizing style.

The other important difference pertains to whether associations, and hence knowledge, are mediated by abstract entities, or whether associations, knowledge and behavior are grounded in concrete entities/experiences. One could summarize this as follows: whether the cognitive style is characterized by abstraction or by a particularization bias. One could even go a step further and pit an algorithmic learning mechanism against one based on heuristics and pragmatics.

It is my contention that the bias towards abstraction would be greater for Males and the left hemisphere and the bias towards Particularization would be greater for Females and the right hemisphere.

Before I elaborate on my thesis, the readers of this blog need to get familiar with the literature on categorization and the different categorization/concept formation/ knowledge formation theories.

An excellent resource is a four article series from Mixing Memory. I’ll briefly summarize each post below, but you are strongly advised to read the original posts.

Background: Most categorization efforts are focussed on classifying and categorizing objects, as opposed to relations or activities, and on the representation of such categories (concepts) in the brain. Objects are supposed to be made up of a number of features. An object may have a feature to varying degrees (it’s not necessarily a binary has/doesn’t-have type of association; one feature may be ‘tall’, and the feature strength may vary depending on the actual height).

The first post is regarding classical view of concepts as being definitional or rule-bound in nature. This view proposes that a category is defined by a combination of features and these features are of binary nature (one either has a feature or does not have it). Only those objects that have all the features of the category, belong to a category. The concept (representation of category) can be stored as a conjunction rule. Thus, concept of bachelor may be defined as having features Male, single, human and adult. To determine the classification of a novel object, say, Sandeep Gautam, one would subject that object to the bachelor category rule and calculate the truth value. If all the conditions are satisfied (i.e. Sandeep Gautam has all the features that define the category bachelor), then we may classify the new object as belonging to that category.


Bachelor(x) = truth value of [ male(x) AND adult(x) AND single(x) AND human(x) ]

Thus a concept is nothing but a definitional rule.
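As a minimal sketch, the definitional rule for the bachelor category could be written as a conjunction over binary features. The dictionary representation and the two sample objects (`candidate`, `married`) are hypothetical, chosen only to show the rule at work.

```python
# Defining features of the category, all of which must be present.
REQUIRED = ("male", "adult", "single", "human")

def is_bachelor(features):
    """Classical view: an object belongs only if it has every defining feature."""
    return all(features.get(name, False) for name in REQUIRED)


candidate = {"male": True, "adult": True, "single": True, "human": True}
married = dict(candidate, single=False)
print(is_bachelor(candidate))  # True
print(is_bachelor(married))    # False
```

Membership here is all-or-nothing: missing a single feature excludes the object, with no notion of being a better or worse example of the category.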

The second and third posts are regarding the similarity-based approaches to categorization. These may also be called the clustering approaches. One visualizes the objects as spread in a multi-dimensional feature space, with each dimension representing the various degrees to which a feature is present. The objects in this n-dim space that are close to each other, and are clustered together, are considered to form one category, as they would have similar feature values. In these views, the distance between objects in this n-dim feature space represents their degree of similarity. Thus, the closer the objects are, the more likely it is that they are similar and the more likely that we can label them as belonging to one category.

To take an example, consider a 3-dim space with one dimension (x) signifying height, the other (y) signifying color, and the third (z) signifying attractiveness. Suppose we rate many Males along these dimensions and plot them in this 3-d space. Then we may find that some males have high values of height (Tall), color (Dark) and attractiveness (Handsome), cluster in the right-upper quadrant of the 3-d space, and thus define a category of Males that can be characterized as the TDH/cool hunk category (a category that is most common in Mills & Boon novels). Other males may meanwhile cluster around a category that is labeled squats.

There are some more complexities involved, like assigning weights to a feature in relation to a category, and thus skewing the similarity-distance relationship by making it dependent on the weight (or importance) of the feature to the category under consideration. In simpler terms, not all dimensions are equal, and the distance at which two objects count as similar (belonging to a cluster) may differ based on the dimension under consideration.

There are two variations of the similarity-based or clustering approaches. Both have a similar classification and categorization mechanism, but differ in the representation of the category (concept). The category, it is to be recalled, is in both cases determined by the various objects that have clustered together. Thus, a category is a collection or set of such similar objects. The differences arise in the representation of that set.

One can represent a set of data by its central tendencies. Some central tendencies, like the Mean, represent an average value of the set, and are an abstraction in the sense that no particular member may have that value. Others, like the Mode or the Median, do signify a single member of the set, which is either the most frequent one or the middle one in an ordered list. When the discussion of central tendencies is extended to pairs or triplets of values, or to n-tuples (signifying an n-dim feature space), the concept of mode or median becomes more problematic, and a measure based on them may also become abstract and no longer remain concrete.

Besides a central tendency, one also needs an idea of the distribution of the set values. With the Mean, we have an associated Variance, again an abstract parameter, that signifies how much the set values are spread around the Mean. In the case of the Median, one can resort to percentile values (10th percentile etc.) and thus have concrete members representing the spread of the data set.

It is my contention that the prototype theories rely on abstraction and averaging of data to represent the data set (categories), while the Exemplar theories rely on particularization and representativeness of some member values to represent the entire data set.

Thus, supposing that in the above TDH Male classification task, we had 100 males belonging to the TDH category, then a prototype theory would store the average values of height, color and attractiveness for the entire 100 TDH category members as representing the TDH male category.

On the other hand, an exemplar theory would store the particular values for the height, color and attractiveness ratings of 3 or 4 Males belonging to the TDH category as representing the TDH category. These 3 or 4 members of the set, would be chosen on their representativeness of the data set (Median values, outliers capturing variance etc).
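The contrast between the two representations can be made concrete with a small sketch. The five rated members and the rule for picking exemplars (the most typical member, a middling one and the biggest outlier, ranked by distance from the average) are assumptions of mine; the text above only says that exemplars are chosen for their representativeness.

```python
from statistics import mean

def prototype_of(members):
    """Prototype view: one averaged point, which need not be a real member."""
    return tuple(mean(dim) for dim in zip(*members))

def exemplars_of(members, k=3):
    """Exemplar view: k actual members, spread from most typical to most atypical."""
    proto = prototype_of(members)
    ranked = sorted(members, key=lambda m: sum((a - b) ** 2 for a, b in zip(m, proto)))
    if k >= len(ranked):
        return ranked
    if k == 1:
        return ranked[:1]
    step = (len(ranked) - 1) / (k - 1)
    return [ranked[round(i * step)] for i in range(k)]


# Hypothetical (height in cm, color rating, attractiveness rating) triples.
tdh = [(180, 7, 8), (185, 8, 9), (190, 9, 9), (175, 6, 7), (200, 9, 10)]
print(prototype_of(tdh))
print(exemplars_of(tdh))  # [(185, 8, 9), (180, 7, 8), (200, 9, 10)]
```

The prototype here does not coincide with any of the five members, while every exemplar is a real member, including the outlier that the average hides.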

Thus, the second post of Mixing Memory discusses the Prototype theories of categorization, which posit that we store average values of a category set to represent that category.


Similarity will be determined by a feature match in which the feature weights figure into the similarity calculation, with more salient or frequent features contributing more to similarity. The similarity calculation might be described by an equation like the following:

$S_j = \sum_i w_i \, v(i,j)$

In this equation, Sj represents the similarity of exemplar j to a prototype, wi represents the weight of feature i, and v(i,j) represents the degree to which exemplar j exhibits feature i. Exemplars that reach a required level of similarity with the prototype will be classified as members of the category, and those that fail to reach that level will not.
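A minimal sketch of this weighted feature match, assuming made-up feature names, weights and threshold (none of which come from the quoted post):

```python
def prototype_similarity(weights, exemplar):
    """Sum over features of (feature weight) x (degree to which the exemplar shows it)."""
    return sum(w * exemplar.get(feature, 0.0) for feature, w in weights.items())

def belongs(weights, exemplar, threshold):
    """An exemplar is a member if its similarity reaches the required level."""
    return prototype_similarity(weights, exemplar) >= threshold


weights = {"tall": 0.5, "dark": 0.25, "handsome": 0.25}
strong = {"tall": 1.0, "dark": 0.5, "handsome": 1.0}
weak = {"tall": 0.5, "dark": 0.5, "handsome": 0.0}
print(prototype_similarity(weights, strong))  # 0.875
print(belongs(weights, strong, threshold=0.75))  # True
print(belongs(weights, weak, threshold=0.75))    # False
```

Because "tall" carries twice the weight of the other features, an exemplar that scores poorly on it has little chance of reaching the threshold.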

The third post discusses the Exemplar theory of categorization, which posits that we store all members or, in milder and more practical versions, some members as exemplars that represent the category. Thus, a category is defined by a set of typical exemplars (say, the member at every tenth percentile).

To categorize a new object, one would compare the similarity of that object with all the exemplars belonging to a category, and if this reaches a threshold, the new object is classified as belonging to that category. If two categories are involved, one would compare with exemplars from both categories and, depending on threshold values, either classify the object in both categories or, in a forced single-choice task, classify it in the category that yields the better similarity score.


We encounter an exemplar, and to categorize it, we compare it to all (or some subset) of the stored exemplars for categories that meet some initial similarity requirement. The comparison is generally considered to be between features, which are usually represented in a multidimensional space defined by various “psychological” dimensions (on which the values of particular features vary). Some features are more salient, or relevant, than others, and are thus given more attention and weight during the comparison. Thus, we can use an equation like the following to determine the similarity of an exemplar:

$$\mathrm{dist}(s, m) = \sum_i a_i \, \lvert y_i^{\mathrm{stim}} - y_{mi}^{\mathrm{ex}} \rvert$$

Here, the distance in the space between an instance, s, and an exemplar in memory, m, is the sum, over all dimensions (represented individually by i), of the absolute difference between the feature value of the stimulus and the feature value of m on that dimension. Each term in the sum is weighted by a, which represents the saliency of the particular feature.
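As a rough sketch, the attention-weighted distance can be computed as below. The two dimensions and all numeric values are invented for illustration, and `exemplar_distance` is a hypothetical helper name.

```python
def exemplar_distance(stimulus, exemplar, attention):
    """dist(s, m) = sum_i a_i * |y_i(stimulus) - y_i(exemplar)|."""
    return sum(a * abs(y_s - y_m)
               for a, y_s, y_m in zip(attention, stimulus, exemplar))


# Two illustrative psychological dimensions, e.g. size and colour.
attention = [2.0, 1.0]        # a_i: the first dimension gets more attention
stimulus = [1.0, 0.0]         # feature values of the new instance s
stored_exemplar = [0.0, 0.5]  # feature values of an exemplar m in memory

d = exemplar_distance(stimulus, stored_exemplar, attention)
```

Here a mismatch on the first dimension counts twice as much as the same mismatch on the second, which is how salience enters the comparison.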

There is another interesting clustering approach that becomes available to us if we use an exemplar model. This is the proximity-based approach. In this, we determine all the exemplars (of different categories) that lie within a similarity radius (proximity) around the object under consideration. Then we determine the category to which each of these exemplars belongs. The new object is classified into the category to which the largest number of these proximate exemplars belong.
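A minimal sketch of this proximity-based vote follows, assuming the same attention-weighted distance as in the exemplar model. The exemplars, categories and radius are hypothetical, and ties between categories are broken arbitrarily here (by whichever category was encountered first).

```python
from collections import Counter


def weighted_distance(a, b, attention):
    """Attention-weighted city-block distance between two feature vectors."""
    return sum(w * abs(x - y) for w, x, y in zip(attention, a, b))


def classify_by_proximity(obj, exemplars, attention, radius):
    """Majority category among stored exemplars within `radius` of `obj`.

    `exemplars` is a list of (features, category) pairs.
    Returns None when no exemplar lies within the radius.
    """
    nearby = [category for features, category in exemplars
              if weighted_distance(obj, features, attention) <= radius]
    if not nearby:
        return None
    return Counter(nearby).most_common(1)[0][0]


# Made-up exemplars of two categories in a two-dimensional feature space.
exemplars = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"),
             ((0.0, 1.0), "B"), ((5.0, 5.0), "B")]
attention = (1.0, 1.0)

label = classify_by_proximity((0.2, 0.2), exemplars, attention, radius=1.5)
no_label = classify_by_proximity((0.2, 0.2), exemplars, attention, radius=0.1)
```

In this toy data two "A" exemplars and one "B" exemplar fall inside the larger radius, so the object is assigned to "A"; with the tiny radius nothing is close enough and no category is returned.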

The fourth post on Mixing Memory deals with a ‘theory’ theory approach to categorization, and I will not discuss it in detail right now.

I’d like to mention briefly in passing that there are other relevant theories, like schemata, scripts, frames and situated simulation theories of concept formation, that take into account prior knowledge and context to form concepts.

However, for now, I’d like to return to the prototype and exemplar theories and draw attention to the fact that prototype theories are more abstracted, rule-like and economical in nature, but are also subject to pragmatic deficiencies, owing to their inability to take variance, outliers and exceptions into account. Exemplar theories, being more concrete, memory-based and pragmatic in nature (able to account for atypical members), suffer from the problem of requiring large storage and unnecessary redundancy. One may even extrapolate these differences to those between procedural or implicit memory on the one hand and explicit or episodic memory on the other.

There is a lot of literature on prototypes and exemplars, and research supporting both. One such line of research concerns the visual perception of faces: it is posited that we find average faces attractive because the average face is closer to a prototype of a face, and thus the similarity calculations needed to classify an average face are minimal. This ease of processing we may subjectively feel as the attractiveness of the face. Of course, male and female prototype faces would be different, with both perceived as attractive.

Alternately, we may be storing examples of faces, some attractive, some unattractive and one can theorize that we may find even the unattractive faces very fast to recognize/categorize.

With this in mind, I would like to draw attention to a recent study that examined past-tense over-regularization in males and females and showed that not only do females make more over-regularization errors, but these errors are also distributed around similar-sounding verbs.

Let me explain what over-regularization of the past tense means. As children develop, they pick up language and start forming concepts like that of a verb and that of a past-tense verb. They develop a sort of folk theory of how past-tense verbs are formed: the past tense is formed by appending ‘ed’ to a verb. Thus, when they encounter a new verb that they have to use in the past tense (and which, say, is irregular), they tend to append ‘ed’ to the verb to make the past tense. Instead of learning that ‘hold’ becomes ‘held’ in the past tense, they tend to form the past tense as ‘holded’.

Prototype theories suggest that they have a prototypical concept of a past-tense verb as having two features: first, that it is a verb (signifies action), and second, that it ends in ‘ed’.

Exemplar theories, on the other hand, might predict that the past-tense verb category is a set of exemplars, with each exemplar representing one type of similar-sounding verbs (based on rhyme, a shared final coda, etc.). Thus, the past-tense verb category would contain some actual past-tense verbs like {‘linked’, representing winked, blinked, honked, yanked, etc.; ‘folded’, representing molded, scolded, etc.}.

Thus, this past-tense verb concept, which is based on regular verbs, is also applied while determining the past tense of an irregular verb. On encountering ‘hold’, an irregular verb that one wants to use in the past tense, one may use ‘holded’, as ‘holded’ is a verb, ends in ‘ed’ and is also very similar to ‘folded’. While comparing ‘hold’ with a prototype, one would not have the additional effect of rhyming similarity with exemplars that is present in the exemplar case; thus girls, who are supposed to use an exemplar system predominantly, would be more susceptible to over-regularization effects than boys. Also, this over-regularization would be skewed in girls, with more over-regularization for irregular verbs that rhyme with regular verbs. In contrast, boys, who use the prototype system predominantly, would not show the skew-towards-rhyming-verbs effect. This is precisely what has been observed in that study.

Developing Intelligence has also commented on the same, though he seems unconvinced by the symbolic rules-and-words or procedural-declarative accounts of language, as opposed to the traditional connectionist models. The account given by the authors is entirely in terms of procedural (grammatical-rule-based) versus declarative (lexicon-based, storing pairs of past- and present-tense verbs) mechanisms, and I have taken the liberty of reframing that in terms of Prototype versus Exemplar theories, because it is my contention that procedural learning, in its early stages, is prototypical and abstractive in nature, while lexicon-based learning is exemplar-based and particularizing in nature.

This has already become a sufficiently long post, so I will not take much more space now. I will return to this discussion with research on prototypes versus exemplars in other fields of psychology, especially with reference to gender- and hemisphericity-based differences. Finally, I’ll extend the discussion to the categorization of relations, which should move us into a whole new field, one that is closely related to social psychology and which I believe has been largely ignored in cognitive accounts of learning, thinking, etc.

Five Minds, The Big Five and the Five Faces of the Genius

Howard Gardner is currently promoting his new book, Five Minds for the Future, and more information about the same is available here.

The five minds—disciplined, synthesizing, creating, respectful, and ethical—differ from multiple intelligence in working in a more synergistic fashion as opposed to separate categories of intelligences.

The “disciplined mind,” Gardner argues, is not simply knowing a particular subject but “learning to think the way people who are experts in the field think,” and should develop by the end of secondary school.

The second type of mind, the “synthesizing mind,” is defined by “deciding what to focus on, what’s important, what to ignore, and putting that together in a way that makes sense.” With a dearth of information about synthesizing in textbooks, Gardner has become most intrigued by this concept. Gardner considers himself primarily a synthesizer, but now as a “fish that has suddenly discovered he’s in water,” Gardner is faced with the challenge of uncovering what goes on as people synthesize, what is good versus bad synthesis, and how to enhance the process.

Discussing the creative mind, Gardner points out that today “creating is a premium and not an option.” While one needs a certain amount of discipline and synthesizing to create, too much of either will stifle creativity.

To foster creativity in the classroom, Gardner recommends that teachers “model novel approaches and answers to questions and indicate [to students] that those responses are legitimate.” Students should be encouraged to come up with innovative approaches, discussing ideas that did not work and alternative models. There should also be study of “examples of creative ideas, actions, behaviors,” figuring out how success was attained, and what obstacles had to be overcome.

While the first three minds are more cognitively oriented, the last two, respect and ethics, have more to do with personality and emotion. The respectful mind, Gardner indicated, has to do with “how we think and relate to other people, most importantly to other people around us.”

While this mind develops at a relatively young age, a kind of intuitive altruistic sense of reaching out to those around us, “attempting to understand differences and work with them,” the ethical mind is more abstract, and generally develops during adolescence. It has to do with fulfilling one’s responsibility in the world in terms of job role and as citizen, thinking in terms such as: “I’m a teacher…journalist…physicist, carrying out that role in the most professional way I can.”

Although Gardner thinks that only the last two types of mind are related to personality and emotion, I believe that the first three types of ‘cognitive’ minds can also be related to personality types, as it is my contention that personality dimensions are just different styles of cognition and emotion.

I would thus like to draw attention to the parallels here with the Big Five personality traits, or the factors of the Five-Factor Model (OCEAN):

The disciplined mind utilizes the Conscientiousness traits of self-discipline, carefulness, thoroughness, orderedness, and deliberation to develop the thinking style marked by mastering the conventional way in which the experts familiar with the domain usually think.

The synthesizing mind utilizes the Neuroticism traits, which basically refer to an ability or inability to deal with environmental stimuli in a meaningful way. While discussions of neuroticism are usually couched in emotional terms (a more reactive sympathetic nervous system and more sensitivity to environmental stimulation), I also believe that there is a cognitive dimension here, which pertains to whether one reacts to each and every stimulus (piece of information) or is more ‘cognitively calm and composed’ and uses deliberation to sort relevant information from irrelevant, rather than reacting to every little information nugget. This precisely is the synthesizing mind: able to focus on what is important and not get burdened by information overload. This is the emotional equivalent of not getting overwhelmed by environmental stress.

The creative mind, I believe, utilizes the Openness to Experience traits, like unconventional and individualistic beliefs, broad interests, novelty preference and imagination, to indulge in a thinking style that is marked by creativity: the ability to create something novel.

The respectful mind utilizes the Agreeableness traits of consideration, friendliness, generosity, helpfulness and concern with cooperation and social harmony to indulge in a thinking style that is imbued with an altruistic sense of reaching out to those around us, “attempting to understand differences and work with them.”

The ethical mind, on the other hand, utilizes the Extraversion traits of enjoying human interactions, enthusiasm, talkativeness, assertiveness, gregariousness and pleasure in social interactions to indulge in a thinking style marked with emphasis on activity and social role and responsibility – the precise recipe for the ethical mind!

Gardner also proposes a relationship/hierarchy between the five minds.

In the latter part of his book, Gardner explores the interaction between five minds. He doesn’t see them as isolated categories, but as a general taxonomy followed by respect before ethics, discipline before synthesis, ultimately creating.

This implication of a developmental framework, in which the order of development is discipline, synthesis, respect, ethics and creativity, maps very well to my own obsession with a five-stage developmental model of cognitive, moral, perspective-taking, linguistic, symbolic, pretend-play and other abilities. I believe that Gardner has got the order wrong, and that the traits (and the Five Minds) develop in the following order: Neuroticism, Conscientiousness, Extraversion, Agreeableness and finally Openness to Experience. I may be wrong here, but I will write in detail on my rationale for this developmental path in a subsequent post.

While it is reasonable to stop here, I am tempted to take the analogies further and link this up with the Five Faces of the Genius.

To me, the Fool epitomizes perseverance and thus a Disciplined and Conscientious mind.

The Observer epitomizes ability to pick a needle from a haystack and thus a Synthesising and a low Neurotic (cognitively stable) mind.

The Alchemist, with its focus on active bridging and connection between domains, seems to reflect an ethical and extraverted mind.

The Seer, with an ability to imagine and visualize, may have a corresponding capacity to imagine and feel others’ emotions, and this empathy leads to a respectful and Agreeable mind.

The Sage, with its ability to simplify, may find a resonance in the openness traits of ‘preferring the plain, straightforward, and obvious over the complex, ambiguous, and subtle’ and may be linked to the creative and Open mind!

Do let me know, how you find these conjectures and linkages. I hope I am not using the analogical reasoning of the alchemist to an unacceptable extreme!! Even if I am, you can be sure that it is just due to my high energy levels and my ethical concerns!!

Artificial Neural Networks: temporal summation, embedded ‘clocks’ and operant learning

Artificial Neural Networks have historically focussed on modeling the brain as a collection of interconnected neurons. The individual neurons aggregate inputs and either produce an on/off output based on threshold values or produce a more complex output as a linear or sigmoid function of their inputs. The output of one neuron may go to several other neurons.

Not all inputs are equivalent, and the inputs to the neuron are weighted according to a weight assigned to that input connection. This mimics the concept of synaptic strength. The weights can be positive (signifying an Excitatory Post-Synaptic Potential) or negative (signifying an Inhibitory Post-Synaptic Potential).
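To make the weighted aggregation concrete, here is a minimal sketch of a single unit with an on/off threshold output and, as an alternative, a sigmoid output. The weights, threshold and function names are illustrative assumptions rather than any particular library's API.

```python
import math


def weighted_input(inputs, weights):
    """Sum of inputs, each scaled by the weight of its connection."""
    return sum(w * x for w, x in zip(weights, inputs))


def threshold_unit(inputs, weights, threshold):
    """On/off output: 1 if the weighted input reaches the threshold, else 0."""
    return 1 if weighted_input(inputs, weights) >= threshold else 0


def sigmoid_unit(inputs, weights, bias=0.0):
    """Graded output: a sigmoid function of the weighted input."""
    return 1.0 / (1.0 + math.exp(-(weighted_input(inputs, weights) + bias)))


# Two excitatory connections (positive weights) and one inhibitory (negative).
weights = [1.0, 1.0, -1.0]

fires = threshold_unit([1, 1, 0], weights, threshold=1.5)      # inhibition silent
inhibited = threshold_unit([1, 1, 1], weights, threshold=1.5)  # inhibition active
resting = sigmoid_unit([0, 0, 0], weights)
```

With the inhibitory input silent the unit fires; with it active the net input drops below the threshold and the unit stays off.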

Learning consists of the determination of correct weights that need to be assigned to solve the problem; i.e. to produce a desired output, given a particular input. This weight adjustment mimics the increase or decrease of synaptic strengths due to learning. Learning may also be established by manipulating the threshold required by the neuron for firing. This mimics the concept of long term potentiation (LTP).

The model generally consists of an input layer (mimicking sensory inputs to the neurons), a hidden layer (mimicking the association functions of the neurons in the larger part of the brain) and an output layer (mimicking the motor outputs of the neurons).

This model is a very nice replication of the actual neurons and neuronal computation, but it ignores some of the other relevant features of actual neurons:

1. Neuronal inputs are added together through the processes of both spatial and temporal summation. Spatial summation occurs when several weak signals are converted into a single large one, while temporal summation converts a rapid series of weak pulses from one source into one large signal. The concept of temporal summation is generally ignored. The summation consists exclusively of summation of signals from other neurons at the same time and does not normally include the concept of summation across a time interval.

2. Not all neuronal activity is due to external ‘inputs’. Many brain regions show spontaneous activity, in the absence of any external stimulus. This is not generally factored in. We need a model of the brain that takes into account the spontaneous ‘noise’ that is present in the brain, and how an external ‘signal’ is perceived in this ‘noise’. Moreover, we need a model of what purpose this ‘noise’ serves.

3. This model mimics the classical conditioning paradigm, whereby learning is conceptualized in terms of input-output relationships or stimulus-response associations. It fails to throw any light on many operant phenomena, where a behavior or response is spontaneously generated and learning consists in the increase, decrease or extinction of the timing and frequency of that behavior as a result of a history of reinforcement. This type of learning accounts for the majority of the behavior in which we are most interested: behavior that is goal-directed and behavior that is time-, context- and state-dependent. The fact that a food stimulus will not always result in the response ‘eat’, but is mediated by factors like the state (hunger) of the organism, time of day, etc., is not explainable by the current models.

4. The concept of time, durations and how to tune the motor output as per strict timing requirements has largely been an unexplored area. While episodic learning and memory may be relatively easy to model in the existing ANNs, it’s my hunch that endowing them with a procedural memory would be well-nigh impossible using existing models.

Over a series of posts, I will try to tackle these problems by enhancing the existing neural networks with some new features that are consistent with our existing knowledge about actual neurons.

First, I propose to have a time-threshold in each neural unit. This time-threshold signifies the duration over which temporal summation is applicable and takes place. All input signals that are received within this time duration, either from repeated firing of the same input neuron or from time-displaced firings of different input neurons, are added together as per the normal input weights, and if at any time this sum reaches the normal threshold-for-firing, then the neuron fires. This combines both the temporal and spatial summation concepts. With temporal summation, we have an extra parameter: the time duration for which the history of inputs needs to be taken into account.

All neurons will also have a very short-term memory, in the sense that they would be able to remember the strengths of the input signals that they have received in the near past, that is, in the range of the typical time-thresholds that are set for them. This time-threshold can typically be in milliseconds.

Each time a neuron receives an input, it starts a timer. This timer would run for a very small duration, encoded as the time-threshold for that neuron. As long as this timer is running and has not expired, the input signal is available to the neuron for calculation of total input strength and for deciding whether to fire or not. As soon as the timer expires, the memory of the associated input is erased from the neuron’s memory, and that particular input would no longer be able to affect any future firing of the neuron.

All timers, as well as the memory of associated input signals, are erased after each successful neural firing (every time the neuron generates an action potential). After each firing, the neuron starts afresh and begins accumulating and aggregating the inputs it receives thereafter in the time-threshold window that is associated with it.
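The time-threshold mechanism described above can be sketched as a small event-driven simulation. This is only one possible reading of the proposal: the class name, the millisecond units, the rule that an input expires once its age equals the window, and the use of "greater than or equal to" for the firing test are all assumptions made for illustration.

```python
class TimeThresholdNeuron:
    """Sketch of a unit that sums inputs arriving within a time window."""

    def __init__(self, window_ms, firing_threshold):
        self.window_ms = window_ms                # the time-threshold
        self.firing_threshold = firing_threshold  # the usual threshold-for-firing
        self.memory = []  # (arrival_time_ms, weighted_strength) pairs

    def receive(self, t_ms, strength):
        """Register an input at time t_ms; return True if the neuron fires."""
        # Drop inputs whose timer has expired (age >= window).
        self.memory = [(t, s) for t, s in self.memory
                       if t_ms - t < self.window_ms]
        self.memory.append((t_ms, strength))
        if sum(s for _, s in self.memory) >= self.firing_threshold:
            self.memory.clear()  # firing erases all timers and stored inputs
            return True
        return False


neuron = TimeThresholdNeuron(window_ms=20, firing_threshold=3.0)
# Three unit inputs spread over 20 ms: the first has expired by the third.
slow = [neuron.receive(t, 1.0) for t in (0, 10, 20)]

neuron = TimeThresholdNeuron(window_ms=20, firing_threshold=3.0)
# Three unit inputs within 10 ms: all are still in memory, so the neuron fires.
fast = [neuron.receive(t, 1.0) for t in (0, 5, 10)]
```

The same three inputs either do or do not trigger a firing depending only on how closely they are spaced in time, which is the essence of temporal summation.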

Of course there could be variations to this. Just as spatial aggregation/firing need not be an either/or decision based on a threshold, temporal aggregation/firing need not be an either/or decision: one could have linear or sigmoid functions of time that modulate the input signal strength based on the time that has elapsed. One particular candidate mechanism could be a radioactive-decay function that decreases the input signal strength by half after each half-life. Here, the half-life is equivalent to the concept of a time-threshold. In the time-threshold case, a signal is available in its entirety until the time-threshold has elapsed and is not available to the neuron at all afterwards. In the radioactive-decay case, the input signal is in theory available forever, but its strength is diminished by half after each half-life period, making the effects of the input signal negligible after a few half-lives. Of course, in the radioactive case too, once the neuron has fired, all memory of that input would be erased and any half-life decay computations stopped.
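The half-life variant can be sketched in the same style. Again this is a hedged illustration: the class name and parameter values are hypothetical, and for simplicity the sketch keeps every input in memory until the neuron fires rather than pruning negligible ones.

```python
class HalfLifeNeuron:
    """Sketch of a unit whose stored inputs decay by half every half-life."""

    def __init__(self, half_life_ms, firing_threshold):
        self.half_life_ms = half_life_ms
        self.firing_threshold = firing_threshold
        self.memory = []  # (arrival_time_ms, original_strength) pairs

    def decayed_total(self, t_ms):
        """Sum of all stored inputs, each halved once per elapsed half-life."""
        return sum(s * 0.5 ** ((t_ms - t) / self.half_life_ms)
                   for t, s in self.memory)

    def receive(self, t_ms, strength):
        """Register an input at time t_ms; return True if the neuron fires."""
        self.memory.append((t_ms, strength))
        if self.decayed_total(t_ms) >= self.firing_threshold:
            self.memory.clear()  # firing erases the memory and stops all decay
            return True
        return False


neuron = HalfLifeNeuron(half_life_ms=10, firing_threshold=1.7)
# Unit inputs at 0, 10 and 20 ms contribute 0.25 + 0.5 + 1.0 at t = 20 ms.
results = [neuron.receive(t, 1.0) for t in (0, 10, 20)]
```

Unlike the hard window, old inputs here never vanish abruptly; they simply count for less and less until the neuron fires and its memory is wiped.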

These are not very far-fetched speculations and modeling the neural networks this way can lead to many interesting results.

Second, I propose to have some ‘clocks’ or ‘periodic oscillators’ in the network that would generate spontaneous outputs after a pre-determined time, irrespective of any inputs. Even one such clock is sufficient for our discussion. Such a clock or oscillator system is not difficult to envisage. We just need a non-random, deterministic delay in the transmission of signals from one neuron to the other. There do exist systems in the brain that delay signals, but leaving aside such specialized systems, even normal transmission along an axon between two neurons would suffer from some deterministic delay, based on the time it takes the signal to travel down the length of the axon, assuming that no changes in myelination take place over time, so that the speed of transmission is constant.

In such a scenario, the time it takes for a signal to reach the other neuron is held constant over time. (Note that this time may be different for different neuron pairs, based on both the axon lengths involved and the associated myelination, but would be the same for the same neuron pair over time.) Suppose that both the neurons have very long, unmyelinated axons, and that these axons are equal in length and provide inputs to each other. Further suppose that neither neuron has any other inputs, though each may send its output to many other neurons.

Thus, the sole input of the first neuron is the output of the second neuron and vice versa. Suppose that the thresholds of the two neurons are such that each would trigger if it received a single input signal (from the peer neuron). As there would be a time lag between the firing of neuron one and its signal reaching the second neuron, the second neuron would fire only after, say, 5 milliseconds (the time it takes for the signal to travel) after the first neuron has fired. The first neuron, meanwhile, will respond to the AP generated by the second neuron, which would reach it after the round-trip delay (5 + 5 = 10 ms), and generate an AP 10 ms after its initial firing.

We of course have to assume that somehow the system was first put in motion: someone caused the first neuron to fire initially (this could not be other neurons, as we have assumed that this oscillator pair has no external input signals), and after that it is a self-sustaining clock, with neuron 1 and neuron 2 both firing regularly at 10 ms intervals but in opposite phases. We just need GOD to initially fire the first neuron (the spark of life), and thereafter we do have periodic spontaneous activity in the system.
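The two-neuron oscillator can be sketched as a tiny simulation, assuming an idealized, fixed conduction delay and a single external kick at time zero. The function name and its parameters are hypothetical.

```python
def simulate_oscillator(delay_ms, duration_ms):
    """Firing times of two mutually connected neurons.

    Each neuron fires as soon as it receives one spike from its peer, and a
    spike takes `delay_ms` to travel along the axon. Neuron 1 is fired once
    externally at t = 0 to set the loop in motion.
    """
    firing_times = {1: [], 2: []}
    t, active = 0, 1
    while t <= duration_ms:
        firing_times[active].append(t)
        t += delay_ms                        # spike travels to the peer
        active = 2 if active == 1 else 1     # the peer fires on arrival
    return firing_times


times = simulate_oscillator(delay_ms=5, duration_ms=30)
```

Each neuron ends up firing every 10 ms (twice the one-way delay), with the two neurons offset from each other by 5 ms.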

Thirdly, I propose that this ‘clock’, along with the concept of temporal summation, is able to calculate and code any arbitrary time duration and any arbitrary time-dependent behavior, but in particular any periodic or state/goal-based behavior. I’ve already discussed some of this in my previous posts and will elaborate more in subsequent posts.

For now, some elementary tantalizing facts.

1. Given a 10 ms clock and a neuron capable of temporal summation over 50 ms duration, we can have a 50 ms clock: The neuron has as its sole input the output of the 10 ms clock. After every 50 ms, it would have accumulated 5 signals in its memory. If the threshold-for-firing of the neuron is set such that it only fires if it has received five times the signal strength that is output by the 10 ms clock, then this neuron will fire after every 50 ms. This neuron would generate a periodic output after every 50 ms and implements a 50 ms clock.

2. Given a 10 ms clock and a neuron capable of temporal summation over 40 ms (or let’s have the original 50 ms time-threshold neuron, but set its threshold-for-firing to 4 times the output strength of the 10 ms clock neuron), using the same mechanism as defined above, we can have a 40 ms clock.

3. Given a 40 ms clock, a 50 ms clock and a neuron that does not do temporal summation, we can have a 200 ms clock. The sole inputs to the neuron implementing the 200 ms clock are the outputs of the 50 ms and the 40 ms clocks. This neuron does not do temporal summation. Its threshold for firing is purely spatial, and it fires only if it simultaneously receives a signal strength that is equal to or greater than the combined output signal strength of the 50 ms and 40 ms neurons. If we assume that the 50 ms and 40 ms neurons are firing in phase, then their signals arrive at the same time only at common multiples of 40 ms and 50 ms, that is, every 200 ms (their least common multiple). Voilà, we have a 200 ms clock. Chaining such stages, or choosing input periods with a larger least common multiple, gives still longer intervals, so I assume it’s clear that the sky is the limit as to the arbitrariness of the duration that we can code for.
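The three constructions above can be checked with a short simulation. It is a sketch under stated assumptions: spikes are idealized as integer millisecond timestamps of unit strength, the clocks start in phase, and the helper names are hypothetical. In this simulation the two derived clocks coincide at the least common multiple of their periods, which for 40 ms and 50 ms is 200 ms.

```python
from math import gcd


def summing_clock(input_times, window_ms, pulses_needed):
    """Firing times of a temporal-summation neuron fed by a faster clock.

    The neuron remembers unit-strength inputs younger than `window_ms`, fires
    once it holds `pulses_needed` of them, and then wipes its memory.
    """
    memory, firing_times = [], []
    for t in input_times:
        memory = [u for u in memory if t - u < window_ms]
        memory.append(t)
        if len(memory) >= pulses_needed:
            firing_times.append(t)
            memory = []
    return firing_times


def coincidence_clock(times_a, times_b):
    """Firing times of a purely spatial neuron needing both inputs at once."""
    return sorted(set(times_a) & set(times_b))


base_clock = list(range(10, 1001, 10))            # 10 ms clock, first second
clock_50 = summing_clock(base_clock, 50, 5)       # fires every 50 ms
clock_40 = summing_clock(base_clock, 40, 4)       # fires every 40 ms
slow_clock = coincidence_clock(clock_50, clock_40)

# Expected coincidence period: the least common multiple of the two periods.
expected_period = 40 * 50 // gcd(40, 50)
```

Picking input periods that share fewer common factors, or feeding the slow clock into another summing neuron, stretches the interval further.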

Lastly, learning consists of changing the temporal thresholds associated with a neuron, so that any arbitrary schedule can be associated with a behavior, based on the history of reinforcement. After the training phase, the organism would exhibit spontaneous behavior that follows a schedule and could learn novel schedules for novel behaviors (transfer of learning).

To me all this seems very groundbreaking theorizing, and I am not aware of how and whether these suggestions/concepts have been incorporated in existing neural networks. Some temporal discussions I could find here. If anyone is aware of such research, do let me know via comments or by dropping a mail. I would be very grateful. I am especially intrigued by this paper (I have access to the abstract only) and the application of temporal summation concepts to hypothalamic reward functions.