Category Archives: categorization

Music and Language: dissociation between rule-crunching and memory-retrieval systems

I have previously written about how concepts are stored in the brain: they involve rule-based systems (A is a bachelor if A is single AND A is male) and memory-based systems (prototypes and exemplars). I have also looked at how language involves both rule systems (the syntax of the language) and memory systems (semantics, or word meanings), and how normal language comprehension as well as production engages both types of systems.

It is a popular paradigm in cognitive linguistics research to present unexpected words in sentences (such as, “I’ll have my coffee with milk and concrete”) while monitoring brain activity using ERP, and to find that the presentation of an unexpected word leads to an N400 peak over temporal lobe areas. This violation of semantics is differentiated from violations of the sentence's syntax, which instead produce changed activity in the frontal lobes.

“Up until now, researchers had found that the processing of rules relies on an overlapping set of frontal lobe structures in music and language. However, in addition to rules, both language and music crucially require the memorization of arbitrary information such as words and melodies,” says the study’s principal investigator, Michael Ullman, Ph.D., professor of neuroscience, psychology, neurology and linguistics.

For the first time, similar results have been obtained for music. If one assumes that changing an in-key note in a familiar melody is akin to an unexpected word in a sentence, then the same N400 peak is observed. Also, if a violation of harmonic rules, like an out-of-key note in a melody, is akin to a violation of linguistic syntax, then here too similar changes in frontal lobe activity were observed.

The subjects listened to 180 snippets of melodies. Half of the melodies were segments from tunes that most participants would know, such as “Three Blind Mice” and “Twinkle, Twinkle Little Star.” The other half included novel tunes composed by Miranda. Three versions of each well-known and novel melody were created: melodies containing an in-key deviant note (which could only be detected if the melody was familiar, and therefore memorized); melodies that contained an out-of-key deviant note (which violated rules of harmony); and the original (control) melodies.

For listeners familiar with a melody, an in-key deviant note violated the listener’s memory of the melody: the song sounded musically “correct” and didn’t violate any rules of music, but it was different from what the listener had previously memorized. In contrast, in-key “deviant” notes in novel melodies did not violate memory (or rules) because the listeners did not know the tune.

Out-of-key deviant notes constituted violations of musical rules in both well-known and novel melodies. Additionally, out-of-key deviant notes violated memory in well-known melodies.

Miranda and Ullman examined the brain waves of the participants who listened to melodies in the different conditions, and found that violations of rules and memory in music corresponded to the two patterns of brain waves seen in previous studies of rule and memory violations in language. That is, in-key violations of familiar (but not novel) melodies led to a brain-wave pattern similar to one called an “N400” that has previously been found with violations of words (such as, “I’ll have my coffee with milk and concrete”). Out-of-key violations of both familiar and novel melodies led to a brain-wave pattern over frontal lobe electrodes similar to patterns previously found for violations of rules in both language and music. Finally, out-of-key violations of familiar melodies also led to an N400-like pattern of brain activity, as expected because these are violations of memory as well as rules.

“This tells us that these two aspects of music, that is rules and memorized melodies, depend on two different brain systems – brain systems that also underlie rules and memorized information in language,” Ullman says. “The findings open up exciting new ways of thinking about and investigating the relationship between language and music, two fundamental human capacities.”

To me this seems exciting. My thesis has been that men are better at rule-based things (syntax and harmony), while women are better at memory-based things (semantics and melody), so I would like to know whether the authors observed any gender effects. If so, this would be further evidence for the abstract vs concrete gender difference theory.

Neurogenesis, learning and small-world networks

Continuing this blog’s recent focus on categorization: one possibility for how new items are classified has been hypothesized as either assimilation (adding the item to an existing schema in the feature space) or accommodation (addition of a new schema around the item in the feature space). We’ll leave aside the newly introduced concept of restructuring for this particular discussion.

Schemata, it is to be remembered, are conceptualized as nothing but named clusters in the feature space. If we become a bit more audacious, we can posit that the clustering in the feature space is mimicked by the actual clustering/connectivity of neurons in the hippocampus (or the appropriate semantic memory brain module), with each neuron representing a particular item, say a 'Halle Berry' neuron. These neurons would not be randomly distributed: they form a small-world network with local clustering and bistability. Whenever a group of neurons that belong to a cluster or clique gets activated together, we can say that the memory of that category is activated.

Further suppose that learning and memory are crucially dependent on neurogenesis, and that new learning (of concepts) happens by insertion of a new node (a new neuron in the small-world network of the brain) and connecting it appropriately with other neurons.

As an example, consider that all face-recognition cells cluster together in the brain and that the concept of a face is activated by simultaneous activation of all cells of this cluster. The fact that a new visual stimulus (a novel human face of a stranger) is a face is determined by computing the stimulus features and their difference from the prototypical/exemplar face neurons and their features. A match so determined not only enables us to say that this new stimulus is a face (as this input would activate the face clique), but also gives us an idea of where to place a new neuron that may encode for this new face, and how and to which other neurons to connect it.

Now whenever we encounter a novel stimulus we have two possibilities. If it matches some existing cluster/category, we encode this new memory by placing a new neuron coding for it in the region of that category in the feature space and, crucially, following preferential attachment, attach it such that the probability of its linking to any neighboring neuron is in proportion to the links that older neuron already has. (This can be readily implemented in brains, as axonal connections will wither if not much functional activity happens at the synapse formed between the new neuron and the older one.) This is akin to assimilation of a new memory/learning neuron. This method of insertion still keeps the neural net a small-world network.
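A minimal sketch of this assimilation idea (my own illustration, not a model from any of the cited papers), assuming the networkx library; the node names, cluster size and number of links are arbitrary choices:

```python
import random
import networkx as nx

def assimilate(G, new_node, cluster_nodes, n_links=3):
    """Attach new_node to an existing cluster, favouring highly connected neurons
    (preferential attachment), so the net keeps its small-world character."""
    weights = [G.degree(n) + 1 for n in cluster_nodes]   # +1 so isolated nodes still have a chance
    targets = set()
    while len(targets) < min(n_links, len(cluster_nodes)):
        targets.add(random.choices(cluster_nodes, weights=weights, k=1)[0])
    G.add_node(new_node)
    G.add_edges_from((new_node, t) for t in targets)

# toy "face cluster": a small-world net of existing face neurons
G = nx.watts_strogatz_graph(50, 4, 0.1)
assimilate(G, "new_face_neuron", list(G.nodes()))
print(G.degree("new_face_neuron"))   # the new item neuron sits on the periphery
```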

Now consider the second case, when the novel stimulus matches no older categories but necessitates that we form a new category if we are to represent the new item in the feature space. We need accommodation here. On the neural level this is still accomplished by inserting a new neuron, but this time the new node is not peripheral: the new neuron is a hub (category) neuron. So we use the copy method to insert the new element. We copy the links (partially) of a neighboring hub (cluster center/category label neuron) and use that link structure to link the newly introduced neuron into the small-world network. The network still remains scale-free, and we have introduced a hub, or a new category, in this case.
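And a corresponding sketch of accommodation via the copy model, again my own hypothetical illustration with an arbitrary copy fraction:

```python
import random
import networkx as nx

def accommodate(G, new_hub, copy_fraction=0.5):
    """Insert new_hub as a category (hub) neuron by partially copying the link
    structure of an existing hub: the 'copy' generative model."""
    template_hub = max(G.nodes(), key=G.degree)            # best-connected existing hub
    neighbours = list(G.neighbors(template_hub))
    k = max(1, int(copy_fraction * len(neighbours)))
    copied = random.sample(neighbours, k)                  # copy a fraction of its links
    G.add_node(new_hub)
    G.add_edges_from((new_hub, n) for n in copied)

G = nx.barabasi_albert_graph(100, 2)                       # toy scale-free network
accommodate(G, "new_category_neuron")
print(G.degree("new_category_neuron"))                     # the newcomer is itself hub-like
```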

All this seems very exciting. Some snippets from the Wikipedia article on scale-free networks are very relevant.

The most widely known generative model for a subset of scale-free networks is Barabási and Albert’s (1999) rich-get-richer generative model, in which each new Web page creates links to existent Web pages with a probability distribution which is not uniform, but proportional to the current in-degree of Web pages.

A different generative model is the copy model studied by Kumar et al. (2000), in which new nodes choose an existent node at random and copy a fraction of the links of the existent node. This also generates a power law.

Recently, Manev and Manev (Med. Hypotheses, 2005) proposed that small world networks may be operative in adult brain neurogenesis. Adult neurogenesis has been observed in mammalian brains, including those of humans, but a question remains: how do new neurons become functional in the adult brain? It is proposed that the random addition of only a few new neurons functions as a maintenance system for the brain’s “small-world” networks. Randomly added to an orderly network, new links enhance signal propagation speed and synchronizability. Newly generated neurons are ideally suited to become such links: they are immature, form more new connections compared to mature ones, and their number but not their precise location may be maintained by continuous proliferation and dying off.

I am excited, what about you?

Categorization, Memory, small-world networks and neural architecture

In the last post I had wondered about the clustering-based solution to categorization and how it may also inform us about how (semantic) memory is stored in the brain, as semantic memory is best modeled by an associational or connectionist network.

Thus, a semantic memory based on clustering models may consist of associations between clusters or categories of information. For example, one cluster may correspond to the names of countries and another to names of cities. A particular type of connection or association between these two clusters may map a relation of the IS-A-CAPITAL-OF type, where, for example, the fact that Paris is the capital of France is stored. For this knowledge to exist, one has to have the prior notions that France is a country and Paris is a city, and on top of that an associational relation between the individual entities France and Paris belonging to those particular clusters.
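A toy illustration of this (my own, not from the post being summarized), with the two clusters and the labelled association written out as plain Python data structures:

```python
# two clusters in the feature space
countries = {"France", "Germany", "Japan"}
cities    = {"Paris", "Berlin", "Tokyo"}

# labelled associations between individual members of the two clusters
is_capital_of = {"Paris": "France", "Berlin": "Germany", "Tokyo": "Japan"}

def capital_of(country):
    """Retrieve the stored association, assuming both cluster memberships are already known."""
    return next((city for city, c in is_capital_of.items() if c == country), None)

assert "France" in countries and "Paris" in cities
assert capital_of("France") == "Paris"
```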

Much of this will be more apparent once relational models of categorization are also covered. For now let us assume that (semantic) memory itself may consist of clusters of neurons that are also interconnected. Interestingly, one such neural architecture that has also been able to simulate short-term memory is the small-world network model. In this, a large number of nodes (neurons) are connected by edges (synapses) as in a typical random graph. These small-world networks are special in the sense that they have high clustering coefficients and low mean path length. Translated into English, this means they exhibit more-than-chance clustering (which enhances local processing) and also a small mean shortest path length (reflecting ease of global processing).
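For the curious, the two signatures can be checked on toy graphs; this sketch assumes the networkx library, and the graph sizes and rewiring probability are arbitrary choices:

```python
import networkx as nx

n, k, p = 500, 10, 0.1
sw = nx.watts_strogatz_graph(n, k, p)           # small-world graph (ring lattice + rewiring)
rnd = nx.erdos_renyi_graph(n, k / (n - 1))      # random graph with roughly the same mean degree

def metrics(G):
    # use the largest connected component so the path-length computation is well defined
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    return nx.average_clustering(G), nx.average_shortest_path_length(giant)

print("small-world:", metrics(sw))   # high clustering, short paths
print("random     :", metrics(rnd))  # low clustering, short paths
```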

It is intriguing that in the short-term memory simulation using small-world networks, the researchers found that the model could exhibit bistability, which may be crucial for memory formation. In bistability, the cluster or functional region corresponding to a particular memory can be in one of two states, depending on an input variable. Thus, a pulse (direction of attention) can activate or deactivate a memory.

Crucially, it can be hypothesized that, as the small-world network model of memory/categorization is good for local-global processing and is reflective of actual brain and AI simulation architectures, the entire brain is a small-world network adequately categorizing and representing sensory, motor and cognitive information and processing it.

A recent MEG-based study has established that small-world network topology exists in the brain's functional connectivity at all oscillatory frequency bands (crucial for binding), and that seems very promising.

Categorisation: how to bookmark the interesting pages on the web!

In an earlier post, I had touched upon the different categorization theories that are in prevalence. One of these that was discussed in detail was the prototype vs exemplar approach, which is based on clustering and involves different representational methods for the categories thus derived.

This post is about how a new item is allocated to a pre-existing category. Simplistically (and in the last post this was the position I had taken), it seems apparent that by calculating the distance of a new item in feature space from the central tendencies of the neighboring clusters (the prototypes/exemplars), one can find a best fit with one of the clusters and allocate the new item to that category.

This is simplistic as it explains fitting of new items to existing categories, but does not include any mechanisms for formation of new categories.

The analogical approach I take here is that of deciding in which folder to add a new bookmark of an interesting page found on the web. Most probably the names I have chosen for my bookmark folders are reflective of the central tendencies (common prominent features) of all the pages bookmarked in those folders. I would normally look at the new page, and also at my existing folders, and see if there is a best fit. If so, I just file the new bookmark under the best-fit existing folder. Slightly extending the concept of categorization to resemble that of a schema, this is the classical case of assimilation into a schema.

However, in case the new web page cannot be filed under any existing bookmark folder, I would usually create a new folder (with an adequate descriptive name based on the location of the web page in the feature space) and file the new bookmark under that new folder. This is akin to trying to fit a novel item into existing clusters in the feature space, only to discover that it doesn't fit well with any cluster but is an outlier. The best way to accommodate such an outlier, in my opinion, is to create a new cluster around the outlier. Extending this to schemata, it is not hard to see that this is the classical case of accommodation and the formation of a new schema to incorporate a novel item that cannot be assimilated into an existing schema.
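The bookmark analogy can be written out as a tiny sketch: assign the new item to the nearest existing cluster if it is close enough (assimilation), otherwise seed a new cluster around it (accommodation). The folders, feature values and distance threshold below are invented for illustration:

```python
import math

def classify_or_create(item, clusters, threshold=1.0):
    """clusters maps a label (folder name) to its prototype (mean feature vector)."""
    if clusters:
        label, proto = min(clusters.items(), key=lambda kv: math.dist(item, kv[1]))
        if math.dist(item, proto) <= threshold:
            return label                      # assimilation: file under the best-fit folder
    new_label = f"cluster_{len(clusters)}"
    clusters[new_label] = item                # accommodation: new folder around the outlier
    return new_label

folders = {"python_blogs": (0.9, 0.1), "recipes": (0.1, 0.9)}
print(classify_or_create((0.85, 0.2), folders))                  # assimilated into python_blogs
print(classify_or_create((0.5, 0.5), folders, threshold=0.2))    # outlier: creates a new folder
```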

Piaget, of course, stopped here (and so do I, sometimes, when managing my bookmarks!), but I would like to venture forth and discuss the other process that I engage in, very infrequently, to keep my bookmarks in good shape. This is what I would call reorganization or restructuring. When I restructure my bookmarks, I change the names, I move bookmarks from one folder to another, I merge bookmarks and at times create more than a few subfolders. Also, interestingly, I delete some of the old bookmarks, while I get captivated by others and even forget to complete the restructuring.

I believe that we too indulge in restructuring of our schemata/categories periodically (it may be as frequent as daily, during REM sleep) and that a crucial form of learning is not just assimilation and accommodation, but also restructuring. It is also my contention that we consciously remember anything only because we have actively restructured that information and embedded it in a contextual narrative. In the absence of restructuring, there can be information that can be used, but no conscious knowledge.

I plan to tie this up with the three-factor model of memory that is emerging. One factor of the memory system uses familiarity detection (assimilation), the second novelty detection (accommodation), while the third involves conscious and contextual recollection (restructuring).

I also propose that these three factors are behind the three kinds of memory (content-wise, not duration-wise). The first type of memory is semantic (or noetic): facts like France's capital is Paris. The second is procedural (or anoetic): learning how to drive, which is unconscious. The third is episodic (or autonoetic): personally remembered events and feelings. Of course, memories would also differ along the time dimension (working memory, long-term memory etc.), but that discussion is for another day.

Also, a brief note to myself: how this may be linked with Hughlings Jackson's theory of three states of consciousness and how they are differentially affected in dissociation. The autonoetic memory would be affected first, the noetic second, and the anoetic or unconscious memory last in dissociation.

Returning to categorization, this approach of adding new items either by assimilation, accommodation or restructuring is guided more by the Mind-Is-A-Container metaphor. Other metaphors of mind, for example assuming it is theory-like, may yield new and interesting views of how we might form a theory-like account of categorization. A minor variation on the mind-is-a-container metaphor is using labels for bookmarks (instead of folders); this is what Google Bookmarks and del.icio.us use. I haven't experimented with that approach to bookmarking extensively, so I am not sure what new insights can be gained from it. For those readers who use labels to organize bookmarks, their insights as comments would be greatly appreciated.

Abstract vs Concrete: the two genders? (the categorization debate)

In my previous posts I have focussed on distinctions in cognitive styles based on figure-ground, linear-parallel, routine-novel and literal-metaphorical emphasis.

There is another important dimension on which cognitive styles differ, and I think this difference involves a different dimension and mechanism from the figure-ground difference, which contrasts broader and looser associations (more context) with narrow and intense associations (more focus). One can characterize the figure-ground difference as detail- and part-oriented vs big-picture-oriented, and more broadly as an analytical vs a synthesizing style.

The other important difference pertains to whether associations, and hence knowledge, are mediated by abstract entities, or whether associations, knowledge and behavior are grounded in concrete entities/experiences. One could summarize this as follows: whether the cognitive style is characterized by an abstraction bias or by a particularization bias. One could even go a step further and pit an algorithmic learning mechanism against one based on heuristics and pragmatics.

It is my contention that the bias towards abstraction would be greater for males and the left hemisphere, and the bias towards particularization would be greater for females and the right hemisphere.

Before I elaborate on my thesis, the readers of this blog need to get familiar with the literature on categorization and the different categorization/concept formation/ knowledge formation theories.

An excellent resource is a four-article series from Mixing Memory. I’ll briefly summarize each post below, but you are strongly advised to read the original posts.

Background: most of the categorization efforts are focussed on classifying and categorizing objects, as opposed to relations or activities, and on the representation of such categories (concepts) in the brain. Objects are supposed to be made up of a number of features. An object may have a feature to varying degrees (it is not necessarily a binary has/doesn't-have type of association; one feature may be tallness, and the feature strength may vary depending on the actual height).

The first post is regarding the classical view of concepts as being definitional or rule-bound in nature. This view proposes that a category is defined by a combination of features, and these features are binary in nature (one either has a feature or does not have it). Only those objects that have all the features of the category belong to it. The concept (representation of the category) can be stored as a conjunction rule. Thus, the concept of bachelor may be defined as having the features male, single, human and adult. To determine the classification of a novel object, say, Sandeep Gautam, one would subject that object to the bachelor category rule and calculate the truth value. If all the conditions are satisfied (i.e. Sandeep Gautam has all the features that define the category bachelor), then we may classify the new object as belonging to that category.

Thus,

Bachelor(x) = male(x) AND adult(x) AND single(x) AND human(x)

Thus a concept is nothing but a definitional rule.
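As code, the classical view really is just a conjunction rule over binary features (a small sketch of my own, reusing the bachelor example from above):

```python
def is_bachelor(x: dict) -> bool:
    """Classical, definitional categorization: all defining features must be present."""
    return x["male"] and x["adult"] and x["single"] and x["human"]

sandeep = {"male": True, "adult": True, "single": True, "human": True}
print(is_bachelor(sandeep))   # True: classified as a member of the category
```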

The second and third posts are regarding the similarity-based approaches to categorization. These may also be called clustering approaches. One visualizes the objects as spread in a multi-dimensional feature space, with each dimension representing the various degrees to which a feature is present. The objects in this n-dimensional space which are close to each other, and are clustered together, are considered to form one category, as they would have similar feature values. In these views, the distance between objects in this n-dimensional feature space represents their degree of similarity. Thus, the closer the objects are, the more likely it is that they are similar and the more likely that we can label them as belonging to one category.

To take an example, consider a 3-dimensional space with one dimension (x) signifying height, another (y) signifying color, and the third (z) signifying attractiveness. Suppose we rate many males along these dimensions and plot them in this 3-D space. We may find that some males have high values of height (tall), color (dark) and attractiveness (handsome) and cluster together in the upper-right region of the space, thus defining a category of males that can be characterized as the TDH/cool hunk category (a category that is most common in Mills and Boon novels). Other males may meanwhile cluster around a category that is labeled squat.
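A hypothetical sketch of this clustering view, assuming scikit-learn; the ratings and the number of clusters are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# columns: height, darkness of colouring, attractiveness (all on a 0-1 scale)
ratings = np.array([
    [0.90, 0.80, 0.90],   # tall, dark, handsome
    [0.85, 0.90, 0.95],
    [0.30, 0.40, 0.35],   # the "squat" end of the feature space
    [0.25, 0.30, 0.30],
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ratings)
print(km.labels_)           # cluster membership = category membership
print(km.cluster_centers_)  # each centre is a candidate prototype
```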

There are some more complexities involved, like assigning weights to features in relation to a category, and thus skewing the similarity-distance relationship by making it dependent on the weights (or importance) of the features to the category under consideration. In simpler terms, not all dimensions are equal, and the distance between two objects required to classify them as similar (belonging to a cluster) may differ based on the dimension under consideration.

There are two variations of the similarity-based or clustering approaches. Both have a similar classification and categorization mechanism, but differ in the representation of the category (concept). The category, it is to be recalled, is in both cases determined by the various objects that have clustered together. Thus, a category is a collection or set of such similar objects. The differences arise in the representation of that set.

One can represent a set of data by its central tendencies. Some such central tendencies, like the mean, represent an average value of the set and are an abstraction in the sense that no particular member may have that exact value. Others, like the mode or median, do signify a single member of the set: either the most frequent one or the middle one in an ordered list. When the discussion of central tendencies is extended to pairs or triplets of values, or to n-tuples (signifying an n-dimensional feature space), the concepts of mode and median become more problematic, and a measure based on them may also become abstract and no longer remain concrete.

Besides central tendencies, one also needs an idea of the spread of the set values. With the mean, we have an associated variance, again an abstract parameter, that signifies how much the set values are spread around the mean. In the case of the median, one can resort to percentile values (10th percentile etc.) and thus have concrete members representing the variance of the data set.

It is my contention that the prototype theories rely on abstraction and averaging of data to represent the data set (categories), while the Exemplar theories rely on particularization and representativeness of some member values to represent the entire data set.

Thus, supposing that in the above TDH classification task we had 100 males belonging to the TDH category, a prototype theory would store the average values of height, color and attractiveness across all 100 TDH category members as representing the TDH male category.

On the other hand, an exemplar theory would store the particular values of the height, color and attractiveness ratings of 3 or 4 males belonging to the TDH category as representing it. These 3 or 4 members of the set would be chosen for their representativeness of the data set (median values, outliers capturing variance etc.).

Thus, the second post of Mixing Memory discusses the Prototype theories of categorization, which posits that we store average values of a category set to represent that category.

Thus,

Similarity will be determined by a feature match in which the feature weights figure into the similarity calculation, with more salient or frequent features contributing more to similarity. The similarity calculation might be described by an equation like the following:

S_j = Σ_i (w_i · v(i,j))

In this equation, S_j represents the similarity of exemplar j to a prototype, w_i represents the weight of feature i, and v(i,j) represents the degree to which exemplar j exhibits feature i. Exemplars that reach a required level of similarity with the prototype will be classified as members of the category, and those that fail to reach that level will not.
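Written out as code (my own paraphrase of the equation above, with invented weights, feature values and threshold):

```python
def prototype_similarity(weights, feature_values):
    """S_j = sum over i of w_i * v(i, j): weights are feature saliences,
    feature_values the degree to which exemplar j exhibits each feature."""
    return sum(w * v for w, v in zip(weights, feature_values))

w = [0.5, 0.3, 0.2]          # salient features count more
v = [1.0, 0.8, 0.1]          # how strongly this exemplar shows each feature
s = prototype_similarity(w, v)
print(s, s >= 0.6)           # 0.6 is an assumed threshold for category membership
```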

The third post discusses the exemplar theory of categorization, which posits that we store all, or in milder and more practical versions, some members as exemplars that represent the category. Thus, a category is defined by a set of typical exemplars (say, every tenth percentile).

To categorize a new object, one would compare the similarity of that object with all the exemplars belonging to a category, and if this reaches a threshold, the new object is classified as belonging to that category. If two categories are involved, one would compare with exemplars from both categories and, depending on threshold values, either classify it in both categories or, in a forced single-choice task, classify it in the category which yields the better similarity scores.

Thus,

We encounter an exemplar, and to categorize it, we compare it to all (or some subset) of the stored exemplars for categories that meet some initial similarity requirement. The comparison is generally considered to be between features, which are usually represented in a multidimensional space defined by various “psychological” dimensions (on which the values of particular features vary). Some features are more salient, or relevant, than others, and are thus given more attention and weight during the comparison. Thus, we can use an equation like the following to determine the similarity of an exemplar:

dist(s, m) = Σ_i a_i · |y_i^stim − y_mi^ex|

Here, the distance in the space between an instance, s, and an exemplar in memory, m, is equal to the sum, over all dimensions (represented individually by i), of the exemplar's feature value subtracted from the stimulus's feature value on the same dimension. Each term in the sum is weighted by a_i, which represents the saliency of the particular feature.
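The exemplar distance, and the comparison against stored exemplars described in the post, might look like this sketch (all numbers invented for illustration):

```python
def exemplar_distance(stimulus, exemplar, attention_weights):
    """dist(s, m) = sum over i of a_i * |y_i_stim - y_mi_ex|."""
    return sum(a * abs(ys - ye)
               for a, ys, ye in zip(attention_weights, stimulus, exemplar))

def matches_category(stimulus, stored_exemplars, attention_weights, max_dist=0.5):
    """Accept the stimulus if it is close enough to at least one stored exemplar."""
    return any(exemplar_distance(stimulus, ex, attention_weights) <= max_dist
               for ex in stored_exemplars)

tdh_exemplars = [(0.90, 0.80, 0.90), (0.85, 0.90, 0.95)]
print(matches_category((0.88, 0.82, 0.90), tdh_exemplars, (0.4, 0.3, 0.3)))
```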

There is another interesting clustering approach that becomes available to us, if we use an exemplar model. This is the proximity-based approach. In this, we determine all the exemplars (of different categories) that are lying in a similarity radius (proximity) around the object in consideration. Then we determine the category to which these exemplars belong. The category to which the maximum number of these proximate exemplars belong, is the category to which this new object is classified.

The fourth post on Mixing Memory deals with a ‘theory’ theory approach to categorization, and I will not discuss it in detail right now.

I'd like to mention briefly in passing that there are other relevant theories, like schemata, scripts, frames and situated simulation theories of concept formation, that take into account prior knowledge and context to form concepts.

However, for now, I'd like to return to the prototype and exemplar theories and draw attention to the fact that the prototype theories are more abstracted, rule-like and economical in nature, but subject to pragmatic deficiencies, given their inability to take variance, outliers and exceptions into account; while the exemplar theories, being more concrete, memory-based and pragmatic in nature (able to account for atypical members), suffer from the problems of requiring large storage and unnecessary redundancy. One may even extrapolate these differences as those underlying procedural or implicit memory on the one hand and explicit or episodic memory on the other.


There is a lot of literature on prototypes and exemplars and research supporting each. One such line of research concerns the visual perception of faces, whereby it is posited that we find average faces attractive, as the average face is closer to the prototype of a face, and thus the similarity calculation needed to classify an average face is minimal. This ease of processing we may subjectively feel as attractiveness of the face. Of course, the male and female prototype faces would be different, with both perceived as attractive.


Alternatively, we may be storing examples of faces, some attractive, some unattractive, and one can theorize that we may find even the unattractive faces very fast to recognize/categorize.

With this in mind, I would like to draw attention to a recent study that highlighted past-tense over-regularization in males and females and showed that not only do females make more over-regularization errors, but these errors also cluster around similar-sounding verbs.

Let me explain what over-regularization of the past tense means. As children develop, they pick up language and start forming concepts like that of a verb and that of a past-tense verb. They sort of develop a folk theory of how past-tense verbs are formed: the theory is that the past tense is formed by appending '-ed' to a verb. Thus, when they encounter a new verb that they have to use in the past tense (and which, say, is irregular), they will tend to append '-ed' to the verb to make the past tense. Thus, instead of learning that 'hold' in the past tense becomes 'held', they tend to make the past tense 'holded'.

Prototype theories suggest that they have a prototypical concept of a past-tense verb as having two features: one, that it is a verb (signifies action), and two, that it ends in '-ed'.

Exemplar theories, on the other hand, might predict that the past-tense verb category is a set of exemplars, with each exemplar representing one type of similar-sounding verbs (based on rhyme, same final coda etc.). Thus, the past-tense verb category would contain some actual past-tense verbs like {'linked' representing winked, blinked, honked, yanked etc.; 'folded' representing molded, scolded etc.}.

Thus, this past-tense verb concept, which is based on regular verbs, is also applied while determining the past tense of an irregular verb. On encountering 'hold', an irregular verb that one wants to use in the past tense, one may use 'holded', as 'holded' is a verb, ends in '-ed', and is also very similar to 'folded'. While comparing 'hold' with a prototype, one would not have the additional effect of rhyming similarity with exemplars that is present in the exemplar case; thus, females, who are supposed to use an exemplar system predominantly, would be more susceptible to over-regularization effects than boys. Also, this over-regularization would be skewed, with more over-regularization for similar-sounding (rhyming) regular verbs in females. As opposed to this, boys, who use the prototype system predominantly, would not show the skew-towards-rhyming-verbs effect. This is precisely what was observed in that study.

Developing Intelligence has also commented on the same study, though he seems unconvinced by the symbolic rules-versus-words or procedural-versus-declarative accounts of language as opposed to the traditional connectionist models. The account given by the authors is entirely in terms of a procedural (grammatical, rule-based) versus declarative (lexicon-based, storing pairs of present- and past-tense verbs) mechanism, and I have taken the liberty of reframing that in terms of prototype versus exemplar theories, because it is my contention that procedural learning, in its early stages, is prototypical and abstractive in nature, while lexicon-based learning is exemplar-based and particularizing in nature.

This has already become a sufficiently long post, so I will not take up much more space now. I will return to this discussion, covering research on prototypes vs exemplars in other fields of psychology, especially with reference to gender- and hemisphere-based differences. I'll finally extend the discussion to the categorization of relations, and that should move us into a whole new field, one that is closely related to social psychology and which, I believe, has been largely ignored in cognitive accounts of learning and thinking.