Jupyter notebook Supplementary Materials 2 - Representation.ipynb
Supplementary Materials 2: Representation of discrete symbolic structures
In Part 2, we explain how a GSC model represents discrete symbolic structures in its continuous representation space $\mathbb{R}^N$, where $N$ is the number of computing units. The following topics will be covered rather informally with a simple example grammar.
Tensor-product variable binding
Superposition of filler/role bindings
Superposition of complete structures
For more formal explanation, see Smolensky (1990).
Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1), 159–216. http://doi.org/10.1016/0004-3702(90)90007-M
Example grammar
Consider a grammar G = {S → A B | X Y}. We implement it in a GSC model as follows.
Representation of fillers and roles
When it constructs a GscNet object, the software generates distributed (or 'neural') representations of fillers and roles and stores them in the attributes F and R, respectively.
The representation of each of the four roles is distributed across four units. In the default setting, each activation vector is orthogonal to every other activation vector and has a length of 1. Users can specify pairwise role similarities (as dot products), but we will not cover that topic in this tutorial.
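The default property described above (pairwise orthogonal role vectors of unit length) can be sketched in NumPy. This is an illustrative stand-in, not the actual code the software uses to fill the `R` attribute: a random orthonormal basis is obtained via QR decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

n_roles = 4  # four roles, each distributed across four units

# QR decomposition of a random matrix yields an orthonormal basis:
# each column is one role vector, pairwise orthogonal, unit length.
R, _ = np.linalg.qr(rng.standard_normal((n_roles, n_roles)))

# Check: R^T R is the identity, so dot products of distinct role
# vectors are 0 and each vector has squared length 1.
print(np.allclose(R.T @ R, np.eye(n_roles)))  # True
```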
Encodings of fillers are presented below in columns of a heatmap.
Tensor-product variable binding
Let $f/r$ ($f \in F$ and $r \in R$) be a binding of filler $f$ with role $r$, where $F$ and $R$ are the sets of fillers and roles, respectively. Let $\mathbf{f}$ and $\mathbf{r}$ be the activation vectors of filler $f$ and role $r$, respectively. Then, the neural encoding of $f/r$ is simply the tensor product $\mathbf{f} \otimes \mathbf{r}$ of $\mathbf{f}$ and $\mathbf{r}$, where $(\mathbf{f} \otimes \mathbf{r})_{ij} = \mathbf{f}_i \mathbf{r}_j$.
In a simple case, the tensor product is simply an outer product of two vectors, one representing a filler and one representing a role. The figure below shows the neural representations of a filler in the left column, a role in the top row, and the binding in the other units arranged in matrix form. The units with two circles have negative activation values. The color of each unit represents the absolute activation value; the darker the unit, the greater the absolute activation.
The resulting matrix (or 2D array) is flattened into a vector (or 1D array). The two-element indices of the tensor product are ordered in column-major fashion: (0,0), (1,0), ..., (I−1,0), (0,1), (1,1), ..., (I−1,1), ..., (0,J−1), ..., (I−1,J−1), where I and J are the numbers of units representing fillers and roles, respectively. By default, these numbers are set to the numbers of fillers and roles.
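The outer product and the column-major flattening can be sketched as follows. The one-hot filler and role vectors here are chosen only for readability (the software uses distributed encodings); `order='F'` gives exactly the index ordering described above.

```python
import numpy as np

# Illustrative filler and role vectors over 4 units each (one-hot for
# readability; the actual encodings are distributed).
f = np.array([1.0, 0.0, 0.0, 0.0])   # filler vector, I = 4 units
r = np.array([0.0, 1.0, 0.0, 0.0])   # role vector,   J = 4 units

# Tensor (outer) product: binding[i, j] = f[i] * r[j], an I x J matrix.
binding = np.outer(f, r)

# Column-major ('F') flattening: (0,0), (1,0), ..., (I-1,0), (0,1), ...
v = binding.flatten(order='F')

print(v.shape)  # (16,)
```

The single active unit of this binding lands at flat index i + j·I = 0 + 1·4 = 4, as the column-major ordering predicts.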
The software computes the neural representations of all possible bindings and stores them in the attribute TP. The attribute contains a 2D array in which each column corresponds to an activation pattern representing a unique f/r binding.
The heatmap below shows the activation patterns of f/r bindings in columns.
Conceptual and neural spaces
Each column of the TP matrix is an activation pattern representing a f/r binding in the neural space. In Figure 1a, the basis of the neural space is shown in blue and the activation patterns of bindings in the neural space are shown in green. An activation pattern (e.g., the red vector in Figure 1a) in neural coordinates is hard to interpret. Thus, we introduce a new coordinate system in which the set of binding vectors constitutes a basis of the new representation space, which we call the conceptual space (see Figure 1b).
The basis of the conceptual space is simply the identity matrix whose columns correspond to unique f/r bindings. Because the columns of $TP$, the neural encodings of the f/r bindings, are linearly independent, $TP$ can be thought of as a change-of-basis matrix from conceptual to neural coordinates. The change-of-basis matrix for the opposite conversion is $TP^{-1}$.
With the change-of-basis matrix, we can easily convert an activation pattern in the neural space to an equivalent one in the conceptual space, which is much easier to interpret: $\mathbf{a}_C = TP^{-1}\mathbf{a}_N$. The code block below randomly samples a filler and a role and then presents the representation of their binding in the neural space first and in the conceptual space second.
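The round trip between the two coordinate systems can be sketched with a stand-in $TP$ matrix (a random invertible matrix here, not the software's actual binding encodings):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in TP matrix: columns play the part of the neural encodings of
# the 16 f/r bindings (random, hence invertible with probability 1).
n_bindings = 16
TP = rng.standard_normal((n_bindings, n_bindings))

# Conceptual coordinates: binding 3 fully active, all others off.
a_C = np.zeros(n_bindings)
a_C[3] = 1.0

a_N = TP @ a_C                       # conceptual -> neural
a_C_back = np.linalg.solve(TP, a_N)  # neural -> conceptual (TP^{-1} a_N)

print(np.allclose(a_C_back, a_C))  # True
```

`np.linalg.solve` applies $TP^{-1}$ without forming the inverse explicitly, which is the numerically preferred way to change basis back.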
For an exposition of this idea, see:
Smolensky, P. (1986). Neural and conceptual interpretations of parallel distributed processing models. In J. L. McClelland, D. E. Rumelhart, and the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2, Psychological and biological models, pp. 390-431. MIT Press.
For interpretability, the software will typically present the static activation patterns in conceptual coordinates in a matrix form as in the second figure above.
Superposition of f/r bindings
So far we have learned (1) any symbolic structure can be described as a set of f/r bindings by role decomposition (Part 1) and (2) each f/r binding can be represented as a tensor product of the filler and role activation vectors in a continuous vector space. In this section, we introduce the third element of the theory of neural representation of symbolic structures: (3) the composition of multiple f/r bindings is represented as the superposition (simply, sum) of the activation vectors of constituent f/r bindings.
For example, consider T1 = [ [ [A B]]]. The neural encoding of T1 is simply the superposition of the neural encodings of the constituent bindings: $\mathbf{a}_{T_1} = \sum_{f/r \in T_1} \mathbf{a}_{f/r}$, where $\mathbf{a}_x$ is an activation vector (in either neural or conceptual coordinates) representing a binding or a tree $x$.
The heatmap at the top presents the activation patterns of four constituent f/r bindings as well as their sum (labeled as 'Superposition') in the neural space. Then, we changed the basis to read the coordinates of the activation vectors in the conceptual space and plotted them in a heatmap at the bottom. It is clear that the sum of the first four rows equals the activation pattern of T1 in conceptual coordinates.
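The summation is easiest to see in conceptual coordinates, where each binding is a basis vector. The binding indices below are illustrative, not the ones the software assigns:

```python
import numpy as np

# Conceptual coordinates: one axis per binding. Suppose the structure
# has four constituent bindings (indices 0, 5, 10, 15 are illustrative).
n_bindings = 16
constituents = [0, 5, 10, 15]

bindings = []
for k in constituents:
    a = np.zeros(n_bindings)
    a[k] = 1.0          # each binding is a basis vector in conceptual space
    bindings.append(a)

# Superposition is a plain sum of the constituent binding vectors.
a_T1 = np.sum(bindings, axis=0)

print(a_T1[constituents])  # [1. 1. 1. 1.]
```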
The superposition of the distributed representations of constituent bindings is not catastrophic: if the encodings of the fillers and of the roles are each linearly independent, it follows naturally that the tensor products of fillers with roles are also linearly independent. The GSC model therefore has no difficulty representing a single discrete symbolic structure in the neural space.
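This linear-independence property can be checked numerically. The construction below is a sketch: random filler and role encodings stand in for the software's, and the Kronecker product collects all 16 binding vectors as columns (matching the column-major flattening used earlier).

```python
import numpy as np

rng = np.random.default_rng(2)

# Random encodings are linearly independent with probability 1; the
# names and sizes are illustrative stand-ins for the software's F and R.
F = rng.standard_normal((4, 4))   # columns: filler vectors
R = rng.standard_normal((4, 4))   # columns: role vectors

# Column k = 4*j + i of kron(R, F) is the column-major flattening of
# the outer product of filler i with role j, i.e., one binding vector.
TP = np.kron(R, F)

# Full rank: the 16 binding vectors are linearly independent.
print(np.linalg.matrix_rank(TP))  # 16
```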
Superposition of complete structures
The ability to represent a single discrete symbolic structure is not enough for an incremental processing model. Given local ambiguity, any incremental processing model must be able to consider multiple structures simultaneously as candidate interpretations in the middle of comprehension.
Consider two grammatical structures T1 = [ [ [A B]]] and T2 = [ [ [X Y]]] generated by the simple grammar G. An intuitive way of representing the co-presence of both structures is to compute a weighted average of the neural encodings of T1 and T2, for example the activation vector $\mathbf{a} = \tfrac{1}{2}(\mathbf{a}_{T_1} + \mathbf{a}_{T_2})$, where $\mathbf{a}_{T}$ is an activation vector representing a discrete structure $T$. The problem with this scheme is that it is not clear which bindings go with which other bindings. To see the problem, consider two ungrammatical structures T3 = [ [ [A Y]]] and T4 = [ [ [X B]]]. We now verify that an equally weighted sum of $\mathbf{a}_{T_1}$ and $\mathbf{a}_{T_2}$ produces the same vector as an equally weighted sum of $\mathbf{a}_{T_3}$ and $\mathbf{a}_{T_4}$: $\tfrac{1}{2}(\mathbf{a}_{T_1} + \mathbf{a}_{T_2}) = \tfrac{1}{2}(\mathbf{a}_{T_3} + \mathbf{a}_{T_4})$:
An equally weighted blend of T1 and T2, $\tfrac{1}{2}(\mathbf{a}_{T_1} + \mathbf{a}_{T_2})$, is identical to the corresponding blend of T3 and T4. This phenomenon has been dubbed the superposition catastrophe (Von der Malsburg, 1986) and has been claimed to be a weakness of distributed representations. It appears problematic because, in the middle of language comprehension, a system must be able to consider multiple interpretations.
Although the GSC model monitors only local and not global coherence, we will next see that its dynamics can unblend the blend correctly to build either T1 or T2, but neither T3 nor T4. Because the harmonic grammar of G specifies (via binary HG rules) which pairs of bindings are well-formed together, the model tries to coactivate the bindings that belong to T1; likewise, the bindings of T2 support each other. Thus, whenever a binding of T1 is activated, the other bindings of T1 are also likely to be activated, while each T1 binding competes with the T2 binding that occupies the same role. There is therefore a tension between the group of bindings in T1 and the group in T2 in the blend state $\tfrac{1}{2}(\mathbf{a}_{T_1} + \mathbf{a}_{T_2})$. As the model is ultimately forced to choose a single filler in each role (e.g., A vs. X), symmetry breaking will occur and either T1 or T2 will be chosen. We will discuss this dynamic process in Part 3.
Von der Malsburg, C. (1986). Am I thinking assemblies? In G. Palm & A. Aertsen (Eds.), Brain Theory (pp. 161–176). Springer Berlin Heidelberg.