Thursday, May 1, 2014

Visual Thinking for the Machinic Theorist

Visualization has proved essential to many projects in the digital humanities. The visualizations aren't mere illustrations of verbal concepts, helpful but dispensable. They are the stuff of thought itself.

Sydney Lamb, one of the first generation of researchers in computational linguistics, has remarked that “... it is precisely because we are talking about ordinary language that we need to adopt a notation as different from ordinary language as possible, to keep us from getting lost in confusion between the object of description and the means of description” (Pathways of the Brain: The Neurocognitive Basis of Language, John Benjamins 1999, p. 274).

Working independently, and on Shakespeare's plays rather than on language itself, Franco Moretti expresses a similar idea (Network Theory, Plot Analysis, Stanford Literary Lab, Pamphlet No. 2, May 2011, p. 4):
Third consequence of this approach: once you make a network of a play, you stop working on the play proper, and work on a model instead: you reduce the text to characters and interactions, abstract them from everything else, and this process of reduction and abstraction makes the model obviously much less than the original object – just think of this: I am discussing Hamlet, and saying nothing about Shakespeare’s words – but also, in another sense, much more than it, because a model allows you to see the underlying structures of a complex object.
These passages speak to cognitive mechanisms. Whether any given line of reasoning or experiment is valid is a question for epistemology proper. Visualization is a matter of cognitive strategy; it's about thinkability.
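
A minimal sketch of the reduction Moretti describes, assuming Python and the networkx library; the character names are from Hamlet, but the interaction counts below are invented for illustration, not taken from any actual tabulation of the play:

    # Build a character-interaction network: nodes are characters,
    # weighted edges count how often two characters appear in dialogue together.
    import networkx as nx

    interactions = [
        ("Hamlet", "Horatio", 9),
        ("Hamlet", "Claudius", 6),
        ("Hamlet", "Gertrude", 5),
        ("Hamlet", "Ophelia", 4),
        ("Claudius", "Gertrude", 7),
        ("Claudius", "Polonius", 5),
        ("Polonius", "Ophelia", 3),
    ]

    G = nx.Graph()
    for a, b, count in interactions:
        G.add_edge(a, b, weight=count)

    # Once the play is a graph, we reason about structure rather than words:
    # here, rank characters by how much interaction they carry.
    strength = dict(G.degree(weight="weight"))
    for name, total in sorted(strength.items(), key=lambda kv: -kv[1]):
        print(f"{name:10s} {total}")

The model says nothing about Shakespeare's words, exactly as Moretti notes, yet it makes structural questions easy to pose and, crucially, easy to visualize.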

Some years ago I wrote an article on visual thinking for an encyclopedia:
William Benzon. Visual Thinking. Allen Kent and James G. Williams, Eds. Encyclopedia of Computer Science and Technology. Volume 23, Supplement 8. New York; Basel: Marcel Dekker, Inc. (1990) 411-427. Available online: https://www.academia.edu/13450375/Visual_Thinking
I’ve reproduced the three sections of the article in the rest of this post.

• • • • • •

Visual Thinking: A Speculative Proposal

Given that visual thinking has a motor component, I offer the following speculative proposal: visual thinking involves the internalization of visuo-manipulative activity and of movement through the environment. We move through the physical environment, sometimes in a familiar place, sometimes in a strange place; we handle objects, sometimes to accomplish a specific task, sometimes simply to inspect the object. Visual thinking involves imagined locomotion in imagined settings, imagined manipulation of imagined objects. The settings and objects may be real, but not present, or they may exist only in imagination. In defining visual thinking in this way I am aligning myself with an approach to thinking which derives from Lev Semenovich Vygotsky's seminal analysis of the relationship between thought and language.

Vygotsky was a developmental psychologist who argued that thinking involves the use of inner speech to control mental activity. To simplify matters a great deal, he investigated a developmental sequence which goes like this: 1) First, the young child is subject to speech from adults, who use it to direct the child's activity, pointing out things to see, telling the child what to do. 2) As the child learns to speak, he or she learns to direct his or her own activity by talking, aloud, to himself or herself. 3) After a while external speech becomes unnecessary. Inner speech is stable enough so that the child can use it to direct action and perception. This inner speech is the stuff of thought.

What is important about Vygotsky's account is that it involves the internalization of an external action. External speech requires physical activity, moving the vocal musculature, which has physical effects, the propagation of sound waves. These sound waves reach the ear where they are detected by the auditory apparatus and decoded. Vygotsky's theory (and the observations behind it) clearly implies that, at one stage in the child's development, the brain is using an external communication channel — the propagation of sound from mouth to ear — to manage its operations. After a while, however, it becomes possible to communicate the same information through an internal channel. External speech has become internalized.

In order to construct a similar account of visual thinking we can turn to the work of Ulric Neisser. In common with many psychologists, Neisser believes that perception is an active process. The mind does not passively accept the impress of external stimuli as a wax tablet accepts the impress of a stylus. Rather, the mind actively structures sensory input. This process involves a perceptual cycle in which internal schemas representing objects direct exploration of the environment for information about objects. The information thus obtained is being continuously used to guide further exploration of the environment.
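
Rendered as a control loop, the cycle looks something like the sketch below; the toy environment, the toy schema, and the search strategy are my own invention, meant only to show the loop's shape, not to model Neisser's theory:

    # Perceptual cycle: the schema directs exploration, exploration picks up
    # information, and that information modifies the schema for the next pass.
    environment = {"door": "coat rack", "desk": "lamp", "window": "street"}
    schema = {"target": "lamp", "expected_location": "door"}

    def perceptual_cycle(schema, environment, max_steps=5):
        locations = list(environment)
        for _ in range(max_steps):
            look_at = schema["expected_location"]   # schema directs exploration
            seen = environment[look_at]             # information is picked up
            if seen == schema["target"]:            # expectation confirmed
                return look_at
            # the information modifies the schema, which redirects exploration
            locations.remove(look_at)
            schema["expected_location"] = locations[0]
        return None

    print(perceptual_cycle(schema, environment))    # -> "desk"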

Images are said to exist when the perceptual cycle is activated with schemas for which no external objects are available. Thus, one may be looking for one's gloves. The “gloves” schema is activated, but the gloves aren't visible. In this situation one has an image of the gloves. But, for Neisser, this image is not so much a picture in one's head as it is perceptual preparation to see and recognize the gloves whenever and wherever the environment makes them available. Both the perception of an object and the mental image of that object are grounded in the perceptual cycle; both arise through the activation of the mental schema for that object.

For Neisser, what needs to be explained about mental images is how they get detached from immediate perceptual activity. That is, how do we activate schemas in a context where there is no external support for them (that is, no external objects corresponding to the activated schemas)? One possibility, Neisser suggests, is locomotion. Perception is rapid, but locomotion is relatively slow. Consequently, as we move about in a familiar environment we are constantly anticipating things which we cannot yet see. This anticipation entails the activation of schemas in the absence of objects to which they correspond. We anticipate the appearance of the corner drugstore before we actually reach it; we anticipate the appearance of the pharmacist before we actually give her our prescription; and so forth. Neisser does not provide any explicit mechanism for how we activate these images, but his discussion makes the existence of such a mechanism seem plausible.

It is in this context that Neisser discusses the method of loci. For Neisser it is a clear example of the activation of visual schemas in the absence of external stimuli to support those schemas in the perceptual cycle. The schemas created by walking through an environment have become so thoroughly learned that they can be set in motion in the absence of that environment and be used to activate other schemas (the images stored at the various loci).
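
As a data structure, the method of loci is simply a stable ordered mapping from places along a well-known route to the items to be remembered; recall is a mental walk that visits each place in turn. A minimal sketch, with a route and a shopping list invented for the purpose:

    # Method of loci: attach each item to a locus along a familiar route,
    # then "walk" the route in order to recall the items.
    route = ["front gate", "porch", "hallway", "kitchen", "back door"]
    items = ["milk", "stamps", "umbrella", "birthday card", "batteries"]

    memory_palace = dict(zip(route, items))   # encoding: one image per locus

    def recall(palace, route):
        # Retrieval: the imagined walk activates each locus schema,
        # which in turn activates the image stored there.
        return [palace[locus] for locus in route]

    print(recall(memory_palace, route))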

Now we can work our way back to Vygotsky by considering Neisser's account of language. Most of our schemas will be multimodal: we can see, hear, touch, smell, and taste things. If we hear a dog barking, we expect that by looking in the direction of the sound, and perhaps moving, we will see the dog. Seeing the dog, we know that by moving even closer we can smell it and touch it. Thus the arousal of a schema in one sensory mode can prime the perceptual cycle for perception in other sensory modes. According to Neisser, when the child is first learning language, the names of things become assimilated to the general schemas for those things. Thus, when a young child sees a dog and hears it called “dog,” the child assimilates that (linguistic) sound to its general schema for the dog. Just as the multimodal dog has an appearance, an odor, a texture and a touchable shape, and makes characteristic noises, so it also has this peculiar auditory attribute “dog.” Just as the appearance of a dog can activate the complete multimodal schema, leading the child to anticipate certain odors and tactile sensations, so hearing this special auditory attribute, “dog,” can also activate the complete multimodal schema.

Perhaps the most peculiar property of the name is that it is the part of the complete schema which comes under the child's direct control. The child can, by moving the vocal musculature, utter the sound, “dog,” and thereby activate the dog schema in the complete absence of any external stimulus. In this way language gives us the capacity arbitrarily to manipulate our perceptual schemas. We generate words, aloud or, later, silently, and the schemas are aroused and manipulated according to the structures of words which are generated.
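
The sketch below renders this idea as a simple cross-modal index; the particular attributes are invented for illustration. Each schema bundles attributes from several senses, the name is just one attribute among them, and presenting any attribute (the name included) retrieves the whole bundle:

    # A multimodal schema: the name is one more attribute,
    # but it is the attribute the speaker can produce at will.
    dog_schema = {
        "name": "dog",
        "appearance": "four legs, fur, wagging tail",
        "sound": "barking",
        "smell": "wet fur",
        "touch": "warm, soft coat",
    }

    schemas = [dog_schema]

    def activate(cue):
        """Retrieve every full schema that has the cue as one of its attributes."""
        return [s for s in schemas if cue in s.values()]

    # Uttering (or hearing) the word activates the whole multimodal schema,
    # with no dog anywhere in sight.
    print(activate("dog"))
    # Smelling wet fur would do the same job through a different modality.
    print(activate("wet fur"))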

Obviously we can use inner speech to call up and to manipulate visual images, yielding a form of visual thinking. Neisser's account of the mental image, including his treatment of the method of loci, suggests that we may have some visual thinking which doesn't depend on inner speech. In both cases internalized motor activity (whether locomotor and manipulative, or vocal) is involved in visual thinking.

The important point is that thinking is virtual action. This implies that, if we want to think about how computer graphics extends visual thinking, we have to consider, not only what kind of images can be created, but how those images are manipulated. In a sense, visual thinking is always visuo-motor thinking.

The Controversy over Mental Images

To this point we have, for the most part, assumed that mental images exist. This is not, however, an unproblematic assumption. There has, in fact, been a great deal of controversy over whether or not mental images are real and, if so, whether or not they play a significant role in thinking. Do we think in words, or at least in some quasi-verbal, or propositional, code? Or do we think in images? That, propositions or images, is the primary distinction, though it is qualified and elaborated with almost endless subtlety and sophistication.

Many of the central issues have been discussed by John Anderson, Stephen Kosslyn, and Zenon Pylyshyn. While the results of psychological experimentation play a large role in these discussions, much of the disagreement concerns fundamental issues about just what kinds of inferences are permissible. Perhaps, since computing has provided many of the concepts and metaphors which inform this debate, the easiest way to approach these fundamental issues is to begin with an extended analogy from computing.

Imagine, on the one hand, the monitor of a CAD system which is displaying a pleasing shaded image of, for example, a gear assembly. On the other hand, we have another monitor, linked to the same system, which is displaying a fragment of the code for a program which controls some numerically controlled device used in making one of the gears in the assembly displayed on the first screen. Obviously, one of these displays is an image and one is not. Yet, both displays have been generated from the same database and the software generating those displays could well have been written in the same language. Thus, at bottom, both of these displays offer information which is coded in a propositional form.

The fact that the displays are quite different, that one is an image arrayed in two-dimensional space and the other consists of a one-dimensional string of “words,” obviously is not inconsistent with the fact that both are implemented in systems which use a propositional coding. Regardless of the display format, all of the computation is propositional. Those who argue against mental imagery make much the same argument with respect to the human mind. The basic processes are more or less verbal and propositional and mental images, where they exist, are mere epiphenomena playing no deeper a role in thinking than blinking lights play in the operations of a computer.
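
To make the analogy concrete, here is a minimal sketch in pure Python; the gear assembly of the analogy is reduced, for brevity, to a pair of concentric circles, and both “displays” are generated from the same list of shape records:

    # One "database" of propositions about a simple part; two displays.
    # Each record is propositional: (shape, parameters).
    part = [
        ("circle", {"cx": 10, "cy": 5, "r": 4}),
        ("circle", {"cx": 10, "cy": 5, "r": 2}),
    ]

    # Display 1: a one-dimensional string of "words" (a code-like listing).
    def listing(part):
        lines = []
        for shape, params in part:
            args = ", ".join(f"{k}={v}" for k, v in params.items())
            lines.append(f"{shape}({args})")
        return "\n".join(lines)

    # Display 2: a two-dimensional image (a crude ASCII raster).
    def raster(part, width=21, height=11):
        grid = [[" "] * width for _ in range(height)]
        for shape, p in part:
            if shape == "circle":
                for y in range(height):
                    for x in range(width):
                        d2 = (x - p["cx"]) ** 2 + (y - p["cy"]) ** 2
                        if abs(d2 - p["r"] ** 2) <= p["r"]:   # near the circle's edge
                            grid[y][x] = "#"
        return "\n".join("".join(row) for row in grid)

    print(listing(part))   # the "program" display
    print(raster(part))    # the "image" display

Both displays are computed from the same propositional records; the difference lies entirely in the display format.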

A counter argument could be constructed in the following way: We know that, from a theoretical point of view, any conceivable computation can be carried out in any language which meets certain fairly simple criteria. A bit more practically, anything which can be programmed at all can be programmed in Fortran, or Lisp, or C, or Pascal, or APL, whatever. But, when you get down to it, not all languages are equally suited to all programming tasks. The theoretical equivalence of Lisp and Pascal does not translate into equivalent practical utility. Or, moving into the hardware domain, the fact that, theoretically, any computation which can be executed on a parallel machine can also be executed on a serial machine has little force in a physical world where all real computations take place at finite speeds. In this world there are computations which are so large that serial execution would take days or weeks. Where it is possible, there is real practical value in breaking such computations into many pieces, each assigned to its own processor, with all of the processors running in parallel.

So, the counter argument would go, it might be with the mind. In theory all mental activity could be implemented in some sort of verbal code, but, in practice, it is not. For there are some activities where an imagistic code is more efficient.
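
The parallel half of the computing analogy can be made concrete with a small sketch; the computation (a large sum) and the four-way split are arbitrary choices of mine, meant only to show what “breaking a computation into pieces” looks like in practice:

    # The same answer can be computed serially or by dividing the work
    # among worker processes; the theoretical result is identical,
    # the practical execution is not.
    from concurrent.futures import ProcessPoolExecutor

    def partial_sum(bounds):
        lo, hi = bounds
        return sum(range(lo, hi))

    def parallel_sum(n, workers=4):
        step = n // workers
        chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
                  for i in range(workers)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(partial_sum, chunks))

    if __name__ == "__main__":
        n = 10_000_000
        assert parallel_sum(n) == sum(range(n))   # same answer, different execution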

This kind of argument has some formal justification in Miriam Yevick's work on holographic logic, which was inspired, in part, by Karl Pribram's advocacy of holography as a model for neural processing. She is interested in the relationship between the complexity of an object and the complexity of a representation adequate to identify the object. For geometrically simple objects, such as squares, triangles, crosses, and circles, a propositional formalism of some sort is quite adequate. But for geometrically complex objects, such as Chinese ideograms or faces, an adequate holographic representation is simpler than an adequate linguistic representation.

A holographic representation of an object or a scene is certainly not, in any simple sense, an ordinary image of that object or scene. But it is not unreasonable to think of it as a very special kind of analog representation and it is certainly quite different from the sort of propositional representations favored by the opponents of mental imagery. The basic point, however, is simply that the fact that anything can be linguistically encoded need not imply that the mind, or the brain, does in fact do so. Most objects in the natural world — human beings, animals, trees and bushes, and so forth — are not geometrically simple. It is thus worth considering the possibility that vision uses a representation scheme which is more appropriate to such objects than propositional schemes seem to be. At the very least Yevick (and Pribram) offer an alternative to both simple mental images and propositional formalisms.

A simple empirical argument, and one which has been quite influential in recent thinking, comes from experiments performed by Roger Shepard and his colleagues. Subjects would be shown a target figure and another figure similar to it but having a different orientation. They had to compare the figures and determine whether or not they were identical. What Shepard discovered is that the length of time subjects took to reach a decision was proportional to the angle through which a figure had to be rotated so that it had the same orientation as the target figure, thus allowing a direct comparison. If one figure had to be rotated 40 degrees and another had to be rotated 60 degrees, then it took one and a half times as long to reach a decision in the second case as it took in the first. Shepard interpreted this as support for the idea that subjects were mentally rotating a mental image of the figure. Mental rotation would thus be one basic operation in visual thinking. Other types of operations, such as scanning and zooming in, are suggested by the work of Stephen Kosslyn and his colleagues.
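
The logic of the experiment, though not Shepard's actual stimuli or procedure, can be sketched as incremental rotate-and-compare; this assumes numpy and an arbitrary four-point “figure” of my own devising:

    # Mental rotation as incremental rotation-and-compare: the number of
    # steps (a stand-in for reaction time) grows linearly with the angular
    # disparity between the probe and the target.
    import numpy as np

    def rotate(points, degrees):
        t = np.radians(degrees)
        R = np.array([[np.cos(t), -np.sin(t)],
                      [np.sin(t),  np.cos(t)]])
        return points @ R.T

    def steps_to_match(target, probe, step=10, max_degrees=360):
        for k in range(max_degrees // step + 1):
            if np.allclose(rotate(target, k * step), probe, atol=1e-6):
                return k
        return None

    target = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 1.0], [1.0, 2.0]])
    for disparity in (40, 60, 120):
        probe = rotate(target, disparity)
        print(disparity, steps_to_match(target, probe))   # 4, 6, 12 steps

A 60-degree disparity takes one and a half times as many steps as a 40-degree disparity, which is the shape of the result Shepard reported for reaction times.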

Operations such as rotation, scanning, and zooming might well be the primitive operations of visual thinking, with higher level processes being organized by structures such as the method of loci or by inner speech. The argument about mental imagery is about whether such operations are, in fact, primitive, or whether they are implemented in some as yet undiscovered propositional form of neural coding.

Images As Tools for Thought

Perceiving and thinking about the visual world is one thing. Creating imaginary visual worlds and thinking about them is a bit different, but in both cases we are thinking about objects deployed in space. Images, however, are also used as tools for thought in more abstract ways.

The basic idea is simple. Many very simple visual images have been used to express ideas. Each of these images can be considered a kind of “visual proverb,” a visual image which can be variously applied in different conceptual realms much as we call on ordinary proverbs. For example, consider the triangle. A linguist might use a triangle to visualize the relationship between a word, the object designated by the word, and the perceptual schema used to recognize the object. But for the Christian theologian, the triangle is likely to be used to visualize the relationship between the Father, the Son, and the Holy Ghost. These are very different conceptual realms, but the triangle has conceptual value in both of them. The triangle is thus the vehicle for a “visual proverb.” What are the visual properties which make it a useful image to think with? What about other simple geometrical forms, squares, crosses, circles, or spirals? Beyond these simple forms, what about more complex images? William Butler Yeats, the Irish poet, for example, was deeply impressed with an image which consisted of two interpenetrating cones.

These forms are, of course, common in the visual symbols used in all cultures for religious, artistic, and decorative purposes. The question we are asking now is about how these forms help us organize experience and ideas which may not be inherently visual, which may not be directly linked to perceptions of the external world. How do we think with these forms? Is this at all similar to the “more or less clear images” which Einstein asserted to be at the root of his own thought processes? Can we think of Feynman's quantum mechanical diagrams as a rigorous development of imagery of this sort? In a different vein, these visual forms are the substance of much of the apparently aimless doodling which many of us do. But is doodling aimless, or is it a form of visual thinking?

At least one place to begin looking for answers to these questions is in the development of drawing skill in children. Howard Gardner reports that, before they try to make representational drawings, children will spend a great deal of time drawing simple geometrical images — circles, squares, crosses, arrows, and so forth — and combinations of them. This activity is concerned only with the creation and elaboration of graphic forms, not with using graphic forms to represent objects and scenes. In Gardner's view the child is learning the properties of the graphic medium. Not only must the child learn how to control the movement of the drawing implement, but the child must also learn the correlations between what is seen in visual space and what is done in motor-tactile space.

It is possible that the figures drawn are those which are pleasing in both visual and motor space. For example, the circle is visually elegant — an enclosed boundary with uniform curvature — and motorically pleasing — a smooth continuous motion which ends where it started. The repertoire of simple geometrical forms might thus be a basic “vocabulary” of equivalences between visual and motor space.

These visuo-motor equivalences could then be the seeds around which the later conceptual use of graphic images can be built. The child first uses these correlations in the process of combining lines and circles and squares to represent people, houses, dogs, and so forth. Later, the adult can take these correlations into other, often abstract, conceptual realms. In fact, once the adult thinker shifts his or her thought away from the phenomenal world, what else is there to think with but words and simple inter-modal conceptual objects — visuo-motor triangles, squares, circles, crosses, etc. — which do not, in themselves, represent anything?

Select Bibliography

Anderson, John R., “Arguments Concerning Representations for Mental Imagery,” Psychological Review 85: 249-276, 1978.

Gardner, Howard, Artful Scribbles, Basic Books, New York, 1980.

Kosslyn, Stephen, Steven Pinker, George E. Smith, and Steven P. Shwartz, “On the Demystification of Mental Imagery,” The Behavioral and Brain Sciences 2: 535-583, 1979.

Neisser, Ulric, Cognition and Reality, W. H. Freeman, San Francisco, 1976.

Pribram, Karl H., Languages of the Brain, Prentice-Hall, Englewood Cliffs, New Jersey, 1971.

Pylyshyn, Zenon, “What the Mind's Eye Tells the Mind's Brain: A Critique of Mental Imagery,” Psychological Bulletin 80: 1-24, 1973.

Pylyshyn, Zenon, “Computation and Cognition: Issues in the Foundations of Cognitive Science,” The Behavioral and Brain Sciences 3: 111-169, 1980.

Shepard, Roger N., “Form, Formation, and Transformation of Internal Representations,” in Information Processing and Cognition (Robert L. Solso, ed.), Lawrence Erlbaum, Hillsdale, New Jersey, 1975, pp. 87-122.

Vygotsky, Lev Semenovich, Thought and Language, The MIT Press, Cambridge, Massachusetts, 1962.

Yevick, Miriam Lipschutz, “Holographic or Fourier Logic,” Pattern Recognition 7: 197-213, 1975.
