Visual language? What even is that? Visual Language Theory and motion in comics

Visual Language Theory posits that sequential images in comics, like spoken and sign languages, follow combinatorial rules to convey meaning. The author's research focuses on how motion and time are depicted in static images, using techniques such as postures and motion lines. The study highlights cultural differences in visual narratives and suggests that spoken languages may influence how motion is depicted in comics. This challenges the traditional view of language as primarily speech-oriented, emphasizing its multimodal nature.

Irmak Hacımusaoğlu

03 Oct, 2023

The relationship between language and cognition, especially the effect of language on thought, is a widely studied topic (Ünal, 2020). When it comes to language, we usually think of spoken languages first, and then of sign languages. But what if sequential images also convey meaning by following certain combinatorial rules or principles? The ‘similarity’ between images and the meanings they refer to (iconic meaning-making; Peirce, 1902) is perhaps the biggest obstacle to thinking of images as a language in their own right. However, visual expression also contains many symbolic ways of making meaning. A drawn heart shape, for example, has nothing to do with the heart as an organ, and this symbolic meaning has to be learned. In fact, there are findings indicating that people who are not exposed to such visual systems as part of their culture have difficulty understanding what is expressed in pictures (Kennedy & Ross, 1975; Cohn, 2020). In this article, I explain Visual Language Theory (Cohn, 2013), which has opened up a new field of research. I then discuss motion in static images, which I investigate in my doctoral research based on this theory.

We can briefly describe the phenomenon we call language as sounds or signs that are produced through an expressive channel, or modality, and governed by combinatorial rules that map them onto meanings in our mind (Jackendoff, 2002). Humans have three such modalities: verbal-auditory, visual-bodily, and visual-graphic. Sounds are produced through the verbal-auditory modality. Sequencing sounds according to combinatorial principles and mapping them onto meanings stored in the mind yields spoken languages. Signs produced through the visual-bodily modality are likewise governed by rules and combined to form sign languages (Cohn, 2020). Ray Jackendoff's Parallel Architecture (2002) proposes that the three components of this modality-grammar-meaning triad are independent of one another, yet interact and operate in parallel. Neil Cohn then asks where drawings fit into this picture, since they are produced through the third, visual-graphic modality. What if drawings and sequential images produced through the visual-graphic modality are also combined by similar combinatorial rules and create meaning in this way?

Before moving on to this theory, I should mention that important comic artists and theorists such as Scott McCloud have written about the idea of a visual language. In his book Understanding Comics, McCloud (1993) discusses the language of comics. He also offers inspiring observations about cross-cultural differences in visual narratives. For example, characters change frequently between panels in Japanese manga. This is much less the case in American or European comics, where action-related changes are more common (McCloud, 1993).

What we should underline here is that comics, as a medium, are not a language in themselves. Comics and other visual narratives are instead created using a visual language. Just as we would not say that novels are a language, comics are not a visual language either. They are sociocultural products drawn using a visual language (Cohn, 2013). So what is visual language?

Visual Language Theory (Cohn, 2013) is rooted in the intersection of linguistics and cognitive science. It emerged from the idea that the sequential images found in visual narratives such as comics operate on cognitive principles similar to those of spoken and sign languages. Sound units are combined through grammatical rules to form meaningful expressions. In the same way, the lines and shapes in an image, and the images in a comic strip, are combined by combinatorial rules.

For instance, in Turkish we derive new meaningful words by adding suffixes to existing words: 'yapamayıverecekmişçesine' (as if she/he will not be able to do (something) fast). The verb 'yapmak' (to do) has a meaning by itself, but suffixes such as '-ecek' (will) only make sense when attached to a root. The same principle applies to visuals. A drawn facial expression carries meaning by itself. By contrast, two curved lines at the edge of that face convey ‘trembling’ only when they are attached to the face, which serves as the root in this case (Cohn, 2018). Of course, this holds only if the meaning of these lines (what they refer to) is already stored in your mind.

Beyond morphological rules at the level of single units, there are also rules for sequencing multiple images. Consider grammatical categories such as nouns, adjectives, adverbs, and pronouns. In an ordinary English or Turkish sentence, adjectives precede nouns, and disrupting this order also disrupts the conveyed meaning. Visual Language Theory claims that juxtaposed images likewise have categories and a particular order (Cohn, 2013). Cohn's behavioral and neurocognitive studies also indicate that randomly ordered images are viewed for longer than images in an expected order. In fact, the brain responses elicited by comparable manipulations in verbal language are also elicited in this case (Cohn, 2020).

Moreover, the brains of individuals who were exposed to particular drawing systems responded differently from those of individuals who were not. For instance, researchers manipulated a complex grammatical structure commonly used in Japanese manga. The brain waves of people who read manga while growing up differed from those of people who had not been exposed to this visual language system (Cohn & Kutas, 2017). This parallels the difficulty people have in understanding a language they have never been exposed to.

According to this theory, ‘I have no drawing ability’ is a misleading claim. It rather reflects a visual language system that one has not yet been exposed to and learned. Children tend to produce broadly similar drawings, for example a river running down between mountains, a sun, and a chimney releasing smoke. Those who stop practicing drawing, on the other hand, later declare that they have no talent for it. However, just as in every language acquisition process, there is a critical period for visual language development (Cohn, 2012). In this sense, Visual Language Theory formalizes and scientifically tests some of the phenomena described by famous comic artists and theorists such as McCloud.

Is there one common visual language in the world? Neil Cohn's TINTIN Project focuses on this very question. Earlier studies have shown that Japanese manga and American superhero comics are based on different visual systems (McCloud, 1993; Cohn, 2020). Cohn aims to extend this work by examining comics and graphic novels collected from all over the world. As this is a relatively new field, there are many research opportunities on this topic.

My doctoral research, which is part of the TINTIN Project, focuses on motion and time in comics. Briefly, I investigate how dynamic phenomena such as motion and time can be transcribed into a static, two-dimensional medium. I also investigate how the mind understands them there. In this article, I focus especially on motion in comics.

First of all, how can we express motion in an image that cannot actually move? Perhaps the most typical method is to draw the postures that figures take during an action (each pose in Fig. 1a), such as postures indicating running, walking, or jumping. This method has an effect similar to seeing a still frame from a video. We do not see the figure moving, but we see the pose it would take at the beginning, in the middle, or at the end of the motion. From this static image, our mind can infer the figure's upcoming position (Kawabe & Miura, 2006).

Another method is to add two or more lines (motion lines) trailing behind the figure or object (Fig. 1b). These lines indicate where the motion started and where the object has travelled. In other words, we see not just a snapshot of a single moment but also where the moving object was before. These cues indicate past moments that are not directly depicted in the image.

In other cases, several parallel lines are drawn in the background behind the mover (Fig. 1c). This creates an effect similar to looking out of a moving vehicle, when the buildings we pass appear as streaks. Such lines may also cover part of the object or figure (Fig. 1d). These last two methods also convey the speed of the motion.

Yet another method is to depict all or part of a moving figure or object multiple times. In Fig. 1e, I drew three hands for the character. This does not mean that the character has three hands. Instead, it shows that the character is waving. Each hand corresponds to a different moment in time. Thus, in a static medium, we can witness more than one moment at once.


Figure 1. a) Postural cues, b) Motion lines, c) Background lines that create the perception that we are moving at the same speed as the moving object, d) Lines that cover part of a figure or object, e) The method of repetition.

Such lines take on different meanings depending on the visual system in which they are drawn. As an example, let us compare comics to instruction manuals, such as IKEA furniture assembly guides or safety cards on planes. In such manuals, arrows serve as the semantic opposite of the motion lines used in comics. Rather than indicating a past motion, they visualize instructions for future actions.

Cultural differences also appear in how motion is depicted in visual narratives (Hacımusaoğlu & Cohn, 2022). For instance, consider the background lines mentioned above, which give readers the sense of moving at the same speed as the moving object. These are especially common in Japanese manga (McCloud, 1993). We can then ask: do the languages spoken by comic authors affect how they depict motion events visually?

There are indeed findings indicating that different modalities affect each other. For example, in the Aymara language, the space behind a person refers to the future, whereas the space in front refers to the past. Time is encoded this way because the past is known, while the future is yet to be discovered. Studies show that Aymara speakers gesture toward the space behind them when they talk about the future (Núñez & Sweetser, 2006). With this in mind, in the rest of this article I describe the first study of my doctoral research. This study asked whether there is a similar interaction between visual and spoken languages.

First, let’s examine how motion is expressed in different spoken languages. Talmy's (1985) classic typology describes verb-framed languages such as Japanese, Turkish, or French. In these languages, the main verb itself indicates the direction (path) of motion, while expressing the manner of motion usually requires additional grammatical elements. For example, the Turkish verb "inmek" by itself means 'to go down', with no need for a preposition or adverb to specify the direction. Yet we need an adverb to say whether someone goes down by walking or running. Manner has to be described with extra elements such as “Koşarak” (by running) in “Koşarak indim.” (I went down by running.). As a result, speakers tend not to express manner unless it is necessary (Berman & Slobin, 1994).

In satellite-framed languages such as English, German, or Dutch, by contrast, the main verb itself encodes the manner of motion. The path is expressed in satellites such as "out" in "I ran out". Verb phrases in these languages thus encode both manner and path, and their speakers tend to express motion more saliently (Slobin, 2003). Some findings also indicate that these typological differences are reflected in speakers' hand gestures (Özyürek et al., 2005).

However, there is little research on motion events in visual narratives. Tversky & Chow (2017) found that comics created in satellite-framed languages are rated as more action-oriented than those created in verb-framed languages. They did not, however, directly examine how motion events themselves are depicted in the comics. Building on this finding, we examined motion events in 85 comics collected from North America, Europe, and Asia (Visual Language Research Corpus; Cohn et al., 2023). We asked whether the way motion is expressed in comics differs between the two language groups (Hacımusaoğlu & Cohn, 2022).

Our findings suggest a potential relationship between spoken languages and their visual counterparts. First, we found that comics created by speakers of satellite-framed languages contain more motion cues than those from verb-framed languages. Given the salience of motion in the structure of satellite-framed languages, this result was in line with our expectations.

Looking at specific cues, postural motion cues (Fig. 1a) were used similarly in both groups. Motion lines (Fig. 1b), on the other hand, were more frequent in comics from satellite-framed languages. As mentioned, satellite-framed languages focus on the manner of motion. Motion lines can also modify the manner of motion by being drawn in varying shapes (Fig. 2), and we think this might be why lines are used more in comics from this language group.

Furthermore, as in studies of spoken language (Talmy, 1985), we analyzed motion events in terms of their starting points (source), midpoints (route), and endpoints (goal) (Fig. 2). As expected, midpoints were depicted more frequently in comics from satellite-framed languages, whose main verbs encode the manner of motion. We again relate this finding to motion lines, as lines drawn behind a moving object indicate the route by default (Fig. 2). Because motion lines also depict manner, we argued that this may be why they appear more often in comics by speakers of satellite-framed languages.

When we examined individual languages within the two groups, we also observed some differences beyond what their typological classification would predict. French and German belong to different typological groups in terms of how they encode motion. Even so, comics from these two languages segmented scenes and events in similar ways. We think this might reflect a shared visual language system (such as a European Visual Language), in addition to the influence of spoken languages.


Figure 2. The beginning of the motion (source), the midpoint where the manner of motion is usually shown (route), and the endpoint of the motion (goal).

In conclusion, our research supported two hypotheses. First, motion can be expressed in different ways not only in spoken languages but also in visual languages. Second, these differences may reflect an interaction between spoken and visual languages. It is therefore important to incorporate drawings, which have been part of human communication since ancient times, into studies of language. We should also study the multimodal nature of language (Cohn & Schilperoord, 2022), challenging the commonly presumed unimodal, speech-oriented view of language.

Moreover, visual narratives operate with their own principles and are not as ‘transparent’ or easily comprehensible as once assumed (Coderre, 2020). This should lead researchers to question the visual stimuli widely used in cognitive experiments, and the findings based on them. Within Cohn's TINTIN Project, almost 1,000 visual narratives are currently waiting to be analyzed. It remains an open question whether we will find similar results when we extend the work on motion described here to more languages within this project.

Research funding: This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement №850975)

References:

Berman, R. A., & Slobin, D. I. (1994). Relating events in narrative: A crosslinguistic developmental study. Hillsdale, NJ: L. Erlbaum.

Coderre, E. L. (2020). Dismantling the “Visual Ease Assumption:” A review of visual narrative processing in clinical populations. Topics in Cognitive Science, 12(1), 224–255.

Cohn, N. (2012). Explaining ‘I Can’t Draw’: Parallels between the Structure and Development of Language and Drawing. Human Development, 55(4), 167–192.

Cohn, N. (2013). The visual language of comics: Introduction to the structure and cognition of sequential images (Bloomsbury Advances in Semiotics). Bloomsbury Academic.

Cohn, N. (2018). Combinatorial morphology in visual languages. In G. Booij (Ed.), The Construction of Words (Vol. 4, pp. 175–199). Springer International Publishing.

Cohn, N. (2020). Who understands comics?: Questioning the universality of visual language comprehension. London: Bloomsbury Academic.

Cohn, N., Cardoso, B., Klomberg, B., & Hacımusaoğlu, I. (2023). The Visual Language Research Corpus (VLRC): An annotated corpus of comics from Asia, Europe, and the United States. Language Resources and Evaluation.

Cohn, N., & Kutas, M. (2017). What is your neural function, visual narrative conjunction? Grammar, meaning, and fluency in sequential image processing. Cognitive Research: Principles and Implications, 2(1), 27.

Cohn, N., & Schilperoord, J. (2022). Reimagining language. Cognitive Science, 46(7), e13164.

Hacımusaoğlu, I., & Cohn, N. (2022). Linguistic typology of motion events in visual narratives. Cognitive Semiotics, 15(2), 197–222.

Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.

Kawabe, T., & Miura, K. (2006). Representation of dynamic events triggered by motion lines and static human postures. Experimental Brain Research, 175(2), 372–375.

Kennedy, J. M., & Ross, A. (1975). Outline picture perception by the Songe of Papua. Perception, 4, 391–406.

McCloud, S. (1993). Understanding comics: The Invisible Art. New York, NY: Harper Collins.

Núñez, R. E., & Sweetser, E. (2006). With the future behind them: Convergent evidence from Aymara language and gesture in the crosslinguistic comparison of spatial construals of time. Cognitive Science, 30(3), 401–450.

Özyürek, A., Kita, S., Allen, S., Furman, R., & Brown, A. (2005). How does linguistic framing of events influence co-speech gestures?: Insights from crosslinguistic variations and similarities. Gesture, 5(1–2), 219–240.

Peirce, C. S. (1902). Logic as Semiotic: The Theory of Signs. In C. S. Peirce (Ed.), Philosophical Writings. Dover Publications.

Slobin, D. I. (2003). Language and thought online: Cognitive consequences of linguistic relativity. In D. Gentner & S. Goldin-Meadow (Eds.), Language in mind: Advances in the study of language and thought (pp. 157–191). Cambridge, MA: MIT Press.

Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description, Vol. III: Grammatical categories and the lexicon (pp. 57–149). Cambridge: Cambridge University Press.

Tversky, B., & Chow, T. (2017). Language and culture in visual narratives. Cognitive Semiotics, 10(2).

Ünal, E. (2020, May 13). Dil ve Düşünce Etkileşimleri [Interactions between language and thought]. CogIST.

Irmak Hacımusaoğlu

Irmak Hacımusaoğlu completed her bachelor's degree in psychology at Koç University and her research master's degree in cognitive neuropsychology at Vrije Universiteit Amsterdam. She is a doctoral researcher in the Visual Language Lab at Tilburg University. Her research investigates language and cognition by focusing on motion and time in comics. Her drawings and articles have also been published in Aposto! News.




The Cognizer is a publishing platform initiated by CogIST, a cognitive science community from Turkey. The platform publishes articles and essays on topics from across the fields of cognitive science, written to bridge the gap between the general public and experts.

Copyright © 2023