Interaction Design Explanations Conserved for Posterity

You are reading an older blog post. Please be aware that the information contained in it may be technologically outdated. This text may not necessarily reflect my current opinions or capabilities.

This is an English translation of a blog post that was originally published in German.

December 28th, 2011

In the summer semester of 2011, I was a student assistant for the Interaction Design course taught by Prof. Oberquelle. In that role, I answered in writing some questions that came up during exam preparation, in the (closed) CommSy room of the course. It would be a pity if these words were simply lost, so I have polished them up slightly and collected them here. Whether the contents will still be relevant in the next round of the course is, of course, not yet set in stone. Nevertheless, have fun with it!

Tangible Media

I have just looked through the MIA slides again and unfortunately found no definition there. There are examples on pp. 17 and 18 of MIA11-4.pdf. The basic idea is probably reasonably clear from the examples.

But what is meant by the term? The dictionary meaning of “tangible” (graspable, touchable) already gets you pretty close. A tangible user interface lets us control the computer by manipulating physical objects, which gives us a very high directness of interaction. Simply put, tangible media can be anything where physical objects are “connected” to digital representations, so that when we change the state of the physical objects, we change the state of the digital world. That's still relatively fuzzy, though, so I'll go into a bit more detail…

To start with, tangible media is not a particularly technical term. After all, from a technical point of view, every input device is somehow physically manipulated and does something in the computer system through its sensor data. Conceptually, tangible media belongs more in the pigeonhole of usability research, experience design, etc., and touches on things like users' mental models in addition to technology.

Devices like the mouse and keyboard are general-purpose control devices that we can use to perform many different tasks. These devices are reduced to their basic functions (keyboard: single, discrete keystrokes; mouse: movement in the 2D plane) and all other properties are abstracted away. In turn, our highly complex desktop software systems are built on these basic functions. Because of this “gap” between physical objects and usage, I would not call mouse and keyboard tangible media. This impression is reinforced by the fact that computer users learn quite quickly to “forget” the mouse and keyboard and to think only in virtual concepts.

With tangible media, on the other hand, two things are different. First, I have physical objects that are closely coupled with the digital representation, so that my interaction seems “direct” to me. By touching and moving something, for example, I change the digital world as if I had touched and/or moved it in the same place and in the same way. Second, these physical objects, together with their digital representations (usually as a unit), are central in my mental model. They are not only tools but also materials.
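The close coupling described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class names `PhysicalToken` and `DigitalObject` and the method `move_to` are hypothetical and do not come from the slides or from any real tracking library.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    """Hypothetical digital representation, e.g. a building on a map."""
    x: float = 0.0
    y: float = 0.0


class PhysicalToken:
    """Hypothetical tracked physical object lying on a sensing surface."""

    def __init__(self, bound_to: DigitalObject) -> None:
        self.x = 0.0
        self.y = 0.0
        self._bound_to = bound_to

    def move_to(self, x: float, y: float) -> None:
        # The user moves the token by hand; a tracker would report this.
        self.x, self.y = x, y
        # Close coupling: the digital object ends up in the same place,
        # with no intermediate abstraction such as a cursor or a command.
        self._bound_to.x, self._bound_to.y = x, y


building = DigitalObject()
token = PhysicalToken(bound_to=building)
token.move_to(3.0, 4.5)
print(building)
```

The point of the sketch is that the token and its digital counterpart change state together, which is what lets the user think of them as one unit.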

An exciting question in this field would be to what extent touchscreens are “tangible.” Touchscreens are nice and direct and physical, but you don't usually alter them physically, and they don't have much to do with the material view. That might be an exciting question to discuss in an exam sometime. (Disclaimer: I have nothing to do with Prof. Oberquelle's exams and have neither insight into nor influence on the questions asked there.)

I hope this helps. Wikipedia has a bit more on the topic: Wikipedia – Tangible user interface


Caves and L-Shapes

For virtual reality systems, there are the so-called “Caves.” These are special rooms in which the walls, floor and possibly also the ceiling can be used as screens, so that a person can be immersed quite deeply in a virtual world. (In most cases, stereoscopic 3D graphics are also possible.) One disadvantage of Caves is that they are quite complex and expensive, because all the projectors have to be accommodated and made to work together. Another disadvantage is that you can turn around in the virtual world without any problems, but you can't walk around.

Roughly speaking, an L-Shape is a light version of a Cave. You leave out the ceiling and three of the four walls, so you only have to worry about one wall and the floor. (Think of it spatially, and you'll see the shape of the letter “L.”) That lowers the financial cost enormously, but of course it also reduces immersion. You have to weigh that up when planning such systems. In L-Shapes, unlike in Caves, you can no longer turn around without leaving the limits of the graphic representation. Otherwise, L-Shapes and Caves are functionally quite similar.

General Info on Models

From time to time I hear quiet murmurs asking why, in the HCI field, we should bother with all these models that are somehow similar and yet different, and that still do not fully represent reality. Why do we do it?

From a scientific theory perspective, they are a way to describe reality. A big part of our work as scientists (you'll probably realize this by the time you're writing your bachelor's thesis) is not only to figure things out and explain them, but also to make things communicable that couldn't be talked about before. We observe complex relationships in the real world and want to talk about and deal with them in the scientific community. For that to happen, we first have to abstract and structure what we think we see. The PACT model is also an attempt to cast hitherto uncommunicated real-world interrelationships into a structure in which they can be understood by outsiders (i.e., including us, for the time being).

After having dealt with such models for a while, the question sometimes arises what one is supposed to have learned in the process. In fact, the PACT model is a good example of this as well: it basically tells us, as moderately to well-qualified computer science students, nothing that we didn't already know in principle. The merit of it all, though, is that we have a common basis for communication once we've been exposed to the model. We can use the same terms among ourselves, can be reasonably sure that we mean roughly the same thing, and – here's the kicker – can build on it. When studying for the ID exam, we may not yet realize how much we benefit from this, but we will once we can reference the PACT model in a final paper instead of explaining the relationships over and over again from scratch.

Leavitt's Diamond and the PACT Model

Now to Leavitt's Diamond and the PACT Model (cf. p. 7 of ID_11-1.pdf). What is this all about? The two models are so similar that I can probably say something about both at the same time.

A lot of things can go wrong when developing user-centric systems. As developers, we have to keep many things in mind at once. In the two models I mentioned, we have a series of terms (Leavitt's Diamond: human, task, organization, system; the PACT model adds technology) and double arrows connecting each (PACT: almost every) term to every other. What is this supposed to tell us? Perhaps we first consider the terms in isolation, one by one:

People: Our systems are used by people. People have different abilities, physical and mental conditions, experiences, inclinations, and tastes. When developing usable systems, we should be clear about who our target audience is.

Activity: As a rule, someone who uses software wants to achieve something through it. The times when computers had to be considered only in the context of workplace activities are over, but they are still not used without some goal (even if it is fun or coping with boredom). The design of a system depends in no small part on what tasks you want it to be able to perform.

Context: In what organizational context will the system be used? Is it a company (a concrete one?), an association, a loose community? The system must be able to be integrated into existing rule systems and should support collaboration.

System: The system itself, the component over which we have the most control.

Technology (only in the PACT model): The technological framework, input and output hardware, resources.

These are all things we must have in mind if the development of usable systems is to succeed. Now the central message of Leavitt's Diamond and the PACT model is this: All of these things are very closely related. You can't change any of them, so to speak, without affecting the rest. If we change the user group (the people), then their tasks and goals change. If we change the organizational context, we encounter different technologies. I think the originators of the PACT model primarily want to call on us not to make short-sighted changes to the system or to other areas without thinking about the consequences on a large scale.

Interaction Space, Visualization Space, Display Space

The slides for this are MIA11-4.pdf, pages 3 to 19, so almost the complete file. At the beginning, the model is presented in general, then there are slides that make the individual parts more concrete.

The Spaces model is, surprise, surprise, a model. So everything I wrote above about models applies here as well. It represents an attempt to discretize and describe something flowing and continuous.

There are four Spaces in the model: Interaction Space, Display Space, Visualization Space and Internal Model.

Interaction Space refers to the space in which the user physically interacts with the system. It does not include the user's mental processes or the internal processes of the system. Interaction Space usually includes the user's body and all control and sensory hardware. Depending on how this is constructed, it is more or less easy to assign a dimensionality to the Interaction Space. If I only have a mouse, then it is two-dimensional; a VR glove would be three-dimensional.

The Display Space is the space where the output of the system takes place. It can be a two-dimensional monitor, but it can also be a three-dimensional cave, a three-dimensional directional sound system, or other adventurous things. It is important to note that Display Space does not include what is being displayed (that will come in a moment), only the hardware on which it is displayed.

The Visualization Space is the place that is graphically represented on/by our display(s). This, too, is typically two-dimensional (desktop) or three-dimensional (3D game, VR environment); other dimensionalities are quite rare. What is striking here is that you can display a 3D visualization space on a 2D display space (with sacrifices in immersion), as well as a 2D visualization space on a 3D display space (though hardly anyone is likely to do this simply because of the cost).

Closely related to the Visualization Space is the Internal Model, which describes how the system internally represents and stores the Visualization Space. For example, apart from the spatial dimensions, the system could store other values as additional dimensions (e.g. air pressure at each point), which are not visible in the visualization at first. The difference from the Visualization Space is that the Internal Model contains everything that takes place “under the hood” of the system in the calculated world.
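To make the relationship between the four Spaces a little more concrete, here is a minimal Python sketch. It is my own illustration, not something taken from the slides: the names `Space`, `InternalModel`, `SystemSetup` and `needs_projection` are hypothetical, and reducing each space to a single dimension count is a deliberate simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Space:
    name: str
    dimensions: int  # spatial dimensionality of this space


@dataclass
class InternalModel:
    spatial_dimensions: int
    # Extra per-point values that the visualization need not show.
    extra_fields: list[str] = field(default_factory=list)


@dataclass
class SystemSetup:
    interaction: Space
    display: Space
    visualization: Space
    internal: InternalModel

    def needs_projection(self) -> bool:
        # A 3D visualization on a 2D display has to be projected,
        # which costs some immersion.
        return self.visualization.dimensions > self.display.dimensions


desktop_3d_game = SystemSetup(
    interaction=Space("mouse", 2),
    display=Space("monitor", 2),
    visualization=Space("game world", 3),
    internal=InternalModel(3, extra_fields=["air_pressure"]),
)
print(desktop_3d_game.needs_projection())  # True
```

The example setup is the 3D game on an ordinary monitor mentioned above: a 2D Interaction Space, a 2D Display Space, a 3D Visualization Space, and an Internal Model that additionally stores air pressure without showing it.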

Articulatory and Semantic Distance

These keywords are found in ID_11-2.pdf on page 9 (and, as it seems to me, only in this one place) in relation to the diagram on the same page.

One axis of the diagram is labeled “directness.” The semantic and articulatory distances are aspects that negatively affect directness. Let's grab a couple of examples from the diagram.

There, assembly languages are classified under very low directness. (Note: The directness to the mental processes of the user is referenced here, not to the functioning of the machine.) Assembly languages have a high articulatory distance, because they are relatively far away from the thinking and talking of humans. One's own thoughts have to be translated comparatively laboriously into the crude vocabulary of assembly language before they can have their effect. In contrast, a text in a technical language, which is classified in the diagram under very high directness, ideally has a one-to-one correspondence between the thoughts behind it and the words formulated. In between, there is little translation effort or “rethinking.”

The semantic distance, on the other hand, means a gap between what I want to do and the actions the system offers. For example, a virtual 3D world with a high degree of immersion probably also has a low semantic distance, because there is a strong congruence between my goal “to move forward” and the action possible in the system “to move the camera or the avatar forward.” In a text adventure, on the other hand, I have to translate my “move forward” into a textual command, type this in, and process the feedback. The application context “navigation through a world” suffers from the semantic distance to the functionality “manipulation of text commands.” I hope no text adventure fans are getting on my case for this – I'm not saying that it can't be fun.

I would like to add that I, too, have only extrapolated all this from the few keywords written on the slide. I may well have made mistakes in the process. Please don't rely on me as an ultimate authority here – especially if your interpretation differs from mine, I'm very interested in hearing about it.

