The intention of the SIIC sound design framework is to give the designer a set of tools and ideas for conceptualising an auditory display for self-driving cars. While we focus on the sound modality here, the ideas presented may equally apply to a screen-only or a multimodal display solution (that is, one that combines several types of output such as visuals, sound, vibration, you name it…).
The main goal of the framework is to solve the following User Experience (UX) issues that you may experience as a passenger in a self-driving car:
Trust. Even though you know the car is made by a respected manufacturer, it may feel scary to ride in it when no one is seated at the steering wheel. Hey, the car might not even have a steering wheel! Sounds that inform you in a comforting way about what the car will do or what it sees can make it easier to let go of that self-driving car anxiety and fully embrace the future of transportation.
Motion sickness. Ever tried to read something on your phone while someone else is driving a curvy inner-city route with a lot of harsh accelerations and decelerations? Yeah, many people get quite sick from this, even in today's cars. A self-driving car always knows in advance what it is going to do next, and by simply telling you what an upcoming manoeuvre will be like, it can actually reduce motion sickness. An auditory display may be perfect for this purpose since you don't have to look away from your phone to perceive it.
Usability. Even though a self-driving car allows you to be a passenger and not really be involved in the driving task, you still need to know how to actually use it. “Is this the robotaxi that I booked?” “Is it OK to close the door?” “How do I start it?” “The seatbelt should be fastened… of course.” “How do I change the route?” “Ah, I always forget my purse in the car!” Many issues can of course be solved with the traditional screen interface, but an auditory display can be more intuitive, convenient, comforting and human-like. It also allows people who have visual impairments or cannot read to use the car. After all, you use your ears when you interact with the driver during a normal taxi ride, right?
Aesthetics. Have you ever been annoyed by the parking sensor beeeeep? A self-driving car doesn't need your attention or actions to drive safely, so the need for urgent beeps is gone. While we think that all in-car sounds should be designed to be as pleasant as possible as long as they remain effective, aesthetics matter even more for self-driving car sounds. They should be there to gently inform you, give you comfort and guide you through the interaction in a way that feels natural, non-intrusive and smooth. They should definitely not startle or annoy you, and they should definitely not give you the sensation that you need to floor the brake pedal.
At the core of the SIIC framework is a set of sound types, or sound “layers”. We use the term layers instead of sound types because the sounds we recommend for self-driving cars are not the traditional chimes you hear from your smartphone; they should be smoother and more continuous. Also, multiple sound layers may be active at the same time, hence the name layers.
Each layer has a different purpose when it comes to informing the passenger. The layers also relate differently in time to an event (for example, a specific planned manoeuvre or something similar happening during the ride): some layers are meant to be triggered well in advance of the event, others slightly in advance of it, and others even after the event has taken place. The figure below shows how.
The Emotional Layer. Sound means feelings, and of course we want the passengers' feelings to be good ones. In general, within our framework we think it is important to design all sounds to evoke positive and calming emotional responses. But the emotional layer also covers sounds used for no other reason than to create a good atmosphere in the car. An example could be a calming, ambient welcome sound played when the passenger sits down in the car, to emphasise that they are about to experience a reliable, high-quality service, or a similar goodbye sound when the destination is reached and the ride is over.
The Requested User Response Layer. The self-driving car is supposed to do all the driving, so why do I need these sounds? Well, we think there are still some things that you as a passenger need to do yourself in the era of intelligent cars. For example, you will need to step into the car, you may need to close the door, you probably need to put on your seat belt and you may need to press a button to get the ride going. Yeah, we know it's a shame that not everything can be automated, but some encouraging and guiding “requested user response” sounds may make these tasks a little more enjoyable.
The Strategic Layer. This layer informs the passenger well in advance of an event (on the order of minutes to several seconds). It could, for example, be information about the planned route, how far away your destination is (“Are we there yet?”) or that the upcoming route segment contains some particularly motion-sickness-inducing turns (meaning that you should probably stop reading soon…). You may find it difficult to design a sound-only solution for this layer, so text (or speech) may be needed to convey the message properly.
The Perception Layer. This layer tells the passenger what the car perceives that may be of specific interest to them. Knowing that the car sees a person jaywalking while deeply engaged in the latest YouTube clips may be comforting. It may also give you a hint of what the car will do next (that is, drive very carefully, slow down and perhaps even signal to the person that the car is approaching, although we all know that it can be very difficult to communicate with people watching funny cat videos).
In many cases, Perception layer sounds will be followed by Intention layer sounds (see below), and the two together will make perfect sense.
The Intention Layer. This layer gives you information about what the car will do within the next 1-2 seconds or so. Basically, it says what the car's next manoeuvre will be like: whether it's slowing down, speeding up, turning slightly left or hard right, or something else. If you're even just a bit of a control freak and want to know what the car is up to, this may be the layer for you. If you are constantly looking at a screen (who isn't?) and often get carsick, intention sounds can help you feel more comfortable.
The Current Action Layer. Sometimes you may get the feeling that you simply don't understand what's happening or what just happened. Why is the car stopping? Did we just swerve to avoid a garden gnome standing on top of a cargo box, or was I hallucinating? The information presented through the Current Action layer can give you that comforting sense of being on top of all the strange things happening in daily traffic, even if you're not doing the driving. Since this layer may contain complex information, we think the usual sound design toolbox could be too limited to create a compelling design. If you want to design for this layer, consider using primarily text or even spoken content.
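To make the timing relationships of the layers more concrete, here is a minimal sketch that gives each event-related layer a rough trigger window relative to the event. The window values are our own illustrative assumptions, loosely based on the descriptions above: minutes to several seconds ahead for the Strategic layer, about 1-2 seconds ahead for the Intention layer, and during or shortly after the event for the Current Action layer. The names `Layer`, `TRIGGER_WINDOWS` and `layers_due` are hypothetical and not part of the SIIC framework itself.

```python
from enum import Enum

class Layer(Enum):
    STRATEGIC = "strategic"
    INTENTION = "intention"
    CURRENT_ACTION = "current_action"

# Hypothetical trigger windows in seconds relative to the event time
# (negative = before the event). The values are illustrative assumptions.
TRIGGER_WINDOWS = {
    Layer.STRATEGIC: (-300.0, -5.0),    # minutes to several seconds ahead
    Layer.INTENTION: (-2.0, -1.0),      # roughly 1-2 seconds ahead
    Layer.CURRENT_ACTION: (0.0, 5.0),   # during or shortly after the event
}

def layers_due(event_time: float, now: float) -> list[Layer]:
    """Return the layers whose trigger window contains the current time."""
    offset = now - event_time  # negative while the event is still ahead
    return [layer for layer, (start, end) in TRIGGER_WINDOWS.items()
            if start <= offset <= end]
```

For a planned turn at t = 100 s, `layers_due(100.0, 40.0)` returns only the Strategic layer, `layers_due(100.0, 98.5)` returns only the Intention layer, and `layers_due(100.0, 101.0)` returns only the Current Action layer. Perception, Requested User Response and Emotional cues are left out here. We assume they are driven by detections and ride phases rather than by a fixed lead time.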
Selection and design of layers
So many different sounds: how do I (as a designer) separate them, and how do I make the passenger understand what's what? We should perhaps start by saying that the SIIC framework is merely a toolbox and that you probably shouldn't use all the tools at once. What you should use depends on what type of passenger you have in the car and what the use case is like. A first-time, control-freak rider who is skeptical of new technology may want more support than the regular self-driving commuter. A person who needs to work but easily gets carsick may mainly want the Intention layer.
But regardless of how you handle personalisation and adaptation to different use cases, we think it is wise to use different types of sounds for different layers so that the passenger can easily connect a sound to a specific type of information. A suggestion on how to use different sound types for different layers is shown in the picture on the side. In some cases, it may be good to use a traditional, well-established type of chime (like “put on your seatbelt”; ever heard that sound before?). In other cases, we think it is better to use a more continuous sound that is more easily linked to what's actually going on (for example, when the car is about to slow down, you use a sound of a car slowing down: simple but effective, our studies say!).
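The following minimal sketch illustrates a continuous cue that is linked to what's actually going on. It turns a planned speed profile into a tone whose pitch falls as the car is about to slow down. The linear speed-to-pitch mapping, its constants and the function names are assumptions for illustration only. A real design would more likely shape a recorded or synthesized vehicle sound than a plain sine tone.

```python
import math

def speed_to_pitch(speed_mps: float, base_hz: float = 110.0,
                   hz_per_mps: float = 8.0) -> float:
    """Map a planned speed to a tone frequency (assumed linear mapping)."""
    return base_hz + hz_per_mps * max(speed_mps, 0.0)

def render_cue(planned_speeds: list[float], frame_s: float = 0.1,
               sample_rate: int = 8000) -> list[float]:
    """Synthesize a tone whose pitch follows a planned speed profile.

    planned_speeds holds one speed (m/s) per frame of length frame_s.
    Returns mono samples in the range [-0.5, 0.5].
    """
    samples_per_frame = round(frame_s * sample_rate)
    samples: list[float] = []
    phase = 0.0
    for speed in planned_speeds:
        step = 2.0 * math.pi * speed_to_pitch(speed) / sample_rate
        for _ in range(samples_per_frame):
            samples.append(0.5 * math.sin(phase))
            # Carry the phase across frames so pitch changes don't click.
            phase = (phase + step) % (2.0 * math.pi)
    return samples

# A planned slow-down from 14 m/s to 6 m/s, previewed as a falling tone.
cue = render_cue([14.0, 12.0, 10.0, 8.0, 6.0])
```

Played slightly ahead of the manoeuvre, the falling pitch previews the deceleration. This sketch does not settle whether pitch is the right parameter to vary, as opposed to loudness or the spectral content of an engine-like sound.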
Spatial separation of layers
Another thing that can help the passenger connect the different layers to specific events is to spatialize the layers differently. Humans are quite good at hearing where a sound comes from, both in terms of its direction and its distance. Our spatial hearing lets us separate simultaneous sound sources quite well and focus on what we find interesting (imagine shifting your attention between different people talking at a cocktail party). In the self-driving car, you will want sounds that represent something happening outside the car to be heard as if they come from the point where that something is happening. In the framework, we suggest that Perception sounds, which represent something that could be at any position outside the car, should be spatialized in this way, using any of the fancy 3D audio techniques out there. Intention sounds, on the other hand, represent something that the car does and should therefore be perceived as coming from the car. Current Action and Strategic sounds could relate to a variety of messages, and their perceived location could either be at some fixed point or vary depending on the message. In contrast, Requested User Response sounds always relate directly to the passenger and should therefore be perceived as coming from somewhere close to the passenger. Sounds specific to the Emotional layer, such as welcome and goodbye sounds, should be perceived as ambient, soothing and diffuse in terms of direction, so any traditional stereo widening effect or technique applied to the sound should do the trick.
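The layer-to-location mapping described above can be sketched as a small lookup plus a panning function. This is a minimal illustration under several assumptions:
- Object positions are given in a car-centred frame (metres forward, metres to the right).
- The passenger's seat is given as an azimuth in the cabin's speaker layout.
- Plain constant-power stereo panning stands in for the "fancy 3D audio techniques". It cannot place sounds behind the listener, so azimuths are clamped to the front half-plane.

The layer keys and function names are hypothetical. A production system would more likely use binaural (HRTF-based) or multi-speaker rendering.

```python
import math
from typing import Optional, Tuple

# Hypothetical mapping from layer to spatial strategy, following the text above.
SPATIAL_MODE = {
    "perception": "object",                  # at the perceived object's position
    "intention": "vehicle",                  # from the car itself (assumed front centre)
    "strategic": "fixed",                    # some fixed point (assumed front centre)
    "current_action": "fixed",
    "requested_user_response": "passenger",  # close to the passenger
    "emotional": "diffuse",                  # ambient; widened by a stereo effect
}

def azimuth_deg(forward_m: float, right_m: float) -> float:
    """Angle of a point relative to the car's heading: 0 = ahead, +90 = right."""
    return math.degrees(math.atan2(right_m, forward_m))

def constant_power_pan(az_deg: float) -> Tuple[float, float]:
    """Left/right gains whose squares sum to 1; azimuth clamped to [-90, 90]."""
    az = max(-90.0, min(90.0, az_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(theta), math.sin(theta)

def place(layer: str, obj: Optional[Tuple[float, float]] = None,
          passenger_az_deg: float = 0.0) -> Tuple[str, Tuple[float, float]]:
    """Return the spatial mode and stereo gains for a cue on the given layer."""
    mode = SPATIAL_MODE[layer]
    if mode == "object":
        if obj is None:
            raise ValueError("perception cues need an object position")
        return mode, constant_power_pan(azimuth_deg(*obj))
    if mode == "passenger":
        return mode, constant_power_pan(passenger_az_deg)
    # "vehicle", "fixed" and "diffuse" all start from the centre in this sketch;
    # a "diffuse" cue would additionally get a stereo-widening effect downstream.
    return mode, constant_power_pan(0.0)
```

Consider a pedestrian 10 m ahead and 10 m to the right. `place("perception", obj=(10.0, 10.0))` pans the cue to 45° right, so the right gain is larger than the left and the squared gains sum to one. Panning at constant power keeps the perceived loudness roughly steady as a tracked object moves across the stereo field.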