
Exploring gesture based interaction and visualizations for supporting collaboration

Bachelor Thesis

Author: Andreas Simonsson Huck
Supervisors: Bahtijar Vogel & Oskar Pettersson

Semester: Spring 2011
Level: C

Course code: 2ME10E


Abstract

This thesis introduces the concept of collaboratively using freehand gestures to interact with visualizations. Working with data and visualizations together with others can be problematic in a traditional desktop setting because of the limited screen size and the single user input device. This thesis therefore suggests a solution that integrates computer vision and gestures with interactive visualizations. The integration resulted in a prototype in which multiple users can interact with the same visualizations simultaneously. The prototype was evaluated and tested with ten potential users. The results from the tests show that gestures have the potential to support collaboration while working with interactive visualizations, and they also indicate which components are needed to enable gestural interaction with visualizations.

Keywords

Gesture interaction, Vision based interaction, Interactive visualizations, Computer vision, Collaborative interaction, Microsoft Kinect


Contents

1 Introduction
1.1 Problem definition
1.2 Purpose and goal
1.3 Limitations
1.4 Disposition
2 Methods
2.1 Prototyping
2.2 User tests
2.2.1 Usability test
3 Theory
3.1 Interactive technologies
3.1.1 Technological
3.1.2 Usability
3.2 Data Visualization
3.3 Collaboration using interactive technologies
3.4 Summary and features definition
3.4.1 Interactive Technologies
3.4.2 Visualizations
3.4.3 Collaboration
4 Design and development
4.1 Requirements
4.2 Initial technological evaluation
4.3 Development
4.3.1 Design & implementation
5 User tests
5.1 Users and settings
5.2 Tasks
5.3 Questionnaires
5.4 Data analysis and results
5.4.1 Calibration
5.4.2 Click gesture
5.4.3 Swipe gesture
5.4.4 Swim gesture
5.4.5 Collaboration
5.4.6 Summarizing impressions
6 Conclusion
6.1 Discussion
6.2 Future work
6.3 Reflection
References
Appendix A
Appendix B

Table of figures
Figure 2.1 Prototyping (CMS 2008)
Figure 3.1 OpenNI Framework (OpenNI User Guide, 2011)
Figure 4.1 The gestures
Figure 4.2 Basic interaction
Figure 4.3 Google Maps
Figure 4.4 Google Maps combined with charts and pictures
Figure 4.5 Live demo
Figure 4.6 System Components Overview
Figure 4.7 Software overview
Figure 5.1 User tests
Figure 5.2 Questionnaires/tasks results
Figure 5.3 Calibration
Figure 5.4 Click gesture
Figure 5.5 Swipe gesture
Figure 5.6 Swim gesture
Figure 5.7 Collaboration

List of Tables
Table 5.1 The Tasks


1 Introduction

Traditionally, human-computer interaction has relied on physical input devices such as the mouse and keyboard. With the evolution of ubiquitous computing and touch technologies, new types of interaction technologies have come into wider use and new interfaces have been developed (Kela et al. 2005). Interaction technologies such as computer vision and speech enable richer interaction with a computer (Hardenberg et al. 2001). However, these technologies have rarely been widely adopted, due to cost and complexity (Wang et al. 2009). This is often the case when interacting with large displays, where traditional input devices tend to keep the user stationary while interacting with the content. These devices also restrict the interaction to one user at a time. To enable multiple users to interact with the same display, direct multi-touch approaches have been suggested (Isenberg et al. 2007); these have had some success on small and medium displays, where the user can physically reach all parts of the display. In order to create a more natural and less restricted interaction, research into computer vision as an input device has suggested gesture-based approaches (Nielsen et al. 2004).

Recent advances in computer vision technology, such as the Microsoft Kinect (Microsoft, 2011), have enabled tracking of multiple users without any external input devices. This makes it possible to interact with a computer without being restricted to, for example, a desktop mouse or keyboard. Because a computer vision based system allows the user to interact from a distance, it also opens up the possibility of using larger displays. The use of large displays is beneficial when, for example, viewing visualized data, because the user can make use of a wider field of vision. It also allows multiple users to view and interact with the data (Zudilova-Seinstra et al. 2009).

The large amount of data available today often has to be interpreted to be comprehensible. One way this interpretation can be done is by creating visualizations. "Visualization technologies empower users to perceive important patterns in a large amount of data, identify areas that need further scrutiny and make sophisticated decisions" (Zudilova-Seinstra et al. 2009:3). The authors also emphasize that the visualizations have to be interactive, and that how the interaction is done affects the user's understanding of the data. This is also the case in visual analytics, which Keim et al. define as follows: "[v]isual analytics combines automated analysis techniques with interactive visualizations for an effective understanding, reasoning and decision making on the basis of very large and complex data sets" (2008:157). Thomas et al. (2005) suggest that visual analytics is a dialogue between the user and the visualizations, where the user asks the system for new views of the data through interaction. Visualizations are often used by people working in groups to more easily communicate and interpret the information (Isenberg et al. 2007).

Collaboration occurs when people are working with information together and sharing a common goal. Having people with different skills work with and interpret the information being presented can enhance the understanding (Isenberg et al. 2010). However, most visualization systems only allow one user to interact with the content at a time, and this is often done on a small screen using traditional input devices such as mouse and keyboard (Isenberg et al. 2007). Isenberg et al. stated that "Attempting to collaborate under these conditions can be awkward and unnatural" (2007:1232). Collaborative interaction with visualizations has often been achieved through networked visualizations when people are physically distributed (Zudilova-Seinstra et al. 2009).

The purpose of this thesis is to integrate interactive technologies with visualizations to support collaboration. Integrating visualizations with multi-user computer vision technology will enable collaboration between co-located users in the same workspace. The recent advances in computer vision, the large amount of data that needs to be displayed in an interactive manner, and the benefits of collaboration together motivate this work.

1.1 Problem definition

While collaborating with other co-located users around interactive visualizations of data, the interaction is restricted to one user at a time when using traditional desktop computers. The relatively small display of a desktop computer also makes it more cumbersome for several users to view the visualizations (Isenberg et al. 2007). Using a larger display makes it possible for multiple users to interact with the visualizations (Zudilova-Seinstra et al. 2009). However, simultaneous multi-user interaction requires a system that supports multiple user inputs.

With advances in computer vision techniques, such as the Microsoft Kinect which is able to recognize multiple users and gestures (Leyvand et al. 2011), this could be achieved. Using freehand gestures removes the burden of having to physically touch the content on a large screen, and it also supports natural interaction (Bellucci et al. 2010). Given the need for easier means of multi-user interaction with visualized data on a large screen, the main research question of this thesis is formulated as follows.

• How could the integration of interactive technologies and visualizations support collaboration using gesture based controls utilizing large displays?

In order to tackle the main question, the following sub-questions are formulated as well.

• What are the components of an interactive gesture based visualization system?

• Compared to traditional means of interacting with visualization systems, what are the potentials of utilizing gesture based interaction?

These questions are answered by the development and evaluation of a prototype based on existing research in the fields of computer vision, visualizations and collaborative interaction.


1.2 Purpose and goal

As stated in the introduction, a vast amount of data is available and being used in different kinds of visualizations. The interaction with these visualizations is often done in a traditional setting with a desktop computer. Considering the benefits of using large displays while working with visualizations of data, combined with the less restricted interaction of a computer vision and user recognition system, the purpose of this thesis is to design and develop a computer vision-based interaction system to support collaboration while working with interactive visualizations of data. By integrating existing visualizations of data with a computer vision system that is able to track multiple users, this system supports collaboration between co-located users.

In order to answer the research questions stated in the previous section, the components of an interactive gesture based visualization system have to be identified. The work includes investigating potential technologies for computer vision and user gesture recognition, followed by the development of a prototype. The system is developed to support collaboration between two co-located users while viewing visualized data in the form of a map with points of interest, charts and pictures. The prototype is then tested by ten users to assess its usability and to see if it can support collaboration.

1.3 Limitations

Both the field of human computer interaction and the field of visualization technologies are broad, and therefore a set of limitations is required. This thesis will focus on the interactive technologies to support collaboration, and thus does not aim to create new visualizations or to evaluate the collaboration between the users.

• The software efforts done as a part of this thesis were developed only to test the gestural functionality and ideas.

• This thesis does not aim to create new methods of visualization but instead makes use of existing efforts.

• This thesis does not evaluate the level of collaboration between the users; it merely aims to support it by enabling gestures for multiple users.

1.4 Disposition

The second chapter of this thesis will introduce the methodical approaches that are used to explore the problem definition and answer the research questions.

The third chapter will review current research in the fields of computer vision, visualizations and collaborative interaction. Based on the literature review a number of features concerning the different fields of research were identified.

The fourth chapter will describe the development of a prototype based on the features identified in the literature review.

The fifth chapter will present the assessment of the prototype based on user tests.

The sixth and last chapter will present the results and a discussion of the conclusions from the previous chapters. In this chapter the research questions will be answered and discussed. There will also be a discussion of future research that could be conducted to further explore the area of this thesis.


2 Methods

This chapter will discuss and argue for the methods used to answer the research questions. The methods guide and structure the work in order to make it a scientific contribution. The research consists of two main parts: prototyping and user testing. The prototype was developed to investigate potential technologies that could be used for gestural interaction and how they could support collaboration. The prototype was tested with ten users in order to evaluate the system and the concept. The goal of using these methods combined was to answer the research questions stated in chapter one by analyzing the data gathered from the tests.

2.1 Prototyping

A prototype is, according to Sharp et al., "[…] a limited representation of a design that allows users to interact with it and to explore its suitability" (2007:530). The purpose of developing a prototype was to be able to test it with users in order to identify usability issues and assess the general attitude towards such a system. Sharp et al. state that a prototype can be used for different purposes, for example to "[…] test out technical feasibility [and to] do some user testing and evaluation" (2007:531). These statements cohere well with the choice of prototyping as a method. Because this work makes use of a fairly new technology in a new context, a prototyping method was deemed most suitable. By developing a prototype, the focus could be on evaluating the concept and assessing the usability rather than developing flawless code. The fast iterations of prototyping were another important reason for choosing it as a development method, due to the limited time frame of this project.

The prototype was developed as a high fidelity prototype, which makes it fully functional. This makes it possible to use it for exploration and tests (Sharp et al. 2007). Cooper et al. argue that a product has to be fairly complete in order to conduct user tests with good results (2007). Another aspect which argued for the use of a high fidelity prototype was the relative completeness of the components which were to be implemented. Another reason was to be able to test gestural interaction; this would be difficult with a low fidelity prototype, such as a paper prototype, and would not add much to the development. Therefore, to be able to test it with good results, a fully functional prototype was deemed the most appropriate approach.

To guide the prototyping efforts an iterative prototyping model was followed (CMS 2008). As can be seen in Figure 2.1 it consists of a number of steps which are discussed below.


Figure 2.1 Prototyping (CMS 2008)

• Initial investigation

Before any development could begin, an initial investigation had to be done. The initial investigation consisted of identifying potential computer vision technologies that could be used for enabling gesture recognition. When appropriate hardware was identified, different software approaches were evaluated. In this stage a basic concept of the prototype was also constructed. This concept was refined throughout the whole process.

• Requirements definition

To get an understanding of what functionality the prototype was going to have, requirements had to be identified. These requirements were created from the features identified in the literature survey. The initial and general features were identified during the literature survey by looking at how earlier research has approached similar problems. The features regarded the interactive technologies, usability, visualizations and collaboration, and are discussed further in chapter 3. Additional requirements were identified during the iterations and were considered while designing the prototype.

• System Design

The design of the prototype consisted of three parts which in the end were combined to cohere with the features identified in the literature survey. The first part was to enable user tracking and gesture based interaction. The second part was to implement a digital map, and the third and final part was to implement the visualizations in relation to the map.

• Coding, testing…

The parts described in the system design were implemented by coding and testing. Each part was first implemented separately and tested. The tests were conducted with a couple of potential users who gave their opinions on the specific parts being tested. These opinions guided the development throughout the iterations.

• Implementation

When the parts were deemed complete they were combined and tested again. When the prototype was fully functional a formal evaluation test was conducted. The test will be described further in section 2.2.

• Maintenance

This stage was not considered during this work, since the prototype was not intended to reach a final working stage. The goal of using these steps was to be able to test the acceptance of using gestures during collaboration, and therefore this final stage was considered redundant.

2.2 User tests

In order to assess the prototype's usefulness, a series of user tests were conducted. The user tests included both qualitative and quantitative methods, using a number of tasks, observation and questionnaires. The tasks can be viewed in chapter five, Table 5.1, and the questionnaire can be viewed in Appendix B. The data gathered from the tests was analyzed to assess the prototype and identify usability issues. Sharp et al. suggest that evaluation of a product is about checking whether the users like and can use the product, especially if it is a new concept (2007). Because the concept of using gestures to interact with visualizations is fairly new, user testing was considered crucial to assess the usability of the prototype.

2.2.1 Usability test

To assess the usability of the prototype, a series of usability tests were conducted. Usability tests involve measuring the performance of typical users on typical tasks, according to Sharp et al. (2007). They also state that usability tests are used to evaluate a prototype's ease of use, effectiveness and user satisfaction.

The participants were given a set of tasks to complete. According to Lazar et al., a task list is often needed when testing functional prototypes. They also emphasize that the tasks need to be clear and should not require explanation, and that the tasks should include steps that are frequently used in the prototype (2010). In addition to the tasks, qualitative data can be collected by encouraging the participants to think aloud while interacting with the prototype; according to Lazar et al., there is often very useful information to be gathered by letting the users express their opinions during the tests (2010). The participants were observed while interacting with the prototype in order to identify errors and problems in the interaction. Because of the multi-user nature of the prototype, the tests were conducted in pairs.

To capture the interaction between the users and the prototype, the tests were recorded with a video camera. Using multimedia content, such as video and audio recordings, is a good way to better understand the users' interaction with the system (Lazar et al. 2010). To further complement the data from the user tests, a questionnaire was given to the participants. According to Sharp et al., questionnaires are a widely used approach for gathering opinions from users and are often used in addition to usability tests (2007). The questions in the questionnaire regarded the tasks in the test and the overall attitude towards the prototype.


3 Theory

The purpose of the literature survey was to gain a greater understanding of similar systems and of how interaction has previously been handled in computer vision based systems, visualization systems and collaborative systems. To cover the scope of the current research, these areas were divided into sections. The information studied in the survey was broken down into a list of features, which were then used to support the development of the prototype. Studying how existing systems work was argued by Sharp et al. to be a viable technique to help establish requirements (2007).

The literature has been gathered from different sources. The main source was Google Scholar, which led to a number of scientific databases such as the ACM Digital Library (ACM, 2010), IEEE Xplore (IEEE, 2010) and SpringerLink (SpringerLink, 2010). To find relevant articles in the databases, a number of keywords were used, such as gesture interaction, vision based interaction, interactive visualizations, computer vision and collaborative interaction. Other sources of information have been books.

3.1 Interactive technologies

This section will deal with interactive technologies such as computer vision and touch surfaces. There has been much research in the field of gesture interaction using computer vision. Some have used computer vision to enable touch on surfaces; others have used it for mid-air interaction. There have been two prominent lines of research related to computer vision, with two different approaches: the technological and the usability approach. The technological approach has dealt with technical aspects such as different types of camera hardware and recognition algorithms. The usability approach has concerned the usability and user experience of computer vision interaction.

3.1.1 Technological

The technological approach has focused on input devices such as different types of cameras, lasers and external handheld devices that make it possible to accurately track the movements of the user.

There are different approaches to sensing interaction using computer vision. First, the computer must be able to track the user. Then there has to be software that can recognize gestures from the user. Nielsen et al. divided gestures into two categories, static and dynamic gestures (2004). Static gestures are postures, for instance an open hand, which are recognized by the system. Dynamic gestures are movements that can be sensed by the system, for example making a circle motion or a push movement to interact.

An approach to tracking hands using external artifacts such as multi-colored gloves has been proposed by Wang et al. (2009). Their technique was to use a color web camera and a multi-colored glove in order to track one hand of the user in real time above a surface. They used the tracking for interacting directly with 3D objects and hand poses for interacting indirectly. They argue that their system can be extended to utilize two hands, but only if the hands do not share the same space. The goal of their work was to create an inexpensive and robust hand tracking input device.

Yin et al. (2010) made use of the technology created by Wang et al. in order to create a more natural interaction for large displays in a case study on urban search and rescue (USAR). They state that "Interfaces that more closely conform to the way people naturally interact would have the potential to lower the users' cognitive load, allowing them to concentrate on the decision-making task" (2009:1). They used projectors and a camera mounted above a surface table to track one hand of a user.

Vogel et al., (2005) used a motion-tracking system with reflective markers on each finger to track one hand of the user. They investigated the possibility of pointing and clicking at a large vertical display from a distance. They argued that the point and click metaphor is better than using other gestures to interact.

Z-touch was a project that used infrared (IR) lasers to sense touch above and on a surface (Takeoka et al. 2010). The authors stressed the importance of sensing the depth of the users' hands near a tabletop surface, because it allows 3D gestural interaction. Approaches using depth cameras have been investigated, but the authors note that such cameras are still too expensive and inaccurate.

One project using a depth camera was DepthTouch by Benko et al., which combined direct touch and mid-air interaction on a vertical display. The authors stated that they wanted to preserve the "[…] 'walk-up-and-use' simplicity of the touch-sensitive interactive surface" (2008:3) and thus did not use any external controls. DepthTouch uses a depth camera to track the user. Their approach was to track the user's hands by calculating the nearest points to the camera and using them as "blobs" for interaction. They argued that their tracking procedure was robust but also suffered from limitations: the algorithm was not able to distinguish between hands that were close together or close to the body.

Wilson (2010) explored the possibility of using a depth-sensing camera as a touch sensor, using the Microsoft Kinect. He states that the Microsoft Kinect software development kit provides skeletal models of the users viewed by the camera, which is useful for animating in-game characters. His approach was to sense touch over a surface by mounting the Microsoft Kinect sensor above the surface, and hence it does not make use of the skeleton tracking feature.

The Microsoft Kinect is a user tracking sensor for the Xbox 360. It combines an RGB camera and depth sensors in order to track multiple users for gaming and entertainment purposes (Microsoft 2010). Users playing a game supported by the Kinect can control it with their body using gestures. However, almost immediately after the release of the Microsoft Kinect, several "hacks" were made that made it possible to connect the device to a PC (Giles et al. 2010). This made it possible to use it for purposes other than entertainment and gaming. Within a couple of months several frameworks were developed to make use of the data from the sensors. One framework that supports the Microsoft Kinect is OpenNI (http://www.openni.org/), which enables communication between sensors, middleware and the application, as seen in Figure 3.1.

The OpenNI framework combined with the NITE middleware from PrimeSense (http://www.primesense.com/) made it possible to use full body tracking in many different languages and applications. Open source projects like AS3Kinect (http://www.as3kinect.org/) made it possible to use the data and skeleton tracking from the Microsoft Kinect in Adobe Flash, via a server, using ActionScript 3.

Another project that made use of the skeleton data from the device was as3osceleton (http://blog.aboutme.be/2011/03/07/as3osceleton-using-kinect-as-a-multitouch-device-in-adobe-air/). The author combined a server, OSCeleton (https://github.com/Sensebloom/OSCeleton), with an ActionScript 3 multi-touch library, AS3TUIO (http://www.tuio.org/?flash). OSCeleton translates the data from the Microsoft Kinect into skeletal joints, and these joints are then used as touch points. By doing this, the multi-touch features of the AS3TUIO library can be used.

Figure 3.1 OpenNI Framework (OpenNI User Guide, 2011)
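As a rough illustration of the joint-to-touch-point idea described above, the following ActionScript 3 sketch maps a normalized joint position to a screen coordinate that can be treated as a touch point. The SkeletonJoint class and its toTouchPoint() method are hypothetical names used only for illustration; they are not the actual as3osceleton or AS3TUIO API.

    // Hypothetical illustration only: SkeletonJoint and toTouchPoint() are
    // made-up names, not the real as3osceleton/AS3TUIO classes.
    package {
        import flash.geom.Point;

        public class SkeletonJoint {
            public var userId:int;    // which tracked user the joint belongs to
            public var name:String;   // e.g. "l_hand" or "r_hand"
            public var x:Number;      // normalized 0..1 position from the tracker
            public var y:Number;
            public var z:Number;      // distance from the sensor

            public function SkeletonJoint(userId:int, name:String,
                                          x:Number, y:Number, z:Number) {
                this.userId = userId;
                this.name = name;
                this.x = x;
                this.y = y;
                this.z = z;
            }

            // Map the normalized joint position to a stage coordinate, so that a
            // hand joint can be treated like a touch point on the large display.
            public function toTouchPoint(stageWidth:Number, stageHeight:Number):Point {
                return new Point(x * stageWidth, y * stageHeight);
            }
        }
    }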

3.1.2 Usability

The usability approach has focused on the user friendliness of the different technological solutions. With the increase of touch and gesture enabled devices, however, usability has sometimes become a secondary objective in the development. Norman et al. say that "[t]here are several fundamental principles of interaction design that are completely independent of technology" (2010:46) and that these are often overlooked when designing gestural interfaces. These principles are:

• Visibility (also called perceived affordances or signifiers)

• Feedback


• Consistency (also known as standards)

• Non-destructive operations (hence the importance of undo)

• Discoverability: All operations can be discovered by systematic exploration of menus.

• Scalability: The operation should work on all screen sizes, small and large.

• Reliability: Operations should work. Period. And events should not happen randomly.

Norman (2010) also criticizes the use of gestures due to their lack of clues. He states that clues about how the interaction should be done are essential for successful interaction with a computer. Gestures can also be difficult to use due to cultural differences; a gesture that is natural for one person is not necessarily natural for another. Nevertheless, research in the area describes different aspects of gestures and how they can be used.

The interaction in a vision based system could either use a single input, such as controlling a pointer with one hand, or multiple inputs, such as multi-touch, possibly combined with speech. Systems like this are often referred to as multimodal systems. Raisamo describes it like this: "Multimodal interaction is a way to make user interfaces natural and efficient with parallel and synergistic use of two or more input or output modalities" (1999:vii). He also suggests that many benefits of multimodal interaction can also be seen in two-handed interaction.

Forlines et al. compared uni-manual and bi-manual mouse interaction to direct touch interaction on a horizontal display. They found that uni-manual interaction works best with a traditional mouse and that bi-manual interaction works better with direct touch on the surface. They also found that direct touch might not lead to improved speed or performance, but that it could be better for "[…] fatigue, spatial memory, and awareness of other's actions in a multi-user setting […]" (2007:655).

In the case of freehand interaction, Cabral et al. found that gesture interaction was significantly slower than mouse interaction and that it rapidly causes fatigue. However, they also found that it was easy to learn and could be useful, for instance, in collaborative work during short periods. They argue for the benefits of using it in a multimodal interface due to its natural feel of interaction (2005). The term natural interaction has been discussed from different perspectives; in the next section natural interaction is defined and discussed.

Natural interaction

Raisamo discusses the term natural interaction and argues that it is not an exact expression but merely a way to describe the means of controlling an interface without any external devices (1999). Others have also defined and discussed the naturalness of vision based interaction. Yin et al. wrote "By natural gestures we mean those encountered in spontaneous interaction, rather than a set of artificial gestures chosen to simplify recognition" (2010:1) when describing a multi-camera tabletop display system.

One way of achieving naturalness in interaction is to use multimodal or two-handed interaction (Raisamo 1999). He concluded that two-handed interfaces were faster and easier to interact with than traditional interfaces using a mouse and a keyboard.

One thing that can affect the natural feel and user experience of a system is how fast it responds to user input. In the next section the aspects of latency and responsiveness are discussed.

Latency and responsiveness

Responsiveness in real-time interaction is an important aspect. Wachs et al. identified a number of requirements for developing hand gesture interaction. One of these requirements was responsiveness, where they argued that "The system should be able to perform real-time gesture recognition. If slow, the system will be unacceptable for practical purposes." (2011:62). They also argue that 45 ms is the maximum latency between action and response for the system to still feel responsive.

Vogel et al. describe a classical problem in "device-free interaction": the user lacks any physical button to click. One way to deal with this issue is to use a delay, also known as dwelling, to make the pointer click after hovering for a set time. This causes the interaction to suffer from a constant lag, so to minimize lag this technique should be avoided in gestural interaction (2005). However, dwelling might be a good way to give the user feedback and hints concerning the system's state.
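The difference between the two activation strategies can be sketched as follows in ActionScript 3; the time and depth thresholds, and the function names, are illustrative assumptions rather than values from any particular system.

    // Hypothetical sketch: dwell activation versus a direct "push" click.
    // The thresholds and function names are illustrative assumptions.
    import flash.utils.getTimer;

    var hoverStartTime:int = -1;
    const DWELL_TIME_MS:int = 1000;   // dwell: click fires after hovering this long
    const PUSH_DEPTH:Number = 0.15;   // push: click fires after this much forward motion

    // Dwell-based activation: called every frame while the pointer position is known.
    function updateDwell(isOverTarget:Boolean):Boolean {
        if (!isOverTarget) {
            hoverStartTime = -1;
            return false;
        }
        if (hoverStartTime < 0) {
            hoverStartTime = getTimer();
        }
        // The click always arrives DWELL_TIME_MS late: the constant lag noted above.
        return (getTimer() - hoverStartTime) >= DWELL_TIME_MS;
    }

    // Direct activation: compare the hand's current depth with its resting depth.
    function updatePush(restingZ:Number, currentZ:Number):Boolean {
        return (restingZ - currentZ) >= PUSH_DEPTH;   // hand moved towards the screen
    }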

Lag can also affect the ergonomics of a system, since the user may have to hold the hands statically over a certain point. In the next section the ergonomics of gestural interfaces is discussed.

Ergonomics

While using one's body to interact with a computer system, the aspect of fatigue has to be taken into consideration. Standing and moving about while interacting with a screen could cause the user to become tired and less focused. Cabral et al. experienced that even over short periods of time gestural interaction could cause fatigue in their gesture based virtual reality interface. They proposed that the hand should be used to control a cursor so the arms could be kept closer to the body, instead of having to keep the arm extended (2005). They considered the principles for creating good gestural interfaces established by Nielsen et al. (2004):

• Avoid outer positions.
• Relax muscles.
• Relaxed neutral position is in the middle between outer positions.
• Avoid repetition.
• Avoid staying in static position.
• Avoid internal and external force on joints that may stop body fluids.

3.2 Data Visualization

With the evolution of larger storage devices and new ways of collecting data, the amount of data being generated and saved today is enormous (Keim et al. 2008). The data itself is seldom usable without ways to extract the information from it. Furthermore, the authors describe an "information overload problem", which refers to the risk that the data might be interpreted incorrectly due to the vast amount at one's disposal. One way to get an overview of large amounts of data is to make graphical representations of it in the form of visualizations.

Visualization of data is a powerful way to make sense of large amounts of data. However, without interactivity the visualizations are often used for communication rather than as a tool in the workflow for better understanding the data. How the interaction is done also affects the user's understanding of the data (Zudilova-Seinstra et al. 2009). Card et al. define information visualization as "The use of computer-supported, interactive, and dynamic visual representations of data to amplify cognition" (2009:6). Thomas et al. state that "Visual representations invite the user to explore his or her data" and that they therefore have to be interactive (2005:69). This highlights the need for interactivity in visualizations of data.

Depending on the nature of the data there are different ways of visualizing it. The data can, for example, be one-dimensional or two-dimensional (Keim et al. 2002). If a dataset consists of two-dimensional GPS coordinates, the data can be represented on a digital map, for instance Google Maps. Other types of visualizations could be in the form of one-dimensional charts.

One project that made use of a digital map and charts for visualization was the interactive web-based visualization tool by Vogel (2011). The tool was developed to help students understand environmental data collected during science based inquiry learning. It is stated that it is important to be able to explore, analyze and reflect upon the data in an interactive manner. The data comprise geo-tagged content and sensor data, and the overall dataset is offered through a Google spreadsheet (Vogel, 2011). The data was visualized using Google Maps and the Google Visualization API. The prototype in this work will make use of the same data source in order to create visualizations.

3.3 Collaboration using interactive technologies

When people are working towards a common goal they often collaborate. Isenberg et al. state that people who collaborate often use visualizations to interpret data. They also say that having people with different skills working with the same information is beneficial, especially when the data is too complex for one user to handle or when the amount of data is too great (2007).

Collaborative visualization is, according to Zudilova-Seinstra et al., when more than one user can interact with the same visualization at the same time. They also note that this type of collaboration has often been done over networks when users are physically distributed (2009).

Isenberg et al. presented a multi-user information visualization system using a large horizontal multi-touch display. Their research concerned the collaboration between co-located users while interacting with information visualizations. They provided a set of guidelines for "co-located collaborative information visualization systems" (2007:1233). In the hardware guidelines they stressed the issue of display size: the display size is important in order to fit the visualizations, but also to enable multiple users to interact with them. Another aspect of the hardware is the input from the users; in order to support collaboration, multiple users have to be able to interact with the content at the same time.

Another research project that enabled multi-user interaction was DiamondTouch by Dietz et al., which used a tabletop touch screen to track multiple users' interaction. In their research they identified potential problems with using a traditional mouse in a multi-user environment; the problems consisted of keeping track of whose pointer was whose. They also argued that using an external device hinders the user's natural movements such as reaching, touching and grasping (2001).

While evaluating a gestural interface for virtual reality, Cabral et al. came to the conclusion that despite the drawback of fatigue, their gestural interaction system could be used to support collaboration by extending it to track multiple users (2005).

Stewart et al. found that children often gather around a computer screen and want to interact with the content. They also found that the user experience improved when the users had control over the content. Therefore they suggested a multi-user system that made use of multiple input devices on a single display. When testing it with children, they concluded that using multiple mice on one screen was more fun (1999).

3.4 Summary and features definition

The purpose of this section is to identify possible features from the literature survey, which will serve as a basis for and be considered during the development of the prototype.

3.4.1 Interactive Technologies

The first section of the literature survey addressed interactive technologies; both technological and usability aspects were described.

Technological

There have been many different approaches to using computer vision as input for interaction. Some have used it to sense touch over a surface (Wang et al. 2009; Takeoka et al. 2010; Wilson 2010), while others have used it for freehand interaction (Benko et al. 2008). With the release and hacks of the Microsoft Kinect, a relatively low-cost camera with the ability to track multiple users became available. One benefit of using the Microsoft Kinect is that it does not require any external devices for interaction. In the technological section, the Microsoft Kinect and some of its applications were described. Because of the availability of frameworks and the robustness of the user tracking, this was considered to be the first feature.

A list of features related to the technological aspects is presented below.

• Interactive Technologies
  o Full body tracking
  o Multi-user tracking
  o Gesture recognition
  o Device free – freehand

By looking at the Microsoft Kinect, a number of sub-features were identified. The Microsoft Kinect combined with the OpenNI framework and the NITE middleware enables full body tracking of several users. This feature is essential to allow simultaneous interaction and thus support collaborative interaction. Another aspect of gesture interaction discussed in the literature survey was the benefit of device-free or freehand interaction. This is also enabled by the Microsoft Kinect.

Usability

The second part of the interactive technologies section concerned the usability aspects of gesture based interaction.

Norman et al. argued that principles and standards in usability are often overlooked while designing gestural interfaces (2010). These principles are especially important to keep in mind when designing gestural interfaces, which Norman claims are invisible and do not give the user any clues of interaction (2010).

Raisamo concluded that interaction with a two-handed interface is more natural and faster to use if designed well (1999). Forlines et al. found that bimanual interaction works best with direct touch on a surface (2007). Cabral et al. found that device-less two-handed interaction is easy to learn, could be useful for collaboration and feels natural (2005). Raisamo described natural interaction as device-less interaction with a computer (1999). Norman (2010), however, does not consider most gestures natural because of cultural differences, nor does he think that gestures are easy to learn and remember, due to the lack of clues about how the interaction is supposed to be done.

Wachs et al. argued that gestural interaction should work in real time and thus not have high latency (2011). Vogel et al. brought up the issue of clicking with device-free interaction. Often the solution has been to use dwelling over an object to interact with it; this is not a good way to interact since it causes a constant lag (2005).

Cabral et al. found that gestural interaction could cause fatigue and is best used for short periods of time. They followed a list of principles intended to make gestural interaction more ergonomic (2005). Below, the features of the usability aspect are presented.

• Usability aspects
  o Principles of interaction design
  o Clues of interaction
  o Two-handed
  o Low latency
  o Principles of good ergonomics

To ensure good usability and user experience, the principles of interaction design described by Norman et al. (2010) have to be considered. The principles of good ergonomics by Nielsen et al. (2004) are also important for achieving a pleasant user experience. Because of the device-free interaction, there have to be clues about how the interaction is done. Following the argument that two-handed interaction is more natural and faster, this is also considered a feature. Another important aspect related to the user experience is that the application should be responsive at all times. Therefore a direct click interaction should be used instead of the alternative of using dwelling.

3.4.2 Visualizations

The second section of the literature survey was about visualizations. Zudilova-Seinstra et al. highlighted the importance of visualization for understanding data. They also state that by making visualizations interactive, they become more of a tool for understanding the data than a static presentation of it (2009). Vogel (2011) used a Google map, charts and pictures to present environmental data, letting users explore the collected data. The features of the visualizations are presented below.

• Interactive visualizations
  o External data source
  o Digital map
  o Charts
  o Pictures

Because of the importance of interactivity, a number of approaches to visualizing the data were considered. The GPS data would need to be presented in a way that is simple and relates to where the other data was gathered; this could be done with a digital map. To easily compare the one-dimensional data, it could be presented in charts. To get a better view of the locations where the data was gathered, pictures would be needed.

3.4.3 Collaboration

The third and last section of the literature survey concerned collaboration using interactive technologies. The benefit of having several people working together while trying to understand data was discussed.

Isenberg et al. stated that visualizations often are used in collaboration in order to discuss and understand data. Their multi-touch multi-user information visualization system made use of a set of guidelines. Important aspects in the guidelines regarded the display size and input from the users (2007). To enable interaction from multiple users a large display is needed.

Dietz et al. found that multiuser interaction using a traditional mouse was problematic. Instead they used a multi-touch display to enable multi-user interaction (2001).

Stewart et al. found that the user experience could be better if each user has control of the content on the screen when collaborating (1999). Below, the features that enable collaboration are presented.

• Collaboration
  o Large display
  o Simultaneous interaction

A large display is important to physically allow numerous users to interact and view the data.

There has to be simultaneous interaction from several users to enable collaborative interaction.

In the next chapter the features described above will be used to support the design and development of the prototype.


4 Design and development

This chapter will describe the design and development of the prototype. First, the features identified in the literature survey are used to create requirements that guide the development process. Second, a short technological evaluation is presented. Third, the development efforts are described in relation to the requirements.

4.1 Requirements

The features identified in the literature survey are in this section used to create a set of requirements that will support the development of the prototype.

Interactive technologies

To enable multiple users to interact using gestures without any external devices the system should allow for:

• Full body tracking: The prototype shall track the whole body of each user independently.

• Multi-user tracking: The prototype must allow multiple users to interact with the content simultaneously.

• Gestural interaction: The prototype shall have a set of gestures for interacting with the content on the screen.

• Device free interaction: The prototype shall not require any external devices in order to interact with it.

Usability aspects

To increase the probability of a good user experience a number of usability requirements were identified:

• Principles of interaction design: The prototype shall follow the principles of interaction design stated in the literature survey.

• Clues of interaction: The prototype shall have visual clues of how to interact with it.

• Two-handed: The prototype shall make use of both hands of each user.

• Low latency: The interaction of the prototype shall be continuous and make use of direct interaction instead of dwelling interaction, where the user only moves the hand over an object and the interaction is triggered by a set timer.

• Principles of good ergonomics: The prototype shall conform to the principles of good ergonomics stated in the literature survey.

Interactive visualizations

To create interactive visualizations the following requirements were identified:

• External data source: The prototype shall make use of an external data source in order to create visualizations.

• Digital map: The GPS data shall be presented on an interactive digital map.

• Charts: The one-dimensional data shall be presented in charts.

• Pictures: Pictures shall be displayed in relation to the charts.

Collaboration

Because the prototype should support collaboration, requirements regarding this aspect were identified:

• Large display: The prototype shall make use of a large display.

• Simultaneous interaction: The prototype shall allow multiple users to interact simultaneously.

4.2 Initial technological evaluation

The development of the prototype started with testing the feasibility of using the Microsoft Kinect as an input device. Two ActionScript approaches were compared and evaluated. The first was the AS3Kinect framework, which had the skeleton tracking feature but no good way of creating interaction. The other approach was as3osceleton combined with the multi-touch library AS3TUIO, which also had the skeleton tracking feature. Because its multi-touch features made it possible to create gestures, the as3osceleton/AS3TUIO approach was chosen for the development of the prototype.

4.3 Development

The development consisted of several iterations of high fidelity prototyping. The prototype was developed with Adobe Flash Builder 4 (http://www.adobe.com/products/flash-builder.html) and the Flex SDK (http://www.adobe.com/products/flex/). The Flex SDK is an open source framework for cross-platform web, mobile and desktop application development, and it contains many components which can be used to rapidly create rich interfaces (Adobe Flex, 2011). To support each step of the development, the requirements were used to implement the functionality of the prototype.

During the initial investigations a basic concept of the prototype was constructed, to get an overview of what was to be done. The concept of the prototype was to create a digital map with geo-tagged data related to locations. The data was going to be visualized using charts and pictures. The map, charts and pictures were going to be interactive and controlled solely by gestures, and the prototype was going to let multiple users interact with the visualizations simultaneously. This was going to be achieved by using the Microsoft Kinect. The prototype was also going to be used on a large display in order to more easily support collaboration.

4.3.1 Design & implementation

This section will describe the different parts of the prototype and how the requirements were met during the development. The development consisted of three main components, which were iteratively implemented and combined: the basic interaction using gestures, a digital map, and the charts and pictures. Below, the development of each part is described.



Basic interaction

The first part of the development concerned the implementation of the basic user tracking and gestural interaction. The first step was to consider the requirements of the interactive technologies. To enable full body tracking of multiple users, a combination of the OpenNI framework and the NITE middleware was used. The combined software registers the position of the user and produces skeleton data which is later used in the prototype. When the tracking of the users was functional, the requirements of gestural and device-free interaction were addressed.

To enable the device-free interaction, the skeletal data from the OpenNI and NITE software was put through a server, OSCeleton, to translate it into skeletal joints. The skeletal joint data is read by the prototype and then translated into touch points by the as3osceleton library. By creating touch points from the skeletal joints that represent the hands of the user, the user can essentially touch the screen from a distance.

During this stage some of the usability requirements also had to be considered. To conform to the visibility principle among the principles of interaction design, graphical pointers in the form of hands were positioned at the touch points. The reason for making the pointers look like hands was to give the users clues about how to interact, and thus meet the requirement of clues of interaction. To strengthen this clue the hands are always visible, even if the user is resting the arms along the sides. The hands and touch points are used to create the gestures in the prototype.

There are four gestures that the prototype makes use of. Three of the gestures were identified as part of the AS3TUIO multi-touch library and the Flex SDK. The fourth one is required to calibrate the NITE middleware in order to track the users independently. The gestures were chosen because of their simplicity and because they were supported by the multi-touch library. The principles of ergonomics were used as guidelines for choosing the gestures, so that they would be as usable as possible.

Figure 4.1 shows the gestures, and below they are described in relation to their functionality.

Figure 4.1 The gestures

• Calibration

Before any interaction can be done, each user has to calibrate. This is done by standing in front of the screen and raising the hands to the level of the head. When the calibration is done, each user is assigned a set of graphical pointers connected to the hands.

• Click

The most basic interaction of the prototype is the click. This is done by moving one hand briskly towards the screen. The click is a key component in the other gestures and has to be performed as an initial gesture. The click alone is used to open markers on the map and to close charts and pictures.

• Swipe

The swipe is done by performing a click gesture and then, with the arm extended, moving it in the x or y direction. The swipe is used to pan the map and to drag and drop the charts and pictures.

• Swim

The swim gesture is performed by first clicking with both hands and then making a swimming motion with the arms. Moving the hands further apart or closer together zooms out or in. The swim gesture is used to zoom the map in and out, but also to scale the charts and pictures.
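As a rough illustration, the zoom factor of such a swim-like gesture could be derived from the changing distance between the two hand points, as in the following ActionScript 3 sketch. The function name and the minimum-distance guard are assumptions made for illustration; the prototype's actual gesture handling through AS3TUIO is not shown here.

    // Illustrative sketch of the swim gesture: once both hands have "clicked",
    // the change in distance between them gives a relative scale factor.
    // Whether a factor above 1 maps to zooming in or out is a design choice.
    import flash.geom.Point;

    var previousHandDistance:Number = 0;

    // Call every frame while both hands are engaged; returns a relative factor
    // (> 1 when the hands move apart, < 1 when they move closer together).
    function swimZoomFactor(leftHand:Point, rightHand:Point):Number {
        var distance:Number = Point.distance(leftHand, rightHand);
        if (previousHandDistance <= 0 || distance < 10) {
            previousHandDistance = distance;   // no baseline yet, or hands too close
            return 1.0;
        }
        var factor:Number = distance / previousHandDistance;
        previousHandDistance = distance;
        return factor;
    }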

In this stage the requirement of two-handed interaction was met: the click and swipe gestures can be performed with either hand, while the swim gesture requires both hands. By using direct interaction with objects, the requirement of low latency was also met. By using only three gestures for all interaction, the prototype follows the consistency requirement of the basic principles of interaction; the idea was to use the same gestures for similar operations in order to make the interaction easier to learn. The other requirements from the principles of interaction design were non-destructive operations, discoverability, scalability and reliability. The prototype does not have any destructive functionality, and therefore this principle is not considered. Because of the simplicity of the prototype, all the functionality is shown to the user at all times. The scalability principle states that the operations should work on all screen sizes; even though the prototype is developed with large screens in mind, it can still be used with an ordinary desktop monitor. The reliability principle was considered because the prototype was going to be tested on potential users, and it was therefore important to have a prototype that was stable and functional.

To conform to the requirement Principles of good ergonomics, the gestures were meant to be as easy and simple as possible. Below, the principles are discussed in relation to the gestures.

• Avoid outer positions.

The gestures utilized in the prototype do not require full extension of joints. This is an important aspect to avoid causing fatigue while performing the gestures.

• Relax muscles.

To successfully complete a gesture it is unavoidable to tense the muscles. However, the interaction is designed to work best with relaxed motions and not to cause fatigue.

• Relaxed neutral position is in the middle between outer positions.

As stated earlier, the gestures do not require the user to fully extend any joints. All the interaction can be done between the outer positions, where the outer positions are a fully extended arm and a fully bent arm.

• Avoid repetition.

Because the prototype makes use of three gestures for six different tasks, the gestures will be repeated often. But because the gesture movements are somewhat flexible, the user can vary how to perform them. For instance, the swipe motion can be done with either hand, and the hand can be closed, open or pointing at the screen with one finger.

• Avoid staying in static position.

There is no functionality in the prototype that requires the user to stay in a static position for a longer period of time. The gestures were used because of their continuous movement, which prevents static positions.

• Avoid internal and external force on joints that may stop body fluids.

By not using any handheld devices for interaction, there is no external force that can cause discomfort. The gestures are simple and relaxed and thus do not introduce much internal force on the joints.

In this first part the gestures were used for creating interaction with simple objects. The interaction consisted of scaling and moving objects. Figure 4.2 shows the basic concept, with the hands and a square which the user could move and scale.

Figure 4.2 Basic interaction

This was tested and demonstrated for a couple of users, who gave their opinions. Some opinions concerned the pointers connected to the hands. In the first tests, the hands were static images with a fixed size, which made it hard to perceive the depth of the interaction. This was solved by making the size of the pointers depend on how far from the screen they are. Another concern stated by the participants was that it was hard to know when the interaction occurred. A suggestion to use a glow around the hands was made and also implemented. When the basic gestural interaction was deemed complete, the implementation of the digital map began.
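Both pieces of feedback can be sketched as follows: the pointer is scaled according to the hand's distance from the sensor, and a glow filter is applied while an interaction is in progress. The depth range below is an assumed example and not a value measured from the prototype.

    import flash.display.Sprite;
    import flash.filters.GlowFilter;

    // Scale the hand pointer with depth so the user can perceive how far
    // from the screen the hand is. z is assumed to be the distance from
    // the sensor in millimetres.
    function updatePointerDepth(pointer:Sprite, z:Number):void {
        var nearZ:Number = 500;   // assumed closest tracked distance
        var farZ:Number = 2500;   // assumed farthest tracked distance
        var t:Number = Math.max(0, Math.min(1, (z - nearZ) / (farZ - nearZ)));
        var scale:Number = 1.5 - t; // larger pointer when the hand is closer
        pointer.scaleX = scale;
        pointer.scaleY = scale;
    }

    // Show a glow around the pointer while it is interacting with an object.
    function setPointerActive(pointer:Sprite, active:Boolean):void {
        pointer.filters = active ? [new GlowFilter(0x00FF00)] : [];
    }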

Digital map

The second part of the prototype was the digital map, which was created with the Google Maps API and can be seen in figure 4.3. In this part the requirements of using an external data source and a digital map were met. The map was used to visualize data from an external data source. The data was stored in a Google Spreadsheet document which is published as an RSS feed. The RSS feed was put through a PHP proxy in order to be able to read it in the prototype. The data comprises geo-tagged content and sensor data. The geo-tagging consists of GPS coordinates in latitude and longitude. These coordinates are used to position interactive markers on the map in order to display the sensor data in relation to the locations.
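How such a feed can be read is sketched below: the published spreadsheet feed is fetched through the proxy with a URLLoader and each entry's coordinates are extracted for marker placement. The proxy URL and the element names are illustrative assumptions; the actual feed structure used in the prototype may differ.

    import flash.net.URLLoader;
    import flash.net.URLRequest;
    import flash.events.Event;

    // Fetch the published spreadsheet feed via the PHP proxy (placeholder URL).
    var loader:URLLoader = new URLLoader();
    loader.addEventListener(Event.COMPLETE, onFeedLoaded);
    loader.load(new URLRequest("proxy.php?feed=spreadsheet"));

    function onFeedLoaded(event:Event):void {
        var feed:XML = new XML(loader.data);
        // Assumed simplified structure: one <entry> per location with
        // <lat>, <lng>, <sensorValue> and <pictureUrl> children.
        for each (var entry:XML in feed.entry) {
            var lat:Number = Number(entry.lat);
            var lng:Number = Number(entry.lng);
            // A marker would be placed on the map at (lat, lng) here,
            // carrying the sensor value and picture URL for later display.
        }
    }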

The original mouse interaction of Google Maps consists of clicking markers and panning and zooming the map. This interaction was implemented by applying the gestures instead of using a mouse. The functionality of the Google Maps API is rather restricted when it comes to multi-touch interaction, so the gestural interaction had to be adjusted to suit the map.

Figure 4.3 Google Maps
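Because Google Maps zooms in discrete levels, the continuous swim gesture has to be translated into zoom steps when it is applied to the map. One way this adjustment could be handled is sketched below; the threshold value is an assumption, and the actual call to the map is left out since it depends on the version of the Maps API.

    // Accumulate the change in hand distance and trigger one zoom level
    // per threshold crossing.
    var accumulatedChange:Number = 0;
    var zoomThreshold:Number = 120; // assumed value, in pixels of hand-distance change

    function onSwimUpdate(previousDistance:Number, currentDistance:Number):void {
        accumulatedChange += currentDistance - previousDistance;
        if (accumulatedChange > zoomThreshold) {
            accumulatedChange = 0;
            // zoom the map in one level here
        } else if (accumulatedChange < -zoomThreshold) {
            accumulatedChange = 0;
            // zoom the map out one level here
        }
    }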

The map was also tested by a couple of users, who gave their opinions regarding the gestural interaction with the map. The first implementation made use of a satellite map, which the participants thought was slow and hard to navigate; therefore a simpler map was used instead. When the map was functional, the implementation of the sensor data visualizations began.


Charts & pictures

In this section the implementation of the charts and pictures is described, and thus the requirement of charts and pictures is met. The sensor data from the data source was visualized using the standard column chart component in Flex. The data source also contained URLs to pictures related to the locations. The charts and pictures were set to be displayed when a marker was clicked. In order to be able to close the charts and pictures, a red square was added as a close button. The gestural interaction created in the first part was implemented for the charts and pictures, which means that they can be scaled and moved.
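A simplified sketch of how a chart and its close button could be assembled is given below. It assumes the Flex ColumnChart and ColumnSeries classes and a plain Sprite for the red close square; the data field names are placeholders and not necessarily those used in the prototype.

    import mx.charts.ColumnChart;
    import mx.charts.series.ColumnSeries;
    import mx.collections.ArrayCollection;
    import flash.display.Sprite;

    // Build a column chart for one location's sensor data.
    // The field names "label" and "value" are assumed for this sketch.
    function createSensorChart(data:ArrayCollection):ColumnChart {
        var chart:ColumnChart = new ColumnChart();
        var series:ColumnSeries = new ColumnSeries();
        series.xField = "label";
        series.yField = "value";
        chart.series = [series];
        chart.dataProvider = data;
        return chart;
    }

    // The red square used as a close button for charts and pictures.
    function createCloseButton(size:Number):Sprite {
        var button:Sprite = new Sprite();
        button.graphics.beginFill(0xFF0000);
        button.graphics.drawRect(0, 0, size, size);
        button.graphics.endFill();
        return button;
    }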

Figure 4.4 shows the different parts combined.

Figure 4.4 Google Maps combined with charts and pictures


When the three parts were implemented they were tested again with a couple of users. The participants wanted to be able to view more than one chart and picture at a time; therefore the functionality of opening multiple markers was implemented. When the prototype was deemed functional, an evaluation test was conducted. The evaluation test is described in detail in chapter five. Figure 4.5 shows a user interacting with the prototype. Furthermore, this is also demonstrated in a short demo video which is available at: http://vimeo.com/24271754

Figure 4.5 Live demo

The implementation of the different parts resulted in a fully functional prototype where two users can interact with a Google map, charts and pictures using gestures. During the development stage a number of components were identified and integrated in order to create the functionality.

Figure 4.6 presents an overview of the components used. These components can be divided into two parts: user input and system output. The user input consists of the Microsoft Kinect combined with tracking and gesture recognition software. The system output is the Google map with visualizations displayed on a projector screen. Each of these parts comprises a number of frameworks, libraries and APIs, which were discussed during the implementation and can be seen in figure 4.7.


Figure 4.6 System Components Overview

The OpenNI, NITE and OSCeleton software was used to track the users. AS3osceleton, AS3TUIO and native ActionScript gestures were implemented in a Flex application. These parts combined enabled the user input from the Microsoft Kinect. Google Maps and Google Spreadsheet, together with components from the Flex library, were used to create the visualizations and the output to the users.

Figure 4.7 Software overview
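The last step of this input chain is to map a tracked hand joint to a position on the screen, so that the hand pointer can be drawn and hit-tested against the map, charts and pictures. A minimal sketch is given below; the normalized 0–1 coordinate range is an assumption for illustration, since the actual coordinate system depends on how OSCeleton and AS3osceleton are configured.

    import flash.display.Stage;
    import flash.geom.Point;

    // Map a hand joint, assumed to arrive as normalized (0..1) coordinates
    // from the tracking stack, onto the stage in pixels.
    function jointToStage(normX:Number, normY:Number, stage:Stage):Point {
        return new Point(normX * stage.stageWidth, normY * stage.stageHeight);
    }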


5 User tests

The user tests were conducted to assess the usability of the prototype and to identify usability issues. The participants were given a short user manual to familiarize themselves with the concept before the tests, see Appendix A. To further instruct the participants, a short demonstration of the basic functionality was given in connection with each test session. When the participants had understood the basics they were given a set of tasks to complete. After the tasks had been completed they were given a questionnaire with questions related to the tasks.

To gather as much data as possible a video camera was used to record the tests. The video material was used to aid the analysis of the tests.

5.1 Users and settings

The user tests were conducted with ten participants: three females and seven males. The participants' ages ranged from twenty-one to thirty-one, and all of them were used to working with computers. The tests took place in a computer lab with a projector screen. The size of the interaction space was 3x3 meters in front of the projector screen.

Figure 5.1 shows the setting in which the tests were conducted.

Figure 5.1 User tests

5.2 Tasks

The participants were given seven tasks to complete while interacting with the prototype, see table 5.1. The tasks were composed to identify potential usability issues and drawbacks concerning the interaction with the prototype. The tasks were related to the gestures, which form the core interaction of the prototype. To collect subjective opinions about the interaction, the participants were encouraged to think aloud while performing the tasks.


Table 5.1 The Tasks

Description

1 Stand in front of the screen, raise your hands as instructed and wait for calibration (both, one at a time).

2 By using the swim gesture zoom the map.

3 Click one marker each to view the data (charts and pictures).

4 Move the charts and pictures by swiping so they do not overlap each other.

5 Close the charts by clicking the red square.

6 Scale the pictures by using the swim gesture so they do not fit the screen anymore, and then close them.

7 Open one marker each (as in Task 3) and switch objects with each other (collaborate).

5.3 Questionnaires

After the test of the prototype the participants were given a questionnaire. The questionnaire comprised eight closed-ended questions with a Likert scale ranging from 1 to 4 (1 = Strongly disagree; 2 = Disagree; 3 = Agree; 4 = Strongly agree). The questions regarded the participants' opinions about the tasks performed in the test. In addition, three open-ended general questions were included in the questionnaire. The general questions regarded the overall impression of the prototype, collaboration, and suggestions for improvement.

5.4 Data analysis and results

In this section the results from the questionnaire and the tasks are presented and analyzed. The gestures identified during the development have served as a basis for the tests and the analysis of the results. The gestures have been used as categories, and below the results for each gesture are presented. The overall results presented in figure 5.2 show the mean value for each task.

From these results, usability issues mainly related to the click and swim gestures were identified. The overall mean for the prototype was 2.9125, which indicates that it overall worked rather well but clearly had some usability issues.


Figure 5.2 Questionnaires/tasks results

5.4.1 Calibration

The first task regarded the calibration of the users. Each participant had to make the calibration pose in order to use the prototype. The pose was described in the user manual and also demonstrated to the users. In all the tests the initial calibration went fairly easily. Most of the users agreed or strongly agreed that it was easy to understand, as shown in figure 5.3.

However, some drawbacks were observed during the tests.

Figure 5.3 Calibration

The users have to be in the Microsoft Kinect's field of view at all times, otherwise the calibration can be lost. If the calibration is lost the user has to recalibrate. This happened a few times during the tests, and although the recalibration was also easy, it is a tiring interruption of the interaction. When this occurred, discontent from the participants was noted. Therefore the calibration should be easier and more robust.

