Hand Gesture based Telemedicine enabled by Mobile VR
Author: Sofia Kiriaki Vulgari
Supervisor: Jenny Lundberg
Examiner: Ilir Jusufi
Exam date: 14 June 2019
Subject: Social Media and Web Technologies
Level: Master
Abstract
Virtual Reality (VR) is a rapidly evolving domain used in an increasing number of areas in today's society. Among the technologies associated with VR, and especially mobile VR, are hand tracking and hand gesture recognition. Telemedicine is one of the fields where VR is starting to thrive, and so the concept of adding hand gestures arose in order to explore the possibilities it can offer. This research was conducted through the development of a prototype application that uses some of the most emerging technologies. Manomotion's hand tracking and hand gesture recognition algorithms, together with Photon's servers and developer kit, which make multi-user applications achievable, allowed the conceptual idea of the prototype to become reality. To test its usability and how potential users perceive it, a user study with 24 participants was conducted, 8 of whom were either studying or working in the medical field. Additional expert meetings and observations from the user study also contributed to findings that help show how hand gestures can affect a doctor consultation in Telemedicine. The findings showed that the participants regarded the proposed system as a less costly and time-saving solution, and that they felt immersed in the VR. The hand gestures were accepted and understood; the participants had no difficulties learning or executing them and had control of the prototype environment. In addition, the data showed that participants considered the system usable in the medical field in the future.
Keywords. Virtual Reality, Hand Gestures, Virtual Environment, Telemedicine,
Mobile VR, Multi User
Acknowledgements
I would like to express my great appreciation to my supervisor, Jenny Lundberg, for all the support, guidance, and the time she gave to help me with this thesis. I would also like to thank my previous teacher, Shahrouz Yousefi, for introducing me to the technologies that inspired me and the help he provided during his supervision.
I am grateful to all the teachers of the Media Technology department of Linnaeus University who guided me and gave me so much knowledge during this master program. To the Manomotion family, and Abraham, for answering my questions, no matter how intelligent or not, for being so helpful, and for allowing me to explore and use their technology.
A big thank you to all the participants and experts that gave their time and thoughts on this study as they compose a big part of this work.
I am beyond thankful to my parents, Athena and Elias, for their unconditional love, their sacrifices, and incredible support in the journey that brought me here. To my friends, Lia, Konna, Jenny, Sonja and Cristina, that stood by me and gave me the words that I needed to hear and go on. For making me feel their love, loyalty, and support, every day, even when we are miles apart. Last, but not least, to Otto, for always putting a smile on my face, and his family that embraced me like one of their own.
Contents
1 Introduction
  1.1 Virtual Reality
  1.2 Hand Gestures in VR
  1.3 Telemedicine
  1.4 Motivation
  1.5 Research questions
  1.6 Thesis Structure
2 Related Work
3 Methodology
  3.1 Mixed Method Research
  3.2 Informed Consent Form
  3.3 Expert Discussions and Pilot Users
  3.4 Participant Questionnaire
  3.5 Usability Questionnaire
  3.6 Data Analysis
4 Prototype
  4.1 Technological Aspects
    4.1.1 Unity
    4.1.2 Manomotion
    4.1.3 Photon Engine
  4.2 First Stage
  4.3 Environment
  4.4 System Overview
  4.5 Networking
  4.6 Hand Gestures
5 User Study
  5.1 Expert meetings
    5.1.1 Meeting with nursing teacher at the start of the project
    5.1.2 Meeting with endocrinologist and general doctor
    5.1.3 Meeting with Manomotion expert
  5.2 Pre-Testing Preparation
  5.3 Pilot users
  5.4 Test
    5.4.1 Scenario
  5.5 Results and Analysis
  5.6 User Experience questions
  5.7 Usability related questions
  5.8 Hand gesture questions
  5.9 Additional statistics
6 Discussion
7 Conclusion
  7.1 Limitations
  7.2 Future work
A Appendix A: User Study
  A.1 Informed consent form
  A.2 Participant Questionnaire
  A.3 Observations form
B Appendix B: Declined mock-up
C Appendix C: Additional results
List of Figures
1 SRI's "Green Telepresence System" (Westwood, 1998)
2 From left to right: Samsung Gear VR with S7 Edge and Andersson VRG with S9.
3 Manomotion SDK Lite: SDK Features (Manomotion, 2019b)
4 First mock-up image with Doctor and Patient view.
5 Second mock-up image with Doctor enlarging the heart model.
6 Left: Skeleton model with organs in, Right: Skeleton model with organs out.
7 Hand model.
8 First view of skeleton and hand model without hand tracking.
9 Skeleton and hand model with hand tracking.
10 Closed hand gesture and organs are extracted.
11 Point gesture and text for heart model.
12 Both texts visible just for the doctor.
13 System Overview Diagram
14 Inspector of the virtual hand with the PhotonView, PhotonAnimatorView and PhotonTransformView components.
15 Hand Gestures: Open Hand (upper left), Closed Hand/Grab (upper right), Point (down). (Manomotion, 2019a)
16 Graphs for questions Q1 (left) and Q2 (right)
17 Graphs for questions Q5 (left) and Q6 (right)
18 Graph for question Q8
19 Graphs for questions Q3 (left) and Q9 (right)
20 Graph for question Q7
21 Graphs for questions Q19 (left) and Q20 (right)
22 Graphs for questions Q4 (left) and Q10 (right)
23 Graphs for questions Q11 (left) and Q12 (right)
24 Scenario 1 mock-ups
List of Tables
1 VR Headset Devices and their cost.
2 Research Questions
3 Age and previous participant experience with VR
4 Likert type questions.
5 Questions with Yes, No or Maybe responses.
6 Session observations of participants.
7 Descriptives of Q10 for ANOVA.
8 Oneway ANOVA test for Q10.
9 Descriptives of Q7 for ANOVA.
10 Oneway ANOVA test for Q7.
11 Time it took to complete the task * Age Range Cross tabulation.
12 Chi-Square test for Table 11.
13 Participants in the medical field or not * Use as patient Cross tabulation.
14 Chi-Square test for Table 13.
15 Participants in the medical field or not * Use as doctor/nurse Cross tabulation.
16 Chi-Square test for Table 15.
List of Abbreviations
2D Two Dimensional
3D Three Dimensional
API Application Program Interface
AR Augmented Reality
HCI Human-Computer Interaction
HMD Head Mounted Display
IDE Integrated Development Environment
MR Mixed Reality
PUN Photon Unity Networking
SDK Software Development Kit
VE Virtual Environment
VR Virtual Reality
1 Introduction
In this section, three of the main research areas of the topic are presented. It begins with VR, its history and evolution, as VR is the basis of this thesis. This is followed by an introduction to hand gestures in VR and, last but not least, an introduction to Telemedicine: what it is and how it is applied in VR.
1.1 Virtual Reality
The area of VR is continuously advancing and becoming more popular with the help and progress of new technologies and applications. Being in a Virtual Environment (VE) gives the user the sense of being somewhere else. As Bailenson (2018) notes in his book, the feeling of being somewhere other than where you were seconds before is called psychological presence. It is one of the main effects of using VR, and it is created with the help of a head-mounted display (HMD), which shows the VE to its user.
The idea and concept of VR were introduced a long time ago; an early description of VR is presented in the work of Sutherland (1965), where he first talked about a room in which "objects" can be materialized with the help of computers, thereby creating a new kind of display and user experience. Sutherland (1968) continued with a project in which users are presented with images that change as they move or turn their heads, creating the illusion of three dimensions (3D).
After the '60s, as VR technology progressed, it became valuable for training people in high-risk careers, such as pilots, astronauts, and surgeons. Now, the cost of setting up a VR system has become low enough for it to be available to the public: with just a smartphone and an HMD, a person can experience the sense of being somewhere other than where they were an instant before. VR has since been used in many fields, from science and engineering to entertainment, training, and education.
As mentioned by Mazuryk and Gervautz (1996), it is easier for the user to perceive data and other natural phenomena with VR and the help of computer graphics, so it can be ideal for applications in education.
For the experience, a user needs to wear an HMD. Some HMDs have their own two displays (one directed at each eye, exploiting human binocular vision, as Giraldi et al. (2003) mention) and head trackers. Others are used by placing a compatible smartphone in them.
Essential aspects of VR are the output and input channels that the Virtual Environment needs. As Gobbetti and Scateni (1998) explain, the input is what the users give to the VE, with either the movement of their head or their hands, while the output channels of the VE correspond to the users' senses of sight, hearing, and perception of touch. Input devices are what help the users give input to the system and so make control of, and interaction with, the VE feel immersive and natural. Pietroni (2013) describes natural interaction as the possibility for someone to interact with and explore a VE using only the body, and in this case specifically hand movements, in the simplest way.
Another element of VR to acknowledge for this study is the use of networked VR, where more than one user is present and can interact in the VE simultaneously, affecting each other's environment. It allows users to connect from a distance, and this element has been a part of VR from its beginnings, as many projects and games in multi-user VR have been developed since then (Schroeder, 1997).
1.2 Hand Gestures in VR
Hand gestures are a natural and instinctive way of communication between humans.
Nowadays, they can also be a way of communication between humans and computers.
As mentioned before, the user's hands can be a source of input for the VE through input devices such as gloves and controllers. The same can be done with just the bare hands and, in general, hand gestures. Thus, in recent years, hand gestures have become more and more popular in human-computer interaction (HCI).
The first study that introduced gestures in HCI was by Sutherland (1964), for which he created Sketchpad. Sketchpad was a system where the user could draw on a display and then the computer took that as information. Since then, many other handheld input devices have been made, such as the Oculus Touch controllers (Chen et al., 2019; Shum et al., 2019) and the HTC Vive controllers (Corporation, 2019).
As technology advances, so do hand tracking and hand gesture recognition. VR gloves are another kind of input device. They also come with different technologies, such as the use of fiber optics, or an exoskeleton for more precise tracking (Boas, 2013). Vision-based methods, on the other hand, give a more natural feeling to the HCI. This method requires a camera and computer software, as the hand postures and gestures are recorded by the camera (Garg et al., 2009). Hand gesture recognition can also be divided into two different technologies, one working on a two-dimensional level and the other in 3D. The latter also captures depth data, which makes its software harder to develop (Li, 2016). In this study, a 3D hand gesture recognition software is used for the hand tracking.
1.3 Telemedicine
Telemedicine has a literal meaning, which is ”healing at a distance.” It comes from the combination of two words, the Latin ”Medicus” and the Greek ”Tele” (Strehle and Shabde, 2006). From a study made by the World Health Organisation (WHO), a definition of Telemedicine was created, and it is as follows:
“Telemedicine is the delivery of health care services, where distance is a critical factor, by all health care professionals using information and communications tech- nologies for the exchange of valid information for diagnosis, treatment and prevention of disease and injuries, research and evaluation, and for the continuing education of health care providers, all in the interests of advancing the health of individuals and their communities”, (WHO et al., 1998).
Telemedicine has been conducted in many forms for many years, going as far back as the '70s and '80s, starting with the use of the telephone and later adding a form of video to improve the experience (Perednia and Allen, 1995). Today, telemedicine services can include video calls for consultation and diagnosis, or even for giving a prescription and general advice; remote monitoring of a patient or of a specific vital sign; and even medical education for doctors and nurses in training (ATA et al., 2006).
1.4 Motivation
The main concept of this project is to incorporate Telemedicine with mobile VR with the help of the Samsung Gear VR (Electronics, 2019), Samsung smartphones, and hand gesture recognition with Manomotion's (Manomotion, 2019c) Software Development Kit (SDK) for a more natural interaction. Hand gestures are considered a natural form of manipulation, as people use them in their everyday life and no other devices are needed for interaction in the VE (Pham et al., 2018). The goal is to give a doctor the opportunity to show and explain to the patient precisely what and where the problem is by sharing their view in the VE, for a more immersive experience.
Among other technologies, mobile phones are one of the means that can be used for Telemedicine and VR. Mobile Telemedicine has been developing, as have wireless communication technology and Telemedicine applications (Tachakra et al., 2003). For that reason, and for the ease of use that mobiles offer, they are also a part of this study. They are a low-cost option, and they gave this research the chance to use mobile VR and hand tracking.
VR has been used in medicine since the early '90s, either for the visualization of different data or for surgery. Especially in the field of surgery, VR has been used in several applications, such as surgery planning, training, and even during surgery itself. Over time, VR has also been applied to other areas of healthcare, such as the assessment, rehabilitation, and treatment of psychological disorders, among others. Some of the latest VR devices, namely Samsung's Gear VR and Google's Google Cardboard (see Table 1), are less expensive and available to the general public, as they operate with the help of smartphones and do not have a built-in screen display. They can be used with a user's own mobile phone and so do not need a technician on standby, as in other cases when VR was used in a hospital (Giuseppe and Wiederhold, 2015).
VR Device          Cost
Samsung Gear VR    $129.99 (a)
Google Cardboard   $15.00 (b)

(a) Price listed is the suggested retail price on the official website; reseller prices may vary.
(b) Price listed is from the official Google website for the Google Cardboard.
Table 1: VR Headset Devices and their cost.
In addition, it would be an even better option not to have to add any other input device, and instead have hand gestures as the input source. To conclude, the scope of the prototype created for this research is to depict a general doctor consultation in which the doctor has already performed examinations on the patient, such as X-rays, and is explaining to the patient where and what the problem is.
1.5 Research questions
One of the goals of this study is to see whether the use of hand gestures in a VR application for Telemedicine can be of use and improve the users' experience, and if so, how. More specifically, the first research question (RQ1, see Table 2) was formed from that; it examines how this application can affect that experience. The second research question (RQ2, Table 2) aims to see how potential future users perceive the technical application of the technologies behind the proposed system, and examines whether it can demonstrate the main idea, its potential for further development, and its usability.
RQ1 How does the use of 3D hand gestures in Virtual Reality affect the doctor’s experience of a general medical consultation with Telemedicine?
RQ2 How is the system perceived by potential users?
Table 2: Research Questions
1.6 Thesis Structure
The thesis is divided into seven chapters and their sub-chapters. Chapter 1 presents the introduction and information on the thesis' primary objectives, including what it is about and some background information, followed by the motivation of this study and the main research questions. Chapter 2 presents previous related work in the field of hand gestures and telemedicine in VR. The methodology used and an explanation of the approach of this study are found in Chapter 3.
Chapter 4 is devoted entirely to the prototype: how it was made, the tools used, information on the final result, and how it works. Chapter 5 gives information about the user study, how it was conducted, and the results and their analysis. Chapter 6 contains the discussion of the final results, and the thesis concludes with Chapter 7, a brief overview of the research, its limitations, and the work that could be added to extend this study in the future.
2 Related Work
In this chapter, a selection of studies and previous works in the field of this research are introduced. They were discovered while doing the literature review at the start of this project. The references were found and gathered with the help of ResearchGate, ScienceDirect, and Google Scholar.
As the leading field for applying VR in medicine is surgery, it is relevant to start with VR applications related to it. One of the first projects was by a company called SRI International: its design, the Green Telepresence Surgery System for military medicine (Satava, 1997). The design (Figure 1) would locate a critically wounded soldier, place him in an automated intensive care unit, and then a surgeon would operate remotely on a virtual image using instruments that have the feel of real surgical instruments. A robot would then be with the wounded soldier and mimic the surgeon's movements to operate on him.
Figure 1: SRI's "Green Telepresence System" (Westwood, 1998)

Another system that made many appearances during the literature review was the MIST VR system, which trains the user for laparoscopic surgery in VR. It comprises a computer hosting the VE, which shows graphics in an abstract form so as not to distract the users from their tasks. Its input devices are two laparoscopic instruments held by position sensors, which in turn send their data to the computer, which displays them in real time with the help of two graphical models of the instruments (Wilson et al., 1997).
Another, more recent study, by Bing et al. (2019), is related to surgical practice training with VR using the Oculus Rift and its hand controllers. The aim was to improve surgeons' ability to carry out successful oncology procedures. Their results showed that this solution may help novice surgeons learn more complex procedures and that this low-cost VR might serve as a global solution for cancer surgeries and cancer health care.
In addition to conducting and training for surgeries, VR has helped with their planning. An example is the system called "LiverPlanner" (Reitinger et al., 2006), which allowed an easier way to plan liver cancer removal in three phases. First came an image analysis, in which data from computed tomography were added to generate the anatomical models. Stage two was segmentation refinement, and lastly treatment planning. The system allowed 3D interaction with, and visualization of, the liver model. It used a wall projection, shutter glasses, and a system that tracked the user's head and input devices, one of which was their own self-designed controller called the "Eye of Ra". A second display for two-dimensional (2D) input could also be used.
For therapy and treatment, an interesting study is the one by Son et al. (2015), which assesses the effectiveness of treating alcoholism with VR therapy. To do that, they evaluated changes in the brain metabolism of their participants. They used a 3D screen monitor to first show a peaceful image to let the participant relax, then showed a virtual drinking situation with different alcoholic beverages, which they described as a "high-risk situation", and finally an aversive situation showing a scene of people who had drunk and then vomited. At the same time, psychologists gave the participants a glass of fermented milk in order to simulate the taste of vomit.
Another VR study, by Jack et al. (2001), relates to stroke rehabilitation and shows how they used a system to help with hand rehabilitation after a stroke. Their system was composed of a computer running the VE developed for the research and two input devices. One was a CyberGlove with sensors that measured the angles and positioning of the fingers and wrist. The second was the RMII Glove, an exoskeleton device that also used sensors and applied some force to the fingers in order to measure the hands' responses.
Concerning hand gesture technologies, a study by Weissmann and Salomon (1999) researched hand gestures for HCI in VR applications. Their input device was a data glove (CyberGlove) that gave the values of the positions of the fingers and their joints. They specifically used the fist, index finger, and victory sign as gestures and compared their performance on different neural network models in order to see the recognition success rate.
Other research used a hand tracking device called the Leap Motion. Potter et al. (2013) researched the Leap Motion's suitability for recognizing Australian Sign Language. The Leap Motion is a sensor device that can be plugged into a computer or an HMD to track hand gestures. Their study concluded that the device was not ready to fully recognize all the elaborate gestures, but as the research was done in 2013, it is likely that the Leap Motion has developed considerably since then.
Marin et al. (2014) made a study comparing the Leap Motion with another bare-hand recognition device, the Kinect. They extracted data from both devices, using the same hand gestures, in order to compare their performance, but also noted that the two devices could be combined for more accurate hand recognition.
A paper by Alshaal et al. (2016) presents their work on a system whose goal is to improve the usability of VR applications, as well as their functionality. That is achieved by using wearable devices that collect data from hand gestures. Their example is a retail application that allows users to browse and roam inside a clothing shop and also buy products. They use the Oculus Rift as an HMD and a glove to recognize the hand gestures and interact in the VE.
YAN et al. (2019) published an article on their work designing a system that uses hand and head gestures to interact with virtual objects and control movement. More specifically, the acquisition of different targets was done according to the shape and size of each object, so the gestures were adapted to each one accordingly.
An important article to mention here is the work of Georgiadis and Yousefi (2017), in which they presented the results of using hand gestures in a multiplayer mobile VR game and how that enhanced the users' experience. Their results showed that it is possible to offer a natural solution for interacting in a VE. They also mention that the presence of multiple users in the VE affected the experience positively: the feeling of presence was more profound.
3 Methodology
In this chapter, the research method and data gathering are introduced.
3.1 Mixed Method Research
There have been different definitions of what a quantitative research method is. In summary, it is research in which empirical methods are applied, so that phenomena are expressed through numerical data with the help of mathematics and statistics (Sukamolson, 2007). In this study, some data were not originally in numerical form, but with the help of data collection instruments, such as questionnaires, they were transformed into numerical data.
Due to the limitations of expressing abstract thoughts through numerical vari- ables, some researchers began to contemplate other forms of data as valid, opening the door to the qualitative research paradigm. This research approach allows the researcher to get data from the participants’ thoughts, opinions, and behavior. The data are not numerical but descriptive, and they are acquired from interviews, voice or video recordings, and observations from the researchers, their notes, and open-ended questions (Eyisi, 2016).
In the last couple of decades, a third research method has emerged. The mixed method is a combination of the two previous methods, with data collection drawing on approaches from both. The advantage of mixed method research lies in the fact that the approaches cover each other's weaknesses while their strengths complement each other. It supplies data in different forms that help address more complex problems (Gunasekare, 2015).
In this research, mixed method research seemed the most appropriate. The research is done through a combination of experiment sessions that are recorded, observations and notes taken from those recordings, and questionnaires that provide both numerical and descriptive data, since they include Likert-style questions, Yes or No questions, and open-ended questions. Because of that, the participants can give a more in-depth opinion. The combination of all these types of data allowed the answers to the research questions to be complete and more thorough.
3.2 Informed Consent Form
For the testing sessions with the participants, an informed consent form (Appendix A.1) was prepared for them to read and sign. Its format was based on an informed consent form template from the Memorial University of Newfoundland.¹ It starts with general information on the researcher, the purpose of the study, and the participants' roles in it. The participants are informed of how long the session will take and that they are free to withdraw from the study at any time, in which case their data would be destroyed.

There is a paragraph showing the main "Health and Safety warnings" for the VR headset, as described by Oculus, together with a link to their online documentation. Regarding confidentiality, the participants are informed that their names will not be used in the study; instead they are referred to with coded names such as Participant 1, Participant 2, and so on. To avoid any association between their identities and their data, each participant is referred to with the singular pronoun "they" and its derivatives, so that references to their gender are avoided. It is also stated that the sessions will be recorded, but participants have the option not to agree to this. The form ends with the date and signatures.

¹ Informed Consent Template
The informed consent form was given to the participants to read at the start of the sessions, and in some cases beforehand, in order to save time for the testing. After reading it, they could ask further questions and then signed the form. They were advised to take and keep pictures of it or, in some cases, make a copy with a printer. All the participants agreed and signed with no problems or objections.
3.3 Expert Discussions and Pilot Users
In order to get more in-depth expert opinions, three meetings were arranged with different experts. They all offered much-valued information, opinions, and ideas. The first meeting was with a nursing teacher. At that time the prototype was at a very early stage, so she was able to explore the pre-prototype, the very first steps of its development. She also checked the mock-ups first created to convey the concept, was presented with the main idea together with additional explanation and background information, and was then able to give her opinion and ideas for future development.
The second meeting was with a technical expert working at Manomotion. By that time the prototype was finished, so he saw the end result. The third meeting came after that, with two doctors: an endocrinologist and a general doctor. Similarly, they saw the working prototype, were introduced to the topic of the research, and gave their feedback and thoughts.

For the testing, the first participants were two pilot users. They were informed of that fact and asked to offer any feedback they might have after the test, so that the sessions with the remaining participants could be conducted in a better way. After the pilot tests, 22 more participants answered the questionnaire. So, in total, the research study includes 24 participants, of whom many were acquaintances and some were met for the first time, adding reliability to their answers on the questionnaire and their opinions, as they were not biased. In addition, the participants were in different age groups, the youngest in their late teens and the oldest in their late fifties. Some had previous experience with VR and some did not, as shown in Table 3. The participants have various occupations, including some who work in the medical field: among them nurses, nurses in training, a psychotherapist, a physiotherapist, and a surgery nurse. These diverse characteristics add to the reliability previously mentioned, as they reflect what would be expected from the different users of a future product based on this system.
            Previous experience with VR
Age         No      Yes     Total
19          1       0       1
21          2       0       2
22          1       0       1
25          1       3       4
26          1       1       2
27          1       1       2
28          2       1       3
29          0       1       1
30          1       0       1
37          0       1       1
44          1       0       1
45          2       0       2
53          2       0       2
56          1       0       1
Total       16      8       24
Table 3: Age and previous participant experience with VR
3.4 Participant Questionnaire
The questionnaire for the participants was available to be filled in after the testing was completed (Appendix A.2). It could be answered on either a personal computer or an iPad, so that they could choose between them or answer at the same time if both were ready. The questions were based on models by Brooke et al. (1996) and Gil-Gómez et al. (2013). Some were kept as they were, others were altered, and some were added in order to obtain the data needed to answer the research questions.
The questionnaire starts with the scenario in the form description. The name and email follow; these are required so that it is clear which participant gave which data during the analysis. Age is also asked before the questions begin.
The first thirteen are Likert-type questions (Boone and Boone, 2012) with five response alternatives, ranging from "Strongly Disagree" on the left to "Strongly Agree" on the right. Eight Yes or No questions follow. For those, open-ended questions are also asked in order to allow the participants to elaborate on their answers. Finally, an open-ended question was added to let the participants share any other opinions and thoughts on the project.
3.5 Usability Questionnaire
In order to assess the usability of the system, notes were also taken in the form of a questionnaire (Appendix A.3), based on a VR user interface usability model by Sutcliffe and Kaur (2008). The questionnaire was filled in after each session, one for each participant in the doctor role. It comprises nine Yes or No questions, five open-ended questions for short answers, and one for a long answer to add any additional comments on the session. These two questionnaires add validity to the data as they are based on previous works on VR systems.
3.6 Data Analysis
Descriptive coding will be used for the data that are in the form of text and notes. That means they will be examined and summarised according to their main patterns. The numerical data are analysed using descriptive statistics, frequencies, the Chi-square test, and the one-way analysis of variance (ANOVA), with the help of Google Sheets and IBM SPSS Statistics 24. The descriptive statistics help to visualise and summarise the raw data, mostly the Likert-type data. With the use of frequencies, the data from the Yes or No questions can be described.
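As a concrete illustration of how frequencies describe the Yes or No data, the short sketch below counts responses to a single question. The counts are purely hypothetical; the real data come from the questionnaire in Appendix A.2 and were analysed in Google Sheets and SPSS.

```python
from collections import Counter

# Hypothetical Yes/No responses from the 24 participants to one question;
# the real analysis was done in Google Sheets and IBM SPSS Statistics 24.
responses = ["Yes"] * 18 + ["No"] * 6

freq = Counter(responses)
percentages = {answer: count / len(responses) * 100
               for answer, count in freq.items()}

print(dict(freq))   # {'Yes': 18, 'No': 6}
print(percentages)  # {'Yes': 75.0, 'No': 25.0}
```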
In order to draw additional conclusions from the previous two processes, the Chi-square test, and specifically the Pearson chi-square test, is conducted. The Chi-square formula (Formula 1) contains the variables χ², the Chi-square statistic, O, the observed frequency, and E, the expected frequency.

χ² = Σ (O − E)² / E    (1)
With that, the distributions of two categorical variables can be compared to see whether they are similar to what was expected. If the observed frequencies are similar to the expected ones, the variables have no association, and we have the null hypothesis (H0). If the opposite holds and the variables are associated, we have the research hypothesis (H1).
In order to check whether the null hypothesis should be accepted, a significance level is established, here a p-value of 0.05 with 95% confidence. If the significance is more than 0.05, it can be argued with 95% confidence that the two variables are distributed in the expected manner and therefore there is no association between them, and H0 is accepted. If the significance is less than the 0.05 p-value, H0 is rejected, which means that the two variables are statistically associated (Turner, 2014).
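Formula 1 can be checked with a short calculation. The sketch below computes the Pearson chi-square statistic directly from observed and expected frequencies; the 2×2 contingency table is purely hypothetical (e.g. medical background vs. previous VR experience), as the actual tests were run in IBM SPSS Statistics 24.

```python
# Pearson chi-square computed directly from Formula 1: chi2 = sum((O - E)^2 / E).
# The 2x2 table below is hypothetical; expected counts come from the
# row and column totals, as in the usual contingency-table setup.
observed = [[3, 5],
            [5, 11]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected frequency E
        chi2 += (o - e) ** 2 / e

# For a 2x2 table (1 degree of freedom), the critical value at p = 0.05
# is 3.841; this chi2 is far below it, so H0 would be accepted here.
print(round(chi2, 3))  # 0.094
```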
The ANOVA test is made in order to compare the means of two or more groups of a variable and tests whether the means are statistically different. Its formula (Formula 2) expresses H0 in terms of the group means µ1, µ2, ..., µκ, where κ is the number of groups.

H0: µ1 = µ2 = . . . = µκ    (2)

The null hypothesis is that the means of the groups are statistically equal, so there is no significant difference. As in the Chi-square test, the significance level is set at the 0.05 value with 95% confidence. That means that if the significance is higher than the p-value, the null hypothesis is accepted. If it is less, the alternative hypothesis (H1) is accepted, which means that the means of the groups are statistically significantly different (Kim, 2017).
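The F statistic behind Formula 2 can be computed by hand for two groups. The scores below are hypothetical Likert values (e.g. participants with vs. without VR experience); the actual tests were run in IBM SPSS Statistics 24.

```python
# One-way ANOVA F statistic for two hypothetical groups, testing Formula 2's
# H0 that all group means are equal.
groups = [
    [4, 5, 4, 3, 5],  # hypothetical group 1 scores
    [2, 3, 3, 4, 2],  # hypothetical group 2 scores
]

n = sum(len(g) for g in groups)          # total observations
k = len(groups)                          # number of groups (kappa)
grand_mean = sum(sum(g) for g in groups) / n

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 3))  # 7.0
```

A large F relative to the critical value for (k − 1, n − k) degrees of freedom leads to rejecting H0, exactly the decision rule described above.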
4 Prototype
The concept of the prototype originally differed greatly from the final product. Two different scenarios were taken into consideration, of which the second was chosen. The first involved augmented reality (AR), where the users would be able to interact with a 3D object of a human anatomical part, such as a heart, using their phones while on a video call. The mock-up of this concept and its scenario can be found in Appendix B.
4.1 Technological Aspects
The hardware consists of two VR headsets and two Android phones. The first headset is the Samsung Gear VR (SM-R322)², a model made in 2015 that is compatible with some of Samsung's older devices, the earliest being the Samsung S6 and the latest the Samsung Galaxy Note7. In this research, it is paired with a Samsung S7 Edge smartphone (Figure 2).
The second VR headset is the Andersson VRG 1.0³, which is compatible with smartphones between 4.6 and 6 inches in screen size. This headset was paired with a Samsung S9 smartphone (Figure 2).
Figure 2: From left to right: Samsung Gear VR with S7 Edge and Andersson VRG with S9.
4.1.1 Unity
The system is built in the Unity software from Unity Technologies, which allows the creation of interactive games and applications in 2D and 3D, in VR and AR.
For the completion of the application, three more software technologies were used, added to Unity's integrated development environment (IDE).
Unity was chosen as the development software because it is an IDE capable of building applications that are not only games. It allows the import of assets from many external sources and is compatible with other technologies such as the hand tracking software. For that reason, it was the first choice for the creation of the prototype. The programming language used to develop the application in Unity's IDE is C#. In order to build the application on the Android phones for VR, some settings in Unity need to be changed and the right software development kit (SDK) for VR chosen. In this case, the application was built once with the Oculus SDK for the Gear VR headset and once with the Google Cardboard SDK for the other.

² https://www.samsung.com/us/support/owners/product/gear-vr-2015
³ https://productz.com/en/andersson-vrg-1-0-vr-headset
4.1.2 Manomotion
Another SDK that was imported was the Manomotion SDK (Yousefi et al., 2018). Manomotion's concept originated in research by Yousefi and Li (2015), who introduced a solution that used a database of hand gestures in order to allow real-time hand gesture interaction in a VE.
Initially, the application was developed with the Manomotion SDK 2.0. Different issues with this version, described below, together with the technology's learning curve, resulted in missed project deadlines and much lost time. Although that version has more features, it requires a calibration of the user's hand as a first step, which is impractical for VR. On top of that, it was hard to network the hand tracking and the hand model, as it required tracking and networking not just one point but many more. On the advice of a Manomotion expert, it was later replaced with the Manomotion SDK Lite (Figure 3), which made developing easier and solved a lot of functional problems. The 2.0 version of the SDK gives more freedom in development, as it is possible to create custom hand gestures, and it tracks the entire hand, including the fingertips, palm centre, and contour points. In addition, there is a bounding box that encases the hand.
The Lite version tracks only the bounding box around the hand: its width, height, and top-left corner position. The SDK categorizes hand poses into three groups called Manoclasses: the grab, the pinch, and the point. Hand gestures are divided into trigger gestures (gestures that trigger an event, similar to a mouse click) and continuous gestures (gestures that are performed over a period of time).
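The trigger/continuous distinction can be sketched as a small dispatcher. The gesture names and handler below are purely illustrative, not Manomotion's API; as noted later, the prototype itself uses only continuous gestures.

```python
# Illustrative sketch of trigger vs. continuous gestures; the gesture names
# and this handler are hypothetical, not the Manomotion SDK.
TRIGGER_GESTURES = {"click", "release"}                      # fire one event
CONTINUOUS_GESTURES = {"open_hand", "closed_hand", "point"}  # held over time

def handle(gesture, duration_frames):
    if gesture in TRIGGER_GESTURES:
        return f"{gesture}: fired once"
    if gesture in CONTINUOUS_GESTURES:
        return f"{gesture}: active for {duration_frames} frames"
    return f"{gesture}: unrecognized"

print(handle("point", 45))  # point: active for 45 frames
```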
Figure 3: Manomotion SDK Lite: SDK Features (Manomotion, 2019b)
4.1.3 Photon Engine
Photon⁴ is a company that offers networking and multiplayer solutions. One of their products is the Photon Unity Networking (PUN) package, which is included in Unity's asset store for free and can be downloaded and imported directly from there. While development of the project was underway, Photon upgraded the product to version 2, which was then used instead, again resulting in the loss of much-needed time. Little documentation was available for PUN 2, both from the official website and from its community. The differences between the two versions were not major, but they lay in small details (uppercase/lowercase letters in keywords), so a guideline and further research were needed. Despite that, PUN 2 made it possible for the application to be used by multiple users at the same time. The communication is done through Photon's servers and their cloud, which hosts the games/applications. Each application has its own room, which is created or joined with its AppId. Rooms have a maximum number of users, so if one is full, a new one is created. Users in the same room receive whatever the others send, but nothing from outside of it. Objects inside the VE can be instantiated and networked by adding the PhotonView component to them. It allows the object to be identified, along with the user it belongs to. The user that has ownership of an object is the one that sends its updates to the other users.

⁴ https://www.photonengine.com/en-us/Photon
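The join-or-create room behaviour can be sketched as a few lines of logic: a full room leads to a new one being created. The dict-based "server", room names, and the two-player cap below are illustrative assumptions, not the PUN 2 API.

```python
# Minimal sketch of join-or-create room logic; the real prototype does this
# through PUN 2 against Photon's cloud servers.
MAX_PLAYERS = 2  # the prototype only needs a doctor and a patient

def join_or_create(rooms):
    """Return a room with a free slot, creating a new one if all are full."""
    for name, players in rooms.items():
        if len(players) < MAX_PLAYERS:
            return name
    new_name = f"room{len(rooms) + 1}"
    rooms[new_name] = []
    return new_name

rooms = {"room1": ["doctor", "patient"]}  # room1 is already full
print(join_or_create(rooms))  # room2
```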
4.2 First Stage
The second scenario of the concept was the one finally picked, as mentioned before. Following are the scenario and the mock-ups that were made. Patient B has had heart problems for some time, so he is often in contact with his doctor. One day he experiences pain and immediately video calls his doctor, as it is a day when the clinic is not open. The doctor puts on his VR headset and shares his screen with the patient to show an anatomical heart so that he can explain the situation (Figure 4). In order to show some details better, he makes a hand gesture to enlarge the model / bring it closer to the screen (Figure 5).
Figure 4: First mock-up image with Doctor and Patient view.
Figure 5: Second mock-up image with Doctor enlarging the heart model.
Although this was the first idea, the end result has some very different characteristics. The prototype was developed in VR as initially planned, but with both users wearing HMDs: the doctor and patient are in the VE in real time (multi-user), and only the doctor can interact in the VE. Also, in the current prototype, the heart model has been replaced with a 3D skeleton model, and the hand representation is done with a 3D virtual hand model. The original idea used the pinch gesture, whereas now there are three hand gestures, different from it.
4.3 Environment
The application has two scenes. The IntroScene, where all the connections with Photon are made and the PlayScene that opens when the users have entered the Photon room.
Three objects exist in the environment of the scenes: a skeleton model (Figure 6) that represents the physical body of the patient, the virtual hand that represents the physical hand of the doctor, and the texts that appear only in the doctor's scene with the explanation of the patient's problems. The skeleton model consists of models of different human bones and organs: the bones of the head and torso, and the organs of the heart and lungs. In Figure 6, they can be seen both as part of the skeleton model and extracted from it.
Figure 6: Left: Skeleton model with organs in, Right: Skeleton model with organs out.
The virtual hand consists of a series of spheres and cylinders arranged to reflect the human hand with its fingers, fingertips, joints, and palm (Figure 7). The prefabs of the models were provided by Manomotion.
Figure 7: Hand model.
The texts are created with a white background image so that the words can be seen clearly. The text for the heart reads: "From the X-rays, it was shown that the heart is enlarged. Extra tests will be needed in order to find the cause." The text for the lungs reads: "From the X-rays, it was shown that there is consolidation in the lungs. According to the rest of the symptoms, it appears to be the result of pneumonia."
In addition, when the users enter the PlayScene, when the application is ready to be used, the horizon in the background changes colour to a slight yellow instead of the clear blue it has in the lobby (IntroScene), due to the different lighting settings of the scenes.
4.4 System Overview
The application begins with the connection to the Photon server, which places the users in a shared room. The user that connects first automatically acquires the role of the doctor, and the second one the role of the patient. The model of the hand appears in front of the skeleton (Figure 8), and when the doctor places their own hand in front of the phone camera in the open hand gesture, the model takes the x and y coordinates from it (Figure 9). The depth (z) is set at a specific point. The patient sees the movement of the model hand in the environment.
Figure 8: First view of skeleton and hand model without hand tracking.
Figure 9: Skeleton and hand model with hand tracking.
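The placement of the virtual hand can be sketched as a small mapping: the tracker supplies (x, y) while z stays fixed. The fixed depth, the normalized input range, and the world-space scaling below are illustrative assumptions, not values from the prototype.

```python
# Sketch of the hand placement described above: tracked (x, y) plus a fixed
# depth. FIXED_Z and the scaling factors are hypothetical.
FIXED_Z = 0.5  # constant depth in front of the skeleton

def hand_to_world(x_norm, y_norm, width=2.0, height=2.0):
    """Map normalized tracker coordinates in [0, 1] to centred world space."""
    x = (x_norm - 0.5) * width
    y = (y_norm - 0.5) * height
    return (x, y, FIXED_Z)

print(hand_to_world(0.5, 0.5))  # (0.0, 0.0, 0.5)
```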
The doctor performs the closed hand gesture, which the patient sees, and the three organ models in the skeleton, the heart and the two lungs, are extracted from it so that they are visible (Figure 10). The patient can now see them too.
Figure 10: Closed hand gesture and organs are extracted.
The doctor performs the point gesture, as the patient continues to observe, moves the hand model in front of the heart, and the text about its condition (the doctor's notes) appears in the lower left corner next to it (Figure 11)⁵.
Figure 11: Point gesture and text for heart model.
The doctor can also move the hand model in front of the models of the lungs, where the text about their condition appears in the lower right corner next to them (Figure 12). The patient still observes the movement of the hand, now in the point gesture, but cannot see the texts. The diagram of the system overview can be seen in Figure 13.
⁵ The text has a rotation as the screenshot was not taken on the VR build.
Figure 12: Both texts visible just for the doctor.
Figure 13: System Overview Diagram
4.5 Networking
The first step in creating the prototype was to include networking in order to have a multi-user application (in this case, two users). The goal is to have each user instantiated in the same environment when they open the application, and so have synchronous communication. They both have the same view, but only the first user, in this case the doctor, can interact inside the environment. The second user, the patient, is only able to observe. The networking is achieved with the help of the Unity plugin PUN 2.
When the application is loaded by the first user, they enter the IntroScene, where it first connects to Photon's settings and the PhotonLobby, from where it connects to the Master server, a hub of the geographical region's servers. Then the application tries to join a random room; if that fails, it creates a new one, and the PlayScene loads. The user (or player, as they are referred to in the code) is instantiated with the virtual hand in the room, and if they are the first user in the room, they are assigned the Doctor player type. It is then checked what kind of player type they are, Doctor or Patient. If the user has the Doctor player type, they can access the virtual hand and interact with the environment. When the second user enters the room, they are assigned the Patient player type and are not able to interact with the environment. The first thing the users see in the VE when they enter is the skeleton model and the virtual hand, which works as the representation of the first user's real hand.
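The join-order role assignment can be sketched as follows: the first user in a room becomes the Doctor and may interact, while the second becomes the Patient. The class and field names below are illustrative, not PUN 2 code.

```python
# Sketch of the join-order role assignment used by the prototype: the
# first player in a room is the Doctor, everyone after is a Patient.
class Room:
    def __init__(self):
        self.players = []

    def join(self, name):
        role = "Doctor" if not self.players else "Patient"
        player = {"name": name, "role": role,
                  "can_interact": role == "Doctor"}  # only the Doctor interacts
        self.players.append(player)
        return player

room = Room()
print(room.join("user1")["role"])  # Doctor
print(room.join("user2")["role"])  # Patient
```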
Objects in the scene can be synchronized between the two users by adding the PhotonView component to an instance of an object, as it is what connects that instance with its counterparts in the other users' scenes. As the virtual hand also has the role of the first user's avatar in the environment, Photon's PhotonTransformView component allows its instance in the second user's view to synchronize with it (by also adding it as an observed component in the PhotonView component). That way, the Patient can see the virtual hand moving in the scene as the Doctor moves it. When Manomotion's hand tracker recognizes one of the three hand gestures, the hand skeleton enables the animation that matches the corresponding hand gesture. Similarly, Photon has the PhotonAnimatorView component, which allows animations to be synchronized (Figure 14).
Figure 14: Inspector of the virtual hand with the PhotonView, PhotonAnimatorView and PhotonTransformView components.
The skeleton similarly has a PhotonView with an observed PhotonAnimatorView in order to synchronize its animation when the Doctor causes it to extract or retract the two organs. As the organs are part (children) of the skeleton, they do not need to be networked on their own.
4.6 Hand Gestures
The virtual hand represents the real hand of the first user in the environment. It follows the position of the hand and replicates the hand gestures that the real hand is performing. That is achieved with the help of Manomotion's SDK⁶. Specifically, the hand gestures used here are the open hand, the closed hand/grab, and the point gesture (Figure 15).
Figure 15: Hand Gestures: Open Hand (upper left), Closed Hand/Grab (upper right), Point (down). (Manomotion, 2019a)
All three are continuous hand gestures. The ManomotionManager is added to the virtual hand; with it, the hand position is calculated from the bounding box centre.
If the closed hand gesture is performed, the closed hand animation and the skeleton's expand animation are executed. Similarly, with the open hand gesture, the open hand animation and the retract animation are executed.
The point gesture does not have an immediate effect on the skeleton. While the organs are extracted and the point gesture is performed, the hand, specifically the tip of the index finger, can be placed in front of an organ for the corresponding text to pop up. That is achieved with a Unity method called Physics.Raycast. Essentially, it casts an invisible ray from a desired point, in this case the tip of the index finger on the virtual hand, and when that ray hits the assigned object, in this case one of the organs, an event can be triggered.
⁶ https://www.manomotion.com/
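The raycast check described above can be illustrated with a small geometric sketch: a ray from the fingertip is tested against an organ, approximated here as a sphere. Unity's Physics.Raycast actually tests against colliders; the sphere approximation and all coordinate values below are illustrative assumptions.

```python
import math

# Illustrative ray-sphere intersection standing in for Physics.Raycast:
# the ray starts at the fingertip; the "organ" is approximated as a sphere.
def ray_hits_sphere(origin, direction, center, radius):
    """True if the ray origin + t*direction (t >= 0) intersects the sphere."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    a = sum(c * c for c in direction)
    b = 2 * (ox * direction[0] + oy * direction[1] + oz * direction[2])
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4 * a * c  # discriminant of |o + t*d|^2 = r^2
    if disc < 0:
        return False  # the ray's line misses the sphere entirely
    t1 = (-b - math.sqrt(disc)) / (2 * a)
    t2 = (-b + math.sqrt(disc)) / (2 * a)
    return t1 >= 0 or t2 >= 0  # a hit only counts if it lies ahead of the origin

# Fingertip at the origin pointing towards an organ centred at (0, 0, 2)
print(ray_hits_sphere((0, 0, 0), (0, 0, 1), (0, 0, 2), 0.3))  # True
```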