Towards a Method to Detect F-formations in Real-Time to Enable Social Robots to Join Groups

http://www.diva-portal.org

Preprint

This is the submitted version of a paper presented at ECCE Workshop 2017: Robots in Contexts: Human-Robot Interaction as Physically and Socially Embedded, conducted at Umeå University, Sweden, 19 September, 2017.

Citation for the original published paper:

Krishna, S., Kiselev, A., Loutfi, A. (2017)

Towards a Method to Detect F-formations in Real-Time to Enable Social Robots to Join Groups.

In: Towards a Method to Detect F-formations in Real-Time to Enable Social Robots to Join Groups. Umeå, Sweden: Umeå University

N.B. When citing this work, cite the original published paper.

Permanent link to this version:

Towards a Method to Detect F-formations in Real-Time to Enable Social Robots to Join Groups

Sai Krishna
Center for Applied Autonomous Sensor Systems (AASS), Orebro University
Orebro, Sweden
sai.krishna@oru.se

Andrey Kiselev
Center for Applied Autonomous Sensor Systems (AASS), Orebro University
Orebro, Sweden
andrey.kiselev@oru.se

Amy Loutfi
Center for Applied Autonomous Sensor Systems (AASS), Orebro University
Orebro, Sweden
amy.loutfi@oru.se

ABSTRACT

In this paper, we extend an algorithm to detect constraint-based F-formations for a telepresence robot and also consider the situation when the robot is in motion. The proposed algorithm is designed explicitly for a mobile robot: it is computationally inexpensive, uses egocentric (first-person) vision, requires little memory, tolerates low-quality vision settings, and works in real time. The proposed approach is a first step towards automatically detecting F-formations for the robotics community.

CCS CONCEPTS

• Human-centered computing → Human computer interaction (HCI); • Computing methodologies → Computer vision; Vision for robotics;

KEYWORDS

Social Robot, F-formations, Face Orientation

ACM Reference Format:

Sai Krishna, Andrey Kiselev, and Amy Loutfi. 2017. Towards a Method to Detect F-formations in Real-Time to Enable Social Robots to Join Groups. In Proceedings of ECCE Workshop, Umea, Sweden, September 2017 (ECCE ’17), 2 pages.

https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

Humans organise themselves spatially while interacting with others (social interaction). As social robots increasingly enter our homes and workplaces, it is equally important that these systems promote conditions for good social interaction by exploiting the natural spatial interactions that humans use. The spatial orientations that humans use have been described by Adam Kendon [1] as Facing formations, known as F-formations, which are the spatial and orientational relationships between two or more people conversing with each other. Kendon proposed four standard formations. Vis-a-Vis is when two people interact while facing each other. Side-by-Side is when two people stand close to each other and face the same direction while conversing. L-shape is when two people face perpendicular to each other, standing on the two edges of the letter ‘L’. When three or more people converse in a circular arrangement, it is a Circular formation.

Figure 1: An F-formation gives rise to three social spaces. O-space: the convex empty space; P-space: the narrow strip on which people are standing; R-space: the area beyond the P-space. Left: Circular, center: L-shape, right: Side-by-Side.

Many researchers have proposed methods to detect F-formations. A few of them are as follows. Cristani et al. [2] used a Hough voting strategy to locate the O-space. Graph-Cuts for F-formation (GCFF) [3] detects groups in still images using proxemic information. Vascon et al. [4] generate a frustum based on the position and orientation of each person and compute affinities to extract F-formations. In [5], the authors consider body orientation as the primary cue and propose a joint learning approach to estimate the pose and F-formation for groups in videos. These methods come from the computer vision community.

Vazquez et al. [6] explored this problem in the robotics community and proposed detecting F-formations based on lower-body orientation estimation, but their method uses an exocentric camera (an overhead video data set). Detecting F-formations in real time on a mobile robot therefore remains an unsolved problem. Our earlier paper [7] explored this topic and detected F-formations based on face orientation. The algorithm proposed there works in real time, uses little memory and low-quality vision settings, is computationally inexpensive, and uses egocentric (first-person) vision. It still has a few limitations: it works only in laboratory settings, and neither constraint-based formations nor the situation when the robot is in motion were addressed.

Constraint-based formations are the following. A Triangle formation arises when one person faces two or more people while interacting, for example at ticket counters. A Rectangle formation is formed in board meeting rooms. A Semi-Circular formation is formed when three or more people focus on the same task while interacting with each other, for example in front of a wall while looking at a piece of art.

In this paper, we extend our algorithm with a few more hypotheses to detect constraint-based F-formations and also consider the situation when the robot is in motion.

2 METHODOLOGY

Our method [7] uses a Haar cascade face detector to detect the faces of the people in the scene. Once the faces are detected, we locate the eyes to obtain a rough estimate of each person’s orientation.

The methodology is based on quadrants. In our model, we assume that if a person is looking towards the right, both eyes are located in the first quadrant; if the person is looking towards the left, both eyes are located in the second quadrant; and if the person is looking towards the center, one eye is located in each of the two quadrants. This is obviously a very rough approximation, but we are interested only in the facing direction of the person and not in calculating the exact angle of the head. After estimating the facing direction, the F-formations are detected by mapping these face orientations to the spatial arrangements, depending on the number of people.
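The quadrant rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It assumes that the quadrants are defined with the origin at the centre of the detected face box, so that the first quadrant is the upper-right part of the box and the second quadrant is the upper-left part as seen in the image. Because the eyes lie in the upper half of the face, only the horizontal position is checked. The names `Box`, `Point`, `eye_quadrant` and `facing_direction` are hypothetical, and the returned labels simply reuse the wording of the text.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a face box, in pixels
Point = Tuple[float, float]      # (x, y) centre of a detected eye, in pixels


def eye_quadrant(face: Box, eye: Point) -> int:
    """Return 1 if the eye lies right of the face box's vertical midline, else 2.

    Assumption: first quadrant = upper right, second quadrant = upper left,
    with the origin at the centre of the face box.
    """
    x, _, w, _ = face
    return 1 if eye[0] >= x + w / 2 else 2


def facing_direction(face: Box, eyes: List[Point]) -> str:
    """Classify a face as 'right', 'left' or 'center' from two eye centres."""
    if len(eyes) != 2:
        return "unknown"  # this sketch only covers the two-eye case
    quadrants = {eye_quadrant(face, e) for e in eyes}
    if quadrants == {1}:
        return "right"    # both eyes in the first quadrant
    if quadrants == {2}:
        return "left"     # both eyes in the second quadrant
    return "center"       # one eye in each quadrant
```

For a face box `(100, 100, 80, 80)` the vertical midline is at x = 140, so eye centres at x = 150 and x = 170 are classified as `"right"`, while eye centres at x = 120 and x = 160 are classified as `"center"`.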

We extend this algorithm to account for the case when only one eye is visible. If a person’s left eye is located in the first quadrant and no eye is found in the second quadrant, we assume that they are facing left; if a person’s right eye is located in the second quadrant and no eye is found in the first quadrant, we assume that they are facing right. Both cases can be seen in Figures 2 & 3. Using these hypotheses, we detect the Triangle formation, a constraint-based F-formation that arises when one person is facing left and two persons are facing right, or vice versa, as can be observed in Figure 2.
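The one-eye hypotheses and the Triangle test can be sketched as follows. This is a hedged illustration rather than the authors' code. It assumes that the eye detector reports whether a detected eye is the person's left or right eye and which quadrant (1 or 2) it falls in, and that a Triangle is checked over exactly three people. The function names `facing_from_single_eye` and `is_triangle` are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def facing_from_single_eye(eye_label: str, quadrant: int) -> Optional[str]:
    """Apply the one-eye rule; eye_label is 'left' or 'right', quadrant is 1 or 2."""
    if eye_label == "left" and quadrant == 1:
        return "left"   # left eye in the first quadrant, none in the second
    if eye_label == "right" and quadrant == 2:
        return "right"  # right eye in the second quadrant, none in the first
    return None         # the text states no rule for the other combinations


def is_triangle(directions: List[str]) -> bool:
    """True when one person faces one way and two people face the other way."""
    if len(directions) != 3:
        return False  # assumption: the Triangle test is applied to three people
    counts = Counter(directions)
    return sorted((counts["left"], counts["right"])) == [1, 2]
```

With these definitions, `is_triangle(["left", "right", "right"])` and `is_triangle(["right", "left", "left"])` both hold, while three people facing the same way, or any group containing a person facing the center, are not reported as a Triangle.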

(a) Triangle

Figure 2: F-formations in real time.

3 EXPERIMENTAL RESULTS

We performed qualitative experiments to estimate F-formations in real time with a webcam of 640 x 480 pixel resolution on a standard PC, and tested on 11 people in different spatial configurations. The results were obtained at a rate of 20 fps. The experimental scenario was as follows: a camera (the robot’s vision) is placed at a height of 1.5 m, and people walk into the scene (in front of the camera) and have a natural conversation. The algorithm starts by detecting people’s faces, estimates their face orientation, and then identifies the F-formations. The algorithm works at distances of up to 2.5 meters between the people and the robot. We also tested the algorithm while moving the camera from left to right and from right to left, assuming the robot has 1 DOF, the yaw axis, similar to humans turning their head from side to side. The results can be observed in Figure 3.

Figure 3: Frames when the robot (camera) is in motion.

4 CONCLUSIONS

In this paper, we extended our method with a few more assumptions and also tested the algorithm when the robot is in motion. In future work, we will extend this algorithm to other formations and natural settings, and also create a dataset to evaluate the algorithms in real time.

REFERENCES

[1] A. Kendon, “Spacing and orientation in co-present interaction,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5967 LNCS, 2010, pp. 1–15.

[2] M. Cristani, L. Bazzani, G. Paggetti, A. Fossati, D. Tosato, A. Del Bue, G. Menegaz, and V. Murino, “Social interaction discovery by statistical analysis of F-formations,” British Machine Vision Conference (BMVC), 2011.

[3] F. Setti, C. Russell, C. Bassetti, and M. Cristani, “F-formation detection: Individuating free-standing conversational groups in images,” PLoS ONE, vol. 10, no. 5, 2015.

[4] S. Vascon, E. Z. Mequanint, M. Cristani, H. Hung, M. Pelillo, and V. Murino, “A Game-Theoretic Probabilistic Approach For Detecting Conversational Groups,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9007, 2015, pp. 658–675.

[5] E. Ricci, J. Varadarajan, R. Subramanian, S. R. Bulo, N. Ahuja, and O. Lanz, “Uncovering interactions and interactors: Joint estimation of head, body orientation and f-formations from surveillance videos,” in Proceedings of the IEEE International Conference on Computer Vision, vol. 11-18-Dece, 2016, pp. 4660–4668.

[6] M. Vázquez, A. Steinfeld, and S. E. Hudson, “Parallel detection of conversational groups of free-standing people and tracking of their lower-body orientation,” in Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015, pp. 3010–3017.

[7] S. K. Pathi, A. Kiselev, and A. Loutfi, “Estimating f-formations for mobile robotic telepresence,” in Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction. ACM, 2017, pp. 255–256.