Evaluating regular and speech-based text entry for creation of smartphone based addresses


STOCKHOLM, SWEDEN 2020


VIKTOR MÖRSELL, MOIRA TWENGSTRÖM

KTH


Evaluation of dictation as an input method for creating smartphone-generated addresses

Viktor Mörsell¹ and Moira Twengström²

¹,²KTH Royal Institute of Technology, Stockholm, Sweden

Abstract - Billions of people on earth lack a home address. In this paper we investigate an approach to solving this with an address system in which each address consists of a GPS location and a description of how to find the house once you are within close distance of that location. The aim of the paper is to measure whether this description is of higher quality when it is given using speech-based or regular text entry. Our findings indicate that speech-based text input gives 1.7 times more information in about 5.5 times less time. From a usability standpoint there was no indicated difference, but as the experiments were carried out under perfect conditions, we conclude that speech-based text entry would likely present more of a challenge for users in real-world use. If and when speech recognition is more widely adopted in systems for everyday use, speech-based text entry will be a good asset for increasing the amount of information collected from users in navigational contexts.

Sammanfattning (Swedish abstract, translated) - It is estimated that over one billion people live without an address. This study aims to improve an app solution that uses generated addresses consisting of GPS coordinates and an accompanying description. The description is meant to guide the user once she is in the vicinity, compensating for any inaccuracy in the GPS point. The purpose is to investigate whether such a description is of better quality when it is created with speech recognition than with regular text input. The results show that speech-based input gives 1.7 times more information than when users type directly on their mobile phone, and that users spend on average 5.5 times less time on the task. The users' evaluation indicates no difference in usability, but since the experiments were carried out under perfect conditions, it is concluded that speech-based input would probably present more of a challenge for users. If and when speech recognition becomes a more integrated part of everyday technology, speech-based text input would be a useful means of increasing the amount of information obtained from users' own descriptions.

I. INTRODUCTION

BILLIONS of people on earth lack a home address [1]. The importance of something considered so basic in most parts of the world might be lost on those who have one. Having an address is a prerequisite for taking part in many of society’s services, such as the postal system, for opening a bank account [7], for directing customers to your local business, or for signing up for any service that requires you to state a place of residence.

In Sub-Saharan Africa many areas are growing so quickly that their location on Google Maps is still just desert [3], and others are simply not given addresses by local government [7]. Addresses are shared as “the red door by the water tower up the hill” [4], which is ambiguous to anyone but your own neighbors. Attempts have been made to remedy this insufficient infrastructure with technological approaches. The Swedish company Addressya recently launched a free, smartphone-based system in Rwanda where anyone can create their own address [9]. The addresses in Addressya’s app have two components: a GPS location and a supplementary description of the surrounding area to help navigate once you are close by.

According to Addressya company representatives, this user-given supplementary information is the main challenge for the usefulness of the system. That is, the written information given is not of high enough quality to provide a trustworthy address. Either the amount of information is insufficient, or the information given is not descriptive enough to be unambiguous.

A. Purpose

The purpose of this study is to investigate whether people tend to give better directions with speech input than with regular text entry. It focuses on the amount of information and details given, and does not attempt to assess whether the resulting navigational instructions are more or less easy to follow. The findings can be used in mapping services to improve the descriptions that complement GPS locations in areas without addresses.

B. Problem statement

The two main questions we aim to answer with this report are the following.

1) Do people tend to give more information when giving input through speech as compared to regular text entry? Do they spend more or less time on the same assignment?

2) How does a transition from regular to speech-based text entry affect the usability?

C. Background


field large enough for you to draw your house in relation to a place that does have an address, or is generally known. In Accra, capital of Ghana, the difficulty of locating homes in a densely populated area was reported by BBC in 2019 to prolong ambulance response time so severely it is practice to call for a taxi cab in emergencies [4].

Some areas do not have addresses despite being in close proximity to the capital, as is the case with informal settlements. Sub-Saharan Africa has the highest proportion of population living in slums [6]. As their existence is not always officially recognized by the government, they are not included in the public efforts to map out cities. In Kibera, a community close to Nairobi and Africa’s largest slum [10] (although this has been disputed [11]), efforts have been made to map out the area using machine learning [6]. Knowledge about the distribution of its population remains poor, however, as the suggested number of inhabitants has ranged from 170 thousand [11] to 800 thousand [10]. In addition to the streets being nameless, accessibility is low, with only 17 % of roads being paved as of 2018 [8].

Solutions similar to that of Addressya have been implemented in some countries to address the issue. In Kenya the OkHi app was launched in 2013, utilizing a GPS point as well as a picture of the front door of your home [12]. It can be shared in much the same way as an Addressya address [13]. A slightly different approach has been implemented in Ghana, where the national ambulance service uses an app called The SnooCode App to locate callers [4]. Anyone can generate a unique code for their house, composed of letters and numbers (meaning users need not be fully literate), that the app can guide other users, such as ambulances, to. The British company What3Words has divided the world into 3-meter-by-3-meter squares and assigned each of them a unique three-word string [14]. This string becomes the address of the square. There are also ventures using machine learning, among others a joint effort where Facebook and MIT are trying to generate addresses by automatically recognizing roads in satellite images [1].

Many positive social effects have been expected from successfully implementing such technology. Aside from crucial services such as ambulance transport, an entire market of delivery-related jobs could emerge [7] [4]. In South Africa the lack of physical addresses was recently pointed out as an aggravating factor in the difficulties of combatting the covid-19 pandemic [5], as not knowing where large parts of the population reside obstructs efforts to trace the contagion and isolate populated areas where it has spread.

D. Recent studies regarding speech-based text entry

The speed of speech entry has been examined several times [15], [16], with the conclusion that for any string longer than a few characters, speech is faster [17], [18]. These experiments were carried out with subjects being given the input beforehand, in the form of either short sentences or strings of random digits. In 2017 Andrew Ng and colleagues examined the entry speed of short English sentences, concluding that speech input was 2.93 times faster than text input [17].

The error rate has been determined to be about the same for both entry methods in laboratory conditions [17], but significantly higher (about 33 %) when the subject is on the go [19]. Background noise is an ever-present problem in deployed speech-recognition systems and is presumably the cause of the system being so prone to error. It has also been determined that more errors are made when the subject is walking than when seated [19].

In addition to their efficiency, the usability and reception of speech-based systems have been examined in different studies [20], [21]. One long-term study estimated that 86 % of users abandoned their speech recognition system after half a year due to poor user experience [20]. Others address users’ reservations concerning privacy [21].

II. THEORETICAL FRAMEWORK

A. System Usability Scale

The System Usability Scale (SUS) is a quick and simple method for measuring the usability of a system. It uses a questionnaire with ten statements, where the subject grades each statement on a five-point Likert scale from Strongly agree (5) to Strongly disagree (1). Odd statements are positive, for example “I would imagine that most people would learn to use this system very quickly.” Even statements are negative, for example “I needed to learn a lot of things before I could get going with this system.”

The optimal number of points in a Likert scale has been debated over the years [22]–[24], and no consensus regarding the optimal number of options seems to have been reached.

We have used a five point scale in our System Usability Scale evaluation questionnaire. A five point scale is considered to be a good resolution for most applications, since five categories is short enough to select an answer quickly, and complete enough to express the feelings satisfactorily [25].

The final SUS score is calculated as follows:

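\[ \mathrm{SUS} = 2.5 \sum_{i=1}^{10} f(Q_i) \tag{1} \]

The function f(Q_i) in (2) reverses the scale for negative statements and translates each statement’s score from the range 1-5 to the range 0-4. Q_i is the grade given to statement i, and i is the statement’s index in the questionnaire.

\[ f(Q_i) = \begin{cases} Q_i - 1, & \text{if } i \text{ is odd} \\ 5 - Q_i, & \text{if } i \text{ is even} \end{cases} \tag{2} \]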

This gives a maximum SUS score of 100. We will use an adjective rating scale to interpret the mean SUS score for each experiment app [26]. The scale is defined in table I below.
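The scoring rule can be illustrated with a minimal Python sketch. It assumes the standard SUS convention, in which positively worded (odd) statements contribute their grade minus one and negatively worded (even) statements contribute five minus their grade. The function names `item_contribution` and `sus_score` are hypothetical and are not taken from the study's software.

```python
def item_contribution(index: int, answer: int) -> int:
    """Map one Likert answer (1-5) to a 0-4 contribution; index is 1-based."""
    if not 1 <= answer <= 5:
        raise ValueError("answers must be on the 1-5 Likert scale")
    # Odd statements are positively worded: a higher grade is better.
    # Even statements are negatively worded: the scale is reversed.
    return answer - 1 if index % 2 == 1 else 5 - answer


def sus_score(answers: list[int]) -> float:
    """Sum the ten contributions (0-40) and scale to 0-100."""
    if len(answers) != 10:
        raise ValueError("SUS uses exactly ten statements")
    total = sum(item_contribution(i, a) for i, a in enumerate(answers, start=1))
    return 2.5 * total
```

A subject who strongly agrees with every positive statement and strongly disagrees with every negative one reaches the maximum of 100, while answering the neutral option (3) throughout gives 50.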

B. Multi-Level Perspective Model (MLP)

TABLE I
SUS ADJECTIVE SCALE

Adjective          Mean SUS score
Best Imaginable    >90.9
Excellent          85.5
Good               71.4
OK                 50.9
Poor               35.7
Awful              20.3
Worst Imaginable   <12.5

landscape. It divides the studied system into three different levels; niche, regime and landscape, where a niche is a new, possibly radical, technology.

Fig. 1. Illustration of how the multi-level perspective describes the interaction between different levels. Image from [27].

A niche can be explained as a new idea or innovation. It typically originates from a protected environment where it can be tested and evolved. The more radical the idea, the more important this isolated first phase is, since radical new niches are often nothing more than “hopeful monstrosities” in the beginning [28]. When the niche is ready, it challenges the prevailing regime with the aim of becoming part of the socio-technical regime and acting as a remedy for the tensions that have built up within the regime.

The socio-technical regime is a non-physical structure that encapsulates the parts that account for the stability of the system. As briefly explained above, tensions in the regime can emerge both from within the system (internal pressure) and from the socio-technical landscape (external pressure). The regime stabilises development trajectories and sets the standard for how things are done in the system. It can be viewed as a product of several sub-regimes, e.g. technology, user and market, and policies. When analysing the socio-technical regime as a whole, it is crucial to consider the interconnection between the sub-regimes in order to understand the tension that makes way for niches to become part of the regime.

III. IMPLEMENTATION

A. Delimitations and limitations

The study was originally intended to be executed in April 2020 in Kigali, Rwanda. Due to travel restrictions caused by the pandemic outbreak of covid-19 in 2020, this unfortunately became impossible. As a result, we redesigned the study and executed it in Stockholm, Sweden in May 2020 instead.

The main delimitations and limitations are the following:

• Stockholm has a very limited number of areas with relatively dense housing but no addresses. We decided to go for an allotment garden on Kungsholmen, which seemed to be the best fit for the experiment among the possible places we could find in Stockholm. See the detailed description of the area in section III-E.

• The current healthcare guidelines and social distancing routines applied during the time of the study likely made recruitment of participants more difficult. Many of the people asked to participate in the study declined due to the disease transmission risk.

• Only a small sample of 24 subjects was used due to time constraints.

• The sampling method used was a convenience sampling where the participants were recruited on the location where the experiment was executed.

B. Subjects and groups

For this study, we asked 24 subjects to participate in the experiment. The subjects were divided into two groups: one group was tested using an app with regular text input, and the other using an app with speech-based text entry.

TABLE II
EXPERIMENT SUBJECTS

                    Text          Speech
Males               3             7
Females             8             5
Other gender        1             0
N                   12            12
Average age (SD)    41.8 (17.8)   48.0 (13.6)

The sampling method used was convenience sampling, where the subjects were recruited from people passing by the allotment garden where the experiment took place. This method was mainly selected because we assessed it to be the only way to execute the study in the given timeframe. By recruiting test subjects next to the location where the experiment took place, we got access to subjects that would be hard to reach in other ways. Furthermore, convenience sampling with on-site recruitment is very cost-effective. The waiting time for the researchers can be minimized and the planning can be greatly simplified. For the subject, the time spent participating in the experiment is minimized.

The downside of this method is that it’s more likely to be affected by selection bias and sampling errors.

We also asked every participant how often they used a smartphone, how often they were in the vicinity, how often they used speech-controlled applications and what types of applications. All “how often” questions were answered by selecting one of the following options: every day, more than twice a week, every week, every month or more rarely.


the participants in the speech-based text entry were in the vicinity of the experiment location every week. 25 % of the participants in the regular text entry experiment and 8.3 % of the participants in the speech-based text entry experiment used speech control every week. The speech controlled application stated to be used was Apple’s personal assistant software Siri in all cases.

C. Data collection applications

For the purpose of this study, we developed two smartphone apps for data collection, a web-based control panel for the researcher controlling the speech-based text entry experiment, and a backend. Both smartphone apps let the subject describe how to get to a specific location: one by regular text input, and one by speech-based text entry.

In order to provide feedback to the subjects using the speech-based text entry app, we designed the study as a Wizard of Oz study. This is a method for performing experiments involving man-machine interactions without the need for a machine intelligent enough to handle these interactions. The test subject believes they are communicating with the machine, but the machine’s actions are in fact dispatched by an experimenter, the wizard [29]. This makes the method suitable for evaluating user-friendliness during complex communications with a machine.

By selecting this study design we eliminated the risk of confusion due to speech recognition errors, as state-of-the-art speech recognition models for spontaneous Swedish speech are expected to have a 20 % error rate.

The subjects were told that the sound recorded by the app was analyzed in real time by a speech recognition algorithm capable of handling spontaneous natural language, which in turn produced short summaries that were sent back to the app and displayed for the user. One of the researchers sat with a laptop within hearing distance of the subject. This was explained to be a control function, where the researcher listened to the subject and monitored the feedback being sent back from the algorithm to ensure it was correct.

In fact, the feedback messages to the speech-based text entry app were written by the experimenter with the laptop, the wizard, who used a web application to push text messages to the smartphone app used by the subject. The wizard typed continuously on the keyboard during the experiment to avoid giving away the fact that they were producing the feedback messages.

The regular text entry experiments were conducted without involvement of the wizard.

The smartphone applications were developed using React Native and TypeScript. The backend was developed using TypeScript and was hosted as serverless Node.js functions on Microsoft Azure. We used Microsoft Azure SignalR to transfer feedback messages from the Wizard control interface to the speech-based text entry app, which in turn utilizes WebSockets. The Wizard control interface was developed using HTML and JavaScript and was hosted as a static website on Microsoft Azure.
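The message flow between the wizard's control interface and the subject's app can be sketched as a simple publish/subscribe relay. The Python sketch below is an in-memory stand-in only: it does not use Azure SignalR or WebSockets, and the names `FeedbackChannel` and `run_session` are hypothetical illustrations rather than parts of the study's TypeScript code base.

```python
import asyncio


class FeedbackChannel:
    """In-memory stand-in for the real-time channel between wizard and app."""

    def __init__(self) -> None:
        self._subscribers: list = []

    def subscribe(self) -> asyncio.Queue:
        """Register a client (the subject's app) and return its inbox."""
        inbox = asyncio.Queue()
        self._subscribers.append(inbox)
        return inbox

    async def publish(self, message: str) -> None:
        """Deliver a message typed by the wizard to every subscribed client."""
        for inbox in self._subscribers:
            await inbox.put(message)


async def run_session(wizard_messages: list) -> list:
    """Simulate one experiment: the wizard pushes summaries, the app shows them."""
    channel = FeedbackChannel()
    app_inbox = channel.subscribe()
    shown_on_phone = []
    for message in wizard_messages:
        await channel.publish(message)
        shown_on_phone.append(await app_inbox.get())
    return shown_on_phone
```

Calling `asyncio.run(run_session(["Start at the canal", "Red house on the left"]))` returns the two summaries in the order the wizard sent them, which mirrors how the subject saw feedback appear while speaking.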

Our selection of techniques and infrastructure was based on our previous experience. By limiting the scope to programming languages and infrastructure familiar to us, we thought it would be easier to assess which techniques were most suitable for solving the task, as well as reducing the likelihood of development errors.

All applications developed for the study are available as open source released under MIT license [31].

[Figure 2: stacked bar charts with panels "Uses a smartphone", "Is in the vicinity" and "Uses speech control"; axis: share of participants in % (0-100); categories: Every day, At least twice a week, Every week, Every month, More rarely.]

Fig. 2. Characteristics for the participant groups. The regular text entry experiment group is represented to the left, and the speech-based text entry group to the right.

D. Apparatus

For the text-based experiment, the subjects were equipped with an iPhone 7. The iPhone was selected due to its familiarity among the Swedish population: 50 % of Swedish citizens older than 12 years own an iPhone [32].

Most of the subjects in the speech-based text entry experiment were equipped with an iPhone 7, although two of the subjects used an iPhone 5 in order to minimize the waiting time between experiments. The iPhone 5 ran the same version of iOS as the main experiment device, but has a slightly smaller screen size of 4.0” compared to 4.7” on the iPhone 7. We don’t think this affects the results of the experiment, since the subject had no physical interaction with the device except for holding it in their hand.

E. Location and task


parcels, of which many have red houses. The subjects were recruited at point A (see figure 3) and the experiment was conducted at point B, right in front of the target house marked as a box with horizontal stripes.

Fig. 3. Map of the allotment garden at Hornsbergs strand.

The subject was instructed to give a description that would lead a person unfamiliar with the allotment area to a specific house. The allotment area has at least three entry points, which means it’s not sufficient to just give the description from a particular entrance.

In the scenario given to the subjects, they were to create an address for their house using a GPS position with an accuracy of 50-100 meters and a description. The aim of the experiment was to give that description. A person unfamiliar with the area should be able to find the exact location of the house using only the GPS position and the description.

By choosing a house in an allotment area we hoped to get the same type of descriptions we would have got if we performed this experiment in an area with few or no street addresses.

F. Data evaluation

All examples of navigational instructions given in this section are from the real data set.

When assessing the amount of information a subject’s description contains, three metrics have been used: number of words, number of instructions and number of details.

We define an instruction as a sentence or part of a sentence that is independently helpful and makes sense. “Opposite the Karlberg canteen, on the other side of the canal” would be one instruction, since “the other side of the canal” is ambiguous without the first half of the sentence.

A detail enhances understanding but is not useful by itself, the most common kind being adjectives. “There is a narrow path leading from the canal to the house” would be scored as one instruction (there is a path from the water to the house) and one detail (the path being narrow). Details can be piled on top of each other, e.g. “There is a white double window with white shutters facing the canal” would be scored as one instruction (there is a window facing the canal) and four details (the window being white, the window being a double window, the window having shutters and those shutters being white).

These two types of information have been distinguished because our assumption before the study was that two instructions may be the same length and concern the same thing (and thus be scored the same on the first two metrics) while differing significantly in the amount of information. For instance, “Take the narrow gravel path perpendicular to the canal up to the red wooden house” would be scored the same as “Walk the road that goes from the canal all the way up to the house”, despite containing the additional information of the path being perpendicular to the canal, a gravel path, narrow, and the house being wooden and red.
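The three metrics can be made concrete with a small Python sketch that tallies a manually annotated description. Two things here are assumptions rather than parts of the study: the `Instruction` data structure and `score` function are hypothetical, and words are counted by splitting on whitespace, since the exact word-counting rule is not specified. Identifying instructions and details remains a manual judgment made by the evaluator.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    """One independently useful instruction plus its manually annotated details."""
    text: str
    details: list[str] = field(default_factory=list)


def score(description: list[Instruction]) -> dict[str, int]:
    """Tally the three metrics: words, instructions and details."""
    return {
        # Assumption: a word is any whitespace-separated token.
        "words": sum(len(item.text.split()) for item in description),
        "instructions": len(description),
        "details": sum(len(item.details) for item in description),
    }


# The two example sentences discussed above, annotated as in the text.
example = [
    Instruction(
        "There is a narrow path leading from the canal to the house",
        details=["the path is narrow"],
    ),
    Instruction(
        "There is a white double window with white shutters facing the canal",
        details=[
            "the window is white",
            "the window is a double window",
            "the window has shutters",
            "the shutters are white",
        ],
    ),
]
```

For this annotated description, `score(example)` gives 24 words, 2 instructions and 5 details.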

As the purpose of this study is to determine whether people tend to give more or less information with voice entry than with text, the model by which we assess the data does not take the quality of the information into account. That is, the score of a subject’s input is not affected by the following:

• Incorrectness: “The house has only one neighbour to its right, and one across from it” is scored as two instructions, despite the house having several neighbours in every direction.

• Uselessness/semi-uselessness: “The house is to the right of the canal” is considered an instruction, despite its obvious ambiguity. “There are tulips growing in the garden” is considered an instruction despite its usefulness being limited to springtime.

• Repetitions: “The house is in the middle of the garden area” and “I live right in the center of the garden area” are scored as one instruction each, despite being two ways to phrase the same sentence.

• Language: Subjects participated in Swedish and English; all data was scored without prior translation.

IV. RESULTS AND OBSERVATIONS

A. Time spent, number of words and number of instructions given

TABLE III
AVERAGE AND STANDARD DEVIATION OF TIME, NUMBER OF INSTRUCTIONS

                                       Text            Speech
Average time in seconds (SD)           407.3 (164.8)   73.6 (43.3)
Average number of words (SD)           60.3 (33.4)     82.3 (51.9)
Average words per minute, WPM (SD)     9 (11.4)        68.4 (23.4)
Average number of details (SD)         6.3 (4.6)       6.3 (5.0)
Average number of instructions (SD)    5.2 (2.4)       9.0 (5.5)


the evaluators’ scoring diverged, the final sum of instructions varied a total of 8 % and details 5 % in text entry and 7 % and 5 % respectively in speech entry.

The results show that while the number of adjectives and descriptive details is the same for both entry methods, the number of instructions given with speech is 1.7 times the number given with text entry. Text entry takes on average 5.5 times longer than speech entry, meaning that speech entry would give on average 9.6 times the number of instructions if subjects continued at the same rate for the same amount of time.
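The three ratios follow directly from the averages in table III. The short Python sketch below reproduces the arithmetic; the variable names are chosen here for illustration only.

```python
# Averages taken from table III.
text = {"time_s": 407.3, "instructions": 5.2}
speech = {"time_s": 73.6, "instructions": 9.0}

# How many times more instructions speech entry yields in total.
instruction_ratio = speech["instructions"] / text["instructions"]

# How many times longer text entry takes.
time_ratio = text["time_s"] / speech["time_s"]

# Instructions per unit of time: speech relative to text.
rate_ratio = instruction_ratio * time_ratio

print(round(instruction_ratio, 1), round(time_ratio, 1), round(rate_ratio, 1))
# prints: 1.7 5.5 9.6
```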

B. User experience

TABLE IV
AVERAGE SUS SCORES

                                          Text          Speech
Average SUS score (SD)                    80.6 (11.0)   84.4 (9.7)
Value on SUS adjective scale (table I)    Good          Good

The results presented in table IV show that the app with speech-based text entry has a somewhat higher average System Usability Scale (SUS) score of 84.4, compared to the app with regular text entry, which has an average score of 80.6.

Since we’re evaluating the score using the adjective scale (see table I) the scores for both applications are rated as Good.
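How a mean score maps to an adjective can be sketched as a threshold lookup. The Python sketch below rests on one interpretation of table I that is assumed here: a score receives the highest adjective whose listed value it reaches, and anything below the value for Awful falls through to Worst Imaginable. The name `sus_adjective` is hypothetical.

```python
# Values from table I, ordered from best to worst.
ADJECTIVE_THRESHOLDS = [
    (90.9, "Best Imaginable"),
    (85.5, "Excellent"),
    (71.4, "Good"),
    (50.9, "OK"),
    (35.7, "Poor"),
    (20.3, "Awful"),
]


def sus_adjective(mean_score: float) -> str:
    """Return the highest adjective whose table value the score reaches."""
    for threshold, adjective in ADJECTIVE_THRESHOLDS:
        if mean_score >= threshold:
            return adjective
    # Assumption: everything below the "Awful" value is "Worst Imaginable".
    return "Worst Imaginable"
```

Under this reading both measured averages, 80.6 and 84.4, lie between the values for Good (71.4) and Excellent (85.5) and are therefore rated Good.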

V. DISCUSSION

As the average number of instructions given with speech input was 70 % greater than with text, the results strongly imply that people tend to give more information when speaking freely than when writing. If we adjust for the difference in time required to enter the information, using the factor measured by Andrew Ng and colleagues in 2016, the speech-entered directions would be 3.4 times as many as the text-entered ones. This means that, in the same amount of time and adjusting for the keyboard being slower than speech, directions given with speech still contain more than 3 times as many instructions as those given with text.

Patience with the entry method cannot account for the entire difference in time spent on giving directions. People seem to be comfortable spending double the amount of time entering text compared to speech, even when accounting for the difference in entry speed.

As stated, quality aspects of the information, such as repetitions and incorrect or partly unusable information, have not been assessed in the evaluation model. While the subjects evidently give more instructions with speech input, these might contain more erroneous information, as it cannot be retroactively corrected. Speech input might also be more prone to repetitions, as you have no overview of what has already been said. As the data was processed there was, however, no apparent difference in the directions given; most could be categorized into a few different approaches. Assessing their usefulness likely requires a separate framework with formal rules for what is considered “qualitative” information. Further work could for instance include having subjects attempt to follow the directions given, or determining which of these approaches to giving directions subjects tend to choose with the two entry methods.

A. Ambiguity of data evaluation model

As this model of assessment is new and was created solely for this task, its adequacy is unproven. It relies on the evaluator’s distinction between what is an enhancing detail and what is an independent direction. For instance, “the house is red with a white window” could be scored as either two or three details, depending on whether you find the house having a window to be a helpful detail. Although the variation between the different assessments is fairly low (8 % at most), it points to an uncertainty. However, even when considering the most sparse scoring of the speech entry and the most generous scoring of the text entry, the number of instructions in the former is still 1.5 times larger.

B. Usability

The use of SUS as a measurement tool for usability can be debated in this specific case, because two of its questions are intended to measure usability across different components of a system. Since our experiment app can be considered to have only one function, these questions might have caused confusion. However, eliminating these questions would give the regular text entry app a SUS score of 66.5 out of 80, and the speech-based text entry app a score of 68.8 out of 80. This indicates a difference of 3.5 %, which implies that the actual difference in SUS might be lower than the measured 4.7 %.
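The percentages above can be reproduced with a short calculation. The Python sketch below assumes the differences are expressed relative to the text entry score, which is not stated explicitly but matches the reported numbers; `relative_difference` is a hypothetical helper name.

```python
def relative_difference(baseline: float, other: float) -> float:
    """Difference between two scores in percent of the baseline score."""
    return (other - baseline) / baseline * 100


# With two of the ten statements removed, eight remain, each worth 0-4 points,
# so the maximum becomes 2.5 * 4 * 8 = 80 instead of 100.
reduced_maximum = 2.5 * 4 * 8

full_scale = relative_difference(80.6, 84.4)     # all ten statements
reduced_scale = relative_difference(66.5, 68.8)  # eight statements

print(reduced_maximum, round(full_scale, 1), round(reduced_scale, 1))
# prints: 80.0 4.7 3.5
```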

Since our results indicate a minor difference in SUS scores, and the scores have the same value on the SUS adjective scale, we conclude that a transition from regular text entry to speech-based text entry likely wouldn’t cause a substantial change in usability for the app.

Some of the participants asked us if they could skip grading statements in the SUS questionnaire that they found irrelevant. The SUS model doesn’t allow that, but the intention of our five-point scale was to provide a middle, neutral option for cases where a participant neither agrees nor disagrees with a statement. Clearly, some participants didn’t interpret the scale that way, which might have affected their final answers.

C. Speech recognition as entry method for navigational instructions


speech-based input is unlikely to become the industry standard by push from consumers.

Instead, the relative advantage would need to be apparent to the system designers, and as presented in the background, not much research has been done on the subject. The most quoted studies, the ones summarized briefly in this report, mostly conclude that the poor performance of speech recognition in general causes a low adoption rate of the technology, which might be why it is yet to be implemented in many kinds of systems, including those dealing with navigation. Many of the complicating factors mentioned in the studies on speech recognition are relevant in this implementation, such as the presence of background noise and the user being on the move. As this study was carried out with the Wizard of Oz concept, the results are based on perfect conditions, meaning that both the amount of information perceived by the device and the subsequent user experience would be lower under real-world conditions. The advantage in information input would still be significant in this context, but as long as the general perception of the entry method is that it is inefficient and frustrating, it is likely to do more harm than good.

Suppose the many incremental improvements in speech recognition that are likely to be made in the coming years result in a better adapted and more efficient technology. Casting navigational apps as the socio-technical regime, the changes would need to come about through changes in the landscape of information entry in society in general. Patience with text as an entry method is likely to decrease if speech input becomes the dominant design in other kinds of systems, pushing the regime to adapt.

D. Conclusion

Speech-based text input carries a significant advantage over regular text entry in the amount of information it delivers: 1.7 times more in total and 9.6 times more when adjusting for time. Users are comfortable spending about 5.5 times more time entering text than speech, and the two methods yielded no notable difference in user experience.

It is however concluded that the perfect conditions during the experiment eliminated the difficulties that other studies have found users to have with speech recognition. While the transition to speech-based text entry in navigational apps might be technically feasible under current technical conditions and would greatly benefit the system in terms of information input, the technology needs to be better adopted in other day-to-day aspects of our society before it is perceived by the user as an improvement in usability.

ACKNOWLEDGMENT

We would like to express our kindest thanks to our mentors at KTH Royal Institute of Technology: Prof. Olov Engwall, School of Electrical Engineering and Computer Science, and Dr. Mattias Wiggberg, School of Industrial Engineering and Management. This also extends to Prof. Joakim Gustafsson and Assoc. Prof. Johan Boye for excellent mentorship during our initial problem-finding phase. Finally, we would like to thank Karoline Beronius, Sadiki Businge and the other employees at Addressya who helped us plan the intended field study in Kigali, Rwanda.

REFERENCES

[1] K. Hao, Billions of people lack an address. Machine learning could change that, MIT Technology Review, Nov. 29 2018. Accessed on: Nov. 3, 2019. [Online]. Available: https://www.technologyreview.com/2018/11/29/138883/four-billion-people-lack-an-address-machine-learning-could-change-that/

[2] UNESCO Institute for Statistics, Literacy Rates Continue to Rise from One Generation to the Next (Fact Sheet No. 45, FS/2017/LIT/45), UNESCO Institute for Statistics, Sept. 2017. [Online]. Available: http://uis.unesco.org/sites/default/files/documents/fs45-literacy-rates-continue-rise-generation-to-next-en-2017_0.pdf [Accessed on: May 10 2019].

[3] M. Onuoha, Side-by-side images expose a glitch in Google’s maps, QUARTZ, June 2017. [Online]. Available: https://qz.com/982709/google-maps-is-making-entire-communities-invisible-the-consequences-are-worrying/ [Accessed on: May 10 2019].

[4] C. Matthews, Finding your way in a country without street addresses, BBC, Feb. 1 2016. [Online]. Available: https://www.bbc.com/news/world-africa-35385636 [Accessed on: May 10 2019].

[5] P. Lehohla, SA’s lack of physical addresses amid Covid-19 a problem, Independent Online, Mar. 26 2020. [Online]. Available: https://www.iol.co.za/business-report/opinion-sas-lack-of-physical-addresses-amid-covid-19-a-problem-45541237 [Accessed on: May 6 2019].

[6] J. Panek and L. Sobotova, Community mapping in urban informal settlements: examples from Nairobi, Kenya, Electronic Journal of Information Systems in Developing Countries, 2015. [Online]. Available: https://onlinelibrary.wiley.com/doi/pdf/10.1002/j.1681-4835.2015.tb00487.x [Accessed on: May 10 2019].

[7] BBC, Letter from Africa: The art of drawing your address in The Gambia, BBC, April 22 2019. [Online]. Available: https://www.bbc.com/news/world-africa-47968968 [Accessed on: May 10 2019].

[8] C. Calderón, C. Cantú and P. Chuhan-Pole, Infrastructure Development in Sub-Saharan Africa, World Bank Group, May 2018. [Online]. Available: http://documents.worldbank.org/curated/en/866331525265592425/pdf/Infrastructure-development-in-Sub-Saharan-Africa-a-scorecard.pdf. [Accessed on: May 10 2019].

[9] J. Karlsson, Addressya tog hem pitchtävlingen på Female Founders, Dagens Industri, April 10 2019. [Online]. Available: https://digital.di.se/artikel/addressya-tog-hem-pitchtavlingen-pa-female-founders. [Accessed on: May 10 2019].

[10] P. Fihlani, Kenya’s Kibera slum gets a revamp, BBC, Feb. 23 2015. [Online]. Available: https://www.bbc.com/news/world-africa-31540911. [Accessed on: May 8 2019].

[11] M. Karanja, Myth shattered: Kibera numbers fail to add up, Daily Nation, Sep. 3 2010. [Online]. Available: https://www.nation.co.ke/news/Kibera-numbers-fail-to-add-up/1056-1003404-lma0qz/index.html. [Accessed on: May 10 2019].

[12] F. Mekuria, E. Enideg Nigussie, W. Dargie, M. Edward and T. Tegegne, Information and Communication Technology for Development for Africa: First International Conference, Sep. 2017. Available: https://books.google.se/books?id=8QtjDwAAQBAJ&pg=PA103&lpg=PA103&dq=okhi+addresses+kenya&source=bl&ots=TrqboMZylY&sig=ACfU3U08_KBGDv4SiO7AShyD26nleNbQ_g&hl=sv&sa=X&ved=2ahUKEwiO9J3AqanpAhVBcZoKHeKYDswQ6AEwB3oECAgQAQ#v=onepage&q=okhi%20addresses%20kenya&f=false. [Accessed on: May 11 2019].

[13] OkHi, https://www.okhi.com/. [Accessed on: May 9 2019].

[14] M. Toh, This startup helps you find any place on the planet without an address, CNN Business, Aug. 27 2019. Accessed on: Nov. 3 2019. [Online]. Available: https://edition.cnn.com/2019/08/27/tech/what3words-app-w3w-addressstartup/index.html

[15] S. Ruan, J. O. Wobbrock, K. Liou, A. Ng, J. Landay, ”Comparing Speech and Keyboard Text Entry for Short Messages in Two Languages on Touchscreen Phones,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 4, pp. 159:1-159:23, 2017.

[17] S. Ruan, J. Wobbrock, K. Liou, A. Ng, J. Landay (2016). Comparing Speech and Keyboard Text Entry for Short Messages in Two Languages on Touchscreen Phones. 1(4), 1-23.

[18] A. Hauptmann and A. Rudnicky. A Comparison of Speech vs Typed Input, Carnegie Mellon University Pittsburgh.

[19] M. Lin, R. Goldman, K. J. Price, A. Sears, and J. Jacko, How do people tap when walking? An empirical investigation of nomadic data entry, International Journal of Human-Computer Studies, vol. 65, no. 9, pp. 759–769, 2007, doi: 10.1016/j.ijhcs.2007.04.001.

[20] H. Horstmann Koester (2003). Abandonment of speech recognition by new users

[21] N. Sawhney and C. Schmandt, Nomadic radio: speech and audio interaction for contextual messaging in nomadic environments, ACM Transactions on Computer-Human Interaction (TOCHI), vol. 7, no. 3, pp. 353–383, 2000, doi: 10.1145/355324.355327.

[22] W. R. Garner, ”Rating scales, discriminability and information transmission,” in Psychological Review, vol. 67, no. 6, pp. 343–352, 1960, doi: 10.1037/h0043047.

[23] P. Green and V. Rao, ”Rating scales and information recovery: How many scales and response categories to use?”, in Journal of Marketing, vol. 34, no. 3, pp. 33–39, 1970, doi: 10.2307/1249817.

[24] H. G. Schutz and M. H. Rucker, ”A Comparison of Variable Configurations Across Scale Lengths: An Empirical Study,” in Educational and Psychological Measurement, vol. 35, no. 2, pp. 319–324, 1975, doi: 10.1177/001316447503500210.

[25] C. C. Preston and A. M. Colman, ”Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences,” Acta Psychologica, vol. 104, no. 1, pp. 1–15, 2000, doi: 10.1016/S0001-6918(99)00050-5.

[26] A. Bangor, P. Kortum and J. Miller, ”Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale,” Journal of Usability Studies, vol. 4, no. 3, p. 118, 2009.

[27] F. W. Geels, “Technological transitions as evolutionary reconfiguration processes: a multi-level perspective and a case-study,” Research Policy, vol. 31, no. 8, pp. 1257–1274, 2002, doi: 10.1016/S0048-7333(02)00062-8.

[28] J. Mokyr, The Lever of Riches: Technological Creativity and Economic Progress. New York; Oxford: Oxford University Press, 2014.

[29] J. Nordström, Evaluation of Swedish speech recognizers for spontaneous and natural speech. Stockholm: CSC Computer Science and Communication, KTH Royal Technical Institute, 2007.

[30] N. Dahlbäck, A. Jönsson, and L. Ahrenberg, ”Wizard of Oz studies - why and how,” Knowledge-Based Systems, vol. 6, no. 4, pp. 258–266, 1993, doi: 10.1016/0950-7051(93)90017-N.

[31] V. Mörsell, M. Twengström, Directions with lateral systems. [Source code]. 2020. Available: https://github.com/vm-bachelor-thesis.

[32] Svenskarna och Internet 2019, Stockholm: Internetstiftelsen, 2019.

[33] T. Wakita, N. Ueshima, and H. Noguchi, ”Psychological Distance Between Categories in the Likert Scale: Comparing Different Numbers of Options,” Educational and Psychological Measurement, vol. 72, no. 4, pp. 533–546, 2012, doi: 10.1177/0013164411431162.

[34] E. M. Rogers, Diffusion of innovations, 5th ed. London: Simon & Schuster, 2003.

Viktor Mörsell is an Industrial Engineering and Management undergraduate student at KTH Royal Institute of Technology, Sweden. Viktor specializes in Computer Science and Communication. Viktor's contributions to this study include instrument design, the software used for data gathering and the System Usability Scale analysis.

Moira Twengström is an Industrial Engineering and Management undergraduate student at KTH Royal Institute of Technology, Sweden. Moira specializes in Computer Science and Communication. Moira's contributions to this study include background research, data collection and the Industrial Transformation analysis.

APPENDIX A

EXPERIMENT QUESTIONNAIRE

Beskriv så att en person som är i närområdet hittar fram till huset. Tänk på att beskrivningen ska kunna användas året om och av personer som inte hittar i närområdet.

English translation

Give a description that helps a person nearby find this building. Remember that the description should be valid year round and make sense to people unfamiliar with the vicinity.

APPENDIX B

SUS QUESTIONNAIRE

1) Jag tror att jag skulle vilja använda produkten ofta
2) Jag tyckte att produkten var onödigt komplicerad
3) Jag tyckte att produkten var lätt att använda
4) Jag tror att jag kommer behöva hjälp av en teknisk person för att använda produkten
5) Jag tycker att de olika funktionerna i produkten är väl samordnade
6) Jag tycker att det fanns för mycket inkonsekvens i produkten
7) Jag kan tänka mig att de flesta skulle lära sig att använda produkten mycket snabbt
8) Jag tyckte att produkten var mycket besvärlig att använda
9) Jag kände mig väldigt trygg när jag använde produkten
10) Jag behövde lära mig mycket innan jag kunde komma igång med produkten

English translation

1) I think that I would like to use this system frequently.
2) I found the system unnecessarily complex.
3) I thought the system was easy to use.
4) I think that I would need the support of a technical person to be able to use this system.
5) I found the various functions in this system were well integrated.
6) I thought there was too much inconsistency in this system.
7) I would imagine that most people would learn to use this system very quickly.
8) I found the system very cumbersome to use.
9) I felt very confident using the system.
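For readers who want to reproduce a usability score from answers to this questionnaire, the following is a minimal sketch of the conventional SUS scoring rule. It assumes the standard setup, which is not confirmed by this appendix: ten items in the order of the Swedish list above, each answered on a 1 to 5 agreement scale, with odd-numbered items positively worded and even-numbered items negatively worded. The function name `sus_score` is hypothetical and is not taken from the study's own software.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant.

    Assumes the conventional SUS setup: exactly ten answers on a
    1-5 scale, given in questionnaire order (item 1 first).
    """
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be on the 1-5 scale")

    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute answer - 1.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute 5 - answer.
            total += 5 - answer

    # The summed contributions (0-40) are scaled to the 0-100 range.
    return total * 2.5


if __name__ == "__main__":
    # A participant who fully agrees with every positive item and
    # fully disagrees with every negative item.
    print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))
```

If the study used a different number of response categories, the constants 1, 5 and 2.5 would need to be adjusted accordingly.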
