Academic year: 2021

Soft machine

A pattern language for interacting with machine learning algorithms

Shibashankar Sahoo

Masters Thesis in Interaction Design 2020


Abstract

The computational nature of soft computing, e.g. machine learning and AI systems, has been hidden behind seamless interfaces for almost two decades now. This has led to problems ranging from loss of control, inability to explore, and inability to adapt systems to individual needs and privacy, up to socio-technical problems on a global scale. I propose the soft machine: a set of cohesive design patterns, or ‘seams’, for interacting with everyday ‘black-box’ algorithms.

Through participatory design and tangible sketching, I illustrate several interaction techniques that show how people can naturally control, explore, and adapt in-context algorithmic systems. Unlike existing design approaches, I treat machine learning as a playful ‘design material’, finding moments of interplay between human common sense and statistical intelligence. Further, I conceive of machine learning not as a ‘technology’ but rather as an iterative training ‘process’, which eventually changes the role of the user from a passive consumer of technology to an active trainer of algorithms.


Content

1. Introduction
2. Related work
3. Methodology overview
4. Primary research
5. Design principles
6. Design exploration
7. Concept development
8. Soft machine
9. Design proposition
10. Conclusion
11. Reflection
12. Acknowledgement
13. Appendix
14. Reference


1. Introduction

1.1 What is a soft machine?

Soft computing, as opposed to hard computing, deals with approximate models and gives solutions to complex real-life problems. Unlike hard computing, soft computing is tolerant of imprecision, uncertainty, partial truth, and approximation. In effect, the role model for soft computing is the human mind. Soft computing is based on techniques such as fuzzy logic, genetic algorithms, artificial neural networks, machine learning, and expert systems. [1]

Around the same time horizon, the 1980s, ‘soft’ in computer machinery took on a new definition:

“The juxtaposition of the terms “soft” and “machine” connotes the essence of a philosophy for the design of user-computer interfaces to interactive computer systems. “Machine” connotes an interface which is machine-like in appearance and operation. “Soft” connotes a machine realized through computer-generated images of controls on a high-resolution color display with a touch-sensitive screen for actuating the controls. This software realization gives us the flexibility and power to overcome the limitations of conventional machines.” [2]

Over the past two decades, a particular branch of soft computing called machine learning has changed the landscape of the world. Some examples are illustrated below.

1.2 Benefits of AI

Advances in soft computing, in the form of AI and machine learning systems, are yielding substantial societal benefits, ranging from more efficient automation, fraud detection, and content moderation, to better matchmaking and online dating apps, to more reliable medical diagnosis and novel drug discovery.


1.3 Problems associated: control, surveillance, top-down A/B testing

We have all heard about Microsoft Tay becoming racist in a day, or Google Photos tagging black people as gorillas. Algorithms, if left unchecked, can spread disinformation, create filter bubbles, send people to jail, get hard-working school teachers fired, create feedback loops that reinforce inequality [3], and even make it difficult for individuals to escape the vicious cycle of poverty (O’Neil 2016). An ordinary person might get a little angry and even shout at an email app that doesn’t detect spam. But at the scale at which ML systems are now deployed, even a few percent of errors is statistically significant. You simply can’t fail fast and break things along the way.

1.4 People reacting: guidelines, policies, provocations

AI and machine learning advances have also raised many questions about regulatory and governance mechanisms for complex algorithmic systems. Many scholars, policy makers, and journalists have been working on white papers, regulatory policies, legislative frameworks (e.g. GDPR), algorithm auditing, and guidelines to build responsible, fair, accountable, and transparent AI systems.

1.5 Relevance to interaction design

“AI has become the new UI” [4], and the role of UX is sometimes dismissed as “putting lipstick on the pig”. Interaction designers in industry have moved beyond usability, utility, or interaction aesthetics to building experiences that are personalized to users’ context and persona by collecting nuanced behavioral and profile data.

Some commonly used metrics to define success are reach (how many people use it), retention (whether they come back), depth (how long people stay), sales velocity (how fast you are making money), etc. This gives rise to new experiences [5] and practices. UX is changing whether we accept it or not.

Traditionally, UX designers have created linear, perfect user journeys that delight the user and make her life simple. But now there is no single user; instead, millions of social contexts are fed into an algorithm that gives rise to new context(s).

1.6 Algorithm is the new context

We must treat the algorithm as the new context; this is where the action is. Think of Spotify’s Discover Weekly: the real value and delight, i.e. the reason users pay, is the weekly playlist driven by an algorithm designed by an engineer. Or a hotel booking system: the flows that information architects (IA) spent years perfecting are now replaced by an ML-powered chatbot or a voice assistant that has learned the behavior of millions of users. ML is a difficult, underrepresented, and unexplored material. I believe designers don’t have a proper articulation or framing of machine learning. They currently assume ML systems are invisible assistants like Google Home or Amazon Echo, and a few (including myself in the past) anthropomorphize intelligence. However, there are also small intelligences we face every day, e.g. search results, similar-item suggestions, and time estimators. Many designers lack an understanding of the ML concepts (confidence, confusion, type 1 false positive, type 2 false negative, ground truth) needed to tackle these.
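The vocabulary in the paragraph above can be made concrete with a toy example. The sketch below, a hypothetical spam detector with made-up labels, computes the four confusion-matrix cells and the derived precision and recall; none of the numbers come from this thesis.

```python
# Toy illustration of the ML vocabulary above, using a hypothetical
# spam detector. Labels: True = spam (entirely made-up data).
ground_truth = [True, True, False, False, True, False]
predictions  = [True, False, False, True, True, False]

# Count the four cells of the confusion matrix.
tp = sum(g and p for g, p in zip(ground_truth, predictions))            # spam caught
fp = sum((not g) and p for g, p in zip(ground_truth, predictions))      # type 1: false positive
fn = sum(g and (not p) for g, p in zip(ground_truth, predictions))      # type 2: false negative
tn = sum((not g) and (not p) for g, p in zip(ground_truth, predictions))

precision = tp / (tp + fp)  # how trustworthy a "spam" flag is
recall = tp / (tp + fn)     # how much actual spam gets caught

print(tp, fp, fn, tn, round(precision, 2), round(recall, 2))  # → 2 1 1 2 0.67 0.67
```

A designer can read ‘confidence’ off the same table: a model with high precision but low recall is cautious, while the reverse is trigger-happy.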

1.7 Failing methods and principles

Well-established UX principles don’t work for human-AI interaction. For example, ML-based UIs are inconsistent (they vary depending on the user and context) and error-prone (without enough examples, a parrot can be confused with guacamole, or worse, a gorilla with a human). Well-established methods don’t work either. For example, the popular Wizard of Oz (WOz) method fails terribly, as there is no way a human could mimic machine intelligence (without using actual data), and vice versa. We can’t use traditional sketching or test early, since the system evolves over time. We can’t fail fast, because the consequences are catastrophic at scale.

1.8 Complexity of AI-UX

Yang et al. [6] describe an AI design-complexity map, which we use to tackle the definition of AI.

Level one systems learn from a self-contained dataset. They produce a small, fixed set of outputs. We will encounter these in the chapter on explorability.

Level four systems learn from new data even after deployment. They also produce adaptive, open-ended outputs that resist abstraction. Search engines, news feed rankers, automated email replies, and a recommender system that suggests “items you might like” would all fit in this category. In designing such systems, designers can encounter the full range of human-AI interaction design challenges. We will tackle these in the chapter on controllability.

1.9 Scope

What’s needed for AI’s wide adoption is an understanding of how to build interfaces that put the power of these systems in the hands of their human users.

— GREG BORENSTEIN

The societal perception of AI that we watch in Hollywood or the news is drastically different from what we experience every day. I believe that the issue can be tackled by starting with the everyday side of AI. Interaction designers can start by building fundamental primitives with which people can think about, create, and guide these machine learning algorithms. That is why I start with the ‘everyday algorithm as a context’.

I situate my work in interacting with the everyday algorithm that picks our song suggestions and frustrates us: everyday ‘artificial intelligence’ that is ‘artificially imperfect’. (More in the chapter Design exploration/Controllability)

In this thesis, I focus on designing ‘tools for people to interact with everyday (unintelligible) machine learning and AI systems’. This will alleviate the anxiety and frustration that arise when humans interact with this particular representation of technology. The tangible sketches and digital tools I propose can mediate new relationships between users and machine learning systems. The predominant ‘alterity relations’ [7] that humans have with machine learning can change into hermeneutic ones through the introduction of new visualisation techniques. (See chapter Design exploration/Explorability)

In the process, I engage with the lives of people and their existing mental models of and perceptions towards these systems. I immerse myself in experts’ domains and practices to understand the nature of algorithms from a designer’s point of view. (See chapter Primary research and synthesis) Besides, I dive deeper into the literature to learn how researchers have been tackling this over the past decades. (See chapter Related works)

Going further into the thesis, through design exploration, I devise novel prototyping techniques to address user needs around training, exploring, and adapting to machine learning systems.

“A creative language by which we communicate with intelligent machines will heal our relationship with it”


1.10 Research questions

In this thesis, I investigate the following research questions:

RQ1: How can we design tools for people to train an everyday machine learning algorithm, e.g. a recommendation system?

RQ2: What visualisation and manipulation techniques can be used to explore ML output?

RQ3: How might we create a pattern language for designers working with machine learning (ML) systems?


2. Related work

For the past two decades, academia and some industry experts have been striving to make algorithmic systems better. The field of explainable AI (XAI) tries to understand the inner workings of the ‘black box’ for makers in a lab. Similarly, interpretable AI tries to build ‘causal knobs’ that can be tinkered with. The emerging field of interactive machine learning (iML) brings humans into the loop, building trainable iterative interfaces. Due to its inherently multi-stakeholder nature, the recommendation system field has recently included fairness and societal perspectives in its paradigm. From interaction design research, initiatives on trainable user interfaces, agentive technology, and seamful design are fascinating fields. I situate my work mainly in interactive machine learning (iML) while bringing in elements of all the fields mentioned:

1. Interpretable and explainable AI
2. Interactive (human in the loop) machine learning
3. Participatory (society in the loop) ML
4. Fairness (algorithm in the loop)
5. RecSys (recommendation systems; a simple explanation of RecSys can be found in Appendix III)

2.1 Interpretable and explainable AI

In the context of machine learning and artificial intelligence, explainability and interpretability are often used interchangeably. Interpretable means finding a causal explanation of the effect we are observing. For example: “I turn this knob and realize, oh, this is causing that, I get it.” Interpretability is contextual; it does NOT concern itself with the inner mechanism.

On the algorithm side, there are new ‘post-hoc’ and ‘model-agnostic’ approaches, such as LIME, that aim to explain the outputs of any classifier regardless of the machine learning algorithm used to train it. [8]
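The post-hoc, model-agnostic idea behind LIME can be sketched in a few lines: perturb an input, query the black box, and fit a proximity-weighted linear surrogate that explains the prediction locally. This is a toy illustration of the principle under stated assumptions (a made-up one-dimensional ‘black box’ and Gaussian perturbations), not the real LIME library.

```python
import random

# Toy sketch of the post-hoc, model-agnostic idea behind LIME:
# perturb an input, query the black box, and fit a simple weighted
# linear surrogate that explains the prediction *locally*.
# The "black box" is a hypothetical stand-in, not a real model.

def black_box(x):
    return 1.0 if x > 2.0 else 0.0  # opaque, nonlinear classifier

def explain_locally(x0, n=200, width=1.0, seed=0):
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0, width) for _ in range(n)]       # perturbations
    ys = [black_box(x) for x in xs]                          # query black box
    ws = [2.718 ** (-((x - x0) ** 2) / width) for x in xs]   # proximity weights
    # Weighted least-squares fit y ≈ a*x + b (closed form).
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    a = cov / var
    return a, my - a * mx

slope, intercept = explain_locally(2.5)
print(slope > 0)  # locally, increasing x pushes the prediction up
```

The surrogate never opens the black box; it only observes input-output behaviour around one point, which is exactly what ‘model-agnostic’ means.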

A prominent group of researchers at DARPA (the people who brought us the internet) have started working on explainable AI. Explainability means understanding the internal mechanism of a complex deep learning model and explaining it in human terms.

Machine learning models are opaque, non-intuitive, and difficult for people to understand, and the complexity of these machines keeps growing (Gunning & Aha, 2019). More and more researchers and engineers are trading interpretability for performance.

For business owners, exposing algorithms may destroy their competitive advantage, but they need an ‘explainable’ model to profit and serve their customers better. For the end-user, however, an explainable system can be overwhelming and get in the way of utility. Thus the end-user has to be provided with an ‘interpretable’ system.

There are also independent developers who have open-sourced and contributed their work on explainable AI, e.g. Google TensorFlow, R2D3, and Distill [9], creating beautiful visualizations and model playgrounds.

This definition (especially of ‘interpretable’) is necessary, as there is a push from policy (GDPR Art. 15(1)(h)) to provide meaningful information about the logic involved to the user.

2.2 Interactive (human in the loop) machine learning

Interactive machine learning is an interaction paradigm in which a user or user group iteratively builds and refines a mathematical model to describe a concept through iterative cycles of input and review. Model refinement is driven by user input that may come in many forms, such as providing indicative samples, describing indicative features, or otherwise selecting high-level model parameters. An archetype of the iML process is classifier training through Crayons. [10]
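The cycle of input and review described above can be sketched as a minimal loop: the ‘user’ supplies labelled examples, the model updates after each one, and predictions are reviewed between rounds. A plain perceptron stands in for the classifier here; the session data and the concept being taught (first coordinate larger than the second) are illustrative assumptions.

```python
# A minimal sketch of the interactive machine learning loop: a user
# iteratively provides labelled examples and the model refines itself
# after each one. A perceptron stands in for the classifier; the
# data and the learned concept are purely illustrative.

def train_interactively(examples, lr=0.1, passes=5):
    w, b = [0.0, 0.0], 0.0
    for _ in range(passes):              # iterative cycles of input & review
        for x, label in examples:        # each "user-provided" sample
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = label - pred           # review: was the model wrong?
            w[0] += lr * err * x[0]      # refine the model from feedback
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# The "user" labels a few points of a simple concept (x0 > x1).
session = [([3, 1], 1), ([1, 3], 0), ([4, 2], 1), ([2, 5], 0)]
w, b = train_interactively(session)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
print(predict([5, 1]), predict([1, 5]))  # → 1 0
```

The point for design is the loop shape, not the model: the user never touches weights, only examples, which is what makes the interface trainable.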

Researchers have further explored this in the domain of gestural interfaces [11] and in steering classification models through intuitive confusion-matrix UIs [12]. In recent years, many design principles have been proposed that stress the need for novel interfaces and identify non-expert user research as vital to the interactive machine learning field [13].

Even though older work on mixed-initiative UI principles [14] for agent-user collaboration is still valid, current iML research has mainly focused on classification problems whose reasoning is intelligible to the human brain. The real world is full of non-intuitive and non-linear ‘black boxes’ that need some serious attention.

2.3 Participatory (Society in the loop) ML

AI systems have gone beyond narrow, well-defined automation tasks like spam filtering to wide societal implications, like influencing the political beliefs of millions of citizens.

Rahwan argues for “society-in-the-loop”, which stresses the importance of creating infrastructure and tools to involve societal opinion in the creation of AI. We need tools to program, debug, and maintain an algorithmic social contract: a pact between various human stakeholders, mediated by machines. [15]

Design should negotiate trade-offs between the different value systems that an AI system currently embodies and can strive towards, for example the trade-offs between privacy and utility, cost and benefit, and different notions of fairness. As systems become more complex, quantifying these trade-offs becomes difficult. Further work explores how people can collectively build ideal algorithmic governance mechanisms for their own communities.

2.4 Fairness (algorithm in the loop)

There are plenty of definitions of bias and fairness out there. Because a predictive algorithm is designed to be fair, resourceful, and beneficial to its owner all at once, you simply cannot satisfy all the definitions of fairness. [16]

In this thesis, we focus mainly on content discovery marketplaces, e.g. Spotify, Pinterest, and Behance. Due to the nature of these platforms, we consider two definitions of fairness. First, fair allocation: exposure for artists/creators who belong to the less-exposed long tail. Second, geodiversity fairness: output that doesn’t just exhibit Amerocentric and Eurocentric representation bias.
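The first notion, fair allocation, can be made concrete as a simple metric: the share of a recommendation slate that goes to long-tail creators. The catalogue, the play counts, and the 20% ‘head’ cut-off below are all illustrative assumptions, not values from any real platform.

```python
# Hedged sketch of the fair-allocation notion above: measure how much
# exposure a recommendation slate gives to long-tail items. The
# catalogue, popularity counts, and 20% "head" cut-off are illustrative.

def tail_exposure(slate, popularity, head_share=0.2):
    """Fraction of recommended items that sit outside the popular head."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    head = set(ranked[: max(1, int(len(ranked) * head_share))])
    tail_hits = sum(1 for item in slate if item not in head)
    return tail_hits / len(slate)

popularity = {"a": 900, "b": 850, "c": 40, "d": 30, "e": 10}  # play counts
slate = ["a", "c", "d", "a"]                                  # what we showed
print(tail_exposure(slate, popularity))  # → 0.5
```

A designer could surface such a number as a seam, letting users see (or tune) how much of their feed comes from beyond the popular head.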

2.5 RecSys (recommendation systems)

In layman’s terms, a RecSys (recommendation system) is an application of machine learning that helps filter information for people. In the early 1990s, these systems were built to reduce information overload by building models to predict how much a user would like a certain set of items. [17]

The concept has since seeped into every aspect of our lives. Some common applications include filtering spam in email, predictions in social media that bring you news based on what you or your friends like, and maps that route you to your destinations efficiently.

Recommendation systems come in various forms, for example manual ads (a marketer suggesting products), product associations (what items go together), content-based techniques (learning a single user’s profile), and collaborative techniques (learning from others’ profiles). Even though the field has grown, the underlying algorithms haven’t changed much.

Most modern recommendation systems use a strategy popularly known as collaborative filtering, which recognizes commonalities between users (user-user) or between items (item-item) on the basis of explicit feedback (ratings, tags, etc.) or implicit tracking (actions like time spent, reading, downloading). Collaborative filtering can add more dimensions, like user attributes (demography, preferences) and item attributes (name, metadata), fed into deep neural networks.

There are also online algorithms, like multi-armed contextual bandits, that use reinforcement learning to recommend items on the go. Such an algorithm tries to balance learning the user’s preferences by letting the user explore against suggesting content the user would definitely like. This kind of algorithm can be used to optimise for long-term goals like user retention, fairness, etc. In the case of a newly launched product, the challenge is to find new users or items despite insufficient data. This is called the cold-start problem: a fertile ground for human biases and normative behaviour.
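The explore/exploit balance described above can be sketched with the simplest member of the bandit family, an epsilon-greedy bandit. Real recommenders use contextual bandits; this context-free version with made-up click rates only shows the core loop: mostly exploit the best-known item, occasionally explore.

```python
import random

# A minimal epsilon-greedy bandit sketching the explore/exploit
# balance described above. Real recommenders use contextual bandits;
# this context-free toy (with made-up click rates) shows only the
# core loop: mostly exploit the best-known item, sometimes explore.

def run_bandit(true_rates, steps=5000, eps=0.1, seed=1):
    rng = random.Random(seed)
    counts = [0] * len(true_rates)    # pulls per item
    values = [0.0] * len(true_rates)  # estimated click rate per item
    for _ in range(steps):
        if rng.random() < eps:                         # explore
            arm = rng.randrange(len(true_rates))
        else:                                          # exploit
            arm = max(range(len(true_rates)), key=values.__getitem__)
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean

    return counts, values

counts, values = run_bandit([0.1, 0.5, 0.3])  # hypothetical click rates
best = max(range(len(values)), key=values.__getitem__)
print(best)  # index of the item the bandit learned to favour
```

The `eps` knob is exactly the ‘handle on consistency and serendipity’ that users ask for later in this thesis: more exploration means more surprising recommendations.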

A commonly used algorithm behind this is k-nearest neighbors. Intuitively, this algorithm predicts how likely a user is to like an item. It achieves this by finding other users in the neighborhood with similar taste, e.g. by cosine similarity. The success of an algorithm is negotiated and determined by prediction accuracy and hit rate (the percentage of recommendations converted to sales).
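The intuition above can be sketched directly: compute cosine similarity between users’ rating vectors, pick the nearest neighbour who has rated the item, and borrow their rating as the prediction. The ratings matrix and user names are made up for illustration.

```python
import math

# A hedged sketch of the user-user nearest-neighbour idea described
# above: find the most similar user by cosine similarity over rating
# vectors, then borrow their rating as the prediction. The ratings
# matrix is invented for illustration (0 = unrated).

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

ratings = {
    "ana":   [5, 4, 0, 1],
    "ben":   [4, 5, 3, 1],   # tastes like ana, and has rated item 2
    "chris": [1, 1, 5, 5],
}

def predict(target, item, ratings):
    # Nearest neighbour among users who actually rated the item.
    candidates = [u for u in ratings if u != target and ratings[u][item] > 0]
    nearest = max(candidates, key=lambda u: cosine(ratings[target], ratings[u]))
    return nearest, ratings[nearest][item]

neighbour, score = predict("ana", 2, ratings)
print(neighbour, score)  # → ben 3
```

A production system would average over k neighbours and weight by similarity, but the ‘steals from my friend’ intuition that users voice later in this thesis is already visible here.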

But algorithms contribute only about 5% to commercial success; the majority is driven by interactive components (50%), such as input preference elicitation and the output of the system, i.e. the way recommendations are presented.

(15)

1 2

There is a significant challenge in doing human-centered design and research in this domain, due to its temporal nature, the inability to distinguish whether frustration is caused by the algorithm or by UI problems, human biases, normative behavior, a definition of ‘good’ that changes over time, and inconsistency in feedback.

2.6 Seamful design

Seamful design accepts that technology has limits and, instead of disguising these limits to the user, ‘pulls back the curtain’ to give the user a greater understanding of how the system works. [18]

“As experience creators, we should be mindful that showing the seams in our systems and processes is not always a negative. In certain scenarios, our systems should make it a little harder for users to achieve their objective, or ‘show the working’ of behind-the-scenes processes. Users can then make conscious decisions on their next steps, or at least understand how the system has arrived at a conclusion.

Systems can demonstrate how they are catering to users, and allow them to make judgements on the best course of action. The challenge is understanding when and where these points are, and the most appropriate way to inform and engage the user, so they aren’t completely removed from an easy-to-use experience.

If users can trust that the systems they are interacting with aren’t attempting to hoodwink or mislead them, and that they have ‘control’ over their experience, it can only strengthen a brand’s bond with its customers.

In a marketplace where everyone is trying to make things easier for their users, maybe the key to gaining a competitive edge could be making things a little bit harder instead.” [19]


3. Methodology Overview

This section describes the motivation behind the methodologies used in this thesis to address the research questions: why I used each method, and how it actually contributes to the questions.

The aesthetics of AI and machine learning systems in design practice is driven by a top-down, industry-driven, A/B-testing approach. In academia, there is research in computer science (CS), i.e. solving through optimization, and in science and technology studies (STS), which approaches machine learning either through its societal consequences or through design. In this thesis, I approach it from design exploration [20], i.e. research through design (RtD), which includes sketching and participatory design: a design-led approach for diving into a new, untouched field.

RQ1: How can we design tools for people to train an everyday machine learning algorithm, e.g. a recommendation system?

3.1 Participatory Design

Participatory design has always been political; it has always been about taking the side of people who have a weaker voice, the weak stakeholders. Its traditions are rooted in the democratization of workplaces during the introduction of new technology in Scandinavia. Due to the inherently participatory nature of ML systems, I borrowed several methodologies from these practices. Furthermore, a new way of looking at ML concepts is through engaging novice users as participants: as co-explorers, co-designers, and co-researchers, with various tangible kits and workshops.

RQ2: What visualisation and manipulation techniques can be used to explore ML output?


3.2 Design Sketching

“What I hear I forget. What I see, I remember. What I do, I understand!”

- LAO TSE

Design sketching helps me visualize a scenario in an engaging way, without worrying about what features are required to build a product. It sparks an invisible conversation between the material, the maker, and the user who experiences it. Design sketches in a broader sense not only help to share knowledge but also anticipate future needs and desires of users.

In a collaborative process, making tangible sketches helps to share work early with collaborators and to get early, quick feedback. In interaction design, ‘design by doing’ and ‘design by playing’ are considered fundamental forms of inquiry. [21]

Over the past decade, this approach, making sketches not just a tool for the designer but a way of engaging participants, has been a widely popular design methodology. Its various approaches (cultural probes, generative toolkits, and design prototypes) and mindsets (designing for people, and designing with people) [22] are still relevant for exploring machine intelligence.

RQ3: How might we create a pattern language for designers working with ML to design better user experiences (UX)?

3.3 Automata Design or Component strategy

In order to build a working machine, there is a series of steps that needs to be done in order before one can really start experimenting. As well, the combination of parts that need to be stable and parts that require free motion demands delicate tweaking, and oftentimes a lot of frustration, before getting the machine to work. [23]

An analogous approach from automata design (a LEGO-block approach) has been used for working with machine intelligence. Even though big data is not accessible and deep-learning models are not available to design students, a designer can work with the fundamental properties of data and output (see chapter Design exploration).

Due to COVID-19, these explorations were disrupted: the participants could not experience the physical dynamics of interaction. However, the learning was transferred into digital software (Figma sketches), where users could explore things digitally. In retrospect, this change of approach was helpful, since most of these systems are primarily digital, and keeping them digital makes them believable. Due to this constraint, the outcome covers an even broader spectrum, i.e. from “a product or service that can be implemented tomorrow” to “provoking a speculative future”.


3.4 Pattern language

Even though these systems have been around for two decades now, soft machine strives to establish a new interaction paradigm for algorithmic interaction. Hence, a meta-design strategy [24] for infrastructuring is taken into the design methodology: it is popularly known as a pattern language. The idea originates from the renowned work of the architect Christopher Alexander in the 1970s.

A pattern language is an organized and coherent set of patterns [25], each of which describes a problem, a context, and the core of a solution that can be used in many ways within a specific field of expertise. The term was coined by Christopher Alexander and popularised by his 1977 book A Pattern Language. While most people assume that patterns are something that needs to be invented, patterns are something that needs to be found in society and nature, and then refined and iterated upon. An important aspect of design patterns is to identify and document the key ideas that make a good system different from a poor system. Patterns and pattern languages have been widely adopted both in software engineering (in object-oriented programming) and by human-computer interaction researchers and practitioners.

A pattern language should represent a set of values and the rationale behind them. In the case of Christopher Alexander, it was making people feel more alive and embedding feelings in objects.

Making wholeness heals the maker

- CHRISTOPHER ALEXANDER
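The problem-context-solution anatomy of a pattern can be captured as a small record. The sketch below is one possible encoding; the example entry is an illustrative invention, not a pattern from this thesis.

```python
from dataclasses import dataclass

# A pattern, per the definition above, names a problem, its context,
# and the core of a solution; links between patterns are what turn a
# catalogue into a *language*. The example entry is invented.

@dataclass
class Pattern:
    name: str
    context: str         # when/where the pattern applies
    problem: str         # the recurring tension it addresses
    solution: str        # the core of a reusable resolution
    related: tuple = ()  # links to other patterns in the language

seam = Pattern(
    name="Visible confidence",
    context="An ML system presents a prediction to a non-expert",
    problem="Users cannot tell a confident guess from a wild one",
    solution="Expose the model's confidence alongside the output",
    related=("Explorable output",),
)
print(seam.name)  # → Visible confidence
```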

During the entire thesis, I keep a log of things I learn about people, the literature, and machine intelligence. Towards the end of the thesis, I compile this into a pattern library which I call seams: “a pattern language to interact with machine learning systems”.

RQ1: How can we design tools for people to train an everyday machine learning algorithm, e.g. a recommendation system?

3.5 Synthesis methodology

“I saw this”+ “I know this” = Insights + Design patterns = Concepts

Although design patterns are known heuristics, in this thesis a set of new, or rather discovered, design patterns was used for product development. This is popularly known as ‘dogfooding’, widely used by software companies in the early days of personal computers.

My earlier linear approach, from building principles to iterating and applying them in a product scenario, was changed in the middle of the thesis together with my academic mentor(s). I revamped my strategy into a more ‘sprint’-like approach, to make constant, incremental progress in the thesis, to avoid “analysis paralysis” and “rushing for the concept in the end”, and, more importantly, to be resilient to the uncertain situation of the world we live in.

Atomic components of the system are sketched quickly to accumulate learnings both about users’ perceptions and about the intended and unintended outcomes (the schema is described below). Rapid, incremental cycles of development instead of long development cycles proved truly beneficial as a process for exploring a new paradigm. Just like any design project, this is a huge field and one can learn something new almost every day, so one must be agile enough to accommodate changes to parts of the design. That is why, in this thesis, readers won’t find a traditional ‘design thinking’ methodology. The themes of controllability, explorability, and adaptability were updated in parallel, though some more than others due to constraints of resources and time.

With a lack of purpose, anything seems like a step forward.



4. Primary Research

A total of 24+ perspectives: 14 users with lived experience, 10 users with expert opinions, plus additional anonymous users.

4.1 User stories

In this section of the thesis, I focus on people’s perception of algorithms, i.e. their beliefs, habits, and the bitter-sweet relationship with algorithmic systems that emerges over time, beyond a single press of a like button. For the end-user, there is no distinction between automation, heuristic-based AI, or deep learning. All they care about is the experience wrapped around the black box.

Algorithmic systems like YouTube, Spotify, Pinterest, and Google use machine learning to curate, filter, and present information in the form of video, music, images, and text. These algorithms operate in an opaque, black-box manner. Survey details can be found in Appendix I & II.

4.1.1. Visual Perception of an algorithm

Jarvis, a sack of rotten apples, Spiders and more

We reason about what is invisible via analogy and via what Lynch calls ‘sketching’. To uncover the underlying perceptual features of urban space, Kevin Lynch, an urban planner, asked local participants to draw a map or sketch of a space for a foreigner. He discovered that these often fragmented mental maps revealed people’s personal histories and the socio-economic conditions of the space. His findings are at the core of good urban design. [26]

I use this analogy to investigate machine intelligence.

When asked to draw, sketch, or verbally talk about the algorithms they used in their daily lives, users would reference various ‘cool’ and ‘creepy’ images. For example, an avid user (P06) of Pinterest said: “I don’t understand how it works, but it must look like spider webs with lots of information channels”. He further added: “It’s a random sack; if you are lucky you get something, if not you are going to have a lot of rotten apples”.

Another user (P03), when asked about Spotify, said: “An algorithm that analyses my behavior like Jarvis in Iron Man”. When he was asked to draw it, it was a completely different story.


4.1.2. Theory of inner-working

A Song thief, Encyclopedia, nodes, and decision tree

Even though the primary users were from a non-technical background (age 20-28), they had strong theories about how the system works. A user (P02), who recently saved her dying plant using a computer-vision-enabled app, said: “I think it looks at the pixels of the photo I take, goes through an encyclopedia of images, and if it matches then shows me the plant name and ways to save it”. Another user drew a decision-tree-like structure (a popular ML algorithm) to describe how a recommendation of Kendrick Lamar is made to him and how the algorithm ‘steals’ from his friend (collaborative filtering).

“The algorithm steals from my friend”

- A SPOTIFY USER

Note: there was a disconnect between how people think it works and what they feel about it.

4.1.3. Existing ways of control

three dots, less and more, like, let it play.

People have developed their own ways to control these systems (see illustrations). However, a lot of them feel it doesn’t work.


4.1.4. A tendency to train

End-users of recommendation systems have a natural tendency to try to make them better.

“I will pin & view as many images as possible to get my feed to get there”

“They should have an incognito browsing mode or… some filters to change my feed” - P01

“I want to be more vocal about what I want” - P05

4.1.5. Need for serendipity

Users have a need to explore beyond what’s recommended and don’t want to be boxed in. Yet some users believe that they want to be on the edge of their taste: not too far, not too consistent. Thus, a handle on consistency and serendipity is crucial.

“Cause it’s also nice, those small glitches (algorithmic inaccuracy) help me to find something new” - P04

4.1.6. Algorithmic anxiety

Minefield, Disease, hacked

Due to a lack of control, users feel a strong sense of anxiety when using recommendation systems.

“On Pinterest, I have a feeling like walking on a minefield”

“My feed sometimes populates like a disease” - P01

“I am ashamed of my guilty pleasures popping up on my top feed. I don’t want to show my YouTube to anyone” - P01


4.2 Expert Perspective

Due to the complexity and ‘folk’ knowledge involved in training models, significantly fewer people (those who make these systems) get to look under the hood. The following section dives deeper into the practices of makers, treating ‘machine learning as process’. A detailed description can be found in Appendix IV.

4.2.1. Need for human

ML practitioners think there is a need for human experts in EVERY task. Some experts even discredit machine learning as an expert field, since the major part of the process is driven by a human. This hints at the importance of end-user involvement in the machine learning process.

“I don’t think there should be a thing such as machine learning expert”

- APPLIED SCIENTIST AT AMAZON, USA

“Deep learning is hot but it still needs people to navigate it”

“humans are highly accurate at this (fraud detection), over 90-95%”

4.2.2. An unintuitive black box

The machine learning tuning process is often unintelligible and hard to grasp, even for experts.

“An unintuitive language interface is also bad on the expert side […] three lines of code, then training, then evaluation […] with no idea what’s going on”

“Parameters (weights, how many times you run) are hard to get for new students.” - PROFESSOR, UMEA UNIVERSITY

“Hyper-parameter tuning is often magic and doesn’t make any intuitive sense” - CEO, AI-BASED HIRING, INDIA

4.2.3. Plug and play

An expert’s training process looks much like an iterative plug-and-play method. Experts use their experience, folk knowledge, and intuition.

“I use plug & play model keeping the same pipeline” - PHD SCHOLAR, USA

4.2.4. Quality of data

They often rely on a preferred pipeline, trusting the quality of the data more than the algorithm.

“Good quality of data and a pre-trained CNN - that’s all you need” - BIOPHARMA ENGINEER, UMEA

4.2.5. Machine learning is not a ‘technology’ but a process

An explanation of each step can be found in Appendix IV.
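The hyper-parameter ‘magic’ the experts describe is often tamed by brute force: try every combination, keep the best score. A minimal grid-search sketch, with an invented scoring function standing in for a real train-and-evaluate run:

```python
from itertools import product

# Hypothetical search space: learning rate and number of training epochs.
grid = {"lr": [0.1, 0.01, 0.001], "epochs": [5, 10, 20]}

def evaluate(lr, epochs):
    # Stand-in for "train a model, return validation accuracy".
    # Here: a fake landscape that peaks at lr=0.01, epochs=10.
    return 1.0 - abs(lr - 0.01) * 10 - abs(epochs - 10) * 0.01

best = max(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=lambda params: evaluate(**params),
)
print(best)  # → {'lr': 0.01, 'epochs': 10}
```

The opacity the interviewees complain about is visible here: nothing in the loop explains *why* one combination wins, it only reports that it does.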

4.3 Reflection

After the primary research it was evident that the emphasis needs to given to algorithmic system in the lived experiences of people’s lives, The previous assumptions were broken

Broken Assumption 1: We would democratise machine learning by visualising algorithms in an aesthetic way through tangible interactive tools.

Commonly used algorithmic systems are inherently complex, and makers often trade intelligibility for performance. We should move away from making the algorithms themselves transparent and instead make the inputs and outputs of the algorithm transparent.

Broken Assumption 2: A new language to communicate with the machine learning algorithms that surround us every day will make things better.

This assumption was too broad and hard to realise. Language is not simply learned but actively practised through a dialogue with the system, which requires active effort and collaboration between user and algorithm.


5. Design Principles

Based on primary and secondary research, a set of design principles was generated. The principles were clustered into three ‘meaningfully’ distinct themes, and design opportunities were gathered for each.

5.1 Controllability

1. Leverage the existing sense of control

A good design should leverage the ways of feedback and training that users have already developed.

2. Allow continuous and deliberate input

A good design should seek feedback from its trainer in a continuous fashion, letting the user be deliberate about it.

3. Humanize the tuning process in machine learning

A good design should strive for humane tuning mechanisms that are easy and natural for the human brain.

4. Switch the role from ‘user’ to ‘trainer’

A good design should bring about a behavior change, i.e. the role of the user should change from a passive consumer of ML outputs to an active trainer of the algorithm.


5.2 Explorability

5. Balance consistency and serendipity

A good design should let the user navigate the trade-off between consistent relevance and serendipity (accidentally encountering the familiar yet surprising). It should further make the consequences of this choice visible to the user.

6. Approach the black box as a plug-and-play, trial-and-error puzzle that needs solving

A good design should retain and even enhance values such as failure, exploration, and iteration that are inherent to working with ML algorithms.

7. Reduce algorithmic anxiety

A good design should alleviate algorithmic anxiety by reinforcing interaction aesthetics like playfulness and critique.

5.3 Adaptability

8. Accommodate multiple personas and contexts

A good design should accommodate people as a whole; that is to say, their changing concepts about things, their changing contexts, social circles, and selves.

9. Bring fuzzy qualities into expression

A good design should give the user handles that bring the ungraspable fuzziness of algorithms to expression.


6. Design Exploration

This section dives into how abstract design principles and complex machine learning concepts are realised in the form of concrete artifacts.

How do you prototype interactions that follow an unpredictable course?


6.1 Controllability - Design for Control

Background

In many machine learning systems, users have an idea or concept they want to teach a machine: whether it’s a playlist they like, an email they want to go to junk, a particular type of cell that causes cancer, or a type of news that’s appropriate for an audience.

Machine learning systems often misidentify this concept.

Currently, users provide feedback explicitly (saving, rating, liking), implicitly (engaging, clicking, zooming, downloading), or socially (following). Users demand more control over how the algorithm works on their behalf, even though it requires much effort from their side [27].

In the following section we explore the design space of controllability. There are broadly three kinds of controls for tuning or giving feedback to algorithms: explicit, implicit, and responsive.

Sketching

Explicit controls are direct ways of manipulating an algorithmic system. They are often used to capture people’s feedback (what I like / don’t like). Initial sketches of explicit controls follow.

SKETCH ONE - A USUAL FLOW

Pinterest, a commonly used visual discovery engine, and its mood board algorithms were chosen because of


its visual nature. In the algorithmic context of ‘similar suggestions’ (discussed in Related work), users often get suggestions right after creating a mood board. The current feedback mechanisms are ‘save’ and ‘hide’ (see figure).

Feedback to the Pinterest algorithm is skewed, and many users don’t even interact with these mechanisms explicitly.

“I am scared that pressing this button on a mood board would influence my whole feed” - A participant

“I would never press hide, as I don’t want to lose anything”

- A PARTICIPANT

SKETCH TWO - WEIGHT

From the previous sketch, it was observed that users don’t see suggestions as individuals but as a whole (see pattern Wholeness). This sketch helped represent explicit feedback with new metaphors, using the weight of tangible objects as a proxy for importance or priority.

SKETCH 3 - FEEDBACK PATTERNS

Based on the above reactions from users, a further Wizard-of-Oz (WoZ) prototype of Pinterest was designed. Five participants were asked to provide feedback to an algorithm using multicolored Lego blocks as a proxy for similar suggestions.

Reflection

No negative

People don’t like to provide negative feedback, even though a vital part of guiding an algorithm is giving it enough negative examples.

Critique mechanisms

Unlike a save or hide button, people want to be more deliberate about guiding algorithms; sorting, ranking, piling, and clustering were some of the emerging modes of guidance. Users don’t understand the SCOPE of an actionable button. (The scope of explicit feedback is beyond the scope of this thesis.)

Beyond critique to training

Soon I realised that giving feedback to an algorithm is only one part of a training process, a process that iteratively involves showing examples, tuning, re-framing, observing, and evaluating. Still, giving feedback to the learner is a core part of the training experience; the concept development chapter dives deeper into it. Explicit controls are only a part of the bigger picture of training a machine learning algorithm.

For discontinued directions (implicit and responsive/temporal controls), see Appendix V and VII.

6.2 Explorability - Design for Discovery

Explorability is defined as presenting a visualization to browse the entire information space, e.g. related items that are not recommended.

In contrast to, say, search, where the information need is stricter, music recommenders like Spotify or image recommenders like Pinterest are more lenient.

Background

Kevin Lynch, the city planner, believed that the best urban design supports not only efficient paths but also mental maps; he called this quality “imageability”. Bill Verplank, the interaction designer, describes this as the path vs. map approach: a novice needs a path, i.e. knowing one step at a time, but a learner might benefit from map-like knowledge. [28]

For the past two decades, users have become accustomed to path-like interfaces, e.g. infinite scroll. They express frustration around:

1. Lack of overview,

2. Difficulty keeping track of browsed recommendations,

3. The need for nuanced criteria of navigation.

Sketching

In this section I explore the interplay between ML’s statistical intelligence and human common-sense intelligence. Traditional software (designed GUIs) is deterministic, whereas ML systems are probabilistic. They require different kinds of navigation, interfaces, and materials to sketch with.

SKETCH ONE - DOTS

A fundamental component of explorability is the dot, i.e. a semantically distinct data point that users navigate in any recommendation feed. In machine learning, these are known as embeddings¹. Think of them as a bag of words, images, songs, people, or numbers that can be categorized in many ways. However, interacting with 200 billion images is not reasonable, so direct manipulation does not scale.

Shneiderman identifies an effective process for exploring data: “Overview first, zoom and filter, then details-on-demand” [29]
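In an embedding space, "details-on-demand" around an anchor item boils down to a nearest-neighbour lookup. A sketch with invented 2-D embeddings (real systems use hundreds of dimensions):

```python
from math import dist

# Hypothetical 2-D embeddings; each "dot" is an item a user can navigate to.
dots = {
    "sunset photo":   (0.9, 0.1),
    "beach photo":    (0.8, 0.2),
    "mountain photo": (0.2, 0.9),
    "forest photo":   (0.1, 0.8),
}

def neighbours(anchor, k=2):
    # "Details-on-demand": the k items closest to the anchor, excluding itself.
    others = [d for d in dots if d != anchor]
    return sorted(others, key=lambda d: dist(dots[anchor], dots[d]))[:k]

print(neighbours("sunset photo", k=1))  # → ['beach photo']
```

The design sketches that follow are, in effect, interfaces to this `neighbours()` operation: the anchor is what you hold, the boundary is how far the lookup reaches.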

Shown below are design sketches to conceptualise interaction with vector embeddings

Reflection

Anchor - “The golden pin is like an anchor, it changes as I click next”

1. The Beginner’s Guide to Dimensionality Reduction. (2020). https://idyll.pub/


Boundary - “It revolves on the edge of my taste. Can it let me find new things by rolling?”

Similarity - “The current feed is all mixed up. I would like some sort of categories, maybe 3-5 pins based on some category”

Pattern(s)

1. Anchors: anchor users to a reference which they can change as they wish

2. Boundary: make the boundary of the recommendation bubble visible/tangible to users

3. Vectors: Show similar items in one direction

Trace

“Keep track of things I have browsed, let me know where I have lost track, how I have diverged”

- A PARTICIPANT


Beyond the cold, hard data points that power the feed, people feel a strong sense of intuition when navigating them. Shown below are a few sketches that give expression to navigation.

Clicking manipulates what’s coming next in an ongoing feed. “It’s annoying to see the history is lost. I sometimes go to browser history to find what I was looking for; it’s difficult.”

“We should have some tags of recommendations and corresponding suggestions that I can go back to”

Ant navigation as an analogy to navigating recommendation output


ANT ANALOGY FROM A PARTICIPANT

Ants can navigate over long distances between their nest and food sites using visual cues. Their internal counter is like a pedometer, which allows them to find a food source and find their way home. They also leave chemical trails for their fellow ants. It’s a key part of their survival. Scientists theorize that the internal counter resets each time they go back to their nest. Further, ants’ eyes have a wide angle of view: nearly 360° vision, whereas humans can only see about one-third of their surroundings without turning their heads.

This analogy was further integrated and developed in the concept development phase.

Pattern(s)

Traces - Allow temporary tags during navigation in the recommendation system

Range

An algorithm often involves more people than just the consumer of its output, e.g. an artist might want to find the right ear for her music, or a user might want to balance relevance and serendipity. Sketches on this with Spotify can be found in Appendix VI.

Help me to explore hidden gems, surprise me yet being on the edge of my taste

- A PARTICIPANT

It is NOT possible to satisfy this user desire without negotiating with others.

Equitable optimisation

You can’t be purely user-centric in a marketplace that involves multiple stakeholders.

We often assume that a marketplace is a zero-sum game, i.e. one party has to sacrifice for another’s benefit. But research has shown that marketplaces like Spotify can create a win-win relationship between the business, artists, and end users by equitably optimising for all.

- SPOTIFY RESEARCH


During this phase, material was hard to gather due to NDAs at many of the companies. However, some publicly available research has shown that it is indeed possible to negotiate trade-offs between different stakeholders in a marketplace, and that there is a sweet spot between exploration and exploitation. Currently, this is implemented algorithmically, where stakeholders are given either equitable or weighted benefits. [30]
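A hedged sketch of the weighted-benefit idea: the final ranking score blends listener relevance with supplier exposure, with a weight the platform (or, as argued here, the user) can set. Names and numbers are invented, not any platform's actual scoring:

```python
# Candidate songs with a relevance score (for the listener) and an
# exposure deficit (how under-exposed the artist is, 0..1).
candidates = [
    {"song": "mega hit",     "relevance": 0.95, "exposure_need": 0.05},
    {"song": "hidden gem",   "relevance": 0.70, "exposure_need": 0.90},
    {"song": "so-so filler", "relevance": 0.40, "exposure_need": 0.50},
]

def rank(fairness_weight):
    # fairness_weight=0 -> purely user-centric; 1 -> purely supplier-centric.
    def score(c):
        return (1 - fairness_weight) * c["relevance"] + fairness_weight * c["exposure_need"]
    return [c["song"] for c in sorted(candidates, key=score, reverse=True)]

print(rank(0.0))  # → ['mega hit', 'hidden gem', 'so-so filler']
print(rank(0.5))  # the hidden gem overtakes the mega hit
```

The `fairness_weight` is exactly the "you choose!" lever discussed below: the system default decides who wins the tie.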

Platform - Spotify, Amazon, Airbnb - profitable

Supplier - artists, delivery guy, host - fair chance

User - listener, eater, traveler - you choose!

Given a chance, each of these stakeholders can become this octopus.

As one user said, “why would I listen to not-so-personalized songs?”, even though this would mean giving long-tail or under-exposed artists a fair chance.

I believe we can surface these issues by giving users a choice and, most importantly, by deciding the system default.

That is why I approach the negotiation of values through ‘normative ethics tools’ that enforce desired outcomes rational actors would be willing to opt into, for instance a fairness check. More on this in the concept development chapter.

Pattern(s)

Range - Give users access to their preference for relevance vs. diversity of recommendations.

Trade-off - Make users aware of the other actors involved in the algorithm and how individual choices affect them.


6.3 Adaptability - Design for Adaptation

Adaptability is how recommendations change in response to changes in a user’s profile, time, and context (both physical and digital). This is the core of personalization.

A key challenge is to find ways to improve recommender systems while still respecting individual privacy.

Design sketching

SKETCH 1 - DATA FILTERS

A set of initial recommendation system mock-ups was created to investigate this: a hotel recommender that shows what changes when you control social and environmental filters.

Participants really liked seeing how their actions or data filters affect the recommendations they receive in the same context. Further, a memory of what was suggested before and how it had changed gave them recognition rather than recall.

SKETCH 2 - SOCIAL FILTERS

An initial set of controls was prototyped to check people’s perception of data filters on a music recommendation platform, e.g. Spotify.


Bring the social component into the tuning of algorithms, e.g. compare, mix and match your taste + friend’s taste + culture

- A PARTICIPANT

Because people were interested mainly in social collaboration, the prototype was iterated to focus on the social aspect of recommendation. The ‘cosine roller’ provides a new affordance for closeness, sameness, following, and collaboration.

“I can see how close I am to a friend’s taste”

“Can I follow the dot of my friend and change my music taste”

Reflection

Personas

People desired to choose different personas. Primary user research revealed users’ concerns related to their personas (tunnel me, morning me, mood me). However, industry experts suggested that this user data lies in the latent vector space of algorithms, often hard to surface unless actively looked for. This vector information¹ often contains demographic attributes, behavioral information, users’ personally identifiable information (PII), mobile device IDs, and web tracking pixels. From this information, Google derives what it calls ‘Similar Audiences’; on Pinterest, these are called ‘Actalike Audiences’ [31].

Physical context

In the particular use case of Spotify, users don’t necessarily care about location as a contextual input.

Social context (Micro community/

Curators)

Unlike friends and followers that prevails current social media

application, users demanded micro- communities they could choose. Those micro-communities can help them

1. Every person is represented by a series of numbers in vector space; see the work of Daniel Smilkov, Fernanda Viégas, Martin Wattenberg, and the Big Picture team at Google.


to have curation and more intimate conversations.

As one Spotify user said, “I would rather have my Spotify influenced by a small group of people that I trust, e.g. my concert buddies, rather than all my friends”. Another said, “Honestly, I don’t trust my friends’ taste”.

This led to the pattern known as ‘curation’ in later chapters.


You are unpacking a lot of difficult problems in the industry. I can see how a thing like this could generalize to all recommendation and not just Spotify

- MAT BUDELMAN, SR. PRODUCT DESIGNER, PERSONALISATION, SPOTIFY

7. Concept development


7.1 Controllability

See Appendix IX for early iterations of controllability in the context of Pinterest, and for further iteration and testing on the Maze platform.

7.1.1 ML is an active training process

A pipeline of machine training was established; a schematic is shown below. A detailed explanation of all the features is given in the Design proposition chapter.
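The loop behind such a schematic can be sketched as an iterative cycle of showing examples, re-training, and observing. The stage names here are hypothetical stand-ins for the features detailed in the Design proposition:

```python
# A toy "trainer" loop: the user shows labelled examples, the learner
# updates a threshold, and the user inspects progress each round.
examples = []  # (value, is_good) pairs the trainer has shown so far

def show(value, is_good):
    examples.append((value, is_good))

def retrain():
    # Hypothetical learner: threshold = midpoint between the class means.
    good = [v for v, g in examples if g]
    bad = [v for v, g in examples if not g]
    return (sum(good) / len(good) + sum(bad) / len(bad)) / 2

# Round 1: show a few examples, train, observe.
for value, label in [(0.9, True), (0.8, True), (0.2, False)]:
    show(value, label)
threshold = retrain()
print(f"after round 1: threshold={threshold:.3f}")

# Round 2: the trainer corrects a borderline case and re-trains.
show(0.4, False)
threshold = retrain()
print(f"after round 2: threshold={threshold:.3f}")
```

The point of the sketch is the shape, not the learner: the user's deliberate act of showing and correcting examples is itself the training process.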

“I feel like I consciously and actively teach it now”

“It’s a personal and informative experience”

“I feel more in control and powerful”

“There’s an element of surprise in this process”


7.1.2 Various critique mechanisms

Users preferred to train an algorithm through, say, ranking, piling, keywords, or crayons in the image-based machine learning system.

These mechanisms enforce different qualities of interaction. For example, a keyword may feel personal, while sorting and ranking might seem like labor:

“It depends on the size of my mood board. I think sorting a mood board with a lot of pictures can be tiresome, and I would then maybe prefer to sort ‘piles/groups’ of pictures instead of individual pictures. The ‘keyword’ seems more personal to me, which is nice. It allows you to express yourself whereas sorting may seem more like labor work.”

“It’s more deliberate now - sorting and piling have a physicality to it, just like a deck of cards in a solitaire game. I can go on autopilot into a therapeutic state doing this.”

“Range feels like an out-of-the-cave experience”

Most recently, Cai et al. [32] have also found that refinement mechanisms in a diagnostic tool empowered clinicians to test, understand, and grapple with opaque algorithms. This reinforces the universality of the critique mechanisms proposed.

Detailed user feedback can be found in Appendix IX.

7.1.3 Collective critiquing

In one feedback criterion, ‘critique through keywords’, illustrated above, users don’t trust the keywords suggested by an algorithm; they think they would be too objective. However, they trust themselves and their micro-community to suggest these keywords.

“Machine-generated keywords would be dumb. For a specific search, I would add my own; if I have no idea [...] I would go with the community keywords”

7.1.4 Similar and controllable perspective

Since users wanted their micro-community or curators to influence their algorithm, an option for a similar community of close friends was introduced along with the critique criterion called keywords.

7.1.5 Dissimilar, uncontrollable and anti perspective

History offers many examples of crowd wisdom leading to isolated bubbles. In order to break that, I started introducing differences in perspective.

Some initial iterations are shown below

However, during the process some sketches turned out to be value-judgmental.

“I feel it is judgmental telling me that I don’t look east. I might as well be just training for Scandinavian style”

- A PARTICIPANT

The second iteration is more subtle, a neutral perspective.

7.1.6 Change over time aspect of training

People’s notions of things and concepts change over time. The system needs to take such change into account and provide mechanisms to review and fine-tune. This can be done either by providing a negative nudge like “is this still valid?” or a neutral nudge like “the examples are getting old”. A machine persona (learner persona) is vital to a trainer.

7.1.7 Machine persona

With a machine persona, the trainer can keep track of how the machine is progressing over time: a record of what sort of images go into the algorithm, with the ability to control it (data control); what proportion of good images are not suggested (false negatives); and what proportion of bad images are suggested (false positives). A great way to maintain the quality of this report is to give the algorithm enough ‘negative’ examples. (More sketches for the machine persona can be found in Appendix VIII.)
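The machine persona's progress report reduces to counting false negatives and false positives against the trainer's own judgment. A minimal sketch with invented labels:

```python
# Each pair: (machine suggested it?, trainer considers it good?)
history = [
    (True, True), (True, False), (False, True),
    (True, True), (False, False), (False, True),
]

false_positives = sum(1 for suggested, good in history if suggested and not good)
false_negatives = sum(1 for suggested, good in history if not suggested and good)

print(f"bad images suggested (false positives): {false_positives}")      # → 1
print(f"good images not suggested (false negatives): {false_negatives}")  # → 2
```

Note that the false-negative count only exists because the trainer also labelled items the machine did *not* suggest; this is why the persona needs enough negative examples to stay honest.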

7.2 Explorability

Gathering learnings from the previous design exploration, concepts like anchors, boundary, and similarity were implemented in context. The following images are for illustrative purposes only.


7.3 Adaptability

In context: adapting to a digital persona and negotiating with utility. New data-control marketplaces can emerge as the platform evolves over time.

33 bits of entropy

There are only 6.6 billion people in the world, so you only need 33 bits of information about a person to determine who they are.
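A quick check of the arithmetic behind the 33-bit figure:

```python
from math import log2

world_population = 6.6e9  # the figure used above
bits_needed = log2(world_population)
print(f"{bits_needed:.1f} bits")  # ≈ 32.6, so 33 bits suffice
assert 2 ** 33 > world_population
```

Every data filter a user leaves open (postcode, device ID, taste cluster) spends a few of those bits, which is why the gradients of negotiation below matter.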

Hence, privacy buttons could be redesigned as gradients of negotiation. Instead of asking whether you want to share your data or not, we could ask how much a user wants to share, and in what context, i.e. contextual data control.


8. Soft machine - A pattern language

RQ 3: How might we create a pattern language for designers working with ML?

Patterns have long been used as actionable and tangible assets in architecture, computer science, design, and many other fields. In UX there are several pattern libraries, like the Apple Human Interface Guidelines and the Google Material Design Library, which offer extensive lists of good practices and look-and-feel for designing better UX. There is also a recent trend towards Human-AI guidelines (e.g. Microsoft, Google PAIR). However, no such guideline or library takes the various components of the algorithm into consideration. I conceive 16 fundamental patterns as building blocks for designing with machine learning systems.

1. Curation — Machine learning systems are inherently participatory; users often rely on their micro-community’s choices to curate the input to the algorithm. Even though curation is helpful, it may NOT always be preferable [33]. Hence, Persona or Range should be integrated alongside it.

2. Critique — Unlike the save and hide feedback on existing interfaces, users prefer more subtle, nuanced, granular, deliberate, and positive feedback and refinement tools for machine learning systems. This feedback operates on the weight of a group and on the concept hierarchy of suggestions.

3. Explainability — Users’ understanding of the reasoning behind a recommendation helps them calibrate their confidence in it. A tight coupling with critique is needed when refining outputs.


4. Anchors — Users like to be anchored to an initial data point that they can change later.

5. Boundary — Users should be able to perceive the boundary of the recommendation system.

6. Trace — People often navigate a recommendation system following a visual or semantic thread. ML systems currently do NOT allow users to leave a trace of this path, leaving them lost and frustrated: people cannot come back and revisit a recommendation. Unlike general history (browser history, past search history), a trace records not just the item but the corresponding recommendation.

7. Range — People constantly negotiate between relevance and diversity in their choices, which is a prime contributor to fairness in a marketplace.

8. Persona — Users believe in a digital persona that needs to change based on the context they are in and the kind of data they provide at that particular moment. In users’ perception, every recommendation system has a certain personality: e.g. YouTube is someone who keeps reminding you of things you like, while Spotify is someone who wants to keep you fresh (see Responsiveness).

9. Closeness — A machine learning algorithm sees similarity and closeness between things and people projected into a multidimensional embedding space (often calculated through cosine similarity). Even though the algorithmic system makes decisions solely on this algorithmic closeness, the notion is often ungraspable and different from the human perception of closeness.

10. Memory — A user should be able to control the input data recording mechanism contextually, e.g. pause, erase, shut down, accelerate, decelerate, etc.

11. Machine Confidence — The uncertainty of a predictive algorithm is often confused with intensity by users. An appropriate visualisation respecting the context is still a challenge in UX.


12. Responsiveness — Users perceive the ‘responsiveness’ of an algorithm towards input. There are two kinds. First, content responsiveness: a highly content-responsive algorithm is Spotify’s Daily Mix, which updates with fresh content, whereas a low content-responsive algorithm is YouTube, which anchors to past videos watched. Second, time responsiveness: highly time-responsive algorithms are smart feeds such as the Pinterest smart feed, whereas a slowly changing, less responsive algorithm is Spotify’s weekly mix.

13. Wholeness — People judge the system not on the quality of individual recommendations but on the provided list as a whole. However, a few salient recommendations push people towards extreme positive or negative feedback. Thus critique mechanisms should be built on both the individual and the group level.

14. Anticipatory feedback — ML algorithms anticipate feedback from users by learning from past behaviours (e.g. auto-completing the next word you are going to type, suggested images, songs, etc.). Some algorithms provide this anticipatory feedback more forcefully than others (see Force). Anticipatory feedback is fertile ground for societal norms and bias.

15. Force — People feel a strong sense of pulling (e.g. search), pushing (e.g. ads), pulling and pushing (recommendation systems), or flowing (browsing). This force can be moderated by the nature of the model.

16. Bias —While refinement tools helped users test for ML errors, a potential risk of refinement is confirmation bias.
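Several of these patterns have direct algorithmic counterparts. Range, for instance, maps onto diversity-aware re-ranking such as maximal marginal relevance (MMR), where a single parameter trades relevance against similarity to what has already been shown. A sketch over toy scores (items and numbers invented):

```python
# Relevance of each item to the user, and pairwise similarity between items.
relevance = {"a": 0.9, "b": 0.85, "c": 0.4}
similarity = {("a", "b"): 0.95, ("a", "c"): 0.1, ("b", "c"): 0.15}

def sim(x, y):
    return 1.0 if x == y else similarity.get((x, y), similarity.get((y, x), 0.0))

def mmr(k, lam):
    # lam=1 -> pure relevance (consistency); lower lam -> more diversity (serendipity).
    chosen = []
    while len(chosen) < k:
        rest = [i for i in relevance if i not in chosen]
        chosen.append(max(rest, key=lambda i: lam * relevance[i]
                          - (1 - lam) * max((sim(i, j) for j in chosen), default=0.0)))
    return chosen

print(mmr(2, lam=1.0))  # → ['a', 'b']  (most relevant, but near-duplicates)
print(mmr(2, lam=0.5))  # → ['a', 'c']  (second slot goes to the diverse item)
```

Exposing `lam` to the user is one concrete way a designer could turn the Range pattern into a tangible control.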


9. Design Proposition

As the design proposition of this thesis, I propose Seams - an in-app AI training platform for all your applications. The following are links to interactive content to engage with the concept.

INTERACTIVE MOCK UP


ONBOARDING 60 SEC THESIS


KEY FEATURES ALL TUNING TOOLS


SYSTEM ARCHITECTURE


BUSINESS MODEL

“An infrastructure, like railroad tracks or the Internet is not reinvented every time, but is ‘sunk into’ other socio-material structures and only accessible by membership in a specific community-of-practice”

- PELLE EHN

Many companies are starting to build internal ethics boards into their product teams. An emerging area called “algorithm auditing” [34] allows researchers, designers, and users new ways to influence algorithms from outside, testing them for problems and harms without the cooperation of online platform providers.

By opting in to an audit, many businesses believe they’re getting early insight into tools that will eventually be required by regulators.

Companies have long been required to issue audited financial statements for the benefit of financial markets and other stakeholders. That’s because, like algorithms, companies’ internal operations appear as “black boxes” to those on the outside. This gives managers an informational advantage over the investing public, which could be abused by unethical actors. Requiring managers to report periodically on their operations provides a check on that advantage.

To bolster the trustworthiness of these reports, independent auditors are hired to provide reasonable assurance that the reports coming from the “black box” are free of material misstatement. Should we not subject societally impactful “black box” algorithms to comparable scrutiny? [35]

Having a third-party seal of approval is good marketing, like the “organic” sticker on milk.

References
