
Big Data Models and Artificial Intelligence in COVID-19: A Systematic Literature Review


Academic year: 2022


Bachelor Degree Project

Big data models and artificial intelligence in COVID-19: a systematic literature review

- Big data-modeller och artificiell intelligens i COVID-19: en systematisk litteraturöversikt


Abstract

This study aims to identify how artificial intelligence and big data can help us confront COVID-19. It is a literature review based on reading and analyzing academic studies. The study is divided into two parts: a theoretical part that covers the basic concepts of artificial intelligence and big data, and an analytical part that reviews and analyzes artificial intelligence and big data applications used to confront COVID-19, including Contact Tracing Apps, identifies their weaknesses, and develops a recommendation list. The paper concludes that artificial intelligence and big data applications and apps could help to confront COVID-19 to some extent. However, these technologies are still at an early stage and have not yet had a significant impact on controlling COVID-19, because issues and challenges such as accuracy and public trust hamper their use. Overcoming these challenges requires hard work, first and foremost from governments. It is doubtful that these challenges will be addressed during the COVID-19 pandemic. Nevertheless, the pandemic is a great learning experience and an opportunity to develop our technologies to overcome future pandemics.

Keywords: COVID-19, Artificial intelligence, Big data, Contact tracing apps, COVID-19 applications


Preface

The research was challenging, but conducting a comprehensive investigation allowed me to achieve the identified objectives. Fortunately, Mr Daniel Toll was always available and willing to answer my questions. I would also like to thank my family and friends, who kept supporting me during these hard days.


Contents

1 Glossary

2 Introduction
  2.1 Background
  2.2 Related work
  2.3 Problem formulation
  2.4 Motivation
  2.5 Results
  2.6 Scope/Limitation
  2.7 Target group
  2.8 Outline

3 Method
  3.1 Research Project
  3.2 Research methods
  3.3 Reliability and Validity
  3.4 Ethical considerations

4 Theoretical Background
  4.1 AI
  4.2 Big Data
  4.3 The relation between Big Data and AI

5 Contact Tracing Apps

6 What are the most accurate applications of big data and AI algorithms used to confront COVID-19?
  6.1 Predicting the spread of COVID-19
  6.2 Monitor and follow up on cases of people infected with COVID-19
  6.3 COVID-19 diagnosis

7 What are the challenges/issues AI and big data technologies faced in confronting COVID-19?
  7.1 No standard data sets
  7.2 Data accuracy and recency
  7.3 The huge need for labeled data
  7.4 Data privacy, security, and public trust
  7.5 COVID-19 Tracking Apps Challenges

8 Results
  8.1 O1 AI and big data characteristics/capabilities.
  8.2 O2 Contact Tracing Apps.
  8.3 O3 AI and big data applications used to confront the COVID-19.
  8.4 RO4 Weaknesses/Challenges of AI and big data Apps and applications used to confront the COVID-19.

9 Discussion

10 Conclusions and Future Work

References

List of Figures

List of Tables


1 Glossary

This section aims to explain some terms used throughout this report.

• AI: The capacity of digital computers to execute particular tasks that simulate and resemble those humans do, such as thinking or learning from past experiences or other processes requiring mental operations [1].

• Big Data: A large data set that grows constantly over time, becoming so large and complex that traditional data management software cannot store, analyze, or handle it effectively [2].

• Labeled Data: Data items that have been tagged with one or more labels identifying particular features, characteristics, classifications, or contained objects [3].

• GT: Google Trends, a free service provided by Google that allows users to discover and follow trends in search keywords (the terms users type to find what they want on the Internet), including how interest relates to the season and the current period of the year.

• GFT: Google Flu Trends, a free service provided by Google that presents an assessment of disease activity for over 25 countries by aggregating Google search queries to produce predictions about disease activity.

• WSNs: Wireless sensor networks, sets of sensor nodes used to sense or track a particular chemical or physical phenomenon, such as temperature or humidity. The nodes then transmit the data about the phenomenon wirelessly (using microwaves, radio, etc.) to a data processing center, where it is analyzed and put to use without human intervention [4].

• PDA: A software service associated with a dedicated device, or a feature offered on a general-purpose device such as a smartphone. A PDA responds to voice commands to assist its owners and generally make their lives easier [5].

• CT: A medical imaging technique that relies on X-rays to create a three-dimensional (3D) image of the body’s internal organs.

• RSSI: The amount of power in a radio signal that has been received.

• TXpower: The maximum strength of the transmitted signal.

• App: An application that operates on a phone/tablet.

• Application: An application that is not tied to a specific operating system. It could be a Windows application, a Mac application, a phone application, etc.


2 Introduction

Since the emergence of the various applications of artificial intelligence (AI) and big data technologies, humankind has been searching for ways to benefit from them in achieving and sustaining prosperity, comfort, happiness, and well-being. The world has suffered, and still suffers, from pandemics to which humanity is exposed from time to time, which makes it necessary to use AI, data science, and big data applications. These technologies might play a significant role in alleviating patients’ suffering, strengthening the methods that predict the occurrence of pandemics, and strengthening the ways and means of dealing with them before, during, and after their occurrence [6].

In this thesis, we study the role of AI and big data in confronting COVID-19, the disease caused by SARS-CoV-2, by identifying their concepts and the applications developed to confront COVID-19, including Contact Tracing Apps. Moreover, we discuss the challenges associated with using these applications and apps in order to develop recommendation lists that governments, and secondarily researchers, should consider. This thesis aims to help researchers and governments combat similar pandemics in the future.

2.1 Background

COVID-19, which emerged in 2019, is highly contagious; it first appeared in Wuhan, China. It has spread widely and has affected more than 200 countries. As of June 7, 2021, there were 172,630,637 confirmed cases and 3,718,683 deaths [7]. Most countries initially responded with manual blood and saliva tests, in which doctors take a blood or saliva sample and analyze it using a microscope. However, the manual blood test has drawbacks: it is costly and inefficient, since it takes 5-6 hours to produce a result. Therefore, many countries have changed their approach and employed AI and big data to confront COVID-19 and produce results quickly.

Recent advances in big data, AI algorithms, communication technologies, and computational technologies can help us process the huge datasets derived from public health surveillance, governmental institutions, real-time monitoring of pandemic outbreaks, etc. These technologies support modeling efforts, predict the course of a pandemic, and inform preparation and response policies. However, there seems to be an unavoidable trade-off between public health and individual privacy. AI and big data can be a double-edged sword since they can locate anyone, anytime, anywhere. Critics of these technologies are concerned about governments violating individuals’ privacy and about hackers who might steal their data [8].


2.2 Related work

While systematic studies have been conducted for various domains (such as data-intensive applications [9]) in recent years, to the best of our knowledge, not much attention has been paid to systematically studying the applicability of big data and AI in the context of COVID-19.

In what follows in this section, we highlight several examples of related work.

Pham et al. [10] aimed to identify COVID-19 properties and outline AI and big data applications for confronting COVID-19, monitoring public opinion on the internet, and assessing pandemic prediction. They also discuss issues, challenges, and suggestions related to applying big data and AI to fight COVID-19. The study reached several results, for example that governments can use big data analysis to improve their epidemiological response mechanisms. It recommends using AI and big data to support outbreak prediction, diagnosis support, virus detection, and treatment suggestion.

Bragazzi et al. [11] aimed to review the potential applications of AI and big data that could be used to help manage the pandemic on a worldwide scale. The study highlights short-term, medium-term, and long-term applications of AI and big data to fight COVID-19 in these areas: assisting the implementation of public health participation, building intelligent and resilient cities, and identifying a potential pharmacological treatment. The study found that AI and big data seem to have enormous potential for controlling COVID-19. Moreover, AI and big data can be used to track the spread of COVID-19, help public health authorities plan interventions accordingly, and improve the response of societies to COVID-19.

In our project, we review contact tracing apps and COVID-19 applications that have a high accuracy rate. Moreover, we identify the weaknesses in these applications and apps based on their functionality (collecting and analyzing data). Finally, we provide a recommendation list that might be useful for stakeholders who want to use these apps or applications.

2.3 Problem formulation

The COVID-19 pandemic differs from previous pandemics in various aspects. Nowadays, the number of infected cases is reported daily and shared widely across countries. Combining this data with the data available on social distancing patterns in different countries could form a suitable big data model. Used together with AI algorithms, such a model could help establish more effective anti-virus medical policies by predicting the spread of a particular pandemic, diagnosing infections, and determining potential treatments. Over the last two years, many scientific publications have aimed to understand and fight COVID-19 using AI and big data models. It is therefore important to study the role of big data and AI in confronting COVID-19 further by analyzing the published studies in this regard. Thus, the study problem is represented by the following main objective: How were big data and AI used to confront COVID-19?

In order to achieve this main objective, the study seeks to achieve the objectives in Table 2.1.


Objectives
O1  Study characteristics/capabilities of AI and big data.
O2  Review Contact Tracing Apps developed by governments to confront the COVID-19.
O3  Identify the most accurate applications of big data models and AI algorithms used to confront the COVID-19 pandemic.
O4  Discuss the weaknesses/challenges of these Apps and applications.
O5  Develop recommendation lists.

Table 2.1: Research objectives.

2.4 Motivation

The importance of this study derives from the newness of the topic: COVID-19 is recent, and there has been little time to carry out deliberate academic studies on it. Moreover, the study deals with the role of big data and AI in confronting COVID-19, which can contribute to making appropriate decisions and to warning us in case a new virus emerges. It is therefore vital to study the role of AI and big data in confronting COVID-19 in greater depth. Finally, we believe this study is significant in providing an academic reference for researchers who wish to study such topics in the future.

2.5 Results

Regarding the first objective, we will highlight AI and big data concepts/capabilities, giving the reader a general overview to better understand this topic. The type of result will be a descriptive model since we will describe AI and big data. The type of validation will be evaluating and analyzing academic studies and articles that addressed AI and big data in COVID-19.

Regarding the second objective, we are going to explain how Contact Tracing Apps work. The type of result will be a descriptive model since we will explain how Contact Tracing Apps work. The type of validation will be evaluating and analyzing academic studies and articles that addressed Contact Tracing Apps in COVID-19.

Regarding the third objective, we will review the most accurate applications of big data models and AI algorithms used to confront COVID-19. The type of result will be a descriptive model since we will explain and review AI and big data applications used to confront COVID-19. The type of validation will be evaluating and analyzing academic studies and articles that addressed AI and big data applications in COVID-19.

Regarding the fourth objective, we will discuss the weaknesses/challenges of Contact Tracing Apps and the applications employed in COVID-19. The type of result will be


2.6 Scope/Limitation

Table 2.2 shows the scope of this literature review. Machine Learning (ML) and Deep Learning (DL) are not mentioned separately in the table, but both are included since they are subfields of AI.

2.7 Target group

We have discussed the target group in short before, in the introduction section. In this section, we will provide a more detailed explanation of the target groups for this project.

The project is of interest to three major groups: healthcare, governmental entities, and programmers. However, we believe governmental entities are the primary stakeholder.

Since the outcome of this project will not focus on medical solutions or coding, we believe it will not be very beneficial for healthcare professionals and programmers. However, it might be partly beneficial for programmers, since we will highlight the technical issues in these apps and applications.

2.8 Outline

In chapter 3, we will show and explain our method of searching and analyzing relevant literature.

In chapter 4, we are going to address the first objective by giving an overview of AI and big data. We will not explain complex aspects, so that the reader can easily understand the concepts.

In chapter 5, we are going to address the second objective. We will explain how Contact Tracing Apps work.

In chapter 6, we are going to address the third objective. We will highlight the most accurate big data and AI applications used to confront the COVID-19 pandemic by reviewing these applications and explaining how they can help confront COVID-19.

In chapter 7, we will address the fourth objective. We will analyze the weaknesses/challenges in these applications and apps based on their functionality (how they gather and analyze data) to find the weaknesses of each application. This objective is addressed in two parts: first, we state and explain the challenges/issues in chapter 7; then, in chapter 8, we match each application with the challenges/issues it faces.

In chapter 8, we are going to present all the results obtained through this research.

In chapter 9, we are going to address the fifth objective. We will discuss our findings and how they relate to what others have done in the field of study. Moreover, we will develop a recommendation list that might be beneficial for stakeholders interested in making use of these applications and apps.

In chapter 10, we are going to conclude our findings.


Inclusion and Exclusion criteria
IC01  Publications that address the use of AI algorithms and big data models together or separately to confront COVID-19.
IC02  Publications that address AI algorithms and big data models in the healthcare field.
IC03  Research published between January 1st, 2019, and January 1st, 2021, since COVID-19 emerged in 2019.
IC04  Research related to these study areas: spread prediction, infection tracing, and diagnosis of infected cases.
EC02  Publications not written in English.

Table 2.2: The scope of this literature review.

3 Method

The study relies on a Systematic Literature Review of research studies and articles published on the internet, carried out to spotlight and analyze the role of AI and big data in developing solutions that can contribute to confronting the COVID-19 pandemic.

We set up our plan for this project as follows:

Firstly, we will figure out whether a literature review is necessary. Secondly, we will specify the research objectives. Lastly, we will evaluate the protocol for conducting the literature review.

Several steps are going to be conducted in this research to achieve the defined objectives, as follows:

Firstly, we will review academic research to acquire the information needed to develop the solution by searching digital libraries using our keywords.

Secondly, we will rank the academic research we obtained by the number of times it has been cited, so that the most cited research is always at the top of our list.

Thirdly, we will read the title and the abstract of each item found. If, based on the title and the abstract, the research is related to our study, we will include it. Otherwise, we will ignore it.

Lastly, we will read the entire paper. We will ignore the paper if any of these conditions holds: (1) it is not related to our project; (2) it violates our inclusion criteria; (3) it falls within our exclusion criteria. We will include it only if it is related to our project, does not violate our inclusion criteria, and does not fall within our exclusion criteria. More details are given in the following subsections.


Based on the study’s objectives, we picked a primary collection of keywords to search for studies and articles, such as COVID-19, artificial intelligence, and big data. To improve the search results, we also took into account synonyms and abbreviations such as AI, ML, and DL. The search is carried out on digital libraries such as ScienceDirect, Google Scholar, and IEEEXplore. The search process produces a list of possibly relevant publications. We then narrow our selection manually by reading the title, abstract, and keywords, then the entire paper, and by applying the inclusion-exclusion criteria, which results in a list of related publications. The literature search and selection process is presented in Figure 3.1.

Figure 3.1: The process of searching and selecting the relevant literature
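The selection process described above can be sketched in code. The following is only an illustrative sketch, not the procedure actually used in this study: the `Publication` fields, the function names, and the use of a normalized title as the duplicate key are assumptions made for the example. The two relevance flags stand for the manual judgements made while reading, and only criteria IC03 (date window) and EC02 (language) are checked automatically.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Publication:
    # Hypothetical record of one search hit; the field names are assumptions.
    title: str
    citations: int
    published: date
    language: str
    relevant_by_abstract: bool   # manual judgement after reading title and abstract
    relevant_by_full_text: bool  # manual judgement after reading the entire paper


# Publication window taken from criterion IC03.
WINDOW_START = date(2019, 1, 1)
WINDOW_END = date(2021, 1, 1)


def meets_criteria(pub: Publication) -> bool:
    """Check the machine-checkable criteria: IC03 (date window) and EC02 (English only)."""
    in_window = WINDOW_START <= pub.published <= WINDOW_END
    return in_window and pub.language == "English"


def screen_library(results: list[Publication], limit: int = 1000) -> list[Publication]:
    """Screen one library's hits: cap the list, rank by citations, apply both reading passes."""
    considered = results[:limit]  # only the first `limit` hits are considered
    ranked = sorted(considered, key=lambda p: p.citations, reverse=True)
    after_abstract = [p for p in ranked if p.relevant_by_abstract]
    return [p for p in after_abstract if p.relevant_by_full_text and meets_criteria(p)]


def merge_without_duplicates(*selections: list[Publication]) -> list[Publication]:
    """Merge per-library selections, keeping the first copy of each title (assumed key)."""
    seen: set[str] = set()
    merged: list[Publication] = []
    for selection in selections:
        for pub in selection:
            key = pub.title.casefold().strip()
            if key not in seen:
                seen.add(key)
                merged.append(pub)
    return merged
```

In this sketch, `screen_library` would be called once per digital library and keyword group, and `merge_without_duplicates` would then combine the three per-library lists into the final selection.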

3.2 Research methods

To achieve the first objective, we first searched Google Scholar using our keywords "Artificial intelligence" OR "Big data" and found a total of 981 possibly related articles. We sorted the articles by citation count. Of these, only 124 were selected by reading the title and abstract. After reading the full articles and ensuring that each was related to our project, met our inclusion criteria, and did not fall under our exclusion criteria, 28 were selected as relevant articles.

IEEEXplore gave us 17,775 possibly related articles. However, many of the suggested articles were not related to our study, so we considered only the first 1,000 articles. We sorted the articles by citation count. Of these, only 184 were selected by reading the title and abstract. After reading the full articles and ensuring that each was related to our project, met our inclusion criteria, and did not fall under our exclusion criteria, 32 were selected as relevant articles.

ScienceDirect gave us 9,349 possibly related articles. However, many of the suggested articles were not related to our study, so we considered only the first 1,000 articles. We sorted the articles by citation count. Of these, only 204 were selected by reading the title and abstract. After reading the full articles and ensuring that each was related to our project, met our inclusion criteria, and did not fall under our exclusion criteria, 21 were selected as relevant articles.

So we have 81 articles in total, and after deleting duplicate articles, we were left with 14 articles in total for the keywords "Artificial intelligence" OR "Big data".

To achieve the second objective, we first searched Google Scholar using the keyword "Contact tracing apps" and found a total of 645 possibly related articles. We sorted the articles by citation count. Of these, only 136 were selected by reading the title and abstract. After reading the full articles and ensuring that each was related to our project, met our inclusion criteria, and did not fall under our exclusion criteria, 21 were selected as relevant articles.

IEEEXplore gave us 36 possibly related articles. We sorted the articles by citation count. Of these, only 18 were selected by reading the title and abstract. After reading the full articles and ensuring that each was related to our project, met our inclusion criteria, and did not fall under our exclusion criteria, 12 were selected as relevant articles.

ScienceDirect gave us 534 possibly related articles. We sorted the articles by citation count. Of these, only 94 were selected by reading the title and abstract. After reading the full articles and ensuring that each was related to our project, met our inclusion criteria, and did not fall under our exclusion criteria, 14 were selected as relevant articles.

So we have 47 articles in total, and after deleting duplicate articles, we were left with 7 articles in total.

To achieve the third objective, we first searched Google Scholar using the keywords "AI applications in COVID-19" OR "Big data applications in COVID-19" and found a total of 16,900 possibly related articles. However, many of the suggested articles were not related to our study, so we considered only the first 1,000 articles. We sorted the articles by citation count. Of these, only 224 were selected by reading the title and abstract. After reading the full articles and ensuring that each was related to our project, met our inclusion criteria, and did not fall under our exclusion criteria, 28 were selected as relevant articles.

IEEEXplore gave us 103 possibly related articles. We sorted the articles by citation count. Of these, only 56 were selected by reading the title and abstract. After reading the full articles and ensuring that each was related to our project, met our inclusion criteria, and did not fall under our exclusion criteria, 21 were selected as relevant articles.

ScienceDirect gave us 589 possibly related articles. We sorted the articles by citation count. Of these, only 62 were selected by reading the title and abstract. After reading the full articles and ensuring that each was related to our project, met our inclusion criteria, and did not fall under our exclusion criteria, 24 were selected as relevant articles.

So we have 73 articles in total, and after deleting duplicate articles, we were left with 16 articles in total.

Table 3.3 shows the number of possibly related articles we found on each digital library.

Table 3.4 shows the number of possibly related articles after reading the title and abstract.

Table 3.5 shows the number of the relevant articles after reading the entire paper and applying inclusion-exclusion criteria and the final total number of the relevant articles after deleting the duplicate ones.


Keywords | Google Scholar | IEEEXplore | ScienceDirect
"Artificial intelligence" OR "Big data" | 981 | 17,775 (first 1,000 considered) | 9,349 (first 1,000 considered)
"Contact Tracing Apps" | 645 | 36 | 534
"AI applications in COVID-19" OR "Big data applications in COVID-19" | 16,900 (first 1,000 considered) | 103 | 589

Table 3.3: The number of possibly related articles by searching on digital libraries using the identified Keywords.

Keywords | Google Scholar (considered, selected) | IEEEXplore (considered, selected) | ScienceDirect (considered, selected)
"Artificial intelligence" OR "Big data" | 981, 124 | 1,000, 184 | 1,000, 204
"Contact Tracing Apps" | 645, 136 | 36, 18 | 534, 94
"AI applications in COVID-19" OR "Big data applications in COVID-19" | 1,000, 224 | 103, 56 | 589, 62

Table 3.4: The number of possibly related articles after reading the title and abstract.


Keywords | Google Scholar (screened, relevant) | IEEEXplore (screened, relevant) | ScienceDirect (screened, relevant) | Total | Total (after deleting duplicates)
"Artificial intelligence" OR "Big data" | 124, 28 | 184, 32 | 204, 21 | 81 | 14
"Contact Tracing Apps" | 136, 21 | 18, 12 | 94, 14 | 47 | 7
"AI applications in COVID-19" OR "Big data applications in COVID-19" | 224, 28 | 56, 21 | 62, 24 | 73 | 16

Table 3.5: The final total number of the relevant articles after applying inclusion-exclusion criteria and after deleting the duplicate ones.
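As a quick arithmetic check, the totals reported for each keyword group can be recomputed from the per-library counts given in Section 3.2. The short group labels in the sketch below are shorthand introduced for the example; the numbers are the ones reported in the text.

```python
# Articles kept after full-text reading, per keyword group and library (Section 3.2).
selected = {
    "AI or big data": {"Google Scholar": 28, "IEEEXplore": 32, "ScienceDirect": 21},
    "Contact tracing apps": {"Google Scholar": 21, "IEEEXplore": 12, "ScienceDirect": 14},
    "AI or big data applications in COVID-19": {
        "Google Scholar": 28, "IEEEXplore": 21, "ScienceDirect": 24,
    },
}

# Articles left per keyword group after deleting duplicates (Section 3.2).
after_duplicates = {
    "AI or big data": 14,
    "Contact tracing apps": 7,
    "AI or big data applications in COVID-19": 16,
}

# Sum the three libraries for each group, then derive how many duplicates were dropped.
totals = {group: sum(per_library.values()) for group, per_library in selected.items()}
removed = {group: totals[group] - after_duplicates[group] for group in totals}

for group in totals:
    print(f"{group}: {totals[group]} selected, "
          f"{removed[group]} removed as duplicates, {after_duplicates[group]} kept")
```

The recomputed totals (81, 47, and 73) agree with the totals stated in the text.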


3.3 Reliability and Validity

In this study, we locate various academic articles and studies with proven validity. Then, we analyze and list them to extract information that is related to this project. For the Systematic Literature Review to be reliable, the chosen publications should not have received negative or controversial reviews or counter-publications. The applications that we review have not been widely used, nor have they been clinically tested. Moreover, they might become outdated and be replaced by new applications in the future. Therefore, this study is valid now but may be less valid in the future.

To sum up, we consider this project’s validity justifiable because the thesis work was conducted following a structured procedure and documented in the final report. Moreover, the work depends on the literature review, which correlates with the results, thereby increasing the validity.

3.4 Ethical considerations

This project has been conducted without group participation, such as surveys. Thus it does not expose any personal information or harm anyone’s privacy. The names mentioned in the report are researchers’ names. In case of any future arguments or complaints, this work is available for review at any time. Publications and information within this report are appropriately referenced when needed. Studies, articles, and applications that are discussed in the report are publicly available and referenced.


4 Theoretical Background

We are going to introduce some concepts and theories used throughout the document. However, we do not intend to give a technical demonstration of how AI and big data operate but rather to demystify for the reader what AI and big data mean in general.

4.1 AI

AI is a computerized system that presents behavior that requires intelligence. In other words, "AI is the branch of computer science that deals with the simulation of intelligent behavior in computers as regards their capacity to mimic and ideally improve human behavior. To achieve this, the simulation of human cognition and functions, including learning and problem-solving is required." [1].

Intelligence refers to the ability to plan, reason, learn, perceive information, and communicate in natural language. Since the development of digital computers in the 1940s, it has been shown that computers can be programmed to carry out exceedingly complicated tasks, for example, proving mathematical theorems or playing chess with exceptional skill. Moreover, some programs have reached the performance levels of human experts and professionals in specific tasks such as computer search engines, medical diagnostics, and voice recognition, making AI an effective investment in many areas. Figure 4.2 shows some AI capabilities that make it sought after in many fields.

Generally, psychologists do not characterize human intelligence by a single trait but by a combination of many diverse capabilities. Therefore, AI researchers mainly focus on the following components: self-learning, problem-solving, perception, and the use of various languages to communicate [13].

Figure 4.2: Some AI capabilities.


With AI, meaningful relationships can be identified in data to support treatment decisions, drug development, and patient care. Using AI, healthcare professionals can deal with complex and time-consuming issues that are difficult to solve on their own. For medical professionals, AI may be a significant resource, allowing them to better leverage their expertise. AI in medicine mostly involves the following concepts:

• Image Processing: A method for converting an image into a digital format and applying mathematical operations to it in order to enhance the image or extract valuable data. It is a type of signal processing in which the input is an image. Image processing is among the most rapidly growing technologies today, with applications in various fields [14].

• Computer Vision: A process that takes an image as input and produces a suitable output. It deals with the theory, design, and implementation of algorithms that automatically process visual data to recognize and track objects and perform spatial mapping. It is considered an interdisciplinary field, as it deals with how computers can obtain a high-level understanding of digital images or video [14].

• Artificial Neural Network (ANN): A simulation of how the human brain performs a specific task, using massively parallel distributed processing carried out by simple processing units built on a mathematical model and computational techniques. These units, called neurons or nodes, are simply computational elements with a neuron-like property. The network stores practical knowledge and empirical data so that the user can access it [14].

• Machine Learning: ML is a data-analytics technique that automates the building of analytical models. It is based on the notion that computers can learn from data, understand patterns, and make decisions with little or no human intervention. ML has enabled many everyday applications that would have been difficult to create without it, such as self-driving cars, effective internet search, direct speech recognition, and a better understanding of the human genome [14].

• Convolutional Neural Network (CNN): A type of ANN based on DL algorithms that uses numerous hidden layers to analyze data. A CNN has one or more hidden layers that can extract features and data from videos and images, followed by a fully connected layer that provides the needed output. Therefore, it is widely used in visual analysis and computer vision [14].

• Deep Learning: A subset of ML that is mostly focused on developing algorithms that allow computers to learn to accomplish complex tasks requiring a thorough comprehension of data (such as using medical imaging to diagnose illnesses). It depends primarily on ANNs. Figure 4.3 depicts the relation of DL to ML and AI. What distinguishes DL algorithms is their ability to learn and automate tasks without explicit programming, that is, without data scientists manually extracting features from the data [14].
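To make the ANN and CNN descriptions above more concrete, the sketch below shows, in plain Python, a single artificial neuron (a weighted sum passed through a sigmoid) and the sliding-window operation that a convolutional layer applies to an image. This is an illustrative toy under stated assumptions, not an implementation from the reviewed studies: the function names, the 4x4 image, and the hand-picked edge kernel are all made up for the example, and a real network would learn its weights and kernels from labeled data.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def convolve2d(image, kernel):
    """Slide the kernel over the image without padding, summing elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 4x4 grayscale image: dark on the left, bright on the right (hypothetical data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, edge_kernel)
print(feature_map)  # the middle column is non-zero, marking where the edge is
```

The feature map is largest exactly where the brightness changes, which is the sense in which a convolutional layer extracts features from an image before later layers combine them into an output.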

Figure 4.3: DL is a branch of ML, which is an area of AI.

4.2 Big Data

Big data refers to data sets so large and complex that traditional data processing software cannot handle them. However, these vast amounts of data can be used to solve business issues that could not be solved before. In other words, the term "big data" refers to a large volume of data with a high degree of complexity, which makes it difficult for conventional data-processing systems to process, analyze, and take advantage of. Such a large volume of data necessitates a system that can extract and analyze it [2].

Big data needs to be analyzed to obtain its value, by analyzing trends and patterns. Although the data grows rapidly in volume, it must be processed quickly, efficiently, and effectively; this is referred to as data processing velocity. Big data analytics therefore allows for more precise analysis, which leads to better decision-making and performance.

• Health development. Medical data includes large numbers of medical records, images, and models that help detect disease early and develop drugs [17].

• Predicting and responding to natural and human-induced disasters. Sensor data can be analyzed to predict where earthquakes or other disasters will strike next. Human behavior patterns then provide insights that aid organizations in providing relief to survivors, and big data technologies can be used to track and protect the flow of refugees fleeing war zones throughout the world [18].

Figure 4.4: Some big data characteristics.

4.3 The relation between Big Data and AI

The world was already overwhelmed with vast volumes of data whose benefits it had not yet realized. When the term big data appeared, organizations realized that their stored data represents massive wealth that can, if analyzed properly, be used to make more rational decisions for the industry to which the data belongs. Soon, information specialists realized that the human mind could not analyze this massive amount of data, which created an urgent need to develop AI algorithms to accomplish that task. Conversely, the statistical and algorithmic approach to data in AI remained limited until big data emerged, which accelerated the growth and development of AI. Thus a symbiotic relationship arose between big data and AI, such that neither can be studied in isolation from the other [19, 20]. In other words, AI works well with big data since they complement each other: the better the AI gets, the better the outcomes. Figure 4.5 gives a quick comparison of AI operating with and without big data.

AI with big data can be beneficial in several fields such as:

• Anomaly Detection. AI can analyze data to detect unusual events in it. For instance, given a network of sensors with a pre-defined valid domain, any reading outside this domain is an irregularity [21].

• Probability of Future Outcome. Given a recognised condition that has a definite probability of affecting a future outcome, AI can determine the likelihood of this outcome [21].

• Data charts and graphs. AI can analyze patterns in graphs and charts that might go unnoticed by humans [21].
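The anomaly-detection case in the list above can be illustrated with a minimal sketch. It assumes a hypothetical set of body-temperature sensors and a hypothetical valid domain of 35.0 to 38.0 degrees Celsius; the names `Reading` and `find_anomalies` are illustrative only and do not come from the cited work.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    value: float


def find_anomalies(readings, low, high):
    """Return the readings whose value lies outside the pre-defined domain [low, high]."""
    return [r for r in readings if not (low <= r.value <= high)]


# Hypothetical body-temperature readings in degrees Celsius.
readings = [
    Reading("s1", 36.6),
    Reading("s2", 39.4),
    Reading("s3", 37.0),
    Reading("s4", 34.1),
]

anomalies = find_anomalies(readings, low=35.0, high=38.0)
anomalous_ids = [r.sensor_id for r in anomalies]
print(anomalous_ids)
```

A fixed range is the simplest possible rule; a learned model would replace the two thresholds with a boundary estimated from historical data.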

Figure 4.5: AI before and with big data

5 Contact Tracing Apps

COVID-19 has caught the world off guard, causing quarantine and putting pressure on public health care organizations.

Since COVID-19 is extremely contagious and infected people mostly do not show symptoms at first, while some remain asymptomatic, a sizable portion of the population may be a hidden source of transmission. Thus, several governments expressed strong interest in contact tracing apps, which automate the laborious work of identifying all contacts of recently detected infected people. Contact Tracing Apps can be an excellent weapon to fight pandemics since nowadays around half of the world’s population has a smartphone with Bluetooth and GPS [22].

Contact tracing apps analyze users’ location paths using GPS, together with the anonymous ID tokens collected via Bluetooth from infected people, to recognize who has been near these people; this makes contact tracing more accurate and timely than the traditional manual approach. A person’s location path and list of nearby device IDs contain highly confidential, private information, such as where they work and live and which family members and friends they visit [23].

The contact-tracing process estimates those who get infected’s recent geographical

One Time Password (OTP). The server computes a TempID after verification, which is only valid for a limited time (around 15-20 minutes) [24]. Figure 5.7 shows the registration process.

Figure 5.6: Tracing Apps centralised architecture.

Figure 5.7: Registration process for Tracking Apps.

Both devices exchange an Encounter Message when a user contacts another app user over Bluetooth, as explained in Figure 5.8. Each device keeps track of the Received Signal Strength Indicator (RSSI) and the message delivery timestamp. "Since the TempIDs are generated and encrypted by the server they do not reveal any of the app user’s personal information. Thus, both app users have a symmetric record of the encounter that is stored on their respective phones’ local storage." [24].
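The exchange described above can be sketched as follows. This is a simplified illustration, not the protocol from [24]: instead of encrypting the user identity into the TempID, it uses an opaque random token with a server-side lookup table as a stand-in, and it assumes both phones observe the same RSSI. The class and function names (`Server`, `Encounter`, `exchange`) and the 15-minute lifetime constant are assumptions made for this example.

```python
import secrets
from dataclasses import dataclass

TEMP_ID_LIFETIME_S = 15 * 60  # the text gives a validity of roughly 15-20 minutes


class Server:
    """Hypothetical back end that maps opaque TempIDs to users."""

    def __init__(self):
        self._lookup = {}  # temp_id -> (user_id, issued_at, expires_at)

    def issue_temp_id(self, user_id, now):
        temp_id = secrets.token_hex(16)  # random, so it reveals nothing about the user
        self._lookup[temp_id] = (user_id, now, now + TEMP_ID_LIFETIME_S)
        return temp_id

    def resolve(self, temp_id, seen_at):
        """Return the user behind a TempID if it was valid at time seen_at, else None."""
        entry = self._lookup.get(temp_id)
        if entry is None:
            return None
        user_id, issued_at, expires_at = entry
        return user_id if issued_at <= seen_at <= expires_at else None


@dataclass(frozen=True)
class Encounter:
    peer_temp_id: str
    rssi_dbm: int
    timestamp: float


def exchange(temp_a, temp_b, rssi_dbm, timestamp):
    """Each phone logs the other's TempID, giving a symmetric local record."""
    return Encounter(temp_b, rssi_dbm, timestamp), Encounter(temp_a, rssi_dbm, timestamp)


server = Server()
t0 = 1_000_000.0  # arbitrary start time in seconds
alice_id = server.issue_temp_id("alice", now=t0)
bob_id = server.issue_temp_id("bob", now=t0)
log_alice, log_bob = exchange(alice_id, bob_id, rssi_dbm=-60, timestamp=t0 + 120)
```

Only the server can turn a logged TempID back into a user, and only for encounters that fall inside the token's validity window.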

Rather than being automatically transferred to the server, all encounter records are maintained locally. When a user tests positive, the health authority verifies that the user has installed the app and marks the user as infected. "If the user agrees to upload the data, the health official sets this up in the back-end server, and the server generates an OTP for verification. Once verified, the encounter data is uploaded to the server." [24]. Figure 5.9 shows the process when a user tests positive for COVID-19.

Figure 5.8: Tracing Apps contact exchange [24].

Figure 5.9: Tracing Apps notification.

6 What are the most accurate applications of big data and AI algorithms used to confront COVID-19?

Big data and AI can be of great importance in managing the crisis by supporting appropriate decisions at the right time. During COVID-19, if governments collect data from different sources and apply proper analytical methodologies, they can respond quickly with appropriate public health decisions and thus restore the normal (pre-COVID-19) situation quickly and effectively. Big data contributes to managing crises and reducing risks through appropriate decision-making when it is analyzed with the modern technical methods and tools brought about by the AI revolution. Many studies have discussed the use of big data applications to confront the COVID-19 pandemic, which has killed many people and threatened the world economy [25].

In the following subsections, we will review the studies that dealt with AI and big data applications to confront COVID-19.

6.1 Predicting the spread of COVID-19

Yang et al. collected data during the mass migration for the annual Spring Festival holidays on January 25, 2020. The data was fed into an infection model that uses AI algorithms trained on COVID-19 data to predict the curve of the COVID-19 pandemic. Yang et al. found that if the Chinese authorities had delayed their rigorous public health measures (such as quarantine) by five days, the extent of the pandemic would have increased three-fold [25].
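To illustrate why the timing of control measures matters so much, the following sketch runs a basic daily-step SEIR compartment model twice, once with an intervention on day 20 and once on day 25. It is not the model used by Yang et al.; the transmission rates, incubation and recovery periods, population size, and intervention days are all hypothetical values chosen for illustration, and the sketch makes no attempt to reproduce the three-fold figure.

```python
def simulate_seir(intervention_day, days=200, population=10_000_000,
                  beta_before=0.6, beta_after=0.1, sigma=1 / 5, gamma=1 / 7,
                  initial_exposed=100):
    """Daily-step SEIR model; returns cumulative infections (population minus susceptibles)."""
    s = float(population - initial_exposed)
    e, i, r = float(initial_exposed), 0.0, 0.0
    for day in range(days):
        # The transmission rate drops once control measures start.
        beta = beta_before if day < intervention_day else beta_after
        new_exposed = beta * s * i / population
        new_infectious = sigma * e
        new_recovered = gamma * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return population - s


on_time = simulate_seir(intervention_day=20)
delayed = simulate_seir(intervention_day=25)
print(f"on time: {on_time:.0f}, delayed by five days: {delayed:.0f}, "
      f"ratio: {delayed / on_time:.2f}")
```

Because the outbreak grows roughly exponentially before the intervention, a few extra days at the higher transmission rate multiply the final size rather than adding to it.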

The Center for Systems Science and Engineering at Johns Hopkins University established a public web-based interactive dashboard. Its purpose is to capture and track reported COVID-19 cases precisely and in real time. The dashboard is characterized by rapid data updates: it is updated twice daily, which gives AI a new basis for predicting and recommending quarantine in certain areas once a specific number of infected cases is reached there. Moreover, it can help to diagnose patients early if they report travel to these areas. This information is published in ArcGIS Living Atlas and Google Sheets [26, 27]. Figure 6.10 shows the interactive dashboard.

Figure 6.10: Johns Hopkins University’s interactive dashboard.

Before COVID-19, Google developed the Google Flu Trends (GFT) model, which employed big data technology to predict the extent of influenza activity based on web search queries from various areas. Google created the GFT risk prediction model using data from "a database containing 50 million of the most common web search queries on all influenza-related topics." [28]. According to Google, GFT was able to predict influenza-like illness around 8 days ahead of the Centers for Disease Control and Prevention reports on the outbreak. Google’s result revealed that big data analysis could increase the timeliness of public health surveillance [28].

Following the same approach, Strzelecki used Google Trends (GT) to obtain COVID-19-related data in South Korea, Iran, Italy, and China. Figure 6.11 depicts the COVID-19 reported cases and Google Trends data that Strzelecki obtained. This data was used to build a picture of the direction of COVID-19 and to estimate possible future outbreaks [29].
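One simple way to check whether search interest runs ahead of reported cases is a lagged correlation. The sketch below is an assumption about how such an analysis could look, not the method of [28] or [29]; both series are synthetic, with the case series constructed as the search series delayed by three days.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_lag(search, cases, max_lag):
    """Find the lag (in days) at which search interest best matches later case counts."""
    scores = {}
    for lag in range(max_lag + 1):
        n = len(search) - lag
        scores[lag] = pearson(search[:n], cases[lag:lag + n])
    return max(scores, key=scores.get), scores


# Synthetic daily search-interest values (hypothetical).
search = [1, 2, 4, 7, 12, 20, 30, 26, 18, 11, 6, 3, 2, 1]
# Synthetic case counts: the search curve scaled by 10 and delayed by 3 days.
cases = [0, 0, 0] + [10 * v for v in search[:-3]]

lag, scores = best_lag(search, cases, max_lag=5)
print(lag)
```

On real data the peak would be less sharp, and a high correlation at some lag would only suggest, not prove, that searches anticipate cases.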

One of the problems in tracking, controlling, and preventing the spread of COVID-19 is finding potential close contacts carrying the virus. One method for identifying contacts of infected people is to use a big data graph database to search for close contacts and work out the virus transmission path. Treating each person as a node, COVID-19 spreads from one node to the contacted node in a tree-like (dendritic) process. The graph database can store data such as geographic location and infection time to create a graph model that visualizes the spread route [30].
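The idea of treating people as nodes and contacts as edges can be sketched with a small in-memory graph. A plain Python dictionary stands in for the graph database here, and the person labels and contact links are invented for illustration; a breadth-first search then recovers the shortest contact chain between two people.

```python
from collections import deque

# Hypothetical contact graph: each person maps to the people they had contact with.
contacts = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P5", "P6"],
    "P4": [],
    "P5": ["P7"],
    "P6": [],
    "P7": [],
}


def transmission_path(graph, source, target):
    """Breadth-first search for the shortest contact chain from source to target."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in graph.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no chain of contacts links the two people


path = transmission_path(contacts, "P1", "P7")
print(path)
```

A real graph database would also attach attributes such as location and infection time to nodes and edges, so that a candidate path can be checked for temporal plausibility.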

6.2 Monitor and follow up on cases of people infected with COVID-19

Large quantities of data on people infected with the COVID-19 virus are collected and analyzed by AI systems. These systems can collect surveillance data on infected people using personal digital assistants (PDAs), tablets, phones, etc., and store the patient’s data in electronic health records. The data can thus be transferred quickly when needed and shared easily, reducing both the risk of infection and the burden imposed on medical staff to collect, store, and analyze it. Various methods from expert and intelligent systems have been used over wireless sensor networks (WSNs) for medical observation [31].

Patients and their families can be monitored remotely at home using their smart bracelets or phones, which send an automatic warning message in the event of any quarantine break. This decreases the strain on health care employees, so they can work more efficiently. Moreover, a high-speed medical surveillance system can be built on surveillance cameras and a motion-tracking algorithm for people infected with COVID-19, together with information on their vital signs. Typical information can also be extracted later to analyze the level of risk or damage caused by the virus [32].

Zhao et al. obtained a data set from the China National Health Commission. It contains information about 854,424 passengers who traveled from Wuhan Tianhe Airport to 55 airports across China from 12/20/2019 to 01/20/2020. Air passenger and local population variables were used as inputs to multiple linear models to explain the variation in confirmed cases. Moreover, Zhao et al. computed a Spearman correlation to analyze the relationship between Wuhan’s daily traffic and the total traffic (from 01/01/2020 to 01/26/2020). The study found that the local population size correlated positively with the number of confirmed infection cases [33].
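A Spearman correlation of the kind mentioned above is the Pearson correlation computed on ranks instead of raw values. The sketch below shows the calculation in plain Python; the two traffic series are made-up numbers and do not come from the data set used by Zhao et al.

```python
from statistics import mean


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = average_rank
        pos = end + 1
    return result


def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical daily traffic figures (arbitrary units).
daily_traffic = [1.2, 3.5, 8.1, 2.0, 11.0]
total_traffic = [14, 40, 75, 60, 310]

rho = spearman(daily_traffic, total_traffic)
print(round(rho, 3))
```

Because only the ordering of the values matters, the coefficient is robust to outliers such as the last, very large observation in this example.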

A set of data from China, Singapore, South Korea, and Italy was collected and used to establish a comprehensive analytical model that compares the data with exponential and macroscopic growth laws to track the spread of COVID-19. The macroscopic growth laws are applied to the number of infected cases, alongside modelling and ML techniques that can estimate the maximum number of infected cases in a particular region [34].
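Comparing case counts with an exponential growth law can be done with a least-squares fit on the logarithm of the counts. The sketch below is a generic illustration rather than the model from [34]; the case series is synthetic and constructed to double every four days.

```python
import math


def fit_exponential(cases):
    """Least-squares fit of ln(cases) = ln(c0) + r * t for t = 0, 1, 2, ...

    Returns (c0, r), where r is the daily growth rate.
    """
    ts = range(len(cases))
    logs = [math.log(c) for c in cases]
    t_mean = sum(ts) / len(cases)
    log_mean = sum(logs) / len(logs)
    slope = (sum((t - t_mean) * (l - log_mean) for t, l in zip(ts, logs))
             / sum((t - t_mean) ** 2 for t in ts))
    c0 = math.exp(log_mean - slope * t_mean)
    return c0, slope


# Synthetic cumulative cases that double every 4 days (hypothetical data).
cases = [100 * 2 ** (t / 4) for t in range(8)]

c0, growth_rate = fit_exponential(cases)
doubling_time = math.log(2) / growth_rate
print(f"initial cases: {c0:.1f}, doubling time: {doubling_time:.1f} days")
```

A systematic drift of the residuals away from the fitted line is a sign that the outbreak has left the exponential phase and a saturating growth law is needed instead.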

A model based on air temperature can be used to determine the association between the average air temperature in various regions and the number of infected cases, based on data obtained from 24 countries. Notari found that the virus tracking data showed a significantly lower growth rate of the spread in southern hemisphere countries than in northern hemisphere countries, owing to higher temperatures [35].

6.3 COVID-19 diagnosis

Zixin et al. have developed a modified stacked auto-encoder to model the dynamics of pandemic transmission. DL technologies were used to process CT images. Zixin et al. trained a model on data from 499 images and tested it on data from 131 images. The study reported an 84% likelihood of correctly diagnosing COVID-19 infection using AI and big data. This is a quick approach to identifying those infected with COVID-19, which provides a fast and reliable way to reach a decision on quarantine and medical treatment. As such, it can be used in monitoring outbreaks and improving public health strategies [36].

Sedky conducted an analytical study of the AI applications that contribute to confronting COVID-19 in various countries by making maximum use of the big data related to infected people. One such approach is a smartphone application that collects symptoms, previous locations of the infected, travel history, and the areas in which COVID-19 has spread, and then filters this data using certain algorithms so that only suspected cases are dealt with. Figure 6.12 summarizes AI approaches to addressing COVID-19. Figure 6.13 shows the collection of data on suspected cases through the smartphone application, along with new updates on the areas where the virus has spread. The application analyzes this data using AI techniques to find infected cases and inform them of the need to quarantine until they are tested [32].

Aslan et al. have proposed two DL architectures that automatically identify COVID-19 positive cases, using lung segmentation as a preprocessing step on the chest CT X-ray images that are provided as input to the architectures. An ANN was used to automate the process. Aslan et al. used two databases. The first is the COVID-19 Radiography Database, which was created by assembling samples from various sources. Figure 6.14 shows sample images from the COVID-19 Radiography Database. The second is an open-access database of chest X-ray images, collected by Qatar University, the University of Dhaka, and their collaborators from Malaysia and Pakistan. DL algorithms were used to interpret and analyze many chest computed tomography (CT) examinations. Before training, the raw chest CT X-ray images are segmented using ANN-based segmentation, and the lung region of each raw image is cropped according to the segmentation result to improve accuracy. The number of segmented images is then increased with data augmentation to provide data variety. 85% of these images are fed into the architectures as training input. The first architecture is a modified version of AlexNet applied to chest CT images. The second architecture is a hybrid that adds a BiLSTM layer, which takes temporal properties into account, to the first architecture. The study found a classification accuracy of 98.14% for the first architecture and 98.70% for the second, hybrid architecture. Thus the study shows outstanding success for DL algorithms and ANN-based segmentation in infection detection [37].
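The 85% training share and the reported accuracy figures rest on two simple operations: splitting the data set and counting correct predictions. The sketch below shows both in plain Python. It does not reproduce the pipeline from [37]; the helper names, the fixed random seed, and the toy labels are assumptions made for this example.

```python
import random


def split_dataset(items, train_fraction=0.85, seed=0):
    """Shuffle reproducibly and split into training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# 200 hypothetical image identifiers.
image_ids = list(range(200))
train_ids, test_ids = split_dataset(image_ids)

# Toy predictions against toy ground truth (three of four correct).
predicted = ["covid", "normal", "covid", "normal"]
actual = ["covid", "normal", "normal", "normal"]
score = accuracy(predicted, actual)
print(len(train_ids), len(test_ids), score)
```

Keeping the test subset out of training is what makes an accuracy figure meaningful; when augmentation is used, augmented copies of a test image should not leak into the training subset.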

Figure 6.12: Summary of AI approaches in fighting the COVID-19 [32].

Figure 6.13: Potential AI-based detection of suspected COVID-19 using a smartphone app [32].

Figure 6.14: Sample images of COVID-19 Radiography Database [37].

7 What are the challenges/issues AI and big data technologies faced in confronting COVID-19?

The COVID-19 crisis, with challenges that negatively affect various fields, continues to drive human efforts towards innovative thinking and to accelerate the use of the most advanced modern technology, especially since the stage of coexistence with the virus has begun without a specified end date. At the forefront of benefiting from modern technology comes the topic of AI and big data as the most influential factor in

the data from the available sources on the internet, like public health institutions, or from patients who provide the data they have, like the doctor’s report, X-ray report, etc., gathering all the data to form a private data set to test on AI algorithms and big data applications. Thus, in most cases, the accuracy ratio, sensitivity, or dependency between algorithms cannot be compared because each algorithm uses a different data set [10].

7.2 Data accuracy and recency

Data accuracy means how well the data reflects reality. Data might contain accuracy issues that are either hidden and difficult to detect or obvious to anyone trying to use it. Accuracy depends on several factors, like infected cases, population, environments, and living conditions. For example, "Initial estimates from China including a study of over 72,000 patients indicated an asymptomatic infection rate of about 1%, however growing evidence and increased testing indicates that a larger proportion of infected persons might remain asymptomatic and estimates of asymptomatic infection have been wide, ranging from 17.9% - 78%, depending on the context of the study." [38]. Therefore, this should be taken into account when a dataset is used in different regions and countries to estimate the pandemic’s spread [39].

Data recency is an essential factor in data quality. Data must be continuously updated to remain appropriate for its users, since it should reflect reality, especially when reality changes rapidly, as in this case. Timing is essential, as the number of people infected with COVID-19 changes daily and decisions are made based on it.

7.3 The huge need for labeled data

Since AI algorithms are systems that are trained rather than programmed, they often require enormous amounts of labeled data in order to perform complex tasks accurately. This may be difficult: some applications need thousands or even millions of records for their performance to reach the human level, and in some areas these records cannot be provided at all or are simply not readily available [3].

7.4 Data privacy, security, and public trust

In the COVID-19 crisis, it emerged that the rules for maintaining data privacy often stand as a barrier to researchers obtaining large and reliable datasets that include a complete real-time picture of population health, which might impair their ability to develop highly reliable algorithms. For example, in the U.S.A., data privacy meant there was no comprehensive picture of population data that includes all medical data available at all government levels, such as travel patterns and local demographics. When COVID-19 started, the need arose to develop a computer model containing all required data to help predict where the pandemic might appear [10].

7.5 COVID-19 Tracking Apps Challenges

Many countries have launched apps to track the spread of COVID-19 since the beginning of the pandemic, but results so far are still below expectations because global download rates for tracking apps have been low. In Germany, the download rate was about 21%, in Italy 14%, and in France 3%. The download rates are significant because they reflect people’s acceptance of such apps.

We downloaded some tracking apps to understand why the download rates are low, and we observed that these applications ask for several permissions, such as:

• Accessing the camera: This allows the app to take pictures and videos using the camera. This permission allows the application to use the camera at any time without confirming its use every time.

• Geolocation Access: This allows the app to obtain an accurate user’s location using GPS or network location sources such as cell towers and Wi-Fi. These location services must be turned on in order for the app to use them.

• Phone calls: This allows the app to call phone numbers without the user’s knowledge, leading to unexpected charges or calls.

• Accessing and controlling Bluetooth: This allows the app to discover and pair with remote devices.

• Read phone status and identity: This allows the app to access the phone’s features like phone number and device identifiers, etc.

• External SD Memory: This allows the app to write to the SD card and modify and delete data.

• Voice recording: This allows the app to record audio with the microphone. This permission allows the app to record audio at any time without the need for the user’s consent.

• Prevent the phone from sleeping: This allows the app to prevent the phone from sleeping, leading to the battery’s depletion.

Another challenge to consider when evaluating the effectiveness of these apps is that people classified as “at high risk” of infection, such as the elderly, those with pre-existing health problems, people with disabilities, and those living in minority or underserved communities, may not own a smartphone or may be unable to use the app, since most of these apps were developed without considering the possibility of their use by everyone. Thus, the data extracted from these apps may not include information on the most important demographic groups.

Moreover, many factors can influence the signal and cause data errors. The errors may be due to walls, human bodies, clothing pockets, or even approaching multiple phones

8 Results

8.1 O1 AI and big data characteristics/capabilities.

We reached several results by reviewing research and studies that dealt with AI and big data, as shown in Table 8.6.

O1 Results

• AI with Digital Image Processing, used to enhance images or extract valuable data, can be applied in the medical field to gamma-ray imaging, X-rays, CT scans, UV imaging, etc.

• AI with Computer Vision can be used in the medical field to extract data from images in order to diagnose patients, for example detecting a tumor, atherosclerosis, or other malignant changes, or measuring organ dimensions, blood flow, etc.

• AI with neural networks (ANNs), which process images in a way that simulates how the human brain processes them, can be used in the medical field to diagnose patients.

• ML can be used in the medical field to help doctors by suggesting treatment ideas, predicting results, and optimizing clinical processes.

• DL with CNN can be used in the medical field, for example in the pharmaceutical industry and in medical image analysis software.

• DL can be used in the medical field to diagnose patients, identify individual variability in drug response, support clinical decisions, and make recommendations about the most appropriate drugs for each person.

• AI can be a helpful source for medical personnel, allowing them to deliver value across the health ecosystem and use their expertise better.

• Big data analytics allows for more accurate analysis, which leads to better performance and decision-making.

• Big data can help detect diseases early and develop new drugs.

• Big data is a valuable resource if and only if analyzed properly using AI algorithms.

• Big data is here to stay, and it will get bigger and bigger. Therefore, the future trend will be an increasing demand for AI.

• Big data and AI complement each other. Artificial intelligence becomes better the more data is provided.

Table 8.6: The found results regarding O1.

8.2 O2 Contact Tracing Apps.

We reached several results by reviewing research and studies that dealt with Contact Tracing Apps, as shown in Table 8.7.

O2 Results

• Contact Tracing Apps analyze the users’ location paths using GPS and the anonymous ID tokens collected from people infected with COVID-19 using Bluetooth to recognize who has been near these people.

• Contact Tracing Apps can be an excellent weapon to fight pandemics since nowadays, around half of the world’s population owns a smartphone that has Bluetooth and GPS.

• The server of Contact Tracing Apps creates a securely encrypted Temporary ID so, when devices come into proximity, they exchange TempIDs via Bluetooth encounter messages.

• Contact tracing technology can help us identify potential new cases and notify people who crossed paths with them, limiting the spread rate of the virus among the population.

Table 8.7: The found results regarding O2.

8.3 O3 AI and big data applications used to confront the COVID-19.

A review of the research and studies that dealt with the COVID-19 pandemic shows that big data and AI have an important role in combating the spread of the virus and mitigating its effects. Even though these applications have not been widely used, nor have they been clinically tested, they have provided urgent insights and clinically meaningful information for policymakers and medical personnel. Table 8.8 shows the results we found regarding O3.

References
