
From Numerical Sensor Data to Semantic Representations:

A Data-driven Approach for Generating Linguistic Descriptions

by

Hadi Banaee

Academic dissertation

Dissertation for the degree of Doctor of Philosophy in Computer Science, to be publicly defended

on Friday, 20 April 2018 at 13:15, Hörsal T, Örebro University, Örebro

Opponent: Prof. Antonio Chella, University of Palermo, Italy

Örebro University

School of Science and Technology, 701 82 Örebro


Abstract

Hadi Banaee (2018): From Numerical Sensor Data to Semantic Representations: A Data-driven Approach for Generating Linguistic Descriptions.

Örebro Studies in Technology 78.

In our daily lives, sensor recordings are becoming more and more ubiquitous. With the increased availability of data comes an increased need for systems that can represent the data in human-interpretable concepts. In order to describe unknown observations in natural language, an artificial intelligence system must deal with several issues involving perception, concept formation, and linguistic description. These issues cover various subfields within artificial intelligence, such as machine learning, cognitive science, and natural language generation.

The aim of this thesis is to address the problem of semantically modelling and describing numerical observations from sensor data. This thesis introduces data-driven approaches to perform the tasks of mining numerical data and creating semantic representations of the derived information in order to describe unseen but interesting observations in natural language.

The research considers creating a semantic representation using the theory of conceptual spaces. In particular, the central contribution of this thesis is to present a data-driven approach that automatically constructs conceptual spaces from labelled numerical data sets. This constructed conceptual space then utilises semantic inference techniques to derive linguistic interpretations for novel unknown observations. Another contribution of this thesis is to explore an instantiation of the proposed approach in a real-world application. Specifically, this research investigates a case study where the proposed approach is used to describe unknown time series patterns that emerge from physiological sensor data. This instantiation first presents automatic data analysis methods to extract time series patterns and temporal rules from multiple channels of physiological sensor data, and then applies various linguistic description approaches (including the proposed semantic representation based on conceptual spaces) to generate human-readable natural language descriptions for such time series patterns and temporal rules.
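As a rough illustration of the idea of building a conceptual space from labelled numerical data and using it to describe a new observation, consider the toy sketch below. It is not the construction method of the thesis: it assumes the common simplification in which each concept is represented by a prototype (here the per-dimension mean of its labelled examples) and a novel observation is assigned to the nearest prototype. The function names, the two quality dimensions, the labels, the data values, and the sentence template are all hypothetical.

```python
from math import dist
from statistics import mean


def build_prototypes(samples, labels):
    """Return one prototype (per-dimension mean) for each label.

    Assumption: a concept is summarised by the centroid of its examples.
    """
    grouped = {}
    for point, label in zip(samples, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(mean(values) for values in zip(*points))
        for label, points in grouped.items()
    }


def describe(observation, prototypes, dimension_names):
    """Name the nearest concept and the dimension that deviates most from it."""
    nearest = min(prototypes, key=lambda label: dist(observation, prototypes[label]))
    diffs = [o - p for o, p in zip(observation, prototypes[nearest])]
    idx = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
    direction = "higher" if diffs[idx] > 0 else "lower"
    return (
        f"The observation resembles '{nearest}', "
        f"with {direction} {dimension_names[idx]} than is typical."
    )


# Hypothetical toy data: (heart rate in bpm, respiration rate in breaths/min).
samples = [(60.0, 12.0), (64.0, 14.0), (120.0, 28.0), (130.0, 30.0)]
labels = ["resting", "resting", "exercising", "exercising"]

prototypes = build_prototypes(samples, labels)
print(describe((70.0, 13.0), prototypes, ["heart rate", "respiration rate"]))
```

The sketch uses unscaled Euclidean distance, so dimensions with larger numeric ranges dominate the comparison. A more careful version would normalise or weight the dimensions.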

The main outcome of this thesis is the use of data-driven strategies that enable the system to reveal and explain aspects of sensor data which may otherwise be difficult to capture by knowledge-driven techniques alone. Briefly put, the thesis aims to automate the process whereby unknown observations of data can be 1) numerically analysed, 2) semantically represented, and eventually 3) linguistically described.

Keywords: Semantic representations, Conceptual spaces, Natural language generation, Temporal rule mining, Physiological sensors, Health monitoring system.

Hadi Banaee, School of Science and Technology

Örebro University, SE-701 82 Örebro, Sweden, hadi.banaee@oru.se

