Technologies of the Internet of Things (IoT) [1], together with the corresponding edge [2] and fog [3] computing techniques, enable rapid development of so-called cyber-physical systems, which fuse the real and the virtual worlds and assume that computation power is incorporated and distributed within real-world everyday objects [4]. The key paradigm of human-machine interaction within cyber-physical systems is the tangible user interface (TUI), proposed by Hiroshi Ishii and Brygg Ullmer in 1997 [5]. TUI implies interaction with virtual objects through their physical “avatars”: real-world objects that, unlike universal controls such as a mouse or keyboard, are close in shape, meaning and distinctive features to their virtual prototypes. Moreover, these physical objects often serve other functions besides steering the virtual objects: they can be real-world tools, pieces of furniture, etc.
The nature of TUI implies multimodal interaction with the human that goes far beyond traditional button pushing. First of all, the haptic channel is employed in addition to the traditional audio and visual channels. This is especially important for visually impaired people, providing them with a wide range of possibilities to communicate with the virtual environment. Depending on the task being solved, other modalities can be combined with haptics, for example, spatial gestures.
In our opinion, the multimodal nature of cyber-physical systems opens up wide opportunities for humans in the field of complex data analytics. A TUI for visual analytics systems enables the expert to efficiently utilize his or her perceptive and cognitive mechanisms to understand the features of the analyzed objects and thereby speed up the process and increase the quality of analysis.
Cyber-physical interaction of the human with the visual analytics system can be denoted as a “perceptive-cognitive interface” (PCI). A PCI is an ergonomic multimodal interface customized for the particular analytics task, involving the human sensory-motor sphere and contributing to the speed and quality of analysis.
The aim of this work is to develop the concept of a perceptive-cognitive interface for multimodal analytics tasks, as well as to describe the implementation of the corresponding software and hardware that confirms in practice the viability of the proposed concept.
The creation of perceptive-cognitive interfaces is based on the characteristics of the human psyche, which is capable of processing information transmitted by the organs of sensory perception and of harmonizing these heterogeneous multimodal signals to form a holistic picture of the world. Moreover, most of the information that is understood and consciously processed is transmitted in the process of verbal communication, which is mainly carried out using natural language through the audio or visual channels. However, the ability of the psyche to structure information obtained through different channels of perception means that the other modes of sensory perception available to humans also have great potential for understanding and cognition. Language semantics is closely related to the human sensorimotor sphere. This is evidenced by research in neuroscience and cognitive psychology (the so-called Embodiment Theories, which consider human consciousness in relation to physical environmental factors) [6–8]. Researchers have identified links between pronouncing words with food semantics and salivation [9], dilation or contraction of the pupil in response to words conveying darkness and brightness [10], motor activity in response to words denoting actions [11], etc. In addition, it was found that words that activate sensorimotor experience consistent with their semantics facilitate the understanding of messages, whereas words whose semantics contrast with (are uncoordinated with) the sensorimotor experience hinder the understanding of verbal information.
Words (considered in context) that have several sensorimotor codes are more easily recognized [12]. This finds practical application in the learning process [13] and in search technologies for multimodal (visual and audio) content [14]. The connection between words with abstract semantics and the sensorimotor sphere is described differently within Embodiment Theories: either through the theory of conceptual metaphor introduced by G. Lakoff [8], or through the idea of a closer connection between abstract words and the emotional sphere [15].
The idea of perceptive-cognitive interfaces is based on Embodiment Theories and on the technological possibility of using multimodal channels for the transmission of natural language information. From a technological point of view, this possibility is provided by a system of sensors used to activate text fragments with a certain sensory (sensorimotor) semantics. For this activation, the semantics of the sensors must be connected with the semantics of the language content through a common formal model. Such a model can be an ontology that links the possible types of sensors (temperature, sound, light, movement in space, i.e. motility, and others) with the sensor values (color: white, black, etc.; movement in space: up, down, right, left, etc.) and the names of semantic fields consisting of the meanings of words and phrases with perceptual semantics. Here, a semantic field is understood as “the totality of linguistic units combined by a common content and reflecting the conceptual, substantive or functional similarity of the designated phenomena” [16].
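As an illustration of such a formal model, the following minimal Python sketch (with purely illustrative entity names that do not reproduce the actual ontology) links sensor types, sensor values and semantic fields as subject-predicate-object triples and queries them:

# A minimal sketch: illustrative names only, not the actual SciVi/Semograph model.
ontology = [
    # sensor types and their possible values
    ("MotionSensor", "hasValue", "Up"),
    ("MotionSensor", "hasValue", "Down"),
    ("LightSensor",  "hasValue", "White"),
    ("LightSensor",  "hasValue", "Black"),
    # sensor values linked to semantic fields of words with perceptual semantics
    ("Up",    "activatesField", "Moving Up"),
    ("Down",  "activatesField", "Moving Down"),
    ("White", "activatesField", "Brightness"),
    ("Black", "activatesField", "Darkness"),
    # semantic fields and sample lexical units
    ("Moving Up", "containsUnit", "raise"),
    ("Moving Up", "containsUnit", "climb"),
]

def fields_for_sensor_value(value):
    """Return the semantic fields activated by a given sensor reading."""
    return [o for (s, p, o) in ontology if s == value and p == "activatesField"]

print(fields_for_sensor_value("Up"))  # ['Moving Up']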
The perceptive-cognitive interface in multimodal analytics systems allows the expert to use his or her perceptive-cognitive experience to find pertinent information. The search strategy is based on a general hypothesis; in the process of proving it, numerous operational hypotheses are used (for example, “Are there any messages in the analyzed data in which <condition> holds?”). Operational hypotheses arise from the knowledge gained about the relationships of individual modal values with each other and their regular presence in texts, i.e. their semantic standardization. In this context, the expert's perceptive-cognitive experience and operational hypotheses become part of the human-machine interface.
Fig. 1 shows a fragment of the PCI ontological model for spatial movement sensors (for the sake of simplicity, only four types of movement are presented: up, down, right and left). This ontology is used for the ontology-driven solution of the problem of determining the personality type of a native speaker depending on the use of words with certain perceptual semantics, within the framework of the state research project of Perm State University for 2017–2019, project No. 34.1505.2017/4.6 “Verbal and nonverbal behavior of a social network user: socio-cognitive modeling using machine learning methods and geoinformation technologies”.
Fig. 1. A fragment of the perceptive-cognitive interface ontology.
In the above fragment, the categorical semantics (Category) of the verbal messages is represented by the “Spatial” and “Temporal” sub-categories. Spatiality, in turn, is represented by the “Direction” and “Speed” of movement. The direction of movement is realized by means of antonymic pairs: “Moving Up” and “Moving Down”, “Moving Right” and “Moving Left”. Note that in this case we take into account only the semantics of messages (for example, “he raised his hands above his head”, “look in the lower left corner of the screen”, etc.). The semantics realized in messages is directly related to the detectable gestures “Up”, “Down”, “Left” and “Right”, which, in turn, can be used to control the process of visual analysis of the messages' semantics. For example, when the PCI gesture “Up” is used, the content with the semantics of moving up (“Moving Up”) should be filtered.
The instance “Text 1” is, on the one hand, a representative of the message class (“Text”) and, on the other hand, belongs to a specific author (“User 1”). “User 1”, in turn, is an informant (“Informant”) who has texts created by him or her and a set of parameters (“Parameter”), for example, the psychological parameters of the five-factor personality questionnaire BFI (Big Five Inventory) and their values, such as “bfin”, the severity of neuroticism: character traits predisposing a person to experiencing negative emotions [17].
Thus, the ontological model allows a gesture to be used to filter content in accordance with the semantics of this gesture, and then to analyze the statistics of the personal characteristics of the authors of the texts in which this semantics is expressed. For example, if we want to know what percentage of informants with a high level of neuroticism refer in their statements to the semantics of moving up, it is enough to make the “Up” hand gesture within the PCI.
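A hedged sketch of this kind of gesture-driven query is shown below; the records, field names and the neuroticism threshold are illustrative assumptions rather than values from the actual data set.

# Illustrative toy data: informants with a BFI neuroticism score ("bfin")
# and the semantic fields expressed in their texts.
informants = [
    {"name": "User 1", "bfin": 4.2, "fields": {"Moving Up", "Moving Right"}},
    {"name": "User 2", "bfin": 2.1, "fields": {"Moving Down"}},
    {"name": "User 3", "bfin": 3.9, "fields": {"Moving Down", "Moving Left"}},
]

GESTURE_TO_FIELD = {"Up": "Moving Up", "Down": "Moving Down",
                    "Left": "Moving Left", "Right": "Moving Right"}
HIGH_NEUROTICISM = 3.5  # illustrative threshold, not a value from the study

def share_referring_to(gesture):
    """Among informants with high neuroticism, the share whose texts
    express the semantics associated with the given gesture."""
    field = GESTURE_TO_FIELD[gesture]
    high = [i for i in informants if i["bfin"] >= HIGH_NEUROTICISM]
    if not high:
        return 0.0
    return sum(1 for i in high if field in i["fields"]) / len(high)

print(share_referring_to("Up"))  # 0.5 for this toy data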
The presented ontological model is implemented in the Semograph information system. This system is intended to automate text data analysis, corpora creation, the conduct and interpretation of psycholinguistic, sociolinguistic and similar experiments, the creation of classifiers and thesauri of subject areas, model construction, and other tasks that arise during the analysis of text content [18].
To solve the problem of connecting gestures with message semantics, we created in Semograph a hierarchical classifier whose cells contain semantic fields; the markup procedure is designated as field analysis (see Fig. 2). The markup of the messages with spatial semantics was carried out by two experts. During the classification process the experts developed a concerted position on controversial issues.
Fig. 2. Screenshot of the field analysis window used to classify user comments.
As can be seen in Fig. 2, the field analysis window consists of three areas: “Fields”, “Terms” and “Contexts”. The left column, “Fields”, shows the semantic fields. “Terms” (comments of the social network users) are filtered by the UP field (only the comments included in this field are displayed). In the “Contexts” column we can see the same comments with their additional parameters, including links to the contexts in which they occur. The “Terms” column also reflects the frequency of a comment in the entire reaction corpus (column “C”) and the number of occurrences of this unit in semantic fields (column “F”).
An important feature of the presented model is the hierarchical organization of semantic fields and their potential extensibility. For example, each field with spatial semantics can be divided into three subfields: the first subfield consists of lexical units with an explicit expression of sensory semantics, the second subfield is composed of lexical units with an implicit expression of sensory semantics, and the third one consists of lexical units associated with this semantics indirectly, for example, through metaphorical or metonymic transfers, etymology, etc. This approach allows us, on the one hand, to significantly expand the repertoire of lexical units covered by the presented model and, on the other hand, to choose the level of semantic complexity appropriate for the expert.
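A minimal sketch of such a field hierarchy, with purely illustrative lexical units, is given below; it also shows how the level of semantic complexity could be chosen.

# Illustrative example of the three-level organization of a spatial field.
semantic_fields = {
    "Moving Up": {
        "explicit": ["rise", "lift", "climb"],
        "implicit": ["tower", "soar"],
        "indirect": ["career growth", "high spirits"],  # metaphorical transfers
    },
}

def units(field, max_level="indirect"):
    """Collect lexical units up to the requested level of semantic complexity."""
    levels = ["explicit", "implicit", "indirect"]
    selected = levels[: levels.index(max_level) + 1]
    return [u for lvl in selected for u in semantic_fields[field][lvl]]

print(units("Moving Up", "implicit"))  # explicit and implicit units only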
The hardware part of the PCI in this case is a glove-style device that detects spatial gestures using the MPU6050 inertial measurement unit (angular position detector) and the programmable ESP8266 microcontroller. The software part of the PCI is the SciVi visual analytics platform [19], which interprets the gestures and treats them as semantic filters for the data visualized as a graph. This graph demonstrates the relationships between the verbal behavior of people and their psychological characteristics revealed using machine learning methods [20–22].
The PCI differs from traditional human-machine interfaces (like mouse or keyboard) by its specialization: its hardware part is customized to suit the particular visual analytics task. In this sense it is close to the GUI, whose appearance may vary from task to task. However, the PCI involves changes in both the virtual and the physical parts of the interface. This, in turn, requires high-level tools and high-level hardware and software building blocks to enable an expert to assemble a custom PCI without deep skills in programming or electronics. The SciVi platform provides corresponding mechanisms to automate customization of the software part of the PCI.
We propose the following life cycle of a PCI within SciVi:
1. Designing the PCI.
2. Assembling the hardware part of the PCI.
3. Writing and installing the software part of the PCI (firmware for the device and a driver for the computer).
4. Calibrating the device's sensors.
5. Testing and debugging the communication between the PCI and the application it is supposed to steer.
6. Solving analytics tasks with the PCI.
Traditionally, each stage of this life cycle is supported by different, often unrelated tools. In this work we propose to utilize the SciVi visual analytics platform [19] as a unified approach to PCI creation and usage. SciVi includes high-level, flexible adaptation and customization mechanisms governed by ontologies. It supports communication with different data sources to obtain data for analytics, provides mechanisms to declare preprocessing (filtering) and visualization algorithms, and supports device firmware generation. Thanks to this, SciVi can help to fulfill stages 3–6 in a uniform way, making PCI creation accessible to experts without advanced software engineering skills.
The designing and assembling stages cannot yet be automated, but the electronic components ontology of SciVi [19] has, among others, a recommendation function, as it contains descriptions of various microcontrollers, sensors, actuators and switching devices, as well as the methods of their interaction. It can be treated as a guideline showing which components fit together and how they should be interconnected.
The firmware generation mechanism included in SciVi operates on the basis of the electronic components ontology mentioned above [23]. The logic of the PCI, as well as the visualization and analysis algorithms, is declared using a data flow diagram (DFD) composed by the user in a built-in high-level editor. Altogether, this allows almost complete automation of the development of the PCI software part.
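To make the declarative nature of such a DFD concrete, the sketch below represents a firmware diagram for a glove-like device as plain data in Python; the node names, the pin number and the structure itself are illustrative assumptions and do not reproduce the actual SciVi serialization format.

# A hedged sketch of a firmware DFD declared as data: nodes and directed links.
dfd = {
    "nodes": {
        "imu":        {"op": "MPU6050"},
        "button":     {"op": "DigitalPin", "pin": 4},  # pin number is an assumption
        "mahony":     {"op": "MahonyFilter"},
        "debounce":   {"op": "Debounce", "ms": 50},
        "serializer": {"op": "JSON"},
        "transport":  {"op": "WebSocket"},
    },
    "links": [
        ("imu", "mahony"),
        ("button", "debounce"),
        ("mahony", "serializer"),
        ("debounce", "serializer"),
        ("serializer", "transport"),
    ],
}

# Print the data paths encoded in the diagram.
for src, dst in dfd["links"]:
    print(f'{dfd["nodes"][src]["op"]} -> {dfd["nodes"][dst]["op"]}')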
Fig. 3 presents the basic concept of using SciVi as a platform for PCI-powered visual analytics.
Fig. 3. PCI in SciVi: blue arrows denote the
main direction of both data and control flows.
Fig. 4 demonstrates the DFD describing the firmware for the ESP8266 microcontroller used in the glove PCI assembled in the course of the current work.
Fig. 4. Data flow diagram describing the device firmware.
The data sources in this case are the MPU6050 inertial measurement unit (IMU), supplying the acceleration and angular velocity of the glove, and a digital pin connected to a button pushed when the index finger is bent. The IMU data are passed to the Mahony filter [24], which transforms them into a quaternion describing the glove's orientation. The boolean signal from the digital pin is debounced to reduce the random noise that inevitably appears when the button is pushed. Then both the orientation quaternion and the finger bending flag are serialized in JSON format, combined into a single message, and transmitted via WebSocket over WiFi.
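For clarity, the following minimal, device-independent Python sketch shows the kind of message composed at the end of this pipeline: an orientation quaternion plus the finger-bending flag serialized to JSON; the field names are an assumption, not the actual SciVi message format.

import json

def compose_message(quaternion, finger_bent):
    """quaternion: (w, x, y, z) produced by the orientation filter;
    finger_bent: debounced boolean read from the digital pin."""
    w, x, y, z = quaternion
    return json.dumps({
        "orientation": {"w": w, "x": x, "y": y, "z": z},
        "fingerBent": bool(finger_bent),
    })

# Example: identity orientation, finger not bent.
print(compose_message((1.0, 0.0, 0.0, 0.0), False))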
Based on the signal message format (defined by the serialization and transmission nodes), an ontological profile of the device is automatically created. This profile describes the output data of the device, which should be used as control signals during the visual analytics process. The ontological profile makes it possible to generate the node that represents the device as a control signal source, as schematically shown in Fig. 5.
Fig. 5. Generation of the device
ontological profile.
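To make the idea concrete, the following sketch shows what such an automatically generated profile could look like; the field names and structure are illustrative assumptions rather than the actual SciVi ontological profile schema.

# A hedged sketch of a device profile describing only the output signals.
glove_profile = {
    "device": "Glove PCI",
    "transport": {"protocol": "WebSocket"},
    "outputs": [
        {"name": "orientation", "type": "quaternion"},
        {"name": "fingerBent",  "type": "bool"},
    ],
}

# Such a profile is enough to generate a DFD node exposing one output
# socket per entry in "outputs".
for signal in glove_profile["outputs"]:
    print(f'socket {signal["name"]}: {signal["type"]}')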
Calibration of the PCI sensors can be simplified by visually monitoring the control signals obtained from the device. The SciVi DFD editor provides a fast and flexible way of choosing the most observable and informative form of data display, and thereby helps to efficiently estimate sensor measurement errors [25]. The graphical user interface generator within SciVi allows feedback widgets to be incorporated into the visualization view. These widgets can be used to set calibration parameters and transfer them back to the device. For example, threshold values can be tuned at runtime to compensate for sensor errors and noise [25].
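The sketch below illustrates this kind of runtime tuning with a simple dead-zone filter whose threshold can be updated from a feedback widget; the filtering rule and the values are illustrative assumptions.

class DeadZoneFilter:
    """Suppresses readings whose magnitude falls below a tunable threshold."""
    def __init__(self, threshold=0.05):
        self.threshold = threshold  # can be changed from a feedback widget

    def set_threshold(self, value):
        self.threshold = value

    def __call__(self, sample):
        return 0.0 if abs(sample) < self.threshold else sample

f = DeadZoneFilter()
print([f(x) for x in (0.01, 0.2, -0.03)])  # [0.0, 0.2, 0.0]
f.set_threshold(0.25)                      # tuned at runtime from the GUI
print([f(x) for x in (0.01, 0.2, -0.03)])  # [0.0, 0.0, 0.0]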
Debugging the interaction between the PCI and the particular graphical scene visualized by SciVi is also based on modifying the DFD: the user can try out different combinations of rendering algorithms and semantic filters for the data, searching for the most convenient and observable variants. Thereby one can rapidly examine whether the particular PCI suits the needs of the current visual analytics task or has to be improved. The ontology-driven functioning of SciVi simplifies extending the platform's capabilities: if new semantic filters or rendering mechanisms are required for a particular visual analytics task, they can be added without changing the source code of the SciVi core. Thereby SciVi can be tuned to solve a wide range of visualization and analytics problems, involving both traditional graphical user interfaces and PCIs.
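The following sketch illustrates this plug-in style of extensibility: new semantic filters are registered in a table consulted by the platform, so the core stays untouched. The registry and function names are illustrative and do not correspond to the actual SciVi API.

# A hedged sketch of filter registration without touching the core code.
FILTERS = {}

def register_filter(name):
    def wrap(fn):
        FILTERS[name] = fn
        return fn
    return wrap

@register_filter("MovingUpOnly")
def moving_up_only(records):
    """Keep only records whose semantics includes 'Moving Up'."""
    return [r for r in records if "Moving Up" in r.get("fields", ())]

data = [{"text": "he raised his hands", "fields": {"Moving Up"}},
        {"text": "look at the lower corner", "fields": {"Moving Down"}}]
print(FILTERS["MovingUpOnly"](data))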
Fig. 6 demonstrates the DFD that describes the mapping of different gestures to the language semantics. The “Glove” node denotes communication with the glove PCI. The connection is established automatically; the technical details of the data transmission needed for this connection are described in the ontological profile of the glove device. The nodes “Gesture Up”, “Gesture Down”, “Gesture Left” and “Gesture Right” describe detectors of the corresponding gestures. The key DFD element is the “Classifier” node depicting the filter that actually maps gestures to the language semantics according to the ontology shown in Fig. 1. Under the hood, this filter generates an ontology-driven algorithm for selecting the data whose semantics matches the detected gesture and transmits this algorithm to the graph view represented by the “BFI Graph” node. The “BFI Graph” node denotes the visualization type and the data itself. In this example, for the sake of DFD simplicity, the data set is tied to the visual object as a parameter accessible through the node's settings. The rendering result is shown in Fig. 7. According to the DFD described above, data filtering in the circular graph [22] is controlled by the glove PCI.
Fig. 6. Data flow diagram describing the gesture-based visual analytics task.
Fig. 7. Circular graph rendered according to the data flow diagram from Fig. 6.
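As an illustration of what a gesture detector node may do, the sketch below derives a pitch angle from the orientation quaternion and fires the “Up” gesture when the hand is tilted beyond a threshold; the axis convention and the threshold are assumptions, not the actual detector implementation.

import math

def pitch_from_quaternion(w, x, y, z):
    """Pitch angle (rotation about the lateral axis) in radians."""
    s = 2.0 * (w * y - z * x)
    s = max(-1.0, min(1.0, s))  # clamp to avoid domain errors
    return math.asin(s)

def gesture_up(quaternion, threshold_deg=30.0):
    """True when the glove is tilted upward beyond the threshold."""
    return math.degrees(pitch_from_quaternion(*quaternion)) > threshold_deg

# Identity orientation: no gesture; hand tilted up by about 45 degrees: gesture fires.
print(gesture_up((1.0, 0.0, 0.0, 0.0)))                                   # False
print(gesture_up((math.cos(math.pi / 8), 0.0, math.sin(math.pi / 8), 0.0)))  # True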
Our previous research allowed us to transform the SciVi scientific visualization system into a feature-packed visual analytics platform. In the current work we integrated methods and means of ontology engineering with IoT technologies to enrich SciVi with a hardware perceptive-cognitive human-machine interface. This is a first step towards multimodal analytics systems involving both the visual and the sensorimotor perceptive channels of the expert. The proposed techniques were used to solve a practical problem of analyzing the dependencies between the psychological parameters and the verbal behavior of social network users. According to the proposed approach, a PCI was developed that enables controlling the data search and filtering mechanisms with spatial gestures associated with the corresponding spatial semantics. Gestures are detected with a special glove manipulator that operates on the basis of an inertial measurement unit. The presented ontology-driven classifier maps these gestures to data selection algorithms, which build up a semantic filter for the data being visualized. This, in turn, speeds up the analysis, allowing the expert to find the relevant data with a single gesture, without any textual queries, slider dragging, etc.
In the future, we plan to expand the range of supported PCI modalities with a haptic channel, using various types of sensors and various methods of their integration into sensor networks based on the principles of IoT technologies. In addition, it should be examined how tightly the expert's perceptual and cognitive experience converges with the PCI, because in the process of multimodal analytical activity the use of this type of interface can qualitatively transform the expert's perceptual-cognitive experience due to the possible formation of additional neural connections between the visual, auditory, motor and other centers. These studies will be carried out using the capabilities of the 128-channel BE Plus LTM neuromonitoring system. In the long run, this can significantly improve the methods and means of automated transformation of machine-to-machine IoT systems into human-centric ones.
The reported study is partially supported by the Ministry of Education and Science of the Russian Federation, State Assignment No. 34.1505.2017/4.6 (Research Project of Perm State University, 2017–2019).
1. Rose, K., Eldridge, S., Chapin, L. The Internet of Things: an Overview [Electronic Resource] // The Internet Society (ISOC). – 2015. URL: https://www.internetsociety.org/resources/doc/2015/iot-overview (last accessed 09.10.2019).
2. Khan, W., Ahmed, E., Hakak, S., Yaqoob, I., Ahmed, A. Edge Computing: A Survey // Future Generation Computer Systems. – Elsevier, 2019. – Vol. 97. – PP. 219–235. DOI: 10.1016/j.future.2019.02.050.
3. Zhang, P., Zhou, M., Fortino, G. Security and Trust Issues in Fog Computing: A Survey // Future Generation Computer Systems. – Elsevier, 2018. – Vol. 88. – PP. 16–27. DOI: 10.1016/j.future.2018.05.008.
4. Sanfelice, R. Analysis and Design of Cyber-Physical Systems. A Hybrid Control Systems Approach // Cyber-Physical Systems: From Theory to Practice / Rawat, D., Rodrigues, J., Stojmenovic, I. – CRC Press, 2015. – PP. 3–31. DOI: 10.1201/b19290-3.
5. Ishii, H., Ullmer, B. Tangible Bits: Towards Seamless Interfaces Between People, Bits and Atoms // CHI '97 Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems. – ACM, 1997. – PP. 234–241. DOI: 10.1145/258549.258715.
6. Barsalou, L. Perceptual Symbol Systems // Behavioral and Brain Sciences. – 1999. – Vol. 22. – PP. 577–609.
7. Pulvermüller, F. Words in the Brain’s Language // Behavioral and Brain Sciences. – 1999. – Vol. 22. – PP. 253–279.
8. Gallese, V., Lakoff, G. The Brain’s Concepts: the Role of the Sensory-Motor System in Conceptual Knowledge // Cognitive Neuropsychology. – 2005. – Vol. 22, I. 3. – PP. 455–479. DOI: 10.1080/02643290442000310.
9. Staats, A., Hammond, O. Natural Words as Physiological Conditioned Stimuli: Food-Word-Elicited Salivation and Deprivation Effects // Journal of Experimental Psychology. – 1972. – Vol. 96, I. 1. – PP. 206–208. DOI: 10.1037/h0033508.
10. Laeng, B., Sulutvedt, U. The Eye Pupil Adjusts to Imaginary Light //
Psychological Science. – 2013. – Vol. 25, I. 1. – PP. 188–197. DOI:
10.1177/0956797613503556.
11. Aravena, P., Delevoye-Turrell, Y., Deprez, V., Cheylus, A.,
Paulignan, Y., Frak, V., Nazir, T. Grip Force Reveals the Context Sensitivity
of Language-Induced Motor Activity during “Action Words” Processing: Evidence
from Sentential Negation // PLoS ONE. – 2012. – Vol. 7, I. 12. DOI:
10.1371/journal.pone.0050287.
12. Hoffman, P., Lambon Ralph, M. Shapes, Scents and Sounds: Quantifying
the Full Multi-Sensory Basis of Conceptual Knowledge // Neuropsychologia. –
Elsevier, 2013. – Vol. 51, I. 1. – PP. 14–25. DOI:
10.1016/j.neuropsychologia.2012.11.009.
13. Lockwood, G., Hagoort, P., Dingemanse, M. How Iconicity Helps People
Learn New Words: Neural Correlates and Individual Differences in Sound-Symbolic
Bootstrapping // Collabra: Psychology. – 2016. – Vol. 2, I. 1. – PP. 1–15. DOI:
10.1525/collabra.42.
14. Chang, Sh.-F., Ellis, D., Jiang, W., Lee, K., Yanagawa, A., Loui,
A., Luo, J. Large-Scale Multimodal Semantic Concept Detection for Consumer
Video // Multimedia Information Retrieval. – 2007. – PP. 255–264. DOI:
10.1145/1290082.1290118.
15. Meteyard, L., Cuadrado, S., Bahrami, B., Vigliocco, G. Coming of
Age: a Review of Embodiment and the Neuroscience of Semantics // Cortex. –
Elsevier, 2012. – Vol. 48, I. 7. – PP. 788–804. DOI:
10.1016/j.cortex.2010.11.002.
16. Kuznetsov, A. Field [in Russian] // Linguistics. Great Academic
Dictionary. Sec. Ed. – Great Russian Encyclopedia, 1998. – 685 p.
17. Shchebetenko, S. Reflexive Characteristic Adaptations Explain Sex
Differences in the Big Five: but not in Neuroticism // Personality and
Individual Differences. – 2017. – Vol. 111. – PP. 153–156. DOI:
10.1016/j.paid.2017.02.013.
18. Belousov, K., Erofeeva, E., Leshchenko, Y., Baranov, D. “Semograph”
Information System as a Framework for Network-Based Science and Education //
Smart Education and e-Learning. – Springer, 2017. – PP. 263–272. DOI:
10.1007/978-3-319-59451-4_26.
19. Ryabinin, K., Chuprina, S., Kolesnik, M. Calibration and Monitoring
of IoT Devices by Means of Embedded Scientific Visualization Tools // Lecture
Notes in Computer Science. – Springer, 2018. – Vol. 10861. – PP. 655–668. DOI:
10.1007/978-3-319-93701-4_52.
20. Ryabinin, K.V., Belousov, K.I., Chuprina, S.I., Shchebetenko, S.A., Permyakov, S.S. Visual Analytics Tools for Systematic Exploration of Multi-Parameter Data of Social Web-Based Service Users // Scientific Visualization. – National Research Nuclear University “MEPhI”, 2018. – Q. 3, Vol. 10, No. 4. – PP. 82–99. DOI: 10.26583/sv.10.4.07.
21. Ryabinin, K.V., Baranov, D.A., Belousov, K.I. Integration of
Scientific Visualization Toolset SciVi with Information System Semograph //
Proceedings of 27th International Conference GraphiCon 2017. – 2017. – PP.
138–141.
22. Ryabinin, K.V., Chuprina, S.I., Belousov, K.I., Permyakov, S.S.
Visual analytics methods of the verbal behavior variability of social networks
users depending on their individual psychological features // Proceedings of 28th
International Conference GraphiCon 2018. – 2018. – PP. 163–167.
23. Ryabinin, K., Chuprina, S., Belousov, K. Ontology-Driven Automation
of IoT-Based Human-Machine Interfaces Development // Lecture Notes in Computer
Science. – Springer, 2019. – Vol. 11540. – PP. 110–124. DOI:
10.1007/978-3-030-22750-0_9.
24. Mahony, R., Hamel, T., Pflimlin, J. Nonlinear Complementary Filters
on the Special Orthogonal Group // IEEE Transactions on Automatic Control. –
IEEE, 2008. – Vol. 53, No. 5. – PP. 1203–1218. DOI: 10.1109/TAC.2008.923738.
25. Ryabinin, K., Chuprina, S. High-Level Toolset For Comprehensive
Visual Data Analysis and Model Validation // Procedia Computer Science. –
Elsevier, 2017. – Vol. 108. – PP. 2090–2099. DOI: 10.1016/j.procs.2017.05.050.