
A semantic parsing pipeline for context-dependent question answering over temporally structured data

Published online by Cambridge University Press:  29 October 2021

Charles Chen
Affiliation:
School of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
Razvan Bunescu*
Affiliation:
School of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
Cindy Marling
Affiliation:
School of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
*
*Corresponding author. E-mail: razvan.bunescu@uncc.edu

Abstract

We propose a new setting for question answering (QA) in which users can query the system using both natural language and direct interactions within a graphical user interface that displays multiple time series associated with an entity of interest. The user interacts with the interface in order to understand the entity’s state and behavior, entailing sequences of actions and questions whose answers may depend on previous factual or navigational interactions. We describe a pipeline implementation where spoken questions are first transcribed into text which is then semantically parsed into logical forms that can be used to automatically extract the answer from the underlying database. The speech recognition module is implemented by adapting a pre-trained long short-term memory (LSTM)-based architecture to the user’s speech, whereas for the semantic parsing component we introduce an LSTM-based encoder–decoder architecture that models context dependency through copying mechanisms and multiple levels of attention over inputs and previous outputs. When evaluated separately, with and without data augmentation, both models are shown to substantially outperform several strong baselines. Furthermore, the full pipeline evaluation shows only a small degradation in semantic parsing accuracy, demonstrating that the semantic parser is robust to mistakes in the speech recognition output. The new QA paradigm proposed in this paper has the potential to improve the presentation and navigation of the large amounts of sensor data and life events that are generated in many areas of medicine.
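
The pipeline structure described above can be summarized schematically. The following Python sketch shows how the three stages (speech recognition, context-dependent semantic parsing over the interaction history, and answer extraction from the database) might be chained; all class and method names are hypothetical placeholders rather than the paper's implementation.

```python
# Minimal sketch of the three-stage pipeline described in the abstract.
# All class and method names here are hypothetical placeholders; the paper's
# components are LSTM-based models trained and evaluated separately.
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Interaction:
    """One user turn: the transcribed question and its predicted logical form."""
    text: str
    logical_form: List[str]


@dataclass
class QAPipeline:
    history: List[Interaction] = field(default_factory=list)

    def answer(self, audio: Any) -> Any:
        text = self.transcribe(audio)        # stage 1: speech recognition
        lf = self.parse(text, self.history)  # stage 2: context-dependent semantic parsing
        self.history.append(Interaction(text, lf))
        return self.execute(lf)              # stage 3: evaluate the LF against the database

    def transcribe(self, audio: Any) -> str:
        raise NotImplementedError            # e.g. an adapted LSTM-based ASR model

    def parse(self, text: str, history: List[Interaction]) -> List[str]:
        raise NotImplementedError            # e.g. the LSTM encoder-decoder parser

    def execute(self, lf: List[str]) -> Any:
        raise NotImplementedError            # query the time series / event database
```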

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

Figure 1. PhysioGraph window displaying one day’s worth of patient data (default view). The blue graph at the top shows the entire timeline of blood glucose measurements, over approximately 8 weeks. The top pane below it shows physiological parameters, including heart rate (red) and skin conductivity (green). The bottom pane shows time series of blood glucose (blue), basal insulin (black), and estimated active insulin (red). Discrete life events such as meals and exercise are shown at the top of the bottom pane, whereas boluses are shown at the bottom. When an event is clicked, details are displayed in the pane on the right.


Figure 2. The proposed semantic parsing pipeline for context-dependent question answering.


Table 1. Examples of interactions and logical forms


Table 2. Performance of the speech recognition system on datasets Amber and Frank, with and without data augmentation. SR.Amber and SR.Frank are the two systems without data augmentation, while SR.Amber + Frank and SR.Frank + Amber are the two systems enhanced with data augmentation. WER (%) is the evaluation metric
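
For reference, WER is the word-level edit distance between the system transcription and the reference transcription, normalized by the reference length. A minimal sketch of the metric (standard dynamic-programming edit distance; not the evaluation script used in the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate (%): word-level Levenshtein distance between the
    hypothesis and the reference, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)  # deletion, insertion
    return 100.0 * dp[len(ref)][len(hyp)] / max(len(ref), 1)
```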


Table 3. Examples of transcriptions generated by the SR.Frank and SR.Amber systems and by their augmented versions SR.Frank + Amber and SR.Amber + Frank, respectively. True refers to the correct transcription


Table 4. Vocabulary for logical forms


Figure 3. The SeqGen model takes a sequence of natural language (NL) tokens as input $X =x_1, \ldots, x_n$ and encodes it with a Bi-LSTM (left, green). The two final states are used to initialize the decoder LSTM (right, blue) which generates the LF sequence $\hat{Y} = \hat{y}_1, \ldots, \hat{y}_m$. The attention-augmented SeqGen+Att2In model computes attention weights (blue arrows) and a context vector (red arrow) for each position in the decoder.
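
An illustrative PyTorch sketch of the encoder-decoder structure summarized in this caption: a Bi-LSTM encoder, a decoder LSTM initialized from the concatenated final encoder states, and dot-product attention that produces a context vector for each decoder position. Layer choices and hyperparameters are placeholders, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn


class SeqGenSketch(nn.Module):
    """Illustrative Bi-LSTM encoder / LSTM decoder with dot-product attention
    over encoder states (hyperparameters are placeholders)."""

    def __init__(self, src_vocab: int, tgt_vocab: int, emb: int = 64, hid: int = 128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.decoder = nn.LSTM(emb, 2 * hid, batch_first=True)
        self.out = nn.Linear(4 * hid, tgt_vocab)  # [decoder state; context vector]

    def forward(self, x: torch.Tensor, y_in: torch.Tensor) -> torch.Tensor:
        enc, (h, c) = self.encoder(self.src_emb(x))          # enc: (B, n, 2*hid)
        # Concatenate the final forward/backward states to initialize the decoder.
        h0 = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        c0 = torch.cat([c[0], c[1]], dim=-1).unsqueeze(0)
        dec, _ = self.decoder(self.tgt_emb(y_in), (h0, c0))  # dec: (B, m, 2*hid)
        # Attention weights over input positions, then a context vector per step.
        scores = torch.bmm(dec, enc.transpose(1, 2))          # (B, m, n)
        alpha = torch.softmax(scores, dim=-1)
        context = torch.bmm(alpha, enc)                       # (B, m, 2*hid)
        return self.out(torch.cat([dec, context], dim=-1))    # logits over LF tokens
```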


Figure 4. Context-dependent semantic parsing architecture. We use a Bi-LSTM (left) to encode the input and an LSTM (right) as the decoder. The top shows the previous interaction and the bottom shows the current interaction. The complete previous LF is $Y^{-1}$ = [Answer, (, e, ), $\wedge$, Around, (, e, ., time, OOV, ), $\wedge$, e, ., type, =, DiscreteType]. The token 10am is copied from the input to replace the generated OOV token (green arrow). The complete current LF is $Y$ = [Answer, (, REF, ., time, )]. The entity token e(-1) is copied from the previous LF to replace the generated REF token (green arrow). To avoid clutter, only a subset of the attention lines (dotted) is shown.
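
The copy steps illustrated in this caption can be viewed as a post-processing pass over the generated sequence: each placeholder token is replaced by the source token receiving the highest attention weight, with OOV copying from the natural language input and REF copying from the previous logical form. The sketch below illustrates this idea; the variable names and the arg-max selection rule are simplifying assumptions, not the paper's exact scoring.

```python
from typing import List


def resolve_copies(lf: List[str],
                   nl_tokens: List[str], nl_attention: List[List[float]],
                   prev_lf: List[str], prev_attention: List[List[float]]) -> List[str]:
    """Replace placeholder tokens in a generated LF by copying from the most
    attended position: OOV copies from the NL input, REF from the previous LF."""
    resolved = []
    for pos, token in enumerate(lf):
        if token == "OOV":
            j = max(range(len(nl_tokens)), key=lambda k: nl_attention[pos][k])
            resolved.append(nl_tokens[j])   # e.g. copy "10am" from the question
        elif token == "REF":
            j = max(range(len(prev_lf)), key=lambda k: prev_attention[pos][k])
            resolved.append(prev_lf[j])     # e.g. copy the entity e(-1)
        else:
            resolved.append(token)
    return resolved
```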


Table 5. Sequence-level accuracy on the Artificial dataset and the two Real interactions datasets
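
Sequence-level accuracy counts a prediction as correct only when the generated logical form matches the annotated one exactly. A minimal sketch of the metric, assuming exact token-level match:

```python
def sequence_accuracy(predicted_lfs, gold_lfs) -> float:
    """Sequence-level accuracy (%): fraction of interactions whose predicted
    logical form matches the annotated logical form token for token."""
    assert len(predicted_lfs) == len(gold_lfs)
    correct = sum(p == g for p, g in zip(predicted_lfs, gold_lfs))
    return 100.0 * correct / len(gold_lfs)
```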


Table 6. Ablation results on the Amber dataset, as we gradually add more components to SeqGen


Table 7. Examples generated by SPAAC-MLE and SPAAC-RL using real interactions. MLE: logical forms by SPAAC-MLE. RL: logical forms by SPAAC-RL. True: manually annotated LFs


Table 8. Examples generated by SPAAC-MLE and SPAAC-RL using artificial interactions. MLE: logical forms generated by SPAAC-MLE. RL: logical forms generated by SPAAC-RL. True: manually annotated logical forms


Table 9. Sequence-level accuracy (%) of semantic parsing systems on datasets Amber and Frank. SP.Amber and SP.Frank are the original systems without data augmentation, while SP.Amber + Frank and SP.Frank + Amber are the two systems enhanced with data augmentation. SP.All is a system trained and tested on the union of the two datasets


Table 10. Logical forms generated by SP.Amber + Frank versus SP.Amber, and SP.Frank + Amber versus SP.Frank. True refers to the manually annotated logical forms


Table 11. Performance of the entire semantic parsing pipeline. SP.Amber and SP.Frank are the two systems without data augmentation, while SP.Amber + Frank and SP.Frank + Amber are the two systems enhanced with data augmentation. Word error rate (WER) and sequence-level accuracy (%) are the evaluation metrics


Table 12. Transcriptions (SR) and logical forms (SP) generated by the pipeline versus the corresponding manual annotations (True text and True LF)


Table A1. Examples of artificial sample generation