We present two studies on neural network architectures that learn to represent sentences by composing their words according to automatically induced binary trees, without ever being shown a correct parse tree. We use Tree-Long Short-Term Memories (LSTMs) as our composition function, applied along a tree structure found by a differentiable natural language chart parser. The models simultaneously optimise both the composition function and the parser, thus eliminating the need for externally provided parse trees, which are normally required for Tree-LSTMs. They can therefore be seen as tree-based recurrent neural networks that are unsupervised with respect to the parse trees. Due to being fully differentiable, the models are easily trained with an off-the-shelf gradient descent method and backpropagation.
In the first part of this paper, we introduce a model based on the CKY chart parser, and evaluate its downstream performance on a natural language inference task and a reverse dictionary task. Further, we show how its performance can be improved with an attention mechanism which fully exploits the parse chart, by attending over all possible subspans of the sentence. We find that our approach is competitive against similar models of comparable size and outperforms Tree-LSTMs that use trees produced by a parser.
Finally, we present an alternative architecture based on a shift-reduce parser. We perform an analysis of the trees induced by both our models, to investigate whether they are consistent with each other and across re-runs, and whether they resemble the trees produced by a standard parser.