Interpretable linguistic representations learned by LSTMs

deep learning nlp

Drawing inspiration from semantic representations in biology, this study finds that Long Short-Term Memory (LSTM) hidden state neurons learn interpretable representations, which could explain their predictive success.


LSTMs are a vital part of state-of-the-art Recurrent Neural Network (RNN) architectures, which are generally applied to sequential data such as audio and text. This article builds on evidence that even character-based text prediction models can represent features including line termination, nesting, and sentiment over words, phrases and even sentences [1]. That evidence follows earlier work which identified some interpretable hidden state cells alongside a large proportion with apparently irregular activity [2]. Commentators maintain that a theoretical understanding of why LSTMs are so effective is still lacking. This study is therefore an attempt to discover more interpretable, higher-level computational solutions for representing and remembering the linguistic features of a natural language, here English.

The properties of biological neurons are typically determined through real-time Magnetic Resonance Imaging (MRI) or computerised tomography (CT) scan monitoring, lesion studies in animal models, and the association of selective brain damage with presenting symptoms. Semantic memory is of most interest for encoding the meaning and structure of natural language. At present, theories of semantic memory heavily implicate at least the left inferior temporal lobe, which lies upstream of verbal and auditory areas. In semantic dementia, progressive semantic memory impairment (as opposed to global language or memory impairment) is associated with characteristic atrophy, predominantly of the left temporal lobe [3]. Motivated by the ostensible importance to semantic representation of the circuit topology and neuronal dynamics in this area, this work seeks to unveil the types of linguistic representations learned by artificial networks.

This study therefore aims to determine whether LSTM hidden layer neurons trained on textual English language input display properties analogous to generalised features of natural languages, or the behaviour of biological neurons residing in brain areas implicated in semantic memory. Separately, it aims to generalise these findings and propose directions for future theoretical research on LSTMs.

Visual inspection of LSTMs with LSTMVis

LSTMVis was released as open source in June 2016 by a team at the Harvard School of Engineering and Applied Sciences. It provides an interactive visualisation to facilitate analysis of RNN hidden states. A user selects a range of text to represent a hypothesis, and the tool matches this selection to other examples in the data set.
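As a rough illustration of this select-and-match idea, the sketch below treats a selection as the set of cells that stay above a threshold across the selected range, then finds other ranges where those same cells are all active. It is a simplified sketch, not LSTMVis's actual implementation; the function names, the toy `toy_states` array and the default threshold value are assumptions made for illustration.

```python
import numpy as np

def active_cells(states, start, end, threshold=0.3):
    """Indices of cells above `threshold` at every step in states[start:end]."""
    window = states[start:end]
    return np.flatnonzero((window > threshold).all(axis=0))

def matching_ranges(states, cells, threshold=0.3):
    """Half-open (start, end) ranges where all `cells` exceed `threshold`."""
    if len(cells) == 0:
        return []
    on = (states[:, cells] > threshold).all(axis=1)
    ranges, start = [], None
    for t, flag in enumerate(on):
        if flag and start is None:
            start = t                      # a matching run begins
        elif not flag and start is not None:
            ranges.append((start, t))      # the run ended at the previous step
            start = None
    if start is not None:
        ranges.append((start, len(on)))    # run extends to the end of the data
    return ranges

# Toy hidden states: 6 time steps (rows) by 3 cells (columns). Hypothetical data.
toy_states = np.array([
    [0.9, 0.1, 0.8],
    [0.7, 0.2, 0.6],
    [0.1, 0.9, 0.1],
    [0.8, 0.0, 0.5],
    [0.6, 0.4, 0.9],
    [0.0, 0.0, 0.0],
])

selected = active_cells(toy_states, 0, 2)       # "hypothesis": steps 0 and 1
matches = matching_ranges(toy_states, selected)
print(selected, matches)
```

Here the selection picks out cells 0 and 2, and the match step recovers both the original range and a second range (steps 3 to 4) in which the same cells are active together.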

To see whether hidden state neurons implicitly learned representations of different components of language, hidden state neuron properties and patterns related to metadata were visually inspected using LSTMVis [4]. The threshold used for matching the activation of disparate neurons was 0.3 unless otherwise stated. Each inspection was then followed up, outside of the tool, by principal component analysis on the vector representations of the proposed set of words. This allowed quantitative verification of whether the model indeed distinguishes between samples on the basis of the hypothesised feature, in line with the approach taken by the developers of LSTMVis [5].
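The principal component analysis step can be sketched as follows. This is a minimal illustration using NumPy's singular value decomposition on synthetic vectors; the data are made up for the example and are not taken from the Wall Street Journal model, and `pca_project` is a hypothetical helper name rather than part of LSTMVis.

```python
import numpy as np

def pca_project(vectors, n_components=2):
    """Project row vectors onto their top principal components.

    Returns the projected points and the fraction of total variance
    explained by each kept component.
    """
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    variance_share = singular_values**2 / np.sum(singular_values**2)
    return projected, variance_share[:n_components]

# Synthetic stand-in for hidden state vectors of two hypothesised word groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(10, 20))
group_b = rng.normal(0.0, 0.1, size=(10, 20)) + 1.0
points, variance_share = pca_project(np.vstack([group_a, group_b]))

# If the hypothesised feature is encoded, the groups should separate on PC1.
pc1_a, pc1_b = points[:10, 0], points[10:, 0]
separated = pc1_a.max() < pc1_b.min() or pc1_b.max() < pc1_a.min()
print("groups separated on PC1:", separated)
```

If the two groups do not separate along the leading components, that would count against the hypothesis that the model distinguishes samples on the basis of the proposed feature.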

Although, as referenced earlier, character-level models can learn higher-level features, they produce many word-level errors because it is difficult to predict global context accurately from a necessarily limited character context. In addition, it is less intuitive to explain the types of features that may arise from character context than from word context. I therefore selected a predictive word model trained on the Wall Street Journal: a 2x650 LSTM language model annotated with gold-standard part-of-speech tags. In testing hypotheses for each language feature, the hidden states of both layers 1 and 2 were examined. The data consists of a meta-word sentence of length 929589.

Sentence chunking

In order to further elucidate this neuron’s behaviour, I conducted a small-sample study of its activation timing and decay rate.
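One simple way a decay rate of this kind might be estimated is a least-squares line fit to the logarithm of the activations. This assumes that the activation stays positive and falls off roughly exponentially after its peak, which is an assumption of this sketch and not a finding of the study. The function name and the synthetic trace are hypothetical.

```python
import numpy as np

def estimate_decay_rate(activations):
    """Fit a(t) ~ a0 * exp(-rate * t) via a line fit to log(a).

    Returns (rate, a0). Requires strictly positive activations.
    """
    a = np.asarray(activations, dtype=float)
    if np.any(a <= 0):
        raise ValueError("log-linear fit needs strictly positive activations")
    t = np.arange(len(a))
    slope, intercept = np.polyfit(t, np.log(a), 1)
    return -slope, np.exp(intercept)

# Synthetic trace: starts at 0.9 and decays with rate 0.5 per token.
trace = 0.9 * np.exp(-0.5 * np.arange(8))
rate, a0 = estimate_decay_rate(trace)
half_life = np.log(2) / rate   # tokens until the activation halves
print(rate, a0, half_life)
```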

In sum, a hidden state in the second layer was found to encode properties of the current and previous sentence chunks. This is in line with the expectation that LSTMs are purposeful in their use of memory cells and forget gates. The results also agree with the intuition that the predicted word configurations present in a sentence depend on prior and current n, where sentences are n-grams.

Semantic category embeddings

We already know that, in theory, predictive word models built on RNNs use word embeddings to represent and predict words from context. However, the relationships between neurons that represent particular semantic concepts can also be shown and interpreted empirically. Doing so strengthens the evidence that predictive RNNs learn and use abstracted concepts in an unsupervised manner, much as a human might when learning the English language.

During undirected exploration of the dataset in LSTMVis, patterns were found in activation related to certain semantic concepts. Three hidden state neurons whose activations matched on legal match queries were found to selectively encode institutional concepts with fine differentiation.

Some fuzzy overlap was observed between the activity of neurons 378 and 265, although 378 was additionally activated for tokens related to institutional mechanisms. Interestingly, neuron 169, which activated in response to words related to legislation and the court, also activated in response to the word “bar” used to mean “ban” rather than the bar examination; both senses nonetheless carry some essentially legislative meaning.

However, interesting patterns of periodicity (which may mimic biological semantic organisation), other than n-gram representation, were not found over the course of this study. Unfortunately, neuron-wise matching is not an integrated function of LSTMVis. I therefore plan to apply more advanced machine learning methods to uncover such relationships between different word-embedding hidden states.
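As a starting point for comparing neurons outside of LSTMVis, the fuzzy overlap between two neurons could be quantified as the Jaccard overlap of the token positions at which each is active. The sketch below is one hypothetical way to do this and was not part of the study; the helper name, the threshold default and the toy activations are assumptions.

```python
import numpy as np

def jaccard_overlap(states, cell_a, cell_b, threshold=0.3):
    """Share of positions where both cells are active, among positions
    where at least one of them is active."""
    on_a = states[:, cell_a] > threshold
    on_b = states[:, cell_b] > threshold
    union = np.logical_or(on_a, on_b).sum()
    if union == 0:
        return 0.0   # neither cell is ever active
    return float(np.logical_and(on_a, on_b).sum() / union)

# Toy activations: 4 token positions (rows) by 3 cells (columns).
overlap_states = np.array([
    [0.9, 0.8, 0.0],
    [0.7, 0.1, 0.0],
    [0.1, 0.6, 0.9],
    [0.5, 0.4, 0.0],
])

print(jaccard_overlap(overlap_states, 0, 1))  # partial overlap
print(jaccard_overlap(overlap_states, 0, 2))  # no shared active positions
```

A value near 1 would indicate that two neurons fire on almost the same tokens, while a value near 0 would indicate that they respond to largely disjoint sets.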

If it is hypothesised that different sets of neuron weights are needed for encoding different overarching forms of semantic relationships, then idiomatic phrases are one such difference in overarching form which may prove fruitful for study. Several idiomatic phrases were queried and activated neuron matches were recorded. In addition, a positive and negative sample study was performed for the idiomatic usage of “weather” to determine if the model performs word-sense disambiguation utilising a differentiated set of neurons.

Of the 63 neurons that had matching activations for the idiomatic match queries, only neuron 319 was found to activate highly selectively for the one instance of the phrase “dragging its feet”. Neuron 305 was selective for the idiomatic usage of “weather” and particularly selective for “weather the downturn”. A further 5 neurons (just under 8% of observations) were found to be highly selectively activated for their respective phrases. However, 6 neurons (just over 9.5% of observations) were responsive to 3 different idioms. Of the 56 neurons active for match queries containing “weather”, only 2 (210 and 305, or 3.6%) did not fully distinguish between idiomatic and conventional usage of the word “weather”. Taken together, these observations hint at the existence of idiom neurons, or neurons sensitive to idiomatic n-grams.
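The proportions quoted above follow directly from the reported counts. A short check, using a hypothetical helper name, is:

```python
def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total

idiom_matched = 63     # neurons matching the idiomatic queries
weather_matched = 56   # neurons active for queries containing "weather"

selective_share = share(5, idiom_matched)         # highly selective neurons
multi_idiom_share = share(6, idiom_matched)       # responsive to 3 idioms
undistinguished_share = share(2, weather_matched) # neurons 210 and 305

print(f"{selective_share:.2f}% {multi_idiom_share:.2f}% {undistinguished_share:.2f}%")
```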

However, neuron 305 was unusual in that it was active in all cases, whether or not the phrase was idiomatic. This neuron is therefore likely to encode a linguistic feature unrelated to idioms.

Interpretable representations in the wild

A range of interpretable hidden state behaviours were learned by an LSTM word model in the wild (i.e. in an unsupervised way). These could be key in explaining its state-of-the-art performance. This study first affirms the capacity of LSTMs to learn and remember sentence chunks. Secondly, it finds that LSTMs can learn to represent abstracted semantic categories which themselves overlap, though this is possibly a reflection of the underlying neural language model implementation. Finally, it provides novel empirical evidence for “idiomatic” neurons, a promising foundation for further experiments investigating how LSTMs might learn or repurpose existing representations in new contexts, and for comparisons to biology.

Further work

To my knowledge, there is considerable research interest in developing algorithms that can learn to program without supervision. There have been many such attempts, though at present the most robust I have found is the Bayou application [6], which appears to be more of a predictive programming tool than an auto-programming algorithm. The next step is to generalise these observations to other large language prediction or translation datasets, and further to code corpora; indeed, LSTMVis provides a Java Word Model dataset which is worth examining for this purpose.

Overall, this article is undergirded by greater aspirations to discover or develop higher-level, more generalisable types of semantic encoding. In a separate context, for example, a paper found grid-like dynamics in neural networks trained to perform navigational tasks, which mimic the behaviour of biological neurons in the inferior temporal area. This area was previously mentioned to be involved in semantic understanding and speech, but it is also involved in navigation, suggesting that an even more abstracted vector space-type representation may undergird both functions in humans. Such periodic dynamics constitute a method for flexibly encoding and remembering a large amount of generalised high-dimensional data. I believe strongly that the performance of AI on any given task can be improved by selectively or contextually choosing particular models that fall back on different learning rules, of which feature representation is a vital component. It is therefore my hope that future work will discover, implement, or evaluate the performance of grid-like representations in general intelligence models such as the One Model To Learn Them All [7].


[1] ‘Unsupervised Sentiment Neuron’, OpenAI Blog, 06-Apr-2017. [Online]. Available: [Accessed: 20-Nov-2018].

[2] A. Karpathy, J. Johnson, and L. Fei-Fei, ‘Visualizing and Understanding Recurrent Networks’, arXiv:1506.02078 [cs], Jun. 2015.

[3] M. Harciarek and A. Kertesz, ‘Primary Progressive Aphasias and Their Contribution to the Contemporary Knowledge About the Brain-Language Relationship’, Neuropsychol. Rev., vol. 21, no. 3, pp. 271–287, Sep. 2011.

[4] ‘LSTMVis’. [Online]. Available: [Accessed: 20-Nov-2018].

[5] H. Strobelt, S. Gehrmann, H. Pfister, and A. M. Rush, ‘LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks’, arXiv:1606.07461 [cs], Jun. 2016.

[6] ‘Bayou’. [Online]. Available: [Accessed: 20-Nov-2018].

[7] L. Kaiser et al., ‘One Model To Learn Them All’, arXiv:1706.05137 [cs, stat], Jun. 2017.


[1] This definition is a result of inference; I could not access the details of the original dataset beyond what was described on the LSTMVis tool online client.

The big picture

The parahippocampal gyrus surrounds the hippocampus and includes the perirhinal and entorhinal cortices. The latter areas are among the proposed sites of memory consolidation in the human brain. Cells in these areas are arguably the closest biological analogues to LSTMs.