Inspired by semantic representations in biology, I present interpretable representations learned by Long Short-Term Memory (LSTM) hidden state neurons, in an effort to explain their predictive success.


LSTMs are a vital part of state-of-the-art Recurrent Neural Network (RNN) architectures, which are generally applied to sequential data such as audio and text. The article builds on evidence that even character-based text prediction models are capable of representing features including line termination, nesting, and sentiment over words, phrases and even sentences [1]. This follows earlier work that identified some interpretable hidden state cells alongside a large proportion of cells with apparently irregular activity [2]. Commentators claim that theoretical understanding of LSTMs’ effectiveness is still lacking. This article is therefore an attempt to discover more interpretable, higher-level computational solutions for representing and remembering linguistic features of a natural language, namely English.

The properties of biological neurons are typically determined through real-time Magnetic Resonance Imaging (MRI) or computerised tomography (CT) scan monitoring, lesion studies in animal models, and by association of selective brain damage with presenting symptoms. Of most interest for encoding the meaning and structure of natural language is semantic memory. At present, theories of semantic memory heavily implicate at least the left inferior temporal lobe, which lies upstream of verbal and auditory areas. In semantic dementia, progressive semantic memory impairment (as opposed to global language or memory impairment) is associated with characteristic atrophy, predominantly of the left temporal lobe [3]. Motivated by the ostensible importance of the circuit topology and neuronal dynamics of this area in semantic representation, this work seeks to unveil various types of linguistic representations learned by artificial networks.

This study therefore aims to determine whether LSTM hidden layer neurons trained on textual English language input display properties analogous to generalised features of natural languages, or the behaviour of biological neurons residing in brain areas implicated in semantic memory. Separately, it aims to generalise these findings and propose directions for future theoretical research on LSTMs.

Visual inspection of LSTMs with LSTMVis

LSTMVis was released as open source in June 2016 by a team at the Harvard School of Engineering and Applied Sciences. It provides an interactive visualisation to facilitate analysis of RNN hidden states. A user selects a range of text to represent a hypothesis, and the tool matches this selection to other examples in the data set.
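The select-then-match workflow can be sketched roughly as below. This is a simplified illustration rather than LSTMVis's actual implementation: the function names, the toy activation matrix and the "all selected cells above threshold" matching rule are assumptions, and the threshold of 0.3 is the value used later in this article.

```python
# Toy hidden states: one row per time step (word), one column per cell.
# These numbers are invented for illustration only.
states = [
    [0.1, 0.0, 0.5],
    [0.6, 0.1, 0.4],
    [0.7, 0.2, 0.5],
    [0.1, 0.9, 0.1],
    [0.0, 0.8, 0.2],
    [0.5, 0.1, 0.6],
    [0.8, 0.0, 0.7],
    [0.2, 0.1, 0.1],
]


def active_cells(states, start, end, threshold=0.3):
    """Cells whose activation exceeds the threshold at every step in [start, end)."""
    n_cells = len(states[0])
    return [
        c for c in range(n_cells)
        if all(states[t][c] > threshold for t in range(start, end))
    ]


def matching_ranges(states, cells, length, threshold=0.3):
    """Start positions of windows in which all selected cells are above threshold."""
    hits = []
    for s in range(len(states) - length + 1):
        window = range(s, s + length)
        if all(states[t][c] > threshold for t in window for c in cells):
            hits.append(s)
    return hits


# Select words 1-2 as the hypothesis, then look for similar windows elsewhere.
selected = active_cells(states, 1, 3)
matches = matching_ranges(states, selected, length=2)
print(selected, matches)
```

The selected range itself always appears among the matches; any other start position is a candidate example of the same hypothesised feature.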

To see if hidden state neurons implicitly learned representations of different components of language, hidden state neuron properties and patterns related to metadata were visually inspected using LSTMVis [4]. The threshold used for matching the activation of disparate neurons was 0.3 unless otherwise stated. This was followed up, outside of the tool, by principal component analysis on the vector representations of the proposed set of words. This allowed quantitative verification of whether the model indeed distinguishes between samples on the basis of the hypothesised feature, in line with the approach taken by the developers of LSTMVis [5].
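As a minimal sketch of the follow-up check, the snippet below projects hidden state vectors onto their first principal component and tests whether words with and without the hypothesised feature fall on opposite sides. The exact PCA procedure used for this article is not specified here, so everything in the snippet is an assumption: the vectors are invented four-dimensional stand-ins for 650-dimensional hidden states, and power iteration is used in place of a library PCA routine to keep the example dependency-free.

```python
def first_principal_component(rows, iterations=200):
    """Return (component, scores) for the first principal component.

    rows: equal-length vectors, one per word occurrence.
    Uses power iteration on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores


# Hypothetical hidden state vectors (not taken from the model).
with_feature = [[0.8, 0.7, 0.1, 0.0], [0.9, 0.6, 0.0, 0.1], [0.7, 0.8, 0.1, 0.1]]
without_feature = [[-0.8, -0.7, 0.0, 0.1], [-0.7, -0.9, 0.1, 0.0], [-0.9, -0.6, 0.1, 0.1]]

component, scores = first_principal_component(with_feature + without_feature)
scores_with = scores[:len(with_feature)]
scores_without = scores[len(with_feature):]
print(scores_with, scores_without)
```

If the two groups separate cleanly along the first component, that is consistent with the model encoding the hypothesised feature; overlapping scores would suggest the feature is not a dominant source of variance.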

Although, as noted earlier, character-level models can learn higher-level features, they produce many word-level errors because global context is difficult to predict accurately from an understandably limited character context. In addition, the features that arise from character context are less intuitive to explain than those that arise from word context. I therefore selected a predictive word model trained on the Wall Street Journal, which is a 2x650 LSTM language model annotated with gold-standard part-of-speech tags. In testing hypotheses for each language feature, the hidden states of both layers 1 and 2 were examined. The data consists of a meta-word sentence of length 929589.

Sentence chunking

The LSTMVis paper used evidence for phrase chunking in the Gutenberg children’s corpus as a proof of concept for the tool. In a similar vein, I began with a simple hypothesis that the model learns sentence chunks. I verified this against the corpus’ sentence termination tag “</s>” [Endnote 1]. Neuron 611 in hidden state layer 2 was found to display activation patterns sensitive to sentence initiation, sentence length and sentence termination. Disruptions in this consistent activity were additionally observed when many </s> tags were present in quick succession (separated by only 1–3 words), suggesting that the neuron has temporal dynamics that allow short n-grams to disrupt its behaviour in the short term.

In order to further elucidate this neuron’s behaviour, I conducted a small-sample study of its activation timing and decay rate.

| Position | Position of peak (in viewframe at default zoom, width = 61 words) | Largest change in activation observed | No. of words (from the 2nd word after </s> to the next <s>, inclusive) | 1/w (no. of words for an increase in activation of 1, to 2 d.p.) |
| --- | --- | --- | --- | --- |
| 106750 | 1 | 0.2 | 1 | 5 |
| 106750 | 2 | 0.3 | 4 | 13.33 |
| 106750 | 3 | 0.8 | 36 | 45 |
| 106750 | 4 | 0.4 | 9 | 22.5 |

Table 1: Brief examination of the change in activation of hidden state layer 2 neuron 611, and the associated sentence lengths.
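The final column of Table 1 appears to be the number of words divided by the largest change in activation, that is, the number of words that would be needed for an increase in activation of 1. The short check below reproduces that column under this reading; the function name is introduced here for illustration.

```python
# (peak index, largest change in activation, number of words), copied from Table 1.
table_1 = [(1, 0.2, 1), (2, 0.3, 4), (3, 0.8, 36), (4, 0.4, 9)]


def words_per_unit_activation(n_words, change):
    """Words needed for an activation increase of 1, rounded to 2 d.p."""
    return round(n_words / change, 2)


last_column = [words_per_unit_activation(n, change) for _, change, n in table_1]
print(last_column)
```

The longer the sentence, the more words are needed per unit of activation, which is consistent with activation rising at a decaying rate within a sentence.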

Neuron 611 signals the </s> tag by dropping its activation immediately following the tag. Its activity is sentence length-dependent in two ways. First, the magnitude of the activation cliff after </s> seems positively related to some combination of current sentence length and previous sentence length. Second, at almost all positions other than the one after </s>, the neuron's activation increases gradually at a decaying rate, which is interrupted after the </s> tag. Curiously, if there are multiple short sentences in quick succession, the </s> tag corresponds to a peak (a rise in activation at </s> in addition to the cliff that follows). Additionally, after long sentences, the activation does not drop to the level present at the beginning of that sentence. Together, these features point toward both a preference for some baseline value (ostensibly an activation of -0.6) and memory of prior sentence length.

In sum, a hidden state was found in the second layer to encode current and previous sentence chunk properties. This is in line with the expectation that LSTMs are purposeful in their use of memory and forget gates. The results also agree with the intuition that the predicted word configurations present in a sentence depend on prior and current n, where sentences are n-grams.
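The activation cliff that follows each </s> tag could be measured with a helper along the following lines. This is only a sketch: the token sequence and activation trace are invented, the function name is hypothetical, and the sentence length is counted simply as the number of tokens since the previous </s> tag.

```python
def cliffs_after_sentence_end(tokens, activations, end_tag="</s>"):
    """For each end tag, return (position, sentence_length, change).

    change is the activation one step after the tag minus the activation
    at the tag, rounded to 2 d.p.; a negative value is a cliff.
    """
    results = []
    length = 0
    for i, token in enumerate(tokens):
        if token == end_tag:
            if i + 1 < len(activations):
                change = round(activations[i + 1] - activations[i], 2)
                results.append((i, length, change))
            length = 0
        else:
            length += 1
    return results


# Invented example: a three-word sentence followed by a two-word sentence.
tokens = ["the", "market", "fell", "</s>", "it", "rose", "</s>", "traders", "cheered"]
trace = [-0.6, -0.3, -0.1, 0.0, -0.6, -0.4, -0.3, -0.6, -0.4]

cliffs = cliffs_after_sentence_end(tokens, trace)
print(cliffs)
```

Plotting the change against the current and previous sentence lengths over many sentences would be one way to test the relationship suggested by the small sample in Table 1.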

Semantic category embeddings

We already know that, theoretically, predictive word models using RNNs utilise word embeddings to represent and predict words from context. However, the relationships between neurons that represent particular semantic concepts can also be shown and interpreted empirically. This strengthens the evidence that predictive RNNs do learn and utilise abstracted concepts in an unsupervised manner, much as a human might when learning English. During undirected exploration of the dataset in LSTMVis, patterns were found in activation related to certain semantic concepts. Three hidden state neurons whose activations matched on legal match queries were found to selectively encode institutional concepts with fine differentiation.

| Semantic category encoded | Position | Match query | Layer | Neuron | n-grams (associated with activation within viewframe of width = 61 words) |
| --- | --- | --- | --- | --- | --- |
| Institutional; financial | 21241 | lower court ruling | states::states2 | 378 | Fuzzy match with 265; additionally, "u.s.", "services", "programs", "funds", "use", "federal", "woman" |
| Institutional; financial | 126877 | lower court ruling | states::states2 | 378 | "commission", "audit", "edison", "expenses", "utility", "collection", "plant", "customers subject", "commission ruled", "million" |
| Legislative | 21233 | appeals court | states::states2 | 169 | "a federal appeals court upheld a", "lower court ruling that the u.s.", "bar the", "of", "and human services", "in", "in" |
| Legislative | 21233 | appeals court | states::states2 | 365 | "abortion ruling", "federal appeals court", "health", "human" |
| Legislative | 21238 | lower court ruling | states::states2 | 265 | "abortion ruling", "federal appeals court", "lower court ruling", "prohibit" |
| Legislative | 126877 | lower court ruling | states::states2 | 265 | "illinois supreme court", "commission to audit commonwealth edison's", "utility", "plant", "million", "commission ruled" |
| Legislative | 126877 | supreme court | states::states2 | 365 | "supreme court", "commonwealth", "commission" |

Table 2: Sample of neurons that matched across legislative and institutional/financial semantic categories.

Some fuzzy overlap was observed between the activity of neurons 378 and 265, although 378 was additionally activated for tokens related to institutional mechanisms. Interestingly, neuron 169, which activated in response to words related to legislation and the court, also activated in response to the word “bar”. In this context “bar” means “ban” rather than the bar examination, although both senses carry some essentially legislative meaning.
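One simple way to put a number on this fuzzy overlap is the Jaccard index of the n-gram sets associated with each neuron. The sketch below uses the n-grams listed in Table 2 for neurons 378 and 265 at position 126877; the Jaccard index is an illustrative choice made here, not the measure used by LSTMVis.

```python
# n-grams associated with each neuron at position 126877, copied from Table 2.
ngrams = {
    378: {
        "commission", "audit", "edison", "expenses", "utility", "collection",
        "plant", "customers subject", "commission ruled", "million",
    },
    265: {
        "illinois supreme court", "commission to audit commonwealth edison's",
        "utility", "plant", "million", "commission ruled",
    },
}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b)


shared = sorted(ngrams[378] & ngrams[265])
overlap = jaccard(ngrams[378], ngrams[265])
print(shared, round(overlap, 2))
```

Four of the twelve distinct n-grams are shared, giving an index of one third. An exact string comparison like this understates the overlap, since "commission" and "audit" in one list also appear inside a longer n-gram in the other.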

However, interesting patterns of periodicity (which may mimic biological semantic organisation), other than n-gram representation, were not found over the course of this study. Unfortunately, neuron-wise matching is not an integrated function of LSTMVis. I therefore plan to apply more advanced machine learning methods to uncover such relationships between different word-embedding hidden states.

If it is hypothesised that different sets of neuron weights are needed for encoding different overarching forms of semantic relationships, then idiomatic phrases are one such overarching form that may prove fruitful for study. Several idiomatic phrases were queried and the neurons activated by each match were recorded. In addition, a positive and negative sample study was performed for the idiomatic usage of “weather” to determine whether the model performs word-sense disambiguation using a differentiated set of neurons.

| Match query | Number of active neurons |
| --- | --- |
| [industry groups consistently] weather the storm [better than others] | 10 |
| break even | 9 |
| dragging its feet | 17 |
| in the black | 20 |
| in the red | 21 |
| late october weather | 14 |
| persistent dry weather | 19 |
| weather any storm | 14 |
| weather man | 11 |
| weather the downturn | 14 |
| Total | 149 |

Of the 63 neurons that had matching activations for the idiomatic match queries, only neuron 319 was found to activate highly selectively for the one instance of the phrase “dragging its feet”. Neuron 305 was selective for the idiomatic usage of “weather” and particularly selective for “weather the downturn”. A further 5 neurons (just under 8% of observations) were found to be highly selectively activated for their respective phrases. However, 6 neurons (just over 9.5% of observations) were responsive to 3 different idioms. Of the 56 neurons active for match queries containing “weather”, only 2 (neurons 210 and 305, or 3.6%) did not fully distinguish between idiomatic and conventional usage of the word “weather”. Taken together, these observations hint at the existence of idiom neurons, or neurons sensitive to idiomatic n-grams.

However, neuron 305 was particular, in that it was active in all cases, whether or not the phrase was idiomatic. This neuron is therefore likely to encode a linguistic feature unrelated to idioms.
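The selectivity counts above amount to asking, for each neuron, how many match queries it responds to. A minimal sketch of that bookkeeping is given below. The neuron ids in the `active` dictionary are invented placeholders and are not the recorded data; only the three percentage checks at the end use figures quoted in the text.

```python
from collections import Counter

# Hypothetical neuron ids per match query (placeholders, NOT the recorded data).
active = {
    "dragging its feet": {11, 12, 13},
    "weather the downturn": {12, 14},
    "weather man": {12, 13, 15},
}


def queries_per_neuron(active):
    """Count how many match queries each neuron is active for."""
    counts = Counter()
    for neurons in active.values():
        counts.update(neurons)
    return counts


def percent(part, whole):
    """Percentage rounded to 1 d.p."""
    return round(100 * part / whole, 1)


counts = queries_per_neuron(active)
selective = sorted(n for n, k in counts.items() if k == 1)
print(selective)

# Percentages quoted in the text.
print(percent(5, 63), percent(6, 63), percent(2, 56))
```

Neurons that appear under exactly one query are candidates for phrase-selective behaviour, while a neuron that appears under every query, as neuron 305 does for the “weather” queries, is more likely to encode something other than idiomaticity.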

Interpretable representations in the wild

A range of interpretable hidden state behaviours were learned by an LSTM word model in the wild (i.e. in an unsupervised way). These could be key in explaining its state-of-the-art performance.

This study first affirmed the capacity of LSTMs to learn and remember sentence chunks. Secondly, it found that LSTMs can learn to represent abstracted semantic categories which themselves overlap, though this is possibly a reflection of the underlying neural language model implementation. Finally, it provides novel empirical evidence for “idiomatic” neurons, a promising foundation for further experiments investigating how LSTMs might learn or repurpose existing representations in new contexts, and for making comparisons to biology.

Further work

To my knowledge, there is considerable research interest in developing algorithms that can learn to programme without supervision. There have been many attempts to do so, though at present the most robust attempt I have found is the Bayou application [6], which appears to be more of a predictive programming tool than an auto-programming algorithm. A next step is to generalise these observations to other large language prediction or translation datasets, and further to coding corpora; indeed, LSTMVis provides a Java Word Model dataset which is worth examining for this purpose.

Overall, this article is undergirded by greater aspirations to discover or develop higher-level, more generalisable types of semantic encoding. In a separate context, for example, a paper found grid-like dynamics in neural networks trained to perform navigational tasks, which mimic the behaviour of biological neurons in the inferior temporal area. This area was previously mentioned to be involved in semantic understanding and speech, but it is also involved in navigation, suggesting that an even more abstracted vector space-type representation may undergird both functions in humans. Such periodic dynamics constitute a method for flexibly encoding and remembering a large amount of generalised high-dimensional data. I believe strongly that the performance of AI on any given task can be improved by selectively or contextually choosing particular models that fall back on different learning rules, of which feature representation is a vital component. It is therefore my hope that future work will discover, implement, or evaluate the performance of grid-like representations in general intelligence models such as the One Model To Learn Them All [7].


[1] ‘Unsupervised Sentiment Neuron’, OpenAI Blog, 06-Apr-2017. [Online]. Available: [Accessed: 20-Nov-2018].

[2] A. Karpathy, J. Johnson, and L. Fei-Fei, ‘Visualizing and Understanding Recurrent Networks’, ArXiv150602078 Cs, Jun. 2015.

[3] M. Harciarek and A. Kertesz, ‘Primary Progressive Aphasias and Their Contribution to the Contemporary Knowledge About the Brain-Language Relationship’, Neuropsychol. Rev., vol. 21, no. 3, pp. 271–287, Sep. 2011.

[4] ‘LSTMVis’. [Online]. Available: [Accessed: 20-Nov-2018].

[5] H. Strobelt, S. Gehrmann, H. Pfister, and A. M. Rush, ‘LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks’, ArXiv160607461 Cs, Jun. 2016.

[6] ‘Bayou’. [Online]. Available: [Accessed: 20-Nov-2018].

[7] L. Kaiser et al., ‘One Model To Learn Them All’, ArXiv170605137 Cs Stat, Jun. 2017.


[1] This definition is a result of inference; I could not access the details of the original dataset beyond what was described on the LSTMVis tool online client.

The big picture

A schematic of the parahippocampal gyrus, which includes the hippocampus, and the perirhinal and entorhinal cortices. The latter areas are the proposed sites of memory consolidation in the human brain. Cells in these areas are arguably the closest biological analogues to LSTMs.