*A computational neuroscientific perspective on encoding, optimisation, free energy, and architectural principles. I review select papers presented at the **“Theory towards Brains, Machines and Minds” Workshop** held by the **RIKEN Centre for Brain Science**, 15–16 October 2019.*

#### Changelog

*Updated 21/10/19: ENet added as an implementation of an asymmetric encoder-decoder.*

## Asymmetry in the auditory cortex

Experimental methodology: calcium imaging of the mouse auditory cortex during a perceptual decision-making task

| Finding | Possible implementations in AI |
| --- | --- |
| Encoder neuron behaviour is modulated by reward expectation and stimulus probability, while decoder neuron behaviour is not. The activations driving output behaviour either run parallel to decoder processing, or are modulated in downstream areas. | Reinforcement learning might benefit from asymmetric fine-tuning, where only encoder weights are fine-tuned. Alternatively, we might mimic this with skip connections: one branch would connect the encoder directly to output modules, in effect skipping the decoder, while another would connect the encoder to output modules through the decoder. |
| A relatively larger number of encoder neurons than decoder neurons participate in the task. | Asymmetric encoder-decoder architectures, where encoder layer size or depth exceeds that of the decoder. Found in ENet, a real-time semantic segmentation algorithm. |
| Encoder neurons vary their thresholds more than decoder neurons. Decoder neuron behaviour therefore more closely matches the prediction of the constant synaptic weights hypothesis (stable weights) than that of the dynamic synaptic weights hypothesis (frequently changing weights). | Differentiated learning rates, where encoders learn faster than decoders. |
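The architectural ideas in the table can be sketched as a tiny numpy forward pass: a wide encoder feeds both a narrower decoder and a direct skip path to the output, and the encoder is assigned a larger learning rate. All layer sizes, names, and rates here are my own illustrative assumptions, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Asymmetric sizes: the encoder is wider than the decoder, mirroring the
# larger encoder population observed in auditory cortex.
IN_DIM, ENC_DIM, DEC_DIM, OUT_DIM = 8, 64, 16, 4

W_enc = rng.normal(scale=0.1, size=(ENC_DIM, IN_DIM))
W_dec = rng.normal(scale=0.1, size=(DEC_DIM, ENC_DIM))
W_out_dec = rng.normal(scale=0.1, size=(OUT_DIM, DEC_DIM))
W_out_skip = rng.normal(scale=0.1, size=(OUT_DIM, ENC_DIM))  # encoder -> output skip

# Differentiated learning rates: encoder weights would adapt faster.
LR_ENC, LR_DEC = 1e-2, 1e-3

def forward(x):
    h_enc = np.tanh(W_enc @ x)
    h_dec = np.tanh(W_dec @ h_enc)
    # The output combines the decoder branch with a direct skip branch,
    # so encoder activity can drive behaviour in parallel to decoding.
    return W_out_dec @ h_dec + W_out_skip @ h_enc

y = forward(rng.normal(size=IN_DIM))
print(y.shape)  # (4,)
```

During fine-tuning, one would update `W_enc` with `LR_ENC` and freeze or slowly update the decoder weights, approximating the asymmetric plasticity reported in the experiment.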

Reference: Neural encoding and decoding of auditory cortex during perceptual decision making (Seminar by Dr. Akihiro Funamizu, 2019)

## Natural language automatons

Experimental methodology: comparison of artificial automaton models against human MEG and ECoG recordings from the left fusiform cortex during a visual word task

| Finding | Possible implementations in AI |
| --- | --- |
| Many studies in the field test the biological plausibility of artificial neural networks. However, few aim to design neural networks from biological principles. | Be Promethean. |
| The Probabilistic Context-Free Grammar, the most complex automaton model tested, best reproduced human patterns of behaviour. An expanded study covering deep learning models might situate human, and thus presumably optimal, complexity within the entire landscape of natural language models. | Testing and designing for optimal complexity in neural networks for NLP. |
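To make the model class concrete, a probabilistic context-free grammar can be sampled in a few lines of Python. The toy grammar below is entirely invented for illustration and has nothing to do with the speaker's evaluated grammars.

```python
import random

# A toy PCFG: each nonterminal maps to (probability, expansion) pairs.
# Lowercase symbols are terminals; the rules are purely illustrative.
PCFG = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["det", "noun"]), (0.3, ["noun"])],
    "VP": [(0.6, ["verb", "NP"]), (0.4, ["verb"])],
}

def sample(symbol="S", rng=None):
    """Recursively expand a symbol into a list of terminal tokens."""
    if rng is None:
        rng = random.Random(0)
    if symbol not in PCFG:  # terminal symbol
        return [symbol]
    r, acc = rng.random(), 0.0
    for p, expansion in PCFG[symbol]:
        acc += p
        if r <= acc:
            return [tok for s in expansion for tok in sample(s, rng)]
    # Fall back to the last rule in case of floating-point underrun.
    return [tok for s in PCFG[symbol][-1][1] for tok in sample(s, rng)]

print(" ".join(sample()))
```

A study of the kind proposed would compare string probabilities under such a grammar (and under richer deep models) against human neural and behavioural responses.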

Reference: Construction and evaluation of neurocomputational models of natural language (Dr. Yohei Oseki, 2019)

## Network optimisation in biology

Are humans special because our brains are tuned for learning? We explore what kinds of biological hyperparameters might be tuned at birth.

### Hardwire chaos for musculoskeletal learning

Experimental methodology: computer modelling of foetal General Movement generation, computer modelling of artificial snake movement using spiking neural networks with small-world connectivity

| Finding | Possible implementations in AI |
| --- | --- |
| Musculoskeletal and cognitive disorders are correlated with hypoconnectivity. | Initialisation with the right connectivity provides an important bias for learning, and disorders or aberrant behaviour in humans can clarify important hyperparameters for general intelligence and learning. |
| Chaotic oscillators can produce convergent, emergent movement behaviour. Spontaneous activity generates transitions between attractor states. | Randomness might be important in avoiding local minima and saddle points. |
| Low-complexity networks converge to a higher proportion of excitatory-to-inhibitory synapses, and a lower proportion of excitatory-to-excitatory synapses, than high-complexity networks (where complexity is said to be measured in terms of “entropy”). | For larger, deeper networks, the assumption of feedforward-only connections is more reasonable in terms of biological plausibility. |
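The role of randomness in escaping critical points can be shown on a one-dimensional toy problem (my own example, not from the talk): gradient descent started exactly at a flat critical point never moves, while a small noise injection kicks it into a nearby minimum.

```python
import numpy as np

# f(x) = x**4 - x**2 has a flat critical point at x = 0 (a local maximum)
# and minima at x = ±1/sqrt(2) ≈ ±0.707.
def grad(x):
    return 4 * x**3 - 2 * x

def descend(x, steps=2000, lr=0.01, noise=0.0, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        x -= lr * (grad(x) + noise * rng.normal())
    return x

print(descend(0.0))             # gradient is exactly zero: stays at 0.0
print(descend(0.0, noise=0.1))  # noise kicks it toward a minimum near ±0.707
```

This mirrors the biological picture in which spontaneous activity drives transitions between attractor states rather than leaving the system stuck.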

Reference: Development of complex behaviors and brain activities: A constructivist approach (Dr. Hiroki Mori, 2019)

### Time- and data-efficiency in olfactory perceptual learning

Experimental methodology: analytical study of mammalian and insect olfactory circuit network sizes, and computer modelling of a 3-layer olfactory perceptual learning network

| Finding | Possible implementation in AI |
| --- | --- |
| Scaling laws between input and hidden cell layer sizes are conserved across mammals and insects (at different exponents). | There is potential for layer size optimisation given specific assumptions, including the use of Bayesian inference, Gaussian input, fixed input size, and energy constraints. An optimal layer size exists due to bias-variance trade-offs in the effectively single-layer feedforward network. However, the speaker referenced a seminal paper on how bias-variance trade-offs are circumvented by neural networks (Belkin et al. 2019). |
| The biological error rate is not optimal, owing to correlations between neuron activity across days. | In multi-task training, the task sequence is likely to be important for optimal output. Separately, naturally occurring stimuli and tasks might be efficiently coded in this manner. |
| Synaptic plasticity can be a substrate for Bayesian optimisation. “It is always better to be Bayesian if possible.” | Machine learning techniques with equivalence to Bayesian inference are biologically plausible. Note nonetheless that Bayesian optimisation is a relatively general technique. |
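The existence of an optimal layer size can be illustrated with a stylised error model (my own assumption, not the speaker's derivation): if approximation (bias) error falls as a/m with hidden-layer size m while estimation (variance) error grows as b·m, the total error is minimised at m* = √(a/b).

```python
import math

def total_error(m, a=100.0, b=0.04):
    """Stylised bias-variance decomposition for hidden-layer size m:
    approximation (bias) term a/m plus estimation (variance) term b*m."""
    return a / m + b * m

m_star = math.sqrt(100.0 / 0.04)           # analytic optimum: 50.0
best = min(range(1, 201), key=total_error)  # numerical check over integer sizes
print(m_star, best)  # 50.0 50
```

The double-descent results in Belkin et al. (2019) complicate this picture for overparameterised networks, which is presumably why the speaker flagged that reference.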

Reference: Micro- and macroscopic neural architectures for data-efficient learning (Naoki Hiratani, 2019)

## Free Energy

I was unexpectedly blessed with a short presentation of the Free Energy “proof” by Takuya Isomura, who collaborated with Karl Friston on cost function characterisation.

| Finding | Possible implementation in AI |
| --- | --- |
| Given a discrete state space, any neural network minimising its cost function is performing variational Bayesian inference: there is a mathematical equivalence between the cost function of a neural network and free energy in Bayesian inference. Given a continuous state space, a biologically plausible algorithm called the error-gated Hebbian rule (EGHR) can perform independent component analysis and principal component analysis. GABAergic spike-timing-dependent plasticity might be the neural substrate for this type of learning. | Free energy based on variational Bayes provides evidence for plausible equivalence of machine learning techniques to biological learning methods. |
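As a small, well-known relative of such local learning rules, Oja's rule (a Hebbian update with multiplicative decay, substituted here in place of EGHR, which I do not attempt to reproduce) extracts the leading principal component of its input stream:

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean Gaussian data; the covariance [[3, 2], [2, 3]] has its top
# eigenvector along (1, 1)/sqrt(2) with eigenvalue 5.
C = np.array([[3.0, 2.0], [2.0, 3.0]])
X = rng.multivariate_normal([0.0, 0.0], C, size=5000)

w = rng.normal(size=2)
lr = 0.01
for x in X:
    y = w @ x
    w += lr * y * (x - y * w)  # Oja's rule: Hebbian term plus decay

top = np.array([1.0, 1.0]) / np.sqrt(2)
print(abs(w @ top))  # approaches 1: w aligns with the unit top eigenvector
```

The appeal of such rules in this context is that each update uses only locally available quantities (pre- and post-synaptic activity), which is what makes the claimed equivalence to Bayesian or component-analysis methods biologically interesting.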

Reference: Optimization principles for biological neural networks (Takuya Isomura, 2019)
