Growing Brains: Artificial Embryogeny

Johanna Appel · Published in Towards Data Science · 14 min read
March 24, 2022
Chemical signaling: Cell gets input B, produces A, produces C, which enable production of much more B, which enables production of D which disables production of B
Graphic by author

The path to artificial general intelligence is still not understood very well. Today’s mainstream approach of designing networks by hand and training them via backprop is good for very specific tasks. What it doesn’t do very well, however, is act as a petri dish for a more universal kind of intelligence. Manually designing brains, after all, is ultimately limited by our understanding of the structures they consist of. So we as the designers are the bottleneck.

In past articles, I have looked into different learning techniques that remove this bottleneck by adapting the network topology automatically. An example of this is the deep-dive into the Cascade-Correlation learning architecture, which iteratively adds and trains new neurons as part of its learning algorithm.

In my last article, I explored another technique called Neuroevolution, a stochastic optimization approach based on evolutionary principles copied from nature. This approach is particularly interesting in the context of creating an artificial general intelligence (AGI) because the same principles created us, including our brains. In other words, to date, evolution is the only process known to have produced general intelligence. Yet, how it works is quite simple:

  1. Encode your network in some genetic encoding
  2. Create a set of randomized genomes (i.e. randomized nets)
  3. Express the genes (i.e. re-create the actual network and/or its parameters) and test it on the target problem
  4. Select some of the best performing individuals
  5. Mix them up and introduce changes to their genes (random to a certain degree), creating a new set of genomes, and re-iterate the whole process from step 3
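The five steps above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the genome is a flat list of floats, `toy_fitness` stands in for "test it on the target problem", and all names and parameter values (`evolve`, `n_parents`, `mutation_scale`, ...) are made up for this example.

```python
import random

def evolve(fitness, genome_len=8, pop_size=20, n_parents=5,
           mutation_scale=0.1, generations=50, seed=0):
    """Toy evolutionary loop following the five steps above."""
    rng = random.Random(seed)
    # Steps 1 + 2: a genome is a flat list of floats; start with random genomes.
    population = [[rng.uniform(-1, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Step 3: express and test each genome (here the genome is scored directly).
        ranked = sorted(population, key=fitness, reverse=True)
        # Step 4: select the best-performing individuals.
        parents = ranked[:n_parents]
        # Step 5: recombine (one-point crossover) and mutate (Gaussian noise).
        children = []
        while len(children) < pop_size - n_parents:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)
            child = [g + rng.gauss(0, mutation_scale) for g in a[:cut] + b[cut:]]
            children.append(child)
        # Parents survive unchanged, so the best solution is never lost.
        population = parents + children
    return max(population, key=fitness)

def toy_fitness(genome):
    # Hypothetical stand-in problem: every gene should be close to 0.5.
    return -sum((g - 0.5) ** 2 for g in genome)

best = evolve(toy_fitness)
```

Keeping the parents in the next generation is one simple way to make the process "stable", while `mutation_scale` controls how exploratory it is.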

Generally speaking, we want the evolutionary process to be stable and exploitative (i.e. slow and local) enough not to lose good solutions again randomly in the process, but also exploratory enough to find good solutions in few generations (as fast as possible).

This can be influenced in part by how the selection of the ‘fittest’ happens, by the algorithms used to introduce random changes to the genome, and, crucially, by the genetic encoding.

From neuroevolution to artificial embryogeny

From what we have looked at in the neuroevolution article, one baffling aspect stood out: As Stanley et al. mention in [1], the genetic encoding can bias evolution in unpredictable ways if it is done indirectly.

Indirectly here means that it doesn’t directly specify which neurons are connected and how strongly, etc. but instead only gives compact instructions on how to grow or construct the network from a given seed according to given rules. Which is very similar, in fact, to how human brains develop: From a single cell to a complex network structure.

Bias means that how the network is grown could either increase or decrease the chance of stumbling upon a well-optimized (more intelligent) network in the process of evolution.

This is reason enough for researchers to explore this space of artificial embryogeny (or artificial development), so let’s use this article to understand how ‘brains’ can potentially be grown more literally than you might have expected.

NB: To follow this article, you should understand what artificial neural nets are. I would also recommend skimming the article on Neuroevolution to understand the foundations before diving even deeper.

Why bother with indirect encodings?

As mentioned, the research on the effects of indirect genetic encodings is relatively thin. Besides Stanley’s taxonomy of the field of artificial embryogeny from 2003 [2], I could not really find a comprehensive study on how different aspects of indirect genetic encodings affect or bias the evolutionary processes. If there is no conclusive research in this area, then this means we do not yet fully understand these aspects and we do not have a theory that can be tested. Which raises the question: Why bother with this at all?

I mentioned already that evolution needs to have a balance between exploration and exploitation to be a viable optimization method. As far as I can see, an indirect encoding could influence this beneficially in the following ways:

  1. Search Space Reduction: It can reduce the number of possible network architectures that can or need to be tested.
  2. Modularity: It could find useful modular network structures and replicate them or exploit symmetrical or fractal (self-similar) structures to preserve good solutions and further reduce search space.
  3. Genetic Operator Effectiveness: It can influence what types of recombination & mutation operations are possible and therefore indirectly influence exploration.

Search Space Reduction

Let’s have a short look back to the neuroevolution article, where we discussed that genetic encodings are a different representation (the genotype) of the actual neural networks (the phenotype). In the case of indirect genetic encodings, the set of all possible genotypes (the genotype space) is a compaction of the space of all possible phenotypes. In plain(er) English: There should be fewer possible construction plans for the neural nets than there are theoretically possible neural nets.

Indirect and direct genetic encoding map to a phenotype space. Indirect encodings are usually a much smaller search space than direct encodings. Graphic by author

So if we regard the evolutionary processes as a stochastic search for an optimal network, it is faster to search a smaller genotype space than to search a bigger one for an optimal solution.
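To get a feeling for the numbers, here is a back-of-the-envelope sketch. It assumes (my simplification, not a claim about any specific method) that a direct encoding stores one bit per possible connection between n neurons, while an indirect encoding is a fixed-length bit string; the function names and the 24-bit genome are hypothetical.

```python
def direct_genotypes(n_neurons):
    # One bit per entry of the n-by-n adjacency matrix.
    return 2 ** (n_neurons * n_neurons)

def indirect_genotypes(genome_bits):
    # A fixed-length binary "construction plan".
    return 2 ** genome_bits

n = 8
ratio = direct_genotypes(n) // indirect_genotypes(24)
# For 8 neurons, the direct space holds 2**64 topologies, while a
# 24-bit construction plan can only address 2**24 of them.
```

The flip side is visible in the same numbers: in this toy setting, the vast majority of topologies can never be reached by the smaller genome.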

However, if the search space is reduced it means we disregard some solutions that are genetically impossible. Solutions that could, in theory, be optimal. Here I guess we come very close to the problem of choosing the encoding in such a way that we don’t accidentally lose good network configurations on average.

Modularity

One look at nature tells you that its solutions are highly regular. Most living beings of a certain complexity have some sort of symmetrical aspects to them, or repeating structures. Looking at the brain, one of the fundamental building blocks of our neocortex (the place where the general problem solving happens) is a cortical column. A module that is repeated a couple million times in the brain [3].

A schematic of the layers of a neocortical column. Source: Wikipedia, Henry Vandyke Carter, Public domain, via Wikimedia Commons

Interestingly, the drive for this kind of solution probably shaped our sense of aesthetics in large part. Human faces that are more symmetrical are also perceived to be more attractive [4].

All this at least hints that it is more effective to build complexity by evolving and arranging smaller parts than by evolving the whole system at once. This should come as no surprise: since evolution is essentially random, we are dealing with statistics. A complex, irregular system that cannot be built up slowly is just extremely unlikely to evolve.

Genetic Operator Effectiveness

Genetic operators, essentially, are just a stick to stir the waters of evolution. How the stirring is done and what mixing it entails are parameters that will affect our exploration and exploitation of the genotype space.

To find a balance then, these operators need to act within bounds. Totally random changes can actually do more harm than good in that way. Their definition as well as the genetic encoding can influence this.
Specifically, we want to find an encoding where small changes in the genotype also only lead to small changes in the phenotype. This way, it becomes easier to control the degree of change and therefore the speed with which new solutions are explored.

With direct encodings of small networks, this comes for free. By their very nature, they represent small changes in the genes directly as small changes in the network.
However, as we increase the complexity, these small changes lead to diminishing returns and we would have to create more and more unstructured mutations to really have an exploratory effect — which will most likely lead to broken or ineffective networks.

In indirectly encoded networks this property is harder to achieve. But if it is done well, it can enable or disable whole features of genetic ‘modules’, and let recombination more likely combine useful network structures of parents. This preserves useful features and accelerates evolution.

A really great encoding would enable us to have this property adjust to the level of complexity it creates: Small genetic changes lead to small changes in simple networks, but they ideally also lead to safe and modular changes in more complex networks.

Artificial Embryogeny

In the title and in the introduction I mentioned ‘artificial embryogeny’ as the overarching topic. A term that was coined by Stanley in the Taxonomy of the field in [2]. What is that exactly, and what does it have to do with evolution and indirect genetic encodings?

To put it simply: an indirect genetic encoding implies that we have to do something with it to create the actual neural network out of it. This is what I mean when I say ‘growing brains’. The genetic encoding is just a seed or building plan, but then the actual construction still needs to happen. In humans, this process of early development is called embryogeny. And since we’re doing something with a similar purpose in silico, we call it artificial embryogeny. Welcome to the world of science :)

Babies usually don’t grow just from a strand of DNA, they also need the infrastructure (the fertilized egg cell) to develop. We face a similar dilemma. It is not enough to design a great genetic encoding that biases evolution in favorable ways. We also need to create the machinery to execute it.

Side note: Most fascinatingly, in nature, RNA is not only a building plan but can also be a functional molecule. Ribosomes, the molecular machines that translate RNA into proteins, are themselves built largely from specialized RNA (ribosomal RNA). For exactly this reason, there is a whole theory, the ‘RNA world’ hypothesis, that RNA might have been the first self-replicating molecule and the origin of all life.

Overview of the Approaches

So let’s look at the methods scientists have devised to grow brains in silico. I’ll partially base this short overview on the aforementioned taxonomy [2], but since it was created in 2003, more ideas have been developed in the meantime. From that paper, Stanley et al. classified the research up till 2003 roughly in two directions:

Top-down: Ideas that start from an abstract theoretical model (e.g. formal grammars) and use this to generate structures that can be interpreted as a neural network.

Bottom-up: Ideas that simulate the cell chemistry and cell interactions to achieve neural cell growth and network structures closer to what is happening in a real biological system.

Both these approaches require a development over time, i.e. they iterate and ‘unpack’ the genetic information step-by-step to finally create the network. However, this more tedious approach is apparently not necessary — the translation can also happen in one step:

Time-independent: Since 2003, Stanley et al. have developed a time independent approach — ‘compositional pattern producing networks’ [5] — that uses the ideas from nature but gets to a result in one go.

Top-down

Many seemingly complex patterns in nature can be replicated with surprisingly simple rules. For example, in 1968, Lindenmayer devised a formal grammar now known as L-Systems to approximate the growth of trees and weeds using a few simple rules. [6]

Weed grown from a simple grammar-based rule. Source: Wikipedia, Sakurambo, CC BY-SA 3.0, via Wikimedia Commons

Grammars like this are basically rules for symbol replacement.

Take for example a simple rule set like:

S → AB
A → aS
B → b

Starting with S, we would replace that with AB, which we then would replace by aSb, which we then would replace by aABb, aaSbb, and so on, slowly growing the sequence of symbols.
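The replacement process can be sketched in a few lines of Python, using the three rules S → AB, A → aS and B → b. The helper names (`rewrite`, `derive`) are my own, and I assume that all symbols are replaced in parallel in each step, as in the derivation described above.

```python
RULES = {"S": "AB", "A": "aS", "B": "b"}

def rewrite(sequence, rules):
    """Replace every symbol that has a rule; keep all other symbols as they are."""
    return "".join(rules.get(symbol, symbol) for symbol in sequence)

def derive(axiom, rules, steps):
    """Return the start symbol followed by the result of each rewriting step."""
    history = [axiom]
    for _ in range(steps):
        history.append(rewrite(history[-1], rules))
    return history

print(derive("S", RULES, 4))
# ['S', 'AB', 'aSb', 'aABb', 'aaSbb']
```

Since S is always re-introduced by the rule for A, this particular rule set never terminates on its own; the number of steps has to be chosen from outside.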

The surprising thing about this approach is that the above weed has been generated from this rule set only:

X → F+[[X]-X]-F[-FX]+X
F → FF

Where F means grow a bit forward in the current direction and ‘+’ and ‘-’ mean tilt the growing direction a bit by +/- 25°. The brackets are for new branches and the X is ignored when actually generating the weed.
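As a rough sketch of how such a rule set turns into a shape, the following Python snippet expands the two weed rules (X → F+[[X]-X]-F[-FX]+X and F → FF) and interprets the result as line segments. Starting from X, the step length, the upward start direction and the turning convention for ‘+’ and ‘-’ are assumptions on my side; the function names are made up for this example.

```python
import math

PLANT_RULES = {"X": "F+[[X]-X]-F[-FX]+X", "F": "FF"}

def expand(axiom, rules, iterations):
    """Apply the rewriting rules to all symbols in parallel, several times."""
    sequence = axiom
    for _ in range(iterations):
        sequence = "".join(rules.get(symbol, symbol) for symbol in sequence)
    return sequence

def to_segments(instructions, angle_deg=25.0, step=1.0):
    """Interpret the symbols as growth commands and collect line segments."""
    x, y, heading = 0.0, 0.0, 90.0  # assumed start: origin, pointing upwards
    stack, segments = [], []
    for symbol in instructions:
        if symbol == "F":    # grow forward
            nx = x + step * math.cos(math.radians(heading))
            ny = y + step * math.sin(math.radians(heading))
            segments.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif symbol == "+":  # tilt one way
            heading += angle_deg
        elif symbol == "-":  # tilt the other way
            heading -= angle_deg
        elif symbol == "[":  # start a branch: remember position and direction
            stack.append((x, y, heading))
        elif symbol == "]":  # end a branch: return to where it started
            x, y, heading = stack.pop()
        # "X" only drives the rewriting and produces no growth
    return segments

segments = to_segments(expand("X", PLANT_RULES, 5))
```

The resulting segments could be handed to any plotting library. Note how short the ‘genome’ (two rules) is compared to the number of segments it produces.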

Rules could also be parameterized, e.g.:

S(1) → a
S(n) → aS(n-1)

Which would, unlike the rules above, terminate after n steps when starting with a fixed S(n), producing a pattern of aaa… (n times a). E.g. S(2) would produce ‘aa’.
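A minimal sketch of these two parameterized rules, with a function name of my own choosing, shows why they terminate: the parameter counts down until the rule S(1) → a applies.

```python
def expand_parametric(n):
    """Apply S(n) -> a S(n-1) until S(1) -> a ends the derivation."""
    if n < 1:
        raise ValueError("n must be at least 1")
    output = ""
    while n > 1:
        output += "a"    # S(n) -> a S(n-1)
        n -= 1
    return output + "a"  # S(1) -> a

print(expand_parametric(2))
# aa
```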

Using this as a basis, all we need is to find a mapping from some sequence of symbols to a neural network. Kitano did this in 1990 [7] by iteratively expanding 2-by-2 adjacency matrices, which then signify the connections of the network.
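To illustrate the general idea of growing an adjacency matrix from 2-by-2 replacement rules, here is a small sketch. The concrete rules are invented for this example and are not taken from [7]; I also assume that an entry of 1 at row i, column j means a connection from neuron i to neuron j.

```python
# Hypothetical rule set: every symbol is replaced by a 2-by-2 block.
MATRIX_RULES = {
    "S": [["A", "B"], ["C", "A"]],
    "A": [["0", "1"], ["0", "0"]],
    "B": [["1", "1"], ["0", "1"]],
    "C": [["0", "0"], ["0", "0"]],
}

def expand_matrix(matrix, rules):
    """Replace every symbol with its 2-by-2 block, doubling both dimensions."""
    size = len(matrix)
    grown = [[None] * (2 * size) for _ in range(2 * size)]
    for r in range(size):
        for c in range(size):
            block = rules[matrix[r][c]]
            for dr in range(2):
                for dc in range(2):
                    grown[2 * r + dr][2 * c + dc] = block[dr][dc]
    return grown

def grow_adjacency(rules, axiom="S", steps=2):
    """Grow from a single symbol; after `steps` expansions only 0/1 must remain."""
    matrix = [[axiom]]
    for _ in range(steps):
        matrix = expand_matrix(matrix, rules)
    return [[int(entry) for entry in row] for row in matrix]

for row in grow_adjacency(MATRIX_RULES):
    print(row)
# [0, 1, 1, 1]
# [0, 0, 0, 1]
# [0, 0, 0, 1]
# [0, 0, 0, 0]
```

With these made-up rules, four symbols are enough to describe a small feed-forward network of four neurons, and the block A is reused twice.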

If using such an encoding, the growth of the network happens by generating sequences of symbols iteratively and then translating the result into some usable network topology and weight matrix format. There is, of course, a lot of room for possible versions of this, which means a lot of trial and error before we get to a good variant. My humble opinion here is to first aim for grammars and rule sets that can (in principle) generate structures similar to what we currently understand the brain to look like.

Formal grammars, by the way, are not the only thinkable top-down solution here. There will also be other generative systems that produce patterns of data from simple initial instructions — see e.g. pseudorandom number generators which can create the exact same seemingly random sequence of numbers if given the same seed.
Although admittedly, pseudorandom number generators are probably a very poor choice for such an approach, because they specifically break with the idea of ‘small change in genotype space leads to small change in phenotype space’. If one changes the seed only slightly, one should expect a very different sequence of numbers.
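This effect is easy to check with Python’s built-in `random` module. In this little sketch (the function name and the seeds 42 and 43 are arbitrary choices of mine), the seed plays the role of the genotype and the generated numbers play the role of the phenotype.

```python
import random

def phenotype_from_seed(seed, length=5):
    """Treat the seed as genotype and the generated numbers as phenotype."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(length)]

# The same seed always reproduces the same sequence ...
same = phenotype_from_seed(42) == phenotype_from_seed(42)
# ... but a neighbouring seed produces an unrelated one.
different = phenotype_from_seed(42) != phenotype_from_seed(43)
```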

Bottom-up

Cells in our bodies rely on complicated inner mechanisms. For outer alignment though, navigation relies on chemical signaling and gradients.

For axons — the ‘long’ part of neurons that passes the impulse to the next neuron — chemical gradients can serve as a beacon to guide the growth. They find their way from eyes to the neocortex by pulling towards or pushing away from special signal chemicals [8].

Growth cone of an axon pulling away from a negative stimulus and toward a positive stimulus. Source: Wikipedia, Chris1387, CC BY-SA 3.0, via Wikimedia Commons

These principles have been implemented e.g. in ‘genetic regulatory networks’ by Dellaert et al. to varying degrees of detail [9]. Cells are aligned on a two-dimensional grid and can grow and interact based on that topology.

As Stanley describes in [2], the mechanism of signaling and acting based on signals can for example be encoded in logic statements. A cell can receive values through specific channels (similar to receptors) from other cells or the environment. This then triggers the inner logic of that cell to produce temporary effects.

Chemical signaling: Cell gets input B, produces A, produces C, which enable production of much more B, which enables production of D which disables production of B. Graphic by author
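The signaling chain from the graphic can be sketched as a handful of logic statements. This is a toy model under my own assumptions, not the model of any of the cited papers: chemical levels are small integers, all rules are evaluated at the same time, and the thresholds and amounts are invented.

```python
def step(cell, external_b=1):
    """One synchronous update of the chemical levels inside a single cell."""
    # A and C together enable extra production of B, unless D is present.
    boosted = cell["A"] > 0 and cell["C"] > 0 and cell["D"] == 0
    return {
        "A": 1 if cell["B"] > 0 else 0,          # B triggers production of A
        "C": 1 if cell["B"] > 0 else 0,          # B triggers production of C
        "B": external_b + (3 if boosted else 0),  # input B plus self-produced B
        "D": 1 if cell["B"] >= 3 else 0,         # a lot of B triggers D
    }

def simulate(steps):
    """Return the level of B over time, starting with only the input B."""
    cell = {"A": 0, "B": 1, "C": 0, "D": 0}
    history = [cell["B"]]
    for _ in range(steps):
        cell = step(cell)
        history.append(cell["B"])
    return history

print(simulate(9))
# [1, 1, 4, 4, 1, 1, 4, 4, 1, 1]
```

Even this tiny rule set shows a temporary effect: the level of B rises, is shut down by D and rises again once D has disappeared.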

So, the overall idea is simple: Copy nature on a level that is still feasible. Through this kind of bottom-up simulation of cell growth, axon navigation and connection, we can create neural net architectures by analogous means to how biology does it.

Time-Independent

When the taxonomy paper was published in 2003 [2], the articles discussed in it had one thing in common: They were all growing structures over time, i.e. iteratively, until they reached a defined end state to be usable.

However, in 2007 Stanley (yes, that guy again) introduced a new idea, ‘compositional pattern producing networks’ (CPPN) [5]. This concept is based on function composition the way neural networks do it, but puts an emphasis on a variety of activation functions (e.g. sine, Gaussian, and abs/norm in addition to standard ones like ReLU) that, when combined, can produce intricate patterns.

A simplified example architecture of a compositional pattern producing network with a variety of activation functions. Graphic by author.

The idea stems from the observation that in natural embryogeny, cells arrange themselves according to chemical gradients, and these are often produced in symmetrical or other patterns that enable our bodies to achieve more complexity with less additional data that needs to be encoded. Look at it this way: If your genes ‘figured out’ how to put a hand with fingers on one arm, they can reuse the same basic pattern (mirrored) for the other arm. The CPPN allows encoding these patterns directly to approximate arbitrary shapes.
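To make this a bit more tangible, here is a hand-wired sketch of such a function composition. In an actual CPPN, the topology and weights would be evolved; the nodes, weights and function names below are simply picked by me to show how the choice of activation functions produces symmetry and repetition.

```python
import math

def gaussian(x):
    return math.exp(-x * x)

def cppn(x, y):
    """Hand-wired composition of nodes with different activation functions."""
    h1 = gaussian(2.0 * x)  # symmetric bump around x = 0
    h2 = math.sin(3.0 * y)  # repetition along y
    h3 = abs(x)             # mirror symmetry in x
    return math.tanh(1.5 * h1 + 0.8 * h2 - 1.0 * h3)

def render(size=9):
    """Evaluate the network once per grid point; no iteration over time."""
    coords = [2.0 * i / (size - 1) - 1.0 for i in range(size)]
    return [[cppn(x, y) for x in coords] for y in coords]

pattern = render()
```

Because x only enters through the Gaussian and abs nodes, the left half of every row mirrors the right half, much like the hand that is reused for the other arm.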

To use it for neural architectures, neurons are placed in a two-dimensional grid and connections (and weights) between them are directly calculated by passing their coordinates to the CPPN. [10]
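A minimal sketch of this coordinate-based querying could look as follows. The function `weight_cppn` is a fixed, hypothetical stand-in for an evolved CPPN, and dropping connections whose weight falls below a `threshold` is an assumption of this example rather than a detail taken from the text above.

```python
import math
from itertools import product

def weight_cppn(x1, y1, x2, y2):
    """Hypothetical fixed CPPN that takes the coordinates of two neurons."""
    distance = math.hypot(x2 - x1, y2 - y1)
    return math.sin(2.0 * (x1 + x2)) * math.exp(-distance * distance)

def build_connections(grid_size=3, threshold=0.2):
    """Place neurons on a 2D grid and query the CPPN for every neuron pair."""
    coords = [2.0 * i / (grid_size - 1) - 1.0 for i in range(grid_size)]
    neurons = list(product(coords, coords))
    weights = {}
    for src, dst in product(neurons, neurons):
        if src == dst:
            continue  # no self-connections in this sketch
        w = weight_cppn(*src, *dst)
        if abs(w) > threshold:  # weak outputs mean "no connection"
            weights[(src, dst)] = w
    return neurons, weights

neurons, weights = build_connections()
```

The whole connectivity is computed in one pass over the neuron pairs, and the same small function could be queried on a denser grid to obtain a larger network with the same overall pattern.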

Building on that, even more recent approaches have made use of back-propagation to train the CPPN — to fully close the circle. [11]

Uses & Limits

In neuroevolutionary approaches, generally speaking, solutions that can actually exploit symmetries and copies of network features have a huge potential for generating architectures that approach brain structures. As far as we know, the power of the human neocortex also stems from a simple building block, a cortical column (or a variant thereof), that is being replicated many, many times. [3] So it may well be that training an ever evolving network on selected, hard problems will some day lead to it showing aspects of general intelligence. After all, this is more or less how we as humanity got to where we are now, biologically speaking.

However, as some scientists in the aforementioned papers already had to find out, following nature closely is tricky. Things tend to get complex very quickly and reasoning about the workings of evolved networks becomes very hard — even harder still, if the genetic encoding does not directly map to the later network architecture.

For all this, artificial embryogeny remains a field of active research. Time will tell whether or not it can provide the necessary tools to crack AGI — a reason for me, personally, to follow the developments here closely.

All finished source documents, notebooks and code related to this article are also available on GitHub. Please feel encouraged to leave feedback and suggest improvements.


References

[1] K. O. Stanley, R. Miikkulainen, “Evolving Neural Networks through Augmenting Topologies” (2002), Evolutionary Computation 10 (2): 99–127

[2] K. O. Stanley, R. Miikkulainen, “A Taxonomy for Artificial Embryogeny” (2003), Artificial Life 9 (2): 93–130

[3] J. Hawkins, “A Thousand Brains: A New Theory of Intelligence” (2021), Basic Books

[4] K. Grammer, R. Thornhill, “Human (Homo sapiens) facial attractiveness and sexual selection: the role of symmetry and averageness” (1994), Journal of comparative psychology 108 (3): 233–42.

[5] K. O. Stanley, “Compositional pattern producing networks: A novel abstraction of development” (2007), Genetic programming and evolvable machines 8 (2): 131–162

[6] A. Lindenmayer, “Mathematical models for cellular interactions in development II. Simple and branching filaments with two-sided inputs” (1968), Journal of Theoretical Biology 18 (3): 300–315

[7] H. Kitano, “Designing neural networks using genetic algorithms with graph generation system.” (1990), Complex systems 4: 461–476

[8] C. Holt, “Wiring up the brain: How axons navigate” (2017), Royal Society Ferrier Prize Lecture 2017

[9] F. Dellaert, R. D. Beer, “Co-evolving body and brain in autonomous agents using a developmental model” (1994), (Tech. Rep. CES-94–16) Cleveland, OH: Dept. of Computer Engineering and Science, Case Western Reserve University

[10] K. O. Stanley, D. B. D’Ambrosio, J. Gauci, “A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks” (2009), Artificial Life 15 (2): 185–212

[11] C. Fernando et al., “Convolution by evolution: Differentiable pattern producing networks” (2016), Proceedings of the Genetic and Evolutionary Computation Conference 2016
