# Rant about Jeff Hawkins “the sensory-motor model of the world is a learned representation of the thing itself”

I very much enjoyed the presentation given by Jeff Hawkins, “Computing like the brain: the path to machine intelligence”.

Around 8:40

This is something which could be read in parallel with the passage I commented on in the post “The front end visual system performs like a distributed GLC computation”.

I reproduce some parts

In the article by Kappers, A.M.L.; Koenderink, J.J.; Doorn, A.J. van, Basic Research Series (1992), pp. 1 – 23,

Local Operations: The Embodiment of Geometry

the authors introduce the notion of the “Front End Visual System”.

Let’s pass to the main part of interest: what does the front end do? Quotes from section 1, indexed by me with (a), … (e):

• (a) the front end is a “machine” in the sense of a syntactical transformer (or “signal processor”)
• (b) there is no semantics (reference to the environment of the agent). The front end merely processes structure
• (c) the front end is precategorical,  thus – in a way – the front end does not compute anything
• (d) the front end operates in a bottom up fashion. Top down commands based upon semantical interpretations are not considered to be part of the front end proper
• (e) the front end is a deterministic machine […]  all output depends causally on the (total) input from the immediate past.

Of course, today I would say “performs like a distributed chemlambda computation”, according to one of the strategies described here.

Around 17:00 (Sparse distributed representation): “You have to think about a neuron being a bit (active: a one, non active: a zero). You have to have many thousands before you have anything interesting [what about C. elegans?]

Each bit has semantic meaning. It has to be learned, this is not something that you assign to it, …

… so the representation of the thing itself is its semantic representation. It tells you what the thing is.”

That is exactly what I call “no semantics”! But it is much better formulated as a positive thing.

Why is this a form of “no semantics”? Because, as you can see, the representation of the thing itself edits out the semantics, in the sense that “semantics” is redundant: it appears only at the level of the explanation of how the brain works, not in the workings of the brain.

But what is the representation of the thing itself? A chemical blizzard in the brain.

Let’s put together the two ingredients into one sentence:

• the sensory-motor model of the world is a learned representation of the thing itself.

Notice that, as before, there is too much here: “model” and “representation” sort of cancel one another, being just superfluous additions of the discourse. Not needed.

What is left: a local, decentralized, asynchronous, chemically based, never ending “computation”, which is as concrete as the thing (i.e. the world, the brain) itself.

I put “computation” in quotation marks because this is one of the sensitive points: there should be a rigorous definition of what that “computation” means. Of course, the first step would be a fully rigorous mathematical proof of principle that such a “computation”, which satisfies the requirements listed in the previous paragraph, exists.

Then, it could be refined.

I claim that chemlambda is such a proof of principle. It satisfies the requirements.

I don’t claim that brains work based on a real chemistry instance of chemlambda.

Just a proof of principle.

But how much I would like to test this with people from the frontier mentioned by Jeff Hawkins at the beginning of the talk!

What follows are some short thoughts from my point of view.

While playing with chemlambda with the strategy of reduction called “stupid” (i.e. the simplest one), I tested how it works on the very small part (of chemlambda) which simulates lambda calculus.

Lambda calculus, recall, is one of the two pillars of computation, along with the Turing machine.

In chemlambda, the lambda calculus appears as a sector, a small class of molecules and their reactions. Contrary to the Alchemy of Fontana and Buss, abstraction and application (operations from lambda calculus) are both concrete (atoms of molecules). The chemlambda artificial chemistry defines some very general, but very concrete local chemical interactions (local graph rewrites on the molecules) and some (but not all) can be interpreted as lambda calculus reductions.
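A minimal sketch of what “a local graph rewrite which can be interpreted as a lambda calculus reduction” may look like. It is a toy, not the chemlambda specification: the tuple encoding, the port order and the name `beta_step` are assumptions made only for this illustration. An abstraction node whose output feeds the function port of an application node is replaced, together with that application node, by two plain wires.

```python
# Toy molecule: a list of nodes. Two ports are connected when they carry the
# same edge name. Port order is an assumption made for this sketch:
#   ("L", body_in, var_out, term_out)    abstraction
#   ("A", fun_in, arg_in, result_out)    application
#   ("Arrow", src, dst)                  a plain wire


def beta_step(molecule):
    """Apply one beta-like rewrite if some L node's term_out feeds an A node's fun_in.

    Returns (new_molecule, True) if a rewrite happened, else (molecule, False).
    """
    for i, lam in enumerate(molecule):
        if lam[0] != "L":
            continue
        for j, app in enumerate(molecule):
            if app[0] == "A" and app[1] == lam[3]:
                _, body, var, _ = lam
                _, _, arg, result = app
                rest = [n for k, n in enumerate(molecule) if k not in (i, j)]
                # Both nodes disappear; only two wires remain.
                return rest + [("Arrow", body, result), ("Arrow", arg, var)], True
    return molecule, False


# The identity (its variable wired straight to its body) applied to whatever
# arrives on edge "z", with the result leaving on edge "out".
identity_applied = [("L", "x", "x", "f"), ("A", "f", "z", "out")]
after, fired = beta_step(identity_applied)
print(after)  # two wires: z -> x and x -> out
```

The rewrite only inspects the two nodes involved and the edge between them; nothing in it looks at the rest of the molecule or asks whether the molecule represents a lambda term.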

Contrary to Alchemy, the part which models lambda calculus is concerned only with untyped lambda calculus without extensionality, therefore chemical molecules are not identified with their function, nor do they have definite functions.

Moreover, the “no semantics” means concretely that most of the chemlambda molecules can’t be associated to a global meaning.

Finally, there are no “correct” molecules: everything that results from the chemlambda reactions goes, there is no semantics police.

So from this point of view, this is very nature-like!

Amazingly, the chemical reductions of molecules which represent lambda terms reproduce lambda calculus computations! It is amazing because, with no semantics control, with no variable passing or evaluation strategies, and even if the intermediary molecules don’t represent lambda calculus terms, the computation goes well.

For example, the famous Y combinator reduces first to only a small molecule (two nodes and 6 port nodes), which does not have any meaning in lambda calculus, and then becomes just a gun shooting “application” and “fanout” atoms (a pair which I call a “bit”). The functioning of the Y combinator is not at all sophisticated and mysterious, being instead fueled by the self-multiplication of the molecules (realized by unsupervised local chemical reactions), which then react with the bits and have as effect exactly what the Y combinator does.

The best example I have is the illustration of the computation of the Ackermann function (recall: a recursive but not primitive recursive function!).

What is nice in this example is that it works without the Y combinator, even if it’s a game of recursion.
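For reference, here is the function itself, assuming the usual two-argument (Ackermann–Péter) form is the one meant. This is ordinary Python, not chemlambda: the recursion is obtained by the function calling itself by name, so no fixed-point combinator appears.

```python
def ackermann(m, n):
    """Two-argument Ackermann-Peter function on non-negative integers."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


print(ackermann(2, 2))  # 7
```

Even for very small arguments the values and the recursion depth grow quickly, so only tiny inputs are practical with this naive definition.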

But this is a choice, because actually, for many computations which try to reproduce lambda calculus reductions, the “stupid” strategy used with chemlambda is a bit too exuberant if the Y combinator is used as in lambda calculus (or functional programming).

The main reason is the lack of extension: there are no functions, so the usual functional programming techniques and designs are not the best idea. There are shorter ways in chemlambda, which make better use of the “representation of the thing itself is its own semantic interpretation” than FP does.

One of those techniques is to replace long, linear and sequential lambda terms (designed as a composition of functions) with another architecture, one of neurons.

For me, when I think about a neural net and neural computation, I tend to see the neurons and synapses as loci of chemical activity. Then I just forget about these bags of chemicals and I see a chemical connectome sort of thing. Actually, I see a huge molecule undergoing chemical reactions with itself, but in such a way that its spatial extension (in the neural net), physically embodied by neurons and synapses and perhaps glial cells and whatnot, is manifested in the reductions themselves.

In this sense, the neural architecture way of using the Y combinator efficiently in chemlambda is to embed it into a neuron (a bag of chemical reactions), as sketched in the following simple experiment.

Now, instead of a sequential call of duplication and application (which is the way the Y combinator is used in lambda calculus), imagine a well designed network of neurons which in very few steps builds a (huge, distributed) molecule (instead of a perhaps very big number of sequential steps), which in its turn reduces itself in very few steps as well. Then this chemical connectome ends in a quine state, i.e. in a sort of dynamic equilibrium (reactions are happening all the time, but they combine in such a way that the reductions compensate one another into a static image).

Notice that the end of the short movie about the neuron is a quine.

For chemlambda quines see this post.

In conclusion, there are chances that, with this massively parallel (a bad name, actually, for decentralized, local) architecture of a neural net, seen through its chemical workings, chemlambda really can do not only any computer science computation, but also anything a neural net can do.

_________________________________________________________________

# Synapse servers: plenty of room at the virtual bottom

The Distributed GLC is a decentralized asynchronous model of computation which uses chemlambda molecules. In the model, each molecule is managed by an actor, which has the molecule as its state, and the free ports of the molecule are tagged with other actors’ names. Reductions between molecules (like chemical reactions) happen only for those molecules whose actors know each other, i.e. only between molecules managed by actors, say :Alice and :Bob, such that:

• the state of :Alice (i.e. her molecule) contains a half of the $graph_{1}$ pattern of a move, along with a free port tagged with :Bob’s name
• the state of :Bob contains the other half of the $graph_{1}$ pattern, with a free port tagged with :Alice’s name
• there is a procedure based exclusively on communication by packets, TCP style, [UPDATE: watch this recent video of an interview of Carl Hewitt!] which allows the reduction to be performed on both sides and which later informs any other neighbouring actors (i.e. those which appear as tags in :Alice’s or :Bob’s state) about possible new tags at their states, due to the reduction which happened (this can be done, for example, by performing the move either via the introduction of new invisible nodes in the chemlambda molecules, or via the introduction of Arrow elements, followed by combing moves).
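The first two conditions can be sketched as a simple check. This is only an illustration under stated assumptions: the `Actor` class, the tuple encoding of nodes, the port order and the choice of an abstraction/application pair as the $graph_{1}$ pattern are all hypothetical, and the packet-based procedure of the third condition is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Actor:
    name: str
    # The actor's state: a molecule as a list of node tuples. Assumed encoding:
    # ("L", body_in, var_out, term_out) and ("A", fun_in, arg_in, result_out).
    molecule: list
    # Free ports of the molecule, each tagged with another actor's name.
    port_tags: dict = field(default_factory=dict)


def shared_ports(alice, bob):
    """Free ports which alice tags with bob's name and bob tags with alice's."""
    from_alice = {p for p, who in alice.port_tags.items() if who == bob.name}
    from_bob = {p for p, who in bob.port_tags.items() if who == alice.name}
    return from_alice & from_bob


def can_reduce(alice, bob):
    """True if alice holds the L half and bob the A half of a pattern joined
    through a port that each has tagged with the other's name."""
    for port in shared_ports(alice, bob):
        alice_half = any(n[0] == "L" and n[3] == port for n in alice.molecule)
        bob_half = any(n[0] == "A" and n[1] == port for n in bob.molecule)
        if alice_half and bob_half:
            return True
    return False


alice = Actor(":Alice", [("L", "x", "x", "f")], {"f": ":Bob"})
bob = Actor(":Bob", [("A", "f", "z", "out")], {"f": ":Alice", "z": ":Carol"})
carol = Actor(":Carol", [], {"z": ":Bob"})

print(can_reduce(alice, bob))    # True
print(can_reduce(alice, carol))  # False
```

Note that :Alice and :Carol never interact directly here: they do not know each other, even though both are neighbours of :Bob.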

Now, here is possibly a better idea, to explore. It is one which connects to a thread which is not developed for the moment (anybody interested? contact me): neural type computation with chemlambda and GLC.

The idea is that once the initial configuration of actors and their initial states are set, then why not move the actors around and make the possible reductions only if the actors :Alice and :Bob are in the same synapse server?

Because the actor IS the state of the actor, the rest of the stuff a GLC actor knows how to do is so trivially easy that it is not worth dedicating one program per actor, running in some fixed place. This way, a synapse server can do thousands of reductions on different actors’ datagrams (see further) at the same time.

• be bold and use connectionless communication, UDP like, to pass the actors’ states (as datagrams) between servers called “synapse servers”
• and let a synapse server check datagrams to see if by chance there is a pair which allows a reduction, then perform the reduction in one place, then let them walk on, modified.
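A minimal sketch of one pass of such a synapse server, with no real networking: datagrams are just JSON-encoded actor states held in a list. Everything here is an assumption made for illustration (the dictionary layout, the names `encode`, `decode`, `beta_port` and `synapse_step`, and the use of a single abstraction/application rewrite as the only move). Re-tagging of the new wires and informing neighbouring actors are left out.

```python
import json
from itertools import permutations


def encode(actor):
    """Actor state -> datagram payload (bytes). The actor IS this state."""
    return json.dumps(actor).encode("utf-8")


def decode(datagram):
    return json.loads(datagram.decode("utf-8"))


def beta_port(a, b):
    """Port joining an L half held by a to an A half held by b, or None."""
    for port, who in a["tags"].items():
        if who != b["name"] or b["tags"].get(port) != a["name"]:
            continue
        has_l = any(n[0] == "L" and n[3] == port for n in a["molecule"])
        has_a = any(n[0] == "A" and n[1] == port for n in b["molecule"])
        if has_l and has_a:
            return port
    return None


def synapse_step(datagrams):
    """Look for one reducible pair among the datagrams currently in the
    server, perform the reduction in place, and return all datagrams."""
    actors = [decode(d) for d in datagrams]
    for a, b in permutations(actors, 2):
        port = beta_port(a, b)
        if port is None:
            continue
        lam = next(n for n in a["molecule"] if n[0] == "L" and n[3] == port)
        app = next(n for n in b["molecule"] if n[0] == "A" and n[1] == port)
        a["molecule"].remove(lam)
        b["molecule"].remove(app)
        # Both nodes are replaced by plain wires (tag bookkeeping omitted).
        a["molecule"].append(["Arrow", lam[1], app[3]])
        b["molecule"].append(["Arrow", app[2], lam[2]])
        del a["tags"][port]
        del b["tags"][port]
        break
    return [encode(x) for x in actors]


alice = {"name": ":Alice", "molecule": [["L", "x", "x", "f"]], "tags": {"f": ":Bob"}}
bob = {"name": ":Bob", "molecule": [["A", "f", "z", "out"]], "tags": {"f": ":Alice"}}
for datagram in synapse_step([encode(alice), encode(bob)]):
    print(decode(datagram))
```

The server keeps no per-actor program or memory: it decodes what happens to be present, rewrites at most one pair, and sends everything on.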

There is so much room for the artificial chemistry chemlambda at the bottom of the Internet layers that one can then add some learning mechanisms to the synapse servers. One is, for example, this: suppose that a synapse server matches two actors’ datagrams and finds that there is more than one possible reduction between them. Then the synapse server asks its neighbouring synapse servers (which perhaps correspond to a virtual neuroglia) whether they have encountered this configuration. It then chooses (according to a simple algorithm) which reduction to make, based on the info coming from its neighbours in the same glial domain, and tags the packets which result from the reduction (i.e. adds to them, in some field) with a code for the move which was made. Successful choices are those which have descendants which are still active, say, after more than $n$ reductions.
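The “simple algorithm” is not specified, so the following is only one possible reading, with every name hypothetical: each neighbour reports, per move code, how many successful descendants it has seen, the server sums the reports and picks the best-scoring candidate (ties broken alphabetically), and the chosen code is appended to a `history` field of the resulting packet.

```python
from collections import Counter


def choose_move(candidates, neighbour_reports):
    """Pick one move code among the candidates.

    candidates: move codes possible for the matched pair.
    neighbour_reports: one dict per neighbouring server, mapping a move code
    to the number of successful descendants that neighbour has observed.
    """
    totals = Counter()
    for report in neighbour_reports:
        for move, successes in report.items():
            if move in candidates:
                totals[move] += successes
    # sorted() makes the tie-break deterministic: max() keeps the first best.
    return max(sorted(candidates), key=lambda move: totals[move])


def tag_packet(packet, move):
    """Return a copy of the packet with the chosen move code recorded."""
    tagged = dict(packet)
    tagged["history"] = list(packet.get("history", [])) + [move]
    return tagged


reports = [{"beta": 2, "fanin": 5}, {"beta": 1}]
move = choose_move(["beta", "fanin"], reports)
print(move)                                   # fanin
print(tag_packet({"name": ":Alice"}, move))   # history now holds "fanin"
```

Deciding which choices count as successful (descendants still active after more than $n$ reductions) would need the packets to be observed later on; that part is not sketched here.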

Plenty of possibilities, plenty of room at the bottom.

# Neuroscience and computation: hand-in-hand

Finding the following in a CS research article:

… understanding the brain’s computing paradigm has the potential to produce a paradigm shift in current models of computing.

almost surely would qualify the respective article as crackpot, right? Wrong, for historical and contemporary reasons, which I shall mention further.

1. The cited formulation comes from the site of the Human Brain Project, one of the most amazing collaborations ever. More specifically, it is taken from here, let me cite more:

The brain differs from modern computing systems in many ways. The first striking difference is its use of heterogeneous components: unlike the components of a modern computer, the components of the brain (ion channels, receptors, synapses, neurons, circuits) are always highly diverse – a property recently shown to confer robustness to the system [1]. Second, again unlike the components of a computer, they all behave stochastically – it is never possible to predict the precise output they will produce in response to a given input; they are never “bit-precise”. Third, they can switch dynamically between communicating synchronously and asynchronously. Fourth, the way they transmit information across the brain is almost certainly very different from the way data is transmitted within a computer: each recipient neuron appears to give its own unique interpretation to the information it receives from other neurons. Finally, the brain’s hierarchically organised, massively recurrent connectivity, with its small-world topology, is completely different from the interconnect architecture of any modern computer. For all these reasons, understanding the brain’s computing paradigm has the potential to produce a paradigm shift in current models of computing.

Part of the efforts made by HBP are towards neuromorphic computing. See the presentation Real-time neuromorphic circuits for neuro-inspired computing systems by Giacomo Indiveri, in order to learn more about the history and the present of the subject.

2. As you can see from the presentation, neuromorphic computing is rooted in the article “A logical calculus of the ideas immanent in nervous activity” by Warren McCulloch and Walter Pitts, 1943, Bulletin of Mathematical Biophysics 5:115-133. This brings me to the “history” part. I shall use the very informative article by Gualtiero Piccinini, “The First Computational Theory of Mind and Brain: A Close Look at McCulloch and Pitts’s ‘Logical Calculus of Ideas Immanent in Nervous Activity’”, Synthese 141: 175–215, 2004. From the article:

[p. 175] Warren S. McCulloch and Walter H. Pitts’s 1943 paper, ‘‘A Logical Calculus of the Ideas Immanent in Nervous Activity,’’ is often cited as the starting point in neural network research. As a matter of fact, in 1943 there already existed a lively community of biophysicists doing mathematical work on neural networks. What was novel in McCulloch and Pitts’s paper was a theory that employed logic and the mathematical notion of computation – introduced by Alan Turing (1936–37) in terms of what came to be known as Turing Machines – to explain how neural mechanisms might realize mental functions.

About Turing and McCulloch and Pitts:

[p. 176] The modern computational theory of mind and brain is often credited to Turing himself (e.g., by Fodor 1998). Indeed, Turing talked about the brain first as a ‘‘digital computing machine,’’ and later as a sort of analog computer.  But Turing made these statements in passing, without attempting to justify them, and he never developed a computational  theory of thinking. More importantly, Turing made these statements well after the publication of McCulloch and Pitts’s theory, which Turing knew about.  Before McCulloch and Pitts, neither Turing nor anyone else had used the mathematical notion of computation as an ingredient in a theory of mind and brain.

[p. 181] In 1936, Alan Turing published his famous paper on computability (Turing 1936–37), in which he introduced Turing Machines and used them to draw a clear and rigorous connection between computing, logic, and machinery. In particular, Turing argued that any effectively calculable function can be computed by some Turing Machine – a thesis now known as the Church–Turing thesis (CT) – and proved that some special Turing Machines, which he called ‘‘universal,’’ can compute any function computable by Turing Machines.  By the early 1940s, McCulloch had read Turing’s paper. In 1948, in a public discussion during the Hixon Symposium, McCulloch declared that in formulating his theory of mind in terms of neural mechanisms, reading Turing’s paper led him in the ‘‘right direction.’’

On McCulloch and “the logic of the nervous system”:

[p. 179] In 1929, McCulloch had a new insight. It occurred to him that the all-or-none electric impulses transmitted by each neuron to its neighbors might correspond to the mental atoms of his psychological theory, where the relations of excitation and inhibition between neurons would perform logical operations upon electrical signals corresponding to inferences of his propositional calculus of psychons. His psychological theory of mental atoms turned into a theory of ‘‘information flowing through ranks of neurons.’’ This was McCulloch’s first attempt ‘‘to apply Boolean algebra to the behavior of nervous nets.’’ The brain would embody a logical calculus like that of Whitehead and Russell’s Principia Mathematica, which would account for how humans could perceive objects on the basis of sensory signals and how humans could do mathematics and abstract thinking. This was the beginning of McCulloch’s search for the ‘‘logic of the nervous system,’’ on which he kept working until his death.
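To make “excitation and inhibition performing logical operations on all-or-none signals” concrete, here is a sketch of the textbook simplification of a McCulloch–Pitts unit. It is not taken from Piccinini’s article: the function name, the absolute-inhibition rule and the gate constructions are the usual classroom presentation, assumed here for illustration.

```python
def mp_unit(excitatory, inhibitory, threshold):
    """All-or-none unit: fires (1) when no inhibitory input is active and the
    number of active excitatory inputs reaches the threshold, else stays at 0."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0


def AND(a, b):
    return mp_unit([a, b], [], threshold=2)


def OR(a, b):
    return mp_unit([a, b], [], threshold=1)


def NOT(a):
    # A constantly active excitatory input, vetoed by the inhibitory input a.
    return mp_unit([1], [a], threshold=1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), NOT(a))
```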

On Pitts, McCulloch and logic:

[p. 185-186] In the papers that Pitts wrote independently of McCulloch, Pitts did not suggest that the brain is a logic machine. Before McCulloch entered the picture, neither Pitts nor any  other member of Rashevsky’s biophysics group employed logical or computational language to describe the functions performed by networks of neurons. The use of logic and computation theory to model the brain and understand its function appeared for the first time in McCulloch and Pitts’s 1943 paper; this is likely to be a contribution made by McCulloch to his joint project with Pitts. […]

Soon after McCulloch met Pitts, around the end of 1941, they started collaborating on a joint mathematical theory that employed logic to model nervous activity, and they worked on it during the following two years. They worked so closely that Pitts (as well as Lettvin) moved in with McCulloch and his family for about a year in  Chicago. McCulloch and Pitts became intimate friends and they remained so until their death in 1969.  According to McCulloch, they worked largely on how to treat closed loops of activity mathematically, and the solution was worked out mostly by Pitts using techniques that McCulloch didn’t understand. To build up their formal theory, they adapted Carnap’s rigorous (but cumbersome) formalism, which Pitts knew from having studied with Carnap. Thus, according to McCulloch, Pitts did all the difficult technical work.

A citation from McCulloch and Pitts’s paper [p. 17 from the linked pdf]:

It is easily shown: first, that every net, if furnished with a tape, scanners connected to afferents, and suitable efferents to perform the necessary motor-operations, can compute only such numbers as can a Turing machine; second, that each of the latter numbers can be computed by such a net; and that nets with circles can compute, without scanners and a tape, some of the numbers the machine can, but no others, and not all of them. This is of interest as affording a psychological justification of the Turing definition of computability and its equivalents, Church’s $\lambda$-definability and Kleene’s primitive recursiveness: If any number can be computed by an organism, it is computable by these definitions, and conversely.

Comment by Piccinini on this:

[p. 198] in discussing computation in their paper, McCulloch and Pitts  did not prove any results about the computation power of their nets;  they only stated that there were results to prove. And their conjecture was not that their nets can compute anything that can be computed by Turing Machines. Rather, they claimed that if their nets were provided with a tape, scanners, and ‘‘efferents,’’ then they would compute what Turing Machines could compute; without a tape, McCulloch and Pitts expected even nets with circles to compute a smaller class of functions than the class computable by Turing Machines.

I have boldfaced the previous paragraph because I find it especially illuminating, resembling the same kind of comment as the one on currying I gave in the post “Currying by using zippers and an allusion to the Cartesian Theater”.

[p. 198-199] McCulloch and Pitts did not explain what they meant by saying that nets compute. As far as the first part of the passage is concerned, the sense in which nets compute seems to be a matter of describing the behavior of nets by the vocabulary and formalisms of computability theory. Describing McCulloch–Pitts nets in this way turned them into a useful tool for designing circuits for computing mechanisms. This is how von Neumann would later use them (von Neumann 1945).