# An exercice with convex analysis and neural networks

This is a note about a simple use of convex analysis in relation with neural networks. There are many points of contact between convex analysis and neural networks, but I have not been able to locate this one, thanks for pointing me to a source, if any.

Let’s start with a directed graph with set of nodes $N$ (these are the neurons) and a set of directed bonds $B$. Each bond has a source and a target, which are neurons, therefore there are source and target functions

$s:B \rightarrow N$   , $t:B \rightarrow N$

so that for any bond $x \in B$ the neuron $a = s(x)$ is the source of the bond and the neuron $b = t(x)$ is the target of the bond.

For any neuron $a \in N$:

• let $in(a) \subset B$ be the set of bonds $x \in B$ with target $t(x)=a$,
• let $out(a) \subset B$ be the set of bonds $x \in B$ with source $s(x)=a$.

A state of the network is a function $u: B \rightarrow V^{*}$ where $V^{*}$ is the dual of a real vector space $V$. I’ll explain why in a moment, but it’s nothing strange: I’ll suppose that $V$ and $V^{*}$ are dual topological vector spaces, with duality product denoted by $(u,v) \in V \times V^{*} \mapsto \langle v, u \rangle$ such that any linear and continuous function from $V$ to the reals is expressed by an element of $V^{*}$ and, similarly, any linear and continuous function from $V^{*}$ to the reals is expressed by an element of $V$.

If you think that’s too much, just imagine $V=V^{*}$ to be finite euclidean vector space with the euclidean scalar product denoted with the $\langle , \rangle$ notation.

A weight of the network is a function $w:B \rightarrow Lin(V^{*}, V)$, you’ll see why in a moment.

Usually the state of the network is described by a function which associates to any bond $x \in B$ a real value $u(x)$. A weight is a function which is defined on bonds and with values in the reals. This corresponds to the choice $V = V^{*} = \mathbb{R}$ and $\langle v, u \rangle = uv$. A linear function from $V^{*}$ to $V$ is just a real number $w$.

The activation function of a neuron $a \in N$ gives a relation between the values of the state on the input bonds and the values of the state of the output bonds: any value of an output bond is a function of the weighted sum of the values of the input bonds. Usually (but not exclusively) this is an increasing continuous function.

The integral of an increasing continuous function is a convex function. I’ll call this integral the activation potential $\phi$ (suppose it does not depends on the neuron, for simplicity). The relation between the input and output values is the following:

for any neuron $a \in N$ and for any bond $y \in out(a)$ we have

$u(y) = D \phi ( \sum_{x \in in(a)} w(x) u(x) )$.

This relation generalizes to:

for any neuron $a \in N$ and for any bond $y \in out(a)$ we have

$u(y) \in \partial \phi ( \sum_{x \in in(a)} w(x) u(x) )$

where $\partial \phi$ is the subgradient of a convex and lower semicontinuous activation potential

$\phi: V \rightarrow \mathbb{R} \cup \left\{ + \infty \right\}$

Written like this, we are done with any smoothness assumptions, which is one of the strong features of convex analysis.

This subgradient relation also explains the maybe strange definition of states and weights with the vector spaces $V$ and $V^{*}$.

This subgradient relation can be expressed as the minimum of a cost function. Indeed, to any convex function $phi$ is associated a sync  (means “syncronized convex function, notion introduced in [1])

$c: V \times V^{*} \rightarrow \mathbb{R} \cup \left\{ + \infty \right\}$

$c(u,v) = \phi(u) + \phi^{*}(v) - \langle v, u \rangle$

where $\phi^{*}$ is the Fenchel dual of the function $\phi$, defined by

$\phi^{*}(v) = \sup \left\{ \langle v, u \rangle - \phi(u) \right\}$

This sync has the following properties:

• it is convex in each argument
• $c(u,v) \geq 0$ for any $(u,v) \in V \times V^{*}$
• $c(u,v) = 0$ if and only if $v \in \partial \phi(u)$.

With the sync we can produce a cost associated to the neuron: for any $a \in N$, the contribution to the cost of the state $u$ and of the weight $w$ is

$\sum_{y \in out(a)} c(\sum_{x \in in(a)} w(x) u(x) , u(y) )$.

The total cost function $C(u,w)$ is

$C(u,w) = \sum_{a \in N} \sum_{y \in out(a)} c(\sum_{x \in in(a)} w(x) u(x) , u(y) )$

and it has the following properties:

• $C(u,w) \geq 0$ for any state $u$ and any weight $w$
• $C(u,w) = 0$ if and only if for any neuron $a \in N$ and for any bond $y \in out(a)$ we have

$u(y) \in \partial \phi ( \sum_{x \in in(a)} w(x) u(x) )$

so that’s a good cost function.

Example:

• take $\phi$ to be the softplus function $\phi(u) =\ln(1+\exp(x))$
• then the activation function (i.e. the subgradient) is the logistic function
• and the Fenchel dual of the softplus function is the (negative of the) binary entropy $\phi^{*}(v) = v \ln(v) + (1-v) \ln(1-v)$ (extended by $0$ for $v = 0$ or $v = 1$ and equal to $+ \infty$ outside the closed interval $[0,1]$).

________

[1] Blurred maximal cyclically monotone sets and bipotentials, with Géry de Saxcé and Claude Vallée, Analysis and Applications 8 (2010), no. 4, 1-14, arXiv:0905.0068

_______________________________

# Neurons rewrites

A real neural network is a huge cascade of chemical rewrites. So I can try my chemlambda with that task. From a programming point of view, the problem is to understand neural networks as a graph rewrite model of computation, together with a (yet undiscovered) discipline of using them.

Further are some pretty images showing the first tries. They are all made by filming real simulations obtained with chemlambda.

Before giving them, I tell you that this task seems hard and now I believe that an easier one would be to use the ideas of chemlambda in the frame of quantum computing. (Do I have to add that in a new way, different from what was proposed in the many graphical formalisms associated to category theory? Probably! All those formalisms fall into the family: topological changes do not compute. Wait and see.)

# A neuron-like artificial molecule

This is a neuron-like artificial molecule, simulated in chemlambda.

It has been described in the older post

https://chorasimilarity.wordpress.com/2015/03/04/how-to-put-a-y-combinator-into-a-neuron-and-lock-it-with-a-quine/

Made with quiner.sh and the file neuron.mol as input.
If you want to make your own, then follow the instructions from here:

It is on the list of demos: http://chorasimilarity.github.io/chemlambda-gui/dynamic/neuron.html

__________
Needs explanations, because this is not exactly the image used for a neuron in a neural network.

_________
1. In neural networks a neuron is a black box with some inputs and some outputs, which computes and sends (the same) signal at the outputs as a function of the signals from the inputs. A connection between an output of one neuron and an an input of another has a weight.

The interpretation is that outputs represent the axon of a neuron, which is like a tree with the root in the neuron’s soma. The inputs of the neuron represents the dendrites. For a neuron in a neural network, a connection between an input (leaf of a dendrite) and an output (leaf of an axon) has a weight because it models an average of many contact points (synapses) between dendrites and axons.

The signals sent represent electrical activity in the neural network.

2. Here, the image is different.

Each neuron is seen as a bag of chemicals. The electrical signals are only a (more easy to measure) aspect of the cascades of chemical reactions which happen inside the neuron’s some, in the dendrites, axon and synapses.

Therefore a neuron in this image is the chemical reactions which happen.

From here we go a bit abstract.

3. B-type unorganised machines. In his fundamental research “Intelligent Machinery” Turing introduced his  B-type neural networks
http://www.alanturing.net/turing_archive/pages/Reference%20Articles/connectionism/Turing%27s%20neural%20networks.html
or B-type unorganised machine, which is made by more or less identical neurons (they compute a boolean function) which are connected via a connection-modifier box.

A connection modifier box has an input and an output, which are part of the connection between two neurons. But it has also some training connections, which can modify the function  computed by the connection modifier.

What is great is that Turing explains that the connection modifier can me made by neurons
http://www.alanturing.net/turing_archive/archive/l/l32/L32-007.html
so that actually a B-type unorganised machine is an A-type unorganised machine (same machine without connection modifiers), where we see the connection modifiers as certain, well chosen patterns of neurons.

OK, the B-type machines compute by signals sent and received by neurons nevertheless. They compute boolean values.

4. What would happen  if we pass from Turing to Church?

Let’s imagine the same thing, but in lambda calculus: neurons which reduce lambda terms, in networks which have some recurring patterns which are themselves made by neurons.

Further: replace the signals (which are now lambda terms) by their chemical equivalent — chemlambda molecules — and replace the connection between them by chemical bonds. This is what you see in the animation.

Connected to the magenta dot (which is the output of the axon) is a pair of nodes which is related to the Y combinator, as seen as a molecule in chemlambda. ( Y combinator in chemlambda explained in this article http://arxiv.org/abs/1403.8046 )

The soma of this neuron is a single node (green), it represents an application node.

There are two dendrites (strings of red nodes) which have each 3 inputs (yellow dots). Then there are two small molecules at the end of the dendrites which are chemical signals that the computation stops there.

Now, what happens: the neuron uses the Y combinator to build a tree of application nodes with the leaves being the input nodes of the dendrites.

When there is no more work to do the molecules which signal these interact with the soma and the axon and transform all into a chemlambda quine (i.e. an artificial bug of the sort I explained previously) which is short living, so it dies after expelling some “bubbles”  (closed strings of white nodes).

5. Is that all? You can take “neurons” which have the soma any syntactic tree of a lambda term, for example. You can take neurons which have other axons than the Y-combinator. You can take networks of such neurons which build and then reduce any chemlambda molecule you wish.
__________________________________________

# Rant about Jeff Hawkins “the sensory-motor model of the world is a learned representation of the thing itself”

I enjoyed very much the presentation given by Jeff Hawkins “Computing like the brain: the path to machine intelligence”

Around 8:40

This is something which could be read in parallel with the passage I commented in the post   The front end visual system performs like a distributed GLC computation.

I reproduce some parts

In the article by Kappers, A.M.L.; Koenderink, J.J.; Doorn, A.J. van, Basic Research Series (1992), pp. 1 – 23,

Local Operations: The Embodiment of Geometry

the authors introduce the notion of  the  “Front End Visual System” .

Let’s pass to the main part of interest: what does the front end?  Quotes from the section 1, indexed by me with (a), … (e):

• (a) the front end is a “machine” in the sense of a syntactical transformer (or “signal processor”)
• (b) there is no semantics (reference to the environment of the agent). The front end merely processes structure
• (c) the front end is precategorical,  thus – in a way – the front end does not compute anything
• (d) the front end operates in a bottom up fashion. Top down commands based upon semantical interpretations are not considered to be part of the front end proper
• (e) the front end is a deterministic machine […]  all output depends causally on the (total) input from the immediate past.

Of course, today I would say “performs like a distributed chemlambda computation”, according to one of the strategies described here.

Around 17:00 (Sparse distributed representation) . ” You have to think about a neuron being a bit (active:  a one, non active: a zero).  You have to have many thousands before you have anything interesting [what about C. elegans?]

Each bit has semantic meaning. It has to be learned, this is not something that you assign to it, …

… so the representation of the thing itself is its semantic representation. It tells you what the thing is. ”

That is exactly what I call “no semantics”! But is much better formulated as a positive thing.

Why is this a form of “no semantics”? Because as you can see the representation of the thing itself edits out the semantics, in the sense that “semantics” is redundant, appears only at the level of the explanation about how the brain works, not in the brain workings.

But what is the representation  of the thing itself? A chemical blizzard in the brain.

Let’s put together the two ingredients into one sentence:

• the sensory-motor model of the world is a learned representation of the thing itself.

Remark that as previously, there is too much here: “model” and “representation” sort of cancel one another, being just superfluous additions of the discourse. Not needed.

What is left: a local, decentralized.  asynchronous, chemically based, never ending “computation”, which is as concrete as the thing (i.e. the world, the brain) itself.

I put “computation” in quotes marks because this is one of the sensible points: there should be a rigorous definition of what that “computation” means. Of course, the first step would be fully rigorous mathematical proof of principle that such a “computation”, which satisfies the requierements listed in the previous paragraph, exists.

Then, it could be refined.

I claim that chemlambda is such a proof of principle. It satisfies the requirements.

I don’t claim that brains work based on a real chemistry instance of chemlambda.

Just a proof of principle.

But how much I would like to test this with people from the frontier mentioned by Jeff Hawkins at the beginning of the talk!

In the following some short thoughts from my point of view.

While playing with chemlambda with the strategy of reduction called “stupid” (i.e. the simplest one), I tested how it works on the very small part (of chemlambda) which simulates lambda calculus.

Lambda calculus, recall, is one of the two pillars of computation, along with the Turing machine.

In chemlambda, the lambda calculus appears as a sector, a small class of molecules and their reactions. Contrary to the Alchemy of Fontana and Buss, abstraction and application (operations from lambda calculus) are both concrete (atoms of molecules). The chemlambda artificial chemistry defines some very general, but very concrete local chemical interactions (local graph rewrites on the molecules) and some (but not all) can be interpreted as lambda calculus reductions.

Contrary to Alchemy, the part which models lambda calculus is concerned only with untyped lambda calculus without extensionality, therefore chemical molecules are not identified with their function, not have they definite functions.

Moreover, the “no semantics” means concretely that most of the chemlambda molecules can’t be associated to a global meaning.

Finally, there are no “correct” molecules, everything resulted from the chemlambda reactions goes, there is no semantics police.

So from this point of view, this is very nature like!

Amazingly,  the chemical reductions of molecules which represent lambda terms reproduce lambda calculus computations! It is amazing because with no semantics control, with no variable passing or evaluation strategies, even if the intermediary molecules don’t represent lambda calculus terms, the computation goes well.

For example the famous Y combinator reduces first to only a small (to nodes and 6 port nodes molecule), which does not have any meaning in lambda calculus, and then becomes just a gun shooting “application” and “fanout” atoms (pair which I call a “bit”). The functioning of the Y combinator is not at all sophisticated and mysterious, being instead fueled by the self-multiplication of the molecules (realized by unsupervised local chemical reactions) which then react with the bits and have as effect exactly what the Y combinator does.

The best example I have is the illustration of the computation of the Ackermann function (recall: a recursive but not primitive recursive function!)

What is nice in this example is that it works without the Y combinator, even if it’s a game of recursion.

But this is a choice, because actually, for many computations which try to reproduce lambda calculus reductions, the “stupid” strategy used with chemlambda is a bit too exuberant if the Y combinator is used as in lambda calculus (or functional programming).

The main reason is the lack of extension, there are no functions, so the usual functional programming techniques and designs are not the best idea. There are shorter ways in chemlambda, which employ better the “representation of the thing itself is its own semantic interpretation” than FP.

One of those techniques is to use instead of long linear and sequential lambda terms (designed as a composition of functions), so to use instead of that another architecture, one of neurons.

For me, when I think about a neural net and neural computation, I tend to see the neurons and synapses as loci of chemical activity. Then  I just forget about these bags of chemicals and I see a chemical connectome sort of thing, actually I see a huge molecule suffering chemical reactions with itself, but in a such a way that its spatial extension (in the neural net), phisically embodied by neurons and synapses and perhaps glial cells and whatnot, this spatial extention is manifested in the reductions themselves.

In this sense, the neural architecture way of using the Y combinator efficiently in chemlambda is to embed it into a neuron (bag of chemical reactions), like sketched in the following simple experiment

Now, instead of a sequential call of duplication and application (which is the way the Y combinator is used in lambda calculus), imagine a well designed network of neurons which in very few steps build a (huge, distributed) molecule (instead of a perhaps very big number of sequential steps) which at it’s turn reduce itself in very few steps as well, and then this chemical connectome ends in a quine state, i.e. in a sort of dynamic equilibrium (reactions are happening all the time but they combine in such a way that the reductions compensate themselves into a static image).

Notice that the end of the short movie about the neuron is a quine.

For chemlambda quines see this post.

In conclusion there are chances that this massively parallel (bad name actually for decentralized, local) architecture of a neural net, seen as it’s chemical working, there are chances that chemlambda really can do not only any computer science computation, but also anything a neural net can do.

_________________________________________________________________

# Neurons know nothing, however …

… they know surprisingly much, according to the choice of definition of “neural knowledge”. The concrete definition which I adopt is the following:  the knowledge of a neuron at a given moment is the collection (multiset) of molecules it contains. The knowledge of a synapse is the collection of molecules present in respective axon, dendrite and synaptic cleft.

I take the following hypotheses for a wet neural network:

• the neural network is physically described as a graph with nodes being the neurons and arrows being the axon-dendrite synapses. The network is built from two ingredients: neurons and synapses. Each synapse involves three parts: an axon (associated to a neuron), a synaptic cleft (associated to a local environment) and a dendrite (associated to a neuron).
• Each of the two ingredients of a neural network, i.e. neurons and synapses, as described previously, function by associated chemical reaction networks, involving the knowledge of the respective ingredients.
• (the most simplifying hypothesis)  all molecules from the knowledge of a neuron, or of a synapse, are of two kinds: elements of $MOLECULES$ or enzyme  names from the chemical concrete machine.

The last hypothesis seem to introduce knowledge with a more semantic flavour by the backdoor. That is because, as explained in  arXiv:1309.6914 , some molecules (i.e. trivalent graphs from the chemical concrete machine formalism) represent lambda calculus terms. So, terms are programs, moreover the chemical concrete machine is Turing universal, therefore we end up with a rather big chunk of semantic knowledge in a neuron’s lap. I intend to show you this is not the case, in fact a neuron, or a synapse does not have (or need to) this kind of knowledge.

__________________________

Before giving this explanation, I shall explain in just a bit more detail how the wet neural network,  which satisfies those hypotheses, works.  A physical neuron’s behaviour is ruled by the chemistry of it’s metabolic pathways. By the third hypothesis these metabolic pathways can be seen as graph rewrites of the molecules (more about this later). As an effect of it’s metabolism, the neuron has an electrical activity which in turn alters the behaviour of the other ingredient, the synapse. In the synapse act other chemical reaction networks, which are amenable, again by the third hypothesis, to computations with the chemical concrete machine. As an effect of the action of these metabolic pathways, a neuron communicates with another neuron. In the process the knowledge of each neuron (i.e. the collection of molecules) is modified, and the same is true about a synapse.

As concerns chemical reactions between molecules, in the chemical concrete machine formalism there is only one type of reactions which are admissible, namely the reaction between a molecule and an enzyme. Recall that if (some of the) molecules are like lambda calculus terms, then (some of the) enzymes are like names of reduction steps and the chemical reaction between a molecule and an enzyme is assimilated to the respective reduction step applied to the respective lambda calculus term.

But, in the post  SPLICE and SWITCH for tangle diagrams in the chemical concrete machine    I proved that in the chemical concrete machine formalism there is a local move, called SWITCH

which is the result of 3 chemical reactions with enzymes, as follows:

Therefore, the chemical concrete machine formalism with the SWITCH move added is equivalent with the original formalism. So, we can safely add the SWITCH move to the formalism and use it for defining chemical reactions between molecules (maybe also by adding an enzyme, or more, for the SWITCH move, let’s call them $\sigma$).  This mechanism gives chemical reactions between molecules with the form

$A + B + \sigma \rightarrow C + D + GARB$

where $\latex A$ and $B$ are molecules such that by taking an arrow from $A$ and another arrow from $B$ we may apply the $\sigma$ enzyme and produce the SWITCH move for this pair of arrows, which results in new molecules $C$ and $D$ (and possibly some GARBAGE, such as loops).

In conclusion, for this part concerning possible chemical reactions between molecules, we have enough raw material for constructing any chemical reaction network we like. Let me pass to the semantic knowledge part.

__________________________

Semantic knowledge of molecules. This is related to evaluation and it is maybe the least understood part of the chemical concrete machine. As a background, see the post  Example: decorations of S,K,I combinators in simply typed graphic lambda calculus , where it is explained the same phenomenon (without any relation with chemical metaphors) for the parent of the chemical concrete machine, the graphic lambda calculus.

Let us consider the following rules of decorations with names and types:

If we consider decorations of combinator molecules, then we obtain the right type and identification of the corresponding combinator, like in the following example.

For combinator molecules, the “semantic knowledge”, i.e. the identification of the lambda calculus term from the associated molecule, is possible.

In general, though, this is not possible. Consider for example a 2-zipper molecule.

We obtain the decoration $F$ as a nested expression of $A, D, E$, which enough for performing two beta reductions, without knowing what $A, D, E$ mean (without the need to evaluate $A, D, E$). This is equivalent with the property of zippers, to allow only a certain sequence of graphic beta moves (in this case two such moves).

Here is the tricky part: if we look at the term $F$ then all that we can write after beta reductions is only formal, i.e. $F$  reduces to $(A[y:=D])[x:=E]$, with all the possible problems related to variable names clashes and order of substitutions. We can write this reduction but we don’t get anything from it, it still needs further info about relations between the variables $x, y$ and the terms $A, D, E$.

However, the graphic beta reductions can be done without any further complication, because they don’t involve any names, nor of variables, like $x, y$, neither of terms, like $A, D, E, F$.

Remark that the decoration is made such that:

• the type decorations of arrows are left unchanged after any move
• the terms or variables decorations (names elsewhere “places”) change globally.

We indicate this global change like in the following figure, which is the result of the sequence of the two possible $\beta^{+}$ moves.

Therefore, after the first graphic beta reduction, we write  $A'= A[y:=D]$ to indicate that $A'$ is the new, globally (i.e. with respect to the whole graph in which the 2-zipper is a part) obtained decoration which replaces the older $A$, when we replace $y$ by $D$. After the second  graphic beta reduction we use the same notation.

But such indication are even misleading, if, for example, there is a path made by arrows outside the 2-zipper, which connect the arrow decorated by $D$ with the arrow decorated by $y$.  We should, in order to have a better notation, replace $D$ by $D[y:=D]$, which gives rise to a notation for a potentially infinite process of modifying $D$. So, once we use graphs (molecules) which do not correspond to combinators (or to lambda calculus terms), we are in big trouble if we try to reduce the graphic beta move to term rewriting, or to reductions in lambda calculus.

In conclusion for this part: decorations considered here, which add a semantic layer to the pure syntax of graph rewriting, cannot be used as replacement the graphic molecules, nor should reductions be equated with chemical reactions, with the exception of the cases when we have access to the whole molecule and moreover when the whole molecule is one which represents a combinator.

So, in this sense, the syntactic knowledge of the neuron, consisting in the list of it’s molecules, is not enough for deducing the semantic knowledge which is global, coming from the simultaneous decoration of the whole chemical reaction network which is associated to the whole neural network.

# Local FAN-IN eliminates global FAN-OUT (I)

For being able to build  a chemical concrete machine (see the posts  A chemical concrete machine for lambda calculus  and  Why build a chemical concrete machine, and how?) we have to prove that  universal computation can be attained with only local moves in graphic lambda calculus. Or, the lambda calculus sector of the graphic lambda calculus, which gives universality to graphic lambda calculus, uses the global FAN-OUT move (see theorem 3.1 (d)  arXiv:1305.5786 [cs.LO]. Similarly, we see in proposition 3.2 (d), which describes the way combinatory logic appears in graphic lambda calculus, that again global FAN-OUT is used.

I want to describe a way to eliminate the global FAN-OUT move from combinatory logic (as appears in graphic lambda calculus via the algorithm described here ).

________________

There are reasons to dislike global moves in relation to B-type neural networks (see the last post    Pair of synapses, one controlling the other (B-type NN part III) ). Similar concerns can be found in the series of posts which has as the most recent one Dictionary from emergent algebra to graphic lambda calculus (III) .

In this first post I shall introduce a local FAN-IN move and two distributivity moves and I shall prove that they eliminate the need for using global FAN-OUT in combinatory logic. In the next post I shall prove that we can eliminate two other moves (so that the total number of moves of graphic lambda calculus stays the same as before) and moreover we can recover from distributivity and local FAN-OUT moves the missing oriented Reidemeister moves from the $\lambda$-TANGLE sector.

________________

Definition. The local FAN-IN move is described in the next figure and it can be applied for any $\varepsilon \not = 1$.

• as you see, in the move appears a dilation gate, what can this has to do with combinatory logic? As I explained previously, the properties of the gates are coming through the moves they are involved in, and not from their name. I could have introduced a new gate, with two inputs and one output, call this new gate “fan-in gate” and use it in the FAN-IN move. However, wait until the next post to see that there are other advantages, besides the economy of gates available, in using a dilation gate as a fan-in.
• the FAN-IN move resembles to the packing arrows trick which is used extensively in the neural networks posts.  This suggests to use as a  fan-in gate

the green triangle gate and as fan-out gate the red triangle gate. This would eliminate the $\Upsilon$ gate from the formalism, but is not clear to me how this replacement would interfere with the rest of the moves.

• the FAN-IN move resembles with the dual of the graphic beta move, but is not the same (recall that until now I have not accepted the dual of the graphic beta move in the list of the moves of graphic lambda calculus, although there are strong reasons to do so):

which is needed in the emergent algebra sector in order to make the dictionary to work (and related as well to the goal of non using global FAN-OUT in that sector).  This latter move is in fact a distributivity move (see further), but we are free to choose different moves in different sectors of the graphic lambda calculus,

• I know it is surprising that until now there was nothing about evaluation strategies in graphic lambda calculus, the reason being that because there are no variables then there is noting to evaluate. However, the situation is not so simple, at some point, for example in the chemical concrete machine or for neural networks, some criterion for choosing the order of moves will be needed. But it is an important point to notice that replacing global FAN-OUT (which could be seen as a remnant of having variables and evaluating them) by local FAN-IN has nothing to to with evaluation strategies.

________________

Definition: The distributivity moves (related to the application and lambda abstraction gates) are the following:

• the first distributivity move is straighforward, an application gate is just doubled and two fan-out moves establish the right connections. We see here why the “mystery move” can be seen as a kind of distributivity move,
• the second distributivity move is where we need a fan-in gate (and where we use a dilation gate instead): because of th orientation of the arrows, after we double the lambda abstraction gates, we need to collect two arrows into one!

________________

Combinatory logic terms appear in graphic lambda calculus as  trees made by application gates, with leaves one of the combinators S, K, I (seen as graphs in $GRAPH$.  I want to show the following. [UPDATE: made some corrections.]

Theorem.   We can replace the global FAN-OUT move with a sequence of local FAN-IN ,  DIST, CO-COMM and local pruning moves, every time the global FAN-OUT move is applied to a term made by SKI combinators.

Proof.  First, remark that a sequence of  DIST moves for the application gate allows to reduce the problem of replacing global FAN-OUT moves for any combinator to the problem of replacing it for S, K, and I. This is because the DIST move for the application gate allows to do the FAN-OUT of trees of application gates:

Now we have to prove that we can perform global FAN-OUT for I , K, S combinators.  For the combinator I the proof is the following:

For the combinator K we shall also use a local pruning move:

Finally, the proof for the combinator S is the following:

Now we are going to use 3 DIST moves, followed by the switch of arrows explained in   Local FAN-IN eliminates GLOBAL FAN-OUT (II) , which is applied in the dashed green ellipse from the next figure:

And we are done.

UPDATE: At close inspection, it turns out that we don’t need to do switch (macro) moves. Instead, if we go back at the point we were before the last figure,  we may use  first CO-ASSOC and then perform the three FAN-IN moves .

# Pair of synapses, one controlling the other (B-type NN part III)

In  Teaser (I) and Teaser (II) I discussed about the possibility to construct neural networks with controlled connections, resembling the B-type neural networks of Turing, in the following sense:

• they are formed by “neurons”, which in Turing’s description are boolean logic gates, and here they are graphs from graphic lambda calculus which correspond to lambda calculus terms. But there’s no real restriction on that, graphic lambda calculus being more general and powerful than untyped lambda calculus, one may consider “neurons” which are outside the lambda calculus sector. The fact is that a “neuron” here should be interpreted as any graph in $GRAPH$ with at least one input but only one output. Later, when we shall speak about the order of performing moves in such neural networks, it will be seen that each “neuron” is a bag containing graphs which are modified (or reduced, as people say) independently one from another. Neurons are packing conventions from a formal viewpoint, but this viewpoint may not be the best one, a better viewpoint would be to see neurons as well as synapses, described further, as real constructs, like in the related project of the chemical concrete machine, In particular, neurons may do other things than computing in the lambda calculus sense (classical computation), they may do some more geometrical or physical or robot-like activities, not usually considered computations.
• the connections between neurons are controlled in a B-type NN, by TRUE – FALSE control wires or inputs. Turing explains that controls themselves can be realized as constructions with boolean neurons (i.e. in his language B-type NNs are particular cases of his A-type NNs). In Teaser (II)    is explained how the connections between graphic lambda calculus neurons can be constructed by using a 2-zipper and a switch, such that a connection is controlled (by the same TRUE-FALSE mechanism, but this time TRUE and FALSE are literary the graphs of the corresponding terms in lambda calculus) by another connection.

There is an important difference, besides the power of the calculi used (boolean vs. graphic lambda), namely that there is no signal transmitted through the wires  of the network. That is because graphic lambda calculus does not need variable names. I shall come back to this very important point (which is a big advantage for implementing such networks in reality) in a future post, but let me mention two simple facts:

• an analogy: think about hardware and software, a difference between them is that software may be associated to the pattern of signals which circulates in the hardware. The hardware is the machine inhabited by the software, they are not in the same plane of physical reality, right? Well, in this implementation there is literary no difference between those, exactly because there are no signals (software) transmitted through the hardware.
• this use of names, software, variable and signals is a pain for those trying to make sense how to implement in reality computing constructs. But, if you think, the need to name that stuff “x” and that other stuff “y”, to make alpha conversions in order to avoid name clashes, or even to describe computations in a linear string of symbols, all this are in no relation with physical reality, they are only cultural constraints of humans (and their artificial constructs). It all has to do with the fact that humans communicate science through writing, and of course, it has to do with the enumeration techniques of the cartesian method , which “is designed as a technique for understanding performed by one mind in isolation, severely handicapped by the bad capacity of communication with other isolated minds”. Molecules in a cell or in a solution do not align in a string according to the experimenter writing habits, from left to right, right to left, or vertical. They just don’t wait for an external  clock  to rhythm their ballet, nor for the experimenter to draw it’s familiar coordinate system. These are all limitations of the method, not of nature.

_____________

Further, in this post, I shall use “synapse” instead “connection”.  Let’s see, now that we know we can implement synapses by a TRUE-FALSE mechanism, let’s see if we can do it simpler. Also, if it is possible to make the “synapses control other synapses” concept more symmetric.

Instead of the little magenta triangles used in Teaser I, II posts (which are not the same as the magenta triangles used in the chemical concrete machine post) , I shall use a more clear notation:

The design of a synapse proposed in Teaser (II)  involved two switches and a zipper, justified by the direct translation of TRUE-FALSE lambda calculus terms in the freedom sector of graphic lambda calculus.  Think now that in fact we have there two synapses, one controlling the other. The zipper between them is only a form of binding them together in unambiguous way.

A simplified form of this idea of a pair of synapses is the following:

In the upper part of the figure you see the pair of synapses, taking the a form like  this: $><$. There are two synapses there, one is $>$, which is controlled by the other $<$.  The connection between the blue 1 and the red 3′ is controlled by the other synapse. In order to see how, let’s perform a graphic beta move and obtain the graph from the lower part of the figure.

The red 3 can connect, in the sense explained by the switch mechanism from Teaser (II) with blue 1′ or blue 2′, all three of them belonging to the controlling synapse. Say red 3  connects with blue 1′ and look what happens:

OK, so we obtain a connection between blue 1 and red 3′. Suppose now that red 3 connects with blue 2′, look:

Yes, blue 1 does not connect with red 3′, they just exchanged a pair of termination gates. That’s how the pair of synapses work.

Finally, let me remark that the use of termination gates is a bit ugly, it breaks the symmetry,  is not really needed.