# An extension of hamiltonian mechanics

This is an introduction to the ideas of the article arXiv:1902.04598

UPDATE: If you think about a billiard-ball computer, the computer is in the expression of the information gap. The model also applies to chemlambda: molecules have a hamiltonian as well, and the graph rewrites, aka chemical reactions, have a description in the information gap. That’s part of the kaleidos project 🙂

__

Hamiltonian mechanics is the mechanism of the world. Indeed, the very simple equations (here the dot means a time derivative)

$\dot{q} = \frac{\partial H}{\partial p}$   , $\dot{p} = - \frac{\partial H}{\partial q}$

govern everything. Just choose an expression for the function $H$, called the hamiltonian, and then solve these equations to find the evolution in time of the system.
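To make "choose $H$ and solve" concrete, here is a minimal numerical sketch. It assumes the standard form of Hamilton's equations, $\dot{q} = \partial H / \partial p$ and $\dot{p} = - \partial H / \partial q$, and uses the symplectic Euler scheme. The harmonic oscillator hamiltonian, the step size and the function names are illustrative choices, not taken from the article.

```python
def symplectic_euler(dH_dq, dH_dp, q, p, dt, steps):
    """Integrate dq/dt = dH/dp, dp/dt = -dH/dq with the symplectic Euler scheme."""
    trajectory = [(q, p)]
    for _ in range(steps):
        p = p - dt * dH_dq(q, p)  # momentum first: dp/dt = -dH/dq
        q = q + dt * dH_dp(q, p)  # then position, using the updated momentum
        trajectory.append((q, p))
    return trajectory


# Illustrative hamiltonian (an assumption): H(q, p) = (p^2 + q^2) / 2
def H(q, p):
    return 0.5 * (p * p + q * q)


traj = symplectic_euler(
    dH_dq=lambda q, p: q,  # dH/dq for the oscillator
    dH_dp=lambda q, p: p,  # dH/dp for the oscillator
    q=1.0, p=0.0, dt=0.01, steps=1000,
)
energies = [H(q, p) for q, p in traj]
energy_drift = max(energies) - min(energies)
print(f"energy varies by {energy_drift:.2e} along the trajectory")
```

The energy is not exactly constant along the discrete trajectory, but it stays within a small band around its initial value, which is what one expects from a reversible, purely hamiltonian evolution.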

Quantum mechanics is in a very precise sense the same thing. The equations are the same, only the formalism is different. There is a hamiltonian which gives the evolution of the quantum system…

Well, until measurement, which is an addition to the beautiful formalism. So we can say that hamiltonian mechanics, in the quantum version, and the measurement algorithm are, together, the basis of the quantum world.

Going back to classical mechanics, the same happens. Hamiltonian mechanics can be used as is in astronomy, or when we model the behavior of a robotic arm or any other purely mechanical system. However, in real life there are behaviors which go beyond this. Among them: viscosity, plasticity, friction, damage, unilateral contact…

In almost all applications of mechanics there is this extra ingredient: the system does not only have a hamiltonian, there are other quantities which govern it and which, most of the time, make the system behave irreversibly.

Practically every object, machine or construction made by humans needs knowledge beyond hamiltonian mechanics. Or beyond quantum mechanics. This is the realm of applied mathematics, of differential equations, of numerical simulations.

In this classical mechanics for the real world we need the hamiltonian and we also need to explain in which way the object or material we study is different from all the other objects or materials. This one is viscous, plastic, elasto-plastic, elasto-visco-plastic, there is damage, you name it. These differences are studied and they add to hamiltonian mechanics.

They should add, but in practice they don’t. Instead, researchers interested in such studies choose to renounce the beautiful hamiltonian mechanics formalism, go back to Newton, and add their knowledge about irreversible behaviours there.

(There is another aspect to be considered if you think about mechanical computers. They are mostly nice thought experiments, very powerful idea generators. Take for example a billiard-ball computer. It can’t be described by hamiltonian mechanics alone because of the unilateral contact of the balls with the billiard table and of the balls with one another. So we can study it, but we have to add to the hamiltonian mechanics formalism.)

From all this we see that it may be interesting to study whether the deviation from hamiltonian mechanics has any information content.

We can measure this deviation by a gap vector, defined by

and we need new equations for the gap vector $\eta$. Very simple then: suppose we have the other ingredient we need, a likelihood function $\pi \in [0,1]$, and we add that

where $z = z(t) = (q(t), p(t))$. That is, we ask that if the system is in the state $z$ then the velocity $\dot{z}$ and the gap vector $\eta$ maximize the likelihood $\pi$.

Still too general, how can we choose the likelihood? We may take the following condition

that is, we can suppose that the algorithm max gives a categorical answer when applied to either the 2nd or the 3rd argument of the likelihood.

(It’s Nature’s business to embody the algorithm max…)

We define then the information content associated to the likelihood as

So now we have a principle of minimal information content of the difference from hamiltonian evolution: minimize

In arXiv:1902.04598 I explain how this extension of hamiltonian mechanics works wonderfully with viscosity, plasticity, damage and unilateral contact.

# My first programs, long ago: Mumford-Shah and fracture

A long time ago, in 1995-1997, I dreamed about really fast and visual results in image segmentation by the then-new Mumford-Shah functional and in fracture. It was my first programming experience. I used Fortran, bash and all kinds of tools available in Linux.

There is still this trace of my page back then, here at the Wayback Machine. (I was away until 2006.) The present day web page is this.

Here is the image segmentation by the M-S functional of a bw picture of a Van Gogh painting.

And here is a typical result of fracture propagation (although I remember having hundreds of frames available…)

The article is here.

# A quine in Lafont’s interaction combinators

I continue with a second post about Y. Lafont’s Interaction combinators. Here is the first one.

Figure 3 of the article gives an example of a nonterminating computation:

This is a quine. Indeed it is a graph which has a periodic evolution under the deterministic greedy reduction algorithm, using the interaction rules (i.e. the graph rewrites) of interaction combinators.

By comparison, a chemlambda quine is a molecule (a graph in chemlambda) which has a periodic evolution under the deterministic greedy reduction algorithm which uses the chemlambda graph rewrites, with the priority of the rewrites set to “viral”, i.e. the DIST family of rewrites comes first. Chemlambda needs a priority of rewrites (for the deterministic algorithm) because there exist conflicts between the rewrites, i.e. overlapping left patterns.

About a third of the molecules from the library are chemlambda quines and they are interesting mostly when reduced with the random reduction algorithm. While for interaction combinators the random reduction algorithm brings nothing new (the system is confluent), for chemlambda with the random reduction algorithm the system is not confluent and the chemlambda quines may die. All of the quines from the library are at best immortal, i.e. the probability of death does not depend on the age of the molecule.

A reason for this phenomenon is that all these chemlambda quines don’t use termination nodes (which correspond to the epsilon nodes of interaction combinators). The smallest chemlambda quine without T nodes is the 9_quine, which has 9 nodes. But we may use termination nodes and produce a quine which is similar to Lafont’s example:

In mol notation this quine is:

FO 1 2 3
T 2
FOE 3 4 1
T 4
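
As a small aid for reading mol notation, here is a sketch of a parser. It assumes that each line is a node type followed by the names of the edges attached to its ports, and that an edge name shared by two ports means those ports are connected. The helper names (`parse_mol`, `port_usage`) are hypothetical and are not part of the chemlambda scripts.

```python
from collections import Counter

MOL_TEXT = """\
FO 1 2 3
T 2
FOE 3 4 1
T 4
"""


def parse_mol(text):
    """Return a list of (node_type, [edge names]) pairs, one per non-empty line."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            nodes.append((fields[0], fields[1:]))
    return nodes


def port_usage(nodes):
    """Count how many ports each edge name is attached to."""
    return Counter(edge for _, edges in nodes for edge in edges)


nodes = parse_mol(MOL_TEXT)
usage = port_usage(nodes)
# Edge names that do not appear exactly twice would be free (dangling) edges.
dangling = [name for name, count in usage.items() if count != 2]
print(nodes)
print("dangling edges:", dangling)
```

For the molecule above every edge name appears exactly twice, so under the stated assumption the graph has no free edges.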

The animation is obtained with the scripts from the chemlambda repository, in the same way as those used for the comparison with the Dynamic GOI Machine. In order to obtain a deterministic reduction all weights (i.e. all parameters “wei_*” from the relevant awk script) were set to 0.

You see that this is really a 6-node quine. Why? Because even in Lafont’s example, a deterministic greedy reduction would lead at step 2 to the simultaneous application of rewrites which increase the number of nodes and of those which decrease the number of nodes, so a correct application of the deterministic greedy algorithm would be similar to the example from chemlambda.

Maybe it would be interesting (as a tool) and straightforward to modify the chemlambda scripts into an interaction combinators version.

# Universality of interaction combinators and chemical reactions

In the foundational article Interaction combinators, Yves Lafont describes interaction rules as having the form

He then gives three examples of particular families of interaction rules which can be used to simulate Turing machines, cellular automata and unary arithmetic.

The main result of his article (Theorem 1) is that there is an algorithm which translates any interaction system (i.e. a collection of interaction rules which satisfy some natural conditions) into the very simple system of his interaction combinators:

In plain words, he proves that there is a way to replace the nodes of a given interaction system by networks of interaction combinators in such a way that any of the interaction rules of that interaction system can be achieved (in a finite number of steps) by the interaction rules of the interaction combinators.

Because he has the example of Turing Machines as an interaction system, it follows that the interaction combinators are universal in the Turing sense.

The most interesting thing for me is that Lafont has a notion of universality for interaction systems, the one he uses in his Theorem 1. This universality of interaction combinators is somehow larger than the universality in the sense of Turing. It is a notion of universality at the level of graph rewrite systems, or, if you want, at the level of chemical reactions!

Indeed, why not proceed as in chemlambda and see an interaction rule as if it’s a chemical reaction? We may add an “enzyme” per interaction rule, or we may try to make the reaction conservative (in the number of nodes and wires) as we did in chemlambda strings.

Probably the rewrites of chemlambda are also universal in the class of directed interaction networks. If we take seriously that graph rewrites are akin to chemical reactions then the universality in the sense of Lafont means, more or less:

any finite collection of chemical reactions among a finite number of patterns of chemical molecules can be translated into reactions among chemlambda molecules

But why keep talking about chemlambda and not about the original interaction combinators of Lafont? Let’s make the same hypothesis as in the article Molecular computers and deduce that:

such molecular computers which embody the interaction combinators rewrites as chemical reactions can indeed simulate any other finite collection of chemical reactions, in particular life.

For me that is the true meaning of Lafont universality.

# Back to the drawing board: all strings

UPDATE: Better look at “chemlambda strings”, which eliminates enzymes and is conservative! Link to original and link to archived version.
All is strings. Make and break strings.

Define backbone moves.

But any machine would do.

# An exercise with convex analysis and neural networks

This is a note about a simple use of convex analysis in relation with neural networks. There are many points of contact between convex analysis and neural networks, but I have not been able to locate this one, so I would be grateful for a pointer to a source, if there is one.

Let’s start with a directed graph with set of nodes $N$ (these are the neurons) and a set of directed bonds $B$. Each bond has a source and a target, which are neurons, therefore there are source and target functions

$s:B \rightarrow N$   , $t:B \rightarrow N$

so that for any bond $x \in B$ the neuron $a = s(x)$ is the source of the bond and the neuron $b = t(x)$ is the target of the bond.

For any neuron $a \in N$:

• let $in(a) \subset B$ be the set of bonds $x \in B$ with target $t(x)=a$,
• let $out(a) \subset B$ be the set of bonds $x \in B$ with source $s(x)=a$.

A state of the network is a function $u: B \rightarrow V^{*}$ where $V^{*}$ is the dual of a real vector space $V$. I’ll explain why in a moment, but it’s nothing strange: I’ll suppose that $V$ and $V^{*}$ are dual topological vector spaces, with duality product denoted by $(u,v) \in V \times V^{*} \mapsto \langle v, u \rangle$ such that any linear and continuous function from $V$ to the reals is expressed by an element of $V^{*}$ and, similarly, any linear and continuous function from $V^{*}$ to the reals is expressed by an element of $V$.

If you think that’s too much, just imagine $V=V^{*}$ to be a finite-dimensional euclidean vector space with the euclidean scalar product denoted with the $\langle , \rangle$ notation.

A weight of the network is a function $w:B \rightarrow Lin(V^{*}, V)$, you’ll see why in a moment.

Usually the state of the network is described by a function which associates to any bond $x \in B$ a real value $u(x)$. A weight is a function which is defined on bonds and with values in the reals. This corresponds to the choice $V = V^{*} = \mathbb{R}$ and $\langle v, u \rangle = uv$. A linear function from $V^{*}$ to $V$ is just a real number $w$.

The activation function of a neuron $a \in N$ gives a relation between the values of the state on the input bonds and the values of the state of the output bonds: any value of an output bond is a function of the weighted sum of the values of the input bonds. Usually (but not exclusively) this is an increasing continuous function.

The integral of an increasing continuous function is a convex function. I’ll call this integral the activation potential $\phi$ (suppose it does not depend on the neuron, for simplicity). The relation between the input and output values is the following:

for any neuron $a \in N$ and for any bond $y \in out(a)$ we have

$u(y) = D \phi ( \sum_{x \in in(a)} w(x) u(x) )$.
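
Here is a minimal sketch of this relation in the scalar case $V = V^{*} = \mathbb{R}$. The tiny network, the numerical values and the helper names (`BONDS`, `fire`) are illustrative assumptions. The logistic function plays the role of $D \phi$, and the values on the bonds entering neuron `a` are simply given.

```python
import math

# bond name -> (source neuron, target neuron); an illustrative network
BONDS = {"x1": ("i1", "a"), "x2": ("i2", "a"), "y": ("a", "b")}


def in_bonds(a):
    """in(a): bonds whose target is the neuron a."""
    return [x for x, (_, t) in BONDS.items() if t == a]


def out_bonds(a):
    """out(a): bonds whose source is the neuron a."""
    return [x for x, (s, _) in BONDS.items() if s == a]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def fire(a, u, w, d_phi=logistic):
    """Set u(y) = D phi(sum_{x in in(a)} w(x) u(x)) for every y in out(a)."""
    z = sum(w[x] * u[x] for x in in_bonds(a))
    for y in out_bonds(a):
        u[y] = d_phi(z)
    return u


u = {"x1": 1.0, "x2": -2.0}            # given values on the input bonds
w = {"x1": 0.5, "x2": 0.25, "y": 1.0}  # one real weight per bond
fire("a", u, w)
print(u["y"])  # the weighted sum is 0.5 - 0.5 = 0, so this is logistic(0) = 0.5
```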

This relation generalizes to:

for any neuron $a \in N$ and for any bond $y \in out(a)$ we have

$u(y) \in \partial \phi ( \sum_{x \in in(a)} w(x) u(x) )$

where $\partial \phi$ is the subgradient of a convex and lower semicontinuous activation potential

$\phi: V \rightarrow \mathbb{R} \cup \left\{ + \infty \right\}$

Written like this, we are done with any smoothness assumptions, which is one of the strong features of convex analysis.

This subgradient relation also explains the maybe strange definition of states and weights with the vector spaces $V$ and $V^{*}$.

This subgradient relation can be expressed as the minimum of a cost function. Indeed, to any convex function $\phi$ is associated a sync (short for “synchronized convex function”, a notion introduced in [1])

$c: V \times V^{*} \rightarrow \mathbb{R} \cup \left\{ + \infty \right\}$

$c(u,v) = \phi(u) + \phi^{*}(v) - \langle v, u \rangle$

where $\phi^{*}$ is the Fenchel dual of the function $\phi$, defined by

$\phi^{*}(v) = \sup \left\{ \langle v, u \rangle - \phi(u) \mbox{ : } u \in V \right\}$

This sync has the following properties:

• it is convex in each argument
• $c(u,v) \geq 0$ for any $(u,v) \in V \times V^{*}$
• $c(u,v) = 0$ if and only if $v \in \partial \phi(u)$.

With the sync we can produce a cost associated to the neuron: for any $a \in N$, the contribution to the cost of the state $u$ and of the weight $w$ is

$\sum_{y \in out(a)} c(\sum_{x \in in(a)} w(x) u(x) , u(y) )$.

The total cost function $C(u,w)$ is

$C(u,w) = \sum_{a \in N} \sum_{y \in out(a)} c(\sum_{x \in in(a)} w(x) u(x) , u(y) )$

and it has the following properties:

• $C(u,w) \geq 0$ for any state $u$ and any weight $w$
• $C(u,w) = 0$ if and only if for any neuron $a \in N$ and for any bond $y \in out(a)$ we have

$u(y) \in \partial \phi ( \sum_{x \in in(a)} w(x) u(x) )$

so that’s a good cost function.
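
A small numerical sketch of these two properties, in the scalar case and with the softplus potential (whose derivative is the logistic function). The two-neuron cyclic network, the weights and the function names are illustrative assumptions. The network is chosen cyclic so that every neuron has both incoming and outgoing bonds, and a state with zero cost is found by fixed-point iteration.

```python
import math


def softplus(z):
    # numerically stable ln(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softplus_dual(v):
    """Fenchel dual of softplus: v ln v + (1-v) ln(1-v) on [0,1], +inf outside."""
    if v < 0.0 or v > 1.0:
        return math.inf

    def xlogx(t):
        return t * math.log(t) if t > 0.0 else 0.0

    return xlogx(v) + xlogx(1.0 - v)


def sync(z, v):
    """c(z, v) = phi(z) + phi*(v) - v z"""
    return softplus(z) + softplus_dual(v) - v * z


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# bond name -> (source, target): two neurons feeding each other
BONDS = {"x": ("a", "b"), "y": ("b", "a")}


def total_cost(u, w):
    neurons = {n for bond in BONDS.values() for n in bond}
    cost = 0.0
    for a in neurons:
        z = sum(w[x] * u[x] for x, (_, t) in BONDS.items() if t == a)
        cost += sum(sync(z, u[y]) for y, (s, _) in BONDS.items() if s == a)
    return cost


w = {"x": 1.0, "y": -2.0}
u = {"x": 0.5, "y": 0.5}
for _ in range(200):  # iterate the activation relation until it stabilizes
    u = {"x": logistic(w["y"] * u["y"]), "y": logistic(w["x"] * u["x"])}

cost_at_solution = total_cost(u, w)
cost_perturbed = total_cost({"x": 0.9, "y": 0.1}, w)
print(cost_at_solution, cost_perturbed)
```

The cost vanishes (up to rounding) on the state which satisfies the activation relation at every neuron, and is strictly positive on the perturbed state.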

Example:

• take $\phi$ to be the softplus function $\phi(u) =\ln(1+\exp(u))$
• then the activation function (i.e. the subgradient) is the logistic function
• and the Fenchel dual of the softplus function is the (negative of the) binary entropy $\phi^{*}(v) = v \ln(v) + (1-v) \ln(1-v)$ (extended by $0$ for $v = 0$ or $v = 1$ and equal to $+ \infty$ outside the closed interval $[0,1]$).
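
The example can be checked numerically. The sketch below compares a brute-force evaluation of the Fenchel dual (a maximum over a grid, which is an approximation of the supremum) with the closed formula, and compares a finite-difference derivative of the softplus with the logistic function. The grid bounds, the step sizes and the function names are arbitrary illustrative choices.

```python
import math


def softplus(u):
    # numerically stable ln(1 + exp(u))
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def neg_binary_entropy(v):
    """v ln v + (1-v) ln(1-v), extended by 0 at v = 0 and v = 1."""
    def xlogx(t):
        return t * math.log(t) if t > 0.0 else 0.0

    return xlogx(v) + xlogx(1.0 - v)


def fenchel_dual_on_grid(phi, v, lo=-30.0, hi=30.0, n=60001):
    """Approximate sup_u { v u - phi(u) } by a maximum over a uniform grid."""
    step = (hi - lo) / (n - 1)
    return max(v * (lo + i * step) - phi(lo + i * step) for i in range(n))


dual_errors = [
    abs(fenchel_dual_on_grid(softplus, v) - neg_binary_entropy(v))
    for v in (0.1, 0.5, 0.9)
]

h = 1e-5
derivative_errors = [
    abs((softplus(u + h) - softplus(u - h)) / (2 * h) - logistic(u))
    for u in (-3.0, 0.0, 2.0)
]
print(max(dual_errors), max(derivative_errors))
```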

________

[1] Blurred maximal cyclically monotone sets and bipotentials, with Géry de Saxcé and Claude Vallée, Analysis and Applications 8 (2010), no. 4, 1-14, arXiv:0905.0068

_______________________________

# Do triangulations of oriented surfaces compute?

In a precise sense, which I shall explain, they do. But the way they do it is hidden behind the fact that the rewrites seem non local.

1. They compute, because ribbon graphs with colored, trivalent nodes and directed edges do compute, via the encoding of untyped lambda terms into this family of graphs, provided by chemlambda. Indeed, a chemlambda molecule is a ribbon graph with these properties. If you want to encode a lambda term into chemlambda then there is a simple procedure: start from the lambda term in a form which eliminates the need for any alpha conversion. Then build the syntactic tree and replace the nodes by A nodes for application and L nodes for lambda abstraction (don’t forget that L nodes have one in port and two out ports, unlike the syntactic tree node for lambda abstraction). Then eliminate the variables which are at the leaves by grafting trees of FO (green fanout) nodes from the lambda abstraction node to the places where the variables occur, or by grafting T (terminal) nodes to the lambda node which issues a variable which does not occur later, or simply by erasing the variable label for those variables which are not issued from an abstraction. That’s it: you get a ribbon graph which is open (it has at least the root half-edge and maybe the half-edges for the variables which don’t come from an abstraction), but then you may add FRIN (free in) and FROUT (free out) nodes, think about them as tadpoles, and you get a trivalent ribbon graph. The dual of this graph is (equivalent to) a triangulated, oriented surface, which has colored faces (corresponding to the nodes of the graph) and directed edges, such that there are no faces with the 3 edges directed in a cyclic way.
2. How do they compute? Chemlambda uses a set of graph rewrites which has some classic ones, like the Wadsworth-Lamping graphical version of the beta move, but it has two types of fanouts (FO and FOE), one FANIN, and rules for distributivity which differ from the usual ones. Look at the moves page to see them. All these rewrites are local, in the sense that there is a small number, fixed a priori, which is an upper bound for the number of nodes and edges which enter (in any way) into the graph rewrite (as a condition, as the left pattern, or as the right pattern). The algorithm of application of the rewrites is a very important piece which is needed to make a model of computation. The algorithm is very simple and can be deterministic or random. In the deterministic case it consists of the application of as many rewrites as possible, with priority for the distributivity moves in case of conflict; in the random case, it’s just random application of rewrites.

Here is an example, where I play with the reduction of false omega id in chemlambda

3. Now let’s pass to the duals, the triangulated surfaces. The nodes of the triangulated surface correspond to the faces of the ribbon graph. But the faces of the ribbon graph are global notions, because they are the orbits of a permutation. After one of the rewrites, the faces (of the ribbon graph) change in a way which has to be non-local, because one has to compute again the orbits of the permutation for the new graph, and there is no upper bound on the number of half-edges which have to be visited for doing that.
4. So triangulated, oriented surfaces do compute, but the rewrites and the algorithm of application are hidden behind this duality. They are non-local for triangulated surfaces, but local for ribbon graphs.
5. Finally, a word of caution: these surfaces do not compute by being arrows in a category. They don’t compute in this usual, say Turaev kind of way. They compute by (the duals of) the rewrites. There is nothing other than triangulated surfaces, colored by 3 colors (red, green, yellow), and there is no decoration which actually does the computation by substitution and evaluation. I don’t know why, but this seems very hard for many to understand. Really, these surfaces compute by rewrites on the triangulations, not by anything else.
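
To illustrate why faces are global, here is a sketch which computes the faces of a ribbon graph as orbits of a permutation of half-edges. It uses the standard combinatorial description: `sigma` gives the cyclic order of half-edges around each node and `alpha` pairs the two half-edges of each edge. The example graph (two trivalent nodes joined by three edges) and all names are illustrative assumptions, and the local change shown is a flip of the cyclic order at one node, not a chemlambda rewrite.

```python
def orbits(perm):
    """Orbits of a permutation given as a dict h -> perm(h)."""
    seen, result = set(), []
    for start in perm:
        if start in seen:
            continue
        orbit, h = [], start
        while h not in seen:
            seen.add(h)
            orbit.append(h)
            h = perm[h]
        result.append(orbit)
    return result


def faces(sigma, alpha):
    """Faces are the orbits of h -> sigma(alpha(h))."""
    return orbits({h: sigma[alpha[h]] for h in sigma})


# Half-edges 0,1,2 sit at node A and 3,4,5 at node B; edges pair 0-3, 1-4, 2-5.
alpha = {0: 3, 3: 0, 1: 4, 4: 1, 2: 5, 5: 2}
sigma_one = {0: 1, 1: 2, 2: 0, 3: 4, 4: 5, 5: 3}  # cyclic orders (0 1 2)(3 4 5)
sigma_two = {0: 1, 1: 2, 2: 0, 3: 5, 5: 4, 4: 3}  # order flipped at node B only

faces_one = faces(sigma_one, alpha)
faces_two = faces(sigma_two, alpha)
print(len(faces_one), faces_one)
print(len(faces_two), faces_two)
```

With the first cyclic orders there is a single face which visits all six half-edges (a torus, since 2 - 3 + 1 = 0), while flipping the order at one node gives three faces (a sphere, since 2 - 3 + 3 = 2). A change confined to one node changes every face, and finding the new faces means walking the orbits again.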

ADDED: If you look at the tadpoles as pinches, then make the easy effort to see what the SKI formalism looks like, you’ll see funny things. The I combinator is the sphere with one pinch (the plane), the K combinator is the sphere with two pinches (cylinder) and the S combinator is the torus with one pinch. But what is SKK=I? What is KAB=A? What you see in the dual (i.e. in the triangulation) depends globally on the whole term, so these reductions do not appear to be the same topological manipulations in different contexts.