This is a note about a simple use of convex analysis in relation with neural networks. There are many points of contact between convex analysis and neural networks, but I have not been able to locate this particular one; I would be grateful if you could point me to a source, if any.
Let’s start with a directed graph with a set of nodes $N$ (these are the neurons) and a set of directed bonds $B$. Each bond has a source and a target, which are neurons, therefore there are source and target functions

$s: B \to N$ and $t: B \to N$

so that for any bond $b \in B$ the neuron $s(b)$ is the source of the bond and the neuron $t(b)$ is the target of the bond.
For any neuron $n \in N$:
- $in(n) = \{ b \in B : t(b) = n \}$ is the set of its input bonds,
- $out(n) = \{ b \in B : s(b) = n \}$ is the set of its output bonds.
A state of the network is a function $x: B \to V^{*}$, where $V^{*}$ is the dual of a real vector space $V$. I’ll explain why in a moment, but it’s nothing strange: I’ll suppose that $V$ and $V^{*}$ are dual topological vector spaces, with duality product denoted by $\langle \cdot , \cdot \rangle : V \times V^{*} \to \mathbb{R}$, such that any linear and continuous function from $V$ to the reals is expressed by an element of $V^{*}$ and, similarly, any linear and continuous function from $V^{*}$ to the reals is expressed by an element of $V$.
If you think that’s too much, just imagine $V = V^{*} = \mathbb{R}^{n}$, a finite dimensional euclidean vector space, with the euclidean scalar product denoted by the $\langle \cdot , \cdot \rangle$ notation.
A weight of the network is a function $w$ which associates to any bond $b \in B$ a linear and continuous map $w_{b}: V^{*} \to V$; you’ll see why in a moment.
Usually the state of the network is described by a function which associates to any bond $b$ a real value $x_{b}$. A weight is a function which is defined on bonds and with values in the reals. This corresponds to the choice $V = V^{*} = \mathbb{R}$, with $\langle u , p \rangle = up$. A linear function from $V^{*} = \mathbb{R}$ to $V = \mathbb{R}$ is just a real number $w_{b}$.
The activation function of a neuron gives a relation between the values of the state on the input bonds and the values of the state on the output bonds: any value of an output bond is a function of the weighted sum of the values of the input bonds. Usually (but not exclusively) this is an increasing continuous function.
The integral of an increasing continuous function is a convex function. I’ll call this integral the activation potential $\Phi$ (suppose it does not depend on the neuron, for simplicity). The relation between the input and output values is the following:
for any neuron $n \in N$ and for any bond $b \in out(n)$ we have

$x_{b} = \Phi' \left( \sum_{a \in in(n)} w_{a} x_{a} \right)$
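As a concrete illustration (my example, not from the original note): the sigmoid activation $\varphi(t) = 1/(1+e^{-t})$ is increasing and continuous, its integral is the softplus $\Phi(t) = \log(1 + e^{t})$, which is indeed convex, and the relation above is just the usual forward pass $x_{b} = \varphi\left( \sum_{a \in in(n)} w_{a} x_{a} \right)$ of a sigmoid neuron.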
This relation generalizes to:
for any neuron $n \in N$ and for any bond $b \in out(n)$ we have

$x_{b} \in \partial \Phi \left( \sum_{a \in in(n)} w_{a}(x_{a}) \right)$
where $\partial \Phi$ is the subgradient of a convex and lower semicontinuous activation potential $\Phi: V \to \mathbb{R} \cup \{ +\infty \}$.
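To see what the subgradient formulation buys us, here is a nonsmooth example (mine, for illustration): take as activation potential $\Phi(t) = \max(t, 0)$ on $V = V^{*} = \mathbb{R}$, which is convex and lower semicontinuous but not differentiable at $0$. Its subgradient is $\partial \Phi(t) = \{0\}$ for $t < 0$, $\partial \Phi(t) = [0,1]$ for $t = 0$ and $\partial \Phi(t) = \{1\}$ for $t > 0$, so the relation describes a hard threshold (step) unit, with the whole interval $[0,1]$ of output values allowed exactly at the threshold.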
Written like this, we have dispensed with any smoothness assumptions, which is one of the strong features of convex analysis.
This subgradient relation also explains the maybe strange definition of states and weights with the vector spaces $V$ and $V^{*}$: the weighted sum $\sum_{a \in in(n)} w_{a}(x_{a})$, the argument of $\partial \Phi$, has to live in $V$, while the state value $x_{b}$ lives in $V^{*}$.
This subgradient relation can be expressed as the minimum of a cost function. Indeed, to any convex and lower semicontinuous function $\Phi$ is associated a sync (short for “synchronized convex function”, a notion introduced in [1])

$c: V \times V^{*} \to \mathbb{R} \cup \{ +\infty \}, \quad c(u, p) = \Phi(u) + \Phi^{*}(p) - \langle u , p \rangle$
where $\Phi^{*}$ is the Fenchel dual of the function $\Phi$, defined by

$\Phi^{*}(p) = \sup \{ \langle u , p \rangle - \Phi(u) \ : \ u \in V \}$
This sync has the following properties:
- $c(u, p) \geq 0$ for any $u \in V$ and any $p \in V^{*}$,
- $c(u, p) = 0$ if and only if $p \in \partial \Phi(u)$.
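A quick sanity check with the simplest potential (my example): for $\Phi(u) = u^{2}/2$ on $V = V^{*} = \mathbb{R}$ the Fenchel dual is $\Phi^{*}(p) = p^{2}/2$, so $c(u, p) = \frac{u^{2}}{2} + \frac{p^{2}}{2} - up = \frac{(u - p)^{2}}{2}$, which is indeed nonnegative and vanishes exactly when $p = u = \Phi'(u)$.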
With the sync we can produce a cost associated to the neuron $n$: for any $b \in out(n)$, the contribution to the cost of the state $x$ and of the weight $w$ is

$c \left( \sum_{a \in in(n)} w_{a}(x_{a}) \ , \ x_{b} \right)$
The total cost function is

$C(x, w) = \sum_{n \in N} \ \sum_{b \in out(n)} c \left( \sum_{a \in in(n)} w_{a}(x_{a}) \ , \ x_{b} \right)$

and it has the following properties:
- $C(x, w) \geq 0$ for any state $x$ and any weight $w$,
- $C(x, w) = 0$ if and only if the subgradient relation holds at every neuron $n \in N$ and every output bond $b \in out(n)$,
so that’s a good cost function.
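Here is a minimal numerical sketch of the construction above (my own illustration; the chain network, the variable names and the choice of softplus potential are assumptions, not from the note). It checks, in the scalar case $V = V^{*} = \mathbb{R}$, that the total cost vanishes exactly when the state is produced by the forward pass and becomes strictly positive otherwise.

```python
# Minimal sketch, assuming the scalar case V = V* = R and the softplus potential.
import math

def phi(t):          # activation potential Phi(t) = log(1 + e^t) (softplus), convex
    return math.log(1.0 + math.exp(t))

def phi_star(p):     # Fenchel dual of softplus, finite for 0 < p < 1
    return p * math.log(p) + (1.0 - p) * math.log(1.0 - p)

def sync(u, p):      # c(u, p) = Phi(u) + Phi*(p) - <u, p>  >= 0
    return phi(u) + phi_star(p) - u * p

def sigmoid(t):      # activation function = derivative of the potential
    return 1.0 / (1.0 + math.exp(-t))

# A tiny chain (hypothetical example): input bond b0 -> neuron n1 -> bond b1 -> neuron n2 -> bond b2
# in(n1) = {b0}, out(n1) = {b1}, in(n2) = {b1}, out(n2) = {b2}
w = {"b0": 0.7, "b1": -1.3}   # weights on the input bonds of n1 and n2
x = {"b0": 0.5}               # state on the network's input bond

# Forward pass: x_b = Phi'(weighted sum of the inputs) for each output bond
x["b1"] = sigmoid(w["b0"] * x["b0"])
x["b2"] = sigmoid(w["b1"] * x["b1"])

# Total cost: sum over neurons and their output bonds of the sync
cost = sync(w["b0"] * x["b0"], x["b1"]) + sync(w["b1"] * x["b1"], x["b2"])
print(cost)   # ~0 (up to rounding): the state satisfies the subgradient relation

# Perturb one state value: the cost becomes strictly positive
x["b2"] += 0.1
cost = sync(w["b0"] * x["b0"], x["b1"]) + sync(w["b1"] * x["b1"], x["b2"])
print(cost)   # > 0
```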
[1] Blurred maximal cyclically monotone sets and bipotentials, with Géry de Saxcé and Claude Vallée, Analysis and Applications 8 (2010), no. 4, 1-14, arXiv:0905.0068
In a precise sense, which I shall explain, they do. But the way they do it is hidden behind the fact that the rewrites seem non local.
Here is an example, where I play with the reduction of the term false omega id in chemlambda.
ADDED: If you look at the tadpoles as pinches, then make the easy effort to see what the SKI formalism looks like, you’ll see funny things. The I combinator is the sphere with one pinch (the plane), the K combinator is the sphere with two pinches (the cylinder) and the S combinator is the torus with one pinch. But what is SKK=I? What is KAB=A? What you see in the dual (i.e. in the triangulation) depends globally on the whole term, so these reductions do not appear to be the same topological manipulations in different contexts.