This is a note about a simple use of convex analysis in relation to neural networks. There are many points of contact between convex analysis and neural networks, but I have not been able to locate this one in the literature; I would be grateful for a pointer to a source, if one exists.
Let’s start with a directed graph with a set of nodes $N$ (these are the neurons) and a set of directed bonds $B$. Each bond has a source and a target, which are neurons, therefore there are source and target functions
$$s: B \to N, \qquad t: B \to N,$$
so that for any bond $x \in B$ the neuron $s(x)$ is the source of the bond and the neuron $t(x)$ is the target of the bond.
For any neuron $a \in N$:
- let $\mathrm{in}(a) \subset B$ be the set of bonds $x$ with target $t(x) = a$,
- let $\mathrm{out}(a) \subset B$ be the set of bonds $x$ with source $s(x) = a$.
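As a minimal sketch of this bookkeeping, assuming bonds and neurons are simply labelled by strings (the names `bonds` and `incidence` and the toy graph are made up for illustration), the sets of input and output bonds of each neuron can be computed from the source and target of every bond:

```python
from collections import defaultdict

# Hypothetical toy network: each bond maps to its (source, target) neurons.
bonds = {
    "x1": ("a", "c"),
    "x2": ("b", "c"),
    "y1": ("c", "d"),
}


def incidence(bonds):
    """Return (inputs, outputs): for each neuron, the set of bonds whose
    target is that neuron and the set of bonds whose source is that neuron."""
    inputs, outputs = defaultdict(set), defaultdict(set)
    for bond, (source, target) in bonds.items():
        outputs[source].add(bond)
        inputs[target].add(bond)
    return inputs, outputs


inputs, outputs = incidence(bonds)
print(sorted(inputs["c"]))   # bonds with target c
print(sorted(outputs["c"]))  # bonds with source c
```

A neuron with no incoming (or outgoing) bonds just gets the empty set, which is why `defaultdict(set)` is used here.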
A state of the network is a function $u: B \to V^*$, where $V^*$ is the dual of a real vector space $V$. I’ll explain why in a moment, but it’s nothing strange: I’ll suppose that $V$ and $V^*$ are dual topological vector spaces, with duality product denoted by $(z, v) \in V \times V^* \mapsto \langle z, v \rangle \in \mathbb{R}$, such that any linear and continuous function from $V$ to the reals is expressed by an element of $V^*$ and, similarly, any linear and continuous function from $V^*$ to the reals is expressed by an element of $V$.
If you think that’s too much, just imagine $V = V^*$ to be a finite-dimensional Euclidean vector space, with the Euclidean scalar product denoted with the $\langle \cdot, \cdot \rangle$ notation.
A weight of the network is a function $w: B \to \mathrm{Lin}(V^*, V)$, where $\mathrm{Lin}(V^*, V)$ denotes the linear functions from $V^*$ to $V$; you’ll see why in a moment.
Usually the state of the network is described by a function which associates to any bond $x \in B$ a real value $u(x)$. A weight is a function which is defined on bonds and with values in the reals. This corresponds to the choice $V = V^* = \mathbb{R}$ and $\langle z, v \rangle = zv$. A linear function from $V^*$ to $V$ is just a real number $w$.
The activation function of a neuron $a \in N$ gives a relation between the values of the state on the input bonds and the values of the state on the output bonds: any value of an output bond is a function of the weighted sum of the values of the input bonds. Usually (but not exclusively) this is an increasing continuous function.
The integral of an increasing continuous function is a convex function. I’ll call this integral the activation potential $\phi$ (suppose it does not depend on the neuron, for simplicity). The relation between the input and output values is the following:
for any neuron $a \in N$ and for any bond $y \in \mathrm{out}(a)$ we have
$$u(y) = D\phi\left( \sum_{x \in \mathrm{in}(a)} w(x)\, u(x) \right).$$
This relation generalizes to:
for any neuron $a \in N$ and for any bond $y \in \mathrm{out}(a)$ we have
$$u(y) \in \partial\phi\left( \sum_{x \in \mathrm{in}(a)} w(x)\, u(x) \right),$$
where $\partial\phi$ is the subgradient of a convex and lower semicontinuous activation potential
$$\phi: V \to \mathbb{R} \cup \{+\infty\}.$$
Written like this, we are done with any smoothness assumptions, which is one of the strong features of convex analysis.
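As a small illustration of what the subgradient form allows (my own example, not taken from a source, assuming the scalar case $V = V^* = \mathbb{R}$ and writing $\phi$ for the activation potential), take the non-differentiable potential $\phi(z) = \max(0, z)$. Its subgradient is

$$\partial\phi(z) = \begin{cases} \{0\} & \text{if } z < 0, \\ [0, 1] & \text{if } z = 0, \\ \{1\} & \text{if } z > 0, \end{cases}$$

so the relation describes a neuron with a step activation, where any output value in $[0,1]$ is admissible exactly at the threshold.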
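This subgradient relation also explains the maybe strange definition of states and weights with the vector spaces $V$ and $V^*$: the weighted sum of the input states lives in $V$, where the potential is defined, while the subgradients of the potential, hence the output states, live in $V^*$.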
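This subgradient relation can be expressed as the minimum of a cost function. Indeed, to any convex function $\phi$ is associated a sync (short for “synchronized convex function”, a notion introduced in [1])
$$c: V \times V^* \to \mathbb{R} \cup \{+\infty\}, \qquad c(z, v) = \phi(z) + \phi^*(v) - \langle z, v \rangle,$$
where $\phi^*$ is the Fenchel dual of the function $\phi$, defined by
$$\phi^*(v) = \sup \left\{ \langle z, v \rangle - \phi(z) \, : \, z \in V \right\}.$$
This sync has the following properties:
- it is convex in each argument
- $c(z, v) \geq 0$ for any $(z, v) \in V \times V^*$
- $c(z, v) = 0$ if and only if $v \in \partial\phi(z)$.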
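These properties are the Fenchel-Young inequality in disguise. Here is a short sketch, assuming the sync has the form $c(z, v) = \phi(z) + \phi^*(v) - \langle z, v \rangle$ with $\phi$ convex and lower semicontinuous and $\phi^*$ its Fenchel dual. Since $\phi^*(v)$ is a supremum over all $z$, for each fixed $z$ we have

$$\phi^*(v) \geq \langle z, v \rangle - \phi(z), \qquad \text{that is} \qquad c(z, v) \geq 0,$$

and equality holds exactly when $v \in \partial\phi(z)$. Convexity in each argument is clear because, with the other argument fixed, $c$ is a convex function plus an affine one. The simplest example I can think of is the quadratic potential on a Euclidean space, $\phi(z) = \frac{1}{2} \| z \|^2$, which is its own Fenchel dual, so that

$$c(z, v) = \frac{1}{2} \| z \|^2 + \frac{1}{2} \| v \|^2 - \langle z, v \rangle = \frac{1}{2} \| z - v \|^2,$$

which is nonnegative and vanishes if and only if $v = z = D\phi(z)$.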
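With the sync $c$ we can produce a cost associated to the neuron: for any $a \in N$, the contribution to the cost of the state $u$ and of the weight $w$ is
$$\sum_{y \in \mathrm{out}(a)} c\left( \sum_{x \in \mathrm{in}(a)} w(x)\, u(x), \; u(y) \right).$$
The total cost function is
$$C(u, w) = \sum_{a \in N} \, \sum_{y \in \mathrm{out}(a)} c\left( \sum_{x \in \mathrm{in}(a)} w(x)\, u(x), \; u(y) \right)$$
and it has the following properties:
- $C(u, w) \geq 0$ for any state $u$ and any weight $w$
- $C(u, w) = 0$ if and only if for any neuron $a \in N$ and for any bond $y \in \mathrm{out}(a)$ we have
$$u(y) \in \partial\phi\left( \sum_{x \in \mathrm{in}(a)} w(x)\, u(x) \right),$$
so that’s a good cost function.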
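To see the total cost at work, here is a minimal sketch in the scalar case (real states and real weights). Everything specific in it is an assumption made for illustration: the quadratic potential $\phi(z) = z^2/2$, whose sync is $(z - v)^2/2$ and whose activation is the identity, the function names, and the toy two-neuron loop.

```python
def quadratic_sync(z, v):
    # Sync of phi(z) = z**2 / 2 in the scalar case: phi is its own Fenchel
    # dual, so c(z, v) = z**2/2 + v**2/2 - z*v = (z - v)**2 / 2.
    return 0.5 * (z - v) ** 2


def total_cost(bonds, u, w, sync=quadratic_sync):
    """Sum, over neurons a and bonds y leaving a, of
    sync(weighted sum of the states on the bonds entering a, u[y])."""
    neurons = {n for pair in bonds.values() for n in pair}
    cost = 0.0
    for a in neurons:
        # Weighted sum over the bonds whose target is a (0 if there are none).
        pre = sum(w[x] * u[x] for x, (_, target) in bonds.items() if target == a)
        for y, (source, _) in bonds.items():
            if source == a:
                cost += sync(pre, u[y])
    return cost


# Hypothetical two-neuron loop: bond "x" goes a -> b, bond "y" goes b -> a.
bonds = {"x": ("a", "b"), "y": ("b", "a")}
w = {"x": 2.0, "y": 0.5}

consistent = {"x": 1.0, "y": 2.0}    # u(y) = w(x) u(x) and u(x) = w(y) u(y)
inconsistent = {"x": 1.0, "y": 1.0}  # violates both relations

print(total_cost(bonds, consistent, w))    # 0.0
print(total_cost(bonds, inconsistent, w))  # 0.625
```

The state that satisfies the activation relation at every neuron has zero cost, while the other one is penalized at both neurons (0.125 at `a` and 0.5 at `b`). Passing a different `sync` swaps in another activation potential without touching the graph code.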
Example:
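- take $\phi$ to be the softplus function $\phi(z) = \ln(1 + \exp(z))$
- then the activation function (i.e. the subgradient) is the logistic function $D\phi(z) = \frac{1}{1 + \exp(-z)}$
- and the Fenchel dual of the softplus function is the (negative of the) binary entropy $\phi^*(v) = v \ln v + (1 - v) \ln(1 - v)$ (extended by $0$ for $v = 0$ or $v = 1$ and equal to $+\infty$ outside the closed interval $[0, 1]$).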
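The softplus example can be checked numerically. The sketch below, in the scalar case, compares the closed form of the Fenchel dual with a brute-force supremum over a grid and evaluates the sync $c(z, v) = \phi(z) + \phi^*(v) - zv$. The function names, the grid step and the grid radius are arbitrary choices of mine.

```python
import math


def softplus(z):
    # ln(1 + exp(z)), written so that exp never overflows.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logistic(z):
    # Derivative of softplus: 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus_dual(v):
    # Negative binary entropy on [0, 1] (with 0 * ln 0 = 0), +inf outside.
    if v < 0.0 or v > 1.0:
        return math.inf
    return sum(t * math.log(t) for t in (v, 1.0 - v) if t > 0.0)


def dual_on_grid(v, step=0.001, radius=20.0):
    # Brute-force approximation of sup_z (z*v - softplus(z)) on a grid.
    n = int(radius / step)
    return max(k * step * v - softplus(k * step) for k in range(-n, n + 1))


def sync(z, v):
    # c(z, v) = phi(z) + phi*(v) - z*v for phi = softplus.
    return softplus(z) + softplus_dual(v) - z * v


print(softplus_dual(0.3), dual_on_grid(0.3))  # closed form vs. grid search
print(sync(0.7, logistic(0.7)))               # vanishes up to rounding
print(sync(0.7, 0.3))                         # strictly positive
```

The grid search only makes sense for $v$ strictly between 0 and 1, where the supremum is attained at a finite point; at the endpoints it is only approached in the limit, and outside $[0, 1]$ it is infinite.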
________
[1] Blurred maximal cyclically monotone sets and bipotentials, with Géry de Saxcé and Claude Vallée, Analysis and Applications 8 (2010), no. 4, 1-14, arXiv:0905.0068