One of the first articles with means for complete validation by reproducibility

I have not stressed this aspect enough. The article

M. Buliga, Molecular computers

is one of the first articles which comes with complete means of validation by reproducibility.

This means that along with the content of the article, which contains animations and links to demonstrations, comes a github repository with the scripts which can be used to validate (or invalidate, of course) this work.

I can’t show you here what the article looks like, but I can show you a gif created from this video of a demonstration which also appears in the article (with simpler settings, though, so as not to punish the browser too much).


This is a chemistry-like computation of the Ackermann(2,2) function.

In itself, it is intended to show that if autonomous computing molecules can be created by the means proposed in the article, then impressive feats can be achieved.

This is part of the discussion about peer review and the need to move to a more evolved way of communicating science. There are several efforts in this direction, for example PeerJ’s paper-now, commented in this post. See also the post Fascinating: micropublications, hypothes.is for more!

Presently one of the most important pieces of this is peer review, the social practice consisting of declarations by one, two, four, etc. anonymous professionals that they have checked the work and consider it valid.

Instead, the ideal should be the article which runs in the browser, i.e. one which comes with the means to allow anybody to validate it, up to external resources such as the works of other authors.

(For example, if I write in my article that “According to the work [1], A is true. Here we prove that B follows from A.”, then I should provide means to validate the proof that A implies B, but it would be unrealistic to ask me to provide means to validate A.)

This is explained in more detail in Reproducibility vs peer review.

Therefore, if you care about evolving the form of the scientific article, then you have a concrete, short example of what can be done in this direction.

Mind that I am stubborn enough to cling to this form of publication, not because I am afraid to submit these beautiful ideas to legacy journals, but because I want to promote new ways of sharing research by using the best content I can make.

_________________________________________

A short complete description of chemlambda as a universal model of computation

UPDATE(2020): All you need to read is this.

A mol file is any finite list of items of the following kind, which
satisfies conditions written further. Take a countable collection of
variables, denote them by a, b, c, … . There is a list of symbols:
L, A, FI, FO, FOE, Arrow, T, FRIN, FROUT. A mol file is a list made of
lines of the form:
– L a b c (called abstraction)
– A a b c (called application)
– FI a b c (called fanin)
– FO a b c (called fanout)
– FOE a b c (called other fanout)
– Arrow a b (called arrow)
– T a (called terminal)
– FRIN a (called free in)
– FROUT a (called free out)

The variables which appear in a mol file are called port names.

Condition 1: any port name appears exactly twice

Depending on the symbol (L, A, FI, FO, FOE, Arrow, T, FRIN, FROUT), every port of a node has a pair of types: the first from the list (left, right, middle), the second from the list (in, out). The convention is to write this pair of types as middle.in, or left.out, for example.

Further I repeat the list of possible lines in a mol file, but I shall
replace a, b, c, … by their types:
– L middle.in left.out right.out
– A left.in right.in middle.out
– FI left.in right.in middle.out
– FO middle.in left.out right.out
– FOE middle.in left.out right.out
– Arrow middle.in middle.out
– T middle.in
– FRIN middle.out
– FROUT middle.in

Condition 2: each port name (which appears in exactly two places, according to condition 1) appears in one place with a type *.in and in the other place with a type *.out.
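To make the two conditions concrete, here is a minimal checking sketch in Python (my own, not part of the scripts that come with the article; the port type table is copied from above, everything else, names included, is illustrative):

# Minimal sketch of a checker for Condition 1 and Condition 2.
# Not from the chemlambda repository; the port types follow the table above.

PORT_TYPES = {
    "L":     ["middle.in", "left.out", "right.out"],
    "A":     ["left.in", "right.in", "middle.out"],
    "FI":    ["left.in", "right.in", "middle.out"],
    "FO":    ["middle.in", "left.out", "right.out"],
    "FOE":   ["middle.in", "left.out", "right.out"],
    "Arrow": ["middle.in", "middle.out"],
    "T":     ["middle.in"],
    "FRIN":  ["middle.out"],
    "FROUT": ["middle.in"],
}

def check_mol(lines):
    """lines: a list of lists, e.g. [["L", "a", "b", "c"], ...].
    Returns a list of error messages; an empty list means both conditions hold."""
    errors = []
    occurrences = {}                     # port name -> list of "in"/"out"
    for line in lines:
        sym, ports = line[0], line[1:]
        types = PORT_TYPES.get(sym)
        if types is None or len(ports) != len(types):
            errors.append("bad line: " + " ".join(line))
            continue
        for name, t in zip(ports, types):
            occurrences.setdefault(name, []).append(t.split(".")[1])
    for name, dirs in occurrences.items():
        if len(dirs) != 2:                           # Condition 1
            errors.append("port %s appears %d times" % (name, len(dirs)))
        elif sorted(dirs) != ["in", "out"]:          # Condition 2
            errors.append("port %s is not matched in/out" % name)
    return errors

# example: a mol file which (if I read the conventions right) encodes the identity
# lambda term applied to a free variable
example = [["L", "a", "a", "r"], ["A", "r", "y", "out"],
           ["FRIN", "y"], ["FROUT", "out"]]
print(check_mol(example))    # prints []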

Two mol files define the same molecule up to the following:
– there is a renaming of port names from one mol file to the other
– any permutation of the lines in a mol file gives an equivalent mol file

(The reason is that a mol file is a notation for an oriented graph with trivalent, 2-valent or 1-valent nodes: the locally planar trivalent nodes L, A, FI, FO, FOE, the 2-valent node Arrow and the 1-valent nodes T, FRIN, FROUT. In this notation the port names come from an arbitrary naming of the arrows; the mol file is then a list of the nodes, in arbitrary order, each along with the names of the arrows attached to its ports.)

(In the visualisations these graphs-molecules are turned into undirected graphs, made of nodes of various radii and colours. To any line of a mol file, thus to any node of the oriented graph, correspond up to 4 nodes in the graphs from the visualisations. Indeed, the symbols L, A, FI, FO appear as nodes of radius main_const and colour red_col or green_col; FOE, T and Arrow have different colours. Their respective ports appear as nodes of colour in_col or out_col, for the types “in” or “out”, and of radius “left”, “right” or “middle” for the corresponding types. FRIN has the colour in_col, FROUT has the colour out_col.)
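As a rough illustration of this expansion (a sketch of mine, not the d3.js code used in the demos; the colour and radius values, the id scheme and the function name are all made up), one could write:

# Hypothetical sketch of the mol line -> visualisation nodes expansion described above.
PORT_TYPES = {
    "L":     [("middle", "in"), ("left", "out"), ("right", "out")],
    "A":     [("left", "in"), ("right", "in"), ("middle", "out")],
    "FI":    [("left", "in"), ("right", "in"), ("middle", "out")],
    "FO":    [("middle", "in"), ("left", "out"), ("right", "out")],
    "FOE":   [("middle", "in"), ("left", "out"), ("right", "out")],
    "Arrow": [("middle", "in"), ("middle", "out")],
    "T":     [("middle", "in")],
    "FRIN":  [("middle", "out")],
    "FROUT": [("middle", "in")],
}
CENTRE_COLOUR = {"A": "green", "FO": "green", "L": "red", "FI": "red",
                 "FOE": "grey", "T": "black", "Arrow": "grey",
                 "FRIN": "yellow", "FROUT": "blue"}   # placeholders, not the repo's constants
PORT_COLOUR = {"in": "yellow", "out": "blue"}
PORT_RADIUS = {"left": 3, "middle": 4, "right": 5}    # arbitrary values

def expand(mol_lines):
    """One centre node per mol line, plus one small node per port, linked to the
    centre; the two ports sharing a port name (one in, one out) get linked together."""
    nodes, links, seen = [], [], {}
    for k, line in enumerate(mol_lines):
        sym, ports = line[0], line[1:]
        centre = "%s#%d" % (sym, k)
        nodes.append({"id": centre, "colour": CENTRE_COLOUR[sym]})
        for name, (pos, direction) in zip(ports, PORT_TYPES[sym]):
            pid = "%s:%s.%s" % (centre, pos, direction)
            nodes.append({"id": pid, "colour": PORT_COLOUR[direction],
                          "radius": PORT_RADIUS[pos]})
            links.append((centre, pid))
            if name in seen:              # the other end of the same arrow
                links.append((seen[name], pid))
            else:
                seen[name] = pid
    return nodes, links

nodes, links = expand([["L", "a", "a", "r"], ["A", "r", "y", "out"],
                       ["FRIN", "y"], ["FROUT", "out"]])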

The chemlambda rewrites are of the following form: replace a left
pattern (LT) consisting of 2 lines from a mol file, by a right pattern
(RT) which may consist of one, two, three or four lines. This is
written as LT – – > RT

The rewrites are as follows (with the understanding that the port names 1, 2, …, c from LT represent port names which exist in the mol file before the rewrite, while i, j, k, … appearing in RT but not in LT represent new port names):

http://chorasimilarity.github.io/chemlambda-gui/dynamic/moves.html

– A-L (or beta):
L 1 2 c , A c 4 3 – – > Arrow 1 3 , Arrow 4 2

– FI-FOE (or fan-in):
FI 1 4 c , FOE c 2 3 – – > Arrow 1 3 , Arrow 4 2

– FO-FOE :
FO 1 2 c , FOE c 3 4 – – > FI j i 2 , FO k i 3 , FO l j 4 , FOE 1 k l

– FI-FO:
FI 1 4 c , FO c 2 3 – – > FO 1 i j , FI i k 2 , FI j l 3 , FO 4 k l

– L-FOE:
L 1 2 c , FOE c 3 4 – – > FI j i 2, L k i 3 , L l j 4 , FOE 1 k l

– L-FO:
L 1 2 c , FO c 3 4 – – > FI j i 2 , L k i 3 , L l j 4 , FOE 1 k l

– A-FOE:
A 1 4 c , FOE c 2 3 – – > FOE 1 i j , A i k 2 , A j l 3 , FOE 4 k l

– A-FO:
A 1 4 c , FO c 2 3 – – > FOE 1 i j , A i k 2 , A j l 3 , FOE 4 k l

– A-T:
A 1 2 3 , T 3 – – > T 1 , T 2

– FI-T:
FI 1 2 3 , T 3 – – > T 1 , T 2

– L-T:
L 1 2 3 , T 3 – – > T 1 , FRIN 2

– FO2-T:
FO 1 2 3 , T 2 – – > Arrow 1 3

– FOE2-T:
FOE 1 2 3 , T 2 – – > Arrow 1 3

– FO3-T:
FO 1 2 3 , T 3 – – > Arrow 1 2

– FOE3-T:
FOE 1 2 3 , T 3 – – > Arrow 1 2

– COMB:
for any node M (excepting Arrow) with an out port c:  M … c , Arrow c d – – > M … d (i.e. the Arrow is absorbed and c is replaced by d in M)

Each of these rewrites is seen as a chemical interaction of the mol file (molecule) with an invisible enzyme, which rewrites the LT into the RT.

(Actually one can group the rewrites into families, so that enzymes are needed for (A-L, FI-FOE), for (FO-FOE, L-FOE, L-FO, A-FOE, A-FO, FI-FOE, FI-FO), for (A-T, FI-T, L-T) and for (FO2-T, FO3-T, FOE2-T, FOE3-T). The COMB rewrites, as well as the Arrow elements, have a less clear chemical interpretation and may not be needed, or may even be eliminated; see further how they appear in the reduction algorithm.)
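As a small worked example (my own, written with the notation above), take the mol file which, under these conventions, encodes the identity combinator applied to a free variable:

L a a r
A r y out
FRIN y
FROUT out

The A-L (beta) pattern matches with 1=a, 2=a, c=r, 4=y, 3=out, so the first two lines are replaced by Arrow a out and Arrow y a. Two COMB rewrites then absorb the arrows: FRIN y with Arrow y a gives FRIN a, then FRIN a with Arrow a out gives FRIN out. The final mol file is FRIN out, FROUT out, i.e. the free variable is connected directly to the root, which is what one expects from reducing (λx.x) applied to that variable.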

The algorithm (called “stupid”) for applying the rewrites has two variants: deterministic and random. I explain both below. The random version needs some weights, denoted by wei_*, where * is the name of the rewrite.

(The algorithms actually used in the demos have a supplementary family of weights, used in relation with the count of the enzymes used; I’ll skip this.)

The algorithm takes a mol file as input. Then there is a cycle which repeats (indefinitely, or a prescribed number of times specified at the beginning of the computation, or until no rewrites are possible).

Then it marks all lines of the mol file as unblocked.
Then it creates an empty file of replacement proposals.

Then the  cycle is:

1- identify all LT for the FO-FOE move which do not contain blocked lines, mark the lines from the identified LTs as “blocked”, and propose to replace the LTs by RTs. In the random version flip a coin with weight wei_FOFOE for each identified instance of the LT and, according to the coin drop, either block or ignore the instance.

2- identify all LT which do not contain blocked lines for the moves (L-FOE, L-FO, A-FOE, A-FO, FI-FOE, FI-FO), mark the lines from these LTs as “blocked” and propose to replace them by the respective RTs. In the random version flip a coin with the respective weight before deciding to block and propose replacements.

3- identify all LT which do not contain blocked lines for the moves (A-L, FI-FOE), mark the lines from these LTs as “blocked” and propose to replace them by the respective RTs. In the random version flip a coin with the respective weight before deciding to block and propose replacements.

4- identify all LT which do not contain blocked lines for the moves (A-T, FI-T, L-T) and (FO2-T, FO3-T, FOE2-T, FOE3-T), mark the lines from these LTs as “blocked” and propose to replace them by the respective RTs. In the random version flip a coin with the respective weight before deciding to block and propose replacements.

5- erase all LT which have been blocked and replace them by the
proposals. Empty the proposals file.

6- the COMB cycle: repeat the application of COMB moves, in the same style (blocks, proposals and updates), until no COMB move is possible.

7- mark all lines as unblocked

The main cycle ends here.

The algorithm ends when the prescribed number of cycles is attained, or when no rewrites were performed in the last cycle (in the deterministic version), or when no rewrites were performed in the last N cycles, with N predetermined (in the random version).
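To fix ideas, here is a minimal sketch in Python of one cycle of this algorithm (my own illustration, not the awk scripts actually used in the demos), restricted to the A-L (beta) rewrite and the COMB cycle; the other rewrite families would be treated in the same way, each with its own weight:

import random

def parse(text):
    return [line.split() for line in text.strip().splitlines() if line.strip()]

def beta_patterns(mol):
    """LT of A-L: a line 'L 1 2 c' and a line 'A c 4 3', glued on the port c."""
    pairs = []
    for i, l in enumerate(mol):
        if l[0] != "L":
            continue
        for j, a in enumerate(mol):
            if a[0] == "A" and a[1] == l[3]:
                pairs.append((i, j))
    return pairs

def one_cycle(mol, wei_AL=0.5, deterministic=True):
    blocked, proposals = set(), []
    for i, j in beta_patterns(mol):
        if i in blocked or j in blocked:
            continue
        if not deterministic and random.random() > wei_AL:
            continue                                  # the coin says: ignore this instance
        blocked |= {i, j}
        L, A = mol[i], mol[j]
        # RT of beta:  Arrow 1 3 , Arrow 4 2
        proposals.append(["Arrow", L[1], A[3]])
        proposals.append(["Arrow", A[2], L[2]])
    # step 5: erase the blocked lines and add the proposals
    mol = [line for k, line in enumerate(mol) if k not in blocked] + proposals
    # step 6, the COMB cycle: a non-Arrow node with an out port c, plus Arrow c d,
    # becomes the same node with c replaced by d
    changed = True
    while changed:
        changed = False
        for i, arrow in enumerate(mol):
            if arrow[0] != "Arrow":
                continue
            c, d = arrow[1], arrow[2]
            for node in mol:
                if node is not arrow and node[0] != "Arrow" and c in node[1:]:
                    node[node.index(c, 1)] = d        # by Condition 2 this is an out port
                    del mol[i]
                    changed = True
                    break
            if changed:
                break
    return mol

# the identity applied to a free variable, as in the worked example given earlier
mol = parse("L a a r\nA r y out\nFRIN y\nFROUT out")
print(one_cycle(mol))    # prints [['FRIN', 'out'], ['FROUT', 'out']]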

___________________________________

Three models of chemical computing

+Lucius Meredith pointed to stochastic pi-calculus and SPiM in a discussion about chemlambda. I take this as a reference: http://research.microsoft.com/en-us/projects/spim/ssb.pdf (say pages 8-13, to have an anchor for the discussion).

Analysis: SPiM side
– The nodes of the graphs are molecules, the arrows are channels.
– As described there, the procedure is to take a (perhaps huge) CRN and to reformulate it more economically, as a collection of graphs where nodes are molecules and arrows are channels.
– There is a physical chemistry model behind it which tells you the probability of each reaction.
– During the computation the reactions are known, all the molecules are known, the graphs don’t change and the variables are concentrations of different molecules.
– During the computation one may interpret the messages passed by the channels as decorations of a static graph.

The big advantage is that indeed, when compared with a Chemical Reaction Network approach, the stochastic pi-calculus transforms the CRN into a much more realistic model. And a much more economical one.

chemlambda side:

Take the pet example with the Ackermann(2,2)=7 from the beginning of http://chorasimilarity.github.io/chemlambda-gui/dynamic/molecular.html

(or go to the demos page for more http://chorasimilarity.github.io/chemlambda-gui/dynamic/demos.html )

– You have one molecule (it does not matter if it is a connected graph or not, so you may think about it as being a collection of molecules instead).
– The nodes of the molecule are atoms (or perhaps codes for simpler molecular bricks). The nodes are not species of molecules, like in the SPiM.
– The arrows are bonds between atoms (or perhaps codes for other simple molecular bricks). The arrows are not channels.
– I don’t know all the intermediary molecules in the computation. To know them would mean to know the result of the computation beforehand. Also, there may be thousands of possible intermediary molecules; in principle, for some initial molecules, there may be infinitely many.
– During the computation the graph (i.e. the molecule) changes. The rewrites are done on the molecule, and they can be interpreted as chemical reactions where invisible enzymes identify a very small part of the big molecule and rewrite it, randomly.

Conclusion: there is no relation between those two models.

Now, chemlambda may be related to +Tim Hutton’s artificial chemistry.
Here is a link to a beautiful javascript chemistry
http://htmlpreview.github.io/?https://github.com/ModelingOriginsofLife/ParticleArtificialChemistry/blob/master/index.html
where you see that
-nodes of molecules are atoms
-arrows of molecules are bonds
-the computation proceeds by rewrites which are random
– but, distinctly from chemlambda, the chemical reactions (rewrites) happen in a physical 2D space (i.e. based on the spatial proximity of the reactants)

As something in the middle between SPiM and the js chemistry, there is a button which shows you a kind of instantaneous reaction graph!

Improve chemical computing complexity by a 1000 times factor, easily

That is, if autonomous computing molecules are possible, as described in the Molecular computers article.

To be exactly sure about the factor, I need to know the answer to the following question:

What is the most complex chemical computation done without external intervention, from the moment when the (solution, DNA molecule, whatnot) is prepared, up to the moment when the result is measured?

Attention, I know that there are several Turing complete chemical models of computation, but they all involve some manipulations (heating a solution, splitting one into two, adding something, etc.).

I believe, but I may be wrong, depending on the answer to this question, that the said complexity is not bigger than a handful of boolean gates, or perhaps some simple Turing Machines, or a simple CA.

If I am right, then compare with my pet example: the Ackermann function. How many instructions would a TM or a CA need, or how big would a circuit have to be, to do this? A factor of 1000 is a generous estimate. This can be done easily in my proposal.

So, instead of trying to convince you that my model is interesting because it is related to lambda calculus, maybe I can make you more interested if I tell you that, for the same material input, the computational output is way bigger than in the best model you have.

Thank you for answering the question, and possibly for proving me wrong.

___________________________

A second opinion on “Slay peer review” article

“It is no good just finding particular instances where peer review has failed because I can point you to specific instances where peer review has been very successful,” she said.
She feared that abandoning peer review would make scientific literature no more reliable than the blogosphere, consisting of an unnavigable mass of articles, most of which were “wrong or misleading”.
This is a quote from one of the most interesting articles I read these days: “Slay peer review ‘sacred cow’, says former BMJ chief” by Paul Jump.
http://www.timeshighereducation.co.uk/news/slay-peer-review-sacred-cow-says-former-bmj-chief/2019812.article#.VTZxYhJAwW8.twitter
I commented previously about replacing peer-review with validation by reproducibility
but now I want to concentrate on this quote, which, according to the author of the article, has been made by “Georgina Mace, professor of biodiversity and ecosystems at University College London”. This is the pro argument in favour of the current peer review system. Opposed to it, and the main subject of the article: “Richard Smith, who edited the BMJ between 1991 and 2004, told the Royal Society’s Future of Scholarly Scientific Communication conference on 20 April that there was no evidence that pre-publication peer review improved papers or detected errors or fraud.”

I am very much convinced by this, but let’s think coldly.

Pro peer review is the claim that the majority of peer reviewed articles are correct, while the blogosphere consists of “an unnavigable mass of articles, most of which were ‘wrong or misleading’”.

Contrary to peer review is that, according to “Richard Smith, who edited the BMJ between 1991 and 2004” :

“there was no evidence that pre-publication peer review improved papers or detected errors or fraud.”
“Referring to John Ioannidis’ famous 2005 paper “Why most published research findings are false”, Dr Smith said “most of what is published in journals is just plain wrong or nonsense”. […]
“If peer review was a drug it would never get on the market because we have lots of evidence of its adverse effects and don’t have evidence of its benefit.””
and moreover:

“peer review was too slow, expensive and burdensome on reviewers’ time. It was also biased against innovative papers and was open to abuse by the unscrupulous. He said science would be better off if it abandoned pre-publication peer review entirely and left it to online readers to determine “what matters and what doesn’t”.”

Which I interpret as confidence in the blogosphere-like medium.

Where is the truth? In the middle, as usual.

Here is my opinion, please form yours.

The new medium comes with new, relatively better means to do research. An important part of the research involves communication, and it is clear that the old system is already obsolete. It is kept artificially alive by authority and business interests.

However, it is also true that a majority of productions which are accessible via the new medium are of a very bad quality and unreliable.

To make another comparison, in the continuation of the one about the fall of academic painters and the rise of the impressionists
https://chorasimilarity.wordpress.com/2013/02/16/another-parable-of-academic-publishing-the-fall-of-19th-century-academic-art/
a majority of the work of academic painters was good but not brilliant (reliable but not innovative enough), while a majority of non-academic painters produced crappy, cute paintings which average people LOVE to see and comment on.
You can’t blame a non-affiliated painter for showing his work in the same venue where you find all the cats, kids, wrinkled old people and cute places.

Science side, we live in a sea of crappy content which is loved by the average people.

The so-called attention economy consists mainly of shuffling this content from one place to another. This is because liking and sharing content is a different activity from creating content. Some new thinking is needed here as well, in order to move past the old idea of scarce resources which are made available by sharing them.

It is difficult for a researcher, who is a particular species of creator, to find other people willing to spend time not only sharing original ideas (which are not liked, because they are strange by default), but also investing work into understanding and validating them, which is akin to an act of creation.

That is why I believe that:
– there have to be social incentives for these researchers  (and that attention economy thinking is not helping this, being instead a vector of propagation for big budget PR and lolcats and life wisdom quotes)
– and that the creators of new scientific content have to provide as many means as possible for the self-validation of their work.

_________________________________

A comparison of two models of computation: chemlambda and Turing machines

The purpose is to understand clearly what this story is about. The simplest stuff, OK? In order to feel it in familiar situations.

Proceed.
Chemlambda is a collection of rules about rewrites done on pieces of files in a certain format. Without an algorithm which tells which rewrite to use, where and when, chemlambda does nothing.

In the sophisticated version of the Distributed GLC proposal, this algorithmic part uses the Actor Model idea. Too complicated! Let’s go simpler!

The simplest algorithm for using the collection of rewrites from chemlambda is the following:
  1. take a file (in the format called “mol”, see later)
  2. look for all patterns in the file which can be used for rewrites
  3. if there are different patterns which overlap, then pick a side (by using an ordering of graph rewrites, like the precedence rules in arithmetic)
  4. apply all the rewrites at once
  5. repeat (either until there is no rewrite possible, or a given number of times, or forever)
 To spice things just a bit, consider the next simple algorithm, which is like the one described, only that we add at step 2:
  •  for every identified pattern flip a coin to decide to keep it or ignore it further in the algorithm
The reason is that randomness is the simplest way to say: who knows if I can do this rewrite when I want? Maybe I have only a part of the file on my computer, or maybe a friend has half of the pattern and I have the other half, so I have to talk with him first and agree to make the rewrite together. Who knows? Flip a coin then.

Now, proven facts.

Chemlambda with the stupid deterministic algorithm is Turing universal. Which means that implicitly this is a model of computation. Everything is prescribed from top to bottom. It is on a par with a Turing machine, or with a RAND model.

Chemlambda with the stupid random algorithm also seems to be Turing universal, but I don’t yet have a proof of this. There is a reason why it is as powerful as the stupid deterministic version, but I won’t go there.

So the right image to have is that chemlambda with the  described algorithm can do anything any computer can.

The first question is, how? For example, how does chemlambda compare with a Turing machine? If it stays at this basic level then it is incomprehensible, because we humans can’t make sense of a scroll of bytecode unless we are highly trained in this very specialised task.

All computers do the same thing: they crunch machine code. No matter which high-level language you use to write a program, it is then compiled and eventually there is machine code which is executed, and that is the level we speak of.

It does not matter which language you use; eventually all is machine code. There is a huge architectural tower and we are on top of it, but in the basement everything looks the same. The tower is here to make it easy for us to use the superb machine. But it is not needed otherwise; it is only for our comfort.

This is very puzzling when we look at chemlambda, because it is claimed that chemlambda has something to do with lambda calculus, and lambda calculus is the prototype of a functional programming language. So it appears that chemlambda should be associated with higher meaning and clever thinking, and abstraction of the abstraction of the abstraction.

No, from the point of view of the programmer.

Yes, from the point of view of the machine.

In order to compare chemlambda with a TM we have to put them in the same terms. So you can easily put a TM in terms of a rewrite system, such that it works with the same stupid deterministic algorithm. http://chorasimilarity.github.io/chemlambda-gui/dynamic/turingchem.html

It is not yet put there, but the conclusion is obvious: chemlambda can do lambda calculus with one rewrite, while a Universal Turing Machine needs about 20 rewrites to do what TMs do.

Unbelievable!
Wait, what about distributivity, propagation, the fanin, all the other rewrites?
They are common, they just form a mechanism for signal transduction and duplication!
Chemlambda is much simpler than a TM.

So you can use chemlambda directly, at this metal level, to perform lambda calculus. It is explained here
https://chorasimilarity.wordpress.com/2015/04/21/all-successful-computation-experiments-with-lambda-calculus-in-one-list/

And I highly recommend trying to play with it by following the instructions.

You need a linux system, or any system where you have sh and awk.

Then

1. download the gh-pages branch of the repository
2. unzip it and go to the directory “dynamic”
3. open a shell and write:  bash moving_random_metabo.sh
4. you will get a prompt and a list of files with the extension .mol , write the name of one of them, in the form file.mol
5. you get file.html. Open it with a browser with js enabled. For reasons I don’t understand, it works much better in safari, chromium, chrome than in firefox.

When you look at the result of the computation you see an animation, which is the equivalent of seeing a TM head running here and there on a tape. It does not make much sense at first, but you can convince yourself that it works and get a feeling about how it does it.

Once you get this feeling I will be very glad to discuss more!

Recall that all this is related  to the most stupid algorithm. But I believe it helps a lot to understand how to build on it.
____________________________________________________

Yes, “slay peer review” and replace it by reproducibility

Via Graham Steel the awesome article Slay peer review ‘sacred cow’, says former BMJ chief.

“Richard Smith, who edited the BMJ between 1991 and 2004, told the Royal Society’s Future of Scholarly Scientific Communication conference on 20 April that there was no evidence that pre-publication peer review improved papers or detected errors or fraud. […]

“He said science would be better off if it abandoned pre-publication peer review entirely and left it to online readers to determine “what matters and what doesn’t”.

“That is the real peer review: not all these silly processes that go on before and immediately after publication,” he said.”

That’s just a part of the article, go read the counter arguments by Georgina Mace.

Make your opinion about this.

Here is mine.

In the post Reproducibility vs peer review I write

“The only purpose of peer review is to signal that at least one, two, three or four members of the professional community (peers) declare that they believe that the said work is valid. Validation by reproducibility is much more than this peer review practice. […]

Compared to peer review, which is only a social claim that somebody from the guild checked it, validation through reproducibility is much more, even if it does not provide means to absolute truths.”

There are several points to mention:

  • the role of journals is irrelevant to anybody other than publishers and their fellow academic bureaucrats, who work together to maintain this crooked system for their own $ advantage.
  • indeed, an article should give by itself the means to validate its content
  • which means that the form of the article has to change from the paper version to a document which contains data, programs, everything which may help to validate the content written with words
  • and the validation process (aka post review) has to be put on a par with the activity of writing articles. Even if an article comes with all the means to validate it (like the process described in Reproducibility vs peer review), the validation involves work, and is by itself an activity akin to the one reported in the article. More than this, the validation may or may not turn out according to what the author of the work supposes, but in any case it leads to new scientific content.

In theory this sounds great, but in practice it may be very difficult to provide a work with the means of validation (of course up to the external resources used in the work, like for example other works).

My answer is that, concretely, it is possible to do this, and I offer as an example my article Molecular computers, which is published on github.io and comes with a repository containing all the means to confirm or refute what is written in the article.

The real problem is social. In such a system the bored researcher has to spend more than 10 minutes tops to read an article he or she intends to use.

It is then much easier, socially, to use the current, unscientific system of replacing validation by authority arguments.

As well, the back-scratching system (you scratch my back and I’ll scratch yours) which is behind most peer reviews (just think of the extreme specialisation of research, which makes it almost certain that a reviewer competes or collaborates with the author), well, that system will no longer function.

This is an even bigger problem than the fact that publishing and academic bean counting will soon be obsolete.

So my forecast is that we shall keep a mix of authority-based practice (read “peer review”) and validation by reproducibility, for some time.

The authority, though, will take another blow.

Which is in favour of research. It is also economically sound, if you think that today a majority of research funding probably goes to researchers whose work passes peer review, but not validation.

______________________________________________

All successful computation experiments with lambda calculus in one list

What you see in these links: I take a lambda term, transform it into an artificial molecule, then let it reduce randomly, with no evaluation strategy. That is what I call the most stupid algorithm. The amazing thing is that it works.
You don’t have to believe me, because you can check it independently, by using the programs available in the github repo.

Here is the list of demos where lambda terms are used.

Now, details of the process:
– I take a lambda term and I draw the syntactic tree
– this tree has as leaves the variables, bound and free. These are eliminated by two tricks, one for the bound variables, the other for the free ones. The bound variables are eliminated by replacing them with new arrows in the graph, going from one leg of the lambda abstraction node to the leaf where the variable appears. If there are more places where the same bound variable appears, then insert some fanout (FO) nodes. For the free variables do the same, by adding for each free variable a tree of FO nodes. If a bound variable does not appear anywhere else, then add a termination (T) node.
– in this way the graph which is obtained is no longer a tree. It is mostly a trivalent graph, with some free ends. It is an oriented graph. The free ends which correspond to an “in” arrow are there for the free variables. There is only one end which corresponds to an “out” arrow, coming from the root of the syntactic tree.
– I give a unique name to each arrow in the graph
– then I write the “mol file” which represents the graph, as a list of nodes and the names of the arrows connected to the nodes (thus an application node A which has the left leg connected to the arrow “a”, the right leg connected to the arrow “b” and the out leg connected to “c” is described by one line, “A a b c”, for example).
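For example (my own illustration, following the conventions just described; the actual scripts may treat the free ends slightly differently), the term λx.(x x) gives a mol file along these lines:

L a b c
FO b x1 x2
A x1 x2 a
FROUT c

Here a is the arrow coming from the root of the body into the L node, b is the arrow of the bound variable, fanned out by FO into the two occurrences x1 and x2 which enter the application node A, and c is the out arrow of the whole term, capped here by a FROUT so that every port name appears exactly twice.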

OK, now I have the mol file, I run the scripts on it and then I look at the output.

What is the output?

The scripts take the mol file and transform it into a collection of associative arrays (that’s why I’m using awk) which describe the graph.

Then they apply the algorithm which I call “stupid” because it really is stupidly simplistic: do a predetermined number of cycles, where in each cycle do the following
– identify the places (called patterns) where a chemlambda rewrite is possible (these are pairs of lines in the mol file, so pairs of nodes in the graph)
– then, as you identify a pattern, flip a coin, if the coin gives “0” then block the pattern and propose a change in the graph
– when you finish all this, update the graph
– some rewrites involve the introduction of some 2-valent nodes, called “Arrow”. Eliminate them in an inner cycle called the “COMB cycle”, i.e. comb the arrows
-repeat

As you see, there is absolutely no care about the correctness of the intermediary graphs. Do they represent lambda terms? Generically no!
Are there any variables which are passed, or evaluations of terms done in some clever order (eager, lazy, etc.)? Not at all; there are no variables other than the names of the arrows of the graph, and these have the property that they appear twice in the mol file (once in an “in” port, once in an “out” port). When a pattern is replaced these names disappear and the names of the arrows from the new pattern are generated on the fly, for example by a counter of arrows.

The scripts do the computation and then they stop. There is a choice to be made about how to see the computation and the results.
One obvious choice would be to see the computation as a sequence of mol files, corresponding to the sequence of graphs. Then one could use another script to transform each mol file into a graph (via, say, a json file) and use some graph visualiser to see the graph. This was the choice in the first scripts made.
Another choice is to make an animation of the computation, by using d3.js. Nodes which are eliminated are first freed of links and then they vanish, while new nodes appear, are linked with their ports, then linked with the rest of the graph.

This is what you see in the demos. The scripts produce a html file, which has inside a js script which uses d3.js. So the output of the scripts is the visualisation of the computation.

Recall that the algorithm of computation is random, therefore it is highly likely that different runs of the algorithm give different animations. In the demos you see one such animation, but you can take the scripts from the repo and make your own.

What is amazing is that they give the right results!

It is perhaps bizarre to look at the computation and to not make any sense of it. What happens? Where is the evaluation of this term? Who calls whom?

Nothing of this happens. The algorithm just does what I explained. And since there are no calls, no evaluations, no variables passed from here to there, that means that you won’t see them.

That is because the computation does not work by the IT paradigm of signals sent by wires, through gates, but it works by what chemists call signal transduction. This is a pervasive phenomenon: a molecule enters in chemical reactions with others and there is a cascade of other chemical reactions which propagate and produce the result.

About what you see in the visualisations.
Because the graph is oriented, and because the trivalent nodes have differentiated legs (for example there might be a left.in leg, a right.in leg and an out leg, which for symmetry is described as a middle.out leg), I want to turn it into an unoriented graph.
This is done by replacing each trivalent node by 4 nodes, and each free end or termination node by 2 nodes.
For trivalent nodes there will be one main node and 3 other nodes which represent the legs. These are called “ports”. There is a colour-coded notation, and the choice made is to represent the nodes A (application) and FO by a main node coloured green, the L (lambda) and FI (fanin, which exists in chemlambda only) by red (actually in the demos this is a purple) and so on. The port nodes are coloured yellow for the “in” ports and blue for the “out” ports. The “left”, “right”, “middle” types are encoded by the radius of the ports.

__________________________________________________

Busy beaver’s random church

The demo linked here would surely look impossible.

One computer, WHILE IT WORKS, is multiplied into 3 computers which each continue to work. This feat is done without any higher control, and no conflicts ever appear. All computers work asynchronously (mimicked here by randomness). Moreover, they eventually come to read and write on the same tape.

There are no calls, nothing you have seen before. Everything is local, no matter how you slice it into separate parts.

_________________________________________

OMG the busy beaver goes into abstract thinking

So-called higher capacities, like abstraction, are highly hyped. Look:
What happens when you apply the Church number 3 to a busy beaver? You get 3 busy beavers on the same tape.

Details will be added into the article Turing machines, chemlambda style.

If you want to experiment, then click on “fork me on github” and copy the gh-pages branch of the repo. Then look in the dynamic folder for the script moving_random_metabo_bb.sh. In a terminal type “bash moving_random_metabo_bb.sh”, then type “church3bb.mol”. You shall get the file church3bb.html, which you can see by using a js enabled browser.
The sh script calls an awk script, which produces the html file. The awk script is check_1_mov2_rand_metabo_bb.awk. Open it with a text editor and you shall see at the beginning all kinds of parameters which you can change (before calling the sh script), so that you may alter the duration or the speed, or switch between the deterministic and random algorithms.
Finally, you also need a mol file to play with. For this demo the mol file church3bb.mol has been used. You can also open it with a text editor and play with it.

UPDATE: I will tweak it more in the next days, but the idea which I want to communicate is that a TM can be seen as chemistry, like in chemlambda, and it can interact very well with the rest of the formalism. So you have these two pillars of computation on the same footing, together, despite the impression that they are somehow very different, one like hardware and the other like software.

______________________________________

Turing Machines, chemlambda style (I)

UPDATE: First animation and a more detailed version here. There will be many additions, as well as programs available from the gh-pages of the repo.

Once again: why do chemists try to reproduce silicon computers? Because that’s the movie.

    At first sight they look very different. Therefore, if we want to compare them, we have to reformulate them in ways which are similar. Rewrite systems are appropriate here.
    The Turing Machine (TM) is a model of computation which is well known as a formalisation of what a computer is. It is a machine which has some internal states (from a set S) and a head which reads/writes symbols (from a set A) on a tape. The tape is seen as an infinite word made of letters from A. The set A has a special letter (call it “b”, from blank), and the infinite words which describe tapes have to be such that all but a finite number of letters of the word are equal to “b”. Imagine a tape, infinite in both directions, which has symbols written on it, such that “b” represents an empty space. Only a finite part of the tape is filled with letters from A other than the blank letter.
    The action of the machine depends on the internal state and on the symbol read by the head. It is therefore a function of the internal state of the machine (an element of S) and of the letter read from the tape (an element of A): it outputs a letter from the alphabet A (which is written in place of the letter which has been read), changes the internal state, and moves the head one step along the tape, to the left (L) or to the right (R), or does not move it at all (N).
    The TM can be seen as a rewrite system.
    For example, one could see a TM as follows. (Pedantically, this is seen as a multi-headed TM without internal states; the only interest of this distinction is that it raises the question whether there is really any meaning in distinguishing internal states from the tape symbols.) We start from a set (or type) of internal states S. Such states are denoted by A, B, C (thus exhibiting their type by the type of the font used). There are 3 special symbols: < is the symbol for the beginning of a list (i.e. word), > is the symbol for the end of a list (word), and M is the special symbol for the head move (or of the type associated to head moves). There is an alphabet A of external states (i.e. tape symbols), with b (the blank) being in A.
    A tape is then a finite word (list) of one of the forms < w A w’ > , < w M A w’ > , < w A M w’ >, where A is an internal state and w and w’ are finite words written with the alphabet A, which can be empty words as well.
    A rewrite replaces a left pattern (LT) by a right pattern (RT), and these are denoted as LT – – > RT. Here LT and RT are sub-words of the tape word. It is supposed that all rewrites are context independent, i.e. LT is replaced by RT regardless of the place where LT appears in the tape word. The rewrite is called “local” if the lengths (i.e. numbers of letters) of LT and RT are bounded a priori.
    A TM is given as a list of Turing instructions, which have the form (current internal state, tape letter read, new internal state, tape letter write, move of the tape). In terms as explained here, all this can be expressed via local rewrites.

  • Rewrites which introduce blanks at the extremities of the written tape:
    • < A   – – >   < b A
    • A >   – – >   A b >
  • Rewrites which describe how the head moves:
    • A M a   – – >   a A
    • a M A   – – >   A a
  • Turing instructions rewrites:
    • a A c   – – >   d B M c   , for the Turing instruction (A, a, B, d, R)
    • a A c   – – >   d M B c   , for the Turing instruction (A, a, B, d, L)
    • a A c   – – >   d B c   , for the Turing instruction (A, a, B, d, N)
    Together with the algorithm “at each step apply all the rewrites which are possible, else stop” we obtain the deterministic TM model of computation. For any initial tape word, the algorithm explains what the TM does to that tape. (Don’t forget to link this to the part of the Cartesian method, “to be sure that I made an exhaustive enumeration”, which is clearly going down today.) Other algorithms are of course possible. Before mentioning some very simple variants of the basic algorithm, let’s see when it works.
    If we start from a tape word as defined here, there is never a conflict of rewrites. This means that it is never the case that the LTs of two different rewrites overlap. It might be the case, though, if we formulate some rewrites a bit differently. For example, suppose that the Turing rewrites are modified to:
  • a A   – – >   d B M   , for the Turing instruction (A, a, B, d, R)
  • a A   – – >   d M B   , for the Turing instruction (A, a, B, d, L)
    Therefore, the LT of the Turing rewrites is no longer of the form “a A c”, but of the form “a A”. Then it may enter into conflict with the other rewrites, as in the cases:

  • a A M c where two overlapping rewrites are possible
    • Turing rewrite: a A M c   – – >   d M B M c , which will later produce two possibilities for the head move rewrites, due to the string M B M
    • head moves rewrite: a A M c   – – >   a c A , which then produces a LT for a Turing rewrite for c A, instead of the previous Turing rewrite for a A
  • a A > where one may apply a Turing rewrite on a A, or a blank rewrite on A >
    The list is not exhaustive. Let’s turn back to the initial formulation of the Turing rewrites and instead change the definition of a tape word. For example, suppose we allow multiple TM heads on the same tape; more precisely, suppose that we accept initial tape words of the form < w1 A w2 B w3 C … wN >. Then we shall surely encounter conflicts between head move rewrites, for patterns such as a A M B c.
    The simplest solution for these conflicts is to introduce a priority of rewrites. For example we may impose that blank rewrites take precedence over head move rewrites, which take precedence over Turing rewrites. More such structure can be imposed (like some head move rewrites having precedence over others). New rewrites may even be introduced, for example rewrites which allow multiple TMs on the same tape to switch places.
    Let’s see an example: the 2-symbol, 3-state busy beaver machine. Following the conventions from this work, the tape letters (i.e. the alphabet A) are “b” and “1”, and the internal states are A, B, C, HALT. (The state HALT may get special treatment, but this is not mandatory.) The rewrites are:

  • Rewrites which introduce blanks at the extremities of the written tape:
    • < X   – – >   < b X   for every internal state X
    • X >   – – >   X b >   for every internal state X
  • Rewrites which describe how the head moves:
    • X M a   – – >   a X   , for every internal state X and every tape letter a
    • a M X   – – >   X a   , for every internal state X and every tape letter a
  • Turing instructions rewrites:
    • b A c   – – >   1 B M c   , for every tape letter c
    • b B c   – – >   b C M c   , for every tape letter c
    • b C c   – – >   1 M C c   , for every tape letter c
    • 1 A c   – – >   1 HALT M c   , for every tape letter c
    • 1 B c   – – >   1 B M c   , for every tape letter c
    • 1 C c   – – >   1 M A c   , for every tape letter c
    We can enhance this by adding the priority of rewrites, for example in the previous list, any rewrite has priority over the rewrites written below it. In this way we may relax the definition of the initial tape word and allow for multiple heads on the same tape. Or for multiple tapes.
    Suppose we put the machine to work on an infinite tape with all symbols blank. This corresponds to the tape word < A >. Here are the steps of the computation:
  • < A >   – – >   < b A >
  • < b A >   – – >   < b A b >
  • < b A b >   – – >   < 1 B M b >
  • < 1 B M b >   – – >   < 1 b B >
  • < 1 b B >   – – >   < 1 b B b >
  • < 1 b B b >   – – >   < 1 b C M b >
  • < 1 b C M b >   – – >   < 1 b b C >
  • < 1 b b C >   – – >   < 1 b b C b >
  • < 1 b b C b >   – – >   < 1 b 1 M C b >
  • < 1 b 1 M C b >   – – >   < 1 b C 1 b >
  • < 1 b C 1 b >   – – >   < 1 1 M C 1 b >
  • < 1 1 M C 1 b >   – – >   < 1 C 1 1 b >
  • < 1 C 1 1 b >   – – >   < 1 M A 1 1 b >
  • < 1 M A 1 1 b >   – – >   < A 1 1 1 b >
  • < A 1 1 1 b >   – – >   < b A 1 1 1 b >
  • < b A 1 1 1 b >   – – >   < 1 B M 1 1 1 b >
  • < 1 B M 1 1 1 b >   – – >   < 1 1 B 1 1 b >
  • < 1 1 B 1 1 b >   – – >   < 1 1 B M 1 1 b >
  • < 1 1 B M 1 1 b >   – – >   < 1 1 1 B 1 b >
  • < 1 1 1 B 1 b >   – – >   < 1 1 1 B M 1 b >
  • < 1 1 1 B M 1 b >   – – >   < 1 1 1 1 B b >
  • < 1 1 1 1 B b >   – – >   < 1 1 1 1 B M b >
  • < 1 1 1 1 B M b >   – – >   < 1 1 1 1 b B >
  • < 1 1 1 1 b B >   – – >   < 1 1 1 1 b B b >
  • < 1 1 1 1 b B b >   – – >   < 1 1 1 1 b C M b >
  • < 1 1 1 1 b C M b >   – – >   < 1 1 1 1 b b C >
  • < 1 1 1 1 b b C >   – – >   < 1 1 1 1 b b C b >
  • < 1 1 1 1 b b C b >   – – >   < 1 1 1 1 b 1 M C b >
  • < 1 1 1 1 b 1 M C b >   – – >   < 1 1 1 1 b C 1 b >
  • < 1 1 1 1 b C 1 b >   – – >   < 1 1 1 1 1 M C 1 b >
  • < 1 1 1 1 1 M C 1 b >   – – >   < 1 1 1 1 C 1 1 b >
  • < 1 1 1 1 C 1 1 b >   – – >   < 1 1 1 1 M A 1 1 b >
  • < 1 1 1 1 M A 1 1 b >   – – >   < 1 1 1 A 1 1 1 b >
  • < 1 1 1 A 1 1 1 b >   – – >   < 1 1 1 HALT M 1 1 1 b >
  • < 1 1 1 HALT M 1 1 1 b >   – – >   < 1 1 1 1 HALT 1 1 b >
    At this stage there are no possible rewrites; in other words, the computation stops. Remark that the priority of rewrites imposed a single path of rewrite applications. Also, at each step only one rewrite was possible, even though the algorithm does not require this.
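To check that the priority of rewrites indeed forces exactly this path, here is a minimal sketch in Python (mine; the post itself comes with no program for this) which implements the rewrites above with the priority “each rule has precedence over the ones listed after it” and reproduces the trace:

# A minimal sketch of the tape-word rewrite system above.
STATES = {"A", "B", "C", "HALT"}
LETTERS = {"b", "1"}

# Turing instructions of the busy beaver, written as (state, letter read) ->
# (letter written, new state, move), i.e. the post's (A, a, B, d, R) reordered.
INSTR = {
    ("A", "b"): ("1", "B", "R"),  ("A", "1"): ("1", "HALT", "R"),
    ("B", "b"): ("b", "C", "R"),  ("B", "1"): ("1", "B", "R"),
    ("C", "b"): ("1", "C", "L"),  ("C", "1"): ("1", "A", "L"),
}

def rewrite_once(w):
    """Apply the highest-priority rewrite found (scanning left to right);
    return the new tape word, or None if no rewrite applies."""
    # blank rewrites:  < X -> < b X   and   X > -> X b >
    for i in range(len(w) - 1):
        if w[i] == "<" and w[i + 1] in STATES:
            return w[:i + 1] + ["b"] + w[i + 1:]
    for i in range(len(w) - 1):
        if w[i] in STATES and w[i + 1] == ">":
            return w[:i + 1] + ["b"] + w[i + 1:]
    # head move rewrites:  X M a -> a X   and   a M X -> X a
    for i in range(len(w) - 2):
        if w[i] in STATES and w[i + 1] == "M" and w[i + 2] in LETTERS:
            return w[:i] + [w[i + 2], w[i]] + w[i + 3:]
    for i in range(len(w) - 2):
        if w[i] in LETTERS and w[i + 1] == "M" and w[i + 2] in STATES:
            return w[:i] + [w[i + 2], w[i]] + w[i + 3:]
    # Turing instruction rewrites:  a X c -> d Y M c (R),  d M Y c (L),  d Y c (N)
    for i in range(len(w) - 2):
        if w[i] in LETTERS and w[i + 1] in STATES and w[i + 2] in LETTERS \
                and (w[i + 1], w[i]) in INSTR:
            d, y, mv = INSTR[(w[i + 1], w[i])]
            mid = {"R": [d, y, "M"], "L": [d, "M", y], "N": [d, y]}[mv]
            return w[:i] + mid + w[i + 2:]
    return None

word = ["<", "A", ">"]
while word is not None:
    print(" ".join(word))
    word = rewrite_once(word)
# the last tape word printed is  < 1 1 1 1 HALT 1 1 b >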
    More possibilities appear if we see the tape words as graphs. In this case we pass from rewrites to graph rewrites. Here is a proposal for this.
    I shall use the same kind of notation as in chemlambda: the mol format. It goes like this, explained for the busy beaver TM example. We have 9 symbols, which can be seen as nodes in a graph:

  • < which is a node with one “out” port. Use the notation FRIN out
  • > which is a node with one “in” port. Use the notation FROUT in
  • b, 1, A, B, C, HALT, M, which are nodes with one “in” and one “out” port. Use the notation: name of node, in, out
    The rule is to connect “in” ports with “out” ports, in order to obtain a tape word. Or a tape graph, with many busy beavers on it. (TO BE CONTINUED…)

Reproducibility vs peer review

Here are my thoughts about replacing peer review by validation. Peer review is the practice where the work of a researcher is commented on by peers. The content of the commentaries (reviews) is clearly not important. The social practice is to not make them public, nor to keep a public record of them. The only purpose of peer review is to signal that at least one, two, three or four members of the professional community (peers) declare that they believe that the said work is valid. Validation by reproducibility is much more than this peer review practice. Validation means the following:

  • a researcher makes public (i.e. “publishes”) a body of work, call it W. The work contains text, links, video, databases, experiments, anything. By making it public, the work is claimed to be valid, provided that the external resources used (as other works, for example) are valid. In itself, validation has no meaning.
  • a second part (anybody)  can also publish a validation assessment of the work W. The validation assessment is a body of work as well, and thus is potentially submitted to the same validation practices described here. In particular, by publishing the validation assessment, call it W1, it is also claimed to be valid, provided the external resources (other works used, excepting W) are valid.
  • the validation assessment W1 makes claims of the following kind: provided that the external works A, B, C are valid, then this piece D of the work W is valid because it has been reproduced in the work W1. Alternatively, under the same hypothesis about the external works, the work W1 claims that another piece E of the work W cannot be reproduced there.
  • the means for reproducibility have to be provided by each work. They can be proofs, programs, experimental data.

As you can see, the validation can only be relative, not absolute. I am sure that scientific results are never amenable to an acyclic graph of validations by reproducibility. Compared to peer review, which is only a social claim that somebody from the guild checked it, validation through reproducibility is much more, even if it does not provide means to absolute truths. What is preferable: to have a social claim that something is true, or to have a body of works where “relative truth” dependencies are exposed? This is, moreover, technically possible in principle. However, it is not easy to do, at least because:

  • traditional means of publication and its practices are based on social validation (peer review)
  • there is this illusion that there is somehow an absolute semantical categorification of knowledge, pushed forward by those who are technically able to implement a validation-by-reproducibility scheme at a large scale.

UPDATE: The mentioned illusion is related to outdated parts of the cartesian method. It is therefore a manifestation of the “cartesian disease”.

I use further the post More on the cartesian method and its associated disease. In that post the cartesian method is parsed like this:

  • (1a) “never to accept anything for true which I did not clearly know to be such”
  • (1b) “to comprise nothing more in my judgement than what was presented to my mind”
  • (1c) “so clearly and distinctly as to exclude all ground of doubt”
  • (2a) “to divide each of the difficulties under examination into as many parts as possible”
  • (2b) “and as might be necessary for its adequate solution”
  • (3a) “to conduct my thoughts in such order that”
  • (3b) “by commencing with objects the simplest and easiest to know, I might ascend […] to the knowledge of the more complex”
  • (3c) “little and little, and, as it were, step by step”
  • (3d) “assigning in thought a certain order even to those objects which in their own nature do not stand in a relation of antecedence and sequence”

Let’s take several researchers who produce works, some works related to others, as explained in the validation procedure.

Differently from the time of Descartes, there are plenty of researchers who think at the same time, and moreover the body of works they produce is huge.

Every piece of the cartesian method has to be considered relative to each researcher and this is what causes many problems.

Parts (1a), (1b), (1c) can be seen as part of the validation technique, but with the condition to see “true” and “exclude all ground of doubt” as relative to the reproducibility of the work W1 by a reader who tries to validate it up to external resources.

Parts (2a), (2b) are clearly researcher dependent; in an interconnected world these parts may introduce far more complexity than the original research work W1.

Combined with (1c), this leads to the illusion that the algorithm which embodies the cartesian method, when run in a decentralized and asynchronous world of users, HALTS.

There is no ground for that.

But the most damaging is (3d). First, every researcher embeds a piece of work into a narrative in order to explain the work. There is nothing “objective” about that. In a connected world, with the help of Google and the like, which impose or seek global coherence, the parts (3d), (2a) and (2b) transform the cartesian method into a global echo chamber. The management of work bloats and spills over the work itself, and at the same time the cartesian method always HALTs, but for no scientific reason at all.

__________________________________

Inflation created by Curry’s paradox

Curry’s paradox expressed in lambda calculus.

I took the lambda term from there and I slightly modified the part which describes IFTHEN (represented by an arrow in the wiki explanation).

IFTHEN a b  appears in chemlambda as

A 1 a2 out
A a1 b 1
FO a a1 a2

which, if you trace the graph (a is fanned out into a1 and a2, so out = (a1 b) a2 = (a b) a), behaves like IFTHENELSE a b a.

Once I built a term like the “r” from the wiki explanation, instead of using r r, I made a graph by the following procedure:

– take the graph of r applied to something (i.e. suppose that the free out port of r is “1” then add A 1 in out)

– make a copy of that graph (i.e. in mol notation duplicate the mol file of the previous graph and change the port variables, here by adding the “a” postfix)

– then apply one to the other (i.e. modify

A 1 in out
A 1a ina outa

into

A 1 outa out,
A 1a out outa)

The initial mol file is:

A 1 outa out
A 1a out outa

L 2 3 1
A 4 7 2
A 6 5 4
FO 8 6 7

FO 3 9 10
A 9 10 8

L 2a 3a 1a
A 4a 7a 2a
A 6a 5a 4a
FO 8a 6a 7a

FO 3a 9a 10a
A 9a 10a 8a

The parameters are: cycounter=8; wei_FOFOE=0; wei_LFOELFO=0; wei_AFOAFOE=0; wei_FIFO=0; wei_FIFOE=0; wei_AL=0;

i.e. it is a deterministic run for 8 steps.

Done in chemlambda.

______________________________________________________________

A citizen science project on autonomous computing molecules

 Wanted: chemists, or people who work(ed) with chemical molecules databases!
[update:  github.io version]
The  chemlambda project proposes the following. Chemlambda is a model of computation based on individual molecules, which compute alone, by themselves (in a certain well defined sense). Everything is formulated from the point of view of ONE molecule which interacts randomly with a family of enzymes.
So what?
Bad detail: chemlambda is not a real chemistry, it’s artificial.
Good detail: it is Turing universal in a very powerful sense. It does not rely on boolean-gate style computation, but on the other pillar of computation, the one which led to functional programming: lambda calculus.
So instead of molecular assemblies which mimic silicon computer hardware, chemlambda can do sophisticated programming stuff with chemical reactions. (The idea that lambda calculus is a sort of chemistry appeared in the ALCHEMY (i.e. algorithmic chemistry) proposal of Fontana and Buss. Chemlambda is far more concrete and simple than Alchemy, and different in principle, but it nevertheless owes to Alchemy the idea that lambda calculus can be done chemically.)
From here,  the following reasoning.
(a) Suppose we can make this chemistry real, as explained in the article Molecular computers.  This looks reasonable, based on the extreme simplicity of chemlambda reactions. The citizen science part is essential for this step.
(b) Then it is possible to take Craig Venter’s Digital Biological Converters (which already exist) further and enhance them to the point of being able to “print” autonomous computing molecules. Which can do anything (amenable to a computation, so literally anything). Anything in the sense that they can do it alone, once printed.
The first step of such an ambitious project is a very modest one: identify the ingredients in real chemistry.
The second step would be to recreate with real chemistry some of the examples which have been already shown as working, such as the factorial, or the Ackermann function.
Already this second step would be a huge advance over the current state of the art in molecular computing. Indeed, compare a handful of boolean gates with a functional-programming-like computation.
If it is, for example, a big deal to build with DNA some simple assemblies of boolean gates, then surely it is a bigger deal to be able to compute the Ackermann function (which, unlike the factorial, is not primitive recursive) as the result of a random chemical process acting on individual molecules.
It looks perfect for a citizen science project, because what is missing is a distributed human search through existing databases, combined with a call for the realization of possibly simple proof-of-principle chemical experiments, based on an existing simple and rigorous formalism.
Once these two steps are realized, then the proof of principle part ends and more practical directions open.
Nobody wants to compute factorials with chemistry; silicon computers are much better for this task. Instead, tiny chemical computers as described here are good for something else.
If you examine what happens in this chemical computation, then you realize that it is in fact a means towards the self-building of chemical or geometrical structure at the molecular level. The chemlambda computations are not done with numbers, or bits, but by structure processing. And this structure processing is the real goal!
Universal structure processing!
In the chemlambda vision page this is taken even further, towards the interaction with the future Internet of Things.