One of the first articles with means for complete validation by reproducibility

I have not stressed this aspect enough. The article

M. Buliga, Molecular computers

is one of the first articles which comes with complete means of validation by reproducibility.

This means that along with the content of the article, which contains animations and links to demonstrations, comes a github repository with the scripts which can be used to validate (or invalidate, of course) this work.

I can't show you here what the article looks like, but I can show you a gif created from this video of a demonstration which also appears in the article (however, with simpler settings, in order not to punish the browser too much).

[animated gif from the video demonstration]

This is a chemical-like computation of the Ackermann(2,2) function.

In itself, it is intended to show that if autonomous computing molecules can be created by the means proposed in the article, then impressive feats can be achieved.
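For reference, here is the Ackermann function written the ordinary, recursive way (a snippet of mine, not part of the article); the demonstration above performs the same computation, Ackermann(2,2) = 7, but by graph rewrites on a molecule:

def ackermann(m, n):
    # the usual textbook recursion
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

print(ackermann(2, 2))   # prints 7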

This is part of the discussion about peer review and the need to pass to a more evolved way of communicating science. There are several efforts in this direction, like for example PeerJ's paper-now, commented in this post. See also the post Fascinating: micropublications, hypothes.is for more!

Presently one of the most important pieces of this is peer review, which is the social practice consisting of declarations by one, two, four, etc. anonymous professionals that they have checked the work and consider it valid.

Instead, the ideal should be the article which runs in the browser, i.e. one which comes with means that would allow anybody to validate it, up to external resources like the works of other authors.

(For example, if I write in my article that "According to the work [1], A is true. Here we prove that B follows from A.", then I should provide means to validate the proof that A implies B, but it would be unrealistic to ask me to provide means to validate A.)

This is explained in more detail in Reproducibility vs peer review.

Therefore, if you care about evolving the form of the scientific article, then you have a concrete, short example of what can be done in this direction.

Mind that I am stubborn enough to cling to this form of publication, not because I am afraid to submit these beautiful ideas to legacy journals, but because I want to promote new ways of sharing research by using the best content I can make.

_________________________________________


A short complete description of chemlambda as a universal model of computation

A mol file is any finite list of items of the following kind, which
satisfies the conditions written further on. Take a countable collection of
variables, denote them by a, b, c, … . There is a list of symbols:
L, A, FI, FO, FOE, Arrow, T, FRIN, FROUT. A mol file is a list made of
lines of the form:
– L a b c        (called abstraction)
– A a b c        (called application)
– FI a b c       (called fanin)
– FO a b c       (called fanout)
– FOE a b c      (called other fanout)
– Arrow a b      (called arrow)
– T a            (called terminal)
– FRIN a         (called free in)
– FROUT a        (called free out)

The variables which appear in a mol file are called port names.

Condition 1: any port name appears exactly twice

Depending on the symbol (L, A, FI, FO, FOE, Arrow, T, FRIN, FROUT),
every port name has two types, the first from the list (left, right,
middle), the second from the list (in,out). The convention is to write
this pair of types as middle.in, or left.out, for example.

Further I repeat the list of possible lines in a mol file, but I shall
replace a, b, c, … by their types:
– L middle.in left.out right.out
– A left.in right.in middle.out
– FI left.in right.in middle.out
– FO middle.in left.out right.out
– FOE middle.in left.out right.out
– Arrow middle.in middle.out
– T middle.in
– FRIN middle.out
– FROUT middle.in

Condition 2: each port name (which appears in exactly two places
according to condition 1) appears in a place with the type *.in and in
the other place with the type *.out
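As a minimal sketch (in Python, not the awk scripts used for the demos), here is how a mol file can be parsed and Conditions 1 and 2 checked. The example mol file at the end is my own small one, encoding the identity applied to the identity, (λx.x)(λy.y), under the port conventions above.

# port types per node symbol, copied from the table above: (kind, direction)
PORT_TYPES = {
    "L":     [("middle", "in"), ("left", "out"), ("right", "out")],
    "A":     [("left", "in"), ("right", "in"), ("middle", "out")],
    "FI":    [("left", "in"), ("right", "in"), ("middle", "out")],
    "FO":    [("middle", "in"), ("left", "out"), ("right", "out")],
    "FOE":   [("middle", "in"), ("left", "out"), ("right", "out")],
    "Arrow": [("middle", "in"), ("middle", "out")],
    "T":     [("middle", "in")],
    "FRIN":  [("middle", "out")],
    "FROUT": [("middle", "in")],
}

def parse_mol(text):
    """Return a list of (symbol, [port names]) pairs from a mol text."""
    mol = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        symbol, ports = fields[0], fields[1:]
        assert symbol in PORT_TYPES and len(ports) == len(PORT_TYPES[symbol])
        mol.append((symbol, ports))
    return mol

def check_conditions(mol):
    """Condition 1: each port name appears exactly twice.
    Condition 2: once with direction 'in' and once with direction 'out'."""
    occurrences = {}
    for symbol, ports in mol:
        for port, (_, direction) in zip(ports, PORT_TYPES[symbol]):
            occurrences.setdefault(port, []).append(direction)
    for port, directions in occurrences.items():
        assert len(directions) == 2, f"port {port} appears {len(directions)} times"
        assert sorted(directions) == ["in", "out"], f"port {port} is not in/out paired"

# example mol file: the identity applied to the identity, (λx.x)(λy.y)
example = """
L a a b
L d d e
A b e f
FROUT f
"""
check_conditions(parse_mol(example))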

Two mol files define the same molecule up to the following:
– there is a renaming of port names from one mol file to the other
– any permutation of the lines in a mol file gives an equivalent mol file

(The reason is that a mol file is a notation for an oriented graph
with trivalent, 2-valent or 1-valent nodes, made of the locally planar
trivalent nodes L, A, FI, FO, FOE, the 2-valent node Arrow and the 1-valent
nodes T, FRIN, FROUT. In this notation the port names come from an
arbitrary naming of the arrows; the mol file is then a list of nodes,
in arbitrary order, along with port names coming from the names of the
arrows.)

(In the visualisations these graphs-molecules are turned into
undirected graphs, made of nodes of various radii and colours. To
any line from a mol file, thus to any node of the oriented graph,
correspond up to 4 nodes in the graphs from the visualisations.
Indeed, the symbols L, A, FI, FO appear as nodes of radius
main_const and colour red_col or green_col; FOE, T and Arrow have different colours. Their respective ports
appear as nodes of colour in_col or out_col, for the types "in" or
"out", and radius "left", "right", "middle" for the corresponding
types. FRIN has the colour in_col, FROUT has the colour out_col.)

The chemlambda rewrites are of the following form: replace a left
pattern (LT), consisting of 2 lines from a mol file, by a right pattern
(RT), which may consist of one, two, three or four lines. This is
written as LT --> RT

The rewrites are: (with the understanding that port names 1, 2, …, c
from LT represent port names which exist in the mol file before the
rewrite, and j, k, … from RT which do not appear in LT represent new
port names)

http://chorasimilarity.github.io/chemlambda-gui/dynamic/moves.html

– A-L (or beta):
L 1 2 c , A c 4 3 --> Arrow 1 3 , Arrow 4 2

– FI-FOE (or fan-in):
FI 1 4 c , FOE c 2 3 --> Arrow 1 3 , Arrow 4 2

– FO-FOE:
FO 1 2 c , FOE c 3 4 --> FI j i 2 , FO k i 3 , FO l j 4 , FOE 1 k l

– FI-FO:
FI 1 4 c , FO c 2 3 --> FO 1 i j , FI i k 2 , FI j l 3 , FO 4 k l

– L-FOE:
L 1 2 c , FOE c 3 4 --> FI j i 2 , L k i 3 , L l j 4 , FOE 1 k l

– L-FO:
L 1 2 c , FO c 3 4 --> FI j i 2 , L k i 3 , L l j 4 , FOE 1 k l

– A-FOE:
A 1 4 c , FOE c 2 3 --> FOE 1 i j , A i k 2 , A j l 3 , FOE 4 k l

– A-FO:
A 1 4 c , FO c 2 3 --> FOE 1 i j , A i k 2 , A j l 3 , FOE 4 k l

– A-T: A 1 2 3 , T 3 --> T 1 , T 2

– FI-T: FI 1 2 3 , T 3 --> T 1 , T 2

– L-T: L 1 2 3 , T 3 --> T 1 , FRIN 2

– FO2-T: FO 1 2 3 , T 2 --> Arrow 1 3

– FOE2-T: FOE 1 2 3 , T 2 --> Arrow 1 3

– FO3-T: FO 1 2 3 , T 3 --> Arrow 1 2

– FOE3-T: FOE 1 2 3 , T 3 --> Arrow 1 2

– COMB: any node M (except Arrow) with an out port c , Arrow c d --> M with c replaced by d
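To illustrate, here is a minimal sketch of mine, under the same conventions, of the A-L (beta) rewrite followed by the COMB cleanup, applied to the mol file of (λx.x)(λy.y) from the sketch above; the result is λy.y, as beta reduction predicts.

import itertools

# positions of the out ports for each symbol, from the typed list above
OUT_POS = {"L": [1, 2], "A": [2], "FI": [2], "FO": [1, 2], "FOE": [1, 2],
           "Arrow": [1], "T": [], "FRIN": [0], "FROUT": []}

def beta_step(mol):
    """Apply one A-L rewrite: L 1 2 c , A c 4 3 --> Arrow 1 3 , Arrow 4 2."""
    for (i, (s1, p1)), (j, (s2, p2)) in itertools.permutations(enumerate(mol), 2):
        if s1 == "L" and s2 == "A" and p1[2] == p2[0]:    # L's right.out meets A's left.in
            rest = [node for k, node in enumerate(mol) if k not in (i, j)]
            return rest + [("Arrow", [p1[0], p2[2]]), ("Arrow", [p2[1], p1[1]])]
    return mol

def comb_once(mol):
    """One COMB rewrite: a node M with out port c, plus Arrow c d, becomes M with d."""
    for i, (s, p) in enumerate(mol):
        if s == "Arrow":
            continue
        for j, (s2, p2) in enumerate(mol):
            if s2 == "Arrow" and p2[0] in [p[k] for k in OUT_POS[s]]:
                new_p = [p2[1] if x == p2[0] else x for x in p]
                rest = [node for k, node in enumerate(mol) if k not in (i, j)]
                return rest + [(s, new_p)], True
    return mol, False

mol = [("L", ["a", "a", "b"]),   # λx.x
       ("L", ["d", "d", "e"]),   # λy.y
       ("A", ["b", "e", "f"]),   # the application
       ("FROUT", ["f"])]         # the free out port of the result
mol = beta_step(mol)
changed = True
while changed:
    mol, changed = comb_once(mol)
print(mol)   # [('FROUT', ['f']), ('L', ['d', 'd', 'f'])], i.e. λy.y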

Each of these rewrites is seen as a chemical interaction of the mol
file (molecule) with an invisible enzyme, which rewrites the LT into
the RT.

(Actually one can group the rewrites into families, so that enzymes
are needed for (A-L, FI-FOE), for (FO-FOE, L-FOE, L-FO, A-FOE,
A-FO, FI-FOE, FI-FO), for (A-T, FI-T, L-T) and for (FO2-T, FO3-T,
FOE2-T, FOE3-T). The COMB rewrites, as well as the Arrow elements, have a
less clear chemical interpretation and may not be needed, or may
even be eliminated; see further how they appear in the reduction
algorithm.)

The algorithm (called "stupid") for applying the rewrites has
two variants: deterministic and random. Further on I explain both. The
random version needs some weights, denoted by wei_*
where * is the name of the rewrite.

(The algorithms actually used in the demos have a supplementary family
of weights, used in relation with the count of the enzymes
used; I'll pass over this.)

The algorithm takes as input a mol file. Then there is a cycle which
repeats (indefinitely, or a prescribed number of times specified at
the beginning of the computation, or until there are no rewrites
possible).

Then it marks all lines of the mol file as unblocked.
Then it creates an empty file of replacement proposals.

Then the  cycle is:

1- identify all LT which do not contain blocked lines for the move
FO-FOE and mark the lines from the identified LT as "blocked". Propose
to replace the LT's by RT's. In the random version, flip a coin with
weight wei-FOFOE for each identified instance of the LT and, according
to the coin, either block or ignore the instance.

2- identify all LT which do not contain blocked lines for the moves
(L-FOE, L-FO, A-FOE, A-FO, FI-FOE, FI-FO), mark the lines from
these LT's as "blocked" and propose to replace them by the respective
RT's. In the random version, flip a coin with the respective weight
before deciding to block and propose the replacement.

3- identify all LT which do not contain blocked lines for the moves
(A-L, FI-FOE), mark the lines from these LT's as "blocked" and
propose to replace them by the respective RT's. In the random version,
flip a coin with the respective weight before deciding to block and
propose the replacement.

4- identify all LT which do not contain blocked lines for the moves
(A-T, FI-T, L-T) and (FO2-T, FO3-T, FOE2-T, FOE3-T), mark the
lines from these LT's as "blocked" and propose to replace them by the
respective RT's. In the random version, flip a coin with the respective
weight before deciding to block and propose the replacement.

5- erase all LT which have been blocked and replace them by the
proposals. Empty the proposals file.

6- the COMB cycle: repeat  the application of COMB moves, in the same
style (blocks and proposals and updates) until no COMB move is
possible.

7- mark all lines as unblocked

The main cycle ends here.

The algorithm ends if the number of cycles is attained, or if there
were no rewrites performed in the last cycle (in the deterministic
version), or if there were no rewrites in the last N cycles, with N
predetermined (in the random version).
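Here is a minimal, runnable sketch of the cycle described above (my own simplification, not the demo scripts: only the (A-L, FI-FOE) family is wired in, the weights are probabilities in [0,1], and the COMB cycle of step 6 is left out, so Arrow elements accumulate unless the COMB sketch given earlier is also applied).

import random

def beta(l1, l2):
    """LT for A-L: L 1 2 c , A c 4 3 ; RT: Arrow 1 3 , Arrow 4 2."""
    (s1, p1), (s2, p2) = l1, l2
    if s1 == "L" and s2 == "A" and p1[2] == p2[0]:
        return [("Arrow", [p1[0], p2[2]]), ("Arrow", [p2[1], p1[1]])]
    return None

def fan_in(l1, l2):
    """LT for FI-FOE: FI 1 4 c , FOE c 2 3 ; RT: Arrow 1 3 , Arrow 4 2."""
    (s1, p1), (s2, p2) = l1, l2
    if s1 == "FI" and s2 == "FOE" and p1[2] == p2[0]:
        return [("Arrow", [p1[0], p2[2]]), ("Arrow", [p1[1], p2[1]])]
    return None

REWRITES = [("A-L", 1.0, beta), ("FI-FOE", 1.0, fan_in)]   # (name, weight, matcher)

def cycle(mol, rng):
    blocked, proposals = set(), []
    for name, weight, matcher in REWRITES:            # steps 1-4: one family after another
        for i, l1 in enumerate(mol):
            for j, l2 in enumerate(mol):
                if i == j or i in blocked or j in blocked:
                    continue
                rt = matcher(l1, l2)
                if rt is not None and rng.random() < weight:   # the coin flip
                    blocked |= {i, j}                 # block the lines of the LT
                    proposals.append(rt)              # propose the RT
    # step 5: erase the blocked lines and add the proposed replacements
    new_mol = [line for k, line in enumerate(mol) if k not in blocked]
    for rt in proposals:
        new_mol.extend(rt)
    return new_mol, bool(proposals)

def run(mol, max_cycles=100, seed=0):
    rng = random.Random(seed)
    for _ in range(max_cycles):
        mol, changed = cycle(mol, rng)
        if not changed:                               # no rewrites in the last cycle
            break
    return mol

# (λx.x)(λy.y) again; the Arrow elements remain because step 6 (COMB) is omitted here
print(run([("L", ["a", "a", "b"]), ("L", ["d", "d", "e"]),
           ("A", ["b", "e", "f"]), ("FROUT", ["f"])]))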

___________________________________

Three models of chemical computing

+Lucius Meredith pointed to stochastic pi-calculus and SPiM in a discussion about chemlambda. I take this as reference http://research.microsoft.com/en-us/projects/spim/ssb.pdf (say pages 8-13, to have an anchor for the discussion).

Analysis: SPiM side
– The nodes of the graphs are molecules, the arrows are channels.
– As described there the procedure is to take a (huge perhaps) CRN and to reformulate it more economically, as a collection of graphs where nodes are molecules, arrows are channels.
– There is a physical chemistry model behind it which tells you the probability of each reaction.
– During the computation the reactions are known, all the molecules are known, the graphs don’t change and the variables are concentrations of different molecules.
– During the computation one may interpret the messages passed by the channels as decorations of a static graph.

The big advantage is that indeed, when compared with a Chemical Reaction Network approach, the stochastic pi-calculus transforms the CRN into a much more realistic model. And a much more economical one.

chemlambda side:

Take the pet example with the Ackermann(2,2)=7 from the beginning of http://chorasimilarity.github.io/chemlambda-gui/dynamic/molecular.html

(or go to the demos page for more http://chorasimilarity.github.io/chemlambda-gui/dynamic/demos.html )

– You have one molecule (it does not matter if it is a connected graph or not, so you may think about it as being a collection of molecules instead).
– The nodes of the molecule are atoms (or perhaps codes for simpler molecular bricks). The nodes are not species of molecules, like in the SPiM.
– The arrows are bonds between atoms (or perhaps codes for other simple molecular bricks). The arrows are not channels.
– I don't know all the intermediary molecules in the computation. To know them would mean to know the result of the computation beforehand. Also, there may be thousands of possible intermediary molecules. In principle, for some initial molecules, there may even be an infinity of possible intermediary molecules.
– During the computation the graph (i.e. the molecule) changes. The rewrites are done on the molecule, and they can be interpreted as chemical reactions where invisible enzymes identify a very small part of the big molecule and rewrite it, randomly.

Conclusion: there is no relation between those two models.

Now, chemlambda may be related to +Tim Hutton's artificial chemistry.
Here is a link to a beautiful javascript chemistry
http://htmlpreview.github.io/?https://github.com/ModelingOriginsofLife/ParticleArtificialChemistry/blob/master/index.html
where you see that
– nodes of molecules are atoms
– arrows of molecules are bonds
– the computation proceeds by rewrites which are random
– but, distinct from chemlambda, the chemical reactions (rewrites) happen in a physical 2D space (i.e. based on the spatial proximity of the reactants)

As something in the middle between SPiM and jschemistry, there is a button which shows you a kind of instantaneous reaction graph!

Improve chemical computing complexity by a 1000 times factor, easily

That is, if autonomous computing molecules are possible, as described in the model shown in Molecular computers.

To be exactly sure about the factor, I need to know the answer to the following question:

What is the most complex chemical computation done without external intervention, from the moment when the (solution, DNA molecule, whatnot) is prepared, up to the moment when the result is measured?

Attention, I know that there are several Turing complete chemical models of computation, but they all involve some manipulations (heating a solution, splitting one into two, adding something, etc.).

I believe, but I may be wrong, depending on the answer to this question, that the said complexity is not bigger than a handful of boolean gates, or perhaps some simple Turing Machines, or a simple CA.

If I am right, then compare with my pet example: the Ackermann function. How many instructions does a TM or a CA need, or how big does a circuit have to be, to do this? A factor of 1000 is a clement estimate. This can be done easily in my proposal.

So, instead of trying to convince you that my model is interesting because it is related to lambda calculus, maybe I can make you more interested if I tell you that, for the same material input, the computational output is way bigger than in the best model you have.

Thank you for answering the question, and possibly for showing me wrong.

___________________________

A second opinion on “Slay peer review” article

“It is no good just finding particular instances where peer review has failed because I can point you to specific instances where peer review has been very successful,” she said.
She feared that abandoning peer review would make scientific literature no more reliable than the blogosphere, consisting of an unnavigable mass of articles, most of which were “wrong or misleading”.
This is a quote from one of the most interesting articles I read these days: “Slay peer review ‘sacred cow’, says former BMJ chief” by Paul Jump.
http://www.timeshighereducation.co.uk/news/slay-peer-review-sacred-cow-says-former-bmj-chief/2019812.article#.VTZxYhJAwW8.twitter
I commented previously about replacing peer review with validation by reproducibility,
but now I want to concentrate on this quote, which, according to the author of the article, was made by "Georgina Mace, professor of biodiversity and ecosystems at University College London".

This is the pro argument in favour of the actual peer review system. Opposed to it, and the main subject of the article, is the position reported as: "Richard Smith, who edited the BMJ between 1991 and 2004, told the Royal Society's Future of Scholarly Scientific Communication conference on 20 April that there was no evidence that pre-publication peer review improved papers or detected errors or fraud."

I am very much convinced by this, but let’s think coldly.

The argument for peer review is that a majority of peer reviewed articles are correct, while the blogosphere consists of "an unnavigable mass of articles, most of which were 'wrong or misleading'".

Contrary to peer review is that, according to “Richard Smith, who edited the BMJ between 1991 and 2004” :

“there was no evidence that pre-publication peer review improved papers or detected errors or fraud.”
“Referring to John Ioannidis’ famous 2005 paper “Why most published research findings are false”, Dr Smith said “most of what is published in journals is just plain wrong or nonsense”. […]
“If peer review was a drug it would never get on the market because we have lots of evidence of its adverse effects and don’t have evidence of its benefit.””
and moreover:

“peer review was too slow, expensive and burdensome on reviewers’ time. It was also biased against innovative papers and was open to abuse by the unscrupulous. He said science would be better off if it abandoned pre-publication peer review entirely and left it to online readers to determine “what matters and what doesn’t”.”

Which I interpret as confidence in the blogosphere-like medium.

Where is the truth? In the middle, as usual.

Here is my opinion, please form yours.

The new medium comes with new, relatively better means to do research. An important part of the research involves communication, and it is clear that the old system is already obsolete. It is kept artificially alive by authority and business interests.

However, it is also true that a majority of productions which are accessible via the new medium are of a very bad quality and unreliable.

To make another comparison, in continuation of the one about the fall of academic painters and the rise of impressionists
https://chorasimilarity.wordpress.com/2013/02/16/another-parable-of-academic-publishing-the-fall-of-19th-century-academic-art/
a majority of the work of academic painters was good but not brilliant (reliable but not innovative enough), while a majority of non-academic painters produced crappy, cute paintings which average people LOVE to see and comment on.
You can't blame a non-affiliated painter for showing his work in the same venue where you find all the cats, kids, wrinkled old people and cute places.

On the science side, we live in a sea of crappy content which is loved by average people.

The so-called attention economy consists mainly in shuffling this content from one place to another. This is because liking and sharing content is a different activity from creating content. Some new thinking is needed here as well, in order to move past the old idea of scarce resources which are made available by sharing them.

It is difficult for a researcher, who is a particular species of creator, to find other people willing to spend time not only to share original ideas (which, being strange, are not liked by default), but also to invest work into understanding them, into validating them, which is akin to an act of creation.

That is why I believe that:
– there have to be social incentives for these researchers  (and that attention economy thinking is not helping this, being instead a vector of propagation for big budget PR and lolcats and life wisdom quotes)
– and that the creators of new scientific content have to provide as many means as possible for the self-validation of their work.

_________________________________

A comparison of two models of computation: chemlambda and Turing machines

The purpose is to understand clearly what this story is about. The simplest stuff, OK? In order to feel it in familiar situations.

Proceed.
Chemlambda is a collection of rules about rewrites done on pieces of files in a certain format. Without an algorithm which tells which rewrite to use, where and when, chemlambda does nothing.

In the sophisticated version of the Distributed GLC proposal, this algorithmic part uses the Actor Model idea. Too complicated! Let's go simpler!

The simplest algorithm for using the collection of rewrites from chemlambda is the following:
  1. take a file (in the format called “mol”, see later)
  2. look for all patterns in the file which can be used for rewrites
  3. if there are different patterns which overlap, then pick a side (by using an ordering of graph rewrites, like the precedence rules in arithmetic)
  4. apply all the rewrites at once
  5. repeat (either until there is no rewrite possible, or a given number of times, or forever)
 To spice things up just a bit, consider the next simple algorithm, which is like the one described, only that at step 2 we add:
  •  for every identified pattern, flip a coin to decide whether to keep it or ignore it further in the algorithm
The reason is that randomness is the simplest way to say: who knows if I can do this rewrite when I want, or maybe I have only a part of the file on my computer, or maybe a friend has half of the pattern and I have the other half, so I have to talk with him first, then agree to make the rewrite together. Who knows? Flip a coin then.

Now, proven facts.

Chemlambda with the stupid deterministic algorithm is Turing universal. Which means that, implicitly, this is a model of computation. Everything is prescribed from top to bottom. It is on a par with a Turing machine, or with a RAND model.

Chemlambda with the random stupid algorithm seems to be Turing universal as well, but I don't yet have a proof of this. There is a reason why it is as powerful as the stupid deterministic version, but I won't go there.

So the right image to have is that chemlambda with the  described algorithm can do anything any computer can.

The first question is, how? For example, how does chemlambda compare with a Turing machine? If it is at this basic level, then it is incomprehensible, because we humans can't make sense of a scroll of bytecode, unless we are highly trained in this very specialised task.

All computers do the same thing: they crunch machine code. No matter which high-level language you use to write a program, it is then compiled and eventually there is machine code which is executed, and that is the level we are speaking about.

It does not matter which language you use, eventually all is machine code. There is a huge architectural tower and we are on top of it, but in the basement all looks the same. The tower is here to make it easy for us to use the superb machine. But it is not needed otherwise; it is only for our comfort.

This is very puzzling when we look at chemlambda, because it is claimed that chemlambda has something to do with lambda calculus, and lambda calculus is the prototype of a functional programming language. So it appears that chemlambda should be associated with higher meaning and clever thinking, and abstraction of the abstraction of the abstraction.

No, from the point of view of the programmer.

Yes, from the point of view of the machine.

In order to compare chemlambda with a TM we have to put it in the same terms. So you can easily put a TM in terms of a rewrite system, such that it works with the same stupid deterministic algorithm. http://chorasimilarity.github.io/chemlambda-gui/dynamic/turingchem.html
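The linked page gives the precise translation; what follows is only a toy illustration of the general idea, with a hypothetical machine and encoding of my own: each pair (state, scanned symbol) plays the role of an LT, and each entry of the transition table is a rewrite, in the same spirit as the LT --> RT list from the chemlambda description.

# a tiny transition table: (state, symbol) -> (new state, symbol to write, head move)
# hypothetical example: a unary incrementer that appends a 1 at the end of the input
DELTA = {
    ("q0", "1"): ("q0", "1", +1),    # skip over the 1s
    ("q0", "_"): ("halt", "1", 0),   # write a 1 on the blank and halt
}

def step(tape, pos, state):
    """One local rewrite: match the pattern (state, tape[pos]) and replace it."""
    symbol = tape[pos] if pos < len(tape) else "_"
    if (state, symbol) not in DELTA:
        return tape, pos, state, False
    new_state, write, move = DELTA[(state, symbol)]
    tape = (tape[:pos] + [write] + tape[pos + 1:]) if pos < len(tape) else (tape + [write])
    return tape, pos + move, new_state, True

tape, pos, state = list("111"), 0, "q0"
running = True
while running and state != "halt":
    tape, pos, state, running = step(tape, pos, state)
print("".join(tape))   # prints 1111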

It is not yet put there, but the conclusion is obvious: chemlambda can do lambda calculus with one rewrite, while a Universal Turing Machine needs about 20 rewrites to do what TMs do.

Unbelievable!
Wait, what about distributivity, propagation, the fanin, all the other rewrites?
They are common, they just form a mechanism for signal transduction and duplication!
Chemlambda is much simpler than TM.

So you can use chemlambda directly, at this metal level, to perform lambda calculus. It is explained here
https://chorasimilarity.wordpress.com/2015/04/21/all-successful-computation-experiments-with-lambda-calculus-in-one-list/

And I highly recommend trying to play with it by following the instructions.

You need a linux system, or any system where you have sh and awk.

Then

1. download the chemlambda-gui repository (as a zip archive)
2. unzip it and go to the directory “dynamic”
3. open a shell and write:  bash moving_random_metabo.sh
4. you will get a prompt and a list of files with the extension .mol; write the name of one of them, in the form file.mol
5. you get file.html. Open it with a browser with JS enabled. For reasons I don't understand, it works much better in Safari, Chromium or Chrome than in Firefox.

When you look at the result of the computation you see an animation, which is the equivalent of seeing a TM head running here and there on a tape. It does not make much sense at first, but you can convince yourself that it works and get a feeling for how it does it.

Once you get this feeling I will be very glad to discuss more!

Recall that all this is related  to the most stupid algorithm. But I believe it helps a lot to understand how to build on it.
____________________________________________________

Yes, “slay peer review” and replace it by reproducibility

Via Graham Steel, the awesome article Slay peer review ‘sacred cow’, says former BMJ chief.

“Richard Smith, who edited the BMJ between 1991 and 2004, told the Royal Society’s Future of Scholarly Scientific Communication conference on 20 April that there was no evidence that pre-publication peer review improved papers or detected errors or fraud. […]

“He said science would be better off if it abandoned pre-publication peer review entirely and left it to online readers to determine “what matters and what doesn’t”.

“That is the real peer review: not all these silly processes that go on before and immediately after publication,” he said.”

That’s just a part of the article, go read the counter arguments by Georgina Mace.

Make your opinion about this.

Here is mine.

In the post Reproducibility vs peer review I write

“The only purpose of peer review is to signal that at least one, two, three or four members of the professional community (peers) declare that they believe that the said work is valid. Validation by reproducibility is much more than this peer review practice. […]

Compared to peer review, which is only a social claim that somebody from the guild checked it, validation through reproducibility is much more, even if it does not provide means to absolute truths.”

There are several points to mention:

  • the role of journals is irrelevant to anybody other than publishers and their fellow academic bureaucrats, who work together to maintain this crooked system for their own $ advantage.
  • indeed, an article should give by itself the means to validate its content
  • which means that the form of the article has to change from the paper version to a document which contains data, programs, everything which may help to validate the content written with words
  • and the validation process (aka post review) has to be put on a par with the activity of writing articles. Even if an article comes with all the means to validate it (like the process described in Reproducibility vs peer review), the validation supposes work and is by itself an activity akin to the one reported in the article. More than this, the validation may or may not function according to what the author of the work supposes, but in any case it leads to new scientific content.

In theory this sounds great, but in practice it may be very difficult to provide a work with the means of validation (of course, up to the external resources used in the work, like for example other works).

My answer is that, concretely, it is possible to do this, and I offer as an example my article Molecular computers, which is published on github.io and comes with a repository containing all the means to confirm or refute what is written in the article.

The real problem is social. In such a system the bored researcher has to spend more than 10 minutes, tops, to read an article he or she intends to use.

Then it is much easier, socially, to use the actual, unscientific system of replacing validation by authority arguments.

As well, the monkey system ("you scratch my back and I'll scratch yours") which is behind most peer reviews (only think about the extreme specialisation of research, which makes it almost sure that a reviewer competes or collaborates with the author), well, that monkey system will no longer function.

This is an even bigger problem than the fact that publishing and academic bean counting will soon be obsolete.

So my forecast is that we shall keep a mix of authority-based evaluation (read "peer review") and validation by reproducibility, for some time.

The authority, though, will take another blow.

Which is in favour of research. It is also economically sound, if you think that today a majority of research funding probably goes to researchers whose work passes peer review, but not validation.

______________________________________________