Chemlambda for the people

We can program a computer to do anything. What if we had the same power over the molecules of our bodies? Let’s imagine how this could change our lives.

For example… this version of the scenario [3].

Adam and Eve meet at a party. She likes him. Her sniffer ring can sense Adam’s biomolecules floating in the air between them. One of them triggers a warning. Eve forwards the warning to Adam’s phone.

Back home, Adam files a bug report with his internet slash health provider. The bug report contains his biological ID and the DNA code received with the warning message.

The bug report is opened.

The ID and DNA code are converted to a digital chemistry. Technical staff manipulate this chemistry, like hackers about to debug a program, Neuromancer style.

“still he’d see the matrix in his sleep, bright lattices of logic unfolding across that colorless void”
William Gibson, Neuromancer

“Things like making lists, just, fold up inside themselves. Come out the other way around. Crazy things.”
Pseudo — William Gibson

They find a digital molecule which solves Adam’s problem. A medicine. They convert the solution back to a DNA code which they send to Adam’s router.

The router can turn DNA code back into real biomolecules. Why? It’s a Venter 9000 digital-to-biological converter. Version one looks like this [1].

It is a bit larger than a router, for the moment. But in a few years the 9000 version will be in everybody’s home.

The router emits these biomolecules into Adam’s bedroom. They enter the body and so the bug report is solved, the medicine is delivered and Adam is in perfect health again.

Can we really do this?

I think so. There are 3 steps to take.

Step 1. Build a digital chemistry which we can program. In a digital chemistry, data and programs are all graph-like structures, digital molecules which “fold up inside themselves and come out the other way around”, only they do it randomly, like in real chemistry.

We would create and manipulate digital molecules as if we were writing programs made from very few elementary bricks. Then we could simulate their behaviour on a computer, to be sure they work right.

Step 2. Use Nature to simulate this digital chemistry. There’s no computer as powerful as Nature, let’s use it. Find a digital-to-biological dictionary from the elementary bricks of the digital chemistry to real biomolecular bricks.

Step 3. Build digital-to-biological converters and biological-to-digital sensors. Craig Venter gave us the first generic DBC converter. Sensors as performant as Eve’s sniffer ring, as a part of the Internet of Things, are possible.

OK, so the program is simple. Let’s do it right away!

Well, I’m not a chemist, I’m a mathematician, and I built a digital chemistry which does work like real chemistry. It is indeed inspired by stuff related to Lisp and Haskell (but goes in wild directions). It is called chemlambda [6], it is an Open Science project and I hope it can be used in reality.

Molecules in chemlambda are graphs made of colored nodes and links between them. The chemical reactions are done by enzymes rewiring small patterns in these graphs.
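To make "enzymes rewiring small patterns" concrete, here is a minimal sketch in Python. It is only a toy: the node names ("L", "A", "Arrow"), the port order and the single rule below are assumptions chosen for illustration, not the actual chemlambda rewrite table. A molecule is a list of nodes, two ports with the same name are linked, and one matching pattern is picked at random and rewired.

```python
import random

# Toy molecule: each node is a tuple (colour, port, port, ...).
# Two ports carrying the same name are linked. All names are illustrative.

def rewrite_once(molecule, rng=random):
    """Apply the toy rule to one randomly chosen match, or return None."""
    # Pattern: the last port of an "L" node is linked to the first port
    # of an "A" node.
    matches = [
        (i, j)
        for i, n in enumerate(molecule)
        for j, m in enumerate(molecule)
        if n[0] == "L" and m[0] == "A" and n[3] == m[1]
    ]
    if not matches:
        return None
    i, j = rng.choice(matches)
    _, a1, a2, _ = molecule[i]
    _, _, a4, a5 = molecule[j]
    # Remove the matched pair and reconnect the four dangling ports.
    rest = [n for k, n in enumerate(molecule) if k not in (i, j)]
    return rest + [("Arrow", a1, a5), ("Arrow", a4, a2)]


if __name__ == "__main__":
    mol = [("L", "a", "b", "x"), ("A", "x", "c", "d")]
    print(rewrite_once(mol))
```

The random choice among matches stands in for the randomness of the reactions: there is no global schedule, only local patterns that happen to be rewired.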

Chemlambda is Turing universal, meaning that you can translate any computer program into one of these molecules and execute it via random digital chemical reactions.

In my simulations I used things like the Ackermann function or the factorial, but think: any program! You could do anything with Nature’s computer.

More generally, going far outside the small world of computer programs interesting for the neighbourhood programmer, you could design molecules from first principles.

Instead of shooting in the dark by doing many experiments with real world molecules, kind of like a barbarian who finds new uses for the tiny things discovered in a clock workshop, instead of this you could design what you need, then turn it into reality.

Colonize Mars? Deposit all Netflix shows in lichen spores?

Just applications.

Some frightening, of course.

But: understand life at molecular level? What a worthy goal. This may (or may not) help.

Whether step 2 can be realized, that is the bottleneck.

I am very willing to try step 2 of the program. I think this can be done by a combination of clever searches in available chemical databases and collaborative work.

After all, chemlambda is an Open Science project. That means it may scale, with luck.


[1] Digital-to-biological converter for on-demand production of biologics, Kent S Boles, Krishna Kannan, John Gill, Martina Felderman, Heather Gouvis, Bolyn Hubby, Kurt I Kamrud, J Craig Venter and Daniel G Gibson
see also Motherboard article

[2] The chemlambda repository README is the entry point to the project.

[3] Internet of Smells,

The Library of Alexandra

“Hint: Sci-Hub was created to open papers that are not available online at all. You cannot find these papers in Google or in open access” [tweet by @Sci_Hub]

“Public Resource will make extracts of the Library of Alexandra available shortly, will present the issues to publishers and governments.” [tweet by Carl Malamud]



More experiments with Open Science

I still don’t know which format is better for Open Science. I’m long past the article format for obvious reasons. Validation is a good word and concept because you don’t have to rely absolutely on opinions of others and that’s how the world works. This is not all the story though.

I am very fortunate to be a mathematician, not a biologist or biochemist. Still I long for the good format for Open Science, even if, as a mathematician, I don’t have the problems biologists or chemists have, namely loads and loads of experimental data and empirical approaches. I do have a world of my own to experiment with, where I do have loads of data and empirical constructs. My mind, my brain are real and I could understand myself by using tools of chemists and biologists to explore the outcomes of my research. Funny right? I can look at myself from the outside.

That is why I chose to not jump directly to make Hydrogen, but instead to treat the chemlambda world, again, as a guinea pig for Open Science.

There are 427 well written molecules in the chemlambda library of molecules on Github. There are 385 posts in the chemlambda collection on Google+, most of them with animations from simulations of those molecules. It is a world, how big is it?

It is easy to make first a one page direct access to the chemlambda collection. It is more fun to build a phylogenetic tree of the molecules, based on their genes. That’s what I am doing now, based on a work in progress.

Each molecule can be decomposed in “genes” say, by a sequencer program. Then one can use a distance between these genes to estimate first how they cluster and later to make a phylogenetic tree.

Here is the first heatmap (using the edit distance between single occurrences of genes in molecules) of the 427 molecules.
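A minimal sketch of how such a matrix of distances could be computed, in Python. The assumptions are mine: each molecule is represented as a sequence of its distinct gene names, the distance is the plain Levenshtein edit distance between those sequences, and the names `edit_distance`, `distance_matrix`, `m1`, `g1` and so on are hypothetical. The custom programs behind the heatmap may do it differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings, tuples, ...)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def distance_matrix(molecules):
    """Pairwise distances; `molecules` maps a name to a gene sequence."""
    names = sorted(molecules)
    matrix = [
        [edit_distance(molecules[p], molecules[q]) for q in names]
        for p in names
    ]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical molecules and genes, for illustration only.
    library = {"m1": ("g1", "g2"), "m2": ("g1", "g3"), "m3": ("g1", "g2")}
    names, matrix = distance_matrix(library)
    for name, row in zip(names, matrix):
        print(name, row)
```

The matrix is symmetric with zeros on the diagonal, so a heatmap of it only needs a colour scale and an ordering of the molecules.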


It is a screenshot, proving that my custom programs work 🙂 (one understands more by writing some scripts than by taking tools ready made from others, at least at this stage of research).

By using the edit distance I can map the explored chemlambda molecules. In the following image the 427 molecules from the library are represented as nodes and for each pair of molecules at an edit distance at most 20 there is a link. The nodes are in a central gravitational field, each node has the same charge and the links between nodes act as springs.
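The linking step can be sketched as follows, again in Python and again under assumptions: the distances are given as a square matrix indexed like the list of names, the threshold of 20 is the one quoted above, and the function names are hypothetical. The force-directed layout itself (gravity, charges, springs) is left to whatever drawing tool is used; the sketch only builds the links and reads off the connected clusters.

```python
from itertools import combinations


def threshold_links(names, dist, max_distance=20):
    """Link every pair of molecules whose distance is at most max_distance."""
    return [
        (p, q)
        for (i, p), (j, q) in combinations(enumerate(names), 2)
        if dist[i][j] <= max_distance
    ]


def clusters(names, links):
    """Connected components of the link graph, by depth-first search."""
    neighbours = {n: set() for n in names}
    for p, q in links:
        neighbours[p].add(q)
        neighbours[q].add(p)
    seen, result = set(), []
    for start in names:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(sorted(component))
    return result


if __name__ == "__main__":
    # Hypothetical names and distances, for illustration only.
    names = ["a", "b", "c", "d"]
    dist = [[0, 5, 30, 40], [5, 0, 18, 50], [30, 18, 0, 60], [40, 50, 60, 0]]
    links = threshold_links(names, dist)
    print(links)
    print(clusters(names, links))
```

Molecules which end up alone in their component are the isolated nodes of the map; chains of links between components show up as the trees.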


This is a screenshot of the result, showing clusters and trees connecting them. Not very sophisticated, but enough to give a sense of the explored territory. In the curated collection, such a map would be useful to navigate through the molecules, as well as for giving ideas about which parts are not as well explored. I have not yet made clear which parts of the map cover lambda terms, which cover quines, etc.

Moreover, I see structure! The 427 molecules are made of copies of 605 different linear “genes” (i.e. sticks with colored ends) and 38 ring shaped ones. (It is easy to prove that lambda terms have no rings, when turned into molecules.) There are some interesting curved features visible in the edit distance of the sticks.


They don’t look random enough.

It is clear that a phylogenetic tree is within reach; then what else but connecting the G+ collection posts with the molecules used, arranged along the tree…?

Can I discover which molecules are coming from lambda terms?

Can I discover how my mind worked when building these molecules?

Which are the neglected sides, the blind places?

I hope to be able to tell by the numbers.

Which brings me to the main subject of this post: which is a good format for an Open Science piece of research?

Right now I am in between two variants, which may turn out to not be as different as they seem. An OS research vehicle could be:

  • like a viable living organism, literally
  • or like a viable world, literally.

Only the future will tell which is which. Maybe both!

Chemlambda will be curated

Chemlambda appeared out of frustration that nobody understands and sees what I do, so I had to write it. The same with the chemlambda collection: I’ll curate it and put it in one easy to figure out place. It’s doable. The only fear I have about this is to be sucked again into this highly hallucinatory universe, which is almost real now and will be really real soon.

So for the moment here’s a page which allows you to go directly to any of the chemlambda collection posts.

Maybe this will improve my karma so I’ll be prepared to do pure hydrogen. The initial trials look very promising and, despite the apparent simplicity (what? make the required mathematics, space, physics and this simple atom, all invoked from abstract nonsense, like in a super geometric Lisp), the hydrogen project is more difficult because there is no precedent.

So wish me luck 🙂

Update the Panton Principles please

There is a big contradiction between the text of The Panton Principles and the List of the Recommended Conformant Licenses. It appears that it is intentional, I’ll explain in a moment why I write this.

This contradiction is very bad for the Open Science movement. That is why, please, update your principles.

Here is the evidence.

1. The second of the Panton Principles is:

“2. Many widely recognized licenses are not intended for, and are not appropriate for, data or collections of data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc are NOT appropriate for data and their use is STRONGLY discouraged.

*Use a recognized waiver or license that is appropriate for data.*”

As you can see, the authors clearly state that “Creative Commons licenses (apart from CCZero) … are NOT appropriate for data and their use is STRONGLY discouraged.”

2. However, if you look at the List of Recommended Licenses, surprise:

Creative Commons Attribution Share-Alike 4.0 (CC-BY-SA-4.0) is recommended.

3. The CC-BY-SA-4.0 is important because it has a very clear anti-DRM part:

“You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.” [source CC 4.0 licence: in Section 2/Scope/a. Licence grant/5]

4. The anti-DRM is not a “must” in the Open Definition 2.1. Indeed, the Open Definition clearly uses “must” in some places and “may” in other places. See

“2.2.6 Technical Restriction Prohibition

The license may require that distributions of the work remain free of any technical measures that would restrict the exercise of otherwise allowed rights. ”

5. I asked why this is here. Rufus Pollock, one of the authors of The Panton Principles and of the Open Definition 2.1, answered:

“Hi that’s quite simple: that’s about allowing licenses which have anti-DRM clauses. This is one of the few restrictions that an open license can have.”

My reply:

“Thanks Rufus Pollock but to me this looks like allowing as well any DRM clauses. Why don’t include a statement as clear as the one I quoted?”


“Marius: erm how do you read it that way? “The license may prohibit distribution of the work in a manner where technical measures impose restrictions on the exercise of otherwise allowed rights.”

That’s pretty clear: it allows licenses to prohibit DRM stuff – not to allow it. “[Open] Licenses may prohibit …. technical measures …”


“Marius: so are you saying your unhappy because the Definition fails to require that all “open licenses” explicitly prohibit DRM? That would seem a bit of a strong thing to require – its one thing to allow people to do that but its another to require it in every license. Remember the Definition is not a license but a set of principles (a standard if you like) that open works (data, content etc) and open licenses for data and content must conform to.”

I gather from this exchange that indeed the anti-DRM is not one of the main concerns!

6. So, until now, what do we have? Principles and definitions which aim to regulate what Open Data means, and which avoid taking an anti-DRM stance. At the same time they strongly discourage the use of an anti-DRM license like CC-BY-4.0. However, on a page which is not as visible they recommend, among others, CC-BY-4.0.

It is one thing to say: “you may use anti-DRM licenses for Open Data”. It means almost nothing: it’s up to you, not important for them. They write that all CC licenses excepting CCZero are bad! Notice that CC0 does not have anything anti-DRM.

Conclusion. This ambiguity has to be settled by the authors. Or not, it is up to them. For me this is a strong signal that we witness one more attempt to tweak a well intended movement for cloudy purposes.

The Open Definition 2.1. ends with:

Richard Stallman was the first to push the ideals of software freedom which we continue.

You don’t say, really? Maybe it is the moment for a less ambiguous Free Science.

The price of publishing with GitHub, Figshare, G+, etc

Three years ago I posted The price of publishing with arXiv. If you look at my arXiv articles then you’ll notice that I have barely posted there since then. Instead I went into territory which is even less recognized as serious by a big part of academia. I used GitHub, Figshare, G+, etc.

The effects of this choice are put at the front of my homepage, so go there to read them. (Besides, it is a good exercise to remember how to click on links and use them, that lost art from the age when the internet was free.)

In this post I want to explain what price I paid for these choices and what I think now about them.

First, it is a very stressful way of living. I am not joking, as you know stress comes from realizing that there are many choices and one has to choose. Random reward from the social media is addictive. The discovery that there is a way to get out from the situation which keeps us locked into the legacy publishing system (validation). The realization that the problem is not technical but social. A much more cynical view of the undercurrents of the social life of researchers.

The feeling that I can really change the world with my research. The worries that some possible changes might be very dangerous.

The debt I owe concerning the scarcity of my explanations. The effort to show only the aspects I think are relevant, putting aside those which are not. (Btw, if you look at my About page then you’ll read “This blog contains ideas from the future”. It is true because I already pruned the 99% of the paths leading nowhere interesting.)

The desire to go much deeper, the desire to explain once again what and why, to people who seem either lacking long term attention capability or having shallow pet theories.

It is like fishing for Moby Dick.

computing with space | open notebook