The article
Koon-Kiu Yan, Gang Fang, Nitin Bhardwaj, Roger P. Alexander, and Mark Gerstein
PNAS May 18, 2010 107 (20) 9186-9191; https://doi.org/10.1073/pnas.0914771107
compares the E. coli transcriptional regulatory network with the Linux call graph.
Any model of molecular-based life should be able to predict the differences between the two.
Likewise, any model of decentralized computing built on the same hypotheses as a model of life should differ from the Linux call graph in the same qualitative ways that the E. coli transcriptional regulatory network does.
Here are two figures from the article which I consider highly relevant.
The first one [link to source] is:
and the description is:
“The hierarchical layout of the E. coli transcriptional regulatory network and the Linux call graph. (Left) The transcriptional regulatory network of E. coli. (Right) The call graph of the Linux Kernel. Nodes are classified into three categories on the basis of their location in the hierarchy: master regulators (nodes with zero in-degree, Yellow), workhorses (nodes with zero out-degree, Green), and middle managers (nodes with nonzero in- and out-degree, Purple). Persistent genes and persistent functions (as defined in the main text) are shown in a larger size. The majority of persistent genes are located at the workhorse level, but persistent functions are underrepresented in the workhorse level. For easy visualization of the Linux call graph, we sampled 10% of the nodes for display. Under the sampling, the relative portion of nodes in the three levels and the ratio between persistent and nonpersistent nodes are preserved compared to the original network. The entire E. coli transcriptional regulatory network is displayed.”
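The three categories in the caption depend only on in-degree and out-degree, so they are easy to compute for any directed graph. Here is a minimal sketch, assuming the network is given as a list of (source, target) pairs (regulator to target gene, or caller to callee). The function name `classify_nodes` and the toy edges are mine, not the article's.

```python
from collections import defaultdict


def classify_nodes(edges):
    """Label each node of a directed graph by its place in the hierarchy.

    edges: iterable of (source, target) pairs.
    Returns a dict mapping node -> category name from the figure caption.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))

    labels = {}
    for node in nodes:
        if in_deg[node] == 0:
            labels[node] = "master regulator"   # zero in-degree
        elif out_deg[node] == 0:
            labels[node] = "workhorse"          # zero out-degree
        else:
            labels[node] = "middle manager"     # nonzero in- and out-degree
    return labels


# Toy graph (hypothetical): a -> b -> c and a -> c
toy_labels = classify_nodes([("a", "b"), ("a", "c"), ("b", "c")])
```

In the toy graph, `a` comes out as a master regulator, `b` as a middle manager and `c` as a workhorse.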
What are “persistent functions” and “persistent genes”?
“In the Linux kernel […] [persistent functions are] defined as those that exist in every version of software development. Persistent functions in software systems are analogous to persistent genes in biological systems, which are genes that are consistently present in a large number of genomes.”
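Both definitions can be read as one membership test over a collection of snapshots (kernel versions or genomes). The sketch below is my reading, not the article's procedure: the name `persistent_items` and the `min_fraction` parameter are assumptions. With `min_fraction=1.0` it gives "exists in every version"; a lower value stands in for "present in a large number of genomes", for which the quote gives no threshold.

```python
from collections import Counter


def persistent_items(snapshots, min_fraction=1.0):
    """Return the names present in at least min_fraction of the snapshots.

    snapshots: iterable of collections of names (functions or genes).
    min_fraction: assumed threshold; 1.0 means present in every snapshot.
    """
    snapshots = [set(s) for s in snapshots]
    if not snapshots:
        return set()
    counts = Counter(name for s in snapshots for name in s)
    needed = min_fraction * len(snapshots)
    return {name for name, count in counts.items() if count >= needed}


# Hypothetical function names in three kernel versions
versions = [{"f", "g", "h"}, {"f", "g"}, {"f", "k"}]
always_there = persistent_items(versions)          # only "f" is in all three
mostly_there = persistent_items(versions, 0.6)     # "f" and "g" are in at least two
```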
The article says that most of the persistent genes sit low in the hierarchy, at the “workhorse” level, whereas their apparently analogous persistent functions are spread across all levels of the Linux kernel, but mostly towards the top.
The second figure is about modularity. [link to source]
In these graphs, the authors measure the average overlap between the sets of nodes downstream of two master nodes, as well as the average node reuse.
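To make these two quantities concrete, here is a rough sketch under assumptions of my own; the article's exact definitions may differ. I take "overlap" to be the Jaccard index of the downstream sets of two master nodes, averaged over all pairs, and "reuse" to be the average number of master nodes that reach a given downstream node. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def downstream(adj, start):
    """Return the set of nodes reachable from start by following edges."""
    seen = set()
    stack = list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return seen


def modularity_stats(edges):
    """Return (average pairwise overlap, average node reuse).

    Assumed definitions, not taken from the article:
      overlap(a, b) = |D(a) & D(b)| / |D(a) | D(b)|  (Jaccard index)
      reuse(n)      = number of master nodes whose downstream set contains n
    """
    adj = defaultdict(set)
    nodes, has_parent = set(), set()
    for src, dst in edges:
        adj[src].add(dst)
        has_parent.add(dst)
        nodes.update((src, dst))

    masters = sorted(nodes - has_parent)            # zero in-degree nodes
    reach = {m: downstream(adj, m) for m in masters}

    overlaps = [
        len(reach[a] & reach[b]) / len(reach[a] | reach[b])
        for a, b in combinations(masters, 2)
    ]
    avg_overlap = sum(overlaps) / len(overlaps) if overlaps else 0.0

    reached = set().union(*reach.values()) if reach else set()
    total = sum(len(r) for r in reach.values())
    avg_reuse = total / len(reached) if reached else 0.0
    return avg_overlap, avg_reuse


# Toy graph (hypothetical): two master nodes sharing one downstream node
toy_edges = [("m1", "x"), ("m1", "y"), ("m2", "y"), ("m2", "z")]
toy_overlap, toy_reuse = modularity_stats(toy_edges)
```

In the toy graph the two downstream sets share one node out of three, so under these definitions a strongly modular network would show low overlap and reuse close to 1, while a network built around shared components would show high values for both.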
The problem, as I understand it, is not why the Linux graph is the way it is, because that is obvious: it is written by programmers, who value semantics, modularity and reuse.
The problem is why the other graph is so different. A quantitative answer is needed for any computational model of biological life. Evolutionary explanations are like a proof by contradiction, where contradiction means “not observed now”. For persistent genes the evolutionary explanation would be that (from the article):
“The idea of persistence is closely related to the rate of evolution. In biological systems, the fundamental components of life exist in every genome independently of environmental conditions. These persistent genes, say, ribosomal proteins and dnaA, are under high selective pressure and evolve very slowly.”
which seems to say that persistent genes are observable now because they evolve very slowly, due to high selective pressure (i.e. if the persistent genes were not very important for life, then random evolution would have washed them out). This is a proof by contradiction; it is not constructive. Constructive proofs in well-defined models of life would be very valuable, in my opinion.