Tag Archives: Open Science

What I do according to ADS search

ArXiv links to the Astrophysics Data System, which got a new fancy look. It may be a bit heavy; as a supporter of the wonderful arXiv, I would rather applaud if they allowed me to put articles with animations inside, be they only animated gifs. But it is nevertheless interesting.

So if I go to my arXiv articles, choose an article and then click on the NASA ADS link on the right panel, then I get this page. Funny that they don’t use the Journal Reference from the arXiv to decide which article is “refereed”, i.e. peer reviewed, even if peer review is less than validation.

I am very pleased, though, about the visual representation of what I do, as seen from the arXiv articles.

no_papers

This is the image which tells how many articles I have on certain keywords, with links between keywords drawn in proportion to the number of articles which fit both keywords.

TBH this is the first time a neutral bibliometric system shows an accurate image of my work.

The darker blue sector, which has no words on it, is related to variational methods in fracture, Mumford-Shah and convexity articles.

The same picture, but according to the downloads in the last 90 days, is this one.

no_downloads

This is also very satisfying because the hamiltonian/information/… sector has a big future. For the moment it looks unrelated to the other sectors, but wait for the kaleidos project 🙂

The em-convex rewrite system, where I guess I found the equivalent of the Church numerals for space, is in the dilatation structures/…/selfsimilar sector. In my opinion, an important subject.

What’s new around Open Access and Open Science? [updated]

In the last year I was not very much interested in Open Access and Open Science. There are several reasons, which I shall explain. But before: what’s new?

My reasons were that:

  • I’m a supporter of OA, but not under the banner of gold OA. You know that I have a very bad impression about the whole BOAI thing, which introduced the false distinction between gold, which is publication, and green, which is archival. They succeeded in delaying the adoption of what researchers need (i.e. inventions basically older than BOAI, like arXiv) and the recognition that the whole academic publication system is working actively against the researchers’ interests. Academic managers are the first to be blamed for this, because they don’t have the excuse that they work for a private entity which has to make money no matter the price. Publishers are greedy, OK, but who gives them the money?
  • Practically, for the working researcher, we can now publish in any place, no matter how closed or anachronistically managed, because we can find anything on Sci-Hub, if we want. So there is no reason to fight for more OA than this. Except for those who make money from gold OA…
  • I was very wrong with my efforts and attempts to use corporate social media for scientific communication.
  • But still, I believe strongly in the superiority of validation over peer-review. Open Science is the future.

I was also interested in the implications for OA and OS of the new EU Copyright Directive. I expressed my concern that again it seems that nobody cares about the needs of researchers (as opposed to publishers and corporations in general) and I asked some questions which interest me and which nobody else seems to ask: will the new EU Copyright Directive affect arXiv or Figshare? The problem I see is related to automatic filters, or to the real ways researchers may use these repositories. See for example here for a discussion. In Sept 2018 I filed requests for answers to arXiv and to Figshare. For me at least the answers will be very interesting and I hope they will be as bland as possible, in the sense that there is nothing to worry about.

So from my side, that’s about all, not much. I feel like, except for the gold OA money sucking, there’s nothing new happening. Please tell me I’m very wrong and also what I can do with my research output, in 2019.

UPDATE: Two days ago I submitted a comment on Julia Reda’s post Article 13 is back on – and it got worse, not better. The comment was about the implications for the research article repositories, the big ones, I mean, the ones which are used millions of times by many researchers. I waited patiently, either for the appearance of the comment or for a reaction. Any reaction. For me this is a clear answer: pirates fight for the freedom of the corporation to share in its walled garden the product of a publisher. The rest is immaterial for them. They are pirates, not explorers.

UPDATE 2: This draft of Article 13 contains the following definition: “‘online content sharing service provider’ means a provider of an information society service whose main or one of the main purposes is to store and give the public access to a large amount of copyright protected works or other protected subject-matter uploaded by its users which it organises and promotes for profit-making purposes. Providers of services such as not-for profit online encyclopedias, not-for profit educational and scientific repositories, open source software developing and sharing platforms, electronic communication service providers as defined in Directive 2018/1972 establishing the European Communications Code, online marketplaces and business-to business cloud services and cloud services which allow users to upload content for their own use shall not be considered online content sharing service providers within the meaning of this Directive.”

If this is part of the final version of Article 13 then there is nothing to worry about as concerns arXiv, for example.

Maybe a separate push should be on upload filters and their legal side (who is responsible for the output of this algorithm? surely not the algorithm!), perhaps by asking for complete, reproducible, transparent information about those: source code and all the dependencies source code, reproducible behavior.

 

Open Science is rwx science

Preamble: this is a short text on Open Science, written a while ago, which I now put here. It is taken from this place at telegra.ph. The link (not the content) appeared here at the Chemlambda for the people post. I can’t find other traces, except the empty github repository “creat”, described as “framework for research output as a living creature”.

__________________

I am a big fan of Open Science. For me, a good piece of research is one which I can Read Write eXecute.

Researchers use articles to communicate. Articles are not eXecutable. I can either Read others’ articles or Write mine. I have to trust an editor who tells me that somebody else, whom I don’t know, read the article and made a peer-review.

No. Articles are stories told by researchers about how they did the work. And since the micromanagement era, they are even less: fungible units to be used in funding applications, by the number or by the keyword.

This is so strange. I’m a mathematician and you probably know that mathematics is the most economical way to explain something clearly. Take a 10-page research article. It contains the intensive work of many months. Now, compress the article further by the following ridiculous algorithm: throw away everything but the first several bits. Keep only the title, the name of the journal, keywords, maybe the Abstract. That’s not science communication, that’s massive misuse of brain material.

So I’m an Open Science fan, what should I do instead of writing articles? Maybe I should push my article in public and wait after that for somebody to review it. That’s called Open Access and it’s very good for the readers. So what? The article is still only Readable or Writable, pick only one option, otherwise it’s bad practice. What about my time? It looks like I have to wait and wait for all the bosses, managers, politicians and my fellow researchers to switch to OA first.

It’s actually much easier to do Open Science. Remember: something that you can Read, Write and eXecute. As an author, you don’t have to wait for the whole society to leave the old ways and to embrace the new ones. You can just push what you did: stories, programs, data, everything. Any reader can pull the content and validate it, independently. EXecute what you pushed, Read your research story and Write derivative works.

I tried this! Want to know how to build a molecular computer which is indiscernible from how we are made? Use this playground called chemlambda. It’s a made up, simple chemistry. It works like the real chemistry does, that is locally, randomly, without any externally imposed control. My bet is that chemlambda can be done in real life. Now, or in a few years.

I use everything available to turn this project into Open Science. You name it: old form articles, html and javascript articles, research blog, Github repository, Figshare data repository, Google collection [update: deleted], this 🙂

Funny animations obtained from simulations. Those simulations can be run on your computer, so you can validate my research. Here’s what chemlambda looks like.

[Here come some examples and animations. ]

 

During this project I realized that it went beyond a Read Write Execute thing. What I did was to design many interesting molecules. They work by themselves, without any external control. Each molecule is like a theorem and the chemical evolution is the proof of the theorem, done by a blind, random, stupid, universal algorithm.

Therefore my Open Science attempt was to create molecules, some of them exhibiting a metabolism, some of them alive. Maybe this is the future of Open Science. To create a living organism which embodies in its metabolism the programs and research data. It’s valid if it lives, grows, reproduces, even dies. Let it cross breed with other living creatures. In time the natural selection will do marvels. Life is not different from Science. Science is not different from life.

Open Science: “a complete institution for the use of learners”

The quote is from 1736. You can see it on the front page of the book “The method of fluxions and infinite series” by Newton, “translated from the author’s Latin original not yet made publick” (nobody is perfect, we know now where this secrecy led in the dispute with Leibniz over the invention of the differential calculus).

newton

That should be the goal of any open science research output.

What do we have at the end of 2017?

  • Sci-hub. Pros: not corporate. It does not matter where you output your article, as long as it becomes available to any learner. Cons: only old style articles, not more. So not a full solution.
  • ArXiv. Pros: simple, proved to be reliable long term. Cons: only articles.
  • Zenodo. Pros: not corporate, lots of space for present needs. Cons: not playable.
  • Github. Pros: good for publicly and visibly sharing and discussing articles and programs. Cons: corporate, not reliable in the long term.
  • Git in general. Pros: excellent tool.
  • Blockchain. Pros: excellent tool.

I have not added anything about BOAI inspired Open Access because it is something from the past. It was just a trick to delay the demise of the legacy publishing style. It was done over the heads of researchers, basically a deal between publishers and academic managers, for them to be able to siphon research $ and stifle the true open access movement.

Conclusion: at the moment there are only timid and partial proposals for open science as “a complete institution for the use of learners”. Open science is not a new idea. Open science is the natural way to do science.

There is only one way to do it: share. Let’s do it!

Transparency is superior to trust

I am fascinated by this quote. I think it’s the most beautiful quote, in its terseness, I’ve seen in a long time. Wish I invented it!

It is not, though, the motto of Wikileaks, it’s taken from the section on Reproducibility of this Open Science manifesto.

To me, this quote means that validation is superior to peer review.

It is also significant that the quote says nothing about the publishing aspects of Open Science. That is because, I believe, we should split publishing from the discussion about Open Science.

Publishing, scientific publishing I mean, is simply irrelevant at this point. The strong part of Open Science, the new, original idea it brings forth is validation.

Sci-Hub acted as the great leveler, as concerns scientific publication. No interested reader cares, at this point, if an article is hostage behind a paywall or if the author of the article paid money for nothing to a Gold OA publisher.

Scientific publishing is finished. You have to be realistic about this thing.

But science communication is a far greater subject of interest. And validation is one major contribution to a superior scientific method.

More experiments with Open Science

I still don’t know which format is better for Open Science. I’m long past the article format for obvious reasons. Validation is a good word and concept because you don’t have to rely absolutely on opinions of others and that’s how the world works. This is not the whole story, though.

I am very fortunate to be a mathematician, not a biologist or biochemist. Still I long for the good format for Open Science, even if, as a mathematician, I don’t have the problems biologists or chemists have, namely loads and loads of experimental data and empirical approaches. I do have a world of my own to experiment with, where I do have loads of data and empirical constructs. My mind, my brain are real and I could understand myself by using tools of chemists and biologists to explore the outcomes of my research. Funny right? I can look at myself from the outside.

That is why I chose not to jump directly to making Hydrogen, but instead to treat the chemlambda world, again, as a guinea pig for Open Science.

There are 427 well written molecules in the chemlambda library of molecules on Github. There are 385 posts in the chemlambda collection on Google+, most of them with animations from simulations of those molecules. It is a world, how big is it?

It is easy to make first a one page direct access to the chemlambda collection. It is more fun to build a phylogenetic tree of the molecules, based on their genes. That’s what I am doing now, based on a work in progress.

Each molecule can be decomposed in “genes” say, by a sequencer program. Then one can use a distance between these genes to estimate first how they cluster and later to make a phylogenetic tree.

Here is the first heatmap (using the edit distance between single occurrences of genes in molecules) of the 427 molecules.

Screenshot-22

It is a screenshot, proving that my custom programs work 🙂 (one understands more by writing some scripts than by taking tools ready made from others, at least at this stage of research).
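For readers who want to try the idea, here is a minimal sketch of the kind of computation behind such a heatmap: the classic Levenshtein edit distance, applied to every pair in a small list. This is not the custom program mentioned above. The gene strings and the function names `edit_distance` and `distance_matrix` are hypothetical, made up for illustration, and real chemlambda genes are encoded differently.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # delete x
                            curr[j - 1] + 1,       # insert y
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def distance_matrix(items):
    """Symmetric matrix of pairwise edit distances, zero on the diagonal."""
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = edit_distance(items[i], items[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical "genes", written as plain strings for illustration only.
genes = ["abca", "abcb", "bcbd", "ca"]
matrix = distance_matrix(genes)
for row in matrix:
    print(row)
```

A matrix like this one can then be passed to any plotting tool to draw a heatmap.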

By using the edit distance I can map the explored chemlambda molecules. In the following image the 427 molecules from the library are represented as nodes and for each pair of molecules at an edit distance of at most 20 there is a link. The nodes are in a central gravitational field, each node has the same charge and the links between nodes act as springs.

Screenshot-29_map

This is a screenshot of the result, showing clusters and trees, connecting them. Not very sophisticated, but enough to give a sense of the explored territory. In the curated collection, such a map would be useful to navigate through the molecules, as well as for giving ideas about which parts are not as well explored. I have not yet made clear which parts of the map cover lambda terms, which cover quines, etc.
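The linking rule behind such a map can be sketched in a few lines: connect two nodes when their distance is at most a threshold, then read off the clusters as connected components. This is only an illustration under assumptions. The molecule names and distances below are invented, the function names are hypothetical, and the physical layout (gravity, charges, springs) is not reproduced here.

```python
def threshold_graph(nodes, distances, max_distance):
    """Adjacency sets: link each pair whose distance is <= max_distance."""
    graph = {node: set() for node in nodes}
    for (a, b), d in distances.items():
        if d <= max_distance:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Clusters of the graph, found by a depth-first traversal."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# Invented molecule names and pairwise distances, for illustration only.
nodes = ["m1", "m2", "m3", "m4", "m5"]
distances = {
    ("m1", "m2"): 5,
    ("m1", "m3"): 40,
    ("m2", "m3"): 18,
    ("m4", "m5"): 20,
    ("m1", "m4"): 75,
}
graph = threshold_graph(nodes, distances, max_distance=20)
clusters = connected_components(graph)
print(clusters)
```

Pairs missing from `distances` are simply treated as not linked, which keeps the sketch short. A real run would fill in all pairs.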

Moreover, I see structure! The 427 molecules are made of copies of 605 different linear “genes” (i.e. sticks with colored ends) and 38 ring shaped ones. (It is easy to prove that lambda terms have no rings, when turned into molecules.) There are some interesting curved features visible in the edit distance of the sticks.

screenshot-23

They don’t look random enough.

It is clear that a phylogenetic tree is within reach. Then what else but connecting the G+ collection posts with the molecules used, arranged along the tree…?

Can I discover which molecules are coming from lambda terms?

Can I discover how my mind worked when building these molecules?

Which are the neglected sides, the blind places?

I hope to be able to tell by the numbers.

Which brings me to the main subject of this post: which is a good format for an Open Science piece of research?

Right now I am in between two variants, which may turn out to not be as different as they seem. An OS research vehicle could be:

  • like a viable living organism, literally
  • or like a viable world, literally.

Only the future will tell which is which. Maybe both!

Update the Panton Principles please

There is a big contradiction between the text of The Panton Principles and the List of the Recommended Conformant Licenses. It appears that it is intentional, I’ll explain in a moment why I write this.

This contradiction is very bad for the Open Science movement. That is why, please, update your principles.

Here is the evidence.

1. The second of the Panton Principles is:

“2. Many widely recognized licenses are not intended for, and are not appropriate for, data or collections of data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described [here](http://opendefinition.org/licenses#Data). Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc are NOT appropriate for data and their use is STRONGLY discouraged.

*Use a recognized waiver or license that is appropriate for data.* ”

As you can see, the authors clearly state that “Creative Commons licenses (apart from CCZero) … are NOT appropriate for data and their use is STRONGLY discouraged.”

2. However, if you look at the List of Recommended Licenses, surprise:

Creative Commons Attribution Share-Alike 4.0 (CC-BY-SA-4.0) is recommended.

3. The CC-BY-SA-4.0 is important because it has a very clear anti-DRM part:

“You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.” [source CC 4.0 licence: in Section 2/Scope/a. Licence grant/5]

4. The anti-DRM is not a “must” in the Open Definition 2.1. Indeed, the Open Definition clearly uses “must” in some places and “may” in other places. See

“2.2.6 Technical Restriction Prohibition

The license may require that distributions of the work remain free of any technical measures that would restrict the exercise of otherwise allowed rights. ”

5. I asked why this is here. Rufus Pollock, one of the authors of The Panton Principles and of the Open Definition 2.1, answered:

“Hi that’s quite simple: that’s about allowing licenses which have anti-DRM clauses. This is one of the few restrictions that an open license can have.”

My reply:

“Thanks Rufus Pollock but to me this looks like allowing as well any DRM clauses. Why don’t include a statement as clear as the one I quoted?”

Rufus:

“Marius: erm how do you read it that way? “The license may prohibit distribution of the work in a manner where technical measures impose restrictions on the exercise of otherwise allowed rights.”

That’s pretty clear: it allows licenses to prohibit DRM stuff – not to allow it. “[Open] Licenses may prohibit …. technical measures …”

Then:

“Marius: so are you saying your unhappy because the Definition fails to require that all “open licenses” explicitly prohibit DRM? That would seem a bit of a strong thing to require – its one thing to allow people to do that but its another to require it in every license. Remember the Definition is not a license but a set of principles (a standard if you like) that open works (data, content etc) and open licenses for data and content must conform to.”

I gather from this exchange that indeed the anti-DRM is not one of the main concerns!

6. So, until now, what do we have? Principles and definitions which aim to regulate what Open Data means, and which avoid taking an anti-DRM stance. At the same time they strongly discourage the use of an anti-DRM license like CC-BY-4.0. However, on a page which is not as visible they recommend, among others, CC-BY-4.0.

It is one thing to say “you may use anti-DRM licenses for Open Data”. It means almost nothing: it’s up to you, not important for them. They write that all CC licenses excepting CCZero are bad! Notice that CC0 does not have anything anti-DRM.

Conclusion. This ambiguity has to be settled by the authors. Or not, it is up to them. For me this is a strong signal that we witness one more attempt to tweak a well intended movement for cloudy purposes.

The Open Definition 2.1. ends with:

Richard Stallman was the first to push the ideals of software freedom which we continue.

Don’t say, really? Maybe it is the moment for a less ambiguous Free Science.