Biotechnology - Genomic and Proteomics/Commons based cases in BGP
Research questions
- Commons-based cases (the cases that we know will appear in the right part of the quadrants)
- Identify cases
- Correlate them with their main outputs (Data, Narratives, Tools)
- How and to what extent are they “experimenting” with or “adopting” a commons-based approach? Are they adopting OA policies, for instance? Are they adopting Socially Responsible Licensing approaches?
- Identify these cases and treat them as entities that will also be placed in our mapping device (the quadrants)
- Identify which actors are participating in this and which actors are just observers (use the questionnaire to guide your research when appropriate - Carol will select specific relevant and helpful questions)
Commons-based cases
Understanding the parameters of participation and regulation in Genomics (and Proteomics – developed below in this report) helps to understand the appropriation rules/parameters that are characteristic of this field. It is also interesting to compare the processes around Open Source, specifically Linux, with the processes around observational data, for example. The basic motivation for the community that formed around the development of Linux was “it is bad, let’s improve it,” together with a shared ideology built on the rejection of Windows and the existence of a large enough community of programmers with the necessary capacity and tools. Rules on how to work together in this enterprise were then developed.
In contrast, at the beginning of Biotechnology it was only possible to work on one specific gene at a time, because of the lack of capacity/knowledge and tools (bioinformatics was developed only in the second wave of Biotechnology). Also, the concept of networks of genes had not yet been developed, since the belief was that each gene was important on its own. As a consequence, islands of knowledge emerged around a specific gene and its expressions, “forcing” scientists to collaborate, but the way they found to collaborate was through the production of papers.
However, after the Human Genome Project and the development of bioinformatics, the concept of genes functioning in a network environment emerged, changing the need for collaboration in the field of gene expression. Scientists no longer needed to collaborate, since they could now, with the tools and machines available, work on the entire genome by themselves. Suddenly, there was no clear motivation to collaborate, and people started to keep their data private. This was the environment that made possible the birth of the second wave of biotechnology industries, the so-called “Industrialized Biotechnology,” which also had, as part of its business models, controlled and charged access to data.
An observation of the first two waves of biotechnology – the laboratory-based biotechnology and the industrialized biotechnology – shows that the capacity of production was in the hands of a few people, and that the ability to use data was regulated by trade secret but also technologically, by the absence of data standards, which made interoperability of observational data impossible. Comparing this to what made the Linux development process work, based on commons and peer-production strategies, gives us some understanding of why and when people cooperate in Biotechnology.
Currently, however, there is a movement toward a “commons-based” approach in observational genomics, driven by the need to advance rather than repeat research, and by a growing understanding that genomics is so complicated, uncertainty is so high, and information asymmetry in the industry is so great that the field needs commons-based efforts.
An example is that in the 2000s, the scientific community came together to create a standard – the Minimum Information About a Microarray Experiment (MIAME). This allowed the creation of a standard database by the NIH, the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/).
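To make the MIAME requirement more concrete, the sketch below expresses the six elements the standard asks for as a minimal metadata record with a completeness check. This is illustrative only: the field names and the checking function are our own shorthand, not an official MIAME schema or a GEO submission format.

```python
# Illustrative only: a minimal, MIAME-style metadata record.
# The field names are shorthand for the six MIAME elements,
# not an official schema used by GEO or any submission pipeline.

MIAME_ELEMENTS = (
    "raw_data",             # raw hybridization data (e.g. scanner output files)
    "processed_data",       # final normalized data used in the analysis
    "sample_annotation",    # sample descriptions and experimental factors
    "experimental_design",  # design and sample-to-data relationships
    "array_annotation",     # array design: probe identifiers / sequences
    "protocols",            # laboratory and data-processing protocols
)

def is_miame_complete(record: dict) -> bool:
    """Return True if every MIAME element is present and non-empty."""
    return all(record.get(element) for element in MIAME_ELEMENTS)

example_submission = {
    "raw_data": ["hyb_01_raw.cel"],
    "processed_data": "normalized_matrix.txt",
    "sample_annotation": {"tissue": "liver", "treatment": "control"},
    "experimental_design": "2 conditions x 3 biological replicates",
    "array_annotation": "probe-to-gene mapping table",
    "protocols": ["RNA extraction", "labeling", "normalization method"],
}

print(is_miame_complete(example_submission))  # True
```

The point of the standard is exactly this kind of check: a shared, minimal definition of what counts as a complete, reusable experiment record, which is what made a common database possible.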
In parallel with this increasing need for collaboration from the community, robots for studying genes became much cheaper and easier to assemble, and a service industry grew up of providers with core laboratory and informatics facilities that could perform experiments in exchange for payment. As a result, we see a more democratic and open participation in the field of data production on gene observation.
Foundational Data Commons
Human Genome Project
- External Link: Human Genome Project
- Products: Data and Tools. Genome sequence available publicly
- Governance: funded through the NIH
- Comment: Another interesting instance of the commons - the government used the power of funding to mandate open access requirements from the organizations which participated.
- Summary/Notes:
The Human Genome Project was the mapping of the entire human genome using an open approach. Funded primarily by the US government, initially through the Department of Energy (reflecting the roots of genomic research in the push to understand mutation resulting from radiation exposure), the HGP was a classic “big science” project. An enormous amount of money was committed, a small number of centers were chosen to receive that money, and there was an expectation that the resulting data would be an open product. The primary regulation, in our nomenclature, was normative and not legal – the data was in the public domain, but there were expectations about scientific behavior and the right to punish violators was reserved; the punishment would come within the discipline, via peer review and grantmaking review, not via the courts.
The norms that emerged from the Human Genome Project served as the basis for setting norms for the development of commons-based practices in the genomics field and also for the understanding of the legal rules related to database protection. For instance, on the in-take side, it was understood during the HGP that only a limited group of people could contribute, since there was a lack of capacity and infrastructure. Few had the scientists, the labs, or the machines needed to carry out the study – a marked point of differentiation between the HGP and Open Source projects, where there is a democratization of means via ubiquitous cheap desktop computing and ICTs.
However, some time into the project, the sponsor of the project – the government – realized that the members of the project’s team were not posting the data they were producing, and that the competitor Celera was rapidly creating a private version of the genome via new technology (itself developed at least partially with HGP funding). This was the origin of the codified and formalized norms known as the Bermuda Rules, further developed during the Fort Lauderdale meeting. The Rules were simple and clear: all data was in the public domain, and it would be posted online within 24 hours of coming off the machines. However, scientists using the data were expected to check whether the data had been “published” yet (the fuzzy part), and if it was unpublished they were expected to honor certain norms about the data.
The norms that emerged from the HGP were the inspiration for a norm-setting process in the HapMap project. However, by the time the HapMap came to life, the Open Source Movement was already a well-developed and well-studied movement. The FLOSS movement inspired the HapMap to adopt, at its beginning, a more regulated approach, through the institution of a “click-wrap” contract among the HapMap participants during its in-take and out-take processes. The sharing norms instituted by the HapMap contract highly regulated the publication process and also tried to interfere with the exploitation (more precisely, the abuse) of patents that might emerge from the HapMap outputs.
HapMap
- External Link: HapMap
- Products: Data. Coordination between researchers in Canada, China, Japan, Nigeria, the United Kingdom and the United States to identify disease-causing genes. Data released into the public domain
- Governance: Combination of both public and private organizations (http://www.hapmap.org/groups.html)
- Comment: Another good instance of commons-based production
- Summary/notes:
However, before the HapMap Out-Take process became an Open Unregulated Commons, it was an Open Regulated Commons, governed by a contract that required users “not to reduce access to the data” and imposed a kind of “share-alike” condition on patents generated through the use of HapMap data. This approach, as can be seen on the project’s site, was abandoned in favor of an unregulated environment – the HapMap Out-Take = Open Unregulated Commons. The reasons were: (1) the project was finished, so there was no reason to protect the data anymore, and (2) the contract was preventing data integration. It turned out that the click-wrap license was less effective at preventing patents than the simple creation of prior art, and its effect on data integration was felt to be toxic enough that the contract was removed. (See: http://www.hapmap.org/cgi-perl/registration and Science Vol. 312, no. 5777, p. 1131, “The HapMap Gold Rush: Researchers Mine a Rich Deposit.”)
It is interesting to track that the desired community regulation moved from norms to contract and back to norms, and that the desire for the community output to serve as an input into new systems (like integrated global genomic databases) was an important factor in moving back to norms.
ENCODE
- External Link: ENCODE
- Products: Data. Open consortium to identify all functional elements of the human genome. Data is made publicly available
- Governance: Part of the NIH
- Comment: Perfect instance of commons-based production.
Sage Bionetworks
- External Link: Sage Bionetworks
- Primary products: Data and Narratives.
- Summary/Notes:
Observational Data Commons
Gene Expression Omnibus
The birth of the GEO project in the environment we briefly analyzed above allowed the emergence of a new kind of Commons, the Partially Open Commons, where everybody who has the money and the tools to run the experiments is allowed to contribute. This pattern may look similar to the Open Commons, but it is different from the “Limited Commons” that we generally observed as a pattern in the Foundational Data projects, where only the “chosen” ones could contribute.
- GRAPHIC 1: GEO In-Take = Partially Open Regulated Commons
  - Axis x = Open Group Commons: everybody who has the capacity and tools to contribute (however, these elements are less distributed in society than in the software commons/Open Source movement)
  - Axis y = Regulated Commons: regulated by the MIAME standard norms on how to generate the database
- GRAPHIC 2: GEO Out-Take (Legal) = Open Unregulated Commons: a data access point created by the NIH (a minimal access sketch follows below)
  - Axis x = Open Commons: anyone can use it
  - Axis y = Unregulated (or Partially Regulated) Commons: the final data is in the Public Domain, no registration is needed, and the data is downloadable; however, there is an agreement to keep using MIAME for the further production of data. Currently, MIAME is being adopted in the Proteomics space in its MIAPE version. See http://www.psidev.info/index.php?q=node/91
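As a small illustration of this open, unregistered out-take, the sketch below queries GEO's public records through NCBI's E-utilities web service (the gds database). The search term is only an example, and the summary fields are read defensively since their exact layout is not guaranteed here; the point is simply that no account, license, or click-through stands between a user and the data.

```python
# A minimal sketch of pulling public GEO records through NCBI E-utilities.
# No registration or license click-through is required; the data access
# point itself is open. The search term below is only an example.
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def geo_search(term: str, retmax: int = 5) -> list[str]:
    """Search the GEO DataSets database (db=gds) and return record IDs."""
    query = urllib.parse.urlencode(
        {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    )
    with urllib.request.urlopen(f"{EUTILS}/esearch.fcgi?{query}") as resp:
        return json.load(resp)["esearchresult"]["idlist"]

def geo_summary(uid: str) -> dict:
    """Fetch the document summary for one GEO record."""
    query = urllib.parse.urlencode({"db": "gds", "id": uid, "retmode": "json"})
    with urllib.request.urlopen(f"{EUTILS}/esummary.fcgi?{query}") as resp:
        return json.load(resp)["result"][uid]

if __name__ == "__main__":
    for uid in geo_search("microarray liver expression"):
        summary = geo_summary(uid)
        print(uid, summary.get("accession", ""), summary.get("title", ""))
```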
Tools Commons
BIOS/CAMBIA
- External Link: CAMBIA / BIOS
- Output: Tools (e.g. new databases) and Narratives (studies and papers)
- Governance: Non-profit NGO. Funding through the Norwegian Government, Horticulture Australia, and the Lemelson Foundation
- Should definitely take a look at the BioForge project, which aims to encourage collaboration between research groups in the life sciences
Methodologies for the Commons
Health Commons
- External Link: Health Commons
- Products: Data, Narratives and Tools. A coalition of organizations aiming to share data under a common set of terms and conditions
- Governance: led by the 501(c)(3) Science Commons
Infrastructure for the Commons
Ensembl Genome Browser
- External Link: Ensembl Genome Browser
- Output: Data. Aims to automatically annotate the genome, integrate that annotation with other databases and share the product freely on the web
- Governance: Collaboration between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute
- Comment: Interesting case - seems to be using data that's in the commons, managed by private organizations, to produce a new product that is also in the commons (a small access sketch follows this entry)
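As one illustration of the commons-in, commons-out point, Ensembl's annotation can be read programmatically without any license gate. The sketch below uses Ensembl's public REST service; the gene identifier (BRCA2) and the fields printed are just examples, and they are read defensively to allow for differences in the returned record.

```python
# A minimal sketch of reading Ensembl's freely shared annotation over the web.
# Uses Ensembl's public REST service; the gene identifier (BRCA2) is only an
# example, and just a few fields of the returned record are shown.
import json
import urllib.request

def lookup_gene(stable_id: str) -> dict:
    """Fetch the annotation record for one Ensembl stable ID as JSON."""
    url = f"https://rest.ensembl.org/lookup/id/{stable_id}?content-type=application/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

gene = lookup_gene("ENSG00000139618")  # BRCA2
print(gene.get("display_name"), gene.get("biotype"), gene.get("seq_region_name"))
```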
BIODAS: Distributed Annotation System
- Products: Data, Narratives and Tools. Aims to create a standard protocol for exchanging genomic annotations (a request sketch follows this entry)
- Governance: Distributed, though with self-appointed leaders
- Comment: This falls under the gray area of the definition of 'commons'. It is much closer to Lessig's definition, where something like TCP/IP could be considered a commons.
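To give a sense of what the protocol looks like, the sketch below issues a DAS 1.5-style “features” request for a genomic segment. The server URL and source name are hypothetical placeholders, not a real DAS endpoint; only the shape of the request (the /das/<source>/features?segment=... pattern and the XML reply) follows the published convention.

```python
# Illustrative only: the shape of a DAS 1.5-style request for annotations
# over a genomic segment. The server URL and source name below are
# hypothetical; real DAS sources expose the same
# /das/<source>/features?segment=... pattern and answer with an XML
# document describing the features.
import urllib.parse
import urllib.request

DAS_SERVER = "http://das.example.org/das"   # hypothetical annotation server
SOURCE = "hg_reference"                     # hypothetical data source name

def fetch_features(chrom: str, start: int, stop: int) -> str:
    """Request annotations for one segment and return the raw DAS XML."""
    segment = f"{chrom}:{start},{stop}"
    query = urllib.parse.urlencode({"segment": segment})
    url = f"{DAS_SERVER}/{SOURCE}/features?{query}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# Example: annotations for a 100 kb window on chromosome 1.
# print(fetch_features("1", 100000, 200000))
```

Because any server and any client can speak this same request/response shape, the protocol itself plays the TCP/IP-like role the comment above describes: the commons is the shared convention rather than any single database.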
National Center for Biotechnology Information
- External Link: National Center for Biotechnology Information
- Output: Data. Creates publicly accessible data and analysis systems for biochemistry and genetics
- Governance: Division of National Library of Medicine and National Institutes of Health
- Comment: Probably does not count as a commons-based system. The tools, while publicly available, do not appear to be publicly editable. Might be more useful to see what, if any, collaborative enterprises develop from this work
The Open Biological and Biomedical Ontologies
- External Link: Open Biological Ontologies
- Primary products: Data, Narratives and Tools. Aims to support a community of people developing biomedical ontologies
- Governance: Coordinating editors from the Berkeley Bioinformatics Open-Source Projects - there does not seem to be a system of elections
- Summary:
The OBO Foundry is a collaborative experiment involving developers of science-based ontologies who are establishing a set of principles for ontology development with the goal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain. The groups developing ontologies who have expressed an interest in this goal are listed below, followed by other relevant efforts in this domain.
In addition to a listing of OBO ontologies, this site also provides a statement of the OBO Foundry principles, discussion fora, technical infrastructure, and other services to facilitate ontology development. We welcome feedback and encourage participation.
Open Wet Ware
- External Link: Open Wet Ware
- Primary products: Data, Narratives and Tools. Sharing best practices in biological engineering
- Governance: Elected officers, funded through the NSF
- Summary/Notes: OpenWetWare is an effort to promote the sharing of information, know-how, and wisdom among researchers and groups who are working in biology & biological engineering.
Others
BMC Biotechnology
- External Link: BMC Biotechnology
- Output: Narratives. Open Access biotech journal. Anyone can submit, though it maintains a peer-review process
- Governance: The site itself is part of Springer Science+Business Media
Chiron
- External Link: Chiron
- Output: Data. Chiron maintains connections with many individual scientists. It reported 1,400 informal agreements and collaborations and 64 formal collaborations with other companies (Powell, pp. 72-73)
- Comment: Seems Chiron has built a collaborative network qualitatively different from that of other firms. Likely worth investigating more
Michigan Biotech
- External Link: Two Michigan Biotech companies decide to share their lab and equipment
- Output: Tools. Two companies didn't have the money to maintain separate labs, so they merged their efforts.
- Comment: It might be interesting to talk to these people personally and ask what if any collaboration this sort of proximity has brought
Personal Genome Project
- External Link: Personal Genome Project
- Products: Data and Tools. Aims to make personal genome sequencing possible and affordable
- Comment: Not clear that this is an instance of a commons. Right now, it seems to me to be a free service.
Bibliography for Item 10 in BGP