The digital challenge facing genomic medicine
Teratec held its AGM, and I was fortunate enough to meet Jean-Marc Grognet, CEO of Genopole, the first French biocluster focusing on genetics research and applied health and environmental biotechnology. In particular, he explained why and how digital technologies have become indispensable in these fields.
During Teratec’s AGM, Genopole’s CEO Jean-Marc Grognet gave us the lowdown on the Genopole public-interest group and the renewed need for digital technologies in exploring genomics.
« Genopole is the culmination of a visionary idea, now 20 years old, to create a biocluster: a geographically concentrated innovation campus (at Evry, Corbeil and Courcouronnes) that brings together the knowledge triangle (higher education, academic research centers and industry) around a highly important topic, genomics. We have placed an essential component at the center of that triangle, namely the patient. The first reason is a longstanding one: the AFM-Téléthon muscular dystrophy association set up one of its first laboratories at the Evry site. The second is that CHSF, the largest hospital complex in the Ile-de-France region outside Paris, with 1,000 beds and 3,000 medical professionals, plays a central role in our catchment area », explains Jean-Marc Grognet.
5,600 people directing their efforts towards genomics
The idea has proven a resounding success: some 20 years later, the Genopole innovation campus has attracted 5,600 people, including 2,400 direct jobs. They are spread across 16 academic laboratories operating under the supervision of INSERM, CNRS, CEA, the University of Evry Paris-Saclay and Paris-Sud University (1,000 people), 96 certified companies (1,500 people) and 29 leading tech hubs.
The hubs give laboratories and companies that already possess the necessary skills access to cutting-edge hardware they could not necessarily afford on their own, while helping them operate it and opening it up to all interested parties at the site in a spirit of knowledge sharing. For example, the Mines ParisTech higher-education institute has a materials laboratory in Evry equipped with an electron microscope, now available for the biology applications pursued by Genopole’s member laboratories and companies.
Eminent HE establishments are also involved, including ENSIIE (IT for industry and business), Telecom SudParis (telecoms engineering) and IMT BS (engineering and business management).
The biocluster is also an economic success, with Genopole’s companies raising an average of €70-80 million a year. The most recent deal generated $125 million for Ynsect, a specialist in insect proteins founded five years ago within Genopole.
Genopole currently encompasses five campuses, which should soon be joined by two more that will house production units in addition to laboratories.
Genopole’s very lifeblood is DNA
« For 100 years, genomics has been a way of answering a very simple question: why are certain traits hereditary, and how do we explain the exceptions? Why do parents with blue eyes have a child with brown eyes? We also knew that some diseases had a genetic component that genomics was striving to explain. The answer was found to lie in DNA, the molecule inside the nucleus of our cells ».
DNA is a long-chain molecule: stretched end to end, the DNA in a single cell measures about two meters, which hints at the amount of information each molecule carries. Humans have 23 pairs of chromosomes in the nucleus of each cell, where the DNA is packaged with associated proteins. The DNA molecule is a sequence of four types of molecular building blocks (bases) that encode all the information carried by the cell, i.e. three billion base pairs!
« The total DNA of an organism represents its genome. Nearly all the 70,000 billion cells in the human body contain the same DNA. Therefore, cracking the genome code unlocks three billion bits of information, i.e. 3 × 10⁹ ».
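As a rough illustration (the figures come from the text, the arithmetic is mine), each base can take one of four values and thus fits in two bits, which gives a back-of-the-envelope estimate of the raw information in one genome:

```python
# Back-of-the-envelope estimate of the raw information in one human genome.
# Figures from the text: ~3 billion base pairs, 4 possible bases (A, C, G, T).
import math

BASE_PAIRS = 3_000_000_000      # ~3 x 10^9 base pairs per genome
BITS_PER_BASE = math.log2(4)    # 4 symbols -> 2 bits each

total_bits = BASE_PAIRS * BITS_PER_BASE
total_bytes = total_bits / 8

print(f"{total_bits:.0e} bits ≈ {total_bytes / 1e6:.0f} MB uncompressed")
# -> 6e+09 bits ≈ 750 MB uncompressed
```

In practice a sequencing run produces far more raw data than this, since each position is read many times over, but the 2-bits-per-base figure sets the floor.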
An error in even a single base of the DNA sequence can alter the function of the proteins produced, causing hereditary anomalies or genetic disorders.
Up until the 1990s, efforts focused on gaining a clearer understanding of these mechanisms; the next step was to sequence the genome. Reading the molecule one base per second, it would take approximately 100 years to read an individual’s entire genome!
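The "100 years" figure can be checked with a one-line estimate, using the article's own numbers (3 × 10⁹ bases read at one per second):

```python
# Naive single-reader estimate: one base per second over the whole genome.
BASES = 3_000_000_000               # base pairs in a human genome
SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15 x 10^7 seconds

years = BASES / SECONDS_PER_YEAR
print(f"~{years:.0f} years")        # -> ~95 years
```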
That is why sequencing relies on a massively parallel approach: the DNA is split into many single-strand fragments, high-speed sequencers running complex protocols read the base sequence of each fragment, and the fragments are then reassembled by computer analysis, which reconstructs the genome and stores it in large databases.
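The "split and reassemble" idea can be sketched with a toy greedy assembler (a deliberate simplification for illustration, not how production assemblers work): fragments are merged on their longest suffix/prefix overlap until one sequence remains.

```python
# Toy illustration of shotgun-style reassembly: merge fragments greedily
# on their longest suffix/prefix overlap (simplified; real assemblers
# must also handle read errors, repeats and both DNA strands).

def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(fragments: list[str]) -> str:
    frags = fragments[:]
    while len(frags) > 1:
        # Find the pair with the largest overlap and merge it.
        n, i, j = max(((overlap(a, b), i, j)
                       for i, a in enumerate(frags)
                       for j, b in enumerate(frags) if i != j),
                      key=lambda t: t[0])
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags[0]

reads = ["GATTAC", "TTACAG", "ACAGTT"]  # overlapping fragments of one strand
print(assemble(reads))                  # -> GATTACAGTT
```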
The Human Genome Project was created in 1990 with the aim of determining the DNA sequence of the human genome for the first time. A dozen laboratories around the world each focused on sequencing a chromosome. In the case of France, the sequence for chromosome 14 was revealed in 2001 by Genoscope (a CEA laboratory based at the Genopole site). Sequencing of the entire human genome was achieved in 2003.
All that remains is to determine what this message means by trying to locate the genes, or specific sequence of bases, in the genome, bearing in mind that there are approximately 22,000 genes in a human genome.
From $100 million to $100 in 20 years
Sequencing comes at a cost. The first sequencing of the genome by the Human Genome Project cost an estimated $100 million; costs then followed a Moore’s-Law-like curve, halving every 18 months to reach approximately $10 million by 2007.
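That halving curve can be sketched numerically (assumed starting point: the $100 million figure cited above, halving every 18 months):

```python
# Sequencing cost under a "halve every 18 months" trend, starting from
# the ~$100M figure cited for the Human Genome Project era.
START_COST = 100e6      # dollars (assumed baseline)
HALVING_MONTHS = 18

def cost_after(months: float) -> float:
    return START_COST / 2 ** (months / HALVING_MONTHS)

for years in (0, 3, 6):
    print(f"after {years} years: ${cost_after(years * 12):,.0f}")
# After 6 years (4 halvings) the trend gives ~$6M, in the same ballpark
# as the ~$10M the article cites for 2007.
```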
« This is when the US company Illumina came up with an earthshattering method of massively parallel sequencing, which sent costs into freefall until they reached the current rate of $1,000 per genome ».
« The bottom line is that we are now capable of sequencing a genome for a cost similar to that of a complex biological analysis, meaning it is now within reach of conventional medical practice. The question is no longer whether we will one day reach $100 per genome, but how long it will take. Three years? Six? »
Not only have costs fallen, but times have been drastically reduced. The latest sequencing machines are capable of processing 48 human genomes in parallel in 44 hours.
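The throughput figure quoted above translates into a yearly capacity per machine (my arithmetic, assuming back-to-back runs with no downtime):

```python
# Throughput of the sequencers cited in the text: 48 genomes per 44-hour run.
GENOMES_PER_RUN = 48
HOURS_PER_RUN = 44

per_day = GENOMES_PER_RUN / HOURS_PER_RUN * 24
per_year = per_day * 365
print(f"~{per_day:.0f} genomes/day, ~{per_year:.0f}/year per machine")
# -> ~26 genomes/day, ~9556/year per machine
```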
« With the advent of molecular technologies, robotics, IT and artificial intelligence, biology is in the throes of a real methodological revolution. We are entering the age of big data and sequencing for all ». This development paves the way for countless new applications.
Firstly, genomics explores the rich diversity of the living world by deepening our understanding of living species. Metagenomics discovers complex ecosystems formed by entire communities of organisms through their DNA. Functional genomics aims to provide clearer insight into the mechanisms of life by determining the function of the genes identified in genomes. Finally, medical genomics seeks a better understanding of human beings and their health.
Genomics heralds the return of the great explorers
Genomics also lets us see what we were unable to see before. For example, the Tara Oceans expedition has been taking and sequencing hundreds of thousands of water samples from seas around the world, allowing it to compile a catalog of 117 million genes, over half of which were previously unknown. Species are being discovered that have never been seen before. Similarly, sequencing has completely changed our vision of the gut microbiota. Some 40 years ago, only a few bacterial species had been identified; that number has since risen to several hundred, and many more are still waiting to be discovered!
Personalizing the medical approach
Genome sequencing for each individual will revolutionize medical practices by ushering in personalized diagnoses and treatment plans. But to achieve real efficiency, practitioners will need to be capable of knowing and interpreting the information contained in the genome in real time, which explains why there is a growing requirement for computing power and performance.
« The idea is to process streams of heterogeneous data ever faster using algorithms, especially artificial intelligence, and biological computing tools, as the need grows for an interdisciplinary approach to establishing the right diagnosis ».
Knowing the genomic rearrangements of a tumor and analyzing its genetic characteristics can give physicians precious information when determining the appropriate treatment. Beforehand, the data must be acquired, interpreted, digested and presented so that practitioners are not submerged under a mass of information.
Genomic or personalized medicine can be divided into two main branches. Analyzing the genome will help accurately detect an individual’s predispositions to given conditions according to their genetic ID card and allow preventive medicine to step in by providing individuals with behavioral advice and even prescribing preventive treatments to lower the risks.
Analyzing the genome will also help detect rare diseases by establishing the right diagnosis the first time. Once an illness is confirmed, genome analysis will allow the physician to weigh the available treatments and choose the fastest, most effective care plan with the fewest side effects.
« For nearly two years now, oncologists have been able to treat not simply a renal or digestive disorder, but a specific genetic mutation with a dedicated drug, regardless of the tumor’s location. This represents a radical change in how we view treatment ».
This whole issue underpins the major plan known as France Genomic Medicine 2025, which was partly conceived at Genopole and which aims to equip France with a large number of dedicated sequencing platforms. These will send the information obtained to an Analytical Data Collector (ADC), which will process, interpret and transmit each patient’s data, while a reference center (“CRefIX”) validates the procedures and equipment.
Genomics and big data
Unlike many big data applications that deal with a few types of information covering a large number of individuals, genomic medicine covers a very large number of descriptors (genetic, mutations, etc.) for a very low number of individuals.
« According to estimates, we will have 10²¹ bytes of information to process for each individual, and that figure needs to be multiplied by several hundred thousand patients a year. This will require not only significant processing power but tremendous storage capacities, in the region of 10 exabytes a year ».
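The article's own orders of magnitude can be cross-checked with simple arithmetic (my assumption: "several hundred thousand patients" taken as 300,000, and the quoted 10 exabytes per year):

```python
# Rough consistency check of the storage figures quoted above.
EXABYTE = 1e18
patients_per_year = 300_000          # "several hundred thousand" (assumption)
storage_per_year = 10 * EXABYTE      # ~10 exabytes/year, as quoted

per_patient = storage_per_year / patients_per_year
print(f"~{per_patient / 1e12:.0f} TB of stored data per patient")
# -> ~33 TB of stored data per patient
```

Tens of terabytes per patient per year is far smaller than the quoted per-individual processing volume, which underlines that processing (repeated reads, intermediate analyses) dwarfs what is ultimately stored.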
This also raises a number of questions. Although acquisition no longer poses any problem, the data must be associated with a medical record to have any meaning and value, and their confidentiality must be ensured. How can that confidentiality be guaranteed? Who do the data belong to: the patient, the treating physician, the sequencing center, the public authorities? The information collected today may be useful for treating a disorder in 30 years’ time, so who will be responsible for retaining the data, and for how long? Who will interpret and reinterpret the data as science advances?
No digital technologies = no genomics
« Medicine in tomorrow’s world will be driven by genomics, meaning that we need ever greater computational power to safely process huge datasets, while guaranteeing very long-term data storage. We have actually been toying with the idea of creating the world’s first “digital genomics” institute to address these particular needs, because our country is home to some of the most powerful sequencing centers in Europe, as well as structures with leading-edge skills in high-performance computing (HPC) », concludes Jean-Marc Grognet.
About the author
Jean-François Prevéraud, an ENIM engineering graduate and professional journalist since 1981, has participated in countless papers and newsletters (Bureau d’Etudes, CFAO Synthèse, SIT, Industrie & Technologies, Usine Nouvelle, etc.) as a journalist, deputy editor or editor-in-chief.
Despite retiring in February 2017, Jean-François has every intention of maintaining an active lifestyle. That is why he keeps abreast of the latest trends and developments sweeping the PLM sector (CAD/CAM, digital simulation, 3D printing, the factory of the future, virtual and augmented reality, etc.).