Non-Coding Sequences – the Dark Matter of Biology

Tobias Ruff

Several phenomena in cosmology (for example, the speed at which stars orbit the centers of galaxies) contradict predictions based on the amount and properties of the observable matter in the Universe. These phenomena indicate that there is something besides the matter we can observe with (radio) telescopes, which is therefore called “dark matter”. Physicists consider the nature of this dark matter one of the most fundamental unsolved questions, because the amount of dark matter required to explain these phenomena exceeds the amount of “conventional” matter observed in our Universe.[1]
In biology, there is also a phenomenon that is not well understood and that is perhaps equally surprising: long stretches of DNA have no known function. These stretches are sometimes regarded as the biological equivalent of dark matter[2]: we know that they exist, but not what they do. Unfortunately, knowing the genome sequence of humans[3] and of many other organisms does not mean that we understand the function of every part of the genome. In fact, investigating the role of particular genes or parts of the genome is an important part of molecular biology research.
One way to learn more about the function of a stretch of DNA is to study the effects of altered variants of it. From an evolutionary point of view, it is sometimes surprising how few effects a defective gene has in animals: mice without functional prion proteins show no pronounced deficits.[4]
Much the same is true for mice without a functional form of the so-called amyloid precursor protein (APP).[5] The functions of these two genes are not well understood, even though the aggregation of prion proteins causes bovine spongiform encephalopathy (BSE) and the aggregation of APP cleavage products is a possible cause of Alzheimer’s disease.
Mutations in DNA sequences with a vital function can be lethal for a cell. A cell could not survive, for example, without the components it needs to generate energy. The genes encoding these components are therefore highly conserved across a wide range of species. In contrast, DNA sequences without any function for an organism can accumulate random mutations over many generations without affecting the survival or reproduction of that organism. Stretches of DNA that vary strongly across species, and even between individuals of the same species, are therefore usually assumed to be “junk DNA”, i.e., DNA without any obvious function for the organism.
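The link between function and conservation can be made concrete with a toy calculation. The sketch below assumes the sequences from different species have already been aligned position by position, and it scores each position by the fraction of species that share the most common base there. This "majority identity" score and the helper names `column_identity` and `mean_identity` are hypothetical simplifications for illustration, not a standard conservation score, and the example sequences are invented.

```python
from typing import List


def column_identity(aligned: List[str]) -> List[float]:
    """For each alignment column, the fraction of sequences sharing the most common base."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must already be aligned to equal length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        majority = max(column.count(base) for base in set(column))
        scores.append(majority / len(column))
    return scores


def mean_identity(aligned: List[str]) -> float:
    """Average per-column identity: 1.0 means identical in every species."""
    scores = column_identity(aligned)
    return sum(scores) / len(scores)


# Hypothetical, pre-aligned toy sequences from four species.
conserved = ["ATGGCTAAG", "ATGGCTAAG", "ATGGCAAAG", "ATGGCTAAG"]
variable = ["ATTCGAGTA", "GTACGTTAA", "CTAGGACTA", "ATCCTAGGA"]

print(f"conserved region: {mean_identity(conserved):.2f}")  # conserved region: 0.97
print(f"variable region:  {mean_identity(variable):.2f}")   # variable region:  0.69
```

Under this simplified measure, a region that tolerates mutations scores lower than one in which almost every change is eliminated by selection, which is the intuition behind treating highly variable stretches as candidate "junk DNA".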
Another reason these stretches are assumed to have no function is that they often consist of many repetitions of a single sequence. From an information-theoretic point of view, their information content is therefore low. The number of repeats at certain positions in the genome is even specific to individuals and their relatives, which makes these repeats useful for identifying individuals and determining how they are related to each other. The repetitive nature of large parts of the DNA also provided early evidence about their origin: viruses are essentially stretches of DNA or RNA enclosed by proteins, and they can replicate inside a cell. In some cases, these stretches can integrate themselves into the genome of their host cell. If this happens repeatedly, the genome of the cell will contain multiple copies of the viral sequence.[6] As mutations accumulated in these sequences of viral origin over many generations, they lost the ability to serve as templates for the production of viral proteins and remained silently inside the genome of the host species.
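Why repetition implies low information content can be illustrated with general-purpose compression: a sequence that can be described as "this motif, repeated many times" compresses far better than one in which every base is unpredictable. The sketch below uses Python's standard `zlib` module and treats the compression ratio as a rough, informal proxy for information content; that proxy, the motif `GATA`, and the randomly generated comparison sequence are assumptions chosen for illustration, not real genomic data.

```python
import random
import zlib


def compression_ratio(seq: str) -> float:
    """Compressed size divided by original size; lower values mean more redundancy."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical example sequences, each 6,000 bases long.
repetitive = "GATA" * 1500  # a short motif repeated many times
random.seed(0)
varied = "".join(random.choice("ACGT") for _ in range(6000))  # unpredictable bases

print(f"repetitive sequence: {compression_ratio(repetitive):.3f}")
print(f"random sequence:     {compression_ratio(varied):.3f}")
```

The repetitive sequence shrinks to a small fraction of its length, whereas the random one cannot be compressed much below two bits per base (about a quarter of its one-byte-per-base size).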
However, transcription into RNA and eventual translation into protein are not the only functions a stretch of DNA can have. Controlling how much of a certain protein is produced at a given point in time is at least as important as the information about the sequence of the protein itself. This becomes obvious when one considers that the genome of a liver cell is identical to the genome of a neuron, although these cells look entirely different and fulfill very different functions. Beyond the regions directly upstream of a protein-coding sequence, the three-dimensional arrangement of the DNA inside a cell is also likely to influence how accessible certain parts of the DNA are for the production of proteins. How DNA, with its long non-coding stretches, is arranged in space inside the nucleus is an intriguing but difficult question to examine. Even if one could resolve the arrangement of the DNA strand inside the nucleus with (electron) microscopy, one still could not tell the sequence of the visible stretch of DNA. What seems clear so far is that the DNA adopts certain defined arrangements that are not random.

Read more:

[1] Frieman, J. et al. (2008). Dark Energy and the Accelerating Universe. Annual Review of Astronomy and Astrophysics, 46(1), 385-432.
[3] International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.
[4] Büeler et al. (1992). Normal development and behaviour of mice lacking the neuronal cell-surface PrP protein. Nature, 356(6370), 577-582.
[5] Zheng et al. (1995). β-Amyloid precursor protein-deficient mice show reactive gliosis and decreased locomotor activity. Cell, 81(4), 525-531.
[6] Bourque et al. (2018). Ten things you should know about transposable elements. Genome Biology, 19, 199.