Algorithms in Bioinformatics. Paul A. Gagniuc
ISBN 9781119697992
Author: Paul A. Gagniuc
Genre: Mathematics
Publisher: John Wiley & Sons Limited
Table 2.5 The average genome size of different eukaryotic and prokaryotic viruses.
Viruses | Average genome size (Mb) | GC% |
---|---|---|
AV | 0.0339 | 45.3970 |
SD | ±0.0652 | ±9.2474 |
Samples | 37962 | 37962 |
Note that AV denotes the average (mean) value. A smaller standard deviation (SD) indicates that the data are clustered closely around the mean, whereas a larger SD indicates that the data are more spread out (greater variation in the data). DNA length is given in megabases (Mb): a DNA fragment of 1 million nucleotides (1 000 000 b) is 1 megabase (1 Mb), or 1000 kilobases (1000 kb), long. For instance, an average genome size of 0.0339 Mb corresponds to 33.9 kb. The last row (Samples) indicates how many sequenced genomes were used in this calculation.
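The unit conversions and summary statistics described in the note can be illustrated with a short Python sketch. The function names `mb_to_kb`, `mb_to_bases`, and `summarize` are hypothetical. The sketch assumes the sample standard deviation (n − 1 denominator), because the table does not say which variant was used. The genome sizes in the usage section are illustrative placeholders, not data from Table 2.5.

```python
import statistics

MB_TO_KB = 1000          # 1 Mb = 1000 kb
MB_TO_BASES = 1_000_000  # 1 Mb = 1 000 000 b


def mb_to_kb(size_mb: float) -> float:
    """Convert a DNA length from megabases (Mb) to kilobases (kb)."""
    return size_mb * MB_TO_KB


def mb_to_bases(size_mb: float) -> float:
    """Convert a DNA length from megabases (Mb) to bases (b)."""
    return size_mb * MB_TO_BASES


def summarize(values: list[float]) -> tuple[float, float, int]:
    """Return (AV, SD, samples) for a list of genome sizes or GC% values.

    Assumption: SD is the sample standard deviation (n - 1 denominator);
    statistics.pstdev would give the population variant instead.
    """
    if len(values) < 2:
        raise ValueError("need at least two values for a standard deviation")
    return statistics.mean(values), statistics.stdev(values), len(values)


if __name__ == "__main__":
    print(f"0.0339 Mb = {mb_to_kb(0.0339):.1f} kb")  # prints: 0.0339 Mb = 33.9 kb
    # Hypothetical genome sizes in Mb, for illustration only.
    hypothetical_sizes_mb = [0.0017, 0.0072, 0.0305, 0.1520]
    av, sd, n = summarize(hypothetical_sizes_mb)
    print(f"AV = {av:.4f} Mb, SD = ±{sd:.4f} Mb, samples = {n}")
```

When a few very large genomes sit alongside many small ones, the SD can exceed the mean. This is also the case in Table 2.5, where an SD of ±0.0652 Mb accompanies an average of 0.0339 Mb.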
Among cellular organisms, physical size and genome size show no proportionality or correlation. In viruses, however, the relationship between DNA quantity and physical size is partly different. Interestingly, the largest viruses also contain the largest genomes, and the smallest viruses contain the smallest genomes [91]. These extremes, however, are occupied by virus species with DNA-based genomes. For instance, Pandoravirus salinus is among the largest virus species (1 μm long) and contains 2.5 Mb of dsDNA packaged in bacterium-like particles [217]. The large genome of P. salinus is explained by DNA transposons that have colonized it over long periods of time [218]. At the other extreme, Porcine circovirus is among the smallest known viruses, with a capsid diameter of 17 nm and an ssDNA genome of ∼1.7 kb [219, 220]. RNA-based viral genomes are also among the smallest. For instance, hepatitis delta virus (HDV) has a 36-nm virion (virus particle) and an ssRNA genome of ∼1.7 kb [221–223]. As previously stated, plasmids and viruses show similar average GC contents of ∼45% (Tables 2.1 and 2.5). Evidence suggests that prokaryotic and eukaryotic ssDNA viruses originated from bacterial and archaeal plasmids [183]. Furthermore, as mentioned earlier, giant viruses overlap with the cellular world. For instance, DNA methylation contributes to various regulatory processes in all domains of life. Giant dsDNA viruses carry genes encoding DNA methyltransferases, which allow them to make use of this mechanism [224].
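The GC% values in Table 2.5 and the ∼45% averages mentioned above rest on a simple ratio: the number of G and C bases divided by the number of A, C, G, and T (or U) bases, multiplied by 100. The sketch below is a minimal illustration under stated assumptions. Ambiguous characters such as N are ignored, U is counted like T so that RNA genomes can be handled, and the average across genomes weights each genome equally. The function names and toy sequences are hypothetical.

```python
def gc_percent(sequence: str) -> float:
    """Return GC% = (G + C) / (A + C + G + T/U) * 100 for one sequence.

    Assumptions: case-insensitive; characters other than A, C, G, T, U
    (e.g. N for unknown bases) are ignored; U is counted like T.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return 100.0 * gc / total


def mean_gc_percent(genomes: list[str]) -> float:
    """Average GC% over several genomes, each genome weighted equally."""
    if not genomes:
        raise ValueError("need at least one genome")
    return sum(gc_percent(g) for g in genomes) / len(genomes)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real viral genomes.
    toy_genomes = ["ATGCGCAT", "GCGCGCAT", "ATATATGC"]
    for g in toy_genomes:
        print(f"{g}: {gc_percent(g):.1f}% GC")
    print(f"mean: {mean_gc_percent(toy_genomes):.1f}% GC")
```

Pooling all bases from all genomes into a single ratio would instead weight larger genomes more heavily. The text does not state which form of averaging underlies Table 2.5.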
2.7 Viroids and Their Implications
Discussions about the simplicity or complexity of viruses build bridges that were once hard to imagine. Large viruses partially overlap with cellular mechanisms, and their upper limit appears to be life itself. But what is the lower limit for viruses? The smallest viruses discussed here are less representative of the lower limit of infectious mechanisms. That limit is represented by various RNA fragments or by proteins such as prions. Prions are misfolded proteins that can transmit their misfolded shape to correctly folded proteins of the same type (as in bovine spongiform encephalopathy, or “mad cow disease”). Prion mechanisms are perhaps less relevant to the emergence of life on Earth and will not be discussed here. However, mechanisms related to self-replicating proteins represent one of the competing hypotheses for the pre-origins of life. For instance, amyloid fibers can arise spontaneously from amino acids under prebiotic conditions. Thus, amyloid catalysts may have played an important role in prebiotic molecular evolution [225]. In the RNA world hypothesis, catalytic RNAs are currently the leading candidates for the origin of life on Earth. Short RNA fragments with diverse properties appear in many varied and unrelated contexts throughout the scientific literature. For instance, RNA fragments of several hundred nucleotides called “viroids” are the smallest known infectious pathogens [226]. Viroids were first observed in the roots of Solanum tuberosum (potato) by Theodor Otto Diener in 1971 [226]. The circular ssRNA of viroids and viroid-like satellite RNAs contains no protein-coding genes and stands somewhere between “nothing” and RNA viruses [216]. RNAs appear to be the only biological macromolecules that can function as both genotype and phenotype [227]. Some viroids and viroid-like RNAs exhibit catalytic properties that allow self-cleavage and ligation [228].
This catalytic property links these opportunistic RNAs to self-splicing introns (group I introns). Group I introns are found in protein-coding genes of bacteria and their phages, in nuclear ribosomal RNA (rRNA) genes, in mitochondrial mRNA and rRNA genes, in chloroplast transfer RNA (tRNA) genes, and elsewhere [229–231]. In 1981, Theodor Otto Diener asked: “Are viroids escaped introns?” [232]. A small fraction of nuclear group I introns have the potential to be mobile elements [233]. Today, one can ask the complementary question: are introns distant descendants of viroid-like RNAs that entered the genomes of various organisms through DNA intermediates? Noncoding RNAs were likely the indirect source of all introns [227]. These speculations place early opportunistic catalytic RNAs at the origin of eukaryotic proteome diversity. In conclusion, viroid-like molecules could have been directly involved in the emergence of life on Earth. It is reasonable to suppose that an intersection between self-replicating proteins and catalytic RNAs led to some truly rudimentary precellular forms of life. Thus, one may speculate that, in the prebiotic period, two rudimentary life forms existed and gradually merged to form the Last Universal Common Ancestor (LUCA) population. Note that “viroids” are short ssRNAs, whereas “virions” are virus particles.
2.8 Genes vs. Proteins in the Tree of Life
Across organisms, the number of proteins (the proteome) may be smaller than, equal to (hardly ever), or larger than the number of genes. In eukaryotic species in particular, one gene may encode more than one protein through a process known as alternative splicing. Note that RNA-splicing mechanisms are discussed in detail in Chapter 8. A comparative analysis between the average number of genes and the average number of proteins is shown in Table 2.6. Based on the values in this table, various rough estimates can be made of the frequency of alternative splicing in different kingdoms of life. A general equation can be formulated by assuming a “one gene–one protein” correspondence. If equal numbers of proteins and genes are taken to mean 100%, everything above this threshold is a surplus that can be attributed to alternative splicing and protein splicing. Thus, the unity (a value of 1; it can also be 100 for simplicity) is divided by the average number of genes, and the result is multiplied by the average number of proteins. To find the average protein surplus (S), the unity is subtracted from this result, but only if the proteome is larger than the genome, as follows:

$$S = \frac{u}{G} \times P - u, \qquad P > G$$

where G is the average number of genes, P is the average number of proteins, and u is the unity (1, or 100 to express S as a percentage).
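The surplus calculation described in this paragraph can be sketched in Python. The procedure divides the unity by the average number of genes, multiplies the result by the average number of proteins, and subtracts the unity only when proteins outnumber genes. The name `protein_surplus` is hypothetical. Returning `None` when the proteome is not larger than the genome is an assumption of this sketch, since the text only defines S for that case. The gene and protein counts in the usage section are placeholders, not values from Table 2.6.

```python
from typing import Optional


def protein_surplus(avg_genes: float, avg_proteins: float,
                    unity: float = 1.0) -> Optional[float]:
    """Average protein surplus S under a "one gene-one protein" baseline.

    S = (unity / avg_genes) * avg_proteins - unity, defined only when the
    proteome is larger than the genome (avg_proteins > avg_genes).
    Returns None otherwise (an assumption of this sketch).
    """
    if avg_genes <= 0:
        raise ValueError("avg_genes must be positive")
    if avg_proteins <= avg_genes:
        return None  # no surplus: proteome not larger than the genome
    # Multiplying before dividing is algebraically equal to
    # (unity / avg_genes) * avg_proteins but can avoid one rounding step.
    return avg_proteins * unity / avg_genes - unity


if __name__ == "__main__":
    # Hypothetical averages, not values from Table 2.6.
    genes, proteins = 20_000, 30_000
    print(protein_surplus(genes, proteins))             # prints: 0.5
    print(protein_surplus(genes, proteins, unity=100))  # prints: 50.0
```

With the unity set to 100, S reads directly as a percentage. In this hypothetical case, the proteome is 50% larger than the gene set.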
Table 2.6 Genes vs. proteins in the tree of life.
Eukaryotes | Size (Mb) | Genes |
---|---|---|