Can New Genetic Information Arise?

Young-Earth creationists (YEC) claim that all animal species alive today are direct descendants of about 1,400 (the most recent number given by Ken Ham) breeding pairs of animals on Noah’s ark. These 1,400 species, they say, diverged into the approximately 10,000,000 species of animals we see today. They call this “rapid post-flood speciation” and an example of microevolution.

I was in a discussion where the claim was made that mutations cannot increase the information content of an organism’s genome, but rather only decrease or degrade it. This is a common claim made by YEC believers, but it directly contradicts their idea of “rapid post-flood speciation”. The amount of new genetic information required for 1,400 species to give rise to 10,000,000 species is extraordinary. We aren’t just talking about new alleles (variants of a gene) within a species, but entirely new genes that would account for the distinctiveness of each species. In reality, diversification on such a scale would be macroevolution at a pace much faster than anything described by real science.
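To get a feel for the pace this implies, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every species splits into two at the same time (real speciation is nothing like this tidy), and it uses the 4,000-year span mentioned later in this article. The function names are my own.

```python
import math

def doublings_needed(start_species: int, end_species: int) -> float:
    """Number of times the species count must double to get from start to end."""
    return math.log2(end_species / start_species)

def years_per_doubling(start_species: int, end_species: int, years: float) -> float:
    """Average time available for each doubling if they are spread evenly."""
    return years / doublings_needed(start_species, end_species)

START_SPECIES = 1_400        # "kinds" on the ark, per the figure quoted above
END_SPECIES = 10_000_000     # approximate number of animal species today
YEARS_AVAILABLE = 4_000      # time since the flood in the YEC timeline

print(f"doublings needed:   {doublings_needed(START_SPECIES, END_SPECIES):.1f}")
print(f"years per doubling: {years_per_doubling(START_SPECIES, END_SPECIES, YEARS_AVAILABLE):.0f}")
```

Under those simplifying assumptions, the species count has to double between 12 and 13 times, which means every species alive would have to split into two distinct species every few hundred years.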

I asked where this genetic information came from, if all mutations only degrade genetic information. The answer was that the genomes of those original 1,400 “kinds” were huge, and were preloaded by God with just the right information content needed to give rise to the 10,000,000 animal species’ diversity we see today. I believe that God’s hand is in all of creation, but the evidence points to no pre-loading of DNA. We’ll tackle the pre-loading claim in a moment, but first let’s discuss the first claim about mutations not producing new information.

Some creation science proponents point to the fact that DNA encodes information in much the same way that a written book encodes information. If the genome of a typical life form can be thought of as a book, then the individual chromosomes are the chapters, genes are the paragraphs, codons are the words, and nucleotides are the letters. This is not entirely how DNA really encodes information, but it is a reasonable way to explain the concept. Their argument is that this points to an author who must have explicitly written down the code, letter by letter, in its entirety. Thus, any change over time must be a degradation of that original writing. Any random scattering of point mutations in human writing is very unlikely to result in something meaningful. For example, take the following sentence:

“The world is a strange madhouse.”

If I were to randomly insert point mutations or single-letter deletions or insertions, I might come up with the following:

“The world is a strknge madhousw.”
“The world i a strange mwdhouse.”
“The word is a strange madhouse.”

There are millions of such possibilities, but I could think of only a couple of single-letter changes, additions, or deletions that resulted in a sentence that was still correct in spelling and grammar. The third sentence listed above is one of them, and yet its meaning is still nonsensical. Even so, the sentence is still very similar to its progenitor; it’s microevolution. It’s possible that a large number of additional point mutations, deletions, and additions would turn the above sentence into something completely different. So,

“The world is a strange madhouse.”

could turn into

“Poets have been mysteriously silent on the subject of cheese.”

with just the right number of point mutations and single-letter additions, but it would be through many nonsensical intermediaries. This would be the macroevolution of a sentence. It’s possible, but implausible. Examples like these are sometimes given to show how evolution works on a genetic level, but they are not very helpful because we intuitively see that there’s a lot of hand-waving required in the intermediary steps between two legitimate English sentences. So using an English sentence as an example makes it that much harder to understand biological evolution.
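For the curious, the minimum number of single-letter changes needed to turn one sentence into another can be counted with the standard Levenshtein edit distance. The following is a minimal sketch of that textbook algorithm; the function name is my own, and it says nothing about whether the intermediate sentences make sense.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions,
    or deletions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i chars of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

start = "The world is a strange madhouse."
target = "Poets have been mysteriously silent on the subject of cheese."

print(edit_distance(start, "The word is a strange madhouse."))
print(edit_distance(start, target))
```

The first call measures the one-letter "microevolution" example above; the second measures the shortest possible path to the cheese sentence, nearly all of whose intermediate steps would be gibberish.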

Before answering the question of how any new information written in our DNA “language” could ever arise, I’m going to give a brief explanation of how DNA encodes the building blocks of life. I’ll point out a key distinction between a human-written language and the DNA “language”, so please follow along with my explanation even if you already know how DNA works.

DNA is encoded in four letters (the nucleotides), which we call A, C, T, and G, chemicals found in the DNA helix: adenine, cytosine, thymine, and guanine. Any sequence of three of these letters is used to encode an amino acid in a protein. There are only 20 amino acids that combine to form the proteins in every living thing on Earth.

Furthermore, there are only 64 possible three-letter ordered combinations of A, C, T, and G, and each combination translates either to one of those amino acids or to a “stop” signal. There is some redundancy, such that some combinations map to the same amino acid. The complete mapping is shown here.
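The same mapping can also be written out in a few lines of code. This sketch builds the standard genetic code from its usual compact form (bases in the order T, C, A, G, with one-letter amino acid codes and `*` for stop). The variable names are my own, and a handful of organisms and organelles use slightly different tables.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons, in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many different codons map to each amino acid (the redundancy).
redundancy = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

print(len(CODON_TABLE), "codons")
print(len(redundancy) - 1, "amino acids, plus stop")
print("stop codons:", stop_codons)
print("codons per amino acid:", dict(redundancy))
```

Every one of the 64 possible codons appears as a key in `CODON_TABLE`, which is the property the rest of this article relies on.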


Thus, while the English language has about 50 letters, numbers, and punctuation marks, and about a million words, DNA has only 4 letters and 64 three-letter words, which spell out just 20 amino acids and a “stop” signal. The immense expressiveness of DNA is not due to the number of individual letters or even words. It’s due to how those words are used: the long sequences of three-letter combinations.

So now back to our question of how new genetic information can arise, rather than just be a degradation of already-existing information. The answer is that there is a crucial distinction between DNA and any form of human writing. As you can see from the chart, every three-letter combination of DNA nucleotides will code for SOME amino acid or a “stop” instruction which terminates the protein chain. Any lengthy sequence of DNA will thus code for some proteins.
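Here is a small sketch of that point: a completely random string of nucleotides can still be read, codon by codon, in any reading frame. It only demonstrates that the code has no "misspelled" words; it ignores everything a real cell also needs before a protein is made (promoters, start codons, transcription, and so on). The sequence length, random seed, and function names are arbitrary choices of mine.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_frame(dna: str, offset: int = 0) -> str:
    """Translate every complete codon starting at offset; '*' marks a stop."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )

def longest_stop_free_run(protein: str) -> int:
    """Length of the longest stretch of amino acids between stop signals."""
    return max(len(piece) for piece in protein.split("*"))

rng = random.Random(42)  # fixed seed so the run is repeatable
random_dna = "".join(rng.choice(BASES) for _ in range(3000))

for offset in range(3):
    protein = translate_frame(random_dna, offset)
    print(f"frame {offset}: {len(protein)} codons read, "
          f"longest stop-free run = {longest_stop_free_run(protein)}")
```

No codon lookup ever fails, whatever the random letters happen to be. Try doing the same with random English letters and a dictionary.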

So the distinction between English text and DNA is that every moderately long string of nucleotides, no matter what the random assemblage of letters, will contain a sequence that’s grammatically correct: it’s readable and parseable, and it will likely code for a protein. Will that protein be useful for anything? Probably not. But that doesn’t matter over the long run. In many cases the protein will get built and added to the soup of information floating around in that cell in some quantity every time that DNA is read. Most such proteins will have no effect on the cell biology around them, some will cause harm, and others will be useful. Anything useful that improves the organism’s chance of survival will get passed down to the next generation. But even if it’s not useful, as long as it’s not harmful, it will still get passed on… it’s carried along for the ride. Only harmful mutations are weeded out in a single generation. This shows how intermediary sequences of information are still propagated, and it explains why so much of the DNA in the genomes of all living things is not actively being used. That doesn’t mean it’s “junk” DNA. It’s necessary “scratch space” that is required for evolution to do its work (and no teleological statement is implied here).

That soup of proteins combined with random motion can have incredible effects within the confines of a cell membrane. At the intense rate that chemical reactions take place inside a cell, it’s estimated that every free-floating molecule within a cell membrane will bump against every other molecule within that membrane in the span of seconds. David Goodsell’s The Machinery of Life and Peter Hoffmann’s Life’s Ratchet provide very readable explanations of this.

The premise that mutations cannot give rise to new information is false. I’ve only shown point mutations and single-letter insertions/deletions, but we now know that there are many different ways that genetic mutations can arise and increase the complexity of the genetic code, such as duplications, chromosomal inversions, frameshift mutations, repeat expansions, horizontal gene transfer, endogenous retroviruses, translocations, and others. New information is easily added to the genome by means of duplications and subsequent divergence in function. Frameshift mutations are especially interesting, although uncommon. An insertion or deletion of a single nucleotide can change the expression of thousands of nucleotides that follow it, simply because the framing of the three letters has altered the meaning of each three-letter sequence. Thus, a copy followed by a frameshift mutation causes the generation of a tremendous amount of new information. However, it takes many, many generations (millions of them) for these usable proteins to be built and expressed in a way that accounts for today’s diversity of life. Evolution, even microevolution, requires diversification and natural selection. There is no way that this mechanism could account for the diversification and natural selection of 1,400 species into 10,000,000 species within the span of 4,000 years since Noah’s ark.
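To make the frameshift idea concrete, here is a minimal sketch. The short DNA sequence is made up for illustration (it is not a real gene), and the function name is my own. A single inserted letter changes every codon that follows it.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Read codons from the first letter until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical 24-letter sequence, chosen only for illustration.
original = "ATGGCCAAGCTGACCGAATTTGGC"
# Insert a single "T" right after the first codon.
shifted = original[:3] + "T" + original[3:]

print("original:", original, "->", translate(original))
print("shifted: ", shifted, "->", translate(shifted))
```

The two outputs share only their first amino acid. Everything downstream of the inserted letter is read in a new frame, so one small change produces a very different protein.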

Now what about the claim that God pre-loaded all of the information content of 10,000,000 animal species into the genomes of those 1,400 animal pairs? The problem with this is that there would be a tremendous amount of un-expressed, but perfectly preserved, genetic information that was carried for hundreds of generations. Any DNA sequence that codes for a usable protein will very likely get expressed! You can’t stop it. And any DNA sequence that doesn’t code for a usable protein will, over time, degrade. New mutations are scattered throughout the genomes in every sperm and egg cell. If there are long stretches of “pre-loaded” usable DNA that are somehow NOT used by that particular organism, the mutations that randomly occur in those stretches will keep degrading them until they are no longer usable. So the idea of DNA that encodes useful information that will somehow get expressed only in a species that will arise thousands of years later is nonsensical.

This is a basic tenet of genetics. If the DNA codes for something that works and provides a benefit, it will get expressed in that organism and get passed on. Large chunks of coding DNA cannot hide in the genome of a species only to be expressed thousands of years later in a descendant species. Conversely, if the DNA does not code for something, it will still get passed on, but mutations will slowly degrade it with no selective pressure to preserve it. There are edge cases with epigenetics and other mechanisms that affect this process in other ways, but these do nothing to remove the basics of what we know about evolution and genetics. The young-Earth creationists’ stance that all of the DNA diversity we see in the millions of species today was hidden in the genomes of about a thousand species without being expressed in those species is simply not possible.

I’m not going to address how life first arose or how we got to the initial complexity required for the DNA-copying and DNA-reading machinery to even exist before evolution could take place. I’ve discussed that in another article, and I take no position on “abiogenesis”. If abiogenesis happened, then to me, it was part of God’s plan. My goal here is to have shown that once we have at least one life form, it is possible for new genetic information to arise through natural means, without all the information content of DNA having been pre-loaded into “super-species” several thousand years ago, and furthermore that such preloading would be incompatible with how DNA expression actually works.


The first four references below are by outspoken evangelical Christians who practice science with integrity.

