"Humans and sharks, for example, make hemoglobin using nearly identical genes. That means hemoglobin genes were already present in their common ancestor."
Um. If this gene was the result of a mutation, isn't it possible, at least in theory, that mutation happened more than once? At two different times? At two different branches on the species tree?
Not really my area but there seems to be an assumption that species development (dare I say evolution) was linear, with humans being the ultimate achievement in that process. But doesn't the fact that genes mutate - sometimes triggered by viruses - imply the process was not at all linear?
> Um. If this gene was the result of a mutation, isn't it possible, at least in theory, that mutation happened more than once? At two different times? At two different branches on the species tree?
Genes are long sequences of information. You could easily have quite different genes doing similar things, but if you have the very same gene (sans minor differences) it's just too unlikely to have originated completely independently twice.
The closest thing I can imagine to your scenario is where, because of shared ancestry, 2 species have a certain gene, and then some minor mutation hits both of them, and now they both have some other gene. I suppose that's possible. But they already shared the original gene to start with.
> " it's just too unlikely to have originated completely independently twice."
Unlikely, but not absolutely impossible. Correct?
Editorial: This is where Science / science loses me. It makes absolute statements that aren't in fact truly absolute.
The point being, __if__ there was a chance identical mutation, that changes (a lot?) of things. Truth be told, life coming into being has to be a couple of orders of magnitude more coincidental than some gene mutation. In that context, "too unlikely" starts to feel much less so, yes?
It is possible for every atom in a quarter to suddenly move in the exact same direction at the same instant, temporarily lifting the quarter off a table. But it won’t happen. It has likely never happened in the history of the universe.
Yep, the aura of statistics can blind us. It will most probably happen if enough people keep trying again and again for the next ten million years. It would not be much different from an asteroid falling on Earth once and killing most animals.
Agreed. But just the same, given the number of cells, the number of a species, the amount of time...I think you see where this is going.
The occurrence of something is also a function of the total universe of occurrences. This doesn't seem to be part of the "close to impossible" numbers being mentioned.
If life itself was the result of a roll of the cosmic dice then anything downstream sounds reasonable.
It's just an example of how improbable things can still happen.
Think about winning the spermatozoan race 2 million times in a row, as a member of the most intelligent species known in the universe, one among millions of different forms of life, after surviving five consecutive mass extinctions that wiped out 90% of the life on the planet each time.
Flip a coin 100 times. The odds of getting the exact sequence of heads and tails that you got are about 0.00000000000000000000000000001% (give or take a zero). And yet, it happened!
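For anyone who wants to check the arithmetic, here is a minimal sketch, assuming a fair coin and independent flips. The names `FLIPS` and `p_exact` are only illustrative.

```python
from fractions import Fraction

FLIPS = 100  # number of flips from the comment above

# Probability of one specific, pre-specified sequence of fair coin flips.
p_exact = Fraction(1, 2 ** FLIPS)

# Some sequence always comes up: the 2**FLIPS equally likely sequences
# together account for all of the probability.
total = p_exact * 2 ** FLIPS

print(f"as a probability: {float(p_exact):.3e}")
print(f"as a percentage:  {float(p_exact) * 100:.3e} %")
print(f"sum over all sequences: {total}")
```

The per-sequence value is tiny, yet the sum over all sequences is exactly 1, which is the point of the coin example: an outcome being individually improbable says little once it has already been observed.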
Not accurate. It depends on how many times you play.
That matters here because we're not talking about any single cell or single strand of DNA. We're talking about millions upon gazillion. So "the game" is being played an unimaginable number of times. Clearly that changes the possibility of an outcome.
> "the game" is being played an unimaginable number of times.
Actually, it's quite imaginable. Not only is it imaginable, it's fairly straightforward to calculate an upper bound on the actual number, and from there to calculate the odds of producing the same gene completely independently more than once. If you do the math you will find that the odds are indistinguishable from zero.
Science is fundamentally empirical; it serves no purpose to say "with extremely high likelihood" behind every statement. It's implicit in any scientific context.
The only thing this encourages is crap like creationism and flat earth theory because there is no way science is ACTUALLY SURE.
I don't think these statements are as absolute as you take them. I'm also not sure where you go if "almost certainly did(n't)" is no longer sufficient. At that point you basically have to give up on saying anything about anything.
Science doesn't (intentionally) forget about unlikely alternatives. That doesn't mean there is any good reason to treat highly unlikely scenarios as the equals of the vastly more likely cases.
And arguing anything about chances based on things that have already happened is kind of bullshit. They already happened, however unlikely. We know unlikely things can and do happen occasionally. That does not mean other unlikely things are more likely to happen.
And if you really insist on doing that, keep in mind the timespans here also differ by orders of magnitude.
I know that this is true given the pure combinatorics of the situation. You think that it's a chance of 1/4^250 (or so), because it's a gene BUT that very much depends on the error surface in that high-dimensional space, and not so much on the exact combination.
In other words, if you need to hit haemoglobin exactly, the odds are absurdly against that. But suppose there are 1e30 different genes resulting in substances that all increase the oxygen saturation of blood, and that these then "coalesce" on haemoglobin as a result of optimization. In other words, I'm asking whether "haemoglobin" (and variants) is the bottom of a valley in an optimization process. For evolution to "find" haemoglobin, it doesn't need to hit the bottom, it only needs to hit the valley. So it matters a great deal how big the valley is, and such a valley can be quite big. And I get it: we have no hope in hell of figuring out how big the valley is, so we just take this as the answer.
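To put rough numbers on the "exact hit" versus "valley" distinction, here is a back-of-the-envelope sketch. The gene length of 250 bases and the valley size of 1e30 are taken from the comment as hypotheticals; treating a trial as one uniformly random draw from sequence space is an added simplifying assumption, not a model of how selection actually searches.

```python
import math

GENE_LENGTH = 250        # bases, the "250 (or so)" from the comment
ALPHABET = 4             # A, C, G, T
VALLEY_SIZE = 10 ** 30   # hypothetical number of workable sequences

# Size of the whole sequence space (exact, as a Python big integer).
total_sequences = ALPHABET ** GENE_LENGTH

# Work in log10 so the numbers stay readable.
log10_total = GENE_LENGTH * math.log10(ALPHABET)
log10_hit_exact = -log10_total
log10_hit_valley = math.log10(VALLEY_SIZE) - log10_total

print(f"sequence space:        ~10^{log10_total:.1f}")
print(f"P(hit exact sequence): ~10^{log10_hit_exact:.1f} per blind draw")
print(f"P(hit the valley):     ~10^{log10_hit_valley:.1f} per blind draw")
```

Under these assumptions the valley improves the per-draw odds by 30 orders of magnitude, but a blind draw is still hopeless. Whatever force the argument has therefore rests on selection moving step by step toward the valley, which this sketch does not model.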
So "What good is half a wing?". Well, if 1/1e30th of a correct gene already works, then "half a wing" can be quite bad and yet result in a wing. DNA is an optimization process, and if that was the defining change being selected against, is it that hard to imagine that it would converge on haemoglobin given 1e30 starting positions?
If you look at online evolution simulators, the ones with the wheel racing [1], then the "spikes fore and aft, small wheel forward, big wheel aft, with a tail spike to prevent tipping over" design could be haemoglobin in this example. The valley surrounding that optimal outcome is huge: it's essentially the whole universe in that case, which is a "gene" with 14 float32's and 2 integers. How many combinations is that? Quadrillions, at least. And yet, all roads lead to Rome, or at least to the tailed bigwheel.
Of course this doesn't even seem to apply to haemoglobin. Haemoglobin and chlorophyll[2] aren't that different (in fact the gene is identical, or at least there are genes that code for chlorophyll and genes that code for haemoglobin that are identical, so ignoring variations, they're actually identical. The difference is not so much in the gene itself but what happens to the molecule after it's created, it's in the "meta" information in the gene, not in the transcription part). So what really needed to happen is a screwup in the haemoglobin gene animals inherited from plants, followed by a few hundred generations of fine tuning. (in fact, that molecule does other stuff too, animal blood, plant photosynthesis, and (most) plant colors, as well as some aspects of ATP generation (and I'm sure there's more, we just haven't found those functions yet) have a very similar chemical basis, and therefore are likely regulated by very similar genes).
For most amino acids, there are several DNA triplets that code for it (there are 64 triplets and only 20 amino acids). Even if there is one magical optimized protein that the species converges to, there is no evolutionary pressure for the triplets to be the same in the two converged genes.
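A small sketch of how much freedom the triplet redundancy leaves. It uses the standard genetic code (NCBI translation table 1) written in the usual TCAG order; the 10-residue peptide is an arbitrary example chosen only for illustration.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 DNA triplets to its amino acid (or stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many triplets code for each amino acid.
CODONS_PER_AA = Counter(CODON_TABLE.values())


def synonymous_encodings(protein: str) -> int:
    """Number of distinct DNA sequences that translate to `protein`."""
    return prod(CODONS_PER_AA[aa] for aa in protein)


peptide = "MVLSPADKTN"  # arbitrary example peptide
print(dict(CODONS_PER_AA))
print(synonymous_encodings(peptide))
```

Even a 10-residue peptide has tens of thousands of equivalent DNA spellings, and the count grows multiplicatively with length. So two independently evolved genes for the same protein would not be expected to match at the triplet level.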
This is an important point. DNA is an ECC. And it seems the distribution of triplets to amino acids is not random, but has converged on a mapping that maximizes the robustness of the organism. Pretty darn amazing.
The word you're looking for is "converge", not "coalesce", as in convergent evolution, which is well known. Here's a quick find using the googles: https://www.nature.com/articles/nrg3483
If a mutation in a given gene leads to death 99.9999% of the time, but leads to oxygen transport the rest of the time, and the mutation happens many millions of times, then it is reasonable to think it could have been invented twice. Most of the time the mutation leads to death, but once in every million times it leads to a happy result. Over the course of a few million years, the mutation happens many millions of times. In that scenario you would expect it to happen at least twice, and probably much more often.
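The arithmetic of that scenario can be sketched with a plain binomial model. The success chance of one in a million comes from the comment; the trial counts and the assumption that events are independent are illustrative, not measured rates.

```python
def prob_at_least_two(trials: int, p: float) -> float:
    """P(X >= 2) for X ~ Binomial(trials, p)."""
    p_zero = (1 - p) ** trials
    p_one = trials * p * (1 - p) ** (trials - 1)
    return 1 - p_zero - p_one


P_SUCCESS = 1e-6  # "once in every million times", hypothetical

for trials in (10 ** 6, 10 ** 7, 10 ** 8):
    expected = trials * P_SUCCESS
    print(f"{trials:>11,} trials: expected successes = {expected:g}, "
          f"P(at least two) = {prob_at_least_two(trials, P_SUCCESS):.4f}")
```

With a million trials two or more successes are far from guaranteed, while with ten million or more they become near certain. This only checks the arithmetic; whether the assumed rates resemble real mutations is a separate question.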
As an example, write a simple program that randomly generates strings of letters from 1 to 20 characters in length. Then include an English dictionary in the program. Compare the randomly generated strings with the words in the dictionary. How often does the random process generate actual words? Not often, but sometimes, and that is all that is needed.
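A minimal sketch of that program. The tiny `WORDS` set is a stand-in for a real English dictionary, and uniform choice of length and letters is an assumption; all names here are illustrative.

```python
import random
import string

# Tiny stand-in word list; a real run would load a full English dictionary.
WORDS = {"a", "i", "at", "go", "cat", "dog", "gene", "shark"}


def random_string(rng: random.Random, max_len: int = 20) -> str:
    """Random lowercase string with a uniformly chosen length in 1..max_len."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def count_hits(trials: int, seed: int = 0) -> int:
    """How many of `trials` random strings are in the word list."""
    rng = random.Random(seed)
    return sum(random_string(rng) in WORDS for _ in range(trials))


def expected_hit_rate(words=WORDS, max_len: int = 20) -> float:
    """Exact per-draw probability of landing on any word in the list."""
    return sum(26.0 ** -len(w) / max_len for w in words if 1 <= len(w) <= max_len)


trials = 200_000
print(f"hits in {trials:,} draws: {count_hits(trials)}")
print(f"expected hit rate per draw: {expected_hit_rate():.6f}")
```

Almost all hits are one- and two-letter words, since each extra letter divides the chance by 26. Long words, the closer analogue of a whole gene, essentially never turn up by blind generation.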
If genetics were that fickle life would have never happened. The vast majority of mutations are neutral or affect the functionality of the resulting protein only by a matter of degree.
Your example totally ignores how selection works. Life does not generate genes randomly and see what sticks. Prebiotic chemistry might have worked like that to some degree. But at the point oxygen transport was invented, every gene in an organism would already have been subjected to billions of years of selection pressure, and the gene or genes that eventually came to code for hemoglobin would already have had some other, related purpose.
"Your example totally ignores how selection works."
What do you think the word "selection" means? It means that stuff dies. It means that there is constant random mutation, most of which leads to death. Of the mutations that are useful, we say they are selected. When we speak of a gene being selected, we should recall that many mutations are failed mutations that lead to death. It is a random process. If you think there is some guiding force that leads genes down the correct path, then you are basically making a religious argument.
Your understanding of selection and evolution is inadequate. As I said, the majority of mutations are neutral. Of the rest, the vast majority have some small effect on the reproductive fitness of the organism. More often negative than positive, yes. But it's about "on average x±ε offspring instead of x" and not "99.999% likely to cause death pre reproduction, 0.001% likely to make the gene code for hemoglobin when it previously did something completely different". A gene that fickle would never get selected for in the first place; any line carrying such a ticking timebomb would go extinct very quickly. The biochemistry of genetics is itself subject to selection; the most critical genes are selected for being extra robust against harmful mutations. You proposed a mechanism for the same gene to separately evolve more than once, but that argument is irrelevant because the real world doesn't work like that.
Neutral mutations can simply be ignored for answering the question posed above. Gort asked:
"isn't it possible, at least in theory, that mutation happened more than once?"
The answer is obviously "yes". Even taking your own words, the answer is clearly "yes", so I'm not clear why you are arguing. Even in the extreme case, where mutations lead to death 99.9999% of the time, the answer would remain "yes". Your own math shows the answer is "yes". I'm sure you know what convergent evolution is:
I think you wanted to make the point that this is rare. You should have said so. You could have answered gort by saying "Yes, but this is rare." Instead, you've taken an indefensible position.
What if there are a million different possible genes that produce hemoglobin? In that case, if two different species were found to produce hemoglobin but use the same gene, then what would that imply? I think this is what grandparent meant by "You could easily have quite different genes doing similar things".
> there seems to be an assumption that species development (dare I say evolution) was linear, with humans being the ultimate achievement in that process
Only amongst people who haven't done any reading or learning about evolution.
> the process was not at all linear
Indeed.
> If this gene was the result of a mutation, isn't it possible, at least in theory, that mutation happened more than once?
Yes. But if the gene is nearly identical in sharks as in humans, the chance of that happening from convergent evolution would be small enough that "they have a common ancestor" is a much more likely cause.
Mosquitoes supposedly helped transfer bits of DNA between species, which over millions of years influenced multiple species... or is that because they were the pathogen carriers?
Evolution is not linear. All species are at a different leaf of the tree of evolution from some sort of “proto-being” as the root node. Humans are no different than dogs or cacti. Humans may be the common ancestor of two different species that exist 1 million years in the future.
It is definitely possible that the two same mutations for hemoglobin happened at different points in time, but as I understand it, there is an assumption of parsimony. In other words, it is much more likely that the creation of genes for hemoglobin for sharks and humans happened at the same time than at different times.
I think we will always have a special place in the history books (perhaps not a pleasant one) because we were the first to have some ability to consciously edit not only our DNA but the DNA of other species.
Thinking about it, if life began in millions of different universes, there would seem to be one of two possible outcomes. Life eventually goes extinct (due to something happening on the planet that life can't evolve to survive, if nothing else the death of the planet's star), or life develops a species with enough intelligence to be able to modify its own evolution. An idea similar to the anthropic principle. Though different in that we are a sufficient end case for the anthropic principle, but we haven't reached a sufficient end case in regards to evolution (though we have begun to grasp it).
Yes, as others have said, "convergent evolution" is a concept in biology.
However, in the case of hemoglobin this is unlikely because this gene is particularly large, and sensitive to change. As a result it is highly "conserved". That is, mutations in this gene are selected against at a high rate because they tend to be deleterious.
To create a gene like hemoglobin, you'd need hundreds of mutations to develop a hemoglobin gene from raw material. It is possible, but unlikely, that humans and sharks would have independently accrued the same exact mutations in the same places since their divergence to create the same gene. It is far more likely that since this gene is essential, it has remained mostly unchanged since the divergence of sharks/humans from their common ancestor.
There is a lot of confusion in this discussion (and in the original paper) about what "genes" are, and what "novelty" means. Human genes and shark genes (the DNA sequences) are not nearly identical; in fact, the DNA sequences that encode human and D. rerio (zebrafish, closer than a shark) hemoglobin share about 67% identity over about 2/3 of the mRNA coding length, which is quite statistically significant (expected about once in 10^30 by chance). But hemoglobin "genes" (the DNA) encode hemoglobin protein, and the human and D. rerio protein sequences share more than 50% identity over the entire length of the protein; a level of similarity expected by chance once in 10^45 searches (these low probabilities would be much, much lower for an average-length protein of 400 amino acids; globins are much shorter -- 145 aa -- so they have excess similarity or statistical significance). In the protein world, sequences with less than 30% identity often have very high statistical significance. As several writers mention, when the amount of similarity is so high that the odds of it occurring by chance are very low, the simplest explanation is that the similar sequences diverged from a common ancestor.
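For readers unfamiliar with "percent identity", here is a minimal sketch of one common way to compute it from two already-aligned sequences. Conventions differ between tools (for instance in how gap columns are counted), so this is one reasonable definition rather than the one used in the paper. The two short sequences are made-up toy strings, not real hemoglobin data.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of compared alignment columns where both residues match.

    Expects two pre-aligned strings of equal length. Columns where either
    sequence has a gap character are skipped (one convention among several).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gap column: not counted either way
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned protein fragments, for illustration only.
toy_a = "MVLSPADKTN"
toy_b = "MVLSGEDKSN"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Identity on its own is not a significance measure. The probabilities quoted above come from alignment statistics that also account for sequence length and database size, which this sketch does not attempt.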
Let me reassure you that computational biologists who do these kinds of analysis are very aware that evolution is not linear (hence the many methods for building evolutionary trees), and that similar mutations could (and do) occur in different branches of a tree. The argument being made here is that something new has appeared -- there are many new trees that emerged about 650 million years ago. (In the protein world, 650 million years is a relatively short time, so evolutionary relationships at that distance should be detected.)
That said, it is very dangerous to talk about "novel genes" or "novel proteins" without being very clear about the nature of the novelty. Just as protein sequence comparison is more sensitive (has a longer evolutionary look-back time) than DNA sequence comparison, there are methods (not used in this paper) that can look back even farther, and we know that there are ancient evolutionary relationships that are difficult to recognize using sequence comparison, the approach used here. Moreover, many proteins are built from modules (called domains) that can be rearranged, so even though the module might be old, the particular arrangement might be newer. After a quick scan of the paper, it was unclear to me whether the authors are claiming that there are a large number of genuinely novel domains (unlikely), or simply arrangements (more likely), or perhaps simply duplications of existing genes (very likely, but not very novel).
We're not talking about cells, we're talking about gametes, and fertilized ones at that. Otherwise, you may have the best mutation ever, but it will never be inherited by your offspring.
So, basically, the total number of sharks having been born and having reproduced. It must be large, but probably not enough to beat the odds.
Convergent evolution is a thing, it occurs, but it's generally considered super-rare due to the parsimonious assumption.
It's unlikely that the sequences of the two independently evolved proteins would be so similar; you also wouldn't see a chain of progressively similar hemoglobins as you moved down the gene tree to older ancestral organisms.
Well, there are functional constraints on binding iron, right? There would probably be some residues- spread throughout the protein- that in its 3D folded form, would have a similar orientation to allow for heme binding and controlled release.
> Convergent evolution is a thing, it occurs, but it's generally considered super-rare.
Not anymore - though it depends on what precisely you mean by convergence, of course. I highly recommend the new book Improbable Destinies by Jonathan Losos[0]
By convergence, I mean in the sense that two independent organisms that did not share a historical character trait (in this case, a specific homologous gene) independently evolved an identical trait (a gene with provably no homology, but identical function) through the process of natural selection, with no information sharing via horizontal gene duplication.
This was argued out a while ago with stick insects that lost and then regained the ability to fly (several independent subspecies did this). A lot of people thought it meant the stick insects "completely lost all the genes associated with wing formation" and then "gained an entirely new set of genes that caused wing formation", but folks in the evo devo world believe- and I agree- that the insect never lost anything except a few regulatory domains that managed the expression of pre-existing components that are depended upon by multiple other subsystems (IE, wings are composed of basis proteins that are used for many body parts).
My interest is in molecular evolution, not full character traits, because those phenotypes are incredibly complex and IMHO beyond our current level of sophistication in analysis.
> By convergence, I mean in the sense that two independent organisms that did not share a historical character trait (in this case, a specific homologous gene) independently evolved an identical trait (a gene with provably no homology, but identical function) through the process of natural selection, with no information sharing via horizontal gene duplication.
I'm no expert and could be wrong but I think this is what Losos is referring to in his book. For example, "New and Old World porcupines do not share a common evolutionary heritage… The two lineages have independently evolved their quills from different, unquilled rodent species." Dozens of stuff like this. He goes into more detail with tropical island anoles, his field of expertise, and orients the whole discussion around Stephen Jay Gould and Simon Conway Morris's competing views on the overall paradigm of convergence.
See this NPR interview for a quick summary[0]. A key quote (which I'm not discerning enough to tell if it challenges your claims or if your claims are actually already in response to these claims): "the flood of molecular DNA data that has come forth in the last two decades or so has in many cases revised our understanding about how species are related to each other. And it has revealed that many species that we thought were similar because they're closely related, that they're not closely related and that their similarity is the result of convergent evolution."
Sure, I'm not going to complain about things like quills being an example of convergent evolution. Those two porcupines aren't the same species; they were ad-hoc clustered together based on phenotypic similarity. But if you investigated, you'd see huge differences in the sets of genes required to grow quills.
There's nothing surprising or interesting about that- spike shapes are common in nature and probably fall out of basic genetic evo-devo rules naturally.
Everyone is right in a way. Evolution is messy and undirected.
Some genes appear very early in the tree of life. Others have re-evolved two, three, or hundreds of times. This happens either because there may only be one reasonable way to encode the same function or because the same existing structure can be re-used for the same “new” function (evolution often re-uses existing features because that’s far easier than evolving ex nihilo).
I think it is generally true to say that individual genes don't work in a vacuum, and neither do the biological mechanisms they encode for, so the transferred DNA would have to contain a suite of co-dependent genes. Admittedly, there is a counter-example in the transfer of genes conferring resistance to a particular pathogen.
I think it is also true to say that convergent evolution often produces results that differ genetically, even though the resulting phenotypes serve the same purpose.
> That means hemoglobin genes were already present in their common ancestor
Or that some DNA capture happened at some point along the road. Now we have transgenic tomatoes with flatfish genes in their DNA. There is no guarantee that the common ancestor of both species had developed those genes yet at the time of the code fork.
I'm not an expert in the field but my understanding is that DNA has a huge amount of "junk" in it. This junk doesn't do anything but can act as a fingerprint for where the DNA evolved from, as it's impossible for the junk to be evolved twice in exactly the same way.
The concept of junk DNA was mostly taken apart a few years ago. Those sections often serve regulatory functions and other purposes related to controlling how genes are used and at what rates.
You're referring to regulatory regions which really represent a tiny component of the genome (in terms of base fraction)... but nobody has really constructed a strong argument (that can be tested) that long stretches of Alu (which comprise most of the junk DNA) actually have a regulatory or other function.
It's speculated those regions are gene foundries, but it's mostly speculation with limited evidence.
I think there's a 'balancing game' between "I need my DNA to be as compact as possible" (no waste of energy, faster to duplicate, less chance for errors) and "I need space in my DNA to allow new things to evolve".
In zebrafish and mouse, the more "developmental" a gene is, the more SPACE it seems to have around it. More space means more regulation and more potential for new regulation.
I thought they were part of the structural mechanisms of the regulatory regions, i.e. they need long stretches of DNA to facilitate it looping regulators back onto the allele.
No, it really hasn't been. Projects like ENCODE which claimed that 80% of the human genome was functional were done in ignorance of genomes of other organisms. There are animals like the pufferfish that have almost no junk DNA. And there are amoebas with more junk DNA than do humans. If junk DNA was in general functional, this would not be the case.
I mean, 'junk' DNA is just the noncoding regions. It contains everything that would look like a conditional jump to us CS folks, or a gate to EE folks. The coding regions are closer to .rodata; literally tables of the protein chains.
The relative lack of non coding DNA in other, simpler cells doesn't tell us that we don't need ours. It's equally possible that our DNA is just making significantly more dynamic decisions.
But the fact that some have more non-coding DNA doesn't really fit that idea. Unless you think amoebas are secret masters of the world. The simplest explanation is still that it is neutral baggage of neither benefit nor hindrance and organisms can gain or lose it by chance.
Or their strategy is to generate proteins with less of a concern for how their environment is dynamically changing. Which more aligns with single celled and very simple animals.
And it's not necessarily that they have more coding regions. It's that they have a greater percentage vs noncoding. It's a proportional thing.