There are about 25k unique protein coding genes in the human genome, not 500k.
Additionally, there appears to be a size limit to proteins. The largest human protein, titin, is about 40k amino acids. No larger protein has ever been discovered.
Obviously 20^40000 is an incomprehensible coding space, but with a few broad strokes we can narrow that number down immensely.
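To put a rough number on "incomprehensible", here is a minimal sketch of the arithmetic. It assumes 20 standard amino acids and the round 40k length used above; the function name and the 100-residue comparison are just illustrative choices, not anything from a particular source.

```python
import math

AMINO_ACIDS = 20        # standard amino acid alphabet (assumption)
TITIN_LENGTH = 40_000   # round figure used in this comment

def log10_sequence_space(length, alphabet=AMINO_ACIDS):
    """Return log10(alphabet ** length), i.e. the order of magnitude
    of the number of possible sequences of the given length."""
    return length * math.log10(alphabet)

# Order of magnitude for a titin-sized chain
titin_exponent = log10_sequence_space(TITIN_LENGTH)

# For comparison, a small 100-residue protein (illustrative length)
small_exponent = log10_sequence_space(100)

print(f"20^{TITIN_LENGTH} is roughly 10^{titin_exponent:.0f}")
print(f"20^100 is roughly 10^{small_exponent:.0f}")
```

Working in log space avoids building the full integer. Even the 100-residue case lands far beyond anything that could be sampled exhaustively, which is why the constraints below matter more than the raw count.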
For starters, Eugene Koonin argues in The Logic of Chance, and elsewhere, that the evolution of proteins is limited by the number of stable folds available in cell-like aqueous conditions. That number is impossible to quantify, but we do know that even most cellular proteins will naturally misfold coming off the ribosome unless aided by chaperones, so we can assume that the coding space is drastically reduced.
Furthermore, because of inertia, there are many combinations evolution simply will not sample. Most genes evolve by genome duplication events. The upshot is that you're never going to get too far away from what you started with, and indeed we see this in the fact that there are only a handful of true, conserved orthogonal genes (these number in the hundreds).
Obviously, n = 1, but I guess the point I'm trying to make is that once you have stable structure/function chemistry, there are probably many, many routes to "humans" or something else able to be self-aware. We just see one of them because we were guided here by earth chemistry and inertia.
What is far more confusing is how the ribosome evolved in the first place, because that's the true de novo lynchpin of cellular life.
Most protein coding genes produce multiple proteins, through having multiple isoforms, and proteins are modified post-translationally. 500,000 is on the higher end of estimates I've heard, but hundreds of thousands is the right order of magnitude.
Yeah, splicing gets weird but the biggest estimate I've ever heard is 100k, and again, that's an estimate, not something we have determined empirically.
However, I don't believe it impacts the coding space problem to the same degree. For starters, isoforms tend to have the same fold and similar, but modulated, function, and plenty of life exists without them. They aren't randomly sampled the way that is suggested in the OP. Again, truly orthogonal genes number in the hundreds across all the known genomes of all known life.
If you start to add PTMs to the mix, then the truth is that we have no idea. A single protein will produce dozens of visible PTMs, and probably thousands of invisible ones, since even our most sensitive methods detect only population averages. In that vein, any individual protein also has a number of defects from translation, allosteric modification, etc. Proteins are obviously dynamic molecules, but those dynamics are of limited genetic tractability and are significantly different from environment to environment.
I think I'm still going to stick to my initial argument, which is that amino acid combinations are substantially more constrained than is being suggested.