Hacker News

Most protein-coding genes produce multiple proteins through multiple isoforms, and proteins are also modified post-translationally. 500,000 is on the higher end of estimates I've heard, but hundreds of thousands is the right order of magnitude.


Yeah, splicing gets weird, but the biggest estimate I've ever heard is 100k, and again, that's an estimate, not something we have determined empirically.

However, I don't believe it impacts the coding space problem to the same degree. For starters, isoforms tend to have the same fold and similar but modulated function, and plenty of life exists without them. They aren't randomly sampled the way the OP suggests. Again, truly orthogonal genes number in the hundreds across all the known genomes of all known life.

If you start to add PTMs to the mix, then the truth is that we have no idea. A single protein will produce dozens of visible PTMs and probably thousands of invisible ones, since even our most sensitive methods detect only population averages. In that vein, any individual protein also has a number of defects from translation, allosteric modification, etc. Proteins are obviously dynamic molecules, but those dynamics are of limited genetic tractability and differ significantly from environment to environment.

I think I'm still going to stick to my initial argument, which is that amino acid combinations are substantially more constrained than is being suggested.



