As I’ve said, when it comes down to it, my work is all about change. On a grand scale, it’s about how organisms change, how they evolve through time and how populations interact and change. On a much smaller scale, we track these processes through genes and how they change. So to understand how we can learn about evolution and population dynamics from genes, you first have to understand some basics about different types of genes, how they change, and what these changes can tell us.
In a previous post I explained how genes are sequences of nitrogenous bases that all code for a single protein or product. In complex organisms like, say, humans or frogs or octopuses (before anybody tries correcting me, it can be either octopuses or octopi [or octopodes for that matter], but octopuses is the more etymologically sound version), there are a ridiculous number of genes that code for an equally ridiculous number of products. In humans, there are genes that code for proteins leading to hair color, eye color, blood type, most everything you can think of. Some genes have products which regulate the expression of other genes, creating proteins called transcription factors which control the transcribing of DNA into RNA, while others don’t code for a protein at all, but instead for pieces of RNA that make up ribosomes. There’s a reason genes are considered the basic units of life, and this is why. You name a biological process and most every component of that process will have been put together from gene products.
So the question becomes: if you want to learn about an organism’s evolutionary history, whether microevolutionary (like how populations interact and disperse into new regions) or macroevolutionary (like how are frogs related to salamanders, or how are all amphibians related to birds), what genes should you look at?
The answer is, you can probably guess, pretty damn complicated.
Different genes have very, very different levels of variability, rates of change, and even types of change. Levels of variability and rates of change are easy to understand but difficult to discuss. By and large, this is a much more technical aspect of research, and which genes are informative for different types of questions is still widely debated. Look around a bit and you’ll find a lot of papers discussing different strategies for selecting genes to answer specific questions, but, as I said, that all gets very technical and math-heavy very quickly, so I don’t feel like getting into it here. The types of change genes see, on the other hand, are a much easier topic to discuss.
You can see any number of changes in a gene: insertions of bases, deletions of bases, and substitutions of one base for another, to name just a few. The types of changes you can see in a gene play a big part in whether or not it can tell you something about an organism’s evolutionary history. Occurrences like recombination, where copies of a gene on different chromosomes swap sections of the gene, or transposition, where genes can move around on chromosomes through some super sneaky cellular processes, can make keeping track of a single gene’s changes across time very difficult. As you can probably guess, choosing genes that have fairly predictable, simple types of change would allow for the most informative picture.
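If it helps to see those three basic changes spelled out, here is a minimal Python sketch that treats a gene as a plain string of bases. The function names and the example sequence are made up for illustration, and real mutations obviously involve a lot more biology than string slicing.

```python
def substitute(seq, pos, base):
    """Swap the base at index `pos` for a different base."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq, pos, bases):
    """Add one or more extra bases before index `pos`."""
    return seq[:pos] + bases + seq[pos:]


def delete(seq, pos, length=1):
    """Remove `length` bases starting at index `pos`."""
    return seq[:pos] + seq[pos + length:]


# A made-up eight-base sequence, not a real gene.
original = "ATGCCGTA"

print(substitute(original, 2, "A"))  # ATACCGTA
print(insert(original, 4, "TT"))     # ATGCTTCGTA
print(delete(original, 4, 2))        # ATGCTA
```

Notice that a substitution leaves the sequence the same length, while insertions and deletions shift everything after them, which is part of why they can be harder to line up when comparing sequences.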
So with all that established and nothing at all resolved, the question stands: what genes should you look at to learn about evolution?
Well, for my dual purposes of learning about population structure and genetic relationships between Scottish populations and European populations, we needed two types of genes. For population structure, where we need to see populations diverging across relatively short periods of time, we want genes that change a lot, but that don’t change in response to the environment, that is, that aren’t under selection. So we turn to a type of gene called a microsatellite (or, depending on what field you work in, they also get called simple sequence repeats [SSRs] and short tandem repeats [STRs]).
Microsatellites are nothing like satellites you find in space. Let’s get that out of the way now. The name is one of those weird idiosyncratic names that comes from science’s weird and idiosyncratic history. Which is to say, I like the name and I’m going to explain it now whether you care or not. Skip to the next paragraph if you don’t. Now, when you put a whole genome’s worth of DNA in a centrifuge (so all the genes an organism has, not selecting just one or two of interest), you get two general chunks of DNA, separated by their densities: one chunk that contains the bulk of the organism’s coding DNA and a second that contains lots of repetitive sections. Now, these repetitive sections have different densities compared to the rest of a genome, largely because the patterns in which each base appears differ so greatly from your normal coding DNA. When centrifuged, these sections form bands apart from the rest of an organism’s DNA, forming “satellite” bands that sort of surround the rest of the genome. Microsatellites are then “satellite” genes that fall within a specific size range.
Microsatellite sequences are usually 2-5 base pairs long and are repeated 5-50 times, although the number of repeats can vary quite a bit. These repetitions come from what we call, adorably I think, replication slippage. As DNA gets replicated, the enzyme doing the job messes up and creates an extra unit, so that a sequence reading ATTA ATTA becomes ATTA ATTA ATTA. Over time you accumulate more and more of these units, creating a variety of versions of each gene, with each version (called an allele) having a different number of repeats. The microsatellites we chose are nice because while these slippages are pretty common, they don’t do much of anything and so the frequency of any specific number of repetitions isn’t affected by the environment or selection. Our microsatellites are non-coding (not all microsatellites are) so there is no pressure for specific numbers of repeats to appear. If two populations have pretty similar frequencies for the number of repetitions, they probably interact a lot and have genes moving between the populations regularly (which in truth means they are functionally one population, but whatever).
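To make the slippage and allele frequency ideas a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the function names are invented, slippage is modelled as nothing more than tacking one extra unit onto a string, and the repeat counts for the two populations are made up rather than taken from any real data.

```python
from collections import Counter


def count_repeats(seq, unit):
    """Count back-to-back copies of `unit` starting at the beginning of `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    n = 0
    while seq.startswith(unit, n * len(unit)):
        n += 1
    return n


def slip(seq, unit):
    """Toy model of one replication slippage event: one extra unit is added."""
    return seq + unit


def allele_frequencies(repeat_counts):
    """Map each allele (a repeat count) to its share of the sampled copies."""
    total = len(repeat_counts)
    return {allele: n / total for allele, n in Counter(repeat_counts).items()}


before = "ATTA" * 2
after = slip(before, "ATTA")
print(count_repeats(before, "ATTA"), count_repeats(after, "ATTA"))  # 2 3

# Hypothetical repeat counts sampled from two made-up populations.
pond_a = [8, 8, 9, 10, 10, 10]
pond_b = [8, 9, 9, 10, 10, 10]
print(allele_frequencies(pond_a))
print(allele_frequencies(pond_b))
```

Here each allele is just a repeat count, so comparing populations comes down to comparing how common each count is. Actual population genetics software uses proper statistics for that comparison rather than eyeballing two dictionaries.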
This is all well and good for population structure, but microsatellites are far too variable to tell us much of anything about relationships along a larger time scale, like how Scottish populations have diverged from European populations since their initial colonization after the last Ice Age. So for that we look to mitochondrial DNA.
Mitochondrial DNA is pretty unique. It comes from a part of the cell called the mitochondrion (plural: mitochondria), which produces the cell’s main source of energy, ATP (adenosine triphosphate, for those of you who hate acronyms, which is to say, for all of you ever). Mitochondria have their own DNA separate from the rest of the cell’s and it’s only ever passed down from the mother. So for us evolutionary biologists this translates to highly conserved genes (in that most organisms have mitochondria and similar genomes in them) but with enough variability that certain mitochondrial genes can still be informative. Only being passed on from the mother means we don’t have those annoying processes like recombination to mix up the otherwise straightforward task of tracking genetic changes.
Unlike microsatellites, the changes we see in mitochondrial genes are point mutations, so insertions, deletions, and substitutions. This makes for a slightly different approach to establishing a genotype for an individual, but it also makes for more stable genes, as point mutations are estimated to be three orders of magnitude rarer than the replication slippage that drives microsatellite changes.
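Since genotypes here come from the sequence itself rather than from a repeat count, one simple way to compare two individuals is to count the sites where their sequences differ. The Python sketch below does that under some loud assumptions: the two sequences are invented, they are assumed to be already aligned to the same length, and the function name is my own. It is not the method used in the study.

```python
def proportion_different(seq_a, seq_b):
    """Share of aligned sites at which two sequences carry different characters.

    Assumes the sequences are already aligned, so insertions and deletions
    would show up as gap characters such as "-" and count as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Two made-up ten-base sequences that differ at two sites.
frog_1 = "ATGCCGTAAC"
frog_2 = "ATGACGTAGC"
print(proportion_different(frog_1, frog_2))  # 0.2
```

Because point mutations are so much rarer than slippage, two individuals from recently separated populations will often have identical sequences, which is exactly why these genes suit the longer time scale.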
In my study we looked at two separate mitochondrial genes, cytochrome b (which has been used in amphibian studies in the past) and cytochrome oxidase subunit I, or COI (which has been used much less for amphibians, but a whole lot elsewhere in the world of genetics). Our goal in using both was to see if COI might be more variable than cytochrome b, and thus a bit more informative for determining how populations in Scotland and Europe have diverged from each other. Whether or not we managed to prove anything, well, we’ll get to that.
So now we have two types of genes in hand, microsatellites and mitochondrial, to answer our two questions.
NEXT TIME ON NITROGENOUS BASICS: A Crash Course in Population Genetics