LAMARC offers three ways to handle variation in mutation rate among markers. This file describes all three and gives some guidelines on their correct use. Note that we will use the term 'segment' for a contiguous stretch of sites with markers of the same data type, and 'region' to indicate one or more linked segments of the same or different data types.

**Variation within a contiguous segment.** If the mutation
rate (or fixation rate) may vary from site to site within a single
contiguous genetic segment, such as a DNA sequence or group of
linked microsatellites, the best approach is to use the "Multiple
rate categories" option of the appropriate data model. This
option is described in the data models
section of the documentation.

**Known variation among segments, regions, and/or data types.**
If you know in advance
that one region has, say, a tenfold higher mutation rate than
another, the best approach is to set the "Relative mutation
rate" option of the appropriate data model. Choose one region
as the standard, and set its relative mutation rate to 1.0;
set the others proportionally. This is a good approach if you
have, for example, a DNA sequence and some microsatellites, and are
fairly sure that microsatellite shift mutations are about 1000
times more common than single base pair substitutions. The
unit of comparison is the single marker: one microsatellite,
one base pair of DNA, one SNP.
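
As an illustration of choosing a standard and scaling the others to it, here is a minimal Python sketch. The function name `relative_rates`, the region names, and the per-marker rate values are all hypothetical and chosen only for the example; none of this is part of LAMARC itself.

```python
def relative_rates(per_marker_rates, standard):
    """Scale per-marker mutation rates so that the chosen standard equals 1.0."""
    base = per_marker_rates[standard]
    if base <= 0:
        raise ValueError("the standard region must have a positive rate")
    return {name: rate / base for name, rate in per_marker_rates.items()}


# Hypothetical per-marker rates (illustrative values, not measurements):
# one base pair of DNA versus one microsatellite.
rates = {"dna_region": 1e-8, "microsat_region": 1e-5}

rel = relative_rates(rates, standard="dna_region")
for name, value in rel.items():
    print(name, value)
```

The resulting values are what you would enter as the relative mutation rate of each region, with the standard region at 1.0.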

This approach can also be used for areas where the mutation rate variation is known for a contiguous stretch of markers, for example a DNA sequence containing both introns and exons. If each intron and exon is assigned to a unique segment, the known relative mutation rates can be set explicitly.

Even if you are not sure of the exact ratio of mutation rates among your regions, a reasonable guess will still be better than accepting the default of identical rates everywhere. If you are not sure whether microsatellites mutate 1000x or only 100x faster than DNA, pick an intermediate value. Assuming that they mutate at the same rate will give poor results.

**Unknown variation among regions.** If you suspect that
your regions vary in mutation rate, but you don't have any
information on their specific rates, you can assume that
these rates are drawn from a gamma distribution. The
gamma distribution is a somewhat arbitrarily chosen but flexible statistical
distribution which varies from looking exponential when its
scaled shape parameter α is low, to looking like an increasingly narrow
bell curve as α increases. Low values of α correspond
to cases in which most regions are nearly invariant, and a few
evolve rapidly. High values of α correspond
to cases in which the single-region mutation rates are approximately
normally distributed about the mean single-region mutation rate.
(The gamma distribution actually has two parameters, a "shape
parameter" α and a "scale parameter" β, but LAMARC
sets β = 1/α to avoid overparameterization, and to
allow it to work with a distribution whose mean, the product αβ,
is 1.)
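
To make the effect of α concrete, the following Python sketch evaluates the density of a gamma distribution with shape α and scale β = 1/α, using only the standard library. The function names are our own and are not part of LAMARC. With this constraint the mean is αβ = 1 and the variance is αβ² = 1/α, so a low α gives a wide spread of relative rates and a high α gives a narrow one.

```python
import math


def gamma_rate_pdf(r, alpha):
    """Density at r > 0 of a gamma distribution with shape alpha and scale 1/alpha."""
    beta = 1.0 / alpha
    return r ** (alpha - 1) * math.exp(-r / beta) / (math.gamma(alpha) * beta ** alpha)


def gamma_rate_variance(alpha):
    """Variance alpha * beta**2, which reduces to 1/alpha when beta = 1/alpha."""
    return 1.0 / alpha


# Compare a low, a moderate, and a high value of alpha at a few relative rates.
for alpha in (0.5, 1.0, 10.0):
    densities = [round(gamma_rate_pdf(r, alpha), 3) for r in (0.1, 1.0, 3.0)]
    print("alpha =", alpha, "variance =", gamma_rate_variance(alpha), "pdf =", densities)
```

For α = 1 the density is exactly the exponential distribution with mean 1; as α grows, the density concentrates around a relative rate of 1.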

LAMARC can estimate α if you have no prior conception of what a good value here would be (though a reasonable starting guess will speed up maximization). In practice, it needs more than two or three regions to make a reasonable estimate of α. If you have only 2-3 regions, it is best to guess at their ratio, or fix α to a value you find reasonable; estimation of α is likely to fail because not enough information is available. (With only one region, α cannot be estimated and will not be used.) Information on setting this option is available in the gamma parameter section of the LAMARC menu documentation.

If your data consist of several microsatellites and several DNA or SNP regions, the real distribution of mutation rates probably resembles a two-humped camel and not a gamma distribution at all. You can try fitting a gamma anyway, but be aware that you are fitting an inappropriate model. A better alternative is to guess the relative mutation rates: the large difference between microsatellites and DNA data probably outweighs any differences within each group. You can even do both, giving a relative mutation rate for each region and then adding a gamma on top. We believe that this has the effect of drawing the different regions from versions of the same gamma, each with its mean scaled by the given relative mutation rate. However, this combination has not been extensively tested: use it at your own risk. It assumes that α is the same for DNA and microsatellites, which is probably not the case, but sometimes a shaky assumption is better than nothing.
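
The combined approach can be pictured with a small simulation. This is only a sketch of our reading of the paragraph above, assuming each region's rate is its fixed relative rate multiplied by a draw from a mean-1 gamma; it does not reproduce LAMARC's internal calculation. The function name `draw_region_rates` and the example rates are hypothetical.

```python
import random


def draw_region_rates(relative_rates, alpha, rng):
    """Draw one rate per region: fixed relative rate times a mean-1 gamma multiplier."""
    # random.gammavariate(alpha, beta) has mean alpha * beta,
    # so beta = 1/alpha gives a multiplier with mean 1.
    return [c * rng.gammavariate(alpha, 1.0 / alpha) for c in relative_rates]


rng = random.Random(1)  # fixed seed so the run is repeatable
relative_rates = [1.0, 1.0, 1000.0]  # hypothetical: two DNA regions, one microsatellite region

draws = [draw_region_rates(relative_rates, alpha=2.0, rng=rng) for _ in range(20000)]
means = [sum(column) / len(column) for column in zip(*draws)]
print(means)
```

Under this assumption the average rate of each region stays close to its supplied relative rate, while the same α controls the spread around that average for every region.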

Please note that LAMARC can only apply a gamma distribution
to single-region relative mutation rates if all populations
are assumed to remain constant in their respective sizes.
This is due to a mathematical complication in the way LAMARC
implements the gamma distribution (in this case, LAMARC does
*not* approximate the gamma distribution by a histogram of
relative rates). This means that LAMARC cannot simultaneously
model the gamma "force" and the force of exponential population
growth, even for fixed values of α or *g*. If you
believe one or more of your populations is rapidly growing or
shrinking, and you think the single-region relative mutation
rates are approximately gamma-distributed for your data,
then your best bet is to estimate the relative rates by some
other method, supply these to LAMARC as constants, and then
proceed to estimate growth rates.

Also, because of the way LAMARC implements this feature, it can only be used for maximum-likelihood analyses. If you want to perform a Bayesian analysis, and you think the single-region relative mutation rates are approximately gamma-distributed for your data, then your best bet is to estimate the relative rates by some other method, supply these to LAMARC as constants, and then proceed with your Bayesian analysis.

**Bottom line.** If your data have mutation rate
variation within a segment, use the "Multiple rate categories"
option of the data model. If they have variation among
segments, regions, or data types and you know the relative rate
of each, use the "Relative mutation rate" option of the data model.
If they have variation among regions and you don't know the rates
of individual regions, you can assume that they are drawn
from a gamma distribution, but this is likely to work well only if you
have more than 3 regions, and is not ideal if the regions fall
into large classes with distinctly different rates. It
is best suited for large collections of one data type,
such as multiple DNA regions.