Saturday, March 25, 2006

Parameter quoting == information reduction

The discussion yesterday about the quoted
best-fit parameters brings up a larger issue. We
always try to reduce our results to a few numbers,
but that discussion illustrates that doing so can
throw away a significant amount of information. Even the
very act of creating a likelihood function reduces information,
and in a way that does not necessarily match intuition.

The function exp(-chi^2/2) only follows
intuition if the data are Gaussian in the first place. What
does it mean if the resulting likelihood function is highly
non-Gaussian? Certainly the minimum-chi^2 model does follow
intuition; it is the model that is closest to the data given that
metric. But understanding the "error"
on that quantity using these techniques is more a matter of
definition than anything else. If you define exp(-chi^2/2) as
proportional to the probability of a given parameter, then you can draw random
values from that distribution and define your confidence regions
as the range of parameters about the best fit that contains
some percentage of the random points. Fine, but the fact is
exp(-chi^2/2) isn't even what we would normally define as a
probability except under certain conditions.
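That recipe can be sketched in a few lines. Everything here is a made-up toy, not anything from the post: a constant model y = theta fit to Gaussian data, with exp(-chi^2/2) treated by fiat as a density over the parameter.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data (hypothetical): 20 points scattered around a constant model.
sigma = 1.0
data = rng.normal(3.0, sigma, size=20)

# chi^2 as a function of the single parameter theta on a grid.
theta = np.linspace(0.0, 6.0, 2001)
chi2 = ((data[:, None] - theta[None, :]) ** 2 / sigma**2).sum(axis=0)

# The "definition" step in the text: treat exp(-chi^2/2) as a
# probability over theta and normalize it on the grid...
like = np.exp(-(chi2 - chi2.min()) / 2.0)
prob = like / like.sum()

# ...draw random parameter values from that distribution...
draws = rng.choice(theta, size=100_000, p=prob)

# ...and quote the best fit plus the central 68% of the draws.
best = theta[np.argmin(chi2)]
lo, hi = np.percentile(draws, [16, 84])
print(f"best fit {best:.3f}, 68% region [{lo:.3f}, {hi:.3f}]")
```

In this Gaussian toy the region reproduces the analytic sigma/sqrt(n); with non-Gaussian data the same mechanical procedure still runs, which is exactly the worry above.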

So why bother with all the error estimation using this function
if you end up with a skewed distribution like WMAP had with
the optical depth? I think it's fine as long as everyone looks at the
likelihoods and understands them. But you are not really looking
at a likelihood; the breadth of that measure does indicate
something about how well constrained your model is, yet
it is not clear how that translates into an intuitive feel for
the error.

I think the only way you could really get a meaningful "confidence"
is to have N independent data sets, repeat the best fit on each, and ask
about percentages. This tells you about the error on the independent
sets. People rarely do this because the error on these sets is
roughly sqrt(N) larger than on the overall dataset. People prefer to use
bootstrap or jackknife techniques because those then artificially give
you sqrt(N) better error estimates (I'm guilty too). Of course, if
everything is Gaussian, then in fact the error on the overall set
is sqrt(N) smaller.
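A quick numerical check of that sqrt(N) relation in the Gaussian case (the set sizes, seed, and mean-as-best-fit setup are all arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sets, per_set, trials = 10, 100, 2000

# Many repeated experiments, each consisting of N = 10 independent
# "data sets" of 100 Gaussian points; the best-fit parameter of a
# set is just its mean.
data = rng.normal(0.0, 1.0, size=(trials, n_sets, per_set))
per_set_fits = data.mean(axis=2)                  # fits to the N sets
full_fits = data.reshape(trials, -1).mean(axis=1) # fit to the overall set

err_per_set = per_set_fits.std()  # scatter of the independent-set fits
err_full = full_fits.std()        # scatter of the overall-set fit

# In the Gaussian case the ratio comes out ~ sqrt(n_sets) ~ 3.16.
print(err_per_set / err_full)
```

The ratio only has this clean sqrt(N) meaning because the toy is Gaussian; for skewed likelihoods the subset scatter is the honest, if noisier, measurement.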



Hogg said...

Nice post. My fear is that when errors are highly non-Gaussian, it might be that chi^2 minimization is far from correct, even in the mean. I bet there are not-very-pathological cases in which chi^2 minimization gives you an answer which is much less likely than the maximum-likelihood answer. I don't have an example, though.
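[A toy of my own along these lines, not Hogg's: assume one-sided exponential noise on a location parameter. Then chi^2 minimization (the mean) is biased high by the full noise scale even on average, while the exponential-model ML estimate (the minimum) lands essentially on the truth.]

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: y_i = theta + e_i with one-sided noise
# e_i ~ Exponential(scale=1), so all points lie above theta.
theta_true = 5.0
y = theta_true + rng.exponential(1.0, size=1000)

# chi^2 minimization (pretending the noise is Gaussian) gives the
# sample mean, which sits near theta_true + 1 no matter how much data.
chi2_fit = y.mean()

# The actual ML estimate under the exponential model is min(y),
# which converges to theta_true.
ml_fit = y.min()

print(f"chi^2 fit {chi2_fit:.3f}, ML fit {ml_fit:.3f}, truth {theta_true}")
```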

Erin Sheldon said...

Do you mean if you somehow took the non-Gaussianity into account in your ML analysis?