Friday, March 31, 2006

Latest systematics tests and the inversion paper

Systematics

I ran a bunch of tests, none of which
were conclusive, but I did see something
connected to the resolution parameter. This
isn't surprising. I saw a bigger effect,
however, when I simply placed a lower limit on the
radius in arcsec of the sources that get used. This again
suggests deblending problems.

So I looked at the flags, and in fact deblended
objects are included in the Princeton catalog,
even though I thought they had been cut from what
Rachel gave me.

So now I'm remaking the postgres table to include
the flags so I can test this.
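
Something like the following is the kind of cut I have in
mind, just a bitwise test on the flags. This is only a numpy
sketch; the flag names and bit values here are placeholders,
not the real photo flag definitions.

    import numpy as np

    # Placeholder flag bits; the real values must come from the
    # photo flag definitions.
    BLENDED   = 1 << 3   # object had multiple peaks and was deblended
    NODEBLEND = 1 << 6   # deblending was not attempted

    def not_deblended(flags):
        """Keep objects never split by the deblender: either not
        BLENDED at all, or BLENDED but with NODEBLEND set."""
        flags = np.asarray(flags)
        return ((flags & BLENDED) == 0) | ((flags & NODEBLEND) != 0)

    # usage: sources = sources[not_deblended(source_flags)]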

Dave's Inversion Paper

I read Dave's new draft. The content is excellent;
it captures much of what we have learned over the
last few years about weak lensing, and about stacking
in particular. I hope people read it, because it
is sort of the Bible for stacking from the theory
point of view.


Erin

Wednesday, March 29, 2006

Wasted time

I wasted a few hours trying to figure out why
I couldn't link C programs to IDL on the evolve
machine. At first I thought it was because it is
running IDL 6.2, but then I got it to work on
another machine with 6.2. Then I thought it must be because
it is a 64-bit machine, but then I got it to work
on a different 64-bit machine. Then I figured it
out: for some reason IDL is running in 32-bit mode
on that machine, so it can't link against 64-bit libraries.
This is not the default behavior, so it must be
some kind of configuration error. I've written
to the sysadmin to see what's up.

I did get something done today: I finished
creating the new source catalog tables in the
postgres database and I'm ready to do some
systematics tests tomorrow.

Erin

Tuesday, March 28, 2006

Searching the sphere + looking for systematics

Searching

I hacked together an ra/dec searching code that uses the
HTM (hierarchical triangular mesh) to match things on
the sphere. It is extremely fast. There is significant overhead,
though, in finding the HTM index for each input ra/dec position,
which is a killer for small lists. For my own work I always carry
this index around with all my objects, so I only generate it once.
But for a stand-alone piece of code the overhead is rather
annoying, if not prohibitive. I don't know a workaround
for this yet.
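
To make the "build the index once and carry it around" point
concrete, here is a rough Python sketch of the same pattern
using a kd-tree on unit vectors as a stand-in for the HTM (an
illustration, not my actual code): building the index is the
expensive step, so you do it once for the catalog and reuse it
for every match.

    import numpy as np
    from scipy.spatial import cKDTree

    def radec_to_unit(ra, dec):
        """Convert ra/dec in degrees to unit vectors on the sphere."""
        ra, dec = np.radians(ra), np.radians(dec)
        return np.column_stack([np.cos(dec) * np.cos(ra),
                                np.cos(dec) * np.sin(ra),
                                np.sin(dec)])

    # Build the index once for the catalog and keep it around.
    cat_ra = np.array([10.0, 10.1, 200.0])
    cat_dec = np.array([0.0, 0.05, -5.0])
    tree = cKDTree(radec_to_unit(cat_ra, cat_dec))

    def match(tree, ra, dec, radius_arcsec):
        """For each input position, return the catalog indices within
        radius_arcsec, converting the angle to a chord length."""
        chord = 2.0 * np.sin(np.radians(radius_arcsec / 3600.0) / 2.0)
        return tree.query_ball_point(radec_to_unit(ra, dec), chord)

    matches = match(tree, np.array([10.0]), np.array([0.01]), 60.0)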

Systematics

I've been trying to track down systematics and haven't had any
luck with the easy stuff, as I have described below. So what
I have done is remake the catalogs with more info
and stuff them into my postgres database, so I can very quickly
select source samples with different cuts and run them through the
lensing code. I think this is a better approach than adding
lots more info to the input catalog read by the lensing code
and then doing the cuts inside it, because the memory becomes
prohibitive on the cheopsen at Chicago (limited to 2GB). I'm
finishing this up now, so tomorrow I'll start running the tests.

The first test I want to try is a cut on how well the galaxies are
resolved. I have done this test before and it passed, but maybe
the small-number statistics of the LRGs and the high-Ngals bin
make them more sensitive to certain regions of sky with, for
example, bad seeing. Note that the masking will be a bit messed
up if a resolution cut removes all objects from a given region
of sky.
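
As a sketch of the kind of selection this makes easy (the
database, table, and column names here are just placeholders
for whatever schema I end up loading):

    import psycopg2

    # Placeholder table/column names; the real schema is whatever I
    # load into postgres from the remade source catalogs.
    query = """
        SELECT ra, dec, e1, e2, photoz
        FROM scat
        WHERE resolution_r > %(rcut)s
          AND petrorad_r > %(radcut)s
    """

    conn = psycopg2.connect("dbname=lensing")
    cur = conn.cursor()
    cur.execute(query, {"rcut": 0.33, "radcut": 1.0})
    sources = cur.fetchall()   # feed these to the lensing code
    cur.close()
    conn.close()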

Erin

Saturday, March 25, 2006

Parameter quoting == information reduction

The discussion yesterday about the quoted
best-fit parameters brings up a larger issue. We
always try to reduce our results to a few numbers,
but that discussion illustrates that doing so can
be a significant reduction in information. In fact, the
very act of creating a likelihood function does so
in a way that does not necessarily match intuition.

The function exp(-chi^2/2) only follows
intuition if the data are Gaussian in the first place. What
does it mean if the resulting likelihood function is highly
non-Gaussian? Certainly the minimum chi^2 does follow
intuition; it is the model that is closest to the data given that
metric. But understanding the "error"
on that quantity using these techniques is more a matter of
definition than anything else; if you define exp(-chi^2/2) as
the probability of a given parameter, then you can draw random
values from that distribution and define your confidence regions
based on the range of parameters about the best fit that contains
some percentage of the random points. Fine, but the fact is that
exp(-chi^2/2) isn't even what we would normally define as a
probability except under certain conditions.
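
As a concrete version of that procedure (a toy Python sketch,
not anything from the real analysis): grid the parameter, form
exp(-chi^2/2), draw random parameter values from it, and quote
the range containing 68% of the draws.

    import numpy as np

    # Toy data: a line through the origin, y = a*x, Gaussian errors.
    rng = np.random.default_rng(42)
    x = np.linspace(1.0, 10.0, 20)
    sigma = 1.0
    y = 2.0 * x + rng.normal(0.0, sigma, size=x.size)

    # Chi^2 on a grid of the single parameter a.
    a_grid = np.linspace(1.5, 2.5, 2001)
    chi2 = np.array([np.sum((y - a * x)**2 / sigma**2) for a in a_grid])

    # Define exp(-chi^2/2) as the "probability" of each a and normalize.
    prob = np.exp(-0.5 * (chi2 - chi2.min()))
    prob /= prob.sum()

    # Draw random parameter values and take the central 68% as the
    # confidence region, as described above.
    draws = rng.choice(a_grid, size=100000, p=prob)
    lo, hi = np.percentile(draws, [16.0, 84.0])
    a_best = a_grid[np.argmin(chi2)]
    print(f"best fit a = {a_best:.3f}, 68% region = [{lo:.3f}, {hi:.3f}]")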

So why bother with all the error estimation using this function
if you end up with a skewed distribution, like WMAP had with
the optical depth? I think it's fine as long as everyone looks at
the likelihoods and understands what they are. You are not really
looking at a likelihood; the breadth of that measure does indicate
something about how well constrained your model is, but
it is not clear how that translates into an intuitive feel for
confidence.

I think the only way you could really get a meaningful "confidence"
is to have N independent data sets, repeat the best fit on each, and
ask about percentages. This tells you about the error on the
independent sets. People rarely do this because the error on these
sets is roughly sqrt(N) larger than for the overall dataset. People
prefer bootstrap or jackknife techniques because those artificially
give you sqrt(N) better error estimates (I'm guilty too). Of course,
if everything is Gaussian, then the error on the overall set really
is sqrt(N) smaller.
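
A quick numerical check of that sqrt(N) statement for the
Gaussian case (a toy sketch, nothing to do with the actual
lensing measurement): the scatter of an estimate among N
independent subsets is about sqrt(N) times the error on the
same estimate from the full dataset.

    import numpy as np

    rng = np.random.default_rng(0)
    nsets, per_set = 16, 1000
    data = rng.normal(0.0, 1.0, size=(nsets, per_set))

    # Error from the scatter among the N independent subsets.
    subset_means = data.mean(axis=1)
    err_subsets = subset_means.std(ddof=1)

    # Error on the mean of the full dataset (all N sets combined).
    full = data.ravel()
    err_full = full.std(ddof=1) / np.sqrt(full.size)

    print(err_subsets / err_full)   # close to sqrt(nsets) = 4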


Erin

3 hours of talks; still odd features in signal

Two 1.5-hour talks by Spergel today. Learned that just
about all the "significant" differences we see in the new
papers come from choices in the analysis. For example, they
quoted the mean in the first-year paper but they quote
maximum-likelihood values in this paper. Many of the
distributions are heavily skewed, so it makes a big
difference. In fact, the ML value for the optical depth
was 0.1 in the first-year results and it is 0.1 in the new
results as well. But the mean changed a lot, from 0.17 to 0.1.
It's not clear why they made the change.
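
Just to illustrate why the choice matters for a skewed
distribution (generic numbers here, not the WMAP likelihood):
the maximum and the mean can sit well apart.

    import numpy as np

    # A strongly skewed one-dimensional "posterior" (a gamma density,
    # chosen only as a generic example of skewness).
    rng = np.random.default_rng(1)
    samples = rng.gamma(shape=2.0, scale=1.0, size=200000)

    mean = samples.mean()
    # Crude maximum-likelihood (mode) estimate from a histogram.
    counts, edges = np.histogram(samples, bins=200)
    imax = np.argmax(counts)
    mode = 0.5 * (edges[imax] + edges[imax + 1])

    print(f"mean = {mean:.2f}, mode = {mode:.2f}")   # mean ~2, mode ~1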

I tried to isolate the odd features in the signal, which
show up in the LRGs and the high-Ngals clusters. I split the
LRGs by redshift but the signal remains. I also tried requiring
that the sources lie above zlens + zbuffer (with zbuffer of 0.1
and 0.2), but that made absolutely no difference at all. I also
ran with the other catalog instead of the Princeton one, but this
made no difference either (it never has). That catalog has no
objects that were deblended, but deblending could still be the
problem and we just don't have a simple indicator.

The feature is not physical, so I must figure out a way to
isolate it. I'm out of ideas at the moment.


Erin

Friday, March 24, 2006

Tidying up

I finished generalizing all the code so that adding
new samples and sub samples takes only a few minutes.
Ran the jackknifing for sample 11, 20 kpc - 10 Mpc. Am
running sample 12 (30 Mpc) through the lensing code; it will
probably finish late tomorrow or early Saturday.

I want to isolate the odd features I see in the profiles.
One thing I want to try is removing some of the interlopers
by putting a buffer zone in redshift. This is a bit harder
to model in the photoz bias corrections, but if it works I'll
put in the time to do it right.

Tomorrow I'll start some of the new randoms running at nice 19 on
jet or evolve.

Erin

Thursday, March 23, 2006

Distances in cosmology

Distances

I implemented a bunch of routines to calculate cosmological
distances. I was motivated to do this 1) because I wanted to
generate a volume-limited set of random redshifts, so I needed
a formula for the volume element, and 2) because I wanted some
routines that work for more general cosmologies; I only had
flat-universe code. I still don't have code for evolving dark
energy, but I'll write that when I need it.

So I started from the bottom up and wrote an integration routine
to calculate the comoving distance, then all the steps needed to get
to the angular diameter distance (see Hogg astro-ph/9905116, which
mostly comes from Peebles 1993 book).

One thing I noted about that paper: I believe there is no
need to go to equation 19, which is not general. Since you
can calculate Dc, the line-of-sight comoving distance, between two
different redshifts, call it Dc_12, you can get the comoving
transverse distance between them (the difference of two DM's)
simply by using Dc_12 in formula 16. This works for
any curvature and so is general. I tested that it agrees in the
cases where equation 19 holds (curvature >= 0).
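
Here is a stripped-down Python sketch of the scheme (an
illustration, not my actual routines, and the default
cosmological parameters are just placeholders): integrate to
get the comoving distance, apply the equation-16 curvature
transform to Dc_12, and divide by (1+z2) for the angular
diameter distance between the two redshifts.

    import numpy as np
    from scipy.integrate import quad

    C_KMS = 299792.458   # speed of light, km/s

    def comoving_distance(z1, z2, H0=70.0, omega_m=0.3, omega_l=0.7):
        """Line-of-sight comoving distance between z1 and z2 in Mpc."""
        omega_k = 1.0 - omega_m - omega_l
        dh = C_KMS / H0
        ez = lambda z: np.sqrt(omega_m * (1 + z)**3
                               + omega_k * (1 + z)**2 + omega_l)
        integral, _ = quad(lambda z: 1.0 / ez(z), z1, z2)
        return dh * integral

    def transverse_comoving_distance(z1, z2, H0=70.0, omega_m=0.3,
                                     omega_l=0.7):
        """Comoving transverse distance DM_12: apply the equation-16
        curvature transform (Hogg 1999) to Dc_12, which works for any
        sign of the curvature."""
        omega_k = 1.0 - omega_m - omega_l
        dh = C_KMS / H0
        dc = comoving_distance(z1, z2, H0, omega_m, omega_l)
        if abs(omega_k) < 1e-8:
            return dc
        sqok = np.sqrt(abs(omega_k))
        if omega_k > 0:
            return dh / sqok * np.sinh(sqok * dc / dh)
        return dh / sqok * np.sin(sqok * dc / dh)

    def angular_diameter_distance(z1, z2, **cosmo):
        """Angular diameter distance between two redshifts, in Mpc."""
        return transverse_comoving_distance(z1, z2, **cosmo) / (1.0 + z2)

    # example: a lens at z=0.25 and a source at z=0.6
    print(angular_diameter_distance(0.25, 0.6))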

Now that I have this stuff going I'm generating 48 sets of 500,000
randoms from z=0.04 to 0.35. I'll set them running through the lensing
code in the background. I don't need these at the
moment since the randoms I have cover the redshift range for maxBCG,
but it will be good to have them for the future.
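
The volume-limited random redshifts come from drawing with
probability proportional to the comoving volume element. A
rough sketch of how that can be done by inverting the CDF on a
grid (flat universe for simplicity, placeholder parameters
again):

    import numpy as np
    from scipy.integrate import quad

    C_KMS = 299792.458

    def volume_limited_redshifts(n, zmin=0.04, zmax=0.35,
                                 H0=70.0, omega_m=0.3, omega_l=0.7):
        """Draw n redshifts with p(z) proportional to the comoving
        volume element, dV/dz ~ Dc(z)^2 / E(z) in a flat universe."""
        dh = C_KMS / H0
        ez = lambda z: np.sqrt(omega_m * (1 + z)**3 + omega_l)

        zgrid = np.linspace(zmin, zmax, 1000)
        dc = np.array([dh * quad(lambda zz: 1.0 / ez(zz), 0.0, z)[0]
                       for z in zgrid])
        dvdz = dc**2 / ez(zgrid)

        cdf = np.cumsum(dvdz)
        cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])
        rng = np.random.default_rng()
        return np.interp(rng.uniform(size=n), cdf, zgrid)

    # one set of 500,000 randoms between z=0.04 and 0.35
    zrand = volume_limited_redshifts(500000)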

Corrected Lensing Profiles


Yesterday I finished the code to match the redshift histograms, so
today I was able to do the random point corrections to the maxBCG
lensing profiles. These look OK, but there are some worrying features
in the profiles. The features are similar to what I saw in the LRG profiles.
Tomorrow I will do the jackknifing so we can do some more meaningful
analysis. I will also do sub-sampling in the "paper" versions of the
catalog. These are the clusters which will be published in Ben's catalog
paper. Essentially it is an Ngals cut and a redshift cut.

Erin

Tuesday, March 21, 2006

Better idea for matching randoms

A better idea for the random points (see last entry):

Take the bin in redshift for which the histogram of the lenses
is highest relative to the randoms. For that bin we will keep
100% of the randoms. Normalize the ratio of lens to random histograms
to 1 at this bin. Then the value of the ratio in the other bins is the
fraction of randoms you will keep from that bin. This maximizes the
number of randoms you will use and reproduces the histogram.
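
In code form the scheme is roughly the following (a sketch;
the real version operates on the actual lens and random
catalogs):

    import numpy as np

    def match_random_redshifts(zlens, zrand, nbins=20):
        """Keep a subset of the randoms so their redshift histogram
        has the same shape as the lens histogram, keeping 100% of the
        randoms in the bin where lenses are most over-represented."""
        bins = np.linspace(min(zlens.min(), zrand.min()),
                           max(zlens.max(), zrand.max()), nbins + 1)
        hl, _ = np.histogram(zlens, bins=bins)
        hr, _ = np.histogram(zrand, bins=bins)

        # Ratio of lens to random counts, normalized to 1 at its
        # maximum; that value is the fraction of randoms to keep
        # in each bin.
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(hr > 0, hl / hr, 0.0)
        keep_frac = ratio / ratio.max()

        which = np.clip(np.digitize(zrand, bins) - 1, 0, nbins - 1)
        rng = np.random.default_rng()
        return rng.uniform(size=zrand.size) < keep_frac[which]

    # usage: zrand_matched = zrand[match_random_redshifts(zlens, zrand)]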

I'm matching to randoms as I write this.

Erin

Monday, March 20, 2006

First post: random points

This is my first post on my research blog, so let's get to it:

Hogg convinced me that I don't need to create new random points for
each sample I do. My thought was that each should
be statistically independent, which they won't be if I re-use random
points. But because the randoms are so well-determined now (20
million points) the statistical fluctuations in their determination
are no longer important anyway. This also allows me to just
set randoms running in the background with redshifts drawn
randomly from a "volume limited" distribution, and then use
them when I need them.

The trick is getting the redshifts correct for the randoms when I use
them with a specific lens sample. What I plan to do is just draw from
the randoms until the histogram of random redshifts is N times
the histogram of lens redshifts. This will preserve the rough shape
of the redshift distribution but not use exactly the same redshifts.

I'm also running a new cluster lens sample right now which should be
done by mid-day tomorrow; more on that when the time comes.

Erin