Sunday, August 27, 2006

It's away!

Posted the cluster lensing paper to ApJ on Friday.
Not much else to say; I've been working on this
project, on and off and amongst other things, for
a few years now. Some science is going to come
of this work. Within the group there is already
one paper in prep. and at least two or three others
that should be completed within the year.

Monday, July 31, 2006

Milestone

I just posted the cluster lensing paper to the collaboration
for three weeks of scrutiny; then off to ApJ.

I have this tendency to wait on things like this until they
feel ready. For some reason that takes longer for me than
for other people, unless they simply move on before things
are ready.
Even though nothing major may have changed in months,
I just keep working on them until they feel right. This
time the feeling coincided well with the timescale the
cluster group set for releasing papers. Deadlines can be
very useful, especially toward the end of a project.

I'm going to tweak this draft some more, but it is
basically done. I'm ready to move on. I'll probably
spend some time on DES stuff and begin working on
some interpretation of these results, probably on
M/L profiles.

Erin

Thursday, June 15, 2006

Long time coming

Is the fact that I've been doing a lot of research
a good excuse for neglecting my research blog?

I've been very busy. Ben Koester and I have been
going back and forth trying to make the maxBCG
cluster catalog all that it can be. As usual, 95%
of the work goes into the most difficult 5%. I think
I can say this: we understand things well enough to
know that we can basically stop working on it (for now).
We also know which data are usable.

With the new and improved catalog I've been running
the lensing code and Morad has been measuring the
number density and luminosity profiles. Morad and
Sarah both plan to write Science papers from Morad's
work. In the short term, I am most interested in extracting
the r200-Ngals relationship from the luminosity profiles.
With this we have a good size estimate within which
we can re-measure the cluster properties. This is an
update of Sarah's original measurement of r200-Ngals for
the new algorithm. The new algorithm is significantly different,
at least in concept; we will see if the r200-Ngals scaling
is different.

It will be interesting to compare this r200 to the r200 we
will get from the mass profiles. We have both the mass
and luminosity profiles, so we will get mass-to-light
ratios as a function of radius. Sarah and Morad are interested
in galaxy formation and related topics, and this should provide
an excellent starting point for that as well.
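
For the record, the basic r200 measurement can be sketched in a
few lines of Python: integrate the binned density profile outward,
find where the mean interior density falls to 200 times the mean
density, and fit a power law for the r200-Ngals scaling. The bin
conventions and the Ngals pivot of 20 below are just illustrative,
not what we actually adopt.

    import numpy as np

    def r200_from_profile(r, rho, rho_mean):
        # r: bin centers (increasing, r[0] > 0); rho: binned 3D density
        # profile in the same units as rho_mean (number, luminosity, or mass)
        integrand = 4.0 * np.pi * r**2 * rho
        # assume constant density inside the innermost bin, integrate outward
        m_enc = np.empty_like(r)
        m_enc[0] = 4.0 / 3.0 * np.pi * r[0]**3 * rho[0]
        m_enc[1:] = m_enc[0] + np.cumsum(
            0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
        mean_interior = m_enc / (4.0 / 3.0 * np.pi * r**3)
        # mean interior density falls with r; interpolate to 200 * rho_mean
        return np.interp(200.0 * rho_mean, mean_interior[::-1], r[::-1])

    def fit_r200_ngals(ngals, r200, pivot=20.0):
        # power law r200 = A * (Ngals / pivot)^B, fit in log space
        B, lnA = np.polyfit(np.log(ngals / pivot), np.log(r200), 1)
        return np.exp(lnA), B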

I have measured the lensing around very low
luminosity galaxies using a catalog from Michael Blanton.
This should be very interesting since there are few reliable
statistical mass measurements at these luminosities. Currently
Michael basically just has the number density argument. I have
also run Chris Miller's cluster catalog through. It is an
extremely well understood catalog from spectroscopic data.
The S/N isn't great because they are few in number and nearby
in space, making their lensing strength weak. A factor of
two improvement in number is on the way with the processing
of DR4.

Finally, I completed the set of shapelet decompositions for the
DES simulations. Now I hope to push to higher redshift by
looking at the GOODS fields. The statistics are poor, but this
will possibly give us a handle on morphology evolution.

Erin

Wednesday, May 10, 2006

Shapelet Catalogs & Clusters

SDSS Shapelets

I've been working on shapelet decompositions for
all spectroscopic galaxies from SDSS DR4, in all
5 SDSS bandpasses. The primary motivation is
realistic simulations for the Dark Energy Survey,
but there can be a lot of science in this catalog as well.
Basically you transform the image into a two
dimensional basis, sort of like a Fourier transform.
From that you can define a reconstruction. We
are using the reconstructions to make realistic
galaxies for the simulations. There is even
a technique to perturb the reconstruction in a
way that doesn't just produce random junk. This
way the hundreds of millions of simulated galaxies
won't just be copies of the hundred thousand I decomposed.
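
To record the idea itself, here is a toy version in Python; the
real measurement also has to deal with the PSF, noise, masking,
and centroiding, none of which appear here, and the scale beta
and maximum order are placeholders.

    import numpy as np
    from math import factorial
    from numpy.polynomial.hermite import hermval

    def shapelet_basis(n1, n2, x, y, beta):
        # Cartesian shapelet basis B_{n1,n2}(x, y; beta): products of
        # Gauss-Hermite functions (Refregier 2003)
        def phi(n, u):
            coef = np.zeros(n + 1)
            coef[n] = 1.0
            norm = 1.0 / np.sqrt(2.0**n * factorial(n) * np.sqrt(np.pi))
            return norm * hermval(u, coef) * np.exp(-0.5 * u**2)
        return phi(n1, x / beta) * phi(n2, y / beta) / beta

    def decompose(image, beta, nmax):
        # project the image onto all basis functions with n1 + n2 <= nmax
        ny, nx = image.shape
        y, x = np.mgrid[0:ny, 0:nx]
        x = x - (nx - 1) / 2.0   # center on the image middle here; a real
        y = y - (ny - 1) / 2.0   # analysis would use the measured centroid
        coeffs = {}
        for n1 in range(nmax + 1):
            for n2 in range(nmax + 1 - n1):
                b = shapelet_basis(n1, n2, x, y, beta)
                coeffs[(n1, n2)] = np.sum(image * b)  # pixel sum ~ integral
        return coeffs

    def reconstruct(coeffs, shape, beta):
        # sum the basis functions back up, weighted by the coefficients
        ny, nx = shape
        y, x = np.mgrid[0:ny, 0:nx]
        x = x - (nx - 1) / 2.0
        y = y - (ny - 1) / 2.0
        recon = np.zeros(shape)
        for (n1, n2), c in coeffs.items():
            recon += c * shapelet_basis(n1, n2, x, y, beta)
        return recon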

I will write this up after I'm done, even if it is just to
post on astro-ph. They are all running now and will
be finished in a few days.

Clusters

I've been looking a lot at the maxBCG cluster catalog.
It is an excellent catalog and is very well tested at the
high mass end. I've been identifying issues that
don't really affect the high mass end but do affect the
low mass end. This is important to me because there
is a lot of lensing signal in these low mass objects and
if we can construct a believable catalog there it will
be great for lensing and will in general allow a more
complete examination of structures larger than
individual galaxies. With SDSS we can really do this.
Berlind has done this with the spectroscopic sample at
low redshift and this is our chance to do a uniformly
selected sample to higher redshift. Because of the
extra volume this will also just have a lot more statistical
power in certain ways.

Friday, April 28, 2006

Workshop & IDL extensions

Cluster Workshop

This week we had a maxBCG cluster workshop in AA.
It was very productive from my point of view. We clarified
the science we want to do, defined tasks that need to
be completed, and actually did some work. I got to some
of the sub-sampling that had been on the back burner while
I worked on systematics. These were pretty enlightening.
For example, splitting by L200 luminosity within an Ngals
(2 Mpc) bin resulted in very weird profiles, as you might
expect, while splitting by L200 within an N200 bin was much
cleaner. At this point I'm really liking the N200 measures,
and I think the work that Eduardo and Morad are both doing
to improve this will be fruitful.

I also did some X-ray clusters and, although the sample is
small and the S/N not terribly exciting, it points to some interesting
future work, such as going back to the RASS and looking for
lower-flux sources like we did back in the 42 clusters paper.

IDL extensions

I've been getting tired of the crappy file reading
support IDL has, and of the slowness of its FITS routines
and their inability to properly deal with 64-bit integers. A while
back I wrote a suite of routines to read and write structures
to a file. These handle all the basic IDL data types
and can write/read in either the native byte order
or the other. This makes reading at least 3 times faster
than FITS on a little-endian machine.

But what is really needed is the ability to extract
individual rows and columns, for both memory
and speed reasons. You can't do this efficiently
with the built-in routines. I wrote binary_read and ascii_read
C routines, linked to IDL with the DLM mechanism, which
allow efficient reading of data from these types of files into
structures. You can extract just the data you need, fast.
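
The idea is easy to sketch in Python (the real routines are C
linked into IDL, and the record layout and field names below are
made up for the example): with fixed-length records you can seek
straight to the rows you want, or memory-map the file and pull out
a single column, without ever touching the rest of the file.

    import numpy as np

    # hypothetical fixed-length record layout for an idlstruct-style file
    row_dtype = np.dtype([('ra', '<f8'), ('dec', '<f8'),
                          ('e1', '<f4'), ('e2', '<f4')])

    def read_rows(filename, start, nrows, header_bytes=0):
        # read a contiguous block of rows by seeking to the right offset
        with open(filename, 'rb') as f:
            f.seek(header_bytes + start * row_dtype.itemsize)
            return np.fromfile(f, dtype=row_dtype, count=nrows)

    def read_column(filename, name, header_bytes=0):
        # memory-map the file and pull out one column; only the requested
        # bytes actually get read from disk
        mm = np.memmap(filename, dtype=row_dtype, mode='r',
                       offset=header_bytes)
        return np.array(mm[name])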

I also polished off ascii_write, which writes a structure to an
ASCII file. It is many times faster than looping and doing printf
statements. The idlstruct routines can all work without these, but
if these are compiled they become much more powerful.

I realized as I was finishing off binary_read that this is
approaching the ability of databases. I thought about how
I could modify those simple Goddard database routines, which
just work with the most basic read/write abilities of IDL,
to make an extremely efficient native IDL database system.
Don't know if it's worth it since I have an interface to postgres,
but it would have the advantage of not requiring another
program. It would never have all the features. And, besides,
I'm supposed to be migrating to Python!

Anyway, the sdss idl (aka sdssidl, umsdss_idl) code, C/C++
and documentation are here:

http://cheops1.uchicago.edu/idlhelp/sdssidl/umich_idl.html

Erin

Thursday, April 13, 2006

Intrinsic alignments & Lots 'o talks

Intrinsic Alignments

Finished a measurement of intrinsic alignments
in clusters. Used Berlind's spectroscopic cluster
sample cross-correlated with the main sample. I'm
getting a fairly weak bound (no detection), but this is
the first such measurement in clusters that I know of.

This shear needs to be converted to the "delta sigma" we
measure for the clusters, and I have done this.
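
The conversion itself is just Delta Sigma = gamma_t * Sigma_crit,
with Sigma_crit = c^2 D_s / (4 pi G D_l D_ls). A minimal sketch,
where angdist stands in for whatever angular diameter distance
routine is handy (it is not a real function name):

    import numpy as np

    C_LIGHT = 2.99792458e5   # km/s
    G_NEWTON = 4.301e-9      # gravitational constant in Mpc (km/s)^2 / Msun

    def sigma_crit(z_lens, z_src, angdist):
        # Sigma_crit in Msun/Mpc^2; angdist(z1[, z2]) is assumed to return
        # the angular diameter distance in Mpc
        d_l = angdist(z_lens)
        d_s = angdist(z_src)
        d_ls = angdist(z_lens, z_src)
        return C_LIGHT**2 * d_s / (4.0 * np.pi * G_NEWTON * d_l * d_ls)

    def delta_sigma(gamma_t, z_lens, z_src, angdist):
        # mean tangential shear -> Delta Sigma = gamma_t * Sigma_crit
        return gamma_t * sigma_crit(z_lens, z_src, angdist)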

Talks

Lots of talks here this week. Two 1.5 hour talks by
Leonard Susskind and a talk by Roger Penrose, plus
a job talk by Andrew Zentner; Alice Shapley is
talking tomorrow.

Susskind and Penrose both talked about things we
don't understand. Susskind discussed, for example,
how everyone loves the eternal inflation, bubble
nucleation, pocket universe theory, but at this point
it's just a neat idea with no conceptual or mathematical
foundation. He tried to show that there are indications
that there might exist such a foundation.

Penrose talked about what happened before the big bang.
All the math breaks down there, both classically and in
the best quantum efforts, so you have to come up with
something new. His talk centered on the idea that, at a
nearly infinite time from now, when lambda has accelerated
everything away, all particles have decayed, black holes
have evaporated, and somehow you could get rid of
electrons too, there would be only photons in the universe.
In that case time has no meaning at all, since photons
don't experience time. You can't build clocks. This is an
interesting idea. But then he made a huge leap and said
that if suddenly the phase space is rescaled so that it
appears to be a very dense universe, then it looks like
the big bang and everything starts over again.

Erin

Tuesday, April 11, 2006

Space-time curvature

Today I had an interesting discussion with Andrei
Gruzinov about space-time curvature. I
see clearly now that most of the un-intuitive aspects of
GR (even black holes) have the same source as the
un-intuitive aspects of special relativity. The fact that light
needs no medium for propagation leads to SR, and there
is little conceptually new in GR other than the equivalence
principle. The difference between SR and GR comes from the
fact that accelerations due to sources (such as gravity, EM)
have spatial gradients and this manifests itself as curvature
when you write down the covariant formulation.

There are some aspects of GR that seem really new, for
example the expansion of space. Andrei pointed
out that for an infinite universe you can always reformulate
the problem as particles in space given initial velocities flying
apart; nothing new there. But I don't think a closed universe
with positive curvature can be reformulated in that way,
because it is spatially finite, so eventually the particles
will mix and this will produce observational differences.
The thing is, GR is a local theory, it doesn't tell us anything
about the global properties of the universe. We may
need something new altogether to address whether, for
example, the universe can have non-trivial topology.

Erin

Tuesday, April 04, 2006

More tests

I decided to run some of these tests on our
cluster sample and I definitely see a difference
with the better resolved sources. Now I want
to see how few sources I can cut and still get
the improvement. More tomorrow.

Numpy

Today there was a thread on the Numpy
discussion list about the fee-based
documentation. People there made some
good points about this. This guy is actually
writing a book here, and it's not like I haven't
bought books before; I own the Perl book,
for example, and used it all the time when I
was writing a lot of Perl code.

On the other hand, what is really needed for numpy
is just a list of all the available commands.
This I can perhaps do from the source, and I might
actually do that once they get all the good stuff from
numarray ported over. We'll see.

Erin

Monday, April 03, 2006

Trends in the signal and Python disappointment

Trends

I'm seeing trends of the signal with the resolution of
the source galaxies and with the distance (in arcseconds)
from the source to the lens. I see no trends with the seeing,
the redshift of the source, the redshift of the lens, or
deblending (at least the blended flag).

The resolution thing is odd because earlier tests showed
no dependence on it. It may be correlated with the galaxy
density, but in that case there is another variable besides
deblending involved and I don't have a guess yet as to
what it is.

Python

I've been really getting into numerical python and thinking
it is our way to get out from under the IDL yoke. I've
been disappointed, however, with some trends I have seen
among the developers. The first thing I saw is that the
documentation for Numpy is available only if you
fork over some cash. This seems outrageous considering
that one of the whole points of developing this stuff is to
get away from the kinds of fees charged for Matlab and IDL. That said,
the guy who is charging for this stuff is also one of the
most helpful contributors to the mailing list so it may
all work out in the end. I just think it is a bad example
to set.

The second hit came when reading up on the wonderful
PyTables stuff that is being developed. PyTables is a
python+C library to access the HDF5 api. Most of this is
free, but I found out that the most powerful extensions,
which allow efficient complex searches of the tables, are only going
to be available in a commercial product unless someone else
develops something similar. This crap is happening because
PyTables is released under a BSD license. That license is not
copyleft like the GPL, so as I see it the result is not really
free software. BTW, can someone explain how the Python license
is compatible with the GPL yet someone can then license their
Python code under BSD?

I have already put all my personal code under the GPL,
which is probably the only license I will ever use.
I will also be discussing with my collaborators whether
the sdssidl code can be distributed under the GPL as well.

Friday, March 31, 2006

Latest systematics tests and the inversion paper

Systematics

I ran a bunch of tests, none of which
were conclusive, but I did see something
connected to the resolution parameter. This
isn't surprising. I saw a bigger effect,
however, when I simply limited the smallest radius
in arcsec at which sources would be used. This again
suggests deblending problems.

So I looked at the flags, and in fact deblended
objects are included in the Princeton catalog,
whereas I thought they had been cut in what Rachel
gave me.

So now I'm remaking the postgres table to include
the flags so I can test this.

Dave's Inversion Paper

I read Dave's new draft. The content is excellent;
it holds much of what we have learned over the
last few years about weak lensing and stacking
in particular. I hope people read it because it
is sort of the Bible for stacking from the theory
point of view.


Erin

Wednesday, March 29, 2006

Wasted time

I wasted a few hours trying to figure out why
I couldn't link C programs to IDL on the evolve
machine. At first I thought it was because it is
running IDL 6.2, but then I got it to work on
another machine with 6.2. Then I thought it must be because
it is a 64-bit machine, but then I got it to work
on a different 64-bit machine. Then I figured it
out: for some reason IDL is running in 32-bit mode
on that machine, so it can't work with 64-bit libraries.
This is not the default behavior, so it must be
some kind of configuration error. I've written
to the sysadmin to see what's up.

I did get something done today: I finished
creating the new source catalog tables in the
postgres database and I'm ready to do some
systematics tests tomorrow.

Erin

Tuesday, March 28, 2006

Searching the sphere + looking for systematics

Searching

I hacked together a ra/dec searching code which uses the
HTM (hierarchical triangular mesh) to match things on
the sphere. It is extremely fast. There is significant overhead
in finding the index for each input ra/dec position, which is
a killer for small lists. For my work I always carry this
index around with all my objects, so I only generate it once.
But for a stand-alone piece of code it is actually rather
annoying, if not prohibitive. I don't know a workaround
for this yet.
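
For the record, the same kind of match can be sketched in Python
with a kd-tree on unit vectors; an angular radius cut is equivalent
to a cut on chord length. This is just the geometry of the problem,
not the HTM code itself.

    import numpy as np
    from scipy.spatial import cKDTree

    def radec_to_unitvec(ra_deg, dec_deg):
        # convert ra/dec in degrees to unit vectors on the sphere
        ra, dec = np.radians(ra_deg), np.radians(dec_deg)
        return np.column_stack([np.cos(dec) * np.cos(ra),
                                np.cos(dec) * np.sin(ra),
                                np.sin(dec)])

    def match_radec(ra1, dec1, ra2, dec2, radius_arcsec):
        # return (i, j) index pairs with separation < radius_arcsec; a
        # chord-length cut in 3D stands in for the HTM triangle walk
        v1 = radec_to_unitvec(ra1, dec1)
        v2 = radec_to_unitvec(ra2, dec2)
        chord = 2.0 * np.sin(0.5 * np.radians(radius_arcsec / 3600.0))
        tree = cKDTree(v2)
        pairs = []
        for i, neighbors in enumerate(tree.query_ball_point(v1, chord)):
            for j in neighbors:
                pairs.append((i, j))
        return pairs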

Systematics

I've been trying to track down systematics and haven't had any
luck with the easy stuff, as I have described below. So what
I have done is remake the catalogs with more info
and stuff them into my postgres database so I can very quickly
select source samples with different cuts and run them through the
lensing code. I think this is a better approach than adding
lots more info to the input catalog read by the lensing code
and then doing cuts within, because the memory gets prohibitive
on the cheopsen at Chicago (limited to 2 GB). I'm finishing this
up now, so tomorrow I'll start running the tests.
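
The workflow is then just one query per cut, with the result fed
to the lensing code. Something along these lines, sketched with
psycopg2; the database, table, and column names are placeholders,
not the actual schema.

    import numpy as np
    import psycopg2

    def select_sources(min_resolution, dbname='lensing'):
        # pull a source sample with a resolution cut straight from postgres
        conn = psycopg2.connect(dbname=dbname)
        try:
            cur = conn.cursor()
            cur.execute(
                "SELECT ra, dec, e1, e2, photoz FROM sources"
                " WHERE resolution > %s", (min_resolution,))
            data = np.array(cur.fetchall(), dtype='f8')
        finally:
            conn.close()
        return data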

The first one I want to try is cuts on how well the galaxies are
resolved. I have done this test before and it passed, but maybe
the small-number statistics of the LRGs and the high-Ngals bin
make them more sensitive to certain regions of sky where there is
bad seeing, for example. Note the masking will be a bit messed
up if a resolution cut removes all objects from a given region
of sky.

Erin

Saturday, March 25, 2006

Parameter quoting == information reduction

The discussion yesterday about the quoted
best-fitting parameters brings up a larger issue. We
always try to reduce our results to a few numbers,
but that discussion illustrates that doing so can
be a significant reduction in information. Indeed, the
very act of creating a likelihood function does so
in a way that does not necessarily match intuition.

The function exp(-chi^2/2) only follows
intuition if the data are Gaussian in the first place. What
does it mean if the resulting likelihood function is highly
non-Gaussian? Certainly the minimum chi^2 does follow
intuition; it is the model that is closest to the data given that
metric. But understanding the "error"
on that quantity using these techniques is more a matter of
definition than anything else; if you define exp(-chi^2/2) as
the probability of a given parameter, then you can draw random
values from that distribution and define your confidence regions
based on the range of parameters about the best fit that contain
some percentage of the random points. Fine, but the fact is
exp(-chi^2/2) isn't even what we would normally define as a
probability except under certain conditions.

So why bother with all the error estimation using this function
if you end up with a skewed distribution like WMAP had with
the optical depth? I think it's fine if everyone looks at the
likelihoods and understands them. You are not really looking
at a likelihood; the breadth of that measure does indicate
something about how well constrained your model is, but
it is not clear how that translates into an intuitive feel of
confidence.

I think the only way you could really get a meaningful "confidence"
is to have N independent data sets and repeat the best fit and ask
about percentages. This tells you about the error on the independent
sets. People rarely do this because the error on these sets is
roughly sqrt(N) larger than that of the overall dataset. People prefer
to use bootstrap or jackknife techniques because these artificially
give you sqrt(N) better error estimates (I'm guilty too). Of course, if
everything is Gaussian, then in fact the error on the overall set
is sqrt(N) smaller.
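
For reference, the delete-one jackknife I have in mind is the
standard one; a sketch (the real lensing jackknife removes spatial
regions rather than individual points):

    import numpy as np

    def jackknife_error(samples, statistic=np.mean):
        # delete-one jackknife error on a statistic computed from N
        # (nearly) independent subsample measurements
        samples = np.asarray(samples)
        n = len(samples)
        loo = np.array([statistic(np.delete(samples, i)) for i in range(n)])
        return np.sqrt((n - 1.0) / n * np.sum((loo - loo.mean())**2))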


Erin

3 hours of talks; still odd features in signal

Two 1.5 hour talks by Spergel today. Learned that just
about all the "significant" differences we see in the new
papers come from choices in the analysis. For example, they
quoted the mean in the first-year paper but they quote
maximum-likelihood values in this paper. Many of the
distributions were heavily skewed, so it makes a big
difference. In fact, the ML value for the optical depth
in the first-year results was 0.1, and it is 0.1 in the new
results as well. But the mean changed a lot, from 0.17 to 0.1.
It's not clear why they made the change.

I tried to isolate the odd features in the signal, which
show up in the LRGs and the high-Ngals clusters. I split the
LRGs by redshift but the signal remains. I also tried requiring
that the sources be at zlens + zbuffer or higher (for zbuffer of
0.1 and 0.2), but that made absolutely no difference at all.
I also ran with the other catalog instead of the Princeton one,
but this made no difference either (it never has). The catalog
has no objects that were deblended, but this could still be the
problem and we just don't have a simple indicator.

The feature is not physical, so I must figure out a way to
isolate it. I'm out of ideas at the moment.


Erin

Friday, March 24, 2006

Tidying up

I finished generalizing all the code so that adding
new samples and sub samples takes only a few minutes.
Ran the jackknifing for sample 11, 20 kpc - 10 Mpc. Am
running sample 12 (30 Mpc) through the lensing code, which will
probably finish late tomorrow or early Saturday.

I want to isolate the odd features I see in the profiles.
One thing I want to try is removing some of the interlopers
by putting a buffer zone in redshift. This is a bit harder
to model in the photoz bias corrections, but if it works I'll
put in the time to do it right.

Tomorrow I'll start some of the new randoms running at nice 19 on
jet or evolve.

Erin

Thursday, March 23, 2006

Distances in cosmology

Distances

I implemented a bunch of routines to calculate cosmological
distances. I was motivated to do this because 1) I wanted to
generate a volume-limited set of random redshifts, so I needed
a formula for the volume element, and 2) I wanted some routines
that worked for more general cosmologies; I only had flat-universe
code. I still don't have evolving dark energy code, but I'll write
that when I need it.

So I started from the bottom up and wrote an integration routine
to calculate the comoving distance, then all the steps needed to get
to the angular diameter distance (see Hogg astro-ph/9905116, which
mostly comes from Peebles' 1993 book).

One thing I noted about that paper: I believe there is no
need to go to equation 19, which is not general. Because
you can calculate Dc, the comoving distance, between two different
redshifts (call it Dc_12), you can get the equivalent of the difference
of two DM's (comoving transverse distances) simply by using Dc_12 in
formula 16. This works for any curvature and so is
general. I tested that it agrees for the cases where equation 19 holds
(curvature >= 0).
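
The scheme can be sketched in a few lines of Python: integrate
1/E(z) between the two redshifts to get Dc, apply formula 16 to
get DM, and divide by (1 + z2) for the angular diameter distance.
The default cosmology numbers below are arbitrary placeholders.

    import numpy as np
    from scipy.integrate import quad

    C_KM_S = 2.99792458e5   # speed of light in km/s

    def e_of_z(z, om, ol):
        # dimensionless Hubble parameter E(z) for matter + lambda + curvature
        ok = 1.0 - om - ol
        return np.sqrt(om * (1 + z)**3 + ok * (1 + z)**2 + ol)

    def comoving_distance(z1, z2, om=0.3, ol=0.7, h=0.7):
        # line-of-sight comoving distance Dc(z1, z2) in Mpc
        dh = C_KM_S / (100.0 * h)   # Hubble distance
        integral, _ = quad(lambda z: 1.0 / e_of_z(z, om, ol), z1, z2)
        return dh * integral

    def transverse_comoving_distance(z1, z2, om=0.3, ol=0.7, h=0.7):
        # DM(z1, z2) from Hogg's formula 16, valid for any curvature
        dh = C_KM_S / (100.0 * h)
        ok = 1.0 - om - ol
        dc = comoving_distance(z1, z2, om, ol, h)
        if abs(ok) < 1e-8:
            return dc
        elif ok > 0:
            return dh / np.sqrt(ok) * np.sinh(np.sqrt(ok) * dc / dh)
        else:
            return dh / np.sqrt(-ok) * np.sin(np.sqrt(-ok) * dc / dh)

    def angular_diameter_distance(z1, z2, **cosmo):
        # DA(z1, z2) = DM(z1, z2) / (1 + z2)
        return transverse_comoving_distance(z1, z2, **cosmo) / (1.0 + z2)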

Now that I have this stuff going I'm generating 48 sets of 500,000
randoms from z=0.04 to 0.35. I'll set them running through the lensing
code in the background. I don't need these at the
moment since the randoms I have cover the redshift range for maxBCG,
but it will be good to have them for the future.
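
The volume element piece that motivated all this can be sketched
the same way: dV/dz is proportional to DM(0,z)^2 / E(z), so drawing
"volume limited" random redshifts is a simple rejection-sampling
exercise built on the routines above (again just a sketch, with
arbitrary defaults):

    import numpy as np

    def sample_volume_limited_redshifts(n, zmin, zmax, om=0.3, ol=0.7,
                                        h=0.7, rng=None):
        # p(z) proportional to dV/dz ~ DM(0,z)^2 / E(z); uses e_of_z and
        # transverse_comoving_distance from the sketch above
        rng = np.random.default_rng() if rng is None else rng
        zgrid = np.linspace(zmin, zmax, 200)
        dvdz = np.array([transverse_comoving_distance(0.0, z, om, ol, h)**2
                         / e_of_z(z, om, ol) for z in zgrid])
        dvdz /= dvdz.max()
        out = []
        while len(out) < n:
            z = rng.uniform(zmin, zmax, size=n)
            u = rng.uniform(size=n)
            keep = u < np.interp(z, zgrid, dvdz)
            out.extend(z[keep].tolist())
        return np.array(out[:n])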

Corrected Lensing Profiles


Yesterday I finished the code to match the redshift histograms, so
today I was able to do the random point corrections to the maxBCG
lensing profiles. These look OK, but there are some worrying features
in the profiles. The features are similar to what I saw in the LRG profiles.
Tomorrow I will do the jackknifing so we can do some more meaningful
analysis. I will also do sub-sampling in the "paper" versions of the
catalog. These are the clusters which will be published in Ben's catalog
paper. Essentially it is an Ngals cut and a redshift cut.

Erin

Tuesday, March 21, 2006

Better idea for matching randoms

A better idea for the random points (see last entry):

Take the bin in redshift for which the histogram of the lenses
is highest relative to the randoms. For that bin we will keep
100% of the randoms. Normalize the ratio of lens to random histograms
to 1 at this bin. Then the value of the ratio in the other bins is the
fraction of randoms you will keep from that bin. This maximizes the
number of randoms you will use and reproduces the histogram.
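
In code the scheme would look something like this (a sketch; the
binning and random number handling are just placeholders):

    import numpy as np

    def match_random_redshifts(z_lens, z_rand, nbins=50, rng=None):
        # keep a subset of the randoms so their redshift histogram matches
        # the shape of the lens histogram: normalize the lens/random ratio
        # to 1 at its maximum bin and keep that fraction of randoms per bin
        rng = np.random.default_rng() if rng is None else rng
        zmin = min(z_lens.min(), z_rand.min())
        zmax = max(z_lens.max(), z_rand.max())
        bins = np.linspace(zmin, zmax, nbins + 1)

        h_lens, _ = np.histogram(z_lens, bins=bins)
        h_rand, _ = np.histogram(z_rand, bins=bins)

        ratio = np.zeros(nbins)
        good = h_rand > 0
        ratio[good] = h_lens[good] / h_rand[good]
        keep_frac = ratio / ratio.max()   # 100% kept in the limiting bin

        which_bin = np.clip(np.digitize(z_rand, bins) - 1, 0, nbins - 1)
        keep = rng.random(z_rand.size) < keep_frac[which_bin]
        return np.where(keep)[0]   # indices of the randoms to keep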

I'm matching to randoms as I write this.

Erin

Monday, March 20, 2006

First post: random points

This is my first post on my research blog, so let's get to it:

Hogg convinced me that I don't need to create new random points for
each sample I do. My thought was that each should
be statistically independent, which they won't be if I re-use random
points. But because the randoms are so well-determined now (20
million points) the statistical fluctuations in their determination
are no longer important anyway. This also allows me to just
set randoms running in the background with redshifts drawn
randomly from a "volume limited" distribution, and then use
them when I need them.

The trick is getting the redshifts correct for the randoms when I use
them with a specific lens sample. What I plan to do is just draw from
the randoms until the histogram of random redshifts is N times
the histogram of lens redshifts. This will preserve the rough shape
of the redshift distribution but not use exactly the same redshifts.

I'm also running a new cluster lens sample right now which should be
done by mid-day tomorrow; more on that when the time comes.

Erin