Linear extended sources

A linear extended source is a source whose surface brightness depends linearly on a set of free parameters. Generally, for these sources we can write the surface brightness $f(\theta)$ as a linear superposition of basis functions $b_n(\theta)$:

\[f(\theta) = \sum_n c_n b_n(\theta) \; .\]

The advantage of these sources is that one can perform direct inference on the parameters $\{ c_n \}$ without the need to include these parameters in the inference algorithm. Note that this is entirely analogous to the direct inference of the luminosity for point-like sources.

In this section we investigate the proper use of these sources and their inclusion in the inference algorithm.

Examples of linear sources

Apparently very different source models can actually be treated uniformly, as long as the linearity property above holds.

Pixelated sources

These sources are described in terms of a (regular) grid of pixels, with one weight per pixel. The surface brightness of the source can then be interpolated using a suitable interpolation kernel. For example, a 1st order B-spline (i.e., a top-hat kernel) is equivalent to nearest-neighbor interpolation, while a 2nd order B-spline is equivalent to linear interpolation. Other options, such as cubic splines or Lanczos kernels, are of course possible.
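
As an illustration, the minimal sketch below evaluates a pixelated source at arbitrary sky positions using scipy.ndimage.map_coordinates; the function name and the convention that positions are expressed in pixel units are our own assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def pixelated_surface_brightness(weights, positions, order=1):
    """Interpolate a pixelated source at arbitrary sky positions.

    weights   : (ny, nx) array of pixel weights c_n on a regular grid
    positions : (2, m) array of (row, col) positions in pixel units
    order     : 0 -> nearest neighbour (top-hat kernel),
                1 -> bilinear interpolation
    """
    return map_coordinates(weights, positions, order=order, mode="constant")

# Evaluate a 16x16 pixelated source at three positions.
rng = np.random.default_rng(0)
c = rng.random((16, 16))
theta = np.array([[3.2, 7.7, 10.1],   # rows
                  [4.5, 8.0, 12.3]])  # columns
f = pixelated_surface_brightness(c, theta, order=1)
```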

Other basis functions

Alternatively, one might want to describe the surface brightness of a source in terms of non-local basis functions, such as rectangular or circular shapelets. None of the results below depend on the specific choice of the functions $\{ b_n(\theta) \}$.

Observed linear sources (linear images)

The surface brightness of observed sources differs from the original surface brightness because of gravitational lensing and the effects of the point spread function (PSF).

Gravitational lensing

Sources affected by gravitational lensing are distorted as a result of the lens mapping. In the simplest approach, one can consider pencil-like beams and model the lensed surface brightness by replacing the angular position on the sky $\theta$ with the mapped source-plane position $\beta(\theta)$:

\[f(\theta) = \sum_n c_n b_n\bigl(\beta(\theta)\bigr) \; .\]

Alternatively, one can map an entire observed pixel onto the source plane and compute the average surface brightness within the mapped area. This approach, which resembles the drizzle algorithm, is computationally more expensive and, in practice, can be adopted only for a top-hat interpolation kernel.

In either case, if we consider the vector of observed pixels $\{ f_m \}$ in the grid, we can immediately see that their surface brightness is linearly related to the weights $\{ c_n \}$, so that we can write

\[f_m \equiv f(\theta_m) = \sum_n L_{mn} c_n \; ,\]

where

\[L_{mn} = b_n\bigl( \beta(\theta_m) \bigr) \; \]

is a matrix that encodes both the lensing and the interpolation algorithm (or basis functions) used. Note that the properties of $L$ depend on the choice adopted for the source: this matrix will generally be sparse for pixelated sources, but dense for non-local basis functions such as shapelets.
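
For a pixelated source with a top-hat kernel, each row of $L$ has a single non-zero entry, so the matrix is naturally stored in sparse form. A minimal sketch follows; the function name and the assumption that $\beta$ is expressed in source-pixel units are ours.

```python
import numpy as np
from scipy.sparse import csr_matrix

def lensing_matrix_nn(beta, grid_shape):
    """Sparse L for a pixelated source with a top-hat (nearest-neighbour) kernel.

    beta       : (m, 2) mapped source-plane positions, in source-pixel units
    grid_shape : (ny, nx) shape of the source pixel grid

    Returns an (m, ny*nx) CSR matrix with L[m, n] = b_n(beta(theta_m)).
    """
    ny, nx = grid_shape
    idx = np.rint(beta).astype(int)                  # nearest source pixel
    inside = ((idx[:, 0] >= 0) & (idx[:, 0] < ny) &
              (idx[:, 1] >= 0) & (idx[:, 1] < nx))
    rows = np.flatnonzero(inside)                    # pixels mapping outside
    cols = idx[inside, 0] * nx + idx[inside, 1]      # the grid get empty rows
    data = np.ones_like(rows, dtype=float)
    return csr_matrix((data, (rows, cols)), shape=(len(beta), ny * nx))
```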

Point spread function

The PSF will act as a convolution operator $P$ on the lensing operator $L$. As a result, the vector of observed surface brightness can be computed as

\[f = P L \, c \equiv M c \; .\]

Note that, in order to be able to perform a PSF convolution with output $f$ on a given grid, it is necessary to compute $L c$ on an area larger than that covered by $f$, extended by half the size of the PSF kernel on each side.
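
A sketch of this computation, assuming the lensing matrix has already been evaluated on the padded grid; the names model_image, padded_shape, and pad are ours, and the PSF is assumed normalized.

```python
import numpy as np
from scipy.signal import fftconvolve

def model_image(L, c, padded_shape, psf, pad):
    """Compute f = P L c on a padded grid, then crop to the observed area.

    L            : (m_padded, n) lensing matrix evaluated on the padded grid
    c            : (n,) source weights
    padded_shape : shape of the padded image grid
    psf          : 2-d PSF kernel (assumed normalised)
    pad          : padding pixels per side (>= half the kernel size)
    """
    img = (L @ c).reshape(padded_shape)
    img = fftconvolve(img, psf, mode="same")   # PSF smearing
    return img[pad:-pad, pad:-pad]             # crop back to the data footprint
```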

Likelihood of extended sources

Say we have an observation $\{ \hat f_m \}$ covering a given area of the sky, together with associated inverse variances $\{ h_m \}$. Assuming the measurement process is Gaussian and the pixels are uncorrelated, we have

\[\log P\bigl(\{ \hat f_m\} \bigm| \{ c_n \} \bigr) = \sum_m \biggl[ \frac{1}{2} \log \frac{h_m}{2\pi} - \frac{h_m}{2} \bigl(\hat f_m - M_{mn} c_n\bigr)^2 \biggr] \; .\]

We can rewrite the equation above in matrix notation as

\[\log P(\hat f \mid c) = \frac{1}{2} \log \biggl| \frac{H}{2\pi} \biggr| - \frac{1}{2} (\hat f - M c)^T H (\hat f - M c) \; ,\]

where $H = \mathrm{diag}(h_1, h_2, \dots)$.
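As a minimal sketch, the log-likelihood above is a few lines of numpy (the names follow the notation of this section):

```python
import numpy as np

def log_likelihood(f_hat, M, c, h):
    """Gaussian log-likelihood with diagonal inverse variances h."""
    res = f_hat - M @ c
    return 0.5 * np.sum(np.log(h / (2 * np.pi))) - 0.5 * np.sum(h * res**2)
```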

Regularizations

From a Bayesian point of view, we intend to (i) perform an inference on the source, in order to study its properties, and (ii) marginalize over its flux, so we can use the associated evidence (marginalized likelihood) to perform inference on the lens properties.

Therefore, for both tasks we will need to define a prior over the source parameters $\{ c_n \}$. For reasons that will become clear below, we elect to use for this prior a multivariate normal distribution characterized by inverse covariance matrix $\lambda R$ (where $\lambda$ is a positive hyper-parameter) and by mean $r$:

\[\log P(c \mid \lambda) = \frac{1}{2} \log \biggl| \frac{\lambda R}{2\pi} \biggr| - \frac{\lambda}{2} (c - r)^T R (c - r) \; .\]

Often, we will use a vanishing mean $r$ for the prior distribution.

As a possible choice for pixelated sources, one can adopt for $R$ a matrix representing a discrete Laplacian operator.
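
One possible construction, assuming a regular grid, builds the 2-d operator as a Kronecker sum of 1-d second-difference matrices; the boundary treatment below is one of several reasonable choices.

```python
from scipy.sparse import diags, identity, kron

def grid_laplacian(ny, nx):
    """Discrete Laplacian R on an (ny, nx) source-pixel grid,
    built as the Kronecker sum of 1-d second-difference operators."""
    def second_diff(n):
        return diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    return (kron(identity(ny), second_diff(nx))
            + kron(second_diff(ny), identity(nx)))
```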

Inference

The posterior distribution for the source parameters can be written as

\[\log P(c \mid \hat f, \lambda) = \log P(\hat f \mid c) + \log P(c \mid \lambda) - \log Z \; ,\]

where the evidence $Z$ is

\[Z = \int P(\hat f \mid c) \, P(c \mid \lambda) \, \mathrm{d}c \; .\]

Since all distributions are Gaussians, we can compute all quantities analytically. We have

\[2 \log P(c \mid \hat f, \lambda) = \log \biggl| \frac{H}{2\pi} \biggr| - (\hat f - M c)^T H (\hat f - M c) + \log \biggl| \frac{\lambda R}{2\pi} \biggr| - \lambda (c - r)^T R (c - r) - 2 \log Z\; .\]

In terms of $c$, the above expression can be interpreted as a Gaussian with inverse covariance

\[W = M^T H M + \lambda R \; ,\]

and with average

\[c_0 = W^{-1} \bigl( M^T H \hat f + \lambda R r\bigr) \; .\]

The vector $c_0$ represents the best-fit estimate for the source parameters, while the matrix $W$ is the inverse covariance of the posterior distribution (and, therefore, can be used to estimate the statistical errors on $c_0$). For the normalizing factor $Z$ (evidence) we find

\[\begin{align*} 2 \log Z = & \log \biggl| \frac{H}{2\pi} \biggr| + \log \biggl| \frac{\lambda R}{2\pi} \biggr| - \log \biggl| \frac{W}{2\pi} \biggr| \notag\\ & {} - (\hat f - M c_0)^T H (\hat f - M c_0) - \lambda (c_0 - r)^T R (c_0 - r) \; . \end{align*}\]

The evidence $Z$ is very relevant, as it can be used to perform inference on the lens parameters. Note how $-2 \log Z$ is approximately equivalent to a $\chi^2$ estimate of the model, with a number of relevant differences:

  • The evidence already includes the penalties linked to the prior on the basis coefficients (matrix $R$ and possible mean $r$);

  • $\log |W / 2 \pi|$ introduces an extra term in the log-evidence. This term is important, as it takes into account the complicated effects of gravitational lensing and PSF smearing;

  • $\log |\lambda R / 2 \pi|$ in the log-evidence correctly takes into account the freedom in the choice of the source-parameter coefficients. This term allows us to perform inferences with a variable number of parameters in the source description (such as a variable number of pixels for a pixelated source).
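
For reference, the formulas above translate directly into code. The following is a dense numpy sketch (in practice one would exploit the structure of the problem, as discussed in the next section); $R$ is assumed positive definite so that its log-determinant is finite.

```python
import numpy as np

def direct_inference(f_hat, M, h, R, lam, r):
    """Posterior precision W, mean c0, and log-evidence (direct form).

    f_hat : (m,) observed pixel values
    M     : (m, n) response matrix (PSF times lensing)
    h     : (m,) inverse variances (diagonal of H)
    R     : (n, n) regularization matrix (assumed positive definite)
    lam   : regularization strength lambda
    r     : (n,) prior mean
    """
    m, n = M.shape
    W = M.T @ (h[:, None] * M) + lam * R          # posterior precision
    c0 = np.linalg.solve(W, M.T @ (h * f_hat) + lam * (R @ r))
    res, dc = f_hat - M @ c0, c0 - r
    two_log_Z = (np.sum(np.log(h)) - m * np.log(2 * np.pi)
                 + n * np.log(lam) + np.linalg.slogdet(R)[1]
                 - np.linalg.slogdet(W)[1]
                 - res @ (h * res) - lam * dc @ (R @ dc))
    return W, c0, 0.5 * two_log_Z
```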

Implementation

Rather than computing all equations in the form above, it might be convenient to exploit the properties of the specific problem to speed up the computation.

Let us define the matrix $M'$ as

\[M'_{mn} \equiv \sqrt{h_m} M_{mn} \; ,\]

so that $M^T H M = M'^T M'$, and analogously

\[\hat f'_m \equiv \sqrt{h_m} \hat f_m \; .\]

Suppose also that we can write $R$ as $R = Q^T Q$: in general, this can be done by computing the Cholesky decomposition of $R$, but in fact this is often not necessary, as many regularization operators are naturally written as $c^T R c = \| Q c \|^2 = c^T Q^T Q c$. We define the block matrix

\[\bar M = \begin{bmatrix} M' \\ \sqrt{\lambda} Q \end{bmatrix} \; ,\]

and analogously the block vector

\[v = \begin{bmatrix} \hat f' \\ \sqrt{\lambda}\, Q r \end{bmatrix} \; ,\]

where the prior mean enters through $Q r$, so that $\bar M^T v = M^T H \hat f + \lambda R r$.

This way we have

\[W = \bar M^T \bar M \; ,\]

and

\[c_0 = (\bar M^T \bar M)^{-1} \bar M^T v = \bar M^+ v \; ,\]

where $\bar M^+$ is the Moore-Penrose inverse of $\bar M$. Finally, for the evidence we have

\[2 \log Z = \log \biggl| \frac{H}{2\pi} \biggr| + \log \biggl| \frac{\lambda R}{2\pi} \biggr| - \log \biggl| \frac{W}{2\pi} \biggr| - \bigl\| \bar M c_0 - v \bigr\|^2 \; .\]
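Putting the pieces together, a dense numpy sketch of the augmented least-squares solution follows (the function name is ours; a real implementation would use sparse matrices):

```python
import numpy as np

def augmented_inference(f_hat, M, h, Q, lam, r):
    """Solve the augmented least-squares system of this section.

    Q is any matrix with R = Q^T Q (e.g. a finite-difference operator).
    """
    Mp = np.sqrt(h)[:, None] * M                     # M' = sqrt(H) M
    fp = np.sqrt(h) * f_hat                          # f' = sqrt(H) f_hat
    M_bar = np.vstack([Mp, np.sqrt(lam) * Q])        # block matrix \bar M
    v = np.concatenate([fp, np.sqrt(lam) * (Q @ r)])
    c0, *_ = np.linalg.lstsq(M_bar, v, rcond=None)   # c0 = \bar M^+ v
    W = M_bar.T @ M_bar                              # posterior precision
    chi2 = np.sum((M_bar @ c0 - v) ** 2)             # residual term in 2 log Z
    return c0, W, chi2
```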

Computational complexity

It can be useful to estimate the computational complexity of this algorithm. The matrix $\bar M$ involved in the analysis has size $(m + n, n)$, where $m$ is the number of pixels of the image and $n$ is the number of basis functions used to model the source. To build this matrix, we need to perform the following operations:

  • Compute the lens mapping on a set of $m$ sky directions;

  • Evaluate each of the $n$ basis functions at the $m$ sky directions;

  • Perform $n$ convolutions involving $m$ output pixels.

The regularization part of $\bar M$ needs to be computed only once during the analysis, if we assume that the basis functions do not change.

For the implementation of the algorithm, we also need to compute $W = \bar M^T \bar M$, calculate $c_0 = \bar M^+ v$, and estimate the log-determinant of $W$.

Multiple sources

In the case of multiple extended sources, we need to consider two separate cases: sources generating overlapping images, and sources generating non-overlapping ones. Note that this definition refers to the sky area needed to compute the image surface brightness before the PSF convolution: two sources with close but non-overlapping image masks will still be considered overlapping if their PSF-enlarged masks overlap.

By taking its transitive closure, the overlap relation becomes an equivalence relation, and we can therefore partition an arbitrary set of sources into classes of mutually overlapping ones.
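
The grouping into equivalence classes can be computed with a standard union-find pass over pairwise overlaps of the PSF-enlarged masks; in the sketch below we assume the masks are boolean arrays on a common pixel grid.

```python
import numpy as np

def overlap_classes(masks):
    """Partition sources into classes of (transitively) overlapping ones.

    masks : list of boolean arrays, the PSF-enlarged image mask of each source
    Returns a list of lists of source indices, one per equivalence class.
    """
    n = len(masks)
    parent = list(range(n))

    def find(i):                                 # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.any(masks[i] & masks[j]):      # direct overlap
                parent[find(i)] = find(j)        # merge the two classes

    classes = {}
    for i in range(n):
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())
```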

Non-overlapping sources

Non-overlapping sources (or, better, distinct equivalence classes of the overlap relation) can be modeled separately: their fluxes can be changed independently with no mutual influence. This is computationally very convenient, as it allows us to reduce the dimensionality of the various matrices involved in the analysis.

Overlapping sources

In case of overlapping sources, we can just model the predicted flux as the sum of various terms, one for each source. In practice, this can be carried out by defining a global response matrix $M$ composed of the individual response matrices $\{ M^{(s)} \}$ for the various sources:

\[M = \begin{bmatrix} M^{(1)} & M^{(2)} & \dots & M^{(S)} \end{bmatrix} \; ,\]

and, analogously, for $c$

\[c = \begin{bmatrix} c^{(1)} \\ c^{(2)} \\ \vdots \\ c^{(S)} \end{bmatrix} \; .\]

Note that, for overlapping sources, the various matrices $M^{(s)}$ should be computed using, as image area, the union of all image masks, and not only the mask associated with the images of source $s$. This way, the resulting $\bar M$ matrix can be written as

\[\bar M = \begin{bmatrix} M'^{(1)} & M'^{(2)} & \dots & M'^{(S)} \\ \sqrt{\lambda^{(1)}} Q^{(1)} & 0 & \dots & 0 \\ 0 & \sqrt{\lambda^{(2)}} Q^{(2)} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sqrt{\lambda^{(S)}} Q^{(S)} \end{bmatrix} \; ,\]

while the data vector is

\[v = \begin{bmatrix} \hat f' \\ \sqrt{\lambda^{(1)}} Q^{(1)} r^{(1)} \\ \sqrt{\lambda^{(2)}} Q^{(2)} r^{(2)} \\ \vdots \\ \sqrt{\lambda^{(S)}} Q^{(S)} r^{(S)} \end{bmatrix} \; .\]

We stress that, as discussed above, the vector $\hat f$ (and, similarly, $\hat f'$) must include all pixels affected by any of the images produced by the $S$ sources; similarly, all matrices $M'^{(s)}$ are computed over all these pixels. Note how, in the formulae above, we have allowed for source-specific regularization factors $\{ \lambda^{(s)} \}$.
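
A sketch of the assembly of $\bar M$ and $v$ for $S$ overlapping sources, using scipy.sparse.bmat; the function name and argument conventions are ours.

```python
import numpy as np
from scipy import sparse

def multi_source_system(Mps, Qs, lams, rs, f_prime):
    """Assemble the block matrix \\bar M and data vector v for S sources.

    Mps     : list of whitened response matrices M'^(s), all on the union mask
    Qs      : list of regularization factors Q^(s) with R^(s) = Q^(s)T Q^(s)
    lams    : list of per-source regularization strengths lambda^(s)
    rs      : list of prior means r^(s)
    f_prime : whitened data vector on the union mask
    """
    S = len(Mps)
    blocks = [[sparse.csr_matrix(Mp) for Mp in Mps]]  # first block row
    for s in range(S):                                # block-diagonal regularizers
        row = [None] * S
        row[s] = np.sqrt(lams[s]) * sparse.csr_matrix(Qs[s])
        blocks.append(row)
    M_bar = sparse.bmat(blocks, format="csr")
    v = np.concatenate([f_prime]
                       + [np.sqrt(lams[s]) * (Qs[s] @ rs[s]) for s in range(S)])
    return M_bar, v
```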

For overlapping sources, we will need to compute the Moore–Penrose inverse of the block matrix $\bar M$. To perform this task, we can use specific algorithms that exploit its sparse block structure.