Measurement errors

Point-like sources

Image measurements are at the core of any lensing inversion. In the code we assume everywhere that point-like image measurements have Gaussian errors: more precisely, we write the probability to observe an image at the position $\hat\theta$ given that its true position is $\theta$ as a bi-variate normal distribution with a given known precision matrix $\mathbf\Theta$ (this matrix is the inverse of the covariance matrix, a $2 \times 2$ symmetric matrix representing the measurement errors):

\[P(\hat\theta \mid \theta) = \sqrt{\biggl| \frac{\mathbf\Theta}{2\pi} \biggr|} \exp\left[ -\frac{1}{2} (\hat\theta - \theta)^T \mathbf\Theta (\hat\theta - \theta) \right] \; .\]

We also assume that all image measurements are independent: as a result, when computing the joined probability to observe a set of images $\{ \hat\theta_i \}$ given the corresponding predictions $\{ \theta_i \}$, we can just write

\[P(\{ \hat\theta_i \} \mid \{ \theta_i \}) = \prod_{i=1}^I \sqrt{\biggl| \frac{\mathbf\Theta_i}{2\pi} \biggr|} \exp\left[ -\frac{1}{2} (\hat\theta_i - \theta_i)^T \mathbf\Theta_i (\hat\theta_i - \theta_i) \right] \; .\]

This is effectively the likelihood generally used in the code.

Luminous sources

In case where we also measure the images' magnitudes $\{\hat m_i\}$ and their associated inverse variances $\{\lambda_i\}$ we can easily add the related constraints to the previous equations. The likelihood associated to the magnitudes is simply

\[P(\{ \hat m_i \} \mid \{ m_i \}) = \prod_{i=1}^I \sqrt{\frac{\lambda_i}{2\pi}} \exp\left[ -\frac{1}{2} \lambda_i (\hat m_i - m_i) \right] \; .\]

The observed magnitudes are related to the original (unlensed) one $M$ through the relation

\[m_i = M - 2.5 \log_{10} \bigl\lvert A_i^{-1} \bigr\rvert \equiv M - \mathit{LM}_i \; ,\]

where we have called $\mathit{LM}_i \equiv 2.5 \log_{10} \bigl\lvert A_i^{-1} \bigr\rvert$ the lensing modulus, a quantity that indicates to the change in magnitude associated to the lensing magnification. With this definition we can write the likelihood as

\[P \bigl(\{ \hat m_i \} \mid M \bigr) = \prod_{i=1}^I \sqrt{\frac{\lambda_i}{2\pi}} \exp\left[ -\frac{1}{2} \lambda_i \bigl(\hat m_i + \mathit{LM}_i - M \bigr)^2 \right] \; .\]

Elliptical sources

As a third source type, we consider sources with an elliptical profile. These sources can be characterized using the quadrupole moment of their light distribution: for a source with semi-axes $a$ and $b$ and position angle $\varphi$ counted clockwise from the top (North to West in astronomical sense), the quadrupole moment is

\[Q = \begin{pmatrix} a^2 \sin^2 \varphi + b^2 \cos^2 \varphi & (a^2 - b^2) \sin \varphi \cos \varphi \\ (a^2 - b^2) \sin \varphi \cos \varphi & a^2 \cos^2 \varphi + b^2 \sin^2 \varphi \end{pmatrix}\]

Note how the quadrupole moment is a symmetric and positive-definite matrix. We take quadrupole moment measurements to be distributed according to a Wishart distribution, with probability density function given by

\[P(\hat Q \mid \mathbf{W}, \nu) = \frac{|\mathbf{W}|^{\nu/2}}{2^{\nu p/2} \Gamma_p(\nu/2)} |\hat Q|^{(\nu-p-1)/2} \exp\bigl[ - \mathrm{tr}(\mathbf{W} \hat Q) / 2\bigr] \; ,\]

where $p = 2$ is the dimensionality of the space, $\nu > p + 1$ is a real number, and $\mathbf{W}$ is a symmetric, positive definite matrix. The equation above also uses $\Gamma_p$, i.e. the multivariate Gamma function. For the specific case $p = 2$ we have

\[\Gamma_2 (\nu/2) = \pi 2^{2-\nu} \Gamma(\nu-1) \; .\]

Therefore, for $p = 2$ the probability distribution function simplifies into

\[P(\hat Q \mid \mathbf{W}, \nu) = \frac{|\mathbf{W}|^{\nu/2}}{4 \pi \Gamma(\nu-1)} |\hat Q|^{(\nu-3)/2} \exp\bigl[ - \mathrm{tr}(\mathbf{W} \hat Q) / 2 \bigr] \; .\]

The mode of this distribution is $(\nu - p - 1) \mathbf{W}^{-1} = (\nu-3) \mathbf{W}^{-1}$, while the average is $\nu \mathbf{W}^{-1}$. As usual, we will assume all measurements independent, so that the join distribution for various elliptical measurements can be obtained as a product of terms such as the one written above. Note that here we use the canonical parameter $\mathbf{W}$ instead of the more usual choice $\mathbf{V} \equiv \mathbf{W}^{-1}$.

A quadrupole measurement is generally given in terms of a measured shape, often given in terms of the semi-axes ($a$ and $b$) and of a position angle ($\theta$), together with their associated errors ($\sigma_a$, $\sigma_b$, $\sigma_\theta$). A quadrupole moment $\hat Q$ can be easily built using the set $\{ a, b, \theta \}$ from the equations above. We will see below how we model the intrinsic quadrupole of a source and its relation to the $\mathbf{W}$ matrix. Regarding $\nu$, it is easy to show that, to first order, one has

\[\frac{\sigma_a}{a} = \frac{\sigma_b}{b} = \frac{2}{\sqrt{\nu}} \; ,\]

where $\sigma_a$ and $\sigma_b$ are the measurement uncertainties on the semi-axes $a$ and $b$.

Time-measurements

In some case we have at our disposal time-of-flight measurements for the multiple images of a point source, and we want to take advantage of this information. In this section, for simplicity, we assume that each measurement is independent, much like measurements of the magnitude of point-like sources. This unrealistic assumption will be corrected in the next section.

The situation here, from a statistical point of view, is virtually identical to the case of magnitude measurements. Specifically, suppose we measure the images' time-delays $\{ \hat t_i\}$ and their associated inverse precisions $\tau_i$. Note that, typically, time-delays are given as differences of arrival-time with respect to a source. In Gravity.jl we assume that all images of a given source are given a time-delay: this, in practice, means that one usually sets the measured time delay of the reference image to zero with a very small error (i.e., a very large precision $\tau_i$).

The computation of time-delays requires the knowledge of Fermat's potential, which for many lens models is a relatively expensive operation. From a statistical point of view, however, all equations for time-delay measurements reflect the ones for magnitude measurements. The likelihood can be written as

\[P(\{ \hat t_i \} \mid \{ t_i \}) = \prod_{i=1}^I \sqrt{\frac{\tau_i}{2\pi}} \exp\left[ -\frac{1}{2} \tau_i (\hat t_i - t_i) \right] \; ,\]

with the expected time delays $\{ t_i \}$ written as

\[t_i = T + T_i \; ,\]

where we have called $T_i$ the time-delay function for the $i$-th image. The likelihood then can written as

\[P \bigl(\{ \hat t_i \} \mid T \bigr) = \prod_{i=1}^I \sqrt{\frac{\tau_i}{2\pi}} \exp\left[ -\frac{1}{2} \tau_i \bigl(\hat t_i - T_i - T \bigr)^2 \right] \; .\]

Notice that, in this formalism, we need to fix some source time-of-event T. Similarly to other source parameters, one will need to specify a (conjucate) prior for this quantity.

Time-delay measurements

Much more often, we have at our disposal relative measurements the time-of-flight. These are generally obtained by comparing the light curves of different images, and by finding appropriate time delays that alight the features observed in the various light curves. Clearly, this process introduces correlations among the time-delay measurements, which need to be properly taken into account.

In Gravity.jl, time-delay measurements are defined as the difference in the time-of-arrival of a given event in the light-curves of two sources. Specifically, we assume that the first image of an image family is taken as reference, and that the time delays are defined as

\[\delta_i = t_{i + 1} - t_1 \; ,\]

or, in matrix notation $\delta = \mathbf{M} t$, where

\[\mathbf{M} = \begin{pmatrix} -1 & 1 & 0 & \dots & 0 \\ -1 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & 0 \\ -1 & 0 & 0 & \dots & 1 \end{pmatrix} \; .\]

The errors on the measured time delays $\{ \hat \delta_i \}$ are described by a $(I - 1) \times (I - 1)$ covariance matrix $\mathbf{C}$. This way, the log-likelihood for time delays can be written as

\[\log P(\{\hat\delta_i \} \mid \{ \delta_i \}) = \frac{1}{2} \log \left| \frac{\mathbf{C}^{-1}}{2\pi} \right| - \frac{1}{2} (\hat \delta - \delta)^T \mathbf{C}^{-1} (\hat \delta - \delta) \; .\]

Optionally, one can write the same expression in terms of the times $\{ t_i \}$

\[\log P(\{\hat\delta_i \} \mid \{ \delta_i \}) = \frac{1}{2} \log \left| \frac{\mathbf{C}^{-1}}{2\pi} \right| - \frac{1}{2} (\hat t - T)^T \mathbf{P} (\hat t - T) \; ,\]

where $T = (T_1, \dots, T_I)$ is the vector of the modeled time-delays, and where the precision matrix $\mathbf{P}$ is given by

\[\mathbf{P} = \mathbf{M}^T \mathbf{C}^{-1} \mathbf{M} \; .\]

Note that $\mathbf{P}$ is singular with rank $I - 1$, so the normalization term needs to be computed in terms of $\mathbf{C}^-1$. Note also how the log-probability $\log P$ does not really include directly any source parameter, but only depends on the modeled time delays $\{ T_i \}$.

Joined time-delay and luminosity measurements

For variable sources with time-delay measurements, in principle we could also measure the relative magnifications. These measurements will be correlated among themselves; additionally, the magnification measurements will also be correlated with the time-delay measurements.

In principle, this kind of combined measurements can be described by a joint covariance matrix which will describe both the relative fluxes and the time-delay. The expressions for its use is essentially identical to the ones found in the previous section.

Log-likelihoods

Generally, we will be dealing with the log-likelihood, which therefore can be written as

\[\log P(\{ \hat\theta_i \} \mid \{ \theta_i \}) = \sum_{i=1}^I \biggl[ \frac{1}{2} \log \biggl| \frac{\mathbf\Theta_i}{2\pi} \biggr| - \frac{1}{2} (\hat\theta_i - \theta_i)^T \mathbf\Theta_i (\hat\theta_i - \theta_i) \biggr] \; ,\]

for the positions,

\[\log P \bigl(\{ \hat m_i \} \mid M \bigr) = \sum_{i=1}^I \biggl[ \frac{1}{2} \log \frac{\lambda_i}{2\pi} - \frac{1}{2} \lambda_i \bigl(\hat m_i + \mathit{LM}_i - M \bigr)^2 \biggr] \; ,\]

for the magnitudes, and

\[\log P \bigl(\{ \hat Q_i \} \mid \{ \mathbf{W}_i \}, \{ \nu_i \} \bigr) = \sum_{i=1}^I \biggl[ \frac{\nu_i}{2} \log |\mathbf{W}_i| - \log \bigl[ 4 \pi \Gamma(\nu_i - 1) \bigr] + \frac{\nu_i - 3}{2} \log |\hat Q_i| - \frac{1}{2} \mathrm{tr} (\mathbf{W}_i \hat Q_i) \biggr] \; .\]

for the quadrupole moments.

In a maximum-likelihood approach, in many cases one can safely ignore the normalizing constants in the expressions above, corresponding in all cases to the first two terms inside the summations, as their values typically depends only on the measurements errors associated to each observed image, and not on the predicted images. In a Bayesian approach, instead, the normalizing constants can be relevant, especially when a marginalization over the source parameters is performed.

When dealing with these expressions, we need to consider various issues

  1. How are the predicted images $\{ \theta_i \}$ computed?

  2. How do we associate an observed image to the corresponding predicted one?

  3. What do we do if the number of predicted images differs from the number of observed ones?

We will consider all these issues below.