Measurement errors
Point-like sources
Image measurements are at the core of any lensing inversion. Throughout the code we assume that point-like image measurements have Gaussian errors: more precisely, we write the probability of observing an image at the position $\hat\theta$, given that its true position is $\theta$, as a bivariate normal distribution with a known precision matrix $\mathbf\Theta$ (the inverse of the covariance matrix: a $2 \times 2$ symmetric matrix describing the measurement errors):
\[P(\hat\theta \mid \theta) = \sqrt{\biggl| \frac{\mathbf\Theta}{2\pi} \biggr|} \exp\left[ -\frac{1}{2} (\hat\theta - \theta)^T \mathbf\Theta (\hat\theta - \theta) \right] \; .\]
We also assume that all image measurements are independent: as a result, the joint probability of observing a set of images $\{ \hat\theta_i \}$ given the corresponding predictions $\{ \theta_i \}$ is simply
\[P(\{ \hat\theta_i \} \mid \{ \theta_i \}) = \prod_{i=1}^I \sqrt{\biggl| \frac{\mathbf\Theta_i}{2\pi} \biggr|} \exp\left[ -\frac{1}{2} (\hat\theta_i - \theta_i)^T \mathbf\Theta_i (\hat\theta_i - \theta_i) \right] \; .\]
This is effectively the likelihood generally used in the code.
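As an illustration, the position log-likelihood can be evaluated in a few lines. This is a minimal sketch in Python with hypothetical names, not the package's own implementation:

```python
import numpy as np

def position_loglike(theta_obs, theta_pred, precisions):
    """Log-likelihood of observed image positions, assuming independent
    bivariate Gaussian errors with known precision matrices Theta_i."""
    total = 0.0
    for x_hat, x, P in zip(theta_obs, theta_pred, precisions):
        d = x_hat - x
        # log sqrt(|P / (2 pi)|) = 0.5 * (log|P| - 2 log(2 pi)) for a 2x2 matrix
        total += 0.5 * (np.log(np.linalg.det(P)) - 2.0 * np.log(2.0 * np.pi))
        total -= 0.5 * d @ P @ d
    return total
```

With unit precision matrices and a perfect prediction, each image contributes just the normalization term $-\log(2\pi)$.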
Luminous sources
In cases where we also measure the images' magnitudes $\{\hat m_i\}$ and their associated inverse variances $\{\lambda_i\}$, we can easily add the related constraints to the previous equations. The likelihood associated with the magnitudes is simply
\[P(\{ \hat m_i \} \mid \{ m_i \}) = \prod_{i=1}^I \sqrt{\frac{\lambda_i}{2\pi}} \exp\left[ -\frac{1}{2} \lambda_i (\hat m_i - m_i)^2 \right] \; .\]
The observed magnitudes are related to the original (unlensed) magnitude $M$ through the relation
\[m_i = M - 2.5 \log_{10} \bigl\lvert A_i^{-1} \bigr\rvert \equiv M - \mathit{LM}_i \; ,\]
where we have called $\mathit{LM}_i \equiv 2.5 \log_{10} \bigl\lvert A_i^{-1} \bigr\rvert$ the lensing modulus, the change in magnitude induced by the lensing magnification. With this definition we can write the likelihood as
\[P \bigl(\{ \hat m_i \} \mid M \bigr) = \prod_{i=1}^I \sqrt{\frac{\lambda_i}{2\pi}} \exp\left[ -\frac{1}{2} \lambda_i \bigl(\hat m_i + \mathit{LM}_i - M \bigr)^2 \right] \; .\]
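Since $M$ enters the Gaussian likelihood linearly, its maximum-likelihood value is the precision-weighted mean of the delensed magnitudes $\hat m_i + \mathit{LM}_i$. The sketch below (illustrative Python, with hypothetical names) evaluates the magnitude log-likelihood and this best-fit $M$:

```python
import numpy as np

def magnitude_loglike(m_obs, lm, lam, M):
    """Log-likelihood of observed magnitudes m_obs given the lensing
    moduli lm (the LM_i), inverse variances lam, and unlensed magnitude M."""
    r = np.asarray(m_obs) + np.asarray(lm) - M
    lam = np.asarray(lam)
    return np.sum(0.5 * np.log(lam / (2.0 * np.pi)) - 0.5 * lam * r**2)

def best_M(m_obs, lm, lam):
    """Maximum-likelihood unlensed magnitude: the precision-weighted
    mean of the delensed magnitudes m_i + LM_i."""
    lam = np.asarray(lam)
    return np.sum(lam * (np.asarray(m_obs) + np.asarray(lm))) / np.sum(lam)
```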
Elliptical sources
As a third source type, we consider sources with an elliptical profile. These sources can be characterized using the quadrupole moment of their light distribution: for a source with semi-axes $a$ and $b$ and position angle $\varphi$, measured clockwise from the vertical axis (from North to West, in the astronomical convention), the quadrupole moment is
\[Q = \begin{pmatrix} a^2 \sin^2 \varphi + b^2 \cos^2 \varphi & (a^2 - b^2) \sin \varphi \cos \varphi \\ (a^2 - b^2) \sin \varphi \cos \varphi & a^2 \cos^2 \varphi + b^2 \sin^2 \varphi \end{pmatrix} \; .\]
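The matrix above can be built directly from $\{a, b, \varphi\}$; its eigenvalues are $a^2$ and $b^2$, which provides a quick consistency check. A minimal illustrative sketch (Python, hypothetical function name):

```python
import numpy as np

def quadrupole(a, b, phi):
    """Quadrupole moment of an ellipse with semi-axes a and b and
    position angle phi, measured clockwise from the vertical axis."""
    s, c = np.sin(phi), np.cos(phi)
    return np.array([[a**2 * s**2 + b**2 * c**2, (a**2 - b**2) * s * c],
                     [(a**2 - b**2) * s * c,     a**2 * c**2 + b**2 * s**2]])
```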
Note how the quadrupole moment is a symmetric and positive-definite matrix. We take quadrupole moment measurements to be distributed according to a Wishart distribution, with probability density function given by
\[P(\hat Q \mid \mathbf{W}, \nu) = \frac{|\mathbf{W}|^{\nu/2}}{2^{\nu p/2} \Gamma_p(\nu/2)} |\hat Q|^{(\nu-p-1)/2} \exp\bigl[ - \mathrm{tr}(\mathbf{W} \hat Q) / 2\bigr] \; ,\]
where $p = 2$ is the dimensionality of the space, $\nu > p + 1$ is a real number, and $\mathbf{W}$ is a symmetric, positive definite matrix. The equation above also uses $\Gamma_p$, i.e. the multivariate Gamma function. For the specific case $p = 2$ we have
\[\Gamma_2 (\nu/2) = \pi 2^{2-\nu} \Gamma(\nu-1) \; .\]
Therefore, for $p = 2$ the probability distribution function simplifies into
\[P(\hat Q \mid \mathbf{W}, \nu) = \frac{|\mathbf{W}|^{\nu/2}}{4 \pi \Gamma(\nu-1)} |\hat Q|^{(\nu-3)/2} \exp\bigl[ - \mathrm{tr}(\mathbf{W} \hat Q) / 2 \bigr] \; .\]
The mode of this distribution is $(\nu - p - 1) \mathbf{W}^{-1} = (\nu-3) \mathbf{W}^{-1}$, while the mean is $\nu \mathbf{W}^{-1}$. As usual, we assume that all measurements are independent, so that the joint distribution for several elliptical measurements is the product of terms such as the one written above. Note that here we use the canonical parameter $\mathbf{W}$ instead of the more usual choice $\mathbf{V} \equiv \mathbf{W}^{-1}$.
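The simplified $p = 2$ density can be coded directly from the expression above; the sketch below (illustrative Python, hypothetical name) also lets us verify numerically that the density peaks at the mode $(\nu - 3)\mathbf{W}^{-1}$:

```python
import math
import numpy as np

def wishart_logpdf_2d(Q_hat, W, nu):
    """Log-density of the 2x2 Wishart distribution, parameterized by the
    canonical (inverse-scale) matrix W; valid for nu > 3."""
    logdet_W = np.log(np.linalg.det(W))
    logdet_Q = np.log(np.linalg.det(Q_hat))
    return (0.5 * nu * logdet_W
            - math.log(4.0 * math.pi) - math.lgamma(nu - 1.0)
            + 0.5 * (nu - 3.0) * logdet_Q
            - 0.5 * np.trace(W @ Q_hat))
```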
A quadrupole measurement is generally given in terms of a measured shape, often expressed through the semi-axes ($a$ and $b$) and the position angle ($\varphi$), together with their associated errors ($\sigma_a$, $\sigma_b$, $\sigma_\varphi$). A quadrupole moment $\hat Q$ can easily be built from the set $\{ a, b, \varphi \}$ using the equations above. We will see below how we model the intrinsic quadrupole of a source and its relation to the $\mathbf{W}$ matrix. Regarding $\nu$, it is easy to show that, to first order, one has
\[\frac{\sigma_a}{a} = \frac{\sigma_b}{b} = \frac{2}{\sqrt{\nu}} \; ,\]
where $\sigma_a$ and $\sigma_b$ are the measurement uncertainties on the semi-axes $a$ and $b$.
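Inverting the first-order relation above gives the $\nu$ implied by a measured relative semi-axis uncertainty. A trivial illustrative helper (hypothetical name):

```python
def nu_from_relative_error(a, sigma_a):
    """Wishart parameter nu implied by a relative semi-axis uncertainty,
    obtained by inverting sigma_a / a = 2 / sqrt(nu)."""
    return 4.0 * (a / sigma_a) ** 2
```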
Time-delays
In some cases we have at our disposal time delays for the multiple images of a point source, and we want to take advantage of this information. This is particularly useful for cosmological measurements, especially those related to the Hubble constant.
The situation here, from a statistical point of view, is virtually identical to the case of magnitude measurements. Specifically, suppose we measure the images' time delays $\{ \hat t_i\}$ and their associated precisions (inverse variances) $\{ \tau_i \}$. Note that, typically, time delays are given as differences of arrival time with respect to a reference image. In Gravity.jl we assume that every image of a given source has an associated time delay: in practice, this means that one usually sets the measured time delay of the reference image to zero with a very small error (i.e., a very large precision $\tau_i$).
The computation of time delays requires knowledge of the Fermat potential, which for many lens models is a relatively expensive operation. From a statistical point of view, however, all equations for time-delay measurements mirror the ones for magnitude measurements. The likelihood can be written as
\[P(\{ \hat t_i \} \mid \{ t_i \}) = \prod_{i=1}^I \sqrt{\frac{\tau_i}{2\pi}} \exp\left[ -\frac{1}{2} \tau_i (\hat t_i - t_i)^2 \right] \; ,\]
with the expected time delays $\{ t_i \}$ written as
\[t_i = T + T_i \; ,\]
where we have called $T_i$ the time-delay function evaluated at the $i$-th image, and $T$ is a constant offset common to all images of the source (the analogue of the unlensed magnitude $M$ above). The likelihood can then be written as
\[P \bigl(\{ \hat t_i \} \mid T \bigr) = \prod_{i=1}^I \sqrt{\frac{\tau_i}{2\pi}} \exp\left[ -\frac{1}{2} \tau_i \bigl(\hat t_i - T_i - T \bigr)^2 \right] \; .\]
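As with the magnitudes, this is a Gaussian likelihood in the common offset $T$; the reference-image convention amounts to including that image with a very large precision. A minimal illustrative sketch (Python, hypothetical names):

```python
import numpy as np

def timedelay_loglike(t_obs, t_fun, tau, T):
    """Log-likelihood of measured time delays t_obs given the time-delay
    function values t_fun (the T_i), precisions tau, and common offset T."""
    r = np.asarray(t_obs) - np.asarray(t_fun) - T
    tau = np.asarray(tau)
    return np.sum(0.5 * np.log(tau / (2.0 * np.pi)) - 0.5 * tau * r**2)
```

Here the first entry of `t_obs` would typically be the reference image, with a measured delay of zero and a very large `tau`.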
Log-likelihoods
Generally, we will be dealing with the log-likelihood, which therefore can be written as
\[\log P(\{ \hat\theta_i \} \mid \{ \theta_i \}) = \sum_{i=1}^I \biggl[ \frac{1}{2} \log \biggl| \frac{\mathbf\Theta_i}{2\pi} \biggr| - \frac{1}{2} (\hat\theta_i - \theta_i)^T \mathbf\Theta_i (\hat\theta_i - \theta_i) \biggr] \; ,\]
for the positions,
\[\log P \bigl(\{ \hat m_i \} \mid M \bigr) = \sum_{i=1}^I \biggl[ \frac{1}{2} \log \frac{\lambda_i}{2\pi} - \frac{1}{2} \lambda_i \bigl(\hat m_i + \mathit{LM}_i - M \bigr)^2 \biggr] \; ,\]
for the magnitudes, and
\[\log P \bigl(\{ \hat Q_i \} \mid \{ \mathbf{W}_i \}, \{ \nu_i \} \bigr) = \sum_{i=1}^I \biggl[ \frac{\nu_i}{2} \log |\mathbf{W}_i| - \log \bigl[ 4 \pi \Gamma(\nu_i - 1) \bigr] + \frac{\nu_i - 3}{2} \log |\hat Q_i| - \frac{1}{2} \mathrm{tr} (\mathbf{W}_i \hat Q_i) \biggr] \; ,\]
for the quadrupole moments.
In a maximum-likelihood approach, one can in many cases safely ignore the normalizing constants in the expressions above (in all cases, the first two terms inside the summations), since their values typically depend only on the measurement errors associated with each observed image, and not on the predicted images. In a Bayesian approach, instead, the normalizing constants can be relevant, especially when a marginalization over the source parameters is performed.
When dealing with these expressions, we need to consider several issues:
How are the predicted images $\{ \theta_i \}$ computed?
How do we associate an observed image to the corresponding predicted one?
What do we do if the number of predicted images differs from the number of observed ones?
We will consider all these issues below.