Section sampling

A complex compulsory section used to define the inference strategy. It has various keys, which are covered in the following subsections.

scheme

This compulsory key indicates the likelihood form of each source and associated image family. There are four modes, corresponding to increasing levels of analysis, in addition to a simulation mode. Note that this keyword, used inside the sampling section, indicates the default scheme: one can always change the scheme of each source individually with an analogous scheme specification in the source definition.

  • sourceplane maps the observed images onto the source plane and computes the likelihood there (by correctly propagating the measurement errors of the images, as discussed in the Source-plane likelihood page). This way the parameters of the source do not need to be included in the sampling, significantly reducing the dimensionality of the sampling space; moreover, the lens equation does not need to be inverted.

  • simplified-imageplane uses a hybrid mode: sources are still mapped to the source plane, and fiducial source parameters (position, luminosity...) are computed by taking into account the mapped uncertainties. The sources, however, are then mapped back to the image plane, where the likelihood is computed. This scheme avoids the use of explicit source parameters in the sampling, but requires the inversion of the lens equation.

  • imageplane is the most complete (and complex) scheme. The sources are fully modeled and the corresponding images are computed with an inversion of the lens equation. The associated parameter space is generally quite large, but can be handled with the use of a suitable sampler. Sources with parameters that can be automatically inferred (in particular, all linear parameters of point-like sources, such as the magnitudes) are not included as free parameters but marginalized over.

  • extended-imageplane is equivalent to the imageplane scheme, but also adds the linear parameters of point-like sources. It is generally not necessary to use this mode, unless one is interested in the specific properties of the source (such as its luminosity).

  • simulation is a special scheme that can be used to perform lensing simulations. It is equivalent to the extended-imageplane scheme. In this scheme one can avoid entering the images associated with each source, provided that a grid has been specified.

To speed up the convergence of the sampling algorithm, it might be useful, especially for the simplified-imageplane and imageplane schemes, to use the sampling.initialization parameter. Note that, in this case, algorithms that compute the (log) evidence might return wrong values.
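For instance, a minimal sampling section selecting the default scheme might read (all values are purely illustrative):

sampling:
    scheme: imageplane
    algorithm: emcee
    samples: 1000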

likelihood-options

A dictionary of various options passed to the likelihood computation. All of them, except bayesianfactor, are only used if scheme: imageplane. A sketch combining these options is given after the list.

  • bayesianfactor: if true (the default), the likelihood includes a normalizing Bayesian factor. It is good practice to leave it set to true.

  • matches: indicates the way the predicted images are matched to the observed ones. The sensible default, all, performs a multiple match of all the predicted images against all the observed ones of each source. The alternative, best, matches each observed image to the closest predicted one, with or without repetitions (see below).

  • duplicates: if true (the default), a predicted image can be matched to multiple observed ones; if false, this is not possible. Generally, it is *not* a good idea to set this to false, because in that case models predicting fewer images than the observed ones will all have a vanishing likelihood.

  • missedpredictionspenalty: if true (the default), there will be a penalty for models predicting more images than the observed ones. This penalty is based on a sound statistical argument.

  • flatsourcepriors: if false (the default), the code will include the source priors even for scheme: sourceplane. This will ensure that, under certain conditions, the results obtained in this scheme will match the ones obtained for scheme: imageplane. If true, the source-plane marginalization will assume flat improper priors for all source parameters. Note that this parameter is silently ignored for scheme: imageplane, since in this case proper priors for the source parameters are always required.

  • slope-numbercounts: the exponential slope of the source number-counts. It is defined as the exponent α of the power law dN/dm ∝ exp(α m) = 10^(α m / log(10)) where m is the source magnitude. The default value is zero. The standard Hubble power law for number counts requires α = 0.6 log(10).
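As a sketch, the options above can be combined as follows (the values shown are the defaults, except slope-numbercounts, which is set to the Hubble value as an illustration):

sampling:
    scheme: imageplane
    likelihood-options:
        bayesianfactor: true
        matches: all
        duplicates: true
        missedpredictionspenalty: true
        slope-numbercounts: 1.382    # 0.6 log(10), the Hubble value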

algorithm

A compulsory option used to specify the desired kind of inference. Can be one of the values described in the following subsections.

Depending on the chosen algorithm, the final output obtained at the end of the inference will change. All MCMC algorithms (that is, all algorithms except maximum-likelihood and maximum-a-posteriori) will return an object of kind MCMCChain.

maximum-likelihood or maximum-a-posteriori

These are the classical maximization techniques often used in frequentist analyses. Depending on the user's choice, the generated code will perform a maximization of the likelihood or of the posterior, i.e. the product of the likelihood and the prior.

The maximization is carried out using global optimization techniques, as specified by the keyword sampling.algorithm-options.method. Generally, these techniques are based on the use of a population of minimizers, whose size is controlled by the sampling.algorithm-options.points parameter.

Used sampling parameters: algorithm-options.points, iterations, samples, threads (only for Metaheuristics-based methods), timelimit.
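A hypothetical maximum-likelihood configuration might look as follows (the method and all values are illustrative):

sampling:
    algorithm: maximum-likelihood
    algorithm-options:
        method: DE
        points: 50
    timelimit: 3600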

Package requirements

The use of this option requires the installation of BlackBoxOptim.jl or Metaheuristics.jl, depending on the chosen method. This can be done with

julia> import Pkg; Pkg.add(["BlackBoxOptim", "Metaheuristics"])

mh

The program will use the classical Metropolis-Hastings algorithm to sample the posterior distribution. A critical aspect of this method is the choice of the proposal distribution. For each free parameter, the code will use a proposal based on the following rules:

  • If the section algorithm-options.proposal-distributions is present and contains a specification for the given parameter, that specification is used as a random-walk proposal: the proposed new value for the parameter will be the old value plus a random value drawn from the given distribution. For this reason, it is usually sensible to use distributions with vanishing mean in this section.

  • Otherwise, if the section initialization is present and contains a specification for the given parameter, that is used as a static proposal: the proposed new value for the parameter will be just a random value extracted from the distribution, with no influence from the current value.

  • Finally, if all previous options fail, the parameter's prior distribution is used as static proposal.

Used sampling parameters: algorithm-options.proposal-distributions, distributed, initialization, iterations, samples, thinning, threads, warmup

Package requirement

The use of this option requires the installation of AdvancedMH.jl. This can be done with

julia> import Pkg; Pkg.add("AdvancedMH")

affine-invariant

The program will use an affine-invariant MCMC algorithm to obtain samples of the posterior distribution. The method is analogous to the Python emcee library, and is based on a set of walkers whose number is determined by the sampling.algorithm-options.points parameter. The initial location of the walkers is determined by the sampling.initialization section.

Used sampling parameters: algorithm-options.points, distributed, initialization, iterations, samples, thinning, threads, warmup

Package requirement

The use of this option requires the installation of AdvancedMH.jl. This can be done with

julia> import Pkg; Pkg.add("AdvancedMH")

emcee

This sampler is a full re-implementation of the affine-invariant MCMC algorithm that offers several advantages: it is fully parallel, implements a full mixing of the chains, allows the use of distributions for the stretch parameter $a$ through the stretch-scale parameter, and supports full restarts with the extendchain command.

Used sampling parameters: algorithm-options.points, algorithm-options.stretch-scale, distributed, initialization, iterations, samples, thinning, threads, warmup
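As an illustration, an emcee configuration could be sketched as follows (values are illustrative; stretch-scale is shown with its default):

sampling:
    algorithm: emcee
    algorithm-options:
        points: 64
        stretch-scale: Exponential(5)
    samples: 2000
    warmup: 500
    thinning: 2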

ess

The program will use the Elliptical Slice Sampling algorithm to sample the posterior and generate an MCMC chain.

Used sampling parameters: distributed, initialization, iterations, samples, thinning, threads, warmup

Package requirement

The use of this option requires the installation of EllipticalSliceSampling.jl. This can be done with

julia> import Pkg; Pkg.add("EllipticalSliceSampling")

hmc

The posterior is sampled using a Hamiltonian Monte Carlo algorithm, a robust method based on Hamiltonian dynamics. This technique requires the derivatives of the posterior, which must therefore be differentiable. As a result, to avoid any possible discontinuity at the border of the support of the prior, the analysis is carried out in a transformed parameter space, called the normal parameter coordinates, chosen so that the prior distribution in that coordinate system is a standard multivariate normal.

Used sampling parameters: algorithm-options.method, algorithm-options.metric, initialization, iterations, samples, thinning, threads, warmup
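A minimal hmc sketch, using the defaults for method and metric described later on this page:

sampling:
    algorithm: hmc
    algorithm-options:
        method: NUTS
        metric: diagonal
    samples: 1000
    warmup: 200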

Package requirement

The use of this option requires the installation of AdvancedHMC.jl. This can be done with

julia> import Pkg; Pkg.add("AdvancedHMC")

dhmc

Similar to hmc, but uses a different external library (see below).

Used sampling parameters: algorithm-options.method, algorithm-options.metric, initialization, iterations, samples, thinning, threads, warmup

Package requirement

The use of this option requires the installation of DynamicHMC.jl. This can be done with

julia> import Pkg; Pkg.add("DynamicHMC")

mala

The Metropolis-adjusted Langevin MCMC algorithm is essentially a simplified version of Hamiltonian Monte Carlo, with just a single integration step. It also requires a differentiable posterior, and therefore also makes use of the normal parameter coordinates.

Used sampling parameters: algorithm-options.proposal-size, distributed, initialization, iterations, samples, thinning, threads, warmup

Package requirement

The use of this option requires the installation of AdvancedMH.jl. This can be done with

julia> import Pkg; Pkg.add("AdvancedMH")

nested-sampler

The Nested Sampling algorithm is a powerful technique to perform Bayesian inference. Compared to the other methods, it is more robust to multimodal posteriors; moreover, it allows one to estimate the Bayesian evidence.

Note that this sampler ignores the initialization section.

Used sampling parameters: algorithm-options.bounds, algorithm-options.dlogz, algorithm-options.points, algorithm-options.proposal, iterations, samples, thinning, threads, warmup

Package requirement

The use of this option requires the installation of NestedSamplers.jl. This can be done with

julia> import Pkg; Pkg.add("NestedSamplers")

pt

Uses the Pigeons module, based on a non-reversible parallel-tempering (PT) algorithm. The algorithm is highly parallel and will use threads if the configuration includes a threads specification larger than unity. Note, however, that the number of threads employed does not strictly follow the threads specification, which acts more as an upper limit. More specifically, Gravity.jl will use k copies of the parallel-tempering algorithm, with k = threads ÷ chains.

Used sampling parameters: algorithm-options.chains, algorithm-options.rounds, algorithm-options.variational-rounds, algorithm-options.variational-chains, algorithm-options.checkpoint, initialization, random-seed, threads, copies
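A sketch of a pt configuration (all values illustrative):

sampling:
    algorithm: pt
    threads: 8
    algorithm-options:
        chains: 10
        rounds: 10
        checkpoint: true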

ultranest

Similar to the nested-sampler MCMC algorithm, but based on the UltraNest technique. This option is available only if PyCall has been installed, together with a working version of Conda.

algorithm-options

This is an optional dictionary (sub-section) which can be used to set algorithm-specific parameters.

bounds

Only used for algorithm: nested-sampler: it indicates the algorithm used to bound the parameter volume. Possible values are

  • nobounds: no bounds on the prior volume (equivalent to a unit cube)

  • ellipsoid: bound using a single ellipsoid

  • multiellipsoid: bound using multiple ellipsoids in an optimal cluster (the default)

chains

For a parallel-tempering algorithm (algorithm: pt), it indicates the number of chains (temperatures) to use. Ideally, one should set this parameter to approximately 2Λ (or more), where Λ is the global communication barrier (given during the execution of @runmodel in one of the table columns). The default is chains: 10.

A posteriori, one can check the values of the global communication barrier and other parameters by investigating the member chain.info.state.shared.report.summary in the result of @runmodel.
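For instance, assuming the model has been loaded into the variable model, one could inspect the summary as follows:

julia> chain = runmodel(model)

julia> chain.info.state.shared.report.summary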

checkpoint

A boolean flag that can be used in a parallel-tempering algorithm (algorithm: pt) to save intermediate results of the sampling. This can be useful to monitor the progress of the sampling or to recover results in case of a failure. The data are saved under subdirectories located under results/all, and the latest results (for a single chain) are symlinked at results/latest. The data can be recovered with loadcheckpoint.

dlogz

Convergence criterion on the fraction of the remaining evidence for algorithm: nested-sampler. The default is 0.5.

explorer

For a parallel-tempering algorithm (algorithm: pt), it indicates the method to be used for the local exploration. Possible options are SliceSampler (the default), MALA, AutoMALA, and AAPS. Optionally, one can pass specific keyword arguments to these samplers, as in SliceSampler(n_passes=5). Check the Pigeons documentation for the list of valid keywords.

method

This field is only used for maximization algorithms (maximum-likelihood and maximum-a-posteriori) and for the Hamiltonian Monte Carlo one (hmc).

For maximization algorithms, two different libraries for optimization are available: Metaheuristics and BlackBoxOptim. The method field indicates which maximization method has to be used and, consequently, which optimization library. It can be one of the following (see also this dedicated page):

  • ECA: Evolutionary Centers Algorithm

  • DE: Differential Evolution

  • PSO: Particle Swarm Optimization

  • ABC: Artificial Bee Colony

  • GSA: Gravitational Search Algorithm

  • SA: Simulated Annealing

  • WOA: Whale Optimization Algorithm

  • MCCGA: Machine-coded Compact Genetic Algorithm

  • GA: Genetic Algorithm

  • εDE: ε Constrained Differential Evolution

  • adaptive_de_rand_1_bin: Adaptive Differential Evolution optimizer

  • adaptive_de_rand_1_bin_radiuslimited: Adaptive Differential Evolution optimizer with radius limited sampling

  • separable_nes: Separable Natural Evolution Strategies

  • xnes: Exponential Natural Evolution Strategies

  • de_rand_1_bin: Differential Evolution optimizer

  • de_rand_2_bin: Differential Evolution optimizer

  • de_rand_1_bin_radiuslimited: Differential Evolution optimizer with radius limited sampling

  • de_rand_2_bin_radiuslimited: Differential Evolution optimizer with radius limited sampling

  • random_search: Random search (for comparison)

  • generating_set_search: Compass/coordinate search

  • probabilistic_descent: Direct search through probabilistic descent

Note that Metaheuristics algorithms can be distinguished by their use of capital letters.

For Hamiltonian Monte Carlo (algorithm: hmc), this field can be either static, indicating a static (fixed) number of steps for the integrator, or NUTS, for the No-U-Turn Sampler; the default is to use NUTS. In case of a fixed number of steps, the number of steps can be specified using the sampling.algorithm-options.points key.

metric

The mass metric to use for Hamiltonian Monte Carlo (algorithm: hmc). The possible values are

  • unit: a metric proportional to the identity matrix (all masses associated with the various parameters are identical)

  • diagonal: a diagonal metric (different masses for each parameter)

  • dense: full dense symmetric metric.

The metric controls the generation of the momentum random values, as explained in Neal (2011). An optimal choice is to use a metric that reproduces the covariance of the posterior distribution of the parameters. The default is to use diagonal.

points

The number of points to use for cloud algorithms (by default, 10), or the number of steps to be used for a static Hamiltonian Monte-Carlo sampler (by default, 4 times the dimensionality of the parameter space).

proposal

Only used for algorithm: nested-sampler: it indicates the algorithm used to propose new points. The possible options are:

  • rejection: samples uniformly within the bounding volume

  • rwalk: random walks to a new point given an existing one

  • rstagger: random staggering away to a new point given an existing one

  • slice: slicing away to a new point given an existing one

  • rslice: random slicing away to a new point given an existing one

  • auto: the default; chooses the algorithm depending on the number of free parameters ndims: rejection if ndims < 10, rwalk if 10 ≤ ndims ≤ 20, and slice if ndims > 20

proposal-distributions

This is a section that can be used to specify the proposal distributions for all free parameters or for a subset of them. These distributions are used in the Metropolis-Hastings algorithm (algorithm: mh) to propose a new point of the Markov chain. The proposals are taken to be relative to the current point, as in a random walk: for this reason, it is usually sensible to use distributions with vanishing mean. Note also that this parameter is critical to ensure the convergence of the Metropolis-Hastings algorithm: too wide distributions here will almost certainly result in rejections of the proposed points, while too narrow ones will impede an effective sampling of the posterior.
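As an illustration, assuming the ± notation denotes a normal distribution (as in the initialization example later on this page), and with hypothetical parameter names and widths:

sampling:
    algorithm: mh
    algorithm-options:
        proposal-distributions:
            lenses_main_NIE_σ: 0 ± 5
            lenses_main_NIE_q: 0 ± 0.01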

proposal-size

The width of the Gaussian proposal used for the algorithm: mala. This width is expressed in normal parameter coordinates, and by default it is 0.1 in all directions. One can specify a single number indicating the width (standard deviation) for an isotropic multivariate normal proposal distribution, a vector of positive standard deviations for the various normal parameters, or a full covariance matrix.
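For example, for a hypothetical three-parameter model, one could write (assuming standard YAML vector syntax):

sampling:
    algorithm: mala
    algorithm-options:
        proposal-size: [0.1, 0.2, 0.05]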

rounds

Only used for algorithm: pt. Indicates the number of rounds to be used for the non-reversible parallel-tempering. At each round, the number of local exploration samples doubles. The default is rounds: 10, resulting in a chain containing 1024 samples.

stretch-scale

Only used for algorithm: emcee. It indicates the scale used for the stretch move in the algorithm. Can be a single positive number, or a univariate continuous distribution with positive support, in which case the scale is randomly set at each iteration using the provided distribution. The default is to use Exponential(5), which generally guarantees a good exploration of the parameter space.

variational-chains

Only used with algorithm: pt. If set to a positive number, it indicates that a stabilized variational parallel-tempering is requested, with the provided number of chains. This parameter is ignored if variational-rounds is zero. The default is zero.

variational-rounds

Only used with algorithm: pt. If set to a positive number, it triggers the use of variational parallel-tempering starting at the indicated round. This can significantly decrease the global communication barrier between the reference (prior) distribution and the target (posterior) distribution. The variational parallel-tempering should not be started too early, since this might lead to incorrect results; however, it should not be started too late either, or there will be no advantage. A good compromise is to set variational-rounds to a value around 50%-75% of rounds.

The default is 0, meaning that non-variational parallel-tempering has to be used.

warmup

The number of initial steps to discard; only used for MCMC algorithms. By default warmup = 0, so no point is discarded.

For the dhmc algorithm, this parameter takes the format a:b*c:d, where a are the initial steps, b the middle steps, c the doubling stages, and d the final steps of the adaptation phase. See the documentation of the package DynamicHMC.jl for details.
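For instance, a hypothetical dhmc specification with 100 initial steps, 25 middle steps, 5 doubling stages, and 50 final steps would read:

warmup: 100:25*5:50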

initialization

This optional parameter has a special meaning. It can be either a subsection (with a list of free parameters and associated distributions) as in

initialization:
    lenses_main_NIE_σ: 220 ± 10
    lenses_main_NIE_q: 0.73 .. 0.78

or it can be a string. If it is a subsection, the prior randomization process for the parameters listed in the subsection will use the specified distributions (note, however, that the prior probability density function will still use the original distributions). This can be useful to improve the convergence and speed of various sampling algorithms. This option has no effect for nested sampling.

Alternatively, this parameter can be a string. In this case, the string can refer to either a file with a chain resulting from a previous inference (in that case the file must have been created with savechain), or can refer to a global variable holding a chain. In both cases, random prior values will be obtained from a sampling of the chain. If necessary, the chain will be automatically converted appropriately with a call to imageplanechain, by adding data on the source parameters.

The format to use for a global variable is Module.var"name", where Module is the module holding the global variable named name. If Module is omitted, the variable will be searched in the module associated to the keyword gmodel in the call gravitymodel.
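For example, assuming a previous chain is stored in the global variable chain of the Main module, one could write:

initialization: Main.var"chain"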

iterations

The total number of iterations (steps) to compute. Note that one can specify either this parameter or the samples parameter. The two, for standard MCMC algorithms, are related:

iterations = samples $\cdot$ thinning + warmup.

The default, 100, is likely to be too small for most cases.
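As a worked example, the following (illustrative) settings imply iterations = 1000 · 5 + 500 = 5500 total steps:

sampling:
    samples: 1000
    thinning: 5
    warmup: 500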

samples

The number of samples that will be returned by standard MCMC algorithms. This parameter is related to iterations, so one should only specify one of the two. Note that, for ensemble MCMC algorithms (such as affine-invariant or emcee), the number of samples refers to a single chain: therefore, the total number of samples obtained will be samples $\cdot$ points.

thinning

For MCMC algorithms, the chain thinning, i.e. the frequency of retained samples in the final output. For example, if set to 5, only one out of every 5 samples will be retained; this generally helps make the samples independent. By default thinning: 1, so all samples are retained.

timelimit

The maximum time limit for the maximization algorithms, in seconds. If unspecified, no time limit is used.

threads

The number of threads to use for the inference. This option can be used to speed up calculations by taking advantage of multicore CPUs. It can be a positive integer, or the string auto (the default) to indicate that the code should find the number of available threads and use them.

Note that sampling threads are disabled if the special parameter use_threads does not include sampling (by default it does).

To use it, Julia must have been launched using the -t or --threads option.
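For example, one can start Julia with eight threads from the command line with

julia --threads 8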

Info

The best performance of the code is obtained by following a set of simple prescriptions:

  • The number of threads used should not exceed the number of free cores, to avoid resource oversubscription.

  • The number of available cores in the system can be obtained through the command length(Sys.cpu_info()); it is also printed when executing the command runmodel.

  • When working with extended images, by default threads are also used to perform a bulk lens-mapping of the pixels of the images in the source plane (as dictated by the use_threads special parameter). Since the sampling.threads parameter controls only the threads reserved for the inference, in case of large extended images it might be wise to use just a fraction of the free threads for the inference, so that the others can be used for the extended-image mapping (xsources).

copies

This parameter is used only for the PT algorithm. It can be used to specify explicitly the number of copies of the chain used during the run. By default, it is set to auto: in this case the code will make as many copies of the chain as allowed by the current number of threads, by reserving at least one thread to each chain. In other words, with this setting the number of copies will be copies = threads ÷ chains.

In some cases, one might want to use a smaller or a larger number of copies. For example, if one desires to reserve threads for other purposes (grids, extended images...), as indicated by the use_threads special parameter, it might be a good idea to set copies: 1.

distributed

The number of distributed processes to use for the inference. This option can be used to speed up calculations by taking advantage of multicore CPUs or of distributed resources. To use it, Julia must have been launched using the -p or --procs option. Currently, this option is available only for the emcee and mala algorithms.

Remembering that in parallel execution one needs to share the loaded code among all Julia processes, one can use the following code to run a distributed version of Gravity.

julia> @everywhere import Pkg

julia> @everywhere Pkg.activate(".")

julia> @everywhere using Gravity

julia> @everywhere model = gravitymodel("model.yml")

julia> runmodel(model)

The code above assumes that the Julia REPL has been launched with the -p or --procs flags. Alternatively, one can launch a standard Julia REPL and enter the following lines before the code above:

julia> using Distributed

julia> addprocs(4)

random-seed

If provided, the random number generator is seeded with the specified value (which must be a positive integer). This makes different code executions reproducible.
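For example:

sampling:
    random-seed: 42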