GPU acceleration

If your computer includes a suitable GPU, the code can use it to speed up certain operations, most notably the computation of the lens mapping for a large number of directions (points). This is required, for example, for the inversion of the lens map when using a fine grid (see Gravity.lensunmap), or for the analysis of extended images.

Gravity.has_processing_unit — Function
has_processing_unit(Val(:something))

Return true if the requested processing unit is available.

Without extensions, this function returns true only for Val(:CPU), and false in all other cases.

Extensions should overload this function by adding definitions such as

has_processing_unit(::Val{:CUDA}) = CUDA.functional()
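
For instance, one can check which processing units are currently available (a minimal sketch; without any extension loaded, only the CPU is reported):

julia> Gravity.has_processing_unit(Val(:CPU))
true

julia> Gravity.has_processing_unit(Val(:CUDA))  # false until the CUDA extension is loaded and functional
false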

GPU acceleration is enabled by loading the associated package, as explained in the following pages.

CUDA

To use a CUDA-enabled GPU, the user should add the CUDA package to their environment. This can be done simply with the command

julia> using CUDA

which can be executed either before or after the using Gravity command. This makes various CUDA-based versions of the standard routines available. Note that after this command, if a CUDA GPU is available, one has

Gravity.has_processing_unit(Val(:CUDA)) == true

Lens mapping

To perform a batch lens mapping on the GPU for an array of points, one calls

Gravity.lensmap(lenssys, ς, θs, Val(:CUDA); kw...)

Internally, this calls GravityCUDAExt.cuda_lensmap. The argument θs can be either an array (of any dimension), or a tuple (xrange, yrange), in which case the result will be a 2D array (matrix) of points.
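
As a minimal sketch (mirroring the setup of the examples further down this page, and assuming a functional CUDA GPU), both the grid form and the array form can be used:

julia> using Gravity, CUDA

julia> ls = Gravity.convert_obj(Float32, rand(LensSystem));

julia> sc = SourceCoeff(1.2f0, ls);

julia> ys = Gravity.lensmap(ls, sc, (-8f0:0.1f0:8f0, -5f0:0.1f0:5f0), Val(:CUDA));  # grid form: returns a matrix of points

julia> θs = [SPoint(x, y) for x in -2f0:0.5f0:2f0, y in -2f0:0.5f0:2f0];

julia> ys2 = Gravity.lensmap(ls, sc, θs, Val(:CUDA));  # array form: element-wise mapping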

GravityCUDAExt.cuda_deflection! — Function
cuda_deflection!(α̂s, lens, xs; kw...)

Compute in place the deflection (i.e., the quantity α̂) for an array of points.

This function uses the CUDA GPU capabilities to compute α̂s .= deflect.(lens, xs). The same method can be applied to a LensPlane, in which case the deflection is the sum of the deflections of all lenses in the plane. Both α̂s and xs must be arrays of SPoints with identical shapes.

The keyword arguments are passed to CUDA.cufunction and can be used to control the compilation of the associated kernel cuda_deflect_kernel!. By default, they are always_inline=true and fastmath=true.

If no CUDA GPU is available, this function will fall back to Gravity.deflection!.

Info

Note that this is a flexible function: each argument can be a standard (host-based) one or a device-based one, generally obtained with cu(arr) for arrays or Ref(obj) for scalar objects. Note also that, for efficiency reasons, objects are generally converted to use Float32 quantities instead of Float64 (as cu also generally does). The function will generate the appropriate code to perform all necessary conversions (at the expense, however, of some CPU time).

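As a rough sketch (the helper below is hypothetical, and the construction of the lens object itself is not covered on this page), the in-place computation could be wrapped as:

using Gravity, CUDA

# `lens` is any single lens (or LensPlane) from your model; xs is an array of SPoints,
# e.g. cu([SPoint(x, y) for x in -2f0:0.5f0:2f0, y in -2f0:0.5f0:2f0]).
function gpu_deflections(lens, xs)
    Ext = Base.get_extension(Gravity, :GravityCUDAExt)  # access the extension module
    α̂s = similar(xs)                                     # pre-allocated output, same shape as xs
    Ext.cuda_deflection!(α̂s, lens, xs)                   # equivalent to α̂s .= deflect.(lens, xs)
    return α̂s
end
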
GravityCUDAExt.cuda_lensmap! — Function
cuda_lensmap!(ys, sys, ς, xs; kw...)
cuda_lensmap!(ys, sys, ς, grid; kw...)

Compute in place the lens mapping for an array of points.

This function uses the CUDA GPU capabilities to compute ys .= lensmap.(sys, ς, xs). Both ys and xs must be arrays of SPoints with identical shapes. Alternatively, one can pass a grid as a tuple of two ranges, as in (-8:0.1:8, -6:0.1:6); the use of a grid is generally convenient, as it avoids the transfer and conversion of a large memory chunk. In this case, of course, ys must be a two-dimensional array (i.e., a matrix).

The keyword arguments are passed to CUDA.cufunction and can be used to control the compilation of the associated kernel cuda_lensmap_kernel!. By default, they are always_inline=true and fastmath=true.

If no CUDA GPU is available, this function will fall back to Gravity.lensmap.

Info

Note that this is a flexible function: each argument can be a standard (host-based) one or a device-based one, generally obtained with cu(arr) for arrays or Ref(obj) for scalar objects. Note also that, for efficiency reasons, objects are generally converted to use Float32 quantities instead of Float64 (as cu also generally does). The function will generate the appropriate code to perform all necessary conversions (at the expense, however, of some CPU time).

Warning

Unlike lensmap!, this function does not modify the array xs, but only the array ys. Note also that the arguments passed to these two functions have different meanings.

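For instance, one can allocate the device output once with the non-mutating cuda_lensmap and then reuse it in place (a sketch; the second SourceCoeff value is purely illustrative):

julia> ls = Gravity.convert_obj(Float32, rand(LensSystem));

julia> sc = SourceCoeff(1.2f0, ls);

julia> grid = (-8f0:0.1f0:8f0, -5f0:0.1f0:5f0);

julia> ys = cuda_lensmap(ls, sc, grid);  # allocates the device output

julia> cuda_lensmap!(ys, ls, SourceCoeff(1.5f0, ls), grid);  # reuses the same buffer for other source coefficients
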
GravityCUDAExt.cuda_lensmap — Function
cuda_lensmap(sys, ς, xs; kw...)
cuda_lensmap(sys, ς, grid; kw...)

Compute the lens mapping for an array of points.

This function uses the CUDA GPU capabilities to compute lensmap.(sys, ς, xs). The parameter xs must be an array of SPoints. Alternatively, one can pass a grid as a tuple of two ranges, as in (-8:0.1:8, -6:0.1:6). The use of a grid is generally convenient, as it avoids the transfer and conversion of a large memory chunk.

Note that the result is provided as a device array and, for efficiency reasons, is not transferred to the host. If a standard array is needed, it is sufficient to call Array on the result (see the example below).

The keyword arguments are passed to CUDA.cufunction and can be used to control the compilation of the associated kernel cuda_lensmap_kernel!. By default, they are always_inline=true and fastmath=true.

If no CUDA GPU is available, this function will fall back to Gravity.lensmap.

Info

Note that this is a flexible function: each argument can be a standard (host-based) one or a device-based one, generally obtained with cu(arr) for arrays or Ref(obj) for scalar objects. Note also that, for efficiency reasons, objects are generally converted to use Float32 quantities instead of Float64 (as cu also generally does). The function will generate the appropriate code to perform all necessary conversions (at the expense, however, of some CPU time).

Example

julia> ls = Gravity.convert_obj(Float32, rand(LensSystem));

julia> sc = SourceCoeff(1.2f0, ls);

julia> Array(cuda_lensmap(ls, sc, (-8f0:0.1f0:8f0, -5f0:0.1f0:5f0)))

GravityCUDAExt.cuda_lensmap_kernel! — Function
cuda_lensmap_kernel!(ys, sys, ς, xs)
cuda_lensmap_kernel!(ys, sys, ς, grid)

CUDA kernel to compute ys .= lensmap.(sys, ς, xs).

The kernel operates independently on each point of xs, which must be an array of SPoints. The result is stored in ys. Alternatively, it is possible to pass a grid as a tuple in the format ((x₁, Δx, Nx), (y₁, Δy, Ny)), i.e. by indicating the first value, the step, and the number of points for both axes.
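
For instance, the ranges used in the examples on this page, -8f0:0.1f0:8f0 and -5f0:0.1f0:5f0, correspond to the tuple

grid = ((-8f0, 0.1f0, 161), (-5f0, 0.1f0, 101))  # (first value, step, number of points) for each axis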

Using this kernel is typically computationally efficient when xs (or, equivalently, grid) is large enough (more than ~100 points or so).


Grid inversion

Grid inversion is necessary to find a starting point for the iterative algorithms used to solve the lens equation. Internally, the lens mapping is computed at the corners of grid cells, and all polygons that intercept a given source in the source plane are identified. This task is carried out by Gravity.inversegridpoints.

When this function is called with a CuArray as its third argument (the mapped points), it dispatches to the CUDA-based version, GravityCUDAExt.cuda_inversegridpoints.

GravityCUDAExt.cuda_inversegridpoints — Function
cuda_inversegridpoints(sys, ς, y0, xs [, ys]; kw...)
cuda_inversegridpoints(sys, ς, y0, grid [, ys]; kw...)

Compute the points of a lensmap that contain y0.

This function uses the CUDA GPU capabilities to compute the approximate inverse of a lensmap, i.e. the counter-images of y0. For this purpose, it uses a regular grid (matrix) of points xs and the associated lens-mapped points ys, which, if not provided, are computed as ys = cuda_lensmap(sys, ς, xs). Alternatively, instead of xs one can pass a grid as a tuple of two ranges, as in (-8:0.1:8, -6:0.1:6). The use of a grid is generally convenient, as it avoids the transfer and conversion of a large memory chunk.

The keyword arguments are passed to CUDA.cufunction and can be used to control the compilation of the associated kernel cuda_inversegrid_kernel!. By default, they are always_inline=true and fastmath=true.

Info

Note that this is a flexible function: each argument can be a standard (host-based) one or a device-based one, generally obtained with cu(arr) for arrays or Ref(obj) for scalar objects. Note also that, for efficiency reasons, objects are generally converted to use Float32 quantities instead of Float64 (as cu also generally does). The function will generate the appropriate code to perform all necessary conversions (at the expense, however, of some CPU time).

Example

julia> ls = Gravity.convert_obj(Float32, rand(LensSystem));

julia> sc = SourceCoeff(1.2f0, ls);

julia> xs = (-8f0:0.1f0:8f0, -5f0:0.1f0:5f0);

julia> ys = cuda_lensmap(ls, sc, xs);

julia> cuda_inversegridpoints(ls, sc, SPoint(1f0, 0.5f0), xs, ys)
3-element Vector{StaticArraysCore.SVector{2, Float32}}:
 [1.3913703, -0.32566312]
 [1.3346648, -0.40849888]
 [1.7395462, 0.67141294]

GravityCUDAExt.cuda_inversegrid_kernel! — Function
cuda_inversegrid_kernel!(idx, sols, y0, xs, ys)
cuda_inversegrid_kernel!(idx, sols, y0, grid, ys)

Kernel to identify all points of a mapped grid that contain y0.

This function scans all mapped quadrilaterals of ys (which must be a matrix of SPoints) that contain y0 and returns the approximate associated x coordinates. The locations found are saved in the vector sols (which must be pre-allocated with a large enough length); the corresponding index value idx (a pointer to a UInt32) is increased accordingly.

Alternatively, this kernel also accepts a grid as a tuple in the format ((x₁, Δx, Nx), (y₁, Δy, Ny)), i.e. by indicating the first value, the step, and the number of points for both axes.


Convolutions

CUDA-based convolutions are integrated into the Gravity convolution algorithms by registering them as convolution methods in Gravity.convolution_direct_methods and Gravity.convolution_indirect_methods (a quick check is sketched after the list below). The methods implemented are

  • :cuda_direct, a CUDA implementation of the :binned convolution technique;

  • :cuda_fft, FFT based;

  • :cuda_fft2, FFT based, using power-of-two grid sizes.
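
A quick way to verify that these methods have been registered is to display the two registries after loading CUDA (a sketch; the exact printed form depends on how the registries are stored):

julia> using Gravity, CUDA

julia> Gravity.convolution_direct_methods    # should now include :cuda_direct

julia> Gravity.convolution_indirect_methods  # should now include :cuda_fft and :cuda_fft2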

GravityCUDAExt.cuda_convolve_direct! — Function
cuda_convolve_direct!(a_out, a_in, kernel)

CUDA-based direct convolution.

The convolution is performed using a standard algorithm, accelerated by CUDA. The arguments can be standard arrays, OffsetArrays, or CUDA-converted (offset) arrays.

This function also supports undersampled results, where a_out has a size smaller than the one prescribed by Gravity.convolution_output_axes. In this case, the final result is binned using Gravity.blocksum!.

See also: cuda_convolve_direct_prepare, Gravity.convolve!, Gravity.convolve_prepare

GravityCUDAExt.cuda_convolve_direct_kernel! — Function
cuda_convolve_direct_kernel!(out, A, filter, indices)

CUDA kernel for direct convolutions.

The kernel works on each point of out independently and computes the convolution of A and filter. The indices must be CartesianIndices representing the output pixels mapped onto the A array. When no binning is used, these can simply be computed as CartesianIndices(out).

This kernel requires shared memory of size length(filter) * sizeof(eltype(filter)).
