GPU acceleration

If your computer includes a suitable GPU, the code can use it to speed up certain operations, most notably the computation of the lens mapping for a large number of directions (points). This is required, for example, for the inversion of the lens map when using a fine grid (see Gravity.lensunmap), or for the analysis of extended images.

Gravity.has_processing_unit — Function
has_processing_unit(Val(:something))

Return true if the requested processing unit is available.

Without extensions, this function returns true only for Val(:CPU), and false in all other cases.

Extensions should overload this function by adding definitions such as

has_processing_unit(::Val{:CUDA}) = CUDA.functional()
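
For instance, one can check which processing units are currently available (a minimal sketch; without any extension loaded, only the CPU is reported):

julia> Gravity.has_processing_unit(Val(:CPU))
true

julia> Gravity.has_processing_unit(Val(:CUDA))  # false until the CUDA extension is loaded and functional
false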

GPU acceleration is enabled by loading the associated package, as explained in the following pages.

CUDA

To use a CUDA-enabled GPU, the user should add the CUDA package to their environment. This can be done simply with the command

julia> using CUDA

which can be executed either before or after the using Gravity command. This makes various CUDA-based versions of the standard routines available. Note that after this command, if a CUDA GPU is available, one has

Gravity.has_processing_unit(Val(:CUDA)) == true

Lens mapping

To perform a batch lens mapping on the GPU for an array of points, one calls

Gravity.lensmap(lenssys, ς, θs, Val(:CUDA); kw...)

Internally, this calls GravityCUDAExt.cuda_lensmap. The argument θs can be either an array (of any dimension), or a tuple (xrange, yrange), in which case the result will be a 2D array (matrix) of points.
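
As a minimal sketch (mirroring the setup of the examples further down this page, and assuming a functional CUDA GPU), both the grid form and the array form can be used:

julia> using Gravity, CUDA

julia> ls = Gravity.convert_obj(Float32, rand(LensSystem));

julia> sc = SourceCoeff(1.2f0, ls);

julia> ys = Gravity.lensmap(ls, sc, (-8f0:0.1f0:8f0, -5f0:0.1f0:5f0), Val(:CUDA));  # grid form: returns a matrix of points

julia> θs = [SPoint(x, y) for x in -2f0:0.5f0:2f0, y in -2f0:0.5f0:2f0];

julia> ys2 = Gravity.lensmap(ls, sc, θs, Val(:CUDA));  # array form: element-wise mapping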

GravityCUDAExt.cuda_deflection! — Function
cuda_deflection!(α̂s, lens, xs; kw...)

Compute in place the deflection (i.e., the quantity α̂) for an array of points.

This function uses the CUDA GPU capabilities to compute α̂s .= deflect.(lens, xs). The same method can be applied to a LensPlane, in which case the deflection is the sum of the deflections of all lenses in the plane. Both α̂s and xs must be arrays of SPoints with identical shapes.

The keyword arguments are passed to CUDA.cufunction and can be used to control the compilation of the associated kernel cuda_deflect_kernel!. By default, they are always_inline=true and fastmath=true.

If no CUDA GPU is available, this function will fall back to Gravity.deflection!.

Info

Note that this is a flexible function: each argument can be a standard (host-based) one or a device-based one, generally obtained with cu(arr) for arrays or Ref(obj) for scalar objects. Note also that, for efficiency reasons, objects are generally converted to use Float32 quantities instead of Float64 (as cu also generally does). The function will generate the appropriate code to perform all necessary conversions (at the expense, however, of some CPU time).

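As a rough sketch (the helper below is hypothetical, and the construction of the lens object itself is not covered on this page), the in-place computation could be wrapped as:

using Gravity, CUDA

# `lens` is any single lens (or LensPlane) from your model; xs is an array of SPoints,
# e.g. cu([SPoint(x, y) for x in -2f0:0.5f0:2f0, y in -2f0:0.5f0:2f0]).
function gpu_deflections(lens, xs)
    Ext = Base.get_extension(Gravity, :GravityCUDAExt)  # access the extension module
    α̂s = similar(xs)                                     # pre-allocated output, same shape as xs
    Ext.cuda_deflection!(α̂s, lens, xs)                   # equivalent to α̂s .= deflect.(lens, xs)
    return α̂s
end
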
GravityCUDAExt.cuda_lensmap! — Function
cuda_lensmap!(ys, sys, ς, xs; kw...)
cuda_lensmap!(ys, sys, ς, grid; kw...)

Compute in place the lens mapping for an array of points.

This function uses the CUDA GPU capabilities to compute ys .= lensmap.(sys, ς, xs). Both ys and xs must be arrays of SPoints with identical shapes. Alternatively, one can pass a grid as a tuple of two ranges, as in (-8:0.1:8, -6:0.1:6); the use of a grid is generally convenient, as it avoids the transfer and conversion of a large memory chunk. In this case, of course, ys must be a two-dimensional array (i.e., a matrix).

The keyword arguments are passed to CUDA.cufunction and can be used to control the compilation of the associated kernel cuda_lensmap_kernel!. By default, they are always_inline=true and fastmath=true.

If no CUDA GPU is available, this function will fall back to Gravity.lensmap.

Info

Note that this is a flexible function: each argument can be a standard (host-based) one or a device-based one, generally obtained with cu(arr) for arrays or Ref(obj) for scalar objects. Note also that, for efficiency reasons, objects are generally converted to use Float32 quantities instead of Float64 (as cu also generally does). The function will generate the appropriate code to perform all necessary conversions (at the expense, however, of some CPU time).

Warning

Unlike lensmap!, this function does not modify the array xs, but only the array ys. Note also that the arguments passed to these two functions have different meanings.

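For instance, one can allocate the device output once with the non-mutating cuda_lensmap and then reuse it in place (a sketch; the second SourceCoeff value is purely illustrative):

julia> ls = Gravity.convert_obj(Float32, rand(LensSystem));

julia> sc = SourceCoeff(1.2f0, ls);

julia> grid = (-8f0:0.1f0:8f0, -5f0:0.1f0:5f0);

julia> ys = cuda_lensmap(ls, sc, grid);  # allocates the device output

julia> cuda_lensmap!(ys, ls, SourceCoeff(1.5f0, ls), grid);  # reuses the same buffer for other source coefficients
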
GravityCUDAExt.cuda_lensmap — Function
cuda_lensmap(sys, ς, xs; kw...)
cuda_lensmap(sys, ς, grid; kw...)

Compute the lens mapping for an array of points.

This function uses the CUDA GPU capabilities to compute lensmap.(sys, ς, xs). The parameter xs must be an array of SPoints. Alternatively, one can pass a grid as a tuple of two ranges, as in (-8:0.1:8, -6:0.1:6). The use of a grid is generally convenient, as it avoids the transfer and conversion of a large memory chunk.

Note that the result is provided as a device array and, for efficiency reasons, is not transferred to the host. If a standard array is needed, it is sufficient to call Array on the result (see the example below).

The keyword arguments are passed to CUDA.cufunction and can be used to control the compilation of the associated kernel cuda_lensmap_kernel!. By default, they are always_inline=true and fastmath=true.

If no CUDA GPU is available, this function will fall back to Gravity.lensmap.

Info

Note that this is a flexible function: each argument can be a standard (host-based) one or a device-based one, generally obtained with cu(arr) for arrays or Ref(obj) for scalar objects. Note also that, for efficiency reasons, objects are generally converted to use Float32 quantities instead of Float64 (as cu also generally does). The function will generate the appropriate code to perform all necessary conversions (at the expense, however, of some CPU time).

Example

julia> ls = Gravity.convert_obj(Float32, rand(LensSystem));

julia> sc = SourceCoeff(1.2f0, ls);

julia> Array(cuda_lensmap(ls, sc, (-8f0:0.1f0:8f0, -5f0:0.1f0:5f0)))

GravityCUDAExt.cuda_lensmap_kernel! — Function
cuda_lensmap_kernel!(ys, sys, ς, xs)
cuda_lensmap_kernel!(ys, sys, ς, grid)

CUDA kernel to compute ys .= lensmap.(sys, ς, xs).

The kernel operates independently on each point of xs, which must be an array of SPoints. The result is stored in ys. Alternatively, it is possible to pass a grid as a tuple in the format ((x₁, Δx, Nx), (y₁, Δy, Ny)), i.e. by indicating the first value, the step, and the number of points for both axes.
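
For instance, the ranges used in the examples on this page, -8f0:0.1f0:8f0 and -5f0:0.1f0:5f0, correspond to the tuple

grid = ((-8f0, 0.1f0, 161), (-5f0, 0.1f0, 101))  # (first value, step, number of points) for each axis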

Using this kernel is typically computationally efficient when xs (or, equivalently, grid) is large enough (more than ~100 points or so).


Grid inversion

Grid inversion is necessary to find a starting point for the iterative algorithms used to solve the lens equation. Internally, the lens mapping is computed at the corners of grid cells, and all polygons that intercept a given source in the source plane are identified. This task is carried out by Gravity.inversegridpoints.

When this function is called with a CuArray as its third argument (the mapped points), it dispatches to the CUDA-based version, GravityCUDAExt.cuda_inversegridpoints.

GravityCUDAExt.cuda_inversegridpoints — Function
cuda_inversegridpoints(sys, ς, y0, xs [, ys]; kw...)
cuda_inversegridpoints(sys, ς, y0, grid [, ys]; kw...)

Compute the points of a lensmap that contain y0.

This function uses the CUDA GPU capabilities to compute the approximate inverse of a lensmap, i.e. the counter-images of y0. For this purpose, it uses a regular grid (matrix) of points xs and the associated lens-mapped points ys, which, if not provided, are computed as ys = cuda_lensmap(sys, ς, xs). Alternatively, instead of xs one can pass a grid as a tuple of two ranges, as in (-8:0.1:8, -6:0.1:6). The use of a grid is generally convenient, as it avoids the transfer and conversion of a large memory chunk.

The keyword arguments are passed to CUDA.cufunction and can be used to control the compilation of the associated kernel cuda_inversegrid_kernel!. By default, they are always_inline=true and fastmath=true.

Info

Note that this is a flexible function: each argument can be a standard (host-based) one or a device-based one, generally obtained with cu(arr) for arrays or Ref(obj) for scalar objects. Note also that, for efficiency reasons, objects are generally converted to use Float32 quantities instead of Float64 (as cu also generally does). The function will generate the appropriate code to perform all necessary conversions (at the expense, however, of some CPU time).

Example

julia> ls = Gravity.convert_obj(Float32, rand(LensSystem));

julia> sc = SourceCoeff(1.2f0, ls);

julia> xs = (-8f0:0.1f0:8f0, -5f0:0.1f0:5f0);

julia> ys = cuda_lensmap(ls, sc, xs);

julia> cuda_inversegridpoints(ls, sc, SPoint(1f0, 0.5f0), xs, ys)
3-element Vector{StaticArraysCore.SVector{2, Float32}}:
 [1.3913703, -0.32566312]
 [1.3346648, -0.40849888]
 [1.7395462, 0.67141294]

GravityCUDAExt.cuda_inversegrid_kernel! — Function
cuda_inversegrid_kernel!(idx, sols, y0, xs, ys)
cuda_inversegrid_kernel!(idx, sols, y0, grid, ys)

Kernel to identify all points of a mapped grid that contain y0.

This function scans all mapped quadrilaterals of ys (which must be a matrix of SPoints) that contain y0 and returns the approximate associated x coordinates. The locations found are saved in the vector sols (which must be pre-allocated with a large enough length); the corresponding index value idx (a pointer to a UInt32) is increased accordingly.

Alternatively, this kernel also accepts a grid as a tuple in the format ((x₁, Δx, Nx), (y₁, Δy, Ny)), i.e. by indicating the first value, the step, and the number of points for both axes.


Convolutions

CUDA-based convolutions are integrated into the Gravity convolution algorithms by registering them as convolution methods in Gravity.convolution_direct_methods and Gravity.convolution_indirect_methods (a quick check is sketched after the list below). The methods implemented are

  • :cuda_direct, a CUDA implementation of the :binned convolution technique;

  • :cuda_fft, FFT based;

  • :cuda_fft2, FFT based, using power-of-two grid sizes.
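
A quick way to verify that these methods have been registered is to display the two registries after loading CUDA (a sketch; the exact printed form depends on how the registries are stored):

julia> using Gravity, CUDA

julia> Gravity.convolution_direct_methods    # should now include :cuda_direct

julia> Gravity.convolution_indirect_methods  # should now include :cuda_fft and :cuda_fft2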

GravityCUDAExt.cuda_convolve_direct! — Function
cuda_convolve_direct!(a_out, a_in, kernel)

CUDA-based direct convolution.

The convolution is performed using a standard algorithm, accelerated by CUDA. The arguments can be standard arrays, OffsetArrays, or CUDA-converted (offset) arrays.

This function also supports undersampled results, where a_out has a size smaller than the one prescribed by Gravity.convolution_output_axes. In this case, the final result is binned using Gravity.blocksum!.

See also: cuda_convolve_direct_prepare, Gravity.convolve!, Gravity.convolve_prepare

GravityCUDAExt.cuda_convolve_direct_kernel! — Function
cuda_convolve_direct_kernel!(out, A, filter, indices)

CUDA kernel for direct convolutions.

The kernel works on each point of out independently and computes the convolution of A and filter. The indices must be CartesianIndices representing the output pixels mapped onto the A array. When no binning is used, these can simply be computed as CartesianIndices(out).

This kernel requires shared memory of size length(filter) * sizeof(eltype(filter)).
