API Reference¶
commondiffusion¶
boundaries.h¶
Declaration of boundary condition function prototypes.
Functions

void
apply_initial_conditions
(fp_t **conc_old, const int nx, const int ny, const int nm)¶ Initialize flat composition field with fixed boundary conditions.
The boundary conditions are fixed values of \( c_{hi} \) along the lowerleft half and upperright half walls, no flux everywhere else, with an initial values of \( c_{lo} \) everywhere. These conditions represent a carburizing process, with partial exposure (rather than the entire left and right walls) to produce an inhomogeneous workload and highlight numerical errors at the boundaries.
discretization.h¶
Warning
doxygenfile: Cannot find file “discretization.h
mesh.h¶
Declaration of mesh function prototypes for diffusion benchmarks.
Functions

void
make_arrays
(fp_t ***conc_old, fp_t ***conc_new, fp_t ***conc_lap, fp_t ***mask_lap, const int nx, const int ny, const int nm)¶ Allocate 2D arrays to store scalar composition values.
Arrays are allocated as 1D arrays, then 2D pointer arrays are mapped over the top. This facilitates use of either 1D or 2D data access, depending on whether the task is spatially dependent or not.

void
free_arrays
(fp_t **conc_old, fp_t **conc_new, fp_t **conc_lap, fp_t **mask_lap)¶ Free dynamically allocated memory.

void
swap_pointers
(fp_t ***conc_old, fp_t ***conc_new)¶ Swap pointers to 2D arrays.
Rather than copy data from
fp_t**
conc_old intofp_t**
conc_new, an expensive operation, simply trade the topmost pointers. New becomes old, old becomes new, with no data lost and in almost no time.

void
swap_pointers_1D
(fp_t **conc_old, fp_t **conc_new)¶ Swap pointers to data underlying 1D arrays.
Rather than copy data from
fp_t*
conc_old[0] intofp_t*
conc_new[0], an expensive operation, simply trade the topmost pointers. New becomes old, old becomes new, with no data lost and in almost no time.
numerics.h¶
Declaration of Laplacian operator and analytical solution functions.
Defines

MAX_MASK_W
¶ Maximum width of the convolution mask (Laplacian stencil) array.

MAX_MASK_H
¶ Maximum height of the convolution mask (Laplacian stencil) array.
Functions

void
set_mask
(const fp_t dx, const fp_t dy, const int code, fp_t **mask_lap, const int nm)¶ Specify which stencil (mask) to use for the Laplacian (convolution)
The mask corresponding to the numerical code will be applied. The suggested encoding is mask width as the ones digit and value count as the tens digit, e.g. 53 specifies five_point_Laplacian_stencil(), while 93 specifies nine_point_Laplacian_stencil().
To add your own mask (stencil), add a case to this function with your chosen numerical encoding, then specify that code in the input parameters file (params.txt by default). Note that, for a Laplacian stencil, the sum of the coefficients must equal zero and nm must be an odd integer.
If your stencil is larger than \( 5\times 5\), you must increase the values defined by MAX_MASK_W and MAX_MASK_H.

void
five_point_Laplacian_stencil
(const fp_t dx, const fp_t dy, fp_t **mask_lap, const int nm)¶ Write 5point Laplacian stencil into convolution mask.
\(3\times3\) mask, 5 values, truncation error \(\mathcal{O}(\Delta x^2)\)

void
nine_point_Laplacian_stencil
(const fp_t dx, const fp_t dy, fp_t **mask_lap, const int nm)¶ Write 9point Laplacian stencil into convolution mask.
\(3\times3\) mask, 9 values, truncation error \(\mathcal{O}(\Delta x^4)\)

void
slow_nine_point_Laplacian_stencil
(const fp_t dx, const fp_t dy, fp_t **mask_lap, const int nm)¶ Write 9point Laplacian stencil into convolution mask.
\(5\times5\) mask, 9 values, truncation error \(\mathcal{O}(\Delta x^4)\)
Provided for testing and demonstration of scalability, only: as the name indicates, this 9point stencil is computationally more expensive than the \(3\times3\) version. If your code requires \(\mathcal{O}(\Delta x^4)\) accuracy, please use nine_point_Laplacian_stencil().

void
compute_convolution
(fp_t **const conc_old, fp_t **conc_lap, fp_t **const mask_lap, const int nx, const int ny, const int nm)¶ Perform the convolution of the mask matrix with the composition matrix.
If the convolution mask is the Laplacian stencil, the convolution evaluates the discrete Laplacian of the composition field. Other masks are possible, for example the Sobel filters for edge detection. This function is general purpose: as long as the dimensions nx, ny, and nm are properly specified, the convolution will be correctly computed.

void
update_composition
(fp_t **conc_old, fp_t **conc_lap, fp_t **conc_new, const int nx, const int ny, const int nm, const fp_t D, const fp_t dt)¶ Update composition field using explicit Euler discretization (forwardtime centered space)

fp_t
euclidean_distance
(const fp_t ax, const fp_t ay, const fp_t bx, const fp_t by)¶ Compute Euclidean distance between two points, a and b.

fp_t
manhattan_distance
(const fp_t ax, const fp_t ay, const fp_t bx, const fp_t by)¶ Compute Manhattan distance between two points, a and b.

fp_t
distance_point_to_segment
(const fp_t ax, const fp_t ay, const fp_t bx, const fp_t by, const fp_t px, const fp_t py)¶ Compute minimum distance from point p to a line segment bounded by points a and b.
This function computes the projection of p onto ab, limiting the projected range to [0, 1] to handle projections that fall outside of ab. Implemented after Grumdrig on Stackoverflow, https://stackoverflow.com/a/1501725.

void
analytical_value
(const fp_t x, const fp_t t, const fp_t D, fp_t *c)¶ Analytical solution of the diffusion equation for a carburizing process.
For 1D diffusion through a semiinfinite domain with initial and farfield composition \( c_{\infty} \) and boundary value \( c(x=0, t) = c_0 \) with constant diffusivity D, the solution to Fick’s second law is
\[ c(x,t) = c_0  (c_0  c_{\infty})\mathrm{erf}\left(\frac{x}{\sqrt{4Dt}}\right) \]which reduces, when \( c_{\infty} = 0 \), to\[ c(x,t) = c_0\left[1  \mathrm{erf}\left(\frac{x}{\sqrt{4Dt}}\right)\right]. \]

void
check_solution
(fp_t **conc_new, fp_t **conc_lap, const int nx, const int ny, const fp_t dx, const fp_t dy, const int nm, const fp_t elapsed, const fp_t D, fp_t *rss)¶ Compare numerical and analytical solutions of the diffusion equation.
Overwrites
conc_lap, into which the pointwise RSS is written. Normalized RSS is then computed as the sum of the pointwise values. Return
 Residual sum of squares (RSS), normalized to the domain size.
output.h¶
Declaration of output function prototypes for diffusion benchmarks.
Functions

void
param_parser
(int argc, char *argv[], int *bx, int *by, int *checks, int *code, fp_t *D, fp_t *dx, fp_t *dy, fp_t *linStab, int *nm, int *nx, int *ny, int *steps)¶ Read parameters from file specified on the command line.

void
print_progress
(const int step, const int steps)¶ Prints timestamps and a 20point progress bar to stdout.
Call inside the timestepping loop, near the top, e.g.
for (int step=0; step<steps; step++) { print_progress(step, steps); take_a_step(); elapsed += dt; }
timer.h¶
Declaration of timer function prototypes for diffusion benchmarks.
type.h¶
Definition of scalar data type and Doxygen diffusion group.
Typedefs

typedef double
fp_t
¶ Specify the basic data type to achieve the desired accuracy in floatingpoint arithmetic: float for singleprecision, double for doubleprecision. This choice propagates throughout the code, and may significantly affect runtime on GPU hardware.

struct
Stopwatch
¶  #include <type.h>
Container for timing data
gpucudadiffusion¶
cuda_kernels.cuh¶
Declaration of functions to execute on the GPU (CUDA kernels)
Functions

void
boundary_kernel
(fp_t *conc, const int nx, const int ny, const int nm)¶ Boundary condition kernel for execution on the GPU.
This function accesses 1D data rather than the 2D array representation of the scalar composition field
Boundary condition kernel for execution on the GPU.
Boundary condition kernel for execution on the GPU
This function accesses 1D data rather than the 2D array representation of the scalar composition field

void
convolution_kernel
(fp_t *conc_old, fp_t *conc_lap, const int nx, const int ny, const int nm)¶ Tiled convolution algorithm for execution on the GPU.
This function accesses 1D data rather than the 2D array representation of the scalar composition field, mapping into 2D tiles on the GPU with halo cells before computing the convolution.
Note:
 The source matrix (conc_old) and destination matrix (conc_lap) must be identical in size
 One CUDA core operates on one array index: there is no nested loop over matrix elements
 The halo (nm/2 perimeter cells) in conc_lap are unallocated garbage
 The same cells in conc_old are boundary values, and contribute to the convolution
 conc_tile is the shared tile of input data, accessible by all threads in this block

void
diffusion_kernel
(fp_t *conc_old, fp_t *conc_new, fp_t *conc_lap, const int nx, const int ny, const int nm, const fp_t D, const fp_t dt)¶ Vector addition algorithm for execution on the GPU.
This function accesses 1D data rather than the 2D array representation of the scalar composition field. Memory allocation, data transfer, and array release are handled in cuda_init(), with arrays on the host and device managed through CudaData, which is a struct passed by reference into the function. In this way, device kernels can be called in isolation without incurring the cost of data transfers and with reduced risk of memory leaks.
gpuopencldiffusion¶
opencl_data.h¶
Declaration of OpenCL data container.
Functions

void
report_error
(cl_int error, const char *message)¶ Report error code when status is not
CL_SUCCESS
.Refer to https://streamhpc.com/blog/20130428/openclerrorcodes/ for help interpreting error codes.

void
build_program
(const char *filename, cl_context *context, cl_device_id *gpu, cl_program *program, cl_int *status)¶ Build kernel program from text input.
Source follows the OpenCL Programming Book, https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/callingthekernel/

void
init_opencl
(fp_t **conc_old, fp_t **mask_lap, const int nx, const int ny, const int nm, struct OpenCLData *dev)¶ Initialize OpenCL device memory before marching.

void
device_boundaries
(struct OpenCLData *dev, const int flip, const int nx, const int ny, const int nm, const int bx, const int by)¶ Apply boundary conditions on OpenCL device.

void
device_convolution
(struct OpenCLData *dev, const int flip, const int nx, const int ny, const int nm, const int bx, const int by)¶ Compute convolution on OpenCL device.

void
device_diffusion
(struct OpenCLData *dev, const int flip, const int nx, const int ny, const int nm, const int bx, const int by, const fp_t D, const fp_t dt)¶ Solve diffusion equation on OpenCL device.

void
read_out_result
(struct OpenCLData *dev, const int flip, fp_t **conc_new, const int nx, const int ny)¶ Copy data out of OpenCL device.

void
free_opencl
(struct OpenCLData *dev)¶ Free OpenCL device memory after marching.

struct
OpenCLData
¶  #include <opencl_data.h>
Container for GPU array pointers and parameters.
From the OpenCL v1.2 spec:
 A Context is the environment within which the kernels execute and the domain in which synchronization and memory management is defined. The context includes a set of devices, the memory accessible to those devices, the corresponding memory properties and one or more commandqueues used to schedule execution of a kernel(s) or operations on memory objects.
 A Program Object encapsulates the following information:
 A reference to an associated context.
 A program source or binary.
 The latest successfully built program executable, the list of devices for which the program executable is built, the build options used and a build log.
 The number of kernel objects currently attached.
 A Kernel Object encapsulates a specific
__kernel
function declared in a program and the argument values to be used when executing this__kernel
function.
Public Members

cl_context
context
¶ OpenCL interface to the GPU, hardware and software

cl_mem
conc_old
¶ Copy of old composition field on the GPU

cl_mem
conc_new
¶ Copy of new composition field on the GPU

cl_mem
conc_lap
¶ Copy of Laplacian field on the GPU

cl_mem
mask
¶ Copy of Laplacian mask on the GPU

cl_program
boundary_program
¶ Boundary program source for JIT compilation on the GPU

cl_program
convolution_program
¶ Convolution program source for JIT compilation on the GPU

cl_program
diffusion_program
¶ Timestepping program source for JIT compilation on the GPU

cl_kernel
boundary_kernel
¶ Boundary program executable for the GPU

cl_kernel
convolution_kernel
¶ Convolution program executable for the GPU

cl_kernel
diffusion_kernel
¶ Timestepping program executable for the GPU

cl_command_queue
commandQueue
¶ Queue for submitting OpenCL jobs to the GPU