API Reference

common-diffusion

boundaries.h

Declaration of boundary condition function prototypes.

Functions

void apply_initial_conditions(fp_t **conc_old, const int nx, const int ny, const int nm)

Initialize flat composition field with fixed boundary conditions.

The boundary conditions are fixed values of \( c_{hi} \) along the lower-left half and upper-right half walls, no flux everywhere else, with an initial values of \( c_{lo} \) everywhere. These conditions represent a carburizing process, with partial exposure (rather than the entire left and right walls) to produce an inhomogeneous workload and highlight numerical errors at the boundaries.

void apply_boundary_conditions(fp_t **conc_old, const int nx, const int ny, const int nm)

Set fixed value \( (c_{hi}) \) along left and bottom, zero-flux elsewhere.

mesh.h

Declaration of mesh function prototypes for diffusion benchmarks.

Functions

void make_arrays(fp_t ***conc_old, fp_t ***conc_new, fp_t ***conc_lap, fp_t ***mask_lap, const int nx, const int ny, const int nm)

Allocate 2D arrays to store scalar composition values.

Arrays are allocated as 1D arrays, then 2D pointer arrays are mapped over the top. This facilitates use of either 1D or 2D data access, depending on whether the task is spatially dependent or not.

void free_arrays(fp_t **conc_old, fp_t **conc_new, fp_t **conc_lap, fp_t **mask_lap)

Free dynamically allocated memory.

void swap_pointers(fp_t ***conc_old, fp_t ***conc_new)

Swap pointers to 2D arrays.

Rather than copy data from fp_t** conc_old into fp_t** conc_new, an expensive operation, simply trade the top-most pointers. New becomes old, old becomes new, with no data lost and in almost no time.

void swap_pointers_1D(fp_t **conc_old, fp_t **conc_new)

Swap pointers to data underlying 1D arrays.

Rather than copy data from fp_t* conc_old[0] into fp_t* conc_new[0], an expensive operation, simply trade the top-most pointers. New becomes old, old becomes new, with no data lost and in almost no time.

numerics.h

Declaration of Laplacian operator and analytical solution functions.

Defines

MAX_MASK_W

Maximum width of the convolution mask (Laplacian stencil) array.

MAX_MASK_H

Maximum height of the convolution mask (Laplacian stencil) array.

Functions

void set_mask(const fp_t dx, const fp_t dy, const int code, fp_t **mask_lap, const int nm)

Specify which stencil (mask) to use for the Laplacian (convolution)

The mask corresponding to the numerical code will be applied. The suggested encoding is mask width as the ones digit and value count as the tens digit, e.g. 53 specifies five_point_Laplacian_stencil(), while 93 specifies nine_point_Laplacian_stencil().

To add your own mask (stencil), add a case to this function with your chosen numerical encoding, then specify that code in the input parameters file (params.txt by default). Note that, for a Laplacian stencil, the sum of the coefficients must equal zero and nm must be an odd integer.

If your stencil is larger than \( 5\times 5\), you must increase the values defined by MAX_MASK_W and MAX_MASK_H.

void five_point_Laplacian_stencil(const fp_t dx, const fp_t dy, fp_t **mask_lap, const int nm)

Write 5-point Laplacian stencil into convolution mask.

\(3\times3\) mask, 5 values, truncation error \(\mathcal{O}(\Delta x^2)\)

void nine_point_Laplacian_stencil(const fp_t dx, const fp_t dy, fp_t **mask_lap, const int nm)

Write 9-point Laplacian stencil into convolution mask.

\(3\times3\) mask, 9 values, truncation error \(\mathcal{O}(\Delta x^4)\)

void slow_nine_point_Laplacian_stencil(const fp_t dx, const fp_t dy, fp_t **mask_lap, const int nm)

Write 9-point Laplacian stencil into convolution mask.

\(5\times5\) mask, 9 values, truncation error \(\mathcal{O}(\Delta x^4)\)

Provided for testing and demonstration of scalability, only: as the name indicates, this 9-point stencil is computationally more expensive than the \(3\times3\) version. If your code requires \(\mathcal{O}(\Delta x^4)\) accuracy, please use nine_point_Laplacian_stencil().

void compute_convolution(fp_t **const conc_old, fp_t **conc_lap, fp_t **const mask_lap, const int nx, const int ny, const int nm)

Perform the convolution of the mask matrix with the composition matrix.

If the convolution mask is the Laplacian stencil, the convolution evaluates the discrete Laplacian of the composition field. Other masks are possible, for example the Sobel filters for edge detection. This function is general purpose: as long as the dimensions nx, ny, and nm are properly specified, the convolution will be correctly computed.

void update_composition(fp_t **conc_old, fp_t **conc_lap, fp_t **conc_new, const int nx, const int ny, const int nm, const fp_t D, const fp_t dt)

Update composition field using explicit Euler discretization (forward-time centered space)

fp_t euclidean_distance(const fp_t ax, const fp_t ay, const fp_t bx, const fp_t by)

Compute Euclidean distance between two points, a and b.

fp_t manhattan_distance(const fp_t ax, const fp_t ay, const fp_t bx, const fp_t by)

Compute Manhattan distance between two points, a and b.

fp_t distance_point_to_segment(const fp_t ax, const fp_t ay, const fp_t bx, const fp_t by, const fp_t px, const fp_t py)

Compute minimum distance from point p to a line segment bounded by points a and b.

This function computes the projection of p onto ab, limiting the projected range to [0, 1] to handle projections that fall outside of ab. Implemented after Grumdrig on Stackoverflow, https://stackoverflow.com/a/1501725.

void analytical_value(const fp_t x, const fp_t t, const fp_t D, fp_t *c)

Analytical solution of the diffusion equation for a carburizing process.

For 1D diffusion through a semi-infinite domain with initial and far-field composition \( c_{\infty} \) and boundary value \( c(x=0, t) = c_0 \) with constant diffusivity D, the solution to Fick’s second law is

\[ c(x,t) = c_0 - (c_0 - c_{\infty})\mathrm{erf}\left(\frac{x}{\sqrt{4Dt}}\right) \]
which reduces, when \( c_{\infty} = 0 \), to
\[ c(x,t) = c_0\left[1 - \mathrm{erf}\left(\frac{x}{\sqrt{4Dt}}\right)\right]. \]

void check_solution(fp_t **conc_new, fp_t **conc_lap, const int nx, const int ny, const fp_t dx, const fp_t dy, const int nm, const fp_t elapsed, const fp_t D, fp_t *rss)

Compare numerical and analytical solutions of the diffusion equation.

Overwrites conc_lap, into which the point-wise RSS is written. Normalized RSS is then computed as the sum of the point-wise values.

Returns

Residual sum of squares (RSS), normalized to the domain size.

output.h

Declaration of output function prototypes for diffusion benchmarks.

Functions

void param_parser(int argc, char *argv[], int *bx, int *by, int *checks, int *code, fp_t *D, fp_t *dx, fp_t *dy, fp_t *linStab, int *nm, int *nx, int *ny, int *steps)

Read parameters from file specified on the command line.

void print_progress(const int step, const int steps)

Prints timestamps and a 20-point progress bar to stdout.

Call inside the timestepping loop, near the top, e.g.

for (int step=0; step<steps; step++) {
    print_progress(step, steps);
    take_a_step();
    elapsed += dt;
}

void write_csv(fp_t **conc, const int nx, const int ny, const fp_t dx, const fp_t dy, const int step)

Writes scalar composition field to diffusion.???????.csv.

void write_png(fp_t **conc, const int nx, const int ny, const int step)

Writes scalar composition field to diffusion.???????.png.

timer.h

Declaration of timer function prototypes for diffusion benchmarks.

Functions

void StartTimer()

Set CPU frequency and begin timing.

double GetTimer()

Return elapsed time in seconds.

type.h

Definition of scalar data type and Doxygen diffusion group.

Typedefs

typedef double fp_t

Specify the basic data type to achieve the desired accuracy in floating-point arithmetic: float for single-precision, double for double-precision. This choice propagates throughout the code, and may significantly affect runtime on GPU hardware.

struct Stopwatch
#include <type.h>

Container for timing data

Public Members

fp_t conv

Cumulative time executing compute_convolution()

fp_t step

Cumulative time executing solve_diffusion_equation()

fp_t file

Cumulative time executing write_csv() and write_png()

fp_t soln

Cumulative time executing check_solution()

gpu-cuda-diffusion

cuda_kernels.cuh

Declaration of functions to execute on the GPU (CUDA kernels)

Functions

void boundary_kernel(fp_t *conc, const int nx, const int ny, const int nm)

Boundary condition kernel for execution on the GPU.

This function accesses 1D data rather than the 2D array representation of the scalar composition field

Boundary condition kernel for execution on the GPU.

Boundary condition kernel for execution on the GPU

This function accesses 1D data rather than the 2D array representation of the scalar composition field

void convolution_kernel(fp_t *conc_old, fp_t *conc_lap, const int nx, const int ny, const int nm)

Tiled convolution algorithm for execution on the GPU.

This function accesses 1D data rather than the 2D array representation of the scalar composition field, mapping into 2D tiles on the GPU with halo cells before computing the convolution.

Note:

  • The source matrix (conc_old) and destination matrix (conc_lap) must be identical in size

  • One CUDA core operates on one array index: there is no nested loop over matrix elements

  • The halo (nm/2 perimeter cells) in conc_lap are unallocated garbage

  • The same cells in conc_old are boundary values, and contribute to the convolution

  • conc_tile is the shared tile of input data, accessible by all threads in this block

void diffusion_kernel(fp_t *conc_old, fp_t *conc_new, fp_t *conc_lap, const int nx, const int ny, const int nm, const fp_t D, const fp_t dt)

Vector addition algorithm for execution on the GPU.

This function accesses 1D data rather than the 2D array representation of the scalar composition field. Memory allocation, data transfer, and array release are handled in cuda_init(), with arrays on the host and device managed through CudaData, which is a struct passed by reference into the function. In this way, device kernels can be called in isolation without incurring the cost of data transfers and with reduced risk of memory leaks.

Variables

fp_t d_mask[5 * 5]

Convolution mask array on the GPU, allocated in protected memory.

gpu-opencl-diffusion

opencl_data.h

Declaration of OpenCL data container.

Functions

void report_error(cl_int error, const char *message)

Report error code when status is not CL_SUCCESS.

Refer to https://streamhpc.com/blog/2013-04-28/opencl-error-codes/ for help interpreting error codes.

void build_program(const char *filename, cl_context *context, cl_device_id *gpu, cl_program *program, cl_int *status)

Build kernel program from text input.

Source follows the OpenCL Programming Book, https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/calling-the-kernel/

void init_opencl(fp_t **conc_old, fp_t **mask_lap, const int nx, const int ny, const int nm, struct OpenCLData *dev)

Initialize OpenCL device memory before marching.

void device_boundaries(struct OpenCLData *dev, const int flip, const int nx, const int ny, const int nm, const int bx, const int by)

Apply boundary conditions on OpenCL device.

void device_convolution(struct OpenCLData *dev, const int flip, const int nx, const int ny, const int nm, const int bx, const int by)

Compute convolution on OpenCL device.

void device_diffusion(struct OpenCLData *dev, const int flip, const int nx, const int ny, const int nm, const int bx, const int by, const fp_t D, const fp_t dt)

Solve diffusion equation on OpenCL device.

void read_out_result(struct OpenCLData *dev, const int flip, fp_t **conc_new, const int nx, const int ny)

Copy data out of OpenCL device.

void free_opencl(struct OpenCLData *dev)

Free OpenCL device memory after marching.

struct OpenCLData
#include <opencl_data.h>

Container for GPU array pointers and parameters.

From the OpenCL v1.2 spec:

  • A Context is the environment within which the kernels execute and the domain in which synchronization and memory management is defined. The context includes a set of devices, the memory accessible to those devices, the corresponding memory properties and one or more command-queues used to schedule execution of a kernel(s) or operations on memory objects.

  • A Program Object encapsulates the following information:

    • A reference to an associated context.

    • A program source or binary.

    • The latest successfully built program executable, the list of devices for which the program executable is built, the build options used and a build log.

    • The number of kernel objects currently attached.

  • A Kernel Object encapsulates a specific __kernel function declared in a program and the argument values to be used when executing this __kernel function.

Public Members

cl_context context

OpenCL interface to the GPU, hardware and software

cl_mem conc_old

Copy of old composition field on the GPU

cl_mem conc_new

Copy of new composition field on the GPU

cl_mem conc_lap

Copy of Laplacian field on the GPU

cl_mem mask

Copy of Laplacian mask on the GPU

cl_program boundary_program

Boundary program source for JIT compilation on the GPU

cl_program convolution_program

Convolution program source for JIT compilation on the GPU

cl_program diffusion_program

Timestepping program source for JIT compilation on the GPU

cl_kernel boundary_kernel

Boundary program executable for the GPU

cl_kernel convolution_kernel

Convolution program executable for the GPU

cl_kernel diffusion_kernel

Timestepping program executable for the GPU

cl_command_queue commandQueue

Queue for submitting OpenCL jobs to the GPU

opencl_kernels.h

Warning

doxygenfile: Cannot find file “opencl_kernels.h

Looking for something specific?