--- title: "Chapter 07: Kernels --- Writing and Using OpenCL Kernel Files" author: "Kjell Nygren" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Chapter 07: Kernels --- Writing and Using OpenCL Kernel Files} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` ## What is a kernel? An OpenCL **kernel** is the entry point that executes on the GPU. It is declared with the `__kernel` qualifier and is compiled at runtime (by the OpenCL driver) from a source string assembled by the host program. Each invocation of the kernel processes one work-item --- typically one element of an output array --- in parallel with all other work-items. In `nmathopencl`, kernels live in `inst/cl/src/`. Each file defines exactly one `__kernel void` function. The file name follows the pattern `_kernel.cl`; for example, `dnorm_kernel.cl` implements the vectorized density computation for the normal distribution. ## Anatomy of a simple kernel ```c // dnorm_kernel.cl // Vectorized wrapper kernel for the public Mathlib dnorm interface. #pragma OPENCL EXTENSION cl_khr_fp64 : enable __kernel void dnorm_kernel( __global const double* x, const double mu, const double sigma, const int give_log, __global double* out, const int n ) { int i = get_global_id(0); if (i >= n) return; out[i] = dnorm4(x[i], mu, sigma, give_log); } ``` Key elements: | Element | Purpose | |---------|---------| | `#pragma OPENCL EXTENSION cl_khr_fp64 : enable` | Activates double-precision arithmetic (required for all `nmathopencl` kernels) | | `__kernel void` | Marks this function as a GPU entry point | | `__global const double* x` | Input buffer in device (global) memory | | `const double mu` | Scalar parameter passed by value | | `__global double* out` | Output buffer in device (global) memory | | `const int n` | Work-item count guard | | `get_global_id(0)` | Returns this work-item's index in the first dimension | | `dnorm4(...)` | Calls the nmath library function, defined in the concatenated library source | The `if (i >= n) return;` guard handles the case where the global work size is rounded up to a multiple of the local work-group size, leaving excess work-items that must not write out-of-bounds. ## Standard argument layout All kernels in `inst/cl/src/` follow a consistent argument layout designed to work with `openclPort::opencl_dbl_scalar_kernel_runner`: ``` kernel(double arg0, double arg1, ..., double argN, __global double* out, int n_out) ``` Scalar `double` parameters come first (in any order needed by the underlying nmath function), followed by the output buffer and its length. Input **vectors** are also represented as `__global const double*` and precede the scalar arguments when present. ## The `*_ex_kernel.cl` variants Some distributions expose a secondary interface. The `bessel_*_ex_kernel.cl` files, for example, provide a version that returns both the function value and a sign/exponent component simultaneously. These exist to mirror the extended interfaces in nmath (`bessel_i` vs `bessel_i` with `expo` argument) and are distinguished by the `_ex_` infix. ## Kernel file index The ~130 kernel files in `inst/cl/src/` cover every function exposed by the `nmathopencl` R API: ### Density kernels `dbeta_kernel.cl`, `dbinom_kernel.cl`, `dbinom_raw_kernel.cl`, `dcauchy_kernel.cl`, `dchisq_kernel.cl`, `dexp_kernel.cl`, `df_kernel.cl`, `dgamma_kernel.cl`, `dgeom_kernel.cl`, `dhyper_kernel.cl`, `dlnorm_kernel.cl`, `dlogis_kernel.cl`, `dnbeta_kernel.cl`, `dnbinom_kernel.cl`, `dnbinom_mu_kernel.cl`, `dnchisq_kernel.cl`, `dnf_kernel.cl`, `dnorm_kernel.cl`, `dnt_kernel.cl`, `dpois_kernel.cl`, `dpois_raw_kernel.cl`, `dsignrank_kernel.cl`, `dt_kernel.cl`, `dunif_kernel.cl`, `dweibull_kernel.cl`, `dwilcox_kernel.cl` ### CDF kernels `pbeta_kernel.cl`, `pbinom_kernel.cl`, `pcauchy_kernel.cl`, `pchisq_kernel.cl`, `pexp_kernel.cl`, `pf_kernel.cl`, `pgamma_kernel.cl`, `pgeom_kernel.cl`, `phyper_kernel.cl`, `plnorm_kernel.cl`, `plogis_kernel.cl`, `pnbeta_kernel.cl`, `pnbinom_kernel.cl`, `pnbinom_mu_kernel.cl`, `pnchisq_kernel.cl`, `pnf_kernel.cl`, `pnorm_kernel.cl`, `pnt_kernel.cl`, `ppois_kernel.cl`, `psignrank_kernel.cl`, `pt_kernel.cl`, `ptukey_kernel.cl`, `punif_kernel.cl`, `pweibull_kernel.cl`, `pwilcox_kernel.cl` ### Quantile kernels `qbeta_kernel.cl`, `qbinom_kernel.cl`, `qcauchy_kernel.cl`, `qchisq_kernel.cl`, `qexp_kernel.cl`, `qf_kernel.cl`, `qgamma_kernel.cl`, `qgeom_kernel.cl`, `qhyper_kernel.cl`, `qlnorm_kernel.cl`, `qlogis_kernel.cl`, `qnbeta_kernel.cl`, `qnbinom_kernel.cl`, `qnbinom_mu_kernel.cl`, `qnchisq_kernel.cl`, `qnf_kernel.cl`, `qnorm_kernel.cl`, `qnt_kernel.cl`, `qpois_kernel.cl`, `qsignrank_kernel.cl`, `qt_kernel.cl`, `qtukey_kernel.cl`, `qunif_kernel.cl`, `qweibull_kernel.cl`, `qwilcox_kernel.cl` ### Random-variate kernels `rbeta_kernel.cl`, `rbinom_kernel.cl`, `rcauchy_kernel.cl`, `rchisq_kernel.cl`, `rexp_kernel.cl`, `rf_kernel.cl`, `rgamma_kernel.cl`, `rgeom_kernel.cl`, `rhyper_kernel.cl`, `rlnorm_kernel.cl`, `rlogis_kernel.cl`, `rmultinom_kernel.cl`, `rnbinom_kernel.cl`, `rnbinom_mu_kernel.cl`, `rnchisq_kernel.cl`, `rnorm_kernel.cl`, `rpois_kernel.cl`, `rsignrank_kernel.cl`, `rt_kernel.cl`, `runif_kernel.cl`, `rweibull_kernel.cl`, `rwilcox_kernel.cl` ### Special function kernels `beta_special_kernel.cl`, `choose_special_kernel.cl`, `lbeta_special_kernel.cl`, `lchoose_special_kernel.cl`, `gammafn_kernel.cl`, `lgammafn_kernel.cl`, `lgammafn_sign_kernel.cl`, `lgamma1p_kernel.cl`, `digamma_kernel.cl`, `trigamma_kernel.cl`, `psigamma_kernel.cl`, `pentagamma_kernel.cl`, `tetragamma_kernel.cl`, `dpsifn_kernel.cl` ### Math support kernels `fmax2_kernel.cl`, `fmin2_kernel.cl`, `fprec_kernel.cl`, `fround_kernel.cl`, `fsign_kernel.cl`, `ftrunc_kernel.cl`, `imax2_kernel.cl`, `imin2_kernel.cl`, `sign_kernel.cl`, `r_pow_kernel.cl`, `r_pow_di_kernel.cl`, `pow1p_kernel.cl`, `log1pmx_kernel.cl`, `log1pexp_kernel.cl`, `log1mexp_kernel.cl`, `logspace_add_kernel.cl`, `logspace_sub_kernel.cl`, `logspace_sum_kernel.cl` ### RNG core kernels `norm_rand_kernel.cl`, `unif_rand_kernel.cl`, `exp_rand_kernel.cl`, `r_unif_index_kernel.cl` ### Utility kernels `r_check_user_interrupt_kernel.cl`, `r_check_stack_kernel.cl` ### Bessel kernels (standard and extended) `bessel_i_kernel.cl`, `bessel_i_ex_kernel.cl`, `bessel_j_kernel.cl`, `bessel_j_ex_kernel.cl`, `bessel_k_kernel.cl`, `bessel_k_ex_kernel.cl`, `bessel_y_kernel.cl`, `bessel_y_ex_kernel.cl` ## Program assembly Kernels are **never compiled in isolation**. The host-side runner assembles a complete OpenCL program string by concatenating: 1. The nmath library source (via `opencltools::load_kernel_library("nmath", package = "nmathopencl")`) 2. The kernel file (via `opencltools::load_kernel_source("src/_kernel.cl", package = "nmathopencl")`) This concatenated string is then passed to `clBuildProgram`. Because `opencltools::load_kernel_library()` performs topological sorting, the library functions are always defined before the kernel refers to them. Chapter 08 covers this assembly process in detail. ## Writing a custom kernel To write a kernel that calls an `nmathopencl` nmath function: 1. Identify the nmath function you need (e.g., `dgamma`). 2. Check its `@provides` tag in the corresponding `.cl` file to confirm the exact C signature. 3. Write a `__kernel void` function following the standard layout. 4. Load the nmath library subset your function needs using the `@all_depends` metadata as a guide (see Chapter 08). 5. Append your kernel source after the library source in the assembled program string. Chapter 10 works through this process end-to-end for the `glmbayes` GLM log-posterior kernels.