--- title: "Chapter 08: Kernel Loading --- load_kernel_source and load_kernel_library" author: "Kjell Nygren" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Chapter 08: Kernel Loading --- load_kernel_source and load_kernel_library} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` ## The assembly problem OpenCL compiles a **single string** of source code for each program. The string must be self-contained: all functions used by the kernel must be defined in the same string, in order, before they are called. There are no separate compilation units and no linker. When a kernel needs `dnorm4`, and `dnorm4` itself needs `R_D__0` and `R_finite`, those definitions must precede `dnorm4` in the string, which must precede the kernel body. **opencltools** provides two R functions that assemble and sort source components; use `package = "nmathopencl"` for this package's `inst/cl/` tree. ## `opencltools::load_kernel_source()` ```{r, eval = FALSE} opencltools::load_kernel_source(relative_path, package = "nmathopencl") ``` Reads and returns the contents of a single `.cl` file from `inst/cl/` inside the named package. The `relative_path` is relative to `inst/cl/`: ```{r, eval = FALSE} # Load the dnorm kernel entry-point file kernel_src <- opencltools::load_kernel_source("src/dnorm_kernel.cl", package = "nmathopencl") # Load a custom kernel from another package's inst/cl/ my_kernel <- opencltools::load_kernel_source("ex_glmbayes_src/f2_f3_binomial_logit.cl", package = "nmathopencl") ``` The returned character string can be concatenated with a library string and passed to the C++ runner. ## `opencltools::load_kernel_library()` ```{r, eval = FALSE} opencltools::load_kernel_library(subdir, package = "nmathopencl", verbose = FALSE) ``` Reads **all** `.cl` files from `inst/cl//`, parses the `@provides` / `@depends` annotations in each file, performs a topological sort, and returns a single concatenated source string with files in dependency-correct order. ```{r, eval = FALSE} # Load the complete nmath library (all ~180 files) nmath_src <- opencltools::load_kernel_library("nmath", package = "nmathopencl") # Load a smaller subset pre-curated for glmbayes kernels nmath_glm <- opencltools::load_kernel_library("ex_glmbayes_nmath", package = "nmathopencl") # Load with diagnostic output to trace the sort order nmath_src <- opencltools::load_kernel_library("nmath", package = "nmathopencl", verbose = TRUE) ``` ## The annotation format For `opencltools::load_kernel_library()` to sort correctly, every `.cl` file in the subdirectory must begin with an annotation block: ```c // @provides: symbol1, symbol2 // @depends: fileA, fileB // @includes: library_name ``` | Tag | Required? | Meaning | |-----|-----------|---------| | `@provides` | Yes | Comma-separated list of function (or symbol) names defined in this file | | `@depends` | Yes (may be empty) | Comma-separated list of `.cl` filenames (without extension) that must appear earlier in the output | | `@includes` | Optional | Library grouping label; used for diagnostics only | **Rules for `@depends`:** - List only **direct** dependencies, not the full transitive set. The resolver computes the transitive closure automatically. - Every non-header file must list the library header file (the file whose basename matches the subdirectory name) in `@depends`, so the header is always loaded first. - If a file has no dependencies at all (i.e., it is the library header), leave `@depends` empty or omit the tag. Example for `dgamma.cl`: ```c // @provides: dgamma // @depends: nmath, dpois, lgamma // @includes: nmath ``` This tells the resolver: emit `nmath.cl`, `dpois.cl`, and `lgamma.cl` before `dgamma.cl`. The resolver handles those files' own dependencies recursively. ## The topological sort algorithm `opencltools::load_kernel_library()` uses a straightforward iterative topological sort: 1. Parse all `.cl` files and extract their `@provides` and `@depends` tags. 2. Build a set of "unsorted" files. Initialize the "sorted" set to empty. 3. In each pass, find every unsorted file whose `@depends` entries are all already in the sorted set. Emit those files and add them to the sorted set. 4. Repeat until all files are sorted or no progress is made. 5. If no progress is made but unsorted files remain, throw: `"Dependency sort failed: unresolved dependencies remain."` Files with an empty `@depends` list are emitted first (pass 1). These are always the library header files. ### Verbose output When `verbose = TRUE`, the function prints each pass of the sort: ``` Pass 1: emitting nmath.cl, Rmath.cl, dpq.cl Pass 2: emitting chebyshev.cl, cospi.cl, fmax2.cl, gammalims.cl Pass 3: emitting lgammacor.cl Pass 4: emitting lgamma.cl, gamma.cl ... ``` This is invaluable for debugging missing or circular `@depends` tags. ## Cycle detection and breaking If file A lists file B in its `@depends` and file B lists file A, the resolver will detect this on the first pass where neither can be promoted, and throw an error. If `verbose = TRUE`, it prints the stuck files before throwing. The standard fix is to split one of the mutually dependent files into two: - A "cycle-free" fragment containing functions that make no forward call. - A "cycle-dependent" fragment containing functions that call the cycle-free fragment. - A `refactored.cl` that provides forward declarations for any symbols referenced before they are defined. Chapter 04 describes the three specific cycles in the nmath library and how they are resolved. ## Assembling a complete program A complete OpenCL program string is assembled by concatenating library and kernel sources in order: ```{r, eval = FALSE} library(nmathopencl) # 1. Load the nmath library (sorted automatically) nmath_src <- opencltools::load_kernel_library("nmath", package = "nmathopencl") # 2. Load the kernel entry-point kernel_src <- opencltools::load_kernel_source("src/dnorm_kernel.cl", package = "nmathopencl") # 3. Concatenate into a single program string program <- paste(nmath_src, kernel_src, sep = "\n") ``` This `program` string is then passed to the C++ runner (see Chapter 09). ## Using libraries from another package The `package` argument tells `opencltools::load_kernel_source()` and `opencltools::load_kernel_library()` which package's `inst/cl/` to look in: ```{r, eval = FALSE} # Load the curated nmath subset shipped inside nmathopencl # for use by a downstream package's kernels nmath_sub <- opencltools::load_kernel_library("ex_glmbayes_nmath", package = "nmathopencl") # Load a custom kernel from the same downstream package my_kernel <- opencltools::load_kernel_source("ex_glmbayes_src/f2_f3_binomial_logit.cl", package = "nmathopencl") program <- paste(nmath_sub, my_kernel, sep = "\n") ``` This is exactly the pattern used by `f2_f3_opencl()` in the `ex_glmbayes` example (see Chapter 10). ## Dependency index --- faster, kernel-specific loading `opencltools::load_kernel_library()` always sorts and loads **all** files in the subdirectory. For a large library like `nmath` (137 files) this is thorough but wasteful when only a small subset is needed for a specific kernel. `nmathopencl` therefore ships a pre-built dependency index alongside every curated library subdirectory. The index encodes the full transitive dependency map and the global load order, enabling two important optimizations: 1. **`load_library_for_kernel()` (R)** --- loads only the files required by one specific kernel, with no sort at runtime. 2. **`load_library_for_kernel()` (C++)** --- the same function implemented entirely in C++, suitable for use inside Rcpp kernel wrappers. ### Building the index --- `write_kernel_dependency_index()` ```{r, eval = FALSE} write_kernel_dependency_index(library_dir, verbose = TRUE) ``` This function analyzes a library directory, computes the full transitive dependency closure for every stem, and writes **two companion files** next to the `.cl` sources: | File | Format | Consumer | |------|--------|----------| | `kernel_dependency_index.rds` | Binary R object | R functions | | `kernel_dependency_index.tsv` | Tab-separated text | C++ code | The `.tsv` has a header row and one row per stem in global load order: ``` stem all_depends refactored Rmath refactored nmath refactored, Rmath chebyshev refactored, Rmath, nmath dbinom dpq, refactored, Rmath, nmath, stirlerr_cycle_free, ... ``` Both files are shipped with the package so end users never need to regenerate them. Run `write_kernel_dependency_index()` only after adding or removing `.cl` files from a library directory. ```{r, eval = FALSE} # Regenerate after changing the nmath library nmath_dir <- system.file("cl", "nmath", package = "nmathopencl") write_kernel_dependency_index(nmath_dir, verbose = TRUE) ``` ### Loading with the index --- `load_library_for_kernel()` (R) ```{r, eval = FALSE} load_library_for_kernel(kernel_path, library_dir, depends_tag = "all_depends", index = NULL) ``` Reads the annotation tag `@{depends_tag}` from the kernel file to get the list of needed stems, looks them up in the pre-loaded index, and returns their source concatenated in correct dependency order. Returns `""` when the kernel has no `@{depends_tag}` annotation (e.g. a kernel using only OpenCL built-ins). The `index` argument should always be supplied. Load it once and reuse it across all calls to avoid reading from disk on every invocation: ```{r, eval = FALSE} nmath_dir <- system.file("cl", "nmath", package = "nmathopencl") # Load the index once idx <- readRDS(file.path(nmath_dir, "kernel_dependency_index.rds")) # Re-use across multiple kernels --- only the subset for each kernel is read src_gaussian <- load_library_for_kernel( kernel_path = system.file("cl", "ex_glmbayes_src", "f2_f3_gaussian.cl", package = "nmathopencl"), library_dir = nmath_dir, depends_tag = "depends_nmath", index = idx ) src_gamma <- load_library_for_kernel( kernel_path = system.file("cl", "ex_glmbayes_src", "f2_f3_gamma.cl", package = "nmathopencl"), library_dir = nmath_dir, depends_tag = "depends_nmath", index = idx ) ``` The annotation tag in the kernel file must follow the same format used throughout the library: ```c // @depends_nmath: dbinom, pnorm, dnorm ``` ### Extracting a subset --- `extract_library_subset()` ```{r, eval = FALSE} extract_library_subset(kernel_paths, library_dir, dest_dir, depends_tag = "all_depends", index = NULL, overwrite = FALSE) ``` Given a set of kernel files, reads their `@{depends_tag}` annotations, takes the union of all transitive dependencies, and copies exactly those `.cl` files (plus both index files) from `library_dir` to `dest_dir` in dependency order. The destination directory must already exist. ```{r, eval = FALSE} nmath_dir <- system.file("cl", "nmath", package = "nmathopencl") src_dir <- system.file("cl", "ex_glmbayes_src", package = "nmathopencl") dest_dir <- "inst/cl/mypkg_nmath" # must exist idx <- readRDS(file.path(nmath_dir, "kernel_dependency_index.rds")) result <- extract_library_subset( kernel_paths = list.files(src_dir, pattern = "\\.cl$", full.names = TRUE), library_dir = nmath_dir, dest_dir = dest_dir, depends_tag = "depends_nmath", index = idx ) print(result) # data frame: stem, source, dest, copied ``` The result includes a row for `kernel_dependency_index.rds` and `kernel_dependency_index.tsv` so the extracted subset is immediately usable by both R and C++. ### C++ equivalent --- `openclPort::load_library_for_kernel()` For use inside Rcpp kernel wrappers (where calling back into R is not permitted), `kernel_loader.cpp` provides a C++ implementation with the same semantics: ```cpp #include "openclPort.h" // declares load_library_for_kernel // Inside a kernel wrapper, after the family/link dispatch has set kernel_file: std::string nmath_source = openclPort::load_library_for_kernel( kernel_file, // e.g. "ex_glmbayes_src/f2_f3_binomial_logit.cl" "ex_glmbayes_nmath", // library subdirectory "nmathopencl", // package "depends_nmath" // annotation tag ); ``` The C++ function reads `kernel_dependency_index.tsv` from the library directory once per call, parses the annotation tag from the kernel file, and loads only the required subset in the pre-computed order. No topological sort is performed at runtime. The `@depends_nmath` annotation must list the **direct** nmath entry-point stems called by the kernel (not the full transitive list): ```c // @depends_nmath: dbinom, pnorm, dnorm ``` The transitive expansion is performed automatically using the TSV index. ## Error messages | Error | Cause | |-------|-------| | `"OpenCL support is not available"` | Package built without `USE_OPENCL` | | `"Dependency sort failed: unresolved dependencies remain."` | A `@depends` entry names a file that does not exist in the directory, or there is a genuine circular dependency | | `"File not found: inst/cl/"` | `relative_path` does not match any file in the package |