--- title: "Chapter 04: The nmath OpenCL Library" author: "Kjell Nygren" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Chapter 04: The nmath OpenCL Library} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` ## What is `inst/cl/nmath/`? The `nmath/` subdirectory under `inst/cl/` contains a complete OpenCL C port of R's `nmath` (Mathlib) library --- approximately 180 `.cl` files, one per source file in R's `src/nmath/`. Together they implement the full suite of statistical distribution functions (densities, CDFs, quantiles, random-variate generators) and supporting mathematical helpers that make up R's statistical computing core. The port is designed so that any downstream package can load a **subset** of this library --- just the functions it actually needs --- without pulling in the entire collection. The annotation scheme and dependency resolver (described in Chapter 08) make this automatic. ## Annotation scheme Every `.cl` file in `nmath/` begins with a standard metadata block: ```c // @source_type: c // @source_origin: dnorm.c // @includes: nmath.h, dpq.h // @depends: nmath, dpq // @provides: dnorm4 // @all_depends_count: 3 // @all_depends: dpq, Rmath, nmath // @load_order: 36 ``` | Tag | Meaning | |-----|---------| | `@source_type` | Always `c` for nmath ports | | `@source_origin` | The original C source file in R's `src/nmath/` | | `@includes` | Headers included in the original C file | | `@depends` | **Direct** file dependencies (`.cl` filenames without extension) | | `@provides` | Functions defined in this file | | `@all_depends` | Full transitive closure of dependencies (union over the DAG) | | `@all_depends_count` | Number of entries in `@all_depends` | | `@load_order` | Pre-computed position in the topologically sorted order | The `@depends` tag is what `opencltools::load_kernel_library()` uses to sort files. The `@all_depends` tag is informational --- it documents the complete dependency set so that a developer extracting a minimal subset knows exactly which files to include. ## File families The ~180 files are organized by function family. The table below shows representative entries for the main families. ### Infrastructure / header files | File | Provides | |------|----------| | `nmath.cl` | Library header: constants, macros, error codes | | `Rmath.cl` | Forward declarations for all Mathlib functions | | `dpq.cl` | `R_D__0`, `R_D__1`, `R_DT_0`, `R_DT_1`, `R_D_Lval`, ... (log-prob helpers) | | `refactored.cl` | Forward declarations for cycle-broken functions | ### Core math helpers | File | Provides | |------|----------| | `chebyshev.cl` | `chebyshev_init`, `chebyshev_eval` | | `cospi.cl` | `cospi`, `sinpi`, `tanpi` | | `fmax2.cl` | `fmax2` | | `fmin2.cl` | `fmin2` | | `gammalims.cl` | `gammalims` | | `lgammacor.cl` | `lgammacor` | | `lgamma.cl` | `lgammafn`, `lgammafn_sign` | | `gamma.cl` | `gammafn` | | `stirlerr.cl` | `stirlerr` | | `bd0.cl` | `bd0`, `ebd0` | | `pgamma_utils.cl` | `log1pmx`, `lgamma1p` | | `mlutils.cl` | `log1pexp`, `log1mexp`, `logspace_add`, `logspace_sub` | | `expm1.cl` | `expm1` (if not an OpenCL built-in) | | `log1p.cl` | `log1p` (if not an OpenCL built-in) | ### Density functions (`d*`) | File | Provides | |------|----------| | `dnorm.cl` | `dnorm4` | | `dgamma.cl` | `dgamma` | | `dbinom.cl` | `dbinom_raw`, `dbinom` | | `dpois.cl` | `dpois_raw`, `dpois` | | `dbeta.cl` | `dbeta` | | `dt.cl` | `dt` | | `df.cl` | `df` | | `dnbinom.cl` | `dnbinom`, `dnbinom_mu` | | `dchisq.cl` | `dchisq` | | `dexp.cl` | `dexp` | | `dweibull.cl` | `dweibull` | | `dlnorm.cl` | `dlnorm` | | `dlogis.cl` | `dlogis` | | `dcauchy.cl` | `dcauchy` | | `dunif.cl` | `dunif` | | `dhyper.cl` | `dhyper` | | `dgeom.cl` | `dgeom` | | `dnt.cl` | `dnt` (noncentral t) | | `dnf.cl` | `dnf` (noncentral F) | | `dnchisq.cl` | `dnchisq` (noncentral chi-sq) | | `dnbeta.cl` | `dnbeta` (noncentral beta) | ### CDF functions (`p*`) Similar coverage for `pnorm.cl`, `pgamma.cl`, `pbinom.cl`, `ppois.cl`, `pbeta.cl`, `pt.cl`, `pf.cl`, etc. Each file ports the corresponding `p.c` from R's source. ### Quantile functions (`q*`) `qnorm.cl`, `qgamma.cl`, `qbinom.cl`, `qpois.cl`, `qbeta.cl`, `qt.cl`, `qf.cl`, etc. Discrete quantile functions use a shared binary-search helper in `qDiscrete_search.cl`. ### Random-variate generators (`r*`) `rnorm.cl`, `rgamma.cl`, `rbinom.cl`, `rpois.cl`, `rbeta.cl`, `rt.cl`, `runif.cl`, etc. These files define the device-side generator logic; the actual RNG seed and state management is handled by the host-side kernel runner. ### Special functions | File | Provides | |------|----------| | `beta.cl` | `beta` (beta function) | | `lbeta.cl` | `lbeta` | | `choose.cl` | `choose`, `lchoose` | | `polygamma.cl` | `psigamma`, `digamma`, `trigamma`, ... | | `toms708.cl` | Incomplete beta (TOMS 708 algorithm) | | `bessel.cl` | `bessel_i`, `bessel_j`, `bessel_k`, `bessel_y` | | `signrank.cl` | `dsignrank`, `psignrank`, `qsignrank`, `rsignrank` | | `wilcox.cl` | `dwilcox`, `pwilcox`, `qwilcox`, `rwilcox` | ### Additional utilities | File | Provides | |------|----------| | `fprec.cl`, `fround.cl`, `ftrunc.cl`, `fsign.cl` | Rounding/truncation | | `imax2.cl`, `imin2.cl` | Integer min/max | | `sign.cl` | `sign` | | `d1mach.cl`, `i1mach.cl` | Machine constants (BLAS-era helpers) | | `r_check_user_interrupt.cl` | Stub for interrupt checking | | `sexp.cl`, `snorm.cl`, `sunif.cl` | Alternative RNG implementations | ## Cycle-breaking artifacts OpenCL compiles a program as a **single translation unit**: every function must be defined before it is called (no forward-declaration headers, no separate compilation). A handful of the nmath C files are mutually recursive in ways that C handles via separate compilation but OpenCL cannot. `nmathopencl` resolves these cycles by splitting the offending source file into two `.cl` files: ``` stirlerr.c -> stirlerr_cycle_free.cl (no forward references) stirlerr_cycle_dependent.cl (calls the forward-declared part) refactored.cl (forward declarations only) ``` `refactored.cl` contains `extern`-style forward declarations so that the cycle-dependent portion can reference symbols that appear later in the concatenated source. The `@depends` tags ensure the three files are loaded in the correct order: `refactored` -> `stirlerr_cycle_free` -> `stirlerr_cycle_dependent` -> `stirlerr`. Similarly, `bessel_j` and `bessel_y` are each split into `*_cycle_free.cl` and `*_cycle_dependent.cl` for the same reason. ## The `ex_glmbayes_nmath/` subset Because the full `nmath/` library is large (~180 files), the package also ships a pre-curated minimal subset in `inst/cl/ex_glmbayes_nmath/` --- exactly the 20 files needed by the six `glmbayes` kernels: ``` Rmath.cl, nmath.cl, refactored.cl, chebyshev.cl, cospi.cl, fmax2.cl, gammalims.cl, lgammacor.cl, gamma.cl, lgamma.cl, stirlerr_cycle_free.cl, pgamma_utils.cl, stirlerr_cycle_dependent.cl, stirlerr.cl, bd0.cl, dpois.cl, dgamma.cl, dnorm.cl, pnorm.cl, dbinom.cl ``` This subset is determined by the `@all_depends` metadata in the six kernel files. Chapter 10 shows how to perform this extraction for a new downstream package. ## Porting fidelity The port aims for bit-for-bit equivalence with the CPU nmath results on IEEE 754 hardware. The main sources of deviation are: 1. **Compiler flags**: the OpenCL compiler may reorder floating-point operations unless `-cl-fp64-round-to-nearest` is specified. The `OpenCLConfig` struct in `openclPort.h` probes the device and sets appropriate build flags. 2. **Built-in precision**: some OpenCL devices implement `sqrt`, `exp`, and `log` with 1 ULP error (IEEE 754 guarantees exactly rounded); others allow slightly more. In practice, deviations are below numerical tolerance for all statistical applications. 3. **OpenCL built-ins**: functions like `lgamma`, `tgamma`, `expm1`, and `log1p` are part of the OpenCL 1.1+ specification and do **not** need to be ported. The corresponding `.cl` files include `#ifndef` guards so that the ported version is only compiled if the built-in is absent.