---
title: "Chapter 00: nmathopencl --- Package Overview"
author: "Kjell Nygren"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Chapter 00: nmathopencl --- Package Overview}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## What is `nmathopencl`?

`nmathopencl` is a **developer library**: it ports R's internal `nmath` (Mathlib) statistical math functions to OpenCL so that downstream R packages can embed those functions inside their own custom GPU kernels.

The primary audience is **package authors** who want GPU-accelerated computation and need statistical math functions available on the device side --- without having to port the underlying nmath sources themselves.

A secondary audience is **end users** who want to call distribution functions (`dnorm`, `pgamma`, `rbinom`, ...) directly on GPU hardware. The package exports `*_opencl` wrappers for the full nmath family, but their main role is **validation**: running them on large vectors confirms that the OpenCL pipeline and GPU hardware are working before a downstream package is built. For modest vector sizes the GPU often performs no better than the CPU, because the cost of kernel compilation and host-to-device data transfer dominates. Meaningful GPU acceleration of individual nmath calls requires very large workloads.

The real performance story is at the downstream package level. When nmath calls are embedded *inside* larger GPU kernels --- alongside other expensive device-side operations such as the gradient and envelope calculations in `glmbayes` --- the GPU does the computation without the round-trip transfer penalty, and substantial gains become possible. The design here supports that pattern; the exported `*_opencl` functions demonstrate it works.

OpenCL is vendor-neutral: the same kernels run on NVIDIA, AMD, and Intel hardware.
CPU-only execution is always supported when no OpenCL stack is present, so the package is safe to list as a dependency even in environments that lack a GPU.

## Three-layer architecture

The package is organized in three layers, each corresponding to a set of vignettes:

```
+----------------------------------------------+
| Layer 3 --- Kernels (inst/cl/src/)           |
| __kernel functions for the R-callable API    |
+----------------------------------------------+
| Layer 2 --- nmath library (inst/cl/nmath/)   |
| Ported nmath/Rmath functions as device-side  |
| OpenCL C functions                           |
+----------------------------------------------+
| Layer 1 --- Upstream shims                   |
| (inst/cl/R_shims/, R_ext/, System/, ...)     |
| Type definitions, macros, and constants      |
| that replace C headers unavailable in        |
| OpenCL C                                     |
+----------------------------------------------+
```

Layer 1 is the foundation: it makes the rest of the ported code compile under OpenCL's restricted C99 dialect without modification to the nmath sources. Layer 2 is the library: ~180 `.cl` files implementing the full suite of Mathlib functions. Layer 3 is the API surface: thin wrapper kernels that map a GPU work-item index to an element of an input vector and call the appropriate Layer 2 function.

Downstream packages locate the Layer 2 sources at runtime with `system.file("cl", package = "nmathopencl")` and assemble them into their own OpenCL programs using `opencltools::load_kernel_library(..., package = "nmathopencl")`. They own the kernel runners, R wrappers, and compilation lifecycle; `nmathopencl` simply provides the portable math library they build on.

See **Chapter 03** for the detailed assembly model, including how the four components of a complete kernel program (global configuration header, shims, nmath subset, and kernel function) are concatenated and compiled at runtime.
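
The lookup step described above can be sketched with base R alone. This is a minimal illustration, not code from the package: the helper name `list_nmath_sources()` is hypothetical, and the `nmath` subdirectory is assumed from the layer diagram. It relies only on `system.file()`, which returns an empty string when the package or directory cannot be found.

```r
# Hypothetical helper (not part of nmathopencl or opencltools):
# list the Layer 2 OpenCL sources shipped with an installed package.
list_nmath_sources <- function(package = "nmathopencl") {
  # system.file() returns "" when the package or directory is missing
  cl_root <- system.file("cl", package = package)
  if (!nzchar(cl_root)) {
    return(character(0))
  }
  # Layer 2 is assumed to live under cl/nmath/, as in the layer diagram
  list.files(file.path(cl_root, "nmath"), pattern = "\\.cl$", full.names = TRUE)
}

sources <- list_nmath_sources()
length(sources)  # 0 if nmathopencl is not installed
```

A downstream package would normally hand these paths to its kernel-assembly step rather than read the files directly; Chapter 03 describes that step.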

## C++ layout inside the package DLL

| Layer | Location | Purpose |
|-------|----------|---------|
| **`nmathopencl`** | `nmathopencl.h`, `kernel_runners.cpp`, `kernel_wrappers.cpp` | Distribution-specific kernel runners and R-facing wrappers for all nmath functions |
| **Internal OpenCL infrastructure** | `openclPort.h`, `opencl_kernel_runners.cpp` | Generic kernel runner, error helpers, device probing, and kernel loading inside the DLL --- see **Chapter 09** |
| **`ex_glmbayes`** | `ex_glmbayes_*.cpp/.h` | Self-contained example showing how a downstream package (`glmbayes`) builds custom GLM kernels on top of the layers above |

Kernel authors whose packages declare `LinkingTo: nmathopencl` may include `openclPort.h` directly; the internal runner layer is documented in Chapter 09.

## Related packages

`nmathopencl` is part of a small suite of cooperating packages:

| Package | Role | Typical entry points |
|---------|------|----------------------|
| **`nmathopencl`** (this package) | OpenCL-ported Mathlib, `*_opencl` validation API, kernel loaders, package-local device selection | `nmathopencl_has_opencl()`, `load_kernel_*`, `dnorm_opencl()` |
| **`opencltools`** ([CRAN](https://CRAN.R-project.org/package=opencltools)) | Host/runtime diagnostics and kernel-library authoring tools | `detect_environment_and_gpus()`, `verify_opencl_runtime()`, `load_library_for_kernel()`, `diagnose_glmbayes()` (opencltools-only report) |
| **`glmbayes`** ([CRAN](https://CRAN.R-project.org/package=glmbayes)) | End-user Bayesian GLMs with optional GPU paths | `glmb()`, `use_opencl = TRUE` |

**`nmathopencl` Imports `opencltools` (>= 0.8.0).** Host inventory, driver/ICD checks, and PATH validation are delegated to **opencltools**; compile-time OpenCL status for **this** package's DLL stays local via **`nmathopencl_has_opencl()`**. Host/runtime probes (`detect_*`, PATH helpers, `gpu_names`, and related functions) are **not** re-exported from **nmathopencl** --- call `opencltools::…` directly.
Kernel-library authoring helpers (`load_library_for_kernel`, `extract_library_subset`, and related tagging tools) are re-exported for downstream kernel authors.

For OpenCL setup and enablement, start with **Chapter 01** (attach messages and the nmathopencl-specific enablement path) and **`opencltools`** vignette **Chapter 01** (platform install details).

## R-side API families

The exported `*_opencl` functions cover the full nmath family and mirror the structure of base R's `stats` package:

| R file | Functions |
|--------|-----------|
| `normal_opencl.R` | `dnorm_opencl`, `pnorm_opencl`, `qnorm_opencl`, `rnorm_opencl` |
| `gamma_opencl.R` | `dgamma_opencl`, `pgamma_opencl`, ... |
| `binomial_opencl.R` | `dbinom_opencl`, `pbinom_opencl`, ... |
| `poisson_opencl.R` | `dpois_opencl`, `ppois_opencl`, ... |
| `beta_opencl.R` | `dbeta_opencl`, ... |
| ... | (and so on for all families) |
| `special_opencl.R` | `lgammafn_opencl`, `gammafn_opencl`, ... |
| `math_support_opencl.R` | `fmax2_opencl`, `fmin2_opencl`, ... |

Every function accepts a scalar parameter set, dispatches to the GPU via the kernel infrastructure, and falls back to the corresponding `stats::` or base-R function if OpenCL is unavailable or if the call fails.

As noted above, these wrappers serve primarily as a working demonstration of the GPU pipeline; they can show speedups at very large vector sizes but are not the primary mechanism through which downstream packages obtain GPU acceleration.

## Checking OpenCL availability

```{r, eval = FALSE}
library(nmathopencl)

# Compile-time OpenCL support in this nmathopencl build
nmathopencl_has_opencl()

# Same check for the imported opencltools dependency
opencltools::has_opencl()

# Host/runtime diagnostic report (opencltools)
opencltools::diagnose_glmbayes()
```

- **`nmathopencl_has_opencl()`** (nmathopencl) --- was **this** package built with OpenCL (`-DUSE_OPENCL`)?
- **`opencltools::has_opencl()`** --- was the imported dependency built with OpenCL?
- **`opencltools::diagnose_glmbayes()`** --- host/runtime report from **opencltools**.

Host and driver inventory (`detect_environment_and_gpus()`, `verify_opencl_runtime()`, and related probes) live in **`opencltools`** --- use `opencltools::…` when calling them directly.

All exported `*_opencl` wrappers branch on `nmathopencl_has_opencl()` first; the `fallback` argument then controls whether a failed OpenCL call is replaced with the CPU path (ignored when OpenCL is absent at compile time).

See **Chapter 01** for the step-by-step enablement path (attach messages, opencltools first, then source reinstall of nmathopencl).

## Vignette guide

**Part 0: Overview**

| Vignette | Topic |
|----------|-------|
| Chapter 00 (this document) | Package overview and architecture |

**Part I: Getting Started**

| Vignette | Topic |
|----------|-------|
| Chapter 01 | OpenCL enablement for `nmathopencl` (attach messages, `opencltools` dependency, source reinstall) |
| Chapter 02 | Adding `USE_OPENCL` and `has_opencl()` to your package: `configure` scripts, `opencltools` runtime relationship |

**Part II: The Library and Program Model**

| Vignette | Topic |
|----------|-------|
| Chapter 03 | Structure of `nmath` kernel programs: the four-layer assembly model |
| Chapter 04 | The `nmath` OpenCL library (`inst/cl/nmath/`): cycles, shims, and annotation |

**Part III: Developer Guide**

| Vignette | Topic |
|----------|-------|
| Chapter 05 | Kernels, kernel runners, and kernel wrappers: roles and interaction |
| Chapter 06 | Integrating kernel wrappers into your codebase: CPU fallbacks and R interfaces |
| Chapter 07 | Writing and annotating `__kernel` functions |
| Chapter 08 | Kernel loading: `load_kernel_source` and `load_kernel_library` |
| Chapter 09 | Generic OpenCL kernel runners: the `openclPort` C++ infrastructure |
| Chapter 10 | Case study: building custom GLM kernels (`ex_glmbayes`) |
| Chapter 11 | Testing, debugging, and benchmarking GPU kernels |

**Part IV: The R API**

| Vignette | Topic |
|----------|-------|
| Chapter 12 | The `nmathopencl` R API: distribution functions on the GPU |
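
As a closing illustration of the behaviour described under "Checking OpenCL availability", the dispatch-with-fallback pattern can be sketched in plain R. This is not the package's implementation: `with_cpu_fallback()`, `gpu_fun`, `cpu_fun`, and `failing_gpu()` are hypothetical names, and the `has_opencl` flag stands in for the value a real wrapper would presumably obtain from `nmathopencl_has_opencl()`.

```r
# Hypothetical sketch of the dispatch-with-fallback pattern; the names
# with_cpu_fallback, gpu_fun, and cpu_fun are illustrative only.
with_cpu_fallback <- function(gpu_fun, cpu_fun, ..., has_opencl = FALSE,
                              fallback = TRUE) {
  args <- list(...)
  # No compile-time OpenCL support: always take the CPU path
  if (!has_opencl) {
    return(do.call(cpu_fun, args))
  }
  tryCatch(
    do.call(gpu_fun, args),
    error = function(e) {
      # A failed device call is replaced by the CPU result only on request
      if (!fallback) stop(e)
      do.call(cpu_fun, args)
    }
  )
}

# Stand-in for a device call that fails at runtime
failing_gpu <- function(x, mean = 0, sd = 1) stop("OpenCL call failed")

x <- c(-1, 0, 1)
res <- with_cpu_fallback(failing_gpu, stats::dnorm, x, has_opencl = TRUE)
all.equal(res, stats::dnorm(x))
```

With `fallback = FALSE` the same call would propagate the device error instead of silently switching to the CPU, which is usually the more useful setting when validating a new OpenCL installation.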