--- title: "Chapter 12: The nmathopencl R API --- Distribution Functions on the GPU" author: "Kjell Nygren" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Chapter 12: The nmathopencl R API --- Distribution Functions on the GPU} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` ## Overview `nmathopencl` exports a family of R functions that mirror R's `stats::` distribution functions but dispatch computation to an OpenCL device when one is available. The calling convention is intentionally close to the base-R equivalents so that substitution is straightforward. All exported functions share three common arguments: | Argument | Type | Default | Meaning | |----------|------|---------|---------| | `fallback` | logical | `FALSE` | If `TRUE` while `nmathopencl_has_opencl()` is `TRUE`, fall back to the `stats::` analogue when the OpenCL call fails; ignored when OpenCL is unavailable (CPU is used anyway) | | `verbose` | logical | `FALSE` | Extra diagnostics (including when picking the CPU path with no OpenCL runtime) | | `log` / `lower.tail` | logical | distribution-specific | Mirror the `stats::` convention for the same function | ## Checking availability ```{r, eval = FALSE} library(nmathopencl) # TRUE if this nmathopencl build was compiled with USE_OPENCL nmathopencl_has_opencl() # GPU device names on the host (opencltools --- system inventory) opencltools::gpu_names() ``` If OpenCL is not available (`nmathopencl_has_opencl()` returns `FALSE`), calls use the CPU equivalent unconditionally so examples and CI keep working. By default `fallback = FALSE`: when OpenCL **is** available but a dispatch fails, the error propagates---useful while debugging kernels. Pass `fallback = TRUE` only when you deliberately want GPU failures masked with the CPU analogue. ## Normal distribution ```{r, eval = FALSE} x <- rnorm(1e6) # Density d <- dnorm_opencl(x, mean = 0, sd = 1, log = FALSE) # CDF --- same core arguments as stats::pnorm(q, mean, sd, lower.tail, log.p); # plus opencl_parallel, fallback, verbose. Long outputs use recycling length. p <- pnorm_opencl( rep(1.96, 1e6), mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE ) # Quantile (`qnorm_opencl` retains leading `n` + scalar `p`, unlike `stats::qnorm`-style vectors) q <- qnorm_opencl(n = 1e6, p = 0.975, mean = 0, sd = 1) # Random draws r <- rnorm_opencl(n = 1e6, mean = 0, sd = 1) ``` Note the signature difference between `dnorm_opencl` (takes a vector `x`) and `qnorm_opencl` / `rnorm_opencl`, which retain a leading `n` argument for replicate-many quantiles (`qnorm`) or draws (`rnorm`). For the CDF, `pnorm_opencl` parallels `stats::pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)` and adds only `opencl_parallel`, `fallback`, and `verbose`. Output length is the usual recycled length (`max(length(q), ...)`, as in stats). On GPU (when used), each output element runs `pnorm_kernel` once in sequence (high overhead until a batched kernel is added). ## Distribution families ### Gamma ```{r, eval = FALSE} dgamma_opencl(n, x, shape = 2, scale = 1) pgamma_opencl(q, shape = 2, rate = 1) qgamma_opencl(p, shape = 2, scale = 1) rgamma_opencl(n, shape = 2, scale = 1) ``` ### Binomial ```{r, eval = FALSE} dbinom_opencl(x, size = 10, prob = 0.3) pbinom_opencl(q, size = 10, prob = 0.3) qbinom_opencl(p, size = 10, prob = 0.3) rbinom_opencl(n, size = 10, prob = 0.3) ``` ### Poisson ```{r, eval = FALSE} dpois_opencl(x, lambda = 3) ppois_opencl(q, lambda = 3) qpois_opencl(p, lambda = 3) rpois_opencl(n, lambda = 3) ``` ### Beta ```{r, eval = FALSE} dbeta_opencl(x, shape1 = 2, shape2 = 5) pbeta_opencl(q, shape1 = 2, shape2 = 5) qbeta_opencl(p, shape1 = 2, shape2 = 5) rbeta_opencl(n, shape1 = 2, shape2 = 5) ``` ### Additional families All families below follow the same four-function pattern (`d*`, `p*`, `q*`, `r*`) with the same `fallback` / `verbose` arguments: | Family | R file | |--------|--------| | Cauchy | `cauchy_opencl.R` | | Chi-squared | `chisq_opencl.R` | | Exponential | `exponential_opencl.R` | | F | `f_opencl.R` | | Geometric | `geometric_opencl.R` | | Hypergeometric | `hypergeometric_opencl.R` | | Log-normal | `lnorm_opencl.R` | | Logistic | `logistic_opencl.R` | | Negative binomial | `negative_binomial_opencl.R` | | t | `t_opencl.R` | | Uniform | `uniform_opencl.R` | | Weibull | `weibull_opencl.R` | | Wilcoxon rank-sum | `rwilcox_opencl.R` | | Wilcoxon signed-rank | `signrank_opencl.R` | | Multivariate discrete | `multinomial_opencl.R` | | Tukey | `tukey_opencl.R` | | Bessel | `bessel_opencl.R` | ### Noncentral distributions Noncentral variants (`dnt_opencl`, `dnchisq_opencl`, `dnf_opencl`, `dnbeta_opencl`, and their `p*`, `q*` counterparts) are available with the noncentrality parameter `ncp`. ## Special functions and math support ```{r, eval = FALSE} # Log-gamma lgammafn_opencl(n, x) lgammafn_sign_opencl(n, x) # Gamma function gammafn_opencl(n, x) # Log-gamma at x+1 lgamma1p_opencl(n, x) # Digamma / polygamma digamma_opencl(n, x) trigamma_opencl(n, x) psigamma_opencl(n, x, deriv) # Beta and log-beta beta_special_opencl(n, a, b) lbeta_special_opencl(n, a, b) # Binomial coefficients choose_special_opencl(n_out, n, k) lchoose_special_opencl(n_out, n, k) # Logspace arithmetic logspace_add_opencl(n, logx, logy) logspace_sub_opencl(n, logx, logy) logspace_sum_opencl(n, logx, logy) # Math support fmax2_opencl(n, x, y) fmin2_opencl(n, x, y) ``` These are exported via `special_opencl.R` and `math_support_opencl.R`. ## R extension utilities (`R_ext`) ```{r, eval = FALSE} # R-compatible power r_pow_opencl(n, x, y) r_pow_di_opencl(n, x, i) # Miscellaneous math helpers log1pmx_opencl(n, x) log1pexp_opencl(n, x) log1mexp_opencl(n, x) pow1p_opencl(n, x, y) ``` These are exported via `rext_utils_opencl.R`. ## RNG core ```{r, eval = FALSE} # Base RNG draws norm_rand_opencl(n) unif_rand_opencl(n) exp_rand_opencl(n) r_unif_index_opencl(n, dt) ``` These generate random samples using device-side implementations of the same Kinderman-Ramage / Ahrens-Dieter algorithms used by R's internal RNG. They are exported via `rng_core_opencl.R`. ## Fallback behavior in detail 1. **`nmathopencl_has_opencl()` is `FALSE`** (no runtime device / not compiled with OpenCL): the wrapper always uses the CPU `stats::`/base analogue. The **`fallback` argument does not gate this**; it is ignored for this branch. 2. **`nmathopencl_has_opencl()` is `TRUE`**, but OpenCL enqueue/compile/bind fails inside the `.Call`: the outer R shim catches the failure. **`fallback = FALSE` (default):** propagate the error. **`fallback = TRUE`:** return the CPU analogue instead (optional masking for fragile stacks). Runnable examples under `inst/examples/` mostly use **`fallback = FALSE`** so `R CMD check` surfaces real OpenCL breakage when a device is present. ```{r, eval = FALSE} # Default --- surface GPU/build errors during development / local check with OpenCL enabled: dnorm_opencl(x) # Permit CPU masking while OpenCL is flaky (not the development default): dnorm_opencl(x, fallback = TRUE, verbose = TRUE) ``` ## Performance notes GPU acceleration is most beneficial for **large vectors** (>= ~10,000 elements) and for functions with non-trivial per-element computation (e.g., `pgamma`, `pt`, `qnorm`). For very small inputs the kernel compilation overhead and data transfer latency dominate. The first call in an R session may be slightly slower because OpenCL JIT- compiles the kernel source for the specific device. Subsequent calls with the same kernel reuse the compiled program from the driver's cache.