 # SigOpt and QMCPy: What Makes a Sequence “Low Discrepancy”?


This work was originally published here, and has been reproduced with the author’s permission.

The first blog post in this series, “Why add Q to MC?”, introduced the concept of evenly spread points, which are commonly referred to as low discrepancy (LD) points. This is in contrast to independent and identically distributed (IID) points.

Consider two sequences,

$$\boldsymbol{T}_1, \boldsymbol{T}_2, \ldots \overset{\text{IID}}{\sim} \mathcal{U}[0,1]^d$$

Figure 1: 64 IID standard uniform points in 2 dimensions.

$$\boldsymbol{X}_1, \boldsymbol{X}_2, \ldots \overset{\text{LD}}{\sim} \mathcal{U}[0,1]^d.$$

Figure 2: 64 shifted lattice points in 2 dimensions.

Both sequences are expected to look like points spread uniformly over the unit cube, $$[0,1]^d$$. The first sequence must be random, or as random looking as our (deterministic) random number generators can make it. Since the points are independent, the location of any $$\boldsymbol{T}_i$$ has no bearing on the location of any other $$\boldsymbol{T}_j$$. Removing a point at random does not affect the IID property.
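For concreteness, here is one way to generate both kinds of point sets with NumPy. The IID points come from a pseudorandom generator; the LD points come from a randomly shifted rank-1 lattice. The generating vector `g` below is an illustrative choice, not an optimized one.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 64, 2

# IID standard uniform points, as in Figure 1
T = rng.random((n, d))

# Shifted rank-1 lattice points, as in Figure 2:
#   X_i = frac(i * g / n + shift),  i = 0, ..., n-1
# g is an illustrative generating vector, not an optimized one
g = np.array([1, 27])
shift = rng.random(d)
i = np.arange(n).reshape(-1, 1)
X = (i * g / n + shift) % 1.0
```

Each one-dimensional projection of these lattice points is perfectly equispaced, a consequence of the rank-1 structure, whereas the IID points cluster and leave gaps. Libraries such as QMCPy provide production-quality lattice and digital-net generators.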

The second sequence may be random or deterministic. Let $$F_{\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}}$$ denote the empirical distribution function of the first $$n$$ points of this sequence, i.e., the probability distribution that assigns a probability of $$1/n$$ to each location $$\boldsymbol{X}_i$$. For $$\boldsymbol{X}_1, \boldsymbol{X}_2, \ldots$$ to be LD, $$F_{\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}}$$ should be close to the uniform probability distribution, $$F_{\text{unif}}: \boldsymbol{x} \mapsto x_1 \cdots x_d$$.

“Close” implies that we can measure how far apart two distributions are. We call this measure the discrepancy. Just as beauty is in the eye of the beholder, there are different definitions of discrepancy. They typically take the form of a distance between the empirical distribution of the point set, $$F_{\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}}$$, and the target measure, $$F_{\text{unif}}$$. An example is the star discrepancy [1, (3.16)]:

$\text{disc}(\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}) := \sup_{\boldsymbol{x} \in [0,1]^d} \bigl| F_{\text{unif}}(\boldsymbol{x}) - F_{\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}}(\boldsymbol{x}) \bigr|,$

which is known in the statistics literature as a Kolmogorov-Smirnov goodness-of-fit statistic. This discrepancy is the maximum absolute difference between the volume of the box $$[\boldsymbol{0},\boldsymbol{x}]$$ and the proportion of the points $$\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}$$ that lie in that box. Ideally, these should be the same, but in practice they will differ at least a bit.
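As a sketch of what this definition measures, the function below compares box volume with empirical proportion over a finite grid of candidate corners built from the points' own coordinates, for $d = 2$. Because the supremum is examined at only finitely many anchors, and open versus closed boxes are not treated carefully, this yields a lower bound on the star discrepancy rather than its exact value.

```python
import numpy as np

def star_discrepancy_lb(X):
    """Lower bound on the star discrepancy for d = 2: check
    |F_unif(x) - F_emp(x)| only at corners x assembled from the
    points' own coordinates (plus 1), not over all of [0,1]^2."""
    n, d = X.shape
    assert d == 2
    xs = np.append(X[:, 0], 1.0)
    ys = np.append(X[:, 1], 1.0)
    worst = 0.0
    for a in xs:
        for b in ys:
            vol = a * b  # volume of the box [0, a] x [0, b]
            emp = np.mean((X[:, 0] <= a) & (X[:, 1] <= b))
            worst = max(worst, abs(vol - emp))
    return worst
```

The double loop over candidate corners is where the $\mathcal{O}(n^d)$ cost appears: in $d$ dimensions the grid of corners has roughly $n^d$ elements.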

The computational cost of evaluating the star discrepancy can be rather large, typically at least $$\mathcal{O}(n^d)$$ operations. A family of computationally cheaper discrepancies is defined in terms of a kernel, $$K: [0,1]^d \times [0,1]^d \to \mathbb{R}$$, which satisfies two crucial properties:

\begin{aligned} \text{Symmetry:} \quad& K(\boldsymbol{t},\boldsymbol{x}) = K(\boldsymbol{x},\boldsymbol{t}) \qquad \forall \boldsymbol{t}, \boldsymbol{x} \in [0,1]^d, \\ \text{Positive Definiteness:} \quad& \boldsymbol{c}^T \mathsf{K} \boldsymbol{c} > 0, \text{ where } \mathsf{K} =\bigl(K(\boldsymbol{x}_i, \boldsymbol{x}_j) \bigr)_{i,j=1}^n, \\ & \qquad \qquad \forall \boldsymbol{c} \ne \boldsymbol{0}, \text{ distinct } \boldsymbol{x}_1, \boldsymbol{x}_2, \ldots \in [0,1]^d. \end{aligned}

For such a kernel, we may define a discrepancy as

\begin{aligned} &\text{disc}(\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}) \\ &\qquad := \int_{[0,1]^d \times [0,1]^d} K(\boldsymbol{t}, \boldsymbol{x}) \, \mathrm{d} (F_{\text{unif}} - F_{\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}})(\boldsymbol{t}) \, \mathrm{d}(F_{\text{unif}} - F_{\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}}) (\boldsymbol{x}) \\ &\qquad = \int_{[0,1]^d \times [0,1]^d} K(\boldsymbol{t}, \boldsymbol{x}) \, \mathrm{d} \boldsymbol{t} \, \mathrm{d} \boldsymbol{x} - \frac{2}{n} \sum_{i=1}^n \int_{[0,1]^d} K(\boldsymbol{x}_i,\boldsymbol{x}) \, \mathrm{d} \boldsymbol{x} \\ &\qquad \qquad \qquad + \frac{1}{n^2} \sum_{i,j=1}^n K(\boldsymbol{x}_i, \boldsymbol{x}_j). \end{aligned}

For example, the centered $L^2$-discrepancy [2] is defined in terms of the kernel

\begin{aligned} K(\boldsymbol{t},\boldsymbol{x}) = \prod_{k=1}^d \left[1 + \frac{1}{2} |t_k - 1/2| + \frac{1}{2} |x_k - 1/2| - \frac{1}{2} |t_k - x_k| \right]. \end{aligned}

After straightforward calculations it becomes

\begin{aligned} \text{disc}(\{\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n\}) &= \left(\frac{13}{12} \right)^d - \frac{2}{n} \sum_{i=1}^n \prod_{k=1}^d \left(1 + \frac{1}{2} |x_{ik} - 1/2| - \frac{1}{2} |x_{ik} - 1/2|^2 \right) \\ & \qquad + \frac{1}{n^2} \sum_{i,j=1}^n \prod_{k=1}^d \left[1 + \frac{1}{2} |x_{ik} - 1/2| + \frac{1}{2} |x_{jk} - 1/2| - \frac{1}{2} |x_{ik} - x_{jk}| \right], \end{aligned}

where $$x_{ik}$$ denotes the $$k$$-th coordinate of $$\boldsymbol{X}_i$$.

This discrepancy only requires $$\mathcal{O}(dn^2)$$ operations to evaluate.
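The closed form above translates directly into a short NumPy function. The pairwise double sum is vectorized below, but as written it still costs $\mathcal{O}(dn^2)$ operations (and $\mathcal{O}(n^2 d)$ memory).

```python
import numpy as np

def centered_l2_disc(X):
    """Centered L2 discrepancy of the points X (an n x d array),
    evaluated via the closed form in the text: a constant term,
    a single sum over points, and a pairwise double sum."""
    n, d = X.shape
    z = np.abs(X - 0.5)  # |x_ik - 1/2|
    term1 = (13.0 / 12.0) ** d
    term2 = (2.0 / n) * np.prod(1 + 0.5 * z - 0.5 * z**2, axis=1).sum()
    # pairwise kernel values K(x_i, x_j), shape (n, n)
    K = np.prod(
        1 + 0.5 * z[:, None, :] + 0.5 * z[None, :, :]
        - 0.5 * np.abs(X[:, None, :] - X[None, :, :]),
        axis=2,
    )
    term3 = K.sum() / n**2
    return term1 - term2 + term3
```

As a sanity check, a single point at the center of $[0,1]$ gives $13/12 - 2 + 1 = 1/12$.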

LD sequences have discrepancies of $$\mathcal{O}(n^{-1+\epsilon})$$ for the discrepancies illustrated above, whereas IID sequences have root mean square discrepancies of $$\mathcal{O}(n^{-1/2})$$. This difference in asymptotic order can translate into orders of magnitude improvements in the accuracy of numerical solutions.

For problems where $d$ is large, the discrepancies defined above do not decay so quickly. However, if these discrepancy definitions are modified to include coordinate weights [1, Section 4], then they retain their $$\mathcal{O}(n^{-1 +\epsilon})$$ decay. Coordinate weights express the assumption that certain coordinates contribute more to the variation of the function than others.
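One common way to realize coordinate weights is a product-weighted kernel: each coordinate's contribution is scaled by a weight $\gamma_k \ge 0$, so that $\gamma_k = 0$ switches coordinate $k$ off entirely. The weighted variant of the centered kernel sketched below is one illustrative choice in the spirit of [1, Section 4], not the unique definition.

```python
import numpy as np

def weighted_centered_kernel(t, x, gamma):
    """Centered L2 kernel with product coordinate weights gamma:
    K(t, x) = prod_k [1 + gamma_k * eta(t_k, x_k)], where eta is the
    per-coordinate part of the unweighted kernel. gamma_k = 0 makes
    coordinate k irrelevant; gamma_k = 1 recovers the unweighted kernel."""
    eta = (0.5 * np.abs(t - 0.5) + 0.5 * np.abs(x - 0.5)
           - 0.5 * np.abs(t - x))
    return float(np.prod(1.0 + gamma * eta))
```

Rapidly decaying weights $\gamma_1 \ge \gamma_2 \ge \cdots$ encode the assumption that early coordinates matter most, which is what restores the $\mathcal{O}(n^{-1+\epsilon})$ decay for large $d$.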

Demonstrating that a particular sequence is LD can be done by brute force computation, which in general requires $$\mathcal{O}(dn^2)$$ operations. For certain sequences matched with certain discrepancy definitions, this can be reduced to $$\mathcal{O}(dn)$$ operations. If $$n$$ is small enough, the search for an LD set can be performed using global optimization algorithms [3]. Number-theoretic arguments are used to construct certain popular LD sequences [4,5].

#### References

1. Dick, J., Kuo, F. & Sloan, I. H. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013).

2. Hickernell, F. J. A Generalized Discrepancy and Quadrature Error Bound. Math. Comp. 67, 299–322 (1998).

3. Winker, P. & Fang, K. T. Application of Threshold Accepting to the Evaluation of the Discrepancy of a Set of Points. SIAM J. Numer. Anal. 34, 2028–2042 (1997).

4. Dick, J. & Pillichshammer, F. Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration (Cambridge University Press, Cambridge, 2010).

5. Niederreiter, H. Random Number Generation and Quasi-Monte Carlo Methods (SIAM, Philadelphia, 1992).

Fred J. Hickernell, Guest Author