Quasi-Monte Carlo Rules

Approximating Integrals in High Dimensions— An Informal Report

Asia Pacific Mathematics Newsletter

Aug 26, 2013 • 11 min read

1. Introduction

For half a century or more many researchers have grappled with the problem of numerical integration in many dimensions— on the one hand to understand what is possible, on the other to construct and analyse effective algorithms. This is an informal and rather personal account of one part of that effort.

For simplicity, we consider only the problem of integration over the unit cube in $$s$$ dimensions.

Thus the task is to approximate the integral

$$I_s(f)=int_{[0,1]^s}f(x)dx,$$ _____ (1)

where $$s$$ is an integer greater than 1, and $$f$$ is a given continuous function (or a function from a given class), hopefully having also some additional smoothness property. The interesting case is when $$s$$ is large, say in the hundreds or even thousands.

The simplest kind of $$s-$$dimensional integration rule is just a product of 1-dimensional rules, obtained by using a favourite quadrature rule (say an $$m-$$point Gauss rule) in each dimension. But when $$s$$ is large no such product rule is feasible, since the total number of points for the $$s-$$dimensional rule is then $$n=m^s,$$ an impossibly large number for large $$s$$ : if $$s$$ is 100 then even a product of 2-point Gauss rules would need $$2^{100}$$ function evaluations, a number unlikely ever to be feasible on a conventional computer.

2. Quasi-Monte Carlo Rules

In this short report we consider only $$n-$$point integration rules of the form

$$Q_{n,s}(f)=Q_{n,s}(t_0,dots ,t_{n-1};f)=frac{1}{n}sum_{i=0}^{n-1}f(t_i),$$ _____ (2)

that use prescribed points $$t_0,dots ,t_{n-1}in [0,1]^s.$$ A rule of this form is called a Quasi-Monte Carlo (or QMC) rule. In contrast, if the points $$t_0,dots ,t_{n-1}in [0,1]^s$$ are independent random samples from a uniform distribution on $$[0,1]^s$$ then we have the simplest Monte Carlo rule. Monte Carlo methods have many virtues, the most prominent being that smoothness of the integrand $$f$$ is not needed: all that is needed is that $$f$$ belong to $$L_2([0,1]^s).$$ But the Monte Carlo method suffers from slow convergence: the root mean square error in the sense of expectation has the celebrated convergence rate of $$frac{1}{sqrt{n}}.$$ Our aim is to do better than that.

The central question for the QMC method is how to choose the points $$t_0,dots t_{n-1}.$$ That is a big subject, explored in a recent survey [1]. But here the view is personal and selective, so we can be brief.

There are now two main strategies for choosing QMC points, which we may call the low discrepancy and lattice approaches. Here the focus is on the second.

3. Lattice Rules

In the simplest form of lattice rule, the only kind considered here, the points are given by the simple formula

$$t_i={frac{iz}{n}}, i=0,1,dots ,n-1,$$ _____ (3)

where $$zin Z^s,$$ known as the generating vector, is an integer vector of length $$s$$ having no factor in common with $$n,$$ and the braces around a vector indicate that we take the fractional part of each component in the vector.

Lattice methods began life in the 1950s in the hands of number theorists, who obtained many beautiful results, especially for functions that are 1-periodic with respect to each component of $$x.$$ For surveys of the classical results see [4] and [5]. Those results, based on Fourier analysis, are of limited practical use, because high-dimensional integrals arising from applications are usually not naturally periodic; and forcing periodicity through a change of variables has a tendency to turn easy high-dimensional problems into difficult ones. In the developments reported here we make no periodicity assumption.

The point set (3) depends entirely on the choice of the integer vector $$z.$$ How should $$z$$ be chosen? Clearly there is no sense in allowing a component of $$z$$ to lie outside the range $$0,1,dots ,n-1.$$ Within these constraints, how are we to find a good choice for $$z$$ ? Up to the present time no closed formula for a good generating vector $$z$$ (whatever “good” might mean) is known, beyond $$s=2.$$ And intuition seems of little use in high dimensions. To find a good $$z$$ for such an $$s$$ we need some mathematical structure.

4. Worst Case Error, and a Norm with Weights

We assume that our functions $$f$$ all belong to some Hilbert space $$H,$$ and then choose a generating vector $$z$$ to minimise (or at least makes small) the worst case error in this function space. The worst case error in the space $$H$$ of a QMC rule $$Q_{n,s}(P;.)$$ using the point set $$P={t_0,t_1,dots ,t_{n-1}}$$ is the largest error for $$f$$ in the unit ball of $$H,$$ that is

$$e_{n,s}(P;H):= sup_{|f|_Hleq 1}|I_s(f)-Q_{n,s}(P;f)|.$$

So now we have again changed the problem, this time to one of choosing a good function space $$H.$$ It turns out that a good choice is to take $$H$$ to consist of $$s-$$variate absolutely continuous functions whose mixed first derivatives are squareintegrable. More precisely, we take

$$H=H_{s,gamma}$$

$$:={fin L_2([0,1]^s):int_{[0,1]^s}|frac{partial^{u}}{partial x_u}f(x_u;0)|^2dx_u<infty forall usubseteq {1:s}},$$

where $${1:s}$$ is shorthand for $${1,2,dots ,s},$$ while $$x_u$$ denotes the components $$x_j$$ of $$x$$ with $$jin u,$$ and $$(x_u;0)$$ denotes $$x$$ with all components $$x_j$$ with $$j$$ not in $$u$$ set equal to 0, which we refer to as the anchor.

But how exactly should we define the norm in $$H$$? It turns out to be a fruitful idea to introduce positive numbers called weights into the norm: in the most general case we introduce one weight $$gamma _u$$ for each $$usubseteq {1:s}$$ (with $$gamma_{phi}$$), taking the norm of $$H_{s,gamma}$$ to be

$$|f|_{s,gamma}:=|sum_{usubseteq {1:s}}gamma_u^{-1}int_{[0,1]^{|u|}}|frac{partial^{|u|}}{partial x_u}f(x_u;0)|^2dx_u|^{frac{1}{2}}.$$

Note that for $$f$$ to be in the unit ball of $$H_{s,gamma},$$ any term in the sum over $$u$$ with a small weight $$gamma_u$$ must also have a small mixed first derivative $$(frac{partial^{|u|}}{partial x_u})f.$$ Thus the weight parameters describe the relative importance of different subsets of the variables $$x_1,dots ,x_s:$$ small weights correspond to unimportant subsets.

The space $$H_{s,gamma}$$ is of course a Hilbert space, with the obvious inner product, which we denote by $$<.,.>_{s,gamma}.$$ Less obviously, $$H_{s,gamma}$$ is a reproducing kernel Hilbert space with the relatively simple kernel

$$K_{s,gamma}(x,y)=sum_{usubseteq{a:s}}gamma_uprod_{jin u}min(x_j,y_j).$$

That is to say, $$K_{s,gamma}(x,y)$$ is a symmetric function on $$[0,1]^stimes [0,1]^s$$ with the property that $$K_{s,gamma}(x,.)in H_{s,gamma}$$ and the reproducing property (easily verified for $$s=1$$)

$$f(x)=<K_{s,gamma}(x,.),f>_{s,gamma}forall xin [0,1]^s,fin H_{s,gamma}.$$

Why is the reproducing kernel property helpful? It is because in a reproducing kernel Hilbert space $$H_s(K)$$ with kernel $$K$$ the worst case error is computable, via the formula

$$e^2_{n,s}(P;H_s(K))=int_{[0,1]^s}int_{[0,1]^s}K(x,y)dxdy-frac{2}{n}sum_{i=0}^{n-1}int_{[0,1]^s}K(t_i,y)dy+frac{1}{n^2}sum_{i=0}^{n-1}sum_{k=0}^{n-1}K(t_i,t_k).$$

This can easily be proved by simple Hilbert space arguments together with the reproducing property of the kernel.

But is the worst case error really computable for our function space $$H_{s,gamma}=H(K_{s,gamma}),$$ given that the sum over $$u$$ in our expression for $$K_{s,gamma}$$ contains $$2^s$$ terms? Clearly computations with general weights $$gamma_u$$ are not feasible when $$s$$ is large, but they become so for some special cases. The original weights introduced in [6] were of the product form

$$gamma_u=prod_{jin u}tau_j,$$

in which case it is easily seen that $$K_{s,gamma}$$ can be expressed as the easily evaluated product

$$K_{s,gamma}(x,y)=prod_{j=1}^s(1+tau_jmin(x_j,y_j)).$$

Many results take an especially simple form in the product case. In particular, the necessary and sufficient condition established in [6] for the worst case error to be bounded independently of $$s$$ is just

$$sum_{j=1}^{infty}tau_j<infty.$$

This condition fails spectacularly in the classical case in which $$tau_j=1$$ for all $$j,$$ but is satisfied, for example, if $$tau_j=frac{1}{j^2}.$$

Nevertheless, it has to be admitted that product weights, in spite of their popularity, were introduced for mathematical convenience, rather than being guided by application. In this report general weights have been retained, because certain non-product weights of so-called product and order dependent or POD form, have arisen recently in applications, see [2]. The POD weights have the form

$$gamma_u=Gamma_{|u|}prod_{jin u}tau_j, usubseteq{1:s},$$

Where $$Gamma_0=1,Gamma_1,dots Gamma_s$$ are given positive numbers. As with product weights, computations with POD weights turn out to be feasible (see below).

5. A Little Randomisation

It turns out to be useful, for reasons of both theory and practice, to modify slightly the QMC algorithm by introducing some randomisation: specifically, we replace the lattice rule given by (2) and (3) by the randomly shifted lattice rule

$$Q_{n,s}^{ran}(z;f):=frac{1}{n}sum_{j=0}^{n-1}f({frac{iz}{n}+Delta}),$$

where $$Delta$$ is a random vector of length s drawn from a uniform distribution on $$[0,1]^s.$$ This has some flavour of the Monte Carlo method: indeed it reduces to the Monte Carlo method if $$n=1.$$ In practice the randomly shifted lattice rule is implemented, once $$z$$ is chosen, by picking some fixed number (say 10 or 30) of random (or rather pseudo-random) shifts $$Delta_k$$ and averaging the cubature results for the separate shifts to obtain an estimate of the integral, while using the spread of the results to estimate the error, just as with the Monte Carlo method. The worst case error is now replaced by the root mean square expected value of the error, known as the shift-averaged worst case error,

$$e_{n,s}^{sh}(z;H):=(E|sup_{|f|_{Hleq 1}}|I_s(f)-Q_{n,s}^{ran}(z;f)|^2|)^{frac{1}{2}}.$$

For the case of our particular Hilbert space $$H_{s,gamma}$$ it can be computed by the formula (see [1, Lemma 5.7])

$$e_{n,s}^{sh}(z;H_{s,gamma})=(sum_{phinequsubseteq {1:s}}gamma_u(frac{1}{n}sum_{k=0}^{n-1}prod_{fin u}[B_2({frac{kz_j}{n}})+frac{1}{3}]-(frac{1}{3})^{|u|}))^{frac{1}{2}},$$

where $$B_2(x)=x^2-x+frac{1}{6}$$ is the Bernoulli polynomial of degree 2. For the case of product weights the expression reduces to the easily computable

$$e_{n,s}^{sh}(z;H_{s,gamma})=(-prod_{j=1}^s(1+frac{tau_j}{3})+frac{1}{n}sum_{k=0}^{n-1}prod_{j=1}[B_2({frac{kz_j}{n}})+frac{1}{3}]))^{frac{1}{2}}.$$

6. Good Lattice Rules Exist!

As explained below, we now know the following:

For every $$n$$ there exists $$zin{0,1,dots ,n-1}^s$$ such that, for all $$gammain (frac{1}{2},1],$$

$$e_{n,s}^{sh}(z;H_{s,gamma})leq (frac{1}{phi(n)}sum_{phineq usubseteq {1:s}}gamma_u^{lambda}(frac{2zeta(2lambda)}{(2pi^2)^{lambda}}+(frac{1}{3})^{lambda})^{|u|})^{frac{1}{(2lambda)}}.$$ _____ (4)

Here $$zeta(.)$$ is the Riemann zeta function, and $$phi(n)$$ is the Euler totient function, that is, it is the number of integers between 1 and $$n-1$$ that are relatively prime to $$n.$$ If $$n$$ is prime then $$phi(n)=n-1$$ and in that case the order of convergence is $$n^{-frac{1}{2lambda}}.$$ We would set $$lambda=frac{1}{2}$$ if that were possible, because then the order of convergence would be $$n^{-1},$$ but that is not possible because $$zeta(2lamda)$$ diverges as $$2lambda$$ approaches 1 from above. But the result shows (since $$lambda$$ can be chosen arbitrarily close to 1/2) that for every $$fin H_{s,gamma}$$ the rate of convergence will ultimately be arbitrarily close to $$n^{-1}.$$ Can good rates of convergence be obtained independently of dimension? The answer depends on the weights $$gamma_u$$ : the answer is yes if the weights $$gamma_u$$ decay sufficiently quickly.

7. Good Lattice Rules are (sometimes) Constructible!

Lattice rules that achieve the bound (4) are in principle constructible— indeed, the proof of (4) (see [1, Theorem 5.9]) is by an inductive argument based on an explicit construction. That construction is the so-called component by component (or CBC) construction.

The idea of the CBC construction is that the components of the generating vector $$z=(z_1,z_2,dots ,z_s)$$ are determined one after the other, starting with $$z_1=1$$ with the successive components $$z_d$$ for $$d=2,dots ,s$$ being minimisers of a quantity related to the shift-averaged worst case error for the $$d-$$dimensional problem. For our particular choice of norm, the quantity to be minimised in the case of general weights $$gamma_u$$ is not exactly the shift-averaged worst case error, see [1, Theorem 5.9], but if we change slightly the definition of the norm, to

$$|f|_{s,gamma}:=[sum_{usubseteq {1:s}}gamma_u^{-1}int_{[0,1]^{s-|u|}}|int_{[0,1]^{s-|u|}}frac{partial^{|u|}}{partial x_u}f(x_u;x_{-u})dx_{-u}|^2dx_u]^{frac{1}{2}},$$

then the CBC algorithm precisely minimises the shift-averaged worstcase error. Here $$x_{-u}$$ denotes the components $$x_j$$ of $$x$$ with $$j$$ in $${1:s}$$ but not in $$u,$$ so that now the components of $$x$$ that do not appear in the mixed derivative are integrated out instead of being anchored at 0. The formula for the shift-averaged worst case error is the same as before, except that the $$frac{1}{3}$$ terms are now replaced by zero. The bound achieved by the CBC construction (proved inductively in [1, Theorem 5.8]) is exactly (4), except that again the $$frac{1}{3}$$ term is replaced by zero. Finally, fast CBC construction is possible for both product weights and POD weights, see respectively [1, Section 5.5] and [1, Section 5.6].

8. Conclusion

We have seen that lattice rules for high dimensional integration are at an interesting stage of development: for given dimensionality $$s$$ and given weights $$(gamma_u)$$ we can in principle construct a generating vector $$z$$ for which the order of convergence for $$f$$ in the unit ball of $$H_{s,gamma}$$ is arbitrarily close to $$O(n^{-1}),$$ with an implied constant that is known explicitly for a given convergence rate, and that is independent of the dimensionality $$s$$ if the weights are small enough. The construction is feasible for both product weights and POD weights.

This informal report is incomplete in many ways. Not reported on are higher order QMC methods, which aim to achieve a rate of convergence faster than $$O(n^{-1}).$$ Also not reported is infinite-dimensional integration, a topic now showing high levels of activity. We have not done justice to the important work on fast implementation of CBC, initiated by Dirk Nuyens and Ronald Cools. We have not discussed how the weights should be chosen for a given practical problem. We have not considered lattice rules that are extensible in number of points or dimension. We have not considered integration over unbounded regions. All these matters are considered in [1].

Finally, many people have contributed to the recent developments outlined here, too many to mention them all, but especially I want to acknowledge Henryk Wo´zniakowski, Stephen Joe, Frances Kuo and Josef Dick. I am grateful to these and all others for their contributions, and to the Australian Research Council for its sustained support.

References

[1] J. Dick, F. Y. Kuo and I. H. Sloan, Numerical integration in high dimensions — the quasi-Monte Carlo way, Acta Numerica, 13 (2013) 133–288.

[2] F. Y. Kuo, C. Schwab and I. H. Sloan, Quasi- Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients, SIAM J., Numerical Analysis, 50 (2012) 3351–3374.

[3] F. Y. Kuo and I. H. Sloan, Lifting the curse of dimensionality, Notices of the AMS 52 (2005) 1320–1328.

[4] H. Niederreiter, Quasi-Monte Carlo methods and pseudo random numbers, SIAM (1992).

[5] I. H. Sloan and S. Joe, Lattice Methods for Multiple Integration (Oxford University Press, 1994).

[6] I. H. Sloan and H. Wo´zniakowski, When are quasi-Monte Carlo algorithms, efficient for highdimensional integrals? J. Complexity 14 (1998) 1–33.

Ian Sloan

University of New South Wales, Australia

Ian Sloan completed physics and mathematics degrees at Melbourne University, a Master’s degree in mathematical physics at Adelaide, and a PhD in theoretical atomic physics (under the supervision of HSW Massey) at the University of London, finishing in 1964. After a decade of research on few-body collision problems in nuclear physics, his main research interests shifted to computational mathematics.

He was Head of the School of Mathematics in University of New South Wales and member of the ARC’s Research Grants Committee, and is a former President of the Australian Mathematical Society. His awards/recognitions include Fellow of the Australian Academy of Science, ANZIAM Medal, etc. In 2008 he was appointed an Officer of the Order of Australia (AO). He is currently Deputy Director of MASCOS, the ARC Centre of Excellence for Mathematics and Statistics of Complex Systems.

[This article originally appeared in the Asia Pacific Mathematics Newsletter, Volume 3 No. 3 (July 2013), published by World Scientific. It has been republished here with a special permission from World Scientific.]