Examples of numeric integration

*Author: Kyle Miller
Date: Monday, October 12, 2015*

In this notebook I will show you how you might use a computer to find approximations for definite integrals. The programming language I am using is Python (in an IPython notebook), but the techniques apply to any language with floating-point numbers.

Normal distributions

The first example we will look at is $f(x)=\frac{1}{\sqrt{\pi}}\int_{0}^x e^{-t^2}\,dt$, where the integrand is the density of a normal distribution. (The reason for the $\frac{1}{\sqrt{\pi}}$ is so that $\frac{1}{\sqrt{\pi}}\int_{-\infty}^\infty e^{-t^2}\,dt=1$. We will learn how to compute this improper integral later.) The idea with a probability distribution is that if we randomly select a real number according to the distribution, the probability that the chosen number lies in the interval $[a,b]$ is $f(b)-f(a)$.

This integral has no elementary antiderivative, so if we would like numerical values we must approximate it somehow. We might consider using an infinite series or a numerical integration technique. The trapezoidal rule is relatively easy to implement, so for the sake of exercise, we will use it to compute $f(x)$ to ten decimal places (i.e., with an error of at most $\varepsilon=10^{-10}$). Let us define this error in Python:

In [1]:
error = 10 ** -10

Let $g(t)=\frac{1}{\sqrt{\pi}} e^{-t^2}$.

In [2]:
import math
def g(t) :
    return math.exp(-t**2) / math.sqrt(math.pi)
    # note: the exp function is the exponential, so exp(x) == math.e ** x

Just to get an idea of what this function is, let's plot it:

In [3]:
%matplotlib inline

import matplotlib
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 500)
plt.plot(x, np.vectorize(g)(x))
plt.title('g(x)')
plt.show()

According to Section 7.7 of Stewart, if $K$ is an upper bound for $|g''(t)|$ for $t\in [0,x]$, then $$ |E_T|\leq \frac{K(x-0)^3}{12n^2}$$ with $n$ the number of trapezoids (for simplicity we are assuming $x>0$). If we select an $n$ such that this error estimate is less than $\varepsilon$, then $|E_T|$ will be less than $\varepsilon$ as well, so solving $$\frac{Kx^3}{12n^2}\leq\varepsilon$$ for $n$, we see $n\geq \sqrt{\frac{Kx^3}{12\varepsilon}}$. For our particular $\varepsilon$, this is $10^5\sqrt{Kx^3/12}$.

The first derivative of $g(t)=e^{-t^2}/\sqrt{\pi}$ is $g'(t)=-2te^{-t^2}/\sqrt{\pi}$, and the second derivative is $g''(t)=(4t^2-2)e^{-t^2}/\sqrt{\pi}$. Conveniently, since the exponent $-t^2$ is never positive, $e^{-t^2}\leq 1$ for all $t$, with the maximum value $1$ attained at $t=0$. So, $|g''(t)|\leq|4t^2-2|/\sqrt{\pi}$ for all $t$. Furthermore, by the triangle inequality, $|4t^2-2|\leq |4t^2|+|2|=4t^2+2$, so we may as well take $K$ to be $(4x^2+2)/\sqrt{\pi}$ since $4t^2+2$ is increasing along $[0,x]$. Thus, if $n$ satisfies the following, we will have a sufficient number of trapezoids to meet our maximum acceptable error $\varepsilon$: $$n\geq \sqrt{\frac{x^3(4x^2+2)}{12\varepsilon\sqrt{\pi}}}$$
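If SymPy happens to be available, we can double-check this second derivative symbolically. (This quick check is an addition of mine and is not needed for anything that follows.)

In [ ]:
import sympy
t = sympy.symbols('t')
g_sym = sympy.exp(-t**2) / sympy.sqrt(sympy.pi)
# up to how SymPy arranges the factors, this should agree with
# (4*t**2 - 2)*exp(-t**2)/sqrt(pi)
print sympy.simplify(sympy.diff(g_sym, t, 2))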

We will write a small function which computes this number:

In [4]:
def num_trapezoids(x) :
    """x is the upper bound of the interval which we are using to compute
    the integral of g.  This function uses the global variable `error` for
    getting the lower bound on trapezoids"""
    
    nfrac = math.sqrt((x**3 * (4 * x**2 + 2)) / (12 * error * math.sqrt(math.pi)))
    # since nfrac is generally not an integer, we should find the least integer which is
    # at least as large as nfrac.  This is what the ceiling function is for.
    return int(math.ceil(nfrac))

(We aren't being particularly careful here. To be completely rigorous, we would need to check which direction each operation rounds so that num_trapezoids returns a number which is no smaller than the "correct" answer.)
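One crude way to hedge against that, sketched below (this is my addition, not part of the original notebook), is simply to overshoot a little; the padding dwarfs the few units in the last place that rounding could cost us.

In [ ]:
def num_trapezoids_safe(x) :
    """Like num_trapezoids, but padded so that rounding inside the formula
    cannot leave us below the true lower bound on the number of trapezoids."""
    nfrac = math.sqrt((x**3 * (4 * x**2 + 2)) / (12 * error * math.sqrt(math.pi)))
    return int(math.ceil(nfrac * (1 + 1e-12))) + 1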

To test num_trapezoids, let us see how many trapezoids are sufficient for the interval $[0,1]$:

In [5]:
num_trapezoids(1)
Out[5]:
53113

That is quite a lot of trapezoids! That is, if a human had to compute them. Luckily for us, we have electronic computers to grind through the calculations. Next, we automate computing the trapezoid rule. A first useful function computes a sample point from its index: for $i$ from $0$ through $n$, it returns the $i$-th of the $n+1$ equally spaced points in the interval $[0,x]$.

In [6]:
def sample_pt(x, n, i) :
    """For an integer i between 0 and n inclusive, returns a point in [0,x]"""
    return x * i / n
def width(x, n) :
    """Computes the width of a trapezoid."""
    return x / n

With this we can compute the integral using the trapezoid rule. The for loop represents the sum $\sum_{i=0}^{n-1}\frac{x}{n}\cdot\frac{g(xi/n) + g(x(i+1)/n)}{2}$.

In [7]:
def f(x) :
    """Computes the definite integral for g on the closed interval [0,x] using
    the trapezoid rule."""
    
    x = float(x)  # technicality: want x to be a floating-point number rather
                  # than an integer
    
    if x < 0 : raise ValueError("We are assuming x >= 0.")
    
    n = num_trapezoids(x)
    
    acc = 0.0  # the accumulator for the integral
    for i in range(0, n) :  # the upper bound is exclusive
        left_val = g(sample_pt(x, n, i))
        right_val = g(sample_pt(x, n, i + 1))
        acc += width(x, n) * (left_val + right_val) / 2
    
    return acc

Since we aren't the ones computing the integral by hand, we are being somewhat wasteful, recomputing the function at the same points multiple times. We could be less wasteful, but it would make the code slightly less understandable. In practice, if we found this function were taking too much time to compute, we would be less wasteful, for instance by using the formula for the trapezoid rule in the book, or perhaps by switching to Simpson's rule or another technique altogether.
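For instance, the formula in the book, $T_n=\frac{\Delta x}{2}\left(g(t_0)+2g(t_1)+\cdots+2g(t_{n-1})+g(t_n)\right)$, evaluates the function only once per sample point. A sketch of what that might look like follows (f_faster is a name of my choosing; the rest of the notebook continues to use f):

In [ ]:
def f_faster(x) :
    """Same integral as f, but each sample point is evaluated only once."""
    x = float(x)
    if x < 0 : raise ValueError("We are assuming x >= 0.")
    n = max(num_trapezoids(x), 1)  # num_trapezoids(0) is 0; avoid dividing by it
    dx = width(x, n)
    acc = (g(0.0) + g(x)) / 2      # the endpoints get weight 1/2
    for i in range(1, n) :         # the interior points get weight 1
        acc += g(sample_pt(x, n, i))
    return acc * dx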

Let us test this at $0$. Hopefully $f(0)=0$.

In [8]:
f(0)
Out[8]:
0.0

Next let us test $f(1)$. According to Mathematica (typing in N[Integrate[e^(-x^2)/sqrt(pi),{x,0,1}],11] in WolframAlpha), this should be, to eleven decimal places, $0.42135039647$. Let us check:

In [9]:
f(1)
Out[9]:
0.4213503964625907

And this matches to the ten decimal places we wanted!
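Incidentally, the standard library offers a cross-check (my addition): since $\operatorname{erf}(x)=\frac{2}{\sqrt{\pi}}\int_0^xe^{-t^2}\,dt$, we have $f(x)=\frac{1}{2}\operatorname{erf}(x)$, and math.erf is available in Python 2.7 and later.

In [ ]:
0.5 * math.erf(1)   # should agree with f(1) to within our error tolerance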

Now what about something like $f(10)$? First, how many trapezoids will this want?

In [10]:
num_trapezoids(10)
Out[10]:
13747855

Uh oh, that is quite a lot more trapezoids.

In [11]:
%time f(10)
CPU times: user 24.1 s, sys: 213 ms, total: 24.3 s
Wall time: 24.5 s
Out[11]:
0.499999999997392

This, rounded to 10 places, is $0.5$ (which agrees with WolframAlpha). But look how long it took: about 24 seconds. A first optimization is to stop evaluating the function twice at each sample point, which would bring the running time down to around 12 seconds. This might still not be fast enough. If we needed these integrals for a real purpose, we could break the integral up into intervals, for instance $[0,1]$ and $[1,10]$, then do the error analysis on each interval separately, since one can get better bounds on the second derivative on $[1,10]$ than on $[0,1]$. The keyword here is "adaptive," because the sampling density changes, adapting to characteristics of the function being integrated.
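As a sketch of the splitting idea (my own illustration; trapezoid and f_split are hypothetical names, not cells from the original notebook): on $[1,x]$ the quantity $(4t^2-2)e^{-t^2}$ peaks at $t=\sqrt{3/2}$ with value $4e^{-3/2}$, so the bound $K=4e^{-3/2}/\sqrt{\pi}\approx 0.50$ works there, compared with $(4\cdot 10^2+2)/\sqrt{\pi}\approx 226.8$ over all of $[0,10]$.

In [ ]:
def trapezoid(h, a, b, n) :
    """Trapezoid rule for h on [a, b] using n trapezoids."""
    dx = (b - a) / n
    acc = (h(a) + h(b)) / 2
    for i in range(1, n) :
        acc += h(a + i * dx)
    return acc * dx

def f_split(x) :
    """Hypothetical faster version of f for x >= 1: integrate over [0, 1] and
    [1, x] separately, giving each piece half of the error budget."""
    x = float(x)
    K1 = (4 * 1**2 + 2) / math.sqrt(math.pi)      # bound for |g''| on [0, 1], as before
    K2 = 4 * math.exp(-1.5) / math.sqrt(math.pi)  # |g''| peaks at t = sqrt(3/2) on [1, x]
    n1 = int(math.ceil(math.sqrt(K1 * 1.0**3 / (12 * (error / 2)))))
    n2 = max(int(math.ceil(math.sqrt(K2 * (x - 1)**3 / (12 * (error / 2))))), 1)
    return trapezoid(g, 0.0, 1.0, n1) + trapezoid(g, 1.0, x, n2)

For $x=10$ this calls for roughly $75{,}000$ trapezoids on $[0,1]$ and $780{,}000$ on $[1,10]$, a small fraction of the $13{,}747{,}855$ that the single-interval bound demanded.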

Just for kicks, let us build a table of $\frac{1}{\sqrt{\pi}}\int_{-\infty}^xe^{-t^2}\,dt$. Building tables was one of the original imagined uses of computers, so we will fulfill that dream. Notice that the integrand is an even function, so the integral from $-\infty$ to $0$ is half the integral over the whole line, namely $\frac{1}{2}$; adding our $f(x)$ gives $\frac{1}{\sqrt{\pi}}\int_{-\infty}^xe^{-t^2}\,dt=\frac{1}{2}+f(x)$. This function represents the probability that a real number selected according to the distribution will be less than $x$.

In [12]:
max_x = 3.0
table_entries = 60
xs = []
ys = []
print "   x |   f(x) + 1/2"
print "-----+-------------"
for i in range(table_entries+1) :
    x = max_x * i / table_entries
    intx = 0.5 + f(x)
    print "{0:0.2f} | {1:0.10f}".format(x, intx)
    xs.append(x); ys.append(intx) # save these for plotting
   x |   f(x) + 1/2
-----+-------------
0.00 | 0.5000000000
0.05 | 0.5281859888
0.10 | 0.5562314579
0.15 | 0.5839979856
0.20 | 0.6113512945
0.25 | 0.6381631950
0.30 | 0.6643133797
0.35 | 0.6896910267
0.40 | 0.7141961775
0.45 | 0.7377408598
0.50 | 0.7602499389
0.55 | 0.7816616831
0.60 | 0.8019280454
0.65 | 0.8210146636
0.70 | 0.8389005969
0.75 | 0.8555778168
0.80 | 0.8710504823
0.85 | 0.8853340288
0.90 | 0.8984541062
0.95 | 0.9104454036
1.00 | 0.9213503965
1.05 | 0.9312180530
1.10 | 0.9401025348
1.15 | 0.9480619215
1.20 | 0.9551569891
1.25 | 0.9614500641
1.30 | 0.9670039725
1.35 | 0.9718810981
1.40 | 0.9761425599
1.45 | 0.9798475128
1.50 | 0.9830525732
1.55 | 0.9858113666
1.60 | 0.9881741917
1.65 | 0.9901877925
1.70 | 0.9918952293
1.75 | 0.9933358356
1.80 | 0.9945452508
1.85 | 0.9955555150
1.90 | 0.9963952146
1.95 | 0.9970896668
2.00 | 0.9976611325
2.05 | 0.9981290480
2.10 | 0.9985102667
2.15 | 0.9988193035
2.20 | 0.9990685769
2.25 | 0.9992686417
2.30 | 0.9994284117
2.35 | 0.9995553665
2.40 | 0.9996557431
2.45 | 0.9997347099
2.50 | 0.9997965240
2.55 | 0.9998446698
2.60 | 0.9998819828
2.65 | 0.9999107561
2.70 | 0.9999328336
2.75 | 0.9999496890
2.80 | 0.9999624934
2.85 | 0.9999721719
2.90 | 0.9999794511
2.95 | 0.9999848985
3.00 | 0.9999889548

We may plot these data to see a graph which looks about as expected (as in, it is close to what we would have come up with using calculus-based techniques for graphing).

In [13]:
plt.plot(xs, ys)
plt.axis((-0.1,3.1,0.4,1.1))
plt.title('integral from -infty to x')
plt.show()

However, using symmetry properties of the integral, in particular that $f(x)$ is odd since $g$ is even, we may extend the graph to the negative side as follows:

In [14]:
xs2 = list(reversed([-x for x in xs])) + xs
ys2 = list(reversed([1-y for y in ys])) + ys
plt.plot(xs2, ys2)
plt.axis((-3,3,-0.1,1.1))
plt.title('integral from -infty to x, extended to negative side')
plt.show()

A simple integral

Next, a simple test: $\int_0^x\cos t\,dt$. We already know that this equals $\sin x$, but let us see if we can compute it numerically. For this we will just use the trapezoid rule. (I would use the left-endpoint rule, but there is no error estimate for it in the book.) The error bound is $\frac{Kx^3}{12n^2}$ for $n$ trapezoids, and since the absolute value of the second derivative of $\cos t$ is $|\cos t|\leq 1$, we see that $K=1$ is a suitable upper bound for the second derivative. Hence, to get an answer within some error $\varepsilon$, it is sufficient to have $n\geq \sqrt{\frac{x^3}{12\varepsilon}}$. With this, we write a function to compute the integral:

In [15]:
def integral_cos(x) :
    """Computes the integral of cos from 0 to x, within some precision
    defined by the global variable `error`."""
    
    x = float(x)
    if x < 0 : raise ValueError("We are assuming x >= 0")
    
    n = int(math.ceil(math.sqrt(x ** 3 / (12 * error))))
    
    acc = 0.0
    for i in range(0, n) :
        left_pt = sample_pt(x, n, i)
        right_pt = sample_pt(x, n, i + 1)
        acc += width(x, n) * (math.cos(left_pt) + math.cos(right_pt)) / 2
    return acc

Hopefully this evaluated at $\pi$ is $0$, since $\int_0^\pi\cos t\,dt=0$:

In [16]:
integral_cos(math.pi)
Out[16]:
1.2277911977040534e-16

And the distance between this and $0$ is less than $\varepsilon=10^{-10}$, so it seems correct. Let's make a graph on the interval $[0,2\pi]$:

In [17]:
x = np.linspace(0, 2*math.pi, 50)
plt.plot(x, np.vectorize(integral_cos)(x))
plt.title('integral_cos(x)')
plt.show()

That looks like $\sin(x)$ to me! Let's look at the error, that is, the absolute value of the difference between integral_cos and the actual $\sin(x)$ function.

In [18]:
x = np.linspace(0, 2*math.pi, 50)
plt.plot(x, np.abs(np.sin(x) - np.vectorize(integral_cos)(x)))
plt.title('|sin(x) - integral_cos(x)|')
plt.show()

The vertical scale is in units of $10^{-10}$, so all of the errors are well within $\varepsilon$ (except for a brief region near $0$ where the errors get fairly close to our maximum acceptable error). Though, to tell the whole story, we should take a look at how many trapezoids we are using for each value of $x$:

In [19]:
x = np.linspace(0, 2*math.pi, 50)
plt.plot(x, np.ceil(np.sqrt(x ** 3 / (12 * error))))
plt.title('number of trapezoids')
plt.show()

Computing $\pi$

The last example will be estimating $4\tan^{-1}1$, since this quantity equals $\pi$. Recall that the derivative of $4\tan^{-1} x$ is $\frac{4}{1+x^2}$, so $\pi=\int_0^1\frac{4}{1+x^2}\,dx$. Again, we will be using the trapezoidal rule. The second derivative of $\frac{4}{1+x^2}$ is $4\,\frac{6x^2-2}{(x^2+1)^3}$. Analyzing its critical points, we find that the point of greatest absolute value on $[0,1]$ is at $0$, where the second derivative equals $-8$. Hence, the number of trapezoids must satisfy $$\frac{8x^3}{12n^2}\leq\varepsilon,$$ which gives $n\geq\sqrt{\frac{8x^3}{12\varepsilon}}$.
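To spell out the critical-point analysis, the derivative of this second derivative is $$\frac{d}{dx}\left(4\,\frac{6x^2-2}{(x^2+1)^3}\right)=\frac{96x(1-x^2)}{(1+x^2)^4},$$ which is nonnegative on $[0,1]$, so the second derivative increases from $-8$ at $x=0$ to $2$ at $x=1$; in particular its absolute value never exceeds $8$ on $[0,1]$.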

For concreteness, when $x=1$ and $\varepsilon=10^{-10}$, then $n=81650$ will do. This is much better than the number of terms required for computing $\tan^{-1}(1)$ through integrating the infinite series for $1/(1+x^2)$: in that case, you would need to compute on the order of $10^{10}$ terms!
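To see where that $10^{10}$ estimate comes from: integrating the geometric series for $\frac{1}{1+x^2}$ term by term gives $$4\tan^{-1}1=4\sum_{k=0}^{\infty}\frac{(-1)^k}{2k+1}=4\left(1-\tfrac{1}{3}+\tfrac{1}{5}-\cdots\right),$$ and by the alternating series estimation theorem the error after $n$ terms is at most the magnitude of the next term, $\frac{4}{2n+1}$. Getting that below $10^{-10}$ requires $n$ on the order of $2\times 10^{10}$.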

Writing the trapezoid computation as a program:

In [24]:
def int_4atan(x) :
    """Computes the integral on [0,x] to get 4*atan.  Again computes to within `error`."""
    
    x = float(x)
    if x < 0 : raise ValueError("Expecting x >= 0")
    
    n = int(math.ceil(math.sqrt((8 * x**3) / (12 * error))))
    
    acc = 0.0
    for i in range(0, n) :
        left = 4 / (1 + sample_pt(x, n, i) ** 2)
        right = 4 / (1 + sample_pt(x, n, i+1) ** 2)
        acc += width(x, n) * (left + right) / 2
    return acc

And it really does give $\pi$ to ten places after the decimal point:

In [25]:
int_4atan(1)
Out[25]:
3.141592653564764
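As one last sanity check (an addition of mine, not an original cell), we can compare against the math module's value of $\pi$ directly; the difference should be smaller than $\varepsilon=10^{-10}$.

In [ ]:
abs(int_4atan(1) - math.pi)   # about 2.5e-11 given the value above, comfortably below `error`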

Exercises

  1. Write some equivalent programs for computing these using the midpoint and/or Simpson's rules instead.
  2. Make a logarithm table for numbers from $1$ to $e$ using $\ln x=\int_1^x\frac{dt}{t}$. Bonus: make a table for $\log_{10} x$ from $1$ to $10$. Make sure you don't use math.log or math.log10, otherwise it is sort of cheating.
  3. Compute some interesting integrals.
  4. Try breaking up the interval and computing bounds on the second derivative on each subinterval so that you need fewer trapezoids.
  5. Super bonus: Make an adaptive integrator which makes use of a function which can find an upper bound of the absolute value of the second derivative on a given interval. Hint: use the trapezoid rule when $n=1$, and if the error is too great for the sub-interval, recurse by breaking the integral into two, and allow each sub-integral half of the error.
  6. To understand the error bounds for the midpoint and trapezoidal rules, use the mean value theorem to prove the inequality in the book when $n=1$.
  7. How might you use the inequality $T_n\leq \int_a^b f(t)\,dt\leq M_n$ for a function $f$ which is concave-down on $[a,b]$ to improve the estimates? (Hint: consider inflection points.)