How to Choose Knots

This example shows how to select and optimize knots using the optknt and newknt commands from Curve Fitting Toolbox™.

Sample Data

Here are some sample data, much used for testing spline approximation with variable knots, the so-called Titanium Heat Data. They record some property of titanium measured as a function of temperature.

[xx,yy] = titanium;
plot(xx,yy,'x');
axis([500 1100 .55 2.25]);
title('The Titanium Heat Data');
hold on

Figure contains an axes. The axes with title The Titanium Heat Data contains an object of type line.

Notice the rather sharp peak. We'll use these data to illustrate some methods for knot selection.

First, we pick a few data points from these somewhat rough data. We will interpolate using this subset, then compare results to the full dataset.

pick = [1 5 11 21 27 29 31 33 35 40 45 49];
tau = xx(pick);
y = yy(pick);
plot(tau,y,'ro');
legend({'Full Dataset' 'Subsampled Data'}, 'location','NW');

Figure contains an axes. The axes with title The Titanium Heat Data contains 2 objects of type line. These objects represent Full Dataset, Subsampled Data.

General Considerations

A spline of order k with n+k knots has n degrees of freedom. Since we have 12 data sites, tau(1) < ... < tau(12), a fit with a cubic spline, i.e., a fourth order spline, requires a knot sequence t of length 12+4.

Moreover, the knot sequence t must satisfy the Schoenberg-Whitney conditions, i.e., must be such that the i-th data site lies in the support of the i-th B-spline, i.e.,

t(i) < tau(i) < t(i+k) for all i,

with equality allowed only in case of a knot of multiplicity k.

One way to choose a knot sequence satisfying all these conditions is as the optimal knots, of Gaffney/Powell and Micchelli/Rivlin/Winograd.

Optimal Knots

In optimal spline interpolation, to values at sites

tau(1), ..., tau(n)

say, the knots are chosen so as to minimize the constant in a standard error formula. Specifically, the first and the last data site are chosen as k-fold knots. The remaining n-k knots are supplied by optknt.

Here is the beginning of the help from optknt:

OPTKNT Optimal knot distribution.

OPTKNT(TAU,K) returns an `optimal' knot sequence for

interpolation at data sites TAU(1), ..., TAU(n) by splines of

order K. TAU must be an increasing sequence, but this is not

checked.

OPTKNT(TAU,K,MAXITER) specifies the number MAXITER of iterations

to be tried, the default being 10.

The interior knots of this knot sequence are the n-K

sign-changes in any absolutely constant function h ~= 0 that

satisfies

integral{ f(x)h(x) : TAU(1) < x < TAU(n) } = 0

for all splines f of order K with knot sequence TAU.

Trying OPTKNT

We try using optknt for interpolation on our example, interpolating by cubic splines to data

(tau(i), y(i)), for i = 1, ..., n.

k = 4;
osp = spapi( optknt(tau,k), tau,y);

fnplt(osp,'r');
hl = legend({'Full Dataset' 'Subsampled Data' ...
        'Cubic Spline Interpolant Using Optimal knots'}, ...
        'location','NW');
hl.Position = hl.Position-[.14,0,0,0];

Figure contains an axes. The axes with title The Titanium Heat Data contains 3 objects of type line. These objects represent Full Dataset, Subsampled Data, Cubic Spline Interpolant Using Optimal knots.

This is a bit disconcerting!

Here, marked by stars, are also the interior optimal knots:

xi = fnbrk(osp,'knots'); 
xi([1:k end+1-(1:k)]) = [];
plot(xi,repmat(1.4, size(xi)),'*');
hl = legend({'Full Dataset' 'Subsampled Data' ...
        'Cubic Spline Interpolant Using Optimal knots' ...
        'Optimal Knots'}, 'location','NW');
hl.Position = hl.Position-[.14,0,0,0];

Figure contains an axes. The axes with title The Titanium Heat Data contains 4 objects of type line. These objects represent Full Dataset, Subsampled Data, Cubic Spline Interpolant Using Optimal knots, Optimal Knots.

What Happened?

The knot choice for optimal interpolation is designed to make the maximum over all functions f of the ratio

norm(f - If) / norm(D^k f)

as small as possible, where the numerator is the norm of the interpolation error, f - If, and the denominator is the norm of the k-th derivative of the interpolant, D^k f. Since our data imply that D^k f is rather large, the interpolation error near the flat part of the data is of acceptable size for such an `optimal' scheme.

Actually, for these data, the ordinary cubic spline interpolant provided by csapi does quite well:

cs = csapi(tau,y);
fnplt(cs,'g',2);
hl = legend({'Full Dataset' 'Subsampled Data' ...
        'Cubic Spline Interpolant Using Optimal knots' ...
        'Optimal Knots' 'Cubic Spline Interpolant Using CSAPI'}, ...
        'location','NW');
hl.Position = hl.Position-[.14,0,0,0];
hold off

Figure contains an axes. The axes with title The Titanium Heat Data contains 5 objects of type line. These objects represent Full Dataset, Subsampled Data, Cubic Spline Interpolant Using Optimal knots, Optimal Knots, Cubic Spline Interpolant Using CSAPI.

Knot Choice for Least Squares Approximation

Knots must be selected when doing least-squares approximation by splines. One approach is to use equally-spaced knots to begin with, then use newknt with the approximation obtained for a better knot distribution.

The next sections illustrate these steps with the full titanium heat data set.

Least Squares Approximation with Uniform Knot Sequence

We start with a uniform knot sequence.

unif = linspace(xx(1), xx(end), 2+fix(length(xx)/4));
sp = spap2(augknt(unif, k), k, xx, yy);
plot(xx,yy,'x');
hold on
fnplt(sp,'r');
axis([500 1100 .55 2.25]);
title('The Titanium Heat Data');
hl = legend({'Full Dataset' ...
        'Least Squares Cubic Spline Using Uniform Knots'}, ...
        'location','NW');
hl.Position = hl.Position-[.14,0,0,0];

Figure contains an axes. The axes with title The Titanium Heat Data contains 2 objects of type line. These objects represent Full Dataset, Least Squares Cubic Spline Using Uniform Knots.

This is not at all satisfactory. So we use newknt for a spline approximation of the same order and with the same number of polynomial pieces, but the breaks better distributed.

Using NEWKNT to Improve the Knot Distribution

spgood = spap2(newknt(sp), k, xx,yy);
fnplt(spgood,'g',1.5);
hl = legend({'Full Dataset' ...
        'Least Squares Cubic Spline Using Uniform Knots' ...
        'Least Squares Cubic Spline Using NEWKNT'}, ...
        'location','NW');
hl.Position = hl.Position-[.14,0,0,0];
hold off

Figure contains an axes. The axes with title The Titanium Heat Data contains 3 objects of type line. These objects represent Full Dataset, Least Squares Cubic Spline Using Uniform Knots, Least Squares Cubic Spline Using NEWKNT.

This is quite good. Incidentally, even one interior knot fewer would not have sufficed in this case.