Cubic Smoothing Splines

This example shows how to use the csaps and spaps commands from Curve Fitting Toolbox™ to construct cubic smoothing splines.

The CSAPS Command

The command csaps provides the smoothing spline. This is a cubic spline that more or less follows the presumed underlying trend in noisy data. A smoothing parameter, to be chosen by you, determines just how closely the smoothing spline follows the given data. Here is the basic information, an abbreviated version of the documentation:

CSAPS Cubic smoothing spline.

VALUES = CSAPS(X, Y, P, XX)

Returns the values at XX of the cubic smoothing spline for the

given data (X,Y) and depending on the smoothing parameter P, chosen

from the interval [0 .. 1]. This smoothing spline f minimizes

P * sum_i W(i)(Y(i) - f(X(i)))^2 + (1-P) * integral (D^2 f)^2

Example: Noisy Data From a Cubic Polynomial

Here are some trial runs. We start with data from a simple cubic, q(x) := x^3, contaminate the values with some noise, and choose the value of the smoothing parameter to be .5. Then plot the resulting smoothed values, along with the underlying cubic, and the contaminated data.

xi = (0:.05:1);
q = @(x) x.^3;
yi = q(xi);
randomStream = RandStream.create( 'mcg16807', 'Seed', 23 );
ybad = yi+.3*(rand(randomStream, size(xi))-.5);
p = .5;
xxi = (0:100)/100;
ys = csaps(xi,ybad,p,xxi);
plot(xi,yi,':',xi,ybad,'x',xxi,ys,'r-')
title('Clean Data, Noisy Data, Smoothed Values')
legend( 'Exact', 'Noisy', 'Smoothed', 'Location', 'NorthWest' )

Figure contains an axes. The axes with title Clean Data, Noisy Data, Smoothed Values contains 3 objects of type line. These objects represent Exact, Noisy, Smoothed.

The smoothing is way overdone here. By choosing the smoothing parameter p closer to 1, we obtain a smoothing spline closer to the given data. We try p = .6, .7, .8, .9, 1, and plot the resulting smoothing splines.

yy = zeros(5,length(xxi));
p = [.6 .7 .8 .9 1];
for j=1:5
   yy(j,:) = csaps(xi,ybad,p(j),xxi);
end
hold on
plot(xxi,yy);
hold off
title('Smoothing Splines for Various Values of the Smoothing Parameter')
legend({'Exact','Noisy','p = 0.5','p = 0.6','p = 0.7','p = 0.8', ...
        'p = 0.9', 'p = 1.0'}, 'Location', 'NorthWest' )

Figure contains an axes. The axes with title Smoothing Splines for Various Values of the Smoothing Parameter contains 8 objects of type line. These objects represent Exact, Noisy, p = 0.5, p = 0.6, p = 0.7, p = 0.8, p = 0.9, p = 1.0.

We see that the smoothing spline can be very sensitive to the choice of the smoothing parameter. Even for p = 0.9, the smoothing spline is still far from the underlying trend, while for p = 1, we get the interpolant to the (noisy) data.

In fact, the formulation used by csapi (p.235ff of A Practical Guide to Splines) is very sensitive to scaling of the independent variable. A simple analysis of the equations used shows that the sensitive range for p is around 1/(1+epsilon), with epsilon := h^3/16, and h the average difference between neighboring sites. Specifically, you would expect a close following of the data when p = 1/(1+epsilon/100) and some satisfactory smoothing when p = 1/(1+epsilon*100).

The plot below shows the smoothing spline for values of p near this magic number 1/(1+epsilon). For this case, it is more informative to look at 1-p since the magic number, 1/(1+epsilon), is very close to 1.

epsilon = ((xi(end)-xi(1))/(numel(xi)-1))^3/16;
1 - 1/(1+epsilon)
ans = 7.8124e-06
plot(xi,yi,':',xi,ybad,'x')
hold on
labels = cell(1,5);
for j=1:5
   p = 1/(1+epsilon*10^(j-3));
   yy(j,:) = csaps(xi,ybad,p,xxi);
   labels{j} = ['1-p= ',num2str(1-p)];
end
plot(xxi,yy)
title('Smoothing Splines for Smoothing Parameter Near Its ''Magic'' Value')
legend( [{'Exact', 'Noisy'}, labels], 'Location', 'NorthWest' )
hold off

Figure contains an axes. The axes with title Smoothing Splines for Smoothing Parameter Near Its 'Magic' Value contains 7 objects of type line. These objects represent Exact, Noisy, 1-p= 7.8125e-08, 1-p= 7.8125e-07, 1-p= 7.8124e-06, 1-p= 7.8119e-05, 1-p= 0.00078064.

In this example, the smoothing spline is very sensitive to variation of the smoothing parameter near the magic number. The one farthest from 1 seems the best choice from these, but you may prefer the one beyond that.

p = 1/(1+epsilon*10^3);
yy = csaps(xi,ybad,p,xxi);
hold on
plot( xxi, yy, 'y', 'LineWidth', 2 )
title( sprintf( 'The Smoothing Spline For 1-p = %s is Added, in Yellow', num2str(1-p) ) )
hold off

Figure contains an axes. The axes with title The Smoothing Spline For 1-p = 0.0077519 is Added, in Yellow contains 8 objects of type line. These objects represent Exact, Noisy, 1-p= 7.8125e-08, 1-p= 7.8125e-07, 1-p= 7.8124e-06, 1-p= 7.8119e-05, 1-p= 0.00078064.

You can also supply csaps with error weights, to pay more attention to some data points than others. Also, if you do not supply the evaluation sites xx, then csaps returns the ppform of the smoothing spline.

Finally, csaps can also handle vector-valued data and even multivariate, gridded data.

The SPAPS Command

The cubic smoothing spline provided by the command spaps differs from the one constructed in csaps only in the way it is selected. Here is an abbreviated version of the documentation for spaps:

SPAPS Smoothing spline.

[SP,VALUES] = SPAPS(X,Y,TOL) returns the B-form and, if asked,

the values at X, of a cubic smoothing spline f for the given

data (X(i),Y(:,i)), i=1,2, ..., n.

The smoothing spline f minimizes the roughness measure

F(D^2 f) := integral ( D^2 f(t) )^2 dt on X(1) < t < X(n)

over all functions f for which the error measure

E(f) := sum_j { W(j)*( Y(:,j) - f(X(j)) )^2 : j=1,...,n }

is no bigger than the given TOL. Here, D^M f denotes the M-th

derivative of f. The weights W are chosen so that E(f) is the

Composite Trapezoid Rule approximation for F(y-f).

f is constructed as the unique minimizer of

rho*E(f) + F(D^2 f),

with the smoothing parameter RHO so chosen so that E(f) equals

TOL. Hence, FN2FM(SP,'pp') should be (up to roundoff) the same

as the output from CPAPS(X,Y,RHO/(1+RHO)).

Tolerance vs. Smoothing Parameter

It may be easier to supply a suitable tolerance for spaps than the smoothing parameter p required by csaps. In our earlier example, we added uniformly-distributed random noise from the interval 0.3*[-0.5 .. 0.5]. Hence, we can estimate a reasonable value for tol as the value of the error measure e at such noise.

tol = sum((.3*(rand(randomStream, size(yi))-.5)).^2);

This plot shows the resulting smoothing spline constructed by spaps. Note that the error weights are specified to be uniform, which is their default value in csaps.

[sp,ys,rho] = spaps(xi,ybad,tol,ones(size(xi)));
plot(xi,yi,':',xi,ybad,'x',xi,ys,'r-')
title( sprintf( 'Clean Data, Noisy Data, Smoothed Values (1-p = %s )', num2str(1/(1+rho)) ) );
legend( {'Exact','Noisy','Smoothed'}, 'location', 'NorthWest' )

Figure contains an axes. The axes with title Clean Data, Noisy Data, Smoothed Values (1-p = 0.013761 ) contains 3 objects of type line. These objects represent Exact, Noisy, Smoothed.

The figure title shows the value of p you would use in csaps to obtain exactly this smoothing spline for these data.

Here, in addition, is the smoothing spline provided by csaps when not given a smoothing parameter. In this case csaps chooses the parameter by a certain ad hoc procedure that attempts to locate the region where the smoothing spline is most sensitive to the smoothing parameter (similar to the earlier discussion).

hold on
plot(xxi,fnval(csaps(xi,ybad),xxi),'-')
title('Clean Data, Noisy Data, Smoothed Values')
legend({'Exact' 'Noisy' 'spaps, specified tolerance' ...
        'csaps, default smoothing parameter'}, 'Location', 'NorthWest' )
hold off

Figure contains an axes. The axes with title Clean Data, Noisy Data, Smoothed Values contains 4 objects of type line. These objects represent Exact, Noisy, spaps, specified tolerance, csaps, default smoothing parameter.

CSAPS vs. SPAPS

The csaps and spaps commands differ in the way in which you specify a particular smoothing spline, via a smoothing parameter vs. a tolerance. Another difference is that spaps can provide a linear or a quintic smoothing spline, in addition to the cubic smoothing spline.

The quintic smoothing spline is better than the cubic smoothing spline in the situation when you would like the second derivative to move as little as possible.