CURVE FITTING

Method of Complete Coverage

Data for bears from
Triola's Elementary Statistics, 8th ed
page 521

The problem concerns whether one can fairly accurately determine the weight of a bear by measuring the circumference of its chest, which would be much easier than hauling scales into the woods to weigh it. The complete data set for the bears is in Appendix B in Triola, and consists of measurements of chest sizes and weight of 54 bears together with other measurements. Studies of this type should always have as large a data base as possible. For illustrative purposes only, we will use a small data set of 8 bears, though this is not large enough to be meaningful in a real study. We shall refer to the measurements of these 8 bears as Observed Chest Size X and Observed Weight Y. Probably the trickiest thing about curve fitting is to decide which family of curves to use. In this example we shall use the simplest family, straight lines, whose generic formula is y = mx + b, m and b being the parameters of the family. This is ad hoc. In some curve fitting problems, there are theoretical reasons to choose one family as opposed to another.

> restart:with(plots):with(stats):with(fit):

> `Observed chest sizes in inches`;X:=[26., 45., 54., 49., 41., 49., 44., 19];

[Maple Math]

[Maple Math]

> `Observed weights in pounds`;Y:=[90., 344., 416., 348., 262., 360., 332., 34.];

[Maple Math]

[Maple Math]

> `Choose a family of curves to fit (often refered to as the model)`;`Linear model`;f:=x -> m*x+b;

[Maple Math]

[Maple Math]

[Maple Math]

The key to curve fitting is to determine which member of the family best fits the observed data. If we substitute the chest sizes into the model we can calculate what the predicted (or computed) weights would be, but since the parameters are not yet determined, these predicted weights come out in terms of the parameters. This gives in this case 8 predicted weights denoted by Y_HAT[i], i = 1..8.

> Y_HAT:=[seq(f(X[i]),i=1..8)];

[Maple Math]

Since the observed weights Y[i] and also the predicted weights Y_HAT[i], i=1..8 have 8 coordinates, we can regard them as points in 8 dimensional space. What we would like to do is to choose the values on the parameter m and b which make Y and Y_HAT as close together as possible in 8 dimensional space. The distance formula for 8 dimensions is an extension of the distance formula for 2 and 3 dimensional space. For points U and V in 3 dimensions, this is defined as:

> dist(U,V) = sqrt(Sum((U[k]-V[k])^2,k=1..3));

[Maple Math]

In 8 dimensions

> dist(U,V) = sqrt(Sum((U[k]-V[k])^2,k=1..8));

[Maple Math]

> `Distance in 8 dim space between points Y and Y_HAT`;sqrt(Sum((y[j]-y_hat[j])^2,j=1..8));

[Maple Math]

[Maple Math]

> `The distance between Y and Y_HAT is clearly a function of the parameters m and b`;DISTANCE:= unapply(sqrt(sum((Y[j]-Y_HAT[j])^2,j=1..8)),m,b);

[Maple Math]

[Maple Math]
[Maple Math]

The goal is to choose the values of m and b that makes the distance between Y and Y_HAT as small as possible.

That is, we wish to find the values of m and b that minimizes DISTANCE(m,b). Let us draw a contourplot for DISTANCE(m,b). This plot will be in the m,b plane, which might be referred to as Parameter Space.

> graph1:=contourplot(DISTANCE(m,b),m=9..14,b=-350..-100,contours=[600,400,200,150,125,100,80,70,60,50,45],numpoints=10000,title= `Contours of DISTANCE(m,b) in Parameter Space`):

> graph2:=implicitplot({m=11.27,b=-187.46},m=9..14, b=-350..-100,color=green):

> display(graph1,graph2);

[Maple Plot]

The contours are at levels 600, 400, 200, 150, 125, 100 , 80, 70, 60, 50 and 45, the inner contour being at level 45. Clearly there is a local minimum of DISTANCE(m,b) near m = 11.3 and b = -190 indicated by the green cross.

We have shown previously by analytic methods that the mininmum occurs at m = 11.27, b=-187.5. With nonlinear models, the analytic method of finding the minimum may be extremely difficult or impossible. A numerical method that is easy to understand and works reasonably well even for nonlinear models will be demonstrated on this linear model for which we already know the answer. This method consists of calculating DISTANCE(m,b) at a grid of points overlying parameter space and determining for which of these grid points DISTANCE(m,b) is smallest. This becomes more accurate as the grid becomes finer, but requires more computations. Called the Method of Complete Coverage, this method does not really cover all of parameter space, which in this example is the m,b plane. One needs a preliminary estimate of the parameters in order to know which part of parameter space to cover. This is usually a rectangle in the plane for models with two parameters and a rectangular solid in three space for a model with three parameters. Even the selected part of parameter space is not covered completely, because there are infinitely many points in it. However if the grid is fine enough, the values of the parameters that minimize DISTANCE can be computed by this method for a required number of significant digits, and in this sense the rectangle or rectangular solid has been covered completely for practical purposes. If there are four parameters, the region can still be covered, it just cannot be drawn. However the number of calculations increases rapidly with the number of parameters to be evaluated.

> k:=1:for M from 9 to 14 by .5 do for B from -340 to -100 by 20 do Point[k]:=[M,B];k:=k+1;od:od:

> grid:=[seq(Point[k],k=1..143)]:

> graph3:=plot(grid,style=point,symbol=circle,color=blue,axes=boxed):

> display(graph1,graph3);

[Maple Plot]

> DISTANCE_MIN:=10^20:for M from 9 to 14 by .5 do for B from -340 to -100 by 20 do if DISTANCE(M,B)<DISTANCE_MIN then DISTANCE_MIN:=DISTANCE(M,B);m_MIN:=M;b_MIN:=B;fi;od;od;

> print(m_MIN,b_MIN,DISTANCE_MIN);

[Maple Math]

The above graph shows contours of DISTANCE with parameter space covered by a not very fine grid. Evaluation of DISTANCE at each of the 143 grid points finds the smallest value of DISTANCE at m = 11.5, b = -200.

> k:=1:for M from 9 to 14 by .1 do for B from -340 to -100 by 5 do Point[k]:=[M,B];k:=k+1;od:od:

> grid:=[seq(Point[k],k=1..2499)]:

> graph4:=plot(grid,style=point,symbol=point,color=blue,axes=boxed):

> display(graph1,graph4);

[Maple Plot]

> DISTANCE_MIN:=10^20:for M from 9 to 14 by .1 do for B from -340 to -100 by 5 do if DISTANCE(M,B)<DISTANCE_MIN then DISTANCE_MIN:=DISTANCE(M,B);m_MIN:=M;b_MIN:=B;fi;od;od;

> print(m_MIN,b_MIN,DISTANCE_MIN);

[Maple Math]

The much finer grid shown here finds the minimum of DISTANCE at m = 11.2, b= -185.

DISTANCE_MIN:=10^20:
for M from 9 to 14 by .01 do
for B from -340 to -100 by .1 do
*** if DISTANCE(M,B)<DISTANCE_MIN then
*** *** DISTANCE_MIN:=DISTANCE(M,B);
*** *** m_MIN:=M;
*** *** b_MIN:=B;
*** fi;
od;
od;
print(m_MIN,b_MIN,DISTANCE_MIN);

[Maple Math]

The grid is not shown for this computation. It consists of 1 202 901 grid points. This grid requires several minutes for the computation of DISTANCE for all the grid points, but gives the values of m and b correct to 4 significant figures: m = 11.27, b = -187.4.

Maple does not lend itself to writing programs in good style. I have rewritten the above in html code showing how do loops and if-then structures work.

It is clear that had it been necessary, we could have covered a much smaller rectangle, thereby decreasing the number of required computations.