LINEAR REGRESSION

Standard Calculation Formulas

For linear models y = mx + b the slope m and y-intercept b of the best fitting line to the observed data have traditionally been computed from formulas. These are derived as follows. SSE is the square of the distance between Y and Y_HAT in n dimensional space as defined in the page on curve fitting.

> restart:with(student):

> SSE = Sum((Y[i]-Y_HAT[i])^2,i=1..n);

[Maple Math]

> Y_HAT[i]=m*X[i]+b;`Substituting gives`;SSE = Sum((Y[i]-m*X[i]-b)^2,i=1..n);

[Maple Math]

[Maple Math]

[Maple Math]

> SSE = Sum(expand((Y[i]-m*X[i]-b)^2),i=1..n);

[Maple Math]

> SSE = Sum(Y[i]^2,i=1..n) - 2*m*Sum(X[i]*Y[i],i=1..n) -2*b*Sum(Y[i],i=1..n) +m^2* Sum(X[i]^2,i=1..n) + 2*m*b*Sum(X[i],i=1..n) + n*b^2;

[Maple Math]

At this point SSE has not been defined to be a function. Unapply is a strange command, but basically the command says: Make SSE the function of two variable such that SSE(m,b) is the right hand side of the previous output where m and b are regarded as the variables and all else is constant. We need to make SSE a function so that we can (partially) differentiate it.

> SSE:=unapply(rhs(%),m,b);

[Maple Math]

> eq1:=diff(SSE(m,b),m)=0;

[Maple Math]

> eq2:=diff(SSE(m,b),b)=0;

[Maple Math]

This command sets the partial derivatives wrt m and b equal to zero and solves for m and b.

> solve({eq1,eq2},{m,b});

[Maple Math]

> solve({eq2},{b});

[Maple Math]

> %%[2],%[1];

[Maple Math]

It is easier to use the last expressions for m and b.

Simple Example

> X:=[1.,3.,2.,4.];

[Maple Math]

> Y:=[3.,8.,3.,10.];

[Maple Math]

> XX:=[seq(X[i]^2,i=1..4)];

>

[Maple Math]

> YY:=[seq(Y[i]^2,i=1..4)];

[Maple Math]

> XY:=[seq(X[i]*Y[i],i=1..4)];

[Maple Math]

> Sum(x[i],i=1..4)=sum(X[i],i=1..4);SUMX:=rhs(%);

[Maple Math]

[Maple Math]

> Sum(y[i],i=1..4)=sum(Y[i],i=1..4);SUMY:=rhs(%);

[Maple Math]

[Maple Math]

> Sum(x[i]^2,i=1..4)=sum(XX[i],i=1..4);SUMXX:=rhs(%);

[Maple Math]

[Maple Math]

> Sum(y[i]^2,i=1..4)=sum(YY[i],i=1..4);SUMYY:=rhs(%);

[Maple Math]

[Maple Math]

> Sum(x[i]*y[i],i=1..4)=sum(XY[i],i=1..4);SUMXY:=rhs(%);

[Maple Math]

[Maple Math]

> printf(" X Y XX YY XY"); printf("----------------------------------------------------------------------");for j from 1 to 4 do printf("%7.0f %7.0f %7.0f %7.0f %7.0f",X[j],Y[j],XX[j],YY[j],XY[j]);print():od;printf("----------------------------------------------------------------------");printf("%7.0f %7.0f %7.0f %7.0f %7.0f",SUMX,SUMY,SUMXX,SUMYY,SUMXY);print():

>

      X             Y                XX               YY         XY

----------------------------------------------------------------------

      1             3                1                9          3

      3             8                9               64         24

      2             3                4                9          6

      4            10               16              100         40

----------------------------------------------------------------------

     10            24               30              182         73

> n:=4:

> m:=(n*SUMXY - SUMX*SUMY)/(n*SUMXX - SUMX*SUMX);

[Maple Math]

> b:=SUMY/n - m*SUMX/n;

[Maple Math]

Confession

Actually Maple will do the simple example with one command.

>

> with(stats):with(fit):leastsquare[{x,y},y=M*x+B,{M,B}]([X,Y]);

[Maple Math]

>