LINEAR REGRESSION
Standard Calculation Formulas
For linear models y = mx + b the slope m and y-intercept b of the best fitting line to the observed data have traditionally been computed from formulas. These are derived as follows. SSE is the square of the distance between Y and Y_HAT in n dimensional space as defined in the page on curve fitting.
> restart:with(student):
> SSE = Sum((Y[i]-Y_HAT[i])^2,i=1..n);
> Y_HAT[i]=m*X[i]+b;`Substituting gives`;SSE = Sum((Y[i]-m*X[i]-b)^2,i=1..n);
> SSE = Sum(expand((Y[i]-m*X[i]-b)^2),i=1..n);
> SSE = Sum(Y[i]^2,i=1..n) - 2*m*Sum(X[i]*Y[i],i=1..n) -2*b*Sum(Y[i],i=1..n) +m^2* Sum(X[i]^2,i=1..n) + 2*m*b*Sum(X[i],i=1..n) + n*b^2;
At this point SSE has not been defined to be a function. Unapply is a strange command, but basically the command says: Make SSE the function of two variable such that SSE(m,b) is the right hand side of the previous output where m and b are regarded as the variables and all else is constant. We need to make SSE a function so that we can (partially) differentiate it.
> SSE:=unapply(rhs(%),m,b);
> eq1:=diff(SSE(m,b),m)=0;
> eq2:=diff(SSE(m,b),b)=0;
This command sets the partial derivatives wrt m and b equal to zero and solves for m and b.
> solve({eq1,eq2},{m,b});
> solve({eq2},{b});
> %%[2],%[1];
It is easier to use the last expressions for m and b.
Simple Example
> X:=[1.,3.,2.,4.];
> Y:=[3.,8.,3.,10.];
> XX:=[seq(X[i]^2,i=1..4)];
>
> YY:=[seq(Y[i]^2,i=1..4)];
> XY:=[seq(X[i]*Y[i],i=1..4)];
> Sum(x[i],i=1..4)=sum(X[i],i=1..4);SUMX:=rhs(%);
> Sum(y[i],i=1..4)=sum(Y[i],i=1..4);SUMY:=rhs(%);
> Sum(x[i]^2,i=1..4)=sum(XX[i],i=1..4);SUMXX:=rhs(%);
> Sum(y[i]^2,i=1..4)=sum(YY[i],i=1..4);SUMYY:=rhs(%);
> Sum(x[i]*y[i],i=1..4)=sum(XY[i],i=1..4);SUMXY:=rhs(%);
> printf(" X Y XX YY XY"); printf("----------------------------------------------------------------------");for j from 1 to 4 do printf("%7.0f %7.0f %7.0f %7.0f %7.0f",X[j],Y[j],XX[j],YY[j],XY[j]);print():od;printf("----------------------------------------------------------------------");printf("%7.0f %7.0f %7.0f %7.0f %7.0f",SUMX,SUMY,SUMXX,SUMYY,SUMXY);print():
>
X Y XX YY XY
----------------------------------------------------------------------
1 3 1 9 3
3 8 9 64 24
2 3 4 9 6
4 10 16 100 40
----------------------------------------------------------------------
10 24 30 182 73
> n:=4:
> m:=(n*SUMXY - SUMX*SUMY)/(n*SUMXX - SUMX*SUMX);
> b:=SUMY/n - m*SUMX/n;
Confession
Actually Maple will do the simple example with one command.
>
> with(stats):with(fit):leastsquare[{x,y},y=M*x+B,{M,B}]([X,Y]);
>