Question and Answers Forum


Question Number 5117 by FilupSmith last updated on 14/Apr/16

I have a question. I am unsure how this is done because I have never learnt it.

How do you determine the line of best fit?


Commented by Yozzii last updated on 14/Apr/16

The least squares method is one way. Given a set of $n$ points $(x_i, y_i)$, suppose the line of best fit has the form $y = mx + c$. For each $x_i$, the line predicts the fitted value $\hat{y}_i = mx_i + c$. The least squares method requires that, for the regression line of $y$ on $x$, we minimise the quantity

$$Q = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - mx_i - c)^2.$$

That is, we minimise the sum of the squared vertical distances between each observed $y_i$ and the point $\hat{y}_i$ on the line corresponding to $x_i$. Since the $(x_i, y_i)$ are known, $Q$ is a function of the two variables $m$ and $c$, so we can use techniques of multivariable calculus to find $m$ and $c$. Because $Q$ is of a quadratic form, its stationary value is a minimum; in 3D space the locus of points $(m, c, Q)$ is a bowl-shaped surface with $Q \geq 0$.

Differentiating,

$$\frac{\partial Q}{\partial m} = \sum_{i=1}^{n} (-2x_i)(y_i - mx_i - c) = 2m\sum_{i=1}^{n} x_i^2 + 2c\sum_{i=1}^{n} x_i - 2\sum_{i=1}^{n} x_i y_i$$

and

$$\frac{\partial Q}{\partial c} = \sum_{i=1}^{n} (-2)(y_i - mx_i - c) = 2m\sum_{i=1}^{n} x_i + 2cn - 2\sum_{i=1}^{n} y_i.$$

At the stationary point, $\partial Q/\partial m = 0$ and $\partial Q/\partial c = 0$. Eliminating $c$ between these two equations gives

$$m = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}.$$

It can be shown that $(\bar{x}, \bar{y}) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i, \frac{1}{n}\sum_{i=1}^{n} y_i\right)$, the centroid of all the points $(x_i, y_i)$, lies on the line of best fit, so $c$ can be found from $c = \bar{y} - m\bar{x}$. The resulting line $y = mx + c$ is the least squares line of best fit. The slope $m$ also has the equivalent form

$$m = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}.$$
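As a quick illustration of the formulas above, here is a minimal Python sketch. The function name `least_squares_line` and the sample data points are my own, chosen only to demonstrate the computation; they are not from the original answer.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (m, c) for the least-squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Slope: m = (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
    m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (
        n * np.sum(x * x) - np.sum(x) ** 2
    )
    # The line passes through the centroid (x̄, ȳ), so c = ȳ - m*x̄.
    c = np.mean(y) - m * np.mean(x)
    return m, c

# Made-up points scattered around y = 2x + 1.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
m, c = least_squares_line(x, y)
print(f"y = {m:.3f}x + {c:.3f}")  # gives m ≈ 1.990, c ≈ 1.040
```

The result can be cross-checked against `np.polyfit(x, y, 1)`, which fits a degree-1 polynomial by least squares and returns the same slope and intercept.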

