Regression analysis. Least squares method in Excel. Regression analysis Least squares method in excel power function

4.1. Using built-in functions

Calculation regression coefficients carried out using the function

LINEST(Values_y; x-values; Const; statistics),

Values_y- array of y values,

x-values- optional array of values x, if array X is omitted, it is assumed that this is an array (1;2;3;...) of the same size as Values_y,

Const- a boolean value that indicates whether the constant is required b was equal to 0. If Const has the meaning TRUE or omitted, then b is calculated in the usual way. If the argument Const is FALSE, then b is assumed to be 0 and the values a are selected so that the relation is fulfilled y=ax.

Statistics is a boolean value that indicates whether additional regression statistics are required to be returned. If the argument Statistics has the meaning TRUE, then the function LINEST returns additional regression statistics. If the argument Statistics has the meaning LIE or omitted, then the function LINEST returns only the coefficient a and constant b.

It must be remembered that the result of the functions LINEST() is a set of values ​​– an array.

For calculation correlation coefficient function is used

CORREL(Array1;Array2),

returning the values ​​of the correlation coefficient, where Array1- array of values y, Array2- array of values x. Array1 And Array2 must be the same size.

EXAMPLE 1. Addiction y(x) is presented in the table. Build regression line and calculate correlation coefficient.

y 0.5 1.5 2.5 3.5
x 2.39 2.81 3.25 3.75 4.11 4.45 4.85 5.25

Let's enter a table of values ​​into a MS Excel sheet and build a scatter plot. The worksheet will take the form shown in Fig. 2.

In order to calculate the values ​​of regression coefficients A And b select the cells A7:B7, Let's go to the function wizard and in the category Statistical select a function LINEST. Let's fill in the dialog box that appears as shown in Fig. 3 and press OK.


As a result, the calculated value will appear only in the cell A6(Fig. 4). In order for the value to appear in the cell B6 you need to enter edit mode (key F2), and then press the key combination CTRL+SHIFT+ENTER.

To calculate the value of the correlation coefficient in a cell C6 the following formula was introduced:

C7=CORREL(B3:J3;B2:J2).

Knowing the regression coefficients A And b let's calculate the function values y=ax+b for given x. To do this, we introduce the formula

B5=$A$7*B2+$B$7

and copy it to the range C5:J5(Fig. 5).

Let's plot the regression line on the diagram. Select the experimental points on the graph, right-click and select the command Initial data. In the dialog box that appears (Fig. 5), select the tab Row and click on the button Add. Let's fill in the input fields as shown in Fig. 6 and press the button OK. A regression line will be added to the experimental data graph. By default, its graph will be drawn as points not connected by smoothing lines.



To change the appearance of the regression line, perform the following steps. Right-click on the points depicting the line graph and select the command Chart type and set the type of scatter diagram, as shown in Fig. 7.

The line type, color and thickness can be changed as follows. Select a line on the diagram, right-click and select the command in the context menu Data series format... Next, make settings, for example, as shown in Fig. 8.

As a result of all transformations, we obtain a graph of experimental data and a regression line in one graphical area (Fig. 9).

4.2. Using a trend line.

The construction of various approximating dependencies in MS Excel is implemented as a chart property - trend line.

EXAMPLE 2. As a result of the experiment, a certain tabular dependence was determined.

0.15 0.16 0.17 0.18 0.19 0.20
4.4817 4.4930 5.4739 6.0496 6.6859 7.3891

Select and construct an approximating dependence. Construct graphs of tabular and selected analytical dependences.

Solving the problem can be divided into the following stages: entering initial data, constructing a scatter plot and adding a trend line to this graph.

Let's look at this process in detail. Let's enter the initial data into the worksheet and plot the experimental data. Next, select the experimental points on the graph, right-click and use the command Add l trend line(Fig. 10).

The dialog box that appears allows you to build an approximating relationship.

The first tab (Fig. 11) of this window indicates the type of approximating dependence.

On the second (Fig. 12) the construction parameters are determined:

· name of the approximating dependence;

· forecast forward (backward) by n units (this parameter determines how many units forward (backward) the trend line needs to be extended);

whether to show the point of intersection of a curve with a straight line y=const;

· show the approximating function on the diagram or not (the option to show the equation on the diagram);

· whether to place the value of the standard deviation on the diagram or not (the option to place the value of the approximation reliability on the diagram).

Let us choose a polynomial of the second degree as an approximating dependence (Fig. 11) and display the equation that describes this polynomial on a graph (Fig. 12). The resulting diagram is shown in Fig. 13.

Similarly using trend lines you can select the parameters of such dependencies as

linear y=a∙x+b,

logarithmic y=a∙ln(x)+b,

· exponential y=a∙e b,

· sedate y=a∙x b,

polynomial y=a∙x 2 +b∙x+c, y=a∙x 3 +b∙x 2 +c∙x+d and so on, up to a polynomial of the 6th degree inclusive,

· linear filtration.

4.3. Using the options analysis tool: Finding a solution.

Of significant interest is the implementation in MS Excel of selecting functional dependence parameters using the method least squares using the options analysis tool: Finding a solution. This technique allows you to select the parameters of a function of any type. Let's consider this possibility using the following problem as an example.

EXAMPLE 3. As a result of the experiment, the dependence z(t) was obtained, presented in the table

0,66 0,9 1,17 1,47 1,7 1,74 2,08 2,63 3,12
38,9 68,8 64,4 66,5 64,95 59,36 82,6 90,63 113,5

Select dependence coefficients Z(t)=At 4 +Bt 3 +Ct 2 +Dt+K least squares method.

This problem is equivalent to the problem of finding the minimum of a function of five variables

Let's consider the process of solving the optimization problem (Fig. 14).

Let the values A, IN, WITH, D And TO stored in cells A7:E7. Let's calculate the theoretical values ​​of the function Z(t)=At 4 +Bt 3 +Ct 2 +Dt+K for given t(B2:J2). To do this, in the cell B4 enter the value of the function at the first point (cell B2):

B4=$A$7*B2^4+$B$7*B2^3+$C$7*B2^2+$D$7*B2+$E$7.

Let's copy this formula into the range C4:J4 and get the expected value of the function at the points whose abscissas are stored in the cells B2:J2.

To cell B5 Let's introduce a formula that calculates the square of the difference between the experimental and calculated points:

B5=(B4-B3)^2,

and copy it to the range C5:J5. In a cell F7 we will store the total squared error (10). To do this, enter the formula:

F7 = SUM(B5:J5).

Let's use the command Service®Search for a solution and solve the optimization problem without restrictions. Let's fill in the input fields in the dialog box shown in Fig. accordingly. 14 and press the button Execute. If a solution is found, the window shown in Fig. 15.

The result of the decision block will be output to cells A7:E7parameter values functions Z(t)=At 4 +Bt 3 +Ct 2 +Dt+K. In cells B4:J4 we get expected function value at the starting points. In a cell F7 will be stored total square error.

You can display experimental points and a fitted line in one graphic area by selecting a range B2:J4, call Chart Wizard and then format appearance received graphs.

Rice. 17 displays the MS Excel worksheet after the calculations have been performed.

The least squares method is a mathematical procedure for constructing linear equation, which would most closely match a set of two series of numbers. The purpose of using this method is to minimize the total square error. Excel has tools that can help you apply this method to your calculations. Let's figure out how this is done.

· Using the method in Excel

o Enabling the “Solution Search” add-on

o Problem conditions

o Solution

Using the method in Excel

The least squares method (LSM) is a mathematical description of the dependence of one variable on another. It can be used for forecasting.

Enabling the Find Solution add-on

In order to use MNC in Excel, you need to enable the add-in "Finding a solution", which is disabled by default.

1. Go to the tab "File".

2. Click on the section name "Options".

3. In the window that opens, select the subsection "Add-ons".

4. In the block "Control", which is located at the bottom of the window, set the switch to position "Excel Add-ins"(if it has a different value) and click on the button "Go...".

5. A small window opens. We put a tick next to the parameter "Finding a solution". Click on the button "OK".

Now the function Finding a solution in Excel is activated, and its tools appear on the ribbon.

Lesson: Finding a solution in Excel

Conditions of the problem

Let us describe the application of LSM on specific example. We have two rows of numbers x And y, the sequence of which is shown in the image below.

This dependence can be most accurately described by the function:

At the same time, it is known that when x=0 y also equal 0 . That's why given equation can be described by dependence y=nx.

We have to find the minimum sum of squares of the difference.

Solution

Let's move on to a description of the direct application of the method.

1. To the left of the first value x put a number 1 . This will be an approximate value of the first coefficient value n.

2. To the right of the column y add another column - nx. In the first cell of this column we write the formula for multiplying the coefficient n per cell of the first variable x. At the same time, we make the link to the field with the coefficient absolute, since this value will not change. Click on the button Enter.

3. Using the fill marker, copy this formula for the entire range of the table in the column below.

4. In a separate cell, calculate the sum of the differences between the squares of the values y And nx. To do this, click on the button "Insert Function".



5. In the opened "Function Wizard" looking for an entry "SUMMKVARNA". Select it and press the button "OK".

6. The arguments window opens. In field "Array_x" y. In field "Array_y" enter the range of column cells nx. In order to enter values, simply place the cursor in the field and select the corresponding range on the sheet. After entering, click on the button "OK".

7. Go to the tab "Data". On the ribbon in the toolbox "Analysis" click on the button "Finding a solution".

8. The parameters window for this tool opens. In field “Optimize the objective function” indicate the address of the cell with the formula "SUMMKVARNA". In the parameter "Before" be sure to set the switch to position "Minimum". In field "Changing Cells" indicate the address with the coefficient value n. Click on the button "Find a solution".

9. The solution will be displayed in the coefficient cell n. This value will be the least square of the function. If the result satisfies the user, then click on the button "OK" in an additional window.

As you can see, the application of the least squares method is a rather complex mathematical procedure. We showed it in action using a simple example, but there are much more complex cases. However, the tools Microsoft Excel designed to simplify the calculations as much as possible.

http://multitest.semico.ru/mnk.htm

General provisions

The smaller the number in absolute value, the better the chosen straight line (2). As a characteristic of the accuracy of selecting a straight line (2), we can take the sum of squares

The minimum conditions for S will be

(6)
(7)

Equations (6) and (7) can be written as follows:

(8)
(9)

From equations (8) and (9) it is easy to find a and b from the experimental values ​​of xi and y i. Line (2), defined by equations (8) and (9), is called a line obtained by the least squares method (this name emphasizes that the sum of squares S has a minimum). Equations (8) and (9), from which straight line (2) is determined, are called normal equations.

You can indicate a simple and general way to compose normal equations. Using experimental points (1) and equation (2), we can write a system of equations for a and b

y 1 =ax 1 +b,
y 2 =ax 2 +b, ... (10)
y n = ax n + b,

Let's multiply the left and right sides of each of these equations by the coefficient of the first unknown a (i.e. by x 1, x 2, ..., x n) and add the resulting equations, resulting in the first normal equation (8).

Let us multiply the left and right sides of each of these equations by the coefficient of the second unknown b, i.e. by 1, and add the resulting equations, the result is the second normal equation (9).

This method of obtaining normal equations is general: it is suitable, for example, for the function

there is a constant value and it must be determined from experimental data (1).

The system of equations for k can be written:

Find straight line (2) using the least squares method.

Solution. We find:

X i =21, y i =46.3, x i 2 =91, x i y i =179.1.

We write equations (8) and (9)91a+21b=179.1,

21a+6b=46.3, from here we find
a=0.98 b=4.3.

The least squares method (LS) is based on minimizing the sum of squared deviations of the selected function from the data under study. In this article, we approximate the available data using linear function y = a x + b .

Least square method(English) Ordinary Least Squares , O.L.S.) is one of the basic methods of regression analysis in terms of estimating unknown parameters regression models according to sample data.

Let's consider approximation by functions that depend only on one variable:

  • Linear: y=ax+b (this article)
  • : y=a*Ln(x)+b
  • : y=a*x m
  • : y=a*EXP(b*x)+с
  • : y=ax 2 +bx+c

Note: Cases of approximation by a polynomial from the 3rd to the 6th degree are considered in this article. Approximation by a trigonometric polynomial is considered here.

Linear dependence

We are interested in the connection between 2 variables X And y. There is an assumption that y depends on X according to linear law y = ax + b. To determine the parameters of this relationship, the researcher made observations: for each value of x i, a measurement of y i was made (see example file). Accordingly, let there be 20 pairs of values ​​(x i; y i).

Note: If the change step is X is constant, then to build scatter plots can be used, if not, then you need to use the chart type Spot .

It is obvious from the diagram that the relationship between the variables is close to linear. To understand which of the many straight lines most “correctly” describes the relationship between variables, it is necessary to determine the criterion by which the lines will be compared.

As such a criterion we use the expression:

Where ŷ i = a * x i + b ; n – number of pairs of values ​​(in our case n=20)

The above expression is the sum of the squared distances between the observed values ​​of y i and ŷ i and is often denoted as SSE ( Sum of Squared Errors (Residuals), sum of squared errors (residuals)) .

Least square method is to select such a line ŷ = ax + b, for which the above expression takes the minimum value.

Note: Any line in two-dimensional space is uniquely determined by the values ​​of 2 parameters: a (slope) and b (shift).

It is believed that the smaller the sum of squared distances, the better the corresponding line approximates the available data and can be further used to predict the values ​​of y from the x variable. It is clear that even if in reality there is no relationship between the variables or the relationship is nonlinear, then OLS will still select the “best” line. Thus, the least squares method does not say anything about the presence of a real relationship between variables; the method simply allows you to select such function parameters a And b , for which the above expression is minimal.

By performing not very complex mathematical operations (for more details, see), you can calculate the parameters a And b :

As can be seen from the formula, the parameter a represents the ratio of covariance and , therefore in MS EXCEL to calculate the parameter A You can use the following formulas (see Linear sheet example file):

= KOVAR(B26:B45;C26:C45)/ DISP.G(B26:B45) or

= COVARIANCE.B(B26:B45;C26:C45)/DISP.B(B26:B45)

Also to calculate the parameter A you can use the formula = TILT(C26:C45;B26:B45). For parameter b use the formula = LEG(C26:C45;B26:B45) .

Finally, the LINEST() function allows you to calculate both parameters at once. To enter a formula LINEST(C26:C45;B26:B45) You need to select 2 cells in a row and click CTRL + SHIFT + ENTER(see article about). The value will be returned in the left cell A , on the right – b .

Note: To avoid messing with the input array formulas you will need to additionally use the INDEX() function. Formula = INDEX(LINEST(C26:C45,B26:B45),1) or just = LINEST(C26:C45;B26:B45) will return the parameter responsible for the slope of the line, i.e. A . Formula = INDEX(LINEST(C26:C45,B26:B45),2) will return the parameter responsible for the intersection of the line with the Y axis, i.e. b .

Having calculated the parameters, scatter diagram you can draw the corresponding line.

Another way to plot a straight line using the least squares method is the graph tool Trend line. To do this, select the diagram, select from the menu Layout tab, V group Analysis click Trend line, then Linear approximation .

By checking the “show equation in diagram” box in the dialog box, you can make sure that the parameters found above coincide with the values ​​​​in the diagram.

Note: In order for the parameters to match, the diagram type must be . The point is that when constructing a diagram Schedule X-axis values ​​cannot be specified by the user (the user can only specify labels that do not affect the location of the points). Instead of X values, the sequence 1 is used; 2; 3; ... (for numbering categories). Therefore, if you build trend line on a type diagram Schedule, then instead of the actual values ​​of X the values ​​of this sequence will be used, which will lead to an incorrect result (unless, of course, the actual values ​​of X do not coincide with the sequence 1; 2; 3; ...).

4.1. Using built-in functions

Calculation regression coefficients carried out using the function

LINEST(Values_y; x-values; Const; statistics),

Values_y- array of y values,

x-values- optional array of values x, if array X is omitted, it is assumed that this is an array (1;2;3;...) of the same size as Values_y,

Const- a boolean value that indicates whether the constant is required b was equal to 0. If Const has the meaning TRUE or omitted, then b is calculated in the usual way. If the argument Const is FALSE, then b is assumed to be 0 and the values a are selected so that the relation is fulfilled y=ax.

Statistics is a boolean value that indicates whether additional regression statistics are required to be returned. If the argument Statistics has the meaning TRUE, then the function LINEST returns additional regression statistics. If the argument Statistics has the meaning LIE or omitted, then the function LINEST returns only the coefficient a and constant b.

It must be remembered that the result of the functions LINEST() is a set of values ​​– an array.

For calculation correlation coefficient function is used

CORREL(Array1;Array2),

returning the values ​​of the correlation coefficient, where Array1- array of values y, Array2- array of values x. Array1 And Array2 must be the same size.

EXAMPLE 1. Addiction y(x) is presented in the table. Build regression line and calculate correlation coefficient.

y 0.5 1.5 2.5 3.5
x 2.39 2.81 3.25 3.75 4.11 4.45 4.85 5.25

Let's enter a table of values ​​into a MS Excel sheet and build a scatter plot. The worksheet will take the form shown in Fig. 2.

In order to calculate the values ​​of regression coefficients A And b select the cells A7:B7, Let's go to the function wizard and in the category Statistical select a function LINEST. Let's fill in the dialog box that appears as shown in Fig. 3 and press OK.


As a result, the calculated value will appear only in the cell A6(Fig. 4). In order for the value to appear in the cell B6 you need to enter edit mode (key F2), and then press the key combination CTRL+SHIFT+ENTER.



To calculate the value of the correlation coefficient in a cell C6 the following formula was introduced:

C7=CORREL(B3:J3;B2:J2).


Knowing the regression coefficients A And b let's calculate the function values y=ax+b for given x. To do this, we introduce the formula

B5=$A$7*B2+$B$7

and copy it to the range C5:J5(Fig. 5).

Let's plot the regression line on the diagram. Select the experimental points on the graph, right-click and select the command Initial data. In the dialog box that appears (Fig. 5), select the tab Row and click on the button Add. Let's fill in the input fields as shown in Fig. 6 and press the button OK. A regression line will be added to the experimental data graph. By default, its graph will be drawn as points not connected by smoothing lines.

Rice. 6

To change the appearance of the regression line, perform the following steps. Right-click on the points depicting the line graph and select the command Chart type and set the type of scatter diagram, as shown in Fig. 7.

The line type, color and thickness can be changed as follows. Select a line on the diagram, right-click and select the command in the context menu Data series format... Next, make settings, for example, as shown in Fig. 8.

As a result of all transformations, we obtain a graph of experimental data and a regression line in one graphical area (Fig. 9).

4.2. Using a trend line.

The construction of various approximating dependencies in MS Excel is implemented as a chart property - trend line.

EXAMPLE 2. As a result of the experiment, a certain tabular dependence was determined.

0.15 0.16 0.17 0.18 0.19 0.20
4.4817 4.4930 5.4739 6.0496 6.6859 7.3891

Select and construct an approximating dependence. Construct graphs of tabular and selected analytical dependences.

Solving the problem can be divided into the following stages: entering initial data, constructing a scatter plot and adding a trend line to this graph.

Let's look at this process in detail. Let's enter the initial data into the worksheet and plot the experimental data. Next, select the experimental points on the graph, right-click and use the command Add l trend line(Fig. 10).

The dialog box that appears allows you to build an approximating relationship.

The first tab (Fig. 11) of this window indicates the type of approximating dependence.

On the second (Fig. 12) the construction parameters are determined:

· name of the approximating dependence;

· forecast forward (backward) by n units (this parameter determines how many units forward (backward) the trend line needs to be extended);

whether to show the point of intersection of a curve with a straight line y=const;

· show the approximating function on the diagram or not (the option to show the equation on the diagram);

· whether to place the value of the standard deviation on the diagram or not (the option to place the value of the approximation reliability on the diagram).

Let us choose a polynomial of the second degree as an approximating dependence (Fig. 11) and display the equation that describes this polynomial on a graph (Fig. 12). The resulting diagram is shown in Fig. 13.

Similarly using trend lines you can select the parameters of such dependencies as

linear y=a∙x+b,

logarithmic y=a∙ln(x)+b,

· exponential y=a∙e b,

· sedate y=a∙x b,

polynomial y=a∙x 2 +b∙x+c, y=a∙x 3 +b∙x 2 +c∙x+d and so on, up to a polynomial of the 6th degree inclusive,

· linear filtration.

4.3. Using a solver block

Of significant interest is the implementation in MS Excel of selecting parameters using the least squares method using a solver block. This technique allows you to select the parameters of a function of any type. Let's consider this possibility using the following problem as an example.

EXAMPLE 3. As a result of the experiment, the dependence z(t) was obtained, presented in the table

0,66 0,9 1,17 1,47 1,7 1,74 2,08 2,63 3,12
38,9 68,8 64,4 66,5 64,95 59,36 82,6 90,63 113,5

Select dependence coefficients Z(t)=At 4 +Bt 3 +Ct 2 +Dt+K least squares method.

This problem is equivalent to the problem of finding the minimum of a function of five variables

Let's consider the process of solving the optimization problem (Fig. 14).

Let the values A, IN, WITH, D And TO stored in cells A7:E7. Let's calculate the theoretical values ​​of the function Z(t)=At 4 +Bt 3 +Ct 2 +Dt+K for given t(B2:J2). To do this, in the cell B4 enter the value of the function at the first point (cell B2):

B4=$A$7*B2^4+$B$7*B2^3+$C$7*B2^2+$D$7*B2+$E$7.

Let's copy this formula into the range C4:J4 and get the expected value of the function at the points whose abscissas are stored in the cells B2:J2.

To cell B5 Let's introduce a formula that calculates the square of the difference between the experimental and calculated points:

B5=(B4-B3)^2,

and copy it to the range C5:J5. In a cell F7 we will store the total squared error (10). To do this, enter the formula:

F7 = SUM(B5:J5).

Let's use the command Service®Search for a solution and solve the optimization problem without restrictions. Let's fill in the input fields in the dialog box shown in Fig. accordingly. 14 and press the button Execute. If a solution is found, the window shown in Fig. 15.

The result of the decision block will be output to cells A7:E7parameter values functions Z(t)=At 4 +Bt 3 +Ct 2 +Dt+K. In cells B4:J4 we get expected function value at the starting points. In a cell F7 will be stored total square error.

You can display experimental points and a fitted line in one graphic area by selecting a range B2:J4, call Chart Wizard, and then format the appearance of the resulting graphs.

Rice. 17 displays the MS Excel worksheet after the calculations have been performed.


5. REFERENCES

1. Alekseev E.R., Chesnokova O.V., Solving problems of computational mathematics in the Mathcad12, MATLAB7, Maple9 packages. – NT Press, 2006.–596 p. :il. –(Tutorial)

2. Alekseev E.R., Chesnokova O.V., E.A. Rudchenko, Scilab, solution for engineering and mathematical problems. –M., BINOM, 2008.–260 p.

3. Berezin I.S., Zhidkov N.P., Methods of calculations. – M.: Nauka, 1966. – 632 p.

4. Garnaev A.Yu., Using MS EXCEL and VBA in economics and finance. – St. Petersburg: BHV - Petersburg, 1999.–332 p.

5. Demidovich B.P., Maron I.A., Shuvalova V.Z., Numerical methods analysis. – M.: Nauka, 1967. – 368 p.

6. Korn G., Korn T., Handbook of mathematics for scientific workers and engineers. – M., 1970, 720 p.

7. Alekseev E.R., Chesnokova O.V. Guidelines for implementation laboratory work in MS EXCEL. For students of all specialties. Donetsk, DonNTU, 2004. 112 p.

The method of least squares is a mathematical procedure for constructing a linear equation that will most accurately fit a set of two series of numbers. The purpose of using this method is to minimize the total square error. Excel has tools that can help you apply this method to your calculations. Let's figure out how this is done.

The least squares method (LSM) is a mathematical description of the dependence of one variable on another. It can be used for forecasting.

Enabling the Find Solution add-on

In order to use MNC in Excel, you need to enable the add-in "Finding a solution", which is disabled by default.


Now the function Finding a solution in Excel is activated, and its tools appear on the ribbon.

Conditions of the problem

Let us describe the use of LSM using a specific example. We have two rows of numbers x And y , the sequence of which is shown in the image below.

This dependence can be most accurately described by the function:

At the same time, it is known that when x=0 y also equal 0 . Therefore, this equation can be described by the dependence y=nx .

We have to find the minimum sum of squares of the difference.

Solution

Let's move on to a description of the direct application of the method.


As you can see, the application of the least squares method is a rather complex mathematical procedure. We showed it in action using a simple example, but there are much more complex cases. However, Microsoft Excel tools are designed to simplify the calculations as much as possible.

mob_info