Example.

Experimental data on the values ​​of variables X And at are given in the table.

As a result of their alignment, the function

Using method least squares , approximate these data with a linear dependence y=ax+b(find options A And b). Find out which of the two lines is better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

The essence of the method of least squares (LSM).

The problem is to find the coefficients linear dependence, for which the function of two variables A And b accepts smallest value. That is, given the data A And b the sum of the squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, the solution of the example is reduced to finding the extremum of a function of two variables.

Derivation of formulas for finding coefficients.

A system of two equations with two unknowns is compiled and solved. Finding partial derivatives of a function with respect to variables A And b, we equate these derivatives to zero.

We solve the resulting system of equations by any method (for example substitution method or ) and obtain formulas for finding coefficients using the least squares method (LSM).

With data A And b function takes the smallest value. The proof of this fact is given.

That's the whole method of least squares. Formula for finding the parameter a contains the sums , , , and the parameter n- amount of experimental data. The values ​​of these sums are recommended to be calculated separately. Coefficient b found after calculation a.

It's time to remember the original example.

Solution.

In our example n=5. We fill in the table for the convenience of calculating the amounts that are included in the formulas of the required coefficients.

The values ​​in the fourth row of the table are obtained by multiplying the values ​​of the 2nd row by the values ​​of the 3rd row for each number i.

The values ​​in the fifth row of the table are obtained by squaring the values ​​of the 2nd row for each number i.

The values ​​of the last column of the table are the sums of the values ​​across the rows.

We use the formulas of the least squares method to find the coefficients A And b. We substitute in them the corresponding values ​​from the last column of the table:

Hence, y=0.165x+2.184 is the desired approximating straight line.

It remains to find out which of the lines y=0.165x+2.184 or better approximates the original data, i.e. to make an estimate using the least squares method.

Estimation of the error of the method of least squares.

To do this, you need to calculate the sums of squared deviations of the original data from these lines And , a smaller value corresponds to a line that better approximates the original data in terms of the least squares method.

Since , then the line y=0.165x+2.184 approximates the original data better.

Graphic illustration of the least squares method (LSM).

Everything looks great on the charts. The red line is the found line y=0.165x+2.184, the blue line is , the pink dots are the original data.

What is it for, what are all these approximations for?

I personally use to solve data smoothing problems, interpolation and extrapolation problems (in the original example, you could be asked to find the value of the observed value y at x=3 or when x=6 according to the MNC method). But we will talk more about this later in another section of the site.

Proof.

So that when found A And b function takes the smallest value, it is necessary that at this point the matrix of the quadratic form of the second-order differential for the function was positive definite. Let's show it.

Approximation of experimental data is a method based on the replacement of experimentally obtained data with an analytical function that most closely passes or coincides at the nodal points with the initial values ​​(data obtained during the experiment or experiment). There are currently two ways to define an analytic function:

By constructing an n-degree interpolation polynomial that passes directly through all points given array of data. In this case, the approximating function is represented as: an interpolation polynomial in the Lagrange form or an interpolation polynomial in the Newton form.

By constructing an n-degree approximating polynomial that passes close to points from the given data array. Thus, the approximating function smoothes out all random noise (or errors) that may occur during the experiment: the measured values ​​during the experiment depend on random factors that fluctuate according to their own random laws (measurement or instrument errors, inaccuracy or experimental errors). In this case, the approximating function is determined by the least squares method.

Least square method(in the English literature Ordinary Least Squares, OLS) is a mathematical method based on the definition of an approximating function, which is built in the closest proximity to points from a given array of experimental data. The proximity of the initial and approximating functions F(x) is determined by a numerical measure, namely: the sum of the squared deviations of the experimental data from the approximating curve F(x) should be the smallest.

Fitting curve constructed by the least squares method

The least squares method is used:

To solve overdetermined systems of equations when the number of equations exceeds the number of unknowns;

To search for a solution in the case of ordinary (not overdetermined) nonlinear systems of equations;

For approximating point values ​​by some approximating function.

The approximating function by the least squares method is determined from the condition of the minimum sum of squared deviations of the calculated approximating function from a given array of experimental data. This criterion of the least squares method is written as the following expression:

Values ​​of the calculated approximating function at nodal points ,

Specified array of experimental data at nodal points .

The quadratic criterion has a number of "good" properties, such as differentiability, providing a unique solution to the approximation problem with polynomial approximating functions.

Depending on the conditions of the problem, the approximating function is a polynomial of degree m

The degree of the approximating function does not depend on the number of nodal points, but its dimension must always be less than the dimension (number of points) of the given array of experimental data.

∙ If the degree of the approximating function is m=1, then we approximate the table function with a straight line (linear regression).

∙ If the degree of the approximating function is m=2, then we approximate the table function with a quadratic parabola (quadratic approximation).

∙ If the degree of the approximating function is m=3, then we approximate the table function with a cubic parabola (cubic approximation).

IN general case, when it is required to construct an approximating polynomial of degree m for given tabular values, the condition for the minimum sum of squared deviations over all nodal points is rewritten in the following form:

- unknown coefficients of the approximating polynomial of degree m;

The number of specified table values.

A necessary condition for the existence of a minimum of a function is the equality to zero of its partial derivatives with respect to unknown variables . As a result, we get next system equations:

Let's transform the resulting linear system of equations: open the brackets and move the free terms to the right side of the expression. As a result, the resulting system of linear algebraic expressions will be written in the following form:

This system linear algebraic expressions can be rewritten in matrix form:

As a result, a system of linear equations of dimension m + 1 was obtained, which consists of m + 1 unknowns. This system can be solved using any method for solving linear algebraic equations (for example, the Gauss method). As a result of the solution, unknown parameters of the approximating function will be found that provide the minimum sum of squared deviations of the approximating function from the original data, i.e. the best possible quadratic approximation. It should be remembered that if even one value of the initial data changes, all coefficients will change their values, since they are completely determined by the initial data.

Approximation of initial data by linear dependence

(linear regression)

As an example, consider the method for determining the approximating function, which is given as a linear relationship. In accordance with the least squares method, the condition for the minimum sum of squared deviations is written as follows:

Coordinates of nodal points of the table;

Unknown coefficients of the approximating function, which is given as a linear relationship.

A necessary condition for the existence of a minimum of a function is the equality to zero of its partial derivatives with respect to unknown variables. As a result, we obtain the following system of equations:

Let us transform the resulting linear system of equations.

We solve the resulting system of linear equations. The coefficients of the approximating function in the analytical form are determined as follows (Cramer's method):

These coefficients provide the construction of a linear approximating function in accordance with the criterion for minimizing the sum of squares of the approximating function from given tabular values ​​(experimental data).

Algorithm for implementing the method of least squares

1. Initial data:

Given an array of experimental data with the number of measurements N

The degree of the approximating polynomial (m) is given

2. Calculation algorithm:

2.1. Coefficients are determined for constructing a system of equations with dimension

Coefficients of the system of equations (left side of the equation)

- index of the column number of the square matrix of the system of equations

Free members of the system of linear equations (right side of the equation)

- index of the row number of the square matrix of the system of equations

2.2. Formation of a system of linear equations with dimension .

2.3. Solution of a system of linear equations in order to determine the unknown coefficients of the approximating polynomial of degree m.

2.4 Determination of the sum of squared deviations of the approximating polynomial from the initial values ​​over all nodal points

The found value of the sum of squared deviations is the minimum possible.

Approximation with Other Functions

It should be noted that when approximating the initial data in accordance with the least squares method, a logarithmic function, an exponential function, and a power function are sometimes used as an approximating function.

Log approximation

Consider the case when the approximating function is given logarithmic function type:

Choosing the type of regression function, i.e. the type of the considered model of the dependence of Y on X (or X on Y), for example, a linear model y x = a + bx, it is necessary to determine the specific values ​​of the coefficients of the model.

At different values a and b it is possible to construct an infinite number of dependencies of the form y x =a+bx i.e. there are an infinite number of lines on the coordinate plane, but we need such a dependence that corresponds to the observed values the best way. Thus, the problem is reduced to the selection of the best coefficients.

We are looking for a linear function a + bx, based only on a certain number of available observations. To find the function with the best fit to the observed values, we use the least squares method.

Denote: Y i - the value calculated by the equation Y i =a+bx i . y i - measured value, ε i =y i -Y i - difference between the measured and calculated values, ε i =y i -a-bx i .

The method of least squares requires that ε i , the difference between the measured y i and the values ​​of Y i calculated from the equation, be minimal. Therefore, we find the coefficients a and b so that the sum of the squared deviations of the observed values ​​from the values ​​on the straight regression line is the smallest:

Investigating this function of arguments a and with the help of derivatives to an extremum, we can prove that the function takes on a minimum value if the coefficients a and b are solutions of the system:

(2)

If we divide both sides of the normal equations by n, we get:

Given that (3)

Get , from here, substituting the value of a in the first equation, we get:

In this case, b is called the regression coefficient; a is called the free member of the regression equation and is calculated by the formula:

The resulting straight line is an estimate for the theoretical regression line. We have:

So, is a linear regression equation.

Regression can be direct (b>0) and inverse (b Example 1. The results of measuring the X and Y values ​​are given in the table:

x i -2 0 1 2 4
y i 0.5 1 1.5 2 3

Assuming that there is a linear relationship between X and Y y=a+bx, determine the coefficients a and b using the least squares method.

Solution. Here n=5
x i =-2+0+1+2+4=5;
x i 2 =4+0+1+4+16=25
x i y i =-2 0.5+0 1+1 1.5+2 2+4 3=16.5
y i =0.5+1+1.5+2+3=8

and normal system (2) has the form

Solving this system, we get: b=0.425, a=1.175. Therefore y=1.175+0.425x.

Example 2. There is a sample of 10 observations economic indicators(X) and (Y).

x i 180 172 173 169 175 170 179 170 167 174
y i 186 180 176 171 182 166 182 172 169 177

It is required to find a sample regression equation Y on X. Construct a sample regression line Y on X.

Solution. 1. Let's sort the data by values ​​x i and y i . We get a new table:

x i 167 169 170 170 172 173 174 175 179 180
y i 169 171 166 172 180 176 177 182 182 186

To simplify the calculations, we will compile a calculation table in which we will enter the necessary numerical values.

x i y i x i 2 x i y i
167 169 27889 28223
169 171 28561 28899
170 166 28900 28220
170 172 28900 29240
172 180 29584 30960
173 176 29929 30448
174 177 30276 30798
175 182 30625 31850
179 182 32041 32578
180 186 32400 33480
∑x i =1729 ∑y i =1761 ∑x i 2 299105 ∑x i y i =304696
x=172.9 y=176.1 x i 2 =29910.5 xy=30469.6

According to formula (4), we calculate the regression coefficient

and by formula (5)

Thus, the sample regression equation looks like y=-59.34+1.3804x.
Let's plot the points (x i ; y i) on the coordinate plane and mark the regression line.


Fig 4

Figure 4 shows how the observed values ​​are located relative to the regression line. To numerically estimate the deviations of y i from Y i , where y i are observed values, and Y i are values ​​determined by regression, we will make a table:

x i y i Y i Y i -y i
167 169 168.055 -0.945
169 171 170.778 -0.222
170 166 172.140 6.140
170 172 172.140 0.140
172 180 174.863 -5.137
173 176 176.225 0.225
174 177 177.587 0.587
175 182 178.949 -3.051
179 182 184.395 2.395
180 186 185.757 -0.243

Y i values ​​are calculated according to the regression equation.

The noticeable deviation of some observed values ​​from the regression line is explained by the small number of observations. When studying the degree of linear dependence of Y on X, the number of observations is taken into account. The strength of the dependence is determined by the value of the correlation coefficient.

which finds the widest application in various fields science and practice. It can be physics, chemistry, biology, economics, sociology, psychology and so on and so forth. By the will of fate, I often have to deal with the economy, and therefore today I will arrange for you a ticket to an amazing country called Econometrics=) … How do you not want that?! It's very good there - you just have to decide! …But what you probably definitely want is to learn how to solve problems least squares. And especially diligent readers will learn to solve them not only accurately, but also VERY FAST ;-) But first general statement of the problem+ related example:

Let indicators be studied in some subject area that have a quantitative expression. At the same time, there is every reason to believe that the indicator depends on the indicator. This assumption can be both a scientific hypothesis and based on an elementary common sense. Let's leave science aside, however, and explore more appetizing areas - namely, grocery stores. Denote by:

– retail space of a grocery store, sq.m.,
- annual turnover of a grocery store, million rubles.

It is quite clear that the larger the area of ​​the store, the greater its turnover in most cases.

Suppose that after conducting observations / experiments / calculations / dancing with a tambourine, we have at our disposal numerical data:

With grocery stores, I think everything is clear: - this is the area of ​​the 1st store, - its annual turnover, - the area of ​​the 2nd store, - its annual turnover, etc. By the way, it is not at all necessary to have access to classified materials - a fairly accurate assessment of the turnover can be obtained using mathematical statistics. However, do not be distracted, the course of commercial espionage is already paid =)

Tabular data can also be written in the form of points and depicted in the usual way for us. Cartesian system .

Let's answer an important question: how many points do you need qualitative research?

The bigger, the better. The minimum admissible set consists of 5-6 points. In addition, if not in large numbers data, "abnormal" results should not be included in the sample. So, for example, a small elite store can help out orders of magnitude more than “their colleagues”, thereby distorting general pattern, which is to be found!

If it’s quite simple, we need to choose a function , schedule which passes as close as possible to the points . Such a function is called approximating (approximation - approximation) or theoretical function . Generally speaking, here immediately appears the obvious "applicant" - the polynomial high degree, whose graph passes through ALL points. But this option is complicated, and often simply incorrect. (because the chart will “wind” all the time and poorly reflect the main trend).

Thus, the desired function must be sufficiently simple and at the same time reflect the dependence adequately. As you might guess, one of the methods for finding such functions is called least squares. First, let's analyze its essence in general view. Let some function approximate the experimental data:


How to evaluate the accuracy of this approximation? Let us also calculate the differences (deviations) between the experimental and functional values (we study the drawing). The first thought that comes to mind is to estimate how big the sum is, but the problem is that the differences can be negative. (For example, ) and deviations as a result of such summation will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, it suggests itself to take the sum modules deviations:

or in folded form: (suddenly, who doesn’t know: is the sum icon, and is an auxiliary variable-“counter”, which takes values ​​from 1 to ).

Approximating the experimental points with various functions, we will obtain different meanings, and obviously, where this sum is less, that function is more accurate.

Such a method exists and is called least modulus method. However, in practice it has become much more widespread. least square method, in which the possible negative values are eliminated not by the modulus, but by squaring the deviations:

, after which efforts are directed to the selection of such a function that the sum of the squared deviations was as small as possible. Actually, hence the name of the method.

And now we're back to another important point: as noted above, the selected function should be quite simple - but there are also many such functions: linear , hyperbolic, exponential, logarithmic, quadratic etc. And, of course, here I would immediately like to "reduce the field of activity." What class of functions to choose for research? Primitive but effective technique:

- The easiest way to draw points on the drawing and analyze their location. If they tend to be in a straight line, then you should look for straight line equation with optimal values ​​and . In other words, the task is to find SUCH coefficients - so that the sum of the squared deviations is the smallest.

If the points are located, for example, along hyperbole, then it is clear that the linear function will give a poor approximation. In this case, we are looking for the most “favorable” coefficients for the hyperbola equation - those that give the minimum sum of squares .

Now notice that in both cases we are talking about functions of two variables, whose arguments are searched dependency options:

And in essence, we need to solve a standard problem - to find minimum of a function of two variables.

Recall our example: suppose that the "shop" points tend to be located in a straight line and there is every reason to believe the presence linear dependence turnover from the trading area. Let's find SUCH coefficients "a" and "be" so that the sum of squared deviations was the smallest. Everything as usual - first partial derivatives of the 1st order. According to linearity rule you can differentiate right under the sum icon:

If you want to use this information for an essay or a term paper, I will be very grateful for the link in the list of sources, you will not find such detailed calculations anywhere:

Let's make a standard system:

We reduce each equation by a “two” and, in addition, “break apart” the sums:

Note : independently analyze why "a" and "be" can be taken out of the sum icon. By the way, formally this can be done with the sum

Let's rewrite the system in an "applied" form:

after which the algorithm for solving our problem begins to be drawn:

Do we know the coordinates of the points? We know. Sums can we find? Easily. We compose the simplest system of two linear equations with two unknowns("a" and "beh"). We solve the system, for example, Cramer's method, resulting in a stationary point . Checking sufficient condition for an extremum, we can verify that at this point the function reaches precisely minimum. Verification is associated with additional calculations and therefore we will leave it behind the scenes. (if necessary, the missing frame can be viewed). We draw the final conclusion:

Function the best way (at least compared to any other linear function) brings experimental points closer . Roughly speaking, its graph passes as close as possible to these points. In tradition econometrics the resulting approximating function is also called paired linear regression equation .

The problem under consideration is of great practical importance. In the situation with our example, the equation allows you to predict what kind of turnover ("yig") will be at the store with one or another value of the selling area (one or another meaning of "x"). Yes, the resulting forecast will be only a forecast, but in many cases it will turn out to be quite accurate.

I will analyze just one problem with "real" numbers, since there are no difficulties in it - all calculations are at the level school curriculum 7-8 grade. In 95 percent of cases, you will be asked to find just a linear function, but at the very end of the article I will show that it is no more difficult to find the equations for the optimal hyperbola, exponent, and some other functions.

In fact, it remains to distribute the promised goodies - so that you learn how to solve such examples not only accurately, but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experienced) data. Make a drawing in which in Cartesian rectangular system coordinates to build experimental points and a graph of the approximating function . Find the sum of squared deviations between empirical and theoretical values. Find out if the function is better (in terms of the least squares method) approximate experimental points.

Note that "x" values ​​are natural values, and this has a characteristic meaningful meaning, which I will talk about a little later; but they, of course, can be fractional. In addition, depending on the content of a particular task, both "X" and "G" values ​​can be fully or partially negative. Well, we have been given a “faceless” task, and we start it solution:

We find the coefficients of the optimal function as a solution to the system:

For the purposes of a more compact notation, the “counter” variable can be omitted, since it is already clear that the summation is carried out from 1 to .

It is more convenient to calculate the required amounts in a tabular form:


Calculations can be carried out on a microcalculator, but it is much better to use Excel - both faster and without errors; watch a short video:

Thus, we get the following system:

Here you can multiply the second equation by 3 and subtract the 2nd from the 1st equation term by term. But this is luck - in practice, systems are often not gifted, and in such cases it saves Cramer's method:
, so the system has a unique solution.

Let's do a check. I understand that I don’t want to, but why skip mistakes where you can absolutely not miss them? Substitute the found solution into the left side of each equation of the system:

The right parts of the corresponding equations are obtained, which means that the system is solved correctly.

Thus, the desired approximating function: – from all linear functions experimental data is best approximated by it.

Unlike straight dependence of the store's turnover on its area, the found dependence is reverse (principle "the more - the less"), and this fact is immediately revealed by the negative angular coefficient. Function informs us that with an increase in a certain indicator by 1 unit, the value of the dependent indicator decreases average by 0.65 units. As they say, the higher the price of buckwheat, the less sold.

To plot the approximating function, we find two of its values:

and execute the drawing:


The constructed line is called trend line (namely, a linear trend line, i.e. in the general case, a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend", and I think that this term does not need additional comments.

Calculate the sum of squared deviations between empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the "crimson" segments (two of which are so small you can't even see them).

Let's summarize the calculations in a table:


They can again be carried out manually, just in case I will give an example for the 1st point:

but it is much more efficient to do the already known way:

Let's repeat: what is the meaning of the result? From all linear functions function the exponent is the smallest, that is, it is the best approximation in its family. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function will it be better to approximate the experimental points?

Let's find the corresponding sum of squared deviations - to distinguish them, I will designate them with the letter "epsilon". The technique is exactly the same:


And again for every fire calculation for the 1st point:

In Excel, we use the standard function EXP (Syntax can be found in Excel Help).

Conclusion: , so the exponential function approximates the experimental points worse than the straight line .

But it should be noted here that "worse" is doesn't mean yet, what is wrong. Now I built a graph of this exponential function - and it also passes close to the points - so much so that without an analytical study it is difficult to say which function is more accurate.

This completes the solution, and I return to the question of the natural values ​​of the argument. In various studies, as a rule, economic or sociological, months, years or other equal time intervals are numbered with natural "X". Consider, for example, such a problem.

I am a computer programmer. Most big jump in my career I accomplished when I learned to say: "I do not understand anything!" Now I am not ashamed to tell the luminary of science that he is giving me a lecture, that I do not understand what it, the luminary, is talking to me about. And it's very difficult. Yes, it's hard and embarrassing to admit you don't know. Who likes to admit that he does not know the basics of something-there. By virtue of my profession, I have to attend a large number of presentations and lectures, where, I confess, in the vast majority of cases I feel sleepy, because I do not understand anything. And I don’t understand because the huge problem of the current situation in science lies in mathematics. It assumes that all students are familiar with absolutely all areas of mathematics (which is absurd). To admit that you do not know what a derivative is (that this is a little later) is a shame.

But I've learned to say that I don't know what multiplication is. Yes, I don't know what a subalgebra over a Lie algebra is. Yes, I don't know why you need in life quadratic equations. By the way, if you are sure that you know, then we have something to talk about! Mathematics is a series of tricks. Mathematicians try to confuse and intimidate the public; where there is no confusion, no reputation, no authority. Yes, it is prestigious to speak in the most abstract language possible, which is complete nonsense in itself.

Do you know what a derivative is? Most likely you will tell me about the limit of the difference relation. In the first year of mathematics at St. Petersburg State University, Viktor Petrovich Khavin me defined derivative as the coefficient of the first term of the Taylor series of the function at the point (it was a separate gymnastics to determine the Taylor series without derivatives). I laughed at this definition for a long time, until I finally understood what it was about. The derivative is nothing more than just a measure of how much the function we are differentiating is similar to the function y=x, y=x^2, y=x^3.

I now have the honor of lecturing students who afraid mathematics. If you are afraid of mathematics - we are on the way. As soon as you try to read some text and it seems to you that it is overly complicated, then know that it is badly written. I argue that there is not a single area of ​​mathematics that cannot be spoken about "on fingers" without losing accuracy.

The challenge for the near future: I instructed my students to understand what a linear-quadratic controller is. Don't be shy, waste three minutes of your life, follow the link. If you do not understand anything, then we are on the way. I (a professional mathematician-programmer) also did not understand anything. And I assure you, this can be sorted out "on the fingers." On this moment I do not know what it is, but I assure you that we will be able to figure it out.

So, the first lecture that I am going to give to my students after they come running to me in horror with the words that the linear-quadratic controller is a terrible bug that you will never master in your life is least squares methods. Can you decide linear equations? If you are reading this text, then most likely not.

So, given two points (x0, y0), (x1, y1), for example, (1,1) and (3,2), the task is to find the equation of a straight line passing through these two points:

illustration

This straight line should have an equation like the following:

Here alpha and beta are unknown to us, but two points of this line are known:

You can write this equation in matrix form:

Here we should make a lyrical digression: what is a matrix? A matrix is ​​nothing but a two-dimensional array. This is a way of storing data, no more values ​​​​should be given to it. It is up to us how exactly to interpret a certain matrix. Periodically, I will interpret it as a linear mapping, periodically as a quadratic form, and sometimes simply as a set of vectors. This will all be clarified in context.

Let's replace specific matrices with their symbolic representation:

Then (alpha, beta) can be easily found:

More specifically for our previous data:

Which leads to the following equation of a straight line passing through the points (1,1) and (3,2):

Okay, everything is clear here. And let's find the equation of a straight line passing through three points: (x0,y0), (x1,y1) and (x2,y2):

Oh-oh-oh, but we have three equations for two unknowns! The standard mathematician will say that there is no solution. What will the programmer say? And he will first rewrite the previous system of equations in the following form:

In our case, the vectors i, j, b are three-dimensional, therefore, (in the general case) there is no solution to this system. Any vector (alpha\*i + beta\*j) lies in the plane spanned by the vectors (i, j). If b does not belong to this plane, then there is no solution (equality in the equation cannot be achieved). What to do? Let's look for a compromise. Let's denote by e(alpha, beta) how exactly we did not achieve equality:

And we will try to minimize this error:

Why a square?

We are looking not just for the minimum of the norm, but for the minimum of the square of the norm. Why? The minimum point itself coincides, and the square gives a smooth function (a quadratic function of the arguments (alpha,beta)), while just the length gives a function in the form of a cone, non-differentiable at the minimum point. Brr. Square is more convenient.

Obviously, the error is minimized when the vector e orthogonal to the plane spanned by the vectors i And j.

Illustration

In other words: we are looking for a line such that the sum of the squared lengths of the distances from all points to this line is minimal:

UPDATE: here I have a jamb, the distance to the straight line should be measured vertically, not orthogonal projection. the commenter is right.

Illustration

In completely different words (carefully, poorly formalized, but it should be clear on the fingers): we take all possible lines between all pairs of points and look for the average line between all:

Illustration

Another explanation on the fingers: we attach a spring between all data points (here we have three) and the line that we are looking for, and the line of the equilibrium state is exactly what we are looking for.

Quadratic form minimum

So, given the vector b and the plane spanned by the columns-vectors of the matrix A(in this case (x0,x1,x2) and (1,1,1)), we are looking for a vector e with a minimum square of length. Obviously, the minimum is achievable only for the vector e, orthogonal to the plane spanned by the columns-vectors of the matrix A:

In other words, we are looking for a vector x=(alpha, beta) such that:

I remind you that this vector x=(alpha, beta) is the minimum quadratic function||e(alpha, beta)||^2:

Here it is useful to remember that the matrix can be interpreted as well as the quadratic form, for example, the identity matrix ((1,0),(0,1)) can be interpreted as a function of x^2 + y^2:

quadratic form

All this gymnastics is known as linear regression.

Laplace equation with Dirichlet boundary condition

Now the simplest real problem: there is a certain triangulated surface, it is necessary to smooth it. For example, let's load my face model:

The original commit is available. To minimize external dependencies I took the code of my software renderer, already on Habré. For solutions linear system I use OpenNL , it's a great solver, but it's really hard to install: you need to copy two files (.h+.c) to your project folder. All smoothing is done by the following code:

For (int d=0; d<3; d++) { nlNewContext(); nlSolverParameteri(NL_NB_VARIABLES, verts.size()); nlSolverParameteri(NL_LEAST_SQUARES, NL_TRUE); nlBegin(NL_SYSTEM); nlBegin(NL_MATRIX); for (int i=0; i<(int)verts.size(); i++) { nlBegin(NL_ROW); nlCoefficient(i, 1); nlRightHandSide(verts[i][d]); nlEnd(NL_ROW); } for (unsigned int i=0; i&face = faces[i]; for (int j=0; j<3; j++) { nlBegin(NL_ROW); nlCoefficient(face[ j ], 1); nlCoefficient(face[(j+1)%3], -1); nlEnd(NL_ROW); } } nlEnd(NL_MATRIX); nlEnd(NL_SYSTEM); nlSolve(); for (int i=0; i<(int)verts.size(); i++) { verts[i][d] = nlGetVariable(i); } }

X, Y and Z coordinates are separable, I smooth them separately. That is, I solve three systems of linear equations, each with the same number of variables as the number of vertices in my model. The first n rows of matrix A have only one 1 per row, and the first n rows of vector b have original model coordinates. That is, I spring-tie between the new vertex position and the old vertex position - the new ones shouldn't be too far away from the old ones.

All subsequent rows of matrix A (faces.size()*3 = the number of edges of all triangles in the grid) have one occurrence of 1 and one occurrence of -1, while the vector b has zero components opposite. This means I put a spring on each edge of our triangular mesh: all edges try to get the same vertex as their starting and ending points.

Once again: all vertices are variables, and they cannot deviate far from their original position, but at the same time they try to become similar to each other.

Here is the result:

Everything would be fine, the model is really smoothed, but it moved away from its original edge. Let's change the code a little:

For (int i=0; i<(int)verts.size(); i++) { float scale = border[i] ? 1000: 1; nlBegin(NL_ROW); nlCoefficient(i, scale); nlRightHandSide(scale*verts[i][d]); nlEnd(NL_ROW); }

In our matrix A, for the vertices that are on the edge, I add not a row from the category v_i = verts[i][d], but 1000*v_i = 1000*verts[i][d]. What does it change? And this changes our quadratic form of the error. Now a single deviation from the top at the edge will cost not one unit, as before, but 1000 * 1000 units. That is, we hung a stronger spring on the extreme vertices, the solution prefers to stretch others more strongly. Here is the result:

Let's double the strength of the springs between the vertices:
nlCoefficient(face[ j ], 2); nlCoefficient(face[(j+1)%3], -2);

It is logical that the surface has become smoother:

And now even a hundred times stronger:

What is this? Imagine that we have dipped a wire ring in soapy water. As a result, the resulting soap film will try to have the least curvature as possible, touching the same border - our wire ring. This is exactly what we got by fixing the border and asking for a smooth surface inside. Congratulations, we have just solved the Laplace equation with Dirichlet boundary conditions. Sounds cool? But in fact, just one system of linear equations to solve.

Poisson equation

Let's have another cool name.

Let's say I have an image like this:

Everyone is good, but I don't like the chair.

I cut the picture in half:



And I will select a chair with my hands:

Then I will drag everything that is white in the mask to the left side of the picture, and at the same time I will say throughout the whole picture that the difference between two neighboring pixels should be equal to the difference between two neighboring pixels of the right image:

For (int i=0; i

Here is the result:

Real life example

I deliberately did not do licked results, because. I just wanted to show exactly how you can apply the least squares methods, this is a training code. Let me now give an example from life:

I have a number of photographs of fabric samples like this one:

My task is to make seamless textures from photos of this quality. First, I (automatically) look for a repeating pattern:

If I cut out this quadrangle right here, then because of the distortions, the edges will not converge, here is an example of a pattern repeated four times:

Hidden text

Here is a fragment where the seam is clearly visible:

Therefore, I will not cut along a straight line, here is the cut line:

Hidden text

And here is the pattern repeated four times:

Hidden text

And its fragment to make it clearer:

Already better, the cut did not go in a straight line, bypassing all sorts of curls, but still the seam is visible due to uneven lighting in the original photo. This is where the least squares method for the Poisson equation comes to the rescue. Here's the final result after lighting alignment:

The texture turned out perfectly seamless, and all this automatically from a photo of a very mediocre quality. Do not be afraid of mathematics, look for simple explanations, and you will be lucky in engineering.