
Appendix 1. METHODS OF STATISTICAL ANALYSIS AND FORECASTING IN BUSINESS

2. Mathematical models as a necessary tool for statistical analysis and forecasting in business

Let us start with a simple example showing the differences between the purely statistical, the purely probabilistic, and the probabilistic-statistical approaches to developing a predictive decision. At the same time, this example clearly shows the role of mathematical models in the technology of forming a predictive decision.

Statistical method of decision making. Let the reader imagine himself as a businessman watching a dice game between two of his businessman friends (call them A and B). The game is played according to the following rules. Four consecutive throws of a die are made. Player A receives one monetary unit from player B if these four throws produce a six at least once (call this outcome "six"), and pays one monetary unit to player B otherwise (call this outcome "not six"). After a hundred rounds, the reader must replace one of the players, and he has the right to choose the outcome on which he will stake his monetary unit in the next series of rounds: for the appearance of at least one six, or against it. The correctness of this choice is determined, of course, by the quality of his forecast of the outcome of the game: if the probability of the "six" outcome is correctly estimated at a value exceeding one half, then the player should bet on that outcome. So, the observer's task is to make a reliable forecast.

The statistical way of solving this problem is dictated by ordinary common sense and runs as follows. Having observed one hundred rounds of the game between the previous partners and having calculated the relative frequencies of their winnings, it seems natural to bet on the outcome that arose more often in the course of the game. Suppose it was recorded that in 52 games out of 100 player B won, i.e., in 52 rounds out of 100 the six did not come up even once in four throws of the die (correspondingly, in the remaining 48 games the outcome was "six"). Therefore, the reader concludes, applying the statistical method of reasoning, that it is more profitable to bet on the outcome "not six", i.e., on the outcome whose relative frequency is 0.52 (more than half).
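This bookkeeping is easy to reproduce in a short simulation; the sketch below is in Python, and the seed and round counts are arbitrary illustrative choices, not part of the original example:

```python
import random

def play_round(rng):
    """One round: four throws of a fair die; True means the outcome "not six"."""
    return all(rng.randint(1, 6) != 6 for _ in range(4))

def relative_frequency(n_rounds, seed=0):
    """Fraction of observed rounds won by the "not six" side."""
    rng = random.Random(seed)
    return sum(play_round(rng) for _ in range(n_rounds)) / n_rounds

freq = relative_frequency(100)
```

With more rounds the observed frequency stabilizes, which is exactly what makes the statistical way of reasoning usable at all.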

The theoretical (probabilistic) way of solving. This method is based on a specific mathematical model of the phenomenon under study: assuming the die is fair (i.e., symmetrical), and therefore taking the chances of any face appearing in one throw to be equal to each other (in other words, the probability of a one equals the probability of a two, of a three, etc., and is 1/6), one can calculate the probability P("not six") of the "not six" situation, i.e., the probability of the event that during four consecutive throws of the die the six never appears. This calculation rests on the following facts, which follow from the assumptions of the adopted model. The probability of not throwing a six in a single throw of the die is made up of the chances of a one, a two, a three, a four, or a five appearing, and therefore (in accordance with the definition of the probability of an event) equals 5/6. Next we use the rule of multiplication of probabilities, according to which the probability of the joint occurrence of several independent events equals the product of the probabilities of these events. In our case we have four independent events, each consisting in not getting a six on one throw and each having probability 5/6. Therefore

P("not six") = (5/6)^4 ≈ 0.482.
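The arithmetic of this multiplication rule can be checked in a couple of lines of Python:

```python
# probability of not throwing a six in a single throw of a fair die
p_no_six_once = 5 / 6

# multiplication rule for four independent throws
p_not_six = p_no_six_once ** 4   # (5/6)^4 ≈ 0.482
p_six = 1 - p_not_six            # ≈ 0.518
```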

As we can see, the probability of the "not six" situation turned out to be less than half; therefore, the chances of the "six" situation are preferable (the corresponding probability is 1 − 0.482 = 0.518). This means that a reader who used the probabilistic method of reasoning will come to a decision diametrically opposite to that of a reader with the statistical way of thinking, and will bet on the "six" outcome.

The probabilistic-statistical (or mathematical-statistical) way of making a decision. This method synthesizes the tools of the two previous ones: in developing the final conclusion it uses both the initial statistics accumulated by observing the game (in the form of the relative frequencies of the outcomes "six" and "not six", which, as we remember, were 0.48 and 0.52, respectively) and probabilistic model considerations. However, the model adopted in this case is less rigid and less restrictive; it adjusts itself to reality, using the accumulated statistical information for this purpose. In particular, this model no longer postulates that the die is fair, allowing that its center of gravity may be shifted in some special way. The nature of this shift (if it exists) should somehow show up in the initial statistics at our disposal. However, the reader who commands the probabilistic-statistical way of thinking must understand that the relative frequencies of the outcomes "six" and "not six" obtained from these data give only approximate estimates of the true (theoretical) chances of either situation: after tossing even a perfectly symmetrical coin, say, 10 times, we may by chance get seven heads; accordingly, the relative frequency of heads computed from these trials will be 0.7; but this does not mean that the true (theoretical) probabilities of heads and tails should be estimated at 0.7 and 0.3; these probabilities, as we know, are 0.5 each. In the same way, the relative frequency of the "not six" outcome (0.52) established by us in a series of one hundred rounds may differ from the true (theoretical) probability of the same event and, therefore, may not be a sufficient basis for betting on this outcome.

It turns out that the whole question is how much the relative frequency of the event of interest, observed as the result of n trials, can differ from the true probability of that event, and how this difference, i.e., the error, depends on the number of observations at our disposal (intuitively it is clear that the longer we watch the game, i.e., the greater the total number of observations we use, the more trust the empirical relative frequencies computed by us deserve, i.e., the smaller their difference from the true probabilities unknown to us). The answer to this question can be obtained in our case if we use a number of additional model considerations: a) assume that the result of each round does not depend in any way on the results of previous rounds, and that the probability p of the "not six" situation, unknown to us, remains the same throughout all rounds of the game; b) use the fact that the behavior of the randomly varying (as the experiment is repeated) error p̂ − p, where p̂ is the observed relative frequency, is approximately described by the normal probability distribution law with mean zero and variance p(1 − p)/n (see Sec. 3.1.5).

These considerations make it possible, in particular, to estimate the absolute value of the error after replacing the unknown probability p of the event of interest (in our case, the outcome "not six") by the relative frequency p̂ of this event recorded in the series of trials (in our case, p̂ = 0.52 and n = 100). If we can numerically estimate the largest plausible error, it is natural to apply the following decision rule: if the relative frequency of the outcome "not six" is more than half and still exceeds 0.5 after the possible error is subtracted from it, then it is more profitable to bet on "not six"; if the relative frequency is less than half and remains below 0.5 after the possible error is added to it, then it is more profitable to bet on "six"; in the other cases the observer has no grounds for a statistical conclusion about the advantage of either stake (i.e., one must either continue observing, or enter the game with an arbitrary choice of stake, expecting that this cannot lead to any tangible win or loss).

An approximate calculation of the largest plausible value of this error, based on model consideration b) (i.e., on the de Moivre–Laplace theorem; see also Section 4.3), shows that in the example under consideration, with practical certainty, namely with probability 0.95, the following inequality holds:

|p̂ − p| ≤ 1.96 · √(p(1 − p)/n).

Squaring this inequality and solving the resulting quadratic inequality with respect to the unknown parameter p gives, up to terms of an order of smallness higher than 1/n,

p̂ − 1.96 · √(p̂(1 − p̂)/n) ≤ p ≤ p̂ + 1.96 · √(p̂(1 − p̂)/n). (4)

In this case (for p̂ = 0.52 and n = 100) we get

1.96 · √(0.52 · 0.48/100) ≈ 0.098.

Consequently,

0.42 ≤ p ≤ 0.62 (approximately).

Thus, observation of the outcomes of a hundred rounds gives us grounds only to conclude that the unknown probability p of the "not six" outcome may in fact be any number from the interval (0.42; 0.62), i.e., it may be either a value less than 0.5 (and then one should bet on the "six" situation) or a value greater than 0.5 (and then one should bet on the "not six" situation).
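As a sketch of this calculation in plain Python (1.96 is the 0.95-level quantile of the normal distribution used in the de Moivre–Laplace approximation):

```python
import math

def frequency_error(p_hat, n, z=1.96):
    """Half-width of the approximate 95% interval for an unknown probability."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

delta = frequency_error(0.52, 100)      # ≈ 0.098
lo, hi = 0.52 - delta, 0.52 + delta     # ≈ (0.42, 0.62)
```

Since the interval straddles 0.5, a hundred rounds do not settle which stake is preferable.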

In other words, the reader who used the probabilistic-statistical method of solving the problem and the model assumptions above must come to the following "cautious" conclusion: one hundred rounds were not enough initial statistical material for a reliable conclusion about which outcome of the game is more probable. Hence the decision: either continue in the role of "spectator" until the range of possible values of the probability p obtained from estimates of the form (4) lies entirely to the left or entirely to the right of 0.5, or enter the game regarding it as close to "harmless", i.e., a game in which, over a long series of rounds, one will in practice break even.

The example given illustrates the role and purpose of probabilistic and mathematical-statistical methods and their interrelation. While probability theory provides the researcher with a set of mathematical models designed to describe the regularities in the behavior of real phenomena or systems whose functioning proceeds under the influence of a large number of interacting random factors, the means of mathematical statistics make it possible to select, among the many possible probabilistic models, the one that in a certain sense best corresponds to the statistical data available to the researcher, data characterizing the actual behavior of the particular system under study.

Mathematical model. A mathematical model is a certain mathematical construction that abstracts the real world: in the model, the relations between real elements of interest to the researcher are replaced by suitable relations between elements of the mathematical construction (mathematical categories). These relations are, as a rule, presented in the form of equations and/or inequalities between indicators (variables) characterizing the functioning of the modeled real system. The art of constructing a mathematical model consists in combining the greatest possible brevity of its mathematical description with sufficient accuracy in reproducing precisely those aspects of the analyzed reality that interest the researcher.

Above, when analyzing the relationship between purely statistical, purely probabilistic, and mixed probabilistic-statistical reasoning, we in fact used the simplest models, namely:

the statistical frequency model of the random event of interest to us, which consists in the fact that in four successive throws of a die the six never comes up; estimating from past history the relative frequency of this event and taking it as the probability of the event's occurrence in a future series of trials, we thereby use the model of a random experiment with a known probability of its outcome (see also Section 1.1.3);

the probabilistic model of a sequence of Bernoulli trials (see also § 3.1.1), which is in no way connected with the use of observational results (i.e., with statistics); to calculate the probability of the event of interest it suffices to adopt the hypothetical assumption that the die used is perfectly symmetrical; then, in accordance with the model of a series of independent trials and the probability multiplication theorem valid within that model, the probability of interest is computed by the formula P("not six") = (5/6)^4 ≈ 0.482;

the probabilistic-statistical model, which interprets the relative frequency estimated under the purely statistical approach as a certain random variable (see also Sec. 2.1) whose behavior obeys rules determined by the de Moivre–Laplace theorem; in constructing this model, both probabilistic concepts and rules and statistical techniques based on observational results were used.

Summarizing this example, we can say that:

a probabilistic model is a mathematical model that simulates the mechanism of functioning of a hypothetical (not a specific) real phenomenon (or system) of a stochastic nature; in our example, the hypothetical element concerned the properties of the die: it had to be perfectly symmetrical;

a probabilistic-statistical model is a probabilistic model in which the values of individual characteristics (parameters) are estimated from observational results (initial statistical data) characterizing the functioning of the modeled specific (rather than hypothetical) phenomenon (or system).

A probabilistic-statistical model that describes the mechanism of functioning of an economic or socio-economic system is called econometric.

Predictive and managerial models in business. Let us return to the problems of statistical analysis of the mechanism of functioning of an enterprise (firm) and the related forecasts. Considering again the "phase space" of these problems, it is not difficult to describe the general logical structure of the models needed for their solution. This structure follows directly from the above definition of a business strategy.

In order to formalize (i.e., write in terms of a mathematical model) the problems of optimal control and forecasting in business, we introduce the following notation:

Y – column vector of resulting indicators (sales volume, etc.);

X – column vector of "behavioral" (controlled) variables (investments in the development of fixed assets, in marketing services, etc.);

Z – column vector of so-called "status" variables, i.e., indicators characterizing the state of the firm (number of employees, fixed assets, age of the firm, etc.);

W – column vector of geo-socio-economic-demographic characteristics of the external environment (indicators of the general economic situation, characteristics of customers and suppliers, etc.);

ε – column vector of random regression residuals (more on them below).

Then the system of equations on the basis of which optimal enterprise management is carried out and the necessary predictive calculations are performed can, in the most general form, be written as

Y = f(X; Z, W) + ε, (5)

where f is some vector-valued function of X whose structure (the values of its parameters), generally speaking, depends on the levels at which the values of the firm's "status" variables and of the external-environment variables are fixed.

Then the basic problem of statistical analysis and forecasting in business consists in constructing the best (in a certain sense) estimate of the unknown function f from the initial statistical information available to the researcher, of the form

(X_t, Z_t, W_t, Y_t), t = 1, 2, ..., n,

where X_t, Z_t, W_t and Y_t are the values of the behavioral, "status", external-environment and resulting variables, respectively, characterizing the t-th moment of time (or measured at the t-th statistically surveyed enterprise). Accordingly, the parameter n (the sample size) is interpreted either as the total duration of observation of the values of the analyzed variables at the enterprise under study, if the observations were recorded in time, or as the total number of statistically surveyed enterprises of the same type, if the observations were recorded in space (i.e., obtained by moving from one enterprise to another). In this case, the description of the estimated function must be accompanied by a method of computing guaranteed approximation errors (forecast errors), i.e., vector quantities that, for any given values of the behavioral (controlled), "status" and environmental variables, would guarantee, with a probability not less than a pre-specified value sufficiently close to unity, that the true resulting indicators lie within the corresponding bounds; in the classical regression model the expected value of the random residual ε is identically equal to zero.
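As an illustration of estimating an unknown response function from observations, here is a minimal one-factor sketch: ordinary least squares for y ≈ a + b·x. The data here are hypothetical, introduced only for the example:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y ≈ a + b*x (a one-factor sketch of estimating f)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# hypothetical data: investment levels x and resulting sales y
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_linear(xs, ys)
```

In the multi-factor setting of the text, the same least-squares idea is applied to a vector of explanatory variables rather than a single x.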

Some general information about the mathematical tools for solving problems (9) and (10) is given below, in Section 4.


Mathematical statistics is a branch of mathematics that develops methods for recording, describing and analyzing observational and experimental data with the aim of building probabilistic models of random phenomena and processes. Depending on the mathematical nature of the specific observational results, mathematical statistics is divided into the statistics of numbers, multivariate statistical analysis, the analysis of functions (processes) and time series, and the statistics of objects of non-numerical nature. Mathematical statistics unites various methods of statistical analysis based on the use of statistical regularities or their characteristics.

The history of statistics is usually reckoned from the problem of recovering dependencies, beginning with the development by C. Gauss in 1794 (according to other sources, in 1795) of the method of least squares. The development of methods of data approximation and of dimensionality reduction in data description began more than 100 years ago, when K. Pearson created the method of principal components. Later there appeared factor analysis, various methods of constructing classifications (cluster analysis) and of analyzing and using them (discriminant analysis), and others. At the beginning of the 20th century the theory of mathematical statistics was developed by A. A. Chuprov. A significant contribution to the theory of random processes was made by A. A. Markov, E. E. Slutsky, A. N. Kolmogorov, A. Ya. Khinchin and others. The resulting theory of data analysis is called parametric statistics, since its main object of study is samples from distributions described by one or a small number of parameters. The most general is the family of Pearson curves, defined by four parameters. The normal distribution became the most popular. The Pearson, Student and Fisher criteria were used to test hypotheses. The maximum likelihood method and analysis of variance were proposed, and the main ideas of experimental design were formulated.

In 1954, B. V. Gnedenko, Academician of the Academy of Sciences of the Ukrainian SSR, gave the following definition: "Statistics consists of three sections:

  • 1) the collection of statistical information, i.e., information characterizing individual units of mass aggregates;
  • 2) the statistical study of the data obtained, consisting in elucidating those regularities that can be established on the basis of mass observation;
  • 3) the development of techniques for statistical observation and for the analysis of statistical data."

The last section, in fact, is the content of mathematical statistics.

According to the degree of specificity of the methods, which is associated with immersion in concrete problems, three types of scientific and applied activity in the field of statistical methods of data analysis can be distinguished:

  • a) development and research of general purpose methods, without taking into account the specifics of the field of application;
  • b) development and research of statistical models of real phenomena and processes in accordance with the needs of a particular field of activity;
  • c) application of statistical methods and models for statistical analysis of specific data.

The most common methods of statistical analysis are:

  • regression analysis (based on comparison of mathematical expectations);
  • analysis of variance (based on comparison of variances);
  • correlation analysis (takes into account mathematical expectations, variances and characteristics of relations between events or processes);
  • factor analysis (statistical processing of a multifactor experiment);
  • rank correlation (a combination of correlation and factor analysis).

When applying the various methods of mathematical statistics, statistical regularities or their characteristics are obtained in various ways: by observation and examination of samples, and by approximate methods based on various ways of transforming a sample into a variation series, or of splitting samples into streams, sections, random time intervals, etc.

Mathematical statistics is used in various areas of management.

The term "statistics" was originally used to describe the economic and political state of a country or a part of it. For example, a definition dating from 1792 reads: "statistics describes the state of a state at the present time or at some known moment in the past." The activities of state statistical services still fit this definition well. Statistics has been defined as the branch of knowledge dealing with the general questions of collecting, measuring and analyzing mass statistical (quantitative or qualitative) data, and as the study of the quantitative side of mass social phenomena in numerical form.

The word "statistics" comes from the Latin status, the state of affairs. The term "statistics" was introduced into science by the German scholar Gottfried Achenwall in 1746, who proposed replacing the title of the course "State Studies" taught at German universities with "Statistics", thereby laying the foundation for the development of statistics as a science and an academic discipline.

In statistics a special methodology for the study and processing of materials is used: mass statistical observation, the method of groupings, averages, indices, the balance method, graphical methods and other methods of analyzing statistical data.

The development of computing technology has had a significant impact on statistics. Previously, statistical models were represented mainly by linear models. Increasing computer speed and the development of the corresponding numerical algorithms have stimulated interest in nonlinear models, such as artificial neural networks, and have led to the development of complex statistical models, such as the generalized linear model and hierarchical models. Computational methods based on resampling have become widespread. Computational statistics is now an active field, and a variety of statistical software of general and specialized purpose exists. Statistical methods are used in the field called Data Mining (see Ch. 8).
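A small sketch of a resampling method of the kind mentioned above: a percentile bootstrap interval for a mean. The data, the number of resamples and the seed are arbitrary illustrative choices:

```python
import random
import statistics

def bootstrap_mean_interval(data, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the mean (a resampling sketch)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

data = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 4.2]
lo, hi = bootstrap_mean_interval(data)
```

The appeal of resampling is that it needs no distributional formula: the sample itself plays the role of the population.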

Statistical modeling is a numerical method for solving mathematical problems in which the sought quantities are probabilistic characteristics of some random phenomenon. The phenomenon is simulated, after which the desired characteristics are determined approximately by statistical processing of "observations" of the model.

The development of such models consists in choosing a method of statistical analysis, planning the process of obtaining the data, organizing the data on the ecological system, and computing statistical relations by computer. A change in the patterns of development of the environmental situation requires repeating the described procedure, but in a new quality.

Statistical determination of a mathematical model includes selecting the type of the model and determining its parameters. The desired function may be a function of one independent variable (single-factor) or of many variables (multi-factor). The choice of the model type is an informal problem, since the same dependence can be described with the same error by a variety of analytical expressions (regression equations). A rational choice of the model type can be justified by a number of criteria: compactness (for example, description by a monomial or a polynomial), interpretability (the ability to give a meaningful sense to the coefficients of the model), etc. The calculation of the parameters of the chosen model is often a purely formal task carried out on a computer.
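The formal part of this task, measuring how well a candidate regression equation fits the data, can be sketched as follows; the two candidate equations and the data are hypothetical:

```python
def rss(model, xs, ys):
    """Residual sum of squares: the formal criterion of fit for a candidate model."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))

xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.1, 4.9, 7.2, 9.0]

# two candidate regression equations describing the same dependence
linear = lambda x: 1.0 + 2.0 * x
quadratic = lambda x: 1.0 + 1.9 * x + 0.025 * x * x

rss_lin = rss(linear, xs, ys)
rss_quad = rss(quadratic, xs, ys)
```

Here the compactness criterion favors the linear form: it is simpler and, on these data, fits no worse than the quadratic one.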

When forming a statistical hypothesis about a certain ecological system, one needs an array of diverse data (a database), which may be unreasonably large. An adequate representation of the system is associated in this case with separating out the inessential information. Both the list (types) of data and the amount of data can be reduced. One method of carrying out such a compression of environmental information (without a priori assumptions about the structure and dynamics of the observed ecosystem) is factor analysis. Data reduction is performed by the method of least squares, principal components and other multivariate statistical methods, with subsequent use, for example, of cluster analysis.
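A minimal illustration of this kind of data compression: for two-dimensional data the first principal axis (the direction of maximal variance) has a closed form, and projecting onto it keeps most of the information. The data are hypothetical:

```python
import math

def first_principal_axis(pts):
    """Direction of maximal variance of 2-D points (closed form for a 2x2 covariance)."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / n
    syy = sum((p[1] - my) ** 2 for p in pts) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)   # unit vector along the axis

# two strongly correlated indicators (y roughly doubles x):
pts = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2)]
ux, uy = first_principal_axis(pts)
```

For strongly correlated indicators, the projections onto this single axis summarize both coordinates with little loss, which is exactly the compression effect exploited by principal component and factor analysis.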

Note that primary environmental information typically has the following features:

– multidimensionality of data;

– non-linearity and ambiguity of relationships in the system under study;

– measurement error;

– the influence of unaccounted-for factors;

– spatio-temporal dynamics.

When solving the first problem, choosing the type of the model, it is assumed that m input values (x1, x2, ..., xm) and n output values (y1, y2, ..., yn) are known. In this case it is possible to consider, in particular, the following two models in matrix notation:

Y = AX,   X = BY,

where X and Y are the known input (or output) and output (or input) parameters of an ecological object (a "black box") in vector form, and A and B are the desired matrices of constant coefficients of the model (the model parameters).
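In the one-dimensional special case, where the matrix A reduces to a single coefficient, the least-squares estimate is a one-liner; the data here are hypothetical:

```python
def estimate_coefficient(xs, ys):
    """Least-squares estimate of a in the scalar model y = a*x (1-D case of Y = A*X)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.1]
a = estimate_coefficient(xs, ys)   # close to 2, the slope underlying the data
```

The full matrix case is handled by the same least-squares principle, one row of A at a time.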

Along with the above models, a more general form of statistical model is considered:

X = CF,   Y = DF,

where F is the vector of hidden influencing factors, and C and D are the desired coefficient matrices.

When solving environmental problems it is expedient to use both linear and nonlinear mathematical models, since many ecological regularities are still poorly studied. As a result, the multidimensionality and nonlinearity of the modeled relationships will be taken into account.

The generalized model makes it possible to single out internal hidden factors of the studied ecological processes which are unknown to the environmental engineer but whose manifestation is reflected in the components of the vectors X and Y. This procedure is most appropriate when there is no strict cause-and-effect relationship between X and Y. The generalized model with hidden factors removes a certain contradiction between the two models with matrices A and B, in which two different models could in fact be used to describe the same ecological process. That contradiction is caused by the opposite directions of the causal relationship between X and Y (in one case X is the input and Y the output, and vice versa in the other). The generalized model with the factors F describes a more complex system, in which both X and Y are output values and the hidden factors F act as the input.

The use of a priori data is important in statistical modeling: even before the problem is solved, some properties of the models can be established and their potential number narrowed.

Let us assume that it is necessary to build a model with whose help one can numerically determine the fertility of a certain type of soil within 24 hours, taking into account its temperature T and moisture W. Neither wheat nor an apple tree can produce a crop in 24 hours. But for a test sowing one can use bacteria with a short life cycle, taking as a quantitative criterion of the intensity of their vital activity the amount P of CO2 released per unit time. Then a mathematical model of the process under study is the expression

P = P0 · f(T, W),

where P0 is a numerical indicator of the quality of the soil.

It might seem that we have no data on the form of the function f(T, W), since the systems engineer lacks the necessary agronomic knowledge. But this is not so. Everyone knows that at T ≈ 0 °C water freezes and hence CO2 cannot be released, while at 80 °C pasteurization occurs, i.e., most bacteria die. These a priori data are already sufficient to assert that the desired function is quasi-parabolic: it is close to zero at T = 0 and 80 °C and has an extremum within this temperature range. Similar reasoning about moisture leads to fixing the maximum of the desired function at W = 20% and its approach to zero at W = 0 and 40%. Thus the form of the approximate mathematical model is determined a priori, and the task of the experiment is only to refine the character of the function f(T, W) at T = 20...30 and 50...60 °C, and at W = 10...15 and 25...30%, and to determine the coordinates of the extremum more precisely (which reduces the volume of experimental work, i.e., the amount of statistical data needed).
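The a priori reasoning fixes only the qualitative shape of f(T, W). One hypothetical functional form with exactly these properties is a product of two parabolas; this concrete formula is an assumption for illustration, not given in the text:

```python
def f(T, W, k=1.0):
    """Hypothetical quasi-parabolic factor: zero at T = 0 and 80 C, and at W = 0 and 40 %."""
    if not (0 <= T <= 80 and 0 <= W <= 40):
        return 0.0
    return k * T * (80 - T) * W * (40 - W)

# locate the extremum on a grid: it lies strictly inside the admissible ranges,
# with the moisture maximum at W = 20 %, as the a priori reasoning requires
best = max((f(T, W), T, W) for T in range(81) for W in range(41))
```

Experiments would then only refine this assumed shape and the extremum coordinates, rather than search for the function from scratch.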

Mathematical statistics is a branch of applied mathematics directly adjacent to, and based on, probability theory. Like any mathematical theory, mathematical statistics develops within the framework of a certain model describing a certain range of real phenomena. To define a statistical model and explain the specifics of the problems of mathematical statistics, let us recall some notions from probability theory.

The mathematical model of random phenomena studied in probability theory is based on the concept of a probability space (Ω, A, P). In each specific situation the probability P is considered a completely known numerical function on the σ-algebra A of events: for any event A₀ ∈ A the number P(A₀) is completely determined. The main task of probability theory is to develop methods for finding the probabilities of various complex events from the known probabilities of simpler ones (for example, from the known distribution laws of random variables, to find their numerical characteristics and the distribution laws of functions of those random variables).

In practice, however, when studying a specific random experiment, the probability P is as a rule unknown or only partially known. One can only assume that the true probability is an element of some class 𝒫 of probabilities (in the worst case, 𝒫 is the class of all possible probabilities that can be defined on A). The class 𝒫 is called the set of probabilities admissible for describing the given experiment, and the triple (Ω, A, 𝒫) is called the statistical model of the experiment. In the general case, the task of mathematical statistics is to refine the probabilistic model of the random phenomenon under study (that is, to find the true probability or one close to it), using the information supplied by the observed outcomes of the experiment, which are called statistical data.

In classical mathematical statistics, which we shall study in what follows, one deals with random experiments consisting of n repeated independent observations of some random variable ξ that has an unknown probability distribution, i.e., an unknown distribution function F(x). The set of all possible values of the observed random variable ξ is called the general population with distribution function F (or distributed according to F). The numbers x1, x2, ..., xn obtained as the result of independent observations of the random variable ξ are called a sample from the general population, or sample (statistical) data. The number n of observations is called the sample size.

The main task of mathematical statistics is, given a sample from the general population, to extract the maximum information from it and draw well-founded conclusions about the unknown probabilistic characteristics of the observed random variable.
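The simplest way a sample carries information about the unknown distribution function is the empirical distribution function Fₙ(x), the fraction of observations not exceeding x. The following sketch illustrates this; for simulation purposes only, the "unknown" distribution is assumed to be exponential with rate 1 (the observer is presumed not to know this):

```python
import random

# Simulate n = 1000 independent observations of X; the true distribution
# (exponential with rate 1) stands in for the unknown F.
random.seed(42)
sample = [random.expovariate(1.0) for _ in range(1000)]

def empirical_cdf(sample, x):
    """Empirical distribution function F_n(x): fraction of observations <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

# F_n(x) approximates the true F(x) = 1 - exp(-x); for example,
# F(1) = 1 - 1/e is about 0.632, and F_n(1) should be close to that.
print(empirical_cdf(sample, 1.0))
```

As n grows, Fₙ converges to the true F uniformly in x, which is what justifies drawing conclusions about F from the sample.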

The statistical model corresponding to n repeated independent observations of a random variable X is naturally taken to be the triple (𝒳, ℬ, ℱ), where 𝒳 is the general population, ℬ is the σ-algebra of Borel subsets of 𝒳, and ℱ is the class of distribution functions admissible for the given random variable X, to which the true unknown distribution function F also belongs.

The triple (𝒳, ℬ, ℱ) is often called a statistical experiment.

If the distribution functions in ℱ are specified up to the value of some parameter θ, i.e. ℱ = {F(x; θ) : θ ∈ Θ}, where Θ is the parameter set, then the model is called parametric. In this case the type of distribution of the observed random variable is said to be known, and only the parameter θ on which the distribution depends is unknown. The parameter θ can be either a scalar or a vector.

A statistical model is called continuous or discrete if all the distribution functions in the class ℱ are, respectively, continuous or discrete.

Example 1. Suppose the distribution of the observed random variable X is Gaussian with known variance σ₀² and unknown mathematical expectation θ.

In this case the statistical model is continuous and has the form ℱ = {N(θ, σ₀²) : θ ∈ ℝ}.

If the variance is also unknown, the statistical model has the form ℱ = {N(θ₁, θ₂²) : θ₁ ∈ ℝ, θ₂ > 0},

and the distribution function has the probability density f(x; θ₁, θ₂) = (1/(θ₂√(2π))) exp(−(x − θ₁)² / (2θ₂²)).

This is the so-called general normal model, denoted by N(θ₁, θ₂²).
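As a hypothetical illustration of the general normal model, the sketch below simulates observations from N(5, 2²) (the values θ₁ = 5 and θ₂ = 2 are assumed only in order to generate data; the statistician treats them as unknown) and estimates both parameters by the sample mean and the sample standard deviation:

```python
import math
import random

# Simulate n = 2000 observations from the "unknown" N(theta1, theta2^2)
# with theta1 = 5 and theta2 = 2 (assumed here only for the simulation).
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(2000)]
n = len(data)

theta1_hat = sum(data) / n                       # sample mean estimates theta1
theta2_hat = math.sqrt(                          # sample std estimates theta2
    sum((x - theta1_hat) ** 2 for x in data) / n
)

# The estimates should be close to the true values 5 and 2.
print(theta1_hat, theta2_hat)
```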

Example 2. Suppose the distribution of the observed random variable X is Poisson with an unknown parameter θ > 0. In this case the statistical model is discrete and has the form ℱ = {Π(θ) : θ > 0}, where Π(θ) denotes the Poisson distribution with mean θ.

Along with a specific sample x₁, …, xₙ it is convenient to consider the random sample X₁, …, Xₙ of independent random variables, each distributed in the same way as X (the random variables X₁, …, Xₙ are said to be copies of X), which have not yet taken specific values as a result of the experiment. The transition from a specific sample to a random sample will be used repeatedly below when solving theoretical questions and problems, in order to obtain conclusions that are valid for any sample from the general population.
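For the Poisson model, the natural estimate of θ is again the sample mean. A sketch follows, assuming θ = 3 purely for simulation; since Python's standard library has no Poisson generator, one is written here using Knuth's multiplication method:

```python
import math
import random

def poisson_sample(theta, rng):
    """Draw one Poisson(theta) value using Knuth's multiplication method."""
    limit = math.exp(-theta)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:       # stop once the running product falls below e^-theta
            return k
        k += 1

rng = random.Random(1)
theta = 3.0                                    # unknown to the statistician
data = [poisson_sample(theta, rng) for _ in range(5000)]

theta_hat = sum(data) / len(data)              # sample mean estimates theta
print(theta_hat)
```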

The main problems considered in mathematical statistics can be divided into two large groups:

1. Problems of estimating the unknown distribution law of the observed random variable and the parameters appearing in it (these are considered within the framework of statistical estimation theory).

2. Problems of testing hypotheses about the distribution law of the observed random variable (these are solved within the framework of the theory of testing statistical hypotheses).
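The two groups of problems can be illustrated in one small sketch: a point estimate of an unknown mean θ, followed by a test of the hypothesis H₀: θ = 0 against H₁: θ ≠ 0. The assumptions here (known variance σ² = 1, a true θ = 0.3 used only to simulate data, a two-sided test at the 5% level) are illustrative, not taken from the text:

```python
import math
import random

# Simulate n = 400 observations of X ~ N(theta, 1); theta = 0.3 is
# "unknown" and assumed here only to generate the data.
random.seed(7)
true_theta = 0.3
data = [random.gauss(true_theta, 1.0) for _ in range(400)]
n = len(data)

theta_hat = sum(data) / n               # (1) estimation: point estimate of theta
z = theta_hat * math.sqrt(n) / 1.0      # (2) testing: statistic ~ N(0,1) under H0
reject_h0 = abs(z) > 1.96               # two-sided test at the 5% level

print(theta_hat, reject_h0)
```

With θ actually 0.3 and n = 400, the test statistic is large and H₀ is rejected, as it should be.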

Statistical observation.

The essence of statistical observation.

The initial stage of any statistical research is the systematic, scientifically organized collection of data on the phenomena and processes of social life, called statistical observation. The significance of this stage is determined by the fact that only the use of completely objective and sufficiently complete data obtained through statistical observation can, at the subsequent stages, provide scientifically grounded conclusions about the nature and development patterns of the object under study. Statistical observation is carried out by evaluating and registering the characteristics of the units of the studied population in the appropriate accounting documents. The data obtained in this way are facts that in one way or another characterize the phenomena of social life. Reasoning based on facts does not contradict theoretical analysis, since any theory ultimately rests on factual material. The probative power of facts increases further as a result of statistical processing, which systematizes them and presents them in compressed form.

Statistical observation should be distinguished from other forms of observation carried out in everyday life on the basis of sensory perception. Only observation that ensures the registration of established facts in accounting documents for their subsequent generalization can be called statistical. Concrete examples of statistical observation include the systematic collection of information at machine-building enterprises on the number of machines and components produced, production costs, profits, and so on.

Statistical observation must meet rather stringent requirements:

1. The observed phenomena must have a certain national economic significance, scientific or practical value, and must express definite socio-economic types of phenomena.

2. Statistical observation should ensure the collection of mass data reflecting the totality of facts related to the issue under consideration, since social phenomena are in constant change and development and have different qualitative states. Incomplete data that characterize the process one-sidedly lead to erroneous conclusions being drawn from their analysis.

3. The variety of causes and factors determining the development of social and economic phenomena requires that statistical observation, along with collecting data that directly characterize the object under study, also take into account the facts and events under whose influence its states change.

4. To ensure the reliability of statistical data, a thorough check of the quality of the collected facts is necessary at the observation stage. Strict reliability of the data is one of the most important characteristics of statistical observation. Defects in statistical information, expressed in its unreliability, cannot be eliminated in the course of further processing, so their appearance makes it difficult to make scientifically grounded decisions and balance the economy.

5. Statistical observation should be carried out on a scientific basis, according to a previously developed system, plan, and rules (program) that provide a strictly scientific solution of all programmatic, methodological, and organizational issues.

Software and methodological support of statistical observation.

Preparation for statistical observation, on which the success of the undertaking depends, requires the timely resolution of a number of methodological questions related to defining the tasks, goals, object, and units of observation, developing the program and tools, and choosing the method of collecting statistical data. The tasks of statistical observation follow directly from the tasks of statistical research and consist, in particular, in obtaining mass data directly on the state of the object under study, in taking into account the state of the phenomena that affect the object, and in studying data on the development of phenomena. The goals of observation are determined, first of all, by the needs of information support for the economic and social development of society. The goals set for state statistics are specified and concretized by its governing bodies, as a result of which the directions and scope of work are determined.

Depending on the purpose, the question of the object of statistical observation is decided, i.e. what exactly should be observed. The object is understood as the set of material objects, enterprises, labor collectives, persons, etc., through which the phenomena and processes subject to statistical research take place. Depending on the goals, the objects of observation may be, in particular, units of production equipment, products, inventory items, settlements, districts, enterprises, organizations and institutions of various sectors of the national economy, the population and its individual categories, and so on. Establishing the object of statistical observation involves determining its boundaries on the basis of an appropriate criterion, expressed by some restrictive characteristic feature called the qualification. The choice of qualification has a significant impact on the formation of homogeneous aggregates and prevents the mixing of different objects or the omission of some part of the object. The essence of the object of statistical observation becomes clearer when one considers the units of which it consists: the units of observation are the primary elements of the object of statistical observation that are the carriers of the registered features.

A reporting unit should be distinguished from a unit of observation. A reporting unit is a unit of statistical observation from which the information subject to registration is received in the prescribed manner. In some cases the two concepts coincide, but often they have entirely independent meanings. It is impossible and inexpedient to take into account all the many features that characterize the object of observation; therefore, when developing a plan for statistical observation, the composition of the features to be recorded should be chosen carefully and skillfully in accordance with the goal. The list of features, formulated as questions addressed to the units of the population which the statistical study must answer, constitutes the program of statistical observation.

To obtain an exhaustive description of the phenomenon under study, the entire range of its essential features would have to be covered by the program. However, the difficulty of implementing this principle in practice makes it necessary to include in the program only the most significant features, those that express the socio-economic types of the phenomenon and its most important properties and relationships. The scope of the program is governed by the resources available to the statistical authorities, the deadlines for producing results, the required level of detail of the output, and so on. The content of the program is determined by the nature and properties of the object under study and by the goals and objectives of the research. Among the general requirements for preparing the program is the inadmissibility of including questions to which it is difficult to obtain accurate, completely reliable answers giving an objective picture of the situation. For some of the most important features, the program is expected to include control questions serving to check the consistency of the information received. To strengthen the mutual verification of questions and the analytical power of the observation program, interrelated questions are arranged in a definite sequence, sometimes in blocks of interrelated features.

The questions of the program of statistical observation should be formulated clearly, unambiguously, and concisely, leaving no possibility of differing interpretations. The program often lists answer options, through which the semantic content of the questions is clarified. The methodological support of statistical observation assumes that a program for processing the results is drawn up simultaneously with the observation program. The research objectives are formulated as a list of generalizing statistical indicators that are to be obtained by processing the collected material, together with the features to which each indicator corresponds and the layouts of the statistical tables in which the results of processing the primary information are presented. By identifying missing information, the processing program allows the program of statistical observation itself to be refined. Carrying out statistical observation requires the preparation of appropriate tools: forms and instructions for filling them out. A statistical form is the primary document in which the answers to the program questions are recorded for each unit of the population; the form is thus a carrier of primary information. All forms share certain mandatory elements: a content part, including the list of program questions; a free column, or several columns, for recording answers and their ciphers (codes); and title and address sections. To ensure uniform interpretation of their content, statistical forms are usually accompanied by instructions, i.e. written directions and explanations for filling them out. The instructions explain the purpose of the statistical observation, characterize its object and unit, the time and duration of observation, the procedure for completing the documentation, and the deadlines for presenting results. However, the main purpose of the instructions is to explain the content of the program questions, how to answer them, and how to fill out the form.

Types and methods of statistical observation.

The success of collecting high-quality and complete initial data, subject to the requirement of economical expenditure of material, labor, and financial resources, is largely determined by the choice of the type, method, and organizational form of statistical observation.

Types of statistical observation.

The availability of several types of observation, which differ primarily in how facts are recorded over time, makes it necessary to choose the option for collecting statistical data that best suits the problem being solved. Systematic observation, carried out continuously as the signs of a phenomenon appear, is called current observation. Current observation is conducted on the basis of primary documents containing the information needed for a sufficiently complete characterization of the phenomenon being studied. Statistical observation carried out at regular intervals is called periodic; an example is the population census. Observation carried out from time to time, without strict periodicity, or on a one-time basis, is called one-time observation.

Types of statistical observation are also differentiated by the completeness of coverage of the population. In this respect, continuous and non-continuous observation are distinguished. Continuous observation takes into account all units of the studied population without exception. Non-continuous observation is deliberately oriented towards recording only some, as a rule fairly large, part of the observation units, which nevertheless makes it possible to obtain stable generalizing characteristics of the entire statistical population. In statistical practice, various forms of non-continuous observation are used: sample observation, the main-array method, questionnaire surveys, and monographic description. The quality of non-continuous observation is inferior to that of continuous observation, but in a number of cases statistical observation is possible only in non-continuous form.

To obtain a representative characterization of the entire statistical population from some part of its units, sample observation is used, based on scientific principles for forming the sample population. The random character of the selection of population units guarantees the impartiality of the sample results and prevents their bias. Under the main-array method, the largest, most significant units of the population are selected, those that predominate in the total mass with respect to the feature under study. A specific type of statistical observation is the monographic description: a detailed examination of a separate but highly typical object that is of interest from the point of view of studying the whole population.
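The point of sample observation, that a random selection of a fairly small part of the units yields stable characteristics of the whole population, can be illustrated with a short simulation. The population below is hypothetical (100,000 units with a numeric attribute, e.g. enterprise output), assumed purely for the sketch:

```python
import random

# A hypothetical population of 100,000 units; the attribute values are
# simulated and stand in for some real recorded feature.
random.seed(3)
population = [random.gauss(100.0, 15.0) for _ in range(100_000)]

# Continuous observation: every unit is enumerated.
true_mean = sum(population) / len(population)

# Sample observation: a random selection of 1% of the units.
sample = random.sample(population, 1_000)
sample_mean = sum(sample) / len(sample)

# The random selection keeps the sample estimate close to the
# full-enumeration value at 1% of the registration effort.
print(true_mean, sample_mean)
```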

Methods of statistical observation.

Varieties of statistical observation can also be differentiated according to the sources and methods of obtaining primary information. In this respect, direct observation, questioning, and documentary observation are distinguished. Direct observation is observation carried out by counting, by measuring the values of features, or by taking instrument readings, performed by special persons known as registrars. Quite often, when other methods cannot be used, statistical observation is carried out by means of a survey based on a definite list of questions, with the answers recorded on a special form. Depending on how the answers are obtained, the expeditionary and correspondent methods, as well as the method of self-registration, are distinguished. Under the expeditionary method, the questioning is carried out orally by a special person (an enumerator), who simultaneously fills out the form or survey questionnaire.

Under the correspondent method of questioning, survey forms are distributed by the statistical authorities to an appropriately prepared circle of persons called correspondents, who undertake by agreement to complete the form and return it to the statistical organization. Under the method of self-registration, the questionnaires are filled out, as in the correspondent method, by the respondents themselves, but their distribution and collection, as well as instruction and verification of correct completion, are carried out by enumerators.

Basic organizational forms of statistical observation.

In practice, the whole variety of types and methods of observation is implemented through two main organizational forms: reporting and specially organized observation. Statistical reporting is the main form of statistical observation, covering all enterprises, organizations, and institutions of the production and non-production spheres. Reporting is the systematic submission, within established deadlines, of accounting and statistical documentation in the form of reports that comprehensively characterize the results of the work of enterprises and institutions over the reporting periods. Reporting is directly linked to primary accounting records, is based on them, and represents their systematization, i.e. the result of their processing and generalization. Reporting follows strictly established forms approved by the State Statistics Committee of Russia. The list of all forms with their details (attributes) is called the reporting schedule. Each reporting form must contain the following information: its name; the number and date of its approval; the name of the enterprise, its address and subordination; the addresses to which the reports are submitted; the frequency, date of submission, and method of transmission; the content in the form of a table; and the list of officials responsible for the preparation and reliability of the reported data, i.e. those required to sign the report. The variety of production conditions in the different branches of material production, the specifics of the reproduction process in local conditions, and the differing significance of particular indicators determine the different types of reporting. First of all, standard and specialized reporting are distinguished: standard reporting has the same form and content for all enterprises or institutions of the national economy, while specialized reporting reflects aspects specific to individual enterprises in an industry.

By periodicity, reporting is divided into annual and current reporting: quarterly, monthly, fortnightly, and weekly. Depending on the method of transmitting information, postal and telegraphic reporting are distinguished. Statistical censuses are the second most important organizational form of statistical observation. A census is a specially organized statistical observation aimed at recording the number and composition of certain objects (phenomena) and at establishing the qualitative characteristics of their aggregates at a certain point in time. Censuses provide statistical information that is not covered by reporting and in some cases significantly refine the data of current accounting.

To ensure the high quality of census results, a set of preparatory work is carried out. The content of the organizational measures for preparing a census, carried out in accordance with the requirements and rules of statistical science, is set out in a specially developed document called the organizational plan of statistical observation. The organizational plan settles the questions of the subject (performer) of the statistical observation; the place, time, timing, and procedure of its conduct; the organization of census tracts; the selection and training of enumeration staff and their provision with the necessary accounting documentation; and a number of other preparatory tasks. The subject of observation is the organization (institution), or its subdivision, that is responsible for the observation, organizes its conduct, and directly performs the functions of collecting and processing the statistical data. The question of the place of observation (the place where facts are registered) arises mainly in statistical and sociological research and is decided depending on the purpose of the study.

The observation time is the period during which the work of recording and verifying the data received must be started and completed. The observation time is chosen according to the criterion of minimum spatial mobility of the object under study. The observation time should be distinguished from the critical moment, the point in time to which the collected data refer.

The concept of statistical observation is thus a rich topic for consideration. Statistical observations are used almost everywhere that conditions permit their use. At the same time, despite this extensive scope of application, statistical observation is a rather complex subject, and errors are not uncommon. Nevertheless, statistical observation as a subject of study is of great interest.
