Probabilistic-statistical research methods

What is "mathematical statistics"

Mathematical statistics is understood as "a branch of mathematics devoted to mathematical methods for collecting, systematizing, processing, and interpreting statistical data, as well as using them to draw scientific or practical conclusions. The rules and procedures of mathematical statistics rest on probability theory, which makes it possible to evaluate the accuracy and reliability of the conclusions obtained in each problem from the available statistical material." Here, statistical data means information about the number of objects in some more or less extensive collection that possess particular characteristics.

According to the type of problems being solved, mathematical statistics is usually divided into three sections: data description, estimation, and hypothesis testing.
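To make the three kinds of tasks concrete, here is a minimal stdlib-only Python sketch on a hypothetical sample of ten measurements; the data, the target value 5.0, and the normal approximation used for the test are all illustrative assumptions:

```python
import math
import statistics

data = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.1, 4.9, 5.0]  # hypothetical measurements

# 1. Data description: summarize the sample.
mean = statistics.mean(data)
sd = statistics.stdev(data)

# 2. Estimation: the sample mean estimates the theoretical expectation.
print(f"estimated mean = {mean:.2f}, estimated sd = {sd:.3f}")

# 3. Hypothesis testing: test H0: mu = 5.0 with a normal (z) approximation.
z = (mean - 5.0) / (sd / math.sqrt(len(data)))
# Two-sided p-value from the standard normal CDF (adequate for a sketch).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"z = {z:.3f}, p = {p_value:.3f}")  # large p: no evidence against H0
```

With this particular toy sample the mean coincides with the hypothesized value, so the test reports no evidence against H0.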

According to the type of statistical data being processed, mathematical statistics is divided into four areas:

  • one-dimensional statistics (statistics of random variables), in which the result of an observation is described by a real number;
  • multivariate statistical analysis, where the result of observing an object is described by several numbers (a vector);
  • statistics of random processes and time series, where the result of observation is a function;
  • statistics of objects of non-numerical nature, in which the result of observation is non-numerical: for example, a set (a geometric figure), an ordering, or the outcome of a measurement on a qualitative attribute.

Historically, one-dimensional statistics and certain areas of the statistics of non-numerical objects (in particular, problems of estimating the percentage of defective products and testing hypotheses about it) were the first to appear. Their mathematical apparatus is simpler, so they are usually used to demonstrate the main ideas of mathematical statistics.

Only those data-processing methods are evidence-based, i.e., belong to mathematical statistics, that rest on probabilistic models of the relevant real phenomena and processes: models of consumer behavior, the occurrence of risks, the operation of technological equipment, the outcomes of an experiment, the course of a disease, and so on. A probabilistic model of a real phenomenon is considered constructed when the quantities under consideration and the relationships between them are expressed in terms of probability theory. The model's correspondence to reality, i.e., its adequacy, is substantiated, in particular, with the help of statistical methods for testing hypotheses.

Non-probabilistic data-processing methods are exploratory: they can be used only in preliminary data analysis, since they give no way to assess the accuracy and reliability of conclusions drawn from limited statistical material.

Probabilistic and statistical methods are applicable wherever it is possible to construct and substantiate a probabilistic model of a phenomenon or process. Their use is mandatory when conclusions drawn from sample data are transferred to the entire population (for example, from a sample to an entire batch of products).
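For instance, transferring a conclusion from a sample to a whole batch can be sketched as interval estimation of the batch defect rate. The counts below are hypothetical, and the normal approximation is a simplification; real acceptance control uses exact or tabulated plans:

```python
import math

# Hypothetical acceptance check: 3 defectives found in a sample of 200 units.
n, defects = 200, 3
p_hat = defects / n  # sample estimate of the batch defect rate

# Approximate 95% confidence interval (normal approximation; a sketch,
# not suitable for very small counts in real acceptance control).
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = max(0.0, p_hat - half_width), p_hat + half_width
print(f"defect rate ~ {p_hat:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```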

In specific areas of application, both widely applicable and specific probabilistic-statistical methods are used. For example, in the section of production management devoted to statistical methods of product quality control, applied mathematical statistics (including the design of experiments) is used. Its methods support statistical analysis of the accuracy and stability of technological processes and statistical assessment of quality. Specific methods include statistical acceptance control of product quality, statistical regulation of technological processes, and assessment and control of reliability.

Applied probabilistic-statistical disciplines such as reliability theory and queuing theory are widely used. The content of the first is clear from its title; the second studies systems such as a telephone exchange, which receives calls at random moments as subscribers dial numbers on their telephones. The durations of serving these requests, i.e., the durations of the conversations, are also modeled by random variables. A great contribution to the development of these disciplines was made by Corresponding Member of the USSR Academy of Sciences A.Ya. Khinchin (1894-1959), Academician of the Academy of Sciences of the Ukrainian SSR B.V. Gnedenko (1912-1995), and other domestic scientists.
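A queuing system of the kind described (random arrivals, random service durations) can be sketched with a short simulation of a single-server queue via Lindley's recursion; the arrival and service rates are illustrative assumptions:

```python
import random

random.seed(7)
lam, mu = 1.0, 2.0          # arrival and service rates (illustrative values)
n_customers = 50_000

# Lindley recursion: the next customer's wait given the previous one's wait.
wait, total_wait = 0.0, 0.0
for _ in range(n_customers):
    service = random.expovariate(mu)   # service duration of this customer
    gap = random.expovariate(lam)      # time until the next arrival
    wait = max(0.0, wait + service - gap)
    total_wait += wait

avg_wait = total_wait / n_customers
rho = lam / mu
print(f"simulated mean wait {avg_wait:.3f} vs theory {rho / (mu - lam):.3f}")
```

With lam = 1 and mu = 2 the theoretical mean wait in an M/M/1 queue is rho/(mu - lam) = 0.5, and the simulated value lands close to it.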

Statistical Methods

Statistical methods are methods for the analysis of statistical data. A distinction is drawn between methods of applied statistics, which can be applied in all areas of scientific research and in any sector of the national economy, and other statistical methods whose applicability is limited to one area or another. The latter include statistical acceptance control, statistical control of technological processes, reliability and testing, and the design of experiments.

Classification of statistical methods

Statistical methods of data analysis are used in almost all areas of human activity. They are used whenever it is necessary to obtain and substantiate any judgments about a group (objects or subjects) with some internal heterogeneity.

It is advisable to distinguish three types of scientific and applied activities in the field of statistical methods of data analysis (according to the degree of specificity of methods associated with immersion in specific problems):

a) development and research of general purpose methods, without taking into account the specifics of the field of application;

b) development and research of statistical models of real phenomena and processes in accordance with the needs of a particular field of activity;

c) application of statistical methods and models for statistical analysis of specific data.

Applied Statistics

Describing the type of data and the mechanism by which they are generated is the starting point of any statistical study. Both deterministic and probabilistic methods are used to describe data. Deterministic methods can analyze only the data that are at the researcher's disposal; for example, they are used to produce the tables computed by official state statistics bodies from the statistical reports submitted by enterprises and organizations. Transferring the results to a wider population, and using them for prediction and control, is possible only on the basis of probabilistic-statistical modeling. This is why only methods based on probability theory are often included in mathematical statistics.

We do not consider it possible to oppose deterministic and probabilistic-statistical methods; we regard them as successive stages of statistical analysis. At the first stage, the available data must be analyzed and presented in a form convenient for perception, using tables and charts. Then it is advisable to analyze the statistical data on the basis of particular probabilistic-statistical models. Note that a deeper insight into the essence of a real phenomenon or process is afforded by the development of an adequate mathematical model.

In the simplest situation, statistical data are the values of some feature characteristic of the objects under study. Values can be quantitative or indicate the category to which an object belongs. In the second case, we speak of a qualitative feature.

When an object is measured by several quantitative or qualitative characteristics, the statistical datum describing it is a vector. This can be considered a new kind of data, and the sample then consists of a set of vectors. If some coordinates are numbers and others are qualitative (categorized) data, we speak of a vector of heterogeneous data.

A single sample element, that is, a single observation, can also be a whole function, for example, one describing the dynamics of an indicator, i.e., its change over time: a patient's electrocardiogram, the beat amplitude of a motor shaft, or a time series describing the performance dynamics of a particular company. The sample then consists of a set of functions.

The elements of the sample can also be other mathematical objects, for example, binary relations. Thus, when polling experts, orderings (rankings) of the objects of expertise are often used: product samples, investment projects, options for management decisions. Depending on the rules of the expert study, the sample elements can be binary relations of various kinds (orderings, partitions, tolerances), sets, fuzzy sets, etc.

So, the mathematical nature of the sample elements in various problems of applied statistics can be very different. However, two classes of statistics can be distinguished - numeric and non-numeric. Accordingly, applied statistics is divided into two parts - numerical statistics and non-numerical statistics.

Numerical statistics are numbers, vectors, and functions. They can be added and multiplied by coefficients, so various sums are of great importance in numerical statistics. The mathematical apparatus for analyzing sums of random sample elements comprises the (classical) laws of large numbers and central limit theorems.
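The role of sums can be illustrated with a stdlib sketch of both theorems on die rolls; the die and the sample sizes are arbitrary choices:

```python
import random
import statistics

random.seed(0)

# Law of large numbers: the mean of many die rolls approaches E[X] = 3.5.
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(statistics.mean(rolls))  # close to 3.5

# Central limit theorem: sums of 50 rolls are approximately normal,
# centered at 50 * 3.5 = 175.
sums = [sum(random.randint(1, 6) for _ in range(50)) for _ in range(2_000)]
print(statistics.mean(sums))   # close to 175
```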

Non-numerical statistical data are categorized data, vectors of heterogeneous features, binary relations, sets, fuzzy sets, and so on. They cannot be added or multiplied by coefficients, so it makes no sense to speak of sums of non-numerical statistics. They are elements of non-numerical mathematical spaces (sets). The mathematical apparatus for analyzing non-numerical statistical data is based on distances between elements (as well as proximity measures and difference indicators) in such spaces. With the help of distances, empirical and theoretical averages are defined, laws of large numbers are proved, nonparametric estimates of the probability distribution density are constructed, and problems of diagnostics and cluster analysis are solved, among others.
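A hedged sketch of the distance-based approach: observations are sets, the Jaccard distance measures how different they are, and the "empirical average" is taken to be a medoid, the sample element with the smallest total distance to the others. The feature sets are made up for illustration:

```python
# Non-numeric observations: sets of features. Distances replace arithmetic.
def jaccard(a: set, b: set) -> float:
    """Jaccard distance: 1 minus the share of common elements."""
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

sample = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"a", "b", "d"}]

# The "empirical average" of non-numeric data is the sample element
# minimizing the total distance to all others (a medoid).
medoid = min(sample, key=lambda s: sum(jaccard(s, t) for t in sample))
print(medoid)
```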

Applied research uses various types of statistical data, owing in particular to the ways they are obtained. For example, if the testing of certain technical devices continues only up to a certain moment in time, we obtain so-called censored data: the operating times to failure of the devices that failed, together with the information that the remaining devices were still operating when the test ended. Censored data are often used in the assessment and control of the reliability of technical devices.
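A small sketch of working with censored reliability data, under the additional assumption (not made in the text) that lifetimes are exponentially distributed, in which case the maximum-likelihood failure-rate estimate is the number of failures divided by the total time on test; the test data are hypothetical:

```python
# Hypothetical censored reliability test: 10 devices run for at most 1000 h.
# (time, failed) pairs; failed=False means the device survived the test.
observations = [
    (120, True), (340, True), (1000, False), (615, True), (1000, False),
    (980, True), (1000, False), (450, True), (1000, False), (760, True),
]

failures = sum(1 for _, failed in observations if failed)
total_time = sum(t for t, _ in observations)

# Under exponential lifetimes, the maximum-likelihood failure-rate
# estimate uses all devices, censored ones included:
rate = failures / total_time
print(f"estimated failure rate: {rate:.6f} per hour")
print(f"estimated mean time to failure: {1 / rate:.0f} h")
```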

Usually, statistical methods for the first three types of data (numbers, vectors, functions) are considered separately from those for non-numerical data. This separation reflects the circumstance noted above: the mathematical apparatus for analyzing data of a non-numerical nature is essentially different from that for data in the form of numbers, vectors, and functions.

Probabilistic-statistical modeling

When statistical methods are applied in specific areas of knowledge and sectors of the national economy, we obtain scientific and practical disciplines such as "statistical methods in industry", "statistical methods in medicine", and so on. From this point of view, econometrics is "statistical methods in economics". These disciplines of group b) are usually based on probabilistic-statistical models built in accordance with the characteristics of the application area. It is very instructive to compare the probabilistic-statistical models used in different fields, discovering their closeness while noting certain differences. Thus, one can see the closeness of the problem statements, and of the statistical methods used to solve them, in such areas as scientific medical research, specific sociological research, and marketing research, or, in short, in medicine, sociology, and marketing. These are often grouped together under the name "sampling studies".

The difference between sampling studies and expert studies shows up, first of all, in the number of objects or subjects examined: sampling studies usually involve hundreds, expert studies tens. The technology of expert research, however, is considerably more sophisticated. The specificity is even more pronounced in demographic or logistic models, in the processing of narrative (textual, chronicle) information, or in the study of the mutual influence of factors.

The issues of reliability and safety of technical devices and technologies, the theory of queuing are considered in detail in a large number of scientific papers.

Statistical analysis of specific data

The application of statistical methods and models for the statistical analysis of specific data is closely tied to the problems of the respective field. The results of the third of the identified types of scientific and applied activities are at the intersection of disciplines. They can be considered as examples of the practical application of statistical methods. But there is no less reason to attribute them to the corresponding field of human activity.

For example, the results of a survey of instant coffee consumers are naturally attributed to marketing (which is what they do when lecturing on marketing research). The study of price growth dynamics using inflation indices calculated from independently collected information is of interest primarily from the point of view of economics and management of the national economy (both at the macro level and at the level of individual organizations).

Development prospects

The theory of statistical methods is aimed at solving real problems. Therefore, new formulations of mathematical problems in the analysis of statistical data constantly appear within it, and new methods are developed and substantiated. Substantiation is often carried out by mathematical means, that is, by proving theorems. A major role is played by the methodological component: how exactly to pose the problems and what assumptions to accept for the purpose of further mathematical study. The role of modern information technology, in particular computer experiments, is great.

An urgent task is to analyze the history of statistical methods in order to identify development trends and apply them for forecasting.

Literature

2. Naylor T. Computer Simulation Experiments with Models of Economic Systems. Moscow: Mir, 1975. 500 p.

3. Cramér H. Mathematical Methods of Statistics. Moscow: Mir, 1948 (1st ed.), 1975 (2nd ed.). 648 p.

4. Bolshev L. N., Smirnov N. V. Tables of Mathematical Statistics. Moscow: Nauka, 1965 (1st ed.), 1968 (2nd ed.), 1983 (3rd ed.).

5. Smirnov N. V., Dunin-Barkovsky I. V. A Course in Probability Theory and Mathematical Statistics for Technical Applications. 3rd ed. Moscow: Nauka, 1969. 512 p.

6. Draper N., Smith H. Applied Regression Analysis. 3rd ed. Moscow: Dialektika, 2007. 912 p. ISBN 0-471-17082-8.

In accordance with the three main possibilities - decision-making under conditions of complete certainty, risk and uncertainty - decision-making methods and algorithms can be divided into three main types: analytical, statistical and based on fuzzy formalization. In each specific case, the decision-making method is selected based on the task, the available initial data, the available problem models, the decision-making environment, the decision-making process, the required solution accuracy, and the analyst's personal preferences.

In some information systems, the algorithm selection process can be automated:

The corresponding automated system can draw on many algorithms of different types (a library of algorithms);

The system interactively prompts the user to answer a number of questions about the main characteristics of the problem under consideration;

Based on the user's responses, the system offers the most appropriate algorithm from the library (according to the criteria specified in it).
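Such a selector can be sketched as a simple rule-based function; the question categories and method names below are hypothetical and not taken from any particular system:

```python
# A sketch of automated algorithm selection; the questions, categories and
# method names are hypothetical, not taken from a particular system.
def select_method(certainty: str, data_numeric: bool) -> str:
    if certainty == "complete":
        return "analytical method"
    if certainty == "risk":
        # distribution laws of the random factors are known
        return ("probabilistic-statistical method" if data_numeric
                else "statistics of non-numerical data")
    return "fuzzy formalization"  # decision under uncertainty

print(select_method("risk", data_numeric=True))
```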

2.3.1 Probabilistic-statistical methods of decision making

Probabilistic-statistical decision-making methods are used when the effectiveness of the decisions made depends on factors that are random variables with known probability distribution laws and other statistical characteristics. Each decision can then lead to one of many possible outcomes, and each outcome has a probability of occurrence that can be calculated. The indicators characterizing the problem situation are also described by probabilistic characteristics. With such methods, the decision maker always risks obtaining a result other than the one he counts on when choosing the optimal solution from the averaged statistical characteristics of the random factors; that is, the decision is made under conditions of risk.

In practice, probabilistic and statistical methods are often used when conclusions drawn from sample data are transferred to the entire population (for example, from a sample to an entire batch of products). However, in this case, in each specific situation, one should first assess the fundamental possibility of obtaining sufficiently reliable probabilistic and statistical data.

When the ideas and results of probability theory and mathematical statistics are used in decision-making, the basis is a mathematical model in which objective relationships are expressed in terms of probability theory. Probabilities are used primarily to describe the randomness that must be taken into account when making decisions, covering both undesirable possibilities (risks) and attractive ones ("lucky chance").

The essence of probabilistic-statistical decision-making methods is the use of probabilistic models based on estimation and testing of hypotheses using sample characteristics.

We emphasize that the logic of using sample characteristics for decision-making based on theoretical models involves the simultaneous use of two parallel series of concepts: those related to theory (the probabilistic model) and those related to practice (the sample of observation results). For example, the theoretical probability corresponds to the frequency found from the sample; the mathematical expectation (theoretical series) corresponds to the sample arithmetic mean (practical series). As a rule, sample characteristics are estimates of theoretical characteristics.
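The two parallel series can be demonstrated on a Bernoulli model: the theoretical probability and variance on one side, the sample frequency and sample variance on the other; the probability 0.4 and the sample size are arbitrary:

```python
import random

random.seed(1)
p = 0.4  # theoretical probability (the "theoretical series")

trials = [random.random() < p for _ in range(20_000)]
frequency = sum(trials) / len(trials)  # its sample counterpart

# Theoretical variance of a Bernoulli trial vs its sample estimate:
var_theory = p * (1 - p)
mean = frequency
var_sample = sum((x - mean) ** 2 for x in trials) / (len(trials) - 1)
print(frequency, var_theory, var_sample)
```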

The advantages of using these methods include the ability to take into account various scenarios for the development of events and their probabilities. The disadvantage of these methods is that the scenario probabilities used in the calculations are usually very difficult to obtain in practice.

The application of a specific probabilistic-statistical decision-making method consists of three stages:

The transition from economic, managerial, or technological reality to an abstract mathematical-statistical scheme, i.e., the construction of a probabilistic model of the control system, the technological process, the decision-making procedure (in particular, one based on the results of statistical control), and so on;

Carrying out calculations and drawing conclusions by purely mathematical means within the framework of the probabilistic model;

Interpreting the mathematical-statistical conclusions in relation to the real situation and making an appropriate decision (for example, on the conformity or non-conformity of product quality with the established requirements, or on the need to adjust the technological process), in particular, conclusions on the proportion of defective units in a batch or on the specific form of the distribution laws of the controlled parameters of the technological process.
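The three stages can be traced on a toy acceptance-control problem; the sample size n, the acceptance number c, and the defect rate p are illustrative assumptions:

```python
import math

# Stage 1: model. Each unit in the batch is defective independently with
# probability p; the number of defects in a sample of n units then follows
# a binomial law. (n, the acceptance number c, and p are illustrative.)
n, c, p = 50, 2, 0.05

# Stage 2: a purely mathematical calculation inside the model:
# probability of accepting the batch, P(defects <= c).
p_accept = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1)
)

# Stage 3: interpretation for the real situation.
print(f"a 5%-defective batch is accepted with probability {p_accept:.3f}")
```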



An example of when it is advisable to use probabilistic-statistical models.

When controlling the quality of any product, a sample is taken from it in order to decide whether the batch being produced meets the established requirements. Based on the results of this sample control, a conclusion is drawn about the entire batch. It is very important here to avoid subjectivity in forming the sample: every unit of product in the controlled batch must have the same probability of being selected. Selection based on someone's judgment is not sufficiently objective in such a situation, so in production conditions units are selected for the sample not by judgment but with special tables of random numbers or computer random number generators.
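In code, objective selection reduces to a seeded pseudorandom draw in which every unit has the same chance of entering the sample; the lot size and sample size are illustrative:

```python
import random

# A lot of 1000 units, identified by serial numbers 0..999.
lot = range(1000)

# Draw a sample of 20 units so that every unit has the same chance of
# being selected; a fixed seed makes the draw reproducible.
random.seed(2024)
sample = random.sample(lot, k=20)
print(sorted(sample))
```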

In the statistical regulation of technological processes, the methods of mathematical statistics are used to develop rules and plans for statistical process control, aimed at the timely detection of process disorder, taking measures to adjust the process, and preventing the release of products that do not meet the established requirements. These measures reduce production costs and losses from the supply of low-quality products. In statistical acceptance control, the methods of mathematical statistics are used to develop quality control plans based on the analysis of samples from product batches. The difficulty lies in correctly building the probabilistic-statistical decision-making models on whose basis such questions can be answered. In mathematical statistics, probabilistic models and hypothesis-testing methods have been developed for this purpose.
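A minimal sketch of the statistical-regulation idea: control limits for subgroup means are computed from in-control data, and a new subgroup mean falling outside them signals a possible process disorder; the measurements and the 3-sigma Shewhart-style limits are illustrative assumptions:

```python
import math
import statistics

# Hypothetical in-control measurements of a controlled parameter.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
mu = statistics.mean(baseline)
sigma = statistics.stdev(baseline)

# Shewhart-style control limits for means of subgroups of size n = 4.
n = 4
ucl = mu + 3 * sigma / math.sqrt(n)
lcl = mu - 3 * sigma / math.sqrt(n)

subgroup_mean = statistics.mean([10.4, 10.5, 10.3, 10.6])  # a new subgroup
out_of_control = not (lcl <= subgroup_mean <= ucl)
print(f"limits ({lcl:.2f}, {ucl:.2f}); signal: {out_of_control}")
```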

In addition, in a number of managerial, industrial, economic, and national-economic situations, problems of a different type arise: problems of estimating the characteristics and parameters of probability distributions.

For example, in a statistical analysis of the accuracy and stability of technological processes, it is necessary to evaluate such quality indicators as the average value of the controlled parameter and the degree of its spread. According to probability theory, it is advisable to use the mathematical expectation as the average value of a random variable, and the variance, standard deviation, or coefficient of variation as statistical characteristics of the spread. This raises the question of how to estimate these characteristics from sample data and with what accuracy. There are many similar examples in the literature; all of them show how probability theory and mathematical statistics can be used in production management when making decisions in the field of statistical product quality management.
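These estimates can be computed directly with the standard library; the sample is hypothetical, and `statistics.variance` is the usual unbiased sample variance:

```python
import statistics

# Hypothetical measurements of a controlled parameter (mm).
sample = [12.1, 11.9, 12.0, 12.3, 11.8, 12.2, 12.0, 11.7]

mean = statistics.mean(sample)            # estimates the expectation
variance = statistics.variance(sample)    # unbiased estimate of dispersion
sd = statistics.stdev(sample)             # standard deviation estimate
cv = sd / mean                            # coefficient of variation

print(f"mean {mean:.3f}, variance {variance:.4f}, sd {sd:.3f}, CV {cv:.4f}")
```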


In production management, in particular when optimizing product quality and ensuring compliance with standards, it is especially important to apply statistical methods at the initial stage of the product life cycle, i.e., during the research phase of experimental design work (developing prospective product requirements, preliminary design, technical specifications for experimental design development). This is due to the limited information available at the initial stage of the product life cycle and to the need to predict the technical capabilities and economic situation for the future.

The most common probabilistic-statistical methods are regression analysis, factor analysis, analysis of variance, statistical risk-assessment methods, the scenario method, and others. The field of statistical methods devoted to the analysis of statistical data of a non-numerical nature, i.e., measurement results on qualitative and heterogeneous features, is gaining ever greater importance. One of the main applications of the statistics of objects of non-numerical nature is the theory and practice of expert assessments, related to the theory of statistical decisions and to voting problems.

The role of a person in solving problems using the methods of the theory of statistical decisions is to formulate the problem, i.e., to bring the real problem to the corresponding model one, to determine the probabilities of events based on statistical data, and also to approve the resulting optimal solution.


Introduction

1. Chi-square distribution

Conclusion

Appendix

Introduction

How are the approaches, ideas, and results of probability theory used in our lives?

The basis is a probabilistic model of a real phenomenon or process, i.e., a mathematical model in which objective relationships are expressed in terms of probability theory. Probabilities are used primarily to describe the uncertainties that must be taken into account when making decisions, both undesirable possibilities (risks) and attractive ones ("lucky chance"). Sometimes randomness is deliberately introduced into a situation, for example, when drawing lots, randomly selecting units for control, or conducting lotteries or consumer surveys.

Starting from given probabilities, probability theory makes it possible to calculate other probabilities that are of interest to the researcher.

A probabilistic model of a phenomenon or process is the foundation of mathematical statistics. Two parallel series of concepts are used: those related to theory (the probabilistic model) and those related to practice (the sample of observation results). For example, the theoretical probability corresponds to the frequency found from the sample; the mathematical expectation (theoretical series) corresponds to the sample arithmetic mean (practical series). As a rule, sample characteristics are estimates of theoretical ones. The quantities of the theoretical series "exist in the minds of researchers", belong to the world of ideas (in the sense of the ancient Greek philosopher Plato), and are not available for direct measurement. Researchers have only sample data, with whose help they try to establish the properties of the theoretical probabilistic model that interest them.

Why do we need a probabilistic model? Only with its help can the properties established from the analysis of a particular sample be transferred to other samples and to the entire so-called general population. The term "general population" is used for a large but finite collection of units under study, for example, the totality of all residents of Russia or of all consumers of instant coffee in Moscow. The purpose of marketing or sociological surveys is to transfer statements obtained from a sample of hundreds or thousands of people to general populations of several million. In quality control, the batch of products acts as the general population.

To transfer inferences from a sample to a larger population, some assumptions are needed about the relationship of sample characteristics with the characteristics of this larger population. These assumptions are based on an appropriate probabilistic model.

Of course, it is possible to process sample data without using one or another probabilistic model. For example, you can calculate the sample arithmetic mean, calculate the frequency of fulfillment of certain conditions, etc. However, the results of the calculations will apply only to a specific sample; transferring the conclusions obtained with their help to any other set is incorrect. This activity is sometimes referred to as "data analysis". Compared to probabilistic-statistical methods, data analysis has limited cognitive value.

So, the use of probabilistic models based on estimation and testing of hypotheses with the help of sample characteristics is the essence of probabilistic-statistical decision-making methods.

1. Chi-square distribution

Closely connected with the normal distribution are three distributions that are now often used in statistical data processing: the Pearson ("chi-square"), Student, and Fisher distributions.

We will focus on the χ² ("chi-square") distribution. This distribution was first studied by the astronomer F. Helmert in 1876. In connection with the Gaussian theory of errors, he studied sums of squares of n independent standard normally distributed random variables. Later, Karl Pearson gave this distribution function the name "chi-square", and the distribution now bears his name.

Due to its close connection with the normal distribution, the χ² distribution plays an important role in probability theory and mathematical statistics. The χ² distribution, and many other distributions defined through it (for example, the Student distribution), describe the sample distributions of various functions of normally distributed observation results and are used to construct confidence intervals and statistical tests.

The Pearson ("chi-square") distribution with n degrees of freedom is the distribution of the random variable

χ² = X1² + X2² + ... + Xn²,

where X1, X2, ..., Xn are independent normal random variables, the mathematical expectation of each being zero and the standard deviation one.

This sum of squares is said to be distributed according to the χ² ("chi-square") law.

In this case the number of terms, i.e. n, is called the "number of degrees of freedom" of the chi-square distribution. As the number of degrees of freedom increases, the distribution slowly approaches the normal distribution.

The density of this distribution is

f(x) = x^(n/2 - 1) e^(-x/2) / (2^(n/2) Γ(n/2)) for x > 0, and f(x) = 0 for x ≤ 0,

where Γ(·) is the gamma function. Thus, the χ² distribution depends on one parameter n, the number of degrees of freedom.

The distribution function of χ² has the form:

F(x) = ∫ from 0 to x of f(t) dt, if x ≥ 0. (2.7)

Figure 1 shows graphs of the probability density and of the χ² distribution function for different numbers of degrees of freedom.

Figure 1. Probability density q(x) of the χ² ("chi-square") distribution for different numbers of degrees of freedom

Moments of the "chi-square" distribution: the mathematical expectation is equal to the number of degrees of freedom, M(χ²) = n, and the variance is D(χ²) = 2n.

The chi-squared distribution is used in variance estimation (using a confidence interval), in testing hypotheses of agreement, homogeneity, independence, primarily for qualitative (categorized) variables that take a finite number of values, and in many other tasks of statistical data analysis.
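For illustration, the density and the moments quoted above can be checked numerically; this is a sketch in Python (the function names are ours, not from the source):

```python
import math

def chi2_pdf(x, n):
    """Density of the chi-square distribution with n degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi2_mean_var(n):
    """The moments quoted in the text: M(chi2) = n, D(chi2) = 2n."""
    return n, 2 * n

def moment(n, k, upper=100.0, steps=100_000):
    """Crude rectangle-rule integral of x^k * f(x); the tail beyond `upper`
    is negligible for small n."""
    h = upper / steps
    s = 0.0
    for i in range(1, steps):
        x = i * h
        s += x ** k * chi2_pdf(x, n)
    return s * h

n = 4
m1 = moment(n, 1)             # should be close to n
var = moment(n, 2) - m1 ** 2  # should be close to 2n
```

The numerical integration reproduces M(χ²) = 4 and D(χ²) = 8 for n = 4 to within the discretization error.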

2. "Chi-square" in problems of statistical data analysis

Statistical methods of data analysis are used in almost all areas of human activity. They are used whenever it is necessary to obtain and substantiate any judgments about a group (objects or subjects) with some internal heterogeneity.

The modern stage in the development of statistical methods can be counted from 1900, when the Englishman K. Pearson founded the journal Biometrika. The first third of the 20th century passed under the sign of parametric statistics. Methods based on the analysis of data from parametric families of distributions described by the Pearson family of curves were studied. The most popular was the normal distribution. The Pearson, Student, and Fisher tests were used to test hypotheses. The maximum likelihood method and analysis of variance were proposed, and the main ideas of experiment design were formulated.

The chi-square distribution is one of the most widely used in statistics for testing statistical hypotheses. On the basis of the "chi-square" distribution, one of the most powerful goodness-of-fit tests, Pearson's "chi-square" test, is constructed.

A goodness-of-fit test is a criterion for testing the hypothesis that an unknown distribution follows a proposed law.

The χ² ("chi-square") test can be used to test hypotheses about a wide variety of distributions; this is its merit.

The criterion is calculated by the formula

χ² = Σ (m - m′)² / m′,

where m and m′ are, respectively, the empirical and theoretical frequencies of the distribution under consideration, and n is the number of degrees of freedom.

For verification, we need to compare empirical (observed) and theoretical (calculated under the assumption of a normal distribution) frequencies.

If the empirical frequencies completely coincide with the calculated (expected) frequencies, then Σ(E - T) = 0 and the criterion χ² is also equal to zero. If Σ(E - T) is not equal to zero, this indicates a discrepancy between the calculated and the empirical frequencies of the series. In such cases it is necessary to evaluate the significance of the criterion χ², which in theory can vary from zero to infinity. This is done by comparing the actually obtained value χ²f with its critical value χ²st for the chosen significance level (α) and number of degrees of freedom (n).

The distribution of possible values of the random variable χ² is continuous and asymmetric. It depends on the number of degrees of freedom (n) and approaches the normal distribution as the number of observations increases. Therefore, the application of the χ² criterion to the estimation of discrete distributions involves some errors that affect its value, especially for small samples. To obtain more accurate estimates, the sample arranged in a variation series must contain at least 50 observations. Correct application of the χ² criterion also requires that the frequencies of variants in the extreme classes be no less than 5; if there are fewer than 5, they are combined with the frequencies of neighbouring classes so that their total is greater than or equal to 5. After combining frequencies, the number of classes (N) decreases accordingly. The number of degrees of freedom is set according to the resulting number of classes, taking into account the number of restrictions on the freedom of variation.
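The pooling rule just described, merging extreme classes with frequencies below 5 into their neighbours, can be sketched in Python (the function name is ours):

```python
def merge_sparse_classes(freqs, min_count=5):
    """Merge extreme classes whose frequencies are below min_count into
    their inner neighbours, as required before applying the chi-square test."""
    freqs = list(freqs)
    # Pool from the left tail inward.
    while len(freqs) > 1 and freqs[0] < min_count:
        freqs[1] += freqs[0]
        del freqs[0]
    # Pool from the right tail inward.
    while len(freqs) > 1 and freqs[-1] < min_count:
        freqs[-2] += freqs[-1]
        del freqs[-1]
    return freqs

merged = merge_sparse_classes([2, 3, 12, 30, 25, 10, 4])
# The two left classes are pooled into 2 + 3 = 5, and the right tail into 10 + 4 = 14.
```

After pooling, the number of degrees of freedom must be recomputed from the reduced number of classes.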

Since the accuracy of the criterion χ² largely depends on the accuracy of calculating the theoretical frequencies (T), unrounded theoretical frequencies should be used when forming the differences between the empirical and calculated frequencies.

As an example, take a study published on a website dedicated to the application of statistical methods in the humanities.

The Chi-square test allows comparison of frequency distributions, whether they are normally distributed or not.

Frequency refers to the number of occurrences of an event. One usually deals with the frequency of occurrence of an event when variables are measured on a nominal scale and their other characteristics, apart from frequency, are impossible or problematic to measure, in other words, when the variable has qualitative characteristics. Also, many researchers tend to translate test scores into levels (high, medium, low) and build tables of score distributions to find the number of people at each level. To prove that in one of the levels (in one of the categories) the number of people really is larger (or smaller), the chi-square coefficient is also used.

Let's take a look at the simplest example.

A self-esteem test was conducted among younger adolescents. Test scores were translated into three levels: high, medium, low. The frequencies were distributed as follows:

High (H): 27 people

Medium (M): 12 people

Low (L): 11 people

Obviously, children with high self-esteem are in the majority, but this needs to be proved statistically. To do this, we use the chi-square test.

Our task is to check whether the obtained empirical data differ from the theoretically equally probable ones. To do this, it is necessary to find the theoretical frequencies. In our case, theoretical frequencies are equiprobable frequencies that are found by adding all frequencies and dividing by the number of categories.

In our case:

(H + M + L) / 3 = (27 + 12 + 11) / 3 ≈ 16.67

The formula for calculating the chi-square test is:

χ² = Σ (E - T)² / T

We build a table:

Level | Empirical (E) | Theoretical (T) | (E - T)² / T
High | 27 | 16.67 | 6.40
Medium | 12 | 16.67 | 1.31
Low | 11 | 16.67 | 1.93

Find the sum of the last column: χ² = 9.64.

Now we need to find the critical value of the criterion according to the table of critical values (Table 1 in the Appendix). To do this, we need the number of degrees of freedom (n).

n = (R - 1) * (C - 1)

where R is the number of rows in the table, C is the number of columns.

In our case, there is only one column (meaning the original empirical frequencies) and three rows (categories), so the formula changes - we exclude the columns.

n = (R - 1) = 3-1 = 2

For the error probability p ≤ 0.05 and n = 2, the critical value is χ² = 5.99.

The empirical value obtained is greater than the critical value, so the frequency differences are significant (χ² = 9.64; p ≤ 0.05).

As you can see, the calculation of the criterion is very simple and does not take much time. The practical value of the chi-square test is enormous. This method is most valuable in the analysis of responses to questionnaires.
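The whole computation of this example fits in a few lines; a Python sketch (variable names are ours):

```python
# Chi-square goodness-of-fit test for the self-esteem example: are the
# three levels (high, medium, low) equally likely?
empirical = [27, 12, 11]
theoretical = sum(empirical) / len(empirical)   # 50 / 3, about 16.67

chi2 = sum((e - theoretical) ** 2 / theoretical for e in empirical)

# Critical value for p = 0.05 and n = 2 degrees of freedom is 5.99.
significant = chi2 > 5.99
# chi2 is about 9.64 > 5.99: the frequency differences are significant.
```

Note that the theoretical frequency is kept unrounded, exactly as the text recommends.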

Let's take a more complex example.

For example, a psychologist wants to know whether it is true that teachers are biased against boys compared with girls, i.e. are more likely to praise girls. To do this, the psychologist analyzed the characteristics of students written by teachers with regard to the frequency of occurrence of three words: "active", "diligent", "disciplined"; synonyms of these words were also counted.

Data on the frequency of occurrence of words were entered in the table:

To process the obtained data, we use the chi-square test.

To do this, we construct a table of distribution of empirical frequencies, i.e. the frequencies that we observe:

Theoretically, we expect the frequencies to be distributed equally, i.e. the frequency of each word will be distributed proportionally between boys and girls. Let us build a table of theoretical frequencies. To do this, we multiply each row sum by the column sum and divide the resulting number by the grand total.

The resulting table for calculations will look like this:

Group and word | Empirical (E) | Theoretical (T) | (E - T)² / T
Boys: "active" | … | … | …
Boys: "diligent" | … | … | …
Boys: "disciplined" | … | … | …
Girls: "active" | … | … | …
Girls: "diligent" | … | … | …
Girls: "disciplined" | … | … | …

Sum: 4.21

χ² = Σ (E - T)² / T, and the number of degrees of freedom is n = (R - 1)(C - 1),

where R is the number of rows and C the number of columns of the table.

In our case, chi-square = 4.21; n = 2.

According to the table of critical values of the criterion we find: for n = 2 and an error level of 0.05, the critical value is χ² = 5.99.

The resulting value is less than the critical value, which means that the null hypothesis is not rejected.

Conclusion: teachers do not attach importance to the gender of the child when writing his characteristics.
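The independence computation of this example can be sketched in Python. The article's raw word counts did not survive in this copy, so the table below uses made-up frequencies purely to show the mechanics (expected frequency = row sum × column sum / grand total, df = (R - 1)(C - 1)):

```python
# Hypothetical 2 x 3 contingency table of word counts:
# rows: boys, girls; columns: "active", "diligent", "disciplined".
table = [
    [10, 5, 7],   # boys
    [6, 9, 8],    # girls
]

row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]
total = sum(row_sums)

chi2 = 0.0
for i, row in enumerate(table):
    for j, e in enumerate(row):
        t = row_sums[i] * col_sums[j] / total   # theoretical frequency
        chi2 += (e - t) ** 2 / t

df = (len(table) - 1) * (len(table[0]) - 1)     # (2 - 1) * (3 - 1) = 2
# Compare chi2 with the critical value 5.99 (p = 0.05, df = 2).
```

With the real data of the example, this procedure yields χ² = 4.21 with df = 2.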

Conclusion

Although students of almost all specialties study the section "probability theory and mathematical statistics" within the course of higher mathematics, in reality they become acquainted only with some basic concepts and results, which are clearly not enough for practical work. Students meet some mathematical research methods in special courses (for students of economic specialties, for example, "Forecasting and technical and economic planning", "Technical and economic analysis", "Product quality control", "Marketing", "Controlling", "Mathematical methods of forecasting", "Statistics", etc.), but the presentation in most cases is very abridged and prescriptive in nature. As a result, the knowledge of applied statisticians remains insufficient.

Therefore, the course "Applied Statistics" in technical universities is of great importance, and in economic universities - the course "Econometrics", since econometrics, as you know, is a statistical analysis of specific economic data.

Probability theory and mathematical statistics provide fundamental knowledge for applied statistics and econometrics.

They are necessary for specialists for practical work.

I considered a continuous probabilistic model and tried to show its usefulness with examples.

And at the end of my work I came to the conclusion that competent implementation of the basic procedures of mathematical-statistical data analysis and statistical testing of hypotheses is impossible without knowledge of the chi-square model and the ability to use its tables.

Bibliography

1. Orlov A.I. Applied Statistics. Moscow: Ekzamen, 2004.

2. Gmurman V.E. Probability Theory and Mathematical Statistics. Moscow: Vysshaya Shkola, 1999. 479 p.

3. Ayvozyan S.A. Probability Theory and Applied Statistics, vol. 1. Moscow: Unity, 2001. 656 p.

4. Khamitov G.P., Vedernikova T.I. Probabilities and Statistics. Irkutsk: BSUEP, 2006. 272 p.

5. Ezhova L.N. Econometrics. Irkutsk: BSUEP, 2002. 314 p.

6. Mosteller F. Fifty Entertaining Probabilistic Problems with Solutions. Moscow: Nauka, 1975. 111 p.

7. Mosteller F. Probability. Moscow: Mir, 1969. 428 p.

8. Yaglom A.M. Probability and Information. Moscow: Nauka, 1973. 511 p.

9. Chistyakov V.P. A Course in Probability. Moscow: Nauka, 1982. 256 p.

10. Kremer N.Sh. Probability Theory and Mathematical Statistics. Moscow: UNITI, 2000. 543 p.

11. Mathematical Encyclopedia, vol. 1. Moscow: Soviet Encyclopedia, 1976. 655 p.

12. http://psystat.at.ua/ - Statistics in psychology and pedagogy. Article: The chi-square test.

Appendix

Critical points of the χ² distribution

Table 1


Probabilistic-statistical methods for modeling economic systems


Introduction


The task of identifying the distribution law of an observed random variable (structural-parametric identification) is usually understood as the problem of choosing a parametric model of the probability distribution law that best matches the results of experimental observations. Random errors of measuring instruments are not so often subject to the normal law; more precisely, they are not so often well described by the normal law model. Measuring devices and systems are based on different physical principles, different measurement methods, and different conversions of measurement signals. Measurement errors are the result of the influence of many factors, random and non-random, acting constantly or episodically. Therefore it is clear that only when certain prerequisites (theoretical and technical) are met are measurement errors sufficiently well described by the normal law model.

Generally speaking, it should be understood that the true distribution law (if it exists at all) describing the errors of a particular measuring system remains unknown, despite all our attempts to identify it. Based on measurement data and theoretical considerations, we can only choose a probabilistic model that, in some sense, best approximates this true law. If the constructed model is adequate, that is, the applied criteria give no grounds for its rejection, then on its basis one can calculate all the probabilistic characteristics of the random component of the measuring instrument error that interest us; these will differ from the true values only because of the non-excluded systematic (unobserved or unregistered) component of the measurement error. Its smallness characterizes the correctness of the measurements. The set of possible probability distribution laws that can be used to describe observed random variables is unlimited. It makes no sense to set, as the goal of identification, finding the true distribution law of the observed quantity. We can only solve the problem of choosing the best model from a certain set, for example from the set of parametric laws and distribution families that are used in applications and referenced in the literature.

Classical approach to structural-parametric identification of the distribution law. Under the classical approach, we mean the algorithm for choosing the distribution law, which is entirely based on the apparatus of mathematical statistics.


1. Elementary concepts about random events, quantities and functions


We have already seen that for many experiments there are no differences in the calculation of the probabilities of events, even though the elementary outcomes in these experiments are very different. But it is precisely the probabilities of events that interest us, not the structure of the space of elementary outcomes. Therefore it is time, in all such "similar" experiments, to use numbers instead of the most diverse elementary outcomes. In other words, each elementary outcome should be assigned some real number, and we should then work only with numbers.

Let a probability space (Ω, F, P) be given.

Definition 26. A function ξ: Ω → R is called a random variable if for any Borel set B the set ξ⁻¹(B) is an event, i.e. belongs to the σ-algebra F.

The set ξ⁻¹(B), consisting of those elementary outcomes ω for which ξ(ω) belongs to B, is called the full preimage of the set B.

Remark 9. In general, let a function f map a set X into a set Y, and let σ-algebras F and G of subsets of X and Y, respectively, be given. The function f is called measurable if for any set B in G its full preimage f⁻¹(B) belongs to F.

Remark 10. A reader who does not want to bother with the abstractions of σ-algebras of events and measurability can safely assume that any set of elementary outcomes is an event, and therefore that a random variable is an arbitrary function from Ω to R. This causes no trouble in practice, so the rest of this paragraph may be skipped.

Now, having got rid of the less curious readers, let us try to understand why a random variable needs measurability.

If a random variable ξ is given, we may need to compute probabilities of the form P(ξ = a), P(ξ ∈ [a, b]), P(ξ ≥ c), P(ξ < 0) (and, in general, a variety of probabilities of falling into Borel sets on the line). This is possible only if the sets under the probability sign are events: probability is a function defined only on the σ-algebra of events. The measurability requirement is equivalent to saying that for any Borel set B the probability P(ξ ∈ B) is defined.

One could demand something else in Definition 26: for example, that hitting any interval be an event, {ω : ξ(ω) ∈ (a, b)} ∈ F, or any half-line: {ω : ξ(ω) < x} ∈ F.

Let us verify, for example, that Definitions 26 and 27 are equivalent.

Definition 27. A function ξ: Ω → R is called a random variable if for any reals a < b the set {ω : ξ(ω) ∈ (a, b)} belongs to the σ-algebra F.

Proof of the equivalence of Definitions 26 and 27.

If ξ is a random variable in the sense of Definition 26, then it is also a random variable in the sense of Definition 27, since any interval is a Borel set.

Let us prove that the converse is also true. Suppose that for any interval (a, b) we have ξ⁻¹((a, b)) ∈ F. We must prove that the same holds for any Borel set.

Collect in the family A all subsets of the real line whose preimages are events. The family A already contains all intervals (a, b). Let us now show that A is a σ-algebra. By definition, B ∈ A if and only if ξ⁻¹(B) ∈ F.

1. Let us make sure that R ∈ A. But ξ⁻¹(R) = Ω ∈ F, and hence R ∈ A.

2. Let us make sure that the complement of any B ∈ A is again in A. Let B ∈ A. Then ξ⁻¹(R \ B) = Ω \ ξ⁻¹(B) ∈ F, because F is a σ-algebra.

3. Let us make sure that the union of any B1, B2, ... ∈ A is again in A. Let Bj ∈ A for all j. But F is a σ-algebra, so

ξ⁻¹(∪j Bj) = ∪j ξ⁻¹(Bj) ∈ F.

We have proved that A is a σ-algebra and contains all intervals on the line. But the Borel σ-algebra B(R) is the smallest σ-algebra containing all intervals on the line. Consequently, A contains B(R): B(R) ⊆ A.

Let us give examples of measurable and non-measurable functions.

Example 25. We toss a die. Let Ω = {1, 2, 3, 4, 5, 6}, and define two functions ξ and η from Ω into R. Until a σ-algebra F is specified, one cannot speak of measurability: a function measurable with respect to one σ-algebra F may fail to be measurable with respect to another.

1. If F is the set of all subsets of Ω, then ξ and η are random variables, since any set of elementary outcomes belongs to F. One can write out the correspondence between the values of the random variables ξ and η and the probabilities of taking these values in the form of "probability distribution tables" or, briefly, "distribution tables":

Here each elementary outcome has probability 1/6.


2. Let the σ-algebra of events consist of four sets:

F = {Ω, ∅, {2, 4, 6}, {1, 3, 5}},

i.e. an event is, apart from the certain and the impossible event, the occurrence of an even or of an odd number of points. Let us make sure that with such a relatively poor σ-algebra neither ξ nor η is a random variable, since they are not measurable: the preimage of an individual value is a one-point set, and no one-point set belongs to this F.


2. Numerical characteristics of random variables


Expected value. The mathematical expectation of a discrete random variable X, which takes a finite number of values xi with probabilities pi, is the sum:

M(X) = Σ xi pi. (6a)


The mathematical expectation of a continuous random variable X is the integral of the product of its values x and the probability distribution density f(x):

M(X) = ∫ from -∞ to +∞ of x f(x) dx. (6b)


The improper integral (6b) is assumed to be absolutely convergent (otherwise the mathematical expectation M(X) is said not to exist). The mathematical expectation characterizes the average value of the random variable X; its dimension coincides with the dimension of the random variable. Properties of mathematical expectation:

M(C) = C, M(CX) = C M(X), M(X + Y) = M(X) + M(Y), M(X - C) = M(X) - C, (7)

where C is a constant and X, Y are random variables.
Dispersion. The variance of a random variable X is the number:

D(X) = M[(X - M(X))²]. (8)
The variance is a characteristic of the dispersion of the values of the random variable X about its mean value M(X). The dimension of the variance equals the square of the dimension of the random variable. Based on the definition of variance (8) and the expressions (6a) for a discrete and (6b) for a continuous random variable, we obtain similar expressions for the variance:

D(X) = Σ (xi - m)² pi and D(X) = ∫ from -∞ to +∞ of (x - m)² f(x) dx. (9)
Here m = M(X).

Variance properties:

D(C) = 0, D(CX) = C² D(X), D(X + Y) = D(X) + D(Y) for independent X and Y. (10)


Standard deviation:

σ(X) = √D(X). (11)


Since the dimension of the standard deviation is the same as that of a random variable, it is more often than the variance used as a measure of dispersion.
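As a sketch of the discrete-case definitions above, the mean, variance, and standard deviation can be computed directly in Python (the function names are ours):

```python
# Numerical characteristics of a discrete random variable, following
# formulas (6a), (9), and (11).
def mean(values, probs):
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    m = mean(values, probs)
    return sum((x - m) ** 2 * p for x, p in zip(values, probs))

xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6            # a fair die

m = mean(xs, ps)            # 3.5
d = variance(xs, ps)        # 35/12, about 2.9167
sigma = d ** 0.5            # same units as X itself
```

For the fair die the mean is 3.5 and the variance 35/12, so the standard deviation (about 1.71) is directly comparable with the values of X.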

Distribution moments. The concepts of mathematical expectation and variance are special cases of a more general concept for the numerical characteristics of random variables: distribution moments. The distribution moments of a random variable are introduced as mathematical expectations of some simple functions of the random variable. Thus, the moment of order k relative to the point x0 is the mathematical expectation M[(X - x0)^k]. Moments relative to the origin x = 0 are called initial moments and are denoted:

νk = M(X^k). (12)


The initial moment of the first order is the distribution center of the random variable under consideration:

ν1 = M(X) = m. (13)


Moments about the distribution center x = m are called central moments and are denoted:

μk = M[(X - m)^k]. (14)


From (7) it follows that the central moment of the first order is always equal to zero:

μ1 = M(X - m) = M(X) - m = 0. (15)


The central moments do not depend on the origin of the values ​​of the random variable, since with a shift by a constant value of C, its center of distribution is shifted by the same value of C, and the deviation from the center does not change:


X - m = (X - C) - (m - C).


It is now obvious that the variance is the second-order central moment:

D(X) = μ2 = M[(X - m)²]. (16)


Asymmetry. The central moment of the third order:

μ3 = M[(X - m)³] (17)


serves to estimate the skewness of the distribution. If the distribution is symmetric with respect to the point x = m, then the central moment of the third order is equal to zero (as are all central moments of odd order). Therefore, if the central moment of the third order is different from zero, the distribution cannot be symmetric. The amount of asymmetry is estimated using the dimensionless asymmetry coefficient:

A = μ3 / σ³. (18)


The sign of the asymmetry coefficient (18) indicates right-sided or left-sided asymmetry (Fig. 1).


Fig. 1. Types of distribution skewness


Excess (kurtosis). The central moment of the fourth order:

μ4 = M[(X - m)⁴] (19)


serves to estimate the so-called kurtosis, which determines the degree of steepness (peakedness) of the distribution curve near the distribution center relative to the normal distribution curve. Since for the normal distribution μ4 / σ⁴ = 3, the following value is taken as the kurtosis:

E = μ4 / σ⁴ - 3. (20)


Fig. 2 shows examples of distribution curves with different values of the kurtosis. For the normal distribution E = 0. Curves more peaked than the normal one have positive kurtosis; flatter ones have negative kurtosis.


Fig. 2. Distribution curves with different degrees of steepness (kurtosis)


Higher-order moments in engineering applications of mathematical statistics are usually not used.
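The central moments and the coefficients (18) and (20) translate directly into code; a Python sketch for the discrete case (the function names are ours):

```python
# Central moments, skewness A (18) and kurtosis E (20) of a discrete
# random variable given by its values and probabilities.
def central_moment(values, probs, k):
    m = sum(x * p for x, p in zip(values, probs))
    return sum((x - m) ** k * p for x, p in zip(values, probs))

def skewness(values, probs):
    sigma = central_moment(values, probs, 2) ** 0.5
    return central_moment(values, probs, 3) / sigma ** 3

def kurtosis(values, probs):
    sigma2 = central_moment(values, probs, 2)
    return central_moment(values, probs, 4) / sigma2 ** 2 - 3

# A distribution symmetric about its mean has zero third central moment,
# hence zero skewness.
xs, ps = [-1, 0, 1], [0.25, 0.5, 0.25]
a = skewness(xs, ps)   # 0.0
```

For this symmetric example the first and third central moments vanish, illustrating the remark that all odd-order central moments of a symmetric distribution are zero.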

Mode. The mode of a discrete random variable is its most probable value. The mode of a continuous random variable is the value at which the probability density is maximal. If the distribution curve has one maximum, the distribution is called unimodal; if it has more than one maximum, polymodal. Sometimes there are distributions whose curves have not a maximum but a minimum; such distributions are called antimodal. In the general case the mode and the mathematical expectation of a random variable do not coincide. In the special case of a modal (i.e. having a mode) symmetric distribution, and provided that the mathematical expectation exists, the latter coincides with the mode and the center of symmetry of the distribution.

Median. The median of a random variable X is the value Me for which P(X < Me) = P(X > Me), i.e. it is equally probable that X turns out smaller or larger than Me. Geometrically, the median is the abscissa of the point at which the area under the distribution curve is bisected. For a symmetric modal distribution the median, the mode, and the mean coincide.


3. Statistical evaluation of the laws of distribution of random variables


The general population is the totality of all objects to be studied or the possible results of all observations made under the same conditions on one object.

sampling set or a sample is a set of objects or results of observation of an object, selected randomly from the general population.

Sample size is the number of objects or observations in the sample.

Specific values of the sample are called the observed values of the random variable X. The observed values are recorded in a protocol, which is a table. The compiled protocol is the primary form of recording the received material. To obtain reliable conclusions, the sample must be sufficiently representative in volume. A large sample is an unordered set of numbers, so for study it is brought to a visual, ordered form: the largest and smallest values of the random variable are found in the protocol. The sample, sorted in ascending order (column by column), is shown in Table 1.

Table 1. Protocol

-8.66  -5.49  -4.11  -3.48  -2.90  -2.32  -1.82  -1.09  -0.44  0.64
-8.31  -4.71  -3.92  -3.41  -2.85  -2.31  -1.82  -1.01  -0.43  0.71
-8.23  -4.68  -3.85  -3.33  -2.83  -2.29  -1.80  -0.99  -0.43  0.73
-7.67  -4.60  -3.85  -3.25  -2.77  -2.27  -1.77  -0.95  -0.31  0.99
-6.64  -4.43  -3.81  -3.08  -2.72  -2.25  -1.73  -0.89  -0.30  1.03
-6.60  -4.38  -3.80  -3.07  -2.67  -2.19  -1.38  -0.70   0.04  1.05
-6.22  -4.38  -3.77  -3.01  -2.60  -2.15  -1.32  -0.56   0.08  1.13
-5.87  -4.25  -3.73  -3.01  -2.49  -2.09  -1.30  -0.51   0.15  1.76
-5.74  -4.18  -3.59  -2.99  -2.37  -2.01  -1.28  -0.49   0.26  2.95
-5.68  -4.14  -3.49  -2.98  -2.33  -1.91  -1.24  -0.48   0.53  4.42

Sampling span is the difference between the largest and the smallest value of the random variable X: R = x_max - x_min = 4.42 - (-8.66) = 13.08.

The sampling span is divided into k intervals (classes). The number of classes is chosen from 8 to 25, depending on the size of the span; in this term paper we take k = 10.

Then the length of each interval is: h = R / k = 13.08 / 10 = 1.308.

In the protocol, we count the number of observed values that fall into each interval and denote these counts m1, m2, ..., m10.

Let us call mi the hit frequency of the random variable in the i-th interval. If an observed value of the random variable coincides with the end of an interval, then by agreement it is assigned to one of the two intervals.

After the frequencies mi have been determined, we define the relative frequencies of the random variable, i.e. the ratios of the frequencies mi to the total number of observed values n: pi* = mi / n.

The relative frequencies satisfy the completeness condition: Σ pi* = 1.

Find the middle of each interval: xi* = (left boundary + right boundary) / 2.

Let us compile Table 2.

A table of interval boundaries and corresponding relative frequencies pi*, where i = 1, 2, 3, ..., k, is called a statistical series. A graphic representation of a statistical series is called a histogram. It is constructed as follows: the intervals are laid out along the abscissa, and on each such interval, as on a base, a rectangle is constructed whose area equals the corresponding relative frequency.

The height of each rectangle is then pi* / h, where h is the interval length.


Table 2

No. | Left boundary | Right boundary | Interval | Middle | Frequency mi | Relative frequency pi* | Rectangle height
1 | -8.66 | -7.352 | (-8.66; -7.352) | -8.006 | 4 | 0.04 | 0.0306
2 | -7.352 | -6.044 | (-7.352; -6.044) | -6.698 | 3 | 0.03 | 0.0229
3 | -6.044 | -4.736 | (-6.044; -4.736) | -5.39 | 4 | 0.04 | 0.0306
4 | -4.736 | -3.428 | (-4.736; -3.428) | -4.082 | 20 | 0.20 | 0.1529
5 | -3.428 | -2.12 | (-3.428; -2.12) | -2.774 | 26 | 0.26 | 0.1988
6 | -2.12 | -0.812 | (-2.12; -0.812) | -1.466 | 18 | 0.18 | 0.1376
7 | -0.812 | 0.496 | (-0.812; 0.496) | -0.158 | 14 | 0.14 | 0.1070
8 | 0.496 | 1.804 | (0.496; 1.804) | 1.15 | 9 | 0.09 | 0.0688
9 | 1.804 | 3.112 | (1.804; 3.112) | 2.458 | 1 | 0.01 | 0.0076
10 | 3.112 | 4.42 | (3.112; 4.42) | 3.766 | 1 | 0.01 | 0.0076
Sum | | | | | 100 | 1 |
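As a check, the statistical series can be recomputed directly from the protocol data; a Python sketch (the variable names are ours):

```python
# The 100 observed values from Table 1 (the protocol), listed in ascending order.
sample = [
    -8.66, -8.31, -8.23, -7.67, -6.64, -6.6, -6.22, -5.87, -5.74, -5.68,
    -5.49, -4.71, -4.68, -4.6, -4.43, -4.38, -4.38, -4.25, -4.18, -4.14,
    -4.11, -3.92, -3.85, -3.85, -3.81, -3.8, -3.77, -3.73, -3.59, -3.49,
    -3.48, -3.41, -3.33, -3.25, -3.08, -3.07, -3.01, -3.01, -2.99, -2.98,
    -2.9, -2.85, -2.83, -2.77, -2.72, -2.67, -2.6, -2.49, -2.37, -2.33,
    -2.32, -2.31, -2.29, -2.27, -2.25, -2.19, -2.15, -2.09, -2.01, -1.91,
    -1.82, -1.82, -1.8, -1.77, -1.73, -1.38, -1.32, -1.3, -1.28, -1.24,
    -1.09, -1.01, -0.99, -0.95, -0.89, -0.7, -0.56, -0.51, -0.49, -0.48,
    -0.44, -0.43, -0.43, -0.31, -0.3, 0.04, 0.08, 0.15, 0.26, 0.53,
    0.64, 0.71, 0.73, 0.99, 1.03, 1.05, 1.13, 1.76, 2.95, 4.42,
]

k = 10
x_min, x_max = min(sample), max(sample)
h = (x_max - x_min) / k          # interval length: 13.08 / 10 = 1.308

counts = [0] * k
for x in sample:
    # The maximal value is assigned to the last interval by agreement.
    i = min(int((x - x_min) / h), k - 1)
    counts[i] += 1

freqs = [m / len(sample) for m in sample and counts]   # relative frequencies
heights = [p / h for p in freqs]                       # histogram bar heights
```

The resulting counts (4, 3, 4, 20, 26, 18, 14, 9, 1, 1) and relative frequencies reproduce the columns of Table 2.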

Figure 3. Histogram of the statistical series


The statistical distribution function is the relative frequency of the event that the random variable takes a value not exceeding a given x:

F*(x) = n_x / n,

where n_x is the number of sample values smaller than x. For a discrete random variable X, the statistical distribution function is found by the formula:

F*(x) = Σ pi* over all xi* < x,

where xi* is the middle of interval i and pi* are the corresponding relative frequencies, i = 1, 2, ..., k.

The graph of the statistical distribution function is a step line whose break points are the midpoints of the intervals and whose jumps equal the corresponding relative frequencies.
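Such a step function is easy to sketch in Python; the midpoints and relative frequencies below are hypothetical, chosen only to show the jumps:

```python
# Grouped series: interval midpoints xs and relative frequencies ps
# (hypothetical data for illustration).
xs = [1.0, 2.0, 3.0]
ps = [0.2, 0.5, 0.3]

def F(x):
    """Statistical distribution function: sum of the relative frequencies
    of all midpoints lying below x (a step function)."""
    return sum(p for xi, p in zip(xs, ps) if xi < x)

# F jumps by the corresponding relative frequency at each midpoint:
# F(0.5) = 0.0, F(1.5) = 0.2, F(2.5) = 0.7, F(3.5) = 1.0
```

Between midpoints the function is constant, which is exactly the stepped graph described above.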


Figure 3


Calculation of numerical characteristics of a statistical series

$m^*$ — the statistical mathematical expectation,

$D^*$ — the statistical variance,

$\sigma^* = \sqrt{D^*}$ — the statistical standard deviation.

The statistical expectation, or statistical mean, is the arithmetic mean of the observed values of the random variable X: $m^* = \frac{1}{n}\sum_{i=1}^{n} x_i$.

The statistical variance is the arithmetic mean of the squared deviations of the observed values from $m^*$: $D^* = \frac{1}{n}\sum_{i=1}^{n} (x_i - m^*)^2$.

With a large sample size, direct use of these formulas is cumbersome. To simplify the calculations, the statistical series with boundaries $x_i$, $x_{i+1}$ and relative frequencies $p_i^*$, where i = 1, 2, 3, …, k, is used: the midpoints $x_i^*$ of the intervals are found, and every sample element that fell into interval $i$ is replaced by the single value $x_i^*$, so that $m_i$ identical values appear in each interval.

$m^* = \sum_{i=1}^{k} x_i^* p_i^*, \qquad D^* = \sum_{i=1}^{k} (x_i^* - m^*)^2\, p_i^*,$ where $x_i^*$ is the midpoint of the corresponding interval $(x_i;\ x_{i+1})$ and $p_i^*$ is its relative frequency.

Table 4. Numerical characteristics

| $i$ | $x_i^*$ | $p_i^*$ | $x_i^* p_i^*$ | $(x_i^* - m^*)^2$ | $(x_i^* - m^*)^2\, p_i^*$ |
|---|---|---|---|---|---|
| 1 | -8.006 | 0.04 | -0.3202 | 31.4869 | 1.2595 |
| 2 | -6.698 | 0.03 | -0.2009 | 18.5186 | 0.5556 |
| 3 | -5.390 | 0.04 | -0.2156 | 8.9719 | 0.3589 |
| 4 | -4.082 | 0.20 | -0.8164 | 2.8470 | 0.5694 |
| 5 | -2.774 | 0.26 | -0.7212 | 0.1439 | 0.0374 |
| 6 | -1.466 | 0.18 | -0.2639 | 0.8625 | 0.1552 |
| 7 | -0.158 | 0.14 | -0.0221 | 5.0028 | 0.7004 |
| 8 | 1.150 | 0.09 | 0.1035 | 12.5649 | 1.1308 |
| 9 | 2.458 | 0.01 | 0.0246 | 23.5487 | 0.2355 |
| 10 | 3.766 | 0.01 | 0.0377 | 37.9542 | 0.3795 |

Statistical mean $m^* = -2.3947$; statistical variance $D^* = 5.3822$; statistical standard deviation $\sigma^* = 2.3200$.
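The computation of Table 4 can be reproduced directly. A sketch in Python, using the midpoints and relative frequencies of the grouped series:

```python
import math

# Numerical characteristics of the grouped statistical series (Table 4 data).
midpoints = [-8.006, -6.698, -5.390, -4.082, -2.774,
             -1.466, -0.158,  1.150,  2.458,  3.766]
p = [0.04, 0.03, 0.04, 0.20, 0.26, 0.18, 0.14, 0.09, 0.01, 0.01]

# m* = sum of x*_i * p*_i  (statistical mean)
m_star = sum(x * pi for x, pi in zip(midpoints, p))
# D* = sum of (x*_i - m*)^2 * p*_i  (statistical variance)
d_star = sum((x - m_star) ** 2 * pi for x, pi in zip(midpoints, p))
sigma_star = math.sqrt(d_star)            # statistical standard deviation
```

Note that grouping replaces each sample element by its interval midpoint, so these are the grouped approximations of the mean and variance, not the raw-sample values.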

$m^*$ determines the position of the grouping center of the observed values of the random variable.

$D^*$ and $\sigma^*$ characterize the dispersion of the observed values of the random variable around $m^*$.

In any statistical distribution there are inevitably elements of randomness. However, with a very large number of observations these random fluctuations are smoothed out, and the random phenomenon reveals its inherent regularity.

When processing statistical material, one has to decide how to select a theoretical curve for a given statistical series. The theoretical distribution curve should express the essential features of the statistical distribution; this task is called the problem of smoothing, or leveling, the statistical series.

Sometimes the general form of the distribution of a random variable X follows from the very nature of this random variable.

Let the random variable X be the result of measuring some physical quantity of the device.

X = exact value of the physical quantity + instrument error.

The random measurement error of the device is the sum of many small independent factors and is therefore distributed according to the normal law. Hence the random variable X has the same type of distribution, i.e. a normal distribution with probability density:


$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}$, where $m$ is the mathematical expectation and $\sigma$ the standard deviation.


The parameters $m$ and $\sigma$ are determined so that the numerical characteristics of the theoretical distribution equal the corresponding numerical characteristics of the statistical distribution. For the normal distribution we set $m = m^*$, $\sigma = \sigma^*$, and the normal density takes the form: $f(x) = \dfrac{1}{\sigma^*\sqrt{2\pi}}\, e^{-\frac{(x-m^*)^2}{2\sigma^{*2}}}$.

Table 5. Leveling curve

| $i$ | $x_i^*$ | $t_i = (x_i^* - m^*)/\sigma^*$ | $\varphi(t_i)$ | $f(x_i^*) = \varphi(t_i)/\sigma^*$ |
|---|---|---|---|---|
| 1 | -8.0060 | -2.4187 | 0.0214 | 0.0092 |
| 2 | -6.6980 | -1.8549 | 0.0714 | 0.0308 |
| 3 | -5.3900 | -1.2911 | 0.1734 | 0.0747 |
| 4 | -4.0820 | -0.7273 | 0.3062 | 0.1320 |
| 5 | -2.7740 | -0.1635 | 0.3936 | 0.1697 |
| $m^*$ | -2.3947 | 0 | 0.3989 | 0.1720 |
| 6 | -1.4660 | 0.4003 | 0.3682 | 0.1587 |
| 7 | -0.1580 | 0.9641 | 0.2507 | 0.1080 |
| 8 | 1.1500 | 1.5279 | 0.1242 | 0.0535 |
| 9 | 2.4580 | 2.0917 | 0.0448 | 0.0193 |
| 10 | 3.7660 | 2.6555 | 0.0117 | 0.0051 |
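The entries of the leveling curve come from the standard normal density. A sketch in Python, with $m^*$ and $\sigma^*$ as computed above:

```python
import math

# Parameters of the leveling normal curve: the statistical mean and
# standard deviation found earlier in the text.
m_star, sigma_star = -2.3947, 2.3200

def phi(t):
    """Standard normal density (the tabulated function in Table 5)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def f(x):
    """Normal density with parameters m*, sigma*: f(x) = phi(t)/sigma*, t = (x - m*)/sigma*."""
    return phi((x - m_star) / sigma_star) / sigma_star
```

Evaluating `f` at the interval midpoints reproduces the last column of Table 5 to four decimal places.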

We construct the theoretical normal curve point by point on the same chart as the histogram of the statistical series (Figure 6).


Figure 6


Leveling the statistical distribution function

The statistical distribution function is leveled with the distribution function of the normal law:

$F(x) = \dfrac{1}{2} + \Phi_0\!\left(\dfrac{x - m^*}{\sigma^*}\right),$

where $\Phi_0(t) = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_0^t e^{-u^2/2}\,du$ is the Laplace function.


Table 7. Distribution function

| $i$ | $x_i^*$ | $t_i = (x_i^* - m^*)/\sigma^*$ | $\Phi_0(t_i)$ | $F(x_i^*) = 0.5 + \Phi_0(t_i)$ |
|---|---|---|---|---|
| 1 | -8.0060 | -2.4187 | -0.4922 | 0.0078 |
| 2 | -6.6980 | -1.8549 | -0.4682 | 0.0318 |
| 3 | -5.3900 | -1.2911 | -0.4017 | 0.0983 |
| 4 | -4.0820 | -0.7273 | -0.2665 | 0.2335 |
| 5 | -2.7740 | -0.1635 | -0.0649 | 0.4351 |
| $m^*$ | -2.3947 | 0 | 0 | 0.5000 |
| 6 | -1.4660 | 0.4003 | 0.1555 | 0.6555 |
| 7 | -0.1580 | 0.9641 | 0.3325 | 0.8325 |
| 8 | 1.1500 | 1.5279 | 0.4367 | 0.9367 |
| 9 | 2.4580 | 2.0917 | 0.4818 | 0.9818 |
| 10 | 3.7660 | 2.6555 | 0.4960 | 0.9960 |
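The Laplace-function values can be reproduced with the error function, since $\Phi_0(t) = \frac{1}{2}\,\mathrm{erf}(t/\sqrt{2})$. A sketch:

```python
import math

# Parameters of the leveling normal law (from the text).
m_star, sigma_star = -2.3947, 2.3200

def laplace(t):
    """Laplace function Phi0(t) = (1/sqrt(2*pi)) * integral from 0 to t of exp(-u^2/2) du."""
    return 0.5 * math.erf(t / math.sqrt(2))

def F(x):
    """Normal distribution function F(x) = 0.5 + Phi0((x - m*)/sigma*)."""
    return 0.5 + laplace((x - m_star) / sigma_star)
```

Evaluating `F` at the interval midpoints reproduces the last column of Table 7.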

We plot the theoretical distribution function point by point together with the graph of the statistical distribution function.


Figure 6


Let a random variable X be studied, with mathematical expectation $m$ and variance $D$, both parameters unknown.

Let x1, x2, x3, …, xn be a sample obtained as a result of n independent observations of the random variable X. To emphasize the random nature of these values, we rewrite them as:

X1, X2, X3, …, Xn, where Xi is the value of the random variable X in the i-th experiment.

Based on these experimental data, it is required to estimate the mathematical expectation and variance of the random variable. Such estimates are called point estimates; as estimates of m and D we can take the statistical mean $m^*$ and the statistical variance $D^*$, where

$m^* = \dfrac{1}{n}\sum_{i=1}^{n} X_i, \qquad D^* = \dfrac{1}{n}\sum_{i=1}^{n} (X_i - m^*)^2.$



Before the experiment, the sample X1, X2, X3, …, Xn is a set of independent random variables, each distributed in the same way as the random variable X itself, and therefore each has the same mathematical expectation and variance. Thus:


$M[X_i] = m, \quad D[X_i] = D, \qquad i = 1, 2, 3, \dots, n.$


Based on this, using the properties of the mathematical expectation, we find the mathematical expectation and variance of the statistical mean: $M[m^*] = \dfrac{1}{n}\sum_{i=1}^{n} M[X_i] = m, \qquad D[m^*] = \dfrac{1}{n^2}\sum_{i=1}^{n} D[X_i] = \dfrac{D}{n}.$

Thus, the mathematical expectation of the statistical mean equals the exact value m of the measured quantity, while the variance of the statistical mean is n times smaller than the variance of an individual measurement result.


$D[m^*] = \dfrac{D}{n} \to 0$ as $n \to \infty$.


This means that for a large sample size n the statistical mean is an almost non-random quantity: it deviates only slightly from the exact value m of the mathematical expectation. This statement is called Chebyshev's law of large numbers.
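The $1/n$ shrinkage of the spread of the statistical mean is easy to see by simulation. A sketch assuming a normal X; the parameters m = -2.4 and sigma = 2.32 are illustrative values close to those found above:

```python
import random
import statistics

# Demonstration of D[m*] = D/n: the variance of the statistical mean
# over many repeated samples is about sigma^2 / n.
random.seed(0)
m, sigma, n, trials = -2.4, 2.32, 100, 2000

# For each trial, draw a sample of size n and record its statistical mean.
means = [statistics.fmean(random.gauss(m, sigma) for _ in range(n))
         for _ in range(trials)]
spread = statistics.pvariance(means)     # should be close to sigma**2 / n
```

With these settings `spread` comes out near $\sigma^2/n = 0.054$, roughly a hundredth of the variance of a single observation.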

Point estimates of the unknown mathematical expectation and variance are of great importance at the initial stage of processing statistical data. Their disadvantage is that it is not known with what accuracy they estimate the parameter.

Let exact statistical estimates $m^*$ and $D^*$ be computed for the given sample X1, X2, X3, …, Xn; then the numerical characteristics of the random variable X are approximately $m^* \approx m$, $D^* \approx D$. For a sample of small size the question of interval estimation is essential, because the deviations between m and $m^*$, and between D and $D^*$, may be considerable. Moreover, in practical problems it is required not only to find approximate values of m and D, but also to evaluate their accuracy and reliability. Let $m^* \approx m$, i.e. $m^*$ is a point estimate for m. Obviously, the more accurately $m^*$ determines m, the smaller the modulus of the difference $|m - m^*|$. Let $|m - m^*| < \varepsilon$, where $\varepsilon > 0$; then the smaller $\varepsilon$, the more accurate the estimate of m. Thus $\varepsilon > 0$ characterizes the accuracy of the parameter estimate. However, statistical methods do not allow us to state categorically that the estimate satisfies $|m - m^*| < \varepsilon$; we can only speak of the probability $\beta$ with which this inequality holds: $P(|m - m^*| < \varepsilon) = \beta$.

Here $\beta$ is the confidence level, or reliability, of the estimate; the value of $\beta$ is chosen in advance depending on the problem being solved. It is customary to take $\beta$ equal to 0.9, 0.95, 0.99, or 0.999. Events with such probabilities are practically certain. For a given confidence level one can find the number $\varepsilon > 0$ from $P(|m - m^*| < \varepsilon) = \beta$.

Then we obtain the interval $(m^* - \varepsilon;\ m^* + \varepsilon)$, which covers the true value of the expectation m with probability $\beta$; the length of this interval is $2\varepsilon$. This interval is called the confidence interval, and this way of estimating the unknown parameter m is called interval estimation.



Let a sample X1, X2, X3, …, Xn be given, and from this sample let $m^*$, $D^*$, and $\sigma^*$ be found.

It is required to find the confidence interval for the mathematical expectation m with confidence probability $\beta$. The value $m^*$ is a random variable with mathematical expectation $M[m^*] = m$ and variance $D[m^*] = D/n$.

The random variable $m^*$ is a sum of many terms, so for a large sample size it is distributed according to a law close to normal. Then the probability that $m^*$ falls into the interval $(m - \varepsilon;\ m + \varepsilon)$ equals:


$P(|m - m^*| < \varepsilon) = 2\Phi_0\!\left(\dfrac{\varepsilon}{\sigma_{m^*}}\right) = \beta, \qquad (3)$

where $\sigma_{m^*} = \sqrt{D/n} = \sigma/\sqrt{n}$, and $\Phi_0(t) = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_0^t e^{-u^2/2}\,du$ is the Laplace function.

From formula (3) and tables of the Laplace function, we find the number $\varepsilon > 0$ and write the confidence interval for the exact value m with reliability $\beta$.

In this course work the unknown $\sigma$ is replaced by its estimate $\sigma^*$, and formula (3) takes the form: $2\Phi_0\!\left(\dfrac{\varepsilon\sqrt{n}}{\sigma^*}\right) = \beta.$

Let us find the confidence interval that contains the mathematical expectation, for $\beta = 0.99$, $n = 100$, $m^* = -2.3947$, $\sigma^* = 2.32$.

From $\Phi_0\!\left(\dfrac{\varepsilon\sqrt{n}}{\sigma^*}\right) = \dfrac{\beta}{2} = 0.495$, the Laplace tables give $\dfrac{\varepsilon\sqrt{n}}{\sigma^*} = 2.58$.

Hence $\varepsilon = \dfrac{2.58 \cdot 2.32}{10} = 0.5986$.

The confidence interval $(-2.3947 - 0.5986;\ -2.3947 + 0.5986) = (-2.9933;\ -1.7961)$ contains the exact value of the mathematical expectation with probability 0.99.
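The interval can be checked numerically; `NormalDist().inv_cdf` replaces the Laplace table lookup (the tabulated 2.58 versus the exact quantile 2.5758 explains a difference of about 0.001 in $\varepsilon$):

```python
import math
from statistics import NormalDist

# Confidence interval for the expectation with reliability beta = 0.99,
# using the figures from the text: n = 100, m* = -2.3947, sigma* = 2.32.
beta, n = 0.99, 100
m_star, sigma_star = -2.3947, 2.32

# Solve 2*Phi0(eps*sqrt(n)/sigma*) = beta: the argument is the
# (1 + beta)/2 quantile of the standard normal distribution.
t = NormalDist().inv_cdf((1 + beta) / 2)
eps = t * sigma_star / math.sqrt(n)          # accuracy of the estimate
interval = (m_star - eps, m_star + eps)      # covers m with probability beta
```

This yields an interval of approximately (-2.99; -1.80), agreeing with the table-based result above.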


Conclusion


Solving problems of structural-parametric identification with the limited sample sizes that metrologists, as a rule, have exacerbates the problem. In this case, the correct application of statistical methods of analysis becomes even more important: the use of estimates with the best statistical properties and of criteria with the highest power.

When solving identification problems, it is preferable to rely on the classical approach. In identification it is recommended to consider a wider set of distribution laws, including models in the form of mixtures of laws. In this case, for any empirical distribution one can always construct an adequate and statistically much better justified mathematical model.

One should be guided by the use and development of software systems that solve the problems of structural and parametric identification of distribution laws for any form of recorded observations (measurements), including modern methods of statistical analysis, and focus on a wide but correct use of computer modeling methods in research. We have already seen that for many experiments there is no difference in calculating the probabilities of events, even though the elementary outcomes in these experiments differ greatly. But it is precisely the probabilities of events that should interest us, not the structure of the space of elementary outcomes. Therefore it is time to use, for example, numbers in place of the widely varying elementary outcomes in all such "similar" experiments. In other words, each elementary outcome should be assigned some real number, and one should then work only with numbers.
