Construct an empirical function for a given sampling distribution. Empirical distribution function, properties. Examples of problems on finding the empirical distribution function

Variation series. Polygon and histogram.

A distribution series is an ordered distribution of the units of the population under study into groups according to a certain varying characteristic.

Depending on the characteristic underlying the series, attributive and variational distribution series are distinguished:

§ Distribution series constructed in ascending or descending order of the values of a quantitative characteristic are called variational.

A variation series consists of two columns:

The first column contains the quantitative values of the varying characteristic, which are called variants and are denoted $x_i$. A discrete variant is expressed as an integer. An interval variant is given as a range "from ... to ...". Depending on the type of variant, a discrete or an interval variation series can be constructed.
The second column contains the number of occurrences of each variant, expressed as frequencies or relative frequencies:

Frequencies are absolute numbers showing how many times a given value of the characteristic occurs in the population. The sum of all frequencies must equal the number of units in the entire population.

Relative frequencies are frequencies expressed as a share of the total. Their sum must equal 100% when expressed as percentages, or 1 when expressed as fractions of one.
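As an illustration, a discrete variation series can be tabulated directly from raw data. The following is a minimal Python sketch; the function name `variation_series` and the sample are hypothetical and chosen only for illustration. The relative frequency of a variant is its frequency divided by the total number of observations.

```python
from collections import Counter

def variation_series(sample):
    """Return rows (variant, frequency, relative frequency), sorted by variant."""
    counts = Counter(sample)
    n = len(sample)
    return [(x, counts[x], counts[x] / n) for x in sorted(counts)]

# Hypothetical sample, used only to show the layout of the series.
sample = [3, 1, 2, 3, 3, 2, 1, 3]
rows = variation_series(sample)

print("variant  frequency  relative frequency")
for x, freq, rel in rows:
    print(f"{x:>7}  {freq:>9}  {rel:>18.3f}")
```

The frequencies in the second column add up to the sample size, and the relative frequencies add up to 1.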

Graphical representation of distribution series

Distribution series are presented visually using graphs.

Distribution series are depicted as:

§ Polygons

§ Histograms

§ Cumulative curves (ogives)

Polygon

When constructing a polygon, the values of the varying characteristic are plotted on the horizontal axis (x-axis), and the frequencies or relative frequencies are plotted on the vertical axis (y-axis).

The polygon in Fig. 6.1 is based on data from the 1994 micro-census of the population of Russia.


Histogram

To construct a histogram, the boundaries of the intervals are marked along the horizontal axis, and rectangles are built on them whose heights are proportional to the frequencies (or relative frequencies).

Fig. 6.2 shows a histogram of the distribution of the population of Russia in 1997 by age groups.

Fig. 1. Distribution of the Russian population by age groups
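The counting step behind a histogram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ages and interval boundaries below are hypothetical (they are not the census data from the figures), and the helper name `interval_series` is introduced here for the example.

```python
def interval_series(values, edges):
    """Count values per interval [edges[i], edges[i+1]).

    The last interval also includes its right boundary.
    """
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            lo, hi = edges[i], edges[i + 1]
            if lo <= v < hi or (i == last and v == hi):
                counts[i] += 1
                break
    return counts

# Hypothetical ages and interval boundaries, for illustration only.
ages = [4, 15, 22, 37, 41, 58, 63, 70, 19, 33]
edges = [0, 20, 40, 60, 80]
counts = interval_series(ages, edges)

# Text rendering: bar length is proportional to the frequency.
for i, c in enumerate(counts):
    print(f"{edges[i]:>2}-{edges[i + 1]:<2} | " + "#" * c)
```

Dividing each count by `len(ages)` would give relative frequencies, which change the scale of the vertical axis but not the shape of the histogram.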

Empirical distribution function, properties.

Let the statistical frequency distribution of a quantitative characteristic $X$ be known. Denote by $n_x$ the number of observations in which a value of the characteristic less than $x$ was observed, and by $n$ the total number of observations. Obviously, the relative frequency of the event $X < x$ is $n_x/n$. If $x$ changes, the relative frequency generally changes too, so $n_x/n$ is a function of $x$.

The empirical distribution function (sample distribution function) is the function $F_n(x)$ that gives, for each value $x$, the relative frequency of the event $X < x$: $F_n(x) = n_x/n$.

In contrast to the empirical distribution function of a sample, the distribution function $F(x)$ of the population is called the theoretical distribution function. The difference between these functions is that the theoretical function determines the probability of the event $X < x$, while the empirical function determines the relative frequency of the same event.

As $n$ increases, the relative frequency of the event $X < x$, i.e. $F_n(x)$, converges in probability to the probability $F(x)$ of this event (Bernoulli's theorem).
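The definition $F_n(x) = n_x/n$, with $n_x$ the number of observations strictly less than $x$, translates almost literally into code. Below is a minimal Python sketch; the name `make_ecdf` and the sample are hypothetical. Note that many software libraries use the convention $X \le x$ instead, which differs only at the sample points themselves.

```python
from bisect import bisect_left

def make_ecdf(sample):
    """Return F_n with F_n(x) = (number of observations strictly less than x) / n."""
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must not be empty")

    def F_n(x):
        # In a sorted list, bisect_left gives the count of elements < x.
        return bisect_left(data, x) / n

    return F_n

# Hypothetical sample, for illustration only.
F_n = make_ecdf([2, 5, 5, 7])
for x in (2, 2.5, 5, 6, 7, 8):
    print(x, F_n(x))
```

Because of the strict inequality, the function is still 0 at the smallest observation and reaches 1 only to the right of the largest one.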

Basic properties

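Let an elementary outcome $\omega$ be fixed. Then $F_n(x)$ is the distribution function of the discrete distribution given by the following probability function:

\[p(x_i) = \frac{n_i}{n},\]

where $x_i$ are the distinct sample values and $n_i$ is the number of sample elements equal to $x_i$. In particular, if all elements of the sample are different, then $p(x_i) = 1/n$.

The mathematical expectation of this distribution is:

\[\sum_i x_i\,\frac{n_i}{n} = \frac{1}{n}\sum_{j=1}^{n} X_j = \bar{X}.\]

Thus, the sample mean is the theoretical mean of the sample distribution.

Similarly, the sample variance (with divisor $n$) is the theoretical variance of the sample distribution.

For fixed $x$, the random variable $n F_n(x)$ has a binomial distribution:

\[n F_n(x) \sim \mathrm{Bin}\bigl(n, F(x)\bigr).\]

The sample distribution function is an unbiased estimate of the distribution function $F(x)$:

\[\mathbb{E}\,F_n(x) = F(x).\]

The variance of the sample distribution function has the form:

\[\operatorname{Var} F_n(x) = \frac{F(x)\bigl(1 - F(x)\bigr)}{n}.\]

According to the strong law of large numbers, the sample distribution function converges almost surely to the theoretical distribution function:

\[F_n(x) \to F(x)\]

almost surely as $n \to \infty$.

The sample distribution function is an asymptotically normal estimate of the theoretical distribution function. If $0 < F(x) < 1$, then

\[\sqrt{n}\,\bigl(F_n(x) - F(x)\bigr) \to \mathcal{N}\bigl(0,\ F(x)(1 - F(x))\bigr)\]

in distribution as $n \to \infty$.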
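The binomial, unbiasedness and variance statements all follow from one observation: $n F_n(x)$ is a sum of indicators. A short derivation, assuming the observations $X_1, \dots, X_n$ are independent and identically distributed with $F(x) = P(X < x)$ (the indicator notation $\mathbf{1}\{\cdot\}$ is introduced here and is not in the original text):

```latex
\[
F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i < x\},
\qquad
\mathbf{1}\{X_i < x\} \sim \mathrm{Bernoulli}\bigl(F(x)\bigr),
\]
\[
\mathbb{E}\,F_n(x) = \frac{1}{n}\cdot n\,F(x) = F(x),
\qquad
\operatorname{Var} F_n(x) = \frac{1}{n^{2}}\cdot n\,F(x)\bigl(1 - F(x)\bigr)
= \frac{F(x)\bigl(1 - F(x)\bigr)}{n}.
\]
```

The sum of $n$ independent Bernoulli variables with success probability $F(x)$ is binomial, which gives the distribution of $n F_n(x)$; the law of large numbers and the central limit theorem applied to the same sum give the two limit statements.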

Definition of the empirical distribution function

Let $X$ be a random variable and $F(x)$ its distribution function. Suppose we carry out $n$ independent experiments on this random variable under identical conditions. We then obtain a sequence of values $x_1, x_2, \dots, x_n$, which is called a sample.

Definition 1

Each value $x_i$ ($i = 1, 2, \dots, n$) is called a variant.

One estimate of the theoretical distribution function is the empirical distribution function.

Definition 3

The empirical distribution function $F_n(x)$ is the function that gives, for each value $x$, the relative frequency of the event $X < x$:

\[F_n(x) = \frac{n_x}{n},\]

where $n_x$ is the number of variants less than $x$ and $n$ is the sample size.

The difference between the empirical function and the theoretical one is that the theoretical function determines the probability of the event $X < x$, while the empirical function determines the relative frequency of the same event.

Properties of the empirical distribution function

Let us now consider several basic properties of the empirical distribution function.

    1. The values of the function $F_n\left(x\right)$ lie in the segment $[0, 1]$.

    2. $F_n\left(x\right)$ is a non-decreasing function.

    3. $F_n\left(x\right)$ is a left-continuous function.

    4. $F_n\left(x\right)$ is a piecewise constant function and increases only at the observed values of the random variable $X$.

    5. Let $X_1$ be the smallest and $X_n$ the largest variant. Then $F_n\left(x\right)=0$ for $x\le X_1$ and $F_n\left(x\right)=1$ for $x> X_n$.

Let us state a theorem that connects the theoretical and empirical functions.

Theorem 1

Let $F_n\left(x\right)$ be the empirical distribution function and $F\left(x\right)$ the theoretical distribution function of the general population. Then, for every $x$, the following equality holds almost surely:

\[\lim_{n\to \infty } \left|F_n\left(x\right)-F\left(x\right)\right|=0\]
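The convergence can be observed numerically. The sketch below is a simulation under stated assumptions, not a proof: it draws samples from the uniform distribution on $(0, 1)$, for which $F(x) = x$, and reports the largest deviation $|F_n(x) - F(x)|$ over all $x$. The function name `max_deviation` and the seed are hypothetical choices made for this example.

```python
import random

def max_deviation(n, rng):
    """Largest |F_n(x) - F(x)| over all x for n draws from Uniform(0, 1), where F(x) = x."""
    data = sorted(rng.random() for _ in range(n))
    worst = 0.0
    for i, x in enumerate(data):
        # With the strict convention, F_n equals i/n at the i-th sorted point
        # (0-based) and jumps to (i + 1)/n immediately to the right of it.
        worst = max(worst, abs(i / n - x), abs((i + 1) / n - x))
    return worst

rng = random.Random(0)  # fixed seed so that the run is reproducible
deviations = {n: max_deviation(n, rng) for n in (100, 10_000)}
for n, d in deviations.items():
    print(n, round(d, 4))
```

Since $F_n$ is piecewise constant and $F$ is continuous here, the largest deviation can only occur at a sample point, which is why checking the two one-sided values at each point is enough.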

Examples of problems on finding the empirical distribution function

Example 1

Let the sample distribution be given by the following table:

Picture 1.

Find the sample size, create an empirical distribution function and plot it.

Sample size: $n=5+10+15+20=50$.

By property 5, we have that for $x\le 1$ $F_n\left(x\right)=0$, and for $x>4$ $F_n\left(x\right)=1$.

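On the three intervals between consecutive variants, taking the frequencies in the order listed above, $F_n\left(x\right)$ equals the cumulative relative frequency of the variants to the left: $\frac{5}{50}=0.1$, $\frac{5+10}{50}=0.3$ and $\frac{5+10+15}{50}=0.6$, respectively.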

Thus we get:

Figure 2.

Figure 3.

Example 2

20 cities were randomly selected from the cities of the central part of Russia, and the following data on public transport fares were obtained for them: 14, 15, 12, 12, 13, 15, 15, 13, 15, 12, 15, 14, 15, 13, 13, 12, 12, 15, 14, 14.

Create an empirical distribution function for this sample and plot it.

Let's write the sample values in ascending order and count the frequency of each value. We get the following table:

Value $x_i$:       12  13  14  15
Frequency $n_i$:    5   4   4   7

Sample size: $n=20$.

By property 5, we have that for $x\le 12$ $F_n\left(x\right)=0$, and for $x>15$ $F_n\left(x\right)=1$.

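For $12 < x\le 13$: $F_n\left(x\right)=\frac{5}{20}=0.25$.

For $13 < x\le 14$: $F_n\left(x\right)=\frac{5+4}{20}=0.45$.

For $14 < x\le 15$: $F_n\left(x\right)=\frac{5+4+4}{20}=0.65$.

Thus we get:

\[F_n\left(x\right)=\begin{cases} 0, & x\le 12, \\ 0.25, & 12 < x\le 13, \\ 0.45, & 13 < x\le 14, \\ 0.65, & 14 < x\le 15, \\ 1, & x>15. \end{cases}\]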

Let's plot the empirical distribution:

Figure 6.
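The calculation in Example 2 can be reproduced with a short script. This is a sketch rather than the source's own method: the fares are taken from the example, while the function name `ecdf_steps` is hypothetical. Each returned pair gives a variant and the value that $F_n(x)$ takes immediately to the right of it, that is, on the interval up to and including the next variant.

```python
from collections import Counter

# Fares from Example 2.
fares = [14, 15, 12, 12, 13, 15, 15, 13, 15, 12,
         15, 14, 15, 13, 13, 12, 12, 15, 14, 14]

def ecdf_steps(sample):
    """Return (variant, cumulative relative frequency up to and including it) pairs."""
    counts = Counter(sample)
    n = len(sample)
    steps, cumulative = [], 0
    for x in sorted(counts):
        cumulative += counts[x]
        steps.append((x, cumulative / n))
    return steps

steps = ecdf_steps(fares)
print("F_n(x) = 0 for x <=", steps[0][0])
for x, value in steps:
    print(f"F_n(x) = {value:.2f} just to the right of x = {x}")
```

The last step value is 1, reached for $x > 15$, in agreement with property 5.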


As is known, the distribution law of a random variable can be specified in various ways. A discrete random variable can be specified by a distribution series or by an integral (cumulative distribution) function, and a continuous random variable by either an integral function or a differential function (probability density). Let us consider the sample analogues of these two functions.

Let there be a sample of size $n$ of values of some random variable $X$, and let each variant in this sample be associated with its frequency. Let $x$ be a real number and $n_x$ the number of sample values of $X$ that are smaller than $x$. Then the number $n_x/n$ is the relative frequency of the values of $X$ observed in the sample that are smaller than $x$, i.e. the relative frequency of the event $X < x$. When $x$ changes, the value $n_x/n$ in general changes too. This means that the relative frequency $n_x/n$ is a function of the argument $x$. Since this function is found from sample data obtained as a result of experiments, it is called the sample or empirical function.

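Definition 10.15. The empirical distribution function (sample distribution function) is the function $F_n(x)$ that gives, for each value $x$, the relative frequency of the event $X < x$:

\[F_n(x) = \frac{n_x}{n}, \qquad (10.19)\]

where $n_x$ is the number of sample values less than $x$ and $n$ is the sample size.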

In contrast to the empirical distribution function of the sample, the distribution function $F(x)$ of the general population is called the theoretical distribution function. The difference between them is that the theoretical function $F(x)$ determines the probability of the event $X < x$, while the empirical function determines the relative frequency of the same event. From Bernoulli's theorem it follows that, for any $\varepsilon > 0$,

\[\lim_{n\to\infty} P\bigl(\left|F(x) - F_n(x)\right| < \varepsilon\bigr) = 1, \qquad (10.20)\]

i.e. for large $n$ the probability $F(x)$ of the event $X < x$ and the relative frequency $F_n(x)$ of this event differ little from one another. It is therefore reasonable to use the empirical distribution function of the sample to approximate the theoretical (integral) distribution function of the general population.

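The functions $F_n(x)$ and $F(x)$ have the same properties. This follows from the definition of the function.

Properties of $F_n(x)$:

1. The values of $F_n(x)$ lie in the segment $[0, 1]$.
2. $F_n(x)$ is a non-decreasing function.
3. If $x_1$ is the smallest variant and $x_k$ the largest, then $F_n(x) = 0$ for $x \le x_1$ and $F_n(x) = 1$ for $x > x_k$.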


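Example 10.4. Construct the empirical distribution function for the given sample distribution:

Variants $x_i$: $x_1 < x_2 < x_3 = 10$

Frequencies $n_i$: 12, 18, 30

Solution: The sample size is $n = 12 + 18 + 30 = 60$. The smallest variant is $x_1$, hence $F_n(x) = 0$ for $x \le x_1$. The value $X < x_2$, namely $x_1$, was observed 12 times, therefore $F_n(x) = \frac{12}{60} = 0.2$ for $x_1 < x \le x_2$.

The values $X < 10$, namely $x_1$ and $x_2$, were observed $12 + 18 = 30$ times, therefore $F_n(x) = \frac{30}{60} = 0.5$ for $x_2 < x \le 10$. For $x > 10$, $F_n(x) = 1$.

The required empirical distribution function:

\[F_n(x) = \begin{cases} 0, & x \le x_1, \\ 0.2, & x_1 < x \le x_2, \\ 0.5, & x_2 < x \le 10, \\ 1, & x > 10. \end{cases}\]

The graph of $F_n(x)$ is shown in Fig. 10.2.

Fig. 10.2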

Control questions

1. What main problems does mathematical statistics solve?
2. What are the general population and the sample population?
3. Define sample size.
4. Which samples are called representative?
5. What are errors of representativeness?
6. Name the basic methods of sampling.
7. Explain the concepts of frequency and relative frequency.
8. Explain the concept of a statistical series.
9. Write down the Sturges formula.
10. Formulate the concepts of sample range, median and mode.
11. What are the frequency polygon and the histogram?
12. Explain the concept of a point estimate based on a sample.
13. What are biased and unbiased point estimates?
14. Formulate the concept of the sample mean.
15. Formulate the concept of the sample variance.
16. Formulate the concept of the sample standard deviation.
17. Formulate the concept of the sample coefficient of variation.
18. Formulate the concept of the sample geometric mean.

Find out what the empirical formula is. In chemistry, the empirical formula (EF) is the simplest way to describe a compound: essentially a list of the elements that make up the compound, based on their percentages. Note that this simple formula does not describe the arrangement of atoms in the compound; it only indicates which elements the compound consists of and in what ratio. For example:

  • A compound consisting of 40.92% carbon, 4.58% hydrogen and 54.5% oxygen has the empirical formula C3H4O3 (an example of how to find the EF of this compound is discussed in the second part).
  • Understand the term "percentage composition". "Percentage composition" refers to the percentage by mass of each element in the compound in question. To find the empirical formula of a compound, you need to know its percentage composition. If you are finding an empirical formula for homework, the percentages will most likely be given.

    • To find the percentage composition of a chemical compound in the laboratory, the compound is subjected to physical experiments and then to quantitative analysis. Unless you are in a lab, you do not need to do these experiments.
  • Keep in mind that you will have to deal with gram atoms. A gram atom is an amount of an element whose mass in grams is numerically equal to its atomic mass. To find the number of gram atoms, divide the percentage of the element in the compound by the atomic mass of the element.

    • Say, for example, that we have a compound that contains 40.92% carbon. The atomic mass of carbon is 12, so the calculation is 40.92 / 12 = 3.41.
  • Know how to find atomic ratios. When working with a compound, you will end up with more than one gram-atom value. After finding the gram atoms of every element in the compound, select the smallest value you have calculated and divide all the gram-atom values by it. For example:

    • Say you are working with a compound with three gram-atom values: 1.5, 2 and 2.5. The smallest of these numbers is 1.5. Therefore, to find the ratio of atoms, divide all the numbers by 1.5 and write the results separated by a ratio sign ":".
    • 1.5 / 1.5 = 1. 2 / 1.5 = 1.33. 2.5 / 1.5 = 1.67. Therefore, the ratio of atoms is 1 : 1.33 : 1.67.
  • Understand how to convert atomic ratio values to integers. An empirical formula must be written with whole numbers, so you cannot use numbers like 1.33. After you find the ratio of the atoms, you need to convert the fractional values (like 1.33) to whole numbers. To do this, find an integer multiplier that turns every number of the atomic ratio into a whole number. For example:

    • Try 2. Multiply the atomic ratio numbers (1, 1.33 and 1.67) by 2. You get 2, 2.66 and 3.34. These are not integers, so 2 is not appropriate.
    • Try 3. If you multiply 1, 1.33 and 1.67 by 3, you get approximately 3, 4 and 5 respectively. Therefore, the atomic ratio in integers has the form 3 : 4 : 5.
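The steps above (divide each percentage by the atomic mass, divide by the smallest result, then look for a small integer multiplier) can be sketched in Python. This is an illustrative sketch under stated assumptions: the function name `empirical_formula`, the tolerance `tol`, the multiplier limit and the rounded atomic masses are all choices made for this example, not part of the original text, which rounds the atomic mass of carbon to 12.

```python
def empirical_formula(percentages, atomic_masses, max_multiplier=10, tol=0.05):
    """Return whole-number atom ratios from a percentage composition by mass."""
    # Step 1: gram atoms = percentage / atomic mass.
    gram_atoms = {el: pct / atomic_masses[el] for el, pct in percentages.items()}
    # Step 2: divide every value by the smallest one.
    smallest = min(gram_atoms.values())
    ratios = {el: g / smallest for el, g in gram_atoms.items()}
    # Step 3: try multipliers 1, 2, 3, ... until all ratios are close to integers.
    for k in range(1, max_multiplier + 1):
        scaled = {el: r * k for el, r in ratios.items()}
        if all(abs(v - round(v)) <= tol for v in scaled.values()):
            return {el: round(v) for el, v in scaled.items()}
    raise ValueError("no small whole-number ratio found")

# Approximate atomic masses (assumed values for this sketch).
masses = {"C": 12.011, "H": 1.008, "O": 15.999}
formula = empirical_formula({"C": 40.92, "H": 4.58, "O": 54.5}, masses)
print(formula)
```

For the compound from the first example this reproduces the ratio 3 : 4 : 3, i.e. C3H4O3. The tolerance matters: measured percentages are never exact, so the scaled ratios only come close to whole numbers.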