Bootstrap, small samples, and their application in data analysis. Small-sample statistics. Methods of selecting units from a population

When controlling the quality of goods in economic research, an experiment can be conducted on the basis of a small sample.

A small sample is a non-continuous statistical survey in which the sample population is formed from a relatively small number of units of the general population. The size of a small sample usually does not exceed 30 units and can be as small as 4 to 5 units.

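The average error of a small sample is calculated using the formula:

$$\mu = \sqrt{\frac{S^2}{n}},$$

where S² is the small-sample variance.

When determining the variance, the number of degrees of freedom is n − 1:

$$S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1},$$

where x_i are the sample values and x̄ is the sample mean.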
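The average error and the n − 1 variance described above can be checked with a short Python sketch. It assumes the observations are held in a plain list; the data values and the function name `small_sample_error` are hypothetical. Python's `statistics.variance` already divides by n − 1, which is the degrees-of-freedom correction used here.

```python
import math
import statistics


def small_sample_error(values):
    """Return (mean, S^2, mu) for a small sample.

    S^2 uses n - 1 degrees of freedom; mu = sqrt(S^2 / n) is the
    average error of the small-sample mean.
    """
    n = len(values)
    mean = statistics.fmean(values)
    s2 = statistics.variance(values)  # sum of squared deviations / (n - 1)
    mu = math.sqrt(s2 / n)
    return mean, s2, mu


# Hypothetical small sample of 5 measurements
data = [12.0, 15.0, 11.0, 14.0, 13.0]
mean, s2, mu = small_sample_error(data)
print(f"mean = {mean}, S^2 = {s2}, mu = {mu:.4f}")  # mean = 13.0, S^2 = 2.5, mu = 0.7071

# The same S^2 is obtained by multiplying the uncorrected variance by n / (n - 1)
n = len(data)
print(math.isclose(statistics.pvariance(data) * n / (n - 1), s2))  # True
```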

The marginal error of a small sample is determined by the formula

$$\Delta = t\mu.$$

In this case, the value of the confidence coefficient t depends not only on the given confidence probability but also on the number of sampling units n. For individual values of t and n, the confidence probability of a small sample is determined from special Student tables (Table 9.1), which give the distribution of standardized deviations:

$$t = \frac{\bar{x} - \bar{X}}{\mu},$$

where x̄ is the sample mean and X̄ is the general mean.

Since in practice a confidence probability of 0.95 or 0.99 is usually adopted for a small sample, the marginal error of a small sample is determined from the values of the Student distribution tabulated for these probabilities.
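The tabulated Student values can be reproduced numerically. The sketch below is a minimal pure-Python illustration, assuming two-sided probabilities P and k = n − 1 degrees of freedom; the function names are hypothetical, and integrating the t density with Simpson's rule plus bisection is our own choice, not the textbook's procedure. With SciPy available, `scipy.stats.t.ppf((1 + P) / 2, k)` gives the same values directly.

```python
import math


def two_sided_prob(t, k, steps=1000):
    """P(-t <= T <= t) for Student's T with k degrees of freedom.

    The density is integrated from 0 to t with Simpson's rule
    (steps must be even) and doubled, since the density is symmetric.
    """
    c = math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)

    def pdf(x):
        return c * (1 + x * x / k) ** (-(k + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 2 * total * h / 3


def t_critical(p, k):
    """Value t_P such that P(|T| <= t_P) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while two_sided_prob(hi, k) < p:  # widen the bracket until it contains t_P
        hi *= 2
    for _ in range(45):
        mid = (lo + hi) / 2
        if two_sided_prob(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (5, 10, 20, 30):
    k = n - 1
    print(f"n = {n:2d}, k = {k:2d}: t_0.95 = {t_critical(0.95, k):.3f}, "
          f"t_0.99 = {t_critical(0.99, k):.3f}")
```

The printed values should agree with the usual Student tables to about three decimals, for example roughly 2.262 and 3.250 for k = 9.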

Ways to generalize sample characteristics to the population.

The sampling method is most often used to obtain characteristics of the population according to the corresponding sample indicators. Depending on the purposes of the research, this is done either by direct recalculation of sample indicators for the general population, or by calculating correction factors.

Direct recalculation method. The sample share or sample mean is extended to the general population, taking the sampling error into account.

For example, in trade this method is used to determine the number of non-standard products received in a consignment. To do this, the share of non-standard products in the sample (with the accepted degree of probability taken into account) is multiplied by the number of products in the entire batch of goods.
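Direct recalculation for a batch of goods can be sketched as follows. This is an illustration under stated assumptions: the marginal error of a share uses the standard formula for non-repetitive selection, t = 2 (about 0.954 probability under the normal approximation) is assumed, and the function name and counts are hypothetical.

```python
import math


def direct_recalculation(defective_in_sample, n, N, t=2.0):
    """Extend the sample share of non-standard items to the whole batch.

    Assumes the marginal error of a share for non-repetitive selection:
    delta = t * sqrt(w * (1 - w) / n * (1 - n / N)).
    Returns (point estimate, lower bound, upper bound) in items.
    """
    w = defective_in_sample / n                       # sample share
    mu = math.sqrt(w * (1 - w) / n * (1 - n / N))     # average error of the share
    delta = t * mu                                    # marginal error of the share
    return w * N, max(0.0, w - delta) * N, min(1.0, w + delta) * N


# Hypothetical: 4 non-standard items found in a 5% sample (100 of 2,000 units)
estimate, low, high = direct_recalculation(4, 100, 2000)
print(f"about {estimate:.0f} non-standard items, between {low:.0f} and {high:.0f}")
```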

Method of correction factors. It is used in cases where the purpose of the sampling method is to clarify the results of continuous accounting.

In statistical practice, this method is used to clarify data from annual censuses of livestock owned by the population. To do this, after generalizing the data from the complete census, a 10% sample survey is used to determine the so-called “percentage of undercounting”.
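The text does not define the "percentage of undercounting" precisely. The sketch below assumes it is the relative difference between the control (sample) count and the census count for the households covered by the 10% survey, and that the census total is then multiplied by the corresponding correction factor. Function names and figures are hypothetical.

```python
def undercount_percent(census_count, control_count):
    """Assumed definition: percentage by which the census undercounted the control count."""
    return (control_count - census_count) / census_count * 100


def corrected_total(census_total, percent):
    """Apply the percentage of undercounting to the census total as a correction factor."""
    return census_total * (1 + percent / 100)


# Hypothetical figures: livestock recorded for the households in the 10% control survey
pct = undercount_percent(census_count=500, control_count=510)
total = corrected_total(census_total=12_000, percent=pct)
print(f"undercount = {pct:.1f}%, corrected total = {total:.0f} head")  # undercount = 2.0%, corrected total = 12240 head
```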

Methods for selecting units from the general population.

In statistics, various methods of forming sample populations are used, which is determined by the objectives of the study and depends on the specifics of the object of study.

The main condition for conducting a sample survey is to prevent the occurrence of systematic errors that arise as a result of violation of the principle of equal opportunity for each unit of the general population to be included in the sample. Prevention of systematic errors is achieved through the use of scientifically based methods for forming a sample population.

There are the following methods for selecting units from the population:

1) individual selection - individual units are selected for the sample;

2) group selection - the sample includes qualitatively homogeneous groups or series of units being studied;

3) combined selection is a combination of individual and group selection.

Selection methods are determined by the rules for forming a sample population.

The sample could be:

Proper random;

Mechanical;

Typical;

Serial;

Combined.

In proper random sampling, the sample population is formed by random (unintentional) selection of individual units from the general population. The number of units selected is usually determined from the accepted sample proportion.

The sample proportion is the ratio of the number of units in the sample population n to the number of units in the general population N, i.e.

$$\frac{n}{N}.$$

Thus, with a 5% sample from a batch of 2,000 units, the sample size n is 100 units (5 × 2000 / 100), and with a 20% sample it is 400 units (20 × 2000 / 100), and so on.
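A proper random sample of a given percentage can be sketched as a lottery draw without replacement. This is a minimal illustration; the function name and the numbering of units 1..N are assumptions, and Python's `random.Random.sample` performs the equal-probability draw.

```python
import random


def proper_random_sample(N, percent, seed=None):
    """Draw a proper random (non-repetitive) sample of the given percentage.

    Units are numbered 1..N, like lottery tickets; every unit has the
    same probability of being drawn.
    """
    n = round(percent * N / 100)          # sample size from the sample proportion
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))


sample = proper_random_sample(2000, 5, seed=1)
print(len(sample))  # 100
```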

In mechanical sampling, units are selected from a general population divided into equal intervals (groups). The size of the interval equals the inverse of the sample proportion.

So, with a 2% sample every 50th unit is selected (1 / 0.02), with a 5% sample every 20th unit (1 / 0.05), and so on.

Thus, in accordance with the accepted proportion of selection, the general population is, as it were, mechanically divided into groups of equal size. From each group, only one unit is selected for the sample.

An important feature of mechanical sampling is that the formation of a sample population can be carried out without resorting to compiling lists. In practice, the order in which the units of the population are actually located is often used. For example, the sequence of exit of finished products from a conveyor or production line, the order of placement of units of a batch of goods during storage, transportation, sales, etc.
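Mechanical selection from an ordered population, such as items leaving a production line, can be sketched as taking every k-th unit. This assumes the interval k is the inverse of the sample proportion and that one unit is taken from each group at the same position; the function name and the start position are hypothetical.

```python
import random


def mechanical_sample(units, percent, start=None):
    """Select every k-th unit, where k = 100 / percent (inverse of the sample proportion).

    The ordered population is treated as consecutive groups of k units;
    one unit is taken from each group at the same position `start`.
    """
    k = round(100 / percent)
    if start is None:
        start = random.randrange(k)       # random position within the first group
    return units[start::k]


# Hypothetical: 2,000 items in the order they leave the production line
line = list(range(1, 2001))
picked = mechanical_sample(line, 2, start=24)
print(len(picked), picked[:3])  # 40 [25, 75, 125]
```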

Typical sample. In typical sampling, the population is first divided into homogeneous typical groups. Then, from each typical group, a purely random or mechanical sample is used to individually select units into the sample population.

Typical sampling is usually used when studying complex statistical populations, for example, in a sample survey of the labor productivity of trade workers who fall into separate groups by qualification.

An important feature of a typical sample is that it gives more accurate results compared to other methods of selecting units in the sample population.

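To determine the average error of a typical sample, the following formulas are used:

for repeated selection

$$\mu = \sqrt{\frac{\bar{\sigma}^2}{n}},$$

for non-repetitive selection

$$\mu = \sqrt{\frac{\bar{\sigma}^2}{n}\left(1 - \frac{n}{N}\right)}.$$

The average of the within-group variances is determined by the formula

$$\bar{\sigma}^2 = \frac{\sum_i \sigma_i^2 n_i}{\sum_i n_i},$$

where σ_i² is the variance within the i-th typical group and n_i is the number of units selected from that group.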
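The typical-sample error formulas can be sketched in Python. The group sizes and variances are hypothetical, and the function name is illustrative; passing N switches to the non-repetitive formula.

```python
import math


def typical_sample_error(group_sizes, group_variances, N=None):
    """Average error of a typical sample.

    group_sizes[i] is n_i, the number of units sampled from typical group i,
    and group_variances[i] is the within-group variance sigma_i^2.
    If N (population size) is given, the non-repetitive formula is used.
    """
    n = sum(group_sizes)
    mean_var = sum(v * m for v, m in zip(group_variances, group_sizes)) / n
    factor = 1.0 if N is None else 1 - n / N
    return mean_var, math.sqrt(mean_var / n * factor)


# Hypothetical: three qualification groups of trade workers
mean_var, mu_rep = typical_sample_error([20, 30, 50], [4.0, 9.0, 16.0])
_, mu_nonrep = typical_sample_error([20, 30, 50], [4.0, 9.0, 16.0], N=1000)
print(mean_var, round(mu_rep, 4), round(mu_nonrep, 4))
```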

In a single-stage sample, each selected unit is immediately studied according to a given characteristic. This is the case with purely random and serial sampling.

In a multi-stage sample, groups are first selected from the general population, and then individual units are selected from these groups. This is how a typical sample is formed when units are selected mechanically.

Combined sampling can be two-stage. In this case, the population is first divided into groups. Then the groups are selected, and within the latter the individual units are selected.
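A two-stage combined sample (groups first, then units within the selected groups) can be sketched as follows. The population of shops and employees, the function name, and the use of simple random selection at both stages are assumptions for illustration.

```python
import random


def two_stage_sample(groups, n_groups, n_units, seed=None):
    """Two-stage combined sample: draw groups, then units within each drawn group.

    `groups` maps a group label to the list of its units.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_groups)               # stage 1: groups
    return {g: rng.sample(groups[g], n_units) for g in chosen}  # stage 2: units


# Hypothetical population: 10 shops with 20 employees each
shops = {f"shop{s}": [f"shop{s}-emp{e}" for e in range(20)] for s in range(10)}
result = two_stage_sample(shops, n_groups=3, n_units=5, seed=7)
for shop, staff in result.items():
    print(shop, staff)
```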

The extension of sample characteristics to the general population, based on the law of large numbers, requires a sufficiently large sample. In practice, however, it is often impossible, for one reason or another, to increase the number of units in a small sample. This applies, for example, to studying the activities of enterprises, educational institutions and commercial banks, whose number in a region is usually small and sometimes only 5 to 10 units.

When the sample population consists of a small number of units (fewer than 30), the sample is called small. In this case, Lyapunov's theorem cannot be used to calculate the sampling error, since the sample mean is strongly influenced by the value of each randomly selected unit and its distribution may differ considerably from normal.

In 1908, W. S. Gosset, who published under the pseudonym “Student”, proved that the discrepancy between the mean of a small sample and the general mean follows a special distribution law (see Chapter 4). Studying the probabilistic estimation of a sample mean from a small number of observations, he showed that in this case one must consider the distribution not of the sample means themselves but of their deviations from the mean of the original population. The conclusions can then be quite reliable.

Student's discovery is called small sample theory.

When assessing the results of a small sample, the value of the general variance is not used in the calculations. In small samples, the “corrected” sample variance is used to calculate the average sampling error:

$$S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1},$$

i.e., unlike in large samples, the denominator contains n − 1 instead of n. The calculation of the average sampling error for a small sample is given in Table 5.7.

Table 5.7

Calculation of the average error of a small sample

The marginal error of a small sample is

$$\Delta = t\mu,$$

where t is the confidence coefficient.

For a small sample, the value of t is related to the probability estimate differently than for a large sample. According to the Student distribution, the probability that the marginal error does not exceed t times the average error depends both on t and on the sample size n, and in small samples it depends strongly on the number of units selected.

W. S. Gosset compiled a table of probabilities for small samples corresponding to given values of the confidence coefficient t and different small-sample sizes n; an excerpt from it is given in Table 5.8.

Table 5.8

Fragment of Student's probability table (probabilities multiplied by 1000)

The data in Table 5.8 indicate that as the sample size increases without limit (n → ∞), the Student distribution tends to the normal distribution, and at n = 20 it already differs little from it.
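The convergence to the normal law can be illustrated by computing probabilities multiplied by 1000, in the spirit of Table 5.8. This sketch does not reproduce the table's exact layout; it assumes k = n − 1 degrees of freedom and two-sided probabilities P(|T| ≤ t), uses Simpson's rule (our choice) for the t density, and uses `math.erf` for the normal limit. The function names are hypothetical.

```python
import math


def t_two_sided_prob(t, k, steps=1000):
    """P(|T| <= t) for Student's T with k degrees of freedom (Simpson's rule)."""
    c = math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)
    h = t / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * c * (1 + x * x / k) ** (-(k + 1) / 2)
    return 2 * total * h / 3


def normal_two_sided_prob(t):
    """P(|Z| <= t) for the standard normal distribution."""
    return math.erf(t / math.sqrt(2))


# Probabilities multiplied by 1000 for several small-sample sizes n
for t in (1, 2, 3):
    row = [round(1000 * t_two_sided_prob(t, n - 1)) for n in (5, 10, 20)]
    print(f"t = {t}: n = 5, 10, 20 -> {row}; normal -> {round(1000 * normal_two_sided_prob(t))}")
```

For every fixed t the probabilities grow with n and approach the normal value from below.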

The Student distribution table is often given in a different form, more convenient for practical use (Table 5.9).

Table 5.9

Some values of Student's t-distribution (columns: number of degrees of freedom; values for a one-sided interval; values for a two-sided interval; P = 0.99)

Consider how to use the distribution table. For a fixed value of P, compute the number of degrees of freedom k = n − 1. For each number of degrees of freedom, the table gives the limiting value t_P (t_0.95 or t_0.99), which, with the given probability P, will not be exceeded because of random fluctuations in the sample results. From t_P the boundaries of the confidence interval are determined:

$$\bar{x} - t_P\,\mu \le \bar{X} \le \bar{x} + t_P\,\mu.$$
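The table lookup and the confidence interval x̄ ± t_P μ can be sketched in Python. The excerpt of two-sided critical values below (k = 4 to 10) consists of standard Student table values; the function name and the sample data are hypothetical.

```python
import math
import statistics

# Two-sided critical values t_P of Student's distribution for k = n - 1
# degrees of freedom (standard table values, k = 4..10 only).
T_TABLE = {
    0.95: {4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228},
    0.99: {4: 4.604, 5: 4.032, 6: 3.707, 7: 3.499, 8: 3.355, 9: 3.250, 10: 3.169},
}


def small_sample_interval(values, p=0.95):
    """Confidence interval for the general mean from a small sample."""
    n = len(values)
    k = n - 1                                          # degrees of freedom
    t_p = T_TABLE[p][k]                                # KeyError if k is outside the excerpt
    mean = statistics.fmean(values)
    mu = math.sqrt(statistics.variance(values) / n)    # average error, S^2 with n - 1
    delta = t_p * mu                                   # marginal error
    return mean - delta, mean + delta


low, high = small_sample_interval([12.0, 15.0, 11.0, 14.0, 13.0])
print(f"{low:.2f} <= general mean <= {high:.2f}")  # 11.04 <= general mean <= 14.96
```

Raising P to 0.99 widens the interval, which is the price of the higher confidence.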

As a rule, the confidence level used in two-sided testing is P = 0.95 or P = 0.99, which does not exclude the choice of other probability values. The probability value is selected based on the specific requirements of the tasks for which a small sample is used.

The probability that the general mean lies outside the confidence interval is q = 1 − P. This value is small: for the probabilities considered, P = 0.95 and P = 0.99, it equals 0.05 and 0.01 respectively.

Small samples are widespread in the technical sciences and biology, but they must be used in statistical research with great caution, only with appropriate theoretical and practical examination. A small sample can be used only if the distribution of the characteristic in the population is normal or close to it, and the average value is calculated from sample data obtained as a result of independent observations. In addition, keep in mind that the accuracy of results from a small sample size is lower than from a large sample size.

A sample is a limited group of objects (in psychology, subjects or respondents) specially selected from the general population to study its properties.

Population - this is the entire set of objects in relation to which a research hypothesis is formulated.

Studying the properties of a population using a sample is called a sample study. Almost all psychological studies are sample studies, and their conclusions are extended to general populations.

The main requirement for a sample of subjects is its representativeness: the agreement between the characteristics obtained from a partial (sample) examination of a group and the characteristics of that group as a whole. The researcher must be aware of how far the findings of a particular survey can be generalized to the entire population of which the study group is a part.

Great care must be taken in selecting subjects in an empirical study. It is important to take into account gender, age, social status, level of education, health status, individual psychological characteristics of the subjects and other parameters that may influence the results.

There are two main types of sampling: probabilistic (based on mathematical and statistical calculations) and target, or purposive (determined by the purpose of the study and by the availability, typicality and equal representation of subjects).

Strictly speaking, only a probability sample can be representative, because it follows the principle of randomization: every member of the population has an equal probability of being included in the sample. The types of probability sampling are simple random, systematic, stratified, cluster and multi-stage.

Psychological research most often uses purposive (target) sampling. The criteria for constructing a target sample are accessibility, typicality and equal representation. Accordingly, the following types of target sampling are distinguished: sampling of available cases; selection of critical or typical cases; snowball sampling; quota sampling.

Sampling of available cases is the most common way of selecting subjects. It is used when studying large groups of subjects who have no unique, specific parameters.

Sampling of critical or typical cases is built on theoretical concepts or on the researcher's previous empirical experience. From the entire population of subjects, those who have the required specific characteristics are selected.

Example: The study sample consists of parents who assess the situation of their child entering school as stressful.

Snowball sampling (the “rare” population method). Initially, one or more members of the population of interest to the researcher are interviewed, and they then serve as sources of information about other members of this population. The sample expands rapidly, like a growing snowball. This method is used when subjects, for various reasons, do not advertise their membership in a particular group of people.
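The snowball procedure can be modeled as a walk through a referral network. This is a sketch under assumptions not in the text: referrals are recorded as a mapping from each respondent to the people they name, respondents are approached in the order they were named (breadth-first), and nobody is approached twice. The network and function name are hypothetical.

```python
from collections import deque


def snowball_sample(seeds, referrals, target_size):
    """Grow a sample from initial respondents through the contacts they name.

    `referrals` maps each respondent to the people he or she refers the researcher to.
    """
    sample = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)                 # interview this respondent
        for contact in referrals.get(person, []):
            if contact not in seen:           # do not approach anyone twice
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network of scientists working on a narrow problem
network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(["A"], network, target_size=5))  # ['A', 'B', 'C', 'D', 'E']
```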

Example: The sample consists of scientists whose research concerns a narrow scientific problem.

Quota sampling is associated with the division of the study population into subgroups based on socio-demographic or other characteristics that are important for the study. Based on the known proportions of certain groups in the general population, the researcher allocates a “quota” for each subgroup being examined. (Socio-demographic data can be found in statistical collections published annually by regional statistics departments).

Example: The study sample includes men and women of pre-retirement age – 50-60 years old. According to statistics, men of this age make up 46%, and women – 54% of the general population. Therefore, with a total sample size of 100 people, at least 46 men and 54 women should be examined.
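The quota calculation in the example can be sketched in Python. The text does not say how to round fractional quotas; the largest-remainder method used here is our own assumption, chosen so that the quotas always add up to the total sample size. The function name is hypothetical.

```python
import math


def allocate_quotas(percentages, total):
    """Split `total` subjects across subgroups in proportion to their population shares.

    Rounding uses the largest-remainder method so the quotas sum to `total`.
    """
    raw = {g: total * p / 100 for g, p in percentages.items()}
    quotas = {g: math.floor(v) for g, v in raw.items()}
    remaining = total - sum(quotas.values())
    for g in sorted(raw, key=lambda g: raw[g] - quotas[g], reverse=True)[:remaining]:
        quotas[g] += 1
    return quotas


print(allocate_quotas({"men": 46, "women": 54}, 100))  # {'men': 46, 'women': 54}
print(allocate_quotas({"men": 46, "women": 54}, 30))   # {'men': 14, 'women': 16}
```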

An important question in psychological research is sample size, which must be sufficient to support the conclusions of the study. Depending on the methods of mathematical processing, the following requirements are imposed on sample size:

    The largest sample size is required when developing a diagnostic technique - from 200 to 1000-2500 people.

    When comparing two samples, their total number should be at least 50 people, and the sizes of the compared samples should be approximately equal.

    When studying the relationship between properties, traits, etc., the sample size should be at least 30-35 people.

    If factor analysis is used to process data, it is important to remember that reliable factor solutions can only be obtained if the number of subjects exceeds the number of registered variables by three or more times.

    The greater the variability of the property being studied, the larger the sample should be.

    It is advisable to increase the number of subjects by 5-10% compared to the planned one, since some of the forms received will be rejected during the study (they did not understand the instructions, did not accept the task, gave deviating results, etc.).
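Two of the rules above translate directly into arithmetic checks. This is a sketch; the function names are hypothetical, and the reserve is given in whole percent to avoid floating-point rounding surprises.

```python
import math


def recruitment_target(planned, reserve_percent=10):
    """Subjects to recruit so that `planned` usable forms remain
    after 5-10% of forms are rejected."""
    return math.ceil(planned * (100 + reserve_percent) / 100)


def enough_for_factor_analysis(subjects, variables):
    """Rule from the text: at least three subjects per registered variable."""
    return subjects >= 3 * variables


print(recruitment_target(200))                                                    # 220
print(recruitment_target(50, reserve_percent=5))                                  # 53
print(enough_for_factor_analysis(150, 50), enough_for_factor_analysis(149, 50))  # True False
```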

Dependent and independent samples

Often, a study is structured in such a way that a property of interest to the researcher is studied using two or more samples for the purpose of further comparison. These samples can be in different proportions, depending on the purpose and objectives of the study.

Independent samples are characterized by the fact that the probability of selecting any subject for one sample does not depend on which subjects are selected for the other sample.

Dependent samples are characterized by the fact that each subject from one sample is matched according to a certain criterion by a subject from another sample.

Example 1: Dependent samples are two series of values obtained from examining the same group of subjects: the state of some property is measured “before” and “after” the experimental influence.

In this case, the samples (one “before”, the other “after” the impact) are dependent to the greatest possible extent, since they include the same subjects.

Example 2: Dependent samples: husbands – 1st sample, wives – 2nd sample.

Example 3: Dependent samples: children 5-7 years old - 1 sample, their brothers and sisters - 2 sample.

Examples 2 and 3 present options for less dependent samples.

In general, dependent samples involve pairwise selection of subjects into compared samples, and independent samples imply an independent selection of subjects.

    18. Small sample theory.

    With a large number of units in the sample population (n>100), the distribution of random errors of the sample mean in accordance with A.M. Lyapunov’s theorem is normal or approaches normal as the number of observations increases.

    However, in the practice of statistical research in a market economy, one increasingly has to deal with small samples.

    A small sample is a sample observation whose number of units does not exceed 30.

    When assessing the results of a small sample, the population size is not used. To determine possible error limits, the Student's test is used.

    The value of σ is calculated based on sample observation data.

    This value is used only for the population under study, and not as an approximate estimate of σ in the population.

    The probabilistic assessment of the results of a small sample differs from the assessment in a large sample in that with a small number of observations, the probability distribution for the average depends on the number of selected units.

    However, for a small sample, the value of the confidence coefficient t is related to the probability assessment differently than for a large sample (since the distribution law differs from normal).

    According to the distribution law established by Student, the probability estimate of the error depends both on the confidence coefficient t and on the sample size n.

    The average error of a small sample is calculated using the formula:

    $$\mu = \sqrt{\frac{S^2}{n}},$$

    where S² is the small-sample variance.

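    In a small sample, the variance must be adjusted by the coefficient n/(n − 1). When determining the variance S², the number of degrees of freedom is k = n − 1:

    $$S^2 = \sigma^2\,\frac{n}{n-1} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1},$$

    where σ² is the uncorrected sample variance.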

    The marginal error of a small sample is determined by the formula

    $$\Delta = t\mu.$$

    In this case, the value of the confidence coefficient t depends not only on the given confidence probability but also on the number of sampling units n. For individual values of t and n, the confidence probability of a small sample is determined from special Student tables, which give the distribution of standardized deviations:

    $$t = \frac{\bar{x} - \bar{X}}{\mu}.$$


    19. Methods for selecting units in the sample population.

    1. The sample population must be large enough in size.

    2. The structure of the sample population should best reflect the structure of the general population

    3. The selection method must be random

    Depending on whether the selected units are returned to the population to take part in further selection, repeated and non-repetitive selection are distinguished.

    Non-repetitive selection is a selection in which a unit included in the sample does not return to the population from which further selection is carried out.

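    The average error of a non-repetitive random sample is calculated as:

    $$\mu = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}.$$

    The marginal error of a non-repetitive random sample is:

    $$\Delta = t\sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}.$$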

    In case of repeated selection, the unit included in the sample, after recording the observed characteristics, is returned to the original (general) population to participate in the further selection procedure.

    The average error of repeated simple random sampling is calculated as:

    $$\mu = \sqrt{\frac{\sigma^2}{n}}.$$

    The marginal error of repeated random sampling is:

    $$\Delta = t\sqrt{\frac{\sigma^2}{n}}.$$
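The repeated and non-repetitive formulas can be compared side by side. This sketch assumes t = 2 and hypothetical values for the variance and sample sizes; the function name is illustrative. The factor (1 − n/N) makes the non-repetitive error smaller.

```python
import math


def mean_errors(variance, n, N, t=2.0):
    """Average and marginal errors of a simple random sample mean."""
    mu_repeated = math.sqrt(variance / n)
    mu_nonrepeated = math.sqrt(variance / n * (1 - n / N))
    return {
        "repeated": (mu_repeated, t * mu_repeated),
        "non-repetitive": (mu_nonrepeated, t * mu_nonrepeated),
    }


# Hypothetical: variance 16, a 10% sample of 100 units from N = 1000
for method, (mu, delta) in mean_errors(16.0, 100, 1000).items():
    print(f"{method}: mu = {mu:.4f}, delta = {delta:.4f}")
```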

    By the type of selection, samples are divided into individual, group and combined.

    The selection method determines the specific mechanism for selecting units from the general population; it may be proper random, mechanical, typical, serial or combined.

    Proper random selection is the most common method of random sampling. It is also called drawing lots: a ticket with a serial number is prepared for each unit of the statistical population, and the required number of units is then drawn at random. Under these conditions, every unit has the same probability of being included in the sample.

    Mechanical sampling. It is used in cases where the general population is ordered in some way, i.e. there is a certain sequence in the arrangement of units.

    To determine the average error of mechanical sampling, the formula for the average error in actual random non-repetitive sampling is used.

    Typical selection. It is used when all units in the general population can be divided into several typical groups. Typical selection involves selecting units from each group in a purely random or mechanical way.

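    For a typical sample, the error depends on the accuracy of the group means. Therefore the formula for the marginal error of a typical sample uses the average of the within-group variances σ̄²:

    $$\Delta = t\sqrt{\frac{\bar{\sigma}^2}{n}}$$

    (for repeated selection; for non-repetitive selection the expression under the root is multiplied by (1 − n/N)).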

    Serial selection. It is used in cases where population units are combined into small groups or series. The essence of serial sampling lies in the actual random or mechanical selection of series, within which a continuous examination of units is carried out.

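    With serial sampling, the sampling error depends not on the number of units studied but on the number of surveyed series s and on the intergroup (between-series) variance δ². For non-repetitive selection of series:

    $$\mu = \sqrt{\frac{\delta^2}{s}\left(1 - \frac{s}{N_s}\right)},$$

    where N_s is the total number of series in the general population.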
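The serial-sample error can be sketched from the means of the surveyed series. The sketch assumes series of equal size, so that the intergroup variance is simply the variance of the series means around their overall mean; the data and function name are hypothetical.

```python
import math
import statistics


def serial_sample_error(series_means, total_series):
    """Average error of a serial sample with equal-sized series, non-repetitive selection.

    delta2 is the intergroup (between-series) variance of the series means;
    s is the number of surveyed series.
    """
    s = len(series_means)
    delta2 = statistics.pvariance(series_means)
    return delta2, math.sqrt(delta2 / s * (1 - s / total_series))


# Hypothetical: 5 of 50 series (e.g. boxes) examined completely
delta2, mu = serial_sample_error([10.0, 12.0, 11.0, 13.0, 14.0], total_series=50)
print(delta2, round(mu, 4))  # 2.0 0.6
```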

    Combined selection may go through one or more stages. A sample is called single-stage if the units of the population, once selected, are studied directly.

    A sample is called multi-stage if selection proceeds in successive stages, each with its own unit of selection.
