What does dispersion show? How to calculate the variance of a random variable

The variance of a random variable measures the spread of the variable's values. Low variance means the values are clustered close together; high variance indicates that the values are widely scattered. The concept of the variance of a random variable is used throughout statistics. For example, comparing the variance of a value between two groups (such as male and female patients) lets you test whether that variable differs significantly. Variance is also used when building statistical models, since low variance can be a sign that you are overfitting the data.

Steps

Calculating sample variance

  1. Record the sample values. In most cases, statisticians only have access to samples of the populations they study. For example, as a rule, statisticians do not analyze the cost of maintaining every car in Russia; rather, they analyze a random sample of several thousand cars. Such a sample helps estimate the average cost of a car, but the resulting estimate will most likely differ somewhat from the true population value.

    • For example, let's analyze the number of buns sold in a cafe over 6 days chosen at random. The sample is: 17, 15, 23, 7, 9, 13. This is a sample, not a population, because we do not have data for every day the cafe has been open.
    • If you are given a population rather than a sample of values, continue to the next section.
  2. Write down the formula for the sample variance. The variance measures the spread of a quantity's values: the closer the variance is to zero, the more tightly the values are grouped together. When working with a sample of values, use the following formula to calculate the variance:

    • s² = Σ(xᵢ − x̅)² / (n − 1)
    • s² – the sample variance. Variance is measured in squared units.
    • xᵢ – each value in the sample.
    • Σ – the summation sign: from each value xᵢ you need to subtract x̅, square the result, and then add up the results.
    • x̅ – the sample mean.
    • n – the number of values in the sample.
  3. Calculate the sample mean. It is denoted x̅. The sample mean is calculated as a simple arithmetic mean: add up all the values in the sample, then divide the result by the number of values in the sample.

    • In our example, add the values in the sample: 17 + 15 + 23 + 7 + 9 + 13 = 84
      Now divide the result by the number of values in the sample (here, 6): 84 ÷ 6 = 14.
      Sample mean: x̅ = 14.
    • The sample mean is the central value around which the values in the sample are distributed. If the values cluster around the sample mean, the variance is small; otherwise the variance is large.
  4. Subtract the sample mean from each value in the sample. Now calculate the difference xᵢ − x̅, where xᵢ is each value in the sample. Each result shows how far the corresponding value deviates from the sample mean.

    • In our example:
      x₁ − x̅ = 17 − 14 = 3
      x₂ − x̅ = 15 − 14 = 1
      x₃ − x̅ = 23 − 14 = 9
      x₄ − x̅ = 7 − 14 = −7
      x₅ − x̅ = 9 − 14 = −5
      x₆ − x̅ = 13 − 14 = −1
    • The results are easy to check, since their sum must equal zero. This follows from the definition of the mean: the negative values (distances from the mean down to the smaller values) are exactly offset by the positive values (distances from the mean up to the larger values).
  5. As noted above, the sum of the differences xᵢ − x̅ must equal zero; this means the average deviation is always zero, which gives no idea of the spread of the values. To solve this problem, square each difference xᵢ − x̅. This way you get only positive numbers, which will never sum to 0 when added.

    • In our example:
      (x₁ − x̅)² = 3² = 9
      (x₂ − x̅)² = 1² = 1
      (x₃ − x̅)² = 9² = 81
      (x₄ − x̅)² = (−7)² = 49
      (x₅ − x̅)² = (−5)² = 25
      (x₆ − x̅)² = (−1)² = 1
    • You have now found the squared difference (xᵢ − x̅)² for each value in the sample.
  6. Calculate the sum of the squared differences, that is, the part of the formula written as Σ(xᵢ − x̅)². The sign Σ means summing the squared difference for each value xᵢ in the sample. You have already found the squared differences (xᵢ − x̅)² for each value xᵢ in the sample; now simply add them up.

    • In our example: 9 + 1 + 81 + 49 + 25 + 1 = 166.
  7. Divide the result by n − 1, where n is the number of values in the sample. At one time, statisticians simply divided by n when calculating the sample variance; that gives the mean of the squared deviations, which perfectly describes the spread of this particular sample. But remember that any sample is only a small part of a whole population of values: take another sample, do the same calculations, and you will get a different result. As it turns out, dividing by n − 1 rather than n gives a better estimate of the population variance, which is what you are really after. Dividing by n − 1 has become standard, so it appears in the formula for the sample variance.

    • In our example, the sample includes 6 values, so n = 6.
      Sample variance: s² = 166 / (6 − 1) = 166 / 5 = 33.2
  8. Note the difference between variance and standard deviation. Because the formula squares the deviations, the variance is measured in the squared units of the quantity being analyzed. Such a quantity can be hard to interpret; in such cases, use the standard deviation, which is equal to the square root of the variance. That is why the sample variance is denoted s², and the sample standard deviation is denoted s.

    • In our example, the sample standard deviation is s = √33.2 ≈ 5.76.
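
    These calculations are easy to verify in R (the statistical software mentioned near the end of this article); a minimal sketch with the bun sample, noting that R's built-in var() and sd() already divide by n − 1:

        # Bun sales sample from the example above
        sales <- c(17, 15, 23, 7, 9, 13)
        mean(sales)   # sample mean: 14
        var(sales)    # sample variance (denominator n - 1): 33.2
        sd(sales)     # sample standard deviation: sqrt(33.2), about 5.76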

    Calculating Population Variance

    1. Analyze the set of values. The population includes every value of the quantity under consideration. For example, if you are studying the ages of residents of Leningrad Region, the population consists of the ages of all residents of that region. When working with a population, it helps to create a table and enter the population values into it. Consider the following example:

      • In a certain room there are 6 aquariums. The aquariums contain the following numbers of fish:
        x₁ = 5
        x₂ = 5
        x₃ = 8
        x₄ = 12
        x₅ = 15
        x₆ = 18
    2. Write down the formula for the population variance. Since a population includes all values of the quantity, the formula below gives the exact value of the population variance. To distinguish the population variance from the sample variance (which is only an estimate), statisticians use different symbols:

      • σ² = Σ(xᵢ − μ)² / n
      • σ² – the population variance (read as “sigma squared”). Variance is measured in squared units.
      • xᵢ – each value in the population.
      • Σ – the summation sign: from each value xᵢ you need to subtract μ, square the result, and then add up the results.
      • μ – the population mean.
      • n – the number of values in the population.
    3. Calculate the population mean. When working with a population, its mean is denoted μ (mu). The population mean is calculated as a simple arithmetic mean: add up all the values in the population, then divide the result by the number of values in the population.

      • Keep in mind that averages are not always calculated as the arithmetic mean.
      • In our example, the population mean: μ = (5 + 5 + 8 + 12 + 15 + 18) / 6 = 63 / 6 = 10.5
    4. Subtract the population mean from each value in the population. The closer a difference is to zero, the closer that value is to the population mean. Find the difference between each value in the population and its mean, and you will get a first picture of how the values are distributed.

      • In our example:
        x₁ − μ = 5 − 10.5 = −5.5
        x₂ − μ = 5 − 10.5 = −5.5
        x₃ − μ = 8 − 10.5 = −2.5
        x₄ − μ = 12 − 10.5 = 1.5
        x₅ − μ = 15 − 10.5 = 4.5
        x₆ − μ = 18 − 10.5 = 7.5
    5. Square each result obtained. The differences are both positive and negative; plotted on a number line, they would lie to the right and left of the population mean. This is no good for calculating the variance, since the positive and negative numbers would cancel each other out. So square each difference to get exclusively positive numbers.

      • In our example, compute (xᵢ − μ)² for each value in the population (from i = 1 to i = 6):
        (−5.5)² = 30.25
        (−5.5)² = 30.25
        (−2.5)² = 6.25
        1.5² = 2.25
        4.5² = 20.25
        7.5² = 56.25
      • To find the average of these results, take their sum and divide it by n: ((x₁ − μ)² + (x₂ − μ)² + ... + (xₙ − μ)²) / n, where xₙ is the last value in the population.
      • Writing this out with variables gives (Σ(xᵢ − μ)²) / n, the formula for the population variance.
      • In our example: σ² = (30.25 + 30.25 + 6.25 + 2.25 + 20.25 + 56.25) / 6 = 145.5 / 6 = 24.25.
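
      R's var() always divides by n − 1, so for a population variance the mean squared deviation has to be computed directly; a minimal sketch with the aquarium data:

          # Fish counts for the 6 aquariums (the whole population)
          fish <- c(5, 5, 8, 12, 15, 18)
          mu <- mean(fish)        # population mean: 10.5
          mean((fish - mu)^2)     # population variance: 24.25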

In the previous section we presented a number of formulas for finding the numerical characteristics of functions when the distribution laws of the arguments are known. In many cases, however, finding the numerical characteristics of functions does not require knowing the distribution laws of the arguments at all; it is enough to know only some of their numerical characteristics, and we can dispense with distribution laws entirely. Determining the numerical characteristics of functions from given numerical characteristics of the arguments is widely used in probability theory and can significantly simplify the solution of a number of problems. Most of these simplified methods relate to linear functions, but certain elementary nonlinear functions admit a similar approach as well.

In this section we present a number of theorems on the numerical characteristics of functions, which together form a very simple apparatus for calculating these characteristics, applicable under a wide range of conditions.

1. Mathematical expectation of a non-random quantity

If c is a non-random quantity, then

M[c] = c.

The formulated property is quite obvious; it can be proven by regarding the non-random quantity c as a special kind of random variable with a single possible value c of probability one; then, by the general formula for the mathematical expectation,

M[c] = c · 1 = c.

2. Variance of a non-random quantity

If c is a non-random quantity, then

D[c] = 0,

since D[c] = M[(c − M[c])²] = M[(c − c)²] = 0.

3. Taking a non-random factor outside the sign of the mathematical expectation

If c is a non-random quantity and X is a random variable, then

M[cX] = c M[X], (10.2.1)

that is, a non-random factor can be taken outside the sign of the mathematical expectation.

Proof.

a) For discrete quantities:

M[cX] = Σᵢ c xᵢ pᵢ = c Σᵢ xᵢ pᵢ = c M[X].

b) For continuous quantities:

M[cX] = ∫ c x f(x) dx = c ∫ x f(x) dx = c M[X].

4. Taking a non-random factor outside the signs of the variance and the standard deviation

If c is a non-random quantity and X is random, then

D[cX] = c² D[X], (10.2.2)

that is, a non-random factor can be taken outside the sign of the variance by squaring it.

Proof. By the definition of variance,

D[cX] = M[(cX − M[cX])²] = M[c²(X − M[X])²] = c² M[(X − M[X])²] = c² D[X].

Consequence:

σ[cX] = |c| σ[X],

that is, a non-random factor can be taken outside the sign of the standard deviation by its absolute value. The proof is obtained by taking the square root of formula (10.2.2) and taking into account that the standard deviation is an essentially positive quantity.

5. Mathematical expectation of the sum of random variables

Let us prove that for any two random variables X and Y

M[X + Y] = M[X] + M[Y], (10.2.3)

that is, the mathematical expectation of the sum of two random variables is equal to the sum of their mathematical expectations.

This property is known as the theorem of addition of mathematical expectations.

Proof.

a) Let (X, Y) be a system of discrete random variables. Apply to the sum of the random variables the general formula (10.1.6) for the mathematical expectation of a function of two arguments:

M[X + Y] = Σᵢ Σⱼ (xᵢ + yⱼ) pᵢⱼ = Σᵢ Σⱼ xᵢ pᵢⱼ + Σᵢ Σⱼ yⱼ pᵢⱼ.

But Σⱼ pᵢⱼ represents nothing more than the total probability that the quantity X will take the value xᵢ:

Σⱼ pᵢⱼ = pᵢ;

hence,

Σᵢ Σⱼ xᵢ pᵢⱼ = Σᵢ xᵢ pᵢ = M[X].

Similarly we can prove that

Σᵢ Σⱼ yⱼ pᵢⱼ = M[Y],

and the theorem is proven.

b) Let (X, Y) be a system of continuous random variables. By formula (10.1.7),

M[X + Y] = ∬ (x + y) f(x, y) dx dy = ∬ x f(x, y) dx dy + ∬ y f(x, y) dx dy. (10.2.4)

Let us transform the first of the integrals in (10.2.4):

∬ x f(x, y) dx dy = ∫ x [∫ f(x, y) dy] dx = ∫ x f₁(x) dx = M[X];

similarly,

∬ y f(x, y) dx dy = M[Y],

and the theorem is proven.

It should be specially noted that the theorem for adding mathematical expectations is valid for any random variables - both dependent and independent.

The theorem of addition of mathematical expectations generalizes to an arbitrary number of terms:

M[X₁ + X₂ + ... + Xₙ] = M[X₁] + M[X₂] + ... + M[Xₙ], (10.2.5)

that is, the mathematical expectation of the sum of several random variables is equal to the sum of their mathematical expectations.

To prove it, it is enough to use the method of complete induction.

6. Mathematical expectation of a linear function

Consider a linear function of several random arguments X₁, X₂, ..., Xₙ:

Y = Σᵢ aᵢ Xᵢ + b,

where aᵢ and b are non-random coefficients. Let us prove that

M[Y] = Σᵢ aᵢ M[Xᵢ] + b, (10.2.6)

i.e. the mathematical expectation of a linear function is equal to the same linear function of the mathematical expectations of the arguments.

Proof. Using the addition theorem for mathematical expectations and the rule for taking a non-random factor outside the sign of the mathematical expectation, we obtain:

M[Y] = M[Σᵢ aᵢ Xᵢ + b] = Σᵢ aᵢ M[Xᵢ] + b.
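
Property (10.2.6) is easy to check numerically by simulation; a sketch in R, where the distributions and the coefficients a₁ = 2, a₂ = −3, b = 5 are chosen purely for illustration:

    set.seed(1)
    n <- 1e6
    x1 <- rexp(n, rate = 0.5)   # M[X1] = 2
    x2 <- runif(n, 0, 6)        # M[X2] = 3
    y <- 2 * x1 - 3 * x2 + 5
    mean(y)                     # close to 2*2 - 3*3 + 5 = 0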

7. Variance of the sum of random variables

The variance of the sum of two random variables is equal to the sum of their variances plus twice the correlation moment:

D[X + Y] = D[X] + D[Y] + 2 K_xy. (10.2.7)

Proof. Denote

Z = X + Y. (10.2.8)

By the theorem of addition of mathematical expectations,

m_z = m_x + m_y. (10.2.9)

Subtracting equality (10.2.9) term by term from equality (10.2.8), we pass to the corresponding centered variables:

Z − m_z = (X − m_x) + (Y − m_y).

By the definition of variance,

D[Z] = M[(Z − m_z)²] = M[(X − m_x)²] + 2 M[(X − m_x)(Y − m_y)] + M[(Y − m_y)²] = D[X] + 2 K_xy + D[Y].

Q.E.D.

Formula (10.2.7) for the variance of a sum generalizes to any number of terms:

D[Σᵢ Xᵢ] = Σᵢ D[Xᵢ] + 2 Σ_{i<j} K_ij, (10.2.10)

where K_ij is the correlation moment of the quantities Xᵢ, Xⱼ, and the condition i < j under the sum sign means that the summation extends over all possible pairwise combinations of the random variables.

The proof is similar to the previous one and follows from the formula for the square of a polynomial.

Formula (10.2.10) can be written in another form:

D[Σᵢ Xᵢ] = Σᵢ Σⱼ K_ij, (10.2.11)

where the double sum extends over all elements of the correlation matrix of the system of quantities (X₁, ..., Xₙ), which contains both the correlation moments K_ij (i ≠ j) and the variances K_ii = D[Xᵢ].

If all the random variables X₁, ..., Xₙ in the system are uncorrelated (i.e., K_ij = 0 for i ≠ j), formula (10.2.10) takes the form:

D[Σᵢ Xᵢ] = Σᵢ D[Xᵢ], (10.2.12)

that is, the variance of the sum of uncorrelated random variables is equal to the sum of the variances of the terms.

This position is known as the theorem of addition of variances.
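
The theorem of addition of variances can likewise be illustrated by simulation; a sketch in R with two independent (hence uncorrelated) terms of known variance:

    set.seed(2)
    n <- 1e6
    x <- rnorm(n, mean = 0, sd = 2)   # D[X] = 4
    y <- runif(n, -3, 3)              # D[Y] = 6^2 / 12 = 3
    var(x + y)                        # close to 4 + 3 = 7
    var(x) + var(y)                   # also close to 7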

8. Variance of a linear function

Consider a linear function of several random variables:

Y = Σᵢ aᵢ Xᵢ + b,

where aᵢ and b are non-random quantities.

Let us prove that the dispersion of this linear function is expressed by the formula

D[Y] = Σᵢ aᵢ² D[Xᵢ] + 2 Σ_{i<j} aᵢ aⱼ K_ij, (10.2.13)

where K_ij is the correlation moment of the quantities Xᵢ, Xⱼ.

Proof. Let us introduce the notation aᵢ Xᵢ = Yᵢ. Then

Y = Σᵢ Yᵢ + b. (10.2.14)

Applying formula (10.2.10) for the variance of a sum to the right-hand side of expression (10.2.14), and taking into account that D[b] = 0, we obtain:

D[Y] = Σᵢ D[Yᵢ] + 2 Σ_{i<j} K_{YᵢYⱼ}, (10.2.15)

where K_{YᵢYⱼ} is the correlation moment of the quantities Yᵢ, Yⱼ:

K_{YᵢYⱼ} = M[(Yᵢ − m_{Yᵢ})(Yⱼ − m_{Yⱼ})].

Let us calculate this moment. We have:

Yᵢ − m_{Yᵢ} = aᵢ Xᵢ − aᵢ m_{xᵢ} = aᵢ (Xᵢ − m_{xᵢ});

similarly,

Yⱼ − m_{Yⱼ} = aⱼ (Xⱼ − m_{xⱼ}),

so that K_{YᵢYⱼ} = M[aᵢ aⱼ (Xᵢ − m_{xᵢ})(Xⱼ − m_{xⱼ})] = aᵢ aⱼ K_ij, while D[Yᵢ] = aᵢ² D[Xᵢ]. Substituting these expressions into (10.2.15), we arrive at formula (10.2.13).

In the special case when all the quantities X₁, ..., Xₙ are uncorrelated, formula (10.2.13) takes the form:

D[Y] = Σᵢ aᵢ² D[Xᵢ], (10.2.16)

that is, the variance of a linear function of uncorrelated random variables is equal to the sum of the products of the squares of the coefficients and the variances of the corresponding arguments.

9. Mathematical expectation of a product of random variables

The mathematical expectation of the product of two random variables is equal to the product of their mathematical expectations plus the correlation moment:

M[XY] = M[X] M[Y] + K_xy. (10.2.17)

Proof. We proceed from the definition of the correlation moment:

K_xy = M[(X − m_x)(Y − m_y)].

Transform this expression using the properties of the mathematical expectation:

K_xy = M[XY] − m_x M[Y] − m_y M[X] + m_x m_y = M[XY] − m_x m_y,

which is obviously equivalent to formula (10.2.17).

If the random variables X and Y are uncorrelated (K_xy = 0), then formula (10.2.17) takes the form:

M[XY] = M[X] M[Y], (10.2.18)

that is, the mathematical expectation of the product of two uncorrelated random variables is equal to the product of their mathematical expectations.

This position is known as the theorem of multiplication of mathematical expectations.

Formula (10.2.17) is nothing more than an expression of the second mixed central moment of the system through the second mixed initial moment and the mathematical expectations:

K_xy = M[XY] − m_x m_y. (10.2.19)

This expression is often used in practice when calculating the correlation moment in the same way that for one random variable the variance is often calculated through the second initial moment and the mathematical expectation.

The theorem of multiplication of mathematical expectations generalizes to an arbitrary number of factors; only, in this case, for it to apply it is not enough that the quantities be uncorrelated: certain higher mixed moments, whose number depends on the number of factors in the product, must also vanish. These conditions are certainly satisfied if the random variables in the product are independent. In that case

M[X₁ X₂ ... Xₙ] = M[X₁] M[X₂] ... M[Xₙ], (10.2.20)

that is, the mathematical expectation of the product of independent random variables is equal to the product of their mathematical expectations.

This proposition can be easily proven by complete induction.
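
A quick simulation check of the multiplication theorem (10.2.20) for three independent factors; the distributions here are illustrative:

    set.seed(3)
    n <- 1e6
    x <- rpois(n, lambda = 4)    # M[X] = 4
    y <- rexp(n, rate = 1)       # M[Y] = 1
    z <- rbinom(n, 10, 0.5)      # M[Z] = 5
    mean(x * y * z)              # close to 4 * 1 * 5 = 20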

10. Variance of the product of independent random variables

Let us prove that for independent quantities X and Y

D[XY] = D[X] D[Y] + m_x² D[Y] + m_y² D[X]. (10.2.21)

Proof. Denote XY = Z. By the definition of variance,

D[Z] = M[Z²] − m_z². (10.2.22)

Since the quantities X and Y are independent, m_z = m_x m_y, and M[Z²] = M[X²Y²].

When X and Y are independent, the quantities X² and Y² are also independent; hence,

M[X²Y²] = M[X²] M[Y²],

But M[X²] is nothing more than the second initial moment of the quantity X and is therefore expressed through the variance:

M[X²] = D[X] + m_x²;

similarly,

M[Y²] = D[Y] + m_y².

Substituting these expressions into formula (10.2.22) and bringing similar terms, we arrive at formula (10.2.21).

In the case when centered random variables (variables with mathematical expectations equal to zero) are multiplied, formula (10.2.21) takes the form:

D[XY] = D[X] D[Y], (10.2.23)

that is, the variance of the product of independent centered random variables is equal to the product of their variances.
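
The centered special case (10.2.23) can be verified the same way; a sketch with two independent centered normal variables:

    set.seed(4)
    n <- 1e6
    x <- rnorm(n, mean = 0, sd = 2)   # centered, D[X] = 4
    y <- rnorm(n, mean = 0, sd = 3)   # centered, D[Y] = 9
    var(x * y)                        # close to 4 * 9 = 36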

11. Higher moments of the sum of random variables

In some cases it is necessary to calculate the higher moments of the sum of independent random variables. Let us prove some relevant relations.

1) If the quantities X and Y are independent, then

μ₃[X + Y] = μ₃[X] + μ₃[Y]. (10.2.24)

Proof. Denote X + Y = Z; then

μ₃[Z] = M[(Z − m_z)³] = M[((X − m_x) + (Y − m_y))³] = μ₃[X] + 3 M[(X − m_x)²(Y − m_y)] + 3 M[(X − m_x)(Y − m_y)²] + μ₃[Y],

whence, by the theorem of multiplication of mathematical expectations, the two middle terms equal 3 μ₂[X] μ₁[Y] and 3 μ₁[X] μ₂[Y]. But the first central moment μ₁ of any quantity is equal to zero; the two middle terms vanish, and formula (10.2.24) is proven.

Relation (10.2.24) is easily generalized by induction to an arbitrary number of independent terms:

μ₃[X₁ + ... + Xₙ] = μ₃[X₁] + ... + μ₃[Xₙ]. (10.2.25)

2) The fourth central moment of the sum of two independent random variables is expressed by the formula

μ₄[X + Y] = μ₄[X] + μ₄[Y] + 6 D[X] D[Y], (10.2.26)

where D[X] and D[Y] are the variances of the quantities X and Y.

The proof is completely similar to the previous one.

Using the method of complete induction, it is easy to prove the generalization of formula (10.2.26) to an arbitrary number of independent terms.

Probability theory is a special branch of mathematics, studied mainly by students of higher educational institutions. Do you like calculations and formulas? Are you not scared off by the prospect of getting acquainted with the normal distribution, ensemble entropy, and the mathematical expectation and variance of a discrete random variable? Then this subject will be very interesting to you. Let's get acquainted with several of the most important basic concepts of this branch of science.

Let's remember the basics

Even if you remember the simplest concepts of probability theory, do not skip the first paragraphs of this article: without a clear understanding of the basics, you will not be able to work with the formulas discussed below.

So, some random event occurs; some experiment is run. As a result of our actions we can get several outcomes, some occurring more often, others less often. The probability of an event is the ratio of the number of outcomes of one type actually obtained to the total number of possible outcomes. Only knowing the classical definition of this concept can you begin to study the mathematical expectation and variance of random variables.

Arithmetic mean

Back in school, in math lessons, you started working with the arithmetic mean. This concept is widely used in probability theory, so it cannot be ignored. What matters most for us at the moment is that we will encounter it in the formulas for the mathematical expectation and variance of a random variable.

We have a sequence of numbers and want to find the arithmetic mean. All that is required is to sum everything available and divide by the number of elements in the sequence. Suppose we have the numbers from 1 to 9. The sum of the elements is 45, and dividing this value by 9 gives the answer: 5.

Dispersion

In scientific terms, the variance is the mean square of the deviations of the obtained values of a characteristic from the arithmetic mean. It is denoted by the capital Latin letter D. What is needed to calculate it? For each element of the sequence, compute the difference between that number and the arithmetic mean, and square it. There will be exactly as many values as there are outcomes of the event under consideration. Next, sum everything obtained and divide by the number of elements in the sequence. If we have five possible outcomes, divide by five.

The variance also has properties worth remembering for problem-solving. For example, when a random variable is multiplied by a constant X, the variance is multiplied by X squared (i.e., X*X). The variance is never less than zero and does not change when the values are shifted up or down by an equal amount. Additionally, for independent trials, the variance of a sum equals the sum of the variances.

Now let us consider examples of calculating the variance of a discrete random variable and the mathematical expectation.

Let's say we ran 21 experiments and got 7 different outcomes. We observed each of them 1, 2, 2, 3, 4, 4 and 5 times, respectively. What will the variance be equal to?

First, let's calculate the arithmetic mean: the sum of the elements, of course, is 21. Divide it by 7, getting 3. Now subtract 3 from each number in the original sequence, square each value, and add the results together. The result is 12. Now all we have to do is divide the number by the number of elements, and, it would seem, that’s all. But there's a catch! Let's discuss it.

Dependence on the number of experiments

It turns out that when calculating variance, the denominator can contain one of two numbers: either N or N-1. Here N is the number of experiments performed or the number of elements in the sequence (which is essentially the same thing). What does this depend on?

Strictly speaking, the choice is between a population and a sample: N is used when your data cover all possible outcomes, and N − 1 (Bessel's correction) when they are only a sample, to get an unbiased estimate. For large samples the correction hardly matters, which is why a symbolic border is often drawn at the number 30: with fewer than 30 observations, divide the sum by N − 1; with more, the difference becomes negligible and N is commonly used.

Task

Let's return to our example. We got an intermediate number of 12, which must be divided by N or N − 1. Our sequence contains 7 values, and since the number of observations is below 30, we choose the second option. So the answer is: the variance is 12 / 6 = 2.
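
For a quick check in R: var() always divides by N − 1 regardless of sample size, which here happens to match the rule above:

    outcomes <- c(1, 2, 2, 3, 4, 4, 5)   # how many times each of the 7 outcomes occurred
    mean(outcomes)                       # arithmetic mean: 3
    var(outcomes)                        # 12 / (7 - 1) = 2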

Expectation

Let's move on to the second concept we must consider in this article. The mathematical expectation is the sum of all possible outcomes multiplied by their corresponding probabilities. It is important to understand that the obtained value, like the result of calculating the variance, is obtained only once per problem, no matter how many outcomes it considers.

The formula for the mathematical expectation is quite simple: take an outcome, multiply it by its probability, add the same for the second, third outcome, and so on. Everything related to this concept is easy to compute: for example, the sum of expected values equals the expected value of the sum, and the same holds for the product (of independent variables). Not every quantity in probability theory allows such simple operations. Let's take a problem and calculate both of the concepts we have studied at once. Besides, we have been distracted by theory long enough; it's time for practice.

Another example

We ran 50 trials and got 10 types of outcomes, the numbers from 0 to 9, appearing in different percentages. These are, respectively: 2%, 10%, 4%, 14%, 2%, 18%, 6%, 16%, 10%, 18%. Recall that to obtain probabilities, you divide the percentage values by 100, giving 0.02, 0.1, and so on. Let us work through an example of finding the variance of a random variable and the mathematical expectation.

We calculate the arithmetic mean using the formula that we remember from elementary school: 50/10 = 5.

Now let's convert the probabilities into the number of outcomes "in pieces" to make counting easier. We get 1, 5, 2, 7, 1, 9, 3, 8, 5 and 9. From each value we subtract the arithmetic mean and square each of the results. Here is how to do this using the first element as an example: 1 − 5 = −4, then (−4) * (−4) = 16. Do these operations for the other values yourself. If you did everything correctly, adding them all up gives 90.

We continue calculating the variance by dividing 90 by N. Why N rather than N − 1? Because the number of trials performed exceeds 30. So: 90/10 = 9. We have the variance. If you get a different number, don't despair; most likely you made a simple arithmetic slip. Double-check what you wrote, and everything should fall into place.

Finally, recall the formula for the mathematical expectation. We will not give all the calculations; we will only state the answer you can check against after completing all the required steps: the expected value is 5.48. Let us only recall how to carry out the operations, using the first elements as an example: 0*0.02 + 1*0.1 + ... and so on. As you can see, we simply multiply each outcome value by its probability and sum.
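
The same expectation can be computed in one line of R with the probabilities from this example:

    outcomes <- 0:9
    probs <- c(0.02, 0.10, 0.04, 0.14, 0.02, 0.18, 0.06, 0.16, 0.10, 0.18)
    sum(probs)               # sanity check: the probabilities sum to 1
    sum(outcomes * probs)    # expected value: 5.48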

Deviation

Another concept closely related to the variance and the mathematical expectation is the standard deviation. It is denoted either by the Latin letters sd or by the lowercase Greek letter "sigma" (σ). This concept shows how much, on average, the values deviate from the central characteristic. To find it, take the square root of the variance.

If you plot a normal distribution curve and want to see the standard deviation directly on it, this can be done in several steps. Take half of the curve to the left or right of the mode (the central value) and draw a perpendicular to the horizontal axis so that the areas of the resulting figures are equal. The length of the segment between the middle of the distribution and the foot of this perpendicular represents the standard deviation.

Software

As the formulas and examples above show, calculating the variance and mathematical expectation is not the simplest procedure from an arithmetic point of view. To avoid wasting time, it makes sense to use the software employed in higher educational institutions, the language called "R". It has functions for computing values for many concepts from statistics and probability theory.

For example, you specify a vector of values. This is done as follows: vector<-c(1,5,2…). Then, whenever you need to calculate something for this vector, you write a function and pass the vector as its argument. To find the variance, use the var function, for example: var(vector). Then you simply press "Enter" and get the result.

In conclusion

The variance and the mathematical expectation are basic concepts without which it is difficult to calculate anything further. In the main course of university lectures, they are covered in the very first months of studying the subject. It is precisely the lack of understanding of these simple concepts, and the inability to calculate them, that causes many students to fall behind in the program right away and later receive poor grades at the end of the term, which costs them their scholarships.

Practice for at least half an hour a day over a week, solving problems similar to those presented in this article. Then, on any probability theory test, you will be able to handle the examples without extraneous hints and cheat sheets.


Conversely, if f is a function that is non-negative almost everywhere and satisfies ∫ f(x) dx = 1, then there exists an absolutely continuous probability measure P on ℝ such that f is its density.

    Change of measure in the Lebesgue integral:

∫ g(x) P(dx) = ∫ g(x) f(x) dx,

where g is any Borel function that is integrable with respect to the probability measure P.

Dispersion, types and properties of dispersion

The concept of dispersion

Dispersion (variance) in statistics is the mean of the squared deviations of the individual values of a characteristic from their arithmetic mean. Depending on the initial data, it is determined using the simple or the weighted variance formula:

1. Simple variance (for ungrouped data) is calculated using the formula:

σ² = Σ(xᵢ − x̄)² / n

2. Weighted variance (for a variation series):

σ² = Σ(xᵢ − x̄)² f / Σf,

where f is the frequency (the number of times value X repeats).
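
A short R sketch computing the weighted mean and weighted variance; the values x and frequencies f here are made up for illustration:

    x <- c(10, 12, 15, 20)            # values of the characteristic
    f <- c(2, 5, 8, 5)                # frequencies
    xbar <- sum(x * f) / sum(f)       # weighted mean: 15
    sum((x - xbar)^2 * f) / sum(f)    # weighted variance: 11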

An example of finding variance

This page walks through a standard example of finding the variance; you can also look at other problems on finding it:

Example 1. Determination of group, group average, intergroup and total variance

Example 2. Finding the variance and coefficient of variation in a grouping table

Example 3. Finding variance in a discrete series

Example 4. The following data is available for a group of 20 correspondence students. It is necessary to construct an interval series of the distribution of the characteristic, calculate the average value of the characteristic and study its dispersion

Let's build an interval grouping. Let's determine the width of each interval using the formula:

h = (X max − X min) / n,

where X max is the maximum value of the grouping characteristic, X min is the minimum value of the grouping characteristic, and n is the number of intervals.

We accept n = 5. The step is: h = (192 − 159) / 5 = 6.6

Let's create an interval grouping

For further calculations, we will build an auxiliary table:

X'i – the middle of the interval (for example, the middle of the interval 159–165.6 is 162.3).

We determine the average height of students using the weighted arithmetic average formula:

Let's determine the variance using the formula:

σ² = Σ(x'ᵢ − x̄)² f / Σf

The formula can be transformed like this:

σ² = Σx'ᵢ² f / Σf − x̄²

From this formula it follows that the variance is equal to the difference between the mean of the squared values and the square of their mean.

In a variation series with equal intervals, the variance can be calculated by the method of moments, using the second property of variance (dividing all values by the width of the interval). Calculating the variance by the method of moments, using the following formula, is less laborious:

σ² = i² (m₂ − m₁²),

where i is the width of the interval; A is a conventional zero, for which it is convenient to use the middle of the interval with the highest frequency; m₁ is the first-order moment; m₂ is the second-order moment.

Variance of an alternative characteristic (if in a statistical population a characteristic varies so that only two mutually exclusive options exist, such variability is called alternative) can be calculated using the formula:

σ² = p · q

Substituting q = 1 − p into this variance formula, we get:

σ² = p (1 − p)
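
A one-line check in R on made-up 0/1 data: for an alternative characteristic the population variance is exactly p(1 − p):

    x <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)   # alternative (0/1) characteristic
    p <- mean(x)                           # share of "ones": 0.4
    mean((x - p)^2)                        # equals p * (1 - p) = 0.24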

Types of variance

Total variance measures the variation of a characteristic across the entire population as a whole, under the influence of all the factors that cause this variation. It equals the mean square of the deviations of the individual values of the characteristic x from its overall mean and can be computed as the simple or the weighted variance.

Within-group variance characterizes random variation, i.e., the part of the variation caused by unaccounted-for factors and independent of the factor-characteristic on which the grouping is based. Such variance equals the mean square of the deviations of the individual values of the characteristic within group X from the group's arithmetic mean and can be calculated as the simple or the weighted variance.

Thus, the within-group variance measures the variation of a characteristic within a group and is determined by the formula:

σᵢ² = Σ(x − x̄ᵢ)² / nᵢ,

where x̄ᵢ is the group mean and nᵢ is the number of units in the group.

For example, the within-group variances that need to be determined when studying the influence of workers' qualifications on the level of labor productivity in a workshop show the variation in output within each group caused by all possible factors (technical condition of equipment, availability of tools and materials, age of workers, labor intensity, etc.) except for differences in qualification category (within a group all workers have the same qualifications).

The average of the within-group variances reflects random variation, that is, the part of the variation that arose under the influence of all factors other than the grouping factor. It is calculated using the formula:

σ̄² = Σ σᵢ² nᵢ / Σ nᵢ

Intergroup variance characterizes the systematic variation of the resulting characteristic, caused by the factor-characteristic on which the grouping is based. It equals the mean square of the deviations of the group means from the overall mean. The intergroup variance is calculated using the formula:

δ² = Σ (x̄ᵢ − x̄)² nᵢ / Σ nᵢ

Dispersion in statistics is thus defined as the mean of the squared deviations of the individual values of a characteristic from the arithmetic mean: one computes the squared deviations of the values from the mean and then averages them.

In economic statistical analysis, the variation of a characteristic is most often evaluated using the standard deviation, the square root of the variance:

σ = √σ² (3)

It characterizes the absolute variability of the values of a varying characteristic and is expressed in the same units of measurement as the values themselves. In statistics, there is often a need to compare the variation of different characteristics; for such comparisons a relative measure of variation, the coefficient of variation, is used.

Dispersion properties:

1) if you subtract the same number from all values, the variance will not change;

2) if all values are divided by some number b, the variance will decrease by a factor of b², i.e.

σ²(x/b) = σ²(x) / b²

3) if you calculate the mean square of deviations from any number c that differs from the arithmetic mean, it will be greater than the variance, namely by exactly the square of the difference between the mean and that number c:

σ²_c = σ² + (x̄ − c)²

The variance can therefore be defined as the difference between the mean of the squared values and the square of the mean:

σ² = Σxᵢ² / n − x̄²
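
This identity is a convenient computational shortcut; a one-line check in R on the aquarium data used earlier:

    x <- c(5, 5, 8, 12, 15, 18)
    mean(x^2) - mean(x)^2          # 24.25, same as mean((x - mean(x))^2)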

17. Group and intergroup variations. Variance addition rule

If a statistical population is divided into groups or parts according to the characteristic being studied, then the following types of variance can be calculated for such a population: group (partial) variance, the average of the group variances, and intergroup variance.

Total variance– reflects the variation of a characteristic due to all the conditions and causes operating in a given statistical population.

Group variance equals the mean square of the deviations of the individual values of the characteristic within a group from the arithmetic mean of that group, called the group mean. The group mean does not necessarily coincide with the overall mean for the entire population.

Group variance reflects the variation of a trait only due to conditions and causes operating within the group.

The average of the group variances is defined as the weighted arithmetic mean of the group variances, the weights being the group sizes.

Intergroup variance- equal to the mean square of deviations of group averages from the overall average.

Intergroup dispersion characterizes the variation of the effective characteristic due to the grouping characteristic.

There is a definite relationship between these types of variance: the total variance is equal to the sum of the average of the group variances and the intergroup variance:

σ² = σ̄² + δ²

This relationship is called the variance addition rule.
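
A sketch in R illustrating the rule on two made-up groups; all variances here are computed with denominator n, as the rule requires:

    g1 <- c(2, 4, 6)
    g2 <- c(10, 12, 14, 16)
    total <- c(g1, g2)
    pvar <- function(v) mean((v - mean(v))^2)        # population-style variance
    n1 <- length(g1); n2 <- length(g2); n <- n1 + n2
    within <- (pvar(g1) * n1 + pvar(g2) * n2) / n    # average of the group variances
    between <- (n1 * (mean(g1) - mean(total))^2 +
                n2 * (mean(g2) - mean(total))^2) / n # intergroup variance
    within + between                                 # about 23.84
    pvar(total)                                      # the same total variance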

18. Dynamic series and its components. Types of time series.

A series in statistics is digital data showing the change of a phenomenon in time or space, making it possible to compare phenomena statistically both across their development in time and across different forms and types of processes. Thanks to this, the mutual dependence of phenomena can be detected.

In statistics, the process of development and movement of social phenomena over time is usually called dynamics. To display dynamics, dynamic series (chronological, time series) are constructed: series of time-varying values of a statistical indicator (for example, the number of convicted persons over 10 years) arranged in chronological order. Their constituent elements are the numerical values of the indicator and the periods or points in time to which they relate.

The most important characteristic of a dynamic series is the size (volume, magnitude) of the phenomenon attained in a certain period or at a certain moment. Accordingly, the magnitude of a term of the series is its level. One distinguishes the initial, middle, and final levels of a dynamic series. The initial level shows the value of the first term of the series, the final level the value of the last. The middle level is the chronological average of the series and is calculated differently depending on whether the dynamic series is an interval or a moment series.

Another important characteristic of a dynamic series is the time elapsed from the initial to the final observation, or the number of such observations.

There are different types of time series; they can be classified according to the following criteria.

1) Depending on the method of expressing levels, the dynamics series are divided into series of absolute and derivative indicators (relative and average values).

2) Depending on whether the levels of the series express the state of the phenomenon at certain points in time (at the beginning of a month, quarter, year, etc.) or its value over certain intervals of time (for example, per day, month, year, etc.), one distinguishes moment and interval dynamic series, respectively. Moment series are used relatively rarely in the analytical work of law enforcement agencies.

In statistical theory, dynamic series are also distinguished by a number of other classification criteria: depending on the distance between levels, series with equal and with unequal time spacing of levels; depending on the presence of a main tendency in the process under study, stationary and non-stationary series. When analyzing time series, one proceeds from the following: the levels of the series are represented as a sum of components:

Y(t) = T(t) + E(t),

where T(t) is the deterministic component, which determines the general tendency of change over time (the trend), and E(t) is the random component, which causes fluctuations of the levels.


