Search words and concepts in our toolbox

The wonderful world of statistics sometimes has its own language: words with a specific definition, so that everyone knows what they mean. We explain our vocabulary in this toolbox, so that you can better understand how to use and interpret these words yourself.

Amplitude

The mode, the median and the average are so-called position parameters that give an idea of the order of magnitude of a series of observations. They do not give any idea of the dispersion of the observations.

For this, we need to use dispersion parameters, such as the amplitude.

The amplitude represents the gap between the smallest value observed and the largest. The amplitude shows the extent of the series of observations.

The greater the amplitude, the further apart the observed values are; the smaller the amplitude, the closer the observed values are to each other.
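As a small illustration, here is a sketch in Python of how the amplitude could be computed. The two series of marks and the function name `amplitude` are invented for this example; they are not data from this page.

```python
# Hypothetical marks out of 10, invented for illustration only.
marks_close = [6, 7, 7, 8, 6, 7]
marks_spread = [1, 9, 4, 10, 2, 7]


def amplitude(values):
    """Return the gap between the largest and the smallest value."""
    return max(values) - min(values)


print(amplitude(marks_close))   # 2: the marks are close to each other
print(amplitude(marks_spread))  # 9: the marks are far apart
```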

Let’s take 2 classes of 14 pupils each. The pupils received a mark out of 10 for a homework assignment.

Here are the results observed in both classes.

Marks in class nº2 are therefore much more spread out than those in class nº1.

Average

The (arithmetic) average is a position parameter that can be used to summarise the information in a sometimes very large data set.

To calculate the average, we need to add up all the observations and divide the total obtained by the number of observations.

For example:

Jules received 5 muffins, Alice 3, Samuel 1, Yana 10 and Lucile 6.

Let's put all the muffins together: there is a total of 25 muffins.

Let’s then divide the muffins equally between the children: each child will receive 5 muffins.

So the average is 5 muffins.

The average could be interpreted as the number of muffins that everyone would receive if all the muffins were distributed equally between the children.
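The muffin example can be checked with a few lines of Python. This is only a sketch; the variable names are our own choice.

```python
# Number of muffins each child received (from the example above).
muffins = {"Jules": 5, "Alice": 3, "Samuel": 1, "Yana": 10, "Lucile": 6}

# Step 1: put all the muffins together.
total = sum(muffins.values())

# Step 2: divide the total by the number of children.
average = total / len(muffins)

print(total)    # 25
print(average)  # 5.0
```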

If you want to know more, we talk about the average in the ‘Population’ and ‘Agriculture’ topics.

Average or median?

Both the median and the average are position parameters.

A position parameter indicates a typical value around which the observations are distributed.
This typical value is a kind of summary of the whole series of observations.

If you calculate the average and the median of a series of observations, you are calculating 2 typical values for your series, which can sometimes be very different from each other.
How do you choose which one makes the most sense?

Let's imagine this. We're going to talk about fruit this time: 14 cherries and a watermelon. The cherries are light, weighing just 10 grams each. The watermelon, on the other hand, is huge, weighing 5 kilos, or 5,000 grams!

Now we want to calculate how much each fruit weighs on average. To do this, we add the weight of all the cherries and the watermelon, then divide this sum by the total number of fruits, which is 15.

((14 x 10) + 5,000) / 15 = 5,140 / 15 ≈ 342.67

After doing the maths, we find that the average is about 342.67 grams per fruit, which is about 0.34 kilos. But that is way too heavy for a cherry, even a large one!

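Let's now try to calculate the median weight of our fruit. Ordered from lightest to heaviest, the weights are 10, 10, 10, 10, 10, ... 5,000. The value in the middle of the series is 10 grams, so the median is 10 grams. Much more representative of our set of fruit, isn’t it?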

This example shows you that if your series of observations contains some very large values or, on the contrary, some very small values, then the average of your series does not necessarily make sense. It is therefore preferable to use the median as the ‘typical’ value for your series.
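To compare the two typical values for the fruit example yourself, here is a minimal sketch using Python's standard `statistics` module. The variable names are our own.

```python
from statistics import mean, median

# 14 cherries of 10 grams each and one watermelon of 5,000 grams.
weights_g = [10] * 14 + [5000]

average_weight = mean(weights_g)    # pulled upwards by the watermelon
median_weight = median(weights_g)   # the middle value of the ordered series

print(round(average_weight, 2))
print(median_weight)
```

A single very large value is enough to move the average far away from the cherries, while the median stays at the weight of a cherry.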

Charts

A chart is a visual representation of data.

It must contain:

a title so you know what information it contains
a legend to help you find your way around
data
the source of these data

There are different types of charts: curve, bar chart, pie chart, etc.

Some charts have axes. These charts show the relationship between a dependent variable and an independent variable.

The horizontal axis is called the X-axis. It is associated with the independent variable (often referred to as x).

The vertical axis is called the Y-axis and is associated with the dependent variable (often referred to as y).

A bar chart is used to represent observations according to their frequency. The longer the bar, the greater the frequency of the observation.

Curves generally show an evolution.

A pie chart shows the share of various elements in a total.

To go from a pie chart to a bar chart, you have to know the total frequency, i.e. the total number of observations. Then simply apply each percentage to this total to find the frequency of each element.
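Here is a sketch of that conversion in Python. The shares, the labels and the total of 40 observations are invented for illustration; they do not come from a chart on this website.

```python
# Hypothetical pie-chart shares (in %) and total number of observations.
shares_percent = {"bus": 50, "bike": 30, "car": 20}
total_observations = 40

# Apply each percentage to the total to recover the frequency of each element.
frequencies = {
    label: total_observations * percent / 100
    for label, percent in shares_percent.items()
}

print(frequencies)  # {'bus': 20.0, 'bike': 12.0, 'car': 8.0}
```

These frequencies are the bar lengths you would draw in the bar chart.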

You will find charts in all topics.

Cumulative frequency

The cumulative frequency is the sum of all the frequencies up to a specific value in the data series under consideration.

To calculate a cumulative frequency, the statistical series must be ordered, i.e. the observations must be classified in ascending order.

Both absolute and relative frequencies can be cumulated.

Here are the marks out of 10 for a test taken by the 24 pupils in a class, presented in the form of a frequency distribution.

Marks out of 10	0	1	2	3	4	5	6	7	8	9	10	Total
Number of pupils obtaining the mark	1	2	1	0	1	2	5	5	4	2	1	24

In this table, the marks are already classified in ascending order.
The number of pupils who had a mark lower than or equal to 5/10 is 1+2+1+0+1+2=7.
The number of pupils who had a mark higher than 5/10 is 5+5+4+2+1=17.
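The same running totals can be sketched in Python with `itertools.accumulate` from the standard library. The variable names are our own.

```python
from itertools import accumulate

# Frequency distribution from the table above (marks 0 to 10, in ascending order).
marks = list(range(11))
pupils = [1, 2, 1, 0, 1, 2, 5, 5, 4, 2, 1]

# Cumulative absolute frequencies: running total of the number of pupils.
cumulative = list(accumulate(pupils))
print(cumulative)  # [1, 3, 4, 4, 5, 7, 12, 17, 21, 23, 24]

at_most_5 = cumulative[marks.index(5)]      # pupils with a mark of 5/10 or less
more_than_5 = cumulative[-1] - at_most_5    # all the other pupils

print(at_most_5, more_than_5)  # 7 17
```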

If you want to know more, we talk about cumulative frequencies in the ‘Population’ topic.

Data collection

Data are essential for any statistical study.

A statistician collects the data needed to answer the question being asked.

Examples

How much time do children your age spend watching television?
What is the age of the Belgian population?
Which of two different potato varieties has the highest yield?
How big are the ears of African elephants?

Where do the data come from?

There are several data sources:

experimental data: these data are obtained by scientists who set up experimental plans to test a whole bunch of hypotheses about the effectiveness of a medicine, for example, or the resistance of certain plant varieties to disease, etc.
administrative data: these are data that have been collected by another authority. These data already exist. So there is no need to ask citizens or enterprises to provide this information again. This saves time and money, and above all avoids annoying people unnecessarily.
survey data: when data are not available, there is no other choice but to collect them via surveys. A survey is a form with questions to which people are invited to respond. Did you know that Statbel is one of the few bodies in Belgium which may impose compulsory surveys on citizens, enterprises and organisations?

To give you examples, the statistics presented in the topics population, evolution of the population and nationalities are based on data from the National Register, which are administrative data. However, agricultural statistics are partly based on survey data.

Frequency and absolute frequency

The absolute frequency is the number of times the same value is observed in a statistical series.

The total absolute frequency is the total number of observations.

Here are the marks out of 10 for a test taken by the 24 pupils in a class, presented in the form of a frequency distribution.

Marks out of 10	0	1	2	3	4	5	6	7	8	9	10
Number of pupils obtaining the mark	1	2	1	0	1	2	5	5	4	2	1

2 pupils obtained the mark of 1/10: 2 is the absolute frequency of the mark 1/10.
4 pupils obtained the mark of 8/10: 4 is the absolute frequency of the mark 8/10.
The absolute frequency of the mark of 4/10 is 1.

The total absolute frequency can be obtained by adding together the absolute frequencies of all observations:

1+2+1+0+1+2+5+5+4+2+1=24

24 is the total number of observations; it is also the number of pupils in the class.
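As a sketch, the frequency distribution above can be stored in a Python dictionary, where each mark is linked to its absolute frequency. The name `absolute_frequency` is our own choice.

```python
# Mark out of 10 -> number of pupils who obtained it (from the table above).
absolute_frequency = dict(zip(range(11), [1, 2, 1, 0, 1, 2, 5, 5, 4, 2, 1]))

print(absolute_frequency[1])  # 2 pupils obtained 1/10
print(absolute_frequency[8])  # 4 pupils obtained 8/10
print(absolute_frequency[4])  # 1 pupil obtained 4/10

# The total absolute frequency is the sum of all absolute frequencies.
total_absolute_frequency = sum(absolute_frequency.values())
print(total_absolute_frequency)  # 24
```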

Median

The median is a position parameter that can be used to summarise the information in a sometimes very large data set.

The median is an observed or unobserved value that divides the statistical series into 2, so that half of the observed values are lower and half of the observations are higher.

The median is the central value of a series of observations.

To determine the median, we first need to order the series of observations in ascending order.

Let's say we're interested in the height of the pupils in a class, measured in cm.

Here are the heights of the 15 pupils in the class.
Yana 135 cm, Alice 130 cm, Jules 132 cm, Samuel 150 cm, Fabio 133 cm, Lucile 138 cm, Emma 129 cm, Louis 133 cm, Ilan 134 cm, Selena 128 cm, Adriano 136 cm, Aisha 133 cm, Sofiane 135 cm, Aaron 140 cm and Noémie 139 cm.

The first step is to order the series of observations from the smallest to the largest. The pupils are asked to line up in order of increasing height, and this is what we get:

Selena	128 cm
Emma	129 cm
Alice	130 cm
Jules	132 cm
Aisha	133 cm
Fabio	133 cm
Louis	133 cm
Ilan	134 cm
Yana	135 cm
Sofiane	135 cm
Adriano	136 cm
Lucile	138 cm
Noémie	139 cm
Aaron	140 cm
Samuel	150 cm

After that, we need to find the middle of the series.

Selena	128 cm	7 observations are smaller than 134 cm
Emma	129 cm
Alice	130 cm
Jules	132 cm
Aisha	133 cm
Fabio	133 cm
Louis	133 cm
Ilan	134 cm	134 cm is the value that cuts the series in the middle, 134 cm is the median
Yana	135 cm	7 observations are larger than 134 cm
Sofiane	135 cm
Adriano	136 cm
Lucile	138 cm
Noémie	139 cm
Aaron	140 cm
Samuel	150 cm

When the number of observations is odd, the median is an observed value of the statistical series.
When the number of observations is even, the median lies halfway between the two middle values and is usually not an observed value of the statistical series.

Let’s say Fabio changed schools. There are now only 14 pupils in the class. The series of ordered observations can be presented as follows:

Selena	128 cm	7 observations are smaller than 134.5 cm
Emma	129 cm
Alice	130 cm
Jules	132 cm
Aisha	133 cm
Louis	133 cm
Ilan	134 cm
		134.5 cm is the value that cuts the series in the middle, 134.5 cm is the median
Yana	135 cm	7 observations are larger than 134.5 cm
Sofiane	135 cm
Adriano	136 cm
Lucile	138 cm
Noémie	139 cm
Aaron	140 cm
Samuel	150 cm

In this case, the median is not an observed value of the statistical series: it is the midpoint of the two middle values, (134 + 135) / 2 = 134.5 cm.
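Both situations can be reproduced with a short Python sketch based on the `median` function of the standard `statistics` module, which orders the values itself. The variable names are our own.

```python
from statistics import median

# Heights of the 15 pupils in cm, as listed at the start of the example.
heights_cm = {
    "Yana": 135, "Alice": 130, "Jules": 132, "Samuel": 150, "Fabio": 133,
    "Lucile": 138, "Emma": 129, "Louis": 133, "Ilan": 134, "Selena": 128,
    "Adriano": 136, "Aisha": 133, "Sofiane": 135, "Aaron": 140, "Noémie": 139,
}

# Odd number of observations (15): the median is the 8th ordered value.
median_15 = median(heights_cm.values())

# Fabio changes schools: even number of observations (14), so the median
# is halfway between the 7th and the 8th ordered values.
without_fabio = [h for name, h in heights_cm.items() if name != "Fabio"]
median_14 = median(without_fabio)

print(median_15)  # 134
print(median_14)  # 134.5
```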

If you want to know more, we talk about the median in the ‘Population’ topic.

Mode

The mode is the observation with the highest absolute frequency.

It is the most frequently observed value.

The mode is a position parameter.

Here is the frequency distribution of the ages of the players in a football team.

Age	Number of players
19	1
20	1
21	1
23	1
24	2
25	3
26	1
28	2
30	1
32	1
35	1
Total	15

What is the most common age in this team? What is the most frequent observation? To answer these questions, you need to find the highest frequency.

3 is the highest frequency and corresponds to the age of 25.

25 is the age most frequently observed in the team, the most frequent observation.

25 is the mode of the series.
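As a sketch, the search for the highest frequency can be written in Python as follows. The dictionary name `players_by_age` is our own choice.

```python
# Age -> number of players (frequency distribution from the table above).
players_by_age = {
    19: 1, 20: 1, 21: 1, 23: 1, 24: 2, 25: 3,
    26: 1, 28: 2, 30: 1, 32: 1, 35: 1,
}

# The mode is the age whose absolute frequency is the highest.
mode_age = max(players_by_age, key=players_by_age.get)
highest_frequency = players_by_age[mode_age]

print(mode_age, highest_frequency)  # 25 3
```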

If you want to know more, we talk about the mode in the ‘Population’ topic, but also in the ‘Traffic accidents’ topic.

Percentage

A percentage represents a proportion of a whole and is expressed in relation to 100. A percentage is a fraction out of 100. We use the symbol %.

20% means 20 shares out of 100 or 20/100.

A percentage on its own, without saying what it is a share of, is meaningless.

20% of what?

Example 1

Imagine a square made up of 100 small squares.

Of these 100 small squares, 20 are coloured yellow, 10 are coloured blue, 25 are coloured orange, 1 is coloured red and 44 are not coloured.

The proportion of yellow squares to the whole is therefore 20/100 or 20%. In the same way, the proportion of blue squares is 10/100 or 10%, that of orange squares is 25/100 or 25%, that of red squares is 1/100 or 1%, and that of uncoloured squares is 44/100 or 44%.

Example 2

In a class there are 4 girls and 16 boys. What is the percentage of girls in the class?

The proportion of girls in the class (4 girls out of a total of 20) can be written as 4/20. To calculate the percentage, this fraction must be expressed in relation to 100.

To find the answer, multiply both the numerator and the denominator by 5, so that the denominator becomes 100: 4/20 = (4 x 5)/(20 x 5) = 20/100.

There are therefore 20% girls in the class.
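The calculation of Example 2 can be sketched in Python like this. The variable names are our own.

```python
girls, boys = 4, 16
class_size = girls + boys  # the whole to which the percentage refers

# Express the share of girls in relation to 100.
percent_girls = 100 * girls / class_size

print(percent_girls)  # 20.0
```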

The nationalities topic is full of percentages.

Proportionality

Proportionality is a concept that applies to variables whose values are linked by one and the same multiplier.

If by multiplying the values taken by one variable by the same non-zero number, we obtain the values taken by the other variable, then we say that these variables are proportional.

The number by which the value of the first variable must be multiplied to find the value of the second variable is called the proportionality coefficient.

Proportional variables can be represented in a proportionality table.

number of litres of petrol consumed	2.5	4	6	10
number of kilometres travelled	50	80	120	200

Are the number of litres of petrol consumed and the number of kilometres travelled two proportional variables? In other words, by what number would you have to multiply the first line of the table to find the second line?

number of litres of petrol consumed	2.5	4	6	10
number of kilometres travelled (= litres * 20)	50	80	120	200

20 is the proportionality coefficient. This number represents the number of kilometres travelled per litre of petrol. The two variables are therefore proportional.
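One way to check proportionality is to divide each value of the second line by the matching value of the first line and see whether the result is always the same. Here is a sketch in Python; the variable names are our own.

```python
import math

litres = [2.5, 4, 6, 10]
kilometres = [50, 80, 120, 200]

# Kilometres travelled per litre, column by column.
ratios = [km / l for km, l in zip(kilometres, litres)]

# The variables are proportional if every column gives the same ratio.
is_proportional = all(math.isclose(r, ratios[0]) for r in ratios)

print(ratios)           # [20.0, 20.0, 20.0, 20.0]
print(is_proportional)  # True
```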

You will note that:

2.5 * 1.6 = 4	In a proportionality table, you can multiply the values in one column by a number to find the values in another column.
50 * 1.6 = 80
2.5 * 4 = 10
50 * 4 = 200

4 + 6 = 10	In a proportionality table, you can add up the values in two columns to find the values in another column.
80 + 120 = 200

Quantitative variables or qualitative variables

The characteristics of interest in a statistical study are called variables. Variable because the value of these characteristics may vary from one individual to another in the study.

Some variables can be expressed:

by a number: the size, the weight, the salary, etc. These are quantitative variables
by qualities: eye colour, gender, etc. These are qualitative variables

The variables you will encounter as you travel through the Statbel Junior topics are quantitative variables.

Relative frequency

The relative frequency of an observation is obtained by dividing the absolute frequency of this observation by the total absolute frequency of the statistical series.

The relative frequency is always between 0 and 1. It can be expressed as a percentage.

The sum of all relative frequencies is always 1.

Let's take a look at the marks out of 10 for a test taken by the 24 pupils in a class, presented in the form of a frequency distribution.

Marks out of 10	0	1	2	3	4	5	6	7	8	9	10	Total
Number of pupils obtaining the mark	1	2	1	0	1	2	5	5	4	2	1	24

The absolute frequency of the mark of 7/10 is 5. The total absolute frequency is 24.
If we divide 5 by 24, we get approximately 0.208.
0.208 is the relative frequency of the mark of 7/10.

This means that 20.8% of the pupils in the class received a mark of 7/10 in the test.

You can do the exercise for all marks from 0 to 10.
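If you would like to check your answers, here is a sketch in Python that computes the relative frequency of every mark. The variable names are our own.

```python
import math

# Number of pupils per mark, for marks 0 to 10 (from the table above).
pupils = [1, 2, 1, 0, 1, 2, 5, 5, 4, 2, 1]
total = sum(pupils)  # total absolute frequency

# Relative frequency = absolute frequency / total absolute frequency.
relative = [count / total for count in pupils]

for mark, freq in enumerate(relative):
    print(f"{mark}/10: {freq:.3f} ({freq:.1%})")

# The relative frequencies always add up to 1 (up to rounding).
print(math.isclose(sum(relative), 1.0))  # True
```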

Representative sample

To ensure that the conclusions of the study are accurate, it is imperative that the sample is representative, i.e. that it has the same characteristics as the universe from which it is drawn.

Imagine a class of 24 children, with 6 girls and 18 boys.

Imagine you're interested in how satisfied the class is with a school outing.

Imagine that you can't interview the whole class and you decide to interview a sample of 6 children.

You choose to interview 5 girls and 1 boy about their level of satisfaction.

The 5 girls are unsatisfied. The boy is satisfied.

From this sample, you could come to the conclusion that the majority of the class is not satisfied with the school outing.

Is this the reality? No, because the sample is not representative. Girls are overrepresented in your sample, while boys are underrepresented. And in this class, the girls do not share the boys' opinion.

You would have obtained different results if you had interviewed all the pupils in the class.
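A quick way to spot the problem is to compare the share of girls in the class with the share of girls in the sample. Here is a sketch in Python using the numbers of the example; the variable names are our own.

```python
class_girls, class_size = 6, 24
sample_girls, sample_size = 5, 6

# Share of girls (in %) in the class and in the sample.
share_in_class = 100 * class_girls / class_size
share_in_sample = 100 * sample_girls / sample_size

print(share_in_class)             # 25.0
print(round(share_in_sample, 1))  # 83.3
```

In a representative sample of 6 children, the share of girls would be close to the share in the class.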

Rule of three

If variables are proportional and you know 3 values, the rule of three can help you find the 4th value.

9 pineapples cost €36. How much do 11 pineapples cost?

To solve this problem, we can first calculate the price of 1 pineapple, and then multiply the price obtained by 11.

€36 : 9 = €4 for 1 pineapple
€4 x 11 = €44 for 11 pineapples

11 pineapples will therefore cost €44.
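The two steps can be sketched as a small Python function. The name `rule_of_three` and its parameters are our own choice, and the function assumes that the two variables really are proportional.

```python
def rule_of_three(known_quantity, known_value, target_quantity):
    """Find the 4th value, assuming the two variables are proportional."""
    unit_value = known_value / known_quantity  # value for 1 unit
    return unit_value * target_quantity


# 9 pineapples cost 36 euros: how much do 11 pineapples cost?
print(rule_of_three(9, 36, 11))  # 44.0
```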

Sample

The sample represents a subset of the universe (the population) on which the statistical study is based.

Why study a sample rather than the entire universe?

Quite simply for reasons of resources.

Interviewing all the units in the universe is very expensive and time-consuming.

But the sample must be representative of the population!

Statistical series and observations

An observation is a value observed for a given variable. It can be qualitative or quantitative.

For example: as part of a study on the age of players in a football team, each player was asked his age.

The variable or the characteristic that we study in this case is the age of the players.

Here are the results obtained.

24, 25, 30, 24, 25, 23, 28, 32, 20, 19, 35, 26, 28, 21, 25

All the observations together form a series.

This series can be ordered by ranking the values of the observations from the smallest to the largest.

19, 20, 21, 23, 24, 24, 25, 25, 25, 26, 28, 28, 30, 32, 35

This series can also be transcribed into a table known as a frequency distribution.

Age	Number of players
19	1
20	1
21	1
23	1
24	2
25	3
26	1
28	2
30	1
32	1
35	1
Total	15
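The ordering and the frequency distribution can be reproduced with a short Python sketch that uses `Counter` from the standard `collections` module. The variable names are our own.

```python
from collections import Counter

# Ages of the 15 players, in the order in which they were collected.
ages = [24, 25, 30, 24, 25, 23, 28, 32, 20, 19, 35, 26, 28, 21, 25]

# Order the series from the smallest to the largest value.
ordered = sorted(ages)
print(ordered)

# Count how many times each age is observed.
distribution = Counter(ordered)
for age, number_of_players in sorted(distribution.items()):
    print(age, number_of_players)

print("Total", sum(distribution.values()))
```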

Statistics

Statistics is the science of data: all the methods used to collect and analyse data.

The objectives of statistics are the following:

to collect data on a characteristic, subject, theme or phenomenon from a population or part of a population. This is what we call collecting data.
to extract information from data. This is the role of descriptive statistics, which use parameters, charts and tables to summarise the information contained in the data.
to generalise conclusions from one part of the population to the whole population: this is statistical inference

All the statistics presented on this website are produced by Statbel, the Belgian statistical office. Statbel is a General Directorate of the Federal Public Service Economy.

Universe and population

The universe or population is the set of units (also called individuals) on which a statistical study is based.

Belgian population: all the inhabitants in Belgium (unit = an inhabitant in Belgium)
Belgian vehicle fleet: all motor vehicles registered in Belgium (unit = a motor vehicle)
All the trees in a forest (unit = a tree)
Elephants of Africa (unit = an elephant)
The pupils in a school (unit = a pupil)
...
