- Written By Jyoti Saxena
- Last Modified 25-01-2023

**Introduction to Statistics:** The word statistics seems to be derived from the Latin word *status*, which means a political state. Originally, statistics was simply the collection of numerical data on some aspects of people’s lives. However, over time, its scope broadened.

Today, statistics means the collection of facts or information, in the form of numerical data and with a definite purpose, concerning almost every aspect of people’s lives. It also covers the organisation, summarisation, and presentation of that data in tables and graphs (charts), as well as analysing the data and drawing inferences from it. In this article, we will cover the basic concepts used in statistics. Scroll down to learn more!

## Collection of Data

Statistics is a branch of mathematics that involves collecting, organising, interpreting, presenting, and analysing data. The \(5\) stages of statistics are problem, plan, data, analysis, and conclusion. By studying the data obtained, people can draw conclusions, make decisions and plan wisely.

When the group we want to collect data from is large, we often randomly select a smaller group to represent the entire group. This method of data collection is called random sampling. Random sampling makes data collection faster and less costly, and the data is easier to analyse because the sample is smaller than the whole population.
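As a minimal sketch of random sampling in Python, the block below draws a sample from a made-up population. The population of 1,000 ID numbers, the sample size of 50, and the seed are all assumptions chosen only for illustration.

```python
import random

# Hypothetical population: ID numbers for 1,000 people (illustrative only).
population = list(range(1, 1001))

# A fixed seed makes the sketch reproducible; drop it for a fresh sample each run.
rng = random.Random(42)

# Draw 50 distinct members; every member has the same chance of being chosen.
sample = rng.sample(population, k=50)

print(len(sample))       # number of members drawn
print(len(set(sample)))  # same number, because no member is drawn twice
```

`random.sample` draws without replacement, which matches the idea of selecting a smaller group out of the larger one.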

### Introduction to Probability and Statistics

In biology and engineering, we can set up controlled experiments using random samples of the particular objects of interest.

1. The yield of cross-fertilisation of different types of rice can be measured using samples from a patch of rice fields.

2. The lifetime of a particular battery can be tested in a quality-control laboratory by using a small random sample of batteries.

### Observing Outcomes of Events

Some data, such as air pollution levels and bird migration paths, are observed through real-life situations.

- Air pollution levels are observed by taking random air samples.
- A bird migration path is traced by tagging a sample population of birds with identification numbers.

### Conducting Surveys

We usually conduct surveys to gather personal data and public opinion by requesting sample populations of target groups to complete a questionnaire. The survey may be performed through face-to-face interviews, telephone interviews, postal surveys, or Internet surveys.

- The United States Census is a \(100\% \) population survey that is conducted once every \(10\) years.
- Customer satisfaction surveys taken by random samples of customers are widespread in the service sector.

### Reading Statistical Publications

Some data, such as economic conditions and a country’s population, are obtained by conducting large-scale surveys that individuals or small companies cannot carry out. We can obtain these types of data from official statistics published by government organisations. For example, the United Nations Statistical Yearbook contains a wide range of international economic, social, and environmental data.

### Important Terms Related to Statistics

Some of the important terms related to statistics are as follows:

1. **Primary Data:** Information collected by the investigator themselves with a definite purpose in mind is called primary data.

2. **Secondary Data:** Information gathered from a source where it is already stored is called secondary data.

3. **Raw Data:** The numerical data recorded in its original form, as collected by the investigator or received from some source, is called raw data.

4. **Variable:** A quantity that is being measured in an experiment or survey is called a variable. Height, age and weight of people, income and expenditure of people, number of members in a family, number of workers in a factory, marks obtained by a student in a test, the number of runs scored in a cricket test, etc., are examples of variables.

Variables are of two types:

- **Continuous Variable:** A variable which can take any value between two given values is called a continuous variable. For example, the height, age and weight of people are continuous variables.
- **Discontinuous Variable:** A variable which cannot take all possible values between two given values is called a discontinuous or discrete variable. For example, the number of members in a family and the number of workers in a factory are discrete variables, since they cannot take any value between \(1\) and \(2,\) \(2\) and \(3,\) etc.

5. **Range:** The difference between the maximum and minimum values of a variable is called its range.

6. **Variate:** A particular value of a variable is called a variate (observation).

7. **Frequency:** The number of times a variate (observation) occurs in a given data set is called the frequency of that variate.

8. **Frequency Distribution:** A tabular arrangement of given numerical data showing the frequency of different variates is called a frequency distribution, and the table itself is called a frequency distribution table.

### Types of Statistics

There are mainly two types of statistics:

- **Descriptive Statistics:** Descriptive statistics describes the data from a population through numerical calculations, graphs, or tables. It is used to summarise data.
- **Inferential Statistics:** Inferential statistics makes inferences and predictions about a population based on a sample drawn from it. It generalises from the sample to the larger dataset and uses probability to reach conclusions. It helps explain the meaning of descriptive statistics, and it is used to analyse and interpret results and to draw conclusions.

Inferential statistics is closely associated with hypothesis testing, which checks whether the sample data provide enough evidence to reject the null hypothesis. Hypothesis testing is an inferential procedure that uses sample data to assess the credibility of a hypothesis about a population. Inferential statistics is also generally used to determine how strong a relationship observed in the sample is.

We can broadly classify statistics as shown below:

### Tabulation of Raw Data

Suppose there are \(32\) students in Class \({\rm{IX}}\) of a school, and the marks they obtained out of \(50\) in an examination are as follows:

\(39,44,25,11,21,25,44,25,7,40,43,44,49,14,11,14,25,28,28,39,44,37,21,40,\)

\(43,3,37,25,25,21,37,38\)

The data in this form is the raw (or ungrouped or unclassified) data. Here the number of marks obtained is the variable, and each entry in the above list is an observation or variate.

To make it easily understandable, we present the above data in a table called the frequency distribution table.

The frequency distribution table for the above raw data is given below:
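The frequency of each mark can be counted directly from the raw data. The following is a minimal Python sketch, not the article's original table; the variable names `marks` and `frequency` are illustrative, and the list is copied from the \(32\) marks given above.

```python
from collections import Counter

# The 32 raw marks listed above.
marks = [39, 44, 25, 11, 21, 25, 44, 25, 7, 40, 43, 44, 49, 14, 11, 14,
         25, 28, 28, 39, 44, 37, 21, 40, 43, 3, 37, 25, 25, 21, 37, 38]

# Count how many times each variate (mark) occurs.
frequency = Counter(marks)

# Print the frequency distribution table in ascending order of marks.
print(f"{'Marks':>5} | {'Frequency':>9}")
for mark in sorted(frequency):
    print(f"{mark:>5} | {frequency[mark]:>9}")

print("Total:", sum(frequency.values()))
print("Range:", max(marks) - min(marks))
```

The frequencies add up to \(32,\) the number of students, which is a quick check that no observation was missed. The most frequent mark is \(25,\) which occurs \(6\) times, and the range is \(49-3=46.\)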

### Mean, Median and Mode of Ungrouped Data

The mean or arithmetic average of a number of observations is the sum of the values of all the observations divided by the total number of observations.

Mean \( = \frac{{{x_1} + {x_2} + {x_3} + \ldots + {x_n}}}{n} = \frac{{\sum {{x_i}} }}{n},\) where \(\sum {{x_i}} = {x_1} + {x_2} + {x_3} + {x_4} + \ldots + {x_n}.\)

Thus, mean \( = \frac{{{\rm{ Sum}}\,{\rm{of}}\,{\rm{all}}\,{\rm{observations }}}}{{{\rm{ Total}}\,{\rm{number}}\,{\rm{of}}\,{\rm{observations }}}}\)

Median is the central value of statistical data when it is arranged in ascending or descending order. Thus, if there are \(n\) observations \({x_1}, {x_2}, {x_3}, {x_4}, \ldots, {x_n}\) arranged in ascending or descending order, then

Median \( = {\left( {\frac{{n + 1}}{2}} \right)^{{\rm{th}}}}\) observation, if \(n\) is odd.

Median \( = \frac{{{{\left( {\frac{n}{2}} \right)}^{{\rm{th}}}}{\rm{ observation }} + {{\left( {\frac{n}{2} + 1} \right)}^{{\rm{th}}}}{\rm{ observation }}}}{2},\) if \(n\) is even.
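The two cases of the median rule can be sketched in Python as follows. The function name `median_of` and the sample lists are illustrative assumptions; note that Python lists are indexed from \(0,\) so the \(k^{\rm{th}}\) observation sits at index \(k-1.\)

```python
def median_of(observations):
    """Return the median using the odd/even rule for ungrouped data."""
    ordered = sorted(observations)  # arrange in ascending order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of no observations is undefined")
    middle = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th observation is at index n // 2.
        return ordered[middle]
    # n even: average the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([7, 1, 3]))     # 3
print(median_of([4, 1, 3, 2]))  # 2.5
```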

The mode of a set of data is the value that occurs most often.

Suppose the sizes of eight coats sold by a boutique manager are \(7, 8, 8, 8, 9, 10, 12\) and \(16.\)

In this case, the mean \(=9.75\) and the median \(=8.5.\)

These two measures of central tendency are not very meaningful to the manufacturer of the coats or the boutique manager because neither value is an actual coat size. Instead, they will be more interested in the most popular size so that they can cater to customers’ needs. In such situations, the mode is used as the measure of central tendency.

In this case, the modal size, or the mode of the distribution, is \(8\) as it is the most popular.
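These three values can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the list name `coat_sizes` is an illustrative choice.

```python
import statistics

# Sizes of the eight coats sold.
coat_sizes = [7, 8, 8, 8, 9, 10, 12, 16]

print(statistics.mean(coat_sizes))    # 9.75
print(statistics.median(coat_sizes))  # 8.5
print(statistics.mode(coat_sizes))    # 8
```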

### Grouped or Classified Data

Consider two sets of class intervals: \(1-5, 6-10, 11-15, \ldots\) and \(1-10, 10-20, 20-30, 30-40, \ldots\)

In the class interval \(1-5,\) \(1\) is the lower limit, and \(5\) is the upper limit. If \(x\) is a member of this class, then \(1 \le x \le 5.\) Similarly, \(6\) is the lower limit, and \(10\) is the upper limit of class \(6-10.\) In this example, the classes do not overlap, but there are gaps between them, so they are discontinuous. Such a frequency distribution is called a discrete (or inclusive) distribution.

In the class interval \(1-10,\) \(1\) is the lower limit, and \(10\) is the upper limit. If \(x\) is a member of this class, then \(1 \le x < 10,\) so the value \(10\) itself belongs to the next class. Similarly, \(10\) is the lower limit, and \(20\) is the upper limit of class \(10-20.\) In this example, the classes are non-overlapping but continuous. Such a frequency distribution is called a continuous (or exclusive) distribution.
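The difference between the two kinds of class can be sketched in Python. This assumes the usual convention that an exclusive class contains its lower limit but not its upper limit; the function names are illustrative.

```python
def in_inclusive_class(x, lower, upper):
    """Inclusive (discrete) class: both limits belong to the class."""
    return lower <= x <= upper


def in_exclusive_class(x, lower, upper):
    """Exclusive (continuous) class: the upper limit belongs to the next class."""
    return lower <= x < upper


# Inclusive classes 1-5 and 6-10 leave a gap: 5.5 belongs to neither class.
print(in_inclusive_class(5.5, 1, 5), in_inclusive_class(5.5, 6, 10))  # False False

# Exclusive classes 1-10 and 10-20: the value 10 falls only in the second class.
print(in_exclusive_class(10, 1, 10), in_exclusive_class(10, 10, 20))  # False True
```

The first check shows why inclusive classes do not suit measurements that can take fractional values.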

### Converting Discrete Distribution to Continuous Distribution

If we measure height, weight and time, the values may include fractions of a metre, kilogram and hour respectively; therefore, we need a continuous distribution.

To convert discrete classes into continuous classes, we require some adjustments.

Adjustment factor \( = \frac{{{\rm{lower}}\,{\rm{limit}}\,{\rm{of}}\,{\rm{one}}\,{\rm{class}} - {\rm{upper}}\,{\rm{limit}}\,{\rm{of}}\,{\rm{previous}}\,{\rm{class}}}}{2}\)

Subtract the adjustment factor from all the lower limits and add the adjustment factor to all the upper limits.
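As a worked sketch, the classes \(1-5, 6-10, 11-15\) have an adjustment factor of \(\frac{6-5}{2}=0.5.\) The Python function below applies the rule; the name `to_continuous` is illustrative, and it assumes the classes are given in ascending order with the same gap between every pair of neighbouring classes.

```python
def to_continuous(classes):
    """Convert stated (inclusive) class limits to true class limits.

    `classes` is a list of (lower, upper) pairs in ascending order,
    assumed to have the same gap between consecutive classes.
    """
    # Half the gap between one class's lower limit and the previous upper limit.
    adjustment = (classes[1][0] - classes[0][1]) / 2
    return [(lower - adjustment, upper + adjustment) for lower, upper in classes]


stated_limits = [(1, 5), (6, 10), (11, 15)]
print(to_continuous(stated_limits))
# [(0.5, 5.5), (5.5, 10.5), (10.5, 15.5)]
```

After the adjustment, the upper limit of each class equals the lower limit of the next one, so no value falls in a gap.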

### True Class Limits

In a continuous distribution, the class limits are called true or actual class limits. The class limits obtained after adjustment in a discrete distribution are the true class or actual class limits. In discrete distribution, the original class limits are called the stated class limits.

### Solved Examples on Introduction to Statistics

**Q.1. Find the mean of the first \(6\) multiples of \(5.\)**

**Ans:** The first \(6\) multiples of \(5\) are \(5, 10, 15, 20, 25\) and \(30.\)

The sum of these multiples \(=5+10+15+20+25+30=105\)

Number of multiples \(=6\)

Average \( = \frac{{{\rm{ Sum}}\,{\rm{of}}\,6\,{\rm{multiples }}}}{{{\rm{ Number}}\,{\rm{of}}\,{\rm{multiples }}}}\)

Average \( = \frac{{5 + 10 + 15 + 20 + 25 + 30}}{6}\)

Average \( = \frac{{105}}{6} = 17.5\)

Hence, the arithmetic mean of the first \(6\) multiples of \(5\) is equal to \(17.5.\)

**Q.2. Find the mean of the first seven prime numbers.**

**Ans:** The first seven prime numbers are \(2, 3, 5, 7, 11, 13, 17.\)

The sum of these prime numbers \(=2+3+5+7+11+13+17=58\)

Therefore their mean \( = \frac{{58}}{7} = 8\frac{2}{7}.\)

**Q.3. Find the median of the following data:** \(5, 3, 12, 0, 7, 11, 4, 3, 8\)

**Ans:** Arranging the given data in ascending order, we get

\(0, 3, 3, 4, 5, 7, 8, 11, 12\)

The total number of observations \(=n=9,\) which is odd.

Median \( = {\left( {\frac{{n + 1}}{2}} \right)^{{\rm{th}}}}\) observation

\( = {\left( {\frac{{9 + 1}}{2}} \right)^{{\rm{th}}}}\) observation

\( = {5^{{\rm{th}}}}\) observation, which is \(5.\)

Hence, the median is \(5.\)

**Q.4. The number of goals scored by a football team in a series of matches is: \(3, 1, 0, 7, 5, 3, 3, 4, 1, 2, 0, 2.\) Find the median of the data.**

**Ans:** Arranging the number of goals in ascending order, we get

\(0, 0, 1, 1, 2, 2, 3, 3, 3, 4, 5, 7\)

Here, \(n=12\)

Median \( = \frac{{{{\left( {\frac{n}{2}} \right)}^{{\rm{th}}}}{\rm{observation }} + {{\left( {\frac{n}{2} + 1} \right)}^{{\rm{th}}}}{\rm{observation }}}}{2}\)

\( = \frac{{{{(6)}^{{\rm{th}}}}{\rm{observation }} + {{(7)}^{{\rm{th}}}}{\rm{observation }}}}{2} = \frac{{2 + 3}}{2} = 2.5\)

Hence, the median of the data is \(2.5.\)

**Q.5. If the mean of \(y+2, y+4, y+6, y+8\) and \(y+10\) is \(13,\) find the value of \(y.\)**

**Ans:** Given: the mean of \(y+2, y+4, y+6, y+8\) and \(y+10\) is \(13.\)

Mean \( = \frac{{{\rm{ Sum}}\,{\rm{of}}\,{\rm{numbers }}}}{{{\rm{ Number}}\,{\rm{of}}\,{\rm{numbers }}}}\)

\(13 = \frac{{y + 2 + y + 4 + y + 6 + y + 8 + y + 10}}{5}\)

On further calculation, we get

\(13 \times 5 = 5y + 30\)

\( \Rightarrow 65 = 5y + 30\)

\( \Rightarrow 5y = 65 - 30\)

\( \Rightarrow 5y = 35\)

\( \Rightarrow y = 7\)

Hence, the value of \(y\) is equal to \(7.\)
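The answers to the five solved examples can be double-checked with a short Python sketch using the standard `statistics` and `fractions` modules. The variable names are illustrative; for Q.5 it uses the fact that the mean of \(y+2, \ldots, y+10\) equals \(y\) plus the mean of \(2, 4, 6, 8, 10.\)

```python
import statistics
from fractions import Fraction

# Q.1: mean of the first 6 multiples of 5.
multiples_of_5 = [5 * k for k in range(1, 7)]
q1_mean = statistics.mean(multiples_of_5)   # 17.5

# Q.2: mean of the first seven primes, kept as an exact fraction.
primes = [2, 3, 5, 7, 11, 13, 17]
q2_mean = Fraction(sum(primes), len(primes))  # 58/7, i.e. 8 and 2/7

# Q.3 and Q.4: medians for an odd and an even number of observations.
q3_median = statistics.median([5, 3, 12, 0, 7, 11, 4, 3, 8])           # 5
q4_median = statistics.median([3, 1, 0, 7, 5, 3, 3, 4, 1, 2, 0, 2])    # 2.5

# Q.5: y + mean(2, 4, 6, 8, 10) = 13, so y = 13 - 6.
q5_y = 13 - statistics.mean([2, 4, 6, 8, 10])  # 7

print(q1_mean, q2_mean, q3_median, q4_median, q5_y)
```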

### Summary

Statistics is a branch of mathematics that involves collecting, organising, interpreting, presenting, and analysing data. Statistics is divided into two types, namely descriptive and inferential statistics. Descriptive statistics describes the population through numerical calculations, tables, or graphs. Inferential statistics makes inferences and predictions about the population based on a sample drawn from it. Furthermore, the five stages in statistics are problem, plan, data, analysis, and conclusion.

### FAQs on Introduction to Statistics

**Q.1. What is the introduction to statistics?**

**Ans:** Statistics is a branch of mathematics that involves collecting, organising, interpreting, presenting, and analysing data. Based on the studies of data obtained, people can draw conclusions, make decisions and plan wisely.

**Q.2. Write a difference between primary data and secondary data.**

**Ans:** The information collected by the collector themselves with a definite purpose is called primary data, whereas the information gathered from a source where it is already stored is called secondary data.

**Q.3. What are the \(2\) types of statistics?**

**Ans:** Statistics can be divided into two categories. They are:

1. Descriptive Statistics: Descriptive statistics describes the data from a population through numerical calculations, graphs, or tables. It is used to summarise data.

2. Inferential Statistics: Inferential statistics makes inferences and predictions about a population based on a sample drawn from it. It generalises from the sample to the larger data set and uses probability to reach conclusions. It helps explain the meaning of descriptive statistics. Inferential statistics is closely associated with hypothesis testing, which checks whether the sample data provide enough evidence to reject the null hypothesis.

**Q.4. What are the five stages of statistics?**

**Ans:** The \(5\) stages of statistics are: Problem, Plan, Data, Analysis, Conclusion.

**Q.5. Define variables with an example.**

**Ans:** A quantity that is being measured in an experiment or survey is called a variable. Height, age and weight of people, income and expenditure of people, number of members in a family, number of workers in a factory, marks obtained by a student in a test, the number of runs scored in a cricket test, etc., are examples of variables.

