Data Science MCQs

Q1. A collection of information about a related topic is referred to as __________.

(a) Visualisation

(b) Analysis

(d) Data

Q2. The process of examining data to draw insights is called _______.

(a) Visualisation

(b) Analysis

(d) Data

Q3. To find the _________, you add up all the numbers and then divide by how many numbers you have.

(a) Median

(b) Mean

(d) Range

Q4. To find the ________, you put all numbers in order from least to greatest and find the number that is in the middle.

(a) Median

(b) Mode

(d) Range

Q5. Data on visitors' viewing habits at a bank's website has been collected. Which technique is used to identify pages commonly viewed during the same visit to the website?

(a) Clustering

(b) Classification

(c) Association Rules

(d) Regression

Q6. A market research team studies smartphone preferences across different age groups (18–25, 26–40, 41–60, and 60+). To ensure each age group is proportionally represented in the sample, which sampling method should they use?

(a) Random sampling

(b) Stratified sampling

(d) Multistage sampling

Q7. A relationship between two or more variables is referred to as a ________

(a) Trend

(b) Spike

(d) None of the above

Q8. Data that sits outside the trend is referred to as a ______

(a) Outlier

(b) Trend

(d) Both (a) & (b)

Q9. A health researcher conducts a study on the effectiveness of a new fitness app by recruiting participants exclusively from a local gym. After analysing the data, the researcher concludes that the app significantly improves users' fitness levels. However, critics argue that the results may not apply to the general population.

Which type of bias most likely affects the study's conclusions due to its participant recruitment method?

(a) Selection Bias – The sample is unrepresentative because it only includes gym-goers (who may already be more health-conscious).

(b) Confirmation Bias – The researcher interprets data to confirm pre-existing beliefs.

(d) Recall Bias – Participants inaccurately remember or report past behaviours.

Q10. Which of the following is NOT a machine learning algorithm?

(a) SVG

(b) Random Forest

(d) None

Q11. Which of the following is one of the key data science skills?

(a) Machine Learning

(b) Statistics

(d) All of the above

Q12. Customer profile data often contains discrete features like gender, occupation, or car brand (stored as strings). Since most data analysis models require numeric inputs, which encoding method is typically applied?

(a) Normalisation

(b) One-Hot Encoding

(d) Principal Component Analysis (PCA)

Answers:

1. (d) Data

2. (b) Analysis

3. (b) Mean

4. (a) Median
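
The definitions in Q3 and Q4 can be checked with a few lines of Python. This is a minimal sketch; the sample values are hypothetical and were chosen with an odd count so that a single middle number exists.

```python
from statistics import mean, median

# Hypothetical sample values, chosen only for illustration.
values = [7, 3, 9, 5, 11]

# Mean: add up all the numbers, then divide by how many there are.
manual_mean = sum(values) / len(values)

# Median: put the numbers in order and take the one in the middle.
ordered = sorted(values)
manual_median = ordered[len(ordered) // 2]  # valid because the count is odd

# The standard library gives the same results.
print(manual_mean, mean(values))
print(manual_median, median(values))
```

With an even count there is no single middle value; `statistics.median` then returns the average of the two middle numbers.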

5. (c) Association Rules

Association rule mining is a data mining technique used to discover relationships or patterns between items in large datasets. In this case, it helps identify which web pages are frequently viewed together during the same visit (e.g., "Users who viewed Page A also viewed Page B").
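
The idea behind answer 5 can be sketched by counting which pages occur together in the same visit. The visit data and page names below are hypothetical, and the sketch only computes support and confidence for page pairs; it is not a full association rule mining algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical visits: each set holds the pages viewed in one session.
visits = [
    {"home", "loans", "rates"},
    {"home", "loans"},
    {"home", "cards"},
    {"loans", "rates"},
    {"home", "loans", "rates"},
]

page_counts = Counter()
pair_counts = Counter()
for pages in visits:
    page_counts.update(pages)
    # Sort so each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(pages), 2))

def support(a, b):
    """Fraction of all visits that contain both pages."""
    return pair_counts[tuple(sorted((a, b)))] / len(visits)

def confidence(a, b):
    """Of the visits containing page a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / page_counts[a]

print(support("loans", "rates"))     # how often the pair appears overall
print(confidence("loans", "rates"))  # "viewed loans" -> "also viewed rates"
```

A rule such as "users who viewed loans also viewed rates" is usually kept only when both its support and its confidence exceed thresholds chosen by the analyst.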

6. (b) Stratified sampling

Stratified sampling guarantees proportional representation of key subgroups (here, age groups), making it ideal for comparative analysis.
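
As a rough illustration of answer 6, the sketch below draws the same fraction from every age group, so the sample keeps the population's proportions. The population sizes, the `stratified_sample` helper, and the 10% fraction are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by key(record)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for group in strata.values():
        # Take at least one record so that no stratum is left out.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical population of (person_id, age_group) records.
population = (
    [(i, "18-25") for i in range(400)]
    + [(i, "26-40") for i in range(400, 700)]
    + [(i, "41-60") for i in range(700, 900)]
    + [(i, "60+") for i in range(900, 1000)]
)

sample = stratified_sample(population, key=lambda r: r[1], fraction=0.1)
counts = Counter(group for _, group in sample)
print(counts)
```

Simple random sampling would match these proportions only on average; sampling within each stratum makes the group sizes fixed by design.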

7. (a) Trend

A trend represents a consistent, long-term relationship or pattern between two or more variables (e.g., as education level increases, income tends to rise).

8. (a) Outlier

An outlier is a data point that deviates significantly from the overall trend or pattern in a dataset. A trend refers to the general direction or relationship between variables, not an anomaly. A spike is a sudden, sharp increase, but doesn’t necessarily imply deviation from the trend.
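
One common rule of thumb for flagging outliers in a single variable is the 1.5 × IQR rule, sketched below. The values are hypothetical, and this is only one of several possible definitions; it is shown here as an assumption, not as the method the question has in mind.

```python
from statistics import quantiles

# Hypothetical daily page-view counts with one unusual value.
values = [12, 14, 13, 45, 11, 15, 13, 16, 12, 14, 10]

# Quartiles split the sorted data into four equal parts.
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1

# Points further than 1.5 * IQR outside the quartiles are flagged.
low = q1 - 1.5 * iqr
high = q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print(outliers)
```

Here only 45 falls outside the fences; the remaining values stay within the usual range.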

9. (a) Selection Bias

The researcher recruited participants exclusively from a local gym. Gym-goers are generally more health-conscious and likely already have higher fitness levels or a stronger motivation to improve fitness compared to the general population. This makes the sample unrepresentative of the broader population, leading to conclusions that may not be generalizable.

10. (a) SVG

11. (d) All of the above

12. (b) One-Hot Encoding

One-Hot Encoding – Converts each category into a binary column (0/1). It is the standard method to convert string-based categories (e.g., "Male/Female") into numeric form for ML models.
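
A minimal sketch of one-hot encoding in plain Python follows. The `one_hot` helper and the occupation values are hypothetical; in practice a library routine such as pandas `get_dummies` or scikit-learn's `OneHotEncoder` would normally be used instead.

```python
def one_hot(values):
    """Return (categories, rows); each row has a single 1 marking its category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows

# Hypothetical string feature from a customer profile.
occupations = ["engineer", "teacher", "engineer", "nurse"]
categories, encoded = one_hot(occupations)

print(categories)
for row in encoded:
    print(row)
```

Each string becomes a row with one column per category, so no artificial ordering is imposed on the categories, unlike mapping them to 1, 2, 3.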
