Basic Statistics With Python
We often need statistical operations while working with data. Mean, median, mode, standard deviation, and variance are basic but vital statistics that come up all the time.
Built-in Python modules that support statistical methods are: math and statistics
The following popular third-party modules also support statistical operations: numpy, scipy, pandas, and statsmodels
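Using the math module:
The math module doesn't provide functions for mean, median, or mode, but these measures are easy to compute with Python's built-ins (sum(), len(), sorted()) and collections.Counter. Here is a code snippet to compute the measures of central tendency: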
# Data
data = [1, 2, 3, 4, 5, 5, 6]
# Mean
mean = sum(data) / len(data)
print("Mean:", mean)
# Median
sorted_data = sorted(data)
n = len(sorted_data)
if n % 2 == 1:
    median = sorted_data[n // 2]
else:
    median = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2
print("Median:", median)
# Mode
from collections import Counter
count = Counter(data)
mode = count.most_common(1)[0][0]
print("Mode:", mode)
Here is the output:
Mean: 3.7142857142857144
Median: 4
Mode: 5
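The snippet above relies only on built-ins. As a minimal sketch of where the math module itself can help, the following computes the variance and standard deviation by hand with math.fsum() and math.sqrt(). The variable names are illustrative, and the n - 1 divisor assumes the data is a sample rather than a full population.

```python
import math

data = [1, 2, 3, 4, 5, 5, 6]
n = len(data)

# math.fsum() gives a more accurate floating-point sum than sum()
mean = math.fsum(data) / n

# Sample variance: squared deviations from the mean, divided by n - 1
# (assumption: the data is a sample, not the whole population)
variance = math.fsum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print("Variance:", variance)
print("Standard Deviation:", std_dev)
```

If the data represents the whole population, divide by n instead of n - 1.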
Using the statistics module:
This is a built-in module specifically designed for basic statistical operations.
It provides functions such as mean(), median(), mode(), stdev(), and variance().
For example:
import statistics
data = [1, 2, 3, 4, 5]
print(statistics.mean(data)) # Output: 3
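For comparison with the hand-written version, here is a short sketch that computes all three measures of central tendency with the statistics module on the same sample data. The list named tied is a made-up example, and multimode() assumes Python 3.8 or newer.

```python
import statistics

data = [1, 2, 3, 4, 5, 5, 6]

print("Mean:", statistics.mean(data))
print("Median:", statistics.median(data))
print("Mode:", statistics.mode(data))

# multimode() (Python 3.8+) returns every most-common value as a list,
# which is useful when several values are tied for the highest count
tied = [1, 1, 2, 2, 3]
print("Modes:", statistics.multimode(tied))
```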
You can also use the statistics module to compute the standard deviation and variance (measures of dispersion).
Here is a simple Python example:
import statistics
data = [1, 2, 3, 4, 5]
print("Standard Deviation:", statistics.stdev(data))
print("Variance:", statistics.variance(data))
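Note that stdev() and variance() compute the sample statistics (dividing by n - 1). The statistics module also offers pstdev() and pvariance() for population statistics (dividing by n). A small sketch of the difference, with illustrative variable names:

```python
import statistics

data = [1, 2, 3, 4, 5]

# Sample statistics: divide the sum of squared deviations by n - 1
sample_var = statistics.variance(data)
sample_std = statistics.stdev(data)

# Population statistics: divide by n
population_var = statistics.pvariance(data)
population_std = statistics.pstdev(data)

print("Sample variance:", sample_var)
print("Population variance:", population_var)
print("Sample standard deviation:", sample_std)
print("Population standard deviation:", population_std)
```

Which pair to use depends on whether the data is the entire population or only a sample drawn from it.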
Famous Third Party Modules for Python Statistics:
Third-party modules like numpy, scipy, pandas, and statsmodels are very helpful if you want to do advanced statistical calculations. You need to install these packages (for example, with pip) before using them.
1. numpy:
A powerful library for numerical computations.
Functions: mean(), median(), std() (standard deviation), var() (variance).
Example:
import numpy as np
data = [1, 2, 3, 4, 5]
print(np.mean(data)) # Output: 3.0
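One detail worth knowing: np.std() and np.var() compute population statistics by default (ddof=0), unlike statistics.stdev(). The sketch below shows how the ddof parameter switches to sample statistics; the variable names are illustrative.

```python
import numpy as np

data = [1, 2, 3, 4, 5]

median = np.median(data)

# ddof=0 (the default) divides by n: population statistics
population_std = np.std(data)

# ddof=1 divides by n - 1: sample statistics, like statistics.stdev()
sample_std = np.std(data, ddof=1)
sample_var = np.var(data, ddof=1)

print("Median:", median)
print("Population std:", population_std)
print("Sample std:", sample_std)
print("Sample variance:", sample_var)
```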
2. scipy:
Built on top of numpy, it provides additional statistical functions.
Functions: scipy.stats.mode(), scipy.stats.describe().
Example:
from scipy import stats
data = [1, 2, 2, 3, 4]
print(stats.mode(data)) # Output (SciPy 1.11+): ModeResult(mode=2, count=2); exact formatting varies by version
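The other function listed, scipy.stats.describe(), returns several summary statistics in one call. A minimal sketch on the same data, reading the fields of the returned result by name:

```python
from scipy import stats

data = [1, 2, 2, 3, 4]

summary = stats.describe(data)

print("Count:", summary.nobs)
print("Min/Max:", summary.minmax)
print("Mean:", summary.mean)
# describe() reports the sample variance (ddof=1 by default)
print("Variance:", summary.variance)
print("Skewness:", summary.skewness)
print("Kurtosis:", summary.kurtosis)
```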
3. pandas:
A library for data manipulation and analysis, often used with tabular data.
Functions: mean(), median(), mode(), std(), var().
Example:
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5])
print(data.mean()) # Output: 3.0
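The remaining Series methods work the same way. In the sketch below (sample values chosen for illustration), note that pandas defaults to sample statistics (ddof=1) for std() and var(), and that mode() returns a Series because there can be more than one mode.

```python
import pandas as pd

data = pd.Series([1, 2, 2, 3, 4])

print("Median:", data.median())

# mode() returns a Series because there can be more than one mode
print("Mode:", data.mode().tolist())

# pandas defaults to ddof=1, i.e. sample statistics
print("Sample std:", data.std())
print("Sample variance:", data.var())

# describe() summarises count, mean, std, min, quartiles, and max
print(data.describe())
```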
4. statsmodels:
A library for advanced statistical modeling and analysis.
Functions: Descriptive statistics, regression analysis, hypothesis testing.
Example:
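# describe() lives in statsmodels.stats.descriptivestats (statsmodels 0.12+)
from statsmodels.stats.descriptivestats import describe
data = [1, 2, 3, 4, 5]
print(describe(data)) # Output: a table of descriptive statistics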
In conclusion, Python provides a robust and versatile statistical analysis ecosystem, catering to fundamental and advanced requirements. From the built-in math and statistics modules, offering essential functions for basic calculations, to the powerful third-party libraries like NumPy, pandas, SciPy, and statsmodels, handling complex data manipulation and sophisticated statistical modelling, Python empowers users across diverse domains. Whether you're a student exploring introductory statistics or a seasoned researcher tackling intricate datasets, Python's accessibility and comprehensive libraries facilitate efficient and effective data analysis, making it an indispensable tool in the modern statistical toolkit.