
Basic Statistics With Python | Data Science | Python #ipumusings #eduvictors

Basic Statistics With Python



We often need statistical operations while working with data. Mean, median, mode, standard deviation, and variance are basic but vital, and they are among the most commonly used.

Built-in Python modules that support statistical methods are: math and statistics

The following popular third-party modules also support statistical operations: numpy, scipy, pandas, and statsmodels


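Using the math module:

The math module has no functions for mean, median, or mode, but these measures are easy to compute with plain Python built-ins such as sum(), len(), and sorted(), plus collections.Counter; math contributes helpers such as math.sqrt() and math.fsum() when needed. Here is a code snippet to compute the measures of central tendency: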

# Data

data = [1, 2, 3, 4, 5, 5, 6]

# Mean

mean = sum(data) / len(data)

print("Mean:", mean)

# Median

sorted_data = sorted(data)

n = len(sorted_data)

if n % 2 == 1:

    median = sorted_data[n // 2]

else:

    median = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2

print("Median:", median)


# Mode

from collections import Counter

count = Counter(data)

mode = count.most_common(1)[0][0]

print("Mode:", mode)


Here is the output:

Mean: 3.7142857142857144

Median: 4

Mode: 5
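
The snippet above relies only on built-ins. As a minimal sketch of where the math module itself can help, the following uses math.fsum() for a more accurate floating-point sum and math.sqrt() to derive the population standard deviation by hand. The variable names are illustrative and are not part of the original example.

```python
import math

data = [1, 2, 3, 4, 5, 5, 6]

# math.fsum tracks partial sums to reduce floating-point rounding error
mean = math.fsum(data) / len(data)

# Population variance: the average squared deviation from the mean
pop_variance = math.fsum((x - mean) ** 2 for x in data) / len(data)

# Population standard deviation is the square root of the variance
pop_std = math.sqrt(pop_variance)

print("Mean:", mean)
print("Population variance:", pop_variance)
print("Population standard deviation:", pop_std)
```

Dividing by len(data) treats the list as the whole population; divide by len(data) - 1 instead to get the sample variance.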


Using the statistics module:

This is a built-in module specifically designed for basic statistical operations.

It provides functions such as mean(), median(), mode(), stdev(), and variance().


For example:

import statistics

data = [1, 2, 3, 4, 5]

print(statistics.mean(data))  # Output: 3
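
The other central-tendency functions follow the same pattern. As a brief sketch reusing the data list from the math section, median() and mode() reproduce the hand-written results, and multimode() (available from Python 3.8) returns every value tied for most frequent. The tied list is made up for illustration.

```python
import statistics

data = [1, 2, 3, 4, 5, 5, 6]

median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # single most frequent value

# A hypothetical list in which two values are equally frequent
tied = [1, 1, 2, 2, 3]
all_modes = statistics.multimode(tied)  # all most-frequent values, in first-seen order

print("Median:", median_value)  # Median: 4
print("Mode:", mode_value)      # Mode: 5
print("Modes:", all_modes)      # Modes: [1, 2]
```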


Using the statistics module for dispersion:

You can also compute the standard deviation and variance (measures of dispersion).

Here is a simple Python example:


import statistics

data = [1, 2, 3, 4, 5]

print("Standard Deviation:", statistics.stdev(data))

print("Variance:", statistics.variance(data))
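
Note that stdev() and variance() treat the data as a sample and divide by n - 1. When the list is the entire population, the statistics module also offers pstdev() and pvariance(), which divide by n. A minimal sketch comparing the two on the same data (variable names are illustrative):

```python
import statistics

data = [1, 2, 3, 4, 5]

# Sample statistics: the sum of squared deviations is divided by n - 1
sample_var = statistics.variance(data)
sample_std = statistics.stdev(data)

# Population statistics: the sum of squared deviations is divided by n
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

print("Sample variance:", sample_var)
print("Population variance:", pop_var)
print("Sample standard deviation:", sample_std)
print("Population standard deviation:", pop_std)
```

For this data the sum of squared deviations is 10, so the sample variance is 10 / 4 = 2.5 and the population variance is 10 / 5 = 2. The sample values are always the larger of the two.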



Popular Third-Party Modules for Python Statistics:

Third-party modules like numpy, scipy, pandas, and statsmodels are very helpful for more advanced statistical calculations. They are not part of the standard library, so you need to install them (for example, with pip) before using them.


1. numpy:

A powerful library for numerical computations.

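Functions: mean(), median(), std() (standard deviation), var() (variance). Note that std() and var() compute the population versions by default (ddof=0); pass ddof=1 to get the sample versions that statistics.stdev() and statistics.variance() return.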


Example: 

import numpy as np

data = [1, 2, 3, 4, 5]

print(np.mean(data))  # Output: 3.0


2. scipy:

Built on top of numpy, it provides additional statistical functions.

Functions: scipy.stats.mode(), scipy.stats.describe().


Example:

from scipy import stats

data = [1, 2, 2, 3, 4]

print(stats.mode(data))  # Output (SciPy 1.11+): ModeResult(mode=2, count=2); the exact repr varies with the SciPy and NumPy versions



3. pandas:

A library for data manipulation and analysis, often used with tabular data.

Functions: mean(), median(), mode(), std(), var(). Unlike numpy, pandas computes the sample versions of std() and var() by default (ddof=1).


Example:

import pandas as pd

data = pd.Series([1, 2, 3, 4, 5])

print(data.mean())  # Output: 3.0


4. statsmodels:

A library for advanced statistical modeling and analysis.

Functions: Descriptive statistics, regression analysis, hypothesis testing.


Example:

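# describe() lives in statsmodels.stats.descriptivestats (statsmodels 0.12 and later)

from statsmodels.stats.descriptivestats import describe

data = [1, 2, 3, 4, 5]

print(describe(data))  # Output: a table of descriptive statistics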


In conclusion, Python provides a robust and versatile statistical analysis ecosystem, catering to fundamental and advanced requirements. From the built-in math and statistics modules, offering essential functions for basic calculations, to the powerful third-party libraries like NumPy, pandas, SciPy, and statsmodels, handling complex data manipulation and sophisticated statistical modelling, Python empowers users across diverse domains. Whether you're a student exploring introductory statistics or a seasoned researcher tackling intricate datasets, Python's accessibility and comprehensive libraries facilitate efficient and effective data analysis, making it an indispensable tool in the modern statistical toolkit.