Basic Statistics With Python

We often need statistical operations when working with data. The mean, median, mode, standard deviation, and variance are basic but vital, and they come up constantly.

The built-in Python modules that support statistical methods are math and statistics.

The following popular third-party modules also support statistical operations: numpy, scipy, pandas, and statsmodels.

Using the math module:

The math module does not provide mean, median, or mode functions directly, but these measures are easy to compute with built-ins such as sum(), len(), and sorted(). Here is a code snippet that computes the measures of central tendency:

# Data

data = [1, 2, 3, 4, 5, 5, 6]

# Mean

mean = sum(data) / len(data)

print("Mean:", mean)

# Median

sorted_data = sorted(data)

n = len(sorted_data)

if n % 2 == 1:

    median = sorted_data[n // 2]

else:

    median = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2

print("Median:", median)

# Mode

from collections import Counter

count = Counter(data)

mode = count.most_common(1)[0][0]

print("Mode:", mode)

Here is the output:

Mean: 3.7142857142857144

Median: 4

Mode: 5
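The snippet above relies only on built-ins. As a rough sketch of where the math module itself can help, math.fsum() gives an accurately rounded floating-point sum and math.sqrt() turns a variance into a standard deviation. The variable names are illustrative, and the population formula (dividing by n) is an assumption; divide by n - 1 instead if the data is a sample.

```python
import math

data = [1, 2, 3, 4, 5, 5, 6]
n = len(data)

# math.fsum() sums floats with less rounding error than sum()
mean = math.fsum(data) / n

# Population variance: the average squared deviation from the mean
variance = math.fsum((x - mean) ** 2 for x in data) / n

# The standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print("Mean:", mean)
print("Population variance:", variance)
print("Population standard deviation:", std_dev)
```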

Using the statistics module:

This is a built-in module specifically designed for basic statistical operations.

It provides functions such as mean(), median(), mode(), stdev(), and variance().

For example:

import statistics

data = [1, 2, 3, 4, 5]

print(statistics.mean(data)) # Output: 3
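The same module covers the median and mode as well. The following is a minimal sketch that reuses the data list from the math section; the second list passed to multimode() is made up purely to show a tie, and multimode() assumes Python 3.8 or later.

```python
import statistics

data = [1, 2, 3, 4, 5, 5, 6]

median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most common value

# multimode() returns every value tied for most common
modes = statistics.multimode([1, 1, 2, 2, 3])

print("Median:", median)  # Median: 4
print("Mode:", mode)      # Mode: 5
print("Modes:", modes)    # Modes: [1, 2]
```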

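Measures of dispersion with the statistics module:

You can also compute the standard deviation and variance (measures of dispersion). Note that stdev() and variance() compute the sample standard deviation and sample variance.

Here is a simple Python example: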

import statistics

data = [1, 2, 3, 4, 5]

print("Standard Deviation:", statistics.stdev(data))

print("Variance:", statistics.variance(data))
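If the data represents a whole population rather than a sample, the statistics module also offers pstdev() and pvariance(). The sketch below puts the two versions side by side on the same data; the variable names are only for illustration.

```python
import statistics

data = [1, 2, 3, 4, 5]

# Sample statistics divide the squared deviations by n - 1
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Population statistics divide by n
population_var = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

print("Sample variance:", sample_var)  # Sample variance: 2.5
print("Sample standard deviation:", sample_sd)
print("Population variance:", population_var)
print("Population standard deviation:", population_sd)
```

The sample values are always a little larger, because dividing by n - 1 corrects for estimating the mean from the same data.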

Popular Third-Party Modules for Python Statistics:

Third-party modules like numpy, scipy, pandas, and statsmodels are very helpful if you want to do advanced statistical calculations. You need to install these packages (for example, with pip) before using them.

1. numpy:

A powerful library for numerical computations.

Functions: mean(), median(), std() (standard deviation), var() (variance).

Example:

import numpy as np

data = [1, 2, 3, 4, 5]

print(np.mean(data)) # Output: 3.0
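One detail worth a quick sketch: np.std() and np.var() divide by n by default (ddof=0), so they return population values unless ddof=1 is passed. The example below assumes numpy is installed and uses illustrative variable names.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

print(np.median(data))  # 3.0

# Default ddof=0 gives the population standard deviation
population_sd = np.std(data)

# ddof=1 gives the sample versions, matching statistics.stdev() and statistics.variance()
sample_sd = np.std(data, ddof=1)
sample_var = np.var(data, ddof=1)

print("Population standard deviation:", population_sd)
print("Sample standard deviation:", sample_sd)
print("Sample variance:", sample_var)
```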

2. scipy:

Built on top of numpy, it provides additional statistical functions.

Functions: scipy.stats.mode(), scipy.stats.describe().

Example:

from scipy import stats

data = [1, 2, 2, 3, 4]

print(stats.mode(data)) # Output in recent SciPy versions: ModeResult(mode=2, count=2); the exact formatting depends on the SciPy and NumPy versions installed
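The other function listed above, scipy.stats.describe(), returns several summary statistics in one call. Here is a small sketch on the same data, assuming scipy is installed; the fields are read by name from the result object.

```python
from scipy import stats

data = [1, 2, 2, 3, 4]

summary = stats.describe(data)

print("Count:", summary.nobs)
print("Min and max:", summary.minmax)
print("Mean:", summary.mean)
print("Variance:", summary.variance)  # sample variance (ddof=1 by default)
print("Skewness:", summary.skewness)
print("Kurtosis:", summary.kurtosis)
```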

3. pandas:

A library for data manipulation and analysis, often used with tabular data.

Functions: mean(), median(), mode(), std(), var().

Example:

import pandas as pd

data = pd.Series([1, 2, 3, 4, 5])

print(data.mean()) # Output: 3.0
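The remaining Series methods work the same way. The sketch below assumes pandas is installed and uses a made-up Series with one repeated value; note that mode() returns a Series because ties are possible, and that std() uses ddof=1 by default, unlike numpy.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 4])

median = s.median()
modes = s.mode().tolist()  # mode() returns a Series; convert it to a list
sample_sd = s.std()        # sample standard deviation (ddof=1 by default)

# describe() collects count, mean, std, min, quartiles, and max in one Series
summary = s.describe()

print("Median:", median)
print("Modes:", modes)
print("Sample standard deviation:", sample_sd)
print(summary)
```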

4. statsmodels:

A library for advanced statistical modeling and analysis.

Functions: Descriptive statistics, regression analysis, hypothesis testing.

Example:

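# describe() is imported from statsmodels.stats.descriptivestats (available in recent statsmodels releases)

from statsmodels.stats.descriptivestats import describe

data = [1, 2, 3, 4, 5]

print(describe(data)) # Output: a table of descriptive statistics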

In conclusion, Python offers a versatile ecosystem for statistical analysis. The built-in math and statistics modules cover basic calculations, while third-party libraries such as NumPy, SciPy, pandas, and statsmodels handle complex data manipulation and statistical modelling. Whether you are a student exploring introductory statistics or a researcher working with intricate datasets, these libraries make Python a practical and accessible tool for data analysis.
