[Python] Plotting the Normal Distribution / Gaussian Distribution

Definition of the normal / Gaussian distribution PDF

The normal distribution (Gaussian distribution) is written as $N(x|\mu , \sigma^2)$. Here $x$ is the random variable, $\mu$ is the mean (which for this distribution is also the median), and $\sigma^2$ is the variance.

The PDF (probability density function) of the normal distribution $N(x|\mu , \sigma^2)$ is defined as follows.

$$N(x|\mu , \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left[ - \frac{(x-\mu )^2}{2 \sigma^2} \right]$$
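
As a quick sanity check on the definition above, the following minimal sketch evaluates the formula on a grid and confirms numerically that the density integrates to roughly 1 and peaks at $x = \mu$ with height $1/\sqrt{2 \pi \sigma^2}$. The helper name `normal_pdf`, the grid limits, and the parameter values are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Evaluate N(x | mu, sigma^2) directly from the formula (hypothetical helper)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

mu, sigma = 1.0, 2.0                    # assumed example parameters
dx = 0.001                              # grid spacing
x = np.arange(mu - 10 * sigma, mu + 10 * sigma, dx)   # wide grid: +/- 10 sigma
pdf = normal_pdf(x, mu, sigma)

area = np.sum(pdf) * dx                 # Riemann-sum approximation of the integral
peak_x = x[np.argmax(pdf)]              # grid point where the density is largest
peak_height = pdf.max()

print(f"area        = {area:.6f}")
print(f"peak at x   = {peak_x:.3f}")
print(f"peak height = {peak_height:.6f}")
print(f"1/sqrt(2*pi*sigma^2) = {1 / np.sqrt(2 * np.pi * sigma ** 2):.6f}")
```

The area should come out very close to 1, and the last two printed values should agree, which is a cheap way to catch a misplaced square or a missing factor of 2 in a hand-written PDF.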

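Method 1. Using only NumPy and matplotlib

This approach sets the range of x with np.arange or np.linspace, evaluates the Gaussian at each of those points, and plots the result. It is probably the most basic approach, since the PDF is written out by hand.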

import numpy as np
import matplotlib.pyplot as plt

# set the range of x
x = np.arange(-5, 5, 0.01)
#print(x.shape) # (1000,)
#x = np.linspace(-5, 5, 1000)
#print(x.shape) # (1000,)

# gaussian distribution function
def gaussian(x, mean, sigma):  
    return (1 / np.sqrt(2*np.pi * sigma**2)) * np.exp(- (x-mean)**2 / (2*sigma**2))

legend = []
for i in range(1,5):
    # i is passed as sigma (standard deviation), so the variance in N(mean, variance) is i**2
    legend.append(f'N(0,{i**2})')
    plt.plot(x, gaussian(x, 0, i))
plt.xlabel('x')
plt.ylabel('density')
plt.legend(legend)
plt.savefig('normal.png', dpi=72, bbox_inches='tight')
plt.show()

Method 2. NumPy and matplotlib plus scipy.stats.norm

scipy.stats.norm documentation

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html

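Method 1 implemented the Gaussian PDF by hand; here we use a library function instead.

The values can be obtained with either scipy.stats.norm(mean, sigma).pdf(x) or scipy.stats.norm.pdf(x, mean, sigma). In both forms the second parameter is the standard deviation (SciPy's scale), not the variance.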

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# set the range of x
x = np.arange(-5, 5, 0.01)
#print(x.shape) # (1000,)
#x = np.linspace(-5, 5, 1000)
#print(x.shape) # (1000,)

legend = []
for i in range(1,5):
    # norm takes the standard deviation (scale), so the variance in the label is i**2
    legend.append(f'N(0,{i**2})')
    plt.plot(x, norm(0, i).pdf(x))
    # plt.plot(x, norm.pdf(x, 0, i))
plt.xlabel('x')
plt.ylabel('density')
plt.legend(legend)
plt.savefig('normal.png', dpi=72, bbox_inches='tight')
plt.show()
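
To make the relationship between the two call forms concrete, here is a minimal sketch that checks they return the same values and that both agree with the hand-written formula from Method 1. The grid and the parameter values are assumptions chosen for illustration; `loc` and `scale` are the keyword names SciPy uses for the mean and the standard deviation.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 5, 1001)
mean, sigma = 0.0, 2.0                       # assumed example parameters

frozen = norm(mean, sigma).pdf(x)            # "frozen" distribution object
direct = norm.pdf(x, mean, sigma)            # parameters passed on each call
named = norm.pdf(x, loc=mean, scale=sigma)   # same call with keyword names

# hand-written formula from Method 1, for comparison
manual = np.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

print(np.allclose(frozen, direct))
print(np.allclose(direct, named))
print(np.allclose(named, manual))
```

The frozen form is convenient when the same distribution is reused for several calls (pdf, cdf, and so on), while the direct form is shorter for a one-off evaluation.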

Extra) Filling the area under the Gaussian curve

plt.fill_between fills the area under the normal distribution curve.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# set the range of x
x = np.arange(-5, 5, 0.01)
#print(x.shape) # (1000,)
#x = np.linspace(-5, 5, 1000)
#print(x.shape) # (1000,)

legend = []
for i in range(4,1,-1):
    # i is the standard deviation, so the variance in the label is i**2
    legend.append(f'N(0,{i**2})')
    plt.fill_between(x, norm.pdf(x, 0, i), alpha=0.25*i)

plt.xlabel('x')
plt.ylabel('density')
plt.legend(legend)
plt.savefig('normal.png', dpi=72, bbox_inches='tight')
plt.show()
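
fill_between can also shade only part of the curve through its `where` argument, which is handy for visualizing a probability as an area. The sketch below shades the interval within one standard deviation of the mean and computes the matching probability with `norm.cdf`. The parameter values, the mask name `inside`, and the output file name are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mean, sigma = 0.0, 1.0                        # assumed example parameters
x = np.linspace(-5, 5, 1001)
y = norm.pdf(x, mean, sigma)

# boolean mask selecting the interval [mean - sigma, mean + sigma]
inside = (x >= mean - sigma) & (x <= mean + sigma)

fig, ax = plt.subplots()
ax.plot(x, y, label='N(0,1)')
ax.fill_between(x, y, where=inside, alpha=0.4, label='within 1 sigma')
ax.set_xlabel('x')
ax.set_ylabel('density')
ax.legend()
fig.savefig('normal_interval.png', dpi=72, bbox_inches='tight')

# the shaded area equals the probability of falling inside the interval
prob = norm.cdf(mean + sigma, mean, sigma) - norm.cdf(mean - sigma, mean, sigma)
print(f"P(|x - mean| <= sigma) = {prob:.4f}")
```

For a normal distribution this probability is about 0.68 regardless of the chosen mean and sigma.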

Varying the mean and sigma produces Gaussian distributions with different positions and widths.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# set the range of x
x = np.arange(-5, 5, 0.01)
#print(x.shape) # (1000,)
#x = np.linspace(-5, 5, 1000)
#print(x.shape) # (1000,)

legend = []
for j in range(0,4,2):
    for i in range(3,1,-1):
        # j is the mean and i is the standard deviation, so the variance in the label is i**2
        legend.append(f'N({j},{i**2})')
        plt.fill_between(x, norm.pdf(x, j, i), alpha=0.5)

plt.xlabel('x')
plt.ylabel('density')
plt.legend(legend)
plt.savefig('normal.png', dpi=72, bbox_inches='tight')
plt.show()

Extra) Plotting a multivariate Gaussian (multivariate normal distribution)

http://incredible.ai/statistics/2014/03/15/Multivariate-Gaussian-Distribution/

Multivariate Gaussian Distribution in Python

Multivariate example provided by the site above

import numpy as np
from scipy.stats import multivariate_normal
import matplotlib.pyplot as plt

x, y = np.mgrid[-1:1:0.01, -1:1:.01]
pos = np.dstack((x, y))

rv1 = multivariate_normal(mean=[0, 0], cov=[[0.1, 0], [0, 0.1]])
rv2 = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]])
rv3 = multivariate_normal(mean=[0.5, -1], cov=[[1, 0], [0, 1]])
rv4 = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]])

fig, subplots = plt.subplots(2, 2)
fig.set_figwidth(14)
fig.set_figheight(14)
subplots = subplots.reshape(-1)

subplots[0].contourf(x, y, rv1.pdf(pos), cmap='magma')
subplots[1].contourf(x, y, rv2.pdf(pos), cmap='magma')
subplots[2].contourf(x, y, rv3.pdf(pos), cmap='magma')
subplots[3].contourf(x, y, rv4.pdf(pos), cmap='magma')

subplots[0].set_title('mean=[0, 0] cov=[[0.1, 0], [0, 0.1]]')
subplots[1].set_title('mean=[0, 0] cov=[[1, 0], [0, 1]]')
subplots[2].set_title('mean=[0.5, -1] cov=[[1, 0], [0, 1]]')
subplots[3].set_title('mean=[0, 0] cov=[[1, 0], [0, 1]]')
plt.show()
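
All of the covariance matrices in the example above are diagonal, which means the two coordinates are independent and the joint density is simply the product of two one-dimensional normal densities. The following minimal sketch checks this numerically for the covariance used by rv1. The coarser grid and the variable names are assumptions for illustration; note that `multivariate_normal` takes variances in `cov`, while `norm` takes a standard deviation.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# 41j means 41 evenly spaced points per axis, endpoints included
x, y = np.mgrid[-1:1:41j, -1:1:41j]
pos = np.dstack((x, y))                       # shape (41, 41, 2)

var = 0.1                                     # same variance as rv1 above
rv = multivariate_normal(mean=[0, 0], cov=[[var, 0], [0, var]])
joint = rv.pdf(pos)                           # joint density on the grid

# norm expects a standard deviation, so pass sqrt(variance)
product = norm.pdf(x, 0, np.sqrt(var)) * norm.pdf(y, 0, np.sqrt(var))

print(joint.shape)
print(np.allclose(joint, product))
```

With non-zero off-diagonal entries in `cov` the factorization no longer holds, and the contours become tilted ellipses instead of axis-aligned ones.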