๋ฐ์ดํ„ฐ๋ถ„์„๊ฐ€ ๊ณผ์ •/Python

DAY49. Python Stats Scipy (Chi-square Test, t-Test, Covariance, Regression)

LEE_BOMB 2021. 11. 30. 17:55
statistics

statistics module
Descriptive statistics: central tendency, dispersion, skewness/kurtosis, etc.


import statistics as st # descriptive statistics
import pandas as pd # read csv files


Descriptive statistics: for ratio- or interval-scale variables

dataset = pd.read_csv(r'C:\ITWILL\4_Python-II\data\descriptive.csv')
dataset.info()

x = dataset['cost'] # purchase cost
x



1. Central tendency

print('ํ‰๊ท  =', st.mean(x))
print('์ค‘์œ„์ˆ˜ =', st.median(x))
print('์ตœ๋นˆ์ˆ˜ =', st.mode(x)) #์ตœ๋นˆ์ˆ˜ = 6.0

x.value_counts() #6.0    21


2. ์‚ฐํฌ๋„ : ๋ถ„์‚ฐ, ํ‘œ์ค€ํŽธ์ฐจ, ์‚ฌ๋ถ„์œ„์ˆ˜ 

var = st.variance(x)
var # ≈ 1.3045 (= std**2)

std = st.stdev(x)
std # 1.1421532501476073

st.quantiles(x)
# [4.425000000000001, 5.4, 6.2]

x.describe()

25%        4.475000
50%        5.400000
75%        6.200000
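A note not in the original post: the first quartile above differs between the two tools (4.425 from statistics.quantiles vs 4.475 from describe()) because statistics.quantiles defaults to method='exclusive', while pandas interpolates the way method='inclusive' does. A small sketch with toy data:

```python
import statistics as st

# statistics.quantiles defaults to method='exclusive';
# pandas' describe()/quantile() matches method='inclusive'.
data = [1, 2, 3, 4, 5, 6, 7, 8]
q_ex = st.quantiles(data)                      # default: exclusive
q_in = st.quantiles(data, method='inclusive')  # matches pandas
print(q_ex)  # [2.25, 4.5, 6.75]
print(q_in)  # [2.75, 4.5, 6.25]
```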

import scipy.stats as sts




3. ์™œ๋„/์ฒจ๋„
์™œ๋„ = 0 : ์ •๊ทœ๋ถ„ํฌ 
์™œ๋„ > 0 : ์™ผ์ชฝ ๊ธฐ์šธ์–ด์ง 
์™œ๋„ < 0 : ์˜ค๋ฅธ์ชฝ ๊ธฐ์šธ์–ด์ง 

sts.skew(x) # -0.1531779106237012


์ฒจ๋„ = 0 or 3 

sts.kurtosis(x, fisher=True) # Fisher definition: normal = 0
sts.kurtosis(x, fisher=False) # Pearson definition: normal = 3

์ •๊ทœ๋ถ„ํฌ = 0 or 3
์ฒจ๋„ > ์ •๊ทœ๋ถ„ํฌ : ์œ„๋กœ ๋พฐ์กฑํ•จ 
์ฒจ๋„ < ์ •๊ทœ๋ถ„ํฌ : ์™„๋งŒํ•จ  

Histogram + density curve

import seaborn as sns
sns.displot(data=x, kde=True)


chisquare_test

Goodness-of-fit test for one random variable - one-way
Independence test between two categorical variables - two-way


from scipy import stats # probability distributions + hypothesis tests
import numpy as np # numeric computation
import pandas as pd # read csv files



1. One-way chi-square (1 variable): goodness-of-fit test
H0: there is no difference between the observed and expected counts.
H1: there is a difference between the observed and expected counts.

real_data = [4, 6, 17, 16, 8, 9] # observed counts
exp_data = [10,10,10,10,10,10] # expected counts

chis = stats.chisquare(real_data, exp_data)
print(chis)

Power_divergenceResult(statistic=14.200000000000001, pvalue=0.014387678176921308)

print('statistic =', chis[0])
print('p-value =', chis[1])

p-value = 0.014387678176921308

real_arr = np.array(real_data)
exp_arr = np.array(exp_data)

statistic = sum((real_arr - exp_arr)**2 / exp_arr)
statistic # 14.200000000000001

[Interpretation] The p-value (0.0144) is below 0.05, so the observed counts differ significantly from the expected counts.
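The p-value can also be recovered by hand from the chi-square distribution, with df = (number of categories) - 1 = 5:

```python
import numpy as np
from scipy import stats

real_arr = np.array([4, 6, 17, 16, 8, 9])     # observed
exp_arr = np.array([10, 10, 10, 10, 10, 10])  # expected

statistic = ((real_arr - exp_arr) ** 2 / exp_arr).sum()  # 14.2
df = len(real_arr) - 1                                   # 5
pvalue = stats.chi2.sf(statistic, df)  # upper-tail probability
print(statistic, pvalue)  # 14.2 0.014387678176921308
```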



2. Two-way chi-square (2 variables): contingency table
H0: education level and smoking are not related. (rejected)
H1: education level and smoking are related. (accepted)


smoke = pd.read_csv(r'C:\ITWILL\4_Python-II\data\smoke.csv')
smoke.info()

0   education  355 non-null    int64
1   smoking    355 non-null    int64


smoke.education.value_counts()

1    211
3     92
2     52

smoke.smoking.value_counts()

2    141
1    116
3     98

Step 1: select the variables

education = smoke.education
smoking = smoke.smoking


Step 2: contingency table (observed counts)

tab = pd.crosstab(index=education, columns=smoking, margins=True)
tab

smoking      1    2   3  All
education                   
1           51   92  68  211
2           22   21   9   52
3           43   28  21   92
All        116  141  98  355

Step 3: chi-square test on the contingency table. Pass a table WITHOUT margins: including the All row/column would treat the totals as data and change the degrees of freedom.

tab2 = pd.crosstab(index=education, columns=smoking)
chi2, pvalue, df, evalue = stats.chi2_contingency(observed=tab2)

print('statistic : %.6f, p-value : %.6f, df : %d'%(chi2, pvalue, df))

statistic : 18.910916, p-value : 0.000818, df : 4

Step 4: expected counts

print(evalue)

[[68.94647887 83.8056338  58.24788732]
 [16.9915493  20.65352113 14.35492958]
 [30.06197183 36.54084507 25.3971831 ]]
[ํ•ด์„] ์œ ์˜๋ฏธํ•œ ์ˆ˜์ค€์—์„œ ๊ต์œก์ˆ˜์ค€๊ณผ ํก์—ฐ์œจ ๊ฐ„์˜ ๊ด€๋ จ์„ฑ์ด ์žˆ๋‹ค. 

Expected count for the cell observed as 51 = (row total * column total) / grand total

e11 = (211 * 116) / 355
e11 #  68.94647887323944


Contribution of that cell to the chi-square statistic = (observed - expected)**2 / expected

e11_ratio = (51.0-e11)**2 / e11 
e11_ratio # 4.671393074906378
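The whole expected-count matrix follows the same rule (row total * column total / grand total); an outer product of the margins gives it in one step. The 3x3 observed table is copied from above:

```python
import numpy as np

observed = np.array([[51, 92, 68],
                     [22, 21,  9],
                     [43, 28, 21]])

row_totals = observed.sum(axis=1)   # [211, 52, 92]
col_totals = observed.sum(axis=0)   # [116, 141, 98]
grand = observed.sum()              # 355

# expected[i, j] = row_totals[i] * col_totals[j] / grand
expected = np.outer(row_totals, col_totals) / grand
print(expected[0, 0])  # 68.94647887323944 (= e11 above)
```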


t-test

Hypothesis tests based on the t distribution


t-test : population is normally distributed and the population variance is unknown
z-test : population is normally distributed and the population variance is known
1. one-sample mean test : test of a population mean
2. two-sample (independent) mean test
3. paired two-sample mean test


from scipy import stats # t-test
import numpy as np # sampling
import pandas as pd # read csv files



1. One-sample mean test: male average height (population mean): 175.5cm

sample_data = np.random.uniform(172,179, size=29) 
print(sample_data)


Descriptive statistics

print('ํ‰๊ท  ํ‚ค =', sample_data.mean())


๋‹จ์ผ์ง‘๋‹จ ํ‰๊ท ์ฐจ์ด ๊ฒ€์ • 

one_group_test = stats.ttest_1samp(sample_data, 175.5) 
print('t statistic = %.3f, pvalue = %.5f' % one_group_test)

t statistic = 0.381, pvalue = 0.70619
[Interpretation] The sample mean does not differ significantly from the population mean. (The sample is random and unseeded, so these numbers vary per run.)
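What ttest_1samp computes under the hood is t = (sample mean - mu0) / (sample std / sqrt(n)). A sketch with a seeded sample, so the check is reproducible (the post's sample is not):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample_data = rng.uniform(172, 179, size=29)

mu0 = 175.5  # hypothesized population mean
n = len(sample_data)
# sample standard deviation uses ddof=1, matching scipy
t_manual = (sample_data.mean() - mu0) / (sample_data.std(ddof=1) / np.sqrt(n))

t_scipy, p = stats.ttest_1samp(sample_data, mu0)
print(t_manual, t_scipy, p)
```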



2. Two-sample mean test: male vs female mean score difference

female_score = np.random.uniform(50, 100, size=30) # female
male_score = np.random.uniform(45, 95, size=30) # male

two_sample = stats.ttest_ind(female_score, male_score)
print(two_sample)
print('๋‘ ์ง‘๋‹จ ํ‰๊ท  ์ฐจ์ด ๊ฒ€์ • = %.3f, pvalue = %.3f'%(two_sample))

๋‘ ์ง‘๋‹จ ํ‰๊ท  ์ฐจ์ด ๊ฒ€์ • = 0.321, pvalue = 0.750
[ํ•ด์„ค] ๋‚จ์—ฌ ํ‰๊ท  ์ ์ˆ˜ ์ฐจ์ด๊ฐ€ ์—†๋‹ค.

file ์ž๋ฃŒ ์ด์šฉ : ๊ต์œก๋ฐฉ๋ฒ•์— ๋”ฐ๋ฅธ ์‹ค๊ธฐ์ ์ˆ˜์˜ ํ‰๊ท ์ฐจ์ด ๊ฒ€์ •  

sample = pd.read_csv('c:/itwill/4_python-ii/data/two_sample.csv')
print(sample.info())

two_df = sample[['method', 'score']]
print(two_df)

two_df.method.value_counts()

1    120
2    120

๊ต์œก๋ฐฉ๋ฒ• ๊ธฐ์ค€ subset

method1 = two_df[two_df.method==1]
method2 = two_df[two_df.method==2]

method1.info()

0   method  120 non-null    int64  
1   score   88 non-null     float64

Extract the score column

score1 = method1.score
score2 = method2.score


Preprocessing: handle missing values

score1 = score1.fillna(0)
score2 = score2.fillna(0)


Two-sample mean difference test

two_sample = stats.ttest_ind(score1, score2)
print(two_sample)

Ttest_indResult(statistic=-0.7833843755616479, pvalue=0.4341802874737909)
[Interpretation] There is no significant difference in mean score between the two teaching methods.
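A caveat not in the post: filling the missing scores with 0 pulls both group means toward zero. The usual alternative is to drop the NaNs; passing equal_var=False additionally gives Welch's t-test, which does not assume equal variances. A sketch on seeded toy data, since two_sample.csv is not bundled here:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
score1 = pd.Series(rng.uniform(50, 100, size=30))
score2 = pd.Series(rng.uniform(45, 95, size=30))
score1[::5] = np.nan  # simulate missing values in one group

# Drop missing values instead of imputing 0; Welch's t-test
result = stats.ttest_ind(score1.dropna(), score2.dropna(), equal_var=False)
print(result.statistic, result.pvalue)
```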


3. ๋Œ€์‘ ๋‘ ์ง‘๋‹จ : ๋ณต์šฉ์ „ 65 -> ๋ณต์šฉํ›„ 60 ๋ชธ๋ฌด๊ฒŒ ๋ณ€ํ™˜  

before = np.random.randint(60, 65, size=30)  
after = np.random.randint(59, 64,  size=30) 

paired_sample = stats.ttest_rel(before, after)
print(paired_sample)
print('t statistic = %.5f, pvalue = %.5f' % paired_sample)
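The paired t-test is simply the one-sample t-test applied to the per-subject differences; the two calls below agree exactly (seeded toy data mirroring the before/after example above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
before = rng.integers(60, 65, size=30)
after = rng.integers(59, 64, size=30)

paired = stats.ttest_rel(before, after)
# Equivalent: test whether the mean difference is 0
one_samp = stats.ttest_1samp(before - after, 0.0)
print(paired.statistic, one_samp.statistic)
```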


Covariance vs correlation coefficient

In common: both measure the association between (ratio- or interval-scale) variables

1. Covariance: a statistic describing how two random variables vary together around their means
Random variables: X, Y
Population formula: Cov(X,Y) = sum( (X-xmu) * (Y-ymu) ) / n (the sample version divides by n-1)

Cov(X, Y) > 0 : Y tends to increase as X increases
Cov(X, Y) < 0 : Y tends to decrease as X increases
Cov(X, Y) = 0 : no linear relationship (zero covariance does not by itself imply independence)
Problem: scale-dependent - a variable with large values dominates the statistic

2. Correlation coefficient: the covariance normalized by the two standard deviations
Solves the scale problem of the covariance
Same sign as the covariance; absolute value never exceeds 1 (range -1 to 1)
Formula: Corr(X, Y) = Cov(X,Y) / ( std(X) * std(Y) )


import pandas as pd 
score_iq = pd.read_csv(r'c:/itwill/4_python-ii/data/score_iq.csv')
print(score_iq)



1. Pearson correlation matrix

corr = score_iq.corr(method='pearson')
print(corr)

              sid     score        iq   academy      game        tv
sid      1.000000 -0.014399 -0.007048 -0.004398  0.018806  0.024565
score   -0.014399  1.000000  0.882220  0.896265 -0.298193 -0.819752
iq      -0.007048  0.882220  1.000000  0.671783 -0.031516 -0.585033
academy -0.004398  0.896265  0.671783  1.000000 -0.351315 -0.948551
game     0.018806 -0.298193 -0.031516 -0.351315  1.000000  0.239217
tv       0.024565 -0.819752 -0.585033 -0.948551  0.239217  1.000000

2. Covariance matrix

cov = score_iq.cov()
print(cov)

                 sid      score         iq   academy      game        tv
sid      1887.500000  -4.100671  -2.718121 -0.231544  1.208054  1.432886
score      -4.100671  42.968412  51.337539  7.119911 -2.890201 -7.214586
iq         -2.718121  51.337539  78.807338  7.227293 -0.413691 -6.972975
academy    -0.231544   7.119911   7.227293  1.468680 -0.629530 -1.543400
game        1.208054  -2.890201  -0.413691 -0.629530  2.186309  0.474899
tv          1.432886  -7.214586  -6.972975 -1.543400  0.474899  1.802640



3. Applying the covariance / correlation formulas
1) Sample covariance: Cov(X, Y) = sum( (X-xmu) * (Y-ymu) ) / (n-1)

X = score_iq['score']
Y = score_iq['iq']


ํ‘œ๋ณธํ‰๊ท  

muX = X.mean()
muY = Y.mean()


ํ‘œ๋ณธ์˜ ๊ณต๋ถ„์‚ฐ 

Cov = sum((X - muX) * (Y - muY)) / (len(X)-1)
print('Cov =', Cov)


2) ์ƒ๊ด€๊ณ„์ˆ˜ : Corr(X, Y) = Cov(X,Y) / std(X) * std(Y)

stdX = X.std()
stdY = Y.std()

Corr = Cov / (stdX * stdY)
print('Corr =', Corr)
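The two formulas above can be checked against NumPy's built-ins. A sketch on seeded toy data (score_iq.csv is not bundled here): sample covariance with n-1, then normalization by the two standard deviations.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(100, 15, size=150)
Y = 0.6 * X + rng.normal(0, 10, size=150)

n = len(X)
Cov = ((X - X.mean()) * (Y - Y.mean())).sum() / (n - 1)
Corr = Cov / (X.std(ddof=1) * Y.std(ddof=1))

print(Cov, np.cov(X, Y)[0, 1])        # identical
print(Corr, np.corrcoef(X, Y)[0, 1])  # identical
```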


regression

Using the scipy package
 1. simple linear regression
 2. multiple linear regression


from scipy import stats # regression
import pandas as pd # read csv files
import matplotlib.pyplot as plt 

score_iq = pd.read_csv(r'C:\ITWILL\4_Python-2\data\score_iq.csv')
score_iq.info()



1. Simple linear regression: x -> y
1) select the variables

x = score_iq['iq'] # score_iq.iq
y = score_iq['score']


2) build the model

model = stats.linregress(x, y) # iq -> score
print(model)

LinregressResult
(slope=0.6514309527270075,   : slope of x (regression coefficient)
 intercept=-2.8564471221974657, : y intercept
 rvalue=0.8822203446134699,     : correlation coefficient r (r**2 = coefficient of determination)
 pvalue=2.8476895206683644e-50, : two-sided p-value for the slope (significance test)
 stderr=0.028577934409305443,   : standard error of the slope
 intercept_stderr=3.546211918048538)

a = model.slope # slope
b = model.intercept # intercept

score_iq.head()


Regression equation -> predicted y

X = 140 
Y = 90
y_pred = X*a + b 
print(y_pred) # 88.34388625958358

err = Y - y_pred
print('error = ', err) # error =  1.6561137404164157


Predictions for all observations

len(x) # 150
y_pred = x*a + b
print(y_pred)


Observed vs predicted

y.mean() # 77.77333333333333
y_pred.mean() # 77.77333333333334
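Since rvalue is the correlation coefficient r, squaring it gives the coefficient of determination R^2, which also equals 1 - SS_res / SS_tot. A sketch on seeded toy data shaped like the iq -> score fit:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(110, 10, size=150)
y = 0.65 * x - 2.85 + rng.normal(0, 5, size=150)

model = stats.linregress(x, y)
r_squared = model.rvalue ** 2  # coefficient of determination

# Same quantity from the residuals
y_hat = model.slope * x + model.intercept
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
print(r_squared, 1 - ss_res / ss_tot)  # identical
```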




2. ํšŒ๊ท€๋ชจ๋ธ ์‹œ๊ฐํ™” 
์‚ฐ์ ๋„ 

plt.plot(score_iq['iq'], score_iq['score'], 'b.')


Regression line

plt.plot(score_iq['iq'], y_pred, 'r-')
plt.title('line regression')
plt.legend(['x y scatter', 'line regression'])
plt.show()




3. Multiple linear regression: formula style (y ~ x1 + x2, ...)
Variable names: replace dots (.) and spaces with '_'

from statsmodels.formula.api import ols


์ƒ๊ด€๊ณ„์ˆ˜ ํ–‰๋ ฌ 

corr = score_iq.corr()
corr['score'] # x1 = iq, x2 = academy, x3 = tv

obj = ols(formula='score ~ iq + academy + tv', data=score_iq)
model = obj.fit()
model #object info


Regression analysis results

print(model.summary()) # equivalent to summary(model) in R

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  score   R-squared:                       0.946
Model:                            OLS   Adj. R-squared:                  0.945
Method:                 Least Squares   F-statistic:                     860.1
Date:                Fri, 12 Nov 2021   Prob (F-statistic):           1.50e-92
Time:                        11:18:16   Log-Likelihood:                -274.84
No. Observations:                 150   AIC:                             557.7
Df Residuals:                     146   BIC:                             569.7
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     24.7223      2.332     10.602      0.000      20.114      29.331
iq             0.3742      0.020     19.109      0.000       0.335       0.413
academy        3.2088      0.367      8.733      0.000       2.483       3.935
tv             0.1926      0.303      0.636      0.526      -0.406       0.791
==============================================================================
Omnibus:                       36.802   Durbin-Watson:                   1.905
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               57.833
Skew:                           1.252   Prob(JB):                     2.77e-13
Kurtosis:                       4.728   Cond. No.                     2.32e+03
==============================================================================
# Adj. R-squared:                  0.945 : explanatory power of the model
# Prob (F-statistic):           1.50e-92 : model significance test (< 0.05)
# coef    std err          t      P>|t|  : significance test for each x variable
# Durbin-Watson:                   1.905 : autocorrelation of residuals (~2 means none)

dir(model)


Returning the regression coefficients

model.params

Intercept    24.722251 - y intercept
iq            0.374196 - slope of x1
academy       3.208802 - slope of x2
tv            0.192573 - slope of x3

y ์ ํ•ฉ์น˜ 

model.fittedvalues


Regression equation via matrix multiplication (@)

X = score_iq[['iq', 'academy', 'tv']]
X.shape # (150, 3)

import numpy as np 
a = np.array([[0.374196],[3.208802],[0.192573]])
a.shape # (3, 1) - slope matrix

b = 24.722251 # intercept
# y = x1*a1 + x2*a2 + x3*a3 + b
y_fitted = X @ a + b
print(y_fitted)
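The hand-rolled X @ a + b above is exactly what the fitted model stores: model.fittedvalues (or model.predict()) gives it directly. Demonstrated on seeded toy data, since score_iq.csv is not bundled here:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(9)
df = pd.DataFrame({'iq': rng.normal(110, 10, size=50),
                   'academy': rng.integers(0, 5, size=50).astype(float),
                   'tv': rng.integers(0, 5, size=50).astype(float)})
df['score'] = (0.37 * df.iq + 3.2 * df.academy + 0.19 * df.tv
               + 24.7 + rng.normal(0, 2, size=50))

model = ols('score ~ iq + academy + tv', data=df).fit()

X = df[['iq', 'academy', 'tv']].to_numpy()
a = model.params[['iq', 'academy', 'tv']].to_numpy()  # slopes
b = model.params['Intercept']
y_manual = X @ a + b  # same as model.fittedvalues

print(np.allclose(y_manual, model.fittedvalues.to_numpy()))  # True
```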


Plot: actual y vs y_fitted

y = score_iq['score']

plt.plot(y[:50], label='real value')
plt.plot(y_fitted[:50], label='predicted value')
plt.legend(loc= 'best')
plt.show()