LEE_BOMB 2021. 11. 23. 19:42
Seaborn ํŒจํ‚ค์ง€

๋‹ค์–‘ํ•œ ๋ฐฐ๊ฒฝ ํ…Œ๋งˆ ์ œ๊ณต
ํ†ต๊ณ„์šฉ ์ฐจํŠธ ์ œ๊ณต
์ž์ฒด dataset ์ œ๊ณต

import seaborn as sn #๋ณ„์นญ



์ œ๊ณต dataset ํ™•์ธ

print(sn.get_dataset_names())

['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'geyser', 'iris', 'mpg', 'penguins', 'planets', 'taxis', 'tips', 'titanic']

len(sn.get_dataset_names()) # 19




2. dataset load

iris = sn.load_dataset(name='iris')
type(iris) #pandas.core.frame.DataFrame
iris.info()
iris.head()

tips = sn.load_dataset(name='tips')
tips.info()

titanic = sn.load_dataset(name='titanic')
titanic.info()

flights = sn.load_dataset(name='flights')
flights.info()

 

 

 

 

 

object vs category

object : ์ˆœ์„œ ๋ณ€๊ฒฝ์ด ๋ถˆ๊ฐ€๋Šฅํ•œ ๋ฌธ์ž์—ด
category : ์ˆœ์„œ ๋ณ€๊ฒฝ์ด ๊ฐ€๋Šฅํ•œ ๋ฌธ์ž์—ด

import seaborn as sn



1. object vs category
dataset load

tips = sn.load_dataset(name = 'tips')
titanic = sn.load_dataset(name = 'titanic')

titanic.info()


subset ์ƒ์„ฑ

df = titanic[['survived', 'age', 'class', 'who']]
df.info()

 2   class     891 non-null    category
 3   who       891 non-null    object
  
category ์ •๋ ฌ

 df.head()

 0   survived  891 non-null    int64   
 1   age       714 non-null    float64 
 2   class     891 non-null    category
 3   who       891 non-null    object
 
object ์ •๋ ฌ

df.sort_values(by = 'who') #์˜ค๋ฆ„์ฐจ์ˆœ
df['who'].unique() #array(['man', 'woman', 'child'], dtype=object)

'child' > 'man' > 'woman' : child(base)
object -> category
'man' > 'woman' > 'child' : man(base)

category ์ˆ˜์ • : DF['column'].astype('type')

df['who_new'] = df['who'].astype('category')
df.head()
df.info()

 3   who       891 non-null    object  
 4   who_new   891 non-null    category
  
๋ฒ”์ฃผ ์ˆœ์„œ ๋ณ€๊ฒฝ

df['who_new'] = df['who_new'].cat.set_categories(['man', 'woman', 'child'])


R์—์„œ๋Š”

df$who_new = fator(df$who_new, levels = c('man', 'woman', 'child'))


category ์ •๋ ฌ

df.sort_values(by = 'who_new')




2. ๋ฒ”์ฃผํ˜• ์ž๋ฃŒ ์‹œ๊ฐํ™”
1) ๊ทธ๋ž˜ํ”„ ๋ฐฐ๊ฒฝ ์Šคํƒ€์ผ

sn.set_style(style = 'darkgrid')


2) ๋ฒ”์ฃผํ˜• ์ž๋ฃŒ ์‹œ๊ฐํ™”

tips.head()
sn.countplot(x = 'smoker', data = tips) #2๊ฐœ ๋ฒ”์ฃผ 
sn.countplot(x = 'class', data = titanic) #3๊ฐœ ๋ฒ”์ฃผ
sn.countplot(x = 'day', data = tips) #4๊ฐœ ๋ฒ”์ฃผ

 

 

 

 

 

continous

์—ฐ์†ํ˜• ๋ณ€์ˆ˜ ์‹œ๊ฐํ™” 
ํžˆ์Šคํ† ๊ทธ๋žจ, ์‚ฐ์ ๋„, ์‚ฐ์ ๋„ ํ–‰๋ ฌ, boxplot

import matplotlib.pyplot as plt
import seaborn as sn



dataset load 

iris = sn.load_dataset('iris')
tips = sn.load_dataset('tips')
type(iris) # pandas.core.frame.DataFrame

 



1. ํžˆ์Šคํ† ๊ทธ๋žจ

iris.info()
x = iris.sepal_length # iris['sepal_length']

sn.histplot(data=x)
plt.title('sepal_length hist')
plt.show()

 



2. distplot : hist + kde 

sn.distplot(x, bins=20, hist=True, kde=True)
plt.title('sepal_length hist & kde')
plt.show()

kde : ์ปค๋„๋ฐ€๋„์ถ”์ • 


 

3. ์‚ฐ์ ๋„ ํ–‰๋ ฌ 
sn.pairplot(data = DataFrame, hue='์ง‘๋‹จ๋ณ€์ˆ˜') 

sn.pairplot(data = iris, hue='species') 
plt.show()

 

 


4. ์‚ฐ์ ๋„ : ์—ฐ์†ํ˜• vs ์—ฐ์†ํ˜• 

sn.scatterplot(x='sepal_length', y='petal_length',
               data = iris)
plt.show()

 

hue='์ง‘๋‹จ๋ณ€์ˆ˜'

sn.scatterplot(x='sepal_length', y='petal_length',
               hue='species', data = iris)
plt.show()




5. boxplot

tips.info()
sn.boxplot(x='day', y='total_bill',
           hue='sex', data=tips)
plt.show()

 

 

 

 

 

ํ†ต๊ณ„์  ๋ชจ๋ธ ๊ด€๋ จ ์‹œ๊ฐํ™” 

์‹œ๊ณ„์—ด๋ถ„์„, ์ƒ๊ด€๋ถ„์„, ํšŒ๊ท€๋ถ„์„, ๋ถ„๋ฅ˜๋ถ„์„  

import matplotlib.pyplot as plt
import seaborn as sn 
import pandas as pd  # object

 


dataset load 

iris = sn.load_dataset('iris')
tips = sn.load_dataset('tips')
flights = sn.load_dataset('flights')

iris.info()
tips.info()
flights.info()

 



1. ์˜ค์ฐจ๋Œ€์—ญํญ์„ ๊ฐ–๋Š” ์‹œ๊ณ„์—ด ์‹œ๊ฐํ™” 
x : ์‹œ๊ฐ„์ถ•, y : ํ†ต๊ณ„๋Ÿ‰ 

sn.lineplot(x='year', y='passengers',
            data = flights) 
plt.show()


hue ์ถ”๊ฐ€ : ์›” ๋‹จ์œ„ ํƒ‘์Šน๊ฐ ์ถ”์„ธ 

sn.lineplot(x='year', y='passengers',
            hue='month', data = flights) 
plt.show()
print(flights)




2. ์ƒ๊ด€๋ถ„์„ : ์—ฐ์†ํ˜• vs ์—ฐ์†ํ˜• vs ์ด์‚ฐํ˜•   

sn.relplot(x='total_bill', y='tip', 
           size='size', data = tips)
plt.show()


์ƒ๊ด€๊ณ„์ˆ˜ : 0.6757341092113641

tips['total_bill'].corr(tips['tip'])

 



3. ์„ ํ˜•ํšŒ๊ท€๋ถ„์„ : ์‚ฐ์ ๋„์™€ ํšŒ๊ท€์„ , ์‹ ๋ขฐ๊ตฌ๊ฐ„(ci=95)  

sn.lmplot(x='sepal_length', y='petal_length', 
          data=iris, ci=95)
plt.show()


hue ์†์„ฑ : species

sn.lmplot(x='sepal_length', y='petal_length', 
          data=iris, hue='species',  ci=95)
plt.show()




4. ๋ถ„๋ฅ˜๋ถ„์„ : ํ˜ผ๋™ํ–‰๋ ฌ(confusion matrix)
1) y true vs y pred 

y_true = pd.Series([1,0,1,0,0])
y_pred = pd.Series([1,0,1,0,1])


2) ํ˜ผ๋™ํ–‰๋ ฌ : ๊ต์ฐจ๋ถ„ํ• ํ‘œ 

mat = pd.crosstab(index=y_true, columns=y_pred)
mat

col_0  0  1
row_0      
0      2  1
1      0  2

type(mat) # pandas.core.frame.DataFrame

 

3) ๋ถ„๋ฅ˜์ •ํ™•๋„ 

score = (mat.iloc[0,0] + mat.iloc[1,1]) / len(y_true) 
print('๋ถ„๋ฅ˜์ •ํ™•๋„ = ', score) # ๋ถ„๋ฅ˜์ •ํ™•๋„ =  0.8


4) heatmap 

sn.heatmap(data=mat, annot=True, fmt=".2f", linewidths=2)


annot=True : label ํ‘œ์‹œ 

plt.ylabel("Real label")
plt.xlabel("Predicted label")
acc_score = f"Accuracy score : {score}"
plt.title(acc_score, size = 15)
plt.show()