Nunmpy ํŒจํ‚ค์ง€

์ˆ˜์น˜ ๊ณผํ•™์šฉ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉ 
์„ ํ˜•๋Œ€์ˆ˜(๋ฒกํ„ฐ, ํ–‰๋ ฌ) ์—ฐ์‚ฐ ๊ด€๋ จ ํ•จ์ˆ˜ ์ œ๊ณต 
N์ฐจ์› ๋ฐฐ์—ด, ์„ ํ˜•๋Œ€์ˆ˜ ์—ฐ์‚ฐ, ๊ณ ์† ์—ฐ์‚ฐ
์ˆ˜ํ•™/ํ†ต๊ณ„ ํ•จ์ˆ˜ ์ œ๊ณต 
indexing/slicing 
broadcast ์—ฐ์‚ฐ : ์„œ๋กœ ๋‹ค๋ฅธ ์ฐจ์› ๊ฐ„์˜ ์—ฐ์‚ฐ 

import numpy as np


1. list ๋ฐฐ์—ด vs numpy ๋ฐฐ์—ด
1) list ๋ฐฐ์—ด

lst = [1, 2, 3, 3.5] #์ •์ˆ˜ํ˜•๊ณผ ์‹ค์ˆ˜ํ˜•
lst #[1, 2, 3, 3.5] -> ๋‹ค์–‘ํ•œ ์ž๋ฃŒํ˜•
lst * 3 #[1, 2, 3, 3.5, 1, 2, 3, 3.5, 1, 2, 3, 3.5]
#lst * 0.5 = TypeError


list + for

calc = [v*0.5 for v in lst]
calc #[0.5, 1.0, 1.5, 1.75]

sum(lst) #์™ธ๋ถ€ํ•จ์ˆ˜ ์ด์šฉ


2) numpy ๋ฐฐ์—ด

arr = np.array(lst) #list -> numpyํ˜• ๋ณ€ํ™˜
arr #array([1. , 2. , 3. , 3.5]) -> ๋™์ผํ•œ ์ž๋ฃŒํ˜•
arr * 0.5 #์‚ฐ์ˆ ์—ฐ์‚ฐ : array([0.5 , 1.  , 1.5 , 1.75])
arr.sum() #์ž์ฒด ๊ฐ์ฒด ์ง€์› ํ•จ์ˆ˜(mehod)

type(arr) #numpy.ndarray
dir(arr)
arr.size #4
arr.shape #(4,)
arr.ndim #1
 
arr2 = np.array([10, 20, 30, '40'])
arr2 #array(['10', '20', '30', '40'], dtype='<U11')

 

 


2. array() : ๋‹ค์ฐจ์› ๋ฐฐ์—ด ์ƒ์„ฑ ํ•จ์ˆ˜ 
1) ๋‹จ์ผ list -> 1์ฐจ์› ๋ฐฐ์—ด

lst1d = [3, 5.3, 4, 7]


list -> array

arr1d = np.array(lst1d)
arr1d.shape #(4,)
arr1d.ndim #1 -> 1์ฐจ์›


2) ์ค‘์ฒฉ list -> 2์ฐจ์› ๋ฐฐ์—ด

lst2d = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(lst2d) #[[1, 2, 3, 4], [5, 6, 7, 8]]


list -> array

arr2d = np.array(lst2d)
print(arr2d)

[[1 2 3 4] -> 1ํ–‰
 [5 6 7 8]] -> 2ํ–‰

arr2d.shape #(2, 4)
arr2d.ndim #2
arr2d.size #8




3. broadcast ์—ฐ์‚ฐ
์ž‘์€ ์ฐจ์›์ด ํฐ ์ฐจ์›์œผ๋กœ ๋Š˜์–ด๋‚œ ํ›„ 1:1 ์—ฐ์‚ฐ

1) scala(0) vs vector(1)

arr1d * 0.5 #array([1.5 , 2.65, 2.  , 3.5 ])


2) scala(0) vs matrix(2)

arr2d * 0.5

array([[0.5, 1. , 1.5, 2. ],
       [2.5, 3. , 3.5, 4. ]])

3) vector(1) vs matrix(2)

arr1d.shape #(4,)
arr2d.shape #(2, 4)
arr1d * arr2d

array([[ 3. , 10.6, 12. , 28. ],
       [15. , 31.8, 28. , 56. ]])

print(arr1d) #[3.  5.3 4.  7. ]
mu = arr1d.mean() #4.825
diff = (arr1d - mu)**2 #broadcast ์—ฐ์‚ฐ array([3.330625, 0.225625, 0.680625, 4.730625])
type(diff) #numpy.ndarray

var = sum(diff) / arr1d.size #๋ถ„์‚ฐ 2.241875
var = sum(diff) / (arr1d.size -1) #list๋ฐฐ์—ด๋ณด๋‹ค numpy๊ฐ€ ๊ณ„์‚ฐ์‹์—์„œ ๊ฐ„ํŽธํ•˜๋‹ค

 

 


4. zeros() ํ•จ์ˆ˜ vs ones() ํ•จ์ˆ˜
zeros(0ํ–‰๋ ฌ) : ๋ชจ๋“  ๊ฐ’์„ 0์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•œ๋‹ค, ๋ชจ๋“  ํ–‰๋ ฌ์ด 0์„ ๊ฐ–๊ฒŒ ํ•œ๋‹ค.
ones(1ํ–‰๋ ฌ) : ๋ชจ๋“  ๊ฐ’์„ 1๋กœ ์ดˆ๊ธฐํ™”ํ•œ๋‹ค, ๋ชจ๋“  ํ–‰๋ ฌ์ด 1์„ ๊ฐ–๊ฒŒ ํ•œ๋‹ค.

zarr = np.zeros((3,10)) #3ํ–‰ 10์—ด์˜ 0ํ–‰๋ ฌ
zarr

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]) -> ๋ฐ”๊นฅ๊ด„ํ˜ธ 2๊ฐœ : 2์ฐจ์›

oarr = np.ones((3,10))
oarr

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])



5. arange(start, stop, step)
1) range vs arange

list(range(1,11)) #[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
#range(-1.0, 10.5) : TypeError : 'float'

np.arange(1, 11) #array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
np.arange(-1.2, 5.5) #array([-1.2, -0.2,  0.8,  1.8,  2.8,  3.8,  4.8])


x์˜ ์ˆ˜์—ด์— ๋Œ€ํ•œ 2์ฐจ ๋ฐฉ์ •์‹

x = np.arange(-1.0, 2, 0.1)
x.size #30 (์›์†Œ์˜ ๊ฐœ์ˆ˜)


f(x) ํ•จ์ˆ˜

def f(x) :
    y = x**2 + 2*x +3
    return y


ํ•จ์ˆ˜ ํ˜ธ์ถœ

f(x) #y๋ฐ˜ํ™˜


2์ฐจ ๋ฐฉ์ •์‹ ๊ทธ๋ž˜ํ”„

import matplotlib.pyplot as plt

plt.plot(x, f(x)) #์„ ๊ทธ๋ž˜ํ”„(x์ถ•, y์ถ•)
plt.show


2) ์ƒ‰์ธ

zarr #0ํ–‰๋ ฌ

cnt = 0
for i in np.arange(3) : #ํ–‰ index
    for j in np.arange(10) : #์—ด index
        cnt += 1    
        zarr[i,j] = cnt
        
zarr

 

 

 

 

 

indexing

1์ฐจ์› : list ์ƒ‰์ธ ๋™์ผ
2,3์ฐจ์› ์ƒ‰์ธ
์กฐ๊ฑด์‹ ์ƒ‰์ธ

import numpy as np



1. ์ƒ‰์ธ(indexing) : ์ž๋ฃŒ ์ฐธ์กฐ
1์ฐจ์› : obj[index]] : list ๋™์ผ
2์ฐจ์› : obj[ํ–‰ ,์—ด]
3์ฐจ์› : obj[๋ฉด, ํ–‰, ์—ด]
    
1) list ์ƒ‰์ธ

lst = list(range(6)) #0~5
lst #[0, 1, 2, 3, 4, 5]

lst[2] #2
lst[:3] #[0, 1, 2]
lst[3:] #[3, 4, 5]
lst[-1] #5


2) numpy ์ƒ‰์ธ

arr = np.arange(10)
arr #array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr[2] #2
arr[:3] #array([0, 1, 2])
arr[3:] #array([3, 4, 5, 6, 7, 8, 9])
arr[-1] #9

 

 


2. slicing : ํŠน์ • ๋ถ€๋ถ„์„ ์ž˜๋ผ์„œ new object

arr_obj = arr[1:4]
arr_obj #array([1, 2, 3])


์ „์ฒด ์›์†Œ ์ˆ˜์ •

arr_obj[:] = 100
arr_obj #array([100, 100, 100])


์›๋ณธ ๋‚ด์šฉ ํ™•์ธ

arr #array([  0, 100, 100, 100,   4,   5,   6,   7,   8,   9])

* ํ•ด๋‹น ๊ตฌ๊ฐ„์€ ์›๋ณธ์—์„œ๋„ ์ˆ˜์ •๋จ.

WHY? slicingํ•  ๋•Œ ์›๋ณธ์˜ ์ฃผ์†Œ๋ฅผ ๋ฐ˜ํ™˜ํ•˜๊ธฐ ๋•Œ๋ฌธ 

๋‚ด์šฉ ๋ฐ˜ํ™˜

arr_obj2 = arr[1:4].copy()
arr_obj2
arr_obj2[:] = 500
arr_obj2 #array([500, 500, 500])


์›๋ณธ ๋‚ด์šฉ

arr #array([  0, 100, 100, 100,   4,   5,   6,   7,   8,   9])

* list ์ƒ‰์ธ ๋ฌธ๋ฒ•์„ ๊ทธ๋Œ€๋กœ ์ ์šฉํ•˜๋ฉด ๋จ



3. ๊ณ ์ฐจ์› ์ƒ‰์ธ
1) 2์ฐจ์› ์ƒ‰์ธ

arr2d = np.array([[1,2,3], [4,5,6], [7,8,9]])
arr2d

array([[1, 2, 3], -> 1ํ–‰
       [4, 5, 6], -> 2ํ–‰
       [7, 8, 9]])-> 3ํ–‰

arr2d.shape #(3, 3)


ํ–‰ index ๊ธฐ๋ณธ

arr2d[0] #arr2d[0, :]
arr2d[:2] #์—ฐ์† 2ํ–‰ ์„ ํƒ
arr2d[[0,2]] #๋น„์—ฐ์† 2ํ–‰ ์„ ํƒ 
arr2d[:, [0,2]] #ํ–‰ ์ „์ฒด ์„ ํƒ, ๋น„์—ฐ์† 2์—ด ์„ ํƒ
arr2d[::2] #ํ™€์ˆ˜ํ–‰ ์„ ํƒ [start:stop:step]
arr2d[:2,1:] #box์„ ํƒ


2) 3์ฐจ์› ์ƒ‰์ธ

arr3d = np.array([ [[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]] ]) #[]3๊ฐœ ์ค‘์ฒฉ -> 3์ฐจ์› 
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

๋ฉด index ๊ธฐ๋ณธ

arr3d[0] #1๋ฉด์˜ ์ƒ‰์ธ
arr3d[1] #6๋ฉด์˜ ์ƒ‰์ธ

arr3d[0,1] #1๋ฉด 2ํ–‰ -> array([4, 5, 6])
arr3d[0, 1, 2] #1๋ฉด 2ํ–‰ 3์—ด -> 6

* 4์ฐจ์› : image = [size, h, w, c]



4. ์กฐ๊ฑด์‹ ์ƒ‰์ธ

dataset = np.random.randn(3,4)
dataset
dataset.shape #(3, 4)


0.7 ์ด์ƒ ์›์†Œ ์ถ”์ถœ

dataset[dataset >= 0.7] #array([1.21270247, 0.82024283, 0.91840696, 1.71777067])

 

0.1 ~ 0.7 ์›์†Œ ์ถ”์ถœ : ๋ฒ”์œ„

dataset[dataset >= 0.1 and dataset <= 0.7] #ValueError


numpy ๋…ผ๋ฆฌ์‹ ํ•จ์ˆ˜

np.logcal_and() #๋…ผ๋ฆฌ๊ณฑ
np.logcal_or() #๋…ผ๋ฆฌ ํ•ฉ
np.logcal_not() #๋ถ€์ •
np.logcal_xor() #๋ฐฐํƒ€์  ๋…ผ๋ฆฌํ•ฉ

dataset[np.logical_and(dataset >= 0.1, dataset <= 0.7)] #array([0.12758051, 0.5333522 , 0.11015089])


pandas ๊ฐ์ฒด ์ ์šฉ

import pandas as pd

ser = pd.Series([3, 2, 5, 4, 1, 8])
ser[np.logical_and(ser >= 3, ser <=5)]

0    3
2    5
3    4

 

 

 

 

 

universal

๋ฒ”์šฉํ•จ์ˆ˜(universal function) 
numpy ์ผ๋ฐ˜์ ์ธ ์ˆ˜ํ•™/ํ†ต๊ณ„ ๊ด€๋ จ ํ•จ์ˆ˜ 


import numpy as np

 


1. numpy ์ œ๊ณต ํ•จ์ˆ˜ : np.ํ•จ์ˆ˜(object)
object : list or ๊ธฐํƒ€ 
list ๊ฐ์ฒด ์ž์ฒด๋Š” ์ˆ˜ํ•™/ํ†ต๊ณ„ ํ•จ์ˆ˜ ์—†์Œ 

data = [1, 3, -5, 7] # list object

np.abs(data) #  ์ ˆ๋Œ€๊ฐ’ : array([1, 3, 5, 7])
np.sqrt(data) # ์ œ๊ณฑ๊ทผ : array([1.   , 1.73205081, nan, 2.64575131])
np.sqrt(np.abs(data)) # [1.        , 1.73205081, 2.23606798, 2.64575131]
np.square(data) # ์ œ๊ณฑ : [ 1,  9, 25, 49]
np.sign(data) # ๋ถ€ํ˜ธ : [ 1,  1, -1,  1]
np.var(data) # ๋ถ„์‚ฐ : 18.75
np.std(data) # ํ‘œ์ค€ํŽธ์ฐจ : 4.330127018922

 

๋กœ๊ทธ : ์™„๋งŒํ•œ ๋ณ€ํ™” - data ์ •๊ทœํ™” : ์ผ์ •ํ•œ ๋ฒ”์œ„๋กœ ์กฐ์ • 

data2 = np.array([1, 2.5, 3.35, 4.6, 55.3])
data2
np.log(data2) # [0.   , 0.91629073, 1.20896035, 1.5260563 , 4.01277291]


์ง€์ˆ˜ : ๊ธ‰๊ฒฉํ•œ ๋ณ€ํ™˜ - sigmoid ํ•จ์ˆ˜ 

e = np.exp(1) # 2.71828182845904
np.exp(data2)# [2.71828183e+00, 1.21824940e+01, 2.85027336e+01, 9.94843156e+01, 1.03868737e+24]


๋ฐ˜์˜ฌ๋ฆผ ํ•จ์ˆ˜ 

np.ceil(data2) # [ 1.,  3.,  4.,  5., 56.] - ํฐ ์ •์ˆ˜ ์˜ฌ๋ฆผ 
np.rint(data2) # [ 1.,  2.,  3.,  5., 55.] - ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์ •์ˆ˜ ์˜ฌ๋ฆผ 
np.round(data2, 1) # [ 1. ,  2.5,  3.4,  4.6, 55.3]


๊ฒฐ์ธก์น˜ ์ฒ˜๋ฆฌ 

data3 = np.array([1, 2.5, 3.35, 4.6, np.nan])
data3 # [1.  , 2.5 , 3.35, 4.6 ,  nan]

np.isnan(data3) # [False, False, False, False,  True]


๊ฒฐ์ธก์น˜ ์ œ์™ธ : ์กฐ๊ฑด์‹

result = data3[np.logical_not(np.isnan(data3))] # True -> False 
result # [1.  , 2.5 , 3.35, 4.6 ]

data3[~np.isnan(data3)] # ๋ถ€์ • ๊ธฐํ˜ธ : ~

 

 


2. numpy ๊ฐ์ฒด ์ œ๊ณต ํ•จ์ˆ˜ : object.ํ•จ์ˆ˜() = method 
np.random.randn() - ํŒจํ‚ค์ง€.๋ชจ๋“ˆ.ํ•จ์ˆ˜()

data4 = np.random.randn(100, 400) # 2์ฐจ์› ํ‘œ์ค€์ •๊ทœ๋ถ„ํฌ ๋‚œ์ˆ˜ : N(0, 1^2)
data4

array([[ 0.70393722,  0.22611989, -0.26978612, -0.15859705],
       [ 1.05361659, -0.10306144,  0.03118237,  0.22184977],
       [-0.77531471,  1.33257045, -0.6020415 , -0.15902342]])

type(data4) # numpy.ndarray
data4.size # 12 -> 40000
data4.shape # (3, 4) -> (100, 400)

data4.sum() # 1.5014520523479282
data4.mean() # 0.12512100436232734 -> 0.0037392158490974635
data4.std() #  0.604996608366 -> 1.0006175982808039
data4.max() # 1.3325704466284878
data4.min() # -0.7753147073709586


method ๋ชฉ๋ก ํ™•์ธ 

dir(data4)




3. axis ์†์„ฑ 

data5 = np.random.randn(10, 20)
data5

ํ–‰์ถ•(axis=0) : ๊ฐ™์€ ์—ด ๋ชจ์Œ = ์—ด ๋‹จ์œ„ ํ†ต๊ณ„
์—ด์ถ•(axis=1) : ๊ฐ™์€ ํ–‰ ๋ชจ์Œ = ํ–‰ ๋‹จ์œ„ ํ†ต๊ณ„

์ „์ฒด ํ‰๊ท  

data5.mean() # 0.03548252444038019


ํ–‰์ถ• ํ‰๊ท  

data5.mean(axis = 0) # 1d(20)


์—ด์ถ• ํ‰๊ท  

data5.mean(axis = 1) # 1d(10)

 

 



+ Recent posts