๋ฐ์ดํ„ฐ๋ถ„์„๊ฐ€ ๊ณผ์ •/Hadoop

DAY78. Hadoop Basic Master ์„œ๋ฒ„ ์„ค์ •

LEE_BOMB 2022. 1. 11. 18:06

Hadoop ์‹œ์Šคํ…œ ๊ฐœ์š”
๊ตฌ๊ธ€ : GFS(Google File System)์™€ ๋งต๋ฆฌ๋“€์Šค(MapReduce)
2005๋…„ ๋”๊ทธ์ปคํŒ… ๊ตฌํ˜„
2008๋…„ ์•„ํŒŒ์น˜ ์ตœ์ƒ์œ„ ํ”„๋กœ์ ํŠธ ์Šน๊ฒฉ
์ฃผ์š” ๊ตฌ์„ฑ
- HDFS(Hadoop Distributed File System) : ์ €์žฅ์†Œ
- Map Reduce : ๋ถ„์‚ฐ์ฒ˜๋ฆฌ ์‹œ์Šคํ…œ

Hadoop ์ด์ 
์—ฌ๋Ÿฌ ๋Œ€ ์„œ๋ฒ„์— ๋ฐ์ดํ„ฐ ์ €์žฅ
๊ฐ ์„œ๋ฒ„์—์„œ ๋™์‹œ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ(๋ถ„์‚ฐ์ฒ˜๋ฆฌ)
๊ธฐ์กด์˜ RDBMS ๋Œ€์ฒด
๊ณ ๊ฐ€ ์žฅ๋น„ ๋Œ€์‹  ๋ฆฌ๋ˆ…์Šค ์„œ๋ฒ„๋กœ ๋Œ€์ฒด
์˜คํ”ˆ ์†Œ์Šค (SW ๋ผ์ด์„ ์Šค, ๊ณ ๊ฐ€ HW ๋น„์šฉ ์ ˆ๊ฐ)


Hadoop ์—์ฝ” ์‹œ์Šคํ…œ



Hadoop+Hive+Spark ์†Œํ”„ํŠธ์›จ์–ด ์•„ํ‚คํ…์ฒ˜

 

 

Hadoop ๋ถ„์‚ฐ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ

 

 

ex) Word Count

 

 

Hadoop ๋ฐœ์ „๊ณผ์ •

Hadoop 1.0 Hadoop 2.0
- HDFS & MapReduce ๊ตฌ์„ฑ, MapReduce ์‹คํ–‰ ์‹œ ๋งตํผ๋Š” ๋ชจ๋‘ ๋™์ž‘ํ•˜๋Š”๋ฐ ๋ฆฌ๋“€์„œ๋Š” ๋†€๊ณ  ์žˆ๊ฑฐ๋‚˜ ๊ทธ ๋ฐ˜๋Œ€์˜ ๊ฒฝ์šฐ ๋ฐœ์ƒ
- ์ „์ฒด ํด๋Ÿฌ์Šคํ„ฐ ์‚ฌ์šฉ๋ฅ  ๋‚ฎ์Œ
- YARN(์–€)์ด ๋„์ž…๋˜๋ฉด์„œ ์ „์ฒด ํด๋Ÿฌ์Šคํ„ฐ์˜ ์ž์›(resource) ์ •๋ณด๋ฅผ ํ† ๋Œ€๋กœ ํšจ๊ณผ์ ์œผ๋กœ ์ž์›์„ ํ• ๋‹นํ•˜์—ฌ ๋งต๋ฆฌ๋“€์Šค ํƒœ์Šคํฌ ์‹คํ–‰
- ์ „์ฒด ํด๋Ÿฌ์Šคํ„ฐ ์‚ฌ์šฉ๋ฅ  ํ–ฅ์ƒ
- YARN์˜ ๋ฆฌ์†Œ์Šค ๋งค๋‹ˆ์ €๋Š” ์ „์ฒด ํด๋Ÿฌ์Šคํ„ฐ์˜ ๋ฆฌ์†Œ์Šค ๊ด€๋ฆฌ/ํ• ๋‹น

 

Hadoop1.0 ์•„ํ‚คํ…์ฒ˜

Job Tracker : ์‚ฌ์šฉ์ž๋กœ ๋ถ€ํ„ฐ Job ์‹คํ–‰ ์š”์ฒญ ๋ฐ›๊ณ , ํ•ด๋‹น Task Tracker์— ํ• ๋‹น

Task Tracker : ๋กœ์ปฌ Disk ๋Œ€์ƒ์œผ๋กœ ์‹ค์ œ Task ์‹คํ–‰, Job Tracker ์ง„ํ–‰ ๋ณด๊ณ 

 

Hadoop2.0 ์•„ํ‚คํ…์ฒ˜

 

 

Master vs Slave server

Master Server Slave server
NameNode, Resource Manager ๋ณด์กฐ ๋„ค์ž„๋…ธ๋“œ ์„ค์น˜ ์„œ๋ฒ„
๋””์Šคํฌ : 2 ~ 4๊ฐœ
๋ฉ”๋ชจ๋ฆฌ : 32GB ~ 128GB (64GB : 1์–ต ํŒŒ์ผ ์ €์žฅ)
CPU : ์ฝ”์–ด ์ˆ˜ 16~24๊ฐœ
DataNode, Node Manager ์„ค์น˜ ์„œ๋ฒ„
๋””์Šคํฌ : 4 ~ 12๊ฐœ(1TB~3TB)
๋ฉ”๋ชจ๋ฆฌ : 24GB ~ 48GB
CPU : ์ฟผ๋“œ ์ฝ”์–ด 2๊ฐœ ์ด์ƒ

Job Tracker : Job ์‹คํ–‰ ์š”์ฒญ ๋ฐ›๊ณ , Task Tracker์— ํ• ๋‹น

Task Tracker : ์‹ค์ œ Task ์‹คํ–‰, Job Tracker ์ง„ํ–‰ ์ƒํƒœ ๋ณด๊ณ 

 

 

NameNode(Master Server)

1) HDFS์—์„œ์˜ NameNode(master)๋Š” ๋ถ„์‚ฐํ™˜๊ฒฝ์—์„œ Job ๋ถ„๋ฐฐ, ์ง€์‹œ ๋ฐ ๊ฐ๋…

2) ์ž‘์—…์˜ ๋Œ€์ƒ์ด ๋˜๋Š” ํŒŒ์ผ์„ ๋ธ”๋ก(block)๋‹จ์œ„๋กœ ๋‚˜๋ˆ„์–ด์„œ slave node ๋ถ„๋ฐฐ

3) ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์ €์žฅ

- ํŒŒ์ผ ๋ณ„ ๋ธ”๋ก, ๊ฐ ๋ธ”๋ก ๋ณ„ ์ •๋ณด(DataNode ์œ„์น˜ ๋“ฑ)

- ์‹ค์ œ ํŒŒ์ผ์˜ ๋ธ”๋ก์€ DataNode์— ์ €์žฅ๋จ

4) ์ „์ฒด ๋ถ„์‚ฐ ํŒŒ์ผ์‹œ์Šคํ…œ ์ด์ƒ ์œ ๋ฌด ์ฒดํฌ, DataNode(slave) ๋ฐ์ดํ„ฐ ์ž…์ถœ๋ ฅ ์ž‘์—… (low-level I/O tasks) ์ง€์‹œ

5) NameNode ๋ฌธ์ œ ๋ฐœ์ƒ ์‚ฌ Hadoop ํด๋Ÿฌ์Šคํ„ฐ ์ „์ฒด ์‹œ์Šคํ…œ ์ฐจ๋‹จ

- Secondary NameNode ์ด์šฉ(NameNode ์ด์ค‘ํ™”) ๋ฌธ์ œ ํ•ด๊ฒฐ

 

DataNode(Slave Server)

1) ์‹ค์ œ HDFS(๋ถ„์‚ฐํŒŒ์ผ์‹œ์Šคํ…œ) ๋Œ€์ƒ์œผ๋กœ ์ฝ๊ธฐ ๋ฐ ์“ฐ๊ธฐ์˜ ๋ชจ๋“  ์ž‘์—… ์ˆ˜ํ–‰

2) ๋ชจ๋“  ์ž‘์—…์€ ๋Œ€์ƒ ํŒŒ์ผ์„ randomํ•˜๊ฒŒ ๋ธ”๋ก(block) ๋‹จ์œ„๋กœ ๋‚˜๋ˆ„์–ด ์ง„ํ–‰

3) DataNode ์ž‘์—…์€ NameNode์— ์ˆ˜์‹œ ๋ณด๊ณ , ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ํ˜•์‹ ์ €์žฅ

4) ๋ฐ์ดํ„ฐ๋ฅผ ์ฝ์–ด ๋“ค์ด๋Š” ์ฆ‰์‹œ ๊ฐ Data node์— ๋ถ„๋ฐฐ(Name node)

- HDFS๋Š” ํฐ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์„ ์—ฌ๋Ÿฌ ๊ฐœ๋กœ ๋ถ„๋ฆฌ์‹œ์ผœ์„œ ๊ฐ node ์ฒ˜๋ฆฌ

5) ๊ฐ๊ฐ์˜ ์กฐ๊ฐ(chunk) ๋Š” ์—ฌ๋Ÿฌ ๋Œ€์˜ ์ปดํ“จํ„ฐ์— ์ค‘๋ณต์ ์œผ๋กœ ๋ณต์ œ

- ํ•œ ์ปดํ“จํ„ฐ์—์„œ ์žฅ์• ๊ฐ€ ๋ฐœ์ƒํ•ด๋„ ๋‹ค๋ฅธ ์ปดํ“จํ„ฐ๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ ์ด์šฉ

6) ๋ชจ๋“  ํŒŒ์ผ์กฐ๊ฐ๋“ค์€ ํ•˜๋‚˜์˜ namespace๋ฅผ ๊ณต์œ 

- ํด๋Ÿฌ์Šคํ„ฐ ๋‚ด์˜ ๋ชจ๋“  node๋“ค์€ ๊ณต์œ ๋œ ํŒŒ์ผ ์ด์šฉ

 

 

Hadoop ํด๋Ÿฌ์Šคํ„ฐ ๊ตฌ์ถ• ์‹œ์Šคํ…œ ๊ตฌ์„ฑ๋„

 

 

 

 

 

 

Master ์„œ๋ฒ„ ์ƒ์„ฑ/ํ™˜๊ฒฝ ์„ค์ •

1 .Master ์„œ๋ฒ„ ์ƒ์„ฑ

1) C๋“œ๋ผ์ด๋ธŒ์— Master ํด๋” ์ƒ์„ฑ

2) VMware > Home> Create a New Virtual Machine > ๋งจ ์•„๋ž˜ ์˜ต์…˜ ์„ ํƒ > Linux / CentOS 7 64-bit ์„ค์ • > Location์€ Masterํด๋” > Maximum disk size 20.0 / Split ์˜ต์…˜ ์„ ํƒ

3) VMware Master > Edit virtual machine settings

- Memory 2048MB > Network Adapter์—์„œ Advanced... > MAC Address์˜ Generate (MAC์ฃผ์†Œ ์ƒ์„ฑ)

- CD/DVD > Use ISO image file > CentOS 7 7 64-bit

 

 

2. ๋ฐฉํ™”๋ฒฝ ์ œ๊ฑฐ

[Hadoop@localhost ~]$ su root -> ๊ด€๋ฆฌ์ž ๋ชจ๋“œ ์ง„์ž…
์•”ํ˜ธ:
[root@localhost hadoop]#
[root@localhost hadoop]# system status firewalld.service -> ๋ฐฉํ™”๋ฒฝ ๋ฐ๋ชจ ์ƒํƒœ ํ™•์ธ
.
.
[root@localhost hadoop]# system stop firewalld -> ๋ฐฉํ™”๋ฒฝ ๋ฐ๋ชจ ์ค‘์ง€
[root@localhost hadoop]# system mask firewalld -> ๋ฐฉํ™”๋ฒฝ ๋ฐ๋ชจ ์ œ๊ฑฐ
Created symlink from /etc/systemd/system/firewalld.service to /dev/null

 

 

๋„คํŠธ์›Œํฌ ์„ค์ •

๋„คํŠธ์›Œํฌ > ์œ ์„ ์˜ ํ†ฑ๋‹ˆ๋ฐ”ํ€ด > ์ž์„ธํžˆ๋ณด๊ธฐ > IPv4์ฃผ์†Œ (Linux์‹ค์ œ IP4์ฃผ์†Œ) > ๊ธฐ๋ณธ ๋ผ์šฐํŒ…, ๋„ค์ž„์„œ๋ฒ„(DNS)ํ™•์ธ (๊ธฐ๋ณธ ๋ผ์šฐํŒ… (๊ฒŒ์ดํŠธ์›จ์ด))

* ๋ฉ”๋ชจ์žฅ์— ๊ฐœ์ธ ๋„คํŠธ์›Œํฌ ์ •๋ณด ๋”ฐ๋กœ ๊ธฐ๋กํ•ด๋‘๊ธฐ

์‹ ์› > MAC์ฃผ์†Œ ์„ ํƒ

IPv4 > IPv4๋ฐฉ์‹ ์ˆ˜๋™ > ์ฃผ์†Œ์—์„œ ์‹ค์ œ IP์˜ 4๋ฒˆ์งธ ์ˆซ์ž ๋ณ€๊ฒฝํ•ด์„œ / ๊ฒŒ์ดํŠธ์›จ์ด์™€ ๋„ค์ž„์„œ๋ฒ„๋Š” ๊ธฐ๋ณธ ์„ค์ •๊ณผ ๋™์ผํ•˜๊ฒŒ > ์ ์šฉ

 

 

3. ํ˜ธ์ŠคํŠธ ๋„ค์ž„์„ค์ •

[Hadoop@localhost ~]$ su root -> ๊ด€๋ฆฌ์ž ๋ชจ๋“œ ์ง„์ž…
์•”ํ˜ธ:
[root@localhost hadoop]# hostnamectl set-hostname master -> ์„œ๋ฒ„ ์ด๋ฆ„ ๋ณ€๊ฒฝ
[root@localhost hadoop]# hostname -> ์„œ๋ฒ„ ์ด๋ฆ„ ํ™•์ธ
master
[root@localhost hadoop]#
[root@localhost hadoop]# vi /etc/hosts -> ํ˜ธ์ŠคํŠธ ํ™˜๊ฒฝ ์„ค์ •


์ด์ „ ๋‹จ๊ณ„์—์„œ ์„ค์ •ํ–ˆ๋˜ Master ๊ฐ€์ƒ IP์ฃผ์†Œ ๊ธฐ์ž…
172. 16. 57. 5 master
172. 16. 57. 10 slave1

์ธํ„ฐ๋„ท ์—ฐ๊ฒฐ ํ™•์ธ > Master์„œ๋ฒ„ ์žฌ๋ถ€ํŒ… > ์„ค์ • > ๋„คํŠธ์›Œํฌ > ์ž์„ธํžˆ๋ณด๊ธฐ์—์„œ IPv4, ํ•˜๋“œ์›จ์–ด์ฃผ์†Œ, ๊ธฐ๋ณธ ๋ผ์šฐํŒ…, ๋„ค์ž„์„œ๋ฒ„ ์ฃผ์†Œ ํ™•์ธ

 

 

4. Java ์„ค์น˜

JRE ์„ค์น˜ ํ™•์ธ

[Hadoop@master ~]$ su
์•”ํ˜ธ:
[root@master hadoop]# yum list java*jdk-devel ->1.8 ๋ฒ„์ „ ์„ค์น˜

 

JDK ์„ค์น˜

[Hadoop@master ~]$ su
์•”ํ˜ธ:
[root@master hadoop]# yum install java-1.8.0-openjdk-devel.x86_64
.
.
Is this ok [y/d/N]y

 

JDK ์„ค์น˜ ํ™•์ธ

Complete!
[root@master hadoop]# rpm -qa java-1.8.0-openjdk-devel.x86_64 -> JDK์„ค์น˜ ํ™•์ธ
Java-1.8.0-openjdk-devel-1.8.0.312.b07-1.e17_9.x86_64
[root@master hadoop]#
[root@master hadoop]#java -version -> JRE๋ฒ„์ „ ํ™•์ธ
openjdk version "1.8.0_312"

 

 

5. Slave ํด๋” ์ƒ์„ฑ(Master ํด๋” ๋ณต์‚ฌ)

Masterํด๋”๊ฐ€ ์žˆ๋Š” C๋“œ๋ผ์ด๋ธŒ ๊ณต๊ฐ„์— Masterํด๋” ๋ณต์‚ฌ ํ›„ Slave1๋กœ ์ด๋ฆ„ ๋ณ€๊ฒฝ