0. Reference documents

1. Let's just follow along first

Please refer to document 1, "R Basics".

1.1. Defining variables

a <- 10 # a is 10.
b <- 5 # b is 5.
c <- "animal" # c is the string "animal".

Note: a variable name must start with a letter (or with a dot that is not followed by a digit).
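A quick illustration of the naming rule. The variable names below are made up for this example; base R's `make.names()` shows how R would turn an invalid name into a valid one.

```r
weight_kg <- 450     # valid: starts with a letter
.temp <- TRUE        # valid: a leading dot is allowed when no digit follows it
# 2nd_weight <- 300  # invalid: a name starting with a digit is a syntax error

# make.names() converts arbitrary strings into syntactically valid names
make.names(c("2nd_weight", "body weight"))
## [1] "X2nd_weight" "body.weight"
```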

1.2. Printing

a

## [1] 10

b

## [1] 5

c

## [1] "animal"
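Typing a variable name prints it only at the top level of the console. A small sketch of the cases where `print()` or `cat()` is needed instead, such as inside a `for` loop:

```r
a <- 10
print(a)               # explicit printing: same output as typing a alone
## [1] 10

for (i in 1:2) i       # inside a loop, values are not printed automatically
for (i in 1:2) print(i)
## [1] 1
## [1] 2

cat("a is", a, "\n")   # cat() prints plain text, without [1] or quotes
## a is 10
```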

1.3. Arithmetic

a + b # addition

## [1] 15

a - b # subtraction

## [1] 5

a / b # division

## [1] 2

a * b # multiplication

## [1] 50

a^2 # square (power)

## [1] 100

sqrt(a^2) # square root

## [1] 10
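Two more operators that often come up, plus the order of operations. Here b is set to 3 (instead of 5) so that the remainder is not zero:

```r
a <- 10
b <- 3
a %/% b        # integer division (quotient)
## [1] 3
a %% b         # remainder (modulo)
## [1] 1
a + b * 2      # * is evaluated before +
## [1] 16
(a + b) * 2    # parentheses change the order
## [1] 26
```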

1.4. Logical comparisons

a == b # a is equal to b.

## [1] FALSE

a != b # a is not equal to b.

## [1] TRUE

a > b # a is greater than b.

## [1] TRUE

a < b # a is less than b.

## [1] FALSE

a >= b # a is greater than or equal to b.

## [1] TRUE

a <= b # a is less than or equal to b.

## [1] FALSE
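Comparisons can be combined with `&` (and), `|` (or) and `!` (not), and they work element by element on vectors. The vector v below is a made-up example:

```r
a <- 10
b <- 5
(a > b) & (b > 0)     # AND: TRUE only if both are TRUE
## [1] TRUE
(a < b) | (b == 5)    # OR: TRUE if at least one is TRUE
## [1] TRUE
!(a == b)             # NOT: flips TRUE and FALSE
## [1] TRUE

v <- c(3, 8, 12)
v > 5                 # one TRUE/FALSE per element
## [1] FALSE  TRUE  TRUE
sum(v > 5)            # TRUE counts as 1, so this counts the matches
## [1] 2
```

The same `&` and `|` operators appear again in section 3.1 inside `filter()`.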

1.5. Data frames

a <- 1:10

a

##  [1]  1  2  3  4  5  6  7  8  9 10

b <- c(1, 2, 3, 4, 5, 10)

b

## [1]  1  2  3  4  5 10

c <- c("Konkuk", "University")

c

## [1] "Konkuk"     "University"

c is short for "combine": c() combines values into a single vector.
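A few more things `c()` does, shown on the vector b from above. Note that a vector holds only one type, so mixing numbers and text converts everything to text:

```r
b <- c(1, 2, 3, 4, 5, 10)
length(b)          # number of elements
## [1] 6
c(b, 20)           # c() also appends values to an existing vector
## [1]  1  2  3  4  5 10 20
c(1, "two")        # mixed types are coerced to one type (here character)
## [1] "1"   "two"
```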

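# stringsAsFactors = TRUE reproduces the Factor columns shown by str() below (the default became FALSE in R 4.0.0)
example <- data.frame(name = c("철수", "영희", "건이"), sex = c("Male", "Female", "Male"),
                      age = c(29, 40, 33), height = c(175, 163, 168), stringsAsFactors = TRUE)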

example

str(example)

## 'data.frame':    3 obs. of  4 variables:
##  $ name  : Factor w/ 3 levels "건이","영희",..: 3 2 1
##  $ sex   : Factor w/ 2 levels "Female","Male": 2 1 2
##  $ age   : num  29 40 33
##  $ height: num  175 163 168
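Besides `str()`, a few base R functions give a quick overview of a data frame. This rebuilds `example` with `stringsAsFactors = TRUE`, because since R 4.0.0 `data.frame()` keeps text as character by default and the str() output above shows factors:

```r
example <- data.frame(
  name = c("철수", "영희", "건이"),
  sex = c("Male", "Female", "Male"),
  age = c(29, 40, 33),
  height = c(175, 163, 168),
  stringsAsFactors = TRUE   # text columns become factors
)
dim(example)          # number of rows and columns
## [1] 3 4
names(example)        # column names
## [1] "name"   "sex"    "age"    "height"
levels(example$sex)   # the categories of a factor
## [1] "Female" "Male"
table(example$sex)    # count per category: Female 1, Male 2
```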

x <- example$height # extract only the height column from example
x

## [1] 175 163 168

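example[2, 3] # row 2, column 3: the age of 영희

## [1] 40

example[3, 3] # row 3, column 3: the age of 건이

## [1] 33

example[, 3] # an empty row index selects all rows: the whole age column

## [1] 29 40 33

example[1, ] # an empty column index selects all columns: the whole first row

example[-1, ] # a negative index drops that row: every row except the first

example[c(2, 3), ] # rows 2 and 3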
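Rows can also be selected with a logical condition and columns by name rather than position. The data frame here is a trimmed copy of `example` (sex column left out for brevity):

```r
example <- data.frame(
  name = c("철수", "영희", "건이"),
  age = c(29, 40, 33),
  height = c(175, 163, 168)
)
example[example$age > 30, ]       # rows where the condition is TRUE (영희 and 건이)
example[, c("name", "height")]    # columns chosen by name
example[["height"]]               # same vector as example$height
## [1] 175 163 168
```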

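1.6. Basic statistics

sum(x) # sum

## [1] 506

mean(x) # mean

## [1] 168.6667

median(x) # median

## [1] 168

var(x) # variance (sample variance, dividing by n - 1)

## [1] 36.33333

sd(x) # standard deviation

## [1] 6.027714

max(x) # maximum

## [1] 175

min(x) # minimum

## [1] 163

range(x) # range: returns the minimum and the maximum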

## [1] 163 175
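Real data often contain missing values (NA), which make these functions return NA unless told otherwise. A sketch with one made-up missing height:

```r
y <- c(175, 163, 168, NA)      # hypothetical: the fourth height is missing
mean(y)                        # a single NA makes the result NA
## [1] NA
mean(y, na.rm = TRUE)          # na.rm = TRUE drops missing values first
## [1] 168.6667
diff(range(y, na.rm = TRUE))   # range() gives min and max; diff() turns them into the spread
## [1] 12
```

The same `na.rm` argument is accepted by `sum()`, `median()`, `var()`, `sd()`, `max()`, `min()` and `range()`.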

2. Basic R syntax

2.1. Installing packages

install.packages("dplyr")
install.packages("ggplot2")
install.packages("readxl")
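Reinstalling every package on each run is slow. A minimal sketch of a helper that installs only what is missing; `install_if_missing` is a hypothetical name, and the CRAN mirror URL is an assumption you can change:

```r
# Hypothetical helper: install only the packages that are not installed yet
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (quietly) when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) install.packages(missing, repos = repos)
  invisible(missing)   # names of the packages that had to be installed
}

# Usage for this lecture:
# install_if_missing(c("dplyr", "ggplot2", "readxl"))
```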

2.2. Updating packages

update.packages(ask = FALSE)

2.3. Loading packages

library(dplyr)
library(ggplot2)

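2.4. Reading files

read.csv("dataframe.csv") # read a CSV file into a data frame
readxl::read_excel("dataframe.xlsx") # read an Excel file; package::function calls a function without loading the package with library()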

2.8. Writing files

write.csv(example, "test.txt", row.names = FALSE) # save example as comma-separated text; row.names = FALSE leaves out the row numbers
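A round trip through a temporary CSV file shows what `row.names = FALSE` buys you: with the default `row.names = TRUE`, the file gains an unnamed first column that `read.csv()` reads back as a column called X. The data frame below is a trimmed, made-up copy of `example`:

```r
example <- data.frame(age = c(29, 40, 33), height = c(175, 163, 168))

path <- tempfile(fileext = ".csv")            # temporary file, nothing left behind
write.csv(example, path, row.names = FALSE)  # no extra index column
back <- read.csv(path)

names(back)
## [1] "age"    "height"
class(back$age)   # whole numbers are read back as integers
## [1] "integer"
```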

3. Data wrangling: the dplyr package

Please refer to document 2, "Data Wrangling".

3.0. Loading the package and reading data

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

readxl::read_excel("example_df.xlsx") # print the data to check it

df <- readxl::read_excel("example_df.xlsx") # store it as df for the rest of this section

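3.1. Extracting only the rows you need: filter

filter(df, Name == "Kim") # rows whose Name is "Kim"

filter(df, weight > 500) # rows with weight over 500

filter(df, wgrade != "C") # rows whose wgrade is not "C"

filter(df, qgrade != "3") # rows whose qgrade is not "3"

filter(df, qgrade == "1++" & wgrade == "A") # and: both conditions must hold

filter(df, qgrade == "1++" | wgrade == "A") # or: at least one condition must hold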
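Two more `filter()` patterns: `%in%` keeps rows whose value is any of several options, and several comma-separated conditions are combined with AND. Since example_df.xlsx is not available here, the df below is a hypothetical stand-in with made-up values and the same column names:

```r
library(dplyr)

# Hypothetical stand-in for example_df.xlsx (made-up values)
df <- data.frame(
  Name   = c("Kim", "Kim", "Kim", "Lee", "Lee", "Park"),
  month  = c(30, 28, 35, 31, 33, 29),
  weight = c(480, 520, 610, 495, 550, 505),
  qgrade = c("1++", "1+", "1++", "2", "1", "1+"),
  wgrade = c("A", "A", "B", "C", "B", "A")
)

filter(df, qgrade %in% c("1++", "1+"))   # qgrade is either "1++" or "1+"
filter(df, Name == "Kim", weight > 500)  # commas mean AND, like &
df[df$Name == "Kim" & df$weight > 500, ] # base R equivalent of the line above
```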

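3.2. Using the pipe operator: %>%

x %>% f(y) is equivalent to f(x, y): the value on the left becomes the first argument of the function on the right.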

df$month %>% mean()

## [1] 32.06825

mean(df$month)

## [1] 32.06825
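Pipes pay off when calls are nested, because the steps read left to right instead of inside out. Since R 4.1.0, base R also has a native pipe `|>` that behaves like `%>%` in simple cases such as this one and needs no package. The month values are made up:

```r
month <- c(30, 28, 35, 31, 33)                      # hypothetical ages in months
nested <- round(sqrt(mean(month)), 2)               # read from the inside out
piped  <- month |> mean() |> sqrt() |> round(2)     # read from left to right
identical(nested, piped)
## [1] TRUE
```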

3.3. Grouped analysis: group_by

group_by(df, Name) %>% summarise(avg = mean(month))

group_by(df, Name) %>% summarise(month = mean(month), mean_weight = mean(weight), max_weight = max(weight))

group_by(df, qgrade) %>% summarise(month = mean(month))
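Group summaries are often followed by counting and sorting. A sketch of a longer pipeline using `n()` (rows per group) and `arrange(desc(...))` (sort in descending order), again on a hypothetical stand-in for example_df.xlsx:

```r
library(dplyr)

# Hypothetical stand-in for example_df.xlsx (made-up values)
df <- data.frame(
  Name   = c("Kim", "Kim", "Kim", "Lee", "Lee", "Park"),
  weight = c(480, 520, 610, 495, 550, 505)
)

res <- df %>%
  group_by(Name) %>%
  summarise(n = n(), mean_weight = mean(weight)) %>%  # one row per Name
  arrange(desc(mean_weight))                          # heaviest group first
res
```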

4. Visualization: the ggplot2 package

Please refer to document 3, "Data Visualization", and the R Graphics Cookbook.

4.0. Loading the package

library(ggplot2)

4.1. Setting the main dataset: ggplot(data = df, aes(x, y))

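g <- ggplot(df, aes(x = month, y = weight)) # map month to the x-axis and weight to the y-axis

g # draws only the axes: no data layer has been added yet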

4.2. Scatter plot: + geom_point()

g + geom_point()
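Extra variables can be mapped to aesthetics such as colour, and `labs()` sets readable axis and legend titles. A sketch on hypothetical data with the lecture's column names; the label texts are assumptions (month is taken to be age in months, as section 5 suggests):

```r
library(ggplot2)

# Hypothetical stand-in for example_df.xlsx (made-up values)
df <- data.frame(
  month  = c(30, 28, 35, 31, 33, 29),
  weight = c(480, 520, 610, 495, 550, 505),
  qgrade = c("1++", "1+", "1++", "2", "1", "1+")
)

p <- ggplot(df, aes(x = month, y = weight, colour = qgrade)) +
  geom_point(size = 3) +                      # larger points
  labs(x = "Age (months)", y = "Weight", colour = "Quality grade")
p
```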

4.3. Bar chart: + geom_bar()

h <- ggplot(df, aes(qgrade)) # only x is mapped; geom_bar() counts the rows for each qgrade

h + geom_bar()

h + geom_bar() + scale_x_discrete(limits = c("3", "2", "1", "1+", "1++")) # set the order of the categories

h + geom_bar(width = 0.5, fill = "red") + scale_x_discrete(limits = c("3", "2", "1", "1+", "1++")) # set the bar width and fill color
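An alternative to `scale_x_discrete(limits = ...)` is to store the order in the data itself as factor levels, so every later plot and table uses the same order. With factors, categories that have no rows are dropped from the axis unless `drop = FALSE` is set. The data are hypothetical:

```r
library(ggplot2)

# Hypothetical stand-in for example_df.xlsx (made-up values, no grade "3")
df <- data.frame(qgrade = c("1++", "1+", "1++", "2", "1", "1+"))

# Store the display order as factor levels, lowest grade first
df$qgrade <- factor(df$qgrade, levels = c("3", "2", "1", "1+", "1++"))

p <- ggplot(df, aes(qgrade)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)  # keep "3" on the axis although no row has it
p
```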

4.4. Density plot: + geom_density()

g <- ggplot(df, aes(month))

g + geom_density(kernel = "gaussian") # kernel density estimate with a Gaussian kernel
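Densities of several groups can be overlaid by mapping a grouping column to `fill`; `alpha` makes the curves see-through and `adjust` scales the bandwidth (values above 1 give smoother curves). The two groups below are simulated, not real data:

```r
library(ggplot2)

set.seed(42)  # reproducible simulated data
df <- data.frame(
  month = c(rnorm(50, mean = 30, sd = 3), rnorm(50, mean = 34, sd = 3)),
  group = rep(c("group A", "group B"), each = 50)   # hypothetical groups
)

p <- ggplot(df, aes(month, fill = group)) +
  geom_density(kernel = "gaussian", alpha = 0.4, adjust = 1.5)
p
```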

5. dplyr + ggplot

filter(df, month < 40) %>% ggplot(aes(month)) + geom_density(kernel = "gaussian") # density of individuals younger than 40 months

filter(df, Name == "Kim") %>% ggplot(aes(wgrade)) + geom_bar(width = 0.4, fill = "#81BEF7") + coord_flip() # yield grades (wgrade) of Kim's animals

HTML color chart: https://html-color-codes.info/Korean/
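`geom_bar()` counts the rows itself. When the counts are already computed, for example with dplyr's `count()`, `geom_col()` draws them as given, which also lets you inspect the numbers before plotting. A sketch on a hypothetical stand-in for example_df.xlsx:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for example_df.xlsx (made-up values)
df <- data.frame(
  Name   = c("Kim", "Kim", "Kim", "Lee", "Lee", "Park"),
  wgrade = c("A", "A", "B", "C", "B", "A")
)

counts <- df %>%
  filter(Name == "Kim") %>%
  count(wgrade)          # one row per wgrade; the count is in column n
counts

p <- ggplot(counts, aes(x = wgrade, y = n)) +
  geom_col(width = 0.4, fill = "#81BEF7") +
  coord_flip()
p
```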

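6. Dynamic documents: R Markdown

Please refer to document 4, "Writing Dynamic Documents".
https://gist.github.com/ihoneymon/652be052a0727ad59601
There is no need to craft a work of art in Word or Hangul (HWP): the purpose of a report is to convey information.
R Markdown = ordinary Markdown + R code that reproduces the results.
When the underlying data change, a report in the same format can be regenerated quickly.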

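7. Conclusion

In this class we learned the basics of using R. The kinds of data each of you works with will differ, but the overall workflow of processing, analyzing, and interpreting data stays essentially the same. We covered only the basics, but I hope that further study will help each of you cut the time spent on analysis and report writing.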

Lecture note: Basic R

Youngjun Na (ruminoreticulum@gmail.com)
