ggplot初级绘图指南

Author

孟克-mengke25

1.ggplot简介

ggplot2 是 R 中非常强大的数据可视化工具,能够创建多种类型的图形。

ggplot2 图形的基本构建块包括

  • 数据(data)

  • 映射(aesthetics,简称 aes

  • 几何对象(geometries,简称 geom

  • 统计变换(statistics,简称 stat

  • 坐标系统(coordinates,简称 coord

  • 主题(themes)

2.绘图逻辑

  • 第一步:初始化ggplot对象

    • 使用 ggplot() 函数初始化 ggplot 对象,指定数据集和美学映射(变量如何映射到图形属性,如 x 轴、y 轴、颜色等)

      ggplot(data = <dataframe>, aes(x = <x-variable>, y = <y-variable>))
  • 第二步:添加几何对象

    • 使用 geom_ 系列函数添加几何对象,如点、线、条形、箱线图等

      + geom_point()
  • 第三步:添加其他组件

    • 例如,添加标题、标签、坐标系、主题等

      + labs(title = "Title", x = "X-axis Label", y = "Y-axis Label")
      + theme_minimal()

3.速查表

常见ggplot绘图的基本范式

  • 散点图geom_point()
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() 
  • 线图geom_line()
ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
  • 柱状图geom_bar()
ggplot(mpg, aes(x = class)) +
  geom_bar()
  • 密度图geom_density()
ggplot(mpg, aes(x = hwy)) +
  geom_density()
  • 直方图geom_histogram()
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 1)
  • 箱线图geom_boxplot()
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()
  • 小提琴图geom_violin()
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin()
  • 热力图geom_tile()
ggplot(mtcars, aes(x = factor(cyl), y = factor(gear))) +
  geom_tile(aes(fill = mpg))

4.绘图实例

加载程辑包

library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.3
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(patchwork)
library(gridExtra)

Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':

    combine

定义颜色

my_colors <- c(
  rgb(31, 119, 180, maxColorValue = 255),
  rgb(255, 127, 14, maxColorValue = 255),
  rgb(44, 160, 44, maxColorValue = 255),
  rgb(214, 39, 40, maxColorValue = 255),
  rgb(148, 103, 189, maxColorValue = 255),
  rgb(140, 86, 75, maxColorValue = 255),
  rgb(227, 119, 194, maxColorValue = 255),
  rgb(127, 127, 127, maxColorValue = 255),
  rgb(188, 189, 34, maxColorValue = 255),
  rgb(23, 190, 207, maxColorValue = 255),
  rgb(174, 199, 232, maxColorValue = 255),
  rgb(255, 187, 120, maxColorValue = 255),
  rgb(152, 223, 138, maxColorValue = 255),
  rgb(255, 152, 150, maxColorValue = 255),
  rgb(197, 176, 213, maxColorValue = 255)
)

(1)散点图{geom_point()}

a.基本范式

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = my_colors[1]) + 
  theme_bw()

  • 在该图中,有三个关键要素:数据集是mtcars,横轴是wt,纵轴是mpg

  • 选择了自定义主题中的第一个颜色 my_colors[1]

  • 使用了白色学术模板theme_bw()

b.分组绘制

  • 根据vs变量将散点图绘制成两组,只需在aes()选项内假如color=选项
ggplot(mtcars, aes(x = wt, y = mpg , color = factor(vs))) +
  geom_point() + 
  scale_color_manual(values = my_colors) +
  theme_bw() 

  • 要改变ggplot2legendtitle,可以使用labs()函数来设置自定义的legend标题

  • 要在ggplot2中将legend的labels从默认的01改为自定义的"type a""type b",可以使用scale_color_manual()函数的labels参数

ggplot(mtcars, aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point() + 
  scale_color_manual(values = my_colors, labels = c("color a", "color b")) +
  theme_bw() +
  labs(color = "type")

c.加拟合线

  • 只需要在已经画好的散点图基础上,加入 geom_smooth(method = "lm")
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point() + 
  scale_color_manual(values = my_colors, labels = c("color a", "color b")) +
  theme_bw() +
  labs(color = "type")+
  geom_smooth(method = "lm") 
`geom_smooth()` using formula = 'y ~ x'

  • 要让置信区间的颜色也与my_colors一致,可以使用geom_smooth()中的aes()函数,并将fill属性设置为与颜色一致

  • label与之前类似,用labels选项即可定义label1label2 的名字

ggplot(mtcars, aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point() + 
  scale_color_manual(values = my_colors, labels = c("color a", "color b")) +
  scale_fill_manual(values = my_colors, labels = c("fill a", "fill b")) +  
  theme_bw() +
  labs(color = "type") +
  geom_smooth(method = "lm", aes(fill = factor(vs)), alpha = 0.2) 
`geom_smooth()` using formula = 'y ~ x'

  • 如果不想展示某一类legend,只需在最后添加guides(fill = "none")选项
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point() + 
  scale_color_manual(values = my_colors, labels = c("color a", "color b")) +
  scale_fill_manual(values = my_colors, labels = c("fill a", "fill b")) +  
  theme_bw() +
  labs(fill = "type") +
  geom_smooth(method = "lm", aes(fill = factor(vs)), alpha = 0.2) +
  guides(color = "none")
`geom_smooth()` using formula = 'y ~ x'

d.加权散点(气泡图)

  • 只需在geom_point 选项下写入aes(size = qsec) 即可

  • qsec 相当于加权变量

  • 加入shape选项,来修改点的样式,如shape=1即为空心圆点

ggplot(mtcars, aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point(aes(size = qsec),shape = 1 ) + 
  scale_color_manual(values = my_colors, labels = c("color a", "color b")) +
  scale_fill_manual(values = my_colors, labels = c("fill a", "fill b")) +  
  theme_bw() +
  labs(fill = "type") +
  geom_smooth(method = "lm", aes(fill = factor(vs)), alpha = 0.2) +
  guides(color = "none")
`geom_smooth()` using formula = 'y ~ x'

e.坐标与标题

  • 要在现有的代码基础上添加坐标轴标签和图表标题,可以使用labs()函数

  • 在这段代码中,labs()函数被扩展以包括x = "x"y = "y"title = "title",从而设置横坐标、纵坐标的标签和图表标题

ggplot(mtcars, aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point() + 
  scale_color_manual(values = my_colors, labels = c("color a", "color b")) +
  scale_fill_manual(values = my_colors, labels = c("fill a", "fill b")) +  
  theme_bw() +
  labs(x = "x", y = "y", title = "title", fill = "type") +
  geom_smooth(method = "lm", aes(fill = factor(vs)), alpha = 0.2) +
  guides(color = "none")
`geom_smooth()` using formula = 'y ~ x'

  • 想要legend放在图片下面,只需要用guides(fill = guide_legend(nrow = 2)) 来实现

  • 当然,每次也需要调用一个list: legend_bottom,即需要定义一个theme。

legend_bottom <- theme(
  legend.position = "bottom",
  legend.box = "horizontal",
  legend.title = element_blank(),
  legend.key.size = unit(0.5, "cm"),
  legend.text = element_text(size = 8)
)


ggplot(mtcars, aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point() + 
  scale_color_manual(values = my_colors, labels = c("color a", "color b")) +
  scale_fill_manual(values = my_colors, labels = c("fill a", "fill b")) +  
  theme_bw() +
  labs(x = "x", y = "y", title = "title", fill = "type") +
  geom_smooth(method = "lm", aes(fill = factor(vs)), alpha = 0.2) +
  guides(color = "none") + 
  guides(fill = guide_legend(nrow = 1)) +legend_bottom
`geom_smooth()` using formula = 'y ~ x'

  • 要更改x轴的刻度,使其从0开始,间隔为2,到达6,可以使用scale_x_continuous()函数并设置breaks参数

  • 要更改y轴的刻度,使其从10到40,间隔为10,可以使用scale_y_continuous()函数并设置limitsbreaks参数

ggplot(mtcars, aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point() + 
  scale_color_manual(values = my_colors, labels = c("color a", "color b")) +
  scale_fill_manual(values = my_colors, labels = c("fill a", "fill b")) + 
  theme_bw() +
  labs(x = "x", y = "y", title = "title", fill = "type") +
  geom_smooth(method = "lm", aes(fill = factor(vs)), alpha = 0.2) +
  guides(color = "none") +
  scale_x_continuous(breaks = seq(0, 6, by = 2)) +
  scale_y_continuous(limits = c(10, 40), breaks = seq(10, 40, by = 10)) 
`geom_smooth()` using formula = 'y ~ x'

(2)线图{geom_line()}

a.基本范式

ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = my_colors[1]) + 
  theme_bw()

b.根据横轴分组

  • 要根据日期将数据分成两部分并分别应用不同的颜色,可以使用 dplyr 包来创建一个新的变量,表示不同时间段,然后在 ggplot2 中使用该变量进行颜色映射

  • 这种比较少见

economics <- economics %>%
  mutate(period = ifelse(date < as.Date("1990-01-01"), "before_1990", "after_1990"))

# 画图
ggplot(economics, aes(x = date, y = unemploy, color = period)) +
  geom_line() +
  scale_color_manual(values = c("before_1990" = my_colors[1], "after_1990" = my_colors[2])) +
  theme_bw() +
  labs(color = "Period")

c.根据纵轴分组

  • 可以使用 ggplot2geom_line() 函数在同一个 ggplot 对象中添加两个数据系列
ggplot(economics, aes(x = date)) +
  geom_line(aes(y = psavert, color = "psavert")) +
  geom_line(aes(y = uempmed, color = "uempmed")) +
  scale_color_manual(values = c("psavert" = my_colors[1], 
                                "uempmed" = my_colors[2])) +
  theme_bw()

  • 要在同一个图中将一个变量绘制在左轴上,另一个变量绘制在右轴上,可以使用 ggplot2 中的 sec_axis() 功能来创建第二个y轴。

    • 计算 uempmed 的缩放因子,以便与 psavert 的尺度匹配

    • 使用 geom_line(aes(y = uempmed * scale_factor, color = "uempmed")) 绘制缩放后的 uempmed 数据,使其与 psavert 的尺度匹配

    • 使用 scale_y_continuous() 设置左侧 y 轴和右侧 y 轴。sec_axis(~./scale_factor, name = "uempmed") 用于创建右侧 y 轴,并恢复 uempmed 的原始缩放

scale_factor <- max(economics$psavert) / max(economics$uempmed)

ggplot(economics, aes(x = date)) +
  geom_line(aes(y = psavert, color = "psavert")) +
  geom_line(aes(y = uempmed * scale_factor, color = "uempmed")) + 
  scale_color_manual(values = c("psavert" = my_colors[1], 
                                "uempmed" = my_colors[2])) +
  scale_y_continuous(
    name = "psavert",
    sec.axis = sec_axis(~./scale_factor, name = "uempmed")) +
  theme_bw() 

d.线状图与置信区间

  • 先生成一个数据,这里不用管
coef <- data.frame(
  year = c(2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014),
  coef = c(-0.43299584, -0.45390098, -0.48347805, -0.42459391, -0.39575535, -0.43394419, -0.39845995, -0.30237073, -0.42472867, -0.37063062, -0.25289686, -0.21151954, -0.09414755),
  ci_lower = c(-0.4422756, -0.4628580, -0.4920710, -0.4328384, -0.4039016, -0.4421753, -0.4065155, -0.3104660, -0.4326264, -0.3784235, -0.2606926, -0.2192482, -0.1018055),
  ci_upper = c(-0.42371613, -0.44494396, -0.47488511, -0.41634943, -0.38760905, -0.42571306, -0.39040442, -0.29427541, -0.41683091, -0.36283777, -0.24510107, -0.20379085, -0.08648955)
)


coef2 <- coef
coef2$coef2 <- coef2$coef + rnorm(nrow(coef2), mean = 0, sd = 0.05) + 0.1
coef2$ci_lower2 <- coef2$coef2 - abs(rnorm(nrow(coef2), mean = 0, sd = 0.01))
coef2$ci_upper2 <- coef2$coef2 + abs(rnorm(nrow(coef2), mean = 0, sd = 0.01))

coef2$ci_lower2 <- pmin(coef2$ci_lower2, coef2$coef2)
coef2$ci_upper2 <- pmax(coef2$ci_upper2, coef2$coef2)
  • 从这开始,我介绍一下关键的option

  • geom_line

    • geom_line():这个函数用于在图表中绘制线条。在这个例子中,它根据’year’和’coef’变量的值绘制了一条连续的线

    • size = 1 设置线的粗细

    • color = my_colors[1] 设置线的颜色(使用自定义颜色向量的第一个颜色)

  • geom_point

    • 这个函数用于在图表中添加散点。它在每个数据点的位置绘制一个点
    • size = 2 设置点的大小

    • color = my_colors[1] 设置点的颜色(与线条颜色相同)

  • geom_ribbon

    • geom_ribbon():这个函数用于创建一个带状区域,通常用来表示置信区间或误差范围

    • aes(ymin = ci_lower, ymax = ci_upper) 定义了带状区域的上下边界

    • fill = my_colors[1] 设置填充颜色(与线条和点的颜色相同)

    • alpha = 0.2 设置填充的透明度(0.2表示80%透明)

  • 先画一个简单的线条,后面我们依次加上关键的options

ggplot(coef, aes(x=year,y=coef)) +
  geom_line(color = my_colors[1],size=1.0) + 
  scale_x_continuous(breaks = c(2002,2006,2010,2014,2016)) + 
  theme_bw()
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

ggplot(coef, aes(x = year, y = coef, ymin = ci_lower, ymax = ci_upper)) +
  geom_line(size = 1,color = my_colors[1]) +  # 绘制线条
  geom_point(size = 2,color = my_colors[1]) + 
  geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), fill = my_colors[1], alpha = 0.2) +
  labs(title = "title",
       x = "Years",
       y = "coef.") +
  scale_x_continuous(breaks = seq(2002, 2015, by = 2)) + 
  geom_hline(yintercept = 0, linetype = "dashed", color = my_colors[2]) + 
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

  • 画两条线,就继续叠加geom_ribbongeom_linegeom_point

  • 标题居中用theme(plot.title = element_text(hjust = 0.5))来实现

ggplot(coef2, aes(x = year)) +
  # coef1
  geom_ribbon(aes(ymin=ci_lower,ymax=ci_upper),fill=my_colors[1], alpha = 0.2) +
  geom_line(aes(y = coef), size = 1, color = my_colors[1]) +
  geom_point(aes(y = coef), size = 2, color = my_colors[1]) +
  
  # coef2
  geom_ribbon(aes(ymin=ci_lower2, ymax=ci_upper2), fill=my_colors[2],alpha = 0.2) +
  geom_line(aes(y = coef2), size = 1, color = my_colors[2]) +
  geom_point(aes(y = coef2), size = 2, color = my_colors[2]) +
  
  labs(title = "Comparison of Two Coefficient Sets",
       x = "Years",
       y = "Coefficients") +
  scale_x_continuous(breaks = seq(2002, 2015, by = 2)) + 
  geom_hline(yintercept = 0, linetype = "dashed", color = my_colors[2]) + 
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

(3)柱状图{geom_bar()}

a.基本范式

ggplot(mpg, aes(x = class)) +
  geom_bar() + 
  theme_bw()

b.分组绘制

ggplot(mpg, aes(x = class,color=drv,fill=drv)) +
  geom_bar() + 
  scale_color_manual(values = my_colors) +
  scale_fill_manual(values = my_colors) +
  theme_bw()

c.横向绘制

ggplot(mpg, aes(x = class, color = drv, fill = drv)) +
  geom_bar() + 
  scale_color_manual(values = my_colors) +
  scale_fill_manual(values = my_colors) +
  theme_bw() +
  coord_flip() +  # 添加这行来翻转坐标轴
  labs(x = "Class", y = "Count", title = "Vehicle Class by Drive Type") +
  theme(legend.position = "bottom")  

d.常见位置选项

  • barplot常见的分组position有三个

    • dodge(躲避):

      • position = "dodge"position = position_dodge()

      • 作用:将同一 x 值(或 y 值,对于水平条形图)的不同组的元素并排放置,避免重叠。

      • 常用场景:分组条形图,当你想比较不同组在同一类别中的数值时

    • stack(堆叠):

      • position = "stack"position = position_stack()

      • 作用:将同一 x 值的不同组的元素垂直堆叠在一起。

      • 常用场景:当你想显示总量,同时也展示各部分的构成时。

    • fill(填充):

      • position = "fill"position = position_fill()

      • 作用:类似于 stack,但会将每组数据标准化,使总高度相同(通常为 1)

      • 常用场景:当你想比较不同类别中各组的相对比例时

f1<-ggplot(mpg, aes(x = class,color=drv,fill=drv)) +
  geom_bar() + 
  scale_color_manual(values = my_colors) +
  scale_fill_manual(values = my_colors) +
  theme_bw() + labs(title="default")

f2<-ggplot(mpg, aes(x = class,color=drv,fill=drv)) +
  geom_bar(position = "dodge") + 
  scale_color_manual(values = my_colors) +
  scale_fill_manual(values = my_colors) +
  theme_bw() + labs(title="dodge")

f3<-ggplot(mpg, aes(x = class,color=drv,fill=drv)) +
  geom_bar(position = "stack") + 
  scale_color_manual(values = my_colors) +
  scale_fill_manual(values = my_colors) +
  theme_bw() + labs(title="stack")

f4<-ggplot(mpg, aes(x = class,color=drv,fill=drv)) +
  geom_bar(position = "fill") + 
  scale_color_manual(values = my_colors) +
  scale_fill_manual(values = my_colors) +
  theme_bw() + labs(title="fill")


fig <- grid.arrange(f1,f2,f3,f4 , ncol=2)

(4)密度图{geom_density()}和直方图{geom_histogram()}

  • 从功能上看,密度图与直方图都是为了反应某一变量的分布情况,因此这这里放到一起去进行学习

a.基本范式

p1 <- ggplot(mpg, aes(x = hwy)) +
  geom_density() + 
  theme_bw()


p2 <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram() + 
  theme_bw()

combined_plot <- grid.arrange(p1, p2, ncol = 2)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# ggsave("combined_plot.png", combined_plot, width = 35, height = 14, units = "cm")

b.分组绘制

mpg$year <- as.factor(mpg$year)

# 密度图
p1 <- ggplot(mpg, aes(x = hwy, color = year, fill = year)) +
  geom_density(alpha = 0.3) + 
  scale_color_manual(values = my_colors) +
  scale_fill_manual(values = my_colors) +
  theme_bw() +
  labs(title = "Density")+
  theme(legend.position = "bottom")

# 直方图
p2 <- ggplot(mpg, aes(x = hwy, fill = year)) +
  geom_histogram(position = "dodge", alpha = 0.7, bins = 30) + 
  scale_fill_manual(values = my_colors) +
  theme_bw() +
  labs(title = "Histogram")+
  theme(legend.position = "bottom")


fig <- grid.arrange(p1,p2,ncol=2)

c.直方图选项

  • 在这里补充一下position的选项

    • 如果geom_histogram 之后,没有加position = "dodge",那么R就默认堆叠柱状图

    • 常见的position dodge

      • 作用:将同组的元素并排放置,避免重叠。

      • 常用于:分组条形图、分组点图等。

      • 示例:position = "dodge"position = position_dodge(width = 0.9)

    • stack

    -    作用:将同组的元素堆叠在一起。
    
    -    常用于:堆叠条形图、面积图等。
    
    -    示例:`position = "stack"` (这是许多几何对象的默认设置)
    • fill
    -    作用:类似于 `stack`,但会将每组数据标准化为相同的高度(percentage)。
    
    -    常用于:显示比例而非绝对数值的堆叠图。
    
    -    示例:`position = "fill"`
    • jitter
    -   作用:给数据点添加少量随机噪声,避免点的重叠。
    
    -    常用于:散点图中有大量重叠点的情况。
    
    -    示例:`position = "jitter"` 或 `position = position_jitter(width = 0.1)`
    • identity
    -    作用:保持原始位置不变。
    
    -    用途:当你想精确控制每个元素的位置时使用。
    
    -    示例:`position = "identity"`
    • nudge
    -    作用:将元素向特定方向微调。
    
    -    常用于:调整标签位置以避免与数据点重叠。
    
    -    示例:`position = position_nudge(x = 0.1, y = 0.1)`
p1<-ggplot(diamonds, aes(price, fill = cut)) +
  geom_histogram(binwidth = 500) + 
  theme_bw() + labs(title="p1")

p2<-ggplot(diamonds, aes(price, fill = cut)) +
  geom_histogram(position = "dodge",binwidth = 500) + 
  theme_bw() + labs(title="p2")

p3<-ggplot(diamonds, aes(price, fill = cut)) +
  geom_histogram(position = "stack",binwidth = 500) + 
  theme_bw() + labs(title="p3")

p4<-ggplot(diamonds, aes(price, fill = cut)) +
  geom_histogram(position = "fill",binwidth = 500) + 
  theme_bw() + labs(title="p4")


fig <- grid.arrange(p1,p2,p3,p4 , ncol=2)

(5)箱线图{geom_boxplot()}和小提琴图{geom_violin()}

  • 箱线图和小提琴图本质上就是分组density,查看或对比不同组数据分布时常用

a.基本范式

  • 这里我还使用 theme() 函数中的 axis.text.x 参数,并将 element_text(angle = 45, hjust = 1) 应用于 ggplot 对象
  • 目的是让横轴与xaxis夹角为45°
f1 <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

f2 <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin()+
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

fig <- grid.arrange(f1,f2, ncol=2)

b.箱线图+散点

  • 基本范式
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  geom_jitter(width = 0.2) +
  theme_bw()

  • 调整颜色模板
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(fill = my_colors[1], color = my_colors[1] , alpha = 0.6) + 
  geom_jitter(width = 0.2, color = my_colors[2], alpha = 0.6) +  
  theme_bw() +
  labs(x = "Vehicle Class", 
       y = "Highway MPG", 
       title = "Highway MPG by Vehicle Class") 

(6)热力图{geom_tile()}

ggplot(mpg, aes(x = factor(manufacturer), y = factor(class))) +
  geom_tile(aes(fill = displ)) + 
  theme_bw()

(7)面积图{geom_area()}

ggplot(economics, aes(x = date, y = unemploy)) +
  geom_area(color = my_colors[1] ,fill =  my_colors[1]) + 
  theme_bw()