R: Order Bars in ggplot2 bar graph

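I am trying to make a bar graph where the largest bar is nearest to the y axis and the shortest bar is furthest from it. This is roughly what my table looks like:

  Name  Position
1 James Goalkeeper
2 Frank Goalkeeper
3 Jean  Defense
4 Steve Defense
5 John  Defense
6 Tim   Striker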

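So I am trying to build a bar graph that shows the number of players by position:

p <- ggplot(theTable, aes(x = Position)) + geom_bar(binwidth = 1)

However, the graph shows the goalkeeper bar first, then the defense, and finally the striker. I would like the graph to be ordered so that the defense bar is closest to the y axis, followed by the goalkeeper bar, and finally the striker bar.
Thanks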

Related discussion

@GavinSimpson: reorder is a powerful and effective solution for this:

ggplot(theTable,
       aes(x = reorder(Position, Position,
                       function(x) -length(x)))) +
  geom_bar()
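
To see why this works, it can help to look at what reorder() does to the factor levels on its own, outside of ggplot. The following is a minimal base R sketch; the vector pos is made-up illustration data (not the question's table), chosen so that the frequency order differs from the alphabetical order.

```r
# Hypothetical data: "b" occurs 3 times, "c" twice, "a" once
pos <- factor(c("b", "a", "b", "c", "b", "c"))

# Default levels are alphabetical
default_levels <- levels(pos)

# reorder() computes one score per level (here: minus the group size)
# and sorts the levels by increasing score, so the largest group comes first
by_count <- reorder(pos, pos, function(x) -length(x))
count_levels <- levels(by_count)

print(default_levels)
print(count_levels)
```

ggplot2 draws a discrete x axis in level order, so the reordered levels are what move the bars.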

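Related discussion

The key with ordering is to set the levels of the factor in the order you want. An ordered factor is not required: the extra information in an ordered factor isn't necessary, and if these data are used in a statistical model, the wrong parametrisation might result, because polynomial contrasts aren't right for nominal data such as this.

## set the levels in the order we want
theTable <- within(theTable,
                   Position <- factor(Position,
                                      levels = names(sort(table(Position),
                                                          decreasing = TRUE))))
## plot (binwidth dropped: geom_bar() does not take it in current ggplot2)
ggplot(theTable, aes(x = Position)) + geom_bar()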

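[Figure: barplot]

In the most general sense, we simply need to set the factor levels to be in the desired order. If left unspecified, the levels of a factor are sorted alphabetically. You can specify the level order within the call to factor as above, and other ways are possible as well.

theTable$Position <- factor(theTable$Position, levels = c(...))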
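
The point about ordered factors can be checked directly. The sketch below uses only base R and the Position values from the question; the variable names are illustrative, and it assumes R's default contrasts option (contr.treatment for unordered factors, contr.poly for ordered ones).

```r
# Position values from the question
pos <- c("Goalkeeper", "Goalkeeper", "Defense", "Defense", "Defense", "Striker")
lev <- names(sort(table(pos), decreasing = TRUE))

nominal <- factor(pos, levels = lev)    # unordered factor with a custom level order
ranked  <- ordered(pos, levels = lev)   # ordered factor with the same level order

# Both carry the same level order, which is all the plot needs
same_levels <- identical(levels(nominal), levels(ranked))

# In a model they are coded differently (assuming default options):
# treatment contrasts for the unordered factor, polynomial for the ordered one
nominal_contrasts <- colnames(contrasts(nominal))
ranked_contrasts  <- colnames(contrasts(ranked))

print(same_levels)
print(nominal_contrasts)
print(ranked_contrasts)
```

The polynomial columns (.L, .Q) only make sense when the levels have a genuine rank, which playing positions do not.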

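Related discussion

Use scale_x_discrete(limits = ...) to specify the order of bars.

positions <- c("Goalkeeper", "Defense", "Striker")
# geom_bar() added so that the snippet draws the bars on its own
p <- ggplot(theTable, aes(x = Position)) +
  geom_bar() +
  scale_x_discrete(limits = positions)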

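Related discussion

I think the solutions already provided are overly verbose. A more concise way to do a frequency-sorted barplot with ggplot is

ggplot(theTable, aes(x = reorder(Position, -table(Position)[Position]))) + geom_bar()

It is similar to what Alex Brown suggested, but a bit shorter, and it works without any function definition.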

Update

I think my old solution was good at the time, but nowadays I would rather use forcats::fct_infreq, which sorts factor levels by frequency:

library(forcats)

ggplot(theTable, aes(fct_infreq(Position))) + geom_bar()

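Related discussion

Like reorder() in Alex Brown's answer, we could also use forcats::fct_reorder(). It sorts the levels of the factor given in the first argument according to the values in the second argument, after applying a summary function to them (the default is the median, which is what we use here, since there is just one value per factor level).

Unfortunately, in the OP's question the required order is also the alphabetical order, which is the default sort order when you create a factor, so it hides what this function is actually doing. To make it clearer, I'll replace "Goalkeeper" with "Zoalkeeper".

library(tidyverse)
library(forcats)

theTable <- data.frame(
  Name = c('James', 'Frank', 'Jean', 'Steve', 'John', 'Tim'),
  Position = c('Zoalkeeper', 'Zoalkeeper', 'Defense',
               'Defense', 'Defense', 'Striker'))

theTable %>%
  count(Position) %>%
  mutate(Position = fct_reorder(Position, n, .desc = TRUE)) %>%
  ggplot(aes(x = Position, y = n)) + geom_bar(stat = 'identity')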

Related discussion

A simple dplyr-based reordering of factors can solve this problem:

library(dplyr)
library(ggplot2)

# reorder the table and reset the factor to that ordering
theTable %>%
  group_by(Position) %>%                            # calculate the counts
  summarize(counts = n()) %>%
  arrange(-counts) %>%                              # sort by counts
  mutate(Position = factor(Position, Position)) %>% # reset factor
  ggplot(aes(x = Position, y = counts)) +           # plot
  geom_bar(stat = "identity")                       # bar heights taken from counts

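You just need to specify the Position column to be an ordered factor whose levels are ordered by their counts:

theTable <- transform(theTable,
                      Position = ordered(Position,
                                         levels = names(sort(-table(Position)))))

(Note that table(Position) produces a frequency count of the Position column.)

Your ggplot call will then show the bars in decreasing order of count.
I don't know whether there is an option in geom_bar to do this without explicitly creating an ordered factor.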
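
The intermediate values in that one-liner are easy to inspect. The following base R sketch uses the Position values from the question (the variable names are illustrative) and walks through the count, the sort, and the resulting level order.

```r
pos <- c("Goalkeeper", "Goalkeeper", "Defense", "Defense", "Defense", "Striker")

counts <- table(pos)            # frequency count per position
print(counts)

# Negating the counts and sorting ascending puts the largest count first
ordered_names <- names(sort(-counts))
print(ordered_names)

# Those names become the factor levels, most frequent first
pos_f <- factor(pos, levels = ordered_names)
print(levels(pos_f))
```

A plain factor() is used in the last step to show that the level order alone carries the sorting; ordered() would give the same levels.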

Related discussion

In addition to forcats::fct_infreq, mentioned by @HolgerBrandl, there is forcats::fct_rev, which reverses the factor order.

library(ggplot2)
library(forcats)

theTable <- data.frame(
  Position =
    c("Zoalkeeper", "Zoalkeeper", "Defense",
      "Defense", "Defense", "Striker"),
  Name = c("James", "Frank", "Jean",
           "Steve", "John", "Tim"))

p1 <- ggplot(theTable, aes(x = Position)) + geom_bar()
p2 <- ggplot(theTable, aes(x = fct_infreq(Position))) + geom_bar()
p3 <- ggplot(theTable, aes(x = fct_rev(fct_infreq(Position)))) + geom_bar()

gridExtra::grid.arrange(p1, p2, p3, nrow = 3)

[Figure: ggplot output]

Related discussion

I agree with zach that counting within dplyr is the best solution. I've found this to be the shortest version:

dplyr::count(theTable, Position) %>%
  arrange(-n) %>%
  mutate(Position = factor(Position, Position)) %>%
  ggplot(aes(x = Position, y = n)) + geom_bar(stat = "identity")

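Since the count is done in dplyr rather than in ggplot or with table, this will be much faster than reordering the factor levels beforehand.

If the chart columns come from a numeric variable, as in the data frame below, you can use a simpler solution:

ggplot(df, aes(x = reorder(Colors, -Qty, sum), y = Qty)) +
  geom_bar(stat = "identity")

The minus sign before the sort variable (-Qty) controls the sort direction (ascending/descending).

Here is some data for testing:

df <- data.frame(Colors = c("Green", "Yellow", "Blue", "Red", "Yellow", "Blue"),
                 Qty = c(7, 4, 5, 1, 3, 6))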

Sample data:
  Colors Qty
1 Green    7
2 Yellow   4
3 Blue     5
4 Red      1
5 Yellow   3
6 Blue     6

When I found this thread, that was the answer I was looking for. Hope it's useful for others.
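
The effect of the sign can be checked on the test data without plotting. This is a small base R sketch; the names totals, desc_levels and asc_levels are illustrative only.

```r
df <- data.frame(Colors = c("Green", "Yellow", "Blue", "Red", "Yellow", "Blue"),
                 Qty = c(7, 4, 5, 1, 3, 6))
colors_f <- factor(df$Colors)

# Total quantity per colour: this is the score that reorder() sorts by
totals <- tapply(df$Qty, colors_f, sum)
print(totals)

# -Qty: largest total first (descending bars)
desc_levels <- levels(reorder(colors_f, -df$Qty, sum))
# Qty: smallest total first (ascending bars)
asc_levels <- levels(reorder(colors_f, df$Qty, sum))

print(desc_levels)
print(asc_levels)
```

Green and Yellow both total 7 in this data, so their relative position is a tie and should not be read as meaningful.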

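Since we are only looking at the distribution of a single variable ("Position"), as opposed to the relationship between two variables, a histogram might be the more appropriate graph. ggplot has geom_histogram(), which makes this easy:

ggplot(theTable, aes(x = Position)) + geom_histogram(stat = "count")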

Using geom_histogram():

I think geom_histogram() is a little quirky, as it treats continuous and discrete data differently.

For continuous data, you can just use geom_histogram() with no parameters.
For example, if we add a numeric vector "Score"...

  Name  Position   Score
1 James Goalkeeper    10
2 Frank Goalkeeper    20
3 Jean  Defense       10
4 Steve Defense       10
5 John  Defense       20
6 Tim   Striker       50

and use geom_histogram() on the "Score" variable...

ggplot(theTable, aes(x = Score)) + geom_histogram()

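For discrete data like "Position", we have to specify a statistic computed from the aesthetic to give the y value for the height of the bars, using stat = "count":

ggplot(theTable, aes(x = Position)) + geom_histogram(stat = "count")

Note: curiously and confusingly, you can also use stat = "count" for continuous data, and I think it gives a more aesthetically pleasing graph.

ggplot(theTable, aes(x = Score)) + geom_histogram(stat = "count")

Edit: Extended answer in response to DebanjanB's helpful suggestions.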

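Related discussion

Another alternative is to use reorder to order the levels of the factor, in ascending (n) or descending (-n) order based on the count. It is very similar to the version that uses fct_reorder from the forcats package:

Descending order

library(dplyr)
library(ggplot2)

df %>%
  count(Position) %>%
  ggplot(aes(x = reorder(Position, -n), y = n)) +
  geom_bar(stat = 'identity') +
  xlab("Position")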

Ascending order

df %>%
  count(Position) %>%
  ggplot(aes(x = reorder(Position, n), y = n)) +
  geom_bar(stat = 'identity') +
  xlab("Position")

Data frame:

df <- structure(list(Position = structure(c(3L, 3L, 1L, 1L, 1L, 2L),
                                          .Label = c("Defense", "Striker", "Zoalkeeper"),
                                          class = "factor"),
                     Name = structure(c(2L, 1L, 3L, 5L, 4L, 6L),
                                      .Label = c("Frank", "James", "Jean", "John", "Steve", "Tim"),
                                      class = "factor")),
                class = "data.frame", row.names = c(NA, -6L))