r - When using ggplot2, can I set the color of histogram bars without potentially obscuring low values?
When calling geom_histogram() with the color and fill arguments, ggplot2 confusingly paints the whole x-axis range, making it impossible to visually distinguish between a low value and a 0 value.
Running the following code:
ggplot(esubset, aes(x=exectime)) + geom_histogram(binwidth = 0.5) + theme_bw() + scale_x_continuous(breaks=seq(0,20), limits=c(0,20))
The result is visually unappealing. To fix that, I'd instead use
ggplot(esubset, aes(x=exectime)) + geom_histogram(binwidth = 0.5, colour='black', fill='gray') + theme_bw() + scale_x_continuous(breaks=seq(0,20), limits=c(0,20))
With that, the problem is that I have no way of telling whether exectime contains values past 10: a few occurrences of 12, for example, would be hidden behind the horizontal line that spans the whole x-axis.
Use coord_cartesian instead of scale_x_continuous. coord_cartesian sets the axis range without affecting how the data are plotted. Even with coord_cartesian, you can still use scale_x_continuous to set the breaks, but coord_cartesian will override any effect of scale_x_continuous on how the data are plotted.
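To make the difference concrete, here is a minimal sketch that compares the binned counts under the two approaches. The toy data frame and its values are made up for illustration; layer_data() is the ggplot2 helper that returns the data a layer computes.

```r
library(ggplot2)

# Hypothetical toy data: 100 values at 10 plus three outliers at 18
toy <- data.frame(value = c(rep(10, 100), rep(18, 3)))

p <- ggplot(toy, aes(value)) + geom_histogram(binwidth = 0.5)

# Scale limits: values outside [5, 15] are turned into NA and dropped
# before binning (ggplot2 warns about the removed rows)
scaled <- suppressWarnings(
  layer_data(p + scale_x_continuous(limits = c(5, 15)))
)

# Coordinate limits: every value is binned, only the visible window changes
zoomed <- layer_data(p + coord_cartesian(xlim = c(5, 15)))

sum(scaled$count)  # total count without the three outliers
sum(zoomed$count)  # total count including the three outliers
```

Comparing the two totals shows that limits set on the scale change what the histogram is computed from, while limits set on the coordinate system only change what is visible.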
In the fake data below, note that I've added data for a few small bars.
set.seed(4958)
dat = data.frame(value=c(rnorm(5000, 10, 1), rep(15:20,1:6)))

ggplot(dat, aes(value)) +
  geom_histogram(binwidth=0.5, color="black", fill="grey") +
  theme_bw() +
  scale_x_continuous(limits=c(5,25), breaks=5:25) +
  ggtitle("scale_x_continuous")

ggplot(dat, aes(value)) +
  geom_histogram(binwidth=0.5, color="black", fill="grey") +
  theme_bw() +
  coord_cartesian(xlim=c(5,25)) +
  scale_x_continuous(breaks=5:25) +
  ggtitle("coord_cartesian")
As you can see in the plots above, if there are bins with count=0 within the data range, ggplot will add a zero-line, even with coord_cartesian. This makes it difficult to see the bar at 15, which has height=1. You can make the border thinner with the lwd argument ("linewidth") so that the smaller bars will be less obscured:
ggplot(dat, aes(value)) + geom_histogram(binwidth=0.5, color="black", fill="grey", lwd=0.3) + theme_bw() + coord_cartesian(xlim=c(5,25)) + scale_x_continuous(breaks=5:25) + ggtitle("coord_cartesian")
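On recent ggplot2 releases the border width can also be given by its full name. The sketch below assumes ggplot2 3.4.0 or later, where linewidth is the aesthetic that controls line and border widths; the object name p_thin is only for illustration.

```r
library(ggplot2)

# Same fake data as in the answer
set.seed(4958)
dat <- data.frame(value = c(rnorm(5000, 10, 1), rep(15:20, 1:6)))

# Assumes ggplot2 >= 3.4.0, where `linewidth` sets the bar border width
p_thin <- ggplot(dat, aes(value)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "grey",
                 linewidth = 0.3) +
  theme_bw() +
  coord_cartesian(xlim = c(5, 25)) +
  scale_x_continuous(breaks = 5:25)

p_thin
```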
One other option is to pre-summarise the data and plot using geom_bar in order to get spaces between the bars and thereby avoid the need for border lines to mark the bar edges:
library(dplyr)
library(tidyr)
library(zoo)

bins = seq(floor(min(dat$value)) - 1.75, ceiling(max(dat$value)) + 1.25, 0.5)

dat.binned = dat %>%
  count(bin=cut(value, bins, right=FALSE)) %>%    # Bin the data
  complete(bin, fill=list(n=0)) %>%               # Restore empty bins and fill with zeros
  mutate(bin = rollmean(bins,2)[-length(bins)])   # Convert bin from factor to numeric value = mean of bin range

ggplot(dat.binned, aes(bin, n)) +
  geom_bar(stat="identity", fill=hcl(240,100,30)) +
  theme_bw() +
  scale_x_continuous(breaks=0:21)
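If you'd rather not pull in dplyr, tidyr and zoo, the same pre-binning idea can probably be done with base R. This is only a sketch of an alternative, not part of the approach above: it uses hist() with plot = FALSE to get counts and bin midpoints, and geom_col(), which is equivalent to geom_bar(stat="identity"). The names h, dat.hist and p_gap are made up for illustration.

```r
library(ggplot2)

# Same fake data and bin edges as in the answer
set.seed(4958)
dat <- data.frame(value = c(rnorm(5000, 10, 1), rep(15:20, 1:6)))
bins <- seq(floor(min(dat$value)) - 1.75, ceiling(max(dat$value)) + 1.25, 0.5)

# hist() with plot = FALSE only computes the binning;
# empty bins are kept with a count of zero
h <- hist(dat$value, breaks = bins, right = FALSE, plot = FALSE)
dat.hist <- data.frame(bin = h$mids, n = h$counts)

# geom_col() leaves a gap between bars by default, so no border lines are needed
p_gap <- ggplot(dat.hist, aes(bin, n)) +
  geom_col(fill = hcl(240, 100, 30)) +
  theme_bw() +
  scale_x_continuous(breaks = 0:21)

p_gap
```

Because empty bins carry an explicit zero, a bin with one observation shows up as a short bar rather than being confused with a gap.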