Patubd, there is a lot going on and I am afraid that comments will not suffice to get you going, so I will try to point out a few things here.
You are not providing a reproducible example. Thus, I "simulate" some data upfront. You can adapt this to your liking.
In your ggplot() calls you refer to the g dataframe. There is no need to then use the explicit g$variable notation.
You do the same in your MeanMarketCap pipe. I guess that is part of the problems you face.
data:
library(dplyr)
library(ggplot2) # needed for the plots below
set.seed(666) # set seed for random generator
# ------------------- data frame with 60 examples of industry group SIC and MarketCap
df <- data.frame(
SIC = rep(c("0","1","2"), 20)
, MarketCap = c(rep(50, 30), rep(1000, 15), rep(2000, 10), rep(3000, 5))
)
# ------------------- add 15 random picks to make it less homogeneous
df <- df %>%
bind_rows(df %>% sample_n(15))
(I) "less colourful" and/or facets
fig1 <- ggplot(data = df, aes(x=MarketCap, group = SIC, fill=SIC)) +
geom_histogram(position = "dodge") +
#------------- as proposed to make graph less colourful / shades of grey ---------
scale_fill_grey() +
#---------------------------------------------------------------------------------
theme_bw() + xlim(0,5000) +
labs(x = "Market Value (in Millions $)", title = "Market Value per Industry")
# make a 2nd plot by facetting above
# If the plot is stored in an object, i.e. fig1, you do not have to "repeat" the code
# and just add the facet-layer
fig2 <- fig1 + facet_grid(. ~ SIC)
library(patchwork) # cool package to combine plots
fig1 / fig2 # puts one plot above the other
With a facet you break out the groups. This supports side-by-side analysis ... and the colouring of the group is less important as this is now part of the facetting. But you can combine both as shown.
(II) summary mean
Your code will work, if you do not use the df$variable notation. This breaks the group-by call and you refer to the full data frame.
df %>%
group_by(SIC) %>%
summarise(MeanMarketCap = mean(MarketCap))
With the (simplistic) simulated data, this yields:
# A tibble: 3 x 2
SIC MeanMarketCap
<chr> <dbl>
1 0 858.
2 1 876.
3 2 858.
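To see why the $ notation breaks the grouping, here is a minimal sketch on made-up data. The `toy` data frame and its values are assumptions for illustration only, not your data.

```r
library(dplyr)

# Hypothetical toy data: two groups with clearly different means
toy <- data.frame(
  SIC = c("0", "0", "1", "1"),
  MarketCap = c(10, 20, 100, 200)
)

# toy$MarketCap pulls the full column and ignores the grouping,
# so every group gets the overall mean
wrong <- toy %>%
  group_by(SIC) %>%
  summarise(MeanMarketCap = mean(toy$MarketCap))

# the bare column name is evaluated within each group
right <- toy %>%
  group_by(SIC) %>%
  summarise(MeanMarketCap = mean(MarketCap))

wrong$MeanMarketCap # 82.5 82.5
right$MeanMarketCap # 15 150
```

The first version returns the same number for every SIC, which is probably what you were seeing.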
To show distributions one can use boxplots. Boxplots show the inter-quartile range (25th to 75th percentile) and the median (50th percentile).
You can use geom_boxplot() for this. ggplot will take care of the statistical calculation.
df %>%
ggplot() +
geom_boxplot(aes(x = SIC, y = MarketCap))
With your data (more varied data points) the plot will look a bit more impressive.
But you can already clearly see the difference in the median across the example industries, SIC.
If you feel like it, you can add your data points on top of the boxplot with geom_jitter().
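A minimal sketch of what that could look like. The `toy` data frame is made up for illustration, and the `width`, `alpha` and `outlier.shape` settings are just one reasonable choice, so adapt them to your data.

```r
library(ggplot2)

# Hypothetical toy data standing in for your data frame
toy <- data.frame(
  SIC = rep(c("0", "1", "2"), each = 10),
  MarketCap = c(seq(50, 500, by = 50),
                seq(100, 1000, by = 100),
                seq(200, 2000, by = 200))
)

p <- ggplot(toy, aes(x = SIC, y = MarketCap)) +
  # hide the boxplot's own outlier markers so points are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # jitter horizontally only, so the y values stay truthful
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)

p
```

Setting height = 0 matters here: the default jitter also moves points vertically, which would distort the MarketCap values.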
Hope this gets you started. Good luck!