0

I'm very new to R (using RStudio) coming from a background in Excel using long vlookup and sumif expressions (which i'd use for this problem). However want to use R as the dataset SALES is so large. Is there a way to do the following in R?

I have a large dataframe called SALES with columns such as the following with example data:

 - **ID,  COUNTRY_CODE,  INVOICE_DATE,   NET_AMOUNT,  COUNTRY_NAME  ... etc** 
 - 1,         GB,        05/03/2017,       128,           ... 
 - 2,         GB,        05/03/2017,        58, 
 - 3,         FR,        04/03/2017,        200, 
 - 4,         US,        03/03/2017,       139,           ... 
 - 5,         GB,        03/03/2017,       152, 
 - ... 
 - 65492,     FR,        21/01/2011,        91, 
 - 65492,     GB,        20/01/2011,       136,

I want to sum the NET_AMOUNT from the SALES data frame, per month and per year, for each unique country code and recorded this in a new dataframe DF_SALES_BY_COUNTRY.

I have started by creating a new dataframe DF_SALES_BY_COUNTRY containing unique values for the COUNTRY_CODE and COUNTRY_NAME:

COUNTRY_NAME_UNIQUE <-unique(SALES$`COUNTRY_NAME`, incomparables = FALSE)
COUNTRY_CODE_UNIQUE <- unique(SALES$`COUNTRY_CODE`, incomparables = FALSE)
DF_SALES_BY_COUNTRY<- data.frame(COUNTRY_NAME_UNIQUE,COUNTRY_CODE_UNIQUE)

,then I get stuck:

  • How to sum the NET_AMOUNT for each COUNTRY_CODE per month and year (based on the INVOICE_DATE in the SALES dataframe) and adding it to a new column in DF_SALES_BY_COUNTRY

Any advice on how to, or even if ive started correctly would be great. Thanks!

Sotos
  • 47,396
  • 5
  • 31
  • 61
  • 1
    Consider using `aggregate`. Something along the lines of `aggregate(NET_AMOUNT ~ COUNTRY_CODE + MONTH + YEAR, FUN = sum, data = xy)`... – Roman Luštrik Jul 06 '17 at 13:58
  • Hi, thanks for suggesting using aggregate. When investigating how to I found this solution: DF_SALES_BY_COUNTRY$NET_AMOUNT_TO_DATE – Charlie Jul 07 '17 at 07:51

0 Answers0