0

My problem is that I can not understand the error message of this environment. I think it is very vague. Now I do not understand where the problem is.

EXEC sp_execute_external_script
  @language = N'R',
  @script = N'
    count = 0; x=1; y=2; m="that is good until here"
    data = as.vector(data);
    for(i in data){
        if(data[y]>data[x]){count=count+1; x=x+1; y=y+1}
        else{x=x+1; y=y+1}};
    count <- data.frame(count)',
    @output_data_1_name = N'count',
    @input_data_1_name = N'data',
    @input_data_1 = N'SELECT alcohol FROM [wine].[dbo].[wineT]'

enter image description here

enter image description here

marc_s
  • 704,970
  • 168
  • 1,303
  • 1,425
  • 1
    Can you get this R code to work on the R console without sql-server or ssms? – r2evans Dec 13 '21 at 13:45
  • yes, but it's not for main work. they use this way just for test. And I run this code to Rstudio and it's work perfectly. – Mahdi Khalil Nejad Dec 14 '21 at 09:16
  • 1
    Odd. If `data` is a `data.frame`, then `as.vector(data)` is also a frame, which means that `for(i in data)` will iterative over each column (i.e., the only column). With that, on the first and only iteration of the `for` loop, `i` is a vector representing whatever `alcohol` is, we'll say a vector of `numeric`, length greater than 1. Continuing, `data[y]` is `data[2]`, which ***fails*** because `y` is 2 and `data` is a single column frame. Forgive me for being skeptical, but your question is not reproducible, and your assertion that it works seems suspect. – r2evans Dec 14 '21 at 12:43
  • 1
    I suggest you make the R portion of this script reproducible by providing sample data. Please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Then [edit] your question and add the output from `dput(x)` where `x` is a small representative sample. The please show the actual (expected) output on the console without sql. – r2evans Dec 14 '21 at 12:45
  • yes, you Really right... and I change "as.vector" to "unlist" and then I get error of "missing value where true/false needed" and then I fix this error, and guess what? bingo. It solved. by the way I haven't good knowledge in programming. and thanks for guiding me. – Mahdi Khalil Nejad Dec 14 '21 at 15:34

2 Answers2

1

Untested, try this:

EXEC sp_execute_external_script
  @language = N'R',
  @script = N'
    data = unlist(data);
    count = data.frame(count = sum(data[-length(data)] > data[-1]);',
  @output_data_1_name = N'count',
  @input_data_1_name = N'data',
  @input_data_1 = N'SELECT alcohol FROM [wine].[dbo].[wineT]'

Issues:

  1. as.vector does not do much to a data.frame, ergo the shift to unlist(data);

  2. Your missing value error is because you extend y out beyond the length of data. For instance, on the R console I can reproduce the error with this:

    for (i in data) { if (data[y] > data[x]) { count=count+1; x=x+1; y=y+1} else {x=x+1; y=y+1} }
    # Error in if (data[y] > data[x]) { (from #1) : missing value where TRUE/FALSE needed
    count
    # [1] 4
    x
    # [1] 10
    y
    # [1] 11
    

    Since length(data) is 10, then data[y] is data[11] is NA. This leads to a conditional of NA > 3 which returns NA which does not work in an if conditional. (FYI, an if conditional must always be length-1, and it must be clearly "truthy", meaning TRUE or FALSE, or a number where 0 is false and anything else is true.)

  3. An alternative to this creates i as an index on data starting at 2.

    count <- 0
    for(i in seq_along(data)[-1]) { if (data[i-1] > data[i]) { count=count+1 }; x=x+1; y=y+1; }
    count
    # [1] 4
    

    where seq_along(data) produces (in this example) 1:10, but [-1] removes the first 1, so we can index safely from 2 until the length of data.

  4. Better yet, though, is that we don't need to loop at all: all you want to do is compare each value (except the first) with the preceding value and count how many times the previous number is greater. R vectorizes very well, so we can determine in one expression which meet that condition, and sum them up just as quickly.

    data
    #  a1  a2  a3  a4  a5  a6  a7  a8  a9 a10 
    #   1   5  10   8   2   4   6   9   7   3 
    data[-length(data)] > data[-1]
    #    a1    a2    a3    a4    a5    a6    a7    a8    a9 
    # FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE 
    

    and sum(..) that up to get our needed result.

r2evans
  • 108,754
  • 5
  • 72
  • 122
  • I run your answer and I get a Error, so then I change your code a little. I get out "count" calculation from data.frame function and then calculate data.frame of "count". And this time I got the result without Error. but I think the result isn't TRUE. – Mahdi Khalil Nejad Dec 15 '21 at 09:22
  • I went to answer first question of Advent of Code 2021... so I know the answer is 1451, but your code give me the 548. I know that I can use just R, not R script in SQL... but it is for training. by the way I really appreciate your helping – Mahdi Khalil Nejad Dec 15 '21 at 09:45
  • 1
    I would be able to help more if your question included known sample data and the expected result. I tested with rather simple data (`data – r2evans Dec 15 '21 at 11:04
0

I know it is not a tidy and efficient answer but I get the right answer with this code.

  EXEC sp_execute_external_script
      @language = N"R",
      @script = N"
        count=0; x=1; y=2; z=NA;
        data = unlist(data);
        for(i in data){
            if(is.na(z)){z=FALSE}else{
            if(data[y]>data[x]){count=count+1; x=x+1; y=y+1}
            else{x=x+1; y=y+1}}};
        count <- data.frame(count)",
        @output_data_1_name = N"count",
        @input_data_1_name = N"data",
        @input_data_1 = N"SELECT column1 FROM [wine].[dbo].[data]"