
I work for an org that has a number of internal packages that were created many years ago. These are in the form of package zip archives that were compiled on Windows on R 3.x. Therefore, they can't be installed on R 4.x, and can't be used on Macs or Linux either without being recompiled. So everyone in the entire org is stuck on R 3.6 until this is resolved. I don't have access to the original package source files. They are lost to time....

I want to take these packages, extract the code and data, and update them to modern best practices (roxygen, GitHub repos, testthat, etc.). What is the best way of doing this? I have a fair amount of experience with package development, and I have already tackled one package. I started a new RStudio package project and went function by function, copying each function's code into a new script file and reformatting its help page from the help browser as roxygen docs. I did the same for any internal hidden functions I could find (mostly via `pkg_name:::`), and also for the internal datasets. That is all fairly straightforward, but very time consuming. The package builds OK, but I haven't yet tested the actual functionality of the code.
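The internal datasets do not necessarily have to be copied by hand. A minimal sketch, assuming the legacy package is installed and loads in the R 3.6 session: it treats every non-function object in the package namespace (minus R's own `.__...` bookkeeping entries and `.packageName`) as internal data and saves all of them into `R/sysdata.rda`, which is where internal data lives in a source package. `save_internal_data()` is a hypothetical helper name; exported datasets in `data/` are lazy-loaded separately and are not covered by this sketch.

```r
# Hypothetical helper: save all non-function objects of a package
# namespace (its internal data) into a sysdata.rda file
save_internal_data <- function(package_name, file = file.path("R", "sysdata.rda")) {
  ns <- asNamespace(package_name)
  objs <- ls(ns, all.names = TRUE)

  # Drop namespace bookkeeping and S4 metadata (.__NAMESPACE__., .__C__..., etc.)
  objs <- objs[!grepl("^\\.__", objs) & objs != ".packageName"]

  # Keep only the objects that are not functions
  is_fun <- vapply(objs, function(x) is.function(get(x, envir = ns)), logical(1))
  data_objs <- objs[!is_fun]

  if (length(data_objs) > 0) {
    dir.create(dirname(file), showWarnings = FALSE, recursive = TRUE)
    save(list = data_objs, envir = ns, file = file, compress = "bzip2")
  }

  # Return the names of the saved objects for inspection
  data_objs
}
```

Calling `save_internal_data("pkg_name")` from the root of the new package project would write `R/sysdata.rda` and return the names it saved.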

I'm currently stuck because there are a couple of `standardGeneric` method functions for custom S4 class objects. I am completely unfamiliar with these and haven't been able to figure out how to copy them over. When I view their source code, they are wrapped in `new()` with `"standardGeneric"` as the first argument (plus a lot more, obviously), as opposed to being a simple function definition like all the other functions. Any help with how to recreate or copy these over would be very welcome.
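For readers unfamiliar with S4, here is a minimal, self-contained sketch of what such an object is. The class `Shape`, its slot `side` and the generic `area` are hypothetical stand-ins, not taken from the original packages. It shows why a generic deparses as `new("standardGeneric", ...)` and how its formal arguments, method signatures and method bodies can be pulled out with the `methods` package, so they can be rewritten as ordinary `setClass()`, `setGeneric()` and `setMethod()` calls. For the real package, `getGenerics(where = asNamespace("pkg_name"))` and `getClasses(where = asNamespace("pkg_name"))` should list the generics and classes that need recreating (`"pkg_name"` is a placeholder).

```r
library(methods)

# The kind of source code to aim for: hypothetical class, generic and method
setClass("Shape", representation(side = "numeric"))
setGeneric("area", function(obj, ...) standardGeneric("area"))
setMethod("area", "Shape", function(obj, ...) obj@side^2)

# The generic is an S4 object rather than a plain closure,
# which is why deparsing it shows new("standardGeneric", ...)
is(area, "standardGeneric")   # TRUE
isGeneric("area")             # TRUE

# The pieces needed to rewrite it as source code:
# the generic's formal arguments ...
formals(area)
# ... the signatures of all methods defined for it ...
showMethods("area")
# ... and the function behind one particular method
m <- getMethod("area", "Shape")
body(m)                       # obj@side^2

# Class definitions can be inspected in the same way
getSlots("Shape")             # c(side = "numeric")
```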

But maybe I am going about this the wrong way in the first place. I haven't been able to find any helpful suggestions about how to "back engineer" R package source files from a compiled version.

Does anyone have any ideas?

hokeybot
  • I doubt I have sufficient expertise for this, but shouldn't it be possible to iterate over the namespace without copying things by hand? – Greg Nov 11 '21 at 15:24
  • In the R Internals manual (https://cran.r-project.org/doc/manuals/r-patched/R-ints.pdf), pages 21-22 discuss how the R source files are converted when a package is compiled. I guess that if the package you are trying to rebuild has some compiled C/C++ code, there won't be much that you can do. – nicola Nov 11 '21 at 16:18
  • There's not a one-size-fits-all way to do this. Without any sort of [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) it's really impossible to offer specific suggestions. Every package is different and without knowing exactly what the requirements are or how to measure "success", this really can't be answered. – MrFlick Nov 11 '21 at 18:58
  • What might `dput()` do for these custom objects and for their methods? This answer [here](https://stackoverflow.com/a/3474049) should handle the complexities of reproducing nested objects, if `dput()` falls short. – Greg Nov 16 '21 at 20:18
  • @hokeybot Just bountied this question. I've wrestled with some reverse-engineering myself, and I'm curious as to the answer (if any) for the `standardGeneric`s and custom objects. **I suggest you post a snippet of SOURCE CODE for one of those items, so any "bounty hunters" have a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to work with**. – Greg Nov 23 '21 at 22:29

1 Answer


Check whether this works in R 3.6.

The script below can automate at least part of your problem by writing the source of every function into a separate, appropriately named .R file. It also takes care of hidden functions.

Extracting code

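# Use your package name
package_name <- "dplyr"

# Use the package namespace: it also contains hidden (non-exported)
# functions, whereas "package:<name>" only lists exported ones
ns <- asNamespace(package_name)

# Extract all function names, including hidden ones
# (S4 generics also show up here and deparse as new("standardGeneric", ...))
nms <- as.character(lsf.str(envir = ns, all.names = TRUE))

# Loop through the function names,
# deparse each function and write it to its own R file
for (nm in nms) {

    # Get the function object from the namespace
    # (args()/body() on a bare name only search attached packages)
    fn <- get(nm, envir = ns)

    # Deparse the full definition: arguments and body
    src <- deparse(fn)

    # Prepend the assignment; backticks keep non-syntactic
    # names such as `names<-` or `%op%` valid R code
    src[1] <- paste0("`", nm, "` <- ", src[1])

    # Replace characters that are awkward in file names
    file_name <- paste0(gsub("[^A-Za-z0-9._-]", "_", nm), ".R")

    # Write all to file
    writeLines(src, file_name)
}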
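Since the extracted code has not been tested yet, a cheap first check is to source each written file back in and compare it with the function still installed in the old package. A minimal sketch, assuming the `.R` files sit in `dir`, that each one contains an assignment of the form `name <- function(...)`, and that the old package is still installed; `check_extracted()` is a hypothetical helper. A `FALSE` entry flags a file or function that did not survive the round trip (S4 generics, for example, usually fail to parse) and needs a manual look. This only compares code, not behaviour; testthat tests comparing old and new outputs are still needed for that.

```r
# Hypothetical helper: re-parse extracted .R files and compare each
# function they define with the original in the installed package
check_extracted <- function(package_name, dir = ".") {
  ns <- asNamespace(package_name)
  files <- list.files(dir, pattern = "\\.R$", full.names = TRUE)
  ok <- logical(0)

  for (f in files) {
    # Source into a fresh environment so nothing leaks into the workspace
    env <- new.env()
    res <- tryCatch(sys.source(f, envir = env), error = function(e) e)
    if (inherits(res, "error")) {
      # The file does not even parse or evaluate
      ok[basename(f)] <- FALSE
      next
    }

    for (nm in ls(env, all.names = TRUE)) {
      new_fn <- get(nm, envir = env)
      if (!exists(nm, envir = ns, inherits = FALSE)) {
        ok[nm] <- FALSE
        next
      }
      old_fn <- get(nm, envir = ns)
      # identical() ignores srcref attributes by default
      ok[nm] <- is.function(new_fn) &&
        identical(formals(old_fn), formals(new_fn)) &&
        identical(body(old_fn), body(new_fn))
    }
  }

  ok
}
```

`names(which(!check_extracted("pkg_name")))` would then list the entries that need attention.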

Extracting help files

To extract a function's help text in a similar way, you can adapt code from the following SO answers:

A starting point could be something like:

library(tools)
package_name <- "dplyr"

# Named list of parsed Rd objects, one element per help file
db <- Rd_db(package_name)

# Extract all function names, including hidden ones
nms <- as.character(lsf.str(envir = asNamespace(package_name), all.names = TRUE))

# Loop through the function names,
# extract Rd contents if they exist in this namespace,
# and write them to new Rd files
for (nm in nms) {

    # Match the help file name case-insensitively (".Rd" or ".rd")
    rd_name <- names(db)[tolower(names(db)) == tolower(paste0(nm, ".Rd"))]
    if (length(rd_name) > 0) {
        # Convert the parsed Rd back to text; deparse = TRUE keeps
        # special characters such as % escaped
        rd <- as.character(db[[rd_name[1]]], deparse = TRUE)
        # Write all to file
        cat(rd, sep = "", file = paste0(nm, ".Rd"))
    }
}
Roman
  • You can capture a function with a single line: `fn |> base::dput() |> utils::capture.output() |> base::paste0(collapse = "\n")` – Greg Nov 15 '21 at 14:45
  • Good point! Didn't think of using `dput` here. – Roman Nov 15 '21 at 15:22
  • Thank you, this is incredibly helpful! I only had to edit the help extraction code slightly (`".rd"` not `".Rd"`). Now I just need to figure out how to recreate the S4 custom generic... – hokeybot Nov 18 '21 at 08:58
  • @hokeybot To build off [@Roman](https://stackoverflow.com/users/9406040/roman)'s suggestion for help files, you might leverage the [**`Rd2roxygen`**](https://cran.r-project.org/web/packages/Rd2roxygen/vignettes/Rd2roxygen.html) package, to convert those `.rd` files back into [roxygen](https://roxygen2.r-lib.org/) comments. – Greg Dec 01 '21 at 21:53
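Following up on the Rd2roxygen suggestion, a minimal sketch of converting one help file into roxygen comment lines, assuming Rd2roxygen is installed. It uses the package's `parse_file()` and `create_roxygen()` functions; the file `add_one.Rd` and its contents are a made-up example standing in for one of the extracted `.Rd` files. The generated lines will likely still need a manual review before being pasted above the matching function.

```r
library(Rd2roxygen)

# A made-up minimal help file standing in for an extracted .Rd file
rd_file <- file.path(tempdir(), "add_one.Rd")
writeLines(c(
  "\\name{add_one}",
  "\\alias{add_one}",
  "\\title{Add one}",
  "\\description{Adds one to a number.}",
  "\\usage{add_one(x)}",
  "\\arguments{\\item{x}{A number.}}",
  "\\value{The number plus one.}"
), rd_file)

# Use the modern #' prefix for the generated comment lines
options(roxygen.comment = "#' ")

# Parse the Rd file into a list of fields, then turn it into roxygen lines
info <- parse_file(rd_file)
roxy <- create_roxygen(info)

cat(roxy, sep = "\n")
```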