
I have a dataset in which some cells contain several neighborhoods strung together, for example:

Allendale/Irvington/S. Hilton and Beechfield/Ten Hills/West Hills, etc.

Each of these cells is associated with other columns of data in the same row.

I would like to take those neighborhoods and use a split function to get

Allendale
Irvington
S. Hilton
Beechfield
Ten Hills

but I also want to copy the data down, so that the column data for Allendale, Irvington, and S. Hilton are the same!

Then I'll just sort it back to alphabetical order.

I'm a novice and google most of what I do, so if you could also kind of explain what you're doing, that would help a great deal!

Brian Tompsett - 汤莱恩
  • Try `strsplit` and specify the `split`. Please format your example and expected output. – akrun Jul 16 '15 at 18:40
  • Please provide a minimal example as in http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 and specify what your expected output would look like for that example. – mts Jul 16 '15 at 18:44
  • As the commentators above mentioned: specify the expected output. I can venture an approximate answer: do it in two steps per row. In the first step you visit the cell or cells that need splitting and do the splitting (e.g., values – Marcelo Bielsa Jul 16 '15 at 18:49
  • Use `dput` on your data, which will make it copy-pastable. – Dean MacGregor Jul 16 '15 at 19:18
  • Thanks everyone! Jaap got me! – Corinne Wiesner Jul 20 '15 at 16:08

1 Answer


You can do that with the `cSplit` function of the `splitstackshape` package:

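# create some dummy data (stringsAsFactors = FALSE keeps `area` as character in R < 4.0)
df <- data.frame(n = c(12, 15),
                 area = c("Allendale/Irvington/S. Hilton", "Beechfield/Ten Hills/West Hills"),
                 stringsAsFactors = FALSE)

# split `area` on "/" and convert to long format (one row per neighborhood)
library(splitstackshape)
df.new <- cSplit(df, splitCols = "area", sep = "/", direction = "long", type.convert = TRUE)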

The result:

> df.new
    n       area
1: 12  Allendale
2: 12  Irvington
3: 12  S. Hilton
4: 15 Beechfield
5: 15  Ten Hills
6: 15 West Hills
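Since you asked what is going on under the hood: the same reshaping can be sketched in base R with no extra packages. This is only an illustration of the general split-and-repeat idea, not what `cSplit` does internally; it assumes `area` is a character column (hence `stringsAsFactors = FALSE`) and that every other column should simply be copied for each piece:

```r
# Same dummy data as above; stringsAsFactors = FALSE keeps `area` as character
df <- data.frame(n = c(12, 15),
                 area = c("Allendale/Irvington/S. Hilton",
                          "Beechfield/Ten Hills/West Hills"),
                 stringsAsFactors = FALSE)

# 1. Split each cell on "/"; fixed = TRUE treats "/" literally, not as a regex.
#    The result is a list holding one character vector per original row.
pieces <- strsplit(df$area, "/", fixed = TRUE)

# 2. Count how many neighborhoods each original row produced.
counts <- lengths(pieces)

# 3. Repeat each original row once per neighborhood ("copy the data down"),
#    then overwrite `area` with the individual neighborhood names.
df.long <- df[rep(seq_len(nrow(df)), counts), , drop = FALSE]
df.long$area <- unlist(pieces)
rownames(df.long) <- NULL

print(df.long)
```

Because `unlist(pieces)` keeps the pieces in row order, they line up with the repeated rows produced by `rep()`.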

An alternative is to use the `tstrsplit` function from the `data.table` package:

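library(data.table)
# setDT() converts df to a data.table by reference; then, within each value of n,
# every other column is split on "/" and flattened into one value per row
dt.new <- setDT(df)[, lapply(.SD, function(x) unlist(tstrsplit(x, "/", fixed=TRUE))), by=n]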

This gives:

> dt.new
    n       area
1: 12  Allendale
2: 12  Irvington
3: 12  S. Hilton
4: 15 Beechfield
5: 15  Ten Hills
6: 15 West Hills
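As an aside on the last step in the question (sorting back into alphabetical order): both `cSplit` and the `data.table` approach return a data.table, so a minimal sketch of the sort could look like this. The table is rebuilt by hand here only so the snippet runs on its own. Note that `data.table` orders character columns in the C locale (uppercase before lowercase), which makes no difference for these capitalised names:

```r
library(data.table)

# Long-format table shaped like dt.new above, rebuilt so this runs standalone
dt.new <- data.table(n = c(12, 12, 12, 15, 15, 15),
                     area = c("Allendale", "Irvington", "S. Hilton",
                              "Beechfield", "Ten Hills", "West Hills"))

# Option 1: return a sorted copy, leaving dt.new unchanged
sorted <- dt.new[order(area)]

# Option 2: sort dt.new itself in place (by reference), without making a copy
setorder(dt.new, area)

print(dt.new)
```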

You can also use:

dt.new <- setDT(df)[, strsplit(area,"/",fixed=TRUE), by=n]

but that does not preserve the variable name (i.e. `area`); the split column comes back named `V1` instead.
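If you prefer the shorter `strsplit` form but want to keep the name, one possible variant (a sketch, not tested against every `data.table` version) wraps the result in `.()` and flattens it with `unlist()`. Like the versions above, it assumes `n` is unique per original row; if two rows shared the same `n`, grouping by it would merge them, and you would need a unique row id to group by instead:

```r
library(data.table)

df <- data.frame(n = c(12, 15),
                 area = c("Allendale/Irvington/S. Hilton",
                          "Beechfield/Ten Hills/West Hills"),
                 stringsAsFactors = FALSE)

# Within each group of n, strsplit() returns a list of character vectors;
# unlist() flattens it, and .(area = ...) gives the result column its name.
dt.named <- setDT(df)[, .(area = unlist(strsplit(area, "/", fixed = TRUE))), by = n]

print(dt.named)
```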

Jaap