3

SQLDF newbie here.

I have a data frame which has about 15,000 rows and 1 column. The data looks like:

cars
autocar
carsinfo
whatisthat
donnadrive
car
telephone
...

I wanted to use the package sqldf to loop through the column and pick all values which contain "car" anywhere in their value. However, the following code generates an error.

> sqldf("SELECT Keyword FROM dat WHERE Keyword="car")
Error: unexpected symbol in "sqldf("SELECT Keyword FROM dat WHERE Keyword="car"

There is no unexpected symbol, so I'm not sure whats wrong.

so first, I want to know all the values which contain 'car'. then I want to know only those values which contain just 'car' by itself.

Can anyone help.

EDIT:

allright, there was an unexpected symbol, but it only gives me just car and not every row which contains 'car'.

> sqldf("SELECT Keyword FROM dat WHERE Keyword='car'")
  Keyword
1     car
ATMathew
  • 11,974
  • 24
  • 67
  • 74

4 Answers4

7

Using = will only return exact matches.

You should probably use the like operator combined with the wildcards % or _. The % wildcard will match multiple characters, while _ matches a single character.

Something like the following will find all instances of car, e.g. "cars", "motorcar", etc:

sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")

And the following will match "car" or "cars":

sqldf("SELECT Keyword FROM dat WHERE Keyword like 'car_'")
Andrie
  • 170,733
  • 42
  • 434
  • 486
3

This has nothing to do with sqldf; your SQL statement is the problem. You need:

dat <- data.frame(Keyword=c("cars","autocar","carsinfo",
  "whatisthat","donnadrive","car","telephone"))
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
#    Keyword
# 1     cars
# 2  autocar
# 3 carsinfo
# 4      car
Joshua Ulrich
  • 168,168
  • 29
  • 327
  • 408
2

You can also use regular expressions to do this sort of filtering. grepl returns a logical vector (TRUE / FALSE) stating whether or not there was a match or not. You can get very sophisticated to match specific items, but a basic query will work in this case:

#Using @Joshua's dat data.frame
subset(dat, grepl("car", Keyword, ignore.case = TRUE))

   Keyword
1     cars
2  autocar
3 carsinfo
6      car
Chase
  • 65,190
  • 17
  • 140
  • 160
0

Very similar to the solution provided by @Chase. Because we do not use subset we do not need a logical vector and can use both grep or grepl:

df <- data.frame(keyword = c("cars", "autocar", "carsinfo", "whatisthat", "donnadrive", "car", "telephone"))
df[grep("car", df$keyword), , drop = FALSE] # or
df[grepl("car", df$keyword), , drop = FALSE]

   keyword
1     cars
2  autocar
3 carsinfo
6      car

I took the idea from Selecting rows where a column has a string like 'hsa..' (partial string match)

Community
  • 1
  • 1
mpalanco
  • 11,967
  • 2
  • 55
  • 64