12

I want to find the indices of all occurrences of a substring in a larger string using Ruby. For example, all occurrences of "in" in "Einstein":

str = "Einstein"
str.index("in") # returns only the first index: 1
str.scan("in")  # returns ["in", "in"]
# desired output would be [1, 6]
Dimitar
Mokhtar

4 Answers

20

The standard hack is:

indices = "Einstein".enum_for(:scan, /(?=in)/).map do
  Regexp.last_match.offset(0).first
end
#=> [1, 6]
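
The lookahead (?=in) is what lets this approach report overlapping matches: it matches at a position without consuming any characters. The following is a minimal sketch of that difference, wrapping the same idiom in a helper; the name match_starts is hypothetical and not part of the answer above.

```ruby
# Hypothetical helper: start index of every match that scan finds.
def match_starts(str, pattern)
  str.enum_for(:scan, pattern).map { Regexp.last_match.begin(0) }
end

# A plain pattern consumes the matched text, so matches cannot overlap.
p match_starts("nnnn", /nn/)      #=> [0, 2]

# A zero-width lookahead consumes nothing, so every start position is tried.
p match_starts("nnnn", /(?=nn)/)  #=> [0, 1, 2]
```

For "Einstein" both patterns give [1, 6], because the two occurrences of "in" do not overlap.
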
tokland
8
def indices_of_matches(str, target)
  sz = target.size
  (0..str.size-sz).select { |i| str[i,sz] == target }
end

indices_of_matches('Einstein', 'in')
  #=> [1, 6]
indices_of_matches('nnnn', 'nn')
  #=> [0, 1, 2]

The second example reflects my assumption that overlapping matches should be counted. If they should not be (i.e., [0, 2] is the desired return value in the second example), this answer is not appropriate.
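
If non-overlapping matches are wanted instead, one possible sketch is to search with String#index and resume after the end of each match. The method name non_overlapping_indices is an assumption made for illustration, as is the choice to return [] for an empty target.

```ruby
# Hypothetical variant: indices of non-overlapping occurrences of target.
def non_overlapping_indices(str, target)
  return [] if target.empty? # assumption: an empty target has no matches

  result = []
  pos = 0
  # String#index(substring, offset) returns nil once nothing more is found.
  while (i = str.index(target, pos))
    result << i
    pos = i + target.size # skip past the whole match
  end
  result
end

p non_overlapping_indices('Einstein', 'in') #=> [1, 6]
p non_overlapping_indices('nnnn', 'nn')     #=> [0, 2]
```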

Cary Swoveland
6

This more verbose solution has the advantage of not relying on a global value:

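def indices(string, regex)
  Enumerator.new do |yielder|
    # Reset inside the block so the enumerator can be iterated more than once.
    position = 0
    # Assumes regex never matches the empty string (that would loop forever).
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      # Resume after the end of this match (non-overlapping matches).
      position = match.end(0)
    end
  end
end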

p indices("Einstein", /in/).to_a
# [1, 6]

It returns an Enumerator, so you can also use it lazily or take only the first n indices.
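
As a rough sketch of that usage, the block below restates the method so that it runs on its own, then takes the first two indices and chains a lazy filter. It assumes position is initialised inside the Enumerator block, so that each enumeration starts from the beginning of the string; the sample string is made up for illustration.

```ruby
def indices(string, regex)
  Enumerator.new do |yielder|
    position = 0 # assumption: reset per enumeration so the enumerator is reusable
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = match.end(0)
    end
  end
end

enum = indices("Einstein in Berlin", /in/)

# Stops searching as soon as two indices have been produced.
p enum.first(2)                      #=> [1, 6]

# Lazy chain: only as many matches as needed are computed.
p enum.lazy.select(&:even?).first(2) #=> [6, 16]
```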

Also, if you need more information than just the indices, you could return an Enumerator of MatchData objects and extract the indices from them:

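def matches(string, regex)
  Enumerator.new do |yielder|
    # Reset inside the block so the enumerator can be iterated more than once.
    position = 0
    while (match = regex.match(string, position))
      yielder << match
      # Resume after the end of this match (non-overlapping matches).
      position = match.end(0)
    end
  end
end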

p matches("Einstein", /in/).map{ |match| match.begin(0) }
# [1, 6]

To get the behaviour described by @Cary (overlapping matches are counted), replace the last line of the while loop with position = match.begin(0) + 1.
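
A minimal sketch of that change, under the assumption that the method is otherwise unchanged; the name overlapping_indices is hypothetical.

```ruby
# Hypothetical variant: the next search starts one character after the
# start of the previous match, so overlapping matches are reported too.
def overlapping_indices(string, regex)
  Enumerator.new do |yielder|
    position = 0
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = match.begin(0) + 1
    end
  end
end

p overlapping_indices("nnnn", /nn/).to_a     #=> [0, 1, 2]
p overlapping_indices("Einstein", /in/).to_a #=> [1, 6]
```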

Eric Duminil
1

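A recursive function:

    def indexes(string, sub_string, start = 0)
      # String#index accepts a start offset, so no substring needs to be sliced off
      index = string.index(sub_string, start)
      return [] unless index
      # Continue one character later, so overlapping matches are included
      [index] + indexes(string, sub_string, index + 1)
    end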

For more convenient usage, I would reopen the String class:

  class String

    def indexes(sub_string, start = 0)
      # Search self from the given offset; nil means there are no more matches
      index = index(sub_string, start)
      return [] unless index
      [index] + indexes(sub_string, index + 1)
    end

  end

Then we can call it like this: "Einstein".indexes("in") #=> [1, 6]

Qaisar Nadeem