
Many natural languages have prefixes that add meaning to a word, for example: anti in antivirus, co in coordinator, counter in counterpart.

Detecting the stem requires separating these prefixes. Suppose we have a list of prefixes for a certain language:

prefix_list = ['c', 'ca', 'ata', 'de']

How can I match all possible overlapping prefixes at the start of a word such as "catastrophic"?

The result should be: ['c', 'ca']

Trials:

  • The | character (alternation) doesn't support overlapping matches
  • Otto's solution doesn't match overlaps at the beginning of the word
  • I tried using a look-behind assertion instead in the previous solution, but look-behind requires a fixed-width pattern
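The first trial can be reproduced with a minimal sketch. It assumes the prefixes are joined into a single alternation with `re.escape` (that construction is an assumption for illustration, not taken from the trials above). At each position the regex engine commits to the first alternative that matches, so an overlapping prefix such as 'ca' is never reported:

```python
import re

prefix_list = ['c', 'ca', 'ata', 'de']
word = "catastrophic"

# Build one alternation pattern: c|ca|ata|de
pattern = re.compile("|".join(map(re.escape, prefix_list)))

# Anchored at the start of the word, only the first matching
# alternative is returned, so 'ca' is missed.
first = pattern.match(word)
anchored = first.group(0) if first else None
print(anchored)  # c

# findall scans left to right without overlapping, so it also
# picks up 'ata' in the middle of the word and the final 'c'.
scanned = pattern.findall(word)
print(scanned)  # ['c', 'ata', 'c']
```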

notes:

  • ata can't be a result as the word doesn't start with ata
MYaser
  • It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive. – Martijn Pieters Aug 03 '13 at 13:35
  • Why is `ata` not in the result? – Rohit Jain Aug 03 '13 at 13:35
  • @MartijnPieters I have edited the question adding more details – MYaser Aug 03 '13 at 14:11

1 Answer


Don't use a regular expression. Use a list comprehension instead:

[prefix for prefix in prefix_list if word.startswith(prefix)]

This creates a list of all entries in prefix_list that are a prefix of word.
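As a runnable sketch of the same idea, the comprehension can be wrapped in a small function and applied to the word from the question. The function name `matching_prefixes` is hypothetical and used here only for illustration:

```python
def matching_prefixes(word, prefix_list):
    """Return every entry of prefix_list that word starts with."""
    return [prefix for prefix in prefix_list if word.startswith(prefix)]


prefix_list = ['c', 'ca', 'ata', 'de']
result = matching_prefixes("catastrophic", prefix_list)
print(result)  # ['c', 'ca']
```

Because `str.startswith` only tests the beginning of the word, 'ata' is excluded even though it occurs later in "catastrophic", and overlapping prefixes such as 'c' and 'ca' are both kept. The result follows the order of `prefix_list`.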

Martijn Pieters