Find all possible overlapping prefixes in a word using Python

Question

Many natural languages have prefixes that add some meaning to a word, for example: anti in antivirus, co in coordinator, counter in counterpart.

Detecting the stem requires these prefixes to be separated. Suppose we have a list of prefixes for a certain language:

prefix_list = ['c', 'ca', 'ata', 'de']

How can I match all possible overlapping occurrences of these prefixes at the start of the word "catastrophic"?

The result should be: ['c', 'ca']

What I tried:

The | (alternation) character doesn't support overlapping matches.
Otto's solution doesn't match overlaps at the beginning of the word.
I tried using a look-behind assertion instead in the previous solution, but look-behind requires a fixed-width pattern.
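The first trial can be reproduced roughly as follows. This is a reconstruction, assuming the alternation was built from prefix_list and wrapped in a zero-width look-ahead so that overlapping matches are allowed; the original code was not posted. It shows both problems: at each position the alternation reports only the first alternative that matches (so 'ca' is lost behind 'c'), and an unanchored search also reports 'ata' from the middle of the word.

```python
import re

word = "catastrophic"
prefix_list = ['c', 'ca', 'ata', 'de']

# Alternation built from the prefix list (escaped in case a prefix holds
# regex metacharacters), inside a look-ahead so findall can overlap matches.
alternation = "|".join(map(re.escape, prefix_list))

# Unanchored: tried at every index, one alternative per index.
unanchored = re.findall("(?=(" + alternation + "))", word)
print(unanchored)  # ['c', 'ata', 'c']

# Anchored at the start: still only the first matching alternative.
anchored = re.findall("^(?=(" + alternation + "))", word)
print(anchored)  # ['c']
```

Anchoring with ^ removes the mid-word hits, but a single regex match still returns one alternative per position, which is why 'ca' never appears.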

Notes:

'ata' can't be a result, because the word doesn't start with 'ata'.

It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive. — Martijn Pieters, Aug 03 '13 at 13:35
@MartijnPieters I have edited the question adding more details — MYaser, Aug 03 '13 at 14:11

score 1 · Answer 1 · answered Aug 03 '13 at 13:36


Don't use a regular expression. Use a list comprehension instead:

[prefix for prefix in prefix_list if word.startswith(prefix)]

This creates a list of all entries in prefix_list that are a prefix of word.
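A self-contained version of the one-liner above, wrapped in a function for reuse. The name matching_prefixes is a hypothetical helper added for illustration; the body is the list comprehension from the answer. Because str.startswith only compares at index 0, a prefix such as 'ata' that occurs later in the word is never reported, which satisfies the note in the question.

```python
def matching_prefixes(word, prefix_list):
    """Return every entry of prefix_list that word starts with.

    Hypothetical wrapper around the answer's list comprehension; results
    keep the order in which the prefixes appear in prefix_list.
    """
    return [prefix for prefix in prefix_list if word.startswith(prefix)]


prefix_list = ['c', 'ca', 'ata', 'de']
print(matching_prefixes("catastrophic", prefix_list))  # ['c', 'ca']
print(matching_prefixes("decoder", prefix_list))       # ['de']
```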

Martijn Pieters

Wouldn't that solution hurt performance? – MYaser Aug 03 '13 at 14:12
That depends on the length of the prefixes list; regular expressions can easily be slower. – Martijn Pieters Aug 03 '13 at 14:16
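If the prefix list is large and many words are processed, one option (a swapped-in technique, not part of the answer) is a prefix trie: build it once, then walk each word character by character, so the per-word cost depends on the length of the longest matching prefix rather than on the number of prefixes. The names build_trie, prefixes_in and the END sentinel are hypothetical. This is a sketch, not benchmarked here; for a handful of prefixes the list comprehension is likely just as fast.

```python
END = None  # sentinel key marking a stored prefix; never equal to a character


def build_trie(prefixes):
    """Build a nested-dict trie from an iterable of prefix strings."""
    root = {}
    for prefix in prefixes:
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[END] = prefix  # remember which prefix ends at this node
    return root


def prefixes_in(word, trie):
    """Return every stored prefix that word starts with, shortest first."""
    node = trie
    found = [node[END]] if END in node else []  # handles an empty prefix ''
    for ch in word:
        node = node.get(ch)
        if node is None:  # no stored prefix continues with this character
            break
        if END in node:
            found.append(node[END])
    return found


trie = build_trie(['c', 'ca', 'ata', 'de'])
print(prefixes_in("catastrophic", trie))  # ['c', 'ca']
```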
