
I am working with fuzzy keyword matching. The first dataset consists of 20,180 rows and the second of about 10,000 rows. I am using the .apply method to find matches, with a progress bar to track iterations per second; it reports around 2-3 iterations per second. How do I increase the speed, or is there a better approach than this code for fuzzy matching that gives faster results?

df1['match']=df1['title'].progress_apply(lambda x: process.extractOne(x,df_conm['conm'].to_list(),score_cutoff=100))
df1


    Please post code as text, not image. Be copy/paste friendly. – tdelaney Sep 09 '21 at 22:56
  • 1
    One simple improvement is to compute `lst = df_conm['conm'].to_list()`, then use lst in `df1['match'] = ...`. This way you're not recomputing df_conm['conm'].to_list() for every row in df1 (i.e. 20180 rows). – DarrylG Sep 09 '21 at 23:13
  • Still the same; it is taking around 3 hours to get the output. – Achillies Sep 09 '21 at 23:23
  • There was no improvement in time when you changed to `df1['match'] = df1['title'].progress_apply(lambda x: process.extractOne(x, lst, score_cutoff=100))`? – DarrylG Sep 09 '21 at 23:29
  • A bigger improvement can be obtained by using [rapidfuzz](https://github.com/maxbachmann/RapidFuzz) (rather than fuzzywuzzy) as illustrated by [Is there a way to modify this code to reduce run time?](https://stackoverflow.com/questions/68483600/is-there-a-way-to-modify-this-code-to-reduce-run-time/68494221?r=SearchResults&s=1|8.9407#68494221) – DarrylG Sep 09 '21 at 23:39
  • 1
    As a side note it makes no sense to use a score_cutoff of 100. This means that only exact matches will be considered. – maxbachmann Sep 10 '21 at 09:51
  • I’m voting to close this question because it belongs to https://codereview.stackexchange.com/ – stackprotector Sep 16 '21 at 06:03

0 Answers