Only extract words with 2 characters from a pandas Series

Question

Is there a way in pandas to extract, from a column of strings, only the words that are exactly 2 characters long?

For example:

Singapore SG Jalan ID Indonesia Malaysia MY

The expected result is:

SG ID MY

score 4 · Accepted Answer · answered Mar 05 '18 at 07:35

Use str.findall with a regex, then str.join to combine the matches:

df['B'] = df['A'].str.findall(r'\b[a-zA-Z]{2}\b').str.join(' ')
print(df)
                                             A         B
0  Singapore SG Jalan ID Indonesia Malaysia MY  SG ID MY
1                          Singapore SG Jalan         SG
2                        Singapore Malaysia MY        MY
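For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The third row and the uppercase-only column `B_upper` are assumptions added for illustration, not part of the original answer; they show that `[a-zA-Z]{2}` also picks up ordinary two-letter words such as "to".

```python
import pandas as pd

df = pd.DataFrame({
    "A": [
        "Singapore SG Jalan ID Indonesia Malaysia MY",
        "Singapore SG Jalan",
        "Flight to Kuala Lumpur MY",  # made-up row with a lowercase 2-letter word
    ]
})

# Any two letters, as in the answer: findall gives a list per row, join makes it a string.
df["B"] = df["A"].str.findall(r"\b[a-zA-Z]{2}\b").str.join(" ")

# Assumed variant: uppercase only, if just the country codes are wanted.
df["B_upper"] = df["A"].str.findall(r"\b[A-Z]{2}\b").str.join(" ")

print(df[["B", "B_upper"]])
```

If the column only ever contains uppercase codes among longer words, both patterns should give the same result.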

score 1 · Answer 2 · answered Mar 05 '18 at 07:37

This might help.

df["short"] = df["test"].apply(lambda x: " ".join([i for i in x.split() if len(i) == 2]))

Output:

                                          test     short
0  Singapore SG Jalan ID Indonesia Malaysia MY  SG ID MY
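One thing to keep in mind with the split-based filter: it measures the length of each whitespace-separated token, punctuation included. The sketch below uses plain Python on a made-up string (not from the question) to compare it with the regex from the accepted answer.

```python
import re

text = "Singapore (SG), Jalan ID -- Malaysia MY."

# Whitespace split: tokens keep their punctuation, so length is measured on e.g. "(SG),".
by_split = [tok for tok in text.split() if len(tok) == 2]

# Regex: \b anchors on word boundaries, so punctuation around a code is ignored.
by_regex = re.findall(r"\b[a-zA-Z]{2}\b", text)

print(by_split)  # ['ID', '--']
print(by_regex)  # ['SG', 'ID', 'MY']
```

For clean, space-separated data like the example in the question, the two approaches should agree.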

Rakesh

score 1 · Answer 3 · answered Mar 05 '18 at 07:38

You can use this:

import pandas as pd

df = pd.DataFrame({'a': ['Singapore SG Jalan ID', 'SG Jalan ID Indonesia Malaysia MY']})

                                   a
0              Singapore SG Jalan ID
1  SG Jalan ID Indonesia Malaysia MY

df['a1'] = df['a'].str.findall(r'\b\S\S\b')

Output:

                                   a            a1
0              Singapore SG Jalan ID      [SG, ID]
1  SG Jalan ID Indonesia Malaysia MY  [SG, ID, MY]

score 1 · Answer 4 · answered Mar 05 '18 at 08:54

Using pd.Series.str.replace

df.assign(B=df.A.str.replace(r'(\s*\w{3,}\s*)+', ' ', regex=True).str.strip())

                                             A         B
0  Singapore SG Jalan ID Indonesia Malaysia MY  SG ID MY
1                           Singapore SG Jalan        SG
2                        Singapore Malaysia MY        MY
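Note that this pattern removes words of 3 or more characters rather than selecting words of exactly 2, so single-character words survive as well. A small sketch with `re.sub`, assuming the pandas string method applies the regex the same way; `drop_long_words` is a hypothetical helper and the second input is made up:

```python
import re

pattern = r"(\s*\w{3,}\s*)+"  # same pattern as in the answer above

def drop_long_words(text):
    # Hypothetical helper: replace each run of 3+ character words with one space.
    return re.sub(pattern, " ", text).strip()

print(drop_long_words("Singapore SG Jalan ID Indonesia Malaysia MY"))  # SG ID MY
print(drop_long_words("Plan A Singapore SG"))                          # A SG
```

If 1-character words can occur in the data, the findall-based answers are probably the safer choice.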
