Finding longest word in a txt file

Question

I am trying to create a function in which a filename is taken as a parameter and the function returns the longest word in the file with the line number attached to the front of it. This is what I have so far but it is not producing the expected output I need.

def word_finder(file_name):
    with open(file_name) as f:
        lines = f.readlines()
        line_num = 0
        longest_word = None
        for line in lines:
            line = line.strip()
            if len(line) == 0:
                return None
            else:
                line_num += 1
                tokens = line.split()
                for token in tokens:
                    if longest_word is None or len(token) > len(longest_word):
                        longest_word = token
            return (str(line_num) + ": " + str(longest_word))

Please include your sample input, actual output and expected output. At a cursory glance, though, your `return` statement is inside the `for line in lines:` loop, so the function exits right after the 1st iteration. — ewong, Jun 01 '22 at 00:24
Need some more details. How is the text formatted in the txt file (one big block, multiple lines...)? How are words separated (whitespace, ","...)? — Drakax, Jun 01 '22 at 00:26
It would generally be better to return a tuple of `(line_num, longest_word)` and let the caller format that as needed. Also, why return None if a line is blank? — jarmod, Jun 01 '22 at 00:35
Side-note: There is almost never a need to call `f.readlines()`. Instead of doing `lines = f.readlines()`, then doing `for line in lines:`, just do `for line in f:`; files are lazy iterators over their lines, and iterating the lines from the file object directly means you only need to store one line at a time, so your memory usage is proportionate to the longest line in the file, not the total file size (for a multi-GB file, the difference could easily make or break your program). — ShadowRanger, Jun 01 '22 at 00:35

score 1 · Answer 1 · answered Jun 01 '22 at 00:38

I think this is the shortest way to find the word; correct me if I'm wrong.

def wordFinder(filename):
    with open(filename, "r") as f:
        words = f.read().split()  # split() returns a list of the whitespace-separated words in the file
        longestWord = max(words, key=len)  # key=len makes max() compare the words by length
        print(longestWord)  # prints the longest word

x07ex

OP needs line number. – jarmod Jun 01 '22 at 00:39
The application of _max by length_ function is elegant! – hc_dev Jun 01 '22 at 00:55
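Following up on jarmod's comment, here is a minimal sketch of how the max-by-length idea might be extended to keep the line number as well. The function name `longest_word_with_line` is hypothetical, and the sketch assumes words are separated by whitespace and that lines are numbered from 1.

```python
def longest_word_with_line(path):
    """Return (line_number, word) for the longest word, or None if there are no words."""
    with open(path) as f:
        # Lazily pair every whitespace-separated word with its 1-based line number.
        pairs = (
            (line_no, word)
            for line_no, line in enumerate(f, start=1)
            for word in line.split()
        )
        # Compare pairs by word length only; default=None covers a file without words.
        # max() is called inside the with block, so the file is still open while it reads.
        return max(pairs, key=lambda pair: len(pair[1]), default=None)
```

If several words share the maximum length, `max` keeps the first one it encounters, so the earliest line wins.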

hc_dev · Answer 2 · 2022-06-01T01:28:12.620

Issue

Exactly what ewong diagnosed:

the last return statement is indented too deeply

Currently, the function returns:

the longest word in the first line only

Solution

The `return` should be aligned with the `for` loop's column, so that it is executed after the loop has finished.

def word_finder(file_name):
    with open(file_name) as f:
        lines = f.readlines()
        line_num = 0
        longest_word = None
        for line in lines:
            line = line.strip()
            if len(line) == 0:
                return None
            else:
                line_num += 1
                tokens = line.split()
                for token in tokens:
                    if longest_word is None or len(token) > len(longest_word):
                        longest_word = token
            # return here would exit the loop too early after 1st line
        # loop ended
        return (str(line_num) + ": " + str(longest_word))

Then:

the longest word in the file with the line number attached to the front of it.
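Note that the minimal fix above still inherits two behaviours from the question's code: it returns `None` as soon as it meets a blank line, and `line_num` ends up as the count of non-blank lines rather than the line that holds the longest word. A possible sketch that addresses both, and returns a tuple as jarmod suggested, could look like the following. It assumes that blank lines should simply be skipped and that physical line numbers (blank lines included, starting at 1) are wanted; the tuple return value is a change from the original string.

```python
def word_finder(file_name):
    """Return (line_number, longest_word), or None if the file contains no words."""
    best = None  # (line_number, word) of the longest word seen so far
    with open(file_name) as f:
        # Iterate over the file directly; enumerate supplies 1-based line numbers.
        for line_num, line in enumerate(f, start=1):
            # A blank line splits into an empty list, so the inner loop just does nothing.
            for token in line.split():
                if best is None or len(token) > len(best[1]):
                    best = (line_num, token)  # remember where the word was found
    return best
```

The caller can then format the result as needed, for example `f"{result[0]}: {result[1]}"` after checking that `result` is not `None`.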

Improved

def word_finder(file_name):
    with open(file_name) as f:
        line_word_longest = None  # global max: tuple of (line_number, longest_word)
        for i, line in enumerate(f, start=1):  # 1-based line number and line content
            line = line.strip()
            if len(line) > 0:   # split words only if line present
                max_token = max(line.split(), key=len)  # longest token on this line, compared by length
                if line_word_longest is None or len(max_token) > len(line_word_longest[1]):
                    line_word_longest = (i, max_token)
        # loop ended
        if line_word_longest is None:
            return "No longest word found!"
        return f"{line_word_longest[0]}: '{line_word_longest[1]}' ({len(line_word_longest[1])} chars)"

See also:

Some SO research for similar questions:

inspiration from all languages: longest word in file
only python: [python] longest word in file
non python: -[python] longest word in file

In the improved code, did you mean: `max_token = max([token for token in line.split()], key = len)` # list comprehension, then max of tokens. Even better is to use a generator rather than a list comprehension: `max_token = max((token for token in line.split()), key = len)` — DarrylG, Jun 01 '22 at 01:00
@DarrylG thanks, not what we think (longest = max token), but what we measure (longest = max length) — hc_dev, Jun 01 '22 at 01:09

score 0 · Answer 3 · answered Jun 01 '22 at 01:33

Having a bit of fun with cutting this down:

def word_finder(file_name):
    with open(file_name) as f:  # open the file passed in, not a hard-coded path
        lines = [{ 'num': i,
                   'words': (ws := line.split()),  # assignment expression, needs Python 3.8+
                   'max': max(ws, key=len) if ws else '',
                   'line': line }
                 for i, line in enumerate(f, start=1)]  # 1-based line numbers
        m = max(lines, key=lambda l: len(l['max']))
        return f"{m['num']}: '{m['max']}'"

We use a list comprehension to turn each line into a dictionary describing its line number, all of the words that comprise it, the longest word and the original line. When computing the longest word we just insert an empty string if ws is empty, thus avoiding an exception for handing max an empty sequence.

It's then straightforward to use max to find the line with the longest word.
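As a small illustration of the empty-sequence point, the sketch below shows the exception that `max` raises and two ways to avoid it: the conditional expression used in the answer, and the `default` keyword argument of `max`. The variable names are only for illustration.

```python
words = "   \n".split()  # a blank line splits into an empty list

try:
    max(words, key=len)
except ValueError as exc:
    # max() refuses an empty iterable unless a default is supplied
    print(f"max() raised ValueError: {exc}")

# Option 1: guard with a conditional expression, as in the answer
longest_guarded = max(words, key=len) if words else ''

# Option 2: let max() supply the fallback value itself
longest_default = max(words, key=len, default='')
```

Both options leave an empty string for a blank line, so the later `max` over all lines can still compare lengths.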
