8

In Python on a GNU/Linux system, what's the fastest way to recursively scan a directory for all .MOV or .AVI files, and to store them in a list?

Keith Pinson
ensnare
  • Fastest probably involves writing extension to use native code. But do you really want that? – David Heffernan Dec 24 '11 at 17:28
  • Even if you don't want to do that, depending on how many files and directories we're talking about, it might be faster to execute the external `find` command than processing the results of `os.walk()`. But if the `os.walk()` solution is fast enough, it is more elegant and easy to understand/edit. – Michael Hoffman Dec 24 '11 at 17:37
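For comparison, the external `find` route mentioned in the comment above could look roughly like this. It is only a sketch: the function name `find_videos_with_find` is made up for illustration, it assumes a `find` that supports `-iname` and `-print0` (GNU find does), and whether it actually beats `os.walk()` depends on the tree being scanned.

```python
import os
import subprocess

def find_videos_with_find(top):
    """Return paths under `top` ending in .mov or .avi (any case), via find(1)."""
    cmd = [
        "find", top, "-type", "f",
        "(", "-iname", "*.mov", "-o", "-iname", "*.avi", ")",
        "-print0",
    ]
    # check=True raises CalledProcessError if find exits with an error
    # (for example, on an unreadable subdirectory).
    out = subprocess.run(cmd, check=True, stdout=subprocess.PIPE).stdout
    # -print0 separates paths with NUL bytes, so names containing
    # newlines or spaces are not split incorrectly.
    return [os.fsdecode(p) for p in out.split(b"\0") if p]
```

Passing the command as a list avoids the shell, so the parentheses and wildcards reach `find` unmodified.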

7 Answers

7

You can use os.walk() to walk the tree recursively and glob.glob() or fnmatch.filter() to match the file names:

Check this answer
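A minimal sketch of the os.walk() plus fnmatch.filter() combination, in case the linked answer is unavailable. The function name `find_by_patterns` and its default patterns are illustrative assumptions, not taken from that answer.

```python
import fnmatch
import os

def find_by_patterns(top, patterns=("*.MOV", "*.AVI")):
    """Collect files under `top` whose names match any of the glob patterns."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for pattern in patterns:
            # fnmatch.filter returns the names in the list that match the pattern.
            for name in fnmatch.filter(filenames, pattern):
                matches.append(os.path.join(dirpath, name))
    return matches
```

On Linux, fnmatch.filter() compares case-sensitively, so `*.MOV` will not pick up `clip.mov`; add lowercase patterns if both spellings are needed.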

Community
Aleksandra Zalcman
7

I'd use os.walk to scan the directory, os.path.splitext to grab the suffix and filter them myself.

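import os

suffixes = {'.AVI', '.MOV'}

def find_videos(top='.'):
    # yield is only valid inside a function, so wrap the loop in a generator
    for dirpath, dirnames, filenames in os.walk(top):
        for f in filenames:
            if os.path.splitext(f)[1] in suffixes:
                yield os.path.join(dirpath, f)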
  • This is probably the best solution because it can be easily adapted to enforce case-insensitive matching. – ekhumoro Dec 24 '11 at 19:49
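The case-insensitive adaptation mentioned in the comment could be sketched as follows. The names `SUFFIXES` and `iter_videos` are placeholders chosen for this example; the only real change is lowercasing the extension before the set lookup.

```python
import os

# Store the suffixes in lowercase and lowercase each extension before comparing.
SUFFIXES = {".avi", ".mov"}

def iter_videos(top="."):
    """Yield paths under `top` whose extension is .avi or .mov, in any case."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in SUFFIXES:
                yield os.path.join(dirpath, name)
```

Since the question asks for a list, wrap the call: `videos = list(iter_videos('/some/dir'))`.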
4

This example lists the matching files in the current directory only; you can extend it to other paths.

import glob
movlist = glob.glob('*.mov')
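One way to extend this to a whole tree is the `recursive=True` flag of glob.glob(), available since Python 3.5, where `**` matches any number of nested directories. The sketch below assumes that Python version; the name `glob_videos` and the pattern list are illustrative.

```python
import glob
import os

def glob_videos(top, patterns=("*.mov", "*.MOV", "*.avi", "*.AVI")):
    """Return a sorted list of files under `top` matching any pattern."""
    found = set()
    for pattern in patterns:
        # "**" with recursive=True matches zero or more directory levels.
        found.update(glob.glob(os.path.join(top, "**", pattern), recursive=True))
    return sorted(found)
```

Each pattern triggers its own walk of the tree, so for large trees with several extensions a single os.walk() pass is likely cheaper.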
milancurcic
2
import os
import re

# Match names ending in one of the listed extensions (case-sensitive).
pattern = re.compile(r'.*\.(mov|MOV|avi|mpg)$')

def fileList(source):
    matches = []
    for root, dirnames, filenames in os.walk(source):
        for filename in filter(pattern.match, filenames):
            matches.append(os.path.join(root, filename))
    return matches
Jhonathan
  • The [fnmatch](http://docs.python.org/library/fnmatch.html#module-fnmatch) module only supports very simple glob patterns, so your filter won't work. – ekhumoro Dec 24 '11 at 19:46
  • @ekhumoro It does work: the symbols ([], ., ?, *, ()) are allowed in glob patterns. Test the code in Python and see what works. – Jhonathan Dec 24 '11 at 20:01
  • Your pattern is equivalent to `*.[movMOVaipg()]`. This will match, for example, `*.i`, `*.a`, `*.M`, etc, but _not_ `*.MOV`, `*.avi`, etc. Try it for yourself! – ekhumoro Dec 24 '11 at 20:21
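The point about bracket expressions is easy to check with the fnmatch module itself. This snippet uses the pattern quoted in the last comment; the helper `matches_any` is a hypothetical name showing the usual workaround of testing several simple patterns.

```python
import fnmatch

pattern = "*.[movMOVaipg()]"  # the pattern quoted in the comment above

# A bracket expression matches exactly ONE character from the set,
# so the "extension" can only ever be a single character long.
one_char = fnmatch.fnmatchcase("clip.M", pattern)      # True
full_ext = fnmatch.fnmatchcase("clip.MOV", pattern)    # False

def matches_any(name, patterns=("*.MOV", "*.AVI")):
    """fnmatch has no alternation, so test each simple pattern in turn."""
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)
```

fnmatchcase() is used here so the comparison is case-sensitive on every platform.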
1

Python 2.x:

import os

def generic_tree_matching(rootdirname, filterfun):
    return [
        os.path.join(dirname, filename)
        for dirname, dirnames, filenames in os.walk(rootdirname)
        for filename in filenames
        if filterfun(filename)]

def matching_ext(rootdirname, extensions):
    "Case sensitive extension matching"
    return generic_tree_matching(
        rootdirname,
        lambda fn: fn.endswith(extensions))

def matching_ext_ci(rootdirname, extensions):
    "Case insensitive extension matching"
    try:
        extensions= extensions.lower()
    except AttributeError: # assume it's a sequence of extensions
        extensions= tuple(
            extension.lower()
            for extension in extensions)
    return generic_tree_matching(
        rootdirname,
        lambda fn: fn.lower().endswith(extensions))

Call either matching_ext or matching_ext_ci with the root folder and either a single extension or a tuple of extensions:

>>> matching_ext(".", (".mov", ".avi"))
tzot
1

I suggest using os.walk and carefully reading its documentation.

This may be a one liner approach:

[f for root,dirs,files in os.walk('/your/path') for f in files if is_video(f)]

Here is_video is a function in which you check the file's extension.
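The answer leaves is_video undefined. One possible definition, given purely as an assumption, relies on str.endswith() accepting a tuple of suffixes. The wrapper `list_videos` is also hypothetical; unlike the one-liner above, it joins each name with its directory so the result holds full paths rather than bare file names.

```python
import os

VIDEO_EXTENSIONS = (".mov", ".avi")

def is_video(filename):
    """Return True if the name ends in a known video extension (any case)."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return filename.lower().endswith(VIDEO_EXTENSIONS)

def list_videos(top):
    """Return full paths of all video files under `top`."""
    return [
        os.path.join(root, f)
        for root, dirs, files in os.walk(top)
        for f in files
        if is_video(f)
    ]
```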

Rik Poggi
0

You can also use pathlib for this.

from pathlib import Path

files_mov = list(Path(path).rglob("*.MOV"))
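The snippet above covers only `*.MOV`. A sketch that handles both extensions in a single pass, ignoring case, might look like this; `rglob_videos` and its `suffixes` parameter are names invented for the example.

```python
from pathlib import Path

def rglob_videos(top, suffixes=(".mov", ".avi")):
    """Return Path objects under `top` whose suffix is in `suffixes` (any case)."""
    return [
        p for p in Path(top).rglob("*")
        # p.suffix is the final extension including the dot, e.g. ".MOV".
        if p.is_file() and p.suffix.lower() in suffixes
    ]
```

The result is a list of Path objects; use `[str(p) for p in rglob_videos(path)]` if plain strings are needed.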
H. Sánchez