93

guys. I'm trying to find the most elegant solution to a problem and wondered if python has anything built-in for what I'm trying to do.

What I'm doing is this. I have a list, A, and I have a function f which takes an item and returns a list. I can use a list comprehension to convert everything in A like so;

[f(a) for a in A]

But this return a list of lists;

[a1,a2,a3] => [[b11,b12],[b21,b22],[b31,b32]]

What I really want is to get the flattened list;

[b11,b12,b21,b22,b31,b32]

Now, other languages have it; it's traditionally called flatmap in functional programming languages, and .Net calls it SelectMany. Does python have anything similar? Is there a neat way to map a function over a list and flatten the result?

The actual problem I'm trying to solve is this; starting with a list of directories, find all the subdirectories. so;

import os
dirs = ["c:\\usr", "c:\\temp"]
subs = [os.listdir(d) for d in dirs]
print subs

currentliy gives me a list-of-lists, but I really want a list.

Steve Cooper
  • 19,026
  • 14
  • 70
  • 85

14 Answers14

141

You can have nested iterations in a single list comprehension:

[filename for path in dirs for filename in os.listdir(path)]

which is equivalent (at least functionally) to:

filenames = []
for path in dirs:
    for filename in os.listdir(path):
        filenames.append(filename)
Ruben Helsloot
  • 11,812
  • 5
  • 19
  • 40
Ants Aasma
  • 50,646
  • 15
  • 87
  • 92
  • 80
    Although clever, that is hard to understand and not very readable. – Curtis Yallop Oct 20 '14 at 23:01
  • 3
    Doesn't really answer the question as asked. This is rather a workaround for not encountering the problem in the first place. What if you already have a list of lists. For example, what if your list of lists is a result of the multiprocessing module's map function? Perhaps the itertools solution or the reduce solution is best. – Dave31415 Jan 22 '15 at 03:43
  • 28
    Dave31415: `[ item for list in listoflists for item in list ]` – rampion Feb 06 '15 at 19:53
  • 11
    'readability' is a subjective judgment. I find this solution quite readable. – Reb.Cabin May 22 '15 at 23:13
  • 1
    Can you please explain your code a bit? Which `for` execute first and as @rampion pointed how do we get listoflists in the middle. – iamprem Jan 28 '16 at 14:44
  • 14
    I thought it was readable too, until I saw the order of the terms... :( – c z May 10 '17 at 15:14
  • @cz Same order as if you write nested loops with list.append instead of a comprehension. See [here](https://fr.wikipedia.org/wiki/Python_(langage)#Programmation_fonctionnelle) (french wikipedia, but the relevant part is the Python example). –  Jul 05 '17 at 14:22
  • This is completely useless in any real use case. Python is really lacking this functionality. How would I for example get every `href` attribute of ever `a` tag in a list of `BeautifulSoup` objects? – Philippe Jun 11 '18 at 20:00
  • @Philippe [a.href for soup in soups for a in soup] – Jerther Aug 28 '18 at 19:18
  • 1
    I'm no expert here, but it seems like Python vs LINQ would be`c for a in source for b in a for c in b` vs `from a in source from b in a from c in b select c`, so it's really just taking the "select c" at the end and moving it to the beginning, removing the word "select". I hate it, but it is what it is. – cwharris Dec 08 '18 at 00:00
  • Just adding a simple running example that helped me: `[x for y in ((1,2,3), (4,5,6)) for x in y]` – Philippe Raemy May 08 '19 at 07:04
94
>>> from functools import reduce  # not needed on Python 2
>>> list_of_lists = [[1, 2],[3, 4, 5], [6]]
>>> reduce(list.__add__, list_of_lists)
[1, 2, 3, 4, 5, 6]

The itertools solution is more efficient, but this feels very pythonic.

Boris Verkhovskiy
  • 10,733
  • 7
  • 77
  • 79
Julian
  • 2,371
  • 19
  • 20
  • For a list of 1,000 sublists, this is [100 times slower](https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-a-list-of-lists/45323085#45323085) that the `itertools` way and the difference only gets worse as your list grows. – Boris Verkhovskiy May 07 '22 at 07:54
  • @BorisV - thanks for benchmarking that. I'm not surprised! This version is probably fine for up to a few dozen sublists, but probably shouldn't be used if input size is unbounded (or just large). – Julian May 10 '22 at 14:17
63

You can find a good answer in the itertools recipes:

import itertools

def flatten(list_of_lists):
    return list(itertools.chain.from_iterable(list_of_lists))
Boris Verkhovskiy
  • 10,733
  • 7
  • 77
  • 79
rob
  • 35,274
  • 2
  • 54
  • 63
  • The same approach can be used to define flatmap, as proposed by [this answer](https://stackoverflow.com/a/20037408/1048186) and this [external blog post](http://naiquevin.github.io/a-look-at-some-of-pythons-useful-itertools.html) – Josiah Yoder Jul 11 '17 at 19:53
33

The question proposed flatmap. Some implementations are proposed but they may unnecessary creating intermediate lists. Here is one implementation that's based on iterators.

def flatmap(func, *iterable):
    return itertools.chain.from_iterable(map(func, *iterable))

In [148]: list(flatmap(os.listdir, ['c:/mfg','c:/Intel']))
Out[148]: ['SPEC.pdf', 'W7ADD64EN006.cdr', 'W7ADD64EN006.pdf', 'ExtremeGraphics', 'Logs']

In Python 2.x, use itertools.map in place of map.

vy32
  • 26,286
  • 33
  • 110
  • 218
Wai Yip Tung
  • 17,176
  • 9
  • 41
  • 46
19

You could just do the straightforward:

subs = []
for d in dirs:
    subs.extend(os.listdir(d))
Anon
  • 11,146
  • 3
  • 22
  • 19
18

You can concatenate lists using the normal addition operator:

>>> [1, 2] + [3, 4]
[1, 2, 3, 4]

The built-in function sum will add the numbers in a sequence and can optionally start from a specific value:

>>> sum(xrange(10), 100)
145

Combine the above to flatten a list of lists:

>>> sum([[1, 2], [3, 4]], [])
[1, 2, 3, 4]

You can now define your flatmap:

>>> def flatmap(f, seq):
...   return sum([f(s) for s in seq], [])
... 
>>> flatmap(range, [1,2,3])
[0, 0, 1, 0, 1, 2]

Edit: I just saw the critique in the comments for another answer and I guess it is correct that Python will needlessly build and garbage collect lots of smaller lists with this solution. So the best thing that can be said about it is that it is very simple and concise if you're used to functional programming :-)

Community
  • 1
  • 1
Martin Geisler
  • 71,485
  • 25
  • 165
  • 226
11
import itertools
x=[['b11','b12'],['b21','b22'],['b31']]
y=list(itertools.chain(*x))
print y

itertools will work from python2.3 and greater

Darknight
  • 1,044
  • 2
  • 12
  • 31
9
subs = []
map(subs.extend, (os.listdir(d) for d in dirs))

(but Ants's answer is better; +1 for him)

RichieHindle
  • 258,929
  • 46
  • 350
  • 392
  • Using reduce (or sum, which saves you many characters and an import;-) for this is just wrong -- you keep uselessly tossing away old lists to make a new one for each d. @Ants has the right answer (smart of @Steve to accept it!). – Alex Martelli Jul 03 '09 at 01:28
  • You can't say in general that this is a bad solution. It depends on whether performance is even an issue. Simple is better unless there is a reason to optimize. That's why the reduce method could be best for many problems. For example you have a slow function that produces a list of a few hundred objects. You want to speed it up by using multiprocessing 'map' function. So you create 4 processes and use reduce to flat map them. In this case the reduce function is fine and very readable. That said, it's good that you point out why this can be suboptimal. But it is not always suboptimal. – Dave31415 Jan 22 '15 at 03:48
4

You could try itertools.chain(), like this:

import itertools
import os
dirs = ["c:\\usr", "c:\\temp"]
subs = list(itertools.chain(*[os.listdir(d) for d in dirs]))
print subs

itertools.chain() returns an iterator, hence the passing to list().

Steef
  • 30,571
  • 4
  • 44
  • 36
4

This is the most simple way to do it:

def flatMap(array):
  return reduce(lambda a,b: a+b, array) 

The 'a+b' refers to concatenation of two lists

Nisamrine
  • 69
  • 1
  • 3
1

Google brought me next solution:

def flatten(l):
   if isinstance(l,list):
      return sum(map(flatten,l))
   else:
      return l
Vestel
  • 987
  • 1
  • 9
  • 25
  • 2
    Would be a little better if it handled generator expressions too, and would be a lot better if you explained how to use it... – ephemient Jul 03 '09 at 04:45
1
def flat_list(arr):
    send_back = []
    for i in arr:
        if type(i) == list:
            send_back += flat_list(i)
        else:
            send_back.append(i)
    return send_back
1

You can use pyxtension:

from pyxtension.streams import stream
stream([ [1,2,3], [4,5], [], [6] ]).flatMap() == range(7)
asu
  • 489
  • 5
  • 15
-1
If listA=[list1,list2,list3]
flattened_list=reduce(lambda x,y:x+y,listA)

This will do.

Baum mit Augen
  • 47,658
  • 24
  • 139
  • 177