I need an elegant method that takes an enumerable and returns an enumerable of enumerables, each containing the same number of elements except possibly the last one:

public static IEnumerable<IEnumerable<TValue>> Chunk<TValue>(this IEnumerable<TValue> values, Int32 chunkSize)
{
    // TODO: code that chunks
}

This is what I have tried:

    public static IEnumerable<IEnumerable<TValue>> Chunk<TValue>(this IEnumerable<TValue> values, Int32 chunkSize)
    {
        var count = values.Count();
        var numberOfFullChunks = count / chunkSize;
        var lastChunkSize = count % chunkSize;
        for (var chunkIndex = 0; chunkIndex < numberOfFullChunks; chunkIndex++)
        {
            yield return values.Skip(chunkSize * chunkIndex).Take(chunkSize);
        }
        if (lastChunkSize > 0)
        {
            yield return values.Skip(chunkSize * numberOfFullChunks).Take(lastChunkSize);
        }
    }

UPDATE: I just discovered a similar topic about splitting a list: Split List into Sublists with LINQ

Trident D'Gao
  • See the `Batch` method of [morelinq](http://code.google.com/p/morelinq/) – L.B Sep 12 '12 at 13:25
  • possible duplicate of [LINQ Partition List into Lists of 8 members](http://stackoverflow.com/questions/3773403/linq-partition-list-into-lists-of-8-members) – Rawling Sep 12 '12 at 13:42
  • possible duplicate of [Split List into Sublists with LINQ](http://stackoverflow.com/questions/419019/split-list-into-sublists-with-linq) – nawfal Feb 18 '13 at 10:38

5 Answers


If memory consumption isn't a concern, then like this?

static class Ex
{
    public static IEnumerable<IEnumerable<TValue>> Chunk<TValue>(
        this IEnumerable<TValue> values, 
        int chunkSize)
    {
        return values
               .Select((v, i) => new {v, groupIndex = i / chunkSize})
               .GroupBy(x => x.groupIndex)
               .Select(g => g.Select(x => x.v));
    }
}

Otherwise you could get creative with the yield keyword, like so:

static class Ex
{
    public static IEnumerable<IEnumerable<TValue>> Chunk<TValue>(
                    this IEnumerable<TValue> values, 
                    int chunkSize)
    {
        using(var enumerator = values.GetEnumerator())
        {
            while(enumerator.MoveNext())
            {
                yield return GetChunk(enumerator, chunkSize).ToList();
            }
        }
    }

    private static IEnumerable<T> GetChunk<T>(
                     IEnumerator<T> enumerator,
                     int chunkSize)
    {
        do
        {
            yield return enumerator.Current;
        } while(--chunkSize > 0 && enumerator.MoveNext());
    }
}
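
To make the first approach easier to follow, here is a minimal sketch of the index arithmetic it relies on: integer division of each element's position by the chunk size gives the chunk it belongs to. The sample values and the `chunkSize` of 3 are made up for illustration.

```csharp
using System;
using System.Linq;

const int chunkSize = 3;
var values = new[] { "a", "b", "c", "d", "e", "f", "g" };

// Positions 0..6 divided by 3 give group indices 0, 0, 0, 1, 1, 1, 2.
var chunks = values
    .Select((v, i) => new { v, groupIndex = i / chunkSize })
    .GroupBy(x => x.groupIndex)
    .Select(g => g.Select(x => x.v).ToArray())
    .ToArray();

foreach (var chunk in chunks)
{
    Console.WriteLine(string.Join(", ", chunk));
}
// Prints:
// a, b, c
// d, e, f
// g
```

Note that `GroupBy` has to read the whole source before it can hand out the first group, which is why this variant only suits cases where memory consumption isn't a concern.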
Peter Lillevold
spender
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
    while (source.Any())
    {
        yield return source.Take(chunksize);
        source = source.Skip(chunksize);
    }
}
Tom
  • It will cause multiple enumeration of the source collection, which might be a problem – Serhii Kimlyk Apr 29 '21 at 14:07
  • Multiple enumeration is not a good property of a method that takes an `IEnumerable`. This function would be fine if it took an `IList`, because the caller would know that it requires eager evaluation of the entire set. Neither enumerating multiple times nor eagerly materializing any part of the IEnumerable (besides the first chunk, perhaps) is good. – ErikE Oct 14 '21 at 23:31
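
The multiple enumeration the comments describe can be observed with a small experiment. This is only a sketch: `Source` is a hypothetical sequence that counts how often it is restarted, and `ChunkBySkipTake` is the same `Take`/`Skip` loop as in the answer, rewritten as a local function so the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int enumerations = 0;

// Hypothetical source: bumps the counter each time enumeration starts over.
IEnumerable<int> Source()
{
    enumerations++;
    for (var i = 1; i <= 7; i++)
    {
        yield return i;
    }
}

// Same Take/Skip loop as in the answer above.
IEnumerable<IEnumerable<T>> ChunkBySkipTake<T>(IEnumerable<T> source, int chunkSize)
{
    while (source.Any())
    {
        yield return source.Take(chunkSize);
        source = source.Skip(chunkSize);
    }
}

var chunkCount = 0;
foreach (var chunk in ChunkBySkipTake(Source(), 3))
{
    chunkCount++;
    Console.WriteLine(string.Join(", ", chunk));
}

Console.WriteLine($"chunks: {chunkCount}, source enumerations: {enumerations}");
```

Every `Any()` call and every consumed chunk restarts the source, and each restart has to skip over everything already returned, so the cost grows with the number of chunks.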

.NET 6 and later

Built-in Enumerable.Chunk method:

// Given an enumerable
var e = Enumerable.Range(1, 999);

// Here it is. Enjoy :)
var chunks = e.Chunk(29);

// Sample, iterating over chunks
foreach(var chunk in chunks) // for each chunk
{
    foreach(var item in chunk) // for each item in a chunk
    {
        Console.WriteLine(item);
    }
}
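
One thing worth knowing about the built-in method is that each chunk is an array, and the last one is simply shorter when the source doesn't divide evenly. A minimal sketch, assuming .NET 6 or later and made-up input:

```csharp
using System;
using System.Linq;

// Requires .NET 6 or later: Enumerable.Chunk yields TSource[] chunks.
int[][] chunks = Enumerable.Range(1, 7).Chunk(3).ToArray();

foreach (int[] chunk in chunks)
{
    Console.WriteLine($"{chunk.Length} item(s): {string.Join(", ", chunk)}");
}
// Prints:
// 3 item(s): 1, 2, 3
// 3 item(s): 4, 5, 6
// 1 item(s): 7
```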
dani herrera

I have only tested this quickly, but it seems to work:

public static IEnumerable<IEnumerable<TValue>> Chunk<TValue>(this IEnumerable<TValue> values, Int32 chunkSize)
{
    var valuesList = values.ToList();
    var count = valuesList.Count;
    for (var i = 0; i < (count / chunkSize) + (count % chunkSize == 0 ? 0 : 1); i++)
    {
        yield return valuesList.Skip(i * chunkSize).Take(chunkSize);
    }
}
Richard Dalton
  • Repeated iteration of the source sequence *may* be problematic. – spender Sep 12 '12 at 13:51
  • @spender Good point, I was trying to keep it short so ignored the resharper warning, but it is only one extra line. – Richard Dalton Sep 12 '12 at 14:07
  • Eagerly fully enumerating the IEnumerable is problematic, because of memory consumption. If streaming processing is possible, it should be used. Consider reading a 2 GB file, or several of them. If the reader wants to take only a few pieces at a time, it should be able to expect using such a method that takes an `IEnumerable` that the entire file won't get loaded into memory just to return the first chunk. – ErikE Oct 14 '21 at 23:29
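
The streaming concern in the last comment can be illustrated with a source that never ends, where calling `ToList()` on the whole input would never return. This sketch assumes .NET 6 or later and uses the built-in `Enumerable.Chunk` as the streaming chunker; `Endless` is a hypothetical stand-in for a very large input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// Hypothetical endless source; materializing it up front is impossible.
IEnumerable<int> Endless()
{
    while (true)
    {
        yield return ++produced;
    }
}

// A streaming chunker only pulls the elements needed for the chunks consumed.
var firstTwo = Endless().Chunk(4).Take(2).ToList();

foreach (var chunk in firstTwo)
{
    Console.WriteLine(string.Join(", ", chunk));
}
// Prints:
// 1, 2, 3, 4
// 5, 6, 7, 8

Console.WriteLine($"elements pulled from the source: {produced}");
```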

If you don't have .NET 6, you might opt to copy its Chunk method into your project. The only adaptations you'll likely need to make concern the exception helpers the .NET source uses, as your own project probably won't have ThrowHelper.

Their code:

ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);

would probably be more like:

throw new ArgumentNullException(nameof(source));

The following code block has had these adjustments applied; you can make a new file called Chunk.cs and drop the following code into it:

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;

namespace System.Linq
{
    public static partial class Enumerable
    {
        /// <summary>
        /// Split the elements of a sequence into chunks of size at most <paramref name="size"/>.
        /// </summary>
        /// <remarks>
        /// Every chunk except the last will be of size <paramref name="size"/>.
        /// The last chunk will contain the remaining elements and may be of a smaller size.
        /// </remarks>
        /// <param name="source">
        /// An <see cref="IEnumerable{T}"/> whose elements to chunk.
        /// </param>
        /// <param name="size">
        /// Maximum size of each chunk.
        /// </param>
        /// <typeparam name="TSource">
        /// The type of the elements of source.
        /// </typeparam>
        /// <returns>
        /// An <see cref="IEnumerable{T}"/> that contains the elements the input sequence split into chunks of size <paramref name="size"/>.
        /// </returns>
        /// <exception cref="ArgumentNullException">
        /// <paramref name="source"/> is null.
        /// </exception>
        /// <exception cref="ArgumentOutOfRangeException">
        /// <paramref name="size"/> is below 1.
        /// </exception>
        public static IEnumerable<TSource[]> Chunk<TSource>(this IEnumerable<TSource> source, int size)
        {
            if (source == null)
            {
                throw new ArgumentNullException(nameof(source));
            }

            if (size < 1)
            {
                throw new ArgumentOutOfRangeException(nameof(size));
            }

            return ChunkIterator(source, size);
        }

        private static IEnumerable<TSource[]> ChunkIterator<TSource>(IEnumerable<TSource> source, int size)
        {
            using IEnumerator<TSource> e = source.GetEnumerator();
            while (e.MoveNext())
            {
                TSource[] chunk = new TSource[size];
                chunk[0] = e.Current;

                int i = 1;
                for (; i < chunk.Length && e.MoveNext(); i++)
                {
                    chunk[i] = e.Current;
                }

                if (i == chunk.Length)
                {
                    yield return chunk;
                }
                else
                {
                    Array.Resize(ref chunk, i);
                    yield return chunk;
                    yield break;
                }
            }
        }
    }
}

You should verify that incorporating their MIT-licensed code into your project does not unduly affect your own licensing intentions.
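
If you adapt the code, it is probably worth keeping the split between `Chunk` and `ChunkIterator`. The sketch below shows why, using two hypothetical local functions (`LazyValidated` and `EagerValidated`, not part of the .NET source): checks placed inside an iterator body only run once enumeration starts, while checks in a plain wrapper run at the call site.

```csharp
using System;
using System.Collections.Generic;

// Validation inside an iterator body is deferred until the first MoveNext call.
IEnumerable<int> LazyValidated(int size)
{
    if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
    yield return size;
}

// Validation in a non-iterator wrapper runs as soon as the method is called.
IEnumerable<int> EagerValidated(int size)
{
    if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
    return LazyValidated(size);
}

var lazyThrewOnCall = false;
try { LazyValidated(0); }
catch (ArgumentOutOfRangeException) { lazyThrewOnCall = true; }

var eagerThrewOnCall = false;
try { EagerValidated(0); }
catch (ArgumentOutOfRangeException) { eagerThrewOnCall = true; }

Console.WriteLine($"lazy threw on call: {lazyThrewOnCall}");   // False
Console.WriteLine($"eager threw on call: {eagerThrewOnCall}"); // True
```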

Caius Jard