I have this function that reads a directory containing hundreds of thousands of files and counts the files for a specific date. Is there a way to use a search/count pattern based on the supplied date?

This works fine, but it takes too long. Is there a better way to do this?

I'm using VS 2008 (on my client's machine, where I can upgrade neither the framework nor Visual Studio).

public static int GetFileCount(DirectoryInfo filePath)
{
    int requestCount = 0;
    int day = -1;

    FileInfo[] files = filePath.GetFiles();

    // Count files last written between yesterday's midnight and today's midnight.
    DateTime minDate = DateTime.Today.AddDays(day);
    DateTime maxDate = DateTime.Today;

    foreach (FileInfo file in files)
    {
        if (file.LastWriteTime < maxDate && file.LastWriteTime > minDate)
        {
            requestCount++;
        }
    }

    return requestCount;
}
user1093452
  • possible duplicate of [Search files based on date created in c#](http://stackoverflow.com/questions/9215855/search-files-based-on-date-created-in-c-sharp) – Zohar Peled May 18 '15 at 07:35
  • @ZoharPeled - the suggested question can't be a duplicate; I want the file count over hundreds of thousands (lakhs) of files. In fact, my code above works fine, but I'm looking for suggestions on how to make it faster. Right now it takes a very long time to return the count, so any help on optimizing it is much appreciated. – user1093452 May 18 '15 at 07:40
  • @ZoharPeled: If you see my concern, I kindly request that you unmark my question as a duplicate. – user1093452 May 18 '15 at 07:45
  • Sorry, but I don't see how your question is significantly different from the one I've linked to. You want to search files based on `LastWriteTime`; the question I've linked to searches files based on `CreationTime`. That's the only difference I see, and it's not very significant. – Zohar Peled May 18 '15 at 07:49
  • The thing is, I'm looking to make my code return the result faster when iterating over thousands of files. – user1093452 May 18 '15 at 07:51

4 Answers

1

A little bit more efficient is to use EnumerateFiles, for example with LINQ:

int requestCount = filePath.EnumerateFiles()
    .Count(file => file.LastWriteTime < maxDate && file.LastWriteTime >= minDate);

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
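
For illustration only, a minimal sketch of that difference (the loop body is a placeholder):

// GetFiles(): the whole FileInfo[] is built before the first entry can be inspected.
FileInfo[] all = filePath.GetFiles();

// EnumerateFiles(): entries are streamed one at a time, so filtering can start
// immediately and no large array is allocated up front.
foreach (FileInfo file in filePath.EnumerateFiles())
{
    // inspect file.LastWriteTime here...
}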

However, why are you converting a DateTime to a string and then back with Convert.ToDateTime? Instead of Convert.ToDateTime(DateTime.Now.ToShortDateString()) you just need DateTime.Today, so:

DateTime minDate = DateTime.Today.AddDays(day);
DateTime maxDate = DateTime.Today;

If you can't use .NET 4, you cannot use EnumerateFiles, and it's not easy to get the same lazy-loading behaviour. You could still use the LINQ approach for better readability, as sketched below.
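
For example, a minimal sketch of the .NET 3.5-compatible version (same date bounds as above; GetFiles still materializes the whole array, only the readability improves):

// LINQ to Objects is available in .NET 3.5 (requires using System.Linq;).
DateTime maxDate = DateTime.Today;
DateTime minDate = maxDate.AddDays(-1);

int requestCount = filePath.GetFiles()
    .Count(file => file.LastWriteTime < maxDate && file.LastWriteTime >= minDate);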

Tim Schmelter
  • I have constraints on the framework, i.e. my client only has version 3.5, so I'm unable to use the EnumerateFiles() feature. Reference: http://stackoverflow.com/questions/4888836/enumeratefiles-equivalent-in-net-3-5 – user1093452 May 18 '15 at 07:46
  • @user1093452: then you can't use `EnumerateFiles` and there is no easy way to simulate it. However, why are you converting a `DateTime` to a `String` and then back with `Convert.ToDateTime`? Instead of `Convert.ToDateTime(DateTime.Now.ToShortDateString())` you just need `DateTime.Today`. I have edited my answer. – Tim Schmelter May 18 '15 at 08:04
0

If you have .NET 3.5 framework constraints, try this:

// Requires using System.Linq;
public static int GetFileCount(DirectoryInfo filePath)
{
    int day = -1;

    DateTime minDate = Convert.ToDateTime(DateTime.Now.AddDays(day).ToShortDateString());
    DateTime maxDate = Convert.ToDateTime(DateTime.Now.ToShortDateString());

    // Count(predicate) filters and counts in one pass;
    // Select(...).Count() would count every file regardless of the condition.
    int requestCount = filePath.GetFiles()
        .Count(x => x.LastWriteTime >= minDate && x.LastWriteTime <= maxDate);

    return requestCount;
}
Rolwin Crasta
0

I would recommend breaking your files array into 2/4/8 lists (depending on the number of processors you have, i.e. dual processor - 2, quad processor - 4, octa processor - 8). Then spawn 2/4/8 threads, give each thread its own list to process, and when all threads have completed their individual processing, combine the results and show them to the end user.

Note: This will speed up your solution, but it won't be 2x/4x/8x faster, as the description above might make it sound. Disk I/O, among other variables, will affect the execution time.
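
A minimal sketch of that idea on .NET 3.5, using plain Thread objects (the method name and date parameters are illustrative, not from the original question):

// Requires: using System; using System.IO; using System.Threading;
public static int GetFileCountParallel(DirectoryInfo filePath, DateTime minDate, DateTime maxDate)
{
    FileInfo[] files = filePath.GetFiles();
    int threadCount = Environment.ProcessorCount;
    int[] partialCounts = new int[threadCount];
    Thread[] threads = new Thread[threadCount];

    for (int t = 0; t < threadCount; t++)
    {
        int index = t; // capture a copy for the closure
        threads[t] = new Thread(() =>
        {
            int count = 0;
            // Each thread walks its own stride of the shared array.
            for (int i = index; i < files.Length; i += threadCount)
            {
                DateTime lastWrite = files[i].LastWriteTime;
                if (lastWrite >= minDate && lastWrite < maxDate)
                    count++;
            }
            partialCounts[index] = count;
        });
        threads[t].Start();
    }

    foreach (Thread thread in threads)
        thread.Join();

    int total = 0;
    foreach (int partial in partialCounts)
        total += partial;
    return total;
}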

Parth Shah
0

I've had a similar issue before and was able to cut the processing time by up to 10-fold in certain scenarios. What I did was use the FindFile methods here: https://code.google.com/p/csharptest-net/source/browse/src/Library/IO/FindFile.cs
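
Helpers like this typically wrap the Win32 FindFirstFile/FindNextFile APIs so that no FileInfo object has to be created per file. As a rough standalone sketch of that idea (not the library's actual API; the class name and date filter are illustrative), you could P/Invoke those functions directly:

using System;
using System.IO;
using System.Runtime.InteropServices;

static class Win32FileCounter
{
    private const uint FILE_ATTRIBUTE_DIRECTORY = 0x10;
    private static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    private struct WIN32_FIND_DATA
    {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    public static int CountFilesModifiedBetween(string directory, DateTime minDate, DateTime maxDate)
    {
        int count = 0;
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(directory, "*"), out data);
        if (handle == INVALID_HANDLE_VALUE)
            return 0;

        try
        {
            do
            {
                // Skip sub-directories (including "." and ".."); only files are counted.
                if ((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
                    continue;

                // Convert the Win32 FILETIME (UTC) to a local DateTime.
                long fileTime = ((long)data.ftLastWriteTime.dwHighDateTime << 32)
                                | (uint)data.ftLastWriteTime.dwLowDateTime;
                DateTime lastWrite = DateTime.FromFileTimeUtc(fileTime).ToLocalTime();

                if (lastWrite >= minDate && lastWrite < maxDate)
                    count++;
            }
            while (FindNextFile(handle, out data));
        }
        finally
        {
            FindClose(handle);
        }

        return count;
    }
}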

In addition, disable 8.3 filenames on the system. Relevant link: https://support.microsoft.com/en-us/kb/130694

Having 8.3 filenames enabled can hinder performance in certain scenarios. The Microsoft KB article I linked above describes the issue. In addition to disabling 8.3 filenames, you will have to modify the existing files on your system. What I did was move them all to a temp directory, then back. Disabling 8.3 filenames prevents new files from getting an 8.3 filename assigned, but existing files will still have one.

After performing the above steps, I noticed significant performance improvements. I had certain folders with over 500k files that took nearly 2 hours to iterate through and process, but with this it took only around 5 minutes.

Lunyx