
I have the following problem. I have a list of strings that are URLs to .xls Excel files, and I am trying to download them all and convert them to .xlsx. I am using the Microsoft Compatibility Pack for the conversion, so I can't simply start the converter as soon as each file is downloaded: there are about 1600 files, and I don't want that many converter processes running at the same time. Doing it sequentially would probably take forever.
I tried to improve my code with TPL Dataflow, because this situation seems ideal for a producer-consumer pattern and what I read online suggested that TPL Dataflow is what I need. I probably misunderstood something in the tutorials, though, because the following code is not working. What am I doing wrong?

var pathsBuffer = new BufferBlock<string>(new DataflowBlockOptions 
{
    BoundedCapacity = 12
});

var converterOptions = new ExecutionDataflowBlockOptions 
{
    MaxDegreeOfParallelism = 4
};

var converter = new ActionBlock<string>((filePath) => 
{
    Process.Start(@"c:\Program Files (x86)\Microsoft Office\Office12\excelcnv.exe",
    string.Format(@" -nme -oice {0} {1}", filePath, filePath + "x")).WaitForExit();
}, converterOptions);
pathsBuffer.LinkTo(converter);
pathsBuffer.Completion.ContinueWith(task => converter.Complete());
Parallel.ForEach(FileAdress, async(file) => 
{
    using(var webClient = new WebClient()) 
    {
        string OutputDirectory = ConfigurationManager.AppSettings["RootDirectory"] + 
                                 FolderIndex;
        if (!Directory.Exists(OutputDirectory)) 
        {
            Directory.CreateDirectory(OutputDirectory);
        }
        string filePath = Path.Combine(OutputDirectory, AdressIndex[file]);
        await webClient.DownloadFileTaskAsync(new Uri(file), filePath);

        while (!pathsBuffer.Post(filePath)) {}
    }
});
pathsBuffer.Complete();
Yuval Itzchakov
Dave Demirkhanyan
  • [The whole idea behind Parallel.ForEach() is that you have a set of threads and each processes part of the collection. As you noticed, this doesn't work with async-await](http://stackoverflow.com/questions/11564506/nesting-await-in-parallel-foreach) – MickyD Nov 30 '15 at 08:19
  • No, actually the Parallel.ForEach() is working great. I am getting the files pretty fast. I suppose that when I use await there, the thread runs off to the next iteration or something before I am done. The only problem I have is that the converter is not working. – Dave Demirkhanyan Nov 30 '15 at 08:22
  • The converter, I put a break-point inside the lambda - it never gets there. – Dave Demirkhanyan Nov 30 '15 at 08:28
  • You might be better off placing the download code into a _source block_ and putting that near the top of your graph. Then remove the `Parallel.ForEach`. – MickyD Nov 30 '15 at 08:34
  • @Micky could you please show me how? – Dave Demirkhanyan Nov 30 '15 at 08:36
  • You never wait for the converter to complete, so your program may be terminating before even a single file can be processed. Also, that Parallel.ForEach is a TransformBlock with a DOP>1 waiting to be born. Add a starting TransformBlock that will receive a URL as input, download the file, then return the file path. Also, you *don't* need to add a BufferBlock before the ActionBlock, it *already* has one. Finally, you need to propagate completion among the blocks **and await completion on the final one**. – Panagiotis Kanavos Dec 01 '15 at 09:12
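
The steps in the last comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested fix for the question's code: `DownloadConvertPipeline`, `RunAsync`, and the `downloadAsync` and `convert` delegates are hypothetical names, and the download parallelism of 8 is an arbitrary choice. The converter limit of 4 and the capacity of 12 are taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper class; none of these names come from the question.
public static class DownloadConvertPipeline
{
    public static async Task RunAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> downloadAsync, // URL -> local file path
        Action<string> convert,                   // blocks until one file is converted
        int maxDownloads = 8,                     // assumed value
        int maxConversions = 4)                   // same limit as in the question
    {
        // Takes the place of Parallel.ForEach: the block awaits each download itself.
        var downloader = new TransformBlock<string, string>(
            downloadAsync,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = maxDownloads,
                BoundedCapacity = 12
            });

        // ActionBlock has its own input queue, so no separate BufferBlock is used.
        var converter = new ActionBlock<string>(
            convert,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = maxConversions,
                BoundedCapacity = 12
            });

        // Completion (or a fault) of the downloader flows on to the converter.
        downloader.LinkTo(converter, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var url in urls)
        {
            // SendAsync waits for free capacity instead of spinning on Post.
            // It returns false once the block has stopped accepting input.
            if (!await downloader.SendAsync(url))
                break;
        }

        // Signal that no more URLs will arrive.
        downloader.Complete();

        // Without this await the caller may exit before any file is converted.
        await converter.Completion;
    }
}
```

In the question's setting, `downloadAsync` would wrap the `webClient.DownloadFileTaskAsync` call and return `filePath`, and `convert` would wrap the `Process.Start(...).WaitForExit()` call for `excelcnv.exe`. The caller would then `await DownloadConvertPipeline.RunAsync(FileAdress, ...)` (or block on it from a non-async `Main`) so the program does not end while conversions are still pending.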

0 Answers