
I have an array (exported from a CSV file) with ~100k lines. The file is ~22 MB on disk. All I need is to find the line containing a given value, process it, and load it into MSSQL (I need to sync the CSV data with MSSQL).

The problem is that the search takes almost 1 second (TotalMilliseconds: 655.0788)!

$csv.Where({$_.'Device UUID' -eq 'lalala'})

Is there any way to speed it up?

Igor Kuznetsov
  • Do you only ever need 1 line/row? Or can there be multiple matching rows? – Mathias R. Jessen Apr 10 '21 at 13:26
  • If you need to search for multiple `Device UUID` values, you might consider indexing your list with a hashtable first, e.g. `$HashTable[$_.'Device UUID'] = $_`. – iRon Apr 10 '21 at 13:36
  • Related: [Fastest Way to get a uniquely index item from the property of an array](https://stackoverflow.com/a/59437162/1701026) – iRon Apr 10 '21 at 13:49
  • The `Select-String` cmdlet can grab a file directly, so you may be able to feed your CSV to it, find the desired line, and then use that; converting the result into a structured object could be done with `ConvertFrom-Csv` (a sketch follows these comments). – Lee_Dailey Apr 10 '21 at 14:19
  • Maybe it's time for a database. – js2010 Apr 10 '21 at 16:43
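
A minimal sketch of Lee_Dailey's Select-String suggestion above (the file path and the 'lalala' value are placeholders; note that Select-String matches the text anywhere on the line, not only in the Device UUID column):

# Read the header line so a matching line can be re-parsed as CSV.
$header = Get-Content .\device_list.csv -TotalCount 1

# Find the first line containing the literal search text.
$match = Select-String -Path .\device_list.csv -Pattern 'lalala' -SimpleMatch |
  Select-Object -First 1

# Re-attach the header so ConvertFrom-Csv can build a structured object.
if ($match) {
  $row = $header, $match.Line | ConvertFrom-Csv
}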

2 Answers


Load all 100K rows into a hashtable, using the Device UUID property as the key - this will make it much faster to locate a row than by iterating the whole array with .Where({...}):

$deviceTable = @{}
Import-Csv .\path\to\device_list.csv |ForEach-Object {
  $deviceTable[$_.'Device UUID'] = $_
}

This will now take significantly less than 1 second:

$matchingDevice = $deviceTable['lalala']
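
If you want to confirm the difference on your own data, Measure-Command can time both lookups (a quick sketch; 'lalala' stands in for a real UUID, and $csv / $deviceTable are assumed to be populated as above):

# Linear scan with a script block (slow):
Measure-Command { $csv.Where({ $_.'Device UUID' -eq 'lalala' }) }

# Hashtable lookup (fast):
Measure-Command { $matchingDevice = $deviceTable['lalala'] }
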
Mathias R. Jessen

If you only need one or a few lookups, you can consider the following alternative to Mathias R. Jessen's helpful answer. Note that, like Mathias' solution, it requires reading all rows into memory at once:

# Load all rows into memory.
$allRows = Import-Csv file.csv

# Get the *index* of the row with the column value of interest.
# Note: This lookup is case-SENSITIVE.
$rowIndex = $allRows.'Device UUID'.IndexOf('lalala')

# Retrieve the row of interest by index, if found.
($rowOfInterest = if ($rowIndex -ne -1) { $allRows[$rowIndex] })

Once the rows are loaded into memory (as [pscustomobject] instances, which in itself won't be fast), the array lookup - via member-access enumeration - is reasonably fast, because .NET performs the (linear) array search with the System.Array.IndexOf() method.
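
If you do need a case-insensitive lookup, one option is to lower-case both sides first (a hedged sketch; it builds a lower-cased copy of the column values via member-access enumeration, so it costs extra memory per lookup):

# Lower-case every 'Device UUID' value, then search the resulting array.
$rowIndex = $allRows.'Device UUID'.ToLower().IndexOf('LaLaLa'.ToLower())
($rowOfInterest = if ($rowIndex -ne -1) { $allRows[$rowIndex] })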

The problem with your .Where({ ... }) approach is that iteratively calling a PowerShell script block ({ ... }) many times is computationally expensive.


It comes down to the following trade-off:

  • Either: Spend more time up front to build a data structure ([hashtable]) that allows efficient lookups (Mathias' answer, used in the sketch after this list)

  • Or: Read the file more quickly, but spend more time on each lookup (this answer).
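
For the original sync-with-MSSQL scenario, which implies many lookups, the up-front hashtable typically pays for itself. A minimal sketch, where $uuidsToSync and Sync-DeviceRow are hypothetical placeholders for your own list of UUIDs and your MSSQL update logic:

foreach ($uuid in $uuidsToSync) {
  # O(1) lookup per UUID, versus a full array scan each time.
  $row = $deviceTable[$uuid]
  if ($row) {
    Sync-DeviceRow $row  # hypothetical: your MSSQL insert/update
  }
}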

mklement0