
I have a PowerShell script that builds a DataTable from a StreamReader reading a .csv or .txt file (both comma-delimited) and inserts the DataTable into the database.

My source file contains data with commas inside double quotes, for example:

ID,Desc,Obs
1234,"Some text, More Text, Text again","Text"

The problem occurs when I split the row:

$datatable = New-Object System.Data.DataTable

$src = "My comma delimited file (.txt/.csv)"
$reader = New-Object IO.StreamReader($src)
$header = Get-Content -Path $src | select -First 1
$columns = $header.Split(",")


foreach ($column in $columns) {
    $datatable.columns.add($column) 
} 

while (($line = $reader.ReadLine()) -ne $null) {
    $fields = $line -split ","
    # ... build and insert the row ...
}

Because of the commas inside the double quotes, the split gives me 5 columns instead of 3.

I don't want to remove the commas inside the double quotes; the data should be inserted as-is: Some text, More Text, Text again

How can I resolve this problem?

Bacon Bits
tg_dev3
  • How big are the source files? If they're not that large, the simplest solution is to use Import-Csv and ditch the StreamReader logic. Otherwise you might need to venture into regex territory and replace the proper commas with other characters (or vice versa) so you can safely split. – Matt Mar 29 '18 at 12:39
  • Thank you for your help @Matt! I tried Import-Csv, but it only works for small files. With big files (>1 GB) it doesn't work and gives a timeout. – tg_dev3 Mar 29 '18 at 12:59
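Matt's regex suggestion can be sketched without touching the data: a lookahead can match only the commas that sit outside quotes, so the split never fires inside a quoted field. This assumes well-formed rows with no escaped quotes inside fields:

```powershell
# Split only on commas followed by an even number of remaining quotes,
# i.e. commas that are outside any quoted field
$line = '1234,"Some text, More Text, Text again","Text"'
$fields = $line -split ',(?=(?:[^"]*"[^"]*")*[^"]*$)'

# The split keeps the surrounding quotes, so strip them afterwards
$fields = $fields | ForEach-Object { $_.Trim('"') }
```

Note that this regex runs against every line, so on very large files it will be noticeably slower than a plain string split.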

2 Answers


Solution 1 - ConvertFrom-Csv: Save the header and use ConvertFrom-Csv to parse each line for you. I haven't tested it on large files, but it doesn't have to load the entire file into memory, so it should at least work. Ex:

#Create samplefile
@"
ID,Desc,Obs
1234,"Some text, More Text, Text again","Text"
5678,"Some text, More Text, Text again and again",Text2
$(1..100000 | % { "$_,`"Some text$_, More Text$_, Text again and again$_`",Text$_`n" })
"@ -split "`n" | % { $_.trim() } | Set-Content D:\Downloads\test.txt


$datatable = New-Object System.Data.DataTable
$src = "D:\Downloads\test.txt"
$reader = New-Object IO.StreamReader($src)

#Get header and split to columns
$columns = $reader.ReadLine() -split ','

foreach ($column in $columns) {
    $datatable.columns.add($column)
}

while(($line = $reader.ReadLine()) -ne $null){
    #Let ConvertFrom-CSV do the heavy-lifting by making it convert one "csv-file" per line using a known header
    $obj = $line | ConvertFrom-Csv -Header $columns

    $row = $datatable.NewRow()
    $row.ID = $obj.ID
    $row.Desc = $obj.Desc
    $row.Obs = $obj.Obs

    $datatable.Rows.Add($row)
}

Test:

#Show available columns
$datatable.Columns.Caption
ID
Desc
Obs

#Show datatable
$datatable

ID   Desc                                       Obs
--   ----                                       ---
1234 Some text, More Text, Text again           Text
5678 Some text, More Text, Text again and again Text2

Solution 2 - TextFieldParser: The Microsoft.VisualBasic assembly has a TextFieldParser class that understands quoted fields. It performs faster (50% faster in my 100k-row CSV test) due to less overhead when working directly with .NET. Try:

Add-Type -AssemblyName Microsoft.VisualBasic

$datatable = New-Object System.Data.DataTable
$src = "D:\Downloads\test.txt"

$reader = New-Object -TypeName Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $src
$reader.Delimiters = @(",")
#Default values, but wanted to show the options
$reader.HasFieldsEnclosedInQuotes = $true
$reader.TrimWhiteSpace = $true

#Get header as array
$columns = $reader.ReadFields()

foreach ($column in $columns) {
    $datatable.columns.add($column)
}

while($fields = $reader.ReadFields()) {

    $row = $datatable.NewRow()

    #Insert value in property using field(column) index
    for ($i = 0; $i -lt $columns.Count; $i++) {
        $row.($columns[$i]) = $fields[$i]
    }

    $datatable.Rows.Add($row)
}

$reader.Close()
Frode F.
  • Thanks for your response, I'll try this method and see if it works, but how can I create a DataTable from ConvertFrom-Csv? `$datatable.Rows.Add($row)` doesn't work in this case. – tg_dev3 Mar 29 '18 at 14:16
  • The same way you would add any row to a DataTable: create a row object, fill in the values, and add it. Using Add directly will put the whole object (or probably its string representation) in the first column. See updated answer. I didn't include it because I thought the problem in this question was parsing the CSV. – Frode F. Mar 29 '18 at 14:28
  • Now I can run it without errors, but with the example data above the DataTable comes back empty. The object output is: `@{ID,Desc,Obs=ID} @{ID,Desc,Obs=1234}` – tg_dev3 Mar 29 '18 at 15:07
  • Not sure what you're doing different. My recreation of the scenario using the sample data you provided works. See updated answer – Frode F. Mar 29 '18 at 15:21
  • Really strange, I used the answer as you gave it and it returns empty data. The columns are returned but the data isn't. In your answer, the -Header in `$obj = $line | ConvertFrom-Csv -Header $header` gives an error because it doesn't exist, so I removed it and used `$obj = $line | ConvertFrom-Csv`. But the rest is the same and it came back empty... – tg_dev3 Mar 29 '18 at 15:33
  • Oh sorry, it's a typo. Use `-Header $columns`. `$header` was leftover from the first draft of the solution; I renamed it when merging with your original code. That's what I get for not cleaning up my VS Code session. Fixed the answer now. – Frode F. Mar 29 '18 at 15:37
  • It's working now! Thank you! But now I have another problem: small files work fine, but big ones take a very long time to process. It doesn't consume much memory, but files >500 MB (approximately 5 million rows) take a lot of time... – tg_dev3 Mar 29 '18 at 16:40
  • If it's still too slow, you should change the delimiter to something unique, like @Matt suggested, so you could use a plain string split. Otherwise you'd have to write more complex code to figure out where to split (which takes more time). – Frode F. Mar 29 '18 at 16:43
  • See updated answer for an alternative solution. I didn't want to spend time writing (or finding third-party libraries for) CSV parsing that handles delimiters inside quotes. However, I discovered that MS provides a handy class as part of the VisualBasic assembly. – Frode F. Mar 29 '18 at 17:45
  • 1
    Great! The last solution works like a charm!!! It's much faster than the first! Thank you!!! – tg_dev3 Apr 02 '18 at 09:43
  • 1
    Just to link questions, I've written a [similar answer](https://stackoverflow.com/a/67029635/7571258) for a different question, where I wrapped `TextFieldParser` in a reusable function. – zett42 Apr 10 '21 at 10:28
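For reference, the reusable-function idea mentioned in that comment can be sketched along these lines (the function name and parameters here are illustrative, not taken from the linked answer):

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

function Read-CsvFields {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string[]]$Delimiters = @(',')
    )
    $parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $Path
    try {
        $parser.Delimiters = $Delimiters
        $parser.HasFieldsEnclosedInQuotes = $true
        while (-not $parser.EndOfData) {
            # Emit each record as a string array (the leading comma
            # keeps the array from being unrolled by the pipeline)
            ,$parser.ReadFields()
        }
    }
    finally {
        $parser.Close()
    }
}
```

The try/finally ensures the parser is closed even if reading fails partway through a large file.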

Try this:

$csv=import-csv "C:\temp\vminfo.csv"

$datatable = New-Object System.Data.DataTable

#Add all columns
$columnsname = $csv | Get-Member -MemberType NoteProperty | % { $datatable.Columns.Add($_.Name) }

#Add data by column name
$csv | % {

    $newrow = $datatable.NewRow()
    $rowcsv = $_

    $columnsname | % { $newrow[$_] = $rowcsv."$_" }

    $datatable.Rows.Add($newrow)

}
Esperento57