
Parsing a semicolon-separated CSV with PowerShell?


function Import-XMLCSV {
    Param($text,[char]$delimiter=',')
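    # we need lines, not the full string; splitting on the regex \r?\n avoids the empty
    # entries that .split("`r`n") leaves between lines in Windows PowerShell
    $columns, $splitText = @($text -split '\r?\n' | Where-Object { $_ -ne '' })
    # also this neat trick splits the first line off the rest of the text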
    $columns= $columns.split($delimiter) 
    $splitText | foreach {
        $splits=@{}
        $splitLine=$_.split($delimiter) # split line normally
        $index=0
        $propIndex=0
        $value=""
        $tag=""
        while ($index -lt $splitLine.length) {
            if ($value -ne "") { $value+=$delimiter }
            if ($splitLine[$index] -match "^<([a-zA-Z0-9]+)") { $tag = $matches[1] }
            $value+=$splitLine[$index]
            if ($tag -eq "") {
                # no tag found, put full string in this property
                $splits[$columns[$propIndex]]=$value
                $value=""
                $propIndex+=1
            } else {
                if ($splitLine[$index] -match "/${tag}") {
                    # if there's a corresponding tag in this piece
                    # check valid XML in here, if not, continue
                    try {
                        $xml = New-Object System.Xml.XmlDocument
                        $xml.LoadXml($value)
                        # throws exception if not a valid XML, so won't save if unpaired
                        $splits[$columns[$propIndex]]=$value
                        $value=""
                        $propIndex+=1
                        $tag=""
                    }
                    catch [System.Xml.XmlException] {
                        # no action
                        write-debug "$index $propIndex $tag $value"
                        write-debug $_.exception
                    }
                } # if matches /tag
            } # if not matches /tag, continue adding to $value
            $index+=1
        } # end while
        # past this, we've got hash table populated
        New-Object PSCustomObject -Property $splits # return prepared object
    } # end foreach splittext
}

Note, though, that if a field contains neither valid XML nor a plain string, you will get wrong output. The main trouble with your sample data is the <img> tags: they aren't closed, as the XML standard demands. To resolve this, change them like so: <img style="..." src="..." />, where the trailing slash closes the tag immediately. Otherwise XML validation fails and "description" doesn't get populated. The XML validation in this code guards against nested opening tags, say <div>...<div>...</div>...</div>, so that building the string doesn't stop at the first </div>.
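To see how the validation step reacts to these cases, here is a minimal sketch. Test-XmlFragment is a hypothetical helper name, not part of the function above, and the sample fragments are made up; it only wraps the same XmlDocument.LoadXml call in a try/catch.

```powershell
# Hypothetical helper: returns $true when the fragment parses as XML, $false otherwise
function Test-XmlFragment {
    param([string]$Fragment)
    try {
        $xml = New-Object System.Xml.XmlDocument
        $xml.LoadXml($Fragment)   # throws XmlException on malformed XML
        return $true
    }
    catch [System.Xml.XmlException] {
        return $false
    }
}

# Unclosed <img>: the closing </div> does not match the open <img>
$unclosedImg = Test-XmlFragment '<div><img src="a.png"></div>'
# Self-closed <img />: well-formed
$closedImg   = Test-XmlFragment '<div><img src="a.png" /></div>'
# Nested <div> cut off at the first </div>: the outer tag is still open
$partialDiv  = Test-XmlFragment '<div>a<div>b</div>'

"$unclosedImg $closedImg $partialDiv"
```

Only the second fragment should pass, which is why the parser keeps appending pieces until the accumulated string loads cleanly.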

You should use a custom parser in this case. Your file is not a valid CSV because it has no string delimiters wrapping the data. (HTML is hard to wrap correctly; you might first HTML-escape it, then wrap it in quotes, then separate the fields with commas or semicolons.) If you are creating such a file yourself, consider using [System.Web.HttpUtility]::HtmlEncode() to escape the HTML characters. If not, and you need to parse this file as it is, you will need to rejoin the parts of the string that are mistakenly split at semicolons. A raw call to Import-CSV will not work, of course, so you'll have to simulate its functionality.
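If you do control how the file is written, the escape-then-quote approach might look roughly like the sketch below. The sample rows and variable names are invented for illustration, and the System.Web assembly is assumed to be loadable in your PowerShell version. Note that HTML entities such as &lt; contain semicolons themselves, so the surrounding quotes are what keep each field intact.

```powershell
# Assumption: System.Web is available (needed for HttpUtility in Windows PowerShell)
Add-Type -AssemblyName System.Web

# Hypothetical sample data with semicolons and quotes inside the HTML
$rows = @(
    [PSCustomObject]@{ id = 1; description = '<div style="color:red;">Hello; world</div>' }
    [PSCustomObject]@{ id = 2; description = '<img style="width:10px;" src="a.png" />' }
)

# Write: HTML-encode the field, wrap every field in quotes, join with semicolons
$lines = @('"id";"description"')
foreach ($row in $rows) {
    $encoded = [System.Web.HttpUtility]::HtmlEncode($row.description)
    $lines += ('"{0}";"{1}"' -f $row.id, $encoded)
}

# Read: the standard CSV parser now copes with the semicolons inside quoted fields
$parsed  = $lines | ConvertFrom-Csv -Delimiter ';'
$decoded = $parsed | ForEach-Object { [System.Web.HttpUtility]::HtmlDecode($_.description) }
$decoded
```

Because HtmlEncode turns double quotes into &quot;, the encoded text cannot terminate the quoted field early, and HtmlDecode restores the original markup after parsing.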



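# Input and output files live next to the script
$executingPath = split-path -parent $MyInvocation.MyCommand.Definition
$inputCSV = $executingPath + "\InputFileName.txt"
$outputXLSX = $executingPath + "\Output.xlsx"
# Start Excel through COM and create a workbook
$excel = New-Object -ComObject excel.application 
$workbook = $excel.Workbooks.Add(1)
$worksheet = $workbook.worksheets.Item(1)
# Attach the text file to cell A1 as a query table
$TxtConnector = ("TEXT;" + $inputCSV)
$Connector = $worksheet.QueryTables.add($TxtConnector,$worksheet.Range("A1"))
$query = $worksheet.QueryTables.item($Connector.name)
# International(5) is the list separator from the Windows regional settings
$query.TextFileOtherDelimiter = $Excel.Application.International(5)
# 1 = delimited text
$query.TextFileParseType  = 1
# 2 = text format, applied to every column
$query.TextFileColumnDataTypes = ,2 * $worksheet.Cells.Columns.Count
$query.AdjustColumnWidth = 1
# Load the data, then drop the query so only the values remain
$query.Refresh()
$query.Delete()
# 51 = .xlsx workbook format
$Workbook.SaveAs($outputXLSX,51)
$excel.Quit()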
By default, Windows uses a list separator determined by the regional settings; it may be a comma, for example. The script reads that separator through $Excel.Application.International(5). To change it to a semicolon, open Control Panel, go to Region and Language, and click Additional Settings.

Another window opens. Change the symbol in the List separator field to the desired symbol (for example, a semicolon) and click Apply.

Alternatively, set the delimiter directly in the script by changing the TextFileOtherDelimiter line to:

$query.TextFileOtherDelimiter = ';'
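To check which list separator is currently in effect before running the script, a short sketch like the following may help. It reads the value from the PowerShell session culture; treating that as the same value Excel reports through International(5) is an assumption, since both normally come from the same regional setting but are read through different APIs.

```powershell
# List separator of the current session culture (for example ',' or ';')
$separator = (Get-Culture).TextInfo.ListSeparator
Write-Output "Current list separator: '$separator'"
```

If the value shown is not the delimiter used in your input file, either change the regional setting or hard-code the delimiter in the script.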

The script above converts comma-, semicolon-, or pipe-separated values, or values delimited by any other symbol, into separate columns in Excel. Save it as a .ps1 file.

Place the input file in the same folder as the script file. The output Excel file will be generated in the same location.

Run the script. It creates an Excel file whose columns are split on the semicolon.
