
I'm basically building my own parallel foreach pipeline function, using runspaces.

My problem is: I call my function like this:

somePipeline | MyNewForeachFunction { scriptBlockHere } | pipelineGoesOn...

How can I pass the $_ parameter correctly into the ScriptBlock? It works when the ScriptBlock's first line is

param($_)

But as you might have noticed, PowerShell's built-in ForEach-Object and Where-Object do not require such a parameter declaration in every ScriptBlock that is passed to them.

Thanks in advance for your answers,
fjf2002

EDIT:

The goal is convenience for the users of MyNewForeachFunction: they shouldn't need to write a param($_) line in their script blocks.

Inside MyNewForeachFunction, the ScriptBlock is currently called via

$PSInstance = [powershell]::Create().AddScript($ScriptBlock).AddParameter('_', $_)
$PSInstance.BeginInvoke()
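For reference, a minimal synchronous sketch of that call pattern. It simplifies on purpose: Invoke() instead of BeginInvoke(), a single hard-coded input value of 42, and the script passed as a string. The first instance declares param($_) and receives the value; the second omits it, so the named argument has no parameter to bind to and $_ is not set to the input value:

```powershell
$value = 42

# With param($_): AddParameter('_', ...) binds to the declared parameter.
$withParam = [powershell]::Create().AddScript('param($_) "got [$_]"').AddParameter('_', $value)
$resultWithParam = $withParam.Invoke()
$withParam.Dispose()

# Without param($_): there is no parameter named '_', so $_ is not populated from the argument.
$withoutParam = [powershell]::Create().AddScript('"got [$_]"').AddParameter('_', $value)
$resultWithoutParam = $withoutParam.Invoke()
$withoutParam.Dispose()

"with param(`$_):    $resultWithParam"
"without param(`$_): $resultWithoutParam"
```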

EDIT2:

The point is: how does, for example, the built-in ForEach-Object make it unnecessary to declare $_ as a parameter in the ScriptBlock passed to it, and can I use that functionality, too?

(If the answer is that ForEach-Object is built in and uses some magic I can't use, then in my opinion this would disqualify PowerShell as a language.)

EDIT3:

Thanks to mklement0, I was finally able to build my general parallel foreach loop. Here's the code:

function ForEachParallel {
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory)] [ScriptBlock] $ScriptBlock,
        [Parameter(Mandatory=$false)] [int] $PoolSize = 20,
        [Parameter(ValueFromPipeline)] $PipelineObject
    )

    Begin {
        $RunspacePool = [runspacefactory]::CreateRunspacePool(1, $PoolSize)
        $RunspacePool.Open()
        $Runspaces = @()
    }

    Process {
        $PSInstance = [powershell]::Create().
            AddCommand('Set-Variable').AddParameter('Name', '_').AddParameter('Value', $PipelineObject).
            AddCommand('Set-Variable').AddParameter('Name', 'ErrorActionPreference').AddParameter('Value', 'Stop').
            AddScript($ScriptBlock)

        $PSInstance.RunspacePool = $RunspacePool

        $Runspaces += New-Object PSObject -Property @{
            Instance = $PSInstance
            IAResult = $PSInstance.BeginInvoke()
            Argument = $PipelineObject
        }
    }

    End {
        while($True) {
            $completedRunspaces = @($Runspaces | where {$_.IAResult.IsCompleted})

            $completedRunspaces | foreach {
                Write-Output $_.Instance.EndInvoke($_.IAResult)
                $_.Instance.Dispose()
            }

            if($completedRunspaces.Count -eq $Runspaces.Count) {
                break
            }

            $Runspaces = @($Runspaces | where { $completedRunspaces -notcontains $_ })
            Start-Sleep -Milliseconds 250
        }

        $RunspacePool.Close()
        $RunspacePool.Dispose()
    }
}

Code partly taken from Mathias R. Jessen's answer to "Why PowerShell workflow is significantly slower than non-workflow script for XML file analysis".

fjf2002
  • Either inspect the AST of the scriptblock and inject a param declaration if none exist, or extend PSCmdlet and invoke the scriptblock with the dollarUnderscore parameter set – Mathias R. Jessen Oct 24 '18 at 17:54
  • The first argument passed to your scriptblock is in `$args[0]`, or in `$input` if it's received as pipeline input – Maximilian Burszley Oct 24 '18 at 17:57
  • @MathiasR.Jessen: Could you be more specific? Do ForEach-Object / Where-Object etc. also do it like this? – fjf2002 Oct 24 '18 at 18:06
  • @mklement0: Thanks, I've added Dispose calls and a sane ErrorActionPreference, and I've removed the "barrier": completed results now get passed down the pipeline *before* all runspaces have finished. – fjf2002 Oct 27 '18 at 14:59
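A rough sketch of Mathias R. Jessen's first suggestion (inspect the script block's AST and inject a param declaration if none exists). The helper name Add-DollarUnderscoreParam is made up for illustration, and the plain text prepend assumes the block contains nothing that must precede param(), such as using statements:

```powershell
# Hypothetical helper: returns the script text, prefixed with 'param($_)' if no param block exists.
function Add-DollarUnderscoreParam {
    param([Parameter(Mandatory)] [scriptblock] $ScriptBlock)

    # For a script block literal, .Ast is a ScriptBlockAst; ParamBlock is $null when no param() is declared.
    if ($null -ne $ScriptBlock.Ast.ParamBlock) {
        return $ScriptBlock.ToString()
    }
    # ToString() yields the body without the braces; put the declaration in front of it.
    return "param(`$_)`n" + $ScriptBlock.ToString()
}

$userBlock = { "[$_]" }   # the user writes no param($_)
$scriptText = Add-DollarUnderscoreParam -ScriptBlock $userBlock

$ps = [powershell]::Create().AddScript($scriptText).AddParameter('_', 'hello')
$result = $ps.Invoke()
$ps.Dispose()
$result
```

The returned text can then be passed to AddScript in place of the original block, so users never have to write param($_) themselves.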

4 Answers


The key is to define $_ as a variable that your script block can see, via a call to Set-Variable.

Here's a simple example:

function MyNewForeachFunction {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory)]
    [scriptblock] $ScriptBlock
    ,
    [Parameter(ValueFromPipeline)]
    $InputObject
  )

  process {
    $PSInstance = [powershell]::Create()

    # Add a call to define $_ based on the current pipeline input object
    $null = $PSInstance.
      AddCommand('Set-Variable').
        AddParameter('Name', '_').
        AddParameter('Value', $InputObject).
      AddScript($ScriptBlock)

    $PSInstance.Invoke()
  }

}

# Invoke with sample values.
1, (Get-Date) | MyNewForeachFunction { "[$_]" }

The above yields something like:

[1]
[10/26/2018 00:17:37]
mklement0

Maybe this can help. I'd normally run auto-generated jobs in parallel this way:

Get-Job | Remove-Job

foreach ($param in @(3,4,5)) {

 Start-Job  -ScriptBlock {param($lag); sleep $lag; Write-Output "slept for $lag seconds" } -ArgumentList @($param)

}

Get-Job | Wait-Job | Receive-Job

If I understand you correctly, you are trying to get rid of param() inside the script block. You could try wrapping that script block in another one. Below is the workaround applied to my sample:

Get-Job | Remove-Job

#scriptblock with no parameter
$job = { sleep $lag; Write-Output "slept for $lag seconds" }

foreach ($param in @(3,4,5)) {

 Start-Job  -ScriptBlock {param($param, $job)
  $lag = $param
  $script = [string]$job
  Invoke-Command -ScriptBlock ([Scriptblock]::Create($script))
 } -ArgumentList @($param, $job)

}

Get-Job | Wait-Job | Receive-Job
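The same wrapping idea can supply $_ instead of $lag, so the user's block needs no param() at all. A sketch under the same assumptions as above: the user block travels to the job as text and is re-created inside it with [scriptblock]::Create, and the wrapper assigns $_ before invoking it. Only the jobs created here are removed afterwards:

```powershell
# The user's script block refers to $_ and declares no parameters.
$userBlock = { Start-Sleep -Seconds $_; "slept for $_ seconds" }

$jobs = foreach ($lag in 1, 2) {
    Start-Job -ArgumentList $userBlock.ToString(), $lag -ScriptBlock {
        param($code, $value)
        # Define $_ in the wrapper's scope; the re-created block runs in a child scope and can see it.
        $_ = $value
        & ([scriptblock]::Create($code))
    }
}

$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$results
```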
Mike Twc
# I was looking for an easy way to do this in a scripted function,
# and the below worked for me in PSVersion 5.1.17134.590

function Test-ScriptBlock {
    param(
        [string]$Value,
        [ScriptBlock]$FilterScript={$_}
    )
    $_ = $Value
    & $FilterScript
}
Test-ScriptBlock -Value 'unimportant/long/path/to/foo.bar' -FilterScript { [Regex]::Replace($_,'unimportant/','') }
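The same trick extends to pipeline input by assigning $_ in a process block before invoking the block. A minimal sketch; the function name Invoke-ForEachItem is hypothetical. It relies on the caller's script block running in a child scope of the function, so it may behave differently if the function is defined in a module, where the caller's block runs in the caller's own session state:

```powershell
# Hypothetical pipeline-aware variant of Test-ScriptBlock.
function Invoke-ForEachItem {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, Position = 0)] [scriptblock] $ScriptBlock,
        [Parameter(ValueFromPipeline)] $InputObject
    )
    process {
        # Set $_ explicitly in the function's scope (also covers -InputObject passed by name).
        $_ = $InputObject
        # & runs the caller's block in a child scope, where $_ resolves to the value above.
        & $ScriptBlock
    }
}

$result = 'unimportant/a.txt', 'unimportant/b.txt' |
    Invoke-ForEachItem { [Regex]::Replace($_, 'unimportant/', '') }
$result
```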

What I think you're looking for (and what I was looking for) is to support a "delay-bind" script block, supported in PowerShell 5.1+. The Microsoft documentation tells a bit about what's required, but doesn't provide any user-script examples (currently).

The gist is that PowerShell will implicitly detect that your function can accept a delay-bind script block if it defines an explicitly typed pipeline parameter (either by Value or by PropertyName), as long as it's not of type [scriptblock] or type [object].

function Test-DelayedBinding {
    param(
        # this is our typed pipeline parameter
        # per doc this cannot be of type [scriptblock] or [object],
        # but testing shows that type [object] may be permitted
        [Parameter(ValueFromPipeline, Mandatory)][string]$string,
        # this is our scriptblock parameter
        [Parameter(Position=0)][scriptblock]$filter
    )

    Process {
        if (&$filter $string) {
            Write-Output $string
        }
    }
}

# sample invocation
>'foo', 'fi', 'foofoo', 'fib' | Test-DelayedBinding { return $_ -match 'foo' }
foo
foofoo

Note that delay binding is only applied when input is piped into the function, and that the script block must use named parameters (not $args) if additional parameters are desired.

The frustrating part is that there is no way to explicitly specify that delay-bind should be used, and errors resulting from incorrectly structuring your function may be non-obvious.
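For comparison, a sketch of delay binding as the Microsoft documentation describes it: the script block is passed to a parameter that itself accepts pipeline input (here by property name) and is typed [string], so PowerShell evaluates it once per input object, with $_ set to that object, and binds the result. Format-Label and its parameters are hypothetical names:

```powershell
function Format-Label {
    [CmdletBinding()]
    param(
        # Receives each piped object by value.
        [Parameter(ValueFromPipeline)]
        $InputObject,

        # Pipeline-bindable (by property name) and typed [string]:
        # a script block argument here is delay-bound per input object.
        [Parameter(ValueFromPipelineByPropertyName)]
        [string] $Label
    )
    process {
        "$InputObject -> $Label"
    }
}

$result = 'foo', 'bar' | Format-Label -Label { $_.ToUpper() }
$result
```

This is the same mechanism behind idioms such as Get-ChildItem *.txt | Rename-Item -NewName { $_.Name -replace '\.txt$', '.log' }.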

Tydaeus