TL;DR
As far as I know, there is no canonical formula for determining Work in Progress (WIP) limits. However, there are some empirical best practices.
What WIP Limits Are Intended to Do
The goals of WIP limits are generally to:
- Improve throughput.
- Reduce cycle times.
- Optimize team capacity.
- Limit the efficiency drag from multi-tasking.
- Increase slack to make a process more adaptable.
Optimizing WIP Limits
How you optimize WIP limits depends on a number of factors and calls for a good deal of in situ process analysis. However, a good rule of thumb is to limit WIP to the team's mono-tasking capacity. In other words, your baseline WIP limit should be the number of tasks that can be in progress simultaneously without any of them sitting idle.
For example, if you have six team members whose responsibilities don't overlap, your WIP limit should not exceed 6 across all columns on the Kanban, since that is the maximum number of tasks that can be worked on simultaneously without task-switching or multi-tasking. In addition, individual columns may set their own, lower WIP limits as a subset of the limit for the Kanban as a whole.
However, as a practical matter, cross-functional teams, teams that practice pair programming, and frameworks that encourage "swarming" the team over stories should lower their WIP limits accordingly. For example, a WIP of N-1 (where N is the number of team members) would allow for more flexibility in coordinating stories within an iteration, while a WIP of N/2 might be optimal for a team that is optimized for pair programming.
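As a rough illustration of the rules of thumb above, here is a minimal Python sketch. The function name and the style labels ("solo", "swarming", "pairing") are hypothetical names chosen for this example, and the results are starting points to inspect and adapt, not a canonical formula.

```python
def baseline_wip_limit(team_size: int, style: str = "solo") -> int:
    """Return a starting board-wide WIP limit.

    The style labels are illustrative assumptions, not standard terms:
      "solo"     -> N     (non-overlapping responsibilities)
      "swarming" -> N - 1 (leave room to coordinate on stories)
      "pairing"  -> N / 2 (two people per story, rounded down)
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    if style == "solo":
        limit = team_size
    elif style == "swarming":
        limit = team_size - 1
    elif style == "pairing":
        limit = team_size // 2
    else:
        raise ValueError(f"unknown style: {style!r}")
    # A board always needs room for at least one item in progress.
    return max(1, limit)
```

For a six-person team, this gives 6, 5, and 3 for the three styles respectively.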
Regardless of the actual numerical limit, the important thing is to avoid the "100% utilization fallacy" and build enough slack into the process to sustain a consistent level of throughput over time. That means that you generally want to apply some fudge factor to lower your WIP limits, but the specifics will vary from project to project.
Perhaps counter-intuitively, reducing the WIP limits below the team's maximum capacity will generally improve your throughput. This is most often because at 100% capacity, any roadblocks or unexpected issues can create bottlenecks that impact the entire pull-queue cycle. Ensuring that there is sufficient slack in the process enables the team to overcome minor process issues as they arise without needing to "stop the line" altogether. This ability to adapt without stopping the line every time is part of what makes a process agile.
Per-Column WIP Limits
Kanban is a pull-queue system. Each column is essentially a separate queue into which work is pulled from the previous column when the WIP limit allows.
For example, if two of your columns are "Coding" and "Regression Testing," one would pull finished stories from the Coding column when the Regression Testing column was below its capacity.
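The pull mechanics described above might be sketched as follows. This is a minimal model under stated assumptions: the `Column` class and `pull` function are hypothetical names for this example, and every story sitting in the source column is treated as finished and ready to be pulled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    """One queue on the board (hypothetical model for illustration)."""
    name: str
    wip_limit: int
    stories: List[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.stories) < self.wip_limit


def pull(source: Column, target: Column) -> Optional[str]:
    """Pull the oldest story from source into target.

    Returns the story that moved, or None if the source is empty
    or the target is already at its WIP limit.
    """
    if not source.stories or not target.has_capacity():
        return None
    story = source.stories.pop(0)
    target.stories.append(story)
    return story


coding = Column("Coding", wip_limit=3, stories=["story-A", "story-B"])
testing = Column("Regression Testing", wip_limit=1)

first = pull(coding, testing)   # testing is empty, so story-A moves
second = pull(coding, testing)  # testing is at its limit, nothing moves
```

Note that the downstream column initiates the move: work is never pushed into Regression Testing, it is only pulled when that column has room.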
As a further example, if your Kanban WIP limit is 6, but you have only one person dedicated to Regression Testing, then your WIP limit for the Regression Testing column should most likely be 1: not necessarily one test, but a single story.
There are always exceptions. If regression testing is semi-automated, and six jobs can be run in parallel without multi-tasking by the human agents responsible for the column, you might set the WIP limit for that column to six. The optimal WIP limit for each queue is an inspect-and-adapt issue that the team needs to review consistently and adjust as necessary over time.
Ultimately, the methodology for optimizing WIP limits for columns is the same as optimizing for the board as a whole: your WIP limit for a column must not exceed its mono-tasking capacity, and should be low enough to provide sufficient slack in the process to prevent line-stoppage for minor problems.
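One way to sanity-check a board against these two constraints is sketched below. The function name, the dictionary layout, and the per-column capacity figures are all assumptions made for this example; in practice the mono-tasking capacity of a column is something the team has to estimate and revisit.

```python
from typing import Dict, List


def check_column_limits(
    board_limit: int,
    column_limits: Dict[str, int],
    column_capacity: Dict[str, int],
) -> List[str]:
    """Return a list of human-readable problems (empty if none).

    column_capacity holds the team's own estimate of how many stories
    each column can handle at once without multi-tasking.
    """
    problems = []
    for name, limit in column_limits.items():
        if limit > board_limit:
            problems.append(
                f"{name}: limit {limit} exceeds board limit {board_limit}"
            )
        if limit > column_capacity[name]:
            problems.append(
                f"{name}: limit {limit} exceeds mono-tasking capacity "
                f"{column_capacity[name]}"
            )
    return problems


issues = check_column_limits(
    board_limit=6,
    column_limits={"Coding": 5, "Regression Testing": 2},
    column_capacity={"Coding": 5, "Regression Testing": 1},
)
```

Here `issues` flags only the Regression Testing column, since one dedicated tester cannot work two stories at once. A check like this only catches limits that are too high; whether a limit leaves enough slack is still a judgment the team makes over time.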