2

I have a table that looks like this:

Column A | Column B
      10 |      0.1
      10 |      0.2
      10 |      0.3
      20 |      0.4
      20 |      0.5
      20 |      0.6

I need to transpose it to look like this:

 10 | 20
0.1 | 0.4
0.2 | 0.5
0.3 | 0.6

Almost 2k different values in Column A, for each one there is exactly 1k values in column B, which might repeat.

Is there an automated way of doing this?

RSFalcon7
  • 192
  • 1
    Use a PivotTable! – Eric F Aug 15 '18 at 19:03
  • 1
    Similar question here: https://superuser.com/q/244507/76571 You can probably use the solutions there and then just add a simple transpose step to the end to get what you want. – Excellll Aug 15 '18 at 19:19
  • 1
    A PivotTable doesn't keep the data, I can only extract certain functions that will apply to all columns (min, max, average, stddev, sum, product, etc) – RSFalcon7 Aug 16 '18 at 13:36
  • I also tried vlookup, but it finds only the last value of column B for each value of column A – RSFalcon7 Aug 16 '18 at 13:38
  • 1
    @Excellll indeed those answers would solve my problem, but neither the VBA script or the Google Spreadsheet solution were able to handle the amount of data. They crash with memory errors (even though I still have plenty of free memory) – RSFalcon7 Aug 16 '18 at 14:27
  • What version of Excel are you using? – Excellll Aug 16 '18 at 16:42
  • Excel 2016 from Office 365 – RSFalcon7 Aug 16 '18 at 17:16
  • I know you have a Python solution, but you can also do this with Power Query, PQ can handle about 1GB of data in the 32-bit version; much more in the 64-bit version. – Ron Rosenfeld Aug 16 '18 at 21:54
  • The problem wasn't memory, nor CPU. At least in the task manager the notebook was in ~50% load, after excel crashes it drops to ~45% – RSFalcon7 Aug 16 '18 at 21:59

2 Answers2

1

This python script did the trick:

import numpy as np

data = np.genfromtxt('inputcsv', delimiter=',')
keys = sorted(set(data[:,0]))

result = np.array([])

for k in keys:
    col = data[np.where(data[:,0] == k)][:,1]
    if not result.any():
        result = col
    else:
        result = np.vstack((result, col))
    print('key {0} finished'.format(k))

np.savetxt('final.csv', np.transpose(result), delimiter=',')
RSFalcon7
  • 192
  • You found a solution to you problem, that's great! Why don't you post excerpts of the input and output files for all to review? And don't forget to accept your answer if it works for you. – simlev Aug 28 '18 at 13:02
  • Because the examples of input and output are in the question – RSFalcon7 Aug 28 '18 at 18:15
  • I was suggesting you post actual input and output files because I tried running the script against your posted input and it gives an error. Obviously, the script expects a CSV input, so here's with the input formatted accordingly. You mention there are thousands of Column A values, so I added just one and this is the error I got. – simlev Aug 29 '18 at 08:05
0

I would bake a PHP script:

<?php
$file = fopen($argv[1], "r");
while(!feof($file)) {
    $line =  trim(fgets($file));
    if ($line=='') {continue;}
    $line = explode(' ', $line);
    $a[$line[0]] []= $line[1];
}
fclose($file);
foreach ($a as $k=>$v) {
    echo $k,"\t";
}
echo "\n";
$a = array_map(null, ...$a);
foreach($a as $b){
    echo implode("\t",$b),"\n";
}

Input file (tab-separated):

10      0.1
10      0.2
10      0.3
20      0.4
20      0.5
20      0.6
30      0.3
10      0.9

Command:

php script inputfile

Output (tab-separated):

10      20      30
0.1     0.4     0.3
0.2     0.5
0.3     0.6
0.9
simlev
  • 3,822
  • 3
  • 15
  • 33