Collapse duplicate rows by selecting row with greater value

Question

I am trying to filter a data file with this structure:

1-887127    mmu-miR-9-5p    100.000 22  0   0   1   22  1   22  6.43e-08    41.7
2-851665    mmu-miR-9-5p    100.000 23  0   0   1   23  1   23  1.95e-08    43.6
3-438265    mmu-miR-99a-5p  100.000 21  0   0   1   21  1   21  2.10e-07    39.9
3-438265    mmu-miR-100-5p  95.238  21  1   0   1   21  1   21  9.78e-06    34.4
4-436182    mmu-miR-100-5p  100.000 22  0   0   1   22  1   22  6.43e-08    41.7
4-436182    mmu-miR-99a-5p  95.455  22  1   0   1   22  1   22  2.99e-06    36.2
5-411498    mmu-miR-30d-5p  100.000 22  0   0   1   22  1   22  7.60e-08    41.7
5-411498    mmu-miR-30a-5p  95.455  22  1   0   1   22  1   22  3.54e-06    36.2
6-347902    mmu-miR-99a-5p  100.000 22  0   0   1   22  1   22  7.02e-08    41.7
6-347902    mmu-miR-100-5p  95.455  22  1   0   1   22  1   22  3.26e-06    36.2
7-346107    mmu-miR-370-3p  100.000 22  0   0   1   22  1   22  6.43e-08    41.7
8-295513    mmu-miR-99b-5p  100.000 22  0   0   1   22  1   22  6.43e-08    41.7
9-288607    mmu-miR-30d-5p  100.000 22  0   0   1   22  1   22  7.02e-08    41.7
9-288607    mmu-miR-30a-5p  95.455  22  1   0   1   22  1   22  3.26e-06    36.2

The number before the hyphen in column 1 should appear only once in the output file. For each group of rows that start with the same number, I would like to keep only the row with the greatest value in column 3. The desired output for the data shown above would be:

1-887127    mmu-miR-9-5p    100.000 22  0   0   1   22  1   22  6.43e-08    41.7
2-851665    mmu-miR-9-5p    100.000 23  0   0   1   23  1   23  1.95e-08    43.6
3-438265    mmu-miR-99a-5p  100.000 21  0   0   1   21  1   21  2.10e-07    39.9
4-436182    mmu-miR-100-5p  100.000 22  0   0   1   22  1   22  6.43e-08    41.7
5-411498    mmu-miR-30d-5p  100.000 22  0   0   1   22  1   22  7.60e-08    41.7
6-347902    mmu-miR-99a-5p  100.000 22  0   0   1   22  1   22  7.02e-08    41.7
7-346107    mmu-miR-370-3p  100.000 22  0   0   1   22  1   22  6.43e-08    41.7
8-295513    mmu-miR-99b-5p  100.000 22  0   0   1   22  1   22  6.43e-08    41.7
9-288607    mmu-miR-30d-5p  100.000 22  0   0   1   22  1   22  7.02e-08    41.7

I know it would be very easy to just require that column 3 equals 100, but that might not always be the case for my dataset.
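
To make the intended logic concrete, here is a minimal Python sketch of the filtering described above. It is only an illustration under some assumptions: the fields are separated by whitespace (tabs or spaces), the group key is the text before the first hyphen in column 1, and when two rows of a group tie in column 3 the first one is kept. The function name `keep_best_rows` is hypothetical.

```python
def keep_best_rows(lines):
    """For each leading number (text before the first hyphen in column 1),
    keep the row with the greatest value in column 3."""
    best = {}  # group key -> (score, line); dicts preserve insertion order
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        key = fields[0].split("-", 1)[0]
        score = float(fields[2])
        # Strict ">" keeps the first row seen when scores tie (an assumption).
        if key not in best or score > best[key][0]:
            best[key] = (score, line.rstrip("\n"))
    return [line for _, line in best.values()]


if __name__ == "__main__":
    # A few rows taken from the data above, truncated to the first columns.
    sample = [
        "3-438265    mmu-miR-99a-5p  100.000 21",
        "3-438265    mmu-miR-100-5p  95.238  21",
        "4-436182    mmu-miR-100-5p  100.000 22",
        "4-436182    mmu-miR-99a-5p  95.455  22",
    ]
    for row in keep_best_rows(sample):
        print(row)
```

On these four rows the sketch keeps the `mmu-miR-99a-5p` row for group 3 and the `mmu-miR-100-5p` row for group 4, without relying on column 3 being exactly 100. For a real file, the lines could be passed in from an open file handle instead of a list.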

Does this answer your question? [Select the row with the maximum value in each group](https://stackoverflow.com/questions/24558328/select-the-row-with-the-maximum-value-in-each-group) — julien.leroux5, Aug 28 '21 at 09:22

