I am trying to filter a data file with this structure:
1-887127 mmu-miR-9-5p 100.000 22 0 0 1 22 1 22 6.43e-08 41.7
2-851665 mmu-miR-9-5p 100.000 23 0 0 1 23 1 23 1.95e-08 43.6
3-438265 mmu-miR-99a-5p 100.000 21 0 0 1 21 1 21 2.10e-07 39.9
3-438265 mmu-miR-100-5p 95.238 21 1 0 1 21 1 21 9.78e-06 34.4
4-436182 mmu-miR-100-5p 100.000 22 0 0 1 22 1 22 6.43e-08 41.7
4-436182 mmu-miR-99a-5p 95.455 22 1 0 1 22 1 22 2.99e-06 36.2
5-411498 mmu-miR-30d-5p 100.000 22 0 0 1 22 1 22 7.60e-08 41.7
5-411498 mmu-miR-30a-5p 95.455 22 1 0 1 22 1 22 3.54e-06 36.2
6-347902 mmu-miR-99a-5p 100.000 22 0 0 1 22 1 22 7.02e-08 41.7
6-347902 mmu-miR-100-5p 95.455 22 1 0 1 22 1 22 3.26e-06 36.2
7-346107 mmu-miR-370-3p 100.000 22 0 0 1 22 1 22 6.43e-08 41.7
8-295513 mmu-miR-99b-5p 100.000 22 0 0 1 22 1 22 6.43e-08 41.7
9-288607 mmu-miR-30d-5p 100.000 22 0 0 1 22 1 22 7.02e-08 41.7
9-288607 mmu-miR-30a-5p 95.455 22 1 0 1 22 1 22 3.26e-06 36.2
Each number before the hyphen in column 1 should appear only once in the output file. For every group of rows that start with the same number, I would like to keep only the row with the greatest value in column 3. So the desired output for the data shown above would be:
1-887127 mmu-miR-9-5p 100.000 22 0 0 1 22 1 22 6.43e-08 41.7
2-851665 mmu-miR-9-5p 100.000 23 0 0 1 23 1 23 1.95e-08 43.6
3-438265 mmu-miR-99a-5p 100.000 21 0 0 1 21 1 21 2.10e-07 39.9
4-436182 mmu-miR-100-5p 100.000 22 0 0 1 22 1 22 6.43e-08 41.7
5-411498 mmu-miR-30d-5p 100.000 22 0 0 1 22 1 22 7.60e-08 41.7
6-347902 mmu-miR-99a-5p 100.000 22 0 0 1 22 1 22 7.02e-08 41.7
7-346107 mmu-miR-370-3p 100.000 22 0 0 1 22 1 22 6.43e-08 41.7
8-295513 mmu-miR-99b-5p 100.000 22 0 0 1 22 1 22 6.43e-08 41.7
9-288607 mmu-miR-30d-5p 100.000 22 0 0 1 22 1 22 7.02e-08 41.7
I know it would be very easy to just require that column 3 equals 100, but the best row for a given number might not always reach 100 in my dataset.
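To make the rule concrete, here is a minimal Python sketch of what I mean. It assumes the columns are separated by whitespace, and that when two rows with the same number tie in column 3 it is fine to keep whichever comes first. The function name keep_best_rows and the value_col parameter are just made up for illustration.

```python
def keep_best_rows(lines, value_col=2):
    """Keep, per leading number in column 1, the row with the largest value
    in column 3 (index 2). Assumes whitespace-separated columns."""
    best = {}  # leading number -> (value, row); dicts keep first-seen order
    for line in lines:
        fields = line.split()
        if len(fields) <= value_col:
            continue  # skip blank or too-short lines
        key = fields[0].split("-", 1)[0]  # the part before the first hyphen
        value = float(fields[value_col])
        # Strict ">" means the first row wins when values tie (an assumption).
        if key not in best or value > best[key][0]:
            best[key] = (value, line.rstrip("\n"))
    return [row for _, row in best.values()]


# Rows taken from the data above; the two "4-" rows are swapped here on
# purpose to show that the result does not depend on the best row being first.
sample = [
    "3-438265 mmu-miR-99a-5p 100.000 21 0 0 1 21 1 21 2.10e-07 39.9",
    "3-438265 mmu-miR-100-5p 95.238 21 1 0 1 21 1 21 9.78e-06 34.4",
    "4-436182 mmu-miR-99a-5p 95.455 22 1 0 1 22 1 22 2.99e-06 36.2",
    "4-436182 mmu-miR-100-5p 100.000 22 0 0 1 22 1 22 6.43e-08 41.7",
]

for row in keep_best_rows(sample):
    print(row)
```

For a real file, the same function could be fed the open file object directly, since iterating over a file yields its lines one at a time.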