
I have a file such as:

@SRR9110374.1 1/1
GAGTATAAAGAAGAAAGTAAATCTCGGTTCGTCTCTTCATCGAGAGAAATGTCGACGAGAAAAAAAAAACAAGGGCTCATTTAAAGCCTTTCAAATCCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.2 2/1
ATATGGAACAAGTTAAAAAAAATAAAAAGCAAAGAAATAATGTTTTGTCATCGAAAGTGTCGACATAAAAACAGGTTGGCATCTGGCCTGGTATCTCA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<BFFFFFFF<FFFFFFFFFFF
@SRR9110374.3 3/1
NTATAACCGTATCAAAGAAGTTTACCCCGAGAGAAGCACGCAGTTTCCCACAGGTAATTTTCTCACAAGCGAGAGAAACATCATACCGCAATCAGGAAC
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFF
@SRR9110374.4 4/1
GATAAAGAATATAGCTATGTATAGCCGGGATATATTAAGTGATTGAAATATCTCTTAGAAATCCATAGAATAGTAGTGTATCGAATAGGAGGAAGCGAAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.5 5/1
CTTCCAATGCTTGCCAAAGTTCATTGTCGTTGTAATTATCGAAAGGATCTAAATTCTTTCTCAACGAACCCGAGAATAGGAAGGGTTCTTGAGGAATTAT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFBFFF/FFF
@SRR9110374.6 6/1
ACCGATAATCTTTCCTTCTCAAGAATTTTGTTAATATTCCACATTTTTAAATAGATTTCATTTCTCTCTCTCTTTCTCTCTCTTTTTCTTGTCCTCGATG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFBFFFF///FF
@SRR9110374.7 7/1
GTTGTGCTGAGAATGTTAATAAATTACAAAATGTTATCACTAACTTGGAAATATTCGAATCGACAGATATCGCGTTTGTCGTGTTGTATTAATATATTC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.8 8/1
GTCATAGAACGGGGGAGGGGAGGAAGAAGAAAGGAAGGGAAAAAAACGAGAGAGAGAGAGGGGATTACGCTCGCCGTTCGAATCGTTAGGCGTCCGTTT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFBBFBBFF
@SRR9110374.9 9/1
AATTATTATTTAATCGACGCGTCTATCGATAAATCATCCTCGAATGCTAAGCAAAACTGAACTTCCGCAAATATTGCACACGAAACGTTGAAACAAAG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

and I would like to copy its content, up to the end of the Xth record (X being stored in the Nb_occurence variable), into a new file.

I tried:

Nb_occurence=4
cat file | awk 'BEGIN{ found=0} /@/{found=found+1} {if ( found < $Nb_occurence ) print }'

I should get:

@SRR9110374.1 1/1
GAGTATAAAGAAGAAAGTAAATCTCGGTTCGTCTCTTCATCGAGAGAAATGTCGACGAGAAAAAAAAAACAAGGGCTCATTTAAAGCCTTTCAAATCCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.2 2/1
ATATGGAACAAGTTAAAAAAAATAAAAAGCAAAGAAATAATGTTTTGTCATCGAAAGTGTCGACATAAAAACAGGTTGGCATCTGGCCTGGTATCTCA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<BFFFFFFF<FFFFFFFFFFF
@SRR9110374.3 3/1
NTATAACCGTATCAAAGAAGTTTACCCCGAGAGAAGCACGCAGTTTCCCACAGGTAATTTTCTCACAAGCGAGAGAAACATCATACCGCAATCAGGAAC
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFF
@SRR9110374.4 4/1
GATAAAGAATATAGCTATGTATAGCCGGGATATATTAAGTGATTGAAATATCTCTTAGAAATCCATAGAATAGTAGTGTATCGAATAGGAGGAAGCGAAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

PS: The real file is very large, so a method suited to that would be nice.

Inian
bewolf

3 Answers

Could you please try the following?

Nb_occurence=4
awk -v nb_occur="$Nb_occurence" '
BEGIN{
  occur=0
}
/@/{
  occur++
}
occur>nb_occur{
  exit
}
occur
' Input_file


Regarding the note that the real file is very large:

I have used exit so that awk stops reading Input_file as soon as the requested number of occurrences has been printed. Since there is no need to read any further, this should be faster than your solution.
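
The same early-exit idea can also be sketched by counting lines instead of matching `@`. This is only a sketch, under the assumption that every record spans exactly four lines, as in the sample from the question (which looks like FASTQ). It sidesteps the fact that `@` is also a valid FASTQ quality character and can therefore show up in quality lines. The file names `sample.fastq` and `first_records.fastq` and the tiny input are made up for illustration.

```shell
# Hypothetical sample: three 4-line records; the quality line of r1 starts with "@".
printf '%s\n' \
  '@r1' 'ACGT' '+' '@FFF' \
  '@r2' 'TTGA' '+' 'FFFF' \
  '@r3' 'GGCC' '+' 'FFFF' > sample.fastq

Nb_occurence=2
# Assumption: every record is exactly 4 lines (header, sequence, "+", quality).
# Stop reading as soon as the line number passes 4 * Nb_occurence.
awk -v nb_occur="$Nb_occurence" 'NR > 4 * nb_occur { exit } 1' sample.fastq > first_records.fastq
```

With this input, a pattern such as `/@/` would count the quality line `@FFF` as a new record, whereas the line count is unaffected by it.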

RavinderSingh13

You should rewrite your awk this way:

awk -v occurence="$Nb_occurence" 'BEGIN{ found=0} /@/{found=found+1} {if ( found <= occurence ) print }' file

You also do not need cat; awk can read the file itself.
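
A small sketch of why the attempt in the question could not work as written: inside single quotes the shell leaves `$Nb_occurence` untouched, so awk only sees an awk variable named `Nb_occurence` that was never set. Passing the value with `-v` copies it into awk. The one-line input and the variable names `without_v` and `with_v` are made up for illustration.

```shell
Nb_occurence=4

# Inside single quotes awk sees its own, unset variable: it prints as an empty string.
without_v=$(printf 'x\n' | awk '{ print "[" Nb_occurence "]" }')

# With -v the shell value is copied into the awk variable n.
with_v=$(printf 'x\n' | awk -v n="$Nb_occurence" '{ print "[" n "]" }')

echo "without -v: $without_v"
echo "with -v:    $with_v"
```

Because an unset awk variable evaluates to 0 in a numeric context, `$Nb_occurence` inside the awk program is read as `$0`, that is, the whole current line, rather than the number 4.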

Romeo Ninov
  • Doing so, you will read the entire file, which could become harmful if the file is huge! Have a look at [RavinderSingh13's solution](https://stackoverflow.com/a/59358192/1765658)! – F. Hauri Dec 16 '19 at 14:09
  • Right. But this is the OP's approach; I just tuned his/her solution. – Romeo Ninov Dec 16 '19 at 14:11

Yet another awk:

$ awk -v n=4 '/@/&&!n--{exit}1' file

Output:

@SRR9110374.1 1/1
GAGTATAAAGAAGAAAGTAAATCTCGGTTCGTCTCTTCATCGAGAGAAATGTCGACGAGAAAAAAAAAACAAGGGCTCATTTAAAGCCTTTCAAATCCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR9110374.2 2/1
ATATGGAACAAGTTAAAAAAAATAAAAAGCAAAGAAATAATGTTTTGTCATCGAAAGTGTCGACATAAAAACAGGTTGGCATCTGGCCTGGTATCTCA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<BFFFFFFF<FFFFFFFFFFF
@SRR9110374.3 3/1
NTATAACCGTATCAAAGAAGTTTACCCCGAGAGAAGCACGCAGTTTCCCACAGGTAATTTTCTCACAAGCGAGAGAAACATCATACCGCAATCAGGAAC
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFF
@SRR9110374.4 4/1
GATAAAGAATATAGCTATGTATAGCCGGGATATATTAAGTGATTGAAATATCTCTTAGAAATCCATAGAATAGTAGTGTATCGAATAGGAGGAAGCGAAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Explained:

$ awk -v n=4 '    # -v variable=value is the way to introduce values to awk from the shell
/@/ && !n-- {     # when @ met (n+1)th time
    exit          # ... exit
}1' file          # output
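
To watch the countdown at work, here is a minimal sketch on a made-up input (the file name `demo.txt` and its contents are purely illustrative), using n=2 instead of 4:

```shell
# Hypothetical input: five "@" lines, each followed by one data line.
printf '%s\n' '@a' '1' '@b' '2' '@c' '3' '@d' '4' '@e' '5' > demo.txt

# n starts at 2: the first two "@" lines decrement it to 0; the third "@" line
# finds n equal to 0, so !n-- is true and awk exits before printing that line.
result=$(awk -v n=2 '/@/ && !n-- { exit } 1' demo.txt)
printf '%s\n' "$result"
```

Because `&&` short-circuits, `n` is only decremented on lines that match `/@/`, so the data lines in between do not affect the count.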
James Brown