Sure, this little Perl snippet should do it:
$ perl -pe 's/$/_$seen{$_}/ if ++$seen{$_}>1 and /^>/; ' file.fa
>1_uniqueGeneName
atgc
>1_anotherUniqueGeneName
atgc
>1_duplicateName
atgc
>1_duplicateName_2
atgc
Or, to make the changes in the original file, use -i (here -i.bak, which keeps a backup of the original as file.fa.bak):
perl -i.bak -pe 's/$/_$seen{$_}/ if ++$seen{$_}>1 and /^>/; ' file.fa
Note that the first occurrence of a duplicate name isn't changed; the second gets _2 appended, the third _3, and so on.
Explanation
perl -pe : print each input line after applying the script given by -e to it.
++$seen{$_}>1 : increment the count stored in the hash %seen for this line ($_) by 1 and check whether the result is greater than 1, i.e. whether this exact line has been seen before.
s/$/_$seen{$_}/ if ++$seen{$_}>1 and /^>/ : if the current line starts with a > and the value stored in the hash %seen for this line is greater than 1 (if this isn't the first time we see this line), replace the end of the line ($) with a _ and the current value in the hash. Sequence lines are counted too, but the /^>/ test ensures that only header lines are ever modified.
Alternatively, here's the same idea in awk:
$ awk '(/^>/ && s[$0]++){$0=$0"_"s[$0]}1;' file.fa
>1_uniqueGeneName
atgc
>1_anotherUniqueGeneName
atgc
>1_duplicateName
atgc
>1_duplicateName_2
atgc
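To make the changes in the original file, use -i inplace. This needs GNU awk 4.1.0 or later; many Linux distributions ship GNU awk as the default awk, but some (Debian and Ubuntu, for example) default to mawk, which has no such option:
awk -i inplace '(/^>/ && s[$0]++){$0=$0"_"s[$0]}1;' file.fa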
Explanation
In awk, the special variable $0 is the current line.
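(/^>/ && s[$0]++) : if this line starts with a > and the value stored in the array s for this line is non-zero. s[$0]++ is a post-increment, so it returns the value from before the increment: 0 (false) the first time a line is seen, and 1 or more (true) on every later occurrence. By the time the action runs, s[$0] already holds the incremented count.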
$0=$0"_"s[$0] : make the current line be itself with a _ and the value from s appended.
1; : this is just shorthand for "print this line". If an expression evaluates to true, awk will print the current line. Since 1 is always true, this will print every line.
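If your awk is not GNU awk (mawk or BSD awk, say), the in-place edit can be approximated by writing to a temporary file and copying it back. This is only a sketch: rename_dups_inplace is a hypothetical helper name, and the .bak backup is my own addition, mirroring what perl -i.bak does.

```shell
# Portable stand-in for an in-place edit: write to a temp file, then copy it back.
# rename_dups_inplace is a hypothetical helper name, not a standard tool.
rename_dups_inplace() {
    file=$1
    tmp=$(mktemp) || return 1
    if awk '(/^>/ && s[$0]++){$0=$0"_"s[$0]}1' "$file" > "$tmp"; then
        # Keep a backup, then overwrite the original while preserving its permissions.
        cp "$file" "$file.bak" && cat "$tmp" > "$file" && rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

You would call it as rename_dups_inplace file.fa; the untouched original is left in file.fa.bak.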
If you want all of the duplicates to be marked, including the first occurrence, you need to read the file twice: once to count how often each name occurs, and a second time to number every name that occurs more than once:
$ awk '{
    if (NR==FNR){
        if(/^>/){
            s[$0]++
        }
        next;
    }
    if(/^>/){
        k[$0]++;
        if(s[$0]>1){
            $0=$0"_"k[$0]
        }
    }
    print
}' file.fa file.fa
>1_uniqueGeneName
atgc
>1_anotherUniqueGeneName
atgc
>1_duplicateName_1
atgc
>1_duplicateName_2
atgc
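Whichever of the commands above you use, it is worth checking that no duplicate headers are left afterwards. A minimal sketch using standard tools (list_dup_headers is a hypothetical helper name):

```shell
# Print every header line that occurs more than once.
# No output means all sequence names are unique.
# Reads the named file(s), or standard input if no file is given.
list_dup_headers() {
    cat "$@" | grep '^>' | sort | uniq -d
}
```

For example, list_dup_headers file.fa on the original input should list >1_duplicateName, while running it on the renamed output should print nothing.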
IMPORTANT: note that all of these approaches assume you don't already have sequence names ending with _N where N is a number. If your input file has 2 sequences called foo and one called foo_2, then you will end up with two foo_2:
$ cat test.fa
>foo_2
actg
>foo
actg
>foo
actg
$ perl -pe 's/$/_$seen{$_}/ if ++$seen{$_}>1 and /^>/; ' test.fa
>foo_2
actg
>foo
actg
>foo_2
actg
If this can be an issue for you, use one of the more sophisticated approaches suggested by the other answers.
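As a rough idea of what a collision-safe variant could look like (this is my own sketch, not a reproduction of any of the other answers), you can read the file twice: first to record every header that already exists, then to rename duplicates using the next suffix that is not yet taken. The function name rename_dups_safe is made up for illustration.

```shell
# Rename duplicate headers without colliding with names already in the file.
# rename_dups_safe is a hypothetical helper name. The file is read twice,
# so it must be a regular file, not a pipe.
rename_dups_safe() {
    awk '
        NR == FNR {                    # first pass: record every existing header
            if (/^>/) taken[$0] = 1
            next
        }
        /^>/ {
            if (seen[$0]++) {          # second or later occurrence of this header
                n = seen[$0]
                while (($0 "_" n) in taken) n++   # skip suffixes already in use
                taken[$0 "_" n] = 1
                $0 = $0 "_" n
            }
        }
        { print }
    ' "$1" "$1"
}
```

With the test.fa shown above, rename_dups_safe test.fa should leave >foo_2 and the first >foo alone and turn the second >foo into >foo_3, since _2 is already in use.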
duplicate_1, duplicate_2 or is duplicate, duplicate_1 enough? The latter is far simpler since the former will require you to read the file twice. – terdon Jul 04 '17 at 11:57
duplicate_1 duplicate_2 format but if it is much more complex then I am fine with the duplicate duplicate_1 format. – AudileF Jul 04 '17 at 12:32