
Not sure if this question has been asked before, but I couldn't find it, so here it is:

randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
randomList2 = []
for i in randomList:
  if i <contains any characters other than "A", "C", "G", or "T">:
    <add a string without junk to randomList2>

How would I do all the things within <>? Thanks,

Pydronia

3 Answers

>>> randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
>>> import re
>>> [re.sub("[^ACGT]+", "", s) for s in randomList]
['ACGT', 'AG', 'AGCT']

[^ACGT]+ matches a run of one or more (+) consecutive characters other than A, C, G, or T, and re.sub replaces each run with the empty string.
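A small sketch comparing the two patterns with re.subn, which returns the new string together with the number of substitutions it made. Both patterns produce the same cleaned strings, but [^ACGT]+ replaces whole runs of junk at once, so it performs fewer substitutions. That is a plausible contributor to the timing gap shown in this answer, not a measured explanation. The raw string r"..." only avoids the invalid-escape warning newer Python 3 versions emit for "\]"; its value is identical.

```python
import re

randomList = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]

counts = []
for s in randomList:
    # One substitution per *run* of junk characters
    with_plus, runs = re.subn("[^ACGT]+", "", s)
    # One substitution per individual junk character
    without_plus, chars = re.subn("[^ACGT]", "", s)
    assert with_plus == without_plus  # same cleaned result either way
    counts.append((with_plus, runs, chars))

print(counts)  # [('ACGT', 0, 0), ('AG', 1, 4), ('AGCT', 2, 9)]
```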

Some timings:

>>> import timeit
>>> setup = '''randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
... import re'''
>>> timeit.timeit(setup=setup, stmt='[re.sub("[^ACGT]+", "", s) for s in randomList]')
8.197133132976195
>>> timeit.timeit(setup=setup, stmt='[re.sub("[^ACGT]", "", s) for s in randomList]')
9.395620040786165

Without re, it's faster (see @cmd's answer):

>>> timeit.timeit(setup=setup, stmt="[''.join(c for c in s if c in 'ACGT') for s in randomList]")
6.874829817476666

Even faster (see @JonClement's comment):

>>> setup='''randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]\nascii_exclude = ''.join(set('ACGT').symmetric_difference(map(chr, range(256))))'''
>>> timeit.timeit(setup=setup, stmt="""[item.translate(None, ascii_exclude) for item in randomList]""")
2.814761871275735
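The translate(None, deletechars) call above is the Python 2 str API; in Python 3, str.translate takes a single mapping instead. A sketch of a Python 3 equivalent, assuming the same exclusion set of code points 0-255 minus ACGT, builds that mapping with str.maketrans, whose third argument lists characters to delete. Like the original, it only removes characters in the 0-255 range, so something like "Ω" would pass through untouched. No timings were taken for this version.

```python
# Every character with code point 0-255 except A, C, G, T
ascii_exclude = ''.join(set(map(chr, range(256))) - set('ACGT'))
# Python 3: maketrans('', '', chars) maps each char in `chars` to None (deletion)
delete_table = str.maketrans('', '', ascii_exclude)

randomList = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]
cleaned = [item.translate(delete_table) for item in randomList]
print(cleaned)  # ['ACGT', 'AG', 'AGCT']
```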

Also possible:

>>> setup='randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]'
>>> timeit.timeit(setup=setup, stmt="[filter(set('ACGT').__contains__, item) for item in randomList]")
4.341086316883207
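Calling filter() on a string returns a string only in Python 2; in Python 3 it returns a lazy iterator. A sketch of the Python 3 form of the same idea joins that iterator back into a string:

```python
valid = set('ACGT')
randomList = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]

# filter() yields the characters for which valid.__contains__ is true;
# ''.join turns them back into a str
cleaned = [''.join(filter(valid.__contains__, item)) for item in randomList]
print(cleaned)  # ['ACGT', 'AG', 'AGCT']
```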
Tim Pietzcker

Using re is overkill for this:

randomList2 = [''.join(c for c in s if c in 'ACGT') for s in randomList]

And if you only want the strings that originally contained junk (skipping the ones that were already clean):

valid = set("ACGT")
randomList2 = [''.join(c for c in s if c in valid) for s in randomList if any(c2 not in valid for c2 in s)]
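A variant that cleans each string once and keeps it only if the cleaned version differs from the original, instead of scanning it a second time with any(). It is a sketch using the same valid set; whether it is faster in practice has not been measured here.

```python
valid = set("ACGT")
randomList = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]

randomList2 = []
for s in randomList:
    cleaned = ''.join(c for c in s if c in valid)
    # If nothing was removed, the string had no junk, so skip it
    if cleaned != s:
        randomList2.append(cleaned)

print(randomList2)  # ['AG', 'AGCT']
```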
cmd

You can use regular expressions:

import re
randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
nonACGT = re.compile('[^ACGT]')
for i in range(len(randomList)):
    randomList[i] = nonACGT.sub('', randomList[i])
print(randomList)  # ['ACGT', 'AG', 'AGCT']
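The compiled pattern can also drive the question's "only add strings that had junk" condition directly. In this sketch, Pattern.subn returns both the cleaned string and the number of characters it removed, so a single call both cleans and detects junk. The output list name randomList2 follows the question; the rest is illustrative.

```python
import re

randomList = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]
nonACGT = re.compile('[^ACGT]')

randomList2 = []
for s in randomList:
    # subn returns (cleaned_string, number_of_characters_removed)
    cleaned, removed = nonACGT.subn('', s)
    if removed:  # keep only strings that contained junk, as in the question
        randomList2.append(cleaned)

print(randomList2)  # ['AG', 'AGCT']
```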
Al Sweigart