I'm running CRCmapper on my data. The tool has several modules, and I ran it exactly as instructed in the usage notes.

Commands used:

python CRCmapper.py -e MB12_noEncBlack_AllEnhancers.table.txt -b MB12.bam -g HG19 -f hg19-stdChr/ -s MB12_peaks.bed -n MB12 -o MB12/

Here is the error I got:

Warning: text mode turns off computation of q-values
IDENTIFY TF-TF INTERACTIONS
Traceback (most recent call last):
  File "CRCmapper.py", line 605, in <module>
    main()
  File "CRCmapper.py", line 593, in main
    graph = buildNetwork(projectFolder, projectName, candidateGenes, refseqToNameDict, motifConvertFile)
  File "CRCmapper.py", line 381, in buildNetwork
    target = refseqToNameDict[region[0]]
KeyError: 'AP1'

My Python experience is very basic. I would be grateful if someone could take a look at this error and suggest how to fix it.

P.S. I contacted the authors a long time ago but didn't get any response.

Thank you.

llrs
user1545
  • It seems that the problem is in the files. Are you sure that AP1 is in all your files? It could also be a bug in the software. – llrs Sep 28 '17 at 10:55
  • Yeah, it exists. For other samples, the error names a different motif for each sample. It seems to be a bug in the software, but I have no idea. I hope someone here will be able to understand it. – user1545 Sep 28 '17 at 13:33
  • Could you paste the minimal files that reproduce the bug? Maybe you can even propose a change in the repository to fix it. – llrs Sep 28 '17 at 13:47
  • Yeah, I will prepare an example to reproduce it. They haven't enabled the issue tracker for their repo. – user1545 Sep 28 '17 at 13:59
  • Well, you can open a pull request directly. – llrs Sep 28 '17 at 14:52

2 Answers

It seems that the author of CRCmapper.py intends to extract the sequence_name, start, and stop columns from the fimo.txt file. However, the code extracts the motif_alt_id, sequence_name, and start columns instead, so the column indices in the code need to be shifted by one.

fimo.txt looks like this:

motif_id    motif_alt_id    sequence_name   start   stop    strand  score   p-value q-value matched_sequence
Transfac.V$AP1_02   AP1 NM_003046931|3|95762|152118 2332    3118    -   11.9802 6.49e-05        TGACTCA
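To make the mismatch concrete, here is a minimal sketch that splits the example row above on tabs and compares the two indices. The dictionary `refseq_to_name` and the gene name in it are hypothetical stand-ins for illustration, not CRCmapper's actual data.

```python
# Example row from the fimo.txt excerpt above (assumed tab-separated;
# the q-value field is empty because text mode turns off q-values).
row = ("Transfac.V$AP1_02\tAP1\tNM_003046931|3|95762|152118\t"
       "2332\t3118\t-\t11.9802\t6.49e-05\t\tTGACTCA")
line = row.split("\t")

# Hypothetical stand-in for refseqToNameDict; the gene name is made up.
refseq_to_name = {"NM_003046931": "EXAMPLE_GENE"}

old_key = line[1].split("|")[0]  # column the unmodified code reads
new_key = line[2].split("|")[0]  # column holding sequence_name

print(old_key)                    # the motif alias, not a RefSeq ID
print(new_key)                    # the RefSeq ID
print(old_key in refseq_to_name)  # a failed lookup here is the KeyError
```

With the unmodified index, the lookup key is the motif alias from motif_alt_id, which is why the traceback reports KeyError: 'AP1'.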

In lines 377-382 of the code:

line[1] should be changed to line[2]. The code as it currently stands:

for line in fimoTable[1:]:
    source = motifDatabaseDict[line[0]]
    region = line[1].split('|')
    target = refseqToNameDict[region[0]]
    location = (region[1], int(region[2]), int(region[3]))

In lines 391-394 of the code:

line[2] should be changed to line[3], and line[3] should be changed to line[4]. The code as it currently stands:

    # Count unique motifs
    if (region[1], int(region[2]) + int(line[2]), int(region[2]) + int(line[3])) not in motifDictSE[source]:
        edgeCountDictSE[(source, target)] += 1
        motifDictSE[source].append((region[1], int(region[2]) + int(line[2]), int(region[2]) + int(line[3])))
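As an alternative to hard-coding the shifted indices, the offset could be derived from the header row, so the same code handles output with or without the motif_alt_id column. This is only a sketch: `fimo_column_offset` and `parse_fimo_row` are hypothetical helpers, not part of CRCmapper, and the older layout is assumed to be the same table minus the motif_alt_id column.

```python
def fimo_column_offset(header_fields):
    """Return 1 if the header has the extra motif_alt_id column, else 0."""
    return 1 if "motif_alt_id" in header_fields else 0


def parse_fimo_row(fields, offset):
    """Return (motif_id, region parts, start, stop) for one fimo.txt row."""
    region = fields[1 + offset].split("|")  # sequence_name column
    start = int(fields[2 + offset])         # start column
    stop = int(fields[3 + offset])          # stop column
    return fields[0], region, start, stop


# Header and row taken from the fimo.txt excerpt in this answer.
new_header = ("motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\t"
              "strand\tscore\tp-value\tq-value\tmatched_sequence").split("\t")
new_row = ("Transfac.V$AP1_02\tAP1\tNM_003046931|3|95762|152118\t"
           "2332\t3118\t-\t11.9802\t6.49e-05\t\tTGACTCA").split("\t")

# Assumed older layout: identical, but without the motif_alt_id column.
old_header = [name for name in new_header if name != "motif_alt_id"]
old_row = new_row[:1] + new_row[2:]

parsed_new = parse_fimo_row(new_row, fimo_column_offset(new_header))
parsed_old = parse_fimo_row(old_row, fimo_column_offset(old_header))
print(parsed_new == parsed_old)
```

Both layouts then yield the same region, start, and stop values, so the rest of buildNetwork would not need to know which FIMO version produced the file.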
Bioathlete
ying
This error occurs because the output of newer FIMO versions has an additional 'motif_alt_id' column in the second position.

CRCmapper has been modified to take this into account. You can find the updated version here: https://bitbucket.org/young_computation/crcmapper/src/master/

P.S.: Sorry if we missed your request, user1545, and thanks for using CRCmapper!