How to calculate the Maximum Ladder Distance (MLD) of ssRNA from Dot-Bracket notation?

Question

The ladder distance (LD) between residue i and residue j is the minimum number of base pairs (created due to secondary structure of the ssRNA) one has to cross when going from i to j.

The Maximum Ladder Distance of a ssRNA molecule is the maximum ladder distance between any two of the residues of this molecule[*].

Given the secondary structure of a ssRNA molecule in dot-bracket notation (e.g. by RNAfold), how can I calculate the Maximum Ladder Distance of this molecule?

Below is an illustration from [*]. The MLD is 11, since this is the number of basepairs on the yellow line.

Edit: The dot-bracket notation for this molecule would be

..(((((.......))...((.((((((.......))))))..)).)).

*: Yoffe AM, Prinsen P, Gopal A, Knobler CM, Gelbart WM, et al. (2008) Predicting the sizes of large RNA molecules. Proc Natl Acad Sci U S A 105: 16153–16158

Please [edit] your question and post the dot-bracket representation of the above structure. Use the code formatting button ({}) to format it in a monospace font for clarity. — MattDMo, Apr 04 '23 at 21:58

score 5 · Answer 1 · answered Apr 05 '23 at 10:05

There are different implementations for calcuating the $\rm MLD$. One possible approach is to consider $\rm MLD$ as the heighest value of all possible mountain plots, generated by shifting the beginning of your sequence. The highest value of a mountain plot tells you the maximal ladder distance that can be reached from the first nucleotide. But the longest path might not begin at the first nucleotide, and that is why you have to go through all possible shifts of the sequence.

Here is an example of this implementation in Python. The algorithm is as following:

Convert dot-bracket notation to a list of base pairs (dot2pairs).
Calculate the first mountain plot.
Calculate required shifts. This is not strictly neccessary because you can swipe through all possible shifts = list(range(0, length)), but it gives better performance.
Iterate through all shifts, calculate the MLD at each step, and return the maximal one.

import numpy as np
from collections import deque
def dot2pairs(dot):
    stack, pairs = deque(), []
for i, n in enumerate(dot):
    if n == '(':
        stack.append(i)
    elif n == ')':
        pairs.append((stack.pop(), i))

return np.array(pairs)


def dot2MLD(dot):
    # Convert to a list of base pairs
    pairs0 = dot2pairs(dot)
if len(pairs0) == 0:
    return 0

MLD = 0
shifts = [0]
length = len(dot)
seq = np.zeros(length, dtype=int)
curr_mount = None

while shifts:        
    # Shift the sequence
    pairs = (pairs0 - shifts.pop()) % length

    # Find positions of opening and closing nucleotides
    opens, closes = np.min(pairs, axis=1), np.max(pairs, axis=1)

    seq.fill(0)
    seq[opens] = 1
    seq[closes] = -1

    mountain = np.cumsum(seq)

    MLD = max(MLD, int(np.max(mountain)))

    # In first iteration, calculate all required shifts
    if curr_mount is None:
        curr_mount = mountain[0]
        for i, m in enumerate(mountain):
            if m == curr_mount: 
                shifts.append(i)
            else:
                curr_mount = m

return MLD

Let's test it on your structure:

dot2MLD("..(((((.......)))...((.((((((.......))))))..)).)).")

11

And on a larger structure:

dot2MLD(s)

1141

It took ~ 5 seconds for this 17481 nt long structure (and it takes ~ 20 seconds if you don't precalculate requires shifts).

How to calculate the Maximum Ladder Distance (MLD) of ssRNA from Dot-Bracket notation?

1 Answers1

Related