6

I'm facing a problem I don't even know what to search in Google/Stack Overflow. So comment if you feel the need for further explanation, questions.

Basically I want to intersect two lists and return the similarity with the preserved order of the original first string value.

Example:

I have two strings, that I convert to a CharArray. I want to Intersect these two arrays and return the values that are similar, including/with the order of the first string (s1).


As you can see the first string contains E15 (in that specific order), and so does the seconds one.

So these two strings will return : { 'E', '1', '5' }

string s1 = "E15QD(A)";
string s2 = "NHE15H";

The problem I am facing is that if i replace "s2" with:

string s2 = "NQE18H" // Will return {'Q', 'E', '1' }

My operation will return : {'Q', 'E', '1' }

The result should be : {'E', '1' } because Q don't follow the letter 1

Currently my operation is not the greatest effort, because i don't know which methods to use in .NET to be able to do this.

Current code:

List<char> cA1 = s1.ToList();
List<char> cA2 = s2.ToList();

var result = cA1.Where(x => cA2.Contains(x)).ToList();

Feel free to help me out, pointers in the right direction is acceptable as well as a full solution.

Ehsan Sajjad
  • 60,736
  • 15
  • 100
  • 154
J.Olsson
  • 819
  • 3
  • 12
  • 28
  • Using `Intersect` will give you the Order from first string[Example Here](https://dotnetfiddle.net/KqSIPC). But your requirement suggest that You might want to write the your own [Intersect with custom equality comparer](http://stackoverflow.com/questions/4340273/intersect-with-a-custom-iequalitycomparer-using-linq) – Mahesh Mar 13 '15 at 10:23
  • What should be the output for `s1="AXBYC"` and `s2="ABACUS"`? – Sergey Kalinichenko Mar 13 '15 at 10:26
  • @CoderofCode Will look in to that right away. – J.Olsson Mar 13 '15 at 10:26
  • 1
    I guess you're looking for a way to find the [longest common substring](http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring). Here's also a C# implementation: http://stackoverflow.com/a/18953894/284240 – Tim Schmelter Mar 13 '15 at 10:27
  • @dasblinkenlight that should return { 'A' } . As the first character in both words is alike (has no previous letter) – J.Olsson Mar 13 '15 at 10:29
  • @J.Olsson Why not `"ABC"`? Should the characters be next to each other to be considered "in order"? – Sergey Kalinichenko Mar 13 '15 at 10:30
  • @TimSchmelter Well, i think i've been digging to deep to see it that way. Will look int to this also. (Thats awesome) – J.Olsson Mar 13 '15 at 10:30
  • @dasblinkenlight Yes. I'm sorry that i did not specify that. The letters should be next to eachother. – J.Olsson Mar 13 '15 at 10:32

3 Answers3

3

This is a "longest common substring" problem.

You can use this extension to get all substrings lazily:

public static class StringExtensions
{
    public static IEnumerable<string> GetSubstrings(this string str)
    {
        if (string.IsNullOrEmpty(str))
            throw new ArgumentException("str must not be null or empty", "str");

        for (int c = 0; c < str.Length - 1; c++)
        {
            for (int cc = 1; c + cc <= str.Length; cc++)
            {
                yield return str.Substring(c, cc);
            }
        }
    }
}

Then it's easy and readable with this LINQ query:

string longestIntersection = "E15QD(A)".GetSubstrings()
    .Intersect("NQE18H".GetSubstrings())
    .OrderByDescending(s => s.Length)
    .FirstOrDefault();  // E1

Enumerable.Intersect is also quite efficient since it's using a set. One note: if one both strings is larger than the other then it's more efficient(in terms of memory) to use it first:

longString.GetSubstrings().Intersect(shortString.GetSubstrings())
Tim Schmelter
  • 429,027
  • 67
  • 649
  • 891
1

I think this should do it:

string similar = null;

for (int i = 0; i < s1.Length; i++)
{
    string s = s1.Substring(0, i + 1);

     if (s2.Contains(s))
     {
         similar = s;
     }
}

char[] result = similar.ToCharArray();
mainvoid
  • 319
  • 1
  • 2
  • 15
  • In my initial test this works! Thanks man. @TimSchmelter in the comment of the original post also provided a link for solution of LongestSubstring. I will upload that answer aswell. – J.Olsson Mar 13 '15 at 10:40
0

@TimSchmelter provided the link to this answer in the comments of the original post.

public int LongestCommonSubstring(string str1, string str2, out string sequence)
    {
        sequence = string.Empty;
        if (String.IsNullOrEmpty(str1) || String.IsNullOrEmpty(str2))
            return 0;

        int[,] num = new int[str1.Length, str2.Length];
        int maxlen = 0;
        int lastSubsBegin = 0;
        StringBuilder sequenceBuilder = new StringBuilder();

        for (int i = 0; i < str1.Length; i++)
        {
            for (int j = 0; j < str2.Length; j++)
            {
                if (str1[i] != str2[j])
                    num[i, j] = 0;
                else
                {
                    if ((i == 0) || (j == 0))
                        num[i, j] = 1;
                    else
                        num[i, j] = 1 + num[i - 1, j - 1];

                    if (num[i, j] > maxlen)
                    {
                        maxlen = num[i, j];
                        int thisSubsBegin = i - num[i, j] + 1;
                        if (lastSubsBegin == thisSubsBegin)
                        {//if the current LCS is the same as the last time this block ran
                            sequenceBuilder.Append(str1[i]);
                        }
                        else //this block resets the string builder if a different LCS is found
                        {
                            lastSubsBegin = thisSubsBegin;
                            sequenceBuilder.Length = 0; //clear it
                            sequenceBuilder.Append(str1.Substring(lastSubsBegin, (i + 1) - lastSubsBegin));
                        }
                    }
                }
            }
        }
        sequence = sequenceBuilder.ToString();
        return maxlen;
    }
J.Olsson
  • 819
  • 3
  • 12
  • 28