1

Say we have the following strings that we pass as parameters to the function below:

string sString = "S104";
string sString2 = "AS105";
string sString3 = "ASRVT106";

I want to extract the numbers from each string and place them in an int variable. Is there a quicker or more efficient way of removing the letters from the strings than the following code? (These strings will be populated dynamically at runtime; they are not assigned values at construction.)

Code:

public GetID(string sCustomTag = null)
{
    m_sCustomTag = sCustomTag;
    try
    {
        m_lID = Convert.ToInt32(m_sCustomTag);
    }
    catch
    {
        try
        {
            int iSubIndex = 0;
            char[] subString = sCustomTag.ToCharArray();

            // Iterate through the char array to find the first digit.
            for (int i = 0; i < subString.Length; i++)
            {
                for (int j = 0; j < 10; j++)
                {
                    // Compare against the characters '0'..'9', not the integers 0..9.
                    if (subString[i] == '0' + j)
                    {
                        iSubIndex = i;
                        goto createID;
                    }
                }
            }

        createID: m_lID = Convert.ToInt32(m_sCustomTag.Substring(iSubIndex));
        }
        // If none of that works...
        catch (Exception)
        {
            m_lID = 0;
            throw; // rethrow without resetting the stack trace
        }
    }
}

I've done things like this before, but I'm not sure if there's a more efficient way to do it. If it was just going to be a single letter at the beginning, I could just set the substring index to 1 every time, but the users can essentially put in whatever they want. Generally, the tags will follow a LETTER-then-NUMBER format, but if they don't, or if users want to put in multiple letters as in sString2 or sString3, then I need to be able to compensate for that. Furthermore, if the user puts in some whacked-out, non-traditional format like string sString4 = "S51A24";, is there a way to just remove any and all letters from the string?

I've looked about, and can't find anything on MSDN or Google. Any help or links to it are greatly appreciated!

Uchiha Itachi
  • why are you using `goto` in code just curious – MethodMan Jan 04 '17 at 22:20
  • use regex...... – L.B Jan 04 '17 at 22:21
  • I used that rather than a `break` because it's in a nested for loop and I want the very first instance of a number. I feel like the `break` will find the first number, then keep iterating through the rest of the `char[]`, which I don't want, because I don't want `iSubIndex` to get reassigned. – Uchiha Itachi Jan 04 '17 at 22:22
  • @L.B what's regex? – Uchiha Itachi Jan 04 '17 at 22:23
  • @GeoffOverfield what's `google`..? also there are plenty of examples on the internet on how to extract numbers from a string here is 1 of them http://stackoverflow.com/questions/4734116/find-and-extract-a-number-from-a-string – MethodMan Jan 04 '17 at 22:23
  • and here is a solution to ... [http://stackoverflow.com/a/273144/3877877](http://stackoverflow.com/a/273144/3877877) .... in short `Regex.Replace(input, @"[^\d+$]", "");` – Martin E Jan 04 '17 at 23:21

5 Answers

3

You can use a regular expression. It's not necessarily faster, but it's more concise.

string sString = "S104";
string sString2 = "AS105";
string sString3 = "ASRVT106";

var re = new Regex(@"\d+");

Console.WriteLine(re.Match(sString).Value); // 104
Console.WriteLine(re.Match(sString2).Value); // 105
Console.WriteLine(re.Match(sString3).Value); // 106
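
For the "remove any and all letters" case from the question (for example "S51A24"), a regular expression can also strip every non-digit before parsing. The following is only a minimal sketch: the class name TagParser and the method TryExtractAllDigits are hypothetical, and it assumes that an empty result or a value too large for an int should be reported as a failure rather than thrown.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagParser
{
    // [^0-9] matches any character that is not an ASCII digit.
    private static readonly Regex NonDigits = new Regex("[^0-9]");

    // Hypothetical helper: removes every non-digit, then parses what is left.
    // Returns false when no digits remain or the value does not fit in an int.
    public static bool TryExtractAllDigits(string tag, out int id)
    {
        id = 0;
        if (tag == null) return false;

        string digits = NonDigits.Replace(tag, "");
        return int.TryParse(digits, out id);
    }
}
```

Calling TagParser.TryExtractAllDigits("S51A24", out int id) would concatenate both digit groups into a single number, whereas re.Match above stops at the first group.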
Wagner DosAnjos
2

You can use a Regex, but it's probably faster to just do:

public int ExtractInteger(string str)
{
    var sb = new StringBuilder();
    for (int i = 0; i < str.Length; i++)
        if(Char.IsDigit(str[i])) sb.Append(str[i]);
    return int.Parse(sb.ToString());
}

You can simplify further with some LINQ at the expense of a small performance penalty:

public int ExtractInteger(string str)
{
    return int.Parse(new String(str.Where(c=>Char.IsDigit(c)).ToArray()));
}

Now, if you only want to parse the first sequence of consecutive digits, do this instead:

public int ExtractInteger(string str)
{
    return int.Parse(new String(str.SkipWhile(c=>!Char.IsDigit(c)).TakeWhile(c=>Char.IsDigit(c)).ToArray()));
}
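
All three versions above call int.Parse, which throws if the string contains no digits or if the digits do not fit in an int. If that matters, a non-throwing variant of the last version could look like the sketch below. The names DigitExtractor and TryExtractFirstInteger are made up for illustration, and the sketch assumes only the ASCII digits 0-9 should count.

```csharp
using System;
using System.Linq;

public static class DigitExtractor
{
    // Hypothetical helper: parses the first run of consecutive ASCII digits.
    // Returns false for null or empty input, input without digits, or overflow.
    public static bool TryExtractFirstInteger(string str, out int value)
    {
        value = 0;
        if (string.IsNullOrEmpty(str)) return false;

        string digits = new string(str
            .SkipWhile(c => c < '0' || c > '9')   // skip leading non-digits
            .TakeWhile(c => c >= '0' && c <= '9') // keep the first digit run
            .ToArray());

        return int.TryParse(digits, out value);
    }
}
```

The caller checks the return value instead of wrapping the call in try/catch.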
Diego
2

The fastest approach is probably to build the integer directly while scanning the string, without removing anything:

var s = "S51A24";
int m_lID = 0;

for (int i = 0; i < s.Length; i++)
{
    int d = s[i] - '0';
    if ((uint)d < 10)
        m_lID = m_lID * 10 + d;
}

Debug.Print(m_lID + ""); // 5124
Slai
0
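    string removeLetters(string s)
    {
        // Walk backwards so that removing a character does not shift
        // the positions of the characters still to be examined.
        for (int i = s.Length - 1; i >= 0; i--)
        {
            char c = s[i];

            if (IsEnglishLetter(c))
            {
                s = s.Remove(i, 1);
            }
        }

        return s;
    }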

    bool IsEnglishLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }
BiggerD
  • Thanks. I'm gonna try this when I get back to work tomorrow. – Uchiha Itachi Jan 04 '17 at 22:33
  • Note that since `string` is an immutable class, every time you remove a character (`s.Remove(i, 1)`), you are effectively instantiating a new object, which makes this code slower. – Diego Jan 04 '17 at 22:35
0

While you asked "what's the fastest way to remove characters..." what you're really saying is "how do I create an integer by extracting only the digits from the string".

Going with this assumption, your first call to Convert.ToInt32 will be slow whenever the string contains anything other than digits, because throwing and catching an exception is expensive. Change it to

        if (int.TryParse(sCustomTag, out m_lID))
            return;

You could then iterate over the characters of the string in place with unsafe code (this uses fixed and avoids the data copy in ToCharArray()), extracting the digits and converting them as you go. It avoids the allocations of StringBuilder and should be a little faster than iterating the string the usual way. Note that unsafe code requires compiling with the /unsafe option (AllowUnsafeBlocks).

Here's a copy/paste-able version:

    public static unsafe int GetNumber(string s)
    {
        int number;
        if (int.TryParse(s, out number))
            return number;

        int value = 0;
        fixed (char* pString = s)
        {
            var pChar = pString;
            for (int i = 0; i != s.Length; i++, pChar++)
            {
                if (*pChar < '\u0030' || *pChar > '\u0039') continue;
                value = value * 10 + *pChar - '\u0030';
            }
        }

        return value;
    } 

If you know the digits are always at the beginning, change the continue to break. If the digits are always at the end, iterate backwards, converting each individual digit, multiplying that digit by the appropriate power of 10 and then adding it to the accumulated result (e.g. your last example is 6*10^0 + 0*10^1 + 1*10^2) until you get to a non-digit.
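
The backwards iteration can be sketched without pointers as follows. This is an illustration only: the class name TrailingNumber is hypothetical, the input is assumed to be non-null, and overflow is not handled, so the trailing digits are assumed to fit in an int.

```csharp
using System;

public static class TrailingNumber
{
    // Hypothetical helper: reads the digits at the end of the string,
    // from right to left, and stops at the first non-digit.
    public static int Parse(string s)
    {
        int value = 0;
        int placeValue = 1; // 10^0, then 10^1, 10^2, ...

        for (int i = s.Length - 1; i >= 0; i--)
        {
            int d = s[i] - '0';
            if (d < 0 || d > 9) break; // first non-digit ends the number

            value += d * placeValue;
            placeValue *= 10;
        }

        return value;
    }
}
```

For "ASRVT106" this accumulates 6, then 0, then 100, and stops at the T. A string with no trailing digits yields 0 in this sketch.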

Kit
  • Just noticed... This is in C++... I'm working in C#. Thanks for the help Kit, but this won't work for me particularly. Anyone who needs this in C++ - this is your solution!! – Uchiha Itachi Jan 06 '17 at 16:31
  • This ***is*** C# -- throw it in a file and compile. I haven't benchmarked it but I'd be surprised if it wasn't way faster than your accepted answer. I think the pointer notation threw you off. – Kit Jan 06 '17 at 17:50
  • I see pointers. Regardless, Thanks for your help with this. I appreciate it. – Uchiha Itachi Jan 06 '17 at 17:54