I'm trying to write a function in C that takes a string as input and returns the same string, but with the characters "&", "<" and ">" replaced with "&amp;", "&lt;" and "&gt;" respectively.

I'm struggling to understand how I can do this.

I have tried looping over the string and comparing each character with the symbol using strcmp. If the character matches, I try to replace it with the corresponding entity.

Some code to show what I've been trying:

#include <stdio.h>
#include <string.h>

char *replace_character(char *str) {
  for(size_t i = 0; i <= strlen(str); i++) {
    if(strcmp(str[i], '&') {
      str[i] = "&amp;";
    }
    ... (same procedure for the rest of the characters)
  }
  return str;
}


int main() {
 char with_symbol[] = "this & that";

 printf(replace_character(with_symbol));
}

Expected result: "this &amp; that"

Svele
  • You can't replace the characters in place (because the lengths are different). See [this question](https://stackoverflow.com/questions/779875/what-is-the-function-to-replace-string-in-c) for some possible solutions. – Federico klez Culloca Sep 12 '19 at 10:53
  • You have to allocate new memory with enough size for the new string, then copy all the content into the new memory. Remember to `free` it. – KamilCuk Sep 12 '19 at 11:07
  • @KamilCuk Could this be solved by giving the array a bigger predefined size, e.g. `char with_symbol[50] = ....`? – Svele Sep 12 '19 at 11:24
  • It could be solved that way, and the "bigger size" must be enough to hold all the characters. Then the operation `str[i] = "&amp;";` becomes "move all characters after `i` 4 bytes to the right (memmove) and copy the `"&amp;"` characters into position `i` (memcpy)". – KamilCuk Sep 12 '19 at 11:26
  • @KamilCuk I'm afraid I'm not quite following this. If the first if-statement hits (`strcmp(str[i], '&') == 0`), do I need to do the memmove and memcpy calls, or is this implicit done with the statement that is already written? – Svele Sep 12 '19 at 11:34
  • `implicit done` - this is C. Nothing is implicitly done. Also, `strcmp(str[i], '&')` is invalid; you can't compare characters with `strcmp`. – KamilCuk Sep 12 '19 at 13:43

1 Answer

The concept of a string in C is a low-level one: an array of characters. Just as you cannot take an array of integers and directly replace one of its integers with a whole other array, you cannot directly replace a character of a string with another string. You must first allocate enough memory for the extra characters that you want to insert into your original string.

Below is code that does that. It isn't the most efficient, but it gives you an idea of how this should work. It makes two passes over the string: the first counts the special symbols to work out how much extra space is needed, and the second copies the characters into the new buffer.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *replace(const char *s)
{
    size_t i, j;
    size_t len, extra;
    char *r = NULL;

    len = strlen(s);
    extra = 0;

    /* First we count how much extra space we need */
    for (i = 0; i < len; ++i) {
        if (s[i] == '&')
            extra += strlen("&amp;") - 1;
        else if (s[i] == '<')
            extra += strlen("&lt;") - 1;
        else if (s[i] == '>')
            extra += strlen("&gt;") - 1;
    }

    /* Allocate a new string with the extra space */
    r = malloc(len + extra + 1);
    assert(r != NULL);

    /* Put in the extra characters */
    j = 0;
    for (i = 0; i < len; ++i) {
        if (s[i] == '&') {
            r[j++] = '&';
            r[j++] = 'a';
            r[j++] = 'm';
            r[j++] = 'p';
            r[j++] = ';';
        } else if (s[i] == '<') {
            r[j++] = '&';
            r[j++] = 'l';
            r[j++] = 't';
            r[j++] = ';';
        } else if (s[i] == '>') {
            r[j++] = '&';
            r[j++] = 'g';
            r[j++] = 't';
            r[j++] = ';';
        } else {
            r[j++] = s[i];
        }
    }

    /* Mark the end of the new string */
    r[j] = '\0';

    /* Just to make sure nothing fishy happened */
    assert(strlen(r) == len + extra);

    return r;
}

int main(void)
{
    const char *sorig = "this &, this >, and this < are special characters";
    char *snew;

    snew = replace(sorig);

    printf("original  :  %s\n", sorig);
    printf("     new  :  %s\n", snew);

    free(snew);

    return 0;
}

A better strategy would be to define a lookup table or map, so that you can add or remove symbols and their replacements just by changing the table. You can also copy each replacement in a single call (with strncpy, or memcpy since the lengths are known) instead of treating it character by character. The example above is just to illustrate what goes on under the hood.

gustgr