14

I have a situation where I am passing a string to a function. I want to convert &nbsp; to " " (a blank space) before passing it to the function. Does html_entity_decode do it?

If not, how can I do it?

I am aware of str_replace, but is there any other way?

Salman A
Abhishek Sanghvi

4 Answers

39

Quote from html_entity_decode() manual:

You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string, that's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 characterset.

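You can use str_replace() to replace character 160 (byte 0xA0 in ISO 8859-1) with a space:

<?php
// Decode as ISO-8859-1 so that &nbsp; becomes the single byte 0xA0.
// (In UTF-8, the PHP default since 5.4, it becomes "\xC2\xA0" instead.)
$a = html_entity_decode('>&nbsp;<', ENT_QUOTES, 'ISO-8859-1');
echo 'before ' . $a . PHP_EOL;
$a = str_replace("\xA0", ' ', $a);
echo ' after ' . $a . PHP_EOL;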
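
For UTF-8 strings, the same idea can be sketched as below. This is only an illustration, assuming PHP 7 or later; the function name decode_nbsp_to_space is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): decode entities as UTF-8,
// then turn every non-breaking space (U+00A0, bytes C2 A0) into a plain space.
function decode_nbsp_to_space(string $html): string
{
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // In UTF-8 the non-breaking space is two bytes, so replace both together.
    return str_replace("\xC2\xA0", ' ', $decoded);
}

echo decode_nbsp_to_space('>&nbsp;<'), PHP_EOL; // prints "> <"
```

Replacing the two-byte sequence as a unit matters: replacing "\xA0" alone in a UTF-8 string would leave a stray "\xC2" byte behind.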
Salman A
  • 22
    If you are working with UTF-8 encoded strings, you should replace \xC2\xA0 instead: $a = html_entity_decode('>&nbsp;… – chugadie Oct 11 '13 at 11:52
  • I've been struggling a lot with the data I retrieve from a `contenteditable` element; all `rtrim` and `preg_replace` attempts failed. I've also tried filtering it with JavaScript before sending it with `$.ajax()`, which also failed. So now I do `str_replace("&nbsp;", ' ', $value)` and then `preg_replace('/\s+$/','',$value)`. It works, though it's not too elegant. If someone has suggestions, please tell me. – Matt Oct 20 '15 at 10:01
5

html_entity_decode does convert &nbsp; to a space, just not a "simple" one (ASCII 32) but a non-breaking space (character 160, U+00A0), since that is what &nbsp; is defined to be.

If you need an ASCII 32 space, you still need str_replace() or, depending on your situation, something like preg_replace('/[\s\x{A0}]+/u', ' ', $string) (for UTF-8 strings) to convert all kinds of whitespace to simple spaces.
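
A minimal sketch of the regex approach, assuming UTF-8 input and PHP 7 or later. The helper name normalize_whitespace is made up for illustration.

```php
<?php
// Hypothetical helper: collapse each run of whitespace, including the
// non-breaking space U+00A0, into a single ASCII space.
function normalize_whitespace(string $text): string
{
    // The /u modifier treats pattern and subject as UTF-8;
    // \x{A0} names the non-breaking space explicitly.
    $result = preg_replace('/[\s\x{A0}]+/u', ' ', $text);

    // preg_replace() returns null on error (for example, invalid UTF-8),
    // so fall back to the unchanged input in that case.
    return $result ?? $text;
}

var_dump(normalize_whitespace("a\xC2\xA0 \t\nb") === 'a b'); // bool(true)
```

Note that this collapses runs of whitespace rather than replacing characters one for one, so it suits cleanup of user input more than cases where spacing must be preserved.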

Aurimas
5

YES

See PHP manual http://php.net/manual/en/function.html-entity-decode.php.

Carefully read the Notes, maybe that's the issue you are facing:

You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string, that's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 characterset.
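
The behavior described in that note, and one possible workaround, can be sketched as follows. This assumes UTF-8 strings and PHP 7 or later; trim_with_nbsp is a hypothetical name, not a built-in function.

```php
<?php
// Hypothetical helper: trim ordinary whitespace and U+00A0 from both ends
// of a UTF-8 string. A regex is used instead of trim()'s character list,
// because that list works byte by byte and could cut into other
// multi-byte characters.
function trim_with_nbsp(string $text): string
{
    $result = preg_replace('/^[\s\x{A0}]+|[\s\x{A0}]+$/u', '', $text);
    // preg_replace() returns null on error; keep the input in that case.
    return $result ?? $text;
}

$decoded = html_entity_decode('&nbsp;text&nbsp;', ENT_QUOTES, 'UTF-8');
var_dump(trim($decoded) === 'text');           // bool(false): the C2 A0 bytes remain
var_dump(trim_with_nbsp($decoded) === 'text'); // bool(true)
```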

Konrad Dzwinel
Frederic Bazin
2

Not sure if it is a viable solution for most cases, but I used trim(strip_tags(html_entity_decode(htmlspecialchars_decode($html), ENT_QUOTES, 'UTF-8'))); in my most recent application. Adding htmlspecialchars_decode() was initially the only thing that would actually strip them.

Tyler Christian