1

Is there a simple way to remove extended ASCII characters in a varchar(max). I want to remove all ASCII characters from 128 onwards. eg - ù,ç,Ä

I have tried this solution and its not working, I think its because they are still valid ASCII characters?

How do I remove extended ASCII characters from a string in T-SQL?

Thanks

Community
  • 1
  • 1
SkipFeeney
  • 61
  • 1
  • 6
  • Is this gonna be a part of a trigger code? To avoid those values in the future? – jarlh Mar 23 '16 at 11:31
  • 3
    Use the linked answer, change `IF UNICODE(@nchar) between 32 and 255` to `IF UNICODE(@nchar) < 128` – Alex K. Mar 23 '16 at 11:39
  • The values are already in a column so either a function or Update statement will do? – SkipFeeney Mar 23 '16 at 11:41
  • @AlexK., as the linked answer is using a loop, I think this might be done better (see my answer) – Shnugo Mar 23 '16 at 12:17
  • 2
    Possible duplicate of [How do I remove extended ASCII characters from a string in T-SQL?](http://stackoverflow.com/questions/15259622/how-do-i-remove-extended-ascii-characters-from-a-string-in-t-sql) – Shnugo Mar 24 '16 at 08:11
  • Hi Feeney, I just visited your profile and found, that within years you did not raise one single vote and didn't accept answers (well, the very last one is the exception...) Please be aware that votes and acceptance are the fuel SO is running with. [Please read this: someone-answers](http://stackoverflow.com/help/someone-answers). It would be kind to have a look onto your older questions too. Thx! – Shnugo Apr 01 '16 at 08:16

1 Answers1

3

The linked solution is using a loop which is - if possible - something you should avoid.

My solution is completely inlineable, it's easy to create an UDF (or maybe even better: an inline TVF) from this.

The idea: Create a set of running numbers (here it's limited with the count of objects in sys.objects, but there are tons of example how to create a numbers tally on the fly). In the second CTE the strings are splitted to single characters. The final select comes back with the cleaned string.

DECLARE @tbl TABLE(ID INT IDENTITY, EvilString NVARCHAR(100));
INSERT INTO @tbl(EvilString) VALUES('ËËËËeeeeËËËË'),('ËaËËbËeeeeËËËcË');

WITH RunningNumbers AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
    FROM sys.objects
)
,SingleChars AS
(
    SELECT tbl.ID,rn.Nmbr,SUBSTRING(tbl.EvilString,rn.Nmbr,1) AS Chr
    FROM @tbl AS tbl
    CROSS APPLY (SELECT TOP(LEN(tbl.EvilString)) Nmbr FROM RunningNumbers) AS rn 
)
SELECT ID,EvilString
      ,(
        SELECT '' + Chr 
        FROM SingleChars AS sc
        WHERE sc.ID=tbl.ID AND ASCII(Chr)<128
        ORDER BY sc.Nmbr
        FOR XML PATH('')
      ) AS GoodString
FROM @tbl As tbl

The result

1   ËËËËeeeeËËËË    eeee
2   ËaËËbËeeeeËËËcË abeeeec

Here is another answer from me where this approach is used to replace all special characters with secure characters to get plain latin

Community
  • 1
  • 1
Shnugo
  • 64,489
  • 9
  • 46
  • 104
  • First +1 for the good solution and the alias "EvilString"... I know this is an old thread but... This will blow up if the string contains reserved XML characters or any spaces. I would suggest changing `FOR XML PATH('')` to `FOR XML PATH(''),TYPE` and including `.value('(text())[1]','varchar(8000)')` right before your `AS GoodString` alias; e.g. change `...) AS GoodString` to `...).value('(text())[1]','varchar(8000)') AS GoodString`. Since this post 2017 added STRING_AGG which circumvents the XML character issue. – Alan Burstein Apr 18 '19 at 22:58