I am developing an iOS app and need to strip both HTML tags and HTML comments from a string.

Here is my string:

<p>
  <!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> <w:Zoom>0</w:Zoom> <w:HyphenationZone>21</w:HyphenationZone> <w:PunctuationKerning /> <w:ValidateAgainstSchemas /> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:Compatibility> <w:BreakWrappedTables /> <w:SnapToGridInCell /> <w:WrapTextWithPunct /> <w:UseAsianBreakRules /> <w:DontGrowAutofit /> </w:Compatibility> <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel> </w:WordDocument> </xml><![endif]--><!--[if gte mso 9]><xml> <w:LatentStyles DefLockedState="false" LatentStyleCount="156"> </w:LatentStyles> </xml><![endif]-->
</p>
<p>
  <!--[if gte mso 10]><mce:style><!<br />/* Style Definitions */<br />table.MsoNormalTable<br /> {mso-style-name:"Tableau Normal";<br /> mso-tstyle-rowband-size:0;<br /> mso-tstyle-colband-size:0;<br /> mso-style-noshow:yes;<br /> mso-style-parent:"";<br /> mso-padding-alt:0cm 5.4pt 0cm 5.4pt;<br /> mso-para-margin:0cm;<br /> mso-para-margin-bottom:.0001pt;<br /> mso-pagination:widow-orphan;<br /> font-size:10.0pt;<br /> font-family:"Times New Roman";<br /> mso-ansi-language:#0400;<br /> mso-fareast-language:#0400;<br /> mso-bidi-language:#0400;}<br />--><!--[endif]-->
</p>

Here is my code, which tries to remove both HTML comments and HTML tags:

-(NSString *) stringByStrippingHTML: (NSString*) str
{
    NSString *s = str;
    NSError *err = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<!--(.*?)-->|<div[^>]+>(.*?)</div>|<[^>]+>"
                                                                           options:0
                                                                             error:&err];

    NSString *result = [regex stringByReplacingMatchesInString:s
                                                       options:0
                                                         range:NSMakeRange(0, s.length)
                                                  withTemplate:@""];

    return result;
}

The result should be an empty string, but here is what I get:

/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Tableau Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-parent:"";
mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
mso-para-margin:0cm;
mso-para-margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:10.0pt;
font-family:"Times New Roman";
mso-ansi-language:#0400;
mso-fareast-language:#0400;
mso-bidi-language:#0400;}
--> 

The first comment, between the first pair of <p> tags, is removed correctly, but the second one is only partly removed. I tested the pattern in several online regex testers, and it appears to be correct there. Can anyone tell me where the problem comes from?

I noticed a <! in the second comment, but I don't know whether that is the cause.
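
To narrow this down, here is a possible reduced test case. It is only a sketch built on one unconfirmed assumption: that the real string contains line breaks inside the second comment (the output above suggests it might). By default `.` in NSRegularExpression does not match line separators, and the NSRegularExpressionDotMatchesLineSeparators option changes that. The `StripHTML` helper and the shortened `sample` string are hypothetical and exist only for this test; the `<div>` alternative of the pattern is left out to keep it short.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: strips HTML comments and tags from `input`.
// `options` is passed straight through to NSRegularExpression so that
// both behaviours can be compared with the same pattern.
static NSString *StripHTML(NSString *input, NSRegularExpressionOptions options) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<!--(.*?)-->|<[^>]+>"
                                                  options:options
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return input;
    }
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@""];
}

int main(void) {
    @autoreleasepool {
        // Assumed, shortened input: a comment that spans several lines.
        NSString *sample = @"<!--[if gte mso 10]><mce:style><!<br />\n"
                           @"/* Style Definitions */<br />\n"
                           @"--><!--[endif]-->";

        // Same options as in the question: `.` stops at line breaks.
        NSLog(@"default: [%@]", StripHTML(sample, 0));

        // `.` is allowed to match line breaks as well.
        NSLog(@"dot matches line separators: [%@]",
              StripHTML(sample, NSRegularExpressionDotMatchesLineSeparators));
    }
    return 0;
}
```

If the assumption holds, the first call should leave the CSS text and a trailing --> behind, as in the output above, while the second should return an empty string. If both calls return the same thing for the real string, the line breaks are not the cause and the <! deserves a closer look.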

Thank you very much for your help!

And excuse my English, I'm French :)

Jerome