I have some really big XML files that fail validation because users typed < and > instead of &lt; and &gt; in attributes.

Is there a way in C# .NET Core to quickly replace all the < and > with &lt; and &gt;?

I have XML like so:

<?xml version="1.0" encoding="utf-8"?>
<rootXML test="b < a">
<inside anotherTest="i could have < and > in here">Hello < all</inside>
</rootXML>

The hard part is that I don't know where the < and > are, and the XML files can be quite different from one another.

Thanks.

Michael Kay
cdub
  • You don't have a really big XML file, you have a heap of junk. – Michael Kay Sep 20 '21 at 23:12
  • Assuming you're just worried about attributes, preprocess the string to replace each `>` or ` – David Browne - Microsoft Sep 21 '21 at 00:10
  • @DavidBrowne-Microsoft: That won't work. Quotes can appear elsewhere besides as attribute value delimiters. The inescapable problem is that it's impossible to parse an undefined language. For a collection of practical methods to (try to) ameliorate such markup messes, see [How to parse invalid (bad / not well-formed) XML?](https://stackoverflow.com/q/44765194/290085). – kjhughes Sep 21 '21 at 00:54
  • In general, yes. I.e., if there is XML or HTML pasted into attributes or text nodes, nothing will work. But in specific cases you can preprocess the text to make it valid. `>` and '` and are always followed by ``. Of course if the text contains `` you're screwed. But if it's just `>` or ` – David Browne - Microsoft Sep 21 '21 at 01:10
  • @DavidBrowne-Microsoft: Assuming rule-breaking to be bound by rules is unwise. – kjhughes Sep 21 '21 at 01:17
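
The preprocessing idea discussed in the comments could be sketched roughly as below. This is only a heuristic under stated assumptions, not a general repair tool: it assumes every attribute value is quoted and never contains its own delimiter, that the input has no comments, CDATA sections, or DOCTYPE declarations containing quotes or angle brackets, and that a `<` in text is a real tag only when followed by a letter, `_`, `/`, `?`, or `!`. The names `XmlAngleBracketRepair` and `EscapeStrayAngleBrackets` are hypothetical. As kjhughes points out, input that breaks these assumptions will still be mangled.

```csharp
using System.Text;

public static class XmlAngleBracketRepair
{
    // Hypothetical helper: escapes '<' and '>' inside quoted attribute values,
    // and escapes a '<' in text when it does not look like the start of a tag.
    public static string EscapeStrayAngleBrackets(string input)
    {
        var output = new StringBuilder(input.Length + 64);
        bool inTag = false;   // true between a tag's '<' and its closing '>'
        char quote = '\0';    // current attribute delimiter, or '\0' when outside a value

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];

            if (quote != '\0')
            {
                // Inside an attribute value: escape brackets, stop at the closing quote.
                if (c == quote) { quote = '\0'; output.Append(c); }
                else if (c == '<') output.Append("&lt;");
                else if (c == '>') output.Append("&gt;");
                else output.Append(c);
            }
            else if (inTag)
            {
                // Inside a tag but outside a value: track quotes and the end of the tag.
                if (c == '"' || c == '\'') quote = c;
                else if (c == '>') inTag = false;
                output.Append(c);
            }
            else if (c == '<')
            {
                // In text: assumption that a real tag starts with one of these characters.
                char next = i + 1 < input.Length ? input[i + 1] : '\0';
                bool looksLikeTag = char.IsLetter(next) || next == '_' ||
                                    next == '/' || next == '?' || next == '!';
                if (looksLikeTag) { inTag = true; output.Append(c); }
                else output.Append("&lt;");
            }
            else
            {
                // Ordinary text; a bare '>' is legal here, so it is left alone.
                output.Append(c);
            }
        }

        return output.ToString();
    }
}
```

The result can then be handed to a real parser, for example `XDocument.Parse(XmlAngleBracketRepair.EscapeStrayAngleBrackets(text))`, which will still throw if the file is broken in some other way. This version holds the whole document in memory; for very large files the same state machine could be run over a `TextReader` in chunks.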

0 Answers