HTML decoder in Natural

Is there a function or utility to decode HTML encoding in Natural? We need to remove the encoding from plain strings (not XML documents).
Thanks,
Craig Soehnlin

Do you mean for displaying HTML for I/O purposes, or more like string parsing? I could see you wanting to display a web page, or maybe reverse engineer a 3270 map out of a web page (though that sounds rather difficult), or perhaps interpreting data from a table (or something like that) for processing and storage.

We have a .NET app that sends us data and uses HTML encoding for special characters in the free-form text. We need Natural to remove the encoding and insert the proper characters. We could do a lot of EXAMINE ... REPLACE statements, but it would be much nicer to have a function or utility I could call to do it all at once. The XML parser does XML decoding, but this data is a string, not a document.
Thanks,
Craig Soehnlin
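
For anyone weighing the "lot of EXAMINE ... REPLACE statements" route, here is a rough sketch of the decoding logic, written in Python rather than Natural so it can be run and checked. It is not a Natural utility, and the names `NAMED` and `decode_entities` are made up for illustration. The entity table is deliberately tiny; a real port would need the full list of names the .NET side can emit.

```python
import re

# Hypothetical, deliberately small table of named entities.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'", "nbsp": "\u00a0"}

# Matches &name;  &#123;  and  &#x1F;  style references.
_REF = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Replace character references in a single pass; unknown ones are left as-is."""

    def repl(match):
        body = match.group(1)
        if body[0] == "#":
            # Numeric reference: hexadecimal if it starts with #x, else decimal.
            code = int(body[2:], 16) if body[1] in "xX" else int(body[1:], 10)
            try:
                return chr(code)
            except (ValueError, OverflowError):
                return match.group(0)  # not a valid code point, keep the text
        return NAMED.get(body, match.group(0))

    return _REF.sub(repl, text)


print(decode_entities("Tom &amp; Jerry &lt;3 &#233; &#x20AC;"))
```

The point of the single pass is that already-decoded output is never scanned again, so `&amp;lt;` comes out as `&lt;` and not `<`. A chain of sequential replaces that handles `&amp;` first would decode it twice. In Python itself the standard library's `html.unescape` already does this job with the full entity table.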

To help with debugging of Unicode and UTF-8 related problems, I’ve written two tools:

utf8-decoder
Paste in some UTF-8 bytes (either as hexadecimal, decimal, octal, or binary numbers, or as a hex dump, or as raw bytes in the form of Windows-1252 or ISO-8859-1 characters) and this script will tell you what the characters are, including UTF-8 decoding diagnostics.

For example, if you are viewing a UTF-8 encoded file in a raw Emacs buffer, and your buffer contains \342\200\253\330\263\331\204\330\247\331\205, and you want to know what on earth that is, you just need to select that exact string, paste it into the script’s input field, and click the submit button. It will then tell you the characters are:

202B RIGHT-TO-LEFT EMBEDDING
0633 ARABIC LETTER SEEN
0644 ARABIC LETTER LAM
0627 ARABIC LETTER ALEF
0645 ARABIC LETTER MEEM
This can be very useful, especially since the first one above (the RLE) is not a visible character! The script also includes some other useful information, such as the binary representation of each input byte and the entities you would use to include the characters in a US-ASCII HTML or XML file.
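
The same lookup can be reproduced locally. The following is a minimal Python sketch, not the utf8-decoder script itself: it assumes the input is a run of three-digit octal escapes like the Emacs example, and the function names are invented for illustration.

```python
import re
import unicodedata


def decode_octal_escapes(dump):
    """Collect \\ooo octal escapes into bytes and decode those bytes as UTF-8."""
    raw = bytes(int(o, 8) for o in re.findall(r"\\([0-3][0-7]{2})", dump))
    return raw.decode("utf-8")  # raises UnicodeDecodeError on malformed input


def describe(text):
    """One 'code point + Unicode name' line per character."""
    return [f"{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def to_entities(text):
    """Numeric character references usable in a US-ASCII HTML or XML file."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


sample = r"\342\200\253\330\263\331\204\330\247\331\205"
decoded = decode_octal_escapes(sample)
for line in describe(decoded):
    print(line)
print(to_entities(decoded))
```

Unlike the tool described above, this sketch does no diagnostics: malformed UTF-8 simply raises an exception instead of being explained byte by byte.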

