I occasionally receive text files (for my file polling service) that contain accented characters (e.g. ü).
This causes the pub.flatFile:convertToValues service to fail with a “Server Error: sun.io.MalformedInputException”, even though I’ve specified the encoding as UTF-8 (which should also be the default).
Is this a bug?
Hi serdar,
I faced the same problem in one of my interfaces. What I did was write a Java service, isASCII(), which takes the string and searches for any non-ASCII character (ch > 127). You can write this simple Java service and check.
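A minimal sketch of such a check, written as plain Java rather than as an Integration Server service. The class name AsciiCheck and the method name isAscii are hypothetical, and wiring it into the pipeline is left out:

```java
// Hypothetical helper, not a built-in webMethods service.
final class AsciiCheck {

    // Returns true only if every char in the string is 7-bit ASCII (0..127).
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false; // first non-ASCII character found
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Muller"));       // plain ASCII input
        System.out.println(isAscii("M\u00fcller"));  // contains ü (U+00FC)
    }
}
```

Note that this only works on a string that has already been decoded, so the bytes still have to be read with an encoding that does not fail on them.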
One more alternative I found is to use “Cp1252” as the encoding. I tried both and they worked, but I chose the first one.
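If Cp1252 works where UTF-8 fails, a likely explanation is that the files are not really UTF-8 but were saved in a single-byte Windows encoding. The sketch below illustrates that under the assumption that ü arrives as the single byte 0xFC; the class and method names (DecodeDemo, decodeStrict, isValidUtf8) are hypothetical and only the standard java.nio.charset API is used:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of webMethods.
final class DecodeDemo {

    // Strict decode: reports bad input instead of silently replacing it.
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                 .onMalformedInput(CodingErrorAction.REPORT)
                 .onUnmappableCharacter(CodingErrorAction.REPORT)
                 .decode(ByteBuffer.wrap(bytes))
                 .toString();
    }

    // True if the bytes form a valid UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes, StandardCharsets.UTF_8);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // "Müller" as a Cp1252 editor would save it: ü is the single byte 0xFC.
        byte[] cp1252Bytes = { 'M', (byte) 0xFC, 'l', 'l', 'e', 'r' };

        // 0xFC is not a legal byte in UTF-8, so strict UTF-8 decoding rejects it.
        System.out.println("valid UTF-8? " + isValidUtf8(cp1252Bytes));

        // The same bytes decode cleanly as windows-1252 (Java alias: Cp1252).
        System.out.println(decodeStrict(cp1252Bytes, Charset.forName("windows-1252")));
    }
}
```

A check like isValidUtf8 could be run on the raw file bytes first to decide which encoding to pass on, though whether that fits a given file polling setup is an assumption.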
Thanks for your response, Balaram. I guess it is possible to do as you say and use a Java service, but it would need to check each character, which could prove quite slow. Some of our CSVs can become large (several MB), so for now the Cp1252 option is looking better. Thanks again.
ISO-8859 or UTF-8 are better encodings, as they cover a larger set of characters.
HTH
Bhavani Shankar