What product/components do you use and which version/fix level?
Hi,
We are on WM 10.3 and want to process a flat file in which the fields are separated by “||” (two pipes), for example: ORDER||1001||Laptop||5||MC Computers||9923222131
But we can’t create a dictionary for this type of delimiter. Do you have any idea how to resolve this, please?
Regards,
Review Chapter 1 in this guide on page 9 (link), which explains the purpose of a Schema vs Dictionary.
I haven’t encountered/tested a 2-character separator. Did you try adding || as the field delimiter in the Schema or as an input parameter to the pub.flatFile:convertToValues service that you invoke for parsing? I think that should work.
Workaround - If it doesn’t, then you can use pub.string:replace to replace || with a different delimiter (such as ~) and then parse the file. This is a terrible workaround, so I don’t recommend this, but I was forced to use this once.
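As a rough illustration of the replace-then-parse workaround, here is a minimal sketch in plain Python (it does not call pub.string:replace or pub.flatFile:convertToValues). The function name parse_double_pipe and the choice of ~ as the substitute are assumptions made for this example. The check that the substitute is absent from the record is what keeps the swap from silently corrupting data.

```python
def parse_double_pipe(record: str, substitute: str = "~") -> list[str]:
    """Replace '||' with a single-character delimiter, then split on it.

    Hypothetical helper for illustration only.
    """
    if len(substitute) != 1:
        raise ValueError("substitute must be a single character")
    if substitute in record:
        # The substitute already occurs in the data, so the swap would be ambiguous.
        raise ValueError(f"substitute {substitute!r} already present in record")
    return record.replace("||", substitute).split(substitute)


fields = parse_double_pipe("ORDER||1001||Laptop||5||MC Computers||9923222131")
print(fields)
```

In a flow service the same idea would be a guard step before the replace, with the parse done by a schema that uses the substitute as its single-character field delimiter.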
Hi Kasi,
Thanks for your response.
Yes, I tried adding || but it generated the error “please specify a single character”.
I thought of the replace service, but it won’t be clean.
Yassine
Do they offer rationale for using a multi-character delimiter? Can they use a single char?
It can be useful to understand why this is being used so that alternatives might be explored. The rationale likely is “the ‘||’ is used as it is not expected to ever exist in data values”, but there are other ways to address that, such as using a different single-char field delimiter that also is not expected to be in data values. Before implementing a workaround, it might be useful to understand what is behind the use of that delimiter.
Edit: It is advisable to avoid any and all “pre-” or “post-processing” activities in an attempt to coerce the format and account for variations. Strive to keep it simple. Of course, you’ll be limited by what the target system can do. RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files is a great reference for how to structure delimited files and support all data values.
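To show how RFC 4180 style quoting lets a single-character delimiter coexist with any data value, here is a small sketch using Python’s standard csv module. RFC 4180 itself describes comma-separated files; using | as the delimiter with the same quoting rules is an assumption made here to match this thread, and the sample values are invented.

```python
import csv
import io

# Sample record in which two values contain the delimiter or a quote character.
rows = [["ORDER", "1001", "Laptop|Dock bundle", "5", 'MC "Best" Computers']]

# Write with a single-character delimiter; fields are quoted only when needed.
buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter="|",
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
    lineterminator="\r\n",
)
writer.writerows(rows)
encoded = buffer.getvalue()
print(encoded)

# Read it back with the same dialect settings; the values survive unchanged.
decoded = list(csv.reader(io.StringIO(encoded), delimiter="|", quotechar='"'))
print(decoded)
```

The point is that quoting, rather than an unusual delimiter, is what makes the format safe for arbitrary data. Whether the target application can read quoted fields is a separate question.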
Hi,
Thanks guys for your feedback.
Unfortunately, the target application team does not want to change their double-pipe format because it would have a big impact on them, and they expect an output format as follows: FR||AS||1||LIX||110||1||2021-11-01T08:43:28||3090602017600||0||890||123456789
I think the way to do it is to replace each double pipe with a single pipe and then do the processing.
Yassine
This could go sideways if you’re dealing with heavy volumes coupled with large files, so be sure to follow the usual guidelines -
1. Use the “Scope” property to limit the pipeline replications/handoffs of the file content to child services (transformers are another option, but use what suits your design best).
2. Explicitly clean up obsolete variables at appropriate places throughout the flow, and not just at the end.
3. If you are dealing with large files, then follow large-file handling principles, test for performance and benchmark the throughputs.
4. Gauge and make necessary resource (cores, heap) adjustments, based on #3.
How will you handle the case where | is in the data?
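To make the concern concrete, here is a small plain-Python illustration (not Integration Server services) using an invented record in which one value contains a single pipe. It compares collapsing “||” to “|” before splitting against splitting on the two-character delimiter directly, and also shows the outbound direction.

```python
# Invented record: the third field's value is "Laptop|Dock".
record = "ORDER||1001||Laptop|Dock||5"

# Inbound: collapsing '||' to '|' and then splitting breaks the third field in two.
collapsed = record.replace("||", "|").split("|")
print(collapsed)

# Splitting on the two-character delimiter keeps the single pipe as data.
direct = record.split("||")
print(direct)

# Outbound: writing with '|' and then doubling every pipe also doubles the
# pipe inside the data, so the receiver sees an extra field boundary.
outbound = "|".join(direct).replace("|", "||")
print(outbound)
```

In both directions the replace approach is only safe if a single | can never appear in a data value.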
Another possibility, though a bit ugly, is to define dummy fields in the FF schema that never have data in them. This will cause the FF generation to create empty fields – resulting in || between each “real” field.
Ugly. But perhaps more acceptable than replacing ‘|’ with “||”, which could go astray depending upon data content.
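The dummy-field idea can be sketched in plain Python as follows. This only mimics what a flat file schema with always-empty fields between the real ones would produce; the helper name interleave_dummy_fields is hypothetical, and the values are taken from the sample output earlier in the thread.

```python
def interleave_dummy_fields(values):
    """Insert an always-empty dummy field between each pair of real fields."""
    padded = []
    for index, value in enumerate(values):
        if index > 0:
            padded.append("")  # dummy field that never carries data
        padded.append(value)
    return padded


real_fields = ["FR", "AS", "1", "LIX", "110"]

# Joining with a single '|' yields '||' between real fields.
line = "|".join(interleave_dummy_fields(real_fields))
print(line)

# Parsing with a single '|' delimiter: the real fields sit at even positions.
parsed = line.split("|")[::2]
print(parsed)
```

Like the replace approach, this still assumes that no real value contains a | of its own.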
I’d be interested in learning more about what that impact might be. Delimiters should be changeable without much fanfare. We’re back to wondering why they did this in the first place.