What product/components do you use and which version/fix level?
We are on WM 10.3. We want to process a flat file in which the fields are separated by “||” (two pipes), for example: ORDER||1001||Laptop||5||MC Computers||9923222131
However, we can’t create a dictionary for this type of delimiter. Do you have any idea how to resolve this?
Are you using a free trial or a product with a customer license?
What are you trying to achieve? Please describe in detail.
Do you get any error messages? Please provide a full error message screenshot and log file.
Review Chapter 1 of this guide, on page 9 (link), which explains the purpose of a Schema versus a Dictionary.
I haven’t encountered/tested a 2-character separator. Did you try adding || as the field delimiter in the Schema or as an input parameter to the pub.flatFile:convertToValues service that you invoke for parsing? I think that should work.
Workaround: if it doesn’t, you can use pub.string:replace to replace || with a different delimiter (such as ~) and then parse the file. It’s a terrible workaround and I don’t recommend it, but I was forced to use it once.
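To illustrate what the replace-then-parse workaround does to a record, here is a minimal sketch in plain Java. It is not the pub.string:replace or pub.flatFile:convertToValues services; it only mirrors the logic, and the `~` substitute is an assumption that it never occurs in the data (the sketch rejects records that already contain it).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ReplaceThenSplit {
    // Hypothetical substitute delimiter; assumed never to appear in data values.
    static final String SUBSTITUTE = "~";

    /** Replaces every "||" with "~", then splits on the single-character delimiter. */
    static List<String> parseRecord(String record) {
        if (record.contains(SUBSTITUTE)) {
            // A "~" already in the data would be indistinguishable from a delimiter.
            throw new IllegalArgumentException("Record already contains " + SUBSTITUTE);
        }
        // String.replace is a literal (non-regex) replacement.
        String normalized = record.replace("||", SUBSTITUTE);
        // split() takes a regex, so quote the delimiter; limit -1 keeps trailing empty fields.
        return Arrays.asList(normalized.split(Pattern.quote(SUBSTITUTE), -1));
    }

    public static void main(String[] args) {
        // prints [ORDER, 1001, Laptop, 5, MC Computers, 9923222131]
        System.out.println(parseRecord("ORDER||1001||Laptop||5||MC Computers||9923222131"));
    }
}
```

The check on the substitute is the part that is easy to forget: the workaround is only as safe as the assumption that the replacement character never shows up in a value.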
Using a multi-character field delimiter is unusual. What is prompting the use of this?
Thanks for your response.
Yes, I tried adding ||, but it generated the error “please specify a single character”.
I thought of the replace service, but it wouldn’t be clean.
It’s the message format expected by the third-party application, a string message with fields separated by ||.
Update: I fully support Rob’s response below; ask why before how.
The only other way I can think of is to set | (single pipe) as the delimiter and then validate/ignore/remove the blank/null fields.
I can’t recall whether there was an option to treat consecutive delimiters as one (never mind, that was in Excel).
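A minimal plain-Java sketch of the single-pipe idea, for illustration only (it is not the flat-file parser). Simply removing every blank field would also drop values that are genuinely empty, so this version keeps fields by position instead: with || as the real delimiter, every odd-numbered position must be the empty "field" between the two pipes, and anything else is treated as a format error (for example, a lone | inside a value).

```java
import java.util.ArrayList;
import java.util.List;

public class SinglePipeSplit {
    /** Splits on "|" and keeps even positions; odd positions must be empty. */
    static List<String> parseRecord(String record) {
        // "\\|" escapes the pipe, which is a regex metacharacter; -1 keeps trailing empties.
        String[] parts = record.split("\\|", -1);
        // n real fields joined by "||" always split into 2n - 1 parts (an odd count).
        if (parts.length % 2 == 0) {
            throw new IllegalArgumentException("Odd number of pipes: " + record);
        }
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            if (i % 2 == 0) {
                fields.add(parts[i]); // a real field (may legitimately be empty)
            } else if (!parts[i].isEmpty()) {
                throw new IllegalArgumentException("Unexpected data between pipes at position " + i);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints [ORDER, 1001, Laptop, 5, MC Computers, 9923222131]
        System.out.println(parseRecord("ORDER||1001||Laptop||5||MC Computers||9923222131"));
    }
}
```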
Do they offer a rationale for using a multi-character delimiter? Can they use a single character?
It can be useful to understand why this is being used so that alternatives might be explored. The likely rationale is “‘||’ is used because it is not expected to ever appear in data values”, but there are other ways to address that, such as using a different single-character field delimiter that is also not expected to appear in data values. Before implementing a workaround, it might be useful to understand what is behind the use of that delimiter.
Edit: It is advisable to avoid any and all “pre-” or “post-processing” activities that attempt to coerce the format and account for variations. Strive to keep it simple. Of course, you’ll be limited by what the target system can do. RFC 4180 is a great reference for how to structure delimited files so that they can carry any data value.
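For reference, RFC 4180 handles "the delimiter might be in the data" by quoting: a field containing the delimiter, a double quote, or a line break is wrapped in double quotes, and embedded double quotes are doubled. A minimal plain-Java sketch of that rule follows. Swapping RFC 4180’s comma for a single | is an assumption made here for illustration, and it only helps if the receiving system also understands the quoting.

```java
import java.util.List;
import java.util.stream.Collectors;

public class Rfc4180StyleWriter {
    // Assumption: "|" replaces RFC 4180's comma; the receiver must also honour the quoting.
    static final char DELIMITER = '|';

    /** Quotes a field that contains the delimiter, a double quote, CR or LF; doubles embedded quotes. */
    static String escape(String field) {
        boolean needsQuotes = field.indexOf(DELIMITER) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0
                || field.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Escapes each field and joins them with the single-character delimiter. */
    static String writeRecord(List<String> fields) {
        return fields.stream()
                .map(Rfc4180StyleWriter::escape)
                .collect(Collectors.joining(String.valueOf(DELIMITER)));
    }

    public static void main(String[] args) {
        // prints ORDER|1001|"Laptop | 15"""|5
        System.out.println(writeRecord(List.of("ORDER", "1001", "Laptop | 15\"", "5")));
    }
}
```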
Thanks guys for your feedback.
Unfortunately, the target application does not want to change its double-pipe format, as that would have a big impact on them. They expect the following output format: FR||AS||1||LIX||110||1||2021-11-01T08:43:28||3090602017600||0||890||123456789
I think the way to do it is to replace the double pipe with a single pipe and then do the processing.
Yep, that’ll do the trick.
This could go sideways if you’re dealing with heavy volumes coupled with large files, so be sure to follow the usual guidelines:
- Use the “Scope” property to limit the pipeline replications/handoffs of the file content to child services (transformers are another option, but use what suits your design best)
- Explicitly clean up obsolete variables at appropriate points throughout the flow, not just at the end
- If you are dealing with large files, follow large-file handling principles, test for performance, and benchmark the throughput
- Gauge and make the necessary resource (cores, heap) adjustments based on the results of the previous step
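One common large-file principle is to stream records instead of loading the whole file into one string and running a single replace over it. The plain-Java sketch below (not an Integration Server API) applies the double-to-single-pipe replacement discussed above one line at a time. It assumes one record per line and that | never occurs inside a value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class StreamingNormalizer {
    /**
     * Reads one line at a time and hands each normalized record to the handler,
     * so the full file never sits in memory as one string. Returns the record count.
     * Assumptions: one record per line, and "|" never occurs inside a value.
     */
    static long process(Reader source, Consumer<String> handler) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                handler.accept(line.replace("||", "|"));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String file = "ORDER||1001||Laptop\nORDER||1002||Mouse\n";
        long n = process(new StringReader(file), System.out::println);
        System.out.println(n + " records");
    }
}
```

In a real flow the handler would be whatever parses or forwards each record; the point is that memory use depends on the record size, not the file size.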
Thank you so much for these tips.
How will you handle the case where | is in the data?
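To make the concern concrete, here is a plain-Java sketch of the outbound version of the replace approach: join with a single |, then widen every | to ||. A value such as `A|B` silently turns into two fields. The guarded variant simply refuses such values; how the real integration should handle them (reject, escape, or clean the data) is a design decision that is not settled here.

```java
import java.util.List;

public class OutboundReplace {
    /** Naive outbound approach: join with "|", then widen every "|" to "||". */
    static String naive(List<String> fields) {
        return String.join("|", fields).replace("|", "||");
    }

    /** Same approach, but rejects values containing "|" instead of corrupting them. */
    static String guarded(List<String> fields) {
        for (String f : fields) {
            if (f.contains("|")) {
                throw new IllegalArgumentException("Value contains '|': " + f);
            }
        }
        return naive(fields);
    }

    public static void main(String[] args) {
        // "A|B" is one value, but after widening it reads as two fields.
        System.out.println(naive(List.of("LIX", "A|B", "110"))); // prints LIX||A||B||110
    }
}
```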
Another possibility, though a bit ugly, is to define dummy fields in the FF schema that never have data in them. This will cause the FF generation to create empty fields, resulting in || between each “real” field.
Ugly, but perhaps more acceptable than replacing ‘|’ with “||”, which could go astray depending upon the data content.
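Why the dummy-field layout works can be shown in a few lines of plain Java. This does not use the webMethods flat-file schema or services; it just interleaves an always-empty field between the real ones and joins with a single |, which produces the same bytes as joining the real fields with ||.

```java
import java.util.ArrayList;
import java.util.List;

public class DummyFieldLayout {
    /** Inserts an always-empty field between real fields and joins with a single "|". */
    static String write(List<String> realFields) {
        List<String> withDummies = new ArrayList<>();
        for (int i = 0; i < realFields.size(); i++) {
            if (i > 0) {
                withDummies.add(""); // dummy field that never carries data
            }
            withDummies.add(realFields.get(i));
        }
        return String.join("|", withDummies);
    }

    public static void main(String[] args) {
        List<String> fields = List.of("FR", "AS", "1", "LIX", "110");
        String out = write(fields);
        System.out.println(out);                                   // prints FR||AS||1||LIX||110
        System.out.println(out.equals(String.join("||", fields))); // prints true
    }
}
```

Unlike a blanket replace, no value is rewritten, so a | inside the data is not doubled; it would, however, still be ambiguous to the receiver.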
I’d be interested in learning more about what that impact might be. Delimiters should be changeable without much fanfare. We’re back to wondering why they did this in the first place.