I need to handle custom data structured roughly as follows: each field starts with a custom delimiter (a mix of letters and other characters) and ends with a line feed, which for some fields is followed by an additional character. The data is enclosed between these custom delimiters. Some fields do not occur exactly once; they may occur anywhere from 0 to ‘n’ times. Kindly guide me on defining a schema and reading an input file that arrives with this structure of start and end delimiters.
Hi Venkat,
At a high level, the steps are below. I would suggest starting by reading the flat file schema guide; let us know if you face any challenges.
1. Create a flat file dictionary
a. Create 3 record definitions
b. Create field definitions inside each record
2. Create a flat file schema
a. Your data has delimiters at the start, so define the schema by referencing the dictionary definitions, and provide the values TEXT, TEXT2 and TEXT3.
b. For TEXT2, you need to set the max elements to ‘unbounded’
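To make the idea of identifier-led records concrete, here is a minimal Python sketch. It is not how the flat file schema itself is configured, only an illustration of the parsing logic. The sample data, the one-record-per-line layout, and the ";" field delimiter are all assumptions; TEXT, TEXT2 and TEXT3 are the identifiers mentioned above, and a record type that repeats (TEXT2) simply collects into a list of any length, including zero.

```python
from collections import defaultdict

# Identifiers from the steps above; longest first so that "TEXT2" and
# "TEXT3" are not mistaken for "TEXT" (which is a prefix of both).
RECORD_IDS = ["TEXT2", "TEXT3", "TEXT"]
FIELD_DELIM = ";"  # assumed field delimiter, not taken from the real data


def parse_records(data: str) -> dict:
    """Group lines by the record identifier each line starts with."""
    records = defaultdict(list)
    for line in data.splitlines():
        for rec_id in RECORD_IDS:
            if line.startswith(rec_id):
                rest = line[len(rec_id):]
                # Drop the single delimiter that follows the identifier.
                if rest.startswith(FIELD_DELIM):
                    rest = rest[len(FIELD_DELIM):]
                records[rec_id].append(rest.split(FIELD_DELIM))
                break
        # Lines with no known identifier are silently skipped here.
    return dict(records)


# Hypothetical input: one TEXT, two TEXT2 and one TEXT3 record.
sample = "TEXT;header;1\nTEXT2;row;a\nTEXT2;row;b\nTEXT3;trailer\n"
parsed = parse_records(sample)
print(parsed["TEXT2"])
```

Because repeated records are appended to a list, the same code handles 0, 1 or many TEXT2 records, which is the behaviour that setting max elements to ‘unbounded’ asks for.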
Senthil, thank you for your suggestion. Could you please elaborate on this? I created the flat file dictionary with a record definition for each of TEXT, TEXT2, TEXT3, etc., and field definitions inside (with the same names).
What values are required for Record parser, Record identifier, etc.?
Where do I mention the custom delimiters?
And where can I find the max elements?
I managed to create the records and fields using the above suggestion. What if the field delimiter also appears inside the value of a field? How do I prevent that from starting a new field?
E.g. Name ;Firstname;Lastname
My field delimiter here is ;
How do I take Firstname;Lastname as a single field value, without having to add another field?
There is something called an ‘escape character’. An escape character is used to tell whether a delimiter is data or a real delimiter. The system doesn’t have that intelligence on its own, so we need to tell it how to differentiate them…
In your example, Name ;Firstname;Lastname should be received as Name ;Firstname\;Lastname, where \ acts as an escape character…
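As a rough illustration of what a parser does with an escape character, here is a small Python sketch. The function name split_escaped is made up for this example, and it is not the webMethods implementation; it only shows the rule that a delimiter preceded by the escape character is treated as data.

```python
def split_escaped(line: str, delim: str = ";", escape: str = "\\") -> list:
    """Split on delim, treating any character after escape as literal data."""
    fields = []
    current = []
    chars = iter(line)
    for ch in chars:
        if ch == escape:
            # Take the next character literally; a trailing escape
            # with nothing after it is kept as-is.
            current.append(next(chars, escape))
        elif ch == delim:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# The escaped form of the example from this thread.
print(split_escaped("Name ;Firstname\\;Lastname"))
# prints ['Name ', 'Firstname;Lastname']
```

The escape character itself is dropped from the output, so the field value comes out as Firstname;Lastname.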
I’m familiar with the concept of an escape character. The question is how to supply it, since no changes will be made to the input file. As I understand it, this means scanning the entire input file as received and inserting an escape character wherever required before parsing. Checking every record, identifying the delimiter characters in values that are really data, placing an escape character before them, and only then parsing looks rather complicated to me. Kindly correct me if my understanding is wrong.
You are correct. But this is what is called a flat file. Unlike XML or JSON, a flat file carries NO structure with it. The problem you describe is termed “delimiter collision”.
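To show the collision on the example from this thread, and one possible workaround when the input file cannot be changed, here is a short Python sketch. The workaround rests on an assumption that may not hold for your data: the number of fields per record is known, and the stray delimiter can only appear in the last field. The helper name split_fixed is hypothetical.

```python
line = "Name ;Firstname;Lastname"

# Naive split: the delimiter inside the value creates a third field.
naive = line.split(";")


def split_fixed(line: str, delim: str = ";", field_count: int = 2) -> list:
    """Split into at most field_count fields; the last field keeps
    any further delimiters as data (assumes collisions occur only there)."""
    return line.split(delim, field_count - 1)


capped = split_fixed(line)
print(naive)
print(capped)
```

Here naive comes out as three fields, while capped keeps Firstname;Lastname together as the second field. If the delimiter can appear in any field, this trick does not help, and the parser needs some other rule to go on.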
As I said, we need to feed the software (be it any tool, product, or program) with information on how to parse the data. In a fixed length file, which is a starting field, ending field etc.,