Get first and last line

What product/components do you use and which version/fix level?

WM 10.7

What are you trying to achieve? Please describe in detail.

Hi Experts,

I am trying to remove the file header and trailer and save the file. To achieve this, I am getting the first and last line of the file. I used tokenize to get each line by index and stringToFile to write the new file content (without the header and trailer).

The code works when the file has only a few lines (not more than 10). But with a large file, the content gets reformatted: a detail record ends up on 1-2 lines instead of the original one line per detail record.

I also tried using a Java service to split the string, but I still get the same result with a large file.

Can you suggest a better way to get the header and trailer without destroying the file format? The original format, one line per data record, should be preserved.

Do you get any error messages? Please provide a full error message screenshot and log file.

I am not getting any error message.

Thank you so much.

Assuming your schema configuration is correct, large file or not, tokenize or split should work for the delimiter, as expected.

If your single line is being broken into two/more, then -

  1. Check the integrity of the record separator (i.e., if it’s repeating within the record)

  2. Open the problematic content on Notepad++ and use the “Show all characters” function to inspect

  3. Although a possibility (albeit, remote), see if you’re transferring/processing a cross-OS file (i.e., Windows on Linux, vice versa, and so on)
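
For checks 1 and 3, a small standalone Java sketch like the one below can count the line terminators in the file, as an alternative to inspecting it by eye. The class name `LineEndingReport` is made up for illustration, and it assumes an ASCII-compatible encoding (UTF-8, ISO-8859-1 and similar), since it looks at raw bytes. If the file is supposed to use CRLF and the report shows lone CR or lone LF bytes as well, the record separator is probably repeating inside a record or the file has crossed operating systems.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: counts the kinds of line terminators found in a file. */
final class LineEndingReport {
    final long crlf; // Windows-style \r\n pairs
    final long lf;   // \n not preceded by \r (Unix style)
    final long cr;   // \r not followed by \n (stray carriage returns)

    private LineEndingReport(long crlf, long lf, long cr) {
        this.crlf = crlf;
        this.lf = lf;
        this.cr = cr;
    }

    /** True when more than one kind of terminator occurs in the same input. */
    boolean isMixed() {
        int kinds = (crlf > 0 ? 1 : 0) + (lf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0);
        return kinds > 1;
    }

    /** Scans raw bytes one at a time, so memory use stays constant for large files. */
    static LineEndingReport scan(InputStream in) throws IOException {
        long crlf = 0, lf = 0, cr = 0;
        boolean prevWasCr = false;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                if (prevWasCr) crlf++; else lf++;
                prevWasCr = false;
            } else {
                if (prevWasCr) cr++; // the previous \r was not followed by \n
                prevWasCr = (b == '\r');
            }
        }
        if (prevWasCr) cr++; // input ended on a lone \r
        return new LineEndingReport(crlf, lf, cr);
    }

    static LineEndingReport scan(Path file) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            return scan(in);
        }
    }
}
```

Running `LineEndingReport.scan(path)` on both the original file and the output file and comparing the three counters should show whether the extra line breaks are already in the input or are introduced during processing.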

Large File Handling - Having said that, consider using large file handling principles/features for your file, instead of loading the entire file in-memory; documentation is here (link)

There are large file handling articles in the forums; you can refer to those.
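
To illustrate the streaming idea in plain Java (this is not the Integration Server large file handling feature itself, just a sketch of the same principle), the class below drops the first and last line while holding only one line in memory at a time. `HeaderTrailerStripper` and its `Result` type are hypothetical names. Note that `BufferedReader.readLine` accepts `\n`, `\r`, and `\r\n` as terminators, so a stray `\r` inside a record would still split it; the output separator is whatever the caller passes in.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

/** Hypothetical helper: copies a text stream without its first and last line. */
final class HeaderTrailerStripper {

    /** The removed lines; either may be null when the input is too short. */
    static final class Result {
        final String header;
        final String trailer;
        final long detailCount;

        Result(String header, String trailer, long detailCount) {
            this.header = header;
            this.trailer = trailer;
            this.detailCount = detailCount;
        }
    }

    static Result strip(Reader source, Writer target, String lineSeparator) throws IOException {
        BufferedReader in = new BufferedReader(source);
        BufferedWriter out = new BufferedWriter(target);

        String header = in.readLine();  // first line is never written
        String pending = in.readLine(); // written only once a later line proves it is not the trailer
        long written = 0;
        String next;
        while (pending != null && (next = in.readLine()) != null) {
            out.write(pending);
            out.write(lineSeparator);   // one detail record per output line
            written++;
            pending = next;
        }
        out.flush();
        // Whatever is still pending is the last line of the input, i.e. the trailer.
        return new Result(header, pending, written);
    }
}
```

With files, the reader and writer would typically come from `Files.newBufferedReader(path, charset)` and `Files.newBufferedWriter(path, charset)`, with the charset stated explicitly rather than relying on the platform default.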

KM

Hi Michelle,

One thing that comes to mind is using the WmFlatFile feature and specifying 3 types of record structures.
For further mapping/transformation and so on, you would then use only the second structure and omit the first and third ones.

This will only work if each of these 3 line types uses a unique record format.

See the Working with FlatFiles documentation for further information.
WmFlatFile also has large file support.

Regards,
Holger

In addition to the useful items from @Venkata_Kasi_Viswanath_Mugada1 and @Holger_von_Thomsen, consider:

  • tokenize has specific behaviors that usually catch people by surprise. It is not appropriate for file parsing, IMO.
  • The flat-file services should do what you need. Avoid Java.
  • If flat-file is more than you need, then using Java may be okay. But the likely source of the trouble is how the data is being read. Can you share details about where the data is being read from?
    ** Is it from a local file? A remote put to your IS?
    ** Is the input being buffered?
    ** Is the file strictly text? Use the line-aware methods to read.
    ** Beware of buffered input – can end up with partial lines.
    ** Beware of large data – can exhaust memory and crash your JVM.
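
As an illustration of the tokenize surprises mentioned above, here is a sketch using `java.util.StringTokenizer`. It assumes the built-in tokenize service behaves the same way, which its "delimiter characters" input suggests but which I have not verified against the service source. Two behaviors matter for file parsing: every character in the delimiter string is a separate delimiter, and empty tokens are silently skipped. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

/** Hypothetical demo: character-set tokenizing versus splitting on an exact separator. */
final class TokenizeVersusSplit {

    /** Each character of delims is treated as its own delimiter; empty tokens are dropped. */
    static List<String> tokenize(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    /** Splits only where the whole separator string occurs and keeps empty lines. */
    static List<String> splitOnSeparator(String text, String separator) {
        // limit -1 keeps trailing empty strings so that blank lines are not lost
        List<String> lines = new ArrayList<>(Arrays.asList(text.split(Pattern.quote(separator), -1)));
        // A file that ends with the separator yields one empty element at the end; drop it.
        if (!lines.isEmpty() && lines.get(lines.size() - 1).isEmpty()) {
            lines.remove(lines.size() - 1);
        }
        return lines;
    }
}
```

For the input `"HDR\r\nA1\rA2\r\n\r\nTRL\r\n"`, `tokenize(text, "\r\n")` returns four tokens, `HDR`, `A1`, `A2`, `TRL`: the record containing a stray `\r` is broken in two and the blank line disappears, so line indexes shift. `splitOnSeparator(text, "\r\n")` also returns four elements but keeps `A1\rA2` together as one record and preserves the blank line. That kind of difference only shows up once the data contains such characters, which could explain why small test files look fine.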

If you can share the service(s) you’re working on, that may be helpful.

Additionally, it might be helpful to share a sample of the file (format) that shows the different record structures, and a sample of the output where the data gets split incorrectly.

Regards,
Holger