Load data without defining Schema

Hi all,

Does anyone have experience loading mass data without defining a schema?

I want to use the Tamino Data Loader to load a very large XML file.

However, I don’t have the schema; I only know the format of the file, which is as follows:





I also know that before using the Tamino Data Loader, I need to define a collection and a doctype for the database.

So I used the Schema Editor to create a collection and a doctype, and saved it.

However, when I try to define the schema to the database, I get the following error:
An error occured while processing a schema document; (mp-valid-doctype-name.3) For each doctype name where the doctype is not of type nonXML there must be a matching global element with the respective name

How do I define the collection and doctype in the database correctly so that I can load the data into it?

Thanks

It sounds a bit chaotic that you don’t know the doctypes! :wink:
Tamino has a collection that will accept these unknown doctypes; it’s called ino:etc.
If you load a document without specifying a collection, it will end up in the ino:etc collection.

You specify ino:etc as the collection in your queries and use standard XPath/XQuery syntax.

But are you sure you won’t need indexing at a later stage?

Finn

Hello there.

As Finn states, it is possible to store schema-less documents into ino:etc. But as he also mentions, you might find the need to create indexes at some point - which you cannot do without a schema (of course!).

Perhaps this schema will help:
   <?xml version = "1.0" encoding = "UTF-8"?>
   <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:tsd="http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition">
      <xs:annotation>
         <xs:appinfo>
            <tsd:schemaInfo name="site">
               <tsd:collection name="site"></tsd:collection>
               <tsd:doctype name="site">
                  <tsd:logical>
                     <tsd:content>open</tsd:content>
                  </tsd:logical>
               </tsd:doctype>
            </tsd:schemaInfo>
         </xs:appinfo>
      </xs:annotation>
      <xs:element name="site">
         <xs:complexType></xs:complexType>
      </xs:element>
   </xs:schema>

The schema is defined as “open content”, so you can store any XML document with the root node “site” in it.
You can also evolve this schema as you progress, adding nodes & indexes when it becomes possible/necessary.

(If nothing else, perhaps you can open it in the Schema Editor and use it to help find the problem with your current schema!)

I hope that helps,
Trevor.

Hello all,
Thanks for all your great help. I can now create the schema as Trevor described. But I get an error when I use the Tamino Data Loader. Here is the log file:
<ino:message>
<ino:messageline>Invalid input file format - ino:request missing</ino:messageline>
</ino:message>
<ino:message>
<ino:messageline>Mass load aborted</ino:messageline>

What else have I missed?

My file looks something like this:

<?xml version="1.0" standalone="yes"?>




United States
1
duteous nine eighteen
Creditcard



Do I need to change it to this?

<?xml version="1.0" standalone="yes"?>
<ino:request>
<ino:object>




United States
1
duteous nine eighteen
Creditcard


</ino:object>
</ino:request>

However, I can’t load that file into the database either; I get the following error:
<ino:message ino:returnvalue="9286"><ino:messagetext ino:code="INOXYE9286">Internal database error has occurred</ino:messagetext></ino:message>
<ino:message>
<ino:messageline>COMMIT failed - mass load aborted</ino:messageline>

What’s that? Thanks a lot.

[This message was edited by klwong on 04 Oct 2002 at 03:03.]

Hello again.

If I understand correctly, you have already found the correct format for the load file:
   <?xml version="1.0" standalone="yes"?>
   <ino:request>
   <ino:object>
   
   
   
   
   United States
   1
   duteous nine eighteen
   Creditcard
   …
   
   </ino:object>
   </ino:request>
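
For anyone preparing such a file by script, here is a minimal Python sketch of the wrapping step. It only mirrors the envelope exactly as shown above; `wrap_for_mass_load` is a hypothetical helper name, and whether the loader also expects a namespace declaration or other attributes on `ino:request` is not shown in this thread, so that part is left out and should be checked against the Tamino documentation.

```python
import re


def wrap_for_mass_load(document: str) -> str:
    """Wrap one XML document in the ino:request / ino:object envelope.

    Hypothetical helper: the envelope mirrors the load file shown in the
    thread and makes no claim about attributes the loader may require.
    """
    # Remove the document's own XML declaration, because only one
    # declaration may appear and it must be at the very top of the file.
    body = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", document, count=1)
    return (
        '<?xml version="1.0"?>\n'
        "<ino:request>\n"
        "<ino:object>\n"
        f"{body.strip()}\n"
        "</ino:object>\n"
        "</ino:request>\n"
    )
```

Reading the whole document into a string is fine for a 34 KB sample; for a 57 MB file a streaming copy would be the more careful choice.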

Perhaps the last error is caused by something special about your file…
How large is it?
Could you perhaps post it to the forum?

Cheers,
Trevor.

It is about 57 MB in size.

Here are my database space settings, copied from Tamino Manager:

index space 1 50.00 MB default
data space 1 100.00 MB default
journal space 1 200.00 MB default
temporary working space 1 7.00 MB default
log space 1 initial 2 172.56 KB default
backup space 1 initial 1.46 MB default


How can I find out what the internal database error actually is?

Hello there.

I’m afraid that I’m not very experienced with the data loaders, and I don’t have a 57MB file to test with!

So I had a look through the previous threads, and found some which might help:
   mass loader - internal database error

   Resize Temporary Working Space

   inoxmld commit error

The last link seems to be the most useful - though perhaps the solution of upgrading to Tamino 3.1.2 is not applicable.

With which version of Tamino are you experiencing this error?

Thanks,
Trevor.

Tamino has a default timeout of 300 seconds.
If the load takes longer than that, I guess you will get exactly that error, because you are trying to commit a transaction that has timed out.
Go into the Manager and increase this parameter.
You should also consider increasing the buffer pool size while doing the mass load.
Finn

The version I am using is 3.1.1.3.

Because the original data file (57 MB) fails to load into Tamino, I took another data file (only 34 KB) in the same format and tried different loaders to load it into the Tamino database:

1. Tamino Data Loader (this is actually my preferred option, because I need to measure the load time)

It still fails to load the file, with ‘internal database error’.

Why can’t it handle such a small data file either?

2. Java Data Loader
It does not work. Here is my command line (JavaLoader.jar and xerces.jar are installed in the specified directory):

java -classpath "/usr/local/tamino-3.1.1.3/ino/v3113/X_Tools/Tamino_Load/JavaLoader.jar;/usr/local/tamino-3.1.1.3/ino/v3113/X_Tools/Tamino_Load/xerces.jar" com.softwareag.tamino.db.tools.loader.TaminoLoad -u http://137.189.94.97/tamino/compareOrientX/site -d

It fails with the following error: Exception in thread "main" java.lang.NoClassDefFoundError: com/softwareag/tamino/db/tools/loader/TaminoLoad

What’s the problem?

3. Interactive Interface

It loads the small data file successfully, but not the original 57 MB data file. Is that related to the Timeout setting in the Apache configuration?

[This message was edited by klwong on 08 Oct 2002 at 08:07.]

I think your three problems have very different causes.
1 I think you might have a small problem with the special input format for inoxmld. Could you perhaps attach the 34K example?
2 The problem here is clearly your classpath. In my installation there is an inojload.cmd file that shows how to use the Java loader.
3 If you try to use the Interactive Interface to load a 57 MB file you’ll definitely run into problems, because Tamino will treat this as one transaction, which has to fit in the journal file and complete within the timeout settings of both Tamino and the web server.
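
On point 2, one likely cause (an assumption; nothing in this thread confirms it) is the separator: the posted command joins the two jars with `;`, which is the classpath separator on Windows, while Java on Linux/Unix expects `:`. Inside quotes the shell passes the `;` through literally, so the JVM sees a single path that does not exist and cannot find TaminoLoad. A minimal Python sketch that builds the command with the platform's own separator follows; `build_loader_command` is a hypothetical helper, and the jar paths and class name are copied from the post above.

```python
import os

# Jar locations and main class exactly as given in the post above.
LOAD_DIR = "/usr/local/tamino-3.1.1.3/ino/v3113/X_Tools/Tamino_Load"
JARS = [f"{LOAD_DIR}/JavaLoader.jar", f"{LOAD_DIR}/xerces.jar"]
MAIN_CLASS = "com.softwareag.tamino.db.tools.loader.TaminoLoad"


def build_loader_command(loader_args, sep=os.pathsep):
    """Return the argv list for the Java loader (hypothetical helper).

    sep defaults to os.pathsep, which is ':' on Linux/Unix and ';' on
    Windows, so the classpath is joined correctly on either platform.
    """
    classpath = sep.join(JARS)
    # The loader's own arguments are passed through unchanged; none are
    # assumed or invented here.
    return ["java", "-classpath", classpath, MAIN_CLASS, *loader_args]
```

The returned list can be handed to `subprocess.run` as is, which also avoids any shell quoting of the classpath.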

A trick might be to use inoxmld to unload the document you successfully loaded using the Interactive Interface, and then try to use that output as input for inoxmld.
There is a good chance that this will work!
Then you have a format to “brew” on for the full 57 MB load.
Finn

I’ve loaded the file (0.0001size.xml) using the Interactive Interface and unloaded the data as output (out.xml).

I then used this out.xml as input to load into Tamino using inoxmld. However, the ‘internal database error’ still occurs.

Then I used the Data Loader in Tamino Manager to load out.xml. However, I got: 'Open of Load file (/usr/local/tamino-3.1.1.3/ino/db/AAC000010000002103) has failed due to operating system error 17 (File exists)'

So, up to now, I still cannot load the file successfully :frowning:
0.0001size.xml (33.7 KB)

My other file is out.xml.
The only difference from 0.0001size.xml is that the <ino:request> and <ino:object> tag pairs have been added,

i.e.
<?xml version="1.0" encoding="utf-8" ?>
<ino:request>
<ino:object>



2

</closed_auction>
</closed_auctions>

</ino:object>
</ino:request>

[This message was edited by klwong on 09 Oct 2002 at 03:00.]
out.xml (33.8 KB)

I just tried with your “site” document on my machine: I created a DTD using XML Spy, imported that into the Schema Editor, and defined this schema in Tamino.
I then loaded the document, first using the Interactive Interface, then unloaded it using inoxmld and loaded the output back again using inoxmld.
(The load time was 20 seconds on my slow, low-RAM machine :wink: and this can definitely be sped up using a bigger buffer.)
It just occurred to me what the difference might be: I’m using Win2K. Are you using Linux or some other Unix flavour?
Finn

Hi klwong,

I would strongly recommend upgrading to 3.1.1.4 with a recent hotfix, or to 3.1.2. Your problem with mass load (and more) is fixed there.

Regards,
Hermann