Bulk Loading of Data – Oracle


The ORACLE SQL* Loader is a facility that allows you to populate database tables from flat files. To use this facility, you need to prepare (i) your flat file(s), and (ii) a control file that tells ORACLE how to “map” fields of the flat file to columns of an RDBMS table. Furthermore, the data can also be appended at the end of the control file, making the process much simpler. Most probably, you will prepare one control file for each table that you want to bulk load to. Notice that preparing the data is part of your assignment. There are no “tools” for this. You either create them from some C program (example) that generates the data and outputs it into a file, or if you are lucky, you might find a web page that already has the data that you need with minimal text processing

SQL*Loader is a bulk loader utility used for moving data from external files into the Oracle database. Its syntax is similar to that of the DB2 load utility, but comes with more options. SQL*Loader supports various load formats, selective loading, and multi-table loads.

SQL*Loader (sqlldr) is the utility to use for high performance data loads. The data can be loaded from any text file and inserted into the database.

A simple control file has the following form:

LOAD DATA
INFILE <dataFile>
APPEND INTO TABLE <tableName>
FIELDS TERMINATED BY '<separator>'
(<list of all attribute names to load>)
  • <dataFile> is the name of the data file. If you did not give a file name extension for <dataFile>, Oracle will assume the default extension “.dat“. Therefore, it is a good idea to name every data file with an extension, and specify the complete file name with the extension.
  • <tableName>is the name of the table to which data will be loaded. Of course, it should have been created already before the bulk load operation.
  • The optional keyword APPEND says that data will be appended to <tableName>. If APPENDis omitted, the table must be empty before the bulk load operation or else an error will occur.
  • <separator>specifies the field separator for your data file. This can be any string. It is a good idea to use a string that you know will never appear in the data, so the separator will not be confused with data fields.
  • Finally, list the names of attributes of <tableName> that are set by your data file, separated by commas and enclosed in parentheses. This list need not be the complete list of attributes in the actual schema of the table, nor must it be arranged in the same order as the attributes when the table was created — sqlldr will match attributes to by their names in the table schema. Any attributes unspecified in the list of attributes will be set to NULL.

As a concrete example, here are the contents of a control file test.ctl:

LOAD DATA INFILE test.dat INTO TABLE DBA_TEST FIELDS TERMINATED BY '|' (i, s)

Each line in the data file specifies one tuple to be loaded into <tableName>. It lists, in order, values for the attributes in the list specified in the control file, separated by <separator>. As a concrete example, test.datmight look like:

1|foo
2|bar
3| baz

Recall that the attribute list of test specified in test.ctl is (i, s), where i has the type int, and s has the type char(10). As the result of loading test.dat, the following tuples are inserted into test:

(1, 'foo')
(2, 'bar')
(3, ' baz')

Some Notes of Warning

  • Note that the third line of test.dat has a blank after “|“. This blank is not ignored by the loader. The value to be loaded for attribute s is ' baz', a four-character string with a leading blank. It is a common mistake to assume that 'baz', a three-character string with no leading blank, will be loaded instead. This can lead to some very frustrating problems that you will not notice until you try to query your loaded data, because ' baz' and 'baz'are different strings.
  • Oracle literally considers every single line to be one tuple, even an empty line! When it tries to load data from an empty line, however, an error would occur and the tuple will be rejected. Some text editors love to add multiple newlines to the end of a file; if you see any strange errors in your .log file about tuples with all NULLcolumns, this may be the cause. It shouldn’t affect other tuples loaded.
  • If you are using a Microsoft text editor, such as MSWord, you will find that Bill Gates believes in ending lines with the sequence <CR> (carriage return) <LF> (line feed). The UNIX world uses only <LF>, so each <CR> becomes ^M, the last character of strings in your load file. That makes it impossible for you ever to match a stored string in an SQL query. Here’s how you remove ^M symbols from your file. Let’s say the file with ^M symbols is bad_myRel.dat. Then the following command will create myRel.dat without ^M symbols:
                 cat bad_myRel.dat | tr -d '15' > myRel.dat

    If you’re an emacs fan, type in the following sequence to modify your current buffer:

    ESC-x replace-string CTRL-q CTRL-m ENTER ENTER
  • The Oracle bulk loader is called sqlldr. It is a UNIX-level command, i.e., it should be issued directly from your UNIX shell, rather than within sqlplus. A bulk load command has the following form:
    sqlldr <yourName> control=<ctlFile> log=<logFile> bad=<badFile>
    Everything but sqlldr is optional -- you will be prompted for your username, password, and control file. <ctlFile> is the name of the control file. If no file name extension is provided, sqlldr will assume the default extension ".ctl". The name of the data file is not needed on the command line because it is specified within the control file. You may designate <logFile> as the log file. If no file name extension is provided, ".log" will be assumed. sqlldr will fill the log file with relevant information about the bulk load operation, such as the number of tuples loaded, and a description of errors that may have occurred. Finally, you may designate <badFile> as the file where bad tuples (any tuples for which an error occurs on an attempt to load them) are recorded (if they occur). Again, if no file extension is specified, Oracle uses ".bad". If no log file or bad file are specified, sqlldr will use the name of the control file with the .log and .badextensions, respectively.

    As a concrete example, if sally wishes to run the control file test.ctl and have the log output stored in test.log, then she should type

    sqlldr sally control=test.ctl log=test.log
  • It is possible to use just the control file to load data, instead of using a separate data file. Here is an example:
    LOAD DATA INFILE * INTO TABLE DBA_TEST FIELDS TERMINATED BY '|' (i, s) BEGINDATA 1|foo 2|bar 3| baz
    The trick is to specify "*" as the name of the data file, and use BEGINDATA to start the data section in the control file

DBAWorkDatabase support, Database services, Oracle ADF

Leave a comment