
Infobright Blog


Unscripted

by David Lutz     Mon, Mar 30, 2009

Do you have lots of files to load?

Are they in different directories?

With different extensions?

With mixed upper- and lower-case names?

With naming conventions that don’t follow the tables’ names?

Do you want to maintain parallel load processes without custom programming or scripting?

 

Do you prefer not to script each load or explicitly enter it on the command line?

 

Introducing the ParaFlex Loader Utility from Infobright’s Client Services team! It is available from the Download page: http://www.infobright.org/Downloads/Contributed-Software/

 

Our Client Services team ran into the same issues and evolved a set of simple shell scripts into a flexible tool. It reads a single control file of file/table pairs, writes the LOAD DATA INFILE statements for you, saves them, and can execute the loads at whatever degree of parallelism you choose. Bear in mind that this is a bash shell script, so you are free to edit it as you see fit.
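
To make the control-file idea concrete, here is a minimal sketch of how file/table pairs might be turned into LOAD DATA INFILE statements. It is not the ParaFlex script itself. The control-file layout (one whitespace-separated "data_file table_name" pair per line, with "#" comments), the function names, and the file names in the usage note are all assumptions for illustration. The '|' delimiter and NULL enclosure mirror the defaults listed under the remaining issues; check the exact clause against your Infobright version.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the ParaFlex script.
# Assumed control-file layout: "data_file table_name", one pair per line.

# Print one LOAD statement for a file/table pair.
# The third argument (field delimiter) is optional and defaults to '|'.
make_load_stmt() {
  local data_file=$1 table=$2 delim="${3:-|}"
  printf "LOAD DATA INFILE '%s' INTO TABLE %s FIELDS TERMINATED BY '%s' ENCLOSED BY 'NULL';\n" \
    "$data_file" "$table" "$delim"
}

# Read the control file and print one statement per pair,
# skipping blank lines and lines that start with '#'.
generate_load_script() {
  local control_file=$1 data_file table
  while read -r data_file table || [[ -n $data_file ]]; do
    [[ -z $data_file || $data_file == \#* ]] && continue
    make_load_stmt "$data_file" "$table"
  done < "$control_file"
}
```

For example, `generate_load_script loads.ctl > "$HOME/load_script.sql"` would save the generated statements for manual use. Because the file name and table name are separate columns, the data file's directory, extension, and letter case do not need to match the table name.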

 

Original description

  • A single load script that accepts a parameter for the number of parallel processes one might want running for a given load configuration file.
  • One could fire up X processes of the load utility that read from the same queue and each would process the “next” data file/table in the list.
  • When a process finishes a load, it goes back to the queue for the next pair to load.  All X processes read from the same queue.
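
The shared-queue behaviour described above can be approximated in a few lines. This sketch uses `xargs -P` (available in GNU and BSD xargs, though not required by POSIX) rather than whatever queueing the utility itself implements. The function name `run_parallel` and the control-file layout ("data_file table_name" per line) are assumptions, and file names are assumed to contain no spaces or quotes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the ParaFlex script.
# Usage: run_parallel CONTROL_FILE WORKERS COMMAND [ARGS...]
# Runs COMMAND once per "data_file table_name" pair, keeping at most
# WORKERS invocations running at any time.
run_parallel() {
  local control_file=$1 workers=$2
  shift 2
  # Drop blank and comment lines, then hand the pairs to xargs:
  #   -n 2 passes one file/table pair to each invocation
  #   -P   starts the next pair as soon as a running one finishes
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$control_file" |
    xargs -n 2 -P "$workers" "$@"
}
```

Calling `run_parallel loads.ctl 4 echo loading` prints one line per pair, in no guaranteed order. A real loader command would take the data file and table name as its last two arguments and run the corresponding LOAD statement.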

 

Purposes

  • Eliminate need to write any LOAD scripts
  • Allow parallel execution of LOAD streams
  • Allow constant parallelism of LOAD streams


Benefits

  • A script – execute from command line, scripts, ETL
  • Single control file, single execution command
  • Can also generate SQL LOAD script
  • Maximum, flexible parallelism of data loading

Additionally, there is a “zero parallelism” option that simply creates the script file for manual use or further editing.  (All executions, regardless of the level of parallelism, generate a scripted load file in the user’s $HOME directory.)

 

Issues remaining to be addressed

  • Special case of 1 “parallel” process
    • a “single threaded” use case is currently not supported
  • Script assumes ‘mysql-ib’ executable
    • as created during Infobright installation
    • syntax with mysql command
      • early versions of the mysql-ib script did not properly process the --execute or -e parameters, so commands are echoed through a pipe
      • this will be resolved in version 3.1.1 to support traditional SQL statement execution parameters
  • Add delimiters and enclosure characters to control file
    • script assumes ‘|’ (pipe bar) delimiters and NULL enclosure characters
    • further enhancement could be to add these as command parameters or include them in the control file
    • script can be edited for other choices
  • No current reference to @bh_dataformat
    • impacts Infobright Enterprise Edition only
    • can easily be added
  • No current check for the same table appearing more than once in the control file
    • Infobright does not support parallel load processes of the same table
    • LOADs issue table-level locks
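
Until the script performs the duplicate-table check itself, a pre-flight test along these lines could be run against the control file first. It is only a sketch: the function name `find_duplicate_tables` is hypothetical, and it assumes the table name is the second whitespace-separated field on each line and that "#" starts a comment.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the ParaFlex script.
# Print every table name that appears on more than one control-file line.
# Assumed layout: "data_file table_name"; names are compared exactly,
# so "Sales" and "sales" count as different tables.
find_duplicate_tables() {
  local control_file=$1
  awk 'NF >= 2 && $1 !~ /^#/ { count[$2]++ }
       END { for (t in count) if (count[t] > 1) print t }' "$control_file" | sort
}
```

If the function prints anything, those tables would be loaded by more than one process at once and should be moved to a separate, non-parallel run.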

 

 

Infobright     Tags: data, download, etl, load


ICDE 2009: Introduction

by Dominik Slezak     Sun, Mar 29, 2009

30th of March, 1:18 AM, Shanghai. Roughly seven hours left to the opening. I was late to the workshops but there are four days of ICDE 2009 ahead of me! I’ll enjoy every single talk. On Monday, in particular, I’ll visit Industrial Session 1 and then switch to (the rest of) the seminar on Large Graph Mining.

 

I guess I’ll have a lot to write about. Rather than putting everything on the blog, I’ll use a new forum thread for real-time updates. I invite everyone (of course not only ICDE participants!) to add a few words there. Any presentations worth attending? Any topics that you would like me to focus on?

 

Best greetings,

 

Dominik

