
Infobright Blog

Rejected! Controlling data row errors in ICE/IEE

by craigtrombly     Tue, May 29, 2012

Infobright is committed to making new software releases and features friendlier and more usable. A notable feature in the current version (4.0.6) is the ability to use a reject file during data loads. Previously, a single row-level error caused the entire load to fail without committing. The new feature gives you more control over row-level data errors by letting you set a few options before the load.

Let's take a quick look at how to use this feature as described in the User's Guide:

There are two main ways to use this feature. The first is to set the path of the reject file with the @BH_REJECT_FILE_PATH variable and pair it with @BH_ABORT_ON_COUNT:


/** when the number of rejected rows reaches 12 (like the Cubs' recent losing streak), abort the process **/
set @BH_REJECT_FILE_PATH = '/tmp/reject_file';
set @BH_ABORT_ON_COUNT = 12;
load data infile 'DATAFILE.csv' into table T;

To tell the loader never to abort, set @BH_ABORT_ON_COUNT = -1. The second way is to abort on a percentage of rejected rows instead of a count, using the @BH_ABORT_ON_THRESHOLD variable:


/** if 3% of the rows are rejected, abort the load without committing **/
set @BH_REJECT_FILE_PATH = '/tmp/reject_file';
set @BH_ABORT_ON_THRESHOLD = 0.03;
load data infile 'DATAFILE.csv' into table T;
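To get a feel for what a 0.03 threshold means for a particular file, a small pre-check like the one below may help. It is only arithmetic on row counts, not a description of how the loader evaluates the threshold internally. The function names are hypothetical, and the rule that the load aborts once the rejected share is at or above the threshold is an assumption borrowed from the "reaches" wording of the count example.

```python
def reject_fraction(rejected_rows, total_rows):
    """Return the share of rows rejected, as a fraction between 0 and 1."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return rejected_rows / total_rows


def reaches_threshold(rejected_rows, total_rows, threshold=0.03):
    """Return True when the rejected share is at or above the threshold.

    Assumption: "at or above" mirrors the count-based example; the real
    loader may compare differently.
    """
    return reject_fraction(rejected_rows, total_rows) >= threshold
```

For example, 250 bad rows in a 10,000-row file is 2.5% and stays under a 3% threshold, while 400 bad rows is 4% and does not.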

Then to turn this feature off:


/** Disable the reject file feature **/
set @BH_REJECT_FILE_PATH = NULL;
set @BH_ABORT_ON_COUNT = NULL;

One thing to note during data loading: be aware of the difference between empty values and null values. The only way to load a null value is to specify the word NULL in all caps or '\N'. This differs from a space ' ', or even '' or "". You should always quote the columns in CSV files so that embedded commas do not break your data load. TAB ('\t') delimiters are often less troublesome in data files; that is just my preference, so simply be aware of your settings.

What happens to a data row that errors? The entire row is written as-is to the reject file. This lets the user correct the issues in the reject file and then LOAD that file once each row has been corrected. Ninety percent of the time, looking at a data row is enough to determine what the problem is, especially when the process is repetitive. Features like this make users more efficient and greatly enhance the user's experience.
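To make the distinction between null and empty values concrete, here is a minimal Python sketch of how a data file could be written so that missing values become \N while empty strings and spaces stay quoted. The helper names are hypothetical. The backslash escaping of embedded quotes is an assumption that follows the usual MySQL-style default, so check it against your own loader settings.

```python
NULL_TOKEN = r"\N"  # unquoted marker for a null value, as described in the post


def format_field(value):
    """Format one value for a comma-delimited, double-quoted data file."""
    if value is None:
        # A missing value is written as the bare null marker, never quoted.
        return NULL_TOKEN
    # Assumption: backslash is the escape character for embedded quotes.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + text + '"'


def format_row(values):
    """Join formatted fields with commas to produce one line of the file."""
    return ",".join(format_field(v) for v in values)
```

With this sketch, None is written as \N, an empty string as "" and a single space as " ", so the three cases stay distinguishable in the file, and a value such as a,b is quoted so its comma is not read as a delimiter.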

Just another example of working smarter, not harder.


New open source code – A collection of examples.

by craigtrombly     Tue, May 15, 2012

Update: the download is now available, along with a video that explains the examples.

In a concentrated effort to improve upon our users' experience in working with Infobright Community Edition, a new open source software download will be released shortly.  This download will be released under the MIT license and will be supported through the Community.  It will include sample projects written in C++, C#, Java and PHP, with future projects in Ruby and Silverlight.  These example projects have been organized by me and coded by interns working with Infobright from the University of Illinois.
 
The intern program was designed to bring in computer science and computer engineering students and give them an opportunity to work hands on with Community-driven software projects.  The transition from classroom to the real world can often be a challenging experience for young developers.  Adoption into the organization's community, adapting to the corporate infrastructure and the pressure to deliver quality code are just a few of the challenges that students face.
 
One of my personal goals in working with interns is to help instill a sense of confidence and humility in their professional manner. Confidence is key to allowing developers to make decisions, and humility is essential in giving the developer a sense of possibility in making those decisions. Combined, these two traits give young engineers the ability to be very effective in an industry that changes quickly and moves at the speed of research.
 
All of the projects within this single download are examples of how to connect to and work with Infobright. Each sample project has well-written comments and a readme file, which should allow a quick path from development to testing. One of the goals of this open source release was to provide a boilerplate project for each language that can be used as a "starting point" for individual projects. By helping with this starting process, we let software developers focus primarily on their business logic and architecture needs.
 
This software will be released to the public and available for download starting Friday, May 18th from the Contributed Software page on http://www.infobright.org. This release will be updated from time to time with additional projects supporting new architectures and platforms.  The next release will contain additional projects written in Ruby as well as Silverlight using various development and design patterns. 
 
Infobright applauds all of the interns who are participating in our intern program, and we look forward to supporting the Open Source Community with further contributed software releases.

Infobright     Tags: download, open+source
