<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">

    <title type="text">Infobright.org Forums</title>
    <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/" />
    <link rel="self" type="application/atom+xml" href="http://www.infobright.org/Forums/atom/" />
    <updated></updated>
    <rights>Copyright (c) 2010</rights>
    <generator uri="http://expressionengine.com/" version="1.6.7">ExpressionEngine</generator>
    <id>tag:infobright.org,2010:07:27</id>


    <entry>
      <title>Best ETL tool for Infobright</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/1635/" />      
      <id>tag:infobright.org,2010:Forums/viewthread/.1635</id>
      <published>2010-07-05T19:01:15Z</published>
      <updated></updated>
      <author><name>Saravana7</name></author>
      <content type="html">
      <![CDATA[
        <p>Hi All,</p>

<p>Can you suggest the <b>BEST</b> ETL tool for loading data from SQL Server/DB2/Oracle into Infobright on a schedule?
</p>
      ]]>
      </content>
    </entry>

    <entry>
      <title>SELECT INTO OUTFILE / LOAD DATA INFILE  time zone issue</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/1652/" />      
      <id>tag:infobright.org,2010:Forums/viewthread/.1652</id>
      <published>2010-07-20T15:14:04Z</published>
      <updated></updated>
      <author><name>Nadir</name></author>
      <content type="html">
      <![CDATA[
        <p>Hello.</p>

<p>We are using ICE for data warehousing. <br />
I need some help explaining this:<br />
SELECT INTO OUTFILE on ICE seems to convert timestamp columns to UTC. But for LOAD DATA INFILE&#8230; I cannot tell what it is supposed to do. It should convert from UTC to the client time zone, but it does not appear to do so.</p>

<p>I will try to reproduce it with a simple example.</p>

<div class="codeblock"><code><span style="color: #000000">
<span style="color: #0000BB">SET&nbsp;</span><span style="color: #007700">@</span><span style="color: #0000BB">bh_dataformat&nbsp;</span><span style="color: #007700">=&nbsp;</span><span style="color: #DD0000">'txt_variable'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">SHOW&nbsp;</span><span style="color: #007700">GLOBAL&nbsp;</span><span style="color: #0000BB">VARIABLES&nbsp;like&nbsp;</span><span style="color: #DD0000">'%time_zone%'</span><span style="color: #007700">;<br />+------------------+---------------+<br />|&nbsp;</span><span style="color: #0000BB">Variable_name&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">|&nbsp;</span><span style="color: #0000BB">Value&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">|<br />+------------------+---------------+<br />|&nbsp;</span><span style="color: #0000BB">system_time_zone&nbsp;</span><span style="color: #007700">|&nbsp;</span><span style="color: #0000BB">CEST&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">|<br />|&nbsp;</span><span style="color: #0000BB">time_zone&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">|&nbsp;</span><span style="color: #0000BB">Europe</span><span style="color: #007700">/</span><span style="color: #0000BB">Madrid&nbsp;</span><span style="color: #007700">|<br />+------------------+---------------+<br /><br /></span><span style="color: #0000BB">SELECT&nbsp;now</span><span style="color: #007700">()&nbsp;</span><span style="color: #0000BB">INTO&nbsp;OUTFILE&nbsp;</span><span style="color: #DD0000">'/tmp/time.csv'</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">SELECT&nbsp;now</span><span style="color: #007700">();<br />+---------------------+<br />|&nbsp;</span><span style="color: #0000BB">now</span><span style="color: 
#007700">()&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|<br />+---------------------+<br />|&nbsp;</span><span style="color: #0000BB">2010</span><span style="color: #007700">-</span><span style="color: #0000BB">07</span><span style="color: #007700">-</span><span style="color: #0000BB">20&nbsp;21</span><span style="color: #007700">:</span><span style="color: #0000BB">13</span><span style="color: #007700">:</span><span style="color: #0000BB">08&nbsp;</span><span style="color: #007700">|<br />+---------------------+<br /><br /></span><span style="color: #0000BB">CREATE&nbsp;TABLE&nbsp;</span><span style="color: #007700">`</span><span style="color: #0000BB">timezoneTest</span><span style="color: #007700">`&nbsp;(<br />&nbsp;&nbsp;`</span><span style="color: #0000BB">col1</span><span style="color: #007700">`&nbsp;</span><span style="color: #0000BB">timestamp&nbsp;NOT&nbsp;NULL&nbsp;</span><span style="color: #007700">DEFAULT&nbsp;</span><span style="color: #0000BB">CURRENT_TIMESTAMP<br /></span><span style="color: #007700">)&nbsp;</span><span style="color: #0000BB">ENGINE</span><span style="color: #007700">=</span><span style="color: #0000BB">BRIGHTHOUSE</span><span style="color: #007700">;&nbsp;</span>
</span>
</code></div>

<p>shell$ cat /tmp/time.csv<br />
2010-07-20 21:13:04</p>

<p>Everything looks normal up to this point.
</p><div class="codeblock"><code><span style="color: #000000">
<span style="color: #0000BB">LOAD&nbsp;DATA&nbsp;INFILE&nbsp;</span><span style="color: #DD0000">'/tmp/time.csv'&nbsp;</span><span style="color: #0000BB">INTO&nbsp;TABLE&nbsp;timezoneTest</span><span style="color: #007700">;<br /></span><span style="color: #0000BB">SELECT&nbsp;</span><span style="color: #007700">*&nbsp;</span><span style="color: #0000BB">FROM&nbsp;timezoneTest</span><span style="color: #007700">;<br />+---------------------+<br />|&nbsp;</span><span style="color: #0000BB">col1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">|<br />+---------------------+<br />|&nbsp;</span><span style="color: #0000BB">2010</span><span style="color: #007700">-</span><span style="color: #0000BB">07</span><span style="color: #007700">-</span><span style="color: #0000BB">20&nbsp;22</span><span style="color: #007700">:</span><span style="color: #0000BB">13</span><span style="color: #007700">:</span><span style="color: #0000BB">04&nbsp;</span><span style="color: #007700">|<br />+---------------------+&nbsp;</span>
</span>
</code></div>

<p>Ok, why +1 hour?</p>

<p>And now, the strangest thing:</p>

<div class="codeblock"><code><span style="color: #000000">
<span style="color: #0000BB">SELECT&nbsp;</span><span style="color: #007700">*&nbsp;</span><span style="color: #0000BB">FROM&nbsp;timezoneTest&nbsp;INTO&nbsp;OUTFILE&nbsp;</span><span style="color: #DD0000">'/tmp/time_loadInfile.csv'</span><span style="color: #007700">;&nbsp;</span>
</span>
</code></div><p>
shell$ cat /tmp/time_loadInfile.csv <br />
2010-07-20 20:13:04<br />
(2 hours less than the stored value. This could be reasonable if the Infobright loader converts to UTC on SELECT INTO OUTFILE, since Europe/Madrid is UTC+2 in summer time.)
</p><div class="codeblock"><code><span style="color: #000000">
<span style="color: #0000BB">LOAD&nbsp;DATA&nbsp;INFILE&nbsp;</span><span style="color: #DD0000">'/tmp/time_loadInfile.csv'&nbsp;</span><span style="color: #0000BB">INTO&nbsp;TABLE&nbsp;timezoneTest</span><span style="color: #007700">;<br /><br /></span><span style="color: #0000BB">SELECT&nbsp;</span><span style="color: #007700">*&nbsp;</span><span style="color: #0000BB">FROM&nbsp;timezoneTest</span><span style="color: #007700">;<br />+---------------------+<br />|&nbsp;</span><span style="color: #0000BB">col1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">|<br />+---------------------+<br />|&nbsp;</span><span style="color: #0000BB">2010</span><span style="color: #007700">-</span><span style="color: #0000BB">07</span><span style="color: #007700">-</span><span style="color: #0000BB">20&nbsp;22</span><span style="color: #007700">:</span><span style="color: #0000BB">13</span><span style="color: #007700">:</span><span style="color: #0000BB">04&nbsp;</span><span style="color: #007700">|<br />|&nbsp;</span><span style="color: #0000BB">2010</span><span style="color: #007700">-</span><span style="color: #0000BB">07</span><span style="color: #007700">-</span><span style="color: #0000BB">20&nbsp;21</span><span style="color: #007700">:</span><span style="color: #0000BB">13</span><span style="color: #007700">:</span><span style="color: #0000BB">04&nbsp;</span><span style="color: #007700">|<br />+---------------------+&nbsp;</span>
</span>
</code></div>

<p>Ok, why is this happening? I mean, if I SELECT INTO OUTFILE and then LOAD DATA INFILE, shouldn&#8217;t I obtain the same result?</p>

<p>Obviously I missed something&#8230;</p>

<p>If the same test is done with @bh_dataformat = &#8216;mysql&#8217;, then the behaviour is as expected: SELECT INTO OUTFILE + LOAD DATA INFILE returns the same timestamp as the original in the table.</p>
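
<p>The round trip expected above can be sketched in Python. This is only an illustration of the arithmetic, not of how the Infobright loader works internally. It assumes a fixed UTC+2 offset for Europe/Madrid on 2010-07-20 (CEST), and the helper names <code>local_to_utc</code> and <code>utc_to_local</code> are hypothetical.</p>

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

# Assumption: Europe/Madrid observes CEST (UTC+2) on 2010-07-20.
CEST = timezone(timedelta(hours=2), "CEST")


def local_to_utc(text, tz=CEST):
    """Read a wall-clock string as local time in tz and return it as UTC."""
    local = datetime.strptime(text, FMT).replace(tzinfo=tz)
    return local.astimezone(timezone.utc).strftime(FMT)


def utc_to_local(text, tz=CEST):
    """Read a wall-clock string as UTC and return it as local time in tz."""
    utc = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    return utc.astimezone(tz).strftime(FMT)


stored = "2010-07-20 22:13:04"           # value in timezoneTest
exported = local_to_utc(stored)          # what a UTC export would write
round_trip = utc_to_local(exported)      # what a symmetric load would store

# Value actually observed after reloading the exported file.
observed_reload = "2010-07-20 21:13:04"
gap = datetime.strptime(round_trip, FMT) - datetime.strptime(observed_reload, FMT)

print(exported, round_trip, gap)
```

<p>Under that assumption the export matches the 20:13:04 seen in the file, while the reloaded value is one hour short of a symmetric round trip.</p>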

<p><br />
Thanks a lot in advance for any answers! I&#8217;m just a bit stuck&#8230; <img src="http://www.infobright.org/images/smileys/wink.gif" width="19" height="19" alt="wink" style="border:0;" />
</p>
      ]]>
      </content>
    </entry>

    <entry>
      <title>Infobright Unknown Error on Load</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/1598/" />      
      <id>tag:infobright.org,2010:Forums/viewthread/.1598</id>
      <published>2010-06-25T13:27:01Z</published>
      <updated></updated>
      <author><name>aceeca</name></author>
      <content type="html">
      <![CDATA[
        <p>Hello:</p>

<p>We are running the community version of Infobright - IB_3.3.1_r6997_7017(ice). We have this issue during bulk load of a certain table. The error we get is:</p>

<p>ERROR 1402 (XA100) at line 1: The Infobright storage engine has encountered an unexpected error. The current transaction has been rolled back. For details on the error please see the brighthouse.log file in<br />
the /mnt/infobright-3.3.1-x86_64/data/ directory.</p>

<p>Looking at the logs did not provide any clues. All the log says is:</p>

<p>2010-06-24 21:43:39 The Infobright storage engine has encountered an unexpected error. The current transaction has been rolled back.<br />
2010-06-24 21:43:39 Error: Unknown error.</p>

<p>To narrow down the issue, I broke the large file into smaller chunks of 10K lines and tried to load them. Each of the smaller 10K-line chunks loaded successfully.</p>
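
<p>The splitting step can be sketched as follows. This is a minimal example, not the exact procedure used here: it assumes one record per line (no embedded newlines inside fields), and the function name <code>split_file</code> and the chunk file naming are hypothetical.</p>

```python
from itertools import islice
from pathlib import Path


def split_file(src, out_dir, lines_per_chunk=10_000):
    """Split src into files of at most lines_per_chunk lines.

    Returns the chunk paths in the order they appear in src.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    # Binary mode keeps the original bytes and line terminators untouched.
    with open(src, "rb") as fh:
        while True:
            lines = list(islice(fh, lines_per_chunk))
            if not lines:
                break
            chunk = out_dir / f"chunk_{len(chunks):05d}.txt"
            chunk.write_bytes(b"".join(lines))
            chunks.append(chunk)
    return chunks
```

<p>Loading the returned chunks one at a time makes it possible to see whether any single chunk triggers the error on its own.</p>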

<p><br />
Does anyone know what may be causing this issue?</p>

<p>TIA,<br />
M.
</p>
      ]]>
      </content>
    </entry>

    <entry>
      <title>data load vs. query performance</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/1590/" />      
      <id>tag:infobright.org,2010:Forums/viewthread/.1590</id>
      <published>2010-06-17T08:16:30Z</published>
      <updated>2010-06-17T08:17:23Z</updated>
      <author><name>statgen</name></author>
      <content type="html">
      <![CDATA[
        <p>Hi !</p>

<p>I just installed ICE and am doing some testing. My test data is two sets of 25 files. Each file has roughly 1 million lines. Each 25-file set is loaded into one table, so in the end I have two tables, each with roughly 25 million rows. There is almost a 1-1 relationship between the two tables, but not quite; otherwise, I would load all the data into a single table.</p>

<p>I can load the data in two different ways. 1) I call the bulk loader 25 times for each table or 2) I combine all 25 files into 1 big file and call the bulk loader once for each table.</p>
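
<p>Option 2 can be sketched as follows. This is a minimal example under the assumption that every input file ends with a line terminator, so the last row of one file does not merge with the first row of the next. The function name <code>combine_files</code> is hypothetical.</p>

```python
import shutil
from pathlib import Path


def combine_files(parts, dest):
    """Concatenate parts, in order, into dest.

    Returns the number of bytes written to dest.
    """
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the copy so large files are never held in memory.
                shutil.copyfileobj(src, out)
    return Path(dest).stat().st_size
```

<p>The combined file can then be passed to the bulk loader in a single call per table.</p>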

<p>Here is the problem: query performance is dramatically worse in scenario 1 than in scenario 2. Why?</p>

<p>Possible solutions would be: a) always &#8220;pool&#8221; the data together before loading, although this is not always possible; b) load the data in chunks and then &#8220;optimize&#8221; the table, although I could not find any optimizing tools for engine=brighthouse tables; c) another solution unknown to me as of yet.</p>

<p>I searched the different ICE forums, but could not find a definitive answer to this matter.</p>

<p>Any and all help highly appreciated !</p>

<p>Martin.</p>

<p>PS If relevant: testing data set is 25 million rows. Production data sets will be in billions of rows (notice &#8220;data sets&#8221; is plural).
</p>
      ]]>
      </content>
    </entry>

    <entry>
      <title>About data integration and real time</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/1589/" />      
      <id>tag:infobright.org,2010:Forums/viewthread/.1589</id>
      <published>2010-06-16T11:14:34Z</published>
      <updated></updated>
      <author><name>qrmikwen</name></author>
      <content type="html">
      <![CDATA[
        <p>Hello,<br />
 
Thanks for all of the previous answers. We now need a real-time data integration program.<br />
We have not found the right software yet.&nbsp; </p>

<p>We will be using the tool for migration of contacts and data in real time. </p>

<p>Thanks!
</p>
      ]]>
      </content>
    </entry>

    <entry>
      <title>Software and data management</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/1575/" />      
      <id>tag:infobright.org,2010:Forums/viewthread/.1575</id>
      <published>2010-06-02T09:53:11Z</published>
      <updated></updated>
      <author><name>qrmikwen</name></author>
      <content type="html">
      <![CDATA[
        <p>Hello,</p>

<p>Thanks for all of the good help on the forum. <br />
We have been looking at various open source software for data quality, and we are now interested in a related field. </p>

<p>We want to implement Master Data Management software to complement our existing software. Do you know of any open source MDM tools? </p>

<p>Thanks a lot.
</p>
      ]]>
      </content>
    </entry>

    <entry>
      <title>SQL Server replication into Infobright/MySQL</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/1562/" />      
      <id>tag:infobright.org,2010:Forums/viewthread/.1562</id>
      <published>2010-05-26T07:40:55Z</published>
      <updated></updated>
      <author><name>Andy Stephenson</name></author>
      <content type="html">
      <![CDATA[
        <p>Hi </p>

<p>Has anyone any knowledge of replicating data from SQL Server (2000 or 2005) into InfoBright/MySQL?&nbsp; I&#8217;m looking into methods of doing real(ish)-time updates of changing data into Infobright.&nbsp; I&#8217;ve seen it was possible, with a bit of poking, to get SQL2000 to replicate into MySQL 3.51, but haven&#8217;t seen any recent articles on the subject.</p>

<p>Andy
</p>
      ]]>
      </content>
    </entry>

    <entry>
      <title>ETL Connectors</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/99/" />      
      <id>tag:infobright.org,2008:Forums/viewthread/.99</id>
      <published>2008-08-29T12:27:16Z</published>
      <updated></updated>
      <author><name>John Kemp</name></author>
      <content type="html">
      <![CDATA[
        <p><i>This is a common question that we’ve encountered in the past during our PoC process.<br />
</i><br />
<b>Question:</b></p>

<p>Do you provide custom ETL connectors?&nbsp; If so, for what platforms?</p>

<p><b>Answer :</b></p>

<p>At present, there are no community-developed ETL connectors available.&nbsp; I believe that there is a commercial offering out there (Infobright offers a connector for Pentaho on Windows), and that there will likely be more.
</p>
      ]]>
      </content>
    </entry>

    <entry>
      <title>To transfer contacts on database</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/1529/" />      
      <id>tag:infobright.org,2010:Forums/viewthread/.1529</id>
      <published>2010-05-05T10:23:17Z</published>
      <updated></updated>
      <author><name>qrmikwen</name></author>
      <content type="html">
      <![CDATA[
        <p>Hello,<br />
 
I am in the process of transferring data and contacts onto a MySQL database.<br />
I have had some help and am trying to find a software solution.&nbsp; </p>

<p>The main requirement is to migrate large amounts of data and contacts every day. The migration would be run manually. </p>

<p>Thanks for the help.
</p>
      ]]>
      </content>
    </entry>

    <entry>
      <title>Getting data quality on database</title>
      <link rel="alternate" type="text/html" href="http://www.infobright.org/Forums/viewthread/1507/" />      
      <id>tag:infobright.org,2010:Forums/viewthread/.1507</id>
      <published>2010-04-21T08:01:31Z</published>
      <updated></updated>
      <author><name>qrmikwen</name></author>
      <content type="html">
      <![CDATA[
        <p>Hello,</p>

<p>Thank you for your help on the last question. <br />
We have been looking at several open source data integration packages, and we are now looking for data quality software to complement them. </p>

<p>Can you tell me if there are packages able to suit our needs? </p>

<p>Thanks a lot.
</p>
      ]]>
      </content>
    </entry>


</feed>