Recently, I have had a number of discussions regarding data loads and queries with any MySQL architected database using a JDBC connection. It seems that there are specific issues with memory during queries and batch inserts that use JDBC drivers which can be addressed. As you might expect, there are various articles already discussing related topics such as Slow JDBC batch insert (insert only applies to IEE in the case of Infobright) in which blogs have recommended using a specific driver like the Drizzle JDBC driver, which provides better performance if you have no alternative to using a JDBC driver. If you do have flexibility, using the Infobright loader (ICE or IEE), MySQL loader or DLP (IEE) or a third party ETL tool will result in much faster load and query speeds versus JDBC. So in the end, using a JDBC driver is highly discouraged.
Detail:
Memory issues:
The main options for avoiding memory contention between the database and JDBC connections are to either stream the results to the application rather than buffer, or isolate the java service onto a separate VM. If you end up having a large result set from a query, and this must fit into memory, you will obviously run into problems. However, simply moving the Java service to another server has worked consistently so make sure to do that.
For your convenience I have also included some additional information on managing large result sets using Connector/J.
MySQL API documentation for Connector/J
http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html
MySQL JDBC Streaming results example
http://benjchristensen.com/2008/05/27/mysql-jdbc-memory-usage-on-large-resultset/
DataDirect JDBC Driver (potential alternative to MySQL)
http://www.datadirect.com/products/jdbc/mysql/
MySQL Presentation on JDBC Performance
http://www.mysqlconf.com/mysql2009/public/schedule/detail/6886
Known Bugs with Connector/J
I hope this helps you in determining potential issues in your environment, as it's good to check all possible sources to a problem versus just the one that appears obvious. Good luck and please let us know if you have any other comments that will help everyone.
Regards,
Karl
Comments (0)
Recently, I published a blog about a free, machine-generated dataset. It's a great dataset to experiment with Infobright. To get you started, I wanted to provide additional information on how to load the data into Infobright. I've also provided some tweaks (using DomainExpert and lookup columns) which will help speed up your queries.
CREATE TABLE fcc.optimized_curr_httppostmt (unit_id int comment 'lookup', dtime timestamp, target varchar(500) comment 'lookup', fetch_time float, bytes_total float, bytes_sec float, warmup_time float, warmup_bytes float, sequence int, threads int, successes int, failures int );
create table fcc.optimized_curr_avail (unit_id int comment 'lookup', dtime timestamp, successes int, failures int);
create table fcc.optimized_curr_dlping (unit_id int comment 'lookup', dtime timestamp, target varchar(500) comment 'lookup', rtt_avg float, rtt_min float, rtt_max float, rtt_std float, successes int, failures int);
create table fcc.optimized_curr_dns (unit_id int comment 'lookup', dtime timestamp, nameserver varchar(200) comment 'lookup', lookup_host varchar(500) comment 'lookup', response_ip int, rtt int, successes int, failures int);
create table fcc.optimized_curr_httpgetmt (unit_id int comment 'lookup', dtime timestamp, target varchar(500) comment 'lookup', fetch_time float, bytes_total float, bytes_sec float, warmup_time float, warmup_bytes float, sequence int, threads int, successes int, failures int);
create table fcc.optimized_curr_netusage (unit_id int comment 'lookup', dtime timestamp, wan_r_bytes float, wan_tx_bytes float, sk_rx_bytes float, sk_tx_bytes float);
create table fcc.optimized_curr_ping (unit_id int comment 'lookup', dtime timestamp, target varchar(500) comment 'lookup', rtt_avg float, rtt_min float, rtt_max float, rtt_std float, successes int, failures int);
create table fcc.optimized_curr_udpjitter (unit_id int comment 'lookup', dtime timestamp, target varchar(500) comment 'lookup', packet_size float, stream_rate float, duration float, packet_up_sent int, packets_down_sent int, packets_up_recv int, packets_down_recv int, jitter_up float, jitter_down float, latency float, successes int, failures int);
create table fcc.optimized_curr_udplatency (unit_id int comment 'lookup', dtime timestamp, target varchar(500) comment 'lookup', rtt_avg float, rtt_min float, rtt_max float, rtt_std float, successes int, failures int);
create table fcc.optimized_curr_ulping (unit_id int comment 'lookup', dtime timestamp, target varchar(500) comment 'lookup', rtt_avg float, rtt_min float, rtt_max float, rtt_std float, successes int, failures int);
create table fcc.optimized_curr_videostream (unit_id int comment 'lookup', dtime timestamp, target varchar(500) comment 'lookup', downthrpt float, downjitter float, latency float, jitter float, buffer_underruns float, buffer_delay float, buffer_filltime float, duration float, bitrate float, buffer_size float, successes int, failures int);
create table fcc.optimized_curr_webget (unit_id int comment 'lookup', dtime timestamp, target varchar(500) comment 'lookup', fetch_time float, bytes_total float, bytes_sec float, objects int, successes int, failures int);
create table fcc.optimized_unit_metadata (unit_id int comment 'lookup', isp varchar(200) comment 'lookup', technology varchar(50) comment 'lookup', validated_by varchar(100) comment 'lookup', isp_down float, isp_up float, state varchar(10) comment 'lookup', region varchar(50) comment 'lookup');
create table fcc.optimized_unit_tz (unit_id int comment 'lookup', tz int, dst int);
create table fcc.optimized_unit_masked_ip (unit_id int comment 'lookup', masked_ip varchar(20) comment 'lookup');load data infile '/data/fcc/201102/curr_httppostmt.csv' into table fcc.optimized_curr_httppostmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_avail.csv' into table fcc.optimized_curr_avail fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_dlping.csv' into table fcc.optimized_curr_dlping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_dns.csv' into table fcc.optimized_curr_dns fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_httpgetmt.csv' into table fcc.optimized_curr_httpgetmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_netusage.csv' into table fcc.optimized_curr_netusage fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_ping.csv' into table fcc.optimized_curr_ping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_udpjitter.csv' into table fcc.optimized_curr_udpjitter fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_udplatency.csv' into table fcc.optimized_curr_udplatency fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_ulping.csv' into table fcc.optimized_curr_ulping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_videostream.csv' into table fcc.optimized_curr_videostream fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201102/curr_webget.csv' into table fcc.optimized_curr_webget fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_httppostmt.csv' into table fcc.optimized_curr_httppostmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_avail.csv' into table fcc.optimized_curr_avail fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_dlping.csv' into table fcc.optimized_curr_dlping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_dns.csv' into table fcc.optimized_curr_dns fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_httpgetmt.csv' into table fcc.optimized_curr_httpgetmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_netusage.csv' into table fcc.optimized_curr_netusage fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_ping.csv' into table fcc.optimized_curr_ping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_udpjitter.csv' into table fcc.optimized_curr_udpjitter fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_udplatency.csv' into table fcc.optimized_curr_udplatency fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_ulping.csv' into table fcc.optimized_curr_ulping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_videostream.csv' into table fcc.optimized_curr_videostream fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201103/curr_webget.csv' into table fcc.optimized_curr_webget fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_httppostmt.csv' into table fcc.optimized_curr_httppostmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_avail.csv' into table fcc.optimized_curr_avail fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_dlping.csv' into table fcc.optimized_curr_dlping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_dns.csv' into table fcc.optimized_curr_dns fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_httpgetmt.csv' into table fcc.optimized_curr_httpgetmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_netusage.csv' into table fcc.optimized_curr_netusage fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_ping.csv' into table fcc.optimized_curr_ping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_udpjitter.csv' into table fcc.optimized_curr_udpjitter fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_udplatency.csv' into table fcc.optimized_curr_udplatency fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_ulping.csv' into table fcc.optimized_curr_ulping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_videostream.csv' into table fcc.optimized_curr_videostream fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201104/curr_webget.csv' into table fcc.optimized_curr_webget fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_httppostmt.csv' into table fcc.optimized_curr_httppostmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_avail.csv' into table fcc.optimized_curr_avail fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_dlping.csv' into table fcc.optimized_curr_dlping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_dns.csv' into table fcc.optimized_curr_dns fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_httpgetmt.csv' into table fcc.optimized_curr_httpgetmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_netusage.csv' into table fcc.optimized_curr_netusage fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_ping.csv' into table fcc.optimized_curr_ping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_udpjitter.csv' into table fcc.optimized_curr_udpjitter fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_udplatency.csv' into table fcc.optimized_curr_udplatency fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_ulping.csv' into table fcc.optimized_curr_ulping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_videostream.csv' into table fcc.optimized_curr_videostream fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201105/curr_webget.csv' into table fcc.optimized_curr_webget fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_httppostmt.csv' into table fcc.optimized_curr_httppostmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_avail.csv' into table fcc.optimized_curr_avail fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_dlping.csv' into table fcc.optimized_curr_dlping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_dns.csv' into table fcc.optimized_curr_dns fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_httpgetmt.csv' into table fcc.optimized_curr_httpgetmt fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_netusage.csv' into table fcc.optimized_curr_netusage fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_ping.csv' into table fcc.optimized_curr_ping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_udpjitter.csv' into table fcc.optimized_curr_udpjitter fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_udplatency.csv' into table fcc.optimized_curr_udplatency fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_ulping.csv' into table fcc.optimized_curr_ulping fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_videostream.csv' into table fcc.optimized_curr_videostream fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/201106/curr_webget.csv' into table fcc.optimized_curr_webget fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/metadata/unit_metadata.csv' into table fcc.optimized_unit_metadata fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/metadata/unit_tz.csv' into table fcc.optimized_unit_tz fields terminated by ',' enclosed by '"';
load data infile '/data/fcc/metadata/unit_masked_ip.csv' into table fcc.optimized_unit_masked_ip fields terminated by ',' enclosed by '"';
I'm still generating benchmarks on this data. I plan to compare Infobright with lookups/domainexperts as compared to Infobright without these optimizations. I hope to release that information sometime by next week.