Tuesday 18 July 2017

Sqoop views from Netezza to HDFS

I came across a use case to transfer Netezza tables/views to a Hadoop system. The current flow we are using is:
1. Netezza -> SAN
2. SAN -> S3
3. S3 -> HDFS

The reverse flow is used to transfer data back to Netezza. After analyzing the use case, the best option I found was Sqoop.

We are using YARN queues, hence the -Dmapreduce.job.queuename option in the commands below; you can ignore it if no queue is set up. If the Netezza table has no primary key, you will be forced to use the --split-by option or the -m option. Only use --verbose if needed.
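
Before running the full imports it helps to confirm that Sqoop can actually reach Netezza. A minimal connectivity check, assuming the Netezza JDBC driver jar is already in Sqoop's lib directory (it simply lists the tables visible to the given user):

sqoop list-tables --driver org.netezza.Driver --connect jdbc:netezza://<host>:<port>/<netezzadbname> --username <netezzauser> --password <netezzapwd>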

1. Transfer a view
# Sqoop does not allow you to write into an existing directory, so remove the directory before transferring
hdfs dfs -rm -R /apps/hive/warehouse/<hivedbname>.db/<hivetablename>

sqoop import -Dmapreduce.job.queuename=q1 --hive-import --hive-database <hivedbname> --hive-table <hivetablename> --driver org.netezza.Driver --direct --connect jdbc:netezza://<host>:<port>/<netezzadbname> --username <netezzauser> --password <netezzapwd> --table <netezza tablename> --target-dir hdfs:///apps/hive/warehouse/<hivedbname>.db/<hivetablename> --split-by <anycolumn>

If we don't use the --driver org.netezza.Driver parameter, the following error is encountered: the Netezza direct connector splits the work on the DATASLICEID column, which exists for tables but not for views, whereas specifying --driver makes Sqoop fall back to the generic JDBC path.

2017-07-18 09:34:53,079 ERROR [Thread-16] org.apache.sqoop.mapreduce.db.netezza.NetezzaJDBCStatementRunner: Unable to execute external table export
org.netezza.error.NzSQLException: ERROR:  Column reference "DATASLICEID" not supported for views

at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:276)
at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:73)
at org.netezza.sql.NzConnection.execute(NzConnection.java:2673)
at org.netezza.sql.NzStatement._execute(NzStatement.java:849)
at org.netezza.sql.NzPreparedStatament.execute(NzPreparedStatament.java:152)
at org.apache.sqoop.mapreduce.db.netezza.NetezzaJDBCStatementRunner.run(NetezzaJDBCStatementRunner.java:75)



Instead of the --split-by option we can also use -m 1, which transfers the data with a single mapper and can be a bit slow.
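
Once the import finishes it is worth comparing row counts between the source view and the Hive table. A minimal sketch, assuming beeline and nzsql are available on the edge node (the HiveServer2 host and port below are illustrative):

# Count rows in the Hive table
beeline -u "jdbc:hive2://<hiveserver2host>:10000/<hivedbname>" -e "SELECT COUNT(*) FROM <hivetablename>;"
# Count rows in the source Netezza view
nzsql -host <host> -d <netezzadbname> -u <netezzauser> -pw <netezzapwd> -c "SELECT COUNT(*) FROM <netezza tablename>;"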

2. Transfer a table
# Sqoop does not allow you to write into an existing directory, so remove the directory before transferring
hdfs dfs -rm -R /apps/hive/warehouse/<hivedbname>.db/<hivetablename>

sqoop import -Dmapreduce.job.queuename=q1 --verbose --hive-import --hive-database <hivedbname> --direct --connect jdbc:netezza://<host>:<port>/<netezzadbname> --username <netezzauser> --password <netezzapwd> --table <netezza tablename> --target-dir hdfs:///apps/hive/warehouse/<hivedbname>.db/<hivetablename> -m 1
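
Both commands above pass the Netezza password on the command line, where it shows up in the process list and shell history. Sqoop also supports --password-file; a minimal sketch, where the HDFS path below is only an example:

# Write the password to HDFS without a trailing newline and lock down its permissions
echo -n '<netezzapwd>' | hdfs dfs -put - /user/<user>/.netezza.pwd
hdfs dfs -chmod 400 /user/<user>/.netezza.pwd
# Then replace --password <netezzapwd> in the import commands with:
# --password-file /user/<user>/.netezza.pwd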


Running
analyze table <hivedbname>.<hivetablename> compute statistics
afterwards is ideal when Hive is running on the Tez execution engine, since the statistics help the optimizer.
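
One way to run it non-interactively from the shell (hive -e shown here; beeline -e works the same way):

# Gather basic table statistics for the Hive/Tez optimizer
hive -e "ANALYZE TABLE <hivedbname>.<hivetablename> COMPUTE STATISTICS;"
# Optionally gather column-level statistics as well
hive -e "ANALYZE TABLE <hivedbname>.<hivetablename> COMPUTE STATISTICS FOR COLUMNS;"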