Spring Batch requires a unique set of job parameters for each execution, so you can add the current time as a job parameter:
Map<String, JobParameter> confMap = new HashMap<String, JobParameter>();
confMap.put("time", new JobParameter(System.currentTimeMillis()));
JobParameters jobParameters = new JobParameters(confMap);
jobLauncher.run(springCoreJob, jobParameters);
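An equivalent sketch using JobParametersBuilder, which does the same thing a bit more concisely (jobLauncher and springCoreJob are assumed to be wired by your Spring context, as above):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class UniqueJobRunner {

    // Assumed to be injected by your Spring context
    private JobLauncher jobLauncher;
    private Job springCoreJob;

    public void runOnce() throws Exception {
        // Adding the current time makes the parameter set unique on every run,
        // so Spring Batch creates a fresh JobInstance instead of failing with
        // JobInstanceAlreadyCompleteException.
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(springCoreJob, jobParameters);
    }
}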
Monday, 30 December 2013
org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters
Labels: hadoop, multiple jobs, spring batch
Friday, 20 December 2013
Sort mapreduce output keys in descending order
Add the following class (for example as a static nested class) to your driver class.
NB: this only works if your key is of type Text; otherwise modify the reverse comparator class accordingly.
public static class ReverseComparator extends WritableComparator {

    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

    public ReverseComparator() {
        super(Text.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return (-1) * TEXT_COMPARATOR.compare(b1, s1, l1, b2, s2, l2);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        if (a instanceof Text && b instanceof Text) {
            return (-1) * ((Text) a).compareTo((Text) b);
        }
        return super.compare(a, b);
    }
}
In the new API (org.apache.hadoop.mapreduce), register it on your job:
job.setSortComparatorClass(ReverseComparator.class);
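A minimal driver sketch showing where that registration fits, assuming ReverseComparator is declared in (or imported into) this driver class; the job name and the use of KeyValueTextInputFormat with the default identity mapper/reducer are placeholders, and args[0]/args[1] are the input and output paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DescendingSortDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "descending-sort");   // Job.getInstance(conf, ...) on newer releases
        job.setJarByClass(DescendingSortDriver.class);

        // Tab-separated "key<TAB>value" lines; the default identity mapper and reducer are used
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Keys now reach the reducer in descending Text order
        job.setSortComparatorClass(ReverseComparator.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}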
Set separator for mapreduce output
By default the key and value in the output are separated by a tab character. To have the output separated by a character of your choice, set the following configuration property, where conf is an org.apache.hadoop.conf.Configuration object:
conf.set("mapred.textoutputformat.separator", ",");The map reduce(ie the key and values) output will be comma separated in this case.
Tuesday, 10 December 2013
Region servers going down in cdh4 due to mapreduce job
I faced this problem because I had set the scan caching to 500, i.e. it passes 500 rows at a time to your mapreduce job, which is memory intensive and not recommended.
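A hedged sketch of where the caching value goes when wiring a TableMapper job; the table name, the trivial mapper, and the output types below are placeholders, and a modest value like 100 keeps the per-RPC memory footprint down:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class ScanCachingSketch {

    // Trivial mapper that just emits the row key; stands in for your real mapper.
    static class RowKeyMapper extends TableMapper<Text, LongWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(row.get()), new LongWritable(1));
        }
    }

    public static Job buildJob() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "scan-with-modest-caching");
        job.setJarByClass(ScanCachingSketch.class);

        Scan scan = new Scan();
        scan.setCaching(100);        // rows returned per RPC; 500 was too memory hungry here
        scan.setCacheBlocks(false);  // don't pollute the block cache during full-table scans

        TableMapReduceUtil.initTableMapperJob(
                "my_table", scan, RowKeyMapper.class,
                Text.class, LongWritable.class, job);
        return job;
    }
}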
Data-driven DB input format
Include the id (split) column in the query as well.
In the case of plain DBInputFormat, don't map the id column in the VO.
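Since the note above is terse, here is a minimal sketch of what such a value object (VO) might look like; the table columns (id, name) and the class name are hypothetical:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Value object for rows of a hypothetical table with columns (id, name).
public class MyRecord implements Writable, DBWritable {

    private long id;      // the split column used by DataDrivenDBInputFormat
    private String name;

    @Override
    public void readFields(ResultSet rs) throws SQLException {
        id = rs.getLong("id");
        name = rs.getString("name");
    }

    @Override
    public void write(PreparedStatement ps) throws SQLException {
        ps.setLong(1, id);
        ps.setString(2, name);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        name = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(name);
    }
}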
Thursday, 21 November 2013
Password of cloudera-scm user
It is stored in the file:
/var/lib/cloudera-scm-server-db/data/generated_password.txt
Wednesday, 20 November 2013
Enable the logging feature for a SOAP web service to get the test case response
In the WEB-INF folder, inside service.xml:
<jaxws:endpoint...>
<jaxws:features>
<bean class="org.apache.cxf.feature.LoggingFeature"/>
</jaxws:features>
</jaxws:endpoint>
Thursday, 14 November 2013
Find the count of each word in a file from the Linux command line
tr -s '[:space:]' '\n' < your_filename | sort | uniq --count | sort -rn | head -n 50
This lists the 50 most frequent words in your file, sorted by count.
Tuesday, 5 November 2013
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/Scan
Setting the HADOOP_CLASSPATH variable fixes this issue:
export HADOOP_CLASSPATH=`/usr/bin/hbase classpath`
Thursday, 31 October 2013
To install RStudio on CentOS 6
CentOS does not support RStudio Desktop, so you have to install RStudio Server instead, which runs in a browser.
For EL6
RStudio Server has several dependencies on packages (including R itself) found in the Extra Packages for Enterprise Linux (EPEL) repository. If you don't already have this repository available you should add it to your system using the instructions found on the Fedora EPEL website.
$ su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm'
After enabling EPEL you should then ensure that you have installed the version of R available from EPEL. You can do this using the following command:
$ sudo yum install R
Download and Install
To download and install RStudio Server open a terminal window and execute the commands corresponding to the 32 or 64-bit version as appropriate.
32-bit  Size: 17.5 MB  MD5: 3bc83db8c23c212c391342e731f65823
$ wget http://download2.rstudio.org/rstudio-server-0.97.551-i686.rpm
$ sudo yum install --nogpgcheck rstudio-server-0.97.551-i686.rpm
64-bit Size: 17.6 MB MD5: c89d5574a587472d06f72b304356a776
$ wget http://download2.rstudio.org/rstudio-server-0.97.551-x86_64.rpm
$ sudo yum install --nogpgcheck rstudio-server-0.97.551-x86_64.rpm
Then, in your browser, go to the address:
http://<your_server_name>:8787
The login credentials are the username and password of your current CentOS account.
Reference: http://www.rstudio.com/ide/download/server
Wednesday, 30 October 2013
HBase
list - to list the tables
If needed, add the following property (to hbase-site.xml):
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2182</value>
</property>
In zoo.cfg, change clientPort to 2182.
In hbase-env.sh, uncomment:
export HBASE_MANAGES_ZK=true
Then try restarting the processes.
Run this from the terminal to count the number of distinct rows in an HBase table:
echo "scan 'table'" | hbase shell | grep "columnFamliyName:" | wc -l
Divide the count by the number of columns and you get the number of distinct rows.
Grep and replace in Linux
Finding and replacing text using grep and sed in Linux / CentOS 6+
1. The syntax is
grep -rl 'findWord' path_to_your_file_name | xargs sed -i 's/findWord/replaceWord/g'
2. Search all files in the current directory for the term 'windows' and replace it with 'linux':
grep -rl 'windows' ./ | xargs sed -i 's/windows/linux/g'
s/ - starts the sed substitute expression
/g - the global flag at the end, so every occurrence on a line is replaced
./ - the current directory
3. You can also use a regex, e.g.:
grep -rl '{:\".*\",\"hello\":\"' data.txt | xargs sed -i 's/{:\".*\",\"hello\":\"//g'
Sunday, 20 October 2013
Include jquery in phantomjs
Use page.injectJs('jquery-1.6.1.min.js'); it will work fine.
Here is a useful link on using jQuery with PhantomJS: http://snippets.aktagon.com/snippets/534-how-to-scrape-web-pages-with-phantomjs-and-jquery
Thursday, 17 October 2013
Point the HBase conf folder to another folder using alternatives (a Linux command) in CentOS
alternatives]$ alternatives --display hbase-conf
hbase-conf - status is auto.
link currently points to /etc/hbase/conf.dist
/etc/hbase/conf.dist - priority 30
Current `best' version is /etc/hbase/conf.dist.
alternatives]$ alternatives --install /etc/hbase/conf hbase-conf /etc/hbase/conf.my_cluster/ 10
failed to create /var/lib/alternatives/hbase-conf.new: Permission denied
alternatives]$ sudo alternatives --install /etc/hbase/conf hbase-conf /etc/hbase/conf.my_cluster/ 10
alternatives]$ sudo alternatives --display hbase-conf
hbase-conf - status is auto.
link currently points to /etc/hbase/conf.dist
/etc/hbase/conf.dist - priority 30
/etc/hbase/conf.my_cluster/ - priority 10
Current `best' version is /etc/hbase/conf.dist.
alternatives]$ sudo alternatives --set hbase-conf /etc/hbase/conf.my_cluster/
alternatives]$ sudo alternatives --display hbase-conf
hbase-conf - status is manual.
link currently points to /etc/hbase/conf.my_cluster/
/etc/hbase/conf.dist - priority 30
/etc/hbase/conf.my_cluster/ - priority 10
Current `best' version is /etc/hbase/conf.dist.
OWLIM-Lite installation on CentOS
Follow the instructions on this link
http://owlim.ontotext.com/display/OWLIMv52/OWLIM-Lite+Installation
If "access denied" is shown at times, change the owner of that folder to tomcat:
sudo chown tomcat:tomcat your_folder
If you have more than 64,000 entities to store in OWLIM, then to support that, edit tomcat6.conf:
/etc/tomcat6/tomcat6.conf
CATALINA_OPTS="-DentityExpansionLimit=1000000" (or your desired number)
Restart Tomcat and everything should be fine:
sudo service tomcat6 restart
Thursday, 10 October 2013
RDFParseException in Eclipse when using large RDF files
org.openrdf.rio.RDFParseException: The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.
This error occurs when running SPARQL queries because the parser does not allow you to process more than 64,000 entities.
Open Window -> Preferences -> Java -> Installed JREs,
edit the installed JRE,
and add the following to the default VM arguments (the arguments must be separated by a space):
-Xmx1024m -DentityExpansionLimit=100000
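If editing the JRE settings is inconvenient, setting the same JAXP system property programmatically before the first parser is created may also work; this is only a sketch of that idea, not something the original post describes, and the value is arbitrary:

public class ExpansionLimitBootstrap {
    public static void main(String[] args) {
        // Assumption: the parser reads this JAXP system property when it is
        // first configured, so it must be set before any XML/RDF parsing starts.
        System.setProperty("entityExpansionLimit", "1000000");
        // ... then run the SPARQL / openrdf code as usual ...
    }
}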
Wednesday, 9 October 2013
Jena Fuseki server configurations
Download and unzip the Jena Fuseki server.
To start the server with the dataset held in memory:
fuseki-server --update --mem /ds
To start the server with the dataset stored in a user-specified directory:
fuseki-server --update --loc=your_path_to_directory /ds
If you don't specify --mem, then by default the uploaded dataset is stored in the DB directory present in the unzipped folder.
You can specify a custom assembler using
fuseki-server --update /inf --desc=assembler.ttl
assembler.ttl
@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix sdb: <http://jena.hpl.hp.com/2007/sdb#> .
[] rdf:type fuseki:Server ;
fuseki:services (
<#tdb>
) .
<#tdb> rdf:type fuseki:Service ;
fuseki:name "tdb" ; # http://host/inf
fuseki:serviceQuery "sparql" ; # SPARQL query service
fuseki:serviceUpdate "update" ;
fuseki:dataset <#dataset2> ; #select which set to
. #use
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
<#dataset2> rdf:type ja:RDFDataset ;
ja:defaultGraph <#model2>;
.
<#model2> a ja:OntModel;
ja:baseModel <#tdbGraph>;
ja:ontModelSpec ja:OWL_MEM ;
ja:content [ja:externalContent <file:////home/wn20full/wnfull.rdfs>]
.
<#tdbGraph> rdf:type tdb:GraphTDB;
tdb:location "DB";
.
<#interpretationSchema> a ja:MemoryModel ;
ja:content [
ja:externalContent <file:////home/wn20full/wnfull.rdfs> ;
] .
Sunday, 6 October 2013
How to synchronize time with an NTP server in Windows 7
In cmd type:
net time \\your_ntp_server_ip /set /yes
Wednesday, 2 October 2013
How to create a user in CentOS and give sudo access
To create a user in CentOS follow these steps
1. You must be logged in as root to add a new user
2. Issue the useradd command to create a locked account
useradd <username>
3. Issue the passwd command to set the password of the newly created user
passwd <username>
This will prompt you to enter the password for the newly created user.
4. To give the user sudo access you need to add the user to wheel group
To do this, issue the command: visudo
and uncomment the wheel line, i.e. change
#%wheel ALL=(ALL) ALL
to
%wheel ALL=(ALL) ALL
5. Adding the newly created user to wheel group
usermod -G wheel <username>
6. Finished - your user now has sudo access. Enjoy!
How to manually set up HBase for your Cloudera CDH4 cluster in CentOS | RHEL | Linux
Follow these steps
1. Install HBase on all the machines
sudo yum install hbase
2. Install hbase-master and zookeeper-server on your master machine
sudo yum install zookeeper-server
sudo yum install hbase-master
zookeeper-server automatically installs the base zookeeper package as well.
HBase needs ZooKeeper in order to start.
3. Install hbase-regionserver and zookeeper on all your slave machines
sudo yum install zookeeper
sudo yum install hbase-regionserver
4. Modifying the HBase Configuration
To enable pseudo-distributed mode, you must first make some configuration changes. Open /etc/hbase/conf/hbase-site.xml in your editor of choice, and insert the following XML properties between the <configuration> and </configuration> tags. Be sure to replace myhost with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your hadoop/conf/core-site.xml file); you may also need to change the port number from the default (8020).
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://myhost:8020/hbase</value>
</property>
5. Configuring for Distributed Operation
After you have decided which machines will run each process, you can edit the configuration so that the nodes may locate each other. In order to do so, you should make sure that the configuration files are synchronized across the cluster. Cloudera strongly recommends the use of a configuration management system to synchronize the configuration files, though you can use a simpler solution such as rsync to get started quickly.
The only configuration change necessary to move from pseudo-distributed operation to fully-distributed operation is the addition of the ZooKeeper Quorum address in hbase-site.xml. Insert the following XML property to configure the nodes with the address of the node where the ZooKeeper quorum peer is running:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>mymasternode</value>
</property>
6. Creating the /hbase Directory in HDFS
Before starting the HBase Master, you need to create the /hbase directory in HDFS. The HBase master runs as hbase:hbase so it does not have the required permissions to create a top level directory.
To create the /hbase directory in HDFS:
$ sudo -u hdfs hadoop fs -mkdir /hbase
$ sudo -u hdfs hadoop fs -chown hbase /hbase
7. Starting the ZooKeeper Server
- To start ZooKeeper after a fresh install:
$ sudo service zookeeper-server init
$ sudo service zookeeper-server start
8. Starting zookeeper
$ sudo service zookeeper start
9. Starting the HBase Master
- On Red Hat and SLES systems (using .rpm packages) you can now start the HBase Master by using the included service script:
$ sudo service hbase-master start
To start the Region Server:
$ sudo service hbase-regionserver start
10. Accessing HBase by using the HBase Shell
After you have started HBase, you can access the database by using the HBase Shell:
$ hbase shell
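As a quick sanity check from Java, listing the tables proves that the client can reach the master and the ZooKeeper quorum. This is only a sketch against the CDH4-era HBase client API and assumes hbase-site.xml is on the classpath (e.g. via /etc/hbase/conf):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class HBaseSanityCheck {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath (e.g. /etc/hbase/conf)
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Listing tables confirms the master and ZooKeeper quorum are reachable
            for (HTableDescriptor table : admin.listTables()) {
                System.out.println(table.getNameAsString());
            }
        } finally {
            admin.close();
        }
    }
}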
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_2.html