Monday 30 December 2013

org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters


Spring Batch requires a unique set of job parameters for each execution, so you can add the current time as a job parameter:

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.JobParameter;
import org.springframework.batch.core.JobParameters;

Map<String, JobParameter> confMap = new HashMap<String, JobParameter>();
confMap.put("time", new JobParameter(System.currentTimeMillis()));
JobParameters jobParameters = new JobParameters(confMap);
jobLauncher.run(springCoreJob, jobParameters);
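The reason this works is that Spring Batch identifies a job instance by its parameter set; two launches with identical parameters are the same instance. A minimal plain-Java sketch of that identity rule (no Spring classes involved, parameter values are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class UniqueParamsDemo {
    public static void main(String[] args) {
        // Two runs with identical business parameters look like the same job instance.
        Map<String, Object> run1 = new HashMap<>();
        run1.put("inputFile", "data.csv");
        Map<String, Object> run2 = new HashMap<>(run1);
        System.out.println(run1.equals(run2)); // true -> JobInstanceAlreadyCompleteException

        // A timestamp parameter makes each run's parameter set distinct.
        run1.put("time", 1388361600000L); // first launch
        run2.put("time", 1388361600001L); // second launch, one millisecond later
        System.out.println(run1.equals(run2)); // false -> treated as a new job instance
    }
}
```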

Friday 20 December 2013

Sort mapreduce output keys in descending order

Add the following nested class to your driver class:

public static class ReverseComparator extends WritableComparator {

    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

    public ReverseComparator() {
        super(Text.class);
    }

    // Byte-level comparison used during the sort phase; negating the
    // result reverses the natural (ascending) Text ordering.
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return -TEXT_COMPARATOR.compare(b1, s1, l1, b2, s2, l2);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        if (a instanceof Text && b instanceof Text) {
            return -((Text) a).compareTo((Text) b);
        }
        return super.compare(a, b);
    }
}
In the new API (org.apache.hadoop.mapreduce), register it on your Job instance:
job.setSortComparator(ReverseComparator.class);

NB: this only works if your key is of type Text; otherwise modify the ReverseComparator class accordingly.
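The negation trick itself is independent of Hadoop: flipping the sign of any comparator's result reverses the sort order. A plain-Java sketch of the same idea:

```java
import java.util.Arrays;

public class ReverseSortDemo {
    public static void main(String[] args) {
        String[] keys = {"banana", "apple", "cherry"};
        // Negating the natural comparison flips ascending order to descending,
        // which is exactly what ReverseComparator does for Text keys.
        Arrays.sort(keys, (a, b) -> -a.compareTo(b));
        System.out.println(Arrays.toString(keys)); // [cherry, banana, apple]
    }
}
```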

Set separator for mapreduce output

By default, TextOutputFormat separates the key and value with a tab character. To use a different separator, set this configuration property:
conf.set("mapred.textoutputformat.separator", ",");
The job output (key and value) will then be comma separated.

where conf is an org.apache.hadoop.conf.Configuration object.
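Note that the property above is the old (mapred) name; in Hadoop 2.x the new-API property name differs. A sketch showing both, in case one is ignored on your cluster version:

```java
// Old API property name (Hadoop 1.x / org.apache.hadoop.mapred):
conf.set("mapred.textoutputformat.separator", ",");

// New API property name (Hadoop 2.x+ / org.apache.hadoop.mapreduce):
conf.set("mapreduce.output.textoutputformat.separator", ",");
```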


Tuesday 10 December 2013

Region servers going down in CDH4 due to a MapReduce job

I faced this problem because I had set the scan caching to 500, i.e. each RPC passes 500 rows to the map task, which is memory intensive and not recommended.
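A hedged sketch of a safer scan setup for a TableMapper job (the table name, mapper class, and caching value of 100 are assumptions to tune for your own row sizes, not values from the original post):

```java
// A lower caching value trades more RPC round trips for less memory
// pressure on the region servers.
Scan scan = new Scan();
scan.setCaching(100);       // rows fetched per RPC to the map task (assumed value)
scan.setCacheBlocks(false); // full scans shouldn't pollute the block cache

TableMapReduceUtil.initTableMapperJob(
        "mytable",       // placeholder table name
        scan,
        MyMapper.class,  // placeholder TableMapper subclass
        Text.class,      // map output key class
        Text.class,      // map output value class
        job);
```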

Data-driven DB input format

Include the id column in the query as well.

In the case of DBInputFormat, don't map the id column in the value object (VO).
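One reading of the first point is that the column DataDrivenDBInputFormat splits on must also appear in the selected field list. A hedged sketch under that assumption (driver, URL, table, class, and column names are all placeholders):

```java
// Connection settings; values here are placeholders.
DBConfiguration.configureDB(job.getConfiguration(),
        "com.mysql.jdbc.Driver", "jdbc:mysql://localhost/mydb",
        "user", "pass");

// The split-by column ("id") is also listed among the selected fields,
// so each split's WHERE clause can bound it.
DataDrivenDBInputFormat.setInput(job,
        MyRecord.class,          // placeholder DBWritable value class
        "mytable",               // placeholder table name
        null,                    // no extra WHERE conditions
        "id",                    // split-by column
        "id", "name", "value");  // selected fields, including "id"
```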