I have a project that does indexing for full-text search, and I use Hadoop for it. I'm getting the error "GC overhead limit exceeded":
Task TASKID="tip_201610111152_0066_r_000033" TASK_TYPE="REDUCE" TASK_STATUS="FAILED" FINISH_TIME="1512484551448"
ERROR="java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.io.SequenceFile$CompressedBytes.writeUncompressedBytes(SequenceFile.java:505)
at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:206)
at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:168)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:234)
at org.apache.nutch.crawl.CrawlDbReducer.reduce(CrawlDbReducer.java:62)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:322)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1743)
It fails on the last reduce task.
I have already set the configuration:
export HADOOP_DATANODE_OPTS="-Xmx10g"
But it did not work: I have re-run the indexing, and it always fails with the same error on the last reduce. Any idea what it might be?
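Should I be raising the heap of the task JVMs themselves instead of the datanode? From my reading of the Hadoop 1.x docs that would be mapred.child.java.opts, e.g. something like this in mapred-site.xml (just a guess on my part, the 2g value is arbitrary and I haven't tried it yet):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2g</value>
</property>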
Thanks.