I have a project that does indexing for full-text search, and I use Hadoop for it. I'm getting the error "GC overhead limit exceeded":
Task TASKID="tip_201610111152_0066_r_000033" TASK_TYPE="REDUCE" TASK_STATUS="FAILED" FINISH_TIME="1512484551448"
ERROR="java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.io.SequenceFile$CompressedBytes.writeUncompressedBytes(SequenceFile.java:505)
at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:206)
at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:168)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:234)
at org.apache.nutch.crawl.CrawlDbReducer.reduce(CrawlDbReducer.java:62)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:322)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1743)
It fails on the last reduce task.
I have already set the configuration:
export HADOOP_DATANODE_OPTS="-Xmx10g"
But it did not work: I have re-run the indexing, and it always fails with the same error on the last reduce. Any idea what it might be?
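Should I be raising the heap of the task JVMs themselves instead of the datanode? From my reading of the Hadoop 1.x docs that would be mapred.child.java.opts, e.g. something like this in mapred-site.xml (just a guess on my part, the 2g value is arbitrary and I haven't tried it yet):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2g</value>
</property>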
Thanks.