I'm consolidating my comments into an answer.
First, it is hard to make "awkward to implement" precise; answering that part rigorously would call for an empirical study of, say, program length or development time across a number of approximation algorithms.
From a more theoretical viewpoint, every algorithm that runs in time $f(n)$ on a RAM can be implemented in a purely functional language with runtime $O(f(n) \log n)$. This is because functional languages are Turing-complete, and mutable random-access memory can be simulated with balanced search trees at a logarithmic cost per access. Chris Okasaki's *Purely Functional Data Structures* shows how to implement such trees in a functional language.
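To make the simulation concrete, here is a minimal sketch (my own, not taken from Okasaki) of how mutable memory can be modeled by a persistent search tree: every "write" copies only the search path and shares the rest of the tree, so each memory access costs time proportional to the tree's depth. A balanced tree (e.g. the red-black trees in Okasaki's book) keeps that depth logarithmic; the unbalanced version below just keeps the code short.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    key: int
    val: int
    left: "Optional[Node]"
    right: "Optional[Node]"

def read(t: Optional[Node], k: int) -> int:
    # O(depth) lookup; uninitialized cells read as 0.
    if t is None:
        return 0
    if k < t.key:
        return read(t.left, k)
    if k > t.key:
        return read(t.right, k)
    return t.val

def write(t: Optional[Node], k: int, v: int) -> Node:
    # O(depth) functional update: rebuilds the search path,
    # shares all untouched subtrees with the old version.
    if t is None:
        return Node(k, v, None, None)
    if k < t.key:
        return t._replace(left=write(t.left, k, v))
    if k > t.key:
        return t._replace(right=write(t.right, k, v))
    return t._replace(val=v)

# Successive "memory states"; earlier versions remain valid (persistence).
m0 = None
m1 = write(m0, 5, 42)
m2 = write(m1, 5, 7)
```

Note that persistence comes for free: `m1` still reads 42 at address 5 after `m2` is built, which a destructive update could not provide.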
Nowadays many approximation algorithms are designed for huge datasets and are expressed in the MapReduce or Pregel model. This type of algorithm is very easy to translate to functional languages, as these distributed-computing paradigms generally avoid stateful computation.
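As an illustration of why the translation is easy, here is a MapReduce-style word count written with pure functions only; the `mapper`/`reducer` names and the tiny in-process driver are mine and stand in for whatever a real framework provides. Neither phase touches shared mutable state, which is exactly the property that makes such algorithms functional-language-friendly.

```python
from itertools import groupby

def mapper(line: str) -> list[tuple[str, int]]:
    # Map phase: emit (key, 1) pairs; no shared state.
    return [(word, 1) for word in line.split()]

def reducer(key: str, counts: list[int]) -> tuple[str, int]:
    # Reduce phase: combine all values for one key.
    return (key, sum(counts))

def map_reduce(lines: list[str]) -> dict[str, int]:
    # The sort plays the role of the shuffle step: it groups pairs by key.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(k, [v for _, v in group])
                for k, group in groupby(pairs, key=lambda kv: kv[0]))

counts = map_reduce(["a b a", "b c"])
```

A real framework would run `mapper` and `reducer` on many machines, but the program text is the same purely functional pipeline.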
As a comment below notes, this question on Stack Overflow is closely related. In particular, Edward Kmett gives references showing that under strict evaluation (as opposed to the lazy evaluation you see in Haskell) there are problems for which the logarithmic slowdown is unavoidable. Whether the same holds in lazy settings appears to be open.