The Fisher information in a statistic computed on sample data describes a parameter of the probability distribution from which the data have been sampled. An unbiased statistic's value (ignoring measurement error) equals the value of the not-directly-observable parameter plus a random, zero-mean perturbation. This random discrepancy between estimate and parameter, which arises from the sampling process itself (i.e., from the fact that not all population members are in the sample), is called sampling error.
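In symbols (a minimal sketch; the notation $\hat{\theta}$ for the statistic, $\theta$ for the parameter, and $\varepsilon$ for the sampling error is introduced here only for illustration):

$$
\hat{\theta} \;=\; \theta + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0 \ \text{ when } \hat{\theta} \text{ is unbiased.}
$$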
A statistic's Fisher information is inversely related to its sampling error: greater error means less information, and vice versa. In short, Fisher information is precision. This is why it is technically meaningless to report a point estimate without a measure of its precision, such as its standard error: without one, we have no idea how informative our estimate is.
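One standard way to make this inverse relationship precise is the Cramér-Rao bound: under the usual regularity conditions, any unbiased estimator $\hat{\theta}$ of $\theta$ satisfies

$$
\operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{I(\theta)}, \qquad
I(\theta) = \mathbb{E}\!\left[ \left( \frac{\partial}{\partial \theta} \log f(X;\theta) \right)^{2} \right],
$$

so the larger the Fisher information $I(\theta)$, the smaller the attainable variance (and hence standard error) of the estimate.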
In contrast, the Shannon information in a measure describes a message, not a parameter. Unlike a parameter, a message is completely observable: if we have the message at all, it is because it was composed and transmitted to us, so it clearly was observed by someone at some point. Also, unlike a statistical sample, a message need not be assumed to have any latent structure.
Messages are usually represented as binary strings (or can be expressed as such). Now, suppose we are interested in message $X$ but all we know about $X$ is that its length is $n$. Then the $2^n$ possible binary strings of length $n$ constitute the set of all possible $X$. A measure that provides some non-zero amount of Shannon information about $X$ is anything$-$an observation, a rule, a function$-$that allows us to distinguish $X$ from at least one other member of the set. If the measure lets us precisely determine which member of the set $X$ is, it contains the total Shannon information about $X$, the expected value of which is its total Shannon entropy.
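As a toy illustration (hypothetical code, assuming every length-$n$ string is equally likely a priori): knowing only the length leaves $n$ bits of entropy, and any observation that rules out half of the remaining candidates carries exactly one bit of Shannon information about $X$.

```python
import math

def entropy_bits(num_candidates: int) -> float:
    """Shannon entropy, in bits, of a uniform distribution over the candidates."""
    return math.log2(num_candidates)

n = 8                          # assumed message length
candidates = 2 ** n            # all binary strings of length n, equally likely
print(entropy_bits(candidates))       # 8.0 -- total entropy of X

# An observation that reveals X's first bit eliminates half of the candidates,
# so it provides exactly 1 bit of Shannon information about X.
print(entropy_bits(candidates // 2))  # 7.0 -- entropy remaining after that observation
```

A measure that narrowed the set all the way down to a single string would account for the full $n$ bits.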
Shannon information, like Fisher information, is probabilistic. Suppose $X$ was sent to us over a noisy connection, so that some 0's are randomly flipped to 1's and vice versa. Call the noisy received message $X'$. Shannon entropy can then be used to describe, probabilistically, the expected characteristics of $X|X'$. Or, if $X'$ is one message plucked from a large number of similar messages (e.g., calls going through a cell tower), the entropy of $X'$ describes the average such message.
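For instance, under the (assumed) model of a binary symmetric channel, in which each bit of a uniformly distributed, length-$n$ message $X$ is flipped independently with probability $p$, the residual uncertainty about $X$ after observing $X'$ is

$$
H(X \mid X') = n\,h(p), \qquad h(p) = -p\log_2 p - (1-p)\log_2(1-p),
$$

so the noisier the channel (as $p$ approaches $1/2$), the more entropy remains in $X \mid X'$ and the less Shannon information $X'$ carries about $X$.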
Historically, these two fields, and their respective information types, were developed and studied separately. As others have noted, there is no precise translation between them, although mathematical inequalities relating the two have been derived. They describe different (even incompatible) attributes of data.
(As an aside, this suggests to me not that the two are unrelated, but that they are in fact complementary and even mutually necessary. Consider, for example, that estimation theory conceives of data randomness in two distinct ways: as a long-run dispersion parameter, and as the sample-specific effect of that parameter. Fisher information can describe the former but not the latter. Were the latter describable by an alternative information type, one independent of the first, their respective measures would fully characterize any unique probability sample.)