From the sklearn documentation:
$$
\kappa = (p_o - p_e) / (1 - p_e)
$$
where $p_o$ is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and $p_e$ is the expected agreement when both annotators assign labels randomly. $p_e$ is estimated using a per-annotator empirical prior over the class labels.
Let's unpack the documentation.
> $p_o$ is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio)

The observed agreement ratio is the classification accuracy. This makes sense: the classification accuracy is the number of predicted labels that agree with the true labels, divided by the total number of prediction attempts.
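As a quick sanity check, a fraction-of-matching-labels calculation gives exactly what `sklearn.metrics.accuracy_score` returns. The labels below are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels, purely for illustration.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 0])

# p_o: the fraction of samples on which the two "annotators" agree.
p_o = np.mean(y_true == y_pred)
print(p_o)                             # 0.75
print(accuracy_score(y_true, y_pred))  # 0.75, the same number
```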
> $p_e$ is the expected agreement when both annotators assign labels randomly
This means that $p_e$ is the expected classification accuracy when predicted labels are randomly assigned and true labels are randomly assigned.
> $p_e$ is estimated using a per-annotator empirical prior over the class labels.

This means that the random labels respect the observed class frequencies rather than being uniform over the categories: the random predictions are drawn according to how often each label appears among the predictions, and the random true labels according to how often each label appears among the true labels. The reference given for how exactly this is calculated is Artstein and Poesio (2008), with the derivation completed on page 8, which appears to be the same as the calculation given on Wikipedia.
Let $N$ be the total number of classification attempts; let there be $K$ categories; let $n_{k1}$ be the number of times label $k$ appears in the predictions; and let $n_{k2}$ be the number of times label $k$ appears among the true labels. Then: $$p_e = \frac{1}{N^2}\sum_{k=1}^{K} n_{k1}n_{k2}$$
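Using the same made-up labels as above, a minimal sketch of this calculation might look like the following; the variable names mirror the notation in the formula.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])   # true labels ("annotator 2")
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 0])   # predictions ("annotator 1")

N = len(y_true)                      # total number of classification attempts
labels = np.union1d(y_true, y_pred)  # the K categories

n_k1 = np.array([np.sum(y_pred == k) for k in labels])  # label counts in the predictions
n_k2 = np.array([np.sum(y_true == k) for k in labels])  # label counts in the true labels

p_e = np.sum(n_k1 * n_k2) / N**2
print(p_e)   # 21/64 = 0.328125 for these labels
```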
With these definitions for $p_o$ and $p_e$, we arrive at the sklearn calculation: $$
\kappa = (p_o - p_e) / (1 - p_e)
$$
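Putting the two pieces together reproduces `sklearn.metrics.cohen_kappa_score` on the same made-up labels (a sketch of the arithmetic, not the library's actual implementation):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 0])

N = len(y_true)
labels = np.union1d(y_true, y_pred)

p_o = np.mean(y_true == y_pred)                          # observed agreement
n_k1 = np.array([np.sum(y_pred == k) for k in labels])
n_k2 = np.array([np.sum(y_true == k) for k in labels])
p_e = np.sum(n_k1 * n_k2) / N**2                         # chance agreement

kappa = (p_o - p_e) / (1 - p_e)
print(kappa)                               # about 0.628
print(cohen_kappa_score(y_true, y_pred))   # the same value
```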
> and is it a good idea for my usecase
Cohen's $\kappa$ is a function of the classification accuracy, so if you are interested in the classification accuracy, Cohen's $\kappa$ is a statistic that gives context to that accuracy. In particular, Cohen's $\kappa$ can be seen as a comparison between your model's classification accuracy and the accuracy that would come from randomly assigning labels. An advantage of transforming the accuracy this way is that it exposes performance worse than random. For instance, a common complaint about classification accuracy is that it can be high on an imbalanced problem without indicating good performance, such as getting $95\%$ accuracy when $99\%$ of the observations belong to one category. While $95\%$ accuracy looks high, running such a situation through the Cohen's $\kappa$ calculation is likely to reveal performance worse than random guessing. If this sounds appealing, then Cohen's $\kappa$ might be a good measure of performance for you.
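To make the imbalanced case concrete, here is one hypothetical way the $95\%$-accuracy, $99\%$-majority scenario could play out; the particular error pattern below is invented, but it shows $\kappa$ dropping below zero even though the accuracy looks high.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

# 1000 samples, 99% of which belong to class 1.
y_true = np.array([1] * 990 + [0] * 10)

# A hypothetical classifier with 95% accuracy that never finds the minority
# class: it mislabels 40 majority samples and all 10 minority samples.
y_pred = np.array([1] * 950 + [0] * 40 + [1] * 10)

print(accuracy_score(y_true, y_pred))     # 0.95
print(cohen_kappa_score(y_true, y_pred))  # about -0.016: worse than chance
```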
A drawback of Cohen's $\kappa$ is that it requires you to bin continuous model outputs, such as the probabilities given by logistic regressions and neural networks, into hard class labels. The usual criticisms of discarding continuous outputs in favor of thresholded labels do not mention Cohen's $\kappa$ in particular, but all of them apply.
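For example, to score a logistic regression with Cohen's $\kappa$, the predicted probabilities must first be turned into hard labels; the $0.5$ cutoff below is an arbitrary choice, and the whole setup is a hypothetical illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score

# A synthetic binary problem, purely for illustration.
X, y = make_classification(n_samples=500, random_state=0)

# Cohen's kappa cannot use the continuous probabilities directly...
proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# ...so they have to be binned into labels first, discarding information.
y_pred = (proba >= 0.5).astype(int)
print(cohen_kappa_score(y, y_pred))
```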
REFERENCES
Artstein, Ron, and Massimo Poesio. "Inter-coder agreement for computational linguistics." Computational linguistics 34.4 (2008): 555-596.