These are contingency tables. In your matrix m1, you have the counts associated with a null hypothesis in which the cell probabilities are all the same. That is somewhat different from the typical use of a chi-squared test on a contingency table. The default test checks whether the variables are independent: does being in a given row (column) make you more likely to land in a particular column (row) than being in a different row (column) would? That null is considerably less restrictive than yours, so we cannot use the default chi-squared setup, but we can still use a chi-squared test with a custom null.
In essence, you are after a chi-squared test for goodness of fit, with a particular null specified. Thus, you just need to ask your software for that and specify the null you want. Any software should be able to do that for you; I will demonstrate this with R.
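For concreteness, here is one way to reconstruct m1 and m2 from the tables printed further down (the original question may have built them differently):
m1 = matrix(3, nrow=3, ncol=3)                       # null pattern: all cell counts equal
m2 = matrix(c(6,0,3, 3,6,3, 0,0,6), nrow=3, ncol=3)  # the observed counts
# note: plain chisq.test(m2) would test the default independence null instead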
chisq.test(x=as.vector(m2), p=as.vector(m1)/sum(m1))
# Chi-squared test for given probabilities
#
# data: as.vector(m2)
# X-squared = 18, df = 8, p-value = 0.02123
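As a quick hand-check (assuming m1 and m2 as defined above), every expected count under the null is $27\times 1/9 = 3$, so the statistic and p-value are easy to reproduce directly:
sum((as.vector(m2) - 3)^2 / 3)        # 18, matching X-squared above
pchisq(18, df=8, lower.tail=FALSE)    # 0.02123, matching the p-value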
R complains about the above test (the chi-squared approximation may be inaccurate because every expected count is only 3), so we can check it by simulating the p-value instead of relying on the chi-squared distribution with 8 degrees of freedom being correct. There doesn't seem to be much of a problem:
set.seed(6625)
chisq.test(x=as.vector(m2), p=as.vector(m1)/sum(m1), simulate.p.value=TRUE)
# Chi-squared test for given probabilities with
# simulated p-value (based on 2000 replicates)
#
# data: as.vector(m2)
# X-squared = 18, df = NA, p-value = 0.02449
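If you want to see what the simulation is doing, here is a minimal sketch of the same Monte Carlo logic (my own reimplementation, not chisq.test's internal code): draw tables from the multinomial null, recompute the statistic each time, and see how often it reaches the observed 18.
p    = as.vector(m1)/sum(m1)               # null cell probabilities
e    = sum(m2) * p                         # expected counts (all 3 here)
stat = function(o) sum((o - e)^2 / e)      # Pearson statistic
obs  = stat(as.vector(m2))                 # 18, as above
sims = replicate(2000, stat(rmultinom(1, size=sum(m2), prob=p)))
(1 + sum(sims >= obs)) / (2000 + 1)        # simulated p-value, roughly 0.02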
The above gives you a test of the hypothesis that your observed matrix m2 comes from a population with the pattern specified in the expected matrix m1. Alternatively, if both m1 and m2 are observed matrices and you want to know whether they differ from each other, you need to use a log-linear model for multi-way contingency tables (I discuss this more thoroughly here: $\chi^2$ of multidimensional data).
# this creates the multi-way contingency table:
tab = array(NA, dim=c(3,3,2))   # 3 rows x 3 columns x 2 matrices
tab[,,1] = m1;  tab[,,2] = m2
tab = as.table(tab)
names(dimnames(tab)) = c("row", "column", "matrix")
tab
# , , matrix = A
# column
# row A B C
# A 3 3 3
# B 3 3 3
# C 3 3 3
#
# , , matrix = B
# column
# row A B C
# A 6 3 0
# B 0 6 0
# C 3 3 6
library(MASS)  # provides loglm() for fitting log-linear models
m.sat  = loglm(~row*column*matrix, tab)    # the saturated model
m.null = loglm(~matrix + row*column, tab)  # the row*column pattern is the same for both matrices
                                           # (named m.null so it doesn't overwrite the data matrix m1)
anova(m.null, m.sat)                       # nested model test of m.null vs m.sat
# LR tests for hierarchical log-linear models
#
# Model 1: ~matrix + row * column
# Model 2: ~row * column * matrix
# Deviance df Delta(Dev) Delta(df) P(> Delta(Dev)
# Model 1 15.53483 8
# Model 2 0.00000 0 15.53483 8 0.04954
# Saturated 0.00000 0 0.00000 0 1.00000
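The p-value in the last column is just the upper tail of the chi-squared distribution beyond the deviance, which you can verify directly:
pchisq(15.53483, df=8, lower.tail=FALSE)  # 0.04954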
Notice that this version is less powerful, because it must allow for sampling error in the observed m1 counts, whereas the chi-squared test above assumes those counts were specified a priori.
Your use of the word "measure" is somewhat ambiguous to me. If you are interested in a measure of effect size (i.e., how far m2 is from uniform), you can take the sample size out of the chi-squared test statistic: divide by $N$ and take the square root. That gives you the $\phi$ coefficient, $\phi = \sqrt{\chi^2/N}$.
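With the numbers from the first test above, that works out to:
sqrt(18 / sum(m2))   # phi = sqrt(chi-squared / N) = sqrt(18/27), about 0.816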