I would encourage you to consider reframing your research question. Statistically speaking, you cannot prove the absence of a difference unless you plan on using Bayesian methods. I'll explain what I mean using a simple example of how you might use your data.
Say that you decide to take the mean rating of items 1-6 and items 7-12. To do this, you might code a rating of "Strongly Disagree" as 1, "Disagree" as 2, "Neutral" as 3, and so on. You could then average the two halves of your scale so that each person has an average rating of perceived age bias and an average rating of personal age bias. You might then use a paired-samples t-test to "test" whether the average ratings of these constructs differ within individuals. Under standard null hypothesis testing, the following would be your null and alternative hypotheses:
$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$
Just for clarity, $\mu_d$ is the mean difference (i.e., the average within-person difference between perceived age bias and personal age bias). This setup should illustrate that the null hypothesis (that there is no average difference) is your research hypothesis. At first glance, that should mean your question is answerable, since you could test that hypothesis; however, the issue with frequentist methods is that we only ever test whether we can reject the null hypothesis. Frequentist null hypothesis tests have exactly two possible outcomes: reject the null hypothesis (the result is statistically significant) or fail to reject the null hypothesis (the result is not statistically significant). Failing to reject the null is not the same as showing that the null is true.
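To make that concrete, here is a minimal sketch of the scoring-and-testing workflow in Python, using simulated responses and hypothetical column names (`q1`-`q12`); it is not your data or a definitive analysis plan:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: 12 Likert items already coded 1-5
# (Strongly Disagree = 1 ... Strongly Agree = 5).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(1, 6, size=(100, 12)),
    columns=[f"q{i}" for i in range(1, 13)],
)

# Subscale means: items 1-6 = perceived age bias, items 7-12 = personal age bias.
perceived = df[[f"q{i}" for i in range(1, 7)]].mean(axis=1)
personal = df[[f"q{i}" for i in range(7, 13)]].mean(axis=1)

# Paired-samples t-test of the within-person difference.
t_stat, p_value = stats.ttest_rel(perceived, personal)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```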
So, depending on your statistical orientation, framing your question as trying to show the absence of an effect is not possible with standard significance testing. From a scientific integrity perspective, your research question and hypothesis shouldn't be trying to prove a negative either.
Not all hope is lost, however. As I mentioned, you could use Bayesian methods to estimate the evidence for the null relative to the alternative hypothesis. Or, you could reframe your statistical hypothesis: instead of claiming that there is no difference between the two subscales, you could test whether the difference is negligibly different from zero (i.e., smaller than some substantively trivial bound). This framework is equivalence testing via two one-sided t-tests (TOST).
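If you go the TOST route, a minimal sketch using the paired equivalence test in `statsmodels` might look like the following; the ±0.5 equivalence bounds and the simulated subscale scores are purely illustrative, and the bounds should be chosen as the smallest difference you would consider meaningful on your 1-5 scale:

```python
import numpy as np
from statsmodels.stats.weightstats import ttost_paired

rng = np.random.default_rng(0)
# Hypothetical subscale means for the same 100 respondents.
perceived = rng.normal(3.0, 0.6, size=100)
personal = rng.normal(3.1, 0.6, size=100)

# TOST: test whether the mean paired difference lies within (-0.5, 0.5).
# A small overall p-value supports "negligibly different from zero."
p_overall, lower_test, upper_test = ttost_paired(perceived, personal, low=-0.5, upp=0.5)
print(f"TOST p = {p_overall:.3f}")
```

If you prefer the Bayesian route instead, packages such as `pingouin` report a Bayes factor alongside the paired t-test, though the conclusion will depend on the prior you place on the effect size.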
As far as strategies for answering whether people respond to the items similarly or differently, I'm going to assume that items are repeated across the two subscales; in other words, the same items are given on the perceived-bias and personal-bias scales (e.g., Q1 = Q7, Q2 = Q8, etc.). The Item Response Theory (IRT) framework gives you a few options here. You could test directly whether there is differential item functioning depending on whether an item asks about personal bias or perceived bias, but this assumes it is reasonable to model a single latent variable (i.e., bias) rather than two separate latent variables (i.e., personal bias and perceived bias). IRT also generally requires a fairly large sample (e.g., 250-500 people).

Alternatively, you could run a chi-square test for each item pair, checking whether the counts of endorsements for the different Likert ratings depend on whether the item is about perceived or personal bias, but you would need to adjust your p-values for multiple comparisons (see the sketch below). The same issue arises here, though: a non-significant result cannot be taken to mean that there is no difference.
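For the per-item chi-square idea, a sketch might look like this, again with simulated data; the pairing of columns (item i with item i + 6) and the Holm correction are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(1, 6, size=(200, 12)),
    columns=[f"q{i}" for i in range(1, 13)],
)

# Assumed pairing: perceived-bias item i matches personal-bias item i + 6.
p_values = []
for i in range(1, 7):
    perceived_counts = df[f"q{i}"].value_counts().reindex(range(1, 6), fill_value=0)
    personal_counts = df[f"q{i + 6}"].value_counts().reindex(range(1, 6), fill_value=0)
    table = np.vstack([perceived_counts, personal_counts])  # 2 x 5 contingency table
    chi2, p, dof, expected = chi2_contingency(table)
    p_values.append(p)

# Adjust the six per-item p-values for multiple comparisons (Holm step-down).
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="holm")
print(p_adjusted)
```

Note that this treats each item's ratings as independent counts and ignores the within-person pairing, which is another reason to view it as a rough screen rather than a confirmatory test.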