
I would like to test the cache on my 32-bit x86 (IA-32) Intel CPU.

I looked at the document below, but I am new to cache concepts and weak at assembly, so I could not get the cache testing working.

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

I would like to enable and disable the cache, invalidate it, write it back, and select write-through.

Can you please help me with inline assembly in C?

I have come across instructions such as clflush and wbinvd, but I am not sure when or how to use them.

Also, how can I verify that the cache enable, disable, invalidate, and write-back operations actually work?

I found this post, but it targets 64-bit x86, and many of its assembly instructions do not match 32-bit code:

enable/disable cache on intel 64bit machine: CD bit always set?

#include <stdio.h>

/* Must run at CPL 0: moves to/from CR0 and wbinvd are privileged. */
int cache_test(int opt) {

        unsigned int cr0 = 0;

        switch(opt) {
        case 0:
            /* Set CD (bit 30) and NW (bit 29), then write back and invalidate.
               Note: the Intel SDM procedure for disabling caching uses CD=1, NW=0.
               %0 is the scratch register, so no push/pop of eax is needed. */
            __asm__ volatile(
                "movl %%cr0, %0\n\t"
                "orl $0x60000000, %0\n\t"
                "movl %0, %%cr0\n\t"
                "movl %%cr0, %0\n\t"
                "wbinvd"
                : "=r"(cr0)
                :
                : "memory");
        printf("printf: disable cache cr0 0x%x\n", cr0);
        break;

        case 1:
            /* Clear CD and NW to return to normal caching. */
            __asm__ volatile(
                "movl %%cr0, %0\n\t"
                "andl $0x9fffffff, %0\n\t"
                "movl %0, %%cr0\n\t"
                "movl %%cr0, %0"
                : "=r"(cr0)
                :
                : "memory");
        printf("printf: enable cache; cr0 0x%x\n", cr0);
        break;

        case 2:
            /* Read CR0 only. */
            __asm__ volatile(
                "movl %%cr0, %0"
                : "=r"(cr0));
        printf("printf: XENMEM_show_cache_status cr0 value is 0x%x\n", cr0);
        break;
    }

        return (int)cr0;
}

I ported the code to a 32-bit IA-32 CPU. Is this a correct way to enable and disable the cache?

Thanks for the help.
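
To check the value that `cache_test` prints, it can help to decode the two CR0 bits involved. The following is a minimal sketch, not code from the post: the names `cr0_cache_mode`, `enum cache_mode`, and the `CR0_*` macros are made up for illustration. The bit positions (CD = bit 30, NW = bit 29) and the meaning of each combination follow the Intel SDM description of CR0.

```c
#include <stdint.h>

#define CR0_NW (1u << 29)   /* Not Write-through */
#define CR0_CD (1u << 30)   /* Cache Disable */

/* Hypothetical names for the four CD/NW combinations. */
enum cache_mode {
    CACHE_NORMAL,           /* CD=0, NW=0: normal caching */
    CACHE_NO_FILL,          /* CD=1, NW=0: no-fill mode, coherency maintained */
    CACHE_OFF_INCOHERENT,   /* CD=1, NW=1: coherency not maintained */
    CACHE_INVALID           /* CD=0, NW=1: invalid, loading it into CR0 raises #GP */
};

/* Classify a CR0 value by its CD and NW bits only. */
enum cache_mode cr0_cache_mode(uint32_t cr0)
{
    int cd = (cr0 & CR0_CD) != 0;
    int nw = (cr0 & CR0_NW) != 0;

    if (!cd && !nw) return CACHE_NORMAL;
    if (cd && !nw)  return CACHE_NO_FILL;
    if (cd && nw)   return CACHE_OFF_INCOHERENT;
    return CACHE_INVALID;
}
```

With the masks used in the code above, `orl $0x60000000` selects the CD=1, NW=1 combination and `andl $0x9fffffff` selects normal caching. If the code runs in protected mode, a genuine CR0 value also has PE (bit 0) set, so a printed value of 0x0 would suggest that the read itself did not work.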

Peter Cordes
Titus
  • Have you read the descriptions of these instructions and controls in the Intel SDM? If so, please tell us what parts of the explanations you are having trouble understanding. – prl Apr 11 '18 at 04:20
  • The 64-bit assembly code in the question you linked can be converted to 32-bit code by simply changing rax to eax. (Also, remove the upper 32 bits of the mask constant.) – prl Apr 11 '18 at 04:22
  • Hi prl, thanks for your reply. Yes, I ported that code and want to know whether I ported it correctly. Can you also tell me how to confirm from performance that the cache is enabled or disabled? Is there an assembly instruction, or would reading the PC value help, to check CPU performance before and after enabling the cache? Thanks again. – Titus Apr 11 '18 at 05:32
  • *Also how can I verify the cache enable/disable/invalidate/writeback functions.* With performance tests. Easy for enable/disable. – Peter Cordes Apr 11 '18 at 05:44
  • Just declare a clobber on `eax` like a normal person, or even better use `%0` as your scratch reg. That push/pop to save/restore `eax` is a total waste of instructions. (I know you just ported it from the 64-bit question; I commented the same thing there.) – Peter Cordes Apr 11 '18 at 05:50
  • Titus, use @prl to notify users when you reply to them. – Peter Cordes Apr 11 '18 at 05:51
  • @PeterCordes Thanks for the suggestions, they are really useful. How can we test the performance before and after enabling the cache? I tried the `rdpmc` instruction to check the performance, but it returns 0 every time. Can you please help me with this? – Titus Apr 11 '18 at 06:47
  • The difference will be *huge*, you can use `rdtsc` without worrying about frequency scaling. (Or set your governor to `performance` and disable turbo, etc.) Run a pointer-chasing loop like `mov rax, [rax]`. (or write it in C with a `volatile void*ptr_to_self = &ptr_to_self;`) – Peter Cordes Apr 11 '18 at 07:11
  • @PeterCordes `uint64_t rdtsc(){ unsigned int lo,hi; __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi)); return ((uint64_t)hi << 32) | lo; } cache_test(0);//Cache Disable cache_test(2);//Cache Status mem_access(); uint64_t tick = rdtsc(); printk("tick -> %lld \n", tick); cache_test(1);//Cache Enable cache_test(2);//Cache Status mem_access(); tick = rdtsc(); printk("tick 2 -> %lld \n", tick); ` I tried this code and get the value 334 both times. – Titus Apr 11 '18 at 07:52
  • GNU C has a `__builtin_rdtsc`, you don't have to code it up in inline asm yourself. Anyway, your `cache_test` doesn't include a microbenchmark, so you're only measuring the cycles to change `cr0`, not for any loads. Also, if your measurement is tiny, don't forget to use a serializing instruction like `lfence` to stop `rdtsc` from executing out-of-order. – Peter Cordes Apr 11 '18 at 07:54
  • Output : printf: disable cache! cr0=0x0 printf: cache_status! cr0=0x2 tick -> 334 printf: enable cache; cr0=0x1 printf: cache_status! cr0=0x2 tick 2 -> 334 – Titus Apr 11 '18 at 07:54
  • @PeterCordes I am accessing the system memory of the CPU ('mem_access' function) before and after cache enable, both times reading counter. – Titus Apr 11 '18 at 07:56
  • @PeterCordes I will measure the performance by reading some timer registers. Now I would like to do a cache invalidate or write-back. How can I do this and test it, so that I can say the cache operations work correctly on the CPU? Is there any inline asm code? BTW, I am not using Linux or any other OS, just bare-metal code running on the CPU via JTAG. Thanks for the help. I heard that we have to program the MTRR registers to make system memory cacheable. I have physical memory at address 0x8000_0000; how can I enable caching for this region using the MTRRs, or in some other way? – Titus Apr 11 '18 at 10:38
  • @PeterCordes Any help ? – Titus Apr 12 '18 at 01:52
  • set up a long linked list (a couple hundred nodes) with each node in a different cache line (non sequential to defeat prefetch, but not with a large stride so you don't get conflict misses; L1d is 8-way associative on Intel). Get them hot in cache and then measure how long it takes to walk the list to the end. Then `wbinvd` to flush all caches, or `clflushopt` on each line separately, and time again. It should be *much* slower, like at least 10 times slower. Your MTRR should have the memory you're using set to WB, of course. – Peter Cordes Apr 12 '18 at 02:08
  • Thanks @PeterCordes for your help. – Titus Apr 17 '18 at 06:07
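
The linked-list test suggested in the comments could be sketched roughly as below. This is an illustration under several assumptions, not code from the thread: a 64-byte cache line, a hosted GCC or Clang toolchain (for `malloc`, `rand`, and `aligned_alloc`), and the x86 intrinsics header `<x86intrin.h>`, which may need `-msse2` on a 32-bit build. The names `struct node`, `build_chain`, `walk`, `timed_walk`, `flush_chain`, and `measure` are hypothetical. On bare metal, a static array would replace the allocations, and a privileged `wbinvd` could replace the per-line `clflush` loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <x86intrin.h>   /* __rdtsc, _mm_lfence, _mm_mfence, _mm_clflush */

#define LINE 64          /* assumed cache-line size in bytes */

struct node {
    struct node *volatile next;              /* volatile: every hop is a real load */
    char pad[LINE - sizeof(struct node *)];  /* pad so each node fills one line */
};

/* Link n nodes into one cycle in shuffled order (to defeat the prefetcher). */
struct node *build_chain(struct node *nodes, size_t n, unsigned seed)
{
    if (n == 0) return NULL;
    size_t *order = malloc(n * sizeof *order);
    if (!order) return NULL;
    for (size_t i = 0; i < n; i++) order[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {     /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        nodes[order[i]].next = &nodes[order[(i + 1) % n]];
    struct node *head = &nodes[order[0]];
    free(order);
    return head;
}

/* Pointer chase: each load depends on the result of the previous one. */
struct node *walk(struct node *p, size_t steps)
{
    while (steps--) p = p->next;
    return p;
}

/* TSC ticks for one walk; lfence keeps rdtsc from reordering around it. */
uint64_t timed_walk(struct node *head, size_t steps)
{
    _mm_lfence();
    uint64_t t0 = __rdtsc();
    _mm_lfence();
    (void)walk(head, steps);
    _mm_lfence();
    return __rdtsc() - t0;
}

/* Evict every node's line from all cache levels. */
void flush_chain(struct node *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) _mm_clflush(&nodes[i]);
    _mm_mfence();
}

/* Time a warm walk, then a walk after flushing. Returns 0 on success. */
int measure(size_t n, uint64_t *hot, uint64_t *cold)
{
    struct node *nodes = aligned_alloc(LINE, n * sizeof *nodes);
    if (!nodes) return -1;
    struct node *head = build_chain(nodes, n, 1u);
    if (!head) { free(nodes); return -1; }
    (void)walk(head, n);            /* warm-up pass pulls the nodes into cache */
    *hot = timed_walk(head, n);
    flush_chain(nodes, n);
    *cold = timed_walk(head, n);
    free(nodes);
    return 0;
}
```

Calling `measure(256, &hot, &cold)` with a couple of hundred nodes, as the comment suggests, should report a clearly larger `cold` than `hot` when the memory is mapped write-back. The same `timed_walk` could be run before and after setting CR0.CD to compare cached and uncached access.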

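
For the MTRR question raised in the comments, the register values for one variable-range MTRR can be computed as in the sketch below. It only does the arithmetic; writing the MSRs needs `wrmsr` at ring 0 and should follow the update procedure in the Intel SDM. The helper `mtrr_encode` and `struct mtrr_pair` are hypothetical names. The region size is an assumption (the thread gives only the base address 0x8000_0000), and the physical address width should come from CPUID leaf 80000008H where available (36 bits is assumed in the example below).

```c
#include <stdint.h>

#define MTRR_TYPE_WB     0x06u          /* write-back memory type encoding */
#define MTRR_MASK_VALID  (1ull << 11)   /* V bit in IA32_MTRR_PHYSMASKn */

/* MSR addresses of variable-range MTRR pair n. */
#define MSR_MTRR_PHYSBASE(n) (0x200u + 2u * (n))
#define MSR_MTRR_PHYSMASK(n) (0x201u + 2u * (n))

struct mtrr_pair {
    uint64_t physbase;   /* value for IA32_MTRR_PHYSBASEn */
    uint64_t physmask;   /* value for IA32_MTRR_PHYSMASKn */
};

/*
 * Encode one variable-range MTRR covering [base, base + size).
 * size must be a power of two of at least 4 KiB and base must be
 * aligned to size. phys_bits is the physical address width (MAXPHYADDR).
 * Returns 0 on success, -1 if the range cannot be described by one MTRR.
 */
int mtrr_encode(uint64_t base, uint64_t size, unsigned type,
                unsigned phys_bits, struct mtrr_pair *out)
{
    if (size < 4096 || (size & (size - 1)) != 0) return -1;
    if (base & (size - 1)) return -1;
    if (phys_bits < 32 || phys_bits > 52) return -1;

    /* Address bits MAXPHYADDR-1..12 are the ones the MTRR compares. */
    uint64_t addr_mask = ((1ull << phys_bits) - 1) & ~0xFFFull;

    out->physbase = (base & addr_mask) | (type & 0xFFu);
    out->physmask = (~(size - 1) & addr_mask) | MTRR_MASK_VALID;
    return 0;
}
```

As an example under those assumptions, a 256 MiB write-back region at 0x8000_0000 with a 36-bit physical address width gives PHYSBASE = 0x80000006 and PHYSMASK = 0xFF0000800.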