
The accepted answer to What is the strict aliasing rule? mentions that you can use char * to alias another type but not the other way.

It doesn't make sense to me — if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

Community
user3489275
  • You can read a `T` via a `char *`, but you can't read an arbitrary `char` buffer via a `T *`. – Oliver Charlesworth May 24 '14 at 18:09
  • It's a rule, nothing else... Basically allowing the compiler to optimise more (as with `restrict`)... But also compiler guys being lazy IMHO... – Macmade May 24 '14 at 18:13
  • @OliCharlesworth what about writes then? Is it allowed to write to `T` via `char *`? – user3489275 May 24 '14 at 18:14
  • This is one of those places where C/C++ don't work the same way. – Grady Player May 24 '14 at 18:14
  • @user3489275: No: The lifetime of an object ends when the memory in which it is stored is reused. If the object's type has a non-trivial destructor, it is UB to do so without calling the destructor. – Kerrek SB May 24 '14 at 18:18
  • @KerrekSB how does object lifetime relate to strict aliasing? – user3489275 May 24 '14 at 18:19
  • @user3489275: Well, you want to alias objects, but that only makes sense if the objects exist. So if there's no more object (because you reused the storage), then there's no point in aliasing. – Kerrek SB May 24 '14 at 18:20
  • @KerrekSB so you are trying to say that casts are not allowed at all? – user3489275 May 24 '14 at 18:22
  • @user3489275: Wait, maybe my reference wasn't clear: casting the pointer is OK, but not writing to the memory of an object through a char pointer. I.e. when you write to the memory, you invalidate the original object. Reading the bytes of the underlying representation of an object through a char pointer is perfectly fine (and indeed this is how any I/O works). – Kerrek SB May 24 '14 at 18:26
  • @KerrekSB: It is **not** UB to reuse the memory (and thus terminate lifetime) of an object with a non-trivial destructor. Weird as it might sound, the standard explicitly states that this is only undefined if the program depends on side effects of the destructor. Also, I don't think that writing through the `char*` *ends* the lifetime of the object. Using placement new, sure; just writing through a pointer... not so sure. – David Rodríguez - dribeas May 24 '14 at 20:00
  • @DavidRodríguez-dribeas: Hm, fair enough, I should have said "destructor with side effects". – Kerrek SB May 24 '14 at 20:51
  • @KerrekSB: Well, if the side effects of the destructor don't affect the observable behavior of the program, then the program does not *depend* on the destructor being executed. – David Rodríguez - dribeas May 24 '14 at 20:54
  • They do alias one another. However, of course, you can't access a `char` object through an incompatible reference type. I explain here: http://stackoverflow.com/questions/29121176/can-aliasing-problems-be-avoided-with-const-variables/29217925#29217925 – jschultz410 Mar 30 '15 at 19:45
  • @GradyPlayer this is the same in both C and C++ – M.M Apr 11 '15 at 23:50
  • The answer to this question is "Because the standard says so". – M.M Apr 11 '15 at 23:52
  • @GradyPlayer "_C/C++ don't work the same way_" Please elaborate – curiousguy Aug 15 '15 at 03:03
  • @OliverCharlesworth "_you can't read an arbitrary char buffer via a T *._" Yes. Alignment alone implies you can't. – curiousguy Aug 15 '15 at 03:04
  • @KerrekSB "_Reading the bytes of the underlying representation of an object through a char pointer is perfectly fine (and indeed this is how any I/O works)._" Please elaborate – curiousguy Aug 15 '15 at 03:08
  • @DavidRodríguez-dribeas "_Also, I don't think that writing through the char* ends the lifetime of the object._" a polymorphic object or a POD? – curiousguy Aug 15 '15 at 03:10
  • I would guess that it can be because `char` is a single byte so `char*` can represent a sequence of bytes. – Roy Avidan Aug 18 '20 at 21:47
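
Kerrek SB's remark that reading an object's bytes through a char pointer "is how any I/O works", together with the follow-up question about writes, can be illustrated with a small sketch. It assumes a trivially copyable type; `Point`, `save`, `load` and `round_trip` are hypothetical names. The bytes of an existing `Point` are read through a `const char*`, and later written through a `char*` into another `Point` that already exists, so no object has to be conjured out of raw storage.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical trivially copyable type used only for illustration.
struct Point {
    int x;
    int y;
};
static_assert(std::is_trivially_copyable<Point>::value,
              "byte-wise copies are only meaningful for trivially copyable types");

// Reads the bytes of p through a const char* and hands them to the stream.
void save(std::ostream& out, const Point& p) {
    out.write(reinterpret_cast<const char*>(&p), sizeof p);
}

// Writes bytes into an existing Point through a char*. The Point object
// already exists, so this only changes its value.
bool load(std::istream& in, Point& p) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&p), sizeof p));
}

// Round-trips a Point through an in-memory stream.
Point round_trip(const Point& p) {
    std::stringstream buf;
    save(buf, p);
    Point q{};
    [[maybe_unused]] bool ok = load(buf, q);
    assert(ok && "stream read failed");
    return q;
}
```

For trivially copyable types, copying the bytes of one object into another object of the same type gives the destination the same value; for types that are not trivially copyable this pattern is not safe.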

3 Answers


if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

It does, but that's not the point.

The point is that if you have one or more struct somethings then you may use a char* to read their constituent bytes, but if you have one or more chars then you may not use a struct something* to read them.
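
A minimal sketch of that asymmetry, using a hypothetical `Something` in place of `struct something`. Reading the bytes of an existing object through `unsigned char` is fine; treating a byte buffer as if it already held a `Something` is not, and one common well-defined alternative (an assumption here, not the only option) is to `memcpy` the bytes into a real object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the question's `struct something`.
struct Something {
    int id;
    double value;
};
static_assert(std::is_trivially_copyable<Something>::value,
              "memcpy-based copies need a trivially copyable type");

// Allowed: a Something object exists, and its bytes are read through
// unsigned char lvalues.
void to_bytes(const Something& s, unsigned char* out) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&s);
    for (std::size_t i = 0; i < sizeof s; ++i) {
        out[i] = bytes[i];
    }
}

// Not allowed: pretending a byte buffer already holds a Something, e.g.
//     const Something* p = reinterpret_cast<const Something*>(buf);
//     double v = p->value;  // no Something object exists in buf
// Instead, copy the bytes into a real Something object.
Something from_bytes(const unsigned char* buf) {
    assert(buf != nullptr);
    Something s;
    std::memcpy(&s, buf, sizeof s);  // buf must hold sizeof(Something) bytes
    return s;
}
```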

Lightness Races in Orbit
  • The reason why a char* is allowed to alias another type is simply because it provides a very simple way to serialize a struct and is a commonly used pattern. – doron Jun 30 '15 at 11:57
  • and now the OP knows as well. – doron Jun 30 '15 at 15:13
  • Define "you have one or more chars" – curiousguy Aug 14 '15 at 10:25
  • @curiousguy: What's unclear? `char buf[sizeof(something)] = {}; something* ptr = reinterpret_cast<something*>(&buf[0]); // invalid` – Lightness Races in Orbit Aug 14 '15 at 10:32
  • The whole concept of "having". A char is a byte. Every object representation is a bunch of bytes, or chars. You always have one or more chars. – curiousguy Aug 14 '15 at 10:35
  • @LightnessRacesinOrbit `char buf[sizeof(something)]` is an example NOT a definition. – curiousguy Aug 14 '15 at 11:07
  • @LightnessRacesinOrbit I don't understand why your sample is invalid. I thought `char` was an exception to strict aliasing. Don't `vector` and `variant` and `optional` and anything managing memory use this pattern? (+ alignment fixes tho) – v.oddou Apr 19 '18 at 06:42
  • @v.oddou: They do it by actually constructing an object in that space (placement new), which is a different example. – Lightness Races in Orbit Apr 19 '18 at 13:56
  • how so ? you have to cast the memory content to the user type when the user calls `get<0>()` or this kind of stuff, it looks exactly like your invalid line. – v.oddou Apr 20 '18 at 02:44
  • @v.oddou: They _actually constructed an object in that space_ (with placement new) beforehand, so the cast is valid and correct. In the `char buf[sizeof(something)] = {}` example above, though, that has not happened - there is no `something` in existence and you can't just pretend that there is. – Lightness Races in Orbit Apr 26 '18 at 12:28
  • ahah, of course. lol omg I thought we were not talking at this level. It's obvious that in those exact 2 lines of code, the unconstructed object is invalid to use. But we were talking of legalese; the topic being aliasing, it could work if a 3rd line did `new (ptr) something{};` just after that, no? The question is not there; it's whether or not it is UB because of aliasing rules (completely ignoring that alignof(something) might not be 1). – v.oddou Apr 26 '18 at 14:21
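
The placement-new pattern discussed in these comments might look like this sketch; `Something`, `Slot` and `emplace` are hypothetical names. `alignas` addresses the alignment concern, and placement new actually creates the object, so accessing it through the pointer that placement new returns is fine.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical type standing in for `something` in the comments above.
struct Something {
    int id;
    double value;
};

// Raw storage: no Something lives here until one is constructed.
// alignas gives the buffer the alignment a plain char array may lack.
struct Slot {
    alignas(Something) unsigned char storage[sizeof(Something)];
};

// Placement new creates a Something in the storage; accessing the object
// through the returned pointer is fine because the object now exists.
Something* emplace(Slot& slot, int id, double value) {
    assert(reinterpret_cast<std::uintptr_t>(slot.storage) % alignof(Something) == 0);
    return ::new (static_cast<void*>(slot.storage)) Something{id, value};
}
```

If code later reaches the object by casting `slot.storage` again instead of keeping the returned pointer, C++17 expects that cast to go through `std::launder`.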

The wording in the referenced answer is slightly erroneous, so let's get that ironed out first: One object never aliases another object, but two pointers can "alias" the same object (meaning the pointers point to the same memory location; as M.M pointed out, this is still not 100% correct wording, but you get the idea). Also, the standard itself doesn't (to the best of my knowledge) actually talk about strict aliasing at all, but only gives rules about which kinds of expressions may or may not be used to access an object. Compiler flags like `-fno-strict-aliasing` tell the compiler whether it can assume the programmer followed those rules (so it can perform optimizations based on that assumption) or not.

Now to your question: Any object can be accessed through a pointer to char, but a char object (especially a char array) may not be accessed through most other pointer types. Based on that the compiler can/must make the following assumptions:

  1. If the type of the actual object itself is not known, a char* and T* could always point to the same object (alias each other) -> symmetric relationship.
  2. If T1 and T2 are not "related" and not char, then T1* and T2* may never point to the same object -> symmetric relationship.
  3. A char* may point to a char OR a T object.
  4. A T* may NOT point to a char object -> asymmetric relationship.
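
Rule 2 is what lets an optimizer skip reloads. A sketch, with a hypothetical function name: because `int` and `float` are unrelated, the compiler may assume the store through `fp` cannot change `*ip`. Calling this with both pointers aimed at the same storage would be undefined behavior, which is exactly why the reload may be dropped; with GCC or Clang's `-fno-strict-aliasing`, the compiler no longer relies on that assumption.

```cpp
#include <cassert>

// Sketch of rule 2: int and float are unrelated types, so under strict
// aliasing the compiler may assume *ip and *fp are different objects.
int store_then_load(int* ip, float* fp) {
    assert(ip != nullptr && fp != nullptr);
    *ip = 1;
    *fp = 2.0f;  // assumed not to modify *ip
    return *ip;  // may be compiled as `return 1;` without reloading *ip
}
```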

I believe the main rationale behind the asymmetric rules about accessing objects through pointers is that a char array might not satisfy the alignment requirements of, e.g., an int.

So, even without compiler optimizations based on the strict aliasing rule, writing e.g. an int to the location of a 4-byte char array at addresses 0x1, 0x2, 0x3, 0x4 will, in the best case, result in poor performance and, in the worst case, access a different memory location, because the CPU instructions might ignore the lowest two address bits when writing a 4-byte value (so here this might result in a write to 0x0, 0x1, 0x2 and 0x3).
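
A sketch of the usual way around both problems when a 4-byte value sits at an odd offset in a byte buffer: copy the bytes into a properly aligned local object instead of dereferencing a possibly misaligned `std::uint32_t*`. `load_u32` and `store_u32` are hypothetical helpers; the byte order of the stored value is whatever the platform uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value at an arbitrary byte offset by copying its bytes into
// an aligned local, avoiding both a misaligned access and reading a char
// array through an unrelated type.
std::uint32_t load_u32(const unsigned char* bytes, std::size_t offset) {
    assert(bytes != nullptr);
    std::uint32_t v;
    std::memcpy(&v, bytes + offset, sizeof v);
    return v;
}

// The matching store: copies the bytes of v into the buffer at offset.
void store_u32(unsigned char* bytes, std::size_t offset, std::uint32_t v) {
    assert(bytes != nullptr);
    std::memcpy(bytes + offset, &v, sizeof v);
}
```

Compilers typically turn such a fixed-size `memcpy` into a single load or store on targets that allow unaligned access, so the safe version usually costs nothing extra there.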

Please also be aware that the meaning of "related" differs from language to language (between C and C++), but that is not relevant for your question.

M.M
MikeMB
  • Alignment is NOT a rationale for the strict aliasing rule, let alone the main one. It's an orthogonal issue. The reason for the aliasing rule is to enable optimizations (sometimes called TBAA, type-based aliasing analysis). Further, the rule is not about pointers aliasing each other either. It is about an lvalue aliasing an object. – M.M Jun 30 '15 at 11:54
  • @M.M: I didn't say it was the main reason for the strict aliasing rule itself, but for why e.g. a `char*` may point to an `int` while an `int*` may not point to a char array. I corrected my post, so it's no longer talking about two pointers aliasing each other. Maybe you can have another look? – MikeMB Jun 30 '15 at 12:46
  • Also take out the paragraph starting "The main rationale" ; alignment is not a rationale for aliasing rules – M.M Jun 30 '15 at 12:57
  • @M.M: Sorry, I happen to disagree with you on this part, and the following paragraph explains why I think it is a valid rationale. Once again, I'm not saying it is a rationale for the strict aliasing rule itself, but for that specific part of it. – MikeMB Jun 30 '15 at 13:15
  • The aliasing rules apply even for aligned memory. – curiousguy Aug 14 '15 at 10:42
  • "*If the type of the actual object itself is not known*" Can you tell me how does the compiler decides whether the object itself is known or not? For example a generic allocator implementation may use 64K char buffers, and here and there it must alias the allocated block header structs onto it to write the necessary tracking data into it. Intent is storing the allocated blocks which can be aliased by char* but if the compiler thinks the buffer holds char objects then I cannot alias the block headers onto without breaking the rule. – Calmarius Nov 28 '15 at 10:45
  • @Calmarius: Not sure if I understand your question. If you access the value of a char object, via an expression of a different type (even by dereferencing a pointer to a POD) you are always violating the strict aliasing rule - whether the compiler knows its actually a char or not. However, you can create an object of the appropriate type inside a char array e.g. via placement new. – MikeMB Nov 28 '15 at 15:50
  • For example, do I invoke undefined behavior by writing this: `char *pC = malloc(123); int *pI = (int*)pC; *pI = 42;`? My rationale here is that `malloc` provides the best-aligned pointers because it doesn't know what I use the buffer for, so misalignment cannot happen, and then I only access the area via an int pointer. Allocators do something like this: they allocate buffers with `malloc`, store the buffer in a `char*`, then when allocating, they use pointer arithmetic to get an aligned address and then dereference it only via the header struct. – Calmarius Nov 28 '15 at 17:18
  • I mostly use C. But I'm curious what's the situation in C++. – Calmarius Nov 28 '15 at 17:36
  • @Calmarius: In C I believe this should be legal; in C++ this won't even compile (malloc returns void*, which can't be implicitly converted to char*). Aside from that, I'm not sure if it is legal in C++ (definitely not for types that aren't trivially copyable, maybe for PODs), but I'd have to look it up in the standard again. The C++ way to do it would be placement new: `new (pI) int(42);` – MikeMB Nov 28 '15 at 18:03
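
For C++, the allocator pattern Calmarius describes could follow MikeMB's placement-new suggestion roughly as in this sketch. `BlockHeader` and `place_header` are hypothetical; `std::align` finds the first suitably aligned address inside the raw buffer, and placement new then creates a real header object there instead of reinterpreting bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical bookkeeping header an allocator might keep inside a raw buffer.
struct BlockHeader {
    std::size_t size;
    bool in_use;
};

// Constructs a BlockHeader at the first suitably aligned address within
// [raw, raw + space). Returns nullptr if no aligned header fits.
BlockHeader* place_header(void* raw, std::size_t space, std::size_t block_size) {
    assert(raw != nullptr);
    void* p = raw;
    if (std::align(alignof(BlockHeader), sizeof(BlockHeader), p, space) == nullptr) {
        return nullptr;
    }
    // Placement new starts the lifetime of a real BlockHeader there, so later
    // accesses through the returned pointer are accesses to a BlockHeader.
    return ::new (p) BlockHeader{block_size, true};
}
```

Here the buffer itself would come from `std::malloc` and be released with `std::free`; `BlockHeader` is trivially destructible, so no explicit destructor call is needed before freeing.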

if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

Pointers don't alias each other; that's sloppy use of language. Aliasing is when an lvalue is used to access an object of a different type. (Dereferencing a pointer gives an lvalue).

In your example, what's important is the type of the object being aliased. For a concrete example let's say that the object is a double. Accessing the double by dereferencing a char * pointing at the double is fine because the strict aliasing rule permits this. However, accessing a double by dereferencing a struct something * is not permitted (unless, arguably, the struct starts with double!).
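
The `double` case can be sketched like this, assuming a 64-bit IEEE 754 `double` (checked by the `static_assert`); `nonzero_bytes` and `sign_bit` are hypothetical helpers. Inspecting the bytes through `unsigned char` is fine, while reading the `double` through a `std::uint64_t*` would not be, so the bits are copied out with `memcpy` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes a 64-bit IEEE 754 double");

// Fine: the double is accessed through unsigned char lvalues.
// Returns how many of its bytes are nonzero.
int nonzero_bytes(const double& d) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&d);
    int count = 0;
    for (std::size_t i = 0; i < sizeof d; ++i) {
        if (p[i] != 0) ++count;
    }
    return count;
}

// Not fine: *reinterpret_cast<const std::uint64_t*>(&d) reads a double
// through an unrelated type. Copying the bytes is well-defined.
bool sign_bit(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return (bits >> 63) != 0;
}
```

In C++20, `std::bit_cast<std::uint64_t>(d)` expresses the same copy directly.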

If the compiler is looking at a function which takes a char * and a struct something *, and it does not have information about the object being pointed to (this is actually unlikely, as aliasing passes are done at a whole-program optimization stage), then it would have to allow for the possibility that the char * points into that same struct something object, so no optimization could be done inside this function.
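
A sketch of that situation, with a hypothetical `Something` and function name: because a `char` lvalue may access any object, the compiler has to assume `c` might point into `*s` and must reload `s->count` after the byte stores.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type standing in for the answer's `struct something`.
struct Something {
    int count;
};

// c may point into *s, because char lvalues may access any object, so the
// compiler cannot keep s->count in a register across the byte stores.
int clear_and_read(Something* s, char* c, std::size_t n) {
    assert(s != nullptr && (n == 0 || c != nullptr));
    s->count = 5;
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = 0;
    }
    return s->count;  // must be reloaded: c may alias *s
}
```

Calling it with `c` pointing at the bytes of `*s` is well-defined and returns 0; with a separate buffer it returns 5.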

M.M
  • Dereferencing a struct has no effect, it doesn't read memory. – curiousguy Aug 18 '15 at 16:48
  • @curiousguy the lvalue produced by dereferencing a pointer to struct may be used to read or write memory. For example `struct something *x = whatever; *x = bla;` – M.M Aug 19 '15 at 00:39
  • I'd describe aliasing as occurring when storage is addressed or accessed via two independent means, each within the active lifetime of the other. If one were to define aliasing in that fashion, and specify that 6.5p7 only applies in cases that would actually involve that form of aliasing, that would eliminate the need for the Effective Type nonsense as well as the character-type exception, and would simultaneously allow more optimizations than are permitted under C99 while allowing the use of more code that would otherwise require `-fno-strict-aliasing`. – supercat Jul 12 '18 at 20:48
  • If code were to say `short *p = (short*)someObject; *p += 1;` and never use `p` again, that should not be considered aliasing, because the active lifetime of `p` would only extend until its last use, and `someObject` would never be accessed via any means other than `p` within that time. If, however, code were to perform some accesses to `someObject` (without using `p`) before accessing those same parts with `p`, the accesses to `someObject` would alias the lvalue `*p` that was actively associated with those parts. Recognizing what aliasing is and isn't would simplify the Standard hugely. – supercat Jul 12 '18 at 20:53