I looked at the source code at http://referencesource.microsoft.com/, and it appears all the source code is in C#.

I also looked at the source code for the new C# compiler platform (Roslyn), and it is also in C#. How is that possible? Is the C# compiler written in C#? Or am I missing something obvious? If the C# compiler is written in C#, how does it work?

Shahzeb
CriketerOnSO
  • Many compilers are written in the language they compile - Google [bootstrapping](http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29) to learn more. – Paul Roub Dec 16 '14 at 20:24
  • I think the _original_ compiler was written in C++. – PoweredByOrange Dec 16 '14 at 20:25
  • Well, a hammer can be forged by using another hammer. Previous version of it... – Eugene Sh. Dec 16 '14 at 20:25
  • By using a spec. And backwards compatibility. – Peter Dec 16 '14 at 20:25
  • it gets compiled into IL – Sam I am says Reinstate Monica Dec 16 '14 at 20:25
  • The link you posted is the link to the source code of the Framework library, not to the compiler. – Steve Dec 16 '14 at 20:26
  • Possibly related: [Implementing a compiler in “itself”](http://stackoverflow.com/questions/193560/implementing-a-compiler-in-itself) and [Bootstrapping a language](http://stackoverflow.com/questions/193560/implementing-a-compiler-in-itself) – Habib Dec 16 '14 at 20:58
  • It's not nearly as mind blowing as something like a self-hosted JVM implementation written in Java (JikesRVM). – SK-logic Dec 17 '14 at 02:02
  • @SK-logic: AFAIK, JikesRVM is basically a statically compiled VM, which just happens to be written in Java. What is more mindblowing IMO, is something like the Maxine RVM, which runs inside of itself, compiling itself with its own dynamic JIT compiler while it is running. So, in Jikes, there is still a clear separation between compiling the VM and running the VM, at least as far as I understand it. – Jörg W Mittag Dec 17 '14 at 16:10
  • I'm pretty sure for most popular languages there are compilers written in that language. – iFreilicht Dec 17 '14 at 17:10
  • I remember being blown away by code like this in a Lisp interpreter: `(defun car (cons) (car cons))`. It looks like infinite recursion, but it isn't, because of open-coding in the compiler. – Barmar Dec 23 '14 at 22:49

4 Answers

The original C# compiler wasn't written in C#; it was written in C and C++. The new Roslyn compiler was written in C#, but it was initially compiled with the old compiler. Once the new compiler was done, it was able to compile its own source code. This is called bootstrapping.

Thomas Levesque
  • So when a change has to be made to the "original compiler", does that have to be compiled with the old compiler *(written in C, C++)*? – CriketerOnSO Dec 16 '14 at 20:31
  • There would be no need to change the "original compiler"; the newer versions would be modified. – Pseudonym Dec 16 '14 at 20:35
  • @CriketerOnSO, the new compiler will replace the old one, so there will be no need to modify the old one. But if MS wanted to do that, they would recompile the old compiler with a C++ compiler, as they did before. – Thomas Levesque Dec 16 '14 at 20:40
  • @ThomasLevesque, thanks for your answer, I wasn't aware of the term bootstrapping. It is really useful. – CriketerOnSO Dec 16 '14 at 20:43
  • @PeterMortensen, what about it? To be frank, I'm not an expert on the subject, and I'm not sure what's the difference between self-hosting and bootstrapping... – Thomas Levesque Dec 17 '14 at 00:10
  • @ThomasLevesque Self-hosting is the end result of boot-strapping. – arx Dec 17 '14 at 00:19
  • @CriketerOnSO: No, of course not. How would it even be possible to compile the old compiler with the old compiler? The compiler is for C#, but it is written in C++, you need a C++ compiler to compile it. – Jörg W Mittag Dec 17 '14 at 06:02
  • If you look at the early history of C and Unix you will find a similar story. – Lee Taylor Dec 23 '14 at 20:32
  • The same applies to C/C++ compilers, which have a little 'bootstrapping' in assembly that compiles a small subset of C. That subset is used to compile a larger subset of C, adding support for more 'high-level' programming, and so on, until reaching the last/current version of the compiler, which can be tested by compiling itself. This can be used in any kind of language. – Luciano Dec 24 '14 at 13:56
  • When new keywords are added to the language, how will it be compiled? We don't (yet) have a compiler that understands the new tokens. – Sriram Sakthivel Apr 14 '15 at 11:36
  • @SriramSakthivel, the code of the compiler can't use the new keywords, at least not until there is a compiler that understands them. You always use an older version of the compiler to build the new one. – Thomas Levesque Apr 14 '15 at 13:08
  • Thanks, I got it. Thing is new compiler shouldn't use new keywords till you build a compiler which understands them. – Sriram Sakthivel Apr 14 '15 at 13:19

Compilers are utility programs: they turn programming language text into machine code. If that text happens to describe a compiler, the output is simply another compiler.

Compilers can also produce machine code for other architectures. For example, Apple compiles iOS using racks of Intel-based servers. The compiler does not have to run the ARM code it generates, just write it to disk.
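As a rough illustration of that last point, here is a minimal Python sketch of a build step that emits a binary image for a machine other than the one it runs on. The opcode table, the instruction names, and the `assemble` and `build` helpers are all made up for this example; they describe an imaginary 8-bit stack machine, not ARM or any real toolchain.

```python
from pathlib import Path

# Hypothetical opcode table for an imaginary 8-bit stack machine (an assumption).
OPCODES = {"PUSH": 0x01, "ADD": 0x02, "MUL": 0x03, "HALT": 0xFF}


def assemble(lines):
    """Encode textual instructions as bytes for the imaginary target."""
    image = bytearray()
    for line in lines:
        name, *args = line.split()
        image.append(OPCODES[name])       # one byte per opcode
        for arg in args:
            image.append(int(arg) & 0xFF)  # one byte per operand
    return bytes(image)


def build(lines, path):
    """Write the target image to disk; the host never executes it."""
    image = assemble(lines)
    Path(path).write_bytes(image)
    return len(image)
```

The host only needs to know how to encode the target's instructions. Whether anything on the build machine could run the resulting bytes is irrelevant.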

Compiler 2.0 must be written in a language that compiler 1.0 can process, but compiler 1.0 can certainly build a compiler 2.0 that has newer features such as optimization. You can then recompile the same source code with compiler 2.0 to produce a better version of itself. Again, the compiler doesn't know it's making another version of itself.

If we go far enough back into the mists of time then we do reach a point where we have no compiler - the very first iteration of a high-level language. Then we have to get out the pencils and opcode books and write the first one in assembly. How did we write the first assembler? Direct machine code entry, probably on punched paper tape, or flipping switches on the front panel.

paul
  • And the paper tape is just flipping switches via holes in the paper. :-) – Zan Lynx Dec 17 '14 at 00:11
  • Paper tape as a storage technology will *never* take off. It's just too complex and error-prone, plus it burns easily if there is a short circuit in the reader and that will completely destroy your program. – user Dec 18 '14 at 08:33

A compiler is just a program like any other program. There is nothing magical or special about it. It takes some input and produces some output. In this particular case, the input just happens to be C# and the output just happens to be CIL, but that's no different from the input being a series of tax returns and the output being a report.
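To make the "input in, output out" view concrete, here is a minimal Python sketch of a toy compiler. It assumes a made-up target (an imaginary stack machine with `PUSH`, `ADD`, `SUB` and `MUL` instructions) rather than CIL, and it borrows Python's own `ast` module for parsing, so it only illustrates the shape of the job, not how the C# compiler works.

```python
import ast
from typing import List

# Hypothetical instruction names for an imaginary stack machine (an assumption).
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source: str) -> List[str]:
    """Translate an integer arithmetic expression into stack instructions."""
    out: List[str] = []

    def emit(node: ast.AST) -> None:
        if isinstance(node, ast.Constant) and type(node.value) is int:
            out.append(f"PUSH {node.value}")
        elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
            emit(node.left)                 # operands first ...
            emit(node.right)
            out.append(OPS[type(node.op)])  # ... then the operator
        else:
            raise ValueError("unsupported construct")

    emit(ast.parse(source, mode="eval").body)
    return out
```

Calling `compile_expr("1 + 2 * 3")` returns `["PUSH 1", "PUSH 2", "PUSH 3", "MUL", "ADD"]`: text goes in, a list of instructions comes out, and nothing about the function is different in kind from a report generator.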

Jörg W Mittag
  • It is different -- it is much easier ,-). – Peter - Reinstate Monica Dec 17 '14 at 13:22
  • @PeterSchneider: People like to cast compilers as mythical magical creatures, but in the end, they are just programs that convert input to output. Pretty much every program on the planet parses some input, tries to make sense of it, and turns it into some output. In some sense, every input is a program written in some language, every program is a compiler. – Jörg W Mittag Dec 17 '14 at 13:35
  • I couldn't agree more. All I wanted to say is that tax laws are a terrible mess. By contrast, formal languages are typically well defined in a way that is suitable for automatization. Which makes a simple compiler arguably easier to write than a program dealing with taxes. Although Eric Lippert may beg to disagree wrt C# compilers, cf. http://blogs.msdn.com/b/ericlippert/archive/2010/02/04/how-many-passes.aspx. Came long way from one-pass C compilers. – Peter - Reinstate Monica Dec 17 '14 at 13:42
  • @PeterSchneider: Ah, sorry, I misinterpreted your comment by 180° :-D – Jörg W Mittag Dec 17 '14 at 14:20
  • I like this answer best as it addresses the OP's thought most directly. It clears the mist that surrounds the "all mighty" compiler. – Assaf Levy Dec 23 '14 at 22:20

You write the first compiler for a language in whatever is available. Call this program C# Compiler V 1.0; it can read and compile any C# code that uses the current set of reserved words. Now suppose you want to introduce a feature that did not exist before, such as a where clause. You write the code for C# Compiler V 2.0 without using where anywhere in it, and you compile that code with C# Compiler V 1.0 to get a new version, C# Compiler V 2.0.

You may ask: but wait, C# Compiler V 1.0 has no where clause. The point is that a compiler does a very specific job, for which you do not need more than 20% of what C# can offer anyway. Sure, it is sometimes tricky to implement new features like yield, but unless yield is expressed in simpler terms you would not be able to implement it easily anyway, regardless of the language the compiler is written in.

Once C# Compiler V 2.0 exists, you recompile its code with the new compiler, even though the where clause is not needed there and may not be used anywhere in the code for C# Compiler V 2.0. The C# Compiler V 2.0 produced from the code for C# Compiler V 2.0 by C# Compiler V 2.0 is your new C# Compiler V 2.0.

Before you do this, since your new compiler understands the new syntax, you are free to adjust the compiler code itself and use anything the new compiler can compile, if you think it will improve anything. However, there is only a small chance that new syntax will improve the compiler itself.
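The stages above can be sketched as a toy in Python. Everything here is an assumption made for illustration: "V 1.0" is played by Python's own `exec`, the new feature is an invented `unless` statement rather than where, and the names `compiler_v1`, `compile_program`, `V2_SOURCE` and `is_blank` are hypothetical. The regex rewrite is deliberately naive and is not how a real compiler adds syntax.

```python
def compiler_v1(source):
    """Stand-in for 'V 1.0': accepts plain Python only (toy assumption)."""
    namespace = {}
    exec(source, namespace)
    return namespace


# Source code of 'V 2.0'. It uses nothing that V 1.0 cannot compile, yet it
# adds one new statement: `unless cond:` is rewritten to `if not (cond):`.
V2_SOURCE = r'''
import re

def compile_program(source):
    lowered = re.sub(r"^([ \t]*)unless (.+):[ \t]*$", r"\1if not (\2):",
                     source, flags=re.MULTILINE)
    namespace = {}
    exec(lowered, namespace)
    return namespace
'''

# Stage 1: the old compiler builds the new one.
stage1 = compiler_v1(V2_SOURCE)["compile_program"]
# Stage 2: the new compiler rebuilds itself from the same source.
stage2 = stage1(V2_SOURCE)["compile_program"]

# A program that uses the new statement.
PROGRAM = '''
result = "skipped"
unless 2 + 2 == 5:
    result = "ran"
'''

# Only now may the compiler's own source start using `unless`.
V2_1_SOURCE = V2_SOURCE + '''
def is_blank(text):
    unless text.strip():
        return True
    return False
'''
stage3 = stage2(V2_1_SOURCE)
```

In this sketch, `stage2(PROGRAM)["result"]` evaluates to `"ran"`, whereas `compiler_v1(PROGRAM)` raises `SyntaxError` because the old compiler has never heard of `unless`. The revised source `V2_1_SOURCE` can use `unless` in its own code only because it is built by a compiler that already understands it.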

alex.peter