163

Python recognizes the following as an instruction that defines a file's encoding:

# -*- coding: utf-8 -*-

I have definitely seen this kind of instruction before (-*- var: value -*-). Where does it come from? What is the full specification? For example, can the value include spaces, special symbols, newlines, or even -*- itself?

My program will be writing plain text files and I'd like to include some metadata in them using this format.

Wai Ha Lee
hamstergene
  • 5
    This is easier to remember and works in my editor, PyCharm. `# coding: utf-8` – crizCraig Apr 27 '12 at 00:23
  • 3
    Using `# coding: utf8` works out of the box with Python 2.7, even outside of PyCharm. (I use SublimeText). – Basj Feb 23 '18 at 14:48
  • 2
    [File local variable in Emacs](https://www.gnu.org/software/emacs/manual/html_node/emacs/Specifying-File-Variables.html), [cookie in SciTE](https://www.scintilla.org/SciTEDoc.html#Encodings), [Encoding declarations in Python](https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations) and [Modeline in Vim](http://vimdoc.sourceforge.net/htmldoc/usr_21.html#21.6). – Vlastimil Ovčáčík Apr 27 '18 at 17:44
  • 2
    @Cbhihe This question is not about Python, not about what the instruction does or how it works. It is asking which pre-Python software invented it and if there is more to it than just file encoding. – hamstergene Apr 04 '19 at 18:47

4 Answers

101

This way of specifying the encoding of a Python file comes from PEP 0263 - Defining Python Source Code Encodings.

It is also recognized by GNU Emacs (see Python Language Reference, 2.1.4 Encoding declarations), though I don't know if it was the first program to use that syntax.
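
As a rough illustration of how such a declaration can be detected, here is a minimal sketch built around the regular expression given in PEP 263. The helper name `find_declared_encoding` is hypothetical, and the sketch is simplified: it only looks for a matching comment on the first two lines and skips other rules the real interpreter applies (for example, byte order mark handling).

```python
import re

# Pattern quoted from PEP 263: a comment containing "coding:" or "coding=".
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_declared_encoding(source_text):
    """Return the declared encoding name, or None if none is found.

    Hypothetical helper for illustration; only the first two lines are checked.
    """
    for line in source_text.split("\n")[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


print(find_declared_encoding("# -*- coding: utf-8 -*-\nprint(1)\n"))
print(find_declared_encoding("#!/usr/bin/env python\n# coding: latin-1\n"))
```

Because the pattern only requires `coding:` or `coding=` somewhere in a comment, the Emacs-style `-*-` markers are optional as far as Python is concerned, which is why the shorter `# coding: utf-8` form also works.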

Andrea Spadaccini
  • 12,000
  • 5
  • 39
  • 54
  • 5
    From what I can gather from the Emacs manual, the value can be any Lisp expression, in particular a double-quoted string – hamstergene Feb 02 '11 at 10:19
  • Thanks for the PEP link. I was formerly under the impression that the directive was only used by the text editor. Until now, I never knew that the Python interpreter actually parses the comment if it is present in the first two lines of the file. – umeboshi Dec 26 '14 at 03:08
56

# -*- coding: utf-8 -*- is a Python 2 thing.

In Python 3.0+ the default encoding of source files is already UTF-8, so you can safely delete that line: unless it says something other than some variation of "utf-8", it has no effect. See Should I use encoding declaration in Python 3?
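
One way to check this yourself is with the standard library's `tokenize.detect_encoding`, which reports the encoding Python 3 would use for a source file. The sketch below is only an illustration; the wrapper `detected_encoding` and the sample byte strings are made up for the example.

```python
import io
import tokenize


def detected_encoding(source_bytes):
    """Return the encoding name the standard tokenizer reports for source_bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Sample sources (assumed for illustration): no declaration, UTF-8, Latin-1.
no_cookie = b"print('hello')\n"
utf8_cookie = b"# -*- coding: utf-8 -*-\nprint('hello')\n"
latin1_cookie = b"# -*- coding: latin-1 -*-\nprint('hello')\n"

for sample in (no_cookie, utf8_cookie, latin1_cookie):
    print(detected_encoding(sample))
```

The first two samples should report the same encoding, which is the sense in which the UTF-8 declaration is redundant in Python 3; only a declaration naming a different encoding changes the result.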


pyupgrade is a tool you can run on your code to remove those comments and other no-longer-useful leftovers from Python 2, like having all your classes inherit from object.

Boris Verkhovskiy
10

These are so-called file local variables, which Emacs understands and sets accordingly. See the corresponding section in the Emacs manual; you can define them either in the header or in the footer of the file.
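
For the use case in the question (writing metadata into plain text files), a first-line header in the `-*- var: value; var: value -*-` form could be written and read back roughly as follows. This is a simplified sketch, not Emacs's actual parser: the function names and the `comment_prefix` parameter are hypothetical, and it assumes values contain no `;` or `-*-` and that variable names contain no `:`.

```python
import re

# Matches the text between the two -*- markers on a single line.
MARKER_RE = re.compile(r"-\*-(.*?)-\*-")


def format_file_variables(variables, comment_prefix="# "):
    """Build a one-line header such as '# -*- coding: utf-8; mode: python -*-'."""
    body = "; ".join(f"{name}: {value}" for name, value in variables.items())
    return f"{comment_prefix}-*- {body} -*-"


def parse_file_variables(first_line):
    """Return a dict of the 'name: value' pairs found between the -*- markers."""
    match = MARKER_RE.search(first_line)
    if match is None:
        return {}
    result = {}
    for item in match.group(1).split(";"):
        name, separator, value = item.partition(":")
        if separator:  # skip items without a colon, e.g. a bare mode name
            result[name.strip()] = value.strip()
    return result


header = format_file_variables({"coding": "utf-8", "mode": "python"})
print(header)
print(parse_file_variables(header))
```

Splitting on the first `:` only means a value may itself contain colons, but anything that needs `;` or `-*-` inside a value would require real quoting rules, which this sketch deliberately leaves out.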

Alex Ott
  • 1
    This specific type of file local variable is also understood by the Python interpreter itself, it's not just for text editors. https://stackoverflow.com/questions/41680533/is-coding-utf-8-also-a-comment-in-python – Boris Verkhovskiy Jan 09 '20 at 04:25
4

In PyCharm, I'd leave it out. It turns off the UTF-8 indicator at the bottom and shows a warning that the encoding is hard-coded. I don't think you need the PyCharm comment mentioned above.

cwp393
  • Actually, if I add a line like `test1 = 'äöü'`, PyCharm hints that you should add such a header to the file. (PyCharm 2019.1) – Cutton Eye Sep 27 '19 at 10:33