
Defining the character encoding of Python source code

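Running the following Python print statement:
print u'I "said" do not touch "this."”'
The string contains a Chinese double quote (”), so the Python interpreter reports an error. The error message is as follows: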

[wangy@bogon 文档]$ python ex1.py
  File "ex1.py", line 7
SyntaxError: Non-ASCII character '\xe2' in file ex1.py on line 7, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
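
To see where the '\xe2' in this message comes from, here is a minimal sketch. It assumes the offending character is the curly double quote U+201D saved as UTF-8; the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Assumption: the character in the source file is U+201D (right curly double quote).
quote = u"\u201d"

# Encoded as UTF-8 it becomes three bytes, none of which is ASCII.
utf8_bytes = quote.encode("utf-8")
first_byte = bytearray(utf8_bytes)[0]
print(hex(first_byte))  # the first byte the parser meets

# The same character cannot be represented in ASCII at all.
try:
    quote.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```

The first of those three bytes is the one the parser complains about, because without a declaration it expects pure ASCII.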

See the linked document: http://www.python.org/peps/pep-0263.html


Its main points are as follows:

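In Python 2.1, Unicode literals could only be written using the Latin-1 based "unicode-escape" encoding. Latin-1 covers Western European languages, so this caused a lot of trouble for programmers in Asia, who had to write their own scripts as escape sequences.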

The solution is to declare the encoding of the source file, so that the interpreter knows how the source code is encoded.

How to define the encoding:
Python will default to ASCII as standard encoding if no other encoding hints are given.

To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:

# coding=<encoding name>
or (using formats recognized by popular editors):

#!/usr/bin/python
# -*- coding: <encoding name> -*-
or:

#!/usr/bin/python
# vim: set fileencoding=<encoding name> :

It is best to use the first or the second form.
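
All three forms work because PEP 263 recognizes the declaration with a regular expression rather than a fixed syntax. The sketch below applies that expression to a few sample lines; the helper name declared_encoding is only for illustration.

```python
import re

# Regular expression given in PEP 263 for the first or second line of a file.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line):
    """Return the encoding named in a magic comment, or None if there is none."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


samples = [
    "# coding=utf-8",
    "# -*- coding: latin-1 -*-",
    "# vim: set fileencoding=utf-8 :",
    "# latin-1",  # no "coding:" or "coding=" prefix, so not a declaration
]
for line in samples:
    print("%-35s -> %s" % (line, declared_encoding(line)))
```

The vim form matches only because "fileencoding=" happens to end in "coding=".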

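The PEP specifically mentions platforms such as Windows, which add a Unicode BOM mark to the beginning of Unicode files. To accommodate them, the UTF-8 signature '\xef\xbb\xbf' at the start of a file is itself interpreted as declaring 'utf-8' encoding, so no magic encoding comment is needed in that case.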

If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'. Any other encoding will cause an error.
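
The BOM rule can be checked with the standard library. This is a sketch using Python 3's tokenize.detect_encoding, which follows PEP 263 but is not the Python 2 parser discussed here; the helper name detect is only for illustration.

```python
import codecs
import io
import tokenize


def detect(source_bytes):
    """Return the detected encoding name, or a string describing the SyntaxError."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
        return encoding
    except SyntaxError as exc:
        return "SyntaxError: %s" % exc


body = b"import os, sys\n"

# BOM only: the signature alone is treated as a UTF-8 declaration.
bom_only = detect(codecs.BOM_UTF8 + body)
# BOM plus a matching utf-8 comment: allowed.
bom_utf8 = detect(codecs.BOM_UTF8 + b"# -*- coding: utf-8 -*-\n" + body)
# BOM plus any other encoding in the comment: rejected.
bom_latin1 = detect(codecs.BOM_UTF8 + b"# -*- coding: latin-1 -*-\n" + body)

print(bom_only)
print(bom_utf8)
print(bom_latin1)
```

The first two cases report 'utf-8-sig', the name Python gives to UTF-8 with a signature. The third is refused.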

Examples
These are some examples to clarify the different styles for defining the source code encoding at the top of a Python source file:

With interpreter binary and using Emacs style file encoding comment:

#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import os, sys

#!/usr/bin/python
# -*- coding: ascii -*-
import os, sys

Without interpreter line, using plain text:
# This Python file uses the following encoding: utf-8
import os, sys

Text editors might have different ways of defining the file’s encoding, e.g.:
#!/usr/local/bin/python
# coding: latin-1
import os, sys

Without encoding comment, Python’s parser will assume ASCII text:
#!/usr/local/bin/python
import os, sys

Encoding comments which don’t work:
Missing “coding:” prefix:
#!/usr/local/bin/python
# latin-1
import os, sys

Encoding comment not on line 1 or 2:
#!/usr/local/bin/python
#
# -*- coding: latin-1 -*-
import os, sys

Unsupported encoding:
#!/usr/local/bin/python
# -*- coding: utf-42 -*-
import os, sys
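
The three failing cases can be tried out with Python 3's tokenize.detect_encoding. Treat this as a sketch only: Python 3 falls back to UTF-8 where the Python 2 parser described above fell back to ASCII. The helper name check is only for illustration.

```python
import io
import tokenize


def check(source_text):
    """Return the encoding Python 3 would detect, or "error" if it is rejected."""
    data = source_text.encode("ascii")
    try:
        return tokenize.detect_encoding(io.BytesIO(data).readline)[0]
    except SyntaxError:
        return "error"


# A working declaration on line 2, for comparison.
working = check("#!/usr/local/bin/python\n# -*- coding: latin-1 -*-\nimport os, sys\n")
# Missing "coding:" prefix: the comment is ignored and the default applies.
missing_prefix = check("#!/usr/local/bin/python\n# latin-1\nimport os, sys\n")
# Declaration on line 3: never examined, so the default applies.
third_line = check("#!/usr/local/bin/python\n#\n# -*- coding: latin-1 -*-\nimport os, sys\n")
# Unknown encoding name: rejected outright.
unsupported = check("#!/usr/local/bin/python\n# -*- coding: utf-42 -*-\nimport os, sys\n")

print(working, missing_prefix, third_line, unsupported)
```

The first two failures are silent: the comment is simply not treated as a declaration. Only the unsupported encoding name raises an error.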

Modify the source code by adding an encoding declaration and saving the file as UTF-8 (the editor used here was gedit on Linux):
# -*- coding: utf-8 -*-
print "hello world!"
print "hello Again"
print "I like trying this"
print "This is fun"
print 'Yay! Printing'
print "I'd much rather you 'not'."
print u'I "said" 这里有中文双引号 "this."”'

The script now prints normally.
