Unicode Problem with SQLAlchemy

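I know I have a problem with a conversion from Unicode, but I'm not sure where it is happening.

I'm extracting data about a recent European trip from a directory of HTML files. Some of the location names contain non-ASCII characters (such as é, ô, ü). I get the data from a string representation of each file using regex.

If I print the locations as I find them, they print with the correct characters, so the encoding must be OK at that point: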

Le Pré-Saint-Gervais, France
Hôtel-de-Ville, France

I'm storing the data in a SQLite table using SQLAlchemy:

from sqlalchemy import create_engine, Column, Integer, Date, Time, Unicode, String, Float
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Point(Base):
    __tablename__ = 'points'

    id = Column(Integer, primary_key=True)
    pdate = Column(Date)
    ptime = Column(Time)
    location = Column(Unicode(32))
    weather = Column(String(16))
    high = Column(Float)
    low = Column(Float)
    lat = Column(String(16))
    lon = Column(String(16))
    image = Column(String(64))
    caption = Column(String(64))

    def __init__(self, filename, pdate, ptime, location, weather, high, low, lat, lon, image, caption):
        # filename is not a mapped column; it is kept as a plain attribute
        self.filename = filename
        self.pdate = pdate
        self.ptime = ptime
        self.location = location
        self.weather = weather
        self.high = high
        self.low = low
        self.lat = lat
        self.lon = lon
        self.image = image
        self.caption = caption

    def __repr__(self):
        return "<Point('%s','%s','%s')>" % (self.filename, self.pdate, self.ptime)

engine = create_engine('sqlite:///:memory:', echo=False)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

I loop through the files and insert the data from each one into the database:

for filename in filelist:

    # open the file and extract the information using regex such as:
    location_re = re.compile("(.*)", re.M)
    # extract other data

    newpoint = Point(filename, pdate, ptime, location, weather, high, low, lat, lon, image, caption)
    session.add(newpoint)
    session.commit()

I see the following warning on each insert:

/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/engine/default.py:230: SAWarning: Unicode type received non-unicode bind param value 'Spitalfields, United Kingdom'
  param.append(processors[key](compiled_params[key]))

And when I try to do anything with the table, such as:

session.query(Point).all()

I get:

Traceback (most recent call last):
  File "./extract_trips.py", line 131, in <module>
    session.query(Point).all()
  File "/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/orm/query.py", line 1193, in all
    return list(self)
  File "/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/orm/query.py", line 1341, in instances
    fetch = cursor.fetchall()
  File "/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/engine/base.py", line 1642, in fetchall
    self.connection._handle_dbapi_exception(e, None, None, self.cursor, self.context)
  File "/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/engine/base.py", line 931, in _handle_dbapi_exception
    raise exc.DBAPIError.instance(statement, parameters, e, connection_invalidated=is_disconnect)
sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'points_location' with text 'Le Pré-Saint-Gervais, France' None None

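I would like to be able to store and then return the location names correctly, with the original characters intact. Any help would be much appreciated.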

I found this article, which helped explain my problem:

http://www.amk.ca/python/howto/unicode#reading-and-writing-unicode-data

I was able to get the desired results by using the codecs module and changing my program as follows:

When opening the file:

(The code for this step is missing from the source.)
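As a rough sketch only, opening a file through the `codecs` module might look like the following. The file name `trip.html`, its contents, and the ISO-8859-1 encoding are assumptions for illustration; the encoding is inferred from the `print` call in the next step, not taken from the original program.

```python
import codecs
import os
import tempfile

# Hypothetical sample file: write ISO-8859-1 bytes so there is something to read.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "trip.html")
with open(path, "wb") as f:
    f.write(u"Le Pr\u00e9-Saint-Gervais, France".encode("iso-8859-1"))

# codecs.open decodes while reading, so the program only ever sees text
# (a unicode object in Python 2, a str in Python 3), never raw bytes.
infile = codecs.open(path, "r", encoding="iso-8859-1")
try:
    html = infile.read()
finally:
    infile.close()
```

With the text decoded at the input boundary, any location extracted from `html` by a regex is already a unicode object by the time it reaches the `Unicode` column.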

When printing the location:

print location.encode('ISO-8859-1')

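I can now query and manipulate data from the table without the earlier error. I just have to specify the encoding when I output the text.

(I still don't entirely understand how this works, so I guess it's time to learn more about Python's Unicode handling.)

Related discussion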

From sqlalchemy.org

See section 0.4.2

added new flag to String and create_engine(), assert_unicode=(True|False|'warn'|None). Defaults to False or None on create_engine() and String, 'warn' on the Unicode type. When True, results in all unicode conversion operations raising an exception when a non-unicode bytestring is passed as a bind parameter. 'warn' results in a warning. It is strongly advised that all unicode-aware applications make proper use of Python unicode objects (i.e. u'hello' and not 'hello') so that data round trips accurately.
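To make the three modes concrete, here is a small stand-alone sketch of the kind of check the changelog entry describes. It does not use SQLAlchemy: `check_bind_value` is a hypothetical helper, the exception type is an arbitrary choice, and the code is written in Python 3 terms, where `str` plays the role of Python 2's `unicode` and `bytes` the role of a plain bytestring.

```python
import warnings


def check_bind_value(value, assert_unicode="warn"):
    """Hypothetical stand-in for the check described above (not SQLAlchemy code)."""
    if isinstance(value, bytes):
        message = "Unicode type received non-unicode bind param value %r" % (value,)
        if assert_unicode is True:
            # Strict mode: refuse the bytestring outright.
            raise TypeError(message)
        if assert_unicode == "warn":
            # Default for the Unicode type: accept it, but complain.
            warnings.warn(message)
    return value


# Text passes silently; a bytestring triggers the configured behaviour.
check_bind_value(u"Spitalfields, United Kingdom")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_bind_value(b"Spitalfields, United Kingdom")
```

The warning recorded in `caught` corresponds to the `SAWarning` in the question: the value reaching the `Unicode` column was a bytestring rather than a unicode object.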

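I think you are trying to pass in a non-unicode bytestring. Perhaps this will put you on the right track? Some form of conversion is needed; compare 'hello' and u'hello'.

Cheers

Try using a column type of Unicode rather than String for the unicode columns: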

(The code listing for this answer is missing from the source.)

Edit: Response to comment:

If you're getting a warning about unicode encodings, then there are two things you can try:

Convert your location to unicode. This would mean having your Point created like this:

newpoint = Point(filename, pdate, ptime, unicode(location), weather, high, low, lat, lon, image, caption)

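The unicode conversion will produce a unicode string when passed either a string or a unicode string, so you don't need to worry about what you pass in.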
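One caveat is worth illustrating: in Python 2, `unicode(value)` with no encoding argument decodes byte strings as ASCII by default, so non-ASCII bytes such as the `\xe9` in "Pré" need an explicit encoding. Below is a minimal sketch of a conversion that accepts either kind of string; `to_text` is a hypothetical helper, and ISO-8859-1 is an assumed source encoding.

```python
def to_text(value, encoding="iso-8859-1"):
    """Return text unchanged; decode byte strings with an explicit encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"Le Pr\xe9-Saint-Gervais, France"  # bytes as read from an ISO-8859-1 file
location = to_text(raw)                    # decoded into text
same = to_text(location)                   # already text, passed through untouched
```

Passing `to_text(location)` instead of `unicode(location)` to `Point` would be one way to apply this, assuming the files really are ISO-8859-1.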

If that doesn't solve the encoding issues, try calling encode on your unicode objects. This would mean using code like:

newpoint = Point(filename, pdate, ptime, unicode(location).encode('utf-8'), weather, high, low, lat, lon, image, caption)

This step probably won't be necessary, but what it essentially does is convert a unicode object from Unicode code points to a specific byte representation (in this case, UTF-8). I'd expect SQLAlchemy to do this for you when you pass in unicode objects, but it may not.
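To make the distinction between code points and bytes concrete, the sketch below encodes one of the location names from the question. It uses only built-in string methods, and the variable names are illustrative.

```python
location = u"H\u00f4tel-de-Ville, France"      # text: a sequence of code points

utf8_bytes = location.encode("utf-8")           # the o-circumflex becomes two bytes: C3 B4
latin1_bytes = location.encode("iso-8859-1")    # the o-circumflex becomes one byte: F4

# Decoding with the codec that produced the bytes recovers the original text.
round_trip = utf8_bytes.decode("utf-8")
```

Decoding `latin1_bytes` as UTF-8 raises `UnicodeDecodeError`, which is likely the same kind of failure behind the "Could not decode to UTF-8" error in the question.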