Why does a conversion from np.datetime64 to float and back lead to a time difference?

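With the following code, I get a two-hour difference after converting back to np.datetime64.

How can I avoid this? (In case it matters: I am currently in Central Europe.)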

import pandas as pd
import numpy as np
import datetime

a = np.datetime64('2018-04-01T15:30:00').astype("float")
a
b = np.datetime64(datetime.datetime.fromtimestamp(a))
b

Out[18]: numpy.datetime64('2018-04-01T17:30:00.000000')

The problem is not in the np.datetime64 conversion but in datetime.datetime.fromtimestamp.

Since NumPy 1.11, np.datetime64 is timezone naive. It no longer assumes that input is in local time, nor does it print local times.
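To check that the NumPy side of the round trip is not the culprit, here is a minimal sketch that goes to float and back without touching the datetime module at all. It assumes the value has seconds resolution, as in the question, so the float is a count of seconds since 1970-01-01; the names original, as_float and restored are only illustrative.

```python
import numpy as np

# Seconds-resolution datetime64; its float value is seconds since 1970-01-01.
original = np.datetime64('2018-04-01T15:30:00')
as_float = original.astype("float")

# Rebuild the datetime64 from the number, stating the unit explicitly.
restored = np.datetime64(int(as_float), 's')

print(restored)              # 2018-04-01T15:30:00
print(restored == original)  # True
```

No local time zone is involved anywhere here, so the value comes back unchanged.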

However, datetime.datetime.fromtimestamp does assume local time. From the docs:

Return the local date and time corresponding to the POSIX timestamp, such as is returned by time.time(). If optional argument tz is None or not specified, the timestamp is converted to the platform’s local date and time, and the returned datetime object is naive.

You can use datetime.datetime.utcfromtimestamp instead:

>>> a = np.datetime64('2018-04-01T15:30:00').astype("float")
>>> np.datetime64(datetime.datetime.utcfromtimestamp(a))
numpy.datetime64('2018-04-01T15:30:00.000000')
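Note that utcfromtimestamp is deprecated as of Python 3.12. A possible alternative, sketched here under the assumption that you want a naive UTC value to hand back to NumPy, is to pass tz explicitly to fromtimestamp and then drop the tzinfo:

```python
import datetime
import numpy as np

a = np.datetime64('2018-04-01T15:30:00').astype("float")

# Interpret the timestamp as UTC explicitly instead of as local time.
aware = datetime.datetime.fromtimestamp(a, tz=datetime.timezone.utc)

# np.datetime64 is timezone naive, so drop the tzinfo before converting.
b = np.datetime64(aware.replace(tzinfo=None))
print(b)  # 2018-04-01T15:30:00.000000
```

The result does not depend on the time zone of the machine running the code.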

Looking back at some notes, I found the following:

import numpy
dt64 = numpy.datetime64("2011-11-11 14:23:56" )

# dt64 is internally just some sort of int
#      it has no fields, and very little support in numpy

import datetime, time
dtdt = dt64.astype(datetime.datetime)         # <<<<<<<< use this!
dtdt.year
dtdt.month
dtdt.day

# to convert back:
dt64 = numpy.datetime64(dtdt)                 # <<<<<<<< use this too!
dt64.item().strftime("%Y%b%d")

The datetime and time modules are ordinary Python modules: they work quite well and have plenty of fields, conversions, and support.

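datetime64 is an incompletely implemented subtype built into numpy. It is just a 64-bit int counting time units since 1970 (seconds in this example; the unit is part of the dtype). datetime64 is quite different from datetime.datetime. If you convert a datetime64 to float and back, you can lose precision (a 64-bit float has only a 53-bit mantissa), which can introduce errors.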
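As a rough illustration of when the float conversion actually loses bits, the sketch below compares a seconds-resolution value with a nanosecond-resolution one. The variable names are made up for this example. For the seconds-resolution value in the question the count is small enough to survive the trip exactly, so precision loss alone would not account for a two-hour shift.

```python
import numpy as np

# Seconds resolution: the count fits easily in a float64's 53-bit mantissa.
sec = np.datetime64('2018-04-01T15:30:00')
sec_count = int(sec.astype('int64'))
sec_exact = int(float(sec_count)) == sec_count

# Nanosecond resolution: the count exceeds 2**53, so float64 must round it.
ns = np.datetime64('2018-04-01T15:30:00.123456789')
ns_count = int(ns.astype('int64'))
ns_exact = int(float(ns_count)) == ns_count

print(sec.dtype, sec_exact)  # datetime64[s] True
print(ns.dtype, ns_exact)    # datetime64[ns] False
```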

The (non-numpy) datetime module can also do the following:

# timedelta()
delta = datetime.timedelta(days=11, hours=10, minutes=9, seconds=8)

delta                   # datetime.timedelta(11, 36548)     # (days,seconds)
delta.days
delta.seconds
delta.microseconds
delta.total_seconds()   # 986948.0

# arithmetic: +-*/
#   2 timedelta's
#   timedelta and datetime
now = datetime.datetime.now()
christmas = datetime.datetime(2019,12,25)
delta = christmas - now

So let numpy store your date data as datetime64 when it needs to, but I recommend using the non-numpy datetime module for datetime arithmetic.


https://github.com/numpy/numpy/issues/3290

As of 1.7, datetime64 attempts to handle timezones by:

  • Assuming all datetime64 objects are in UTC
  • Applying timezone offsets when parsing ISO 8601 strings
  • Applying the Locale timezone offset when the ISO string does not specify a TZ.
  • Applying the Locale timezone offset when printing, etc.

https://stackoverflow.com/a/18817656/7583612

classmethod datetime.fromtimestamp(timestamp, tz=None)

Return the local date and time corresponding to the POSIX timestamp,
such as is returned by time.time(). If optional argument tz is None or
not specified, the timestamp is converted to the platform’s local date
and time, and the returned datetime object is naive.

Else tz must be an instance of a class tzinfo subclass, and the
timestamp is converted to tz’s time zone. In this case the result is
equivalent to
tz.fromutc(datetime.utcfromtimestamp(timestamp).replace(tzinfo=tz))
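To make the effect of the tz argument concrete, here is a small sketch that converts the same POSIX timestamp into two zones. The fixed UTC+2 offset stands in for Central European Summer Time and is an assumption made for this example; a real application would more likely use a proper time zone database.

```python
import datetime

utc = datetime.timezone.utc
# Assumed fixed offset for Central European Summer Time (UTC+2).
cest = datetime.timezone(datetime.timedelta(hours=2))

# POSIX timestamp for 2018-04-01T15:30:00 UTC.
ts = datetime.datetime(2018, 4, 1, 15, 30, tzinfo=utc).timestamp()

in_utc = datetime.datetime.fromtimestamp(ts, tz=utc)
in_cest = datetime.datetime.fromtimestamp(ts, tz=cest)

print(in_utc.isoformat())   # 2018-04-01T15:30:00+00:00
print(in_cest.isoformat())  # 2018-04-01T17:30:00+02:00
print(in_utc == in_cest)    # True: same instant, different wall-clock time
```

The 17:30 wall-clock time matches the shifted value in the question, which is what a machine set to Central European time produces when tz is omitted and the naive result is then read as if it were UTC.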