Is it possible to "hack" Python's print function?

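Note: This question is for informational purposes only. I am interested to see how deep into Python's internals it is possible to go with this.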

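Not long ago, a discussion started inside a certain question about whether the string passed to a print call could be modified after or during the call to print. For example, consider the function: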

def print_something():
    print('This cat was scared.')

Now, when print is run, the output on the terminal should display:

This dog was scared.

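Notice that the word "cat" has been replaced by the word "dog". Something, somewhere, was somehow able to modify those internal buffers to change what was printed. Assume this is done without the original code author's explicit permission (hence, hacking/hijacking).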

This comment from the wise @abarnert, in particular, got me thinking:

There are a couple of ways to do that, but they're all very ugly, and
should never be done. The least ugly way is to probably replace the
code object inside the function with one with a different co_consts
list. Next is probably reaching into the C API to access the str's
internal buffer. [...]

It looks like this is actually possible.

Here is my naive way of approaching this problem:

>>> import inspect
>>> exec(inspect.getsource(print_something).replace('cat', 'dog'))
>>> print_something()
This dog was scared.

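Of course, exec is bad, but that doesn't really answer the question, because it does not actually modify anything during or after the call to print.

How would it be done as @abarnert has explained it?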

First, there's actually a much less hacky way. All we want to do is change what print prints, right?

_print = print
def print(*args, **kw):
    args = (arg.replace('cat', 'dog') if isinstance(arg, str) else arg
            for arg in args)
    _print(*args, **kw)

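Or, similarly, you can monkeypatch sys.stdout instead of print.

Also, there's nothing wrong with the exec … getsource … idea. Well, of course there's plenty wrong with it, but less than what follows here…

But if you do want to modify the function object's code constants, we can do that.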
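
Before rewriting anything, it can help to see where the literal actually lives. The following is a minimal sketch, assuming CPython (opcode names are an implementation detail and differ between versions). `print_something` is the function from the question; `load_const_values` is a name introduced here purely for illustration.

```python
import dis

def print_something():
    print('This cat was scared.')

# The string literal is stored in the code object's constants tuple.
consts = print_something.__code__.co_consts

# Collect the values loaded by LOAD_CONST-style instructions
# (a CPython implementation detail, not a language guarantee).
load_const_values = [ins.argval
                     for ins in dis.get_instructions(print_something)
                     if ins.opname.startswith('LOAD_CONST')]

print(consts)
print(load_const_values)
```

The bytecode only refers to the literal by its index into `co_consts`, which is why replacing that tuple is enough to change what gets printed.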

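If you really want to play around with code objects for real, you should use a library like bytecode (when it's finished) or byteplay (until then, or for older Python versions) instead of doing it by hand. Even for something this trivial, the CodeType initializer is a pain; if you actually need to do things like fixing up lnotab, only a lunatic would do that manually.

Also, it goes without saying that not all Python implementations use CPython-style code objects. This code will work in CPython 3.7, and probably in all versions back to at least 2.2 with a few minor changes (not to the code-hacking parts, but to things like generator expressions), but it won't work with any version of IronPython.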

import types

def print_function():
    print ("This cat was scared.")

def main():
    # A function object is a wrapper around a code object, with
    # a bit of extra stuff like default values and closure cells.
    # See inspect module docs for more details.
    co = print_function.__code__
    # A code object is a wrapper around a string of bytecode, with a
    # whole bunch of extra stuff, including a list of constants used
    # by that bytecode. Again see inspect module docs. Anyway, inside
    # the bytecode for string (which you can read by typing
    # dis.dis(string) in your REPL), there's going to be an
    # instruction like LOAD_CONST 1 to load the string literal onto
    # the stack to pass to the print function, and that works by just
    # reading co.co_consts[1]. So, that's what we want to change.
    consts = tuple(c.replace("cat", "dog") if isinstance(c, str) else c
                   for c in co.co_consts)
    # Unfortunately, code objects are immutable, so we have to create
    # a new one, copying over everything except for co_consts, which
    # we'll replace. And the initializer has a zillion parameters.
    # Try help(types.CodeType) at the REPL to see the whole list.
    co = types.CodeType(
        co.co_argcount, co.co_kwonlyargcount, co.co_nlocals,
        co.co_stacksize, co.co_flags, co.co_code,
        consts, co.co_names, co.co_varnames, co.co_filename,
        co.co_name, co.co_firstlineno, co.co_lnotab,
        co.co_freevars, co.co_cellvars)
    print_function.__code__ = co
    print_function()

main()
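
The positional CodeType constructor call above is tied to the 3.7 signature, and later CPython versions changed its parameter list. As a hedged alternative, here is a minimal sketch that assumes CPython 3.8 or newer, where code objects have a `replace()` method that copies everything except the named fields. The helper name `swap_constants` is made up for this example.

```python
import contextlib
import io

def print_function():
    print("This cat was scared.")

def swap_constants(func, old, new):
    """Rebind func.__code__ to a copy whose str constants are rewritten."""
    co = func.__code__
    consts = tuple(c.replace(old, new) if isinstance(c, str) else c
                   for c in co.co_consts)
    # CodeType.replace() (CPython 3.8+) returns a new code object with
    # only the given fields changed, so no long constructor call is needed.
    func.__code__ = co.replace(co_consts=consts)

swap_constants(print_function, "cat", "dog")

# Capture the output so it can be inspected as well as displayed.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print_function()
output = buf.getvalue()
print(output, end="")
```

This is the same technique as above (a new tuple in a new code object); only the way the copy is built differs.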

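What could go wrong with hacking up code objects? Mostly just segfaults, RuntimeErrors that eat up the whole stack, more normal RuntimeErrors that can be handled, or garbage values that will probably just raise a TypeError or AttributeError when you try to use them. For examples, try creating a code object with just a RETURN_VALUE and nothing on the stack (bytecode b'S\0' for 3.6+, b'S' before), or with an empty tuple for co_consts when there's a LOAD_CONST 0 in the bytecode, or with varnames decremented by 1 so that the highest LOAD_FAST actually loads a freevar/cellvar cell. For some real fun, if you get the lnotab wrong enough, your code will only segfault when run in the debugger.

Using bytecode or byteplay won't protect you from all of those problems, but they do have some basic sanity checks, and nice helpers that let you do things like insert a chunk of code and let the library worry about updating all the offsets and labels so you can't get them wrong, and so on. (Plus, they keep you from having to type in that ridiculous 6-line constructor, and from having to debug the silly typos that come from doing so.)

Now on to #2.

I mentioned that code objects are immutable. And of course the consts are a tuple, so we can't change that directly. And the thing in the consts tuple is a string, which we also can't change directly. That's why I had to build a new string to build a new tuple to build a new code object.

But what if you could change a string directly?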

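Well, deep enough under the covers, everything is just a pointer to some C data, right? If you're using CPython, there's a C API to access the objects, and you can use ctypes to access that API from within Python itself, which is such a terrible idea that they put a pythonapi right there in the stdlib's ctypes module. :) The most important trick you need to know is that id(x) is the actual pointer to x in memory (as an int).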
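
The id(x) trick can be checked without touching any internals. This is a small sketch, assuming CPython (other implementations are free to return something other than an address from `id()`). It only reads: the address is cast back into an object reference and compared with the original.

```python
import ctypes

s = "This cat was scared."

# In CPython, id() returns the object's memory address as an int.
address = id(s)

# Casting that address back to a Python object yields the very same object.
recovered = ctypes.cast(address, ctypes.py_object).value

print(hex(address))
print(recovered is s)
```

Everything that follows builds on this: once you have the address, ctypes lets you overlay a struct on it and read, or write, the raw fields.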

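Unfortunately, the C API for strings won't let us safely get at the internal storage of an already-frozen string. So screw doing it safely; let's just read the header files and find that storage ourselves.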

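If you're using CPython 3.4 - 3.7 (it's different for older versions, and who knows for the future), a string literal from a module that's made of pure ASCII is going to be stored using the compact ASCII format, which means the struct ends early and the buffer of ASCII bytes follows immediately in memory. This will break (as in probably segfault) if you put a non-ASCII character in the string, or with certain kinds of non-literal strings, but you can read up on the other 4 ways to access the buffer for different kinds of strings.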
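
One harmless way to observe the compact layout from the outside is to watch how the object size grows with length. This is a rough sketch, assuming CPython 3.3 or later (`sys.getsizeof` reports implementation-specific sizes, so the numbers mean nothing elsewhere). The variable names are illustrative only.

```python
import sys

# For a pure-ASCII string the character buffer sits right after the
# struct header, one byte per character.
ascii_step = sys.getsizeof('a' * 10) - sys.getsizeof('a' * 9)

# A string containing characters above U+00FF uses a wider storage
# format, so each extra character costs more than one byte.
wide_step = sys.getsizeof('\u20ac' * 10) - sys.getsizeof('\u20ac' * 9)

print(ascii_step, wide_step)
```

On CPython the ASCII step is expected to be a single byte and the wide step larger, which is why the in-place overwrite shown next is only attempted after asserting that the string is compact and ASCII.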

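To make things slightly easier, I'm using the superhackyinternals project off my GitHub. (It's intentionally not pip-installable, because you really shouldn't be using this except to experiment with your local build of the interpreter and the like.)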

import ctypes
import internals # https://github.com/abarnert/superhackyinternals/blob/master/internals.py

def print_function():
    print ("This cat was scared.")

def main():
    for c in print_function.__code__.co_consts:
        if isinstance(c, str):
            idx = c.find('cat')
            if idx != -1:
                # Too much to explain here; just guess and learn to
                # love the segfaults...
                p = internals.PyUnicodeObject.from_address(id(c))
                assert p.compact and p.ascii
                addr = id(c) + internals.PyUnicodeObject.utf8_length.offset
                buf = (ctypes.c_int8 * 3).from_address(addr + idx)
                buf[:3] = b'dog'

    print_function()

main()

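If you want to play with this stuff, int is a whole lot simpler under the covers than str. And it's a lot easier to guess what you can break by changing the value of 2 to 1, right? Actually, forget imagining, let's just do it (using the types from superhackyinternals again):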

>>> n = 2
>>> pn = PyLongObject.from_address(id(n))
>>> pn.ob_digit[0]
2
>>> pn.ob_digit[0] = 1
>>> 2
1
>>> n * 3
3
>>> i = 10
>>> while i < 40:
...     i *= 2
...     print(i)
...
10
10
10

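… pretend that code box has an infinite-length scrollbar.

I tried the same thing in IPython, and the first time I tried to evaluate 2 at the prompt, it went into some kind of uninterruptible infinite loop. Presumably it's using the number 2 for something in its REPL loop, while the stock interpreter isn't?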

Related discussion

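@ C？？？？S？？？？？ Code munging is arguably reasonable Python, although you generally only want to touch code objects for much better reasons (e.g., running the bytecode through a custom optimizer). Accessing the internal storage of a PyUnicodeObject, on the other hand, is probably only really Python in the sense that a Python interpreter will run it…
Your first code snippet raises NameError: name 'arg' is not defined. Did you mean: args = [arg.replace('cat', 'dog') if isinstance(arg, str) else arg for arg in args]? An arguably better way to write this would be: args = [str(arg).replace('cat', 'dog') for arg in args]. Another, even shorter option: args = map(lambda a: str(a).replace('cat', 'dog'), args). This has the added benefit that args is lazily evaluated (which could also be accomplished by replacing the list comprehension above with a generator expression; *args works either way).
@Konstantin: Thanks for the catch. I changed it to fix the existing listcomp rather than str-ifying non-str arguments. After all, this way, if someone has intercepted print before us and replaced it with a function that does something different with non-str arguments, it will still work. (Who would do such a thing? Well, who would do any of the things in this code?)
@Konstantin As for using map: in general, if the mapped expression is anything more than a function call, a comprehension is clearer than wrapping the expression in a function to pass to map. If you want it lazy, you can use a genexpr instead of a listcomp, but since it's being splatted into a function call, it's just going to get listified anyway… well, it's a judgment call, but I think you're right here; we don't care whether it gets listified, so there's no reason to force it to be strict. I'll edit.
Ah, okay, import internals comes from your GitHub repository. By the way, all fixed.
@ C？？？？S？？？？ Yeah, IIRC I'm only using the PyUnicodeObject struct definition, but copying that into the answer would, I think, just get in the way, and I think the readme and/or source comments of superhackyinternals actually explain how to access the buffer (at least well enough to remind me the next time I care; not sure whether it's enough for anyone else…), which I didn't want to get into here. The relevant part is how to get from a live Python object to its PyObject * via ctypes. (And maybe simulating pointer arithmetic, avoiding automatic char_p conversions, etc.)
Can we reassign print inside __builtins__ before any modules are imported, to make the first solution apply more globally?
@JPMC26 Sure you can. In general it's an ugly thing to do, but nowhere near as ugly as modifying the internal buffer of a string object.
@JPMC26 I don't think you need to do it before importing modules, as long as you do it before the module prints. Modules do the name lookup every time, unless they explicitly bind print to a name. You can also bind the name print for them: import yourmodule; yourmodule.print = badprint.
@leewz Yes, that's one of the advantages and/or disadvantages of replacing sys.stdout instead: each module has its own globals, but there is only one sys. Of course you can also inject a fake sys module into a module pretty easily, as long as you do it after they import sys but before they use it. (Which probably means an import hook rather than just monkeypatching from the outside.)
@abarnert: I've noticed you often warn against doing this (e.g., "you never want to actually do this", "why it's a bad idea to change the value", etc.). It's not exactly clear what could possibly go wrong (sarcasm); would you be willing to elaborate a bit on that? It might help those tempted to try it blindly.
@L'L'L I added a bit about what kinds of problems each of these techniques can lead to. But really, you can't wake Cthulhu by tampering with int, only with str, so don't worry too much. :)
Love the latest addition. I knew you could do this in Java, but I didn't know rewriting literals was also this easy in Python.
@ C？？？？S？？？？ Even Intercal's famous operand overloading can't overload immediate constant values without a special command-line switch. :)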
Alas, my Fortran past haunts me in Python! :-)

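Monkey-patch print

print is a builtin function, so it will use the print function defined in the builtins module (or __builtin__ in Python 2). So whenever you want to modify or change the behavior of a builtin function, you can simply reassign the name in that module.

This process is called monkey-patching.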

# Store the real print function in another variable otherwise
# it will be inaccessible after being modified.
_print = print

# Actual implementation of the new print
def custom_print(*args, **options):
    _print('custom print called')
    _print(*args, **options)

# Change the print function globally
import builtins
builtins.print = custom_print

After that, every print call will go through custom_print, even if the print is in an external module.
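
To see why the patch reaches other modules, here is a hedged sketch that simulates an external module in-process. The module name `external_module` and its function `shout` are invented for the example; the point is only that a module which never binds its own `print` falls back to the builtins namespace on every call.

```python
import builtins
import contextlib
import io
import types

# Build a stand-in for "some other module" that calls print().
external = types.ModuleType("external_module")  # hypothetical module
exec("def shout():\n    print('This cat was scared.')\n", external.__dict__)

original_print = builtins.print

def custom_print(*args, **options):
    original_print('custom print called')
    original_print(*args, **options)

buf = io.StringIO()
builtins.print = custom_print
try:
    with contextlib.redirect_stdout(buf):
        external.shout()  # looks up print at call time, finds custom_print
finally:
    builtins.print = original_print  # always undo the global patch

lines = buf.getvalue().splitlines()
original_print(lines)
```

The try/finally is a design choice for the sketch: a global patch that is never undone affects everything that runs afterwards, including unrelated code.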

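However, you don't really want to print additional text; you want to change the text that is printed. One way to go about that is to replace it in the string that would be printed: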

_print = print

def custom_print(*args, **options):
    # Get the desired separator or the default whitespace
    sep = options.pop('sep', ' ')
    # Create the final string (convert each argument to str first,
    # as print itself does, so non-string arguments don't break join)
    printed_string = sep.join(str(arg) for arg in args)
    # Modify the final string
    printed_string = printed_string.replace('cat', 'dog')
    # Call the default print function
    _print(printed_string, **options)

import builtins
builtins.print = custom_print

And indeed, if you run:

>>> def print_something():
...     print('This cat was scared.')
>>> print_something()
This dog was scared.

Or if you write that to a file:

test_file.py

def print_something():
    print('This cat was scared.')

print_something()

and import it:

>>> import test_file
This dog was scared.
>>> test_file.print_something()
This dog was scared.

So it really works as intended.

However, if you only want to monkey-patch print temporarily, you can wrap this in a context manager:

import builtins

class ChangePrint(object):
    def __init__(self):
        self.old_print = print

    def __enter__(self):
        def custom_print(*args, **options):
            # Get the desired separator or the default whitespace
            sep = options.pop('sep', ' ')
            # Create the final string (convert each argument to str
            # first, as print itself does)
            printed_string = sep.join(str(arg) for arg in args)
            # Modify the final string
            printed_string = printed_string.replace('cat', 'dog')
            # Call the default print function
            self.old_print(printed_string, **options)

        builtins.print = custom_print

    def __exit__(self, *args, **kwargs):
        builtins.print = self.old_print

So when you run that, what is printed depends on the context:

>>> with ChangePrint() as x:
...     test_file.print_something()
...
This dog was scared.
>>> test_file.print_something()
This cat was scared.
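
The same temporary patch can also be written as a generator-based context manager. This is a minimal sketch, not the answer's own code: `replaced_print` is a hypothetical helper, it leaves non-string arguments untouched instead of joining them, and it uses try/finally so that print is restored even if the body raises.

```python
import builtins
import contextlib
import io

@contextlib.contextmanager
def replaced_print(old, new):
    """Temporarily rewrite str arguments passed to print (illustrative)."""
    original = builtins.print

    def custom_print(*args, **options):
        args = tuple(a.replace(old, new) if isinstance(a, str) else a
                     for a in args)
        original(*args, **options)

    builtins.print = custom_print
    try:
        yield
    finally:
        builtins.print = original  # restored even if the body raises

def print_something():
    print('This cat was scared.')

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    with replaced_print('cat', 'dog'):
        print_something()   # patched
    print_something()       # back to normal
captured = buf.getvalue()
print(captured, end="")
```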

So that's how you can "hack" print by monkey-patching.

Modify the target instead of print

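If you look at the signature of print, you'll notice a file argument that defaults to sys.stdout. Note that this is a dynamic default argument (it really looks up sys.stdout every time you call print), unlike normal default arguments in Python. So if you change sys.stdout, print will actually print to the different target. Even more conveniently, Python also provides a redirect_stdout function (from Python 3.4 on, but it's easy to create an equivalent function for earlier Python versions).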
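
The difference between print's dynamic lookup and an ordinary default argument can be shown side by side. This is a small sketch; `normal_default` is a throwaway function written only for the comparison.

```python
import io
import sys

def normal_default(target=sys.stdout):
    # An ordinary default is evaluated once, when the def statement runs.
    return target

buf = io.StringIO()
saved = sys.stdout
sys.stdout = buf
try:
    print("This cat was scared.")             # print re-reads sys.stdout
    default_follows_swap = normal_default() is buf
finally:
    sys.stdout = saved

print(repr(buf.getvalue()), default_follows_swap)
```

The print call lands in the replacement stream, while the ordinary default still refers to whatever sys.stdout was when the function was defined.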

The downside is that it won't work for print calls that don't print to sys.stdout, and that creating your own stdout isn't really straightforward.

import io
import sys

class CustomStdout(object):
    def __init__(self, *args, **kwargs):
        self.current_stdout = sys.stdout

    def write(self, string):
        self.current_stdout.write(string.replace('cat', 'dog'))

However, this also works:

>>> import contextlib
>>> with contextlib.redirect_stdout(CustomStdout()):
...     test_file.print_something()
...
This dog was scared.
>>> test_file.print_something()
This cat was scared.
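
A stand-in stdout that only defines write can trip over callers that expect more of the stream interface, such as print(..., flush=True). As a hedged sketch of a slightly fuller replacement, the class below derives from io.TextIOBase and forwards flush. `ReplacingStdout` and its parameters are names invented here, not part of any library.

```python
import contextlib
import io

class ReplacingStdout(io.TextIOBase):
    """Text stream that rewrites each chunk before forwarding it (illustrative)."""

    def __init__(self, target, old, new):
        super().__init__()
        self._target = target
        self._old = old
        self._new = new

    def writable(self):
        return True

    def write(self, string):
        # Replacement happens per write() call, so a match that is split
        # across two calls (e.g. print('ca', 't', sep='')) is not caught.
        self._target.write(string.replace(self._old, self._new))
        return len(string)

    def flush(self):
        self._target.flush()

sink = io.StringIO()
with contextlib.redirect_stdout(ReplacingStdout(sink, 'cat', 'dog')):
    print('This cat was scared.', flush=True)
result = sink.getvalue()
print(result, end="")
```

The per-call limitation noted in the comment is inherent to filtering at the stream level; buffering until a newline would be one way around it, at the cost of delaying output.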

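Summary

Some of these points have already been mentioned by @abarnert, but I wanted to explore these options in more detail. In particular, how to modify print across modules (using builtins/__builtin__) and how to make that change only temporary (using context managers).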