Is it possible to “hack” Python's print function?
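Note: This question is for informational purposes only. I am interested to see how deep into Python's internals it is possible to go with this.
Some time ago, a discussion started inside a certain question about whether the strings passed to `print` could be modified during or after the call to `print`. For example, consider the function:

    def print_something():
        print('This cat was scared.')

Now, when this is run, the output on the terminal should be:

    This dog was scared.

Notice that the word "cat" has been replaced by the word "dog". Something, somewhere, was somehow able to modify those internal buffers to change what was printed. Assume this is done without the original code author's explicit permission (hence, hacking/hijacking).
This comment from the wise @abarnert, in particular, got me thinking: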
There are a couple of ways to do that, but they're all very ugly, and
should never be done. The least ugly way is to probably replace the
code object inside the function with one with a different co_consts
list. Next is probably reaching into the C API to access the str's
internal buffer. [...]
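It looks like this is actually possible.
Here is my naive way of approaching this problem:

    >>> import inspect
    >>> exec(inspect.getsource(print_something).replace('cat', 'dog'))
    >>> print_something()
    This dog was scared.

Of course, this only rewrites the source and re-executes it. How would it be done the way @abarnert explained?
First, there is actually a much less hacky way. All we want to do is change what `print` prints: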

    _print = print

    def print(*args, **kw):
        args = (arg.replace('cat', 'dog') if isinstance(arg, str) else arg
                for arg in args)
        _print(*args, **kw)

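Or, similarly, you can monkeypatch `sys.stdout` instead of `print`.
Also, [...]
But if you do want to modify the function object's code constants, we can do that.
If you really want to play with code objects, you should use something like [...]
Also, it goes without saying that not all Python implementations use CPython-style code objects. This code will work in CPython 3.7, and probably in all versions back to at least 2.2 with a few minor changes (not to the code-hacking parts, but to things like generator expressions), but it won't work with any version of IronPython.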

    import types

    def print_function():
        print("This cat was scared.")

    def main():
        # A function object is a wrapper around a code object, with
        # a bit of extra stuff like default values and closure cells.
        # See inspect module docs for more details.
        co = print_function.__code__
        # A code object is a wrapper around a string of bytecode, with a
        # whole bunch of extra stuff, including a list of constants used
        # by that bytecode. Again see inspect module docs. Anyway, inside
        # the bytecode for string (which you can read by typing
        # dis.dis(string) in your REPL), there's going to be an
        # instruction like LOAD_CONST 1 to load the string literal onto
        # the stack to pass to the print function, and that works by just
        # reading co.co_consts[1]. So, that's what we want to change.
        consts = tuple(c.replace("cat", "dog") if isinstance(c, str) else c
                       for c in co.co_consts)
        # Unfortunately, code objects are immutable, so we have to create
        # a new one, copying over everything except for co_consts, which
        # we'll replace. And the initializer has a zillion parameters.
        # Try help(types.CodeType) at the REPL to see the whole list.
        co = types.CodeType(
            co.co_argcount, co.co_kwonlyargcount, co.co_nlocals,
            co.co_stacksize, co.co_flags, co.co_code,
            consts, co.co_names, co.co_varnames, co.co_filename,
            co.co_name, co.co_firstlineno, co.co_lnotab,
            co.co_freevars, co.co_cellvars)
        print_function.__code__ = co
        print_function()

    main()

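On CPython 3.8 and later, code objects have a `replace()` method that copies every field except the ones you name, so the long positional `types.CodeType(...)` call can be avoided. The following is only a sketch of the same constant-swapping idea under that assumption; `swap_constants` and `captured_output` are hypothetical helper names, not part of the answer above.

```python
import contextlib
import io

def print_function():
    print("This cat was scared.")

def swap_constants(func, old, new):
    """Rebind func.__code__ so that `old` becomes `new` in its string constants."""
    co = func.__code__
    consts = tuple(
        c.replace(old, new) if isinstance(c, str) else c
        for c in co.co_consts
    )
    # code.replace() (CPython 3.8+) builds a new code object with only
    # co_consts changed; the original code object is left untouched.
    func.__code__ = co.replace(co_consts=consts)

def captured_output(func):
    """Helper for this sketch: run func and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()

swap_constants(print_function, "cat", "dog")
print(captured_output(print_function), end="")
```

The function object stays the same, so every existing reference to `print_function` sees the new constant.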
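What could go wrong with hacking up code objects? Mostly segfaults, errors that eat up the whole stack, [...]
Now on to #2.
I mentioned that code objects are immutable. And of course the consts are a tuple, so we can't change that directly. And the thing in the consts tuple is a string, which we also can't change directly. That's why I had to build a new string to build a new tuple to build a new code object.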
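The three layers of immutability described above can be checked directly. This is a small sketch, assuming CPython; the helper names (`rebind_consts`, `failure_of`, and so on) are made up for illustration.

```python
def print_function():
    print("This cat was scared.")

co = print_function.__code__
# The string literal is stored in the code object's constants tuple.
literal = next(c for c in co.co_consts if isinstance(c, str))

def rebind_consts():
    co.co_consts = ()        # attributes of a code object are read-only

def assign_tuple_item():
    co.co_consts[0] = "x"    # tuples do not support item assignment

def assign_string_char():
    literal[0] = "t"         # neither do strings

def failure_of(action):
    """Return the name of the exception raised by action, or None."""
    try:
        action()
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return None

for action in (rebind_consts, assign_tuple_item, assign_string_char):
    print(action.__name__, "->", failure_of(action))
```

Each attempt is rejected, which is why the only supported route is to build new objects from the inside out.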
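But what if you could change a string directly?
Well, deep enough under the covers, everything is just a pointer to some C data, right? If you're using CPython, there's a C API to access the objects, and you can use `ctypes` to access it from within Python [...]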
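What makes this kind of trick possible at all is a CPython implementation detail: `id(obj)` is the object's address in memory. A minimal sketch, assuming CPython, that turns such an address back into the original object with `ctypes`:

```python
import ctypes

message = "This cat was scared."

# CPython detail (an assumption of this sketch): id() returns the
# memory address of the object.
address = id(message)

# Reinterpret the address as a Python object pointer and read it back.
recovered = ctypes.cast(address, ctypes.py_object).value

print(recovered is message)
```

Nothing is modified here; the point is only that a plain integer is enough to reach the underlying C structure.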
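Unfortunately, the C API for strings won't let us safely get at the internal storage of an already-frozen string. So screw safely, let's just read the header files and find that storage ourselves.
If you're using CPython 3.4 to 3.7 (it's different for older versions, and who knows for the future), a string literal from a module that is made of pure ASCII is going to be stored using the compact ASCII format, which means the struct ends early and the buffer of ASCII bytes follows immediately in memory. This will break (as in probably segfault) if you put a non-ASCII character in the string, or with certain kinds of non-literal strings, but you can read up on the other 4 ways to access the buffer for different kinds of strings.
To make things slightly easier, I'm using the `internals` module from my `superhackyinternals` project on GitHub: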

    import ctypes
    import internals # https://github.com/abarnert/superhackyinternals/blob/master/internals.py

    def print_function():
        print("This cat was scared.")

    def main():
        for c in print_function.__code__.co_consts:
            if isinstance(c, str):
                idx = c.find('cat')
                if idx != -1:
                    # Too much to explain here; just guess and learn to
                    # love the segfaults...
                    p = internals.PyUnicodeObject.from_address(id(c))
                    assert p.compact and p.ascii
                    addr = id(c) + internals.PyUnicodeObject.utf8_length.offset
                    buf = (ctypes.c_int8 * 3).from_address(addr + idx)
                    buf[:3] = b'dog'

        print_function()

    main()

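If you want to play with this stuff, `int` is a whole lot simpler under the covers than `str`:

    >>> from internals import PyLongObject
    >>> n = 2
    >>> pn = PyLongObject.from_address(id(n))
    >>> pn.ob_digit[0]
    2
    >>> pn.ob_digit[0] = 1
    >>> 2
    1
    >>> n * 3
    3
    >>> i = 10
    >>> while i < 40:
    ...     i *= 2
    ...     print(i)
    10
    10
    10

... pretend that code box has an infinite-length scrollbar.
I tried the same thing in IPython, and the first time I tried [...]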
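Monkey patching
`print` is a builtin function, so you can change its behavior by reassigning the name in the `builtins` module. This process is called monkey patching.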

    # Store the real print function in another variable otherwise
    # it will be inaccessible after being modified.
    _print = print

    # Actual implementation of the new print
    def custom_print(*args, **options):
        _print('custom print called')
        _print(*args, **options)

    # Change the print function globally
    import builtins
    builtins.print = custom_print

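After that, every `print` call goes through `custom_print`.
However, you don't really want to print additional text, you want to change the text that is printed. One way to go about that is to replace it in the string that would be printed: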

    _print = print

    def custom_print(*args, **options):
        # Get the desired separator or the default whitespace
        sep = options.pop('sep', ' ')
        # Create the final string; str() keeps non-string arguments
        # from breaking the join
        printed_string = sep.join(str(arg) for arg in args)
        # Modify the final string
        printed_string = printed_string.replace('cat', 'dog')
        # Call the default print function
        _print(printed_string, **options)

    import builtins
    builtins.print = custom_print

And indeed, if you run:

    >>> def print_something():
    ...     print('This cat was scared.')
    >>> print_something()
    This dog was scared.

Or, if you write that to a file:

test_file.py

    def print_something():
        print('This cat was scared.')

    print_something()

and import it:

    >>> import test_file
    This dog was scared.
    >>> test_file.print_something()
    This dog was scared.

So it really works as intended.
However, if you only want to monkey patch `print` temporarily, you can wrap this in a context manager:

    import builtins

    class ChangePrint(object):
        def __init__(self):
            self.old_print = print

        def __enter__(self):
            def custom_print(*args, **options):
                # Get the desired separator or the default whitespace
                sep = options.pop('sep', ' ')
                # Create the final string; str() keeps non-string
                # arguments from breaking the join
                printed_string = sep.join(str(arg) for arg in args)
                # Modify the final string
                printed_string = printed_string.replace('cat', 'dog')
                # Call the default print function
                self.old_print(printed_string, **options)

            builtins.print = custom_print

        def __exit__(self, *args, **kwargs):
            builtins.print = self.old_print

So when you run that, what is printed depends on the context:

    >>> with ChangePrint() as x:
    ...     test_file.print_something()
    ...
    This dog was scared.
    >>> test_file.print_something()
    This cat was scared.

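So that is how you can "hack" `print` by monkey patching.
If you look at the signature of `print`, you will notice a `file` argument, which defaults to `sys.stdout`, so you can also change the target instead of `print` itself.
The downside is that this does not work for `print` calls that do not print to `sys.stdout`.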

    import io
    import sys

    class CustomStdout(object):
        def __init__(self, *args, **kwargs):
            self.current_stdout = sys.stdout

        def write(self, string):
            self.current_stdout.write(string.replace('cat', 'dog'))

However, this also works:

    >>> import contextlib
    >>> with contextlib.redirect_stdout(CustomStdout()):
    ...     test_file.print_something()
    ...
    This dog was scared.
    >>> test_file.print_something()
    This cat was scared.

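A replacement for `sys.stdout` that only defines `write` can fail as soon as some code calls `flush()` or reads an attribute such as `encoding`. One possible way around that, sketched here under the assumption that forwarding unknown attributes to the wrapped stream is acceptable, is a small proxy; `ReplacingStream` is a hypothetical name.

```python
import contextlib
import io

class ReplacingStream:
    """Rewrites text on write and forwards everything else to the target."""

    def __init__(self, target, old, new):
        self._target = target
        self._old = old
        self._new = new

    def write(self, text):
        return self._target.write(text.replace(self._old, self._new))

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself,
        # e.g. flush, isatty or encoding.
        return getattr(self._target, name)

sink = io.StringIO()
with contextlib.redirect_stdout(ReplacingStream(sink, "cat", "dog")):
    print("This cat was scared.", flush=True)

result = sink.getvalue()
print(result, end="")
```

Note that `print` writes each argument and separator in its own `write` call, so a word that spans two arguments would not be replaced by this approach.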
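Summary
Some of these points have already been mentioned by @abarnert, but I wanted to explore these options in more detail, especially how to modify `print` across modules (using `builtins`) and how to make the change only temporary (using context managers).
Another approach is to capture everything `print` writes by redirecting it to a file and then processing the captured text.
I will use the names `ob_start`, `ob_end` and `ob_get_contents`: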

    from functools import partial

    output_buffer = None
    print_orig = print

    def ob_start(fname="print.txt"):
        global print
        global output_buffer
        # Open the file first so that print is bound to a real file object
        output_buffer = open(fname, 'w')
        print = partial(print_orig, file=output_buffer)

    def ob_end():
        global print
        global output_buffer
        output_buffer.close()
        print = print_orig

    def ob_get_contents(fname="print.txt"):
        with open(fname, 'r') as f:
            return f.read()

Usage:

    print("Hi John")
    ob_start()
    print("Hi John")
    ob_end()
    print(ob_get_contents().replace("Hi", "Bye"))

This will print:
Hi John
Bye John
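The same capture-then-process idea can be done without a temporary file by redirecting into an in-memory buffer. This is a sketch using `io.StringIO` and `contextlib.redirect_stdout` from the standard library; `capture_print` and `greet` are hypothetical names.

```python
import contextlib
import io

def capture_print(func, *args, **kwargs):
    """Call func and return everything it printed to sys.stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()

def greet():
    print("Hi John")

# Capture the output, then post-process it before printing for real.
text = capture_print(greet).replace("Hi", "Bye")
print(text, end="")
```

Because nothing is written to disk, there is no file to clean up and no need to rebind `print` at module level.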
Let's combine this with frame introspection!

    import sys

    _print = print

    def print(*args, **kw):
        frame = sys._getframe(1)
        _print(frame.f_code.co_name)
        _print(*args, **kw)

    def greetly(name, greeting="Hi"):
        print(f"{greeting}, {name}!")

    class Greeter:
        def __init__(self, greeting="Hi"):
            self.greeting = greeting

        def greet(self, name):
            print(f"{self.greeting}, {name}!")

You will find that this trick prefaces every greeting with the name of the calling function or method. This can be very useful for logging or debugging, especially as it lets you "hijack" print statements in third-party code.
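To reach prints in other modules as well, the frame-introspecting wrapper can be installed on `builtins` for a limited time. The following is a sketch only: `tagged_prints` is a hypothetical helper, and `sys._getframe` is a CPython implementation detail.

```python
import builtins
import contextlib
import io
import sys

@contextlib.contextmanager
def tagged_prints():
    """Temporarily prefix every print with the name of the calling function."""
    original = builtins.print

    def tagged_print(*args, **kw):
        # Frame 0 is tagged_print itself, frame 1 is whoever called print.
        caller = sys._getframe(1).f_code.co_name
        original(f"[{caller}]", *args, **kw)

    builtins.print = tagged_print
    try:
        yield
    finally:
        # Always restore the real print, even if the body raised.
        builtins.print = original

def greetly(name, greeting="Hi"):
    print(f"{greeting}, {name}!")

saved_print = builtins.print
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer), tagged_prints():
    greetly("World")

output = buffer.getvalue()
print(output, end="")
```

Putting the tag on the same line as the message keeps one log line per call, which tends to be easier to grep than a separate line per caller.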