bytecode – Python bytecode manipulation

The bytecode module lets you manipulate Python bytecode in a version-independent way. To facilitate this, the module provides functions to disassemble Python bytecode into a high-level representation and to assemble it back, along with functions to manipulate those structures.

The Python-version-independent functions take a py_internals parameter, which represents the specifics of bytecode on a particular version of Python. The pwnypack.py_internals module provides these specifics for various Python versions.

Examples

Disassemble a very simple function, change an opcode and reassemble it:

>>> from pwny import *
>>> import six
>>> def foo(a):
...     return a - 1
...
>>> print(foo, six.get_function_code(foo).co_code, foo(5))
<function foo at 0x10590ba60> b'|\x00\x00d\x01\x00\x18S' 4
>>> ops = bc.disassemble(foo)
>>> print(ops)
[LOAD_FAST 0, LOAD_CONST 1, BINARY_SUBTRACT, RETURN_VALUE]
>>> ops[2].name = 'BINARY_ADD'
>>> print(ops)
[LOAD_FAST 0, LOAD_CONST 1, BINARY_ADD, RETURN_VALUE]
>>> bar = bc.rebuild_func_from_ops(foo, ops, co_name='bar')
>>> print(bar, six.get_function_code(bar).co_code, bar(5))
<function bar at 0x10590bb70> b'|\x00\x00d\x01\x00\x17S' 6
class pwnypack.bytecode.AnnotatedOp(code_obj, name, arg)

An annotated opcode description. Instances of this class are generated by CodeObject.disassemble() if you set its annotate argument to True.

It contains more descriptive information about the instruction but cannot be translated back into a bytecode operation at the moment.

This class uses the code object’s reference to the internals of the Python version it originated from, together with the properties of the code object, to decode as much information as possible.

Parameters:
  • code_obj (CodeObject) – The code object this opcode belongs to.
  • name (str) – The mnemonic of the opcode.
  • arg (int) – The integer argument to the opcode (or None).
code = None

The numeric opcode.

code_obj = None

A reference to the CodeObject it belongs to.

has_arg = None

Whether this opcode has an argument.

has_compare = None

Whether this opcode’s argument is a compare operation.

has_const = None

Whether this opcode’s argument is a reference to a constant.

has_free = None

Whether this opcode’s argument is a reference to a free or cell var (for closures and nested functions).

has_local = None

Whether this opcode’s argument is a reference to a local.

has_name = None

Whether this opcode’s argument is a reference to the names table.

name = None

The name of the operation.

class pwnypack.bytecode.Block(label=None)

A group of python bytecode ops. Produced by blocks_from_ops().

Parameters: label (Label) – The label of this block. Will be None for the first block.
label = None

The label the block represents.

next = None

A pointer to the next block.

ops = None

The opcodes contained within this block.

class pwnypack.bytecode.Op(name, arg=None)

Bases: object

Describes a single bytecode operation.

Parameters:
  • name (str) – The name of the opcode.
  • arg – The argument of the opcode. Should be None for opcodes without arguments, should be a Label for opcodes that define a jump, should be an int otherwise.
arg = None

The opcode’s argument (or None).

name = None

The name of the opcode.

class pwnypack.bytecode.Label

Bases: object

Used to define a label in a series of opcodes.

pwnypack.bytecode.disassemble(code, origin=None)

Disassemble python bytecode into a series of Op and Label instances.

Parameters:
  • code (bytes) – The bytecode (a code object’s co_code property). You can also provide a function.
  • origin (dict) – The opcode specification of the python version that generated code. If you provide None, the specs for the currently running python version will be used.
Returns:

A list of opcodes and labels.

Return type:

list
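
To make the idea of a list of operations and labels concrete, here is a minimal sketch built on the standard library's dis module rather than on pwnypack. The Op and Label classes are simplified stand-ins, and disassemble_sketch and sample are hypothetical names. It illustrates the shape of the result, not how pwnypack implements disassemble().

```python
import dis


class Label:
    """Stand-in marker for a jump target (not pwnypack's class)."""


class Op:
    """Stand-in operation: a mnemonic plus an optional argument."""

    def __init__(self, name, arg=None):
        self.name = name
        self.arg = arg

    def __repr__(self):
        return self.name if self.arg is None else '%s %r' % (self.name, self.arg)


def disassemble_sketch(func):
    """Turn a function's bytecode into a flat list of Op and Label items."""
    instructions = list(dis.get_instructions(func))
    jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)

    # First pass: create one Label per distinct jump target offset.
    labels = {}
    for instr in instructions:
        if instr.opcode in jump_opcodes:
            labels.setdefault(instr.argval, Label())

    # Second pass: emit a Label before each target and let jumps refer
    # to Label objects instead of raw byte offsets.
    ops = []
    for instr in instructions:
        if instr.offset in labels:
            ops.append(labels[instr.offset])
        if instr.opcode in jump_opcodes:
            ops.append(Op(instr.opname, labels[instr.argval]))
        else:
            ops.append(Op(instr.opname, instr.arg))
    return ops


def sample(x):
    if x:
        return 1
    return 2


ops = disassemble_sketch(sample)
print(ops)
```

Because jumps point at Label objects, operations can be inserted or removed without recalculating offsets by hand. The exact mnemonics printed depend on the running Python version.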

pwnypack.bytecode.assemble(ops, target=None)

Assemble a set of Op and Label instances back into bytecode.

Parameters:
  • ops (list) – A list of opcodes and labels (as returned by disassemble()).
  • target – The opcode specification of the targeted python version. If this is None the specification of the currently running python version will be used.
Returns:

The assembled bytecode.

Return type:

bytes
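
As a rough illustration of what assembling involves, the sketch below resolves labels to byte offsets in two passes. Everything in it is an assumption made for the example: the Op and Label stand-ins, the OPCODES table with made-up opcode numbers, the fixed two-byte instruction width and absolute jump targets. Real encodings differ between Python versions, which is what the target specification accounts for.

```python
class Label:
    """Stand-in marker for a jump target (not pwnypack's class)."""


class Op:
    """Stand-in operation: a mnemonic plus an optional argument."""

    def __init__(self, name, arg=None):
        self.name = name
        self.arg = arg


# Hypothetical opcode numbers; real values differ between Python versions.
OPCODES = {
    'LOAD_FAST': 1,
    'LOAD_CONST': 2,
    'BINARY_ADD': 3,
    'JUMP_ABSOLUTE': 4,
    'RETURN_VALUE': 5,
}


def assemble_sketch(ops, opcodes=OPCODES):
    """Encode ops as two bytes each (opcode, argument), resolving labels."""
    # Pass 1: record the byte offset of every label. Labels take no space.
    offsets = {}
    position = 0
    for item in ops:
        if isinstance(item, Label):
            offsets[item] = position
        else:
            position += 2

    # Pass 2: emit the bytes, replacing Label arguments by their offsets.
    out = bytearray()
    for item in ops:
        if isinstance(item, Label):
            continue
        arg = item.arg
        if isinstance(arg, Label):
            arg = offsets[arg]
        out += bytes([opcodes[item.name], arg or 0])
    return bytes(out)


end = Label()
program = [
    Op('LOAD_FAST', 0),
    Op('JUMP_ABSOLUTE', end),
    Op('LOAD_CONST', 1),
    end,
    Op('RETURN_VALUE'),
]
encoded = assemble_sketch(program)
```

The label sits after three two-byte instructions, so the jump argument resolves to offset 6.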

pwnypack.bytecode.blocks_from_ops(ops)

Group a list of Op and Label instances by label.

Every time a label is found, a new Block is created. The resulting blocks are returned as a dictionary to easily access the target block of a jump operation. The keys of this dictionary will be the labels, the values will be the Block instances. The initial block can be accessed by getting the None item from the dictionary.

Parameters: ops (list) – The list of Op and Label instances (as returned by disassemble()).
Returns: The resulting dictionary of blocks grouped by label.
Return type: dict
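
The grouping described above can be sketched in a few lines. This is a self-contained illustration with simplified stand-in classes (Op, Label and Block here are not pwnypack's own), and group_into_blocks is a hypothetical name. It follows the documented behaviour: one block per label, the initial block stored under None, and each block pointing to the next.

```python
class Label:
    """Stand-in marker for a jump target (not pwnypack's class)."""


class Op:
    """Stand-in operation: a mnemonic plus an optional argument."""

    def __init__(self, name, arg=None):
        self.name = name
        self.arg = arg


class Block:
    """Stand-in block: a label, its operations and a link to the next block."""

    def __init__(self, label=None):
        self.label = label
        self.ops = []
        self.next = None


def group_into_blocks(ops):
    """Split a flat list of Op and Label items into blocks keyed by label."""
    blocks = {}
    current = blocks[None] = Block()  # the initial block has no label
    for item in ops:
        if isinstance(item, Label):
            # A label starts a new block and links the previous one to it.
            following = Block(item)
            current.next = following
            blocks[item] = following
            current = following
        else:
            current.ops.append(item)
    return blocks


target = Label()
flat = [
    Op('LOAD_FAST', 0),
    Op('POP_JUMP_IF_FALSE', target),
    Op('LOAD_CONST', 1),
    Op('RETURN_VALUE'),
    target,
    Op('LOAD_CONST', 2),
    Op('RETURN_VALUE'),
]
blocks = group_into_blocks(flat)
```

Looking up a jump's Label argument in the dictionary yields the block that the jump transfers control to.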
pwnypack.bytecode.calculate_max_stack_depth(ops, target=None)

Calculate the maximum stack depth (and required stack size) from a series of Op and Label instances. This is required when you manipulate the opcodes in such a way that the stack layout might change and you want to re-create a working function from it.

This is a fairly literal re-implementation of python’s stackdepth and stackdepth_walk.

Parameters:
  • ops (list) – A list of opcodes and labels (as returned by disassemble()).
  • target – The opcode specification of the targeted python version. If this is None the specification of the currently running python version will be used.
Returns:

The calculated maximum stack depth.

Return type:

int
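
To show why stack depth has to be recomputed after editing operations, the following simplified sketch walks every reachable path and tracks the deepest stack. It is not the literal re-implementation mentioned above. The Op and Label stand-ins, the STACK_EFFECTS table and the NO_FALLTHROUGH set are assumptions made for the example, and real stack effects depend on the Python version. It also assumes that loops leave the stack balanced, as valid bytecode does.

```python
class Label:
    """Stand-in marker for a jump target (not pwnypack's class)."""


class Op:
    """Stand-in operation: a mnemonic plus an optional argument."""

    def __init__(self, name, arg=None):
        self.name = name
        self.arg = arg


# Hypothetical net stack effects; real values depend on the Python version.
STACK_EFFECTS = {
    'LOAD_FAST': 1,
    'LOAD_CONST': 1,
    'BINARY_ADD': -1,         # pops two operands, pushes one result
    'POP_JUMP_IF_FALSE': -1,  # pops the tested value on both paths
    'JUMP_FORWARD': 0,
    'RETURN_VALUE': -1,
}
# Operations after which execution never falls through to the next one.
NO_FALLTHROUGH = {'JUMP_FORWARD', 'RETURN_VALUE'}


def max_stack_depth_sketch(ops, effects=STACK_EFFECTS):
    """Return the deepest stack reached on any path through ops."""
    label_index = {item: i for i, item in enumerate(ops) if isinstance(item, Label)}
    deepest_entry = {}  # index -> deepest stack seen on entry
    pending = [(0, 0)]  # (index, stack depth on entry) still to explore
    maximum = 0
    while pending:
        index, depth = pending.pop()
        while index < len(ops):
            if deepest_entry.get(index, -1) >= depth:
                break  # already explored with an equal or deeper stack
            deepest_entry[index] = depth
            item = ops[index]
            if not isinstance(item, Label):
                depth += effects[item.name]
                maximum = max(maximum, depth)
                if isinstance(item.arg, Label):
                    # Explore the jump target later with the current depth.
                    pending.append((label_index[item.arg], depth))
                if item.name in NO_FALLTHROUGH:
                    break
            index += 1
    return maximum


otherwise = Label()
branching = [
    Op('LOAD_FAST', 0),
    Op('POP_JUMP_IF_FALSE', otherwise),
    Op('LOAD_FAST', 0),
    Op('LOAD_CONST', 1),
    Op('BINARY_ADD'),
    Op('RETURN_VALUE'),
    otherwise,
    Op('LOAD_CONST', 2),
    Op('RETURN_VALUE'),
]
depth = max_stack_depth_sketch(branching)
```

The first branch holds two values on the stack just before BINARY_ADD, so the result for this sequence is 2.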

class pwnypack.bytecode.CodeObject(co_argcount, co_kwonlyargcount, co_nlocals, co_stacksize, co_flags, co_code, co_consts, co_names, co_varnames, co_filename, co_name, co_firstlineno, co_lnotab, co_freevars, co_cellvars, origin=None)

Bases: object

Represents a Python code object in a way that works across Python versions. It contains all the properties that exist on Python 3 code objects (even when run on Python 2).

Parameters:
  • co_argcount – number of arguments (not including *args, **kwargs or keyword-only arguments)
  • co_kwonlyargcount – The keyword-only argument count of this code.
  • co_nlocals – number of local variables
  • co_stacksize – virtual machine stack space required
  • co_flags – bitmap: 1=optimized | 2=newlocals | 4=*arg | 8=**arg
  • co_code – string of raw compiled bytecode
  • co_consts – tuple of constants used in the bytecode
  • co_names – tuple of names used by the bytecode other than arguments and local variables (for example globals and attributes)
  • co_varnames – tuple of names of arguments and local variables
  • co_filename – name of file in which this code object was created
  • co_name – name with which this code object was defined
  • co_firstlineno – number of first line in Python source code
  • co_lnotab – encoded mapping of line numbers to bytecode indices
  • co_freevars – tuple of names of closure variables
  • co_cellvars – tuple containing the names of local variables that are referenced by nested functions
  • origin (dict) – The opcode specification of the python version that generated the code. If you provide None, the specs for the currently running python version will be used.
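
For orientation, the co_* properties listed above mirror those of a native Python 3 code object. The sketch below inspects one using only the standard language features, without pwnypack. The function describe is a hypothetical example chosen so that each kind of argument appears once.

```python
def describe(a, b=2, *args, key=None, **kw):
    total = a + b
    return len(args) + total


code = describe.__code__

# Positional arguments only: a and b.
print(code.co_argcount)
# Keyword-only arguments: key.
print(code.co_kwonlyargcount)
# Arguments first, then the remaining locals.
print(code.co_varnames)
# Non-local names referenced by the bytecode: the builtin len.
print(code.co_names)
# Bit 4 marks *args, bit 8 marks **kwargs.
print(bool(code.co_flags & 4), bool(code.co_flags & 8))
```

Note how co_varnames holds the arguments and locals while co_names holds the other names the bytecode refers to.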
annotate_op(op)

Takes a bytecode operation (Op) and annotates it using the data contained in this code object.

Parameters: op (Op) – An Op instance.
Returns: An annotated bytecode operation.
Return type: AnnotatedOp
assemble(ops, target=None)

Assemble a series of operations and labels into bytecode, analyse its stack usage and replace the bytecode and stack size of this code object. Can also (optionally) change the target python version.

Parameters:
  • ops (list) – The opcodes (and labels) to assemble into bytecode.
  • target – The opcode specification of the targeted python version. If this is None the specification of the currently running python version will be used.
Returns:

A reference to this CodeObject.

Return type:

CodeObject

disassemble(annotate=False, blocks=False)

Disassemble the bytecode of this code object into a series of opcodes and labels. Can also annotate the opcodes and group the opcodes into blocks based on the labels.

Parameters:
  • annotate (bool) – Whether to annotate the operations.
  • blocks (bool) – Whether to group the operations into blocks.
Returns:

A list of Op (or AnnotatedOp) instances and labels.

Return type:

list

classmethod from_code(code, co_argcount=BORROW, co_kwonlyargcount=BORROW, co_nlocals=BORROW, co_stacksize=BORROW, co_flags=BORROW, co_code=BORROW, co_consts=BORROW, co_names=BORROW, co_varnames=BORROW, co_filename=BORROW, co_name=BORROW, co_firstlineno=BORROW, co_lnotab=BORROW, co_freevars=BORROW, co_cellvars=BORROW)

Create a new instance from an existing code object. The originating internals of the instance will be those of the running Python version.

Any properties explicitly specified will be overridden on the new instance.

Parameters:
  • code (types.CodeType) – The code object to get the properties of.
  • .. – The properties to override.
Returns:

A new CodeObject instance.

Return type:

CodeObject
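
The BORROW defaults in the signature suggest a sentinel pattern: a unique default value tells "not specified" apart from an explicit value such as None. The sketch below shows that pattern in isolation. It is an assumption about the general technique, not pwnypack's code; the BORROW object defined here and the merge_properties helper are hypothetical.

```python
# Hypothetical sentinel: a unique object that no caller can pass by accident.
BORROW = object()


def merge_properties(code, co_name=BORROW, co_filename=BORROW, co_firstlineno=BORROW):
    """Take each property from code unless the caller overrides it."""
    requested = {
        'co_name': co_name,
        'co_filename': co_filename,
        'co_firstlineno': co_firstlineno,
    }
    return {
        name: getattr(code, name) if value is BORROW else value
        for name, value in requested.items()
    }


def foo(a):
    return a - 1


# co_name is overridden, co_filename is explicitly set to None and
# co_firstlineno is borrowed from the original code object.
props = merge_properties(foo.__code__, co_name='bar', co_filename=None)
```

Using a dedicated sentinel rather than None keeps None available as a legitimate override value.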

classmethod from_function(f, *args, **kwargs)

Create a new instance from a function. Gets the code object from the function and passes it and any other specified parameters to from_code().

Parameters: f (function) – The function to get the code object from.
Returns: A new CodeObject instance.
Return type: CodeObject
to_code()

Convert this instance back into a native Python code object. This only works if the internals of the code object are compatible with those of the running Python version.

Returns: The native Python code object.
Return type: types.CodeType
to_function()

Convert this CodeObject back into a Python function. This only works if the internals of the code object are compatible with those of the running Python version.

Returns: The newly created Python function.
Return type: function
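
For comparison, this is how a native code object becomes a callable function with the standard library alone. The sketch uses types.FunctionType directly and does not involve pwnypack; foo and bar are example names. It makes no claim about how to_function() is implemented.

```python
import types


def foo(a):
    return a - 1


# Wrap the existing code object in a new function object named 'bar',
# reusing the globals of the original function.
bar = types.FunctionType(foo.__code__, foo.__globals__, 'bar')

print(bar.__name__, bar(5))
```

The new function shares its code object with foo, so it only runs correctly on an interpreter that understands that bytecode, which is the compatibility condition stated above.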