This is the sixth post in a series about property-based testing. This post describes "random shrinking", the third and last implementation of shrinking I know of. It keeps all the advantages of internal shrinking, which we discussed in part 5, and is much simpler.
We make two modifications to Random. In addition to generating a random value, we also return the size of the generated value. For our purposes, the size is simply represented by a non-negative integer: the smaller the integer, the smaller the value is considered to be. Additionally, to enable short-circuiting, the generator takes in the current minimal size, min_size.
from typing import Any, Callable, Generic, Iterable, Optional, Tuple, TypeVar, Union
import random

T = TypeVar("T")
U = TypeVar("U")

Size = int

class SizeExceeded(Exception):
    pass

class Random(Generic[T]):
    def __init__(self,
                 generator: Callable[[Optional[Size]], Tuple[T, Size]]):
        self._generator = generator

    def generate(self, min_size: Optional[Size] = None) -> Tuple[T, Size]:
        return self._generator(min_size)
If min_size is None, we're in the random generation phase of the test run - meaning we haven't found a failing test yet, and so the generator should never short-circuit. If min_size is a Size value, we are shrinking, and min_size is the current smallest size for which the test fails. If a generator exceeds that size, there's no point in going on, and the generator should throw SizeExceeded to short-circuit. Don't worry if that doesn't make sense yet - examples below.
Beginning with constant, as usual:
def constant(value: T) -> Random[T]:
    # A constant never shrinks: its size is 0 and it ignores min_size.
    return Random(lambda min_size: (value, 0))
Our old friend int_between is a bit more tricky.
def int_between(low: int, high: int) -> Random[int]:
    def zig_zag(i: int) -> Size:
        if i < 0:
            return -2*i - 1
        else:
            return 2*i

    def generator(min_size: Optional[Size]):
        value = random.randint(low, high)
        size = zig_zag(value)
        # short circuit - implementation below
        dec_size(min_size, size)
        return value, size

    return Random(generator)
int_between uses so-called ZigZag encoding to calculate the size of an int. ZigZag encoding is also used in Protocol Buffers. The gist of it is that each signed integer gets a non-negative size by zigzagging between negative and positive integers, so integers with a greater absolute value get a greater size. Unlike the implementation here, with a fixed-width type like int32 or int64 you only need a couple of bit shifts and an exclusive or to compute the size, which makes it pretty fast in practice.
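For illustration only - this function isn't used by the generator above, and the name is made up - here's a sketch of that bit-twiddling version for 32-bit integers:

def zig_zag32(i: int) -> int:
    # ZigZag for a 32-bit signed int: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    # i >> 31 is 0 for non-negative i and -1 (all one bits) for negative i,
    # so the XOR inverts the bits exactly when i is negative.
    return (i << 1) ^ (i >> 31)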
I've also introduced a dec_size function (for "decrease size"), which makes sure we short-circuit generation as soon as possible.
def dec_size(min_size: Optional[Size],
             decrease: Size) -> Optional[Size]:
    if min_size is None:
        return None
    smaller = min_size - decrease
    if smaller < 0:
        raise SizeExceeded()
    return smaller
If min_size is not None, meaning we're currently shrinking, dec_size subtracts the size of the currently generated value from the current minimum size, and checks whether the result is still non-negative. If it's not, it throws SizeExceeded to indicate that the value we're about to generate is already greater than the best, minimum size - thereby short-circuiting. The less work we do, the more values we can generate in the same amount of time, and the more opportunities we have for finding smaller values. We're not yet using the result of dec_size, but we will do so soon.
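To make the short-circuiting behaviour concrete, here are a few calls with made-up numbers:

# During random generation there is no budget, so nothing ever short-circuits.
assert dec_size(None, 1_000) is None
# During shrinking, the budget decreases by the size of each generated value...
assert dec_size(10, 4) == 6
# ...and generation is abandoned as soon as the budget would go negative.
try:
    dec_size(6, 7)
except SizeExceeded:
    pass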
map is uninteresting - it just maintains the size:
def map(func: Callable[[T], U],
        gen: Random[T]) -> Random[U]:
    def generator(min_size: Optional[Size]):
        result, size = gen.generate(min_size)
        return func(result), size
    return Random(generator)
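As a quick, made-up usage example: mapping doesn't change the size, so the mapped generator shrinks exactly like the underlying int generator.

# Hypothetical example: a generator for even integers. The size stays whatever
# int_between assigned to the original value, so shrinking still drives it to 0.
even = map(lambda i: 2 * i, int_between(-50, 50))
value, size = even.generate()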
The size of mapN is the sum of the sizes of all its inputs:
def mapN(func: Callable[..., T],
         gens: Iterable[Random[Any]]) -> Random[T]:
    def generator(min_size: Optional[Size]):
        results: list[Any] = []
        size_acc = 0
        for gen in gens:
            result, size = gen.generate(min_size)
            min_size = dec_size(min_size, size)
            results.append(result)
            size_acc += size
        return func(*results), size_acc
    return Random(generator)
Both map and mapN assume that the size of the output value decreases as the size of the input values decreases. That's reasonable, though it's easy to imagine a counterexample. Here's where the result of dec_size is used: as the values for the inputs are generated, mapN subtracts their sizes from min_size. This ensures that if, say, we have 10 inputs and we already exceed min_size by the third generator, we short-circuit as soon as possible.
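As a small, made-up example, here's a pair generator built with mapN; its size is the sum of the sizes of its two components:

pair = mapN(lambda a, b: (a, b),
            [int_between(-100, 100), int_between(-100, 100)])

value, size = pair.generate()    # random phase: min_size is None, nothing short-circuits
try:
    # shrinking phase: raises SizeExceeded as soon as the combined size exceeds the budget of 5
    pair.generate(min_size=5)
except SizeExceeded:
    pass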
Finally, bind:
def bind(func: Callable[[T], Random[U]],
         gen: Random[T]) -> Random[U]:
    def generator(min_size: Optional[Size]):
        result, size_outer = gen.generate(min_size)
        min_size = dec_size(min_size, size_outer)
        result, size_inner = func(result).generate(min_size)
        size = size_inner + size_outer
        return result, size
    return Random(generator)
The size of bind is interpreted as the sum of the sizes of the outer and inner generators.
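As a hypothetical example of bind, here's a generator for lists of ints that first draws a length and then that many elements; smaller lengths and smaller elements both lead to smaller sizes:

def list_of_ints(max_length: int) -> Random[list[int]]:
    # The total size is the size of the length plus the sizes of all elements,
    # so shrinking favours shorter lists with elements closer to zero.
    return bind(
        lambda length: mapN(lambda *xs: list(xs),
                            [int_between(-100, 100) for _ in range(length)]),
        int_between(0, max_length))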
And that is pretty much it! We can literally reuse our implementation of Property and for_all, which I've repeated here as a reminder.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TestResult:
    is_success: bool
    arguments: Tuple[Any, ...]

Property = Random[TestResult]

def for_all(gen: Random[T],
            property: Callable[[T], Union[Property, bool]]) -> Property:
    def property_wrapper(value: T) -> Property:
        outcome = property(value)
        if isinstance(outcome, bool):
            return constant(
                TestResult(
                    is_success=outcome,
                    arguments=(value,)
                )
            )
        else:
            return map(
                lambda inner_out: replace(
                    inner_out,
                    arguments=(value,) + inner_out.arguments
                ),
                outcome
            )
    return bind(property_wrapper, gen)
def test(property: Property):
    def find_smaller(min_result: TestResult, min_size: Size):
        ...  # the only new bit

    for test_number in range(100):
        result, size = property.generate()
        if not result.is_success:
            print(f"Fail: at test {test_number} with arguments {result.arguments}.")
            find_smaller(result, size)
            return
    print("Success: 100 tests passed.")
Now, find_smaller is new, but it's hardly difficult. The idea is to keep generating random values until we find one that is both smaller than the current smallest known size, and still fails the test:
def find_smaller(min_result: TestResult, min_size: Size):
    skipped, not_shrunk, shrunk = 0, 0, 0
    while skipped + not_shrunk + shrunk <= 100_000 and min_size > 0:
        try:
            result, size = property.generate(min_size)
            if size >= min_size:
                skipped += 1
            elif not result.is_success:
                shrunk += 1
                min_result, min_size = result, size
            else:
                not_shrunk += 1
        except SizeExceeded:
            skipped += 1
    print(f"Shrinking: gave up at {min_result.arguments}")
    print(f"{skipped=} {not_shrunk=} {shrunk=} {min_size=}")
Now the question is - does this even work? It seems naively simple. Only one way to find out - using our canonical prop_wrong_sort_by_age example. Note that once again we didn't have to change the generators or the test code, so I won't repeat that code here.
> test(prop_wrong_sort_by_age)
Fail: at test 0 with arguments ([Person(name='pjahgc', age=27), Person(name='gndrlt', age=70), Person(name='qukflk', age=79), Person(name='dknczu', age=2), Person(name='jqtqgr', age=8), Person(name='xlcxhk', age=22), Person(name='wotanl', age=5), Person(name='nxkupy', age=99), Person(name='pxngky', age=31)],).
Shrinking: gave up at arguments ([Person(name='etkdgb', age=8), Person(name='haagjh', age=5)],)
skipped=81616 not_shrunk=18373 shrunk=12 min_size=2502
> test(prop_wrong_sort_by_age)
Fail: at test 0 with arguments ([Person(name='cigepg', age=69), Person(name='lqkqmp', age=100), Person(name='nlgbbl', age=33), Person(name='xrnzfq', age=76), Person(name='ujjnfz', age=34), Person(name='xlcvxf', age=19)],).
Shrinking: gave up at arguments ([Person(name='adcyxh', age=1), Person(name='hrclbb', age=0)],)
skipped=81931 not_shrunk=18055 shrunk=15 min_size=2530
In both runs we end up with a list of two Persons with relatively low ages. skipped happens about 80% of the time, not_shrunk about 20% of the time. The number of shrunk values is almost insignificant! These numbers are similar if I try 1,000,000 times instead of 100,000 times; in fact, the number of successful shrinks remains around 10-15 even with 10x more tries. This indicates that, for this case at least, it's unlikely to help much if we keep shrinking for longer.

This third and last approach to shrinking sounds almost too simple to work. But it does, and pretty well at that. We've removed all human cleverness from the shrinking process, and as a result we gain a straightforward implementation while keeping an integrated generation and shrinking API. Shrinking works just as well for mutable as for immutable values, and we can reproduce any run with just the seed of the pseudo-random number generator (typically a couple of int64 values).
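For example - assuming we keep using Python's global random module, as the generators above do - replaying a failing run is just a matter of fixing the seed before the test; the seed value here is made up:

random.seed(1234)              # hypothetical seed; fixing it reproduces the exact same run
test(prop_wrong_sort_by_age)   # generates the same values, and so the same failure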
Also, it's feasible to tack any of the other shrinking methods on after random shrinking: for example, we can try to find the smallest example using random shrinking, and then try to make that value even smaller with either of the two other methods.
The complete code for this post can be found in the accompanying repository.