tiny-test, for non-production property testing games

March 29, 2021

Property testing is a testing strategy where the programmer defers the responsibility of writing example cases to the computer. That’s most of the game, but knowing the definition isn’t much help if you want to practice property testing outside of a production setting.

Property testing is an area of software development dear to my heart: one of the first software talks I ever saw that made me think I’d like to be a developer was on Hypothesis by Matt Bachmann. However, I’ve often had trouble explaining the mechanics to people without hand-waving. To that end, I created tiny-test, a testing framework you absolutely should not use in any production setting, but which is fine for playing around.

What’s the goal with tiny-test? Write a “test framework” that illustrates the stylized facts of property testing.

What are the stylized facts of property testing frameworks?

Property testing goes by a lot of names — fuzz testing, generative testing, monkey testing… there are a lot. These names collectively emphasize that the test cases will be driven by randomness, so that will be the first stylized fact: test cases in tiny-test should be driven by randomness.

Second, since tiny-test is a testing framework, it should have opinions about what a test is. The second stylized fact is that tests are things that produce test results. What are test results? Test results are checked expectations — e.g., “I think this should run without crashing,” or “I think the result should be less than six.”

With these two stylized facts, there’s a pretty straightforward main requirement: tiny-test needs to provide APIs for transforming randomness into test results. Cool! Let’s work on that.

Useful randomness

The hello world of property testing is a custom add function that acts like the built-in + operator.

In Haskell, that would look like this:

add :: Int -> Int -> Int
add = (+)

However, that’s not how add is implemented in tiny-test. How do we explore the possible integer inputs so that we can find the values where the add implementation is wrong?

In Python

Let’s suppose we’re working in a language where we can freely mix effectful and non-effectful code, like Python. In that case, our test might look something like:

import random

def test_add():
    a = random.randint(0, 1000)
    b = random.randint(0, 1000)
    assert add(a, b) == a + b

Cool, now we can run one test with two random inputs. What if we want to run lots of tests? Let’s try that instead:

def test_add_a_lot():
    some_as = [random.randint(0, 1000) for _ in range(1000)]
    some_bs = [random.randint(0, 1000) for _ in range(1000)]
    for a, b in zip(some_as, some_bs):
        assert add(a, b) == a + b

Nice, now we’ve run lots of tests. This is great because we didn’t have to do very much work and we got to make sure that our add function works for lots of different int combinations, instead of the six combinations we might come up with in a unit test.

What if we want to test some property about strings?

def some_rand_string_fn():
    # This returns a random string, I swear.
    ...

def test_append():
    some_strings = [some_rand_string_fn() for _ in range(1000)]
    for st in some_strings:
        assert st in (st + st)

This pattern looks familiar — generate a bunch of data of the type that we need, call some function on the generated data, and assert some expectation about the result.
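That pattern can be pulled out into a small helper. What follows is only a sketch: check_property and random_string are hypothetical names that appear nowhere in tiny-test, and random_string is just one guess at what a random string function could look like (ASCII letters, length at most 20).

```python
import random
import string


def random_string(max_len=20):
    """Hypothetical generator: a random ASCII-letter string of length 0..max_len."""
    length = random.randint(0, max_len)
    return "".join(random.choice(string.ascii_letters) for _ in range(length))


def check_property(generate, prop, runs=1000):
    """Generate `runs` inputs, call `prop` on each, and return the inputs that failed."""
    failures = []
    for _ in range(runs):
        value = generate()
        if not prop(value):
            failures.append(value)
    return failures


# The string property from the test above: st always occurs inside st + st.
append_failures = check_property(random_string, lambda st: st in (st + st))
```

Returning the failing inputs instead of raising on the first one makes it possible to look at all the failures together, which is handy when hunting for a pattern in them.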

Properties over arbitrary functions

We saw in the Python example that it’s useful to be able to generate data of different types. That example included only ints and strings, but we could conceivably want to test functions with any input type. In that case, we’d want some way to express the requirements that:

we can generate data of the type needed
we have some function on the generated data that gives us a test result

In languages with typeclasses, we can express a capability like “we can generate data of this type” with a typeclass. In property testing libraries (ScalaCheck, QuickCheck, Hedgehog), this capability is frequently wrapped up in a Gen type. A value of type Gen a or Gen[A] says “I can generate values of type a/A.” Often, access to that generator is hidden in an Arbitrary typeclass. In tiny-test, we’ll make things a little simpler in our Arbitrary typeclass, and just say that we have a sample method that effectfully procures us a list of values of the type we want:

class Arbitrary a where
  sample :: IO [a]

The second requirement is simpler — we said tests are things that produce test results, so let’s summon a type called Result (the details aren’t important here) and say that we need a function from something we can summon to a Result. In Haskell, that looks like:

prop :: Arbitrary a => (a -> Result) -> IO Result

Provided you know how to summon values of type a, we have a function that we can use to run lots of tests. Let’s assume that Results can be combined — in that case, we can smash all of our results in the implementation with:

prop :: Arbitrary a => (a -> Result) -> IO Result
prop f = do
  someAs <- sample
  pure $ mconcat (f <$> someAs)

do takes us into something that looks a little imperative: you can read each line as “this, then that, then the other thing,” similar to languages like Python and JavaScript. The first line of the block says “get me the list of a-s that you promised,” and the second says “test them all, smash the results together, and give me the result.” The pure $ at the beginning of that second line wraps the combined Result in IO so that the types cooperate.
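For readers more at home in Python, here is a rough analogue of the same idea. Everything in it is an assumption made for illustration: the Result class (a list of failure messages), the SAMPLERS dictionary standing in for the Arbitrary typeclass, and the two example checks are not taken from tiny-test.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical result type: a list of failure messages; empty means success."""
    failures: list = field(default_factory=list)

    def combine(self, other):
        # Combining two results keeps every failure from both sides.
        return Result(self.failures + other.failures)

    @property
    def passed(self):
        return not self.failures


# Stand-in for the Arbitrary typeclass: a lookup from a type to a sampling function.
SAMPLERS = {
    int: lambda: [random.randint(0, 99) for _ in range(100)],
}


def prop(input_type, check):
    """Sample values of input_type, check each one, and combine the results."""
    combined = Result()  # the empty result, which leaves other results unchanged
    for value in SAMPLERS[input_type]():
        combined = combined.combine(check(value))
    return combined


def non_negative(n):
    return Result() if n >= 0 else Result([f"{n} was negative"])


def always_fails(n):
    return Result([f"{n} was rejected"])
```

With these definitions, prop(int, non_negative).passed is True, while prop(int, always_fails) carries one failure message per sampled value. Python has no typeclass resolution, so the type is passed explicitly where Haskell would infer it.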

Similarly, for functions of two parameters where we know how to generate values of both types, we can implement prop2:

prop2 :: (Arbitrary a, Arbitrary b) => (a -> b -> Result) -> IO Result
prop2 f =
  do
    as <- sample
    bs <- sample
    pure . fold $ zipWith f as bs

How high can we go with this? All the way to infinity! tiny-test stops at prop3 because supporting every arity isn’t the point of tiny-test. Real property testing libraries don’t have this kind of limit or require you to count arguments on your own. For example, the ScalaCheck quickstart shows that the forAll method, which is like prop above, can take functions of different numbers of parameters.
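To get a feel for what “not counting arguments” can look like, here is a small Python sketch that accepts any number of generators. It is not how ScalaCheck implements forAll; for_all and small_int are hypothetical names used only for this illustration.

```python
import random


def for_all(*generators, runs=100):
    """Return a runner that calls a check `runs` times with one generated
    argument per generator and collects the argument tuples that failed."""
    def run(check):
        failures = []
        for _ in range(runs):
            args = tuple(gen() for gen in generators)
            if not check(*args):
                failures.append(args)
        return failures
    return run


def small_int():
    return random.randint(0, 99)


# One argument, two arguments, three arguments: the same helper handles each.
one = for_all(small_int)(lambda a: a + 0 == a)
two = for_all(small_int, small_int)(lambda a, b: a + b == b + a)
three = for_all(small_int, small_int, small_int)(
    lambda a, b, c: (a + b) + c == a + (b + c)
)
```

Python can get away with this because argument counts are only checked at call time; typed libraries need other machinery (overloads, typeclasses, or similar) to reach the same convenience.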

Using tiny-test

You can use tiny-test in the tiny-test test suite. To run the test suite, you’ll need stack. If you clone the repo and run stack test, you’ll see:

Running test suite addition unit tests
Success!

If you then apply a small diff to the Spec.hs file and rerun…

diff --git a/test/Spec.hs b/test/Spec.hs
index a114364..b776925 100644
--- a/test/Spec.hs
+++ b/test/Spec.hs
@@ -10,8 +10,8 @@ import TestSuite (TestSuite (..), runTests)
 main :: IO ()
 main = do
   addUnitExit <- runTestSuite addUnitSuite
-  -- addPropExit <- addPropSuite >>= runTestSuite
-  exitWith addUnitExit
+  addPropExit <- addPropSuite >>= runTestSuite
+  exitWith $ addUnitExit `max` addPropExit
 
 runTestSuite :: TestSuite -> IO ExitCode
 runTestSuite testSuite =
@@ -42,12 +42,12 @@ addUnitSuite =
         ]
     }
 
--- addPropSuite :: IO TestSuite
--- addPropSuite =
---   do
---     result <- Result.fromProp2 (\x y -> add x y `shouldBe` (x + y))
---     pure $
---       NamedTestSuite
---         { suiteName = "addition prop tests",
---           suite = [result]
---         }
\ No newline at end of file
+addPropSuite :: IO TestSuite
+addPropSuite =
+  do
+    result <- Result.fromProp2 (\x y -> add x y `shouldBe` (x + y))
+    pure $
+      NamedTestSuite
+        { suiteName = "addition prop tests",
+          suite = [result]
+        }

You’ll see a nice collection of failures:

Running test suite addition prop tests
15 was not equal to 16. Input: (13,3)
69 was not equal to 70. Input: (13,57)
34 was not equal to 35. Input: (0,35)
101 was not equal to 102. Input: (26,76)
138 was not equal to 139. Input: (91,48)
76 was not equal to 77. Input: (13,64)
23 was not equal to 24. Input: (13,11)
80 was not equal to 81. Input: (0,81)
103 was not equal to 104. Input: (13,91)
144 was not equal to 145. Input: (78,67)
146 was not equal to 147. Input: (52,95)
94 was not equal to 95. Input: (26,69)
102 was not equal to 103. Input: (52,51)

If you’re quick with dividing things by 13, you might be suspicious already about how this function was implemented, and of course the implementation in tiny-test looks like this:

-- |
-- add two numbers
-- but do a bad enough job that property tests show something interesting
add :: Int -> Int -> Int
add x y = if (mod x 13 == 0) then x + y - 1 else x + y
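The same experiment is easy to replay in Python. The sketch below ports the add function above and draws inputs from 0 to 99, matching the range of the Int instance in tiny-test; failing_inputs is a hypothetical helper, not part of the library.

```python
import random


def add(x, y):
    """Python port of the deliberately broken add shown above."""
    return x + y - 1 if x % 13 == 0 else x + y


def failing_inputs(runs=1000):
    """Try `runs` random pairs and return the ones where add disagrees with +."""
    failures = []
    for _ in range(runs):
        x, y = random.randint(0, 99), random.randint(0, 99)
        if add(x, y) != x + y:
            failures.append((x, y))
    return failures


failures = failing_inputs()
# Distinct first components of the failing pairs; each one is a multiple of 13.
first_components = sorted({x for x, _ in failures})
```

Looking only at first_components makes the pattern much easier to spot than scanning a list of individual failure messages.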

If you want to play around with other function types, it’s easy to add new Arbitrary instances. For example, if you wanted to generate values of a type Foo with a num field that’s an Int and a ch field that’s a Char, you could apply the following diff:

diff --git a/src/Arbitrary.hs b/src/Arbitrary.hs
index 66405b3..17cb6d2 100644
--- a/src/Arbitrary.hs
+++ b/src/Arbitrary.hs
@@ -10,4 +10,12 @@ class Arbitrary a where
   sample :: IO [a]

 instance Arbitrary Int where
-  sample = replicateM 100 ((flip mod) 100 <$> randomIO)
\ No newline at end of file
+  sample = replicateM 100 ((flip mod) 100 <$> randomIO)
+
+data Foo = Foo { num :: Int, ch :: Char }
+
+instance Arbitrary Foo where
+  sample = replicateM 100 $ do
+    num <- randomIO
+    ch <- randomIO
+    pure $ Foo num ch

and test whatever functions over Foos you can imagine.
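As one illustration of what such a test could look like, here is a Python sketch of a Foo-like record, a sampler for it, and a property over a made-up function. The names sample_foos and bump, and the choice of value ranges, are assumptions for this example and do not come from tiny-test.

```python
import random
import string
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Foo:
    num: int
    ch: str  # a single character, standing in for Haskell's Char


def sample_foos(count=100):
    """Hypothetical counterpart to the Arbitrary Foo instance: `count` random Foos."""
    return [
        Foo(num=random.randint(-1000, 1000), ch=random.choice(string.printable))
        for _ in range(count)
    ]


def bump(foo):
    """Hypothetical function under test: increment num, leave ch alone."""
    return replace(foo, num=foo.num + 1)


foos = sample_foos()
# Two properties of bump: it never touches ch, and it always adds one to num.
unchanged_ch = all(bump(foo).ch == foo.ch for foo in foos)
incremented = all(bump(foo).num == foo.num + 1 for foo in foos)
```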
