Advanced Rego Testing Techniques

Filling in some blanks on large-scale policy testing with Gusto's Nicholaos Mouzourakis.

Intro

It’s exciting to learn new languages! I’ve recently been getting my head around Rego, the domain-specific policy language of Open Policy Agent (OPA). It’s fun, and while the basics can be quick to pick up, structuring a larger project of policies brings challenges of its own. For example, Rego hasn’t been around quite long enough to have an established and comprehensive body of closed and open source work from which to draw good coding hygiene and structural/architectural habits. Toward that end, I’ll share the next best thing: some hard-learned patterns and techniques that we at Gusto use to streamline our Rego testing code and make it both more understandable and less of a chore to write.

The Context

Before diving straight into the tests, we should establish some kind of common schema for authorization requests. Normally, we do this so that request serializers and policy writers have some idea of the JSON input to expect, but for our case, it’s so that we can write tests with input that is as close to the real thing as we can make it. At Gusto, we use a “principal-action-entity” authorization request model, where some “principal” is requesting to perform some “action” on some “entity”, and OPA will respond with true or false depending on whether said request is authorized. Drawing from a typical Gusto example, the input for a request to read someone’s compensation might look like this:

{
  "principal": {
    "type": "user",
    "id": "7f70389b-2011-407a-89cc-6633fdf86daa"
  },
  "action": "read",
  "entity": {
    "type": "compensation",
    "id": "fbbaa42c-9631-45ba-8323-56de1b6d85f3"
  }
}

Our action is a simple string, while our principal and entity are simple objects, each with a UUID and type, which should be all we need to logically differentiate them in Rego. (It’s also all we should need to identify them in a database if necessary for future audit needs.) That said, we will likely have difficulty making any meaningful decisions with this data alone, and so this is where input/policy designers like me have to decide where to load additional external data. This is a complicated problem with its own page in the OPA documentation, but for the sake of this post, we’ll just use ‘input-stuffing’. Specifically for this example, let’s say we need the “compensation” object to have an owner, which will correspond to the “employee” role that the user is logged in as, so we’ll just add a “metadata” field to the input object, like so:

{
  "principal": {
    "type": "user",
    "id": "7f70389b-2011-407a-89cc-6633fdf86daa"
  },
  "action": "read",
  "entity": {
    "type": "compensation",
    "id": "fbbaa42c-9631-45ba-8323-56de1b6d85f3"
  },
  "metadata": {
    "principal": {
      "role": {
        "type": "employee",
        "id": "284df004-a71b-4e52-8a8e-4907331d1b58"
      }
    },
    "entity": {
      "owner": {
        "type": "employee",
        "id": "284df004-a71b-4e52-8a8e-4907331d1b58"
      }
    }
  }
}

Note that this is just an example; in practice, these input objects will have to evolve to accommodate whatever use cases they need to represent. We’ll come back to that later, but you may already be able to see from this input what kind of policy we are heading towards. A policy that would accept this kind of input might read, “employees can read their own compensation”, or in Rego:

default allow := false

allow if {
    input.principal.type == "user"
    input.action == "read"
    input.entity.type == "compensation"
    input.metadata.principal.role.type == input.metadata.entity.owner.type
    input.metadata.principal.role.id == input.metadata.entity.owner.id
}

And given that policy, we can now start to formulate what a test for it might look like. To cover all our bases, a typical pattern is to test a “happy case”, where the policy passes with a valid input, and then test some or all failure cases, with each test causing a single policy condition to fail, like so:

test_employee_read_compensation_match if {
    allow with input as {
        "principal": {
            "type": "user",
            "id": "7f70389b-2011-407a-89cc-6633fdf86daa",
        },
        "action": "read",
        "entity": {
            "type": "compensation",
            "id": "fbbaa42c-9631-45ba-8323-56de1b6d85f3",
        },
        "metadata": {
            "principal": {"role": {
                "type": "employee",
                "id": "284df004-a71b-4e52-8a8e-4907331d1b58",
            }},
            "entity": {"owner": {
                "type": "employee",
                "id": "284df004-a71b-4e52-8a8e-4907331d1b58",
            }},
        },
    }
}

test_employee_read_compensation_owner_type_mismatch if {
    not allow with input as {
        "principal": {
            "type": "user",
            "id": "7f70389b-2011-407a-89cc-6633fdf86daa",
        },
        "action": "read",
        "entity": {
            "type": "compensation",
            "id": "fbbaa42c-9631-45ba-8323-56de1b6d85f3",
        },
        "metadata": {
            "principal": {"role": {
                "type": "accountant",
                "id": "284df004-a71b-4e52-8a8e-4907331d1b58",
            }},
            "entity": {"owner": {
                "type": "employee",
                "id": "284df004-a71b-4e52-8a8e-4907331d1b58",
            }},
        },
    }
}

test_employee_read_compensation_principal_owner_id_mismatch if {
    not allow with input as {
        "principal": {
            "type": "user",
            "id": "7f70389b-2011-407a-89cc-6633fdf86daa",
        },
        "action": "read",
        "entity": {
            "type": "compensation",
            "id": "fbbaa42c-9631-45ba-8323-56de1b6d85f3",
        },
        "metadata": {
            "principal": {"role": {
                "type": "employee",
                "id": "536a1856-66da-4629-a5c9-76af3dce4c77",
            }},
            "entity": {"owner": {
                "type": "employee",
                "id": "284df004-a71b-4e52-8a8e-4907331d1b58",
            }},
        },
    }
}

Yikes… and that is just a few tests for a single policy. A test file that fully exercises each rule of a corresponding policy file, with even a few dozen policy rules, can quickly explode into hundreds or even thousands of lines of very similar-looking JSON objects, each differing from the next by only a field or two. That leads to tedious test creation: copying and pasting several test rules at a time and then editing individual fields. And if any of those edits are botched due to visual JSON overload, test errors can accumulate, defeating the purpose and assurances of the tests in the first place. Plus, if writing it is a challenge, reading it will likely be a pain too, further undermining the spirit of good testing. So, let’s see what can be done to clean this up with some of the techniques we have been using at Gusto.

Consolidating input objects

The first and probably most obvious thing we can do is not repeat the entire input object as a literal for each test, but store it in a separate rule and then alter it as necessary for each test. The built-in object.union(…) function comes in very handy for this:

employee_read_compensation_input := {
    "principal": {
        "type": "user",
        "id": "7f70389b-2011-407a-89cc-6633fdf86daa",
    },
    "action": "read",
    "entity": {
        "type": "compensation",
        "id": "fbbaa42c-9631-45ba-8323-56de1b6d85f3",
    },
    "metadata": {
        "principal": {"role": {
            "type": "employee",
            "id": "284df004-a71b-4e52-8a8e-4907331d1b58",
        }},
        "entity": {"owner": {
            "type": "employee",
            "id": "284df004-a71b-4e52-8a8e-4907331d1b58",
        }},
    },
}

test_employee_read_compensation_match if {
    allow with input as employee_read_compensation_input
}

test_employee_read_compensation_owner_type_mismatch if {
    not allow with input as object.union(
        employee_read_compensation_input,
        {"metadata": {"entity": {"owner": {"type": "accountant"}}}},
    )
}

test_employee_read_compensation_owner_id_mismatch if {
    not allow with input as object.union(
        employee_read_compensation_input,
        {"metadata": {
            "entity": {
                "owner": {"id": "536a1856-66da-4629-a5c9-76af3dce4c77"}
            }
        }},
    )
}

This looks much better, and will spare us quite a few lines, copy-paste errors, and headaches when writing new tests. At Gusto we take it one step further and define custom with_ functions for all of our common object.union(…) overrides, like so:

package testing

with_entity_owner(input_object, entity_owner) := object.union(
    input_object,
    {"metadata": {"entity": {"owner": entity_owner}}}
)

We then use these helpers across all of our tests for consistency. (And to save ourselves from typing out the whole nested override object for the second argument every time, which, like working with full-sized input objects, quickly gets annoying!)
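
To illustrate how this pattern scales, here is a sketch of what another override helper might look like (the with_principal_role name is a hypothetical example rather than one lifted from our codebase); each helper encodes exactly one override path, so tests only ever spell out the piece of input they actually change:

# Hypothetical companion helper: override the principal's role at its
# nested path in the input, mirroring with_entity_owner above.
with_principal_role(input_object, principal_role) := object.union(
    input_object,
    {"metadata": {"principal": {"role": principal_role}}}
)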

Offload input object creation to a function

Now that each set of tests has only one input, it is much easier than before to write new tests, but having dozens of large object literals in a single test file is still not ideal for readability… It would be really nice if we could boil down the core components of an input object into interchangeable building blocks that we could then arrange as needed. A function might make sense for this, but since Rego doesn’t support variadic or named arguments, the next best thing is to use one big argument object, with specific keys mapped to specific object paths in the resulting object. An ideal function invocation to produce the employee_read_compensation_input above might look like this:

employee_read_compensation_input := build_input({
    "action": "read",
    "principal": {
        "type": "user",
        "id": "7f70389b-2011-407a-89cc-6633fdf86daa"
    },
    "entity": {
        "type": "compensation", 
        "id": "fbbaa42c-9631-45ba-8323-56de1b6d85f3"
    },
    "principal_role": {
        "type": "employee", 
        "id": "284df004-a71b-4e52-8a8e-4907331d1b58"
    },
    "emtity_owner": {
        "type": "employee", 
        "id": "284df004-a71b-4e52-8a8e-4907331d1b58"
    },
})

This would make it very easy to quickly build input objects from known simple objects, and make sure they are placed at the proper input path. The input paths could be specified via a mapping, like so:

{
  "principal": ["principal"],
  "action": ["action"],
  "entity": ["entity"],
  "principal_role": ["metadata", "principal", "role"],
  "entity_owner": ["data", "entity", "owner"]
}

The input object can then be built by placing each object argument at the path given by the string array under its key name in the mapping above. Unfortunately, the implementation of such a function is not as trivial as one might like, but here is the one we developed to get the job done:

build_object_path(path, value) := object if {
    path_patches := [patch |
        some index, _ in path
        json_path := sprintf("/%s", [concat(
            "/",
            array.slice(path, 0, index + 1),
        )])
        patch := {"op": "add", "path": json_path, "value": {}}
    ]
    value_patch := {
        "op": "add",
        "path": sprintf("/%s", [concat("/", path)]),
        "value": value,
    }
    object := json.patch({}, array.concat(path_patches, [value_patch]))
}

build_input(params) := result if {
    param_paths := {
        "principal": ["principal"],
        "action": ["action"],
        "entity": ["entity"],
        "principal_role": ["metadata", "principal", "role"],
        "entity_owner": ["data", "entity", "owner"],
    }
    param_objects := [build_object_path(param_paths[param_key], param_value) |
        some param_key, param_value in params
    ]
    result := object.union_n(param_objects)
}

By creatively reappropriating (or fiendishly misusing, depending on your perspective) the built-in json.patch(…) function, we can create any number of objects with arbitrary path lengths, which we can then combine with the object.union_n(…) function.
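
To make the mechanics concrete, here is a quick illustrative unit test (hypothetical, assuming it lives alongside the helpers above): for a three-element path, build_object_path emits three "add" patches to create the intermediate empty objects, then one final patch to place the value at the leaf:

test_build_object_path_nests_value if {
    expected := {"metadata": {"principal": {"role": {"type": "employee"}}}}
    build_object_path(["metadata", "principal", "role"], {"type": "employee"}) == expected
}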

Use fixtures to model real data

While the build_input function above is certainly an improvement over raw input object literals, it still leaves a bit to be desired. We are still typing in (or, more likely/honestly, copy-pasting) raw object literals, only smaller, as the arguments to the function. While we now have a mechanism to put the building blocks in place, it would be nice if those blocks were all organized somewhere, ready to be used as needed… One solution is a comprehensive “fixture” file containing organized, named objects that represent building blocks of input data as they are expected to appear in the input object. We also try to structure the data as closely as possible to the way it exists in the application, so that choosing fixtures when building input makes sense to the test reader and writer. For the example above, we might have a fixture file that looks like this:

package fixtures

alice := {
    "user": {
        "type": "user",
        "id": "7f70389b-2011-407a-89cc-6633fdf86daa",
    },
    "employee": {
        "type": "employee",
        "id": "284df004-a71b-4e52-8a8e-4907331d1b58",
    },
}

bob := {
    "user": {
        "type": "user",
        "id": "ba9cfc65-67af-4c56-b37a-1e9e05c6d719",
    },
    "employee": {
        "type": "employee",
        "id": "44b00526-81c3-4680-81b1-c154dbff9cdc",
    },
}

compensation := {
    "type": "compensation",
    "id": "fbbaa42c-9631-45ba-8323-56de1b6d85f3",
}

Being a security team, we settled on the generic names used in academic security papers, but you are of course welcome to use whatever naming scheme suits your organization’s needs.

(At first, we considered using names from popular media so that relationships might be intuitively understood by test readers/writers; e.g., a “michael” employee might be the manager of a “dwight” employee. However, we didn’t want knowledge of the universe of “The Office” (or any other media) to be a prerequisite for understanding the tests. We would advise similar considerations for any other fixture files.)

The important part is that we can now beautifully rewrite our previous input builder invocation so that it can readably and succinctly convey our intent, like so:

import data.fixtures

employee_read_compensation_input := build_input({
    "principal": fixtures.alice.user,
    "action": "read",
    "entity": fixtures.compensation,
    "principal_role": fixtures.alice.employee,
    "entity_owner": fixtures.alice.employee,
})

And, using the with_ functions defined above, we can rewrite our previous tests to be equally readable and succinct, like so:

import data.fixtures
import data.testing

test_employee_read_compensation_match if {
    allow with input as employee_read_compensation_input
}

test_employee_read_compensation_owner_mismatch if {
    not allow with input as testing.with_entity_owner(
        employee_read_compensation_input,
        fixtures.bob.employee,
    )
}

And finally, this is about as close to our actual Gusto test code as we can get without copy-pasting it directly.

Hopefully, this provides some new techniques that you can use in your own test code to keep it clean and bug-free. These were hard but necessary lessons to learn at Gusto; we are definitely pleased with the results, and we look forward to the possibilities these patterns enable in future code.

Spirit of testing

When writing tests, it’s important not to lose sight of the goal: to set up scenarios that strike the best balance of realistic and understandable relative to the policies, and to write those tests in a way that is readable and maintainable. It’s also worth mentioning that testing, especially for our OPA Rego policies, serves a purpose beyond simply confirming that our policies were written correctly. As we refactor and rewrite our policies to meet our evolving needs, testing also serves as a critical tool to ensure that refactored/rewritten policies are logically equivalent (to an extent, at least) to the former policies. The techniques in this post have helped us immensely at Gusto to write our tests in a way that we feel strikes the aforementioned balance, and I hope they do the same for you and/or your organization.

If you’d like to discuss testing, or how we use OPA at Gusto, join me and many others in the Styra community Slack!
