understanding tries to infer from a set of axioms whether a particular query or class of queries is allowed by policy, independently of the state of the database being queried. Taking pre-approved queries as axioms is a simple case of this. For example, “If X is an identifier for a reasonable and articulable suspicion (RAS) target, return all the identifiers that communicated with X in the last year.” The query is fixed except for its parameters, such as X in this example. Pre-approving it is a human decision, and no automated reasoning is needed to apply it. A more powerful system could deduce that this query is allowed from more general axioms such as “X is associated with Y if X and Y communicated in the last year” and “Any identifier associated with a RAS target can be disclosed.”
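To make the distinction concrete, here is a minimal Python sketch of the two ways a static checker might approve the example query. The `ContactQuery` class, its fields, and both functions are hypothetical illustrations, not part of any real system, and the two general axioms are hand-encoded as simple rules rather than handed to an automated theorem prover.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContactQuery:
    """Hypothetical parameterized query: return every identifier that
    communicated with seed X within the last `window_days` days."""
    seed_is_ras_target: bool  # has X been approved as a RAS target?
    window_days: int          # look-back window, in days

# Simple case: pre-approved query templates taken directly as axioms.
# Only the exact approved form (here, a one-year window) is accepted.
PRE_APPROVED_WINDOWS = {365}

def allowed_by_template(q: ContactQuery) -> bool:
    return q.seed_is_ras_target and q.window_days in PRE_APPROVED_WINDOWS

# More general axioms (hypothetical encoding):
#   A1: X is associated with Y if X and Y communicated in the last 365 days.
#   A2: Any identifier associated with a RAS target can be disclosed.
ASSOCIATION_WINDOW_DAYS = 365

def allowed_by_deduction(q: ContactQuery) -> bool:
    # Every result of q communicated with X within window_days; if that window
    # fits inside A1's window, every result is associated with X (by A1).
    all_results_associated = 0 < q.window_days <= ASSOCIATION_WINDOW_DAYS
    # If X is a RAS target, A2 then permits disclosing every result.
    # No database contents are consulted: only the query's form and parameters.
    return q.seed_is_ras_target and all_results_associated

if __name__ == "__main__":
    narrow = ContactQuery(seed_is_ras_target=True, window_days=30)
    print(allowed_by_template(narrow), allowed_by_deduction(narrow))  # False True
    wide = ContactQuery(seed_is_ras_target=True, window_days=730)
    print(allowed_by_template(wide), allowed_by_deduction(wide))  # False False
```

Under these assumptions, a query with a 30-day window is rejected by the template check, because it is not the exact approved form, but accepted by deduction, because its possible results are a subset of what the general axioms already permit.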
Dynamic understanding looks at the actual results of a query, rather than considering all possible results, and asks whether policy allows them to be disclosed. A simple example is a kind of minimization: if a query returns a set of identifiers, any identifiers for U.S. persons should be removed from the results. Tags on the data that track its provenance or other properties can make dynamic understanding more powerful; this example uses a “U.S. person” tag that is added to the database entry for an identifier once it has been determined that the identifier refers to a U.S. person.11 This kind of understanding has been studied extensively in the context of information flow control, where the goal is to keep secrets from being disclosed to uncleared people, even when the secrets are processed by untrusted programs. Decentralized information flow control can flexibly represent both degrees of secrecy and the authority to disclose.12 Dynamic systems can also take account of context and history by applying the rules in force at the time a query is made and by considering questions such as these: Is there an emergency? Is the query part of a pattern known to need more scrutiny? Are the results being combined with other data to deanonymize them in a way that is contrary to policy?
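The minimization example can be sketched in a few lines of Python. This is an illustrative sketch only: the `Entry` record, the `minimize` function, and the tag string `"us_person"` are hypothetical names, and tags are modeled as a set of strings on each entry, loosely in the spirit of per-cell labels such as those in Accumulo (footnote 11) but without using any Accumulo API.

```python
from dataclasses import dataclass

# Hypothetical tag recorded on an identifier's database entry once it has been
# determined that the identifier refers to a U.S. person.
US_PERSON = "us_person"

@dataclass(frozen=True)
class Entry:
    identifier: str
    tags: frozenset[str] = frozenset()  # provenance or other properties

def minimize(results: list[Entry]) -> list[str]:
    """Dynamic check: inspect the actual query results and drop any
    identifier tagged as a U.S. person before disclosure."""
    return [e.identifier for e in results if US_PERSON not in e.tags]

if __name__ == "__main__":
    db = {
        "id-A": Entry("id-A"),
        "id-B": Entry("id-B", frozenset({US_PERSON})),
        "id-C": Entry("id-C", frozenset({"source:collection-1"})),
    }
    print(minimize(list(db.values())))  # ['id-A', 'id-C']
```

Unlike the static check, this decision depends on the data: the same query can be fully disclosable on one day and partially minimized on the next, once an identifier in its results acquires the U.S. person tag.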
Static and dynamic query understanding have many similarities to the thriving fields of static and dynamic program understanding, which suggests that there may be rich opportunities to apply techniques from those fields here. Not surprisingly, programs written in languages that are designed for automated understanding are much easier to analyze. The same applies to query languages; indeed, the standard SQL database query language is
______________
11 NSA developed such a database, Accumulo, and donated it to the Apache open-source community. Accumulo is a scalable key/value store that allows “access labels” to be attached to each cell, enabling low-level query authorization checks (Apache Software Foundation, “Apache Accumulo,” https://accumulo.apache.org/, accessed January 16, 2015).
12 A.C. Myers and B. Liskov, A decentralized model for information flow control, pp. 129-142 in Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP), 1997, Association for Computing Machinery, New York, N.Y.