Technical Report

Encrypted Data Analytics

A comprehensive guide to privacy-enhancing technologies, evaluating architectures, and navigating the trade-offs of modern data protection.

Executive summary

Encrypted data analytics is best understood as an umbrella for privacy-enhancing technologies that let an analyst, cloud service, or counterparties compute over sensitive data while reducing or eliminating ordinary plaintext exposure during processing. In the strictest cryptographic sense, that includes homomorphic encryption, secure multi-party computation, structured/searchable encryption, private set operations, and functional encryption. In industry practice, the umbrella also includes trusted execution environments and confidential computing, which protect data in use inside attested hardware isolation rather than by keeping it cryptographically opaque end to end. NIST, the ICO, and major cloud providers all treat these tools as part of the broader PET/confidential-computing landscape. [1]

No single technique dominates. TEEs/confidential VMs are currently the easiest path to broad functionality and near-native performance for SQL, joins, and ML, but they add hardware trust, attestation, and side-channel assumptions. Searchable encryption and ORE/OPE are often the most practical way to support equality/range/search workloads, but they do so by accepting structured leakage that can be exploited. MPC is usually the strongest choice when several organizations need joint analytics without a trusted third party, though performance depends heavily on interactivity and network conditions. FHE offers the strongest trust reduction for outsourced single-server computation, but it remains expensive and is most mature today for selected aggregation, linear algebra, similarity search, and low-depth ML inference rather than general-purpose OLAP. [2]

The most important design lesson is that "encrypted analytics" is not a binary property. Systems differ in what they protect against: cloud operators, other collaborating parties, database administrators, output inference, side channels, frequency leakage, or collusion among compute nodes. They also differ in what they can compute: sums, linear models, approximate neural inference, equality search, range search, joins, sorting, or arbitrary code. The right choice is therefore threat-model-first, not feature-first. [3]

From a deployment perspective, the dominant production pattern in 2026 is hybridization: a confidential-computing layer for general execution, cryptographic sub-protocols for the most sensitive steps, searchable encryption for narrow queryability, and differential privacy for result release. That stack is reflected in products and frameworks such as SecretFlow, Duality on AWS Nitro Enclaves, Decentriq on Azure Confidential Computing, Google Confidential Space, MongoDB Queryable Encryption, AWS Clean Rooms Differential Privacy, and Prio/DAP-style private aggregation systems. [4]

Definitions and scope

NIST’s Privacy-Enhancing Cryptography project describes the core primitives clearly: MPC lets multiple distrustful parties compute on private inputs; FHE lets a server evaluate supported functions on ciphertexts so that decryption yields the function output; PIR retrieves a database item without revealing the query; and structured encryption enables private queries over encrypted data structures. NIST also explicitly lists functional encryption among related PEC tools. This gives a stable foundation for what should count as encrypted data analytics. [5]

For this report, encrypted data analytics includes any architecture that allows useful analytics, search, joining, or ML over sensitive data while preventing the processing environment from having ordinary unrestricted plaintext visibility. That includes: cryptographic computation on ciphertext or secret shares; structured/searchable query execution over protected indexes; hardware-attested confidential execution; and output controls such as differential privacy when combined with encrypted or secret-shared processing. By contrast, ordinary encryption at rest or in transit alone does not qualify, because the application or analytic engine still processes plaintext in its normal memory space. [6]

A useful practical distinction is between strict cryptographic opacity and reduced plaintext exposure. FHE, MPC, PIR/PSI, and most FE constructions aim for the former. TEEs aim for the latter: data becomes plaintext inside an attested enclave or confidential VM, but the operator, hypervisor, and surrounding platform are supposed to remain outside the trust boundary. Because many real deployments need existing SQL engines, model runtimes, or data-clean-room workflows, TEEs are operationally important even though their guarantees are different from pure cryptography. [7]

The scope of "analytics" here is broad: aggregation such as count, sum, average, histogram, and group-by; ML inference and, where feasible, training; SQL-like filtering and selected joins; search over encrypted databases or documents; and cross-party linkage or overlap analysis such as private join and compute. The fact that different PETs support different subsets of this workload family is the central organizing principle of the rest of the report. [8]

Deployment Models Synthesis Schematic

[Schematic elements: data owners A and B; an Encrypt / Secret-share step; an untrusted cloud or shared platform hosting the processing environment (X), which is one of an FHE or PHE service, MPC parties, an attested TEE / VM, or an encrypted index engine; and a result recipient who receives a plain result, an encrypted result, or a DP-protected release.]

The diagram above is a schematic synthesis of the main deployment models documented by NIST, MongoDB, AWS, Azure, Google Cloud, and SecretFlow. [9]

Threat models and regulatory constraints

The first question in encrypted analytics is who the adversary is. Common models include: an honest-but-curious cloud provider; a malicious cloud operator or hypervisor; colluding input parties in an MPC workflow; a database administrator who can dump storage and logs; a side-channel attacker who can observe memory, page faults, or microarchitectural effects; and an analyst who sees only aggregate outputs but may try membership or reconstruction attacks from repeated queries. Different PETs address very different subsets of that list. [10]

For cryptographic techniques, the main assumptions are usually about the hardness of lattice or number-theoretic problems and about how much leakage the scheme permits by design. For MPC, the key assumptions are the corruption model and collusion threshold: passive versus malicious adversaries, honest-majority versus dishonest-majority, and protocol behavior under abort. NIST’s PEC descriptions and widely used MPC frameworks make these distinctions explicit, and modern frameworks such as MP-SPDZ, MPyC, and MOTION expose several such security models to the developer. [11]

For TEEs, the threat model is narrower and more operational. Azure defines confidential computing as protecting code and data in use against cloud-provider operators and similar actors by using hardware-based attested TEEs; Google Confidential Space is designed to release secrets only to authorized workloads after attestation; AWS Nitro Enclaves similarly relies on enclave identity and KMS-integrated attestation. It follows that if your unacceptable risks include hardware or firmware bugs, side channels, or supply-chain trust in the platform vendor, TEEs alone are usually insufficient. [12]

Regulatory treatment is similarly nuanced. Under the GDPR, pseudonymisation is still a form of processing of personal data, not automatic anonymization; Article 25 requires data protection by design and by default, and Article 32 explicitly cites pseudonymisation and encryption as examples of appropriate security measures. The ICO’s PET guidance says PETs can support data minimisation and security, but they are not a silver bullet, and organizations still need lawful, fair, and transparent processing plus a case-by-case DPIA. [13]

That means encrypted analytics typically helps with risk reduction, processor minimization, breach resilience, and cross-organizational data sharing, but it usually does not eliminate legal obligations around purpose limitation, transparency, data-subject rights, retention, or international-transfer analysis. This is especially true when a controller can still decrypt outputs or link results back to individuals. That conclusion is an inference from the GDPR definition of pseudonymisation and the ICO/EDPB view that PETs are complementary safeguards within a broader compliance program. [14]

In U.S. sectoral regimes, the lesson is similar. HHS’s HIPAA security guidance points covered entities to NIST security controls and encryption guidance; the FTC’s health-data guidance emphasizes understanding data flows, implementing robust safeguards, and avoiding deceptive privacy claims; and the FTC Safeguards Rule under GLBA is explicitly risk-based and focused on preserving the confidentiality, integrity, and security of customer information. Encrypted analytics therefore helps most when it is tied to written risk assessments, access control, attestation and key management, and output governance, not when it is treated as a stand-alone compliance argument. [15]

Technique catalog

Homomorphic encryption

Partially homomorphic encryption is the mature low-functionality end of HE. Paillier-based libraries expose additive homomorphism with scalar multiplication, which makes them well suited to counts, sums, weighted sums, private billing, and secure aggregation. Security is classical public-key semantic security under the scheme’s number-theoretic assumptions; performance is strong relative to FHE because no bootstrapping or deep circuit support is required. In practice, PHE is often embedded inside larger protocols rather than used as a general analytic engine. Representative tooling includes CSIRO/Data61’s python-paillier and newer lightweight packages such as LightPHE. Typical deployment is client-side encryption, server-side accumulation, and decrypt-at-the-end by the key owner or a threshold set of key holders. The limiting factor is simple: no arbitrary comparisons, joins, or general SQL without combining PHE with other primitives. [16]
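The additive homomorphism described above can be illustrated with a minimal, textbook-style Paillier sketch. This is a teaching toy under stated assumptions, not the API of python-paillier or LightPHE: the function names (keygen, encrypt, decrypt, add, scalar_mul) are hypothetical, the primes are far too small to be secure, and the generator is fixed to g = n + 1. It needs Python 3.8 or later for the modular inverse via pow.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two distinct primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut assumes g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    """Encrypt 0 <= m < n as (1 + n)^m * r^n mod n^2 with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, priv, c: int) -> int:
    """Recover m using L(u) = (u - 1) / n applied to c^lam mod n^2."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def add(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % (n * n)

def scalar_mul(n: int, c: int, k: int) -> int:
    """Homomorphic scalar multiplication: exponentiation scales the plaintext."""
    return pow(c, k, n * n)

# Client side: encrypt each reading. Server side: accumulate without the key.
n, priv = keygen(1009, 1013)  # toy primes, illustration only
readings = [1200, 3400, 560]
ciphertexts = [encrypt(n, m) for m in readings]
total_ct = ciphertexts[0]
for c in ciphertexts[1:]:
    total_ct = add(n, total_ct, c)

# Key owner: decrypt only the final aggregate.
total = decrypt(n, priv, total_ct)
weighted = decrypt(n, priv, scalar_mul(n, ciphertexts[0], 3))
print(total, weighted)
```

The server in this sketch only ever multiplies ciphertexts, which matches the encrypt, accumulate, decrypt-at-the-end deployment pattern. Results are correct only while the true sum stays below n.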

Fully homomorphic encryption is the strongest mainstream approach for outsourced single-server computation on ciphertexts. NIST defines FHE as evaluation of functions over encrypted data such that decryption yields the function output, and the HomomorphicEncryption.org community’s 2024 security guidelines are now widely used to configure modern FHE systems. The practical library ecosystem is substantial: OpenFHE, Microsoft SEAL, HElib, Lattigo, TFHE-rs, Concrete, and Concrete ML are all active, with schemes such as BFV, BGV, CKKS, and TFHE-style variants represented. Supported analytics today include aggregation, vector operations, similarity search, low-depth arithmetic circuits, and selected ML inference; multiparty HE variants extend this toward collaborative analytics. However, general SQL engines, joins, and broad OLAP remain mostly research/prototype territory, with recent surveys and systems such as ArcEDB and FHE-SQL indicating progress rather than full production readiness. [17]

Performance is the main tradeoff. The 2026 FHE Benchmarking Suite frames the right metrics as latency, throughput, memory use, storage blow-up, communication complexity, and accuracy loss. Bootstrapping is still a dominant bottleneck; even the HE Standard notes that bounded-depth schemes are more practical and that bootstrapping is expensive. Concrete ML’s documentation also illustrates the operational reality: it currently focuses on inference, and supported models must fit quantization and precision restrictions rather than arbitrary floating-point training pipelines. A subtle but important security limitation is that approximate HE, especially CKKS-like schemes, has required stronger analysis than plain IND-CPA in some settings. [18]
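To make the bounded-depth point concrete, here is a small sketch of a symmetric somewhat-homomorphic scheme over the integers, in the style of the DGHV construction. It is a swapped-in teaching scheme, not BFV, BGV, CKKS, or TFHE, and not the API of any library named in this report. The parameter sizes and function names are assumptions chosen for readability, and the scheme is insecure at these sizes. The idea is that every ciphertext carries noise that grows with each operation, and decryption is only correct while the noise stays below the secret modulus.

```python
import secrets

P_BITS = 64      # toy size of the secret odd modulus p
Q_BITS = 128     # toy size of the random multiplier q
NOISE_BITS = 8   # toy size of the fresh noise term r

def keygen() -> int:
    """Secret key: a random odd integer with its top bit set."""
    return secrets.randbits(P_BITS) | (1 << (P_BITS - 1)) | 1

def encrypt(p: int, bit: int) -> int:
    """Ciphertext c = p*q + 2*r + bit; the term 2*r + bit is the noise."""
    q = secrets.randbits(Q_BITS)
    r = secrets.randbits(NOISE_BITS)
    return p * q + 2 * r + bit

def decrypt(p: int, c: int) -> int:
    """Strip the multiple of p, then read the parity of the noise."""
    return (c % p) % 2

def noise_bits(p: int, c: int) -> int:
    """Bit length of the noise; decryption fails once this reaches the size of p."""
    return (c % p).bit_length()

sk = keygen()

# Integer addition acts as XOR and integer multiplication acts as AND.
gates_ok = all(
    decrypt(sk, encrypt(sk, a) + encrypt(sk, b)) == (a ^ b)
    and decrypt(sk, encrypt(sk, a) * encrypt(sk, b)) == (a & b)
    for a in (0, 1)
    for b in (0, 1)
)

# Noise roughly doubles in bit length per multiplication level.
fresh = encrypt(sk, 1)
depth1 = fresh * encrypt(sk, 1)
depth2 = depth1 * (encrypt(sk, 1) * encrypt(sk, 1))
print(gates_ok, noise_bits(sk, fresh), noise_bits(sk, depth1), noise_bits(sk, depth2))
```

With these toy parameters two multiplication levels stay safely inside the noise budget, while a third level can overflow it. Production schemes face the same kind of budget, which is why bootstrapping (a noise-refresh step) or careful depth planning is needed.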

Typical FHE deployment is: encrypt at the client, send ciphertexts and evaluation keys to an untrusted compute service, perform homomorphic evaluation, and return either encrypted outputs or threshold-decryptable outputs to a data owner or consortium. The best current use cases are one-owner outsourced computation and hybrid pipelines where FHE protects the narrowest, most sensitive steps. Commercial activity is strong around that model, including IBM HElayers/FHE services, Duality, Zama, and vendor-driven hardware acceleration efforts. [19]

Secure multi-party computation

MPC is the natural choice when multiple organizations each keep their raw data local but want a joint result. NIST describes MPC as allowing multiple distrustful parties to compute a function over private inputs while revealing only what follows from each party’s own input and output. In practice, MPC ecosystems support passive or malicious security, honest-majority or dishonest-majority settings, and combine secret sharing, oblivious transfer, garbled circuits, and sometimes homomorphic encryption. Representative open-source frameworks include MP-SPDZ, MPyC, MOTION, EMP, ABY3, and SecretFlow. [20]
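As an illustration of the secret-sharing building block mentioned above, the following sketch computes a joint sum with additive secret sharing. It is a minimal passive-security example under stated assumptions: the modulus, the three-party setup, and the function names are hypothetical, there is no networking or protection against malicious behavior, and it is not the API of MP-SPDZ, MPyC, or any other framework.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this public value

def share(value: int, n_parties: int) -> list:
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares: list) -> int:
    """Combine shares; any subset smaller than the full set looks uniformly random."""
    return sum(shares) % MODULUS

def secure_sum(private_inputs: list, n_parties: int = 3) -> int:
    """Each input owner sends one share to each compute party.

    Every compute party adds the shares it received locally, and only the
    per-party partial sums are combined to reveal the final total.
    """
    received = [[] for _ in range(n_parties)]
    for value in private_inputs:
        for party, s in enumerate(share(value, n_parties)):
            received[party].append(s)
    partial_sums = [sum(col) % MODULUS for col in received]
    return reconstruct(partial_sums)

# Three organizations contribute private counts; only the total is revealed.
joint_total = secure_sum([120, 75, 310])
print(joint_total)
```

In this sketch the guarantee holds only if at least one compute party does not collude with the others, which is the kind of collusion threshold that has to be stated explicitly in an MPC deployment.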

Functionality is broad but topology-sensitive. MPC is strong for aggregations, histograms, PSI, private joins, secure overlap-and-sum, federated analytics, and classical ML training/inference on partitioned data. Google’s Private Join and Compute is a concrete example of privately summing values over overlapping identifiers, and ABY3 was explicitly built as a mixed-protocol framework for ML. Honest-majority protocols continue to improve materially: recent work reported fewer high-latency links and up to 50% fewer basic instructions per gate than prior state of the art in certain 3PC/4PC settings. [21]

Performance is usually better than FHE for rich multi-party computation, but worse than TEEs for simple lift-and-shift analytics. The dominant costs are interaction rounds, network bandwidth, and the availability of independent compute parties. With a low-latency network and a carefully selected protocol, MPC can scale well; in WAN settings or when the corruption threshold forces more conservative protocols, tail latency rises quickly. Typical deployments therefore use 2–4 orchestrated compute parties, formal collusion assumptions, and strict operational controls around party independence and output release. The main limitations are complexity, debugging difficulty, fairness/abort behavior, and the fact that the security guarantee collapses if too many parties collude. [22]

Trusted execution environments

TEEs and confidential-computing platforms protect data by constraining where it is decrypted and executed, rather than by keeping it opaque throughout computation. Current production-relevant examples include Intel SGX enclaves, AWS Nitro Enclaves, AMD SEV/SEV-SNP, Intel TDX, Azure confidential VMs, Google Confidential Space, and confidential GPUs such as NVIDIA H100. Their core guarantee is typically: only an attested workload running in the protected environment may access the keys or plaintext. [23]
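The core guarantee, that keys are released only to an attested workload, can be sketched as a simplified simulation. Everything here is an assumption made for illustration: the KeyBroker class, the report format, and the use of an HMAC under a shared HARDWARE_KEY as a stand-in for the vendor-signed attestation evidence that real platforms produce. It does not reproduce the attestation protocol of SGX, SEV-SNP, TDX, Nitro Enclaves, or Confidential Space.

```python
import hashlib
import hmac
import secrets

# Stand-in for the hardware root of trust. Real TEEs use asymmetric signatures
# and vendor certificate chains rather than a shared secret.
HARDWARE_KEY = secrets.token_bytes(32)

def measure(workload_code: bytes) -> str:
    """Measurement: a hash of the code that was loaded into the TEE."""
    return hashlib.sha256(workload_code).hexdigest()

def _tag(measurement: str, nonce: bytes) -> str:
    return hmac.new(HARDWARE_KEY, measurement.encode() + nonce, hashlib.sha256).hexdigest()

def make_report(workload_code: bytes, nonce: bytes) -> dict:
    """Simulated attestation report binding the measurement to a fresh nonce."""
    m = measure(workload_code)
    return {"measurement": m, "nonce": nonce, "tag": _tag(m, nonce)}

class KeyBroker:
    """Hypothetical key-release service run by the data owner."""

    def __init__(self, allowed_measurements, data_key: bytes):
        self._allowed = set(allowed_measurements)
        self._data_key = data_key

    def release_key(self, report: dict, expected_nonce: bytes):
        """Return the data key only for a fresh, authentic, allow-listed report."""
        expected = _tag(report["measurement"], report["nonce"])
        if not hmac.compare_digest(expected, report["tag"]):
            return None  # report was not produced by the trusted hardware
        if report["nonce"] != expected_nonce:
            return None  # stale or replayed report
        if report["measurement"] not in self._allowed:
            return None  # unapproved code is running
        return self._data_key

approved_code = b"SELECT region, SUM(amount) FROM sales GROUP BY region"
broker = KeyBroker({measure(approved_code)}, data_key=secrets.token_bytes(32))

nonce = secrets.token_bytes(16)
granted = broker.release_key(make_report(approved_code, nonce), nonce)
denied = broker.release_key(make_report(b"SELECT * FROM sales", nonce), nonce)
print(granted is not None, denied is None)
```

The design point the sketch tries to capture is that the decision to release a key depends on what code is running, not on who operates the machine.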

TEEs are the most functionally expressive option because they can run existing analytics code with limited changes. That makes them the leading option for SQL, joins, conventional databases, arbitrary application logic, and ML training or inference when performance matters. The DuckDB-SGX2 paper is a good reference point: it ran a TPC-H scale-factor-30 analytical workload with under 2x overhead relative to non-encrypted execution, while also surfacing the real hazards of enclave execution such as higher cache-miss cost, NUMA sensitivity, and enclave paging. Typical tooling includes Gramine, Open Enclave, and Confidential Containers. [24]

The price for that performance is a larger and more fragile trust base. SGX has an extensive attack literature; the 2024 SGX.Fail systematization explicitly surveys publicly known SGX attacks and their applicability across architectures. AMD’s SEV-SNP has also faced fresh scrutiny: the 2026 Fabricked paper reports arbitrary read/write and forged attestation under a routing-misconfiguration attack, and AMD’s own security bulletin acknowledges integrity impact and mitigation requirements for affected products. In practical terms, TEEs are strongest when combined with attestation-based key release, patch discipline, minimal TCBs, secret zeroization, and output controls, and weakest when treated as “set-and-forget encryption in use.” [25]

Searchable encryption and encrypted indexes

Searchable encryption is the broad family of techniques that allows queries over protected indexes or ciphertext-linked structures. NIST’s structured encryption definition captures the essence: query encrypted data structures without revealing everything in the database. In practice, this includes blind indexes, inverted indexes, encrypted token/query protocols, and encrypted field-level query systems. Representative tooling includes OpenSSE, CipherSweet, Cosmian Findex, MongoDB Queryable Encryption, and CipherStash-style platforms. [26]

The security model is deliberately different from FHE or MPC. Efficient searchable systems almost always leak some combination of search pattern, access pattern, frequency, result size, update pattern, or index structure. Recent research emphasizes that this leakage is not a cosmetic issue: efficient structured/searchable encryption is provably defined in terms of allowed leakage, and modern leakage-abuse work shows how those profiles can be exploited in practice. That tradeoff buys substantial speed. Searchable encryption is therefore ideal for equality search, document retrieval, keyword search, and selected range/prefix/suffix queries, but it is not a drop-in answer for arbitrary joins or analytics with strong semantic-security expectations. [27]
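A blind index is one of the simplest constructions in this family, and a short sketch shows both the equality search it enables and the frequency leakage it accepts. The key handling, normalization rule, and row layout below are assumptions for illustration. The sketch stores only the index tokens and omits payload encryption, and it is not the API of CipherSweet or any other product named here.

```python
import hashlib
import hmac
import secrets

INDEX_KEY = secrets.token_bytes(32)  # hypothetical key held by the application

def blind_index(key: bytes, value: str) -> str:
    """Deterministic keyed token: equal values map to equal tokens."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# The server stores tokens next to (separately encrypted) rows.
rows = [
    (1, "alice@example.com"),
    (2, "bob@example.com"),
    (3, "Alice@Example.com "),
    (4, "carol@example.com"),
    (5, "alice@example.com"),
]
server_index = {}
for row_id, email in rows:
    server_index.setdefault(blind_index(INDEX_KEY, email), []).append(row_id)

def search(key: bytes, index: dict, query: str) -> list:
    """The client derives the token; the server does an ordinary equality lookup."""
    return index.get(blind_index(key, query), [])

matches = search(INDEX_KEY, server_index, "alice@example.com")

# Leakage: without the key, the server still sees how often each token repeats.
frequency_profile = sorted(len(ids) for ids in server_index.values())
print(matches, frequency_profile)
```

The frequency profile is exactly the kind of structured leakage discussed above: an observer who knows the rough distribution of the underlying values can try to match it against the token counts.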

MongoDB Queryable Encryption is the clearest production example. MongoDB documents equality queries as generally available from version 7.0, and the current manual lists both equality and range queries as supported in production. Prefix, suffix, and substring queries remain in preview as of 8.2 and are specifically not recommended for production. MongoDB also documents real operational costs: encrypted fields configured for queryability increase storage requirements, impact query performance, and reduce observability because encrypted collections redact some logs and diagnostics. An independent USENIX security analysis of MongoDB QE further argued that operational logs could undermine the intended security story and observed that a full public security proof was not available at the time of study. [28]

Order-preserving and order-revealing encryption

OPE and ORE are specialized tools for range predicates, sorting, thresholding, and ORDER BY-like semantics. Stanford’s Applied Crypto Group describes ORE as enabling efficient range queries, sorting, and threshold filtering over encrypted data. CipherStash’s ore.rs is an example of a production-oriented Rust Block-ORE implementation used inside a searchable-encryption platform. These constructions are attractive because they are fast and easy to integrate into database indexes. [29]

But the security compromise is fundamental: these schemes reveal order, either directly in the ciphertext relation or via a comparison interface. That leakage is often enough to support powerful inference attacks when the attacker has auxiliary distribution information or public reference data. The literature on inference attacks against property-preserving encrypted databases is the key warning here. OPE/ORE can absolutely be the right answer for latency-sensitive range search, but only when the leakage is explicitly accepted, documented, and bounded by domain design and access controls. They should not be described as “just like normal encryption, but queryable.” [30]
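The inference risk can be made concrete with a toy example. The sketch below uses a made-up order-preserving mapping (a random strictly increasing table, not a real OPE or ORE scheme and not ore.rs) and a simple rank-matching attack that works when the encrypted column is dense, meaning every value of a small known domain appears at least once. The function names, the 0 to 9 rating domain, and the seed are assumptions for illustration.

```python
import random

def make_toy_ope(domain_size: int, seed: int) -> list:
    """Random strictly increasing table: table[v] is the ciphertext of v."""
    rng = random.Random(seed)
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 1000)  # positive gaps preserve order
        table.append(current)
    return table

def sorting_attack(ciphertexts: list, domain_size: int) -> list:
    """Recover plaintexts from order alone when the column is dense.

    The i-th smallest distinct ciphertext must encrypt the i-th smallest
    domain value, so no key material is needed.
    """
    distinct = sorted(set(ciphertexts))
    if len(distinct) != domain_size:
        raise ValueError("column is not dense; this simple attack does not apply")
    rank = {c: i for i, c in enumerate(distinct)}
    return [rank[c] for c in ciphertexts]

# A column of ratings in the known public domain 0..9.
ratings = [3, 0, 9, 4, 4, 1, 7, 2, 8, 5, 6, 3, 9, 0]
table = make_toy_ope(domain_size=10, seed=2024)
encrypted_column = [table[v] for v in ratings]

recovered = sorting_attack(encrypted_column, domain_size=10)
print(recovered == ratings)
```

Sparse or high-cardinality columns are harder to attack this directly, but attackers with auxiliary distribution data can use frequency and rank statistics in a similar way, which is why the leakage needs to be bounded by domain design and access controls.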

Functional encryption

Functional encryption sits conceptually between encryption and access control. IBM describes FE as encryption that allows learning a selected function of encrypted data or enforcing fine-grained access control; the Fentec libraries make the same idea concrete by exposing FE for linear, inner-product, and quadratic functionalities. In FE, secret keys are bound to functions rather than to full decryption, so analysts learn f(x) and nothing more, at least in the ideal model. [31]

That makes FE intellectually powerful for analytics, but practically narrow today. It is useful for inner products, selected linear algebra, some scoring functions, and special-purpose ML components, and recent work continues to explore scalable FE for federated learning and DP-augmented variants. However, FE is not a broadly deployed platform for general SQL, rich joins, or unrestricted ML training. Its tooling ecosystem is comparatively thin, mainstream cloud support is minimal, and operational key management is demanding because a master authority must issue function keys. The honest conclusion is that FE remains promising but low-maturity for enterprise encrypted analytics outside specialized research or niche high-value workflows. [32]

Differential privacy with encryption

Differential privacy is not itself a method for computing on encrypted data; it is a rigorous method for controlling what can be inferred from released outputs. That is exactly why it combines so well with encrypted analytics. OpenDP frames DP as limiting what can be learned about any individual from the output, while Google’s production write-up on distributed differential privacy for federated learning explains the complementary role of secure aggregation: the server should learn only an aggregate model update, not each user update. [33]

The strongest practical pattern is therefore: protect inputs during collection and computation using encryption, secret sharing, or TEEs; then protect the released statistics or model with DP. Production examples include Google federated learning with secure aggregation and distributed DP, AWS Clean Rooms Differential Privacy, and Prio/DAP-style systems that split or aggregate client reports before release. The DP step is computationally cheap relative to the cryptography; the hard parts are privacy accounting, contribution bounding, sampling assumptions, and utility-loss management. In short, DP answers a problem that cryptography alone does not solve: attacks against the output. [34]
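The release step can be sketched as a Laplace mechanism applied to a bounded sum. This is a minimal illustration under stated assumptions: contribution bounding is modeled as clipping one value per person to a fixed range, neighboring datasets differ by adding or removing one person, and the function names are hypothetical rather than taken from OpenDP or AWS Clean Rooms. It uses Python's general-purpose random module and ordinary floating point, which is adequate for a sketch but not for a hardened deployment.

```python
import random

def clipped_sum(values: list, lower: float, upper: float) -> float:
    """Contribution bounding: clamp every value into [lower, upper] before summing."""
    return sum(min(max(v, lower), upper) for v in values)

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sampled as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_sum(values: list, lower: float, upper: float, epsilon: float,
           rng: random.Random) -> float:
    """Release a sum with epsilon-DP under add-or-remove-one-person neighbors."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # One person can change the clipped sum by at most this much.
    sensitivity = max(abs(lower), abs(upper))
    return clipped_sum(values, lower, upper) + laplace_noise(sensitivity / epsilon, rng)

# The exact sum would come out of the protected environment (MPC, TEE, or HE);
# only the noisy value is released to the analyst.
rng = random.Random()  # assumption: a production system would use a vetted sampler
purchases = [12.0, 250.0, 31.5, 8.0, 99.0]
released = dp_sum(purchases, lower=0.0, upper=100.0, epsilon=1.0, rng=rng)
print(round(released, 2))
```

Smaller epsilon or a wider clipping range means more noise, which is the utility-loss tradeoff mentioned above. Choosing the clipping bounds without looking at the data is part of the privacy accounting problem.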

Hybrid architectures

Hybrid designs are increasingly the production default because they align technique to task. SecretFlow is explicit about this approach: it abstracts MPC, HE, and TEE into one privacy-preserving data-analysis and ML framework. Duality’s AWS case study similarly shows a move beyond one PET alone, using Nitro Enclaves to complement prior FHE, federated learning, and differential-privacy methods. Decentriq’s Azure-linked materials also describe clean-room architectures that combine confidential computing with other privacy technologies, including differential privacy. [35]

The architectural value is straightforward. A hybrid stack can use searchable encryption for narrow lookup, TEE execution for general SQL or model serving, MPC for cross-party joins and aggregation where no single operator should see the plaintext, FHE/PHE for the most sensitive arithmetic subroutines, and DP for anything released outside the trust boundary. That often beats any single primitive on the joint objective of security, functionality, and cost. The downside is also straightforward: security proofs become compositional rather than monolithic, and operational complexity rises sharply because each layer adds new assumptions, observability needs, and failure modes. [36]

Hybrid Architecture Layer Pattern

[Schematic layers, top to bottom: encrypted or secret-shared data, a queryable index layer, and a cross-party MPC or PJC layer ↓ attested execution layer ↓ policy and privacy layer ↓ encrypted result, attested result, or DP release.]

This hybrid pattern is an analytical summary of the production architectures described by SecretFlow, cloud confidential-computing services, searchable-encryption systems, and DP release frameworks. [37]

Comparative tradeoffs

The table below is a qualitative synthesis rather than a literal benchmark. “Security level” refers to how much trust is removed from the execution environment when the stated assumptions hold. [38]

Technique	Security level	Supported analytics	Performance	Development complexity	Best fit	Primary caveat
Partial HE	High cryptographic protection for narrow arithmetic	Counts, sums, weighted sums, secure aggregation	High relative to PET alternatives	Low/Med	Simple outsourced arithmetic	Functionality too narrow for rich queries
Full HE (FHE)	Very high trust reduction for outsourced computation	Aggregation, vector ops, selected SQL-like ops, ML inference	Low to medium; often the slowest option	High	Single-owner outsourced compute	Blow-up, tuning, slow bootstrapping
MPC	Very high within explicit collusion thresholds	Aggregation, joins, PSI/PJC, partitioned ML	Medium; network- and round-bound	High	Cross-org collaboration without trusted hardware	Operational complexity and collusion assumptions
TEE / Confidential	High if hardware, firmware, and attestation assumptions hold	Broadest coverage: SQL, joins, arbitrary code, ML	High; often closest to native	Medium	Lift-and-shift confidential analytics	Side channels, larger TCB, hardware vulns
Searchable Encryption	Medium to high, but leakage-prone by design	Equality search, keyword search, some range/prefix/suffix	High	Medium	Queryable encrypted databases and search	Search/access/frequency leakage
OPE / ORE	Low to medium because order leakage is explicit	Sorting, range filters, thresholding, ORDER BY	Very high	Low/Med	Fast range search when leakage is acceptable	Inference attacks can recover structure
Functional Encryption	High for supported function families	Inner products, selected linear/quadratic analytics	Medium for narrow tasks	High	Fine-grained delegated analytics	Narrow functionality, low ecosystem maturity
DP + Encryption	High against output inference if well tuned	Aggregate analytics, telemetry, federated learning	High for DP; PET dominates cost	Medium	Sharing results safely after processing	Utility/privacy tradeoff and budget accounting
Hybrid Stack	Potentially strongest overall fit	Broadest practical coverage	Medium to high if well partitioned	Very High	Real-world enterprise deployments	Security composition and operational complexity

Deployments, case studies, and vendor landscape

The clearest production maturity today is in confidential-computing and clean-room deployments. Google documents Confidential Space as a TEE for agreed workloads and uses the same foundation in Google Ads confidential matching. Microsoft positions Azure Confidential Computing as protection against cloud-operator access, and Decentriq uses Azure Confidential Computing for enterprise data clean rooms. AWS documents Nitro Enclaves with KMS-integrated attestation, and Duality’s AWS case study describes using Nitro Enclaves to create isolated processing spaces for sensitive-data analysis, including cross-border cancer research. These are strong signals that broad-functionality encrypted analytics is, in practice, currently led by TEE-centric and hybrid architectures. [48]

On the pure cryptography side, the market is real but more selective. IBM maintains HElayers and public FHE materials, and IBM has described an Intesa Sanpaolo deployment using FHE to secure digital-transaction workflows. Duality markets secure data collaboration for healthcare, finance, and government using PETs and open-source FHE. Zama has built an active FHE ecosystem around TFHE-rs, Concrete, and Concrete ML, though its most visible commercial push is currently blockchain/confidential smart-contract infrastructure rather than mainstream SQL analytics. Inpher remains a notable MPC/HE/federated-learning vendor with industry use across healthcare, finance, and IoT. [49]

For queryable encrypted databases, MongoDB Queryable Encryption is the most prominent mainstream example. It moved equality query support into general availability and now supports production equality and range query types, while still documenting storage, performance, and observability tradeoffs. CipherSweet, OpenSSE, Cosmian Findex, and CipherStash represent the adjacent software stack for application-level searchable encryption and encrypted index construction. Those systems are materially easier to adopt than FHE if the workload is dominated by exact/range/search predicates and the leakage profile is acceptable. [50]

For privacy-preserving aggregation and telemetry, Prio and its descendants are among the most credible real-world deployments. Mozilla has publicly described work to deploy Prio-based DAP in Firefox, and Divvi Up describes itself as a production system for aggregate statistics built on Prio3. Google’s federated-learning blog documents secure aggregation combined with distributed differential privacy in production model-training pipelines, while AWS Clean Rooms Differential Privacy is an example of a cloud product that explicitly formalizes privacy-controlled sharing of aggregate results. These are important because they show that encrypted analytics is not only about databases and model serving; it is also about safe measurement and telemetry at scale. [51]

Selection criteria, deployment checklist, and evaluation metrics

The first decision criterion is not vendor or algorithm; it is the trust boundary you are trying to move. If you mostly distrust the cloud operator but trust a hardware root of trust and need rich existing software, start with TEEs. If multiple organizations insist that no single operator may see data, start with MPC or PJC. If one data owner wishes to outsource computation without trusting the server at all, start with FHE or PHE. If the workload is mostly equality/range retrieval in a database, searchable encryption or queryable encryption may be enough. If the risk is not just input secrecy but also sensitive outputs, you need DP on top. [52]
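The trust-boundary rules in the paragraph above can be written down as a small helper, which is sometimes useful for making a team's starting assumptions explicit. This is only a sketch: the Requirements fields, the function name, and especially the order in which the rules are checked are assumptions, since the text does not rank the criteria against each other.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    """Hypothetical summary of a project's trust boundary and workload."""
    no_single_operator_may_see_data: bool = False
    single_owner_untrusted_server: bool = False
    mostly_equality_or_range_lookup: bool = False
    trusts_hardware_root_needs_rich_software: bool = False
    outputs_are_sensitive: bool = False

def starting_points(req: Requirements) -> list:
    """Return candidate techniques to shortlist, not a final architecture."""
    candidates = []
    if req.no_single_operator_may_see_data:
        candidates.append("MPC or PJC")
    if req.single_owner_untrusted_server:
        candidates.append("FHE or PHE")
    if req.mostly_equality_or_range_lookup:
        candidates.append("searchable or queryable encryption")
    if req.trusts_hardware_root_needs_rich_software:
        candidates.append("TEE / confidential computing")
    if req.outputs_are_sensitive:
        candidates.append("differential privacy on released outputs")
    return candidates

# Example: a cross-organization study whose published statistics are sensitive.
study = Requirements(no_single_operator_may_see_data=True, outputs_are_sensitive=True)
print(starting_points(study))
```

More than one rule can fire at once, which is consistent with the observation that production systems tend to combine techniques.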

The second criterion is workload shape. Rich joins, arbitrary UDFs, and model training strongly favor TEEs or hybrid arrangements. Cross-party federated features, overlap analysis, and private record linkage favor MPC/PJC. Low-depth inference, vector similarity, and selected arithmetic pipelines are increasingly feasible in FHE. Equality/range lookup over application data typically favors encrypted indexes. If you decide without pinning down the exact operators, query selectivity, data sizes, cardinalities, and latency SLOs, the project will almost always drift toward either under-securing or over-engineering the workload. [53]

Practical Deployment Stage-Gate Checklist

PET projects fail most often when teams skip explicit adversary modeling or realistic prototyping. [54]

Stage	What to do	Pass condition	Why it matters
Problem framing	Classify data, outputs, parties, and exact operators	You know whether the task is aggregation, search, join, inference, or training	PET choice is workload-specific, not generic
Threat model	Write down adversaries, collusion assumptions, and unacceptable leakages	Named threat model approved by security/legal	Techniques differ mainly in assumptions
Technique shortlist	Map workload to 2–3 candidate architectures	At least one cryptographic option and one operationally efficient option considered	Prevents premature lock-in
Key and identity design	Define key custody, enclave attestation flow, or share-holder governance	Keys or shares are never ad hoc	Most failures are operational, not mathematical
Prototype	Benchmark on representative data and queries at realistic security levels	Meets p95 latency, throughput, and cost guardrails	PET performance is extremely workload-sensitive
Leakage review	Document what metadata, patterns, or outputs remain observable	Explicit acceptance or rejection of leakage profile	Searchable encryption and TEEs especially need this
Privacy release controls	Add DP, quota controls, or query governance if results leave trusted boundary	Output policy defined and testable	Encryption alone does not solve output inference
Red-team / compliance	Test side channels, patching, logging, and legal claims	Findings resolved before rollout	PETs are not a silver bullet under GDPR/FTC/GLBA/HIPAA

The benchmark program should be equally explicit. The FHE Benchmarking Suite is especially useful as a model because it centers latency, throughput, memory, storage expansion, communication complexity, and quality loss. For TEE-based SQL, add enclave-specific metrics such as attestation time, EPC or enclave-paging behavior, cache-miss amplification, and observable overhead under realistic OLAP workloads. For searchable encryption, add index size, query selectivity, token-generation cost, and leakage profile documentation. For DP-based releases, add epsilon, delta, contribution bounding, privacy-budget burn rate, and utility loss. [55]
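Privacy-budget burn rate is easier to track when every release is recorded against a limit. The sketch below uses basic sequential composition, where the epsilons and deltas of successive releases simply add up. That is a conservative accounting rule, and tighter accountants exist. The PrivacyBudget class and its method names are hypothetical and do not correspond to any product's API.

```python
class PrivacyBudget:
    """Tracks cumulative (epsilon, delta) under basic sequential composition."""

    def __init__(self, epsilon_limit: float, delta_limit: float = 0.0):
        self.epsilon_limit = epsilon_limit
        self.delta_limit = delta_limit
        self.spent_epsilon = 0.0
        self.spent_delta = 0.0
        self.log = []  # (label, epsilon, delta) for every approved release

    def spend(self, label: str, epsilon: float, delta: float = 0.0) -> bool:
        """Approve a release only if the summed budget stays within the limits."""
        if epsilon <= 0 or delta < 0:
            raise ValueError("epsilon must be positive and delta non-negative")
        new_epsilon = self.spent_epsilon + epsilon
        new_delta = self.spent_delta + delta
        if new_epsilon > self.epsilon_limit or new_delta > self.delta_limit:
            return False  # refuse the query; the budget would be exceeded
        self.spent_epsilon, self.spent_delta = new_epsilon, new_delta
        self.log.append((label, epsilon, delta))
        return True

    def remaining_epsilon(self) -> float:
        return self.epsilon_limit - self.spent_epsilon

budget = PrivacyBudget(epsilon_limit=1.0)
first = budget.spend("weekly revenue by region", 0.25)
second = budget.spend("conversion histogram", 0.5)
third = budget.spend("ad-hoc drill-down", 0.5)  # would exceed the limit
print(first, second, third, budget.remaining_epsilon())
```

Logging refused queries as well as approved ones would give a direct view of burn rate over time. That extension is left out here to keep the sketch short.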

A good cross-technique benchmark suite usually includes at least five workload families: aggregations on wide tables; private join or PSI-plus-sum on skewed identifiers; search with equality and range predicates; SQL analytics on a TPC-H-like subset with one or two joins; and ML with one classical model and one compact neural model. For each, measure p50/p95 latency, throughput, ciphertext or share expansion, network bytes, RAM/VRAM, accuracy degradation, deployment time, and operator effort. If the solution requires hidden parameter tuning or hand-crafted circuits that your own engineers cannot maintain, treat that as a first-class cost signal, not a footnote. [56]
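Two of the metrics named above, tail latency and ciphertext or share expansion, are simple enough to compute with a shared helper so that every technique is scored the same way. The sketch below uses the nearest-rank percentile definition. The function names and the sample numbers are made up for illustration and are not measurements of any system.

```python
def percentile(samples: list, p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples or not 1 <= p <= 100:
        raise ValueError("need at least one sample and 1 <= p <= 100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceiling of p * n / 100
    return ordered[rank - 1]

def summarize(latencies_ms: list, plaintext_bytes: int, protected_bytes: int) -> dict:
    """Per-workload summary that can be compared across techniques."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "expansion_ratio": protected_bytes / plaintext_bytes,
    }

# Hypothetical numbers for one workload family on one candidate architecture.
summary = summarize(
    latencies_ms=[12.0, 15.0, 11.0, 40.0, 13.0],
    plaintext_bytes=16,
    protected_bytes=4096,
)
print(summary)
```

Reporting p95 next to p50 matters here because the slowest queries are the ones most affected by network rounds, enclave paging, and noise management.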

A concise decision rule is this: use the weakest tool that still closes the threat you actually care about. That usually means searchable encryption for narrowly queryable fields, TEE-based confidential computing for broad low-latency analytics, MPC for multi-organization collaboration without a trusted runtime, FHE when the server itself must be cryptographically untrusted, and DP whenever aggregate outputs leave the protected environment. The most defensible production systems combine two or more of these rather than forcing one primitive to do everything. [57]

Open questions and limitations

Several parts of this field are moving quickly enough that any static report has limits. General-purpose FHE for SQL and large-model training is improving, but the most credible evidence today still points to selective inference and narrow analytics rather than drop-in encrypted datastores for arbitrary workloads. The new benchmarking ecosystem is promising, but it is still young. [58]

Searchable encryption leakage is still an open design fault line. Efficient structured encryption is necessarily defined with explicit leakage, but the operational meaning of “acceptable leakage” remains highly context-specific and is still being quantified by active research. This is probably the most common place where vendor positioning and academic caution diverge. [59]

TEE risk is also not stable. Recent SGX and SEV-SNP results show that confidential computing remains exposed to evolving microarchitectural, firmware, and attestation-chain failures. For that reason, any decision that leans heavily on TEEs should be revisited as hardware generations, cloud attestations, and vendor patch guidance evolve. [25]

Finally, functional encryption remains under-evidenced in production compared with FHE, MPC, searchable encryption, and confidential computing. The theory is powerful and the libraries are real, but recent large-scale deployment evidence is sparse relative to the rest of the field. That should push FE toward targeted pilots, not broad enterprise commitments, unless the function family is unusually well matched to the application. [45]

References

Secure Data Clean rooms By Dataknobs