prashant.dhingra.website
Field Guide · Data & Privacy Engineering · Updated June 2026

Analytics on data you can't see

Encrypted data analytics is an umbrella term for the privacy-enhancing technologies (PETs) that let you compute, search, join, and model over sensitive data while reducing or eliminating ordinary plaintext exposure during processing. No single technique wins. This guide catalogs eight of them, maps each to a threat model, and shows how to choose and combine them.

By Prashant Dhingra · ~26 min read · 8 techniques cataloged
Spectrum, from reduced plaintext exposure (left) to strict cryptographic opacity (right):
  • TEE: attested hardware
  • OPE/ORE: order revealed
  • Searchable encryption: bounded leakage
  • DP: output control
  • FE: f(x) only
  • MPC: split trust
  • PHE: narrow arithmetic
  • FHE: ciphertext math

Key takeaways
  • "Encrypted analytics" is not binary. Systems differ in what they protect against (cloud operators, collaborators, DBAs, output inference, side channels, collusion) and in what they can compute.
  • The choice is threat-model-first, not feature-first — pick the weakest tool that still closes the threat you actually care about.
  • TEEs give the broadest functionality at near-native speed; MPC suits multi-party analytics; FHE removes the most server trust; searchable encryption and OPE/ORE trade leakage for query speed; FE is narrow; DP guards the outputs.
  • The dominant 2026 production pattern is hybridization — confidential computing for general execution, cryptographic sub-protocols for the sensitive steps, searchable encryption for queryability, and DP for release.
  • PETs are complementary safeguards, not compliance exemptions — pseudonymised data is still personal data under GDPR.

01 · Scope: Definitions and scope

In the strict cryptographic sense, encrypted analytics covers homomorphic encryption, MPC, structured/searchable encryption, private set operations, and functional encryption. In industry practice the umbrella also includes trusted execution environments and confidential computing.

Encrypted data analytics

Any architecture that allows useful analytics, search, joining, or ML over sensitive data while preventing the processing environment from having ordinary, unrestricted plaintext visibility — via computation on ciphertext or secret shares, queries over protected indexes, hardware-attested execution, or DP-controlled output release.

A useful distinction is between strict cryptographic opacity and reduced plaintext exposure. FHE, MPC, PIR/PSI, and most functional-encryption constructions aim for the former. TEEs aim for the latter: data becomes plaintext inside an attested enclave or confidential VM, but the operator, hypervisor, and surrounding platform are meant to stay outside the trust boundary. Crucially, ordinary encryption at rest or in transit alone does not qualify — the analytic engine still processes plaintext in its normal memory.
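
The "reduced plaintext exposure" idea can be made concrete with a small sketch of attestation-gated key release: a key service hands the data key only to a workload whose reported measurement matches an approved value. Everything named here is hypothetical (`HARDWARE_KEY`, `APPROVED_MEASUREMENT`, `make_report`, `release_key`), and an HMAC stands in for the asymmetric signature chain a real TEE vendor would provide, so treat it as an illustration of the control flow rather than any platform's actual API.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Assumption: a shared secret stands in for the hardware vendor's signing root.
HARDWARE_KEY = secrets.token_bytes(32)
# Assumption: hash of the approved workload image (hypothetical value).
APPROVED_MEASUREMENT = hashlib.sha256(b"analytics-workload-v1").hexdigest()


def make_report(workload_image: bytes) -> dict:
    """Simulate the hardware measuring a workload and signing the result."""
    measurement = hashlib.sha256(workload_image).hexdigest()
    signature = hmac.new(HARDWARE_KEY, measurement.encode(), hashlib.sha256).hexdigest()
    return {"measurement": measurement, "signature": signature}


def release_key(report: dict, data_key: bytes) -> Optional[bytes]:
    """Release the data key only for a genuine report of the approved workload."""
    expected = hmac.new(
        HARDWARE_KEY, report["measurement"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(report["signature"], expected):
        return None  # report was not produced by the (simulated) hardware
    if not hmac.compare_digest(report["measurement"], APPROVED_MEASUREMENT):
        return None  # genuine hardware, but not the approved workload
    return data_key


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    print(release_key(make_report(b"analytics-workload-v1"), key) == key)
    print(release_key(make_report(b"modified-workload"), key))
```

The operator never appears in the decision: only the measurement and its signature do. That is also why a hardware or attestation flaw undermines the whole arrangement.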

NIST's Privacy-Enhancing Cryptography project gives a stable foundation: MPC lets distrustful parties compute on private inputs; FHE evaluates functions on ciphertexts; PIR retrieves an item without revealing the query; structured encryption enables private queries over encrypted data structures; and functional encryption is listed among related tools. The "analytics" in scope is broad — aggregation, ML inference and (where feasible) training, SQL-like filtering and selected joins, search over encrypted data, and cross-party linkage such as private join and compute.

02 · Threats & Law: Threat models and regulatory constraints

The first question is who the adversary is. Common models include an honest-but-curious cloud provider; a malicious operator or hypervisor; colluding input parties in an MPC workflow; a database administrator who can dump storage and logs; a side-channel attacker observing memory, page faults, or microarchitectural effects; and an analyst who sees only aggregate outputs but attempts membership or reconstruction attacks. Different PETs address very different subsets of that list.

  • Cryptographic techniques rest on lattice or number-theoretic hardness and on how much leakage the scheme permits by design.
  • MPC hinges on the corruption model and collusion threshold — passive vs malicious, honest- vs dishonest-majority, and behavior under abort.
  • TEEs have a narrower, operational model: only an attested workload may access keys or plaintext. Hardware or firmware bugs, side channels, and vendor supply-chain compromise fall outside what that model protects against.
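
The collusion threshold in the MPC bullet is easiest to see with additive secret sharing. The sketch below assumes a passive (honest-but-curious) setting with three compute parties. Any two of them hold only uniformly random-looking values, while all three together can reconstruct the sum. `MODULUS`, `share`, `reconstruct`, and `secure_sum` are illustrative names, not the API of any framework listed in this guide, and nothing here defends against malicious parties or aborts.

```python
import secrets
from typing import List

# Public modulus; it only needs to exceed the largest possible true sum.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> List[int]:
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: List[int]) -> int:
    """Recombine shares; requires every party's contribution."""
    return sum(shares) % MODULUS


def secure_sum(inputs: List[int], n_parties: int = 3) -> int:
    """Sum private inputs so that no single party sees any input."""
    held = [[] for _ in range(n_parties)]
    for value in inputs:
        # Each input owner sends the j-th share to compute party j.
        for j, s in enumerate(share(value, n_parties)):
            held[j].append(s)
    # Each party publishes only the sum of the shares it holds.
    partial_sums = [sum(h) % MODULUS for h in held]
    return reconstruct(partial_sums)


if __name__ == "__main__":
    print(secure_sum([120, 45, 300]))
```

If all three parties collude they can recover every input, which is exactly the "security collapses if too many parties collude" caveat.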

Regulation is nuanced. Under GDPR, pseudonymisation is still processing of personal data, not automatic anonymization; Article 25 requires data protection by design and default, and Article 32 cites pseudonymisation and encryption as example measures. The ICO's PET guidance is explicit that PETs support data minimisation and security but are not a silver bullet, and still require lawful, fair, transparent processing plus a case-by-case DPIA. U.S. sectoral regimes echo this: HIPAA points to NIST controls, the FTC stresses understanding data flows and avoiding deceptive privacy claims, and the GLBA Safeguards Rule is explicitly risk-based.

The recurring lesson

Encrypted analytics helps with risk reduction, processor minimization, breach resilience, and cross-organizational sharing — but it usually does not eliminate obligations around purpose limitation, transparency, data-subject rights, retention, or transfers, especially when a controller can still decrypt outputs or relink results to individuals.

03 · Catalog: The technique catalog

Eight families, each strongest at a different job. Each one appears on the spectrum strip at the top of the page.

01 · Partial HE (PHE)
additive homomorphism · low functionality
Computes: Counts, sums, weighted sums, private billing, secure aggregation; often embedded inside larger protocols.
Caveat: No arbitrary comparisons, joins, or general SQL without combining with other primitives.
Examples: python-paillier · LightPHE

02 · Fully HE (FHE)
ciphertext computation · strongest server-trust reduction
Computes: Aggregation, vector ops, similarity search, low-depth arithmetic, selected ML inference; multiparty variants extend toward collaboration.
Caveat: Slow; bootstrapping bottleneck; general SQL, joins, and training remain mostly research/prototype.
Examples: OpenFHE · SEAL · HElib · Lattigo · TFHE-rs · Concrete ML

03 · Secure MPC
distributed trust · multi-party
Computes: Aggregations, histograms, PSI, private joins, overlap-and-sum, federated analytics, partitioned ML training/inference.
Caveat: Network- and round-bound; complex to operate; security collapses if too many parties collude.
Examples: MP-SPDZ · MPyC · MOTION · EMP · ABY3 · SecretFlow

04 · TEE / Confidential computing
attested hardware · broadest functionality
Computes: SQL, joins, arbitrary code, conventional databases, ML training/inference; often near-native speed (DuckDB-SGX2: <2× on TPC-H SF30).
Caveat: Larger TCB; side channels (SGX.Fail, SEV-SNP "Fabricked" 2026); needs patch discipline + attestation-bound keys.
Examples: SGX · Nitro Enclaves · SEV-SNP · TDX · Confidential Space · Gramine

05 · Searchable encryption
encrypted indexes · queryable
Computes: Equality search, keyword/document retrieval, selected range/prefix/suffix queries over protected indexes.
Caveat: Leaks some of: search/access pattern, frequency, result size, structure; exploitable by leakage-abuse attacks.
Examples: MongoDB Queryable Encryption · OpenSSE · CipherSweet · Cosmian Findex · CipherStash

06 · OPE / ORE
order-revealing · range-native
Computes: Sorting, range filters, thresholding, ORDER BY-like semantics; very fast, easy to index.
Caveat: Reveals order by design; inference attacks can recover substantial plaintext given auxiliary data.
Examples: CipherStash ore.rs · Block-ORE constructions

07 · Functional encryption
keys bound to functions · f(x) only
Computes: Inner products, selected linear/quadratic functions, some scoring and ML components; analysts learn f(x), nothing more.
Caveat: Narrow functionality, thin tooling, demanding key authority; promising but low-maturity for enterprise.
Examples: Fentec libraries · research prototypes

08 · DP + encrypted execution
output control · pairs with any PET
Computes: Not a compute method; it bounds what the released statistics or model can reveal about any individual. Pairs with secure aggregation.
Caveat: Hard parts are privacy accounting, contribution bounding, sampling assumptions, and utility loss.
Examples: OpenDP · Google secure aggregation · AWS Clean Rooms DP · Prio/DAP
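
To make the first card concrete, here is a toy Paillier cryptosystem in plain Python showing the additive homomorphism behind encrypted sums and weighted sums. It is a teaching sketch under loud assumptions: the primes are two small, publicly known Mersenne primes, so the key offers no real security, and the function names (`encrypt`, `decrypt`, `add`, `scale`) are illustrative and are not the API of python-paillier or LightPHE. A production system would use one of those libraries with properly generated keys.

```python
import math
import secrets

# Toy key material: known Mersenne primes, far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                      # public key
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)        # private: inverse of LAMBDA mod N


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N with generator g = N + 1 and fresh randomness."""
    while True:
        r = secrets.randbelow(N)
        if r > 0 and math.gcd(r, N) == 1:
            break
    # With g = N + 1, g^m mod N^2 equals 1 + m*N.
    return ((1 + m * N) % N_SQ) * pow(r, N, N_SQ) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext using the private values LAMBDA and MU."""
    u = pow(c, LAMBDA, N_SQ)
    return ((u - 1) // N) * MU % N


def add(c1: int, c2: int) -> int:
    """Ciphertext of m1 + m2: multiply the ciphertexts."""
    return c1 * c2 % N_SQ


def scale(c: int, k: int) -> int:
    """Ciphertext of k * m for a public constant k: exponentiate."""
    return pow(c, k, N_SQ)


if __name__ == "__main__":
    salaries = [52000, 61000, 48000]
    total = encrypt(0)
    for s in salaries:
        total = add(total, encrypt(s))   # the server never sees a salary
    print(decrypt(total))
```

Only `add` and `scale` are available on ciphertexts, which is the card's caveat in code form: there is no way to compare, join, or multiply two encrypted values with this scheme alone.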

04 · Tradeoffs: Comparative tradeoffs

A qualitative synthesis, not a literal benchmark. "Security level" means how much trust is removed from the execution environment when the stated assumptions hold.

| Technique | Security level | Supported analytics | Performance | Best fit | Primary caveat |
|---|---|---|---|---|---|
| Partial HE | High for narrow arithmetic | Counts, sums, weighted sums, aggregation | High (no bootstrapping) | Simple outsourced arithmetic | Functionality too narrow for rich queries |
| Full HE | Very high trust reduction | Aggregation, vector ops, similarity, ML inference | Low–medium; often slowest | Single-owner outsourced compute | Ciphertext blow-up, slow bootstrapping, limited SQL/training |
| MPC | Very high within collusion thresholds | Aggregation, joins, PSI/PJC, partitioned ML | Medium; network/round-bound | Cross-org collaboration, no trusted hardware | Operational complexity & collusion assumptions |
| TEE / confidential computing | High if HW/firmware/attestation hold | Broadest: SQL, joins, arbitrary code, ML | High; often closest to native | Lift-and-shift confidential analytics | Side channels, larger TCB, HW vulnerabilities |
| Searchable encryption | Medium–high, leakage-prone | Equality, keyword, some range/prefix/suffix | High | Queryable encrypted databases | Search/access/frequency leakage |
| OPE / ORE | Low–medium (order leaks) | Sorting, range filters, thresholding | Very high | Fast range search when leakage acceptable | Inference attacks recover plaintext structure |
| Functional encryption | High for supported functions | Inner products, selected linear/quadratic | Medium for narrow tasks | Fine-grained delegated analytics | Narrow functionality, low ecosystem maturity |
| DP + encrypted execution | High vs output inference | Aggregates, telemetry, federated learning | High for DP step | Safe result release after protected compute | Utility/privacy tradeoff & budget accounting |
| Hybrid | Potentially strongest overall | Broadest practical coverage | Medium–high if well partitioned | Real-world enterprise deployments | Compositional proofs & operating complexity |


05 · Hybrid: Hybrid architectures

Hybrids are increasingly the production default because they align technique to task. SecretFlow abstracts MPC, HE, and TEE into one framework; Duality's AWS work uses Nitro Enclaves alongside prior FHE, federated learning, and DP; Decentriq combines Azure confidential computing with other privacy technologies including DP. A well-partitioned stack looks like this:

  • Searchable encryption: narrow lookup
  • TEE: general SQL / serving
  • MPC: cross-party joins
  • FHE/PHE: sensitive arithmetic
  • DP: anything released

The value is straightforward: this often beats any single primitive on the joint objective of security, functionality, and cost. The downside is equally straightforward — security proofs become compositional rather than monolithic, and operational complexity rises sharply because each layer adds new assumptions, observability needs, and failure modes.

06 · Deployments: Deployments and vendor landscape

The clearest production maturity today is in confidential-computing and clean-room deployments — broad-functionality encrypted analytics is, in practice, currently led by TEE-centric and hybrid architectures.

Confidential computing & clean rooms

Google Confidential Space (and Google Ads confidential matching), Azure Confidential Computing with Decentriq clean rooms, and AWS Nitro Enclaves with Duality (including cross-border cancer research).

Pure cryptography (FHE/MPC)

IBM HElayers (an Intesa Sanpaolo digital-transaction deployment), Duality for healthcare/finance/government, Zama (TFHE-rs, Concrete, Concrete ML), and Inpher for MPC/HE/federated learning.

Queryable encrypted databases

MongoDB Queryable Encryption — the most prominent mainstream example — plus CipherSweet, OpenSSE, Cosmian Findex, and CipherStash for application-level searchable encryption.

Private aggregation & telemetry

Prio and descendants: Mozilla's Prio-based DAP in Firefox and Divvi Up (Prio3), plus Google federated learning with secure aggregation and AWS Clean Rooms Differential Privacy.

On the cryptography side the market is real but more selective. MongoDB Queryable Encryption is the clearest queryable-database example: equality and range queries are production-supported, while prefix, suffix, and substring queries remain in public preview in 8.2 (GA targeted for 2026). MongoDB also documents the real costs of queryability: extra storage, query-performance impact, and reduced observability because encrypted collections redact some logs.

07 · Signals: 2026 signals


What's moving right now

  • Confidential computing went to the GPU. NVIDIA Confidential Computing across Hopper and Blackwell GPUs (encrypted VRAM, attestable alongside a CPU TEE) makes TEE-based private AI inference and training practical — a major boost for the broad-functionality end of this spectrum.
  • TEE risk kept evolving. The 2024 SGX.Fail systematization and the 2026 "Fabricked" SEV-SNP result (arbitrary read/write and forged attestation under a routing-misconfiguration attack, acknowledged by an AMD bulletin) confirm that TEE security must be revisited as hardware, attestation, and patch guidance change.
  • FHE commercialized and accelerated. Zama became the first FHE unicorn (2025) and targets 500–1,000 TPS via GPU; the FHE Benchmarking Suite matured into a standard way to compare latency, throughput, memory, storage blow-up, communication, and accuracy loss.
  • Queryable encryption broadened. MongoDB QE added production range queries and moved prefix/suffix/substring into public preview (8.2), with GA expected in 2026 — narrowing the gap between "encrypted at rest" and "queryable in use."
  • Private telemetry scaled. Prio/DAP deployments (Firefox, Divvi Up) show encrypted analytics is not only about databases and model serving, but also safe measurement at population scale.

08 · Choose: Selection criteria & deployment checklist

The first decision criterion is not vendor or algorithm — it's the trust boundary you are trying to move. The second is workload shape. A concise decision rule: use the weakest tool that still closes the threat you actually care about, and combine two or more rather than forcing one primitive to do everything.

  • Distrust the cloud operator but trust a hardware root and need rich existing software → TEEs.
  • Multiple organizations, no single operator may see data → MPC / PJC.
  • One owner outsourcing computation, server fully untrusted → FHE / PHE.
  • Mostly equality/range retrieval in a database → searchable / queryable encryption.
  • Sensitive outputs, not just inputs → add differential privacy.
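
The rules above can be written down as a small shortlist function, which is handy as a review artifact during the technique-shortlist stage. This is a sketch of the guide's decision rule only: the `Workload` fields and the `shortlist` function are hypothetical names chosen for illustration, and real selection still needs the threat-model and leakage reviews described in the checklist.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    """Hypothetical description of a workload and its trust assumptions."""
    multi_party: bool             # several organizations, no single trusted operator
    trusts_hardware_root: bool    # an attested hardware root of trust is acceptable
    needs_rich_software: bool     # existing SQL engines or arbitrary code must run
    server_fully_untrusted: bool  # one owner outsourcing to an untrusted server
    mostly_lookup: bool           # mostly equality/range retrieval in a database
    sensitive_outputs: bool       # released results could reveal individuals


def shortlist(w: Workload) -> List[str]:
    """Map a workload to candidate techniques, mirroring the bullets above."""
    picks: List[str] = []
    if w.trusts_hardware_root and w.needs_rich_software:
        picks.append("TEE")
    if w.multi_party:
        picks.append("MPC / PJC")
    if w.server_fully_untrusted and not w.multi_party:
        picks.append("FHE / PHE")
    if w.mostly_lookup:
        picks.append("searchable / queryable encryption")
    if w.sensitive_outputs:
        picks.append("differential privacy")
    return picks


if __name__ == "__main__":
    joint_study = Workload(
        multi_party=True, trusts_hardware_root=False, needs_rich_software=False,
        server_fully_untrusted=False, mostly_lookup=False, sensitive_outputs=True,
    )
    print(shortlist(joint_study))
```

Returning a list rather than a single answer reflects the advice to combine techniques instead of forcing one primitive to do everything.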

Stage-gate deployment checklist

| Stage | What to do | Pass condition |
|---|---|---|
| Problem framing | Classify data, outputs, parties, and exact operators | You know if it's aggregation, search, join, inference, or training |
| Threat model | Write down adversaries, collusion assumptions, unacceptable leakages | Named threat model approved by security/legal |
| Technique shortlist | Map workload to 2–3 candidate architectures | ≥1 cryptographic and ≥1 operationally efficient option considered |
| Key & identity design | Define key custody, attestation flow, or share-holder governance | Keys/shares are never ad hoc |
| Prototype | Benchmark on representative data at realistic security levels | Meets p95 latency, throughput, and cost guardrails |
| Leakage review | Document observable metadata, patterns, or outputs | Explicit acceptance or rejection of the leakage profile |
| Release controls | Add DP, quotas, or query governance if results leave the boundary | Output policy defined and testable |
| Red-team & compliance | Test side channels, patching, logging, legal claims | Findings resolved before rollout |


09 · Metrics: What to actually measure

The benchmark program should be explicit. The FHE Benchmarking Suite is a good model, centering latency, throughput, memory, storage expansion, communication complexity, and quality loss. Extend it per technique:

  • TEE-based SQL — attestation time, EPC/enclave-paging behavior, cache-miss amplification, observable overhead under realistic OLAP.
  • Searchable encryption — index size, query selectivity, token-generation cost, and a documented leakage profile.
  • DP-based releases — epsilon, delta, contribution bounding, privacy-budget burn rate, and utility loss.
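
For the DP metrics in the last bullet, a minimal sketch of a bounded-contribution sum with Laplace noise and a running budget shows where epsilon, contribution bounding, and burn rate come from. It assumes pure epsilon-DP (delta is zero), at most one value per individual, add/remove-one neighboring datasets, and basic sequential composition. `BudgetAccountant`, `dp_sum`, and the other names are hypothetical. The floating-point sampler is for illustration only, so a vetted library such as OpenDP is the safer choice for real releases.

```python
import random
from typing import List


def clip(values: List[float], lower: float, upper: float) -> List[float]:
    """Contribution bounding: force every value into [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class BudgetAccountant:
    """Tracks spent epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float) -> None:
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


def dp_sum(values, lower, upper, epsilon, accountant, rng):
    """Release a noisy sum; refuses if the budget cannot cover epsilon."""
    accountant.charge(epsilon)
    bounded = clip(values, lower, upper)
    # Adding or removing one person changes the sum by at most this much.
    sensitivity = max(abs(lower), abs(upper))
    return sum(bounded) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()  # pass random.SystemRandom() for OS-sourced randomness
    accountant = BudgetAccountant(total_epsilon=1.0)
    print(dp_sum([10, 20, 5000], 0, 100, 0.5, accountant, rng))
    print("epsilon spent:", accountant.spent)
```

Utility loss can then be measured directly as the gap between the noisy release and the clipped true sum, averaged over repeated runs on representative data.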

A good cross-technique suite includes at least five workload families: aggregations on wide tables; private join / PSI-plus-sum on skewed identifiers; search with equality and range predicates; SQL analytics on a TPC-H-like subset with one or two joins; and ML with one classical and one compact neural model. For each, measure p50/p95 latency, throughput, ciphertext/share expansion, network bytes, RAM/VRAM, accuracy degradation, deployment time, and operator effort. If a solution needs hidden parameter tuning or hand-crafted circuits your own engineers can't maintain, treat that as a first-class cost signal, not a footnote.
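
A measurement harness for a few of these metrics does not need to be elaborate. The sketch below assumes each candidate system is wrapped in a zero-argument callable and reports p50/p95 latency, throughput, and ciphertext or share expansion. The helper names (`percentile`, `benchmark`, `expansion_ratio`) are illustrative, and the nearest-rank percentile is one reasonable convention among several.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], q: float) -> float:
    """Nearest-rank percentile: smallest sample with at least q% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def benchmark(operation: Callable[[], object], runs: int = 50) -> Dict[str, float]:
    """Time repeated calls and summarize latency and throughput."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "throughput_ops_s": runs / total if total > 0 else float("inf"),
    }


def expansion_ratio(plaintext_bytes: int, protected_bytes: int) -> float:
    """Storage or wire blow-up of ciphertexts or shares versus plaintext."""
    return protected_bytes / plaintext_bytes


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the system under test.
    print(benchmark(lambda: sum(range(100_000)), runs=20))
    print(expansion_ratio(plaintext_bytes=8, protected_bytes=512))
```

Running the same harness against a plaintext baseline gives the overhead factor that the prototype stage of the checklist asks for.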

10 · Limits: Open questions and limitations

Several parts of this field move quickly enough that any static guide has limits. General-purpose FHE for SQL and large-model training is improving, but the most credible evidence still points to selective inference and narrow analytics rather than drop-in encrypted datastores. Searchable-encryption leakage remains an open design fault line — "acceptable leakage" is highly context-specific and still being quantified, and this is where vendor positioning and academic caution most often diverge. TEE risk is not stable, as recent SGX and SEV-SNP results show. And functional encryption remains under-evidenced in production relative to the rest of the field, which argues for targeted pilots rather than broad enterprise commitments unless the function family is unusually well matched.
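
Why frequency leakage matters is easy to demonstrate on a deliberately naive scheme. The sketch below assumes an equality-searchable column protected with deterministic keyed tokens, plus an attacker who holds public statistics about how common each value is. The column values, the auxiliary counts, and the `token` and `frequency_attack` helpers are all hypothetical. This is not how any product named in this guide works, and it shows only the simplest rank-matching form of a leakage-abuse attack.

```python
import hashlib
import hmac
from collections import Counter
from typing import Dict, List


def token(key: bytes, value: str) -> str:
    """Deterministic search token: equal plaintexts give equal tokens."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def frequency_attack(tokens: List[str], auxiliary: Dict[str, int]) -> Dict[str, str]:
    """Guess plaintexts by matching token frequency rank to auxiliary rank."""
    observed = [t for t, _ in Counter(tokens).most_common()]
    known = [v for v, _ in Counter(auxiliary).most_common()]
    return dict(zip(observed, known))


if __name__ == "__main__":
    key = b"k" * 32  # the attacker never learns this key
    column = ["flu"] * 50 + ["asthma"] * 30 + ["rare-x"] * 5
    stored = [token(key, v) for v in column]

    # Hypothetical public statistics with the same ordering of frequencies.
    aux = {"flu": 5000, "asthma": 2900, "rare-x": 400}
    guesses = frequency_attack(stored, aux)
    correct = sum(guesses[token(key, v)] == v for v in set(column))
    print(f"{correct} of {len(set(column))} values recovered without the key")
```

The attack needs no cryptanalysis at all, which is why a leakage profile has to be judged against the auxiliary data an adversary could plausibly hold.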

11 · FAQ: Frequently asked questions

What is encrypted data analytics?
An umbrella for privacy-enhancing technologies that let you compute over sensitive data while reducing or eliminating ordinary plaintext exposure during processing — homomorphic encryption, MPC, searchable/structured encryption, private set operations, functional encryption, and (in practice) trusted execution environments. Plain encryption at rest or in transit does not qualify, because the analytic engine still processes plaintext in normal memory.
Which PET should I choose for analytics?
It's threat-model-first. TEEs for broad, low-latency analytics when you trust a hardware root; MPC when multiple organizations must analyze jointly without a trusted operator; FHE when one owner outsources and the server must be cryptographically untrusted; searchable/queryable encryption for equality and range lookups; OPE/ORE for fast range search when order leakage is acceptable; functional encryption for narrow delegated functions; and DP on top whenever aggregate outputs leave the trusted boundary. Use the weakest tool that still closes the threat you care about.
Does encrypted analytics make data exempt from GDPR or HIPAA?
No. Pseudonymisation is still processing of personal data under GDPR, and PETs are complementary safeguards, not a silver bullet. Article 25 requires data protection by design; Article 32 cites pseudonymisation and encryption as example measures. In the U.S., HIPAA points to NIST controls, the FTC stresses understanding data flows and avoiding deceptive privacy claims, and the GLBA Safeguards Rule is explicitly risk-based. Encrypted analytics aids risk reduction and sharing but doesn't remove obligations around purpose limitation, transparency, data-subject rights, or output governance.
Is FHE practical for analytics in 2026?
It's most mature for selected aggregation, vector ops, similarity search, low-depth arithmetic, and ML inference — not general OLAP. Broad SQL, joins, and large-model training over FHE remain mostly research or prototype. Bootstrapping is still a dominant bottleneck, so the best uses are single-owner outsourced computation and hybrid pipelines where FHE protects only the most sensitive steps.
What's the risk with searchable and order-preserving encryption?
They trade leakage for speed. Efficient searchable encryption is defined in terms of allowed leakage — search/access pattern, frequency, result size, structure — which leakage-abuse attacks can exploit. OPE/ORE explicitly reveal order, enabling inference attacks given auxiliary distribution data. Both can be right for equality/range/search, but only when the leakage profile is documented, accepted, and bounded by access controls.
What is the most common production pattern?
Hybridization: a confidential-computing layer for general execution, cryptographic sub-protocols (MPC/FHE) for the most sensitive steps, searchable encryption for narrow queryability, and differential privacy for result release. Frameworks and products reflecting this include SecretFlow, Duality on AWS Nitro Enclaves, Decentriq on Azure, MongoDB Queryable Encryption, AWS Clean Rooms DP, and Prio/DAP systems.

12 · Sources: Primary sources

Synthesized from standards bodies, regulators, vendor documentation, peer-reviewed research, and 2024–2026 systems work.