Privacy by Design: Building Apps That Don't Track
Privacy policies are promises. Architecture is proof.
You can write a privacy policy that says you don't track users while running Google Analytics, Facebook Pixel, and HubSpot on every page. You can't architect a system that structurally prevents data collection while simultaneously collecting data.
Privacy by Design is the principle that privacy protection should be built into the foundation of systems, not bolted on after construction. When implemented properly, the system can't violate user privacy because the architecture makes it physically impossible.
The Seven Privacy by Design Principles
Ann Cavoukian's framework, developed in the 1990s and now embedded in GDPR Article 25, defines seven principles. Here's what each one means in practice for software engineers.
1. Proactive, Not Reactive
Don't wait for a breach to think about privacy. Build privacy considerations into the requirements phase, not the compliance review phase.
In practice: Before adding any feature that touches user data, ask three questions: What data does this require? Is there a way to achieve the same functionality with less data? What happens to the data when it's no longer needed? If you can't answer all three, the feature isn't ready for development.
2. Privacy as the Default Setting
The default configuration should be the most private option. Users who never touch settings should have maximum privacy protection.
In practice: Analytics opt-in, not opt-out. Minimal data collection on registration (email and password, not name, phone, address, company). Profile fields are optional, not required. Sharing features default to private.
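What "private by default" looks like at the code level: a settings object whose zero-configuration state is the most protective one. The field names below are illustrative, not from any particular framework.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserSettings:
    # Every default is the most private option; users opt *in* to more.
    analytics_enabled: bool = False      # opt-in, never opt-out
    profile_public: bool = False         # sharing defaults to private
    crash_reports_enabled: bool = False
    display_name: Optional[str] = None   # optional, never required

# A user who never opens the settings screen gets maximum privacy.
settings = UserSettings()
```

The test of this principle is the constructor call with no arguments: if that user is fully protected, the default is right.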
3. Privacy Embedded into Design
Privacy isn't a feature — it's an architectural property. It should be as fundamental as the choice of database or programming language.
In practice: Choose technologies that support privacy structurally. End-to-end encryption instead of server-side encryption (you can't access what you can't decrypt). Client-side processing instead of server-side processing where possible. Federated learning instead of centralized data collection.
4. Full Functionality — Positive-Sum, Not Zero-Sum
Privacy shouldn't come at the cost of functionality. "We need to track users for the feature to work" is usually a failure of imagination, not a technical constraint.
In practice: Server-side analytics provide business metrics without client-side tracking. Passkey authentication is more secure AND more convenient than passwords. Privacy-first architectures perform better because they load fewer third-party scripts.
5. End-to-End Security — Full Lifecycle Protection
Data should be protected from collection through storage through deletion. Not just encrypted at rest and in transit, but protected through its entire lifecycle.
In practice: Encrypt data at rest (AES-256), in transit (TLS 1.3), and ideally end-to-end (user holds the keys). Define retention periods at the data model level. Implement automated deletion that runs on schedule, not on request.
6. Visibility and Transparency
Users should be able to verify that the system works as described. Open-source code, public audits, and transparent data practices.
In practice: Open-source your privacy-critical components. Publish audit results. Provide users with data export and data deletion tools that actually work (not "submit a request and wait 30 days").
7. Respect for User Privacy
The user's interests come first. Every design decision should be evaluated from the user's perspective, not the company's perspective.
In practice: Never dark-pattern users into sharing more data. Never make privacy-preserving options harder to find or use. Never collect data "because we might need it later."
Architecture Patterns
End-to-End Encryption
The gold standard for data privacy. The server stores encrypted data that only the user can decrypt. Even if the server is compromised, user data remains protected.
Implementation approach for a notes application:
```typescript
// Client-side: encrypt before sending to server
async function encryptNote(content: string, userKey: CryptoKey) {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const encoded = new TextEncoder().encode(content);
  const encrypted = await crypto.subtle.encrypt(
    { name: 'AES-GCM', iv },
    userKey,
    encoded
  );
  return {
    ciphertext: btoa(String.fromCharCode(...new Uint8Array(encrypted))),
    iv: btoa(String.fromCharCode(...iv))
  };
}
// Server stores ciphertext + iv
// Server CANNOT read the note content
```
Tradeoff: Server-side search, indexing, and processing are impossible on encrypted data. You need client-side search (slower, limited) or specialized solutions like homomorphic encryption (computationally expensive) or searchable encryption (limited query types).
Best for: Messaging, notes, passwords, documents, health data — any content where the user's expectation is that only they can read it.
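The client-side search mentioned in the tradeoff above can be as simple as an in-memory inverted index, built over plaintext the client has already decrypted locally. A minimal sketch (`notes` stands in for the user's decrypted notes; nothing here touches the server):

```python
from collections import defaultdict

def build_index(notes):
    """Inverted index: word -> set of note ids. Built entirely on the
    client over locally decrypted plaintext; nothing leaves the device."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for word in text.lower().split():
            index[word].add(note_id)
    return index

def search(index, term):
    return index.get(term.lower(), set())

# Stand-in for already-decrypted note content
notes = {"n1": "grocery list milk eggs", "n2": "meeting notes milk the deadline"}
idx = build_index(notes)
```

This is the "slower, limited" option from the tradeoff: it works well for a few thousand notes but must rebuild or incrementally update the index on every sync.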
Zero-Knowledge Architecture
The server verifies claims about data without seeing the data itself. Zero-knowledge proofs allow the server to confirm "this user knows the password" without ever receiving the password.
Practical application: authentication.
```
// Instead of: client sends password → server checks hash
// Zero-knowledge: client proves knowledge of password without revealing it
// Server stores: salt, verifier (derived from password)
// Client computes: proof using password + salt + server challenge
// Server verifies: proof is valid without learning the password
// SRP (Secure Remote Password) protocol implements this
```
Best for: Authentication systems, identity verification, any scenario where the server needs to verify a claim without learning the underlying data.
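Real SRP runs over modular arithmetic in a large prime group and resists verifier theft. As a deliberately simplified stand-in, the toy challenge-response below shows only the *shape* of verifier-based authentication (password never transmitted, server stores a derived verifier). It is not SRP and, unlike SRP, a stolen verifier here would let an attacker authenticate.

```python
import hashlib
import hmac
import os

def derive_verifier(password: str, salt: bytes) -> bytes:
    # Registration: client derives the verifier once and sends it;
    # the server stores (salt, verifier) and never sees the password.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# --- login ---
salt = os.urandom(16)
stored_verifier = derive_verifier("correct horse", salt)  # held by the server

challenge = os.urandom(32)  # server -> client, fresh per login

# Client re-derives the verifier locally and proves knowledge of it;
# the password itself never crosses the wire.
proof = hmac.new(derive_verifier("correct horse", salt),
                 challenge, hashlib.sha256).digest()

expected = hmac.new(stored_verifier, challenge, hashlib.sha256).digest()
authenticated = hmac.compare_digest(proof, expected)
```

For production, use a vetted SRP or OPAQUE implementation rather than hand-rolling this flow.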
Differential Privacy
Add calibrated noise to aggregated data so individual records can't be reconstructed from the aggregate. Used by Apple and Google for telemetry.
```python
import numpy as np

def add_differential_privacy(value, epsilon=1.0, sensitivity=1.0):
    noise = np.random.laplace(0, sensitivity / epsilon)
    return value + noise

# Example: report page views with privacy
actual_views = 1523
private_views = add_differential_privacy(actual_views, epsilon=0.5)
# Returns ~1523 ± noise, individual page views not recoverable
```
Best for: Analytics, surveys, telemetry — anywhere you need aggregate statistics but don't need individual-level data.
Local-First Architecture
Process data on the user's device. The server is a sync layer, not a processing layer. The user's device holds the source of truth; the server holds an encrypted copy for sync and backup.
Best for: Note-taking apps, productivity tools, personal finance — any app where the user generates and primarily consumes their own data.
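A toy sketch of the local-first shape: edits hit a local store immediately and work offline; the server only ever receives an opaque blob. The `sync_payload` "encryption" below is a base64 placeholder so the example stays self-contained; a real app would encrypt client-side (e.g. AES-GCM, as in the notes example earlier) before upload.

```python
import base64
import json

class LocalFirstNotes:
    """Device holds the source of truth; the server stores an opaque copy.
    base64 stands in for real client-side encryption here."""

    def __init__(self):
        self.notes = {}  # local source of truth

    def edit(self, note_id, text):
        self.notes[note_id] = text  # works offline, no server round-trip

    def sync_payload(self) -> bytes:
        # Placeholder: real code encrypts client-side before upload.
        return base64.b64encode(json.dumps(self.notes).encode())

    @staticmethod
    def restore(payload: bytes) -> dict:
        return json.loads(base64.b64decode(payload))

app = LocalFirstNotes()
app.edit("n1", "works offline")
blob = app.sync_payload()  # the only thing the server ever sees
```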
Data Minimization in Practice
Data minimization isn't just "collect less data." It's a systematic approach to every data touchpoint.
Collection Minimization
Only collect what's strictly necessary for the requested functionality:
| Common practice | Privacy-first alternative |
|---|---|
| Require full name for registration | Username or email only |
| Collect date of birth for age verification | Boolean age confirmation (over 18: yes/no) |
| Store full IP address in logs | Store truncated IP (remove last octet) |
| Record precise geolocation | City-level or country-level only |
| Track individual page views | Aggregate page view counts |
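The IP-truncation row from the table, as a small stdlib sketch. The /24 (IPv4) and /48 (IPv6) prefixes are common anonymization choices, not mandates; pick the coarsest prefix your analytics can tolerate.

```python
import ipaddress

def truncate_ip(ip: str) -> str:
    """Zero the host portion so the stored address identifies a network,
    not a person: last octet for IPv4, everything past /48 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

# truncate_ip("203.0.113.42") -> "203.0.113.0"
```

Apply the truncation at the logging layer, before the address is written anywhere, so the full IP never lands on disk.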
Processing Minimization
Process data in the least invasive way possible:
- Aggregate early. Convert individual events to aggregates as close to collection as possible. Don't store 1 million individual pageview records when you need daily pageview counts.
- Pseudonymize immediately. Replace direct identifiers (email, name) with pseudonyms at the ingestion layer.
- Process locally when possible. If analysis can happen on the client device, keep the data there.
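"Pseudonymize immediately" in practice usually means a keyed HMAC at the ingestion layer, not a bare hash: without the key, an attacker can't rebuild the mapping by hashing common email addresses. The key name below is illustrative; store the real one in a KMS and rotate it.

```python
import hashlib
import hmac

PSEUDONYM_KEY = b"rotate-me-and-store-in-a-kms"  # illustrative, not a real key

def pseudonymize(email: str) -> str:
    """Replace a direct identifier with a stable pseudonym at ingestion.
    Keyed HMAC, so the email -> pseudonym mapping can't be brute-forced
    from a dictionary of known addresses without the key."""
    return hmac.new(PSEUDONYM_KEY, email.lower().encode(),
                    hashlib.sha256).hexdigest()[:16]

# The raw email is never written; only the pseudonym reaches storage.
event = {"user": pseudonymize("Alice@example.com"), "action": "login"}
```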
Retention Minimization
Define retention periods per data category and enforce them automatically:
```sql
-- Automated data retention enforcement
DELETE FROM page_views WHERE created_at < NOW() - INTERVAL '30 days';
DELETE FROM user_sessions WHERE last_active < NOW() - INTERVAL '7 days';
DELETE FROM search_queries WHERE created_at < NOW() - INTERVAL '24 hours';
```
Run retention enforcement as a scheduled job, not a manual process. Manual processes get forgotten.
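The same retention policy as a scheduled job, sketched with sqlite3 so it runs anywhere. Table and column names follow the SQL above; the scheduler hook (cron, APScheduler, a worker loop) is whatever your stack provides and is omitted here.

```python
import sqlite3

# table -> (timestamp column, SQLite datetime offset)
RETENTION = {
    "page_views":     ("created_at", "-30 days"),
    "user_sessions":  ("last_active", "-7 days"),
    "search_queries": ("created_at", "-24 hours"),
}

def enforce_retention(conn: sqlite3.Connection) -> None:
    """Delete everything older than its category's retention period.
    Names come from the trusted RETENTION dict, never from user input."""
    for table, (col, age) in RETENTION.items():
        conn.execute(
            f"DELETE FROM {table} WHERE {col} < datetime('now', ?)", (age,)
        )
    conn.commit()
```

Defining the policy as data (the `RETENTION` dict) makes it auditable in one place and keeps new tables from silently escaping retention.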
Privacy-Preserving Analytics
You can measure everything that matters for business decisions without tracking individual users.
Aggregated Server-Side Metrics
Count events at the server level without individual attribution:
- Page views per day — increment a counter, don't log individual visits
- Referrer distribution — aggregate referrers into daily counts, don't store per-visit referrers
- Device category distribution — parse user-agent into desktop/mobile/tablet counts, discard the user-agent
- Geographic distribution — map IP to country, increment country counter, discard the IP
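Aggregation-at-ingestion can literally be a counter increment: the identifying details of a request are parsed, counted, and dropped inside one function. A sketch (`parse_device` is a deliberately crude stand-in for a real user-agent parser):

```python
from collections import Counter

page_views = Counter()
devices = Counter()

def parse_device(user_agent: str) -> str:
    """Crude stand-in for a real UA parser; returns only a category."""
    ua = user_agent.lower()
    if "mobile" in ua:
        return "mobile"
    if "tablet" in ua:
        return "tablet"
    return "desktop"

def record_hit(path: str, user_agent: str) -> None:
    page_views[path] += 1
    devices[parse_device(user_agent)] += 1
    # user_agent goes out of scope here -- nothing per-visit is stored

record_hit("/pricing", "Mozilla/5.0 (iPhone; Mobile)")
record_hit("/pricing", "Mozilla/5.0 (Windows NT 10.0)")
```

The counters are the only state that survives the request, so there is no per-visit record to breach, subpoena, or leak.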
k-Anonymity
Ensure every data record is indistinguishable from at least k-1 other records. If your analytics shows a specific page was viewed by one user from Luxembourg at 3:47 AM, that's identifiable. If it shows the page was viewed by 50+ users from the Benelux region during the 3-4 AM hour, that's k-anonymous.
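One common way to enforce k-anonymity in a report is suppression: groups smaller than k are folded into an "other" bucket instead of being published. A minimal sketch:

```python
from collections import Counter

def k_anonymous_report(records, k=5):
    """Publish only groups with at least k members; smaller groups are
    folded into 'other' rather than reported individually."""
    counts = Counter(records)
    report, suppressed = {}, 0
    for group, n in counts.items():
        if n >= k:
            report[group] = n
        else:
            suppressed += n
    if suppressed:
        report["other"] = suppressed
    return report

# One visitor from Luxembourg would be identifiable on their own,
# so that group gets suppressed into "other".
views = ["BE"] * 30 + ["NL"] * 25 + ["LU"]
```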
Privacy Budget
Assign a "privacy budget" to your analytics system — a maximum amount of information extractable about any individual user. Each query against the data consumes some of this budget. When the budget is exhausted, no more queries are allowed until the data refreshes. This prevents aggregated queries from being combined to reconstruct individual records.
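A budget tracker can be a few lines: under basic (sequential) composition, the epsilons of individual differentially private queries add up, so the system just refuses queries once the cumulative spend reaches the limit. A sketch of that bookkeeping:

```python
class PrivacyBudget:
    """Track cumulative epsilon spent across queries; refuse further
    queries once the budget is exhausted. Assumes basic sequential
    composition, where per-query epsilons add."""

    def __init__(self, total_epsilon: float):
        self.remaining = total_epsilon

    def spend(self, epsilon: float) -> None:
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted -- no more queries")
        self.remaining -= epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)  # first query
budget.spend(0.4)  # second query
# budget.spend(0.5) would now raise: only 0.1 epsilon remains
```

Advanced composition theorems allow a tighter accounting than straight addition, but the add-and-refuse model above is the conservative baseline.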
FAQ
Is there a Privacy by Design certification? Not a single globally recognized certification. ISO 27701 (privacy information management) covers some PbD principles. The IAPP (International Association of Privacy Professionals) offers training. Some EU data protection authorities issue PbD assessments. The most practical approach is documenting your PbD implementation and having it reviewed by a privacy professional.
Does Privacy by Design conflict with GDPR requirements? No — GDPR Article 25 explicitly requires Data Protection by Design and by Default. Implementing PbD is a GDPR compliance strategy, not a conflict. The challenge is that PbD is more demanding than minimum GDPR compliance — it requires structural privacy, not just policy-based privacy.
What's the performance impact of end-to-end encryption? AES-256-GCM encryption/decryption on modern hardware adds <1ms per operation for typical document sizes. The WebCrypto API (browser-native) is hardware-accelerated on most devices. The real performance cost is in the architectural constraints: server-side search and indexing aren't possible on encrypted data, which means you need alternative approaches for features that depend on server-side data access.
How do you handle user experience tradeoffs? The most common concern is that privacy-first means fewer features. In practice, the constraint drives creative solutions: client-side search instead of server-side search, peer-to-peer sync instead of server-mediated sync, progressive feature unlocking based on user-chosen data sharing. Users consistently report preferring apps that ask for less data upfront.