Amaro Than: API Keys — Penetration Test & Remediation Report
Scope (what we tested)
Systems & endpoints: API authentication endpoints, token/key-issuance services, internal services that cache keys (Redis), key-usage paths in web/mobile backends, and associated CI/CD deploy pipelines.
Functional areas:
API key generation and lifecycle: creation, revocation, rotation, TTL.
Storage: persistent (database / vault), caches (Redis), environment variables, and in-memory handling.
Transit: key exchange over TLS/TCP between clients, edge, app servers, and internal microservices.
Key usage patterns: signing, HMAC verification, third-party integrations that accept our keys.
Auth contexts exercised: unauthenticated, authenticated with standard roles, and privileged operator contexts.
What isn’t included (or not publicly visible) ⚠️
Physical hardware forensics, hardware key extraction, and any vendor-managed HSM internals.
Third-party provider backends where we do not control execution (we tested integrations only as they affect our surface).
Public PoCs or exploit scripts — technical proofs and exploit code are retained privately for remediation tracking and forensic review only.
Test methodology & tools used
Automated scanning: Burp Suite for request/response inspection, TLS/crypto checks, and protocol verification.
Manual testing: focused sessions to simulate server compromise, in-memory key exposure scenarios, and race/flush tests (controlled, monitored).
Side-channel risk analysis: non-invasive measurement of microarchitectural leakage vectors (speculative execution patterns) at a high level — no weaponized exploit code was created or published.
Cache & persistence review: manual code review of Redis usage, connection handling, and key lifecycle code paths (PHP/Python).
Cryptographic review: verification of TLS configuration (strong ciphers, HSTS), verification of on-disk encryption for key vaults, and review of entropy sources during key generation.
Post-fix verification: retests using the same tooling and manual review; logs and scans archived.
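The cryptographic review covered the entropy sources used during key generation. As a minimal illustration of the expected pattern (not the production code), key material should be drawn from the OS CSPRNG, for example via Python's `secrets` module:

```python
import secrets

def generate_api_key(num_bytes: int = 32) -> str:
    # secrets draws from the OS CSPRNG (os.urandom), which is the
    # appropriate entropy source for key material. Never use the
    # `random` module here: its output is predictable.
    return secrets.token_urlsafe(num_bytes)
```

A 32-byte key encoded this way yields a 43-character URL-safe token, which is what the review checked for: full-strength randomness, not truncated or derived from timestamps.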
Detailed findings (high level)
Severity summary
No critical unresolved exposures of long-lived API keys after remediation.
One medium risk: transient key material leakage from cached memory (Redis) coupled with measurable microarchitectural side-effects under specific, contrived conditions.
Several low findings: inconsistent key rotation intervals in legacy code, and some operator workflows that exposed keys in logs during rare failure modes.
Representative findings
In-memory leakage via Redis (medium): Under a specific pattern of workload and process scheduling, fragments of API keys were present in Redis memory pages after eviction/expiry cycles and could be observable in RAM snapshots on a compromised host. This was not a straightforward remote exploit — it required host-level access and specific timing conditions — but it did create a realistic post-compromise data leakage vector.
Microarchitectural side-effects (research observation): Controlled tests showed measurable CPU speculative-execution effects that could, in theory, be used to infer small amounts of transient memory state. The test team treated this as a risk vector rather than a practical, weaponized attack path in our deployment context and prioritized mitigations.
Operational leakage (low): Some legacy operator/debugging flows printed key fragments into verbose logs or accidentally exposed them in monitoring snapshots during rare failures.
What we didn’t find
No evidence that API keys were being exfiltrated through public endpoints or that external parties had successfully decrypted keys in transit. TLS and HMAC signing schemes were found to be properly implemented in production paths.
Remediation actions taken
Actions were applied promptly and comprehensively to eliminate the identified vectors and harden key handling:
Redis & cache hardening
Switched from caching raw key material to caching opaque references (key IDs / short tokens) only; actual key material is kept in a secure vault/KMS when possible.
Configured Redis to use encrypted persistence where feasible and reduced the time keys could remain in cache.
Disabled swap on hosts that store secrets; where swap could not be removed, tuned vm.swappiness and related kernel settings to reduce the risk of key pages being written to disk.
Applied strict ACLs and network isolation for Redis instances (private VPCs, no public exposure, TLS between app ⇄ Redis where supported).
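The reference-only caching model above can be sketched as follows. This is a simplified illustration, not the production implementation: the `FakeCache` class stands in for a real Redis client (which exposes the same `setex`/`get` calls, over TLS in our deployment), and all names here are hypothetical.

```python
import secrets
import time

class FakeCache:
    """In-memory stand-in for a Redis client (e.g. redis.Redis with setex/get)."""
    def __init__(self):
        self._store = {}

    def setex(self, key, ttl, value):
        # Store the value with an expiry deadline, mirroring Redis SETEX.
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires = item
        if time.monotonic() >= expires:
            del self._store[key]
            return None
        return value

CACHE_TTL_SECONDS = 60  # short TTL so references never linger in cache

def cache_key_reference(cache, key_id: str) -> str:
    # Cache an opaque random token mapped to a key ID; the raw key
    # material itself never enters the cache and stays in the vault/KMS.
    ref = secrets.token_urlsafe(16)
    cache.setex(f"keyref:{ref}", CACHE_TTL_SECONDS, key_id)
    return ref

def resolve_reference(cache, ref: str):
    # Resolve the opaque reference back to a key ID, or None if expired.
    return cache.get(f"keyref:{ref}")
```

The design point: a compromised cache (or a RAM snapshot of it) now yields only short-lived opaque tokens and key IDs, not usable key material.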
Memory hygiene & process changes
Reworked server code to zeroize sensitive buffers after use and avoid keeping long-lived in-process copies of full keys (use short-lived derived tokens for runtime operations).
Where native libraries were involved, introduced secure memory allocation patterns to minimize copies and reduce swapping risk.
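A rough sketch of the zeroization pattern follows, with an important caveat: pure Python offers weak guarantees (the interpreter may keep intermediate copies), so the real fix in native code paths relies on secure allocators such as libsodium's `sodium_memzero`. The helper below is illustrative only.

```python
def with_key(key_material: bytearray, operation):
    """Run an operation with transient key material, then zeroize the buffer.

    key_material must be a mutable bytearray so it can be overwritten
    in place; production code would use a native secure allocator.
    """
    try:
        return operation(bytes(key_material))
    finally:
        # Overwrite the caller's buffer byte-by-byte so the full key
        # does not remain in this long-lived allocation.
        for i in range(len(key_material)):
            key_material[i] = 0
```

Usage: the caller holds the key only as a `bytearray`, passes it through `with_key`, and afterward the buffer contains only zeros regardless of whether the operation succeeded or raised.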
Cryptography & transit
Verified and tightened TLS config (strong cipher suites, forward secrecy, strict certificate validation).
Ensured endpoints use ephemeral session keys where possible and enforced mutual TLS for highly privileged service-to-service channels.
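As a hedged example of the TLS policy verified here (TLS 1.2 minimum, forward-secret AEAD ciphers, strict certificate validation), Python's `ssl` module can express it as below. The production services use their own stacks, so treat this as a sketch of the policy rather than the deployed configuration; for mutual TLS, a client certificate would additionally be loaded via `load_cert_chain`.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    # PROTOCOL_TLS_CLIENT enables hostname checking and certificate
    # verification (CERT_REQUIRED) by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restrict TLS 1.2 to forward-secret AEAD suites; TLS 1.3 suites
    # are managed separately by OpenSSL and are all forward-secret.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    return ctx
```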
Key lifecycle & storage
Centralized key storage in a dedicated secrets manager/KMS with audit trails and role-based access (instead of environment variables or flat DB fields).
Enforced short TTLs and automatic rotation of keys; retired long-lived test keys found during review.
Rotated any keys that were in potentially exposed environments, and issued notifications/rotations for any downstream integrations impacted.
Kernel & microcode
Applied OS kernel patches and microcode updates where recommended by vendors to mitigate speculative execution side-channels at the platform level.
Tuned CPU scheduling and selectively disabled SMT/hyperthreading on hosts where testing indicated higher side-channel sensitivity, applied per host based on the risk assessment.
Logging & operator controls
Sanitized logs and monitoring to avoid accidental key leakage.
Implemented operator training and CI gates preventing commits that print or persist secret material.
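One way to sanitize logs at the framework level is a logging filter that masks anything shaped like a key before it reaches a handler. The token pattern below is a hypothetical placeholder standing in for the deployment's real key format:

```python
import logging
import re

# Hypothetical key shape; a real filter would match the actual
# token formats issued by the key service.
SECRET_PATTERN = re.compile(r"ak_[A-Za-z0-9]{8,}")

class RedactSecrets(logging.Filter):
    """Logging filter that masks anything resembling an API key."""
    def filter(self, record: logging.LogRecord) -> bool:
        # Render %-style args first, then redact the finished message
        # so fragments cannot slip through via format arguments.
        message = record.getMessage()
        record.msg = SECRET_PATTERN.sub("[REDACTED]", message)
        record.args = None
        return True
```

Attached to the root logger (or enforced by a CI check on logger setup), this prevents verbose or failure-mode logging from ever persisting key material.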
Remediation timeline & verification
Discovery & Triage (days 0–2): Findings logged; immediate containment actions (cache TTL reduction, ACL tightening) applied within 24–48 hours.
Code fixes & infra changes (days 2–10): Application code changes (zeroization, cache reference model), Redis configuration updates, and key rotation executed.
Platform patches (days 3–14): OS/kernel/microcode updates deployed across impacted hosts following staged rollout and monitoring.
Retest & verification (days 10–18): Security team re-ran tests and confirmed memory leakage no longer reproducible; microarchitectural measurement no longer yielded actionable leakage in the deployed environment.
Follow-up (ongoing): Increased cadence for secrets audits and scheduled another penetration test within 90 days, plus continuous monitoring.
Evidence includes secure internal retest logs, scan exports, and updated CI test suites that validate no sensitive material prints in logs or remains in caches beyond expected TTLs. These artifacts are retained by the security team.
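The CI gate that blocks secret material in logs can be as simple as a pattern scan over captured output. A minimal sketch, where the key shape is again a hypothetical placeholder:

```python
import re

SECRET_PATTERN = re.compile(r"ak_[A-Za-z0-9]{8,}")  # hypothetical key shape

def assert_logs_clean(log_text: str) -> None:
    """CI gate: fail the build if anything shaped like an API key
    appears in captured log output."""
    match = SECRET_PATTERN.search(log_text)
    if match:
        # Report only a short prefix so the gate itself never echoes
        # the full token into CI output.
        raise AssertionError(
            f"secret-like token found in logs: {match.group()[:6]}..."
        )
```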