FAQ

Frequently asked project questions: compact answers

Here you will find typical questions from PKI and CLM projects, answered practically and kept intentionally concise. The answers support initial orientation, terminology clarification, and early architecture and operations decisions.

Is CLM only relevant for TLS certificates?

No. Certificate Lifecycle Management (CLM) governs the lifecycle of machine identities as a whole, not just server TLS. Typical coverage includes TLS servers and clients (mTLS), certificates for APIs/gateways, Kubernetes/Ingress and service-mesh workloads, device and IoT identities, code signing, and S/MIME. The real value comes when inventory, ownership, issuance, renewal, rotation, revocation, and deployment run centrally, automatically, and in an auditable way.

Can you continue using an existing Microsoft PKI (AD CS)?

Yes. AD CS often remains in place as the issuing CA while CLM acts as the orchestration layer on top. A proven target architecture includes discovery and inventory (Windows, Linux, appliances, cloud, Kubernetes), standardized templates/policies, automated enrollment and renewal flows (including installation and service reload), and clear roles and responsibilities (CA administration separated from application owners). This preserves existing investments and makes processes consistently auditable.

How can the risk of certificate outages be reduced?

Outages are usually caused by missing visibility and renewals that happen too late. Effective measures include complete inventory coverage (including shadow IT), clear ownership/metadata, alert tiers (e.g., 90/60/30/14/7/1 days), automated renewal with installation/deployment, and technical checks (chain validation, hostname/SAN, TLS handshake, mTLS auth). In addition, the entire chain (root/intermediate/leaf) should be monitored so an intermediate certificate does not become an unnoticed single point of failure.
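The alert tiers mentioned above can be sketched as a small lookup. This is a minimal illustration, not a product feature: the function name alert_tier and the convention that an expired certificate maps to tier 0 are assumptions made for this example.

```python
from datetime import date
from typing import Optional

# Alert thresholds in days before expiry, taken from the example tiers above.
ALERT_TIERS = (90, 60, 30, 14, 7, 1)


def alert_tier(not_after: date, today: date) -> Optional[int]:
    """Return the tightest tier already reached, or None if no alert is due.

    Assumption for this sketch: an already expired certificate returns 0.
    """
    days_left = (not_after - today).days
    if days_left < 0:
        return 0
    reached = [tier for tier in ALERT_TIERS if days_left <= tier]
    return min(reached) if reached else None
```

A certificate with 45 days left falls into the 60-day tier; one with a full year left triggers no alert.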

How should HSMs be integrated effectively?

HSMs protect especially critical keys (root CA, issuing CAs, code signing) through hardware-based isolation and controlled cryptographic operations, often with certifications such as FIPS 140-2/140-3. Best practice: an offline root (used only to sign issuing CAs), issuing CAs backed by HSMs, documented key ceremonies (e.g., M-of-N control), clear roles (Security Officer vs. Operator), and tested backup/recovery processes. Integration is typically implemented via PKCS#11 or vendor-specific providers.

What is the difference between PKI and CLM?

PKI describes the infrastructure and trust chain (root/intermediate CA, policies, CRL/OCSP) used to issue X.509 certificates (RFC 5280). CLM is the operational discipline and tooling layer on top of it: inventory, approval, issuance, distribution/installation, renewal, rotation, and revocation. Without CLM, PKI often remains manual, which quickly leads to outages once you have hundreds to thousands of certificates.

What is the typical maximum lifetime of public TLS certificates today?

In the browser ecosystem, the upper limit for publicly trusted TLS server certificates has been 398 days (about 13 months) since September 2020. The CA/Browser Forum has since adopted a staged reduction (200 days from March 2026, 100 days from March 2027, 47 days from March 2029), so the limit will keep shrinking. This forces organizations to establish renewal automation and clear ownership. Internally (private CA), different lifetimes are possible, but from a security and operations perspective, shorter is usually better than longer.

Which standards are especially relevant for TLS/PKI in day-to-day operations?

For TLS, RFC 8446 (TLS 1.3) is foundational; for certificate profiles, RFC 5280. OCSP is defined in RFC 6960 and Certificate Transparency in RFC 6962. For automation, ACME (RFC 8555) is the de facto standard for public TLS; in enterprise/device environments, EST (RFC 7030) and CMP (RFC 4210) also matter.

How do I build a reliable certificate inventory?

In practice, you need discovery across every location where certificates reside: web servers, load balancers, API gateways, Java keystores, Windows certificate stores, Kubernetes secrets/Ingress, cloud KMS, CI/CD, and code-signing stores. Critical metadata includes owner, system, environment, criticality, expiration date, issuer, key type/size, and deployment targets. A practical KPI is the share of certificates with an assigned owner (target: 100%).
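The metadata fields and the owner KPI described above could be modeled roughly as follows. The class name CertRecord, its field names, and the rule that an empty inventory counts as fully covered are assumptions for this sketch, not a fixed schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CertRecord:
    """One inventory entry; field names are illustrative only."""
    common_name: str
    issuer: str
    not_after: date
    key_type: str                 # e.g. "RSA" or "EC"
    key_size: int                 # bits
    system: str
    environment: str              # e.g. "prod", "test"
    criticality: str              # e.g. "high", "low"
    deploy_targets: Tuple[str, ...] = ()
    owner: Optional[str] = None   # None means no owner assigned yet


def owner_coverage(records: List[CertRecord]) -> float:
    """Share of certificates with an assigned owner, as a value in [0, 1]."""
    if not records:
        return 1.0  # assumption: an empty inventory counts as fully covered
    owned = sum(1 for r in records if r.owner)
    return owned / len(records)
```

Multiplying the result by 100 gives the percentage to compare against the 100% target.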

How do I identify and prevent shadow certificates?

Shadow certificates arise when teams obtain and deploy certificates independently without central visibility. Countermeasures include discovery/scanning, mandatory issuance paths (e.g., via CLM), policies for allowed issuers, and blocking unapproved CAs at central control points (Ingress/LB). CT monitoring (RFC 6962) also helps for publicly issued certificates for your own domains.
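One way to turn discovery results into shadow-certificate findings is a comparison against the CLM inventory and an issuer allowlist. The sketch below assumes that discovery yields a mapping from certificate fingerprint to issuer name; the function name and the data shapes are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def classify_discovered(
    discovered: Dict[str, str],       # fingerprint -> issuer name (from scanning)
    known_fingerprints: Set[str],     # fingerprints already tracked in CLM
    allowed_issuers: Set[str],        # issuers permitted by policy
) -> Tuple[List[str], List[str]]:
    """Return (unmanaged, unapproved) fingerprint lists, each sorted.

    unmanaged:  found by discovery but missing from the inventory
    unapproved: issued by a CA that is not on the allowlist
    """
    unmanaged = sorted(fp for fp in discovered if fp not in known_fingerprints)
    unapproved = sorted(
        fp for fp, issuer in discovered.items() if issuer not in allowed_issuers
    )
    return unmanaged, unapproved
```

Certificates on the first list need an owner and onboarding; certificates on the second list are candidates for replacement via the mandatory issuance path.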

ACME, SCEP, EST, CMP: when should I use which?

ACME (RFC 8555) is ideal for automated TLS certificates (DNS/HTTP challenges). SCEP (RFC 8894) is historically widespread but functionally limited; EST (RFC 7030) is more modern and better suited for devices/enterprise use. CMP (RFC 4210) is very powerful and supports complex PKI workflows, but it is more demanding to integrate and operate.

How do I implement renewals without downtime?

Renewal is not just about issuing a new certificate; it also requires controlled deployment: write the certificate, reload the service, run a health check (TLS handshake, SNI/SAN, chain), and only then switch over. For load balancers/Ingress, roll out in stages (canary). Proven practice includes renewal windows 30 to 60 days before expiry and automatic retries including an audit trail.
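The deployment sequence above (install, reload, health check, then commit or roll back) can be sketched as a small orchestration function. All four callables are placeholders supplied by the caller; the function name and the wording of the audit entries are assumptions for this example.

```python
from typing import Callable, List, Tuple


def deploy_renewed_cert(
    install: Callable[[], None],
    reload_service: Callable[[], None],
    health_check: Callable[[], bool],
    rollback: Callable[[], None],
) -> Tuple[bool, List[str]]:
    """Run install, reload, and health check; roll back if any step fails.

    Returns (success, audit_trail). The rollback callable is expected to
    restore and reload the previous certificate.
    """
    audit: List[str] = []
    try:
        install()
        audit.append("installed")
        reload_service()
        audit.append("reloaded")
        if health_check():
            audit.append("health check passed")
            return True, audit
        audit.append("health check failed")
    except Exception as exc:  # any failing step triggers the rollback path
        audit.append(f"error: {exc}")
    rollback()
    audit.append("rolled back")
    return False, audit
```

In a staged rollout, the same function would run on a canary node first, and the remaining nodes would follow only after it returns success.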

What is the difference between renewal and key rotation?

Renewal typically replaces the certificate, but it may reuse the same key. Key rotation means a new private key plus a new certificate. For stronger security, rotation is recommended, e.g., periodically or at every renewal, especially for higher-risk workloads or when key exposure is suspected.
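Whether a replacement was a plain renewal or a key rotation can be checked by comparing public-key fingerprints of the old and new certificate. The sketch below assumes the DER-encoded SubjectPublicKeyInfo bytes have already been extracted by some other tool; that extraction is not shown, and the function names are hypothetical.

```python
import hashlib


def key_fingerprint(spki_der: bytes) -> str:
    """SHA-256 over the DER-encoded SubjectPublicKeyInfo, as lowercase hex."""
    return hashlib.sha256(spki_der).hexdigest()


def replacement_kind(old_spki_der: bytes, new_spki_der: bytes) -> str:
    """Classify a certificate replacement by comparing public-key fingerprints."""
    if key_fingerprint(old_spki_der) == key_fingerprint(new_spki_der):
        return "renewal (key reused)"
    return "key rotation"
```

A CLM report could use this to flag higher-risk workloads where the key has been reused across several renewals.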

RSA or ECC: which key sizes are commonly used in practice?

RSA-2048 is still widely used in practice; RSA-3072 is used where a higher security margin is required. For ECC, curves such as P-256 are common and efficient (less CPU, smaller keys). The important part is aligning the algorithm with client compatibility (legacy devices) and internal policies.

How do I validate certificate chains correctly?

Correct validation means a complete chain up to the trust anchor, the correct order, every certificate within its validity period, and SAN entries (DNS/IP) that match the requested name. In addition, Key Usage and Extended Key Usage must fit (e.g., Server Auth, Client Auth). A common mistake is a missing intermediate in deployment; stricter clients then fail hard.
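A strict handshake probe is a simple way to catch chain and hostname problems before clients do. The sketch below uses only Python's standard ssl module; the function names are made up for this example. It relies on the default context, which verifies the chain against the system trust store and checks the hostname, and it does not check revocation.

```python
import socket
import ssl
from typing import Tuple


def strict_client_context() -> ssl.SSLContext:
    """Client context with chain and hostname verification enabled."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED and check_hostname by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def handshake_ok(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a verified TLS handshake; return (ok, protocol version or error)."""
    ctx = strict_client_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version() or ""
    except ssl.SSLCertVerificationError as exc:
        return False, exc.verify_message  # e.g. expired, untrusted, or name mismatch
    except (ssl.SSLError, OSError) as exc:
        return False, str(exc)
```

Because this client does not fetch missing intermediates on its own, a server that omits an intermediate will typically fail the probe, which is the behavior of the stricter clients described above.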

What role do CRL and OCSP play in operations?

CRLs are simple to operate, but they can grow large and are often published infrequently. OCSP (RFC 6960) enables online status checks per certificate, but it puts load on the responder and makes it a critical operational dependency. In practice, OCSP stapling reduces runtime impact and makes revocation checking more robust.

What is OCSP stapling and why is it important?

With stapling, the server includes the OCSP status directly in the TLS handshake instead of requiring the client to query the responder separately. This reduces latency and decouples clients from OCSP responder availability. For highly available services, stapling is often essential.

What is Certificate Transparency (CT) and what is the value of monitoring?

CT (RFC 6962) logs publicly issued TLS certificates in CT logs. Monitoring can make mis-issuance visible faster (certificates for your domain that were not requested by you). Operationally useful: CT alerts combined with a playbook (validation, revocation if needed, incident process).

How do I prevent insecure algorithms and hashes?

This is policy work: ban SHA-1 and RSA-1024, enforce minimum sizes (e.g., RSA-2048+) and modern signature algorithms. Implement these rules in CA templates, CLM gates, and CI/CD checks. TLS policy and cipher suites should also be managed centrally.
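Such rules can be expressed as a small policy object that CA templates, CLM gates, and CI/CD checks all evaluate the same way. The following is a sketch under assumptions: the class and function names are hypothetical, and the allowed curve list (P-256, P-384) is an example choice rather than a recommendation from this FAQ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CryptoPolicy:
    min_rsa_bits: int = 2048
    allowed_curves: frozenset = frozenset({"P-256", "P-384"})  # assumed example list
    banned_hashes: frozenset = frozenset({"SHA1"})             # compared without hyphen


def policy_violations(
    key_type: str,
    sig_hash: str,
    rsa_bits: Optional[int] = None,
    curve: Optional[str] = None,
    policy: CryptoPolicy = CryptoPolicy(),
) -> List[str]:
    """Return a list of human-readable violations; empty means compliant."""
    problems: List[str] = []
    if key_type == "RSA":
        if rsa_bits is None or rsa_bits < policy.min_rsa_bits:
            problems.append(f"RSA key below {policy.min_rsa_bits} bits")
    elif key_type == "EC":
        if curve not in policy.allowed_curves:
            problems.append(f"curve {curve} not allowed")
    else:
        problems.append(f"key type {key_type} not allowed")
    if sig_hash.upper().replace("-", "") in policy.banned_hashes:
        problems.append(f"hash {sig_hash} is banned")
    return problems
```

A CI/CD check would fail the pipeline whenever the returned list is non-empty and print the entries as the reason.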

How do I integrate CLM into Kubernetes?

Approaches include a controller or operator that renews certificates stored as Secrets, Ingress controllers with automated certificate binding, and service-mesh mTLS with workload identities. It is important to separate issuance (CA/CLM) from distribution (Kubernetes). Use short lifetimes and automatic rolling restarts so pods actually use the new certificate and key.

What are best practices for mTLS in microservices?

mTLS requires automation. Proven practices include short lifetimes, automatic renewal, clearly defined trust domains, and ClientAuth policies. SPIFFE/SPIRE can standardize workload identities; CLM must cover issuance, rotation, and audits. A good KPI is the share of automated mTLS renewals (target: close to 100%).
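On the server side, a ClientAuth policy boils down to requiring and verifying a client certificate against the CA of the trust domain. The sketch below uses Python's standard ssl module; the function name and the three file-path parameters are placeholders, and loading is skipped when no paths are given so the policy settings can be inspected on their own.

```python
import ssl
from typing import Optional


def mtls_server_context(
    cert_file: Optional[str] = None,       # server certificate chain (PEM)
    key_file: Optional[str] = None,        # server private key (PEM)
    client_ca_file: Optional[str] = None,  # CA bundle of the trust domain (PEM)
) -> ssl.SSLContext:
    """Server-side TLS context that rejects clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # client certificate is mandatory
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

With short-lived certificates, the service would rebuild this context (or reload the files) after each renewal so that new connections use the fresh key material.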

How do I handle appliances/load balancers/firewalls?

Many outages occur at non-automated endpoints. Appliances require API-based installation, consistent naming conventions, and rollback. Without CLM connectors, this remains manual work and therefore an outage risk.

How do I implement code signing professionally?

Code signing requires strictly segregated keys (ideally in an HSM) and approval processes. For long-term verifiability, RFC 3161 time stamping is important so signatures remain verifiable after certificate expiration. Signing keys should not live on build agents; they belong in HSM/KMS-backed signing services.

How do public CA and private CA differ in governance?

Public CAs are suitable for Internet trust, but they are bound by strict rules and shorter maximum lifetimes. Private CAs provide flexibility (mTLS, devices), but require governance: offline root, intermediates, revocation, logging, audits, and role concepts. A clear CA hierarchy per trust domain is better than many CAs with no ownership.

How do I plan high availability for OCSP/CRL and CA services?

OCSP/CRL are operational dependencies: responders/distribution points must be highly available and globally reachable. Plan redundancy (at least active/active), SLAs, and monitoring (latency, error rates, data freshness). For issuing CAs: no single points of failure, clean backups, and tested restore processes.

What does effective alerting for CLM look like?

In addition to expiration dates, monitor signals such as missing owner, failed deployments, unexpected issuers, policy violations, and unusual issuance rates. Robust alerting is multi-stage (e.g., 30/14/7/1 days) and integrated with on-call/ITSM. KPI: renewal success rate and number of escalated incidents per quarter.

Which typical failure patterns cause TLS outages?

Classic cases: expired certificate, wrong SAN/hostname, missing intermediate, key mismatch, mixed chain after a CA change, or services not loading the new certificate (no reload). Intermediate rotation is also often underestimated: chain updates must be rolled out cleanly everywhere. Automated handshake checks against real clients significantly reduce the risk.

How do I handle CA changes or intermediate rotation?

With intermediate changes, chain management is central: prepare the new chain, account for overlap phases, and perform staged rollouts. Test the most critical clients (legacy Java, embedded) before cutover. Maintain a migration playbook with a cutover plan, telemetry, and rollback criteria.

What does crypto-agility mean in practice?

Crypto-agility means algorithms, key sizes, and policies can be changed without a major project. This requires centrally defined policies, automated rotation, and configuration as code. A useful metric: how quickly can an algorithm change be rolled out to X percent of endpoints?

How do I prepare for post-quantum cryptography (PQC)?

PQC is a multi-year transition: inventory your algorithms, assess long-term confidentiality needs (e.g., 10+ years), and plan hybrid approaches. TLS, VPN, S/MIME, and code signing all depend on PQC-capable clients and libraries being available. In the short term, crypto-agility is the most important step to make later transitions operationally manageable.

Which KPIs are suitable for measuring PKI/CLM maturity?

Useful KPIs include coverage (percentage of certificates in inventory), owner assignment (target 100%), degree of automation, renewal success rate, unplanned certificate incidents, median time to renew, and the percentage of policy-compliant algorithms. In addition: time to respond to CT alerts and rollout time for CA/intermediate changes.
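Several of these KPIs are simple ratios that can be computed from counts most CLM tools already export. The sketch below is illustrative only: the function name, the parameter names, and the choice of hours as the unit for time to renew are assumptions.

```python
from statistics import median
from typing import Dict, Sequence


def maturity_kpis(
    inventoried: int,          # certificates tracked in the inventory
    discovered: int,           # certificates found by discovery overall
    renewals_ok: int,          # renewals that completed without manual rescue
    renewals_total: int,       # all renewal attempts in the period
    renew_hours: Sequence[float],  # duration of each renewal, in hours
) -> Dict[str, float]:
    """Compute coverage, renewal success rate, and median time to renew."""
    return {
        "coverage": inventoried / discovered if discovered else 0.0,
        "renewal_success_rate": renewals_ok / renewals_total if renewals_total else 0.0,
        "median_hours_to_renew": float(median(renew_hours)) if renew_hours else 0.0,
    }
```

Tracking these values per quarter shows whether automation is actually improving or whether outliers still depend on manual work.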