Designing Azure Virtual Desktop: Principles That Actually Matter
Why build AVD, and what problems does it actually solve?
Azure Virtual Desktop has matured significantly since its initial release. What was once a complex, sprawling service requiring deep Azure networking expertise has evolved into a genuinely viable enterprise VDI platform. But maturity in the product doesn't guarantee maturity in implementations. I've seen AVD deployments that replicate every anti-pattern from legacy Citrix environments, and others that embrace cloud-native principles to deliver something genuinely transformative.
This post distils the architectural principles I apply when designing AVD solutions. It's not a step-by-step deployment guide — Microsoft's documentation covers that adequately. Instead, it focuses on the decisions that determine whether your AVD platform becomes an asset or a liability.
The Fundamental Shift: Cloud-Native, Not Lift-and-Shift
The most common mistake in AVD design is treating it as "Citrix in Azure." Organisations take their existing VDI architecture — ExpressRoute backhaul, domain-joined hosts, on-premises file servers for profiles — and recreate it in Azure. This approach delivers the worst of both worlds: cloud costs with on-premises constraints.
A well-designed AVD platform should be cloud-native by default. This means Entra ID-joined session hosts (not hybrid-joined), Azure Files for profile storage (not on-premises file servers accessed over ExpressRoute), and connectivity that doesn't depend on your corporate WAN.
The cloud-native approach eliminates dependencies that create fragility. When your AVD platform doesn't need ExpressRoute to function, you've removed a significant failure domain. When authentication happens against Entra ID rather than domain controllers in a distant data centre, you've improved both resilience and user experience.
This doesn't mean zero connectivity to on-premises resources. Legacy applications still exist, and users need to access them. But that access should be application-level, through services like Zscaler Private Access or Azure Private Link, rather than network-level through broad VPN or ExpressRoute connectivity. The session host itself remains cloud-native; legacy access is an overlay, not a foundation.
Landing Zone Design: Standardisation Over Flexibility
AVD landing zones often suffer from organic growth. Department A gets a subscription with their preferred configuration. Department B gets another with slightly different policies. Before long, you have a fragmented estate where every deployment is a special case.
The solution is to treat AVD landing zones as standardised patterns, enforced through Azure Policy at the management group level. Departments and regions may have separate subscriptions for billing and RBAC purposes, but the technical implementation should be identical and portable.
Security baselines, tagging standards, and RBAC are defined once and enforced at the management group level, so every AVD subscription inherits the same guardrails rather than reinventing them.
The management subscription hosts shared services: orchestration tooling, monitoring, secrets management, and image pipelines. AVD subscriptions consume these services but don't define their own patterns. This separation ensures that when you need to deploy AVD in a new region or for a new business unit, you're instantiating a known pattern rather than designing from scratch.
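As a simple illustration, the pattern itself can be captured as data and each subscription checked against it. The tag names, policy initiative names, and data shape below are placeholders, not a recommended baseline:

```python
# Hypothetical sketch: the landing-zone "pattern" captured as data, so any AVD
# subscription's effective configuration can be diffed against it. All names
# and values are illustrative only.
STANDARD_PATTERN = {
    "required_tags": {"costCentre", "environment", "owner"},
    "policy_initiatives": {"avd-security-baseline", "avd-diagnostics", "avd-allowed-skus"},
}

def deviations(subscription: dict) -> list[str]:
    """Return the ways a subscription deviates from the standard pattern."""
    issues = []
    missing_tags = STANDARD_PATTERN["required_tags"] - set(subscription.get("tags", []))
    if missing_tags:
        issues.append(f"missing tags: {sorted(missing_tags)}")
    missing_policies = STANDARD_PATTERN["policy_initiatives"] - set(subscription.get("policies", []))
    if missing_policies:
        issues.append(f"missing policy assignments: {sorted(missing_policies)}")
    return issues

print(deviations({"tags": ["owner"], "policies": ["avd-security-baseline"]}))
```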
Sizing: The PCU Model
Capacity planning for VDI is notoriously difficult. Traditional approaches often result in either over-provisioning (wasting money) or under-provisioning (degraded user experience). The Peak Concurrent User (PCU) model provides a structured framework for getting it right.
The core formula is straightforward:
PCU = Headcount × ActivityFactor × OverlapFactor × PeakUplift
- **ActivityFactor**: the proportion of users expected to use AVD on any given day. Not everyone logs in every day; contractors might use the platform twice a week, while analysts use it daily.
- **OverlapFactor**: the proportion of daily users online simultaneously at peak. Even if 500 people use the platform in a day, they're not all online at 9:15am.
- **PeakUplift**: typically 1.15–1.25, accounting for short-term spikes beyond normal peaks.
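A worked example with purely illustrative inputs (a 500-person division, 80% daily activity, 60% overlap, 1.2 uplift):

```python
# PCU = Headcount x ActivityFactor x OverlapFactor x PeakUplift
headcount = 500
activity_factor = 0.8    # proportion of users active on a given day
overlap_factor = 0.6     # proportion of daily users online simultaneously at peak
peak_uplift = 1.2        # headroom for short-term spikes beyond normal peaks

pcu = headcount * activity_factor * overlap_factor * peak_uplift
print(f"Peak concurrent users: {pcu:.0f}")   # 288
```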
Once you have PCU, you need to map it to host pool capacity. This requires defining user personas based on workload intensity:
| Persona | Typical Workload | VM SKU | Users per Host |
|---|---|---|---|
| Occasional | Basic productivity, web apps | D8s_v5 (8 vCPU / 32 GB) | 10-12 |
| Regular | Office apps, light data work | D8s_v5 (8 vCPU / 32 GB) | 8-10 |
| High | Analytics, development, heavy apps | D16s_v5 (16 vCPU / 64 GB) | 4-6 |
With personas defined, calculate the base host count for each persona, add a 15% buffer for drain and failover scenarios, and configure autoscale bands accordingly. The goal is three bands: minimum (off-peak), core (matches PCU), and peak (PCU plus buffer).
Critically, pre-warm capacity 30-45 minutes before expected peaks. Users arriving at 9am shouldn't wait for VMs to spin up — the capacity should already be warm and ready.
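A rough sketch of turning PCU into host counts and bands, using the densities from the table above; the persona split and off-peak floor are assumptions you would replace with your own data:

```python
import math

# Illustrative only: convert a PCU figure into per-persona host counts and
# autoscale bands. Shares and densities are assumptions, not recommendations.
pcu = 288
personas = {               # share of PCU, users per host (from the table above)
    "occasional": (0.40, 11),
    "regular":    (0.45, 9),
    "high":       (0.15, 5),
}
buffer = 0.15              # drain/failover headroom

core_hosts = sum(math.ceil(pcu * share / density) for share, density in personas.values())
peak_hosts = math.ceil(core_hosts * (1 + buffer))
minimum_hosts = max(2, math.ceil(core_hosts * 0.2))   # off-peak floor, an assumption

print(f"core={core_hosts}, peak={peak_hosts}, minimum={minimum_hosts}")
```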
Profile Management: FSLogix Done Right
FSLogix profile containers are the standard approach for non-persistent AVD desktops, but implementation details matter enormously. Poor FSLogix design manifests as slow logins, profile corruption, and frustrated users.
Storage selection is the first decision. Azure Files Premium with Entra Kerberos authentication is the baseline recommendation. It supports Entra-joined hosts without domain controllers, delivers predictable performance, and keeps traffic on the Azure backbone via Private Endpoints. Azure NetApp Files offers better raw performance but requires AD DS integration, which conflicts with a cloud-native identity strategy.
Sizing for logon storms is critical. The worst-case scenario for profile storage isn't steady-state usage — it's 8:55am when everyone logs in simultaneously. Profile attach times should meet a p95 target of under 15 seconds and p99 of under 25 seconds. If your storage can't sustain this during peak logon, users will notice.
Container sizing varies by user type. Standard users typically need 10–20 GB; heavy Outlook or reporting users may need 20–40 GB. Enable OneDrive Files On-Demand to reduce storage requirements, and configure exclusions for caches that don't need to roam.
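A back-of-the-envelope capacity calculation for the profile share might look like the following; the user mix and growth factor are assumptions to validate against real profile data:

```python
import math

# Rough capacity sketch for the profile share, using the container sizes above.
# User counts and the growth factor are illustrative assumptions.
standard_users, heavy_users = 400, 100
standard_gb, heavy_gb = 20, 40           # upper end of the per-user ranges
growth_factor = 1.3                      # headroom for profile growth over time

required_gb = (standard_users * standard_gb + heavy_users * heavy_gb) * growth_factor
print(f"Provision roughly {math.ceil(required_gb):,} GiB of premium file share capacity")
```

On premium file shares, provisioned capacity also drives baseline IOPS, so sizing with headroom helps absorb the logon storm as well as raw growth.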
Resilience requires multiple layers. Enable soft delete on storage accounts to protect against accidental or malicious deletion. Configure Azure Backup with both snapshots (for fast operational recovery) and vaulted backups (for retention and regional DR). For mission-critical deployments, consider warm standby shares in a paired region with scheduled AzCopy synchronisation.
Application Delivery: MSIX App Attach as Default
Application delivery strategy determines how clean your golden images stay and how quickly you can respond to application changes. MSIX App Attach should be the default model for AVD.
The principle is separation: the golden image contains the operating system, core productivity tools, and mandatory agents. Everything else — line-of-business applications — is delivered dynamically through App Attach. Applications are stored in containers (VHD/VHDX/CIM format) on Azure Files, mounted at user logon, and presented to users based on Entra ID group membership.
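Conceptually, that assignment is just a catalogue of packages keyed by Entra ID group, and it is worth keeping as reviewable data alongside the rest of the platform configuration. A minimal sketch with made-up package, share, and group names:

```python
# Hypothetical catalogue mapping app attach packages to Entra ID groups.
# Package names, share paths, and group names are illustrative only.
APP_CATALOGUE = [
    {"package": "ContosoERP_3.2.1.cim", "share": "\\\\storage\\appattach", "entra_group": "app-erp-users"},
    {"package": "ReportBuilder_11.cim", "share": "\\\\storage\\appattach", "entra_group": "app-reporting-users"},
]

def packages_for(user_groups: set[str]) -> list[str]:
    """Resolve which packages a user should see, based on group membership."""
    return [a["package"] for a in APP_CATALOGUE if a["entra_group"] in user_groups]

print(packages_for({"app-erp-users", "baseline-avd-users"}))
```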
The benefits compound over time. Image updates no longer require coordination with application owners — the OS and apps have independent lifecycles. Application updates can be staged alongside existing versions and cut over without host redeployment. The same MSIX packages created for Intune deployment to physical devices work in AVD, enabling genuine "build once, deploy anywhere" consistency.
Not every application converts cleanly to MSIX. Applications with kernel drivers, complex licensing, or deep system integration may need to be baked into the golden image as exceptions. But these should be documented exceptions with justification, not the default approach.
One critical limitation: user-driven installs from Intune Company Portal don't work in pooled AVD. Company Portal assumes a 1:1 user-to-device relationship, which doesn't exist in multi-session environments. All application delivery must be either machine-assigned (in the image) or dynamically attached (MSIX App Attach).
Image Lifecycle: Rings and Immutability
Golden images should be immutable and versioned. An image, once built and validated, should never be modified in place. Changes result in new versions, and those versions flow through validation rings before reaching production.
A practical ring structure for AVD images:
| Ring | Purpose | Population | Promotion Criteria |
|---|---|---|---|
| Insider | Early validation of upcoming changes | IT volunteers, nominated testers | Build completes, basic smoke tests pass |
| Pilot | Broader validation with real workloads | Representatives from each business function | 1 week stable, no blocking issues |
| Production | General availability | All users | Pilot sign-off, CAB approval |
Each month, a new image is built with cumulative updates, driver updates, and agent updates. This image enters the Insider ring for initial validation. If no blocking issues emerge after a week, it promotes to Pilot. After another week of stable operation with broader user representation, it promotes to Production.
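The promotion gate itself is simple enough to express in a few lines. This sketch assumes the soak periods from the table above and a hypothetical count of blocking issues per ring:

```python
from datetime import date, timedelta

# Sketch of the promotion gate between rings; soak periods mirror the table
# above, and the inputs are hypothetical.
SOAK_DAYS = {"insider": 7, "pilot": 7}

def ready_to_promote(ring: str, deployed_on: date, blocking_issues: int, today: date) -> bool:
    """An image version may promote once it has soaked with no blocking issues."""
    soaked = today - deployed_on >= timedelta(days=SOAK_DAYS[ring])
    return soaked and blocking_issues == 0

print(ready_to_promote("insider", date(2024, 6, 3), 0, date(2024, 6, 11)))  # True
```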
The Insider ring deserves particular attention. Access should be controlled through Entra ID Access Packages, ensuring only nominated testers can access Insider hosts. This prevents accidental assignment and provides auditable governance. Each user group added to AVD should nominate at least two members for Insider testing — this ensures coverage while keeping the blast radius contained.
Azure Compute Gallery provides the versioning infrastructure. Images are stored with version numbers, and host pools reference specific versions. Rollback is straightforward: point the host pool back to the previous version and reimage hosts.
Orchestration: Nerdio as the Operational Layer
Native Azure tooling for AVD is comprehensive but fragmented. Managing images, host pools, autoscaling, and day-to-day operations across the Azure portal, PowerShell, and various APIs creates operational overhead and inconsistency.
Nerdio Manager for Enterprise consolidates these operations into a single interface while keeping Azure as the system of record. It doesn't replace Azure constructs — it orchestrates them. Host pools are still Azure resources; images still live in Azure Compute Gallery; autoscaling still uses Azure's native capabilities. Nerdio provides the workflow layer that makes operating these components manageable at scale.
Key capabilities to leverage:
- **Image lifecycle management**: build, test, and promote images through defined rings with approval gates. Automate the monthly build process while retaining manual promotion control.
- **Autoscaling**: configure scale bands (minimum/core/peak) aligned to the PCU model, pre-warm capacity before peaks, and drain hosts gracefully before shutdown.
- **Host operations**: drain, reimage, resize, and rebuild hosts through standardised workflows with audit trails, rather than ad-hoc PowerShell scripts with inconsistent outcomes.
- **Cost visibility**: track cost per concurrent user, surface right-sizing recommendations, and integrate with FinOps dashboards.
The critical principle is that Nerdio configuration should be treated as code. Export configurations regularly, store them in source control, and document the native Azure equivalents. This prevents lock-in and ensures you can rebuild the orchestration layer if needed.
Security: Zero Trust as Foundation
AVD security should assume breach. Every component — session hosts, storage, management plane — should be secured independently, with no implicit trust based on network location.
Session hosts should have no inbound ports exposed. AVD uses reverse connect — the host initiates the connection to Microsoft's control plane, and user traffic flows back through that established channel. There's no technical reason to allow RDP inbound to session hosts, and doing so creates unnecessary attack surface.
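The corresponding control is easy to audit. A generic sketch that flags inbound RDP allows in a set of NSG rules, represented here as plain dicts rather than SDK objects:

```python
# Flag any rule that allows inbound RDP to session hosts. The rule shape is a
# simplified illustration, not the Azure SDK's object model.
def risky_rdp_rules(rules: list[dict]) -> list[str]:
    return [
        r["name"] for r in rules
        if r["direction"] == "Inbound"
        and r["access"] == "Allow"
        and r["port"] in ("3389", "*")
    ]

rules = [
    {"name": "AllowRDPFromCorp", "direction": "Inbound", "access": "Allow", "port": "3389"},
    {"name": "AllowAVDOutbound", "direction": "Outbound", "access": "Allow", "port": "443"},
]
print(risky_rdp_rules(rules))   # ['AllowRDPFromCorp'] should be removed
```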
Storage should be accessed exclusively through Private Endpoints. This keeps traffic on the Azure backbone and eliminates exposure to the public internet. Authentication uses Entra Kerberos, removing the need for domain controllers in Azure.
Secrets belong in Key Vault with Private Endpoints. Automation should use Managed Identities rather than service principals with stored credentials. No credentials should be embedded in scripts, images, or configuration.
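With the Azure SDK for Python, that pattern is only a few lines; the vault URL and secret name below are placeholders:

```python
# DefaultAzureCredential picks up a managed identity on the session host or
# automation runner, so no credential is stored anywhere.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()
client = SecretClient(vault_url="https://contoso-avd-kv.vault.azure.net", credential=credential)
secret = client.get_secret("example-secret-name")   # retrieved at runtime, never embedded
```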
Conditional Access gates entry to the platform. Require MFA, enforce device compliance, block legacy authentication, and consider session controls for sensitive workloads.
Privileged access should follow least-privilege principles. Helpdesk staff don't need local admin on session hosts — they need scoped permissions to drain hosts and restart sessions. Administrative access to the management plane should require PIM activation with justification and time limits.
Monitoring: Observability That Drives Action
Monitoring should answer two questions: "Is the platform healthy?" and "Are users having a good experience?" The first is necessary but insufficient; the second is what actually matters.
Platform health metrics include host availability, autoscale events, storage performance, and agent status. These belong in Azure Monitor with alerts configured for anomalies.
User experience metrics require more nuance. Track login times (target p95 under 30 seconds to interactive desktop), FSLogix attach times (target p95 under 15 seconds), and session reconnect success rates (target above 99%). These metrics should be visualised in dashboards that operations teams review daily.
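Those targets are easy to encode as a recurring check. A minimal sketch using nearest-rank percentiles over a sample window (the numbers are invented):

```python
import math

# Compute p95/p99 from a window of FSLogix attach-time samples and compare
# against the targets above. Sample values are illustrative.
def percentile(samples: list[float], p: float) -> float:
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[idx]

attach_seconds = [6.2, 7.1, 5.8, 9.4, 14.0, 8.3, 13.5, 7.7, 6.9, 11.2]
p95, p99 = percentile(attach_seconds, 95), percentile(attach_seconds, 99)
print(f"FSLogix attach p95={p95}s (target <15s), p99={p99}s (target <25s)")
```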
The goal is proactive identification of degradation before users report it. If FSLogix attach times are trending upward over several days, you want to investigate before they breach SLA. If autoscale is consistently hitting peak capacity, you want to increase headroom before users experience session queuing.
Conclusion
AVD design is fundamentally about making deliberate choices rather than accepting defaults. Cloud-native identity over domain join. Standardised landing zones over organic growth. Immutable images with ring-based promotion over ad-hoc updates. MSIX App Attach over image bloat. These choices compound — each one makes the platform more manageable, more resilient, and more cost-effective.
The principles in this post aren't theoretical. They emerge from implementations where the alternative approaches created pain: fragmented estates that couldn't be governed consistently, login storms that brought platforms to their knees, image updates that required weekends of coordination, and security postures that assumed a network perimeter would compensate for inadequate host configuration.
AVD done well is genuinely transformative — a flexible, scalable, secure desktop platform that adapts to business needs rather than constraining them. AVD done poorly is Citrix with a different logo and a larger bill. The difference lies in the design decisions made before the first host is deployed.
If you're planning an AVD implementation or migrating from legacy VDI, I'd be interested to hear about your constraints and challenges. The principles are consistent, but the application always depends on context.