Migrating an AWS S3 File Gateway Without Breaking User Drive Mappings

Replacing an AWS S3 File Gateway looks simple until the gateway also provides a business-critical SMB drive to a fleet of Windows desktops. The storage backend may be S3, but users still depend on a stable drive letter, directory-based authentication, local cache behavior, and a predictable cutover.

This article describes a migration pattern that keeps the S3 data intact, replaces the gateway appliance cleanly, and automates the Windows drive remapping with AWS Systems Manager.

The migration goal

The environment had four requirements:

Replace a legacy gateway before its support deadline
Keep the existing S3 objects and SMB share name
Preserve the user-facing drive letter
Avoid manually reconfiguring every Windows desktop

The final architecture was:

Windows desktops
      |
      | SMB with directory-based authentication
      v
New S3 File Gateway
      |
      | Private VPC endpoints
      v
Existing S3 bucket

The gateway instance and cache were replaced. The S3 bucket remained the source of truth.

Protect the rollback path first

Before changing the old gateway, collect enough evidence to recover or explain the state later:

Export the gateway, SMB, and file-share configuration
Disable delete-on-termination for the attached volumes
Take snapshots of the system and cache volumes
Wait for every snapshot to complete
Confirm that the CachePercentDirty CloudWatch metric has reached zero

The dirty-cache check matters more than a simple “running” status. A clean cache means there are no pending local writes that still need to reach S3.
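
The dirty-cache gate can be scripted instead of read off a console graph. The following is a minimal sketch, assuming boto3 is installed and credentials are configured. The helper names cache_is_clean and fetch_dirty_datapoints are hypothetical, the gateway ID and name are placeholders, and the dimension pair should be checked against the metrics listed in your own account. An empty result is treated as unknown, not as clean.

```python
from datetime import datetime, timedelta, timezone


def cache_is_clean(datapoints, min_points=3):
    """Return True only if enough datapoints exist and every one is zero.

    `datapoints` has the shape of CloudWatch GetMetricStatistics output:
    a list of dicts with a "Maximum" key. A short or empty list means the
    state is unknown, so it is reported as not clean.
    """
    if len(datapoints) < min_points:
        return False
    return all(point["Maximum"] == 0 for point in datapoints)


def fetch_dirty_datapoints(gateway_id, gateway_name, minutes=30):
    """Fetch recent CachePercentDirty datapoints (requires boto3)."""
    import boto3  # imported lazily so the pure check above has no dependency

    cloudwatch = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/StorageGateway",
        MetricName="CachePercentDirty",
        # Assumption: the gateway metrics are published with these dimensions.
        Dimensions=[
            {"Name": "GatewayId", "Value": gateway_id},
            {"Name": "GatewayName", "Value": gateway_name},
        ],
        StartTime=now - timedelta(minutes=minutes),
        EndTime=now,
        Period=300,
        Statistics=["Maximum"],
    )
    return response["Datapoints"]
```

Using the maximum over several consecutive periods, not a single latest value, avoids declaring the cache clean during a brief pause between writes.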

Why reusing the old disks was abandoned

An early attempt attached the legacy system and cache disks to a replacement EC2 instance. The network path looked healthy, because the expected TCP ports accepted connections, but the appliance itself was not:

The activation endpoint returned an empty HTTP response
The browser showed ERR_EMPTY_RESPONSE
SSH exposed a server banner but stalled during key exchange

This is an important diagnostic boundary:

Open TCP port != healthy appliance service

Continuing to debug a legacy boot disk would have increased the outage window without improving the data recovery position. Because the durable files were already in S3 and the dirty cache was zero, a clean gateway deployment was the lower-risk path.
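
The difference between an open port and a working service is easy to check in code. The sketch below uses only the Python standard library; the function names are hypothetical, and the host and port are whatever endpoint is being diagnosed. The first function only proves that a TCP handshake completes. The second requires a complete HTTP response, which is the check the legacy appliance failed.

```python
import http.client
import socket


def tcp_port_open(host, port, timeout=3.0):
    """True if a TCP connection can be established; says nothing about the service."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_responds(host, port, path="/", timeout=3.0):
    """True only if the service returns a parseable HTTP response."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return True
    except (OSError, http.client.HTTPException):
        # Covers resets, timeouts, and an empty reply from the server.
        return False
    finally:
        conn.close()
```

A host where tcp_port_open succeeds and http_responds fails matches the symptoms listed above: the listener exists, but the application behind it is not answering.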

Build a clean gateway against the same S3 data

The successful approach used a current Storage Gateway image and new local disks:

Deploy a new S3 File Gateway.
Connect it through the required public or VPC-hosted service endpoint.
Allocate new cache storage.
Join the existing directory service.
Create an SMB share backed by the existing S3 location.
Reapply logging, alarms, IAM, security groups, and instance protections.
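
The share-creation step is the one that ties the new gateway to the existing data, so it helps to see which values carry over. This is a hedged sketch, not the exact call used in the migration: build_smb_share_request and create_share are hypothetical helpers, all ARNs and the share name are placeholders, and boto3 is assumed to be installed. Only the bucket location and the share name are reused from the old setup; the gateway ARN is new.

```python
import uuid


def build_smb_share_request(gateway_arn, role_arn, bucket_arn, share_name):
    """Build keyword arguments for Storage Gateway CreateSMBFileShare."""
    return {
        "ClientToken": str(uuid.uuid4()),  # idempotency token for retries
        "GatewayARN": gateway_arn,         # the NEW gateway
        "Role": role_arn,                  # IAM role the gateway assumes for S3
        "LocationARN": bucket_arn,         # the EXISTING bucket (or bucket/prefix)
        "FileShareName": share_name,       # same share name users already know
        "Authentication": "ActiveDirectory",
    }


def create_share(request):
    """Submit the request (requires boto3 and a directory-joined gateway)."""
    import boto3  # imported lazily; not needed to build or review the request

    client = boto3.client("storagegateway")
    return client.create_smb_file_share(**request)["FileShareARN"]
```

Building the request separately from sending it makes it possible to review the payload against the exported legacy configuration before anything is created.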

The local cache does not need to be copied to preserve S3 objects. The new gateway rebuilds its cache as files are accessed.

Automate the Windows drive migration with SSM

On Windows, AWS Systems Manager runs commands as the local SYSTEM account, while a mapped network drive belongs to the logon session of the user who created it. A command that runs net use as SYSTEM can therefore succeed without making the drive visible to the signed-in user.

The reliable pattern is:

Use SSM State Manager to install a PowerShell script on each managed Windows node.
Register the script in an HKLM Run entry, which Windows launches in each user’s own context at sign-in.
Execute the actual mapping when the user signs in.
Write logs to the user’s %LOCALAPPDATA%, not to a protected system directory.
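
The association that registers the Run entry might look like the sketch below. It is an illustration under stated assumptions: the helper names, the association name, the registry value name MapGatewayDrive, the tag used for targeting, and the script path are all hypothetical, and the command that copies the script file onto the node is left out. The 30-minute rate is only an example interval.

```python
RUN_KEY = r"HKLM:\Software\Microsoft\Windows\CurrentVersion\Run"


def build_association_request(script_path, tag_key, tag_value,
                              schedule="rate(30 minutes)"):
    """Build keyword arguments for SSM CreateAssociation."""
    # Command line that Windows runs in the user's session at sign-in.
    launch = (
        "powershell.exe -NoProfile -ExecutionPolicy Bypass "
        f'-WindowStyle Hidden -File "{script_path}"'
    )
    # PowerShell executed by SSM as SYSTEM: it only registers the entry,
    # it does not map the drive itself.
    register_run_entry = (
        f"Set-ItemProperty -Path '{RUN_KEY}' "
        f"-Name 'MapGatewayDrive' -Value '{launch}'"
    )
    return {
        "Name": "AWS-RunPowerShellScript",
        "AssociationName": "map-gateway-drive",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "ScheduleExpression": schedule,
        "Parameters": {"commands": [register_run_entry]},
    }


def apply_association(request):
    """Create the association (requires boto3)."""
    import boto3  # imported lazily so the request can be built without it

    response = boto3.client("ssm").create_association(**request)
    return response["AssociationDescription"]["AssociationId"]
```

The split of responsibilities is the point: SYSTEM writes the machine-wide registry value, and the mapping itself is deferred to the user's sign-in.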

The login script should be conservative:

If the drive letter is unused, map the new share.
If it points to the legacy share, replace it.
If it is a local disk or an unrelated network share, do nothing.

This prevents a migration script from overwriting an unrelated user configuration.
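
The three rules reduce to a small decision function. The sketch below expresses only the logic, in Python so that it can be tested in isolation; the sign-in script itself would be PowerShell, and the names Action, decide_action, and the drive_kind values are hypothetical. UNC paths are compared case-insensitively and without trailing backslashes.

```python
from enum import Enum


class Action(Enum):
    MAP = "map"          # drive letter is free: map the new share
    REPLACE = "replace"  # drive points at the legacy share: swap it
    SKIP = "skip"        # anything else: leave the user's setup alone


def _normalize(unc_path):
    """Lower-case a UNC path and drop trailing backslashes for comparison."""
    return unc_path.rstrip("\\").lower()


def decide_action(drive_kind, remote_path, legacy_unc):
    """Decide what the sign-in script should do with the drive letter.

    drive_kind:  "unused", "network", or "local"
    remote_path: UNC path currently mapped, or None
    legacy_unc:  UNC path of the old gateway share
    """
    if drive_kind == "unused":
        return Action.MAP
    if drive_kind == "network" and remote_path is not None:
        if _normalize(remote_path) == _normalize(legacy_unc):
            return Action.REPLACE
    # Local disks, unrelated shares, and unknown states are never touched.
    return Action.SKIP
```

SKIP is the default branch, so any state the script does not recognize results in no change.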

Treat offline desktops as a normal state

Many desktop nodes will be stopped when a State Manager association runs. SSM reports those targets as Undeliverable, which is expected for offline instances.

Use a scheduled association so that configuration is retried after a desktop starts. The user communication should include:

A realistic propagation window based on the association interval
The expected drive label and path
A sign-out and sign-in fallback
A request for the execution time, screenshot, and local log if the mapping still fails

This turns an infrastructure rollout into an operationally supportable user change.

Validate the complete path

Do not stop after the gateway reports RUNNING. Validate from a Command Prompt on a normal user desktop, using the mapped drive letter (E: in this example):

echo gateway-test > E:\gateway-test.txt
type E:\gateway-test.txt
del E:\gateway-test.txt

Also verify:

The expected drive appears in File Explorer
Existing folders are visible
Create, read, update, and delete operations work
The SMB session points to the new gateway
Gateway and file-share logs are receiving events
Availability and dirty-cache alarms are healthy
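
For repeated checks across several desktops, the create, read, update, and delete sequence can be wrapped in one function. This is a minimal sketch using only the Python standard library; crud_smoke_test is a hypothetical name, and it assumes Python is available on the test desktop. It uses a unique file name so that it cannot collide with user data, and it always removes the file it created.

```python
import os
import uuid


def crud_smoke_test(directory):
    """Create, read, update, and delete a scratch file; True if every step worked."""
    path = os.path.join(directory, f"gateway-test-{uuid.uuid4().hex}.txt")
    with open(path, "w", encoding="utf-8") as handle:      # create
        handle.write("gateway-test\n")
    try:
        with open(path, "r", encoding="utf-8") as handle:  # read
            created_ok = handle.read() == "gateway-test\n"
        with open(path, "a", encoding="utf-8") as handle:  # update
            handle.write("updated\n")
        with open(path, "r", encoding="utf-8") as handle:
            updated_ok = handle.read() == "gateway-test\nupdated\n"
    finally:
        os.remove(path)                                     # delete
    return created_ok and updated_ok and not os.path.exists(path)
```

On a desktop where the share is mapped as E:, the call would be crud_smoke_test("E:\\"). Any permission or connectivity failure surfaces as an exception, not as a silent False.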

Retire the old stack in dependency order

After the new path is verified, remove the old resources in a controlled sequence:

Delete the old SMB file share without forcing deletion, so that any remaining uploads to S3 can finish.
Delete the old gateway.
Terminate the old EC2 instance.
Inventory unattached EBS volumes.
Delete only volumes that are confirmed unused and have the required snapshots.

Keep the new gateway’s active system and cache volumes explicitly excluded from cleanup.
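
The volume cleanup rule can be written as a filter that only ever returns candidates and never deletes anything itself. The sketch below is illustrative: volumes_safe_to_delete is a hypothetical helper, the input dicts follow the shape of EC2 DescribeVolumes output, and the two ID sets are assumed to be prepared by the operator from completed snapshots and from the new gateway's attached volumes.

```python
def volumes_safe_to_delete(volumes, snapshotted_volume_ids, protected_volume_ids):
    """Return IDs of volumes that meet every cleanup condition.

    volumes:                list of dicts with "VolumeId", "State", "Attachments"
    snapshotted_volume_ids: set of volume IDs that have a completed snapshot
    protected_volume_ids:   set of volume IDs that must never be deleted
    """
    candidates = []
    for volume in volumes:
        volume_id = volume["VolumeId"]
        if volume_id in protected_volume_ids:
            continue  # explicitly excluded, e.g. the new gateway's disks
        if volume["State"] != "available" or volume.get("Attachments"):
            continue  # still attached or in transition
        if volume_id not in snapshotted_volume_ids:
            continue  # no rollback copy yet
        candidates.append(volume_id)
    return candidates
```

Keeping the protected set as an explicit input means a mistake in the other two conditions still cannot select the active gateway's system or cache volume.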

Reusable lessons

S3 is the durable data layer; gateway cache is replaceable when dirty writes are zero.
Network reachability is not application health.
A clean appliance replacement can be safer than reviving an old boot disk.
User-visible drive mappings must be created in the user session.
SSM scheduled associations handle intermittent desktop availability better than a one-time command.
Monitoring, user communication, and cleanup order are part of the migration design, not post-migration chores.