The missing manual for storage APIs.
Every array speaks REST with a different accent: its own login handshake, its own pagination dialect, its own idea of what a gigabyte is. Vendor docs describe the happy path. This guide documents how these APIs behave in production: the auth recipes, the errors, and the gotchas that cost real change windows.
This site is two learning tracks and a toolbox. The API track covers how to connect to 21 enterprise platforms, with production-depth recipes for nine. The replication track runs from RPO/RTO theory through eleven vendor deep dives to a CLI Command Atlas and an interactive simulator that speaks six vendor dialects. The toolbox holds the calculators and the bulk SAN Zoning Studio.
Talk to any array in ten minutes
Every modern enterprise array ships a REST API: an HTTPS endpoint that answers JSON. The GUI you click is usually just a customer of that same API. To automate anything — monitoring, provisioning, reporting — you need exactly three facts per platform: where it listens (base URL and port), how it authenticates, and how it pages large results. Everything else is reading the reference.
The four auth patterns — learn 4, unlock 20+
Twenty vendors did not invent twenty schemes. Every storage API in this guide authenticates in one of four ways:
| Pattern | How it works | Platforms using it |
|---|---|---|
| 1 · Basic per-request | Send credentials (or a client cert) with every call. No session state. Simplest to script; scope the account read-only. | ONTAP, PowerMax (Unisphere), Nutanix, Cisco NX-API |
| 2 · Token exchange | One login call trades credentials or an API token for a short-lived session token you send as a header. On 401, re-login and retry. | Pure FA/FlashBlade, IBM FlashSystem/SVC, Qumulo, VAST, Cohesity, Rubrik, StorageGRID, ECS, Data Domain, Nimble |
| 3 · CSRF session | Basic login creates a cookie session; reads work immediately, but mutations also need an anti-forgery token harvested from a response header. | Unity (EMC-CSRF-TOKEN), PowerStore (DELL-EMC-TOKEN), PowerScale (X-CSRF-Token) |
| 4 · Finite sessions | Login returns a session from a limited pool. Works like pattern 2 — until leaked sessions exhaust the pool and the array refuses logins. Always log out. | Hitachi VSP (CM REST), HPE 3PAR/Primera WSAPI, Brocade FOS REST |
The connect matrix — 21 platforms
Each row: the port, the exact login call, the header that carries your identity afterwards, and where the platform's full endpoint reference lives. The nine platforms with a ▸ have full deep-dive tabs in the next section — auth recipes, pagination code, and field gotchas.
| Platform | Port · base | Login | Then send | Full endpoint reference |
|---|---|---|---|---|
| ▸ Pure FlashArray | 443 · /api/2.x | POST /api/2.x/login + api-token header | x-auth-token | Purity REST API Reference on Pure Support (support.purestorage.com); token setup: Settings → Users |
| Pure FlashBlade | 443 · /api | POST /api/login + api-token header | x-auth-token | FlashBlade REST API Reference on Pure Support |
| ▸ NetApp ONTAP | 443 · /api | Basic per request (or cert) | same | on the cluster itself: https://CLUSTER/docs/api (Swagger, exact to your version) |
| NetApp StorageGRID | 443 · /api/v3 | POST /api/v3/authorize {username,password} | Authorization: Bearer | Grid Management API docs, linked from the Grid Manager UI help |
| ▸ Dell Unity | 443 · /api/types | Basic + X-EMC-REST-CLIENT: true | cookies; writes add EMC-CSRF-TOKEN | Unisphere Mgmt REST API Programmer's + Reference Guides (developer.dell.com / Dell Support) |
| ▸ Dell PowerMax | 8443 · /univmax/restapi/{ver} | Basic per request (to Unisphere) | same | Unisphere REST API docs on developer.dell.com |
| ▸ Dell PowerStore | 443 · /api/rest | Basic → GET /api/rest/login_session | cookies; writes add DELL-EMC-TOKEN | PowerStore REST API Reference on developer.dell.com |
| ▸ Dell PowerScale | 8080 · /platform/{n} | POST /session/1/session {username,password,services} | isisessid cookie; writes add X-CSRF-Token | OneFS API Reference on Dell Support (per OneFS release) |
| Dell ECS | 4443 · mgmt API | GET /login with Basic | X-SDS-AUTH-TOKEN (from response header) | ECS Management REST API Reference on Dell Support |
| Dell Data Domain | 3009 · /rest/v1.0 | POST /rest/v1.0/auth {auth_info:{username,password}} | X-DD-AUTH-TOKEN | DD OS REST API Guide on Dell Support |
| ▸ IBM FlashSystem/SVC | 7443 · /rest | POST /rest/auth + X-Auth-Username/-Password headers | X-Auth-Token | REST API section of IBM Storage Virtualize docs (ibm.com/docs); endpoints mirror CLI names |
| ▸ Hitachi VSP | 23451 · /ConfigurationManager/v1 | POST …/sessions with Basic | Authorization: Session <token> — and DELETE it after | Hitachi Ops Center API / CM REST reference (docs.hitachivantara.com) |
| HPE 3PAR / Primera / Alletra 9000 | 8080/8443 · /api/v1 | POST /api/v1/credentials {user,password} | X-HP3PAR-WSAPI-SessionKey — DELETE the key to log out | WSAPI Developer Guide on HPE Support Center |
| HPE Nimble / Alletra 6000 | 5392 · /v1 | POST /v1/tokens {data:{username,password}} | X-Auth-Token | Nimble REST API Reference on HPE InfoSight / Support |
| ▸ Nutanix Prism | 9440 · v2 GET / v3 POST | Basic per request | same | on Prism itself: REST API Explorer (gear menu); dev docs at nutanix.dev |
| Qumulo | 8000 · /v1 | POST /v1/session/login {username,password} | Authorization: Bearer | on the cluster: interactive API docs in the Web UI (API & Tools) |
| VAST Data | 443 · /api | POST /api/token/ {username,password} → JWT | Authorization: Bearer (refresh token included) | VMS REST docs served by the VMS itself; support.vastdata.com |
| Cohesity | 443 · /irisservices/api/v1 · v2 /v2 | POST …/public/accessTokens {username,password,domain} | Authorization: Bearer | Cohesity REST API docs, linked from the cluster UI and developer.cohesity.com |
| Rubrik | 443 · /api/v1 | POST /api/v1/session with Basic | Authorization: Bearer | Rubrik API Playground on the cluster; docs on the Rubrik support portal |
| Brocade FOS | 443 · /rest | POST /rest/login with Basic | session key returned in the Authorization response header — reuse verbatim; POST /rest/logout when done (finite sessions) | FOS REST API Reference on Broadcom support |
| Cisco MDS | 443/8443 · /ins (NX-API) | Basic per request; body carries the CLI: {"ins_api":{…,"input":"show zoneset active"}} | same | on the switch: NX-API sandbox at https://SWITCH/ once feature nxapi is enabled |
Nine platforms, in production depth
For the nine platforms below, the connect matrix expands into working recipes: full auth flows, pagination loops in curl and Python, error semantics, and the field gotchas that cost real change windows. The matrix that follows is the skeleton; each vendor tab expands its row into working commands.
| Platform | Base path | Login | Mutations need | Pagination dialect |
|---|---|---|---|---|
| Pure FlashArray | /api/2.x/… | API token → x-auth-token | same token | limit + continuation_token |
| NetApp ONTAP | /api/… | Basic / cert per request | same | max_records + follow _links.next |
| Dell Unity | /api/types/{r}/instances | Basic + X-EMC-REST-CLIENT | EMC-CSRF-TOKEN from a GET | page/per_page, entries[].content |
| Dell PowerMax | /univmax/restapi/{ver}/… | Basic (to Unisphere) | same | iterator handle for large sets |
| Dell PowerStore | /api/rest/{r} | Basic → session | DELL-EMC-TOKEN | limit/offset, 206 + content-range |
| PowerScale / Isilon | /platform/{n}/… | session → isisessid | X-CSRF-Token | resume= token replaces the query |
| Nutanix Prism | :9440 /api/nutanix/v3 | Basic | same | v3 list = POST with length/offset |
| IBM FlashSystem/SVC | :7443 /rest/ls… | /rest/auth → X-Auth-Token | same token | CLI-mirrored; even list calls are POSTs |
| Hitachi VSP | /ConfigurationManager/v1/… | POST …/sessions → Session token | same — and DELETE the session | count/range params per object |
Pure Storage FlashArray REST
| Generations | REST 2.x (current, versioned per Purity release) and REST 1.x (legacy, still enabled on many arrays) |
|---|---|
| Auth model | Per-user API token, generated in the GUI (Settings → Users → API Token) or CLI (pureadmin create --api-token). The token inherits the user's role — a read-only user's token stays read-only. |
| Session | 2.x: exchange the API token for a short-lived x-auth-token. 1.x: POST the token to /auth/session for a cookie session. |
| Discover versions | GET https://array/api/api_version — no auth needed; returns every REST version the array supports. |
Auth recipe — REST 2.x
    # 1. Exchange the API token for a session token
    curl -sk -X POST "https://ARRAY/api/2.4/login" \
      -H "api-token: YOUR-API-TOKEN" -D -
    # → response HEADER contains: x-auth-token: <session-token>

    # 2. Use the session token on every subsequent call
    curl -sk "https://ARRAY/api/2.4/volumes?limit=100" \
      -H "x-auth-token: SESSION-TOKEN"
Auth recipe — REST 1.x (legacy arrays)
    # Cookie-based session; keep the cookie jar
    curl -sk -c cookies.txt -X POST "https://ARRAY/api/1.17/auth/session" \
      -H "Content-Type: application/json" \
      -d '{"api_token": "YOUR-API-TOKEN"}'
    curl -sk -b cookies.txt "https://ARRAY/api/1.17/volume"
Pagination — 2.x
    # Page with limit + continuation_token until more_items_remaining is false
    GET /api/2.4/volumes?limit=500
    # response: { "items": [...], "more_items_remaining": true,
    #             "continuation_token": "abc..." }
    GET /api/2.4/volumes?limit=500&continuation_token=abc...
Python — minimal collector loop
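    import requests

    ARRAY = "array.example.com"   # placeholder: your array's FQDN or IP
    TOKEN = "YOUR-API-TOKEN"      # placeholder: the user's API token

    s = requests.Session()
    s.verify = False              # lab shortcut; use your CA bundle in production
    r = s.post(f"https://{ARRAY}/api/2.4/login", headers={"api-token": TOKEN})
    r.raise_for_status()
    s.headers["x-auth-token"] = r.headers["x-auth-token"]

    items, tok = [], None
    while True:
        # Send continuation_token only after the first page has returned one
        p = {"limit": 500, **({"continuation_token": tok} if tok else {})}
        j = s.get(f"https://{ARRAY}/api/2.4/volumes", params=p).json()
        items += j["items"]
        if not j.get("more_items_remaining"):
            break
        tok = j["continuation_token"]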
HTTP status semantics (REST 1.x/2.x)
| 200 | Success. |
|---|---|
| 400 | Invalid action or missing/invalid data — read the response body; Purity's error text is specific. |
| 401 | Session not created or expired. Re-login and retry — build this into every collector. |
| 403 | Authenticated but not authorized (e.g., a read-only token attempting a POST). |
| 404 / 405 | Bad URI / method not valid for that URI. |
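The 401 row is worth encoding once instead of in every call site. The sketch below is one way to do that, under stated assumptions: `send` and `login` are hypothetical callables supplied by the collector (they are not part of any Pure SDK), where `send(token)` returns an HTTP status plus a parsed body and `login()` returns a fresh session token. Stand-in functions replace the array so the example runs without a network.

```python
from typing import Callable, Tuple


def call_with_relogin(
    send: Callable[[str], Tuple[int, dict]],
    login: Callable[[], str],
    token: str,
    max_relogins: int = 1,
) -> Tuple[str, dict]:
    """Run send(token); on 401, log in again and retry a bounded number of times.

    Returns the token that finally worked plus the response body, so the
    caller can keep using the refreshed token.
    """
    for attempt in range(max_relogins + 1):
        status, body = send(token)
        if status != 401:
            if status >= 400:
                # 400/403/404/405 are not cured by a new session: surface them.
                raise RuntimeError(f"HTTP {status}: {body}")
            return token, body
        if attempt < max_relogins:
            token = login()  # session expired: obtain a new session token
    raise RuntimeError("still unauthorized after re-login")


# Hypothetical stand-ins for the array.
def fake_login() -> str:
    return "fresh"


def fake_send(token: str) -> Tuple[int, dict]:
    return (200, {"items": ["vol1"]}) if token == "fresh" else (401, {})


used_token, result = call_with_relogin(fake_send, fake_login, token="stale")
```

Bounding the number of re-logins matters: a revoked API token would otherwise turn every expired session into an endless login loop.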
Field notes — the gotchas
- 1G means GiB (2³⁰), not GB. If your reporting layer assumes decimal, capacity will silently disagree with the GUI by ~7%.
- Check the destroyed flag or your inventory counts will drift.

NetApp ONTAP REST
| Generations | REST API from ONTAP 9.6 onward, maturing every release. ONTAPI (ZAPI) is deprecated — REST is the only forward path, and NetApp's own tooling has moved to it. |
|---|---|
| Auth model | HTTP Basic authentication (or client certificates) on every request — no session dance. Scope the account: a dedicated read-only REST role for monitoring is one command away and worth it. |
| Docs on-box | Every cluster serves its own interactive Swagger UI at https://CLUSTER/docs/api — the reference that exactly matches the version you run. |
Auth + first call
    curl -sku admin:PASSWORD \
      "https://CLUSTER/api/storage/volumes?fields=name,size,svm.name&max_records=100"
    # response envelope: { "records": [...], "num_records": N,
    #                      "_links": { "next": { "href": "..." } } }
Pagination — follow the link, don't build it
    import requests

    CLUSTER = "cluster.example.com"    # placeholders: set for your environment
    USER, PW = "monitor", "PASSWORD"

    url = f"https://{CLUSTER}/api/storage/volumes"
    params = {"fields": "name,size,space", "max_records": 500}
    recs = []
    while url:
        j = requests.get(url, params=params, auth=(USER, PW), verify=False).json()
        recs += j["records"]
        nxt = j.get("_links", {}).get("next", {}).get("href")
        url = f"https://{CLUSTER}{nxt}" if nxt else None
        params = None   # the next-link already carries the query
Behaviors worth knowing
| Field selection | Responses are minimal by default. Ask for what you need with ?fields=; fields=* exists but is expensive on big clusters. |
|---|---|
| Queries | Any property doubles as a filter: ?state=online&size=>100GB. Unit suffixes (KB, MB, GB, TB) are accepted in query values. |
| SVM scoping | Headers X-Dot-SVM-Name / X-Dot-SVM-UUID scope a call to an SVM through the cluster interface — cleaner than sprinkling svm.name through every body. |
| Rate limiting | Under pressure ONTAP answers 429 or 503 with an explanatory body. Back off exponentially; don't hammer. |
| Async jobs | Long operations return 202 Accepted plus a job link — poll /api/cluster/jobs/{uuid} to completion instead of assuming success. |
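The rate-limiting row can be sketched as a small retry helper. This is an illustration only: `fetch` is a hypothetical callable returning a status code and parsed body, and the base delay, cap, and jitter values are arbitrary starting points, not NetApp recommendations. The sleep function is injectable so the example runs instantly.

```python
import random
import time
from typing import Callable, Iterator, Tuple

RETRYABLE = {429, 503}


def backoff_delays(base: float = 1.0, cap: float = 60.0, tries: int = 5) -> Iterator[float]:
    """Yield capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    for n in range(tries):
        yield min(cap, base * (2 ** n))


def get_with_backoff(
    fetch: Callable[[], Tuple[int, dict]],
    sleep: Callable[[float], None] = time.sleep,
    jitter: float = 0.1,
    tries: int = 5,
) -> dict:
    """Call fetch() until it stops answering 429/503 or the attempts run out."""
    for attempt, delay in enumerate(backoff_delays(tries=tries), start=1):
        status, body = fetch()
        if status not in RETRYABLE:
            return body
        if attempt == tries:
            break  # no point sleeping after the final attempt
        # A little random jitter keeps parallel collectors out of lockstep.
        sleep(delay + random.uniform(0, jitter * delay))
    raise RuntimeError("cluster still throttling after all retries")


# Stand-in for a busy cluster: two throttled answers, then success.
responses = iter([(429, {}), (503, {}), (200, {"records": []})])
slept = []
body = get_with_backoff(lambda: next(responses), sleep=slept.append, jitter=0.0)
```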
Field notes — the gotchas
- .. in file-level endpoints resolves per RFC 3986, so a DELETE aimed at a file can resolve to the volume. URL-encode dots (%2E%2E) in file paths, always.

Dell Unity REST (Unisphere)
| Shape | Collection: /api/types/{resource}/instances · Single object: /api/instances/{resource}/{id}. Resources: pool, lun, storageResource, filesystem, host, metric… |
|---|---|
| Auth model | HTTP Basic plus the mandatory header X-EMC-REST-CLIENT: true. The first authenticated GET establishes a cookie session. |
| CSRF | Every POST / PUT (MOD) / DELETE must carry EMC-CSRF-TOKEN — a value you harvest from the response headers of any prior GET in the same session. |
Auth + CSRF recipe
    # 1. Login GET: keep cookies, capture the CSRF token from headers
    curl -sk -c ck.txt -D - -o /dev/null \
      -H "X-EMC-REST-CLIENT: true" -u admin:PASSWORD \
      "https://UNITY/api/types/loginSessionInfo/instances"
    # → header: EMC-CSRF-TOKEN: <token>

    # 2. Reads: cookies + client header are enough
    curl -sk -b ck.txt -H "X-EMC-REST-CLIENT: true" \
      "https://UNITY/api/types/pool/instances?fields=name,sizeTotal,sizeUsed"

    # 3. Writes: add the CSRF token
    curl -sk -b ck.txt -X POST \
      -H "X-EMC-REST-CLIENT: true" \
      -H "EMC-CSRF-TOKEN: <token>" \
      -H "Content-Type: application/json" \
      -d '{"name":"LUN_APP01","lunParameters":{"pool":{"id":"pool_1"},"size":1099511627776}}' \
      "https://UNITY/api/types/storageResource/action/createLun"
Reading responses
    # Everything is wrapped: entries[].content
    { "entries": [ { "content": { "id": "pool_1", "name": "Pool_SSD",
        "sizeTotal": 21990232555520, "sizeUsed": 9895604649984 } } ] }
    # Paginate with ?page=N&per_page=M ; add &compact=true to trim envelopes
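Because every object arrives inside `entries[].content`, a collector usually unwraps once at the edge. A minimal sketch using the sample payload above; `unwrap_entries` is an illustrative helper name, not part of any Dell SDK.

```python
from typing import Any, Dict, List


def unwrap_entries(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a Unity collection response to a list of plain objects."""
    return [entry["content"] for entry in payload.get("entries", [])]


sample = {
    "entries": [
        {"content": {"id": "pool_1", "name": "Pool_SSD",
                     "sizeTotal": 21990232555520, "sizeUsed": 9895604649984}}
    ]
}

pools = unwrap_entries(sample)
# Sizes are bytes; the percentage is computed on raw values to avoid unit drift.
pct_used = round(100 * pools[0]["sizeUsed"] / pools[0]["sizeTotal"], 1)
```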
Field notes — the gotchas
- Request attributes explicitly with ?fields=. Every "why is my response empty" ticket starts here.

Dell PowerMax / VMAX — Unisphere REST
| Base path | https://UNISPHERE:8443/univmax/restapi/{version}/… — the API version rides in the path (e.g. /100/ family for Unisphere 10.x, /9x/ for 9.x) and everything is scoped by symmetrixId. |
|---|---|
| Auth | HTTP Basic on every call — no session handshake. You talk to Unisphere, which proxies the arrays it manages; one Unisphere, many serials. |
| SDK | PyU4V is the de-facto Python client — with a caveat below. |
First calls
    # Which arrays does this Unisphere manage?
    curl -sku user:PASS "https://UNISPHERE:8443/univmax/restapi/100/system/symmetrix"

    # SRDF state per storage group: the compliance workhorse
    curl -sku user:PASS \
      "https://UNISPHERE:8443/univmax/restapi/100/replication/symmetrix/{sid}/storagegroup/{sg}/rdf_group"
Field notes — the gotchas
- Large result sets come back behind an iterator handle. Page through it (the /common/Iterator/… family) or you'll silently process the first page only.
- rdf_mode lives inside group_details.modes as a list (['ASYNC'], ['ADAPTIVE_COPY']). Filter SRDF/A statistics through that list, never a flat field.
- PyU4V renames methods between releases (_srdf_list → _srdf_group_list). Verify at runtime; pinning by memory is how collectors break on upgrade day.

Dell PowerStore REST
| Base path | https://ARRAY/api/rest/{resource} — flat, modern, consistent resource names (volume, appliance, replication_session, metrics). |
|---|---|
| Auth | Basic login to /api/rest/login_session establishes a session; mutations require the DELL-EMC-TOKEN header harvested from the login response — the same CSRF philosophy as Unity, new header name. |
Auth + query recipe
    # 1. Login: capture cookies and the DELL-EMC-TOKEN header
    curl -sk -c ck.txt -D - -o /dev/null -u admin:PASS \
      "https://ARRAY/api/rest/login_session"

    # 2. Reads: select fields explicitly, page with limit/offset
    curl -sk -b ck.txt \
      "https://ARRAY/api/rest/volume?select=id,name,size&limit=1000&offset=0"

    # 3. Writes: add the token
    curl -sk -b ck.txt -X POST -H "DELL-EMC-TOKEN: <token>" \
      -H "Content-Type: application/json" -d '{"name":"vol01","size":1099511627776}' \
      "https://ARRAY/api/rest/volume"
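The limit/offset read in step 2 can be wrapped in a loop that accepts both 200 and 206. This is a sketch under assumptions: `fetch(offset, limit)` is a hypothetical callable returning status, headers, and rows, and the `content-range` value is assumed to look like `first-last/total`. Confirm the header shape against your PowerStore release before relying on it.

```python
from typing import Callable, Dict, List, Tuple

Page = Tuple[int, Dict[str, str], List[dict]]  # (status, headers, rows)


def collect_all(fetch: Callable[[int, int], Page], limit: int = 1000) -> List[dict]:
    """Page with limit/offset; 200 and 206 both count as success."""
    rows: List[dict] = []
    offset = 0
    while True:
        status, headers, page = fetch(offset, limit)
        if status not in (200, 206):
            raise RuntimeError(f"HTTP {status}")
        rows += page
        # Assumed header shape: "<first>-<last>/<total>".
        total = int(headers.get("content-range", f"0-0/{len(rows)}").split("/")[1])
        offset += len(page)
        if status == 200 or not page or offset >= total:
            return rows


# Stand-in for the array: five volumes served in windows.
DATA = [{"id": str(i)} for i in range(5)]


def fake_fetch(offset: int, limit: int) -> Page:
    page = DATA[offset:offset + limit]
    last = offset + len(page) - 1
    status = 200 if len(page) == len(DATA) else 206
    return status, {"content-range": f"{offset}-{last}/{len(DATA)}"}, page


volumes = collect_all(fake_fetch, limit=2)
```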
Field notes — the gotchas
- Paged reads return 206 Partial Content with a content-range header. Treat 206 as success and keep paging; collectors that only accept 200 stop after the first page.
- No select= means minimal objects. Enumerate the fields you need.

Dell PowerScale / Isilon — OneFS PAPI
| Base path | https://CLUSTER:8080/platform/{n}/… — the Platform API, versioned by number in the path; namespaces like /platform/…/statistics, /quota, /snapshot, /sync (SyncIQ). |
|---|---|
| Auth | Basic works; production collectors should create a session — POST /session/1/session with {"username","password","services":["platform"]} — yielding the isisessid cookie plus a CSRF token to echo back as X-CSRF-Token on mutations. |
Pagination — resume tokens
    # First page
    GET /platform/12/quota/quotas?limit=1000
    # response ends with: "resume": "1-1-MAAw..." (null when done)

    # Every later page: the resume token REPLACES all other query params
    GET /platform/12/quota/quotas?resume=1-1-MAAw...
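The same resume-token flow as a loop, as a hedged sketch: `fetch(params)` is a hypothetical callable that performs the GET and returns the parsed JSON, and the list key `quotas` is assumed from the endpoint name. The key point it demonstrates is that the parameter dictionary is replaced, not extended, once a token exists.

```python
from typing import Callable, Dict, List, Optional


def collect_quotas(fetch: Callable[[Dict[str, str]], dict], limit: int = 1000) -> List[dict]:
    """Follow OneFS resume tokens; the token is sent alone, never with limit=."""
    params: Dict[str, str] = {"limit": str(limit)}
    quotas: List[dict] = []
    while True:
        page = fetch(params)
        quotas += page.get("quotas", [])
        resume: Optional[str] = page.get("resume")
        if not resume:
            return quotas
        params = {"resume": resume}  # replaces every other query parameter


# Stand-in for the cluster; records what the collector actually sent.
seen_params: List[Dict[str, str]] = []


def fake_fetch(params: Dict[str, str]) -> dict:
    seen_params.append(dict(params))
    if "resume" in params:
        return {"quotas": [{"id": "q3"}], "resume": None}
    return {"quotas": [{"id": "q1"}, {"id": "q2"}], "resume": "1-1-MAAw"}


all_quotas = collect_quotas(fake_fetch, limit=2)
```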
Field notes — the gotchas
- When you send resume=, OneFS rejects other filters on the same request because the token encodes them. Collectors that re-append limit= get a 400 and blame the array.
- Endpoint versions advance independently (/platform/12/… next to /platform/3/… on one cluster). Pin per-endpoint, not per-cluster.

Nutanix Prism REST
| Base path | Port 9440. v2 (element-level): /PrismGateway/services/rest/v2.0/… · v3 (Prism Central, intent-based): /api/nutanix/v3/… |
|---|---|
| Auth | HTTP Basic on both generations. |
The v3 shape — list is a POST
    # v2: conventional GET
    curl -sk -u admin:PASS "https://PRISM:9440/PrismGateway/services/rest/v2.0/storage_containers/"

    # v3: listing is a POST with a body, not a GET
    curl -sk -u admin:PASS -X POST -H "Content-Type: application/json" \
      -d '{"kind":"vm","length":500,"offset":0}' \
      "https://PC:9440/api/nutanix/v3/vms/list"
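Paging a v3 list means re-POSTing the body with a moving offset. The sketch below assumes a response envelope with an `entities` list and a `metadata.total_matches` count; treat both field names as assumptions to check in the REST API Explorer. `post` is a hypothetical callable that sends the body and returns parsed JSON.

```python
from typing import Callable, List


def list_all(post: Callable[[dict], dict], kind: str, length: int = 500) -> List[dict]:
    """Walk a v3 list endpoint by POSTing successive offset windows."""
    entities: List[dict] = []
    offset = 0
    while True:
        reply = post({"kind": kind, "length": length, "offset": offset})
        batch = reply.get("entities", [])
        entities += batch
        # Assumed envelope: metadata.total_matches carries the full count.
        total = reply.get("metadata", {}).get("total_matches", len(entities))
        offset += len(batch)
        if not batch or offset >= total:
            return entities


# Stand-in for Prism Central: three VMs served in windows.
VMS = [{"name": f"vm{i}"} for i in range(3)]
bodies: List[dict] = []


def fake_post(body: dict) -> dict:
    bodies.append(body)
    window = VMS[body["offset"]:body["offset"] + body["length"]]
    return {"metadata": {"total_matches": len(VMS)}, "entities": window}


vms = list_all(fake_post, kind="vm", length=2)
```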
Field notes — the gotchas
- v3 list calls are POSTs with kind/length/offset bodies. Generic "REST collector" frameworks that assume GET-for-read fail here by design.

IBM FlashSystem / SVC — Storage Virtualize REST
| Base path | https://CLUSTER:7443/rest/… — endpoints mirror the CLI verbs almost 1:1 (/rest/lssystem, /rest/lsvdisk, /rest/lsrcrelationship), which makes 20 years of SVC CLI muscle memory instantly useful. |
|---|---|
| Auth | POST /rest/auth with headers X-Auth-Username / X-Auth-Password → JSON token, sent thereafter as X-Auth-Token. |
Auth recipe
    curl -sk -X POST -H "X-Auth-Username: superuser" -H "X-Auth-Password: PASS" \
      "https://CLUSTER:7443/rest/auth"
    # → { "token": "..." }
    curl -sk -X POST -H "X-Auth-Token: TOKEN" "https://CLUSTER:7443/rest/lsvdisk"
Field notes — the gotchas
- Compare code_level as a version tuple, never a float: 9.10 is newer than 9.1, and float compares say otherwise.
- On systems running Policy-Based Replication, lsrcrelationship returns empty. That's not "no replication," it's Policy-Based Replication. Detect PBR explicitly.

Hitachi VSP — Configuration Manager REST
| Base path | …/ConfigurationManager/v1/objects/storages/{deviceId}/… — the storage device ID rides in every path; one CM/Ops Center API endpoint fronts multiple arrays. Newer VSP One / Ops Center surfaces add OAuth2 (Keycloak-issued bearer tokens) in front of the same resource model. |
|---|---|
| Auth | Classic flow: POST …/sessions with Basic → a session object with a token, sent as Authorization: Session <token>. |
Session recipe — and the cleanup that matters
    # 1. Open a session
    curl -sk -u user:PASS -X POST \
      "https://CM:23451/ConfigurationManager/v1/objects/storages/{devId}/sessions"
    # → { "token": "...", "sessionId": N }

    # 2. Use it
    curl -sk -H "Authorization: Session TOKEN" \
      "https://CM:23451/ConfigurationManager/v1/objects/storages/{devId}/ldevs?count=200"

    # 3. ALWAYS close it: session slots are finite
    curl -sk -H "Authorization: Session TOKEN" -X DELETE \
      "https://CM:23451/ConfigurationManager/v1/objects/storages/{devId}/sessions/{sessionId}"
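Step 3 is easiest to guarantee when the cleanup is structural. One possible shape is a context manager whose `finally` clause releases the session even if the collector crashes. `open_session` and `close_session` are hypothetical callables wrapping the POST and DELETE shown above; the stand-ins let the sketch run offline.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def cm_session(open_session: Callable[[], dict],
               close_session: Callable[[int], None]) -> Iterator[dict]:
    """Open a CM REST session and always release it, even on error."""
    session = open_session()  # expected shape: {"token": ..., "sessionId": ...}
    try:
        yield session
    finally:
        close_session(session["sessionId"])  # frees the finite session slot


# Stand-ins that record the order of operations.
events: List[str] = []


def fake_open() -> dict:
    events.append("open")
    return {"token": "abc", "sessionId": 7}


def fake_close(session_id: int) -> None:
    events.append(f"close {session_id}")


try:
    with cm_session(fake_open, fake_close) as s:
        events.append(f"use {s['token']}")
        raise ValueError("collector crashed mid-run")
except ValueError:
    pass
```

The same wrapper shape applies to the other finite-session platforms in pattern 4, with their own login and logout calls substituted.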
Field notes — the gotchas
Part 1 — Theory: two numbers, two axes, one rule
Business continuity is keeping the business running when something breaks; disaster recovery is the technical subset storage engineers own. Every DR design reduces to two numbers — and confusing them is the most common mistake in the field.
| Number | Question it answers | What sets it |
|---|---|---|
| RPO | How much data can you afford to lose? "How far back in time does my recovered copy sit?" | Replication / snapshot frequency. Hourly snapshots → best-case RPO of one hour. |
| RTO | How long can you afford to be down? "How long until I'm running again?" | Failover speed: automation, orchestration, standby compute. |
| RPA / RTA | What did you actually measure at the last drill? | The gap between the O and the A is exactly what a DR program exists to close. |
The Seven Tiers (SHARE/IBM, 1992 — still maps onto everything)
| Tier | Shape | Typical RPO / RTO |
|---|---|---|
| 0 | No DR. Recovery is rebuild-from-scratch. | effectively infinite |
| 1–2 | Periodic backups shipped off-site (the "pickup-truck access method"; today, a dedup appliance or cloud). | RPO ~24 h / RTO days |
| 3–4 | Electronic vaulting + point-in-time copies — where most snapshot-based array replication lives. | RPO hours / RTO hours |
| 5–6 | Continuous async or sync replication to a hot/warm standby. | RPO sec–0 / RTO min–hours |
| 7 | Sync replication + full orchestration; active-active metro clusters. | RPO 0 / RTO ~0 |
The two axes that make every marketing name legible
Axis 1 — timing (when the host gets its ack): synchronous commits on both arrays before acknowledging — RPO 0, latency pays the round trip, distance practically capped near ~100 km / sub-10 ms. Asynchronous acknowledges locally and catches the remote up — RPO > 0, unlimited distance.
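The distance cap on synchronous replication can be sanity-checked with propagation arithmetic. This sketch assumes light travels roughly 200,000 km/s in fiber (a common planning figure) and models propagation only; switch hops, array service time, and protocol round trips are extra, and the 0.5 ms local service time is a made-up input for illustration.

```python
FIBER_KM_PER_MS = 200.0  # ~200,000 km/s in glass, a planning approximation


def propagation_rtt_ms(distance_km: float, round_trips: int = 1) -> float:
    """Round-trip propagation delay only; excludes switch, array, and queueing time."""
    return round_trips * 2 * distance_km / FIBER_KM_PER_MS


def sync_write_ms(local_ms: float, distance_km: float, round_trips: int = 1) -> float:
    """Host-visible write latency: local service time plus the replication round trip."""
    return local_ms + propagation_rtt_ms(distance_km, round_trips)


rtt_100 = propagation_rtt_ms(100)    # 100 km of fiber path
write_100 = sync_write_ms(0.5, 100)  # hypothetical 0.5 ms local write
```

At 100 km the fiber alone adds about 1 ms to every write, so a 0.5 ms local write becomes roughly 1.5 ms before any equipment delay is counted. Fiber routes are also usually longer than the straight-line distance between sites.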
Axis 2 — mechanism (how the change travels):
| Mechanism | How it works | Copies on target | Lag / RPO driver |
|---|---|---|---|
| Snapshot / periodic | Ship the delta between scheduled snapshots; target keeps N discrete, immutable copies. The only mechanism producing countable copies. RPO floors ≈ 5 min on most arrays. | Countable (N of M) | schedule frequency |
| Journal-based | Every write logged with a sequence number; target drains the journal in exact order — perfect write-order consistency. | One living copy | journal fill vs drain |
| Delta-set / cycle | Writes batch into a fixed cycle (e.g. 15 s) and ship as one dependent-write-consistent set; target is always consistent to a cycle boundary. | One living copy | average cycle time |
| Streaming | Writes stream near-continuously as cache fills — smallest async RPO (seconds), but the link must be sized near peak write rate. | One living copy | link vs peak writes |
Active-active (metro / stretched) is a special shape of sync: both arrays present the same volume and serve I/O simultaneously, kept identical through a quorum witness. No source, no target — compliance is pair state, Active or Suspended. Examples: Hitachi GAD, SRDF/Metro, PowerStore Metro Volume, Pure ActiveCluster, NetApp MetroCluster.
Finally: a consistency group ties volumes together so they replicate and fail over as a unit, preserving write order across all of them. Any database with data and logs on separate volumes needs one. Every vendor implements it; only the name changes — consistency group, protection group, RDF group, journal group, copy group.
Why synchronous replication has a latency floor
"RPO 0" has a mechanical cost, and it shows up as a hard distance/latency ceiling — not a marketing footnote. Under synchronous replication the host write is not acknowledged until the remote array has the data too: host → local cache → wire → remote cache → ack back → ack to host. That round trip sits directly in every write's response time, which is why sync deployments are commonly planned inside a sub-10 ms round-trip / ~100 km envelope — vendor-specific ceilings vary (SRDF/S, TrueCopy, and ONTAP Synchronous SnapMirror each publish their own supported distance/latency tables; treat this as a planning heuristic, not a physical constant, and confirm against your platform's current interoperability matrix).
Two consequences follow directly from that mechanism, not from any one vendor's implementation:
Working out actual asynchronous RPO — not the adjective, the number
"Asynchronous" describes the acknowledgment model, not a number. The number a DR runbook needs is: if I fail over right now, how far back does my recovered copy sit? That depends on which of the four async mechanisms above is running, and the formula differs by mechanism:
| Mechanism | Worst-case RPO formula | Worked example |
|---|---|---|
| Snapshot / periodic | schedule interval + time-to-detect-and-declare a disaster | 15-min pgroup schedule, 3-min detection → up to 18 min of loss |
| Delta-set / cycle (SRDF/A) | ≈ 2 × average cycle time, worst case (an in-flight cycle plus the next one starting) | 15 s default SRDF/A cycle → worst case ≈ 30 s, typical case ≈ one cycle |
| Journal-based (UR, RecoverPoint, Global Mirror) | journal drain time at the moment of failure — bounded by how far behind the journal was allowed to fall, not by a fixed schedule | see the journal sizing calculator below — this is exactly the number it estimates |
| Streaming | ≈ current replication lag (seconds, tracked directly — e.g. ONTAP's lag_time) | healthy link: 2–10 s · link falling behind peak write rate: lag keeps growing until the write rate drops back below link capacity |
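The two closed-form rows of the table can be turned into small functions and checked against the worked examples. The function names are illustrative. The journal and streaming rows are left out because they depend on measured backlog or lag, not on a configured interval.

```python
def snapshot_worst_rpo_s(interval_s: float, detect_s: float) -> float:
    """Snapshot / periodic: schedule interval plus time to detect and declare."""
    return interval_s + detect_s


def delta_set_worst_rpo_s(cycle_s: float) -> float:
    """Delta-set / cycle: an in-flight cycle plus the one that just started."""
    return 2 * cycle_s


# Worked examples from the table above.
pgroup_minutes = snapshot_worst_rpo_s(15 * 60, 3 * 60) / 60  # 15-min schedule, 3-min detection
srdfa_seconds = delta_set_worst_rpo_s(15)                    # 15 s SRDF/A cycle
```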
Part 2 — Eleven platforms, mapped to the same axes
Each card: the replication technologies, which mechanism they are underneath, and the field-verified gotchas. Expand what you run.
Dell PowerMax / VMAX — SRDF
delta-set · sync · active-active
SRDF is the canonical delta-set implementation and the deepest replication stack in the industry; SnapVX provides local snapshots alongside.
| Technology | Mechanism / timing | RPO |
|---|---|---|
| SRDF/S | synchronous | 0 |
| SRDF/A | async, delta-set cycle | ≈ cycle time — default 15 s on current Enginuity (30 s is legacy) |
| SRDF/Metro | active-active (R1 and R2 both RW, witness-arbitrated) | 0 · binary pair-state compliance |
| SnapVX | local snapshots (countable) | local protection, reported separately |
rdf_mode lives nested in group_details.modes as a list (e.g. ['ASYNC']), not a flat field. · PyU4V renames methods between releases; verify method names at runtime, don't pin blindly.

NetApp ONTAP — SnapMirror family
snapshot · sync · active-active
SnapMirror is the franchise: a relationship ships the delta between Snapshot copies from source to destination, with a directly exposed lag_time metric.
| Technology | Mechanism / timing | RPO |
|---|---|---|
| SnapMirror Async | snapshot-based (countable, policy-driven retention) | schedule interval; watch lag_time |
| SnapMirror Synchronous | synchronous, one-way | 0 |
| SnapMirror active sync (ex SM-BC) | consistency-group sync / active-active | 0 · binary pair-state |
| MetroCluster | active-active at cluster level | 0 |
state=snapmirrored + healthy=true is the synced proxy; anything else counts as not-synced. · snapmirrorTransfers exists only on ONTAP 9.11+; gate with a tuple version compare, never float (9.10 vs 9.1 is the classic bug). · active sync uses policy types automated-failover(-duplex): classify as binary sync state, never as countable snapshots.

Pure Storage FlashArray
snapshot · journal · active-active
Three clean shapes, all built on pods and protection groups: periodic snapshot async, ActiveDR journal-based near-sync, and ActiveCluster sync active-active.
| Technology | Mechanism / timing | RPO |
|---|---|---|
| Async (pgroup snapshots) | snapshot-based, countable | pgroup schedule frequency |
| ActiveDR | journal-based continuous async on a pod | seconds (near-sync) |
| ActiveCluster | synchronous active-active (stretched pod + mediator) | 0 · binary |
Hitachi VSP
journal · sync · active-active
Organized around journals for async, across three management surfaces (CCI, Ops Center, Configuration Manager REST).
| Technology | Mechanism / timing | RPO |
|---|---|---|
| Universal Replicator (UR) | journal-based async | journal fill vs drain (derived) |
| TrueCopy | synchronous | 0 |
| Global-Active Device (GAD) | active-active, quorum-arbitrated | 0 · binary |
| ShadowImage / Thin Image | local clone / local snapshot | local protection |
IBM FlashSystem / SVC (Storage Virtualize)
sync · journal · snapshot · active-active
The widest mechanism spread in one platform — and a hard generational break at firmware 8.7.1, where Policy-Based Replication replaces the classic Remote Copy family.
| Technology | Mechanism / timing | RPO |
|---|---|---|
| Metro Mirror | synchronous | 0 |
| Global Mirror (non-cycling) | journal-style continuous async | seconds |
| GM with Change Volumes (GMCV) | snapshot/cycle async | cycle default 300 s (60–300 s flagged not recommended); max RPO ≈ 2× cycle |
| HyperSwap | active-active | 0 · binary |
| Policy-Based Replication (8.7.1+) | policy-driven async / HA | per policy |
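Gating on the 8.7.1 break requires comparing firmware levels as tuples of integers. A minimal sketch, assuming `code_level` arrives as a dotted version optionally followed by build text (for example `8.7.1.0 (build ...)`); the helper names are illustrative, and a version check like this only tells you which generation to expect. It does not replace detecting PBR explicitly.

```python
from typing import Tuple


def version_tuple(code_level: str) -> Tuple[int, ...]:
    """'8.7.1.0 (build ...)' -> (8, 7, 1, 0); keeps only the dotted numeric prefix."""
    head = code_level.strip().split()[0]
    return tuple(int(part) for part in head.split("."))


def at_or_after(code_level: str, minimum: Tuple[int, ...]) -> bool:
    """True when the firmware level is at or past the given minimum."""
    return version_tuple(code_level) >= minimum


pbr_era = at_or_after("8.7.1.0 (build 1)", (8, 7, 1))
classic_era = at_or_after("8.6.0.4", (8, 7, 1))

# The float trap: "9.10" and "9.1" parse to the same float.
float_says_equal = float("9.10") == float("9.1")
tuple_says_newer = version_tuple("9.10") > version_tuple("9.1")
```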
On systems running Policy-Based Replication, lsrcrelationship is empty; detect PBR instead. · Everything is firmware-gated: compare code_level as a tuple, never a float. · GMCV default cycle is 300 s, not 60.

HPE 3PAR / Primera / Alletra 9000 — Remote Copy
sync · snapshot · streaming
| Technology | Mechanism / timing | RPO |
|---|---|---|
| Remote Copy Synchronous | synchronous | 0 |
| Async Periodic | snapshot-based | interval — minimum 5 minutes; don't model tighter |
| Async Streaming | streaming continuous | seconds |
| Peer Persistence | per-volume active/standby (ALUA, same WWN, quorum) — transparent failover, not simultaneous active-active | 0 · binary |
Read snapshot schedules with showschedule; derive expected local copies from creation→expiration timing. · Managed via creatercopygroup / setrcopygroup.

Dell Unity XT
snapshot · sync · file-metro · CDP add-on
| Technology | Mechanism / timing | RPO |
|---|---|---|
| Native Async | snapshot / RPO-policy driven (block + file) | configured RPO |
| Native Sync | synchronous (block + file) | 0 |
| MetroSync (file) | file active/standby | 0 · binary |
| RecoverPoint (block) | journal-based, any point in time | seconds, journal-bounded |
Dell PowerStore
snapshot · sync · active-active
All replication is native software — no add-on license.
| Technology | Mechanism / timing | RPO |
|---|---|---|
| Async | RPO-policy snapshot-driven | configured RPO |
| Sync | synchronous | 0 |
| Metro Volume | active-active | 0 · binary · bounded to ~96 km / <10 ms — HA, not long-distance DR |
HPE Nimble / Alletra 6000
snapshot · active/standby
| Technology | Mechanism / timing | RPO |
|---|---|---|
| Snapshot replication | protection schedules + templates, partner-to-partner, countable | schedule interval |
| Peer Persistence | volume-granular transparent failover | 0 · binary |
Replication partners are listed under replication_partners. · Consistency lives at the volume-collection level; group dependent volumes there.

Cohesity DataProtect
snapshot · journal CDP · cloud tiers
A backup platform, not a primary array — its "replication" protects backup data and VMs.
| Technology | Mechanism / timing | RPO |
|---|---|---|
| SnapTree snapshots | incremental-forever backup foundation | backup frequency |
| Cluster-to-cluster replication | snapshot shipping between clusters | policy schedule |
| CloudArchive / CloudReplicate | cloud copy / cloud-resident DR cluster | policy schedule |
| CDP | journal-based, VMware VAIO filter | near-zero (VM-level) |
Rubrik Security Cloud
snapshot · journal CDP · immutable
| Technology | Mechanism / timing | RPO |
|---|---|---|
| Incremental-forever snapshots (Atlas) | immutable filesystem, SLA-domain policies | SLA frequency |
| SLA-driven replication + archival | snapshot shipping / cloud-object archive | SLA schedule |
| CDP | journal-based, VM-level (VMware) | near-zero |
This section condenses the author's full Business Continuity & Storage Replication Field Reference (Clear Technologies / VSI Platform Engineering, 2026) — vendor facts verified against current vendor documentation at time of writing.
The verbs, per platform — CLI, not API
The lifecycle from the theory section, expressed in each platform's native operator CLI. Rows are the same everywhere; only the words change. Commands are the canonical forms — production use takes device/group arguments and flags that vary by version; the linked vendor references carry every option.
Dell PowerMax / VMAX — SYMCLI symrdf
| Stage | Command | Notes |
|---|---|---|
| Inventory | symrdf list · symrdf -sid SID list | all RDF devices/groups; rich filters (-rdfa, -concurrent, -dynamic) |
| Status | symrdf -g DG query · symrdf verify -synchronized · symrdf ping | query a device/consistency group; verify asserts a state; ping tests RDF links |
| Create + first sync | symrdf createpair -establish | -file pairs.txt -type R1 -rdfg N; add -rdf_mode async for SRDF/A |
| Pause / resume | symrdf suspend / symrdf resume | link NR; R2 stays write-disabled |
| Split (both RW) | symrdf split | R2 becomes writable too — for DR tests against real data |
| Failover / failback | symrdf failover → symrdf update → symrdf failback | update pre-copies R2 changes home so failback is brief |
| Reverse roles | symrdf swap | R1↔R2 personalities; link must be NR (post-suspend/split/failover) |
| Mode / teardown | symrdf set mode sync\|async\|acp_disk · symrdf deletepair | acp_disk = bulk copy without host-I/O impact (the migration workhorse) |
NetApp ONTAP — snapmirror
| Stage | Command | Notes |
|---|---|---|
| Inventory / status | snapmirror show · snapmirror list-destinations | watch lag_time, state, healthy |
| Create + first sync | snapmirror create → snapmirror initialize | needs a vserver peer + a DP-type destination volume |
| Incremental | snapmirror update | or the policy schedule does it for you |
| Pause / resume | snapmirror quiesce / snapmirror resume | quiesce completes the in-flight transfer, then holds |
| Failover | snapmirror break | destination becomes RW (state Broken-off); quiesce first when planned |
| Failback / reverse | snapmirror resync | direction follows the source-path you resync toward; re-protect, then break/resync the original way |
| Single-file restore | snapmirror restore | pull data back out of a destination without breaking it |
| Teardown | snapmirror delete + snapmirror release | delete the relationship, then release source-side metadata |
Hitachi VSP — CCI (pair* / horctakeover)
| Stage | Command | Notes |
|---|---|---|
| Status | pairdisplay -g GRP -fcx · pairvolchk | states: COPY, PAIR, PSUS, PSUE, SSWS |
| Create + first sync | paircreate -g GRP -f async\|never | UR pairs ride journal groups; TC pairs are fence-level based |
| Pause / resume | pairsplit -g GRP / pairresync -g GRP | pairsplit -rw makes the S-VOL writable (DR test) |
| Failover (planned or not) | horctakeover -g GRP | swap-takeover when links are healthy; S-VOL takeover (→ SSWS) when the primary is gone |
| Failback / reverse | pairresync -swaps | resync with role swap — the return leg after SSWS |
| Teardown | pairsplit -S | simplex — dissolves the pair |
IBM FlashSystem / SVC — Storage Virtualize CLI
| Stage | Command | Notes |
|---|---|---|
| Status | lsrcrelationship · lsrcconsistgrp | empty on firmware 8.7.1+ — that means Policy-Based Replication, not "no replication" |
| Create | mkrcrelationship -master V1 -aux V2 -cluster REMOTE | add -global for Global Mirror, -cyclingmode multi for GMCV |
| Start / stop | startrcrelationship / stoprcrelationship | -force variants exist; group forms: *rcconsistgrp |
| Failover | stoprcrelationship -access | grants host access to the auxiliary — the takeover verb |
| Reverse / failback | switchrcrelationship -primary aux\|master | flips copy direction once both sides are consistent |
| PBR era (8.7.1+) | chvolumegroup -replicationpolicy POL · lsvolumegroupreplication | replication becomes a policy attached to a volume group |
HPE 3PAR / Primera / Alletra 9000 — Remote Copy CLI
| Stage | Command | Notes |
|---|---|---|
| Status | showrcopy | groups, targets, sync state, last-sync times |
| Create | creatercopytarget → creatercopygroup GRP TARGET:sync\|periodic\|async → admitrcopyvv VV GRP TARGET:VV_DR | mode is named at group creation |
| Start / stop | startrcopygroup GRP / stoprcopygroup GRP | periodic groups also take setrcopygroup period |
| Planned switchover | setrcopygroup switchover GRP | orderly role reversal, no data loss |
| Disaster failover | setrcopygroup failover GRP | run on the target system |
| Return home | setrcopygroup recover GRP → setrcopygroup restore GRP | recover resyncs back; restore reverts roles to original |
Pure Storage FlashArray — Purity CLI
| Stage | Command | Notes |
|---|---|---|
| Async (pgroup) status | purepgroup list --schedule · purepgroup list --transfer | frequency is in seconds |
| Async create | purepgroup create --targetlist ARRAY2 PG · purepgroup setattr --replicate-frequency N PG | members via purepgroup add --vollist |
| ActiveDR status | purepod list · purepod list --replica-link | pod is the consistency + failover unit |
| Pause / resume | purepod replica-link pause / … resume | |
| Failover / failback | purepod promote POD (target) · purepod demote POD | demote with --skip-quiesce exists for emergencies — know what it forfeits |
Dell PowerScale — SyncIQ (isi sync)
| Stage | Command | Notes |
|---|---|---|
| Status | isi sync policies list · isi sync jobs list · isi sync reports list | |
| Create / run | isi sync policies create → isi sync jobs start POLICY | schedule-driven; directory-tree scoped |
| Failover | isi sync recovery allow-write POLICY | run on the target cluster — makes the target tree writable |
| Failback prep | isi sync policies resync-prep POLICY | creates the mirror policy that carries changes home; then run it, allow-write on source, resync-prep again |
Dell Data Domain — MTree replication
| Stage | Command | Notes |
|---|---|---|
| Create + first sync | replication add source mtree://… destination mtree://… → replication initialize | |
| Status | replication status · replication show performance | lag and throughput per context |
| Failover | replication break | destination MTree becomes writable |
| Failback | replication resync | re-establishes after a break, in either direction |
Dell Unity XT & PowerStore — session CLIs
| Platform | Verbs | Notes |
|---|---|---|
| Unity (uemcli) | /prot/rep/session show · … -id ID sync · failover · failback | sessions are the object; -async/planned flags per operation |
| PowerStore (pstcli / REST actions) | replication_session show · pause · resume · sync · failover · reprotect | CLI verbs mirror the REST action names one-to-one |
HPE Nimble / Alletra 6000 — volume collections
| Stage | Command | Notes |
|---|---|---|
| Status | volcoll --list · partner --list | replication rides protection schedules on volume collections |
| Planned handover | volcoll --handover NAME --partner P | graceful role reversal — drains, then flips |
| Disaster | volcoll --promote NAME (on target) · later volcoll --demote NAME --partner P | promote grants writes at DR; demote rejoins the original as replica |
Full option references: Dell Solutions Enabler SRDF CLI Guide, the ONTAP command reference, Hitachi CCI guides, IBM Storage Virtualize command docs, and the HPE 3PAR CLI Reference — see Resources.
One discipline, four vocabularies
SRDF, SnapMirror, Universal Replicator, and ActiveDR are the same idea wearing four uniforms: a source that owns the write, a target that shadows it, a link between them, and a set of verbs for breaking and reversing that relationship on purpose. Engineers who know one stack freeze when handed another — not because the concepts changed, but because every vendor renamed them. This table is the translation layer.
Command & concept mapping
| Concept | Dell EMC SRDF | NetApp SnapMirror | Hitachi (TrueCopy / UR) | Pure ActiveDR |
|---|---|---|---|---|
| Unit of replication | Device pair (R1 → R2) in an RDF group | Volume relationship (source → destination) | P-VOL → S-VOL pair in a copy / journal group | Pod (volumes + config) over a replica link |
| Consistency construct | Consistency group (symcg) | Consistency group (SM-S) / per-volume Snapshot lineage | Consistency group (CTG); journals for UR | The pod itself is the consistency boundary |
| Create + first sync | symrdf createpair -establish | snapmirror create → snapmirror initialize | paircreate | purepod replica-link create |
| Incremental update | continuous (SRDF/S sync, SRDF/A cycles) | snapmirror update (async, scheduled) | continuous (TC sync; UR via journals) | continuous near-sync |
| Pause / resume | symrdf suspend / symrdf resume | snapmirror quiesce / snapmirror resume | pairsplit / pairresync | replica-link pause / resume |
| Planned failover | symrdf failover: R1 write-disabled, R2 RW | snapmirror quiesce + break: destination RW | horctakeover: swap-takeover when links healthy | purepod promote (target): demote source first |
| Unplanned failover | symrdf failover from surviving side | snapmirror break at destination | horctakeover: S-VOL takeover → SSWS | purepod promote at target |
| Failback | symrdf update → symrdf failback | snapmirror resync (reverse) → break → resync original | pairresync variants, then takeover back | demote / promote back across the link |
| Reverse roles for good | symrdf swap: R1↔R2 personalities | delete + re-create relationship in reverse (or reverse resync) | swap-takeover | promote target, demote original; direction follows |
| Healthy state name | Synchronized (S) / Consistent (A) | Snapmirrored | PAIR | replicating |
| Split state name | Split / Suspended / Failed Over | Broken-off / Quiesced | PSUS (planned) / PSUE (error) / SSWS (takeover) | paused / promoted |
| Sync flavors | SRDF/S (sync) · SRDF/A (async) · Adaptive Copy (bulk) | Async (XDP policies) · SnapMirror Synchronous (Sync / StrictSync) | TrueCopy (sync) · Universal Replicator (async, journal) | ActiveDR (near-sync) · ActiveCluster (sync, stretched pod + mediator) |
The lifecycle every stack shares
Strip away the vendor names and one state machine remains. Learn it once; map it forever.
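As a minimal sketch of that shared state machine: the state names, the generic verbs (`sync_complete`, `pause`, `resume`, `failover`, `failback`), and the dialect keys below are all hypothetical labels chosen for illustration, and the transition set is a deliberately reduced model rather than any vendor's full state diagram. Only the vendor command strings are taken from the mapping table above.

```python
from enum import Enum


class PairState(Enum):
    SYNCING = "syncing"          # initial copy or catch-up resync in progress
    IN_SYNC = "in_sync"          # healthy steady state
    PAUSED = "paused"            # link held on purpose; target not writable
    FAILED_OVER = "failed_over"  # target side owns writes


# (current state, generic verb) -> next state.
# Verb names are hypothetical, not any vendor's CLI.
TRANSITIONS = {
    (PairState.SYNCING, "sync_complete"): PairState.IN_SYNC,
    (PairState.IN_SYNC, "pause"): PairState.PAUSED,
    (PairState.PAUSED, "resume"): PairState.SYNCING,
    (PairState.IN_SYNC, "failover"): PairState.FAILED_OVER,
    (PairState.PAUSED, "failover"): PairState.FAILED_OVER,
    (PairState.FAILED_OVER, "failback"): PairState.SYNCING,
}

# Vendor words for two of the verbs, copied from the mapping table.
DIALECTS = {
    "pause": {
        "srdf": "symrdf suspend",
        "snapmirror": "snapmirror quiesce",
        "hitachi": "pairsplit",
        "activedr": "purepod replica-link pause",
    },
    "failover": {
        "srdf": "symrdf failover",
        "snapmirror": "snapmirror break",
        "hitachi": "horctakeover",
        "activedr": "purepod promote",
    },
}


class ReplicationPair:
    """One source/target pair driven by generic lifecycle verbs."""

    def __init__(self):
        self.state = PairState.SYNCING

    def apply(self, verb):
        key = (self.state, verb)
        if key not in TRANSITIONS:
            raise ValueError(f"'{verb}' is not valid from state {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state


def vendor_command(verb, dialect):
    """Translate a generic verb into one vendor's word for it."""
    return DIALECTS[verb][dialect]
```

The design point is that `TRANSITIONS` never mentions a vendor: swapping dialects changes only the lookup in `DIALECTS`, not the legal moves.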
Three rules that survive every vendor
1 — Failover is a write-ownership transfer, not a copy operation. Whether it's symrdf failover, snapmirror break, or horctakeover, the command's real job is deciding which side is allowed to accept writes. Data movement is what happens before and after.
2 — Async means the target is a point in time, not a mirror. SRDF/A cycles, SnapMirror schedules, and UR journals all trade currency for distance. Know the cycle/schedule interval — that is a component of your RPO, and it belongs in the DR runbook as a number, not an adjective; see the worked RPO formulas above for how that interval becomes an actual worst-case number per mechanism. (The RPO bandwidth calculator below turns change rate into required link capacity; the journal/CDP sizing calculator turns change rate into required journal capacity.)
3 — The failback plan is the failover plan. Every takeover creates an inverted relationship that someone must resync, reverse, or rebuild. A DR drill that ends at "application is up at site B" is half a drill. Write the return leg first.
The replication simulator
One replication pair, six vendor CLIs. The state machine underneath never changes — only the vocabulary does, which is the entire thesis of this site made playable. Type commands, watch the pair react, and run the missions every storage engineer must be able to perform half-asleep. Type help to list every modeled command in the current dialect — inventory (symrdf list, lsrcrelationship, showrcopy…), health (ping, verify, pairvolchk), mode changes, split for DR tests, update before failback, swap — the full lifecycle. Switch dialects mid-mission and finish in another vendor’s words.
Why hard zoning exists, and where it breaks down
The Zoning Studio below generates zone configs; this section is the theory the tool assumes you already know. Two topics that "soft vs. hard zoning" 101-level writeups usually name but don't finish explaining: what soft zoning's WWPN-based membership actually fails to stop, and what happens to zone enforcement once NPIV puts more than one host identity behind a single physical switch port.
Soft zoning: membership is not enforcement
Zoning has two independent jobs that are easy to conflate: membership (which WWPNs are configured into a zone together) and enforcement (what actually stops traffic between WWPNs that aren't). Soft zoning does the first and not the second.
Under soft zoning, the fabric's name server simply omits devices outside a WWPN's zone from that WWPN's query results — a host asking "what targets exist?" only gets back the targets it's zoned to see. That is a discovery filter, not a traffic block. A host, VM, or compromised initiator that already knows (or guesses, or is manually configured with) a target's WWPN can address it directly — the switch enforces nothing at the frame level and simply forwards the frame, because soft zoning never programmed a hardware filter to drop it. Hard zoning is different in exactly this respect: the fabric switch enforces zone membership at the port ASIC, validating source ID (S_ID) and destination ID (D_ID) on every frame and dropping traffic between ports whose IDs aren't co-zoned — independent of what any device claims its own WWPN is.
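The difference between a discovery filter and a frame filter can be shown with a toy model. This is an illustrative sketch only: the function names and the sample WWPNs are hypothetical, devices are identified by a single string rather than separate WWPN and S_ID/D_ID values, and no real switch exposes an API like this.

```python
def co_zoned(zones, a, b):
    """True if some zone contains both identities."""
    return any(a in zone and b in zone for zone in zones)


def name_server_query(zones, all_devices, requester):
    """Discovery under either scheme: only co-zoned devices are listed."""
    return {d for d in all_devices if d != requester and co_zoned(zones, requester, d)}


def soft_zoning_forwards(zones, src, dst):
    """Soft zoning programs no frame filter, so the zone set is never consulted."""
    return True


def hard_zoning_forwards(zones, src, dst):
    """Hard zoning checks every frame's source and destination against the zone set."""
    return co_zoned(zones, src, dst)


# Hypothetical fabric: host_a is zoned to target_1 only.
host_a = "10:00:00:00:00:00:00:0a"
target_1 = "50:00:00:00:00:00:00:01"
target_2 = "50:00:00:00:00:00:00:02"
devices = {host_a, target_1, target_2}
zones = [{host_a, target_1}]

visible = name_server_query(zones, devices, host_a)             # only target_1
leaks_under_soft = soft_zoning_forwards(zones, host_a, target_2)  # frame still forwarded
blocked_under_hard = not hard_zoning_forwards(zones, host_a, target_2)
```

In this model `target_2` never appears in `host_a`'s query results, yet a frame addressed to it is forwarded under soft zoning and dropped under hard zoning, which is the membership-versus-enforcement split described above.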
NPIV: when the WWPN you're zoning isn't the physical port
N-Port ID Virtualization (NPIV) lets one physical HBA port register multiple virtual WWPNs with the fabric — the mechanism behind per-VM WWPNs on a hypervisor, and behind N-Port Virtualizer (NPV) blade-switch designs where an entire chassis's server ports "borrow" fabric services from an upstream core switch rather than joining the fabric as full switches themselves.
Standard zoning theory assumes a roughly 1:1 relationship between a physical port and the WWPN sitting behind it. NPIV breaks that assumption on purpose — and that has two concrete consequences worth knowing before you zone an NPIV or blade environment:
Both points above describe the standard, vendor-documented NPV/NPIV design tradeoff (fabric services proxied upstream; WWPN-only zoning) rather than a single vendor's specific defect — but exact TCAM budgets, maximum WWPNs-per-port, and zone-database size ceilings are switch-model- and firmware-specific. Check your platform's current configuration limits before sizing a large NPIV deployment.
Ransomware resilience, fabric evolution, key management
Five topics that came up repeatedly when checking what practitioners actually search for versus what this guide covered. Weighted honestly: some of these have real, citable, vendor-verified mechanics; one — capacity forecasting — turned out to be mostly vendor marketing dressed as methodology, and is treated that way below rather than padded out.
Ransomware-resilient backup: 3-2-1-1-0
The classic 3-2-1 rule (3 copies, 2 different media, 1 offsite) says nothing about an adversary who can authenticate to your backup infrastructure and delete or encrypt the backups themselves, which is exactly what modern ransomware playbooks target before triggering encryption on production. 3-2-1-1-0 adds two digits to close that gap. Per Veeam, which popularized this extension: the additional 1 means one copy that is offline, air-gapped, or immutable. These are three interchangeable ways to satisfy one requirement, not three separate mandates; some secondary sources treat "air-gapped" and "immutable" as synonyms, but they are distinct mechanisms that happen to satisfy the same digit. The 0 means zero recovery errors, verified by actually testing recovery, not by assuming a completed backup job is a restorable one.
| Platform | Feature | Mechanism |
|---|---|---|
| Pure Storage | SafeMode Snapshots | Destroyed snapshots enter an eradication timer (default 24h, configurable up to 30 days on FlashArray) during which they cannot be permanently removed. Increasing the timer is a lower-friction request; lowering or disabling it requires going through Pure Support with two designated, Support-verified authorized contacts approving — the asymmetry between the two directions is the point. |
| NetApp ONTAP | SnapLock (Compliance / Enterprise) + Snapshot copy locking | A tamper-resistant ComplianceClock enforces WORM. Compliance mode: no one, including cluster admins, can delete before expiry. Enterprise mode: a privileged admin retains an early-delete path. Snapshot copy locking (ONTAP 9.12.1+) extends the same clock to lock individual Snapshot copies, not just SnapLock volumes. |
| Dell PowerMax | Secure Snaps | Time-locked SnapVX snapshots; no user can terminate a Secure Snap during its retention period, and it auto-terminates at TTL expiry once no linked targets or restore sessions remain — check current documentation for the exact behavior of a snap actively in a restore operation. |
| Dell PowerStore | Secure Snapshots | Block and file snapshots that cannot be deleted even by the top administrative role; expiration can be extended but never reduced. PowerStoreOS 3.5 documents secure-snapshot replication and conversion of existing snapshots to secure — treat 3.5 as a confirmed capability point rather than necessarily the feature's introduction release; check Dell's release notes for the exact version if that distinction matters to your design. |
| Dell PowerProtect Cyber Recovery | Isolated recovery vault (separate product) | Applies here as a distinct product, not a PowerStore/PowerMax feature: an air-gapped vault holding retention-locked immutable copies plus clean-room recovery analytics, positioned as the last line after primary immutable snapshots. |
| IBM FlashSystem / Storage Virtualize | Safeguarded Copy (8.4.2+) | Immutable point-in-time copies held in an isolated backup pool that is never mapped to a host — the copy is unreachable for modification or deletion by design, not merely by permission. |
| Hitachi VSP | Thin Image + Data Retention Utility | Snapshots carry a customer-set retention timer that cannot be shortened by an admin once applied, layered with WORM. The strongest immutability claims in the current lineup are model-specific (VSP One Block 20 with HDPS IntelliSnap) rather than a blanket capability across every VSP generation — check your specific model. |
| Nutanix | WORM on Nutanix Unified Storage (Files Enterprise WORM + Objects Object Lock) | Native Nutanix Files (Enterprise WORM, 4.1+) and native Nutanix Objects (S3-compatible Object Lock) enforce write-once-read-many immutability for all callers during the retention window — scoped to the Files/Objects (NUS) services specifically, not native VM/volume-level snapshots; don't assume it covers a Nutanix AHV VM snapshot, it doesn't. Data Lens is a separate analytics/ransomware-detection layer over NUS — it reports on and helps recover from threats, but the immutability enforcement itself lives in Files/Objects, not in Data Lens. |
NVMe-oF: the fabric bindings, and why FC-NVMe is a different standards body
NVMe over Fabrics (NVMe-oF) is published by NVM Express, Inc. — the same organization that owns the base NVMe specification. NVMe-oF 1.0 (2016) defined the fabric-independent command and queueing model plus an initial RDMA transport binding (covering InfiniBand, RoCE, and iWARP under one RDMA binding). NVMe-oF 1.1 (2019) added the TCP transport binding (NVMe/TCP) along with improved multipath and discovery. The spec family has since been restructured into a modular set of documents, all maintained at nvmexpress.org.
This sits alongside this guide's existing FC and iSCSI content as the third transport family. NVMe/TCP and NVMe/RoCE run over Ethernet fabrics (the RoCE binding needs a lossless/DCB-configured fabric the same way FCoE does; NVMe/TCP does not). FC-NVMe is the exception named in the heading: its transport binding is standardized by INCITS T11, the committee that owns the Fibre Channel standards, rather than by NVM Express. It rides existing Fibre Channel fabrics and zoning exactly like traditional FCP, so the zoning theory above applies to FC-NVMe unchanged: zoning operates on WWPN identity regardless of which upper-layer protocol (FCP or FC-NVMe) rides on top of the FC fabric.
Encryption and key management: SED vs. array-level vs. application-level, and KMIP
Three layers get called "encryption at rest" and they protect against different failure modes:
| Layer | How it works | What it protects against | Tradeoff |
|---|---|---|---|
| Self-encrypting drive (SED) | Hardware AES engine on the drive itself, governed by TCG Enterprise/Opal protocols | Data exposure from a physically removed or decommissioned drive | Near-zero performance cost, but provides no protection for a running, authenticated array — the drive decrypts transparently for any authorized controller |
| Array/controller-level | Software or controller-based encryption applied above the drive layer, centrally keyed | Broader at-rest exposure with centralized key management | Simpler key management than per-drive SEDs, but naive implementations that encrypt before dedup/compression destroy both — most array vendors encrypt after reduction specifically to avoid this, verify yours does too |
| Application-level | Encrypted before data ever leaves the host/application | Storage-layer compromise entirely — the array never sees plaintext | Strongest protection against a compromised storage layer, but ciphertext's high entropy defeats storage-side dedup/compression outright, and complicates backup, search, and restore workflows |
KMIP (Key Management Interoperability Protocol) is an OASIS-ratified protocol for standardized communication between storage/encryption endpoints and a centralized external key management server, typically over TCP port 5696. Dell, IBM, and NetApp are documented members of the OASIS KMIP Technical Committee; Nutanix documents support for KMIP-compliant external key management servers in its own security documentation. Pure FlashArray documents Purity//FA native encryption with external KMS integration in its security guides; independently confirm current KMIP-specific support against Pure's current documentation before designing around it, since the mechanism wasn't verified line-by-line against a live KMIP conformance statement for this guide. Same caveat for Hitachi VSP — treat vendor KMIP support as something to confirm against the specific firmware/CM release you run, not as a blanket guarantee across a platform family.
Capacity forecasting — and an honest assessment of what's actually out there
Most vendor capacity-forecasting features are marketed as predictive or AI-driven without publishing the underlying methodology — which, on inspection, is itself informative: it suggests there usually isn't much methodology to publish. NetApp is the one vendor in this guide's coverage that documents its actual algorithm: Active IQ Digital Advisor's Capacity Forecast feature computes an average weekly growth rate from up to twelve months of historical used-capacity data, then extrapolates that rate forward across a one-to-six-month window, flagging systems approaching a 90% projected-utilization threshold — explicitly accounting for reconfiguration events (an aggregate expansion isn't misread as organic growth). That is a real, documented growth-rate-extrapolation methodology — not the "AI/ML-driven" framing Active IQ's broader marketing implies elsewhere; the ML capabilities Active IQ is best known for (anomaly and performance detection) are documented as separate features from this specific capacity forecast, though NetApp's own materials don't draw that exact boundary in a single place.
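The growth-rate extrapolation described above fits in a few lines. This is a sketch of the general technique, not NetApp's implementation: the function names are hypothetical, the input is assumed to be one used-capacity sample per week in consistent units, and reconfiguration events (which the documented feature accounts for) are not handled here.

```python
def average_weekly_growth(weekly_used):
    """Mean week-over-week change across a series of used-capacity samples."""
    if len(weekly_used) < 2:
        raise ValueError("need at least two weekly samples")
    deltas = [later - earlier for earlier, later in zip(weekly_used, weekly_used[1:])]
    return sum(deltas) / len(deltas)


def projected_used(current_used, weekly_growth, weeks_ahead):
    """Straight-line projection of used capacity."""
    return current_used + weekly_growth * weeks_ahead


def weeks_until_threshold(current_used, total, weekly_growth, threshold=0.90):
    """Weeks until projected utilization reaches the threshold.

    Returns 0.0 if already at or past it, None if capacity is flat or shrinking.
    """
    limit = total * threshold
    if current_used >= limit:
        return 0.0
    if weekly_growth <= 0:
        return None
    return (limit - current_used) / weekly_growth
```

For example, samples of 100, 110, 120, and 130 TiB give a growth rate of 10 TiB per week; on a 200 TiB system that reaches the 90% line (180 TiB) five weeks after the last sample.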
Hybrid and multi-cloud storage: two concrete mechanisms, not marketing
NetApp FabricPool operates at the block level (4KB blocks), not the file level — the same mechanism tiers both NAS and SAN data uniformly, since ONTAP doesn't distinguish file-vs-LUN at the tiering layer itself. A tiering minimum cooling period defines how long a block must go untouched before it's eligible to move: the Auto policy defaults to 31 days and is manually adjustable from 2 to 183 days (ONTAP 9.8+). (Cloud Volumes ONTAP has a separate, distinct behavior where Auto tiering activates once the aggregate crosses roughly 50% capacity — don't conflate that trigger with the on-prem cooling-period default described here.) A daily background scan finds cold blocks, packages them into 4MB objects, and writes them to the configured object store (AWS S3, Azure Blob, Google Cloud Storage, or an on-prem S3-compatible target including StorageGRID). Tiering policy (None / Snapshot-Only / Auto / All) controls which data classes are eligible for tiering at all.
AWS Storage Gateway bridges on-prem to cloud through four gateway types, three of which are detailed here (the fourth, FSx File Gateway, is closed to new customers but remains documented). All share a read-through/write-back local cache: writes commit locally first for low latency, then replicate asynchronously to AWS. S3 File Gateway presents NFS/SMB and lands files as native S3 objects directly manageable via S3 APIs afterward. Volume Gateway presents iSCSI block volumes, with "cached volumes" mode keeping only a working set local while the full dataset lives in S3, and point-in-time snapshots materializing as incremental (changed-blocks-only) EBS snapshots. Tape Gateway emulates a virtual tape library and media changer over iSCSI for existing backup software, with virtual tapes living in S3 and optionally archived to Glacier Flexible Retrieval or Glacier Deep Archive.
Other vendors in this guide's coverage (Pure, Dell, Hitachi, IBM, Nutanix) have their own cloud-tiering and hybrid mechanisms; they aren't detailed here because this guide doesn't yet have primary-source-verified mechanics for them at the same depth as FabricPool and Storage Gateway above. Treat their absence as "not yet researched to this guide's standard," not as "doesn't exist."
The math you keep re-deriving
Five calculators, all client-side — nothing you type leaves your browser. Each encodes a formula storage engineers rebuild in spreadsheets every year.
IOPS-weighted latency
A straight average lets a thousand idle volumes hide one suffering database. Weighting by IOPS makes the roll-up reflect what hosts actually experience. Paste rows as iops,latency_ms — one per line.
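A minimal sketch of the roll-up, assuming the `iops,latency_ms` row format named above; the function names are hypothetical and this is not the calculator's actual source.

```python
def parse_rows(text):
    """Parse 'iops,latency_ms' lines into (iops, latency_ms) float pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in pasted input
        iops, latency_ms = line.split(",")
        rows.append((float(iops), float(latency_ms)))
    return rows


def simple_average_latency(rows):
    """Unweighted mean: every volume counts the same, busy or idle."""
    return sum(latency for _, latency in rows) / len(rows)


def iops_weighted_latency(rows):
    """Mean latency weighted by each volume's share of total IOPS."""
    total_iops = sum(iops for iops, _ in rows)
    if total_iops == 0:
        raise ValueError("total IOPS is zero; weighted latency is undefined")
    return sum(iops * latency for iops, latency in rows) / total_iops
```

With two volumes, one at 100 IOPS and 1 ms and one at 900 IOPS and 10 ms, the straight average is 5.5 ms while the weighted figure is 9.1 ms, because 90% of the I/O is seeing the slower volume.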
RPO bandwidth estimator
First-order sizing for async replication: can the link drain your change rate inside the RPO window? Overhead covers protocol framing and journal/metadata cost; 1.2–1.3 is a common planning factor. This ignores burstiness — profile peak-hour change rate separately before you commit a design.
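One way to write the first-order check, under stated assumptions: the changed data accumulated over one RPO window must be shipped within one window, decimal units are used throughout (1 GB = 8000 megabits), and the function and parameter names are hypothetical. The calculator's exact formula is not given in the text, so treat this as a sketch of the idea.

```python
def required_mbps(changed_gb_per_window, rpo_minutes, overhead=1.25):
    """Link throughput needed to ship one window's changes inside that window.

    overhead is the planning factor for protocol framing and journal/metadata
    cost; 1.25 is an arbitrary pick inside the 1.2-1.3 range quoted above.
    """
    if rpo_minutes <= 0:
        raise ValueError("rpo_minutes must be positive")
    megabits = changed_gb_per_window * 8000 * overhead
    return megabits / (rpo_minutes * 60)


def link_can_drain(changed_gb_per_window, rpo_minutes, link_mbps, overhead=1.25):
    """True if the link meets or exceeds the required throughput."""
    return link_mbps >= required_mbps(changed_gb_per_window, rpo_minutes, overhead)
```

For 45 GB of change per 15-minute window at a 1.25 overhead factor this gives 500 Mbps, so a 1 Gbps link passes and a 300 Mbps link does not. As the text warns, this uses an average; feed it the peak-hour change figure to size for bursts.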
Capacity converter — base-2 vs base-10
Arrays, operating systems, and procurement sheets mix GiB (2³⁰) and GB (10⁹) freely. Convert once, at a known boundary, and label the unit.
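The conversion itself is a single ratio; a small sketch with hypothetical function names:

```python
GIB = 2 ** 30  # bytes in one gibibyte (base-2)
GB = 10 ** 9   # bytes in one gigabyte (base-10)


def gib_to_gb(gib):
    """Convert base-2 GiB to base-10 GB."""
    return gib * GIB / GB


def gb_to_gib(gb):
    """Convert base-10 GB to base-2 GiB."""
    return gb * GB / GIB
```

One GiB is 1.073741824 GB, so a quote for 1000 GB buys roughly 931.3 GiB. The gap is about 7% at this scale and widens with each prefix step (TB vs TiB, PB vs PiB).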
WWN decoder
Identifies the NAA naming format and the registered vendor (OUI) inside a World Wide Name. Colons, dashes, and case are ignored. Vendor table covers the OUIs most common on enterprise SAN fabrics; an unlisted OUI just means it's outside this table, not that the WWN is invalid.
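A sketch of the decoding logic, assuming the common NAA layouts: for NAA 1 and 2 the OUI occupies hex digits 5 through 10, and for NAA 5 and 6 it immediately follows the NAA nibble. The function name is hypothetical, and the vendor table is a tiny illustrative subset, not the tool's table.

```python
import re

# Illustrative subset only; an unlisted OUI is not an invalid WWN.
VENDORS = {
    "24a937": "Pure Storage",
    "00a098": "NetApp",
    "0000c9": "Emulex",
    "0060e8": "Hitachi",
    "005076": "IBM",
}


def decode_wwn(wwn):
    """Return the NAA type, OUI, and vendor (if listed) for a WWN string."""
    # Colons, dashes, whitespace, and case are ignored.
    digits = re.sub(r"[:\-\s]", "", wwn).lower()
    if not re.fullmatch(r"[0-9a-f]{16}|[0-9a-f]{32}", digits):
        raise ValueError("expected 16 or 32 hex digits")
    naa = digits[0]
    if naa in "12":
        oui = digits[4:10]   # NAA nibble + 12 bits, then the 24-bit OUI
    elif naa in "56":
        oui = digits[1:7]    # 24-bit OUI directly after the NAA nibble
    else:
        oui = None           # layout not modeled in this sketch
    return {"naa": naa, "oui": oui, "vendor": VENDORS.get(oui)}
```

For instance, `decode_wwn("52:4A:93:71:23:45:67:89")` reports NAA 5 with OUI `24a937`; the trailing digits in that sample are made up.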
SAN Zoning Studio
Paste your host HBAs and array targets in bulk; choose the zoneset strategy and naming convention; get complete, reviewable scripts for either fabric OS. One-to-Many builds one zone per HBA containing all selected targets (classic single-initiator zoning); One-to-One builds a zone per HBA-target pair. This generates WWPN-based zone configs specifically — the Zoning Deep Dive above explains why WWPN zoning is the right default and what it does and doesn't protect against.
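The two strategies differ only in how initiators and targets are grouped. The sketch below builds the zone membership as plain data and stops short of emitting switch commands; the function names, the `z_` naming convention, and the sample aliases are all hypothetical.

```python
def one_to_many(hbas, targets):
    """One zone per HBA: that initiator plus every selected target."""
    target_wwpns = list(targets.values())
    return {f"z_{host}": [wwpn, *target_wwpns] for host, wwpn in hbas.items()}


def one_to_one(hbas, targets):
    """One zone per HBA-target pair."""
    return {
        f"z_{host}_{tgt}": [host_wwpn, tgt_wwpn]
        for host, host_wwpn in hbas.items()
        for tgt, tgt_wwpn in targets.items()
    }
```

With 2 HBAs and 3 targets, One-to-Many yields 2 zones of 4 members each, and One-to-One yields 6 zones of 2 members each. The second form multiplies zone count, which is worth checking against the zone-database limits of the fabric before choosing it.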
cfgenable and zoneset activate are disruptive-capable operations.
Journal / CDP protection-window sizing
Journal-based replication (EMC/Dell RecoverPoint, Hitachi Universal Replicator, IBM Global Mirror) keeps one living copy plus a rolling log of writes, rather than discrete snapshots. The protection window — how far back you can roll — is bounded by journal capacity versus the rate writes are consumed by it, not by a fixed schedule. This calculator does the sizing arithmetic in both directions: given a change rate and a target window, how big does the journal need to be; and given an existing journal, how much protection window it actually buys you. The exact per-platform constants (log-overhead reserve, safety margin) are vendor- and release-specific and are not hard-coded here — enter your own from the current sizing guide for your platform (RecoverPoint's field guidance and Hitachi's journal-volume sizing whitepaper both publish worked formulas per release; the defaults below are common field-planning starting points, not fixed constants — confirm before you size production capacity).
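The two directions of the arithmetic can be sketched as follows. The model here is an assumption, not a vendor formula: a fraction of the journal (`log_overhead_reserve`) is treated as unavailable for write history, and the change rate is inflated by a multiplicative `safety_factor`. Both parameter names are hypothetical and neither has a default, so the caller must supply values from the current sizing guide for the platform.

```python
def required_journal_gb(change_gb_per_hour, window_hours, *,
                        log_overhead_reserve, safety_factor):
    """Journal capacity needed to hold window_hours of writes."""
    usable_fraction = 1 - log_overhead_reserve
    if not 0 < usable_fraction <= 1:
        raise ValueError("log_overhead_reserve must be in [0, 1)")
    return change_gb_per_hour * window_hours * safety_factor / usable_fraction


def protection_window_hours(journal_gb, change_gb_per_hour, *,
                            log_overhead_reserve, safety_factor):
    """How far back an existing journal lets you roll at a given change rate."""
    usable_fraction = 1 - log_overhead_reserve
    if not 0 < usable_fraction <= 1:
        raise ValueError("log_overhead_reserve must be in [0, 1)")
    return journal_gb * usable_fraction / (change_gb_per_hour * safety_factor)
```

As a worked case with made-up inputs: 50 GB/hour of change, a 24-hour target window, a 20% reserve, and a 1.25 safety factor call for a 1875 GB journal; running the second function on that journal returns the same 24 hours, which is a quick check that the two directions agree.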
The shortlist worth bookmarking
Curated, verified, and deliberately short. Primary vendor documentation first; the community references that have earned their place second.
Why trust a personal site over vendor docs?
Don't — use both. Vendor documentation is authoritative for its own platform and always wins on version-specific detail. What it can't give you is the cross-vendor view: the patterns and traps you only learn by making twenty different arrays feed one pipeline. That's the job I do daily — building Python collectors and REST integrations across Pure, NetApp, Dell EMC, Hitachi, IBM, and more for an infrastructure-observability platform — and this site is the notebook from that work, published.
Method: every recipe here follows the same rules as production code. Nothing is included that hasn't been exercised against real or rigorously emulated arrays; behaviors are stated with the API generation they apply to; and when something is a planning heuristic rather than a law (overhead factors, for instance), it's labeled as one. Corrections are welcome and get credited — the fastest route is LinkedIn.
Roadmap: an in-browser array API sandbox (practice real request/response cycles against emulated endpoints, zero setup), more vendors (Hitachi VSP, PowerMax/Unisphere, Isilon/PowerScale), an interactive SRDF course, and Arabic-language editions — there is currently no Arabic-language enterprise storage resource of substance, and that should change.