A field guide, not a datasheet reprint

The missing manual for storage APIs.

Every array speaks REST with a different accent: its own login handshake, its own pagination dialect, its own idea of what a gigabyte is. Vendor docs describe the happy path. This guide documents how these APIs behave in production: the auth recipes, the errors, and the gotchas that cost real change windows.

The site is two learning tracks and a toolbox. The API track covers how to connect to 21 enterprise platforms, with production-depth recipes for nine. The replication track runs from RPO/RTO theory through eleven vendor deep dives to a CLI Command Atlas and an interactive simulator that speaks six vendor dialects. The toolbox adds the calculators and the bulk SAN Zoning Studio.

Written and maintained by Mahmoud Khalifa — 20+ years in enterprise storage; builds multi-vendor telemetry collectors for a living. Verify everything against the firmware you run; versions drift.

01 · API Foundations

Talk to any array in ten minutes

Every modern enterprise array ships a REST API: an HTTPS endpoint that answers JSON. The GUI you click is usually just a customer of that same API. To automate anything — monitoring, provisioning, reporting — you need exactly three facts per platform: where it listens (base URL and port), how it authenticates, and how it pages large results. Everything else is reading the reference.

The four auth patterns — learn 4, unlock 20+

Twenty vendors did not invent twenty schemes. Every storage API in this guide authenticates in one of four ways:

Pattern	How it works	Platforms using it
1 · Basic per-request	Send credentials (or a client cert) with every call. No session state. Simplest to script; scope the account read-only.	ONTAP, PowerMax (Unisphere), Nutanix, Cisco NX-API
2 · Token exchange	One login call trades credentials or an API token for a short-lived session token you send as a header. On 401, re-login and retry.	Pure FA/FlashBlade, IBM FlashSystem/SVC, Qumulo, VAST, Cohesity, Rubrik, StorageGRID, ECS, Data Domain, Nimble
3 · CSRF session	Basic login creates a cookie session; reads work immediately, but mutations also need an anti-forgery token harvested from a response header.	Unity (`EMC-CSRF-TOKEN`), PowerStore (`DELL-EMC-TOKEN`), PowerScale (`X-CSRF-Token`)
4 · Finite sessions	Login returns a session from a limited pool. Works like pattern 2 — until leaked sessions exhaust the pool and the array refuses logins. Always log out.	Hitachi VSP (CM REST), HPE 3PAR/Primera WSAPI, Brocade FOS REST

The connect matrix — 21 platforms

Each row: the port, the exact login call, the header that carries your identity afterwards, and where the platform's full endpoint reference lives. The nine platforms with a ▸ have full deep-dive tabs in the next section — auth recipes, pagination code, and field gotchas.

Platform	Port · base	Login	Then send	Full endpoint reference
▸ Pure FlashArray	443 · `/api/2.x`	`POST /api/2.x/login` + `api-token` header	`x-auth-token`	Purity REST API Reference on Pure Support (support.purestorage.com); token setup: Settings → Users
Pure FlashBlade	443 · `/api`	`POST /api/login` + `api-token` header	`x-auth-token`	FlashBlade REST API Reference on Pure Support
▸ NetApp ONTAP	443 · `/api`	Basic per request (or cert)	same	on the cluster itself: `https://CLUSTER/docs/api` (Swagger, exact to your version)
NetApp StorageGRID	443 · `/api/v3`	`POST /api/v3/authorize` {username,password}	`Authorization: Bearer`	Grid Management API docs, linked from the Grid Manager UI help
▸ Dell Unity	443 · `/api/types`	Basic + `X-EMC-REST-CLIENT: true`	cookies; writes add `EMC-CSRF-TOKEN`	Unisphere Mgmt REST API Programmer's + Reference Guides (developer.dell.com / Dell Support)
▸ Dell PowerMax	8443 · `/univmax/restapi/{ver}`	Basic per request (to Unisphere)	same	Unisphere REST API docs on developer.dell.com
▸ Dell PowerStore	443 · `/api/rest`	Basic → `GET /api/rest/login_session`	cookies; writes add `DELL-EMC-TOKEN`	PowerStore REST API Reference on developer.dell.com
▸ Dell PowerScale	8080 · `/platform/{n}`	`POST /session/1/session` {username,password,services}	`isisessid` cookie; writes add `X-CSRF-Token`	OneFS API Reference on Dell Support (per OneFS release)
Dell ECS	4443 · mgmt API	`GET /login` with Basic	`X-SDS-AUTH-TOKEN` (from response header)	ECS Management REST API Reference on Dell Support
Dell Data Domain	3009 · `/rest/v1.0`	`POST /rest/v1.0/auth` {auth_info:{username,password}}	`X-DD-AUTH-TOKEN`	DD OS REST API Guide on Dell Support
▸ IBM FlashSystem/SVC	7443 · `/rest`	`POST /rest/auth` + `X-Auth-Username/-Password` headers	`X-Auth-Token`	REST API section of IBM Storage Virtualize docs (ibm.com/docs); endpoints mirror CLI names
▸ Hitachi VSP	23451 · `/ConfigurationManager/v1`	`POST …/sessions` with Basic	`Authorization: Session <token>` — and DELETE it after	Hitachi Ops Center API / CM REST reference (docs.hitachivantara.com)
HPE 3PAR / Primera / Alletra 9000	8080/8443 · `/api/v1`	`POST /api/v1/credentials` {user,password}	`X-HP3PAR-WSAPI-SessionKey` — DELETE the key to log out	WSAPI Developer Guide on HPE Support Center
HPE Nimble / Alletra 6000	5392 · `/v1`	`POST /v1/tokens` {data:{username,password}}	`X-Auth-Token`	Nimble REST API Reference on HPE InfoSight / Support
▸ Nutanix Prism	9440 · v2 GET / v3 POST	Basic per request	same	on Prism itself: REST API Explorer (gear menu); dev docs at nutanix.dev
Qumulo	8000 · `/v1`	`POST /v1/session/login` {username,password}	`Authorization: Bearer`	on the cluster: interactive API docs in the Web UI (API & Tools)
VAST Data	443 · `/api`	`POST /api/token/` {username,password} → JWT	`Authorization: Bearer` (refresh token included)	VMS REST docs served by the VMS itself; support.vastdata.com
Cohesity	443 · `/irisservices/api/v1` · v2 `/v2`	`POST …/public/accessTokens` {username,password,domain}	`Authorization: Bearer`	Cohesity REST API docs, linked from the cluster UI and developer.cohesity.com
Rubrik	443 · `/api/v1`	`POST /api/v1/session` with Basic	`Authorization: Bearer`	Rubrik API Playground on the cluster; docs on the Rubrik support portal
Brocade FOS	443 · `/rest`	`POST /rest/login` with Basic	session key returned in the `Authorization` response header — reuse verbatim; `POST /rest/logout` when done (finite sessions)	FOS REST API Reference on Broadcom support
Cisco MDS	443/8443 · `/ins` (NX-API)	Basic per request; body carries the CLI: {"ins_api":{…,"input":"show zoneset active"}}	same	on the switch: NX-API sandbox at `https://SWITCH/` once `feature nxapi` is enabled

Why "where the reference lives" is often the array itself: ONTAP, Nutanix, Qumulo, Rubrik, VAST, and Cisco all serve interactive API documentation from the device — which is always exactly right for the firmware you run, unlike any website (including this one). Learn the on-box doc location for your platforms first; use portals for everything else.

Universal rules before your first script: create a dedicated read-only account per integration — never script as admin. Treat 401 as "re-login and retry," not failure. Page every list to completion. Convert capacity units once, at the edge, and label them. And on pattern-4 platforms (Hitachi, 3PAR, Brocade): log out in a finally-block, or you will eventually lock everyone out.
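The "treat 401 as re-login and retry" rule can be sketched as a small wrapper. This is a minimal illustration, not any vendor's client: `login` and `request` are hypothetical callables you would supply (one returns a fresh session token, the other performs a call and returns the status code plus the parsed body).

```python
from typing import Callable, Tuple


def call_with_relogin(
    login: Callable[[], str],
    request: Callable[[str], Tuple[int, dict]],
    max_logins: int = 2,
) -> dict:
    """Run request(token); on HTTP 401, log in again and retry.

    Both callables are hypothetical and supplied by the caller.
    Any status other than 401 is returned to the caller unchanged.
    """
    token = login()
    for attempt in range(max_logins):
        status, body = request(token)
        if status != 401:
            return body
        if attempt + 1 < max_logins:
            token = login()  # session expired: fetch a new token, then retry
    raise RuntimeError("still unauthorized after re-login")
```

Capping the number of logins matters on finite-session platforms: an unbounded retry loop against bad credentials would itself burn session slots.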

02 · API Deep Dives

Nine platforms, in production depth

For the nine platforms below, the connect matrix expands into working recipes: full auth flows, pagination loops in curl and Python, error semantics, and the field gotchas that cost real change windows. The summary matrix that follows is the skeleton; each vendor tab expands its row into working commands.

Platform	Base path	Login	Mutations need	Pagination dialect
Pure FlashArray	`/api/2.x/…`	API token → `x-auth-token`	same token	`limit` + `continuation_token`
NetApp ONTAP	`/api/…`	Basic / cert per request	same	`max_records` + follow `_links.next`
Dell Unity	`/api/types/{r}/instances`	Basic + `X-EMC-REST-CLIENT`	`EMC-CSRF-TOKEN` from a GET	`page`/`per_page`, `entries[].content`
Dell PowerMax	`/univmax/restapi/{ver}/…`	Basic (to Unisphere)	same	iterator handle for large sets
Dell PowerStore	`/api/rest/{r}`	Basic → session	`DELL-EMC-TOKEN`	`limit`/`offset`, 206 + `content-range`
PowerScale / Isilon	`/platform/{n}/…`	session → `isisessid`	`X-CSRF-Token`	`resume=` token replaces the query
Nutanix Prism	`:9440 /api/nutanix/v3`	Basic	same	v3 list = POST with `length`/`offset`
IBM FlashSystem/SVC	`:7443 /rest/ls…`	`/rest/auth` → `X-Auth-Token`	same token	CLI-mirrored; even list calls are POSTs
Hitachi VSP	`/ConfigurationManager/v1/…`	`POST …/sessions` → `Session` token	same — and DELETE the session	`count`/range params per object

Ground rule for everything below: endpoints and behaviors are stated for the API generations named in each tab. Storage firmware moves; before you script a change window, confirm against the exact Purity / ONTAP / Unisphere release you run. When this guide and your array disagree, the array wins.

Pure Storage FlashArray REST

Generations	REST 2.x (current, versioned per Purity release) and REST 1.x (legacy, still enabled on many arrays)
Auth model	Per-user API token, generated in the GUI (Settings → Users → API Token) or CLI (`pureadmin create --api-token`). The token inherits the user's role — a read-only user's token stays read-only.
Session	2.x: exchange the API token for a short-lived `x-auth-token`. 1.x: POST the token to `/auth/session` for a cookie session.
Discover versions	`GET https://array/api/api_version` — no auth needed; returns every REST version the array supports.

Auth recipe — REST 2.x

# 1. Exchange the API token for a session token
curl -sk -X POST "https://ARRAY/api/2.4/login" \
     -H "api-token: YOUR-API-TOKEN" -D -
# → response HEADER contains:  x-auth-token: <session-token>

# 2. Use the session token on every subsequent call
curl -sk "https://ARRAY/api/2.4/volumes?limit=100" \
     -H "x-auth-token: SESSION-TOKEN"

Auth recipe — REST 1.x (legacy arrays)

# Cookie-based session; keep the cookie jar
curl -sk -c cookies.txt -X POST "https://ARRAY/api/1.17/auth/session" \
     -H "Content-Type: application/json" \
     -d '{"api_token": "YOUR-API-TOKEN"}'

curl -sk -b cookies.txt "https://ARRAY/api/1.17/volume"

Pagination — 2.x

# Page with limit + continuation_token until more_items_remaining is false
GET /api/2.4/volumes?limit=500
# response: { "items": [...], "more_items_remaining": true,
#             "continuation_token": "abc..." }
GET /api/2.4/volumes?limit=500&continuation_token=abc...

Python — minimal collector loop

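import requests

ARRAY = "array.example.com"   # placeholder: array FQDN or IP
TOKEN = "YOUR-API-TOKEN"      # placeholder: per-user API token

s = requests.Session()
s.verify = False  # lab shortcut; point verify at your CA bundle in production

# Exchange the API token for a session token (returned in a response header)
r = s.post(f"https://{ARRAY}/api/2.4/login", headers={"api-token": TOKEN})
r.raise_for_status()
s.headers["x-auth-token"] = r.headers["x-auth-token"]

# Page until more_items_remaining is false
items, tok = [], None
while True:
    p = {"limit": 500}
    if tok:
        p["continuation_token"] = tok
    resp = s.get(f"https://{ARRAY}/api/2.4/volumes", params=p)
    resp.raise_for_status()
    j = resp.json()
    items += j["items"]
    if not j.get("more_items_remaining"):
        break
    tok = j["continuation_token"]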

HTTP status semantics (REST 1.x/2.x)

200	Success.
400	Invalid action or missing/invalid data — read the response body; Purity's error text is specific.
401	Session not created or expired. Re-login and retry — build this into every collector.
403	Authenticated but not authorized (e.g., a read-only token attempting a POST).
404 / 405	Bad URI / method not valid for that URI.

Field notes — the gotchas

SESSION EXPIRY: Sessions are short-lived. Long-running collectors must treat 401 as "re-login and retry," not as failure. Losing a poll cycle to an expired token is the most common Pure integration bug.

CAPACITY IS BASE-2: Purity sizes are binary, so 1G means GiB (2³⁰), not GB. If your reporting layer assumes decimal, capacity will silently disagree with the GUI by ~7%.

KNOW YOUR REDUCTION RATIO: Space objects expose more than one reduction figure. Data reduction (dedupe + compression) is not the same as total reduction (which also counts thin-provisioning savings). Quoting the wrong one inflates DRR reports and erodes trust in your numbers.

DESTROYED ≠ GONE: A destroyed volume sits in a pending-eradication state (24 hours by default) and still appears in some listings. Filter on the destroyed flag or your inventory counts will drift.
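The base-2 versus base-10 gap is easy to see with a few lines of arithmetic. A minimal sketch, assuming you hold sizes as raw bytes and convert only when displaying; the function name `to_display` is illustrative, not part of any Pure SDK.

```python
def to_display(size_bytes: int, base2: bool = True) -> str:
    """Convert bytes to a labeled string exactly once, at the edge."""
    unit, divisor = ("GiB", 2**30) if base2 else ("GB", 10**9)
    return f"{size_bytes / divisor:.2f} {unit}"


one_tib = 1024 * 2**30               # a volume Purity would report as 1T
print(to_display(one_tib))           # 1024.00 GiB
print(to_display(one_tib, False))    # 1099.51 GB
```

The same bytes read about 7% larger in decimal gigabytes, which is the silent disagreement described above. Carrying the unit label with the number is what keeps the two from being mixed later.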

NetApp ONTAP REST

Generations	REST API from ONTAP 9.6 onward, maturing every release. ONTAPI (ZAPI) is deprecated — REST is the only forward path, and NetApp's own tooling has moved to it.
Auth model	HTTP Basic authentication (or client certificates) on every request — no session dance. Scope the account: a dedicated read-only REST role for monitoring is one command away and worth it.
Docs on-box	Every cluster serves its own interactive Swagger UI at `https://CLUSTER/docs/api` — the reference that exactly matches the version you run.

Auth + first call

curl -sku admin:PASSWORD \
  "https://CLUSTER/api/storage/volumes?fields=name,size,svm.name&max_records=100"
# response envelope: { "records": [...], "num_records": N,
#                      "_links": { "next": { "href": "..." } } }

Pagination — follow the link, don't build it

import requests

CLUSTER, USER, PW = "cluster.example.com", "monitor", "PASSWORD"  # placeholders

url = f"https://{CLUSTER}/api/storage/volumes"
params = {"fields": "name,size,space", "max_records": 500}
recs = []
while url:
    # verify=False is a lab shortcut; use your CA bundle in production
    r = requests.get(url, params=params, auth=(USER, PW), verify=False)
    r.raise_for_status()
    j = r.json()
    recs += j["records"]
    nxt = j.get("_links", {}).get("next", {}).get("href")
    url = f"https://{CLUSTER}{nxt}" if nxt else None
    params = None  # the next-link already carries the query

Behaviors worth knowing

Field selection	Responses are minimal by default. Ask for what you need with `?fields=`; `fields=*` exists but is expensive on big clusters.
Queries	Any property doubles as a filter: `?state=online&size=>100GB`. Unit suffixes (KB, MB, GB, TB) are accepted in query values.
SVM scoping	Headers `X-Dot-SVM-Name` / `X-Dot-SVM-UUID` scope a call to an SVM through the cluster interface — cleaner than sprinkling `svm.name` through every body.
Rate limiting	Under pressure ONTAP answers `429` or `503` with an explanatory body. Back off exponentially; don't hammer.
Async jobs	Long operations return `202 Accepted` plus a job link — poll `/api/cluster/jobs/{uuid}` to completion instead of assuming success.
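The rate-limiting advice above (back off exponentially on `429` or `503`) can be sketched as follows. This is an illustrative pattern rather than NetApp's client: `fetch` is a hypothetical callable returning the status code and parsed body, and the retry count and base delay are assumed defaults you should tune.

```python
import random
import time
from typing import Callable, Tuple


def get_with_backoff(
    fetch: Callable[[], Tuple[int, dict]],
    retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(); on 429/503 wait exponentially longer, then retry."""
    for attempt in range(retries + 1):
        status, body = fetch()
        if status not in (429, 503):
            return body
        if attempt == retries:
            break
        # Delay doubles each attempt (1 s, 2 s, 4 s, ...) plus random jitter
        sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
    raise RuntimeError(f"gave up after {retries} retries (last status {status})")
```

The jitter keeps several collectors that were throttled at the same moment from retrying in lockstep. The `sleep` parameter exists only so the loop can be exercised without real waiting.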

Field notes — the gotchas

REST ≠ ZAPI RENAMED: Field names differ from ONTAPI and the CLI, and rarely-used CLI parameters simply aren't exposed. Port ZAPI collectors by mapping fields deliberately, never by string substitution.

7-MODE IS ANOTHER PLANET: The REST API is clustered ONTAP only. If your estate still has 7-Mode filers, that's a separate (ZAPI/CLI) collection path with different capacity semantics, so budget for both.

LATENCY IS PER-WORKLOAD: There is no single "array latency." Meaningful roll-ups are IOPS-weighted across volumes or workloads; the calculator in the Tools section below does exactly this math.

DOT-SEGMENTS IN FILE PATHS: Un-encoded .. in file-level endpoints resolves per RFC 3986, so a DELETE aimed at a file can resolve to the volume. URL-encode dots (%2E%2E) in file paths, always.
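The IOPS-weighted roll-up mentioned above is a weighted mean. A minimal sketch, assuming you already hold one (IOPS, latency) pair per volume or workload; the function name and sample numbers are illustrative.

```python
def iops_weighted_latency(samples) -> float:
    """samples: iterable of (iops, latency_ms), one pair per volume or workload."""
    pairs = list(samples)
    total_iops = sum(iops for iops, _ in pairs)
    if total_iops == 0:
        return 0.0  # an idle array has no meaningful latency figure
    return sum(iops * latency for iops, latency in pairs) / total_iops


# One busy, fast volume and one quiet, slow volume
volumes = [(9000, 0.5), (1000, 5.0)]
print(iops_weighted_latency(volumes))  # 0.95
```

A plain average of the two latencies would report 2.75 ms, even though nine out of ten I/Os completed in 0.5 ms. Weighting by IOPS is what makes the roll-up reflect what most requests experienced.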

Dell Unity REST (Unisphere)

Shape	Collection: `/api/types/{resource}/instances` · Single object: `/api/instances/{resource}/{id}`. Resources: `pool`, `lun`, `storageResource`, `filesystem`, `host`, `metric`…
Auth model	HTTP Basic plus the mandatory header `X-EMC-REST-CLIENT: true`. The first authenticated GET establishes a cookie session.
CSRF	Every POST / PUT (MOD) / DELETE must carry `EMC-CSRF-TOKEN` — a value you harvest from the response headers of any prior GET in the same session.

Auth + CSRF recipe

# 1. Login GET — keep cookies, capture the CSRF token from headers
curl -sk -c ck.txt -D - -o /dev/null \
  -H "X-EMC-REST-CLIENT: true" -u admin:PASSWORD \
  "https://UNITY/api/types/loginSessionInfo/instances"
# → header:  EMC-CSRF-TOKEN: <token>

# 2. Reads: cookies + client header are enough
curl -sk -b ck.txt -H "X-EMC-REST-CLIENT: true" \
  "https://UNITY/api/types/pool/instances?fields=name,sizeTotal,sizeUsed"

# 3. Writes: add the CSRF token
curl -sk -b ck.txt -X POST \
  -H "X-EMC-REST-CLIENT: true" \
  -H "EMC-CSRF-TOKEN: <token>" \
  -H "Content-Type: application/json" \
  -d '{"name":"LUN_APP01","lunParameters":{"pool":{"id":"pool_1"},"size":1099511627776}}' \
  "https://UNITY/api/types/storageResource/action/createLun"

Reading responses

# Everything is wrapped: entries[].content
{ "entries": [
    { "content": { "id": "pool_1", "name": "Pool_SSD",
                   "sizeTotal": 21990232555520, "sizeUsed": 9895604649984 } }
] }
# Paginate with ?page=N&per_page=M ; add &compact=true to trim envelopes
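Unwrapping the `entries[].content` envelope is a one-liner worth centralizing. A minimal sketch, using the response shape shown above; `unwrap` is an illustrative helper name.

```python
def unwrap(response: dict) -> list:
    """Flatten Unity's entries[].content envelope into a list of plain objects."""
    return [entry["content"] for entry in response.get("entries", [])]


page = {"entries": [
    {"content": {"id": "pool_1", "name": "Pool_SSD",
                 "sizeTotal": 21990232555520, "sizeUsed": 9895604649984}},
]}
pools = unwrap(page)
print(pools[0]["name"])  # Pool_SSD
```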

Field notes — the gotchas

NO fields= → IDs ONLY: Unity returns only object IDs unless you explicitly enumerate ?fields=. Every "why is my response empty" ticket starts here.

CSRF TOKEN LIFECYCLE: The token binds to the session. On 401, redo the login GET, harvest a fresh token, and retry. Cache both together, invalidate both together.

SIZES ARE BYTES: All capacity fields are raw bytes. Decide once, at the edge of your pipeline, whether you present base-2 or base-10, and convert exactly once. Mixing conventions inside a pipeline is how 7% discrepancies are born.

BLOCK vs FILE METRICS DIFFER: LUN performance and NAS (file-system) metrics live in different resource families with different granularity. A collector that treats them as one shape will parse block cleanly and quietly drop file.

Dell PowerMax / VMAX — Unisphere REST

Base path	`https://UNISPHERE:8443/univmax/restapi/{version}/…` — the API version rides in the path (e.g. `/100/` family for Unisphere 10.x, `/9x/` for 9.x) and everything is scoped by `symmetrixId`.
Auth	HTTP Basic on every call — no session handshake. You talk to Unisphere, which proxies the arrays it manages; one Unisphere, many serials.
SDK	PyU4V is the de-facto Python client — with a caveat below.

First calls

# Which arrays does this Unisphere manage?
curl -sku user:PASS "https://UNISPHERE:8443/univmax/restapi/100/system/symmetrix"

# SRDF state per storage group — the compliance workhorse
curl -sku user:PASS \
 "https://UNISPHERE:8443/univmax/restapi/100/replication/symmetrix/{sid}/storagegroup/{sg}/rdf_group"

Field notes — the gotchas

ITERATORS FOR BIG RESULTS: Large result sets come back as an iterator handle, not a full list. Page the iterator to completion (/common/Iterator/… family) or you'll silently process the first page only.

rdf_mode IS A NESTED LIST: rdf_mode lives inside group_details.modes as a list (['ASYNC'], ['ADAPTIVE_COPY']). Filter SRDF/A statistics through that list, never a flat field.

PyU4V RENAMES METHODS: Method names change between PyU4V releases (e.g. _srdf_list → _srdf_group_list). Verify at runtime; pinning by memory is how collectors break on upgrade day.
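Filtering on the nested mode list can be sketched as below. The only thing taken from the notes above is the nesting (`group_details` → `modes` → list of strings); the rest of the payload shape, the `label` key, and the helper name are assumptions for illustration.

```python
def has_rdf_mode(group: dict, mode: str) -> bool:
    """True if `mode` appears in the group's nested modes list."""
    details = group.get("group_details") or {}
    modes = details.get("modes") or []
    return mode.upper() in [m.upper() for m in modes]


# Hypothetical RDF group payloads, trimmed to the fields used here
groups = [
    {"label": "prod_async", "group_details": {"modes": ["ASYNC"]}},
    {"label": "migration", "group_details": {"modes": ["ADAPTIVE_COPY"]}},
    {"label": "no_details"},
]
srdf_a = [g["label"] for g in groups if has_rdf_mode(g, "ASYNC")]
print(srdf_a)  # ['prod_async']
```

Treating a missing `group_details` as "no modes" rather than raising keeps one odd group from aborting a whole collection run.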

Dell PowerStore REST

Base path	`https://ARRAY/api/rest/{resource}` — flat, modern, consistent resource names (`volume`, `appliance`, `replication_session`, `metrics`).
Auth	Basic login to `/api/rest/login_session` establishes a session; mutations require the `DELL-EMC-TOKEN` header harvested from the login response — the same CSRF philosophy as Unity, new header name.

Auth + query recipe

# 1. Login — capture cookies and the DELL-EMC-TOKEN header
curl -sk -c ck.txt -D - -o /dev/null -u admin:PASS \
  "https://ARRAY/api/rest/login_session"

# 2. Reads: select fields explicitly, page with limit/offset
curl -sk -b ck.txt \
  "https://ARRAY/api/rest/volume?select=id,name,size&limit=1000&offset=0"

# 3. Writes: add the token
curl -sk -b ck.txt -X POST -H "DELL-EMC-TOKEN: <token>" \
  -H "Content-Type: application/json" -d '{"name":"vol01","size":1099511627776}' \
  "https://ARRAY/api/rest/volume"

Field notes — the gotchas

206 IS SUCCESS: Paged reads answer 206 Partial Content with a content-range header. Treat 206 as success and keep paging; collectors that only accept 200 stop after the first page.

select= OR NOTHING USEFUL: As with Unity, no select= means minimal objects. Enumerate the fields you need.

UNPLANNED FAILOVER = LAST RPO SNAPSHOT: On the replication side, an unplanned failover promotes the destination to the last synchronized snapshot. Surface that in DR reporting rather than implying zero loss for async sessions.
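Paging on 206 comes down to reading the `content-range` header and computing the next offset. A minimal sketch, assuming the header value has the form `start-end/total` (for example `0-999/2500`); verify the exact format against your PowerStore release before relying on it.

```python
import re
from typing import Optional


def next_offset(content_range: str) -> Optional[int]:
    """Return the offset of the next page, or None when the last page was read.

    Assumes a "start-end/total" value, e.g. "0-999/2500".
    """
    match = re.fullmatch(r"\s*(\d+)-(\d+)/(\d+)\s*", content_range)
    if not match:
        raise ValueError(f"unexpected content-range: {content_range!r}")
    _start, end, total = (int(part) for part in match.groups())
    return end + 1 if end + 1 < total else None


print(next_offset("0-999/2500"))      # 1000
print(next_offset("2000-2499/2500"))  # None
```

Raising on an unrecognized header is deliberate: a collector that guesses and stops early looks healthy while reporting a partial inventory.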

Dell PowerScale / Isilon — OneFS PAPI

Base path	`https://CLUSTER:8080/platform/{n}/…` — the Platform API, versioned by number in the path; namespaces like `/platform/…/statistics`, `/quota`, `/snapshot`, `/sync` (SyncIQ).
Auth	Basic works; production collectors should create a session — `POST /session/1/session` with `{"username","password","services":["platform"]}` — yielding the `isisessid` cookie plus a CSRF token to echo back as `X-CSRF-Token` on mutations.

Pagination — resume tokens

# First page
GET /platform/12/quota/quotas?limit=1000
# response ends with:  "resume": "1-1-MAAw..."   (null when done)

# Every later page: the resume token REPLACES all other query params
GET /platform/12/quota/quotas?resume=1-1-MAAw...

Field notes — the gotchas

RESUME REPLACES THE QUERY: Once you pass resume=, OneFS rejects other filters on the same request because the token already encodes them. Collectors that re-append limit= get a 400 and blame the array.

CAPACITY HAS THREE ANSWERS: Cluster capacity, pool capacity, and quota accounting answer different questions (protection overhead included or not). A 100+ TiB "drop" that's actually OneFS recalculating FlexProtect overhead is a rite of passage, so verify which number you're graphing before you file the bug.

PAPI VERSION PER ENDPOINT: Endpoints advance versions independently (/platform/12/… next to /platform/3/… on one cluster). Pin per-endpoint, not per-cluster.
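The resume-token rule can be sketched as a generator that swaps the whole parameter set once a token arrives. This is illustrative only: `get` is a hypothetical callable that takes query parameters and returns the parsed JSON page, and the `quotas` result key is an assumption to check against your endpoint.

```python
from typing import Callable, Iterator


def iter_pages(
    get: Callable[[dict], dict],
    limit: int = 1000,
    key: str = "quotas",
) -> Iterator[dict]:
    """Yield every item, following OneFS-style resume tokens to completion."""
    params = {"limit": limit}
    while True:
        page = get(params)
        yield from page.get(key, [])
        resume = page.get("resume")
        if not resume:
            return
        # The token REPLACES the query: do not re-append limit or filters
        params = {"resume": resume}
```

Rebuilding `params` from scratch, instead of updating the existing dict, is what prevents the 400 described above.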

Nutanix Prism REST

Base path	Port `9440`. v2 (element-level): `/PrismGateway/services/rest/v2.0/…` · v3 (Prism Central, intent-based): `/api/nutanix/v3/…`
Auth	HTTP Basic on both generations.

The v3 shape — list is a POST

# v2: conventional GET
curl -sk -u admin:PASS "https://PRISM:9440/PrismGateway/services/rest/v2.0/storage_containers/"

# v3: listing is a POST with a body — not a GET
curl -sk -u admin:PASS -X POST -H "Content-Type: application/json" \
  -d '{"kind":"vm","length":500,"offset":0}' \
  "https://PC:9440/api/nutanix/v3/vms/list"

Field notes — the gotchas

GET-ONLY CLIENTS BREAK ON v3: v3 list endpoints are POSTs with kind/length/offset bodies. Generic "REST collector" frameworks that assume GET-for-read fail here by design.

TWO APIS, TWO SCOPES: v2 speaks to a cluster (Prism Element); v3 speaks to the manager-of-managers (Prism Central). Inventory that mixes both double-counts unless you dedupe on cluster UUID.
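Paging a v3 list call means re-POSTing the body with a growing `offset`. A minimal sketch under stated assumptions: `post` is a hypothetical callable that sends the JSON body and returns the parsed response, and the response is assumed to carry `entities` plus `metadata.total_matches`; confirm both names in the REST API Explorer on your Prism Central.

```python
from typing import Callable


def list_all(post: Callable[[dict], dict], kind: str = "vm", length: int = 500) -> list:
    """Collect every entity from a v3-style list endpoint."""
    offset, entities = 0, []
    while True:
        page = post({"kind": kind, "length": length, "offset": offset})
        batch = page.get("entities", [])
        entities += batch
        offset += len(batch)
        # Fall back to "stop now" if the total is missing from the response
        total = page.get("metadata", {}).get("total_matches", offset)
        if not batch or offset >= total:
            return entities
```

Stopping on an empty batch as well as on the total guards against an endless loop if the reported total and the returned entities ever disagree.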

IBM FlashSystem / SVC — Storage Virtualize REST

Base path	`https://CLUSTER:7443/rest/…` — endpoints mirror the CLI verbs almost 1:1 (`/rest/lssystem`, `/rest/lsvdisk`, `/rest/lsrcrelationship`), which makes 20 years of SVC CLI muscle memory instantly useful.
Auth	POST `/rest/auth` with headers `X-Auth-Username` / `X-Auth-Password` → JSON token, sent thereafter as `X-Auth-Token`.

Auth recipe

curl -sk -X POST -H "X-Auth-Username: superuser" -H "X-Auth-Password: PASS" \
  "https://CLUSTER:7443/rest/auth"
# → { "token": "..." }

curl -sk -X POST -H "X-Auth-Token: TOKEN" "https://CLUSTER:7443/rest/lsvdisk"

Field notes — the gotchas

POSTS EVERYWHERE: Even list ("ls*") endpoints are POSTs on this API. Wire your client accordingly.

code_level IS A TUPLE: Everything is firmware-gated. Compare code_level as a version tuple, never a float: 9.10 is newer than 9.1, but as floats the two compare equal.

8.7.1 REMOVED REMOTE COPY: From firmware 8.7.1, Metro/Global Mirror and HyperSwap are gone and lsrcrelationship returns empty. That's not "no replication," it's Policy-Based Replication. Detect PBR explicitly.
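The version-tuple comparison can be sketched in a few lines. It assumes `code_level` arrives as a dotted version, optionally followed by a build suffix in parentheses (for example `8.7.1.0 (build …)`); the helper name is illustrative.

```python
def parse_code_level(code_level: str) -> tuple:
    """Turn '8.7.1.0 (build ...)' into (8, 7, 1, 0) for safe comparison."""
    version = code_level.split()[0]  # drop any trailing build suffix
    return tuple(int(part) for part in version.split("."))


print(parse_code_level("9.10") > parse_code_level("9.1"))  # True
print(float("9.10") > float("9.1"))                        # False

# Firmware gate: is this system at or past the release that removed remote copy?
uses_pbr_only = parse_code_level("8.7.1.0 (build 1)") >= (8, 7, 1)
print(uses_pbr_only)  # True
```

Tuples compare element by element, so `(9, 10)` sorts after `(9, 1)` where the float comparison collapses both to 9.1.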

Hitachi VSP — Configuration Manager REST

Base path	`…/ConfigurationManager/v1/objects/storages/{deviceId}/…` — the storage device ID rides in every path; one CM/Ops Center API endpoint fronts multiple arrays. Newer VSP One / Ops Center surfaces add OAuth2 (Keycloak-issued bearer tokens) in front of the same resource model.
Auth	Classic flow: `POST …/sessions` with Basic → a session object with a token, sent as `Authorization: Session <token>`.

Session recipe — and the cleanup that matters

# 1. Open a session
curl -sk -u user:PASS -X POST \
  "https://CM:23451/ConfigurationManager/v1/objects/storages/{devId}/sessions"
# → { "token": "...", "sessionId": N }

# 2. Use it
curl -sk -H "Authorization: Session TOKEN" \
  "https://CM:23451/ConfigurationManager/v1/objects/storages/{devId}/ldevs?count=200"

# 3. ALWAYS close it — session slots are finite
curl -sk -H "Authorization: Session TOKEN" -X DELETE \
  "https://CM:23451/ConfigurationManager/v1/objects/storages/{devId}/sessions/{sessionId}"

Field notes — the gotchas

SESSIONS ARE A FINITE RESOURCE: Leaked sessions accumulate until the array refuses new logins. DELETE your session in a finally-block; this single habit prevents the most common Hitachi integration outage.

SNAPSHOT POLICY ISN'T HERE: Snapshot retention/schedules live in Ops Center Protector or CCI, not the array REST. Derive local-copy cadence from timestamps if REST is all you have.

THREE MANAGEMENT SURFACES: CCI/raidcom, Ops Center, and CM REST expose overlapping-but-different views of the same array. Pick one source of truth per fact; blending them mid-pipeline creates phantom drift.
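The "DELETE your session in a finally-block" habit fits naturally into a context manager. A minimal sketch, not Hitachi's SDK: `open_session` and `close_session` are hypothetical callables that would wrap the POST and DELETE calls from the session recipe above.

```python
from contextlib import contextmanager


@contextmanager
def finite_session(open_session, close_session):
    """Open a session and guarantee it is closed, even if the body raises."""
    session = open_session()
    try:
        yield session
    finally:
        close_session(session)  # always release the slot


# Usage with stand-in callables; real ones would call the REST endpoints
closed = []
with finite_session(lambda: {"token": "abc", "sessionId": 7}, closed.append) as sess:
    print(sess["sessionId"])  # 7
print(len(closed))  # 1
```

The same wrapper applies to the other finite-session platforms (3PAR/Primera WSAPI, Brocade FOS): only the two callables change.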

03 · Replication Learning Path

Part 1 — Theory: two numbers, two axes, one rule

Business continuity is keeping the business running when something breaks; disaster recovery is the technical subset storage engineers own. Every DR design reduces to two numbers — and confusing them is the most common mistake in the field.

Number	Question it answers	What sets it
RPO	How much data can you afford to lose? "How far back in time does my recovered copy sit?"	Replication / snapshot frequency. Hourly snapshots → best-case RPO of one hour.
RTO	How long can you afford to be down? "How long until I'm running again?"	Failover speed: automation, orchestration, standby compute.
RPA / RTA	What did you actually measure at the last drill?	The gap between the O and the A is exactly what a DR program exists to close.

The cost curve: RPO→0 needs synchronous replication (low-latency links, a second array mostly idle). RTO→0 needs orchestration and pre-provisioned standby. Both get exponentially more expensive near zero. DR design is not making everything zero — it is matching spend to the business impact of each workload.

The Seven Tiers (SHARE/IBM, 1992 — still maps onto everything)

Tier	Shape	Typical RPO / RTO
0	No DR. Recovery is rebuild-from-scratch.	effectively infinite
1–2	Periodic backups shipped off-site (the "pickup-truck access method"; today, a dedup appliance or cloud).	RPO ~24 h / RTO days
3–4	Electronic vaulting + point-in-time copies — where most snapshot-based array replication lives.	RPO hours / RTO hours
5–6	Continuous async or sync replication to a hot/warm standby.	RPO sec–0 / RTO min–hours
7	Sync replication + full orchestration; active-active metro clusters.	RPO 0 / RTO ~0

The two axes that make every marketing name legible

Axis 1 — timing (when the host gets its ack): synchronous commits on both arrays before acknowledging — RPO 0, latency pays the round trip, distance practically capped near ~100 km / sub-10 ms. Asynchronous acknowledges locally and catches the remote up — RPO > 0, unlimited distance.

Axis 2 — mechanism (how the change travels):

Mechanism	How it works	Copies on target	Lag / RPO driver
Snapshot / periodic	Ship the delta between scheduled snapshots; target keeps N discrete, immutable copies. The only mechanism producing countable copies. RPO floors ≈ 5 min on most arrays.	Countable (N of M)	schedule frequency
Journal-based	Every write logged with a sequence number; target drains the journal in exact order — perfect write-order consistency.	One living copy	journal fill vs drain
Delta-set / cycle	Writes batch into a fixed cycle (e.g. 15 s) and ship as one dependent-write-consistent set; target is always consistent to a cycle boundary.	One living copy	average cycle time
Streaming	Writes stream near-continuously as cache fills — smallest async RPO (seconds), but the link must be sized near peak write rate.	One living copy	link vs peak writes

Active-active (metro / stretched) is a special shape of sync: both arrays present the same volume and serve I/O simultaneously, kept identical by synchronous mirroring, with a quorum witness arbitrating which side survives if they lose contact. There is no source and no target; compliance is pair state, Active or Suspended. Examples: Hitachi GAD, SRDF/Metro, PowerStore Metro Volume, Pure ActiveCluster, NetApp MetroCluster.

The rule that ties it together — mechanism decides the math. Snapshot replication yields a real "N of M" compliance count. Journal, delta-set, and streaming keep one living copy, so compliance is binary: synced or not. Active-active is binary too: Active or Suspended. And local copies and remote copies are always reported separately — 18 local + 20 remote is "18 local, 20 remote," never 38, because they protect against different failures.

Finally: a consistency group ties volumes together so they replicate and fail over as a unit, preserving write order across all of them. Any database with data and logs on separate volumes needs one. Every vendor implements it; only the name changes — consistency group, protection group, RDF group, journal group, copy group.
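The "mechanism decides the math" rule can be written down as a small reporting function. This is a sketch of the rule as stated above, with illustrative mechanism labels and function names rather than any product's schema.

```python
BINARY_MECHANISMS = {"journal", "delta-set", "streaming"}


def compliance(mechanism: str, found: int = 0, expected: int = 0,
               healthy: bool = False) -> str:
    """Report compliance in the form the replication mechanism allows."""
    if mechanism == "snapshot":
        return f"{found} of {expected}"  # the only countable case
    if mechanism in BINARY_MECHANISMS:
        return "synced" if healthy else "not synced"
    if mechanism == "active-active":
        return "Active" if healthy else "Suspended"
    raise ValueError(f"unknown mechanism: {mechanism}")


def copy_summary(local: int, remote: int) -> str:
    """Local and remote copies are reported side by side, never summed."""
    return f"{local} local, {remote} remote"


print(compliance("snapshot", found=18, expected=20))  # 18 of 20
print(compliance("journal", healthy=True))            # synced
print(copy_summary(18, 20))                           # 18 local, 20 remote
```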

Why synchronous replication has a latency floor

"RPO 0" has a mechanical cost, and it shows up as a hard distance/latency ceiling — not a marketing footnote. Under synchronous replication the host write is not acknowledged until the remote array has the data too: host → local cache → wire → remote cache → ack back → ack to host. That round trip sits directly in every write's response time, which is why sync deployments are commonly planned inside a sub-10 ms round-trip / ~100 km envelope — vendor-specific ceilings vary (SRDF/S, TrueCopy, and ONTAP Synchronous SnapMirror each publish their own supported distance/latency tables; treat this as a planning heuristic, not a physical constant, and confirm against your platform's current interoperability matrix).

Two consequences follow directly from that mechanism, not from any one vendor's implementation:

1 — Latency past the ceiling doesn't degrade gracefully, it stalls writes. Because the local array withholds the ack until the remote confirms, a link that regresses from 6 ms to 15 ms doesn't just make replication "a bit behind": every synchronous write on every affected volume now waits for that round trip. Application-visible write latency inflates by roughly the added round-trip time, and sustained congestion can back up host I/O queues. This is the operational reason nearly every sync implementation ships a fallback rather than block the host indefinitely: SRDF/S can drop to Adaptive Copy, TrueCopy pairs can suspend, and PowerStore/Pure/ONTAP sync pairs can trip to an async or suspended state. Confirm which fallback behavior your platform and pair mode actually use, since "hangs versus trips to async" is a per-vendor, sometimes per-setting, decision.

2 — An HBA or path change on either array can force a full fabric/zoning reconfirmation. Where zoning binds on WWPN identity (the norm for production fabrics — see the Zoning Deep Dive below), replacing a failed HBA changes the WWPN the fabric sees, and every zone that named the old WWPN needs updating before the replacement port can rejoin the same conversation. Port-based zoning avoids that specific reconfiguration but reintroduces the problem this guide's zoning section covers: physical-port binding breaks the moment someone patches a different cable into that port.
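The latency arithmetic behind the first consequence can be worked through directly. A simplified sketch: it assumes roughly 5 µs of one-way propagation delay per kilometre of fiber, counts a single replication round trip per write, and uses an assumed 0.5 ms local service time. Real links add switch, protocol, and remote-array time on top.

```python
FIBER_US_PER_KM = 5.0  # approximate one-way propagation delay in optical fiber


def propagation_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay only; excludes equipment and array time."""
    return 2 * distance_km * FIBER_US_PER_KM / 1000.0


def sync_write_latency_ms(local_ms: float, rtt_ms: float) -> float:
    """Simplified model: local service time plus one replication round trip."""
    return local_ms + rtt_ms


print(propagation_rtt_ms(100))  # 1.0

before = sync_write_latency_ms(0.5, 6.0)   # healthy link
after = sync_write_latency_ms(0.5, 15.0)   # regressed link
print(after - before)  # 9.0
```

Propagation alone accounts for about 1 ms of round trip at 100 km, so most of a sub-10 ms budget is consumed by everything else on the path. That is why the envelope is a planning heuristic and not a physical constant.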

Working out actual asynchronous RPO — not the adjective, the number

"Asynchronous" describes the acknowledgment model, not a number. The number a DR runbook needs is: if I fail over right now, how far back does my recovered copy sit? That depends on which of the four async mechanisms above is running, and the formula differs by mechanism:

Mechanism	Worst-case RPO formula	Worked example
Snapshot / periodic	schedule interval + time-to-detect-and-declare a disaster	15-min pgroup schedule, 3-min detection → up to 18 min of loss
Delta-set / cycle (SRDF/A)	≈ 2 × average cycle time, worst case (an in-flight cycle plus the next one starting)	15 s default SRDF/A cycle → worst case ≈ 30 s, typical case ≈ one cycle
Journal-based (UR, RecoverPoint, Global Mirror)	journal drain time at the moment of failure — bounded by how far behind the journal was allowed to fall, not by a fixed schedule	see the journal sizing calculator below — this is exactly the number it estimates
Streaming	≈ current replication lag (seconds, tracked directly, e.g. ONTAP's `lag_time`)	healthy link: 2–10 s · link slower than the incoming write rate: lag keeps growing until the write rate drops back below link throughput

The trap this table exists to close: a snapshot schedule interval of 15 minutes is not your RPO — it's the best-case component of your RPO. The number that belongs in a DR runbook also accounts for detection time and, for journal/streaming mechanisms, however far the replica had actually fallen behind at the moment of failure — which you only know by monitoring lag/journal-fill directly, not by reading a schedule setting.
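The table's formulas are simple enough to encode directly, which keeps the runbook number reproducible. This is a minimal sketch covering three of the four rows; the function names are hypothetical, and the journal-based row is left out on purpose because it needs a measured backlog rather than a configured interval.

```python
from datetime import timedelta


def snapshot_rpo(schedule_interval: timedelta, detection: timedelta) -> timedelta:
    """Snapshot / periodic: schedule interval plus time to detect and declare."""
    return schedule_interval + detection


def delta_set_rpo(avg_cycle: timedelta) -> timedelta:
    """Delta-set / cycle: about two cycles (one in flight plus the next)."""
    return 2 * avg_cycle


def streaming_rpo(current_lag: timedelta) -> timedelta:
    """Streaming: the currently observed replication lag."""
    return current_lag


if __name__ == "__main__":
    # The worked examples from the table above.
    print(snapshot_rpo(timedelta(minutes=15), timedelta(minutes=3)))  # 0:18:00
    print(delta_set_rpo(timedelta(seconds=15)))                       # 0:00:30
```

Feeding `streaming_rpo` from a monitored lag value rather than a setting is the point of the last row: the input has to be observed, not configured.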

04 · Vendor Deep Dives

Part 2 — Eleven platforms, mapped to the same axes

Each card: the replication technologies, which mechanism they are underneath, and the field-verified gotchas. Expand what you run.

Dell PowerMax / VMAX — SRDF
delta-set · sync · active-active

SRDF is the canonical delta-set implementation and the deepest replication stack in the industry; SnapVX provides local snapshots alongside.

Technology	Mechanism / timing	RPO
SRDF/S	synchronous	0
SRDF/A	async, delta-set cycle	≈ cycle time (worst case ≈ 2×); default 15 s on HYPERMAX OS / PowerMaxOS (30 s was the Enginuity-era default)
SRDF/Metro	active-active (R1 and R2 both RW, witness-arbitrated)	0 · binary pair-state compliance
SnapVX	local snapshots (countable)	local protection, reported separately

FIELD NOTES: SRDF/A cycle default is 15 s, not 30. · In the Unisphere REST payload, `rdf_mode` lives nested in `group_details.modes` as a list (e.g. `['ASYNC']`), not as a flat field. · PyU4V renames methods between releases; verify method names at runtime, don't pin blindly.
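A collector that expects a flat `rdf_mode` string will silently miss the nested list. The sketch below shows one defensive way to read it. Only `group_details.modes` and `rdf_mode` come from the field note; the sample payload, the `rdfg_number` key, and the helper name are hypothetical, so check the shape against a real response from your Unisphere version.

```python
from typing import Any


def rdf_modes(group: dict[str, Any]) -> list[str]:
    """Return the RDF modes for one group, tolerating both payload shapes."""
    details = group.get("group_details") or {}
    modes = details.get("modes")
    if isinstance(modes, list):
        return [str(m).upper() for m in modes]
    flat = group.get("rdf_mode")  # fallback in case a flat field is present
    return [str(flat).upper()] if flat else []


# Hypothetical payload fragment, shaped as the field note describes.
sample = {"rdfg_number": 12, "group_details": {"modes": ["ASYNC"]}}
print(rdf_modes(sample))
```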

NetApp ONTAP — SnapMirror family
snapshot · sync · active-active

SnapMirror is the franchise: a relationship ships the delta between Snapshot copies from source to destination, with a directly exposed lag_time metric.

Technology	Mechanism / timing	RPO
SnapMirror Async	snapshot-based (countable, policy-driven retention)	schedule interval; watch lag_time
SnapMirror Synchronous	synchronous, one-way	0
SnapMirror active sync (ex SM-BC)	consistency-group sync / active-active	0 · binary pair-state
MetroCluster	active-active at cluster level	0

FIELD NOTES: State `snapmirrored` + `healthy=true` is the synced proxy; anything else counts as not-synced. · `snapmirrorTransfers` exists only on ONTAP 9.11+; gate with a tuple version compare, never float (9.10 vs 9.1 is the classic bug). · active sync uses policy types `automated-failover(-duplex)`: classify as binary sync state, never as countable snapshots.
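The float trap and the synced proxy from the notes above can both be shown in a few lines. This is a sketch with hypothetical helper names; it assumes version strings look like `9.10.1` or `9.11.1P3` and keeps only the leading digits of each dotted piece.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '9.10.1' or '9.11.1P3' into a tuple of ints, e.g. (9, 10, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at the first piece with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def is_synced(state: str, healthy: bool) -> bool:
    """Field-note proxy: only snapmirrored + healthy counts as synced."""
    return state.lower() == "snapmirrored" and healthy is True


if __name__ == "__main__":
    print(float("9.10") > float("9.9"))                      # False: the float trap
    print(version_tuple("9.10.1") > version_tuple("9.9.1"))  # True
    print(version_tuple("9.11.1P3") >= (9, 11))              # True
```

The same tuple comparison applies to any firmware-gated feature check, which is why the helper takes a plain string rather than an ONTAP-specific object.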

Pure Storage FlashArray
snapshot · journal · active-active

Three clean shapes, all built on pods and protection groups: periodic snapshot async, ActiveDR journal-based near-sync, and ActiveCluster sync active-active.

Technology	Mechanism / timing	RPO
Async (pgroup snapshots)	snapshot-based, countable	pgroup schedule frequency
ActiveDR	journal-based continuous async on a pod	seconds (near-sync)
ActiveCluster	synchronous active-active (stretched pod + mediator)	0 · binary

FIELD NOTES: pgroup snapshot frequency is in seconds, not milliseconds, a classic units bug. · A pod under ActiveDR can't simultaneously carry async pgroups or ActiveCluster; relationships are per-pod. · ActiveDR target objects have different serial numbers than the source; never key a join on serial alone.

Hitachi VSP
journal · sync · active-active

Organized around journals for async, across three management surfaces (CCI, Ops Center, Configuration Manager REST).

Technology	Mechanism / timing	RPO
Universal Replicator (UR)	journal-based async	journal fill vs drain (derived)
TrueCopy	synchronous	0
Global-Active Device (GAD)	active-active, quorum-arbitrated	0 · binary
ShadowImage / Thin Image	local clone / local snapshot	local protection

FIELD NOTES: Snapshot retention/schedule is not in the array REST; it lives in Ops Center Protector or CCI, so infer local-copy cadence from timestamps. · UR journal-fill is a derived RPO proxy, not a Hitachi-published gauge; label it as derived. · ShadowImage (clone) and Thin Image (snapshot) are different technologies; don't conflate. · The journal/CDP sizing calculator in Tools models this same journal-fill-vs-drain relationship generically.
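When the schedule is not exposed, cadence has to be estimated from the snapshots that exist. One possible approach, sketched below, takes the median gap between creation times so that a single missed run does not skew the answer. The function name and the choice of median are assumptions for illustration, and the result should be labeled as derived wherever it is reported.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


def infer_cadence(created: list[datetime]) -> Optional[timedelta]:
    """Estimate snapshot cadence as the median gap between creation times."""
    ordered = sorted(created)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return None  # fewer than two snapshots: cadence is unknown
    return timedelta(seconds=median(gaps))


if __name__ == "__main__":
    # Hourly snapshots with the 03:00 run missing.
    stamps = [datetime(2026, 1, 1, h) for h in (0, 1, 2, 4)]
    print(infer_cadence(stamps))  # 1:00:00
```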

IBM FlashSystem / SVC (Storage Virtualize)
sync · journal · snapshot · active-active

The widest mechanism spread in one platform — and a hard generational break at firmware 8.7.1, where Policy-Based Replication replaces the classic Remote Copy family.

Technology	Mechanism / timing	RPO
Metro Mirror	synchronous	0
Global Mirror (non-cycling)	journal-style continuous async	seconds
GM with Change Volumes (GMCV)	snapshot/cycle async	cycle default 300 s (60–300 s flagged not recommended); max RPO ≈ 2× cycle
HyperSwap	active-active	0 · binary
Policy-Based Replication (8.7.1+)	policy-driven async / HA	per policy

FIELD NOTES: Firmware 8.7.0 is the last release with classic Remote Copy; on 8.7.1+ `lsrcrelationship` is empty, so detect PBR instead. · Everything is firmware-gated: compare `code_level` as a tuple, never a float. · GMCV default cycle is 300 s, not 60.

HPE 3PAR / Primera / Alletra 9000 — Remote Copy
sync · snapshot · streaming

Technology	Mechanism / timing	RPO
Remote Copy Synchronous	synchronous	0
Async Periodic	snapshot-based	interval — minimum 5 minutes; don't model tighter
Async Streaming	streaming continuous	seconds
Peer Persistence	per-volume active/standby (ALUA, same WWN, quorum) — transparent failover, not simultaneous active-active	0 · binary

FIELD NOTES: Modes are named (Sync / Async Periodic / Async Streaming), never numbered. · WSAPI does not expose snapshot retention depth; that's CLI-only (`showschedule`), so derive expected local copies from creation→expiration timing. · Managed via `creatercopygroup` / `setrcopygroup`.

Dell Unity XT
snapshot · sync · file-metro · CDP add-on

Technology	Mechanism / timing	RPO
Native Async	snapshot / RPO-policy driven (block + file)	configured RPO
Native Sync	synchronous (block + file)	0
MetroSync (file)	file active/standby	0 · binary
RecoverPoint (block)	journal-based, any point in time	seconds, journal-bounded

FIELD NOTES: Never run RecoverPoint on a resource already under native replication, and never point it at the Sync Replication port. · A NAS server carries at most one sync + three async sessions (four total). · Fan-out/cascade and bridge-mode file topologies need OE 5.0 / 5.2+. · Sizing the journal for a given protection window? Use the journal/CDP sizing calculator in Tools.

Dell PowerStore
snapshot · sync · active-active

All replication is native software — no add-on license.

Technology	Mechanism / timing	RPO
Async	RPO-policy snapshot-driven	configured RPO
Sync	synchronous	0
Metro Volume	active-active	0 · binary · bounded to ~96 km / <10 ms — HA, not long-distance DR

FIELD NOTES: An unplanned failover promotes the destination to the last synchronized RPO snapshot; an incomplete final sync means a small async data gap. · Metro "Pause" takes the non-preferred volume offline to hosts; plan maintenance around it.

HPE Nimble / Alletra 6000
snapshot · active/standby

Technology	Mechanism / timing	RPO
Snapshot replication	protection schedules + templates, partner-to-partner, countable	schedule interval
Peer Persistence	volume-granular transparent failover	0 · binary

FIELD NOTES: Replication is partner-based; link/target identity comes from `replication_partners`. · Consistency lives at the volume-collection level; group dependent volumes there.

Cohesity DataProtect
snapshot · journal CDP · cloud tiers

A backup platform, not a primary array — its "replication" protects backup data and VMs.

Technology	Mechanism / timing	RPO
SnapTree snapshots	incremental-forever backup foundation	backup frequency
Cluster-to-cluster replication	snapshot shipping between clusters	policy schedule
CloudArchive / CloudReplicate	cloud copy / cloud-resident DR cluster	policy schedule
CDP	journal-based, VMware VAIO filter	near-zero (VM-level)

FIELD NOTES: CDP is VM-level, not array-level, and currently on-prem-to-on-prem VMware. · CDP needs dedicated storage; reserve it for mission-critical workloads. · Replicated/archived copies are backups: recovery may require a restore (CDP and instant-mass-restore excepted).

Rubrik Security Cloud
snapshot · journal CDP · immutable

Technology	Mechanism / timing	RPO
Incremental-forever snapshots (Atlas)	immutable filesystem, SLA-domain policies	SLA frequency
SLA-driven replication + archival	snapshot shipping / cloud-object archive	SLA schedule
CDP	journal-based, VM-level (VMware)	near-zero

FIELD NOTES: Immutability is the point: Atlas snapshots can't be deleted by compromised credentials, and that is the ransomware story. · Retention is policy-driven: read the SLA's local-vs-archive split, not a per-job config. · Mixing CDP and snapshot SLAs in one blueprint yields continuous vs discrete recovery points; know which you're promising.

This section condenses the author's full Business Continuity & Storage Replication Field Reference (Clear Technologies / VSI Platform Engineering, 2026) — vendor facts verified against current vendor documentation at time of writing.

05 · Replication Command Atlas

The verbs, per platform — CLI, not API

The lifecycle from the theory section, expressed in each platform's native operator CLI. Rows are the same everywhere; only the words change. Commands are the canonical forms — production use takes device/group arguments and flags that vary by version; the linked vendor references carry every option.

Dell PowerMax / VMAX — SYMCLI `symrdf`

Stage	Command	Notes
Inventory	`symrdf list` · `symrdf -sid SID list`	all RDF devices/groups; rich filters (`-rdfa`, `-concurrent`, `-dynamic`)
Status	`symrdf -g DG query` · `symrdf verify -synchronized` · `symrdf ping`	query a device/consistency group; verify asserts a state; ping tests RDF links
Create + first sync	`symrdf createpair -establish`	`-file pairs.txt -type R1 -rdfg N`; add `-rdf_mode async` for SRDF/A
Pause / resume	`symrdf suspend` / `symrdf resume`	link goes Not Ready (NR); R2 stays write-disabled
Split (both RW)	`symrdf split`	R2 becomes writable too — for DR tests against real data
Failover / failback	`symrdf failover` → `symrdf update` → `symrdf failback`	`update` pre-copies R2 changes home so failback is brief
Reverse roles	`symrdf swap`	R1↔R2 personalities; link must be NR (post-suspend/split/failover)
Mode / teardown	`symrdf set mode sync|async|acp_disk` · `symrdf deletepair`	`acp_disk` = bulk copy without host-I/O impact (the migration workhorse)

NetApp ONTAP — `snapmirror`

Stage	Command	Notes
Inventory / status	`snapmirror show` · `snapmirror list-destinations`	watch `lag_time`, state, healthy
Create + first sync	`snapmirror create` → `snapmirror initialize`	needs a vserver peer + a DP-type destination volume
Incremental	`snapmirror update`	or the policy schedule does it for you
Pause / resume	`snapmirror quiesce` / `snapmirror resume`	quiesce completes the in-flight transfer, then holds
Failover	`snapmirror break`	destination becomes RW (state Broken-off); quiesce first when planned
Failback / reverse	`snapmirror resync`	direction follows the source-path you resync toward; re-protect, then break/resync the original way
Single-file restore	`snapmirror restore`	pull data back out of a destination without breaking it
Teardown	`snapmirror delete` + `snapmirror release`	delete the relationship, then release source-side metadata

Hitachi VSP — CCI (`pair*` / `horctakeover`)

Stage	Command	Notes
Status	`pairdisplay -g GRP -fcx` · `pairvolchk`	states: COPY, PAIR, PSUS, PSUE, SSWS
Create + first sync	`paircreate -g GRP -f async|never`	UR pairs ride journal groups; TC pairs are fence-level based
Pause / resume	`pairsplit -g GRP` / `pairresync -g GRP`	`pairsplit -rw` makes the S-VOL writable (DR test)
Failover (planned or not)	`horctakeover -g GRP`	swap-takeover when links are healthy; S-VOL takeover (→ SSWS) when the primary is gone
Failback / reverse	`pairresync -swaps`	resync with role swap — the return leg after SSWS
Teardown	`pairsplit -S`	simplex — dissolves the pair

IBM FlashSystem / SVC — Storage Virtualize CLI

Stage	Command	Notes
Status	`lsrcrelationship` · `lsrcconsistgrp`	empty on firmware 8.7.1+ — that means Policy-Based Replication, not "no replication"
Create	`mkrcrelationship -master V1 -aux V2 -cluster REMOTE`	add `-global` for Global Mirror, `-cyclingmode multi` for GMCV
Start / stop	`startrcrelationship` / `stoprcrelationship`	`-force` variants exist; group forms: `*rcconsistgrp`
Failover	`stoprcrelationship -access`	grants host access to the auxiliary — the takeover verb
Reverse / failback	`switchrcrelationship -primary aux|master`	flips copy direction once both sides are consistent
PBR era (8.7.1+)	`chvolumegroup -replicationpolicy POL` · `lsvolumegroupreplication`	replication becomes a policy attached to a volume group

HPE 3PAR / Primera / Alletra 9000 — Remote Copy CLI

Stage	Command	Notes
Status	`showrcopy`	groups, targets, sync state, last-sync times
Create	`creatercopytarget` → `creatercopygroup GRP TARGET:sync|periodic|async` → `admitrcopyvv VV GRP TARGET:VV_DR`	mode is named at group creation
Start / stop	`startrcopygroup GRP` / `stoprcopygroup GRP`	periodic groups also take `setrcopygroup period`
Planned switchover	`setrcopygroup switchover GRP`	orderly role reversal, no data loss
Disaster failover	`setrcopygroup failover GRP`	run on the target system
Return home	`setrcopygroup recover GRP` → `setrcopygroup restore GRP`	recover resyncs back; restore reverts roles to original

Pure Storage FlashArray — Purity CLI

Stage	Command	Notes
Async (pgroup) status	`purepgroup list --schedule` · `purepgroup list --transfer`	frequency is in seconds
Async create	`purepgroup create --targetlist ARRAY2 PG` · `purepgroup setattr --replicate-frequency N PG`	members via `purepgroup add --vollist`
ActiveDR status	`purepod list` · `purepod list --replica-link`	pod is the consistency + failover unit
Pause / resume	`purepod replica-link pause` / `… resume`
Failover / failback	`purepod promote POD` (target) · `purepod demote POD`	demote with `--skip-quiesce` exists for emergencies — know what it forfeits

Dell PowerScale — SyncIQ (`isi sync`)

Stage	Command	Notes
Status	`isi sync policies list` · `isi sync jobs list` · `isi sync reports list`
Create / run	`isi sync policies create` → `isi sync jobs start POLICY`	schedule-driven; directory-tree scoped
Failover	`isi sync recovery allow-write POLICY`	run on the target cluster — makes the target tree writable
Failback prep	`isi sync policies resync-prep POLICY`	creates the mirror policy that carries changes home; then run it, allow-write on source, resync-prep again

Dell Data Domain — MTree replication

Stage	Command	Notes
Create + first sync	`replication add source mtree://… destination mtree://…` → `replication initialize`
Status	`replication status` · `replication show performance`	lag and throughput per context
Failover	`replication break`	destination MTree becomes writable
Failback	`replication resync`	re-establishes after a break, in either direction

Dell Unity XT & PowerStore — session CLIs

Platform	Verbs	Notes
Unity (`uemcli`)	`/prot/rep/session show` · `… -id ID sync` · `failover` · `failback`	sessions are the object; `-async`/planned flags per operation
PowerStore (`pstcli` / REST actions)	`replication_session show` · `pause` · `resume` · `sync` · `failover` · `reprotect`	CLI verbs mirror the REST action names one-to-one

HPE Nimble / Alletra 6000 — volume collections

Stage	Command	Notes
Status	`volcoll --list` · `partner --list`	replication rides protection schedules on volume collections
Planned handover	`volcoll --handover NAME --partner P`	graceful role reversal — drains, then flips
Disaster	`volcoll --promote NAME` (on target) · later `volcoll --demote NAME --partner P`	promote grants writes at DR; demote rejoins the original as replica

Platforms with no operator CLI for replication — on purpose: Cohesity and Rubrik replicate by policy (protection policies / SLA Domains) applied in the UI or API; Dell ECS and NetApp StorageGRID replicate by storage policy (replication groups / ILM rules) across sites. There are no pair verbs to memorize — the skill shifts to reading the policy and verifying compliance, which is exactly what the theory section's "mechanism decides the math" rule prepares you for.

Full option references: Dell Solutions Enabler SRDF CLI Guide, the ONTAP command reference, Hitachi CCI guides, IBM Storage Virtualize command docs, and the HPE 3PAR CLI Reference — see Resources.

06 · Replication Rosetta Stone

One discipline, four vocabularies

SRDF, SnapMirror, Universal Replicator, and ActiveDR are the same idea wearing four uniforms: a source that owns the write, a target that shadows it, a link between them, and a set of verbs for breaking and reversing that relationship on purpose. Engineers who know one stack freeze when handed another — not because the concepts changed, but because every vendor renamed them. This table is the translation layer.

Scope: mappings are conceptual equivalents, not drop-in substitutes — consistency semantics, RPO behavior, and prerequisites differ per platform and per mode (sync vs async). Commands shown are the canonical CLI forms (SYMCLI, ONTAP CLI, Hitachi CCI, Purity CLI); flags vary by version. Rehearse on non-production pairs before any real failover.

Command & concept mapping

Concept	Dell EMC SRDF	NetApp SnapMirror	Hitachi (TrueCopy / UR)	Pure ActiveDR
Unit of replication	Device pair (R1 → R2) in an RDF group	Volume relationship (source → destination)	P-VOL → S-VOL pair in a copy / journal group	Pod (volumes + config) over a replica link
Consistency construct	Consistency group (`symcg`)	Consistency group (SM-S) / per-volume Snapshot lineage	Consistency group (CTG); journals for UR	The pod itself is the consistency boundary
Create + first sync	`symrdf createpair -establish`	`snapmirror create` → `snapmirror initialize`	`paircreate`	`purepod replica-link create`
Incremental update	continuous (SRDF/S sync, SRDF/A cycles)	`snapmirror update` (async, scheduled)	continuous (TC sync; UR via journals)	continuous near-sync
Pause / resume	`symrdf suspend` / `symrdf resume`	`snapmirror quiesce` / `snapmirror resume`	`pairsplit` / `pairresync`	replica-link pause / resume
Planned failover	`symrdf failover` (R1 write-disabled, R2 RW)	`snapmirror quiesce` + `break` (destination RW)	`horctakeover` (swap-takeover when links healthy)	`purepod promote` (target; demote source first)
Unplanned failover	`symrdf failover` from surviving side	`snapmirror break` at destination	`horctakeover` (S-VOL takeover → SSWS)	`purepod promote` at target
Failback	`symrdf update` → `symrdf failback`	`snapmirror resync` (reverse) → break → resync original	`pairresync` variants, then takeover back	demote / promote back across the link
Reverse roles for good	`symrdf swap` (R1↔R2 personalities)	delete + re-create relationship in reverse (or reverse resync)	swap-takeover	promote target, demote original; direction follows
Healthy state name	Synchronized (S) / Consistent (A)	Snapmirrored	PAIR	replicating
Split state name	Split / Suspended / Failed Over	Broken-off / Quiesced	PSUS (planned) / PSUE (error) / SSWS (takeover)	paused / promoted
Sync flavors	SRDF/S (sync) · SRDF/A (async) · Adaptive Copy (bulk)	Async (XDP policies) · SnapMirror Synchronous (Sync / StrictSync)	TrueCopy (sync) · Universal Replicator (async, journal)	ActiveDR (near-sync) · ActiveCluster (sync, stretched pod + mediator)
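A translation table like this drops naturally into a lookup structure for runbook tooling. The sketch below copies three rows from the table; the dictionary keys (`srdf`, `snapmirror`, `hitachi`, `activedr`) and the `translate` helper are hypothetical names, and the strings are the canonical forms only, without the device and group arguments real commands need.

```python
# Concept -> vendor -> canonical CLI form, copied from the mapping table.
ROSETTA = {
    "create_first_sync": {
        "srdf": "symrdf createpair -establish",
        "snapmirror": "snapmirror create -> snapmirror initialize",
        "hitachi": "paircreate",
        "activedr": "purepod replica-link create",
    },
    "pause": {
        "srdf": "symrdf suspend",
        "snapmirror": "snapmirror quiesce",
        "hitachi": "pairsplit",
        "activedr": "purepod replica-link pause",
    },
    "unplanned_failover": {
        "srdf": "symrdf failover",
        "snapmirror": "snapmirror break",
        "hitachi": "horctakeover",
        "activedr": "purepod promote",
    },
}


def translate(command: str, to_vendor: str) -> str:
    """Find the concept a command belongs to and return another vendor's verb."""
    for verbs in ROSETTA.values():
        if command in verbs.values():
            return verbs[to_vendor]
    raise KeyError(f"no mapping for {command!r}")


if __name__ == "__main__":
    print(translate("snapmirror break", "hitachi"))  # horctakeover
```

The lookup returns a conceptual equivalent, not a drop-in substitute, so the scope note above still applies to anything it prints.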

The lifecycle every stack shares

Strip away the vendor names and one state machine remains. Learn it once; map it forever.
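One way to make that shared state machine concrete is a small transition table. The sketch below is a deliberate simplification: the state names and verbs are generic placeholders chosen for illustration, not any vendor's states, and `failback` collapses the resync and role-reversal steps into one move.

```python
from dataclasses import dataclass

# (current state, verb) -> next state; anything not listed is rejected.
TRANSITIONS = {
    ("unpaired", "create"): "replicating",
    ("replicating", "pause"): "paused",
    ("paused", "resume"): "replicating",
    ("replicating", "failover"): "failed_over",
    ("paused", "failover"): "failed_over",
    ("failed_over", "failback"): "replicating",
}

# Which side is allowed to accept writes in each state.
WRITER = {
    "unpaired": "source",
    "replicating": "source",
    "paused": "source",
    "failed_over": "target",
}


@dataclass
class Pair:
    state: str = "unpaired"

    def apply(self, verb: str) -> str:
        """Apply a verb, or raise if it is not valid from the current state."""
        key = (self.state, verb)
        if key not in TRANSITIONS:
            raise ValueError(f"{verb!r} is not valid from state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state

    @property
    def writer(self) -> str:
        return WRITER[self.state]


if __name__ == "__main__":
    pair = Pair()
    for verb in ("create", "pause", "failover", "failback"):
        print(verb, "->", pair.apply(verb), "| writes on", pair.writer)
```

Keeping write ownership in its own table makes the point that failover changes who may write, while data movement happens in the transitions around it.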

Three rules that survive every vendor

1 — Failover is a write-ownership transfer, not a copy operation. Whether it's symrdf failover, snapmirror break, or horctakeover, the command's real job is deciding which side is allowed to accept writes. Data movement is what happens before and after.

2 — Async means the target is a point in time, not a mirror. SRDF/A cycles, SnapMirror schedules, and UR journals all trade currency for distance. Know the cycle/schedule interval — that is a component of your RPO, and it belongs in the DR runbook as a number, not an adjective; see the worked RPO formulas above for how that interval becomes an actual worst-case number per mechanism. (The RPO bandwidth calculator below turns change rate into required link capacity; the journal/CDP sizing calculator turns change rate into required journal capacity.)

3 — The failback plan is the failover plan. Every takeover creates an inverted relationship that someone must resync, reverse, or rebuild. A DR drill that ends at "application is up at site B" is half a drill. Write the return leg first.

07 · Interactive Lab

The replication simulator

One replication pair, six vendor CLIs. The state machine underneath never changes — only the vocabulary does, which is the entire thesis of this site made playable. Type commands, watch the pair react, and run the missions every storage engineer must be able to perform half-asleep. Type help to list every modeled command in the current dialect — inventory (symrdf list, lsrcrelationship, showrcopy…), health (ping, verify, pairvolchk), mode changes, split for DR tests, update before failback, swap — the full lifecycle. Switch dialects mid-mission and finish in another vendor’s words.

Training model, on purpose: the simulator teaches state transitions and verb mapping, not exact CLI output or every flag. Real commands take device/group arguments and have prerequisites this lab intentionally simplifies. Rehearse the real thing on non-production pairs.

08 · SAN Zoning Deep Dive

Why hard zoning exists, and where it breaks down

The Zoning Studio below generates zone configs; this section is the theory the tool assumes you already know. It covers two topics that 101-level "soft vs. hard zoning" writeups usually name but don't finish explaining: what soft zoning's WWPN-based membership actually fails to stop, and what happens to zone enforcement once NPIV puts more than one host identity behind a single physical switch port.

Soft zoning: membership is not enforcement

Zoning has two independent jobs that are easy to conflate: membership (which WWPNs are configured into a zone together) and enforcement (what actually stops traffic between WWPNs that aren't). Soft zoning does the first and not the second.

Under soft zoning, the fabric's name server simply omits devices outside a WWPN's zone from that WWPN's query results — a host asking "what targets exist?" only gets back the targets it's zoned to see. That is a discovery filter, not a traffic block. A host, VM, or compromised initiator that already knows (or guesses, or is manually configured with) a target's WWPN can address it directly — the switch enforces nothing at the frame level and simply forwards the frame, because soft zoning never programmed a hardware filter to drop it. Hard zoning is different in exactly this respect: the fabric switch enforces zone membership at the port ASIC, validating source ID (S_ID) and destination ID (D_ID) on every frame and dropping traffic between ports whose IDs aren't co-zoned — independent of what any device claims its own WWPN is.

What this means operationally: soft zoning is a convenience/organization feature (cleaner name-server output, fewer support calls about "phantom" targets) and a compliance nicety on trusted, well-managed fabrics — it is not a security boundary. Any environment doing multi-tenant SAN access, or zoning across a boundary you don't fully trust, wants hard zoning (or WWPN zoning enforced at the ASIC — check your specific switch's default; some fabrics ship WWPN zoning as soft by default and require explicit hard-zone configuration) plus LUN masking as the actual control. Treat WWPN membership as an inventory/organization tool and hard enforcement + masking as the security control — don't let one stand in for the other.
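The membership-versus-enforcement split is easy to model. The toy sketch below uses short labels in place of real WWPNs and a hypothetical two-zone fabric; it is a teaching model of the behavior described above, not a representation of any switch's implementation.

```python
# Toy fabric: zone name -> member labels standing in for WWPNs.
ZONES = {
    "zone_a": {"host_a", "array_1"},
    "zone_b": {"host_b", "array_2"},
}


def co_zoned(a: str, b: str) -> bool:
    """True when both members share at least one zone."""
    return any(a in members and b in members for members in ZONES.values())


def name_server_query(initiator: str) -> set[str]:
    """Discovery filter: return only devices sharing a zone with the initiator."""
    visible: set[str] = set()
    for members in ZONES.values():
        if initiator in members:
            visible |= members - {initiator}
    return visible


def frame_delivered(src: str, dst: str, hard_zoning: bool) -> bool:
    """Soft zoning forwards any addressed frame; hard zoning drops non-co-zoned pairs."""
    return co_zoned(src, dst) if hard_zoning else True


if __name__ == "__main__":
    print(name_server_query("host_a"))                            # {'array_1'}
    print(frame_delivered("host_a", "array_2", hard_zoning=False))  # True
    print(frame_delivered("host_a", "array_2", hard_zoning=True))   # False
```

Under soft zoning `host_a` never discovers `array_2`, yet a frame addressed to it is still delivered, which is the gap that hard enforcement plus LUN masking closes.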

NPIV: when the WWPN you're zoning isn't the physical port

N-Port ID Virtualization (NPIV) lets one physical HBA port register multiple virtual WWPNs with the fabric — the mechanism behind per-VM WWPNs on a hypervisor, and behind N-Port Virtualizer (NPV) blade-switch designs where an entire chassis's server ports "borrow" fabric services from an upstream core switch rather than joining the fabric as full switches themselves.

Standard zoning theory assumes a roughly 1:1 relationship between a physical port and the WWPN sitting behind it. NPIV breaks that assumption on purpose — and that has two concrete consequences worth knowing before you zone an NPIV or blade environment:

1 — Zone enforcement responsibility shifts upstream. An NPV-mode blade switch doesn't hold a full fabric login and doesn't enforce zoning itself the way a full switch does — it proxies logins from its downstream server ports up to a core/enforcing switch, which is where zone membership is actually checked. Design and troubleshooting both have to account for this: a zoning problem that looks like it's on the blade switch is frequently a zone-database or zoneset-activation issue on the upstream core switch instead. Confirm which switch in the topology is the actual zoning enforcer before spending a change window on the wrong device — your fabric vendor's NPV/NPIV configuration guide states this explicitly per platform (Cisco MDS NPV, Brocade Access Gateway).

2 — Every virtual WWPN still needs its own zone membership. Because NPIV multiplexes several independent fabric identities onto one physical port, zoning by physical port location (port zoning) doesn't work at all in an NPIV environment — there is no single "the device on this port" to zone. WWPN-based zoning is effectively mandatory here: each virtual WWPN (each VM's virtual HBA, or each blade server's per-blade WWPN) is zoned individually, exactly as if it were its own physical initiator. On a densely virtualized hypervisor host or a full blade chassis, that means the zone count scales with virtual/blade WWPNs, not physical uplinks — plan zone-database and switch TCAM capacity against that real count, not the physical port count, on any large NPIV or blade deployment.

Both points above describe the standard, vendor-documented NPV/NPIV design tradeoff (fabric services proxied upstream; WWPN-only zoning) rather than a single vendor's specific defect — but exact TCAM budgets, maximum WWPNs-per-port, and zone-database size ceilings are switch-model- and firmware-specific. Check your platform's current configuration limits before sizing a large NPIV deployment.
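To see how quickly the real count diverges from the physical one, a back-of-envelope calculation helps. This sketch assumes single-initiator zoning (one zone per initiator WWPN) and one virtual WWPN per VM on every physical port of its host; both are modeling assumptions, and the function name is hypothetical.

```python
def zone_counts(hosts: int, ports_per_host: int, vms_per_host: int) -> dict[str, int]:
    """Compare sizing by physical ports with sizing by NPIV virtual WWPNs.

    Assumes one zone per initiator WWPN and that every VM presents one
    virtual WWPN on each physical port of its host.
    """
    physical = hosts * ports_per_host
    virtual = hosts * ports_per_host * vms_per_host
    return {"physical_port_count": physical, "virtual_wwpn_zones": virtual}


if __name__ == "__main__":
    # 16 hosts, 2 HBA ports each, 20 NPIV-enabled VMs per host.
    print(zone_counts(16, 2, 20))
```

In this example 32 physical ports turn into 640 zoned identities, which is the figure to compare against the platform's documented limits.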

09 · Modern Storage Practice

Ransomware resilience, fabric evolution, key management

Five topics that came up repeatedly when checking what practitioners actually search for versus what this guide covered. Weighted honestly: some of these have real, citable, vendor-verified mechanics; one — capacity forecasting — turned out to be mostly vendor marketing dressed as methodology, and is treated that way below rather than padded out.

Ransomware-resilient backup: 3-2-1-1-0

The classic 3-2-1 rule (3 copies, 2 different media, 1 offsite) says nothing about an adversary who can authenticate to your backup infrastructure and delete or encrypt the backups themselves, which is exactly what modern ransomware playbooks target before triggering encryption on production. 3-2-1-1-0 adds two digits to close that gap. Per Veeam, which popularized this extension: the additional 1 means one copy that is offline, air-gapped, or immutable. These are alternative ways to satisfy the same digit, not three separate mandates. Some secondary sources treat "air-gapped" and "immutable" as synonyms; they are different mechanisms that happen to be interchangeable for this one requirement. The 0 means zero recovery errors, verified by actually testing recovery, not by assuming a completed backup job is a restorable one.

What "immutable" technically means, and why the distinction matters: three different mechanisms get called "immutable" and they are not equivalent. WORM (write-once-read-many) blocks in-place modification at the filesystem level until a retention period expires. S3 Object Lock has two meaningfully different modes: governance mode blocks delete/overwrite unless the caller holds a specific bypass permission (an admin can still override it), while compliance mode blocks it for everyone, including the account root, until the lock expires. Confusing these two in a design review is a real risk, not a technicality. Array-level immutable snapshots are typically enforced as a time-locked retention flag: the array itself refuses to honor a delete from any caller until the timer expires. This is enforcement at the array, not a permission grant that a sufficiently privileged admin can route around.
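The governance/compliance difference fits in one decision function. The sketch below models the rule as described above; it does not call any S3 API, the function and parameter names are hypothetical, and `has_bypass_permission` stands in for the caller holding the governance bypass permission (`s3:BypassGovernanceRetention` in AWS terms).

```python
from datetime import datetime, timezone


def delete_allowed(mode: str, retain_until: datetime, now: datetime,
                   has_bypass_permission: bool = False) -> bool:
    """Model whether a locked object version may be deleted.

    mode is 'governance' or 'compliance', after the two Object Lock modes.
    """
    if now >= retain_until:
        return True  # lock expired: ordinary permissions apply again
    if mode == "governance":
        return has_bypass_permission  # a privileged override is possible
    if mode == "compliance":
        return False  # nobody, including the account root, before expiry
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    until = datetime(2030, 1, 1, tzinfo=timezone.utc)
    today = datetime(2029, 1, 1, tzinfo=timezone.utc)
    print(delete_allowed("governance", until, today, has_bypass_permission=True))  # True
    print(delete_allowed("compliance", until, today, has_bypass_permission=True))  # False
```

A design review can use the same two inputs, mode and who holds the bypass, to decide whether a bucket counts toward the immutable copy in 3-2-1-1-0.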

Platform	Feature	Mechanism
Pure Storage	SafeMode Snapshots	Destroyed snapshots enter an eradication timer (default 24h, configurable up to 30 days on FlashArray) during which they cannot be permanently removed. Increasing the timer is a lower-friction request; lowering or disabling it requires going through Pure Support with two designated, Support-verified authorized contacts approving — the asymmetry between the two directions is the point.
NetApp ONTAP	SnapLock (Compliance / Enterprise) + Snapshot copy locking	A tamper-resistant ComplianceClock enforces WORM. Compliance mode: no one, including cluster admins, can delete before expiry. Enterprise mode: a privileged admin retains an early-delete path. Snapshot copy locking (ONTAP 9.12.1+) extends the same clock to lock individual Snapshot copies, not just SnapLock volumes.
Dell PowerMax	Secure Snaps	Time-locked SnapVX snapshots; no user can terminate a Secure Snap during its retention period, and it auto-terminates at TTL expiry once no linked targets or restore sessions remain — check current documentation for the exact behavior of a snap actively in a restore operation.
Dell PowerStore	Secure Snapshots	Block and file snapshots that cannot be deleted even by the top administrative role; expiration can be extended but never reduced. PowerStoreOS 3.5 documents secure-snapshot replication and conversion of existing snapshots to secure — treat 3.5 as a confirmed capability point rather than necessarily the feature's introduction release; check Dell's release notes for the exact version if that distinction matters to your design.
Dell PowerProtect Cyber Recovery	Isolated recovery vault (separate product)	Applies here as a distinct product, not a PowerStore/PowerMax feature: an air-gapped vault holding retention-locked immutable copies plus clean-room recovery analytics, positioned as the last line after primary immutable snapshots.
IBM FlashSystem / Storage Virtualize	Safeguarded Copy (8.4.2+)	Immutable point-in-time copies held in an isolated backup pool that is never mapped to a host — the copy is unreachable for modification or deletion by design, not merely by permission.
Hitachi VSP	Thin Image + Data Retention Utility	Snapshots carry a customer-set retention timer that cannot be shortened by an admin once applied, layered with WORM. The strongest immutability claims in the current lineup are model-specific (VSP One Block 20 with HDPS IntelliSnap) rather than a blanket capability across every VSP generation — check your specific model.
Nutanix	WORM on Nutanix Unified Storage (Files Enterprise WORM + Objects Object Lock)	Native Nutanix Files (Enterprise WORM, 4.1+) and native Nutanix Objects (S3-compatible Object Lock) enforce write-once-read-many immutability for all callers during the retention window. The scope is the Files/Objects (NUS) services specifically, not native VM/volume-level snapshots: it does not cover a Nutanix AHV VM snapshot. Data Lens is a separate analytics/ransomware-detection layer over NUS; it reports on and helps recover from threats, but the immutability enforcement itself lives in Files/Objects, not in Data Lens.

Not independently verified for this table: a distinct admin-proof immutable snapshot feature on Dell Unity XT specifically (only ordinary retention schedules and host-access locks were confirmed) — if your environment depends on Unity for ransomware resilience, verify current firmware capability directly with Dell rather than assuming parity with PowerMax/PowerStore.

NVMe-oF: the fabric bindings, and why FC-NVMe is a different standards body

NVMe over Fabrics (NVMe-oF) is published by NVM Express, Inc. — the same organization that owns the base NVMe specification. NVMe-oF 1.0 (2016) defined the fabric-independent command and queueing model plus an initial RDMA transport binding (covering InfiniBand, RoCE, and iWARP under one RDMA binding). NVMe-oF 1.1 (2019) added the TCP transport binding (NVMe/TCP) along with improved multipath and discovery. The spec family has since been restructured into a modular set of documents, all maintained at nvmexpress.org.

The nuance worth getting right: FC-NVMe is not an NVM Express document. NVMe over Fibre Channel (FC-NVMe, aka NVMe/FC) is defined by INCITS Technical Committee T11, the Fibre Channel standards body, and published as ANSI/INCITS 540. NVM Express deliberately left the Fibre Channel transport to T11 rather than authoring it itself, because FC already had its own mature standards body and FC-4 frame-mapping convention. In practice, "NVMe-oF" gets used loosely to mean the whole family (RDMA + TCP + FC), while "FC-NVMe" specifically denotes the T11-defined FC transport binding. Be precise about which one you mean in a design document, since they come from different standards processes with different change cadences.

This sits alongside this guide's existing FC and iSCSI content as the third transport family: NVMe/TCP and NVMe/RoCE run over Ethernet fabrics (the RoCE binding needs a lossless/DCB-configured fabric the same way FCoE does; NVMe/TCP does not), while FC-NVMe rides existing Fibre Channel fabrics and zoning exactly like traditional FCP — the zoning theory above applies to FC-NVMe unchanged, since zoning operates on WWPN identity regardless of which upper-layer protocol (FCP or FC-NVMe) rides on top of the FC fabric.

Encryption and key management: SED vs. array-level vs. application-level, and KMIP

Three layers get called "encryption at rest" and they protect against different failure modes:

Layer	How it works	What it protects against	Tradeoff
Self-encrypting drive (SED)	Hardware AES engine on the drive itself, governed by TCG Enterprise/Opal protocols	Data exposure from a physically removed or decommissioned drive	Near-zero performance cost, but provides no protection for a running, authenticated array — the drive decrypts transparently for any authorized controller
Array/controller-level	Software or controller-based encryption applied above the drive layer, centrally keyed	Broader at-rest exposure with centralized key management	Simpler key management than per-drive SEDs, but naive implementations that encrypt before dedup/compression destroy both — most array vendors encrypt after reduction specifically to avoid this, verify yours does too
Application-level	Encrypted before data ever leaves the host/application	Storage-layer compromise entirely — the array never sees plaintext	Strongest protection against a compromised storage layer, but ciphertext's high entropy defeats storage-side dedup/compression outright, and complicates backup, search, and restore workflows

FIPS 140-2 is being retired, and the deadline is near-term, not distant. NIST's Cryptographic Module Validation Program (CMVP) stopped accepting new FIPS 140-2 validation submissions on September 22, 2021 for most vendors. A narrow carve-out extended that for CSTL-contracted vendors already in the pipeline before June 15, 2021, but even that extension closed for good on April 1, 2022. All new module validations target FIPS 140-3 today. Existing FIPS 140-2 validated modules remain accepted for federal use through September 21, 2026, the date on which the CMVP moves them to the "Historical" list: still valid for already-deployed systems, but explicitly discouraged for new procurements. If you're specifying storage for a federal or federally-adjacent environment, confirm which FIPS generation your target array's crypto module is actually validated against, not just whether it says "FIPS-compliant" on a datasheet.

KMIP (Key Management Interoperability Protocol) is an OASIS-ratified protocol for standardized communication between storage/encryption endpoints and a centralized external key management server, typically over TCP port 5696. Dell, IBM, and NetApp are documented members of the OASIS KMIP Technical Committee; Nutanix documents support for KMIP-compliant external key management servers in its own security documentation. Pure FlashArray documents Purity//FA native encryption with external KMS integration in its security guides; independently confirm current KMIP-specific support against Pure's current documentation before designing around it, since the mechanism wasn't verified line-by-line against a live KMIP conformance statement for this guide. Same caveat for Hitachi VSP — treat vendor KMIP support as something to confirm against the specific firmware/CM release you run, not as a blanket guarantee across a platform family.

Capacity forecasting — and an honest assessment of what's actually out there

Most vendor capacity-forecasting features are marketed as predictive or AI-driven without publishing the underlying methodology — which, on inspection, is itself informative: it suggests there usually isn't much methodology to publish. NetApp is the one vendor in this guide's coverage that documents its actual algorithm: Active IQ Digital Advisor's Capacity Forecast feature computes an average weekly growth rate from up to twelve months of historical used-capacity data, then extrapolates that rate forward across a one-to-six-month window, flagging systems approaching a 90% projected-utilization threshold — explicitly accounting for reconfiguration events (an aggregate expansion isn't misread as organic growth). That is a real, documented growth-rate-extrapolation methodology — not the "AI/ML-driven" framing Active IQ's broader marketing implies elsewhere; the ML capabilities Active IQ is best known for (anomaly and performance detection) are documented as separate features from this specific capacity forecast, though NetApp's own materials don't draw that exact boundary in a single place.

What this guide will not do: manufacture forecasting methodology that doesn't exist in public vendor documentation. Dell CloudIQ's and Pure's predictive-capacity / run-out-date features describe the output (a projected exhaustion date) without publishing algorithm internals in public docs — which is a legitimate reason to treat their outputs as directional planning signals, not audited projections. If your capacity planning needs to survive an audit or a budget justification, build your own weighted-average or exponential-smoothing model from your own historical utilization data rather than relying on an unpublished vendor black box — at minimum you'll be able to explain the number when asked.
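A minimal sketch of the "build your own" option, assuming you already have one used-capacity sample per week. It shows plain growth-rate extrapolation and a simple exponentially smoothed variant. The function names, the `alpha` value, and the 90% threshold default are illustrative choices for this example, not any vendor's published algorithm.

```python
from typing import Optional, Sequence


def average_weekly_growth(used: Sequence[float]) -> float:
    """Mean week-over-week change across weekly used-capacity samples."""
    if len(used) < 2:
        raise ValueError("need at least two weekly samples")
    return (used[-1] - used[0]) / (len(used) - 1)


def smoothed_weekly_growth(used: Sequence[float], alpha: float = 0.3) -> float:
    """Exponentially smoothed weekly growth: recent weeks weigh more.

    alpha is an assumed tuning knob (0 < alpha <= 1), not a vendor constant.
    """
    if len(used) < 2:
        raise ValueError("need at least two weekly samples")
    deltas = [b - a for a, b in zip(used, used[1:])]
    level = deltas[0]
    for delta in deltas[1:]:
        level = alpha * delta + (1 - alpha) * level
    return level


def weeks_until_threshold(
    used_now: float,
    capacity: float,
    weekly_growth: float,
    threshold: float = 0.90,
) -> Optional[float]:
    """Weeks until used capacity crosses threshold * capacity.

    Returns 0.0 if already past the threshold, None if growth is flat or negative.
    """
    limit = capacity * threshold
    if used_now >= limit:
        return 0.0
    if weekly_growth <= 0:
        return None
    return (limit - used_now) / weekly_growth


if __name__ == "__main__":
    history_tib = [60.0, 61.0, 62.0, 63.0, 64.0]  # made-up weekly samples
    growth = average_weekly_growth(history_tib)
    print(growth, weeks_until_threshold(history_tib[-1], 100.0, growth))
```

Neither function handles reconfiguration events; if an aggregate was expanded or data was migrated during the history window, trim or adjust the series before feeding it in, or the growth rate will be wrong in a way you cannot explain later.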

Hybrid and multi-cloud storage: two concrete mechanisms, not marketing

NetApp FabricPool operates at the block level (4KB blocks), not the file level — the same mechanism tiers both NAS and SAN data uniformly, since ONTAP doesn't distinguish file-vs-LUN at the tiering layer itself. A tiering minimum cooling period defines how long a block must go untouched before it's eligible to move: the Auto policy defaults to 31 days and is manually adjustable from 2 to 183 days (ONTAP 9.8+). (Cloud Volumes ONTAP has a separate, distinct behavior where Auto tiering activates once the aggregate crosses roughly 50% capacity — don't conflate that trigger with the on-prem cooling-period default described here.) A daily background scan finds cold blocks, packages them into 4MB objects, and writes them to the configured object store (AWS S3, Azure Blob, Google Cloud Storage, or an on-prem S3-compatible target including StorageGRID). Tiering policy (None / Snapshot-Only / Auto / All) controls which data classes are eligible for tiering at all.

AWS Storage Gateway bridges on-prem to cloud through four gateway types. Three are detailed here; the fourth, FSx File Gateway, is closed to new customers but remains documented. All share a read-through/write-back local cache: writes commit locally first for low latency, then replicate asynchronously to AWS. S3 File Gateway presents NFS/SMB and lands files as native S3 objects that remain directly manageable via S3 APIs afterward. Volume Gateway presents iSCSI block volumes: its "cached volumes" mode keeps only a working set local while the full dataset lives in S3, and its point-in-time snapshots materialize as incremental (changed-blocks-only) EBS snapshots. Tape Gateway emulates a virtual tape library and media changer over iSCSI for existing backup software, with virtual tapes living in S3 and optionally archived to Glacier Flexible Retrieval or Glacier Deep Archive.

Other vendors in this guide's coverage (Pure, Dell, Hitachi, IBM, Nutanix) have their own cloud-tiering and hybrid mechanisms; they aren't detailed here because this guide doesn't yet have primary-source-verified mechanics for them at the same depth as FabricPool and Storage Gateway above. Treat their absence as "not yet researched to this guide's standard," not as "doesn't exist."

10 · Engineering Tools

The math you keep re-deriving

Five calculators, all client-side — nothing you type leaves your browser. Each encodes a formula storage engineers rebuild in spreadsheets every year.

IOPS-weighted latency

Σ(IOPSᵢ × latencyᵢ) ÷ Σ(IOPSᵢ) — the only honest way to roll per-volume latency up to an array number

A straight average lets a thousand idle volumes hide one suffering database. Weighting by IOPS makes the roll-up reflect what hosts actually experience. Paste rows as iops,latency_ms — one per line.

workload rows (iops,latency_ms)
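The roll-up above can be sketched in a few lines of Python. This is an illustration of the formula, not the calculator's actual source; `parse_rows` and `iops_weighted_latency` are names made up for this example.

```python
from typing import List, Tuple


def parse_rows(text: str) -> List[Tuple[float, float]]:
    """Parse 'iops,latency_ms' lines, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        iops_s, latency_s = line.split(",")
        rows.append((float(iops_s), float(latency_s)))
    return rows


def iops_weighted_latency(rows: List[Tuple[float, float]]) -> float:
    """sum(iops_i * latency_i) / sum(iops_i), in the latency unit supplied."""
    total_iops = sum(iops for iops, _ in rows)
    if total_iops == 0:
        raise ValueError("total IOPS is zero; nothing to weight by")
    return sum(iops * latency for iops, latency in rows) / total_iops


if __name__ == "__main__":
    # Two near-idle volumes and one busy, slow database volume (made-up numbers).
    sample = "10,0.5\n10,0.5\n20000,8.0"
    rows = parse_rows(sample)
    plain_average = sum(latency for _, latency in rows) / len(rows)
    print(f"plain average: {plain_average:.2f} ms")
    print(f"IOPS-weighted: {iops_weighted_latency(rows):.2f} ms")
```

With that sample the plain average lands at 3 ms while the weighted figure sits just under 8 ms, which is the number the database hosts are actually living with.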

RPO bandwidth estimator

required link ≈ (change volume ÷ replication window) × protocol overhead

First-order sizing for async replication: can the link drain your change rate inside the RPO window? Overhead covers protocol framing and journal/metadata cost; 1.2–1.3 is a common planning factor. This ignores burstiness — profile peak-hour change rate separately before you commit a design.

data changed (GB)

over a window of (hours)

target RPO (minutes)

overhead factor
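A first-order sketch of the arithmetic, assuming decimal units (1 GB = 10⁹ bytes, 1 Mbit = 10⁶ bits) and a steady change rate. The function names are invented for this example, and `drain_minutes` is one assumed way to relate a backlog to the target RPO; the calculator on the page may combine its inputs differently.

```python
def required_link_mbps(changed_gb: float, window_hours: float, overhead: float) -> float:
    """Sustained link rate (Mbit/s) needed to ship changed_gb within window_hours."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    bits = changed_gb * 1e9 * 8
    seconds = window_hours * 3600
    return bits / seconds * overhead / 1e6


def drain_minutes(backlog_gb: float, link_mbps: float, overhead: float) -> float:
    """Minutes to drain a replication backlog over a link of link_mbps."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    bits_on_wire = backlog_gb * 1e9 * 8 * overhead
    return bits_on_wire / (link_mbps * 1e6) / 60


if __name__ == "__main__":
    # Illustrative inputs: 500 GB changed over 8 hours, 1.25 planning factor.
    need = required_link_mbps(500, 8, 1.25)
    print(f"sustained requirement: {need:.1f} Mbit/s")
    # Can a 1 Gbit/s link clear a 10 GB burst inside a 15-minute RPO?
    print(f"10 GB burst drains in {drain_minutes(10, 1000, 1.25):.1f} min")
```

Compare the drain time for your measured peak-hour burst against the target RPO; if the drain time is longer, the link misses the RPO during exactly that burst, even when the sustained figure looks comfortable.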

Capacity converter — base-2 vs base-10

1 TiB = 1.0995 TB · the ~7–10% gap behind half of all capacity disputes

Arrays, operating systems, and procurement sheets mix GiB (2³⁰) and GB (10⁹) freely. Convert once, at a known boundary, and label the unit.

value

from
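The conversion itself is one multiplication and one division through bytes. A small sketch, with `UNIT_BYTES` and `convert` as names chosen for this example:

```python
# Bytes per unit: decimal (SI) prefixes are powers of 10, binary (IEC) are powers of 2.
UNIT_BYTES = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50,
}


def convert(value: float, src: str, dst: str) -> float:
    """Convert value from unit src to unit dst by way of bytes."""
    return value * UNIT_BYTES[src] / UNIT_BYTES[dst]


if __name__ == "__main__":
    print(f"1 TiB = {convert(1, 'TiB', 'TB'):.4f} TB")
    print(f"1 GiB = {convert(1, 'GiB', 'GB'):.4f} GB")
    print(f"100 TB = {convert(100, 'TB', 'TiB'):.2f} TiB")
```

The gap widens with each prefix step (about 7.4% at GiB/GB, about 10% at TiB/TB), which is why a quote in TB and a dashboard in TiB drift further apart as systems grow.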

WWN decoder

NAA format + IEEE OUI vendor lookup — paste any 16-hex-digit WWN/WWPN

Identifies the NAA naming format and the registered vendor (OUI) inside a World Wide Name. Colons, dashes, and case are ignored. Vendor table covers the OUIs most common on enterprise SAN fabrics; an unlisted OUI just means it's outside this table, not that the WWN is invalid.

WWN / WWPN
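A sketch of the decoding logic for 64-bit names, assuming the standard NAA layouts: in NAA 1 and NAA 2 the OUI occupies hex digits 5 through 10, and in NAA 5 it follows the leading NAA nibble directly. The `OUI_VENDORS` table here is a handful of entries for illustration only, not the site's table; check the IEEE registry for anything you depend on.

```python
import re
from typing import Dict, Optional

# Illustrative subset only; the IEEE OUI registry is the authority.
OUI_VENDORS = {
    "0000C9": "Emulex",
    "0024FF": "QLogic",
    "00A098": "NetApp",
    "24A937": "Pure Storage",
}

NAA_FORMATS = {
    "1": "NAA 1 (IEEE 48-bit)",
    "2": "NAA 2 (IEEE Extended)",
    "5": "NAA 5 (IEEE Registered)",
}


def decode_wwn(raw: str) -> Dict[str, Optional[str]]:
    """Identify the NAA format and OUI of a 16-hex-digit WWN/WWPN."""
    hexstr = re.sub(r"[:\-\s]", "", raw).upper()
    if not re.fullmatch(r"[0-9A-F]{16}", hexstr):
        raise ValueError("expected 16 hex digits after removing separators")
    naa = hexstr[0]
    if naa in ("1", "2"):
        oui = hexstr[4:10]   # after the NAA nibble and 12 vendor/zero bits
    elif naa == "5":
        oui = hexstr[1:7]    # immediately after the NAA nibble
    else:
        oui = None           # other NAA values are not handled in this sketch
    return {
        "wwn": ":".join(hexstr[i:i + 2] for i in range(0, 16, 2)).lower(),
        "naa": NAA_FORMATS.get(naa, f"NAA {naa} (not handled here)"),
        "oui": oui,
        "vendor": OUI_VENDORS.get(oui, "not in this table") if oui else None,
    }


if __name__ == "__main__":
    for sample in ("10:00:00:00:c9:12:34:56", "524A937ABCDEF001"):
        print(decode_wwn(sample))
```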

SAN Zoning Studio

bulk multi-host · Brocade FOS & Cisco MDS · alias or raw-WWN · zoneset clone workflow — the browser successor to the Auto-Zone workbook (Ramez Nagui, 2016) that storage teams still pass around

Paste your host HBAs and array targets in bulk; choose the zoneset strategy and naming convention; get complete, reviewable scripts for either fabric OS. One-to-Many builds one zone per HBA containing all selected targets (classic single-initiator zoning); One-to-One builds a zone per HBA-target pair. This generates WWPN-based zone configs specifically — the Zoning Deep Dive above explains why WWPN zoning is the right default and what it does and doesn't protect against.

switch vendor

zone members by

zoning relation

zoneset strategy

current / base zoneset name

clone / new zoneset name

zone name prefix

Cisco VSAN id

Cisco alias type

host HBAs — one per line: alias, wwpn

array targets — one per line: alias, wwpn
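The two zoning relations reduce to a small loop. The sketch below builds the zone membership and renders raw-WWPN Brocade FOS `zonecreate` / `cfgadd` lines; it is an illustration, not the Studio's generator, and the function names, the `z_` prefix, and the sample WWPNs are all invented here. It deliberately stops short of `cfgsave` and `cfgenable`, which belong in your reviewed change plan.

```python
from typing import List, Tuple

Member = Tuple[str, str]        # (alias, wwpn)
Zone = Tuple[str, List[str]]    # (zone name, member WWPNs)


def parse_members(text: str) -> List[Member]:
    """Parse 'alias, wwpn' lines, skipping blank lines."""
    members = []
    for line in text.splitlines():
        if not line.strip():
            continue
        alias, wwpn = (part.strip() for part in line.split(","))
        members.append((alias, wwpn.lower()))
    return members


def build_zones(hosts: List[Member], targets: List[Member],
                relation: str = "one-to-many", prefix: str = "z_") -> List[Zone]:
    """One zone per HBA with all targets, or one zone per HBA-target pair."""
    zones: List[Zone] = []
    for h_alias, h_wwpn in hosts:
        if relation == "one-to-many":
            zones.append((f"{prefix}{h_alias}", [h_wwpn] + [w for _, w in targets]))
        elif relation == "one-to-one":
            for t_alias, t_wwpn in targets:
                zones.append((f"{prefix}{h_alias}__{t_alias}", [h_wwpn, t_wwpn]))
        else:
            raise ValueError(f"unknown relation: {relation}")
    return zones


def brocade_lines(zones: List[Zone], cfg_name: str) -> List[str]:
    """Render zonecreate/cfgadd commands; activation is left to the operator."""
    lines = []
    for name, members in zones:
        member_list = "; ".join(members)
        lines.append(f'zonecreate "{name}", "{member_list}"')
        lines.append(f'cfgadd "{cfg_name}", "{name}"')
    return lines


if __name__ == "__main__":
    hosts = parse_members("esx01_hba0, 10:00:00:00:c9:aa:bb:01")
    targets = parse_members("arr_ct0, 52:4a:93:7a:bc:de:f0:01\n"
                            "arr_ct1, 52:4a:93:7a:bc:de:f0:02")
    for line in brocade_lines(build_zones(hosts, targets), "cfg_prod_clone"):
        print(line)
```

Keeping membership generation separate from rendering means a Cisco MDS renderer can reuse `build_zones` unchanged, and the zone list can be diffed against the running config before anything touches a switch.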

Change-window discipline: the clone-then-activate strategy exists so your rollback is one command — re-activate the original zoneset. Review names against your fabric convention, run in a maintenance window, and never paste generated config into a switch you haven't been authorized to change. cfgenable and zoneset activate are disruptive-capable operations.

Journal / CDP protection-window sizing

journal capacity ÷ change rate ≈ how far back in time your journal-based replica can roll — RecoverPoint, Hitachi UR, IBM Global Mirror

Journal-based replication (EMC/Dell RecoverPoint, Hitachi Universal Replicator, IBM Global Mirror) keeps one living copy plus a rolling log of writes, rather than discrete snapshots. The protection window (how far back you can roll) is bounded by journal capacity versus the rate at which writes consume it, not by a fixed schedule. This calculator does the sizing arithmetic in both directions: given a change rate and a target window, how big the journal needs to be; and given an existing journal, how much protection window it actually buys you. The exact per-platform constants (log-overhead reserve, safety margin) are vendor- and release-specific and are not hard-coded here. Enter your own from the current sizing guide for your platform: RecoverPoint's field guidance and Hitachi's journal-volume sizing whitepaper both publish worked formulas per release. The defaults below are common field-planning starting points, not fixed constants; confirm them before you size production capacity.

average write / change rate (MB/s)

target protection window (hours)

journal metadata / log-overhead reserve (fraction)

safety margin (multiplier)

Reverse direction — I already have a journal, what window does it buy me?

existing journal size (GB)

average write / change rate (MB/s)
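Both directions of the arithmetic, sketched under stated assumptions: decimal units (1 GB = 1000 MB), the reserve treated as the fraction of the journal unavailable for write history, and the safety margin applied as a straight multiplier. How the page's calculator combines those two inputs is not spelled out above, so treat this as one plausible reading; the function names and the example values are made up.

```python
def journal_size_gb(change_mb_s: float, window_hours: float,
                    reserve_fraction: float, safety: float) -> float:
    """Journal capacity (GB) needed to hold window_hours of writes."""
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    raw_gb = change_mb_s * 3600 * window_hours / 1000
    return raw_gb / (1 - reserve_fraction) * safety


def protection_window_hours(journal_gb: float, change_mb_s: float,
                            reserve_fraction: float, safety: float) -> float:
    """Protection window (hours) an existing journal buys at a given change rate."""
    if change_mb_s <= 0:
        raise ValueError("change_mb_s must be positive")
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_gb = journal_gb * (1 - reserve_fraction) / safety
    return usable_gb * 1000 / (change_mb_s * 3600)


if __name__ == "__main__":
    # Illustrative inputs only: 50 MB/s, 24 h window, 20% reserve, 1.2x margin.
    size = journal_size_gb(50, 24, 0.20, 1.2)
    print(f"journal needed: {size:.0f} GB")
    print(f"window from that journal: {protection_window_hours(size, 50, 0.20, 1.2):.1f} h")
```

The two functions are exact inverses of each other, so a quick round trip like the one in the example is a cheap sanity check on whatever constants you plug in.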

What this ignores, on purpose: real change rate is bursty, not average. Size against your measured peak-hour rate (from array performance history or the replication appliance's own reporting), not a 24-hour average, or the journal will roll over faster than planned during exactly the write spike that matters. Minimum journal sizes and maximum consistency-group counts are also platform- and release-specific ceilings this calculator does not model; check current vendor sizing limits before committing a design.

11 · Resources

The shortlist worth bookmarking

Curated, verified, and deliberately short. Primary vendor documentation first; the community references that have earned their place second.

Resource	What it covers	Source
ONTAP REST API: Getting Started	NetApp's canonical intro: request shape, queries, pagination, SVM tunneling, rate limits.	docs.netapp.com
Pure: Try the REST API	Token generation and first calls against FlashArray with a REST client, kept current by Pure. (Pure's community blog now lives at everpuredata.com; old blog.purestorage.com links redirect here.)	blog.everpuredata.com
Pure FlashArray Python client	The official Python wrapper: install, quick start, and full API glossary.	pure-storage-python-rest-client.readthedocs.io
Pure1 REST API	Fleet-level (as-a-service) telemetry: key-pair auth, capacity and busy-meter data across arrays.	blog.everpuredata.com
Dell: Unity REST walkthrough	Dell's developer-blog walkthrough of Unity REST including the EMC-CSRF-TOKEN flow, with links to the official Programmer's and Reference guides.	dell.com/community
ONTAP 9 Simulator (official)	NetApp's downloadable simulator: a full ONTAP in a VM for labs. Requires a NetApp account.	kb.netapp.com
FlackBox: free NetApp lab eBook	Step-by-step build of a complete two-cluster ONTAP simulator lab on your own PC, free.	flackbox.com
ONTAP day-to-day CLI cheat sheet	A working admin's command reference for daily ONTAP operations, actively maintained.	blog.matrixpost.net
rajeshvu: SRDF Operations	The classic illustrated walkthrough of SRDF failover, failback, split, swap, and update.	rajeshvu.com
rajeshvu: symrdf command list	Worked symrdf examples: pair files, dynamic RDF groups, modes, and queries.	rajeshvu.com
Learn Claude Code interactively	Not storage, but the learn-by-doing format this site tips its hat to. Terminal simulators in the browser, by Ahmed Nagdy.	claude.nagdy.me
Dell Solutions Enabler: SRDF CLI Guide	The full symrdf reference: every option for list, query, verify, ping, and all control operations.	delltechnologies.com (PDF)
ONTAP snapmirror command reference	Every snapmirror verb and flag, per ONTAP release (Lenovo-hosted mirror of the ONTAP command reference).	pubs.lenovo.com
HPE 3PAR/Primera/Alletra replication toolkit	HPE's official PowerShell wrappers around the Remote Copy CLI and WSAPI: a readable map of every rcopy verb.	github.com/HewlettPackard
m-khalifa.com	The author's portfolio, including a live AI twin briefed on 20+ years of storage engineering. Ask it anything on these topics.	m-khalifa.com

12 · About & Method

Why trust a personal site over vendor docs?

Don't — use both. Vendor documentation is authoritative for its own platform and always wins on version-specific detail. What it can't give you is the cross-vendor view: the patterns and traps you only learn by making twenty different arrays feed one pipeline. That's the job I do daily — building Python collectors and REST integrations across Pure, NetApp, Dell EMC, Hitachi, IBM, and more for an infrastructure-observability platform — and this site is the notebook from that work, published.

Method: every recipe here follows the same rules as production code. Nothing is included that hasn't been exercised against real or rigorously emulated arrays; behaviors are stated with the API generation they apply to; and when something is a planning heuristic rather than a law (overhead factors, for instance), it's labeled as one. Corrections are welcome and get credited — the fastest route is LinkedIn.

Roadmap: an in-browser array API sandbox (practice real request/response cycles against emulated endpoints, zero setup), more vendors (Hitachi VSP, PowerMax/Unisphere, Isilon/PowerScale), an interactive SRDF course, and Arabic-language editions — there is currently no Arabic-language enterprise storage resource of substance, and that should change.
