Ayhan Sipahi

Managing IAM Policies and Roles at Scale Without Hitting AWS Limits

The exact IAM size, attach, and quota limits you will hit at scale, and the scoped-policy, permission-boundary, and SCP structure that keeps you far from every one.

Abstract

IAM has hard, quiet ceilings. A managed policy can hold only so many characters. A role can carry only so many attached policies. Inline policies share a per-entity size budget that fills up without warning. Accounts have a default role quota. Teams that lean on one shared mega-role and paste permissions through the console drift toward these ceilings, then get blocked mid-deployment with a cryptic error. This guide names the exact current limits, shows what fails when you cross them, and lays out the structure that keeps you clear: least privilege as the default, one policy per responsibility, scoped roles instead of a shared mega-role, and permission boundaries plus SCPs as guardrail layers.

The single idea that organizes all of it: decide whether you want to grant or to cap. IAM identity and resource policies grant. Permission boundaries and SCPs only cap. Keep those two jobs separate and most limit problems never form.

The limits you will actually hit

IAM limits split into two categories, and confusing them wastes real time. Some are quotas you can raise through Service Quotas. Others are fixed character limits that no support ticket will move. Engineers file increase requests for the fixed ones and wait for a rejection that was never going to be an approval.

Adjustable quotas (default, then the maximum Service Quotas will auto-approve):

| Quota | Default | Maximum |
| --- | --- | --- |
| Managed policies per role | 10 | 25 |
| Managed policies per user | 10 | 20 |
| Managed policies per group | 10 | 10 (fixed at default) |
| Customer managed policies per account | 1,500 | 10,000 |
| Roles per account | 1,000 | 10,000 |
| Groups per account | 300 | 500 |
| Role trust policy length | 2,048 chars | 8,192 chars |

Fixed size limits — the quotas page states plainly that you cannot request an increase for these:

| Limit | Value |
| --- | --- |
| Customer managed policy size | 6,144 characters |
| Inline policy sum per user | 2,048 characters |
| Inline policy sum per group | 5,120 characters |
| Inline policy sum per role | 10,240 characters |
| Managed policy versions stored | 5 |

The three inline sums are the ones people misremember. The order runs opposite to intuition: the role budget is the largest at 10,240, the user budget the smallest at 2,048. The number is a sum across every inline policy on that one entity, not a per-policy limit.

One more detail worth knowing, quoted from the quotas page: “IAM doesn’t count white space when calculating the size of a policy against these limits.” Minifying JSON buys headroom against the 6,144 and inline ceilings. But if you are reaching for minification to fit, the policy is doing too many jobs. The fix is to split it, not to compress it.
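To see how much room a policy has left, a small script can count non-whitespace characters and compare the total against the 6,144 ceiling. This is a minimal sketch: the helper names `policy_size` and `headroom` are hypothetical, and stripping every whitespace character is only an approximation of how IAM counts, so treat the result as an estimate.

```python
import json
import re

MANAGED_POLICY_LIMIT = 6144  # fixed customer managed policy size, in characters


def policy_size(policy_json: str) -> int:
    """Approximate the size IAM counts: the policy text with whitespace removed.

    Assumption: dropping every whitespace character matches IAM's rule closely
    enough for an early warning. It is not an official algorithm.
    """
    return len(re.sub(r"\s+", "", policy_json))


def headroom(policy_json: str, limit: int = MANAGED_POLICY_LIMIT) -> int:
    """Characters left before the policy crosses the limit (negative if over)."""
    return limit - policy_size(policy_json)


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadWriteUploads",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::acme-uploads/*"],
        }
    ],
}

pretty = json.dumps(policy, indent=2)
minified = json.dumps(policy, separators=(",", ":"))

# Both renderings report the same size, because whitespace is not counted.
print(policy_size(pretty), policy_size(minified), headroom(pretty))
```

Under this approximation the pretty-printed and minified forms measure the same, which is why formatting alone is not a reason to approach the ceiling.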

What failure looks like

None of these limits announce themselves early. They surface as a failed deploy or a failed API call at the moment you cross them.

  • A customer managed policy grows past 6,144 characters, often right after someone appends one more resource ARN. CloudFormation rolls back with LimitExceeded, and the CDK deploy fails with it.
  • You attach the eleventh managed policy to a role still on the default quota. The AttachRolePolicy call fails; in CloudFormation this reads as a policy attachment limit error.
  • An inline policy on a role keeps growing until the aggregate crosses 10,240 characters. The next PutRolePolicy returns LimitExceeded.
  • An account approaches its 1,000-role default quota. Per-branch preview environments that each mint fresh roles are a common cause. New stacks then fail to create roles until stale ones are torn down or the quota is raised.

The pattern is the same each time: a limit that was invisible for months becomes a wall on a Friday deploy. The rest of this guide is about staying far from every wall by design.
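One way to meet these walls before a deploy does is a preflight check over the role definitions in your templates. The sketch below is illustrative only: `preflight_role` and `read_policy` are hypothetical helpers, the size count approximates IAM's rule by dropping whitespace, and the attachment cap is the default of ten.

```python
import json
import re

# Limits quoted in this guide. The attachment cap is the adjustable default.
MANAGED_POLICY_CHARS = 6144
ROLE_INLINE_SUM_CHARS = 10240
ATTACHED_PER_ROLE = 10


def size(policy: dict) -> int:
    """Approximate IAM's count: serialized JSON with whitespace removed."""
    text = json.dumps(policy, separators=(",", ":"))
    return len(re.sub(r"\s+", "", text))


def preflight_role(managed: dict, inline: dict) -> list:
    """Return the limits a role would cross; an empty list means it is clear.

    Both arguments map a policy name to its policy document.
    """
    problems = []
    if len(managed) > ATTACHED_PER_ROLE:
        problems.append(f"attached: {len(managed)} > {ATTACHED_PER_ROLE}")
    for name, document in managed.items():
        if size(document) > MANAGED_POLICY_CHARS:
            problems.append(f"managed size: {name} is {size(document)} chars")
    inline_total = sum(size(document) for document in inline.values())
    if inline_total > ROLE_INLINE_SUM_CHARS:
        problems.append(f"inline sum: {inline_total} > {ROLE_INLINE_SUM_CHARS}")
    return problems


def read_policy(bucket_count: int) -> dict:
    """Hypothetical policy that lists one object ARN per bucket."""
    arns = [f"arn:aws:s3:::bucket-{i}/*" for i in range(bucket_count)]
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "s3:GetObject", "Resource": arns}],
    }


eleven_small = {f"policy-{i}": read_policy(1) for i in range(11)}
print(preflight_role(eleven_small, inline={}))
print(preflight_role({"reads": read_policy(2)}, inline={"blob": read_policy(500)}))
```

Run in CI against synthesized templates, a check like this turns a Friday rollback into a failed pull request.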

Least privilege keeps policies small

Least privilege is usually framed as a security virtue. At scale it is also the thing that keeps policies under 6,144 characters. A policy scoped to named actions on named resources is small. A policy that reaches for wildcards to avoid thinking about scope grows without bound, because every “just add one more” lands in the same document.

A scoped policy names the actions and the exact resources:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadWriteUploadsBucket",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::acme-uploads",
        "arn:aws:s3:::acme-uploads/*"
      ]
    }
  ]
}

The lazy version does the same job on paper and fails on every other axis:

{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "s3:*", "Resource": "*" }
  ]
}

The wildcard grants every S3 action on every bucket in the account. It inflates blast radius, it is exactly the shape IAM Access Analyzer flags, and it invites the “add one more thing” growth that eventually blows the size limit. Scope the actions and the ARNs from the first line, and the policy stays both safe and small.
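A cheap local lint can catch the lazy shape before it reaches review. The sketch below is an assumption-laden stand-in, not a replacement for Access Analyzer: `find_wildcards` is a hypothetical helper that only looks at Allow statements and only flags a bare `*`, a service-wide `service:*` action, or a bare `*` resource.

```python
def as_list(value):
    """IAM accepts a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_wildcards(policy: dict) -> list:
    """Return (statement label, field, value) for each broad wildcard in an Allow."""
    findings = []
    for index, statement in enumerate(as_list(policy.get("Statement", []))):
        if statement.get("Effect") != "Allow":
            continue
        label = statement.get("Sid", f"statement[{index}]")
        for action in as_list(statement.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append((label, "Action", action))
        for resource in as_list(statement.get("Resource", [])):
            if resource == "*":
                findings.append((label, "Resource", resource))
    return findings


scoped = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadWriteUploadsBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::acme-uploads", "arn:aws:s3:::acme-uploads/*"],
        }
    ],
}
lazy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

print(find_wildcards(scoped))
print(find_wildcards(lazy))
```

The object-level `/*` suffix in the scoped policy is deliberately not flagged, since it is confined to one named bucket.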

One policy, one responsibility

Give each managed policy a single job and a name that states it: s3-read-uploads-bucket, dynamodb-crud-orders-table, sqs-consume-orders-queue. Build a role’s permissions by attaching several of these small policies rather than growing one large inline blob toward the ceiling.

This buys three things. Each document stays small enough to read and audit. Each is reusable across roles that share a genuine need. Each is versioned on its own. The composition math is comfortable too. A role built from four to six single-purpose managed policies sits well under the default cap of ten attached policies. It uses no inline policy at all, so the inline sum never enters the picture.

The failure mode this avoids is the inline policy that grows one statement at a time until PutRolePolicy refuses it. A role composed from named managed policies has no single document to overflow.
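When an inline blob already exists, splitting it along its statements is a reasonable first pass at single-purpose policies. The following sketch assumes every statement carries a descriptive CamelCase `Sid`; `kebab` and `split_by_statement` are hypothetical helpers, and the ARNs are placeholders.

```python
import re


def kebab(sid: str) -> str:
    """Convert a CamelCase Sid such as ReadUploadsBucket to read-uploads-bucket."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", sid).lower()


def split_by_statement(policy: dict) -> dict:
    """Return one single-statement policy document per Sid, keyed by policy name."""
    statements = policy["Statement"]
    if isinstance(statements, dict):
        statements = [statements]
    result = {}
    for statement in statements:
        if "Sid" not in statement:
            raise ValueError("every statement needs a Sid to name its policy")
        result[kebab(statement["Sid"])] = {
            "Version": policy.get("Version", "2012-10-17"),
            "Statement": [statement],
        }
    return result


blob = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadUploadsBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::acme-uploads/*"],
        },
        {
            "Sid": "CrudOrdersTable",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem",
                       "dynamodb:UpdateItem", "dynamodb:DeleteItem"],
            "Resource": ["arn:aws:dynamodb:us-east-1:111122223333:table/orders"],
        },
        {
            "Sid": "ConsumeOrdersQueue",
            "Effect": "Allow",
            "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
            "Resource": ["arn:aws:sqs:us-east-1:111122223333:orders"],
        },
    ],
}

for name in split_by_statement(blob):
    print(name)
```

Each resulting document can then be created as its own managed policy and attached by name, leaving the role with no inline policy to overflow.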

Scoped roles, not one shared mega-role

The mega-role is the pattern to design away from. One app-lambda-role attached to every function across dev, staging, and prod ends up carrying the union of every function’s needs. That policy set is large, tends toward wildcards, and runs up against the attachment cap. A single compromised function holds the keys to every service the role can reach.

The alternative is a role per service per environment: orders-api-prod, orders-api-dev, billing-worker-prod. Each role’s policy set stays small. A compromised function can reach only its own service’s resources. The audit trail names exactly which workload holds which grant. The cost of the discipline is more roles, so watch the 1,000-per-account default, especially with ephemeral preview environments that mint roles per branch. Request the increase early rather than at the moment a deploy fails.
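The role-count arithmetic is worth doing before committing to per-branch previews. The figures below are invented for illustration (40 services, three environments, 60 open branches that each mint 12 roles), and `role_name` and `projected_roles` are hypothetical helpers.

```python
ROLE_QUOTA_DEFAULT = 1000  # default roles per account, as quoted in this guide


def role_name(service: str, env: str) -> str:
    """Name a role after the one service and environment it serves."""
    return f"{service}-{env}"


def projected_roles(service_count: int, env_count: int,
                    preview_branches: int, roles_per_preview: int) -> int:
    """Long-lived roles (service x environment) plus ephemeral preview roles."""
    return service_count * env_count + preview_branches * roles_per_preview


total = projected_roles(service_count=40, env_count=3,
                        preview_branches=60, roles_per_preview=12)
print(role_name("orders-api", "prod"))
print(total, ROLE_QUOTA_DEFAULT - total)
```

In this made-up case the long-lived roles are a small share of the total, and the preview environments are what consume the quota.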

The shared-role tension, resolved

There is a real contradiction to face here, and this site contains both sides of it. The guide on breaking through CloudFormation’s 500-resource limit recommends shared IAM roles as a consolidation tactic: one role instead of fifty saves dozens of CloudFormation resources and can keep a stack under the 500 ceiling. Meanwhile the Lambda function granularity guidance, and AWS best practice, push role-per-service to shrink blast radius. Both are correct under different pressures.

The resolution is to name the binding constraint before you choose:

  • When the constraint is CloudFormation resource count, consolidation is the right move. Fewer roles means fewer resources in the template.
  • When the constraint is blast radius or policy size, separation is the right move. More roles means smaller, tighter policies and contained failure.

A scoped-shared pattern satisfies both. A shared role that carries only the permissions genuinely common to every workload it serves (a narrow read-only observability policy, say) stays small and low-risk while still cutting resource count. The pattern that satisfies neither is the mega-role that carries the union of everyone’s needs. It is large and dangerous, and the resources it saves are not worth that. Consolidate the truly common; never consolidate the union.
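The difference between "common" and "union" is a set operation. As a rough sketch with made-up workloads and action lists, the shared role would carry only the intersection, and each workload keeps the remainder on its own scoped role.

```python
# Hypothetical workloads and the actions each one needs.
workloads = {
    "orders-api": {"cloudwatch:GetMetricData", "logs:GetLogEvents", "dynamodb:PutItem"},
    "billing-worker": {"cloudwatch:GetMetricData", "logs:GetLogEvents", "sqs:ReceiveMessage"},
    "report-job": {"cloudwatch:GetMetricData", "logs:GetLogEvents", "s3:GetObject"},
}

# What every workload needs: the only safe content for a shared role.
common = set.intersection(*workloads.values())

# What the mega-role would carry: everything anyone needs.
union = set.union(*workloads.values())

# What stays on each workload's own scoped role.
specific = {name: actions - common for name, actions in workloads.items()}

print(sorted(common))
print(len(union))
print(specific["orders-api"])
```

Here the shared policy holds two read-only actions, while the union would hand every workload write access to a table, a queue consumer grant, and a bucket read it never asked for.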

Grant versus ceiling: boundaries and SCPs

Here is the distinction that organizes the whole guide. Only IAM identity and resource policies grant permissions. Permission boundaries and SCPs never grant — they set a maximum the granting policies cannot exceed. Effective permission is the intersection of every applicable layer, and an explicit Deny in any one of them wins.

Permission boundaries cap a single principal. Their most valuable use is delegated creation: when you let a pipeline or a team create roles and users, attach a boundary so the created principal can never exceed a ceiling, even if someone attaches AdministratorAccess to it. The IAM docs are explicit: “a permissions boundary does not provide permissions on its own,” and the effective permissions are the intersection of the identity-based policy and the boundary.

The enforcement comes from a condition on the delegated creator’s own policy. The creator may only create principals that carry the boundary:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CreateRolesWithBoundary",
      "Effect": "Allow",
      "Action": ["iam:CreateRole", "iam:CreateUser"],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "iam:PermissionsBoundary": "arn:aws:iam::111122223333:policy/DelegatedBoundary"
        }
      }
    },
    {
      "Sid": "DenyBoundaryTampering",
      "Effect": "Deny",
      "Action": [
        "iam:DeleteRolePermissionsBoundary",
        "iam:DeleteUserPermissionsBoundary",
        "iam:PutRolePermissionsBoundary",
        "iam:PutUserPermissionsBoundary"
      ],
      "Resource": "*"
    }
  ]
}

The StringEquals condition forces every created principal to carry DelegatedBoundary. The explicit deny stops the delegate from removing or rewriting that boundary later. A boundary is legitimately allowed to be broad, because it is a ceiling and not a grant. So a wildcard inside a boundary is not the same mistake as a wildcard inside an identity policy.
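Because the enforcement lives in the creator's policy, it is worth asserting in CI that the condition and the deny are both still present. The check below is a deliberately narrow sketch: `boundary_problems` is a hypothetical helper that matches action names exactly, understands only a single-valued `StringEquals` condition, and does not evaluate wildcards.

```python
CREATE_ACTIONS = {"iam:CreateRole", "iam:CreateUser"}
TAMPER_ACTIONS = {
    "iam:DeleteRolePermissionsBoundary",
    "iam:DeleteUserPermissionsBoundary",
    "iam:PutRolePermissionsBoundary",
    "iam:PutUserPermissionsBoundary",
}


def as_list(value):
    return value if isinstance(value, list) else [value]


def boundary_problems(policy: dict, boundary_arn: str) -> list:
    """List the ways a creator policy fails to enforce the boundary."""
    problems = []
    denied = set()
    for statement in as_list(policy.get("Statement", [])):
        actions = set(as_list(statement.get("Action", [])))
        effect = statement.get("Effect")
        if effect == "Allow" and actions & CREATE_ACTIONS:
            condition = statement.get("Condition", {}).get("StringEquals", {})
            if condition.get("iam:PermissionsBoundary") != boundary_arn:
                sid = statement.get("Sid", "unnamed")
                problems.append(f"{sid}: creation allowed without the boundary")
        if effect == "Deny":
            denied |= actions
    missing = TAMPER_ACTIONS - denied
    if missing:
        problems.append("tampering not denied: " + ", ".join(sorted(missing)))
    return problems


BOUNDARY = "arn:aws:iam::111122223333:policy/DelegatedBoundary"
good = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CreateRolesWithBoundary",
            "Effect": "Allow",
            "Action": ["iam:CreateRole", "iam:CreateUser"],
            "Resource": "*",
            "Condition": {"StringEquals": {"iam:PermissionsBoundary": BOUNDARY}},
        },
        {
            "Sid": "DenyBoundaryTampering",
            "Effect": "Deny",
            "Action": sorted(TAMPER_ACTIONS),
            "Resource": "*",
        },
    ],
}
bad = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "CreateAnything", "Effect": "Allow",
         "Action": "iam:CreateRole", "Resource": "*"}
    ],
}

print(boundary_problems(good, BOUNDARY))
print(boundary_problems(bad, BOUNDARY))
```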

SCPs cap an entire account or organizational unit. In AWS Organizations, an SCP sets the maximum available permissions across every member account. The docs state it directly: “No permissions are granted by an SCP.” You still need IAM policies to grant; the SCP only removes options. Two exceptions are worth writing down: SCPs do not affect the management account, and they do not affect service-linked roles.

SCPs are the layer for org-wide rules that no account should ever be able to override — block a region, prevent an account from leaving the organization, deny disabling of CloudTrail. Treat them with care. Removing FullAWSAccess or attaching an SCP to the org root without testing can lock people out of services across the whole organization. Create a sandbox OU, move one account in, and validate before rolling wider.
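The layering can be modeled in a few lines to make the grant-versus-cap rule concrete. This is a toy model under heavy assumptions, not IAM's real evaluation logic: it looks only at action names, ignores resources, conditions, resource-based and session policies, and represents each layer as a hypothetical dict of `allow` and `deny` patterns.

```python
from fnmatch import fnmatchcase


def matches(action: str, patterns: list) -> bool:
    """Case-insensitive wildcard match of one action against a pattern list."""
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in patterns)


def is_allowed(action: str, identity: dict, boundary: dict = None, scp: dict = None) -> bool:
    """Allowed only if no layer denies it and every present layer allows it."""
    layers = [layer for layer in (identity, boundary, scp) if layer is not None]
    if any(matches(action, layer.get("deny", [])) for layer in layers):
        return False  # an explicit deny in any layer wins
    return all(matches(action, layer.get("allow", [])) for layer in layers)


admin_identity = {"allow": ["*"]}                    # AdministratorAccess-like grant
boundary = {"allow": ["s3:*", "dynamodb:*", "cloudtrail:*"]}
scp = {"allow": ["*"], "deny": ["cloudtrail:StopLogging"]}

print(is_allowed("s3:GetObject", admin_identity, boundary, scp))
print(is_allowed("iam:CreateUser", admin_identity, boundary, scp))
print(is_allowed("cloudtrail:StopLogging", admin_identity, boundary, scp))
print(is_allowed("s3:GetObject", {"allow": []}, boundary, scp))
```

The last call is the point of the section: a boundary and an SCP that both permit `s3:GetObject` still grant nothing when the identity policy is empty.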

Relieving size pressure with groups

The per-user inline budget is small at 2,048 characters, and the fastest way to exhaust it is to paste the same inline policy onto forty users. Forty copies, each eating that user’s budget, all needing the same edit when the permission changes.

Attach one managed policy to an IAM group instead, and put the users in it. One document to update, and zero inline-size pressure on any user. The trade-off to state plainly: groups are for human users only. A role cannot be a member of a group, and services and Lambda functions assume roles, not users. So groups solve human-operator sprawl, not the workload case. For workloads, the answer stays policy-per-responsibility on scoped roles.
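Finding the copies is the first step toward moving them onto a group. The sketch below assumes you have already exported each user's inline policies into a nested dict; `duplicated_inline_policies` is a hypothetical helper and the user names are invented.

```python
import json
from collections import defaultdict


def duplicated_inline_policies(users: dict) -> dict:
    """Map each inline policy document shared by 2+ users to those users.

    users has the shape {user_name: {policy_name: policy_document}}.
    Documents are compared by canonical JSON, so key order does not matter.
    """
    by_document = defaultdict(list)
    for user, policies in users.items():
        for document in policies.values():
            key = json.dumps(document, sort_keys=True, separators=(",", ":"))
            by_document[key].append(user)
    return {key: sorted(names) for key, names in by_document.items() if len(names) > 1}


read_logs = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "logs:GetLogEvents", "Resource": "*"}],
}
one_off = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*"}],
}

users = {f"operator-{i:02d}": {"read-logs": read_logs} for i in range(40)}
users["auditor"] = {"list-buckets": one_off}

for document, names in duplicated_inline_policies(users).items():
    print(len(names), "users share one inline policy")
```

Each group of users it reports is a candidate for one managed policy attached to one IAM group.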

Validate with Access Analyzer

Structure keeps you clear of the limits; validation keeps you honest about least privilege. IAM Access Analyzer has three capabilities worth wiring in.

Policy validation checks a policy against grammar and best practice, returning findings, warnings, security warnings, and suggestions. Run it on every policy, ideally in CI.

Custom policy checks run in a pipeline. check-no-new-access answers whether a proposed change grants more than a reference policy; check-access-not-granted answers whether a policy grants a listed critical action. These are the checks that catch a wildcard sneaking into a pull request. They bill at $0.0020 per API call.

Unused access analyzers flag roles, permissions, and credentials that nobody has used, so you can prune toward least privilege continuously rather than in an annual audit. This is a per-principal line item at $0.20 per IAM role or user analyzed per month, so scope which accounts run it. External access analysis — the cross-account and public-exposure findings — is provided at no additional charge.
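The two paid line items are easy to estimate ahead of time. The sketch below uses the prices quoted above; the call volume and principal count are invented, and `monthly_cost` is a hypothetical helper.

```python
from decimal import Decimal

CUSTOM_CHECK_PRICE = Decimal("0.0020")  # per custom policy check API call
UNUSED_ACCESS_PRICE = Decimal("0.20")   # per IAM role or user analyzed per month


def monthly_cost(custom_check_calls: int, principals_analyzed: int) -> Decimal:
    """Estimated monthly Access Analyzer spend for the two paid features."""
    return (custom_check_calls * CUSTOM_CHECK_PRICE
            + principals_analyzed * UNUSED_ACCESS_PRICE)


# Hypothetical month: 5,000 pipeline checks and 800 analyzed principals.
print(monthly_cost(custom_check_calls=5000, principals_analyzed=800))
```

With numbers in this range the per-principal analyzer dominates, which is the reason to scope which accounts run it.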

A decision path

Every IAM change starts with the same two questions: grant or cap, and where does it belong. The tree below roots there.

IAM change needed: grant or cap?

  • Grant an action. Human operator or workload?
      • Workload: scoped managed policy on a service+env role
      • Human: managed policy on an IAM group
  • Cap a principal or account. One principal or whole account?
      • One principal: permission boundary with iam:PermissionsBoundary condition
      • Whole account or OU: SCP on the OU, tested in a sandbox first

Policy near 6,144 or role near 10 attached?

  • Yes: split into single-purpose managed policies
  • No: Access Analyzer validation in CI

A few branch cases the tree compresses:

  • About to reuse an existing role on another service or environment? Default to a new scoped role. Reuse only for a genuine shared read-only concern, and even then prefer a shared policy over a shared write role.
  • A policy approaching 6,144, or an entity approaching its inline sum? Split into single-purpose managed policies, and move duplicated per-user inline policies onto a group-attached managed policy.
  • Approaching ten attached managed policies on a role? Consolidate related statements into tighter policies first; request the increase toward twenty-five only once consolidation is genuinely exhausted.
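The grant-or-cap routing in this section can also be written down as a lookup, which makes it easy to embed in a pull request template or a review bot. This is a sketch only: `recommend` and `needs_split` are hypothetical helpers, and treating 80% of a limit as "near" is an arbitrary assumption.

```python
MANAGED_POLICY_CHARS = 6144
ATTACHED_PER_ROLE = 10
NEAR = 0.8  # hypothetical threshold: 80% of a limit counts as "near"

ROUTES = {
    ("grant", "workload"): "scoped managed policy on a service+env role",
    ("grant", "human"): "managed policy on an IAM group",
    ("cap", "principal"): "permission boundary with iam:PermissionsBoundary condition",
    ("cap", "account"): "SCP on the OU, tested in a sandbox first",
}


def recommend(intent: str, target: str) -> str:
    """Route an IAM change: intent is grant or cap, target says where it lands."""
    try:
        return ROUTES[(intent, target)]
    except KeyError:
        raise ValueError(f"no route for intent={intent!r}, target={target!r}") from None


def needs_split(policy_chars: int, attached_count: int) -> bool:
    """True when a policy or a role is close enough to a limit to split first."""
    return (policy_chars >= NEAR * MANAGED_POLICY_CHARS
            or attached_count >= NEAR * ATTACHED_PER_ROLE)


print(recommend("grant", "workload"))
print(recommend("cap", "account"))
print(needs_split(policy_chars=5000, attached_count=3))
```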

Common pitfalls

| Pitfall | Fix |
| --- | --- |
| Inline policy grows until PutRolePolicy fails | Split into managed policies; the role inline sum is 10,240 characters |
| Attaching the eleventh managed policy fails | Consolidate related statements; raise the cap toward 25 only if truly needed |
| Assuming an SCP grants access | An SCP only removes; add the matching IAM grant |
| Wildcard Resource: "*" “for now” | Scope ARNs from the start; Access Analyzer will flag it anyway |
| Forgetting the boundary on delegated creation | Enforce it with the iam:PermissionsBoundary condition on the creator |
| Console click-ops grants | Move to IaC; console drift is invisible and unauditable |
| Role quota exhausted by preview environments | Raise the quota proactively and tear down stale roles |

The through-line: keep policies small by scoping them, keep roles scoped by service and environment, and keep the grant layer separate from the ceiling layer. Do that and the limits in the first table stay comfortably out of reach.
