Governance at scale is a dream that can come true when using the proper tools and automation that address, at a minimum, account management, security and compliance, and cost management.

Over the past few months, the topic of “governance at scale” has landed in my inbox at an increasing rate, ranging from simple customer enquiries to customers committed and ready to hit the “proceed to production” button. Several proofs of concept, or more accurately, technology sandpits, started to emerge from day-to-day activities. Working out the technologies, and how best to make the topic a reality, took a lot of head-scratching along the way. In this article, planned to be short but with no length guaranteed, we explore some of the rationale and approaches to governance at scale.

Before we dive in, let us set a baseline: what is meant by “governance”? The Oxford dictionary defines it as “the action or manner of governing a state, organization, etc.”

Great, but somewhat vague in my opinion. Let us try “Security Governance”, which is defined as “…the combined set of tools, personnel, and processes that provide for formalized risk management. It includes organizational structure, roles and responsibilities, metrics, processes, and oversight, as it specifically impacts the security program.”

This definition is closer to the money and our cloud requirements of governance at scale.

Cloud challenges

Security governance has always been, and still is, a key requirement in any organisation's IT landscape; however, the adoption of public cloud has introduced several interesting challenges for the traditional IT security governance model. Traditional enterprises commonly had, and still have, many applications and databases deployed on well-known server infrastructure, operated by trusted operations teams and backed by a security guru or two who look after centralised firewalls. The only variable is normally the “application”, so most of the ecosystem is governed with a handful of mechanisms and policies. This model remains applicable, to a degree, where organisations operate a “lifted-and-shifted” ecosystem in the public cloud.

The cloud challenge becomes apparent when you view each AWS account and Virtual Private Cloud (VPC) as its own little data centre in the cloud, with every new account or VPC quickly expanding the footprint to monitor and protect.

The Cloud Operating Model is a key area of focus within the cloud adoption journey. It refers to rethinking the business and IT operating model in terms of people, process and technology to operate and deliver as efficiently as possible within the public cloud. This cloud operating model positions the delivery of “products and services through empowered teams”, rather than a project-based approach with a handoff to operations once the “software is complete”. The model below depicts a best practice team ownership structure that resonates with a modern cloud operating model, with the Application Engineering team (aka product team) being responsible for engineering and operations of both the applications and infrastructure. A “Cloud Platform Engineering” team then offers cross-functional support by setting standards and building reusable components. This AWS Whitepaper offers further detail on establishing a modern cloud operating model.

The cloud value proposition holds five promises to the cloud adopter, namely increased agility, elasticity, flexibility and security, all at a lower price point. This value proposition is not always delivered in full by the commonly undertaken lift-and-shift migration. Some degree of cloud native transformation is often needed to deliver that promised extra cloud mileage. An initial transformation (transform-and-forget) is a good start; however, to remain functional and cost efficient in the cloud, an ongoing, continuous improvement lifecycle of product features, system improvements, and new technology adoption is required.

You may ask yourself why continuous transformation is more prevalent in the public cloud landscape than in the on-premises world. Firstly, when using managed platform services, you are gently nudged to keep current with supported versions, and at times forced to take the plunge and upgrade. Secondly, more performant or more cost-efficient services are released regularly, and these may offer the missing link that puts you ahead of the competition.

Gone are the days of sweating assets and licences, and hoping that the server in the corner that runs the “special” month-end COBOL code makes it through the holidays.

Governance at scale challenges recap:

  • An expanding attack surface, with each AWS account & VPC resembling a data centre
  • Decentralised teams and a product-based approach rapidly multiply this virtual data centre count
  • A fast-paced, ever-evolving technology landscape that could differ vastly between each of the decentralised teams. Governance needs to adjust to product and individual team demands (to a point, of course)

The reason why a well-defined governance approach is paramount could be described in a storyline – depicted as follows:

A starting position (our baseline) – in the datacentre or operating a handful of AWS accounts (or VPCs):

The expanded cloud operations vision that is defined and approved by the architecture board.

The reality – operating diverse cloud native technologies, with decentralised and empowered product teams.

Nobody plans for this; however, organisations that fail to start off with a well-defined governance strategy easily fall into this cloud LAN-party operating model.

Governance at scale focus areas

When operating more than just a handful of AWS accounts (and VPCs), the mechanisms and processes that support secure and cost-effective operation of the public cloud footprint must be positioned front and centre to avoid chaos (our well-organised LAN party). Within a cloud virtual data centre, aspects like power and cooling are no longer the customer's concern; however, ensuring that AWS accounts are appropriately secured and that costs are continually monitored becomes important.

The focus areas for operating and governing your AWS landscape at scale can be divided into the following: Account Management, Cost Management, and Security and Compliance. Automation is key within each of these focus areas to ensure effectiveness is maintained as the AWS landscape expands.

Let us explore each of the areas, diving into processes and AWS services that aid in establishing a cloud governance at scale capability.

Account Management

The area of account governance focuses on efficiencies that support the provisioning and day-to-day management of a multi-account landscape, delivered through standardisation and automation of maintenance activities. Account automation (and standardisation) is a key area; however, within the context of a multi-account landscape, identity management stands out as a critical success factor.

Account Automation

AWS Organizations underpin our governance at scale capability. By using AWS Organizations, you can programmatically create new AWS accounts and allocate resources, group accounts (into Organizational Units), apply policies to accounts or account groups for governance, and simplify billing by using a single payment method across all accounts. Several other AWS services integrate with AWS Organizations to offer a comprehensive cross-account management function.

One of the most relevant services to deliver our governance at scale vision is AWS Control Tower – which will be addressed in more detail under the Security and Compliance section.

AWS Service Catalog combined with AWS CloudFormation brings standardisation and automation, ensuring each AWS account and the infrastructure it contains is provisioned in accordance with organisational architecture standards and best practice.

Identity and Access Management

A centralised identity provider (IdP) improves security and compliance across the account landscape by both decreasing the effort required to manage staff authentication credentials and reducing the risk of former staff credentials lingering in an AWS account long after their last day at the office.

AWS Single Sign-On (SSO), since renamed AWS IAM Identity Center, offers a uniform mechanism to authenticate users and assign roles (as PermissionSets) across the multi-account landscape. Identities are managed either in the built-in AWS identity store, provided at no additional cost, or through identity federation with existing identity providers such as Microsoft Active Directory or Azure Active Directory. AWS Single Sign-On offers a degree of automation by orchestrating PermissionSets (provisioned as IAM roles) across all AWS accounts within the organization structure. This practice ensures user permissions (roles) are uniformly applied and centrally governed across tens to hundreds of accounts.
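To illustrate the fan-out effect, here is a minimal Python sketch of how a handful of group-to-permission-set mappings expands into one assignment per account. The OU names, account IDs, group names and permission set names are all hypothetical, and no AWS API is called; in AWS SSO each resulting tuple would correspond to an account assignment.

```python
from dataclasses import dataclass

# Hypothetical organisation layout: OU name -> member account IDs.
ORG_UNITS = {
    "NonPRD": ["111111111111", "222222222222"],
    "PRD": ["333333333333"],
}

# Hypothetical mappings: (directory group, permission set, target OUs).
GROUP_MAPPINGS = [
    ("payments-developers", "DeveloperAccess", ["NonPRD"]),
    ("payments-operators", "ReadOnlyAccess", ["NonPRD", "PRD"]),
]


@dataclass(frozen=True)
class Assignment:
    """One (account, permission set, group) assignment to be provisioned."""
    account_id: str
    permission_set: str
    group: str


def expand_assignments(org_units, mappings):
    """Fan each group/permission-set mapping out to every account in its OUs."""
    assignments = set()  # a set de-duplicates accounts reachable through several OUs
    for group, permission_set, ous in mappings:
        for ou in ous:
            for account_id in org_units.get(ou, []):
                assignments.add(Assignment(account_id, permission_set, group))
    # Sort for a stable, reviewable result.
    return sorted(assignments, key=lambda a: (a.account_id, a.permission_set, a.group))


for a in expand_assignments(ORG_UNITS, GROUP_MAPPINGS):
    print(a.account_id, a.permission_set, a.group)
```

Two mappings already produce five assignments here; adding an account to an OU only means re-running the expansion rather than hand-crafting roles in each account.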

“LAN party” tips for Account Management

  • Have an account strategy that articulates how, and how many, accounts will be issued to teams and/or large workloads (applications)
  • Combine the account strategy with the security and compliance approach, and implement isolated accounts for regulated workloads or sensitive data processing. See this previous post for details on a multi-account strategy

Cost Management

Our ability to allocate (budget) and accurately monitor cloud spend across a multi-account landscape is critical to the governance at scale vision. Setting a budget is one aspect of this vision; however, managing budgets within a self-service ecosystem used (and at times abused) by decentralised teams requires automation to be effective. Within development or sandbox environments, it is common for team members to experiment with new services or optimisation techniques, with large cost implications sometimes making an appearance on the monthly invoice. It is not uncommon to hear of product teams racking up $20,000 in a month, instead of the average $2,000, due to a misconfiguration or a misunderstanding of service pricing or service behaviour. Being able to communicate and escalate such a budget breach as soon as possible is critical.

AWS Budgets offers organizations an easy-to-use, cloud native service to efficiently track, communicate and enforce budgets. Cost governance is applied through multiple strategies, commonly with resource tagging as a prerequisite; however, finer-grained AWS account allocation and isolation often makes budget enforcement much easier, for example when every product or department owns its own accounts for development, testing and production. It is easier to simply deny all actions in a single development AWS account that exceeds the budget set by the project sponsor than to define complex enforcement rules using resource-tagging policies.
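As a rough sketch of why account isolation keeps enforcement simple, the Python below builds a deny-all Service Control Policy as JSON. The Sid is a made-up name, and creating or attaching the policy (for example with the AWS Organizations CreatePolicy and AttachPolicy APIs) is deliberately left out.

```python
import json


def deny_all_scp() -> dict:
    """Build a Service Control Policy that denies every action.

    Attached to a single over-budget development account, it blocks API calls
    by principals in that account. SCPs do not apply to the management account,
    and they do not restrict service-linked roles.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllOverBudget",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(deny_all_scp(), indent=2)
print(policy_json)
```

The equivalent tag-based approach would need conditions on resource and request tags per service, which is exactly the complexity the account-level policy avoids.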

Budget Communication and Enforcement

Defining an operational budget on a project sign-off document or annual budget is a start; however, within a broad ecosystem of AWS accounts, this does not meet our governance requirement. Product teams or departments should have a clear view of their operational budget allocation for each AWS account they operate, which is especially important for sandbox or developer accounts.

AWS Budgets, operated from the AWS Organizations management account, offers several features that deliver on our governance at scale requirements. The strategy applied is to set an AWS budget for each AWS account, with three to four thresholds. A threshold approach for sandbox or developer accounts is depicted in the diagram below, roughly translated as follows: at 70% the team is notified, at 80% the product manager is notified, and at 99% the budget is escalated to the sponsor, service director or relevant executive. For non-production accounts, a threshold could also trigger automated remediation actions, ranging from simple actions to completely stopping all EC2 or RDS infrastructure and preventing teams from launching new resources.
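The escalation ladder above is simple arithmetic on the percentage of budget consumed. A minimal sketch follows; the recipient labels are placeholders for real email lists or SNS topics, and the strict "greater than" comparison is an assumption chosen to mirror a GREATER_THAN budget notification.

```python
# Escalation ladder from the text: percentage of budget consumed -> who is told.
# Recipient labels are placeholders for real email lists or SNS topics.
ESCALATION_LADDER = [
    (70.0, "product team"),
    (80.0, "product manager"),
    (99.0, "sponsor / service director"),
]


def recipients_for(actual_spend: float, budget: float) -> list:
    """Return every audience whose threshold has been exceeded at this spend."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    percent_used = actual_spend * 100 / budget
    # Strictly greater than, mirroring a GREATER_THAN budget notification.
    return [who for threshold, who in ESCALATION_LADDER if percent_used > threshold]


# A team averaging $2,000 a month that suddenly heads towards $20,000:
print(recipients_for(1_500, 2_000))
print(recipients_for(20_000, 2_000))
```

At $1,500 (75%) only the team hears about it; at $20,000 (1,000%) every level of the ladder has been notified.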

Budget automation

Within a self-service AWS account landscape operating at scale, automating budget and threshold configuration is critical. As a best practice, AWS Control Tower events are used to orchestrate the configuration of default budget thresholds, ensuring costs are managed and escalated even when unsanctioned accounts are created.
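An automation triggered by such an event needs to produce a budget definition for the new account. The sketch below builds the keyword arguments for the AWS Budgets CreateBudget API (`AccountId`, `Budget`, `NotificationsWithSubscribers`) using the 70/80/99 percent thresholds described earlier. The budget name, limit and single email subscriber are hypothetical defaults, and the legacy `CostFilters` field is used to scope the budget to one linked account.

```python
# Thresholds (percent of budget) from the escalation approach described earlier.
DEFAULT_THRESHOLDS = (70.0, 80.0, 99.0)


def default_budget_request(
    management_account_id: str,
    member_account_id: str,
    monthly_limit_usd: float,
    subscriber_email: str,
) -> dict:
    """Build keyword arguments for the AWS Budgets CreateBudget API.

    The budget is created in the management account and filtered to one
    linked (member) account. Name, limit and subscriber are hypothetical defaults.
    """
    notifications = [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": subscriber_email}],
        }
        for threshold in DEFAULT_THRESHOLDS
    ]
    return {
        "AccountId": management_account_id,
        "Budget": {
            "BudgetName": f"default-monthly-{member_account_id}",
            "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
            "CostFilters": {"LinkedAccount": [member_account_id]},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": notifications,
    }


request = default_budget_request("999999999999", "111111111111", 2000, "team@example.com")
print(request["Budget"]["BudgetName"])
```

With boto3 installed and management-account credentials, `boto3.client("budgets").create_budget(**request)` would submit it. A single subscriber keeps the sketch short; a real escalation ladder would give each threshold its own subscriber list.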

“LAN party” tips for Cost Management

  • A tip for the wild, wild west (the real world): unless you are working with large budgets, understand that drastic actions such as denying resource creation or stopping infrastructure impact developer productivity. Be realistic with the budgets and keep drastic remediation actions to sandbox accounts.
  • Use “Cost Categories” within AWS Budgets and Cost Explorer to group AWS accounts into larger departments or divisions. This offers budget management at both the account level and the aggregated department or division level.
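Cost Categories themselves are configured through the AWS Billing console or API; the Python below only illustrates the roll-up they enable. The account-to-division rules and spend figures are made up, and unmapped accounts land in a default bucket, similar in spirit to a Cost Category default value.

```python
from collections import defaultdict

# Hypothetical Cost Category rules: account ID -> division.
DIVISION_OF_ACCOUNT = {
    "111111111111": "Payments",
    "222222222222": "Payments",
    "333333333333": "Marketing",
}


def spend_by_division(account_spend: dict, default: str = "Uncategorized") -> dict:
    """Roll account-level spend up to division totals; unmapped accounts go to `default`."""
    totals = defaultdict(float)
    for account_id, amount in account_spend.items():
        totals[DIVISION_OF_ACCOUNT.get(account_id, default)] += amount
    return dict(totals)


print(spend_by_division({
    "111111111111": 1200.0,
    "222222222222": 800.0,
    "333333333333": 500.0,
    "444444444444": 50.0,
}))
```

A division-level budget can then be compared against these totals while each account keeps its own, smaller budget.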

Security and Compliance

Governance is synonymous with security, and operating at scale makes automation a critical success factor for efficiently managing the multi-account security landscape.

Maintaining a secure and compliant posture was once an arduous task; however, several security, compliance, and vulnerability assessment services have since made their way onto the AWS cloud native stage. Using the analogy introduced earlier, if each AWS account (and VPC) represents a mini data centre in the cloud, operating multiple AWS accounts demands a well-defined mechanism to centrally define and maintain the observation and enforcement policies.

To automate security and compliance requirements there are several AWS services at our disposal, each offering unique capabilities to address our governance concerns.

AWS Control Tower

At the heart of our governance at scale solution is AWS Control Tower, responsible for the deployment and configuration management of a best practice multi-account landscape (also known as a landing zone).

It integrates with, or rather orchestrates, AWS Organizations and AWS Config (and a few other services) to perform two major governance functions through automation: security policy observation and security policy enforcement. Within the context of AWS Control Tower, these are referred to as guardrails. Policy observation (detective guardrails) is commonly implemented with AWS Config rules, which evaluate resources against defined security policies and flag a resource as non-compliant should it breach a policy. Policy enforcement (preventative guardrails) employs Service Control Policies to prevent defined actions from being executed.
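To make a preventative guardrail concrete, the sketch below builds an SCP in the spirit of the root access key guardrail listed at the end of this post: it denies `iam:CreateAccessKey` whenever the calling principal is an account root user. It is an illustration under that assumption, not necessarily the exact policy Control Tower deploys, and the Sid is a made-up name.

```python
import json


def deny_root_access_keys_scp() -> dict:
    """Preventative guardrail sketch: stop the root user from creating access keys."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRootAccessKeyCreation",
                "Effect": "Deny",
                "Action": "iam:CreateAccessKey",
                "Resource": "*",
                # Matches the root user of any account the SCP applies to.
                "Condition": {"StringLike": {"aws:PrincipalArn": "arn:aws:iam::*:root"}},
            }
        ],
    }


print(json.dumps(deny_root_access_keys_scp(), indent=2))
```

Because the SCP is evaluated before any IAM permission, the action is blocked even though the root user otherwise has full access within a member account.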

Automation can elevate detective guardrails to policy enforcement points by executing automated remediation activities for critical non-compliant resources. For example, once a security group that allows Remote Desktop (port 3389) access from the internet (any IP address) is detected, a remediation action is executed that removes the offending inbound security group rule.
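The detection half of that remediation can be sketched as a pure function over security group rules. The input is assumed to be shaped like the `IpPermissions` list returned by EC2 DescribeSecurityGroups; the function keeps only the internet-open ranges of rules that expose port 3389. This is an illustrative sketch, not the remediation Control Tower or AWS Config ships.

```python
RDP_PORT = 3389
OPEN_IPV4 = "0.0.0.0/0"
OPEN_IPV6 = "::/0"


def covers_port(permission: dict, port: int) -> bool:
    """True if an inbound rule's protocol and port range include the TCP port."""
    protocol = permission.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols and all ports
        return True
    if protocol not in ("tcp", "6"):
        return False
    return permission.get("FromPort", 0) <= port <= permission.get("ToPort", -1)


def open_rdp_permissions(ip_permissions: list) -> list:
    """Return the internet-open parts of inbound rules that expose RDP.

    Only the 0.0.0.0/0 and ::/0 ranges are kept, so narrower ranges on the
    same rule (e.g. a corporate CIDR) are left untouched when revoking.
    """
    offending = []
    for perm in ip_permissions:
        if not covers_port(perm, RDP_PORT):
            continue
        v4 = [r for r in perm.get("IpRanges", []) if r.get("CidrIp") == OPEN_IPV4]
        v6 = [r for r in perm.get("Ipv6Ranges", []) if r.get("CidrIpv6") == OPEN_IPV6]
        if not (v4 or v6):
            continue
        trimmed = {k: perm[k] for k in ("IpProtocol", "FromPort", "ToPort") if k in perm}
        if v4:
            trimmed["IpRanges"] = v4
        if v6:
            trimmed["Ipv6Ranges"] = v6
        offending.append(trimmed)
    return offending
```

The result could then be passed to boto3's `ec2.revoke_security_group_ingress(GroupId=..., IpPermissions=...)`. Note that for an all-traffic rule (`IpProtocol` of "-1") this revokes the whole internet-open rule, which is broader than RDP alone.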

The Control Tower account compliance view is highlighted below, offering the security team an aggregated view across the entire AWS account landscape – aiding in identifying non-compliant accounts and associated resources.

The screengrab below highlights how Control Tower presents a view of several preventative and detective guardrails and which of the guardrails are compliant or in violation.

Amazon GuardDuty

Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorised behaviour to protect AWS accounts, workloads, and data stored in Amazon S3. It provides multi-account support through AWS Organizations, enabling GuardDuty across all existing and new accounts, and aggregates findings from multiple accounts into a single administrator account, giving the security team one place to manage threat observations efficiently.
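Once findings are aggregated into the administrator account, triage is largely a filtering exercise. The sketch below assumes findings have already been retrieved (for example via the GuardDuty ListFindings and GetFindings APIs) and uses only the `AccountId`, `Type` and numeric `Severity` fields; the sample findings and their severities are made up for illustration.

```python
HIGH_SEVERITY = 7.0  # GuardDuty's High severity band starts at 7.0


def high_severity_by_account(findings: list) -> dict:
    """Group the types of high (or higher) severity findings by member account."""
    grouped = {}
    for finding in findings:
        if finding.get("Severity", 0.0) >= HIGH_SEVERITY:
            grouped.setdefault(finding["AccountId"], []).append(finding["Type"])
    return grouped


# Made-up sample findings, shaped like GuardDuty finding JSON.
sample = [
    {"AccountId": "111111111111", "Type": "UnauthorizedAccess:EC2/RDPBruteForce", "Severity": 8.0},
    {"AccountId": "111111111111", "Type": "Recon:EC2/PortProbeUnprotectedPort", "Severity": 2.0},
    {"AccountId": "222222222222", "Type": "CryptoCurrency:EC2/BitcoinTool.B!DNS", "Severity": 8.0},
]
print(high_severity_by_account(sample))
```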

AWS Security Hub

AWS Security Hub offers a comprehensive view of security alerts and the overall security posture across an entire AWS multi-account landscape, bringing a wealth of security policy datapoints into a single view, ranging from operating system, infrastructure, network, and AWS account vulnerabilities. Security Hub further offers an aggregated event stream via Amazon EventBridge to support automated actions. It aggregates datapoints from multiple AWS services, such as Amazon GuardDuty, Amazon Inspector, Amazon Macie, AWS Identity and Access Management (IAM) Access Analyzer, AWS Systems Manager, and AWS Firewall Manager, as well as from AWS Partner Network (APN) solutions. Security Hub also runs automated, continuous security checks based on industry standards and best practices, such as the Center for Internet Security (CIS) AWS Foundations Benchmark and the Payment Card Industry Data Security Standard (PCI DSS).
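As an example of hooking automation onto that event stream, the sketch below defines an EventBridge event pattern for imported Security Hub findings labelled HIGH or CRITICAL, plus a small function that summarises findings in AWS Security Finding Format (ASFF) using `AwsAccountId`, `Title` and `Severity.Label`. As I understand EventBridge matching, a rule fires when any finding in the array matches, so the label is re-checked per finding; the sample in the usage note is made up.

```python
import json

ALERT_LABELS = {"HIGH", "CRITICAL"}

# EventBridge event pattern matching imported Security Hub findings labelled HIGH or CRITICAL.
HIGH_CRITICAL_PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {"findings": {"Severity": {"Label": sorted(ALERT_LABELS)}}},
}


def summarise(event: dict) -> list:
    """Return (account, label, title) for each HIGH/CRITICAL ASFF finding in the event."""
    return [
        (f["AwsAccountId"], f["Severity"]["Label"], f["Title"])
        for f in event.get("detail", {}).get("findings", [])
        if f.get("Severity", {}).get("Label") in ALERT_LABELS
    ]


print(json.dumps(HIGH_CRITICAL_PATTERN))
```

The pattern JSON would be used as the rule's event pattern, with a target such as a Lambda function calling `summarise` before notifying the owning team.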

At a glance, the capabilities of AWS Security Hub are depicted in the diagram below.

Not all services depicted are addressed in this blog. For more information on the services and full integration options for Security Hub, please refer to the AWS website.

The screengrab below depicts AWS Security Hub findings aggregated across the entire AWS multi-account landscape.

“LAN party” tips for Security and Compliance

  • Define and clearly communicate to relevant departments or teams which AWS regions are allowed to be used.
  • Have a process for AWS region exception management – this will come up from time to time when critical services are not available in the primary region of operation.
  • Ensure the cloud security policies state which AWS global services may be used, and how they may be used, to maintain organisational or regulatory compliance. For Australian and New Zealand customers, the ISM (IRAP) guidelines for PROTECTED workloads offer a great reference. See AWS Artifact to access this document.
  • Restrict or disable unused AWS regions – using Service Control Policies (newer regions introduced after 20 March 2019 are disabled by default)
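Region restriction is commonly expressed as a deny SCP on the `aws:RequestedRegion` condition key, with `NotAction` exempting global services whose APIs are typically served from us-east-1. The sketch below builds such a policy; the allowed region and the short exemption list are illustrative assumptions only, and AWS's published example should be reviewed for a complete exemption list before use.

```python
import json

# Hypothetical allow-list; replace with your organisation's approved regions.
ALLOWED_REGIONS = ["ap-southeast-2"]

# Global services are typically served from us-east-1, so their actions are exempted.
# Illustrative subset only, not a complete list.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "organizations:*", "route53:*", "cloudfront:*", "sts:*", "support:*"]


def region_deny_scp(allowed_regions: list, exempt_actions: list) -> dict:
    """Deny any non-exempt action requested outside the allowed regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # NotAction: the deny applies to every action except these.
                "NotAction": list(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            }
        ],
    }


print(json.dumps(region_deny_scp(ALLOWED_REGIONS, GLOBAL_SERVICE_ACTIONS), indent=2))
```

Region exceptions from the tip above then become a reviewed change to `ALLOWED_REGIONS` rather than an ad hoc permission.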

Governance at scale – and in concert

The diagram above depicts a best-practice governance at scale implementation within AWS using AWS Control Tower. Not all services depicted are addressed in this blog. For more information on the services, please refer to the AWS website. At a high level, the core services are placed and integrated as follows:

AWS Control Tower publishes events to Amazon EventBridge, which are used to apply pre-defined standards to every newly created account: setting default budgets out of the box, enabling GuardDuty, and configuring specific Security Hub integrations as needed.
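A minimal sketch of that orchestration follows, assuming the Control Tower `CreateManagedAccount` lifecycle event and its `serviceEventDetails.createManagedAccountStatus` payload. The three baseline step functions are hypothetical placeholders that only report what they would do; real versions would call the AWS Budgets, GuardDuty and Security Hub APIs.

```python
# EventBridge pattern for the Control Tower lifecycle event emitted when
# Account Factory finishes creating a managed account.
CREATE_MANAGED_ACCOUNT_PATTERN = {
    "source": ["aws.controltower"],
    "detail-type": ["AWS Service Event via CloudTrail"],
    "detail": {"eventName": ["CreateManagedAccount"]},
}


def new_account_id(event: dict):
    """Return the new account ID if account creation succeeded, otherwise None."""
    status = (
        event.get("detail", {})
        .get("serviceEventDetails", {})
        .get("createManagedAccountStatus", {})
    )
    if status.get("state") != "SUCCEEDED":
        return None
    return status.get("account", {}).get("accountId")


# Hypothetical baseline steps: placeholders that only report what they would do.
def apply_default_budget(account_id: str) -> str:
    return f"budget:{account_id}"


def enable_guardduty_member(account_id: str) -> str:
    return f"guardduty:{account_id}"


def configure_security_hub(account_id: str) -> str:
    return f"securityhub:{account_id}"


BASELINE_STEPS = [apply_default_budget, enable_guardduty_member, configure_security_hub]


def handler(event: dict, context=None) -> list:
    """Lambda-style entry point: run every baseline step for a new account."""
    account_id = new_account_id(event)
    if account_id is None:
        return []
    return [step(account_id) for step in BASELINE_STEPS]
```

Keeping the steps in a list means a new organisational standard is added by appending one function, and failed account creations are ignored rather than half-configured.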

Amazon GuardDuty administration is delegated to, and configured within, the Audit account, offering a centralised threat detection solution and security management dashboard.

AWS Security Hub administration is delegated to the Audit account. This configuration aggregates all Security Hub findings across the multi-account landscape into the Audit account.
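Both delegations are one-off calls made from the management account. The sketch below assumes the boto3 GuardDuty and Security Hub clients, which each expose `enable_organization_admin_account(AdminAccountId=...)`; the clients are injected so the logic runs here with a recording stand-in instead of real credentials, and the Audit account ID is made up. These delegations are typically made per Region in which the service is used.

```python
class RecordingClient:
    """Stand-in for a boto3 client so the sketch runs without AWS credentials."""

    def __init__(self):
        self.calls = []

    def enable_organization_admin_account(self, *, AdminAccountId: str) -> dict:
        self.calls.append(AdminAccountId)
        return {}


def delegate_security_services(audit_account_id: str, clients: dict) -> list:
    """Make the Audit account the delegated administrator for each security service.

    `clients` maps a service name to any object exposing
    enable_organization_admin_account(AdminAccountId=...), for example boto3
    GuardDuty and Security Hub clients created in the management account.
    """
    delegated = []
    for service_name, client in clients.items():
        client.enable_organization_admin_account(AdminAccountId=audit_account_id)
        delegated.append(service_name)
    return delegated


gd, sh = RecordingClient(), RecordingClient()
print(delegate_security_services("555555555555", {"guardduty": gd, "securityhub": sh}))
```

Against a real organisation, the call would instead pass `boto3.client("guardduty")` and `boto3.client("securityhub")` created with management-account credentials.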

AWS Service Catalog is centralised (and delegated) to the Shared Services account, with the Product Portfolios shared with the NonPRD, PrePRD and PRD Organizational Units (OUs) to offer teams approved and standardised resource “templates” to deploy within their product accounts.

Delegation of AWS services into the Audit and Shared Services accounts further improves the multi-account security posture. This approach ensures the operations, engineering, and security teams do not require privileged access to the AWS Management account to perform their day-to-day job functions.

Conclusion

Organizational agility at scale, whilst enjoying a secure and compliant AWS multi-account landscape, is a reality that can be implemented today by any organization, whether small, large or enterprise. A major benefit of the governance at scale model described here is its pay-per-use cost model which, depending on the number of security and compliance services enabled, will set you back about 1-4% of your monthly AWS bill.

AWS Control Tower – sample guardrails

The guardrails showcased below are a small selection of the detective and preventative guardrails implemented and managed by AWS Control Tower as Service Control Policies and AWS Config rules.

  • Disallow Creation of Access Keys for the Root User: Secures your AWS accounts by disallowing creation of access keys for the root user.
  • Enable Encryption for Amazon EBS Volumes Attached to Amazon EC2 Instances: This guardrail detects whether encryption is enabled for Amazon EBS volumes attached to Amazon EC2 instances in your landing zone.
  • Disallow Internet Connection Through RDP & SSH: This guardrail detects whether internet connections are enabled to Amazon EC2 instances through services like Remote Desktop Protocol (RDP).
  • Enable MFA for the Root User: This guardrail detects whether multi-factor authentication (MFA) is enabled for the root user of the management account.
  • Disallow Public Read & Write Access to Amazon S3 Buckets: This guardrail detects whether public read/write access is allowed to Amazon S3 buckets.
  • Disallow Public Access to Amazon RDS Database Instances: Detects whether your Amazon RDS database instances have public access enabled. This guardrail does not change the status of the account.
  • Disallow Public Access to Amazon RDS Database Snapshots: Detects whether your Amazon RDS database snapshots have public access enabled.
  • Disallow Amazon RDS Database Instances That Are Not Storage Encrypted: Detects whether your Amazon RDS database instances are not encrypted at rest, along with their automated backups, Read Replicas, and snapshots.
  • Disallow Amazon EBS Volumes That Are Unattached to An Amazon EC2 Instance: Detects whether an Amazon EBS volume persists independently from an Amazon EC2 instance. (cost focused control)
  • Disallow Amazon EC2 Instance Types That Are Not Amazon EBS-Optimized: Detects whether Amazon EC2 instances are launched without an Amazon EBS volume that is performance optimized. (cost/performance focused control)
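To show what a cost-focused detective control evaluates, here is a rough local approximation of the unattached EBS volume check. It is not the AWS Config rule Control Tower actually deploys; it assumes volume dicts shaped like EC2 DescribeVolumes output, where a `State` of "available" means the volume is not attached to any instance, and the sample volumes are made up.

```python
def evaluate_volume(volume: dict) -> str:
    """Config-style verdict: an 'available' volume is not attached to any instance."""
    return "NON_COMPLIANT" if volume.get("State") == "available" else "COMPLIANT"


def unattached_summary(volumes: list) -> tuple:
    """Return (non-compliant volume IDs, total GiB they hold)."""
    flagged = [v for v in volumes if evaluate_volume(v) == "NON_COMPLIANT"]
    return [v["VolumeId"] for v in flagged], sum(v.get("Size", 0) for v in flagged)


# Made-up volumes shaped like EC2 DescribeVolumes output (Size is in GiB).
volumes = [
    {"VolumeId": "vol-0aaa", "State": "in-use", "Size": 100},
    {"VolumeId": "vol-0bbb", "State": "available", "Size": 500},
    {"VolumeId": "vol-0ccc", "State": "available", "Size": 20},
]
print(unattached_summary(volumes))
```

Reporting the total GiB alongside the IDs gives the owning team a quick sense of how much idle storage they are paying for.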