In this blog post series, I would like to discuss best practices for building multi-tenant services in AWS. Existing literature on how to build multi-tenant services is usually aimed at SaaS applications with hundreds of customers.
I will split the series into three parts, one for each type of service-to-service integration: synchronous, asynchronous, and batch integration.
1. Multi-tenancy for internal services
   1.1. Tenant isolation
   1.2. Multi-tenant monitoring
   1.3. Scaling
2. Multi-tenancy for internal services
   2.1. Tenant isolation - access control
   2.2. Tenant isolation - noisy neighbor problem
   2.3. Multi-tenant monitoring
   2.4. Metrics, alarms, dashboards
   2.5. Onboarding and offboarding API clients
3. Multi-tenancy with AWS AppSync
4. Conclusion
Multi-tenancy is the ability of software to serve multiple customers or tenants with a single instance of the software.
Once you allow more than one team to call your service API, your service becomes multi-tenant. Multi-tenant architecture introduces additional complexity to your services, such as tenant isolation, tenant-level monitoring, and scaling.
If you are building a web service in AWS with a REST, HTTP, or WebSocket API, you are most likely using API Gateway.
Onboarding a client with resource-based authorization. For resource-based access, you need to update the API Gateway resource policy and add the AWS account of your client. The main disadvantage of this method is that after you update the resource policy, the API must be redeployed to the stage for the change to take effect (see the AWS docs). However, if you use CDK you can automate the deployment of new stages. Another disadvantage is that a resource policy has a maximum length, which caps how many clients you can list in it.
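To make the resource-based option more concrete, here is a minimal sketch of the policy document you would attach to the API. The helper name buildApiResourcePolicy, the API ARN, and the account IDs are my own placeholders, not something API Gateway or CDK provides; the sketch only assumes the standard IAM policy JSON shape and the execute-api:Invoke action.

```typescript
// Hypothetical helper: builds a resource policy document that lets the
// listed AWS accounts invoke every stage, method and path of one API.
interface ResourcePolicyStatement {
  Effect: 'Allow' | 'Deny';
  Principal: { AWS: string[] };
  Action: string;
  Resource: string;
}

interface ResourcePolicyDocument {
  Version: '2012-10-17';
  Statement: ResourcePolicyStatement[];
}

function buildApiResourcePolicy(apiArn: string, clientAccountIds: string[]): ResourcePolicyDocument {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        // One principal per client account (the account root delegates to that account's IAM)
        Principal: { AWS: clientAccountIds.map((id) => `arn:aws:iam::${id}:root`) },
        Action: 'execute-api:Invoke',
        // <api-arn>/<stage>/<method>/<path>; the wildcards cover all of them
        Resource: `${apiArn}/*/*/*`,
      },
    ],
  };
}

// Placeholder API ARN and account IDs, for illustration only
const resourcePolicy = buildApiResourcePolicy(
  'arn:aws:execute-api:eu-west-1:111122223333:a1b2c3d4e5',
  ['444455556666', '777788889999'],
);
console.log(JSON.stringify(resourcePolicy, null, 2));
// The serialized length is what counts against the resource policy size limit
console.log(JSON.stringify(resourcePolicy).length);
```

Every onboarded account adds one more principal, so the serialized policy grows with the number of clients.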
Onboarding a client with identity-based authorization. For identity-based access control, you need to create an IAM role for the client and allow the client to assume it by updating the role's trust policy (trust relationship). You could use IAM users, but IAM roles are better from a security point of view: roles authenticate with temporary credentials and do not require storing long-lived IAM user credentials. There is a default limit of 1,000 roles per account, but this limit is adjustable. Another disadvantage of the role-based method for cross-account access to your API is that you need to create an IAM role for every new API client. However, role management can be automated with CDK.
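For the identity-based option, each client role needs two documents: a trust policy that lets the client account assume the role, and a permissions policy that lets the role call the API. The sketch below builds both as plain JSON. The helper name buildClientRolePolicies and all ARNs and account IDs are hypothetical; in a real setup you would likely let CDK generate these documents.

```typescript
// Hypothetical helper: the two policy documents behind one per-client IAM role.
interface RolePolicyStatement {
  Effect: 'Allow';
  Action: string;
  Principal?: { AWS: string };
  Resource?: string;
}

interface RolePolicyDocument {
  Version: '2012-10-17';
  Statement: RolePolicyStatement[];
}

function buildClientRolePolicies(
  clientAccountId: string,
  apiArn: string,
): { trustPolicy: RolePolicyDocument; permissionsPolicy: RolePolicyDocument } {
  // Trust policy: who may assume the role (here, the client's AWS account)
  const trustPolicy: RolePolicyDocument = {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Principal: { AWS: `arn:aws:iam::${clientAccountId}:root` },
        Action: 'sts:AssumeRole',
      },
    ],
  };
  // Permissions policy: what the role may do once assumed (invoke the API)
  const permissionsPolicy: RolePolicyDocument = {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: 'execute-api:Invoke',
        Resource: `${apiArn}/*/*/*`,
      },
    ],
  };
  return { trustPolicy, permissionsPolicy };
}

// Placeholder account ID and API ARN, for illustration only
const rolePolicies = buildClientRolePolicies(
  '444455556666',
  'arn:aws:execute-api:eu-west-1:111122223333:a1b2c3d4e5',
);
console.log(JSON.stringify(rolePolicies, null, 2));
```

Narrowing the Resource pattern per client is one way to give different tenants access to different endpoints.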
AWS IAM authorization only allows you to control access to the API Gateway (with an IAM policy you can specify which AWS account can call which API Gateway endpoints). It is your responsibility to implement access control for the data and other underlying resources of your service. Within your service, you can use the AWS IAM ARN of the caller, which is passed with the API Gateway request, for further access control:
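import type { APIGatewayEvent, APIGatewayProxyResult, Context } from 'aws-lambda';

export const handler = async (event: APIGatewayEvent, context: Context): Promise<APIGatewayProxyResult> => {
  // IAM principal ARN of the API caller (typed as nullable, so check it before use)
  const callerArn = event.requestContext.identity.userArn;
  if (!callerArn) {
    return { statusCode: 403, body: JSON.stringify({ message: 'Caller identity is missing' }) };
  }
  // ... business logic based on the caller
  return {
    statusCode: 200,
    body: JSON.stringify({
      message: `Received API Call from ${callerArn}`,
    }),
  };
};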
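To act on the caller ARN inside the handler, you first have to pull the account ID and role name out of it. Below is a minimal sketch of that step. It assumes clients call the API through an assumed role, so the ARN has the form arn:aws:sts::<account-id>:assumed-role/<role-name>/<session-name>; the function names and the allow-list are hypothetical.

```typescript
interface CallerIdentity {
  accountId: string;
  principalType: string; // e.g. 'assumed-role', 'user', 'root'
  principalName: string; // role or user name, empty if the ARN has none
}

// Splits an ARN of the form arn:partition:service:region:account-id:resource
function parseCallerArn(arn: string): CallerIdentity | undefined {
  const parts = arn.split(':');
  if (parts.length < 6 || parts[0] !== 'arn') {
    return undefined;
  }
  // The resource part looks like 'assumed-role/<role-name>/<session-name>'
  const segments = parts.slice(5).join(':').split('/');
  return {
    accountId: parts[4] ?? '',
    principalType: segments[0] ?? '',
    principalName: segments[1] ?? '',
  };
}

// Hypothetical allow-list of role names that may use this service
const allowedRoles = new Set(['payment-service-api-role']);

function isCallerAllowed(arn: string): boolean {
  const caller = parseCallerArn(arn);
  return (
    caller !== undefined &&
    caller.principalType === 'assumed-role' &&
    allowedRoles.has(caller.principalName)
  );
}

console.log(isCallerAllowed('arn:aws:sts::111122223333:assumed-role/payment-service-api-role/session-1')); // true
console.log(isCallerAllowed('arn:aws:sts::111122223333:assumed-role/unknown-role/session-1')); // false
```

The same parsed identity can then drive data-level checks, for example restricting a tenant to its own records.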
API Gateway has two types of logs: execution logs and access logs.
To monitor the requests of your API clients, I would recommend enabling access logging. At the very least, log the AWS IAM ARN of the caller ($context.identity.userArn), the request path ($context.path), your service's response status code ($context.status), and the API call latency ($context.responseLatency).
const formatObject = {
  requestId: '$context.requestId',
  extendedRequestId: '$context.extendedRequestId',
  apiId: '$context.apiId',
  resourceId: '$context.resourceId',
  domainName: '$context.domainName',
  stage: '$context.stage',
  path: '$context.path',
  resourcePath: '$context.resourcePath',
  httpMethod: '$context.httpMethod',
  protocol: '$context.protocol',
  accountId: '$context.identity.accountId',
  sourceIp: '$context.identity.sourceIp',
  user: '$context.identity.user',
  userAgent: '$context.identity.userAgent',
  userArn: '$context.identity.userArn',
  caller: '$context.identity.caller',
  cognitoIdentityId: '$context.identity.cognitoIdentityId',
  status: '$context.status',
  integration: {
    // The status code returned from an integration. For Lambda proxy integrations, this is the status code that your Lambda function code returns.
    status: '$context.integration.status',
    // For Lambda proxy integration, the status code returned from AWS Lambda, not from the backend Lambda function code.
    integrationStatus: '$context.integration.integrationStatus',
    // A string that contains the error message returned from an integration.
    error: '$context.integration.error',
    latency: '$context.integration.latency',
  },
  error: {
    responseType: '$context.error.responseType',
    message: '$context.error.message',
  },
  requestTime: '$context.requestTime',
  responseLength: '$context.responseLength',
  responseLatency: '$context.responseLatency',
};
const accessLogFormatString = JSON.stringify(formatObject);
const accessLogFormat = apigw.AccessLogFormat.custom(accessLogFormatString);
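Once access logging is enabled, you can query the logs with CloudWatch Logs Insights. For example, this query returns the 20 most recent calls made by the payment-service client:
fields @timestamp, path, status, responseLatency, userArn
| sort @timestamp desc
| filter userArn like 'payment-service'
| limit 20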
The CloudWatch metrics that API Gateway publishes by default are aggregated across all requests. However, you can parse the API Gateway access logs and publish custom CloudWatch metrics with an additional dimension for the client name, which lets you monitor per-client (tenant) usage of your API. At the very minimum, I would recommend publishing the per-client metrics Count, 4xx, 5xx, and Latency, split by Dimension=${Client}. You could also add dimensions such as status code and API path.
2.4.1. Using metric log filters for publishing per-client metrics
Example of a CloudWatch metric log filter that publishes Count with the dimensions Client, Method, and Path:
new logs.MetricFilter(this, 'MultiTenantApiCountMetricFilter', {
  logGroup: accessLogsGroup,
  filterPattern: logs.FilterPattern.exists('$.userArn'),
  metricNamespace: metricNamespace,
  metricName: 'Count',
  metricValue: '1',
  unit: cloudwatch.Unit.COUNT,
  dimensions: {
    client: '$.userArn',
    method: '$.httpMethod',
    path: '$.path',
  },
});
2.4.2. Using Lambda function for publishing per-client metrics
The alternative is to create a Lambda function that parses the logs, extracts metrics, and publishes them. This lets you add custom logic, such as filtering out unknown clients or extracting the client name from the userArn.
// Here `lambda` is aws-cdk-lib/aws-lambda-nodejs and `logsd` is aws-cdk-lib/aws-logs-destinations
const logProcessingFunction = new lambda.NodejsFunction(
  this,
  'log-processor-function',
  {
    functionName: 'multi-tenant-api-log-processor-function',
  }
);
new logs.SubscriptionFilter(this, 'MultiTenantApiLogSubscriptionFilter', {
  logGroup: accessLogsGroup,
  destination: new logsd.LambdaDestination(logProcessingFunction),
  filterPattern: logs.FilterPattern.allEvents(),
});
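The core of such a log-processing function could look like the sketch below. It leaves out two steps. The subscription payload arrives base64-encoded and gzip-compressed, so it must be decoded first. Publishing the results to CloudWatch is also not shown. What remains is the per-client aggregation. The names aggregateByClient and ClientUsage are my own, and matching clients by a substring of the userArn is an assumption that mirrors the Logs Insights filter shown earlier.

```typescript
// Subset of the access log format that the aggregation needs
interface AccessLogEntry {
  userArn?: string;
  status?: string;
  responseLatency?: string;
}

interface ClientUsage {
  count: number;
  errors4xx: number;
  errors5xx: number;
  totalLatencyMs: number;
}

// Returns undefined for lines that are not JSON objects
function parseAccessLogEntry(message: string): AccessLogEntry | undefined {
  try {
    const parsed: unknown = JSON.parse(message);
    return typeof parsed === 'object' && parsed !== null ? (parsed as AccessLogEntry) : undefined;
  } catch {
    return undefined;
  }
}

// Aggregates already-decoded access log lines into per-client usage.
// Lines from callers that match no known client are dropped.
function aggregateByClient(messages: string[], knownClients: string[]): Map<string, ClientUsage> {
  const usageByClient = new Map<string, ClientUsage>();
  for (const message of messages) {
    const entry = parseAccessLogEntry(message);
    if (entry === undefined) {
      continue;
    }
    const userArn = entry.userArn ?? '';
    const client = knownClients.find((candidate) => userArn.includes(candidate));
    if (client === undefined) {
      continue; // unknown client
    }
    const usage = usageByClient.get(client) ?? { count: 0, errors4xx: 0, errors5xx: 0, totalLatencyMs: 0 };
    const status = Number(entry.status);
    usage.count += 1;
    if (status >= 400 && status < 500) usage.errors4xx += 1;
    if (status >= 500) usage.errors5xx += 1;
    usage.totalLatencyMs += Number(entry.responseLatency) || 0;
    usageByClient.set(client, usage);
  }
  return usageByClient;
}
```

Each map entry then translates into one set of CloudWatch data points with the client name as a dimension, which keeps raw role ARNs out of your metric dimensions.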
interface ApiClientConfig {
  name: string;
  awsAccounts: string[];
  rateLimit: number;
  burstLimit: number;
}
const apiClients: ApiClientConfig[] = [
  {
    name: 'payment-service',
    awsAccounts: ['3','444455556666'],
    rateLimit: 10,
    burstLimit: 2,
  },
  {
    name: 'order-service',
    awsAccounts: ['777788889999'],
    rateLimit: 1,
    burstLimit: 1,
  },
];
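With a client list like this as the single source of truth, onboarding or offboarding a client comes down to adding or removing one entry. As one illustration of how the list can be consumed, the sketch below builds a lookup from AWS account ID to client and rejects an account registered for two clients. The function name buildAccountIndex and the sample data are hypothetical; the interface is repeated only so the snippet stands on its own.

```typescript
interface ApiClientConfig {
  name: string;
  awsAccounts: string[];
  rateLimit: number;
  burstLimit: number;
}

// Maps every registered AWS account ID to the client that owns it.
// Throws if the same account is listed under two different clients.
function buildAccountIndex(clients: ApiClientConfig[]): Map<string, ApiClientConfig> {
  const index = new Map<string, ApiClientConfig>();
  for (const client of clients) {
    for (const accountId of client.awsAccounts) {
      const existing = index.get(accountId);
      if (existing !== undefined) {
        throw new Error(`Account ${accountId} is registered for both ${existing.name} and ${client.name}`);
      }
      index.set(accountId, client);
    }
  }
  return index;
}

// Sample data with placeholder account IDs
const sampleClients: ApiClientConfig[] = [
  { name: 'payment-service', awsAccounts: ['444455556666'], rateLimit: 10, burstLimit: 2 },
  { name: 'order-service', awsAccounts: ['777788889999'], rateLimit: 1, burstLimit: 1 },
];

const accountIndex = buildAccountIndex(sampleClients);
console.log(accountIndex.get('777788889999')?.name); // order-service
```

The same index can resolve the accountId field of an access log entry to a client name.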
If your service has a GraphQL API, you probably use AppSync. As with API Gateway, you can use IAM auth to authorize AppSync requests. AppSync does not have a resource policy, so you can only use role-based authorization to control access to an AppSync API. As with API Gateway, you would create a separate IAM role for every new tenant of your service.
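A per-tenant role for AppSync needs a permissions policy that allows the appsync:GraphQL action on the fields that tenant may use. The sketch below builds such a policy as plain JSON. The helper name buildAppSyncTenantPolicy, the API ARN, and the field names are hypothetical examples.

```typescript
interface AppSyncFieldRef {
  typeName: 'Query' | 'Mutation' | 'Subscription';
  fieldName: string; // '*' allows every field of the type
}

// Hypothetical helper: permissions policy for one tenant role,
// limited to the listed GraphQL fields of a single AppSync API.
function buildAppSyncTenantPolicy(apiArn: string, fields: AppSyncFieldRef[]) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: 'appsync:GraphQL',
        // Field-level resource: <api-arn>/types/<type>/fields/<field>
        Resource: fields.map((f) => `${apiArn}/types/${f.typeName}/fields/${f.fieldName}`),
      },
    ],
  };
}

// Placeholder API ARN and field names, for illustration only
const tenantPolicy = buildAppSyncTenantPolicy(
  'arn:aws:appsync:eu-west-1:111122223333:apis/exampleapiid',
  [
    { typeName: 'Query', fieldName: 'getOrder' },
    { typeName: 'Mutation', fieldName: '*' },
  ],
);
console.log(JSON.stringify(tenantPolicy, null, 2));
```

Listing fields per tenant gives you field-level isolation between tenants without any resolver changes.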