FileSystem:
  Type: AWS::EFS::FileSystem
  Properties:
    PerformanceMode: generalPurpose
    FileSystemTags:
      - Key: Name
        Value: fs-pylibs

MountTargetA:
  Type: AWS::EFS::MountTarget
  Properties:
    FileSystemId:
      Ref: FileSystem
    SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
    SecurityGroups:
      - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"

MountTargetB:
  Type: AWS::EFS::MountTarget
  Properties:
    FileSystemId:
      Ref: FileSystem
    SubnetId: "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"
    SecurityGroups:
      - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"

AccessPointResource:
  Type: "AWS::EFS::AccessPoint"
  DependsOn: FileSystem
  Properties:
    FileSystemId: !Ref FileSystem
    PosixUser:
      Uid: "1000"
      Gid: "1000"
    RootDirectory:
      CreationInfo:
        OwnerGid: "1000"
        OwnerUid: "1000"
        Permissions: "0777"
      Path: "/py-libs"
Note that we will use EFS General Purpose performance mode since it has lower latency than Max I/O.
We will mount the EFS file system on an Amazon SageMaker notebook instance and install PyTorch and the ConvAI model on EFS. The notebook instance must use the same security group and reside in the same VPC as the EFS file system.

Let's mount the EFS path /py-libs to the /home/ec2-user/SageMaker/libs directory:

%%sh
mkdir -p libs
FILE_SYS_ID=fs-xxxxxx
sudo mount -t nfs \
-o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
$FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/ \
libs
cd libs && sudo mkdir -p py-libs
cd .. && sudo umount -l /home/ec2-user/SageMaker/libs
sudo mount -t nfs \
-o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
$FILE_SYS_ID.efs.ap-southeast-2.amazonaws.com:/py-libs \
libs
!sudo pip --no-cache-dir install torch -t libs/py-libs
!sudo pip --no-cache-dir install torchvision -t libs/py-libs
!sudo pip --no-cache-dir install simpletransformers -t libs/py-libs
Once we have all the packages installed, download the pre-trained model provided by Hugging Face, then extract the archive to the convai-model directory on EFS:

!sudo wget https://s3.amazonaws.com/models.huggingface.co/transfer-learning-chatbot/gpt_personachat_cache.tar.gz
!sudo mkdir -p libs/convai-model
!sudo tar -xvf gpt_personachat_cache.tar.gz -C libs/convai-model
!sudo chmod -R g+rw libs/convai-model
We are now ready to talk to the pre-trained model; simply call model.interact().
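A minimal sketch of doing this from the notebook, assuming the EFS layout created above (packages under libs/py-libs, the extracted model under libs/convai-model) and CPU-only inference:

import sys
sys.path.insert(1, 'libs/py-libs')  # make the EFS-installed packages importable

from simpletransformers.conv_ai import ConvAIModel

# "gpt" model type; the directory is where gpt_personachat_cache.tar.gz was extracted
model = ConvAIModel("gpt", "libs/convai-model", use_cuda=False)
model.interact()  # chat through the input() prompt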
The pre-trained model provided by Hugging Face performs well out of the box and will likely require less fine-tuning when creating a chatbot.

Create a DialogHistory table to store the dialog history, with at least the last utterance from the user. We can use a sample CloudFormation template to configure the DynamoDB table.
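The table only needs userid as its partition key, matching the key used by the Lambda code later in this post. For illustration, here is an equivalent boto3 sketch of the same schema the CloudFormation template defines (the table name and billing mode are assumptions):

import boto3

dynamodb = boto3.client('dynamodb')
dynamodb.create_table(
    TableName='DialogHistory',  # assumed name; use whatever the template's TableName parameter resolves to
    AttributeDefinitions=[{'AttributeName': 'userid', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'userid', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST',
)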
Please note that we still have to create a network path to DynamoDB (such as a gateway VPC endpoint or a NAT gateway), even though the Lambda function is running inside a public subnet of the VPC.
We will use AWS SAM to create the Lambda function and mount the EFS access point into it.

First, create a Lambda function resource, then set up the EFS file system for Lambda. Make sure that EFS and Lambda are in the same VPC:

HelloFunction:
  Type: AWS::Serverless::Function
  DependsOn:
    - LibAccessPointResource
  Properties:
    Environment:
      Variables:
        CHAT_HISTORY_TABLE: !Ref TableName
    Role: !GetAtt LambdaRole.Arn
    CodeUri: src/
    Handler: api.lambda_handler
    Runtime: python3.6
    FileSystemConfigs:
      - Arn: !GetAtt LibAccessPointResource.Arn
        LocalMountPath: "/mnt/libs"
    VpcConfig:
      SecurityGroupIds:
        - "{{resolve:ssm:/root/defaultVPC/securityGroup:1}}"
      SubnetIds:
        - "{{resolve:ssm:/root/defaultVPC/subsetA:1}}"
        - "{{resolve:ssm:/root/defaultVPC/subsetB:1}}"

LambdaRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: "efsAPILambdaRole"
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: "Allow"
          Principal:
            Service:
              - "lambda.amazonaws.com"
          Action:
            - "sts:AssumeRole"
    ManagedPolicyArns:
      - "arn:aws:iam::aws:policy/AWSLambdaExecute"
      - "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
      - "arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess"
    Policies:
      - PolicyName: "efsAPIRoleDBAccess"
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - "dynamodb:PutItem"
                - "dynamodb:GetItem"
                - "dynamodb:UpdateItem"
                - "dynamodb:DeleteItem"
                - "dynamodb:Query"
                - "dynamodb:Scan"
              Resource:
                - !GetAtt ChatHistory.Arn
                - Fn::Join:
                    - "/"
                    - - !GetAtt ChatHistory.Arn
                      - "*"
            - Effect: Allow
              Action:
                - "ssm:GetParameter*"
              Resource:
                - !Sub "arn:${AWS::Partition}:ssm:${AWS::Region}:${AWS::AccountId}:parameter/root/defaultVPC*"
Adding the conversation engine: AWS Lambda
In this section, we will create a Lambda function that handles the communication between users and the conversational AI model. The function contains the following source code in src/api.py:
import json
import logging
import sys
import boto3
import random
import os

# Make the packages installed on EFS importable before importing them
sys.path.insert(1, '/mnt/libs/py-libs')

import torch
import torch.nn.functional as F
from simpletransformers.conv_ai.conv_ai_utils import get_dataset
from simpletransformers.conv_ai import ConvAIModel

TABLE_NAME = os.environ['CHAT_HISTORY_TABLE']
dynamodb = boto3.client('dynamodb')

# Load the pre-trained ConvAI model once per execution environment.
# The path is an assumption based on the EFS layout created above
# (the archive was extracted to convai-model on the access point).
convAimodel = ConvAIModel("gpt", "/mnt/libs/convai-model", use_cuda=False)
# Optional personality sentences; left empty so a personality is sampled
# from the cached dataset.
character = []


def get_chat_history(userid):
    response = dynamodb.get_item(TableName=TABLE_NAME, Key={
        'userid': {
            'S': userid
        }})
    if 'Item' in response:
        return json.loads(response["Item"]["history"]["S"])
    return {"history": []}


def save_chat_history(userid, history):
    return dynamodb.put_item(TableName=TABLE_NAME,
                             Item={'userid': {'S': userid}, 'history': {'S': history}})


def lambda_handler(event, context):
    try:
        userid = event['userid']
        message = event['message']
        history = get_chat_history(userid)
        history = history["history"]
        response_msg = interact(message, convAimodel,
                                character, userid, history)
        return {
            'message': json.dumps(response_msg)
        }
    except Exception as ex:
        logging.exception(ex)
Note that the simpletransformers library allows us to interact with the model locally through input(). To build our chat engine, we need to override the default interact and sample_sequence implementations in conv_ai:
# Special tokens used by the ConvAI GPT model; assumed to match the
# personachat setup used by simpletransformers.
SPECIAL_TOKENS = ["<bos>", "<eos>", "<speaker1>", "<speaker2>"]


def sample_sequence(aiCls, personality, history, tokenizer, model, args, current_output=None):
    special_tokens_ids = tokenizer.convert_tokens_to_ids(SPECIAL_TOKENS)
    if current_output is None:
        current_output = []
    for i in range(args["max_length"]):
        instance = aiCls.build_input_from_segments(
            personality, history, current_output, tokenizer, with_eos=False)
        input_ids = torch.tensor(
            instance["input_ids"], device=aiCls.device).unsqueeze(0)
        token_type_ids = torch.tensor(
            instance["token_type_ids"], device=aiCls.device).unsqueeze(0)
        logits = model(input_ids, token_type_ids=token_type_ids)
        if isinstance(logits, tuple):  # for gpt2 and maybe others
            logits = logits[0]
        logits = logits[0, -1, :] / args["temperature"]
        logits = aiCls.top_filtering(
            logits, top_k=args["top_k"], top_p=args["top_p"])
        probs = F.softmax(logits, dim=-1)
        prev = torch.topk(probs, 1)[
            1] if args["no_sample"] else torch.multinomial(probs, 1)
        if i < args["min_length"] and prev.item() in special_tokens_ids:
            while prev.item() in special_tokens_ids:
                if probs.max().item() == 1:
                    logging.warn(
                        "Warning: model generating special token with probability 1.")
                    break  # avoid infinitely looping over special token
                prev = torch.multinomial(probs, num_samples=1)
        if prev.item() in special_tokens_ids:
            break
        current_output.append(prev.item())
    return current_output


def interact(raw_text, model, personality, userid, history):
    args = model.args
    tokenizer = model.tokenizer
    process_count = model.args["process_count"]
    model._move_model_to_device()
    if not personality:
        # Sample a random personality from the cached personachat dataset
        dataset = get_dataset(
            tokenizer,
            None,
            args["cache_dir"],
            process_count=process_count,
            proxies=model.__dict__.get("proxies", None),
            interact=True,
        )
        personalities = [dialog["personality"]
                         for dataset in dataset.values() for dialog in dataset]
        personality = random.choice(personalities)
    else:
        personality = [tokenizer.encode(s.lower()) for s in personality]

    history.append(tokenizer.encode(raw_text))
    with torch.no_grad():
        out_ids = sample_sequence(
            model, personality, history, tokenizer, model.model, args)
    history.append(out_ids)
    history = history[-(2 * args["max_history"] + 1):]
    out_text = tokenizer.decode(out_ids, skip_special_tokens=True)
    save_chat_history(userid, json.dumps({"history": history}))
    return out_text
We can now test the chatbot by invoking the deployed function and decoding its logs:

$ aws lambda invoke --function-name "chat-efs-api-HelloFunction-KQSNKF5K0IY8" out --log-type Tail \
    --query 'LogResult' --output text | base64 -d
>>hi there
how are you?
>>good, thank you
what do you like to do for fun?
>>I like reading, yourself?
i like to listen to classical music
......
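The CLI call above only decodes the function logs. To drive the conversation programmatically, the function can be invoked with the event shape the handler expects; a boto3 sketch, reusing the function name from the example and a made-up userid:

import json
import boto3

client = boto3.client('lambda')
response = client.invoke(
    FunctionName='chat-efs-api-HelloFunction-KQSNKF5K0IY8',  # replace with your deployed function name
    Payload=json.dumps({'userid': 'user-001', 'message': 'hi there'}),
)
print(json.loads(response['Payload'].read())['message'])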
However, I am aware of the impact of cold starts on response times: the first request took roughly 30 seconds to complete because of the cold start. To prevent cold starts in our Lambda functions, we can keep the functions warm:
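As one illustration (an assumption here, not necessarily the mechanism intended above), provisioned concurrency keeps a number of initialized execution environments ready, so the model does not have to be reloaded on the first request:

import boto3

client = boto3.client('lambda')
client.put_provisioned_concurrency_config(
    FunctionName='chat-efs-api-HelloFunction-KQSNKF5K0IY8',  # placeholder name
    Qualifier='1',  # a published version or alias is required
    ProvisionedConcurrentExecutions=1,
)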