ChatBedrock
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.
%pip install --upgrade --quiet langchain-aws
Note: you may need to restart the kernel to use updated packages.
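Under the hood, ChatBedrock talks to the Bedrock runtime through boto3, so the usual AWS credential configuration applies (environment variables, a shared credentials file, SSO, or an IAM role). A minimal sketch, assuming you authenticate with static keys via environment variables (the placeholder values are illustrative):
import os

# Standard boto3 credential configuration; any other AWS auth mechanism
# (shared credentials file, SSO, IAM role) works just as well.
os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"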
from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage
chat = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"temperature": 0.1},
)
messages = [
    HumanMessage(
        content="Translate this sentence from English to French. I love programming."
    )
]
chat.invoke(messages)
AIMessage(content="Voici la traduction en français :\n\nJ'aime la programmation.", additional_kwargs={'usage': {'prompt_tokens': 20, 'completion_tokens': 21, 'total_tokens': 41}}, response_metadata={'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0', 'usage': {'prompt_tokens': 20, 'completion_tokens': 21, 'total_tokens': 41}}, id='run-994f0362-0e50-4524-afad-3c4f5bb11328-0')
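Because ChatBedrock implements the standard Runnable interface, it composes with the rest of LangChain. A minimal chaining sketch, piping a prompt template into the model defined above (the prompt text and languages here are illustrative, not part of the original example):
from langchain_core.prompts import ChatPromptTemplate

# Build a prompt template and pipe it into the Bedrock chat model defined above.
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | chat
chain.invoke(
    {
        "input_language": "English",
        "output_language": "French",
        "input": "I love programming.",
    }
)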
Streaming
To stream responses, you can use the runnable .stream() method.
for chunk in chat.stream(messages):
    print(chunk.content, end="", flush=True)
Voici la traduction en français :
J'aime la programmation.
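Asynchronous streaming is also available through the Runnable interface's .astream() method. A minimal sketch reusing the chat and messages objects from above (in a notebook you could simply await the loop in a cell instead of using asyncio.run):
import asyncio


async def main() -> None:
    # Stream chunks asynchronously from the same Bedrock chat model.
    async for chunk in chat.astream(messages):
        print(chunk.content, end="", flush=True)


asyncio.run(main())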
LLM Caching with OpenSearch Semantic Cache
Use OpenSearch as a semantic cache to cache prompts and responses and evaluate hits based on semantic similarity.
from langchain.globals import set_llm_cache
from langchain_aws import ChatBedrock
from langchain_community.cache import OpenSearchSemanticCache
from langchain_community.embeddings import BedrockEmbeddings
from langchain_core.messages import HumanMessage
bedrock_embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1", region_name="us-east-1"
)
chat = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0", model_kwargs={"temperature": 0.5}
)
# Enable the LLM cache. Make sure OpenSearch is set up and running. Update the URL accordingly.
set_llm_cache(
    OpenSearchSemanticCache(
        opensearch_url="http://localhost:9200", embedding=bedrock_embeddings
    )
)
%%time
# The first time, it is not yet in cache, so it should take longer
messages = [HumanMessage(content="tell me about Amazon Bedrock")]
response_text = chat.invoke(messages)
print(response_text)
%%time
# The second time, while not a direct hit, the question is semantically similar to the original question,
# so it uses the cached result!
messages = [HumanMessage(content="tell me about Amazon Bedrock")]
response_text = chat.invoke(messages)
print(response_text)
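To disable caching again (for example, between experiments), you can reset the global cache; passing None removes the globally configured LLM cache:
from langchain.globals import set_llm_cache

# Passing None clears the globally configured LLM cache.
set_llm_cache(None)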