Feature request
The provided examples that leverage LangChain to create a representation all make use of `langchain.chains.question_answering.load_qa_chain`, and the implementation is not very transparent to the user, leading to inconsistencies and making it difficult to understand how to provide custom chains.
Motivation
Some of the issues in detail:
- `langchain.chains.question_answering.load_qa_chain` is now deprecated and will be removed at some point.
- The current LangChain integration is not very clear because:
  - a `prompt` can be specified in the constructor of the `LangChain` class. However, this is not a prompt but rather a custom instruction that is passed to the provided chain through the `question` key.
  - in the case of `langchain.chains.question_answering.load_qa_chain` (which is the provided example), this `question` key is added as part of a larger, hard-coded (and not transparent to a casual user) prompt.
  - if a user wants to fully customize the instructions used to create the representation, it would be best not to use the `langchain.chains.question_answering.load_qa_chain` chain, to avoid this hard-coded prompt (this is currently not clearly documented). In addition, if that specific chain is not used, the use of a `question` key can be confusing.
  - the approach to adding keywords to the prompt (by including `"[KEYWORDS]"` in `self.prompt` and then performing some string manipulation) is confusing.
- Some imports from LangChain are outdated (e.g. `Documents`, `OpenAI`).
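For illustration, the `"[KEYWORDS]"` mechanism described above amounts to a plain string substitution, roughly along these lines (a hypothetical sketch, not BERTopic's actual code; `build_question` is an invented name):

```python
# Hypothetical sketch of the "[KEYWORDS]" substitution (not the actual
# BERTopic implementation): the user-supplied "prompt" string has its
# "[KEYWORDS]" placeholder replaced by the topic keywords, and the result
# is passed to the chain under the `question` key.
def build_question(prompt: str, keywords: list[str]) -> str:
    return prompt.replace("[KEYWORDS]", ", ".join(keywords))

question = build_question("Keywords: [KEYWORDS]", ["ai", "nlp", "topics"])
# the chain is then invoked with {"input_documents": docs, "question": question}
```

This works, but calling the first argument a "prompt" is misleading, since the real prompt lives inside the chain.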
Example of workarounds in current implementation
With the current implementation, a user wanting to use a custom LangChain prompt in a custom LCEL chain and add keywords to that prompt would have to do something like the following (ignoring the fact that documents are passed as `Document` objects and not formatted into a `str`):
```python
from bertopic.representation import LangChain
from langchain_core.prompts import ChatPromptTemplate

custom_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Custom instructions."),
        # the documents arrive under `input_documents`, the keywords under `question`
        ("human", "Documents: {input_documents}, Keywords: {question}"),
    ]
)

chain = some_custom_chain_with_above_prompt  # a custom LCEL chain built around custom_prompt

# "[KEYWORDS]" is replaced by the topic keywords and passed through the `question` key
representation_model = LangChain(chain, prompt="[KEYWORDS]")
```
Related issues:
Your contribution
I propose several changes, which I have started working on in a branch (made a PR to make the diff easy to see).
- Update the examples so that `langchain.chains.question_answering.load_qa_chain` is replaced by `langchain.chains.combine_documents.stuff.create_stuff_documents_chain`, as recommended in the migration guide.
  - This new approach still takes care of formatting the `Document` objects into the prompt, but the prompt must now be specified explicitly (instead of the implicit, hard-coded prompt of `langchain.chains.question_answering.load_qa_chain`).
- Remove the ability to provide a prompt in the constructor of `LangChain`, as the prompt must now be explicitly created with the chain object.
- Rename the keys for consistency to `documents`, `keywords`, and `representation` (note that `langchain.chains.combine_documents.stuff.create_stuff_documents_chain` does not have an `output_text` output key, so the `representation` key must be added).
- Make it so that the `keywords` key is always provided to the chain (but it's up to the user to include a placeholder for it in their prompt).
Questions:
- Should we provide a new example prompt to replace `DEFAULT_PROMPT`? For example:
  `EXAMPLE_PROMPT = "What are these documents about? {documents}. Here are some keywords about them: {keywords}. Please give a single label."`
  However, it could only be used directly with `langchain.chains.combine_documents.stuff.create_stuff_documents_chain`, which takes care of formatting the documents.