Chat with an Excel (.csv) dataset using OpenAI and LangChain
In this article, I show you how to chat with any .csv dataset using LangChain and the OpenAI API, in about 10 lines of code.
Step 1 is to get an OpenAI API key. You will probably need to set up billing as well. You can create a key here: https://platform.openai.com/account/api-keys
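Rather than pasting the key directly into a notebook, where it is easy to share by accident, you can export it as the `OPENAI_API_KEY` environment variable in your shell before starting Python. The openai and langchain libraries read that variable on their own. The minimal sketch below only checks early that the variable is set. The helper name `get_openai_key` is hypothetical.

```python
import os


def get_openai_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail early with a clear message.

    Hypothetical helper: openai/langchain read OPENAI_API_KEY themselves,
    so this function only verifies that the variable is present and non-empty.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before starting Python."
        )
    return key
```

Calling `get_openai_key()` at the top of the notebook surfaces a missing key immediately, instead of as an authentication error deep inside a LangChain call.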
Once that is sorted, install the langchain, openai, chromadb and tiktoken Python libraries.
!pip install -q langchain openai chromadb tiktoken
Great. In the next code snippet, I import the libraries, set the API key (replace it with the one you created), and load a .csv dataset. For now, keep the dataset small: with more than about 2,000 rows, the results become unreliable. These tools are still at an early stage, so I expect we will soon be able to get past this limit.
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
import pandas as pd
import os
os.environ["OPENAI_API_KEY"] = "sk-xxxx"  # replace with your own API key
# Load the dataset
# Dataset used here (download link): https://www.kaggle.com/datasets/ashishraut64/indian-startups-top-300?resource=download
loader = CSVLoader(file_path='Startups1.csv')
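Since results degrade above roughly 2,000 rows, it can help to trim a large file before handing it to the loader. The sketch below uses only the standard library `csv` module. The helper name `cap_csv_rows`, the output file name, the UTF-8 encoding, and the choice to keep the first rows (rather than a random sample) are all illustrative assumptions.

```python
import csv


def cap_csv_rows(src_path: str, dst_path: str, max_rows: int = 2000) -> int:
    """Copy the header plus at most `max_rows` data rows from src_path to dst_path.

    Keeps rows in file order and returns the number of data rows written.
    Assumes both files are UTF-8 encoded.
    """
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty file: nothing to copy
        writer.writerow(header)
        for row in reader:
            if written >= max_rows:
                break  # stop once the cap is reached
            writer.writerow(row)
            written += 1
    return written
```

For example, `cap_csv_rows('Startups1.csv', 'Startups1_small.csv')` writes a trimmed copy, which you would then pass to `CSVLoader(file_path='Startups1_small.csv')`. The top-300 dataset used here is already well under the limit, so this only matters for larger files.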
Next, we use the loader to build an index over the dataset. Since we have not specified a vector database, LangChain uses ChromaDB by default. In simple terms, vectorising the dataset puts it in a format…
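To make "vectorising" more concrete, here is a toy, dependency-free sketch of the retrieve-by-similarity idea behind the index. It is not what LangChain does internally: OpenAI embeddings and ChromaDB are swapped out for simple word-count vectors and cosine similarity, and the sample rows are made up. Each row is rendered as "column: value" lines, similar in spirit to how CSVLoader builds one document per row. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def row_to_text(row: dict) -> str:
    """Render one CSV row as 'column: value' lines."""
    return "\n".join(f"{k}: {v}" for k, v in row.items())


def vectorise(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, rows: list, k: int = 1) -> list:
    """Return the k row texts most similar to the question."""
    docs = [row_to_text(r) for r in rows]
    q_vec = vectorise(question)
    ranked = sorted(docs, key=lambda d: cosine(q_vec, vectorise(d)), reverse=True)
    return ranked[:k]


# Made-up sample rows, for illustration only.
rows = [
    {"Company": "Alpha", "City": "Bengaluru", "Sector": "Fintech"},
    {"Company": "Beta", "City": "Mumbai", "Sector": "Edtech"},
]
print(retrieve("Which startup is in the edtech sector?", rows))
```

Here the Beta row ranks first because it shares both "sector" and "edtech" with the question. In the real pipeline, the retrieved rows would then be passed to the LLM as context for generating the answer, which is the role `RetrievalQA` plays.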