Data Exploration with LLMs
Data exploration is the first step in data analysis. It uses data visualization tools and statistical techniques to uncover a dataset's characteristics and initial patterns. With the emergence of LLMs like ChatGPT, we can harness the power of NLP to facilitate data exploration. ChatGPT can understand questions posed in natural language and generate human-like responses. LLMs can assist in exploring not only textual data but also other forms of data, such as numerical data or multimedia content. We can use ChatGPT's capabilities to ask questions about statistical trends in numerical datasets, or even have it produce visualizations for a given task.
Here we use the nyc_taxi_trip_duration dataset.
Installing langchain, the ChatGPT (OpenAI) client, and langchain_experimental.
Importing the required packages.
Assigning the OpenAI key.
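One way to assign the key without hard-coding it in the notebook is to read it from an environment variable. The sketch below assumes the `OPENAI_API_KEY` variable, which the OpenAI client and LangChain's OpenAI wrappers look up by default. The helper name `ensure_openai_key` is made up for this example.

```python
import os
from getpass import getpass


def ensure_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key, asking for it once if the variable is unset."""
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed key so it does not show up in notebook output.
        key = getpass("OpenAI API key: ")
        os.environ[env_var] = key
    return key
```

Calling `ensure_openai_key()` once, before the model is created, should be enough. Later cells can then rely on the environment variable.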
Prompts are the instructions and examples we provide to language models to steer their behavior. Prompt templating means creating reusable prompt templates that can be configured with different parameters. LangChain provides tools for creating prompt templates in Python, so prompts can be generated dynamically from variable input. We can create a prompt template for data exploration.
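The idea of a reusable template can be sketched in plain Python with `str.format`, without LangChain's own classes. The template wording, the placeholder names `dataset_name` and `question`, and the helper `render_prompt` are all illustrative assumptions, not the prompt used in this post.

```python
# Hypothetical template text; the placeholders in braces are filled per request.
EXPLORATION_TEMPLATE = (
    "You are a data analyst exploring the {dataset_name} dataset.\n"
    "Answer the question using only the data, and say so if it cannot be answered.\n"
    "Question: {question}"
)


def render_prompt(dataset_name: str, question: str) -> str:
    """Fill the template's placeholders and return the finished prompt."""
    return EXPLORATION_TEMPLATE.format(dataset_name=dataset_name, question=question)


prompt = render_prompt("nyc_taxi_trip_duration", "What is the average trip duration?")
print(prompt)
```

LangChain's `PromptTemplate.from_template` accepts the same `{placeholder}` syntax, so a string like `EXPLORATION_TEMPLATE` can be handed to it unchanged and filled through its `format` method.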
Agents are used in LangChain to control an application's flow of execution as it interacts with users, the environment, and other agents. Here, the agent sets up the connection between ChatGPT and the dataset.
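A minimal sketch of building such an agent follows. It assumes the import paths of the LangChain releases that shipped alongside `langchain_experimental` when CSV agents were introduced. The function name `build_taxi_agent`, the default model name `gpt-3.5-turbo`, and the choice of `temperature=0` are assumptions made for illustration.

```python
def build_taxi_agent(csv_path: str, model_name: str = "gpt-3.5-turbo"):
    """Create a LangChain CSV agent that answers questions about csv_path."""
    # Imported inside the function so the heavy dependencies load only on use.
    # Assumption: these import paths match the installed LangChain version.
    from langchain.chat_models import ChatOpenAI
    from langchain_experimental.agents import create_csv_agent

    # temperature=0 keeps answers as repeatable as the model allows.
    llm = ChatOpenAI(model_name=model_name, temperature=0)
    # The agent loads the CSV into a pandas DataFrame and lets the model
    # write and run pandas code against it.
    return create_csv_agent(llm, csv_path, verbose=True)
```

With a valid key set, `build_taxi_agent("nyc_taxi_trip_duration.csv")` would return the agent. The file name is a placeholder for wherever the CSV is stored. Newer LangChain releases moved `ChatOpenAI` into the separate `langchain_openai` package and ask for an explicit opt-in before the agent may execute generated code, so it is worth checking the documentation for the installed version.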
Now we can query our agent against the data.
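Querying can be sketched as a small loop over questions. The helper `explore` assumes the agent exposes a `run(question)` method that returns a string, as older LangChain agent executors do. `EchoAgent` is a made-up stand-in so the sketch runs without a key or network access, and its replies are placeholders, not model output.

```python
def explore(agent, questions):
    """Ask each question in turn and return a {question: answer} dict."""
    answers = {}
    for question in questions:
        # Assumption: the agent exposes run(question) -> str.
        answers[question] = agent.run(question)
    return answers


class EchoAgent:
    """Stand-in for a real agent so the sketch runs offline; it is not an LLM."""

    def run(self, question: str) -> str:
        return f"(placeholder answer to: {question})"


questions = [
    "What is this dataset about?",
    "How many rows and columns does it have?",
]
for question, answer in explore(EchoAgent(), questions).items():
    print(question, "->", answer)
```

Passing a real agent in place of `EchoAgent` sends the same questions to ChatGPT.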
The answer we got was ‘This dataset contains information about taxi trips’, which is correct.
We can also visualize the data.
LLMs make our analysis easier and clearer. Thanks for reading.