PandasAI is an open-source Python library that adds generative AI capabilities to Pandas, the free and open-source data manipulation and analysis package for Python. Pandas includes a variety of tools for working with structured data, including data frames and series, and it is extremely popular among data scientists and analysts because of its simplicity and adaptability.
Python Pandas, as we all know, is an open-source toolkit that provides data manipulation and analysis capabilities for Python programming. This versatile library has become a must-have for data scientists and analysts.
With its basic yet powerful data structures such as Series and DataFrame, it provides an effective way to manage structured data.
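As a quick illustration of the two structures, here is a minimal sketch. The variable names and the three sample rows are chosen only for this example.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
scores = pd.Series(
    [7.3, 7.2, 6.5],
    index=["United States", "United Kingdom", "France"],
    name="happiness_index",
)

# A DataFrame is a two-dimensional table; each column is a Series
# and all columns share the same index.
table = pd.DataFrame({
    "happiness_index": scores,
    "gdp": [21400000, 2940000, 2830000],
})

print(scores["France"])  # look up one value by its label
print(table.shape)       # (rows, columns)
```

Selecting a single column from the DataFrame, for example `table["gdp"]`, gives back a Series.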
Pandas is often used in the preprocessing stage of machine learning and deep learning workflows in the field of artificial intelligence. Pandas aids in the translation of raw datasets into organized, ready-to-use forms that can be fed into AI algorithms by offering seamless data cleaning, reshaping, merging, and aggregation.
As a result, it is crucial in reducing data preparation time and speeding up the AI development process. I’m guessing that’s why “PandasAI” was created.
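To make the preprocessing steps concrete, here is a small sketch of cleaning, merging, and aggregation in plain Pandas. The `sales` and `regions` tables are invented purely for illustration.

```python
import pandas as pd

# Hypothetical raw data with one missing value and one duplicate row.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "units": [10, None, 7, 7, 3],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["north", "south"]})

# Cleaning: drop rows with missing values, then exact duplicate rows.
clean = sales.dropna().drop_duplicates()

# Merging: attach the region of each store.
merged = clean.merge(regions, on="store", how="left")

# Aggregation: total units per region.
totals = merged.groupby("region")["units"].sum()
print(totals)
```

Each step returns a new object, so the raw `sales` table is left unchanged and the pipeline can be rerun or inspected at any stage.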
What is PandasAI

PandasAI is meant to be used in conjunction with Pandas. It turns Pandas into a conversational tool that allows you to ask questions about your data and receive answers in the form of Pandas DataFrames.
High-Level Workflow of Pandas AI
Pandas AI employs a high-level framework to create insights from inputted data. The workflow may be summarized as follows:
- Data Upload: The data is uploaded and turned into a Pandas dataframe, which is used as the input for the next procedures.
- Submit Relevant Questions: Users ask pertinent questions or provide prompts regarding the supplied data. These questions lead the analysis and aid in the generation of insights.
- Submit Prompts: Pandas AI sends a sample of the dataframe (e.g., df.head()) together with the input prompt to the LLM (large language model) API.
- LLM API Response: The LLM API evaluates the request and returns Python code to be run on the input data. The code is written specifically to answer the user’s question.
- Code Execution and Evaluation: The system executes the Python code returned by the LLM API on the input data, performing the computation or analysis the user asked for. The results of running the code are then assessed.
- Conversion to Conversation Format: The result of the code execution is rewritten in a conversational style. This makes the interaction between the user and the data more engaging and intuitive, and makes the insights easier to understand.
- Return of Response: The transformed response, given in the form of a dialogue, is returned to the user. This provides them with insights and answers to their questions based on the analysis of the submitted data.
The steps above define the high-level workflow implemented in Pandas AI, which makes use of LLMs and Pandas dataframes to ease data analysis and create significant insights.
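The workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not PandasAI's actual implementation: `build_prompt`, `fake_llm`, `run_generated_code`, and `ask` are hypothetical names, and `fake_llm` returns a fixed snippet instead of calling a real LLM API.

```python
import pandas as pd


def build_prompt(df: pd.DataFrame, question: str) -> str:
    """Combine a small preview of the dataframe with the user's question."""
    return (
        "You are given a pandas DataFrame named df.\n"
        f"Its first rows are:\n{df.head().to_string()}\n"
        "Write Python code that stores the answer to this question "
        f"in a variable named result: {question}\n"
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the LLM API: always returns the same code."""
    return "result = df[df['value'] > 5]"


def run_generated_code(code: str, df: pd.DataFrame):
    """Execute the returned code with df in scope and read back `result`."""
    namespace = {"df": df, "pd": pd}
    exec(code, namespace)  # never do this with untrusted code outside a sandbox
    return namespace["result"]


def ask(df: pd.DataFrame, question: str, llm=fake_llm) -> str:
    """Run the full loop: prompt -> code -> execution -> conversational reply."""
    prompt = build_prompt(df, question)
    code = llm(prompt)
    result = run_generated_code(code, df)
    return f"Q: {question}\nA:\n{result.to_string(index=False)}"


data = pd.DataFrame({"name": ["a", "b", "c"], "value": [3, 8, 12]})
answer = ask(data, "Which rows have a value greater than 5?")
print(answer)
```

Because the generated code is executed directly, a real system has to treat it as untrusted input, which is why the sketch flags the `exec` call.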
Enhancing Conversational AI with Pandas AI API
The Pandas AI API supports a variety of models, with the possibility of more models being introduced in the future. The following models are now supported:
- ChatGPT by OpenAI: ChatGPT is a conversational AI model created by OpenAI. It is intended to produce human-like replies and engage users in interactive dialogues.
- StarCoder by Hugging Face: StarCoder comes from Hugging Face, an established provider of natural language processing models. It is intended to help users with code-related problems and to suggest suitable programming solutions.
- Azure ChatGPT API: Azure ChatGPT is a variant of OpenAI’s ChatGPT offered through Microsoft Azure. It has comparable conversational capabilities and may be used to create interactive apps.
- OpenAI Assistant: OpenAI Assistant is an advanced language model created by OpenAI. It has been trained to perform a wide range of language tasks, including answering questions, delivering explanations, and conversing.
- Google PaLM: Google PaLM (Pathways Language Model) is a Google language model. It is intended to help with natural language understanding and generation tasks by allowing users to converse with the model.
These models may be used in a conversational fashion, allowing users to communicate with them interactively. To make running these models easier, the course includes a Google Colab notebook that streamlines setup, so users can run the supported models without configuring a local environment.
Installation
pip install pandasai
Now import the dependencies:
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
With PandasAI you may ask, for example, for all the rows in a DataFrame whose value in a given column is greater than 5, and it will return a DataFrame containing just those rows.
To try it out, we first create a dataframe using pandas:
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [21400000, 2940000, 2830000, 3870000, 2160000, 1350000, 1780000, 1320000, 516000, 14000000],
    "happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0]
})
Next, set up the LLM (in this case, OpenAI). Make sure to replace the placeholder with your own OpenAI API key.
Note that using the library with OpenAI requires an OpenAI API key, and each request is billed to that key at a small cost.
OPENAI_API_KEY = "YOUR API KEY"
llm = OpenAI(api_token=OPENAI_API_KEY)
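Rather than pasting the key into the script, a common alternative is to read it from an environment variable. The sketch below is one way to do that. The helper name `load_api_key` is hypothetical, and the variable name `OPENAI_API_KEY` is an assumption about how you export the key.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable, or fail loudly."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this script."
        )
    return key
```

The returned string can then be passed as the `api_token` argument shown above, which keeps the key out of source control.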
Then we instantiate Pandas AI with the provided large language model and we run it, passing the data frame and the prompt.
pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='Which are the 5 happiest countries?')
the top 5 happiest countries are the United States, Canada, Australia, United Kingdom, and Germany.
So, for those who are unfamiliar with Python or pandas manipulations/transformations, this is a new way of programming with dataframes.
Consider a universe in which, instead of programming the work at hand, you virtually converse with the machine and tell it what you want the outcome to be. The computer will convert this message into machine-readable code and provide the output to you.
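For comparison, here is how the same question could be answered with hand-written Pandas. This is a sketch of one possible query, not necessarily the code the LLM generates behind the scenes.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [21400000, 2940000, 2830000, 3870000, 2160000,
            1350000, 1780000, 1320000, 516000, 14000000],
    "happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0],
})

# Keep the five rows with the highest happiness_index, then list the countries.
top5 = df.nlargest(5, "happiness_index")["country"].tolist()
print(top5)
```

The natural-language prompt replaces the need to know that `nlargest` exists, which is the convenience the conversational interface offers.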
You can also show a chart, for example:
pandas_ai.run(df, "Plot the histogram of countries showing for each the gdp, using different colors for each bar")
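A chart like the one requested can also be drawn by hand with matplotlib. The following is a sketch of a plausible equivalent, assuming matplotlib is installed. Although the prompt says "histogram", one bar per country is really a bar chart, so that is what is drawn here, and the image is written to an in-memory buffer so that nothing is saved to disk.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, works without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [21400000, 2940000, 2830000, 3870000, 2160000,
            1350000, 1780000, 1320000, 516000, 14000000],
})

# One distinct color per bar, taken from the ten-color "tab10" palette.
colors = [plt.cm.tab10(i) for i in range(len(df))]

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(df["country"], df["gdp"], color=colors)
ax.set_ylabel("gdp")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()

# Render the figure as PNG bytes; use fig.savefig("chart.png") to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```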

This article helps you learn about Pandas AI. We trust that it has been helpful to you. Please feel free to share your thoughts and feedback in the comment section below.