AI & ML · 24 May 2026 · 7 min read

How I Built an AI Chatbot with LangChain and React (What Actually Took Time)

A real account of building a production AI chatbot — the parts tutorials skip: streaming responses, memory management, and keeping costs under control.

LangChain · React · AI · Node.js · OpenAI

Last year I was tasked with building an AI-powered customer support chatbot for a financial services client. Not a demo, not a weekend project — something that had to handle real users, stay within a monthly API budget, and not hallucinate account details.

I'd used OpenAI's API before for small scripts, but this was different. The client wanted conversation memory, document-grounded answers (their internal FAQs and policies), and a React frontend that streamed tokens like ChatGPT does — because "the typing effect makes it feel alive," as they put it.

I reached for LangChain. Here's what I actually learned.

The Part Where LangChain's Docs Will Confuse You

LangChain has gone through three major API changes in two years. If you're searching Stack Overflow or Medium for help, there's a 60% chance any code you find is for a version that no longer works that way. I lost two days to this.

The thing that finally helped me: go straight to the official LangChain.js docs, pick a version, and stick with it. Don't mix examples from different eras.

For this project I used LangChain v0.2 with the @langchain/openai package (not the old bundled one).

Setting Up the Backend

The core of it is a Route Handler in Next.js. Here's a stripped-down version of what I built:

// app/api/chat/route.js
import { ChatOpenAI } from "@langchain/openai";
import { ConversationChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";
import { StreamingTextResponse, LangChainStream } from "ai"; // AI SDK 3.x; removed in later major versions

export async function POST(request) {
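  // useChat posts the whole conversation as `messages`; the newest user turn is last.
  // sessionId is an extra field the client must send (e.g. via useChat's `body` option).
  const { messages, sessionId } = await request.json();
  const message = messages[messages.length - 1].content;

  const { stream, handlers } = LangChainStream();

  const model = new ChatOpenAI({
    modelName: "gpt-4o-mini",   // cheaper than gpt-4, good enough for support
    streaming: true,
    callbacks: [handlers],
  });

  // getSessionMemory (not shown) returns this session's BufferMemory;
  // in production these are stored in Redis per session
  const memory = getSessionMemory(sessionId);

  const chain = new ConversationChain({ llm: model, memory });

  // Deliberately not awaited: tokens flow into `stream` through the callback
  // handlers while the response below is already being sent to the client
  chain.invoke({ input: message }).catch(console.error);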

  return new StreamingTextResponse(stream);
}
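`getSessionMemory` isn't shown above. As a rough idea of what it has to do, here is a hypothetical in-memory stand-in: one memory object per `sessionId`, dropped after a period of inactivity, much like an expiring Redis key. `SessionStore`, the 30-minute default TTL, and the injectable clock are all assumptions for illustration.

```javascript
// Hypothetical stand-in for the Redis-backed store behind getSessionMemory():
// one memory object per sessionId, replaced after `ttlMs` without activity.
class SessionStore {
  constructor(createMemory, ttlMs = 30 * 60 * 1000, now = () => Date.now()) {
    this.createMemory = createMemory; // e.g. () => new BufferMemory()
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.sessions = new Map();
  }

  get(sessionId) {
    const t = this.now();
    const entry = this.sessions.get(sessionId);
    if (entry && t - entry.lastSeen < this.ttlMs) {
      entry.lastSeen = t; // sliding expiry: every access resets the clock
      return entry.memory;
    }
    // Missing or expired: start a fresh conversation for this session
    const memory = this.createMemory();
    this.sessions.set(sessionId, { memory, lastSeen: t });
    return memory;
  }
}

// Usage sketch, with a plain array standing in for a LangChain memory object
let clock = 0;
const store = new SessionStore(() => [], 1000, () => clock);
const a = store.get("session-a");
a.push("hello");
clock = 500;
console.log(store.get("session-a") === a); // true: still within the TTL
clock = 2000;
console.log(store.get("session-a") === a); // false: expired, fresh memory
```

In the route you would wire it up as `const getSessionMemory = (id) => store.get(id)` with `() => new BufferMemory()` as the factory. Module-level state like this is lost on restart and isn't shared between serverless instances, which is why the real version belongs in Redis.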

Two things I want to call out here:

1. Model choice matters more than you think. GPT-4 was overkill for simple FAQ-style questions. I switched to gpt-4o-mini mid-project and the costs dropped by 80% with almost no quality difference for this use case.

2. Memory is the tricky part. BufferMemory works fine for a demo but it keeps growing — eventually you'll send 10,000 tokens of history with every message. In production I switched to BufferWindowMemory with a window of 8 turns, plus a summary for older context. Your users won't notice the difference.
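In LangChain, the window part is `BufferWindowMemory`'s `k` option. Plain JavaScript makes the overall shape easier to see: recent turns kept verbatim, older turns folded into a summary. This is a sketch, not LangChain's implementation. `windowHistory` and its `summarize` callback are hypothetical; in production the summary would come from an extra, cheap LLM call.

```javascript
// Sketch of "last k turns verbatim + a summary of everything older".
// `summarize` is a hypothetical callback; in production it would be an LLM call.
function windowHistory(messages, k, summarize) {
  const keep = k * 2; // one turn = one user message + one assistant reply
  if (messages.length <= keep) return messages;

  const older = messages.slice(0, messages.length - keep);
  const recent = messages.slice(-keep);
  return [
    { role: "system", content: `Summary of earlier conversation: ${summarize(older)}` },
    ...recent,
  ];
}

// Usage: 12 turns (24 messages) with an 8-turn window
const history = Array.from({ length: 24 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `message ${i}`,
}));
const naiveSummary = (msgs) => `${msgs.length} earlier messages omitted`;
const trimmed = windowHistory(history, 8, naiveSummary);
console.log(trimmed.length);     // 17 (one summary + 16 recent messages)
console.log(trimmed[0].content); // Summary of earlier conversation: 8 earlier messages omitted
```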

The React Side — Streaming Without Losing Your Mind

Streaming is where most tutorials gloss over the hard parts. You get tokens one by one, and you need to append them to the UI without causing re-renders every 50ms.
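If you consume the stream yourself instead of through a hook, one common fix is to buffer incoming tokens and flush them to state on a timer. React then sees a handful of updates per second instead of one per token. This is a minimal sketch; `createTokenBuffer` and the 100 ms interval are illustrative assumptions.

```javascript
// Buffers streamed tokens and passes them to `onFlush` in batches, at most once
// per `intervalMs`, so the UI updates a few times a second instead of per token.
function createTokenBuffer(onFlush, intervalMs = 100) {
  let pending = "";
  let timer = null;

  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (pending) {
      const chunk = pending;
      pending = "";
      onFlush(chunk);
    }
  }

  function push(token) {
    pending += token;
    // Schedule one flush for the current batch; later tokens join the same batch
    if (timer === null) timer = setTimeout(flush, intervalMs);
  }

  return { push, flush }; // call flush() once more when the stream ends
}

// Usage: in a component, onFlush would be something like
// (chunk) => setReply((prev) => prev + chunk)
const updates = [];
const buffer = createTokenBuffer((chunk) => updates.push(chunk), 100);
["Hello", ", how ", "can I ", "help?"].forEach((token) => buffer.push(token));
buffer.flush(); // end of stream
console.log(updates.length, updates[0]); // 1 Hello, how can I help?
```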

I used the AI SDK from Vercel alongside LangChain:

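"use client"; // hooks only run in Client Components in the Next.js App Router

import { useChat } from "ai/react";

// sessionId is passed in by the parent page (however you identify a visitor)
export default function ChatWindow({ sessionId }) {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: "/api/chat",
    body: { sessionId }, // extra field merged into every request body
  });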

  return (
    <div className="chat-container">
      <div className="messages">
        {messages.map((m) => (
          <div key={m.id} className={m.role === "user" ? "user-msg" : "bot-msg"}>
            {m.content}
          </div>
        ))}
        {isLoading && <div className="typing-indicator">...</div>}
      </div>
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask something..." />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

The useChat hook handles the streaming for you. It maintains the message list, knows when a stream is active, and appends tokens as they arrive. Using it probably saved me a day of building all this myself.
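To show the reading loop the hook hides, here is a stripped-down version built on `fetch`'s `ReadableStream` and `TextDecoder`. It assumes the endpoint returns plain text. Depending on the AI SDK version, the response may instead use the SDK's own stream protocol, which `useChat` decodes for you and this sketch does not. `readTextStream` is a hypothetical name, and the demo needs a browser or Node 18+ for the global `Response`.

```javascript
// Reads a streamed fetch Response chunk by chunk, calling onText for each decoded
// piece, and resolves with the full text. Assumes a plain-text body.
async function readTextStream(response, onText) {
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    const text = decoder.decode(value, { stream: true });
    full += text;
    onText(text);
  }
  const tail = decoder.decode(); // flush any bytes still buffered in the decoder
  if (tail) {
    full += tail;
    onText(tail);
  }
  return full;
}

// Demo with an in-memory Response; in the app this would be the fetch("/api/chat", ...) result
readTextStream(new Response("Hello from a streamed body"), () => {}).then((text) => {
  console.log(text); // Hello from a streamed body
});
```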

Document Grounding (RAG) — When You Need It

The chatbot also needed to answer from specific company documents, not just general knowledge. This is called Retrieval-Augmented Generation (RAG). The idea: you split your documents into chunks, convert each chunk into a vector embedding, and store them; at query time you retrieve the most relevant chunks and inject them into the prompt.
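Before anything is embedded, documents are split into chunks small enough to retrieve individually. LangChain ships text splitters for this, such as `RecursiveCharacterTextSplitter`, which prefers paragraph and line breaks before falling back to smaller units. The sketch below is a simpler fixed-size splitter with overlap, meant only to illustrate the idea. `chunkText` and the 500/100 sizes are illustrative assumptions.

```javascript
// Fixed-size character chunking with overlap: consecutive chunks share `overlap`
// characters, so any span of up to `overlap` characters lands whole in some chunk.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk already reaches the end
  }
  return chunks;
}

const doc = "Example policy text. ".repeat(60); // 1,260 characters of filler
const chunks = chunkText(doc, 500, 100);
console.log(chunks.map((c) => c.length)); // [ 500, 500, 460 ]
```

Each chunk would then be embedded and upserted into the index; the overlap costs a little extra storage in exchange for not losing sentences that straddle a boundary.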

I used Pinecone for the vector store:

import { OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone } from "@pinecone-database/pinecone";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.Index("support-docs");

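// Connect to an index already populated with embedded document chunks.
// The embedding model must match the one used when the chunks were indexed.
const vectorStore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex: index }
);

// At query time, userMessage is the incoming question; fetch its 3 closest chunks: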
const relevantDocs = await vectorStore.similaritySearch(userMessage, 3);
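Conceptually, `similaritySearch(userMessage, 3)` embeds the question and asks the index for the 3 stored vectors nearest to it. Pinecone does this with an approximate index and whichever metric the index was created with. The brute-force cosine version below only illustrates what "nearest" means. The toy 3-dimensional vectors stand in for real embeddings, and `cosine`/`topK` are hypothetical helpers.

```javascript
// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k: score every stored vector against the query, keep the best k.
function topK(queryVector, docs, k) {
  return docs
    .map((doc) => ({ ...doc, score: cosine(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-d "embeddings"; real OpenAI embeddings have far more dimensions
const docs = [
  { text: "How to reset your password", vector: [0.9, 0.1, 0.0] },
  { text: "Wire transfer limits", vector: [0.1, 0.9, 0.2] },
  { text: "Changing your login email", vector: [0.8, 0.3, 0.1] },
  { text: "Branch opening hours", vector: [0.0, 0.2, 0.9] },
];
const hits = topK([1, 0.2, 0], docs, 2);
console.log(hits.map((h) => h.text).join(" | ")); // How to reset your password | Changing your login email
```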

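Then you inject those 3 chunks into the system prompt, ahead of the user's message. Combined with an instruction to answer only from that context, this steers the model toward answering from the documents rather than from its training data.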
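Here is a sketch of the injection step. It assumes the retrieved items are shaped like LangChain `Document` objects, which carry the text in `pageContent` and extra info in `metadata`. The prompt wording, the `source` metadata field, and the `buildMessages` name are hypothetical.

```javascript
// Builds the message list for one turn: retrieved chunks go into the system prompt,
// followed by the recent history and the new user message.
function buildMessages(relevantDocs, history, userMessage) {
  const context = relevantDocs
    .map((doc, i) => `[${i + 1}] (${doc.metadata?.source ?? "unknown source"})\n${doc.pageContent}`)
    .join("\n\n");

  const system = [
    "You are a customer support assistant.",
    "Answer using only the documents below. If they don't cover the question,",
    "say you don't know and offer to connect the user with a human agent.",
    "",
    "Documents:",
    context,
  ].join("\n");

  return [{ role: "system", content: system }, ...history, { role: "user", content: userMessage }];
}

// Usage with hand-made, placeholder documents shaped like LangChain's Document
const retrieved = [
  { pageContent: "Example FAQ text about disputing a card payment.", metadata: { source: "disputes.md" } },
  { pageContent: "Example FAQ text about dispute review times.", metadata: { source: "disputes.md" } },
];
const messages = buildMessages(retrieved, [], "How do I dispute a charge?");
console.log(messages.length); // 2 (system prompt + user question)
```

Numbering the chunks and naming their source also makes it easy to ask the model to cite which document it used, which helps when you review logged conversations.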

What I'd Do Differently

  • Start with the AI SDK and add LangChain only when you need chains, agents, or RAG. For basic chat, the AI SDK alone is much simpler.
  • Log every conversation from day one. You'll want to see where the model fails.
  • Set hard token limits per session. One user managed to rack up $4 in API costs in a single conversation during testing because I forgot to cap it.
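A hard cap can be as simple as a per-session counter that is checked before each model call. The sketch below is hypothetical. The budget size is arbitrary, and the estimate of roughly 4 characters per token is a common rule of thumb for English text. For exact counts, use a tokenizer or the token usage the API reports back.

```javascript
// Per-session token budget: refuse new model calls once a session has used its quota.
const roughTokens = (text) => Math.ceil(text.length / 4); // rule-of-thumb estimate

class TokenBudget {
  constructor(maxTokensPerSession = 20000) {
    this.max = maxTokensPerSession;
    this.used = new Map(); // sessionId -> tokens consumed so far
  }

  remaining(sessionId) {
    return this.max - (this.used.get(sessionId) ?? 0);
  }

  // Call before the model: is there room for this prompt plus a capped reply?
  canSpend(sessionId, promptText, maxReplyTokens) {
    return roughTokens(promptText) + maxReplyTokens <= this.remaining(sessionId);
  }

  // Call after the model: record what the request actually used
  record(sessionId, tokens) {
    this.used.set(sessionId, (this.used.get(sessionId) ?? 0) + tokens);
  }
}

const budget = new TokenBudget(1000);
console.log(budget.canSpend("s1", "x".repeat(2000), 300)); // true: 500 + 300 <= 1000
budget.record("s1", 800);
console.log(budget.canSpend("s1", "short question", 300)); // false: only 200 left
```

For `maxReplyTokens` to mean anything, pair this with a per-reply limit on the model itself, such as `ChatOpenAI`'s `maxTokens` option. Like the session memory, the counters would live in Redis in production.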

If you're building something similar and want to talk through the architecture, reach out. I've done this for several clients across fintech and e-commerce, and there are patterns worth knowing before you start.

Md Refat Bhuyan
Full-Stack Developer & AI Engineer · Available for hire