What technologies does Refat Bhuyan specialise in?

Refat specialises in full-stack JavaScript: React, Next.js, Node.js, Express, and MongoDB (MERN stack). On the AI side he works with LangChain, OpenAI GPT-4o, RAG pipelines, and Pinecone. For cloud he deploys on AWS, Azure, GCP, and Vercel. He also builds MCP servers for AI agent tooling.

Is Refat Bhuyan available for hire or freelance projects?

Yes. Refat is open to remote full-time roles, senior contract work (1–6 months), and founding engineer conversations for early-stage products. He works with clients in the UK, US, EU, UAE, Australia, and Singapore. His timezone (GMT+6) overlaps well with UAE (GMT+4), UK (GMT+0/+1), and Australian mornings.

Can Refat Bhuyan build AI-powered applications using LangChain or OpenAI?

Yes. Refat has built production AI systems including a document-grounded customer support chatbot for a UK fintech client using LangChain, GPT-4o-mini, and a Pinecone vector store.

Does Refat Bhuyan work with international clients remotely?

Yes. Refat has been working remotely with international clients for over two years, currently full-time with Cunard Consulting Ltd in the UK. He is available for overlap calls with UK and US timezones.

What is Refat Bhuyan's typical project budget range?

Projects range from $500 for focused API integrations up to $5,000+ for full-stack SaaS builds. For ongoing retainer or contract work, rates are discussed based on scope.

Can Refat Bhuyan rescue an AI project that is broken or stuck?

Yes — AI project rescue is one of the most common requests Refat handles. Many clients built 70-80% of a product using tools like Cursor, Claude Code, or ChatGPT and then hit walls: broken RAG pipelines, LangChain hallucinations, apps that work locally but fail in production. Refat audits the codebase, identifies root causes, re-architects where needed, and ships a working, production-ready system. Most rescues are completed in 1–4 weeks.

Does Refat Bhuyan build websites for small and local businesses?

Yes. Refat has helped local businesses — restaurants, retail shops, clinics, and service providers — launch and grow online. One example is EcoEats, a local food delivery business that grew from zero to over 150,000 customers after Refat built their full-stack platform with online ordering, SEO, and an AI-powered chatbot. Services include website builds, online booking/ordering systems, Google-optimised SEO, and 24/7 AI chatbots.

What is context engineering and does Refat Bhuyan offer it?

Context engineering is the practice of designing the full information context provided to AI models — including system prompts, retrieval strategies, tool definitions, memory architecture, and conversation structure — to maximise AI reliability and performance. It goes far beyond basic prompt engineering. Refat applies context engineering when building RAG systems, AI agents, MCP servers, and LangChain-based applications. He is available for consulting and implementation.

Can Refat Bhuyan fix or finish a half-built web application?

Yes. Project rescue — taking over incomplete, broken, or poorly built applications — is a core service. Refat performs a rapid code audit (usually within 48 hours), identifies the issues, proposes a fix plan, and executes it. He has rescued Next.js apps, Node.js APIs, React frontends, and AI integrations. A previous developer vanishing or going silent is a common starting point.

AI & ML · 24 May 2026 · 7 min read

How I Built an AI Chatbot with LangChain and React (What Actually Took Time)

A real account of building a production AI chatbot — the parts tutorials skip: streaming responses, memory management, and keeping costs under control.

LangChain · React · AI · Node.js · OpenAI

Last year I was tasked with building an AI-powered customer support chatbot for a financial services client. Not a demo, not a weekend project — something that had to handle real users, stay within a monthly API budget, and not hallucinate account details.

I'd used OpenAI's API before for small scripts, but this was different. The client wanted conversation memory, document-grounded answers (their internal FAQs and policies), and a React frontend that streamed tokens like ChatGPT does — because "the typing effect makes it feel alive," as they put it.

I reached for LangChain. Here's what I actually learned.

The Part Where LangChain's Docs Will Confuse You

LangChain has gone through three major API changes in two years. If you're searching Stack Overflow or Medium for help, there's a 60% chance any code you find is for a version that no longer works that way. I lost two days to this.

The thing that finally helped me: go straight to the official LangChain.js docs, pick a version, and stick with it. Don't mix examples from different eras.

For this project I used LangChain v0.2 with the @langchain/openai package (not the old bundled one).

Setting Up the Backend

The core of it is a Route Handler in Next.js. Here's a stripped-down version of what I built:

// app/api/chat/route.js
import { ChatOpenAI } from "@langchain/openai";
import { ConversationChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";
import { StreamingTextResponse, LangChainStream } from "ai";

export async function POST(request) {
  const { message, sessionId } = await request.json();

  const { stream, handlers } = LangChainStream();

  const model = new ChatOpenAI({
    modelName: "gpt-4o-mini",   // cheaper than gpt-4, good enough for support
    streaming: true,
    callbacks: [handlers],
  });

  // getSessionMemory is a helper (not shown) that returns this session's
  // memory object; store these in Redis per session
  const memory = getSessionMemory(sessionId);

  const chain = new ConversationChain({ llm: model, memory });

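  // Deliberately not awaited: tokens reach the client through the stream
  // handlers while the chain is still running
  chain.call({ input: message }).catch(console.error);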

  return new StreamingTextResponse(stream);
}

Two things I want to call out here:

1. Model choice matters more than you think. GPT-4 was overkill for simple FAQ-style questions. I switched to gpt-4o-mini mid-project and the costs dropped by 80% with almost no quality difference for this use case.

2. Memory is the tricky part. BufferMemory works fine for a demo but it keeps growing — eventually you'll send 10,000 tokens of history with every message. In production I switched to BufferWindowMemory with a window of 8 turns, plus a summary for older context. Your users won't notice the difference.
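
To make the windowing idea concrete, here is a minimal sketch in plain JavaScript. It is not LangChain's BufferWindowMemory, just the same principle written out by hand. The function name windowHistory and the { role, content } message shape are assumptions for illustration. It keeps the last 8 turns and hands back the older messages separately, so they can be summarised rather than resent.

```javascript
// Hypothetical helper, not part of LangChain.
// A "turn" is assumed to be one user message plus the assistant reply after it.
function windowHistory(messages, maxTurns = 8) {
  const maxMessages = maxTurns * 2;

  // Short conversations pass through untouched.
  if (messages.length <= maxMessages) {
    return { recent: messages.slice(), overflow: [] };
  }

  const cut = messages.length - maxMessages;
  return {
    recent: messages.slice(cut),      // sent to the model verbatim
    overflow: messages.slice(0, cut), // older context, candidates for a summary
  };
}
```

With this shape, only `recent` plus a short summary of `overflow` goes into each request, so the prompt size stops growing with the length of the conversation.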

The React Side — Streaming Without Losing Your Mind

Streaming is where most tutorials gloss over the hard parts. You get tokens one by one, and you need to append them to the UI without causing re-renders every 50ms.

I used the AI SDK from Vercel alongside LangChain:

import { useChat } from "ai/react";

export default function ChatWindow() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: "/api/chat",
  });

  return (
    <div className="chat-container">
      <div className="messages">
        {messages.map((m) => (
          <div key={m.id} className={m.role === "user" ? "user-msg" : "bot-msg"}>
            {m.content}
          </div>
        ))}
        {isLoading && <div className="typing-indicator">...</div>}
      </div>
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask something..." />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

The useChat hook handles the streaming for you. It maintains the message list, knows when a stream is active, and appends tokens as they arrive. Using it saved me probably a day of building this myself.
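
For a sense of what the hook spares you, here is a rough sketch of reading a streamed response by hand with the standard fetch Response and TextDecoder APIs. This is not how useChat is implemented internally. The name readStream and the onUpdate callback are assumptions for illustration, and a real version would also need error handling and cancellation.

```javascript
// Hypothetical helper: read a streamed text response chunk by chunk.
// `onUpdate` receives the full text so far each time a chunk arrives.
async function readStream(response, onUpdate) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let text = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    text += decoder.decode(value, { stream: true });
    onUpdate(text);
  }

  text += decoder.decode(); // flush any bytes still buffered in the decoder
  return text;
}
```

In a component you would call something like `readStream(await fetch("/api/chat", options), setDraft)`, where `setDraft` is a state setter of your own, and then still have to manage the message list and loading state yourself.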

Document Grounding (RAG) — When You Need It

The chatbot also needed to answer based on specific company documents, not just general knowledge. This is called Retrieval-Augmented Generation (RAG). The idea: you convert your documents into vector embeddings, store them, and at query time you retrieve the relevant chunks and inject them into the prompt.

I used Pinecone for the vector store:

import { OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone } from "@pinecone-database/pinecone";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.Index("support-docs");

const vectorStore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex: index }
);

// At query time:
const relevantDocs = await vectorStore.similaritySearch(userMessage, 3);

Then you inject those 3 chunks into the system prompt before the user's message. The model answers from those, not from its training data.
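
The injection step can be sketched as below. The function name buildGroundedPrompt and the exact instruction wording are assumptions for illustration, not the prompt used on the client project. The only thing it relies on is that each retrieved document exposes its text as pageContent, which is how LangChain Document objects are shaped.

```javascript
// Hypothetical helper: turn retrieved chunks into a grounded system prompt.
// Assumes each doc has a `pageContent` string (as LangChain Documents do).
function buildGroundedPrompt(docs) {
  // Number the chunks so the model (and your logs) can refer to them.
  const context = docs
    .map((doc, i) => `[${i + 1}] ${doc.pageContent.trim()}`)
    .join("\n\n");

  return [
    "You are a customer support assistant.",
    "Answer only from the context below. If the answer is not there, say you don't know.",
    "",
    "Context:",
    context,
  ].join("\n");
}
```

The result of `buildGroundedPrompt(relevantDocs)` goes in as the system message, followed by the user's question.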

What I'd Do Differently

Start with the AI SDK first, and add LangChain only when you need chains, agents, or RAG. For basic chat, the AI SDK alone is much simpler.
Log every conversation from day one. You'll want to see where the model fails.
Set hard token limits per session. One user managed to rack up $4 in API costs in a single conversation during testing because I forgot to cap it.
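
A per-session cap along these lines would have caught that runaway conversation. This is a minimal sketch under stated assumptions: the 50,000-token limit is a placeholder, the names are hypothetical, the in-memory Map stands in for a shared store, and token counts are estimated with the rough rule of thumb of about 4 characters per token for English text rather than a real tokenizer.

```javascript
// Assumed limit for illustration; tune it to your own budget.
const MAX_TOKENS_PER_SESSION = 50000;

// In-memory for the sketch only; use a shared store across server instances.
const usageBySession = new Map();

// Rough estimate: about 4 characters per token for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Records the usage and returns true if the session stays within the limit.
// Returns false, recording nothing, if this text would push it over.
function tryConsume(sessionId, text, limit = MAX_TOKENS_PER_SESSION) {
  const used = usageBySession.get(sessionId) ?? 0;
  const cost = estimateTokens(text);
  if (used + cost > limit) return false;
  usageBySession.set(sessionId, used + cost);
  return true;
}
```

Call `tryConsume(sessionId, message)` at the top of the route handler and return an error response when it comes back false, before any request reaches the model.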

If you're building something similar and want to talk through the architecture, reach out. I've done this for several clients across fintech and e-commerce and there are patterns worth knowing before you start.

Md Refat Bhuyan

Full-Stack Developer & AI Engineer · Cunard Consulting Ltd, UK
