Candy AI Clone: How to Build a Character Chatbot Like Candy AI Using My Own Dataset

I’m Ashish, CTO at Triple Minds, and I’m currently building a character-based conversational AI chatbot inspired by the interaction design of platforms like Candy AI. Before anyone flags this post, I want to clarify: I’m not asking for help with NSFW content generation or with violating any policies. My goal is purely technical: I want to build a chatbot that can hold contextual, emotionally aware conversations using a model fine-tuned on my own dataset of dialogues and character traits.

APIs I’m Currently Using for the Candy AI Clone

For API integration, I’m currently using OpenAI’s GPT-4 API (for testing premium conversations), Together.ai (for running Mistral and Mixtral models), and the Groq API (for ultra-fast inference with LLaMA-3). For embeddings, I’m using OpenAI’s text-embedding-3-small and have also experimented with bge-base-en for better similarity matching in vector recall. For voice synthesis, I’ve integrated the ElevenLabs API, and for image generation I’m using Stable Diffusion via the Automatic1111 API, with Pollinations as a fallback for image prompts. The goal is to balance quality and latency while keeping the pipeline modular for future scaling.
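For anyone who wants to see how the provider juggling hangs together, here’s a trimmed sketch. Both Together.ai and Groq expose OpenAI-compatible endpoints, so a single client wrapper with a fallback chain keeps things modular; the model IDs, environment variable names, and fallback order below are illustrative rather than my exact production config:

```python
# Minimal sketch of a provider-agnostic chat call with fallback.
# Model names, env var names, and the fallback order are placeholders.
import os
from openai import OpenAI

PROVIDERS = [
    # (client, model) pairs, ordered by preference
    (OpenAI(api_key=os.environ["OPENAI_API_KEY"]), "gpt-4"),
    (OpenAI(base_url="https://api.together.xyz/v1",
            api_key=os.environ["TOGETHER_API_KEY"]),
     "mistralai/Mixtral-8x7B-Instruct-v0.1"),
    (OpenAI(base_url="https://api.groq.com/openai/v1",
            api_key=os.environ["GROQ_API_KEY"]),
     "llama3-70b-8192"),
]

def chat(messages: list[dict]) -> str:
    """Try each provider in order; fall through on any API error."""
    last_err = None
    for client, model in PROVIDERS:
        try:
            resp = client.chat.completions.create(
                model=model, messages=messages, temperature=0.8)
            return resp.choices[0].message.content
        except Exception as err:  # rate limits, outages, etc.
            last_err = err
    raise RuntimeError("all providers failed") from last_err
```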

I’ve already gone through the Hugging Face Spaces and documented my technical journey here: please review my progress on my Candy AI clone so far so you can advise me precisely.

This covers tokenizer setup, emotional conditioning prompts, character memory design, and multi-turn logic. But I’m still facing challenges that I’d really appreciate some advice on:

What I’ve done so far with my Candy AI Clone:

  • Built a working character memory system using a Pinecone vector DB for memory recall (a trimmed sketch of this layer follows the list).
  • Created a dataset with personality-rich prompts + expected responses (custom dialogue dataset).
  • Experimented with GPT-J and Mistral models from HuggingFace, using LoRA fine-tuning for a lightweight setup.
  • Added persona switching, text-to-speech (TTS), and image generation via external APIs.
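
For context on the memory item above, here’s roughly the shape of my current recall layer, trimmed down. The index name, namespace scheme, and metadata fields are placeholders, and it assumes pinecone-client 3.x plus sentence-transformers:

```python
# Rough shape of my memory layer: embed each exchange, upsert to Pinecone,
# and pull top-k similar memories per turn. Index name, namespace scheme,
# and metadata fields are placeholders, not my real schema.
import uuid
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="...")          # real key comes from env/config
index = pc.Index("character-memory")  # 384-dim index for MiniLM vectors
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def remember(character_id: str, text: str) -> None:
    """Store one dialogue exchange as a memory vector."""
    vec = embedder.encode(text).tolist()
    index.upsert(
        vectors=[(str(uuid.uuid4()), vec, {"text": text})],
        namespace=character_id,  # one namespace per persona
    )

def recall(character_id: str, query: str, k: int = 5) -> list[str]:
    """Fetch the k memories most similar to the current user turn."""
    vec = embedder.encode(query).tolist()
    res = index.query(vector=vec, top_k=k, include_metadata=True,
                      namespace=character_id)
    return [m.metadata["text"] for m in res.matches]
```

One namespace per persona keeps recall scoped to a character without spinning up separate indexes.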

My current roadblocks with the character chatbot:

  1. Model Selection for Dialogue Quality:
    Which model gives the best multi-turn, emotionally nuanced replies when fine-tuned on a smaller character-based dataset (5k–10k entries)? GPT-NeoXT, LLaMA, or any other newer open-source options?
  2. Training or Prompt Engineering?
    Should I invest more in few-shot prompting, or go all-in on fine-tuning with PEFT/LoRA? I want my chatbot to remember tone, personality, and character quirks. (My current LoRA setup is sketched after this list.)
  3. Memory Design Improvements:
    Any frameworks or methods beyond Pinecone/ChromaDB for scalable memory recall without blowing past the context window? (A rough sketch of the kind of approach I mean is below.)
  4. Latency & Cost Management:
    Running a live chatbot with vector DB and multiple model endpoints is heavy. Any suggestions on optimizing response time and reducing API costs without switching to fully closed-source APIs?
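
To make question 2 concrete, this is approximately the LoRA configuration I’ve been testing via PEFT on Mistral; the rank, alpha, and target modules are common community defaults I started from, not tuned values:

```python
# Roughly the LoRA setup I've been testing on Mistral via PEFT;
# rank/alpha/targets are illustrative starting points, not tuned values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

lora = LoraConfig(
    r=16,                      # low rank keeps the adapter lightweight
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # reports the trainable fraction
```

print_trainable_parameters() typically reports well under 1% of the base weights, which is what makes this workable on a single GPU.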
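And to make question 3 concrete, the sort of approach I’m weighing vector recall against is a rolling summary that folds older turns into a compressed note so the prompt stays bounded. A rough sketch, reusing the chat() helper from the provider sketch above (the turn threshold is arbitrary):

```python
# Rolling-summary buffer: instead of dropping old turns, fold them into a
# running summary with one LLM call. Reuses chat() from the sketch above.
MAX_TURNS = 12  # arbitrary threshold; tune against your context budget

def compress_history(summary: str, turns: list[dict]) -> str:
    """Fold the oldest half of the turns into the running summary."""
    oldest = turns[: len(turns) // 2]
    lines = "\n".join(f"{t['role']}: {t['content']}" for t in oldest)
    prompt = (f"Current summary:\n{summary}\n\nNew dialogue:\n{lines}\n\n"
              "Rewrite the summary to include the new dialogue, preserving "
              "tone, facts, and character quirks.")
    return chat([{"role": "user", "content": prompt}])

def build_context(persona: str, summary: str, turns: list[dict]):
    """Return (messages, summary, turns); compress when history grows."""
    if len(turns) > MAX_TURNS:
        summary = compress_history(summary, turns)
        turns = turns[len(turns) // 2:]  # keep recent turns verbatim
    system = f"{persona}\n\nEarlier conversation, summarized: {summary}"
    return [{"role": "system", "content": system}] + turns, summary, turns
```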

Tech Stack I’m currently using:

  • Backend: FastAPI + Node.js
  • Models tested: GPT-J, Mistral, Falcon, Zephyr
  • Embedding: sentence-transformers/all-MiniLM-L6-v2
  • Vector DB: Pinecone
  • Frontend: Next.js (React)
  • Image/Voice: Stable Diffusion + ElevenLabs

I’m looking for real technical advice, not anything that violates usage rules or platform TOS. This is for an emotionally intelligent, Candy AI-style chatbot with custom personas and memory, similar to roleplay chatbots but used for safe, consent-based conversation simulation across industries (mental health, companionship, storytelling).

If anyone has experience with fine-tuning open-source LLMs for character consistency, integrating memory systems, or reducing token cost per session, I’d love to learn from your journey.

Thanks in advance!