Training is not the same as chatting: LLMs don’t remember everything you say

Misconceptions: Training vs. Chatting

  • Many users wrongly assume the model “learns from” each chat in real time and will do better next time because of their input.
  • Several commenters stress the distinction between a fixed, pre-trained model vs. future training runs that may use aggregated logs.
  • Some criticize the article’s framing as semantic or misleading: “doesn’t remember” vs. “is stored and may later influence future models.”
  • Others defend the clarification as crucial, because users waste time thinking they’re “training” the model via usage.

Data Retention, Privacy, and Trust

  • Commenters highlight an “AI trust crisis”: vendors claim not to train on user data, but many people don’t believe them.
  • Economic incentives (e.g., paid data deals with platforms) drive suspicion that free user chats will also be exploited.
  • Opt-out mechanisms exist but are not auditable; people assume worst-case.
  • Even if current models don’t live-train, logs can be leaked, misused, or later repurposed, so sensitive data remains risky.

Quality and Usefulness of Chat Logs as Training Data

  • Some argue chat logs are mostly low-quality: confused questions, mistakes, and rants, making them poor pretraining material.
  • Others note they can still be valuable for feedback/RLHF, especially where users correct bad outputs or rate answers.
  • Concern that including proprietary or personal information in training could cause damaging leakage in future responses.

Memory, Personalization, and RAG

  • The new “memory” feature is discussed as a shallow system-prompt injection of short facts, not true weight changes.
  • Several find it annoying or poorly filtered; it often stores trivial or context-specific details.
  • Commenters describe more advanced patterns: RAG over conversation history, summarization, “cognitive compression,” and vector stores to simulate long-term memory.
  • Distinction emphasized between model-level memory vs. service-layer memory and tooling.

Continuous Learning and Dynamic Evaluation

  • Some see the lack of continual learning as the most disappointing limitation; others point out training is expensive, slow, and risky to update frequently.
  • Techniques like dynamic evaluation, test-time adaptation, LoRAs, and prompt/soft-prompt tuning are mentioned as ways to update behavior on the fly, but they’re hard to deploy at scale.
  • There’s interest in future “live-trained” or highly personalized models, especially on local devices.

User Understanding, UX, and Regulation

  • Commenters report a large gap between expert mental models and everyday users’ expectations, reinforced by chat-style interfaces and anthropomorphic phrasing (“I’ll remember that”).
  • Some propose education or even “AI licenses” for professional use; others resist adding barriers, comparing AI risks to existing internet and social-media harms.
  • Overall, many see clearer communication about what is and isn’t remembered as essential product UX, not just a technical detail.