2024-05-29

Training is not the same as chatting: LLMs don’t remember everything you say

Misconceptions: Training vs. Chatting

Many users wrongly assume the model “learns from” each chat in real time and will do better next time because of their input.
Several commenters stress the distinction between a fixed, pre-trained model and future training runs that may use aggregated logs.
Some criticize the article’s framing as semantic or misleading: “doesn’t remember” vs. “is stored and may later influence future models.”
Others defend the clarification as crucial, because users waste time thinking they’re “training” the model via usage.
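
The distinction commenters draw can be sketched in a few lines of Python. This is a toy illustration, not any vendor's API: `FrozenModel`, its `weights` field, and `reply` are hypothetical names. It assumes the common setup in which the client resends the transcript on every turn, so apparent in-conversation memory comes from the input, not from any change to the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenModel:
    """Toy stand-in for a deployed LLM: weights are fixed after training."""
    weights: tuple  # hypothetical placeholder for trained parameters

    def reply(self, messages: list[str]) -> str:
        # The output depends only on the fixed weights and the messages
        # passed in on this call. Nothing is written back to the model.
        seen = [m for m in messages if m.startswith("My name is ")]
        if seen:
            name = seen[-1].removeprefix("My name is ").rstrip(".")
            return "Hello, " + name
        return "Hello, stranger"


model = FrozenModel(weights=(0.1, 0.2, 0.3))

# Within one conversation the client resends the full history each turn,
# which is why the model appears to remember earlier messages.
history = ["My name is Ada."]
history.append("What is my name?")
first_chat = model.reply(history)

# A new conversation starts with an empty history: nothing carried over.
second_chat = model.reply(["What is my name?"])

print(first_chat)   # Hello, Ada
print(second_chat)  # Hello, stranger
```

Whether the stored transcript is later used in a separate training run is a policy question outside this sketch, which is the point several commenters make.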

Data Retention, Privacy, and Trust

Commenters highlight an “AI trust crisis”: vendors claim not to train on user data, but many people don’t believe them.
Economic incentives (e.g., paid data deals with platforms) drive suspicion that free user chats will also be exploited.
Opt-out mechanisms exist but are not auditable, so people assume the worst case.
Even if current models don’t live-train, logs can be leaked, misused, or later repurposed, so sensitive data remains risky.

Quality and Usefulness of Chat Logs as Training Data

Some argue chat logs are mostly low-quality: confused questions, mistakes, and rants, making them poor pretraining material.
Others note they can still be valuable for feedback/RLHF, especially where users correct bad outputs or rate answers.
Some raise the concern that including proprietary or personal information in training could cause damaging leakage in future responses.
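
As a rough illustration of why rated chats can matter even when the raw text is poor, the sketch below turns thumbs-up and thumbs-down ratings into (prompt, chosen, rejected) preference pairs, the general shape of data used in preference-based fine-tuning. The log schema (`prompt`, `answer`, `rating`) and the function `preference_pairs` are assumptions made up for this example, not any provider's actual format.

```python
from collections import defaultdict


def preference_pairs(logs: list[dict]) -> list[tuple[str, str, str]]:
    """Pair each liked answer with each disliked answer to the same prompt."""
    by_prompt = defaultdict(lambda: {"up": [], "down": []})
    for row in logs:
        # Unrated rows carry no feedback signal and are skipped.
        if row["rating"] in ("up", "down"):
            by_prompt[row["prompt"]][row["rating"]].append(row["answer"])

    pairs = []
    for prompt, votes in by_prompt.items():
        for chosen in votes["up"]:
            for rejected in votes["down"]:
                pairs.append((prompt, chosen, rejected))
    return pairs


# Hypothetical log rows: two rated answers and one unrated rant.
logs = [
    {"prompt": "Capital of France?", "answer": "Paris", "rating": "up"},
    {"prompt": "Capital of France?", "answer": "Lyon", "rating": "down"},
    {"prompt": "asdf why broken", "answer": "Could you share the error?", "rating": None},
]
pairs = preference_pairs(logs)
print(pairs)  # [('Capital of France?', 'Paris', 'Lyon')]
```

Only the rated rows survive, which matches the commenters' point that the feedback signal, not the bulk text, is the useful part.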

Memory, Personalization, and RAG

The new “memory” feature is discussed as a shallow system-prompt injection of short facts, not true weight changes.
Several find it annoying or poorly filtered; it often stores trivial or context-specific details.
Commenters describe more advanced patterns: RAG over conversation history, summarization, “cognitive compression,” and vector stores to simulate long-term memory.
Commenters emphasize the distinction between model-level memory and service-layer memory and tooling.
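
Both service-layer patterns described above can be sketched without touching a model at all. The code below is a minimal illustration under stated assumptions: `build_system_prompt` mimics the prompt-injection style of memory, and `retrieve` mimics retrieval over conversation history using a toy word-count "embedding" in place of a real embedding model and vector store. All function names and sample data are hypothetical.

```python
import math
from collections import Counter


def build_system_prompt(base: str, memories: list[str]) -> str:
    """Prepend stored facts to the prompt; model weights are untouched."""
    if not memories:
        return base
    facts = "\n".join(f"- {m}" for m in memories)
    return f"{base}\n\nKnown facts about the user:\n{facts}"


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, history: list[str], k: int = 1) -> list[str]:
    """Return the k past messages most similar to the query."""
    q = embed(query)
    ranked = sorted(history, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]


# Pattern 1: short stored facts injected into the system prompt.
memories = ["Prefers Python", "Lives in Lisbon"]
prompt = build_system_prompt("You are a helpful assistant.", memories)

# Pattern 2: retrieve relevant past messages and add them to the context.
history = [
    "we discussed deploying the flask app to a small server",
    "my cat is named biscuit",
    "the quarterly report is due on friday",
]
top = retrieve("what is my cat called", history)
print(top)  # ['my cat is named biscuit']
```

In both patterns the "memory" lives in storage managed by the service and reaches the model only as extra input text.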

Continuous Learning and Dynamic Evaluation

Some see the lack of continual learning as the most disappointing limitation; others point out training is expensive, slow, and risky to update frequently.
Techniques like dynamic evaluation, test-time adaptation, LoRAs, and prompt/soft-prompt tuning are mentioned as ways to update behavior on the fly, but they’re hard to deploy at scale.
There’s interest in future “live-trained” or highly personalized models, especially on local devices.
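
Of the techniques mentioned, LoRA is the easiest to show in miniature. The sketch below, in plain Python with made-up numbers, keeps a base weight matrix `W` frozen and adds a low-rank update `B (A x)` on top of it. The matrices, the values in `B_tuned`, and the helper names are illustrative assumptions; real adapters are trained with a deep-learning framework on far larger matrices.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x) without modifying W."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


# Frozen base weights (3x3), shared by every user.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# Rank-1 adapter: A is 1x3 and B is 3x1, so 6 values instead of 9.
A = [[1.0, 1.0, 1.0]]
B_init = [[0.0], [0.0], [0.0]]    # zero init: adapter starts as a no-op
B_tuned = [[0.5], [0.0], [-0.5]]  # hypothetical values after fine-tuning

x = [1.0, 2.0, 3.0]
before = lora_forward(W, A, B_init, x)
after = lora_forward(W, A, B_tuned, x)

print(before)  # [1.0, 2.0, 3.0]
print(after)   # [4.0, 2.0, 0.0]
```

Because only `A` and `B` change, a service could in principle keep one base model and swap small per-user adapters, though as commenters note, doing that at scale is the hard part.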

User Understanding, UX, and Regulation

Commenters report a large gap between expert mental models and everyday users’ expectations, reinforced by chat-style interfaces and anthropomorphic phrasing (“I’ll remember that”).
Some propose education or even “AI licenses” for professional use; others resist adding barriers, comparing AI risks to existing internet and social-media harms.
Overall, many see clearer communication about what is and isn’t remembered as essential product UX, not just a technical detail.
