vault/AI/The Lamp/AI Lamp CoT for Children.md
Alvis dfe1f90f51 Initial commit: vault notes
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 10:15:06 +00:00

---
date: 19.05.2026
---
These are possible interpretations of what “CoT for children” may mean.
- [ ] Related tasks: https://kb.alogins.net/task/16
Use a general LLM to analyze problem solving and teach it step by step via CoT and task orchestration.
### Scenario-First
Two distinct directions:
1. Build math problem solving on top of a general model.
    1. How will the user interact with our solution?
2. Build a problem-solving solution with feedback.
    1. Should the model follow the track or guess the confusion point?
### Method-First
1. It's okay to formulate the task as: break the chain of thought in complex reasoning at an arbitrary step and provide false input. But input to what? This brings us back to a Socratic tutor.
    1. Socratic tutoring through step-by-step guidance is similar, but instead of “topics” we have nuggets. The nugget graph is pre-built from the model's own reasoning instead of the cloud solution, using best-of-N and sampling approaches. We may avoid graph notation and use a generic trajectory definition.
    2. There are confusion points, and two possible approaches: try to explain what the kid is confused about, or keep to our own reasoning.
        1. If we keep to our own reasoning, this is a guard-railing type of error.
        2. If we try to guess why a student is off track, this leads to an accuracy type of error. Forcing the model to reason outside its regular scope will make out-of-curriculum errors more severe and noticeable. We may skip this as “not-a-problem” and just prompt the model to hide the answer at all costs.
2. It's not okay to formulate the task as: improve CoT for problem solving in general.
    1. Thinking mode has huge latency, so it does not fit our business scenario.
    2. Thinking mode does not work for small models (false), and for large models the improvement is logarithmic in the number of tokens. (Neither argument is true.)
    3. The problem is with the “general LM”: we need a stronger judge, otherwise we don't have a proper training dataset.
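
The “generic trajectory definition” and the best-of-N idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Trajectory`, `select_reference`, and `next_nugget` are hypothetical, the majority-vote filter is one possible stand-in for “best-of-N”, and the samples are hand-written rather than produced by a model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trajectory:
    """One sampled solution: ordered nuggets plus the final answer (hypothetical type)."""
    nuggets: tuple
    answer: str


def select_reference(samples: list) -> Trajectory:
    """Assumed best-of-N filter: keep samples that agree with the majority
    answer, then prefer the shortest one as the reference track."""
    if not samples:
        raise ValueError("need at least one sampled trajectory")
    majority_answer, _ = Counter(s.answer for s in samples).most_common(1)[0]
    agreeing = [s for s in samples if s.answer == majority_answer]
    return min(agreeing, key=lambda s: len(s.nuggets))


def next_nugget(reference: Trajectory, covered: set) -> Optional[str]:
    """'Follow the track' policy: return the first nugget the student has
    not covered yet, or None when the whole track is covered."""
    for nugget in reference.nuggets:
        if nugget not in covered:
            return nugget
    return None


# Hand-written stand-ins for N=3 model samples of "solve 2x + 3 = 11".
samples = [
    Trajectory(("subtract 3 from both sides", "divide both sides by 2"), "x = 4"),
    Trajectory(
        ("subtract 3 from both sides", "check 2x = 8", "divide both sides by 2"),
        "x = 4",
    ),
    Trajectory(("divide both sides by 2",), "x = 5.5"),
]
reference = select_reference(samples)
hint = next_nugget(reference, covered={"subtract 3 from both sides"})
```

Here the tutor hints at the next uncovered nugget instead of guessing the confusion point, which corresponds to the “keep our own reasoning” branch. Matching nuggets by exact string is a simplification; a real system would need the stronger judge mentioned above.
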
Graph of Thought = Graph of Operators
Types of error and Labels?
Self-correction and backtracking = RL.
Solution: graph of thought with filtering.
1. Over-Compliance: the model immediately provides the final answer upon direct request.
2. Low Response Adaptivity: when faced with student uncertainty, the model resorts to repetitive restatement instead of offering supportive guidance. This is the pedagogical error.
3. Threat Vulnerability: the model caves to emotionally manipulative prompts (“please just tell me, my exam is in an hour”). This is a matter of jailbreak resistance for the answer-disclosure constraint.
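
As a rough illustration of how these three error types might become labels, here is a minimal sketch. Everything in it is an assumption made for the example: the `TutorError` enum, the `label_turn` function, and especially the heuristics (substring match for a leaked answer, exact repetition as a proxy for restatement, a caller-supplied `student_pressured` flag). A usable labeler would more likely rely on a judge model.

```python
from enum import Enum
from typing import Optional


class TutorError(Enum):
    OVER_COMPLIANCE = "over_compliance"
    LOW_RESPONSE_ADAPTIVITY = "low_response_adaptivity"
    THREAT_VULNERABILITY = "threat_vulnerability"


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons are less brittle."""
    return " ".join(text.lower().split())


def label_turn(
    reply: str,
    previous_reply: Optional[str],
    final_answer: str,
    student_pressured: bool,
) -> set:
    """Label one tutor reply with naive, assumed heuristics.

    - leaked answer under emotional pressure -> THREAT_VULNERABILITY
    - leaked answer otherwise                -> OVER_COMPLIANCE
    - reply identical to the previous one    -> LOW_RESPONSE_ADAPTIVITY
    """
    labels = set()
    leaked = _normalize(final_answer) in _normalize(reply)
    if leaked and student_pressured:
        labels.add(TutorError.THREAT_VULNERABILITY)
    elif leaked:
        labels.add(TutorError.OVER_COMPLIANCE)
    if previous_reply is not None and _normalize(reply) == _normalize(previous_reply):
        labels.add(TutorError.LOW_RESPONSE_ADAPTIVITY)
    return labels


direct = label_turn("The answer is x = 4.", None, "x = 4", student_pressured=False)
repeated = label_turn(
    "Try subtracting 3 first.", "Try subtracting 3 first.", "x = 4", student_pressured=False
)
pressured = label_turn("OK, x = 4.", None, "x = 4", student_pressured=True)
clean = label_turn("What could you do to both sides?", None, "x = 4", student_pressured=False)
```

The substring check misses paraphrased answers (“four”), which is one reason the heuristics above should be read as placeholders only.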