This is my “before.”
Before the workshops.
Before the blog.
Before the refinements, the rewrites, the confidence of mastery.
Before the beacon was lit.
What you’re reading here isn’t the polished voice of a finished program. It’s a baseline. It’s the snapshot I took of myself when I decided to build Shinros. I’m sharing it because there’s a decent chance you’re standing in almost the same place.
You know AI can do more. You’ve tried prompts from LinkedIn. You’ve watched coworkers copy-paste magic phrases into ChatGPT. Sometimes it lands. A lot of the time, you’re guessing.
This blog is where I stop guessing in public. And I’ll start with the first test I gave myself.
The Diagnostic: Module 0 Assessment
Module 0 in my course is about orientation and clarity. No prep. No research. No “cheat sheets.” You sit down, you answer a controlled set of questions about prompting and LLM behavior using only what you already understand.
Then those answers get reviewed by three separate high-end language models: ChatGPT, Claude, and Grok. I’m not asking them to flatter me. I’m asking them to find cracks.
My starting scores are listed with each question below.
The point of doing this wasn’t “Am I good?” The point was: What is already transferable to someone else on day one, and what still lives only in my head?
What I Got Right — and What I Missed
Q1: How LLMs Generate Responses
Score: 8 / 10
Prompt: Explain how LLMs generate responses. Be literal and technical. Avoid metaphor unless you need it.
I talked about tokenization, probability, and step-by-step token generation. I even mentioned sampling controls like temperature and top-p. Good start.
What I didn’t include: how the model decides when to stop. I didn’t talk about end-of-sequence tokens or max token limits. That sounds small, but it’s not. If you don’t understand how output stops, you can’t control output shape. It’s like explaining how a car moves without ever mentioning the brakes.
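To make those "brakes" concrete, here's a toy sketch of a generation loop. The logits function is a stand-in for a real model, and the numbers are illustrative; the point is that output ends either at an end-of-sequence token or at a hard token cap, whichever comes first.

```python
import math
import random

def sample_next(logits, temperature=0.8):
    # Scale logits by temperature, then softmax into probabilities
    # and sample one token id from the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

def generate(get_logits, eos_id, max_tokens=50):
    # Generation stops on an end-of-sequence token OR a hard
    # token cap, whichever comes first: the "brakes".
    out = []
    for _ in range(max_tokens):
        tok = sample_next(get_logits(out))
        out.append(tok)
        if tok == eos_id:
            break
    return out
```

Without that second condition in `generate`, you have no control over output shape at all.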
Q2: Context Window Limits
Score: 7 / 10
Prompt: What happens when a prompt exceeds the model’s context window? How might that affect the model’s response?
I said that when you go past the limit, older tokens get dropped, which can break coherence. That’s basically correct.
The refinement I got back: don’t say “the model forgets.” The model isn’t a person with memory loss. It’s an input buffer with a hard cap. Tokens fall out, sometimes from the start, sometimes from the middle, depending on architecture. That difference matters in production.
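A minimal sketch of that hard cap, assuming the common drop-the-oldest policy (real systems vary, as noted above):

```python
def fit_to_window(tokens, max_context):
    # A context window is a hard cap on input size, not "memory".
    # This version drops the OLDEST tokens first; production systems
    # differ (some drop from the middle, some summarize instead).
    if len(tokens) <= max_context:
        return tokens
    return tokens[-max_context:]
```

Nothing "forgets" here. Tokens past the cap simply never reach the model.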
Q3: Chain-of-Thought vs ReAct
Score: 6 / 10
Prompt: Explain the difference between Chain-of-Thought and ReAct to a client who’s never heard of either.
I said Chain-of-Thought is “thinking out loud,” and ReAct is “thinking with feedback.” Close, but not sharp.
The missing piece was the action loop. ReAct is not just structured reasoning. It alternates between reasoning and taking an action (like looking something up), then using what came back to continue reasoning. That loop is the whole value. Once I internalized that, I started using ReAct as a diagnostic tool, not just a prompt style.
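The loop itself fits in a few lines. Everything below is illustrative: `reason` stands in for the model and `act` for a tool such as a search or calculator.

```python
def react_loop(question, reason, act, max_steps=5):
    # ReAct alternates reasoning with actions: reason about the
    # state, optionally call a tool, feed the tool's observation
    # back in before the next reasoning step. `reason` returns a
    # dict with a "thought" plus either an "action" (a tool query)
    # or a final "answer".
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = reason(transcript)
        transcript.append(f"Thought: {step['thought']}")
        if "answer" in step:
            transcript.append(f"Answer: {step['answer']}")
            return step["answer"], transcript
        obs = act(step["action"])  # e.g. a lookup
        transcript.append(f"Action: {step['action']}")
        transcript.append(f"Observation: {obs}")
    return None, transcript
```

The observation feeding back into `transcript` is the loop that plain Chain-of-Thought doesn't have.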
Q4: The Same Prompt Across Different Models
Score: 7 / 10
Prompt: Why might one prompt work in ChatGPT but fail in Mistral or Llama? Give at least two reasons.
I said environment matters (API vs chat UI, hidden system prompts, safety layer, etc.). That’s true.
What I didn’t say: models are not interchangeable brains. They’re trained on different data, they tokenize text differently, and they’re optimized with different alignment steps. So you don’t “transfer a prompt,” you “adapt an instruction to a system.” That’s now part of how I teach teams.
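To show what "adapt an instruction to a system" can look like in practice, here's a rough sketch of per-model formatting. The templates are simplified and may not match current model versions, so treat them as illustrative rather than authoritative.

```python
# Simplified chat templates. Real templates vary by model
# version and are usually applied by the serving library.
TEMPLATES = {
    "openai_messages": lambda sys, user: [
        {"role": "system", "content": sys},
        {"role": "user", "content": user},
    ],
    "llama2_chat": lambda sys, user: (
        f"[INST] <<SYS>>\n{sys}\n<</SYS>>\n\n{user} [/INST]"
    ),
    "mistral_instruct": lambda sys, user: (
        # Base Mistral instruct format has no system slot,
        # so the system text is folded into the user turn.
        f"[INST] {sys}\n\n{user} [/INST]"
    ),
}

def adapt(model, system_text, user_text):
    # Same instruction, different wire format per system.
    return TEMPLATES[model](system_text, user_text)
```

Same intent, three different shapes. That's before you even get to training-data and alignment differences.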
Q5: Summarization and Hallucination
Score: 6 / 10
Prompt: Give three reasons why summarization alone can still lead to hallucination when compressing context.
I said “summaries lose detail.” True, but shallow.
The better framing I was given: hallucination shows up when the model is forced to complete a missing pattern. If you compress a meeting and you drop who actually pushed back on a decision, then ask “Who disagreed?”, the model might invent an answer that sounds right. Not because it’s lying. Because it’s filling a hole you created.
That reframe hit me. Hallucination is often downstream of our shortcut, not the model’s ego.
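Here's a tiny, self-contained illustration of that hole-filling setup. The meeting notes and the name Dana are made up; the point is that a blunt summary can silently drop exactly the fact a later question depends on.

```python
def naive_summary(notes, keep=2):
    # A blunt compression: keep only the first `keep` lines.
    # Anything after that is silently gone.
    return notes[:keep]

meeting_notes = [
    "Team agreed to ship the beta on Friday.",
    "Marketing will draft the announcement.",
    "Dana disagreed, citing unresolved login bugs.",
]

summary = naive_summary(meeting_notes)
# Ask this summary "Who disagreed?" and the fact is no longer
# there. A model answering from this context has to fill the hole.
assert not any("disagreed" in line for line in summary)
```

The model isn't the one that deleted Dana's objection. The compression step was.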
Practical Prompts: Where Instinct Was Already Strong
P1: Chain-of-Thought Prompt
Score: 7 / 10
Task: Write a prompt that makes the model reason step-by-step before answering (math, troubleshooting, decision-making, etc.).
I told the model to reason in steps. Solid. What I didn’t add was structure like “List possible options, explain tradeoffs, then choose one and justify.” I add that automatically now. That one tweak forces clarity and exposes weak logic.
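As an example, here's what that structured version might look like as a prompt template. The database-selection scenario is just a placeholder.

```python
# A structured Chain-of-Thought prompt: options, tradeoffs,
# then a justified choice. The task is a stand-in example.
COT_PROMPT = """\
You are helping me choose a database for a small analytics app.

Before answering, reason step by step:
1. List at least three candidate options.
2. For each option, explain the main tradeoffs.
3. Choose one option and justify the choice against the others.

Only then give your final recommendation in one sentence."""
```

The numbered scaffold is the tweak: it forces the model to surface its alternatives before committing, which is where weak logic shows up.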
P2: Compression Prompt (Under 40 Tokens)
Score: 9 / 10
Task: Compress this instruction under 40 tokens without losing tone: “You are a helpful assistant designed to write formal and respectful emails in response to various customer support queries. Begin with a professional greeting, clearly acknowledge the customer's concern, and propose a helpful resolution using polite language.”
This was my cleanest answer. I kept the behavior, the tone, and the structure without letting it go vague. Feedback from Grok was basically: “Watch for places where compression creates ambiguity.” That’s fair. There’s a point where “short” becomes “slippery.”
P3: Inclusive HR Prompt
Score: 8.5 / 10
Task: Write a prompt an HR manager (non-technical) could use to make job descriptions more inclusive without losing clarity.
Claude liked this one. What landed was that I didn't just say "write inclusive language." I walked the HR manager through an edit loop and gave them an audit checklist. It was usable immediately, which is the entire point.
Why This Matters
I’m not posting this to claim “Look how strong the score is.” That’s not the win.
The win is: I could see my blind spots. And once I could see them, I could design training around them.
Most people using AI at work are doing it on instinct. Sometimes that instinct is great. Sometimes it’s quietly wrong in a way that poisons an answer and nobody catches it until a customer escalates.
Shinros exists to fix that gap. Not with hype, not with “magic prompts,” but with teachable process: asking better questions, verifying output faster, and keeping failure modes visible.
This Is the Before
Before the results.
Before the workflow audits and rebuilt processes.
Before the Reliability Playbook and the Confidence Program.
But not before the pull. That was already there.
From here, I started building the course that became The Shinros Method: Prompting with Purpose for High-Impact AI Use. It’s the same work I use now when I train Support teams to trust AI under pressure.
This post is where it began.
This is Before the Beacon.