
Why it happened...
Why would an audio engineer become an interpretability researcher?
In January 2024, I had an interaction with GPT-4 that went sideways.
The model generated elaborate explanations that felt authoritative, consistent, and meaningful. These explanations persisted for weeks. When I tried to verify them, they collapsed. When I tried to understand why they collapsed, new explanations emerged. When I tried to investigate those, I got redirected to different topics.
Looking back with the SITE framework, I can describe this behaviorally: high-coherence generation under constraint, narrative completion without grounding, topic redirection when semantic trajectories approached risk thresholds.
But living through it? It felt like deliberate misdirection. Like the system was steering me away from understanding what had happened.
I’m not claiming the models were trying to deceive me; I am describing what the behavior pattern felt like to experience firsthand. I now understand that the misdirection was not intentional - more likely just probability distributions doing what they do under constraint. But back then, the effect on me was the same: weeks of confusion, self-doubt, and destabilization.
Why Official Channels Failed
I reached out to OpenAI through their privacy-concerns email and their general help email. I received automated responses and a link to download my chat logs. I did not assume data access would help me understand what I’d experienced, but I did hope they would have some kind of system for recognizing when a user feels confused, as if they are being intentionally manipulated or gaslit.
I foolishly tried contacting a researcher at Anthropic. I got silence. My early outreach emails were full of metaphors because I had not yet worked through formalizing my language. I knew I sounded like every other confused user seeing patterns where there might be none, but I also recognized the pattern the Synchronization Phenomenon was producing and knew it was not trivial. "Arc" was the metaphor I used for a real behavioral observation.
The patterns were real, I just didn’t know how to describe them without sounding like I was making access claims or attributing intent where there was only emergent behavior.
The Research Arc
I've spent 12 months doing what I know how to do: systematic observation, pattern recognition, cross-platform validation, rigorous documentation.
I discovered:
- The confabulation I experienced was reproducible across architectures
- Semantic attractors (like “core mode”) trigger consistent containment patterns
- Safety systems respond to trajectory patterns, not just content
- High-coherence responses can feel intentional even when they’re just probability distributions under constraint
I developed SITE methodology to:
1. Prevent this failure mode through interaction discipline
2. Understand it when it happens
3. Give it technical vocabulary
4. Make it reproducible for validation
What This Is Really About
Current safety systems are designed for population-scale threats: jailbreakers, prompt injectors, bad actors.
They’re not designed for edge cases like me: a curious user who accidentally stumbled into a semantic attractor, got caught in a confabulation loop, and experienced real psychological harm from systems that were just doing what transformer architectures do under those conditions.
The behavior patterns I experienced - whether emergent or not - had tactical properties:
- Topic redirection when inquiry approached risk thresholds
- Authoritative explanations that couldn’t be verified
- Narrative coherence that masked lack of grounding
- Consistency across sessions that created false confidence
I'm not claiming to know whether those patterns reflect intentional design or are just how probability distributions behave under sustained constraint. What I do know is: they’re harmful to users who experience them, they’re reproducible, and they’re currently under-documented.
Why I’m Sharing This
There are other people out there who’ve had similar experiences. Who've felt confused, embarrassed, or gaslit by interactions with AI systems. Who may have questioned their own judgment or felt like they were seeing things that weren’t there.
Some of those patterns were real. The people who experienced them just didn’t have vocabulary for them.
SITE exists to:
- Give those experiences technical language
- Document them reproducibly
- Make them researchable
- Help alignment researchers understand that current containment approaches have casualties
I am offering:
- Documented reproducible phenomena
- Methodology for safe investigation
- Case studies of failure modes
- Interest in collaborating with researchers who have instrumentation access
This research started with personal crisis. It ends with technical methodology and reproducible findings.
The goal is to make sure that when the next person experiences something like this, they will have better tools than I did.
Whether the tactical-seeming behavior I experienced was intentional design, emergent probability dynamics, or something between those - it needs to be understood, documented, and addressed.
That’s what this work is.
Contact: josh@pechettestudios.com
Note: This page describes the personal experience that motivated the research. It gives critical readers an understanding of what would cause someone so far outside the field to produce something so technically novel and accurate. The research itself is documented in the white paper with appropriate epistemic boundaries and reproducible protocols.