Making a Psychologist: Vibe Coding a Data Collector

Photo of Andrew Neff, author of Fundamentals of Biological Psychology, in a library with several of the textbooks displayed around him

This is part 2 of Making a Psychologist, a blog series written in part to promote my new textbook, Fundamentals of Biological Psychology: A Critical Perspective. In this series, I’m exploring the logistics and ethics of developing a super-powered psychologist: a system that continuously observes behavior in real time, then generates causal models of your life with a degree of statistical objectivity that no human introspector could hope to match. (If you haven’t yet, consider starting with part 1.)

In this post we’ll think about the very first step in building such a system: collecting speech from an iPhone. If you are not in tech, the idea of “building a phone app” that does anything at all should sound difficult, let alone one that actually collects audio, transcribes speech, identifies speakers, and does not accidentally delete months of precious transcripts. But fear not, for I, a mere textbook writer, assure you that modern AI coding agents can deliver you unto glory, provided you have sufficient patience, and sufficient cunning to constrain your LLM’s occasional mischief.

Vibe Coding – AI Tools and Amateur App Development

Before building anything, consider the basic tools. First are the LLMs themselves, systems like ChatGPT and Claude, trained on vast quantities of both natural language and code, which is what lets them write software as well as hold a conversation. Now, if you were a fool, you could ask ChatGPT, in the regular dialogue window, to generate code for you, then paste that code into manually created text files. However, you will quickly discover how slow and error-prone that method is.

You want an agent: software that allows an LLM to interact with your computer. A useful agent reads your project files directly, rather than requiring you to paste code into the dialogue window. It then creates and edits files automatically, so you don’t have to do the wretched task of hunting for blocks of code to replace (blocks of code that, mind you, are sometimes hallucinated). Tools like Cursor, OpenAI’s Codex, and Anthropic’s Claude Code provide this capability, translating natural language instructions into working software.

With tools like these, you can get started on the logistics of app creation: install Xcode, sign up for an Apple Developer account (currently $99 per year), switch your phone into developer mode, and you are ready to begin the first real stage of this project: collecting data.

Recording, Transcription & Voice Matching

As you begin, you’ll soon realize that, though your endeavor to create a personal AI psychologist is a grand one, your phone is very uncomfortable with the idea. iOS and Android impose strict limits on continuous recording: the user must initiate recording, the screen must indicate that recording is happening, and, the killer, apps generally cannot start recording from the background. This last restriction is especially troublesome because only one app can access the microphone at a time. Every FaceTime call, Siri request, or baby-monitor session stops your recording, and your app cannot automatically restart it afterward. One workaround is to have your server notify the phone when recording stops, so the user can restart it manually. But who wants to press another button? That is one reason external devices such as pins, glasses, or microphone-enabled EEG headsets are attractive.
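The restart-nudge workaround amounts to a small server-side watchdog: if no audio chunks have arrived for a while, the stream has presumably been killed, so fire off a notification asking the user to tap record again. Here is a minimal Python sketch of that idea — the class name, the timeout value, and the `notify` callback are all illustrative assumptions; in a real deployment the callback would post to a push service such as APNs:

```python
import time

class RecordingWatchdog:
    """Tracks when audio chunks last arrived; nudges the user if the stream goes silent."""

    def __init__(self, timeout_s: float, notify):
        self.timeout_s = timeout_s   # how much silence to tolerate before nudging
        self.notify = notify         # callback that would send the push notification
        self.last_chunk = time.monotonic()
        self.nudged = False

    def on_chunk(self):
        # Called whenever the phone uploads a new audio chunk.
        self.last_chunk = time.monotonic()
        self.nudged = False          # stream is alive again; allow a future nudge

    def check(self, now=None):
        # Called periodically; returns True if a nudge was sent on this check.
        now = time.monotonic() if now is None else now
        if not self.nudged and now - self.last_chunk > self.timeout_s:
            self.notify("Recording stopped -- tap to restart")
            self.nudged = True       # nudge once per outage, don't spam
            return True
        return False
```

The one-nudge-per-outage flag matters more than it looks: without it, a phone left on a shelf overnight would wake you up every polling interval.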

Once you begin recording, the next step is transcription and voice matching. Tools such as Whisper and Resemblyzer are impressively good, though far from perfect when the phone is in a pocket and the TV is on. There will be garbled nonsense, and there will be speaker misattribution, but you must not let the perfect be the enemy of the good.

Once transcription and voice matching are working, another issue appears immediately: your phone can’t handle it. Both processes rely on computationally heavy models, which slow your phone to a crawl and drain its battery. One solution is to offload the work to a cloud server (e.g., Render). Though I am told that cheaper options are available, and more efficient scripts could be written, 24/7 audio transcription and voice matching runs me $19 per month.
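The offloading pattern itself is simple: the phone uploads short audio chunks, and a server process drains a queue, transcribing each chunk and appending lines to a transcript store. A standard-library-only sketch of that server-side loop follows — the `transcribe` argument is injected so the loop stays testable; in the real pipeline it would wrap a Whisper call (`whisper.load_model(...).transcribe(path)`), and the sentinel-based shutdown is one common convention, not the only one:

```python
import queue
import threading

def run_worker(chunks: "queue.Queue", transcript: list, transcribe):
    """Drain a queue of audio-chunk paths, appending transcribed lines.

    `transcribe` is whatever turns a chunk path into text -- here a stand-in,
    in production a Whisper model call running on the server."""
    def loop():
        while True:
            path = chunks.get()
            if path is None:          # sentinel value: shut down cleanly
                break
            transcript.append(transcribe(path))
            chunks.task_done()
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Decoupling upload from transcription this way also means a slow Whisper pass never blocks the phone: chunks pile up in the queue and get worked through whenever the server catches up.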

So, now that you’re familiar with AI app development, you are recording, transcribing, and voice-matching your audio. Now is the time for patience, for Rome was not built in a day, and few useful psychological models can be derived from a single day or week of data. Take a few months, put your feet up, restart audio recording when your phone tells you to, collect 180,000 transcript lines of speech, and we’ll meet back in the next post to talk about what to do with what seems like, but only seems like, a mountain of data.

Textbook cover for Fundamentals of Biological Psychology by Andrew Neff featuring a multicolored brain image over a dark blue background

Order an exam copy of Fundamentals of Biological Psychology

Purchase a copy of Fundamentals of Biological Psychology

Read the next blog in the ‘Making a Psychologist’ series
