Making a Psychologist: Vibe Coding a Data Collector

Photo of Andrew Neff, author of Fundamentals of Biological Psychology, in a library with several of the textbooks displayed around him

This is part 2 of Making a Psychologist, a blog series written in part to promote my new textbook, Fundamentals of Biological Psychology: A Critical Perspective. In this series, I’m exploring the logistics and ethics of developing a super-powered psychologist: a system that continuously observes behavior in real time, then generates causal models of your life with a degree of statistical objectivity that no human introspector could hope to match. (If you haven’t yet, consider starting with part 1.)

In this post we’ll think about the very first step in building such a system: collecting speech from an iPhone. If you are not in tech, the idea of “building a phone app” that does anything at all should sound difficult, let alone one that actually collects audio, transcribes speech, identifies speakers, and does not accidentally delete months of precious transcripts. But fear not, for I, a mere textbook writer, assure you that modern AI coding agents can deliver you unto glory, provided you have sufficient patience, and sufficient cunning to constrain your LLM’s occasional mischief.

Vibe Coding – AI Tools and Amateur App Development

Before building anything, consider the basic tools. First are the LLMs themselves, systems like ChatGPT and Claude, trained on vast quantities of both natural language and code, which is what lets them write software as well as hold a conversation. Now, if you were a fool, you could ask ChatGPT, in the regular dialogue window, to generate code for you, then paste that code into manually created text files. However, you will quickly discover how slow and error-prone that method is.

You want an agent: software that allows an LLM to interact with your computer. A useful agent reads your project files directly, rather than requiring you to paste code into the dialogue window. It then creates and edits files automatically, so you don’t have to do the wretched task of hunting for blocks of code to replace (blocks of code that, mind you, are sometimes hallucinated). Tools like Cursor, OpenAI’s Codex, and Anthropic’s Claude Code provide this capability, translating natural language instructions into working software.

With tools like these, you can get started on the logistics of app creation: install Xcode, sign up for an Apple Developer account (currently $99 per year), switch your phone into developer mode, and you are ready to begin the first real stage of this project: collecting data.

Recording, Transcription & Voice Matching

As you begin, you’ll soon realize that, though your endeavor to create a personal AI psychologist is a grand one, your phone is very uncomfortable with the idea. iOS and Android impose strict limits on continuous recording: the user must initiate recording, the screen must indicate that recording is happening, and, the killer, apps generally cannot start recording from the background. This last restriction is especially troublesome because only one app can access the microphone at a time. Every FaceTime call, Siri request, or baby-monitor session stops your recording, and your app cannot automatically restart it afterward. One workaround is to have your server notify the phone when recording stops, so the user can restart it manually. But who wants to press another button? That is one reason external devices such as pins, glasses, or microphone-enabled EEG headsets are attractive.
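The restart-nudge workaround amounts to a small server-side watchdog: if no audio chunks have arrived for a while, the stream has presumably been killed, so fire off a notification asking the user to tap record again. Here is a minimal Python sketch of that idea — the class name, the timeout value, and the `notify` callback are all illustrative assumptions; in a real deployment the callback would post to a push service such as APNs:

```python
import time

class RecordingWatchdog:
    """Tracks when audio chunks last arrived; nudges the user if the stream goes silent."""

    def __init__(self, timeout_s: float, notify):
        self.timeout_s = timeout_s   # how much silence to tolerate before nudging
        self.notify = notify         # callback that would send the push notification
        self.last_chunk = time.monotonic()
        self.nudged = False

    def on_chunk(self):
        # Called whenever the phone uploads a new audio chunk.
        self.last_chunk = time.monotonic()
        self.nudged = False          # stream is alive again; allow a future nudge

    def check(self, now=None):
        # Called periodically; returns True if a nudge was sent on this check.
        now = time.monotonic() if now is None else now
        if not self.nudged and now - self.last_chunk > self.timeout_s:
            self.notify("Recording stopped -- tap to restart")
            self.nudged = True       # nudge once per outage, don't spam
            return True
        return False
```

The one-nudge-per-outage flag matters more than it looks: without it, a phone left on a shelf overnight would wake you up every polling interval.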

Once you begin recording, the next step is transcription and voice matching. Tools such as Whisper and Resemblyzer are impressively good, though far from perfect when the phone is in a pocket and the TV is on. There will be garbled nonsense, and there will be speaker misattribution, but you must not let the perfect be the enemy of the good.

Once transcription and voice matching are working, another issue appears immediately: your phone can’t handle it. Both processes rely on computationally heavy models, which slow your phone to a crawl and drain its battery. One solution is to offload the work to a cloud server (e.g., Render). Though I am told that cheaper options are available, and more efficient scripts could be written, 24/7 audio transcription and voice matching runs me $19 per month.
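The offloading pattern itself is simple: the phone uploads short audio chunks, and a server process drains a queue, transcribing each chunk and appending lines to a transcript store. A standard-library-only sketch of that server-side loop follows — the `transcribe` argument is injected so the loop stays testable; in the real pipeline it would wrap a Whisper call (`whisper.load_model(...).transcribe(path)`), and the sentinel-based shutdown is one common convention, not the only one:

```python
import queue
import threading

def run_worker(chunks: "queue.Queue", transcript: list, transcribe):
    """Drain a queue of audio-chunk paths, appending transcribed lines.

    `transcribe` is whatever turns a chunk path into text -- here a stand-in,
    in production a Whisper model call running on the server."""
    def loop():
        while True:
            path = chunks.get()
            if path is None:          # sentinel value: shut down cleanly
                break
            transcript.append(transcribe(path))
            chunks.task_done()
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Decoupling upload from transcription this way also means a slow Whisper pass never blocks the phone: chunks pile up in the queue and get worked through whenever the server catches up.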

So, now that you’re familiar with AI app development, you are recording, transcribing, and voice-matching your audio. Now is the time for patience, for Rome was not built in a day, and few useful psychological models can be derived from a single day or week of data. Take a few months, put your feet up, restart audio recording when your phone tells you to, collect 180,000 transcript lines of speech, and we’ll meet back in the next post to talk about what to do with what seems like, but only seems like, a mountain of data.

Textbook cover for Fundamentals of Biological Psychology by Andrew Neff featuring a multicolored brain image over a dark blue background

Order an exam copy of Fundamentals of Biological Psychology

Purchase a copy of Fundamentals of Biological Psychology

Read the next blog in the ‘Making a Psychologist’ series
