Methods

How GLP-1 Chatter reads Reddit.

Generated 2026-07-05T08:19:24+00:00

GLP-1 Chatter is not a clinical trial, a pharmacovigilance system, or medical advice. It is a public reading machine for a messy online archive. The goal is not to estimate the true average weight loss on semaglutide, tirzepatide, or retatrutide, and not to tabulate side effects as if Reddit were a registry. The goal is to surface quantitative individual stories: people self-navigating powerful medical interventions, often with limited guidance, uneven access to clinicians, and very different levels of medical supervision.

That distinction matters. A number on this site is meant to stay attached to a person's account of what happened. A dot on a chart should lead back to the post that produced it. A side-effect count should be read with the original language nearby. The site tries to quantify without stripping away context, because the context is often the point: people are describing fear, experimentation, relief, confusion, dose changes, plateaus, side effects, and improvised care in public.

The source material is found by a slow Reddit crawler. It searches selected communities for drug names, brand names, and common shorthand: retatrutide, reta, and retaglutide; tirzepatide, tirz, Mounjaro, and Zepbound; semaglutide, sema, Ozempic, Wegovy, and Rybelsus. When a candidate post or comment is found, the database keeps the subreddit, date, title, body, matched terms, URL, and original full text.

Reddit Sources and Search Terms

The crawler searches every community below for the same full search-term list. The rotating backfill groups are a scheduling device, not a term filter: the semaglutide slice can still surface reports of switching or stacking that involve tirzepatide or retatrutide, and the tirzepatide and retatrutide slices can still surface semaglutide history.

Drug family	Search terms	Applied to
Retatrutide	reta, retatrutide, retaglutide	All listed communities, submissions and comments where available
Tirzepatide	tirz, tirzepatide, mounjaro, zepbound	All listed communities, submissions and comments where available
Semaglutide	sema, semaglutide, ozempic, wegovy, rybelsus	All listed communities, submissions and comments where available
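As a rough illustration of how the term list above could be applied to a post or comment, here is a minimal sketch. The term lists come from the table; the whole-word, case-insensitive matching rule and the names SEARCH_TERMS, PATTERNS, and matched_terms are assumptions for illustration, not the crawler's actual code.

```python
import re

# Search terms copied from the table above, grouped by drug family.
SEARCH_TERMS = {
    "retatrutide": ["reta", "retatrutide", "retaglutide"],
    "tirzepatide": ["tirz", "tirzepatide", "mounjaro", "zepbound"],
    "semaglutide": ["sema", "semaglutide", "ozempic", "wegovy", "rybelsus"],
}

# Assumption: whole-word, case-insensitive matching, so "reta" does not
# fire on "retail". The real crawler may match differently.
PATTERNS = {
    family: re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    for family, terms in SEARCH_TERMS.items()
}


def matched_terms(text: str) -> dict[str, list[str]]:
    """Return the lowercased terms found in text, keyed by drug family."""
    hits: dict[str, list[str]] = {}
    for family, pattern in PATTERNS.items():
        found = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if found:
            hits[family] = found
    return hits
```

A post that mentions both Ozempic and reta would be recorded under both families, which is consistent with every slice using the full term list.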

Public member and active-user counts were checked on 2026-07-03. Reddit may suppress, fuzz, or change these counts, so they are rough source context only. Archive candidates are the raw posts and comments already downloaded into this project's database at build time.

Community	Backfill slice	Public description	Members	Active users (API)	Archive candidates
r/Retatrutide	reta/general	Development and user discussion around GIP/GLP-1/glucagon agonism.	139,067	5	4,184
r/RetatrutideTrial	reta/general	Retatrutide trial discussion, sharing experiences and questions around a triple incretin agonist.	6,586	2	0
r/Peptides	reta/general	Broad peptide discussion, including compounds, effects and dosing.	164,286	4	0
r/Semaglutide	sema	FDA-approved semaglutide discussion, including Wegovy, Ozempic and Rybelsus.	196,375	4	9,865
r/SemaglutideFreeSpeech	sema	Open discussion about semaglutide and similar compounds.	22,116	2	0
r/Ozempic	sema	Questions, answers and accomplishments around Ozempic.	142,724	0	0
r/OzempicForWeightLoss	sema	Support community for Ozempic and semaglutide weight-loss use.	40,764	4	0
r/WegovyWeightLoss	sema	Unofficial Wegovy and GLP-1 weight-loss community.	130,720	1	0
r/WegovyUK	sema	UK Wegovy discussion, support and progress.	9,157	0	0
r/Mounjaro	tirz	Questions, experiences and accomplishments around Mounjaro and Zepbound prescriptions.	191,673	5	6,640
r/MounjaroMaintenance	tirz	Maintenance-phase community for GLP-1 users.	21,211	5	273
r/MounjaroUK	tirz	UK and Ireland Mounjaro, Wegovy and other GLP-1 discussion and support.	53,669	4	0
r/Zepbound	tirz	Questions, experiences, tips and weight-loss support around Zepbound.	211,776	4	0
r/Tirzepatide	tirz	Plain tirzepatide subreddit target added after source audit; public metadata was not accessible during the check.	n/a	n/a	0
r/tirzepatidecompound	tirz	Large tirzepatide community discussing Rx tirzepatide and other GLP-1s, including compound-related language.	165,587	1	0
r/compoundedtirzepatide	tirz	Focused on prescription compounded tirzepatide experiences.	43,353	3	0
r/GLP1	reta/general	Broad GLP-1 agonist community covering Wegovy, Ozempic, Mounjaro, Zepbound and related drugs.	34,410	2	0
r/GLP1_BeforeAfter	reta/general	Before and after GLP-1 progress posts; public page was visible, but about metadata was not available during the check.	n/a	n/a	0

Each post or comment is then read one at a time. The language model is not asked to summarize a batch or infer a population trend. It receives a single Reddit item and returns structured fields: the drug family, the drug name mentioned, dose narrative, duration, starting and current weight, reported loss, side effects, attribution, confidence, and a short evidence note. The system marks processed post IDs so routine changes in database metadata do not trigger unnecessary rereading.
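The one-item-at-a-time loop described above might look roughly like the sketch below. The field list follows the paragraph, but the exact field names, the ExtractedReport class, and the parse_pending helper are hypothetical; the extract argument stands in for the language-model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class ExtractedReport:
    # Hypothetical field names mirroring the structured fields listed above.
    post_id: str
    drug_family: Optional[str] = None
    drug_name: Optional[str] = None
    dose_narrative: Optional[str] = None
    duration_value: Optional[float] = None
    duration_unit: Optional[str] = None
    start_weight: Optional[float] = None
    current_weight: Optional[float] = None
    reported_loss: Optional[float] = None
    weight_unit: Optional[str] = None
    side_effects: list[str] = field(default_factory=list)
    attribution: Optional[str] = None
    confidence: Optional[float] = None
    evidence_note: Optional[str] = None


def parse_pending(
    items: Iterable[dict],
    processed_ids: set[str],
    extract: Callable[[dict], ExtractedReport],
) -> list[ExtractedReport]:
    """Read each unprocessed item on its own and mark its ID as done."""
    reports = []
    for item in items:
        if item["id"] in processed_ids:
            continue  # already read; metadata changes alone do not requeue it
        reports.append(extract(item))  # exactly one item per model call
        processed_ids.add(item["id"])
    return reports
```

Keying the skip check on the item ID rather than on the whole row is what keeps routine metadata changes from triggering a reread in this sketch.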

The extraction prompt is deliberately suspicious. Reddit shorthand can be treacherous: SW means starting weight, CW means current weight, and GW usually means goal weight, not weight lost. A milligram dose is not body weight. Age is not duration. A pregnancy high weight, a prior Ozempic run, a switch from semaglutide to retatrutide, or a whole lifetime GLP-1 journey should not be credited to the focal drug unless the post clearly says so.

The model extracts raw values and units; code does the arithmetic. Pounds and stone are converted to kilograms, durations are converted to days and weeks, and missing values are filled in only when the relationship is clear. Goal weight is never treated as current weight. Weight loss is plotted as negative weight change, so losing 10 kg appears as -10 kg. The display caps visible weight gain at +10 kg so a likely misread or exceptional outlier does not stretch the whole chart.
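The arithmetic described above can be sketched as follows. The conversion factors are standard (one pound is exactly 0.45359237 kg, one stone is 14 pounds) and the +10 kg display cap comes from the text; the function names and accepted unit spellings are assumptions, and mixed values such as "14 st 3 lb" are not handled here.

```python
LB_TO_KG = 0.45359237          # exact definition of the pound
STONE_TO_KG = 14 * LB_TO_KG    # one stone is 14 pounds
MAX_DISPLAY_GAIN_KG = 10.0     # visible weight gain is capped at +10 kg


def to_kg(value: float, unit: str) -> float:
    """Convert a raw weight value to kilograms."""
    unit = unit.strip().lower()
    if unit in ("kg", "kgs", "kilogram", "kilograms"):
        return value
    if unit in ("lb", "lbs", "pound", "pounds"):
        return value * LB_TO_KG
    if unit in ("st", "stone"):
        return value * STONE_TO_KG
    raise ValueError(f"unknown weight unit: {unit!r}")


def weight_change_kg(start: float, current: float, unit: str) -> float:
    """Current minus starting weight in kg, so a loss is negative."""
    return to_kg(current, unit) - to_kg(start, unit)


def display_change_kg(change_kg: float) -> float:
    """Cap displayed gain so one outlier does not stretch the chart."""
    return min(change_kg, MAX_DISPLAY_GAIN_KG)
```

For example, a post reporting SW 220 lb and CW 198 lb yields a change of about -9.98 kg, and a computed gain of 14.2 kg would be drawn at +10 kg. Note that goal weight never enters these functions.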

Some records get a second read. Reports with weight loss over 25 kg, weight gain over 5 kg, or duration over 365 days are sent through a stronger rescreening step before they become canonical. Side effects are extracted as short phrases, normalized with an explicit mapping, and screened into mild, moderate, or severe reader-facing labels. Those labels are not clinical adverse-event grades; they are a browsing aid for lived reports.
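The second-read rule above reduces to three threshold checks. This is a minimal sketch using the thresholds stated in the text and the site's sign convention (loss is negative); the function name and the choice to treat "over" as strictly greater than are assumptions.

```python
from typing import Optional

LOSS_THRESHOLD_KG = 25.0
GAIN_THRESHOLD_KG = 5.0
DURATION_THRESHOLD_DAYS = 365.0


def needs_rescreen(
    change_kg: Optional[float], duration_days: Optional[float]
) -> bool:
    """Return True when a report should get the stronger second read."""
    if change_kg is not None:
        if change_kg < -LOSS_THRESHOLD_KG:   # loss over 25 kg
            return True
        if change_kg > GAIN_THRESHOLD_KG:    # gain over 5 kg
            return True
    if duration_days is not None and duration_days > DURATION_THRESHOLD_DAYS:
        return True
    return False
```

Missing values never trigger a rescreen on their own in this sketch; only a stated value that crosses a threshold does.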

The language model can still be wrong. It can miss jokes, sarcasm, bravado, deleted context, or a throwaway line that changes the meaning of a post. Long Reddit narratives are especially difficult when they describe several drugs, several starts and stops, pregnancy weight, regained weight, a prior GLP-1 history, a switch, a stack, and more than one bout of loss or gain. The model may infer a duration that was never stated, attach an old weight change to the wrong drug, mistake a goal or highest weight for a current weight, or treat a frightened question as a clean report. This is why the widgets are built to point back to the original Reddit text: the extraction is an index into the story, not a substitute for reading it.

The biases are obvious and large. Reddit users are not representative of all patients. Enthusiastic people may be more likely to post. People having frightening symptoms may also be more likely to post. People doing well under ordinary medical care may never appear. People without internet access, English fluency, leisure time, or comfort discussing weight and medication in public are underrepresented. Some posts may be exaggerated, mistaken, duplicated, sarcastic, or incomplete.

There are also platform and market distortions. These communities attract curiosity, desperation, brand loyalty, anti-brand resentment, peptide vendors, gray-market sales pitches, bots, trolls, and people with financial or ideological reasons to make a drug look better or worse than it is. Moderation policies differ by subreddit. Search terms miss some relevant reports and capture some irrelevant ones. Deleted posts, edited posts, Reddit access limits, and crawler blind spots all shape what enters the archive.

For those reasons, the charts should be read as maps of reported experience, not estimates of treatment effect. Optional clinical-trial overlays, when present, are external aggregate comparison data and are never used to fit the Reddit curve. GLP-1 Chatter is most useful when a reader moves between the aggregate view and the underlying stories: from dot to post, from side-effect phrase to full account, from apparent pattern back to the messy social world that produced it.

Technical Process Flow

The live pipeline is spread across six GitHub Actions workflows, or seven jobs if the Pages build and deploy jobs are counted separately. All times below are in UTC. The Reddit crawler runs daily at 03:18 and searches recent posts and comments, usually over the last seven days. A temporary historical backfill runs at 00:17 and 12:17 until July 9, 2026, rotating across tirzepatide, semaglutide, and retatrutide source groups. Neither crawl step uses a language model. Both only find candidate Reddit items, store raw text in SQLite, and commit the packed database artifact.

The parse workflow runs daily at 04:42 and also after successful crawl or backfill runs. This is the first LLM-bearing stage. It sends exactly one Reddit post or comment at a time to gpt-5.4-nano and writes structured drug reports to the database. Suspicious records, such as very large losses, large gains, or very long durations, can be rescreened one item at a time with gpt-5.4-mini before becoming canonical. This stage fills the main fields used by the weight-change plots: drug family, named drug, dose narrative, duration, raw weight values, computed kg/week values, attribution, side effects, evidence, confidence, and notes.
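The computed kg/week value mentioned above is presumably the weight change divided by the duration in weeks. A minimal sketch under that assumption, with a hypothetical function name and a guard for missing or zero durations:

```python
from typing import Optional


def kg_per_week(
    change_kg: Optional[float], duration_days: Optional[float]
) -> Optional[float]:
    """Average weekly change; negative means loss. None if not computable."""
    if change_kg is None or duration_days is None or duration_days <= 0:
        return None
    return change_kg / (duration_days / 7.0)
```

A report of -10 kg over 70 days gives -1.0 kg/week. Returning None rather than zero for an unknown duration keeps such reports out of rate plots instead of drawing them as flat.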

The side-effect severity workflow runs at 05:17 and 17:17, and also after successful parse runs. This is the second LLM-bearing stage. It sends one canonical extracted report at a time to gpt-5.4-nano and labels each normalized side-effect phrase as mild, moderate, or severe. The labels are written to side-effect screening tables and are used by the side-effect frequency view, co-occurrence matrix, severity controls, and report browser.
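One way a co-occurrence matrix like the one mentioned above could be tallied is to count unordered pairs of normalized side-effect phrases within each report. The sketch below is an assumption about the approach, not the site builder's code, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def cooccurrence(reports: Iterable[list[str]]) -> Counter:
    """Count how often each unordered pair of side effects shares a report."""
    pairs: Counter = Counter()
    for effects in reports:
        # De-duplicate and sort so ("a", "b") and ("b", "a") share one key.
        for a, b in combinations(sorted(set(effects)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Each count is a number of reports, not of people, so one author posting several times can contribute to the same cell more than once.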

The compound-normalization workflow runs at 06:33. During the initial cleanup period it runs daily through July 17, 2026, then only on Mondays. This is the third LLM-bearing stage, but only for unresolved compound strings. The script first applies explicit aliases in code. If a raw compound phrase is still unresolved, it sends that one phrase to gpt-5.4-nano, then applies the alias map again so shorthand or rough nano outputs are collapsed to canonical names such as tirzepatide, retatrutide, semaglutide, human growth hormone, or insulin. The result is cached in data/compound_normalizations.json and feeds the stacking/polypharmacy matrix.
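The alias-first, model-second, alias-again order described above can be sketched like this. The ALIASES entries are illustrative only, the ask_model callable stands in for the gpt-5.4-nano call, and caching only model-resolved phrases in a plain dict (rather than reading and writing data/compound_normalizations.json) is a simplifying assumption.

```python
from typing import Callable, Optional

# Illustrative alias map; the project's real map is not shown in the source.
ALIASES = {
    "tirz": "tirzepatide",
    "reta": "retatrutide",
    "sema": "semaglutide",
    "hgh": "human growth hormone",
}
CANONICAL = set(ALIASES.values()) | {"insulin"}


def _key(phrase: str) -> str:
    """Lowercase and collapse whitespace so lookups are stable."""
    return " ".join(phrase.lower().split())


def apply_aliases(phrase: str) -> Optional[str]:
    """Resolve a phrase in code alone, or return None if unresolved."""
    key = _key(phrase)
    if key in CANONICAL:
        return key
    return ALIASES.get(key)


def normalize_compound(
    phrase: str, cache: dict[str, str], ask_model: Callable[[str], str]
) -> str:
    """Aliases first; the model sees only phrases the code cannot resolve."""
    resolved = apply_aliases(phrase)
    if resolved is not None:
        return resolved
    key = _key(phrase)
    if key in cache:
        return cache[key]
    guess = ask_model(phrase)                      # one phrase per call
    resolved = apply_aliases(guess) or _key(guess)  # collapse rough output
    cache[key] = resolved
    return resolved
```

Running the alias map a second time means a rough model answer such as "HGH" still lands on the canonical name, and the cache keeps the same raw phrase from being sent to the model twice.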

The Pages workflow has no LLM calls. It runs after successful parse, side-effect severity, or compound-normalization workflows, and also after relevant pushes to the site code or data artifacts. Its build job unpacks the database and runs the static site builder. Its deploy job publishes the generated HTML, JSON, JavaScript, images, and CSS to GitHub Pages. At runtime, the website has no backend; the browser only reads the static files produced by this build.

The process is therefore: Reddit and PullPush sources feed raw_posts; the parser turns pending raw posts into extracted_reports; code performs unit conversion and plot eligibility checks; the mini rescreen revisits suspicious reports; side-effect screening adds severity labels; compound normalization cleans raw drug and stack names; the site builder emits static JSON and HTML; GitHub Pages deploys the result.