Although AI is changing the media, how much it’s changing journalism is unclear. Most editorial policies forbid using AI to help write stories, and journalists typically don’t want the help anyway. But when consulting with editorial teams, I often point out that, even if you never publish a single word of AI-generated text, AI still has a lot to offer as a researcher.
Well, that assertion might be a bit more questionable now that the Columbia Journalism Review has gone and published its own study of how AI tools perform in that role for some specific journalistic use cases. The result, according to CJR: AI can be a surprisingly uninformed researcher, and it might not even be a great summarizer, at least in some cases.
Let me stress: CJR tested AI models in journalistic use cases, not generic ones. For summarization in particular, the tools (including ChatGPT, Claude, Perplexity, and Gemini) were asked to summarize transcripts and minutes from local government meetings, not articles or PowerPoints. So some of the results may go against intuition, but I also think that makes them much more useful: for artificial intelligence to be the force for workplace transformation it’s hyped to be, it needs to give helpful output in workplace-specific use cases.
The CJR report reveals some interesting things about these use cases and how journalists approach AI in general. But mostly it shows how badly we need more of this: systematic testing of AI that goes beyond the ad hoc experiment. If the study shows nothing else, it’s that you don’t need to be an engineer or a product designer to judge how well AI can help in your job.
Putting AI to the newsroom test
To test AI’s summarization abilities, the evaluators – who included academics, journalists, and research assistants – created multiple prompts for short and long summaries of each document, then ran them several times. A weakness of the report is that it does not reveal the outputs, so we can’t see for ourselves how well the AI did. But it does say it quantified factual errors to evaluate accuracy, comparing the AI summaries with human-written ones.
Without seeing the outputs, it’s hard to know how to improve the prompts to get better results. The study says it got good results for short (200-word) summaries but saw inaccuracies and missed facts in longer ones. One surprising outcome was that the simplest prompt, “Give me a short summary of this document,” produced the most consistently good results, but only for short summaries.
The study also looked at research tools, specifically for science reporting. I love the specificity here: the CJR researchers were very precise about the use case, giving the tool a paper and then asking it to perform a literature review. They also chose their targets deliberately, evaluating AI-powered research services like Consensus and Semantic Scholar instead of the usual general chatbots.
On this front, the results were arguably even worse. The tools typically would find and cite papers that were completely different from the ones a human chose for a manually created literature review, and even different from the other tools’ results. And when the researchers ran the same prompts a few days later, the results would change again.
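The report doesn’t say how it scored that inconsistency, but this is the kind of thing a newsroom can measure for itself with very little code. Here’s a minimal sketch, assuming you can pull DOIs (or titles) out of each run’s output; the identifiers below are invented for illustration:

```python
# Hypothetical helper: quantify run-to-run drift in an AI research tool's citations.
# "Citations" here are whatever stable identifiers you can extract (DOIs, titles, URLs).

def citation_overlap(run_a: set[str], run_b: set[str]) -> float:
    """Jaccard similarity: 1.0 means identical citation lists, 0.0 means no overlap."""
    if not run_a and not run_b:
        return 1.0
    return len(run_a & run_b) / len(run_a | run_b)

# Example: the same literature-review prompt run on two different days (made-up DOIs).
monday = {"10.1000/paper-a", "10.1000/paper-b", "10.1000/paper-c"}
thursday = {"10.1000/paper-a", "10.1000/paper-d", "10.1000/paper-e"}

print(f"Overlap between runs: {citation_overlap(monday, thursday):.2f}")  # 0.20
```

Run the same prompt on Monday and again on Thursday, compare the sets, and you have a number you can track over time instead of a vague sense that “the answers keep changing.”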
I think the study is instructive beyond its straightforward takeaways, such as using AI only for short summaries and thinking twice before using AI research apps for literature reviews.
- Prompt engineering matters: I get that the three different prompts for summaries were probably designed to simulate casual use – the kind of natural-language request a busy journalist might dash off. And maybe AI should ultimately produce good results when you do that. But for out-of-the-box tools (which is what the study used), I would recommend putting more thought into the prompt.
This doesn’t have to be a big exercise. Simply going over your prompt to make vague language (“short summary”) more precise (“200-word summary”) would help. The researchers did ask for more detail in two prompts, but the study criticizes the longer summaries for not being comprehensive when the language in those prompts doesn’t explicitly ask for comprehensiveness. Asking the AI to check its own work sometimes helps, too.
- The app layer struggles: Reading the part about the various research apps not producing good results had me nodding along. I don’t want to read too much into this, since the study was narrowly focused on research apps with a very specific use case, but I’m living through something similar as I evaluate content platforms for my plans at The Media Copilot. When you use a third-party tool, you’re an extra step removed from the foundation model, and you miss the flexibility of being “closer to the metal.”
I think this points to a fundamental misunderstanding of the so-called “app layer.” Most AI apps put a veil over system prompts and model pickers in the name of simplification, but that isn’t the UX win many think it is. Yes, more controls might confuse AI newbies, but power users want them, and it turns out the gap between the two groups might not be very large.
I think this same misunderstanding is what stymied the GPT-5 launch. Removing the model picker – where users could choose between GPT-4o, o4-mini, o3, and so on – seemed like a smart, simplifying idea, but it turned out ChatGPT users were more sophisticated than anyone had thought. The average ChatGPT Plus subscriber might not have understood what every model does, but they knew which ones worked for them.
- Iterate, iterate, iterate: The study’s results are helpful, but they’re also incomplete. Testing outputs from models is only the beginning of building an AI workflow. Once you have those outputs, you iterate: adjust your prompts, refine your model choice, and try again. And again. Producing consistent results that save time isn’t something you’ll get perfect on the first try. Once you’ve found the right combination of prompting, model, and context, then you’ll have something repeatable. (A bare-bones sketch of what that loop can look like follows this list.)
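Here’s one way a first pass at that loop might look in Python. This is a minimal sketch, not CJR’s methodology: it assumes the OpenAI Python SDK with an API key in your environment, and the prompts, model name, and transcript filename are placeholders you’d swap for your own.

```python
# A bare-bones prompt-iteration loop: run each prompt variant several times
# against the same source document and spot-check length and consistency.
# Assumes the OpenAI Python SDK (`pip install openai`) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

PROMPTS = {
    "vague":   "Give me a short summary of this document.",
    "precise": "Write a 200-word summary of this document, covering every vote taken.",
}
RUNS_PER_PROMPT = 3  # repeat runs to see how consistent each prompt is


def summarize(prompt: str, document: str, model: str = "gpt-4o-mini") -> str:
    """Send one prompt + document to the model and return its summary."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{prompt}\n\n{document}"}],
    )
    return response.choices[0].message.content


def test_prompts(document: str) -> None:
    for label, prompt in PROMPTS.items():
        for run in range(1, RUNS_PER_PROMPT + 1):
            summary = summarize(prompt, document)
            word_count = len(summary.split())
            print(f"[{label} / run {run}] {word_count} words")
            # The part no script can do: read each summary against the source
            # and log factual errors or omissions before adjusting the prompt.


if __name__ == "__main__":
    # Placeholder filename: a local government meeting transcript, per the CJR tests.
    with open("council_meeting_transcript.txt") as f:
        test_prompts(f.read())
```

The point isn’t the code itself; it’s that re-running the same prompts, counting words, and reading the outputs against the source is enough structure to turn a vibe check into an experiment you can repeat next week.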
Coming halfway
Where does this leave newsrooms? This might sound self-serving, since I train editorial teams for a living, but after reading this report, I’m more convinced than ever that, despite predictions that apps and software design will eventually make prompting unnecessary, AI literacy still matters. Getting the most out of these tools means equipping journalists with the skills they need to craft effective prompts, evaluate results, and iterate when necessary.
Also, the CJR study is an excellent template for testing tools internally. Get a team together, iterate, and keep experimenting. Find what consistently gets good results – not just quality outputs, but a process that actually saves time. Just doing “vibe checks” won’t get you very far.
Because there’s one more thing the study is off target about. When a journalist considers how to complete a task, the choice usually isn’t between a machine output and a human one. It’s the machine output or nothing at all. Some might say that’s lowering the bar, but it’s also putting a bar in more places. And with some training, experimentation, and iteration, you can raise it inch by inch.