Who Trained on What? AI, Canadian Journalism, and the Attribution Gap
A pair of studies from McGill University's Centre for Media, Technology and Democracy has put numbers to something many in the journalism industry have long suspected: AI companies are building commercial products on the back of Canadian reporting, largely without credit and without compensation.
The report, "AI, Canadian Journalism and Paths for Policy Action," tested four major AI models — ChatGPT, Gemini, Claude, and Grok — across more than 18,000 queries drawn from real Canadian news stories in both English and French. The findings are difficult to dismiss.
When asked about news events drawn from their training data, the models provided no source attribution in 82 per cent of responses. When given live web access and asked about specific recent articles, they covered enough of the original reporting to substitute for the source in 54 to 81 per cent of cases — meaning a reader who received the AI's answer had little reason to click through to the newsroom that did the work.
The outlets that fared best in AI visibility were the ones you'd expect: CBC, CTV, and the Globe and Mail — large, free, nationally prominent. Paywalled and regional outlets, including many doing substantial original reporting, fell well below proportional representation. The Toronto Star received 11 named-source mentions across 18,000+ responses. The Montreal Gazette received one.
French-language journalism faced a compounded version of the problem. French content was absorbed at rates comparable to English, but French outlets appeared in citations only 10 per cent of the time. Radio-Canada and La Presse dominated even that small share. The Journal de Montréal, one of Quebec's most widely read newsrooms, was described by researchers as nearly invisible to AI systems.
The Legal Gap
Canada's existing legislative tools do not cleanly reach this problem. Bill C-18, the Online News Act, established that technology companies profiting from Canadian journalism should enter a fair process to determine the value of that exchange. But its framework was built around entities that index and display news content — not companies that absorb and synthesize it. The researchers describe extending C-18 to AI as "not a simple amendment."
The Copyright Act's fair dealing doctrine is similarly uncertain. Whether large-scale commercial AI training constitutes "research" within the meaning of the Act has never been tested in Canadian courts. The answer, whenever it comes, will have consequences well beyond journalism.
Federal Culture Minister Marc Miller publicly acknowledged the report, saying the use of news content by AI requires a serious conversation and that platforms must "pay their fair share."
The Irony of Capacity
Perhaps the most pointed finding in the research is what it reveals about what is technically possible versus what AI companies have chosen to do. When researchers named the outlet and explicitly asked for citations, attribution rates jumped to 74 to 97 per cent. The capacity for meaningful source identification already exists. It has simply not been deployed by default.
The researchers frame the conclusion plainly: "In the absence of deliberate policy choices, the terms of AI companies' relationship to Canadian journalism are being set by corporate design decisions made outside Canadian jurisdiction."
Two Sides Worth Debating
The information access argument holds that AI synthesis of public-interest reporting serves a democratizing function — bringing news to people who might not otherwise encounter it, in a format they can engage with. Restricting this could entrench the advantages of legacy media brands while doing little to support the smaller outlets the study says are being most harmed.
The sustainability argument counters that the journalism ecosystem that AI depends on for training data is already under severe financial stress. If AI accelerates reader disengagement from source newsrooms by substituting for them, the very reporting that trains future models begins to disappear. The system eats its own inputs.
Both arguments have weight. What neither side has is a clear policy instrument — yet.
Questions for Discussion
- Should the Online News Act be extended to cover AI training and synthesis, or does that require entirely new legislation?
- Is the fair dealing doctrine under the Copyright Act a viable path for journalists seeking compensation, or is it the wrong tool for this problem?
- Does mandatory attribution at the response level — naming the outlet even without a link — meaningfully change the economic equation, or is it largely symbolic?
- Should francophone and regional outlets receive specific protections given the documented disparity in AI visibility?
- Is there a distinction between AI using journalism for training versus AI using it in live-retrieval answers — and should policy treat them differently?
This article is published for civic discussion. CanuckDUCK does not take a position on AI policy or media regulation. All perspectives are welcome.