Rio's "Homegrown" 397B LLM Accused of Being a Simple Model Merge
A GitHub issue claims Rio de Janeiro's municipal LLM is just a 60/40 blend of Nex-N2 and Qwen.
If you are going to copy someone else's homework, the golden rule is to at least change the name on the cover page. In the world of large language models, that means ensuring your system prompt doesn't slip off to reveal your true corporate identity.
Rio de Janeiro's municipal IT agency, IplanRIO, recently made waves by presenting prefeitura-rio/Rio-3.5-Open-397B as an original, custom-trained 397-billion-parameter model. However, developers looking under the hood have pointed out that building a massive LLM from scratch is incredibly difficult—while sliding a linear interpolation fader to a 0.6 ratio is remarkably easy.
According to a detailed GitHub issue opened by AI startup Nex-AGI on their Nex-N2 repository, Rio's supposedly homegrown model is actually a direct, element-wise merge of two existing models: Nex-N2_pro and Alibaba's Qwen3.5-397B-A17B.
The Anatomy of a "Frankenmodel"
Model merging has become a highly popular technique in the open-source AI community. By mathematically combining the weights of two or more pre-trained models, developers can often synthesize the strengths of both without the astronomical compute costs of a fresh training run.
However, there is a vast difference between publishing an experimental merge and claiming to have trained a 397B model from scratch. Nex-AGI alleges that IplanRIO did the latter, finding no evidence of any actual training performed by the municipal agency. Instead, they claim the weights of the Rio model are a direct, static blend:
$$\text{Rio-3.5-Open-397B} \approx 0.6 \times \text{Nex-N2_pro} + 0.4 \times \text{Qwen}$$
According to the analysis, this exact 0.6/0.4 ratio persists across all 60 layers and every single component of the network. In the machine learning world, weight distributions of independently trained models do not align this perfectly by accident. The similarity is consistent to thousands of standard deviations, making any explanation other than direct weight interpolation mathematically impossible.
The Identity Crisis
If the tensor mathematics do not convince you, the behavioral evidence is even more damning.
When deploying a merged model, creators typically rely on a hard-coded system prompt to establish a new persona—in this case, instructing the model that it is an assistant developed by the city of Rio de Janeiro. But system prompts are notoriously fragile wrappers.
Nex-AGI demonstrated that when Rio's hard-coded "You are Rio" system prompt is removed, the underlying model suffers a severe identity crisis:
- 79% of the time, the model identifies itself as "Nex, from Nex-AGI."
- 0% of the time, it identifies itself as "Rio."
- It even recites Nex-AGI's bespoke, proprietary corporate backstory word-for-word.
For a model supposedly trained independently by a Brazilian municipal government, its sudden insistence that it is a Silicon Valley startup is a tough anomaly to explain away.
The Developer Takeaway
There is no shame in model merging. It is a legitimate, highly efficient way to build specialized tooling on top of foundational giants like Qwen. But transparency is currency in the open-source community.
Attempting to pass off a 60/40 linear merge as an originally trained model is a bold strategy, especially when the open-source ecosystem is filled with developers who know exactly how to run a tensor distance check. For now, the "Rio" model stands as a cautionary tale: if you merge it, label it.
Sources & further reading
Rachel has been embedded in the developer tooling ecosystem for nearly eight years, covering everything from IDE wars and package-manager drama to the quiet rise of AI-assisted coding. She has a soft spot for open-source maintainers and an unhealthy number of terminal emulators installed on a single laptop.
Discussion 0
No comments yet
Be the first to weigh in.