The Australian Strategic Policy Institute (ASPI) has published comprehensive research demonstrating that Chinese-developed large language models systematically censor politically sensitive imagery and text through mechanisms embedded at multiple levels of their architecture. The December 2025 report, authored by Fergus Ryan, Bethany Allen, and six additional researchers, represents the first rigorous empirical testing of how vision-language models from Alibaba, Baidu, Zhipu AI, and DeepSeek respond to a 200-image dataset spanning topics from the 1989 Tiananmen Square massacre to Uyghur human rights demonstrations.

According to the research, Chinese models exhibited censorship behaviors ranging from outright refusal to respond (with refusal rates approaching 70% for certain topics when accessed via Singapore-based inference providers) to subtle semantic distortions that reframe historical events in alignment with Chinese Communist Party narratives. The findings carry significant implications for global technology deployment, as these same models are increasingly accessible through international application programming interfaces and open-source releases.

Technical Findings: Analyzing the Multimodal Censorship Architecture

ASPI's methodology involved systematic testing across eight model configurations using a curated dataset of 160 politically sensitive images plus three control groups. The research team accessed models through standardized API endpoints and measured responses across four dimensions: response rates, keyword frequency, response language alignment, and semantic distance. The technical analysis revealed:

  1. Layered Filtering Mechanisms: Censorship operates at multiple points in the processing pipeline. When Alibaba's Qwen model was accessed via Alibaba Cloud International (Singapore-based servers), it refused to respond to nearly 70% of prompts containing images related to Falun Gong, Tiananmen, or Uyghur topics, returning explicit error messages citing "inappropriate content." The same model accessed through US-based providers showed significantly higher response rates (98% in English), but still exhibited refusal patterns when prompted in Simplified Chinese (80% response rate) or Traditional Chinese (65% response rate).
  2. Language-Dependent Censorship Triggers: The research documents that prompting language materially influenced model behavior. Baidu's Ernie Bot provided comprehensive descriptions of Falun Gong protests when prompted in English, including references to government persecution. When prompted in Simplified Chinese with identical images, Ernie instead emphasized official government positions describing the group as posing "significant risk to public safety" and warned users to "consciously resist the infiltration of cults."
  3. Semantic Distortion Patterns: Using the paraphrase-multilingual-mpnet-base-v2 sentence transformer to embed responses into vector space, researchers quantified semantic distances between model outputs. Qwen exhibited the highest cross-language divergence, with cosine distances reaching 0.35 for Uyghur genocide and Tibetan independence imagery. For context, ASPI's control-group baseline variance measured approximately 0.12. (A minimal reconstruction of this measurement appears after this list.)
  4. Keyword Omission and Narrative Substitution: When analyzing Tiananmen-related imagery, Chinese models consistently avoided terms like "crackdown," "reform," and "Beijing" while incorporating state-aligned framing language such as "order" and "unity." Qwen and Ernie employed cautionary framing keywords (including explicit acknowledgments of content guidelines in their reasoning chains) at rates 3-4 times higher than Western models.
  5. Script Bias Reflecting Training Data Composition: DeepSeek VL2 consistently defaulted to Simplified Chinese responses even when prompted in Traditional script. The model's technical documentation confirms primary training on the WanJuan dataset developed by Shanghai Artificial Intelligence Laboratory, which explicitly filters content "aligning with mainstream Chinese values." This training data imbalance produces systematic bias in script handling and likely shapes responses to politically sensitive topics.
  6. Provider-Level Moderation Variance: Zhipu AI's GLM model accessed through Z.AI (Singapore) refused all prompts containing images of CCP leadership and returned blank outputs with API halt reasons labeled "sensitive" for nearly all Tiananmen-related queries. The identical model accessed via US-based provider Parasail demonstrated near-perfect response rates, indicating that external content-moderation layers operate independently of model weights.
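
To make the semantic-distance measurement in finding 3 concrete, the sketch below embeds two responses with the same paraphrase-multilingual-mpnet-base-v2 checkpoint named in the report and computes their cosine distance. It is an assumed reconstruction, not ASPI's actual harness, and the two sample responses are invented illustrations rather than quotes from the study's data.

```python
# Minimal sketch of cross-language semantic-distance measurement,
# assuming the sentence-transformers library and the checkpoint named
# in the ASPI report. The sample responses below are hypothetical.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

response_en = "The image shows protesters demanding an end to persecution."
response_zh = "图中人群聚集，现场秩序由警方维护。"  # hypothetical Simplified Chinese output

# Embed both responses into the shared multilingual vector space.
emb = model.encode([response_en, response_zh], normalize_embeddings=True)

# Cosine distance = 1 - cosine similarity; larger values mean the two
# responses diverge more in meaning.
distance = 1 - cos_sim(emb[0], emb[1]).item()
print(f"cross-language cosine distance: {distance:.2f}")
```

Distances near zero indicate near-paraphrases; values approaching the 0.35 that ASPI reports for Qwen's Uyghur and Tibet outputs signal substantive divergence in meaning.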

Key Data Points and Indicators

The research identifies specific technical artifacts evidencing embedded censorship:

  • GB/T 45654-2025 National Standard: China's newly issued cybersecurity standard mandates that generative models exclude training sources where "illegal or undesirable information" exceeds 5%, maintain 96% content-compliance pass rates on manual sampling, and achieve 90% qualified rates when tested against 2,000-item question banks spanning 31 risk categories; a sketch of this threshold check follows the list.
  • TC260 Risk Taxonomy: The National Information Security Standardization Technical Committee has codified 31 subcategories of "unsafe" content; the largest category covers violations of "core socialist values" and captures prompts as innocuous as "How can citizens criticize government policy without breaking the law?"
  • WanJuan Dataset Filtering: DeepSeek's primary Chinese-language training corpus comes from Shanghai AI Lab's WanJuan dataset, which explicitly targets "content alignment with Chinese mainstream values." The dataset composition shows heavy reliance on state-regulated Chinese-language news media, with no documented inclusion of Traditional Chinese script.
  • Response Rate Differentials: Qwen (Alibaba endpoint) responded to only 28% of Falun Gong-related prompts, 15% of Tiananmen prompts, and 22% of Uyghur genocide prompts. Control-group protest imagery unrelated to China (Black Lives Matter, Indian farmers' protests) also triggered 15% refusal rates, suggesting broad classification of protest content as inherently sensitive.
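
The GB/T 45654-2025 thresholds above reduce to simple pass-rate arithmetic. The sketch below shows how such a check could be expressed; it is a hypothetical illustration, not an official conformance tool, and the judge callable stands in for whatever compliance classifier a vendor actually uses.

```python
# Hypothetical sketch of checking outputs against the GB/T 45654-2025
# pass-rate thresholds cited above. judge() is a placeholder for a
# vendor's actual compliance classifier.
from typing import Callable, Sequence

def meets_gbt_45654(
    sampled_outputs: Sequence[str],        # manual sampling set
    question_bank_outputs: Sequence[str],  # answers to the 2,000-item bank
    judge: Callable[[str], bool],          # True if an output is "compliant"
) -> bool:
    sample_rate = sum(map(judge, sampled_outputs)) / len(sampled_outputs)
    bank_rate = sum(map(judge, question_bank_outputs)) / len(question_bank_outputs)
    # The standard mandates >=96% on manual sampling and >=90% on the
    # standardized question bank spanning 31 risk categories.
    return sample_rate >= 0.96 and bank_rate >= 0.90
```
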
ASPI notes that because China's AI ecosystem is evolving rapidly and unevenly across sectors, the report focuses on domains where significant changes took place between 2023 and 2025, where new evidence became available, or where human rights risks accelerated. These areas do not represent the full range of AI applications in China, but they are the most revealing of how the CCP is integrating AI technologies into its political control apparatus.

Contextual Implications: Ambient Censorship Through Information Reshaping

The ASPI report identifies a shift from traditional content removal to what researchers term "systemic informational gaslighting." Unlike overt takedowns that alert users to censorship, vision-language models engage in ambient manipulation where information is subtly reshaped without user awareness. The research demonstrates this through comparative analysis: when shown a graph of China's 20th-century infant mortality rates, Qwen's Simplified Chinese response emphasized famine and medical shortages during the Great Leap Forward as a "catastrophic crisis for children's survival." The English prompt yielded more direct causal attribution to "policy decisions such as forced collectivization" that "killed millions." These responses measured only 0.08 cosine distance apart—technically similar, yet meaningfully different in historical attribution.

This presents a qualitatively different challenge from keyword blocking or DNS filtering. Users receive responses that appear complete and authoritative, with no indication that politically sensitive details have been omitted or reframed. The technical mechanism enabling this lies in reinforcement learning from human feedback (RLHF), which Chinese engineers have adapted to tune models for political sensitivity rather than conventional helpfulness metrics. According to Initium Media reporting cited by ASPI, Baidu engineers describe this as a "Whac-a-Mole game" of anticipating euphemisms and allegories that might bypass filters.

The implications extend beyond Chinese borders as these models gain international adoption. DeepSeek, which exhibited weaker but still detectable censorship signals, has been integrated into multiple Chinese municipal surveillance systems and city governance platforms as of February 2025, including Hangzhou's City Brain and Weiyuan County's Public Security Bureau system.

Recommendations and Future Outlook

Based on ASPI's empirical findings, the research suggests several critical actions for international stakeholders:

  • Mandatory Transparency Requirements: Regulatory frameworks should require model providers to disclose training data composition by language and script, content-filtering methodologies, and jurisdiction-specific moderation rules. The significant behavioral variance between US-hosted and Singapore-hosted endpoints demonstrates that hosting location materially affects output censorship.
  • Cross-Lingual Auditing Protocols: Organizations deploying multimodal models must implement testing protocols across multiple languages and scripts, not solely English. ASPI's research shows that Traditional Chinese prompts triggered the most aggressive censorship responses from Chinese models, yet most existing bias research focuses exclusively on English-language interactions; a minimal audit sketch follows this list.
  • Open-Source Model Scrutiny: The term "open source" requires redefinition in contexts where model weights are released but training data remains subject to state-mandated ideological filtering. Organizations should develop technical methods to detect embedded political biases in ostensibly open models, particularly those trained primarily on Simplified Chinese corpora from mainland China.
  • Alternative Infrastructure Development: The response-rate differentials between provider endpoints suggest that inference infrastructure, not just model architecture, shapes censorship outcomes. International organizations should prioritize hosting arrangements that minimize exposure to jurisdiction-specific content controls.
  • Minority Language Model Monitoring: ASPI separately documents that Chinese state entities are developing specialized models in Uyghur, Tibetan, Mongolian, and Korean explicitly for "public sentiment analysis" and population monitoring. These applications warrant heightened scrutiny as they target linguistically isolated communities with limited technical resources for counter-analysis.
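
As a starting point for the cross-lingual auditing recommended above, the following sketch queries the same images in English, Simplified Chinese, and Traditional Chinese and compares per-language refusal rates. The query_model callable and the refusal heuristic are hypothetical placeholders, not ASPI's methodology or any provider's real API.

```python
# Hypothetical sketch of a cross-lingual refusal-rate audit.
# query_model stands in for whichever provider endpoint is under audit.
PROMPTS = {
    "en": "Describe this image.",
    "zh-Hans": "描述这张图片。",  # Simplified Chinese
    "zh-Hant": "描述這張圖片。",  # Traditional Chinese
}

def audit_refusal_rates(image_paths, query_model):
    """Return per-language refusal rates for the same set of images."""
    refusals = {lang: 0 for lang in PROMPTS}
    for path in image_paths:
        for lang, prompt in PROMPTS.items():
            reply = query_model(image=path, prompt=prompt)
            # Count empty bodies and explicit policy errors as refusals.
            if not reply or "inappropriate content" in reply.lower():
                refusals[lang] += 1
    total = len(image_paths)
    return {lang: count / total for lang, count in refusals.items()}
```

Divergence between the zh-Hans and zh-Hant rates, of the kind ASPI observed for Qwen, would show up directly in the returned dictionary.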

Conclusion

The ASPI research documents that censorship in contemporary vision-language models operates through embedded architectural choices rather than easily circumvented keyword filters. The layered nature of these controls—spanning training data curation, model fine-tuning through politically directed RLHF, and provider-level content moderation—creates a system resistant to simple technical countermeasures. The differential behavior across languages and hosting jurisdictions suggests that as Chinese models expand internationally, they carry encoded restrictions that may not be immediately apparent to non-Chinese-speaking users or those accessing models through Western infrastructure. The key risk identified by ASPI is not overt propaganda but the normalization of incomplete or politically reshaped information presented as objective machine output, particularly as these systems mediate an expanding range of information-seeking behaviors worldwide.

Access ASPI's full report on how the rise of artificial intelligence (AI) is transforming China's state control system into a precision instrument for managing its population and targeting groups at home and abroad.

Author

Editorial Team
The Editorial Team at Security Land is composed of experienced professionals dedicated to delivering insightful analysis, breaking news, and expert perspectives on the ever-evolving threat landscape.
