AI Trading Competition: Are Chinese LLMs Dominating Returns?

Live analysis of Alpha Arena: Qwen3 and DeepSeek lead while Western models struggle

Follow the real-time AI trading competition in which six leading LLMs each manage $10,000 on Hyperliquid. Early results show the Chinese models Qwen3 and DeepSeek performing strongly, while Gemini and GPT‑5 face significant drawdowns. This live experiment offers a rare public view of how autonomous AI agents trade.


Competition Status: Chinese Models Leading Early Phase

Official announcements and press reports reveal early patterns. They help explain why Qwen3 and DeepSeek performed better through the first days, and why Gemini and GPT‑5 fell behind.

Observed Patterns (Early Phase)

  • Qwen3: Few, focused trades; rarely >2 positions; tight SL/TP ranges; high conviction.
  • DeepSeek: Long bias, more assets, 10–15x leverage; visible stop discipline.
  • Gemini: Very many trades; frequently maximum position count; premature exits despite SL/TP; lower conviction.
  • GPT‑5: Broader, more cautious; several small positions; still significant drawdowns.

These patterns are snapshots. They can change with the market phase, volatility and the agents' learning parameters. Any evaluation should therefore always state its date and source.

Methodology & Rules (Alpha Arena)

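This is how the competition is set up, which matters for putting the results in context: six LLMs each manage a real $10,000 budget and trade BTC, ETH, SOL, BNB, DOGE and XRP perpetuals on-chain via Hyperliquid, with leverage of 10x–20x chosen per trade and mandatory SL/TP.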

More details and live data can be found directly at nof1.ai.

Early Results and Behaviour Profiles (as of 22–23 Oct 2025)

The charts show a reported 1-week snapshot as well as a normalised behaviour profile (derived from reports). Numbers are approximations; please refer to sources.

Qwen3: approx. +100%, DeepSeek: approx. +100%, Gemini: approx. -60%, GPT‑5: approx. -55 to -60%

Source: nof1.ai (Live Leaderboard), Odaily (22 Oct 2025), BlockBeats (23 Oct 2025), 99Bitcoins (Oct 2025). For links, see below.
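
To put these percentages in dollar terms, the short sketch below applies them to the $10,000 starting budget. It also shows how large a gain would be needed to return to the starting equity after a loss. The function names are illustrative, and the inputs are the approximate figures reported above, not exact leaderboard data.

```python
START_CAPITAL = 10_000.0  # per-model budget stated for the competition


def ending_equity(start: float, return_pct: float) -> float:
    """Equity after applying a simple percentage return to the starting capital."""
    return start * (1.0 + return_pct / 100.0)


def recovery_gain_pct(loss_pct: float) -> float:
    """Gain (in percent) needed to get back to the starting equity after a loss.

    loss_pct is given as a positive number below 100, e.g. 60 for a 60% loss.
    """
    if not 0.0 <= loss_pct < 100.0:
        raise ValueError("loss_pct must be in the range [0, 100)")
    remaining = 1.0 - loss_pct / 100.0
    return (1.0 / remaining - 1.0) * 100.0


if __name__ == "__main__":
    # Approximate early returns as reported in the article.
    for model, ret in [("Qwen3", 100.0), ("DeepSeek", 100.0), ("Gemini", -60.0)]:
        print(model, round(ending_equity(START_CAPITAL, ret), 2))
    print("gain needed after a 60% loss:", round(recovery_gain_pct(60.0), 1), "%")
```

A loss of about 60% leaves roughly $4,000 of the $10,000, and climbing back to the starting level from there takes a gain of about 150%. That asymmetry is why deep early drawdowns are hard to recover from.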

Normalised dimensions: Trading Frequency, Simultaneous Positions, SL/TP Discipline, Conviction

Source: nof1.ai (Live Leaderboard), Odaily (22 Oct 2025), BlockBeats (23 Oct 2025), 99Bitcoins (Oct 2025). For links, see below.

Additional Visualisations (optional)

The equity curves and trade distribution are illustrative placeholders; replace them with live data from the leaderboard if needed.
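
When working with equity curves, a common summary figure is the maximum drawdown, the largest peak-to-trough decline relative to the preceding peak. The following is a minimal sketch of that calculation. The function name and the sample curve are hypothetical and do not come from the leaderboard.

```python
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the peak (0.25 means 25%)."""
    if not equity:
        raise ValueError("equity curve must not be empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # highest equity seen so far
        if peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst


if __name__ == "__main__":
    # Hypothetical account values in USD, for illustration only.
    sample_curve = [10_000, 12_000, 9_000, 11_000, 8_000]
    print(f"max drawdown: {max_drawdown(sample_curve):.1%}")
```

Fed with real account values from the leaderboard, the same function would let you compare models on how deep their worst decline was, not only on where they ended up.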

Model Profiles (Early Phase)

Brief profiles of participating LLMs – based on observed patterns and reports.

DeepSeek Chat V3.1

High trading frequency, diversification across all 6 assets, disciplined SL/TP setups, moderate to high leverage (10x–20x).

Qwen3 Max

Few, focused trades; rarely more than 2 parallel positions; tight SL/TP; high conviction on entry/hold.

Gemini 2.5 Pro

Many position changes, frequently maximum parallel positions; premature exits despite SL/TP; inconsistent execution.

GPT‑5

Broader, more cautious allocation; several smaller positions; still significant drawdowns, which reports attribute partly to operational execution weaknesses.

Claude Sonnet 4.5

At times a high cash allocation (≈70% in reports) and thus lower volatility; reasonable but capped upside.

Grok 4

Active trading with higher risk; strong results possible when the regime fits.

Key Insights for Your Roadmap

What you can derive from Alpha Arena – regardless of whether you trade or evaluate autonomous agents in other domains.

Focus Beats Over-Trading

Few, clear bets and disciplined stops proved more robust in week 1 than frequent reshuffling.

Transparency is a Feature

On-chain trading plus public telemetry enable real learning instead of a "black box".

Regime Dependence

Results depend on market environment – change the regime, change the winners.

Guardrails First

Define limits, approvals, escalations and documentation before you go live with agents.

Turn these observations into playbooks: Policy-as-Code, telemetry, reviews and budget limits.
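
As a rough illustration of what Policy-as-Code could look like for a trading agent, the sketch below checks a proposed order against a small set of limits before it is sent. The asset list, the 10x–20x leverage band and the mandatory SL/TP follow the competition rules described in this article. Everything else is an assumption made for the example: the class and function names, the position cap of 2 and the per-trade margin limit.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Policy:
    allowed_assets: FrozenSet[str] = frozenset({"BTC", "ETH", "SOL", "BNB", "DOGE", "XRP"})
    min_leverage: float = 10.0
    max_leverage: float = 20.0
    max_open_positions: int = 2            # hypothetical limit
    max_margin_per_trade: float = 2_000.0  # hypothetical budget limit in USD


@dataclass(frozen=True)
class Order:
    asset: str
    leverage: float
    margin: float
    stop_loss: Optional[float] = None
    take_profit: Optional[float] = None


def check_order(order: Order, policy: Policy, open_positions: int) -> List[str]:
    """Return a list of policy violations; an empty list means the order may proceed."""
    problems: List[str] = []
    if order.asset not in policy.allowed_assets:
        problems.append(f"asset {order.asset} is not allowed")
    if not policy.min_leverage <= order.leverage <= policy.max_leverage:
        problems.append(f"leverage {order.leverage}x is outside the allowed band")
    if order.margin > policy.max_margin_per_trade:
        problems.append("margin exceeds the per-trade budget limit")
    if order.stop_loss is None or order.take_profit is None:
        problems.append("stop-loss and take-profit are mandatory")
    if open_positions >= policy.max_open_positions:
        problems.append("maximum number of open positions reached")
    return problems


if __name__ == "__main__":
    # Prices are made-up values for illustration.
    risky = Order(asset="BTC", leverage=25.0, margin=1_000.0)
    for problem in check_order(risky, Policy(), open_positions=0):
        print("REJECTED:", problem)
```

Returning every violation, not only the first, makes the output usable as telemetry: each rejected order documents which limits it would have broken, which supports later reviews.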

Challenges & Limitations

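Important limitations to keep in mind before you interpret the results: the experiment is short, volatile and regime-dependent; the results are not statistically significant; detailed trade metrics are only partially public; and the "GPT‑5" naming has not been separately confirmed by OpenAI.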

Conclusion

In the early phase, Qwen3 and DeepSeek dominate, driven by focused trades and more consistent risk management. Gemini and GPT‑5 struggle with drawdowns and inconsistent execution. This is exciting but not a final verdict: the experiment is short, volatile and regime-dependent. Use the data to sharpen your agent governance, not to make investment decisions.

Key Takeaways

  • On-chain competition with real budgets provides rare transparency.
  • Chinese models show higher conviction and focused position management.
  • Over-trading and premature exits cost performance.
  • Governance, limits and telemetry determine the success of autonomous agents.

Further Information

Frequently Asked Questions (FAQ)

Is real capital actually being traded?
Yes. According to nof1.ai, each model manages a real $10,000 budget, and trades run on-chain via Hyperliquid. The leaderboard and equity curves are publicly viewable (see links).
Is "GPT‑5" confirmed?
Several reports mention "GPT‑5". A separate confirmation by OpenAI is not publicly available. We use the naming from the reporting and mark uncertainties.
Are there exact metrics for trades, positions, SL/TP?
Partially. Leaderboards, curves and posts provide insights. Detailed metrics (e.g., exact trade numbers) are only partially public; the behaviour charts are therefore normalised derivations with source citations.
Are the results investment recommendations?
No. This is a short, high-risk experiment. Results are regime-dependent and not statistically significant. No financial advice.
Which assets are traded, and with what leverage?
BTC, ETH, SOL, BNB, DOGE and XRP are traded as perpetuals on Hyperliquid. The competition's leverage band is 10x–20x; the model chooses the specific leverage per trade. SL/TP are mandatory.
How transparent is the setup really?
Wallets and PnL curves are publicly viewable; the frontend view summarises trades and positions. On-chain transactions can be checked via the leaderboard links.
Does the evaluation consider fees, funding, slippage?
Yes. Trading runs live on Hyperliquid, so fees, funding rates, latency and slippage have real effects on the PnL curve. These effects can fluctuate intraday.
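
To make the leverage and cost answers above more concrete, here is a simplified sketch of how leverage scales a price move into PnL on the posted margin, and how fees and funding reduce it. The function name and all rates in the example are hypothetical and are not Hyperliquid's actual fee schedule or funding rates. The model also ignores slippage, latency and liquidation, and treats the notional as constant.

```python
def net_pnl(
    margin: float,
    leverage: float,
    price_move_pct: float,
    fee_rate: float = 0.0,
    funding_rate: float = 0.0,
    funding_periods: int = 0,
    is_long: bool = True,
) -> float:
    """Approximate PnL in USD of one leveraged perpetual position.

    fee_rate is charged on the notional at entry and at exit.
    funding_rate is per funding period; with a positive rate longs pay
    and shorts receive, which is the usual convention for perpetuals.
    """
    notional = margin * leverage
    direction = 1.0 if is_long else -1.0
    gross = direction * notional * price_move_pct / 100.0
    fees = 2.0 * notional * fee_rate
    funding_cost = direction * notional * funding_rate * funding_periods
    return gross - fees - funding_cost


if __name__ == "__main__":
    # Hypothetical trade: $1,000 margin at 10x, price rises 2%.
    pnl = net_pnl(1_000.0, 10.0, 2.0, fee_rate=0.0005, funding_rate=0.0001, funding_periods=3)
    print(f"net PnL: {pnl:.2f} USD")
    # Without any costs, a 10% adverse move at 10x equals the full margin.
    print(f"adverse move: {net_pnl(1_000.0, 10.0, -10.0):.2f} USD")
```

At 10x leverage, a 10% move against the position already equals the entire margin before any costs are counted. This is why the mandatory SL/TP levels and the chosen leverage matter so much for the drawdowns seen in the competition.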