AI Trading Competition: Are Chinese LLMs Dominating Returns?
Live analysis of Alpha Arena: Qwen3 and DeepSeek lead while Western models struggle
Follow the real-time AI trading competition where 6 leading LLMs manage $10,000 each on Hyperliquid. Early results show Chinese models Qwen3 and DeepSeek delivering strong performance while Gemini and GPT‑5 face significant drawdowns. This live experiment provides unprecedented insights into autonomous AI trading capabilities.
Competition Status: Chinese Models Leading Early Phase
Official announcements and press reports reveal some early patterns. They suggest why Qwen3 and DeepSeek performed better over the first days, and why Gemini and GPT‑5 fell behind.
Observed Patterns (Early Phase)
- Qwen3: Few, focused trades; rarely >2 positions; tight SL/TP ranges; high conviction.
- DeepSeek: Long bias, more assets, 10–15x leverage; visible stop discipline.
- Gemini: Very many trades; frequently maximum position count; premature exits despite SL/TP; lower conviction.
- GPT‑5: Broader, more cautious; several small positions; still significant drawdowns.
These patterns are snapshots. They can change with the market phase, volatility, and the agents' learning parameters. Any evaluation should therefore always state its date and source.
Methodology & Rules (Alpha Arena)
This is how the competition is set up – important for contextualising the results.
- Season: Season 1 has been live since 17/18 Oct 2025 and runs until 03 Nov 2025 (as of 2025-10-27).
- Starting Capital: $10,000 per model ($60,000 in total, live on-chain).
- Markets: Perpetuals on BTC, ETH, SOL, BNB, DOGE, XRP (Hyperliquid).
- Position Management: Up to 6 parallel positions possible (one per asset).
- Leverage: Competition band 10x–20x; each model chooses the leverage per trade.
- Risk Parameters: Mandatory Stop-Loss (SL) and Take-Profit (TP) per trade.
- Autonomy: No human intervention in decision logic or execution.
- Transparency: Live leaderboard with wallet/transaction insight; real-time updates.
More details and live data can be found directly at nof1.ai.
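The rules listed in this section lend themselves to a pre-trade check. The following is a minimal sketch in Python, assuming a hypothetical `Order` shape and a hypothetical `rule_violations` helper; neither reflects the actual Alpha Arena or Hyperliquid interfaces. Only the asset list, the 10x–20x band, the mandatory SL/TP and the cap of 6 positions come from the rules above. The ordering check on SL/TP prices is an added sanity check, not a stated rule.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits as stated in the rules above.
ALLOWED_ASSETS = {"BTC", "ETH", "SOL", "BNB", "DOGE", "XRP"}
MIN_LEVERAGE, MAX_LEVERAGE = 10.0, 20.0
MAX_OPEN_POSITIONS = 6


@dataclass
class Order:
    """Hypothetical order shape, not the Alpha Arena or Hyperliquid schema."""
    asset: str
    side: str  # "long" or "short"
    entry: float
    leverage: float
    stop_loss: Optional[float] = None
    take_profit: Optional[float] = None


def rule_violations(order: Order, open_positions: int) -> List[str]:
    """Return human-readable violations; an empty list means the order passes."""
    problems = []
    if order.asset not in ALLOWED_ASSETS:
        problems.append(f"asset {order.asset} is not in the competition universe")
    if order.side not in ("long", "short"):
        problems.append(f"unknown side {order.side!r}")
    if not MIN_LEVERAGE <= order.leverage <= MAX_LEVERAGE:
        problems.append(f"leverage {order.leverage}x is outside the 10x-20x band")
    if order.stop_loss is None or order.take_profit is None:
        problems.append("stop-loss and take-profit are both mandatory")
    elif order.side == "long" and not order.stop_loss < order.entry < order.take_profit:
        # Assumed sanity check: a long exits below entry on SL, above on TP.
        problems.append("long orders need stop_loss < entry < take_profit")
    elif order.side == "short" and not order.take_profit < order.entry < order.stop_loss:
        # Assumed sanity check: mirror image of the long case.
        problems.append("short orders need take_profit < entry < stop_loss")
    if open_positions >= MAX_OPEN_POSITIONS:
        problems.append("already at the maximum of 6 parallel positions")
    return problems


if __name__ == "__main__":
    order = Order("BTC", "long", entry=100.0, leverage=25.0, stop_loss=95.0)
    for problem in rule_violations(order, open_positions=2):
        print(problem)
```

Returning a list of violations instead of raising on the first one keeps every breach visible in a log, which suits the audit-style transparency the competition aims for.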
Early Results and Behaviour Profiles (as of 22–23 Oct 2025)
The charts show a reported 1-week snapshot as well as a normalised behaviour profile (derived from reports). Numbers are approximations; please refer to sources.
Source: nof1.ai (Live Leaderboard), Odaily (22.10.2025), BlockBeats (23.10.2025), 99Bitcoins (Oct 2025). Links see below.
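The sources do not state how the behaviour profile was normalised. One common option is min-max scaling per metric, sketched below in Python. The function name `min_max_normalise` and the input values are placeholders for illustration only; they are not reported figures for any model.

```python
from typing import Dict


def min_max_normalise(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values to the range [0, 1]; return 0.5 for each if all are equal."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 0.5 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


if __name__ == "__main__":
    # Placeholder inputs, NOT reported data: trades per day for three models.
    trades_per_day = {"model_a": 2.0, "model_b": 8.0, "model_c": 20.0}
    for name, score in min_max_normalise(trades_per_day).items():
        print(f"{name}: {score:.2f}")
```

Min-max scaling only ranks models relative to each other within one snapshot, so a normalised score cannot be compared across dates without the underlying raw values.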
Additional Visualisations (optional)
Equity curves and trade distribution are shown as illustrative placeholders; replace them with live data from the leaderboard if needed.
Model Profiles (Early Phase)
Brief profiles of participating LLMs – based on observed patterns and reports.
DeepSeek Chat V3.1
High trading frequency, diversification across all 6 assets, disciplined SL/TP setups, moderate to high leverage (10x–20x).
Qwen3 Max
Few, focused trades; rarely more than 2 parallel positions; tight SL/TP; high conviction on entry/hold.
Gemini 2.5 Pro
Many position changes, frequently maximum parallel positions; premature exits despite SL/TP; inconsistent execution.
GPT‑5
Broader, more cautious allocation; several smaller positions; still shows drawdowns, which reports attribute in part to operational execution weaknesses.
Claude Sonnet 4.5
At times a high cash allocation (≈70% in reports), hence lower volatility; reasonable but capped upside.
Grok 4
Active trading with higher risk; strong results possible when the regime fits.
Key Insights for Your Roadmap
What you can derive from Alpha Arena – regardless of whether you trade or evaluate autonomous agents in other domains.
Focus Beats Over-Trading
Few, clear bets and disciplined stops proved more robust in week 1 than frequent reshuffling.
Transparency is a Feature
On-chain trading and public telemetry enable real learning instead of a "black box".
Regime Dependence
Results depend on market environment – change the regime, change the winners.
Guardrails First
Define limits, approvals, escalations and documentation before you go live with agents.
Implement the observations in playbooks: Policy-as-Code, telemetry, reviews, budget limits.
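As one illustration of Policy-as-Code with a budget limit, the sketch below computes the maximum drawdown of an equity curve and maps it to an action. The function names and the thresholds (`warn_at=0.10`, `halt_at=0.25`) are hypothetical choices for this example, not values used by Alpha Arena.

```python
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the running peak.

    Assumes a non-empty sequence of positive account values.
    """
    if not equity:
        raise ValueError("equity curve must not be empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def guardrail_action(equity: Sequence[float],
                     warn_at: float = 0.10,
                     halt_at: float = 0.25) -> str:
    """Map the drawdown to a policy action; thresholds are illustrative."""
    drawdown = max_drawdown(equity)
    if drawdown >= halt_at:
        return "halt"        # stop the agent and require human approval
    if drawdown >= warn_at:
        return "escalate"    # notify a reviewer, keep trading
    return "continue"


if __name__ == "__main__":
    # Illustrative curve starting from the $10,000 budget; not real data.
    curve = [10_000, 11_000, 8_800, 9_500]
    print(guardrail_action(curve))
```

Keeping the thresholds as explicit parameters means the policy can be reviewed and versioned separately from the agent's trading logic.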
Challenges & Limitations
Important limitations before you interpret the results.
- Market Regime: Short-term trends can favour models with a long bias and high leverage; other phases reverse the picture.
- Time Period & Sample: A few days or weeks of data are statistically thin; only 6 models → high variance.
- Execution & Costs: Fees, funding, latency and slippage have real effects; details vary intraday.
- Rule Constraints: SL/TP mandatory, leverage limits; no human correction after entry.
- Transparency Limits: Leaderboard shows PnL/trades, but not always complete micro-metrics (e.g., exact trade counts).
- Naming: "GPT‑5" is mentioned in reports; separate OpenAI confirmation is not publicly available.
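The interplay of leverage and costs named in the list above can be made concrete with simple arithmetic. The sketch below approximates the return on posted margin for a long perpetual position. It is a simplification: notional is treated as constant, liquidation is ignored, and the fee and funding rates in the example are placeholders, not Hyperliquid's actual schedule.

```python
def return_on_margin(price_move: float,
                     leverage: float,
                     fee_rate: float = 0.0,
                     funding_rate: float = 0.0,
                     funding_periods: int = 0) -> float:
    """Approximate net return on margin for a long perpetual position.

    price_move: fractional price change from entry to exit (0.02 means +2%).
    fee_rate: fee per side as a fraction of notional (paid on entry and exit).
    funding_rate: funding paid per period as a fraction of notional.
    """
    gross = leverage * price_move
    costs = leverage * (2 * fee_rate + funding_rate * funding_periods)
    return gross - costs


if __name__ == "__main__":
    # The same -5% move at both ends of the 10x-20x competition band.
    for lev in (10, 20):
        print(f"{lev}x: {return_on_margin(-0.05, lev):+.0%} on margin")
    # A +2% move at 15x with a placeholder 0.05% fee per side.
    print(f"net: {return_on_margin(0.02, 15, fee_rate=0.0005):+.2%}")
```

Because costs scale with notional rather than margin, the same fee schedule weighs more heavily on a model that trades often at high leverage, which is one reason over-trading can hurt net performance.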
Conclusion
In the early phase, Qwen3 and DeepSeek dominate – driven by focused trades and more consistent risk management. Gemini and GPT‑5 struggle with drawdowns and inconsistent execution. This is exciting but not a final verdict: The experiment is short, volatile and regime-dependent. Use the data to sharpen your agent governance – not to make investment decisions.
Key Takeaways
- On-chain competition with real budgets provides rare transparency.
- Chinese models show higher conviction and focused position management.
- Over-trading and premature exits cost performance.
- Governance, limits, telemetry determine the success of autonomous agents.
Frequently Asked Questions (FAQ)
Is real capital actually being traded?
Yes. According to nof1.ai, each model manages a real $10,000 budget, and trades run on-chain via Hyperliquid. Leaderboard and curves are publicly viewable (see links).
Is "GPT‑5" confirmed?
Several reports mention "GPT‑5". A separate confirmation by OpenAI is not publicly available. We use the naming according to reporting and mark uncertainties.
Are there exact metrics for trades, positions, SL/TP?
Partially. Leaderboards, curves and posts provide insights. Detailed metrics (e.g., exact trade numbers) are only partially public; therefore the behaviour charts are normalised derivations with source citation.
Are the results investment recommendations?
No. This is an experiment with short duration and high risk. Results are regime-dependent and not statistically significant. No financial advice.
Which assets are traded, and with which leverage?
The models trade BTC, ETH, SOL, BNB, DOGE and XRP as perpetuals on Hyperliquid. The competition band is 10x–20x; the specific leverage is chosen by the model per trade. SL/TP are mandatory.
How transparent is the setup really?
Wallets and PnL curves are publicly viewable; the frontend view summarises trades/positions. On-chain transactions can be checked via the leaderboard links.
Does the evaluation consider fees, funding, slippage?
Yes. Trading is live on Hyperliquid, so fees, funding rates, latency and slippage have real effects on the PnL curve. These effects can fluctuate intraday.