Can a Large Language Model (LLM) successfully perform financial analysis better than a human, and which platform is better?
Daniel MacDougall
Trading performance and financial analysis: ChatGPT vs. Gemini AI
In financial trading and analysis, choosing an Artificial Intelligence (AI) model can significantly impact performance outcomes. Two prominent AI models, ChatGPT and Gemini, have garnered attention for their capabilities in informing trading strategies.
During the study, the LLM showed an advantage over human analysts thanks to the sheer volume of information it can process, and its prediction accuracy was on par with that of most savvy financial analysts. LLMs also do not burn out from long hours, and emotions do not cloud their decision-making.
Comparing ChatGPT & Gemini to current AI trading models:
Both AI platforms were compared to artificial neural networks (ANN). These ANNs are narrowly specialized, state-of-the-art machine learning models trained for financial analysis.
The research compared the AI models and the ANN on both equal-weighted and value-weighted portfolios.
The research indicated "that equal-weighted portfolios based on GPT predictions achieve a Sharpe ratio of 3.36, which is substantially larger than the Sharpe ratio of ANN-based portfolios (2.54) or logistic regression-based portfolios (2.05). (1)"
"In contrast, for value-weighted portfolios, we observe that ANN performs relatively better (Sharpe = 1.79) than GPT (1.47). (1)"
These results indicate that ChatGPT 4 can compete with current models trained specifically for financial analysis.
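To make the difference between the two weighting schemes concrete, here is a minimal Python sketch. The function names, market capitalizations, and returns are made up for illustration and are not taken from the study.

```python
def equal_weights(n):
    """Every stock gets the same share of the portfolio."""
    return [1.0 / n] * n


def value_weights(market_caps):
    """Each stock's share is proportional to its market capitalization."""
    total = sum(market_caps)
    return [cap / total for cap in market_caps]


def portfolio_return(weights, returns):
    """Weighted sum of the individual stock returns."""
    return sum(w * r for w, r in zip(weights, returns))


# Hypothetical example: one small, one mid-sized, and one large company.
market_caps = [100, 300, 600]     # illustrative values, arbitrary units
returns = [0.10, 0.02, -0.01]     # illustrative one-period returns

ew_return = portfolio_return(equal_weights(len(returns)), returns)
vw_return = portfolio_return(value_weights(market_caps), returns)
print(f"equal-weighted: {ew_return:.4f}, value-weighted: {vw_return:.4f}")
```

In this toy example the small company's strong return lifts the equal-weighted portfolio more than the value-weighted one, which is dominated by the large company. This is one reason the same predictions can produce different Sharpe ratios under the two schemes.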
Performance Metrics Comparison:
Key metrics such as the Sharpe ratio and alpha are crucial for assessing the trading performance of ChatGPT compared to Gemini.
Sharpe ratio: an investment's return in excess of the risk-free rate, divided by the volatility (standard deviation) of that return. A higher value means more return per unit of risk.
Alpha: an investment's return measured against a benchmark, used to gauge the skill and strategies of portfolio managers.
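As a rough illustration of these two metrics, the following Python sketch computes both from a short series of returns. It is a simplification: alpha here is just the average return above a benchmark, whereas research papers typically estimate it with a regression against risk factors. All names and numbers are hypothetical.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def simple_alpha(returns, benchmark_returns):
    """Average return above the benchmark (simplified: no beta or factor adjustment)."""
    differences = [r - b for r, b in zip(returns, benchmark_returns)]
    return statistics.mean(differences)


# Hypothetical periodic returns for a strategy and its benchmark.
strategy = [0.02, 0.01, 0.03, 0.00]
benchmark = [0.01, 0.01, 0.01, 0.01]

sharpe = sharpe_ratio(strategy)
alpha_bps = simple_alpha(strategy, benchmark) * 10_000  # 1 basis point = 0.01%
print(f"Sharpe ratio: {sharpe:.2f}, alpha: {alpha_bps:.0f} basis points")
```

Note that Sharpe ratios are usually annualized before being compared, so the raw per-period figure from this sketch is not directly comparable to the values quoted in the article.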
Based on the research findings, ChatGPT demonstrates a higher Sharpe ratio and alpha than Gemini, showcasing its superior performance in generating profitable trading strategies.
Numerical Comparison:
The numerical analysis further supports the argument for ChatGPT's superiority in trading performance. ChatGPT achieves a Sharpe ratio of 1.25 and an alpha of 37 basis points, while Gemini exhibits a Sharpe ratio of 1.10 and an alpha of 31 basis points. These numerical values highlight the stronger performance of ChatGPT in generating returns and outperforming market benchmarks compared to Gemini.
Gemini Pro achieves an accuracy of 59.15%, close to GPT's 61.05%, but Gemini's F1 score is around 62.23% compared with 65.82% for GPT 4.
"In this study, accuracy is the total correct predictions compared to total predictions. F1 is the harmonic mean of the precision and recall. (1)"
ChatGPT's EPS estimates were also around 60% accurate in predicting future results.
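For readers unfamiliar with the two scores, here is a minimal Python sketch of accuracy and F1 as defined above. The labels are hypothetical (1 could stand for "earnings will increase" and 0 for "earnings will decrease") and are not data from the study.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the actual outcome."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical actual outcomes and model predictions.
actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]

print(f"accuracy: {accuracy(actual, predicted):.2%}")
print(f"F1: {f1_score(actual, predicted):.2%}")
```

Accuracy treats every prediction equally, while F1 focuses on how well the positive class is identified, so the two scores can rank models differently.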
Consistency:
Consistency in trading performance is a critical factor for evaluating the reliability of AI models. ChatGPT demonstrates superior performance across different trading scenarios, consistently generating positive returns and outperforming market expectations. In contrast, Gemini's performance shows only marginal significance and more variability in trading outcomes, indicating a lower level of consistency than ChatGPT.
Conclusion:
The comparative analysis of trading performance between ChatGPT and Gemini AI models indicates ChatGPT's superiority in generating profitable trading strategies. With higher Sharpe ratios, alphas, and consistent performance outcomes, ChatGPT is the preferred choice.
While Gemini performed well compared to industry standards and ANN models, ChatGPT had the advantage in financial decision-making and forecasting.
(1) Alex G. Kim, Maximilian Muhn, Valeri V. Nikolaev. (2023). Financial Statement Analysis with Large Language Models. SSRN.