Last updated 3/22/23

The financial data used in this analysis was obtained from Yahoo Finance.

As a first step, I installed and loaded the tidyverse package, which bundles the tools used below for data import, cleaning, and visualization (including dplyr and ggplot2).

install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.0
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors

I proceeded to import the Tesla daily stock information into the TSLA_daily data frame.

TSLA_daily <- read.csv("TSLA (1).csv")

To check the column headers and data types and to get a first impression of the data, I ran head() and glimpse() on the data frame.

head(TSLA_daily)
glimpse(TSLA_daily)
## Rows: 3,205
## Columns: 7
## $ Date      <chr> "2010-06-29", "2010-06-30", "2010-07-01", "2010-07-02", "201…
## $ Open      <dbl> 1.266667, 1.719333, 1.666667, 1.533333, 1.333333, 1.093333, …
## $ High      <dbl> 1.666667, 2.028000, 1.728000, 1.540000, 1.333333, 1.108667, …
## $ Low       <dbl> 1.169333, 1.553333, 1.351333, 1.247333, 1.055333, 0.998667, …
## $ Close     <dbl> 1.592667, 1.588667, 1.464000, 1.280000, 1.074000, 1.053333, …
## $ Adj.Close <dbl> 1.592667, 1.588667, 1.464000, 1.280000, 1.074000, 1.053333, …
## $ Volume    <int> 281494500, 257806500, 123282000, 77097000, 103003500, 103825…
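
A few quick checks can complement head() and glimpse() at this stage, for example counting missing values per column and looking for repeated dates. The sketch below runs on a small made-up data frame (toy_prices and its values are hypothetical, not taken from the Tesla file) and uses base R only.

```r
# Made-up data with one missing price and one repeated date, for illustration only.
toy_prices <- data.frame(
  Date  = c("2010-06-29", "2010-06-30", "2010-06-30", "2010-07-02"),
  Close = c(1.59, 1.58, 1.58, NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(toy_prices))

# Number of rows whose date already appeared earlier in the data.
n_duplicate_dates <- sum(duplicated(toy_prices$Date))

print(na_per_column)
print(n_duplicate_dates)
```

The same two calls could be pointed at TSLA_daily to confirm that there are no gaps or repeated trading days before any modeling.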

The glimpse() output showed that the “Date” column was stored as character data (<chr>). To allow correct filtering, sorting, and plotting by date, I converted the column to the Date type.

TSLA_daily$Date <- as.Date(TSLA_daily$Date)

Next, I used the colnames() function to review the column headers.

colnames(TSLA_daily)
## [1] "Date"      "Open"      "High"      "Low"       "Close"     "Adj.Close"
## [7] "Volume"

I observed that every column name in the data set began with an uppercase letter, and that the “Adj.Close” column contained a period instead of an underscore. To streamline the data cleaning process, I installed the “janitor” package and used the clean_names() function to generate uniform headers that are all lowercase, with underscores (_) replacing any spaces or periods.

I saved the cleaned data frame as TSLA_daily2 and reviewed it with the glimpse() function to confirm that the headers were uniform and the data was ready for further analysis.

install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library("janitor")
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
TSLA_daily2 <- clean_names(TSLA_daily)
glimpse(TSLA_daily2)
## Rows: 3,205
## Columns: 7
## $ date      <date> 2010-06-29, 2010-06-30, 2010-07-01, 2010-07-02, 2010-07-06,…
## $ open      <dbl> 1.266667, 1.719333, 1.666667, 1.533333, 1.333333, 1.093333, …
## $ high      <dbl> 1.666667, 2.028000, 1.728000, 1.540000, 1.333333, 1.108667, …
## $ low       <dbl> 1.169333, 1.553333, 1.351333, 1.247333, 1.055333, 0.998667, …
## $ close     <dbl> 1.592667, 1.588667, 1.464000, 1.280000, 1.074000, 1.053333, …
## $ adj_close <dbl> 1.592667, 1.588667, 1.464000, 1.280000, 1.074000, 1.053333, …
## $ volume    <int> 281494500, 257806500, 123282000, 77097000, 103003500, 103825…
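
To make the header transformation concrete, the sketch below approximates it in base R for the column names in this data set. The helper approx_clean_names() is hypothetical and written only for illustration. It is not how janitor implements clean_names(), which handles many more cases (duplicate names, mixed case, special characters, and so on).

```r
# Rough base-R approximation of the renaming, for illustration only.
approx_clean_names <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)  # runs of non-alphanumeric characters -> "_"
  x <- gsub("^_+|_+$", "", x)         # drop leading or trailing underscores
  tolower(x)                          # lowercase everything
}

original_names <- c("Date", "Open", "High", "Low", "Close", "Adj.Close", "Volume")
cleaned_names <- approx_clean_names(original_names)
print(cleaned_names)
```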

The following code produces a scatter plot of Tesla’s daily closing prices from June 2010 to March 2023.

ggplot(data = TSLA_daily2) + geom_point(mapping = aes(x = date, y = close))

I then added a regression line and its equation to the plot to gain deeper insight into the data.

install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library("ggplot2")
install.packages("ggpubr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library("ggpubr")
model <- lm(close ~ date, data = TSLA_daily2)
ggplot(data = TSLA_daily2, aes(x = date, y = close)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(subtitle = paste0("Equation: y = ", round(coef(model)[1], 2),
                       " + ", round(coef(model)[2], 2), "x",
                       ", R-squared = ", round(summary(model)$r.squared, 2)))
## `geom_smooth()` using formula = 'y ~ x'
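
One detail worth keeping in mind when reading the equation in the subtitle: lm() treats a Date predictor as the number of days since 1970-01-01, so the slope is in price units per day and the intercept is the fitted price on that origin date. The sketch below illustrates this on a small synthetic series (the toy data frame and its values are made up for illustration and are not part of the Tesla data).

```r
# Synthetic series: starts at 10 and rises by exactly 0.5 per calendar day.
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 100)
toy <- data.frame(date = dates, close = 10 + 0.5 * (seq_along(dates) - 1))

fit <- lm(close ~ date, data = toy)

slope_per_day <- unname(coef(fit)["date"])                # price change per day
intercept_at_origin <- unname(coef(fit)["(Intercept)"])   # fitted price on 1970-01-01

# The intercept is the line extrapolated back to day 0 of the Date scale.
expected_intercept <- 10 - 0.5 * as.numeric(dates[1])

# A per-year slope is often easier to read than a per-day one.
slope_per_year <- slope_per_day * 365.25
print(c(per_day = slope_per_day, per_year = slope_per_year))
```

This also explains why the intercept of a price-on-date regression can be a large negative number: it describes a date decades before the first observation, not the starting price.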

To better understand Tesla’s recent stock price performance, I filtered the data to include only dates from January 1, 2020 onward (through March 2023, the end of the data set).

install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library("dplyr")
TSLA_filtered <- TSLA_daily2 %>%
  filter(date >= as.Date("2020-01-01"))  # compare Date with Date explicitly

Next, I generated a new scatter plot using the updated data frame TSLA_filtered and included a regression line. This line of best fit provides insight into the relationship between the date and stock price, offering a visual representation of any trends or patterns in the data. I also added a title and subtitle to clarify the filter on the data.

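# Refit the regression on the filtered data so the subtitle matches the plotted line
model_filtered <- lm(close ~ date, data = TSLA_filtered)
ggplot(data = TSLA_filtered, aes(x = date, y = close)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Tesla Closing Prices, January 2020 to March 2023",
       subtitle = paste0("Equation: y = ", round(coef(model_filtered)[1], 2),
                         " + ", round(coef(model_filtered)[2], 2), "x",
                         ", R-squared = ", round(summary(model_filtered)$r.squared, 2)))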
## `geom_smooth()` using formula = 'y ~ x'

The candlestick graph is another powerful tool for detailed analysis. It provides a visual representation of the daily high and low prices, as well as the open and close prices. With this graph, you can easily zoom in on specific areas of interest by clicking and dragging across the data. Additionally, hovering your mouse over a data point displays detailed information for that specific date, making it an excellent tool for in-depth analysis.

install.packages("plotly")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library("plotly")
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plot_ly(TSLA_filtered, x = ~date, type = "candlestick", open = ~open, high = ~high, low = ~low, close = ~close) %>%
  layout(title = "Tesla Stock Price - Candlestick Chart",
         xaxis = list(title = "", rangeslider = list(visible = FALSE)),
         yaxis = list(title = "Price")) %>%
  config(displayModeBar = FALSE)

After retrieving SPY stock prices from Yahoo Finance for June 29, 2010 through March 22, 2023, I loaded the data into a data frame named SPY_daily. Although my task was to review information from January 2020 through March 2023, I included the historical data to improve my overall understanding.

SPY_daily <- read.csv("SPY.csv")

While plotting the data, I observed that the dates in this file were not being recognized properly because they are written as month/day/year. To resolve this, I passed an explicit format string to as.Date() so that the column is parsed as dates.

SPY_daily$Date <- as.Date(SPY_daily$Date, format = "%m/%d/%Y")
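
The effect of the format argument can be seen on a few sample strings. This is a minimal sketch with made-up values (iso_dates and us_dates are hypothetical examples of the two layouts, not rows from the files). It also shows that a format that does not match the text produces NA values, which are worth counting before plotting or filtering.

```r
iso_dates <- c("2010-06-29", "2010-06-30")   # year-month-day layout
us_dates  <- c("03/21/2023", "03/22/2023")   # month/day/year layout

parsed_iso <- as.Date(iso_dates)                       # ISO layout parses by default
parsed_us  <- as.Date(us_dates, format = "%m/%d/%Y")   # explicit format for month/day/year
mismatched <- as.Date(us_dates, format = "%Y-%m-%d")   # wrong format: parsing fails

# Count values that failed to parse.
n_failed <- sum(is.na(mismatched))

print(parsed_us)
print(n_failed)
```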

I compared both stocks by reviewing their data back to 2010. To facilitate comparison, I plotted all the data from both stocks on a single graph using a script that included a regression line. The regression line was complemented with R-squared values and a formula for easy understanding of the data.

install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library("ggplot2")
install.packages("ggtext")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library("ggtext")
model_spy <- lm(Close ~ Date, data = SPY_daily)
model_tsla <- lm(close ~ date, data = TSLA_daily2)

ggplot() +
  geom_point(data = TSLA_daily2, aes(x = date, y = close), color = "blue") +
  geom_smooth(data = TSLA_daily2, aes(x = date, y = close), method = "lm", se = FALSE, color = "blue") +
  geom_point(data = SPY_daily, aes(x = Date, y = Close), color = "green") +
  geom_smooth(data = SPY_daily, aes(x = Date, y = Close), method = "lm", se = FALSE, color = "green") +
  # annotate() places a single fixed label, so no aes() mapping is needed
  annotate("text", x = as.Date("2014-12-01"), y = 420, color = "blue",
           label = paste0("Equation for TSLA: y = ", round(coef(model_tsla)[1], 2),
                          " + ", round(coef(model_tsla)[2], 2), "x",
                          ", R-squared = ", round(summary(model_tsla)$r.squared, 2))) +
  annotate("text", x = as.Date("2014-12-01"), y = 400, color = "green",
           label = paste0("Equation for SPY: y = ", round(coef(model_spy)[1], 2),
                          " + ", round(coef(model_spy)[2], 2), "x",
                          ", R-squared = ", round(summary(model_spy)$r.squared, 2))) +
  labs(subtitle = "TSLA vs SPY", x = "Date", y = "Close Price") +
  theme_bw()
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
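
For readers less familiar with the R-squared values shown on the plot, the sketch below computes R-squared by hand as one minus the ratio of residual to total variation, and checks it against the value reported by summary(). The x and y vectors are synthetic and chosen only for illustration.

```r
x <- 1:50
y <- 2 + 0.3 * x + 3 * sin(x)   # linear trend plus a deterministic wiggle

fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)   # variation left unexplained by the line
ss_tot <- sum((y - mean(y))^2)    # total variation around the mean

r2_manual  <- 1 - ss_res / ss_tot
r2_summary <- summary(fit)$r.squared

print(c(manual = r2_manual, summary = r2_summary))
```

A value near 1 means the points lie close to the line. A value well below 1 means much of the movement in the series is not captured by a straight-line trend.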

Here are some of the findings based on the regression analysis of the full 2010-2023 data:
-The slope for TSLA is lower than for SPY, meaning that TSLA’s price rose by fewer dollars per day on average. Because the slope is measured in dollars per day, this compares absolute price changes, not percentage returns.
-The R-squared value for TSLA is lower than for SPY, indicating that the regression line is a weaker fit for TSLA. TSLA’s prices vary more around the linear trend than SPY’s do.
-Overall, the regression line for SPY has a steeper slope and a higher R-squared value than TSLA’s, indicating that SPY’s price rose more steadily and, in absolute dollar terms, more quickly than TSLA’s over this period.
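
Because these slopes are measured in dollars per day, they depend on the price level of each series as well as on its growth rate. The sketch below uses two synthetic series (not the actual TSLA or SPY data) that grow by the same 0.1% per day but start at different levels. Their linear slopes differ by the ratio of the starting prices, while the slopes of the log prices are identical. Regressing log price on time is mentioned here only as one possible way to compare relative growth. It is not part of the original analysis.

```r
days <- 0:499
low_priced  <- 1   * 1.001^days   # starts at 1, grows 0.1% per day
high_priced <- 100 * 1.001^days   # starts at 100, grows 0.1% per day

# Slope of a simple linear regression of y on x.
slope <- function(y, x) unname(coef(lm(y ~ x))[2])

# Slopes in price units per day differ by the ratio of the price levels.
slope_ratio <- slope(high_priced, days) / slope(low_priced, days)

# Slopes of log prices measure relative growth and are the same for both series.
log_slope_low  <- slope(log(low_priced), days)
log_slope_high <- slope(log(high_priced), days)

print(slope_ratio)
print(c(low = log_slope_low, high = log_slope_high))
```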

Following this, I created a visualization for the dates in question (January 2020 - March 2023).

SPY_daily_filtered <- SPY_daily %>%
  filter(Date >= as.Date("2020-01-01"))  # compare Date with Date explicitly
model_spy_filtered <- lm(Close ~ Date, data = SPY_daily_filtered)
model_tsla_filtered <- lm(close ~ date, data = TSLA_filtered)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library("ggplot2")
ggplot() +
    geom_point(data = TSLA_filtered, aes(x = date, y = close), color = "blue") +
    geom_smooth(data = TSLA_filtered, aes(x = date, y = close), method = "lm", se = FALSE, color = "blue") +
    geom_point(data = SPY_daily_filtered, aes(x = Date, y = Close), color = "green") +
    geom_smooth(data = SPY_daily_filtered, aes(x = Date, y = Close), method = "lm", se = FALSE, color = "green") +
    labs(title = paste0("<span style='color: blue;'>TSLA</span> vs <span style='color: green;'>SPY</span>"),
         subtitle = paste0("Equation for TSLA: y = ", round(coef(model_tsla_filtered)[1], 2), 
                           " + ", round(coef(model_tsla_filtered)[2], 2), "x",
                           ", R-squared = ", round(summary(model_tsla_filtered)$r.squared, 2),
                           "\nEquation for SPY: y = ", round(coef(model_spy_filtered)[1], 2), 
                           " + ", round(coef(model_spy_filtered)[2], 2), "x",
                           ", R-squared = ", round(summary(model_spy_filtered)$r.squared, 2)),
         x = "Date", y = "Close Price") +
    theme_bw() +
    theme(plot.title = element_markdown())
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

As a final step, I calculated and compared the return on investment (ROI) for TSLA and SPY over the filtered period, defined as the last price divided by the first price, minus one.

# ROI = last price / first price - 1 (rows are assumed to be in ascending date order).
# Adjusted close is used for both tickers so the comparison is like for like;
# for TSLA it matches the close in the output above.
TSLA_ROI <- (TSLA_filtered$adj_close[nrow(TSLA_filtered)] / TSLA_filtered$adj_close[1]) - 1
SPY_ROI <- (SPY_daily_filtered$Adj.Close[nrow(SPY_daily_filtered)] / SPY_daily_filtered$Adj.Close[1]) - 1
ROI <- data.frame(
  Company = c("TSLA", "SPY"),
  ROI = c(TSLA_ROI, SPY_ROI)
)

ggplot(ROI, aes(x = Company, y = ROI, fill = Company)) +
  geom_col() +
  geom_text(aes(label = paste0(round(ROI * 100, 2), "%")), vjust = -0.5) +
  # Named values keep each ticker tied to its color regardless of ordering
  scale_fill_manual(values = c(SPY = "green", TSLA = "blue")) +
  ggtitle("ROI Comparison: TSLA vs. SPY") +
  ylab("ROI") +
  theme(plot.title = element_text(hjust = 0.5))

The graph illustrates that from January 2020 to March 2023, TSLA had a significantly higher ROI of 566.4%, compared to SPY’s ROI of 29.3%.
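
As a worked example of the ROI formula used here (last price divided by first price, minus one), the sketch below applies it to a short made-up price vector. The roi() helper and the toy_series values are hypothetical and serve only to show the arithmetic.

```r
# ROI of a price series: last price relative to first price, minus one.
# Assumes the vector is ordered from oldest to newest.
roi <- function(prices) {
  prices[length(prices)] / prices[1] - 1
}

toy_series <- c(50, 80, 150)                     # made-up prices for illustration
toy_roi <- roi(toy_series)                       # 150 / 50 - 1 = 2
toy_label <- sprintf("%.1f%%", 100 * toy_roi)    # formatted as a percentage

print(toy_label)
```

An ROI of 2 therefore means the price tripled: the gain is twice the starting value.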

Background: In 2020, Tesla’s stock price started to rise significantly due to several factors. One of the primary reasons for this surge was the company’s strong financial performance, which exceeded market expectations. Tesla reported profits for several consecutive quarters, and its revenue growth was impressive. This positive performance helped build investor confidence and led to increased demand for Tesla’s stock.

Another factor that contributed to Tesla’s rising stock price in 2020 was the company’s increasing dominance in the electric vehicle market. Tesla continued to expand its production and delivery capabilities, and it also unveiled several new products and technologies that generated excitement among investors.

Additionally, in 2020, Tesla was added to the S&P 500 index, a significant milestone that brought the company’s stock to the attention of many institutional investors. This inclusion also resulted in increased demand for Tesla’s stock, further driving up its price.

Finally, the overall market conditions in 2020, including low interest rates and unprecedented fiscal stimulus measures, also contributed to Tesla’s rising stock price. As investors looked for high-growth opportunities in a volatile market, Tesla’s strong financial performance and innovative products made it an attractive investment option.

Conclusion: The information from this analysis can be summarized as follows:
-Both TSLA and SPY have positive slopes, indicating a generally increasing trend in their stock prices over time.
-The slope of TSLA is higher than that of SPY, suggesting that TSLA has been increasing at a faster rate than SPY.
-The R-squared values for TSLA and SPY are 0.36 and 0.4, respectively, so a linear trend over time explains less than half of the variability in either price series. Factors other than the passage of time, which is the only predictor in these regressions, account for the rest.
-Tesla’s stock price grew strongly during the analyzed period, outpacing the broader market as represented by SPY.
-Overall, the lower R-squared value for TSLA suggests that it may be a more volatile and less predictable stock than SPY, while its higher slope and ROI suggest that it may offer higher returns. However, further analysis would be needed to confirm this.