How to Use Big Data Effectively in Investing


In this article, our objective is to raise awareness of how retail investors can use big data in investing. This is not a foolproof method, but it has worked wonders for our investments over the years. We conclude by detailing how to use big data effectively in investing.

My ex-boss once said, “If you’re doing what others are doing, you’ll be average at best.” That piqued my interest in exploring this direction further, and that’s how I arrived at big data, in particular the use of data analytics in finance.

In the post-crisis world, when big data in finance was still in its infancy, such datasets were scarce. As a result, most institutional and hedge fund investors relied solely on macro data and non-scientific guesswork as drivers for their forecasts.

For too long, retail investors have been investing based on “feeling”. Statements like “I invested in Starbucks because it was packed when I was there last week” are often cited as the main reason they invested in the first place.

But can we actually quantify it? Was it just that one store that was packed last week? How about the week before? What about other stores?




    What is big data?

    Big data is a collection of large data sets that can be analyzed programmatically to understand trends, patterns, behaviors and so on.

    The concept of big data often discourages investors from delving deeper, because of the perceived need for sophisticated computing and modelling skills. We beg to differ. While such skills help, there are still many ways for retail investors to do it themselves, as long as they can draw a connection between the data and the company’s drivers.

    The role of big data in investing

    The availability of big data today means investors can access data of almost any frequency and type. In investing, the primary objective of using big data is to forecast key drivers such as costs and revenues. Consequently, astute investors can make better-informed investing decisions to maximize returns. Hence our tagline: "Fundamental-driven research. Model-informed investing".

    A brief illustration of the relationship between big data and investing.

    In a way, utilizing big data is a continuation of our write-up on identifying red flags in investing. Not only does it enhance our understanding of industry trends, it also separates what is actually happening on the ground from what the market believes: a reality check. Gaps between the two are what create investment opportunities.

    Specifically, a technique called “nowcasting” is gradually being adopted by the investment community. Nowcasting is a type of forecasting that uses high-frequency data to assess the “now” rather than predict the future.

    Such techniques are not limited to the investment community; they have also been used by central banks and governments to assess their economies in real time for policy and planning purposes. A good example is the Federal Reserve Bank of Atlanta’s GDPNow. The model produces a running estimate of GDP growth for the current quarter, updated as new data are released, rather than waiting for the official quarterly release.
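
To make the idea concrete, here is a minimal sketch of one simple nowcasting approach, sometimes called a bridge equation: fit a line between past quarterly figures and the quarterly average of a higher-frequency indicator, then apply it to the months already observed in the current quarter. All numbers below are made up, the function names are our own, and this is not how GDPNow works internally.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def nowcast(history_indicator, history_target, months_so_far):
    """Estimate the current quarter's figure from the months observed so far."""
    intercept, slope = fit_line(history_indicator, history_target)
    return intercept + slope * mean(months_so_far)


# Made-up history: quarterly average of a monthly indicator (say, a store
# visits index) and the quarterly revenue (in $m) reported afterwards.
indicator_by_quarter = [100.0, 104.0, 98.0, 110.0, 107.0, 112.0]
revenue_by_quarter = [50.2, 51.8, 49.1, 55.3, 53.4, 56.1]

# Only two of the three months in the current quarter are available so far.
current_quarter_months = [115.0, 117.0]

estimate = nowcast(indicator_by_quarter, revenue_by_quarter, current_quarter_months)
print(f"Nowcast of current-quarter revenue: {estimate:.1f}")
```

The estimate can be refreshed every time a new monthly reading arrives, which is the whole point: you get a view of the quarter before the company reports it.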

    Being creative with big data in finance

    One of the things I really enjoy about using big data in finance is the creativity it allows. You can find almost anything (as long as the data exist) that connects with the underlying driver you’re trying to forecast.

    Let’s take car insurers as an example. Suppose you’re trying to figure out what claims costs will be next quarter. Since insurance claims are driven by accidents, we can drill down further into what drives claims. Here are a few ideas:

    • Number of vehicle accidents. Fairly straightforward.
    • Weather conditions. Rainy days are likely to see more accidents than sunny days, for example.
    • Inflation. Repair costs, which are a significant component of total claims, are driven in part by inflation.
    An example of modeling the claims cost of a car insurer using rainfall, inflation and accident data.
    Actual claims vs modeled claims.
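
As a rough sketch of how the three drivers above could be combined, the snippet below fits a multiple linear regression of quarterly claims on accidents, rainy days and inflation, then compares actual and modeled claims. The data are invented for illustration, the helper names (`solve`, `fit_ols`, `predict`) are our own, and the regression is solved with plain Python so nothing needs to be installed.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Modeled value for one observation."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Invented quarterly data: (accidents in thousands, rainy days, inflation in %).
drivers = [
    (12.0, 20, 2.0), (13.5, 31, 2.1), (11.0, 18, 2.3), (14.0, 26, 2.5),
    (12.5, 27, 2.8), (15.0, 33, 3.0), (11.5, 14, 3.2), (13.0, 22, 3.5),
]
claims = [88.6, 100.7, 83.8, 103.5, 96.6, 114.0, 86.9, 100.4]  # $m, invented

coefs = fit_ols(drivers, claims)
modeled = [predict(coefs, row) for row in drivers]

# R-squared: share of the variation in claims explained by the model.
mean_claims = sum(claims) / len(claims)
ss_res = sum((a - f) ** 2 for a, f in zip(claims, modeled))
ss_tot = sum((a - mean_claims) ** 2 for a in claims)
r_squared = 1 - ss_res / ss_tot

for actual, fitted in zip(claims, modeled):
    print(f"actual {actual:6.1f}   modeled {fitted:6.1f}")
print(f"R-squared: {r_squared:.3f}")
```

With real data the fit will be far noisier than this toy example, so it is worth checking the model on quarters it was not fitted on before relying on it.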

    Building a model for forecasts

    There are countless models which investors can use, from machine learning techniques to a simple linear regression.

    “All models are wrong, but some are useful” – George Box, British statistician.

    From a retail investor’s perspective, the goal is to get to the answer with the simplest method that still works, for example a regression model.

    Generally, I would select ~3 key factors that will drive my forecasts. In addition, one can also produce a range of forecasts to analyze the best- and worst-case scenarios.
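
A range of forecasts can be sketched by feeding different driver assumptions into the same model. The coefficients and scenario inputs below are hypothetical placeholders for a three-factor claims model, not estimates from real data.

```python
# Hypothetical coefficients from an already-fitted three-factor model:
# claims ($m) = intercept + accidents (thousands) * 5.0
#             + rainy days * 0.5 + inflation (%) * 4.0
COEFS = {"intercept": 10.0, "accidents": 5.0, "rainy_days": 0.5, "inflation": 4.0}

# Hypothetical assumptions for next quarter. For an insurer, the "best" case
# is the one with the lowest claims.
SCENARIOS = {
    "best": {"accidents": 11.5, "rainy_days": 15, "inflation": 2.5},
    "base": {"accidents": 13.0, "rainy_days": 24, "inflation": 3.0},
    "worst": {"accidents": 15.0, "rainy_days": 34, "inflation": 3.8},
}


def forecast(coefs, drivers):
    """Apply the linear model to one set of driver assumptions."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in drivers.items())


forecasts = {name: forecast(COEFS, drivers) for name, drivers in SCENARIOS.items()}
for name, value in forecasts.items():
    print(f"{name:>5}: {value:.1f}")
```

Keeping the scenarios in one table makes it easy to see which assumption moves the forecast most when you change it.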

    Finding free datasets

    Datasets can be costly to acquire, with some costing up to six figures per annum. For example, RSMetrics, a big data intelligence firm for businesses and investors, once used satellite imagery of JCPenney parking lots during the quarter to confirm that traffic into its stores across the country was in fact increasing. It goes without saying that satellite imagery does not come cheap!

    As retail investors, cost is the key consideration. Hence, we will only consider free sources. In my experience, despite being less granular, they provide a good starting point to work with. Here are a few free sources we use

    Obviously, there are many other free sources online. Feel free to drop us a comment if you think there is one we should add.

    How to use big data effectively in investing

    To use big data effectively in investing, investors first need to identify what drives a company. Even though there can be many chicken-and-egg situations, narrowing the list down to the drivers that matter most is the most impactful and time-efficient approach. For example, Amazon’s sales may loosely depend on the average disposable income of the population, but its website traffic is likely to be a more significant indicator of sales.
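
One simple way to narrow down candidate drivers is to rank them by how strongly they correlate with the figure you want to forecast. The sketch below does this for two candidates. The series are invented, the names are our own, and correlation is only a first screen, since two series that both trend upward can look related without one driving the other.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented quarterly series for illustration only.
sales = [100, 108, 103, 115, 121, 118, 130, 127]
candidates = {
    "website_traffic": [50, 55, 52, 58, 62, 60, 66, 64],
    "disposable_income": [40.0, 40.6, 40.2, 40.1, 40.9, 40.3, 40.8, 40.5],
}

correlations = {name: pearson(series, sales) for name, series in candidates.items()}

# Strongest relationship first, regardless of sign.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
for name in ranked:
    print(f"{name:<18} {correlations[name]:+.2f}")
```

The top-ranked candidates are the ones worth carrying into a regression; the rest can be dropped or revisited later.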

    Secondly, be creative with data! Because we’re cost conscious, our data-driven investing approach largely depends on what data are freely available. Being creative means finding alternative or proxy data that can present the same picture. In our earlier example of forecasting insurance claims, weather data are often easier to find than daily traffic data.

    Finally, start out with simple models before attempting more sophisticated ones for forecasts. In many ways, simple models are more tractable, easier to implement and less sensitive to changes in assumptions. In fact, most of the short-term estimates we use come from multiple linear regression models.

    In summary, models can only be as good as their data and assumptions. In fact, making the best use of datasets ranks above everything else, regardless of whether the data are free or paid. Understanding what drives companies in a data-driven manner not only helps build a big picture of how the economy works, but also generates actionable investment ideas.


