American statistician Richard Royall once said, “Statistics today is in a conceptual and theoretical mess”. Add to that the reproducibility crisis in science and the lack of proper statistical training among both data scientists and product managers, and you will understand how hard it is to make sense of data. Yet confident inferences and clear recommendations are exactly what is expected of data scientists. The good news is that there is a way to build teams and processes that get things done: disputed statistical techniques should be abandoned, context should become the focal point of all research, and data scientists should be a natural part of planning and decision-making. Below we briefly discuss each of these points.
Disputed Statistical Techniques Should Be Abandoned
It is fair to say that there is no unity among statisticians on how statistics should be done. There are various camps (frequentists, Bayesians, likelihoodists, etc.) and plenty of criticism and counter-criticism flying in all directions. While the situation might seem like a deadlock, the upside is that we can choose techniques based on this criticism and on our specific needs. Most statisticians, hopefully, follow the same “eclectic” approach. Statistics is a tool for making sense of data, and our recommendation is to be skeptical of any technique and, at the very least, abandon techniques that might get you into trouble. An outline is as follows:
- Avoid p-values when possible, especially when working directly with the business. It is perfectly fine to use them when analyzing the coefficients of regression models or for ad-hoc statistical tests, but do not communicate them to the business. If your stakeholders do not know what p-values represent, you will have a hard time explaining them. If they do know, you will have an even harder time arguing that changing thresholds after seeing the data is poor scientific practice. Anyone without rigorous statistical training is susceptible to the inverse fallacy, thinking that “a p-value of 0.05 means that there is a 95% chance that the treatment works”. You might jump from one Zoom call to another doing something similar to what the American Statistical Association did with its statement on p-values, but it feels like a waste of everyone’s time.
- Avoid giving people the freedom to choose prior distributions, and do not talk about priors when dealing with your stakeholders. You will know things have gone wrong when you see a long queue of data scientists and stakeholders waving various “Neyman Type A distributions” at you as their priors. Freedom to choose priors leads people to spend time choosing them, and there are much better ways to keep data scientists occupied (as Stanford statistician Steven Goodman said, “The numbers are where the scientific discussion should start, not end.”).
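The inverse fallacy is easy to demonstrate with a toy calculation that is not part of the original argument. Assume, hypothetically, that only 10% of the treatments we test have a real effect and that our tests have 80% power at α = 0.05. Bayes' rule then gives the probability that a treatment works given a significant result. A small Monte Carlo simulation of two-sided z-tests cross-checks the number. The function names and all parameter values below are illustrative assumptions.

```python
import random
from statistics import NormalDist


def prob_effect_given_significant(prior: float, power: float, alpha: float) -> float:
    """P(treatment works | p < alpha) in a toy world where each tested
    treatment either has one fixed real effect or no effect at all."""
    true_pos = prior * power          # real effects that reach significance
    false_pos = (1 - prior) * alpha   # null effects that reach significance by chance
    return true_pos / (true_pos + false_pos)


def simulate(prior: float, ncp: float, alpha: float, n_tests: int, seed: int = 0) -> float:
    """Monte Carlo check: run many hypothetical two-sided z-tests and return
    the share of significant results whose treatment truly works."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    significant = 0
    significant_and_works = 0
    for _ in range(n_tests):
        works = rng.random() < prior
        # z-statistic is centred at ncp for a real effect, at 0 otherwise
        z = rng.gauss(ncp if works else 0.0, 1.0)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            significant += 1
            significant_and_works += works
    return significant_and_works / significant


# Noncentrality that gives ~80% power for a two-sided test at alpha = 0.05
ncp = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)

print(round(prob_effect_given_significant(prior=0.1, power=0.8, alpha=0.05), 3))  # 0.64
print(round(simulate(prior=0.1, ncp=ncp, alpha=0.05, n_tests=100_000), 2))
```

Under these assumptions, a significant result means roughly a 64% chance that the treatment works, not 95%. The gap widens as the share of genuinely effective treatments shrinks.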
Some context is necessary, as one might argue that not much is left once we abandon all of the above. I am writing from the point of view of running statistics in the online mobile gaming industry. This industry is highly competitive. We run a lot of statistical tests, and most of them are unique in the sense that we rarely test or retest the same product many times. This puts us in a situation where we need to blend approaches developed for continuous testing (like that of Neyman and Pearson) with approaches tailored to specific tests (like subjective Bayesian analysis). We ended up using non-parametric Bayes, with some bits and pieces of Bayes factors here and there. In a nutshell, the fast-paced environment of online mobile games requires us to automate and streamline inferences wherever possible. It also requires the creation of useful, user-friendly tools, but that is a topic of its own.
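The article does not say which non-parametric Bayesian method the team uses. One well-known option is Rubin's Bayesian bootstrap, which reweights the observed data with Dirichlet(1, ..., 1) weights instead of assuming a parametric model for, say, heavily skewed player spend. The sketch below is a swapped-in illustration, not the author's method. The helper names and the synthetic spend data are hypothetical.

```python
import random


def bayesian_bootstrap_means(values, n_draws, rng):
    """Posterior draws of the mean via Rubin's Bayesian bootstrap:
    each draw reweights the observations with Dirichlet(1, ..., 1) weights."""
    draws = []
    for _ in range(n_draws):
        # Normalised Exponential(1) variables are Dirichlet(1, ..., 1) distributed
        weights = [rng.expovariate(1.0) for _ in values]
        total = sum(weights)
        draws.append(sum(w * x for w, x in zip(weights, values)) / total)
    return draws


def prob_b_beats_a(group_a, group_b, n_draws=1000, seed=0):
    """Posterior probability that the mean of group B exceeds the mean of group A."""
    rng = random.Random(seed)
    means_a = bayesian_bootstrap_means(group_a, n_draws, rng)
    means_b = bayesian_bootstrap_means(group_b, n_draws, rng)
    wins = sum(b > a for a, b in zip(means_a, means_b))
    return wins / n_draws


def synthetic_spend(payer_rate, n, rng):
    """Hypothetical per-player spend: mostly zeros, exponential tail (mean 20) for payers."""
    return [rng.expovariate(1 / 20) if rng.random() < payer_rate else 0.0 for _ in range(n)]


data_rng = random.Random(42)
control = synthetic_spend(payer_rate=0.05, n=500, rng=data_rng)
treatment = synthetic_spend(payer_rate=0.08, n=500, rng=data_rng)
print(prob_b_beats_a(control, treatment))
```

The code makes no distributional assumptions, so it works unchanged for conversion flags, session lengths or spend. That is one reason methods of this kind lend themselves to automation across many one-off tests.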
Context Should Become the Focal Point of All the Research
These days everyone talks about the importance of understanding the business or scientific context. Learning the methods and techniques of data science and statistics takes a lot of effort, so many students end up knowing regression and t-tests without realizing that both are meaningless on their own. First and foremost, data scientists should focus on the situation at hand and on a deep understanding of the relevant factors and processes. Our advice is to create a culture and environment in which data scientists want to learn the context of your business. Without understanding how the business operates and how its economy works, data scientists can quickly turn into “data monkeys” who report point estimates and endlessly create dashboards. Here are the main ideas:
- Hire science-oriented curious researchers who want to know how the world works. Build a robust interviewing process, test their programming skills and knowledge, but most importantly, ask them “research questions without answers” and see how they react.
- Monitor what motivates people: ask them what they do, how and why. Help them find their passion, stay curious and want to understand business processes.
- Embed data scientists into business teams. They should work closely with those who run the business and have direct access to information and people.
Data Scientists Should Be a Natural Part of Planning and Decision Making
When data scientists know that their opinion matters and that they can influence decisions, they feel ownership and responsibility. This boosts their commitment and helps design better products. Here are the main ideas:
- Projects should start with data scientists present in the room. This helps business teams avoid mistakes, such as launching products that were doomed to fail from the very beginning or running poorly designed experiments. A clear separation of responsibilities is needed. For example, data scientists can model the signal-to-noise ratio of proposed products and help design experiments, while business teams are in charge of designing products, strategic planning and making final decisions.
- Data scientists should become thinking partners of product managers. The days of “I-know-everything” experts are long gone, because there are simply too many aspects to any business decision or process. Diversity of ideas and perspectives helps teams improve and innovate, and data scientists should be part of discussions about products, features, offers, etc. They can bring scientific thinking to the table, help distill vague ideas into actionable insights and brainstorm ways to simulate or check assumptions.
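Modelling the signal-to-noise ratio of a proposed product can start with a simple question: how many players does an experiment need? Here is a minimal sketch, assuming a two-sided, two-sample z-test with equal arms and a roughly known standard deviation σ of the metric. Under those assumptions, the required sample size per arm is n = 2 · (z_(1−α/2) + z_power)² · (σ / δ)², where δ is the uplift the product team hopes for. The function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Players needed per arm for a two-sided, two-sample z-test to detect a
    true difference `delta` in a metric with standard deviation `sigma`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


def minimum_detectable_effect(n_per_arm: int, sigma: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest true difference detectable with the given power when each arm has n_per_arm players."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sigma * math.sqrt(2 / n_per_arm)


# Hypothetical proposal: lift a metric with sigma = 2.0 by delta = 0.05 (signal-to-noise 0.025)
print(sample_size_per_arm(delta=0.05, sigma=2.0))
# Hypothetical traffic limit: what could 10,000 players per arm detect?
print(round(minimum_detectable_effect(n_per_arm=10_000, sigma=2.0), 3))
```

In this example, a signal-to-noise ratio of 0.025 calls for roughly 25,000 players per arm. If a product cannot reach that traffic, the experiment is likely doomed before it starts, and that is best discovered at the planning table.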
We discussed how to run statistics in the wild and how to keep calm. The key takeaway is this: equip your data scientists with powerful tools and make them equal partners with the rest of the business. It will give you a great competitive advantage, as you will optimize processes faster, fail quickly and learn from your mistakes.
About the Author
Aleksandrs Gehsbargs is Director of the Games Data Science at Product Madness. He will speak at the Data Innovation Summit 2023. This is what he has to say about himself: “Since I was a teenager I was passionate about mathematics and helping people to learn and improve their skills. I studied in-depth mathematics and in parallel worked as a teacher for younger folks. After university I became excited about machine learning and worked in the field for many years, building various predictive and descriptive models. It was a great journey into the world of using advanced machine learning methods to improve business processes. With time I got more interested in statistics and was surprised to discover that statistics is both extremely useful and does not make any sense. Today I am leading a team of data scientists whose goal is to understand player behaviour by running AB tests, simulating it using Monte-Carlo modelling and diving into depths of data.”
The views and opinions expressed by the author do not necessarily state or reflect the views or positions of Hyperight.com or any entities they represent.