Hyperight

Scalable Big Data Modeling: How to model data to achieve scalability

Photo by Lukas on Pexels

Big data analytics gives businesses a competitive advantage. It has opened the door to unexplored opportunities for companies to collect and analyse data from numerous sources, and in turn to create product offerings around the precious insights they gain, reduce costs, or find innovative ways of working. However, for companies to get value from their data, they need to model it well so they can scale and support exponential data growth and increased data demand. Jayesh Patel, Sr. Data Engineer at Rockstar Games, will talk about Scalable Big Data Modeling in his presentation at the Data Innovation Summit.

Apart from sharing his expert knowledge in an article on ML Development using Feature Store published on our Read channel, Jayesh shared with us his views and expertise on big data modeling, how companies can achieve it, the challenges of traditional big data modeling, and the opportunities big data offers businesses, and the gaming industry in particular.

Hyperight: Hi Jayesh, we are excited to have you present at the virtual stage at the 5th edition of the Data Innovation Summit. To start, please tell us a bit about yourself, your background and experience.

Jayesh Patel, Sr. Data Engineer at Rockstar Games

Jayesh Patel: I am a big data professional and expert in emerging technologies such as Data Engineering, Big Data Modeling, Machine Learning, Artificial Intelligence, and Big Data Analytics. I work for Rockstar Games as a Senior Data Engineer, where I have served since 2017. I am primarily responsible for leading the design and development of data-driven processes and Artificial Intelligence solutions on the big data platform. I contributed significantly to building the data platform, which led to the success of Red Dead Redemption II and Grand Theft Auto Online.

Over the last 15 years, I have successfully designed complex data processes, architected machine learning pipelines, and developed big data analytics solutions across several industry domains, including Telecommunications, Public Transportation, Manufacturing, Software Services, Healthcare, Marketing Services, and Gaming. Additionally, I am an active senior member of the IEEE, and my expertise and research in the Big Data space have been well received at numerous international IEEE conferences. I am also an editorial board member of a renowned international journal, the International Journal of Data Mining & Knowledge Management Process (IJDKP), where I actively guide and review the research work of scholars and professors around the world.

Photo by Chris Liverani on Unsplash

Hyperight: Attendees will have the opportunity to hear you present on Scalable Big Data Modeling. As you state, "Due to cheap storage, exponential data growth, and increased data demand, traditional data modeling falls short in the big data platform." How can companies model their data well to achieve scalability and usability?

Jayesh Patel: The data model is the gateway to access valuable insights from the enterprise data platform. Companies can achieve scalability and usability by the following:

  • Architecture: Big data platforms and cloud data warehouses offer several options. NoSQL databases, real-time streaming platforms, microservices, and cloud data warehouses all offer versatility for data architecture. Architecture is the key aspect to consider for scalable big data modeling. I’ve covered more details in some of my research papers.
  • Data Model Lineage: An enterprise data lake can store billions or even trillions of data points, and these data points and metrics are accessed through a large number of data models. As the scale goes up, it often becomes a huge challenge to maintain data models and serve insights consistently. Data model lineage shows model dependencies, timelines, and maintenance notes. Data models always evolve over time, and lineage helps to enhance them effectively.
  • Data Model Registry: A data model registry presents which insights and metrics are available to stakeholders, analysts, and data scientists. It showcases the details of each data model, including its purpose, data sources, lineage, metric definitions, and business rules. It is critical for marketing your data models and making them usable, and it should be part of your Big Data Governance strategy.
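As a rough illustration of how lineage and a registry fit together, here is a minimal in-memory sketch in Python. The `DataModelEntry` and `DataModelRegistry` names, fields, and methods are hypothetical choices for this example only, not any actual production system described in the interview:

```python
from dataclasses import dataclass, field

@dataclass
class DataModelEntry:
    """One entry in a (hypothetical) data model registry."""
    name: str
    purpose: str                  # what business question the model answers
    data_sources: list            # raw tables/streams the model reads
    upstream_models: list         # lineage: other models this one depends on
    metric_definitions: dict      # metric name -> business definition
    maintenance_notes: list = field(default_factory=list)

class DataModelRegistry:
    """Catalog of data models with simple transitive lineage lookup."""
    def __init__(self):
        self._entries = {}

    def register(self, entry: DataModelEntry):
        self._entries[entry.name] = entry

    def lineage(self, name, seen=None):
        """Return all transitive upstream dependencies of a model."""
        seen = set() if seen is None else seen
        for parent in self._entries[name].upstream_models:
            if parent not in seen:
                seen.add(parent)
                if parent in self._entries:
                    self.lineage(parent, seen)
        return seen

registry = DataModelRegistry()
registry.register(DataModelEntry(
    name="daily_sessions",
    purpose="Sessions per player per day",
    data_sources=["raw_events"],
    upstream_models=[],
    metric_definitions={"sessions": "count of distinct session ids"}))
registry.register(DataModelEntry(
    name="d7_retention",
    purpose="Seven-day retention",
    data_sources=["daily_sessions"],
    upstream_models=["daily_sessions"],
    metric_definitions={"d7": "share of players active 7 days after install"}))
```

Even a toy registry like this makes the two ideas concrete: the entry fields document purpose, sources, and definitions for discoverability, while `lineage()` answers the maintenance question of what a model depends on.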
Photo by Emile Perron on Unsplash

Hyperight: What are the challenges that organizations face when relying on traditional data modeling?

Jayesh Patel: There are quite a few challenges with traditional data modeling. Some of the notable challenges are:

  • Scalability: Traditional data models don’t scale well. Enterprise data growth has skyrocketed in the last decade; we didn’t experience such growth in the 20th century. Traditional data models store data in relational databases, which require expensive, robust servers to prevent data loss. Cost and maintenance become a monumental challenge when handling this scale with traditional data models.
  • Variety: According to projections from IDC, 80 per cent of worldwide data will be unstructured by 2025. A large chunk of big data is still in an unstructured format. Traditional data modeling techniques are not good at processing unstructured data.
  • Time to Value: Traditional data modeling follows a design-first, then development, process. The conceptual, logical, and physical data model phases are very useful for communicating design details, but they add time before real value can be seen from the data models. Big data platforms also serve exploratory analysis, predictive modeling, and forecasting, and traditional modeling takes more time to fulfil these use cases.

Hyperight: How has big data transformed the way business decisions are made in the digital era?

Jayesh Patel: Big data opened the door to unexplored opportunities. You may have heard of Coca-Cola’s customer acquisition and retention use case. Using big data, they learned what their target consumers are passionate about, then served branded content aligned with those passions by creating advertising that speaks differently to different audiences. They achieved more than they expected. From the outside, one might assume there is no correlation between the music we listen to and Classic Coke. You never know where your data will take you.

This was just one example, but there are plenty of use cases all around us. Big data has not only transformed decision making in marketing; it has had a significant impact on various business verticals. According to a 2018 Accenture study, 83% of enterprise executives have pursued big data projects to gain a competitive edge. Human judgment was the key driver of business decisions decades ago; big data, AI, and analytics have transformed that. Data-driven decisions are making a big wave, and big data is the force behind it.

Photo by Markus Spiske on Unsplash

Hyperight: You are currently working in a leading capacity as a Sr. Data Engineer at Rockstar Games – the video game publisher of Red Dead Redemption and Grand Theft Auto, as all gaming enthusiasts would be familiar with. Could you outline the impact of big data and artificial intelligence in the gaming industry?

Jayesh Patel: As I said earlier, data insights give a competitive edge to any business, and gaming is no exception. Enormous volumes of varied, high-velocity data are processed to understand different aspects of games, features, player behaviour, and much more. Grand Theft Auto V was recently launched on the Epic Games Store, and the store went down minutes after the launch. That is the demand for a seven-year-old title, and data-driven title updates are one of the key drivers of its success.

Hyperight: What are the trends with big data engineering that we can expect in the next several years?

Jayesh Patel: Big data analytics is evolving at a very fast pace. Data demand has increased significantly in the last few years with new use cases in machine learning, predictive scoring, and artificial intelligence. Several trends are evident in big data engineering to meet this demand.

  • Data Engineering Pods: Big data engineering involves data pipelines, data modeling, data export, and machine learning. Each of these can be resource-intensive for medium to large enterprises, and with the increasing number of big data use cases, the number of data engineering tasks has grown even faster. One visible trend is a clear strategy to break data engineering down further and assign pods to specialized tasks, which streamlines the agile delivery of big data solutions.
  • Cloud Adoption: Maintenance and technical upgrades are major aspects of distributed computing. Big data pipeline processes that engineers built three or four years ago are already legacy processes due to advancements in new technologies. Even large enterprises have realized this, and more and more of them will adopt cloud platforms in the next few years.
  • Real-Time Processing: Stakeholders want critical data points in real time, which used to be a challenge. Velocity can matter more than data volume because it provides a competitive advantage: oftentimes it is better to have limited data in real time than a large volume of data at higher latency. In the coming years, real-time processing will evolve and become increasingly popular.
Photo by Danial RiCaRoS on Unsplash

Hyperight: And lastly, what expert advice would you give to companies when they are considering technology for their big data lakes?

Jayesh Patel: We often get frustrated when we misplace something we need at a given moment. Before the digital era, we used analogue telephones and organized phone numbers in a paper directory. Even a well-maintained directory was fragile: if someone messed up a single name or page, it created issues. Millennials will laugh at how contacts were shared, maintained, and backed up in those days. With cell phones and next-generation applications, things changed significantly; smartphones now have far more capability than we need from them.

This analogy should make us rethink and revise our enterprise big data strategy.

As new technologies evolve to solve a multitude of problems, enterprises should always be open to new advancements. Because big data is often characterized by the 5 Vs (Volume, Variety, Velocity, Veracity, and Value), a thorough analysis of use cases and the value proposition should come before any choice of technology. Organizing big data is more critical than arranging contacts in a phone or classifying books in a library. Enterprises should choose a platform that can effectively manage these characteristics and add significant business value.

