In this interview, we speak with Dmytro Basan from Klarna! Dmytro shares key strategies for designing scalable, high-performance data systems.
As data grows exponentially, scalable architectures are essential for transforming information into actionable insights. He will also take the stage at the Data Innovation Summit 2025 to discuss how organizations can build systems that drive smarter decision-making.
Scaling systems means balancing performance and flexibility – navigating big data, automation, and the right tools. Discover how to turn data into powerful insights, and stay tuned for more from Dmytro!
Hyperight: Can you share your journey and what drives your approach to engineering leadership?

Dmytro Basan: My journey into engineering leadership began with a curiosity about how data drives growth. At a hospitality group for digital nomads, I led the creation of a property acquisition system, using data on market trends, real estate, and guest satisfaction to identify top expansion properties. This showed me how data turns challenges into clear actions. At a SaaS company, I led analytics efforts to transform service data into insights. By analyzing user behavior and usage trends, we improved products, enhanced experiences, and optimized billing. This taught me an important lesson –
“When data is applied with purpose, it becomes a powerful tool for identifying opportunities to optimize both product outcomes and business strategies.”
As an Engineering Manager at Klarna, I collaborate with over 100 stakeholders to align priorities and deliver customer-focused solutions using data-driven frameworks. Feedback, analytics, and benchmarks guide me in solving user challenges and driving growth.
As the founder of DrillUp, I help engineers build meaningful careers by tailoring growth paths to their skills, goals, and circumstances, emphasizing technical and interpersonal development.
My leadership philosophy centers on trust, empathy, and transparency. By fostering open communication and a people-first culture, I guide teams to create impactful, data-driven solutions that meet business and customer needs.
Hyperight: At the upcoming Data Innovation Summit 2025, you will present on building scalable systems for actionable intelligence. What can the delegates expect from your presentation?
Dmytro Basan: With years of experience as an engineering leader, I’ll share strategies for turning datasets into real-world outcomes. Key takeaways:
- Real-world success stories. Explore how industries like hospitality and SaaS use data to drive actions, optimize resources, and enhance user experiences with AI-powered systems.
- Overcome data transformation challenges. Learn solutions for tackling big data complexities – integrating diverse sources, breaking down silos, and advancing from basic analysis to predictive and prescriptive insights.
- Build scalable frameworks. Discover the architectures and tools needed to support high-demand systems, handling 10,000+ requests per second without sacrificing performance.
- AI’s evolving role. See how AI-driven agents are transforming operations with proactive monitoring, self-healing, and resource optimization.
- Turn insights into impact. Understand how to apply insights to business goals, driving measurable results and fostering a data-driven decision-making culture.
This session goes beyond theory, offering insights from real-world engineering environments. Combining technical expertise with strategic foresight and practical examples, it will give attendees the confidence to tackle data transformation challenges.
Hyperight: In your experience, what are the biggest challenges organizations face when transforming big data into actionable intelligence?
Dmytro Basan: Transforming big data into actionable intelligence is complex, requiring a multi-faceted approach that balances data management, tools, team capabilities, and organizational alignment. The potential rewards are immense, but so are the challenges.
1. Navigate diverse data types and structures
Modern enterprises handle several data types. Structured data, such as transactional tables, supports straightforward analytics thanks to consistent schemas. Unstructured data, such as social media posts and reviews, requires advanced techniques like NLP and image recognition. Semi-structured data, such as JSON logs and sensor outputs, blends organization with unpredictability. Each type requires a tailored strategy:
- Structured data. Focus on efficient modeling, integrating systems into cohesive warehouses or lakehouses, and ensuring high data quality.
- Semi-structured data. Employ flexible schemas, adaptive parsing strategies, and scalable storage solutions.
- Unstructured data. Leverage advanced tools to identify patterns in complex, unconventional sources.
Building scalable, cloud-based infrastructures early – like modern ETL/ELT frameworks, robust pipelines, and distributed storage – sets the stage for seamless analysis. Without this foundation, even top analytics tools fail to provide meaningful value.
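To make the semi-structured case concrete, here is a minimal Python sketch that normalizes JSON log records with drifting field names into one canonical shape before they land in storage. The field names and fallbacks are purely illustrative assumptions, not a specific production schema.

```python
import json
from datetime import datetime, timezone

# Hypothetical canonical shape for an event record.
CANONICAL_FIELDS = {"event_id": None, "occurred_at": None, "source": "unknown", "payload": {}}

def normalize_event(raw: str) -> dict:
    """Parse one semi-structured JSON record and coerce it into the canonical shape.

    Unknown fields are preserved under `payload` instead of being dropped,
    so downstream consumers can still reach them as the schema evolves.
    """
    record = json.loads(raw)
    event = dict(CANONICAL_FIELDS)
    event["event_id"] = record.pop("id", None) or record.pop("event_id", None)
    ts = record.pop("timestamp", None) or record.pop("occurred_at", None)
    event["occurred_at"] = ts or datetime.now(timezone.utc).isoformat()
    event["source"] = record.pop("source", "unknown")
    event["payload"] = record  # whatever is left over rides along untouched
    return event

if __name__ == "__main__":
    sample = '{"id": "42", "timestamp": "2025-01-01T00:00:00Z", "source": "mobile", "country": "SE"}'
    print(normalize_event(sample))
```

The point is less the code than the habit: decide the canonical shape early, and let everything unpredictable ride along in a catch-all payload rather than silently disappear.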
2. Break down data silos
As data sources multiply – spanning customer interactions, operational logs, and market intelligence – fragmentation across departments often becomes a bottleneck. Disconnected data silos limit the ability to derive cohesive insights. To address this, organizations need to prioritize:
- Data governance frameworks. Standardize definitions so metrics like “churn rate” or “customer lifetime value” mean the same across teams (a minimal sketch of a shared metric definition follows this list).
- Cross-functional data literacy. Equip teams – from engineers to product managers – with the skills to interpret and access insights.
- Unified platforms. Use tools like data virtualization to integrate siloed data, ensuring accessibility and a comprehensive view.
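To illustrate what “standardize definitions” can look like in practice, here is a deliberately tiny sketch of a shared metric definition that every team imports instead of re-deriving on its own. The formula shown is one common convention, not a universal standard.

```python
# A single, versioned definition of "churn rate" that every team imports,
# instead of each department re-implementing the metric with its own rules.

def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Churned customers as a share of the customer base at period start."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start

# Example: 1,000 customers at the start of the month, 37 cancelled.
print(f"Monthly churn: {churn_rate(1_000, 37):.1%}")  # -> 3.7%
```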
3. Advance beyond basic analytics
Many organizations begin with descriptive analytics – dashboards and KPIs that outline what happened. While foundational, this is just the starting point. To unlock deeper value, businesses must embrace advanced analytics:
- Diagnostic analytics. Understand why something happened through root-cause analysis or performance correlation.
- Predictive analytics. Use historical data to forecast demand or predict customer needs.
- Prescriptive analytics and AI. Develop models that recommend specific actions, such as reducing churn through targeted interventions or optimizing engineering workflows (a small predictive-to-prescriptive sketch follows this list).
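As a minimal sketch of the step from descriptive to predictive and prescriptive analytics, the example below trains a toy churn-risk classifier with scikit-learn on synthetic usage features and attaches a simple intervention rule. The features, labels, and threshold are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic training data: [logins_last_30d, support_tickets], label = churned (1) or not (0).
rng = np.random.default_rng(0)
X = rng.integers(0, 30, size=(200, 2)).astype(float)
y = ((X[:, 0] < 5) & (X[:, 1] > 3)).astype(int)  # toy rule standing in for real history

model = LogisticRegression().fit(X, y)

# Predictive: score current customers by churn probability.
current = np.array([[2.0, 6.0], [25.0, 0.0]])
risk = model.predict_proba(current)[:, 1]

# Prescriptive (simplified): flag high-risk accounts for a retention intervention.
for features, p in zip(current, risk):
    action = "offer retention call" if p > 0.5 else "no action"
    print(f"logins={features[0]:.0f}, tickets={features[1]:.0f} -> churn risk {p:.2f}: {action}")
```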
This evolution requires upskilling. Building interdisciplinary teams of data scientists, machine learning engineers, and domain experts – supported by workshops and mentorship programs – helps embed advanced analytics into decision-making processes.
4. Ensure compliance and trust
With regulations like GDPR and HIPAA, handling sensitive customer data demands rigorous privacy measures. Data must be anonymized, tokenized, or aggregated to protect individual privacy while maintaining analytic value. Beyond legal compliance, earning customer trust is crucial; breaches can harm a company’s reputation and negate the value of its insights.
5. Scale for massive, distributed data
Handling the sheer scale of modern data – generated across global interactions and systems – requires powerful infrastructures. Traditional methods falter under such demands, but scalable, cloud-native architectures can rise to the challenge. Key strategies include:
- Distributed computation frameworks. Tools like Spark or distributed SQL engines efficiently process vast datasets (see the sketch after this list).
- Metadata management. Rich catalogs make navigating massive stores faster and easier.
- Automation and orchestration. Automating data pipelines ensures timeliness and reduces manual overhead.
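For the distributed computation point, a minimal PySpark sketch of a daily rollup over a large event dataset might look like this; the storage paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal Spark job: aggregate raw events into daily per-country counts.
spark = SparkSession.builder.appName("daily-event-rollup").getOrCreate()

events = spark.read.parquet("s3://data-lake/raw/events/")  # hypothetical path

daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_time"))
    .groupBy("event_date", "country")
    .agg(F.count("*").alias("events"),
         F.approx_count_distinct("user_id").alias("users"))
)

daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://data-lake/curated/daily_counts/"
)
spark.stop()
```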
6. Operationalize insights for impact
Data is only valuable when it drives action. Operationalizing insights means embedding them into everyday workflows and long-term strategies. Examples include:
- Real-time dashboards. Delivering actionable metrics to engineering teams for rapid issue resolution.
- Predictive models. Powering recommendation engines to personalize customer experiences.
- Scenario planning tools. Helping leaders evaluate market changes and make strategic decisions.
Achieving this requires both technological and cultural shifts, with cross-functional collaboration ensuring that insights actually drive growth. The key challenges remain harmonizing data, advancing analytics, ensuring compliance, and embedding insights into workflows; success lies in balancing infrastructure, scalable tools, data literacy, and actionable outcomes.
Hyperight: How do you approach ensuring data consistency and integrity when orchestrating data from multiple sources?
Dmytro Basan: Ensuring data consistency and integrity across multiple internal sources is a multi-layered challenge. My approach emphasizes clear governance frameworks, well-defined data models, proactive validation, and continuous stakeholder alignment. Here’s how I would structure the solution:
1. Establish a robust, evolving data model
- Unified schema and canonical data model. Start by defining a canonical data schema as the “source of truth” for all output data. This schema should be created collaboratively by engineering, data architects, and key stakeholders like product, customer success, and data-consuming teams.
- Schema versioning and governance. Implement a robust versioning strategy to control, document, and ensure backward compatibility of data model updates. Changes should undergo a formal review process with input from all data-producing teams and key stakeholders.
- Centralized data dictionary. Keep an up-to-date data dictionary and metadata repository to ensure all teams understand field definitions, values, transformations, and lineage, preventing inconsistent data interpretations.
2. Data governance and quality assurance
- Automated validation and testing. Use tools for schema validation at ingestion to reject or quarantine malformed records and alert the right team immediately (a minimal sketch follows this list).
- Profiling and monitoring. Profile data to detect shifts, missing fields, or anomalies. Real-time dashboards and alerts enable quick issue resolution.
- Data lineage and provenance tracking. Trace data to its source with lineage tools, allowing engineers to pinpoint and address inconsistencies efficiently, fostering transparency and faster resolution.
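A minimal sketch of schema validation at ingestion, here using the jsonschema library, could look like the following. The order schema and the quarantine handling are illustrative assumptions rather than a specific production setup.

```python
import json
from jsonschema import validate, ValidationError

# Hypothetical canonical schema for an incoming order event.
ORDER_SCHEMA = {
    "type": "object",
    "required": ["order_id", "amount", "currency"],
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "minLength": 3, "maxLength": 3},
    },
}

def ingest(raw: str, accepted: list, quarantine: list) -> None:
    """Validate one record; route malformed data to quarantine instead of the pipeline."""
    record = json.loads(raw)
    try:
        validate(instance=record, schema=ORDER_SCHEMA)
        accepted.append(record)
    except ValidationError as err:
        quarantine.append({"record": record, "error": err.message})  # alert the owning team here

accepted, quarantine = [], []
ingest('{"order_id": "A-1", "amount": 99.5, "currency": "SEK"}', accepted, quarantine)
ingest('{"order_id": "A-2", "amount": -4, "currency": "SEK"}', accepted, quarantine)
print(len(accepted), "accepted,", len(quarantine), "quarantined")
```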
3. Incremental, scalable architecture
- Decoupled ingestion and transformation layers. Use a data lake or centralized storage as a raw data landing zone, applying consistent transformation logic downstream.
- Schema enforcement in transformation pipelines. Leverage tools like Apache Kafka with schema registry or Apache Flink for real-time validation against canonical schemas, preventing inconsistent data from passing through.
- Event-driven and microservices principles. Design ingestion and transformation with event-driven principles and microservices, enforcing schema compatibility at service boundaries to minimize propagation of bad data.
4. Cross-functional collaboration and documentation
- Regular cross-team reviews. Schedule periodic sessions with data-producing teams to align on schema changes, data quality incidents, and lessons learned.
- Clear documentation protocols. Maintain a centralized portal with guidelines for adding fields, updating schemas, and introducing data sources, including rationale, impact analysis, and migration plans.
- Feedback loops with stakeholders. Engage stakeholders early to validate schema changes, communicate timelines, and ensure modifications support business needs and data consistency.
5. Continuous improvement and adherence to best practices
- Scalable and flexible design. Ensure the architecture adapts to data growth using domain-driven design, data contracts, and schema evolution.
- Tooling and automation. Automate data cataloging, schema management, quality checks, and lineage with top-tier tools, enabling engineers to focus on improvements.
- Customer focus and responsiveness. Maintain feedback channels, address issues promptly, and keep stakeholders updated on fixes and improvements.
Ensuring data consistency in a multi-source environment requires strong governance, a refined canonical schema, automated validations, and collaboration. This stabilizes pipelines, builds trust, and drives better decisions.
Hyperight: What are some key considerations when designing systems that can handle peaks of up to 10,000 requests per second?
Dmytro Basan: Imagine launching a new data streaming platform for thousands of users. Everything’s fine until traffic surges to 10,000 requests per second. Without proper monitoring or scalable architecture, the system struggles – latency spikes, the database overloads, and users leave frustrated. Your team scrambles to react, but lacks the metrics to make informed decisions.
Now, consider an alternate scenario – one where the system handles such peaks gracefully, ensuring performance, reliability, and scalability.
1. Begin with monitoring and observability
Effective systems start with strong observability. Every service tracks key metrics like throughput, CPU usage, and latency at percentiles like P99. Distributed tracing follows requests across load balancers, caches, and databases, helping spot bottlenecks in real time. Alerts tied to Service Level Objectives (SLOs) notify you when latency exceeds thresholds, such as 200ms at P99, before users notice. This visibility enables quick, informed responses to issues.
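As a small illustration of the SLO idea, the snippet below computes P99 latency over a window of observed requests and flags a breach of a 200 ms objective. In a real system this check lives in the monitoring stack rather than in application code, and the simulated latencies are synthetic.

```python
import numpy as np

P99_SLO_MS = 200.0  # Service Level Objective for request latency

def check_latency_slo(latencies_ms: np.ndarray) -> bool:
    """Return True if the P99 of the observed latencies breaches the SLO."""
    p99 = np.percentile(latencies_ms, 99)
    if p99 > P99_SLO_MS:
        print(f"ALERT: P99 latency {p99:.0f} ms exceeds {P99_SLO_MS:.0f} ms SLO")
        return True
    return False

# Simulated window of request latencies: mostly fast, with a 2% slow tail.
rng = np.random.default_rng(1)
window = np.concatenate([rng.normal(80, 15, 9_800), rng.normal(400, 50, 200)])
check_latency_slo(window)
```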
2. Design for stateless, horizontal scalability
Stateless services are crucial for scalable architecture. Offloading session data to stores like Redis keeps servers interchangeable, enabling seamless scaling. With container orchestration platforms like Kubernetes or AWS EKS, your system adjusts instances based on demand. Asynchronous I/O in frameworks like Node.js ensures predictable performance during surges, allowing smooth scaling without downtime.
3. Leverage caching and optimize data access
High traffic often leads to heavy reads, which can be managed with a solid caching strategy. Use a global CDN for static assets, Redis for frequent data, and read replicas for relational databases like Postgres or Aurora. For extreme workloads, NoSQL databases like DynamoDB scale with low-latency reads. Precomputing queries and using materialized views reduce database strain, keeping your system responsive even under intense load.
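One common way to apply this is the cache-aside pattern. The sketch below uses Redis via redis-py, assuming a Redis instance on localhost, with a placeholder function standing in for a read against Postgres or a read replica.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 60

def fetch_product_from_db(product_id: str) -> dict:
    """Placeholder for a read against Postgres/Aurora (or a read replica)."""
    return {"id": product_id, "name": "example", "price": 19.9}

def get_product(product_id: str) -> dict:
    """Cache-aside: try Redis first, fall back to the database, then populate the cache."""
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round-trip
    product = fetch_product_from_db(product_id)
    r.setex(key, CACHE_TTL_SECONDS, json.dumps(product))  # populate with a TTL
    return product

print(get_product("sku-123"))
```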
4. Use event-driven architectures for streaming
To avoid overloading your main application, use event brokers like Kafka or Kinesis to buffer incoming data and decouple producers from consumers. Consumer groups can scale independently, processing events at their own pace. Backpressure mechanisms prevent slower consumers from causing delays, ensuring smooth, predictable streaming performance.
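A minimal consumer-side sketch with confluent-kafka illustrates the idea: consumers in the same group share partitions, and a bounded local buffer provides simple backpressure by pausing fetches when processing falls behind. The broker address, topic name, and thresholds are assumptions for illustration.

```python
import queue
import threading
from confluent_kafka import Consumer

work = queue.Queue(maxsize=1_000)  # bounded buffer between consuming and processing

def process_items():
    while True:
        value = work.get()
        # ... parse, enrich, and write the event downstream ...
        work.task_done()

threading.Thread(target=process_items, daemon=True).start()

# Consumers sharing the same group.id split partitions, so adding instances scales reads horizontally.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumption: a local broker
    "group.id": "stream-processor",
    "enable.auto.commit": False,
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])  # hypothetical topic

paused = False
while True:
    # Backpressure: stop fetching while the local buffer is full, resume once it drains.
    if work.full() and not paused:
        consumer.pause(consumer.assignment())
        paused = True
    elif paused and work.qsize() < 500:
        consumer.resume(consumer.assignment())
        paused = False

    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    work.put(msg.value())
    consumer.commit(message=msg, asynchronous=True)
```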
5. Implement intelligent autoscaling
With proper observability, autoscaling policies become proactive. Metrics like CPU usage, request throughput, and latency inform scaling decisions in real time. For instance, when P99 latency starts to rise, new instances are provisioned before users feel the impact. As traffic decreases, instances scale down to save costs. This elasticity ensures efficiency while maintaining performance during peak loads.
6. Continuously stress-test and refine
Load testing is key to predicting system behavior under stress. Simulate peak traffic in a controlled environment while monitoring for anomalies in metrics, logs, and traces. This helps identify inefficiencies, like unoptimized database queries or poorly tuned cache TTLs. Iterative improvements ensure your system meets performance targets and adapts to changing needs.
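A load test for this kind of scenario can be sketched with Locust; the endpoints and user counts below are illustrative, and you would point it at a staging environment while watching metrics, logs, and traces.

```python
from locust import HttpUser, task, between

class StreamingUser(HttpUser):
    """Simulated user for a hypothetical streaming API; endpoints are illustrative."""
    wait_time = between(0.5, 2)

    @task(3)
    def browse_catalog(self):
        self.client.get("/api/catalog?page=1")

    @task(1)
    def fetch_stream_manifest(self):
        self.client.get("/api/streams/trending/manifest")

# Run against staging while watching P99 latency and error rates, e.g.:
#   locust -f loadtest.py --headless -u 5000 -r 200 --host https://staging.example.com
```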
By prioritizing observability, designing stateless services, optimizing data access with caching, adopting event-driven pipelines, and using data-driven autoscaling, you can handle 10,000 requests per second. This approach ensures a fast, reliable user experience during traffic spikes, building trust and keeping your platform ahead of demand.
Hyperight: How do you balance the need for real-time insights with the demands of maintaining system performance and reliability?
Dmytro Basan: In any tech company, knowing when to use real-time insights versus batch updates is crucial. Here’s a guide to identifying what demands immediacy and where delayed processing is more pragmatic.
Real-Time Must-Haves: For Decisions That Can’t Wait
Fraud detection in e-commerce. When a suspicious transaction occurs, quick action is vital. Fraud detection systems must evaluate and block high-risk transactions instantly, as even a short delay could lead to significant financial losses. The time frame for intervention is seconds, not hours.
Dynamic media trends. Consider a scenario where a new show on your streaming platform starts trending. Real-time insights can trigger targeted recommendations and promotional pushes, capitalizing on user excitement as it peaks. Waiting too long risks missing the buzz entirely.
Batch or Near-Real-Time: For Less Urgent Needs
Financial rollups. Your finance team doesn’t need up-to-the-minute updates for daily revenue figures. A nightly batch process that aggregates transactions and reconciles accounts by morning is both cost-effective and sufficient for their needs.
Marketing campaigns. Customer segmentation updates don’t need to happen instantly. Weekly or nightly updates ensure profiles are enriched and ready for campaigns without overburdening your infrastructure.
Why Balance Matters
Cost efficiency. Real-time systems like Kafka or in-memory databases come with high costs. Batch processes, on the other hand, are economical and can run during off-peak hours.
Complexity and reliability. Real-time pipelines add complexity and potential points of failure. Batch workflows, by contrast, are simpler to manage and allow for recovery with minimal impact.
User experience. Overloading systems with unnecessary real-time processes can degrade overall performance. Prioritize real-time efforts only where they enhance user satisfaction directly.
Strategies for Optimizing Data Processing
1. Mixed-tier architecture:
- Real-time: Use event-driven pipelines for critical triggers like fraud prevention or trending content.
- Near-real-time: Aggregate data in smaller windows (e.g., hourly updates) for non-critical metrics.
- Batch processing: Reserve for deep analysis or official reports that aren’t time-sensitive.
2. Adaptive caching: Update caches frequently (e.g., every five minutes for top-selling items). This approach strikes a balance between responsiveness and resource efficiency (a small sketch follows this list).
3. Failover and observability: Ensure robust monitoring and fallback mechanisms. If real-time systems falter, serve cached data temporarily to maintain service continuity.
4. Incremental rollouts: Start with the most time-sensitive cases and expand as infrastructure and processes stabilize.
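As a small illustration of adaptive caching, the sketch below picks a TTL per cache key based on how hot the key is; the thresholds and request counters are illustrative assumptions.

```python
# Hypothetical requests-per-hour counters per cache key, e.g. derived from access logs.
request_counts = {"top-sellers": 12_000, "category:garden": 150, "faq-page": 4}

def adaptive_ttl(requests_per_hour: int) -> int:
    """Hot keys get short TTLs so data stays fresh; cold keys are cached longer to save work."""
    if requests_per_hour > 10_000:
        return 5 * 60          # e.g. top-selling items refresh roughly every five minutes
    if requests_per_hour > 100:
        return 30 * 60
    return 6 * 60 * 60         # rarely requested: a somewhat stale value is acceptable

for key, count in request_counts.items():
    print(f"{key}: TTL {adaptive_ttl(count) // 60} min")
```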
A Practical Contrast
Real-time example: A news publisher leverages real-time insights to display trending stories within minutes of a viral tweet, maximizing user engagement and ad revenue.
Batch example: The same publisher conducts monthly analyses of user behavior to refine long-term strategy – no urgency, just accuracy.
By aligning insights with the right processing method, you not only optimize resources but also deliver timely and relevant data. Real-time capabilities are essential for certain high-stakes use cases, but batch processes often suffice for broader business operations. This balance is the cornerstone of a smart, cost-effective data strategy.
Hyperight: What emerging trends in data engineering and AI do you believe will significantly impact the scalability and performance of systems?
Dmytro Basan: The next wave of scalability and performance innovation is driven by AI-powered autonomous agents. These agents revolutionize the management of high-traffic platforms, SaaS apps, and large-scale processing by going beyond traditional monitoring, offering predictive, adaptive, and self-healing capabilities for system optimization. Here’s a breakdown:
1. Transition from monitoring to autonomous AI agents
Legacy monitoring tools like Datadog and Prometheus depend on static thresholds and manual interventions. AI agents, however, dynamically analyze trends, predict anomalies, and adjust resources autonomously.
Key mechanisms:
- AI agents employ time-series forecasting models (e.g., Prophet, ARIMA, LSTM) to identify spikes and seasonal patterns (see the forecasting sketch after this list).
- They utilize reinforcement learning (RL) to refine scaling strategies based on real-time feedback.
- Graph neural networks (GNNs) map service dependencies, enabling holistic optimization across microservices.
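As a minimal example of the forecasting piece, here is a Prophet sketch that fits hourly request counts and projects the next two days. The synthetic history stands in for data pulled from a metrics store, and using the upper forecast bound as a conservative capacity target is one simple way to feed an autoscaler.

```python
import numpy as np
import pandas as pd
from prophet import Prophet

# Hypothetical hourly request counts for the last 30 days (in practice, from metrics storage).
history = pd.DataFrame({
    "ds": pd.date_range("2025-01-01", periods=30 * 24, freq="h"),
    "y": 1_000 + (np.arange(30 * 24) % 24) * 50,  # toy daily seasonality
})

model = Prophet()            # captures trend plus daily/weekly seasonality
model.fit(history)

future = model.make_future_dataframe(periods=48, freq="h")  # forecast the next two days
forecast = model.predict(future)

# yhat_upper can serve as a conservative capacity target for an autoscaler.
print(forecast[["ds", "yhat", "yhat_upper"]].tail(5))
```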
Architectural shifts:
- Centralized monitoring transitions to distributed agents embedded in service mesh layers such as Istio.
- Integration with tools like Kubernetes and cloud APIs empowers agents to autonomously scale resources, deploy replicas, or throttle traffic.
Operational considerations:
- Cold start challenge: AI agents need historical data for accuracy, which can be a hurdle for new systems.
- Trust issues: Teams may prefer a hybrid model initially, blending automation with human oversight.
2. Predict and manage flash spikes
Flash spikes, such as Black Friday traffic surges, often lead to inefficient over-provisioning. AI agents can handle these surges more effectively.
How it works:
- Sequence models (e.g., Transformers, LSTMs) predict traffic patterns well in advance.
- Unsupervised algorithms (e.g., DBSCAN, Isolation Forest) detect anomalies like unexpected marketing-induced spikes (see the sketch after this list).
- Multi-armed bandit algorithms allocate resources in real time, optimizing cluster usage.
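A small anomaly-detection sketch with scikit-learn’s Isolation Forest shows the flavour of this approach: fit on normal traffic, then flag observations that fall outside it. The features and numbers are synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Minute-level traffic features: [requests_per_sec, error_rate]; synthetic normal behaviour.
rng = np.random.default_rng(2)
normal = np.column_stack([rng.normal(3_000, 300, 1_000), rng.normal(0.01, 0.002, 1_000)])

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# New observations: a typical minute and a marketing-induced spike with elevated errors.
new = np.array([[3_100, 0.011], [9_500, 0.05]])
labels = detector.predict(new)  # 1 = normal, -1 = anomaly
for row, label in zip(new, labels):
    print(f"rps={row[0]:.0f}, errors={row[1]:.3f} -> {'ANOMALY' if label == -1 else 'ok'}")
```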
Benefits:
- Reduced downtime: Accurate predictions prevent resource shortages.
- Cost savings: Dynamic scaling ensures resources match demand, eliminating waste.
3. Real-time anomaly detection for reliability
High-traffic platforms require proactive anomaly detection to prevent cascading failures. AI agents excel in this domain.
Capabilities:
- Autoencoders and Bayesian networks flag metric deviations (e.g., latency, CPU usage).
- Causal inference pinpoints root causes in complex systems.
- Autonomous responses include triggering circuit breakers or deploying fallback services.
Implementation:
- Event-driven architectures leverage tools like Kafka or AWS Kinesis for real-time processing.
- Chaos engineering pipelines (e.g., Gremlin) simulate failures, enhancing resilience.
4. Optimize resource use in batch processing
AI agents optimize enterprise batch jobs like ETLs or financial reconciliations, addressing scalability bottlenecks.
Approach:
- Constraint optimization algorithms (e.g., Linear Programming, Genetic Algorithms) prioritize job scheduling (a toy allocation example follows this list).
- Federated learning shares resource usage patterns while preserving data privacy.
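To give a flavour of the constraint-optimization angle, here is a toy linear program with SciPy that splits a fixed pool of CPU-hours across batch jobs by business priority. The jobs, priorities, and capacity figures are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Three hypothetical batch jobs (ETL, reconciliation, reporting):
priority = np.array([5.0, 3.0, 1.0])       # business value per CPU-hour
cpu_needed = np.array([40.0, 60.0, 80.0])  # CPU-hours each job could usefully consume
total_cpu = 100.0                          # cluster capacity available tonight

# Maximize priority-weighted work: linprog minimizes, so negate the objective.
result = linprog(
    c=-priority,
    A_ub=np.ones((1, 3)),   # total allocation must fit the cluster
    b_ub=[total_cpu],
    bounds=list(zip([0, 0, 0], cpu_needed)),
    method="highs",
)

for job, hours in zip(["etl", "reconciliation", "reporting"], result.x):
    print(f"{job}: {hours:.0f} CPU-hours")
```

With these numbers the solver fills the highest-priority jobs first (40 and 60 CPU-hours) and defers the low-priority reporting run, which is exactly the behaviour you want from an automated scheduler under contention.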
Outcomes:
- Faster job completion through smart resource allocation.
- Enhanced throughput during peak loads by mitigating resource contention.
Black Friday case study: AI agents in action
Scenario: A retailer anticipates 5x traffic on Black Friday, exacerbated by viral promotions. Traditional static scaling fails to address unexpected surges, causing downtime or overspending.
AI-powered solution:
- Forecasting: LSTM models predict 7x traffic by analyzing historical trends and live user behaviors, such as cart additions.
- Scaling: RL models dynamically allocate cloud resources, using cost-effective spot instances for non-critical tasks.
- Anomaly management: Autoencoders flag and reroute low-value bot traffic, prioritizing legitimate users.
Results:
- Zero downtime: Real-time scaling and intelligent routing ensured consistent performance, inspired by Shopify’s Black Friday strategies.
- Cost reduction: Spot instance utilization and precise forecasting reduced costs.
- Improved response times: Intelligent traffic routing and caching cut response times.
- Enhanced user experience: Anomaly management improved conversion rates, mirroring Walmart’s AI-driven prioritization strategies.
AI-driven agents are not just tools but transformative enablers for modern infrastructure, offering a powerful edge in a competitive, high-demand digital landscape.
Hyperight: How do you see the role of AI evolving in enhancing the scalability and performance of data systems in the coming years?
Dmytro Basan: The future of scalable systems will be tied to AI, which I believe is set to play an increasingly central role in enhancing both the scalability and performance of modern data systems. As someone who manages large-scale engineering teams, I see this evolution unfolding across three key dimensions:
1. AI as proactive monitoring and optimization tools
AI will evolve from reactive monitoring to proactive agents, detecting bottlenecks, inefficiencies, and scalability issues before they occur. By analyzing telemetry data, AI can spot patterns, flag anomalies, and predict demand spikes. This proactive approach helps mitigate risks and lets engineering teams optimize resources in real time.
2. Deeper insights and smarter decision-making
AI offers deep insights into system usage, identifying essential services, underutilized resources, and potential redundancies. Think of AI as a “virtual architect” or “digital CTO”, guiding teams to efficient, scalable designs. It enforces best practices, uncovers inefficiencies like duplicated processes, and recommends restructuring to align with business needs, fostering a culture of streamlined architecture.
3. Self-healing systems for resilience
The next frontier lies in AI-driven self-healing systems. When performance or scalability issues emerge, AI-powered tools can execute automated solutions, whether by following pre-defined playbooks or adapting dynamically based on the specific scenario. This capability not only reduces downtime but also ensures that systems remain resilient under heavy loads, delivering consistent performance when it matters most.
4. Smarter database and workflow optimization
In large-scale databases, AI can deliver optimizations that far outpace what human teams could feasibly achieve. By analyzing query patterns, storage use, and indexing strategies, AI can propose and implement improvements like optimized sharding or schema restructuring. It can even adjust the physical arrangement of data to ensure the system operates at peak efficiency, reducing latency and improving reliability.
Final thought: unlocking innovation for developers
Ultimately, AI isn’t about replacing engineers; it’s about amplifying their capabilities. By reducing the burden of managing complex, large-scale systems, AI empowers developers to focus on innovation and tackling higher-order challenges. With AI handling the routine but critical tasks, engineering teams can direct their energy toward creating value and driving progress.
As a leader, I’m focused on fostering a balance between innovation and pragmatism. While AI holds incredible promise, it’s essential to approach its implementation with a clear, strategic vision – leveraging its strengths where they deliver the greatest impact.

Don’t miss Dmytro’s presentation at the Data Innovation Summit 2025! He’ll explore how to build scalable, high-performance data systems that turn big data into actionable intelligence. Join him to gain valuable insights, learn from real-world experiences, and uncover best practices for designing scalable data architectures, ensuring data consistency, and optimizing high-throughput systems to meet modern business demands.