Social media, applications, networks, connections… These terms have been part of our everyday lives for many years, yet few people understand in detail how they relate to one another, or the structure of relationships that underlies the entities in a social network.
With Network Analysis, data scientists can identify, visualize and analyze entities, or nodes, and their relationships, where an entity can be a person, a transaction, a company, etc. Social network analytics is especially valuable in areas where traditional techniques are becoming less effective, such as identifying fraudulent behaviour. We spoke with Naveed Ahmed Janvekar, Senior Data Scientist at Amazon, about what Network Analysis is and why it is essential for fraud detection.
Naveed is one of the speakers at the 2022 edition of the Data Innovation Summit, which will take place in May in Stockholm. Read the interview ahead of his presentation at the event, and register to hear more on this topic.
Hyperight: Can you please introduce yourself and tell us about your professional background and current focus?
Naveed Ahmed Janvekar: I am currently working as a Senior Data Scientist at Amazon in the United States. In my current role, I develop machine learning solutions to detect and prevent abuse on Amazon’s marketplace. I have a master’s degree in Information Science from The University of Texas at Dallas. In the past, I have worked with companies such as Fidelity Investments, as a Java Developer, and KPMG, as a Business Intelligence Developer. My current work focuses on classification algorithms, clustering, active learning and graph networks.
Hyperight: During the Data Innovation Summit 2022, you will share more on the topic “Understand your data using Network Analysis”. Can you tell us a bit about what the delegates can expect from the presentation?
Naveed Ahmed Janvekar: This presentation will focus on ways to transform data into graphs, or social networks, and on extracting different types of graph features, such as local structural features (node degree, PageRank) and node embeddings, for downstream machine learning tasks such as model training and/or clustering.
Hyperight: Can we start with what Network Analysis is and how it can be used in fraud detection?
Naveed Ahmed Janvekar: Network analysis helps us understand the structure of relationships within a social network, which is formed by connecting entities, also known as nodes (for example, people or computer servers), through common relationships, also known as edges (for example, two people sharing a phone number). Through network analysis, we can identify aspects such as key influencers within a social network and groups of well-connected entities with common relationships. In the fraud space, bad actors could be connected through common relationships, such as a shared IP address, and by using such information, we can improve machine-learning-driven detection of bad actors.
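To make the two structural features mentioned above concrete, here is a minimal pure-Python sketch that computes node degree and PageRank (by power iteration) on a small undirected graph. The edge list and node names are hypothetical, and in practice a dedicated graph library would normally do this work; the sketch only shows what the features measure.

```python
from collections import defaultdict

def build_undirected(edges):
    """Return adjacency sets for an undirected graph given (u, v) pairs; self-loops are skipped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def node_degree(adj):
    """Degree = number of distinct neighbours of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on an undirected graph.

    Every node created by build_undirected has at least one neighbour,
    so there are no dangling nodes to special-case here.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Teleport term shared equally by all nodes.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Each node splits its damped rank evenly among its neighbours.
        for node in nodes:
            share = damping * rank[node] / len(adj[node])
            for nbr in adj[node]:
                new_rank[nbr] += share
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical toy graph: "a" is connected to everyone else.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
adj = build_undirected(edges)
print(node_degree(adj))  # {'a': 3, 'b': 2, 'c': 2, 'd': 1}
print(pagerank(adj))
```

Degree only counts direct connections, while PageRank also rewards being connected to well-connected nodes, which is why the two are often used side by side as features.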
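As an illustration of connecting entities through a shared attribute, the sketch below links accounts that logged in from the same IP address and then finds groups of connected accounts. The account IDs and IP addresses are hypothetical, and connected components are used here as one simple notion of "groups of well-connected entities", not necessarily the approach used in any production system.

```python
from collections import defaultdict
from itertools import combinations

def shared_attribute_edges(records):
    """records: iterable of (entity_id, attribute_value) pairs.

    Returns undirected edges between every pair of entities that share a value.
    """
    by_value = defaultdict(set)
    for entity, value in records:
        by_value[value].add(entity)
    edges = set()
    for entities in by_value.values():
        # Sorting makes each edge appear once, as (smaller, larger).
        for u, v in combinations(sorted(entities), 2):
            edges.add((u, v))
    return edges

def connected_groups(edges):
    """Return connected components (as sets of nodes) via iterative DFS."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            group.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(group)
    return groups

# Hypothetical login records: (account, IP address).
logins = [
    ("acct1", "10.0.0.1"), ("acct2", "10.0.0.1"),
    ("acct3", "10.0.0.2"), ("acct2", "10.0.0.3"),
    ("acct4", "10.0.0.3"), ("acct5", "10.0.0.9"),
]
edges = shared_attribute_edges(logins)
print(sorted(edges))           # [('acct1', 'acct2'), ('acct2', 'acct4')]
print(connected_groups(edges))
```

Note that acct1 and acct4 never shared an IP directly but end up in the same group through acct2. Pairing every entity that shares a value is quadratic in the group size, so very common values (for example a shared corporate gateway) may need to be capped or filtered in real data.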
Hyperight: You emphasize the importance of understanding how entities are connected within data, and of finding the most influential entities in a graph. Why is this important?
Naveed Ahmed Janvekar: For example, suppose we construct a social network of customers and sellers on a marketplace. We could then generate an importance score for every customer using degree centrality, profile each customer based on that score, and run targeted marketing or ad campaigns accordingly.
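A minimal sketch of the scoring step described above, assuming a list of hypothetical customer-seller interactions: degree centrality is a node's degree divided by the maximum possible degree (n - 1), and the tier thresholds used for profiling are arbitrary illustrative values, not recommended cut-offs.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree divided by (n - 1), where n is the number of nodes in the graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def score_tier(score, high=0.5, medium=0.25):
    """Bucket a centrality score into a profile; thresholds are hypothetical."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"

# Hypothetical customer-seller interactions.
interactions = [
    ("cust_a", "seller_x"), ("cust_a", "seller_y"),
    ("cust_b", "seller_x"), ("cust_c", "seller_x"),
]
scores = degree_centrality(interactions)

# Rank only the customer nodes by their score.
customers = sorted(
    (node for node in scores if node.startswith("cust_")),
    key=lambda node: scores[node],
    reverse=True,
)
for customer in customers:
    print(customer, scores[customer], score_tier(scores[customer]))
```

Here cust_a interacts with two of the four other nodes, so its score is 2 / 4 = 0.5, while seller_x reaches 0.75 as the most connected node overall.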
Hyperight: In your session description, you say that features generated from the graph can be used for downstream machine learning tasks. How does this work?
Naveed Ahmed Janvekar: Once a graph network has been constructed, features such as degree centrality and the local clustering coefficient can be generated and given as input variables to downstream machine learning models. These network-based features can potentially improve the performance of machine learning models. Additionally, we can generate node embeddings, which can either be used as features for supervised machine learning or be clustered to identify groups of entities that share similar embeddings.
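The sketch below shows one way such network-based features might be turned into per-node input rows: raw degree plus the local clustering coefficient, i.e. the fraction of a node's neighbour pairs that are themselves connected. The graph is a hypothetical toy example, and the rows would still need to be joined with other features before model training; node embeddings are not covered here.

```python
from collections import defaultdict
from itertools import combinations

def build_undirected(edges):
    """Adjacency sets for an undirected graph; self-loops are skipped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def local_clustering(adj, node):
    """Share of neighbour pairs that are linked: 2 * links / (k * (k - 1))."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; 0 by convention
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def feature_rows(adj):
    """One row per node: [degree, local clustering coefficient]."""
    return {node: [len(adj[node]), local_clustering(adj, node)] for node in adj}

# Hypothetical graph: a triangle a-b-c with an extra node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
features = feature_rows(build_undirected(edges))
for node, row in features.items():
    print(node, row)
```

Node c has three neighbours but only one of the three possible links among them (a-b), so its coefficient is 1/3, whereas a and b sit inside a closed triangle and score 1.0.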
Hyperight: What are some advantages and disadvantages of machine learning fraud detection over other, more traditional fraud analysis processes?
Naveed Ahmed Janvekar: The advantage of a machine-learning-based approach to fraud detection is the ability to adapt quickly to the evolving fraud landscape. Traditional fraud analysis approaches mostly rely on data analysis to come up with rule-based conditions, which can be manual and less efficient. With machine learning, we can model complex patterns in the fraud space with better performance and efficiency than traditional approaches.
Hyperight: What would you recommend to those who are just starting to look into this topic? Where should they start, and what should they pay attention to?
Naveed Ahmed Janvekar: My suggestion would be to construct different types of social networks with the data available, for example a network of customers and the products they purchased, or a network of customer-seller interactions. This will give you entity-level information for different types of connections. Pay close attention to the growing size of these networks and to efficient ways of handling the runtime of feature generation. I suggest libraries like NetworKit for generating graph features, due to their superior performance on large datasets, since their algorithms are written in C++.
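As a starting point for building several network types from the same data, here is a minimal sketch that derives a customer-product network and a customer-seller network from hypothetical order records, then maps node identifiers to contiguous integers. High-performance graph libraries, NetworKit among them as far as we understand, typically address nodes by integer index, so this remapping is a common preprocessing step; the actual hand-off to a specific library is left out and would need to follow that library's own documentation.

```python
def build_networks(orders):
    """orders: iterable of (customer, seller, product) tuples.

    Returns two edge sets. Nodes are typed tuples such as ("customer", "c1")
    so that a customer and a product with the same raw ID cannot collide.
    """
    customer_product, customer_seller = set(), set()
    for customer, seller, product in orders:
        customer_product.add((("customer", customer), ("product", product)))
        customer_seller.add((("customer", customer), ("seller", seller)))
    return customer_product, customer_seller

def index_edges(edges):
    """Map each node to a contiguous integer ID (0..n-1) and rewrite the edges."""
    ids, int_edges = {}, []
    for u, v in sorted(edges):  # sorted for a deterministic numbering
        for node in (u, v):
            if node not in ids:
                ids[node] = len(ids)
        int_edges.append((ids[u], ids[v]))
    return ids, int_edges

# Hypothetical order records: (customer, seller, product).
orders = [("c1", "s1", "p1"), ("c1", "s2", "p2"), ("c2", "s1", "p1")]
customer_product, customer_seller = build_networks(orders)
ids, int_edges = index_edges(customer_product)
print(ids)
print(int_edges)  # [(0, 1), (0, 2), (3, 1)]
```

Keeping the `ids` mapping around matters: features computed on the integer graph have to be joined back to the original customers, products and sellers afterwards.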
Hyperight: What’s the best advice you’ve received during your career, and what would be your advice for new data enthusiasts?
Naveed Ahmed Janvekar: The best advice I have received so far is to always prioritize projects based on the potential impact they will generate, and to go above and beyond in making such projects successful. Many consider data science a generalist role, so while breadth of knowledge is important, my advice would be to also build depth, such as understanding the mathematical details behind algorithms. This will give you an edge over others in the field. It also helps to be good at storytelling: communicating insights to business stakeholders and building a narrative around your solutions. Participate in as many data science competitions as possible, get involved in publishing research papers, and find mentors early in your career.
Featured image credits: rupixen.com on Unsplash