Using Network Analysis for Better Fraud Detection: Interview with Naveed Ahmed Janvekar

Social media, applications, networks, connections: these terms have been part of our everyday lives for years, yet few people understand in detail how they relate to one another, or the underlying structure of relationships between entities in a social network.

With Network Analysis, data scientists can identify, visualize and analyze entities, or nodes, and their relationships, where an entity can be a person, a transaction, a company, etc. Social network analytics is essential where traditional techniques are becoming less effective, such as in identifying fraudulent behaviour. We had a chance to talk with Naveed Ahmed Janvekar, Senior Data Scientist at Amazon, about what Network Analysis is and why it is essential for fraud detection.

Naveed is one of the speakers at the 2022 edition of the Data Innovation Summit that will take place in May in Stockholm. Read the interview before his presentation at this event, and register to hear additional information on this topic.

Hyperight: Can you please introduce yourself and tell us about your professional background and current working focus?

Naveed Ahmed Janvekar: I am currently working as a Senior Data Scientist at Amazon in the United States. In my current role, I develop machine learning solutions to detect and prevent abuse on Amazon’s marketplace. I have a master’s degree in Information Science from The University of Texas at Dallas. In the past, I have worked with companies like Fidelity Investments as a Java Developer and KPMG as a Business Intelligence Developer. My current working focus is in the areas of classification algorithms, clustering, active learning and graph networks.

Hyperight: During the Data Innovation Summit 2022, you will share more on the topic “Understand your data using Network Analysis”. Can you tell us a bit about what the delegates can expect from the presentation?

Naveed Ahmed Janvekar: This presentation will focus on ways to transform data into graphs or social networks, and on extracting different types of graph features, such as local structural features (node degree, PageRank) and node embeddings, for downstream machine learning tasks such as model training and/or clustering.

Hyperight: Can we start with what Network Analysis is and how it can be used in fraud detection?

Naveed Ahmed Janvekar: Network analysis helps us understand the structure of relationships within a social network, which is formed by connecting entities, also known as nodes (for example, people or computer servers), through common relationships, also known as edges (for example, two people sharing a phone number). From network analysis, we can identify aspects such as key influencers within a social network and groups of well-connected entities with common relationships. In the fraud space, bad actors could be connected through common relationships, such as a shared IP address, and by using such information we can improve machine-learning-driven detection of bad actors.
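
As a minimal illustration of the idea described above (not the method used by Naveed or Amazon), the sketch below links accounts that share an IP address and then groups them into connected components. The account IDs, IP addresses and function names are hypothetical, and only the Python standard library is used.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (account_id, ip_address) records; illustrative data only.
records = [
    ("acct_a", "10.0.0.1"),
    ("acct_b", "10.0.0.1"),
    ("acct_c", "10.0.0.2"),
    ("acct_b", "10.0.0.3"),
    ("acct_d", "10.0.0.3"),
    ("acct_e", "10.0.0.4"),
]


def build_shared_attribute_graph(records):
    """Return an adjacency dict with an edge between accounts sharing a value."""
    by_value = defaultdict(set)
    for account, value in records:
        by_value[value].add(account)

    # Every account becomes a node, even if it shares nothing with anyone.
    graph = {account: set() for account, _ in records}
    for accounts in by_value.values():
        for u, v in combinations(sorted(accounts), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Group nodes into connected components with an iterative depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


graph = build_shared_attribute_graph(records)
components = connected_components(graph)
```

In this toy data, acct_a and acct_d never share an IP address directly, yet they end up in the same component because both are linked to acct_b. Indirect links of this kind are what a per-account view of the data would miss.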

Hyperight: You emphasize the importance of how entities are connected within data, and how to find entities that are most influential in a graph. Why is this important?

Naveed Ahmed Janvekar: For example, suppose a social network of customers and sellers on a marketplace is constructed. We could then generate an importance score for every customer using degree centrality, profile every customer based on that score, and run targeted marketing or ad campaigns accordingly.
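
To make the scoring step concrete, here is a small sketch under stated assumptions: the customer and seller names are invented, and degree centrality is taken in its usual normalized form, the number of neighbours divided by (n - 1) for a graph with n nodes.

```python
# Hypothetical customer-seller interactions; illustrative data only.
edges = [
    ("cust_1", "seller_a"),
    ("cust_1", "seller_b"),
    ("cust_1", "seller_c"),
    ("cust_2", "seller_a"),
    ("cust_3", "seller_b"),
]


def build_graph(edges):
    """Build an undirected adjacency dict from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_centrality(graph):
    """Normalized degree centrality: degree / (n - 1) for each node."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


graph = build_graph(edges)
scores = degree_centrality(graph)

# Rank only the customers, highest score first (ties broken by name).
customers = sorted({u for u, _ in edges})
ranked_customers = sorted(customers, key=lambda c: (-scores[c], c))
```

Here cust_1 ranks first because it is connected to three of the other five nodes. The resulting ranking is the kind of per-customer score that could then be bucketed into profiles.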

Hyperight: In your session description, you say that generating features from the graph can be used for downstream machine learning tasks. How does this work?

Naveed Ahmed Janvekar: Once a graph network has been constructed, features such as degree centrality and local clustering coefficient can be generated and given as input variables to downstream machine learning models. These network-based features can potentially improve the performance of machine learning models. Additionally, we can generate node embeddings, which can either be used as features for supervised machine learning or be clustered to identify groups of entities that share similar embeddings.
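
The following sketch shows one way such features might be turned into model inputs. It assumes a tiny hypothetical graph and computes the degree and the local clustering coefficient, 2T / (k(k - 1)), where k is a node's degree and T is the number of edges among its neighbours. Node embeddings are not covered here.

```python
from itertools import combinations

# Hypothetical undirected graph as an adjacency dict; illustrative data only.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def local_clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # Undefined for fewer than two neighbours; use 0.0 by convention.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def graph_features(graph):
    """Return one feature dict per node."""
    return {
        node: {
            "degree": len(graph[node]),
            "clustering": local_clustering(graph, node),
        }
        for node in graph
    }


features = graph_features(graph)

# Flatten into rows that a downstream model could consume alongside
# other, non-graph features.
nodes = sorted(features)
X = [[features[n]["degree"], features[n]["clustering"]] for n in nodes]
```

Each row of `X` corresponds to one node in `nodes`, so these columns can be joined to an existing feature table by entity ID before training.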

Credit cards and a laptop during online shopping — Photo by Dylan Gillis on Unsplash

Hyperight: What are some advantages and disadvantages of machine learning fraud detection over other, more traditional fraud analysis processes?

Naveed Ahmed Janvekar: The advantage of a machine learning based approach to fraud detection is being able to adapt quickly to the evolving fraud space. Traditional fraud analysis approaches mostly require data analysis to come up with rule-based conditions, which can be manual and less efficient. With machine learning, we can model complex patterns in the fraud space with better performance and efficiency than traditional approaches.

Hyperight: What would be your recommendations to those who are just starting to look into this topic, where should they start and what should they pay attention to?

Naveed Ahmed Janvekar: My suggestion would be to construct different types of social networks with the data available, for example a network of customers and the products they purchased, or a network of customer and seller interactions. This will help you get entity-level information for different types of connections. Pay close attention to the growing size of these networks and to efficient ways of keeping the runtime of feature generation under control. I suggest libraries like NetworKit for generating graph-related features; because their algorithms are written in C++, they perform well on large datasets.
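
As a rough sketch of building more than one network from the same data, the example below derives a customer-product network and a customer-seller network from a single hypothetical order table and reads off entity-level counts from each. It uses plain Python for clarity and does not use NetworKit; on large datasets, a compiled library such as the one mentioned above would likely replace these loops.

```python
from collections import defaultdict

# Hypothetical (customer, product, seller) orders; illustrative data only.
orders = [
    ("cust_1", "prod_x", "seller_a"),
    ("cust_1", "prod_y", "seller_b"),
    ("cust_2", "prod_x", "seller_a"),
    ("cust_3", "prod_z", "seller_a"),
]


def build_bipartite(pairs):
    """Build an undirected two-sided network from (left, right) pairs."""
    graph = defaultdict(set)
    for left, right in pairs:
        graph[left].add(right)
        graph[right].add(left)
    return dict(graph)


# Two different networks built from the same order table.
customer_product = build_bipartite((c, p) for c, p, _ in orders)
customer_seller = build_bipartite((c, s) for c, _, s in orders)

# Entity-level information drawn from each type of connection.
customer_profile = {
    customer: {
        "distinct_products": len(customer_product[customer]),
        "distinct_sellers": len(customer_seller[customer]),
    }
    for customer in sorted({c for c, _, _ in orders})
}
```

The same table yields different views: `customer_product["prod_x"]` lists the customers who bought that product, while `customer_seller["seller_a"]` lists everyone who interacted with that seller.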

Hyperight: What’s the best advice you’ve received during your career, and what would be your advice for new data enthusiasts?

Naveed Ahmed Janvekar: The best advice I have received so far is to always prioritize projects based on their potential impact, and to go above and beyond in making such projects successful. Data Science is considered a generalist role by many, so while breadth of data science knowledge is important, my advice would also be to focus on depth, such as the mathematical details behind algorithms. This will give you an edge over others in the field. It also helps to be good at storytelling, communicating insights to business stakeholders, and building a narrative around your solutions. Participate in as many data science competitions as possible, take part in publishing research papers, and get mentors early on in your career.

Featured image credits: rupixen.com on Unsplash
