Data is the gold of today. Right now, the amount of data produced by organizations is crossing all barriers. To gain insights and make data-based decisions, data scientists are required. This matters because anyone capable of harnessing the power of their data stays ahead in the game.
In today's installment of our MobileAppDaily tech series, we talk with Ratnakar Pandey. He is regarded as one of the best minds in data science in India, with over two decades of experience in data science and machine learning. Mr. Pandey has a strong portfolio of work with top companies such as Amazon, Kabbage, Qbera, and Citibank, and he currently provides consultancy services to startups.
In this interview, we covered topics ranging from his own journey to the implications of AI and the skills required of data scientists.
So, without wasting any time, let’s start with the journey…
Sure. My journey with data science actually started in a very interesting way. It wasn't necessarily by choice; it was more by necessity. I will give you more color on that. When I went to the US for my master's, people told me that if I wanted a job in the industry I was aspiring to, the tech industry and particularly the semiconductor industry, knowing coding and statistics would be very important.
So I did my master's in semiconductor engineering, and I was targeting a company called Texas Instruments. I happened to crack that interview and get the role, but it was on the basis of the data science skills and knowledge I had picked up during my master's. That is how I started.
I think AI is the talk of the town. Everybody is talking about ChatGPT. Even people who don't have anything to do with AI are using it. Kids are using it for their homework. It is high time people got involved in AI. I think it's gonna be very interesting for the next five to ten years.
Some of the experts in the industry also agree that the pace at which the AI revolution is happening is beyond anyone's expectation or prediction. It is just gonna multiply from here. You will see more of these tools coming to market, whether for image recognition, automated speech recognition, or conversational bots, and more and more players will jump in.
We are all aware of OpenAI. Other big players like Amazon and Google are also coming in. Google has already released Bard. Amazon's language model is also in the making, so you'll see more and more tech companies coming in. The only reason they haven't launched a product just yet is that they want to make sure the legal and ethical angles are taken care of. It will be a very important time and, like I said, also a very interesting one in terms of the regulatory and ethical nuances around AI solutions.
Most of these tools, particularly on the language side, are based on what we call a generative pre-trained transformer model. These are large language models that have been trained on petabytes of information. For instance, any article, news story, or website on the internet has probably been seen by the model, and the model has been trained on it using neural networks, or deep learning.
The neural network is trained largely through unsupervised learning. The model is not explicitly told what to do next; it learns to predict the next word in the sequence. That is the basic construct behind it. Prior to these transformer models, we used to have what we call recurrent neural networks, which work through a sequence of words.
Transformers do much better over long runs of text: they can look at any number of words in the sentence prior to the word you're looking at, and they are very good at that. Then throw in the large compute and some of the architectural advances we've made, and you have something as good as ChatGPT, which is a very natural, human-like communicator.
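To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is a toy bigram counter, not a transformer or any model discussed in the interview: it only looks at the single previous word, whereas the models described above condition on many prior words. The corpus and the function names `train_bigram` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Tiny, made-up training text; no labels are supplied, the "supervision"
# comes from the text itself (each word is the target for the one before it).
corpus = [
    "data is the new gold",
    "data is the fuel for ai",
    "ai is the talk of the town",
]
model = train_bigram(corpus)
print(predict_next(model, "is"))    # most common word seen after "is"
print(predict_next(model, "data"))  # most common word seen after "data"
```

The point of the sketch is only that the training signal comes from the raw text itself; large language models replace the counting table with a neural network and a much longer context.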
So, the short answer is that the Generative Pre-trained Transformer (GPT) model is trained on billions of data sources with many different parameters, using a combination of unsupervised learning, supervised learning, and reinforcement learning.
OpenAI launched ChatGPT, I guess, a couple of years back, and back then there was a lot of talk about this new AI chatbot. Have you seen any difference between ChatGPT-3 and ChatGPT-4, which launched recently?
I think the main difference will be in accuracy, particularly for non-English languages. Compare ChatGPT-3 with the latest version and you'll see that it does a much better job. Take Indian languages like Telugu, Kannada, Tamil, and Marathi: the newer model has been trained on those as well, whereas the previous model was mostly trained on English and a few other languages like French and German.
The accuracy is something where you can definitely see a boost. Also, I think more and more tools will now be able to filter out the negatives, because these models have been trained on all the data on the internet. How do we really make sure a model has been trained on clean data? That's another difference you'll see in some of the newer tools. I have personally not tried GPT-4, but based on the papers I have read, some of those gains are definitely something you can look forward to.
I think it's gonna be a double-edged sword. Some jobs will definitely be removed. I think IBM also made a public announcement that they are cutting some 9,000-odd jobs and replacing them with an AI solution. They didn't mention whether it's going to be GPT, something else, some in-house solution, or maybe a combination. However, jobs that can be done in a completely automated way by a tool like ChatGPT, or something of a similar nature, are the jobs that will be at risk.
It's already happening, and people are forecasting varying degrees of effect. You can definitely imagine that if something can be completely automated without any negative consequences, it will be removed. At the same time, I am of the belief that this is just like when computers came in: a lot of people were paranoid that computers were gonna completely replace humans and that there would be no opportunities in the banks, for example. It's always man and machine combined. It's not either/or.
Some people are really paranoid that machines are going to take over the world and it's gonna be a Terminator-like situation. I don't think so, because human reasoning and problem-solving cannot be replicated. Wherever you have newer problems and different ways of looking at things, we will need people to be involved, but by leveraging AI solutions they can probably make better decisions faster and solve problems more efficiently.
So the short answer is that everybody should learn how to use some of these AI tools rather than being worried and scared that something is going to come and take away their job. You have to start learning those things and become better at them.
Generative AI as a whole is one area. Within that, in my view, you'll see development happening on two fronts. One will be natural language, which is obviously the best-fitting candidate for a large language model.
When I say natural language, I mean both spoken and written language. You may say that bots have existed for many years: any bank you go to will have a chatbot or an IVR sort of message, and so on and so forth. However, they're not very conversational. They're very static, and they normally frustrate the customer they are talking to.
You'll see a step-up improvement, or actually a manyfold improvement, in their conversational ability. They will be able to understand what you're saying and have a conversation in real time. That's one area I think will be coming in, and you will see more and more players adopting it. You will have conversational bots to interact with you, and that will take away some of the human intervention in customer service.
It will also have a positive impact on the customer, because you don't necessarily have to wait for an associate. How many times have we called, been put in a queue, and been told that the wait time is 7 minutes or 10 minutes? Not anymore, I think. Some of these things can be addressed outright by the bot, and you'll have a timely resolution.
The other thing I can foresee is computer vision. Computer vision is a harder problem to solve than NLP. That's not to say NLP is an easy problem, but computer vision is the next level.
I think some of the developments happening in computer vision, such as DALL-E, will also be very exciting for all of us to look forward to. Some of the other things we are already seeing in the market will definitely have a bigger impact on computer vision.
That will translate into more and more automation. It would also mean more people looking at self-driving cars as an option. Right now it is still a very niche market: only in developed countries, and that too only in very expensive cars, is that functionality available. You'll see it become more and more common, even in places like India. I think some of the top SUV manufacturers already offer those sorts of self-driving options, though not completely. You cannot have a car that drives entirely on its own just yet in India, but at least some functionality is there.
It’s a great question, and this is a constant problem that I address with my stakeholders. My suggestion, my mental framework for this, is to go for something simple. If it can be done in Excel, for example, or you can write a script to do it, and that addresses the business need, let that be the solution. You don't necessarily have to look for the most complex thing in AI and offer it to your stakeholders. That is one of the mistakes I've seen people make: just because they are aware of certain things, they want to force-fit that solution. So always think from the point of view of simplicity. The simpler the solution, the better it is.
Unless you are really trying to solve a very complex problem, you don't need to look for the most complex solution. Simple is also transparent: you can always reverse-engineer it, customize it, and adapt the approach. So if a business rule, a heuristic, or any sort of simple algorithm is able to solve the problem, then let that be the case. However, as you see, our world is not that simple. It's multi-dimensional, with a lot of players at play at the same time.
There are a lot of moving pieces and complexities, and in that situation you will often be better off leveraging an AI solution. But like I said, you have to balance how quickly you can implement it, what the customer impact is going to be, what regulatory framework you should be aware of and where you stand within it, and how cost-effective the solution is going to be.
I mean, ChatGPT and other software may offer a lot of different things, but is it going to be free of cost? Probably not. If you want API access and things like that, it is going to cost you some amount, versus some of the free solutions. There’s always a cost-benefit trade-off you have to work through to see what makes more sense. With the mental model that simple is beautiful, you don't have to overcomplicate things.
Some of the things are quite basic. For example, not having the right access to talent. It is also very ironic that we're talking about AI everywhere, and yet people are on LinkedIn looking for opportunities and not finding them. Between what the corporate world expects and what is being offered as of now, there is a gap in the skills that are coming in.
Somebody who's doing a simple spreadsheet analysis may call themselves a data scientist. I'm not saying there's anything wrong with that, but when you go and talk to an employer, they have a very different set of expectations. That's where upskilling and cross-skilling of the existing workforce, and of the people coming fresh out of college, is the need of the hour. This is one area where companies and people are struggling, and they're not able to fill positions.
I would definitely say AI is gonna be difficult for people to implement unless they address that. The other one I would highlight is the connection between business impact and AI. AI cannot work in isolation, and your technical skills cannot work in isolation. You always have to think from the point of view of the business stakeholders and the business impact. Then take that business problem and bring it to the AI.
A lot of people make the mistake of not trying to quantify the impact of the work they are doing. It's always very important that your AI projects talk the business language. Do not say, "I'm gonna build this beautiful automated speech recognition model." What are we going to do with it? How are the businesses going to use it, and what exactly is going to be the efficiency gain, cost saving, revenue impact, and so on? You have to translate that into a business end goal in terms of top line and bottom line.
This is where things will really happen, and that's another hurdle people face. The third area I would highlight is that data scientists and AI professionals cannot work in isolation. They need to be very collaborative with the product team, the marketing team, and particularly the data engineering team. So there are three things: one is talent, the second is the business impact, and the third is the dependencies on making the tech happen for you.
I think you need to bring in your local context. You need to train, even if you're using some sort of GPT: bring in your own context, bring in your own data to train on, and make it better. You should be very sensitive to whatever parameters you're training the model on, so that none of those parameters can be called objectionable.
Let me give an example. Age cannot be used in a credit risk scoring model. When you are deciding whether to give a loan to a particular customer, you cannot decide on the basis of age. Whether they are young or old, that would be called discrimination. But what may happen is that I have a model that is not looking at age but is looking at the length of the credit history, that is, how long they have had a credit card or a loan account, and that becomes a proxy for age.
I can have the total work experience of the person, how many years they've been in professional life, and that again can become a proxy for age. So you have to look out for variables that your regulation, ethics, and the other aspects you're trying to balance do not allow, and also for variables that are proxies for something that is not allowed. This is why we used to do something called disparate impact analysis in the financial world.
You may have built a neutral model, but then you have to look at the outcome of the model. If I see that my model has been lending more to a particular gender, race, or ethnicity and not to another, that's how you know there might be things in your model that you don't understand.
They are creating a disparate impact, and we have to really understand that. So number one is obviously being careful from the model training phase. The second is, after you have trained and deployed the model and the data points are coming in, being very flexible and agile in reworking the model to remove any disparate impact.
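As a rough illustration of the outcome check described above, the sketch below compares loan approval rates across groups and reports the ratio of the lowest rate to the highest. The data, the group labels, and the function names are invented for this example, and the 0.8 cut-off is an assumption borrowed from the commonly cited "four-fifths" rule of thumb; the interview does not say which threshold or method was actually used.

```python
from collections import defaultdict

def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact_ratio(decisions):
    """Ratio of the lowest group approval rate to the highest."""
    rates = approval_rates(decisions)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved in any group, so outcomes do not differ
    return min(rates.values()) / highest

# Made-up model outcomes: group A approved 80 of 100, group B approved 50 of 100.
decisions = (
    [("A", True)] * 80 + [("A", False)] * 20 +
    [("B", True)] * 50 + [("B", False)] * 50
)
ratio = disparate_impact_ratio(decisions)
flagged = ratio < 0.8  # assumed threshold (four-fifths rule of thumb)
print(f"ratio={ratio:.3f} flagged={flagged}")
```

A flagged ratio does not by itself prove the model is using a proxy such as credit-history length; it is a signal to go back and examine which input variables are driving the difference.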
With any new technology, this happens. When UPI came in, a lot of people were victims of fraud. They didn't know whether they had to put in their four-digit PIN when doing a money transfer, and that was a function of a lack of awareness.
I think we have to be more proactive in creating that awareness and making sure people know about some of the risks associated with AI. We all need to play our roles when we spot something that looks offensive or looks like fraudulent activity, or something we know is not the truth.
We should highlight those things. When I say all of us, I mean individuals as well as entities like corporates, governments, agencies, and regulators: everybody will have to play that role in keeping it clean. The most important responsibility will be on the people who are developing these tools or trying to develop them, because we have a popular saying in the data science world: garbage in, garbage out. If you train it on the wrong information, you'll get the wrong answers.
I think one of the main learnings we took from Amazon is that they call themselves a day-one company. It is one of the most customer-centric companies. It's very important that you think backward from the customer. Do not think from your product and solutions first; think from the customer's needs and wants. What is the problem statement you're trying to solve for the customer, and are you doing it in the most efficient fashion, in the cheapest possible way, with the best quality you can offer them?
If you have that, you don't necessarily have to worry too much about the competition. Even though it's very important to keep an eye on your competition, I think it's more important to really listen to customer anecdotes, to the pain points and the joy you're trying to create for the customer, and try to develop the best possible solution from that perspective.
I think that's how some of the successful startups have been able to scale their operations very quickly without needing to pivot into something different or iterate over the product: they were able to pinpoint "this is the problem I'm trying to solve."
They did a lot of backtesting, a lot of talking to customers, and focus group discussions to figure out whether there's a need for this in the market. I think that has to be the way forward for anything you're doing, and there are tons of problems to be solved. However, what really happens is that we don't define the problem very crisply.
We might be building a product that addresses something completely different, so we have to be very clear about what this product or solution is trying to address. How big is that market? How big is that problem statement? And how is our product or solution gonna help in that regard?
The first tip is to talk to the people who are already doing that work, people who have been doing it for a certain period of time, and see whether you like their lifestyle or not. I think that's the best thing you can do before getting into it.
The second thing I would recommend to these folks is not to pay upfront for any certificate, degree, or program. Try to do the work free of cost on your own and learn how to get it right. Try to find a mentor who can guide you on which swim lanes you should be part of. In terms of awareness, you should know what you're getting into. You also need to figure out how to get comfortable, which is where, I mean, having a mentor or going through some of the free resources will be very important.
The third is that once you've decided to get in, you should not be in a hurry, thinking that within a month you will be the master of everything. It takes years and years to really get to that level. You'll see that some people naturally build fantastic models, while others, despite the compute and the machines you give them, are not able to, because it's sort of learned on the job. Be prepared: it's not gonna be a sprint, it's gonna be a marathon where you have to learn constantly, and then you'll come up to the level where you can call yourself a senior data scientist.
One of the things I see people do, particularly founders who don't have an AI background or any AI or data science team member, is that they will sort of pretend they don't need it until the water has risen to a higher level.
That is when they actually try to put together an AI strategy and a data science strategy. My biggest recommendation is to think about it from day one. Whether or not you do it on day one is a different story altogether, but think about having a strategy and a plan in place. When do you want to bring in some of these skill sets? What sort of value would you like to derive from the data you have?
I understand that if you don't have the data, what are the data scientists going to do? However, sooner or later you will get to the stage where you have this information coming in, and if you're playing catch-up at that point, it's gonna be too late. It's game theory: if others are doing it and you're not, you're gonna be left behind.
Having that data centricity, that AI mental model, from day one will be very important. I know some of these people are also very intuitive and very close to their customers. They know the anecdotes; they have spoken to the customers. It's always good to check those anecdotes and that intuitive thinking against the data, to make sure the decisions you're making and the strategy you're creating have some element of data science, some element of analytics, some numbers coming in to back them up. Be more systematic and organized about it as well. Those would be some of the tips I can give.
Want to connect with Ratnakar Pandey for his insights in the domain of data science? Find him on LinkedIn. To read similar compelling interviews on diverse topics, subscribe to MobileAppDaily.