Blog
22/07/2024
How to apply FinOps to GenAI
by Stephen Old, Head of FinOps at Synyega
What is GenAI and how does it work?
Generative Artificial Intelligence (GenAI) refers to a class of AI algorithms that can generate new content, such as text, images, music, or even code, based on the patterns learned from existing data. Unlike traditional AI, which primarily focuses on analysing and interpreting existing data, GenAI creates new data, offering innovative solutions across various fields.
How GenAI Works:
1. Training Phase: GenAI models are trained on large datasets using deep learning techniques. During this phase, the model learns to understand and mimic the patterns, structures, and nuances within the data.
2. Generative Process: Once trained, the model uses its learned knowledge to generate new content. For instance, a text-based GenAI model can produce human-like text by predicting and generating sequences of words based on the input it receives.
3. Fine-Tuning: GenAI models can be fine-tuned on specific datasets to specialise in particular domains or tasks, improving the relevance and quality of the generated content.
4. User Interaction: Users provide prompts or inputs, and the GenAI model generates responses or creative outputs that align with the provided context. This interaction can be iterative, with the user refining the input to achieve the desired output.
GenAI has numerous applications, from creating realistic images and videos to assisting in writing and content creation, making it a powerful tool for innovation and creativity.
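The train-then-generate shape described above can be illustrated with a toy sketch. This is not how real GenAI models work internally (they use deep neural networks, not word counts), but the two phases are the same: a training step that learns patterns from data, and a generative step that extends a prompt one predicted token at a time.

```python
import random

def train(corpus: str) -> dict:
    """Training phase: learn which word tends to follow which (a toy bigram model)."""
    words = corpus.split()
    model: dict = {}
    for prev, nxt in zip(words, words[1:]):
        model.setdefault(prev, []).append(nxt)
    return model

def generate(model: dict, prompt: str, max_words: int = 10) -> str:
    """Generative process: extend the prompt one predicted word at a time."""
    out = prompt.split()
    for _ in range(max_words):
        candidates = model.get(out[-1])
        if not candidates:
            break  # no learned continuation for this word
        out.append(random.choice(candidates))
    return " ".join(out)

model = train("the cat sat on the mat and the cat slept")
print(generate(model, "the", max_words=5))
```

Fine-tuning, in this analogy, would be re-running `train` on a smaller domain-specific corpus so the learned frequencies better match a particular task.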
Cost and Carbon Implications of Using GenAI
So we understand how it works, and it certainly sounds great, but what are the realities? Is it too good to be true? Quite simply, yes. GenAI has huge financial and environmental impacts: it can be both expensive to run and responsible for large amounts of emissions. But that claim alone sounds anecdotal, so let's look at how each phase listed above affects cost and carbon.
Cost Implications:
1. Training Phase:
- Compute Resources: Training GenAI models is computationally intensive, often requiring powerful GPUs or TPUs. The cost can range from tens of thousands to millions of dollars depending on the model size and complexity. For example, training a large language model like GPT-3 can cost several million dollars.
- Data Storage: Large datasets are required for training, incurring significant storage costs.
- Energy Consumption: High energy consumption during training translates to higher electricity bills.
2. Generative Process (Inference):
- Compute Resources: While less intensive than training, inference still requires substantial computational power, especially for real-time applications. The cost depends on the frequency and scale of use, with cloud services charging per inference or usage time.
- Maintenance: Continuous updates and optimisations to keep the model efficient and effective add to operational costs.
3. Fine-Tuning:
- Compute Resources: Fine-tuning models on specific datasets also incurs significant costs, although typically lower than initial training. It requires dedicated compute resources, albeit for shorter durations.
- Specialised Storage: Storing fine-tuned models and additional datasets adds to overall storage costs.
4. User Interaction:
- Infrastructure: Supporting user interaction with GenAI requires robust infrastructure to handle real-time processing. This includes servers, networking equipment, and often cloud-based services to scale with demand.
- Operational Costs: Continuous availability, high responsiveness, and reliability necessitate high operational costs. These include server uptime, data transfer, and sometimes dedicated support for peak times.
- API Calls: If the GenAI model is accessed via APIs, each interaction may incur a cost. For example, cloud providers like OpenAI, Google Cloud, or AWS charge per API call or per usage time, which can add up with high-frequency use.
- Software Maintenance: Keeping the user-facing application updated, secure, and running smoothly requires regular maintenance. This includes bug fixes, feature updates, and security patches, all of which have associated costs.
- User Support: Providing customer support to handle queries, troubleshoot issues, and manage user accounts adds to operational expenses.
- Storage Costs: Storing user interaction data for analytics, personalisation, and improvements incurs storage costs. The scale of data can grow rapidly with a large user base.
- Bandwidth Costs: Transmitting data between users and servers, especially for applications with rich media or extensive interactions, can result in significant bandwidth costs.
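The per-API-call costs above add up quickly at scale, so it is worth doing the arithmetic before committing to a use-case. Here is a back-of-envelope inference cost model; the per-token prices are illustrative assumptions, not any provider's real rates, so substitute your own contract pricing.

```python
# Hypothetical per-token rates -- replace with your provider's actual pricing.
ASSUMED_PRICE_PER_1K_INPUT_TOKENS = 0.0005   # USD, assumed for illustration
ASSUMED_PRICE_PER_1K_OUTPUT_TOKENS = 0.0015  # USD, assumed for illustration

def monthly_inference_cost(requests_per_day: int,
                           avg_input_tokens: int,
                           avg_output_tokens: int,
                           days: int = 30) -> float:
    """Estimate monthly API spend from request volume and average token counts."""
    per_request = (avg_input_tokens / 1000 * ASSUMED_PRICE_PER_1K_INPUT_TOKENS
                   + avg_output_tokens / 1000 * ASSUMED_PRICE_PER_1K_OUTPUT_TOKENS)
    return requests_per_day * days * per_request

# Example: 50,000 requests/day, 500 tokens in, 300 tokens out.
print(f"${monthly_inference_cost(50_000, 500, 300):,.2f}/month")  # -> $1,050.00/month
```

Even at these deliberately low assumed rates, a modest chatbot workload lands at four figures a month, which is exactly why per-use-case forecasting matters.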
Carbon Implications:
1. Training Phase:
- Energy Intensity: Training large GenAI models is highly energy-intensive. Data centres providing the necessary computational power contribute significantly to carbon emissions, especially if powered by non-renewable energy sources.
- Carbon Footprint: The carbon footprint of training a large model can be equivalent to the lifetime emissions of multiple cars. For instance, training a model like GPT-3 can emit hundreds of metric tons of CO2.
2. Generative Process (Inference):
- Operational Carbon Emissions: Regular use of GenAI for inference generates ongoing carbon emissions. The intensity varies based on the frequency of use and the efficiency of the underlying infrastructure.
- Data Center Efficiency: Carbon emissions depend heavily on the energy efficiency and sustainability practices of the data centres used. Efficient cooling systems and renewable energy sources can significantly reduce emissions.
3. Fine-Tuning:
- Energy Use: Fine-tuning also consumes energy, though less than initial training. The carbon impact is therefore lower but still notable.
- Localised Emissions: Fine-tuning on-premises or in regional data centres can contribute to local carbon footprints, influenced by regional energy sources.
4. User Interaction:
- Server Usage: Continuous server usage to support real-time interactions consumes substantial energy. The carbon footprint is influenced by the efficiency of the servers and the energy mix of the data centres.
- Peak Load Management: Handling peak loads efficiently requires dynamic resource allocation, often leading to higher energy consumption during these times.
- Network Emissions: Data transfer between users and servers, especially for high-frequency interactions, contributes to carbon emissions. The distance and speed of data transfer further affect the energy required.
- Mobile and Edge Devices: User interactions on mobile and edge devices also contribute to overall energy consumption, adding to the carbon footprint, especially if these devices are not energy-efficient.
- Persistent Storage: Long-term storage of interaction data in data centres, which require constant cooling and power, leads to ongoing carbon emissions. The impact depends on the data centres' energy sources and efficiency.
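The carbon points above reduce to a simple chain: energy drawn by the hardware, scaled up by data-centre overhead (PUE), multiplied by the local grid's carbon intensity. The figures in this sketch are illustrative assumptions, not measurements.

```python
def inference_carbon_kg(gpu_power_kw: float,
                        hours: float,
                        pue: float,
                        grid_intensity_g_per_kwh: float) -> float:
    """Estimate kg CO2e for a period of GPU usage.

    energy (kWh) = device power x hours x PUE (data-centre overhead)
    emissions    = energy x grid carbon intensity (gCO2e/kWh)
    """
    energy_kwh = gpu_power_kw * hours * pue
    return energy_kwh * grid_intensity_g_per_kwh / 1000.0

# Assumed example: one 0.7 kW GPU running 24h/day for 30 days,
# PUE of 1.4, on a 400 gCO2e/kWh grid.
kg = inference_carbon_kg(0.7, 24 * 30, 1.4, 400)
print(f"{kg:.1f} kg CO2e/month")  # -> 282.2 kg CO2e/month
```

Note how each factor is a lever: a more efficient data centre lowers PUE, and a greener grid lowers the intensity figure, which is why region choice appears again in the scheduling discussion below.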
The Challenge
So what are people doing? Are they actually considering all of these risks, both financial and environmental? Simply put, no.
Organisations are not learning the lessons from the many cloud transformation programmes that delivered little transformation and very poor ROI. They treat it as an executive mandate that they must use GenAI or fall behind the competition, without taking the time to understand the technology or the governance that needs to be put in place around it. In my day-to-day dealings with organisations, I very often find the following:
- Lack of definition for the business value of the use-case.
- Poor scheduling to maximise rate optimisations.
- Lack of architectural guidance/benchmarking to support resource efficiency.
- Lack of forecasting/budgeting per use-case, all linked to business value to demonstrate ROI.
- Lack of model review expertise in the organisation.
- Lack of understanding at the engineering level of the cost impact of a model.
- Lack of regular review to prove value.
- Lack of review of the response.
This final point is critical: people too often trust the output, assuming it has to be right, when it is so often wrong. Just search for FinOps, and it will straight away expand it to "Financial Operations", which, if you read the original FinOps book, you will find is wrong. The model simply produces answers based on what it has read; if it is fed poor information, it will give you poor results.
All of these are things that can be managed, but organisations aren’t even considering most of them, so let’s have a look at what can be done.
FinOps for GenAI
When I first came across this issue, it was with a very large engineering/manufacturing company. I won't say what they make, as that would make it very obvious who they are. They excitedly told me they needed to use GenAI. Let me briefly describe the conversation.
Me: Ok, so what questions do you want to answer?
Them: We don’t know yet.
Me: Ok, what data do you think you may start with?
Them: All of it.
Me: Do you have any business challenges right now that maybe we could focus on?
Them: Yes loads.
Me: What’s the first priority?
Them: No idea.
What does this show? Largely that they hadn't worked out the fundamentals and were just trying to join the fad. Here are a few concepts to keep you from falling into the same trap:
- Know what you want to learn - You must have a specific question and some idea what you want the output to look like. From this you can ascertain the value of the process and see what kind of ROI you’re getting.
- Know the data you want to use - You would be mad to just use all your data: firstly, it will cost a fortune to run the model (in money and carbon), and secondly, it reduces the likelihood of a useful response. This may seem odd, but if you're feeding in all the data in your organisation, you have no idea what could be muddying the model. While there's a chance you could miss a gem, there's a far bigger chance you'll dredge up a load of nonsense as well.
- You have to review and iterate - You will not get any of this right the first time: your forecasts will need work and your models will need improving. It takes time and reflection. Don't go too big too quickly; give yourself time to learn.
- Educate the users - You need to educate users to ask the right questions of the right model/AI. You get charged per request, and each request uses carbon, so every time someone asks the wrong data the wrong question, it's purely avoidable waste.
- Capacity Planning - When you start doing this at real scale, you will almost always be running models that use huge amounts of computational power. With this in mind, we can learn from what has been done in the past with high-performance computing (HPC) and start scheduling as we do with supercomputers. Depending on the technology, this can let you benefit from rate optimisation by running a more consistent workload in series rather than in parallel. Note, though, that this requires huge scale, and more than that, you need to understand whether you should even be running that many models.
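One concrete form of that scheduling idea is carbon-aware job placement: given a forecast of grid carbon intensity per hour, run deferrable training or fine-tuning jobs in the greenest contiguous window. The forecast figures below are made up for illustration; in practice they would come from a grid-data API for your region.

```python
def best_window(hourly_intensity: list[float], job_hours: int) -> int:
    """Return the start hour of the contiguous window with the lowest
    total grid carbon intensity for a job of the given length."""
    best_start, best_total = 0, float("inf")
    for start in range(len(hourly_intensity) - job_hours + 1):
        total = sum(hourly_intensity[start:start + job_hours])
        if total < best_total:
            best_start, best_total = start, total
    return best_start

# Hypothetical 24h forecast (gCO2e/kWh): overnight wind makes the
# early hours the greenest part of the day.
forecast = [310, 280, 180, 150, 140, 160, 220, 300, 350, 380, 390, 400,
            410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300]
print(f"Start the 4-hour job at hour {best_window(forecast, 4)}")  # -> hour 2
```

The same window-finding logic works for rate optimisation: swap carbon intensity for an hourly price signal and the scheduler picks the cheapest slot instead.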
To support organisations, I built the model below.
The concept is that the foundations are very much about use-cases and an ongoing cycle of Define - Measure - Run - Review. As briefly described above, you need to know what you're trying to answer, and its value, to understand whether you've invested well given the cost and carbon that any model will generate.
On top of that sit technical/financial pillars whose concepts should be relatively straightforward:
- Service/Data Architecture - What service will you use? How is your data architected to allow for efficient models?
- Scheduling Resources - Running in series where sensible, or shifting work to lower-carbon-intensity times on the location-based grid you're on.
- Rightsizing - Are you using the right-sized compute for the requirement? Balancing speed (as you get charged by time), cost and carbon.
- Commitment Discounts - For when you get to such a scale that you can use series scheduling to reduce the rate of your usage.
- Model Efficiency - Technical/coding efficiency of your model.
- Request Efficiency - Education around making sure people are asking the right questions to the right model.
Finally, this is topped by strong budgeting, forecasting and visibility of the costs, carbon and usage of these technologies.
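That budgeting layer can start very simply: track each use-case's spend against its budget and its declared business value, so ROI stays visible in every review cycle. The use-case name and figures below are illustrative, not drawn from any real engagement.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    """A GenAI use-case with a budget tied to an estimated business value."""
    name: str
    monthly_budget: float      # USD allocated for the month
    monthly_value: float       # estimated business value delivered, USD
    actual_spend: float = 0.0  # USD spent so far this month

    def over_budget(self) -> bool:
        return self.actual_spend > self.monthly_budget

    def roi(self) -> float:
        """Value returned per dollar spent (0.0 if nothing spent yet)."""
        return self.monthly_value / self.actual_spend if self.actual_spend else 0.0

# Hypothetical use-case reviewed in a Define - Measure - Run - Review cycle.
uc = UseCase("contract-summarisation", monthly_budget=5_000,
             monthly_value=12_000, actual_spend=4_200)
print(f"{uc.name}: over budget? {uc.over_budget()}, ROI {uc.roi():.2f}x")
```

A use-case that is under budget but returning less than 1x value per dollar is just as much a review flag as one that is overspending, which is the point of linking budget and value in one record.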