Ask Your AI Spend in Plain English

Summary

Daniel showed an early look at a new way to understand AI usage across the platform. LiteLLM captures every AI call with org ID, user ID, agent, model, and cost, and that data now lands in Azure Data Explorer. On top of that, the team built an MCP so you can query it in natural language: "show cost by user and model", "analyse failure rate", "give me the 99th percentile response time for Anthropic models". Each prompt comes back with a real analysis. Expect this to land on pre-prod next week, at which point you'll be able to run the same queries straight from Teamwork.

Screenshots

Agent chat answering the plain-English prompt 'show cost by user and model' — the new MCP surfaces a grouped cost-and-call-count analysis (Heng Wei flagged as a heavy user) directly in the conversation, no KQL written.

Agent chat answering 'give me the 99th percentile response time for Anthropic models' — the MCP returns a Haiku-vs-Sonnet comparison plus a recommendation, the analytics result rendered conversationally.

Transcript

Daniel Iercan

43:37 – 49:51

Thank you, Michael. So before we jump into the demo, a quick reminder of how this works. So an agent in Agents Builder, for example, or any other agentic group we might have in Researcher or Evo Builder or even Workflow generates logs, and those logs are captured by LiteLLM.

So whenever we make a call to our To AI, we are passing some headers such as the org ID of the user, user ID, agent ID, even the application. That information is captured by LiteLLM and published onto an Azure Event Hub.
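As a rough illustration of the attribution headers described here, the sketch below builds a header set from the four identifiers mentioned in the talk. The `x-...` header names, the `CallContext` type, and the sample values are hypothetical placeholders, not the platform's actual names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallContext:
    """Who is making the AI call (all field names are illustrative)."""
    org_id: str
    user_id: str
    agent_id: str
    application: str


def attribution_headers(ctx: CallContext) -> dict[str, str]:
    """Build the HTTP headers sent alongside each model call.

    The "x-..." names are hypothetical; the real header names are not
    given in the talk.
    """
    return {
        "x-org-id": ctx.org_id,
        "x-user-id": ctx.user_id,
        "x-agent-id": ctx.agent_id,
        "x-application": ctx.application,
    }


headers = attribution_headers(
    CallContext(org_id="org-1", user_id="user-42",
                agent_id="agent-7", application="agents-builder")
)
```

Because every call carries the same four identifiers, the downstream logs can later be sliced by any of them without joining against another system.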

And from there, we currently ingest them in Databricks, but in addition to that, we are now ingesting them in Azure Data Explorer. For those who are not familiar with the technology, this is a technology similar to App Insights that allows you to use big data analytics on top of semi-structured data. It's very good for the kind of data we have here, time series with log information, and it allows us to query it with the KQL language, but it also allows us to create materialized views, which means that we can dynamically query the data, but we can also have pre-built views that allow us to serve a UI.

So out of this Azure Data Explorer, we are going, in the future, to present data in MFE dashboards. But in addition to that, we have created an MCP that allows us to explore the data with an AI. And this is what I'm going to show you next.

So let me show you first how this data looks in Azure. So you can see here we have currently 3 tables. The raw data comes in this table all the time.

And from there, we can actually create, as I said, projections. And currently the information for usage comes into these LiteLLM logs. Here we have all kinds of columns to analyse the data: we can see the cost, we can see the tokens, we can see the models, if it's streaming or not, status, keys.

This is a demo environment, dev environment, so you cannot see the org ID and the user ID, but we also managed to bring them in. They will be available on pre-prod once we make this available on pre-prod next week. But now on top of this data, and here are a few queries I created, we have built this MCP that allows us to discover the table structure.

So let me first copy this. So I can come and say, analyse this table structure, and it will come back with the result, telling me what columns are available in there. I think I have a...

Okay, so this will basically allow an AI model to discover the structure of our tables. And we can also verify if a query is correct, but more importantly, we can execute the query. And this will give us the data.
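The three capabilities mentioned here (discover the structure, verify a query, execute it) can be sketched with an in-memory stand-in. Everything below is hypothetical: the function names, the `usage_logs` table, its columns, and the sample rows are invented for illustration, and a query is reduced to a table plus a column list rather than real KQL.

```python
# Hypothetical stand-in for the three MCP capabilities: discover the table
# structure, verify a query, and execute it. A real server would talk to
# Azure Data Explorer instead of these in-memory dictionaries.
SCHEMA = {
    "usage_logs": {"model": "string", "cost": "real", "status": "string"},
}
ROWS = {
    "usage_logs": [
        {"model": "model-a", "cost": 0.02, "status": "success"},
        {"model": "model-b", "cost": 0.10, "status": "failure"},
    ],
}


def describe_table(table: str) -> dict[str, str]:
    """Discovery step: return column names and types for a table."""
    return dict(SCHEMA[table])


def verify_query(table: str, columns: list[str]) -> list[str]:
    """Verification step: list problems without running anything."""
    if table not in SCHEMA:
        return [f"unknown table: {table}"]
    return [f"unknown column: {c}" for c in columns if c not in SCHEMA[table]]


def execute_query(table: str, columns: list[str]) -> list[dict]:
    """Execution step: run only queries that pass verification."""
    problems = verify_query(table, columns)
    if problems:
        raise ValueError("; ".join(problems))
    return [{c: row[c] for c in columns} for row in ROWS[table]]


columns = sorted(describe_table("usage_logs"))            # discover
problems = verify_query("usage_logs", columns)            # verify
result = execute_query("usage_logs", ["model", "cost"])   # execute
```

Keeping verification separate from execution lets a model check a generated query against the discovered schema before any data is touched.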

Now with this MCP, I've created an agent which allows me to query the data in natural language. So I can ask questions such as show cost by user and model, for example. And starting from this prompt, the AI will explore the data structure through that MCP and will generate the correct query in order to give me the data.

It doesn't bring all the data into memory and try to sort it and analyse it there. Actually, it will create queries to group the data, similar to what you saw on my screen there in Data Explorer.

So you can see that it actually was able to identify that Heng Wei, for example, is a heavy user on the dev environment, and we can see the cost, we can see how many calls. At the moment, as I told you, I can only see the keys. I cannot see the user because this is a dev environment and there we don't have curated data.
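The idea of pushing the grouping down to the query engine, rather than pulling rows into memory, can be sketched as a small helper that produces a KQL `summarize` query. The table name `usage_logs` and the columns `cost`, `user_id`, and `model` are assumptions for illustration; the real schema is not shown in the talk.

```python
def cost_breakdown_query(table: str, group_by: list[str]) -> str:
    """Return a KQL query that totals cost and call count per group.

    The table and column names passed in are assumed, not taken from the
    real schema. The aggregation runs inside the query engine, so only the
    grouped rows come back to the caller.
    """
    if not group_by:
        raise ValueError("need at least one grouping column")
    keys = ", ".join(group_by)
    return (
        f"{table}\n"
        f"| summarize TotalCost = sum(cost), Calls = count() by {keys}\n"
        f"| order by TotalCost desc"
    )


query = cost_breakdown_query("usage_logs", ["user_id", "model"])
print(query)
```

Swapping the grouping columns, for example to an API key column on an environment without user data, would give the per-key view described in the demo.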

But this will be available on pre-prod once we have deployed there. We can see length fields as well.

Chris Bayne 47:54

So I'm flying United, which I think is C, is it right? Yeah, thanks.

Daniel Iercan 47:59

Sorry, Dean. Yeah, I think it was not for me. No worries.

Another question I can ask is with regards to, for example, we can analyse failure rate, right? Let's see what that gives us. We can look to see where we have more failures and which models might fail more for this particular key.
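To make the failure-rate question concrete, here is a minimal sketch of the metric itself, computed over a handful of invented records. The `model` and `status` field names, the `"success"` status value, and the sample data are assumptions; in the demo the equivalent grouping is done by a generated query, not in Python.

```python
from collections import defaultdict

# Invented sample records; field names and status values are assumptions.
calls = [
    {"model": "model-a", "status": "success"},
    {"model": "model-a", "status": "failure"},
    {"model": "model-a", "status": "success"},
    {"model": "model-a", "status": "success"},
    {"model": "model-b", "status": "success"},
    {"model": "model-b", "status": "success"},
]


def failure_rate_by_model(records: list[dict]) -> dict[str, float]:
    """Return failed calls divided by total calls, per model."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record["model"]] += 1
        if record["status"] != "success":
            failures[record["model"]] += 1
    return {model: failures[model] / totals[model] for model in totals}


rates = failure_rate_by_model(calls)
# model-a: 1 failure out of 4 calls; model-b: 0 failures out of 2 calls
```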

And this will come back again, will create a query for me, and it comes back with an analysis like this. And finally, we can actually do an analysis on the response time for different Anthropic models. Let me do that as well.

So I'm going to ask, give me the 99th percentile response time for the Anthropic models. Of course, we could dissect this per key as well, per user, per org, in any way we want. And this should come back with another summary around the 99th percentile.
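For readers less familiar with the metric, the 99th percentile response time is the value that 99% of calls come in at or under. The sketch below uses the nearest-rank definition on invented latencies; the function name and data are illustrative, and the query engine's own percentile function may use a different estimation method.

```python
import math


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Return the nearest-rank percentile of values (pct from 1 to 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Nearest rank: smallest position covering at least pct% of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Invented response times in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p99 = percentile_nearest_rank(latencies_ms, 99)
p50 = percentile_nearest_rank(latencies_ms, 50)
```

Running the same calculation per model, key, user, or org gives the kind of breakdown mentioned in the talk.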

And you can see it actually does a nice analysis across the two flavours of models we have for Anthropic. And it has a nice conclusion at the end that, of course, if you look at the end, I think it should be, um, a recommendation to use Haiku, right?

It's faster, which is no surprise because we already know that. Okay, so I think this shows what's possible with the analysis. Expect this to be available on pre-prod next week, and you will be able to connect this to Teamwork, which means that you can do all this analysis in Teamwork as well.

Thank you. And I'm going to hand over to Andrei.