Small to medium enterprises make up the majority of the companies in the world, yet they’re often underserved when it comes to accessing AI. Anastasia AI is working to change that.
In this week’s episode of Experiment Exchange, “How Anastasia AI is Democratizing Access to AI for SMEs,” SigOpt Head of Engineering Michael McCourt talks with Pablo Zegers, Co-Founder and VP of Product at Anastasia AI. The company is focused on democratizing machine learning around the world by providing access to small and medium enterprises, especially those in South America. Michael spoke with Pablo about his work, how Anastasia AI uses SigOpt to manage model design and tuning, the importance of efficiency in the model development process, and more.
Below is a transcript from the interview. Subscribe to listen to the full podcast episode, or watch the video interview.
Q: During SigOpt Summit, you presented a talk on democratizing time series forecasting for any industry, and that’s part of what I’d like to kick things off with here today. Your work is really ambitious—tell us more about that.
You first have to consider where we’re based, in South America. South America is not exactly the first world. The economies here are very, very tight, and companies are not as developed as the ones you might find in Europe or the States. This means lots of things.
First, there is not as much talent in South America. I mean, we’re talented! But not as much in artificial intelligence. So it’s very difficult for companies to have internal teams with data scientists that are proficient in artificial intelligence. That’s a big drawback: you can’t release an AI tool and count on a strong team on the company side to use it.
The second point is that because of the lack of talent, companies don’t have an internal culture of developing technology. They’re mostly used to buying technology from Europe and the States and using that. Even the development of a company’s own technological products is rare in Latin America. So from a cultural point of view, forget it, it’s too complicated. And thirdly, cost. You have to be very, very cheap. These three things have to be combined in order to offer SME companies an AI service in Latin America.
In our SigOpt Summit presentation, we discussed time series prediction, a tool that a regular data analyst can use to translate data into business results, letting companies take action immediately with no intermediate steps. That’s why we’re concerned about democratizing AI. It’s important for us as a company, but from a broader perspective it matters because AI is so useful that it should be for everyone. If we don’t do this, all these companies are going to be left behind.
So we see this as a social role we are also accomplishing with the company. We are helping these companies to jump into the AI age. And from this point of view, we’re like David and Goliath, in the sense that we are a small company in South America. The least we can do is to offer very good results to our customers.
Q: How does SigOpt play a role in your work?
Most of the machine learning models are, as I mentioned in my presentation, strongly dependent on the random seeds you use for training. And if we use SigOpt, combined with specially designed machine learning architectures, we can guarantee our customers that our results are the best you can get. That’s the least we can do.
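Pablo’s point about seed sensitivity can be sketched in code. A common mitigation during tuning is to average the metric over several seeds, so the optimizer rewards the configuration rather than a lucky initialization. The sketch below uses a generic random-search loop as a stand-in for a SigOpt experiment; `train_and_score` and its behavior are purely hypothetical.

```python
import random
import statistics

def train_and_score(learning_rate: float, seed: int) -> float:
    """Hypothetical stand-in for training a model and returning a validation score."""
    rng = random.Random(seed)
    # Simulated score: peaks near lr=0.1, with seed-dependent noise.
    return 1.0 - abs(learning_rate - 0.1) + rng.uniform(-0.05, 0.05)

def robust_score(learning_rate: float, seeds=(0, 1, 2, 3, 4)) -> float:
    """Average the metric over several seeds, so tuning optimizes
    the configuration rather than a lucky random seed."""
    return statistics.mean(train_and_score(learning_rate, s) for s in seeds)

# A generic random-search loop standing in for a real tuning service.
search = random.Random(42)
best = max(
    (search.uniform(0.001, 1.0) for _ in range(50)),
    key=robust_score,
)
print(f"best learning rate: {best:.3f}")
```

In a real setup, the seed-averaged score would be the metric value reported back to the optimizer at each iteration.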
We also say that SigOpt is like life insurance for our company because it reduces error and guarantees us that we’re doing the best according to state of the art. So we can go to a customer and compete with another supplier, and be assured that our results are going to be really, really good.
Q: That’s outstanding. You mentioned some small and medium sized enterprises don’t have the AI savvy to be able to execute on these things. Have you seen any sectors that are more savvy or some that are less savvy, or is it pretty uniform across the board?
What I’ve seen is that in retail you have more developed teams. Also in finance because they have the money to hire the best people around. So I would say those two are large enough to have more complex teams inside. But, for example, if you go to mining, if you go to small companies, forget it. They’re dealing with really, really large forecasting errors and they’re not even concerned about artificial intelligence. They’re just concerned with reducing the error from 50% to 20%, which is still a lot. For them, 30% less error means saving a lot of money, it means their monthly margin, it means staying alive. You have to consider that an average small or medium enterprise in Latin America only has enough cash to work with for one month.
So we’re talking about life and death for many of these companies. In the world in general, you have less than a million large companies and more than 150 million SMEs. So we’re talking about a lot of companies.
Q: That’s a lot. Is that why you’ve targeted time series forecasting? Is it your main focus because it affects so many different companies?
That’s our starting point. We actually have a platform that allows us to assemble AI in a modular system like the kid’s game, Legos. In our modular system, we can assemble machine learning models with business rules and package a solution. The first one that we have developed is this time series prediction system. But we’re currently building anomaly detectors, recommender systems, and more because we have realized that this is not just about time series. This is about using AI to model the entire company.
Now, once you have these mathematical models of a company, you can question them all. For example, what’s the reaction of the company if its context variables change? The other thing that is very important to consider is that most of these companies don’t control anything. What I mean is that in any company that’s relatively well run, everything is fixed, save for the context variables. For example, if the exchange rate between the Dollar and the Euro changes, the company is going to suffer. But the company can’t affect the exchange rate. So they’re basically like a cork floating on a macroeconomic ocean. If you predict the ocean, you will know more about the company.
Q: That is an incredible vision. What role has SigOpt played in that?
We know that we’re moving toward offering our customers reinforcement learning because they want to optimize the decision-making process, which requires a lot of artificial intelligence training. But as you see in the recent literature, if you don’t have precise, exact, consistent results, everything else is just garbage. So with SigOpt, if you’re confident that the model you’re using is the best you can use at a reasonable cost given the resources you have, then you can be more confident about the results.
It’s like building a foundation. If your foundation is weak, you’re going to get garbage results. My father is an architect and he designs buildings in order to withstand earthquakes. I helped him to develop all his software and saw that he produces lots of numbers. I learned that when you have lots of numbers, it’s very easy to get lost. You’re not going to see anything.
If you’re not building a strong, robust base, which is what we have with SigOpt because it allows us to choose the best models, we wouldn’t be able to move forward in terms of the complexity of the solutions. So SigOpt makes things pretty straightforward.
Q: Have any of the SigOpt analysis or visualization tools proved helpful to you?
Oh yeah, definitely. When I compare metric performance versus cost in SigOpt, for example, I get a Pareto frontier. I was able to see how I could choose between complexity and performance. Maybe this is getting too deep into the technical details, but it gives me the possibility of starting with very simple architectures and then adding complexity in order to achieve higher performance. So it indicates there is a path you can follow from simple to complex. The first time I was able to visualize that was in the SigOpt dashboard.
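The trade-off Pablo describes, a Pareto frontier between metric performance and training cost, can be illustrated with a minimal sketch that filters a set of tuning runs down to the Pareto-efficient points (maximize performance, minimize cost). The run values below are invented for illustration.

```python
def pareto_front(points):
    """Return the Pareto-efficient subset of (performance, cost) points.

    A point is kept only if no other point achieves performance at least
    as high for cost at least as low (excluding exact duplicates of itself).
    """
    front = []
    for p_perf, p_cost in points:
        dominated = any(
            q_perf >= p_perf and q_cost <= p_cost and (q_perf, q_cost) != (p_perf, p_cost)
            for q_perf, q_cost in points
        )
        if not dominated:
            front.append((p_perf, p_cost))
    return sorted(front)

# Hypothetical tuning runs: (validation accuracy, training cost in GPU-hours).
runs = [(0.70, 1.0), (0.80, 2.0), (0.78, 3.0), (0.90, 8.0), (0.85, 8.0)]
print(pareto_front(runs))  # (0.78, 3.0) and (0.85, 8.0) are dominated
```

Each point on the resulting frontier is a defensible choice; which one to pick is exactly the simple-versus-complex decision Pablo mentions.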
That’s one thing I love. The second one I really like is the parameter importance. Because as I said in the presentation, if you’re driving a crappy car on a winding road, the problem is both the road and the car. The car shouldn’t be the problem; it should only be the road. So if, in the parameter importance list, you see that the data-related hyperparameters are at the bottom of the list, then you know your model is not very good, because the model itself is causing lots of problems and obscuring the search for a solution. That helps us understand whether we are using a good or a bad model.
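SigOpt’s own parameter-importance calculation isn’t spelled out here, but the general idea can be approximated with a crude proxy: rank each hyperparameter by how strongly its values correlate with the achieved metric across tuning runs. The observations below are invented for illustration.

```python
def pearson(xs, ys):
    """Plain Pearson correlation, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def rank_importance(observations, param_names, metric_key="metric"):
    """Very rough importance proxy: |correlation| between each
    hyperparameter's values and the metric across tuning runs."""
    metric = [obs[metric_key] for obs in observations]
    scores = {
        name: abs(pearson([obs[name] for obs in observations], metric))
        for name in param_names
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented tuning observations: learning rate drives the metric, layers mostly don't.
obs = [
    {"lr": 0.01, "layers": 1, "metric": 0.60},
    {"lr": 0.05, "layers": 3, "metric": 0.70},
    {"lr": 0.10, "layers": 2, "metric": 0.90},
    {"lr": 0.20, "layers": 1, "metric": 0.95},
]
print(rank_importance(obs, ["lr", "layers"]))
```

In Pablo’s framing, if the data-related parameters sit at the bottom of such a ranking while model-internal knobs dominate, the model itself is the bumpy part of the road.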
Q: Tell us more about some of the X-RNN models you’re excited about.
In the search for this solution, we needed a very precise, cheap architecture for producing these results. We’ve been working with a mathematician for a long time on a specific architecture that doesn’t use backpropagation; it’s based entirely on ordinary differential equation theory, so the resulting machine learning model is fully characterized mathematically. We’re sure there is a solution because we get a convex surface that produces the best solution given the constraints. We have also fully characterized the convergence times, so everything is perfect. And because of this, we can guarantee that all the initial conditions are absorbed, so the system is robust to initial perturbations. This is very important because if, for example, you have a recurrent neural network working on a problem and you suddenly switch it to a different problem, the previous problem is a different initial condition, so you need to absorb it in order to move on to the new problem. We’re sure that after some latency time, the previous problem is totally forgotten and the system is using all its resources to solve that specific new problem.
Also, the mathematics we developed behind this X-RNN then allows us to build it in a stable way so we can have a system that has a million components and be absolutely sure that it’s never going to explode or behave in an unstable way. So we can grow in complexity in a totally controlled way. This is good for our business because it means that we can be sure about the things we’re using and building.
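The “absorbing initial conditions” property Pablo describes is the hallmark of a stable dynamical system. As a toy sketch (not Anastasia’s actual X-RNN), Euler-integrating the stable ODE x' = -a*x shows trajectories from very different initial states converging, so the system forgets where it started.

```python
def simulate(x0: float, decay: float = 2.0, dt: float = 0.01, steps: int = 500) -> float:
    """Euler-integrate the stable ODE x' = -decay * x from initial state x0."""
    x = x0
    for _ in range(steps):
        x += dt * (-decay * x)
    return x

# Two very different initial conditions end up essentially identical:
a = simulate(x0=100.0)
b = simulate(x0=-50.0)
print(abs(a - b))  # the gap shrinks geometrically, as (1 - decay * dt) ** steps
```

The same intuition scales: as long as every component’s dynamics contract like this, a system with a million components stays stable, which is the controlled growth in complexity Pablo refers to.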
I used to be in academia and in academia, well, if it doesn’t work, it doesn’t matter, it means another paper! But with the customer, it means that the contract is going to be void.
Q: What’s next? What should people be looking out for from you in 2022?
We are working on integrating all of these things because we have this strong intuition that a company or enterprise is one single entity. So we have to model it as a single thing where all the parts are interrelated. This is not about having one module that produces time series, another that recommends, another that does anomaly detection, and I don’t know what else. All of them should be connected into one single thing, because when you’re driving the company, you want to make decisions that optimize the company’s future performance, and that means you have to capture the dynamics of the company totally, completely, in the best possible way. And that’s definable. We’re about to achieve that in this coming month or the next one, in a very simple way that’s going to grow. It’s going to be fun.
Q: You’ve spoken previously about the importance of making sure that small and medium enterprises are doing their AI efficiently. Can you tell us more about that?
For example, there’s a recent article in IEEE Spectrum Magazine predicting that by 2025, training one of these large ImageNet-scale systems will cause as much pollution as the entire city of New York does in a month. That’s ridiculous.
Right now, training those machines requires tens of millions of dollars. That’s absolutely not a cost that small and medium enterprises can bear. If you want that number to go down, we need to produce much more efficient machine learning models. In small and medium enterprises, problems are tiny and data is scarce, so this is a very extreme version of the problem: you need to be even more efficient. As I already told you, there are over 150 million SMEs around the world. If we use the same models with SMEs that we use with large companies, we are going to burn the entire planet out. That’s not ideal. So that puts strong pressure on us to produce much more efficient algorithms, and we’re working on that.
SigOpt also plays a role, because when you optimize, it means you’re using a better system. A better system that reduces a company’s inventory error by 5% means 5% fewer goods being bought, which means 5% less pollution. So you have a direct causal chain explaining how SigOpt helps with this problem.
From SigOpt, Experiment Exchange is a podcast where we find out how the latest developments in AI are transforming our world. Host and SigOpt Head of Engineering Michael McCourt interviews researchers, industry leaders, and other technical experts about their work in AI, ML, and HPC — asking the hard questions we all want answers to. Subscribe through your favorite podcast platform including Spotify, Google Podcasts, Apple Podcasts, and more!