Mohammad Abu Sheikh, Founder and CEO of CNTXT AI – Interview Series

Mohammad Abu Sheikh is reshaping the AI landscape in the MENA region, driving the shift from passive consumption to sovereign innovation. As CEO of CNTXT AI and founder of a $10 million AI fund, he has led three successful exits and secured more than $1 billion in funding. His work is laying the foundation for an AI ecosystem rooted in language, culture and data sovereignty.
We saw that data was underutilized in this part of the world. Many of the problems with scaling AI come from a lack of data readiness, which ultimately means a lack of AI readiness. That’s why we started CNTXT AI.
Initially, we were solving the same problems we had faced when building Locai… We saw the same challenges when working with AI71, TII, and G42 (IIAI). As we helped these entities solve these problems, the vision became clearer and the business kept growing.
You have played a key role in building one of the largest Arabic digital libraries for AI training. What were the biggest challenges in doing this, and how did you overcome them?
Quality is one of the biggest challenges. Another is the limited availability of high-quality Arabic data online, that is, the underrepresentation of Arabic. Only a small portion of Arabic content has been digitized, and only 3-5% of online content is in Arabic. That’s almost nothing. We overcame this problem by deploying data taggers, annotators, and data scientists to digitize, create and curate data.
CNTXT AI operates at the intersection of culture and computing. How do you balance the goal of state-of-the-art AI innovation with the goal of building culturally relevant solutions for the MENA region?
We build culturally rooted models from scratch. From infrastructure to final products, culture is embedded from the beginning; it’s not something we add later. We design, innovate and build for specific cultures, dialects and needs from day one. Arabic is one language, but it comes with many dialects and cultural backgrounds across the region, so we build local products for each country. We do this by working with local annotators, local people, people from the country itself.
You have also co-founded Locai and lead the SMPL AI Fund, among others. How do these ventures complement the mission of CNTXT AI?
Locai is the application layer, the part that people actually interact with. It sits on top of the data and infrastructure built by CNTXT AI. That’s why it succeeds: it turns the AI foundations provided by CNTXT AI into real-world solutions that people can use.
SMPL AI, on the other hand, is about giving back to the community. It focuses on investing in early-stage startups and helping build a regional AI ecosystem. We share the tools and lessons we have learned from building our own AI so founders can grow faster and avoid common pitfalls.
Munsit is known as the most accurate Arabic speech recognition model in the world. What drove the development of the model, and why now?
The reason behind the development of this model is simple: demand.
We always build out of necessity. We looked at the market and found that it was ready: both government agencies and private clients were asking for such a solution.
The existing models just weren’t up to the task. Most are built on English-first technology and then adapted. They were not designed for Arabic from the ground up, and they were certainly not designed for the specific problems we wanted to solve.
So we decided to build our own, designed for Arabic first.
The research behind Munsit introduces a weakly supervised learning approach. Can you explain what this means and why it is crucial for training Arabic ASR at scale?
Annotation is expensive. Therefore, we had to go beyond traditional methods that depend on large volumes of manual transcription. Weakly supervised learning helps us scale without having to manually label each audio file. This is especially important for Arabic, a language with limited data and many different dialects.
Instead of using professionally transcribed audio, we started with 30,000 hours of unlabeled Arabic speech. We built an annotation pipeline that generates annotations, then filters and cleans them using automatic checks to keep only the best. This gave us a high-quality 15,000-hour dataset, all without human transcription.
This approach makes it possible to train our model from scratch, capturing the richness of real-life Arabic quickly and cost-effectively. Without it, building an Arabic ASR system at this scale would take years and millions in manual effort.
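As an illustration only, the generate-then-filter idea described above can be sketched in a few lines of Python. The interview does not describe Munsit's actual pipeline, so everything here is an assumption: the `Segment` structure, the idea of comparing two machine-generated transcripts per clip, the word-level disagreement measure, and the `max_disagreement` threshold are all hypothetical stand-ins for whatever automatic checks the team really uses.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    audio_id: str
    hours: float
    # Candidate transcripts from different (hypothetical) automatic systems.
    hypotheses: list[str]


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance computed over word tokens."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def disagreement(first: str, second: str) -> float:
    """Word-level edit distance, normalised by the length of `first`."""
    first_words, second_words = first.split(), second.split()
    if not first_words:
        return 1.0
    return word_edit_distance(first_words, second_words) / len(first_words)


def filter_segments(segments, max_disagreement=0.1):
    """Keep segments whose first two hypotheses agree closely.

    The first hypothesis is used as the weak label for each kept segment.
    """
    kept = []
    for seg in segments:
        if len(seg.hypotheses) < 2:
            continue  # no second opinion, so no automatic check is possible
        first, second = seg.hypotheses[0], seg.hypotheses[1]
        if disagreement(first, second) <= max_disagreement:
            kept.append((seg, first))
    return kept


# Toy data, invented for this sketch.
segments = [
    Segment("a", 2.0, ["صباح الخير يا جماعة", "صباح الخير يا جماعة"]),
    Segment("b", 1.0, ["اهلا وسهلا بكم", "اهلا بكم"]),
    Segment("c", 1.0, ["شكرا"]),
]
kept = filter_segments(segments)
kept_hours = sum(seg.hours for seg, _ in kept)
total_hours = sum(seg.hours for seg in segments)
print(f"kept {kept_hours} of {total_hours} hours")
```

In this toy run, half of the audio survives the filter, which mirrors the ratio quoted in the interview (30,000 hours in, 15,000 hours out). A real system would likely combine several checks rather than a single agreement threshold.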
Munsit outperforms models from OpenAI, Microsoft and Meta across multiple benchmarks. What does this achievement say about the future of Arabic AI innovation?
The future of Arabic AI is in our hands; this is what this achievement proves. We can no longer rely on technology we do not own or rely on third parties that do not prioritize our region.
Munsit shows that we can use local talent to solve local problems. It is a clear signal that the next wave of Arabic AI innovation will come from within.
How do you see Munsit evolving in future releases, and what’s the next frontier for Arabic voice AI at CNTXT AI?
You’ll just have to wait and see. What I can say is that we have a fresh set of Arabic-first AI solutions on the way, all powered by Munsit and other models we are currently building at CNTXT AI. This is just the beginning.
You often talk about the importance of “sovereign AI”. What does this term mean to you, and why is it crucial to the Gulf and the wider MENA region?
To me, sovereign AI means full ownership and control over the data, infrastructure, and models that shape our future. This is crucial because we need to own our destiny, and that starts with data.
Data sovereignty is everything. Data is valuable and we need to make sure it stays in our hands.
We cannot hand over our future and sit idle while others build technology for us. The future of AI in this region will come from the region. This is exactly what we are working on.
How do you see CNTXT AI shaping the Middle East’s AI ecosystem over the next five years?
By enabling true AI readiness. We go in, understand what companies and governments need, build data and AI strategies, and then help them build, test, deploy and scale.
If data is the new oil, then unstructured data is crude oil: full of potential but useless until it is processed. That’s why we built CNTXT AI to help organizations clean, structure and activate their data. Because that’s where the real AI transformation begins.
From your perspective as both an entrepreneur and an investor, what advice would you give to other founders building AI startups in emerging markets?
Start now. Move quickly. Fail fast, learn faster, and keep iterating.
Most importantly, solve real problems. Stay close to the ground and listen to users, not just the hype. In emerging markets, relevance and adaptability are key.
Thank you for the great interview. Readers who wish to learn more should visit CNTXT AI.