With the volume, velocity, and variety of data produced by enterprises growing at 42 percent year over year, the world’s data storage capacity will exceed 175 zettabytes by 2025. Management of that data using traditional methods is expensive, slow, and heavily regulated. Furthermore, much of that data (57 percent) now sits outside of the data center.
This ‘edge-based’ data resides in IoT devices or in the cloud, where it is accessed by an increasingly distributed, global workforce. As a result, 68 percent of the data that enterprises create goes unused, leaving precious value uncaptured.
“One of the fundamental questions in computing is: do you do things in a centralized way, or do you do things in a decentralized way?” says David Aronchick, CEO of Expanso. “The thing is there's no right answer to that question. Different scenarios are valid based on what the business is trying to do. Our goal is to add to the portfolio of tools that developers have to make the maximum use of the assets at their disposal.”
David and his team envision a world where organizations can make maximum use of their infrastructure and data by changing the location where processing occurs.
“This isn’t about giant machines and data lakes where all your information is stored. Rather, it's recognizing the reality of where data and compute are, and how to take maximum advantage of a distributed infrastructure. We see Bacalhau, the open-source project, as a complement to Kubernetes, Hadoop, and Spark, providing a global partnership for data transformation. In this paradigm, the platform itself understands that data is not going to sit on a single machine or within a single zone. It’s spread all over the place.”
A new platform and data processing paradigm:
David, who previously held senior roles at Google (Kubernetes, Cloud AI, and Kubeflow), Microsoft (Azure), and Protocol Labs, and led product management in the early days of Kubernetes, founded Expanso to establish that new paradigm: a platform that addresses the challenges surrounding data management today, namely that data is too big to move, networks are often too slow, and merely touching the data can lead to security issues.
Instead, Expanso’s compute-over-data approach orchestrates jobs to run where the data is generated and stored, reducing costly and risky data transfers and storage.
“In our world, developers can move just what they need, and respect data gravity. The first step of their process no longer has to be moving the raw data to a single bucket,” says David.
The team’s first step into the market was through the open-source community. The Bacalhau project, named after the Portuguese word for cod, is a distributed data platform for fast, cost-efficient, and secure computation that enables users to run compute jobs where the data is generated and stored. Using Bacalhau, developers can streamline existing workflows without rewrites by running Docker containers and WebAssembly (WASM) images as tasks. This architecture is also referred to as Compute Over Data (or CoD).
They were clearly onto something. In just a year, jobs run on the platform swelled to more than 1.5M, with peaks of more than 150,000 jobs per month.
“Almost instantly it was obvious that people were going to need a commercial company that supported this platform,” says David.
Data processing made faster, more affordable and more secure:
David and the team founded Expanso, the commercial company committed to supporting the open-source project, with the aim of helping enterprises join the community. In the decentralized world of Expanso, companies can create a private network by reusing existing infrastructure or provisioning new machines. Additionally, management costs are kept to a minimum because failures are automatically addressed by the network, and all data interactions are recorded, audited, and stored in a permanent log.
In one particular use case, Expanso can reduce the amount of data moved over a network by 93% and the cost of processing logs by more than 99%.
“The default today is to move raw log files into a common data lake, which takes time, costs storage and egress fees, and risks moving sensitive data. By doing log processing and sanitization at the point of log creation, we can make things cheaper, faster, and more secure.”
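To make the idea of processing logs where they are created more concrete, here is a minimal Python sketch. It is not Expanso or Bacalhau code and does not reproduce the figures quoted above. The function names, the masking patterns, and the assumption that each log line starts with its level are all hypothetical, chosen only for illustration. The sketch masks sensitive values, keeps only error lines plus per-level counts, and estimates how much less data would need to leave the machine.

```python
import json
import re
from collections import Counter

# Hypothetical patterns for sensitive values; a real deployment would define its own.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sanitize(line: str) -> str:
    """Mask e-mail addresses and IPv4 addresses in one log line."""
    line = EMAIL.sub("<email>", line)
    return IPV4.sub("<ip>", line)


def summarize(lines):
    """Count lines per log level and keep only sanitized ERROR lines.

    Assumes (for illustration) that the first token of each line is its level.
    """
    counts = Counter()
    errors = []
    for line in lines:
        parts = line.split(maxsplit=1)
        level = parts[0] if parts else "UNKNOWN"
        counts[level] += 1
        if level == "ERROR":
            errors.append(sanitize(line))
    return {"counts": dict(counts), "errors": errors}


def bytes_saved_fraction(raw_lines, summary) -> float:
    """Fraction of bytes that no longer has to be sent if only the summary moves."""
    raw = sum(len(line.encode()) + 1 for line in raw_lines)  # +1 for the newline
    if raw == 0:
        return 0.0
    sent = len(json.dumps(summary).encode())
    return 1 - sent / raw
```

In this sketch the raw lines never leave the machine that produced them; only the small, sanitized summary returned by `summarize` would be shipped to a central store. How much is actually saved depends entirely on the workload and on what the summary keeps.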
Though founded only in 2023, Expanso has already attracted attention from prospective investors and industry leaders. The company has signed several commercial partners, including the US Navy, and recently closed a seed investment of $7.5M from General Catalyst, Hetz Ventures, and Array Ventures.
“Organizations who have 10 or more devices creating data or are looking to be multi-zone or multi-cloud with their deployments are likely already feeling challenges in managing data,” says David. “Because Bacalhau makes managing these machines and data easier, it lends itself to a pure product-led growth motion, resulting in a passionate community of developers and enterprises.”
Joining the Open 5G Innovation Lab’s (5G OIL) Batch 8 cohort will help Expanso establish its market foothold and build relationships with large industry players.
“Jim Brisimitzis and his team can open the right doors,” says David. “We've talked to at least ten different organizations who have real distributed computing needs, and we continue to do more. 5G OIL’s team is so responsive; they are ready to jump in to make you successful. These are big companies with enormous needs, so it takes time to bring a relationship of this calibre to fruition. The fact we’re having these kinds of conversations as a small company is a remarkable thing. It means that come the New Year we can focus on scaling up, continuing to productize, continuing to build growth, and repeating that same go-to-market that's already been successful with the existing team.”
Posted December 04, 2023