From Promise to Practice: Why a Lack of Standards Prevents Enterprise Adoption of Generative AI
For Generative AI to go from promise to practice in the enterprise, mainstream businesses need to be able to consistently and clearly understand the risk, reward, impact and effort required to adopt a pre-trained model for enterprise use cases. Until business executives and IT leaders have systematic mechanisms to understand and quantify the risk of unforeseen consequences, Generative AI in the enterprise will remain more experimental than experiential.
A systematic, standards-based approach to comparing pre-trained models is the critical first step in bringing clarity to organizations interested in Generative AI. Model curation standards dramatically reduce the decision-making burden for all of the individual technical and business stakeholders involved in using Generative AI in production.
Need: Developers building software for enterprise use want pre-trained models that are fit for purpose for specific jobs-to-be-done. In-house software engineers, those working for system integrators, as well as engineers building commercial software all prefer models best suited to accomplishing specific tasks well, as opposed to generic models that can do lots of tasks somewhat well. The problem for any of these engineers is that the dramatic growth of open-source, commercial and proprietary models leaves no clear starting point for identifying a model fit for the task, and offers even less clarity for determining the cost, risk and effort of usage.
Consequences: There is no standard way of comparing models across capabilities, origin & provenance, use cases, restrictions or risk. This inability to compare different models on capabilities, cost and fit causes a range of problems that work together to slow down enterprise adoption, keeping Generative AI primarily a lab experiment in a semi-permanent, frozen trial state.
Consider a closer look at the causes of this problem:
- Fit: The explosion of task-specific models is proof of the vibrancy of the open source community. It also makes deciding on the best model for a specific task a nightmare. New entrants join the 300K+ open source models every day. Each has different capabilities, is trained on different data sets, has a different context length, specializes in different tasks and offers a varying degree of transparency, explainability and benchmarks. How does a developer decide what is best? Not easily, which is why experimentation outweighs experience by so much.
- Training cost: “Do Everything” models like OpenAI’s ChatGPT can do lots of things, but the charges for using them can quickly get out of hand. They are expensive to train and to run, even disregarding the copious privacy issues surrounding them. Newer “Open Source Generalists” like LLaMA or Falcon approach the performance of OpenAI’s models but can still cost an arm and a leg to run in house. “Task Specialists,” as evidenced by the over 300K models on Huggingface.co, can be great at individual esoteric tasks, but most offer no visibility at all into operational performance or cost of usage.
- Licensing and usage: Not all open source models are truly open source. Nor does open source always equate to “free.” Frequently, Task Specialists come with strings attached. Some are for research only and not authorized for production. Some are authorized for production but run afoul of corporate usage policies. Still others are trained on data that may not have been properly gathered or authorized for use.
- Data privacy and compliance: Data protection and privacy regulations regarding the safe use of AI are emerging at a rapid pace. Not only is there no uniformity of these safeguards across the major economies of the world, there isn’t even consensus within the IT community. While that consensus will no doubt emerge in short order, developers, architects and product managers considering Generative AI have to carefully evaluate their exposure to data privacy risk and put compliance safeguards in place to control for problems.
- Inference latency: It goes without saying that low inference latency is critical to any consumer-facing Generative AI application. The “Do Everything” models offer very little control over inference latency, potentially restricting their applicability to consumer use. The compute, memory and operational needs for training models versus handling inference workloads can vary dramatically by model size, capability and design. A lack of accessible benchmarks further compounds the situation.
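To make the idea of a standard comparison concrete, here is a minimal sketch of what a common model record might look like if the dimensions above (fit, cost, licensing, provenance and latency) were captured in one place. Everything in it is an assumption for illustration: the `ModelRecord` fields, the `shortlist` helper, the model names and the numbers are hypothetical and do not come from any existing standard or catalog.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelRecord:
    """One entry in a hypothetical, standardized model comparison catalog."""
    name: str
    tasks: FrozenSet[str]                      # tasks the model is suited for
    license_allows_production: bool            # False for research-only licenses
    training_data_documented: bool             # provenance of training data is disclosed
    context_length: int                        # maximum tokens per request
    p95_latency_ms: Optional[float]            # None when no benchmark is published
    cost_per_1k_requests_usd: Optional[float]  # None when cost is unknown


def shortlist(catalog: List[ModelRecord], task: str,
              max_latency_ms: float, max_cost_usd: float) -> List[ModelRecord]:
    """Return models fit for `task` that pass licensing, provenance,
    latency and cost gates, cheapest first.

    Models with missing benchmarks are excluded, on the assumption that
    unknown cost or latency is itself a risk.
    """
    candidates = [
        m for m in catalog
        if task in m.tasks
        and m.license_allows_production
        and m.training_data_documented
        and m.p95_latency_ms is not None
        and m.p95_latency_ms <= max_latency_ms
        and m.cost_per_1k_requests_usd is not None
        and m.cost_per_1k_requests_usd <= max_cost_usd
    ]
    return sorted(candidates, key=lambda m: m.cost_per_1k_requests_usd)


# Illustrative, made-up catalog entries (not real models or measurements).
CATALOG = [
    ModelRecord("generalist-large",
                frozenset({"summarization", "classification", "translation"}),
                True, False, 8192, 900.0, 12.0),
    ModelRecord("summarizer-small", frozenset({"summarization"}),
                True, True, 2048, 120.0, 0.8),
    ModelRecord("research-only-summarizer", frozenset({"summarization"}),
                False, True, 4096, 90.0, 0.5),
    ModelRecord("unbenchmarked-summarizer", frozenset({"summarization"}),
                True, True, 4096, None, None),
    ModelRecord("summarizer-medium", frozenset({"summarization"}),
                True, True, 4096, 250.0, 2.0),
]

if __name__ == "__main__":
    for model in shortlist(CATALOG, "summarization",
                           max_latency_ms=300.0, max_cost_usd=5.0):
        print(model.name, model.p95_latency_ms, model.cost_per_1k_requests_usd)
```

With comparable records like these, the choice of a model becomes a filter over known attributes rather than a round of trial and error. The hard part, and the point of this article, is that no agreed standard yet exists for filling in those fields consistently.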
For IT executives considering Generative AI for enterprise use cases, there is a wide range of offerings to choose from. Each choice, however, comes with its own pros and cons. Picking wisely can mean the difference between permanent experimentation and game-changing results.