Explore your data in your own words with natural language question answering on structured data

Vivek Sriram
2 min read · Oct 3, 2023

Overview: Most business users work with data every day, but most do not know SQL and so have no easy way to query that data for insights. LLMs and generative AI can make interacting with data easy and intuitive for novice users, with no knowledge of SQL or of the underlying tables, columns, fields, or attributes.

Approach

Step 1 — Understand user’s intent

Dashboards and canned analytics charts offer users intuitive ways to interact with data. Frequently, however, users have “long-tail” queries that require them to build custom charts. In most cases, the users who need this kind of data are poorly placed to get it themselves, either because they lack SQL knowledge or because they are not authorized to access the data. No organization wants a customer service manager writing a SQL query that brings down the entire analytics stack.

LLMs are particularly good at understanding language and deciphering user intent, which makes them a useful complement to the template-based approach commonly used to give users self-service access to data.

Step 2 — Filter relevant data tables to use

In complex systems such as enterprise analytics, there are often hundreds of databases with countless tables. Mapping user intent to the subset of tables best suited to answer the question is key to improving the accuracy of LLM-generated SQL queries. An instruction-tuned LLM is helpful here: given the table definitions and a few examples, it can interpret the user’s intent and map it to a set of database tables. Performance can improve significantly if a good-quality dataset is available for fine-tuning.
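As a rough illustration of this table-filtering step, the sketch below builds a few-shot prompt from table definitions and then keeps only the table names in the model's reply that actually exist in the catalog. The table catalog, the example questions, and the function names are all hypothetical, and the call to the instruction-tuned model itself is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical table definitions; a real catalog would come from warehouse metadata.
TABLES: Dict[str, str] = {
    "orders": "order_id, customer_id, order_date, total_amount",
    "customers": "customer_id, name, region, signup_date",
    "tickets": "ticket_id, customer_id, opened_at, status",
}

# Hypothetical few-shot examples pairing a question with the tables it needs.
EXAMPLES: List[Tuple[str, List[str]]] = [
    ("How many tickets are still open?", ["tickets"]),
    ("What was total revenue by region last month?", ["orders", "customers"]),
]


def build_table_prompt(
    question: str,
    tables: Dict[str, str],
    examples: List[Tuple[str, List[str]]],
) -> str:
    """Assemble table definitions, worked examples, and the new question."""
    lines = [
        "Pick the tables needed to answer the question.",
        "Answer with a comma-separated list of table names.",
        "",
        "Tables:",
    ]
    for name, columns in tables.items():
        lines.append(f"- {name}({columns})")
    lines.append("")
    for example_question, example_tables in examples:
        lines.append(f"Question: {example_question}")
        lines.append(f"Tables: {', '.join(example_tables)}")
        lines.append("")
    lines.append(f"Question: {question}")
    lines.append("Tables:")
    return "\n".join(lines)


def parse_table_list(reply: str, tables: Dict[str, str]) -> List[str]:
    """Keep only names that exist in the catalog, in reply order, without duplicates."""
    selected: List[str] = []
    for raw in reply.replace("\n", ",").split(","):
        name = raw.strip().strip("`\"'. ").lower()
        if name in tables and name not in selected:
            selected.append(name)
    return selected
```

Filtering the reply against the known catalog means a hallucinated table name is dropped before it can reach the SQL-generation step.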

Step 3 — Create SQL queries out of natural language

An LLM tuned on code can convert natural language inputs into SQL. Feeding it the user question together with the filtered table list improves the quality of the generated SQL.
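One way to feed the question and the filtered table list to a code model is sketched below. It assumes a completion-style model that continues the prompt, which is why the prompt ends with `SELECT`. The prompt layout and both function names are illustrative assumptions, not a documented format for any particular model.

```python
from typing import Dict, List


def build_sql_prompt(question: str, schema: Dict[str, str], selected: List[str]) -> str:
    """Include only the tables chosen in the filtering step, then the question."""
    ddl = "\n".join(f"CREATE TABLE {table} ({schema[table]});" for table in selected)
    return (
        "-- Schema (only the tables selected for this question)\n"
        f"{ddl}\n"
        f"-- Question: {question}\n"
        "-- Write one SQL SELECT statement that answers the question.\n"
        "SELECT"
    )


def extract_sql(completion: str) -> str:
    """Rebuild the statement from the model's continuation.

    The prompt already ended with SELECT, so the completion is the rest of
    the statement; anything after the first semicolon is discarded.
    """
    body = completion.split(";", 1)[0].strip()
    return f"SELECT {body};"
```

Leaving unselected tables out of the prompt keeps it short and removes columns the model might otherwise reference by mistake.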

4 steps to building natural language data exploration with Bookend

Step 1: Choose the best model for the task.

Bookend has optimized Llama2-Chat and Replit-Code-v1-3b models available and ready for immediate use.

Step 2: Configure

No configuration steps are required to deploy these models. Bookend takes care of the operational tasks, making smart choices about hardware, inference framework, and replication.

All Bookend models run in a separate environment, which keeps all conversations with the models protected from anyone outside your organization.

Step 3: Fine tune

Bookend makes it easy to fine-tune Code Llama while ensuring that the model’s previously learned capabilities are not lost, by automatically adjusting learning rates, freezing certain layers of the model, and so on.

Step 4: Integration

The Bookend API makes it easy for developers to send natural language questions to Llama2-Chat and get back the tables to use for the analysis. A second API request to Code Llama returns the SQL code to run against the databases.
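The two-request flow might look roughly like the sketch below. It does not use the actual Bookend API, whose endpoints and parameters are not described here. Instead, each model is passed in as a plain function from prompt text to reply text, and the two `fake_*` functions are stand-ins that return canned replies so the flow can be run end to end.

```python
from typing import Callable, Dict

# Any function that takes a prompt string and returns the model's text reply.
# In practice this would wrap an API request; that wrapper is not shown here.
Model = Callable[[str], str]


def question_to_sql(
    question: str,
    schema: Dict[str, str],
    table_model: Model,
    sql_model: Model,
) -> str:
    # Request 1: ask the chat model which tables are relevant.
    catalog = "\n".join(f"{name}({columns})" for name, columns in schema.items())
    reply = table_model(
        f"Tables:\n{catalog}\nQuestion: {question}\nRelevant tables (comma-separated):"
    )
    tables = [name.strip() for name in reply.split(",") if name.strip() in schema]
    if not tables:
        raise ValueError("no known tables selected for this question")

    # Request 2: ask the code model for SQL, showing it only those tables.
    selected = "\n".join(f"{name}({schema[name]})" for name in tables)
    return sql_model(f"Schema:\n{selected}\nQuestion: {question}\nSQL:").strip()


# Hypothetical schema and canned model replies, for demonstration only.
SCHEMA = {
    "orders": "order_id, customer_id, total_amount",
    "customers": "customer_id, region",
    "tickets": "ticket_id, customer_id, status",
}


def fake_table_model(prompt: str) -> str:
    return "orders, customers"


def fake_sql_model(prompt: str) -> str:
    return (
        " SELECT region, SUM(total_amount) FROM orders "
        "JOIN customers USING (customer_id) GROUP BY region; "
    )


sql = question_to_sql("Revenue by region?", SCHEMA, fake_table_model, fake_sql_model)
print(sql)
```

Passing the models in as functions keeps the pipeline logic separate from whichever client library ends up making the requests.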
