SQL and Salesforce with DuckDB and GRAX

Table of Contents

Salesforce, SOQL, and Limits

SQL and Salesforce with DuckDB and GRAX — Source: Image created by OpenAI’s DALL-E, May 24, 2024

Salesforce, a cloud-based Customer Relationship Management (CRM) system, is one of the richest sources of data for business analysis and intelligence.

Structured Query Language (SQL), a language for querying, filtering, joining, and aggregating data, is one of the most widely used tools for data analysis. One of the primary tools for analysis and reporting on the Salesforce platform is Salesforce Object Query Language (SOQL), a dialect of SQL that lets you query Salesforce data directly on the platform.

But as any Salesforce data admin knows, SOQL has some major limitations and challenges. Some of the main issues are:

Hard limits on the length of a query string (MALFORMED_QUERY)
Hard limits on row count and field count in result sets (QUERY_TOO_COMPLICATED)
Hard limits on how long a query can take to run (QUERY_TIMEOUT)
Limited WHERE semantics in queries (SOQL limitations)
No JOIN semantics in queries (relationship query limitations)
No ability to JOIN against other data sets outside of Salesforce

These problems get worse and worse as Salesforce and company-wide data sets grow in complexity and scale.

Traditionally this means you need to set up a database or data warehouse, and load Salesforce data into it before you can start doing real SQL queries and analytics.

But there is a new database – DuckDB – that unlocks true SQL abilities and eliminates the challenges of SOQL, traditional database servers, and Export-Transform-Load (ETL) processes. Combined with GRAX’s data warehouse and data API capabilities, it’s easy to unlock advanced SQL analytics for Salesforce data.

DuckDB and Rich SQL Support

DuckDB is an “in-process analytical database”. There are two things that set it apart from other databases:

There is no database server – it is a program you download that handles SQL queries
It can read and write file formats such as CSV, Parquet, and JSON, to and from the local file system and remote endpoints such as S3 buckets.

This allows a data analyst to write standard SQL queries that fetch, filter, join, and aggregate data across multiple data sources like CSV data on disk, Parquet data on S3, as well as queries to traditional database servers.

This approach is a game changer in the analytics space and enables you to run the most sophisticated distributed data lake query techniques on your laptop against whatever data sets you can download or fetch.

So the remaining challenge is to get your data into standard formats that DuckDB understands like CSV or Parquet. GRAX makes it easy to get your Salesforce data in these formats for data reuse and analytics.

See GRAX in Action!

Check out our demo video to see how GRAX can help you unleash your data’s potential.

Watch Demo

GRAX Data APIs

GRAX acts as a multi-purpose Salesforce data warehouse.

Its primary function is a Salesforce data collector, which when turned on starts capturing and organizing every version of every object into its GRAX internal data warehouse storage system. This powers its classic backup, archive, and restore capabilities.

But with all this Salesforce historical data captured and organized, GRAX also works as a data warehouse with a Search API which lets you quickly find and download Salesforce data by object, field sets, and windows of time across your entire Salesforce history.

The simplest way to get started is to use the GRAX Search UI to find and download some data as CSV. Common search jobs:

All accounts (no filters)
All accounts in North America (field filter)
All accounts as they existed at the end of 2023 (historical time window)

To learn more check out our doc: https://documentation.grax.com/reuse-data/global-search.

A more powerful way is to use the GRAX Search API to script queries in Python. Some scenarios that are possible with a bit of scripting:

All accounts as they existed at the end of every month in 2023 to analyze month-over-month data (12 search API calls)
All Accounts, Cases, and Assets to join together for analysis (3 search API calls)

To learn more, take a look at our API Docs and GRAX Labs GitHub Repo.

Once we understand how to use GRAX to get Salesforce data out for reuse, we’re ready to do advanced analytics with DuckDB.

GRAX and DuckDB

Now we’re ready to start writing SQL queries and have DuckDB go find the data in GRAX and do the analytics. This is accomplished with a few lines of Python script that connects the DuckDB SQL engine with the GRAX Search API to fetch data for analysis. You can see this at graxlabs/duckdb on GitHub.
We can start with something simple – counting all contacts. Salesforce data admins know that counting is something Salesforce and SOQL have trouble with for large data sets.


SELECT COUNT(*) FROM Contact;

---
Downloading Contact...

   count_star()
0        206084

Now we’re ready for some real SQL that shows us Contacts in an Account hierarchy.

WITH Contacts AS (
    SELECT
    AccountId, Email, CONCAT_WS(' ', FirstName, LastName) AS Name
    FROM Contact
),
Accounts AS (
    SELECT
    a.Id, a.Name AS Account,
    (SELECT Name FROM Account.csv WHERE Id = a.ParentId) AS ParentAccount
    FROM Account a
)
SELECT
c.*, a.Account, a.ParentAccount
FROM Accounts a
JOIN Contacts c ON c.AccountId = a.Id
WHERE ParentAccountName IS NOT NULL
LIMIT 20;

---
Downloading Contact...
Downloading Account...

Email:         contact@example.com
Name:          First Last
Account:       Subsidiary Corp
ParentAccount: Parent Corp

...

SQL experts can rejoice that we use Common Table Expressions (CTEs), JOINs, and subqueries. Anything analytics you can imagine doing in SQL is now possible.

To learn more, check out the following resources:

Conclusion

Thanks to DuckDB and GRAX, we’re now able to analyze our Salesforce data with SQL.

All the limits of SOQL are gone. We can join, transform, filter, window, and aggregate large amounts of Salesforce data without leaving our SQL editor.

All the hassles of databases are gone. We didn’t need to set up an expensive database, data warehouse or ETL jobs before we could start doing analytics.

The result is getting insights from our business data in minutes.

Want business insights in minutes?

Speak with a GRAX product expert to see how GRAX can help you unlock your data’s potential.

Get started

SQL and Salesforce with DuckDB and GRAX

Salesforce, SOQL, and Limits

DuckDB and Rich SQL Support

See GRAX in Action!

GRAX Data APIs

GRAX acts as a multi-purpose Salesforce data warehouse.

GRAX and DuckDB

Conclusion

Want business insights in minutes?

Data Lakehouse Architecture with GRAX and Azure

Unlock Salesforce Insights in Power BI with GRAX +...

How to Automate Lead Enrichment Using Salesforce...

Join the best
with GRAX Enterprise.

Be among the smartest companies in the world.

Salesforce, SOQL, and Limits

DuckDB and Rich SQL Support

See GRAX in Action!

GRAX Data APIs

GRAX acts as a multi-purpose Salesforce data warehouse.

GRAX and DuckDB

Conclusion

Want business insights in minutes?

Sign up for updates

You may also like…

Join the best with GRAX Enterprise.

Be among the smartest companies in the world.

Sign up for updates

Thanks for signing up!

Join the best
with GRAX Enterprise.