If you’re just starting out in the world of Einstein Analytics you might see or hear “SAQL” fairly often and be like SAQL what, SAQL who? You mean SOQL right? Nope, you heard it right! SAQL is the Salesforce Analytics Query Language, and while it is similar in many ways to SOQL (Salesforce Object Query Language) and even SQL (Structured Query Language), its different. And its powerful. And you need to learn it right now.
When I was first starting out, I looked at some of the SAQL queries that come with the Salesforce templated apps and I was super overwhelmed. (That waterfall chart, anyone?) I didn’t know any code and I had no idea what I was looking at. There were q’s and r’s and foreach’s and groups and filters and math and I internally started freaking out a little bit. I’m a quick study, but how was I going to learn an entirely new language so that I could just do my job? Hint: I’m gonna tell you! Over the next few posts, I’ll walk you through some basics of SAQL and share how I learned (and am still learning) to use SAQL to transform the way we look at and use data. Ready?
Anatomy of a Query
There are some basic pieces of syntax that probably all of your queries will have. To try and remember those things, the first thing I did when I started learning SAQL was go get a pile of post-it notes and put them next to my keyboard. I would write out what I needed to do on post-its, organize them the way they needed to be organized, usually on my office window, and then type it out in the code. I don’t know if I really recommend this approach, but it really did help me learn some basics.
“We’re talking about the struggle to drag a thought over from the mush of the unconscious into some kind of grammar, syntax, human sense; every attempt means starting over with language. Starting over with accuracy.”Anne Carson
Load Your Data
Every query you write will start with loading data. That makes sense, right? This looks something like this:
q = load 'NameofmyDataset';
There are a few things to breakdown with this statement. First, what does q mean? This q is naming a stream, and every statement you make starts with identifying which stream it applies to. The name of your stream can actually be anything, as long as it doesn’t have spaces. I find that it’s helpful to give my streams a more descriptive name. This is especially useful when I have long queries with many streams. For example, I am going to have a query that looks at invoices, but also opportunities, and maybe accounts. I could name each of these streams with what they are:
invoices = load 'myinvoicedataset';
opps = load 'myopportunitydataset';
accounts = load 'myaccountsdataset';
icouldevenmakethismystreamifiwantedto = load 'NameofmyDataset';
Make sense? Cool, let’s move on. The word “load” is a keyword that does just that – it tells the platform that you want to load some data. The next part says which data you want it to load. This is the api name of your dataset. Every statement that you make in a query will end with a semicolon. This is like the period at the end of the sentence. Its telling the platform that you are done with that statement and it should now evaluate the next one. Moving right along… after you load your data, you will probably apply filters to it.
We’ve now loaded an entire dataset, but the chances of us needing the whole thing are pretty slim. Let’s apply some filters so that we are only working with the data we need. We’ll do this by once again identifying the stream that our data is in, and then telling it which fields we want to filter and for which values. It looks something like this:
q = filter q by 'myfieldname' == "FieldValue";
This is saying that for our stream of q, I want to only look at data where ‘myfieldname’ equals “FieldValue”. To put this in context, if you were looking at a dataset of Opportunities, you could apply a filter to only look at opportunities where the Stage equals “Needs Action”. That would look like this:
q = filter q by 'StageName' == "Needs Action";
Or we could look at multiple stages:
q = filter q by 'StageName' in ["Needs Action", "Working", "Sales Qualified"];
We can also exclude values, which would look like this:
q = filter q by 'StageName' != "Closed Won";
q = filter q by 'StageName' not in ["Closed Won", "Closed Lost"];
So, for single values, we use == for equals, and != for not equals. For many values, we use “in” or “not in”. As far as I know, there is no limit to how many filters you can apply to a query. If it gets too crazy, you might want to evaluate the data that is part of your dataset though. The more you add, the longer it will take your queries to be evaluated. To add multiple filters, you can add them on separate lines or use them in one statement using “and”.
q = filter q by 'StageName' == "NeedsAction"; q = filter q by 'IsClosed' != "true"; q = filter q by 'ShippingCountry' not in ["Algeria", "Canada", "Brazil"];
q = filter q by 'StageName' == "Needs Action" and 'IsClosed' != "true" and 'ShippingCountry' not in ["Algeria", "Canada", "Brazil"];
Both of these examples are applying the same filters. Its really a preference thing. If I’m only applying a few filters, I might combine them into one statement. If I’m adding more than a few, I will separate them so that its easier to find and edit what I need to later on. I think we need a gif break… let’s take a minute to shake it off.
Now that that’s done, we can get back to business. Alright, so we’ve loaded data and we’ve filtered it, and all together now we have something that looks like this:
q = load "Opportunities"; q = filter q by 'StageName' in ["Needs Action", "Working"];
Our next step is grouping our data. You don’t have to group your data, but more often than not, I do. You can think of this in terms of standard Salesforce reports: if you don’t group your data, you have a tabular report – just a big table with rows and columns. If you do group your data, then you have a summary or matrix style report.
In our example, we are looking at Opportunities that have a stage of Needs Action or Working. I want to see how many Opportunities I have in each of those stages, so our grouping will be on the Stage. It’s going to look something like this:
q = group q by 'StageName';
By now, you’ll see our syntax developing here. Every statement starts with the stream name. Every statement identifies a keyword for an action we need to perform on the stream. Every statement names the fields we want to work with. Every statement ends with a semicolon.
You can also group by more than one column. Maybe we want to see how many Opportunities are in each stage and also who the owner is. In this case, we would group by 2 fields like this:
q = group q by ('StageName', 'OpportunityOwner');
Grouping is a really big topic that could fill a dozen more posts, but we’re going to keep it simple and end on this today. We’ll come back to all the different ways you can group things in future posts. So, now that we’ve grouped our data, let’s see what we’ve got. To produce a result, we need a foreach statement.
In a foreach statement, you’re going to tell the platform what data points you want to return from your query. In keeping with our Opportunity example, let’s tell it to return the number of opportunities and also the sum of the amount of our opportunities. This looks like this:
q = foreach q generate 'StageName' as 'StageName', count() as 'number_of_opps', sum('Amount') as 'sum_Amount';
We’re saying that for our stream of q, foreach Stage, we want to count the number of rows, which will give us our count of opportunities, and we also want the sum of the Amount column. You’ll notice there is an ‘as’ after our field names. We’re assigning variables. In other words, we’re saying that we want the sum of the Amount, and I want to call that ‘sum_Amount’. I could also call it ‘TotalAmount’ or ‘thisnumber_i_made’. It doesn’t actually matter what you call it. Again, it comes down to preference and best practice. I like to name things what they are. So if I’m writing a query that is going to return the value of opportunities quarter to date, I’m going to name it ‘value_qtd’. Let’s put it all together.
q = load "Opportunities"; q = filter q by 'StageName' in ["Needs Action", "Working"]; q = group q by 'StageName'; q = foreach q generate 'StageName' as 'StageName', count() as 'number_of_opps', sum('Amount') as 'sum_Amount';
We’re doing great! Let’s recap. First we loaded our data. Then we applied filters so that we narrow down the scope of what we’re looking at. We grouped our data by Stage, and then we told it what data we wanted returned. Our example query is going to return data that looks something like this:
There are a couple more steps you could add to the end of this, like ordering and sorting your data and limiting it to a certain number of rows, but I think we’ve covered enough for now. So, now that you know some basics, what can you do to practice?
Practice Makes Perfect
Hands down, the BEST thing you can do to learn SAQL is to create a query declaratively and then click the SAQL button to look at what it translates to in the code. Every lens generates and executes a SAQL query to build the data needed for your dashboard widgets. Don’t be afraid to put a lens together and then dissect what its doing in the SAQL. Its how I learned to understand what was going on behind the scenes. If you apply this filter, what does that syntax look like? If I build a compare table, how does that change the syntax? Play around and see what you can see.
AND, I know reading documentation is super boring, but the Analytics SAQL Reference is actually really helpful. I still refer to it at least once a week because who wants to memorize everything? Not me. Which brings me to maybe my last point for this post: you don’t have to know all the answers as long as you know where to find the answers. I can write SAQL in my sleep, but it wasn’t always like that. I Google everything. I use the SAQL reference. I copy and paste A LOT. I have pages and pages full of syntax that I have used for various things and I just reuse it all the time. Its how we learn!
Be sure to stay tuned for the next post where I’ll review more SAQL functions and give more examples on structure and execution.