In a previous post we reviewed different techniques to troubleshoot automation processes in your Salesforce org. This time we’ll tackle troubleshooting in general, even in systems other than Salesforce. This post is intended for two audiences: those who report issues and those who receive the issue reports and must solve them.
We will focus not so much on technical stuff, but mostly on the state of mind needed to report and fix issues, and some pointers for Administrators in the support team as well as end-users to report issues when they happen.
Why Is It So Difficult To Troubleshoot?
As an Administrator you will likely always be in the dark. All you might get is an email with the subject “Error in Salesforce” (that’s helpful to know) and if you are really lucky you might get one or two sentences of what happened (if any at all), but rarely a full message of the error, let alone the steps the user followed to get to that error. By “error” in this post we mean the unexpected errors. The ones that the system is not prepared to respond to and doesn’t give you a clear message that allows the user to make corrections and continue an execution. A message such as “Close Date is required” or “Close Date cannot be blank” is clear to the end-user and meant to guide them on how to continue their execution (it’s an expected error). A message such as “Error Occurred During Flow “Payment_Flow”: An error occurred when executing a flow interview.” is clearly unexpected and the support team will need to review it.
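To make the distinction concrete, here is a minimal sketch in Python (not Salesforce code) of an expected error versus an unexpected one. The names `ExpectedError`, `save_opportunity` and `handle_save`, and the installment calculation, are hypothetical and only there for illustration.

```python
class ExpectedError(Exception):
    """An error the system anticipates; its message tells the user how to proceed."""


def save_opportunity(record: dict) -> str:
    # Expected error: the rule is known in advance, so the message can be actionable.
    if not record.get("close_date"):
        raise ExpectedError("Close Date is required")
    # Hypothetical stand-in for downstream automation; it fails on data nobody planned for.
    per_installment = record["amount"] / record["installments"]
    return f"Saved with payment of {per_installment:.2f} per installment"


def handle_save(record: dict) -> str:
    try:
        return save_opportunity(record)
    except ExpectedError as err:
        # The user can correct the input and retry without involving support.
        return f"Please fix and retry: {err}"
    except Exception as err:
        # Unexpected error: the user only sees a generic message, and support
        # still needs the steps that led here to find the cause.
        return f"An unexpected error occurred ({type(err).__name__}). Please contact support."
```

Calling `handle_save` without a close date returns the guiding message, while a record with `installments` set to 0 falls through to the generic one. The second message alone says nothing about which data or which steps produced it, which is exactly the gap the reporter has to fill.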
Error messages in general are either too generic (as mentioned before) or too specific (like when you get a specific line of a piece of code or a specific step in a flow). But an error is the result of a sequence of steps, and without those steps you cannot tell what may have caused the failure. Errors can be produced by a number of factors: inconsistent data, missing data, a user doing something at a time they were not supposed to, an implementation that did not take a certain scenario into consideration, and many others. Some factors may not cause the error individually, but in conjunction they do. For example: functionality from an installed package, when executed alongside the triggers, process builders and flows you have in your particular org, can add a lot of logic whose actions contradict one another, leading to an error. Finding these chained executions is very complex, and the error may not always happen, so determining the exact conditions that create the error scenario can take a while.
So you will always be looking for the needle in a (bunch of) haystack(s). Getting the sequence of steps leading up to the error right is crucial to understanding the conditions that produced it and, later on, how to fix it. And getting those steps can be extremely difficult. This is a two-way street: both the error reporter and the support team must help each other to get this resolved, and they both need to have a certain level of focus, humility, and patience to get there.
Tip 1: Be humble, for there’s no such thing as…
This is a tip that goes a long way and both ways: to the person who found the error and is reporting it and to the one that has to review it and fix it.
There’s no such thing as complete testing or extensive testing. Sorry, it’s just not possible to test what happens if a truck full of eels crashes into your local power plant and cuts the power at the very second you are saving a record and a process builder is performing actions on it and all its related records. Testing is for the most common scenarios; those are the ones you commit to support and validate. But there’s always that remote case or a particular set of conditions that nobody considered likely, but that may still happen.
There’s no such thing as an audit log that can be specific enough. Audit logging can be added in certain places, but not all places, and it can seriously impact the performance of a feature if you are constantly saving records of what happened. Even if you have those super fine logs, they may still not give you enough information to tell what happened.
There’s no such thing as a magical button that tells me everything that happened. Too many actions can be imperceptible to the user and lead to an error. For example: you mouse over an element on screen, that mouse over calls an action that causes an error in the background that doesn’t appear on screen, and that later makes the saving action of the page trigger a different error. That mouse over is imperceptible to the user and the ultimate observable error is not apparently connected to that action, but it’s part of the chain of actions that leads to the error. Without identifying that mouse over step, you will never get to know how it happened and how to solve it.
Tip 2: Check your emotions
Adding to the scarcity of error details in the messages that pop up, the difficulty of pointing out every action that leads to the error, time constraints and sometimes the complexity associated with fixing it, we’re also talking about humans. And humans come with their emotions.
Sometimes users are too scared to tell (because they think they’re gonna be blamed for breaking it), or too embarrassed (because they think they did something wrong that they should already know how to do), or too frustrated (because they tried several times to do an action or tried to fix it themselves and couldn’t figure it out). You may get the usual “I don’t know”, “I didn’t do anything”, “I don’t know what happened”, like there’s some kind of crime scene and everyone is a suspect. There’s one very important thing to remember: errors happen, it’s nobody’s particular fault, and when they happen, we all need to keep our focus on finding what led to the issue. It’s not about pointing fingers, it’s about solving the problem. We’re all in this together.
Finding errors is also a way to do further testing and identify improvements to make an application more robust and/or usable. For example: it’s ok if you pressed somewhere you were not supposed to, because if it’s possible then how can you tell you shouldn’t? This is a way to identify that the application should probably enable or disable options to better guide the user, or capture this error and show an information message to the user with what they should do before they proceed. Making a system more intuitive is part of its evolution and can be best done with real users in the field who interact with it frequently.
Don’t get too overwhelmed by emotions either. It’s better to be detailed about what you did to get to the error than to thoroughly describe the roller coaster of emotions you went through. Unfortunately, rambling about the odyssey of your suffering will not help identify the error any sooner or solve it any quicker. And this could also lead to an outpouring of unrelated issues. All of a sudden you may go from reporting one issue to reporting seven others you saw or “think” you saw. This type of verbal diarrhea is also known in software as a “memory dump”, when somebody means to report one thing and it turns into a never-ending monologue of unrelated background stories, (un)related issues and everything in between. Deal with one problem at a time. Nobody can resolve all problems at once, and just because there are several of them it does not mean they are connected. So stay focused, report them one at a time, and don’t move to another problem until you have reported the one in front of you and gotten its fix.
Once you get to this state of mind you will find it easier to see and report the actions you did, and that will lead to a quicker turnaround for the fix. And then, don’t get anxious and interrupt the support team every five minutes to see where they are. Let them do the fix; it will come. Interruptions while somebody is concentrating on solving a complex problem only lead to more delays. Put yourself in the shoes of the support team: if they are answering your thousand questions, they are not using that time to do the fix.
Tip 3: Tell me who you are, where you are and everything you did
When you have end users reporting issues you need to know everything they did. As seen in tip 1, there’s nothing magical that can tell you step by step, click by click what a user did. And as seen in tip 2, there are a lot of emotions that come into play when getting this information.
As much as you want to help and understand the urgency of the user, when it comes to any scenario of execution there’s a lot of information you need just to get an idea of what the user did before you dig into the error itself. Omitting information will not save any time to resolve the issue.
If you are in pain and go to a doctor, you don’t tell the doctor to guess where it hurts or even assume where it could be. You want the pain to go away, so you point exactly where it hurts and give as much information as you have to help the doctor fix it. This is the main principle of troubleshooting. If we’re gonna get lousy error messages from the system, we should at least help each other by not doing the same. There’s some basic information end users should get used to providing by default when reporting an issue to get things going:
- Which environment?
  - An end-user may be talking about an error in the production environment, but a power user who also works in a sandbox as part of the testing process may be referring to that sandbox instead.
  - And if it’s a sandbox, which one? There may be many, so always include the name of the org.
- Was it you, somebody else or you logged in as somebody else?
  - If somebody else, then who? Always include the name of the user as it is in the org. For example: if you say “it’s with Bob’s user”, there could be many Bobs in the system, so including a last name wouldn’t hurt. And if there’s no Bob at all, because Bob’s user actually has the first name Robert, then you should report it as “Robert” plus the last name. There’s no magic here (as mentioned before) that detects every nickname of every possible name you search. You can call yourself whatever you want among friends, family or in the office, but you have a specific name in the system and that’s the one the support team needs to know.
  - You should provide the user’s username. Since usernames are unique, there will be no doubt which user you are using (there may be several users with the same first name, last name and email, but not with the same username).
  - You should provide a link to the user and save the Administrator the time to search for it. The more information you provide, the faster the support team will get to the issue to resolve it.
- Always remember that the person you are reporting to is not seeing what you are seeing and is not aware of every action, shortcut or shorthand you have. And they are not a CSI that can find every trace of evidence. Consider it like “guiding a blindfolded person”: without you pointing out everything, they will not know what you know or did.
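For support teams that collect reports through a form or a script, the checklist above can be sketched as a small data structure. This is only an illustration in Python: `IssueReport`, its field names and `missing_fields` are hypothetical, and the `steps` and `error_message` fields are an assumption based on the details this post says are rarely provided.

```python
from dataclasses import dataclass, fields


@dataclass
class IssueReport:
    environment: str = ""     # "production" or "sandbox"
    org_name: str = ""        # which org, needed when there are several sandboxes
    user_full_name: str = ""  # the name as it is in the org ("Robert", not "Bob")
    username: str = ""        # unique per user, so it removes any ambiguity
    user_link: str = ""       # link to the user, saves the Administrator a search
    steps: str = ""           # everything you did, in order
    error_message: str = ""   # the full text of the error, if one was shown


def missing_fields(report: IssueReport) -> list[str]:
    """Return the names of the fields left blank, in declaration order."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]
```

A report that only says which sandbox user hit the problem would be flagged before it ever reaches the support team:

```python
report = IssueReport(
    environment="sandbox",
    username="robert.smith@example.com.uat",  # made-up username for illustration
    steps="Opened an Opportunity and clicked Save",
)
print(missing_fields(report))
```

The point of the sketch is not the code itself but the habit: every blank field is a question the support team will have to come back and ask.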
And this is just the start, we still have a long way to go before we get to the error.
In the second post of this series we’ll dive deeper into how to get all of the information we need to troubleshoot. Stay tuned…
What do you think of the process of troubleshooting? Do you have any recommendations to get started when compiling information about an issue? Tell me all about it or suggest topics you want me to write about in the comments below, in the Salesforce Trailblazer Community, or tweet directly at me @mdigenioarkus. Subscribe to the Arkus newsletter here to get the top posts of the Arkus blog directly to your inbox.