- 1 Dramatically improve spreadsheet performance and decouple your workbooks.
- 2 A brief explanation of the INDIRECT function
- 3 Why using INDIRECT is terrible for performance
- 4 Introducing an alternative to INDIRECT
- 5 Implementing the Pub/Sub pattern in Python
- 6 Putting it all together in Excel
- 7 Recap; What have we done?
- 8 References
Dramatically improve spreadsheet performance and decouple your workbooks.
The INDIRECT function in Excel is a tricky beast. One the one hand it can be incredibly useful, but on the other hand, it is responsible for crippling the performance of many spreadsheets.
In this article, we’ll look at what the INDIRECT function is, why it is so bad for performance and an interesting alternative that is superior in almost every way.
A brief explanation of the INDIRECT function
The INDIRECT function takes a cell address and returns the value contained within the cell.
When designing a spreadsheet or set of spreadsheets it pays off to plan ahead and keep them well organized. Conceptually, using INDIRECT combined with Named Ranges can seem like a great way to do that. You can keep one area of functionality in one workbook and share key results with other dependent workbooks by looking up those values with INDIRECT.
Using named ranges avoids hard-coding explicit address references and allows us to refactor or restructure the referenced workbook later.
Why using INDIRECT is terrible for performance
The INDIRECT function is a volatile function. This means that every time anything in your workbook changes or any time you press F9 to calculate, the INDIRECT function will be called. On its own this may not be such a big deal, but because the INDIRECT function is called repeatedly and calculations that take the result of that as an input will also be called repeatedly.
If the result of the INDIRECT call is an input to some complex calculation or slow function then your spreadsheet will crawl. Every time you change anything the entire calculation will be re-done, even when the change you’ve made has nothing to do with that part of the spreadsheet.
Excel maintains a dependency graph which enables it to know what cells need recalculating after any changes have been made. This allows it to do the minimal number of computations when recalculating a worksheet after a change has been made. This is a very efficient way of minimizing the work that needs to be done so that spreadsheets can update quickly. Using INDIRECT ruins this as anything that is dependent (directly or indirectly) will end up be recalculated every time Excel recalculates.
The developers of Excel have not done this by accident. The INDIRECT function retrieves the value of the address specified, but it is not dependent on the cell pointed to by that address. You can see this if you use Trace Precedents from the Formula tab in Excel. This means that it is not sensitive to the referenced cell changing. It doesn’t know whether the referenced cell has changed or not and so it has to be recalculated every time, and this is why it is a volatile function.
Introducing an alternative to INDIRECT
Using INDIRECT as above is a common solution to the problem of referencing values in one spreadsheet from another. It decouples the two spreadsheets so that calculations from one (we’ll call it the producer) can be used by the other (the consumer). There doesn’t have to be just one consumer, there can be multiple consumers for a single producer.
This problem of needing to decouple producers from consumers is not unique to spreadsheets. In fact, in software engineering it is very well known and there are patterns for doing exactly that.
The pub/sub or publisher/subscriber pattern is one such pattern that is commonly used to decouple producers from consumers. In this pattern, messages are published and subscribers are notified of those messages. The delivery of messages between publishers and subscribers is handled by a message broker.
So that a single message broker can be used for different types of messages it is usual to split messages into topics. A topic is just a string that is known to both the publisher and the subscriber. Messages are published on a specific topic and subscribers subscribe to a topic. The subscribers will only receive messages published to the topic they are subscribed to.
In our spreadsheet rather than using INDIRECT to pull values from another workbook we can use this pub/sub pattern. The producer workbook will publish values to the message broker whenever a change is made, and the consumer workbook will subscribe to those messages and update only when a message is received.
We will implement this in the next section.
Implementing the Pub/Sub pattern in Python
We will use Python to implement the pub/sub pattern. Later we will call use this from Excel using PyXLL, the Python Excel Add-In. PyXLL is a commercial product that enables us to use Python code in Excel instead of VBA. Crucially for this article it can also be used to write RTD, or Real Time Data, functions. We will use an RTD function in the consumer workbook to update the value whenever a message is published from the producer workbook.
PyXLL can be downloaded from https://www.pyxll.com and there is a free 30 day trial. The same technique presented here could be achieved in another language so long as it is possible to write Excel worksheet functions and RTD functions in that language (for example, using Excel-DNA and C# or Jinx and Java, Scala, Kotlin or Clojure).
Often when using the Pub/Sub pattern some messaging middleware like Kafka, RabbitMQ or ApacheMQ is used. This is really useful in situations where we are messaging between applications or even between servers. In our case everything will be running inside Excel in a single application so using a messaging service like these is overkill. All we need is a way to pass messages from producers to consumers that are all in the same process.
We’ll start with a MessageBroker class with three methods: publish, subscribe and unsubscribe. Our producer will publish messages using the subscribe method, and our consumers will subscribe using the subscribe method. When they are no longer interested they can use the unsubscribe method. The messages themselves will simply be Python objects, and the consumers will be Python functions accepting these Python object messages as a single argument. Our MessageBroker will maintain a dictionary of topics to subscribers.
There we have it! Using this we can subscribe to a topic and receive a call-back whenever a message is published to that topic. Hopefully this shows that the pub/sub pattern doesn’t need to be complicated in order to be useful 🙂
There is a few more things we can do to improve on this. In our case of passing values between Excel sheets it would be useful if when subscribing we got the last published value. That way if the consumer subscribes after the producer has already published something it will get the latest value, rather than have to wait until the next one. Additionally Excel functions can (optionally) be called from multiple threads and so if that is something we would want to do then we need to be careful about multiple threads accessing the MessageBroker at the same time.
The complete code with these additional improvements can be found in the “pubsub” folder of the PyXLL Examples repo on github.
Putting it all together in Excel
As a reminder, the reason we went down this pub/sub path was to find an alternative to INDIRECT in Excel and now we’ll get back to that!
We need two new Excel functions, “publish” and “subscribe”. The publish function will be called from our producer workbook with a topic name and the value we want to publish. The subscribe function will be called from the consumer workbook where we want to receive the value. The subscribe method will be an RTD, or Real Time Data, function. That’s a special type of function that can update its value even after it’s been called.
If you’ve not already downloaded PyXLL then you’ll need to now, as that’s what we’re going to use to call our MessageBroker class from the previous section from Excel. You can download a 30 day trial of PyXLL from https://www.pyxll.com.
We’ll use the MessageBroker class from above and create a single global instance of it. We’ll also add some convenience functions so we can call publish, subscribe and unsubscribe on our global MessageBroker instance.
Next, using PyXLL we can write the “publish” Excel function so it can be called from an Excel workbook.
If you’ve not used PyXLL before you might be surprised at how easy this is! We write a normal Python function and simply add the @xl_func decorator to it. This is what tells PyXLL to expose our Python function as an Excel function.
To keep things clean I’ve put the MessageBroker class and the publish, subscribe and unsubscribe functions into a single module, pubsub.py. The function above is in a new module “pubsub_example.py” and imports the pubsub module as well as the @xl_func decorator. You can find the complete code in the “pubsub” folder of the PyXLL Examples repo on github.
To call this function from Excel you will need to install the PyXLL add-in if you’ve not done so already, and add your new pubsub_example.py module to the PyXLL config file, pyxll.cfg
Now we’re ready to add the “subscribe” function. To write an RTD function using PyXLL we create a class derived from PyXLL’s RTD class. You can read more about this in the user guide.
The RTD class has two methods, connect and disconnect. These are called when Excel is ready to start receiving updates and when it no longer needs them, respectively. We will override these in our class to subscribe to and unsubscribe from the message broker. When a new message is received we set the “value” property on the RTD object which updates the value in Excel.
To create the “subscribe” function in Excel we use the @xl_func decorator as before, except this time we return a SubscriberRTD object. We also need to provide a bit more information to PyXLL when calling the @xl_func decorator so it knows to treat the returned value as an RTD object.
And that’s all there is to writing an RTD function in Python with PyXLL! We can now call this new subscribe function from another workbook with the same topic, and each time the producer sheet publishes a value it will be updated in the consumer sheet.
We can have multiple consumers subscribing to the same topic, and we can have multiple producers publishing on different topics. Whenever a published value updates, the “publish” Excel function will be called with the topic and that new value. That will cause all of the results of the “subscribe” function subscribed to the same topic to update automatically.
As the RTD “subscribe” function is non-volatile any dependencies will only be calculated when the value actually changes.
Using PyXLL we’re not just limited to passing numbers or strings between sheets. We can return complete Python objects from Excel functions and publish those in exactly the same way.
Recap; What have we done?
We started off looking for an alternative to Excel’s INDIRECT function without the performance implications of using a volatile function.
The reason for using the INDIRECT function was to decouple results produced in one spreadsheet that were used as inputs in another. Named ranges were identified as a way to avoid hard-coding specific cell references.
Using the pub/sub pattern we can now publish results from any workbook and subscribe to those results in another. Using topic strings we can have publish and subscribe to multiple named values at the same time.
Using an RTD function for subscribing to a topic allows us to update values in Excel as new values are published without having to resort to making our function volatile.
We have achieved our aim of decoupling multiple spreadsheets, and by using named topics we protect ourselves from referencing cells in another workbook directly. By not using a volatile function we have ensured that our workbooks only need to calculate the minimum required when values change.