Have you ever wished you could start using the Julia programming language to develop custom models? Does the idea of replacing outdated MATLAB code and models seem overwhelming?
Or maybe you don't plan to replace all MATLAB code, but wouldn't it be exciting to integrate Julia code into existing workflows?
Also, technicalities aside, how do you convince your colleagues to make the leap into the Julia ecosystem?
I'm excited to share an announcement! At this year's JuliaCon, I will be speaking about a small but significant step you can take to start adding Julia to your MATLAB codebase.
Great news! You can transition to Julia smoothly without completely abandoning MATLAB. There's a straightforward method to embrace the best of both worlds, so you won't need to rewrite your legacy models from scratch.
I'll give my full talk in July, but if you don't want to wait, keep reading for a sneak peek!
Background
The GLCS.io team has been developing Julia-based solutions since 2015. Over the past 4 years, we've had the pleasure of redesigning and enhancing Julia models for our clients in the finance, science, and engineering sectors. Julia's incredible speed and versatility have transformed how we tackle complex computations together. However, we also fully acknowledge the reality: MATLAB continues to hold a significant place in countless companies and research labs worldwide.
For decades, MATLAB has been the benchmark for data analysis, modeling, and simulation across scientific and engineering fields. There are likely hundreds of thousands of MATLAB licenses in use, with millions of users supporting an unimaginable number of models and codebases.
Even for a single company, fully transitioning to Julia often feels insurmountable. The vast amount of existing MATLAB code presents a significant challenge for any team considering adopting Julia.
Yet, unlocking Julia's power is vital for companies aiming to excel in today's competitive landscape. The question isn't if companies should adopt Julia; it's how to do it.
Companies should blend Julia with their MATLAB environments, ensuring minimal disruption and optimal resource use. This strategic integration delivers meaningful gains in accuracy, performance, and scalability to transform operations and drive success.
JuliaCon Preview
At JuliaCon, I'm excited to share how you can seamlessly integrate Julia into existing MATLAB workflows, a process that has delivered up to 100x performance improvements while enhancing code quality and functionality. Through a real-world model, I'll highlight design patterns, benchmark comparisons, and valuable business case insights to demonstrate the transformative potential of integrating Julia.
(Spoiler alert: the performance improvement is more than 100x for the example I will show at JuliaCon.)
What We Offer
Unlock high-performance modeling! Our dedicated team is here to integrate Julia into your MATLAB workflows. Experience a strategic, step-by-step process tailored for seamless Julia-MATLAB integration, focused on efficiency and delivering measurable results:
Tailored Assessment: Pinpoint challenges and opportunities for Julia to address.
MATLAB Benchmarking: Establish a performance baseline to measure progress and impact.
Julia Model Development: Convert MATLAB models to Julia or assist your team in doing so.
Julia Integration: Combine Julia's capabilities with your existing MATLAB workflows for optimal results.
Roadmap Alignment: Validate performance improvements, create a strong business case for leadership, and agree on future support and innovation.
By attending my JuliaCon talk, you will learn how to seamlessly integrate Julia into your existing MATLAB codebase. And by leveraging our support at GLCS, you can adopt Julia without disruption, unlocking faster computations, improved models, and better scalability while retaining the strengths of your MATLAB codebase.
Are you or someone you know excited about harnessing the power of Julia and MATLAB together? Let's connect! Schedule a consultation today to discover incredible performance gains of 100x or more.
Let me take a bit of time here to write out a complete canonical answer to ModelingToolkit and how it relates to Modia and Modelica. This question comes up a lot: why does ModelingToolkit exist instead of building on tooling for Modelica compilers? I’ll start out by saying I am a huge fan of Martin and Hilding’s work, I respect them a ton, and they have made major advances in this space. But I think ModelingToolkit tops what they have developed in a not-so-subtle way. And it all comes down to the founding principle, the foundational philosophy, of what a modeling language needs to do.
Composable Abstractions for Model Transformations
There is a major philosophical difference which is seen in both the development and usage of the tools. Everything in the SciML organization is built around a principle of confederated modular development: let other packages influence the capabilities of your own. This is highlighted in a paper about the package structure of DifferentialEquations.jl. The underlying principle is that not everyone wants or needs to be a developer of the package, but still may want to contribute. For example, it's not uncommon that a researcher in ODE solvers wants to build a package that adds one solver to the SciML ecosystem. They can do this in their own package for their own academic credit, with the free bonus that it now exists in the multiple dispatch world. In the design of DifferentialEquations.jl, solve(prob,IRKGL16()) now exists because of their package, and so we add it to the documentation. Some of this work is not even inside the organization, but we still support it. The philosophy is to include every researcher as a budding artist in the space of computational research, including all of the possible methods, and building an infrastructure that promotes a free research atmosphere in the methods. Top level defaults and documentation may lead people to the most stable aspects of the ecosystem, but with a flip of a switch you can be testing out the latest research.
When approaching modeling languages like Modelica, I noticed this idea was completely foreign to modeling languages. Modelica is created by a committee, but the implementations that people use are closed like Dymola, or monolithic like OpenModelica. This is not a coincidence but instead a consequence of the design of the language. In the Modelica language, there is no reference to what transformations are being done to your models in order to make them "simulatable". People know about the Pantelides algorithm and "singularity elimination", but this is outside the language. It's something the compiler may give you a few options for, but not something the user or the code actively interacts with. Every compiler is different, advances in one compiler do not help your model when you use another compiler, and the whole world is siloed. By this design, it is impossible for an external user to write compiler passes in Modelica which affect this model lowering process. You can tweak knobs, or write a new compiler. Or fork OpenModelica and hack on the whole compiler to make just the change you wanted.
I do not think that the symbolic transformations performed by Modelica are the complete set that everyone will need for all models for all time. I think in many cases you might want to write your own. For example, on SDEs there is a Lamperti transformation which converts a class of general SDEs to SDEs with additive noise. It doesn't always apply, but when it does it can greatly enhance solver speed and stability. This is niche enough that it'll never be in a commercial Modelica compiler (in fact, they don't even have SDEs), but it's something that some user might want to be able to add to the process.
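For concreteness, a sketch of the scalar form of that transformation (assuming $g > 0$ and sufficient smoothness): for $dX_t = f(X_t)\,dt + g(X_t)\,dW_t$, define $Z_t = \psi(X_t)$ with $\psi(x) = \int^x du/g(u)$; Itô's lemma then yields a process with additive (unit) noise,

```latex
% Lamperti transform, scalar case (sketch; assumes g > 0, smooth):
%   dX_t = f(X_t)\,dt + g(X_t)\,dW_t, \quad Z_t = \psi(X_t), \quad \psi(x) = \int^x \frac{du}{g(u)}
dZ_t = \left( \frac{f(X_t)}{g(X_t)} - \frac{1}{2}\, g'(X_t) \right) dt + dW_t
```

so the diffusion coefficient becomes constant, which is exactly the structure additive-noise SDE solvers exploit.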
From that you can see that Modia was a major inspiration for ModelingToolkit, but Modia did not go in this direction of decomposing the modeling language: it essentially is a simplified Modelica compiler in Julia. But ModelingToolkit is a deconstruction of what a modeling language is. It pulls it down to its component pieces and then makes it easy to build new modeling languages like Catalyst.jl which internally use ModelingToolkit for all of the difficult transformations. The deconstructed form is a jumping point for building new domain-based languages, along with new transformations which optimize the compiler for specific models. And then in the end, everybody who builds off of it gets improved stability, performance, and parallelism as the core MTK passes improve.
Bringing the Power to the People
Now there are two major aspects that need to be handled to fully achieve such a vision. If you want people to be able to reuse code between transformations, you want to expose how you are changing code. To achieve this goal, a new Computer Algebra System (CAS), Symbolics.jl, was created for ModelingToolkit.jl. The idea is that if we want everyone writing code transformations, they should all have easy access to a general mathematical toolset for doing such code transformations. We shouldn't have everyone building new code for differentiation, simplification, and substitution. And we shouldn't have everyone relying on undocumented internals of ModelingToolkit.jl either: this should be something that is open, well-tested, documented, and a well-known system so that everyone can easily become a "ModelingToolkit compiler developer". By building a CAS and making it a Julia standard, we can bridge that developer gap because now everyone knows how to easily manipulate models: they are just Symbolics.jl expressions.
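As a small illustration of what that toolset looks like in practice (a sketch using the public Symbolics.jl API; exact printed forms may vary by version):

```julia
using Symbolics

@variables x y              # declare symbolic variables
ex = x^2 + sin(y)

# symbolic differentiation with respect to x
D = Differential(x)
dex = expand_derivatives(D(ex))     # 2x

# substitution and simplification, the other two workhorses
ex2 = substitute(ex, Dict(x => 2))  # 4 + sin(y)
ex3 = simplify(2x + 3x)             # 5x
```

Any transformation pass, from index reduction to a custom Lamperti-style rewrite, is then just a function that consumes and produces expressions like these.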
The second major aspect is to achieve a natural embedding into the host language. Modelica is not a language in which people can write compiler passes, which introduces a major gap between the modeler and the developer of extensions to the modeling language. If we want to bridge this gap, we need to ensure the whole modeling language is used from a host which is a complete imperative programming language. And you need to do so in a language that is interactive, high performance, and has a well-developed ecosystem for modeling and simulation. Martin and Hilding had seen this fact as the synthesis for Modia with how Julia uniquely satisfies this need, but I think we need to take it a step further. To really make the embedding natural, you should be able to on the fly automatically convert code to and from the symbolic form. In the previous blog post I showcased how ModelingToolkit.jl could improve people’s code by automatically parallelizing it and performing index reduction even if the code was not written in ModelingToolkit.jl. This grows the developer audience of the transformation language from “anyone who wants to transform models” to “anyone who wants to automate improving models and general code”. This expansion of the audience is thus pulling in developers who are interested in things like automating parallelism and GPU codegen and bringing them into the MTK developer community.
In turn, since all of these advances then apply to the MTK internals and code generation tools such as Symbolics.jl's build_function, new features are arriving all the time because of how the community is composed. The CTarget build_function was first created to transpile Julia code to C, and thus ModelingToolkit models can generate C outputs for compiling into embedded systems. This looks like serendipity when seeing one example, but it's design when you notice that this is how the entire system is growing so fast.
But Can Distributed Development Be As Good As Specialized Code?
Now one of the questions we received early on was: won't you be unable to match the performance of a specialized compiler that was made to work only on Modelica? While at face value it may seem like hyperspecialization could be beneficial, the true effect of hyperspecialization is that algorithms are simply less efficient because less work has been put into them. Symbolics.jl has become a phenomenon of its own, with multiple hundred-comment threads digging through many aspects of the pros and cons of its design, and that's not even including the 200-person chat channel which has had tens of thousands of messages in less than 2 months since the CAS was released. Tons of people are advising how to improve every single plus and multiply operation.
Just at the very basic level we can see that the CAS is transforming the workflows of scientists and engineers in many aspects of the modeling process. By distributing the work of improving symbolic computing, we have already taken examples which were essentially unobtainable and made them instant with Symbolics.jl:
We are building out a full benchmarking system for the symbolic ecosystem to track performance over time and ensure it reaches the top level. It is integrating pieces from the OSCAR project, getting lots of people tracking performance in their own work, and building a community. Each step is another major improvement, and this ecosystem is making these steps fast. It will be hard for a few people working on the internals of a single Modelica compiler to keep up with such an environment, let alone to repeat this work for every new Modelica-based project.
But How Do You Connect To Modelica?
This is a rather good question because there are a lot of models already written in Modelica, and it would be a shame for us not to be able to connect with that ecosystem. I will hint that there is tooling coming as part of JuliaSim for connecting to many pre-existing model libraries. In addition, we hope that tooling like Modia.jl and TinyModia.jl will help us build a bridge.
Conclusion: Designing Around the Developer Community Has Many Benefits
The composability and distributed development nature of ModelingToolkit.jl is its catalyst. This is why ModelingToolkit.jl looks like it has rocket shoes on: it is fast and it is moving fast. And it's because of the thought put into the design. It's because ModelingToolkit.jl is including the entire research community as its asset instead of just its user base. I plan to keep moving forward from here, looking back to learn from the greats, but building it in our own image. We're taking the idea of a modeling language, distributing it throughout one of the most active developer communities in modeling and simulation, in a language which is made to build fast and parallelized code. And you're invited.
PS: what about Simulink?
I’m just going to post a self-explanatory recent talk by Jonathan at the NASA Launch Services Program who saw a 15,000x acceleration by moving from Simulink to ModelingToolkit.jl.
We will analyze daily price data for stocks in the Dow Jones index and then try to build an accurate index fund using a small number of those stocks.
Similar material was already used in a presentation at PyData Berlin 2017. See the “Tour of popular packages” notebook. Back then, we worked with Julia 0.6 and used the packages DataFrames, Plots and JuMP. Now, we work with Julia 1.0 and use packages from the Queryverse, VegaLite and IndexedTables for data prep and visualization. Also, I added an alternative model.
The Dow Jones index is computed from the prices of its stocks, as a weighted average, where the weight is itself defined through the price of the stock.
We will compute the average price of each stock over the days. Then we will normalize these values by dividing through the total of the average prices. The normalized weights are then multiplied to the daily prices to get the daily value of the index.
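As a concrete sketch of that computation (with tiny made-up prices standing in for the real dataset; the variable names here are mine, not the post's):

```julia
using Statistics

# Toy closing prices: rows are days, columns are stocks
# (hypothetical numbers standing in for the Dow Jones data).
prices = [100.0  50.0  20.0;
          102.0  49.0  21.0;
          101.0  51.0  19.5]

avgprice = vec(mean(prices, dims=1))   # average price of each stock over the days
weights  = avgprice ./ sum(avgprice)   # normalize by the total of the averages
index    = prices * weights            # daily value of the weighted-average index
```

The weights sum to 1 by construction, so the index is a convex combination of the daily stock prices.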
In the previous plot, the Dow Jones index is marked in orange. In the following, it will be our target, which we want to approximate by using a small number of the stocks.
Let us start with an optimization model similar to an ordinary linear regression, but using the l1-norm, which is readily formulated as a linear program. In compact vector form, this reads
$$\text{minimize } \lVert w^T P - I \rVert_1 \quad \text{subject to } w \ge 0$$
where $P$ stands for the prices of the individual stocks, $I$ for our index (the target), and $w$ for the weights we use in our fund. We only allow non-negative weights for use in the portfolio.
We can use a standard linear programming trick and introduce auxiliary variables for the positive and negative parts inside the absolute values and minimize their sum. This is formulated using JuMP:
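The original notebook cell is not reproduced here, so the following is a sketch in the same JuMP 0.18-era style used later in this post; the names `price`, `dowjones`, `traindates`, and `symbols` follow the post's data and are assumptions on my part:

```julia
using JuMP, SCIP

function solve_index_lp()
    m = Model(solver=SCIPSolver("display/verblevel", 2))
    @variable(m, w[symbols] >= 0)           # non-negative portfolio weights
    @variable(m, posdev[traindates] >= 0)   # positive part of the residual
    @variable(m, negdev[traindates] >= 0)   # negative part of the residual
    # split each residual: w'P - I = posdev - negdev, so at optimality
    # |w'P - I| = posdev + negdev (minimization drives one part to zero)
    for d in traindates
        @constraint(m, sum(price[d, s][1] * w[s] for s in symbols)
                       - dowjones[d][1] == posdev[d] - negdev[d])
    end
    @objective(m, :Min, sum(posdev[d] + negdev[d] for d in traindates))
    solve(m)
    return getvalue(w)
end
```

The auxiliary variables turn the non-linear l1-norm objective into a plain linear program.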
We could use all days of the year as input for our model, but for our evaluation we only use the days from the first three quarters. That way, we can see how our fit extrapolates through the remaining quarter.
In [14]:
training_days = 189
traindates = dates[1:training_days]
testdates = dates[training_days+1:end]
@show traindates[end] testdates[1]

# for visualization
datebreak = [(Date=traindates[end],)] |> @take(1)

length(traindates), length(testdates)
In the scatter chart above, we compare the weights from the Dow Jones index, with the weights found by the LP model. As we can see (the points lie on the diagonal), we recover the actual weights perfectly, even though we are only using a subset of the data.
The last model gave us an exact fit, but used all available stocks.
We are changing it now to allow it to use only a given number of stocks. To this end, we will introduce additional binary variables that select the stocks to become active in our index fund.
The weight of inactive variables must be forced to 0, which we do with a so-called big-M constraint. The constant is chosen in such a way that any active stock could single-handedly approximate the target index.
In [18]:
bigM = 1 / minimum(weight.data.columns[1])
Out[18]:
90.926297738811
We start by using only a single stock and then go up to subsets of 6 stocks out of the 30.
In [19]:
nstocks_range=1:6
Out[19]:
1:6
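With `bigM` and `nstocks_range` in place, the selection model adds binary activity variables to the LP. The original cell is not shown here, so this is a sketch in the same JuMP 0.18-era style as the rest of the post; `price`, `dowjones`, `traindates`, and `symbols` are assumed names:

```julia
using JuMP, SCIP

function solve_index_bigM(nstocks)
    m = Model(solver=SCIPSolver("display/verblevel", 2))
    @variable(m, w[symbols] >= 0)
    @variable(m, active[symbols], Bin)      # is the stock in the fund?
    @variable(m, posdev[traindates] >= 0)
    @variable(m, negdev[traindates] >= 0)
    # at most nstocks stocks may be active
    @constraint(m, sum(active[s] for s in symbols) <= nstocks)
    # big-M coupling: an inactive stock is forced to weight 0
    for s in symbols
        @constraint(m, w[s] <= bigM * active[s])
    end
    for d in traindates
        @constraint(m, sum(price[d, s][1] * w[s] for s in symbols)
                       - dowjones[d][1] == posdev[d] - negdev[d])
    end
    @objective(m, :Min, sum(posdev[d] + negdev[d] for d in traindates))
    solve(m, suppress_warnings=true)
    return (active=getvalue(active), weight=getvalue(w))
end
```

Calling this for each value in `nstocks_range` yields the family of sparse index funds compared below.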
The model seems to be relatively difficult to solve for SCIP, probably because of the big-M constraints, so we set a gap limit of 5%, rather than solving to proven optimality.
As we can see, our solutions stick to the target fund quite closely during the training period, but then diverge from it quickly. Allowing more stocks gives us a better fit.
In the previous model, we tried to fit our index fund to the absolute day-to-day closing prices. But the next model will be based on picking representative stocks that are similar, where similarity is defined through the correlation of returns.
So let’s start by computing daily returns from the closing prices (for the training period).
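A minimal sketch of that return computation in plain Julia (toy numbers; in the post itself this is done with Query over the price table):

```julia
# Daily returns r_t = (p_t - p_{t-1}) / p_{t-1} from a toy price series
# (hypothetical values, just to show the computation).
prices  = [100.0, 102.0, 101.0, 103.02]
returns = diff(prices) ./ prices[1:end-1]
# note: one fewer return than there are prices
```

The array views `[2:end]` and `[1:end-1]` implicit in `diff` are the natural alternative to the self-join mentioned in the closing remarks.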
The mean and standard deviation of the stock returns can be used as performance and risk markers. In the following scatter chart, we give an overview of these (bottom right is best).
Rather than looking at the day-to-day values, it precomputes a similarity for each pair of stocks, and then selects a representative for each stock while limiting the total number of representatives. The actual weight given to the stocks is not part of the model, but computed in a post-processing step.
The authors note that this model can be solved more efficiently using a Lagrangian Relaxation approach, but I found that this is not necessary for our small dataset.
In [33]:
function solve_index_repr(nstocks)
    m = Model(solver=SCIPSolver("display/verblevel", 2))
    @variable(m, active[symbols], Bin)          # is stock in index fund?
    @variable(m, repr[symbols, symbols], Bin)   # is stock 'r' represented by 's'?
    @constraint(m, sum(active[s] for s in symbols) <= nstocks)
    for r in symbols
        for s in symbols
            @constraint(m, repr[r, s] <= active[s])
        end
    end
    for r in symbols
        @constraint(m, sum(repr[r, s] for s in symbols) == 1)
    end
    @objective(m, :Max, sum(correlation[r, s][1] * repr[r, s] for r in symbols for s in symbols))
    status = solve(m, suppress_warnings=true)

    # post-processing: determine weights for representatives
    reprsol = getvalue(repr)
    accweight = [sum(weight[r][1] * reprsol[r, s] for r in symbols) / avgprice[s][1] for s in symbols]
    return (status=status, active=getvalue(active), weight=accweight * mean(dowjones.data.columns.Value))
end
We can see that the solution funds are not following the target index as closely as before, on a day-to-day basis, but the general shape of the curve is very similar. On the other hand, there seems to be hardly any difference in performance between the training and the test period, so it extrapolates quite well!
We could compare the approaches more systematically, by computing the losses with different norms (l1, or l2) on the training or test data, but I feel the visualization already gives a good enough impression.
But as a final comparison, let us count how often the different stocks are used for the index funds.
Since this blog was also an exercise in using Queryverse and VegaLite, I would like to comment on my experience now.
For Query itself, building the queries was relatively straight-forward, after watching the tutorial given by David Anthoff at JuliaCon 2018.
Some of the computations look quite awkward and are probably not efficient either, such as the twofold join of price with itself and another table that links each day with the previous day. Here, an array-based view using [2:end] and [1:end-1] would have been more natural. Also, after a while I found it a little annoying that I had to repeatedly name all of the columns that I wanted to keep, in @join or @map calls.
Finally, the combination of Query and ndsparse of IndexedTables.jl is a little unfortunate when the latter is used as a sink for some query. Typically, ndsparse is constructed with a set of columns to be used as an index, and another (single, unnamed) column as a value. Individual values can then be referenced just like an array, that is, table[i,j]. But the constructor of ndsparse used on query results takes the first $n - 1$ columns as index and the last as value, wrapped in a singleton NamedTuple. So table[i,j] will actually be something like (k=23,). This meant that I had to use price[d,s][1] as a coefficient in my model, rather than simply price[d,s]. I could not figure out how to avoid this without creating essentially another copy of the data.