Simulation techniques for data analysis
By Luciano Pandola and Giada Petringa
The birth of any modern physics experiment is preceded by a design stage, in which the optimal balance between sensitivity, performance and cost is determined.
Experiments are usually designed to search for a specific type of “signal” (the Higgs boson, dark matter, etc.). The detector used must therefore be sensitive to the signal of interest but, at the same time, well shielded from other kinds of events (the so-called “background”) that could mimic it. Given the complexity of current experiments, most of that design work consists of simulating, with appropriate models, the response of the overall system to both signal and background events.
In this way it is possible to define, for example, the type and size of the detectors to be used – to better recognise the signal – or the shields necessary to reduce the background to a tolerable level. Simulation remains important after the experiment has been built, when it comes to data analysis. Indeed, simulations offer the enormous advantage of “predicting” how the detector response changes with the characteristics of the signal and the background.
Comparison between real data and simulations is often an unavoidable step in analysing and interpreting the results of an experiment.
Experiment simulations are a particular example of a problem that can be addressed with the so-called “Monte Carlo method”. It is a statistical approach whose primary ingredient is random numbers. Its name was indeed coined taking inspiration from the famous casino of Monte Carlo, the undisputed realm of randomness.
The application field of the Monte Carlo method is very wide and includes complex systems whose behaviour is determined by the mutual influence of a large number of elementary components. Examples are studies of vehicular traffic flow, financial market trends or, indeed, the interaction of a particle beam inside a detector.
Basically, random numbers are used to simulate the behaviour of the elementary units, so as to predict the global behaviour of the system. By repeating the whole procedure many times, it is possible to determine the average response of the system.
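To make the idea concrete, here is a minimal sketch of the approach in C++ (a toy example with arbitrary numbers, not tied to any particular experiment): random points are thrown into a square, and the fraction falling inside a quarter circle estimates π/4. Each point plays the role of an “elementary unit”, and the precision of the estimate improves as more points are generated.

```cpp
// Toy Monte Carlo: estimate pi by throwing random points into a unit square
// and counting how many fall inside the quarter circle of radius 1.
#include <iostream>
#include <random>

int main() {
    std::mt19937_64 rng(12345);                          // pseudo-random number generator
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    const long nTrials = 1'000'000;                      // number of "elementary units" simulated
    long inside = 0;
    for (long i = 0; i < nTrials; ++i) {
        const double x = uniform(rng);
        const double y = uniform(rng);
        if (x * x + y * y < 1.0) ++inside;               // point falls inside the quarter circle
    }
    // The fraction of hits estimates the ratio of the areas, pi/4.
    std::cout << "pi ~ " << 4.0 * inside / nTrials << '\n';
    return 0;
}
```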
b. John von Neumann (1903-1957), born in Hungary, emigrated to the United States in the 1930s; together with Stanislaw Ulam, he was the undisputed father of the modern scientific use of the Monte Carlo method.
The Monte Carlo method requires both generating random numbers and repeating the same procedure many times: it is therefore perfectly suited to a computer, designed to perform repetitive calculations quickly.
However, we owe the ideas underlying this method to Georges-Louis Leclerc, Comte de Buffon (1777) and Pierre Simon Laplace (1786), centuries before the invention of computers. Applications underwent a leap forward after World War II, thanks to Stanislaw Ulam and John von Neumann, who took advantage of the proto-computers linked to the development of the first thermonuclear weapons.
The most common application of the Monte Carlo method in the area of particle physics is the simulation of the passage of radiation through matter, in order to predict the detector response.
In the last decades, several software packages suitable for this purpose have been developed: starting from the pioneering ones, limited to a specific sector, the field has gradually moved to broader projects designed to respond to many different requirements. The programs currently available differ especially in the physics models adopted to simulate the single interactions. One of the assumptions of the Monte Carlo method is indeed being able to describe the behaviour of each elementary unit composing the system or – in this particular case – the interactions of a given particle with the medium through which it passes.
Slight differences in the “behaviour rules” of these units can lead to more or less visible differences in the results for the entire system. Sometimes the “behaviour rules” of elementary particles are not precisely known and we must resort to phenomenological or theoretical models, which impose approximations or simplifications: this happens especially for hadrons, namely particles made of quarks and subject to the strong nuclear interaction.
In some cases, models are not described by analytic expressions but are based on experimental data organised in tables. This approach is inevitably limited to those applications for which accurate and reliable data are available.
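As a rough illustration of this table-driven approach, the sketch below (with invented, non-physical numbers) interpolates a tabulated attenuation coefficient at the energy of interest and uses it to draw the distance to the next interaction from an exponential distribution with mean free path 1/mu.

```cpp
// Sketch: tabulated data as a "behaviour rule".
// mu(E) is read from a table (hypothetical values, not real cross-section data),
// interpolated linearly, and used to sample the distance to the next interaction.
#include <cmath>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

struct TablePoint { double energyMeV; double muPerCm; };

const std::vector<TablePoint> table = {
    {0.1, 0.60}, {0.5, 0.25}, {1.0, 0.18}, {5.0, 0.08}};   // illustrative numbers only

double interpolateMu(double energyMeV) {
    if (energyMeV <= table.front().energyMeV) return table.front().muPerCm;
    if (energyMeV >= table.back().energyMeV)  return table.back().muPerCm;
    for (std::size_t i = 1; i < table.size(); ++i) {
        if (energyMeV <= table[i].energyMeV) {
            const double t = (energyMeV - table[i - 1].energyMeV) /
                             (table[i].energyMeV - table[i - 1].energyMeV);
            return table[i - 1].muPerCm + t * (table[i].muPerCm - table[i - 1].muPerCm);
        }
    }
    return table.back().muPerCm;
}

int main() {
    std::mt19937_64 rng(42);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double mu = interpolateMu(0.7);                // particle of 0.7 MeV
    const double step = -std::log(uniform(rng)) / mu;    // sampled distance to next interaction
    std::cout << "next interaction after " << step << " cm\n";
    return 0;
}
```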
The simulation is developed by “tracking” particles in the medium of interest (e.g. an experimental apparatus): you start from a primary particle, with a certain energy and direction, and you follow its evolution inside the various elements composing the system.
The particle undergoes consecutive interactions with the medium, each one described by well-defined “behaviour rules”. Each of these interactions produces a change in both the energy and the trajectory, and can cause the emission of new particles, called secondaries.
The tracking procedure continues iteratively until the primary particle and all of the secondaries, if any, are absorbed or leave the boundaries of the simulated volume (the so-called “world volume”).
Since the Monte Carlo method is based on a statistical approach, the procedure needs to be repeated many times: a large number of primary particles is tracked (together with any secondaries), so as to determine the average evolution of the overall system.
The larger the number of primary particles tracked, the higher the precision of the simulation.
The inherently probabilistic nature of quantum physics ensures that the evolution of each repetition will be different, even though the initial conditions are always the same: the particle interactions with the medium will generally occur in different positions and produce different effects. In this way it is possible to determine, for example, the response of the complex modern detectors at the LHC, but also the propagation of cosmic rays in the Galaxy.
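A toy version of this tracking loop, with made-up numbers and a drastically simplified “physics”, might look like the sketch below: each primary particle is followed step by step until it is absorbed or leaves the slab that plays the role of the world volume, and repeating the procedure over many primaries yields the average fractions transmitted, absorbed and backscattered.

```cpp
// Toy "tracking" loop: particles enter a 10 cm slab along +z. At each step a
// distance to the next interaction is sampled; the interaction either absorbs
// the particle or scatters it into a new direction. Tracking stops when the
// particle is absorbed or leaves the slab (the "world volume").
// All numbers are illustrative, not real physics data.
#include <cmath>
#include <iostream>
#include <random>

int main() {
    std::mt19937_64 rng(2024);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    const double slabThickness = 10.0;   // cm, the simulated "world"
    const double mu = 0.2;               // interactions per cm (hypothetical)
    const double absorptionProb = 0.3;   // probability that an interaction absorbs the particle
    const int nPrimaries = 100000;       // number of repetitions ("histories")

    int transmitted = 0, absorbed = 0, backscattered = 0;
    for (int i = 0; i < nPrimaries; ++i) {
        double z = 0.0;                  // position along the slab axis
        double dirZ = 1.0;               // cosine of the direction with respect to the axis
        while (true) {
            const double step = -std::log(uniform(rng)) / mu;    // distance to next interaction
            z += step * dirZ;
            if (z >= slabThickness) { ++transmitted; break; }    // leaves the far side
            if (z < 0.0)            { ++backscattered; break; }  // leaves the entrance side
            if (uniform(rng) < absorptionProb) { ++absorbed; break; }
            dirZ = 2.0 * uniform(rng) - 1.0;                     // isotropic scattering (toy model)
        }
    }
    std::cout << "transmitted:   " << 1.0 * transmitted   / nPrimaries << '\n'
              << "absorbed:      " << 1.0 * absorbed      / nPrimaries << '\n'
              << "backscattered: " << 1.0 * backscattered / nPrimaries << '\n';
    return 0;
}
```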
c. Image of the Geant4 simulation of the Elimed transport line (Eli-Beamlines Medical and multidisciplinary application) installed at Eli-Beamlines (Prague). The line serves to transport the beams of charged particles produced by the interaction of a high-power laser with a thin target. The project was also verified and optimised with Monte Carlo simulations.
One of the Monte Carlo programs which, in the last few years, has established itself in the physics community is Geant4, devoted to the Monte Carlo simulation of the interaction between radiation and matter. It is an open-source software package, developed by an international collaboration that includes INFN. The first version was released in 1998, but Geant4 is still updated regularly to keep improving its precision, reliability and performance.
Geant4 is a sort of “toolkit” containing all the instruments necessary to simulate a real detector: volume and material management, physics models, particle production and tracking, retrieval of the information to be saved. Users have the task of picking from the “toolkit” the tools most suitable for a specific purpose and using them. Geant4's precision and flexibility make it a trusted product in various areas: from high-energy physics to nuclear physics, space and particle astrophysics.
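As an illustration of how the pieces of the toolkit fit together, the sketch below assembles a deliberately minimal Geant4 application: a water-filled world volume, a reference physics list (FTFP_BERT) and a particle gun shooting 1 MeV photons. The class names, material, energy and number of events are illustrative choices; a real application would also define sensitive volumes and the information to be recorded.

```cpp
// Minimal sketch of a Geant4 application (user code; requires a Geant4 installation).
#include "G4RunManager.hh"
#include "G4VUserDetectorConstruction.hh"
#include "G4VUserActionInitialization.hh"
#include "G4VUserPrimaryGeneratorAction.hh"
#include "G4NistManager.hh"
#include "G4Box.hh"
#include "G4LogicalVolume.hh"
#include "G4PVPlacement.hh"
#include "G4ParticleGun.hh"
#include "G4Gamma.hh"
#include "G4ThreeVector.hh"
#include "G4SystemOfUnits.hh"
#include "FTFP_BERT.hh"

// Geometry: a simple water-filled world volume (illustrative only).
class DetectorConstruction : public G4VUserDetectorConstruction {
public:
    G4VPhysicalVolume* Construct() override {
        auto* water   = G4NistManager::Instance()->FindOrBuildMaterial("G4_WATER");
        auto* solid   = new G4Box("World", 1. * m, 1. * m, 1. * m);
        auto* logical = new G4LogicalVolume(solid, water, "World");
        return new G4PVPlacement(nullptr, G4ThreeVector(), logical, "World",
                                 nullptr, false, 0);
    }
};

// Primary particles: 1 MeV gammas shot along +z.
class PrimaryGenerator : public G4VUserPrimaryGeneratorAction {
public:
    PrimaryGenerator() : fGun(1) {
        fGun.SetParticleDefinition(G4Gamma::Definition());
        fGun.SetParticleEnergy(1. * MeV);
        fGun.SetParticleMomentumDirection(G4ThreeVector(0., 0., 1.));
    }
    void GeneratePrimaries(G4Event* event) override { fGun.GeneratePrimaryVertex(event); }
private:
    G4ParticleGun fGun;
};

class ActionInitialization : public G4VUserActionInitialization {
public:
    void Build() const override { SetUserAction(new PrimaryGenerator()); }
};

int main() {
    auto* runManager = new G4RunManager();
    runManager->SetUserInitialization(new DetectorConstruction());   // geometry and materials
    runManager->SetUserInitialization(new FTFP_BERT());              // reference physics list
    runManager->SetUserInitialization(new ActionInitialization());   // primary particles
    runManager->Initialize();
    runManager->BeamOn(1000);                                        // track 1000 primaries
    delete runManager;
    return 0;
}
```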
Moreover, in recent years, the application of the Monte Carlo method has become increasingly common in medical physics.
Modern diagnostic techniques, radiotherapy, nuclear medicine imaging, as well as the description of biological damage, have successfully made use of software originally created to simulate physics experiments. This has made it possible to solve complex problems previously tackled with analytical calculations, often subject to errors or significant approximations. Among these is the calculation of the dose, a crucial quantity in radiotherapy, which encompasses both the physical aspects related to the characteristics of the radiation used (energy, type of particle) and the biological aspects related to the response of a specific tissue to the radiation. There is also a whole branch of Geant4 devoted to the simulation of the biological damage induced by ionising radiation, called Geant4-DNA. This extension is continuously updated by a group of researchers that includes physicists, computer scientists and biologists, to keep improving the treatment plans of patients undergoing radiotherapy.
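In its physical essence, the absorbed dose is the energy deposited by radiation per unit mass, measured in gray (1 Gy = 1 J/kg); in a simulation it is obtained by summing the energy deposits of all tracked particles in a volume and dividing by the mass of that volume, as in the tiny sketch below (the numbers are purely illustrative).

```cpp
// Sketch of a dose calculation: D = deposited energy / mass, in gray (J/kg).
#include <iostream>

int main() {
    const double depositedEnergyJ = 2.0e-4;   // hypothetical total energy deposit, in joules
    const double volumeMassKg = 1.0e-3;       // mass of the scoring volume (1 g of tissue)
    const double doseGy = depositedEnergyJ / volumeMassKg;
    std::cout << "absorbed dose = " << doseGy << " Gy\n";   // 0.2 Gy
    return 0;
}
```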
Translation by Camilla Paola Maglione, Communications Office INFN-LNF