visit
With that information in mind I wondered if it would be possible to discover whether a deputy was using this quota illegally. So that was the goal of my work, to discover possible fraud patterns inside the open data that the Chamber o Deputies provides regarding the deputies expenses. During the project I also tried to find if some deputy used the money with a given company that provided funds to his campaign back in 2014, the year that they were elected, which is not illegal but it is a good information for our society.
To perform all that I decided to use a graph oriented database. I chose OrientDB as the DBMS because of the query language derived from SQL. In the next sections I will explain the whole process to achieve those goals.
The first thing that I had to do was obtain the data. First I downloaded the deputies expenses report from 2017 from this link: . There are a lot of file options to download, I worked with a CSV file to import the data in OrientDB. After obtaining the expenses report, I had to obtain the campaign donations to the deputies in 2014 elections, and you can find in this link: . It’s also a CSV file with all the donations for each deputy campaign.
Unfortunately the model is in Portuguese, but we can see that the boxes will turn into vertices inside the database, and the arrows express how the boxes / vertices relate to each other. The left box named “Parlamentar” represents a deputy, and we can see that a deputy can perform multiples “Transação” or transactions, while transactions are usually provided by a “Empresa fornecedora” or supplying company. And at the same time I modeled in a way that a company can also perform transactions that are donated to a deputy. With that pattern we can discover if a deputy used the monthly quota money with companies that donated money to the deputy’s campaign. Also we can see that a deputy can be a “Socio_de” or partner of a company, and that a deputy has relatives that can also be partners of a company. With that we can discover if a deputy used the money with companies that he or a relative is a partner.
In OrientDB if you want to find some specific pattern inside your graph you can use the MATCH operator. Let’s see the query below:
MATCH{class: Parlamentar, as: p}−RealizaTransacao−>{class: Transacao ,as: t}−FornecidaPor−>{class: EmpresaFornecedora, as: e},{as: e}−RealizaTransacao−>{class: Transacao, as: t2}−FornecidaPara−> {as: p}RETURN $elements
That query is pretty descriptive, we can see that it tries to find a pattern inside that graph that has a deputy “p” that executes a transaction “t” supplied by a company “e”. And the same company “e” executes a donation “t2” that is donated to the same deputy “p” declared in the beginning. Pretty simple, right? The image below shows the resultSo the result shows us that seven deputies used the monthly quota money with companies that donated money to them in 2014. The light brown vertex represents a deputy, the dark brown vertex represents a company and the purple vertex represents transactions. The picture is hard to see the patterns because it is trying to show the whole result, the image below shows the pattern in more details:
Here we can see that some deputy received one donation from a company and performed three transactions with the same company. Again it is not illegal to do that but it is important to know these kind of information. But what about the fraud patterns? Unfortunately, the deputies relatives information is not public. One way out would be to built a web scraper and retrieve that information from Facebook, Twitter and other sites. But to validate the model and the whole process I inserted fake data to simulate the fraud scenarios, and hopefully if the Chamber of Deputies has this kind of data they could use the same process to inspect the deputies expenses. The query below was built to find a possible fraud pattern in the graph using the fake data that I’ve mentioned.
MATCH{class: Parlamentar, as: p}−RealizaTransacao−>{class: Transacao, as: t}−FornecidaPor−>{class: EmpresaFornecedora, as: e},{as: p}−Socio_De−> {as: e}RETURN $elements
Here we can see that the pattern looks for a deputy “p” that executes a transaction “t” supplied by a company “e”, and the same deputy “p” is a partner of the same company “e” defined before. This is a fraud pattern inside the graph, the image below shows the result.In the image we can see that the deputy in light brown vertex is associated to a company in a dark brown vertex, at the same time the deputy executed several transactions supplied by the same company. As I said before if the Chamber of Deputies has information regarding the companies that the deputy is a partner, they can approach this problem using graphs instead of relational structures. Finally, the query below shows the last pattern that tries to find if the deputy relative is a partner of a company that he or she used money to buy a product or a service.
MATCH{class: Parlamentar, as: p}−RealizaTransacao−> {class: Transacao ,as: t}−FornecidaPor−> {class: EmpresaFornecedora, as: e},{as: p} −Parente_De−>{class: Pessoa, as: p2}−Socio_De−>{as: e}RETURN $elements
The query follows the same approach as the other ones, the difference is that now the deputy “p” is a relative of a person “p2” and the person is a partner of the previously defined company. The result is shown below (Also using fake data).
select SgPartido, count (SgPartido) from Parlamentar GROUP BY SgPartido
The chart above was displayed in the screen after the system sent a HTTP request for OrientDB and retrieved the result as JSON. The JSON was passed to a Javascript chart library known as Highchart.js. With that we are able to see that PMDB was the party that most used the money in 2017. Here we can see the similarity with the SQL language, which makes real easy to adapt to this NoSQL DBMS. Other interesting information is what they spend the money with. The query below finds that.
select TxtFornecedor, in ("FornecidaPor").size() as servicosfrom EmpresaFornecedoraorder by servicos desc limit 15
With that we can see that the companies that most supply services are traveling ones. It makes sense since the majority of the deputies comes from other states, and they need to travel for business for example. Again we see the similarity with SQL language, however, OrientDB allow us to get how much edges of type “FornecidaPor”, which means supplied by, enters in the vertex of a company. For exemple 23529 edges of type “FornecidaPor” enters in the “Cia Aérea - GOL” vertex. So the query is basically returning the name of the company and how much edges of the type “FornecidaPor” enter in the vertex, ordered by the size of edges mentioned.