{"id":82317,"date":"2022-05-09T21:53:30","date_gmt":"2022-05-09T21:53:30","guid":{"rendered":"https:\/\/hanstimmerman.me\/?p=82317"},"modified":"2024-11-04T19:20:09","modified_gmt":"2024-11-04T19:20:09","slug":"kindertoeslagen-onvolkomen-data-en-slechte-algoritmen","status":"publish","type":"post","link":"https:\/\/hanstimmerman.me\/en\/kindertoeslagen-onvolkomen-data-en-slechte-algoritmen\/","title":{"rendered":"Kindertoeslagen, onvolkomen data en slechte algoritmen.\u00a0"},"content":{"rendered":"<p><span style=\"color: #000000;\">Afgelopen week las ik op LinkedIn een interessante <a style=\"color: #000000;\" href=\"https:\/\/www.linkedin.com\/pulse\/data-driven-versus-value-based-marlon-domingus\/\">beschouwing<\/a> over het gevaar van<span class=\"Apple-converted-space\">\u00a0 <\/span>\u2018<i>ongewenst totalitair gedrag door slechte data analyse\u2019<\/i> van <a style=\"color: #000000;\" href=\"https:\/\/www.linkedin.com\/in\/domingus\/\">Marlon Domingus<\/a>, data protection officer bij de Erasmus Universiteit Rotterdam. Hij ging in op het ontstaan van \u2018totalitair gedrag\u2019 gebaseerd op het boek <a style=\"color: #000000;\" href=\"https:\/\/en.wikipedia.org\/wiki\/The_Origins_of_Totalitarianism\">\u2019The Origins of Totalitarianism\u2019<\/a> (Hannah Arendt, 1951) over de gruwelijkheden van de Tweede Wereldoorlog.<span class=\"Apple-converted-space\">\u00a0 <\/span>Marlon legde in zijn beschouwing een vergelijkbare link met de actuele kindertoeslagenaffaire, die ontstond door het verkeerd gebruik van algoritmen bij data analyse van de belastingdienst. 
Hoe een in eerste instantie oprecht \u2018systeem\u2019 ongemerkt en langdurig vele onschuldige burgers kon vermorzelen.<span class=\"Apple-converted-space\">\u00a0<\/span><\/span><\/p>\n<p><span style=\"color: #000000;\"><b>Willekeurige slachtoffers<\/b><\/span><\/p>\n<p><span style=\"color: #000000;\">In een <a style=\"color: #000000;\" href=\"https:\/\/www.politico.eu\/article\/dutch-scandal-serves-as-a-warning-for-europe-over-risks-of-using-algorithms\/\">artikel<\/a> in Politico is de toeslagenaffaire al eerder als waarschuwing gebruikt voor het enorme risico van het verkeerd gebruik van algoritmen door een overheid. Het schandaal heeft zelfs een aparte Wikipedia <a style=\"color: #000000;\" href=\"https:\/\/en.wikipedia.org\/wiki\/Dutch_childcare_benefits_scandal\">pagina<\/a> die de trieste gevolgen uitputtend beschrijft. Ons land staat daarmee wereldwijd als voorbeeld van hoe verkeerd gebruik van data en algoritmen (grotendeels) onschuldige burgers door een systeem tot willekeurige slachtoffers van de jacht op uitkeringsfraude maakte.<span class=\"Apple-converted-space\">\u00a0<\/span><\/span><\/p>\n<p><span style=\"color: #000000;\">Marlon\u2019s belangrijkste conclusie is dat \u2018<i>The Origins of Totalitarianism<\/i>\u2019 ons leert dat ook in onze tijd dit soort ongekende negatieve effecten voor burgers mogelijk zijn, in onze moderne datagedreven wereld met twijfelachtige besluitvormingsprocessen op basis van onvolledige dataverzamelingen en foute algoritmen. Hoe kan een door bureaucraten beheerd informatiesysteem, namens een democratische regering, burgers uiteindelijk hun fundamentele juridische en morele persoonlijkheid ontnemen? Burgers die slachtoffer worden van beleid om \u2018fraudeurs\u2019 hard aan te pakken. Met alle negatieve gevolgen van dien. 
Burgers die willekeurige slachtoffers worden door verkeerd gebruik van data, modellen en algoritmen, terwijl dit ironisch genoeg natuurlijk nooit een opzettelijk of geheim plan van de belastingdienst of de regering was geweest.<span class=\"Apple-converted-space\">\u00a0<\/span><\/span><\/p>\n<p><span style=\"color: #000000;\"><b>Onvolledige data en verkeerde algoritmen<\/b><\/span><\/p>\n<p><span style=\"color: #000000;\">De link die Marlon legt tussen de gruwelen uit de Tweede Wereldoorlog, waar burgers ook door een \u2018systeem\u2019 willekeurige slachtoffers werden, en de gruwelen van de toeslagenaffaire, waar een \u2018systeem\u2019 in feite hetzelfde kon doen, geeft een angstig gevoel. Als maatschappij, zowel bedrijfsleven als overheid, hebben we zoveel data verzameld en zoveel algoritmen \u2018bedacht\u2019 dat een verkeerde of onjuiste combinatie ongemerkt en ongezien tot dit soort gruwelijkheden kan leiden. Ergens in de keten is dan de menselijke waarde van beslissingen over mensen verdwenen: <i>\u2018de computer zegt dat u een fraudeur bent . . .\u2019<\/i><\/span><\/p>\n<p><span style=\"color: #000000;\">Het belang van de blijvende inbreng van menselijke waarde bij het gebruik van data, zeker daar waar het over mensen c.q. burgers gaat, is groot. In eerdere discussies over datagedreven rechtspraak bleek dat de computer uit de veelheid van eerdere zaken vrij zuiver vergelijkbare zaken &#8211; en natuurlijk ook vonnissen &#8211; kan aangeven; toch moet uiteindelijk altijd de rechter, als mens, een eindoordeel geven. Ongeacht al de kille data en het technische materiaal, de menselijke waarde moet voorop staan om een menselijk oordeel te vellen. 
In dat kader haalt Marlon een uitspraak van de filosoof Immanuel Kant aan: <i>&#8220;Gedachten zonder inhoud zijn leeg, intu\u00efties zonder concepten zijn blind\u201d.<\/i><span class=\"Apple-converted-space\">\u00a0<\/span><\/span><\/p>\n<p><span style=\"color: #000000;\"><b>Data zonder doel is waardeloos<\/b><\/span><\/p>\n<p><span style=\"color: #000000;\">Bovenstaande beschouwing leert ons dat data waarvan we de bron, de betekenis en de inhoud onvoldoende kennen, eigenlijk waardeloos is. Die conclusie is best ernstig, omdat heel veel verzamelde data aan die kwalificatie voldoet. Dat is ook de discussie die vaak wordt gevoerd: moet je datacentrisch of informatiecentrisch werken? Immers, data is nog geen informatie, nog geen boodschap, maar slechts een waarde, een begrip, een toestand of een teken. Neem het bekende voorbeeld: pas de juiste combinatie en vertaling van de verschillende data-elementen &#8211; 32, graden Fahrenheit, buiten &#8211; levert een bruikbare boodschap op, als de vraag is of ik een jas aan moet trekken als ik naar buiten ga. Zonder die vraag heeft die data geen actuele waarde.<span class=\"Apple-converted-space\">\u00a0<\/span><\/span><\/p>\n<p><span style=\"color: #000000;\">Data verzamelen moet dus een doel, een vraagstelling in zich hebben om zinvol te zijn. Waar wil ik die data voor gebruiken? Waarom wil ik die data in dat formaat hebben? Mag ik die data wel verzamelen en bewaren? Kan ik ook op een andere wijze mijn vraag beantwoorden? Allemaal vragen die vaak niet worden gesteld als men besluit data te verzamelen. Zoals we tijdens de natuurkundelessen al leerden als we een meting gingen uitvoeren: wat wil je meten, wat kun je meten, hoe nauwkeurig is de meting en is \u00e9\u00e9n meting voldoende? Al snel bleek toen al dat je met een op het oog simpele meting een hele practicummiddag zoet was, en aan het eind nog niet het exacte resultaat had waar je op gehoopt had. 
Goed meten is lastiger dan je denkt. Goed data verzamelen ook.<span class=\"Apple-converted-space\">\u00a0<\/span><\/span><\/p>\n<p><span style=\"color: #000000;\"><b>Modellen en algoritmen<\/b><\/span><\/p>\n<p><span style=\"color: #000000;\">Naast data gebruiken we modellen en daaruit ontwikkelde algoritmen. Net zoals data een doel- en vraagstelling in zich moet hebben, geldt dat ook voor een model en een algoritme. Waarvoor wil ik dat model gebruiken? Welke uitkomsten wil ik hiermee cre\u00ebren? Welke vragen wil ik hiermee beantwoorden? Hoe nauwkeurig kan ik het antwoord berekenen? Ik kom nog uit de tijd dat we de rekenliniaal gebruikten, die op fabelachtige wijze een vrij nauwkeurig decimaal antwoord kon geven. Echter zonder ordegrootte: het was 1,3675, maar kon ook 13,675 zijn, of 136,75 of 0,13675. Die orde van grootte moest je als rekenaar zelf bepalen. Dat maakte dat je qua ordegrootte nooit echt de fout in kon gaan.<\/span><\/p>\n<p><span style=\"color: #000000;\">Met de komst van rekenmachine en computer verdween die \u2018kennis over ordegrootte\u2019 helaas en werd de uitkomst van de computer de waarheid, zelfs als die ordes van grootte te klein of te groot was. Waarmee het begrip van de vraagstelling, het begrip van de berekening en het begrip van de uitkomst makkelijk vervaagde. Rekenen omdat we kunnen rekenen. Daarom is het gebruik van verzamelde data en computermodellen door niet-wiskundig onderlegde personen risicovol. Immers, als je de uitkomsten niet gevoelsmatig kunt aanvoelen en controleren, intu\u00eftief gekke antwoorden niet herkent en qua ordegrootte lastig kunt inschatten of een antwoord re\u00ebel is, dan worden uitkomsten heel snel onzin en flauwekul.<span class=\"Apple-converted-space\">\u00a0<\/span><\/span><\/p>\n<p><span style=\"color: #000000;\">Wiskunde is de basis voor informatica. Iedereen die met data, modellen en algoritmen werkt en aan de uitkomsten waarde wil toekennen, dient wiskundig geschoold te zijn. 
En moet begrijpen dat niets absoluut is en dat alles afwijkingen en toleranties kent. Anders is het gevaar groot dat dit gebruik door leken tot ontsporingen leidt, zoals besproken aan het begin van deze blog. Dat lijkt erg zwart-wit, maar gezien die maatschappelijke gevaren is het geen overbodige conclusie. In onze groeiende virtuele, datagedreven wereld dienen we steeds strikter de vinger aan de pols te houden wat betreft de kwaliteit van data, modellen en algoritmen, en niet in de laatste plaats de controle daarop. <span class=\"Apple-converted-space\">\u00a0<\/span><\/span><\/p>\n<p><span style=\"color: #000000;\">Photo by <a style=\"color: #000000;\" href=\"https:\/\/unsplash.com\/@joshhild?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\">Josh Hild<\/a> on <a style=\"color: #000000;\" href=\"https:\/\/unsplash.com\/collections\/58749807\/welcome-to-the-xxi.5th-century?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\">Unsplash<\/a><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"color: #000000;\">&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212; \u00a0Translated by ChatGPT \u00a0&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;<\/span><\/p>\n<p><span style=\"color: #000000;\">Last week, I read an interesting analysis on LinkedIn by Marlon Domingus, Data Protection Officer at Erasmus University Rotterdam, about the danger of <em>&#8220;unwanted totalitarian behavior through poor data analysis&#8221;<\/em>. He explored how totalitarian behavior emerges, based on Hannah Arendt\u2019s book <em>The Origins of Totalitarianism<\/em> (1951), which deals with the atrocities of World War II. 
Marlon drew a similar connection to the current childcare benefits scandal, which arose from the tax authorities\u2019 misuse of algorithms in data analysis. It shows how an initially well-meaning &#8220;system&#8221; could, unnoticed and over a long period, crush countless innocent citizens.<\/span><\/p>\n<p><span style=\"color: #000000;\"><strong>Random Victims<\/strong><\/span><\/p>\n<p><span style=\"color: #000000;\">In an article in <em>Politico<\/em>, the childcare benefits scandal had already been cited as a warning about the enormous risk of a government misusing algorithms. The scandal even has its own Wikipedia page that describes its tragic consequences exhaustively. Our country has thus become a global example of how misused data and algorithms turned largely innocent citizens into random victims of a crackdown on benefit fraud.<\/span><\/p>\n<p><span style=\"color: #000000;\">Marlon\u2019s main conclusion is that <em>The Origins of Totalitarianism<\/em> teaches us that such unprecedented harm to citizens is still possible in our own time, in a modern, data-driven world with questionable decision-making processes built on incomplete data collections and flawed algorithms. How can an information system, managed by bureaucrats on behalf of a democratic government, ultimately strip citizens of their fundamental legal and moral personhood? Citizens become victims of a policy of cracking down hard on &#8220;fraudsters,&#8221; with all the negative consequences that follow. 
People become random victims of misused data, models, and algorithms, even though, ironically, this was never an intentional or secret plan of the tax authorities or the government.<\/span><\/p>\n<p><span style=\"color: #000000;\"><strong>Incomplete Data and Faulty Algorithms<\/strong><\/span><\/p>\n<p><span style=\"color: #000000;\">The connection Marlon draws between the horrors of World War II, when citizens also became random victims of a &#8220;system,&#8221; and the horrors of the benefits scandal, in which a &#8220;system&#8221; essentially did the same, is unsettling. As a society, in business and government alike, we have collected so much data and devised so many algorithms that a wrong or inaccurate combination can lead, unnoticed and unseen, to such horrors. Somewhere in the chain, the human dimension of decisions about people has disappeared: <em>&#8220;The computer says you\u2019re a fraudster&#8230;&#8221;<\/em><\/span><\/p>\n<p><span style=\"color: #000000;\">Keeping human judgment involved in the use of data is vital, especially where the data concerns people and citizens. Earlier discussions on data-driven justice showed that a computer can pick out comparable cases, and of course their verdicts, from a large body of past cases with reasonable accuracy; yet the final judgment must always be made by a judge, a human being. Whatever the cold data and technical material may suggest, human values must come first in reaching a humane judgment. In this context, Marlon quotes the philosopher Immanuel Kant: <em>&#8220;Thoughts without content are empty; intuitions without concepts are blind.&#8221;<\/em><\/span><\/p>\n<p><span style=\"color: #000000;\"><strong>Data Without Purpose Is Worthless<\/strong><\/span><\/p>\n<p><span style=\"color: #000000;\">The reflection above teaches us that data whose source, meaning, and content we do not sufficiently understand is essentially worthless. That is a serious conclusion, because a great deal of collected data fits that description. 
This is also the debate that often arises over whether to work data-centrically or information-centrically. After all, data is not yet information or a message; it is merely a value, a concept, a state, or a symbol. Take the familiar example: separate data elements such as 32, degrees Fahrenheit, and outside only become a useful message through the right combination and interpretation, and only when the question is whether I should put on a coat before going out. Without that question, the data has no real value.<\/span><\/p>\n<p><span style=\"color: #000000;\">Data collection must therefore have a purpose, a question, to be meaningful. What do I want to use the data for? Why do I want it in that format? Am I even allowed to collect and store it? Could I answer my question in another way? These questions often go unasked when people decide to collect data. As we learned in physics class before taking a measurement: What do you want to measure? What can you measure? How accurate is the measurement, and is one measurement enough? Back then it soon became clear that even a seemingly simple measurement could keep you busy for an entire lab afternoon, and at the end you still might not have the exact result you had hoped for. Measuring well is harder than you think. So is collecting data well.<\/span><\/p>\n<p><span style=\"color: #000000;\"><strong>Models and Algorithms<\/strong><\/span><\/p>\n<p><span style=\"color: #000000;\">Besides data, we use models and the algorithms derived from them. Just as data needs a purpose and a question, so do models and algorithms. What do I want to use the model for? What outcomes do I want it to produce? What questions should it answer? How accurately can I calculate the answer? I come from the time of the slide rule, which could remarkably give a fairly precise string of digits. But it gave no order of magnitude: a reading of 1.3675 could just as well mean 13.675, 136.75, or 0.13675. 
As the person doing the calculation, you had to work out the order of magnitude yourself, which meant you could never be badly wrong about scale.<\/span><\/p>\n<p><span style=\"color: #000000;\">With the arrival of calculators and computers, that &#8220;knowledge of scale&#8221; unfortunately disappeared, and the computer\u2019s output became the truth, even when it was orders of magnitude too small or too large. Understanding of the question, of the calculation, and of the result easily faded away. Calculating because we can calculate. That is why it is risky for people without mathematical training to use collected data and computer models. If you cannot intuitively sense and check the outcomes, do not recognize odd answers, and find it hard to judge whether an answer is of a plausible order of magnitude, the outcomes quickly become nonsense.<\/span><\/p>\n<p><span style=\"color: #000000;\">Mathematics is the foundation of informatics. Anyone who works with data, models, and algorithms and wants to attach value to the results must be mathematically trained, and must understand that nothing is absolute: every result comes with deviations and tolerances. Otherwise, there is a real risk that use by laypeople will lead to derailments like those discussed at the beginning of this blog. That may sound black-and-white, but given those societal dangers, it is not a superfluous conclusion. In our growing virtual, data-driven world, we must keep an ever closer eye on the quality of data, models, and algorithms, and not least on how that quality is checked.<\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>Last week, I read an interesting analysis on LinkedIn about the danger of &#8220;unwanted totalitarian behavior through poor data analysis&#8221; by Marlon Domingus, Data Protection Officer at Erasmus University Rotterdam. He explored the emergence of totalitarian behavior based on the book The Origins of Totalitarianism (Hannah Arendt, 1951), which discusses the atrocities of World War II. 
Marlon made a similar connection to the current childcare benefits scandal, which arose from the misuse of algorithms in data analysis by the tax authorities, illustrating how an initially well-meaning &#8220;system&#8221; could unknowingly and for an extended period harm countless innocent citizens.<\/p>","protected":false},"author":3,"featured_media":82323,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[194,72,75],"tags":[107,108,128,130,132,139,438,83,87],"class_list":["post-82317","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-transformatie","category-digitalisation","category-innovation","tag-data","tag-artificial-intelligence","tag-kwaliteitsborging","tag-digitale-platformen","tag-trust","tag-kwetsbaarheid","tag-toeslagenaffaire","tag-automatisering","tag-digitalisering"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/hanstimmerman.me\/wp-content\/uploads\/2022\/05\/josh-hild-WjZ4eaHq9G4-unsplash-scaled-e1652133083971.jpg?fit=2041%2C998&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/posts\/82317","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/comments?post=82317"}],"version-history":[{"count":14,"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/posts\
/82317\/revisions"}],"predecessor-version":[{"id":85057,"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/posts\/82317\/revisions\/85057"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/media\/82323"}],"wp:attachment":[{"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/media?parent=82317"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/categories?post=82317"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hanstimmerman.me\/en\/wp-json\/wp\/v2\/tags?post=82317"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}