Archiving: A High-Performance, Sexy Sport?

5 June 2024


Our real-time information world calls for 'real-time archiving of source data'. That is what it takes to archive today's data streams in a traceable and retrievable way. Sources and streams have to be selected up front, in real time, because it is impossible to physically store all the exploding streaming data they produce. There is simply not enough storage capacity available worldwide. This means we must decide very quickly what we want, or are required, to keep and what we do not. Archiving is becoming a strategic profession. It even becomes sexy and artistic when you think of online archiving that captures AR and VR media in a reproducible way. In addition, centrally managed, real-time archiving gives an equal information position to everyone who is entitled to find and view that data. The growing number of online data streams places new and different demands on archiving if recorded data is later to be reconstructed in the context in which it was created, communicated, and used.

The Ball Metaverse Index

The metaverse is slowly entering our society. Alongside our familiar physical world, digitally created virtual worlds are emerging. These worlds, in symbiosis with the physical one, surround us every day. For many people, social media and gaming are already a daily 'reality' in a virtual environment. Digital worlds for design, art, retail, entertainment, and warfare have also existed for a long time. It is an emerging market that is already worth billions of euros. The Ball Metaverse Index lists the companies at the forefront of metaverse technology.

Who is on the Ball Metaverse Index? Roblox is at the top, but surprisingly chip maker Nvidia is in second place. Besides Meta we see, of course, Microsoft, Amazon, and Apple. Each of them invests billions in virtual technology whose derived techniques we as a society will, slowly but irreversibly, start to use and embrace: virtual, jointly managed and shaped spaces (or networks of spaces) where people meet to socialize and to attend virtual events such as concerts, work, and games. The metaverse is closely tied to the concept of Web 3.0. Because the metaverse is a simulation of the real world with digital possessions, money, identities, and 'real estate', data will have to be recorded here too, and therefore archived. The aim is to be able to govern these worlds lawfully and properly, and to record them unambiguously and traceably.

Data Recording in the Metaverse

Trade in cryptocurrencies, tokens, and ownership (NFTs) is currently recorded in blockchain storage. A market is evolving that is no longer centralized and in which users can create, control, and then monetize practically everything online. Twenty years ago I was active in Second Life, one of the first virtual worlds, where avatars owned or made virtual land, houses, and goods. It had its own currency, 'the Linden', which you could buy and exchange for dollars. I still have conversations and images from that old, simple virtual world, saved in text files. But calling that old world back online, or reconstructing it, is all but impossible. The metaverse had, and still has, no archive.

I have blogged exhaustively about blockchain. Blockchains contain distributed, time-stamped, cryptographically secured transactions. In that sense a blockchain is an immutable timeline and audit trail of the past. But it is a register rather than an archive. Beyond the bare transaction records it gives hardly any picture or context of that past. Interoperable open standards for preserving digital artifacts for the future are still missing. Virtual worlds use, just as in the early days of digitization, rapidly improving and therefore changing file formats and protocols, which makes it hard to share information between them, let alone build an orderly archive with it.

Real-Time Archiving Solves Many Problems

Just as the first wave of digitization left gaps in our past ('digital dementia'), the growth of streaming data will probably do the same. But forewarned is forearmed, so it is high time to think about real-time archiving. How will we capture fleeting chat messages and images efficiently and compliantly, so that we can retrieve them in a structured way in the future? This matters not only for historical reasons, but above all for establishing the truth and building the case files that go with it. After all, just as in the physical world, crime, fraud, and scams will be rife in this virtual online world. How do you organize, now, the forensic capabilities for future digital fraud and crime?

Through our government, we as a society must regulate this digital future and lay it down in law. Archive platforms are needed to record and conserve our 'digital realities': democratically accessible documents, messages, and snapshots of important virtual events and actions. Besides the already gigantic stream of chat data, think of the exploding stream of IoT data from all the things around us. In mirror worlds such as digital twins, this data shows us a digital picture of reality, and we base decisions on that picture. The source data used for those decisions must therefore be stored well and immutably. How are we going to do that? Do we even have scalable archive systems that can?

Archiving: A High-Performance Sport

Archiving billions of messages and data points while preserving the relationships between, and the context of, that stream of events places high demands on data management. It calls for Formula 1 performance, which requires people, systems, and platforms at F1 level: designed from the ground up ('by design') to deliver unrivalled performance, yet technically and economically feasible and scalable. Our current content management systems, which date from the end of the last century, were never designed for this. They are fine for capturing and managing document flows. But you should not ask them to capture and manage fast chat traffic, let alone metaverses. In performance terms, that is a completely different league.

Besides high-end processors such as those Nvidia makes, that league also requires high-quality software to run on them. The 'engine' of any software is the programming language it is written in. So not elaborate administrative programming languages, but fast, simple, mathematically grounded languages such as LISP: a practical mathematical notation for programs that has long been popular in the world of AI. Lisp source code is itself made of nested lists, so a program can treat code as a data structure: trees with branches and leaves. The language has automatic storage management and builds higher-order constructs from simple functions. The perfect internal 'engine' for super-fast AI and archive systems.

Clojure: A Modern LISP Dialect

The need for light-footed languages such as LISP led to the development of Clojure, a LISP dialect first released in 2007 that runs on the Java virtual machine. The name is a pun on the programming term 'closure', with the C, L, and J standing for C#, Lisp, and Java. In recent years the use of Clojure has been growing, although worldwide there are still only a few tens of thousands of programmers who have mastered the language. A new league calls for new players with new languages and tools.

Photo by Rodolfo Clix

———————- translated by ChatGPT ——————–

Archiving: A High-Performance, Sexy Sport?

Our real-time information world demands 'real-time archiving of source data'. This is necessary to archive all current data streams in a traceable and retrievable manner. Selecting sources and streams up front, in real time, is crucial because it is impossible to physically store all the exploding streaming data they produce. There simply isn't enough storage available worldwide. This means making rapid decisions about what we want, or need, to preserve and what we do not. Archiving becomes a strategic profession. It even becomes sexy and artistic when you think about online archiving that reproducibly documents AR and VR media. Additionally, centrally managed, real-time archiving creates equal information access for everyone entitled to find and view that data. The growing number of online data streams places new and different demands on archiving if the recorded data is later to be reconstructed in the context in which it was created, communicated, and used.

The Ball Metaverse Index

The metaverse is slowly entering our society. In addition to our familiar physical world, many digitally created virtual worlds are emerging. These worlds, in symbiosis with the physical one, surround us in our daily lives. For many, social media and gaming are already a daily 'reality' in a virtual environment. Digital worlds for design, art, retail, entertainment, and warfare also already exist. This emerging market is already worth billions of euros. The Ball Metaverse Index lists the companies leading the development of metaverse technology.

Who is on the Ball Metaverse Index? At the top is Roblox, but surprisingly, chip producer Nvidia is in second place. In addition to Meta, we also see Microsoft, Amazon, and Apple. Each of them is investing billions in virtual technology, which we, as a society, will slowly but inevitably adopt and embrace: virtual, jointly managed and shaped spaces (or networks of spaces) where people come together to socialize and attend virtual events such as concerts, work, and games. The metaverse closely aligns with the concept of Web 3.0. Since the metaverse is a simulation of the real world with digital assets, money, identities, and 'real estate', data will also need to be recorded and archived here, so that these worlds can be governed lawfully and in an orderly way.

Data Recording in the Metaverse

The trade in cryptocurrencies, tokens, and ownership (NFTs) is currently recorded in blockchain storage. A market is evolving that is no longer centralized, where users can create, control, and monetize practically everything online. Twenty years ago, I was active in Second Life, one of the first virtual worlds, in which avatars owned or created virtual land, houses, and goods. It had its own currency, 'the Linden', which you could buy and exchange for dollars. I still have conversations and images from that old, simple virtual world, saved in text files. But bringing that old world back online, or reconstructing it, is nearly impossible. The metaverse had and still has no archive.

I have blogged extensively about blockchain. Blockchains contain distributed, time-stamped, cryptographically secured transactions. In that sense, a blockchain is an immutable timeline and audit trail of the past. But it is more of a register than an archive. Beyond the bare records of transactions, it hardly provides a picture or context of that past. Interoperable open standards to preserve digital artifacts for the future are still lacking. Virtual worlds use rapidly improving and thus changing file formats and protocols, just as in the early days of digitization, which makes it difficult to share information between them, let alone build an orderly archive with it.
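The 'register rather than archive' point can be illustrated with a small sketch in Python. This is not how any particular blockchain is implemented: it is a single, non-distributed hash chain, and the names entry_hash, append, verify, and the avatar and parcel identifiers are all made up for illustration. It shows that chaining hashes makes tampering detectable, while the stored records remain bare transactions with no surrounding context.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry (an assumption of this sketch)


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, payload):
    """Add a record that is cryptographically linked to the one before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(chain):
    """Return True only if no entry was altered or reordered after the fact."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical transfers of a virtual parcel between avatars.
ledger = []
append(ledger, {"from": "avatar_a", "to": "avatar_b", "asset": "parcel-17"})
append(ledger, {"from": "avatar_b", "to": "avatar_c", "asset": "parcel-17"})
```

Calling verify(ledger) succeeds until any payload is edited afterwards. What the chain does not hold is what parcel-17 looked like, who was present, or what was said: that context is exactly what an archive would have to add.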

Thinking About Online Archiving

Just as the first wave of digitization led to gaps in our past ('digital dementia'), the same will likely happen to us in the new virtual world of streaming data. But forewarned is forearmed, so it is time to start thinking about online archiving. How are we going to record fleeting chat messages and images efficiently and compliantly so that we can retrieve them in a structured manner in the future? This matters not only for historical interest but especially for establishing the truth and building the corresponding case files. After all, just as in the physical world, crime, fraud, and deception will flourish in this virtual online world. How do you organize forensic capabilities now for future digital fraud and crime?

Through our government, we as a society must regulate this future and lay it down in law. This requires archive platforms to record and preserve these 'digital realities': catalogs with documents, messages, and snapshots of important virtual events and actions. Think of the already enormous stream of IoP data (chat data via the Internet of People) and the exploding stream of IoT data from all the things around us. In mirror worlds such as digital twins, this data shows us a virtual image of reality, and we will base decisions on that image. The source data used for those decisions must therefore be stored well and immutably. How are we going to do that? Do we even have scalable archiving systems that can do that?

Archiving: A High-Performance Sport

Archiving billions of messages and data points while preserving the relationships and context of that stream of events sets high standards for data management. It requires Formula 1 performance, and that takes people, systems, and platforms of F1 level: designed from the outset ('by design') to deliver unparalleled performance, yet still technically and economically feasible and scalable. Our current content management systems, developed at the end of the last century, were never designed for this. They are great for recording and managing document flows. But you shouldn't ask them to record and manage fast chat traffic, and certainly not metaverses. That is a completely different league in terms of performance.

That league requires not only high-quality processors, like those made by Nvidia, but also high-quality software that runs on them. The 'engine' of any software is the programming language in which it is written. So not elaborate administrative programming languages, but fast, simple, and mathematically based languages like LISP. The name is short for list processing. It is an old, practical mathematical notation for programs and has long been favored in the world of AI. Lisp source code is itself written as nested lists, so Lisp programs can manipulate source code as a data structure: trees with branches and leaves. The language offers automatic storage management and can build higher-order constructs from simple functions. The perfect internal 'engine' for super-fast AI and archiving systems.
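To make 'code as a data structure' concrete, here is a minimal sketch. It is written in Python rather than Lisp, purely as an illustration, and models a Lisp-style expression as a nested Python list. The names OPS, evaluate, swap_ops, and program are invented for this sketch, and only two operators are supported.

```python
import operator
from functools import reduce

# Operators this toy evaluator understands (an assumption of the sketch).
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(expr):
    """Evaluate a nested-list expression such as ["+", 1, ["*", 2, 3]]."""
    if not isinstance(expr, list):
        return expr  # a leaf of the tree: a plain number
    op, *args = expr  # a branch: the operator followed by its operands
    return reduce(OPS[op], [evaluate(arg) for arg in args])


def swap_ops(expr):
    """Rewrite a program by exchanging + and * throughout the tree.

    The program is just a list, so transforming it is ordinary list handling.
    """
    if not isinstance(expr, list):
        return expr
    op, *args = expr
    swapped = {"+": "*", "*": "+"}[op]
    return [swapped] + [swap_ops(arg) for arg in args]


program = ["+", 1, ["*", 2, 3]]  # Lisp would write this as (+ 1 (* 2 3))
```

Here evaluate(program) gives 7, while evaluate(swap_ops(program)) gives 5, because the rewritten tree is ["*", 1, ["+", 2, 3]]. In Lisp and Clojure this style is native to the language: the same list notation serves for both programs and data.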

Clojure: A Modern LISP Dialect

The need for lightweight languages like LISP led to the development of Clojure, a LISP dialect first released in 2007 that runs on the Java virtual machine. The name is a pun on the programming term 'closure', with the C, L, and J standing for C#, Lisp, and Java. In recent years, we have seen the use of Clojure grow, although there are still only tens of thousands of programmers worldwide who have mastered this language. A new league requires new players with new languages and tools.

Hans Timmerman
