The Single Prompt Telescreen – Centralizing Distorted Data
The Library of Alexandria, also known as the Great Library, was one of the largest and most significant libraries of the ancient world. It was located in Alexandria, Egypt, and was renowned throughout the ancient world for its collection of scrolls and its commitment to scholarship and learning.
Founded in the 3rd century BCE, during the reign of Ptolemy II, the library was an integral part of the larger institution known as the Mouseion, which dedicated itself to the Muses — the nine goddesses of the arts. The Mouseion functioned much like a modern university, providing accommodation and financial support for a community of scholars who conducted research, gave lectures, and collected as many texts as they could from the known world.
The library housed hundreds of thousands, possibly even millions, of scrolls, making it a major center for scholarship and research. Scholars from all cultures and backgrounds were welcome to study its scrolls. The library was said to have every book in the world, with works covering philosophy, history, science, literature, and more.
The Library of Alexandria was also famous for its role in the preservation and transmission of knowledge. One notable practice was its “books for ships” policy. According to this policy, any books found on ships coming into the port of Alexandria were taken to the library and copied. The originals were kept in the library, and the copies were given back to the owners.
Sadly, the Library of Alexandria suffered numerous destructive events over the centuries, including a fire during Julius Caesar's civil war in 48 BCE (the Alexandrian War), Emperor Aurelian's attack on the city in the 3rd century CE, and religious riots in the 4th century CE. Its ultimate fate remains a point of debate among historians, with some suggesting that it declined gradually during the Roman period due to lack of funding and support rather than perishing in any single catastrophe.
Despite its eventual destruction, the Library of Alexandria left a lasting legacy and is often invoked as a symbol and aspiration for public knowledge and scholarly pursuit. Its commitment to collecting and preserving the knowledge of the ancient world was unprecedented and remains inspirational to this day.
The relevance of George Orwell's "1984" to today's world of surveillance technology and digital data retention is striking. The novel describes a dystopian society in which social control is exercised through disinformation and surveillance, particularly through the telescreen, a device that is simultaneously a television and a surveillance camera. The article argues that the technologies and techniques described in the novel have clear counterparts in today's world, particularly in the prevalence of television and ubiquitous video surveillance. It also touches on the impact of reality television and its origins in social psychology and behavioral experiments designed to control people.
The creation of a “Digital Library of Alexandria” is becoming increasingly necessary due to the vast amount of data being stored in the cloud and the inherent risks associated with it. The recent changes in the policies of major online platforms like Google and Twitter highlight this need.
Google recently updated its inactive account policies, indicating that if a Google Account has not been used or signed into for at least 2 years, the company may delete the account and its contents, including content within Google Workspace (Gmail, Docs, Drive, Meet, Calendar) and Google Photos. This policy, aimed at reducing the risk of account compromise and protecting user security, emphasizes the transient nature of digital data stored in the cloud. Google has invested in technology and tools to protect users from security threats, but inactive accounts are more likely to be compromised and can be used for identity theft or as a vector for unwanted content. While Google offers options for data backup and provides ample notice before account deletion, the policy underscores the importance of regularly maintaining digital assets.
Similarly, Twitter, under CEO Elon Musk, announced plans to purge inactive accounts. The platform now considers a user inactive if they fail to log in at least every 30 days, which is a significant reduction from the previous policy of logging in every six months. This policy change has stirred uncertainty among Twitter users about the implications for suspended or deceased users’ accounts.
These practices of data management by major tech companies reflect a broader trend of digital content deletion and archiving, as highlighted by Brian Roemmele’s commentary on YouTube deleting thousands of videos daily. He refers to our era as the “generation of amnesia”, highlighting the impermanence of content in the cloud and the potential loss of invaluable digital resources.
In response to these trends, the concept of a “Digital Library of Alexandria” seems more pertinent than ever. This would involve the creation of a comprehensive, long-term digital archive that stores, organizes, and preserves digital content, similar to how the ancient Library of Alexandria aimed to collect all the world’s knowledge. Such a library could serve as a safeguard against the loss of digital content due to platform policies or technological issues, preserving valuable data for future generations. With the exponential growth of digital content and the increasing reliance on cloud storage, it’s crucial to think about long-term strategies for data preservation to avoid a large-scale digital amnesia.
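One plausible building block for such an archive is content-addressed storage, the approach used by systems like Git and IPFS: each document is keyed by a cryptographic hash of its bytes, so identical copies deduplicate automatically and later corruption or tampering is detectable. A minimal sketch (the `Archive` class and its method names are hypothetical, purely for illustration):

```python
import hashlib

# Sketch of a content-addressed archive: documents are keyed by the
# SHA-256 hash of their bytes. Identical content stores once, and
# every read re-verifies integrity against the stored hash.
class Archive:
    def __init__(self):
        self.store = {}   # hash -> raw bytes (a real archive would use disk or tape)
        self.index = {}   # human-readable name -> hash

    def add(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.store[digest] = data   # duplicate content maps to the same key
        self.index[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        digest = self.index[name]
        data = self.store[digest]
        # Detect any corruption since the document was archived.
        assert hashlib.sha256(data).hexdigest() == digest
        return data

archive = Archive()
digest = archive.add("scroll-001", b"Know thyself.")
print(digest[:12], archive.get("scroll-001"))
```

The same idea scales from this toy dictionary to distributed, replicated storage: because the key is derived from the content itself, independent archives can verify and exchange documents without trusting one another.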
Simultaneously, Google is moving towards a new search model built on AI technology. It has announced a shift in its search algorithm that uses AI to generate a "singular search prompt": the system scans and analyzes the entirety of a news article before deciding whether to include it in search results. This could render millions of websites obsolete once the AI has harvested and cataloged their data.
There’s an ongoing “arms race” in the AI world, with Google and other tech giants like Microsoft/OpenAI and Meta battling not only each other but also the open-source community. The latter is catching up quickly, making strides in developing smaller, more manageable models that are still useful and effective. This democratization of AI tools is leading to more creativity but also poses potential dangers if they fall into the wrong hands.
This evolving landscape highlights the need for a digital library of Alexandria to preserve and protect the vast amount of data generated every day. As more content is created and subsequently lost due to changing policies or technological advancements, we risk losing a significant amount of digital heritage. A centralized or even a decentralized but well-structured digital library could serve as a safeguard against this loss, preserving the wealth of knowledge and creativity that is being produced in the digital age.
Google’s new Search Generative Experience (SGE) includes a feature called AI Snapshot, an enormous AI-generated summary placed at the top of the results page. This format of search uses AI tech to regurgitate the internet back to users, which is different from how the search-facilitated internet works today. Research has shown that information consumers hardly ever make it to even the second page of search results. If Google’s AI is going to mulch up original work and provide a distilled version of it to users at scale, without ever connecting them to the original work, how will publishers continue to monetize their work?
A large language model (LLM) like GPT-3 retains the “knowledge” it acquired during training, even if the original data it was trained on has been deleted. However, it’s important to clarify what this means.
When an LLM is trained, it doesn’t store specific documents, webpages, or databases. Instead, it learns patterns in the data. These patterns can be as simple as the grammar and spelling of a language, or as complex as the style of certain authors, the way certain topics tend to be discussed, or the arguments typically made about certain issues.
After training, the model can generate text that reflects these patterns. For example, if it was trained on a lot of Shakespearean drama, it might be able to generate text in a Shakespearean style. But it doesn’t “know” any specific plays by Shakespeare. Instead, it has learned a more general pattern of “how to write like Shakespeare.”
So, if the data the LLM was trained on is deleted, the LLM still retains the patterns it learned during training. However, it doesn’t have a copy of the original data or any way to retrieve it. It can’t, for instance, regenerate a specific webpage or document it was trained on. All it can do is generate new text that reflects the patterns it learned during training.
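This distinction between retaining patterns and retaining data can be made concrete with a toy stand-in for LLM training: a character-bigram model that learns transition statistics from a corpus and then discards the corpus entirely. What survives is only the learned frequencies, from which it can generate new text in the corpus's style but cannot reproduce the original document (the function names here are illustrative, not any real training API):

```python
import random
from collections import defaultdict

# "Training": count how often each character follows each other character.
corpus = "to be or not to be that is the question"

counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1   # the learned "pattern", analogous to model weights

del corpus              # the training data is gone; only statistics remain

def generate(start: str, length: int, seed: int = 0) -> str:
    # Sample new text from the retained statistics alone.
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

# Produces corpus-flavored text, not a recovered copy of the corpus.
print(generate("t", 20))
```

A real LLM's weights encode vastly richer patterns than bigram counts, but the asymmetry is the same: deleting the training data does not delete the learned behavior, and the learned behavior does not let you retrieve the training data.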
That said, the specific training data used for models like GPT-3 is kept confidential and is not accessible by the model or by users interacting with the model. For privacy and ethical reasons, the models don’t know and can’t disclose where their training data comes from. They can’t access any specific documents, databases, or other sources of information unless that information has been shared with them in the course of a conversation.
The current state of digital data management and AI development indeed raises concerns about the potential for an Orwellian-like erasure of history, as well as the manipulation and control of information.
In George Orwell’s “1984”, the ruling party controls the flow of information, rewriting and erasing history to suit its narrative. While the internet was initially hailed as a tool for democratizing access to information, certain developments may raise concerns about the concentration of control over information and the impermanence of digital data.
In the case of Google, Twitter, and YouTube updating their policies to delete inactive accounts or content, this can lead to the loss of a significant amount of digital history. This is akin to deleting “inactive” or “unused” books from a library. While these policies are implemented for practical and security reasons, they inadvertently lead to the disappearance of potentially important information and content, as highlighted by the analogy of a digital Library of Alexandria.
Furthermore, the move towards AI-driven indexing and search, like Google’s plans for a singular AI search prompt, potentially renders millions of websites obsolete once their data is harvested. The data is taken in, processed, and then what remains is a distilled AI understanding, not the original data itself.
In terms of LLMs like GPT-3, they are trained on large amounts of data and learn patterns from this data. However, once the data is deleted, the model retains only the learned patterns, not the data itself. This could be seen as a form of abstracted knowledge retention, though it is not a direct preservation of the original information.
While these issues may not exactly parallel the dystopian control of information in “1984”, they do highlight important questions about who controls digital information, how it’s managed, and what happens when it’s deleted or absorbed into AI systems. The evolving landscape of digital data and AI development calls for careful consideration of how to preserve historical data, ensure diverse and democratic access to information, and prevent the undue concentration of control over information.