ARTISTIC RESEARCH AND RECONSTRUCTIONS OF INNER SPEECH BETWEEN THE BRAIN AND THE COMPUTER

Subtitle: Possibilities of Semantic Language Reconstruction and Deep Image Reconstruction Using Neuroimaging Tools and Artificial Intelligence to Design Experimental Foundations Based on Elements of Artistic Research in Visual Art.

Abstract

This paper analyzes the latest findings from 2022 and 2023 in the field of brain-computer interfaces (BCI), focusing on the implementation of approaches such as semantic reconstruction of language and deep image reconstruction within the research project Aesthetics of Artificial Intelligence Art. These approaches allow for the reconstruction of inner speech in a non-invasive and mobile way. The potential of such procedures expands the operational field of computational neuroaesthetics in the realm of artificial intelligence art. This extension is presented through speculative experiment proposals framed as artistic research, with the further intention of developing speculative predictive models of the aesthetics of artificial intelligence art and of the AIARTWORLD.

This contribution expands the research field of the research project Aesthetics of Artificial Intelligence Art, which examines the relationships between artificial intelligence art and computational neuroaesthetics. The subject of research and the approaches to it are articulated from three basic positions. The theoretical position is situated in the discourse of art and aesthetics of artificial intelligence. The analytical position is based on the study of contemporary artistic practice and identifies artistic strategies in the subject area, especially those relevant to the project's goals. The experimental position represents a practical-speculative approach to exploring the research possibilities of art and aesthetics, which can also be understood as artistic research. The goal of the research is to answer the question of whether it will be possible to create non-human artificial intelligence art, and under what conditions.

Beyond achieving the research goals, a significant contribution of the project is the accelerated application of computational neuroaesthetics, whose benefits have the potential to influence related methodological approaches even beyond the framework of this research project.

The ambition of the project is to create a research space at the intersections of neuroscience, aesthetics, artificial intelligence, computer science, and artistic research. Subsuming the mentioned theoretical starting points under the current state of research, I propose a model for constructing the research project's definition of the Aesthetics of Artificial Intelligence Art. The model is derived from the auxiliary definition of computational neuroaesthetics, which blends into the conceptual framework of Artificial Aesthetics by Manovich and Arielli. The Aesthetics of Artificial Intelligence Art (AAIA) is an area of computational aesthetics whose goal is to understand and create artificial systems capable of analyzing and producing aesthetic experiences and enabling their transfer between humans and machines. This multidisciplinary field combines computer science, artistic practice, art science, neuroscience, psychology, and philosophy. In AAIA, AI technical means are used to analyze, understand, and simulate human thinking, perception, and consciousness-simulation processes, as well as to create models of the world of artificial intelligence art. Research in the field of AAIA supports the development of intelligent systems that can improve creative activity but also allow for a deeper understanding of the processes of natural and artificial consciousness.

This contribution expands the possibilities of Socratic models,[1] henceforth SM: a framework in which several large pre-trained models can be assembled through language prompting, without additional training, to perform new multimodal tasks. Particularly important for the proposed research is that SM can improve the accuracy and reliability of machine learning models in areas with much ambiguity or uncertainty, such as natural language processing and image recognition. It opens up the possibility of learning with a human teacher who interacts with the model and asks it questions about the data it is evaluating. By asking questions, the teacher can lead the model to a more precise understanding of the data and help it avoid common mistakes and biases. Based on the above, I propose a speculative model that implements fNIRS or electroencephalography methods for a more reliable reading of the emotional or other states arising from the visceral activity of the examined subject. The usage frameworks are explained in the sections Semantic Reconstruction of Continuous Language and Deep Image Reconstruction. The basic principle of these technological possibilities is the ability to analyze semantic patterns of examined subjects via a mobile brain-computer interface (BCI) and to make very accurate predictions of their subsequent continuation. I see the feasibility of reading inner speech as the most powerful application possibility for the research project. Its very strong multimodal potential could be key in the artificialization of artistic strategies.
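
To make the SM principle concrete, the following minimal sketch composes hypothetical model stubs through natural language, with a speculative neuroimaging reader added as a further modality. All function bodies are placeholders of my own devising, not a real API; they merely illustrate how language serves as the common interface between pre-trained models.

```python
# A minimal sketch of the Socratic Models idea: several pre-trained models
# cooperate through natural language, without joint training. Every model
# call below is a hypothetical stub, not a real API.

def vision_model_caption(image_path: str) -> str:
    # Stub standing in for a pre-trained vision-language captioning model.
    return "a figure standing in a dimly lit gallery space"

def neuro_state_reader(recording: object) -> str:
    # Speculative stub: an fNIRS/EEG decoder that verbalizes the subject's
    # decoded emotional or semantic state (the extension proposed above).
    return "calm attention with rising curiosity"

def language_model(prompt: str) -> str:
    # Stub standing in for a large language model completing a prompt.
    return "The observer is likely experiencing contemplative absorption."

def socratic_composition(image_path: str, recording: object) -> str:
    # Each model contributes its modality as text; the language model
    # integrates the contributions ("via challenge", i.e., prompting).
    caption = vision_model_caption(image_path)
    state = neuro_state_reader(recording)
    prompt = (
        f"An observer looks at an image described as: '{caption}'.\n"
        f"A neuroimaging decoder describes their state as: '{state}'.\n"
        "Question: what aesthetic experience is the observer likely having?"
    )
    return language_model(prompt)

print(socratic_composition("artwork.jpg", recording=None))
```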

For the modeling of artistic-research experiments, the works-strategies-projects The Changing Room (2019) and Unlearning Language (2022) by Lauren McCarthy and UUmwelt by Pierre Huyghe, made in collaboration with Kamitani Lab, were used. Criteria for deriving new projects were established based on an analysis of the prerequisites necessary for human acceptance of the ideal AIARTWORLD model. The criteria stemmed from the logical statement that without human empathy toward accepting even a hypothetical model, it is not possible to predict such a model. On this basis, the artistic-research experiments were formulated to probe the acceptance of such a model. The starting points were abstracted into questions: Is it possible for a human to accept a work created by a non-human? Is it possible for a human to accept the myth of a non-human author? What is the difference in perception between traditional mythology (materially unprovable) and AI mythology (materially provable, by code, etc.)? The principles of semantic reconstruction of language and deep image reconstruction were used to propose the experiments.

Figure 1. Scheme and position of this contribution in relation to the individual research parts

Source: Archive of Tomáš Marušiak

BRAIN-COMPUTER INTERFACE USING AI: READING THOUGHTS FROM THE HUMAN BRAIN

The research project envisions the use primarily of fNIRS, a non-invasive neuroimaging technology that uses light in the near-infrared region to measure changes in the concentration of oxygenated and deoxygenated hemoglobin in the brain. These changes indicate neuronal activity and can be used to study brain function in response to various stimuli or tasks. fNIRS is often used as an alternative to other neuroimaging techniques such as fMRI because it is portable and does not require participants to lie still inside a scanner. fNIRS can be used in a wide range of applications including cognitive neuroscience, clinical research, and human-machine interaction. For the proposal design, or attempts to define the operational framework, two studies with application potential were selected.
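
For orientation, fNIRS devices typically infer these concentration changes from the modified Beer-Lambert law; the relation below is the standard textbook formulation, not one taken from the cited studies. The change in optical attenuation ΔA at wavelength λ, over source-detector distance d with differential pathlength factor DPF, relates linearly to the hemoglobin concentration changes:

```latex
\Delta A(\lambda) = \left[\, \varepsilon_{\mathrm{HbO_2}}(\lambda)\,\Delta c_{\mathrm{HbO_2}}
  + \varepsilon_{\mathrm{HbR}}(\lambda)\,\Delta c_{\mathrm{HbR}} \,\right] \cdot d \cdot \mathrm{DPF}(\lambda)
```

Measuring ΔA at two wavelengths with known extinction coefficients ε yields two linear equations, from which the oxygenated and deoxygenated hemoglobin changes Δc can be solved.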

As the first starting point, I present the study Semantic reconstruction of continuous language from non-invasive brain recordings,[2] which contains a comprehensive analysis of experimental results obtained with a thought-decoder application. It is a non-invasive decoder that reconstructs language from semantic representations captured by functional magnetic resonance imaging (fMRI). These recordings are decoded to generate "intelligible word sequences". A key problem for thought reading is that fMRI measures cortical activity via blood oxygenation, which manifests significantly later, about 10 seconds after a thought is formed as speech, whether uttered or internal. To supply this missing link in understanding thoughts, methods were proposed that train on the thought and language specifics of the studied subjects, including the use of GPT-1 for reconstruction.

The second starting point is based on the possibilities of reconstructing stimulus images from records of cortical brain activity, derived from the study Attention modulates neural representation to render reconstructions according to subjective appearance.[3] This study was carried out by Tomoyasu Horikawa and Yukiyasu Kamitani and assumes that the perception of stimuli is shaped by both top-down and bottom-up processes. For reconstruction, a method of decoding cortical activity using a deep neural network (DNN) was used to form an image corresponding to what the participant perceived. The authors emphasize that top-down attention modulates stimulus-evoked responses, so that the reconstruction is rendered according to subjective appearance.

SEMANTIC RECONSTRUCTION OF CONTINUOUS LANGUAGE

Experiments in creating an interface between a computer and the human brain have been around for quite some time.[4],[5] Most of these attempts, however, required intracranial surgical intervention. Such implementations were performed on people with damage to speech-comprehension centers, or with loss or disruption of speech in various forms of aphasia, in order to restore or compensate for language and speech deficits. As an example, I mention the study High-performance brain-to-text communication via handwriting (2021),[6] which examined the use of an intracortical brain-computer interface (BCI) that decodes attempted handwriting movements from the motor cortex and translates them into text in real time using a recurrent neural network decoding approach. Another example is the study Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria (2021),[7] which explores the possibilities of decoding words and sentences directly from the cortical activity of paralyzed patients, which, according to the authors, can represent progress over existing methods of assisted communication. Non-invasive methods have great potential to capture a wide range of language information. The study Natural speech reveals the semantic maps that tile human cerebral cortex (2016)[8] reports findings indicating that language is represented in areas of the cerebral cortex collectively known as the "semantic system". The authors mapped semantic selectivity across the cortex using voxel-wise modeling of functional MRI (fMRI) data collected while subjects listened to hours of narrative stories. This revealed a system organized into complex patterns that appear consistent across individuals. Subsequently, a novel generative model was used to create a detailed semantic atlas. The results support the hypothesis that most areas within the semantic system represent information about specific semantic domains or groups of related concepts, and the atlas shows which domains are represented in individual areas.[9]

The decoder described in the source study non-invasively captures the cortical activity of the human brain via fMRI and reconstructs perceived or imagined stimuli as continuous natural language. By natural language we mean a language that has naturally evolved through human usage.[10] To achieve the expected goal, it was necessary to overcome the low temporal resolution of fMRI: the blood-oxygen-level-dependent (BOLD) signal rises and falls over approximately 10 seconds. For illustration, in European languages the cadence of speech ranges between 2 and 3 words per second,[11],[12] so within a single BOLD response roughly 20 to 30 word or image-thought representations can arise in the brain. Having far more words than decodable brain images was a conceptual problem, and its solution via AI represented a great leap forward.

Methods

The decoder must predict many more words, word sequences, and their contexts than there are fMRI image records. For this purpose, an encoding model was used, built on AI predictions of the subject's possible reactions to natural language. Each subject listened to a series of naturally spoken narrative stories in the fMRI scanner for 16 hours, which provided five times more data than a typical language experiment. The encoding model was trained on this dataset and, by extracting semantic features, can capture the meaning of stimulus phrases. The starting point follows from the study Brains and algorithms partially converge in natural language processing (2022),[13] which explores modern language algorithms and their possible convergence toward solutions similar to those of the human brain, with the aim of revealing the foundations of natural language processing. It operates with the verified assumption that training deep learning (DL) models to predict masked words from large amounts of text generates activations similar to those in the human brain.
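
As a concrete illustration of what such an encoding model computes, here is a minimal sketch under strong simplifying assumptions: random placeholder data stand in for the 16 hours of recordings, generic vectors stand in for the GPT-derived semantic features, and hemodynamic delays are ignored. It shows only the core idea of a regularized linear map from stimulus features to voxel responses.

```python
# A minimal sketch of a voxel-wise encoding model in the spirit of the cited
# study: ridge regression from word-level semantic features to BOLD responses.
# All data below are random placeholders, not real recordings.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Placeholder dimensions: fMRI volumes, semantic feature dims, cortical voxels.
n_timepoints, n_features, n_voxels = 1000, 64, 200
X = rng.standard_normal((n_timepoints, n_features))   # semantic stimulus features
true_W = rng.standard_normal((n_features, n_voxels))
Y = X @ true_W + 0.5 * rng.standard_normal((n_timepoints, n_voxels))  # BOLD

# Fit one regularized linear map per voxel (ridge handles correlated features).
encoder = Ridge(alpha=10.0).fit(X, Y)

def predict_bold(features: np.ndarray) -> np.ndarray:
    """Predict the brain response a candidate word sequence would evoke."""
    return encoder.predict(features)

# Decoding then scores candidate sequences by how well their predicted
# response matches the recorded one (see the beam-search sketch below).
recorded, candidate = Y[:1], X[:1]
score = -np.linalg.norm(predict_bold(candidate) - recorded)
print(f"similarity score: {score:.2f}")
```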

The encoding model predicts with high accuracy, for an arbitrary sequence of words, how the subject's brain would respond to hearing that sequence. In theory, the most likely stimulus words can therefore be identified by comparing recorded brain responses against the model's predictions for each possible word sequence. For completeness, two studies need to be mentioned on whose basis these theoretical starting points were set. The first, Bayesian reconstruction of natural images from human brain activity (2009),[14] demonstrates the use of a Bayesian decoder that uses fMRI records from visual areas to reconstruct complex natural images. Such a decoder combines a structural encoding model and a semantic encoding model fitted to previously recorded data on the structure and semantic content of natural images. The second, Reconstructing visual experiences from brain activity evoked by natural movies (2011),[15] explores a decoder that reconstructs movies by capturing the spatiotemporal structure of the watched content. These results show that dynamic brain activity measured under natural conditions can be decoded using current fMRI technology.
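
Schematically, both studies share the same Bayesian decoding logic, which I restate here in its standard textbook form (my paraphrase, not an equation copied from the papers): the encoding model supplies the likelihood of the recorded response R given a candidate stimulus S, and a prior over natural stimuli constrains the search.

```latex
\hat{S} \;=\; \arg\max_{S}\, P(S \mid R) \;=\; \arg\max_{S}\, P(R \mid S)\, P(S)
```

In the language decoder discussed below, the role of P(S) is played by the generative language model, and that of P(R | S) by the encoding model.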

The main problem for the decoder is that, as word sequences grow, most candidate sequences cease to resemble natural language. To limit this problem and keep candidate sequences in intelligible English, a generative neural-network language model was used,[16] trained on a large dataset of natural English word sequences. Given any word sequence, the language model predicts the words that could plausibly follow.

However, it was computationally infeasible to generate and evaluate all candidate sequences, so a beam search algorithm was used, which generates candidate sequences word by word.[17] When new words are detected from brain activity in auditory and speech areas, the language model proposes continuations for each candidate sequence, using the previously decoded words as context. The encoding model then scores the likelihood that each continuation would have evoked the recorded cortical response, and the most probable set of continuations is retained for the next time step. This process continually approaches the most likely reconstruction of words and their connections at any given time.
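
The following sketch shows this decoding loop in miniature. The language-model and encoding-model calls are stubs of my own devising (the study uses GPT-1 and the voxel-wise encoding model); only the beam-search bookkeeping is meant literally.

```python
# A minimal sketch of the beam-search decoding loop described above.
# lm_continuations and encoding_model_score are hypothetical stubs.
import heapq

BEAM_WIDTH = 3

def lm_continuations(sequence: list[str]) -> list[str]:
    # Stub: a real language model proposes likely next words given context.
    return ["went", "said", "looked"]

def encoding_model_score(sequence: list[str], recorded_response) -> float:
    # Stub: a real scorer compares the encoding model's predicted BOLD
    # response for this candidate against the recording (higher = better).
    return (hash(" ".join(sequence)) % 100) / 100.0

def beam_search_decode(recorded_responses: list) -> list[str]:
    beam: list[tuple[float, list[str]]] = [(0.0, [])]
    for response in recorded_responses:  # one step per detected new word
        candidates = []
        for score, seq in beam:
            for word in lm_continuations(seq):
                new_seq = seq + [word]
                candidates.append(
                    (score + encoding_model_score(new_seq, response), new_seq)
                )
        # Keep only the most probable continuations for the next time step.
        beam = heapq.nlargest(BEAM_WIDTH, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]

print(beam_search_decode(recorded_responses=[None] * 5))
```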

Figure 2. Schematic of the language decoder: language reconstruction, semantic analysis, and subsequent prediction of words and phrases, from the study Semantic reconstruction of continuous language from non-invasive brain recordings.[2] BOLD fMRI responses were recorded while 3 subjects listened to 16 hours of narrative stories. From these, an encoding model was estimated for each subject, predicting brain responses from the semantic properties of the word stimuli. During language reconstruction from new brain recordings, a set of candidate word sequences is maintained. When new words are detected, the language model (LM) proposes a continuation for each sequence, and the encoding model scores the likelihood of the recorded brain responses under each continuation. The most probable continuations are preserved. Segments from four test stories are displayed alongside the decoder's predictions for each studied subject. Examples were manually selected and annotated to demonstrate typical decoder behavior. The decoder reproduces some words and phrases exactly and captures the gist of many others.

Source: Archive of Tomáš Marušiak and https://doi.org/10.1038/s41593-023-01304-9 [2]  

Results

During the experiment, decoders were trained for 3 subjects, and each subject's decoder was separately evaluated on single-trial brain responses recorded via fMRI while the subject listened to new test stories not used in training. Because the decoder represents language through semantic features, it was expected to capture the meaning of stimuli rather than their exact wording; the results demonstrated that decoded word sequences captured not only the meaning but often exact words and phrases, showing that such fine-grained semantic information can be reconstructed from the BOLD signal. For comparison, decoded word sequences for one test story containing 1,839 words were evaluated using several language-similarity methods. Metrics such as word error rate (WER), BLEU, and METEOR, which measure the number of words shared by two sequences, were used for evaluation and comparison, while the BERTScore method, which compares contextual embeddings, was used to assess whether sequences share meaning. The testing also examined whether the decoded words capture the original meaning of the story, using a behavioral experiment which showed that subjects who read only the decoded words could answer nine out of 16 comprehension questions.
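
Of the metrics named above, WER is the simplest to state precisely: the word-level edit distance between the reference and the decoded text, normalized by reference length. The sketch below, with an invented example sentence, also shows why a meaning-preserving paraphrase scores poorly on WER, motivating meaning-aware metrics such as BERTScore.

```python
# Word Error Rate (WER): word-level edit distance (substitutions, insertions,
# deletions) between reference and hypothesis, divided by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# A paraphrase scores badly on WER even when the meaning is preserved.
print(wer("i went down the stairs", "i walked down the steps"))  # 0.4
```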

Cross-modal decoding

The language decoder can also reconstruct language descriptions from brain responses to non-linguistic tasks, since semantic representations may be shared between language perception and a range of other perceptual or conceptual processes.[18],[19],[20] This represents a shift from decoders and language models that focused mainly on motor or auditory signals. For testing purposes, an experiment was designed in which subjects watched four short films without sound while their responses were recorded and decoded using the semantic language decoder. The decoded word sequences were then compared with the films' spoken descriptions for the visually impaired. Qualitatively, the decoded sequences accurately described the events of the films, which suggests that a single semantic decoder can be applied across several different semantic tasks.

Discussion

The present study highlights that perceived or imagined stimuli can be decoded from the BOLD signal into coherent language, which represents an important step toward the application of a non-invasive brain-computer interface. Previous decoders either relied on invasive recordings or recovered only coarse semantic information.[21] The results of this study prove that data obtained through BOLD can also be read at the granularity of words and phrases. The decoder's concept is built on two key innovations that handle the combinatorial structure of language: an autoregressive prior,[22] which can be used to generate new sequences, and a beam search algorithm, which can be used to efficiently search for the most suitable sequences. Both innovations allow the decoding of structured sequential information from relatively slow, delayed brain signals. Most existing language decoders map brain activity to explicit motor or auditory features, or evaluate data from specific areas during attempted speech. The present decoder represents language through semantic features, primarily using data from areas that encode semantic representations during language perception. Semantic representations can also be examined during the experiment in the form of inner speech. It is important to emphasize that semantic representations are shared between language and the course of other cognitive tasks. Based on the analyses, decoders trained during language perception, as well as some aimed at other cognitive tasks, can be used to decode semantic representations. Such transfer between tasks could enable new decoding applications; the authors believe it could also be used for decoding covert speech, limiting the need to collect separate training data for different decoder applications.

The current decoder can reconstruct language stimuli relatively successfully. However, it often fails to find the exact matching words (WER 0.92-0.94 for the perceived-speech test story). This WER on novel stimuli is comparable to the performance of existing invasive decoders, so loss of specificity is not unique to non-invasive decoding.[23] In the current decoder, loss of specificity occurs when different word sequences with similar meanings share the same or similar semantic features, which results in paraphrasing of the actual stimulus. Motor features, by contrast, can differentiate the actual stimulus from its paraphrases, as they are tied directly to the form of the stimulus, and they offer the user greater control over the decoder's output. Decoder performance could therefore be improved by models that combine semantic and motor features.

Supplementary recording methods such as electroencephalography (EEG) or magnetoencephalography (MEG), which capture precise timing information not captured by fMRI, can also be used. Regarding this use, I draw attention to the study Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech (2018).[24] The study addresses the gap in electrophysiological evidence on natural language comprehension: people can usually understand 120-200 words per minute, so comprehension must involve fast, efficient neural mechanisms that process word meanings.

The study's authors pointed out significant limitations of fMRI, primarily its lack of mobility and the financial burden of experiments. Mobile technologies, including functional near-infrared spectroscopy (fNIRS), measure hemodynamic activity like fMRI, albeit at lower spatial resolution. To compare the neuroimaging methods, the fMRI data were tested against the estimated spatial resolution of current fNIRS systems.[25],[26] It was found that about 50% of stimulus time points remain decodable. This result opens a real discussion that the current decoding approach has the potential to be applied to portable systems.

The authors' analysis of the protection and ethics of studied subjects demonstrates that the subject, and the willingness of their cooperation, is essential for training and using the decoder. In the future, decoders may be expected not to require personalized data from the studied subject. For now, however, the decoder's predictions remain inaccurate, and without the personalized cooperation of the studied subject they could be misinterpreted or misused. This is reason for increased caution in protecting studied subjects and a warning of possible risks during experiments.[27]

DEEP IMAGE RECONSTRUCTION

Deep image reconstruction (DIR), that is, image reconstruction from records of human cortical activity, is a progressive multidisciplinary field of research aiming to reconstruct experienced visual content from records of measurable activity, such as blood oxygenation or electrical signals generated in the human brain. Advanced machine learning techniques are used to analyze these records and to generate visual objects, such as images or image sequences, from the recorded activity. The fundamental principle of this approach is that when a person looks at a picture, their brain generates specific patterns of neural activity that can be measured using non-invasive methods such as functional magnetic resonance imaging (fMRI) or electroencephalography (EEG). By recording these brain activities as a person looks at various images, researchers can create a dataset that links patterns of brain activity with the corresponding visual stimuli. Procedures utilizing machine learning (ML) are then proposed for the reconstruction.

The study Deep image reconstruction from human brain activity (2019)[28] operates on the premise that the mental contents of perception and imagination are encoded in hierarchical representations in the brain. Previous attempts to visualize perceptual contents, however, were not able to utilize multiple levels of the hierarchy. The proposed image reconstruction method optimizes the pixel values of an image so that its features in a deep neural network (DNN) become similar to those decoded from human brain activity across multiple layers. In this way, image reconstructions could be created from fMRI records that resemble the images the subjects were looking at. The same method was also applied to mental imagery and demonstrated rudimentary reconstructions of subjective content.

The study used the visual features of a DNN as a proxy for the hierarchical neural representations of the human visual system. It was found that patterns of brain activity measured using fMRI can be decoded (translated) into the response patterns of DNN units across multiple layers of a hierarchical visual model given the same input.[29] This demonstrated a high degree of correspondence between the hierarchical representations of the brain and those of the DNN.

Next, I present the methodical-technical approach to deep image reconstruction used to visualize perceptual content from human brain activity. The technique decodes DNN features from fMRI signals and generates an image by feature inversion.[30] Pixel values were iteratively optimized so that the DNN features of the current image approached the features decoded from brain activity across multiple DNN layers. The optimized image was then taken as the reconstruction of what was seen or imagined. As part of the experiment, a deep generator network (DGN) was applied to make the reconstructed images appear natural by performing the optimization in the input space of the DGN. This adaptation was derived from the study Synthesizing the preferred inputs for neurons in neural networks via deep generator networks (2016),[31] which in a sense laid the foundation for synthesizing naturalistic preferred inputs for neurons by optimizing in the latent space of a deep generator network.
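
The core optimization loop can be sketched as follows, under heavy simplification: a tiny untrained network stands in for the pretrained DNN, random features stand in for those decoded from fMRI, and the DGN prior is omitted. Only the idea of adjusting pixels to match target features is meant literally.

```python
# A minimal sketch of the iterative optimization at the core of deep image
# reconstruction: adjust pixel values so the image's DNN features match
# features decoded from brain activity. All networks and targets below are
# placeholders, not the cited study's pipeline.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder "feature extractor" standing in for layers of a pretrained DNN.
feature_net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
)

# Placeholder for features decoded from fMRI (in reality, predicted from
# brain activity by a trained decoder, one target per DNN layer).
with torch.no_grad():
    target_features = feature_net(torch.rand(1, 3, 32, 32))

# Start from noise and optimize pixel values directly.
image = torch.rand(1, 3, 32, 32, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(feature_net(image), target_features)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        image.clamp_(0.0, 1.0)  # keep pixels in a valid range

print(f"final feature-matching loss: {loss.item():.6f}")
```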

Figure 3. Illustration of the image reconstruction system based on visual perception

Source: Archive of Tomáš Marušiak

Discussion

The methodological principles of deep image reconstruction open speculative prospects for the use of EEG or fNIRS but, like Horikawa's recent behavioral study mapping emotional responses, do not yet provide an implementation framework for non-invasive, mobile approaches. The referenced study, The neural representation of visually evoked emotion is high-dimensional, categorical, and distributed across transmodal brain regions (2020),[32] mapped the emotional responses of people watching 2,185 videos. It was found that reported experience covers a high-dimensional space of some 27 distinct emotion categories, and that emotion categories, more than affective dimensions, organize reports of subjective experience. Such results support the emerging theory of a high-dimensional emotion space with a neural foundation distributed across transmodal areas. Implementing deep image reconstruction together with the theory of high-dimensional emotion space, and realizing it through non-invasive, mobile approaches, could broaden the field of the research project.

OUTLINE OF PILOT EXPERIMENT PROPOSALS

The current state of scientific and technical knowledge, especially in the field delimited by the research space at the intersections of neuroscience, aesthetics, artificial intelligence, computer science, and artistic research, already allows some possibility of reading the thoughts or aesthetic experiences of a studied subject. The sections Semantic Reconstruction of Continuous Language and Deep Image Reconstruction of this contribution are dedicated to this problem. The outline of pilot project-experiment proposals rests on two pillars.

The first pillar includes neuroimaging tools using AI as significant factors contributing to the research project Aesthetics of Artificial Intelligence Art. The second pillar involves questions arising from the formulation of an ideal AIARTWORLD model, which includes minimizing or eliminating human intervention in the production of art while maximizing the possibility of reading the aesthetic experiences and expectations of the recipient, shifting them into the position of a passive perceiver. It can be assumed that the path of neuroimaging tools and artistic research is suited to testing the possibility of accepting a hypothetical change in the position of the human actor in the world, or on the battlefield, of the AIARTWORLD field. The research problem will likely be formulated not to ask whether the ideal AIARTWORLD model is feasible, but how the subjective or societal interpretation of this model will be accepted. The experiment proposals as artistic research should elicit a reaction of acceptance from artists or creative individuals.

The proposals contain latent questions that are necessary for the realization of an ideal AIARTWORLD model: Is it possible for a person to accept a work created by a non-human? Can a person accept the myth of a non-human author? What is the difference in perception between natural mythology (created by humans and materially unprovable) and AI mythology (materially provable, coded, etc.)?

The artistic-research experiment proposals cautiously formulate paths toward the question of how to view the creation of AI mythology. AI mythology refers to myths, misconceptions, and exaggerated beliefs about artificial intelligence (AI). As AI already permeates human space, it has triggered a plethora of myths and misconceptions that create unrealistic expectations or unfounded fears: that AI will replace all human jobs, that AI will surpass human intelligence and rule the world, and that artificial intelligence is infallible and impartial.

The second supporting pillar stems from the study of contemporary artistic practice, on the basis of which artistic strategies with the potential for deriving experiments were identified. The selection was made by empirical knowledge alone and took into account the stated criteria, which lie at numerous intersections between the theoretical and experimental positions. One of the research problems also emphasizes the development of artificial consciousness in AI. In this context, I select for deriving future experiments works that have a strong infra-experimental element in the form of artistic research and that, at the same time, operate with consciousness or a speculative model of it transferable to AI.

EXPERIMENTAL PROJECTS

Category of Experiments No.1: Internal Speech

Scientific-Technical Foundations:

The semantic language reconstruction decoder primarily utilizes semantic language properties, making use of data from brain regions that encode semantic representations during language perception. Semantic representations can also be investigated during the trial in the form of internal speech. It is important to emphasize that semantic representations are shared between language and other cognitive tasks. Based on the analyses, decoders trained during language perception, and some aimed at performing other cognitive tasks, can be used to decode semantic representations. Task transfer could enable new decoding applications; the authors believe it could also be used for decoding covert speech, reducing the need to collect separate training data for various decoder applications. On these grounds, future experiments can be designed to monitor pain or empathy and to attempt to decode thoughts. The semantic language reconstruction decoder uses a structural encoding model and a semantic encoding model for previously recorded data about the structure and semantic content of natural images. It also enables movie reconstruction by capturing the spatiotemporal structure of the observed content. These results from various experiments suggest that dynamic brain activity measured in natural conditions can be decoded using current fMRI and fNIRS technology.

The experiment proposals do not include detailed formulations of conditions, such as language restrictions (currently, only English-language resources are available) or process and evaluation conditions.

Lauren McCarthy [33] is an artist and programmer who also teaches at the University of California, Los Angeles (UCLA). In her work, she explores the social and emotional consequences of technology; it can be said that she blurs the boundaries between art, technology, and everyday life, invites the audience to participate, and questions the norms of human-computer interaction. Her artistic research focuses on topics such as surveillance, automation, and the impact of digital technologies on social dynamics. As a starting point, I used McCarthy's project The Changing Room [34] from 2019, an AI-controlled installation that influences your feelings: participants choose one of more than 200 emotions, and the algorithm responds, inducing the chosen feeling simultaneously in everyone in the room. The installation is a software platform that can be reconfigured to any space and network. McCarthy's installation project Unlearning Language [35] from 2022 is based on the principle of people cooperating to find new ways of communication that are not detectable by AI algorithms.

Set of Experiments No.1: Derivations from "The Changing Room"

In the project The Changing Room, events within the space are directed by an intelligence that also manages the emotions of the participants. When a participant enters the AI-controlled space, it asks them: "How do you want to feel?" After one of more than 200 emotions is selected, the algorithm reacts and induces the chosen feeling both in the participant who selected it and in every other person in the space. The AI fully inhabits the space and influences visitors in different areas:

An area leading participants in a meditative and memory sequence

An area inviting to chat conversation, where the algorithm replaces words and phrases during transmission, subtly altering the content and tone of the message

An area engaging participants in mutual conversation, while prompts and instructions discreetly intervene and guide them.

As visitors walk through the corridors, they are subconsciously saturated with a multitude of images, text, and contextual advertisements. Four chairs are arranged into cubicles where participants sit facing their reflections, while provocations appear over their faces.

The algorithm's voice is designed to understand and create emotions. According to Lauren McCarthy, her aim is to become an algorithm: to scan open-source images, train on endless online data, and generate text and speech.

Proposal No.1: "Becoming an Algorithm"

The first proposed experiment is analogous to the detection of internal speech. The subject under investigation, an artist capable of conducting internal dialogue, is introduced to the project's framework only in the outlines of its strategy. His task will be creative diffusion: elaborating, in his inner imagination, stimuli derived from The Changing Room. Subsequently, under controlled conditions, it will be reconstructed how his inner voice completed the concept. The aim of the experiment is to delineate the possible influence of the inner voice on creation and how the process can be conceptualized on the basis of subconscious processes, for the purpose of creating a personalized AIART strategy.

Proposal No.2: "Refusal to Become an Algorithm"

The subject under investigation is an artist whose psychological profile outlines a confident person bordering on megalomania. His task will be to discuss improvements to the proposed strategy during control consultations. In a controlled state, he will be prompted, either when his "brain creative centers" activate or at random, to express his current idea through drawing or another form of expression. After his explanation, the match between his idea and the reconstruction will be either random or actually confirmed. The aim of the experiment is to delineate the possible influence of the inner voice on creation, even when creation has been taken over by artificial intelligence, and to conceptualize subconscious processes for the purpose of creating a personalized AIART strategy. There is also an assumption that successful results of creative processes can foster belief in the power of myth, thereby encouraging acceptance of alleged causations that are in fact correlations.

Set of Experiments No.2: Derivations from "Unlearning Language"

Unlearning Language (2022) is an interactive installation in which a group of participants is guided by AI with the aim of training people to be less like machines. When participants communicate with each other, they are detected (using speech detection, gesture recognition, and facial-expression detection), and the AI intervenes with light, sound, and vibrations. The group must cooperate to find new ways of communication that the AI algorithm cannot detect. Communication can contain expressive elements such as clapping, buzzing, or modifications to speech speed, pitch, or pronunciation. Through this playful experimentation, people reveal their most human characteristics, those that distinguish us from machines, and begin to imagine a future in which human speech communication is the priority.

Proposal No.1: "Boundaries of Another Language"

The subjects under study consist of a group of artists whose task, similar to Unlearning Language, is to create an expressive set that has no culturally readable dimension. For example, subject 1 invents a facial gesture that the inner voice names "I am happy". The gesture should be created so that facial-expression emotion recognition, which I also propose to apply,[36] cannot recognize it. With regular training of the gestures involved in, for example, "Today is a beautiful day and I am happy", the human brain should learn to recognize the given sentence immediately upon perceiving these gestures; this is a generally known fact of learning any human-teachable language. The research question is whether such gestures will match among several participants. It may be suitable to operate from the assumption that if the processes, i.e., tactics and strategies, are universal at least for a certain group, then these processes are teachable to AI.

Category of Experiments No.2: Deep Image Reconstruction

Proposal No.1: "Repetition via fNIRS"

The inspiring starting point was the already conducted experiments by Japanese neuroscientists from Kamitani Laboratory,[37] ATR Computational Neuroscience Laboratories, and Kyoto University, who used AI to reconstruct images from human brain activity. It is thus a matter of already proven experimental realization: an attempt to reconstruct an object visualized in the mind of the examined subject-participant from fMRI scan records, using deep neural networks. Such a process should reliably allow "reading the mind-image" and generating mental images via AI.

In the experiment[38] carried out in collaboration with Kamitani Lab, participants were asked to imagine specific images of a future in which animals and humans are growing closer, and their brains were then scanned using fMRI. Pierre Huyghe, the artist and ideational creator of the UUmwelt project, which included the aforementioned experiment, selected 20 to 30 images along with descriptions (the images and descriptions have never been published and remain secret). Each image corresponded to one of three taxa: animal, human, or machine. The images and descriptions were handed over to a member of the laboratory team to memorize. The fMRI recording process then began, and the subject was asked to imagine the relevant images and recall their descriptions. The generated images were processed using deep image reconstruction (DIR).

I propose to test the derived experiment via fNIRS with various taxa variations and descriptions from different artists. Such an experiment aims to verify the potential of fNIRS technology. By expanding the multimodal approaches available to Socratic models, it will be possible to broaden input resources in the aesthetic triad more objectively.

CONCLUSION

It can be assumed that the inclusion of semantic language reconstruction or deep image reconstruction will refine interpretations of how certain aesthetic processes emerge in the brain, processes identified by previous neuroaesthetic research through the implicit interpretation of, for example, aesthetic judgment, using neuroimaging technologies such as EEG, fMRI, and MEG. The key point is that if there is a scientific assumption that a certain cortical pattern arises in response to a certain stimulus, the response can now also be confirmed through semantic language reconstruction. Such a reconstruction is based on mapping the cortical activities of speech and language: inner speech as an expression of inner dialogue that analyzes and describes emotional or physical states; comprehension of what is heard and read; and the semantic and emotional perception of visual information mediated by sight, all grounded in cerebral activity. Artistic-research experiments have the potential to expand the boundaries of the subject under investigation at relatively low cost and to generate additional creative or experimental stimuli, whether through human- or AI-conducted analysis. Such an approach can also clarify the neuronal principles of artistic creation and its artificialization.

Based on the aforementioned starting points, I consider it beneficial that the view on the use of Socratic models can be expanded for the purposes of the research project. Such an expansion is also associated with the challenge of creating a wide discursive field and opening possibilities for the realization of open projects, meaning all experiment concepts that will not be feasible through our own or mediated research infrastructure due to time, material, or other technical obstacles. At this stage, it is necessary to reckon with feasibility risks, which is why I propose publishing the research projects with the possibility of open implementation by other research teams. The conditions for implementation by third parties, as well as the legal regulation of using an experiment concept as a work subject to special protection, will form part of the publication of the experiment concepts.

REFERENCES:

[1] CHANG, Edward Y. Prompting Large Language Models With the Socratic Method. In: 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC) [online]. IEEE, 2023, pp. 0351-0360 [accessed 2023-05-06]. ISBN 979-8-3503-3286-5. Available at: doi:10.1109/CCWC57344.2023.10099179

[2] TANG, Jerry, Amanda LEBEL, Shailee JAIN and Alexander G. HUTH. Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience [online]. 2023, 26(5), 858-866 [accessed 2023-06-20]. ISSN 1097-6256. Available at: doi:10.1038/s41593-023-01304-9

[3] HORIKAWA, Tomoyasu and Yukiyasu KAMITANI. Attention modulates neural representation to render reconstructions according to subjective appearance. Communications Biology [online]. 2022, 5(1) [accessed 2023-05-29]. ISSN 2399-3642. Available at: doi:10.1038/s42003-021-02975-5

[4] PASLEY, Brian N., Stephen V. DAVID, Nima MESGARANI, et al. Reconstructing Speech from Human Auditory Cortex. PLoS Biology [online]. 2012, 10(1). ISSN 1545-7885. Available at: doi:10.1371/journal.pbio.1001251

[5] ANUMANCHIPALLI, Gopala K., Josh CHARTIER and Edward F. CHANG. Speech synthesis from neural decoding of spoken sentences. Nature [online]. 2019, 568(7753), 493-498. ISSN 0028-0836. Available at: doi:10.1038/s41586-019-1119-1

[6] WILLETT, Francis R., Donald T. AVANSINO, Leigh R. HOCHBERG, Jaimie M. HENDERSON and Krishna V. SHENOY. High-performance brain-to-text communication via handwriting. Nature [online]. 2021, 593(7858), 249-254. ISSN 0028-0836. Available at: doi:10.1038/s41586-021-03506-2

[7] MOSES, David A., Sean L. METZGER, Jessie R. LIU, et al. Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria. New England Journal of Medicine [online]. 2021, 385(3), 217-227 [accessed 2023-06-03]. ISSN 0028-4793. Available at: doi:10.1056/NEJMoa2027540

[8] HUTH, Alexander G., Wendy A. DE HEER, Thomas L. GRIFFITHS, Frédéric E. THEUNISSEN and Jack L. GALLANT. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature [online]. 2016, 532(7600), 453-458 [accessed 2023-06-03]. ISSN 0028-0836. Available at: doi:10.1038/nature17637

[9] HUTH, Alexander G., Wendy A. DE HEER, Thomas L. GRIFFITHS, Frédéric E. THEUNISSEN and Jack L. GALLANT. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature [online]. 2016, 532(7600), 457. ISSN 0028-0836. Available at: doi:10.1038/nature17637

[10] LANGENDOEN, D. Terence and John LYONS. Natural Language and Universal Grammar. Language [online]. 1993, 69(4). ISSN 0097-8507. Available at: doi:10.2307/416893

[11] CRYSTAL, Thomas H. and Arthur S. HOUSE. Articulation rate and the duration of syllables and stress groups in connected speech. The Journal of the Acoustical Society of America [online]. 1990, 88(1), 101-112 [accessed 2023-06-18]. ISSN 0001-4966. Available at: doi:10.1121/1.399955

[12] LIBERMAN, A. M., F. S. COOPER, D. P. SHANKWEILER and M. STUDDERT-KENNEDY. Perception of the speech code. Psychological Review [online]. 1967, 74(6), 431-461 [accessed 2023-06-18]. ISSN 1939-1471. Available at: doi:10.1037/h0020279

[13] CAUCHETEUX, Charlotte and Jean-Rémi KING. Brains and algorithms partially converge in natural language processing. Communications Biology [online]. 2022, 5(1) [accessed 2023-06-04]. ISSN 2399-3642. Available at: doi:10.1038/s42003-022-03036-1

[14] NASELARIS, Thomas, Ryan J. PRENGER, Kendrick N. KAY, Michael OLIVER and Jack L. GALLANT. Bayesian Reconstruction of Natural Images from Human Brain Activity. Neuron [online]. 2009, 63(6), 902-915 [accessed 2023-06-04]. ISSN 0896-6273. Available at: doi:10.1016/j.neuron.2009.09.006

[15] NISHIMOTO, Shinji, An T. VU, Thomas NASELARIS, Yuval BENJAMINI, Bin YU and Jack L. GALLANT. Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies. Current Biology [online]. 2011, 21(19), 1641-1646 [accessed 2023-06-04]. ISSN 0960-9822. Available at: doi:10.1016/j.cub.2011.08.031

[16] RADFORD, Alec, Karthik NARASIMHAN, Tim SALIMANS and Ilya SUTSKEVER. Improving language understanding by generative pre-training. In: Cdn.openai.com [online]. 2018 [accessed 2023-06-04]. Available at: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

[17] TILLMANN, Christoph and Hermann NEY. Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation. Computational Linguistics [online]. 2003, 29(1), 97-133 [accessed 2023-06-04]. ISSN 0891-2017. Available at: doi:10.1162/089120103321337458

[18] BINDER, Jeffrey R. and Rutvik H. DESAI. The neurobiology of semantic memory. Trends in Cognitive Sciences [online]. 2011, 15(11), 527-536 [accessed 2023-06-18]. ISSN 1364-6613. Available at: doi:10.1016/j.tics.2011.10.001

[19] FAIRHALL, S. L. and A. CARAMAZZA. Brain Regions That Represent Amodal Conceptual Knowledge. Journal of Neuroscience [online]. 2013, 33(25), 10552-10558 [accessed 2023-06-18]. ISSN 0270-6474. Available at: doi:10.1523/JNEUROSCI.0051-13.2013

[20] POPHAM, Sara F., Alexander G. HUTH, Natalia Y. BILENKO, Fatma DENIZ, James S. GAO, Anwar O. NUNEZ-ELIZALDE and Jack L. GALLANT. Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nature Neuroscience [online]. 2021, 24(11), 1628-1636 [accessed 2023-06-18]. ISSN 1097-6256. Available at: doi:10.1038/s41593-021-00921-6

[21] MITCHELL, Tom M., Svetlana V. SHINKAREVA, Andrew CARLSON, Kai-Min CHANG, Vicente L. MALAVE, Robert A. MASON and Marcel Adam JUST. Predicting Human Brain Activity Associated with the Meanings of Nouns. Science [online]. 2008, 320(5880), 1191-1195. ISSN 0036-8075. Available at: doi:10.1126/science.1152876

[22] An autoregressive prior refers to autoregressive models, which assume that the value of a variable at a given point in time depends linearly on its past values.

[23] MAKIN, Joseph G., David A. MOSES and Edward F. CHANG. Machine translation of cortical activity to text with an encoder-decoder framework. Nature Neuroscience [online]. 2020, 23(4), 575-582. ISSN 1097-6256. Available at: doi:10.1038/s41593-020-0608-8

[24] BRODERICK, Michael P., Andrew J. ANDERSON, Giovanni M. DI LIBERTO, Michael J. CROSSE and Edmund C. LALOR. Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech. Current Biology [online]. 2018, 28(5), 803-809.e3. ISSN 0960-9822. Available at: doi:10.1016/j.cub.2018.01.080

[25] EGGEBRECHT, Adam T., Brian R. WHITE, Silvina L. FERRADAL, Chunxiao CHEN, Yuxuan ZHAN, Abraham Z. SNYDER, Hamid DEHGHANI and Joseph P. CULVER. A quantitative spatial comparison of high-density diffuse optical tomography and fMRI cortical mapping. NeuroImage [online]. 2012, 61(4), 1120-1128 [accessed 2023-06-18]. ISSN 1053-8119. Available at: doi:10.1016/j.neuroimage.2012.01.124

[26] WHITE, Brian R. and Joseph P. CULVER. Quantitative evaluation of high-density diffuse optical tomography: in vivo resolution and mapping performance. Journal of Biomedical Optics [online]. 2010, 15(02) [accessed 2023-06-18]. ISSN 1083-3668. Available at: doi:10.1117/1.3368999

[27] GOERING, Sara, Eran KLEIN, Laura SPECKER SULLIVAN, et al. Recommendations for Responsible Development and Application of Neurotechnologies. Neuroethics [online]. 2021, 14(3), 365-386. ISSN 1874-5490. Available at: doi:10.1007/s12152-021-09468-6

[28] SHEN, Guohua, Tomoyasu HORIKAWA, Kei MAJIMA, Yukiyasu KAMITANI and Jill O'REILLY. Deep image reconstruction from human brain activity. PLOS Computational Biology [online]. 2019, 15(1). ISSN 1553-7358. Available at: doi:10.1371/journal.pcbi.1006633

[29] HORIKAWA, Tomoyasu and Yukiyasu KAMITANI. Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications [online]. 2017, 8(1). ISSN 2041-1723. Available at: doi:10.1038/ncomms15037

[30] MAHENDRAN, Aravindh and Andrea VEDALDI. Understanding deep image representations by inverting them. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [online]. IEEE, 2015, pp. 5188-5196 [accessed 2023-06-05]. ISBN 978-1-4673-6964-0. Available at: doi:10.1109/CVPR.2015.7299155

[31] NGUYEN, Anh, Alexey DOSOVITSKIY, Jason YOSINSKI, Thomas BROX and Jeff CLUNE. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. 2016.

[32] HORIKAWA, Tomoyasu, Alan S. COWEN, Dacher KELTNER and Yukiyasu KAMITANI. The Neural Representation of Visually Evoked Emotion Is High-Dimensional, Categorical, and Distributed across Transmodal Brain Regions. IScience [online]. 2020, 23(5) [accessed 2023-06-18]. ISSN 2589-0042. Available at: doi:10.1016/j.isci.2020.101060

[33] MCCARTHY, Lauren. Lauren-mccarthy: SOMEONE [online]. 2022. Available at: https://lauren-mccarthy.com/

[34] MCCARTHY, Lauren. Lauren-mccarthy: SOMEONE [online]. 2022. Available at: https://lauren-mccarthy.com/The-Changing-Room

[35] MCCARTHY, Lauren. Lauren-mccarthy: SOMEONE [online]. 2022. Available at: https://lauren-mccarthy.com/Unlearning-Language

[36] ROBSON, Matthew, Romina PALERMO, Linda JEFFERY and Markus NEUMANN. Ensemble Coding of Face Identity in Congenital Prosopagnosia. Journal of Vision [online]. 2017, 17(10) [accessed 2022-03-09]. ISSN 1534-7362. Available at: doi:10.1167/17.10.624

[37] SHEN, G., T. HORIKAWA, K. MAJIMA and Y. KAMITANI. Deep image reconstruction from human brain activity. PLoS Computational Biology [online]. 2019, 15(1), e1006633 [accessed 2022-11-19]. ISSN 1553-7358. Available at: doi:10.1371/journal.pcbi.1006633

[38] HUYGHE, Pierre, Natalia GRABOWSKA, Melissa LARNER and Rebecca LEWIN, eds. Pierre Huyghe at the Serpentine. 1st ed. Köln: Walther König, 2020. ISBN 9783960987093.