= Glossary =

This page aims to define terms that are often used by the community and may not be immediately accessible to newcomers.

Asterism (⁂)
Typography

A group of three asterisks brought together as a single symbol, usually used to separate parts of a passage. Similar, but not identical, to a Dinkus.

Author’s Notes (A/N)
Text Injection, Feature

A type of Text Injection that is usually situated three newlines from the bottom of the context. It is traditionally used to hold information such as author, genre and style tags, but can be used for much more.

Banned Tokens
Generation, Feature

A token that the AI is not allowed to generate. When a banned token would have been chosen, the AI instead uses other tokens that had a chance to appear in the same context.
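
The idea can be sketched in a few lines of Python. This is a toy illustration, not NovelAI's actual sampling code: banned tokens are simply removed before the softmax, so their probability mass is redistributed across the remaining tokens.

```python
import math
import random

def sample_with_bans(logits, banned_ids):
    """Toy sketch of banned-token sampling: banned tokens get probability
    zero, and their mass shifts to the remaining tokens automatically."""
    masked = {t: l for t, l in logits.items() if t not in banned_ids}
    # softmax over the tokens that are still allowed
    mx = max(masked.values())
    exps = {t: math.exp(v - mx) for t, v in masked.items()}
    total = sum(exps.values())
    # weighted random choice over the renormalized distribution
    r = random.random()
    acc = 0.0
    for t, e in exps.items():
        acc += e / total
        if r <= acc:
            return t
    return t  # guard against floating-point rounding
```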

Brackets
Typography

Shorthand for Square Brackets: [ ]. They are part of the finetune data, where they are used to contain metadata about the text files, and they are traditionally used in Injected Text to separate it from story text.

Branch
Generation, Writing

A version of the story. When you redo, or undo then edit, you create a new branch. Think of it like time travel.

Calliope (model)
AI

A finetuned version of the GPT-Neo 2.7B model. It is inferior in creative performance to Sigurd, but very lightweight, and can run on cheaper, slower hardware. It was used by NovelAI during the Pre-Alpha and Alpha stages.

Cascading Activation
Injected Text, Generation

A flag that tells a Lorebook entry to also check other text injections (Memory, Author's Note) for its keys, rather than only the story text.
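
The activation check can be sketched roughly as follows. This is a hypothetical simplification, assuming keys are plain substrings (real entries may also use regex keys and other settings):

```python
def entry_is_active(keys, story_text, other_injections, cascading=False):
    # With cascading activation on, the entry's keys are searched in the
    # other injected text (Memory, Author's Note) as well as the story;
    # with it off, only the story text is searched.
    haystack = story_text
    if cascading:
        haystack += " " + " ".join(other_injections)
    return any(key in haystack for key in keys)
```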

Caveman Format
Writing, Technique

Writing while removing most grammatical words and keeping only words that carry semantic meaning. As an example, "The dog ran to the edge of the yard" would become "dog ran edge yard". It is used to reinforce connections between subjects and objects and to reduce token use, at the expense of writing quality.
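
A crude version of this transformation can be automated by filtering out common grammatical ("stop") words. The stopword list below is a small illustrative sample, not a canonical one:

```python
# Illustrative sketch: strip common grammatical words so only
# semantically loaded words remain.
STOPWORDS = {"the", "a", "an", "is", "was", "to", "of", "and", "in", "on", "at"}

def cavemanize(text):
    words = text.replace(",", "").replace(".", "").split()
    return " ".join(w for w in words if w.lower() not in STOPWORDS)

print(cavemanize("The dog ran to the edge of the yard."))  # → dog ran edge yard
```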

Context
Injected Text, Generation

All the text that the AI has in its memory before it attempts to generate more text. This is all the activated injected text, plus all the story text that can fit inside the context window.

Context Window
AI, Limit

GPT-J and GPT-Neo both share a limit of 2048 tokens, or roughly 9000 characters of naturally written Latin-alphabet text. This is the maximum memory that the AI can use. On the Tablet tier, the window is cut in half, to 1024 tokens.
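
A minimal sketch of how the window might be filled, under the assumption that injected text is reserved first and the remaining budget is filled with the most recent story tokens (the real insertion logic is more configurable):

```python
def build_context(injected, story, limit=2048):
    # Injected text (memory, author's note, lorebook entries) is
    # reserved first; whatever token budget remains is filled with
    # the most recent story tokens.
    budget = limit - len(injected)
    if budget <= 0:
        return injected[:limit]
    return injected + story[-budget:]
```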

Context Viewer
Injected Text, Feature

A tool that lets you visualize all text injections, and what the context window contains. This can help diagnose generation problems due to poor text injection settings.

Dataset
AI

A batch of text files, used for fine tuning or module training. It must be presented in a specific format for best results.

Dinkus (***)
Typography

A set of three asterisks in a row. Often used to separate large sections. Assumed to have a strong "break" effect between scenes with NovelAI's models.

End of Sampling (EOS)
AI, Generation

A marker used to tell the AI to stop generating. Generally, it is an uncommon symbol that would not appear unless the user 'trains' the AI to use it regularly. Useful for generating text similar to chat logs and other unorthodox forms, or simply to stop on sentence ending markers.
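The stopping behavior can be sketched as a simple loop. Here `next_token` is a hypothetical stand-in for one step of AI generation, and `~` is an arbitrary example marker:

```python
def generate_until_eos(next_token, eos="~", max_tokens=100):
    # Append tokens one at a time and stop as soon as the output
    # ends with the End of Sampling marker.
    out = ""
    for _ in range(max_tokens):
        out += next_token()
        if out.endswith(eos):
            break
    return out
```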

End of Text Token (<|endoftext|>)
AI

A token that was used to designate the end of a text file, so that the training routine of the AI could proceed to the next file. It can suddenly appear during generation; this is simply an artifact and can be removed.

Entry (or Lorebook Entry)
Text Injection

An entry for an element of your story, as part of its lorebook. It has text data, insertion settings, and keys that will activate it if present.

Ephemeral Context
Text Injection, Feature

A form of Text Injection that is only active for a number of actions, after which it disappears. Has a very specific syntax that no other feature uses.

Finetune
AI

If you consider the AI as a bunch of knobs and sliders, this is the act of adjusting them in order to make the AI's output more "fitting" for your purposes.

Flatten
Feature

An aggressive form of trimming. Eliminates all branches, which leaves the entire story as one solid block of text.

Format (or Lorebook Format)
Writing

A way to present the information to the AI. It can range from writing things as you would in a normal essay, to using formats similar to code, to many other different types.

Generation (or AI Generation)
Generation

After the AI receives the context window's data, it tries to continue the story from there. The text you receive from the AI is the Generation.

Generative Pre-trained Transformer (GPT)
AI

A neural network that takes the form of a large number of vector-space equations. Every token is a number representing a text fragment that appeared in the AI's training data. The fragments and their relationships are analyzed, and a network of "relationships" between all these fragments is created. The goal is to create a network that can replicate human language well enough to generate convincingly human-like text.

As it is pre-trained, it does not learn from its input; it only knows what it has been trained with. Training is an extremely expensive operation that requires rare and difficult-to-set-up hardware. These models also require powerful Graphics Processing Units, especially units equipped with Tensor Cores, which are ideal for vector-space math (like raytracing!). These GPUs are expensive and use a lot of power, which also makes running a GPT model costly.
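
The text-fragment idea can be illustrated with a toy tokenizer. This is not the real byte-pair-encoding tokenizer the models use; the vocabulary and the greedy longest-match rule are assumptions made for illustration only:

```python
# Toy illustration: text is split into known fragments, each mapped
# to a numeric token ID that the network actually operates on.
vocab = {"The": 0, " quick": 1, " brown": 2, " fox": 3}

def tokenize(text, vocab):
    ids = []
    while text:
        for frag in sorted(vocab, key=len, reverse=True):
            if text.startswith(frag):  # greedily take the longest match
                ids.append(vocab[frag])
                text = text[len(frag):]
                break
        else:
            raise ValueError("no fragment matches: " + repr(text[:10]))
    return ids

print(tokenize("The quick brown fox", vocab))  # → [0, 1, 2, 3]
```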

Other entries in this glossary: Inline Generation, Insertion, Lore Generator, Lorebook, Lorebook Keys, Loss (or Training Loss), Loss Graph, Memory, Memory Context, Model, Module (or AI Module), Module Training, Nucleus Sampling, Order of Insertion, Output, Perspective (or POV), Phrase Bias, Placeholder, Preamble, Prefix, Preset (or Generation Preset), Prompt, Prose, Randomness (or Temperature), Redo Tree, Repetition Penalty, Reserved Tokens, Retry (used as a noun or as a verb), Sampling, Scenario, Search Range, Sigurd (model), Stage, Step, Story, Story Context, Story Tag, Stream Responses, Subcontext, Suffix, Tail-Free Sampling, Text Adventure Module, Theme, Tier, Token, Token Budget, Token ID, Tokenization, Tokensafe Format, Top-K, Top-Kek, Top-P, Train (Model Training, not to be confused with Module Training), Trim.
