Memory & Context
This page explains what context limits are and how to estimate them.
On the subscription page, the tiers advertise "better memory". Here we'll explain what "memory" means, how much better each tier's memory is, and the technical details behind how it all works.
Each subscription is given a context limit. This is a finite limit on how much text the AI model will read.
Free users: ~12,000 tokens.
Green users: ~16,000 tokens.
Purple users: ~24,000 tokens.
Gold users: ~32,000 tokens.
You might say a Gold subscriber's memory is almost three times as large as a free user's, but because of the way context is allocated, that isn't quite accurate. In practice, a Gold subscriber's memory will actually feel about 5-6 times as large as a free user's.
Let's explore why.
A Cup of Water
When you press send in the chat, Xoul.AI quickly compiles all the text from all the features currently in use into a document called 'the prompt', and this document is then sent to the AI model to read. We'll explore what gets included in this document in the next section.
The placeholder variables {{char}} (the xoul) and {{user}} (the user) are also swapped for the relevant text during this stage.
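As a rough sketch of what that substitution step does (the real template syntax and replacement code are internal to Xoul.AI, so the function and names below are purely illustrative):

```python
def fill_placeholders(text: str, xoul_name: str, user_name: str) -> str:
    """Swap {{char}} and {{user}} for the actual names.
    Illustrative only; the real replacement code is internal to Xoul.AI."""
    return text.replace("{{char}}", xoul_name).replace("{{user}}", user_name)

print(fill_placeholders("{{char}} waves at {{user}}.", "Aria", "Sam"))
# Aria waves at Sam.
```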
When the model reads the document, it generates a continuation of the text it read. What it generates is based directly on what ended up being included in the document. The continuation is the model's "response": the "next reply" that appears within the chat.
This amount of context (how much text can be put into the prompt) is shared across all the features in use during a chat (e.g. Xoul, Persona, Scenario, Lorebooks, etc.). Once all of those text fields are populated into the document, the remaining amount is filled with the most recent replies from the chat. Once the total context limit is reached, no further replies can be included: the document is full.
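Conceptually, that packing step works something like the sketch below. The structure and the toy word-count "tokenizer" are assumptions for illustration; the real assembly logic is internal to Xoul.AI.

```python
def build_prompt(features, replies, limit, count_tokens):
    """Assemble the prompt: fixed feature text first, then as many of the
    most recent replies as still fit under the context limit.
    `replies` is ordered oldest-to-newest; we walk backwards so the
    newest replies are the ones that survive."""
    used = sum(count_tokens(f) for f in features)
    kept = []
    for reply in reversed(replies):         # newest first
        cost = count_tokens(reply)
        if used + cost > limit:
            break                           # the cup is full
        used += cost
        kept.append(reply)
    return features + list(reversed(kept))  # restore chronological order

# Toy example with a crude word-count "tokenizer":
count_tokens = lambda text: len(text.split())
print(build_prompt(["persona card"],
                   ["hi there", "how are you", "fine thanks"],
                   limit=6, count_tokens=count_tokens))
# ['persona card', 'fine thanks']
```

Note that the oldest replies are simply the first to be dropped once the limit is hit, which is exactly the overflow behavior described below.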
Think of it like a cup of water: there is a finite limit on the volume of liquid the cup can contain. When the cup fills, it overflows. You can keep pouring more liquid into the cup forever; you just can't keep all of that liquid inside the cup forever.

What the model knows each time you press send is determined by what is in the cup at that moment. The model will not know anything about what used to be in the cup.
Figuring out how many replies will be included is a complex question of:
How much context do I have?
How much context does each feature I am using take up?
This can, at best, be estimated.
Roughly how many replies will fit in the remaining context after the text from the features has been included?
This is an even bigger variable estimate, based on how long the replies are.
Without getting into the complication of how these amounts are estimated, let's just visualize the estimates.
Context Visualized
Free User Context

This represents the expected maximum use of the 12k context available to a free user.
These pie charts are ESTIMATES based on an expected MAXIMUM.
Expected maximum usage is UNLIKELY to accurately represent the amount of context the average user is using. Consider these the "worst case scenario" pie charts.
Xoul.AI's provided Response Styles are likely only 250-500 tokens.
Read the optional section below for more information.
The remaining context is what's important to you. It is the amount left over in the document that can be filled with chat replies.
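To see how the leftover space plays out across tiers, here's a back-of-the-envelope sketch. The 8,000-token feature overhead below is an assumed worst-case figure for illustration only, not an official Xoul.AI number:

```python
# Rough illustration of why the remaining context grows faster than the
# raw limits. The 8,000-token feature overhead is an ASSUMED worst-case
# figure for illustration, not an official Xoul.AI number.
FEATURE_OVERHEAD = 8_000

limits = {"Free": 12_000, "Green": 16_000, "Purple": 24_000, "Gold": 32_000}

for tier, limit in limits.items():
    remaining = limit - FEATURE_OVERHEAD
    print(f"{tier}: {remaining:,} tokens left over for chat replies")

# Under this assumption, Free keeps 4,000 tokens for replies while Gold
# keeps 24,000: a 6x difference, even though the raw limits differ by
# less than 3x.
```

This is why a Gold subscriber's memory can feel 5-6 times as large as a free user's even though the raw limit is less than three times bigger: the fixed features cost the same in every tier, so the bigger cup gains disproportionately more room for replies.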
Liquid Displacement
Have you ever ordered a cup of ice water from a restaurant, and they put a ton of ice in the cup, so you only got a few sips of water? Compared to a cup with less ice having a lot more water in it? It's the same principle with context!
The "remaining" pie slice represents how much liquid can go in the cup after you put in all the other features (the rocks).

Fewer rocks = more water. Less text inside the Xoul = more chat replies the model can read. This is what determines "how good the memory is", or at least how good it feels.
Why rocks? The model always has to read the Response Style, the Xoul, the Persona: this text permanently takes up a fixed amount of context. The chat replies, on the other hand, can keep being poured into the cup forever, with the oldest ones overflowing out. Because they come and go like this, they're often called temporary.
Limitations are Not Arbitrary:
"Can we have higher character limits in Personas?"
"Can we have higher character limits in Xouls?"
"Can we pull more than three Lorebook entries?"
Are you sure you actually want that?

Okay, so clearly what you really want is a bigger cup, right?
Can the models handle a bigger cup?
Just like there is a limit to how much you are capable of paying attention to at once, models have limits too!
How Many Replies?
That depends on how big the replies are!
These cups hold the same volume but contain a different number of objects, simply because the objects in one cup are smaller.

This chat bubble represents a reply with about 150 tokens of context. If all replies in your chat are about this length the document will include the most recent 27 replies.

However, this chat bubble represents a reply with 500 tokens of context. If all replies in your chat are about this length the document will include the most recent 8 replies.

In the end, how long each reply is doesn't really matter: the volume the cup can hold remains the same. It's the same amount of text being read, just spread across a different number of replies.
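The two bubble examples above boil down to simple division. Assuming roughly 4,100 tokens of remaining context (an illustrative figure consistent with the numbers above, not an official one), the reply counts fall out directly:

```python
def replies_that_fit(remaining_tokens: int, avg_reply_tokens: int) -> int:
    """How many same-sized replies fit in the leftover context."""
    return remaining_tokens // avg_reply_tokens

# Assumed leftover context after the features, for illustration only.
REMAINING = 4_100

print(replies_that_fit(REMAINING, 150))  # short replies -> 27
print(replies_that_fit(REMAINING, 500))  # long replies  -> 8
```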
Optional Additional Information
In Conclusion
Context is a big, variable math problem that requires a basic understanding of what a token is to really make sense of, which is a lot to learn and take in, especially when you're first starting to use AI models.
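If you want a quick feel for token counts without a real tokenizer, a common rule of thumb is that English text averages about four characters per token. This is only a crude approximation, not how any model actually tokenizes:

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: English text averages ~4 characters per token."""
    return max(1, len(text) // 4)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))
# 11
```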
The long and short of it is that this memory will feel different to every user, and every chat they do, but in general you can expect it to feel like the model remembers:
Free users: 8-50 replies. Expect an average of 15-20.
Green users: 16-78 replies. Expect an average of 25-30.
Purple users: 32-131 replies. Expect an average of 45-50.
Gold users: 45-185 replies. Expect an average of 65-70.
Make use of the Memories field in the settings panel, located within a chat, to keep relevant details in context and make these limitations more workable.