Basic Ren'Py TTS Accessibility Guide

The visual novel engine Ren'Py has built-in systems that let players use TTS (text-to-speech), making games more accessible for the visually impaired. However, the documentation glosses over some things & is kind of scattered all over the place, and many of the dev aspects of this functionality are pretty new. So, when I sat down to refine this for a couple of my own games, I decided to put together a little getting-started guide based on my experience so far!
As of writing this, I've only done this for a couple of simple, fairly short games, so your mileage with my advice may vary with something longer and more complex.
I won't expect you to know any special Python going into this guide, but I will expect you to know the basics of how Ren'Py works. So if, for some reason, you're looking into this first, please go through the basic tutorials the engine comes with.

Default behaviors

Self-voicing mode, activated by pressing the V key while playing, will always be a part of your game by default ~~(i'm sure there's probably a way to remove it, but that's too advanced for me, and also not what this guide is about)~~. However, it doesn't do much on its own.

Here's how it works before you, the programmer, do anything to your code to account for it, in the most recent version as of me writing this guide:

Reads off the name of the speaker (if applicable), pauses briefly, then reads off the dialogue, in whatever TTS voice the player has set in their system settings
Reads off currently highlighted options in menus (main menu, pause menu, buttons along bottom, in-game choices)
Waits for all text to be read before advancing to the next line (such as in auto-advance mode, though not skip mode, or when a line has the {nw} text tag)
Reads off error messages in their entirety if one is encountered (there's nothing you can do about this one, barring never ever having an error. Just warning you for when you run into one while playtesting this stuff)

Meanwhile, here's some things that it does not do for you:

Read names of speakers that are entirely symbols (e.g. question marks)
Read certain symbols at all (or, on the flip side, it may read other symbols that aren't meant to be read, e.g. hyphens in certain places)
Read dialogue that has the {fast} text tag
Acknowledge when an in-game choice menu comes up
Acknowledge images, minigames, or any other primarily visual elements in any way
Pronounce everything in a way that sounds right, consistently pause for exactly as long as you want, or otherwise meet your artistic vision perfectly

How to code everything

Your two main tools for improving the TTS experience of your game will most likely be the predefined alt character and the {alt} text tag. They're not the only ones, but they're the main ones.
I'm providing links to the official documentation where relevant, but this wouldn't be that much of a guide if I didn't try to teach you some of this stuff myself.

The alt character

This is a character that functions pretty much exactly the same as the narrator, except none of its dialogue is shown while running the game unless the player has self-voicing mode enabled. When it is enabled, the text is both displayed on-screen and read aloud. This will probably be where the majority of your image descriptions go.

 alt "The scene fades out, and is replaced by a background image of a winding path surrounded by pine trees, with a car off to the side in the foreground."

In fact, it behaves so much like the narrator that any special characteristics you define for the narrator will apply to the alt text as well! So, if there's stuff you want to apply to the narration but not the alt text (for example, the who_alt & what_alt properties I'll explain later), you should consider making a separate nameless character to use for narration instead. You could also try applying different stuff to the alt character itself, I guess. I haven't tried that myself, but since it's colored differently from other dev-defined characters in the code editor, I don't expect that to do much. Feel free to get in contact with me if you discover something in that regard.
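Here's a minimal sketch of that separate-narrator workaround. It assumes you want italic narration without your alt text also turning italic. The names `narr` and `cabin_scene` are ones I made up for this example. `what_italic` is just the usual `what_` prefix applied to the ordinary `italic` text style property.

```renpy
# Hypothetical nameless character used for narration instead of the
# built-in narrator, so the styling below doesn't carry over to alt.
define narr = Character(None, what_italic=True)

label cabin_scene:
    # Shown to everyone, in italics.
    narr "The wind howls outside the cabin."
    # Only shown and read when self-voicing is on; since the narrator
    # itself was left untouched, this should keep the default look.
    alt "A dim lantern hangs beside the cabin door."
    return
```

The catch is that you'd have to write your narration lines with `narr` in front of them instead of leaving them speakerless. Lines with no speaker still go to the regular narrator.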

A quick note added now that I've messed around with NVL mode: a monologue from the alt character that contains the {clear} tag will not limit that clear to self-voicing mode! It will clear the current text regardless. If you were thinking of incorporating those into alt text somehow, please bear this in mind.

The {alt} text tag

This is a text tag whose contents are not displayed on-screen but are read out by the TTS. It's useful for symbols that aren't naturally read by the TTS, for appending image descriptions to existing text, or for working in tandem with the {noalt} tag (displayed on-screen but not read by the TTS) to fine-tune how something is read.

 e "{noalt}‘Chyeah{/noalt}{alt}Yeah{/alt}, {i}exactly!{/i}"

e "There’s already {i}so{/i} much more I wanna tell you about, and{noalt}-{/noalt}{alt}...{/alt}"

What I said earlier about {nw} and auto-advance waiting for the TTS to finish reading before moving on still applies here, even if your {alt} text takes considerably longer to read than the on-screen text would! I did have some concerns about this at first, but I just needed to update my version of Ren'Py; this aspect was pretty new.
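Here's a sketch of two more ways to use {alt}. The first appends an image description to a line of narration, and the second swaps a symbol for words the TTS will actually say. `e` and `alt_tag_examples` are stand-ins defined just for this example. I'm also assuming the TTS would skip or garble the heart symbol, which may not hold for every voice.

```renpy
# Stand-in speaker for this example.
define e = Character("Eileen")

label alt_tag_examples:
    # Everyone sees the first sentence; only TTS users hear the description.
    # The leading space inside {alt} keeps the spoken sentences from running
    # together, and since {alt} text is never displayed, it won't show on-screen.
    "She hands you the letter.{alt} The envelope is sealed with red wax.{/alt}"

    # Hide the symbol from the TTS and give it words to read instead.
    e "Thanks for coming!{noalt}♥{/noalt}{alt} Heart.{/alt}"
    return
```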

The TTS substitutions config

You probably don't want to wrap every single instance of a particular odd pronunciation in your game in {noalt} and {alt} tags. (I almost had to with one of mine, as there was a pretty nasty bug with this when I first started doing this stuff, but luckily it's been fixed now!) config.tts_substitutions is a list you can put with your character/displayable/whatever definitions towards the top of your script to tell the TTS to pronounce certain terms a particular way every time they come up.

define config.tts_substitutions = [
    ("Bevelyne", "Bevelin"),
    ("Bichard", "Bitchard"),
    ("Klarl", "Klaarl"),
    ("Clawyde", "Clawwide")
]

The first term in each pair of parentheses is the term you want to change the pronunciation of, and the second is the desired pronunciation. I have no idea whether it's case-sensitive, so please feel free to let me know.
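Since I don't know the answer to the case question, here's a quick way to check it yourself. It's a throwaway label (the name `tts_case_check` is made up) that says one of the names from the list above in three different cases. With that substitution list in place, turn on self-voicing and listen for which spellings get the new pronunciation.

```renpy
# Throwaway test scene: assumes the config.tts_substitutions list above,
# which includes ("Klarl", "Klaarl"), is already defined.
label tts_case_check:
    "Klarl. klarl. KLARL."
    return
```

To hear it, you could temporarily put `call tts_case_check` at the top of your start label. The `return` sends you back once the line has been read.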

who_alt & what_alt

These are properties you apply to the characters you define to change how the TTS reads out their name (who_alt), or to add something to what they say (what_alt).

 define m = Character("Mascot", who_alt="The person in the costume")

define faken = Character(what_bold=True, what_color="#9441b6", what_alt="I think, [text]")

The [text] variable shown above marks where the written dialogue goes when using the what_alt property. Other variables, such as the ones for player-named characters, are still usable too.
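Here's a sketch that ties who_alt and what_alt to two things from earlier. One is a speaker whose name is only question marks, which the TTS won't read. The other is a player-named character's inner thoughts. All the names here (`mystery`, `thoughts`, `player_name`, `intro`) are hypothetical. The `renpy.input` line is just the usual way of asking the player for a name.

```renpy
# The on-screen name is just "???", which the TTS won't read,
# so who_alt gives it something to say instead.
define mystery = Character("???", who_alt="Unknown voice")

# Hypothetical player name, filled in at the start of the scene.
default player_name = "Sam"

# Inner-thoughts character: no on-screen name and italic text, with
# what_alt telling TTS users whose thoughts these are.
define thoughts = Character(None, what_italic=True, what_alt="[player_name] thinks, [text]")

label intro:
    $ player_name = renpy.input("What's your name?", default="Sam").strip() or "Sam"
    mystery "Who goes there?"
    thoughts "I hope it's not who I think it is."
    return
```

This works a lot like the faken example above. The italics carry the "this is a thought" meaning visually, and what_alt does the same job out loud.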

Others

There is even more stuff than this, if you want to get more advanced! For example, you can choose between different system TTS voices per platform, and there's more in the documentation that I honestly can't make heads or tails of yet. Again, this guide is more for basics, and to gather the documentation all in one place.
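If you do want to pick voices per platform, I believe the relevant setting is `config.tts_voice`, which takes a platform-specific voice name. Here's a sketch assuming that. The voice names are placeholders for whatever voices you've actually confirmed are installed, so treat them as examples, not recommendations.

```renpy
init python:
    # Placeholder voice names: what's available differs per OS and per machine.
    if renpy.windows:
        config.tts_voice = "Mark"
    elif renpy.macintosh:
        config.tts_voice = "Alex"
    elif renpy.linux:
        config.tts_voice = "english_rp"
```

As far as I can tell, setting this replaces the voice the player picked in their system settings. Leaving it alone is a perfectly fine default.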

Personal recommendations

This section is based solely on my personal preferences. I am not the leading expert, and it is ultimately up to you to decide how you want the TTS in your game to sound. But, if you happen to be lost on what to actually put in-between the code, I hope this will serve as a good enough starting point.

Anyone could decide to use self-voicing mode while playing a visual novel for any number of reasons. However, the main demographic it's intended for is the blind and visually impaired. So, ideally, you should make sure those players won't be entirely lost while playing with it. Since the engine doesn't tell them by default that they've reached a choice menu, you should add accompanying alt text that does (they could figure it out on their own from context clues or by just messing with the arrow keys, but it's a nice gesture). If you add very visual-heavy minigames that are hard to make accessible with the tools at your disposal, consider figuring out some sort of alternative (like a way to skip the minigame and convey whatever information it would've conveyed; I don't know, it's your game).
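Here's a sketch of both suggestions, using a simple fork in the road and a made-up mailroom minigame. Every label and screen name here is hypothetical. The self-voicing check uses `_preferences.self_voicing`, which I believe is where Ren'Py keeps that setting. Double-check it against your version before relying on it.

```renpy
label crossroads:
    "The path splits in two."
    # alt only shows up with self-voicing on, so sighted players
    # don't get an extra line to click through here.
    alt "A choice appears. Use the arrow keys to pick an option."
    menu:
        "Take the left path":
            jump mailroom
        "Take the right path":
            "You head right, toward the lake."
            return

# Placeholder minigame screen, standing in for something very visual.
screen mailroom_minigame():
    textbutton "Finish sorting the mail" action Return()

label mailroom:
    if _preferences.self_voicing:
        # Skip the visual minigame and convey what it would have shown.
        "You sort every letter into the right mailbox without a hitch."
    else:
        call screen mailroom_minigame
    "With the mail sorted, you head back outside."
    return
```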

Think of the TTS version of your visual novel as simply a novel, without the visuals. After all, besides sighted players who turn on self-voicing mode for their own reasons, it will more or less play like an interactive audiobook. Whatever effort you put into making it accessible is better than none at all, but bear in mind that the players using these features can get bored of your game just the same as anyone else.

Your game should still be just as enjoyable for the people who rely entirely on this feature to play as it is for the ones who don't! If you look at an image description you've written out, and it seems like something that would make you, personally, close a book, then you should probably change it. If it doesn't mesh with the rest of the writing, then you should probably change it. If you think a particular description would sound best when lumped in with some of the existing narration, then go for it! (Unless you have a ton of fully animated cutscenes, you're probably already describing some things in the non-TTS part of your game anyway; it doesn't hurt to tack things on to what you've already got to help clue the TTS players in.)

I've got plenty of experience going in-depth with alt text on this site, but I treat the alt text in my games quite differently. I'll describe plenty of fine details in my art on here, because on here that's the whole point, but in a visual novel, it's not the whole point. It's a game, not... y'know, a website.
So, I personally just go over what's generally being conveyed (what is this character doing? where is the scene taking place?), throw in some aspects I feel would stand out the most to sighted players (if it's a newly introduced character, what stands out about their design? if it's not a new character, is there anything different about their appearance?), then whatever details I feel are necessary to start reading between the lines (could this part of the picture have some deeper meaning, or is it simply there?), and then I move on. I don't mind reading some verbose descriptions every now and then, but I like to avoid bogging down my players too much, just to be safe. Things can still get pretty long when following these principles, but uh... well, hey, like I said, it's your game. It's all up to you; I'm just trying to make it a little easier to start.

Best of luck out there!