Voice & Communication
Launch voice-first agents across realtime chat, phone calls, SMS, and call-center workflows without stitching separate tools together.
Configure voice providers, phone behavior, SMS, and the settings that shape real-time communication.
One of the fastest ways to make an agent feel real is to let it talk. NetShow does not treat voice as a novelty. It treats voice, phone, and SMS as real communication channels your agent can use to help people, answer questions, capture leads, follow up, and stay available when you are not.
This guide walks through the practical setup path for voice, phone, and SMS. You do not need to turn everything on at once. In fact, the best setup usually happens in stages. First choose how the agent should speak. Then decide whether it should answer live calls, place outbound calls, or send text messages. Then tighten the conversation behavior and test the experience before putting it in front of real users.
If you want the safest starting point, begin with voice inside the main agent editor, then move into the phone agent flow only after you are happy with how the agent sounds.
Mrs. NetShow
Take this one step at a time. You do not need to fill every field perfectly on the first pass.
The best place to choose your speaking stack is the Voice & Realtime section inside the main agent editor. This section is separate from the general AI-brain area because text reasoning and spoken delivery are not always the same thing. Your agent can think with one provider and speak with another when that creates a better experience.
The first field to focus on is Voice Provider. NetShow currently supports OpenAI Realtime, Gemini Live, Grok Voice, Hybrid, and ElevenLabs-related behavior. If you are not sure what to choose, start with OpenAI Realtime. It is the most proven option for smooth, natural conversations. If you need stronger multilingual coverage or want more adaptive emotional dialogue, consider Gemini Live. If you want a more expressive, personality-rich style, Grok Voice is worth testing. If you love one provider for reasoning but want OpenAI for spoken delivery, use Hybrid.
Do not treat this as a purely technical choice. Ask what kind of experience the caller or user should have. Do you want calm and polished? Multilingual and flexible? Expressive and lively? Match the voice provider to the feeling you want, not just the brand name you recognize.
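The provider guidance above can be written down as a small decision helper. This is only a sketch of the reasoning, not NetShow code: the function name, the parameter names, and the order in which the checks run are all assumptions made for illustration. The provider names are the ones listed in this guide.

```python
from enum import Enum


class VoiceProvider(Enum):
    OPENAI_REALTIME = "OpenAI Realtime"
    GEMINI_LIVE = "Gemini Live"
    GROK_VOICE = "Grok Voice"
    HYBRID = "Hybrid"


def suggest_voice_provider(
    needs_multilingual: bool = False,
    wants_affective_dialog: bool = False,
    wants_expressive_personality: bool = False,
    keep_other_reasoning_provider: bool = False,
) -> VoiceProvider:
    """Return a starting-point provider for the experience you want.

    The priority order of the checks is an assumption, not a NetShow rule.
    """
    if keep_other_reasoning_provider:
        # One provider for reasoning, OpenAI for spoken delivery.
        return VoiceProvider.HYBRID
    if needs_multilingual or wants_affective_dialog:
        return VoiceProvider.GEMINI_LIVE
    if wants_expressive_personality:
        return VoiceProvider.GROK_VOICE
    # Not sure what to choose: start with the default recommendation.
    return VoiceProvider.OPENAI_REALTIME


print(suggest_voice_provider().value)
print(suggest_voice_provider(needs_multilingual=True).value)
```

Treat the result as a first candidate to listen to, not a final answer. The experience the caller should have still decides the choice.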
After the provider, look at the Realtime Model field. This controls the live conversation engine, not the regular text chat brain above. In most cases, the recommended realtime option is the right starting place. Change it only if you have a tested reason to do so.
Then choose the Realtime Voice. NetShow’s editor includes a broad set of labeled speaking voices, from neutral and balanced to warm, thoughtful, upbeat, or more authoritative. Pick one that matches the role of the agent.
Here is a simple way to decide:
Listen with the job in mind, not just your own personal taste. A voice can sound pleasant in isolation and still be wrong for the role.
Different voice providers expose different controls inside the editor. You do not need them all, but it helps to understand what they mean.
If you choose Gemini Live, you may see a language field, an affective-dialog toggle, and a camera-input toggle. Use the language field when one spoken language should be the default. Turn on affective dialog only when you want the system to respond more adaptively to emotional tone. Leave camera input off unless the conversation genuinely needs visual context.
If you choose Grok Voice, you may see expressive mode. This is the setting that supports more character and more expressive delivery. It can be a great fit for creator, social, or brand personality agents. It is usually too much for formal service roles.
If you choose Hybrid, NetShow shows the text provider and uses OpenAI for the spoken layer. This is the easiest way to keep one provider as the thinking brain while still getting a premium voice experience.
The voice section also includes controls for how the conversation behaves live. The most important are Turn Detection, VAD Threshold, and Silence Duration.
These settings decide when the agent thinks you are done speaking. If the agent jumps in too early, increase the silence duration or make the detection less sensitive. If it waits too long and the call feels awkward, make it slightly more responsive. For most users, the defaults are fine, so only change these after you have listened to a few real interactions.
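The tuning advice above amounts to nudging the silence duration in small steps after listening to real calls. The sketch below shows that loop under stated assumptions: the field name `silence_duration_ms`, the default of 500 ms, the 100 ms step, and the clamping range are hypothetical values chosen for illustration, not NetShow defaults.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TurnSettings:
    # Hypothetical field and default; check the editor for the real values.
    silence_duration_ms: int = 500


def adjust_silence(
    settings: TurnSettings,
    symptom: str,
    step_ms: int = 100,
    min_ms: int = 200,
    max_ms: int = 2000,
) -> TurnSettings:
    """Nudge silence duration one step based on what you heard in test calls."""
    if symptom == "ok":
        return settings
    if symptom == "interrupts_too_early":
        new_value = settings.silence_duration_ms + step_ms  # wait longer
    elif symptom == "waits_too_long":
        new_value = settings.silence_duration_ms - step_ms  # respond sooner
    else:
        raise ValueError(f"unknown symptom: {symptom!r}")
    # Keep the value inside a sane range so one bad call cannot push it far.
    clamped = max(min_ms, min(max_ms, new_value))
    return replace(settings, silence_duration_ms=clamped)


current = TurnSettings()
current = adjust_silence(current, "interrupts_too_early")
print(current.silence_duration_ms)  # 600
```

Changing one value by one small step per round of test calls makes it easier to tell which adjustment actually helped.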
The Audio Settings fields help with noise reduction, room conditions, and long-call management. If calls happen in offices, homes, or noisy environments, leave noise reduction on. If you are working with speakerphone or room audio, the far-field option often makes more sense than a close-mic option. The truncation setting helps keep long conversations from becoming unstable or overly expensive.
You may also see a Tool Namespace field. This controls which tools the voice session can use. Be conservative here. Give the voice channel access only to what it really needs.
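One way to picture the "be conservative" advice is as a filter over tool names. This guide does not describe the format of the Tool Namespace field, so everything below is an assumption: the `namespace.tool` naming scheme, the tool names, and the function are hypothetical and only illustrate the idea of exposing a narrow slice of tools to the voice channel.

```python
def tools_for_namespace(available_tools: list[str], namespace: str) -> list[str]:
    """Return only the tools whose name starts with '<namespace>.'.

    Assumes a hypothetical 'namespace.tool' naming scheme.
    """
    prefix = namespace + "."
    return [name for name in available_tools if name.startswith(prefix)]


# Hypothetical tool names, for illustration only.
all_tools = ["booking.create", "booking.cancel", "billing.refund", "crm.lookup"]

voice_tools = tools_for_namespace(all_tools, "booking")
print(voice_tools)  # ['booking.create', 'booking.cancel']
```

In this picture a booking agent on the phone never sees the refund tool, which is the point of keeping the namespace narrow.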
The phone-related fields in the voice section include options such as Phone Enabled, Phone Number, and Sideband WebSocket. These settings matter when you want the agent to handle actual calls, not just browser-based voice.
Do not enable phone behavior just because you think you might use it later. Turn it on when you have a real phone workflow ready to test. That keeps your setup cleaner and makes troubleshooting easier.
If you are using a Twilio number, make sure the number is assigned clearly to the right agent and that you know whether the agent will answer inbound calls, place outbound calls, or do both.
For deeper phone setup, NetShow includes a dedicated phone-agent flow under the Twilio surfaces. This is where you shape the live call experience more directly.
The phone setup begins by asking you to choose the Agent Type:
This matters because inbound and outbound conversations usually feel different. An inbound call agent needs to greet, orient, and react quickly. An outbound agent needs to introduce itself, respect the listener’s time, and move toward a specific goal.
After that, you choose a Template or start from a blank setup. Templates are helpful if you want a proven starting point. A blank setup is best when you already know the exact job and tone you want.
The phone-agent configuration form is designed around the actual call experience. The core fields you will see include:
The best way to fill these out is to imagine the first ten seconds of a real call. What language should the agent use? What name should it say out loud? Should it sound polished, warm, formal, or energetic? What number should route to it? If you cannot answer those questions clearly, pause and decide before moving on.
The Voice field often includes preview support. Use it. Never skip voice preview on a phone agent. A voice that sounds fine in theory can feel wrong immediately on a call.
After the basic phone setup, NetShow gives you a Customize Behaviours step. This is where you tell the phone agent what role it is playing during a conversation.
The key fields here include:
This is the heart of the phone setup. If the voice is how the agent sounds, this section is how it behaves.
The Goal field should be concrete. “Help people” is too vague. “Book qualified consultations for roofing inspections” is good. “Answer common patient scheduling questions and escalate urgent medical issues to staff” is good. “Take restaurant order calls, answer allergy questions from approved menu information, and hand off anything unusual” is good.
The Background field helps the voice feel grounded. Keep it short and practical. The Instructions field is where you define the actual call behavior: what to ask first, what information to collect, when to escalate, and what to avoid saying.
Only turn on Script when the call needs high consistency. Sales qualification, compliance-heavy intake, and tightly structured appointment booking are good candidates. More open-ended help conversations usually work better without a rigid script.
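Before saving the behavior fields, it can help to run a quick self-review. The sketch below models the Goal, Background, Instructions, and Script fields named above as a plain data class and flags the most common problems. The class, the list of vague phrases, and the five-word threshold are assumptions for illustration. NetShow does not necessarily perform any of these checks.

```python
from dataclasses import dataclass


@dataclass
class PhoneBehaviour:
    goal: str
    background: str = ""
    instructions: str = ""
    use_script: bool = False


# Hypothetical examples of goals that say nothing concrete.
VAGUE_GOALS = {"help people", "answer questions", "be helpful"}


def review_behaviour(cfg: PhoneBehaviour, min_goal_words: int = 5) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    goal = cfg.goal.strip().rstrip(".").lower()
    if goal in VAGUE_GOALS or len(goal.split()) < min_goal_words:
        warnings.append("Goal looks vague; name the outcome a successful call should reach.")
    if not cfg.instructions.strip():
        warnings.append("Instructions are empty; say what to ask first and when to escalate.")
    return warnings


draft = PhoneBehaviour(goal="Help people")
for warning in review_behaviour(draft):
    print(warning)
```

A word count is a crude proxy for concreteness. The real test is whether someone reading the goal could tell when a call has succeeded.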
Whether you are configuring QuickAgent, the main editor, or the dedicated Twilio flow, the opening line matters. The greeting should do three jobs quickly:
For example, an appointment-focused agent might open with: “Hi, this is Maya, the booking assistant for BrightLeaf Dental. I can help you schedule, reschedule, or answer common appointment questions.” That works because it is clear, specific, and calming.
Avoid greetings that are too long, too robotic, or too vague. A phone caller wants orientation fast.
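A rough check can catch greetings that drift too long or forget to say who is speaking. The sketch below looks for the agent name and the business name, as in the BrightLeaf Dental example, and counts words. The function and the 30-word limit are assumptions chosen for illustration, not NetShow rules.

```python
def review_greeting(
    greeting: str,
    agent_name: str,
    business_name: str,
    max_words: int = 30,
) -> list[str]:
    """Return a list of issues; an empty list means the greeting passed."""
    issues = []
    text = greeting.lower()
    if agent_name.lower() not in text:
        issues.append("Greeting does not say the agent's name.")
    if business_name.lower() not in text:
        issues.append("Greeting does not say which business is answering.")
    word_count = len(greeting.split())
    if word_count > max_words:
        issues.append(f"Greeting is {word_count} words; aim for {max_words} or fewer.")
    return issues


example = (
    "Hi, this is Maya, the booking assistant for BrightLeaf Dental. "
    "I can help you schedule, reschedule, or answer common appointment questions."
)
print(review_greeting(example, "Maya", "BrightLeaf Dental"))  # []
```

A passing result only means the greeting is short and identifies the speaker. Whether it sounds natural still has to be judged by listening to it on a real call.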
SMS configuration lives alongside the broader phone system, but the rules are slightly different because texting is a lower-friction, lower-patience channel.
An SMS agent should sound more concise than a phone agent. It should ask shorter questions, use fewer words, and make the next action obvious. If your phone prompt is long and layered, shorten it for text.
SMS is especially strong for follow-up, reminders, confirmations, and quick replies. It is a great match for lead follow-up, appointment reminders, scheduling nudges, and simple customer service flows. Use it when brevity helps.
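The brevity guidance for SMS can be turned into a quick pre-send check. The sketch below flags messages that run past 160 characters (the size of a single standard SMS segment in the basic GSM alphabet) or that ask more than one question. Both limits are used here as rules of thumb, and the function is a hypothetical helper, not part of NetShow.

```python
def review_sms(message: str, max_chars: int = 160, max_questions: int = 1) -> list[str]:
    """Return a list of brevity issues; an empty list means the text looks fine."""
    issues = []
    if len(message) > max_chars:
        issues.append(f"Message is {len(message)} characters; aim for {max_chars} or fewer.")
    question_count = message.count("?")
    if question_count > max_questions:
        issues.append(f"Message asks {question_count} questions; ask one at a time.")
    return issues


reminder = "Hi Sam, your cleaning is tomorrow at 3 PM. Reply C to confirm."
print(review_sms(reminder))  # []
```

If a phone prompt fails this check when reused as a text, that is usually a sign it should be split into a short first message and a follow-up.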
If you are enabling both voice and SMS, make sure the tone feels like the same agent across both channels. The wording does not need to be identical, but the personality should feel consistent.
Once your phone agent is live, the Call Center views become your operational dashboard. NetShow includes inbound and outbound call history views, settings, and call management pages. Think of this as the monitoring layer for the communication channel.
Use call history to review:
If the agent is underperforming, the fastest way to improve it is usually to review a few real calls, then adjust the goal, instructions, greeting, or handoff behavior. The call center is where the phone system becomes measurable instead of theoretical.
If this is your first phone agent, keep it simple:
That sequence gets you to a working first version without forcing you into every advanced telephony option immediately.
The first common mistake is making the greeting too long. Callers do not want a speech. They want orientation and forward motion.
The second is choosing a voice based on novelty instead of fit. A voice can sound fun in preview and still feel wrong for actual customer conversations.
The third is making the goal too vague. Phone agents do better when they know exactly what a successful call should accomplish.
The fourth is overusing scripts. Scripts are useful, but when they are too rigid the conversation stops feeling human.
The fifth is skipping real testing. Always make test calls. Listen for interruptions, awkward pauses, unclear intros, and weak handoff points.
If you want an easy rule of thumb:
Do not wait until every phone option is perfect. Launch the smallest useful version first. A voice or phone agent only needs a clear job, a believable voice, a good opening line, and a safe conversation path to start delivering value.
Once it is working, listen to real calls, review the call center, tighten the instructions, improve the greeting, and refine the tone. That is how strong phone agents are built in practice.