Voice Over IP for Linux Games?

Try these out by Anonymous Coward · 2001-06-11 06:53 · Score: 3

Voice over IP technologies are the same as those used for video conferencing, but with audio codecs only. The two VoIP/VideoConf standards for call setup and control are H.323 and SIP.

H.323 by Brett+Viren · 2001-06-11 07:22 · Score: 2

Coincidently I was just starting to mess with H.323 this weekend. With a direct net connection, one can use `ohphone' with out needing any other programs (except another one to talk to, of course). It's pretty simple to set up and use. If you are behind a NAT router things are trickier (I haven't gotten it to work yet). There are Debian packages for `ohphone' and a few other H.323 packages.

I was also able to use openmcu to provide rudimentary group conference services. This is also packaged for Debian.

BTW, with ohphone you can talk with windows Netmeeting users.

Re:VoIP in Java by peter · 2001-06-11 11:51 · Score: 2

You might want to use Vorbis. (see www.xiph.org) The encode/decode functions are in a library, so it should be easy to use. You could look at the source code for oggenc to see what library calls it makes. Check out the docs at http://www.xiph.org/ogg/vorbis/doc/vorbisenc/ (some of the links in the encoding library API ref are 404, it seems...).

Another reason for using vorbis is that it will keep Fraunhoffer from suing you for using an MP3 encoder.
#define X(x,y) x##y

--
#define X(x,y) x##y
Peter Cordes ; e-mail: X(peter@cordes , .ca)

VoIP in Java by Drake42 · 2001-06-11 06:52 · Score: 4

Hi All,

I wrote a trivial test app using the java sound API to make a VoIP program. It didn't implement any kind of standard, and it was completely insecure, but it worked after a relatively small amount of effort and it performed really well.

Java Sound passes just about everything through to the card so Java vs C didn't really come in to play much. All I did was decide that one machine was going to play server, and then everyone who connected to that machine got their byte streams mixed using the java Mixer and then sent back the mixed stream.

I'm up to my neck it projects right now, but if someone wanted to lead it up, I'd submit code and experience. Then we wouldn't have to worry about platform at all.

Jason

Re:You forget cost/profit analysis by Zach+Baker · 2001-06-11 13:14 · Score: 2

NB: I'm unrelated to the previous poster. I just thought I'd chime in with my own depressing take on things.

Excellent, I was hoping to get a reply from a developer.
1) Do you use Linux much? Have you used it for any game/non-game development? It is definitely capable of running games. Windows is only worthy of games because of the huge marketshare, not because it is a better gaming platform. It is also obvious that the Linux community wants games. Are you part of that community? Do you not agree?

I think you would be better off asking a major publisher these questions. Developers who would like to support Linux and can afford to often do so. Also, BeOS was an even better gaming platform than Linux or Windows, but look how far quality gets you in this race.

Also, porting to the Mac tends to be an afterthought. As in, let's worry about making the "real game" (on the primary platform) as freaknasty as possible, and think about possibly offloading the code on a Mac developer later.

And finally, regarding developers that wish they could be working in Linux, I presume you weren't referring to artists, level designers, testers or producers. ;^) Sorry to be brutally honest, but hey, remember the PS2 does run Linux kick-assedly (although it's only been available in Japan... so far)!

Summary of current market conditions: by Zach+Baker · 2001-06-11 13:24 · Score: 2

JOE LUNIX

You should make your upcoming game for Linux. It is technically
superior in many obscure ways.

PC GAME DEVELOPER

Sorry, we could only pick one operating system, and it turned out
to be Windows. Better luck next time.

[OT] FFT. by Christopher+Thomas · 2001-06-11 14:46 · Score: 2

Do you know why you wouldn't use an FFT for audio or video compression?

Out of curiosity, why?

Dabbling in signal processing and audio/video compression is one of my hobbies, though I'm only an amateur.

Re:[OT] FFT. by Christopher+Thomas · 2001-06-12 02:40 · Score: 2

So, the FFT breaks down a signal into both a sine *and* cosine, summed together (they use a really neat math trick involving imaginary exponents to do this :) ). The only problem with this for compression is that you end up with twice the amount of data you started with!

I probably should have mentioned that I already know how the FFT works; I was just wondering why it would be unsuitable.

I'd get around it producing twice as much data by discarding half of it. The real and complex components of the spectrum of a purely real signal will be symmetric and antisymmetric, respectively, so I can discard half of each and then reconstruct them before I do the IFFT when unpacking the compressed data.

The DCT can be mathematically derived from the FFT by doing similar tricks, as you're probably already familiar with (you treat the input waveform as half of a symmetric signal, which has no imaginary component in its twice-as-long FFT).

Out of curiosity, what is the basis behind the MDCT? I've heard of it but haven't seen an explanation of how it works.

[OT: I replied to your antimatter post with some of the information you asked for. Short version: It's really, *really* expensive to produce, and solar radiation isn't energetic enough to help.]
Re:[OT] FFT. by Christopher+Thomas · 2001-06-12 02:43 · Score: 2

well for realtime video at least --- it it sloooooooow.

Actually, the FFT is quite fast. It works in near-linear time on the data (O(n log n)). It's the DFT that's the slow version.
Re:[OT] FFT. by Christopher+Thomas · 2001-06-12 07:47 · Score: 2

The real issue is that you can't just throw away half of the data. If all you care about is magnitudes, then yes, you can take the magnitude of the energies for a given frequency. However, if you want to reproduce the original signal, you can't.

This would be the case if I was throwing away all of the real or all of the imaginary data, but I'm keeping half of each.

The spectrum really is perfectly symmetric for the real component and antisymetric for the imaginary component. Deviations from this symmetry would cause an imaginary component to exist in the input signal, which can a priori be assumed not to be the case. Phase in the real signal is encoded in the ratio of real and imaginary spectrum components for a given frequency.

The handwaving argument from information theory is that because I'm only encoding N values for N samples (the real components of the samples), I only need N components out to retrieve all of the information (the real and imaginary components of half of the spectrum). The full spectrum has a factor of two redundancy.

Again, the DCT makes the same assumptions I do in throwing away this data (it just does some additional tweaking on top of it). I can show the derivation if you like, but you probably already have it on file.

Your signal processing library is more complete than mine :). My library just does 1d DFT and 1d FFT, though the n-dimensional versions are numerically equivalent (you just have to chop up and reassemble the 1d spectrum).

BTW - I'm looking for a way of doing the equivalent of a FFT on a non-power-of-2 number of samples. Zero-padding produces an interpolated spectrum containing more information than I need. Any other scheme I can think of takes as much work as a DFT of the samples would. Any thoughts on this?
Re:[OT] FFT. by Rei · 2001-06-11 23:12 · Score: 2

First, the basics behind the FFT. The original concept behind fourier transformations was, of course, to break a signal down into component waveforms. Well, unfortunately, when you look at a cosine, it's symmetrical. A sine is anti-symmetrical. Thus, using only cosines, you could only represent symmetrical data, and with sines, anti-symmetrical data. So, the FFT breaks down a signal into both a sine *and* cosine, summed together (they use a really neat math trick involving imaginary exponents to do this :) ). The only problem with this for compression is that you end up with twice the amount of data you started with!

The DCT and MDCT are transforms which use a waveform which is not symmetrical or antisymmetrical, and thus can represent an arbitrary signal. Thus, with a DCT or MDCT, you get the same amount of data back as you put in (albeit, the data you get back is in floating point format, so you still will lose a bit just simply by storing it back into a integer representation via quantization, which is slightly lossy), but it works :)

- Rei

--
You know when it's okay to shout fire in a crowded theatre? When it's on fire.
Re:[OT] FFT. by Rei · 2001-06-12 06:46 · Score: 2

Hehe, sorry :) I figured since you asked about why the FFT isn't good for signal compression, you didn't know how to take them :)

The real issue is that you can't just throw away half of the data. If all you care about is magnitudes, then yes, you can take the magnitude of the energies for a given frequency. However, if you want to reproduce the original signal, you can't. The existance of the separate components - real and imaginary - store locational data. If you throw away where the peaks of the waves are located, in audio, cancellations with other waves don't work right, and some other problems. In images, its even worse, as data doesn't appear at all like it should (locations are wrong, intensities, etc).

I've never implemented the MDCT before (my library, when I last worked on it, covered DFTs, FFTs, DCTs, and wavelets (Haar and Daubiches), all of those in 1, 2, and 3d). I've only read a summary of it, but from what I read, it sounded like it works as follows: you start summing products from before the start of your block, and continue summing products past the end of your block (by 50%), normalizing appropriately. But, you only compute enough frequencies, using this, to be equal to the number of elements in your block. Now, this has the net effect of giving data that's not as accurate for your particular block; however, when you reproduce the signal, since every location has an overlap, when you average the overlapped sections together, the errors cancel out (an error in one direction on the MDCT corresponds to an error in the opposite direction in the adjacent block). It helps remove blocking artifacts while still keeping optimally-sized blocks for compression.

- Rei

P.S. - I responded to your post :) Short version: Not as much antimatter is needed as you think, and I dug up some solar wind energy levels, and found that, while the average location's energy level is quite low, energy levels at certain locations are as high as 102 GeV... while that's no 900 GeV like at FermiLab's best or 20 TeV like the supercolider would have produced, when FermiLab is producing antimatter they generally do so at 120 GeV - not too far off :) )

--
You know when it's okay to shout fire in a crowded theatre? When it's on fire.
Re:[OT] FFT. by Rei · 2001-06-13 00:55 · Score: 3

Actually, that isn't quite true. "All signals that come out of my microphone are real in the time domain so the frequency domain spectrum is symmetric". If you'll look at a raw audio file, you'll notice that it indeed not symmetric. I'm not sure how familiar you are with signal transforms, so I'll back up a bit. "real" and "imaginary" components are really just mathematical tricks, based on the property that
e^(X*i) = cos(X) + i*sin(X) (or is it the other way around?). The 'i' merely acts as a placeholder, it doesn't actually mean that the frequencies themselves are imaginary. By using exponential math, we can simply add to multiply.

A sine which is contained completely in a rtain frequency range, like a cosine, cannot store phase information - it requires both of them. Now, of course, you can extend the waveform in question so that it isn't completely contained in a certain frequency range - but that is no longer an FFT, but a DCT.

FFTs are useful because they evenly separate signals, and are quite fast. By computing the magnitude of a certain frequency's complex component, you can do windowing quite nicely to tell where your signals are. But, this magnitude alone is not enough to accurately reproduce the original signal with phase information. And, without phase information, cancellation effects can be very bad in the worst case, in fact, to the point of completely messing up your block.

Your example was really a DCT, but using sines instead of cosines :)

- Rei

--
You know when it's okay to shout fire in a crowded theatre? When it's on fire.

Tribes2 uses GSM by Caballero · 2001-06-11 07:12 · Score: 5

Loki had the same sort of problems when they ported Tribes2. They switched over to a freely available GSM encoding (from a university in Germany). It worked so well they're adding the code to the Windows version so you can chat between versions.

Re:Tribes2 uses GSM by Dwonis · 2001-06-11 12:16 · Score: 2

GSM is patented, I think.
------

Re:sure it's been done ... by Mullen · 2001-06-11 07:14 · Score: 2

As a person who plays Tribes 2 alot, I am agaist putting the VoiceIP stuff into the game. Tribes2 VoiceIP is horiable and I know that other companies can do alot better job at the VoiceIP than the makers of Tribes2.

--

--
Linux O Muerte!

Add voice to old games with viavoice by tap · 2001-06-11 09:06 · Score: 2

Paradise 2000, a Netrek client for Linux, uses IBM's viavoice for linux to get speech output. Netrek is probably the oldest real-time graphical game on the internet. It has a sophisticated text messaging and macro system, but pre-dates normal computers even having sound hardware, much less the power for voice over IP.

With a text-to-speech system, you can get voice output without having to worry about bandwith issues, poor quality sound, or people without a microphone.

With Netrek's RCD macro system, it's pretty nifty the things you can do. For example, a player who is in a base is hurt, and pushes a single key for generic distress, causing everyone on their team to get a message like:
F0->FED Help(SB)! 0% shd, 50% dmg, 70% fuel, WTEMP! 9 armies!
But your client will speak, "Base hurt, weapon temped", because all those numbers are a pain to listen to. Later the base is ok, so he pushes the same key.
F0->FED Help(SB)! 99% shd, 0% dmg, 99% fuel, 1 army!
Now the client just speaks, "Base is ok". The macros can have "if" statements based on the relevant values, e.g. if(damage>50) "hurt" else "is ok". It's a lot faster to just push a key than to say the relevant information. And if you don't have all the noise, you don't lose text communication with your teammates.

BTW, if it wasn't obvious, this is a shameless plug.

VOCAL by NitsujTPU · 2001-06-11 06:56 · Score: 2

http://www.vovida.org/ if you search around this website, you should come across a program called VOCAL, which is known to compile under linux, and provide some degree of VoIP support.

Wow! by mindstrm · 2001-06-11 22:17 · Score: 3

Guys, I wasn't trying to be anal, only educate people, because I know a lot of poeple are not aware there is an actual protocol called VoIP.

The comparison to ftp is entirely accurate.

I never said anyone was wrong, only that we should avoid confusion.

Let's avoid confusion.. by mindstrm · 2001-06-11 07:05 · Score: 5

VOIP, or Voice Over IP, has come to mean a specific suite of protocols for providing telco integration, call setup/teardown, etc. We shouldn't use it as a generic term for 'speech over internet' anymore..

Re:Will this matter once games begin including voi by ywwg · 2001-06-11 06:57 · Score: 2

is this feature in the linux version as well?

Re:Have you considered... by WNight · 2001-06-11 08:20 · Score: 2

Speech has more redundant information that most audio signals... A large factor in speech recognition is context.

How lossy are you willing to have this?

Written english contains something like 1.6 bits of entropy per character, meaning aproximately 10 bits per word. That can be pumped into a speech synthesizer and result in *very* lossy representation yet it maintains almost 100% of the meaning. (Ignoring fine details like inflection, etc.)

The best way, from a bandwidth point of view would be to detect phonemes and transmit only those, recreating the audio on the other side. This is a 'bit' CPU heavy for realtime...

At any rate, MP3 encoders might not be the best thing to use, they're tweaked to work well on music not speech. Likely you'll create a larger file than you need. Simply mask out all frequencies below 500hz and above 3000hz (I think) and apply a simple logrithmic encoding and you'll get fairly decent compression. It's also fast.

Nice to see you still posting to Slashdot, I had thought you weren't here anymore, being that you vanished in the middle of a thread... Or, do you not check your posts (slashdot.org/users.pl) to see if you've had any responses?

Re:Have you considered... by WNight · 2001-06-11 12:34 · Score: 2

Sorry, I forgot I was talking to the person who proved Fermat's last theorem when she was three... Or was that only solve the traveling salesman problem in constant time... Oh, no, just an expert at audio compression, the only opinion that matters in any matter philosophical, and the first person to solve the Monty Hall problem.

I really wish you'd read the posts before you reply to them, you'd be a lot more relevant.

I never said MP3 wasn't usable, I said *MP3 ENCODERS* weren't a good bet. The MP3 encoders ARE tweaked towards music, that's what people tend to use them for, and what they are engineered to do well.

You made a comment about just finding one, tweaking it a bit, and having a great solution. The only open-source ones I've seen have been designed with "MP3s", or 128kbit music... If you used one of these you'd have a lot of tweaking to do to get decent compression from it.

It also doesn't fit the problem, as I saw it. This thread is about communication for a game. Barry White tends to perform very few concerts via Quake3 server.

While I'm at it, I think I should mention that telephone calls are compressed, usually with hard arbitrary cutoffs, and MuLaw encoding. Also, the whole point of logrithmic encoding is to have more resolution for quiet sounds, instead of the loud sounds. What you describe is exactly backwards from the way it works.

I assume though, that you find the quality of voice over the toll networks to be unbearably bad though.

You know, I did say "I think" after the numbers 500hz and 3000hz, that sounds right, but I may have misremembered. I even marked it as such. However, arbitrary cutoffs sound only marginally worse than a smoother cutoff, and usually only on specific benchmarks.

If you'd prefer a touch more CPU, then by all means, use a smooth falloff. I assumed speed was all-important because of the gaming aspect... I wouldn't want a 20% speed hit from audio encoding.

When you're doing audio compression, delaying the signal for 50 or 100ms is safe, and gives you all the context needed. I also understand about perceptive encoding, not that it's a huge technical achievment as you make it out to be.

As with the gun control thing... I think you've got an agenda and you can't see that it's not mine. I don't give a rat's ass about gun control in your country. I do however find it annoying when someone throws around a bunch of misused statistics and emotional ploys to deceive people as to the severity of the problem. If you really had a strong point, you wouldn't need to use a bunch of tricks.

And you don't really seem bitchier, you were more insulting in the other thread and just as quick to anger, without reading the post.

Don't waste your precious time on my account, if I want to have this kind of technical discussion I'll go talk with the marketing team.

Re:Have you considered... by WNight · 2001-06-14 03:05 · Score: 2

I think the key in your question is the word 'good'...

Many encoders do this (arbitrary cutoffs, etc) but they aren't designed from the same point of view as an average encoder on a PC...

The reason simple *Law encoding is so popular in telephony is that it takes very little CPU time, on embedded CPUs without an FPU. It also produces a constant-sized output. That's very handy in networking with technologies like ATM where you can reserve bandwidth.

If you have CPU time to burn, even 10% of a modern CPU, then doing a perceptual encoding of speech in realtime gets your better quality at comparable sizes. But if you're trying to do this with so little CPU as to be transparent to the primary application (usually, in this discussion, a game taking 100% of the CPU) or on a tiny 8-bit CPU with 64 bytes of RAM, FFTs (etc) aren't an option.

The reason for the hard cutoffs is that they're usually done in hardware while the signal is still analog. I believe you can get current soundcards to do something like this while recording. If you can't, the work required to seperate out the unwanted frequencies through software means you should probably do a more advanced encoding.

If you're curious though, try an 8000hz voice sample with cutoffs where Gordonjcp suggested, then if your audio program supports it, try 8bit logrithmic encoding. (Not just 8-bit linear encoding, it's much worse.) It's definately not worth encoding music in, but it's not bad for voice. (After all, the telcos use it.)

What do you do for work? I have a feeling we're both in opposite ends of the same field.

Re:Impressive by gmhowell · 2001-06-11 07:39 · Score: 2

Ummm... Go reread the post. I think it was humor.

--
Jesus was all right but his disciples were thick and ordinary. -John Lennon

Re:sure it's been done ... by gmhowell · 2001-06-11 07:42 · Score: 2

Excellent. And with that DirectX 8 that is due out for Linux, he'll be all set!

Seriously, while a PITA for Linux, it will be great for 'Doze games.

--
Jesus was all right but his disciples were thick and ordinary. -John Lennon

Re:What about Macintosh? by darmou · 2001-06-11 10:59 · Score: 2

Rodger Wilco has a mac version as well

--
-- remove NOSPAM for actual email address -- Things are not as square as they may seem

Compression too! by dmaxwell · 2001-06-11 10:31 · Score: 3

ssh has the -C flag that uses the same compression algorithm as gzip. That might make the above scheme barely practical. I know, I know, it was a joke..

I will point out that ssh compression works wonders on a vnc session if the server is running sshd. There is a nice howto on the vnc page for tunneling the connection over ssh. Secure and faster....bonus!

suggestions by DanThe1Man · 2001-06-11 07:05 · Score: 2

Will Linux Telephony do the trick?

If how about useing Speak Freely or Open Phone?

Re:Have you considered... by Dwonis · 2001-06-11 12:23 · Score: 2

Please... if you're qualified to discuss audio compression, how about the basics? Do you know how to compute an FFT? Do you know why you wouldn't use an FFT for audio or video compression? What about a DCT? MDCT? What do you know about quanization schemes? The advantages/disadvantages to storing quantized data with huffman encoding vs. arithmatic encoding? Have you ever written a single signal processing function? (I've written a whole library). Do you know anything about the subject at all?

I'd like to step in and ask: where I can find that information?
------

Re:sure it's been done ... by Dwonis · 2001-06-11 12:26 · Score: 2

Eww... I'd never use DirectX for Linux.
------

Re:Responsibility of Game Publishers by Zigg · 2001-06-13 20:02 · Score: 2

Petition, schmetition.

There are Linux game publishers out there. If their unit sales numbers were comparable to numbers for Windows games, then you'd see more publishers writing to Linux.

All the petitions in the world can't make an unprofitable situation profitable.

*nix based Voice over IP is easy! by dougmc · 2001-06-11 07:49 · Score: 5

We've been doing `Voice over IP' for years, far longer than it's been `cool' -

cat /dev/audio | rsh remote-host 'cat > /dev/audio'

Well, for the year 2001 you may want to use `ssh' instead of `rsh', and /dev/dsp instead of /dev/audio for Linux, but the idea is still the same ...

Almost as much fun as making the Sun next to you belch while the newbie is using it!

Re:Speak Freely for Unix by technos · 2001-06-11 06:59 · Score: 2

I use it all the time. (GF in CA, I'm in MI) The Linux version can be a bitch on some soundcards (ES137x and S3 SonicVibes) compared to the Windows version.

What else to say? It'll do multicast conversations, and coexist nicely in LPC10 mode with Quake on a 56K so long as you don't mind not being to hear Quake save the CD audio.. The delay can be unbearable. If I yell into the mic, I can hear myself five seconds later some nights. Others, it's nearly instantaneous.

--
.sig: Now legally binding!

Re:Speak Freely for Unix by technos · 2001-06-11 07:43 · Score: 2

If you're referring to the ohphone that is a part of the OpenH323 stuff, you're only right because of the bitrates involved. The largest amount of bandwidth you can eat with SF is about equivalent to what you consume using G.723, one of the better compression ratio H323 modes.

Also, using H323 as a carrier would mean you need at least a 1/4 cycle (You are only sending or receiving data 1/4 of the time) peak throughput equivalent to a capped cable modem; I've used Speak Freely on as little as a 9600 baud modem in 100% cycle operation.

--
.sig: Now legally binding!

Re:Let's make one... by technos · 2001-06-11 07:05 · Score: 4

GSM, the standard used in cell-phones, works fine with less than 1/5th of that bandwidth (bidirectional in 2.4-2.8kbps, will do realtime compression in a 486, and there are freely available GSM compressors.

There are also public domain encoders for the military voice standards, LPC and LPC10. Those are usable in as little as .96kbps, but they sound horrible. in comparison.

--
.sig: Now legally binding!

For Half-Life engine fans... by antdude · 2001-06-11 07:09 · Score: 2

If there was a Linux port of Half-Life, then this upcoming voice feature would kick butt. We wouldn't need these voice utilities.

Off-topic, this upcoming Spectator feature would be sweet as well. :)

--
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).

Re:Open H.323 by brunes69 · 2001-06-11 08:42 · Score: 2

I have used this and can attest to it working. It not oly worked, but worked fine connecting to a netmeeting user as well. So it is also viable for connecting to window users

Tribes 2 Seems to Incorporate One by Greyfox · 2001-06-11 07:14 · Score: 2

I noticed that the Linux Tribes2 seems to have some GSM thing built right in. Haven't fiddled with it yet, as I think my Microphone is dead and the game seems to run much too slowly on my system anyway (Time to spring and get that geforce2 *sigh*...)

--

I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

Speak Freely for Unix by Bigtoad · 2001-06-11 06:45 · Score: 5

How about Speak Freely for Unix?

I have played with it a bit, and it seemed to work, but I haven't actually used it for gaming yet.&nbsp It didn't seem as simple to configure and use as some of the windoze voice comm programs, though.

Freshmeat by rosewood · 2001-06-11 06:47 · Score: 3

I know when I was looking for tools like Net2Phone and Dialpad.com for linux I found a few SDKs on Freshmeat for VoIP. However, none of them were geared toward gaming such as Rogerwilco and Battlecom. I would love to find a good cross platform "game-geared" VoIP system. Btw - only RWBS is for Linux - no linux clients that I know of

--
The ultimate network admin tool needs HELP!

There was an IP phone in the perl journal recently by marcmac · 2001-06-11 06:58 · Score: 2

TPJ had an IP phone in ~70 lines of perl - this should be adaptable to your purposes, I'd think...

sure it's been done ... by Frizzled · 2001-06-11 06:56 · Score: 2

but, to date it hasn't been done well.

Roger Wilco is notorious for using up resources and being near-unhearable and Battlecom, while pretty solid, also has problems with static creeping into the transmission (interestingly, ShadowFactor is about to discontinue Battlecom).

One solution is simply to build Voice-Over-IP into the app ... in the case of games, Valve's next patch to the Half-Life engine will contain code to do this very thing. It should be interesting to see if a company has more luck trying to write this code directly into their product, rather than replying on other companys to create add-ons (like BC and RW).

_f

Re:sure it's been done ... by geomcbay · 2001-06-11 07:04 · Score: 2

Well Microsoft's added voice communications into DirectX, so each game developer for Windows won't have to recreate this stuff, they'll just code some generic interfaces into the API.
For the same sort of thing, but cross-platform, there's HawkNL

Re:Let's avoid being ANAL by SuiteSisterMary · 2001-06-11 07:56 · Score: 2

That's like saying that any protocol that transfers files over the Internet should be called 'File Transfer Protocol' and is equally stupid.

--
Vintage computer games and RPG books available. Email me if you're interested.

Re:Let's avoid being ANAL by SuiteSisterMary · 2001-06-11 18:38 · Score: 2

So if I tell you to send me a file using 'file transfer protocol' you're going to assume that I mean good old RFCd FTP, port 21, all that crap? Or are you going to assume I mean HTTP? ICQ file send? MSN file send? SMTP MIME? scp? rcp? Zmodem over telnet? The poster above IS saying that Voice Over IP, the protocol used in all sorts of hardware and software, is NO different from 'voice over IP', or any form of protocol involving TCP/IP and transmission of voice. By his defintion, VoIP could involve FTPing sound files. VoIP is a specific technology, with a specific implementation. Hell, Linksys's new Cable/DSL router has a VoIP option; plug a regular phone into the assigned jack, and tell people to point their VoIP software at the router's IP address, and if somebody calls, it rings the phone.

--
Vintage computer games and RPG books available. Email me if you're interested.

Have you considered... by Rei · 2001-06-11 07:50 · Score: 2

Have you considered taking a gpl'ed mp3 encoder and ripping out some of its (slow) frequency masking and hearing sensitivity calculations? That would take almost no effort. You'll take a quality/size hit (unimportant for speech), but it should be fast enough for real-time encoding without a significant CPU signature; then all you have to deal with is streaming. Also, given that this is speach we're dealing with, you can expect most of the frequencies in question to *not* be masking each other out or be outside the human hearing range (what point would there be to humans being able to make sounds like that? ;) ).

- Karen

--
You know when it's okay to shout fire in a crowded theatre? When it's on fire.

Re:Have you considered... by Rei · 2001-06-11 23:42 · Score: 3

I'll step in and answer some of those questions for you :)

First, before we can go into the FFT, we need to discuss the DCT (Discrete Fourier Transform). Of course, the purpose of the DFT is to break down a signal into component waveforms - but how? Well, picture that you have a waveform that is just a sine, lets say, 5 hz across the area you're looking at. Now, picture multiplying that waveform, at every location, times a sine waveform of zero hz, summing those together, and normalizing. What do you get? You get 0. :) Well, try it with a 1hz signal. You still get 0. Keep trying till you get to multiplying it times a 5hz signal. All of the sudden, you'll notice that wherever the signal is positive, it gets multipled by a positive value, and wherever it is negative, it gets multiplied by a negative value (making it positive) - so you no longer get 0 as your result. For everything over 5 hz, you'll get 0 again. You just broke down a simple waveform into a sinous component. :)

Now, as was discussed in another reply on this thread, to represent *any* signal, you can't just use a sine at even mhz boundaries and steps. For that, you have to use the sum of a sine and cosine. Due to a trick you can use involving imaginary exponents, you can get one part of the data to come out "real" and the other part "imaginary", and correspond to the individual components of your signal. But, since you get twice as much data back (real and imaginary components), it is a poor choice for compression. DCTs and MDCTs are briefly discussed in the other reply as well. The main difference between a DCT and MDCT is for use in block transforms. Also, the difference between a DFT and a FFT is that, in a DFT, you'll find that you're doing a lot of the same calculations multiple times, due to various properties of sines and cosines. FFTs are basicly a re-ordering of the calculations so that redundant ones are done (less often) (optimally, once). There are several different FFT algorithms.

Block transforms are to get rid of a nasty side effect of transformed data. If a signal exists in one part of the data, but not another, frequency decomposition has trouble dealing with this. It generally causes "ripples" of energy to appear (the goal of doing transforms, compression-wise, is to concentrate the energy in certain frequencies, and then store them - so, ripples are bad). If you look at a very large sample, many frequencies will start and stop. So, you break it down into blocks - if there's a start or stop of a frequency, it only causes ripples in that section.

This works fine until you start throwing away data on a DCT. Because different data will get thrown away in different blocks, while they'll have the same overall level of quality, there will still be discontinuities between the blocks. The MDCT effectively halves the block size and vastly reduces discontinuities by including an overlapped area in its calculations.

Before I can discuss quantization, you first have to understand thresholding and the principles of compressing a transformed signal, which were briefly discussed in my original post. After you transform the signal, ideally, your energy is concentrated in specific frequencies. The effect is something like a starburst in the upper left-hand corner of each block that was transformed. Generally, you will still have *some* energy in weak frequencies, but not much. So, you kill them off - generally with a threshold that varies over the human hearing range, in the case of audio. Also in audio, you generally want to take masking effects into account when killing off weak signals. Once your energy is left in strong signals, you need to store how strong. However, while your input signal might have been composed of 8 or 16 bit integers, your output data will generally be high decimal-resolution floating point values. You need to get it back into integers. This is known as quantization. Some schemes simply convert the data back linearly. Some create a table of arbitrary endpoints for what-converts-to-what. Some use a smooth function. There is a lot of debate over what is the best method. I personally recommend, in this case, after seing the tiny gains made by various other quantization methods, for a huge cpu/complexity cost, using linear quantization.

Huffman encoding is typically used to losslessly encode the quantized data. Huffman encoding has proved attractive because they already know what sort of tree they can build to compress the data well. However, I feel that using arithmatic encoding can give *huge* advantages, via frequency preduction. Because, not only do you know the signal density for a given location for an arbitrary signal, you know what it has been like for this *particular* signal in the past, and can scale your probabilities appropriately. (oh, btw, if you want info on how to use huffman or arithmatic encoding, just ask).

Anyways, I better get back to work. Ciao!

- Rei

--
You know when it's okay to shout fire in a crowded theatre? When it's on fire.
Re:Have you considered... by Rei · 2001-06-12 00:13 · Score: 3

I just did a quick check on human voice ranges. Vowel sounds contain notable power at frequencies as low as 50 Hz. The sibilants and fricatives, s and f, contain significant amounts of power at frequencies as high as 8,000 Hz. Frequencies above 1,000 Hz contribute the most intelligibility to speech, even though only 16% of the energy lies above that frequency (Chapanis, 1996). Using such an arbitrary method of thresholding as truncation is *far* from optimal in signal compression. Perhaps it has some advantages in analog signal transmission - I don't deal much with that, and that seems to be what you deal with, so perhaps their are applications for it there - but when it comes to encoding digital signals for compression with your goal being the clarity vs. size ratio, that is a very poor method. A quick look at human hearing research shows that humans have vastly different sensitivies over different ranges, and just "throwing away" data outside of certain ranges regardless of signal intensity, and keeping data within those ranges regardless of how weak it is, will ruin your clarity vs. size ratio. That's why you won't see a single, good encoder, whos target is speech or music, that does this. Please, if you can present an example to the contrary, please do so :).

- Rei

P.S. - for those who care, the cpu cost of having a varying threshold over various ranges, instead of a constant one, is negligable compared to the time it takes to do the MDCT, quantize, encode, etc.

P.P.S - Any specific URLs from those organizations I should check out? I'm always looking for a good distraction from work ;)

--
You know when it's okay to shout fire in a crowded theatre? When it's on fire.
Re:Have you considered... by Rei · 2001-06-11 11:03 · Score: 5

Sorry, some of us have lives on the weekends ;) My humblest apologies for having better things to do on a saturday night than to debate gun control with someone who loves to continue to dodge the simple question, "Do you think the US legal system is wrong 50 times as often as it is right?", by making pseudo-statistical arguments without overcoming the sheer numbers, and nitpick choices of examples in arguments without hitting the core of them (I.e., "just because something else is worse doesn't justify something bad"). But, regardless, this thread is about audio compression :)

When you do a block DCT or MDCT on an audio signal, you're not looking at a whole page of text's worth. You generally look at a fraction of a second. Speech has little redundancy at this level. However, that isn't what I was referring to. Do you have any background in audio compression? There are two keys in compressing audio using current methods: Frequency masking and signal response. Frequency masking is the fact that when the human ear hears a strong signal, weak signals that are near it in frequency seem to "dissapear" or "merge" with the stronger signal. Signal response (hearing response, frequency sensitivity, etc), is how good, overall, the human mind/ears are at hearing weak signals at various frequencies across the spectrum. By a careful knowledge of these, in music or voice, you can kill off many more frequencies than without it. However, it also is a big CPU consumer to do it very carefully. Cutting out some of the analysis can save you a good bit of CPU - and, in the case of human voice, which tends to be in a very audible range with few masking effects, won't affect your compression rate much.

Second, please, if you can create a good sounding speech synth - especially one that can give inflections, emotion, etc - please, please share it with us. Until then, good luck having something like this work (simply neglecting CPU issues) without sounding like a 50s robot that messes up once a second.

Oh, and to answer your theory about masking out frequencies below 500hz and above 3000hz: No. That will sound so unbelievably awful. First off, lets neglect the fact that someone with a voice like Barry White would be inaudible, and that you'd never hear a 't' or 's'. Ignoring that, that's a silly way to do it. You need a simple curve, even just a simple line graph. It takes little CPU time, and will actually be able to reproduce the original sound well. Arbitrary truncation points are unbearably bad.

Next, you seem to be of the notion that MP3 encoders are "tweaked towards music". MP3 encoding is a fairly abitrary term. MP3 is a specific format for encoding streamable, quantized, transformed data. You can use any truncation scheme you want - even the silly one you proposed. Most encoders you'll find are tweaked towards the human hearing range - an optimal choice for both voice and music (especially voice, though! Voice compresses very well, because, compared to music, it has most of its energy concentrated in a few signals at any given time).

Next, why use "logarithmic encoding" for compression? Logarithmic encoding is a (poor) way to store raw (uncompressed) audio data - it sacrifices low-level clarity for the ability to represent very loud signals - something seldom of use in normal audio compression applications (have you ever noticed how quiet signals on an 8-bit sound card are very crackly, but the loud ones are clear? Thats the sort of effect logarithmic encoding gives to sound). It is useful in efficient Pulse Code Modulation (PCM) of data for maximizing the number of transmissions over a small number of physical channels, but doesn't even begin to apply as far as storing quantized data is concerned (that would be like using a bubble sort to compute Pi or something ;) ). Do you mean "arithmatic encoding", perhaps? I have a neat theory for using that, with scaled probabilities, to create an optimal compression ratio (predictive) for thresholded, quantized data, that I came up with after the last slashdot conversation on compression (it was video compression then).

Please... if you're qualified to discuss audio compression, how about the basics? Do you know how to compute an FFT? Do you know why you wouldn't use an FFT for audio or video compression? What about a DCT? MDCT? What do you know about quanization schemes? The advantages/disadvantages to storing quantized data with huffman encoding vs. arithmatic encoding? Have you ever written a single signal processing function? (I've written a whole library). Do you know anything about the subject at all?

If you don't know what you're talking about, please don't be suggesting encoding schemes. There are enough bad ones out there already ;)

- Rei

P.S. - sorry if I seem a bit bitchy. For some reason, they decided to leave us without air conditioning today at work ;)

--
You know when it's okay to shout fire in a crowded theatre? When it's on fire.

Vocal and SIP by cullenfluffyjennings · 2001-06-11 09:25 · Score: 2

You could just hack your own protocol to do it Why not do it the open standards based way? Use SIP - it is good for voice and there is substantial work to build instant messaging and presence stuff on it. Someone, sipforum I think, showed Quake with SIP. I have worked on the SIP stuff at www.vovida.org and it is open source.

Will this matter once games begin including voice? by fetta · 2001-06-11 06:48 · Score: 2

Will this matter once games begin including voice as a feature within the game? Tribes 2, for example, includes its own built in voice communication system.

Granted, external programs like GameVoice and Roger Wilco offer some additional feature, but will the average gamer care? I would expect every game that has a team play element to include its own voice technology within a year or two.

--
** The opinions expressed here are my own, and do not reflect those of my employers - past, present, or future**

Re:Let's avoid being ANAL by Erasmus+Darwin · 2001-06-11 21:29 · Score: 2

So, VoIP is a specific protocol, 'voice over IP' is a technology application with many potential specific solutions.

I'd debate this point. First, the article text specificially said "Voice over IP" (with voice being capitalized for no other reason). That, to me, implies a proper name rather than a general technology label, not unlike if someone where to say "File Transfer Protocol".

Second, the phrase "Voice over IP" is frequently used when referring to (what you call) "VoIP". For some reason, you seem to use "file transfer protocol" (FTP) versus "a file transfer protocol" (anything to move a file), but create an artificially more relaxed standard where the expansion of "VoIP" is the generic term. This simply isn't the case.

Also, while I hesitate to use pointing out popular usage as a means of winning this argument (and I'm sure it'll come back and bite me when the next cracker vs. hacker argument pops up), a quick google search on "voice over IP" appears to turn up links that're all about VoIP, rather than "use the Internet to talk to your friends". Again, while I'm loathe to normally invoke the popular definition, it's worth pointing out that it seems to coincide with the technical one, providing a much more compelling case.

Overall, I suspect your problem is that you've fixated on the fact that "voice over IP" is an overly broad term that, when analyzed, could apply to a broader set of items than what it's actually intended for. On the other hand, we've already got the afore-mentioned example of "file transfer protocol", we know that a "personal video recorder" is a TiVo-like device rather than just any video recorder owned by an individual, we know that a "television" (literally: far seeing) isn't just any device that lets you see far away -- telescopes and binoculars certainly aren't part of that group. When it comes down to it, a name is really just an arbitrary designator that just happens to usually have some relevance. If you want names that're complete descriptions, you might want to switch from English to German.

Searching freshmeat still seems to work ... by Tyndareos · 2001-06-11 06:54 · Score: 2

http://lumumba.luc.ac.be/jori/jvoiplib/jvoiplib.ht ml.

Don't know if it's really what the person is looking for, but it's worth a shot.

Re:Responsibility of Game Publishers by Moridineas · 2001-06-11 08:39 · Score: 2

I cited Carmack as a developer who is into linux. Look at other prominent developers, such as Sweeney--they don't care nearly as much. Nor do most game developers i think. MOST people don't need or want to have to have an opinion in the OS wars. It's enough to make money and do what works.

Whether the constraints of multiplatformism are benefitial? Most likely (looking at it structurally, and assuming that at least ONE profitable and worthwhile port will be made). However the platform that makes up for this cost deficit, ISN'T *nix.

The "bottlenecks" are managers--I guess you are correctin saying this, though i wouldn't use the term bottleneck. Programmers create a product, managers make money.

Scott

Re:Team up w/ Apple (OS X) - MORE BUYERS! by infiniti99 · 2001-06-11 10:42 · Score: 2

I totally agree. This even makes sense when combined with my other post in this thread. MacOS has a fairly low market share, just like Linux, but it has a better reputation as a desktop OS and so it gets lots of good commercial applications and games. No doubt these developers (like Adobe, Blizzard, etc) are interested in easy crossplatform development.

With SDL and OpenGL working on Mac, and Qt on its way there, companies may be very interested in utilizing these libraries/technologies for crossplatform Windows/Mac development. The kicker though is that these are also supported fully on Linux. If these libraries catch on in the Mac world, I'm fairly sure Linux will see lots of the same apps merely by side-effect.

-Justin

Re:You forget cost/profit analysis by infiniti99 · 2001-06-11 11:18 · Score: 2

Excellent, I was hoping to get a reply from a developer.

1) Do you use Linux much? Have you used it for any game/non-game development? It is definitely capable of running games. Windows is only worthy of games because of the huge marketshare, not because it is a better gaming platform. It is also obvious that the Linux community wants games. Are you part of that community? Do you not agree?

2) Still, using OpenGL would make a Mac port easier. Choosing OpenGL over DirectX is mainly a portability decision. Do you not want your game to run on Mac either?

3) True. Any game worth its salt has its own GUI library. I meant to bring up Qt as a reference to application developers, broadening my argument for crossplatform programming to include all types of software. I must say Qt may be a good in-house tool at a game company though, for developing map editors, game editors, etc if you have a heterogenous development environment. It may encourage a crossplatform mindset in your company as well. Do any of your developers wish they could be working in Linux?

I understand what you mean though. It all comes down to whether or not you consider portability important, and whether or not it would turn a profit.

Re:You forget cost/profit analysis by infiniti99 · 2001-06-11 08:19 · Score: 3

That's why there is SDL. It uses DirectX on Windows and DRI on X (as well as many other graphics layers / OS's).

I think the problem with Windows developers in general is that they don't think of coding crossplatform in the first place. It's easy to understand why: they are taught DirectX and MFC, and Windows has a huge percent of the desktop market. Also, some games are coded so horribly (compare the duct-tape-that-is-EverQuest to any Blizzard product) that porting certain games look like they would be a nightmare.

On the other hand, I think Linux developers are more trained to code portably. With all the unix flavors out there, source portability is already a must. It also seems that these developers care about porting to Windows. Many apps for X are available on Windows (like a lot of the Gtk stuff), but not the other way around.

So Linux developers actually care about portability, but Windows developers do not. Maybe we can convince them to change their ways?

Surely the Windows developers out there don't thoroughly enjoy Windows-only programming, do they? I've used DirectX, and it was ludicrous. It isn't direct at all (Come on, DirectMusic? DirectPlay? Direct is just a buzzword..) and the classes are a mess. I haven't heard much good about MFC either, but I've heard only good about Qt (and I've used both).

Qt works on Windows. There's no reason to use MFC. Yes it does cost money, but aren't we talking about real game companies here? SDL works on Windows. There's no reason to use DirectX "directly" (whatever that means). You know how long it would take to port Windows apps/games to Linux that were all written in Qt and SDL? All of a recompile.

List of VoIP applications by leonia · 2001-06-11 07:05 · Score: 2

There is a pretty comprehensive list of VoIP and multimedia applications at http://www.cs.columbia.edu/~hgs/rtp and http://www.cs.columbia.edu/sip (under 'implementations') These applications are all interoperable, at either the media or call set up level. Among others, our sipc application runs on Linux (and Solaris, FreeBSD, etc.).

HawkNL by geomcbay · 2001-06-11 07:01 · Score: 3

HawkNL is a nice LPGL library (currently with Win and Linux support) for doing, among other things, voice over IP.

Its targeted at game programmers, to be integrated in-game, as a cross-platform alternative to Microsoft's DirectPlay and DirectPlay voice, but could be used to do a stand-alone VOIP app as well (though I am not aware of any currently).

Just a little effort... by Bonkers54 · 2001-06-11 06:52 · Score: 2

With a mere 15 seconds of research I found a wonderful library here that would very easily be utilized to make a VoIP app for games. All the work is done for you just about, now go finish the rest. This library will even work on both windows and linux, so you could make your program cross platform! Sometimes all it takes is hitting the search button to find what you need.

-Matt

easy problem--easy solution by Not+A+Democrat · 2001-06-11 06:59 · Score: 2

Well, voip isn't very complicated. If you're unsatisfied with the currently available solutions like those available at LinuxTelephony or OpenPhone you can always roll your own. I know I'd lay down a couple bucks for a satisfactory one, so you can probably sell it and make a couple quick bucks from your effort, too. Stop whining and make a difference!

--

Being Liberal should be a Crime!

Vovida by blang · 2001-06-11 06:58 · Score: 2

Is an opensource SIP stack. It's an overkill if you just want to chat during games, but it's there.

--
-- Another senseless waste of fine bytes.

Responsibility of Game Publishers by idonotexist · 2001-06-11 06:57 · Score: 2

To the point: game publishers need to get off their arse and provide a linux version for each release.

This is the only way I can see we can solve the problem of not having the "latest and greatest" windows games on Linux. Also, by the time one completes porting a game from Windows to Linux, it is likely the game is passe (except maybe for halflife (the game will not die)).

Is there a petition I can sign? A list of game publisher email addresses to send an email to? I think part of the problem is that game publishers do not see a demand on Linux platforms. Perhaps if publishers were communicated a significant interest, then they should at least think about providing for Linux versions.

--
"There ought to be limits to freedom"

You forget cost/profit analysis by ColGraff · 2001-06-11 07:22 · Score: 4

Most modern computer games uses Direct3D and DirectX a great deal. These libraries are not portable, and they are what most developers have experience with. You are asking, in essence, for developers to either have two devteams working in parellel, or one team programming both versions. In either case, the game company can either completely rewrite the game engine for each OS, or it can create one highly portable engine. The problem with the latter option is that DirectX in particular really is the backbone of Windows gaming. It would be very hard to convince developers to give it up, and I'm not sure "DirectX free" windows games could match the perfomance of their windows-only counterparts.

Alos, have you considered the expense of training all these developers in Linux? Remember, most of them do not have Linux experience.

Finally, when you consider that Windows controls 90something percent of the desktop gamer market, it just doesn't make sense for a company to pour massive resources into developing Linux and Windows games simultaneously that only a relatively small number of people would buy. At least a dedicated porting company like Loki doesn't have to worry about graphic artists, level designers, story writers, or game design as a whole.

--
I'm the stranger...posting to /.

Slashdot Mirror

Voice Over IP for Linux Games?

65 of 147 comments (clear)