[Image: red dots hovering in front are the braille rendition of blindza.com]
Couple of tricks relating to using forms of Gemini content description along with the vOICe
Using Gemini live interaction along with screen recognition
Firstly, for those who don't know: yes, I have been playing around with/using the vOICe since its old Nokia phone version, so I have done quite a bit of what probably counts as self-training with it, which lets me interpret its audio output on my own quite a bit of the time. It was one of the additional features I made use of when doing my motorcycle track ride in 2018, since it helped me detect when I was nearing the edges of the track, at times before my sighted companion had wanted to warn me about that.
The developer of the vOICe/seeingWithSound - "augmented reality for the blind" - has added a slightly experimental feature that specifically lets us try asking the Google Gemini assistant, running in live mode with screen-sharing turned on, for details about what the camera is processing to produce the audio soundscape output, or an interpretation thereof.
As in, the Android software normally doesn't render fully detailed imagery on-screen, since it converts the camera view to a form of lower-resolution grayscale before processing it to generate the audio soundscape interpretation. But they've now made it possible to toggle on a camera colour preview mode, and this means I can activate Google Gemini live mode, including screen-sharing, and then, if I hear something interesting or detailed via the vOICe, ask Google Gemini about what's showing up on-screen and get detailed descriptions of the scene.
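For anyone wondering what "turning an image into a soundscape" even means, here is a minimal sketch of the general principle - a left-to-right column scan where higher pixels map to higher pitch and brighter pixels to louder tones. This is only an illustration of the idea, not the vOICe's actual implementation, and every parameter value below is a made-up placeholder:

```kotlin
import kotlin.math.PI
import kotlin.math.sin

// Illustrative image-to-sound sketch: scan columns left to right,
// give each bright pixel a sine tone whose pitch depends on its row.
// All frequencies, durations and the sample rate are arbitrary guesses.
fun columnsToSoundscape(
    image: Array<DoubleArray>,     // image[row][col], brightness 0.0..1.0, row 0 = top
    sampleRate: Int = 22050,
    columnDurationMs: Int = 15,
    minFreqHz: Double = 500.0,
    maxFreqHz: Double = 5000.0
): DoubleArray {
    val rows = image.size
    val cols = image[0].size
    val samplesPerColumn = sampleRate * columnDurationMs / 1000
    val out = DoubleArray(cols * samplesPerColumn)
    for (col in 0 until cols) {
        for (row in 0 until rows) {
            val amp = image[row][col]
            if (amp <= 0.0) continue
            // Top of the image maps to the highest frequency.
            val freq = maxFreqHz - (maxFreqHz - minFreqHz) * row / (rows - 1)
            for (s in 0 until samplesPerColumn) {
                val t = s.toDouble() / sampleRate
                out[col * samplesPerColumn + s] += amp * sin(2.0 * PI * freq * t) / rows
            }
        }
    }
    return out
}
```

Because that whole mapping runs locally on the handset, you get the near-instant feedback mentioned further down, while Gemini is only called on when you want a detailed verbal description.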
Now, yes, there are a couple of tricks, or things to bear in mind here - the silliest mistake I made when trying this out was forgetting that I had set TalkBack to hide screen contents, so Google Gemini live screen-sharing literally just kept on asking me to "turn the lights on"...LOL!
Anyway, besides that, it's probably simplest to first bring up Google Gemini: activate its live mode using the button in the lower right corner, activate screen-sharing - it will prompt you for a form of permission, due to privacy concerns - and then bring up the vOICe.
The vOICe will switch to landscape display mode, and if you are not too used to its interface, you will find mention of various aspects of its settings on the screen, including the option to toggle mute on/off by double-tapping more or less in the middle of the screen. But besides that, the first thing you will need to do is toggle camera colour preview mode on, and here is where another small TalkBack trick might help.
Toggling camera colour preview mode on or off requires a swipe/flick right across the screen. You can do this with two fingers, but I have a TalkBack gesture assigned to pass the next bit of interaction directly through to the system - the double-tap-and-hold with four fingers gesture is set to have TalkBack ignore the next gesture - which means I can then just flick right with a single finger across the screen, and it's easy enough.
If you are not used to the vOICe, you might first want to toggle mute on, do this, and then toggle mute off again by double-tapping in the middle of the screen - you will get spoken feedback when exploring its screen interface.
Anyway, with Google Gemini live screen-sharing now active in the background, you can listen to the vOICe's audio rendition of the camera viewpoint and then ask Google Gemini to describe the imagery in detail, maybe toggling mute on again to hear its feedback clearly, and then toggling the vOICe's mute off again to carry on exploring.
And, yes, I was doing this while having paired my Samsung S24 Ultra with my hearing aids via Bluetooth, as Bluetooth earphones. Another note is that the vOICe runs offline, interpreting the visual input on the handset, which is why it can provide a form of real-time, almost instant feedback without you needing to constantly keep on asking Gemini about everything - you can wait until you hear what sounds like it might be an interesting bit of detail, and then ask it about that scene.
Working additionally with automating TalkBack's own screen recognition functionality
The next bit of an idea on this specific note related to wanting to get TalkBack to trigger its AI-based screen description, but to invoke it sort of automatically - this might work with other camera software where the camera preview renders on-screen as well - and it relates to the fact that in some parts of the world, we don't always get speedy response times from online AI agents.
Firstly, I went into Google TalkBack's settings and, under the customise gestures menu item, allocated what TalkBack refers to as a swipe-left-then-up two-part gesture, assigning it to invoke the "describe screen" functionality.
I then launched MacroDroid - a form of device-interaction automation software for the Android platform - created a new macro, and started off by assigning the shake-device activity as the trigger, including configuring a constraint lower down so that this would only activate if the vOICe was running in the foreground.
Then, under the actions that occur when the macro fires, I told it to execute a gesture sequence with two parts, using a starting point of 60% horizontally (X) and 60% vertically (Y): the first part moves to X = 30% with Y still at 60% (swipe left), and the second part then moves up to X = 30% and Y = 30% (swipe up) - in other words, simulating the swipe-left-then-up TalkBack gesture. I asked it to use 250 milliseconds to carry out the gestures - not sure if that relates to each part or to the whole sequence, but anyway.
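For anyone curious what that MacroDroid gesture playback roughly corresponds to at the Android level, here is a minimal Kotlin sketch of the same swipe-left-then-up stroke dispatched from an accessibility service. The service name is hypothetical, and running the whole path as a single 250 ms stroke is my assumption - MacroDroid's own internals may well differ:

```kotlin
import android.accessibilityservice.AccessibilityService
import android.accessibilityservice.GestureDescription
import android.graphics.Path
import android.view.accessibility.AccessibilityEvent

// Hypothetical service that replays the swipe-left-then-up gesture:
// start at 60%/60% of the screen, move left to 30%/60%, then up to 30%/30%.
class SwipeLeftThenUpService : AccessibilityService() {

    fun sendDescribeScreenGesture() {
        val dm = resources.displayMetrics
        val w = dm.widthPixels.toFloat()
        val h = dm.heightPixels.toFloat()

        // One continuous stroke: left, then up.
        val path = Path().apply {
            moveTo(0.60f * w, 0.60f * h)
            lineTo(0.30f * w, 0.60f * h)   // swipe left
            lineTo(0.30f * w, 0.30f * h)   // swipe up
        }
        val stroke = GestureDescription.StrokeDescription(path, 0L, 250L)
        val gesture = GestureDescription.Builder().addStroke(stroke).build()

        // The service needs android:canPerformGestures="true" in its config.
        dispatchGesture(gesture, null, null)
    }

    override fun onAccessibilityEvent(event: AccessibilityEvent?) {}
    override fun onInterrupt() {}
}
```

Because the coordinates are expressed as percentages of the display size, the same stroke should land in the right place regardless of the handset's resolution.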
Now, this is where I was hopeful that TalkBack would initiate the describe-screen functionality, so I then told MacroDroid to perform an 8-second wait and then automatically trigger the back button, in order to close the TalkBack description result.
Anyway, after accepting and saving the macro, I fired up the vOICe, used my other trick of ignoring the next gesture to activate camera colour preview, shook my Samsung S24, and, voila! - TalkBack described the camera viewpoint rendered as screen contents to me, and then reverted back to the vOICe's interaction interface.
I also made sure that this would work repeatedly. Your only issue might be volume levels, since if the soundscapes are running, you might need TalkBack to output its speech at a relatively higher volume to be heard describing the screen contents well enough.
So, that's my next bit of experimental activity for now - hope it's of interest to some.