HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
How to hear image descriptions in the Camera app on iPhone, iPad, and iPod touch — Apple Support

Apple Support · YouTube · 172 HN points · 1 HN comment
HN Theater has aggregated all Hacker News stories and comments that mention Apple Support's video "How to hear image descriptions in the Camera app on iPhone, iPad, and iPod touch — Apple Support".
YouTube Summary
With VoiceOver and Image Descriptions turned on, you can hear a description of what you’re taking pictures of in the Camera app.

To learn more about this topic visit the following articles:
Use VoiceOver in apps on iPhone: https://apple.co/3u3MaQ9
Turn on and practice VoiceOver on iPhone: https://apple.co/3vv43HG
Change your VoiceOver settings on iPhone: https://apple.co/3gL9pKZ

Additional Resources:

Contact Apple Support for iPhone: http://apple.co/iPhone

To subscribe to this channel: https://www.youtube.com/c/AppleSupport

To download the Apple Support app: http://apple.co/2hFtzIv

Apple Support on Twitter: https://twitter.com/AppleSupport

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
I like your canary; it reminds me of the Terence McKenna quote "The artist's task is to save the soul of mankind; and anything less is a dithering while Rome burns. Because if the artists - who are self-selected for being able to journey into the Other - if the artists cannot find the way, then the way cannot be found.".

If the hackers - who are self-selected for being able to journey into the Machine - cannot find the adventure, then the adventure cannot be found.

Things which have made me wow in more recent years:

- Large scale aggregation of data, specifically when Google started using data from phones around the planet to overlay live road traffic data on Maps, or when individual businesses are typically busy on different days of the week. Live map views of lightning ( https://lightningmaps.org/ ) or weather ( https://www.windy.com/ ). Things a skilled programmer might build (a map on a website) but which can't work without a global network of sensors.

- More continuously active sensors, brought about by specialist circuitry and low-power systems, e.g. a phone with raise-to-wake and a step tracker, or one which silences incoming calls when it detects unusually stressful manoeuvres in a car.

- JSLinux and v86 browser-based virtual machines. I know the tech of running a VM isn't particularly new, but the ability to boot a Linux/Windows VM with one click, in a browser almost everyone already has, without needing a cloud/container instance behind it, feels like it will bloom into a lot of new things over time.

- The first smartphone app to do OCR on the camera image, translate the text, and overlay the translated text live back onto the picture on screen. I forget its name now, and these days it's built into camera phones.

- Going back about 15 years, but when content aware scaling came in - https://en.wikipedia.org/wiki/Seam_carving

- Something else which is scale-related but not ML: when a cloud storage program like Dropbox hashes a file on your local machine, sends the hash to its servers, notices that someone else has already uploaded that file, and tags it into your account, so you can 'upload' non-unique files without the time or bandwidth of actually uploading them (sketched in code after this list).

- When Dropbox live-recompresses JPEGs with a lossless codec to save tens of petabytes of storage, then decompresses them back into JPEGs as people access them. https://blog.acolyer.org/2017/05/01/the-design-implementatio...

- Internet traffic hijacking by 'bitsquatting': registering domain names that are one flipped memory bit away from the correct name, then using the incoming traffic to estimate the global amount of flaky memory and cosmic-ray events (also sketched after this list): https://www.google.com/search?hl=en&q=squatting%20memory%20b...

- Deepfakes; they appeared as hype and then faded from hype, but the ability to synthesize one person's facial appearance and vocal mannerisms onto someone else is impressive.

- FPV drone flying with VR headsets. Coordinated light-drone displays in the sky instead of fireworks.

- ML related, but iPhone accessibility features can describe pictures, live while using the camera ( https://youtu.be/UnoeaUpHKxY?t=39 ) or in the photos app or on websites. Or apps like Audible Vision (e.g. https://youtu.be/QiEKMTTwTZg?t=377 ).

- ML related, Stable Diffusion, being able to generate visual scenes from text descriptions.
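
A minimal sketch of that hash-then-link upload flow, purely illustrative (Dropbox's real protocol uses block-level hashing and a private API; the server and account objects here are hypothetical stand-ins):

    import hashlib

    def upload(path, server, account):
        data = open(path, "rb").read()
        digest = hashlib.sha256(data).hexdigest()
        if server.has_blob(digest):               # hypothetical: identical bytes already stored
            server.link_blob(digest, account)     # "upload" finishes without sending the file
        else:
            server.store_blob(digest, data)       # first copy: actually transfer the bytes
            server.link_blob(digest, account)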
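
And a rough sketch of the bitsquatting idea, enumerating domain names one flipped bit away from a target (example.com is just an example; real campaigns also filter for names that are actually registrable):

    def bitsquats(domain):
        """Domain names that differ from `domain` by exactly one flipped bit."""
        raw = domain.encode("ascii")
        candidates = set()
        for i in range(len(raw)):
            for bit in range(8):
                b = raw[i] ^ (1 << bit)
                if b >= 128:                       # not ASCII any more, skip
                    continue
                name = (raw[:i] + bytes([b]) + raw[i + 1:]).decode("ascii").lower()
                # keep only candidates that still look like hostnames
                if name != domain and all(c.isalnum() or c in "-." for c in name):
                    candidates.add(name)
        return sorted(candidates)

    print(bitsquats("example.com"))   # e.g. ['axample.com', 'dxample.com', ...]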

> "Now, ML is a fresh breeze but it is lacking the "garage-ability" that computing has been famous for; good luck competing with the bigshots on that field."

Doug Miles is trying with LogicMoo ( https://www.youtube.com/watch?v=sdG6GVCwJrw ), building an AI that learns inside a virtual world instead of relying on big data and ML techniques.

generationP
Never heard of seam carving before. Not sure if it's used anywhere, but what a fun idea it is!

Lepton is interesting (as is Brotli), although it doesn't quite rise to the level of "opening gates to new worlds" that the innovations around 2000 did. Virtual machines were funny in that they implemented an idea from mid-century mathematical logic, but VMware was founded in 1998 and Cygwin (not quite a virtual machine, but similar) is even older.

Oh yeah, lots of innovation in spam, social engineering and trolling, but that's not computing per se :)

Well aware of the march of big data and "quantity becoming quality"-based services over the last 20 years. But Google Earth and Wikipedia started in 2001, and everything that came thereafter would mostly be less exciting and more closed-down. OpenStreetMap deserves a mention, not for innovation but for stealing the fire from the gods. Windy.com is a fresh breeze, too; good point.

Never found Dropbox exciting. Even git, which I love and use for 3 different purposes every day, just doesn't feel particularly novel. Maybe that's because torrenting (with all its deduplication, hash-indexing and various other innovations) had set my expectations so high long ago that everything that came after looked like the Dark Ages.

Drones... now that is some new ground. In hindsight, I feel stupid for forgetting them in my comment above!

May 14, 2021 · 172 points, 47 comments · submitted by colinprince
LeoPanthera
If you get stuck in VoiceOver, you can ask Siri to "turn off VoiceOver". With it turned on, the touch gestures are completely different from normal iPhone usage.

Quick guide: Single-tap to highlight a widget, double-tap to click, swipe with three fingers to scroll (after first highlighting the part of the screen you want to scroll). To go home, swipe up from the bottom and hold until it vibrates.

cbovis
Turning on VoiceOver to investigate this has to have been one of my worst decisions in life. Please do not repeat my mistake.
irq
Here's the cheat sheet to gestures you have to use to control your phone once it's turned on, which I guess was your problem. https://support.apple.com/en-au/guide/iphone/iph3e2e2329/ios
sarsway
Haha, I remember having to test apps with it and being completely confused in the beginning too. First thing I did was add a VoiceOver toggle to Control Center.

Once you get used to it, it's actually kinda amazing. Apple has put so much effort into iOS to make this work well, and there are a ton of features packed into VoiceOver mode.

Most people have no idea this even exists; if you're bored you should play around with it a bit more.

mimischi
The YouTube channel by Kristy Viers has been helpful in understanding how they use an iPhone and other devices. Here’s a video on VoiceOver in the Camera app: https://youtu.be/8CAafjodkyE
judge2020
That’s amazing. I wonder if constantly doing ML classification is more draining on the battery.
KMnO4
Probably balanced by the fact that you can turn the screen off completely when navigating with VO.

https://support.apple.com/en-ca/HT201443

jayd16
Word of warning: if you lock your phone and you're not familiar with the VoiceOver input, it can be really annoying to unlock it... Seems to turn off Face ID as well?

You can ask Siri to turn off voice over and all is well.

qzervaas
My phone is set to use the “accessibility shortcut” function, which maps triple-clicking the side button to toggling VoiceOver.

It's right down at the bottom of Settings > Accessibility.

Edit: you can also enable Screen Curtain in iOS (and watchOS!), which means your phone can operate with the screen completely off while in VoiceOver.

Good for privacy / battery!

jereees
If you have tagged people in photos from your camera roll it will also say “maybe (person name)” when it sees a match in the viewfinder. Pretty neat.
rconti
The cautions not to use it in navigation or circumstances where you might be harmed ("yellow car driving directly at you at a high rate of speed!") or diagnoses of medical conditions ("might want to get that wart looked at, dude") made me laugh.
qwertox
Also pretty cool for learning languages?

If it also showed the written text of what it says, fading in as each word is spoken and overlaid on the object it describes, that would help a lot.

tonylemesmer
Makes the phone virtually unusable if you're not familiar with VoiceOver gestures.
Operyl
Thankfully you can ask Siri to turn off VoiceOver; you can turn it on and off inside the Camera app this way.
BoorishBears
On my phone I get a pretty stern warning about that before enabling it
irq
Gestures are listed here https://support.apple.com/en-au/guide/iphone/iph3e2e2329/ios
ushkarev
Microsoft Seeing AI has similar features; might be of interest to visually impaired people as it’s made more specifically for identifying objects or reading documents
krackers
Sound recognition is also another very cool accessibility feature: https://www.loopinsight.com/2021/05/11/apple-support-how-to-...

Though I'm not sure if it comes at the cost of increased battery usage.

callalex
In my anecdotal experience it makes no difference on my 11 Pro. It presumably uses the same hardware that’s already awake all the time listening for “Hey Siri”.
etaioinshrdlu
It seems to basically be using an image captioning model like this one: https://deepai.org/machine-learning-model/neuraltalk , but possibly trained on more data? I wonder what dataset they’ve used, or if they created it themselves and won’t share it.
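
For a rough sense of what an off-the-shelf captioning model produces (this is not Apple's model; the Hugging Face pipeline and checkpoint below are just one publicly available example, and photo.jpg is a placeholder):

    # pip install transformers torch pillow
    from transformers import pipeline

    captioner = pipeline("image-to-text",
                         model="nlpconnect/vit-gpt2-image-captioning")
    print(captioner("photo.jpg")[0]["generated_text"])
    # e.g. "a child sitting at a table with a plate of food"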
simongr3dal
I have the newest iOS update on an iPhone 7, but I’m not seeing the VoiceOver Recognition menu item.

Are there any special requirements? There’s no mention of anything in the video description on YouTube or Apple’s own support site.

gnicholas
Is this powered by the same technology that comes up with results for queries like, “hey Siri, show me pictures of hamburgers”?

I have actually found this to be an effective way to find old photos on my phone.

Lammy
Neat. Is my phone classifying all this stuff, or is it sending all my photos to some Apple server to do it? I watched the video in the OP but it didn't go into any technical details.
spullara
I don't know of any ML stuff that Apple does serverside. That is why they have invested so much in ML processing on their devices.
user-the-name
I guess Siri does work at least partially serverside?
spullara
The ML part is local for the voice transcription, but the questions almost always need to be answered from the server. If you ask to play music or set a timer, for example, that is all local. It also works in Airplane Mode.
propogandist
Yet they can't manage offline Maps and directions.
yazaddaruvala
All ML on device definitely seems like a goal for Apple[1]. However, right now Siri doesn't work in airplane mode.

Maybe it's still too power-hungry to run locally? Maybe they still need the data from the requests server-side to improve the model?

I hope both Siri and Maps, and especially their integration, i.e. "Siri, navigate to ____" or "Siri, find restaurants nearby (or along the way)", work on device in the near future. It's annoying for the CarPlay integration to stop working when you're in a remote area and need the navigational help.

[1] https://appleinsider.com/articles/18/11/15/apple-considering...

alblue
Siri’s voice transcription works on your phone, but the answers it needs are (on the whole) all calculated on the server. Maps, for example, are stored server-side, and route calculations are server-side processes as well, as are the locations of restaurants.

There’s a difference between decoding voice requests and being able to answer any question you may have; you wouldn't expect Google web search or Wikipedia to work in offline mode, so why do you expect it to work for Siri?

yazaddaruvala
I don’t expect it to work :)

I expect that it will one day soon work offline. Likely not a full web search.

By 202X, while offline my device should have:

1. Enough storage for voice-to-text models, translation models, local Maps data auto-downloaded when I have reception (including restaurants, reviews, image thumbnails, etc.), Wikipedia, wallets (car keys, credit cards, concert/movie tickets), email, daily news, and the music/podcasts/YouTube that I’ve subscribed to. All of this should be automatically synced when I have reception (podcasts and music already do this).

2. Enough compute to translate voice to text (and between languages), set a simple timer/alarm, unlock my car, search through Wikipedia, search for restaurants, get directions between points on the map (without using traffic data), and open an app or play downloaded music/podcasts/YouTube (including the option to Chromecast without Wi-Fi, using Bluetooth (or something similar with high bandwidth) and NFC/UWB for simple pairing).

3. Auto-upgrade to use the network when it’s available, e.g. current traffic data for directions, or hi-res images/audio/video.

Bonus: When offline, apps like Spotify, Gmail, etc. should also offer some local search over downloaded data.

Bonus: When offline/online, restaurants should have NFC/UWB servers for their menus. My iPhone should just “download” the menu from the POS system. No need for cell signal or Wi-Fi.

spullara
1. Some of these are possible now: https://apps.apple.com/us/app/minipedia-offline-wikipedia/id.... 2. AirPlay doesn't require a Wi-Fi network, just Wi-Fi turned on; it uses the same thing as AirDrop.

Bonus 1: you can search your offline library in Spotify; local search for downloaded Spotify content works without a connection. Bonus 2: this used to be common back in the '00s, but it was generally decided to be mostly spam.

machello13
It's your phone.
banana_giraffe
It runs on the phone, no doubt using whatever model they use to enable the keyword searching in the camera roll. Just verified it all works in Airplane Mode.
cochne
I just tried this and it was really bad. It kept saying everything was a picture of the night sky; only one of many, many descriptions was correct.
cknoxrun
Interesting, I pointed it at my son and it immediately accurately described "a child sitting at a white table doing a jigsaw puzzle". I pointed at one of his Lego creations, and immediately "a multi-colour lego construction sitting on a wooden shelf". I was blown away.
supernova87a
Hmm, I wonder what the first thing most people are gonna try to have the image recognition recognize will be? Or at least 50% of the population...
Waterluvian
I tried to get VoiceOver working, but scrolling broke once it was enabled. It says to use three fingers to scroll, but that just won't work for me.
domoritz
Put three fingers next to each other on the screen and move them up to scroll down.
Waterluvian
Tried this over and over again. No luck. Weird!
LeoPanthera
You have to tap the part of the screen you want to scroll, first.
user-the-name
VoiceOver is a completely different UI. You need to learn how it works before attempting to use it.
Waterluvian
Turns out iOS was broken and a reboot made the three-finger scrolling work.
haddr
It looks like it doesn't work with non-English languages.
olliej
Well that sucks in this day and age.

Does VO in general not work for non-English?

dang
Url changed from https://www.loopinsight.com/2021/05/13/apple-support-how-to-..., which points to this.
LeoPanthera
The descriptions are shockingly detailed. I wish there were a way to add these descriptions to the metadata of all the photos in your library; it would be an amazing way to add rich, searchable data.

In particular, asking Siri to "show me photos of <x>", where x is a thing that the photo descriptions do recognize, doesn't always work.

ericholscher
Just so you know, you can search the Photos library with all sorts of ML-generated metadata (e.g. mountains).
LeoPanthera
Yes, but the things that the image descriptions recognize form a far longer and more detailed list than the searchable objects.
mapkkk
What blew me away is that when I used the front-facing camera it recognized my face as well. I presume it would recognize all the faces it has saved in the Photos app.
Waterluvian
I do this all the time in google photos. It's so #%^^ing incredible.

"Waterloo patios with friendname" - me looking for when I last saw that person remembering it was on a patio.

"Driveway or sidewalk"

"cottage docks"

"receipts"

"Memes with Ted Cruz fleeing to Mexico"

smoldesu
Same here. Blew me away when I realized you could do it a couple years ago!
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.