TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded devices! TensorFlow has always run on many platforms, from racks of servers to tiny devices, but as the adoption of machine learning models has grown over the last few years, so has the need to deploy them on mobile and embedded devices. TensorFlow Lite enables low-latency inference of on-device machine learning models.
Looking forward to your feedback as you try it out.
> Looking forward to your feedback as you try it out.
Thanks Rajat. We use typical Cortex-A9/A7 SoCs running plain Linux rather than Android. We would use it for inference.
1. Platform choice
Why make TFL Android/iOS only? TF works on plain Linux. TFL even uses the NDK, and it would appear the inference part could work on plain Linux.
2. Performance
I did not find any info on the performance of TensorFlow Lite; I'm mainly interested in inference performance. The tagline "low-latency inference" catches my eye, and I just want to know how low is low here. Milliseconds?
1. The code is standard C/C++ with minimal dependencies so it should be buildable on even non-standard platforms. Linux is easy.
2. The interpreter is optimized to have low overhead, and the kernels are better optimized, especially for ARM CPUs, currently. Performance varies by model, but we have seen significant improvements on most models going from TensorFlow to TensorFlow Lite. We'll share benchmarks soon.
> The code is standard C/C++ with minimal dependencies so it should be buildable on even non-standard platforms. Linux is easy.
Glad to hear that, Rajat. Since it is easy, as you say, I look forward to your upcoming release with Linux as a standard target. :-)
Also interested in answers to these two questions, as well as OpenCL performance on vanilla Linux (i.MX6 and above).
Will CoreML (or any hardware acceleration) on iOS be supported?
We want to provide a great experience across all our supported platforms, and are exploring ways to provide a simpler experience with good acceleration on iOS as well.
Whoa, this is cool. I've been waiting for this since you announced it. I was thinking about benchmarking it against other solutions. What do you think about similar frameworks like CoreML?
What tradeoffs did you make compared to the original?
A few tradeoffs we had to make:
- As mentioned below, FlatBuffers makes startup time faster while trading off some flexibility.
- Smaller code size means trading off dependencies on some libraries and broader coverage versus writing more things from scratch, focused on the use cases people care about.
Do you have performance/memory comparisons from using FlatBuffers vs. protobuf in TF? A quick writeup on how switching affected performance would be really interesting :)
FlatBuffers also uses less memory.
Using FlatBuffers, for one?
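Concretely, the startup win comes from FlatBuffers being readable in place: the model file is mmap'd and fields are decoded lazily, instead of being parsed and unpacked into objects up front the way a protobuf is. A rough sketch (the commented accessor is a placeholder for what flatc would generate from TF Lite's schema, not a real import):

    import mmap

    # Protobuf-style loading: parse and unpack the whole file into objects
    # before anything can be read (copies and allocations up front), e.g.
    #   graph = GraphDef()
    #   graph.ParseFromString(open("model.pb", "rb").read())

    # FlatBuffers-style loading: map the file and read fields in place,
    # so "loading" is essentially just the mmap call.
    with open("model.tflite", "rb") as f:
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # model = Model.GetRootAsModel(buf, 0)  # hypothetical flatc-generated accessor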
Hi, I have written about this before (https://news.ycombinator.com/item?id=15595689), but are there serialization fixes between cloud training and mobile?
We have had huge issues trying to figure out how to save models (freeze_graph, etc.) and load them on Android. If you look at my previous thread, it also mentions bugs, threads, and support requests where people are consistently confused.
Agreed, that is a big problem that we are working hard to solve. It isn't solved in this release, but it is high up on our task list.
hey, thanks for the reply!
petewarden (https://news.ycombinator.com/item?id=15596990) from Google is also working on this, so I'm really hopeful you guys will have something soon. This is a serious blocker for doing anything reasonable in TF.
Will this leverage the Pixel Visual Core SoC on a Pixel 2 device?
This release of TensorFlow Lite doesn't leverage the Pixel Visual Core. We will explore the different hardware options available to us in the future.
TF Lite supports the Android NN API, which allows each phone to accelerate these models by leveraging whatever custom accelerator it has.
What about using XLA to compile libraries for mobile deployment fusing only the operations needed by the model?
One nice thing about Lite is that it's a lot easier to just include the operations you need (compared to TensorFlow 'classic'), there's fusion for common patterns, and the base interpreter is only 70KB. That covers a lot of the advantages of using XLA for mobile apps. In return you have the ability to load models separately from the code, and the ops are hand-optimized for ARM.
I'm still a fan of XLA, and I expect the two will grow closer over time, but I think Lite is better for a lot of scenarios on mobile.
What about quantization? Does TensorFlow Lite perform the quantization, or is TensorFlow supposed to do it? Is it an iterative process or straightforward? Or are you training quantized models, as the NN API docs say?
The quantization is done with a special training script that is quantization-aware. We will be open-sourcing a quantized MobileNet training script to show how to do this soon.
TensorFlow Lite is an interpreter, in contrast with XLA, which is a compiler. The advantage of TensorFlow Lite is that a single interpreter can handle several models rather than needing specialized code for each model and each target platform. TensorFlow Lite's core kernels have also been hand-optimized for common machine learning patterns. The advantage of compiler approaches is fusing many operations to reduce memory bandwidth (and thus improve speed); TensorFlow Lite fuses many common patterns in its converter instead. We are of course excited about the possibility of using JIT techniques and XLA technology within the TensorFlow Lite interpreter, or as part of the TensorFlow Lite converter, as a possible future direction.
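For concreteness, converting a graph looked roughly like this in the developer preview (a sketch; the converter lived under tf.contrib.lite at the time, and the exact module path and signature may differ between releases):

    import tensorflow as tf

    # A toy graph to convert; in practice this would be a trained, frozen model.
    img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
    out = tf.identity(img * 2.0 + 1.0, name="out")

    with tf.Session() as sess:
        # The converter fuses common patterns and writes the FlatBuffer format.
        tflite_model = tf.contrib.lite.toco_convert(sess.graph_def, [img], [out])

    with open("model.tflite", "wb") as f:
        f.write(tflite_model)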
Is it lite enough to compile with Emscripten and use via WebAssembly?
This should be possible, but we haven't tried it. We're likely going to add a simplified target that has minimal dependencies (like no Eigen) that allows building on simple platforms.
Cool. I have something else that uses Eigen in WebAssembly, so that hasn't caused any issues btw.
So it uses Bazel on Android....
Google devs, could you please get yourselves together in one room and agree on ONE BUILD SYSTEM for Android?!?
Gradle stable, CMake, ndk-build, the unstable Gradle plugin, GN, Bazel, ..., whatever someone else does with their 20%.
I keep collecting build systems just to build Google stuff for Android.
It might be a tad annoying, but the rest of TensorFlow uses Bazel, so it makes sense that TensorFlow Lite also uses it. It also probably matches the internal Google workflow better, since Google uses Blaze internally.
I thought it was to be used outside Google, not that we have to learn every single build system they happen to use inside.
If all Google teams who make use of external build systems were going to agree on one (not likely), it would be Bazel.
Why would I use this for iOS when I can use CoreML and convert TensorFlow into a CoreML model where there is already native support?
Native support for tensorflow? I don’t think so...
https://developer.apple.com/documentation/coreml/converting_...
CoreML doesn't actually support TensorFlow. Its support for TensorFlow is only through Keras, which is fine if you just want to build stock-standard models, but if you're doing crazy research implementations then that's not going to work.
It's all in the converter tool: if the converter can get the TF file into a .mlmodel properly, then it will be supported. Inside it's just a bunch of weights, layers, and parameters. We just need a proper script to translate it.
"just a bunch of weights and layers and parameters" -- I think you and the GP are agreeing. That's the definition of standard: If the model can be expressed using the currently-blessed set of layer definitions in CoreML, then yes. But if you're doing nonstandard stuff with weird control flow behavior, or RNNs that don't map into some of the common flavors, then all bets are off.
An example: some of my colleagues put a QP solver in tandem with a DNN, so that the neural network could 'shell out' to the solver as part of its learning, and it learned to solve small Sudoku problems from examples alone: https://arxiv.org/abs/1703.00443. The PyTorch code for it is one of the examples I like to use as a stress test for doing funky things in the machine learning context.
TensorFlow is a very generic dataflow library at its heart, which happens to have a lot of DNN-specific functionality as ops. It's possible to express arbitrary computations in it, whereas CoreML and similar frameworks make more assumptions that the computation will fit a particular mould, and optimize accordingly.
Looks like you are right: CoreML only supports three kinds of DNNs (feedforward, convolutional, recurrent). I suppose capsule nets wouldn't fit any of those if they were implemented in TF.
How does this differ from uTensor? Does it make uTensor redundant?
https://github.com/neil-tan/uTensor
We developed TensorFlow Lite to be small enough to target really small devices that lack MMUs, like the ARM Cortex-M MCU series, but we haven't done the actual work to target those devices. That being said, we are excited to see the ecosystem and community around machine learning expand.
Cortex-M compatibility was literally my first thought when I read this -- especially low-memory systems. Might have to hack it up myself.
Would that be a viable option to deploy TensorFlow models on serverless environments (Lambda, Functions)?
You can deploy TensorFlow model binaries as serverless APIs on Google Cloud ML Engine [1]. But I would also be interested in seeing a TensorFlow Lite implementation.
[1] https://cloud.google.com/ml-engine/docs/deploying-models
Disclaimer: I work for Google Cloud.
Thanks, @rasmi. I have some feedback for you guys: the pricing for prediction inference on GCP is not very fair. If I deploy a small model (like SqueezeNet or MobileNet), I pay almost the same price as someone deploying a large model (like ResNet or VGG). That's why I'm deploying my models on serverless environments and paying about 5 dollars per 1 million inferences.
The pricing of GCP is: $0.10 per thousand predictions, plus $0.40 per hour. That’s more than 100 dollars for 1 million inferences.
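Spelling out the arithmetic behind that number (per-prediction charge only; the hourly compute charge comes on top):

    # $0.10 per 1,000 predictions
    predictions = 1000000
    print(predictions / 1000.0 * 0.10)  # -> 100.0 dollars, before the hourly charge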
I see what you mean. To some companies, ML Engine's cost as a managed service may be worth it. To others, spinning up a VM with TensorFlow Serving on it is worth the cost savings. If you've taken other approaches to serving TensorFlow models to get around ML Engine's per-prediction cost, I'm curious to hear about them.
The main TensorFlow interpreter provides a lot of functionality for larger machines like servers (e.g. desktop GPU support and distributed support). Of course, TensorFlow Lite does run on standard PCs and servers, so using it on non-mobile/small devices is possible. If you wanted to create a very small microservice, TensorFlow Lite would likely work, and we'd love to hear about your experiences if you try this.
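If a Python binding to the interpreter is available in your build (it is exposed as tf.lite.Interpreter in later releases; the module path may differ in this preview), a minimal inference call looks roughly like this, assuming a float32 input:

    import numpy as np
    import tensorflow as tf

    # Load a converted model and run a single inference; this is roughly the
    # whole runtime surface a small inference microservice would need.
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    x = np.zeros(input_details[0]["shape"], dtype=np.float32)
    interpreter.set_tensor(input_details[0]["index"], x)
    interpreter.invoke()
    result = interpreter.get_tensor(output_details[0]["index"])
    print(result.shape)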
Thanks for the answer. Currently I'm using AWS Lambda to deploy my TensorFlow models, but it's pretty hard and hacky. I need to strip out a considerable portion of the codebase that isn't needed for inference-only routines, both so the code loads faster and to fit within the deployment package size limit. If TensorFlow Lite already has a small footprint, it may be much easier to deploy to a serverless environment. I'll be trying it in my next deployments.
Sounds really interesting. We're excited to hear how that goes.
Is it possible that a future version may be able to leverage CoreML on iOS?
With TensorFlow and TF Lite we are looking to provide a great experience across all platforms, and are exploring ways to provide a simpler experience with good acceleration on iOS as well.
Someone could just write a CoreML conversion tool. There may already be one that converts the non-Lite version straight to CoreML.
But on iOS we still cannot use Swift with that; see https://github.com/tensorflow/tensorflow/issues/19. By the way, what about Kotlin?
UPDATE: It seems some third-party developers have built Swift-compatible APIs.
On iOS, does TensorFlow Lite utilize the GPU for inference when needed or is it CPU only?
If so, does it use OpenCL or something?
Is this the next iteration of TensorFlow for Mobile? Is on-device training something planned for the future?
Yes to your first question, from the article: ”As you may know, TensorFlow already supports mobile and embedded deployment of models through the TensorFlow Mobile API. Going forward, TensorFlow Lite should be seen as the evolution of TensorFlow Mobile, and as it matures it will become the recommended solution for deploying models on mobile and embedded devices.”
Also check out this post for more info and examples: https://research.googleblog.com/2017/11/on-device-conversati...
This is pretty Android/iPhone-only; I wish it were flexible enough to be used on other edge devices such as home routers or other embedded products where resources are constrained.
The current examples talk about Android/iPhone; however, the core runtime is pretty lightweight, with the goal of supporting all kinds of embedded products.
Do let us know if you build/run on other platforms.
I was hoping this link might be to a version of TensorFlow that sheds the heavyweight Java dependency for building. Sadly not; still Bazel-infested.
How does this compare to using XLA for AOT compilation?
XLA for AOT is useful for cases where you know exactly what architecture you are shipping to and are OK with updating the code whenever the model changes.
TF Lite addresses the segment where you need more flexibility:
- you ship a single app to many types of devices;
- you would like to update the model independently of the code itself, e.g. pushing a new model over the wire with no change to the Android APK.
Even with this generality, TF Lite is still quite fast and lightweight, as that was the focus while building it.
How do I know which handsets or tablets have "New hardware specific to neural networks processing" for the NNAPI?
Is the Lite converter also doing some sort of quantization, or is it purely for file-format conversion?
TensorFlow has supported quantization for a long time (and it is recommended for mobile devices), so it very likely does.
Quantization comes in many different forms. TensorFlow Lite provides optimized kernels for 8-bit uint quantization. This specific form of evaluation is not directly supported in TensorFlow right now (though TensorFlow can train such a model). We will be releasing training scripts that show how to set up such models for evaluation.
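As a numerical sketch of what 8-bit affine (uint8) quantization means (generic math with made-up scale/zero-point values, not the exact parameters TF Lite's converter or kernels choose):

    import numpy as np

    def quantize_uint8(x, scale, zero_point):
        # Affine quantization: q = round(x / scale) + zero_point, clamped to [0, 255].
        q = np.round(x / scale) + zero_point
        return np.clip(q, 0, 255).astype(np.uint8)

    def dequantize(q, scale, zero_point):
        # Recover an approximation of the original float values.
        return scale * (q.astype(np.float32) - zero_point)

    # Map floats in roughly [-1, 1] onto uint8.
    scale, zero_point = 2.0 / 255.0, 128
    w = np.array([-1.0, -0.25, 0.0, 0.5, 1.0], dtype=np.float32)
    qw = quantize_uint8(w, scale, zero_point)
    print(qw)                                 # [  0  96 128 192 255]
    print(dequantize(qw, scale, zero_point))  # close to the original values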
What are the minimum requirements? Would something like an ARM Cortex-M4F at 72 MHz with 512 KB of RAM work?
I would like to know how small an Inception-V3 model becomes when converted into the .tflite format.
Didn't they announce this at Google I/O, where it was supposed to be available that day?
Definitely announced at I/O, but all the language I'm finding from around that time is of the "want to" and "will" variety, like this Wired piece:
https://www.wired.com/2017/05/google-really-wants-put-ai-poc...
> “Google won't say much more about this new project. But it has revealed that TensorFlow Lite will be part of the primary TensorFlow open source project later this year”
How does this relate to other hardware beyond iOS / JavaScript, i.e. Raspberry Pi, NVIDIA Jetson, etc.? And what's the likelihood of libraries that sit on top of TF, like Keras and PyTorch, supporting this? Just some questions that spring to mind.
I'm wondering if TF has something like PyTorch's autograd. Does anyone know?
I only just briefly read the doc for autograd, but automatic differentiation is the strong default in TF if that's what you're asking.
Yes, it has had automatic differentiation from day one. There's also a new autograd-like functional API as part of eager execution. See https://research.googleblog.com/2017/10/eager-execution-impe...
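A quick graph-mode sketch of what that looks like with tf.gradients (reverse-mode automatic differentiation over the graph):

    import tensorflow as tf

    x = tf.placeholder(tf.float32)
    y = x * x + 3.0 * x

    # Builds the gradient graph for dy/dx symbolically.
    dy_dx, = tf.gradients(y, [x])

    with tf.Session() as sess:
        print(sess.run(dy_dx, feed_dict={x: 2.0}))  # 2*x + 3 at x=2 -> 7.0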
Any React Native APIs?
Not at this time. However, in principle it would be possible to create such bindings.
It's spelled "light" - you'd think Google could hire an editor.
So we'll start to see more and more battery-consuming "AI" apps in mobile devices?
And we'll start to see more battery-efficient hardware to run those apps without consuming all of your battery. :)
(I'm saying that glibly, but I'm dead serious -- look at what we've seen emerge just this year in Apple's Neural Engine, the Pixel Visual Core, rumored chips from Qualcomm, and the Movidius Myriad 2. The datacenter was the first place to get dedicated DNN accelerators in the form of Google's TPU, but the phones -- and even smaller devices, like the "clips" camera -- are the clear next spot. And this is why, for example, TensorFlow Lite can call into the Android NN API to take advantage of local accelerators as they evolve.)
Being able to run locally, if battery life is preserved, is a huge win in latency, privacy, potentially bandwidth, etc. It'll be good, though it does need advances in both the HW and the DNN techniques (things like Mobilenet, but we need far more).
Thank you.
Yes.