TensorFlow.js: Machine Learning for the Web and Beyond
Description: TensorFlow.js is a TensorFlow-based JavaScript library from Google. It is convenient for developers who use JS, and can support edge computing in the future.

[...] acceleration, notably TensorFire (Kwok et al., 2017), Propel (built on top of TensorFlow.js) (Dahl, 2017), and Keras.js (Chen, 2016); however, they are no longer actively maintained.

WebDNN (Hidaka et al., 2017) is another deep learning library in JS that can execute pretrained models developed in TensorFlow, Keras, PyTorch, Chainer, and Caffe. To accelerate computation, WebDNN uses WebGPU (Jackson, 2017), a technology initially proposed by Apple. WebGPU is in an early exploratory stage and currently only supported in Safari Technology Preview, an experimental version of the Safari browser. As a fallback for other browsers, WebDNN uses WebAssembly (Haas et al., 2017), which enables execution of compiled C and C++ code directly in the browser. While WebAssembly has support across all major browsers, it lacks SIMD instructions, a crucial component needed to make it as performant as WebGL and WebGPU.

[Figure 1. Overview of the TensorFlow.js architecture: the Layers API sits on top of the Ops API (eager), which runs on the browser runtime (WebGL) or the Node.js runtime (TF CPU, TF GPU, TF TPU).]

3 DESIGN AND API

The goals of TensorFlow.js differ from other popular ML libraries in a few important ways. Most notably, TensorFlow.js was designed to bring ML to the JS ecosystem, empowering a diverse group of JS developers with limited or no ML experience (Anonymous, 2018). At the same time, we wanted to enable experienced ML users and teaching enthusiasts to easily migrate their work to JS, which necessitated wide functionality and an API that spans multiple levels of abstraction. These two goals are often in conflict, requiring a fine balance between ease-of-use and functionality. Lastly, as a new library with a growing user base, missing functionality was prioritized over performance.

These goals differ from popular deep learning libraries (Abadi et al., 2016; Paszke et al., 2017), where performance is usually the number one goal, as well as other JS ML libraries (see Section 2.3), whose focus is on simplicity over completeness of functionality. For example, a major differentiator of TensorFlow.js is the ability to author and train models directly in JS, rather than simply being an execution environment for models authored in Python.
3.1 Overview

The API of TensorFlow.js is largely modeled after TensorFlow, with a few exceptions that are specific to the JS environment. Like TensorFlow, the core data structure is the Tensor. The TensorFlow.js API provides methods to create tensors from JS arrays, as well as mathematical functions that operate on tensors.

Figure 1 shows a high-level schematic view of the architecture. TensorFlow.js consists of two sets of APIs: the Ops API, which provides lower-level linear algebra operations (e.g. matrix multiplication, tensor addition, etc.), and the Layers API, which provides higher-level model building blocks and best practices with emphasis on neural networks. The Layers API is modeled after the tf.keras namespace in TensorFlow Python, which is based on the widely adopted Keras API (Chollet et al., 2015).

TensorFlow.js is designed to run in-browser and server-side, as shown in Figure 1. When running inside the browser, it utilizes the GPU of the device via WebGL to enable fast parallelized floating point computation. In Node.js, TensorFlow.js binds to the TensorFlow C library, enabling full access to TensorFlow. TensorFlow.js also provides a slower CPU implementation as a fallback (omitted in the figure for simplicity), implemented in plain JS. This fallback can run in any execution environment and is automatically used when the environment has no access to WebGL or the TensorFlow binary.

3.2 Layers API

Beginners and others who are not interested in the operation-level details of their model might find the low-level operations API complex and error prone. The widely adopted Keras library (Chollet et al., 2015), on the other hand, provides higher-level building blocks with emphasis on deep learning. With its carefully thought out API, Keras is popular among deep learning beginners and applied ML practitioners. At the heart of the API are the concepts of a model and layers. Users can build a model by assembling a set of pre-defined layers, where each layer has reasonable default parameters to reduce cognitive load.

For these reasons, TensorFlow.js provides the Layers API, which mirrors the Keras API as closely as possible, including the serialization format. This enables a two-way door between Keras and TensorFlow.js: users can load a pretrained Keras model (see Section 5.1) in TensorFlow.js, modify it, serialize it, and load it back in Keras Python.

Listing 1 shows an example of training a model using the Layers API.

    // A linear model with 1 dense layer.
    const model = tf.sequential();
    model.add(tf.layers.dense({
      units: 1, inputShape: [1]
    }));

    // Specify the loss and the optimizer.
    model.compile({
      loss: 'meanSquaredError',
      optimizer: 'sgd'
    });

    // Generate synthetic data to train.
    const xs = tf.tensor2d([1, 2, 3, 4], [4, 1]);
    const ys = tf.tensor2d([1, 3, 5, 7], [4, 1]);

    // Train the model using the data.
    model.fit(xs, ys).then(() => {
      // Do inference on an unseen data point
      // and print the result.
      const x = tf.tensor2d([5], [1, 1]);
      model.predict(x).print();
    });

Listing 1. An example TensorFlow.js program that shows how to build a single-layer linear model with the Layers API, train it with synthetic data, and make a prediction on an unseen data point.

3.3 Operations and Kernels

As in TensorFlow, an operation represents an abstract computation (e.g. matrix multiplication) that is independent of the physical device it runs on. Operations call into kernels, which are device-specific implementations of mathematical functions, which we go over in Section 4.
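The operation/kernel split above can be sketched in a few lines of plain JS. This is an illustrative sketch only, not the actual TensorFlow.js internals; the names `kernelRegistry`, `registerKernel`, and `runOp` are assumptions made for the example. The point is that the op is device-independent and merely dispatches to whichever backend-specific kernel is registered.

```javascript
// Sketch: device-independent ops dispatching to backend-specific kernels.
// (Hypothetical helper names; not the real TensorFlow.js registry.)
const kernelRegistry = new Map(); // key: `${backend}.${opName}`

function registerKernel(backend, opName, fn) {
  kernelRegistry.set(`${backend}.${opName}`, fn);
}

let activeBackend = 'cpu';

// The op itself knows nothing about devices: it only looks up a kernel.
function runOp(opName, ...args) {
  const kernel = kernelRegistry.get(`${activeBackend}.${opName}`);
  if (!kernel) throw new Error(`No kernel for ${opName} on ${activeBackend}`);
  return kernel(...args);
}

// A plain-JS "cpu" kernel for element-wise addition.
registerKernel('cpu', 'add', (a, b) => a.map((v, i) => v + b[i]));

const sum = runOp('add', [1, 2, 3], [10, 20, 30]); // → [11, 22, 33]
```

A WebGL or Node.js backend would register its own `add` under a different backend name, and switching `activeBackend` would reroute every op without touching user code.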
3.4 Backends

To support device-specific kernel implementations, TensorFlow.js has a concept of a Backend. A backend implements kernels as well as methods such as read() and write(), which are used to store the TypedArray that backs the tensor. Tensors are decoupled from the data that backs them, so that operations like reshape and clone are effectively free. This is achieved by making shallow copies of tensors that point to the same data container (the TypedArray). When a tensor is disposed, we decrease the reference count of the underlying data container, and when there are no remaining references, we dispose the data container itself.

3.5 Automatic differentiation

Since wide functionality was one of our primary design goals, TensorFlow.js supports automatic differentiation, providing an API to train a model and to compute gradients. The two most common styles of automatic differentiation are graph-based and eager. Graph-based engines provide an API to construct a computation graph and execute it later. When computing gradients, the engine statically analyzes the graph to create an additional gradient computation graph. This approach is better for performance and lends itself easily to serialization.

Eager differentiation engines, on the other hand, take a different approach (Paszke et al., 2017; Abadi et al., 2016; Maclaurin et al., 2015). In eager mode, the computation happens immediately when an operation is called, making it easier to inspect results by printing or using a debugger. Another benefit is that all the functionality of the host language is available while your model is executing; users can use native if and while loops instead of specialized control flow APIs that are hard to use and produce convoluted stack traces.

Due to these advantages, eager-style differentiation engines like TensorFlow Eager (Shankar & Dobson, 2017) and PyTorch (Paszke et al., 2017) are rapidly gaining popularity. Since an important part of our design goals is to prioritize ease-of-use over performance, TensorFlow.js supports the eager style of differentiation.
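The tensor/data-container decoupling described in Section 3.4 can be sketched as follows. This is a simplified model, not the real TensorFlow.js classes; `DataContainer` and the `refCount` field are illustrative names. It shows why reshape is effectively free (a shallow copy sharing one TypedArray) and how dispose() uses reference counting.

```javascript
// Sketch: tensors as shallow views over a reference-counted data container.
class DataContainer {
  constructor(values) {
    this.values = values;   // the TypedArray backing one or more tensors
    this.refCount = 0;
    this.disposed = false;
  }
}

class Tensor {
  constructor(container, shape) {
    this.container = container;
    this.shape = shape;
    container.refCount++;
  }
  reshape(newShape) {
    // No data copy: the new tensor points at the same container.
    return new Tensor(this.container, newShape);
  }
  dispose() {
    if (--this.container.refCount === 0) {
      // A real backend would free the texture/TypedArray here.
      this.container.disposed = true;
    }
  }
}

const data = new DataContainer(new Float32Array([1, 2, 3, 4, 5, 6]));
const a = new Tensor(data, [2, 3]);
const b = a.reshape([3, 2]);          // effectively free: same container

a.dispose();                          // b still holds a reference
const aliveAfterOne = !data.disposed; // → true
b.dispose();                          // last reference gone: container disposed
```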
3.6 Asynchronous execution

JS runs in a single thread, shared with tasks like page layout and event handling. This means that long-running JS functions can cause page slowdowns or delays for handling events. To mitigate this issue, JS users rely on event callbacks and promises, essential components of the modern JS language. A prominent example is Node.js, which relies on asynchronous I/O and event-driven programming, allowing the development of high-performance, concurrent programs.

However, callbacks and asynchronous functions can lead to complex code. In service of our design goal to provide intuitive APIs, TensorFlow.js aims to balance the simplicity of synchronous functions with the benefits of asynchronous functions. For example, operations like tf.matMul() are purposefully synchronous and return a tensor whose data might not be computed yet. This allows users to write regular synchronous code that is easy to debug. When the user needs to retrieve the data that is backing a tensor, we provide an asynchronous tensor.data() function, which returns a promise that resolves when the operation is finished. Therefore, the use of asynchronous code can be localized to a single data() call. Users also have the option to call tensor.dataSync(), which is a blocking call. Figures 2 and 3 illustrate the timelines in the browser when calling tensor.dataSync() and tensor.data() respectively.

[Figure 2. The timeline of a synchronous and blocking tensor.dataSync() in the browser (CPU: matmul, add, relu, dataSync; GPU: matmul, add, relu, readPixels). The main thread blocks until the GPU is done executing the operations.]

[Figure 3. The timeline of an asynchronous call to data() in the browser. The main thread is released while the GPU is executing the operations, and the data() promise resolves when the tensor is ready and downloaded.]
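The synchronous-op / asynchronous-read pattern above can be modeled in plain JS. In this sketch a lazy thunk stands in for the pending GPU program; `LazyTensor` and `addOp` are hypothetical names, not the TensorFlow.js implementation. Ops return a handle immediately, `dataSync()` forces the computation on the spot, and `data()` resolves a promise with the same values.

```javascript
// Sketch: ops return immediately; the backing data is computed on demand.
class LazyTensor {
  constructor(compute) {
    this.compute = compute; // pending computation (stand-in for a GPU program)
    this.result = null;
  }
  dataSync() {
    // Blocking-style read: force the pending computation now.
    if (this.result === null) this.result = this.compute();
    return this.result;
  }
  data() {
    // Async read: resolves once the computation has run.
    return Promise.resolve().then(() => this.dataSync());
  }
}

// An "op" in the style of tf.matMul(): synchronous, returns a handle to a
// tensor whose data might not be computed yet.
function addOp(a, b) {
  return new LazyTensor(() => a.map((v, i) => v + b[i]));
}

const t = addOp([1, 2], [3, 4]); // returns immediately
const values = t.dataSync();     // → [4, 6]
```

As in TensorFlow.js, the asynchrony is localized: all the surrounding code stays ordinary synchronous JS, and only the final read needs `await t.data()`.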
3.7 Memory management

JS provides automatic garbage collection. However, in the browser, WebGL memory is not automatically garbage collected. Because of this, and the lack of finalization, we expose an API for all backends to explicitly manage memory. To dispose the memory allocated by a tensor, users can call tensor.dispose(). This approach is relatively straightforward, but the user has to have a reference to all tensor objects so they can be disposed. Often models are written as chained blocks of operations, so breaking up the chains for disposal can be cumbersome. Since tensors are immutable and operations are functional, a single op call can allocate a significant number of intermediate tensors. Forgetting to dispose these intermediate tensors results in memory leaks and slows down the application significantly.

TensorFlow.js offers an alternative approach. Since functions are first-order citizens in JS, and a large portion of the native JS API uses functions as arguments, we decided to provide a scoping mechanism where the user can wrap any synchronous function f by calling tf.tidy(() => f()). This results in calling f immediately, and disposing all intermediate tensors created inside once f finishes, except for the return result of f. We use this mechanism extensively in our library. Users of the Layers API do not need explicit memory management due to model-level APIs such as model.fit(), model.predict(), and model.evaluate(), which internally manage memory.

3.8 Debugging and profiling

TensorFlow.js provides a rich set of debugging tools to help developers understand common problems with performance and numerical stability, accessible either via a URL change or a feature flag. Users can profile every kernel that gets called, seeing the output shape, memory footprint, as well as device-specific timing information. In this mode, every tensor gets downloaded from the GPU and is checked for NaNs, throwing an exception at the first line where a NaN is introduced, showing model developers which operation is the source of the numerical instability.

TensorFlow.js also provides tf.time(f) for timing a function that calls TensorFlow.js operations. When calling tf.time(f), the function f will be executed and timed. Each backend is responsible for timing functions, as timing may be device specific. For example, the WebGL backend measures the exact GPU time, excluding time for uploading and downloading the data.

A more generic API, tf.profile(f), similarly takes a function f and returns an object representing the function's effect on memory. The object contains the number of newly allocated tensors and bytes created by executing the function, as well as the peak tensors and bytes allocated inside the function. Understanding peak memory usage is especially important when running on devices with limited memory, such as mobile phones.

3.9 Performance

While performance was not the single most important goal, it was critical in enabling real-world ML in JS. In the browser, TensorFlow.js utilizes the GPU using the WebGL API to parallelize computation. By using WebGL for numerical computation, we were able to achieve 2 orders of magnitude speedup, which is what fundamentally enabled running real-world ML models in the browser. On the server side, TensorFlow.js binds directly to the TensorFlow C API, which takes full advantage of native hardware acceleration.

Table 1 shows the speedups of these implementations relative to the plain JS CPU counterpart. We measure a single inference of MobileNet v1 1.0 (Howard et al., 2017) with an input image of size 224x224x3, averaged over 100 runs. All measurements, other than those mentioning GTX 1080, are measured on a MacBook Pro 2014 laptop, while the GTX 1080 measurements are done on a desktop machine. Note that the WebGL and the Node.js CPU backends are two orders of magnitude faster than the plain JS backend, while those utilizing the capable GTX 1080 graphics card are three orders of magnitude faster.

    Backend                    Time (ms)   Speedup
    Plain JS                   3426        1x
    WebGL (Intel Iris Pro)     49          71x
    WebGL (GTX 1080)           5           685x
    Node.js CPU w/ AVX2        87          39x
    Node.js CUDA (GTX 1080)    3.1         1105x

Table 1. Speedups of the WebGL and Node.js backends over the plain JS implementation. The time shows a single inference of MobileNet v1 1.0 (Howard et al., 2017), averaged over 100 runs.

Since the launch of TensorFlow.js, we have made significant progress on improving our WebGL utilization. One notable improvement is packing, where we store floating point values in all 4 channels of a texel (instead of using only 1 channel). Packing resulted in a 1.3-1.4x speedup of models such as PoseNet (Oved, 2018) across both mobile and desktop devices.

While we will continue to work on our WebGL implementation, we observed a 3-10x gap in performance between WebGL and CUDA. We believe the gap to be due to WebGL's lack of work groups and shared memory access, benefits provided by general-purpose computing (GPGPU) frameworks like CUDA (Nickolls et al., 2008) and OpenGL Compute shaders (Shreiner et al., 2013). As we discuss below in Section 4.3, we believe that the upcoming WebGPU (Jackson, 2017) standard is a promising avenue for bridging the gap in performance.
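The tf.tidy() scoping mechanism described in Section 3.7 can be sketched in plain JS. This is a simplified, hypothetical implementation (the scope stack and `isDisposed` flag are illustrative, not the real engine): every tensor created while f runs is tracked, and once f returns, everything except its return value is disposed.

```javascript
// Sketch: a tidy() scope that disposes intermediates but keeps the result.
const scopeStack = [];

class Tensor {
  constructor(values) {
    this.values = values;
    this.isDisposed = false;
    const scope = scopeStack[scopeStack.length - 1];
    if (scope) scope.push(this); // register with the innermost tidy scope
  }
  dispose() { this.isDisposed = true; }
}

function tidy(f) {
  scopeStack.push([]);
  let result;
  try {
    result = f(); // f runs immediately; it must be synchronous
  } finally {
    const created = scopeStack.pop();
    for (const t of created) {
      if (t !== result) t.dispose(); // keep only the returned tensor
    }
  }
  return result;
}

// Chained ops allocate intermediates; tidy() cleans them up automatically.
let intermediate;
const out = tidy(() => {
  intermediate = new Tensor([1, 2, 3]);
  return new Tensor(intermediate.values.map((v) => v * 2));
});
// intermediate.isDisposed → true; out.isDisposed → false
```

The real tf.tidy() additionally handles nested scopes, tensors returned inside arrays or objects, and forbids returning promises; this sketch only shows the core bookkeeping.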
4 IMPLEMENTATION

This section describes the specific constraints and implementations of the various backends that are supported by TensorFlow.js.

4.1 Browser and WebGL

With the advent of deep learning and scientific computing in general, and advances in modern GPU architectures, the use of GPGPU has grown tremendously. While modern JS virtual machines can optimize plain JS extensively, its performance is far below the computational power that GPUs provide (see Table 1). In order to utilize the GPU, TensorFlow.js uses WebGL, a cross-platform web standard providing low-level 3D graphics APIs. Unlike OpenCL and CUDA, the WebGL API is based on the OpenGL ES specification (Shreiner et al., 2013), which has no explicit support for GPGPU.

Among the three TensorFlow.js backends, the WebGL backend has the highest complexity. This complexity is justified by the fact that it is two orders of magnitude faster than our CPU backend written in plain JS. The realization that WebGL can be re-purposed for numerical computation is what fundamentally enabled running real-world ML models in the browser.

To work around the limitations and the complexities of WebGL, we wrote a layer of abstraction called the GPGPUContext, which executes WebGL fragment shaders representing computation. In a graphics program, fragment shaders are typically used to generate the colors for the pixels to be rendered on the screen. Fragment shaders run for each pixel independently and in parallel; TensorFlow.js takes advantage of this parallelization to accelerate ML computation.

In the WebGL backend, the draw pipeline is set up such that the scene geometry represents a unit square. When we execute a fragment shader program, we bind the texture that backs the output tensor to the frame buffer and execute the fragment shader program. This means that the fragment shader main() function is executed in parallel for each output value, as shown in Figure 4. For simplicity, we only use the red channel of the texture that backs the tensor (shown as 'R' in the figure). On WebGL 2.0 devices, we use the gl.R32F texture type, which allows us to avoid allocating memory for the green, blue, and alpha channels (shown as 'G', 'B', and 'A' respectively). In future work, TensorFlow.js will take advantage of all channels for WebGL 1.0 devices, which will better utilize the GPU's sampler cache.

    void main() {
      ivec2 coords = getOutputCoords();
      float a = getA(coords[0], coords[1]);
      float b = getB(coords[0], coords[1]);
      float result = a + b;
      setOutput(result);
    }

Figure 4. The addition of two equally shaped matrices as executed by the WebGL backend, and the GLSL code of the fragment shader that represents the element-wise addition computation. The GLSL function, main(), runs in the context of each output value and in parallel, with no shared memory.

Writing OpenGL Shading Language (GLSL) code can be error prone and difficult. To make it significantly easier to write and debug GPGPU programs, we wrote a shader compiler. The shader compiler provides high-level GLSL functions that the shader author can call. Listing 2 shows the GLSL source code for matrix multiplication, where the shared dimension N is assumed to be a multiple of 4 for simplicity. The functions marked in bold are provided by our shader compiler.
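The execution model of Figure 4 can be emulated in plain JS: the "fragment shader" main() is invoked once per output coordinate, with no shared state between invocations. This is an illustrative sketch only (`runShader`, `getA`, and `getB` here are plain-JS stand-ins for the GLSL helpers, and the loop stands in for the GPU's parallel invocations).

```javascript
// Sketch: run a "shader" function once per output coordinate.
function runShader(outputShape, mainFn) {
  const [rows, cols] = outputShape;
  const out = new Float32Array(rows * cols);
  // On a GPU these invocations run in parallel; here we just loop.
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      out[r * cols + c] = mainFn([r, c]);
    }
  }
  return out;
}

// Element-wise addition of two 2x2 matrices, mirroring the GLSL in Figure 4.
const A = [[1, 2], [3, 4]];
const B = [[10, 20], [30, 40]];
const getA = (r, c) => A[r][c];
const getB = (r, c) => B[r][c];

const result = runShader([2, 2], (coords) => {
  const a = getA(coords[0], coords[1]);
  const b = getB(coords[0], coords[1]);
  return a + b; // plays the role of setOutput(result)
});
// result → Float32Array [11, 22, 33, 44]
```

Because each invocation depends only on its own output coordinate, the computation is embarrassingly parallel, which is exactly what lets the WebGL backend map one fragment-shader invocation to each output tensor value.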
    void main() {
      ivec2 coords = getOutputCoords();
      int aRow = coords.x;
      int bCol = coords.y;
      float result = 0.0;
      for (int i = 0; i < N; i += 4) {
        vec4 a = vec4(getA(aRow, i), getA(aRow, i + 1),
                      getA(aRow, i + 2), getA(aRow, i + 3));
        vec4 b = vec4(getB(i, bCol), getB(i + 1, bCol),
                      getB(i + 2, bCol), getB(i + 3, bCol));
        result += dot(a, b);
      }
      setOutput(result);
    }

Listing 2. The GLSL source code for matrix multiplication, with the shared dimension N assumed to be a multiple of 4. getOutputCoords(), getA(), getB(), and setOutput() are provided by the shader compiler.

Using the higher-level functions generated by the shader compiler has multiple benefits. First, the user-defined GLSL code operates in high-dimensional 'logical' space instead of the physical 2D texture space. For example, the GLSL implementation of tf.conv2d() uses the auto-generated getA(batch, row, column, depth) method to sample from a 4D tensor. This makes the user code simpler, more readable, and less error-prone.

Second, the separation of logical and physical shape allows the framework to make intelligent decisions about memory layout, avoiding device-specific size limits of WebGL textures.

Third, we can optimize the mapping from high-dimensional space to the 2D space. For example, assume the logical shape of tensor A is 4D with shape 1x3x1x2. When A gets uploaded to the GPU, the backend will allocate a physical 3x2 texture, and the compiler will generate a getA(a, b, c, d) method whose implementation ignores a and c and directly maps b and d into the 2D texture space. We observed that this optimization leads to a 1.3x speedup on average.

Last, there is a single GLSL implementation of tf.matMul() regardless of the browser's WebGL capabilities. In Chrome, we render to a 32-bit single-channel floating point texture, while in iOS Safari we render to a 16-bit single-channel floating point texture. In both cases, the user code is the same, using the high-level setOutput(value) GLSL method, with the browser-specific implementation generated by the compiler.

4.1.1 Asynchronous execution

With WebGL, programs get scheduled by the CPU and run on the GPU, which is asynchronous from the main JS thread. This means that while programs are running on the GPU, the CPU is free to respond to events and run other JS code.

When the user calls an operation, we enqueue a program onto the GPU command queue, which typically takes sub-millisecond time, and immediately return a handle to the resulting tensor despite the computation not being done. Users can later retrieve the actual data by calling tensor.dataSync() or tensor.data(), which returns a TypedArray.

As mentioned in Section 3.6, we encourage the use of the asynchronous tensor.data() method, which avoids blocking the main thread and returns a promise that resolves when the computation is done (see Figures 2 and 3). However, to retrieve the underlying data of a texture, the WebGL API only provides a blocking gl.readPixels() method. To get around this limitation, we approximate when the GPU is done executing the operations, postponing the call to gl.readPixels(), which releases the main thread in the meantime.

Approximating when the GPU has finished executing programs can be done in a couple of ways. The first approach, taken in TensorFlow.js for WebGL 1.0 devices, uses the EXT_disjoint_timer_query WebGL extension. This extension can be used to accurately measure the GPU time of programs, but also implicitly has a bit that gets flipped when a program is done executing. The second approach, for WebGL 2.0 devices, uses the gl.fenceSync() API by inserting a fence into the GPU command queue and polling a query which returns true when the fence has flipped.

4.1.2 Memory management

Disposing and re-allocating WebGL textures is relatively expensive, so we don't release memory when a tensor gets disposed. Instead, we mark the texture for reuse. If another tensor gets allocated with the same physical texture shape, we simply recycle the texture. The texture recycler gives us [...]
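The texture-recycling policy described in Section 4.1.2 can be sketched as a free list keyed by physical shape. This is a hypothetical sketch (the helper names `acquireTexture`/`releaseTexture` and the plain-object "texture" are assumptions for illustration): a disposed texture is marked for reuse rather than destroyed, and the next same-shape allocation recycles it instead of paying for a fresh WebGL texture.

```javascript
// Sketch: recycle "textures" by physical shape instead of destroying them.
let texturesCreated = 0;
const freeTextures = new Map(); // key: "rowsxcols" → reusable textures

function acquireTexture(shape) {
  const key = shape.join('x');
  const free = freeTextures.get(key);
  if (free && free.length > 0) {
    return free.pop(); // recycle: no expensive re-allocation
  }
  texturesCreated++;
  return { shape, id: texturesCreated }; // stands in for a real WebGL texture
}

function releaseTexture(texture) {
  // Don't destroy it: mark it for reuse by future same-shape allocations.
  const key = texture.shape.join('x');
  if (!freeTextures.has(key)) freeTextures.set(key, []);
  freeTextures.get(key).push(texture);
}

// Two passes through the same model tend to allocate the same shapes:
const t1 = acquireTexture([224, 224]);
releaseTexture(t1);
const t2 = acquireTexture([224, 224]); // recycled: texturesCreated stays 1
const t3 = acquireTexture([112, 112]); // new shape: a second texture is made
```

This works well precisely because repeated passes through the same model allocate tensors with recurring physical shapes, so the free list is hit on almost every allocation after the first pass.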