{"id":5264,"date":"2016-04-26T09:59:31","date_gmt":"2016-04-26T05:59:31","guid":{"rendered":"http:\/\/blog.denivip.ru\/?p=5264"},"modified":"2016-04-26T13:49:48","modified_gmt":"2016-04-26T09:49:48","slug":"faster-than-real-time-stabilization-of-videos","status":"publish","type":"post","link":"http:\/\/blog.denivip.ru\/index.php\/2016\/04\/faster-than-real-time-stabilization-of-videos\/?lang=en","title":{"rendered":"Faster Than Real Time Stabilization of Smartphone Videos"},"content":{"rendered":"<p>Casual videos shot with a smartphone can often suffer from shakiness. \u200eSoftware-based video stabilization methods have been actively developing over the \u200elast years and can make such videos look much more professional and fluid. We \u200ehave just released <a href=\"https:\/\/itunes.apple.com\/ru\/app\/deshake-video-handshake-removal\/id1104260355?mt=8\" target=\"_blank\">Deshake, our video stabilizer app for iOS<\/a> that can enable faster \u200ethan real-time video processing. In this article, we are going to \u200eoverview the video stabilization techniques available and discuss some of \u200eengineering challenges we have faced during the development.\u200e <!--more--><\/p>\n<h3>Video Stabilization<\/h3>\n<p>Generally speaking, every video stabilization algorithm has to implement two \u200esteps:\u200e<\/p>\n<ol>\n<li>Analyse motion between the consecutive frames of the video. The sequence of such \u200emotions will constitute a trajectory which is usually a pretty shaky one.\u200e\u200e<\/li>\n<li>Build a smoothed version of the trajectory and attempt to re-create the video as if it \u200ewere shot along this new trajectory.<\/li>\n<\/ol>\n<p>The biggest problem here is that algorithm mostly has to operate with frames only. \u200eThose are 2D images which are tremendously simplified representations of the real \u200edynamic 3D world. A lot could have happened in that world between two adjacent \u200eframes. Even if we ignore the movements of the objects shot, the camera alone can \u200echange its location, orientation or even some internal parameters like focal length or \u200ezoom level.\u200e<\/p>\n<p>To demonstrate why the problem is so arduous, let us ignore everything but the \u200ecamera movement for a moment. There are 6 degrees of freedom in the \u200emovement already \u2014 3 for location (x, y and z) and 3 for orientation (yaw, roll, pitch). \u200eDo we have any chance to reconstruct them at least? Well, in a sense we cannot. It is well-known that based on only two 2D images of a static 3D \u200eworld taken with the same camera (no focal length modifications, etc.) you can\u2019t \u200epositively determine the relative movement of such camera between the shots. To \u200edetermine the movement you have to know some external data: it may be the \u200ecamera parameters (that\u2019s why in 3D scanning and other applications calibrated \u200ecameras are widely used) or some insights on the scene shot (for instance, if you have \u200eshot a building you can solve the problem based on the assumption that its windows are rectangular and walls \u200eare perpendicular to the ground).\u200e<\/p>\n<p>Back to our case. We have much more than just two images and for a certain point \u200eof view motion estimation does not necessarily require camera position estimation. \u200eAt this point, different stabilization algorithms diverge a lot.\u200e<\/p>\n<p>The most complicated and reckless algorithms attempt to reconstruct the full 3D picture based on the \u200eframes only. 
<p>Back to our case. We have much more than just two images, and from a certain point of view motion estimation does not necessarily require camera position estimation. At this point, different stabilization algorithms diverge a lot.</p>
<p>The most complicated and reckless algorithms attempt to reconstruct the full 3D picture from the frames alone. As we already know, even in a perfectly static world you cannot do this with two adjacent frames: any reconstruction would include camera movement estimation, which, as shown above, is ambiguous. Three frames, however, are sufficient (look up <i>trifocal tensor</i>, <i>multi-view 3D reconstruction</i> and so on). In practice, to accommodate the dynamism of the scene and the various types of instability, such algorithms have to take multiple consecutive frames into account, calibrate the camera on the fly and do other non-trivial work. They are still quite fragile; one known weakness, for example, is camera rotation without translation. In that case there is almost no parallax (i.e., relative shifting of objects positioned at different distances from the camera) in the image, and 3D reconstruction techniques may simply fail. The approach is also extremely slow: probably the most notorious attempt in the field, Microsoft's First-Person Hyperlapse, spends minutes (!) on a single video frame. Certainly there are benefits: knowing the real 3D positions of the camera, you can easily draw a smooth, physically correct alternative path in all 6 degrees of freedom. Then, with the reconstructed 3D scenery, you can literally render the stabilized video as if it had been shot by a camera moving along that path.</p>
<p><iframe loading="lazy" width="640" height="360" src="https://www.youtube.com/embed/SOpwHaQnRSY?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>The most conventional models pretend that the world is two-dimensional. For them, the camera is just a rectangular frame moving over an infinite plane. This may appear highly simplistic, but in real life the simplification often works pretty well for any given pair of adjacent frames. Mathematically, we can call it the similarity motion model. With few degrees of freedom (4 with scaling, otherwise 3) and very simple transform rules, it has proven robust and computationally efficient. Moreover, the trajectory you get from this model has a physical interpretation: the camera moves along a polyline on a plane, rotating here and there. Such an interpretation, although it does not exactly reflect the real-world situation, is a real helper at the smoothing step: it can even imitate the camera motions used by professional cameramen (see the article <a href="http://research.google.com/pubs/pub37041.html">Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths</a> to learn more). A code sketch of the model follows below.</p>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-5281" src="http://blog.denivip.ru/wp-content/uploads/2016/04/stabilizationArticle_teapots.png" alt="stabilizationArticle_teapots" width="695" height="226" /></p>
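<p>To make the model concrete, here is what a similarity transform looks like in code. This is a minimal sketch using OpenCV types; the struct and its fields are our illustration, not a class from the actual app:</p>
<pre><code>#include &lt;opencv2/core.hpp&gt;
#include &lt;cmath&gt;

// Similarity motion model: x' = s * R(theta) * x + t.
// 4 degrees of freedom: scale s, rotation theta, translation (tx, ty).
struct SimilarityTransform {
    double s = 1.0, theta = 0.0, tx = 0.0, ty = 0.0;

    cv::Point2f apply(const cv::Point2f&amp; p) const {
        const double c = s * std::cos(theta);
        const double n = s * std::sin(theta);
        return cv::Point2f(static_cast&lt;float&gt;(c * p.x - n * p.y + tx),
                           static_cast&lt;float&gt;(n * p.x + c * p.y + ty));
    }
};</code></pre>
<p>Conveniently, OpenCV of that era could fit exactly this 4-DOF model to two matched point sets via cv::estimateRigidTransform(prevPts, currPts, false); in newer releases the same role is played by cv::estimateAffinePartial2D.</p>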
<p>In fact, you don't have to use this particular planar model: there are other classes of motion to choose from. For example, you can assume the world to be a sphere with the camera at its center, rotating between frames. Such a rotational model is also often used in panorama stitching. It can simulate some non-trivial perspective transforms between frames that are out of scope for the similarity model.</p>
<p>So the overall idea is to keep frames in 2D, treating a single frame as a solid piece of plane that can be moved around, scaled or even skewed, but that still remains a solid piece. The benefits of this approach are high computational speed and robustness. The downside is that it is restricted to the allowed transforms.</p>
<p>There are also intermediate solutions we can call 2.5D. The general idea is to track points as they move between frames, choose sufficiently long and consistent tracks, and treat them, without any attempt to interpret what is happening either in 3D or on the frames, as the motions of interest. Then we smooth these tracks and try to bend each frame so that the points move from their original positions onto the smoothed tracks. The trick is that both the smoothing and the bending use the underlying ideas of true 3D reconstruction (e.g., epipolar geometry), but without any actual 3D reconstruction. Performance-wise, such algorithms are also somewhere between 2D and 3D.</p>
<p>Our main goal was to achieve at least real-time video processing on modern smartphones, so we had no option but to stick to 2D. As for the motion model, we use neither the similarity model nor the rotational model, but an experimental model of our own.</p>
<h3>Motion Between Frames</h3>
<p>Now that we have chosen how to stabilize motions, the question is how to determine the motions to be stabilized. The obvious way is to analyze the frames. But how exactly should this be done?</p>
<p>What we need here is called, in computer vision, image registration: techniques that align several images of the same scene. Generally speaking, such techniques fall into spatial domain methods and frequency domain methods. Spatial methods compare pixel colours, while frequency methods first use algorithms like the Fast Fourier Transform to turn the array of pixels into a set of waveforms and then look for correlations between them. Spatial methods can in turn be classified into intensity-based and feature-based methods. Intensity-based methods deal with intensity patterns of the whole image, while feature-based methods look for correspondences between local features such as points, lines or contours. Feature-based methods are the most widely used for video stabilization today: they are faster, since they rarely have to deal with whole images, and more precise, since individual features can be aligned very accurately, sometimes with subpixel precision.</p>
<p>There are several implementation variations of feature-based image registration, but let's focus on the approach we have adopted. First of all, we use points as features. We select a set of points on a frame and then try to determine where those points have moved to in the next frame. Such movement of individual points between frames is called the optical flow. Optical flow estimation rests on two assumptions:</p>
<ol>
<li>the patch of pixels around a point and the patch around its correspondence in the next frame are almost identical, and</li>
<li>the point has probably not moved too far between two adjacent frames.</li>
</ol>
<p>So we compare the patch of pixels around the point with patches not far from it in the next frame and choose the one with the greatest resemblance. To do this efficiently, we use the pyramidal variation of the Lucas-Kanade method, as sketched below.</p>
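<p>In OpenCV terms, pyramidal Lucas-Kanade is available out of the box as cv::calcOpticalFlowPyrLK. A minimal sketch of tracking a set of points between two grayscale frames might look like this (the window size and pyramid depth are illustrative defaults, not the values we ship):</p>
<pre><code>#include &lt;opencv2/video/tracking.hpp&gt;
#include &lt;vector&gt;

// Track 'points' from prevGray to currGray with pyramidal Lucas-Kanade.
std::vector&lt;cv::Point2f&gt; trackPoints(const cv::Mat&amp; prevGray,
                                     const cv::Mat&amp; currGray,
                                     const std::vector&lt;cv::Point2f&gt;&amp; points)
{
    std::vector&lt;cv::Point2f&gt; next;
    std::vector&lt;uchar&gt; status;  // 1 if the flow for the point was found
    std::vector&lt;float&gt; err;     // per-point patch dissimilarity
    cv::calcOpticalFlowPyrLK(prevGray, currGray, points, next, status, err,
                             cv::Size(21, 21),  // pixel patch compared around each point
                             3);                // pyramid levels, to handle larger motions

    // Keep only the successfully tracked points. A real pipeline would keep
    // the (previous, next) pairs together to fit the motion model afterwards.
    std::vector&lt;cv::Point2f&gt; tracked;
    for (size_t i = 0; i &lt; next.size(); ++i)
        if (status[i])
            tracked.push_back(next[i]);
    return tracked;
}</code></pre>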
<p>There is still one important question to answer: which points should we choose as features? It turns out that the best candidates are corners (i.e., vertices of angles) in the image. The rationale is as follows: all points can be tentatively classified into internal points, border points and corner points. If you choose an internal point as a feature, not only will the patch around the true image of that point in the next frame be similar to the original patch, but the patches around almost any of its neighbours will match just as well. For a border point, all of the true image's neighbours along the border will have confusingly similar patches in the next frame. Corners do not suffer from this problem. Interestingly enough, the fact that only corners are good for determining motion is not a deficiency of the particular algorithm we are using: in the absence of corners, even the human brain can misidentify motions. This effect is known in motion perception as the <a href="https://en.wikipedia.org/wiki/Motion_perception#The_aperture_problem">aperture problem</a>.</p>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-5280" src="http://blog.denivip.ru/wp-content/uploads/2016/04/stabilizationArticle_cornerness.png" alt="stabilizationArticle_cornerness" width="700" height="388" /></p>
<p>So we need to find corners in the image. For this we use a variation of the Harris corner detector; a sketch of what such detection looks like in OpenCV terms follows below.</p>
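<p>OpenCV wraps Harris-style corner picking in cv::goodFeaturesToTrack. Again, a minimal sketch with illustrative rather than production parameters:</p>
<pre><code>#include &lt;opencv2/imgproc.hpp&gt;
#include &lt;vector&gt;

// Pick up to maxCorners strong corners in a grayscale frame using the
// Harris response. Parameter values here are illustrative defaults.
std::vector&lt;cv::Point2f&gt; detectCorners(const cv::Mat&amp; gray, int maxCorners)
{
    std::vector&lt;cv::Point2f&gt; corners;
    cv::goodFeaturesToTrack(gray, corners, maxCorners,
                            0.01,           // quality relative to the strongest corner
                            10.0,           // minimum distance between corners, pixels
                            cv::noArray(),  // no mask: search the whole frame
                            3,              // neighbourhood size for the gradient covariance
                            true,           // use the Harris response, not min eigenvalue
                            0.04);          // Harris free parameter k
    return corners;
}</code></pre>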
<p>After we have selected the features on a frame and found their correspondences in the next one, we have to deduce the motion between the frames, i.e., the transform. This step depends on the chosen motion model, but generally you optimize the model, iteratively or otherwise, to minimize the error between the model's prediction and the actual point movement just estimated.</p>
<p>That was the obvious way: you have frames, you analyze them. But there is also an alternative. What if some external sensor's data is recorded alongside the video? Attempts have already been made in this area. An obvious choice for such a sensor is the device's accelerometer and/or gyroscope. Most modern smartphones already have one; it can provide surprisingly precise data at a high rate (higher than the video frame rate), and it literally gives you the raw motion between frames. There are solutions based on this approach, most notably Instagram's Hyperlapse. The pros are apparent: the biggest problem is calibrating the accelerometer/gyroscope correctly (its timeline can differ slightly from the camera's, and it can drift), and after that you get the required data almost for free, computationally speaking. The downside is that the only data you get is the rotational component of the motion; you can say nothing about translational shaking or other kinds of movement, such as tracking a moving subject or shooting video from a running car. Another clear downside is that the video has to be shot from within that particular app. So if you shot the video earlier, or want to shoot it with another app to use its filters or exposure/focus controls, you simply won't be able to stabilize it.</p>
<p>Another curious example of using external sensors for motion estimation is the recent attempts to use a depth camera (e.g., Kinect) alongside a normal camera. That way you have a 3D scene without the need to reconstruct it, which gives you the pros of a fully 3D approach without most of its cons. The problem is that depth cameras do not work well at large distances (i.e., the approach is not applicable to outdoor shooting yet). Other issues include calibration and alignment of the colour camera with the depth camera. And, yes, depth cameras are not sufficiently widespread yet.</p>
<h3>OpenCV</h3>
<p>As its name suggests, OpenCV is an open-source computer vision library. It contains highly optimized implementations of many computer vision algorithms (including accelerated versions for CUDA, NEON, etc.). It is cross-platform (written in C++) and has become a de facto industry standard. Unsurprisingly, OpenCV already contains a video stabilization implementation, and incidentally it is exactly the pipeline we have just described. But an attempt to run this implementation as-is was not very encouraging.</p>
<p>As a test case we selected a 1-minute full HD video captured with an iPhone at 30 frames per second. OpenCV's native implementation spent 13 minutes processing this video, and a typical user won't wait that long. So we aimed at speeding up the processing 13-fold. Before discussing our results, let's review some details of the current OpenCV implementation.</p>
<p>First of all, OpenCV on the iPhone runs entirely on the CPU. At the moment, OpenCV supports only CUDA-based GPU computing, which is not available on the iPhone. Moreover, the OpenCV implementation is designed to be cross-platform, and although some parts of the algorithm are optimized for specific platforms (for example, video file decoding on the iPhone uses Apple's optimized AVFoundation framework), the general pipeline has had to remain suboptimal.</p>
<p>To be more specific, OpenCV mostly stores frames as RGB-encoded images (it needs them in that form anyway to output the stabilized frames), but the computer vision algorithms (feature detection, optical flow estimation, etc.) require grayscale images as input. So OpenCV converts RGB frames to grayscale every time it passes them to an individual step, performing the grayscale conversion several times for each input frame.</p>
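<p>The fix for this particular redundancy is conceptually trivial: convert once per frame and share the result. A schematic before/after (our paraphrase of the pattern, not OpenCV's actual code):</p>
<pre><code>#include &lt;opencv2/imgproc.hpp&gt;

// Before: each pipeline stage re-derives grayscale from the same RGB frame.
void analyzeFrameRedundantly(const cv::Mat&amp; rgb)
{
    cv::Mat gray1, gray2;
    cv::cvtColor(rgb, gray1, cv::COLOR_RGB2GRAY);  // for feature detection
    // ... detect features on gray1 ...
    cv::cvtColor(rgb, gray2, cv::COLOR_RGB2GRAY);  // again, for optical flow
    // ... estimate optical flow on gray2 ...
}

// After: convert once and pass the single grayscale image to every stage.
void analyzeFrame(const cv::Mat&amp; rgb)
{
    cv::Mat gray;
    cv::cvtColor(rgb, gray, cv::COLOR_RGB2GRAY);
    // ... detect features and estimate optical flow on the same 'gray' ...
}</code></pre>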
<p>Another example: besides the frame itself, the optical flow algorithm needs a series of auxiliary images constituting what is called a Gaussian pyramid, and this pyramid is also computed more than once per frame.</p>
<p>So the first step for us was to remove these and similar redundancies from the computations.</p>
<h3>Our Solution</h3>
<p>As mentioned above, OpenCV performs the grayscale conversion multiple times per frame. But the problem runs deeper. In fact, during motion estimation (which runs as a single pre-processing pass through the video file) not a single algorithm in the pipeline requires an RGB-encoded frame. And you may be surprised to learn that frames in video files are usually stored in the YUV colour space, invented to transmit the colour television signal over the black-and-white infrastructure. Its Y component represents the luma (i.e., brightness) of a pixel, while U and V carry the chrominance (the colour itself). So the real problem is that OpenCV first converts from what is effectively grayscale (the Y component) to RGB, mixing in the additional information from the U and V components, and then converts back to grayscale multiple times. To eliminate this, while still keeping video decoding on iOS fast, you can use AVFoundation in combination with Core Video's pixel buffers. Intentionally simplified setup code for asset reading, with the proper preparations, may look like this:</p>
<p>[gist id=d2a4fe1eb3ab723a2b56122ae05414b9]</p>
<p>Here the kCVPixelBufferPixelFormatTypeKey key tells AVFoundation to provide a Core Video buffer for each frame, and the kCVPixelFormatType_420YpCbCr8Planar format is chosen to eliminate any unneeded colour conversions during decoding. Note also that AVFoundation sets the alwaysCopiesSampleData property of videoOutput to YES by default. This means that each frame's data is copied after decoding, which is not required in our case, so setting the property to NO can sometimes give you a performance boost. The colour format used here is planar; for us this means the Y component is stored in a contiguous region of memory, separately from the other components, which makes the processing we have to do both simpler and faster.</p>
<p>Reading a single frame can look like this:</p>
<p>[gist id=a6141d20959466787f497943ac5493d0]</p>
<p>It's important to use CVPixelBufferGetBaseAddressOfPlane here instead of the standard CVPixelBufferGetBaseAddress, as the colour format is planar. If you are not familiar with the Core Video framework, note that in this example the pixels pointed to by grayscalePixels may only be accessed before CVPixelBufferUnlockBaseAddress is called, and may not be modified (since read-only access was requested).</p>
<p>After we removed all the unnecessary colour conversions and other computations, and made some further memory optimizations, the processing speed still remained far from the desired level. So the next decision was to downscale frames before processing. That may sound unreasonable, but in reality the full frame resolution gives you more noise than useful data for stabilization. We downscaled frames to half their original size with cv::resize, an OpenCV function that is already highly optimised, even including NEON support on iOS. A condensed sketch of this step is shown below.</p>
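<p>Putting the pieces together, one can wrap the Y plane of the decoded pixel buffer in a cv::Mat header without copying a single pixel, and downscale straight from it. A simplified Objective-C++-flavoured sketch under the setup above (the function name is ours, not from the app):</p>
<pre><code>#include &lt;CoreVideo/CoreVideo.h&gt;
#include &lt;opencv2/imgproc.hpp&gt;

// Wrap the Y (luma) plane of a planar YUV pixel buffer in a cv::Mat and
// downscale it for motion analysis. The wrapper itself copies no pixel data.
cv::Mat lumaHalfSize(CVPixelBufferRef pixelBuffer)
{
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);

    void*  base   = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0); // plane 0 = Y
    size_t width  = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
    size_t height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
    size_t stride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);

    // A Mat header over the luma plane: 8-bit, 1 channel, row stride preserved.
    cv::Mat luma((int)height, (int)width, CV_8UC1, base, stride);

    cv::Mat half;
    cv::resize(luma, half, cv::Size(), 0.5, 0.5, cv::INTER_AREA); // copies into 'half'

    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    return half; // safe to return: 'half' owns its data, unlike the 'luma' header
}</code></pre>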
<p>And that was still not enough. The next heuristic we applied is as follows. Feature detection is quite a computationally intensive algorithm; in fact, at that point it consumed more than 60% of the processing time. But once we have estimated the optical flow, we have already computed a frame-to-frame feature mapping. So, if the points on the previous frame were good enough to be considered features (they had high 'cornerness'), then after mapping they will probably still be good. Why not reuse this knowledge? We can compute features on a given frame and then avoid recomputing them from scratch for the next few frames: each time, we can use the points produced by the optical flow estimation as the features. So we have keyframes, i.e., frames for which we run the real feature detector, and all the other frames get their set of features for free, since we compute the optical flow either way.</p>
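<p>Schematically, the heuristic looks like this (a simplified sketch; detectCorners and trackPoints are the illustrative helpers from the earlier sketches, and the re-detection threshold is an assumption of ours):</p>
<pre><code>#include &lt;opencv2/core.hpp&gt;
#include &lt;vector&gt;

// Illustrative helpers defined in the earlier sketches.
std::vector&lt;cv::Point2f&gt; detectCorners(const cv::Mat&amp; gray, int maxCorners);
std::vector&lt;cv::Point2f&gt; trackPoints(const cv::Mat&amp; prevGray,
                                     const cv::Mat&amp; currGray,
                                     const std::vector&lt;cv::Point2f&gt;&amp; points);

// Run the full detector only on keyframes; between them, the positions that
// optical flow tracking has already produced double as the next feature set.
void analyzeMotion(const std::vector&lt;cv::Mat&gt;&amp; grayFrames)
{
    const int keyframeInterval = 8;  // matches the keyframe spacing described below
    const size_t minFeatures = 50;   // assumed threshold for early re-detection

    std::vector&lt;cv::Point2f&gt; features;
    for (size_t i = 0; i + 1 &lt; grayFrames.size(); ++i) {
        if (i % keyframeInterval == 0 || features.size() &lt; minFeatures)
            features = detectCorners(grayFrames[i], 200);  // keyframe: real detection

        // Tracking both yields the inter-frame motion and refreshes the features.
        features = trackPoints(grayFrames[i], grayFrames[i + 1], features);

        // ... fit the motion model to the tracked correspondences here ...
    }
}</code></pre>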
<p>After all these optimizations, the processing time for the 1-minute video dropped from 13 minutes to 42 seconds. That is already faster than real time, but still not fast enough for us. In fact, all this time we had consciously kept one optimization in reserve: parallelization. As you certainly know, all modern iOS devices have 2-core CPUs, which can potentially at least double your computational power, provided you know how to use the cores efficiently. OpenCV already uses some parallelization internally, but for video stabilization the support is not extensive enough.</p>
<p>Video stabilization turns out to be an interesting problem from the parallelization standpoint. For each frame we may want, at some point, to: read it (that is, request the next decoded frame from the AVAssetReader), downscale it, construct its Gaussian pyramid, find features on it (if it is a keyframe), and compute the optical flow between it and the next frame. That is 5 potential operations per frame. The problem is not their number but the interdependencies between them, which you can see in the scheme:</p>
<p><a href="http://blog.denivip.ru/wp-content/uploads/2016/04/stabilizationArticle_concurrency.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-5282" src="http://blog.denivip.ru/wp-content/uploads/2016/04/stabilizationArticle_concurrency.png" alt="stabilizationArticle_concurrency" width="718" height="344" /></a></p>
<p>The scheme shows the operations for each frame in columns, with keyframes marked by a darker background. The arrows connect each operation to the operations that depend on it. In our actual pipeline the distance between keyframes is 8 rather than the 6 shown, but the general idea is the same. So we have lots of operations to perform and have to manage the dependencies between them somehow: NSOperationQueue is a perfect fit for the task. We apply a standard scheme: one serial operation queue coordinates the process (i.e., dispatches new operations for individual frames) and responds to user requests (begin processing, cancel processing, etc.), while another, concurrent queue runs all the frame operations. This works great. The only potential issue is that, as the video is processed, a tail of completed dependencies piles up: in the scheme you can see that each compute-optical-flow operation has most of the previous operations as dependencies of some generation. Such operations remain in memory and can potentially upset your plans. However, you can easily release them by applying the following code to already-processed operations (for example, it is sufficient to apply it to the keyframe read and flow operations):</p>
<p>[gist id=2ffeaa9c81ab9d7d9858b1fcbca589c9]</p>
<p>This solution gives us almost perfect parallelization: profiling shows that ~90% of the CPU resources are consumed by our processing code, with the remaining ~10% used by the system, probably for video decoding.</p>
<p>So after all these optimizations, including parallelization, processing our 1-minute full HD test video takes only 22 seconds.</p>