My idea about UD

Here is what I believe UD is doing. This can surely be optimized in many ways, but the basic idea is: put the z-buffer in the 3D space.

I. Preparing the database.

1. Imagine a 3D cubic lattice which is as big as the bounding box of your 3D scenery. Take coordinates aligned with the lattice. Put the camera at coordinates (0,0,0) and surround it with a cube C. The faces of the cube C will serve as the cubemap we want to construct. Each face is covered by pixels of a given resolution. We already have the following parameters to play with, given in the units of the chosen coordinate system: the step of the lattice, the dimension of the cube C, the resolution of a face of C.

2. Render, by any means you like, the lattice, as seen from the (0,0,0) point of view, on the 6 “screens”, i.e. the faces of C. We have 6 view frusta; the discussion will be the same for each of them. In order to render them you need to put small balls, or squares facing the relevant face of the cubemap, or whatever, at the lattice nodes, so we have another parameter here: the diameter of the ball or square. As a result of the rendering you now know the following things:

  • for any lattice atom you know which pixel of which face it corresponds to. You may do better for lattice atoms which are inside the cube C, namely “ignore” them, say you attribute to them an IGNORE label; otherwise you attribute to each lattice atom the 2D coordinates of its pixel in the cubemap and the information saying which face of the cubemap the pixel belongs to,
  • you have information about the scale, which you attach to the lattice atoms, like this: if two neighbouring lattice atoms project on the same pixel then attach IGNORE to both. If the ball/square/whatever of a lattice atom projects on more than one pixel, then attach to it a number SCALE, approximately proportional to the square ROOT (thanks ine) of the number of pixels it projects onto (or to the dimension of the bounding box of those pixels); see the sketch below.
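Here is a minimal sketch, in C++, of one way the precomputation at points 1-2 could be organized. Everything in it (the names, the choice of projecting onto the dominant-axis face, the rough SCALE formula) is my assumption about one possible realization, not something known about UD.

#include <algorithm>
#include <cmath>

// Hedged sketch: projection of one lattice atom onto the cubemap around (0,0,0).
struct AtomProjection {
    int face;    // 0..5: +X, -X, +Y, -Y, +Z, -Z face of the cube C (-1 = undefined)
    int px, py;  // pixel on that face
    int scale;   // SCALE label; 0 stands for IGNORE
};

// res = pixels per face of C, ball = diameter of the ball/square put at the lattice node,
// (X,Y,Z) = coordinates of the lattice atom; atoms inside the cube C are IGNOREd anyway.
AtomProjection projectAtom(double X, double Y, double Z, int res, double ball) {
    AtomProjection p;
    double ax = std::fabs(X), ay = std::fabs(Y), az = std::fabs(Z);
    double u, v, w;
    if (ax >= ay && ax >= az) { p.face = X > 0 ? 0 : 1; u = Y; v = Z; w = ax; }
    else if (ay >= az)        { p.face = Y > 0 ? 2 : 3; u = X; v = Z; w = ay; }
    else                      { p.face = Z > 0 ? 4 : 5; u = X; v = Y; w = az; }
    if (w < 1e-9) { p.face = -1; p.px = p.py = 0; p.scale = 0; return p; } // atom at the camera
    // perspective divide onto the face, then map [-1,1] to pixel coordinates
    p.px = std::min(res - 1, (int)((u / w * 0.5 + 0.5) * res));
    p.py = std::min(res - 1, (int)((v / w * 0.5 + 0.5) * res));
    // footprint of the ball on that face, in pixels (rough, up to a constant);
    // a footprint of at most one pixel means the atom gets the IGNORE label
    double dist = std::sqrt(X*X + Y*Y + Z*Z);
    double footprint = ball / dist * 0.5 * res;
    p.scale = footprint > 1.0 ? (int)footprint : 0;
    return p;
}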

Of course, you don’t want to take a huge lattice with very small balls. That’s all in the choice of parameters.

3.  Take now your database of 3D points, i.e. the real one, which you eventually want to render UD style. I shall ignore, for the sake of the argument, how the database is implemented: as an octree or otherwise, or even whether the database is made of 3D points or of more complex objects, like polygons. Put this database into the coordinate system chosen at step 1, such that, for example, if you are working with octrees, the cells of the lattice correspond to some level of the octree. Attach to each node of the lattice the supplementary information: the points from the database which are within the ball surrounding the atom, or else the label VOID. Alternatively, think of any point from the database as lying inside some lattice cell and “project” it on each corner of that cell (i.e. attach to each lattice atom a list of points from the database which are in the neighbourhood of the lattice atom; the nil list corresponds to VOID).
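A minimal sketch of this step, under the same caveat: one possible way to link the real point database to the lattice is to snap every point to the corners of the lattice cell containing it. The names (Point3D, nodeKey, attachToLattice) and the choice of all 8 corners are mine; an absent key plays the role of VOID.

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hedged sketch: attach the real 3D points to their neighbouring lattice atoms.
struct Point3D { float x, y, z; uint32_t color; };

// pack a lattice node (i,j,k) into one key; assumes |i|,|j|,|k| < 2^20
static uint64_t nodeKey(int i, int j, int k) {
    return ((uint64_t)(uint32_t)(i + (1 << 20)) << 42)
         | ((uint64_t)(uint32_t)(j + (1 << 20)) << 21)
         |  (uint64_t)(uint32_t)(k + (1 << 20));
}

// h = lattice step; a node that never receives a point stays absent from the map, i.e. VOID
void attachToLattice(const std::vector<Point3D>& db, double h,
                     std::unordered_map<uint64_t, std::vector<const Point3D*>>& lists) {
    for (const Point3D& p : db) {
        int i0 = (int)std::floor(p.x / h);
        int j0 = (int)std::floor(p.y / h);
        int k0 = (int)std::floor(p.z / h);
        for (int di = 0; di <= 1; ++di)          // "project" the point on each corner
            for (int dj = 0; dj <= 1; ++dj)      // of its lattice cell
                for (int dk = 0; dk <= 1; ++dk)
                    lists[nodeKey(i0 + di, j0 + dj, k0 + dk)].push_back(&p);
    }
}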

4.  Let’s see what we have, supposing for example that we use octrees. We add the 3D lattice to the 3D database of points (by using lists of points, as explained at 3) and to every lattice atom we also attach the information explained at point 2.

How do we compress this efficiently? There is a new parameter here, namely the level of the octree, and also how the color information, for example, is stored in the octree. Of course, this is a matter of recursion: at points 1-3 we may take finer and finer resolutions and lattice steps, starting from a very coarse lattice and resolution, and then try to figure out a correct recursion procedure. That’s work to be done; it is not trivial, but it is somehow straightforward once you figure it out.

II. The pipeline and rendering.

The problem is that we want to be able to get, very very fast, from the database constructed at (I), only the points which are needed for realtime rendering when the camera is at coordinates (x,y,z).

This problem splits into two different ones:

  • at the start of the realtime UD rendering, we want to be able to cull something close to the minimum number of 3D points, when the camera is at (0,0,0). According to the information given by Euclideon, a good algorithm should be able to do this in about 1 s.
  • then, we need a procedure to take what we need from the database when we change the pov from (x,y,z) to (x+1,y,z) (or alike). This should be much faster, allowing for realtime rendering.

As a preparation, let’s remark that:

  1. if the camera is at (0,0,0), then we already know where each lattice point projects; it is written in the database. So we just need to start from a pixel, at a given resolution (recursion again), and choose from the database only the lattice atoms which are CLOSE, in decreasing order of SCALE, and from those the real 3D points which are their neighbours (of course we have to use the octree structure). We get for each pixel a number of points of the order of log(dimension of the world).
  2. if the camera is at (x,y,z) we also know where each point from the 3D database projects, because we read it from the data attached to the lattice atom which, translated by (x,y,z), is the neighbour of the point. We also get the SCALE parameter from this.

1. We use remark 1 to solve the first problem, namely what comes through the pipeline from the huge 3D database to the pre-rendering buffer B when we start with the camera at (0,0,0). The buffer contains about (number of pixels) X log(dimension of the world) 3D points, along with the pixel coordinates where they project and with SCALE. This is fed directly to the rendering procedure, which you can choose freely, but it is almost trivial.
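A minimal sketch of this initial fill, reusing the types from the sketches in part I; it ignores occlusion and the per-pixel cap at ~log(dimension) points, and simply walks the CLOSE lattice atoms in decreasing order of SCALE.

#include <algorithm>
#include <vector>

// Hedged sketch: types and names are mine; AtomProjection and Point3D come from part I.
struct LatticeAtom   { AtomProjection proj; std::vector<const Point3D*> points; };
struct BufferedPoint { const Point3D* p; int face, px, py, scale; };

std::vector<BufferedPoint> fillBuffer(std::vector<LatticeAtom>& atoms) {
    // decreasing SCALE, so the near (large-footprint) atoms come first
    std::sort(atoms.begin(), atoms.end(),
              [](const LatticeAtom& a, const LatticeAtom& b) { return a.proj.scale > b.proj.scale; });
    std::vector<BufferedPoint> B;
    for (const LatticeAtom& a : atoms) {
        if (a.proj.scale == 0) continue;          // IGNORE, i.e. not CLOSE
        for (const Point3D* p : a.points)         // a nil list (VOID) contributes nothing
            B.push_back({p, a.proj.face, a.proj.px, a.proj.py, a.proj.scale});
    }
    return B;
}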

2. What happens when we move the camera? We update the pre-rendering buffer B, only by updating the pixel and scale information for the 3D points in the buffer, getting the relevant data from the huge database only by a translation (addition). In this case there are two things which might happen: either a point exits the buffer, or there is a hole in the image.
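A hedged sketch of this update, assuming the camera moves by whole lattice steps and that a map named projections (my name) gives the precomputed AtomProjection of each lattice node: every buffered point re-reads its pixel and SCALE from the lattice atom at (its node minus the camera offset). Points whose atom is missing or IGNOREd exit the buffer; screen pixels left uncovered are the holes to be filled from the big database (not shown).

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hedged sketch: BufferedPoint, AtomProjection and nodeKey come from the earlier sketches;
// (camI, camJ, camK) is the camera position in lattice units, h is the lattice step.
void updateBuffer(std::vector<BufferedPoint>& B,
                  const std::unordered_map<uint64_t, AtomProjection>& projections,
                  int camI, int camJ, int camK, double h) {
    std::vector<BufferedPoint> kept;
    kept.reserve(B.size());
    for (BufferedPoint bp : B) {
        // nearest lattice node of the point, translated by the camera offset
        int i = (int)std::lround(bp.p->x / h) - camI;
        int j = (int)std::lround(bp.p->y / h) - camJ;
        int k = (int)std::lround(bp.p->z / h) - camK;
        auto it = projections.find(nodeKey(i, j, k));
        if (it == projections.end() || it->second.scale == 0)
            continue;                              // the point exits the buffer
        bp.face  = it->second.face;
        bp.px    = it->second.px;
        bp.py    = it->second.py;
        bp.scale = it->second.scale;
        kept.push_back(bp);
    }
    B.swap(kept);
    // any cubemap pixel not covered by the kept points is a hole: ask the database
    // for the CLOSE atoms that now project there (omitted).
}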

_________________________

Is this making any sense? Please let me know, because I have been asked repeatedly what I really believe.

58 thoughts on “My idea about UD”

  1. Thank you for sharing your ideas, it’s very interesting!
    Would you clarify something?
    In II-remark 1: You search for atoms which are CLOSE in which space? In the 3D space of lattice coordinates? Do you only take points from neighbor octree cubes at different scales? If so, do you think it is sufficient? New points which were occluded on the previous frame may pop up from random places.

    1. Thanks for the interest! I speak about lattice atoms which have the label CLOSE. EDIT: in a previous version of the post I attached a label CLOSE to those atoms whose surrounding ball/square projects on more than one pixel. So, CLOSE means SCALE > 1. They bring with them a small cloud of 3D points which are near them (which “project” to the respective lattice atom).
      The thing is that the database is too huge to render in realtime, so we need to extract (fast) from it a smaller one which can be rendered in realtime with computers as powerful as today’s.

  2. So, you have lattice atoms, say A, B, C, D, … and you have inhabited 3D points a, b, c, d, … and you have screen pixels u, v, w, … . A lattice atom A is described by (X,Y,Z, x,y, SCALE); let’s neglect the problem of the 6 view frusta. That means that A is at coordinates (X,Y,Z) in 3D space, is seen from (0,0,0) at the pixel with coordinates (x,y) and has the scale (as defined) SCALE. The lattice atom A points to a list (which can be nil) of 3D points (a,b,c,d), each described by coordinates (x,y,z) and color information. (The 3D points may be, in fact, corners of the octree, at a scale compatible with SCALE, but I pass.)
    If you are at (0,0,0), you want to know who projects on (x,y). So you go to the database and you pick, in decreasing order of SCALE, the lists associated to the lattice atoms which have (x,y) in their description (and you read the SCALE for each of them).
    If you are at (X_0, Y_0, Z_0) and you want to know who projects on (x,y) on the screen (on one of the 6 screens of the cubemap, aligned with the axes), then you go to the database, you look at the lattice atoms A(X,Y,Z,x,y,SCALE_0) and then you check whether A'(X+X_0, Y+Y_0, Z+Z_0, x’,y’,SCALE’) has a non-nil associated list of 3D points. If so, then you take the respective list and you attach to it SCALE_0.
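    In code the lookup could look roughly like this (a hedged sketch: atomsByPixel, pointLists and pixelKey are my names for the two indices, i.e. the per-pixel lists of lattice atoms sorted by decreasing SCALE and the per-node lists of 3D points built in part I; Point3D and nodeKey are from the sketches in the post):

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Hedged sketch: who projects on pixel (x,y) of a cubemap face, camera at (X_0,Y_0,Z_0).
    struct AtomRef        { int X, Y, Z, scale; };   // lattice atom as seen from (0,0,0)
    struct CandidateCloud { std::vector<const Point3D*> points; int scale; };

    static uint64_t pixelKey(int face, int x, int y) {                  // assumes x, y < 2^20
        return ((uint64_t)face << 40) | ((uint64_t)(uint32_t)x << 20) | (uint32_t)y;
    }

    std::vector<CandidateCloud> whoProjectsOn(int face, int x, int y, int X0, int Y0, int Z0,
            const std::unordered_map<uint64_t, std::vector<AtomRef>>& atomsByPixel,
            const std::unordered_map<uint64_t, std::vector<const Point3D*>>& pointLists) {
        std::vector<CandidateCloud> out;
        auto atoms = atomsByPixel.find(pixelKey(face, x, y));
        if (atoms == atomsByPixel.end()) return out;
        for (const AtomRef& A : atoms->second) {                        // decreasing SCALE
            // translate A by the camera position: A' = (X+X_0, Y+Y_0, Z+Z_0)
            auto lst = pointLists.find(nodeKey(A.X + X0, A.Y + Y0, A.Z + Z0));
            if (lst != pointLists.end() && !lst->second.empty())
                out.push_back({lst->second, A.scale});                  // attach SCALE_0 of A
        }
        return out;
    }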

  3. Ok… Some things are more clear now, thanks!
    Now I’m trying to understand why the number of lattice atoms A(X,Y,Z,x,y,SCALE_0) is log(N), where N is the dimension. It’s analogous to frustum casting from (x,y) pixel on the screen, isn’t it? All (X,Y,Z) which project to (x,y) will potentially be in this set. If we ignore the scale, the number of such (X,Y,Z) will be O(N^3) for 3D space, because essentially the number of such points on a regular grid is the discretized volume of the frustum.
    If you only choose atoms with a particular scale SCALE_0, wouldn’t you get exactly one?
    If you choose ones with SCALE > 1, as you said in the previous comment, you will get only the points within a certain radius, I suppose.

    1. Fix (x,y). Take N^3 to be the number of lattice atoms (so N is proportional to the diameter of the world). Being a lattice, the number of lattice atoms which project on (x,y) is (roughly) N. I shall suppose that there is an upper bound on the number of inhabited 3D points which are close to a lattice point and, moreover, that this number is proportional to SCALE^3 (that’s where you use in fact the octree structure, so you assimilate 3D points with levels of octree cubes). So, at a given SCALE, SCALE^3 points project on SCALE^2 pixels, therefore there are about SCALE points projecting on one pixel from that distance. You take into account that SCALE is proportional to 1/distance (with distance from 1 to N) and you get log(N) 3D points projecting on one pixel.
      That’s what my nose tells me, anyway.

      1. >Being a lattice, the number of lattice atoms which project on (x,y) is (roughly) N
        But this would be true only for a parallel projection. With a perspective projection each (x,y) pixel “sees” a pyramid, which takes a fixed percent of the whole space. Therefore the number of lattice atoms visible from a (x,y) pixel is O(N^3).

      2. Take p^2 as the number of pixels. One of those pyramids contains about N^3/p^2 lattice points. This pyramid has height about N and base about N^2/p^2 points. The number you need to control is therefore N/p, but this is controlled by SCALE. There is a play between all these parameters.

      3. Ok, it seems that SCALE is not well-defined in the article.

        >number SCALE approximately proportional with the square of the number of pixels it projects

        This is ~ p^4/distance^4. Maybe you meant square root?

        >(or the dimension of the bounding box of the pixels)

        And this is p/distance. This got me a little confused. I’ll assume the latter.
        The calculation of the number of projected 3D points from the cloud on a single (x,y) on the screen is then:
        sum(num_lattice_points(distance) * SCALE(distance)^3, distance=1..N)
        = sum(distance^2/p^2 * (p/distance)^3, distance=1..N)
        = sum(p/distance, distance=1..N)
        = p * H(N),
        where H(N) is the harmonic function, which is almost the logarithm: http://www.wolframalpha.com/input/?i=plot%28sum%281%2Fd%2C+d%3D1..N%29%2C+log%28N%29%2C+N+%3D+1..1000000000%29

        Indeed it works!

  4. I’m not sure what you mean by 3D lattice. If it is a 3D equidistant grid, then its size is going to be a problem: either because it is too small, which gives too low a resolution, or because it is too big, causing rendering to be too slow. If the lattice is sparse (i.e. contains only those points that render a pixel on the screen/cubemap), then storing a list of those points might be sufficient.

    Furthermore, ‘delta updates’, which in this context I define as ‘requesting the points that are missing when moving from (x,y,z) to (x+1,y,z)’, tend to be problematic. For example, the user might be moving from (x,y,z) to (x+1000,y,z), because they are moving at a higher speed. You cannot solve that by dividing the move into 1000 steps of 1. You might be able to determine what needs to be requested from the database, but this is complicated (i.e. I do not know how to do this faster than by rendering the entire scene). If you know how to do this, I would be very interested.

    Somehow I’m thinking of multi-scale cubemaps: a set of n cubemaps where the i-th cubemap only contains the scene within the cube with sides of length 2^i, but not in the cube with sides of length 2^(i-1). You do not need to recompute the i-th cubemap if the viewer has moved less than 2^(i-1)/screen_width from its origin.
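    A tiny sketch of that bookkeeping, just to make the update condition concrete (my reading of it; the names are invented): cubemap i covers the shell between the cubes of side 2^(i-1) and 2^i around its recorded origin, and is rebuilt only when the camera has drifted more than 2^(i-1)/screen_width from that origin.

    #include <cmath>

    // Hedged sketch of one level of the multi-scale cubemap set.
    struct ShellCubemap {
        double ox, oy, oz;   // origin the cubemap was rendered from
        int level;           // i: covers the cube of side 2^i minus the cube of side 2^(i-1)
        // ...the 6 faces of pixels would live here
    };

    bool needsRebuild(const ShellCubemap& cm, double camX, double camY, double camZ,
                      double screenWidth) {
        double dx = camX - cm.ox, dy = camY - cm.oy, dz = camZ - cm.oz;
        double drift = std::sqrt(dx*dx + dy*dy + dz*dz);
        return drift > std::ldexp(1.0, cm.level - 1) / screenWidth;   // 2^(i-1) / screen_width
    }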

    1. Hi Bauke, the point is to introduce a regular, pre-rendered lattice, to attach to it the information about where the lattice points project (wrt (0,0,0)) and their scale, then link the points you intend to render realtime to the lattice points. So, at the moment of doing realtime rendering, you first pick the points you need, along with the information about the rendering of the lattice points which are close (mind you, by doing only additions), and you almost already have all the information needed for realtime rendering, plus a buffer of reasonable size of points to render. Then you do the rendering.
      Re the speed of moving of the camera, there is an upper limit, of course. I suppose that’s the speed the camera moves.
      If you notice, the first stage of realtime rendering takes about a second (a lot of time); Dell mentions 0.8 s. In contradistinction, when the camera moves from the initial point, the algorithm should be much faster, basically because the buffer will almost never change much from t to t+dt. The buffer does not contain only visible points, but all the potentially visible points, i.e. all the octree cubes of the right scale which are inhabited and potentially visible.

  5. I thought a bit more about the approach in the OP. It turns out that I’m doing something similar, although simpler. So far I’ve got the 3D cloud point selection working in the same way as you described, just without pyramid casting: in your terms, I take all lattice points instead of just the visible ones. Because I don’t have the (x,y) coordinates that way, I project the cloud points I’ve retrieved at each frame using the GPU. The lattice in my case is quite sparse, so the N^3 step is not a big problem. The SCALE is discrete, it’s 1/2^n, where n is an integer, like in an octree.

    The end result is that it retrieves a lot of points. Still too many for fast processing in my system. I use 800×800 resolution, and 1/10th of the “tower” dataset. When I’m close to the terrain, so that for the closest lattice points I get the full detail, I get about 12 million points. Many of them are from horizontal surfaces, I guess, which are hardly visible. It shows at most 20 fps.

    I don’t know if precomputing the projection in the lattice in the way you described will improve things. You don’t need to project, which is good. But instead you have to make the lattice small enough such that you can implement motion as jumping on the lattice points. And then the database will be huge, as Bauke pointed out. In my system the increase of the size of DB is negligible.

    Kyle, the last sentence I understood from you was “Bauke is the closest”.

    1. Ine, re “precomputing the projection in the lattice”, this should be done recursively, because there are regularities there, then coupled with the level of the octrees used. My nose tells me it’s completely feasible, also that the reason for “euclideon” name comes from this.
      I think Kyle is trying to explain something along these lines, although he still does not distinguish between the precomputing and realtime rendering stages.
      Finally, with the lattice augmented database you jump into 5D (SCALE apart) and that’s the right dimension for seeing things projectively.
      For all these reasons, I am just not motivated enough to do this myself, because it takes, I suppose, about 6 months of full time for 1 person to do this.
      Finally, I don’t expect the result to be as fast as the actual euclideon UD. Once the prototype is done (Dell presumably did it in C), there are lots of improvements and tweaks.
      There is still the possibility that the lattice thing is a different idea than Dell’s euclideon.
      In conclusion, a team work with free sharing of ideas and real programming done, as Bauke did, is the path to success.
      (the net is not subtle, so they say, therefore …) LET’S GET IT DONE TOGETHER.

      1. It indeed looks like some raytracing. But it is not! If you look at the “Dell”’s code you will see it has no rays at all, nothing is traced, it is just sorting, and it is incredibly simple (no optimisations or subdivisions, just a big linear raw array of points) and that’s why it is slow on this big dataset. It is online sorting. The framerate is almost constant.

    1. So what is in that code that Bruce Dell wrote? I am not a programmer, but I am really interested in 3D graphics, and I know a lot about how things work. I know what raytracing, octrees, SSDO are …… I think you can explain it to me.
      Thanks!

  6. Kyle, I guess the only use of octrees for this algorithm would be to introduce a bit of recursion in order to spare the work for the far points.
    Looking forward to any improvement (proof or program)!

  7. An octree would allow not sorting every atom in the scene but only a small subset around the camera, because with more distance you need fewer points, from higher and higher octree nodes (representing simplified geometry and interpolated colors). And I think the “sorting” must be smarter somehow. First, do not destructively move points in the array, but manage a separate limited-size array of LINKS to the points.

  8. Ok, I’m going to take a wild guess at how the algorithm, linked by b@b.com, might work. There are 2 parts:
    1. obtaining missing points from the database.
    2. keeping an up-to-date buffer with the points visible on screen.

    If anyone has ideas for part 1, I’d love to read them, because I do not yet know a viable solution. The octrees mentioned above might help.

    Part 2 seems rather trivial. The buffer is just an array of points which are roughly sorted in front to back order. That array is traversed rendering each point, checking visibility using a z-buffer. When rendering a point P, three things can happen:
    * There was not yet anything rendered to the screen pixel P projects to. Render P at that pixel and store the index of P.
    * There is already a rendered point Q, but Q is nearer to the viewer than P. Remove P from the array.
    * There is already a rendered point Q, but Q is farther from the viewer than P. Swap the points P and Q in the array. Remove Q from the array.

    Rendering takes O(n) time, with n being the size of the array. At the end of the rendering the array’s size is at most the number of pixels on the screen. The points retrieved from part 1 can simply be appended to the array prior to rendering.
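    A hedged sketch of that traversal in C++ (the types and names are mine): the z-buffer stores, per pixel, the depth and the slot in the compacted array that owns it, so the three cases above map directly onto the branches.

    #include <cstdint>
    #include <limits>
    #include <vector>

    struct Splat      { float x, y, z; uint32_t color; };    // point already in camera space
    struct PixelOwner { float depth; int slot; };             // slot = index in the kept part of the array

    // 'frame' is a w*h colour buffer, already cleared; 'pts' is roughly sorted front to back.
    void renderAndCompact(std::vector<Splat>& pts, std::vector<uint32_t>& frame,
                          int w, int h, float fov) {
        std::vector<PixelOwner> zbuf(w * h, PixelOwner{std::numeric_limits<float>::infinity(), -1});
        size_t kept = 0;
        for (size_t i = 0; i < pts.size(); ++i) {
            Splat P = pts[i];
            if (P.z <= 0) continue;                                   // behind the camera: drop
            int px = (int)(w / 2 + fov * P.x / P.z);
            int py = (int)(h / 2 + fov * P.y / P.z);
            if (px < 0 || px >= w || py < 0 || py >= h) continue;     // off screen: drop
            PixelOwner& o = zbuf[py * w + px];
            if (o.slot < 0) {                                         // case 1: pixel still empty
                frame[py * w + px] = P.color;
                o = {P.z, (int)kept};
                pts[kept++] = P;
            } else if (o.depth <= P.z) {
                // case 2: the already rendered Q is nearer, so P is occluded: remove P
            } else {                                                  // case 3: P is nearer than Q
                frame[py * w + px] = P.color;
                pts[o.slot] = P;                                      // P takes Q's earlier slot,
                o.depth = P.z;                                        // Q is removed
            }
        }
        pts.resize(kept);                                             // at most one point per pixel
    }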

  9. In your screenshot I notice the problem caused by rendering point clouds using points. There are holes. Though I guess this can be fixed by having a high resolution model and a near plane that is placed sufficiently far.

  10. I’m probably way too late for this. But anyway I made this quick edit of the program:
    http://pastebin.com/HiDW5tMB

    It features a dynamic thresh size that fits the visible points found so far, so that there will be no overflows if the number of points on screen exceeds the thresh limit.
    Also, if an atom inside thresh is no longer visible, there is no point in swapping it with another random atom that will probably be invisible too. After removing the “ad” and “cmt” values, the visual result is identical.
    Finally, drawing squares around points to fill holes halves the framerate. Is it worth the additional work?

    1. Thanks Magio! That depends on others using it, but making it open is definitely good. (Sorry for being vague, my mind is now with the chemical machine, thanks everybody for keeping me up with the UD subject, things seem close to a working solution.)

    2. I’ve been running your code. It is amazing how well such a simple algorithm works. It definitely has its flaws, but I guess that most of them can be fixed.

  11. Do you guys really think this sorting algorithm is a good idea? Why? To me it seems just a bad way to implement a cache of points to display from the previous frame.
    You don’t seriously consider that it was Dell posting it… right?

  12. The first algo from the fake brucedell is along the lines of cube3d (JAS), a 2D plane map with a list of pixels in height; it is a shame that he did not publish its code.

  13. Sounds sweet Kyle!
    I think that if you are achieving “instant” rendering then the original algorithm is of no more use, right? Because all it does is distribute the work between multiple frames, so if just 1 frame is needed to render the image it becomes pointless…
    At this point you may have completely rewritten the whole algorithm, but anyway, have you checked my paste? I brutally removed some stuff that turned out to be almost ineffective, visually speaking. For example, the “additional sorting” loop seems a complete waste of cycles to me…

  14. Acknowledged thx.

    As an unrelated note, there is a pretty annoying issue with the original program. When I run the script it looks like the points are always kind of aligned to a grid facing the camera, no matter what direction I’m looking at. I think that’s the best description of what I’m experiencing. And the movements too are very “blocky”: when the camera moves slowly it is like it is jumping from a block to another. Why is that? Is there a way to fix it?

  15. >when the camera moves slowly it is like if it is jumping from a block to another.
    That is because of Dell’s buggy camera motion smoothing code.

  16. I wrote that comment while replying to Kyle. But since you replied first to Kyle (#50), my comment showed right after yours, as if I was replying to you. The WordPress comment system often leads to these ambiguous situations. I apologize for this; from now on I will always include in my comments the name of the person I’m talking with.

    The Magio with the dark yellow avatar is not me.

  17. about circle intersections:

    bool intersects(c1, c2) = len(c1.origin - c2.origin) < fov * c1.radius/c1.z + fov * c2.radius/c2.z
    where len of v is sqrt(dot(v,v))

    should do the work.
    As you see, the slow parts are the divisions (for z) and the square root. You can precompute the inverted z, or maybe you are already computing the inverted z somewhere in the projection code; you can also try to avoid the sqrt:

    bool intersects(c1, c2) = sqrlen(c1.origin - c2.origin) < pow(fov * c1.radius*c1.invz + fov * c2.radius*c2.invz, 2)
    where sqrlen of v is dot(v,v)

    1. and maybe you should also divide by z (or multiply by invz) the origins of your circles, if they are projected and lie in a screen space with 0,0 at the center

      1. Probably he realized that this algorithm is not applicable at all and that he was receiving false hopes.

      2. Any news about the brucedell program? Anybody improved it? Please tell me if I understand it correctly: the movement of the camera is standard (i.e. does rotations), the perspective is achieved by divisions and the visible points are picked at random (like a random walk) until pixels are occupied.

  18. To help anyone: search “octree popcnt” and see a nice encoding for octree nodes, with 4 bytes (1 int) per node, with colors only in the last level.
    With 8 bytes (2 ints) you can store intermediate colors.
    Many variations can be made; avoiding popcnt is possible too, but then you need to store the offset of the children in the 8-byte version.
    Compared with 8 addresses for the children (8 ints), the RAM is reduced to 1/8 or 2/8.
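    For the record, a small hedged sketch of what I understand that encoding to be (the field layout and names are my choice): one 32-bit word per node, an 8-bit child mask plus a 24-bit index of the first child; the children of a node are stored contiguously, so child k is found by popcounting the mask bits below bit k.

    #include <cstdint>

    struct OctreeNode32 {
        uint32_t word;                       // bits 0..7: child mask, bits 8..31: first-child index
        uint32_t childMask()  const { return word & 0xFFu; }
        uint32_t firstChild() const { return word >> 8; }
    };

    // Index of child k (0..7) in the flat node array, or -1 if that octant is empty.
    // __builtin_popcount is the GCC/Clang builtin; std::popcount from <bit> works in C++20.
    inline int childIndex(const OctreeNode32& node, int k) {
        uint32_t mask = node.childMask();
        if (!(mask & (1u << k))) return -1;                   // octant k is empty
        uint32_t before = mask & ((1u << k) - 1u);            // siblings stored before child k
        return (int)(node.firstChild() + __builtin_popcount(before));
    }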

  19. This is how Magio’s version of the algorithm works:
    – It loads the points as (x,y,z,color) tuples.
    – It generates a random permutation of these points.
    – Then it enters the main loop, with the following steps:
    — Do camera movement.
    — For each point (up to some limit):
    — Rotate that point (involving floating point computations)
    — Project that point (involving divisions)
    — Render the point (including a z-buffer check & update)
    — Move point to the front of the array
    — Paint the screen buffer on the screen.

    The original version contains some additional operations in the main loop:
    – Enlarging the rendered pixels (making quads) to reduce the number of holes.
    – Slightly permute the point array.
    – The swapping of points in the array during rendering is slightly different.

    My overall opinion of the code is that it is rather sloppy, or is using certain tricks that I fail to understand.

  20. This is my last version of the algorithm:
    http://pastebin.com/kW1t5s1c

    – supports y-axis rotation (up-down arrows)
    – supports 32bit floating point datasets
    – no need to invert the y and z coordinates anymore.
    – no more “grid facing camera” issue.
    – usage of UINTs extends cloud sizes up to 4 billion atoms.

    “thresh” is now called “stack”. Fits better 🙂

    I do believe this is as fast as you can possibly go without acceleration structures, which is the next big leap one can try.
    I lazily tried to pack points into grid cells and… failed hard :/ I can provide the source code of it, but would not recommend it…
    I don’t plan to go any further though.

    Other possible optimizations may be:
    – faster rotation function
    – faster float division (less precision).
    Just these two, correctly applied, may easily double the framerate.

    Clearly, all of this is not even close to UD. A classic splatting renderer. Just4fun.

    Bauke, I noticed your comment just now and I must admit the overall functioning is exactly the same as the original algorithm; actually the only innovation is the adaptive stack/threshold. Otherwise, just very natural cuts and simplifications are made 🙂

  21. Good work guys, you’re really getting somewhere with this. I have been doing some experiments of my own using a shallow linear octree as a scene graph, each leaf node containing an AABB (extent based, not min/max cartesian coordinates) which points to a list of 3D points. Using Bauke’s projective grid method it is possible to reject AABBs in each simplex of the cubemap (48 in total) after the frustum and half-space cull. A list of points provided can be sorted by its quadrant, and can remain so until a camera move forces it into another quadrant. Also, when using barycentric coordinates as an offset for each point from its container AABB, the code to determine if a point is within a half space is simpler and faster.
    My linear octree is 1 bit per voxel (we only want occlusion; the shader works out colour/angle/distance). Cell contents are in a hash map. The CPU renderer outputs points as 10_10_10_2; the exact distance is then calculated in the shader using AABB/ray intersection. Also a histogram can be returned which gives a complete list of any cells rendered. At the moment I’m using a 3D texture for colours, but this can change.

    Also, I had an idea for condensing each ‘cloud’ of points within a cell. At a distance, you will see a cluster of points as a single point. If each leaf cell contains a spherical harmonic of the contribution of each colour within, then the exact colour of a distant cloud can be calculated easily using an SH lookup.

    Interesting times indeed.

    1. Hello Lensman! Would you care to share with us some results of your experiments? I am convinced that an important part of UD is that it is not a renderer (or rather, it is not the realtime rendering part which is the most interesting), but that it has a way to prepare the initial database into another format which makes the realtime rendering easy.

      1. Hi, yes, I have some speculative results. Sorting 10,000,000 points into 48 signed buckets takes ~20 ms. This can be improved easily with SIMD, but is already pretty good. I’m rasterizing at the moment, and have tried a few different approaches, but what seems to give the best performance is using an edge equation to compute barycentric coordinates and drawing each visible simplex that contains data as a triangle. I must add some qualifying info, and that is that I am working with data that isn’t pre-processed, but per frame very few points change sign as you move, and not at all if you just rotate the camera, so an insertion sort is good enough once the initial bucketing is done. It is rasterization, but I end up with the exact distance to each point, at most one point per pixel. It’s promising as a technique and there is no need for a cubemap, so I’ll keep hacking till I have something concrete.

    2. Interesting work, Lensman!
      My first thought was condensing clouds of points with similar normals, but that does not fit well with isolated points; the spherical harmonic approach can perhaps work!
      Keep working and make it an open project.
