What’s going on in this UD algorithm?

In the post My idea about UD, completely unrelated to the content of it, b@b made a comment where he gives a link to a post of “BruceRDell” reproduced here:

Okay, I see you understand basic principles. Now let’s move to the more complex things. I can’t share exact UD algorithm (you must understand me, I’ve spent years mastering it) but I can express in a code what I have already said in the interviews. The thing I want to say is: you must use sorting and eliminate raycasting.
Sorting algorithm I introduce here is very VERY simple. You can optimize it a lot using octrees and more elegant heuristics than that. But keep it simple.
You’ve mentioned this http://rghost.net/48541594 dataset, I had some problems downloading it, but I did it that and can say it is good for testing, you may use it with my program.

Here is the code I am talking about. If you’re smart enough, you’ll get benefit from it.

Magio, in this comment on My idea about UD, says:

I’m probably way too late for this. But anyway I made this quick edit of the program:
http://pastebin.com/HiDW5tMB     [updated version:http://pastebin.com/kW1t5s1c  ]

___________________

Question: why does it work?

UPDATE:  It’s not clear if it really works, it’s unknown the quantitative improvement given by the random permutation trick. If you care, the BruceRDell  thing is a prank.

___________________

I would like to understand the following:

  1. what is the role of the random permutation?
  2. why does the main loop work?
  3. is this already done? (send links to papers, if any)

So, let’s have a polite and informative discussion in the comments, if you agree. We all write in some variant of broken english, therefore I don’t see why anybody can’t participate.

___________________

Not interested in: is the real Bruce Dell? Is the real UD algorithm? Only interested to understand the guts of it.

I finish with the video from b@b’s comment:

Advertisements

143 thoughts on “What’s going on in this UD algorithm?”

  1. the idea is very simple…
    you have to fill a screen with N point, then take aproximate N point of you dataset and make the traslation/rotation/projection of the point and if is outside the camera reemplace this point with other how not is in this initial set.
    In some time you keep only with the point in front of camera, a zbuffer is constructed for oclusion.

    1. the random permutation is for “touch” every point in dataset..
    2. the main loop calculate the point in N point, reject the no visible and incorporate others, for this is need time for correct image.
    3.I guess not.. this aproach need time always for converge..octree structue avoid this.

  2. Thanks Pablo, but I still need to understand the magic behind the random permutation technique. I suspect that the idea has already appeared, maybe not in relation with 3D rendering, my math nose smells something here.
    But ignore the math part, is the algorithm really working? For any data input?
    Yes, I also suspect that the speed of the algorithm can be greatly enhanced by making some recursion along an octree structure, that’s another reason for trying to understand why it works.
    So I ask!

    1. there are related to sorting algorithms, better random in selection offer better average time for convege, I guess.
      another explanation of this algo is the next..
      take all the point in dataset, calculate position in screen, draw. But this need calculate all points and at last, only the screen points are draw then only calculate N point, some not draw and then this are changes for others in not the current set.
      this give constant speed but need time to find the current set in screen, the random can be reemplace with linear selection in the array of point.
      is only split the procesing of all point in time for constant rendering speed..
      interesting aproach but not big deal for me

    2. A random list of 3d points is quicker to sort using a ping pong bubble sort (cheap) , than an already ordered list. I imagine this is it’s purpose, as a shuffler.

  3. Pablo is right.

    1) Role of the permutation
    The initial random permutation is like shuffling a deck of cards. Compiling the program without it results in un-even filling of the screen (it follows the raw order of the points of the dataset) that results in artifacts. By randomizing it, you have a uniform distribution that looks nicer. The algorithm works perfectly without it actually. It’s pure and simple aesthetics.

    2) Main Loop
    How a splatter works:
    – splat all the points at once on the screen.

    This algorithm:
    – splat all the points on the screen but a section at a time.

    Definition of “splat”:
    initial camera facing y-axis: x=right, z=up, y=depth.
    Divide ‘x’ and ‘z’ by ‘y’: perspective achieved! Basic geometry here.

    But we want to move and rotate the camera, yes?
    Before performing perspective, the points in the current cluster to be rendered are rotated to match, of course, the position and orientation of the camera. This brings new xyz coordinated for the points that are, immediately after, projected onto the screen, excluding culled points that fall behind/outside the camera and z-culled ones.

    3) Is this already done?
    I’m pretty sure my GPU does it… In fact, any splatter works like this (a GPU is a splatter, it splats vertices on the screen. Don’t kill me). The only thing is that the dataset is not traversed at once. Pure spatial-temporal coherence which is very old stuff.

    ITT I try to port this algorithm to OpenGL. This algorithm is suited for GPUs. Load all the points to a VBO, use glDrawElements with starting and ending indices to render a different part of the VBO every frame. Enjoy extreme performance.

    Marius, I made this comment:
    https://chorasimilarity.wordpress.com/2013/08/30/my-idea-about-ud/#comment-5729

    It wasn’t meant to be a reply but it ended up in the middle of the discussion. lol.

    Link to the new program:
    http://pastebin.com/kW1t5s1c

    1. Thanks Magio, the comments order was messed by the deletion of Kyle’s comments.
      Yes, I understand how the algorithm works, excepting the random permutation part, for which I have only feeling-hand-waving explanations. Yes, a random permutation mixes stuff. But what does it mean this? A permutation mixes enough, why is a *random* permutation better?

      1. chorasimilarity :
        why is a *random* permutation better?

        Likely because uniform distribution. Every atom has the same probability to fall on the screen as any other. The permutation does not need to be random by itself, only the result must appear random, or close enough. In our case, we use a random function to shuffle the array, because its the easiest way to do it.
        Not sure if I answered your doubts, but again, this is possibly the least important step of the pipeline so idk.

  4. 1) what is the role of the random permutation?
    Random shuffling gives more even distribution of points and more chance for every point to be shown. It is used just to get less artifacts.
    2) why does the main loop work?
    Each frame you need to calculate (calculation is: a) project, b) check for visibility and c) move visible points to the beginning of visible set while d) moving invisible points away from visible set) only constant number of points from two small parts of the whole dataset: 1. visible set which is 450000 pts and after reaching end of visible set you jump to the 2. some pointer which marks beginning of the part of the rest points, say 1000000. On next frame pointer will be increased and next 1000000 will be checked after projecting visible set, if pointer reached end, it will be reset to the end of the visible set.
    That is why the frame time is almost constant. But you need to wait to fill the screen after motion. This results in online sorting along with reprojection: after you stay a while all points get sorted to fill the screen. When you move already visible points is still visible and on the edges of the screen holes need to be filled again with new sorting iterations. If we could heavily speed up the sorting part and exclude redundant work somehow it will be UD. But now it is not.
    3) is this already done? (send links to papers, if any)
    Some point cloud renderers works by reducing point count while you move and by filling the screen when you stop. They however fall back to showing reducing part immediately when you move again. Maybe some of them use reprojection too, who knows. I googled “point cloud reprojection” and found only reprojection system in area of coloring laser scans (combining color from photos with depth data).

    1. b@b, [update: thank you for all contributions!] the algorithm is trivial excepting the random permutation bit. So I would like to understand quantitatively how much does the random permutation thing speeds the process. Sorry for being lazy and asking instead of doing, but I would like to see the following: (1) comparisons for the same data but 1a. with no permutation 1b. with non-random permutations 1c. with random permutation (2) comparisons for the same data with other, classical, renderers, not with the pie in the sky Euclideon’s UD.

      I recall that at the beginning, Bruce Dell claimed a relative more speed than the classical algorithms, nothing impressive, but maybe (at that time) something it could be improved.

      For me the random permutation part is funny because it exactly trashes the involuntary space coherence embedded in the way data was collected. Why is this an advantage, if it is one?
      Finally, at the beginning I was not interested in this proposal, but after some thinking, I realized I don’t get *why* (and not *how*) it works, in the sense previously explained. Maybe is just an illusion and the program is not interesting. It would be good to know.

    2. Thanks for that description b@b. So if you sorted a chunky cubes of points (half a meter in size using world scan scale), you could sort large amounts of packets (cubes) of points, the overhead of not doing visibility tests on the packet points individually is far out weighed by the fast culling gained from packet sort? The sorting is done using memory pointers to the cubes of points, rather than the points themselves of course. Does that sound feasible?

      1. Lol, I tried and failed miserably at exactly that, Dave 😀 I can share code if you want. Basically it works, but it is buggy, ugly and somewhat slower.

      2. Magio :
        Lol, I tried and failed miserably at exactly that, Dave I can share code if you want. Basically it works, but it is buggy, ugly and somewhat slower.

        OK, I’ve not been back on the UD trial for a while, so I haven’t caught up yet. I don’t understand why you can’t treat packets of points as ACTUAL points on the top containers, then do the algorithm again on the container’s contents. I don’t mean an octree at all, just maybe 2 of 3 levels of containers.

  5. To b@b:
    Right now I’m working on an old computer with windows XP on.
    I tried and it works on XP, it was just missing some dlls. It runs anywhere from 50 to 200 fps and yep, its pretty much limited.

    To Dave:
    I never explained it here in-depth because results weren’t very promising.
    Left picture is with cell packetizing, right is my last per-atom algorithm.

    Quality is… well, crap.
    The algorithm correctly picks a single atom for every cell and performs occlusion with it for the whole cell. This part is very fast. For some reason it slows down to hell when a cell turns out to be visible and every point inside is projected and culled :/ Not sure why this happens so dramatically.
    Also, some of the cells are bugged and don’t show… No idea why. I likely screwed up with the implementation somehow.

    To Marius:
    Could you please update the pastebin link of your OP post with my last version? At least it has y-axis rotation.

    1. Thanks for replying, Magio.
      OK, it sounds a like your implementation of the idea might be buggy, rather than the theory of a hierarchic algorithm being wrong. That’s not a good basis to dismiss something. ; )

      1. b@b :
        did he?

        Mankind will never know…

        Btw no I didn’t. Just a layer of boxes till now and I don’t think I’ll do that. I’m not a programmer…

        DaveH :
        OK, it sounds a like your implementation of the idea might be buggy, rather than the theory of a hierarchic algorithm being wrong. That’s not a good basis to dismiss something. ; )

        The idea in not wrong at all, it is actually a natural improvement to it. I acknowledge that I may have done something wrong.

        This is the code with cell packetizing:
        http://pastebin.com/NH8ewFK9

        The world in my screenshot can be created like this:
        cloud_gen blah.txt 1200 800 1000 100

        See if you (or anyone else) can make it worth the work.
        Have fun :p

  6. @Magio, the loop:
    for (i = 0; i < count*4; i+=4) {
    uint ri = (int(randf()*randf())%count)*4;
    SWP(i, ri);
    }
    does not give a uniform random permutation. The following is better (assuming that rand() gives numbers of sufficiently large size, otherwise do not use multiplications, but bit shifts and xor, such as (rand()< 1; i–) {
    uint ri = rand() % i;
    SWP(i*4-4, ri*4);
    }

  7. I’ve executed the algorithm on one of my datasets without permuting the data. In my dataset, the points are sorted in Hilbert curve order (such that points close together in the array are also close together as points). As the algorithm does not render the entire screen in one go, you can see it build up the screen and also see it project multiple points to the same pixel.

    So lets assume that for each pixel, there are on average 100 points that project to that pixel. Furthermore, assume that points close together have similar colors. Under these assumptions, there is a simple theoretical reason as to why using a random permutation is useful. With an arbitrary permutation, the render algorithm would, in the worst case, project 100 points before going to the next pixel. With a random permutation, there is only a really small chance that the next point projects to the same pixel. So the whole purpose of a random permutation is that the rendered image converges faster to what should be rendered.

    There is a rather similar technique in integration, which is known as Monte Carlo Integration. While the application is entirely different, the underlying principle is the same.

    1. Thanks Bauke. That’s a reason for using a permutation, which mixes stuff. But what is the advantage to use a random one? (i.e. one picked arbitrary from all permutations. Btw, how do you construct random permutations? Something fishy here.
      The algorithm makes me think about random walks: at step t you are at x and at step t+1 you are at y, picked at random with respect to a probability measure depending on x. Maybe this “random” permutation thing is a naive way to implement a random walk? Again, what’s the quantitative advance?

  8. I gather from the comments that basically nobody has any idea why exactly the random permutation trick is useful.
    Dave H, I am curious how effective is the algorithm on one of the fractals you use as example for your rendering algorithm. (Remember the discussion about metric dimensions and rough measures of curvature, natural scenes and fractals?)

  9. You’d better define “useful” then. Technically it doesn’t speed up rendering: you still have to splat the entire dataset to make sure everything has been drawn, permuted or not.

    1. ine, I agree. I want to understand if the proposal is useful or not. Maybe is not. Or maybe there’s a nice piece of math hidden there. (Which I start to doubt, but there is still a bit of a spark of hope.)
      The only novel part (maybe an illusion caused by ignorance) is in the random permutation. I agree and understand all the hand-waiving arguments, but are they an illusion as well? I don’t try to be bossy or otherwise, just curious.

  10. Well, the argument about the visual perception sounds convincing for me. The permutation makes the program fill the screen uniformly, instead of filling it in patches, as it would do without a permutation. So while the scene is not fully rendered, it looks nicer.

    1. Yes, is convincing and interesting for me because it shows that there might be an advantage not to use absolute coordinates (which are not accessible to perception, they are a manifestation of the homunculus fallacy in the CS research on vision). But that’s where I stop understanding: OK, you shuffle the points in order to increase the chance (for a greedy algorithm for painting the pixels) to finish sooner. I get it. But is it a true effect? Does it work as well for Dave H fractal world? Does it work as well for a collection of random 3D points?

    2. Yes, Magio originally said “The algorithm works perfectly without it actually. It’s pure and simple aesthetics.” It’s not part if the algorithm, it’s just presentation, like spray painting large areas, rather than thickly painting small areas one at a time.

  11. >But is it a true effect?
    Depends on your point of view. The algorithm will not finish the complete scene sooner on average, so technically it’s not a “true effect”. But if it has drawn 99% of the scene, with the permutation the holes will be spread all over the screen and will be barely noticeable, while without the permutation you may see a single large hole taking 1% of the screen.

    You can compare it to the progressive JPEG pictures. It doesn’t save time to load or make the picture smaller in size, but it makes it nicer while it’s loading.

  12. What a discussion for such a small thing 😀

    No one else wanna give a bite at the cell-packetizing-hierarchical-containers idea? It’s the only way to make this algorithm worth something…

  13. Care to explain how exactly cell packetizing is supposed to work?
    From your description I understood that if one atom in a cell is visible, you splat all atoms in that cell. But what if it isn’t?

    1. ine :
      Care to explain how exactly cell packetizing is supposed to work?
      From your description I understood that if one atom in a cell is visible, you splat all atoms in that cell.

      That’s basically all 🙂

      Also, lossless precision truncation can be applied to save half of the memory (aka write the point coordinates as relative to the center of the cell, thus requiring like 1 byte to store the full precision instead of 4 for floats).

      ine :
      But what if it isn’t?

      Easy enough, you don’t splat the cell.

      Warning: artifacts ahead.
      95% of the times if an atom inside the cell is not visible, the whole cell consequently is not. If a cell is far behind the camera, you sure don’t need to check every atom inside to see that the cell is not visible. But of course there are times where that particular checked atom is not visible, but some of the remaining atoms may be visible. This leads to some artifacts but the speed boost should be worth it.
      Btw a big part of those artifacts can be fixed, but at this point this is not really our primary concern.

  14. Well, I don’t know 🙂 Square holes in your screenshot, which are probably artifacts of partially occluded cells, are rather disturbing. It doesn’t really solve the occlusion problem, you see. So in the end we’ll have to splat the entire dataset again it seems. Again, it may draw a large part of the picture quickly and then spend a lot of time finishing it.
    There is no “true” performance gain from this, unless you have an idea how to fix the artifacts faster than by rendering the entire scene.

    1. I guarantee almost for sure that those are not occlusion artifacts. I can tell because the missing cells are actually rendered, but not in the correct position. In fact, the atoms are rendered with coordinates stretched to infinite +x and +y, with a blueish colour.

      Actually… yes, you can see them in my screenshots… I accidentally captured them: image coordinates 150, 290 🙂 Those are just bugged. Forget them.

      ine :
      There is no “true” performance gain from this.

      How does this idea improve performance?
      It speeds up checks for non-visible cells. In the worst case (=if all cells are visible), it shall render at the same speed as the naive algorithm. But 100% of the times just a fraction of them are visible. Being pessimist a well designed algorithm like this with one layer of cells may gain:
      – 4x in performance (with an average of 1/4 of cells in view?)
      – half of memory used.

      Again, I’m not a programmer. I probably did a very bad job with that implementation. And yeshy, that’s why I’m asking.

  15. I put here a checklist for what I consider that can be qualified as a UD solution. I could make another post with it, but maybe later, because I don’t want this blog to be monopolized by UD seekers. Therefore please spread it, discuss and criticize it here, in particular. I believe that a UD solution should satisfy the following:
    1- has a pre-rendering part consisting in the preparation of the database (with no time constraints)
    2- has a real-time pre-rendering part which uses the database from (1), does not use multiplications, divisions or trig functions, is a sorting algorithm and outputs at any moment a buffer with a sufficiently small number of points,
    3 – so that any (sufficiently clever) rendering algorithm can paint the screen in real time by using only the buffer.

    This schema is consistent with all claims made by Dell.

  16. 1- there has to be a constraint on the size of the DB, otherwise the problem is trivially solved by saving the reference frames (aka snapshots) on a fine grid. Say, O(N log N) for N points.
    2- “Is a sorting algorithm” is too fuzzy. Is octree building a sorting algorithm? Why is this a requirement?
    3- so the point renderer can use multiplications etc.? I think this is reasonable. It would be even better if a bound on the number of pixels could be specified.

    1. ine:

      1- agree
      2- to be consistent with Dell
      3- yes, again, to multiplications in the renderer. Second part because Dell’s UD is not a renderer, he declared this repeatedly. I believe since a long time that the real problem is the database, not the renderer.

      More clarifications:
      – the buffer does not contain only the visible points, but it contains a small number of points (or pointers to octree levels, or whatever) such that
      – if we call t_1 the time needed for updating the buffer and t_2 the time needed by the renderer to update the screen with the input from the buffer, then t_1 + t_2 has to be small enough to count for “realtime”.

  17. Just a heads up, in order to stream from the HDD, at fastest I can only get 100k reads per 250ms; which is your budget during traversal.. unless there is something more that can be done to improve that, going to try a few more things to maximize performance, but currently that appears to be a cap;

    1. I once did an analysis of how fast certain hard disk operations are and found the following statistics:
      – It can read 100MB/s if the data is being read sequentially from disk.
      – Random access, i.e. seeking to a specific location on disk, requires at least 9 ms.

      I used these for analyzing what Euclideon’s database conversion algorithm is able to do: https://chorasimilarity.wordpress.com/2013/05/25/call-for-analysis-of-the-new-ud-video/#comment-3547.

  18. In this case it would be random access, and using low-level driver calls to read from the file; I am not sure what you mean by “requires at least 9ms”, but I can assure you I’ve only squeezed 100k reads within 250ms, so 400nodes/ms this is keeping in mind, my octree is terrible and uses pointers instead of morton code;

    each node is 1 char, 9 ints, I am sure the reads per second can be otherwise improved with morton code to 1 int per node using alpha channel for that 1 char i have;

    1. Hey Kyle, the HD work with buffer of fixed length, read less is more work to SO, better plan read block of nodes. and is better when the size of chunk is pow2.
      37 bytes per node is not good, like i say, 16 bytes per node is posible and easy.
      when you publish you code?? you will publish your code??
      if you wait to finish never can help you, and us lost interest or something better appear, learn from Linus, open source is very powerfull.
      Not everybody take care of this, look JAS, they have a code of 17 years old, if be open, many people can improve..but perhaps the code lost forever in a hard disc..not usefull at all.

  19. Eh.. Just trying to improve my reading, currently I traverse about 50k through 300k nodes with fairly confined changes to scene.. was trying to just decrease this as much as possible, I am guessing the memory alignment of pow2, and the order of hilbert code may improve my 100k cap to squeeze more out of that .. but I have no idea how to do it.. the PDFs/WIKI are not making sense to me, likely cause I have not read them, I just look at the images and give up trying to understand it.. I am actually being greedy and trying to sale my algorithm once I improve the reading from hdd portion of it, and I suppose clean up/structure the code better so its worth purchasing.. open-sourcing doesn’t make anyone any money..

    I’ve learned that from a friend, who open sourced an ESB more powerful than pubnub, now everyone downloads the free-serialized-obfuscated open sourced version with very limited specs(still a hell of a lot better then pubnub if put on a beefy server since his free one doesn’t allow scaling, and clustering);

    now he sits by waiting for clients to purchase an enterprise license with a slim chance..

    1. I think you are wrong, Linus did not do too bad, right?. but it’s okay, everyone decides to do with your code.
      So you have have a problem with storage, maybe find a good algorithm for ray tracing, but lacks the idea of ​​storage, although we will not know.
      Reduce octree node is not difficult, it is replaced with a base address plus an offset, build the tree has a few tricks but not impossible, in fact I I have published as open source code.
      but this does not solve the issue of storage, there must be another detail to compress this tree, as having addresses in the nodes and this does not allow reuse nodes.

    2. Kyle, I think Pablo is right. There is one thing the open source brings you: notoriety. You can cash that later. An open source UD will appear (there is already one by Bauke, but I keep claiming that the real improvement comes from the clever database pre-processing, thus Bauke proposal is very interesting, but different idea than UD).
      Instead of keeping your work for yourself, why not share it in such a way it’s clear you are the author? Your choice anyway.

  20. If anyone wants to make money with me, then I am happy to work with them providing my algorithm/traversal, and their optimizations.

  21. oh wow, you’re right.. if I can get it down to 16 byte per octant, then I can read 4 at a time, with hilbert order, then I’d be at 400k per 250ms.. which is actually enough; how the hell do I do that though — google

  22. Euclideon may have made a specific version for landscapes (version 3?).
    If you see the problem as a 2D grid seen from above, it maybe easier to grasp the streaming process.
    If the camera is surrounded by a grid of 32 by 32 squares which wrap around the closest 3 mile area for example, then that would mean a fixed 1024 current files to read. As a grid cell got closer to the camera, the file pointer would move down the cell filling in the detail sequentially until the final detail at the end of the file. The cell could be kept in memory in it’s entirety until the it wraps around the 3 mile limit to save streaming the same data again if the camera toured the local area.
    This would work in 3D I guess, but it’s not as easy to explain.

    You can see potential 2D block streaming over the internet with Geoverse here. The cells appear to be cued, and judging by typical internet streaming they all arrive at different times. Just jump to 12:15.

  23. There are various reasons why I chose for making my software open source. It helps me reach a wider audience and in the case of open source, a wider audience implies a larger development team, with all kinds of original and inspiring new ideas. And while I might move on to other projects, my creation will keep on living and evolving within the community. It will also give life to all kinds of new games and applications, based on my creation and thus also entirely open source. So in that sense, open source pays for itself, without me having to do any boring stuff, like monetizing my creation.

    Monetization was not one of my goals, because it kind of takes all the fun out of developing an unlimited detail renderer. I probably can also make money more reliably with the job I have. And even if there where a potential buyer, why wouldn’t they buy the more mature Euclideon software?

    Monetization is still an option though. I’ve released it as GPLv3, but because I’ve written the code, I can still sell it to people under a proprietary license which allows them to use it in a closed source project. I can also accept money for writing specific enhancements or to provide other kinds of support.

  24. @chorasimila
    Have you tought about building a forum?
    I really think that a well divided phpBoard forum with the right division between subjects (mathematics, UD, etc) would be closer to the platform for open project interactivities you seem to be looking for.
    I’m a noob just trying to understand UD for some time, and how it works. I don’t know how you guys would read my stuff knowing that I have something between 0 and NULL from 3D programming. But here it is, my idiot thoughts:
    1 – Ray-tracing is out of the game for UD.
    2 – The database file data organization for fast searching is one of the big needs for this.
    3 – The search algorithm. In many videos from Euclideon this algoritm is mentioned and pointed as the “magic” behind most of UD. What’s the best way of searching on a cloud of dots? Google search algoritm is even mentioned in a UD vídeo.
    4 – Like mentioned by BruceRDell. Stick with the simple.

    And this is my most ignorant self: “The screen is nothing more than dots filled by a mathematical algorithm. It is just 2073600 dots (fullHD).”
    Maybe forgetting e re-learning from simplicity is the best for this.

    I’m sorry if I am just a random guy. And I love you all for trying to build this. Going against a gigantic wave that is just building games that demand more and more extremely expensive video-card and delivering much less.

    1. Thanks, a forum would be great, but I don’t have time to manage it. If everybody wants to do it then I’ll participate and repost the zest of discussions here, when converging to something.
      Also agree with all 1-4. It appears that it is possible that the original UD uses something resembling what I proposed in My idea about UD and the random stuff ingredient of the fake brucerdell.

  25. In the movie above, posted by DaveH, Bruce Dell mentions, starting at 7:27, that their database conversion is very fast, processing 3.500.000.000 points per hour. Where common geospatial programs take days or weeks. So whatever the format of the database might be, you have to keep in mind that there is only very limited time to create it (instead of infinite time).

    In another movie it is mentioned that for conversion, you have to chose some kind of resolution. And with a lower setting, points might end up at the same ‘voxel’. In that movie it is mentioned that these duplicate points are discarded.

    Both the conversion speed and the discarding of pixels make me believe that it is converted into some kind of octree. After all, octrees are a quite efficient indexing scheme. (And I could imagine Bruce being too proud of his invention of the octree to not use it.)

    As a last point, I want to mention that somewhere Bruce mentions that the algorithm is progressive, in that it reuses information computed at previous frames for rendering. Kind of like the algorithm from fake brucedell does with its tresh/stack/cache. I think I might have found a way to modify my algorithm to make it do such progressive rendering.

  26. I am indeed interested to see the changes you make to your existing algorithm, hopefully it will speed things up enough for you to run it at higher resolutions like 1900×1024

  27. Also, how does your traversal perform(reading from memory) against very dense data sets like the euclideon island, with 64million points per cubic milimeter;

      1. follow the forum.. he convert from a 4gb file to 2gb,with code in the post. I can’t compile the code (with gcc) but the guy not finish the idea..

  28. Hi,

    I have some thoughts concerning BruceRDell algorithm no 2. They mostly refer to projection algorithm, some are obvious.

    As someone mentioned rendering is done by ordinary point projection, same as performed on GPU using traditional rendering pipeline. The only thing that might be taken as unusual is why isn’t it done with matrices.

    The point projection method done this way is of course inaccurate, especially in closeups when difference in distance between point and camera in floats and integers becomes significant. Some minor errors (~1px) are produced by further integer calculations.

    It is possible to replace remaining floating point operations by integer ones and, by some additional operations, reduce most of errors in closeups. Because of let’s say “insufficient int capacity” some inaccuracies still remain and are noticeable and annoying. With my naive integer-math implementation with closeup-error minimization I managed to speed up the rendering process by about 20%.

    In turn when fp operations are used (some minor corrections need to be introduced though), the results are accurate, but algorithm becomes slower by ~20% compared to the original one. In this form it is also slower than its SIMDized matrix equivalent by ~6%.

    On single core of my notebook’s 2.26 GHz CPU I can perform approximately 14 million point projections (using matrices & SIMD) per second (this value can be substantially different as it depends on frame and depth buffer size and degree of filling of them).

    In video with interview with Bruce Dell, Dell was showing Island demo running on 1 core of i7 2630QM processor, screen resolution was set to 1024 x 768, fps varied between 15 and 25.

    One thread of mentioned processor has, according to benchmarks 47% higher performance than mine, so potentially it can render 20.54 million points per second.

    1024 x 768 = 786432 pixels
    20.54 million / 786432 pixels = 26.12 fps

    One has to remember that 26 fps apply to rendering of simple point list. And this is the maximum speed the whole renderer can achieve (i.e. if no additional operations e.g. sorting are performed).

    These 26 fps will be decreased because of the heuristic nature of the algorithm, which in order to find final image requires significantly more points to be checked (i.e. rendered) (not to mention that point swapping or any other sorting operation takes also time).

    If it is possible to reduce number of checked points to the number just slightly larger than required by screen size, then 15 to 25 fps seem possible.
    I’m not an expert in this field, but I guess after 15 years of development an appropriate or at least tolerable solution can be found.

    What intrigues me is why people asked Bruce Dell if his engine performs ray tracing and no one asked if it simply projects the points? Why wouldn’t he tell it anyway? Maybe because when the people knew the truth, the magic aura around UD would be gone? Especially after Dell’s categorical distancing from traditional rendering.

    PS. I can install and maintain a forum on my private server, just give me a sign you are going to use it and be active:)

    1. Hi Paul,
      It is not clear to me that using the calculation of the projection point, this needs a division or multiplication. I tend to think that you need a Rasterization algorithms to avoid these calculations, but mine is just an assumption.
      I use only integers and fixed point numbers to draw in 3d. Using only integer numbers you have more control over the accuracy of the calculations.
      Count with my for contrib in the forum.

  29. euclideons algorithm, and the one presented by Fake BruceRDell will never work on the GPU…. I wish people would stop thinking the GPU is the answer to everything;

    if the GPU had a built in hard-drive, and the hard-drive was specifically designed for IO operations by the CPU, then one could upload the data to the GPUs HDD, and then read it directly from there, otherwise this algorithm is never ever never going to ever never work on GPU.. only the rendering of points

    think of it this way; one would need to run very minimal operations on teh GPU, just to tell the CPU to upload a point, so it can know what to do next in its algorithm.. then the algorithm would repeat this until the screen is full of points.. SLOW as hell; after finally getting a scene rendered in a very inefficient way, you could finally move the camera, and start the process over, at minimally if your algorithm is good;

    either way… its all super inefficient when considering the GPU for anything other then the actual rendering of the points; one day we will have one of two things(or both) higher demmand for IO read operations, making GPU useless for this algorithm.. or, a specialized hardware combo as I mentioned with optimized IO read/write operations JUST for a specialized GPU capable of running this algorithm;

    if a GPU had that, then buake algorithm could easily run at 1000fps while streaming points from the HDD;

    in fact, I wouldnt be at all, or even remotely surprised if euclideon is experimenting on this GPU/HDD hybrid, or at least working on faster IO operations; I mean its not a IN demmand thing, and so we just don have it… there are internet connections faster at sending data across the world, than your hard-drive is capable of sending to your CPU for manipulations in ram

  30. Mateus, Kyle, everybody, make a forum somewhere, because here there’s an effort to be concise and high information density. (Excuse me Mateus for trashing your last comment and putting instead this one, which is almost void in info, as well.)

    Happy to see you talking, otherwise, and interested in what comes out of this.

  31. No news and unfortunately no time to code or to post. I only sometimes check if someone posted something (people only register and do not post).

    1. In convert user manual: “It is possible to continue using UDS files for large data sets however users may experience vibrating points when trying to view the data in close detail.”. A dataset with >1.000.000 points in diameter is considered large.

      Can we deduce anything from that?

      1. Maybe. This is problem is often seen in floating point mathematics, where the numbers get so large that they can no longer be accurately represented in 32 bits. It’s a problem that often appears with rendering large areas. You start to get things shaking and jumping about as rotations and projections lose fractional precision.
        BUT, if they’re using fixed point integers then…. hmmm.

      2. My guess.. some displacement in scale store the position, when not get full precision because some lower bit are missing, not calc the real position correct.

  32. hi!
    in the video postet by DaveH it is interesting to see, that the engine has no problem rendering the scene from two different perspectives. the videos i saw before showing the island had perfect reflections in the water. these could mean another rendering pass or maybe cubemap stuff.

    gpu:
    the only problem i see is transferring the data to graphics memory. the data could be encoded into textures (octree -> texture)… i have no experience with that.

    in my opinion, sorting every frame is crazy. maybe im wrong…
    if you use an octree, your data is sorted already. why bothering? the levels can be stored sequentially, so reading them from hdd shouldnt be a problem. i also dont know why a node would consist of one char and 9 ints…
    even occlusion is fairly simple with octrees. looking at the patents from euclideon, they use octrees and the key is to traverse it correctly (once) and reject unneeded nodes.

  33. @bauke
    ah ok. now i get it. the island was turned upside down and just like the regular island, the visible parts were rendered -> fake reflection. i thought it would take more rendereing time but its still not more than one atom per pixel.

    well, here is my opinion on the ud algorithm:
    1. data: every point cloud is saved in a sparse octree. that means, only the nodes containing points are saved, other nodes dont take up space. (as far as i see it, you cannot use morton codes when the octree is sparse. at least it doesnt make sense to me, except if u use a hashmap with morton as keys). every point requires one bit. since you know where the node is, there is no reason to save x, y, z coords. then you can add one byte for palletized colors and maybe one byte for a normal vector (not sure about this yet). the higher levels also save a color representing the underlying nodes (this is important).

    2. rendering: you traverse the tree depth first from near to far. this is simple because of the octree “grid” structure. you only have to go down the full depth for points close to the camera. for points further away, which would be smaller than one pixel, you stop at the parent node and use its color and possible other informations. so in the end the tree wont be traversed fully and a lot of nodes can be rejected by simply taking the distance into account.

    3. occlusion: by rejecting occluded nodes, even less nodes need to be traversed. if a node at a very high level is occluded, a whole branch wont be visible and there is no need to traverse it. this is easy. i have a funny idea for that 😉

  34. think about how google map does clustering, imagine the octree in 2d space, imagine how changing perspective might effect the clusters and their children, then consider how much easier it’d be to traverse this progressively in that sense.. data to screen approach will always be better than screen to data.

  35. @kyle:
    i know what a quadtree is 😉 . google map uses a quadtree? if i think of a quadtree, it seems quite easy to traverse it correctly from any perspective. it helps to find only the needed points. yes, its faster to go through data that is not indexed, but if you have to check much more data, it gets slower again. the hierarchical structure also allows simple LOD.
    another thing, why do you think my idea is screen to data? i thought im doing data to screen.

    1. at 9:06 the streaming over the internet is demonstrated. one can clearly see the lower lod used to overcome connection speed. to me that looks like an octree which is just not traversed to full depth but stopped at higher levels. that way less data is required and rendering is faster but it looks blockier, just like in the video.

      1. This has been discussed before. It’s the v.3 software that appears to work on an area 2D streaming system, and the Internet deliverers packets when available. They don’t allow comments because they made a misktake by introducing the massive gaming world into thinking it’s the next best thing in their world. It was a mistake, simple as that.
        But it does show they do appear to be headed by extremely naive business positioning, and are feeling their way through foggy territory. It’s sad more than anything. I wish them more powers of thinking, but like Atomontage, they flounder horrible in getting their software to the public, probably because of the dollars shining in their little eyeballs! It’s very precious to them, you see.

      2. Dave H., let’s stick to speculations only when it comes to how UD is working.
        Mateus, I trashed your last comment. Dave, it’s on the boundary, but anyway:
        Mateus and Dave, do you have any proof for your claims?
        For example, I think they don’t allow comments because of past experience with hate messages.

      1. They simply don’t want the hassle of moderating the frustrated comments and jeers from the game playing masses and CGI people. After all, it’s nearly 2014 and we still can’t demo it ourselves.

      2. Dave, you can buy it, why don’t you? Frustrated comments were made by people rejecting everything before taking some time to think.
        Please, there are many other places for this type of comments, here we try to be constructive and curious, not venting social discontent about a nobody who dares to think independently.
        Who cares about that? I still look forward for variants others than Bauke’s, and in the same time I’m congratulating him for being the first to come with a decent constructive solution.

  36. chorasimilarity :
    Dave, you can buy it, why don’t you? Frustrated comments were made by people rejecting everything before taking some time to think.
    Please, there are many other places for this type of comments, here we try to be constructive and curious, not venting social discontent about a nobody who dares to think independently.
    Who cares about that? I still look forward for variants others than Bauke’s, and in the same time I’m congratulating him for being the first to come with a decent constructive solution.

    I was just expanding on exactly what you said: “I think they don’t allow comments because of past experience with hate messages.”

  37. thx dave, i think i understand it. the final 2d images are streamed in whatever quality…
    i didnt know it was discussed before.
    they probably disabled comments because of notch rage 😉 i also think they are doing something wrong. we should all be playing games with unlimited detail by now. but what do we get? calls from random strangers.

    i did a little thinking and i always have to do projection for each pixel. am i doing something wrong or is there no way around this?

    1. wait! now that i think about it, i dont understand it. in the video, a link to the file is opened, so there is no software streaming 2d images. where can i find the information on that?

    2. roy: take a look to 3d cube post in this blog. I guess the UD are similar in some point (not full similar). The main idea is store the map in diferent resolutions, when you draw in perspective, only the close point need the full resolution, more far points, less resolution needed.
      No need load the entire map in mem, only chuncks of diferent resolutions.

      1. Yes, exactly. The files are stored in a resolution progressive order, so the data can be increased in resolution without the read ‘head’ jumping to new start positions. For a better explanation try searching google for “progressive jpeg”. This must basically be a 3D version of this, if they are indeed storing the data as 3D points.

      2. thanks for the hint. im reading it right now. but it doesnt look that promising. everything said is mainly about raytracing. i guess i have to code something…

      3. roy :
        thanks for the hint. im reading it right now. but it doesnt look that promising. everything said is mainly about raytracing. i guess i have to code something…

        Hmm, I thought we we talking about how the data is loaded, there aren’t many options and this seems to be the most viable for the basic streaming abilities that Geoverse appears to have. They only load the resolution of data they need according to distance in the view frustum. Mip-mapping and LOD are very much present in old and new rendering techniques, and are directly linked to this idea.

  38. Dave H. :

    roy :
    thanks for the hint. im reading it right now. but it doesnt look that promising. everything said is mainly about raytracing. i guess i have to code something…

    Hmm, I thought we we talking about how the data is loaded, there aren’t many options and this seems to be the most viable for the basic streaming abilities that Geoverse appears to have. They only load the resolution of data they need according to distance in the view frustum. Mip-mapping and LOD are very much present in old and new rendering techniques, and are directly linked to this idea.

    sry dave, your reply wasnt there when i posted mine. i replied to preda. i meant that in the 3d cube post, there is a lot of hard to understand talk going on. and often there is some kind of rays involved.
    but back to data representation. you and preda say that the data is saved in multiple resolutions and i actually looked up progressive jpeg.
    that is exactly what i would expect from an octree. the deeper you go into it, the higher the resolution. thats why i said it looks like they use an octree. each level is higher resolution so just stream the higher levels depending on the connection. i still dont understand why that cant be the case. sry if i dont get it. i actually know quite a bit about computers and programming, but i never really did cg.

    1. I’m really only working backwards from one of the things we know for sure, that it can stream over the internet, or a USB 2.0 stick.
      The file pointers and position data could very well be in an sparse octree at, say .1 mile resolution. And that may be loaded permanently into memory for speed, but I was wondering how the masses of data could be stored and retrieved efficiently from streams.
      The whole data can be chunked up into one single file, but there needs to be some kind of coherent data within, otherwise it’ll be wasting time moving all over the file.
      But I guess that’s the secret of their ‘search-engine’ grade sorting code! 🙂

      1. ok. i have no answer for that. my first thought is to just save the tree breadth first. there shouldnt be too much seeking backwards, i guess. but a lot of skiping is probably involved. i dont know how expensive that is.

      2. chorasimilarity :
        I continue to claim that the new thing of UD is exactly this database format.

        i read nearly everything in the comments and you said this a few times, thats right. but at least i am still strugling with efficient rendering. i dont want to do projection for each pixel, so im still trying. there needs to be a good solution for that, then the data structure can be improved. at least thats how im trying to do it.

  39. but im obviously too dumb to use the comments correctly -.-
    chorasimila could you please make my comment a reply to dave and delete this one, thx!

  40. From the documentation referenced by pabloreda (http://www.merrick.com/merrickandcompany/media/Files/Corporate%20documents/Euclideon/GeoverseConvert_UserHelp.pdf), I get that the conversion rate is 1 million points per second. It also mentions a ‘grid’ whose size must be a power of two, however it is unclear what a ‘grid’ actually is. (I think a ‘grid’ is a cell in a 2D grid). You cannot really do any other sizes with octrees.

    Then there is the (failed) patent that Euclideon filed: http://www.ipaustralia.com.au/applicant/euclideon-pty-ltd/patents/AU2012903094/, which explicitly mentions octrees.

    Furthermore I want to remind that Bruce invented the octree (but was not the first to do so).

    So everything suggests that Euclideon is using octrees in their algorithm (and then likely also as their database) and I have not seen any claims that are incompatible with them using octrees.

    Hence I do not think that their database format contains any notable new things. Instead I believe that their database search algorithm, i.e. how they traverse the octree, is what is new.

    Thus, I continue to claim that the database format is only a fancy version of the well known octree format.

    1. i agree with you. they use an octree…
      btw the system requirements say directx 9. so they are far away from using software for rendering. the cpu selects the points, the rest is done by gpu. there goes the perspective projection problem.

    2. I agree it was pretty clear that the data structure is an octree and the key point is the octree traversal, see this post from 2009 made by a guy who was working at euclideon and then left (according to his posts)…

      http://www.gmscript.com/gamemonkey/forum/viewtopic.php?f=12&t=419

      Quoting from the post:

      “I’d like to mention Unlimited Detail, a technology developed by Bruce Dell, which does in fact do exactly what he claims it does… Render incredibly detailed 3D scenes at interactive frame rates… without 3D hardware acceleration. It accomplishes this using a novel traversal of a octree style data structure. The effect is perfect occlusion without retracing tree nodes. The result is tremendous efficiency.

      I have seen the system in action and I have seen the C++ source code of his inner loop. What is more impressive is that the core algorithm does not need complex math instructions like square root or trig, in fact it does not use floating point instructions or do multiplies and divides!”

      1. thanks for the post! thats very interesting. i have a few ideas for traversal. but i cant get past the rendering…

      2. Thanks for the link. It clears up some of Bruce’s vague claims.

        He also mentions that Bruce’s claim “the system isn’t ray tracing at all or anything like ray tracing” actually means that “the method did not contain any mathematical or structural concepts of a ray”.

        I have used “perfect occlusion” to describe my algorithm. So it seems that I’m actually quite close. The difference is that I traverse each visited octree node 2-8 times on average. I can bring that down to exactly 1, but then I have to implement more difficult occlusion queries (described below). I’m considering using a Fenwick tree for that. I’m also considering a slightly faster, but courser occlusion check based on the quadtree that I’m already using. Other ideas are welcome.

        For the new occlusion queries I need 2 operations, working on a 1024×1024 grid, with each cell initially unmarked:
        – Mark a single grid cell. (Performed at most 2^20 times.)
        – Query whether a certain rectangle contains any unmarked grid cells. (False positives are allowed to some extend, but reduce rendering speed. This query is performed once for each octree node traversed, approximately 2^23 times.)

      3. Thanks Pablo, *octree style*. I think that full UD needs new ideas about the database formats and also new rendering styles. Bauke’s solution is concentrated on the second part, I still look forward for proposals on the first part (I have one myself, but not enough time to program it, but who knows? maybe sometime I’ll give it a try).

      4. “I have seen the system in action and I have seen the C++ source code of his inner loop. What is more impressive is that the core algorithm does not need complex math instructions like square root or trig, in fact it does not use floating point instructions or do multiplies and divides!”

        that tells me: the outer loop is probabply foreach octree/object. the inner loop is then the traversal of the visible octrees determined by the outer loop. the inner loop uses only integer additions. so for each object, the complicated stuff is only calculated once at the start. like projection etc and it probably uses floats and divisions there. then in the inner loop the points are selected from the octree depending on the calculated values. the points dont need to be projected etc. their position is added to the relative position calculated once at the start.

  41. What and awesome discussion. The method that first came in my mind about the “octree style” is that maybe it’s some kind of threaded octree where each cell contains a reference to it’s neighbour. This speeds up considerably the computation of the addresses of neighbor cells; since otherwise in the worst case scenario finding a neighbor cell implies going all the way up the tree to the root cell and then down again.

    1. maybe is misunderstood something, but if you want a neighbor cell, you have to go up once to the parent node. that means one pointer up should be enough.

  42. “that tells me: the outer loop is probabply foreach octree/object. the inner loop is then the traversal of the visible octrees determined by the outer loop. the inner loop uses only integer additions. so for each object, the complicated stuff is only calculated once at the start. like projection etc and it probably uses floats and divisions there. then in the inner loop the points are selected from the octree depending on the calculated values. the points dont need to be projected etc. their position is added to the relative position calculated once at the start.”

    Actually this could be a challenge to chora and other mathematicians, to come up with working formulas (not sloppy programming tricks etc.) to project points from 3d to 2d without requiring multiplications or divisions, only incremental calculations.
    The database is an octree and most likely doesn’t contain any notable new things. The traversal is the novel thing, it seems.

    1. yeah, if there was a way to calculate projection and other stuff just vor the 8 corners of the octree root and then calculate each points position just by adding/subtracting would be nice.

      1. Hi roy, go and look at my proposal for UD post. You can do it, for example, if you add a sparse regular lattice of points to the database, together with the precomputed visibility of these points from one POV. Then you may use this when doind the realtime rendering part, both for selecting from the huge database only the part which is potentially visible, and for using only additions and substractions.

  43. The person that shows some code that includes something like this:
    Maybe two C/C++ source files, one including very basic Windows callbacks for basic keyboard or mouse controls, and perhaps a single assembly file, and a very basic file loading code routine, the other being the main square array filling code. That will always be the winning example of how this works. Using multiple libraries and other modern black box libraries will always be slow and probably wrong. Geoverse will always be THAT simple. Remove all extraneous libraies, and all the helper functions that you we’re told to use in modern computer education. And then you’ll be thinking correctly.
    Just an idea. It just seems that people approach things in such a high level of computing these days, that they forget the power of assembly language, and how messy and bloated library functions can be.

    1. i only use plain c (NOT c++). i only use the basic libraries like stdio for file reading. for drawing i use win32/dib. if i would go more low level, i would be using assembler. and when i have something working, ill add simd/assembler instructions.

  44. It is impossible to do perspective projection without division, however, you’re traversing an octree, so you can do a base-2 long tail division with only a constant amount of integer additions, subtractions and bit shifts per octree node.

    So the speed of the algorithm is determined by the number of traversed octree nodes. To get this down to its absolute minimum, you have to use “perfect occlusion”. That means that you only traverse each octree node that would be visible in the scene you’re rendering. (Imagine the scene. And then add an octree node in it. If it is visible, you can visit that node. Otherwise you can’t.) Furthermore, you’re allowed to traverse each octree node at most once (per frame).

    To some extend I managed to implement these. However, I visit octree nodes more than once. Also, I can only do the projection when the camera is aligned with the axes. That means that there are only 6 valid camera positions, (24 if you count those that give the same result after 90 degree rotations) but that is sufficient to compute a cubemap. I compute that cubemap and then render it. It gets me to 25 FPS at 1024×768.

    Please read: https://chorasimilarity.wordpress.com/2013/06/06/guestpost-a-proposal-of-a-ud-like-algorithm-by-bauke-conijn/

    1. i googled base-2 long tail, but i dont quite unterstand it…
      as far as i know, one can use simple bitshifts instead of dividing when the divisor is n^2 and if the divisior is not n^2 one has to do a little adding. but i also read, that the compiler does this anyway. so everytime you write x / y, the compiler optimizes that for you. that makes me think, that the ud algorithm might not use division/multiplikation nor the bitshift equivalent (because in the end it would be the same…) while traversing and drawing the tree.
      i agree with you, one cannot do projection without division / multiplikation. but i believe for each octree, there are only projections for the root required and the projection of all children can be interpolated according to those values (by simple additions/subtractions).
      i also agree with you, that only the visible nodes should be traversed and that perfect occlusion is the key to that.

      i looked at what you did and im very impressed. its very clever. although i dont understand it in full extend. do you use some kind of rays? in your code, you use opengl and sdl. how much fps do you get if you use software only? what struck me when i saw your video, in moments when sparse data is very close to the camera, one can see each point very clearly (just like in ud too). to me, it looked like you rendered actual cubes. the ud videos show flat quads, not cubes.

      the pictures here are also very interesting, if someone likes to look at baukes approach:
      https://chorasimilarity.wordpress.com/2013/07/11/the-weekend-round-table-on-ud/

      1. Octree traversing in projection space is feasible, as You said it can be done based on projections of 8 corners and root of the octree. Notwithstanding one division (so called perspective divide) is still necessary.

      2. You are describing D.Meagher’s patent in a way. He states that you only need to project to a certain pixel size, then use binary subdivision to find the dividing positions in screen space. So you only mathematically project far enough to fool the eye.
        The old Quake engine went down to something like 16×16 pixels then did normal interpolation, as it was good enough. This may explain the wobbly artefacts we see on the early Euclideon videos.
        I’m not actively pursuing this anymore, but I’m still get these posts with interest and throw random ideas in. Sorry if I seem of topic at times.

  45. == Base-2 long tail division ==
    Below is the code for a base-2 long tail division. As you can see it only contains additions, bit shifts and subtractions. It works for 0 <= a < 1024*b =0; i–) {
    if (a<(b<<i)) {
    a -= b<<i;
    r += 1<<i;
    }
    return r;
    }
    }

    The CPU contains a circuit that does something similar (or uses some clever trick) and is 'executed' by the code:
    int r = a/b;

    The circuit length of the division operation is longer than those of addition, subtraction and bit shifts and therefore supposedly takes more time to compute (this depends on what kind of processor you are using).

    The divide function above is a lot slower than just a/b. However, if you have to do millions of divisions and can share partial results. If you get an average of 2 or 3 additions, subtractions or bit shifts per division then you are slightly faster.

    == Rendering ==
    The algorithm is not a ray tracer in the sense that the algorithm does not use the notion of a ray.

    Using pure SDL would mean I have to do the cubemap rendering in software. That would about halve the frame rate and also reduce image quality. The later is because OpenGL uses bilinear filtering during rendering, giving some kind of anti-aliasing.

    I'm 'rendering' cubes, because that was the easiest thing to do. I suspect that rendering quads would actually be slower. The algorithm only traverses an octree, so the trick I use here is to let the leaf nodes (cubes) actually be nodes that has itself as its 8 children. So when the algorithm encounters a leaf node, it just keeps traversing and subdividing that leaf node until it is the size of a single pixel.

    I have considered the Quake style 16×16 shortcut, but it does not seem possible to implement in my case.

    1. thanks for the information. have you tested how much slower your code is when using simple a/b ?

      bruce dell claims that the graphics card only draws the final image. i dont know if that also covers hardware cubemap rendering…
      so when you reach a leaf, you keep doing “fake” traversing, even though its a leaf? untill you reached the size of one pixel?

      1. I have not tested it with a/b. My algorithm in its current form is not suitable to do that. However, using a/b would allow me to easily omit the cubemap. So it could be interesting to try that. Though I might run into rounding errors.

        And yes, I do these ‘fake’ traversals, until it reaches the size of one pixel. The algorithm should be able to handle that, as we can easily replace these traversals with ‘real’ traversals by increasing the model resolution.

  46. I found out that the quadtree is the most important part of my algorithm. I can basically replace everything else with something different, even the octree and the cubemap. But without the quadtree it all stops working. So the quadtree is at the core of my algorithm.

    The quadtree is basically a sorting machine, with your screen pixels at the bottom (the leaf nodes). At the top you throw in your model. At each layer your model is broken into smaller pieces and thrown in the right squares of the layer below in front to back order (It says ‘squares’, because model pieces can be duplicated or discarded at each layer). When a model piece ends up at the bottom, the average color of that piece is used to color that pixel and that square is blocked, such that no new pieces can be thrown in and thus must be discarded. If a square has all its 4 lower squares blocked, than that square is blocked as well. So this quadtree sorts the pieces of your model, by putting them in the right pixels, but only those pieces that are visible.

    The nice thing about this quadtree is that it entirely removes the need for perspective projection. You only need to know whether a pyramid intersects (the bounding box) of a model piece, which generally can be done with only additions, subtractions and bit shifts. (Btw. this quadtree effectively performs a long tail division, which is used to create the perspective projection.)

    In my algorithm, the model is an octree and is broken into pieces by traversing that octree. But it could have been anything else.

    An image that visualizes the layers of a quadtree: http://img6.imageshack.us/img6/5616/tetr.png

      1. Yes, I was planning on doing that and tried to explain it based on radix sort. However, I noticed that no one would understand what I had written and decided that comment would be faster.

        I still want to make a guest post. But it will take some time to write a proper understandable one. Also, I believe I can eliminate the cubemap, but do not yet fully understand how.

        It is a new rendering method, different than rasterization, splatting or ray tracing, and simple enough to be added to that list. It is odd that very few people have thought of it.
        The sorting process also really fits the name “mass-connected processing”. And I’ve shown that it actually works, as I’m already using it in my renderer. So expect this concept to be at the core of Unlimited Detail and probably also Atomontage (in which case Notch was right about one thing).

        It is kind of ironic that I encountered Unlimited Detail while searching for how I could render voxel sprites, which I wanted to use in a minecraft clone (instead of the polygon based models) and got interested when I realized that it would be a great renderer for a minecraft like game, because it can also render everything that is more than 100 feet away. This is also why I have a low resolution sign between my models. 😛

      2. I tried that, but it didn’t work like I expected. So I recreated the post, with a link to the original one. If you like, I can send you the html code of my new post.

    1. nice! thanks for sharing. awesome idea. even though i dont fully understand how the projection works. for each quadtree section, you “cast” a pyramid (with the size of the section at the image plane) into the octree?
      how do you get the near to far order?

      i believe this really is something. it could solve some problems…

    2. If your octree tested for the centre point and radius, would it be quicker? It would also look more like UD, as it’ll be a flat screen facing 2D square instead of a cube.

    3. i thought again and now i think i understand how your algorithm works. you simply do “frustum culling” for each section of the quadtree to see which part of the octree corresponds to it. of course, no further projection needed, since each “frustum pyramid” is pointed in the correct perspective direction. nice!

      i wonder how fast this is.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s