Is 3D Cube a predecessor of UD?

Pablo Hugo Reda has found this old site dedicated to the 3D Cube Project (that’s an even older link sent by appc23, with more access to info), as well as a discussion on the net which looks strangely alike the actual exchanges around UD.  There are an exe and a database, as well. I reproduce further some parts, taken from the two links provided by Pablo  (only boldfaces by me):

3DCube is as far as I know, a completely different way to store and display complex 3D images. The building atoms of the 3D data structure are lines of pixels, not polygons, bitmaps, or voxels. As explained above, the main objective of the project was to create a system that would allow very complex, elaborate models, with hundreds of structures, which could take up to an entire CD-ROM but require only a small portion of the model to reside in RAM. A large portion of the challenge was coming up with data organization that would allow keeping the model on CD ROM but be capable of displaying the perspective of the entire model instantly at any time. This is possible since the high detail of the model is only needed for elements close to the view point origin. Almost no processing is needed to load the model or its parts from the disk (please notice how quickly the demo initializes). Therefore, the disk activity processing load related to tracking the view point movement is very small – much lower than required for playing a digital video for example.

The algorithm required to display the image at any angle from the view point is quite simple. No floating point calculations, trigonometric functions, or even division instructions are needed, and use of multiplication instructions is very limited. A simple custom hardware utilizing my method could render the image with the same ease as a video card hardware displays the bitmap stored in its video memory. […]

My rendering algorithm is essentially a DSP-type algorithm, working with 64-bit data, which generates the image scan-line by scan-line with operations being mostly adding of 64-bit data. If the 80×86 just had a few more registers, the entire rendering algorithm would use no temporary RAM data (just the CPU registers) and would render the entire image by reading the model and outputting the resulting scan-lines. The biggest current problem in implementing the algorithm now is the necessity to swap the temporary data in and out of the registers to memory.

3D Cube project was originated with the intention of making it a new generation 3D game engine, allowing unprecedented detail and complexity of “virtual worlds” created. After seeing the performance and model sophistication possible with this method, I realized that the possible applications of the method are far beyond just video games. […]

In addition to the above, 3D Cube could allow things like taking pictures of some scene from different directions, building a model out of it. 3D Cube storage method

It’s not a mistake, the text ends like this.

From the discussion, a comment by the 3D Cube creator, S.A. Janczewski:

Well, let us try to clarify the term voxel.
– If a “voxel” can have different color/attribute from each of 6  directions, is it still a voxel?
– If it is not cubical — can have sloped surfaces, is it still a  voxel?
– If the six colors of a “voxel” are not at all stored together as a unit, is it still a voxel?
If the data organization of a 3D engine does not have any kind of  data structure that would store data through clear (x,y,z) – coordinate
granularity, is it still a voxel engine?

If you answer yes to all the above questions than my engine is a voxel engine but so is any polygon-based engine.

I hope that the above will make it clear that the reasons I did not use  the term voxel anywhere were following:
– My method has absolutely nothing in common with commonly known voxel engines or storage techniques
– Since the term voxel is not clearly defined, using it would not  contribute anything(rather than confusion) to the descriptions
– Some “voxel engine” techniques are patented — using the term could result in getting myself (without a reason) accused of “patent infringement.”

Please forgive me if you somehow interpreted my previous message as   criticism of your input. I did however get accused quite a few time of ignorance (not by you) for not using the term voxel in my description  and felt it was appropriate to respond to it.

What do you think?


UPDATE: Is maybe relevant for the discussion to state that the goal is to produce an open source variant of an UD like algorithm. As you can see, supposing of course that 3D Cube was indeed a precursor of UD, the problems are the same, i.e. a probably very good idea, with a lot of potential for cash and also a game changer for an industry. Communicating it means loosing an advantage, but not communicating it leads to disbelief. There is a third way, open source. Right, no direct money from it, but everybody benefits and new possibilities open. So, in case you are working on that, don’t be shy or secretive, that’s not the good idea. Share.

222 thoughts on “Is 3D Cube a predecessor of UD?”

  1. I had a look at the .dat with a hex editor – nothing jumps out

    I had a look at the .exe with a hex editor – parts of the source code debug seem to be there 😛
    there’s also a .bmp loader in there

    3D Clusters must be in Order.Clusters must have the same dimensions.
    Cluster bitmap must be the same type.

    \s\cp\dynvect\adynv.hpp.(Flags & ROM_Layout)==0…\cp\dynvect\adynv.cpp.Prev!=NULL…\cp\dynvect\adynv.cpp.Next!=NULL.\s\cp\dynvect\adynv.hpp.Index*4 < New_Len…\cp\dynvect\adynv.cpp.Prev!=NULL…\cp\dynvect\adynv.cpp.Next!=NULL…\cp\dynvect\adynv.cpp.iElem <= DBase_Len…\cp\dynvect\adynv.cpp.iElem < DBase_Len…\cp\dynvect\adynv.cpp.p!=NULL.\s\cp\dynvect\adynv.hpp.Index*4 New_Elem_Ref>=0 || Next->New_Len==0.\s\cp\dynvect\adynv.hpp.iElemSec_Elem_Ref>=0 || Next->Sec_Len==0…\cp\dynvect\adynv.cpp.Next == NULL…\cp\dynvect\adynv.cpp.New_Len <= New_Alloc…\cp\dynvect\adynv.cpp.Next == NULL…\cp\dynvect\adynv.cpp.New_Len <= New_Alloc…\cp\dynvect\adynv.cpp.Next == NULL…\cp\dynvect\adynv.cpp.New_Len <= New_Alloc…\cp\dynvect\adynv.cpp.Next != NULL…\cp\dynvect\adynv.cpp.New_Len New_Elem_Ref>=0 || Next->New_Len==0…\cp\dynvect\adynv.cpp.Next != NULL…\cp\dynvect\adynv.cpp.New_Len Sec_Elem_Ref>=0 || Next->Sec_Len==0…\cp\dynvect\adynv.cpp.New_Len Sec_Elem_Ref. = .p->Sec_Len. = .p->Sec_Alloc. = .p->New_Elem_Ref. = .p->New_Len. = .p->New_Alloc. = .p->New_Step. = .p->Sec_Active. = .p->Flags. = .p->DBase_Step. = .p->DBase_Len. = .p->DBase_Alloc

    Setting 0000:0000 Selector Limit… ERROR Returned = 0x%X…..%d .%X .%12.5f . . . .zInc Problem: .&…####.####.### Restoring 3D structure from: .\s\cp\dynvect\adynv.hpp.iElem<DBase_Len.\s\cp\dynvect\adynv.hpp.(Flags & ROM_Layout)==0. patterns…*** Converted .3d_coord.cpp.Grid_Size Next != NULL.\s\cp\dynvect\adynv.hpp.Current->Next != NULL.\s\cp\dynvect\adynv.hpp.Index*4 ZStarts.Current->NGetWord(ZStarts.Current->New_Len/2-1).\s\cp\dynvect\adynv.hpp.Index*2 ZStarts.Current->NGetWord(ZStarts.Current->New_Len/2-1).3d_coord.cpp.Z_Coordinate == 0.3d_coord.cpp.Z_Coordinate > ZStarts.Current->NGetWord(ZStarts.Current->New_Len/2-1).3d_coord.cpp.Z_Coordinate == 0.3d_coord.cpp.Z_Coordinate == 0.3d_coord.cpp.Z_Coordinate > ZStarts.Current->NGetWord(ZStarts.Current->New_Len/2-1).3d_coord.cpp.Z_Coordinate == 0.]= .,.,.ZStarts[.]= .,.HStarts[.]= .,.HRefs[.HStarts leading to Cluster:.. .:.HRefsUp leading to Cluster:.. .:.VerZStarts at CLuster:.. .:.VerPatterns at CLuster:.. .:.FlatZStarts at CLuster:.. .:.FlatPatterns at CLuster:.. .:.**** Position Dump **********************************.. .NDO_Cluster=. .NDO_Coord=. .DOM_Cluster=. .DOM_Coord=.TABLES[iNDO_Coord][iDOM_CLuster]..TABLES[iDOM_Coord][iNDO_CLuster]….No Error.Z Range Exceeded.Pattern Len 0 not allowed in this version.Pattern Len GT 256 not allowed in this version.Pattern Len must be power of 2 in this version.Base Array Size Does Not Match.Not Enough RAM.Line Outside Boundaries.Depth Outside Boundaries.Cannot Create File.Cannot Open File.Cannot Read from File – File Corrupted.Cannot Write File – Of Of Disk Space?.File Corrupted or Wrong Type.Unknown Error….<NÑ‘\þï?3d_pos.cpp.d<1.0.3d_pos.cpp.d<1.0.3d_pos.cpp.d<1.0.3d_pos.cpp.d<1.0.3d_pos.cpp.d<1.0.3d_pos.cpp.d<1.0.3d_pos.cpp.d<1.0.3d_pos.cpp.d>c3D_Position_Shift.. %-20s = %d..yOrg>>c3D_Position_Shift.. %-20s = %d..zOrg>>c3D_Position_Shift.. %-20s = %d….xAng.. %-20s = %d..yAng.. %-20s = %d..zAng.. %-20s = %d..hVideo_Size.. %-20s = %d..vVideo_Size.. %-20s = %d..hVideo_Start.. %-20s = %d..hVideo_End.. %-20s = %d..vVideo_Start.. %-20s = %d..vVideo_End.. %-20s = %d..Distance.. %-20s = %d..hVideo_Range.. %-20s = %d..vVideo_Range.. %-20s = %d..hVideo_Half.. %-20s = %d..vVideo_Half.. %-20s = %d..hVideo_Medium.. %-20s = %d..vVideo_Medium.. %-20s = %d..dHSqrt.zLong_Step.zLong_Inc.zShort_Step.zShort_Inc.xpyRatio.Func.Index… %10s %14s %14s %18s %14s %18s %14s %14s… %10u %14p %14.7f %18.11f %14.7f %18.11f %14.7f %14.7f……#### Render List ####.. %3d. %p…********.. = .Dist. = .(void*)(p->Last_Pixel). = .p->Last_zHShort_Step. = .p->Last_zShort. = .p->Last_zHLong_Step. = .p->Last_zLong. = .(void*)(p->First_Pixel). = .p->First_zHShort_Step. = .p->First_zShort. = .p->First_zHLong_Step. = .p->First_zLong. = .p->Prev. = .p->Next….,.,.,.,PgUp,PgDn – GO AROUND, UP, DOWN..x.S,B – SMALLER, BIGGER VIEW, NOW:.T – MEASURES AND SHOWS FRAME RATE.F1 – RE-DISPLAYS THIS HELP..ESC – EXIT. ..IF DEMO RUNS TOO SLOWLY.USE SMALLER VIEW SIZE…SWORLD1.DAT.### Initializing r3D from: .SWORLD1.DAT.### After r3D Init..SWORLD1.LOG.### Opening LOG FILE: .w.SWORLD1.LOG.SWORLD1.LOG.###### Cannot Create LOG FILE:…### T_B3D : Setting Video Mode… ERROR Returned = 0x%X..*********************************************************************..* c3D ERROR: You need VESA 640×480 256 Color mode to run this demo *..*********************************************************************….### T_B3D : Programming Palette… ERROR Returned = 0x%X… ERROR – Too Small Screen..### T_B3D: About to Prepare Video..[s]..[FPS] .FRAME RATE=…DO YOU REALLY WANT TO EXIT? (Y/N)….*****************************************************************….* 4. Further speeding up the algorithm *..* disk-based models. See documentation on the web. *..* 3. Completing “dynamic loading” allowing creating huge *..* 2. Creating more examples, more detailed models *..* 1. DirectX (Windows95) Version of Cube 3D *..* Please Check my Web Site Again. I am working on: *..* *..* Web Site: *..* E-Mail: *..* Tel (909) 794-0130 Fax/Msg (909) 794-5720 *..* PO Box 8024, Huntington Beach CA, 92615-8024 *..* Real-Time 3D Rendering System (C)1996 SAJ FAZA *..*****************************************************************..****************************************************************..* c3D ERROR: .****************************************************************..****************************************************************..* DVect ERROR: .****************************************************************..****************************************************************..* Unknown

  2. The website also provides a download of their program:

    It runs quite well in dosbox. They seem to be using voxels, but with separate colors for each side of the voxel. Remember that in DOS you had only 16 colors available at high resolution (anything above 320×200, up to a maximum of 640×480). Therefore the model itself only uses 16 colors and uses dithering. However, I think that the method would work equally well with 24bit colors. The FPS depends on where in the model you are.

    I also noticed that the LOD effects are similar to those in my own implementation (

    1. DOS’s VGA mode 13h had 8 bit colour with a changeable palette. It was used throughout the game industry, along with variations like Mode X.

  3. The resulting image is quite clearly made of tiny cubes. These cubes are aligned on a grid. So they are picture elements in a volumetric raster image, hence they are voxels. So it does not matter what the algorithm is, the end result consists of voxels, so the algorithm must be using voxels.

    1. So, according to your reasoning, any image in this world made of little cubes is made by voxels, even if it’s made by hand. Any algorithm which uses such an image must be using voxels. For example, when I look at a jpeg image on the net, which depicts a cube, the browser calls some voxel algorithm in the background, because otherwise would be impossible to see it, right?

      When I think about a PDE describing, say, what happens to a 3D body which has some parts of the boundary acted by some forces, other parts with a prescribed displacement, then I usually draw a potato. When the solution for solving approximately the PDE is turned into a numerical algorithm, the explanation of the algorithm is visually aided by potatoes drawings. Moreover, even the final product, which is an working algorithm, is tested on some simple geometries, basically for the simple reason that that’s available, and also enough, for a proof of principle that it works. It does not apply to potatoes, but to cars, or planes, or bridges, or bones, whatever, and once there is a known way to solve the problem for potato-like bodies, then it can be optimized, harmonized with BS which is bought by the conservative producer, with windows or whatever the applied guys are using and finally turned into a piece of very good looking software which solves concrete problems. However, somewhere in that (huge now) piece of software, hides the original algorithm which, without any lack of respect for the rest of work, solves what was previously thought unknown.

      Likewise, why don’t give the benefit of “what if” the author of 3D Cube just made, by some simple procedure, a small world for testing his algorithm, by using small cubes, for example? That is very much alike the ad nauseam replies that Bruce Dell algorithm applies only for worlds made by countless copies of his animal (in reference to his video with pyramids of animals)? Maybe the guy, without having access at a huge file of a real city, for example, just used a template to create his world for experimenting with the program. After all, now Dell shows videos which are not based on such templates. The logical conclusion is that the argument that UD works only for fractal like worlds was superficial.

      1. From your post I get the feeling that you did not run the demo program. Please run the demo and take a good look at the red roof of the church-like building. It clearly shows voxels.

        Maybe the algorithm does not use voxels. Maybe it supports ‘points with arbitrary color and orientation’ and he just happened to use a voxel based model. Well, then he has chosen an awful bad model for testing his program, because now he cannot test whether it truly supports arbitrary orientation. (Besides, he explicitly mentioned ‘all 6 directions’, so I’m quite sure it does not support arbitrary orientation.)

        I’m also wondering why its called ‘3d cube’.

  4. Cleaned up the debug strings – most of it was for a dynamic array system (adynv.cpp) – which is much like STL’s vector.
    Looking at the debug stuff here and reading his website – I think he is using RLE textures – or pixel line arrays inside some sort of 3D hash structure.

    very interesting!

    No Error.Not Enough RAM.
    3D Clusters must be in Order.Clusters must have the same dimensions.
    zInc Problem:
    Restoring 3D structure from: .
    3d_coord.cpp Grid_Size ZStarts.
    3d_coord.cpp Z_Coordinate == 0.
    3d_coord.cpp Z_Coordinate > ZStarts.
    3d_coord.cpp Z_Coordinate == 0.
    3d_coord.cpp Z_Coordinate > ZStarts.Current->NGetWord(ZStarts.Current->New_Len/2-1).
    3d_coord.cpp Z_Coordinate == 0.
    3d_coord.cpp Z_Coordinate == 0.
    3d_coord.cpp Z_Coordinate > ZStarts.
    3d_coord.cpp Z_Coordinate == 0.
    ZStarts[.]= .
    HStarts[.]= .
    HStarts leading to Cluster:.
    HRefsUp leading to Cluster:.
    VerZStarts at CLuster:.
    VerPatterns at CLuster:.
    FlatZStarts at CLuster:.
    FlatPatterns at CLuster:.
    **** Position Dump **********************************.. .
    NDO_Cluster=. .
    NDO_Coord=. .
    DOM_Cluster=. .
    No Error.
    Z Range Exceeded.
    Pattern Len 0 not allowed in this version.
    Pattern Len GT 256 not allowed in this version.
    Pattern Len must be power of 2 in this version.
    Base Array Size Does Not Match.
    Line Outside Boundaries.
    Depth Outside Boundaries.
    Cannot Create File.
    Cannot Open File.
    Cannot Read from File – File Corrupted.
    Cannot Write File – Of Of Disk Space?.
    File Corrupted or Wrong Type.Unknown Error….<NÑ‘\þï?
    3d_pos.cpp.d>c3D_Position_Shift.. %-20s = %d..
    yOrg>>c3D_Position_Shift.. %-20s = %d..
    zOrg>>c3D_Position_Shift.. %-20s = %d..
    HSqrt zLong_Step zLong_Inc zShort_Step zShort_Inc xpyRatio Func Index

    %10s %14s %14s %18s %14s %18s %14s %14s…
    %10u %14p %14.7f %18.11f %14.7f %18.11f %14.7f %14.7f……

    #### Render List ####.. %3d. %p…
    ********.. = .Dist. = .
    (void*)(p->Last_Pixel). = .
    p->Last_zHShort_Step. = .
    p->Last_zShort. = .
    p->Last_zHLong_Step. = .
    p->Last_zLong. = .
    (void*)(p->First_Pixel). = .
    p->First_zHShort_Step. = .
    p->First_zShort. = .
    p->First_zHLong_Step. = .
    p->First_zLong. = .
    p->Prev. = .

  5. specifically:


    look like fixed point math to get the perspective transform, much like was done in the good old days (Extreme Assault – PC Helicopter game, software rendering – using a transform much like this), Those vars won’t all fit in the x86 reg’s, but I could be wrong tho – c’mon lads get your thinking caps on!

  6. Jim and Bauke if you like I send you the decompiled .exe, I need help with the names of functions.
    Some greath advances in file format are from appc23!

    1. how did you decompile that oldschool .exe? none of my spook software will decompile it properly 😛 I thought it was because it was a pharlap / watcom exe

  7. I’ve had time to think about how this might work – I thinks it’s like this:

    it’s a ray caster, it casts a vertical ray into 2D Bitmap Lookup table, which has RLE Vertical spans that contain the “Voxel” data. I think the Vertical voxel data also references a texture for it’s colour. There’s no Yaw in the engine – hence the vertical spans – which are very quick to render. There’s no mip info – hence the awful aliasing and performance at a distance. I think the 2D “Bitmap” is split up into “Clusters” which contain the links to the vertical spans. It would be extremely easy and quick to write this, but would give terrible render quality and performance without mip maps. A much better and farther evolved version is ken silvermans

  8. Some more 3DCube tests.
    Changing the odd byte seems to change whole rows of either blocks or empty space. This is leans towards Jim’s findings with the debug strings of 3DCube.

    At hex 40004 of sworld1.dat if you change the ‘310000ff’ to ‘31000000’ (viewing in hex 4 byte groups) you get this on the car park railing:

    Note that it is flat and attached to the outside of the wall rather than being a cube.

    And changing the same block into ‘311000ff’ it creates a bright pink line in the occupied span below the other one:-

    It is possible that the 3 planes are rendered separately, the area I researched in then was:-
    <- in which references are made to scan-line and run length rendering of volume models.

    1. Oh that’s cool, thanks Jim.
      The main loop appears to return to Label_13 quite a bit although I don’t understand the following, I realise the v84 and v82 are the two previous values of ‘byte_BAEAC,’ but what changes ‘byte_BAEAC’ to escape the loop?

      v82 = v84;
      v84 = v83;
      v83 = byte_BAEAC;
      while ( !byte_BAEAC );

  9. it’s oldschool – prolly a interrupt – keyboard or something – can’t quite remember how the dos extended mode stuff worked 😦

    1. Hmm, I suppose a disassembly with memory locations so we can NOP out and change immediate values in the ‘exe’ would help a lot! Possibly, or maybe not. : )

  10. do
    sub_55B50(v21, 24, v14);
    sub_57890(v21, v22);
    sub_57930(v21, &unk_BAEB0);
    sub_503E0((int)&unk_105EB0, v23);
    while ( v14 < 30 );

    is the test code when you hit T – just gotta decompile those sub functions.

      1. This is all getting interesting. As a complete stab in the dark, the two integers there are 24 with a count of 30, which makes 720, ideal for some factor of rotation about Y for the fps test – maybe. 🙂

  11. yeah I agree – he’s using VBE stuff – which makes it hard to find the Screen buffer copies – that first function sub_55B50 it’s return is a call to the main render funtion I think

  12. I think main render is at sub_56EC0 which in turn calls sub_55E60 – which I think is the scan line filler! I’m pretty shocked it’s actually __int64’s e.t.c. – awesome 😛

      1. “My rendering algorithm is essentially a DSP-type algorithm, working with 64-bit data, which generates the image scan-line by scan-line with operations being mostly adding of 64-bit data. If the 80×86 just had a few more registers, the entire rendering algorithm would use no temporary RAM data (just the CPU registers) and would render the entire image by reading the model and outputting the resulting scan-lines.”

    1. Thanks, I was expecting some repetition in code for different angle sections and it appears they’re there(!), well I say so tentatively…
      If you search for “0x80000000u” in that code it will jump to eight similar sections.
      The eight sections start with:-
      v178 = sqrt(a5 * a5 + a6 * a6) * *(double *)(a2 + 44);
      if ( a5 < 0.0 )
      {…. … .. .

      Is v178 the scaled distance of a5 and a6, which are the a X & Z coordinate or an internal combination of 2D slices in XY / XZ / ZY?

  13. We should underline this 3dCube pursuit as a 2D BSP type of ray casting and get on with an Occams Razer approach of what we know from the detailed visual exhibition of their advertising. I think that number one ‘elephant in the room’ should be the near instant loading of the display. And step through from there methodically.

  14. I heard back from the 3DCubes author Slawomir today, and he seems a little intrigued by Euclideon, not surprisingly I suppose, and he thinks they are doing something similar. Although he thinks the idea of a ‘search engine’ is just a publicity stunt.
    He says he hasn’t really got the time to get into it, but with his permission I’ll quote his description of 3DCube from his email: ; )

    From what I remember, (to get more detail I would have to examine source code and I don’t have time for this now), the idea of the renderer was sort of ray-tracing: Iterate through all pixels of the frame buffer. Each pixel corresponds to a square travelling through the model (not through looping by distance units of course, but as, sort of search query for the model) and seeing which model elements it hits. Upon “hitting”, a model element responds what percentage of square it hits (does not have to be accurate, say scale 0 to 10) and what average color that hit will represent (we are coloring a single pixel at a time).

    I don’t completely remember how I did the model indexing, but it was the model itself, not the indexing that was the problem. This is why, in my previous e-mail, I stressed the model issue, and did not say much about how the raster buffer was being filled.

    I think, I made pointers to next model objects for increasing x, y, and z, granulated at some reasonable step, much coarser than model detail. My ray square enters the model at (x1,y1,z1) link coordinate and travels through those model indexing pointers by increasing, decreasing x,y,and z according to trigonometry rules. Of course, due to coarseness of the pointers of the cubes connecting the model element, sometimes two bordering cubes have to be checked for hits.

    In other words, each pointer cube has six pointers – the pointer for increased X, for example, does not point to next cube behind if it is nothing but empty space, but to the next pointer cube that has something in it.

    But, to reiterate, what I did back then was quite primitive by today’s standards, and I am not quite sure it applies to what others are doing.
    Best Regards
    Slawomir A. Janczewski

    Thanks again, Slawomir.

    1. Great Dave H, and thanks Slawomir! Would it help to ask him for the code? The idea is to make an open source UD and his code might be much easier to understand than what we have.

      1. Well it was the first time he’d seen the Euclideon videos, and he was just expressing his opinion of the presentation, I was adding the comment for colour, sorry. 🙂

      2. Dave, finding, contacting him and asking is great! I think that he just does not want to disclose the way he constructed the database. We know already this is the key of UD, we talked repeatedly about that. On his old site he spends some time telling how useful and novel his database is. Maybe appc23 is advancing into understanding it and maybe Slawomir makes a move eventually.

      3. chorasimilarity :
        Dave, finding, contacting him and asking is great! I think that he just does not want to disclose the way he constructed the database. We know already this is the key of UD, we talked repeatedly about that. On his old site he spends some time telling how useful and novel his database is. Maybe appc23 is advancing into understanding it and maybe Slawomir makes a move eventually.

        We don’t actually know it’s the same thing at all, and it’s not all that quick to be honest. Plus it needs the whole area to be loaded at once. I’m not saying it’s nothing to do with UD but it’s not looking good, especially as it’s slower for further distances – it’s ray tracing. It’s still fun exploring all possibilities though, so I’m all nods on this.

  15. chorasimilarity :
    Great Dave H, and thanks Slawomir! Would it help to ask him for the code? The idea is to make an open source UD and his code might be much easier to understand than what we have.

    I did ask him, but he ignored my question, and I didn’t want to push.

    1. Well, I suppose we are going to make sense of it in finite time, but maybe he looks and understands the “open source” part, as well as the fact that here any contribution is publicly attributed 😉

  16. I did speak of the open nature of these discussion, but OK I may try again later.
    It actually doesn’t sound all that difficult from his description, I think the salient points are “each pointer cube has six pointers – the pointer for increased X, for example, does not point to next cube behind if it is nothing but empty space, but to the next pointer cube that has something in it.”

    So I think it’s a kind of 6 directional sweep of empty space, in which the ray tracer knows it can step over in rectangular spaces without hitting an object, a bit like ‘distance fields rendering I suppose. He also said he had to repeat some models, referring to Euclideon’s island, to save memory – all those pointers I guess. It might have worked better with a KD-tree?

  17. Was busy today – and tomorrow. Tried to pastebin the full pcode (3mb) – but there’s a limit of 440kb on pastebin sorry:(

    wow that it is really interesting info from 3DCube author (sinks my RLE theory 😛 ).

    I agree with DaveH – this sound really simple once your link structure is sorted. It’s probably best to get it working in 2D first – once that is good, then switch to 3D. Sounds like it’s best to render in columns and use the Previous Hit cube as a starting point? (hence the bad shadows in UD tech and hence the more models/detail the faster).

    algo would involve terms like this

    Ray Direction
    Current Cube
    Search surrounding Cubes using links based on Ray Direction and Min axis distance?
    kind of traverses the structure in this way – crawls over the structue like a spider
    Models are linked also using x,-x,y,-y,z,-z poiners?
    Models AABB are pointer linked

    Part distance estimation, Part Marching Part something else?

  18. I had a quick look yesterday but nothing popped out – and if it’s a variable linked list thing – nothing really will – because the structure will be unstructured apart from the indexes – or however Author has done it. Still trying to wrap my head around the ray traverse algo – in columns – first hit must take ages – next few if connected would be really quick?

  19. my mind is bubbling over this

    and a set of Cubes
    each cube is a Colour and set of x,-x,y,-y,z,-z links/pointers to the nearest cube in that direction if one exists or it points to the nearest cube?
    Do the links also have a distance or does the Cube have a Coordinate from which the axis distance is calculated?
    u32 Colour;
    Cube* PX; // pos X link
    Cube* NX; // neg X link
    Cube* PY; // e.t.c.
    Cube* NY;
    Cube* PZ;
    Cube* NZ;
    i32 PosX,PosY,PosZ;

    i32 PosX,PosY,PosZ;
    Cube* [200] XStarts; // list of Cubes nearest to X axis on AABB
    Cube* [200] YStarts; // list of Cubes nearest to Y axis on AABB
    Cube* [200] ZStarts; // list of Cubes nearest to Z axis on AABB


  20. You guys are nuts. 😉 You might have put more work trying to disassemble it, than it took me to write it YEARS ago. I am SLOWLY beginning to remember what I was doing back then.

    I will try to find the source code so I can actually know better what I am talking about. Now I am beginning to wonder how accurate I was describing this thing in my e-mails to DaveH. It’s been a while. You guys seem to be ready for far more details than I could possibly give without looking at the src.

    Very Best Regards

    1. Looked at it, some of it already gave me a headache.

      First installment.

      Main rendering loop. Main function simply calculates what each pixel in the frame buffer should be. Seems insane at the time when it was done, but, I guess, it all depends of what you have more of – polygons in the model to consider (or their substitutes) or pixels in the frame buffer.

      Using all integer math, Get_Pixel traverses the model to obtain the pixel. Quite different from polygon type rendering. I do not actually believe this was a good idea, but no floating point meant it could run on even very old HW (it did run on 386 and 486. With today’s computers it is generally stupid not to use floating point, this is why I was surprised to see Euclideon be proud of it). All depends what you do with it, I guess. But yes, you can render without using floating point or trigonometric functions or even multiplying. 😉

      void c3D::Render (void)
      { c3D_tColor *pVideo,*pVideo2;
      int iV;
      c3D_t3D_Lookups *pH;

      WNDDBG_printf(“*** RENDER (%d,%d,%d)(%d)\n”,xOrg>>c3D_Shift,yOrg>>c3D_Shift,zOrg>>c3D_Shift,Aux_xyz.iShift);

      pVideo = Video_Mod;
      for ( pH = r3D_Lookups; pH>c3D_Shift,
      // yA>>c3D_Shift,
      // zA>>c3D_Shift,
      // zInc>>c3D_Shift
      // );

      pH->zStep = pH->zStep_Start;
      pVideo2 = pVideo;

      for ( iV=0; iVzAStep = abs(pH->zStep) >> c3D_Shift; // Optimize

      *pVideo2 = Get_Pixel (pH);

      //WNDDBG_printf(” xo=%d yo=%d zo=%d x=%d y=%d z=%d\n”, ixHito, iyHito, izHito, ixHit, iyHit, izHit );

      pH->zStep -= pH->zStep_Inc;
      pVideo2 += hVideo_Size;


      //WNDDBG_printf(“MAX_COUNT = %d (%d,%d,%d) \n%g %g %g\n”,MAX_C


      From what little info is available from Euclideon’s marketing talk, their goal seems to also be rendering from model one pixel at a time.

    2. Saj, thanks for coming here! We think that database might be as important as the algorithm. Do you remember how you did it? You praise the method on your site, but you don’t give the details.

      Looking forward for your input, I think everybody’s glad you comment here.

  21. Hey Saj – welcome to the Digital Archaeology Party!!

    Isn’t it wonderful looking at old source – 17 years old 🙂

    Thanks for taking the time to dig it up – this method is very interesting, even today integer math is still faster than float – even with SIMD – it’s still faster!

    1. Tell me about it. The other routines are far, far worse. I would NEVER write anything that way today. 16 bit registers, segments, no dynamic containers, etc. Brrrrr…… It was hell, and completely unnecessary, due to only idiotic domination of Intel/Microsoft. But, that’s how it had to be done.

  22. I looked at some of my old watcom c code – and I’m totally with you – it was awful 😛 but hey that’s how we get to become great coders yeah!

  23. I was thinking about the Data structure last night – from the sparse details I came up with this:

    u32 Colour;
    Cube* X; // these are BiDirectional XOR pointers – yaay finally a use for that 🙂
    Cube* Y;
    Cube* Z;
    u32 Size;
    i32 PosX,PosY,PosZ;

    Model is also a cube – a container for other cubes. It’s almost like a Octree – but with a variable amount of children and freely aligned. Also because of the X,Y,Z pointers – the Data is pre-sorted in X,Y,Z. The Models X,Y,Z pointers point to the nearest child cube on the X,Y,Z AABB bounds. I think this could really work well if you balance the amount of Children so it is sparse and has alot of empty space (which will get traversed quickly). Also the model’s colour is the LOD. So the whole Data structure is very much like a octree.

    Ray has Postion,Direction and Size. Size change is linear in world space – so it can be used to decide if to traverse into the Cube’s children e.t.c.

    1. Jim if you use the subtraction form version instead of xor then you would never have to fix up pointers from the file load (or moving around in memory) as long as you had it in a continuous segment of memory..It just basically becomes bidirectional pointer offsets and not direct pointers.

      1. cool – I didn’t think about the actual data in the file – but great idea – finally a use for Bidirectional pointers!. I had some free time at lunch time and looked at the datafile – it has a big chunk at the start that seems to be a whole load of 18byte indexes , and then some 4,16,32 or even aligned data. I thought more about the structure – there was a problem with a parent cube – it’s x,y,z points to it’s nearest children, but then it also needs a x,y,z to point to it sibling cubes, this can be solved by it’s Edges children pointers pointing to it’s siblings:

        in 1D X

        X-> C1 ->C2 ->C3 -> -X

        C1-> Parent sibling

  24. saj :
    Tell me about it. The other routines are far, far worse. I would NEVER write anything that way today. 16 bit registers, segments, no dynamic containers, etc. Brrrrr…… It was hell, and completely unnecessary, due to only idiotic domination of Intel/Microsoft. But, that’s how it had to be done.

    Hi saj! That’s probably the wrong way of looking at it – it was fast and necessary programming – none of this invisible library code bulking out C++! And hey, it looks like it’s in C, *pfft* surely graphics code was all in assembler back then! 😀 😀
    Hey, sorry for dragging you into this Euclideon nightmare, but WELCOME! : )
    p.s. Don’t be ashamed of your code, please zip it all up and host it somewhere.

    1. The main renderer routine is indeed in assembler. I would not be an old hacker if I did not use assembler from time to time even today and even for programming high-end computers.

      As to the C++ container libraries, I am actually a fan of those, and wrote my own versions of them. Something that would be of great interest in modern implementation of 3D cube would be a triple indexed set container, which I implemented for a different application. It’s really awesome what one can do with these. It’s much better than three sets, one main one and two storing iterators to the first one.

      You get operations like those:

      iterator_x it_x= begin_x(); ++it_x; ….
      iterator_y it_y=it_x;
      iterator_z it_z=it_x; ++it_z;

      Beside being far faster than three separate sets, it takes far less memory as you get one RB tree node per data item, the header pointers are trippled, that’s all/

      This is the sort of thing I would use for indexing 3D model elements in a modern implementation of the renderer (or its, ahem, search engine) as it allows very fast traversing through very large model. Tripple iterator allows stepping down or up, say x, while skipping over elements with y or z z out of range of our interest at the moment.

      Going back to the old thing.

      I am still reading through the original 3D cube source. The idea of the model was to have versions in decreasing resolutions, averaged out previous level of the model both texture and detail wise. Something like octrees but organized differently. The only goal was being able to switch to consecutively lower res models. As each consecutive lower-res model needed approximately 8 times less memory, the lower res models were not a problem resources wise, while allowing very efficient, all integer ray-tracing renderer.

      As the Get_Pixel imaginary square travels through the model, at one point it gets larger than current model detail. That’s when you switch to lower res model as the goal, as always is to hit something to retrieve color for a single pixel.

      Due to memory limitations of the era I was not quite able to do what I wanted. So, I created the world of cubes, each of them assigned textures for their sides, and their versions in decreasing resolutions. I was not happy about this, as it resulted in visible cubes. Again, the main problem was RAM needed for the model – we are talking working with 8 MB of RAM – a joke in today’s conditions.

      The renderer algorithm is not limited at all to the cubes with textures, it can work with any model which allows flying through it until you hit something.

      1. That is awesome Saj – it solves one of the big problems with raw Octrees i.e. switching up and down the Lods quickly.

  25. Saj, I’ll ask here instead of making a reply to one of your messages. Would you explain what did you mean by this (see the post for the context): “the data organization of a 3D engine does not have any kind of data structure that would store data through clear (x,y,z) – coordinate granularity” ?
    Funny, the “… a modern implementation of the renderer (or its, ahem, search engine)” 🙂

    1. As far as I can remember, this remark was the result of a very long discussion on some forum of what is or isn’t a voxel or an octree. I was trying to stress the issue that the goal that I imagined was to create a model that does not store detail voxel by voxel as this is cost-prohibitive, wrong thinking problem of the same sort as trying to send digital video strictly in form of arrays of pixels was.

      This I already knew back then – that was the main lesson I remember from that little exercise. Obviously, I did not solve that problem to my satisfaction back then, but then again, trying to do it with 8MB of RAM was rather insane. But, the raytracing renderer of the sort I have built will work if a suitable model compression can be found.

      Not facing that issue and fooling yourself by say, replicating large number of identical raw voxel sub-models for the purpose of a rendering demonstration leads to nothing. I don’t know if Euclideon solved that problem or not, but the demo consisting of large number of repeated sub-models does make me wonder.

  26. Hi Saj

    When Traversing the structure did you use a multiply to forward the raypoint or did you use some kind of DDA based on the structures node? I’m really excited about the structure format it’s seems like a real smart way for traversal (If I understand it correctly).


  27. I think we shall be direct with this one. As someone pointed out it is based on ray casting/ray tracing. Now, my little experience on the matter tells me that achieving raw raytracing implies at least a GPU:
    or massive multi-core parallelization. Else, you’ll fail.

    Ray-casting usually is bound to 2 factors in a 3D rendering environment: screen resolution and ray length, where “length” refers to the variable number of cycles needed to travel long distances. This applies both to octree-type accelerations and DDA.

    My question to everyone (especially to the author) is: given a 1024×768 window, how well this algo performs if you feed him with like 21 trillion points and allow long pre-processing time? Can it mantain interactive framerates at ANY moment like UD shows?

    We know UD is bound to screen resolution only. So if you can get rid of the second “factor” I mentioned, or your algo is so fast the cost of travelling long distances is insignificant, UD is achieved.

    Completely opposite behaviors are seen on doing the inverse of ray-tracing: splatting.

    Splatting techniques (such as Shabtronic’s I guess) are fast as far as the camera keeps some distance from the scene, so that the program does not have to traverse much the tree.

    LODding is crucial here. Splatting turns into UD when the traversal loop guarantees traversal always to pixel-precision without leaves (and bounded to actual tree depth on full closeups of course).
    Btw can Shabtronic do that? I mean, always pixel-accurate traversal without skipping leaves, just by LODding. Luckily it can still keep up with realtime.

    Same goes for image-based rendering (JX’s demo). Faster if objects are far away and laggy on closeups.

    A system whose worst case can keep up with realtime without quality drops is considerable UD.
    Another definition of UD may be:
    If there is a time upper bound for rendering without dropping quality, the considered system is UD.

    1. Nice summary Magio. Euclideon behaves like a forward renderer. A good example indicator would be the errors down the side of the screen on the early videos, a ray-caster wouldn’t have those problems at all. You can change LOD quite easily if you order the points in a box in a Morton sorted order, either through XYZ or polar coordinates. To reduce the LOD just skip over evey other index, skipping larger amounts as the box gets further away. And if he’s using polar coords he could precompute the perspective transform and change orientation by simply stepping through the lookup table from different starting angles.
      And there is a good indication that UD is splatting in terms of objects rather than the complete world, which would explain (as many have said) how they managed to get those trillions of points into a 16Gb laptop.

  28. Magio, Dave H, that’s the usual trend, which lead us nowhere. The main new thing in Bruce Dell UD (and probably in Saj 3D Cube) is not the rendering algorithm, but the database. Compared to producing a database which satisfies all claims made by Dell and Saj (why don’t you believe them?), the task of writing an algorithm which uses that database is easy. The experience is a valuable thing, but not when it filters new things away. That’s my opinion, of course.

  29. HALO rendering – Hierarchical Axis Linked Octrees – Is what I would call this. This is a Major breakthrough in Octree Traversal. Because the End nodes are Connected to the Sibliings parent – it means.that you can very easily ray trace through a octree “without having to re-traverse the tree” (get it?). Because the End nodes link are always the parent – it means the link is upwards (bigger) in the octree – which is perfectly suited for a Ray Beam – since this grows in size also. I’m knocking up a demo as we speak. The algo can be entirely done with additions and shifts (and maybe a LUT for the beam size).

      1. No references I’m afraid – HALO (sorry for the cheesy name 🙂 ) – it’s all the stuff from this comment page. I’m pretty sure the Parent Sibling link has not been done before. It means the Data is larger – but as pointed out you can either whole store the world or splat parts or plane eqns or whatever in the bottom nodes.

  30. Hi DaveH

    Not really Voxel Cone Tracing – just rasterizes the world scene into a 3D Texture and traces that for illumination – it’s like a raw dump of the voxels with a standard octree structure..

    HALO is simply a standard octree with 8 nodes – with some extra data – each node has a link to it’s next axis siblings parent. This is the magic – it enables the traversal without a stack or re-traversing the tree.

  31. Sure – I explained it a few Days ago, but hopefully in more clarity here

    standard linear octree node:

    u32 mColour
    Node *mChildren[8];
    Node *mAxis[24];
    i32 mPosX,mPosY,mPosZ; // Not needed really – just here for clarity
    i32 mSize; // Not neede really – just here for clarity

    mColour is the colour of the node (i.e. the mip or average colour of all it’s children)
    mChildren are the standard 8 child nodes
    mAxis – 3 for each of the child nodes – point to the next siblings parent in the x,y,z direction.

    Because the 8 child nodes are clumped “Together” each node only has 3 possible axis directions and not 6 as normal.

    mChildren[0] is the node top left up, i.e. +Y -X -Z , which means its axis links are only in the +Y -X -Z directions.

  32. Siblings parent mean the next node in whichever axis direction that is the same size as this node – and the link is to it’s parent node.

  33. I’m sorry Marius for being trivial, I’m not really updated on the things of UD.

    I believe on the database claims they made. This is because:
    1) The octree does not have to be extremely subdivided. It just has to scale to pixel level at some point in space and there are many possible octree compression methods out there. This is because 2:
    2) the actual point cloud data is to be stored inside the last layer of voxels of the tree, and splatted on-demand in closeup views (this can be done in both raytracing or splatting, if you are lucky the splats coverage allows skipping further pixels). Those voxels have fixed size thus making possible to compress the cloud coordinates and colours (xyzrgb takes 15 bytes, can be crunched into less than 3 bytes). This fact has vital importance while streaming. Little data to read=faster streaming.
    One can compress a database at least down to 30% with this fact alone.

    Every node has an averaged colour of the portion of the cloud it contains in its children. When the LODs start keeping up with pixel-precision, the cloud data isn’t touched anymore because you just need the rough average stored on the parents. The result is not only a performance increase but also quality increase. Geometry aliasing may still be visible, but the textures and the inner colours will appear smooth at the distance.

    If you have some more computing time to spare, consider a trilinear filter to further smooth LOD switches, textures and everything.

    Use inter-node links to avoid retraversal as Jim said.

    This database structure applies well to both splatting-type UD and raycasting UD. I wonder if they can degenerate into an hybrid system.

    1. Nothing to be sorry about Magio, there seems to be something true in what you and Jim say, thank you both.

      But there’s something fishy about having to store the absolute position of each point, excuse me for not being able to give a rigorous argument.

      1. In a Sparse Voxel Octree no positions are stored, as everything is relative to any parent. Euclideon’s points appear to be at non-cube align locations in some of the videos, but it’s really hard to be certain.

  34. As Dave said, if you work with voxels only (Minecraft/3DCUBE style) you don’t need any coordinates.
    Else, if you use the octree to BSP a cloud database then indeed you need coordinates, but they can be packed inside the container voxel.
    In this case coordinates are relative to the center of the voxel and the components are very small in length. This permits to store locations at much lower bit precision losing almost nothing globally.
    In the end, the point cloud itself and an additional octree used to traverse the voxel grid are almost equal in size.
    That’s pretty much how cool people compress point clouds.

  35. Sorry I could not respond sooner. I was away on business.

    To answer some of the points raised.

    1. Raw voxel space will not work for two reasons. Suppose some world just 3000 by 3000 by 3000 detail points big (very poor model). You just blew some 30 GBytes of memory, even if you limit yourself to 1 byte per voxel to represent voxel color. Second issue is rendering time. If you render frame buffer 500 by 500, you will need to run a loop of 500*500*~3000 steps. No way.

    2. Therefore, you need some sparse array/container. If it is sparse, it no longer works directly with x,y,z coordinates, meaning it has some other way to access data, directly related to how the sparsity is represented (or not represented).

    3. If that other way to access data allows you to almost instantaneously skip the empty space to arrive at the ray hit needed to arrive at needed frame buffer pixel color, your processing loop is 500*500.

    4. Therefore, the whole point is to arrive at the sparsity model to allow the ray-tracing algorithm to quickly skip over the empty spaces, because this is the most critical way of accessing the model.

    5. For above reasons, I always objected to the discussion of the sort: “is it voxel-based or octree-based, or whatever-based”. Such a discussion leads to nothing, and it will always end up to be some religion-like discussion. The point is that model of sparsity that can be accessed quickly for ray-tracing purposes, that’s all. Solving this problem well leads to rendering engine being able to beat the polygon based one.

    6. Note that the polygon based model eliminates the 3000 component out of the above 500*500*3000 multiplication by means of scaling polygons and drawing their casts in the frame buffer. It works well as long as the number of polygons that need to be considered/drawn for a frame is far less than the number of pixels in the frame buffer.

    7. As we are approaching/bypassing that boundary, the days of the polygon based models and renderers are numbered.

    These are the goals I had when I tried to work on that 3D Cube silly thing. There is, however, a big, big disconnect between what I wanted to do and was able to do then due to the limited resources:
    1. Computer hardware – I made it work on 8 MB of RAM
    2. Computer speed – I worked with a 486
    3. My own resources (time and money) I could spend on this thing.

    If someone intends to seriously pursue the thing, I do have some new ideas to make this thing go forward, but am currently up to my eyeballs in work needed to finish my current crazy project ;).

    1. saj :
      3. If that other way to access data allows you to almost instantaneously skip the empty space to arrive at the ray hit needed to arrive at needed frame buffer pixel color, your processing loop is 500*500.
      The point is that model of sparsity that can be accessed quickly for ray-tracing purposes, that’s all. Solving this problem well leads to rendering engine being able to beat the polygon based one.

      argh wait a sec, I think I missed something. How one is supposed to do that? I mean, DDA is linear, using octrees for acceleration implies the storing of the octree in mem, raymarching is just too slow. Is there really another way i’m not aware of to skip empty space for raytracing? Can your method achieve a ray “hit” in a number of cycles which has an upper limit constant for any possible scene?

    2. Saj, I think everybody who visits this blog for the UD subject is paying attention. But just in case, if you want and have time for this, then you might write a guestpost. I bet there are several people interested to contribute and that probably more will join. That’s exactly the kind of thing I like to see here!

    3. Saj, one more. I don’t understand how your sworld.dat is structured. Do you remember? Have you explained this? Some of us tried, but without much success, to decipher this.

    4. I believe this path a thinking is more in line of finding a UD like solution even if it is not exactly Euclideon’s UD solution.

      The path I am currently exploring is not treating the whole scene as one massive hierarchy or grid. I am taking more of the macro cell approach that contains a collection of points, and that macro cell could have a possible hierarchy. I take a hit at the beginning of rendering a frame and project all macro cell’s 3d bounds in the view frustum to screen aligned 2d quads and build a 2d sorted hierarchy in screen space. So now a ray collision test into the world becomes a ray origin test in screen space. For each pixel I can use the screen space hierarchy to see:

      1. Do I fall within a 2d quad
      2. If I do then ray test the macro cell to get the point
      3. If I pass through the macro cell get next 2d quad and do again
      4. If I do not hit anything (quad or cell contents), this pixel is empty air

      This basically removes most of the typical 3d tree crawling and flattens empty space and uses spatial coherence. We can also use temporal coherence and use a render cache for each frame. If we keep a buffer of world cords for each rendered pixel when we change camera position we can project all those from the last frame to new positions, then for all remaining empty pixels we can use what I mentioned above.

  36. Hi Saj

    Voxels, Octrees, KD Trees,BSP Trees, Sphere Trees, BVH e.t.c. are just abstract descriptions – nobody has or will use them in a religious war – they are just used in convenience to describe complex ideas, which would otherwise take several post to explain.

    Can you tell us more details about your data structure and how you traverse it with integer only math – and what you used the multiple for? I’m guessing the mult is used for beam size,

    I’m currently working on my current HALO rendering demo – which I should have finished sometime this week, and so far it is 100% integer (I managed to ditch the multiply for the beam size) and it has enough sparsity/space skipping for realtime rendering. I haven’t decided what to store in the end nodes yet – 1st pass will probably be raw voxels and then I will try plane eqns, cubemaps,zsprites e.t.c.

    You’re all welcome to join in with this and add ideas e.t.c., I’m now 99% sure this is what Euclideon are doing, it all ties up with what they have said – the main clincher being “tracing a octree like structure without having to retraverse the tree”.


    1. Jim, you’re welcome with a guestpost too. Anyway, where “to join in with this”? I really want to see this done, as a open source proof of principle, from that it will be fast.

      1. chorasimil – I’ll get the basics working then create a google code/source forge open source job so anyone can jump in, but in the meantime prolly posting here is a good start.

        My current task is the polygon model conversion – this takes so long, much longer than standard octree conversion – because of the axis links – but hey I’ll get it done.

        other thing to be done are:

        Fast ray creation – still not had time to think about this. Actually if you look at the early euclideon demos – he has the fisheye problem – which comes from not cosine correcting the ray positions.

        Compression – leave until the very last thing.

  37. jim :
    chorasimil Actually if you look at the early euclideon demos – he has the fisheye problem – which comes from not cosine correcting the ray positions.
    Compression – leave until the very last thing.

    I think you have to be careful with the fish-eye thing. I was thinking the same thing until he said it took quite a while to render that animation – it’s not real-time. Unfortunately I can’t find the video again to proof it! 8 |

    And surely data compaction should be part of the algorithm, I would have thought that most data storage systems have this in mind, if anything, for memory cache purposes.

    Sorry to not be entirely contributory. : )

    1. Thanks DaveH

      Yeah – I was a bit shocked when they eventually said they were animations 😛

      We’ll once the first proof of concept code is done and working, compression can be done but it will be hard work since it will heavily affect the code e.t.c. It takes quite a while to finalize a data structure when doing R&D of this sort.

      Fast ray creation without cos/sin or normalize (transcendental functions are usually the slowest – since the microcode voodoo will be iterative) – it’s on the back burner if anybody wants to tackle it – gwan 🙂

      I also read somewhere Euclideon are in a bit of trouble at the moment having let go of 5 of there staff due to nobody picking up the Geoverse stuff – that’s a real shame and more of a reason for a open source version.


      1. Yes, I saw the the story about the 5 guys, but I guess is nothing serious. Re compression, I don’t think is very important, but maybe I don’t understand right. Databases like those used in geoverse are huge by default, the problem is to design them such that they can be used fast.
        Looking forward for the open source version, did not have time lately to grasp your proposal Jim, that’s another reason to call for a guestpost, with more detailed explanations.

        Btw, has anybody tried Bauke’s proposal?

  38. Baukes Idea is great but it involves a sort routine for each recursive call which will be slow.

    Just as a rough idea for the open source version

    Target machine is 2.4ghz

    Render resolution is 1280×720

    some very rough on the back of a napkin down the pub math gives us:

    2,400,000,000 cycles

    1280×720 = 921600 pixels

    2604 cycles per pixel

    of course this is very rough and not full of science goodness (cache hits, memory bandwidth e.t.c) – and doesn’t take into account the blit ( I think that’s around 2ms for a 1280×720 DIB blit )

    But it gives a good idea of what we are playing with.

    1. I don’t have a sorting routine at each recursive call, because I know beforehand in which order the children of an octree node must be traversed, such that they are in front to back order. So this order is hardcoded. Sure, the order depends on the direction you’re facing, that’s why I have 24 variations of the recursive call. It is now running at around 15FPS on my laptop. And has some kind of anti aliasing.

    1. Very good Jim! Just a question: is the number of cycles per pixel in any way dependent to one of the follows:
      – scene complexity?
      – scene extension?
      – database size?
      – empty space percentage/sparsity?

  39. Hi Magio

    86 cycles per pixel – that’s the budget to traverse the HALO structure (or any other system ) and find the hit point – that’s all you have to play with.


    1. I see. As far as you can keep the count constant the goal is met.
      I don’t know how much your work is related to Saj’s but still, from his last message, one can tell Saj is doing 1 cycle per pixel or so, am I close in saying this?
      Also of course, octrees can weigh tons in terms of storage. Be careful with that.
      I’m probably just overthinking it, since many possible UD candidates can be conceived. I just can’t stand the wait anymore.

    1. Those two links are papers about creating the best fit triangle mesh around a parametric or isosurface, aren’t they? They’re all about polygon fitting.

      1. No Dave, they are about hierarchical tetrahedral meshes, how to store them and how to search them quick. They explain things with triangles also, because it works in 2D, but who cares? The point is that there is a wealth of data structures, which are a bit like octrees, but much better and flexible for large quantities of data and for, well, hierarchical scaled versions.

  40. Here is why I point to hierarchical tetrahedral meshes. Imagine that you have a load of colored points in 3D. Group them in small families (preferably in groups if 4 close points). Compute the barycenter (center of mass) of each group. Repeat. The color of each new created point is the average of the colors of points in the group. Another version would be to start from a predefined hierarchical tetrahedral mesh, as one of those described in the newer article and then each point of the cloud is in one small tetrahedron, it can be recovered, if needed, from the barycentric coordinates wrt the tetrahedron. Color the faces of tetrahedra using the colors of contained points and their barycentric coordinates as weights. Store in one of the variants described towards the end of the newer paper. You get basically a tree, but not one which is a octree. I’m especially fond of the encoding from end of p.11- first part of p.12 from the newer paper.
    Easy to use, as well, I think is obvious how: put on screen what you see when you get down the tree, until you arrive at pixel resolution. At each refinement you have to solve which faces are in front from a projected tetrahedron.

  41. cool stuff there – I had a quick read – will try absorb it fully tonight, one thing to consider about octrees and the big plus for using them is that the spatiality is fixed, mean node coverage and size is fixed and is half the previous – this means that all calculations can be done with simple shifts and adds. Not that I’m a octree nut case or a religous fanatic – just passing on the info ( or maybe I am! lol). But that is the reason to use them at the moment. Positional weights sounds interesting.

  42. There is this trick which does not get off my brain. Say everything is in a huge cube and the POV is somewhere inside, in the center of a small cube. We want to project everything which is outside the small cube on the faces of the small cube, such that everything outside falls along the rays from the pov on the faces. We ignore what’s inside the small cube. Now, such a projection is easy to write in barycentric coordinates (which are obtained as a fixed, simple linear transformation and a translation depending on the coordinates of the pov. The rest to be done involves a parameter q = 2^{-k} (with k sufficiently big), which multiplies some barycentric coordinates, that’s easy and fast, but also dividing by 1 -q. Here’s the trick, \frac{1}{1-q} = 1 + q + q^{2} + ... , a series, right? But any truncated part of the series is smaller than \frac{1}{1-q}, so just by adding a new term to the series in the formulae moves the points closer and closer to the scree, thus giving a dynamics of falling on the screen (or the cubemap, the surface of the small cube). Pixels from the cubemap are then colored greedily. But I have no idea how fast this really is.

    1. For any bounded space (us on the inside looking out or us on the outside looking in) where we take all the possible viewing rays (extended in both directions across the cube) intersecting (in the simple case of a cube) the two sides we can indeed create a DB that can be fast that can recreate all possible views inside or out. But here the working set becomes rather larger than what we would probably care to have and end up with a variation of a lumigraph or light field.

  43. cool – could be quick – as pointed out above you have 86 cycles per pixel to traverse your data structure and draw your pixel. If you’re interested in the details behind the instructions and working out if a scheme really is feasible

    it’s all here:

    Appendix C – Instruction Latency and Throughput

    Latency — The number of clock cycles that are required for the execution core to complete the execution of all of the μ ops that form an instruction.

    Throughput — The number of clock cycles required to wait before the issue ports are free to accept the same instruction again. For many instructions, the throughput of an instruction can be signifi-cantly less than its latency.

    x87 fpu instructions start on page c-24

    1. Jim, I have a question concerning those86 cycles/rendered pixel: suppose you already extracted, from the huge database, the data needed for drawing a screen at a given moment. How many cycles/pixel do you need to actually draw the screen? I am aware that it depends on what exactly is extracted from the database, like: coordinates of visible 3d points and their colors, or the respective nodes in an octree, with some color information …

  44. I’ve done it before.. using an octree, and applying fairly quick plane/sphere collision allowed me to traverse it somewhat quick, the problem is simply slicing between near and far can result in a large number of traversals.. although, this could possibly be improved a lot to prevent the need for ever traversing nodes within a certain screenspace of already traversed pixels, as you can see in the second image, I tilted my octree slightly, to show that I only retrieved none-occluded points in this process.. but the maths to really speed things up.. such as ignoring traversal of occluded octree nodes was a bit beyond me when I tried this initially; on the far right, a simple none-occluded sphere with a slicing step size of 0.01(moving away from near);

    maybe this can actually be improved greatly if I could come up with a way of not traversing completely occluded octree nodes.. but still, even after that, I was transforming the points(3d) from local, to world-screen space each time I picked one.. which is also slow, I wonder if that can be improved as well, these are things I was incapable of improving during my attempt years ago.

  45. I will work up a demo using SIMD/C++ and try to get it as fast as I can, after words, I can send you guys the source, if you think you can manage to improve it using any neat tricks, then maybe we can really do something here:

    What I will do ->
    *** Simple Octree(not worried about memory atm);
    *** 3D Eularian Grid -> Each Node in the grid is an indexed pointer to an octree which occupies that cubical space, this helps when reading directly from hard-drive
    *** Bounding Frustum -> determine octrees/cells within the frustums visible space
    *** Adaptive Slicing -> As we move further out, the slicing step grows, to decrease LOD
    *** Dynamic Quad Tree -> Screen space occlusion for points
    *** Point to Plane projection -> since my plane is inversed to the octree in local space, I simply need to project my points onto the plane for it to appear in the right screen-space;
    *** Small Test Scene -> Simple Shapes, avoiding to much data because I cannot do the compression part well
    *** Empty Space Removal -> Determine which cubical spaces on the 3d eularian grid are not set with an octree, and ignore it.. determine start/end points from near to far, including gaps in between;

    What will more then likely be needed ->
    *** Octree Compression
    *** Anti-Occluded Node Traversal
    *** Faster Quad-Tree since this is built each frame

  46. Finally, the reason 3d textures would not work here, is due to the fact that you’d need to sort of scan-line each slice of the textures boundries.. this is slow, plane slicing against an octree can get the same results much quicker by avoiding empty space all together.. LOD can be simply determined by Octree depth, and this would of vary based on distance from camera..

    One more thing -> In the latest interview with bruce dell, and that kid, I noticed what appeared to be billboard rendering of points.. This tells me that euclideon is actually rendering the points on GPU, and simply fetching them on CPU.. with that being the case, I may include in my sample a GLSL shader, which basically I pass an array of billbards to it, whos’ extents/center depends on LOD and point position, the GPU can easily process this from local to world space, with ease, not very clear on how slow it may be otherwise to pass the vertex data to the GPU, maybe I can use CUDA.. or rather, the OpenGL equivalent.

    Also, keep in mind, this technique would not allow for the nice swaying animations my inspiration guy shows in his video at the bottom of the link I sent.

    Though, if you’re interested in animation, I’ve considered the good possibility of using this technique :;

    At which, one can effectively render “animation”, without any additional maths, or drop in frames per second.. since the cycling process is very minimal.

  47. sorry, I am not leaving any more comments after this, and will not return. you can delete all of these messages… it appears I may have figured it out a way to do this, perhaps not the same as euclideon, but good enough.

  48. This thread starts to be long, so probably I shall open another one. Thanks everybody, very interesting comments, looking for more, and again a warm welcome to (in a random order) Saj, Kyle, Jim, Shea, Magio as well as a nod to the regular contributors!

  49. Kyle :
    sorry, I am not leaving any more comments after this, and will not return. you can delete all of these messages… it appears I may have figured it out a way to do this, perhaps not the same as euclideon, but good enough.

    Someone’s found his ‘precious’ ! 🙂

    1. Dave H, I think he’s upset because it took so long until the comments appeared. But this was only because first time contributions are moderated and I was sleeping.

      But I wonder if this applies to JX and appc23 😉

      Kyle, now you can post without moderation.

  50. Hi everyone.

    Sorry I cannot reply more often. Have some other things to do (current crazy projects to finish).

    1. About the comment about core problems to solve and not waste time on discussion about what is the meaning of octree or voxel was not to imply that anybody does that here. Voxel or octree are indeed abstract terms, ideas of sorts that can be useful in discussions but often will mean whatever the author wants them to mean. Also, I got into such a discussion of “what is the meaning of is” related to the 3D Cube years ago and regretted it ever since.

    2. Octree rendering as envisioned by the people who originally coined and patented the term has, fortunately, little to do with what we were discussing here.

    See how they render the octree elements:

    3. While I am completely convinced the time is right for a new 3D rendering/modeling method and model, I am also quite certain that nobody has done it yet, including Euclideon. The repeated submodels seen in demonstrations suggest that (while they managed to render it the way we discussed it here), they still have a massive problem with amount of storage a model useful for say a video game will take. The “patent” they applied for is a provisional application that means very little and in practice almost always complicates the process of getting a real patent. They are running out of time to file for a real patent, if they don’t do it within days, it is as if they never filed anything.

    4. If I had a method that solved all those problems for sure, I would not talk about it on forums but would get a patent. Once upon a time I opposed patents like a lot of Open Source guys do, then I saw how Microsofts of this world take someone else’s ideas, slap a ton of filth on top of it and then get a patent on it. And yes, Microsoft has a patent on octree rendering too, and no, I don’t think they REALLY solved the problem either:

    Patents were intended to protect individual inventors from big corporations. Yes, the corporations have mucked it up a bit, but at least in the US the patents are still VERY CHEAP and can be used to protect yourself from being ripped off.

    Therefore, Kyle, if you are still reading this, if you make it work, do get a patent. Among other things, you will be able to discuss it without someone ripping you off and then calling your idea his.

    5. I do not think pure octree with 8 pointers per node down to smallest detail can do the trick. Consider a simple model, and calculate the number of pointers needed. But, using octree as indexer for models that have some finer detail within lowest level cube just might.

    6. The 3D Cube thing. As I said, from the time’s perspective, it was a very silly trial to show that raytracing can be a basis for a real-time 3D engine. The model took 2D (x,y) horizontal grid and then, I believe, sorted by z, sparse array of elements, each of them a cube associated with textures in dropping order of detail. A loop (that did not even need int multiply instruction) would advance through the grid until it intercepted a (filled) cube. As I said – nothing that special in the great scheme of things. The renderer loop was very simple, but the “coordinate” class that drove it all was not.

    Note that if you do such a integer renderer, traveling through x,y,z space can always be translated into something like take say 10 x steps, then 1 y step, then 10 x steps, then 1 y and 1 z step, etc.

    I really do not think dissecting the 3D cube thing will help you much, but if someone is still interested, I will zip the source up and post it somewhere.

    1. I was told by a patent office guy that the purpose of a Patent was enable invention sharing, and that you are allowed to build upon another patent to form a new one . It’s a shame that Patent trolls have changed things forever. [looks casually in Apple’s direction]

      1. The simple goal of utility patent is to solve some problems that others could not. Therefore, it actually helps to find existing patents dealing with similar issue in order to show that you can do something that others tried, and then you can do it better. A good patent application always lists such “prior art” of previous attempts at solving a problem. But, you still have to have something new and useful. A unique combination of method steps or unique combination of machine elements.

        Prior to patents, an inventor would always get ripped off if he shared details about his invention. He would show to others some magic gadget but would never disclose details. Meaning, he would take your money to pursue the idea, but could not tell you how it worked. Which, quickly led to most “inventors” being fraudsters. To a degree, you can still see it at work with some “inventors” and magic technology companies peddling great new ideas and inventions that they cannot disclose until they get a patent. 😉

        Therefore, the whole point about a patent is granting someone a limited time monopoly on his invention in exchange for him fully describing his invention so it can be replicated and improved on by others in the future.

        But, of course, people always find a way to make a mess of things. The worst patent of all is Microsoft’s “patent” on a dialog box popping up asking for a higher level user/password when you try to do something your current user level is not allowed to do (like install a device driver). Somehow they got that patent even though all other operating systems always had something like that.

      2. Forgot about the Apple “multi-touch” patents.

        A novel and unique idea of:
        You can use more than one finger to point at something 😉

      1. Would be great to see Saj’s algo running in a modern compilation setup – I speculate from the cpu calculations above – it would run around 50-60fps @ 1280×720 on a moden cpu – maybe faster. 94 cycles per pixel on a P200mhz – would probably translate to 40-50 cycles on a IA32?

      2. jim :
        I haven’t counted the instances used in the island demo – but there’s not that many – 20? Each instance at that detail level from my experience with pure octree storage uncompressed was around 200mb – I assume they are compressed in multiple ways – deflated for cheap ram storage and the internals of that also compressed – relative links, colour palettes e.t.c.
        I don’t believe his “all atoms are unique” tale – it’s obvious that instances are used.

        It would be great to see it running! It does sound to me like you can’t tilt the camera though.

      3. Cubemaps like JX and Bauke propose solve the tilting, by decoupling rotations based on the POV from translations of the POV. But you have to render 6 screens.

  51. Hi Saj

    Some great info – thanks very much for the detailed reply. The pure octree method has been done quite a few times and is very workable due to the massive amounts of memory we now. Like I said I’m trying plane eqns/z sprites e.t.c. later on once I get the main method working. I think all new ideas and implementations are good and a great inspiration for others – and 3D cube was in no way a silly idea, any new idea or angle can spark another thought process – hence the halo idea comes from trying to work out your 3D cube method 🙂

  52. The brick wall I keep hitting is that Island demo. If you take Bruce’s words as being truthful, then:-
    21 trillion atoms.
    All running in RAM (16 GB laptop), no disc streaming.
    All atoms are unique, i.e. no repeating models.

    Let’s say he’s using all of that RAM for argument sake.
    You get this:
    – that’s 0.0008 bytes per atom!

    That’s what has to be achieved, if it’s believable of course – and there’s the niggle.

    1. The island is full of repeating models, I would not believe that the database used by Dell for the island contains multiple copies of trees or of dirt. I think he wanted to say that multiple copies do not matter for his algorithm, provided one really has a huge database from somewhere, the algorithm would work as well. He proved that later with the geoverse.
      So, concerning the database, I would not be worried about it’s dimensions.
      There is the nice point made by Saj about the fact that, however, ideas won’t stop with UD and further breakthroughs are to be expected. (However, there should be limits to what can be stuffed in a fixed amount of memory, of course. The UD does not consist in having a magical way to squeeze the world in 16GB of RAM, but to be able to get fast what it’s needed for rendering in real time, from a huge database.)

  53. I haven’t counted the instances used in the island demo – but there’s not that many – 20? Each instance at that detail level from my experience with pure octree storage uncompressed was around 200mb – I assume they are compressed in multiple ways – deflated for cheap ram storage and the internals of that also compressed – relative links, colour palettes e.t.c.

    I don’t believe his “all atoms are unique” tale – it’s obvious that instances are used.

    1. Oh I’m sure it’s instances, because I can’t fit it into any other paradigms, so it must be! 🙂 😉

      This algorithm’s spec is all over the place, but it may be that this structure is quite tall and detailed, everything else we’ve seen so far has been flat:
      For that cathedral on their Facebook page is “Captured at 1mm resolution, compressing 850,700,000 points of data from 26gb to 10gb” – not brilliant.
      It also doesn’t make any sense going from 32 bytes to 12 bytes per point anyway. Why does it start at 32, it should be more than that?

      1. that’s funny – my current halo structure is 32bytes:uncompressed 😛

        class HALO_Node // 32 bytes
        u32 mC; // Children Pointers, 0 = null node 0xff000000 = Bit Children
        u32 mAN[3]; // Axis Node Pointers, 0 = null node
        u32 mAD[3]; // Axis Node Distance in node units
        u32 mColour; // Nodes LOD Colour
        // 0xf0000000 Node Position
        // 0x0f000000 Children count

        I’ve setup a google code project for it

        just learning GIT before I check stuff in, and start rolling it 🙂

      2. It is quite well possible that part of the ‘compression’ is done by dropping points. Bruce mentioned that the conversion process has a resolution parameter and that setting it to a low value would cause points to overlap and be discarded.

        Furthermore, I’ve written a point cloud to octree conversion program that is doing about 16.5 bytes per point (that would be about 14GB for the cathedral). It also drops about half of the points, simply because otherwise, disk usage becomes too large and the holes in the model start to connect to each other.

      1. Thanks preda – it’ll be a few days before I get anything checked in and “working” (still working on the traversal), but I’ll post on here when it’s all good. I’m using BCB 6.00 as the app exe – so I won’t check that code in since nobody uses it apart from me 😛 lol – but I will check in the renderer code and structures, models/data and all that jazz. The main render func will be something like:

        void RenderColour(ivec3 Pos,f32 Pitch,f32 Heading,u32* VidBuffer,u32 BufWidth,u32 BufHeight);

        so anyone can use it there own app setup with ease.

  54. Re patents, three comments:

    (a) there are open source licences, better use them!

    (b) or, if somebody “founds his precious” here, as Dave says, and runs with it, improves it and get rich, please talk with a lawyer about making a fund which supports open collaborations initiatives which are running on this blog. It will ease your conscience, because

    (c) more cool people like you collaborate by using this humble channel, more people look and link here, so better ranks this site, therefore if you run with the precious then you will be forever remembered here, for the delight of the netizens! 😉

  55. Me myself I’m 100% for open collaboration and no charge for this kind of stuff and against anybody making a profit from the hard skilled work put into it – I’ve put the HALO stuff under GNU GPL3 – hoping that it can only remain in the “free” domain and never be charged for. I’m not a expert in these silly licenses – maybe someone could advise me on the best route to take?

    1. Jim, I was half joking, but seriously the open way has only this power, of being open about this. Otherwise, on this site we try to attribute correctly all contributions. Everything one contribute it’s his, obviously, the only thing in discussion is what happens if A used B’s contribution without acknowledging it. In that case, my previous comments (b) and (c) apply.
      I am really interested in being fair about this because myself I use this blog as an open notebook. Practically, that means anybody with a vile character could take ideas from here and publish them in some research journal. But it’s not so difficult to track such a behaviour, these days. There is still a danger, which I encountered several times, that some stupid guy not understanding the net, does this by thinking that nobody will notice. But even those guys are passing through a process of education, see all the discussions about plagiarism in the research world. Anyway, to finish with it, I look forward for some semantic search engine, which will surely spell doom on a significant proportion of 20th century “publish or perish” research.

  56. Woah guys calm down. 😉 the main thing about this blog is that it’s not forum based, we need different Threads of ideas to contribute to, and this is starting to appear to be the wrong format to do this coherently. Perhaps the open-source space is the place, but on the other hand this blog is going up in google’s search engine with regards to Euclideon, so it’s bringing people in. Is that good or bad? I don’t know.

    1. Yes, let’s calm down, here are some facts. In the last 7 days the page which got the most hits, 35% of them, was the homepage. (It’s always like this, usually going up to 50% of the hits. The homepage,i.e. the page with the latest post, is not counted in the “Most viewed lately” window.) Then, 30% of the hits were taken by UD pages. The rest of the hits spread over 91 other pages (in the last 7 days that’s up from 60-70 other pages, which is usual).
      People coming at this blog are also interested in open access, maths, neuroscience, philosophy.
      In conclusion, is fair to say that about 30% of the interest in this blog comes from UD.
      Probably, the most attractive feature of the blog is that is not a forum, but a place where people with very different interests can coexist, without having to accept any majority pressure. Diversity is very good for creativity, so it’s a mixed blessing if this blog becomes to look like a UD forum.
      The “Most viewed lately” page is a bit misleading, because it does not reflect this diversity, due to limitations of design. I put it in order to encourage people to look at other subjects than their primary interest, but now it reflects the vague wave of attention on UD.

      1. I kinda like the blog format – it’s a free conversation from which idea’s spawn – and you can see that as it happens. But as pointed out is hard to get all the info in one go and tricky to put code and images without using pastebin or imagebin..

      2. Agree, but links are as good as the place they point to. I am waiting a couple of days, maybe somebody wants to sort a bit this heap, then I shall make a new post with links to relevant projects. (It would help to have for each project a post, like Bauke, this is up to you.)
        Probably yesterday with my comment on open licences I was too eloquent? Scaring a bit people? I hope not.

  57. In order to see clearly through all the work contributed here, is anybody willing to filter the various threads from the posts? I am thinking about something like this:
    – people who started projects , with links to respective pages (of projects)
    – hypotheses about how UD should work, with links to relevant arguments (links are more important than disputes about who’s right).

    It’s a big mess of data, I could do it but I’m asking for help. A bad thing is that it seems that comments don’t have a stable address (I might be wrong). So, instead of pointing to A who contributed in the comment B of the post C the link D, I would need rather link D contributed by A in post C.

    Any other idea? Anybody can start a wiki format, or something like this? I don’t like the idea of a forum, it’s useful only for short discussions and eats a lot of time.

    1. The link at the top of each comment, shown as “#147” for example, ends with “#comment-4180” (your post). I think this number is a stable address.

  58. chorasimilarity :
    Agree, but links are as good as the place they point to. I am waiting a couple of days, maybe somebody wants to sort a bit this heap, then I shall make a new post with links to relevant projects. (It would help to have for each project a post, like Bauke, this is up to you.)
    Probably yesterday with my comment on open licences I was too eloquent? Scaring a bit people? I hope not.

    I think the license thing didn’t scare anybody – actually it’s a really good thing to mention esp after the whole Creative labs – Carmack’s reverse affair

    1. That Vid really show’s off the island demo – it just look awesome and soo detailed – even tho it’s instances 😛 – much better than the official vid they released!

  59. I repeat here a question I asked Jim earlier:
    Jim, I have a question concerning those86 cycles/rendered pixel: suppose you already extracted, from the huge database, the data needed for drawing a screen at a given moment. How many cycles/pixel do you need to actually draw the screen, from that moment? I am aware that it depends on what exactly is extracted from the database, like: coordinates of visible 3d points and their colors, or the respective nodes in an octree, with some color information …

  60. well you need to simply put your pixel into the screen buffer, ScreenBuffer[x+y*width]=pixel; or *ScreenBuffer=pixel. So it depends how you are drawing your screen, at most it’s a few cycles. There’s also the cpu model,data cache and instruction cache hits and a load other things to consider – so it’s a very hard question to answer properly – I would say it takes around less than 6 cycles on average, I’ve currently timed the win api blit for a 1280×720 and it’s around 4ms out of the 32ms you have for 30fps.

  61. actually here’s a really good example:

    for (int b=0;b<720;b++)
    for (int a=0;a<1280;a++)
    for (int c=0;c>23;
    that takes up the full 32ms on my rig – it’s not optimized or anything – and it was just a quick test for my profiling system – it doesn’t actually do anything of use

    1. Thanks. More precisely, suppose that the screen has N pixels and that I have selected already a number cN of 3d points from the database, (with c small), such that I know that for each pixel on the screen, there are about c 3d points to choose from. How many cycles/pixel, from the budget of 86, do I have to spend for solving the worst case: render a database of cN points, without other knowledge, like this point project on that pixel?

  62. hmmm so you basically want to 3D to 2D convert your previously found points?

    you need to:

    1) transform them into camera space

    Matrix transform e.t.c.

    2) then transform them into parametric screen space


    3) Draw your pixel


    of course this is the worst code/way to do this – it probably be more than the 86 cycle budget.

    1. OK, let me ask differently: how big can be the database of 3d points (structured as you want, but obviously the answer depends on the structure), such that it can be rendered in real time, by known algorithms, without using the GPU?

  63. I don’t know the answer to that I’m afraid, I’d have to knock up a app to test it – If you mean how many points can It convert from 3D to 2D. I could estimate – 500,000 points – maybe – but that’s pure speculation. It would take up a lot of time to knock up a app and then optimize it to hell. And of course once optimized – it is then limited to how you can use it. It’s kind of a chicken and egg issue 😛

    But basically It’s not that many – the CPU even with SIMD/SSE is just not geared up for it. But it is really good to ask these kinds question – because it gives you insight into math algorithms and CPU algorithms, i.e. a math algorithm that’s dynamite on paper – might be waaay to slow in realty because it uses to many floating point operations or too much memory or too many stack calls e.t.c…….

    1. OK, here’s why I ask (we discussed about this here, but it got lost). Dell repeatedly said that UD is not a renderer. The real problem that Dell solves is the one of constructing a database structure (in infinite time, prior to the real-time use of it), with the properties:
      -the new database has dimensions comparable with the usual, octree based one (or whatever is the format they keep lidar data),
      – then UD works by selecting from this cleverly organized database exactly what is needed (“one point per pixel” quote), which is then fed to a renderer of your choice.

      SO, starting from your 86 cycles/pixel budget, I try to figure out how much time it takes to the UD program to select what it needs from the database, by substracting the time needed by the most clever, classical renderer.

  64. ok got you – well that all depends on the pixels that are fed to the renderer. is just a sequential list of pixels? if so then it takes around 6 cycles to draw them
    for (int b=0;b<720;b++)
    for (int a=0;a<1280;a++)
    u32 Pixel=UDAlgo(a,b,Camera e.t.c.); // 80 cycles
    *Buffer++=Pixel; // draw pixel – 6 cycles or whatever

    if the UD algo returns pixel colour and position it would be:

    for (int b=0;b<720;b++)
    for (int a=0;a<1280;a++)
    u32 Pixel=UDAlgo(a,b,camera…).Colour; //
    vec3 Pos=UDAldo(a,b,camera…).Pos; // 80 cycles
    sx=Transform(Pos); //
    sy=Transform(Pos); // 50 cycles or whatever

    my guess it that the UD algo did the 3D-2D transform in the past – hence the "Mass connected processing", unless they ditched software and gone totally GPU, in which case the algo could return a Vertex buffer object (a list of positions and colours)- which you then send to the GPU to render.

    One thing to consider – we've never seen a .exe of the software version – only vid's, which some have turned out to be animations 😛

  65. I can’t remember the link, it was the vid where Bruce said that they had tried ray tracing and tried octrees and said something like “this animation took xx time” e.t.c..I’m sure it’s still out there.

  66. sure – that would have led him to the discovery of the problem with standard octree traversal – which is that you have to re-traverse the entire tree at certain points. Consider a tree of depth 24, your at the middle point of the tree at the smallest child 24 levels down – you want to get the next child across – it requires that you jump back up 24 levels and then back down 24 levels to get to that next child 🙂

  67. Jim, using your traversal method, animations can also be applied with a technique such as color cycling. Would be and interesting, and fast way to provide real-time animation to your data.

    1. Hi Kyle, how can colour cycling enable animation? I don’t understand how that works, can you explain it a bit more – thanks!

  68. Hi All,

    I’ve been following this blog for a while now as it’s quite interesting to see how many people are still approaching the UD conundrum as a variation on well documented point rendering techniques, many of which have lost their value over the years when you start using much higher screen resolutions, dataset sizes or view distances (unfortunately most of them sound promising on “paper” but you have to implement them before discovering their limitations).

    I’m currently researching a very different approach which at this stage certainly seems to have many similar characteristics to UD:

    – has a non-trivial pre-process (which is where all the “difficult” problems need solving)
    – converts the raw point data into an alternative domain
    – has a fairly trivial renderer which runs very fast in software alone
    – render time is proportional to screen resolution
    – render time is not dependent on database size or viewpoint
    – the renderer deals purely with points and has no concept of: raycasting, splatting or voxels

    For the moment here are some (fairly heavy) clues to try and get other people thinking along a different track…

    1) the raw point dataset is in the spatial domain and traditional approaches using e.g. octree encoding will let you answer queries about which points exist within a specified volume
    2) an alternative domain is the visibility domain where you can answer queries about which points are visible from a specified view point
    3) think about how a metric for point visibility/occlusion would work
    4) also think about how point visibility is dependent on distance from the view point – individual points only project into visible pixels when you’re close to them and in the distance groups of points effectively merge together
    5) find an efficient way to store that information in a hierarchy which allows very fast progressive retrieval of the set of visible points from a given viewpoint


    1. Hi Jumper, welcome. That’s the kind of message I like. But now, you great wizard of Oz, better show us how it works, make us believe not by offering hints, but proofs or programs.

    2. hi Jumper, look really good.
      how is this domain?, have a formulae for convert from spatial domain? can you give us a more concrete info?
      I think (without know anything about you aproach) you not be worry about conversion and compression first.

      1. Hi Preda,

        Ok here’s a simplified description to get the method across (the storage method and the
        preprocess are actually more complex and is where the majority of the difficult problems lie – I’ve been playing with the method for over a year, time permitting, and I’m not quite ready to reveal some of the finer details yet…).

        The basic idea is to use the octree structure in a different way as a heirarchy of visibility information (think of something akin to JX’s cubemaps approach but generalized to a progressive hierarchy).

        If we consider the visibility of a point with respect to a cubic volume (e.g. a node of an octree) there are 3 basic situations:
        – the point is fully occluded from the node
        – the point is partially visible from some viewpoints inside the node
        – the point is fully visible from any viewpoint inside the node

        When creating the new database we start at the root node which encompasses the entire
        scene and recurse downwards examining which points are visible from each node.

        Each node contains a list of only the points which are external to that node and FULLY visible from anywhere inside the node (and taking distance thresholds into account as well).

        This implies we’re only concerned with empty space nodes which have no internal occuders. Any non-empty nodes are automatically subdivided and have no fully visible points themselves (since there will always be some viewpoint in the node where any point
        gets occluded).

        Any points which are only partially visible from the node (due to occlusion) are dealt with further done in the octree subdivision where they will begin to become fully visible.

        Rendering then consists of starting at the root node and doing a single quick top-down traversal of the octree and visiting only the nodes which contain the viewpoint, and collecting together all the points which these nodes reference.

        You’re then left with a combined list of all points visible from the given viewpoint, which can then be transformed, clipped and depth buffered on to the screen (using CPU or preferably as a GPU vertex buffer).

        As you can see this is a trivial renderer where the octree traversal takes virtually no time and on GPU the transform is also very fast.

      2. Thanks Jumper, if you want we can make a guestpost, but take your time to explain in more detail. At first sight, it looks that you are going to obtain an enormously huge database. Otherwise I agree with you that’s a path to pursue.

      3. Jumper, could you give more details about this: “Each node contains a list of only the points which are external to that node and FULLY visible from anywhere inside the node (and taking distance thresholds into account as well).”? There has to be something you do with the “distance thresholds”, otherwise practically most of the empty space nodes have links with most of the inhabited nodes. Related, I’ve joked repeatedly that Bruce Dell’s “Level of Distance” expression was used on purpose, but I don’t see how to implement it without multiplications, otherwise than approximately, by some Manhattan distance.

    3. Not really on the same track as Jumper’s comments but more to do with the idea of different domains. If you process your data from the spatial domain into the frequency domain (with some shifting) there are volume rendering techniques that by taking a 2d slice of this new domain (when gone through a reverse transform) that is equivalent to a particular viewpoint looking at the data using the projection-slice theorem in three dimensions.

      1. The techniques are all variations of fourier volume rendering and are used a lot it medical imaging. Only problem with this path is that it does not translate well into what we are wanting. These have the side effect of taking into account the effects of the whole volume instead of just the visual hulls. Depending on your transfer function you could approach it but now you are doing a lot more just to fight against the formula you used.

      2. Interesting. what seems to be needed is this: instead of the “x-ray version” computed as the sum (integral) along a direction, consider the “tropical” version, computed as the max of a function (the first hit). This “tropical analysis” is a very mysterious thing, the correspondent of the Fourier transform is the Legendre transform from convex analysis. (see section 5 here. I wonder if anybody tried to do tropical Fourier volume rendering, please tell if you know or do, because I like convex analysis.

      3. I can not say that I am aware of anyone trying that approach to be honest. It is worth research in its own right but I still see hurdles that would need to be worked out. If is along the same lines the problems would be that we would end up with basically a parallel project along an arbitrary viewing direction from outside the volume only. But I would have to check out more to see if that is only a problem due to current approaches.

      4. And that brings us full circle back to light fields and/or visual hull reconstruction on that thought. Its basically equivalent in the naïve approach to having an object prerendered from all angles at a fixed step increment.

      5. 🙂 Shea, it’s not related to this subject, but do you have any links to communicate concerning uses of DNA origami in computing (of the kind a naive person like me does not immediately discern by using google)? I arrived at the conclusion that 2 fingers are a better model of the concept of 2 than the equivalence class of all sets with 2 elements.

      6. I do not know if that is a serious question with a joke or a jab :). But if you are inquiring off topic about that for your graphic lambda calculus relations to DNA computing then we might want to move that possible discussion elsewhere.

  69. about their 2012 demo, I am ABSOLUTELY positive they just instanced the data, since they’re using tiles, they could just as easily only load unique tiles, and render them in their respective positions on the 3d world.

    As an update, to my previous post, I’ve found myself surprised with the speed of these additional changes, the only thing my algorithm does not like.. is noise, small spaced points, and since euclideon never shows ‘noise’, I wonder if they also have the same problem. They have solid surfaces in all of their examples, and the amount of noise in their geo-spatial demos, are minimal.

    For their geo-spatial demo, you’ll notice they started loaded from the hard-drive, I think this is because they could no longer do their previous instancing trick.. as they’d be forced to load many unique tiles, this can be indexed well, and loaded quickly as they pointed out if the world is aligned to a eularian grid, with some fixed cubical dimensions.

    This is a remark of someone here describing that in their island demo, compression would need to be greater then a thousand of a bit.. I refuse to believe that, and as stated above, have arguments to suggest their compression is not that capable, on unique data.

  70. interesting stuff Kyle – I really like the idea of volumetric imposters, simple and easy compared to the traversal nightmare I’m in at the moment 😛 – but am concerned of the speed of a software rendered version of – both the 3d-2d projection and the occulsion overdraw issue. How do you propose to get around those issues to get at 30fps?

  71. awesome!! I’m playing with the old Menger sponge set at the moment – so real data would be wicked. Really thinking I’ll have something running very soon now 😛

    just incase anyone wants the sponge set to play with here it is:

    u32 Sponge[104]={ 8, 16, 24, 32, 40, 48, 56, 64,
    0xff,0xff,0xff,0xff, 80, 80, 80, 80,
    0xff,0xff,0xff,0xff, 88, 88, 88, 88,
    0xff,0xff,0xff,0xff, 96, 96, 96, 96,
    0xff,0xff,0xff,0xff, 0, 0, 0, 0};

    Done in 2x2x2 dimensions and in linear octree format.

      1. it’s simply 8 sets of u32 per node
        each value is either a direct pointer to a node or it is 0xff meaning null node

    1. Approximately the same as in the movie, which is 15 FPS. It hovers around 10FPS and 15FPS, with some hiccups as data needs to be loaded from harddisk.
      I run it on a single core of my laptop i7 processor (2.2GHz), final rendering pass is done on my intel integrated graphics card.

  72. will prolly have something later on tonight, but I can’t promise that 100% 🙂 – I finalized the traversal algo and then ZZzzzzz’d – all this blog reading is tiring hahah 🙂

    1. yes, that’s an orgy of comments. A hundred times I wanted to stop the comments and to go to the usual demands – only informative, no unsupported opinion, long range attention span. But … it gives results, it’s kind of a party fun, in few days we’ll go back to being sober and think alone, undisturbed. Then party again, when we’ll have a reason ;).

  73. yeah I love the party vibe – it’s great it really keeps the left side of the brain at bay – which is much needed for this kind of speculative R&D

    1. Thank you Jim, I’ll add the links, (there is a number N of links in a comment which moves it in the moderation queue). There is a comment by Bauke which says that the Menger sponge is harder to “render” than a geographic landscape, maybe you try a too difficult world?

  74. Yes that is exactly correct – I made the Sponge data for that exact reason – it’s the worst case scenario for any rendering and will be the slowest – since it’s :

    1) infinite
    2) has many spaces which lead to more and more rendering.

    it’s going good so far – everything working as expected – just got to finished on the polygon converter.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s