I try to formulate the question about how Unlimited Detail works like this:

Let D be a database of 3D points, containing information about M points. Let also S be the image on the screen, say with N pixels. Problem:

reorganize the database D to obtain another database D’ with at most O(M) bits, such that

starting from D’ and a finite (say 100 bytes) word there exists an algorithm which finds the image on the screen in O(N log(M)) time.

Is this reasonable?

For example, take N=1. The finite word means the position and orientation of the screen in the 3D world of the database. If the M points would admit a representation as a number (euclidean invariant hash function?) of order M^a (i.e. polynomial in the number of points), then it would be reasonable to expect D’ to have dimension of order O(log(M)), so in this case simply by traversing D’ we get the time O(log(M)) = O(N log(M)). Even if we cannot make D’ to be O(log(M)) large, maybe the algorithm still takes O(log(M)) steps simply because M is approximately the volume, so the diameter in 3D space is roughly between M^(1/3) and M, or due to scaling of the perspective the algorithm may still hop through D’ in geometric, not arithmetic steps.

The second remark is that there is no restriction for the time which is necessary for transforming D into D’.

UD is only using non-multiplicative integer operations, so they say. I believe this strongly suggest that they are still using an octree. Though, they can still store anything in the nodes. This will be our D’. Note that D’ has depth O(log(M)). Each parent node stores an average of its childnodes. The octree only stores the surface of the objects.

The algorithm should run on single core of a current day cpu. We assume ours can do 1.000.000.000 = 1G operations/second. We aim for 20 fps, which gives 50M ops/frame.

One option is raytracing, which I believe is doable with only integer operations, but this would leave us with O(log(M)) operations per ray, which is little. For a 1024×768 screen, this is 63 operations, which is just too short. If this number can be increased by a factor 20 or something, this will become a feasible approach (as mentioned by John Carmack).

These figures confirm Bruce Dells observation that raytracing is too slow. We might be able to use raycasting instead, but Bruce mentioned he is doing something entirely different.

So here comes my solution to the above problem, which basically is just octree splatting.

We want to draw O(N) pixels on the screen and fill holes in O(N) time.

For each O(log(M)) levels of detail, we will traverse all nodes in the octree that:
1. are at the same detail level;
2. have a parent that is a radius of 0.5 * sqrt(N) * [size of node at current detail level];
3. have no childnodes that would be visited;
4. is within the viewport.
For each node we draw a single pixel on the screen at the projected center of the node. To handle overlapping pixels, one can either use a z-buffer or traverse the octree in back-to-front order.

Because we are only storing the surface, a cube with size R will have volume O(R^3), but contain only O(R^2) atoms. Therefore our traversal will only draw O(N) points. I believe it also visits only O(N) nodes.

Hence this method runs in O(N log(M)) time. It draws the same amount of pixels, but only O(N) will be useful, due to overlap.

The euclideon island requires an octree of depth 22. If we render N/4 pixels per level, this gives 11 operations per pixel/node. I think this is just about doable, but will require good optimization.

Bauke thanks, I didn’t got “2. have a parent that is a radius of 0.5 * sqrt(N) * [size of node at current detail level];” and “O(R^3), but contain only O(R^2) atoms. Therefore our traversal will only draw O(N) points. I believe it also visits only O(N) nodes. “, could you detail?
However I thing the database is more clever than that.

— “2. have a parent that is *within* a radius of 0.5 * sqrt(N) * [size of node at current detail level] around the current viewpoint; ”
I missed the word “within” there and the radius is the maximum distance from the current viewpoint. The idea here is to pick a sphere around the viewpoint and draw all the non-empty nodes within that sphere. The size of the sphere depends on the detail level currently being drawn.

— “O(R^3), but contain only O(R^2) atoms.”
This is a rephrase of Meagers theorem that states that the octree complexity is on the order of the surface area of the object.

— “Therefore our traversal will only draw O(N) points.”
The sphere has radius sqrt(N) and by Meagers theorem this implies that it contains O(N) nodes (at the current detail level).

— “I believe it also visits only O(N) nodes.”
If you want to visit X elements in an octree you need to traverse O(X log(M)) nodes. but since the elements are all the elements in s spherical region, I expect that the actual number of traversed nodes is O(M).

What I failed to mention is that rendering time now also depends on the surface complexity of the scene. I wonder whether in UD, this secretly is also the case.

I’m quite certain that UD still uses octrees as the basis structure of their database. They likely also added some clever tricks, or small, but clever, modifications. I can think of some, but none of those would give significant improvements.

No, it’s not that simple, for sure. Think that Bruce Dell mentions 20hrs time needed for transforming a 1TB database of 3D points into his format. (The question was “how much time a pc needs to do the 1TB database conversion?” and Dell seemed puzzled, then gave the mentioned answer. I think the question is anyways badly formulated and the answer should be completed with “not by a pc, but by a really strong computing device”. In another presentation, Dell puts the image of a computer farm when he speaks about the fact that the database conversion takes a while, but you only need it to be done once.) Secondly, the right algorithm for the second part of the problem (i.e. real time rendering) is such that it is still real time if you have the preprocessed database in a place and the computer in another place. I don’t think that’s feasible by your proposal. Finally, it is not right to say that a volume O(R^{3}) contains O(R^{2}) points of a surface, unless you have a bound on the curvature of the surface. Moreover, databases of real 3D points may have a different statistics (i.e. R^{a} with a between 2 and 3). Same holds btw for the fractal used by Dave H, which might be even more complex than real points databases.

During my study I did some experiments on I/O-efficient algorithms. One of the results was that with 16MB ram, it took 10 minutes to sort 1GB of data, assuming the algorithm is cache aware. Otherwise it would take days. Scaling this up to 1TB and say 16GB of ram, this probably would take a few hours. So processing 1TB in 20 hours, means that there is not much you can actually do. Also, I doubt whether their implementation is I/O-efficient.

In case the model is filled such that a volume of O(R^3) contains O(R^3) points, My algorithm would run in O(N sqrt(N) log(M)) time. This is probably also the worst case complexity that an octree ray tracer would have.

An octree is still the only data structure I can think of that has all the properties that UD has, except that it has no method known to me to render it at interactive speed.

Improving speed in exchange for memory is no option as this will violate the compression property. I’d say that each point can be stored at most once. So cubemap based approaches will not work.

I have been thinking about other storage structures, but all must be rendered with some variation of ray-tracing. As far as I’m able to understand the parallax-based method in https://chorasimilarity.wordpress.com/2013/01/25/teaser-2d-ud/ seems to have this problem as well, as it would have O(N log(M)) layers.

Another approach I’m currently thinking of, is something with point location data structures, but I’m not making any progress.

I think I should just implement my algorithm in #1 and see how well it runs.

Bauke, try it and tell us. Otherwise, please try to provide constructive commments, not authority based arguments. UD is based on some variant of octrees, that’s sure, if you understand octrees to be the 3D equivalent of the 1D compressed positional notation for numbers. That’s fine, but some of your other parts of argumentation would be better formulated by adding “I think”, or “I don’t see how” instead of being definite, but contradictory with other evidence, formulations. Which triggered this message of mine. If you say something is impossible, then prove it or otherwise say “I think is impossible”, see?

Sorry, my intention was to indeed prove my arguments, but I tend to omit them if I think a proof is trivial, and otherwise my proofs tend to be too short to follow. When reasoning about algorithm speed where I/O is the bottleneck, I can only make rough guesses.

Anyway, I did some implementation. The first thing I did was converting a dataset of Mouna Loa (135.833.540 points). This was 8.9 GB of plain ascii data (just a list of points). It took 4.8 minutes to convert it to 3.4 GB of data in another ascii format (still a list of points). CPU was used 81%, so disk I/O was clearly the bottleneck. This is a single linear scan, which means the files are read and written in an order that optimizes speed. Scaling these values up to 1TB, means a processing time of 9 hours. So my estimation wasn’t that far off.

Second thing I did, was loading 5.000.000 points of this dataset into memory. These are stored in the leaves of an octree. Note that I only create the leaf nodes that store exactly one point and I create their parent nodes. For the parent nodes I compute the average color of its child nodes. This way I can use this parent node as an approximation of the child nodes.

For rendering the entire dataset, I would have to build the octree data structure in a file on disk. Then I could use memory mapping to access this data and my OS would handle caching the used data in memory.

This program runs really well, when viewing the dataset from a distance. However, when moving into the dataset, the frame rate drops to about 4FPS. I estimate that the full dataset would require about 5 seconds per frame to render properly. Some optimization is necessary here.

I used a hole filling algorithm, however that failed to work when points where to far apart. (I can some in so much that I can see individual points.)

Another problem I noticed was that at the edge of a detail-sphere (those with radius 0.5 * sqrt(N) * [size of node at current detail level]), within the sphere the point density was to high, causing a grainy effect, while just outside the density was too little.

On another dataset, a texture with hight map, converted to point cloud, I noticed that my approximation method also has problems. the lower points where much darker than the higher points. And when viewing the scene from top and moving backwards, the scene abruptly became much darker when it moved outside a detail-sphere. Some interpolation would help to mask this effect.

Euclideon island does not seem to contain these kind of high contrast items. Otherwise I would expect to see a similar effect. Though I did believe to see that the scene becomes less saturated when zoomed out.

For storage nodes that store ‘almost’ identical data can be replaced by a single node. This converts the octree into an directed acyclic graph (dag). This would already give some compression. Some jpeg-like compression techniques can be used to further improve this compression. (If ‘almost’ identical nodes are replaced, you get lossy compression.) Along with the dag, this gives a compression method that could be considered jpeg compression for colored point clouds.

I believe I’m much better at writing such compression algorithm than at writing an efficient renderer, so if anyone is interested, I would be happy (and if time permits) to implement this compression algorithm and provide (Linux) code for reading and traversing this data, while it is stored in a file. (So this includes disk access and caching in memory.)

Random thoughts and works around Artificial Life & Intelligence, Bio-inspired Computation, Complex Sciences, their Applications, Technology, Science, Culture and Society

UD is only using non-multiplicative integer operations, so they say. I believe this strongly suggest that they are still using an octree. Though, they can still store anything in the nodes. This will be our D’. Note that D’ has depth O(log(M)). Each parent node stores an average of its childnodes. The octree only stores the surface of the objects.

The algorithm should run on single core of a current day cpu. We assume ours can do 1.000.000.000 = 1G operations/second. We aim for 20 fps, which gives 50M ops/frame.

One option is raytracing, which I believe is doable with only integer operations, but this would leave us with O(log(M)) operations per ray, which is little. For a 1024×768 screen, this is 63 operations, which is just too short. If this number can be increased by a factor 20 or something, this will become a feasible approach (as mentioned by John Carmack).

These figures confirm Bruce Dells observation that raytracing is too slow. We might be able to use raycasting instead, but Bruce mentioned he is doing something entirely different.

So here comes my solution to the above problem, which basically is just octree splatting.

We want to draw O(N) pixels on the screen and fill holes in O(N) time.

For each O(log(M)) levels of detail, we will traverse all nodes in the octree that:

1. are at the same detail level;

2. have a parent that is a radius of 0.5 * sqrt(N) * [size of node at current detail level];

3. have no childnodes that would be visited;

4. is within the viewport.

For each node we draw a single pixel on the screen at the projected center of the node. To handle overlapping pixels, one can either use a z-buffer or traverse the octree in back-to-front order.

Because we are only storing the surface, a cube with size R will have volume O(R^3), but contain only O(R^2) atoms. Therefore our traversal will only draw O(N) points. I believe it also visits only O(N) nodes.

Hence this method runs in O(N log(M)) time. It draws the same amount of pixels, but only O(N) will be useful, due to overlap.

The euclideon island requires an octree of depth 22. If we render N/4 pixels per level, this gives 11 operations per pixel/node. I think this is just about doable, but will require good optimization.

Bauke thanks, I didn’t got “2. have a parent that is a radius of 0.5 * sqrt(N) * [size of node at current detail level];” and “O(R^3), but contain only O(R^2) atoms. Therefore our traversal will only draw O(N) points. I believe it also visits only O(N) nodes. “, could you detail?

However I thing the database is more clever than that.

— “2. have a parent that is *within* a radius of 0.5 * sqrt(N) * [size of node at current detail level] around the current viewpoint; ”

I missed the word “within” there and the radius is the maximum distance from the current viewpoint. The idea here is to pick a sphere around the viewpoint and draw all the non-empty nodes within that sphere. The size of the sphere depends on the detail level currently being drawn.

— “O(R^3), but contain only O(R^2) atoms.”

This is a rephrase of Meagers theorem that states that the octree complexity is on the order of the surface area of the object.

— “Therefore our traversal will only draw O(N) points.”

The sphere has radius sqrt(N) and by Meagers theorem this implies that it contains O(N) nodes (at the current detail level).

— “I believe it also visits only O(N) nodes.”

If you want to visit X elements in an octree you need to traverse O(X log(M)) nodes. but since the elements are all the elements in s spherical region, I expect that the actual number of traversed nodes is O(M).

What I failed to mention is that rendering time now also depends on the surface complexity of the scene. I wonder whether in UD, this secretly is also the case.

I’m quite certain that UD still uses octrees as the basis structure of their database. They likely also added some clever tricks, or small, but clever, modifications. I can think of some, but none of those would give significant improvements.

No, it’s not that simple, for sure. Think that Bruce Dell mentions 20hrs time needed for transforming a 1TB database of 3D points into his format. (The question was “how much time a pc needs to do the 1TB database conversion?” and Dell seemed puzzled, then gave the mentioned answer. I think the question is anyways badly formulated and the answer should be completed with “not by a pc, but by a really strong computing device”. In another presentation, Dell puts the image of a computer farm when he speaks about the fact that the database conversion takes a while, but you only need it to be done once.) Secondly, the right algorithm for the second part of the problem (i.e. real time rendering) is such that it is still real time if you have the preprocessed database in a place and the computer in another place. I don’t think that’s feasible by your proposal. Finally, it is not right to say that a volume O(R^{3}) contains O(R^{2}) points of a surface, unless you have a bound on the curvature of the surface. Moreover, databases of real 3D points may have a different statistics (i.e. R^{a} with a between 2 and 3). Same holds btw for the fractal used by Dave H, which might be even more complex than real points databases.

During my study I did some experiments on I/O-efficient algorithms. One of the results was that with 16MB ram, it took 10 minutes to sort 1GB of data, assuming the algorithm is cache aware. Otherwise it would take days. Scaling this up to 1TB and say 16GB of ram, this probably would take a few hours. So processing 1TB in 20 hours, means that there is not much you can actually do. Also, I doubt whether their implementation is I/O-efficient.

In case the model is filled such that a volume of O(R^3) contains O(R^3) points, My algorithm would run in O(N sqrt(N) log(M)) time. This is probably also the worst case complexity that an octree ray tracer would have.

An octree is still the only data structure I can think of that has all the properties that UD has, except that it has no method known to me to render it at interactive speed.

Improving speed in exchange for memory is no option as this will violate the compression property. I’d say that each point can be stored at most once. So cubemap based approaches will not work.

I have been thinking about other storage structures, but all must be rendered with some variation of ray-tracing. As far as I’m able to understand the parallax-based method in https://chorasimilarity.wordpress.com/2013/01/25/teaser-2d-ud/ seems to have this problem as well, as it would have O(N log(M)) layers.

Another approach I’m currently thinking of, is something with point location data structures, but I’m not making any progress.

I think I should just implement my algorithm in #1 and see how well it runs.

Bauke, try it and tell us. Otherwise, please try to provide constructive commments, not authority based arguments. UD is based on some variant of octrees, that’s sure, if you understand octrees to be the 3D equivalent of the 1D compressed positional notation for numbers. That’s fine, but some of your other parts of argumentation would be better formulated by adding “I think”, or “I don’t see how” instead of being definite, but contradictory with other evidence, formulations. Which triggered this message of mine. If you say something is impossible, then prove it or otherwise say “I think is impossible”, see?

Sorry, my intention was to indeed prove my arguments, but I tend to omit them if I think a proof is trivial, and otherwise my proofs tend to be too short to follow. When reasoning about algorithm speed where I/O is the bottleneck, I can only make rough guesses.

Anyway, I did some implementation. The first thing I did was converting a dataset of Mouna Loa (135.833.540 points). This was 8.9 GB of plain ascii data (just a list of points). It took 4.8 minutes to convert it to 3.4 GB of data in another ascii format (still a list of points). CPU was used 81%, so disk I/O was clearly the bottleneck. This is a single linear scan, which means the files are read and written in an order that optimizes speed. Scaling these values up to 1TB, means a processing time of 9 hours. So my estimation wasn’t that far off.

Second thing I did, was loading 5.000.000 points of this dataset into memory. These are stored in the leaves of an octree. Note that I only create the leaf nodes that store exactly one point and I create their parent nodes. For the parent nodes I compute the average color of its child nodes. This way I can use this parent node as an approximation of the child nodes.

For rendering the entire dataset, I would have to build the octree data structure in a file on disk. Then I could use memory mapping to access this data and my OS would handle caching the used data in memory.

This program runs really well, when viewing the dataset from a distance. However, when moving into the dataset, the frame rate drops to about 4FPS. I estimate that the full dataset would require about 5 seconds per frame to render properly. Some optimization is necessary here.

I used a hole filling algorithm, however that failed to work when points where to far apart. (I can some in so much that I can see individual points.)

Another problem I noticed was that at the edge of a detail-sphere (those with radius 0.5 * sqrt(N) * [size of node at current detail level]), within the sphere the point density was to high, causing a grainy effect, while just outside the density was too little.

On another dataset, a texture with hight map, converted to point cloud, I noticed that my approximation method also has problems. the lower points where much darker than the higher points. And when viewing the scene from top and moving backwards, the scene abruptly became much darker when it moved outside a detail-sphere. Some interpolation would help to mask this effect.

Euclideon island does not seem to contain these kind of high contrast items. Otherwise I would expect to see a similar effect. Though I did believe to see that the scene becomes less saturated when zoomed out.

For storage nodes that store ‘almost’ identical data can be replaced by a single node. This converts the octree into an directed acyclic graph (dag). This would already give some compression. Some jpeg-like compression techniques can be used to further improve this compression. (If ‘almost’ identical nodes are replaced, you get lossy compression.) Along with the dag, this gives a compression method that could be considered jpeg compression for colored point clouds.

I believe I’m much better at writing such compression algorithm than at writing an efficient renderer, so if anyone is interested, I would be happy (and if time permits) to implement this compression algorithm and provide (Linux) code for reading and traversing this data, while it is stored in a file. (So this includes disk access and caching in memory.)