Tag Archives: vision

Unlimited detail challenge: the easiest formulation

Thanks to Dave H. for noticing the new Euclideon site!

Now, I propose that you think about the easiest formulation of unlimited detail.

You live in a 2D world and you have a 1D screen with 4 pixels. You look at the world through the screen, using a 90° frustum. The 2D world is finite, with diameter N, measured in the unit length of the pixels (so the screen has length 4 and your eye is at distance 2 from the screen). The world contains atoms located at integer coordinates in the plane, with no two atoms in the same place. Each atom carries at most P bits, representing its colour. Your position is given by a pair of integer coordinates, and the screen points towards N, S, E or W only.

Challenge: give a greedy algorithm which, given your position and the screen direction, chooses 4 atoms from the world which are visible through the screen, in at most O(log N) steps.

Hints:

  • think about what “visible” means in this setting (a brute-force baseline sketch follows these hints),
  • creatively use numbers written in base 2, as words.
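To fix the meaning of “visible”, here is a brute-force baseline in C: for each pixel it keeps the nearest atom whose projection falls into that pixel, scanning every atom, so it runs in time proportional to the number of atoms, which is exactly what the challenge asks to avoid. The conventions (eye looking towards N, squared distances, the pixel indexing) are my own assumptions, not part of the challenge.

#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

typedef struct { int x, y; uint8_t colour; } Atom;

/* Brute-force baseline: for each of the 4 pixels keep the nearest atom whose
   projection onto the screen falls into that pixel. Eye at (ex, ey), looking
   towards N (+y), screen at distance 2, 90 degree frustum. out[p] == -1 means
   nothing is visible through pixel p. The challenge asks for O(log N) instead. */
void visible_atoms(const Atom *world, int m, int ex, int ey, int out[4])
{
    long best[4] = { LONG_MAX, LONG_MAX, LONG_MAX, LONG_MAX };
    for (int p = 0; p < 4; p++) out[p] = -1;

    for (int i = 0; i < m; i++) {
        int dx = world[i].x - ex, dy = world[i].y - ey;
        if (dy <= 0 || abs(dx) > dy) continue;         /* behind the eye or outside the frustum */
        float sx = 2.0f * dx / dy + 2.0f;              /* projection on the screen, in [0, 4] */
        int px = (int)sx; if (px > 3) px = 3;          /* pixel index 0..3 */
        long dist = (long)dx * dx + (long)dy * dy;     /* squared distance to the eye */
        if (dist < best[px]) { best[px] = dist; out[px] = i; }
    }
}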

_______________

The Teaser 2D UD might help.

I’m not giving this challenge because I am a secretive …. but because I enjoy collaborations.

Teaser: 2D UD

Here are two images which may (or may not) give an idea about another fast algorithm for real-time rendering of 3D point cloud scenes (but note that the images are drawn for the baby model of 2D point cloud scenes). The secret lies in the database.

I have a busy schedule in the coming weeks and I have to get this out of my system. Therefore, if anybody gets it, please send me a comment here. Has this been done before? Does it work?

The images now: the first has no name

eucludeon

The second image is a photo of the Stoa Poikile, taken from here:

Stoa_Poikile

Hint: this is a solution for the ray shooting problem (read it) which eliminates trigonometry, shooting rays and computing intersections, and uses only the addition operation (once the database is well prepared). Moreover, the database organized as in the pictures cannot be bigger than the original one (thus it is also a compression of the original database).

_______________

See the solution of an unlimited detail algorithm given by JX here and here.

Diorama, Myriorama, Unlimited detail-orama

Let me tell, in plain words, the explanation by JX about how a UD algorithm might work (it is not just an idea, it is supported by proof and experiments; go and see this post).

It is too funny! It is the computer version of a diorama. It is an unlimited-detail-orama.

Before giving the gist of JX’s explanation, let’s think: have you ever seen a totally artificial construction which, when you look at it, tricks your mind into believing you are looking at an actual, vast piece of landscape, full of infinite detail? Yes, right? This is actually a serious thing; it raises a lot of questions about how much the 3D visual experience of a mind-bogglingly huge database of 3D points can be compressed.

Indeed, JX explains that his UD type algorithm has two parts:

  • indexing: start with a database of 3D points, like a laser scan. Then produce another database of cubemaps centered at a net of equally spaced “centerpoints” which cover the 3D scene. The cubemaps are done at screen resolution, obtained as a projection of the scene on a reasonably small cube centered at the centerpoint. You may keep these cubemaps in various ways, one of them being to link the centerpoint with the visible 3D points. Compress (several techniques suggested). For this part of the algorithm there is no time constraint; it is done before the real-time rendering part.
  • real-time rendering: input where the camera is, get only the points seen from the closest centerpoint, get the cubemap, improve it by using previous cubemaps and/or neighbouring cubemaps. Take care to fill the holes which appear when you change the point of view (a minimal sketch of this stage follows the list).
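To make the real-time part concrete, here is a minimal sketch in C of re-projecting one snapshot for a displaced camera, keeping the nearest point per pixel; blending several snapshots and the hole-filling step are left out. All names and the fixed 90° FOV are my assumptions: this is a reading of the description above, not JX’s code.

#include <math.h>

#define W 256
#define H 256

typedef struct { float x, y, z; unsigned color; } SnapPoint;   /* point stored in a snapshot, relative to its center */

/* Re-project the points of one snapshot for a camera displaced by (cx, cy, cz)
   from the snapshot center, looking along +z with a 90 degree FOV. Cells that
   receive no point are the "holes" to be filled afterwards. */
void reproject(const SnapPoint *pts, int n, float cx, float cy, float cz,
               float zbuf[H][W], unsigned img[H][W])
{
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) zbuf[y][x] = INFINITY;     /* empty cell */

    for (int i = 0; i < n; i++) {
        float px = pts[i].x - cx, py = pts[i].y - cy, pz = pts[i].z - cz;
        if (pz <= 0.0f) continue;                              /* behind the camera */
        float u = px / pz, v = py / pz;                        /* inside the frustum iff |u|,|v| < 1 */
        if (u < -1.0f || u >= 1.0f || v < -1.0f || v >= 1.0f) continue;
        int sx = (int)((u + 1.0f) * 0.5f * W);
        int sy = (int)((v + 1.0f) * 0.5f * H);
        if (pz < zbuf[sy][sx]) { zbuf[sy][sx] = pz; img[sy][sx] = pts[i].color; }
    }
}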

Now, let me show you that this has been done before, in the meatspace. And even more, with animation! Go and read this, it is too funny:

  • The Daguerre Dioramas. Here is (actually an improved version of) your cubemap, JX (image taken from the linked wiki page):

Diorama_diagram

  • But maybe you don’t work in the geospatial industry and you don’t have render farms and huge data available. Then you may use a Myriorama, with palm trees, gravel and statues, themselves rendered as dioramas (image taken from the linked wiki page):

Myriorama_cards

  • Would you like to do animation? Here it is, look at the nice choo-choo train (polygon-rendered, at a scale):

ExeterBank_modelrailway

(image taken from this wiki page)

Please, JX, correct me if I am wrong.

Discussion about how a UD algorithm might work

I offer this post for discussions around UD type algorithms. I shall update this post, each time indicating the original comment with the suggested updates.

[The rule concerning comments on this blog is that the first time you comment, I have to approve it. I keep the privilege of not accepting or deleting comments which are not constructive.]

For other posts here on the subject of UD see the dedicated tag unlimited detail.

I propose that you start from this comment by JX, then we may work on it to make it clear (even for a mathematician). Thank you, JX, for this comment!

I arranged the comment a bit [what is written between brackets is my comment] and numbered each paragraph, for ease of reference.

Now I worked and thought enough to reveal all the details, lol. [see this comment by JX]
I may disappoint you: there’s not much mathematics in what I did. JUST SOME VERY ROUGH BRUTE-FORCE TRICKS.

1) In short: I render cubemaps, but not of pixels – cubemaps of 3D points visible from some center.

2) When the camera is in that cubemap’s center, all points are projected and no holes are visible. When the camera moves, the world realistically changes in perspective but the hole count increases. I combine a few snapshots at a time to decrease the hole count, and I also use a simple hole-filling algorithm. My hole-filling algorithm sometimes gives the same artifacts as in the non-cropped UD videos (bottom and right sides).

[source JX #2] (link to the artifacts image) These artifacts can appear after applying the hole-filling algorithm from left to right and then from top to bottom; this is why they appear only on the right and bottom sides. Another case is viewport clipping of groups of points arranged into a grid: link from my old experiment with such groups.

This confirms that UD has holes too and that his claim “exactly one point for each pixel” isn’t true.

3) I used words like “special”, “way”, “algorithm” etc. just to fog the truth a bit. And there are some problems (with disk space) which don’t really bother UD, as I understand. [that’s why they moved to the geospatial industry] So probably my idea is very far from UD’s secret. Yes, it allows rendering huge point clouds, but it is stupid and I’m sure now that it was done before. Maybe there is a possibility to take some ideas from my engine and improve them, so here is the explanation:
4) Yes, I too started this project with this idea: “indexing is the key”. You say to the database: “camera position is XYZ, give me the points”. And there are files in the database with separated points; the database just picks up a few files and gives them to you. It just can’t be slow. It may only be very heavy-weight (impossible to store so many “panoramas”).

5) I found that instead of keeping _screen pixels_ (as for panoramas) for each millimeter of camera position, it is possible to keep actual _point coordinates_ (like a single laser scanner frame) and project them again and again while the camera moves, filling holes with other points; and the camera step between those files may be far bigger than millimeters (as for stereo pairs: to see a volumetric image you only need two distant “snapshots”).

6) By “points linked with each other” I meant a bunch of points linked to some central point (by points I mean points _visible_ from the central point).

7) What is a central point? Consider it as a laser scanner frame. The scanner is static and captures points around itself. Point density near the scanner is high, and decreases with distance.

8) So again: my engine just switches gradually between virtual “scanner” snapshots of points relative to some center. During the real-time presentation, for each frame a few snapshots are projected: more points from the nearest snapshots, fewer from the far ones.

9) The total point count isn’t very big, so real time isn’t impossible. Some holes appear; a simple algorithm fills them using only color and z-buffer data.

10) I obtain frames (or snapshots) by projecting all the points using a perspective matrix; I use a FOV of 90° and a 256×256 or 512×512 point buffer (like a z-buffer, but it stores the point position XYZ relative to the scanner).

11) I do this six times to obtain a cubemap. The maximum number of points in a frame is 512×512×6. I can easily do color interpolation for the overlapped points; I don’t pick the color of a point from one place only. This makes the data interleaved and repeated.
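As an illustration of paragraphs 10)–11), here is a minimal sketch in C of filling one cubemap face with a point buffer that stores relative XYZ, keeping the nearest point per cell. It is the mirror image of the re-projection sketch earlier on this page, but on the snapshot-building side; the names and the choice of the +z face are my assumptions, this is a reconstruction of the idea, not JX’s code.

#include <string.h>

#define RES 512

typedef struct { float x, y, z; } Vec3;

/* One cubemap face: a RES x RES "point buffer" (like a z-buffer, but each cell
   stores the position of the nearest point relative to the snapshot center).
   depth == 0 marks an empty cell. */
typedef struct { Vec3 pos[RES][RES]; float depth[RES][RES]; } Face;

/* Fill the face looking along +z with a 90 degree FOV. */
void build_face(Face *f, const Vec3 *pts, int n, Vec3 center)
{
    memset(f, 0, sizeof *f);
    for (int i = 0; i < n; i++) {
        Vec3 p = { pts[i].x - center.x, pts[i].y - center.y, pts[i].z - center.z };
        if (p.z <= 0.0f) continue;                             /* behind this face */
        float u = p.x / p.z, v = p.y / p.z;                    /* in [-1, 1) inside the frustum */
        if (u < -1.0f || u >= 1.0f || v < -1.0f || v >= 1.0f) continue;
        int px = (int)((u + 1.0f) * 0.5f * RES);
        int py = (int)((v + 1.0f) * 0.5f * RES);
        if (f->depth[py][px] == 0.0f || p.z < f->depth[py][px]) {
            f->depth[py][px] = p.z;                            /* keep the nearest point */
            f->pos[py][px]   = p;                              /* store relative XYZ */
        }
    }
}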

12) The next functions allow me to compress point coordinates in snapshots to 16-bit values. Why it works: we don’t need high precision for distant points, because they often don’t change screen position when moved by small steps.

#include <stdint.h>
#include <math.h>

/* Expand a compressed 16-bit coordinate back to 32 bits; y controls how fast
   precision drops with distance from the snapshot center. */
int32_t expand(int16_t x, float y)
{
    int8_t sign = 1;
    if (x < 0) { sign = -1; x = -x; }
    return (x + x * (x * y)) * sign;
}

/* Shrink a 32-bit coordinate to 16 bits; the inverse of expand(), up to rounding. */
int16_t shrink(int32_t z, float y)
{
    int8_t sign = 1;
    if (z < 0) { sign = -1; z = -z; }
    return ((sqrtf(4 * y * z + 1) - 1) / (2 * y)) * sign;
}
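A quick sanity check of the pair (the numbers are only illustrative): with y = 0.001, expand(1000, y) = 1000 + 0.001·1000·1000 = 2000, and shrink(2000, y) = (√(4·0.001·2000 + 1) − 1)/(2·0.001) = (3 − 1)/0.002 = 1000, so shrink inverts expand exactly, up to floating-point and integer rounding; larger coordinates get coarser steps, which is the point of the compression.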

13) I also compress colors to 16 bits, normals to one 24-bit value, and I add a shader number (8 bits) to the point. So one point in a snapshot consists of: 16 bits × 3 for position + a 24-bit normal + a 16-bit color + an 8-bit shader.
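For concreteness, here is a minimal sketch of how such a point could be laid out in C, following the field sizes above (the struct name and field order are my assumptions). It is 12 bytes per point, which matches the snapshot sizes quoted in the next paragraph: 512×512×6 points × 12 bytes ≈ 18 MB and 256×256×6 points × 12 bytes ≈ 4.5 MB.

#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    int16_t  x, y, z;       /* position relative to the snapshot center, stored via shrink() */
    uint8_t  normal[3];     /* 24-bit packed normal */
    uint16_t color;         /* 16-bit color, e.g. RGB565 */
    uint8_t  shader;        /* shader index */
} SnapshotPoint;            /* 12 bytes per point */
#pragma pack(pop)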

14) There must be some ways to compress it more (store colors in a texture (lossy JPEG), make some points share shader and normals). An uncompressed snapshot full of points (this may be an indoor snapshot) is 512×512×6 = 18 MB, or 256×256×6 = 4.5 MB.

Of course, after LZMA compression (the engine reads directly from the ulzma output, which is fast) it can be up to 10 times smaller, but sometimes only 2-3 times. AND THIS IS A PROBLEM. I’m afraid UD has a smarter way to index its data.

For a 320×240 screen resolution, 512×512 is enough; 256×256 works too, but there will be more holes and quality will suffer.

To summarize the engine’s workflow:

15) Snapshot building stage. Render all scene points (any speed-up may be used here: octrees or, what I currently use, dynamic point skipping according to the last point’s distance to the camera) into snapshots and compress them. The step between snapshots affects data weight AND rendering time AND quality. There’s not much sense in making the step as small as 1 point, or even 100 points. After this, the scene is no longer needed; or I should say, the scene won’t be used for real-time rendering.

16) Rendering stage. Load the snapshots nearest to the camera and project points from them (more points for closer snapshots, fewer for distant ones; 1 main snapshot + ~6-8 additional ones used at a time; I am still not sure about this scheme and I change it often). Backfa..point culling is applied. Shaders are applied. Holes are filled. The snapshot array is constantly updated according to the camera position.

17) If I restrict camera positions, it is possible to “compress” a huge point cloud level into a relatively small database. But in other cases my database will be many times greater than the original point cloud scene. [See comments JX#2, JX#3, chorasimilarity#4, chorasimilarity#5. Here is an eye-candy image of an experiment by JX, see JX#2:]

eye_candy_by_JX

Next development steps may be:

18) dynamic camera step during snapshot building (it may be better to take more steps when more points are close to the camera (simple to count during projection) and fewer steps when the camera is in the air above the island, for example),

19) better snapshot compression (jpeg, maybe delta-coding for points), octree involvement during snapshot building.

20) But since I realized the disk memory problems, my interest is falling.

Any questions?

UD question

I try to formulate the question about how Unlimited Detail works like this:

Let D be a database of 3D points, containing information about  M points. Let also S be the image on the screen, say with N pixels. Problem:

  • reorganize the database D to obtain another database D’ with at most O(M) bits, such that
  • starting from D’ and a finite (say 100 bytes) word there exists an algorithm which finds the image on the screen in O(N log(M)) time.

Is this reasonable?

For example, take N=1. The finite word encodes the position and orientation of the screen in the 3D world of the database. If the M points admitted a representation as a number (a euclidean-invariant hash function?) of order M^a (i.e. polynomial in the number of points), then it would be reasonable to expect D’ to have size of order O(log(M)), so in this case simply by traversing D’ we would get the time O(log(M)) = O(N log(M)). Even if we cannot make D’ as small as O(log(M)), maybe the algorithm still takes O(log(M)) steps, simply because M is approximately the volume, so the diameter in 3D space is roughly between M^(1/3) and M, or because, due to the scaling of perspective, the algorithm may hop through D’ in geometric, not arithmetic, steps.
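To get a feeling for the orders of magnitude: with M = 10^12 points (a large laser scan) we have log2(M) ≈ 40, so an O(N log(M)) bound with N ≈ 10^6 screen pixels means on the order of 4×10^7 elementary steps per frame, well within real-time reach, whereas anything proportional to M itself is hopeless.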

The second remark is that there is no restriction for the time which is necessary for transforming D into D’.

Unlimited detail and 3D portal engines, or else real-time path tracing

Here are two new small pieces which might, or not, add to the understanding of how the Unlimited Detail – Euclideon algorithm might work. (Last post on this subject is Unlimited detail, software point cloud renderers, you may want to read it.)

3D-portal engines: From this 1999 page “Building a 3D portal engine“, several quotes (boldfaced by me):

Basically, a portal based engine is a way to overcome the problem of the incredible big datasets that usually make up a world. A good 3D engine should run at a decent speed, no matter what the size of the full world is; speed should be relative to the amount of detail that is actually visible. It would of course be even better if the speed would only depend on the number of pixels you want to draw, but since apparently no one has found an algorithm that does that, we’ll go for the next best thing.

A basic portal engine relies on a data set that represents the world. The ‘world’ is subdivided in areas, that I call ‘sectors’. Sectors are connected through ‘portals’, hence the name ‘Portal Engine’. The rendering process starts in the sector that the camera is in. It draws the polygons in the current sector, and when a portal is encountered, the adjacent sector is entered, and the polygons in that sector are processed. This would of course still draw every polygon in the world, assuming that all sectors are somehow connected. But, not every portal is visible. And if a portal is not visible, the sector that it links to doesn’t have to be drawn. That’s logical: A room is only visible if there’s a line of sight from the camera to that room, that is not obscured by a wall.

So now we have what we want: If a portal is invisible, tracing stops right there. If there’s a huge part of the world behind that portal, that part is never processed. The number of polygons that are actually processed is thus almost exactly equal to the number of visible polygons, plus the inserted portal polygons.

By now it should also be clear where portals should be inserted in a world: Good spots for portals are doors, corridors, windows and so on. That also makes clear why portal engines suck at outdoor scenes: It’s virtually impossible to pick good spots for portals there, and each sector can ‘see’ virtually every other sector in the world. Portal rendering can be perfectly combined with outdoor engines though: If you render your landscape with another type of engine, you could place portals in entrances of caves, buildings and so on. When the ‘normal’ renderer encounters a portal, you could simply switch to portal rendering for everything behind that portal. That way, a portal engine can even be nice for a ‘space-sim’…

So let’s dream and ask if there is any way to construct the database for the 3D scene such that the rendering process becomes an algorithm for finding the right portals, maybe one for each pixel. Something to think about. The database is not a tree, but from the input given by the position of the viewer, the virtually available portals (which could be just pointers attached to the faces of octree cubes, say, pointing to the faces of the smaller cubes which are visible from the bigger face, seen as a portal) organize themselves into a tree. Therefore the matter of finding what to put on a screen pixel could be solved by a search algorithm (a data-structure sketch follows).
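Here is a minimal data-structure sketch, in C, of that dream: octree nodes whose faces carry portal pointers to the smaller visible faces behind them, so that deciding what lands on a pixel becomes a descent through portals. Everything here (the names, the bound of 4 sub-faces per portal) is my own guess, not a description of any existing engine.

#include <stdint.h>

typedef struct OctreeNode OctreeNode;

/* A portal attached to one face of a cube: pointers to the (up to 4) smaller
   cube faces visible through it. An empty portal means nothing is behind. */
typedef struct {
    OctreeNode *behind[4];
} FacePortal;

struct OctreeNode {
    uint32_t    color;        /* payload to draw if the per-pixel search stops here */
    OctreeNode *child[8];     /* NULL for a leaf */
    FacePortal  face[6];      /* one portal per cube face; the ray through a pixel
                                 picks which portal, then which sub-face, to enter */
};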

As a small bonus, here is the link to a patent of Euclideon Pty. Ltd. : An alternate method for the child rejection process in regards to octree rendering – AU2012903094.

Or else real-time path tracing. Related to Brigade 2, read here, and  a video:

Biological vision as a problem of Fully Homomorphic Encryption

One revelation after another! After learning about MOOCs, I am now reading Craig Gentry’s PhD thesis on fully homomorphic encryption.

Without loading more information, it looks to me that biological vision could be regarded as a fully homomorphic encryption problem.

Explanation: the problem of biological vision is the following. We have an organism, say a human or a fly. By vision, the outer space is encrypted as a physical dynamical system in the brain, in a way which is basically unknown. However, the encrypted information is so good that the brain can compute, based on it, some function, which is then sent to the motor system, which in turn modifies the outer space efficiently (the human kills the fly, or the fly avoids the human).

During this process there is no decryption, because there is no space, or image, in the brain (see the homunculus fallacy).

Therefore, the encryption used by the brain has to be a fully homomorphic encryption!

You may imagine how amazed I am by reading Gentry’s description, which I quote, with holes and my emphasis, from page 2 of his thesis:

Imagine you have an encryption scheme with a “noise parameter” attached to each ciphertext, where encryption outputs a ciphertext with small noise – say, less than n – but decryption works as long as the noise is less than some threshold N \gg n. Furthermore, imagine you have algorithms […] that can take ciphertexts E(a) and E(b) and compute E(a+b) and E(a*b), but at the cost of adding or multiplying the noise parameters. This immediately gives a “somewhat homomorphic” encryption scheme […]. Now suppose that you have an algorithm Recrypt that takes a ciphertext E(a) with noise N' < N and outputs a “fresh” ciphertext E(a) that also encrypts a, but which has noise parameter smaller than N^{1/2}.

[…] It turns out that a somewhat homomorphic encryption scheme that has this self-referential property of being able to handle circuits that are deeper than its own decryption circuit – in which case we say the somewhat homomorphic encryption scheme is “bootstrappable” – is enough to obtain the Recrypt algorithm, and thereby fully homomorphic encryption!
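To see the noise bookkeeping of the quote at work, here is a toy somewhat homomorphic scheme in C, in the spirit of the integer scheme of van Dijk, Gentry, Halevi and Vaikuntanathan rather than Gentry’s lattice construction; the parameters are tiny and completely insecure, it only illustrates how adding and multiplying ciphertexts adds and multiplies the noise, and why decryption works only while the noise stays below the threshold.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static const int64_t P = 1000003;             /* the secret key: a largish odd number */

/* Encrypt one bit m as c = q*P + 2*r + m, with small noise 2*r + m. */
static int64_t enc(int m)
{
    int64_t q = rand() % 1000;
    int64_t r = rand() % 8;                   /* "small noise - say, less than n" */
    return q * P + 2 * r + m;
}

/* Decrypt: (c mod P) mod 2, valid while the accumulated noise is below P. */
static int dec(int64_t c)
{
    return (int)((c % P) % 2);
}

int main(void)
{
    int64_t a = enc(1), b = enc(1);
    /* Adding ciphertexts adds the noises, multiplying roughly multiplies them,
       so only circuits of limited depth decrypt correctly ("somewhat" homomorphic). */
    printf("1 XOR 1 = %d\n", dec(a + b));     /* prints 0 */
    printf("1 AND 1 = %d\n", dec(a * b));     /* prints 1 */
    return 0;
}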

In order to understand my enthusiasm, here is again a link to exploring space slides, see also the posts concerning approximate structures.