Publications
Guiding a Diffusion Model with a Bad Version of Itself.
Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila and Samuli Laine. Advances in Neural Information Processing Systems 37 (NeurIPS 2024). Oral presentation, Best paper runner-up. Abstract Bibtex [PDF] [arXiv]
The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifier-free guidance approach uses an unconditional model to guide a conditional model, leading to simultaneously better prompt alignment and higher-quality images at the cost of reduced variation. These effects seem inherently entangled, and thus hard to control. We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64×64 and 1.25 for 512×512, using publicly available networks. Furthermore, the method is also applicable to unconditional diffusion models, drastically improving their quality.
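For concreteness, the guided denoiser can be sketched in a few lines: the extrapolation has the same form as classifier-free guidance, but the guiding model is a degraded version of the main model rather than an unconditional one. Here d_main and d_bad are placeholder names for any compatible denoiser networks, not the paper's code:

```python
def guided_denoiser(d_main, d_bad, x, sigma, labels, w=2.0):
    """Autoguidance sketch: same extrapolation as classifier-free
    guidance, but the guiding model is a smaller / less-trained version
    of the main conditional model, not an unconditional one."""
    D_main = d_main(x, sigma, labels)   # high-quality denoiser output
    D_bad = d_bad(x, sigma, labels)     # degraded version of the same model
    # w = 1 disables guidance; w > 1 pushes away from the bad model's errors.
    return D_bad + w * (D_main - D_bad)
```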
@inproceedings{Karras2024autoguidance, author = {Tero Karras and Miika Aittala and Tuomas Kynk\"a\"anniemi and Jaakko Lehtinen and Timo Aila and Samuli Laine}, title = {Guiding a Diffusion Model with a Bad Version of Itself}, booktitle = {Advances in Neural Information Processing Systems 37 (proc. NeurIPS 2024)}, year = {2024}, }
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models.
Tuomas Kynkäänniemi, Miika Aittala, Tero Karras, Samuli Laine, Timo Aila and Jaakko Lehtinen. Advances in Neural Information Processing Systems 37 (NeurIPS 2024). Abstract Bibtex [PDF] [arXiv]
Guidance is a crucial technique for extracting the best performance out of image-generating diffusion models. Traditionally, a constant guidance weight has been applied throughout the sampling chain of an image. We show that guidance is clearly harmful toward the beginning of the chain (high noise levels), largely unnecessary toward the end (low noise levels), and only beneficial in the middle. We thus restrict it to a specific range of noise levels, improving both the inference speed and result quality. This limited guidance interval improves the record FID in ImageNet-512 significantly, from 1.81 to 1.40. We show that it is quantitatively and qualitatively beneficial across different sampler parameters, network architectures, and datasets, including the large-scale setting of Stable Diffusion XL. We thus suggest exposing the guidance interval as a hyperparameter in all diffusion models that use guidance.
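A minimal sketch of the idea follows; the interval endpoints and weight below are placeholders, since the paper tunes these per setup:

```python
def guided_denoiser(d_cond, d_uncond, x, sigma, labels, w=2.0,
                    sigma_lo=0.3, sigma_hi=5.0):
    """Guidance restricted to a noise-level interval (sketch).
    Outside [sigma_lo, sigma_hi] the weight reverts to 1, i.e. plain
    conditional sampling, and the second model evaluation is skipped."""
    if not (sigma_lo < sigma < sigma_hi):
        return d_cond(x, sigma, labels)        # no guidance at this noise level
    uncond = d_uncond(x, sigma)
    return uncond + w * (d_cond(x, sigma, labels) - uncond)
```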
@inproceedings{Kynkaanniemi2024guidance, author = {Tuomas Kynk\"a\"anniemi and Miika Aittala and Tero Karras and Samuli Laine and Timo Aila and Jaakko Lehtinen}, title = {Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models}, booktitle = {Advances in Neural Information Processing Systems 37 (proc. NeurIPS 2024)}, year = {2024}, }
Analyzing and Improving the Training Dynamics of Diffusion Models.
Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila and Samuli Laine. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024. Oral presentation, Best paper award candidate. Abstract Bibtex [PDF] [Code] [arXiv]
Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture, without altering its high-level structure. Observing uncontrolled magnitude changes and imbalances in both the network activations and weights over the course of training, we redesign the network layers to preserve activation, weight, and update magnitudes on expectation. We find that systematic application of this philosophy eliminates the observed drifts and imbalances, resulting in considerably better networks at equal computational complexity. Our modifications improve the previous record FID of 2.41 in ImageNet-512 synthesis to 1.81, achieved using fast deterministic sampling.
As an independent contribution, we present a method for setting the exponential moving average (EMA) parameters post-hoc, i.e., after completing the training run. This allows precise tuning of EMA length without the cost of performing several training runs, and reveals its surprising interactions with network architecture, training time, and guidance.
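For reference, a conventional EMA update looks as follows. Note this is only the baseline the paper improves on: the post-hoc method instead stores a small number of averaged snapshots during training and afterwards solves for the weights of an arbitrary EMA profile as a linear combination of them.

```python
import torch

@torch.no_grad()
def update_ema(ema_params, net_params, beta=0.999):
    """Conventional EMA of network weights (baseline sketch, not the
    paper's post-hoc method)."""
    for ema, p in zip(ema_params, net_params):
        ema.mul_(beta).add_(p, alpha=1.0 - beta)
```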
@inproceedings{Karras2024dynamics, author = {Tero Karras and Miika Aittala and Jaakko Lehtinen and Janne Hellsten and Timo Aila and Samuli Laine}, title = {Analyzing and Improving the Training Dynamics of Diffusion Models}, booktitle = {Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2024}, }
A Hybrid Generator Architecture for Controllable Face Synthesis.
Dann Mensah, Nam Hee Kim, Miika Aittala, Samuli Laine and Jaakko Lehtinen. SIGGRAPH 2023. Abstract Bibtex [PDF]
Modern data-driven image generation models often surpass traditional graphics techniques in quality. However, while traditional modeling and animation tools allow precise control over the image generation process in terms of interpretable quantities — e.g., shapes and reflectances — endowing learned models with such controls is generally difficult.
In the context of human faces, we seek a data-driven generator architecture that simultaneously retains the photorealistic quality of modern generative adversarial networks (GAN) and allows explicit, disentangled controls over head shapes, expressions, identity, background, and illumination. While our high-level goal is shared by a large body of previous work, we approach the problem with a different philosophy: We treat the problem as an unconditional synthesis task, and engineer interpretable inductive biases into the model that make it easy for the desired behavior to emerge. Concretely, our generator is a combination of learned neural networks and fixed-function blocks, such as a 3D morphable head model and texture-mapping rasterizer, and we leave it up to the training process to figure out how they should be used together. This greatly simplifies the training problem by removing the need for labeled training data; we learn the distributions of the independent variables that drive the model instead of requiring that their values are known for each training image. Furthermore, we need no contrastive or imitation learning for correct behavior.
We show that our design successfully encourages the generative model to make use of the internal, interpretable representations in a semantically meaningful manner. This allows sampling of different aspects of the image independently, as well as precise control of the results by manipulating the internal state of the interpretable blocks within the generator. This enables, for instance, facial animation using traditional animation tools.
@inproceedings{Mensah2023siggraph, author = {Dann Mensah and Nam Hee Kim and Miika Aittala and Samuli Laine and Jaakko Lehtinen}, title = {A Hybrid Generator Architecture for Controllable Face Synthesis}, booktitle = {Proc. SIGGRAPH}, year = {2023}, }
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis.
Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger and Timo Aila. International Conference on Machine Learning (ICML) 2023. Oral presentation. Abstract Bibtex [PDF] [Code] [Video] [Project page] [arXiv]
Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) only need a single forward pass. They are thus much faster, but they currently remain far behind the state-of-the-art in large-scale text-to-image synthesis. This paper aims to identify the necessary steps to regain competitiveness. Our proposed model, StyleGAN-T, addresses the specific requirements of large-scale text-to-image synthesis, such as large capacity, stable training on diverse datasets, strong text alignment, and controllable variation vs. text alignment tradeoff. StyleGAN-T significantly improves over previous GANs and outperforms distilled diffusion models — the previous state-of-the-art in fast text-to-image synthesis — in terms of sample quality and speed.
@inproceedings{Sauer2023icml, author = {Axel Sauer and Tero Karras and Samuli Laine and Andreas Geiger and Timo Aila}, title = {{StyleGAN-T}: {U}nlocking the Power of {GAN}s for Fast Large-Scale Text-to-Image Synthesis}, booktitle = {Proc. International Conference on Machine Learning (ICML)}, year = {2023}, }
Elucidating the Design Space of Diffusion-Based Generative Models.
Tero Karras, Miika Aittala, Timo Aila and Samuli Laine. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Oral presentation, Outstanding paper award. Abstract Bibtex [PDF] [Code] [arXiv]
We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield a new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to a near-SOTA 1.55, and, after re-training with our proposed improvements, to a new SOTA of 1.36.
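The paper's deterministic sampler is compact enough to sketch directly. The following mirrors the structure of its Heun-based ODE solver, with denoise(x, sigma) standing for any preconditioned denoiser; treat this as a sketch rather than the reference implementation:

```python
import torch

@torch.no_grad()
def edm_sampler(denoise, latents, num_steps=18,
                sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """2nd-order deterministic sampler in the style of the paper (sketch)."""
    i = torch.arange(num_steps, dtype=torch.float64)
    t = (sigma_max ** (1 / rho)
         + i / (num_steps - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    t = torch.cat([t, torch.zeros_like(t[:1])])     # append t_N = 0
    x = latents.to(torch.float64) * t[0]            # scale unit noise to sigma_max
    for n in range(num_steps):
        d = (x - denoise(x, t[n])) / t[n]           # dx/dsigma = (x - D(x; sigma)) / sigma
        x_next = x + (t[n + 1] - t[n]) * d          # Euler step
        if t[n + 1] > 0:                            # 2nd-order (Heun) correction
            d2 = (x_next - denoise(x_next, t[n + 1])) / t[n + 1]
            x_next = x + (t[n + 1] - t[n]) * 0.5 * (d + d2)
        x = x_next
    return x
```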
@inproceedings{Karras2022elucidating, author = {Tero Karras and Miika Aittala and Timo Aila and Samuli Laine}, title = {Elucidating the Design Space of Diffusion-Based Generative Models}, booktitle = {Advances in Neural Information Processing Systems 35 (proc. NeurIPS 2022)}, year = {2022}, }
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers.
Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras and Ming-Yu Liu. Preprint, 2022. Abstract Bibtex [PDF] [Project page] [Video] [arXiv]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly relies on the text prompt to generate text-aligned content, while later, the text conditioning is almost entirely ignored. This suggests that sharing model parameters throughout the entire generation process may not be ideal. Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages. To maintain training efficiency, we initially train a single model, which is then split into specialized models that are trained for the specific stages of the iterative generation process. Our ensemble of diffusion models, called eDiff-I, results in improved text alignment while maintaining the same inference computation cost and preserving high visual quality, outperforming previous large-scale text-to-image diffusion models on the standard benchmark. In addition, we train our model to exploit a variety of embeddings for conditioning, including the T5 text, CLIP text, and CLIP image embeddings. We show that these different embeddings lead to different behaviors. Notably, the CLIP image embedding allows an intuitive way of transferring the style of a reference image to the target text-to-image output. Lastly, we show a technique that enables eDiff-I's "paint-with-words" capability. A user can select a word in the input text and paint it on a canvas to control the output, which is very handy for crafting the desired image. The project page is available at https://deepimagination.cc/eDiff-I/
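The stage-specialization idea can be conveyed with a small dispatch sketch; the interval boundaries and expert networks below are placeholders, not the paper's actual split points:

```python
def ensemble_denoise(experts, x, sigma, text_emb):
    """Ensemble-of-expert-denoisers dispatch (sketch): each expert is
    trained for one stage of the sampling chain, selected here by
    noise level. experts: e.g. [((0.0, 1.0), net_a), ((1.0, 80.0), net_b)]."""
    for (lo, hi), net in experts:
        if lo <= sigma < hi:
            return net(x, sigma, text_emb)
    raise ValueError(f"no expert covers sigma={sigma}")
```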
@article{Balaji2022eDiff-I, author = {Yogesh Balaji and Seungjun Nah and Xun Huang and Arash Vahdat and Jiaming Song and Karsten Kreis and Miika Aittala and Timo Aila and Samuli Laine and Bryan Catanzaro and Tero Karras and Ming-Yu Liu}, title = {{eDiff-I}: {T}ext-to-Image Diffusion Models with an Ensemble of Expert Denoisers}, journal = {arXiv preprint arXiv:2211.01324}, year = {2022}, }
Projection-Domain Self-Supervision for Volumetric Helical CT Reconstruction.
Onni Kosomaa, Samuli Laine, Tero Karras, Miika Aittala and Jaakko Lehtinen. Preprint, 2022. Abstract Bibtex [PDF] [Project page] [arXiv]
We propose a deep learning method for three-dimensional reconstruction in low-dose helical cone-beam computed tomography. We reconstruct the volume directly, i.e., not from 2D slices, guaranteeing consistency along all axes. In a crucial step beyond prior work, we train our model in a self-supervised manner in the projection domain using noisy 2D projection data, without relying on 3D reference data or the output of a reference reconstruction method. This means the fidelity of our results is not limited by the quality and availability of such data. We evaluate our method on real helical cone-beam projections and simulated phantoms. Our reconstructions are sharper and less noisy than those of previous methods, and several decibels better in quantitative PSNR measurements. When applied to full-dose data, our method produces high-quality results orders of magnitude faster than iterative techniques.
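The projection-domain training signal can be sketched conceptually as follows; net, the differentiable forward projector forward_project, and the indexing scheme are all placeholder names, and the real pipeline is considerably more involved:

```python
def projection_loss(net, forward_project, projections, held_out):
    """Projection-domain self-supervision sketch: reconstruct a volume
    from a subset of noisy projections and penalize the difference
    between its re-projections and held-out *noisy* measurements; in
    expectation this matches the clean target, in the spirit of
    Noise2Noise."""
    volume = net(projections[~held_out])           # direct 3D reconstruction
    pred = forward_project(volume, held_out)       # differentiable re-projection
    return ((pred - projections[held_out]) ** 2).mean()
```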
@article{Kosomaa2022cbct, author = {Onni Kosomaa and Samuli Laine and Tero Karras and Miika Aittala and Jaakko Lehtinen}, title = {Projection-Domain Self-Supervision for Volumetric Helical {CT} Reconstruction}, journal = {arXiv preprint arXiv:2212.07431}, year = {2022}, }
Disentangling Random and Cyclic Effects in Time-Lapse Sequences.
Erik Härkönen, Miika Aittala, Tuomas Kynkäänniemi, Samuli Laine, Timo Aila and Jaakko Lehtinen. ACM Transactions on Graphics 41(4) (SIGGRAPH 2022). Abstract Bibtex [PDF] [Code] [arXiv]
Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal. This enables us to "re-render" the sequences in ways that would not be possible with the input images alone. For example, we can stabilize a long sequence to focus on plant growth over many months, under selectable, consistent weather.
Our approach is based on Generative Adversarial Networks (GAN) that are conditioned with the time coordinate of the time-lapse sequence. Our architecture and training procedure are designed so that the networks learn to model random variations, such as weather, using the GAN's latent space, and to disentangle overall trends and cyclic variations by feeding the conditioning time label to the model using Fourier features with specific frequencies.
We show that our models are robust to defects in the training data, enabling us to amend some of the practical difficulties in capturing long time-lapse sequences, such as temporary occlusions, uneven frame spacing, and missing frames.
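The time-conditioning idea can be sketched as a small feature encoder; the trend scaling and the daily/yearly periods below are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def time_features(t_days, trend_scale=1000.0, cycle_periods=(1.0, 365.25)):
    """Conditioning features for the time label (sketch): one slow input
    for the overall trend, plus sin/cos pairs at specific frequencies
    (daily and yearly here) that expose the cyclic effects."""
    t_days = np.asarray(t_days, dtype=np.float64)
    feats = [t_days / trend_scale]                 # low-frequency trend input
    for period in cycle_periods:
        phase = 2.0 * np.pi * t_days / period
        feats += [np.sin(phase), np.cos(phase)]    # cyclic components
    return np.stack(feats, axis=-1)
```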
@article{Harkonen2022tlgan, author = {Erik H\"ark\"onen and Miika Aittala and Tuomas Kynk\"a\"anniemi and Samuli Laine and Timo Aila and Jaakko Lehtinen}, title = {Disentangling Random and Cyclic Effects in Time-Lapse Sequences}, journal = {{ACM} Transactions on Graphics}, volume = {41}, number = {4}, year = {2022}, }
Alias-Free Generative Adversarial Networks.
Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen and Timo Aila. Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Oral presentation. Abstract Bibtex [PDF] [Project page] [Code] [arXiv]
We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.
@inproceedings{Karras2021neurips, author = {Tero Karras and Miika Aittala and Samuli Laine and Erik H\"ark\"onen and Janne Hellsten and Jaakko Lehtinen and Timo Aila}, title = {Alias-Free Generative Adversarial Networks}, booktitle = {Advances in Neural Information Processing Systems 34 (proc. NeurIPS 2021)}, year = {2021}, }
Appearance-Driven Automatic 3D Model Simplification.
Jon Hasselgren, Jacob Munkberg, Jaakko Lehtinen, Miika Aittala and Samuli Laine. Eurographics Symposium on Rendering 2021 (Symposium track). Abstract Bibtex [PDF] [Presentation] [Code] [Video] [arXiv]
We present a suite of techniques for jointly optimizing triangle meshes and shading models to match the appearance of reference scenes. This capability has a number of uses, including appearance-preserving simplification of extremely complex assets, conversion between rendering systems, and even conversion between geometric scene representations.
We follow and extend the classic analysis-by-synthesis family of techniques: enabled by a highly efficient differentiable renderer and modern nonlinear optimization algorithms, our results are driven to minimize the image-space difference to the target scene when rendered in similar viewing and lighting conditions. As the only signals driving the optimization are differences in rendered images, the approach is highly general and versatile: it easily supports many different forward rendering models such as normal mapping, spatially-varying BRDFs, displacement mapping, etc. Supervision through images only is also key to the ability to easily convert between rendering systems and scene representations.
We output triangle meshes with textured materials to ensure that the models render efficiently on modern graphics hardware and benefit from, e.g., hardware-accelerated rasterization, ray tracing, and filtered texture lookups. Our system is integrated in a small Python code base, and can be applied at high resolutions and on large models. We describe several use cases, including mesh decimation, level of detail generation, seamless mesh filtering and approximations of aggregate geometry.
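The analysis-by-synthesis loop itself is simple to sketch; render(params, view) stands for any differentiable renderer and all names here are placeholders:

```python
import torch

def appearance_optimize(render, ref_images, params, iters=1000, lr=1e-2):
    """Analysis-by-synthesis loop sketch: params is a list of torch
    tensors (vertex positions, texture maps, ...) created with
    requires_grad=True; the loss is a pure image-space difference."""
    opt = torch.optim.Adam(params, lr=lr)
    for it in range(iters):
        view = it % len(ref_images)                # cycle over reference views
        loss = (render(params, view) - ref_images[view]).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return params
```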
@inproceedings{Hasselgren2021egsr, author = {Hasselgren, Jon and Munkberg, Jacob and Lehtinen, Jaakko and Aittala, Miika and Laine, Samuli}, title = {Appearance-Driven Automatic {3D} Model Simplification}, booktitle = {Eurographics Symposium on Rendering - DL-only Track}, year = {2021}, editor = {Bousseau, Adrien and McGuire, Morgan}, publisher = {The Eurographics Association}, DOI = {10.2312/sr.20211293}, }
Modular Primitives for High-Performance Differentiable Rendering.
Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen and Timo Aila. ACM Transactions on Graphics 39(6) (SIGGRAPH Asia 2020). Abstract Bibtex [PDF] [Code] [arXiv]
We present a modular differentiable renderer design that yields performance superior to previous methods by leveraging existing, highly optimized hardware graphics pipelines. Our design supports all crucial operations in a modern graphics pipeline: rasterizing large numbers of triangles, attribute interpolation, filtered texture lookups, as well as user-programmable shading and geometry processing, all in high resolutions. Our modular primitives allow custom, high-performance graphics pipelines to be built directly within automatic differentiation frameworks such as PyTorch or TensorFlow. As a motivating application, we formulate facial performance capture as an inverse rendering problem and show that it can be solved efficiently using our tools. Our results indicate that this simple and straightforward approach achieves excellent geometric correspondence between rendered results and reference imagery.
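A minimal usage sketch of the four primitives, following the conventions of the published code as we understand them (treat shapes and arguments as a sketch rather than authoritative documentation):

```python
import nvdiffrast.torch as dr

glctx = dr.RasterizeGLContext()                        # OpenGL rasterizer context

def render(pos_clip, tri, uv, tex, resolution=512):
    """Textured-mesh rendering with the four modular primitives.
    pos_clip: [1, V, 4] clip-space vertices, tri: [T, 3] int32 indices,
    uv: [1, V, 2] texture coordinates, tex: [1, H, W, C] texture."""
    rast, _ = dr.rasterize(glctx, pos_clip, tri, resolution=[resolution, resolution])
    uv_pix, _ = dr.interpolate(uv, rast, tri)          # per-pixel attribute interpolation
    color = dr.texture(tex, uv_pix, filter_mode='linear')
    return dr.antialias(color, rast, pos_clip, tri)    # differentiable silhouette edges
```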
@article{Laine2020diffrast, author = {Samuli Laine and Janne Hellsten and Tero Karras and Yeongho Seol and Jaakko Lehtinen and Timo Aila}, title = {Modular Primitives for High-Performance Differentiable Rendering}, journal = {ACM Transactions on Graphics}, year = {2020}, volume = {39}, number = {6}, }
Training Generative Adversarial Networks with Limited Data.
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen and Timo Aila. Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Oral presentation. Abstract Bibtex [PDF] [Code] [arXiv]
Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.
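The adaptive control loop can be sketched as follows; the target value matches the flavor of the paper's overfitting heuristic, but the step size is an illustrative placeholder:

```python
def adjust_augmentation_p(p, d_real_logits, target=0.6, step=1e-2):
    """Adaptive discriminator augmentation control sketch: track the
    fraction of real images the discriminator judges real (the r_t
    heuristic) and nudge the augmentation probability p to keep it
    near the target."""
    r_t = d_real_logits.sign().mean().item()       # > target: D is overfitting
    p += step if r_t > target else -step
    return min(max(p, 0.0), 1.0)                   # p is a probability
```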
@inproceedings{Karras2020neurips, author = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila}, title = {Training Generative Adversarial Networks with Limited Data}, booktitle = {Advances in Neural Information Processing Systems 33 (proc. NeurIPS 2020)}, year = {2020}, }
Analyzing and Improving the Image Quality of StyleGAN.
Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen and Timo Aila. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020. Abstract Bibtex [PDF] [Video] [Code] [arXiv]
The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.
@inproceedings{Karras2020cvpr, author = {Tero Karras and Samuli Laine and Miika Aittala and Janne Hellsten and Jaakko Lehtinen and Timo Aila}, title = {Analyzing and Improving the Image Quality of {StyleGAN}}, booktitle = {Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2020}, }
A Style-Based Generator Architecture for Generative Adversarial Networks.
Tero Karras, Samuli Laine and Timo Aila. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019. Oral presentation, Best paper honorary mention. Abstract Bibtex [PDF] [Video] [Code] [FFHQ dataset] [arXiv]
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
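The per-layer style modulation at the heart of the architecture can be sketched as an AdaIN-style block driven by the intermediate latent w (dimensions are placeholders; the full generator also injects per-pixel noise and upsamples between blocks):

```python
import torch.nn as nn

class StyleBlock(nn.Module):
    """Sketch of style modulation: normalize the feature maps, then
    scale/bias them with a learned affine function of w."""
    def __init__(self, channels, w_dim=512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        self.affine = nn.Linear(w_dim, 2 * channels)

    def forward(self, x, w):                        # x: [N, C, H, W], w: [N, w_dim]
        scale, bias = self.affine(w).chunk(2, dim=1)
        return self.norm(x) * (1 + scale[..., None, None]) + bias[..., None, None]
```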
@inproceedings{Karras2019cvpr, author = {Tero Karras and Samuli Laine and Timo Aila}, title = {A Style-Based Generator Architecture for Generative Adversarial Networks}, booktitle = {Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2019}, }
High-Quality Self-Supervised Deep Image Denoising.
Samuli Laine, Tero Karras, Jaakko Lehtinen and Timo Aila. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Abstract Bibtex [PDF] [Code] [Poster] [Slides] [arXiv]
We describe a novel method for training high-quality image denoising models based on unorganized collections of corrupted images. The training does not need access to clean reference images, or explicit pairs of corrupted images, and can thus be applied in situations where such data is unacceptably expensive or impossible to acquire. We build on a recent technique that removes the need for reference data by employing networks with a "blind spot" in the receptive field, and significantly improve two key aspects: image quality and training efficiency. Our result quality is on par with state-of-the-art neural network denoisers in the case of i.i.d. additive Gaussian noise, and not far behind with Poisson and impulse noise. We also successfully handle cases where parameters of the noise model are variable and/or unknown in both training and evaluation data.
@inproceedings{Laine2019denoising, author = {Samuli Laine and Tero Karras and Jaakko Lehtinen and Timo Aila}, title = {High-Quality Self-Supervised Deep Image Denoising}, booktitle = {Advances in Neural Information Processing Systems 32 (proc. NeurIPS 2019)}, year = {2019}, }
Improved Precision and Recall Metric for Assessing Generative Models.
Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen and Timo Aila. Advances in Neural Information Processing Systems 32 (NeurIPS 2019). Abstract Bibtex [PDF] [Code] [arXiv]
The ability to automatically estimate the quality and coverage of the samples produced by a generative model is a vital requirement for driving algorithm research. We present an evaluation metric that can separately and reliably measure both of these aspects in image generation tasks by forming explicit, non-parametric representations of the manifolds of real and generated data. We demonstrate the effectiveness of our metric in StyleGAN and BigGAN by providing several illustrative examples where existing metrics yield uninformative or contradictory results. Furthermore, we analyze multiple design variants of StyleGAN to better understand the relationships between the model architecture, training methods, and the properties of the resulting sample distribution. In the process, we identify new variants that improve the state-of-the-art. We also perform the first principled analysis of truncation methods and identify an improved method. Finally, we extend our metric to estimate the perceptual quality of individual samples, and use this to study latent space interpolations.
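The metric itself is easy to sketch: estimate the real manifold with per-point k-NN hyperspheres and count the generated samples that fall inside; recall swaps the roles of the two feature sets. A small NumPy sketch with brute-force distances, for illustration only:

```python
import numpy as np

def knn_precision(real_feats, gen_feats, k=3):
    """Sketch of the improved precision metric: a generated sample
    counts if it lies inside at least one real point's hypersphere,
    whose radius is the distance to that point's k-th nearest real
    neighbor."""
    d_rr = np.linalg.norm(real_feats[:, None] - real_feats[None], axis=-1)
    radii = np.sort(d_rr, axis=1)[:, k]            # index 0 is the point itself
    d_gr = np.linalg.norm(gen_feats[:, None] - real_feats[None], axis=-1)
    return float((d_gr <= radii[None]).any(axis=1).mean())
```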
@inproceedings{Kynkaanniemi2019metric, author = {Tuomas Kynk\"a\"anniemi and Tero Karras and Samuli Laine and Jaakko Lehtinen and Timo Aila}, title = {Improved Precision and Recall Metric for Assessing Generative Models}, booktitle = {Advances in Neural Information Processing Systems 32 (proc. NeurIPS 2019)}, year = {2019}, }
Semi-Supervised Semantic Segmentation Needs Strong, Varied Perturbations.
Geoff French, Samuli Laine, Timo Aila, Michal Mackiewicz and Graham Finlayson. British Machine Vision Conference 2020. Abstract Bibtex [PDF] [Code] [arXiv]
Consistency regularization describes a class of approaches that have yielded groundbreaking results in semi-supervised classification problems. Prior work has established the cluster assumption — under which the data distribution consists of uniform class clusters of samples separated by low density regions — as important to its success. We analyze the problem of semantic segmentation and find that its distribution does not exhibit low density regions separating classes, and offer this as an explanation for why semi-supervised segmentation is a challenging problem, with only a few reports of success. We then identify the choice of augmentation as key to obtaining reliable performance without such low-density regions. We find that adapted variants of the recently proposed CutOut and CutMix augmentation techniques yield state-of-the-art semi-supervised semantic segmentation results on standard datasets. Furthermore, given its challenging nature, we propose that semantic segmentation acts as an effective acid test for evaluating semi-supervised regularizers.
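The CutMix-style consistency term can be sketched as follows; the fixed rectangle size, squared-error loss, and teacher/student naming are simplifying assumptions:

```python
import torch

def cutmix_consistency_loss(student, teacher, x_a, x_b):
    """Semi-supervised consistency sketch: mix two unlabeled images with
    a random rectangle and penalize the difference between the
    prediction for the mixed image and the same mix of the two
    individual (teacher) predictions."""
    n, _, h, w = x_a.shape
    mask = torch.zeros(n, 1, h, w, device=x_a.device)
    for i in range(n):                          # one random rectangle per image
        rh, rw = h // 2, w // 2                 # fixed 50% area for simplicity
        y0 = torch.randint(0, h - rh + 1, (1,)).item()
        x0 = torch.randint(0, w - rw + 1, (1,)).item()
        mask[i, :, y0:y0 + rh, x0:x0 + rw] = 1.0
    x_mix = mask * x_a + (1 - mask) * x_b
    with torch.no_grad():                       # pseudo-targets from the teacher
        t_mix = mask * teacher(x_a).softmax(1) + (1 - mask) * teacher(x_b).softmax(1)
    return ((student(x_mix).softmax(1) - t_mix) ** 2).mean()
```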
@inproceedings{French2019seg, author = {Geoffrey French and Samuli Laine and Timo Aila and Michal Mackiewicz and Graham D. Finlayson}, title = {Semi-Supervised Semantic Segmentation Needs Strong, Varied Perturbations}, booktitle = {31st British Machine Vision Conference 2020, {BMVC} 2020}, year = {2020}, }
NVGaze: An Anatomically-Informed Dataset for Low-Latency, Near-Eye Gaze Estimation.
Joohwan Kim, Michael Stengel, Alexander Majercik, Shalini De Mello, David Dunn, Samuli Laine, Morgan McGuire and David Luebke. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2019. Abstract Bibtex [PDF] [Supplement] [Video] [Project page]
Quality, diversity, and size of training data are critical factors for learning-based gaze estimators. We create two datasets satisfying these criteria for near-eye gaze estimation under infrared illumination: a synthetic dataset using anatomically-informed eye and face models with variations in face shape, gaze direction, pupil and iris, skin tone, and external conditions (2M images at 1280×960), and a real-world dataset collected with 35 subjects (2.5M images at 640×480). Using these datasets we train neural networks performing with submillisecond latency. Our gaze estimation network achieves 2.06(±0.44)° of accuracy across a wide 30°×40° field of view on real subjects excluded from training and 0.5° best-case accuracy (across the same FOV) when explicitly trained for one real subject. We also train a pupil localization network which achieves higher robustness than previous methods.
@inproceedings{Kim2019sigchi, author = {Kim, Joohwan and Stengel, Michael and Majercik, Alexander and De Mello, Shalini and Dunn, David and Laine, Samuli and McGuire, Morgan and Luebke, David}, title = {{NVGaze}: {A}n Anatomically-Informed Dataset for Low-Latency, Near-Eye Gaze Estimation}, booktitle = {Proceedings of the SIGCHI Conference on Human Factors in Computing Systems}, series = {CHI '19}, year = {2019}, }
Progressive Growing of GANs for Improved Quality, Stability, and Variation.
Tero Karras, Timo Aila, Samuli Laine and Jaakko Lehtinen. International Conference on Learning Representations (ICLR) 2018. Oral presentation. Abstract Bibtex [PDF] [Video] [Code] [Poster] [arXiv]
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024². We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset.
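The fade-in of a newly grown resolution can be sketched in a few lines; alpha ramps linearly from 0 to 1 while the new layer is being introduced:

```python
import torch.nn.functional as F

def grow_output(prev_rgb_lowres, new_rgb, alpha):
    """Smooth fade-in of a newly added resolution layer (sketch): blend
    the upsampled image from the previous resolution with the new
    layer's output."""
    up = F.interpolate(prev_rgb_lowres, scale_factor=2, mode='nearest')
    return (1.0 - alpha) * up + alpha * new_rgb
```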
@inproceedings{Karras2018iclr, author = {Tero Karras and Timo Aila and Samuli Laine and Jaakko Lehtinen}, title = {Progressive Growing of {GAN}s for Improved Quality, Stability, and Variation}, booktitle = {Proc. International Conference on Learning Representations (ICLR)}, year = {2018}, }
Noise2Noise: Learning Image Restoration without Clean Data.
Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala and Timo Aila. International Conference on Machine Learning (ICML) 2018. Abstract Bibtex [PDF] [Code] [arXiv]
We apply basic statistical reasoning to signal reconstruction by machine learning — learning to map corrupted observations to clean signals — with a simple and powerful conclusion: it is possible to learn to restore images by only looking at corrupted examples, reaching and sometimes exceeding the performance of training with clean data, without explicit image priors or likelihood models of the corruption. In practice, we show that a single model learns photographic noise removal, denoising of synthetic Monte Carlo images, and reconstruction of undersampled MRI scans — all corrupted by different processes — based on noisy data only.
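A training step is exactly the standard supervised one, only with a second noisy realization of the same image in place of a clean target (sketch; net and optimizer are any PyTorch model and optimizer):

```python
def noise2noise_step(net, optimizer, noisy_input, noisy_target):
    """One Noise2Noise training step (sketch): both input and target are
    independently corrupted observations of the same clean image. With
    zero-mean noise and an L2 loss, the minimizer is the same as with
    clean targets."""
    optimizer.zero_grad()
    loss = ((net(noisy_input) - noisy_target) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```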
@inproceedings{Lehtinen2018icml, author = {Jaakko Lehtinen and Jacob Munkberg and Jon Hasselgren and Samuli Laine and Tero Karras and Miika Aittala and Timo Aila}, title = {{Noise2Noise}: {L}earning Image Restoration without Clean Data}, booktitle = {Proc. International Conference on Machine Learning (ICML)}, year = {2018}, }
Feature-Based Metrics for Exploring the Latent Space of Generative Models.
Samuli Laine. ICLR 2018 workshop track. Abstract Bibtex [PDF] [Poster]
Several recent papers have treated the latent space of deep generative models, e.g., GANs or VAEs, as Riemannian manifolds. The argument is that operations such as interpolation are better done along geodesics that minimize path length not in the latent space but in the output space of the generator. However, this implicitly assumes that some simple metric such as L2 is meaningful in the output space, even though it is well known that for, e.g., semantic comparison of images it is woefully inadequate. In this work, we consider imposing an arbitrary metric on the generator's output space and show both theoretically and experimentally that a feature-based metric can produce much more sensible interpolations than the usual L2 metric. This observation leads to the conclusion that analysis of latent space geometry would benefit from using a suitable, explicitly defined metric.
@misc{Laine2018iclr, author = {Samuli Laine}, title = {Feature-Based Metrics for Exploring the Latent Space of Generative Models}, howpublished = {ICLR 2018 workshop track}, year = {2018}, }
Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion.
Tero Karras, Timo Aila, Samuli Laine, Antti Herva and Jaakko Lehtinen. ACM Transactions on Graphics 36(4) (SIGGRAPH 2017). Abstract Bibtex [PDF] [Video]
We present a machine learning technique for driving 3D facial animation by audio input in real time and with low latency. Our deep neural network learns a mapping from input waveforms to the 3D vertex coordinates of a face model, and simultaneously discovers a compact, latent code that disambiguates the variations in facial expression that cannot be explained by the audio alone. During inference, the latent code can be used as an intuitive control for the emotional state of the face puppet.
We train our network with 3-5 minutes of high-quality animation data obtained using traditional, vision-based performance capture methods. Even though our primary goal is to model the speaking style of a single actor, our model yields reasonable results even when driven with audio from other speakers with different gender, accent, or language, as we demonstrate with a user study. The results are applicable to in-game dialogue, low-cost localization, virtual reality avatars, and telepresence.
@article{Karras2017siggraph, author = {Tero Karras and Timo Aila and Samuli Laine and Antti Herva and Jaakko Lehtinen}, title = {Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion}, journal = {ACM Transactions on Graphics}, year = {2017}, volume = {36}, number = {4}, }
Efficient Incoherent Ray Traversal on GPUs Through Compressed Wide BVHs.
Henri Ylitie, Tero Karras and Samuli Laine. High-Performance Graphics 2017. Abstract Bibtex [PDF]
We present a GPU-based ray traversal algorithm that operates on compressed wide BVHs and maintains the traversal stack in a compressed format. Our method reduces the amount of memory traffic significantly, which translates to a 1.9-2.1x improvement in incoherent ray traversal performance compared to the current state of the art. Furthermore, the memory consumption of our hierarchy is 35-60% of a typical uncompressed BVH.
In addition, we present an algorithmically efficient method for converting a binary BVH into a wide BVH in a SAH-optimal fashion, and an improved method for ordering the child nodes at build time for the purposes of octant-aware fixed-order traversal.
@inproceedings{Ylitie2017hpg, author = {Henri Ylitie and Tero Karras and Samuli Laine}, title = {Efficient Incoherent Ray Traversal on {GPU}s Through Compressed Wide {BVH}s}, booktitle = {Proceedings of High-Performance Graphics 2017}, year = {2017}, }
Production-Level Facial Performance Capture Using Deep Convolutional Neural Networks.
Samuli Laine, Tero Karras, Timo Aila, Antti Herva, Shunsuke Saito, Ronald Yu, Hao Li and Jaakko Lehtinen. Symposium on Computer Animation 2017. Abstract Bibtex [PDF] [Video] [arXiv]
We present a real-time deep learning framework for video-based facial performance capture: the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5-10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that subject. Since this 3D facial performance capture is fully automated, our system can drastically reduce the amount of labor involved in the development of modern narrative-driven video games or films involving realistic digital doubles of actors and potentially hours of animated dialogue per character. We compare our results with several state-of-the-art monocular real-time facial capture techniques and demonstrate compelling animation inference in challenging areas such as eyes and lips.
@inproceedings{Laine2017sca, author = {Samuli Laine and Tero Karras and Timo Aila and Antti Herva and Shunsuke Saito and Ronald Yu and Hao Li and Jaakko Lehtinen}, title = {Production-Level Facial Performance Capture Using Deep Convolutional Neural Networks}, booktitle = {Proc. Symposium on Computer Animation (SCA)}, year = {2017}, }
Temporal Ensembling for Semi-Supervised Learning.
Samuli Laine and Timo Aila. International Conference on Learning Representations 2017. Abstract Bibtex [PDF] [Code] [arXiv]
In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus prediction of the unknown labels using the outputs of the network-in-training on different epochs, and most importantly, under different regularization and input augmentation conditions. This ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training. Using our method, we set new records for two standard semi-supervised learning benchmarks, reducing the (non-augmented) classification error rate from 18.44% to 7.05% in SVHN with 500 labels and from 18.63% to 16.55% in CIFAR-10 with 4000 labels, and further to 5.12% and 12.16% by enabling the standard augmentations. We additionally obtain a clear improvement in CIFAR-100 classification accuracy by using random images from the Tiny Images dataset as unlabeled extra inputs during training. Finally, we demonstrate good tolerance to incorrect labels.
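The per-epoch ensemble update is small enough to sketch directly; the accumulation momentum below is illustrative, and the division corrects for the zero initialization of the accumulator, exactly as in exponential moving averages:

```python
def temporal_ensembling_targets(Z, z_epoch, epoch, alpha=0.6):
    """Ensemble-target update after each epoch (sketch): Z accumulates
    per-sample predictions over epochs; dividing by (1 - alpha**t)
    corrects the startup bias. The corrected targets z_hat then feed
    an unsupervised MSE term with a ramped-up weight."""
    Z = alpha * Z + (1.0 - alpha) * z_epoch     # accumulate predictions
    z_hat = Z / (1.0 - alpha ** (epoch + 1))    # bias correction
    return Z, z_hat
```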
@inproceedings{Laine2017iclr, author = {Samuli Laine and Timo Aila}, title = {Temporal Ensembling for Semi-Supervised Learning}, booktitle = {Proc. International Conference on Learning Representations (ICLR)}, year = {2017}, }
Apex Point Map for Constant-Time Bounding Plane Approximation.
Samuli Laine and Tero Karras. Eurographics Symposium on Rendering 2015 (EI&I track). Abstract Bibtex [PDF] [Slides]
We introduce apex point map, a simple data structure for constructing conservative bounds for rigid objects. The data structure is distilled from a dense k-DOP, and can be queried in constant time to determine a tight bounding plane with any given normal vector. Both precalculation and lookup can be implemented very efficiently on current GPUs. Applications include, e.g., finding tight world-space bounds for transformed meshes, determining per-object shadow map extents, more accurate view frustum culling, and collision detection.
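The lookup can be sketched as follows. A real implementation distills each bin's apex point from a dense k-DOP so that the resulting plane is conservative for every normal quantizing into that bin; this sketch simply uses the supporting vertex of the bin center, which conveys the constant-time query but is only exact at the bin centers:

```python
import numpy as np

def build_apex_map(vertices, directions):
    """Precompute one apex point per quantized normal direction (sketch).
    vertices: [V, 3], directions: [D, 3] unit bin centers."""
    return vertices[np.argmax(directions @ vertices.T, axis=1)]

def bounding_plane_offset(apex_map, directions, n):
    """Constant-time query: bounding plane with normal n through the
    stored apex point, i.e. dot(n, x) <= offset for all object points."""
    bin_idx = np.argmax(directions @ n)         # stand-in for the real quantizer
    return float(apex_map[bin_idx] @ n)
```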
@inproceedings{Laine2015egsr, author = {Samuli Laine and Tero Karras}, title = {Apex Point Map for Constant-Time Bounding Plane Approximation}, booktitle = {Eurographics Symposium on Rendering - Experimental Ideas & Implementations}, year = {2015}, editor = {Jaakko Lehtinen and Derek Nowrouzezahrai}, publisher = {The Eurographics Association}, DOI = {10.2312/sre.20151166}, }
Occluder Simplification using Planar Sections.
Ari Silvennoinen, Hannu Saransaari, Samuli Laine and Jaakko Lehtinen. Computer Graphics Forum 33(1), 2014. Abstract Bibtex [PDF] [Video]
We present a method for extreme occluder simplification. We take a triangle soup as input, and produce a small set of polygons with closely matching occlusion properties. In contrast to methods that optimize the original geometry, our algorithm has very few requirements for the input—specifically, the input does not need to be a watertight, two-manifold mesh. This robustness is achieved by working on a well-behaved, discretized representation of the input instead of the original, potentially badly structured geometry. We first formulate the algorithm for individual occluders, and further introduce a hierarchy for handling large, complex scenes.
@article{Silvennoinen2014CGF, author = {Silvennoinen, Ari and Saransaari, Hannu and Laine, Samuli and Lehtinen, Jaakko}, title = {Occluder Simplification Using Planar Sections}, journal = {Computer Graphics Forum}, volume = {33}, number = {1}, issn = {1467-8659}, url = {http://dx.doi.org/10.1111/cgf.12271}, doi = {10.1111/cgf.12271}, pages = {235--245}, year = {2014}, }
Gradient-Domain Metropolis Light Transport.
Jaakko Lehtinen, Tero Karras, Samuli Laine, Miika Aittala, Frédo Durand and Timo Aila. ACM Transactions on Graphics 32(4) (SIGGRAPH 2013). Abstract Bibtex [PDF] [Project page] [Door scene]
We introduce a novel Metropolis rendering algorithm that directly computes image gradients, and reconstructs the final image from the gradients by solving a Poisson equation. The reconstruction is aided by a low-fidelity approximation of the image computed during gradient sampling. As an extension of path-space Metropolis light transport, our algorithm is well suited for difficult transport scenarios. We demonstrate that our method outperforms the state-of-the-art in several well-known test scenes. Additionally, we analyze the spectral properties of gradient-domain sampling, and compare it to the traditional image-domain sampling.
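The reconstruction step can be sketched as a screened Poisson solve, with the low-fidelity primal image acting as the screening target. This simple Jacobi iteration solves the L2 version; the paper's reconstruction is more sophisticated, so treat this purely as a conceptual sketch:

```python
import numpy as np

def screened_poisson(gx, gy, base, alpha=0.2, iters=500):
    """Reconstruct an image from sampled gradients (gx, gy) plus a
    low-fidelity primal image `base`, minimizing
    ||grad(I) - g||^2 + alpha * ||I - base||^2 via Jacobi iteration."""
    # Divergence of the measured gradient field (backward differences).
    div = (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))
    I = base.copy()
    for _ in range(iters):
        # Neighbor images with replicated borders.
        up = np.roll(I, 1, axis=0);    up[0] = I[0]
        down = np.roll(I, -1, axis=0); down[-1] = I[-1]
        left = np.roll(I, 1, axis=1);  left[:, 0] = I[:, 0]
        right = np.roll(I, -1, axis=1); right[:, -1] = I[:, -1]
        # Jacobi update for (4 + alpha) I = neighbors - div + alpha * base.
        I = (up + down + left + right - div + alpha * base) / (4.0 + alpha)
    return I
```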
@article{Lehtinen2013sg, author = {Jaakko Lehtinen and Tero Karras and Samuli Laine and Miika Aittala and Fr\'{e}do Durand and Timo Aila}, title = {Gradient-Domain Metropolis Light Transport}, journal = {ACM Transactions on Graphics}, year = {2013}, volume = {32}, number = {4}, }
On Quality Metrics of Bounding Volume Hierarchies.
Timo Aila, Tero Karras and Samuli Laine. High-Performance Graphics 2013. Best paper award. Abstract Bibtex [PDF]
The surface area heuristic (SAH) is widely used as a predictor for ray tracing performance, and as a heuristic to guide the construction of spatial acceleration structures. We investigate how well SAH actually predicts ray tracing performance of a bounding volume hierarchy (BVH), observe that this relationship is far from perfect, and then propose two new metrics that together with SAH almost completely explain the measured performance. Our observations shed light on the increasingly common situation that a supposedly good tree construction algorithm produces trees that are slower to trace than expected. We also note that the trees constructed using greedy top-down algorithms are consistently faster to trace than SAH indicates and are also more SIMD-friendly than competing approaches.
@inproceedings{Aila2013hpg, author = {Timo Aila and Tero Karras and Samuli Laine}, title = {On Quality Metrics of Bounding Volume Hierarchies}, booktitle = {Proc. High-Performance Graphics}, year = {2013}, }
Megakernels Considered Harmful: Wavefront Path Tracing on GPUs.
Samuli Laine, Tero Karras and Timo Aila. High-Performance Graphics 2013. Best paper 2nd place, Test of time award 2nd place. Abstract Bibtex [PDF] [Slides]
When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to the SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage, which lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory system of a GPU. In this paper, we implement a path tracer on a GPU using a wavefront formulation, avoiding these pitfalls that can be especially prominent when using materials that are expensive to evaluate. We compare our performance against the traditional megakernel approach, and demonstrate that the wavefront formulation is much better suited for real-world use cases where multiple complex materials are present in the scene.
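The scheduling idea can be conveyed with a CPU-side sketch; the path-state layout, stage ids, and per-stage kernels are all placeholders for the actual GPU kernels:

```python
import numpy as np

def wavefront_trace(paths, kernels, max_iters=64):
    """Wavefront scheduling sketch (conceptual): instead of one
    megakernel walking a whole path per thread, each stage (extend ray,
    shade material A, ...) runs as its own pass over a compacted queue
    of path indices, so each pass executes coherent code. `paths` is a
    structured array with a 'stage' field; `kernels` maps stage ids to
    functions that mutate the selected paths."""
    for _ in range(max_iters):
        busy = False
        for stage_id, kernel in kernels.items():
            queue = np.nonzero(paths['stage'] == stage_id)[0]   # compact queue
            if queue.size:
                kernel(paths, queue)          # process only this stage's paths
                busy = True
        if not busy:
            break                             # all paths terminated
```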
@inproceedings{Laine2013hpg, author = {Samuli Laine and Tero Karras and Timo Aila}, title = {Megakernels Considered Harmful: Wavefront Path Tracing on {GPU}s}, booktitle = {Proceedings of High-Performance Graphics 2013}, year = {2013}, }
A Topological Approach to Voxelization.
Samuli Laine. Computer Graphics Forum 32(4) (EGSR 2013). Abstract Bibtex [PDF] [Slides]
We present a novel approach to voxelization, based on intersecting the input primitives against intersection targets in the voxel grid. Instead of relying on geometric proximity measures, our approach is topological in nature, i.e., it builds on the connectivity and separability properties of the input and the intersection targets. We discuss voxelization of curves and surfaces in both 2D and 3D, and derive intersection targets that produce voxelizations with various connectivity, separability and thinness properties. The simplicity of our method allows for easy proofs of these properties. Our approach is directly applicable to curved primitives, and it is independent of input tessellation.
@article{Laine2013egsr, author = {Samuli Laine}, title = {A Topological Approach to Voxelization}, journal = {Computer Graphics Forum (Proc. Eurographics Symposium on Rendering 2013)}, volume = {32}, number = {4}, year = {2013}, publisher = {Eurographics Association and Blackwell Publishing Ltd}, }
Reconstructing the Indirect Light Field for Global Illumination.
Jaakko Lehtinen, Timo Aila, Samuli Laine and Frédo Durand. ACM Transactions on Graphics 31(4) (SIGGRAPH 2012). Abstract Bibtex [PDF] [Project page]
Stochastic techniques for rendering indirect illumination suffer from noise due to the variance in the integrand. In this paper, we describe a general reconstruction technique that exploits anisotropy in the light field and permits efficient reuse of input samples between pixels or world-space locations, multiplying the effective sampling rate by a large factor. Our technique introduces visibility-aware anisotropic reconstruction to indirect illumination, ambient occlusion and glossy reflections. It operates on point samples without knowledge of the scene, and can thus be seen as an advanced image filter. Our results show dramatic improvement in image quality while using very sparse input samplings.
@article{Lehtinen2012sg, author = {Jaakko Lehtinen and Timo Aila and Samuli Laine and Fr\'{e}do Durand}, title = {Reconstructing the Indirect Light Field for Global Illumination}, journal = {ACM Transactions on Graphics}, year = {2012}, volume = {31}, number = {4}, }
Understanding the Efficiency of Ray Traversal on GPUs – Kepler and Fermi Addendum.
Timo Aila, Samuli Laine and Tero Karras. NVIDIA Technical Report NVR-2012-02, 2012. Poster at High-Performance Graphics 2012. Abstract Bibtex [PDF] [Poster] [Code] [Project page]
This technical report is an addendum to the HPG2009 paper "Understanding the Efficiency of Ray Traversal on GPUs", and provides citable performance results for Kepler and Fermi architectures. We explain how to optimize the traversal and intersection kernels for these newer platforms, and what the important architectural limiters are. We plot the relative ray tracing performance between architecture generations against the available memory bandwidth and peak FLOPS, and demonstrate that ray tracing is still, even with incoherent rays and more complex scenes, almost entirely limited by the available FLOPS. We also discuss two esoteric instructions, present in both Fermi and Kepler, and show that they can be safely used for faster acceleration structure traversal.
@techreport{Aila:Efficiency:NVIDIA:2012, author = {Timo Aila and Samuli Laine and Tero Karras}, title = {Understanding the Efficiency of Ray Traversal on {GPU}s -- {K}epler and {F}ermi Addendum}, month = {June}, year = {2012}, institution = {NVIDIA Corporation}, type = {NVIDIA Technical Report}, number = {NVR-2012-02}, }
Clipless Dual-Space Bounds for Faster Stochastic Rasterization.
Samuli Laine, Timo Aila, Tero Karras and Jaakko Lehtinen. ACM Transactions on Graphics 30(4) (SIGGRAPH 2011). Abstract Bibtex [PDF] [Video] [Slides]
We present a novel method for increasing the efficiency of stochastic rasterization of motion and defocus blur. Contrary to earlier approaches, our method is efficient even with the low sampling densities commonly encountered in realtime rendering, while allowing the use of arbitrary sampling patterns for maximal image quality. Our clipless dual-space formulation avoids problems with triangles that cross the camera plane during the shutter interval. The method is also simple to plug into existing rendering systems.
@article{Laine2011sg, author = {Samuli Laine and Timo Aila and Tero Karras and Jaakko Lehtinen}, title = {Clipless Dual-Space Bounds for Faster Stochastic Rasterization}, journal = {ACM Transactions on Graphics}, year = {2011}, volume = {30}, number = {4}, }
Temporal Light Field Reconstruction for Rendering Distribution Effects.
Jaakko Lehtinen, Timo Aila, Jiawen Chen, Samuli Laine and Frédo Durand. ACM Transactions on Graphics 30(4) (SIGGRAPH 2011). Abstract Bibtex [PDF] [Videos] [Images] [Project page]
Traditionally, effects that require evaluating multidimensional integrals for each pixel, such as motion blur, depth of field, and soft shadows, suffer from noise due to the variance of the high-dimensional integrand. In this paper, we describe a general reconstruction technique that exploits the anisotropy in the temporal light field and permits efficient reuse of samples between pixels, multiplying the effective sampling rate by a large factor. We show that our technique can be applied in situations that are challenging or impossible for previous anisotropic reconstruction methods, and that it can yield good results with very sparse inputs. We demonstrate our method for simultaneous motion blur, depth of field, and soft shadows.
@article{Lehtinen2011sg, author = {Jaakko Lehtinen and Timo Aila and Jiawen Chen and Samuli Laine and Fr\'{e}do Durand}, title = {Temporal Light Field Reconstruction for Rendering Distribution Effects}, journal = {ACM Transactions on Graphics}, year = {2011}, volume = {30}, number = {4}, }
Improved Dual-Space Bounds for Simultaneous Motion and Defocus Blur.
Samuli Laine and Tero Karras. NVIDIA Technical Report NVR-2011-004, 2011. Abstract Bibtex [PDF]
Our previous paper on stochastic rasterization [Laine et al. 2011] presented a method for constructing time and lens bounds to accelerate stochastic rasterization by skipping the costly 5D coverage test. Although the method works for the combined case of simultaneous motion and defocus blur, its efficiency drops when significant amounts of both effects are present. In this paper, we describe a bound computation method that treats time and lens domains in a unified fashion, and yields tight bounds also for the combined case.
@techreport{Laine:Bounds2:NVIDIA:2011, author = {Samuli Laine and Tero Karras}, title = {Improved Dual-Space Bounds for Simultaneous Motion and Defocus Blur}, month = {November}, year = {2011}, institution = {NVIDIA Corporation}, type = {NVIDIA Technical Report}, number = {NVR-2011-004}, }
Efficient Triangle Coverage Tests for Stochastic Rasterization.
Samuli Laine and Tero Karras. NVIDIA Technical Report NVR-2011-003, 2011. Abstract Bibtex [PDF]
In our previous paper on stochastic rasterization [Laine et al. 2011], we stated that a 5D triangle coverage test consumes approximately 25 FMA (fused multiply-add) operations. This technical report details the operation of our coverage test. We also provide variants specialized for defocus-only and motion-only cases.
@techreport{Laine:Coverage:NVIDIA:2011, author = {Samuli Laine and Tero Karras}, title = {Efficient Triangle Coverage Tests for Stochastic Rasterization}, month = {September}, year = {2011}, institution = {NVIDIA Corporation}, type = {NVIDIA Technical Report}, number = {NVR-2011-003}, }
High-Performance Software Rasterization on GPUs.
Samuli Laine and Tero Karras. High-Performance Graphics 2011. Best paper award. Abstract Bibtex [PDF] [Slides] [Code]
In this paper, we implement an efficient, completely software-based graphics pipeline on a GPU. Unlike previous approaches, we obey ordering constraints imposed by current graphics APIs, guarantee hole-free rasterization, and support multisample antialiasing. Our goal is to examine the performance implications of not exploiting the fixed-function graphics pipeline, and to discern which additional hardware support would benefit software-based graphics the most.
We present significant improvements over previous work in terms of scalability, performance, and capabilities. Our pipeline is malleable and easy to extend, and we demonstrate that in a wide variety of test cases its performance is within a factor of 2–8x compared to the hardware graphics pipeline on a top of the line GPU.
Our implementation is open-sourced and available at http://code.google.com/p/cudaraster/.
@inproceedings{Laine2011hpg, author = {Samuli Laine and Tero Karras}, title = {High-Performance Software Rasterization on {GPU}s}, booktitle = {Proceedings of High-Performance Graphics 2011}, year = {2011}, }
Stratified Sampling for Stochastic Transparency.
Samuli Laine and Tero Karras. Computer Graphics Forum 30(4) (EGSR 2011). Abstract Bibtex [PDF] [Slides] |
The traditional method of rendering semi-transparent surfaces using alpha blending requires sorting the surfaces in depth order. There are several techniques for order-independent transparency, but most either require unbounded storage or are fragile due to forced compaction of information during rendering. Stochastic transparency works in a fixed amount of storage and produces results with the correct expected value. However, carelessly chosen sampling strategies easily result in high variance of the final pixel colors, showing as noise in the image. In this paper, we describe a series of improvements to stochastic transparency that enable stratified sampling in both spatial and alpha domains. As a result, the amount of noise in the image is significantly reduced, while the result remains unbiased.
@article{Laine2011egsr, author = {Samuli Laine and Tero Karras}, title = {Stratified Sampling for Stochastic Transparency}, journal = {Computer Graphics Forum (Proc. Eurographics Symposium on Rendering 2011)}, volume = {30}, number = {4}, year = {2011}, publisher = {Eurographics Association and Blackwell Publishing Ltd}, }
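To make the alpha-domain stratification concrete, here is a minimal C++ sketch of building a stratified coverage mask for one fragment; the permutation-based selection and jitter handling are illustrative assumptions rather than the paper's exact scheme. Plain stochastic transparency covers each of the S samples independently with probability alpha, whereas covering floor(alpha*S + r) samples taken from a per-pixel random permutation keeps the expected coverage at alpha*S while removing nearly all variance in the coverage count.

#include <algorithm>
#include <cstdint>

// Build a coverage mask for a fragment with opacity 'alpha' over S
// samples (S <= 32). 'perm' is a per-pixel random permutation of
// 0..S-1 and 'r' a per-fragment jitter in [0,1); both are assumed
// inputs for this sketch.
uint32_t stratifiedCoverage(float alpha, int S, const int* perm, float r)
{
    int n = std::min(S, int(alpha * float(S) + r));  // E[n] = alpha*S
    uint32_t mask = 0;
    for (int i = 0; i < n; ++i)
        mask |= 1u << perm[i];                       // cover n strata
    return mask;
}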
Efficient Sparse Voxel Octrees.
Samuli Laine and Tero Karras. IEEE Transactions on Visualization and Computer Graphics 17(8), 2011. Abstract Bibtex [IEEE Digital Library] [Code] |
In this paper we examine the possibilities of using voxel representations as a generic way for expressing complex and feature-rich geometry on current and future GPUs. We present in detail a compact data structure for storing voxels and an efficient algorithm for performing ray casts using this structure.
We augment the voxel data with novel contour information that increases geometric resolution, allows more compact encoding of smooth surfaces, and accelerates ray casts. We also employ a novel normal compression format for storing high-precision object-space normals. Finally, we present a variable-radius post-process filtering technique for smoothing out blockiness caused by discrete sampling of shading attributes.
Based on benchmark results, we show that our voxel representation is competitive with triangle-based representations in terms of ray casting performance, while allowing tremendously greater geometric detail and unique shading information for every voxel.
Our voxel codebase is open-sourced and available at http://code.google.com/p/efficient-sparse-voxel-octrees/.
@article{10.1109/TVCG.2010.240, author = {Samuli Laine and Tero Karras}, title = {Efficient Sparse Voxel Octrees}, journal = {IEEE Transactions on Visualization and Computer Graphics}, volume = {17}, issn = {1077-2626}, year = {2011}, pages = {1048-1059}, doi = {http://doi.ieeecomputersociety.org/10.1109/TVCG.2010.240}, publisher = {IEEE Computer Society}, address = {Los Alamitos, CA, USA}, }
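As a rough illustration of what such a compact voxel node can look like, here is a C++ sketch of a child-descriptor-style structure; the field widths and bit layout are assumptions for illustration, not the paper's exact encoding.

#include <bit>
#include <cstdint>

// One octree node: which of the eight children exist, which of those
// are leaves, and an offset to the first non-leaf child. Per-child
// pointers are avoided by indexing children with population counts.
struct ChildDescriptor
{
    uint32_t childPtr;   // offset to the first non-leaf child node
    uint8_t  validMask;  // bit i set => child i exists
    uint8_t  leafMask;   // bit i set => child i is a leaf voxel
};

// Storage slot of non-leaf child 'childIdx': count the existing
// non-leaf children that precede it (C++20 std::popcount).
inline int childSlot(const ChildDescriptor& d, int childIdx)
{
    uint32_t nonLeaf = d.validMask & ~uint32_t(d.leafMask);
    return std::popcount(nonLeaf & ((1u << childIdx) - 1u));
}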
A Local Image Reconstruction Algorithm for Stochastic Rendering.
Peter Shirley, Timo Aila, Jonathan Cohen, Eric Enderton, Samuli Laine, David Luebke and Morgan McGuire. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2011. Abstract Bibtex [PDF] |
Stochastic renderers produce unbiased but noisy images of scenes that include the advanced camera effects of motion and defocus blur and possibly other effects such as transparency. We present a simple algorithm that selectively adds bias in the form of image-space blur to pixels that are unlikely to have high-frequency content in the final image. For each pixel, we sweep once through a fixed neighborhood of samples in front-to-back order, using a simple accumulation scheme. We achieve good-quality images with only 16 samples per pixel, making the algorithm potentially practical for interactive stochastic rendering in the near future.
@inproceedings{Shirley2011i3d, author = {Peter Shirley and Timo Aila and Jonathan Cohen and Eric Enderton and Samuli Laine and David Luebke and Morgan Mc{G}uire}, title = {A Local Image Reconstruction Algorithm for Stochastic Rendering}, booktitle = {Proceedings of ACM SIGGRAPH 2011 Symposium on Interactive 3D Graphics and Games}, pages = {9--13}, year = {2011}, publisher = {ACM Press}, }
Restart Trail for Stackless BVH Traversal.
Samuli Laine. High-Performance Graphics 2010. Abstract Bibtex [PDF] [Slides] |
A ray cast algorithm utilizing a hierarchical acceleration structure needs to perform a tree traversal in the hierarchy. In its basic form, executing the traversal requires a stack that holds the nodes that are still to be processed. In some cases, such a stack can be prohibitively expensive to maintain or access due to storage or memory bandwidth limitations. The stack can, however, be eliminated or replaced with a fixed-size buffer using so-called stackless or short-stack algorithms. These require that the traversal can be restarted from the root so that the already processed part of the tree is not entered again. For kd-tree ray casts, this is accomplished easily by ray shortening, but the approach does not extend to other kinds of hierarchies such as BVHs.
In this paper, we introduce the restart trail, a simple algorithmic method that makes restarts possible regardless of the type of hierarchy by storing one bit of data per level. This enables stackless and short-stack traversal for BVH ray casts, where a full stack or a constrained traversal order have so far been the only options.
@inproceedings{Laine2010hpg, author = {Samuli Laine}, title = {Restart Trail for Stackless {BVH} Traversal}, booktitle = {Proceedings of High-Performance Graphics 2010}, year = {2010}, }
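A minimal C++ sketch of the trail idea for a binary BVH follows; the node layout and the ray/box tests are assumed placeholders, and the real algorithm also supports short stacks on top of this. The key point is that one bit per level suffices to steer a restart from the root past everything that has already been processed.

#include <cstdint>
#include <vector>

struct Ray  { /* origin, direction, ... */ };
struct Node { int child[2]; bool isLeaf; /* bounds ... */ };

bool hitNode(const Node&, const Ray&);        // assumed ray/box test
int  nearChild(const Node&, const Ray&);      // assumed order heuristic
void intersectLeaf(const Node&, const Ray&);  // assumed primitive test

void traverse(const std::vector<Node>& nodes, const Ray& ray)
{
    uint64_t trail = 0;   // bit L set => first-visited child at level L done
    for (;;)
    {
        // Restart: descend from the root; at levels whose trail bit is
        // set, go straight to the not-yet-finished sibling.
        int node = 0, level = 0;
        while (!nodes[node].isLeaf && hitNode(nodes[node], ray))
        {
            int first = nearChild(nodes[node], ray);
            int go    = ((trail >> level) & 1) ? 1 - first : first;
            node = nodes[node].child[go];
            ++level;
        }
        if (nodes[node].isLeaf && hitNode(nodes[node], ray))
            intersectLeaf(nodes[node], ray);

        // "Pop" without a stack: finish the deepest unfinished level.
        int l = level - 1;
        while (l >= 0 && ((trail >> l) & 1))
            --l;
        if (l < 0)
            return;                        // all levels finished
        trail |= uint64_t(1) << l;         // the sibling is next
        trail &= (uint64_t(2) << l) - 1;   // forget deeper levels
    }
}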
Two Methods for Fast Ray-Cast Ambient Occlusion.
Samuli Laine and Tero Karras. Computer Graphics Forum 29(4) (EGSR 2010). Abstract Bibtex [PDF] [Slides] |
Ambient occlusion has proven to be a useful tool for producing realistic images, both in offline rendering and interactive applications. In production rendering, ambient occlusion is typically computed by casting a large number of short shadow rays from each visible point, yielding unparalleled quality but long rendering times. Interactive applications typically use screen-space approximations which are fast but suffer from systematic errors due to missing information behind the nearest depth layer.
In this paper, we present two efficient methods for calculating ambient occlusion so that the results match those produced by a ray tracer. The first method is targeted for rasterization-based engines, and it leverages the GPU graphics pipeline for finding occlusion relations between scene triangles and the visible points. The second method is a drop-in replacement for ambient occlusion computation in offline renderers, allowing ambient occlusion to be queried for any point in the scene. Both methods are based on the principle of simultaneously computing the result of all shadow rays for a single receiver point.
@article{Laine2010egsr, author = {Samuli Laine and Tero Karras}, title = {Two Methods for Fast Ray-Cast Ambient Occlusion}, journal = {Computer Graphics Forum (Proc. Eurographics Symposium on Rendering 2010)}, volume = {29}, number = {4}, year = {2010}, publisher = {Eurographics Association and Blackwell Publishing Ltd}, }
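For reference, the quantity both methods are designed to reproduce can be written as a tiny Monte Carlo estimator in C++: the fraction of cosine-weighted hemisphere directions around the surface normal that are unoccluded within a fixed ray length. The two helpers are assumed placeholders, not part of either method.

struct Vec3 { float x, y, z; };

Vec3 sampleCosineHemisphere(const Vec3& n, int i, int count);       // assumed
bool occluded(const Vec3& origin, const Vec3& dir, float maxDist);  // assumed

// Returns 1 for a fully open point and 0 for a fully occluded one;
// the cosine weighting is folded into the direction sampling.
float aoVisibility(const Vec3& p, const Vec3& n, float rayLength, int numRays)
{
    int visible = 0;
    for (int i = 0; i < numRays; ++i)
        if (!occluded(p, sampleCosineHemisphere(n, i, numRays), rayLength))
            ++visible;
    return float(visible) / float(numRays);
}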
Efficient Sparse Voxel Octrees – Analysis, Extensions, and Implementation.
Samuli Laine and Tero Karras. NVIDIA Technical Report NVR-2010-001, 2010. Abstract Bibtex [PDF] [Code] |
This technical report extends our previous paper on sparse voxel octrees. We first discuss the benefits and drawbacks of voxel representations and how the storage space requirements behave for different kinds of content. Then, we explain in detail our compact data structure for storing voxels and an efficient ray cast algorithm that utilizes this structure, including the contributions of the original paper: the additional voxel contour information, the normal compression format for storing high-precision object-space normals, the post-process filtering technique for smoothing out the blockiness of shading, and the beam optimization for accelerating ray casts.
Management of voxel data in memory and on disk is covered in more detail, as well as the construction of the voxel hierarchy. We extend the results section considerably, providing detailed statistics for our test cases. Finally, we discuss the technological barriers and problems that would need to be overcome before voxels could be widely adopted as a generic content format.
Our voxel codebase is open-sourced and available at http://code.google.com/p/efficient-sparse-voxel-octrees.
@techreport{Laine:Octree:NVIDIA:2010, author = {Samuli Laine and Tero Karras}, title = {Efficient Sparse Voxel Octrees -- Analysis, Extensions, and Implementation}, month = {February}, year = {2010}, institution = {NVIDIA Corporation}, type = {NVIDIA Technical Report}, number = {NVR-2010-001}, }
Efficient Sparse Voxel Octrees.
Samuli Laine and Tero Karras. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2010. Best paper honorable mention Abstract Bibtex [PDF] [Video (Xvid)] [Slides] [Code] |
In this paper we examine the possibilities of using voxel representations as a generic way for expressing complex and feature-rich geometry on current and future GPUs. We present in detail a compact data structure for storing voxels and an efficient algorithm for performing ray casts using this structure.
We augment the voxel data with novel contour information that increases geometric resolution, allows more compact encoding of smooth surfaces, and accelerates ray casts. We also employ a novel normal compression format for storing high-precision object-space normals. Finally, we present a variable-radius post-process filtering technique for smoothing out blockiness caused by discrete sampling of shading attributes.
Our benchmarks show that our voxel representation is competitive with triangle-based representations in terms of ray casting performance, while allowing tremendously greater geometric detail and unique shading information for every voxel.
@inproceedings{Laine2010i3d, author = {Samuli Laine and Tero Karras}, title = {Efficient Sparse Voxel Octrees}, booktitle = {Proceedings of ACM SIGGRAPH 2010 Symposium on Interactive 3D Graphics and Games}, pages = {55--63}, year = {2010}, publisher = {ACM Press}, }
Understanding the Efficiency of Ray Traversal on GPUs.
Timo Aila and Samuli Laine. High-Performance Graphics 2009. Best paper 3rd place Test of time award Abstract Bibtex [PDF] [Slides] [Code] [Project page] |
We discuss the mapping of elementary ray tracing operations (acceleration structure traversal and primitive intersection) onto wide SIMD/SIMT machines. Our focus is on NVIDIA GPUs, but some of the observations should be valid for other wide machines as well. While several fast GPU tracing methods have been published, very little is actually understood about their performance. Nobody knows whether the methods are anywhere near the theoretically obtainable limits, and if not, what might be causing the discrepancy. We study this question by comparing the measurements against a simulator that tells the upper bound of performance for a given kernel. We observe that previously known methods are a factor of 1.5–2.5x off from the theoretical optimum, and most of the gap is not explained by memory bandwidth, but rather by previously unidentified inefficiencies in hardware work distribution. We then propose a simple solution that significantly narrows the gap between simulation and measurement. This results in the fastest GPU ray tracer to date. We provide results for primary, ambient occlusion and diffuse interreflection rays.
@inproceedings{Aila2009hpg, author = {Timo Aila and Samuli Laine}, title = {Understanding the Efficiency of Ray Traversal on {GPU}s}, booktitle = {Proceedings of High-Performance Graphics 2009}, pages = {145--149}, year = {2009}, }
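The proposed solution is the scheme now widely known as persistent threads: launch only enough threads to fill the machine and let each fetch batches of rays from a shared counter, rather than trusting the hardware work distributor with one thread per ray. A CPU-flavored C++ sketch using std::atomic follows; the batch size and the tracing routine are placeholders, and the real implementation is a CUDA kernel.

#include <algorithm>
#include <atomic>

void traceRay(int rayIndex);   // assumed per-ray kernel body

std::atomic<int> nextRay{0};   // shared work counter

void persistentThread(int numRays, int batch /* e.g. 32 rays */)
{
    for (;;)
    {
        int first = nextRay.fetch_add(batch);   // grab the next batch
        if (first >= numRays)
            return;                             // pool exhausted
        int last = std::min(first + batch, numRays);
        for (int r = first; r < last; ++r)
            traceRay(r);
    }
}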
Accelerated Beam Tracing Algorithm.
Samuli Laine, Samuel Siltanen, Tapio Lokki and Lauri Savioja. Applied Acoustics 70(1), 2009. Abstract Bibtex [PDF] [Code] |
Determining early specular reflection paths is essential for room acoustics modeling. Beam tracing algorithms have been used to calculate these paths efficiently, thus allowing modeling of acoustics in real-time with a moving listener in simple, or complex but densely occluded, environments with a stationary sound source. In this paper it is shown that beam tracing algorithms can still be optimized by utilizing the spatial coherence in path validation with a moving listener. Since the precalculations required for the presented technique are relatively fast, the acoustic reflection paths can be calculated even for a moving source in simple cases. Simulations were performed to show how the accelerated algorithm compares with the basic algorithm with varying scene complexity and occlusion. Up to two orders of magnitude speed-up was achieved.
@article{Laine2009aa, author = {Samuli Laine and Samuel Siltanen and Tapio Lokki and Lauri Savioja}, title = {Accelerated Beam Tracing Algorithm}, journal = {Applied Acoustics}, volume = {70}, number = {1}, year = {2009}, pages = {172--181}, }
Incremental Instant Radiosity.
Hannu Saransaari, Samuli Laine, Janne Kontkanen, Jaakko Lehtinen and Timo Aila. Article published in book ShaderX6, Charles River Media, 2008. Bibtex [Code] |
@inbook{Saransaari2008shaderx6, author = {Hannu Saransaari and Samuli Laine and Janne Kontkanen and Jaakko Lehtinen and Timo Aila}, title = {Incremental Instant Radiosity}, editor = {Wolfgang Engel}, booktitle = {ShaderX^6}, year = {2008}, pages = {381--391}, chapter = {6.2}, }
Incremental Instant Radiosity for Real-Time Indirect Illumination.
Samuli Laine, Hannu Saransaari, Janne Kontkanen, Jaakko Lehtinen and Timo Aila. Eurographics Symposium on Rendering 2007. Abstract Bibtex [PDF] [Animations] [Slides] [Code] |
We present a method for rendering single-bounce indirect illumination in real time on currently available graphics hardware. The method is based on the instant radiosity algorithm, where virtual point lights (VPLs) are generated by casting rays from the primary light source. Hardware shadow maps are then employed for determining the indirect illumination from the VPLs. Our main contribution is an algorithm for reusing the VPLs and incrementally maintaining their good distribution. As a result, only a few shadow maps need to be rendered per frame as long as the motion of the primary light source is reasonably smooth. This yields real-time frame rates even when hundreds of VPLs are used.
@inproceedings{Laine2007egsr, author = {Samuli Laine and Hannu Saransaari and Janne Kontkanen and Jaakko Lehtinen and Timo Aila}, title = {Incremental Instant Radiosity for Real-Time Indirect Illumination}, booktitle = {Proceedings of Eurographics Symposium on Rendering 2007}, pages = {277--286}, year = {2007}, publisher = {Eurographics Association}, }
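A minimal per-frame maintenance loop in the spirit of the abstract might look as follows in C++; the helpers are assumed placeholders, and the paper's actual policy for keeping the VPL distribution good is considerably more refined than this budget loop.

#include <vector>

struct VPL { /* position, normal, flux, shadow map handle, ... */ };

bool stillValid(const VPL&);       // assumed: primary light still sees it
VPL  spawnFromLight();             // assumed: cast a new ray from the light
void renderShadowMap(const VPL&);  // the expensive, rationed step

void updateVPLs(std::vector<VPL>& vpls, int budgetPerFrame)
{
    // Drop VPLs invalidated by the moving primary light (C++20 erase_if).
    std::erase_if(vpls, [](const VPL& v) { return !stillValid(v); });

    // Only the few newly spawned VPLs need freshly rendered shadow maps.
    for (int i = 0; i < budgetPerFrame; ++i)
    {
        VPL v = spawnFromLight();
        renderShadowMap(v);
        vpls.push_back(v);
    }
}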
An Improved Physically-Based Soft Shadow Volume Algorithm.
Jaakko Lehtinen, Samuli Laine and Timo Aila. Computer Graphics Forum 25(3) (Eurographics 2006). Abstract Bibtex [PDF] |
We identify and analyze several performance problems in a state-of-the-art physically-based soft shadow volume algorithm, and present an improved method that alleviates these problems by replacing an overly conservative spatial acceleration structure by a more efficient one. The new technique consistently outperforms both the previous method and a ray tracing-based reference solution in several realistic situations while retaining the correctness of the solution and other desirable characteristics of the previous method. These include the unintrusiveness of the original algorithm, meaning that our method can be used as a black-box shadow solver in any offline renderer without requiring multiple passes over the image or other special accommodation. We achieve speedup factors from 1.6 to 12.3 when compared to the previous method.
@article{Lehtinen2006eurographics, author = {Jaakko Lehtinen and Samuli Laine and Timo Aila}, title = {An Improved Physically-Based Soft Shadow Volume Algorithm}, journal = {Computer Graphics Forum}, volume = {25}, number = {3}, year = {2006}, pages = {303--312}, publisher = {Eurographics Association and Blackwell Publishing Ltd}, }
A Weighted Error Metric and Optimization Method for Antialiasing Patterns.
Samuli Laine and Timo Aila. Computer Graphics Forum 25(1), 2006. Abstract Bibtex [PDF] [Pattern page] |
Displaying a synthetic image on a computer display requires determining the colors of individual pixels. To avoid aliasing, multiple samples of the image can be taken per pixel, after which the color of a pixel may be computed as a weighted sum of the samples. The positions and weights of the samples play a major role in the resulting image quality, especially in real-time applications where usually only a handful of samples can be afforded per pixel. This paper presents a new error metric and an optimization method for antialiasing patterns used in image reconstruction. The metric is based on comparing the pattern against a given reference reconstruction filter in spatial domain and it takes into account psychovisually measured angle-specific acuities for sharp features.
@article{Laine2006cgf, author = {Samuli Laine and Timo Aila}, title = {A Weighted Error Metric and Optimization Method for Antialiasing Patterns}, journal = {Computer Graphics Forum}, volume = {25}, number = {1}, year = {2006}, pages = {83--94}, publisher = {Eurographics Association and Blackwell Publishing Ltd}, }
Sampling Precomputed Volumetric Lighting.
Janne Kontkanen and Samuli Laine. Journal of Graphics Tools 11(3), 2006. Abstract Bibtex [PDF] |
Precomputing volumetric lighting allows realistic mutual shadowing and reflections between objects with little runtime cost: for example, using an irradiance volume the shadows and reflections due to a static scene can be precomputed into a three-dimensional grid and this grid can be used to shade moving objects at runtime. However, a rather low spatial resolution has to be used to keep the memory requirements acceptable. For this reason, these methods often suffer from aliasing artifacts.
In this article we introduce a new sampling algorithm for precomputing lighting into a regular three-dimensional grid. The advantage of the new method is that it dramatically reduces aliasing while adding only a small overhead for the precomputation time. Additionally, the runtime component does not have to be changed at all.
@article{Kontkanen2006jgt, author = {Janne Kontkanen and Samuli Laine}, title = {Sampling Precomputed Volumetric Lighting}, journal = {Journal of Graphics Tools}, year = {2006}, pages = {1--16}, volume = {11}, number = {3}, }
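One plausible reading of the precomputation change, sketched in C++ below, is to average several jittered lighting evaluations per grid vertex so the grid stores a band-limited signal; this is an assumption for illustration, not the article's exact algorithm, but it is consistent with the runtime component staying unchanged.

struct Vec3  { float x, y, z; };
struct Color { float r, g, b; };

Color evalLighting(const Vec3& p);                 // assumed scene query
Vec3  jitterOffset(int i, int n, float cellSize);  // assumed jitter pattern

Color precomputeVertex(const Vec3& v, float cellSize, int n)
{
    Color sum{0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; ++i)
    {
        Vec3 o = jitterOffset(i, n, cellSize);
        Color c = evalLighting({v.x + o.x, v.y + o.y, v.z + o.z});
        sum = {sum.r + c.r, sum.g + c.g, sum.b + c.b};
    }
    return {sum.r / n, sum.g / n, sum.b / n};   // filtered, not point-sampled
}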
A Family of Inexpensive Sampling Schemes.
Jon Hasselgren, Tomas Akenine-Möller and Samuli Laine. Computer Graphics Forum 24(4), 2005. Abstract Bibtex [PDF] |
To improve image quality in computer graphics, antialiasing techniques such as supersampling and multisampling are used. We explore a family of inexpensive sampling schemes that cost between 1.25 and 2.0 samples per pixel. By placing sample points in the corners or on the edges of the pixels, sharing can occur between pixels, and this makes it possible to create inexpensive sampling schemes. Using an evaluation and optimization framework, we present optimized sampling patterns costing 1.25, 1.5, 1.75 and 2.0 samples per pixel.
@article{Hasselgren2005cgf, author = {Jon Hasselgren and Tomas Akenine-M\"oller and Samuli Laine}, title = {A Family of Inexpensive Sampling Schemes}, journal = {Computer Graphics Forum}, volume = {24}, number = {4}, year = {2005}, pages = {843--848}, publisher = {Eurographics Association and Blackwell Publishing Ltd}, }
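The fractional costs follow from sample sharing; as an illustrative accounting (the specific patterns are examples, not taken from the paper's text), a corner sample is shared by four pixels and an edge sample by two, so

\[ \text{cost} = n_{\text{interior}} + \frac{n_{\text{edge}}}{2} + \frac{n_{\text{corner}}}{4}, \]

giving, for instance, $1 + \tfrac{1}{4} = 1.25$ samples per pixel for one interior plus one corner sample, and $1 + \tfrac{2}{2} = 2.0$ for one interior plus two edge samples.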
Soft Shadow Volumes for Ray Tracing.
Samuli Laine, Timo Aila, Ulf Assarsson, Jaakko Lehtinen and Tomas Akenine-Möller. ACM Transactions on Graphics 24(3) (SIGGRAPH 2005). Proceedings cover image Abstract Bibtex [PDF] [Slides] |
We present a new, fast algorithm for rendering physically-based soft shadows in ray tracing-based renderers. Our method replaces the hundreds of shadow rays commonly used in stochastic ray tracers with a single shadow ray and a local reconstruction of the visibility function. Compared to tracing the shadow rays, our algorithm produces exactly the same image while executing one to two orders of magnitude faster in the test scenes used. Our first contribution is a two-stage method for quickly determining the silhouette edges that overlap an area light source, as seen from the point to be shaded. Secondly, we show that these partial silhouettes of occluders, along with a single shadow ray, are sufficient for reconstructing the visibility function between the point and the light source.
@article{Laine2005sg, author = {Samuli Laine and Timo Aila and Ulf Assarsson and Jaakko Lehtinen and Tomas Akenine-M\"oller}, title = {Soft Shadow Volumes for Ray Tracing}, journal = {ACM Transactions on Graphics}, year = {2005}, pages = {1156--1165}, volume = {24}, number = {3}, publisher = {ACM}, }
Hierarchical Penumbra Casting.
Samuli Laine and Timo Aila. Computer Graphics Forum 24(3) (Eurographics 2005). Abstract Bibtex [PDF] [Slides] Note: Math symbol ℓ is missing in the CGF print. |
We present a novel algorithm for rendering physically-based soft shadows in complex scenes. Instead of casting shadow rays, we place both the points to be shaded and the samples of an area light source into separate hierarchies, and compute hierarchically the shadows caused by each occluding triangle. This yields an efficient algorithm with memory requirements independent of the complexity of the scene.
@article{Laine2005eurographics, author = {Samuli Laine and Timo Aila}, title = {Hierarchical Penumbra Casting}, journal = {Computer Graphics Forum}, volume = {24}, number = {3}, year = {2005}, pages = {313--322}, publisher = {Eurographics Association and Blackwell Publishing Ltd}, }
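A minimal C++ sketch of the dual-hierarchy recursion follows: each occluding triangle is tested against pairs of nodes, one from the receiver-point hierarchy and one from the light-sample hierarchy, so whole blocks of point-to-sample relations are culled or resolved at once. All types, tests, and the split heuristic are illustrative assumptions.

struct Triangle {};
struct Node { Node* child[2]; bool isLeaf; /* bounding volume ... */ };

bool cannotBlock(const Triangle&, const Node& recv, const Node& light); // assumed
bool blocksAll(const Triangle&, const Node& recv, const Node& light);   // assumed
void markBlocked(Node& recv, Node& light);                              // assumed
bool splitReceiverFirst(const Node& recv, const Node& light);           // assumed

void castPenumbra(const Triangle& tri, Node& recv, Node& light)
{
    if (cannotBlock(tri, recv, light))
        return;                        // cull the whole pair
    if (blocksAll(tri, recv, light))
    {
        markBlocked(recv, light);      // resolve the whole pair at once
        return;
    }
    // Refine whichever hierarchy the heuristic picks and recurse;
    // for leaf-leaf pairs the two tests above are exact and decide.
    if (!recv.isLeaf && (light.isLeaf || splitReceiverFirst(recv, light)))
    {
        castPenumbra(tri, *recv.child[0], light);
        castPenumbra(tri, *recv.child[1], light);
    }
    else if (!light.isLeaf)
    {
        castPenumbra(tri, recv, *light.child[0]);
        castPenumbra(tri, recv, *light.child[1]);
    }
}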
Split-Plane Shadow Volumes.
Samuli Laine. Graphics Hardware 2005. Abstract Bibtex [PDF] [Animations] [Slides] |
We present a novel method for rendering shadow volumes. The core idea of the method is to locally choose between Z-pass and Z-fail algorithms on a per-tile basis. The choice is made by comparing the contents of the low-resolution depth buffer against an automatically constructed split plane. We show that this reduces the number of stencil updates substantially without affecting the resulting shadows. We outline a simple and efficient hardware implementation that enables the early tile culling stages to reject considerably more pixels than with shadow volume optimizations currently available in the hardware.
@inproceedings{Laine2005gh, author = {Samuli Laine}, title = {Split-Plane Shadow Volumes}, booktitle = {Proceedings of Graphics Hardware 2005}, pages = {23--32}, year = {2005}, publisher = {Eurographics Association}, }
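The per-tile decision can be sketched in a few lines of C++; the orientation convention of the split plane is an assumption here (the paper constructs the plane automatically per shadow volume), but the structure of the choice is the point: compare the tile's coarse depth bounds against the plane and fall back conservatively when the tile straddles it.

enum class StencilMethod { ZPass, ZFail };

struct TileDepth { float zMin, zMax; };  // low-resolution depth bounds

StencilMethod chooseMethod(const TileDepth& tile, float splitZ)
{
    if (tile.zMax < splitZ) return StencilMethod::ZPass;  // wholly in front
    if (tile.zMin > splitZ) return StencilMethod::ZFail;  // wholly behind
    return StencilMethod::ZFail;                          // straddles: play safe
}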
A General Algorithm for Output-Sensitive Visibility Preprocessing.
Samuli Laine. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2005. Abstract Bibtex [PDF] [Slides] |
Occlusion culling based on precomputed visibility information is a standard method for accelerating the rendering in real-time graphics applications. In this paper we present a new general algorithm that performs the visibility precomputation for a group of viewcells in an output-sensitive fashion. This is achieved by exploiting the directional coherence of visibility between adjacent viewcells. The algorithm is independent of the underlying from-region visibility solver and is therefore applicable to exact, conservative and aggressive visibility solvers in both 2D and 3D.
@inproceedings{Laine2005i3d, author = {Samuli Laine}, title = {A General Algorithm for Output-Sensitive Visibility Preprocessing}, booktitle = {Proceedings of ACM SIGGRAPH 2005 Symposium on Interactive 3D Graphics and Games}, pages = {31--39}, year = {2005}, publisher = {ACM Press}, }
Ambient Occlusion Fields.
Janne Kontkanen and Samuli Laine. Article published in book ShaderX4, Charles River Media, 2006. Bibtex |
@inbook{Kontkanen2006shaderx4, author = {Janne Kontkanen and Samuli Laine}, title = {Ambient Occlusion Fields}, editor = {Wolfgang Engel}, booktitle = {ShaderX^4}, year = {2005}, pages = {101--108}, chapter = {2.4}, }
Ambient Occlusion Fields.
Janne Kontkanen and Samuli Laine. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2005. Abstract Bibtex [PDF] [Video] |
We present a novel real-time technique for computing inter-object ambient occlusion. For each occluding object, we precompute a field in the surrounding space that encodes an approximation of the occlusion caused by the object. This volumetric information is then used at run-time in a fragment program for quickly determining the shadow cast on the receiving objects. According to our results, both the computational and storage requirements are low enough for the technique to be directly applicable to computer games running on current graphics hardware.
@inproceedings{Kontkanen2005i3d, author = {Janne Kontkanen and Samuli Laine}, title = {Ambient Occlusion Fields}, booktitle = {Proceedings of ACM SIGGRAPH 2005 Symposium on Interactive 3D Graphics and Games}, pages = {41--48}, year = {2005}, publisher = {ACM Press}, }
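A C++ sketch of the run-time side suggested by the abstract: each occluder carries a precomputed field that is evaluated at the receiving point in the occluder's local frame, and occluders are combined by multiplying visibilities. The field representation, lookup, and combination rule are assumptions for illustration.

#include <vector>

struct Vec3 { float x, y, z; };
struct OcclusionField { /* volume texture, local frame, ... */ };

Vec3  toLocalPoint(const OcclusionField&, const Vec3& p);  // assumed
Vec3  toLocalDir(const OcclusionField&, const Vec3& n);    // assumed
float evalOcclusion(const OcclusionField&, const Vec3& pLocal,
                    const Vec3& nLocal);                   // assumed lookup

float ambientVisibility(const std::vector<OcclusionField>& occluders,
                        const Vec3& p, const Vec3& n)
{
    float v = 1.0f;
    for (const OcclusionField& f : occluders)
        v *= 1.0f - evalOcclusion(f, toLocalPoint(f, p), toLocalDir(f, n));
    return v;   // 1 = unshadowed by any occluder
}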
Alias-Free Shadow Maps.
Timo Aila and Samuli Laine. Eurographics Symposium on Rendering 2004. Abstract Bibtex [PDF] [Slides] [Hairball in .OBJ format] |
In this paper we abandon the regular structure of shadow maps. Instead, we transform the visible pixels P(x,y,z) from screen space to the image plane of a light source P'(x',y',z'). The (x',y') are then used as sampling points when the geometry is rasterized into the shadow map. This eliminates the resolution issues that have plagued shadow maps for decades, e.g., jagged shadow boundaries. Incorrect self-shadowing is also greatly reduced, and semi-transparent shadow casters and receivers can be supported. A hierarchical software implementation is outlined.
@inproceedings{Aila2004egsr, author = {Timo Aila and Samuli Laine}, title = {Alias-Free Shadow Maps}, booktitle = {Proceedings of Eurographics Symposium on Rendering 2004}, pages = {161--166}, year = {2004}, publisher = {Eurographics Association}, }
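The first step of the abstract can be sketched directly in C++: project every visible pixel into the light's image plane and collect the resulting (x', y', z') points; the shadow map is then rasterized against exactly these irregular positions instead of a regular grid. Both helpers are assumed placeholders.

#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 visiblePoint(int px, int py);     // assumed: world-space P(x, y, z)
Vec3 toLightImagePlane(const Vec3&);   // assumed: returns P'(x', y', z')

std::vector<Vec3> gatherLightSamples(int width, int height)
{
    std::vector<Vec3> samples;
    samples.reserve(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            samples.push_back(toLightImagePlane(visiblePoint(x, y)));
    return samples;   // irregular sampling points for the shadow map
}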
Proceedings
Proceedings of High-Performance Graphics 2010. Michael Doggett, Samuli Laine and Warren Hunt (editors). Saarbrücken, Germany. Bibtex [Preface and Table of Contents] [EG DL] [ACM DL] |
@proceedings{HPG10-proc, editor = {Michael Doggett and Samuli Laine and Warren Hunt}, title = {High-Performance Graphics 2010}, year = {2010}, isbn = {978-3-905674-26-2}, issn = {2079-8679}, address = {Saarbr\"{u}cken, Germany}, publisher = {Eurographics Association}, }
Theses
Efficient Physically-Based Shadow Algorithms. Samuli Laine. Doctoral thesis, Helsinki University of Technology, August 2006. Abstract Bibtex [PDF] |
This research focuses on developing efficient algorithms for computing shadows in computer-generated images. A distinctive feature of the shadow algorithms presented in this thesis is that they produce correct, physically-based results, instead of giving approximations whose quality is often hard to ensure or evaluate.
Light sources that are modeled as points without any spatial extent produce hard shadows with sharp boundaries. Shadow mapping is a traditional method for rendering such shadows. A shadow map is a depth buffer computed from the scene, using a point light source as the viewpoint. The finite resolution of the shadow map requires that its contents be resampled when determining the shadows on visible surfaces. This causes various artifacts such as incorrect self-shadowing and jagged shadow boundaries. A novel method is presented that avoids the resampling step and provides exact shadows for every point visible in the image.
The shadow volume algorithm is another commonly used algorithm for real-time rendering of hard shadows. This algorithm gives exact results and does not suffer from any resampling problems, but it tends to consume a lot of fillrate, which leads to performance problems. This thesis presents a new technique for locally choosing between two previous shadow volume algorithms with different performance characteristics. A simple criterion for making the local choices is shown to yield better performance than using either of the algorithms alone.
Light sources with nonzero spatial extent give rise to soft shadows with smooth boundaries. A novel method is presented that transposes the classical processing order for soft shadow computation in offline rendering. Instead of casting shadow rays, the algorithm first conceptually collects every ray that would need to be cast, and then processes the shadow-casting primitives one by one, hierarchically finding the rays that are blocked.
Another new soft shadow algorithm takes a different point of view on computing shadows. Only the silhouettes of the shadow casters are used for determining the shadows, and an unintrusive execution model makes the algorithm practical for production use in offline rendering.
The proposed techniques accelerate the computing of physically-based shadows in real-time and offline rendering. These improvements make it possible to use correct, physically-based shadows in a broad range of scenes that previous methods cannot handle efficiently enough.
@phdthesis{Laine2006phd, author = {Samuli Laine}, title = {Efficient Physically-Based Shadow Algorithms}, school = {Helsinki University of Technology}, month = {August}, year = {2006}, }
An Incremental Shaft Subdivision Algorithm for Computing Shadows and Visibility. Samuli Laine. Master's thesis, Helsinki University of Technology, March 2006. Abstract Bibtex [PDF] |
The rendering of soft shadows is an important task in computer graphics. Soft shadows appear when the light source is not modeled as a single point but as an object with nonzero surface area. Obtaining correct physically-based shadows requires determining the amount of light that flows from the light source to a receiving point on the surface being rendered. This is generally computationally expensive, and efficient solution methods are needed for keeping the rendering times at a tolerable level.
There is usually significant coherence in shadows among nearby receiving points, and nearby parts of a light source also tend to contribute to the image in a similar fashion. Exploiting these forms of coherence is the key element of modern soft shadow algorithms.
This thesis presents a novel physically-based soft shadow algorithm that attempts to exploit the coherence as much as possible, solving the shadow relations in large chunks instead of considering single points in the emitting or receiving end. The computation of shadow relations is performed hierarchically, and an efficient representation of shadow-casting geometry is maintained incrementally. The algorithm is a generic tool for solving sets of visibility relations in polygonal scenes, and may have uses in areas beyond shadow computation.
In addition to presenting the novel algorithm in detail, several existing physically-based shadow algorithms are analyzed and ranked according to their computational complexities. Experimental results are also presented to illustrate the applicability of the novel algorithm in different kinds of rendering situations.
@mastersthesis{Laine2006msc, author = {Samuli Laine}, title = {An Incremental Shaft Subdivision Algorithm for Computing Shadows and Visibility}, school = {Helsinki University of Technology}, month = {March}, year = {2006}, }