Compare commits

..

60 Commits

Author SHA1 Message Date
ReinUsesLisp
6177cbdbe1 gl_shader_decompiler: Fixup slow path 2019-09-04 15:03:51 -03:00
ReinUsesLisp
9cf52d027d gl_device: Disable precise in fragment shaders on bugged drivers 2019-09-04 01:54:00 -03:00
ReinUsesLisp
03276e7490 gl_shader_decompiler: Fixup AMD's slow path type 2019-09-04 01:54:00 -03:00
ReinUsesLisp
6c449793b8 gl_shader_decompiler: Rework GLSL decompiler type system
GLSL decompiler type system was broken. We converted all return values
to float except for some cases where returning we couldn't and
implicitly broke the rule of returning floats (e.g. for bools or bool
pairs).

Instead of doing this introduce class Expression that knows what type a
return value has and when a consumer wants to use the string it asks for
it with a required type, emitting a runtime error if types are
incompatible.

This has the disadvantage that there's more C++ code, but we can emit
better GLSL code that's easier to read.
2019-09-04 01:54:00 -03:00
David
922c7f4e51 Merge pull request #2835 from chris062689/master
CI Fix (Azure/Testing) - apt update before upgrade. Use apt-get.
2019-09-04 14:37:29 +10:00
Flame Sage
ce3edaad26 Changed apt-get upgrade for specific package. 2019-09-04 00:15:51 -04:00
Flame Sage
573a1e7662 apt update before upgrade. Use apt-get.
Identified a bug in the script which uses the azure image when attempting to upgrade python3-pip.
Package index was out of date because apt-get update was not ran before attempting the upgrade.
2019-09-04 03:49:55 +00:00
David
e1981b8b8d Merge pull request #2708 from DarkLordZach/mii-db-source-crash
mii: Handle logging of unknown database source
2019-09-04 13:07:10 +10:00
bunnei
19af91434e Merge pull request #2793 from ReinUsesLisp/bgr565
renderer_opengl: Implement RGB565 framebuffer format
2019-09-03 22:36:32 -04:00
bunnei
81fbc5370d Merge pull request #2812 from ReinUsesLisp/f2i-selector
shader_ir/conversion: Implement F2I and F2F F16 selector
2019-09-03 22:35:33 -04:00
bunnei
d4f33b822b Merge pull request #2811 from ReinUsesLisp/fsetp-fix
float_set_predicate: Add missing negation bit for the second operand
2019-09-03 22:34:34 -04:00
bunnei
137d165672 Merge pull request #2826 from ReinUsesLisp/macro-binding
maxwell_3d: Fix macro binding cursor
2019-09-03 22:32:42 -04:00
bunnei
86b39e0677 Merge pull request #2831 from FearlessTobi/port-4914
Port citra-emu/citra#4914: "Fix to Windows sleep issues"
2019-09-03 22:32:09 -04:00
bunnei
7397289266 Merge pull request #2832 from FearlessTobi/im-an-idiot
configuration/config: Add missing screenshot path read
2019-09-03 22:31:32 -04:00
fearlessTobi
952f010c2c configuration/config: Add missing screenshot path read
I missed this in my original PR (https://github.com/yuzu-emu/yuzu/pull/1886).
2019-09-04 03:08:15 +02:00
fearlessTobi
4ea572791b Fix to Windows sleep issues
Co-Authored-By: Vitor K <vitor-k@users.noreply.github.com>
2019-09-03 23:00:34 +02:00
bunnei
50b5bb44a0 Merge pull request #2765 from FernandoS27/dma-fix
MaxwellDMA: Fixes, corrections and relaxations.
2019-09-01 13:13:05 -04:00
ReinUsesLisp
52a41f482f maxwell_3d: Fix macro binding cursor 2019-09-01 05:01:11 -03:00
Rodrigo Locatti
4d4f9cc104 video_core: Silent miscellaneous warnings (#2820)
* texture_cache/surface_params: Remove unused local variable

* rasterizer_interface: Add missing documentation commentary

* maxwell_dma: Remove unused rasterizer reference

* video_core/gpu: Sort member declaration order to silent -Wreorder warning

* fermi_2d: Remove unused MemoryManager reference

* video_core: Silent unused variable warnings

* buffer_cache: Silent -Wreorder warnings

* kepler_memory: Remove unused MemoryManager reference

* gl_texture_cache: Add missing override

* buffer_cache: Add missing include

* shader/decode: Remove unused variables
2019-08-30 14:08:00 -04:00
Fernando Sahmkow
67cc2d5046 Merge pull request #2819 from ReinUsesLisp/fixup-clang
gl_buffer_cache: Add missing include
2019-08-29 18:13:35 -04:00
ReinUsesLisp
878adee0a3 gl_buffer_cache: Add missing include
RasterizerInterface was considered an incomplete object by clang.
2019-08-29 22:02:52 +00:00
bunnei
a67c4e6e02 Merge pull request #2742 from ReinUsesLisp/fix-texture-buffers
gl_texture_cache: Miscellaneous texture buffer fixes
2019-08-29 15:59:17 -04:00
James Rowe
f0c75573b1 Revert "externals: Update FMT to 6.0.0"
This reverts commit ca4ca8a6dc.
2019-08-29 12:23:34 -06:00
Ethan
ca4ca8a6dc externals: Update FMT to 6.0.0 2019-08-29 19:37:46 +02:00
bunnei
e424615839 Merge pull request #2783 from FernandoS27/new-buffer-cache
Implement a New LLE Buffer Cache
2019-08-29 13:07:01 -04:00
bunnei
f8cc5668f8 Merge pull request #2758 from ReinUsesLisp/packed-tid
shader/decode: Implement S2R Tic
2019-08-29 12:58:43 -04:00
bunnei
680ab61327 Merge pull request #2786 from ReinUsesLisp/vote
shader_ir: Implement VOTE on Nvidia drivers
2019-08-29 12:58:10 -04:00
ReinUsesLisp
e3534700d7 shader_ir/conversion: Split int and float selector and implement F2F H1 2019-08-28 16:09:33 -03:00
ReinUsesLisp
b13fbc25b8 shader_ir/conversion: Implement F2I F16 Ra.H1 2019-08-27 23:40:40 -03:00
ReinUsesLisp
6207751b00 float_set_predicate: Add missing negation bit for the second operand 2019-08-27 21:57:43 -03:00
ReinUsesLisp
4e35177e23 shader_ir: Implement VOTE
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics

Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.

To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:

* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true

ballotARB, also known as "uint64_t(activeThreadsNV())", emits

VOTE.ANY Rd, PT, PT;

on nouveau's compiler. This doesn't match exactly to Nvidia's code

VOTE.ALL Rd, PT, PT;

Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
2019-08-21 14:50:38 -03:00
Fernando Sahmkow
83ec2091c1 Buffer Cache: Adress Feedback. 2019-08-21 12:14:27 -04:00
Fernando Sahmkow
6ce2c85047 Buffer_Cache: Implement flushing. 2019-08-21 12:14:26 -04:00
Fernando Sahmkow
de8ff8a1c6 Buffer_Cache: Implement barriers. 2019-08-21 12:14:25 -04:00
Fernando Sahmkow
286f4c446a Buffer_Cache: Optimize and track written areas. 2019-08-21 12:14:25 -04:00
Fernando Sahmkow
5f4b746a1e BufferCache: Rework mapping caching. 2019-08-21 12:14:24 -04:00
Fernando Sahmkow
86d8563314 Buffer_Cache: Fixes and optimizations. 2019-08-21 12:14:23 -04:00
Fernando Sahmkow
862bec001b Video_Core: Implement a new Buffer Cache 2019-08-21 12:14:22 -04:00
bunnei
b4a8cfbd00 Merge pull request #2748 from FernandoS27/align-memory
VM_Manager: Align allocated host physical memory to 256bytes
2019-08-21 12:10:10 -04:00
bunnei
d654b3d82e Merge pull request #2769 from FernandoS27/commands-flush
GPU: Flush commands on every dma pusher step.
2019-08-21 10:29:56 -04:00
bunnei
dfdd20142e Merge pull request #2777 from ReinUsesLisp/hsetp2-fe3h-fix
half_set_predicate: Fix HSETP2_C constant buffer offset
2019-08-21 10:29:17 -04:00
bunnei
cedc1aab4a Merge pull request #2753 from FernandoS27/float-convert
Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
2019-08-21 10:27:57 -04:00
bunnei
74a7ce1df7 Merge pull request #2773 from lioncash/test-unused
yuzu-tester/yuzu: Remove unused variable
2019-08-21 10:27:29 -04:00
ReinUsesLisp
ec0da3ef64 half_set_predicate: Fix HSETP2_C constant buffer offset 2019-08-04 02:50:55 -03:00
Lioncash
6e11cfcdf0 yuzu-tester/yuzu: Correct format string
Prevents an invalid formatting exception from being thrown.
2019-07-29 20:55:48 -04:00
Lioncash
a0ee10b114 yuzu-tester/yuzu: Remove unused variable
Gets rid of a compilation warning.
2019-07-29 20:50:33 -04:00
Fernando Sahmkow
e52c895559 GPU: Flush commands on every dma pusher step.
This commit ensures that the host gpu is constantly fed with commands to
work with, while the guest gpu keeps producing the rest of the commands.
This reduces syncing time between host and guest gpu.
2019-07-26 16:54:22 -04:00
Fernando Sahmkow
a452ff983d MaxwellDMA: Fixes, corrections and relaxations.
This commit fixes offsets on Linear -> Tiled copies, corrects z pos
fortiled->linear copies, corrects bytes_per_pixel calculation in tiled
-> linear copies and relaxes some limitations set by latest dma fixes
refactors.
2019-07-25 20:41:42 -04:00
ReinUsesLisp
104641db07 shader/decode: Implement S2R Tic 2019-07-22 16:16:10 -03:00
Fernando Sahmkow
11f4e739bd Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
This commit takes care of implementing the F16 Variants of the 
conversion instructions and makes sure conversions are done.
2019-07-20 17:38:25 -04:00
Fernando Sahmkow
febb88efc4 Common/Alignment: Add noexcept where required. 2019-07-19 21:49:54 -04:00
Fernando Sahmkow
024b5fe91a Kernel: Address Feedback 2019-07-19 11:28:57 -04:00
Fernando Sahmkow
0901c33753 Common: Correct alignment allocator to work on C++14 or higher. 2019-07-19 11:11:42 -04:00
Fernando Sahmkow
9bede4eeed VM_Manager: Align allocated memory to 256bytes
This commit ensures that all backing memory allocated for the Guest CPU
is aligned to 256 bytes. This due to how gpu memory works and the heavy
constraints it has in the alignment of physical memory.
2019-07-19 10:06:08 -04:00
ReinUsesLisp
74632c76ce gl_shader_decompiler: Rename bufferImage to imageBuffer
The online OpenGL documentation is wrong. The type definition is
imageBuffer.
2019-07-18 01:16:44 -03:00
ReinUsesLisp
87909d327f gl_shader_cache: Fix newline on buffer preprocessor definitions 2019-07-18 01:16:15 -03:00
ReinUsesLisp
e7bdf8b22a textures: Fix texture buffer size calculation 2019-07-18 01:07:08 -03:00
ReinUsesLisp
84027f4808 gl_texture_cache: Do not set texture parameters to buffers 2019-07-18 01:06:26 -03:00
ReinUsesLisp
73b2dc6d4f gl_texture_cache: Add missing break in CreateTexture 2019-07-18 01:04:18 -03:00
Zach Hilman
37a352e9d3 mii: Handle logging of unknown database source 2019-07-10 07:07:24 -04:00
71 changed files with 1715 additions and 878 deletions

View File

@@ -10,7 +10,7 @@ jobs:
BuildSuffix: 'windows-testing'
ScriptFolder: 'windows'
steps:
- script: sudo apt upgrade python3-pip && pip install requests urllib3
- script: sudo apt-get update && sudo apt-get --only-upgrade -y install python3-pip && pip install requests urllib3
displayName: 'Prepare Environment'
- task: PythonScript@0
condition: eq(variables['Build.Reason'], 'PullRequest')

View File

@@ -81,6 +81,7 @@ set(HASH_FILES
"${VIDEO_CORE}/shader/decode/register_set_predicate.cpp"
"${VIDEO_CORE}/shader/decode/shift.cpp"
"${VIDEO_CORE}/shader/decode/video.cpp"
"${VIDEO_CORE}/shader/decode/warp.cpp"
"${VIDEO_CORE}/shader/decode/xmad.cpp"
"${VIDEO_CORE}/shader/control_flow.cpp"
"${VIDEO_CORE}/shader/control_flow.h"

View File

@@ -55,6 +55,7 @@ add_custom_command(OUTPUT scm_rev.cpp
"${VIDEO_CORE}/shader/decode/register_set_predicate.cpp"
"${VIDEO_CORE}/shader/decode/shift.cpp"
"${VIDEO_CORE}/shader/decode/video.cpp"
"${VIDEO_CORE}/shader/decode/warp.cpp"
"${VIDEO_CORE}/shader/decode/xmad.cpp"
"${VIDEO_CORE}/shader/control_flow.cpp"
"${VIDEO_CORE}/shader/control_flow.h"

View File

@@ -3,6 +3,7 @@
#pragma once
#include <cstddef>
#include <memory>
#include <type_traits>
namespace Common {
@@ -37,4 +38,63 @@ constexpr bool IsWordAligned(T value) {
return (value & 0b11) == 0;
}
template <typename T, std::size_t Align = 16>
class AlignmentAllocator {
public:
using value_type = T;
using size_type = std::size_t;
using difference_type = std::ptrdiff_t;
using pointer = T*;
using const_pointer = const T*;
using reference = T&;
using const_reference = const T&;
public:
pointer address(reference r) noexcept {
return std::addressof(r);
}
const_pointer address(const_reference r) const noexcept {
return std::addressof(r);
}
pointer allocate(size_type n) {
return static_cast<pointer>(::operator new (n, std::align_val_t{Align}));
}
void deallocate(pointer p, size_type) {
::operator delete (p, std::align_val_t{Align});
}
void construct(pointer p, const value_type& wert) {
new (p) value_type(wert);
}
void destroy(pointer p) {
p->~value_type();
}
size_type max_size() const noexcept {
return size_type(-1) / sizeof(value_type);
}
template <typename T2>
struct rebind {
using other = AlignmentAllocator<T2, Align>;
};
bool operator!=(const AlignmentAllocator<T, Align>& other) const noexcept {
return !(*this == other);
}
// Returns true if and only if storage allocated from *this
// can be deallocated from other, and vice versa.
// Always returns true for stateless allocators.
bool operator==(const AlignmentAllocator<T, Align>& other) const noexcept {
return true;
}
};
} // namespace Common

View File

@@ -8,6 +8,7 @@
#include <vector>
#include "common/common_types.h"
#include "core/hle/kernel/physical_memory.h"
namespace Kernel {
@@ -77,7 +78,7 @@ struct CodeSet final {
}
/// The overall data that backs this code set.
std::vector<u8> memory;
Kernel::PhysicalMemory memory;
/// The segments that comprise this code set.
std::array<Segment, 3> segments;

View File

@@ -0,0 +1,19 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include "common/alignment.h"
namespace Kernel {
// This encapsulation serves 2 purposes:
// - First, to encapsulate host physical memory under a single type and set an
// standard for managing it.
// - Second to ensure all host backing memory used is aligned to 256 bytes due
// to strict alignment restrictions on GPU memory.
using PhysicalMemory = std::vector<u8, Common::AlignmentAllocator<u8, 256>>;
} // namespace Kernel

View File

@@ -247,7 +247,7 @@ VAddr Process::CreateTLSRegion() {
ASSERT(region_address.Succeeded());
const auto map_result = vm_manager.MapMemoryBlock(
*region_address, std::make_shared<std::vector<u8>>(Memory::PAGE_SIZE), 0,
*region_address, std::make_shared<PhysicalMemory>(Memory::PAGE_SIZE), 0,
Memory::PAGE_SIZE, MemoryState::ThreadLocal);
ASSERT(map_result.Succeeded());
@@ -277,7 +277,7 @@ void Process::FreeTLSRegion(VAddr tls_address) {
}
void Process::LoadModule(CodeSet module_, VAddr base_addr) {
const auto memory = std::make_shared<std::vector<u8>>(std::move(module_.memory));
const auto memory = std::make_shared<PhysicalMemory>(std::move(module_.memory));
const auto MapSegment = [&](const CodeSet::Segment& segment, VMAPermission permissions,
MemoryState memory_state) {
@@ -327,7 +327,7 @@ void Process::AllocateMainThreadStack(u64 stack_size) {
// Allocate and map the main thread stack
const VAddr mapping_address = vm_manager.GetTLSIORegionEndAddress() - main_thread_stack_size;
vm_manager
.MapMemoryBlock(mapping_address, std::make_shared<std::vector<u8>>(main_thread_stack_size),
.MapMemoryBlock(mapping_address, std::make_shared<PhysicalMemory>(main_thread_stack_size),
0, main_thread_stack_size, MemoryState::Stack)
.Unwrap();
}

View File

@@ -28,7 +28,7 @@ SharedPtr<SharedMemory> SharedMemory::Create(KernelCore& kernel, Process* owner_
shared_memory->other_permissions = other_permissions;
if (address == 0) {
shared_memory->backing_block = std::make_shared<std::vector<u8>>(size);
shared_memory->backing_block = std::make_shared<Kernel::PhysicalMemory>(size);
shared_memory->backing_block_offset = 0;
// Refresh the address mappings for the current process.
@@ -59,8 +59,8 @@ SharedPtr<SharedMemory> SharedMemory::Create(KernelCore& kernel, Process* owner_
}
SharedPtr<SharedMemory> SharedMemory::CreateForApplet(
KernelCore& kernel, std::shared_ptr<std::vector<u8>> heap_block, std::size_t offset, u64 size,
MemoryPermission permissions, MemoryPermission other_permissions, std::string name) {
KernelCore& kernel, std::shared_ptr<Kernel::PhysicalMemory> heap_block, std::size_t offset,
u64 size, MemoryPermission permissions, MemoryPermission other_permissions, std::string name) {
SharedPtr<SharedMemory> shared_memory(new SharedMemory(kernel));
shared_memory->owner_process = nullptr;

View File

@@ -10,6 +10,7 @@
#include "common/common_types.h"
#include "core/hle/kernel/object.h"
#include "core/hle/kernel/physical_memory.h"
#include "core/hle/kernel/process.h"
#include "core/hle/result.h"
@@ -62,12 +63,10 @@ public:
* block.
* @param name Optional object name, used for debugging purposes.
*/
static SharedPtr<SharedMemory> CreateForApplet(KernelCore& kernel,
std::shared_ptr<std::vector<u8>> heap_block,
std::size_t offset, u64 size,
MemoryPermission permissions,
MemoryPermission other_permissions,
std::string name = "Unknown Applet");
static SharedPtr<SharedMemory> CreateForApplet(
KernelCore& kernel, std::shared_ptr<Kernel::PhysicalMemory> heap_block, std::size_t offset,
u64 size, MemoryPermission permissions, MemoryPermission other_permissions,
std::string name = "Unknown Applet");
std::string GetTypeName() const override {
return "SharedMemory";
@@ -135,7 +134,7 @@ private:
~SharedMemory() override;
/// Backing memory for this shared memory block.
std::shared_ptr<std::vector<u8>> backing_block;
std::shared_ptr<PhysicalMemory> backing_block;
/// Offset into the backing block for this shared memory.
std::size_t backing_block_offset = 0;
/// Size of the memory block. Page-aligned.

View File

@@ -47,7 +47,7 @@ ResultCode TransferMemory::MapMemory(VAddr address, u64 size, MemoryPermission p
return ERR_INVALID_STATE;
}
backing_block = std::make_shared<std::vector<u8>>(size);
backing_block = std::make_shared<PhysicalMemory>(size);
const auto map_state = owner_permissions == MemoryPermission::None
? MemoryState::TransferMemoryIsolated

View File

@@ -8,6 +8,7 @@
#include <vector>
#include "core/hle/kernel/object.h"
#include "core/hle/kernel/physical_memory.h"
union ResultCode;
@@ -82,7 +83,7 @@ private:
~TransferMemory() override;
/// Memory block backing this instance.
std::shared_ptr<std::vector<u8>> backing_block;
std::shared_ptr<PhysicalMemory> backing_block;
/// The base address for the memory managed by this instance.
VAddr base_address = 0;

View File

@@ -5,6 +5,7 @@
#include <algorithm>
#include <iterator>
#include <utility>
#include "common/alignment.h"
#include "common/assert.h"
#include "common/logging/log.h"
#include "common/memory_hook.h"
@@ -103,7 +104,7 @@ bool VMManager::IsValidHandle(VMAHandle handle) const {
}
ResultVal<VMManager::VMAHandle> VMManager::MapMemoryBlock(VAddr target,
std::shared_ptr<std::vector<u8>> block,
std::shared_ptr<PhysicalMemory> block,
std::size_t offset, u64 size,
MemoryState state, VMAPermission perm) {
ASSERT(block != nullptr);
@@ -260,7 +261,7 @@ ResultVal<VAddr> VMManager::SetHeapSize(u64 size) {
if (heap_memory == nullptr) {
// Initialize heap
heap_memory = std::make_shared<std::vector<u8>>(size);
heap_memory = std::make_shared<PhysicalMemory>(size);
heap_end = heap_region_base + size;
} else {
UnmapRange(heap_region_base, GetCurrentHeapSize());
@@ -341,7 +342,7 @@ ResultCode VMManager::MapPhysicalMemory(VAddr target, u64 size) {
const auto map_size = std::min(end_addr - cur_addr, vma_end - cur_addr);
if (vma.state == MemoryState::Unmapped) {
const auto map_res =
MapMemoryBlock(cur_addr, std::make_shared<std::vector<u8>>(map_size, 0), 0,
MapMemoryBlock(cur_addr, std::make_shared<PhysicalMemory>(map_size, 0), 0,
map_size, MemoryState::Heap, VMAPermission::ReadWrite);
result = map_res.Code();
if (result.IsError()) {
@@ -442,7 +443,7 @@ ResultCode VMManager::UnmapPhysicalMemory(VAddr target, u64 size) {
if (result.IsError()) {
for (const auto [map_address, map_size] : unmapped_regions) {
const auto remap_res =
MapMemoryBlock(map_address, std::make_shared<std::vector<u8>>(map_size, 0), 0,
MapMemoryBlock(map_address, std::make_shared<PhysicalMemory>(map_size, 0), 0,
map_size, MemoryState::Heap, VMAPermission::None);
ASSERT_MSG(remap_res.Succeeded(), "UnmapPhysicalMemory re-map on error");
}
@@ -593,7 +594,7 @@ ResultCode VMManager::MirrorMemory(VAddr dst_addr, VAddr src_addr, u64 size, Mem
ASSERT_MSG(vma_offset + size <= vma->second.size,
"Shared memory exceeds bounds of mapped block");
const std::shared_ptr<std::vector<u8>>& backing_block = vma->second.backing_block;
const std::shared_ptr<PhysicalMemory>& backing_block = vma->second.backing_block;
const std::size_t backing_block_offset = vma->second.offset + vma_offset;
CASCADE_RESULT(auto new_vma,
@@ -606,7 +607,7 @@ ResultCode VMManager::MirrorMemory(VAddr dst_addr, VAddr src_addr, u64 size, Mem
return RESULT_SUCCESS;
}
void VMManager::RefreshMemoryBlockMappings(const std::vector<u8>* block) {
void VMManager::RefreshMemoryBlockMappings(const PhysicalMemory* block) {
// If this ever proves to have a noticeable performance impact, allow users of the function to
// specify a specific range of addresses to limit the scan to.
for (const auto& p : vma_map) {
@@ -764,7 +765,7 @@ void VMManager::MergeAdjacentVMA(VirtualMemoryArea& left, const VirtualMemoryAre
right.backing_block->begin() + right.offset + right.size);
} else {
// Slow case: make a new memory block for left and right.
auto new_memory = std::make_shared<std::vector<u8>>();
auto new_memory = std::make_shared<PhysicalMemory>();
new_memory->insert(new_memory->end(), left.backing_block->begin() + left.offset,
left.backing_block->begin() + left.offset + left.size);
new_memory->insert(new_memory->end(), right.backing_block->begin() + right.offset,

View File

@@ -11,6 +11,7 @@
#include "common/common_types.h"
#include "common/memory_hook.h"
#include "common/page_table.h"
#include "core/hle/kernel/physical_memory.h"
#include "core/hle/result.h"
#include "core/memory.h"
@@ -290,7 +291,7 @@ struct VirtualMemoryArea {
// Settings for type = AllocatedMemoryBlock
/// Memory block backing this VMA.
std::shared_ptr<std::vector<u8>> backing_block = nullptr;
std::shared_ptr<PhysicalMemory> backing_block = nullptr;
/// Offset into the backing_memory the mapping starts from.
std::size_t offset = 0;
@@ -348,7 +349,7 @@ public:
* @param size Size of the mapping.
* @param state MemoryState tag to attach to the VMA.
*/
ResultVal<VMAHandle> MapMemoryBlock(VAddr target, std::shared_ptr<std::vector<u8>> block,
ResultVal<VMAHandle> MapMemoryBlock(VAddr target, std::shared_ptr<PhysicalMemory> block,
std::size_t offset, u64 size, MemoryState state,
VMAPermission perm = VMAPermission::ReadWrite);
@@ -547,7 +548,7 @@ public:
* Scans all VMAs and updates the page table range of any that use the given vector as backing
* memory. This should be called after any operation that causes reallocation of the vector.
*/
void RefreshMemoryBlockMappings(const std::vector<u8>* block);
void RefreshMemoryBlockMappings(const PhysicalMemory* block);
/// Dumps the address space layout to the log, for debugging
void LogLayout() const;
@@ -777,7 +778,7 @@ private:
// the entire virtual address space extents that bound the allocations, including any holes.
// This makes deallocation and reallocation of holes fast and keeps process memory contiguous
// in the emulator address space, allowing Memory::GetPointer to be reasonably safe.
std::shared_ptr<std::vector<u8>> heap_memory;
std::shared_ptr<PhysicalMemory> heap_memory;
// The end of the currently allocated heap. This is not an inclusive
// end of the range. This is essentially 'base_address + current_size'.

View File

@@ -175,6 +175,10 @@ MiiStoreData ConvertInfoToStoreData(const MiiInfo& info) {
} // namespace
std::ostream& operator<<(std::ostream& os, Source source) {
if (static_cast<std::size_t>(source) >= SOURCE_NAMES.size()) {
return os << "[UNKNOWN SOURCE]";
}
os << SOURCE_NAMES.at(static_cast<std::size_t>(source));
return os;
}

View File

@@ -77,7 +77,7 @@ enum class LoadState : u32 {
Done = 1,
};
static void DecryptSharedFont(const std::vector<u32>& input, std::vector<u8>& output,
static void DecryptSharedFont(const std::vector<u32>& input, Kernel::PhysicalMemory& output,
std::size_t& offset) {
ASSERT_MSG(offset + (input.size() * sizeof(u32)) < SHARED_FONT_MEM_SIZE,
"Shared fonts exceeds 17mb!");
@@ -94,7 +94,7 @@ static void DecryptSharedFont(const std::vector<u32>& input, std::vector<u8>& ou
offset += transformed_font.size() * sizeof(u32);
}
static void EncryptSharedFont(const std::vector<u8>& input, std::vector<u8>& output,
static void EncryptSharedFont(const std::vector<u8>& input, Kernel::PhysicalMemory& output,
std::size_t& offset) {
ASSERT_MSG(offset + input.size() + 8 < SHARED_FONT_MEM_SIZE, "Shared fonts exceeds 17mb!");
const u32 KEY = EXPECTED_MAGIC ^ EXPECTED_RESULT;
@@ -121,7 +121,7 @@ struct PL_U::Impl {
return shared_font_regions.at(index);
}
void BuildSharedFontsRawRegions(const std::vector<u8>& input) {
void BuildSharedFontsRawRegions(const Kernel::PhysicalMemory& input) {
// As we can derive the xor key we can just populate the offsets
// based on the shared memory dump
unsigned cur_offset = 0;
@@ -144,7 +144,7 @@ struct PL_U::Impl {
Kernel::SharedPtr<Kernel::SharedMemory> shared_font_mem;
/// Backing memory for the shared font data
std::shared_ptr<std::vector<u8>> shared_font;
std::shared_ptr<Kernel::PhysicalMemory> shared_font;
// Automatically populated based on shared_fonts dump or system archives.
std::vector<FontRegion> shared_font_regions;
@@ -166,7 +166,7 @@ PL_U::PL_U() : ServiceFramework("pl:u"), impl{std::make_unique<Impl>()} {
// Rebuild shared fonts from data ncas
if (nand->HasEntry(static_cast<u64>(FontArchives::Standard),
FileSys::ContentRecordType::Data)) {
impl->shared_font = std::make_shared<std::vector<u8>>(SHARED_FONT_MEM_SIZE);
impl->shared_font = std::make_shared<Kernel::PhysicalMemory>(SHARED_FONT_MEM_SIZE);
for (auto font : SHARED_FONTS) {
const auto nca =
nand->GetEntry(static_cast<u64>(font.first), FileSys::ContentRecordType::Data);
@@ -207,7 +207,7 @@ PL_U::PL_U() : ServiceFramework("pl:u"), impl{std::make_unique<Impl>()} {
}
} else {
impl->shared_font = std::make_shared<std::vector<u8>>(
impl->shared_font = std::make_shared<Kernel::PhysicalMemory>(
SHARED_FONT_MEM_SIZE); // Shared memory needs to always be allocated and a fixed size
const std::string user_path = FileUtil::GetUserPath(FileUtil::UserPath::SysDataDir);

View File

@@ -295,7 +295,7 @@ Kernel::CodeSet ElfReader::LoadInto(VAddr vaddr) {
}
}
std::vector<u8> program_image(total_image_size);
Kernel::PhysicalMemory program_image(total_image_size);
std::size_t current_image_position = 0;
Kernel::CodeSet codeset;

View File

@@ -69,7 +69,7 @@ AppLoader::LoadResult AppLoader_KIP::Load(Kernel::Process& process) {
const VAddr base_address = process.VMManager().GetCodeRegionBaseAddress();
Kernel::CodeSet codeset;
std::vector<u8> program_image;
Kernel::PhysicalMemory program_image;
const auto load_segment = [&program_image](Kernel::CodeSet::Segment& segment,
const std::vector<u8>& data, u32 offset) {

View File

@@ -143,7 +143,7 @@ static bool LoadNroImpl(Kernel::Process& process, const std::vector<u8>& data,
}
// Build program image
std::vector<u8> program_image(PageAlignSize(nro_header.file_size));
Kernel::PhysicalMemory program_image(PageAlignSize(nro_header.file_size));
std::memcpy(program_image.data(), data.data(), program_image.size());
if (program_image.size() != PageAlignSize(nro_header.file_size)) {
return {};

View File

@@ -89,7 +89,7 @@ std::optional<VAddr> AppLoader_NSO::LoadModule(Kernel::Process& process,
// Build program image
Kernel::CodeSet codeset;
std::vector<u8> program_image;
Kernel::PhysicalMemory program_image;
for (std::size_t i = 0; i < nso_header.segments.size(); ++i) {
std::vector<u8> data =
file.ReadBytes(nso_header.segments_compressed_size[i], nso_header.segments[i].offset);

View File

@@ -1,5 +1,7 @@
add_library(video_core STATIC
buffer_cache.h
buffer_cache/buffer_block.h
buffer_cache/buffer_cache.h
buffer_cache/map_interval.h
dma_pusher.cpp
dma_pusher.h
debug_utils/debug_utils.cpp
@@ -100,6 +102,7 @@ add_library(video_core STATIC
shader/decode/integer_set.cpp
shader/decode/half_set.cpp
shader/decode/video.cpp
shader/decode/warp.cpp
shader/decode/xmad.cpp
shader/decode/other.cpp
shader/control_flow.cpp

View File

@@ -1,299 +0,0 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <array>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
#include "common/alignment.h"
#include "common/common_types.h"
#include "core/core.h"
#include "video_core/memory_manager.h"
#include "video_core/rasterizer_cache.h"
namespace VideoCore {
class RasterizerInterface;
}
namespace VideoCommon {
template <typename BufferStorageType>
class CachedBuffer final : public RasterizerCacheObject {
public:
explicit CachedBuffer(VAddr cpu_addr, u8* host_ptr)
: RasterizerCacheObject{host_ptr}, host_ptr{host_ptr}, cpu_addr{cpu_addr} {}
~CachedBuffer() override = default;
VAddr GetCpuAddr() const override {
return cpu_addr;
}
std::size_t GetSizeInBytes() const override {
return size;
}
u8* GetWritableHostPtr() const {
return host_ptr;
}
std::size_t GetSize() const {
return size;
}
std::size_t GetCapacity() const {
return capacity;
}
bool IsInternalized() const {
return is_internal;
}
const BufferStorageType& GetBuffer() const {
return buffer;
}
void SetSize(std::size_t new_size) {
size = new_size;
}
void SetInternalState(bool is_internal_) {
is_internal = is_internal_;
}
BufferStorageType ExchangeBuffer(BufferStorageType buffer_, std::size_t new_capacity) {
capacity = new_capacity;
std::swap(buffer, buffer_);
return buffer_;
}
private:
u8* host_ptr{};
VAddr cpu_addr{};
std::size_t size{};
std::size_t capacity{};
bool is_internal{};
BufferStorageType buffer;
};
template <typename BufferStorageType, typename BufferType, typename StreamBuffer>
class BufferCache : public RasterizerCache<std::shared_ptr<CachedBuffer<BufferStorageType>>> {
public:
using Buffer = std::shared_ptr<CachedBuffer<BufferStorageType>>;
using BufferInfo = std::pair<const BufferType*, u64>;
explicit BufferCache(VideoCore::RasterizerInterface& rasterizer, Core::System& system,
std::unique_ptr<StreamBuffer> stream_buffer)
: RasterizerCache<Buffer>{rasterizer}, system{system},
stream_buffer{std::move(stream_buffer)}, stream_buffer_handle{
this->stream_buffer->GetHandle()} {}
~BufferCache() = default;
void Unregister(const Buffer& entry) override {
std::lock_guard lock{RasterizerCache<Buffer>::mutex};
if (entry->IsInternalized()) {
internalized_entries.erase(entry->GetCacheAddr());
}
ReserveBuffer(entry);
RasterizerCache<Buffer>::Unregister(entry);
}
void TickFrame() {
marked_for_destruction_index =
(marked_for_destruction_index + 1) % marked_for_destruction_ring_buffer.size();
MarkedForDestruction().clear();
}
BufferInfo UploadMemory(GPUVAddr gpu_addr, std::size_t size, std::size_t alignment = 4,
bool internalize = false, bool is_written = false) {
std::lock_guard lock{RasterizerCache<Buffer>::mutex};
auto& memory_manager = system.GPU().MemoryManager();
const auto host_ptr = memory_manager.GetPointer(gpu_addr);
if (!host_ptr) {
return {GetEmptyBuffer(size), 0};
}
const auto cache_addr = ToCacheAddr(host_ptr);
// Cache management is a big overhead, so only cache entries with a given size.
// TODO: Figure out which size is the best for given games.
constexpr std::size_t max_stream_size = 0x800;
if (!internalize && size < max_stream_size &&
internalized_entries.find(cache_addr) == internalized_entries.end()) {
return StreamBufferUpload(host_ptr, size, alignment);
}
auto entry = RasterizerCache<Buffer>::TryGet(cache_addr);
if (!entry) {
return FixedBufferUpload(gpu_addr, host_ptr, size, internalize, is_written);
}
if (entry->GetSize() < size) {
IncreaseBufferSize(entry, size);
}
if (is_written) {
entry->MarkAsModified(true, *this);
}
return {ToHandle(entry->GetBuffer()), 0};
}
/// Uploads from a host memory. Returns the OpenGL buffer where it's located and its offset.
BufferInfo UploadHostMemory(const void* raw_pointer, std::size_t size,
std::size_t alignment = 4) {
std::lock_guard lock{RasterizerCache<Buffer>::mutex};
return StreamBufferUpload(raw_pointer, size, alignment);
}
void Map(std::size_t max_size) {
std::tie(buffer_ptr, buffer_offset_base, invalidated) = stream_buffer->Map(max_size, 4);
buffer_offset = buffer_offset_base;
}
/// Finishes the upload stream, returns true on bindings invalidation.
bool Unmap() {
stream_buffer->Unmap(buffer_offset - buffer_offset_base);
return std::exchange(invalidated, false);
}
virtual const BufferType* GetEmptyBuffer(std::size_t size) = 0;
protected:
void FlushObjectInner(const Buffer& entry) override {
DownloadBufferData(entry->GetBuffer(), 0, entry->GetSize(), entry->GetWritableHostPtr());
}
virtual BufferStorageType CreateBuffer(std::size_t size) = 0;
virtual const BufferType* ToHandle(const BufferStorageType& storage) = 0;
virtual void UploadBufferData(const BufferStorageType& buffer, std::size_t offset,
std::size_t size, const u8* data) = 0;
virtual void DownloadBufferData(const BufferStorageType& buffer, std::size_t offset,
std::size_t size, u8* data) = 0;
virtual void CopyBufferData(const BufferStorageType& src, const BufferStorageType& dst,
std::size_t src_offset, std::size_t dst_offset,
std::size_t size) = 0;
private:
BufferInfo StreamBufferUpload(const void* raw_pointer, std::size_t size,
std::size_t alignment) {
AlignBuffer(alignment);
const std::size_t uploaded_offset = buffer_offset;
std::memcpy(buffer_ptr, raw_pointer, size);
buffer_ptr += size;
buffer_offset += size;
return {&stream_buffer_handle, uploaded_offset};
}
BufferInfo FixedBufferUpload(GPUVAddr gpu_addr, u8* host_ptr, std::size_t size,
bool internalize, bool is_written) {
auto& memory_manager = Core::System::GetInstance().GPU().MemoryManager();
const auto cpu_addr = memory_manager.GpuToCpuAddress(gpu_addr);
ASSERT(cpu_addr);
auto entry = GetUncachedBuffer(*cpu_addr, host_ptr);
entry->SetSize(size);
entry->SetInternalState(internalize);
RasterizerCache<Buffer>::Register(entry);
if (internalize) {
internalized_entries.emplace(ToCacheAddr(host_ptr));
}
if (is_written) {
entry->MarkAsModified(true, *this);
}
if (entry->GetCapacity() < size) {
MarkedForDestruction().push_back(entry->ExchangeBuffer(CreateBuffer(size), size));
}
UploadBufferData(entry->GetBuffer(), 0, size, host_ptr);
return {ToHandle(entry->GetBuffer()), 0};
}
void IncreaseBufferSize(Buffer& entry, std::size_t new_size) {
const std::size_t old_size = entry->GetSize();
if (entry->GetCapacity() < new_size) {
const auto& old_buffer = entry->GetBuffer();
auto new_buffer = CreateBuffer(new_size);
// Copy bits from the old buffer to the new buffer.
CopyBufferData(old_buffer, new_buffer, 0, 0, old_size);
MarkedForDestruction().push_back(
entry->ExchangeBuffer(std::move(new_buffer), new_size));
// This buffer could have been used
invalidated = true;
}
// Upload the new bits.
const std::size_t size_diff = new_size - old_size;
UploadBufferData(entry->GetBuffer(), old_size, size_diff, entry->GetHostPtr() + old_size);
// Update entry's size in the object and in the cache.
Unregister(entry);
entry->SetSize(new_size);
RasterizerCache<Buffer>::Register(entry);
}
Buffer GetUncachedBuffer(VAddr cpu_addr, u8* host_ptr) {
if (auto entry = TryGetReservedBuffer(host_ptr)) {
return entry;
}
return std::make_shared<CachedBuffer<BufferStorageType>>(cpu_addr, host_ptr);
}
Buffer TryGetReservedBuffer(u8* host_ptr) {
const auto it = buffer_reserve.find(ToCacheAddr(host_ptr));
if (it == buffer_reserve.end()) {
return {};
}
auto& reserve = it->second;
auto entry = reserve.back();
reserve.pop_back();
return entry;
}
void ReserveBuffer(Buffer entry) {
buffer_reserve[entry->GetCacheAddr()].push_back(std::move(entry));
}
void AlignBuffer(std::size_t alignment) {
// Align the offset, not the mapped pointer
const std::size_t offset_aligned = Common::AlignUp(buffer_offset, alignment);
buffer_ptr += offset_aligned - buffer_offset;
buffer_offset = offset_aligned;
}
std::vector<BufferStorageType>& MarkedForDestruction() {
return marked_for_destruction_ring_buffer[marked_for_destruction_index];
}
Core::System& system;
std::unique_ptr<StreamBuffer> stream_buffer;
BufferType stream_buffer_handle{};
bool invalidated = false;
u8* buffer_ptr = nullptr;
u64 buffer_offset = 0;
u64 buffer_offset_base = 0;
std::size_t marked_for_destruction_index = 0;
std::array<std::vector<BufferStorageType>, 4> marked_for_destruction_ring_buffer;
std::unordered_set<CacheAddr> internalized_entries;
std::unordered_map<CacheAddr, std::vector<Buffer>> buffer_reserve;
};
} // namespace VideoCommon

View File

@@ -0,0 +1,76 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <unordered_set>
#include <utility>
#include "common/alignment.h"
#include "common/common_types.h"
#include "video_core/gpu.h"
namespace VideoCommon {
class BufferBlock {
public:
bool Overlaps(const CacheAddr start, const CacheAddr end) const {
return (cache_addr < end) && (cache_addr_end > start);
}
bool IsInside(const CacheAddr other_start, const CacheAddr other_end) const {
return cache_addr <= other_start && other_end <= cache_addr_end;
}
u8* GetWritableHostPtr() const {
return FromCacheAddr(cache_addr);
}
u8* GetWritableHostPtr(std::size_t offset) const {
return FromCacheAddr(cache_addr + offset);
}
std::size_t GetOffset(const CacheAddr in_addr) {
return static_cast<std::size_t>(in_addr - cache_addr);
}
CacheAddr GetCacheAddr() const {
return cache_addr;
}
CacheAddr GetCacheAddrEnd() const {
return cache_addr_end;
}
void SetCacheAddr(const CacheAddr new_addr) {
cache_addr = new_addr;
cache_addr_end = new_addr + size;
}
std::size_t GetSize() const {
return size;
}
void SetEpoch(u64 new_epoch) {
epoch = new_epoch;
}
u64 GetEpoch() {
return epoch;
}
protected:
explicit BufferBlock(CacheAddr cache_addr, const std::size_t size) : size{size} {
SetCacheAddr(cache_addr);
}
~BufferBlock() = default;
private:
CacheAddr cache_addr{};
CacheAddr cache_addr_end{};
std::size_t size{};
u64 epoch{};
};
} // namespace VideoCommon

View File

@@ -0,0 +1,447 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <array>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
#include "common/alignment.h"
#include "common/common_types.h"
#include "core/core.h"
#include "video_core/buffer_cache/buffer_block.h"
#include "video_core/buffer_cache/map_interval.h"
#include "video_core/memory_manager.h"
#include "video_core/rasterizer_interface.h"
namespace VideoCommon {
using MapInterval = std::shared_ptr<MapIntervalBase>;
template <typename TBuffer, typename TBufferType, typename StreamBuffer>
class BufferCache {
public:
using BufferInfo = std::pair<const TBufferType*, u64>;
BufferInfo UploadMemory(GPUVAddr gpu_addr, std::size_t size, std::size_t alignment = 4,
bool is_written = false) {
std::lock_guard lock{mutex};
auto& memory_manager = system.GPU().MemoryManager();
const auto host_ptr = memory_manager.GetPointer(gpu_addr);
if (!host_ptr) {
return {GetEmptyBuffer(size), 0};
}
const auto cache_addr = ToCacheAddr(host_ptr);
// Cache management is a big overhead, so only cache entries with a given size.
// TODO: Figure out which size is the best for given games.
constexpr std::size_t max_stream_size = 0x800;
if (size < max_stream_size) {
if (!is_written && !IsRegionWritten(cache_addr, cache_addr + size - 1)) {
return StreamBufferUpload(host_ptr, size, alignment);
}
}
auto block = GetBlock(cache_addr, size);
auto map = MapAddress(block, gpu_addr, cache_addr, size);
if (is_written) {
map->MarkAsModified(true, GetModifiedTicks());
if (!map->IsWritten()) {
map->MarkAsWritten(true);
MarkRegionAsWritten(map->GetStart(), map->GetEnd() - 1);
}
} else {
if (map->IsWritten()) {
WriteBarrier();
}
}
const u64 offset = static_cast<u64>(block->GetOffset(cache_addr));
return {ToHandle(block), offset};
}
/// Uploads from a host memory. Returns the OpenGL buffer where it's located and its offset.
BufferInfo UploadHostMemory(const void* raw_pointer, std::size_t size,
std::size_t alignment = 4) {
std::lock_guard lock{mutex};
return StreamBufferUpload(raw_pointer, size, alignment);
}
void Map(std::size_t max_size) {
std::lock_guard lock{mutex};
std::tie(buffer_ptr, buffer_offset_base, invalidated) = stream_buffer->Map(max_size, 4);
buffer_offset = buffer_offset_base;
}
/// Finishes the upload stream, returns true on bindings invalidation.
bool Unmap() {
std::lock_guard lock{mutex};
stream_buffer->Unmap(buffer_offset - buffer_offset_base);
return std::exchange(invalidated, false);
}
void TickFrame() {
++epoch;
while (!pending_destruction.empty()) {
if (pending_destruction.front()->GetEpoch() + 1 > epoch) {
break;
}
pending_destruction.pop_front();
}
}
/// Write any cached resources overlapping the specified region back to memory
void FlushRegion(CacheAddr addr, std::size_t size) {
std::lock_guard lock{mutex};
std::vector<MapInterval> objects = GetMapsInRange(addr, size);
std::sort(objects.begin(), objects.end(), [](const MapInterval& a, const MapInterval& b) {
return a->GetModificationTick() < b->GetModificationTick();
});
for (auto& object : objects) {
if (object->IsModified() && object->IsRegistered()) {
FlushMap(object);
}
}
}
/// Mark the specified region as being invalidated
void InvalidateRegion(CacheAddr addr, u64 size) {
std::lock_guard lock{mutex};
std::vector<MapInterval> objects = GetMapsInRange(addr, size);
for (auto& object : objects) {
if (object->IsRegistered()) {
Unregister(object);
}
}
}
virtual const TBufferType* GetEmptyBuffer(std::size_t size) = 0;
protected:
explicit BufferCache(VideoCore::RasterizerInterface& rasterizer, Core::System& system,
std::unique_ptr<StreamBuffer> stream_buffer)
: rasterizer{rasterizer}, system{system}, stream_buffer{std::move(stream_buffer)},
stream_buffer_handle{this->stream_buffer->GetHandle()} {}
~BufferCache() = default;
virtual const TBufferType* ToHandle(const TBuffer& storage) = 0;
virtual void WriteBarrier() = 0;
virtual TBuffer CreateBlock(CacheAddr cache_addr, std::size_t size) = 0;
virtual void UploadBlockData(const TBuffer& buffer, std::size_t offset, std::size_t size,
const u8* data) = 0;
virtual void DownloadBlockData(const TBuffer& buffer, std::size_t offset, std::size_t size,
u8* data) = 0;
virtual void CopyBlock(const TBuffer& src, const TBuffer& dst, std::size_t src_offset,
std::size_t dst_offset, std::size_t size) = 0;
/// Register an object into the cache
void Register(const MapInterval& new_map, bool inherit_written = false) {
const CacheAddr cache_ptr = new_map->GetStart();
const std::optional<VAddr> cpu_addr =
system.GPU().MemoryManager().GpuToCpuAddress(new_map->GetGpuAddress());
if (!cache_ptr || !cpu_addr) {
LOG_CRITICAL(HW_GPU, "Failed to register buffer with unmapped gpu_address 0x{:016x}",
new_map->GetGpuAddress());
return;
}
const std::size_t size = new_map->GetEnd() - new_map->GetStart();
new_map->SetCpuAddress(*cpu_addr);
new_map->MarkAsRegistered(true);
const IntervalType interval{new_map->GetStart(), new_map->GetEnd()};
mapped_addresses.insert({interval, new_map});
rasterizer.UpdatePagesCachedCount(*cpu_addr, size, 1);
if (inherit_written) {
MarkRegionAsWritten(new_map->GetStart(), new_map->GetEnd() - 1);
new_map->MarkAsWritten(true);
}
}
/// Unregisters an object from the cache
void Unregister(MapInterval& map) {
const std::size_t size = map->GetEnd() - map->GetStart();
rasterizer.UpdatePagesCachedCount(map->GetCpuAddress(), size, -1);
map->MarkAsRegistered(false);
if (map->IsWritten()) {
UnmarkRegionAsWritten(map->GetStart(), map->GetEnd() - 1);
}
const IntervalType delete_interval{map->GetStart(), map->GetEnd()};
mapped_addresses.erase(delete_interval);
}
private:
MapInterval CreateMap(const CacheAddr start, const CacheAddr end, const GPUVAddr gpu_addr) {
return std::make_shared<MapIntervalBase>(start, end, gpu_addr);
}
MapInterval MapAddress(const TBuffer& block, const GPUVAddr gpu_addr,
const CacheAddr cache_addr, const std::size_t size) {
std::vector<MapInterval> overlaps = GetMapsInRange(cache_addr, size);
if (overlaps.empty()) {
const CacheAddr cache_addr_end = cache_addr + size;
MapInterval new_map = CreateMap(cache_addr, cache_addr_end, gpu_addr);
u8* host_ptr = FromCacheAddr(cache_addr);
UploadBlockData(block, block->GetOffset(cache_addr), size, host_ptr);
Register(new_map);
return new_map;
}
const CacheAddr cache_addr_end = cache_addr + size;
if (overlaps.size() == 1) {
MapInterval& current_map = overlaps[0];
if (current_map->IsInside(cache_addr, cache_addr_end)) {
return current_map;
}
}
CacheAddr new_start = cache_addr;
CacheAddr new_end = cache_addr_end;
bool write_inheritance = false;
bool modified_inheritance = false;
// Calculate new buffer parameters
for (auto& overlap : overlaps) {
new_start = std::min(overlap->GetStart(), new_start);
new_end = std::max(overlap->GetEnd(), new_end);
write_inheritance |= overlap->IsWritten();
modified_inheritance |= overlap->IsModified();
}
GPUVAddr new_gpu_addr = gpu_addr + new_start - cache_addr;
for (auto& overlap : overlaps) {
Unregister(overlap);
}
UpdateBlock(block, new_start, new_end, overlaps);
MapInterval new_map = CreateMap(new_start, new_end, new_gpu_addr);
if (modified_inheritance) {
new_map->MarkAsModified(true, GetModifiedTicks());
}
Register(new_map, write_inheritance);
return new_map;
}
void UpdateBlock(const TBuffer& block, CacheAddr start, CacheAddr end,
std::vector<MapInterval>& overlaps) {
const IntervalType base_interval{start, end};
IntervalSet interval_set{};
interval_set.add(base_interval);
for (auto& overlap : overlaps) {
const IntervalType subtract{overlap->GetStart(), overlap->GetEnd()};
interval_set.subtract(subtract);
}
for (auto& interval : interval_set) {
std::size_t size = interval.upper() - interval.lower();
if (size > 0) {
u8* host_ptr = FromCacheAddr(interval.lower());
UploadBlockData(block, block->GetOffset(interval.lower()), size, host_ptr);
}
}
}
std::vector<MapInterval> GetMapsInRange(CacheAddr addr, std::size_t size) {
if (size == 0) {
return {};
}
std::vector<MapInterval> objects{};
const IntervalType interval{addr, addr + size};
for (auto& pair : boost::make_iterator_range(mapped_addresses.equal_range(interval))) {
objects.push_back(pair.second);
}
return objects;
}
/// Returns a ticks counter used for tracking when cached objects were last modified
u64 GetModifiedTicks() {
return ++modified_ticks;
}
void FlushMap(MapInterval map) {
std::size_t size = map->GetEnd() - map->GetStart();
TBuffer block = blocks[map->GetStart() >> block_page_bits];
u8* host_ptr = FromCacheAddr(map->GetStart());
DownloadBlockData(block, block->GetOffset(map->GetStart()), size, host_ptr);
map->MarkAsModified(false, 0);
}
BufferInfo StreamBufferUpload(const void* raw_pointer, std::size_t size,
std::size_t alignment) {
AlignBuffer(alignment);
const std::size_t uploaded_offset = buffer_offset;
std::memcpy(buffer_ptr, raw_pointer, size);
buffer_ptr += size;
buffer_offset += size;
return {&stream_buffer_handle, uploaded_offset};
}
void AlignBuffer(std::size_t alignment) {
// Align the offset, not the mapped pointer
const std::size_t offset_aligned = Common::AlignUp(buffer_offset, alignment);
buffer_ptr += offset_aligned - buffer_offset;
buffer_offset = offset_aligned;
}
TBuffer EnlargeBlock(TBuffer buffer) {
const std::size_t old_size = buffer->GetSize();
const std::size_t new_size = old_size + block_page_size;
const CacheAddr cache_addr = buffer->GetCacheAddr();
TBuffer new_buffer = CreateBlock(cache_addr, new_size);
CopyBlock(buffer, new_buffer, 0, 0, old_size);
buffer->SetEpoch(epoch);
pending_destruction.push_back(buffer);
const CacheAddr cache_addr_end = cache_addr + new_size - 1;
u64 page_start = cache_addr >> block_page_bits;
const u64 page_end = cache_addr_end >> block_page_bits;
while (page_start <= page_end) {
blocks[page_start] = new_buffer;
++page_start;
}
return new_buffer;
}
TBuffer MergeBlocks(TBuffer first, TBuffer second) {
const std::size_t size_1 = first->GetSize();
const std::size_t size_2 = second->GetSize();
const CacheAddr first_addr = first->GetCacheAddr();
const CacheAddr second_addr = second->GetCacheAddr();
const CacheAddr new_addr = std::min(first_addr, second_addr);
const std::size_t new_size = size_1 + size_2;
TBuffer new_buffer = CreateBlock(new_addr, new_size);
CopyBlock(first, new_buffer, 0, new_buffer->GetOffset(first_addr), size_1);
CopyBlock(second, new_buffer, 0, new_buffer->GetOffset(second_addr), size_2);
first->SetEpoch(epoch);
second->SetEpoch(epoch);
pending_destruction.push_back(first);
pending_destruction.push_back(second);
const CacheAddr cache_addr_end = new_addr + new_size - 1;
u64 page_start = new_addr >> block_page_bits;
const u64 page_end = cache_addr_end >> block_page_bits;
while (page_start <= page_end) {
blocks[page_start] = new_buffer;
++page_start;
}
return new_buffer;
}
TBuffer GetBlock(const CacheAddr cache_addr, const std::size_t size) {
TBuffer found{};
const CacheAddr cache_addr_end = cache_addr + size - 1;
u64 page_start = cache_addr >> block_page_bits;
const u64 page_end = cache_addr_end >> block_page_bits;
while (page_start <= page_end) {
auto it = blocks.find(page_start);
if (it == blocks.end()) {
if (found) {
found = EnlargeBlock(found);
} else {
const CacheAddr start_addr = (page_start << block_page_bits);
found = CreateBlock(start_addr, block_page_size);
blocks[page_start] = found;
}
} else {
if (found) {
if (found == it->second) {
++page_start;
continue;
}
found = MergeBlocks(found, it->second);
} else {
found = it->second;
}
}
++page_start;
}
return found;
}
void MarkRegionAsWritten(const CacheAddr start, const CacheAddr end) {
u64 page_start = start >> write_page_bit;
const u64 page_end = end >> write_page_bit;
while (page_start <= page_end) {
auto it = written_pages.find(page_start);
if (it != written_pages.end()) {
it->second = it->second + 1;
} else {
written_pages[page_start] = 1;
}
page_start++;
}
}
void UnmarkRegionAsWritten(const CacheAddr start, const CacheAddr end) {
u64 page_start = start >> write_page_bit;
const u64 page_end = end >> write_page_bit;
while (page_start <= page_end) {
auto it = written_pages.find(page_start);
if (it != written_pages.end()) {
if (it->second > 1) {
it->second = it->second - 1;
} else {
written_pages.erase(it);
}
}
page_start++;
}
}
bool IsRegionWritten(const CacheAddr start, const CacheAddr end) const {
u64 page_start = start >> write_page_bit;
const u64 page_end = end >> write_page_bit;
while (page_start <= page_end) {
if (written_pages.count(page_start) > 0) {
return true;
}
page_start++;
}
return false;
}
VideoCore::RasterizerInterface& rasterizer;
Core::System& system;
std::unique_ptr<StreamBuffer> stream_buffer;
TBufferType stream_buffer_handle{};
bool invalidated = false;
u8* buffer_ptr = nullptr;
u64 buffer_offset = 0;
u64 buffer_offset_base = 0;
using IntervalSet = boost::icl::interval_set<CacheAddr>;
using IntervalCache = boost::icl::interval_map<CacheAddr, MapInterval>;
using IntervalType = typename IntervalCache::interval_type;
IntervalCache mapped_addresses{};
static constexpr u64 write_page_bit{11};
std::unordered_map<u64, u32> written_pages{};
static constexpr u64 block_page_bits{21};
static constexpr u64 block_page_size{1 << block_page_bits};
std::unordered_map<u64, TBuffer> blocks{};
std::list<TBuffer> pending_destruction{};
u64 epoch{};
u64 modified_ticks{};
std::recursive_mutex mutex;
};
} // namespace VideoCommon

View File

@@ -0,0 +1,89 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include "common/common_types.h"
#include "video_core/gpu.h"
namespace VideoCommon {
class MapIntervalBase {
public:
MapIntervalBase(const CacheAddr start, const CacheAddr end, const GPUVAddr gpu_addr)
: start{start}, end{end}, gpu_addr{gpu_addr} {}
void SetCpuAddress(VAddr new_cpu_addr) {
cpu_addr = new_cpu_addr;
}
VAddr GetCpuAddress() const {
return cpu_addr;
}
GPUVAddr GetGpuAddress() const {
return gpu_addr;
}
bool IsInside(const CacheAddr other_start, const CacheAddr other_end) const {
return (start <= other_start && other_end <= end);
}
bool operator==(const MapIntervalBase& rhs) const {
return std::tie(start, end) == std::tie(rhs.start, rhs.end);
}
bool operator!=(const MapIntervalBase& rhs) const {
return !operator==(rhs);
}
void MarkAsRegistered(const bool registered) {
is_registered = registered;
}
bool IsRegistered() const {
return is_registered;
}
CacheAddr GetStart() const {
return start;
}
CacheAddr GetEnd() const {
return end;
}
void MarkAsModified(const bool is_modified_, const u64 tick) {
is_modified = is_modified_;
ticks = tick;
}
bool IsModified() const {
return is_modified;
}
u64 GetModificationTick() const {
return ticks;
}
void MarkAsWritten(const bool is_written_) {
is_written = is_written_;
}
bool IsWritten() const {
return is_written;
}
private:
CacheAddr start;
CacheAddr end;
GPUVAddr gpu_addr;
VAddr cpu_addr{};
bool is_written{};
bool is_modified{};
bool is_registered{};
u64 ticks{};
};
} // namespace VideoCommon

View File

@@ -31,6 +31,7 @@ void DmaPusher::DispatchCalls() {
break;
}
}
gpu.FlushCommands();
}
bool DmaPusher::Step() {

View File

@@ -10,8 +10,7 @@
namespace Tegra::Engines {
Fermi2D::Fermi2D(VideoCore::RasterizerInterface& rasterizer, MemoryManager& memory_manager)
: rasterizer{rasterizer}, memory_manager{memory_manager} {}
Fermi2D::Fermi2D(VideoCore::RasterizerInterface& rasterizer) : rasterizer{rasterizer} {}
void Fermi2D::CallMethod(const GPU::MethodCall& method_call) {
ASSERT_MSG(method_call.method < Regs::NUM_REGS,

View File

@@ -33,7 +33,7 @@ namespace Tegra::Engines {
class Fermi2D final {
public:
explicit Fermi2D(VideoCore::RasterizerInterface& rasterizer, MemoryManager& memory_manager);
explicit Fermi2D(VideoCore::RasterizerInterface& rasterizer);
~Fermi2D() = default;
/// Write the value to the register identified by method.
@@ -145,7 +145,6 @@ public:
private:
VideoCore::RasterizerInterface& rasterizer;
MemoryManager& memory_manager;
/// Performs the copy from the source surface to the destination surface as configured in the
/// registers.

View File

@@ -15,7 +15,7 @@
namespace Tegra::Engines {
KeplerMemory::KeplerMemory(Core::System& system, MemoryManager& memory_manager)
: system{system}, memory_manager{memory_manager}, upload_state{memory_manager, regs.upload} {}
: system{system}, upload_state{memory_manager, regs.upload} {}
KeplerMemory::~KeplerMemory() = default;

View File

@@ -65,7 +65,6 @@ public:
private:
Core::System& system;
MemoryManager& memory_manager;
Upload::State upload_state;
};

View File

@@ -249,16 +249,10 @@ void Maxwell3D::CallMacroMethod(u32 method, std::vector<u32> parameters) {
executing_macro = 0;
// Lookup the macro offset
const u32 entry{(method - MacroRegistersStart) >> 1};
const auto& search{macro_offsets.find(entry)};
if (search == macro_offsets.end()) {
LOG_CRITICAL(HW_GPU, "macro not found for method 0x{:X}!", method);
UNREACHABLE();
return;
}
const u32 entry = ((method - MacroRegistersStart) >> 1) % macro_positions.size();
// Execute the current macro.
macro_interpreter.Execute(search->second, std::move(parameters));
macro_interpreter.Execute(macro_positions[entry], std::move(parameters));
}
void Maxwell3D::CallMethod(const GPU::MethodCall& method_call) {
@@ -421,7 +415,7 @@ void Maxwell3D::ProcessMacroUpload(u32 data) {
}
void Maxwell3D::ProcessMacroBind(u32 data) {
macro_offsets[regs.macros.entry] = data;
macro_positions[regs.macros.entry++] = data;
}
void Maxwell3D::ProcessQueryGet() {
@@ -524,7 +518,7 @@ void Maxwell3D::ProcessQueryCondition() {
void Maxwell3D::ProcessSyncPoint() {
const u32 sync_point = regs.sync_info.sync_point.Value();
const u32 increment = regs.sync_info.increment.Value();
const u32 cache_flush = regs.sync_info.unknown.Value();
[[maybe_unused]] const u32 cache_flush = regs.sync_info.unknown.Value();
if (increment) {
system.GPU().IncrementSyncPoint(sync_point);
}
@@ -626,10 +620,10 @@ Texture::TICEntry Maxwell3D::GetTICEntry(u32 tic_index) const {
Texture::TICEntry tic_entry;
memory_manager.ReadBlockUnsafe(tic_address_gpu, &tic_entry, sizeof(Texture::TICEntry));
const auto r_type{tic_entry.r_type.Value()};
const auto g_type{tic_entry.g_type.Value()};
const auto b_type{tic_entry.b_type.Value()};
const auto a_type{tic_entry.a_type.Value()};
[[maybe_unused]] const auto r_type{tic_entry.r_type.Value()};
[[maybe_unused]] const auto g_type{tic_entry.g_type.Value()};
[[maybe_unused]] const auto b_type{tic_entry.b_type.Value()};
[[maybe_unused]] const auto a_type{tic_entry.a_type.Value()};
// TODO(Subv): Different data types for separate components are not supported
DEBUG_ASSERT(r_type == g_type && r_type == b_type && r_type == a_type);

View File

@@ -1270,7 +1270,7 @@ private:
MemoryManager& memory_manager;
/// Start offsets of each macro in macro_memory
std::unordered_map<u32, u32> macro_offsets;
std::array<u32, 0x80> macro_positions = {};
/// Memory for macro code
MacroMemory macro_memory;

View File

@@ -5,18 +5,17 @@
#include "common/assert.h"
#include "common/logging/log.h"
#include "core/core.h"
#include "core/settings.h"
#include "video_core/engines/maxwell_3d.h"
#include "video_core/engines/maxwell_dma.h"
#include "video_core/memory_manager.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/renderer_base.h"
#include "video_core/textures/decoders.h"
namespace Tegra::Engines {
MaxwellDMA::MaxwellDMA(Core::System& system, VideoCore::RasterizerInterface& rasterizer,
MemoryManager& memory_manager)
: system{system}, rasterizer{rasterizer}, memory_manager{memory_manager} {}
MaxwellDMA::MaxwellDMA(Core::System& system, MemoryManager& memory_manager)
: system{system}, memory_manager{memory_manager} {}
void MaxwellDMA::CallMethod(const GPU::MethodCall& method_call) {
ASSERT_MSG(method_call.method < Regs::NUM_REGS,
@@ -84,13 +83,17 @@ void MaxwellDMA::HandleCopy() {
ASSERT(regs.exec.enable_2d == 1);
if (regs.exec.is_dst_linear && !regs.exec.is_src_linear) {
ASSERT(regs.src_params.size_z == 1);
ASSERT(regs.src_params.BlockDepth() == 0);
// If the input is tiled and the output is linear, deswizzle the input and copy it over.
const u32 src_bytes_per_pixel = regs.src_pitch / regs.src_params.size_x;
const u32 bytes_per_pixel = regs.dst_pitch / regs.x_count;
const std::size_t src_size = Texture::CalculateSize(
true, src_bytes_per_pixel, regs.src_params.size_x, regs.src_params.size_y,
true, bytes_per_pixel, regs.src_params.size_x, regs.src_params.size_y,
regs.src_params.size_z, regs.src_params.BlockHeight(), regs.src_params.BlockDepth());
const std::size_t src_layer_size = Texture::CalculateSize(
true, bytes_per_pixel, regs.src_params.size_x, regs.src_params.size_y, 1,
regs.src_params.BlockHeight(), regs.src_params.BlockDepth());
const std::size_t dst_size = regs.dst_pitch * regs.y_count;
if (read_buffer.size() < src_size) {
@@ -104,23 +107,23 @@ void MaxwellDMA::HandleCopy() {
memory_manager.ReadBlock(source, read_buffer.data(), src_size);
memory_manager.ReadBlock(dest, write_buffer.data(), dst_size);
Texture::UnswizzleSubrect(regs.x_count, regs.y_count, regs.dst_pitch,
regs.src_params.size_x, src_bytes_per_pixel, read_buffer.data(),
write_buffer.data(), regs.src_params.BlockHeight(),
regs.src_params.pos_x, regs.src_params.pos_y);
Texture::UnswizzleSubrect(
regs.x_count, regs.y_count, regs.dst_pitch, regs.src_params.size_x, bytes_per_pixel,
read_buffer.data() + src_layer_size * regs.src_params.pos_z, write_buffer.data(),
regs.src_params.BlockHeight(), regs.src_params.pos_x, regs.src_params.pos_y);
memory_manager.WriteBlock(dest, write_buffer.data(), dst_size);
} else {
ASSERT(regs.dst_params.BlockDepth() == 0);
const u32 src_bytes_per_pixel = regs.src_pitch / regs.x_count;
const u32 bytes_per_pixel = regs.src_pitch / regs.x_count;
const std::size_t dst_size = Texture::CalculateSize(
true, src_bytes_per_pixel, regs.dst_params.size_x, regs.dst_params.size_y,
true, bytes_per_pixel, regs.dst_params.size_x, regs.dst_params.size_y,
regs.dst_params.size_z, regs.dst_params.BlockHeight(), regs.dst_params.BlockDepth());
const std::size_t dst_layer_size = Texture::CalculateSize(
true, src_bytes_per_pixel, regs.dst_params.size_x, regs.dst_params.size_y, 1,
true, bytes_per_pixel, regs.dst_params.size_x, regs.dst_params.size_y, 1,
regs.dst_params.BlockHeight(), regs.dst_params.BlockDepth());
const std::size_t src_size = regs.src_pitch * regs.y_count;
@@ -133,14 +136,19 @@ void MaxwellDMA::HandleCopy() {
write_buffer.resize(dst_size);
}
memory_manager.ReadBlock(source, read_buffer.data(), src_size);
memory_manager.ReadBlock(dest, write_buffer.data(), dst_size);
if (Settings::values.use_accurate_gpu_emulation) {
memory_manager.ReadBlock(source, read_buffer.data(), src_size);
memory_manager.ReadBlock(dest, write_buffer.data(), dst_size);
} else {
memory_manager.ReadBlockUnsafe(source, read_buffer.data(), src_size);
memory_manager.ReadBlockUnsafe(dest, write_buffer.data(), dst_size);
}
// If the input is linear and the output is tiled, swizzle the input and copy it over.
Texture::SwizzleSubrect(regs.x_count, regs.y_count, regs.src_pitch, regs.dst_params.size_x,
src_bytes_per_pixel,
write_buffer.data() + dst_layer_size * regs.dst_params.pos_z,
read_buffer.data(), regs.dst_params.BlockHeight());
Texture::SwizzleSubrect(
regs.x_count, regs.y_count, regs.src_pitch, regs.dst_params.size_x, bytes_per_pixel,
write_buffer.data() + dst_layer_size * regs.dst_params.pos_z, read_buffer.data(),
regs.dst_params.BlockHeight(), regs.dst_params.pos_x, regs.dst_params.pos_y);
memory_manager.WriteBlock(dest, write_buffer.data(), dst_size);
}

View File

@@ -20,10 +20,6 @@ namespace Tegra {
class MemoryManager;
}
namespace VideoCore {
class RasterizerInterface;
}
namespace Tegra::Engines {
/**
@@ -33,8 +29,7 @@ namespace Tegra::Engines {
class MaxwellDMA final {
public:
explicit MaxwellDMA(Core::System& system, VideoCore::RasterizerInterface& rasterizer,
MemoryManager& memory_manager);
explicit MaxwellDMA(Core::System& system, MemoryManager& memory_manager);
~MaxwellDMA() = default;
/// Write the value to the register identified by method.
@@ -180,8 +175,6 @@ public:
private:
Core::System& system;
VideoCore::RasterizerInterface& rasterizer;
MemoryManager& memory_manager;
std::vector<u8> read_buffer;

View File

@@ -538,6 +538,12 @@ enum class PhysicalAttributeDirection : u64 {
Output = 1,
};
enum class VoteOperation : u64 {
All = 0, // allThreadsNV
Any = 1, // anyThreadNV
Eq = 2, // allThreadsEqualNV
};
union Instruction {
Instruction& operator=(const Instruction& instr) {
value = instr.value;
@@ -564,6 +570,13 @@ union Instruction {
BitField<13, 1, u64> trigger;
} nop;
union {
BitField<48, 2, VoteOperation> operation;
BitField<45, 3, u64> dest_pred;
BitField<39, 3, u64> value;
BitField<42, 1, u64> negate_value;
} vote;
union {
BitField<8, 8, Register> gpr;
BitField<20, 24, s64> offset;
@@ -873,6 +886,7 @@ union Instruction {
union {
BitField<0, 3, u64> pred0;
BitField<3, 3, u64> pred3;
BitField<6, 1, u64> neg_b;
BitField<7, 1, u64> abs_a;
BitField<39, 3, u64> pred39;
BitField<42, 1, u64> neg_pred;
@@ -1006,7 +1020,6 @@ union Instruction {
} iset;
union {
BitField<41, 2, u64> selector; // i2i and i2f only
BitField<45, 1, u64> negate_a;
BitField<49, 1, u64> abs_a;
BitField<10, 2, Register::Size> src_size;
@@ -1023,8 +1036,6 @@ union Instruction {
} f2i;
union {
BitField<8, 2, Register::Size> src_size;
BitField<10, 2, Register::Size> dst_size;
BitField<39, 4, u64> rounding;
// H0, H1 extract for F16 missing
BitField<41, 1, u64> selector; // Guessed as some games set it, TODO: reverse this value
@@ -1034,6 +1045,13 @@ union Instruction {
}
} f2f;
union {
BitField<41, 2, u64> selector;
} int_src;
union {
BitField<41, 1, u64> selector;
} float_src;
} conversion;
union {
@@ -1489,6 +1507,7 @@ public:
SYNC,
BRK,
DEPBAR,
VOTE,
BFE_C,
BFE_R,
BFE_IMM,
@@ -1651,6 +1670,7 @@ public:
Hfma2,
Flow,
Synch,
Warp,
Memory,
Texture,
Image,
@@ -1777,6 +1797,7 @@ private:
INST("111000110100---", Id::BRK, Type::Flow, "BRK"),
INST("111000110000----", Id::EXIT, Type::Flow, "EXIT"),
INST("1111000011110---", Id::DEPBAR, Type::Synch, "DEPBAR"),
INST("0101000011011---", Id::VOTE, Type::Warp, "VOTE"),
INST("1110111111011---", Id::LD_A, Type::Memory, "LD_A"),
INST("1110111101001---", Id::LD_S, Type::Memory, "LD_S"),
INST("1110111101000---", Id::LD_L, Type::Memory, "LD_L"),

View File

@@ -23,9 +23,9 @@ GPU::GPU(Core::System& system, VideoCore::RendererBase& renderer, bool is_async)
memory_manager = std::make_unique<Tegra::MemoryManager>(system, rasterizer);
dma_pusher = std::make_unique<Tegra::DmaPusher>(*this);
maxwell_3d = std::make_unique<Engines::Maxwell3D>(system, rasterizer, *memory_manager);
fermi_2d = std::make_unique<Engines::Fermi2D>(rasterizer, *memory_manager);
fermi_2d = std::make_unique<Engines::Fermi2D>(rasterizer);
kepler_compute = std::make_unique<Engines::KeplerCompute>(system, rasterizer, *memory_manager);
maxwell_dma = std::make_unique<Engines::MaxwellDMA>(system, rasterizer, *memory_manager);
maxwell_dma = std::make_unique<Engines::MaxwellDMA>(system, *memory_manager);
kepler_memory = std::make_unique<Engines::KeplerMemory>(system, *memory_manager);
}
@@ -108,6 +108,10 @@ bool GPU::CancelSyncptInterrupt(const u32 syncpoint_id, const u32 value) {
return true;
}
void GPU::FlushCommands() {
renderer.Rasterizer().FlushCommands();
}
u32 RenderTargetBytesPerPixel(RenderTargetFormat format) {
ASSERT(format != RenderTargetFormat::NONE);

View File

@@ -19,6 +19,10 @@ inline CacheAddr ToCacheAddr(const void* host_ptr) {
return reinterpret_cast<CacheAddr>(host_ptr);
}
inline u8* FromCacheAddr(CacheAddr cache_addr) {
return reinterpret_cast<u8*>(cache_addr);
}
namespace Core {
class System;
}
@@ -149,6 +153,8 @@ public:
/// Calls a GPU method.
void CallMethod(const MethodCall& method_call);
void FlushCommands();
/// Returns a reference to the Maxwell3D GPU engine.
Engines::Maxwell3D& Maxwell3D();
@@ -274,8 +280,8 @@ private:
protected:
std::unique_ptr<Tegra::DmaPusher> dma_pusher;
VideoCore::RendererBase& renderer;
Core::System& system;
VideoCore::RendererBase& renderer;
private:
std::unique_ptr<Tegra::MemoryManager> memory_manager;

View File

@@ -50,6 +50,9 @@ public:
/// and invalidated
virtual void FlushAndInvalidateRegion(CacheAddr addr, u64 size) = 0;
/// Notify the rasterizer to send all written commands to the host GPU.
virtual void FlushCommands() = 0;
/// Notify rasterizer that a frame is about to finish
virtual void TickFrame() = 0;

View File

@@ -7,28 +7,41 @@
#include <glad/glad.h>
#include "common/assert.h"
#include "common/microprofile.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/renderer_opengl/gl_buffer_cache.h"
#include "video_core/renderer_opengl/gl_rasterizer.h"
#include "video_core/renderer_opengl/gl_resource_manager.h"
namespace OpenGL {
MICROPROFILE_DEFINE(OpenGL_Buffer_Download, "OpenGL", "Buffer Download", MP_RGB(192, 192, 128));
CachedBufferBlock::CachedBufferBlock(CacheAddr cache_addr, const std::size_t size)
: VideoCommon::BufferBlock{cache_addr, size} {
gl_buffer.Create();
glNamedBufferData(gl_buffer.handle, static_cast<GLsizeiptr>(size), nullptr, GL_DYNAMIC_DRAW);
}
CachedBufferBlock::~CachedBufferBlock() = default;
OGLBufferCache::OGLBufferCache(RasterizerOpenGL& rasterizer, Core::System& system,
std::size_t stream_size)
: VideoCommon::BufferCache<OGLBuffer, GLuint, OGLStreamBuffer>{
: VideoCommon::BufferCache<Buffer, GLuint, OGLStreamBuffer>{
rasterizer, system, std::make_unique<OGLStreamBuffer>(stream_size, true)} {}
OGLBufferCache::~OGLBufferCache() = default;
OGLBuffer OGLBufferCache::CreateBuffer(std::size_t size) {
OGLBuffer buffer;
buffer.Create();
glNamedBufferData(buffer.handle, static_cast<GLsizeiptr>(size), nullptr, GL_DYNAMIC_DRAW);
return buffer;
Buffer OGLBufferCache::CreateBlock(CacheAddr cache_addr, std::size_t size) {
return std::make_shared<CachedBufferBlock>(cache_addr, size);
}
const GLuint* OGLBufferCache::ToHandle(const OGLBuffer& buffer) {
return &buffer.handle;
void OGLBufferCache::WriteBarrier() {
glMemoryBarrier(GL_ALL_BARRIER_BITS);
}
const GLuint* OGLBufferCache::ToHandle(const Buffer& buffer) {
return buffer->GetHandle();
}
const GLuint* OGLBufferCache::GetEmptyBuffer(std::size_t) {
@@ -36,23 +49,24 @@ const GLuint* OGLBufferCache::GetEmptyBuffer(std::size_t) {
return &null_buffer;
}
void OGLBufferCache::UploadBufferData(const OGLBuffer& buffer, std::size_t offset, std::size_t size,
const u8* data) {
glNamedBufferSubData(buffer.handle, static_cast<GLintptr>(offset),
void OGLBufferCache::UploadBlockData(const Buffer& buffer, std::size_t offset, std::size_t size,
const u8* data) {
glNamedBufferSubData(*buffer->GetHandle(), static_cast<GLintptr>(offset),
static_cast<GLsizeiptr>(size), data);
}
void OGLBufferCache::DownloadBufferData(const OGLBuffer& buffer, std::size_t offset,
std::size_t size, u8* data) {
glGetNamedBufferSubData(buffer.handle, static_cast<GLintptr>(offset),
void OGLBufferCache::DownloadBlockData(const Buffer& buffer, std::size_t offset, std::size_t size,
u8* data) {
MICROPROFILE_SCOPE(OpenGL_Buffer_Download);
glGetNamedBufferSubData(*buffer->GetHandle(), static_cast<GLintptr>(offset),
static_cast<GLsizeiptr>(size), data);
}
void OGLBufferCache::CopyBufferData(const OGLBuffer& src, const OGLBuffer& dst,
std::size_t src_offset, std::size_t dst_offset,
std::size_t size) {
glCopyNamedBufferSubData(src.handle, dst.handle, static_cast<GLintptr>(src_offset),
static_cast<GLintptr>(dst_offset), static_cast<GLsizeiptr>(size));
void OGLBufferCache::CopyBlock(const Buffer& src, const Buffer& dst, std::size_t src_offset,
std::size_t dst_offset, std::size_t size) {
glCopyNamedBufferSubData(*src->GetHandle(), *dst->GetHandle(),
static_cast<GLintptr>(src_offset), static_cast<GLintptr>(dst_offset),
static_cast<GLsizeiptr>(size));
}
} // namespace OpenGL

View File

@@ -7,7 +7,7 @@
#include <memory>
#include "common/common_types.h"
#include "video_core/buffer_cache.h"
#include "video_core/buffer_cache/buffer_cache.h"
#include "video_core/rasterizer_cache.h"
#include "video_core/renderer_opengl/gl_resource_manager.h"
#include "video_core/renderer_opengl/gl_stream_buffer.h"
@@ -21,7 +21,24 @@ namespace OpenGL {
class OGLStreamBuffer;
class RasterizerOpenGL;
class OGLBufferCache final : public VideoCommon::BufferCache<OGLBuffer, GLuint, OGLStreamBuffer> {
class CachedBufferBlock;
using Buffer = std::shared_ptr<CachedBufferBlock>;
class CachedBufferBlock : public VideoCommon::BufferBlock {
public:
explicit CachedBufferBlock(CacheAddr cache_addr, const std::size_t size);
~CachedBufferBlock();
const GLuint* GetHandle() const {
return &gl_buffer.handle;
}
private:
OGLBuffer gl_buffer{};
};
class OGLBufferCache final : public VideoCommon::BufferCache<Buffer, GLuint, OGLStreamBuffer> {
public:
explicit OGLBufferCache(RasterizerOpenGL& rasterizer, Core::System& system,
std::size_t stream_size);
@@ -30,18 +47,20 @@ public:
const GLuint* GetEmptyBuffer(std::size_t) override;
protected:
OGLBuffer CreateBuffer(std::size_t size) override;
Buffer CreateBlock(CacheAddr cache_addr, std::size_t size) override;
const GLuint* ToHandle(const OGLBuffer& buffer) override;
void WriteBarrier() override;
void UploadBufferData(const OGLBuffer& buffer, std::size_t offset, std::size_t size,
const u8* data) override;
const GLuint* ToHandle(const Buffer& buffer) override;
void DownloadBufferData(const OGLBuffer& buffer, std::size_t offset, std::size_t size,
u8* data) override;
void UploadBlockData(const Buffer& buffer, std::size_t offset, std::size_t size,
const u8* data) override;
void CopyBufferData(const OGLBuffer& src, const OGLBuffer& dst, std::size_t src_offset,
std::size_t dst_offset, std::size_t size) override;
void DownloadBlockData(const Buffer& buffer, std::size_t offset, std::size_t size,
u8* data) override;
void CopyBlock(const Buffer& src, const Buffer& dst, std::size_t src_offset,
std::size_t dst_offset, std::size_t size) override;
};
} // namespace OpenGL

View File

@@ -14,12 +14,22 @@
namespace OpenGL {
namespace {
template <typename T>
T GetInteger(GLenum pname) {
GLint temporary;
glGetIntegerv(pname, &temporary);
return static_cast<T>(temporary);
}
bool TestProgram(const GLchar* glsl) {
const GLuint shader{glCreateShaderProgramv(GL_VERTEX_SHADER, 1, &glsl)};
GLint link_status;
glGetProgramiv(shader, GL_LINK_STATUS, &link_status);
glDeleteProgram(shader);
return link_status == GL_TRUE;
}
} // Anonymous namespace
Device::Device() {
@@ -27,42 +37,41 @@ Device::Device() {
shader_storage_alignment = GetInteger<std::size_t>(GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT);
max_vertex_attributes = GetInteger<u32>(GL_MAX_VERTEX_ATTRIBS);
max_varyings = GetInteger<u32>(GL_MAX_VARYING_VECTORS);
has_warp_intrinsics = GLAD_GL_NV_gpu_shader5 && GLAD_GL_NV_shader_thread_group &&
GLAD_GL_NV_shader_thread_shuffle;
has_vertex_viewport_layer = GLAD_GL_ARB_shader_viewport_layer_array;
has_variable_aoffi = TestVariableAoffi();
has_component_indexing_bug = TestComponentIndexingBug();
has_precise_bug = TestPreciseBug();
LOG_INFO(Render_OpenGL, "Renderer_VariableAOFFI: {}", has_variable_aoffi);
LOG_INFO(Render_OpenGL, "Renderer_ComponentIndexingBug: {}", has_component_indexing_bug);
LOG_INFO(Render_OpenGL, "Renderer_PreciseBug: {}", has_precise_bug);
}
Device::Device(std::nullptr_t) {
uniform_buffer_alignment = 0;
max_vertex_attributes = 16;
max_varyings = 15;
has_warp_intrinsics = true;
has_vertex_viewport_layer = true;
has_variable_aoffi = true;
has_component_indexing_bug = false;
has_precise_bug = false;
}
bool Device::TestVariableAoffi() {
const GLchar* AOFFI_TEST = R"(#version 430 core
return TestProgram(R"(#version 430 core
// This is a unit test, please ignore me on apitrace bug reports.
uniform sampler2D tex;
uniform ivec2 variable_offset;
out vec4 output_attribute;
void main() {
output_attribute = textureOffset(tex, vec2(0), variable_offset);
}
)";
const GLuint shader{glCreateShaderProgramv(GL_VERTEX_SHADER, 1, &AOFFI_TEST)};
GLint link_status{};
glGetProgramiv(shader, GL_LINK_STATUS, &link_status);
glDeleteProgram(shader);
const bool supported{link_status == GL_TRUE};
LOG_INFO(Render_OpenGL, "Renderer_VariableAOFFI: {}", supported);
return supported;
})");
}
bool Device::TestComponentIndexingBug() {
constexpr char log_message[] = "Renderer_ComponentIndexingBug: {}";
const GLchar* COMPONENT_TEST = R"(#version 430 core
layout (std430, binding = 0) buffer OutputBuffer {
uint output_value;
@@ -102,12 +111,21 @@ void main() {
GLuint result;
glGetNamedBufferSubData(ssbo.handle, 0, sizeof(result), &result);
if (result != values.at(index)) {
LOG_INFO(Render_OpenGL, log_message, true);
return true;
}
}
LOG_INFO(Render_OpenGL, log_message, false);
return false;
}
bool Device::TestPreciseBug() {
return !TestProgram(R"(#version 430 core
in vec3 coords;
out float out_value;
uniform sampler2DShadow tex;
void main() {
precise float tmp_value = vec4(texture(tex, coords)).x;
out_value = tmp_value;
})");
}
} // namespace OpenGL

View File

@@ -30,6 +30,10 @@ public:
return max_varyings;
}
bool HasWarpIntrinsics() const {
return has_warp_intrinsics;
}
bool HasVertexViewportLayer() const {
return has_vertex_viewport_layer;
}
@@ -42,17 +46,24 @@ public:
return has_component_indexing_bug;
}
bool HasPreciseBug() const {
return has_precise_bug;
}
private:
static bool TestVariableAoffi();
static bool TestComponentIndexingBug();
static bool TestPreciseBug();
std::size_t uniform_buffer_alignment{};
std::size_t shader_storage_alignment{};
u32 max_vertex_attributes{};
u32 max_varyings{};
bool has_warp_intrinsics{};
bool has_vertex_viewport_layer{};
bool has_variable_aoffi{};
bool has_component_indexing_bug{};
bool has_precise_bug{};
};
} // namespace OpenGL

View File

@@ -708,8 +708,6 @@ void RasterizerOpenGL::DrawArrays() {
return;
}
const auto& regs = gpu.regs;
SyncColorMask();
SyncFragmentColorClampState();
SyncMultiSampleState();
@@ -863,6 +861,10 @@ void RasterizerOpenGL::FlushAndInvalidateRegion(CacheAddr addr, u64 size) {
InvalidateRegion(addr, size);
}
void RasterizerOpenGL::FlushCommands() {
glFlush();
}
void RasterizerOpenGL::TickFrame() {
buffer_cache.TickFrame();
}
@@ -976,7 +978,7 @@ void RasterizerOpenGL::SetupGlobalMemory(const GLShader::GlobalMemoryEntry& entr
GPUVAddr gpu_addr, std::size_t size) {
const auto alignment{device.GetShaderStorageBufferAlignment()};
const auto [ssbo, buffer_offset] =
buffer_cache.UploadMemory(gpu_addr, size, alignment, true, entry.IsWritten());
buffer_cache.UploadMemory(gpu_addr, size, alignment, entry.IsWritten());
bind_ssbo_pushbuffer.Push(ssbo, buffer_offset, static_cast<GLsizeiptr>(size));
}

View File

@@ -63,6 +63,7 @@ public:
void FlushRegion(CacheAddr addr, u64 size) override;
void InvalidateRegion(CacheAddr addr, u64 size) override;
void FlushAndInvalidateRegion(CacheAddr addr, u64 size) override;
void FlushCommands() override;
void TickFrame() override;
bool AccelerateSurfaceCopy(const Tegra::Engines::Fermi2D::Regs::Surface& src,
const Tegra::Engines::Fermi2D::Regs::Surface& dst,

View File

@@ -212,7 +212,9 @@ CachedProgram SpecializeShader(const std::string& code, const GLShader::ShaderEn
const auto texture_buffer_usage{variant.texture_buffer_usage};
std::string source = "#version 430 core\n"
"#extension GL_ARB_separate_shader_objects : enable\n";
"#extension GL_ARB_separate_shader_objects : enable\n"
"#extension GL_NV_gpu_shader5 : enable\n"
"#extension GL_NV_shader_thread_group : enable\n";
if (entries.shader_viewport_layer_array) {
source += "#extension GL_ARB_shader_viewport_layer_array : enable\n";
}
@@ -247,20 +249,24 @@ CachedProgram SpecializeShader(const std::string& code, const GLShader::ShaderEn
if (!texture_buffer_usage.test(i)) {
continue;
}
source += fmt::format("#define SAMPLER_{}_IS_BUFFER", i);
source += fmt::format("#define SAMPLER_{}_IS_BUFFER\n", i);
}
if (texture_buffer_usage.any()) {
source += '\n';
}
if (program_type == ProgramType::Geometry) {
const auto [glsl_topology, debug_name, max_vertices] =
GetPrimitiveDescription(primitive_mode);
source += "layout (" + std::string(glsl_topology) + ") in;\n";
source += "layout (" + std::string(glsl_topology) + ") in;\n\n";
source += "#define MAX_VERTEX_INPUT " + std::to_string(max_vertices) + '\n';
}
if (program_type == ProgramType::Compute) {
source += "layout (local_size_variable) in;\n";
}
source += '\n';
source += code;
OGLShader shader;
@@ -289,7 +295,7 @@ std::set<GLenum> GetSupportedFormats() {
CachedShader::CachedShader(const ShaderParameters& params, ProgramType program_type,
GLShader::ProgramResult result)
: RasterizerCacheObject{params.host_ptr}, host_ptr{params.host_ptr}, cpu_addr{params.cpu_addr},
: RasterizerCacheObject{params.host_ptr}, cpu_addr{params.cpu_addr},
unique_identifier{params.unique_identifier}, program_type{program_type},
disk_cache{params.disk_cache}, precompiled_programs{params.precompiled_programs},
entries{result.second}, code{std::move(result.first)}, shader_length{entries.shader_length} {}

View File

@@ -106,7 +106,6 @@ private:
ShaderDiskCacheUsage GetUsage(const ProgramVariant& variant) const;
u8* host_ptr{};
VAddr cpu_addr{};
u64 unique_identifier{};
ProgramType program_type{};

File diff suppressed because it is too large Load Diff

View File

@@ -184,6 +184,9 @@ GLint GetSwizzleSource(SwizzleSource source) {
}
void ApplyTextureDefaults(const SurfaceParams& params, GLuint texture) {
if (params.IsBuffer()) {
return;
}
glTextureParameteri(texture, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTextureParameteri(texture, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTextureParameteri(texture, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
@@ -208,6 +211,7 @@ OGLTexture CreateTexture(const SurfaceParams& params, GLenum target, GLenum inte
glNamedBufferStorage(texture_buffer.handle, params.width * params.GetBytesPerPixel(),
nullptr, GL_DYNAMIC_STORAGE_BIT);
glTextureBuffer(texture.handle, internal_format, texture_buffer.handle);
break;
case SurfaceTarget::Texture2D:
case SurfaceTarget::TextureCubemap:
glTextureStorage2D(texture.handle, params.emulated_levels, internal_format, params.width,

View File

@@ -51,7 +51,7 @@ public:
}
protected:
void DecorateSurfaceName();
void DecorateSurfaceName() override;
View CreateView(const ViewParams& view_key) override;
View CreateViewInner(const ViewParams& view_key, bool is_proxy);

View File

@@ -735,6 +735,16 @@ private:
return {};
}
Id FCastHalf0(Operation operation) {
UNIMPLEMENTED();
return {};
}
Id FCastHalf1(Operation operation) {
UNIMPLEMENTED();
return {};
}
Id HNegate(Operation operation) {
UNIMPLEMENTED();
return {};
@@ -745,6 +755,11 @@ private:
return {};
}
Id HCastFloat(Operation operation) {
UNIMPLEMENTED();
return {};
}
Id HUnpack(Operation operation) {
UNIMPLEMENTED();
return {};
@@ -1057,6 +1072,26 @@ private:
return {};
}
Id BallotThread(Operation) {
UNIMPLEMENTED();
return {};
}
Id VoteAll(Operation) {
UNIMPLEMENTED();
return {};
}
Id VoteAny(Operation) {
UNIMPLEMENTED();
return {};
}
Id VoteEqual(Operation) {
UNIMPLEMENTED();
return {};
}
Id DeclareBuiltIn(spv::BuiltIn builtin, spv::StorageClass storage, Id type,
const std::string& name) {
const Id id = OpVariable(type, storage);
@@ -1210,6 +1245,8 @@ private:
&SPIRVDecompiler::Unary<&Module::OpFNegate, Type::Float>,
&SPIRVDecompiler::Unary<&Module::OpFAbs, Type::Float>,
&SPIRVDecompiler::Ternary<&Module::OpFClamp, Type::Float>,
&SPIRVDecompiler::FCastHalf0,
&SPIRVDecompiler::FCastHalf1,
&SPIRVDecompiler::Binary<&Module::OpFMin, Type::Float>,
&SPIRVDecompiler::Binary<&Module::OpFMax, Type::Float>,
&SPIRVDecompiler::Unary<&Module::OpCos, Type::Float>,
@@ -1270,6 +1307,7 @@ private:
&SPIRVDecompiler::Unary<&Module::OpFAbs, Type::HalfFloat>,
&SPIRVDecompiler::HNegate,
&SPIRVDecompiler::HClamp,
&SPIRVDecompiler::HCastFloat,
&SPIRVDecompiler::HUnpack,
&SPIRVDecompiler::HMergeF32,
&SPIRVDecompiler::HMergeH0,
@@ -1346,6 +1384,11 @@ private:
&SPIRVDecompiler::WorkGroupId<0>,
&SPIRVDecompiler::WorkGroupId<1>,
&SPIRVDecompiler::WorkGroupId<2>,
&SPIRVDecompiler::BallotThread,
&SPIRVDecompiler::VoteAll,
&SPIRVDecompiler::VoteAny,
&SPIRVDecompiler::VoteEqual,
};
static_assert(operation_decompilers.size() == static_cast<std::size_t>(OperationCode::Amount));

View File

@@ -176,6 +176,7 @@ u32 ShaderIR::DecodeInstr(NodeBlock& bb, u32 pc) {
{OpCode::Type::Ffma, &ShaderIR::DecodeFfma},
{OpCode::Type::Hfma2, &ShaderIR::DecodeHfma2},
{OpCode::Type::Conversion, &ShaderIR::DecodeConversion},
{OpCode::Type::Warp, &ShaderIR::DecodeWarp},
{OpCode::Type::Memory, &ShaderIR::DecodeMemory},
{OpCode::Type::Texture, &ShaderIR::DecodeTexture},
{OpCode::Type::Image, &ShaderIR::DecodeImage},

View File

@@ -14,6 +14,12 @@ using Tegra::Shader::Instruction;
using Tegra::Shader::OpCode;
using Tegra::Shader::Register;
namespace {
constexpr OperationCode GetFloatSelector(u64 selector) {
return selector == 0 ? OperationCode::FCastHalf0 : OperationCode::FCastHalf1;
}
} // Anonymous namespace
u32 ShaderIR::DecodeConversion(NodeBlock& bb, u32 pc) {
const Instruction instr = {program_code[pc]};
const auto opcode = OpCode::Decode(instr);
@@ -22,7 +28,7 @@ u32 ShaderIR::DecodeConversion(NodeBlock& bb, u32 pc) {
case OpCode::Id::I2I_R:
case OpCode::Id::I2I_C:
case OpCode::Id::I2I_IMM: {
UNIMPLEMENTED_IF(instr.conversion.selector);
UNIMPLEMENTED_IF(instr.conversion.int_src.selector != 0);
UNIMPLEMENTED_IF(instr.conversion.dst_size != Register::Size::Word);
UNIMPLEMENTED_IF(instr.alu.saturate_d);
@@ -57,8 +63,8 @@ u32 ShaderIR::DecodeConversion(NodeBlock& bb, u32 pc) {
case OpCode::Id::I2F_R:
case OpCode::Id::I2F_C:
case OpCode::Id::I2F_IMM: {
UNIMPLEMENTED_IF(instr.conversion.dst_size != Register::Size::Word);
UNIMPLEMENTED_IF(instr.conversion.selector);
UNIMPLEMENTED_IF(instr.conversion.int_src.selector != 0);
UNIMPLEMENTED_IF(instr.conversion.dst_size == Register::Size::Long);
UNIMPLEMENTED_IF_MSG(instr.generates_cc,
"Condition codes generation in I2F is not implemented");
@@ -82,14 +88,19 @@ u32 ShaderIR::DecodeConversion(NodeBlock& bb, u32 pc) {
value = GetOperandAbsNegFloat(value, false, instr.conversion.negate_a);
SetInternalFlagsFromFloat(bb, value, instr.generates_cc);
if (instr.conversion.dst_size == Register::Size::Short) {
value = Operation(OperationCode::HCastFloat, PRECISE, value);
}
SetRegister(bb, instr.gpr0, value);
break;
}
case OpCode::Id::F2F_R:
case OpCode::Id::F2F_C:
case OpCode::Id::F2F_IMM: {
UNIMPLEMENTED_IF(instr.conversion.f2f.dst_size != Register::Size::Word);
UNIMPLEMENTED_IF(instr.conversion.f2f.src_size != Register::Size::Word);
UNIMPLEMENTED_IF(instr.conversion.dst_size == Register::Size::Long);
UNIMPLEMENTED_IF(instr.conversion.src_size == Register::Size::Long);
UNIMPLEMENTED_IF_MSG(instr.generates_cc,
"Condition codes generation in F2F is not implemented");
@@ -107,6 +118,13 @@ u32 ShaderIR::DecodeConversion(NodeBlock& bb, u32 pc) {
}
}();
if (instr.conversion.src_size == Register::Size::Short) {
value = Operation(GetFloatSelector(instr.conversion.float_src.selector), NO_PRECISE,
std::move(value));
} else {
ASSERT(instr.conversion.float_src.selector == 0);
}
value = GetOperandAbsNegFloat(value, instr.conversion.abs_a, instr.conversion.negate_a);
value = [&]() {
@@ -124,19 +142,24 @@ u32 ShaderIR::DecodeConversion(NodeBlock& bb, u32 pc) {
default:
UNIMPLEMENTED_MSG("Unimplemented F2F rounding mode {}",
static_cast<u32>(instr.conversion.f2f.rounding.Value()));
return Immediate(0);
return value;
}
}();
value = GetSaturatedFloat(value, instr.alu.saturate_d);
SetInternalFlagsFromFloat(bb, value, instr.generates_cc);
if (instr.conversion.dst_size == Register::Size::Short) {
value = Operation(OperationCode::HCastFloat, PRECISE, value);
}
SetRegister(bb, instr.gpr0, value);
break;
}
case OpCode::Id::F2I_R:
case OpCode::Id::F2I_C:
case OpCode::Id::F2I_IMM: {
UNIMPLEMENTED_IF(instr.conversion.src_size != Register::Size::Word);
UNIMPLEMENTED_IF(instr.conversion.src_size == Register::Size::Long);
UNIMPLEMENTED_IF_MSG(instr.generates_cc,
"Condition codes generation in F2I is not implemented");
Node value = [&]() {
@@ -153,6 +176,13 @@ u32 ShaderIR::DecodeConversion(NodeBlock& bb, u32 pc) {
}
}();
if (instr.conversion.src_size == Register::Size::Short) {
value = Operation(GetFloatSelector(instr.conversion.float_src.selector), NO_PRECISE,
std::move(value));
} else {
ASSERT(instr.conversion.float_src.selector == 0);
}
value = GetOperandAbsNegFloat(value, instr.conversion.abs_a, instr.conversion.negate_a);
value = [&]() {

View File

@@ -15,7 +15,6 @@ using Tegra::Shader::OpCode;
u32 ShaderIR::DecodeFloatSet(NodeBlock& bb, u32 pc) {
const Instruction instr = {program_code[pc]};
const auto opcode = OpCode::Decode(instr);
const Node op_a = GetOperandAbsNegFloat(GetRegister(instr.gpr8), instr.fset.abs_a != 0,
instr.fset.neg_a != 0);

View File

@@ -16,10 +16,9 @@ using Tegra::Shader::Pred;
u32 ShaderIR::DecodeFloatSetPredicate(NodeBlock& bb, u32 pc) {
const Instruction instr = {program_code[pc]};
const auto opcode = OpCode::Decode(instr);
const Node op_a = GetOperandAbsNegFloat(GetRegister(instr.gpr8), instr.fsetp.abs_a != 0,
instr.fsetp.neg_a != 0);
Node op_a = GetOperandAbsNegFloat(GetRegister(instr.gpr8), instr.fsetp.abs_a != 0,
instr.fsetp.neg_a != 0);
Node op_b = [&]() {
if (instr.is_b_imm) {
return GetImmediate19(instr);
@@ -29,12 +28,13 @@ u32 ShaderIR::DecodeFloatSetPredicate(NodeBlock& bb, u32 pc) {
return GetConstBuffer(instr.cbuf34.index, instr.cbuf34.GetOffset());
}
}();
op_b = GetOperandAbsNegFloat(op_b, instr.fsetp.abs_b, false);
op_b = GetOperandAbsNegFloat(std::move(op_b), instr.fsetp.abs_b, instr.fsetp.neg_b);
// We can't use the constant predicate as destination.
ASSERT(instr.fsetp.pred3 != static_cast<u64>(Pred::UnusedIndex));
const Node predicate = GetPredicateComparisonFloat(instr.fsetp.cond, op_a, op_b);
const Node predicate =
GetPredicateComparisonFloat(instr.fsetp.cond, std::move(op_a), std::move(op_b));
const Node second_pred = GetPredicate(instr.fsetp.pred39, instr.fsetp.neg_pred != 0);
const OperationCode combiner = GetPredicateCombiner(instr.fsetp.op);

View File

@@ -30,7 +30,7 @@ u32 ShaderIR::DecodeHalfSetPredicate(NodeBlock& bb, u32 pc) {
case OpCode::Id::HSETP2_C:
cond = instr.hsetp2.cbuf_and_imm.cond;
h_and = instr.hsetp2.cbuf_and_imm.h_and;
op_b = GetOperandAbsNegHalf(GetConstBuffer(instr.cbuf34.index, instr.cbuf34.offset),
op_b = GetOperandAbsNegHalf(GetConstBuffer(instr.cbuf34.index, instr.cbuf34.GetOffset()),
instr.hsetp2.cbuf.abs_b, instr.hsetp2.cbuf.negate_b);
break;
case OpCode::Id::HSETP2_IMM:

View File

@@ -14,7 +14,6 @@ using Tegra::Shader::OpCode;
u32 ShaderIR::DecodeIntegerSet(NodeBlock& bb, u32 pc) {
const Instruction instr = {program_code[pc]};
const auto opcode = OpCode::Decode(instr);
const Node op_a = GetRegister(instr.gpr8);
const Node op_b = [&]() {

View File

@@ -16,7 +16,6 @@ using Tegra::Shader::Pred;
u32 ShaderIR::DecodeIntegerSetPredicate(NodeBlock& bb, u32 pc) {
const Instruction instr = {program_code[pc]};
const auto opcode = OpCode::Decode(instr);
const Node op_a = GetRegister(instr.gpr8);

View File

@@ -74,6 +74,13 @@ u32 ShaderIR::DecodeOther(NodeBlock& bb, u32 pc) {
case SystemVariable::InvocationInfo:
LOG_WARNING(HW_GPU, "MOV_SYS instruction with InvocationInfo is incomplete");
return Immediate(0u);
case SystemVariable::Tid: {
Node value = Immediate(0);
value = BitfieldInsert(value, Operation(OperationCode::LocalInvocationIdX), 0, 9);
value = BitfieldInsert(value, Operation(OperationCode::LocalInvocationIdY), 16, 9);
value = BitfieldInsert(value, Operation(OperationCode::LocalInvocationIdZ), 26, 5);
return value;
}
case SystemVariable::TidX:
return Operation(OperationCode::LocalInvocationIdX);
case SystemVariable::TidY:

View File

@@ -15,7 +15,6 @@ using Tegra::Shader::OpCode;
u32 ShaderIR::DecodePredicateSetRegister(NodeBlock& bb, u32 pc) {
const Instruction instr = {program_code[pc]};
const auto opcode = OpCode::Decode(instr);
UNIMPLEMENTED_IF_MSG(instr.generates_cc,
"Condition codes generation in PSET is not implemented");

View File

@@ -0,0 +1,55 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include "common/assert.h"
#include "common/common_types.h"
#include "video_core/engines/shader_bytecode.h"
#include "video_core/shader/node_helper.h"
#include "video_core/shader/shader_ir.h"
namespace VideoCommon::Shader {
using Tegra::Shader::Instruction;
using Tegra::Shader::OpCode;
using Tegra::Shader::Pred;
using Tegra::Shader::VoteOperation;
namespace {
OperationCode GetOperationCode(VoteOperation vote_op) {
switch (vote_op) {
case VoteOperation::All:
return OperationCode::VoteAll;
case VoteOperation::Any:
return OperationCode::VoteAny;
case VoteOperation::Eq:
return OperationCode::VoteEqual;
default:
UNREACHABLE_MSG("Invalid vote operation={}", static_cast<u64>(vote_op));
return OperationCode::VoteAll;
}
}
} // Anonymous namespace
u32 ShaderIR::DecodeWarp(NodeBlock& bb, u32 pc) {
const Instruction instr = {program_code[pc]};
const auto opcode = OpCode::Decode(instr);
switch (opcode->get().GetId()) {
case OpCode::Id::VOTE: {
const Node value = GetPredicate(instr.vote.value, instr.vote.negate_value != 0);
const Node active = Operation(OperationCode::BallotThread, value);
const Node vote = Operation(GetOperationCode(instr.vote.operation), value);
SetRegister(bb, instr.gpr0, active);
SetPredicate(bb, instr.vote.dest_pred, vote);
break;
}
default:
UNIMPLEMENTED_MSG("Unhandled warp instruction: {}", opcode->get().GetName());
break;
}
return pc;
}
} // namespace VideoCommon::Shader

View File

@@ -30,6 +30,8 @@ enum class OperationCode {
FNegate, /// (MetaArithmetic, float a) -> float
FAbsolute, /// (MetaArithmetic, float a) -> float
FClamp, /// (MetaArithmetic, float value, float min, float max) -> float
FCastHalf0, /// (MetaArithmetic, f16vec2 a) -> float
FCastHalf1, /// (MetaArithmetic, f16vec2 a) -> float
FMin, /// (MetaArithmetic, float a, float b) -> float
FMax, /// (MetaArithmetic, float a, float b) -> float
FCos, /// (MetaArithmetic, float a) -> float
@@ -83,17 +85,18 @@ enum class OperationCode {
UBitfieldExtract, /// (MetaArithmetic, uint value, int offset, int offset) -> uint
UBitCount, /// (MetaArithmetic, uint) -> uint
HAdd, /// (MetaArithmetic, f16vec2 a, f16vec2 b) -> f16vec2
HMul, /// (MetaArithmetic, f16vec2 a, f16vec2 b) -> f16vec2
HFma, /// (MetaArithmetic, f16vec2 a, f16vec2 b, f16vec2 c) -> f16vec2
HAbsolute, /// (f16vec2 a) -> f16vec2
HNegate, /// (f16vec2 a, bool first, bool second) -> f16vec2
HClamp, /// (f16vec2 src, float min, float max) -> f16vec2
HUnpack, /// (Tegra::Shader::HalfType, T value) -> f16vec2
HMergeF32, /// (f16vec2 src) -> float
HMergeH0, /// (f16vec2 dest, f16vec2 src) -> f16vec2
HMergeH1, /// (f16vec2 dest, f16vec2 src) -> f16vec2
HPack2, /// (float a, float b) -> f16vec2
HAdd, /// (MetaArithmetic, f16vec2 a, f16vec2 b) -> f16vec2
HMul, /// (MetaArithmetic, f16vec2 a, f16vec2 b) -> f16vec2
HFma, /// (MetaArithmetic, f16vec2 a, f16vec2 b, f16vec2 c) -> f16vec2
HAbsolute, /// (f16vec2 a) -> f16vec2
HNegate, /// (f16vec2 a, bool first, bool second) -> f16vec2
HClamp, /// (f16vec2 src, float min, float max) -> f16vec2
HCastFloat, /// (MetaArithmetic, float a) -> f16vec2
HUnpack, /// (Tegra::Shader::HalfType, T value) -> f16vec2
HMergeF32, /// (f16vec2 src) -> float
HMergeH0, /// (f16vec2 dest, f16vec2 src) -> f16vec2
HMergeH1, /// (f16vec2 dest, f16vec2 src) -> f16vec2
HPack2, /// (float a, float b) -> f16vec2
LogicalAssign, /// (bool& dst, bool src) -> void
LogicalAnd, /// (bool a, bool b) -> bool
@@ -165,6 +168,11 @@ enum class OperationCode {
WorkGroupIdY, /// () -> uint
WorkGroupIdZ, /// () -> uint
BallotThread, /// (bool) -> uint
VoteAll, /// (bool) -> bool
VoteAny, /// (bool) -> bool
VoteEqual, /// (bool) -> bool
Amount,
};

View File

@@ -405,4 +405,9 @@ Node ShaderIR::BitfieldExtract(Node value, u32 offset, u32 bits) {
Immediate(offset), Immediate(bits));
}
Node ShaderIR::BitfieldInsert(Node base, Node insert, u32 offset, u32 bits) {
return Operation(OperationCode::UBitfieldInsert, NO_PRECISE, base, insert, Immediate(offset),
Immediate(bits));
}
} // namespace VideoCommon::Shader

View File

@@ -167,6 +167,7 @@ private:
u32 DecodeFfma(NodeBlock& bb, u32 pc);
u32 DecodeHfma2(NodeBlock& bb, u32 pc);
u32 DecodeConversion(NodeBlock& bb, u32 pc);
u32 DecodeWarp(NodeBlock& bb, u32 pc);
u32 DecodeMemory(NodeBlock& bb, u32 pc);
u32 DecodeTexture(NodeBlock& bb, u32 pc);
u32 DecodeImage(NodeBlock& bb, u32 pc);
@@ -279,6 +280,9 @@ private:
/// Extracts a sequence of bits from a node
Node BitfieldExtract(Node value, u32 offset, u32 bits);
/// Inserts a sequence of bits from a node
Node BitfieldInsert(Node base, Node insert, u32 offset, u32 bits);
void WriteTexInstructionFloat(NodeBlock& bb, Tegra::Shader::Instruction instr,
const Node4& components);

View File

@@ -58,7 +58,6 @@ public:
std::size_t GetHostSizeInBytes() const {
std::size_t host_size_in_bytes;
if (GetCompressionType() == SurfaceCompression::Converted) {
constexpr std::size_t rgb8_bpp = 4ULL;
// ASTC is uncompressed in software, in emulated as RGBA8
host_size_in_bytes = 0;
for (u32 level = 0; level < num_levels; ++level) {

View File

@@ -308,8 +308,6 @@ protected:
if (!guard_render_targets && surface->IsRenderTarget()) {
ManageRenderTargetUnregister(surface);
}
const GPUVAddr gpu_addr = surface->GetGpuAddr();
const CacheAddr cache_ptr = surface->GetCacheAddr();
const std::size_t size = surface->GetSizeInBytes();
const VAddr cpu_addr = surface->GetCpuAddr();
rasterizer.UpdatePagesCachedCount(cpu_addr, size, -1);

View File

@@ -257,19 +257,21 @@ std::vector<u8> UnswizzleTexture(u8* address, u32 tile_size_x, u32 tile_size_y,
void SwizzleSubrect(u32 subrect_width, u32 subrect_height, u32 source_pitch, u32 swizzled_width,
u32 bytes_per_pixel, u8* swizzled_data, u8* unswizzled_data,
u32 block_height_bit) {
u32 block_height_bit, u32 offset_x, u32 offset_y) {
const u32 block_height = 1U << block_height_bit;
const u32 image_width_in_gobs{(swizzled_width * bytes_per_pixel + (gob_size_x - 1)) /
gob_size_x};
for (u32 line = 0; line < subrect_height; ++line) {
const u32 dst_y = line + offset_y;
const u32 gob_address_y =
(line / (gob_size_y * block_height)) * gob_size * block_height * image_width_in_gobs +
((line % (gob_size_y * block_height)) / gob_size_y) * gob_size;
const auto& table = legacy_swizzle_table[line % gob_size_y];
(dst_y / (gob_size_y * block_height)) * gob_size * block_height * image_width_in_gobs +
((dst_y % (gob_size_y * block_height)) / gob_size_y) * gob_size;
const auto& table = legacy_swizzle_table[dst_y % gob_size_y];
for (u32 x = 0; x < subrect_width; ++x) {
const u32 dst_x = x + offset_x;
const u32 gob_address =
gob_address_y + (x * bytes_per_pixel / gob_size_x) * gob_size * block_height;
const u32 swizzled_offset = gob_address + table[(x * bytes_per_pixel) % gob_size_x];
gob_address_y + (dst_x * bytes_per_pixel / gob_size_x) * gob_size * block_height;
const u32 swizzled_offset = gob_address + table[(dst_x * bytes_per_pixel) % gob_size_x];
u8* source_line = unswizzled_data + line * source_pitch + x * bytes_per_pixel;
u8* dest_addr = swizzled_data + swizzled_offset;

View File

@@ -44,7 +44,8 @@ std::size_t CalculateSize(bool tiled, u32 bytes_per_pixel, u32 width, u32 height
/// Copies an untiled subrectangle into a tiled surface.
void SwizzleSubrect(u32 subrect_width, u32 subrect_height, u32 source_pitch, u32 swizzled_width,
u32 bytes_per_pixel, u8* swizzled_data, u8* unswizzled_data, u32 block_height);
u32 bytes_per_pixel, u8* swizzled_data, u8* unswizzled_data, u32 block_height,
u32 offset_x, u32 offset_y);
/// Copies a tiled subrectangle into a linear surface.
void UnswizzleSubrect(u32 subrect_width, u32 subrect_height, u32 dest_pitch, u32 swizzled_width,

View File

@@ -213,7 +213,7 @@ struct TICEntry {
if (header_version != TICHeaderVersion::OneDBuffer) {
return width_minus_1 + 1;
}
return (buffer_high_width_minus_one << 16) | buffer_low_width_minus_one;
return ((buffer_high_width_minus_one << 16) | buffer_low_width_minus_one) + 1;
}
u32 Height() const {

View File

@@ -516,6 +516,7 @@ void Config::ReadPathValues() {
UISettings::values.roms_path = ReadSetting(QStringLiteral("romsPath")).toString();
UISettings::values.symbols_path = ReadSetting(QStringLiteral("symbolsPath")).toString();
UISettings::values.screenshot_path = ReadSetting(QStringLiteral("screenshotPath")).toString();
UISettings::values.game_directory_path =
ReadSetting(QStringLiteral("gameListRootDir"), QStringLiteral(".")).toString();
UISettings::values.game_directory_deepscan =

View File

@@ -119,6 +119,7 @@ Q_IMPORT_PLUGIN(QWindowsIntegrationPlugin);
#endif
#ifdef _WIN32
#include <windows.h>
extern "C" {
// tells Nvidia and AMD drivers to use the dedicated GPU by default on laptops with switchable
// graphics
@@ -747,6 +748,18 @@ void GMainWindow::OnDisplayTitleBars(bool show) {
}
}
void GMainWindow::PreventOSSleep() {
#ifdef _WIN32
SetThreadExecutionState(ES_CONTINUOUS | ES_SYSTEM_REQUIRED | ES_DISPLAY_REQUIRED);
#endif
}
void GMainWindow::AllowOSSleep() {
#ifdef _WIN32
SetThreadExecutionState(ES_CONTINUOUS);
#endif
}
QStringList GMainWindow::GetUnsupportedGLExtensions() {
QStringList unsupported_ext;
@@ -966,6 +979,8 @@ void GMainWindow::BootGame(const QString& filename) {
}
void GMainWindow::ShutdownGame() {
AllowOSSleep();
discord_rpc->Pause();
emu_thread->RequestStop();
@@ -1567,6 +1582,8 @@ void GMainWindow::OnMenuRecentFile() {
}
void GMainWindow::OnStartGame() {
PreventOSSleep();
emu_thread->SetRunning(true);
qRegisterMetaType<Core::Frontend::SoftwareKeyboardParameters>(
@@ -1598,6 +1615,8 @@ void GMainWindow::OnPauseGame() {
ui.action_Pause->setEnabled(false);
ui.action_Stop->setEnabled(true);
ui.action_Capture_Screenshot->setEnabled(false);
AllowOSSleep();
}
void GMainWindow::OnStopGame() {

View File

@@ -130,6 +130,9 @@ private:
void ConnectWidgetEvents();
void ConnectMenuEvents();
void PreventOSSleep();
void AllowOSSleep();
QStringList GetUnsupportedGLExtensions();
bool LoadROM(const QString& filename);
void BootGame(const QString& filename);

View File

@@ -92,7 +92,6 @@ int main(int argc, char** argv) {
int option_index = 0;
char* endarg;
#ifdef _WIN32
int argc_w;
auto argv_w = CommandLineToArgvW(GetCommandLineW(), &argc_w);
@@ -226,7 +225,7 @@ int main(int argc, char** argv) {
switch (load_result) {
case Core::System::ResultStatus::ErrorGetLoader:
LOG_CRITICAL(Frontend, "Failed to obtain loader for %s!", filepath.c_str());
LOG_CRITICAL(Frontend, "Failed to obtain loader for {}!", filepath);
return -1;
case Core::System::ResultStatus::ErrorLoader:
LOG_CRITICAL(Frontend, "Failed to load ROM!");