Compare commits

..

33 Commits

Author SHA1 Message Date
Lioncash
289adf87ac CMakeLists: Remove EMU_ARCH_BITS definition
This was only ever used by the now-removed memory_util functions. Also,
given we don't plan to support 32-bit architectures, this is just a
leftover from citra at this point.
2018-10-23 12:24:43 -04:00
Lioncash
1291f3f820 common: Remove memory_util.cpp/.h
Everything from here is completely unused and also written with the
notion of supporting 32-bit architecture variants in mind. Given the
Switch itself is on a 64-bit architecture, we won't be supporting 32-bit
architectures. If we need specific allocation functions in the future,
it's likely more worthwhile to new functions for that purpose.
2018-10-23 12:21:34 -04:00
bunnei
e7e209d900 Merge pull request #1552 from FearlessTobi/port-4336
Port citra-emu/citra#4336: "Only redefine some 64-bit file operation for MSVC"
2018-10-23 10:23:41 -04:00
bunnei
5716496239 Merge pull request #1519 from ReinUsesLisp/vsetp
gl_shader_decompiler: Implement VSETP
2018-10-23 10:22:37 -04:00
bunnei
0f3d8c2574 Merge pull request #1539 from lioncash/dma
maxwell_dma: Silence compilation warnings
2018-10-23 10:22:12 -04:00
bunnei
75d807788c Merge pull request #1470 from FernandoS27/alpha_testing
Implemented Alpha Test using Shader Emulation
2018-10-23 10:21:30 -04:00
Weiyi Wang
d9ca6351dd cmake: mingw also needs _FILE_OFFSET_BITS=64 2018-10-23 15:11:25 +02:00
Weiyi Wang
2ff2732a78 only redefine 64 bit file operation for MSVC
MinGW provides POSIX functions
2018-10-23 15:11:18 +02:00
ReinUsesLisp
7d6dca0d0a gl_shader_decompiler: Implement VSETP 2018-10-23 01:07:20 -03:00
ReinUsesLisp
5dfb43531c gl_shader_decompiler: Abstract VMAD into a video subset 2018-10-23 01:07:20 -03:00
bunnei
848a49112a Merge pull request #1512 from ReinUsesLisp/brk
gl_shader_decompiler: Implement PBK and BRK
2018-10-23 00:01:38 -04:00
bunnei
496d155d7b Merge pull request #1550 from FernandoS27/fmul32
Added Saturation to FMUL32I
2018-10-22 23:58:09 -04:00
bunnei
40c63073a9 Merge pull request #1543 from lioncash/target
CMakeLists: Use target_compile_definitions instead of add_definitions to define YUZU_ENABLE_COMPATIBILITY_REPORTING
2018-10-22 22:50:10 -04:00
bunnei
4cccfb4190 Merge pull request #1537 from lioncash/shader
gl_shader_decompiler: Minor changes
2018-10-22 22:49:49 -04:00
FernandoS27
8e1239fbc5 Assert that multiple render targets are not set while alpha testing 2018-10-22 15:35:45 -04:00
bunnei
ff6b2d4574 Merge pull request #1545 from DarkLordZach/psm
psm: Add psm service and stub commands 0 and 1
2018-10-22 15:27:05 -04:00
FernandoS27
59a004f915 Use standard UBO and fix/stylize the code 2018-10-22 15:07:33 -04:00
FernandoS27
17315cee16 Cache uniform locations and restructure the implementation 2018-10-22 15:07:32 -04:00
FernandoS27
bcb5b924fd Remove SyncAlphaTest and clang format 2018-10-22 15:07:31 -04:00
FernandoS27
7b39107e3a Added Alpha Func 2018-10-22 15:07:30 -04:00
FernandoS27
aa620c14af Implemented Alpha Testing 2018-10-22 15:07:30 -04:00
Zach Hilman
314a948373 psm: Stub GetChargerType
Used by LovePotion Lua Homebrew. Stubbed as connected to official Nintendo Switch dock.
2018-10-21 22:03:25 -04:00
Zach Hilman
10a2d20e26 psm: Stub GetBatteryChargePercentage
Used by LovePotion Lua Homebrew. Stubbed to return 100% charge.
2018-10-20 18:01:11 -04:00
Zach Hilman
3b8c0f8885 service: Add skeleton for psm service
Seems to be the power controller. Listed in switchbrew under the category PTM services.
2018-10-20 18:01:07 -04:00
Lioncash
98a27b1ec7 CMakeLists: Use target_compile_definitions instead of add_definitions to define YUZU_ENABLE_COMPATIBILITY_REPORTING
Keeps the definition constrained to the yuzu target and prevents
polluting anything else in the same directory (should that ever happen).
It also keeps it consistent with how the USE_DISCORD_PRESENCE definition
is introduced below it.
2018-10-20 17:28:29 -04:00
Lioncash
c1e5525fc6 engines/maxwell_*: Use nested namespace specifiers where applicable
These three source files are the only ones within the engines directory
that don't use nested namespaces. We may as well change these over to
keep things consistent.
2018-10-20 15:58:09 -04:00
Lioncash
d53c73adaa maxwell_dma: Make variables const where applicable within HandleCopy()
These are never modified, so we can make that assumption explicit.
2018-10-20 15:56:01 -04:00
Lioncash
dd1ee39426 maxwell_dma: Make FlushAndInvalidate's size parameter a u64
This prevents truncation warnings at the lambda's usage sites.
2018-10-20 15:54:45 -04:00
Lioncash
08e574eec4 maxwell_dma: Remove unused variables in HandleCopy()
These pointer variables are never used, so we can get rid of them.
2018-10-20 15:53:24 -04:00
Lioncash
8a86c8d48b gl_shader_decompiler: Allow std::move to function in SetPredicate
If the variable being moved is const, then std::move will always perform
a copy (since it can't actually move the data).
2018-10-20 14:25:15 -04:00
Lioncash
381baf783d gl_shader_decompiler: Get rid of variable shadowing warnings
A variable with the same name was previously declared in an outer scope.
2018-10-20 14:22:37 -04:00
Lioncash
61ef8af1e2 gl_shader_decompiler: Fix a few comment typos 2018-10-20 14:19:28 -04:00
ReinUsesLisp
41fb25349a gl_shader_decompiler: Implement PBK and BRK 2018-10-17 21:30:45 -03:00
22 changed files with 354 additions and 351 deletions

View File

@@ -170,7 +170,7 @@ endif()
# On modern Unixes, this is typically already the case. The lone exception is
# glibc, which may default to 32 bits. glibc allows this to be configured
# by setting _FILE_OFFSET_BITS.
if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
if(CMAKE_SYSTEM_NAME STREQUAL "Linux" OR MINGW)
add_definitions(-D_FILE_OFFSET_BITS=64)
endif()
@@ -178,10 +178,6 @@ endif()
set_property(DIRECTORY APPEND PROPERTY
COMPILE_DEFINITIONS $<$<CONFIG:Debug>:_DEBUG> $<$<NOT:$<CONFIG:Debug>>:NDEBUG>)
math(EXPR EMU_ARCH_BITS ${CMAKE_SIZEOF_VOID_P}*8)
add_definitions(-DEMU_ARCH_BITS=${EMU_ARCH_BITS})
# System imported libraries
# ======================

View File

@@ -64,8 +64,6 @@ add_library(common STATIC
logging/text_formatter.cpp
logging/text_formatter.h
math_util.h
memory_util.cpp
memory_util.h
microprofile.cpp
microprofile.h
microprofileui.h

View File

@@ -15,21 +15,24 @@
#ifdef _WIN32
#include <windows.h>
// windows.h needs to be included before other windows headers
#include <commdlg.h> // for GetSaveFileName
#include <direct.h> // getcwd
#include <direct.h> // getcwd
#include <io.h>
#include <shellapi.h>
#include <shlobj.h> // for SHGetFolderPath
#include <tchar.h>
#include "common/string_util.h"
// 64 bit offsets for windows
#ifdef _MSC_VER
// 64 bit offsets for MSVC
#define fseeko _fseeki64
#define ftello _ftelli64
#define atoll _atoi64
#define fileno _fileno
#endif
// 64 bit offsets for MSVC and MinGW. MinGW also needs this for using _wstat64
#define stat _stat64
#define fstat _fstat64
#define fileno _fileno
#else
#ifdef __APPLE__
#include <sys/param.h>

View File

@@ -91,6 +91,7 @@ enum class Class : ClassType {
Service_PM, ///< The PM service
Service_PREPO, ///< The PREPO (Play report) service
Service_PSC, ///< The PSC service
Service_PSM, ///< The PSM service
Service_SET, ///< The SET (Settings) service
Service_SM, ///< The SM (Service manager) service
Service_SPL, ///< The SPL service

View File

@@ -1,177 +0,0 @@
// Copyright 2013 Dolphin Emulator Project / 2014 Citra Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include "common/logging/log.h"
#include "common/memory_util.h"
#ifdef _WIN32
#include <windows.h>
// Windows.h needs to be included before psapi.h
#include <psapi.h>
#include "common/common_funcs.h"
#include "common/string_util.h"
#else
#include <cstdlib>
#include <sys/mman.h>
#endif
#if !defined(_WIN32) && defined(ARCHITECTURE_x86_64) && !defined(MAP_32BIT)
#include <unistd.h>
#define PAGE_MASK (getpagesize() - 1)
#define round_page(x) ((((unsigned long)(x)) + PAGE_MASK) & ~(PAGE_MASK))
#endif
// This is purposely not a full wrapper for virtualalloc/mmap, but it
// provides exactly the primitive operations that Dolphin needs.
void* AllocateExecutableMemory(std::size_t size, bool low) {
#if defined(_WIN32)
void* ptr = VirtualAlloc(nullptr, size, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
#else
static char* map_hint = nullptr;
#if defined(ARCHITECTURE_x86_64) && !defined(MAP_32BIT)
// This OS has no flag to enforce allocation below the 4 GB boundary,
// but if we hint that we want a low address it is very likely we will
// get one.
// An older version of this code used MAP_FIXED, but that has the side
// effect of discarding already mapped pages that happen to be in the
// requested virtual memory range (such as the emulated RAM, sometimes).
if (low && (!map_hint))
map_hint = (char*)round_page(512 * 1024 * 1024); /* 0.5 GB rounded up to the next page */
#endif
void* ptr = mmap(map_hint, size, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_ANON | MAP_PRIVATE
#if defined(ARCHITECTURE_x86_64) && defined(MAP_32BIT)
| (low ? MAP_32BIT : 0)
#endif
,
-1, 0);
#endif /* defined(_WIN32) */
#ifdef _WIN32
if (ptr == nullptr) {
#else
if (ptr == MAP_FAILED) {
ptr = nullptr;
#endif
LOG_ERROR(Common_Memory, "Failed to allocate executable memory");
}
#if !defined(_WIN32) && defined(ARCHITECTURE_x86_64) && !defined(MAP_32BIT)
else {
if (low) {
map_hint += size;
map_hint = (char*)round_page(map_hint); /* round up to the next page */
}
}
#endif
#if EMU_ARCH_BITS == 64
if ((u64)ptr >= 0x80000000 && low == true)
LOG_ERROR(Common_Memory, "Executable memory ended up above 2GB!");
#endif
return ptr;
}
void* AllocateMemoryPages(std::size_t size) {
#ifdef _WIN32
void* ptr = VirtualAlloc(nullptr, size, MEM_COMMIT, PAGE_READWRITE);
#else
void* ptr = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
if (ptr == MAP_FAILED)
ptr = nullptr;
#endif
if (ptr == nullptr)
LOG_ERROR(Common_Memory, "Failed to allocate raw memory");
return ptr;
}
void* AllocateAlignedMemory(std::size_t size, std::size_t alignment) {
#ifdef _WIN32
void* ptr = _aligned_malloc(size, alignment);
#else
void* ptr = nullptr;
#ifdef ANDROID
ptr = memalign(alignment, size);
#else
if (posix_memalign(&ptr, alignment, size) != 0)
LOG_ERROR(Common_Memory, "Failed to allocate aligned memory");
#endif
#endif
if (ptr == nullptr)
LOG_ERROR(Common_Memory, "Failed to allocate aligned memory");
return ptr;
}
void FreeMemoryPages(void* ptr, std::size_t size) {
if (ptr) {
#ifdef _WIN32
if (!VirtualFree(ptr, 0, MEM_RELEASE))
LOG_ERROR(Common_Memory, "FreeMemoryPages failed!\n{}", GetLastErrorMsg());
#else
munmap(ptr, size);
#endif
}
}
void FreeAlignedMemory(void* ptr) {
if (ptr) {
#ifdef _WIN32
_aligned_free(ptr);
#else
free(ptr);
#endif
}
}
void WriteProtectMemory(void* ptr, std::size_t size, bool allowExecute) {
#ifdef _WIN32
DWORD oldValue;
if (!VirtualProtect(ptr, size, allowExecute ? PAGE_EXECUTE_READ : PAGE_READONLY, &oldValue))
LOG_ERROR(Common_Memory, "WriteProtectMemory failed!\n{}", GetLastErrorMsg());
#else
mprotect(ptr, size, allowExecute ? (PROT_READ | PROT_EXEC) : PROT_READ);
#endif
}
void UnWriteProtectMemory(void* ptr, std::size_t size, bool allowExecute) {
#ifdef _WIN32
DWORD oldValue;
if (!VirtualProtect(ptr, size, allowExecute ? PAGE_EXECUTE_READWRITE : PAGE_READWRITE,
&oldValue))
LOG_ERROR(Common_Memory, "UnWriteProtectMemory failed!\n{}", GetLastErrorMsg());
#else
mprotect(ptr, size,
allowExecute ? (PROT_READ | PROT_WRITE | PROT_EXEC) : PROT_WRITE | PROT_READ);
#endif
}
std::string MemUsage() {
#ifdef _WIN32
#pragma comment(lib, "psapi")
DWORD processID = GetCurrentProcessId();
HANDLE hProcess;
PROCESS_MEMORY_COUNTERS pmc;
std::string Ret;
// Print information about the memory usage of the process.
hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, processID);
if (nullptr == hProcess)
return "MemUsage Error";
if (GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc)))
Ret = fmt::format("{} K", Common::ThousandSeparate(pmc.WorkingSetSize / 1024, 7));
CloseHandle(hProcess);
return Ret;
#else
return "";
#endif
}

View File

@@ -1,21 +0,0 @@
// Copyright 2013 Dolphin Emulator Project / 2014 Citra Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <cstddef>
#include <string>
void* AllocateExecutableMemory(std::size_t size, bool low = true);
void* AllocateMemoryPages(std::size_t size);
void FreeMemoryPages(void* ptr, std::size_t size);
void* AllocateAlignedMemory(std::size_t size, std::size_t alignment);
void FreeAlignedMemory(void* ptr);
void WriteProtectMemory(void* ptr, std::size_t size, bool executable = false);
void UnWriteProtectMemory(void* ptr, std::size_t size, bool allowExecute = false);
std::string MemUsage();
inline int GetPageSize() {
return 4096;
}

View File

@@ -331,6 +331,8 @@ add_library(core STATIC
hle/service/prepo/prepo.h
hle/service/psc/psc.cpp
hle/service/psc/psc.h
hle/service/ptm/psm.cpp
hle/service/ptm/psm.h
hle/service/service.cpp
hle/service/service.h
hle/service/set/set.cpp

View File

@@ -0,0 +1,71 @@
// Copyright 2018 yuzu emulator team
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include <memory>
#include "common/logging/log.h"
#include "core/hle/ipc_helpers.h"
#include "core/hle/service/ptm/psm.h"
#include "core/hle/service/service.h"
#include "core/hle/service/sm/sm.h"
namespace Service::PSM {
constexpr u32 BATTERY_FULLY_CHARGED = 100; // 100% Full
constexpr u32 BATTERY_CURRENTLY_CHARGING = 1; // Plugged into an official dock
class PSM final : public ServiceFramework<PSM> {
public:
explicit PSM() : ServiceFramework{"psm"} {
// clang-format off
static const FunctionInfo functions[] = {
{0, &PSM::GetBatteryChargePercentage, "GetBatteryChargePercentage"},
{1, &PSM::GetChargerType, "GetChargerType"},
{2, nullptr, "EnableBatteryCharging"},
{3, nullptr, "DisableBatteryCharging"},
{4, nullptr, "IsBatteryChargingEnabled"},
{5, nullptr, "AcquireControllerPowerSupply"},
{6, nullptr, "ReleaseControllerPowerSupply"},
{7, nullptr, "OpenSession"},
{8, nullptr, "EnableEnoughPowerChargeEmulation"},
{9, nullptr, "DisableEnoughPowerChargeEmulation"},
{10, nullptr, "EnableFastBatteryCharging"},
{11, nullptr, "DisableFastBatteryCharging"},
{12, nullptr, "GetBatteryVoltageState"},
{13, nullptr, "GetRawBatteryChargePercentage"},
{14, nullptr, "IsEnoughPowerSupplied"},
{15, nullptr, "GetBatteryAgePercentage"},
{16, nullptr, "GetBatteryChargeInfoEvent"},
{17, nullptr, "GetBatteryChargeInfoFields"},
};
// clang-format on
RegisterHandlers(functions);
}
~PSM() override = default;
private:
void GetBatteryChargePercentage(Kernel::HLERequestContext& ctx) {
LOG_WARNING(Service_PSM, "(STUBBED) called");
IPC::ResponseBuilder rb{ctx, 3};
rb.Push(RESULT_SUCCESS);
rb.Push<u32>(BATTERY_FULLY_CHARGED);
}
void GetChargerType(Kernel::HLERequestContext& ctx) {
LOG_WARNING(Service_PSM, "(STUBBED) called");
IPC::ResponseBuilder rb{ctx, 3};
rb.Push(RESULT_SUCCESS);
rb.Push<u32>(BATTERY_CURRENTLY_CHARGING);
}
};
void InstallInterfaces(SM::ServiceManager& sm) {
std::make_shared<PSM>()->InstallAsService(sm);
}
} // namespace Service::PSM

View File

@@ -0,0 +1,15 @@
// Copyright 2018 yuzu emulator team
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
namespace Service::SM {
class ServiceManager;
}
namespace Service::PSM {
void InstallInterfaces(SM::ServiceManager& sm);
} // namespace Service::PSM

View File

@@ -58,6 +58,7 @@
#include "core/hle/service/pm/pm.h"
#include "core/hle/service/prepo/prepo.h"
#include "core/hle/service/psc/psc.h"
#include "core/hle/service/ptm/psm.h"
#include "core/hle/service/service.h"
#include "core/hle/service/set/settings.h"
#include "core/hle/service/sm/sm.h"
@@ -246,6 +247,7 @@ void Init(std::shared_ptr<SM::ServiceManager>& sm, FileSys::VfsFilesystem& vfs)
PlayReport::InstallInterfaces(*sm);
PM::InstallInterfaces(*sm);
PSC::InstallInterfaces(*sm);
PSM::InstallInterfaces(*sm);
Set::InstallInterfaces(*sm);
Sockets::InstallInterfaces(*sm);
SPL::InstallInterfaces(*sm);

View File

@@ -13,8 +13,7 @@
#include "video_core/renderer_base.h"
#include "video_core/textures/texture.h"
namespace Tegra {
namespace Engines {
namespace Tegra::Engines {
/// First register id that is actually a Macro call.
constexpr u32 MacroRegistersStart = 0xE00;
@@ -408,5 +407,4 @@ void Maxwell3D::ProcessClearBuffers() {
rasterizer.Clear();
}
} // namespace Engines
} // namespace Tegra
} // namespace Tegra::Engines

View File

@@ -643,8 +643,10 @@ public:
u32 d3d_cull_mode;
ComparisonOp depth_test_func;
float alpha_test_ref;
ComparisonOp alpha_test_func;
INSERT_PADDING_WORDS(0xB);
INSERT_PADDING_WORDS(0x9);
struct {
u32 separate_alpha;

View File

@@ -6,8 +6,7 @@
#include "core/core.h"
#include "video_core/engines/maxwell_compute.h"
namespace Tegra {
namespace Engines {
namespace Tegra::Engines {
void MaxwellCompute::WriteReg(u32 method, u32 value) {
ASSERT_MSG(method < Regs::NUM_REGS,
@@ -26,5 +25,4 @@ void MaxwellCompute::WriteReg(u32 method, u32 value) {
}
}
} // namespace Engines
} // namespace Tegra
} // namespace Tegra::Engines

View File

@@ -7,8 +7,7 @@
#include "video_core/rasterizer_interface.h"
#include "video_core/textures/decoders.h"
namespace Tegra {
namespace Engines {
namespace Tegra::Engines {
MaxwellDMA::MaxwellDMA(VideoCore::RasterizerInterface& rasterizer, MemoryManager& memory_manager)
: memory_manager(memory_manager), rasterizer{rasterizer} {}
@@ -78,9 +77,9 @@ void MaxwellDMA::HandleCopy() {
ASSERT(regs.exec.enable_2d == 1);
std::size_t copy_size = regs.x_count * regs.y_count;
const std::size_t copy_size = regs.x_count * regs.y_count;
const auto FlushAndInvalidate = [&](u32 src_size, u32 dst_size) {
const auto FlushAndInvalidate = [&](u32 src_size, u64 dst_size) {
// TODO(Subv): For now, manually flush the regions until we implement GPU-accelerated
// copying.
rasterizer.FlushRegion(source_cpu, src_size);
@@ -91,14 +90,11 @@ void MaxwellDMA::HandleCopy() {
rasterizer.InvalidateRegion(dest_cpu, dst_size);
};
u8* src_buffer = Memory::GetPointer(source_cpu);
u8* dst_buffer = Memory::GetPointer(dest_cpu);
if (regs.exec.is_dst_linear && !regs.exec.is_src_linear) {
ASSERT(regs.src_params.size_z == 1);
// If the input is tiled and the output is linear, deswizzle the input and copy it over.
u32 src_bytes_per_pixel = regs.src_pitch / regs.src_params.size_x;
const u32 src_bytes_per_pixel = regs.src_pitch / regs.src_params.size_x;
FlushAndInvalidate(regs.src_pitch * regs.src_params.size_y,
copy_size * src_bytes_per_pixel);
@@ -111,7 +107,7 @@ void MaxwellDMA::HandleCopy() {
ASSERT(regs.dst_params.size_z == 1);
ASSERT(regs.src_pitch == regs.x_count);
u32 src_bpp = regs.src_pitch / regs.x_count;
const u32 src_bpp = regs.src_pitch / regs.x_count;
FlushAndInvalidate(regs.src_pitch * regs.y_count,
regs.dst_params.size_x * regs.dst_params.size_y * src_bpp);
@@ -122,5 +118,4 @@ void MaxwellDMA::HandleCopy() {
}
}
} // namespace Engines
} // namespace Tegra
} // namespace Tegra::Engines

View File

@@ -214,7 +214,7 @@ enum class IMinMaxExchange : u64 {
XHi = 3,
};
enum class VmadType : u64 {
enum class VideoType : u64 {
Size16_Low = 0,
Size16_High = 1,
Size32 = 2,
@@ -782,6 +782,14 @@ union Instruction {
BitField<45, 2, PredOperation> op;
} psetp;
union {
BitField<43, 4, PredCondition> cond;
BitField<45, 2, PredOperation> op;
BitField<3, 3, u64> pred3;
BitField<0, 3, u64> pred0;
BitField<39, 3, u64> pred39;
} vsetp;
union {
BitField<12, 3, u64> pred12;
BitField<15, 1, u64> neg_pred12;
@@ -1154,15 +1162,17 @@ union Instruction {
union {
BitField<48, 1, u64> signed_a;
BitField<38, 1, u64> is_byte_chunk_a;
BitField<36, 2, VmadType> type_a;
BitField<36, 2, VideoType> type_a;
BitField<36, 2, u64> byte_height_a;
BitField<49, 1, u64> signed_b;
BitField<50, 1, u64> use_register_b;
BitField<30, 1, u64> is_byte_chunk_b;
BitField<28, 2, VmadType> type_b;
BitField<28, 2, VideoType> type_b;
BitField<28, 2, u64> byte_height_b;
} video;
union {
BitField<51, 2, VmadShr> shr;
BitField<55, 1, u64> saturate; // Saturates the result (a * b + c)
BitField<47, 1, u64> cc;
@@ -1213,11 +1223,13 @@ public:
KIL,
SSY,
SYNC,
BRK,
DEPBAR,
BFE_C,
BFE_R,
BFE_IMM,
BRA,
PBK,
LD_A,
LD_C,
ST_A,
@@ -1236,6 +1248,7 @@ public:
OUT_R, // Emit vertex/primitive
ISBERD,
VMAD,
VSETP,
FFMA_IMM, // Fused Multiply and Add
FFMA_CR,
FFMA_RC,
@@ -1374,7 +1387,7 @@ public:
/// conditionally executed).
static bool IsPredicatedInstruction(Id opcode) {
// TODO(Subv): Add the rest of unpredicated instructions.
return opcode != Id::SSY;
return opcode != Id::SSY && opcode != Id::PBK;
}
class Matcher {
@@ -1470,9 +1483,11 @@ private:
#define INST(bitstring, op, type, name) Detail::GetMatcher(bitstring, op, type, name)
INST("111000110011----", Id::KIL, Type::Flow, "KIL"),
INST("111000101001----", Id::SSY, Type::Flow, "SSY"),
INST("111000101010----", Id::PBK, Type::Flow, "PBK"),
INST("111000100100----", Id::BRA, Type::Flow, "BRA"),
INST("1111000011111---", Id::SYNC, Type::Flow, "SYNC"),
INST("111000110100---", Id::BRK, Type::Flow, "BRK"),
INST("1111000011110---", Id::DEPBAR, Type::Synch, "DEPBAR"),
INST("1111000011111---", Id::SYNC, Type::Synch, "SYNC"),
INST("1110111111011---", Id::LD_A, Type::Memory, "LD_A"),
INST("1110111110010---", Id::LD_C, Type::Memory, "LD_C"),
INST("1110111111110---", Id::ST_A, Type::Memory, "ST_A"),
@@ -1491,6 +1506,7 @@ private:
INST("1111101111100---", Id::OUT_R, Type::Trivial, "OUT_R"),
INST("1110111111010---", Id::ISBERD, Type::Trivial, "ISBERD"),
INST("01011111--------", Id::VMAD, Type::Trivial, "VMAD"),
INST("0101000011110---", Id::VSETP, Type::Trivial, "VSETP"),
INST("0011001-1-------", Id::FFMA_IMM, Type::Ffma, "FFMA_IMM"),
INST("010010011-------", Id::FFMA_CR, Type::Ffma, "FFMA_CR"),
INST("010100011-------", Id::FFMA_RC, Type::Ffma, "FFMA_RC"),
@@ -1610,4 +1626,4 @@ private:
}
};
} // namespace Tegra::Shader
} // namespace Tegra::Shader

View File

@@ -570,10 +570,11 @@ void RasterizerOpenGL::DrawArrays() {
SyncBlendState();
SyncLogicOpState();
SyncCullMode();
SyncAlphaTest();
SyncScissorTest();
// Alpha Testing is synced on shaders.
SyncTransformFeedback();
SyncPointState();
CheckAlphaTests();
// TODO(bunnei): Sync framebuffer_scale uniform here
// TODO(bunnei): Sync scissorbox uniform(s) here
@@ -1007,17 +1008,6 @@ void RasterizerOpenGL::SyncLogicOpState() {
state.logic_op.operation = MaxwellToGL::LogicOp(regs.logic_op.operation);
}
void RasterizerOpenGL::SyncAlphaTest() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
// TODO(Rodrigo): Alpha testing is a legacy OpenGL feature, but it can be
// implemented with a test+discard in fragment shaders.
if (regs.alpha_test_enabled != 0) {
LOG_CRITICAL(Render_OpenGL, "Alpha testing is not implemented");
UNREACHABLE();
}
}
void RasterizerOpenGL::SyncScissorTest() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
@@ -1052,4 +1042,15 @@ void RasterizerOpenGL::SyncPointState() {
state.point.size = regs.point_size == 0 ? 1 : regs.point_size;
}
void RasterizerOpenGL::CheckAlphaTests() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
if (regs.alpha_test_enabled != 0 && regs.rt_control.count > 1) {
LOG_CRITICAL(
Render_OpenGL,
"Alpha Testing is enabled with Multiple Render Targets, this behavior is undefined.");
UNREACHABLE();
}
}
} // namespace OpenGL

View File

@@ -162,9 +162,6 @@ private:
/// Syncs the LogicOp state to match the guest state
void SyncLogicOpState();
/// Syncs the alpha test state to match the guest state
void SyncAlphaTest();
/// Syncs the scissor test state to match the guest state
void SyncScissorTest();
@@ -174,6 +171,9 @@ private:
/// Syncs the point state to match the guest state
void SyncPointState();
/// Check asserts for alpha testing.
void CheckAlphaTests();
bool has_ARB_direct_state_access = false;
bool has_ARB_multi_bind = false;
bool has_ARB_separate_shader_objects = false;

View File

@@ -163,10 +163,11 @@ private:
const ExitMethod jmp = Scan(target, end, labels);
return exit_method = ParallelExit(no_jmp, jmp);
}
case OpCode::Id::SSY: {
// The SSY instruction uses a similar encoding as the BRA instruction.
case OpCode::Id::SSY:
case OpCode::Id::PBK: {
// The SSY and PBK use a similar encoding as the BRA instruction.
ASSERT_MSG(instr.bra.constant_buffer == 0,
"Constant buffer SSY is not supported");
"Constant buffer branching is not supported");
const u32 target = offset + instr.bra.GetBranchTarget();
labels.insert(target);
// Continue scanning for an exit method.
@@ -378,8 +379,8 @@ public:
* @param reg The destination register to use.
* @param elem The element to use for the operation.
* @param value The code representing the value to assign. Type has to be half float.
* @param type Half float kind of assignment.
* @param dest_num_components Number of components in the destionation.
* @param merge Half float kind of assignment.
* @param dest_num_components Number of components in the destination.
* @param value_num_components Number of components in the value.
* @param is_saturated Optional, when True, saturates the provided value.
* @param dest_elem Optional, the destination element to use for the operation.
@@ -422,6 +423,7 @@ public:
* @param reg The destination register to use.
* @param elem The element to use for the operation.
* @param attribute The input attribute to use as the source value.
* @param input_mode The input mode.
* @param vertex The register that decides which vertex to read from (used in GS).
*/
void SetRegisterToInputAttibute(const Register& reg, u64 elem, Attribute::Index attribute,
@@ -951,7 +953,7 @@ private:
// Can't assign to the constant predicate.
ASSERT(pred != static_cast<u64>(Pred::UnusedIndex));
const std::string variable = 'p' + std::to_string(pred) + '_' + suffix;
std::string variable = 'p' + std::to_string(pred) + '_' + suffix;
shader.AddLine(variable + " = " + value + ';');
declr_predicates.insert(std::move(variable));
}
@@ -1058,7 +1060,7 @@ private:
/*
* Transforms the input string GLSL operand into an unpacked half float pair.
* @note This function returns a float type pair instead of a half float pair. This is because
* real half floats are not standarized in GLSL but unpackHalf2x16 (which returns a vec2) is.
* real half floats are not standardized in GLSL but unpackHalf2x16 (which returns a vec2) is.
* @param operand Input operand. It has to be an unsigned integer.
* @param type How to unpack the unsigned integer to a half float pair.
* @param abs Get the absolute value of unpacked half floats.
@@ -1232,27 +1234,27 @@ private:
}
/*
* Emits code to push the input target address to the SSY address stack, incrementing the stack
* Emits code to push the input target address to the flow address stack, incrementing the stack
* top.
*/
void EmitPushToSSYStack(u32 target) {
void EmitPushToFlowStack(u32 target) {
shader.AddLine('{');
++shader.scope;
shader.AddLine("ssy_stack[ssy_stack_top] = " + std::to_string(target) + "u;");
shader.AddLine("ssy_stack_top++;");
shader.AddLine("flow_stack[flow_stack_top] = " + std::to_string(target) + "u;");
shader.AddLine("flow_stack_top++;");
--shader.scope;
shader.AddLine('}');
}
/*
* Emits code to pop an address from the SSY address stack, setting the jump address to the
* Emits code to pop an address from the flow address stack, setting the jump address to the
* popped address and decrementing the stack top.
*/
void EmitPopFromSSYStack() {
void EmitPopFromFlowStack() {
shader.AddLine('{');
++shader.scope;
shader.AddLine("ssy_stack_top--;");
shader.AddLine("jmp_to = ssy_stack[ssy_stack_top];");
shader.AddLine("flow_stack_top--;");
shader.AddLine("jmp_to = flow_stack[flow_stack_top];");
shader.AddLine("break;");
--shader.scope;
shader.AddLine('}');
@@ -1264,9 +1266,29 @@ private:
ASSERT_MSG(header.ps.omap.sample_mask == 0, "Samplemask write is unimplemented");
shader.AddLine("if (alpha_test[0] != 0) {");
++shader.scope;
// We start on the register containing the alpha value in the first RT.
u32 current_reg = 3;
for (u32 render_target = 0; render_target < Maxwell3D::Regs::NumRenderTargets;
++render_target) {
// TODO(Blinkhawk): verify the behavior of alpha testing on hardware when
// multiple render targets are used.
if (header.ps.IsColorComponentOutputEnabled(render_target, 0) ||
header.ps.IsColorComponentOutputEnabled(render_target, 1) ||
header.ps.IsColorComponentOutputEnabled(render_target, 2) ||
header.ps.IsColorComponentOutputEnabled(render_target, 3)) {
shader.AddLine(fmt::format("if (!AlphaFunc({})) discard;",
regs.GetRegisterAsFloat(current_reg)));
current_reg += 4;
}
}
--shader.scope;
shader.AddLine('}');
// Write the color outputs using the data in the shader registers, disabled
// rendertargets/components are skipped in the register assignment.
u32 current_reg = 0;
current_reg = 0;
for (u32 render_target = 0; render_target < Maxwell3D::Regs::NumRenderTargets;
++render_target) {
// TODO(Subv): Figure out how dual-source blending is configured in the Switch.
@@ -1290,6 +1312,63 @@ private:
}
}
/// Unpacks a video instruction operand (e.g. VMAD).
std::string GetVideoOperand(const std::string& op, bool is_chunk, bool is_signed,
Tegra::Shader::VideoType type, u64 byte_height) {
const std::string value = [&]() {
if (!is_chunk) {
const auto offset = static_cast<u32>(byte_height * 8);
return "((" + op + " >> " + std::to_string(offset) + ") & 0xff)";
}
const std::string zero = "0";
switch (type) {
case Tegra::Shader::VideoType::Size16_Low:
return '(' + op + " & 0xffff)";
case Tegra::Shader::VideoType::Size16_High:
return '(' + op + " >> 16)";
case Tegra::Shader::VideoType::Size32:
// TODO(Rodrigo): From my hardware tests it becomes a bit "mad" when
// this type is used (1 * 1 + 0 == 0x5b800000). Until a better
// explanation is found: assert.
UNIMPLEMENTED();
return zero;
case Tegra::Shader::VideoType::Invalid:
UNREACHABLE_MSG("Invalid instruction encoding");
return zero;
default:
UNREACHABLE();
return zero;
}
}();
if (is_signed) {
return "int(" + value + ')';
}
return value;
};
/// Gets the A operand for a video instruction.
std::string GetVideoOperandA(Instruction instr) {
return GetVideoOperand(regs.GetRegisterAsInteger(instr.gpr8, 0, false),
instr.video.is_byte_chunk_a != 0, instr.video.signed_a,
instr.video.type_a, instr.video.byte_height_a);
}
/// Gets the B operand for a video instruction.
std::string GetVideoOperandB(Instruction instr) {
if (instr.video.use_register_b) {
return GetVideoOperand(regs.GetRegisterAsInteger(instr.gpr20, 0, false),
instr.video.is_byte_chunk_b != 0, instr.video.signed_b,
instr.video.type_b, instr.video.byte_height_b);
} else {
return '(' +
std::to_string(instr.video.signed_b ? static_cast<s16>(instr.alu.GetImm20_16())
: instr.alu.GetImm20_16()) +
')';
}
}
/**
* Compiles a single instruction from Tegra to GLSL.
* @param offset the offset of the Tegra shader instruction.
@@ -3264,16 +3343,32 @@ private:
// The SSY opcode tells the GPU where to re-converge divergent execution paths, it
// sets the target of the jump that the SYNC instruction will make. The SSY opcode
// has a similar structure to the BRA opcode.
ASSERT_MSG(instr.bra.constant_buffer == 0, "Constant buffer SSY is not supported");
ASSERT_MSG(instr.bra.constant_buffer == 0, "Constant buffer flow is not supported");
const u32 target = offset + instr.bra.GetBranchTarget();
EmitPushToSSYStack(target);
EmitPushToFlowStack(target);
break;
}
case OpCode::Id::PBK: {
// PBK pushes to a stack the address where BRK will jump to. This shares stack with
// SSY but using SYNC on a PBK address will kill the shader execution. We don't
// emulate this because it's very unlikely a driver will emit such invalid shader.
ASSERT_MSG(instr.bra.constant_buffer == 0, "Constant buffer PBK is not supported");
const u32 target = offset + instr.bra.GetBranchTarget();
EmitPushToFlowStack(target);
break;
}
case OpCode::Id::SYNC: {
// The SYNC opcode jumps to the address previously set by the SSY opcode
ASSERT(instr.flow.cond == Tegra::Shader::FlowCondition::Always);
EmitPopFromSSYStack();
EmitPopFromFlowStack();
break;
}
case OpCode::Id::BRK: {
// The BRK opcode jumps to the address previously set by the PBK opcode
ASSERT(instr.flow.cond == Tegra::Shader::FlowCondition::Always);
EmitPopFromFlowStack();
break;
}
case OpCode::Id::DEPBAR: {
@@ -3283,87 +3378,51 @@ private:
break;
}
case OpCode::Id::VMAD: {
const bool signed_a = instr.vmad.signed_a == 1;
const bool signed_b = instr.vmad.signed_b == 1;
const bool result_signed = signed_a || signed_b;
boost::optional<std::string> forced_result;
auto Unpack = [&](const std::string& op, bool is_chunk, bool is_signed,
Tegra::Shader::VmadType type, u64 byte_height) {
const std::string value = [&]() {
if (!is_chunk) {
const auto offset = static_cast<u32>(byte_height * 8);
return "((" + op + " >> " + std::to_string(offset) + ") & 0xff)";
}
const std::string zero = "0";
switch (type) {
case Tegra::Shader::VmadType::Size16_Low:
return '(' + op + " & 0xffff)";
case Tegra::Shader::VmadType::Size16_High:
return '(' + op + " >> 16)";
case Tegra::Shader::VmadType::Size32:
// TODO(Rodrigo): From my hardware tests it becomes a bit "mad" when
// this type is used (1 * 1 + 0 == 0x5b800000). Until a better
// explanation is found: assert.
UNREACHABLE_MSG("Unimplemented");
return zero;
case Tegra::Shader::VmadType::Invalid:
// Note(Rodrigo): This flag is invalid according to nvdisasm. From my
// testing (even though it's invalid) this makes the whole instruction
// assign zero to target register.
forced_result = boost::make_optional(zero);
return zero;
default:
UNREACHABLE();
return zero;
}
}();
if (is_signed) {
return "int(" + value + ')';
}
return value;
};
const std::string op_a = Unpack(regs.GetRegisterAsInteger(instr.gpr8, 0, false),
instr.vmad.is_byte_chunk_a != 0, signed_a,
instr.vmad.type_a, instr.vmad.byte_height_a);
std::string op_b;
if (instr.vmad.use_register_b) {
op_b = Unpack(regs.GetRegisterAsInteger(instr.gpr20, 0, false),
instr.vmad.is_byte_chunk_b != 0, signed_b, instr.vmad.type_b,
instr.vmad.byte_height_b);
} else {
op_b = '(' +
std::to_string(signed_b ? static_cast<s16>(instr.alu.GetImm20_16())
: instr.alu.GetImm20_16()) +
')';
}
const bool result_signed = instr.video.signed_a == 1 || instr.video.signed_b == 1;
const std::string op_a = GetVideoOperandA(instr);
const std::string op_b = GetVideoOperandB(instr);
const std::string op_c = regs.GetRegisterAsInteger(instr.gpr39, 0, result_signed);
std::string result;
if (forced_result) {
result = *forced_result;
} else {
result = '(' + op_a + " * " + op_b + " + " + op_c + ')';
std::string result = '(' + op_a + " * " + op_b + " + " + op_c + ')';
switch (instr.vmad.shr) {
case Tegra::Shader::VmadShr::Shr7:
result = '(' + result + " >> 7)";
break;
case Tegra::Shader::VmadShr::Shr15:
result = '(' + result + " >> 15)";
break;
}
switch (instr.vmad.shr) {
case Tegra::Shader::VmadShr::Shr7:
result = '(' + result + " >> 7)";
break;
case Tegra::Shader::VmadShr::Shr15:
result = '(' + result + " >> 15)";
break;
}
regs.SetRegisterToInteger(instr.gpr0, result_signed, 1, result, 1, 1,
instr.vmad.saturate == 1, 0, Register::Size::Word,
instr.vmad.cc);
break;
}
case OpCode::Id::VSETP: {
const std::string op_a = GetVideoOperandA(instr);
const std::string op_b = GetVideoOperandB(instr);
// We can't use the constant predicate as destination.
ASSERT(instr.vsetp.pred3 != static_cast<u64>(Pred::UnusedIndex));
const std::string second_pred = GetPredicateCondition(instr.vsetp.pred39, false);
const std::string combiner = GetPredicateCombiner(instr.vsetp.op);
const std::string predicate = GetPredicateComparison(instr.vsetp.cond, op_a, op_b);
// Set the primary predicate to the result of Predicate OP SecondPredicate
SetPredicate(instr.vsetp.pred3,
'(' + predicate + ") " + combiner + " (" + second_pred + ')');
if (instr.vsetp.pred0 != static_cast<u64>(Pred::UnusedIndex)) {
// Set the secondary predicate to the result of !Predicate OP SecondPredicate,
// if enabled
SetPredicate(instr.vsetp.pred0,
"!(" + predicate + ") " + combiner + " (" + second_pred + ')');
}
break;
}
default: {
LOG_CRITICAL(HW_GPU, "Unhandled instruction: {}", opcode->GetName());
UNREACHABLE();
@@ -3427,11 +3486,11 @@ private:
labels.insert(subroutine.begin);
shader.AddLine("uint jmp_to = " + std::to_string(subroutine.begin) + "u;");
// TODO(Subv): Figure out the actual depth of the SSY stack, for now it seems
// unlikely that shaders will use 20 nested SSYs.
constexpr u32 SSY_STACK_SIZE = 20;
shader.AddLine("uint ssy_stack[" + std::to_string(SSY_STACK_SIZE) + "];");
shader.AddLine("uint ssy_stack_top = 0u;");
// TODO(Subv): Figure out the actual depth of the flow stack, for now it seems
// unlikely that shaders will use 20 nested SSYs and PBKs.
constexpr u32 FLOW_STACK_SIZE = 20;
shader.AddLine("uint flow_stack[" + std::to_string(FLOW_STACK_SIZE) + "];");
shader.AddLine("uint flow_stack_top = 0u;");
shader.AddLine("while (true) {");
++shader.scope;
@@ -3498,7 +3557,7 @@ private:
// Declarations
std::set<std::string> declr_predicates;
}; // namespace Decompiler
}; // namespace OpenGL::GLShader::Decompiler
std::string GetCommonDeclarations() {
return fmt::format("#define MAX_CONSTBUFFER_ELEMENTS {}\n",

View File

@@ -29,6 +29,7 @@ layout(std140) uniform vs_config {
vec4 viewport_flip;
uvec4 instance_id;
uvec4 flip_stage;
uvec4 alpha_test;
};
)";
@@ -105,6 +106,7 @@ layout (std140) uniform gs_config {
vec4 viewport_flip;
uvec4 instance_id;
uvec4 flip_stage;
uvec4 alpha_test;
};
void main() {
@@ -142,8 +144,33 @@ layout (std140) uniform fs_config {
vec4 viewport_flip;
uvec4 instance_id;
uvec4 flip_stage;
uvec4 alpha_test;
};
bool AlphaFunc(in float value) {
float ref = uintBitsToFloat(alpha_test[2]);
switch (alpha_test[1]) {
case 1:
return false;
case 2:
return value < ref;
case 3:
return value == ref;
case 4:
return value <= ref;
case 5:
return value > ref;
case 6:
return value != ref;
case 7:
return value >= ref;
case 8:
return true;
default:
return false;
}
}
void main() {
exec_fragment();
}
@@ -152,4 +179,4 @@ void main() {
out += program.first;
return {out, program.second};
}
} // namespace OpenGL::GLShader
} // namespace OpenGL::GLShader

View File

@@ -16,6 +16,17 @@ void MaxwellUniformData::SetFromRegs(const Maxwell3D::State::ShaderStageInfo& sh
viewport_flip[0] = regs.viewport_transform[0].scale_x < 0.0 ? -1.0f : 1.0f;
viewport_flip[1] = regs.viewport_transform[0].scale_y < 0.0 ? -1.0f : 1.0f;
u32 func = static_cast<u32>(regs.alpha_test_func);
// Normalize the gl variants of opCompare to be the same as the normal variants
u32 op_gl_variant_base = static_cast<u32>(Tegra::Engines::Maxwell3D::Regs::ComparisonOp::Never);
if (func >= op_gl_variant_base) {
func = func - op_gl_variant_base + 1U;
}
alpha_test.enabled = regs.alpha_test_enabled;
alpha_test.func = func;
alpha_test.ref = regs.alpha_test_ref;
// We only assign the instance to the first component of the vector, the rest is just padding.
instance_id[0] = state.current_instance;

View File

@@ -22,8 +22,14 @@ struct MaxwellUniformData {
alignas(16) GLvec4 viewport_flip;
alignas(16) GLuvec4 instance_id;
alignas(16) GLuvec4 flip_stage;
struct alignas(16) {
GLuint enabled;
GLuint func;
GLfloat ref;
GLuint padding;
} alpha_test;
};
static_assert(sizeof(MaxwellUniformData) == 48, "MaxwellUniformData structure size is incorrect");
static_assert(sizeof(MaxwellUniformData) == 64, "MaxwellUniformData structure size is incorrect");
static_assert(sizeof(MaxwellUniformData) < 16384,
"MaxwellUniformData structure must be less than 16kb as per the OpenGL spec");

View File

@@ -121,7 +121,7 @@ target_link_libraries(yuzu PRIVATE Boost::boost glad Qt5::OpenGL Qt5::Widgets)
target_link_libraries(yuzu PRIVATE ${PLATFORM_LIBRARIES} Threads::Threads)
if (YUZU_ENABLE_COMPATIBILITY_REPORTING)
add_definitions(-DYUZU_ENABLE_COMPATIBILITY_REPORTING)
target_compile_definitions(yuzu PRIVATE -DYUZU_ENABLE_COMPATIBILITY_REPORTING)
endif()
if (USE_DISCORD_PRESENCE)