Compare commits

...

40 Commits

Author SHA1 Message Date
Fernando Sahmkow
23ca1eb82e yuzu-cmd/CMakeLists: Correct attribution for this function. 2021-08-08 20:24:53 +02:00
Fernando S
859deda3bb Merge pull request #6834 from K0bin/buffer-image-granularity
Respect Vulkan bufferImageGranularity
2021-08-08 11:57:40 +02:00
bunnei
b023413c98 Merge pull request #6698 from german77/SDL_QoL
input_common: Improve SDL joystick and hide toggle option
2021-08-08 02:44:42 -07:00
bunnei
00358e2098 Merge pull request #6817 from gidoly/patch-1
Add description to fast gpu time option
2021-08-08 01:11:47 -07:00
german77
48b6d41f1b input_common: Improve SDL joystick and hide toggle option 2021-08-07 23:11:23 -05:00
bunnei
63325cafbe Merge pull request #6827 from Morph1984/uuid-hash
common: uuid: Add hash function for UUID
2021-08-07 17:18:46 -07:00
bunnei
bd0e1d3a25 Merge pull request #6830 from ameerj/nvdec-unimpld-codec
nvdec: Better logging for unimplemented codecs
2021-08-07 12:37:39 -07:00
Robin Kertels
bb29dcb7f2 vulkan_memory_allocator: Respect bufferImageGranularity 2021-08-07 15:28:05 +02:00
bunnei
456adb95ff Merge pull request #6795 from sankasan/cmd-remove-cursor-fullscreen
yuzu-cmd: hide mouse cursor when started fullscreen
2021-08-07 02:00:29 -07:00
bunnei
bd1a764827 Merge pull request #6815 from german77/ui_improvements
settings_ui: Add emulated joystick position dot to controller preview
2021-08-06 23:54:23 -07:00
ameerj
928b64d2ce nvdec: Better logging for unimplemented codecs 2021-08-07 01:08:33 -04:00
bunnei
268b5764c7 Merge pull request #6791 from ameerj/astc-opt
astc_decoder: Various performance and memory optimizations
2021-08-06 21:45:24 -07:00
bunnei
f183668a87 Merge pull request #6799 from ameerj/vp9-fixes
nvdec: Fix VP9 reference frame refreshes
2021-08-06 17:46:46 -07:00
ameerj
156ea746a3 nvhost_nvdec_common: Remove BufferMap
This was mainly used to keep track of mapped buffers for later unmapping.  Since unmap is no longer implemented, this no longer seves a valuable purpose.
2021-08-06 20:11:12 -04:00
ameerj
e3688f0627 vp9: Cleanup unused variables
With reference frames refreshes fix, we no longer need to buffer two frames in advance.
We can also remove other unused or otherwise unneeded variables.
2021-08-06 20:08:11 -04:00
ameerj
a3f80a97a3 vp9: Fix reference frame refreshes
This resolves the artifacting when decoding VP9 streams.
2021-08-06 20:08:08 -04:00
ameerj
cc8ac112fc nvhost_nvdec_common: Stub UnmapBuffer Ioctl
Skip unmapping nvdec buffers to avoid breaking the continuity of the VP9 reference frame addresses, and the risk of invalidating data before the async GPU thread is done with it.
2021-08-06 20:06:30 -04:00
bunnei
42d8e08f78 Merge pull request #6822 from yzct12345/clion-assert
assert: Avoid empty macros
2021-08-05 22:29:45 -07:00
Morph
d20c5ac720 common: uuid: Add hash function for UUID
Used when UUID is a key in an unordered_map. The hash is produced by XORing the high and low 64-bits of the UUID together.
2021-08-06 00:41:55 -04:00
gidoly
8ba551e1cd Update configure_graphics_advanced.ui
add description too fast gpu time
2021-08-06 06:08:12 +09:00
bunnei
e1a92db519 Merge pull request #6813 from Morph1984/hex-string-to-uuid
common: uuid: Add hex string to UUID constructor
2021-08-05 13:29:11 -07:00
yzct12345
7e846be376 assert: Verify formatting 2021-08-05 17:46:22 +00:00
yzct12345
346149dcf9 assert: Avoid empty macros 2021-08-05 17:21:56 +00:00
Mai M
a1cb453470 Merge pull request #6819 from Morph1984/i-am-dumb
applet_swkbd: Include the null terminator in the buffer size calculation
2021-08-04 23:32:01 -04:00
Mai M
9a7d2e3659 Merge pull request #6818 from Morph1984/hex-util-bug
hex_util: Fix incorrect array size in AsArray
2021-08-04 23:31:04 -04:00
Morph
f10dc35dd0 applet_swkbd: Include the null terminator in the buffer size calculation
Some games may interpret the read string as a null-terminated string instead of just reading the string up to buffer_size.
2021-08-04 22:32:09 -04:00
Morph
7b39215c8a hex_util: Fix incorrect array size in AsArray
Although this isn't used, this is a potential bug as HexStringToArray will perform an out-of-bounds read.
2021-08-04 22:16:29 -04:00
Morph
edb9c72e26 Merge pull request #6816 from lat9nq/fix-mult-contrl
config: Read connected setting for controllers
2021-08-04 22:05:53 -04:00
lat9nq
be16d92060 config: Read connected setting for controllers
Currently yuzu will read the mapping but does not connect a controller
despite adding subsequent configurations for it. Read the `connected`
setting for now as a boolean like the Qt frontend.
2021-08-04 17:09:35 -04:00
german77
d5bf597436 settings_ui: Use better colors for the light theme 2021-08-04 11:47:06 -05:00
german77
1fb158ce90 settings_ui: Add emulated joystick position dot to controller preview 2021-08-04 11:46:54 -05:00
Morph
705f111058 common: uuid: Add hex string to UUID constructor
This allows for easily converting a hex string into a Common::UUID, which is backed by a 128 bit unsigned integer.
2021-08-04 10:45:41 -04:00
yzct12345
2868d4ba84 nvdec: Implement VA-API hardware video acceleration (#6713)
* nvdec: VA-API

* Verify formatting

* Forgot a semicolon for Windows

* Clarify comment about AV_PIX_FMT_NV12

* Fix assert log spam from missing negation

* vic: Remove forgotten debug code

* Address lioncash's review

* Mention VA-API is Intel/AMD

* Address v1993's review

* Hopefully fix CMakeLists style this time

* vic: Improve cache locality

* vic: Fix off-by-one error

* codec: Async

* codec: Forgot the GetValue()

* nvdec: Address ameerj's review

* codec: Fallback to CPU without VA-API support

* cmake: Address lat9nq's review

* cmake: Make VA-API optional

* vaapi: Multiple GPU

* Apply suggestions from code review

Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>

* nvdec: Address ameerj's review

* codec: Use anonymous instead of static

* nvdec: Remove enum and fix memory leak

* nvdec: Address ameerj's review

* codec: Remove preparation for threading

Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>
2021-08-03 23:43:11 -04:00
san
3e26141483 yuzu-cmd: hide cursor when in fullscreen
Exposed the SDL_ShowCursor function to EmuWindow baseclass. When creating the window (GL or VK) in fullscreen it now automatically hides the cursor.
2021-08-01 21:46:13 +02:00
ameerj
c439fc9be9 astc_decoder: Reduce workgroup size
This reduces the amount of over dispatching when there are odd dimensions (i.e. ASTC 8x5), which rarely evenly divide into 32x32.
2021-08-01 01:22:27 -04:00
ameerj
5ab8053511 astc_decoder: Compute offset swizzles in-shader
Alleviates the dependency on the swizzle table and a uniform which is constant for all ASTC texture sizes.
2021-08-01 01:22:26 -04:00
ameerj
b2862e4772 astc_decoder: Make use of uvec4 for payload data 2021-07-31 22:28:04 -04:00
ameerj
a75d70fa90 astc_decoder: Simplify Select2DPartition 2021-07-31 21:36:26 -04:00
ameerj
5665d05547 astc_decoder: Optimize the use EncodingData
This buffer was a list of EncodingData structures sorted by their bit length, with some duplication from the cpu decoder implementation.
We can take advantage of its sorted property to optimize its usage in the shader.

Thanks to wwylele for the optimization idea.
2021-07-31 21:36:26 -04:00
ameerj
15c0c213b1 astc.h: Move data to cpp implementation
Moves leftover values that are no longer used by the gpu decoder back to the cpp implementation.
2021-07-31 21:26:42 -04:00
40 changed files with 758 additions and 766 deletions

View File

@@ -583,8 +583,32 @@ if (YUZU_USE_BUNDLED_FFMPEG)
"${FFmpeg_PREFIX};${FFmpeg_BUILD_DIR}"
CACHE PATH "Path to FFmpeg headers" FORCE)
if (${CMAKE_SYSTEM_NAME} STREQUAL "Linux")
Include(FindPkgConfig REQUIRED)
pkg_check_modules(LIBVA libva)
endif()
if(LIBVA_FOUND)
pkg_check_modules(LIBDRM libdrm REQUIRED)
find_package(X11 REQUIRED)
pkg_check_modules(LIBVA-DRM libva-drm REQUIRED)
pkg_check_modules(LIBVA-X11 libva-x11 REQUIRED)
set(FFmpeg_LIBVA_LIBRARIES
${LIBDRM_LIBRARIES}
${X11_LIBRARIES}
${LIBVA-DRM_LIBRARIES}
${LIBVA-X11_LIBRARIES}
${LIBVA_LIBRARIES})
set(FFmpeg_HWACCEL_FLAGS
--enable-hwaccel=h264_vaapi
--enable-hwaccel=vp9_vaapi
--enable-libdrm)
message(STATUS "VA-API found")
else()
set(FFmpeg_HWACCEL_FLAGS --disable-vaapi)
endif()
# `configure` parameters builds only exactly what yuzu needs from FFmpeg
# `--disable-{vaapi,vdpau}` is needed to avoid linking issues
# `--disable-vdpau` is needed to avoid linking issues
add_custom_command(
OUTPUT
${FFmpeg_MAKEFILE}
@@ -600,15 +624,16 @@ if (YUZU_USE_BUNDLED_FFMPEG)
--disable-network
--disable-postproc
--disable-swresample
--disable-vaapi
--disable-vdpau
--enable-decoder=h264
--enable-decoder=vp9
--cc="${CMAKE_C_COMPILER}"
--cxx="${CMAKE_CXX_COMPILER}"
${FFmpeg_HWACCEL_FLAGS}
WORKING_DIRECTORY
${FFmpeg_BUILD_DIR}
)
unset(FFmpeg_HWACCEL_FLAGS)
# Workaround for Ubuntu 18.04's older version of make not being able to call make as a child
# with context of the jobserver. Also helps ninja users.
@@ -618,9 +643,10 @@ if (YUZU_USE_BUNDLED_FFMPEG)
OUTPUT_VARIABLE
SYSTEM_THREADS)
set(FFmpeg_BUILD_LIBRARIES ${FFmpeg_LIBRARIES})
add_custom_command(
OUTPUT
${FFmpeg_LIBRARIES}
${FFmpeg_BUILD_LIBRARIES}
COMMAND
make -j${SYSTEM_THREADS}
WORKING_DIRECTORY
@@ -630,7 +656,12 @@ if (YUZU_USE_BUNDLED_FFMPEG)
# ALL makes this custom target build every time
# but it won't actually build if the DEPENDS parameter is up to date
add_custom_target(ffmpeg-configure ALL DEPENDS ${FFmpeg_MAKEFILE})
add_custom_target(ffmpeg-build ALL DEPENDS ${FFmpeg_LIBRARIES} ffmpeg-configure)
add_custom_target(ffmpeg-build ALL DEPENDS ${FFmpeg_BUILD_LIBRARIES} ffmpeg-configure)
link_libraries(${FFmpeg_LIBVA_LIBRARIES})
set(FFmpeg_LIBRARIES ${FFmpeg_LIBVA_LIBRARIES} ${FFmpeg_BUILD_LIBRARIES}
CACHE PATH "Paths to FFmpeg libraries" FORCE)
unset(FFmpeg_BUILD_LIBRARIES)
unset(FFmpeg_LIBVA_LIBRARIES)
if (FFmpeg_FOUND)
message(STATUS "Found FFmpeg version ${FFmpeg_VERSION}")

View File

@@ -51,11 +51,11 @@ QPushButton#GPUStatusBarButton:hover {
}
QPushButton#GPUStatusBarButton:checked {
color: #ff8040;
color: #b06020;
}
QPushButton#GPUStatusBarButton:!checked {
color: #40dd40;
color: #109010;
}
QPushButton#buttonRefreshDevices {

View File

@@ -52,8 +52,12 @@ assert_noinline_call(const Fn& fn) {
#define DEBUG_ASSERT(_a_) ASSERT(_a_)
#define DEBUG_ASSERT_MSG(_a_, ...) ASSERT_MSG(_a_, __VA_ARGS__)
#else // not debug
#define DEBUG_ASSERT(_a_)
#define DEBUG_ASSERT_MSG(_a_, _desc_, ...)
#define DEBUG_ASSERT(_a_) \
do { \
} while (0)
#define DEBUG_ASSERT_MSG(_a_, _desc_, ...) \
do { \
} while (0)
#endif
#define UNIMPLEMENTED() ASSERT_MSG(false, "Unimplemented code!")

View File

@@ -61,7 +61,7 @@ template <typename ContiguousContainer>
return out;
}
[[nodiscard]] constexpr std::array<u8, 16> AsArray(const char (&data)[17]) {
[[nodiscard]] constexpr std::array<u8, 16> AsArray(const char (&data)[33]) {
return HexStringToArray<16>(data);
}

View File

@@ -6,10 +6,64 @@
#include <fmt/format.h>
#include "common/assert.h"
#include "common/uuid.h"
namespace Common {
namespace {
bool IsHexDigit(char c) {
return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}
u8 HexCharToByte(char c) {
if (c >= '0' && c <= '9') {
return static_cast<u8>(c - '0');
}
if (c >= 'a' && c <= 'f') {
return static_cast<u8>(c - 'a' + 10);
}
if (c >= 'A' && c <= 'F') {
return static_cast<u8>(c - 'A' + 10);
}
ASSERT_MSG(false, "{} is not a hexadecimal digit!", c);
return u8{0};
}
} // Anonymous namespace
u128 HexStringToU128(std::string_view hex_string) {
const size_t length = hex_string.length();
// Detect "0x" prefix.
const bool has_0x_prefix = length > 2 && hex_string[0] == '0' && hex_string[1] == 'x';
const size_t offset = has_0x_prefix ? 2 : 0;
// Check length.
if (length > 32 + offset) {
ASSERT_MSG(false, "hex_string has more than 32 hexadecimal characters!");
return INVALID_UUID;
}
u64 lo = 0;
u64 hi = 0;
for (size_t i = 0; i < length - offset; ++i) {
const char c = hex_string[length - 1 - i];
if (!IsHexDigit(c)) {
ASSERT_MSG(false, "{} is not a hexadecimal digit!", c);
return INVALID_UUID;
}
if (i < 16) {
lo |= u64{HexCharToByte(c)} << (i * 4);
}
if (i >= 16) {
hi |= u64{HexCharToByte(c)} << ((i - 16) * 4);
}
}
return u128{lo, hi};
}
UUID UUID::Generate() {
std::random_device device;
std::mt19937 gen(device());

View File

@@ -5,6 +5,7 @@
#pragma once
#include <string>
#include <string_view>
#include "common/common_types.h"
@@ -12,12 +13,30 @@ namespace Common {
constexpr u128 INVALID_UUID{{0, 0}};
/**
* Converts a hex string to a 128-bit unsigned integer.
*
* The hex string can be formatted in lowercase or uppercase, with or without the "0x" prefix.
*
* This function will assert and return INVALID_UUID under the following conditions:
* - If the hex string is more than 32 characters long
* - If the hex string contains non-hexadecimal characters
*
* @param hex_string Hexadecimal string
*
* @returns A 128-bit unsigned integer if successfully converted, INVALID_UUID otherwise.
*/
[[nodiscard]] u128 HexStringToU128(std::string_view hex_string);
struct UUID {
// UUIDs which are 0 are considered invalid!
u128 uuid;
UUID() = default;
constexpr explicit UUID(const u128& id) : uuid{id} {}
constexpr explicit UUID(const u64 lo, const u64 hi) : uuid{{lo, hi}} {}
explicit UUID(std::string_view hex_string) {
uuid = HexStringToU128(hex_string);
}
[[nodiscard]] constexpr explicit operator bool() const {
return uuid != INVALID_UUID;
@@ -50,3 +69,14 @@ struct UUID {
static_assert(sizeof(UUID) == 16, "UUID is an invalid size!");
} // namespace Common
namespace std {
template <>
struct hash<Common::UUID> {
size_t operator()(const Common::UUID& uuid) const noexcept {
return uuid.uuid[1] ^ uuid.uuid[0];
}
};
} // namespace std

View File

@@ -377,7 +377,8 @@ void SoftwareKeyboard::SubmitForTextCheck(std::u16string submitted_text) {
if (swkbd_config_common.use_utf8) {
std::string utf8_submitted_text = Common::UTF16ToUTF8(current_text);
const u64 buffer_size = utf8_submitted_text.size();
// Include the null terminator in the buffer size.
const u64 buffer_size = utf8_submitted_text.size() + 1;
LOG_DEBUG(Service_AM, "\nBuffer Size: {}\nUTF-8 Submitted Text: {}", buffer_size,
utf8_submitted_text);
@@ -386,7 +387,8 @@ void SoftwareKeyboard::SubmitForTextCheck(std::u16string submitted_text) {
std::memcpy(out_data.data() + sizeof(u64), utf8_submitted_text.data(),
utf8_submitted_text.size());
} else {
const u64 buffer_size = current_text.size() * sizeof(char16_t);
// Include the null terminator in the buffer size.
const u64 buffer_size = (current_text.size() + 1) * sizeof(char16_t);
LOG_DEBUG(Service_AM, "\nBuffer Size: {}\nUTF-16 Submitted Text: {}", buffer_size,
Common::UTF16ToUTF8(current_text));

View File

@@ -166,8 +166,6 @@ NvResult nvhost_nvdec_common::MapBuffer(const std::vector<u8>& input, std::vecto
LOG_ERROR(Service_NVDRV, "failed to map size={}", object->size);
} else {
cmd_buffer.map_address = object->dma_map_addr;
AddBufferMap(object->dma_map_addr, object->size, object->addr,
object->status == nvmap::Object::Status::Allocated);
}
}
std::memcpy(output.data(), &params, sizeof(IoctlMapBuffer));
@@ -178,30 +176,11 @@ NvResult nvhost_nvdec_common::MapBuffer(const std::vector<u8>& input, std::vecto
}
NvResult nvhost_nvdec_common::UnmapBuffer(const std::vector<u8>& input, std::vector<u8>& output) {
IoctlMapBuffer params{};
std::memcpy(&params, input.data(), sizeof(IoctlMapBuffer));
std::vector<MapBufferEntry> cmd_buffer_handles(params.num_entries);
SliceVectors(input, cmd_buffer_handles, params.num_entries, sizeof(IoctlMapBuffer));
auto& gpu = system.GPU();
for (auto& cmd_buffer : cmd_buffer_handles) {
const auto object{nvmap_dev->GetObject(cmd_buffer.map_handle)};
if (!object) {
LOG_ERROR(Service_NVDRV, "invalid cmd_buffer nvmap_handle={:X}", cmd_buffer.map_handle);
std::memcpy(output.data(), &params, output.size());
return NvResult::InvalidState;
}
if (const auto size{RemoveBufferMap(object->dma_map_addr)}; size) {
gpu.MemoryManager().Unmap(object->dma_map_addr, *size);
} else {
// This occurs quite frequently, however does not seem to impact functionality
LOG_DEBUG(Service_NVDRV, "invalid offset=0x{:X} dma=0x{:X}", object->addr,
object->dma_map_addr);
}
object->dma_map_addr = 0;
}
// This is intntionally stubbed.
// Skip unmapping buffers here, as to not break the continuity of the VP9 reference frame
// addresses, and risk invalidating data before the async GPU thread is done with it
std::memset(output.data(), 0, output.size());
LOG_DEBUG(Service_NVDRV, "(STUBBED) called");
return NvResult::Success;
}
@@ -212,33 +191,4 @@ NvResult nvhost_nvdec_common::SetSubmitTimeout(const std::vector<u8>& input,
return NvResult::Success;
}
std::optional<nvhost_nvdec_common::BufferMap> nvhost_nvdec_common::FindBufferMap(
GPUVAddr gpu_addr) const {
const auto it = std::find_if(
buffer_mappings.begin(), buffer_mappings.upper_bound(gpu_addr), [&](const auto& entry) {
return (gpu_addr >= entry.second.StartAddr() && gpu_addr < entry.second.EndAddr());
});
ASSERT(it != buffer_mappings.end());
return it->second;
}
void nvhost_nvdec_common::AddBufferMap(GPUVAddr gpu_addr, std::size_t size, VAddr cpu_addr,
bool is_allocated) {
buffer_mappings.insert_or_assign(gpu_addr, BufferMap{gpu_addr, size, cpu_addr, is_allocated});
}
std::optional<std::size_t> nvhost_nvdec_common::RemoveBufferMap(GPUVAddr gpu_addr) {
const auto iter{buffer_mappings.find(gpu_addr)};
if (iter == buffer_mappings.end()) {
return std::nullopt;
}
std::size_t size = 0;
if (iter->second.IsAllocated()) {
size = iter->second.Size();
}
buffer_mappings.erase(iter);
return size;
}
} // namespace Service::Nvidia::Devices

View File

@@ -23,45 +23,6 @@ public:
~nvhost_nvdec_common() override;
protected:
class BufferMap final {
public:
constexpr BufferMap() = default;
constexpr BufferMap(GPUVAddr start_addr_, std::size_t size_)
: start_addr{start_addr_}, end_addr{start_addr_ + size_} {}
constexpr BufferMap(GPUVAddr start_addr_, std::size_t size_, VAddr cpu_addr_,
bool is_allocated_)
: start_addr{start_addr_}, end_addr{start_addr_ + size_}, cpu_addr{cpu_addr_},
is_allocated{is_allocated_} {}
constexpr VAddr StartAddr() const {
return start_addr;
}
constexpr VAddr EndAddr() const {
return end_addr;
}
constexpr std::size_t Size() const {
return end_addr - start_addr;
}
constexpr VAddr CpuAddr() const {
return cpu_addr;
}
constexpr bool IsAllocated() const {
return is_allocated;
}
private:
GPUVAddr start_addr{};
GPUVAddr end_addr{};
VAddr cpu_addr{};
bool is_allocated{};
};
struct IoctlSetNvmapFD {
s32_le nvmap_fd{};
};
@@ -154,17 +115,11 @@ protected:
NvResult UnmapBuffer(const std::vector<u8>& input, std::vector<u8>& output);
NvResult SetSubmitTimeout(const std::vector<u8>& input, std::vector<u8>& output);
std::optional<BufferMap> FindBufferMap(GPUVAddr gpu_addr) const;
void AddBufferMap(GPUVAddr gpu_addr, std::size_t size, VAddr cpu_addr, bool is_allocated);
std::optional<std::size_t> RemoveBufferMap(GPUVAddr gpu_addr);
s32_le nvmap_fd{};
u32_le submit_timeout{};
std::shared_ptr<nvmap> nvmap_dev;
SyncpointManager& syncpoint_manager;
std::array<u32, MaxSyncPoints> device_syncpoints{};
// This is expected to be ordered, therefore we must use a map, not unordered_map
std::map<GPUVAddr, BufferMap> buffer_mappings;
};
}; // namespace Devices
} // namespace Service::Nvidia

View File

@@ -304,10 +304,10 @@ std::vector<std::unique_ptr<Polling::DevicePoller>> InputSubsystem::GetPollers([
}
std::string GenerateKeyboardParam(int key_code) {
Common::ParamPackage param{
{"engine", "keyboard"},
{"code", std::to_string(key_code)},
};
Common::ParamPackage param;
param.Set("engine", "keyboard");
param.Set("code", key_code);
param.Set("toggle", false);
return param.Serialize();
}

View File

@@ -57,6 +57,7 @@ Common::ParamPackage MouseButtonFactory::GetNextInput() const {
if (pad.button != MouseInput::MouseButton::Undefined) {
params.Set("engine", "mouse");
params.Set("button", static_cast<u16>(pad.button));
params.Set("toggle", false);
return params;
}
}

View File

@@ -82,6 +82,12 @@ public:
state.buttons.insert_or_assign(button, value);
}
void PreSetButton(int button) {
if (!state.buttons.contains(button)) {
SetButton(button, false);
}
}
void SetMotion(SDL_ControllerSensorEvent event) {
constexpr float gravity_constant = 9.80665f;
std::lock_guard lock{mutex};
@@ -155,9 +161,16 @@ public:
state.axes.insert_or_assign(axis, value);
}
float GetAxis(int axis, float range) const {
void PreSetAxis(int axis) {
if (!state.axes.contains(axis)) {
SetAxis(axis, 0);
}
}
float GetAxis(int axis, float range, float offset) const {
std::lock_guard lock{mutex};
return static_cast<float>(state.axes.at(axis)) / (32767.0f * range);
const float value = static_cast<float>(state.axes.at(axis)) / 32767.0f;
return (value + offset) / range;
}
bool RumblePlay(u16 amp_low, u16 amp_high) {
@@ -174,9 +187,10 @@ public:
return false;
}
std::tuple<float, float> GetAnalog(int axis_x, int axis_y, float range) const {
float x = GetAxis(axis_x, range);
float y = GetAxis(axis_y, range);
std::tuple<float, float> GetAnalog(int axis_x, int axis_y, float range, float offset_x,
float offset_y) const {
float x = GetAxis(axis_x, range, offset_x);
float y = GetAxis(axis_y, range, offset_y);
y = -y; // 3DS uses an y-axis inverse from SDL
// Make sure the coordinates are in the unit circle,
@@ -483,7 +497,7 @@ public:
trigger_if_greater(trigger_if_greater_) {}
bool GetStatus() const override {
const float axis_value = joystick->GetAxis(axis, 1.0f);
const float axis_value = joystick->GetAxis(axis, 1.0f, 0.0f);
if (trigger_if_greater) {
return axis_value > threshold;
}
@@ -500,12 +514,14 @@ private:
class SDLAnalog final : public Input::AnalogDevice {
public:
explicit SDLAnalog(std::shared_ptr<SDLJoystick> joystick_, int axis_x_, int axis_y_,
bool invert_x_, bool invert_y_, float deadzone_, float range_)
bool invert_x_, bool invert_y_, float deadzone_, float range_,
float offset_x_, float offset_y_)
: joystick(std::move(joystick_)), axis_x(axis_x_), axis_y(axis_y_), invert_x(invert_x_),
invert_y(invert_y_), deadzone(deadzone_), range(range_) {}
invert_y(invert_y_), deadzone(deadzone_), range(range_), offset_x(offset_x_),
offset_y(offset_y_) {}
std::tuple<float, float> GetStatus() const override {
auto [x, y] = joystick->GetAnalog(axis_x, axis_y, range);
auto [x, y] = joystick->GetAnalog(axis_x, axis_y, range, offset_x, offset_y);
const float r = std::sqrt((x * x) + (y * y));
if (invert_x) {
x = -x;
@@ -522,8 +538,8 @@ public:
}
std::tuple<float, float> GetRawStatus() const override {
const float x = joystick->GetAxis(axis_x, range);
const float y = joystick->GetAxis(axis_y, range);
const float x = joystick->GetAxis(axis_x, range, offset_x);
const float y = joystick->GetAxis(axis_y, range, offset_y);
return {x, -y};
}
@@ -555,6 +571,8 @@ private:
const bool invert_y;
const float deadzone;
const float range;
const float offset_x;
const float offset_y;
};
class SDLVibration final : public Input::VibrationDevice {
@@ -621,7 +639,7 @@ public:
trigger_if_greater(trigger_if_greater_) {}
Input::MotionStatus GetStatus() const override {
const float axis_value = joystick->GetAxis(axis, 1.0f);
const float axis_value = joystick->GetAxis(axis, 1.0f, 0.0f);
bool trigger = axis_value < threshold;
if (trigger_if_greater) {
trigger = axis_value > threshold;
@@ -720,13 +738,13 @@ public:
LOG_ERROR(Input, "Unknown direction {}", direction_name);
}
// This is necessary so accessing GetAxis with axis won't crash
joystick->SetAxis(axis, 0);
joystick->PreSetAxis(axis);
return std::make_unique<SDLAxisButton>(joystick, axis, threshold, trigger_if_greater);
}
const int button = params.Get("button", 0);
// This is necessary so accessing GetButton with button won't crash
joystick->SetButton(button, false);
joystick->PreSetButton(button);
return std::make_unique<SDLButton>(joystick, button, toggle);
}
@@ -757,13 +775,15 @@ public:
const std::string invert_y_value = params.Get("invert_y", "+");
const bool invert_x = invert_x_value == "-";
const bool invert_y = invert_y_value == "-";
const float offset_x = params.Get("offset_x", 0.0f);
const float offset_y = params.Get("offset_y", 0.0f);
auto joystick = state.GetSDLJoystickByGUID(guid, port);
// This is necessary so accessing GetAxis with axis_x and axis_y won't crash
joystick->SetAxis(axis_x, 0);
joystick->SetAxis(axis_y, 0);
joystick->PreSetAxis(axis_x);
joystick->PreSetAxis(axis_y);
return std::make_unique<SDLAnalog>(joystick, axis_x, axis_y, invert_x, invert_y, deadzone,
range);
range, offset_x, offset_y);
}
private:
@@ -844,13 +864,13 @@ public:
LOG_ERROR(Input, "Unknown direction {}", direction_name);
}
// This is necessary so accessing GetAxis with axis won't crash
joystick->SetAxis(axis, 0);
joystick->PreSetAxis(axis);
return std::make_unique<SDLAxisMotion>(joystick, axis, threshold, trigger_if_greater);
}
const int button = params.Get("button", 0);
// This is necessary so accessing GetButton with button won't crash
joystick->SetButton(button, false);
joystick->PreSetButton(button);
return std::make_unique<SDLButtonMotion>(joystick, button);
}
@@ -995,6 +1015,7 @@ Common::ParamPackage BuildButtonParamPackageForButton(int port, std::string guid
params.Set("port", port);
params.Set("guid", std::move(guid));
params.Set("button", button);
params.Set("toggle", false);
return params;
}
@@ -1134,13 +1155,15 @@ Common::ParamPackage BuildParamPackageForBinding(int port, const std::string& gu
}
Common::ParamPackage BuildParamPackageForAnalog(int port, const std::string& guid, int axis_x,
int axis_y) {
int axis_y, float offset_x, float offset_y) {
Common::ParamPackage params;
params.Set("engine", "sdl");
params.Set("port", port);
params.Set("guid", guid);
params.Set("axis_x", axis_x);
params.Set("axis_y", axis_y);
params.Set("offset_x", offset_x);
params.Set("offset_y", offset_y);
params.Set("invert_x", "+");
params.Set("invert_y", "+");
return params;
@@ -1342,24 +1365,39 @@ AnalogMapping SDLState::GetAnalogMappingForDevice(const Common::ParamPackage& pa
const auto& binding_left_y =
SDL_GameControllerGetBindForAxis(controller, SDL_CONTROLLER_AXIS_LEFTY);
if (params.Has("guid2")) {
joystick2->PreSetAxis(binding_left_x.value.axis);
joystick2->PreSetAxis(binding_left_y.value.axis);
const auto left_offset_x = -joystick2->GetAxis(binding_left_x.value.axis, 1.0f, 0);
const auto left_offset_y = -joystick2->GetAxis(binding_left_y.value.axis, 1.0f, 0);
mapping.insert_or_assign(
Settings::NativeAnalog::LStick,
BuildParamPackageForAnalog(joystick2->GetPort(), joystick2->GetGUID(),
binding_left_x.value.axis, binding_left_y.value.axis));
binding_left_x.value.axis, binding_left_y.value.axis,
left_offset_x, left_offset_y));
} else {
joystick->PreSetAxis(binding_left_x.value.axis);
joystick->PreSetAxis(binding_left_y.value.axis);
const auto left_offset_x = -joystick->GetAxis(binding_left_x.value.axis, 1.0f, 0);
const auto left_offset_y = -joystick->GetAxis(binding_left_y.value.axis, 1.0f, 0);
mapping.insert_or_assign(
Settings::NativeAnalog::LStick,
BuildParamPackageForAnalog(joystick->GetPort(), joystick->GetGUID(),
binding_left_x.value.axis, binding_left_y.value.axis));
binding_left_x.value.axis, binding_left_y.value.axis,
left_offset_x, left_offset_y));
}
const auto& binding_right_x =
SDL_GameControllerGetBindForAxis(controller, SDL_CONTROLLER_AXIS_RIGHTX);
const auto& binding_right_y =
SDL_GameControllerGetBindForAxis(controller, SDL_CONTROLLER_AXIS_RIGHTY);
joystick->PreSetAxis(binding_right_x.value.axis);
joystick->PreSetAxis(binding_right_y.value.axis);
const auto right_offset_x = -joystick->GetAxis(binding_right_x.value.axis, 1.0f, 0);
const auto right_offset_y = -joystick->GetAxis(binding_right_y.value.axis, 1.0f, 0);
mapping.insert_or_assign(Settings::NativeAnalog::RStick,
BuildParamPackageForAnalog(joystick->GetPort(), joystick->GetGUID(),
binding_right_x.value.axis,
binding_right_y.value.axis));
binding_right_y.value.axis, right_offset_x,
right_offset_y));
return mapping;
}
@@ -1563,8 +1601,9 @@ public:
}
if (const auto joystick = state.GetSDLJoystickBySDLID(event.jaxis.which)) {
// Set offset to zero since the joystick is not on center
auto params = BuildParamPackageForAnalog(joystick->GetPort(), joystick->GetGUID(),
first_axis, axis);
first_axis, axis, 0, 0);
first_axis = -1;
return params;
}

View File

@@ -1,5 +1,10 @@
add_subdirectory(host_shaders)
if(LIBVA_FOUND)
set_source_files_properties(command_classes/codecs/codec.cpp
PROPERTIES COMPILE_DEFINITIONS LIBVA_FOUND=1)
endif()
add_library(video_core STATIC
buffer_cache/buffer_base.h
buffer_cache/buffer_cache.cpp

View File

@@ -2,7 +2,6 @@
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include <cstring>
#include <fstream>
#include <vector>
#include "common/assert.h"
@@ -17,10 +16,47 @@ extern "C" {
}
namespace Tegra {
#if defined(LIBVA_FOUND)
// Hardware acceleration code from FFmpeg/doc/examples/hw_decode.c originally under MIT license
namespace {
constexpr std::array<const char*, 2> VAAPI_DRIVERS = {
"i915",
"amdgpu",
};
AVPixelFormat GetHwFormat(AVCodecContext*, const AVPixelFormat* pix_fmts) {
for (const AVPixelFormat* p = pix_fmts; *p != AV_PIX_FMT_NONE; ++p) {
if (*p == AV_PIX_FMT_VAAPI) {
return AV_PIX_FMT_VAAPI;
}
}
LOG_INFO(Service_NVDRV, "Could not find compatible GPU AV format, falling back to CPU");
return *pix_fmts;
}
bool CreateVaapiHwdevice(AVBufferRef** av_hw_device) {
AVDictionary* hwdevice_options = nullptr;
av_dict_set(&hwdevice_options, "connection_type", "drm", 0);
for (const auto& driver : VAAPI_DRIVERS) {
av_dict_set(&hwdevice_options, "kernel_driver", driver, 0);
const int hwdevice_error = av_hwdevice_ctx_create(av_hw_device, AV_HWDEVICE_TYPE_VAAPI,
nullptr, hwdevice_options, 0);
if (hwdevice_error >= 0) {
LOG_INFO(Service_NVDRV, "Using VA-API with {}", driver);
av_dict_free(&hwdevice_options);
return true;
}
LOG_DEBUG(Service_NVDRV, "VA-API av_hwdevice_ctx_create failed {}", hwdevice_error);
}
LOG_DEBUG(Service_NVDRV, "VA-API av_hwdevice_ctx_create failed for all drivers");
av_dict_free(&hwdevice_options);
return false;
}
} // namespace
#endif
void AVFrameDeleter(AVFrame* ptr) {
av_frame_unref(ptr);
av_free(ptr);
av_frame_free(&ptr);
}
Codec::Codec(GPU& gpu_, const NvdecCommon::NvdecRegisters& regs)
@@ -32,19 +68,31 @@ Codec::~Codec() {
return;
}
// Free libav memory
AVFrame* av_frame{nullptr};
avcodec_send_packet(av_codec_ctx, nullptr);
av_frame = av_frame_alloc();
AVFrame* av_frame = av_frame_alloc();
avcodec_receive_frame(av_codec_ctx, av_frame);
avcodec_flush_buffers(av_codec_ctx);
av_frame_unref(av_frame);
av_free(av_frame);
av_frame_free(&av_frame);
avcodec_close(av_codec_ctx);
av_buffer_unref(&av_hw_device);
}
void Codec::InitializeHwdec() {
// Prioritize integrated GPU to mitigate bandwidth bottlenecks
#if defined(LIBVA_FOUND)
if (CreateVaapiHwdevice(&av_hw_device)) {
const auto hw_device_ctx = av_buffer_ref(av_hw_device);
ASSERT_MSG(hw_device_ctx, "av_buffer_ref failed");
av_codec_ctx->hw_device_ctx = hw_device_ctx;
av_codec_ctx->get_format = GetHwFormat;
return;
}
#endif
// TODO more GPU accelerated decoders
}
void Codec::Initialize() {
AVCodecID codec{AV_CODEC_ID_NONE};
AVCodecID codec;
switch (current_codec) {
case NvdecCommon::VideoCodec::H264:
codec = AV_CODEC_ID_H264;
@@ -53,22 +101,24 @@ void Codec::Initialize() {
codec = AV_CODEC_ID_VP9;
break;
default:
UNIMPLEMENTED_MSG("Unknown codec {}", current_codec);
return;
}
av_codec = avcodec_find_decoder(codec);
av_codec_ctx = avcodec_alloc_context3(av_codec);
av_opt_set(av_codec_ctx->priv_data, "tune", "zerolatency", 0);
// TODO(ameerj): libavcodec gpu hw acceleration
InitializeHwdec();
if (!av_codec_ctx->hw_device_ctx) {
LOG_INFO(Service_NVDRV, "Using FFmpeg software decoding");
}
const auto av_error = avcodec_open2(av_codec_ctx, av_codec, nullptr);
if (av_error < 0) {
LOG_ERROR(Service_NVDRV, "avcodec_open2() Failed.");
avcodec_close(av_codec_ctx);
av_buffer_unref(&av_hw_device);
return;
}
initialized = true;
return;
}
void Codec::SetTargetCodec(NvdecCommon::VideoCodec codec) {
@@ -80,36 +130,64 @@ void Codec::SetTargetCodec(NvdecCommon::VideoCodec codec) {
void Codec::Decode() {
const bool is_first_frame = !initialized;
if (!initialized) {
if (is_first_frame) {
Initialize();
}
bool vp9_hidden_frame = false;
AVPacket packet{};
av_init_packet(&packet);
std::vector<u8> frame_data;
if (current_codec == NvdecCommon::VideoCodec::H264) {
frame_data = h264_decoder->ComposeFrameHeader(state, is_first_frame);
} else if (current_codec == NvdecCommon::VideoCodec::Vp9) {
frame_data = vp9_decoder->ComposeFrameHeader(state);
vp9_hidden_frame = vp9_decoder->WasFrameHidden();
}
AVPacket packet{};
av_init_packet(&packet);
packet.data = frame_data.data();
packet.size = static_cast<s32>(frame_data.size());
avcodec_send_packet(av_codec_ctx, &packet);
if (!vp9_hidden_frame) {
// Only receive/store visible frames
AVFramePtr frame = AVFramePtr{av_frame_alloc(), AVFrameDeleter};
avcodec_receive_frame(av_codec_ctx, frame.get());
av_frames.push(std::move(frame));
// Limit queue to 10 frames. Workaround for ZLA decode and queue spam
if (av_frames.size() > 10) {
av_frames.pop();
}
if (const int ret = avcodec_send_packet(av_codec_ctx, &packet); ret) {
LOG_DEBUG(Service_NVDRV, "avcodec_send_packet error {}", ret);
return;
}
// Only receive/store visible frames
if (vp9_hidden_frame) {
return;
}
AVFrame* hw_frame = av_frame_alloc();
AVFrame* sw_frame = hw_frame;
ASSERT_MSG(hw_frame, "av_frame_alloc hw_frame failed");
if (const int ret = avcodec_receive_frame(av_codec_ctx, hw_frame); ret) {
LOG_DEBUG(Service_NVDRV, "avcodec_receive_frame error {}", ret);
av_frame_free(&hw_frame);
return;
}
if (!hw_frame->width || !hw_frame->height) {
LOG_WARNING(Service_NVDRV, "Zero width or height in frame");
av_frame_free(&hw_frame);
return;
}
#if defined(LIBVA_FOUND)
// Hardware acceleration code from FFmpeg/doc/examples/hw_decode.c under MIT license
if (hw_frame->format == AV_PIX_FMT_VAAPI) {
sw_frame = av_frame_alloc();
ASSERT_MSG(sw_frame, "av_frame_alloc sw_frame failed");
// Can't use AV_PIX_FMT_YUV420P and share code with software decoding in vic.cpp
// because Intel drivers crash unless using AV_PIX_FMT_NV12
sw_frame->format = AV_PIX_FMT_NV12;
const int transfer_data_ret = av_hwframe_transfer_data(sw_frame, hw_frame, 0);
ASSERT_MSG(!transfer_data_ret, "av_hwframe_transfer_data error {}", transfer_data_ret);
av_frame_free(&hw_frame);
}
#endif
if (sw_frame->format != AV_PIX_FMT_YUV420P && sw_frame->format != AV_PIX_FMT_NV12) {
UNIMPLEMENTED_MSG("Unexpected video format from host graphics: {}", sw_frame->format);
av_frame_free(&sw_frame);
return;
}
av_frames.push(AVFramePtr{sw_frame, AVFrameDeleter});
if (av_frames.size() > 10) {
LOG_TRACE(Service_NVDRV, "av_frames.push overflow dropped frame");
av_frames.pop();
}
}
@@ -119,7 +197,6 @@ AVFramePtr Codec::GetCurrentFrame() {
if (av_frames.empty()) {
return AVFramePtr{nullptr, AVFrameDeleter};
}
AVFramePtr frame = std::move(av_frames.front());
av_frames.pop();
return frame;
@@ -144,6 +221,5 @@ std::string_view Codec::GetCurrentCodecName() const {
default:
return "Unknown";
}
};
}
} // namespace Tegra

View File

@@ -22,7 +22,6 @@ extern "C" {
namespace Tegra {
class GPU;
struct VicRegisters;
void AVFrameDeleter(AVFrame* ptr);
using AVFramePtr = std::unique_ptr<AVFrame, decltype(&AVFrameDeleter)>;
@@ -55,10 +54,13 @@ public:
[[nodiscard]] std::string_view GetCurrentCodecName() const;
private:
void InitializeHwdec();
bool initialized{};
NvdecCommon::VideoCodec current_codec{NvdecCommon::VideoCodec::None};
AVCodec* av_codec{nullptr};
AVBufferRef* av_hw_device{nullptr};
AVCodecContext* av_codec_ctx{nullptr};
GPU& gpu;

View File

@@ -11,6 +11,9 @@
namespace Tegra::Decoder {
namespace {
constexpr u32 diff_update_probability = 252;
constexpr u32 frame_sync_code = 0x498342;
// Default compressed header probabilities once frame context resets
constexpr Vp9EntropyProbs default_probs{
.y_mode_prob{
@@ -361,8 +364,7 @@ Vp9PictureInfo VP9::GetVp9PictureInfo(const NvdecCommon::NvdecRegisters& state)
InsertEntropy(state.vp9_entropy_probs_offset, vp9_info.entropy);
// surface_luma_offset[0:3] contains the address of the reference frame offsets in the following
// order: last, golden, altref, current. It may be worthwhile to track the updates done here
// to avoid buffering frame data needed for reference frame updating in the header composition.
// order: last, golden, altref, current.
std::copy(state.surface_luma_offset.begin(), state.surface_luma_offset.begin() + 4,
vp9_info.frame_offsets.begin());
@@ -384,33 +386,18 @@ Vp9FrameContainer VP9::GetCurrentFrame(const NvdecCommon::NvdecRegisters& state)
gpu.MemoryManager().ReadBlock(state.frame_bitstream_offset, current_frame.bit_stream.data(),
current_frame.info.bitstream_size);
}
// Buffer two frames, saving the last show frame info
if (!next_next_frame.bit_stream.empty()) {
if (!next_frame.bit_stream.empty()) {
Vp9FrameContainer temp{
.info = current_frame.info,
.bit_stream = std::move(current_frame.bit_stream),
};
next_next_frame.info.show_frame = current_frame.info.last_frame_shown;
current_frame.info = next_next_frame.info;
current_frame.bit_stream = std::move(next_next_frame.bit_stream);
next_next_frame = std::move(temp);
if (!next_frame.bit_stream.empty()) {
Vp9FrameContainer temp2{
.info = current_frame.info,
.bit_stream = std::move(current_frame.bit_stream),
};
next_frame.info.show_frame = current_frame.info.last_frame_shown;
current_frame.info = next_frame.info;
current_frame.bit_stream = std::move(next_frame.bit_stream);
next_frame = std::move(temp2);
} else {
next_frame.info = current_frame.info;
next_frame.bit_stream = std::move(current_frame.bit_stream);
}
next_frame.info.show_frame = current_frame.info.last_frame_shown;
current_frame.info = next_frame.info;
current_frame.bit_stream = std::move(next_frame.bit_stream);
next_frame = std::move(temp);
} else {
next_next_frame.info = current_frame.info;
next_next_frame.bit_stream = std::move(current_frame.bit_stream);
next_frame.info = current_frame.info;
next_frame.bit_stream = std::move(current_frame.bit_stream);
}
return current_frame;
}
@@ -613,86 +600,64 @@ VpxBitStreamWriter VP9::ComposeUncompressedHeader() {
// Reset context
prev_frame_probs = default_probs;
swap_next_golden = false;
swap_ref_indices = false;
loop_filter_ref_deltas.fill(0);
loop_filter_mode_deltas.fill(0);
// allow frames offsets to stabilize before checking for golden frames
grace_period = 4;
// On key frames, all frame slots are set to the current frame,
// so the value of the selected slot doesn't really matter.
frame_ctxs.fill({current_frame_number, false, default_probs});
frame_ctxs.fill(default_probs);
// intra only, meaning the frame can be recreated with no other references
current_frame_info.intra_only = true;
} else {
if (!current_frame_info.show_frame) {
uncomp_writer.WriteBit(current_frame_info.intra_only);
if (!current_frame_info.last_frame_was_key) {
swap_next_golden = !swap_next_golden;
}
} else {
current_frame_info.intra_only = false;
}
if (!current_frame_info.error_resilient_mode) {
uncomp_writer.WriteU(0, 2); // Reset frame context.
}
// Last, Golden, Altref frames
std::array<s32, 3> ref_frame_index{0, 1, 2};
// Set when next frame is hidden
// altref and golden references are swapped
if (swap_next_golden) {
ref_frame_index = std::array<s32, 3>{0, 2, 1};
const auto& curr_offsets = current_frame_info.frame_offsets;
const auto& next_offsets = next_frame.info.frame_offsets;
const bool ref_frames_different = curr_offsets[1] != curr_offsets[2];
const bool next_references_swap =
(next_offsets[1] == curr_offsets[2]) || (next_offsets[2] == curr_offsets[1]);
const bool needs_ref_swap = ref_frames_different && next_references_swap;
if (needs_ref_swap) {
swap_ref_indices = !swap_ref_indices;
}
union {
u32 raw;
BitField<0, 1, u32> refresh_last;
BitField<1, 2, u32> refresh_golden;
BitField<2, 1, u32> refresh_alt;
} refresh_frame_flags;
// update Last Frame
u64 refresh_frame_flags = 1;
// golden frame may refresh, determined if the next golden frame offset is changed
bool golden_refresh = false;
if (grace_period <= 0) {
for (s32 index = 1; index < 3; ++index) {
if (current_frame_info.frame_offsets[index] !=
next_frame.info.frame_offsets[index]) {
current_frame_info.refresh_frame[index] = true;
golden_refresh = true;
grace_period = 3;
}
refresh_frame_flags.raw = 0;
for (u32 index = 0; index < 3; ++index) {
// Refresh indices that use the current frame as an index
if (curr_offsets[3] == next_offsets[index]) {
refresh_frame_flags.raw |= 1u << index;
}
}
if (current_frame_info.show_frame &&
(!next_frame.info.show_frame || next_frame.info.is_key_frame)) {
// Update golden frame
refresh_frame_flags = swap_next_golden ? 2 : 4;
if (swap_ref_indices) {
const u32 temp = refresh_frame_flags.refresh_golden;
refresh_frame_flags.refresh_golden.Assign(refresh_frame_flags.refresh_alt.Value());
refresh_frame_flags.refresh_alt.Assign(temp);
}
if (!current_frame_info.show_frame) {
// Update altref
refresh_frame_flags = swap_next_golden ? 2 : 4;
} else if (golden_refresh) {
refresh_frame_flags = 3;
}
if (current_frame_info.intra_only) {
uncomp_writer.WriteU(frame_sync_code, 24);
uncomp_writer.WriteU(static_cast<s32>(refresh_frame_flags), 8);
uncomp_writer.WriteU(refresh_frame_flags.raw, 8);
uncomp_writer.WriteU(current_frame_info.frame_size.width - 1, 16);
uncomp_writer.WriteU(current_frame_info.frame_size.height - 1, 16);
uncomp_writer.WriteBit(false); // Render and frame size different.
} else {
uncomp_writer.WriteU(static_cast<s32>(refresh_frame_flags), 8);
for (s32 index = 1; index < 4; index++) {
const bool swap_indices = needs_ref_swap ^ swap_ref_indices;
const auto ref_frame_index = swap_indices ? std::array{0, 2, 1} : std::array{0, 1, 2};
uncomp_writer.WriteU(refresh_frame_flags.raw, 8);
for (size_t index = 1; index < 4; index++) {
uncomp_writer.WriteU(ref_frame_index[index - 1], 3);
uncomp_writer.WriteU(current_frame_info.ref_frame_sign_bias[index], 1);
}
uncomp_writer.WriteBit(true); // Frame size with refs.
uncomp_writer.WriteBit(false); // Render and frame size different.
uncomp_writer.WriteBit(current_frame_info.allow_high_precision_mv);
@@ -714,10 +679,9 @@ VpxBitStreamWriter VP9::ComposeUncompressedHeader() {
frame_ctx_idx = 1;
}
uncomp_writer.WriteU(frame_ctx_idx, 2); // Frame context index.
prev_frame_probs =
frame_ctxs[frame_ctx_idx].probs; // reference probabilities for compressed header
frame_ctxs[frame_ctx_idx] = {current_frame_number, false, current_frame_info.entropy};
uncomp_writer.WriteU(frame_ctx_idx, 2); // Frame context index.
prev_frame_probs = frame_ctxs[frame_ctx_idx]; // reference probabilities for compressed header
frame_ctxs[frame_ctx_idx] = current_frame_info.entropy;
uncomp_writer.WriteU(current_frame_info.first_level, 6);
uncomp_writer.WriteU(current_frame_info.sharpness_level, 3);
@@ -812,7 +776,6 @@ const std::vector<u8>& VP9::ComposeFrameHeader(const NvdecCommon::NvdecRegisters
current_frame_info = curr_frame.info;
bitstream = std::move(curr_frame.bit_stream);
}
// The uncompressed header routine sets PrevProb parameters needed for the compressed header
auto uncomp_writer = ComposeUncompressedHeader();
std::vector<u8> compressed_header = ComposeCompressedHeader();
@@ -828,13 +791,6 @@ const std::vector<u8>& VP9::ComposeFrameHeader(const NvdecCommon::NvdecRegisters
frame.begin() + uncompressed_header.size());
std::copy(bitstream.begin(), bitstream.end(),
frame.begin() + uncompressed_header.size() + compressed_header.size());
// keep track of frame number
current_frame_number++;
grace_period--;
// don't display hidden frames
hidden = !current_frame_info.show_frame;
return frame;
}

View File

@@ -14,7 +14,6 @@
namespace Tegra {
class GPU;
enum class FrameType { KeyFrame = 0, InterFrame = 1 };
namespace Decoder {
/// The VpxRangeEncoder, and VpxBitStreamWriter classes are used to compose the
@@ -124,7 +123,7 @@ public:
/// Returns true if the most recent frame was a hidden frame.
[[nodiscard]] bool WasFrameHidden() const {
return hidden;
return !current_frame_info.show_frame;
}
private:
@@ -178,19 +177,12 @@ private:
std::array<s8, 4> loop_filter_ref_deltas{};
std::array<s8, 2> loop_filter_mode_deltas{};
bool hidden = false;
s64 current_frame_number = -2; // since we buffer 2 frames
s32 grace_period = 6; // frame offsets need to stabilize
std::array<FrameContexts, 4> frame_ctxs{};
Vp9FrameContainer next_frame{};
Vp9FrameContainer next_next_frame{};
bool swap_next_golden{};
std::array<Vp9EntropyProbs, 4> frame_ctxs{};
bool swap_ref_indices{};
Vp9PictureInfo current_frame_info{};
Vp9EntropyProbs prev_frame_probs{};
s32 diff_update_probability = 252;
s32 frame_sync_code = 0x498342;
};
} // namespace Decoder

View File

@@ -296,12 +296,6 @@ struct RefPoolElement {
bool refresh{};
};
struct FrameContexts {
s64 from;
bool adapted;
Vp9EntropyProbs probs;
};
#define ASSERT_POSITION(field_name, position) \
static_assert(offsetof(Vp9EntropyProbs, field_name) == position, \
"Field " #field_name " has invalid position")

View File

@@ -39,7 +39,7 @@ void Nvdec::Execute() {
codec->Decode();
break;
default:
UNIMPLEMENTED_MSG("Unknown codec {}", static_cast<u32>(codec->GetCurrentCodec()));
UNIMPLEMENTED_MSG("Codec {}", codec->GetCurrentCodecName());
break;
}
}

View File

@@ -46,11 +46,8 @@ void Vic::ProcessMethod(Method method, u32 argument) {
case Method::SetOutputSurfaceLumaOffset:
output_surface_luma_address = arg;
break;
case Method::SetOutputSurfaceChromaUOffset:
output_surface_chroma_u_address = arg;
break;
case Method::SetOutputSurfaceChromaVOffset:
output_surface_chroma_v_address = arg;
case Method::SetOutputSurfaceChromaOffset:
output_surface_chroma_address = arg;
break;
default:
break;
@@ -65,11 +62,10 @@ void Vic::Execute() {
const VicConfig config{gpu.MemoryManager().Read<u64>(config_struct_address + 0x20)};
const AVFramePtr frame_ptr = nvdec_processor->GetFrame();
const auto* frame = frame_ptr.get();
if (!frame || frame->width == 0 || frame->height == 0) {
if (!frame) {
return;
}
const VideoPixelFormat pixel_format =
static_cast<VideoPixelFormat>(config.pixel_format.Value());
const auto pixel_format = static_cast<VideoPixelFormat>(config.pixel_format.Value());
switch (pixel_format) {
case VideoPixelFormat::BGRA8:
case VideoPixelFormat::RGBA8: {
@@ -83,16 +79,18 @@ void Vic::Execute() {
sws_freeContext(scaler_ctx);
scaler_ctx = nullptr;
// FFmpeg returns all frames in YUV420, convert it into expected format
scaler_ctx =
sws_getContext(frame->width, frame->height, AV_PIX_FMT_YUV420P, frame->width,
frame->height, target_format, 0, nullptr, nullptr, nullptr);
// Frames are decoded into either YUV420 or NV12 formats. Convert to desired format
scaler_ctx = sws_getContext(frame->width, frame->height,
static_cast<AVPixelFormat>(frame->format), frame->width,
frame->height, target_format, 0, nullptr, nullptr, nullptr);
scaler_width = frame->width;
scaler_height = frame->height;
}
// Get Converted frame
const std::size_t linear_size = frame->width * frame->height * 4;
const u32 width = static_cast<u32>(frame->width);
const u32 height = static_cast<u32>(frame->height);
const std::size_t linear_size = width * height * 4;
// Only allocate frame_buffer once per stream, as the size is not expected to change
if (!converted_frame_buffer) {
@@ -109,11 +107,10 @@ void Vic::Execute() {
if (blk_kind != 0) {
// swizzle pitch linear to block linear
const u32 block_height = static_cast<u32>(config.block_linear_height_log2);
const auto size = Tegra::Texture::CalculateSize(true, 4, frame->width, frame->height, 1,
block_height, 0);
const auto size =
Tegra::Texture::CalculateSize(true, 4, width, height, 1, block_height, 0);
luma_buffer.resize(size);
Tegra::Texture::SwizzleSubrect(frame->width, frame->height, frame->width * 4,
frame->width, 4, luma_buffer.data(),
Tegra::Texture::SwizzleSubrect(width, height, width * 4, width, 4, luma_buffer.data(),
converted_frame_buffer.get(), block_height, 0, 0);
gpu.MemoryManager().WriteBlock(output_surface_luma_address, luma_buffer.data(), size);
@@ -131,41 +128,65 @@ void Vic::Execute() {
const std::size_t surface_height = config.surface_height_minus1 + 1;
const auto frame_width = std::min(surface_width, static_cast<size_t>(frame->width));
const auto frame_height = std::min(surface_height, static_cast<size_t>(frame->height));
const std::size_t half_width = frame_width / 2;
const std::size_t half_height = frame_height / 2;
const std::size_t aligned_width = (surface_width + 0xff) & ~0xff;
const std::size_t aligned_width = (surface_width + 0xff) & ~0xffUL;
const auto* luma_ptr = frame->data[0];
const auto* chroma_b_ptr = frame->data[1];
const auto* chroma_r_ptr = frame->data[2];
const auto stride = static_cast<size_t>(frame->linesize[0]);
const auto half_stride = static_cast<size_t>(frame->linesize[1]);
luma_buffer.resize(aligned_width * surface_height);
chroma_buffer.resize(aligned_width * surface_height / 2);
// Populate luma buffer
const u8* luma_src = frame->data[0];
for (std::size_t y = 0; y < frame_height; ++y) {
const std::size_t src = y * stride;
const std::size_t dst = y * aligned_width;
for (std::size_t x = 0; x < frame_width; ++x) {
luma_buffer[dst + x] = luma_ptr[src + x];
luma_buffer[dst + x] = luma_src[src + x];
}
}
gpu.MemoryManager().WriteBlock(output_surface_luma_address, luma_buffer.data(),
luma_buffer.size());
// Populate chroma buffer from both channels with interleaving.
for (std::size_t y = 0; y < half_height; ++y) {
const std::size_t src = y * half_stride;
const std::size_t dst = y * aligned_width;
// Chroma
const std::size_t half_height = frame_height / 2;
const auto half_stride = static_cast<size_t>(frame->linesize[1]);
for (std::size_t x = 0; x < half_width; ++x) {
chroma_buffer[dst + x * 2] = chroma_b_ptr[src + x];
chroma_buffer[dst + x * 2 + 1] = chroma_r_ptr[src + x];
switch (frame->format) {
case AV_PIX_FMT_YUV420P: {
// Frame from FFmpeg software
// Populate chroma buffer from both channels with interleaving.
const std::size_t half_width = frame_width / 2;
const u8* chroma_b_src = frame->data[1];
const u8* chroma_r_src = frame->data[2];
for (std::size_t y = 0; y < half_height; ++y) {
const std::size_t src = y * half_stride;
const std::size_t dst = y * aligned_width;
for (std::size_t x = 0; x < half_width; ++x) {
chroma_buffer[dst + x * 2] = chroma_b_src[src + x];
chroma_buffer[dst + x * 2 + 1] = chroma_r_src[src + x];
}
}
break;
}
gpu.MemoryManager().WriteBlock(output_surface_chroma_u_address, chroma_buffer.data(),
case AV_PIX_FMT_NV12: {
// Frame from VA-API hardware
// This is already interleaved so just copy
const u8* chroma_src = frame->data[1];
for (std::size_t y = 0; y < half_height; ++y) {
const std::size_t src = y * stride;
const std::size_t dst = y * aligned_width;
for (std::size_t x = 0; x < frame_width; ++x) {
chroma_buffer[dst + x] = chroma_src[src + x];
}
}
break;
}
default:
UNREACHABLE();
break;
}
gpu.MemoryManager().WriteBlock(output_surface_chroma_address, chroma_buffer.data(),
chroma_buffer.size());
break;
}

View File

@@ -22,8 +22,8 @@ public:
SetControlParams = 0x1c1,
SetConfigStructOffset = 0x1c2,
SetOutputSurfaceLumaOffset = 0x1c8,
SetOutputSurfaceChromaUOffset = 0x1c9,
SetOutputSurfaceChromaVOffset = 0x1ca
SetOutputSurfaceChromaOffset = 0x1c9,
SetOutputSurfaceChromaUnusedOffset = 0x1ca
};
explicit Vic(GPU& gpu, std::shared_ptr<Nvdec> nvdec_processor);
@@ -64,8 +64,7 @@ private:
GPUVAddr config_struct_address{};
GPUVAddr output_surface_luma_address{};
GPUVAddr output_surface_chroma_u_address{};
GPUVAddr output_surface_chroma_v_address{};
GPUVAddr output_surface_chroma_address{};
SwsContext* scaler_ctx{};
s32 scaler_width{};

View File

@@ -10,33 +10,27 @@
#define END_PUSH_CONSTANTS };
#define UNIFORM(n)
#define BINDING_INPUT_BUFFER 0
#define BINDING_ENC_BUFFER 1
#define BINDING_SWIZZLE_BUFFER 2
#define BINDING_OUTPUT_IMAGE 3
#define BINDING_OUTPUT_IMAGE 1
#else // ^^^ Vulkan ^^^ // vvv OpenGL vvv
#define BEGIN_PUSH_CONSTANTS
#define END_PUSH_CONSTANTS
#define UNIFORM(n) layout(location = n) uniform
#define BINDING_SWIZZLE_BUFFER 0
#define BINDING_INPUT_BUFFER 1
#define BINDING_ENC_BUFFER 2
#define BINDING_INPUT_BUFFER 0
#define BINDING_OUTPUT_IMAGE 0
#endif
layout(local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
BEGIN_PUSH_CONSTANTS
UNIFORM(1) uvec2 block_dims;
UNIFORM(2) uint bytes_per_block_log2;
UNIFORM(3) uint layer_stride;
UNIFORM(4) uint block_size;
UNIFORM(5) uint x_shift;
UNIFORM(6) uint block_height;
UNIFORM(7) uint block_height_mask;
UNIFORM(2) uint layer_stride;
UNIFORM(3) uint block_size;
UNIFORM(4) uint x_shift;
UNIFORM(5) uint block_height;
UNIFORM(6) uint block_height_mask;
END_PUSH_CONSTANTS
struct EncodingData {
@@ -55,45 +49,35 @@ struct TexelWeightParams {
bool void_extent_hdr;
};
// Swizzle data
layout(binding = BINDING_SWIZZLE_BUFFER, std430) readonly buffer SwizzleTable {
uint swizzle_table[];
};
layout(binding = BINDING_INPUT_BUFFER, std430) readonly buffer InputBufferU32 {
uint astc_data[];
};
// ASTC Encodings data
layout(binding = BINDING_ENC_BUFFER, std430) readonly buffer EncodingsValues {
EncodingData encoding_values[];
uvec4 astc_data[];
};
layout(binding = BINDING_OUTPUT_IMAGE, rgba8) uniform writeonly image2DArray dest_image;
const uint GOB_SIZE_X = 64;
const uint GOB_SIZE_Y = 8;
const uint GOB_SIZE_Z = 1;
const uint GOB_SIZE = GOB_SIZE_X * GOB_SIZE_Y * GOB_SIZE_Z;
const uint GOB_SIZE_X_SHIFT = 6;
const uint GOB_SIZE_Y_SHIFT = 3;
const uint GOB_SIZE_Z_SHIFT = 0;
const uint GOB_SIZE_SHIFT = GOB_SIZE_X_SHIFT + GOB_SIZE_Y_SHIFT + GOB_SIZE_Z_SHIFT;
const uint GOB_SIZE_SHIFT = GOB_SIZE_X_SHIFT + GOB_SIZE_Y_SHIFT;
const uvec2 SWIZZLE_MASK = uvec2(GOB_SIZE_X - 1, GOB_SIZE_Y - 1);
const int BLOCK_SIZE_IN_BYTES = 16;
const int BLOCK_INFO_ERROR = 0;
const int BLOCK_INFO_VOID_EXTENT_HDR = 1;
const int BLOCK_INFO_VOID_EXTENT_LDR = 2;
const int BLOCK_INFO_NORMAL = 3;
const uint BYTES_PER_BLOCK_LOG2 = 4;
const int JUST_BITS = 0;
const int QUINT = 1;
const int TRIT = 2;
// ASTC Encodings data, sorted in ascending order based on their BitLength value
// (see GetBitLength() function)
EncodingData encoding_values[22] = EncodingData[](
EncodingData(JUST_BITS, 0, 0, 0), EncodingData(JUST_BITS, 1, 0, 0), EncodingData(TRIT, 0, 0, 0),
EncodingData(JUST_BITS, 2, 0, 0), EncodingData(QUINT, 0, 0, 0), EncodingData(TRIT, 1, 0, 0),
EncodingData(JUST_BITS, 3, 0, 0), EncodingData(QUINT, 1, 0, 0), EncodingData(TRIT, 2, 0, 0),
EncodingData(JUST_BITS, 4, 0, 0), EncodingData(QUINT, 2, 0, 0), EncodingData(TRIT, 3, 0, 0),
EncodingData(JUST_BITS, 5, 0, 0), EncodingData(QUINT, 3, 0, 0), EncodingData(TRIT, 4, 0, 0),
EncodingData(JUST_BITS, 6, 0, 0), EncodingData(QUINT, 4, 0, 0), EncodingData(TRIT, 5, 0, 0),
EncodingData(JUST_BITS, 7, 0, 0), EncodingData(QUINT, 5, 0, 0), EncodingData(TRIT, 6, 0, 0),
EncodingData(JUST_BITS, 8, 0, 0)
);
// The following constants are expanded variants of the Replicate()
// function calls corresponding to the following arguments:
// value: index into the generated table
@@ -135,44 +119,37 @@ const uint REPLICATE_7_BIT_TO_8_TABLE[128] =
// Input ASTC texture globals
uint current_index = 0;
int bitsread = 0;
uint total_bitsread = 0;
uint local_buff[16];
int total_bitsread = 0;
uvec4 local_buff;
// Color data globals
uint color_endpoint_data[16];
uvec4 color_endpoint_data;
int color_bitsread = 0;
uint total_color_bitsread = 0;
int color_index = 0;
// Four values, two endpoints, four maximum paritions
uint color_values[32];
int colvals_index = 0;
// Weight data globals
uint texel_weight_data[16];
uvec4 texel_weight_data;
int texel_bitsread = 0;
uint total_texel_bitsread = 0;
int texel_index = 0;
bool texel_flag = false;
// Global "vectors" to be pushed into when decoding
EncodingData result_vector[100];
EncodingData result_vector[144];
int result_index = 0;
EncodingData texel_vector[100];
EncodingData texel_vector[144];
int texel_vector_index = 0;
uint unquantized_texel_weights[2][144];
uint SwizzleOffset(uvec2 pos) {
pos = pos & SWIZZLE_MASK;
return swizzle_table[pos.y * 64 + pos.x];
}
uint ReadTexel(uint offset) {
// extract the 8-bit value from the 32-bit packed data.
return bitfieldExtract(astc_data[offset / 4], int((offset * 8) & 24), 8);
uint x = pos.x;
uint y = pos.y;
return ((x % 64) / 32) * 256 + ((y % 8) / 2) * 64 + ((x % 32) / 16) * 32 +
(y % 2) * 16 + (x % 16);
}
// Replicates low num_bits such that [(to_bit - 1):(to_bit - 1 - from_bit)]
@@ -278,14 +255,10 @@ uint Hash52(uint p) {
return p;
}
uint SelectPartition(uint seed, uint x, uint y, uint z, uint partition_count, bool small_block) {
if (partition_count == 1) {
return 0;
}
uint Select2DPartition(uint seed, uint x, uint y, uint partition_count, bool small_block) {
if (small_block) {
x <<= 1;
y <<= 1;
z <<= 1;
}
seed += (partition_count - 1) * 1024;
@@ -299,10 +272,6 @@ uint SelectPartition(uint seed, uint x, uint y, uint z, uint partition_count, bo
uint seed6 = uint((rnum >> 20) & 0xF);
uint seed7 = uint((rnum >> 24) & 0xF);
uint seed8 = uint((rnum >> 28) & 0xF);
uint seed9 = uint((rnum >> 18) & 0xF);
uint seed10 = uint((rnum >> 22) & 0xF);
uint seed11 = uint((rnum >> 26) & 0xF);
uint seed12 = uint(((rnum >> 30) | (rnum << 2)) & 0xF);
seed1 = (seed1 * seed1);
seed2 = (seed2 * seed2);
@@ -312,12 +281,8 @@ uint SelectPartition(uint seed, uint x, uint y, uint z, uint partition_count, bo
seed6 = (seed6 * seed6);
seed7 = (seed7 * seed7);
seed8 = (seed8 * seed8);
seed9 = (seed9 * seed9);
seed10 = (seed10 * seed10);
seed11 = (seed11 * seed11);
seed12 = (seed12 * seed12);
int sh1, sh2, sh3;
uint sh1, sh2;
if ((seed & 1) > 0) {
sh1 = (seed & 2) > 0 ? 4 : 5;
sh2 = (partition_count == 3) ? 6 : 5;
@@ -325,25 +290,19 @@ uint SelectPartition(uint seed, uint x, uint y, uint z, uint partition_count, bo
sh1 = (partition_count == 3) ? 6 : 5;
sh2 = (seed & 2) > 0 ? 4 : 5;
}
sh3 = (seed & 0x10) > 0 ? sh1 : sh2;
seed1 >>= sh1;
seed2 >>= sh2;
seed3 >>= sh1;
seed4 >>= sh2;
seed5 >>= sh1;
seed6 >>= sh2;
seed7 >>= sh1;
seed8 >>= sh2;
seed1 = (seed1 >> sh1);
seed2 = (seed2 >> sh2);
seed3 = (seed3 >> sh1);
seed4 = (seed4 >> sh2);
seed5 = (seed5 >> sh1);
seed6 = (seed6 >> sh2);
seed7 = (seed7 >> sh1);
seed8 = (seed8 >> sh2);
seed9 = (seed9 >> sh3);
seed10 = (seed10 >> sh3);
seed11 = (seed11 >> sh3);
seed12 = (seed12 >> sh3);
uint a = seed1 * x + seed2 * y + seed11 * z + (rnum >> 14);
uint b = seed3 * x + seed4 * y + seed12 * z + (rnum >> 10);
uint c = seed5 * x + seed6 * y + seed9 * z + (rnum >> 6);
uint d = seed7 * x + seed8 * y + seed10 * z + (rnum >> 2);
uint a = seed1 * x + seed2 * y + (rnum >> 14);
uint b = seed3 * x + seed4 * y + (rnum >> 10);
uint c = seed5 * x + seed6 * y + (rnum >> 6);
uint d = seed7 * x + seed8 * y + (rnum >> 2);
a &= 0x3F;
b &= 0x3F;
@@ -368,58 +327,37 @@ uint SelectPartition(uint seed, uint x, uint y, uint z, uint partition_count, bo
}
}
uint Select2DPartition(uint seed, uint x, uint y, uint partition_count, bool small_block) {
return SelectPartition(seed, x, y, 0, partition_count, small_block);
}
uint ReadBit() {
if (current_index >= local_buff.length()) {
uint ExtractBits(uvec4 payload, int offset, int bits) {
if (bits <= 0) {
return 0;
}
uint bit = bitfieldExtract(local_buff[current_index], bitsread, 1);
++bitsread;
++total_bitsread;
if (bitsread == 8) {
++current_index;
bitsread = 0;
int last_offset = offset + bits - 1;
int shifted_offset = offset >> 5;
if ((last_offset >> 5) == shifted_offset) {
return bitfieldExtract(payload[shifted_offset], offset & 31, bits);
}
return bit;
int first_bits = 32 - (offset & 31);
int result_first = int(bitfieldExtract(payload[shifted_offset], offset & 31, first_bits));
int result_second = int(bitfieldExtract(payload[shifted_offset + 1], 0, bits - first_bits));
return result_first | (result_second << first_bits);
}
uint StreamBits(uint num_bits) {
uint ret = 0;
for (uint i = 0; i < num_bits; i++) {
ret |= ((ReadBit() & 1) << i);
}
int int_bits = int(num_bits);
uint ret = ExtractBits(local_buff, total_bitsread, int_bits);
total_bitsread += int_bits;
return ret;
}
uint ReadColorBit() {
uint bit = 0;
if (texel_flag) {
bit = bitfieldExtract(texel_weight_data[texel_index], texel_bitsread, 1);
++texel_bitsread;
++total_texel_bitsread;
if (texel_bitsread == 8) {
++texel_index;
texel_bitsread = 0;
}
} else {
bit = bitfieldExtract(color_endpoint_data[color_index], color_bitsread, 1);
++color_bitsread;
++total_color_bitsread;
if (color_bitsread == 8) {
++color_index;
color_bitsread = 0;
}
}
return bit;
}
uint StreamColorBits(uint num_bits) {
uint ret = 0;
for (uint i = 0; i < num_bits; i++) {
ret |= ((ReadColorBit() & 1) << i);
int int_bits = int(num_bits);
if (texel_flag) {
ret = ExtractBits(texel_weight_data, texel_bitsread, int_bits);
texel_bitsread += int_bits;
} else {
ret = ExtractBits(color_endpoint_data, color_bitsread, int_bits);
color_bitsread += int_bits;
}
return ret;
}
@@ -596,22 +534,16 @@ void DecodeColorValues(uvec4 modes, uint num_partitions, uint color_data_bits) {
for (uint i = 0; i < num_partitions; i++) {
num_values += ((modes[i] >> 2) + 1) << 1;
}
int range = 256;
while (--range > 0) {
EncodingData val = encoding_values[range];
// Find the largest encoding that's within color_data_bits
// TODO(ameerj): profile with binary search
int range = 0;
while (++range < encoding_values.length()) {
uint bit_length = GetBitLength(num_values, range);
if (bit_length <= color_data_bits) {
while (--range > 0) {
EncodingData newval = encoding_values[range];
if (newval.encoding != val.encoding && newval.num_bits != val.num_bits) {
break;
}
}
++range;
if (bit_length > color_data_bits) {
break;
}
}
DecodeIntegerSequence(range, num_values);
DecodeIntegerSequence(range - 1, num_values);
uint out_index = 0;
for (int itr = 0; itr < result_index; ++itr) {
if (out_index >= num_values) {
@@ -1028,7 +960,7 @@ int FindLayout(uint mode) {
return 5;
}
TexelWeightParams DecodeBlockInfo(uint block_index) {
TexelWeightParams DecodeBlockInfo() {
TexelWeightParams params = TexelWeightParams(uvec2(0), 0, false, false, false, false);
uint mode = StreamBits(11);
if ((mode & 0x1ff) == 0x1fc) {
@@ -1110,10 +1042,10 @@ TexelWeightParams DecodeBlockInfo(uint block_index) {
}
weight_index -= 2;
if ((mode_layout != 9) && ((mode & 0x200) != 0)) {
const int max_weights[6] = int[6](9, 11, 15, 19, 23, 31);
const int max_weights[6] = int[6](7, 8, 9, 10, 11, 12);
params.max_weight = max_weights[weight_index];
} else {
const int max_weights[6] = int[6](1, 2, 3, 4, 5, 7);
const int max_weights[6] = int[6](1, 2, 3, 4, 5, 6);
params.max_weight = max_weights[weight_index];
}
return params;
@@ -1144,8 +1076,8 @@ void FillVoidExtentLDR(ivec3 coord) {
}
}
void DecompressBlock(ivec3 coord, uint block_index) {
TexelWeightParams params = DecodeBlockInfo(block_index);
void DecompressBlock(ivec3 coord) {
TexelWeightParams params = DecodeBlockInfo();
if (params.error_state) {
FillError(coord);
return;
@@ -1212,7 +1144,7 @@ void DecompressBlock(ivec3 coord, uint block_index) {
// Read color data...
uint color_data_bits = remaining_bits;
while (remaining_bits > 0) {
int nb = int(min(remaining_bits, 8U));
int nb = int(min(remaining_bits, 32U));
uint b = StreamBits(nb);
color_endpoint_data[ced_pointer] = uint(bitfieldExtract(b, 0, nb));
++ced_pointer;
@@ -1254,25 +1186,20 @@ void DecompressBlock(ivec3 coord, uint block_index) {
ComputeEndpoints(endpoints[i][0], endpoints[i][1], color_endpoint_mode[i]);
}
for (uint i = 0; i < 16; i++) {
texel_weight_data[i] = local_buff[i];
}
for (uint i = 0; i < 8; i++) {
#define REVERSE_BYTE(b) ((b * 0x0802U & 0x22110U) | (b * 0x8020U & 0x88440U)) * 0x10101U >> 16
uint a = REVERSE_BYTE(texel_weight_data[i]);
uint b = REVERSE_BYTE(texel_weight_data[15 - i]);
#undef REVERSE_BYTE
texel_weight_data[i] = uint(bitfieldExtract(b, 0, 8));
texel_weight_data[15 - i] = uint(bitfieldExtract(a, 0, 8));
}
texel_weight_data = local_buff;
texel_weight_data = bitfieldReverse(texel_weight_data).wzyx;
uint clear_byte_start =
(GetPackedBitSize(params.size, params.dual_plane, params.max_weight) >> 3) + 1;
texel_weight_data[clear_byte_start - 1] =
texel_weight_data[clear_byte_start - 1] &
uint byte_insert = ExtractBits(texel_weight_data, int(clear_byte_start - 1) * 8, 8) &
uint(
((1 << (GetPackedBitSize(params.size, params.dual_plane, params.max_weight) % 8)) - 1));
for (uint i = 0; i < 16 - clear_byte_start; i++) {
texel_weight_data[clear_byte_start + i] = 0U;
uint vec_index = (clear_byte_start - 1) >> 2;
texel_weight_data[vec_index] =
bitfieldInsert(texel_weight_data[vec_index], byte_insert, int((clear_byte_start - 1) % 4) * 8, 8);
for (uint i = clear_byte_start; i < 16; ++i) {
uint idx = i >> 2;
texel_weight_data[idx] = bitfieldInsert(texel_weight_data[idx], 0, int(i % 4) * 8, 8);
}
texel_flag = true; // use texel "vector" and bit stream in integer decoding
DecodeIntegerSequence(params.max_weight, GetNumWeightValues(params.size, params.dual_plane));
@@ -1281,8 +1208,11 @@ void DecompressBlock(ivec3 coord, uint block_index) {
for (uint j = 0; j < block_dims.y; j++) {
for (uint i = 0; i < block_dims.x; i++) {
uint local_partition = Select2DPartition(partition_index, i, j, num_partitions,
uint local_partition = 0;
if (num_partitions > 1) {
local_partition = Select2DPartition(partition_index, i, j, num_partitions,
(block_dims.y * block_dims.x) < 32);
}
vec4 p;
uvec4 C0 = ReplicateByteTo16(endpoints[local_partition][0]);
uvec4 C1 = ReplicateByteTo16(endpoints[local_partition][1]);
@@ -1303,7 +1233,7 @@ void DecompressBlock(ivec3 coord, uint block_index) {
void main() {
uvec3 pos = gl_GlobalInvocationID;
pos.x <<= bytes_per_block_log2;
pos.x <<= BYTES_PER_BLOCK_LOG2;
// Read as soon as possible due to its latency
const uint swizzle = SwizzleOffset(pos.xy);
@@ -1321,13 +1251,8 @@ void main() {
if (any(greaterThanEqual(coord, imageSize(dest_image)))) {
return;
}
uint block_index =
pos.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y + pos.y * gl_WorkGroupSize.x + pos.x;
current_index = 0;
bitsread = 0;
for (int i = 0; i < 16; i++) {
local_buff[i] = ReadTexel(offset + i);
}
DecompressBlock(coord, block_index);
local_buff = astc_data[offset / 16];
DecompressBlock(coord);
}

View File

@@ -60,19 +60,14 @@ UtilShaders::UtilShaders(ProgramManager& program_manager_)
copy_bc4_program(MakeProgram(OPENGL_COPY_BC4_COMP)) {
const auto swizzle_table = Tegra::Texture::MakeSwizzleTable();
swizzle_table_buffer.Create();
astc_buffer.Create();
glNamedBufferStorage(swizzle_table_buffer.handle, sizeof(swizzle_table), &swizzle_table, 0);
glNamedBufferStorage(astc_buffer.handle, sizeof(ASTC_ENCODINGS_VALUES), &ASTC_ENCODINGS_VALUES,
0);
}
UtilShaders::~UtilShaders() = default;
void UtilShaders::ASTCDecode(Image& image, const ImageBufferMap& map,
std::span<const VideoCommon::SwizzleParameters> swizzles) {
static constexpr GLuint BINDING_SWIZZLE_BUFFER = 0;
static constexpr GLuint BINDING_INPUT_BUFFER = 1;
static constexpr GLuint BINDING_ENC_BUFFER = 2;
static constexpr GLuint BINDING_INPUT_BUFFER = 0;
static constexpr GLuint BINDING_OUTPUT_IMAGE = 0;
const Extent2D tile_size{
@@ -80,34 +75,32 @@ void UtilShaders::ASTCDecode(Image& image, const ImageBufferMap& map,
.height = VideoCore::Surface::DefaultBlockHeight(image.info.format),
};
program_manager.BindComputeProgram(astc_decoder_program.handle);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, BINDING_SWIZZLE_BUFFER, swizzle_table_buffer.handle);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, BINDING_ENC_BUFFER, astc_buffer.handle);
glFlushMappedNamedBufferRange(map.buffer, map.offset, image.guest_size_bytes);
glUniform2ui(1, tile_size.width, tile_size.height);
// Ensure buffer data is valid before dispatching
glFlush();
for (const SwizzleParameters& swizzle : swizzles) {
const size_t input_offset = swizzle.buffer_offset + map.offset;
const u32 num_dispatches_x = Common::DivCeil(swizzle.num_tiles.width, 32U);
const u32 num_dispatches_y = Common::DivCeil(swizzle.num_tiles.height, 32U);
const u32 num_dispatches_x = Common::DivCeil(swizzle.num_tiles.width, 8U);
const u32 num_dispatches_y = Common::DivCeil(swizzle.num_tiles.height, 8U);
const auto params = MakeBlockLinearSwizzle2DParams(swizzle, image.info);
ASSERT(params.origin == (std::array<u32, 3>{0, 0, 0}));
ASSERT(params.destination == (std::array<s32, 3>{0, 0, 0}));
ASSERT(params.bytes_per_block_log2 == 4);
glUniform1ui(2, params.bytes_per_block_log2);
glUniform1ui(3, params.layer_stride);
glUniform1ui(4, params.block_size);
glUniform1ui(5, params.x_shift);
glUniform1ui(6, params.block_height);
glUniform1ui(7, params.block_height_mask);
glUniform1ui(2, params.layer_stride);
glUniform1ui(3, params.block_size);
glUniform1ui(4, params.x_shift);
glUniform1ui(5, params.block_height);
glUniform1ui(6, params.block_height_mask);
glBindImageTexture(BINDING_OUTPUT_IMAGE, image.StorageHandle(), swizzle.level, GL_TRUE, 0,
GL_WRITE_ONLY, GL_RGBA8);
// ASTC texture data
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, BINDING_INPUT_BUFFER, map.buffer, input_offset,
image.guest_size_bytes - swizzle.buffer_offset);
glBindImageTexture(BINDING_OUTPUT_IMAGE, image.StorageHandle(), swizzle.level, GL_TRUE, 0,
GL_WRITE_ONLY, GL_RGBA8);
glDispatchCompute(num_dispatches_x, num_dispatches_y, image.info.resources.layers);
}

View File

@@ -62,7 +62,6 @@ private:
ProgramManager& program_manager;
OGLBuffer swizzle_table_buffer;
OGLBuffer astc_buffer;
OGLProgram astc_decoder_program;
OGLProgram block_linear_unswizzle_2d_program;

View File

@@ -30,16 +30,12 @@
namespace Vulkan {
using Tegra::Texture::SWIZZLE_TABLE;
using Tegra::Texture::ASTC::ASTC_ENCODINGS_VALUES;
using namespace Tegra::Texture::ASTC;
namespace {
constexpr u32 ASTC_BINDING_INPUT_BUFFER = 0;
constexpr u32 ASTC_BINDING_ENC_BUFFER = 1;
constexpr u32 ASTC_BINDING_SWIZZLE_BUFFER = 2;
constexpr u32 ASTC_BINDING_OUTPUT_IMAGE = 3;
constexpr size_t ASTC_NUM_BINDINGS = 4;
constexpr u32 ASTC_BINDING_OUTPUT_IMAGE = 1;
constexpr size_t ASTC_NUM_BINDINGS = 2;
template <size_t size>
inline constexpr VkPushConstantRange COMPUTE_PUSH_CONSTANT_RANGE{
@@ -75,7 +71,7 @@ constexpr DescriptorBankInfo INPUT_OUTPUT_BANK_INFO{
.score = 2,
};
constexpr std::array<VkDescriptorSetLayoutBinding, 4> ASTC_DESCRIPTOR_SET_BINDINGS{{
constexpr std::array<VkDescriptorSetLayoutBinding, ASTC_NUM_BINDINGS> ASTC_DESCRIPTOR_SET_BINDINGS{{
{
.binding = ASTC_BINDING_INPUT_BUFFER,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
@@ -83,20 +79,6 @@ constexpr std::array<VkDescriptorSetLayoutBinding, 4> ASTC_DESCRIPTOR_SET_BINDIN
.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT,
.pImmutableSamplers = nullptr,
},
{
.binding = ASTC_BINDING_ENC_BUFFER,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
.descriptorCount = 1,
.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT,
.pImmutableSamplers = nullptr,
},
{
.binding = ASTC_BINDING_SWIZZLE_BUFFER,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
.descriptorCount = 1,
.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT,
.pImmutableSamplers = nullptr,
},
{
.binding = ASTC_BINDING_OUTPUT_IMAGE,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
@@ -108,12 +90,12 @@ constexpr std::array<VkDescriptorSetLayoutBinding, 4> ASTC_DESCRIPTOR_SET_BINDIN
constexpr DescriptorBankInfo ASTC_BANK_INFO{
.uniform_buffers = 0,
.storage_buffers = 3,
.storage_buffers = 1,
.texture_buffers = 0,
.image_buffers = 0,
.textures = 0,
.images = 1,
.score = 4,
.score = 2,
};
constexpr VkDescriptorUpdateTemplateEntryKHR INPUT_OUTPUT_DESCRIPTOR_UPDATE_TEMPLATE{
@@ -135,22 +117,6 @@ constexpr std::array<VkDescriptorUpdateTemplateEntryKHR, ASTC_NUM_BINDINGS>
.offset = ASTC_BINDING_INPUT_BUFFER * sizeof(DescriptorUpdateEntry),
.stride = sizeof(DescriptorUpdateEntry),
},
{
.dstBinding = ASTC_BINDING_ENC_BUFFER,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
.offset = ASTC_BINDING_ENC_BUFFER * sizeof(DescriptorUpdateEntry),
.stride = sizeof(DescriptorUpdateEntry),
},
{
.dstBinding = ASTC_BINDING_SWIZZLE_BUFFER,
.dstArrayElement = 0,
.descriptorCount = 1,
.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
.offset = ASTC_BINDING_SWIZZLE_BUFFER * sizeof(DescriptorUpdateEntry),
.stride = sizeof(DescriptorUpdateEntry),
},
{
.dstBinding = ASTC_BINDING_OUTPUT_IMAGE,
.dstArrayElement = 0,
@@ -163,7 +129,6 @@ constexpr std::array<VkDescriptorUpdateTemplateEntryKHR, ASTC_NUM_BINDINGS>
struct AstcPushConstants {
std::array<u32, 2> blocks_dims;
u32 bytes_per_block_log2;
u32 layer_stride;
u32 block_size;
u32 x_shift;
@@ -354,46 +319,6 @@ ASTCDecoderPass::ASTCDecoderPass(const Device& device_, VKScheduler& scheduler_,
ASTCDecoderPass::~ASTCDecoderPass() = default;
void ASTCDecoderPass::MakeDataBuffer() {
constexpr size_t TOTAL_BUFFER_SIZE = sizeof(ASTC_ENCODINGS_VALUES) + sizeof(SWIZZLE_TABLE);
data_buffer = device.GetLogical().CreateBuffer(VkBufferCreateInfo{
.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
.pNext = nullptr,
.flags = 0,
.size = TOTAL_BUFFER_SIZE,
.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
.sharingMode = VK_SHARING_MODE_EXCLUSIVE,
.queueFamilyIndexCount = 0,
.pQueueFamilyIndices = nullptr,
});
data_buffer_commit = memory_allocator.Commit(data_buffer, MemoryUsage::Upload);
const auto staging_ref = staging_buffer_pool.Request(TOTAL_BUFFER_SIZE, MemoryUsage::Upload);
std::memcpy(staging_ref.mapped_span.data(), &ASTC_ENCODINGS_VALUES,
sizeof(ASTC_ENCODINGS_VALUES));
// Tack on the swizzle table at the end of the buffer
std::memcpy(staging_ref.mapped_span.data() + sizeof(ASTC_ENCODINGS_VALUES), &SWIZZLE_TABLE,
sizeof(SWIZZLE_TABLE));
scheduler.Record([src = staging_ref.buffer, offset = staging_ref.offset, dst = *data_buffer,
TOTAL_BUFFER_SIZE](vk::CommandBuffer cmdbuf) {
static constexpr VkMemoryBarrier write_barrier{
.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
.pNext = nullptr,
.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
.dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
};
const VkBufferCopy copy{
.srcOffset = offset,
.dstOffset = 0,
.size = TOTAL_BUFFER_SIZE,
};
cmdbuf.CopyBuffer(src, dst, copy);
cmdbuf.PipelineBarrier(VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
0, write_barrier);
});
}
void ASTCDecoderPass::Assemble(Image& image, const StagingBufferRef& map,
std::span<const VideoCommon::SwizzleParameters> swizzles) {
using namespace VideoCommon::Accelerated;
@@ -402,9 +327,6 @@ void ASTCDecoderPass::Assemble(Image& image, const StagingBufferRef& map,
VideoCore::Surface::DefaultBlockHeight(image.info.format),
};
scheduler.RequestOutsideRenderPassOperationContext();
if (!data_buffer) {
MakeDataBuffer();
}
const VkPipeline vk_pipeline = *pipeline;
const VkImageAspectFlags aspect_mask = image.AspectMask();
const VkImage vk_image = image.Handle();
@@ -436,16 +358,13 @@ void ASTCDecoderPass::Assemble(Image& image, const StagingBufferRef& map,
});
for (const VideoCommon::SwizzleParameters& swizzle : swizzles) {
const size_t input_offset = swizzle.buffer_offset + map.offset;
const u32 num_dispatches_x = Common::DivCeil(swizzle.num_tiles.width, 32U);
const u32 num_dispatches_y = Common::DivCeil(swizzle.num_tiles.height, 32U);
const u32 num_dispatches_x = Common::DivCeil(swizzle.num_tiles.width, 8U);
const u32 num_dispatches_y = Common::DivCeil(swizzle.num_tiles.height, 8U);
const u32 num_dispatches_z = image.info.resources.layers;
update_descriptor_queue.Acquire();
update_descriptor_queue.AddBuffer(map.buffer, input_offset,
image.guest_size_bytes - swizzle.buffer_offset);
update_descriptor_queue.AddBuffer(*data_buffer, 0, sizeof(ASTC_ENCODINGS_VALUES));
update_descriptor_queue.AddBuffer(*data_buffer, sizeof(ASTC_ENCODINGS_VALUES),
sizeof(SWIZZLE_TABLE));
update_descriptor_queue.AddImage(image.StorageImageView(swizzle.level));
const void* const descriptor_data{update_descriptor_queue.UpdateData()};
@@ -453,11 +372,11 @@ void ASTCDecoderPass::Assemble(Image& image, const StagingBufferRef& map,
const auto params = MakeBlockLinearSwizzle2DParams(swizzle, image.info);
ASSERT(params.origin == (std::array<u32, 3>{0, 0, 0}));
ASSERT(params.destination == (std::array<s32, 3>{0, 0, 0}));
ASSERT(params.bytes_per_block_log2 == 4);
scheduler.Record([this, num_dispatches_x, num_dispatches_y, num_dispatches_z, block_dims,
params, descriptor_data](vk::CommandBuffer cmdbuf) {
const AstcPushConstants uniforms{
.blocks_dims = block_dims,
.bytes_per_block_log2 = params.bytes_per_block_log2,
.layer_stride = params.layer_stride,
.block_size = params.block_size,
.x_shift = params.x_shift,

View File

@@ -96,15 +96,10 @@ public:
std::span<const VideoCommon::SwizzleParameters> swizzles);
private:
void MakeDataBuffer();
VKScheduler& scheduler;
StagingBufferPool& staging_buffer_pool;
VKUpdateDescriptorQueue& update_descriptor_queue;
MemoryAllocator& memory_allocator;
vk::Buffer data_buffer;
MemoryCommit data_buffer_commit;
};
} // namespace Vulkan

View File

@@ -151,6 +151,76 @@ private:
const IntType& m_Bits;
};
enum class IntegerEncoding { JustBits, Quint, Trit };
struct IntegerEncodedValue {
constexpr IntegerEncodedValue() = default;
constexpr IntegerEncodedValue(IntegerEncoding encoding_, u32 num_bits_)
: encoding{encoding_}, num_bits{num_bits_} {}
constexpr bool MatchesEncoding(const IntegerEncodedValue& other) const {
return encoding == other.encoding && num_bits == other.num_bits;
}
// Returns the number of bits required to encode num_vals values.
u32 GetBitLength(u32 num_vals) const {
u32 total_bits = num_bits * num_vals;
if (encoding == IntegerEncoding::Trit) {
total_bits += (num_vals * 8 + 4) / 5;
} else if (encoding == IntegerEncoding::Quint) {
total_bits += (num_vals * 7 + 2) / 3;
}
return total_bits;
}
IntegerEncoding encoding{};
u32 num_bits = 0;
u32 bit_value = 0;
union {
u32 quint_value = 0;
u32 trit_value;
};
};
// Returns a new instance of this struct that corresponds to the
// can take no more than mav_value values
static constexpr IntegerEncodedValue CreateEncoding(u32 mav_value) {
while (mav_value > 0) {
u32 check = mav_value + 1;
// Is mav_value a power of two?
if (!(check & (check - 1))) {
return IntegerEncodedValue(IntegerEncoding::JustBits, std::popcount(mav_value));
}
// Is mav_value of the type 3*2^n - 1?
if ((check % 3 == 0) && !((check / 3) & ((check / 3) - 1))) {
return IntegerEncodedValue(IntegerEncoding::Trit, std::popcount(check / 3 - 1));
}
// Is mav_value of the type 5*2^n - 1?
if ((check % 5 == 0) && !((check / 5) & ((check / 5) - 1))) {
return IntegerEncodedValue(IntegerEncoding::Quint, std::popcount(check / 5 - 1));
}
// Apparently it can't be represented with a bounded integer sequence...
// just iterate.
mav_value--;
}
return IntegerEncodedValue(IntegerEncoding::JustBits, 0);
}
static constexpr std::array<IntegerEncodedValue, 256> MakeEncodedValues() {
std::array<IntegerEncodedValue, 256> encodings{};
for (std::size_t i = 0; i < encodings.size(); ++i) {
encodings[i] = CreateEncoding(static_cast<u32>(i));
}
return encodings;
}
static constexpr std::array<IntegerEncodedValue, 256> ASTC_ENCODINGS_VALUES = MakeEncodedValues();
namespace Tegra::Texture::ASTC {
using IntegerEncodedVector = boost::container::static_vector<
IntegerEncodedValue, 256,
@@ -521,35 +591,41 @@ static TexelWeightParams DecodeBlockInfo(InputBitStream& strm) {
return params;
}
static void FillVoidExtentLDR(InputBitStream& strm, std::span<u32> outBuf, u32 blockWidth,
u32 blockHeight) {
// Don't actually care about the void extent, just read the bits...
for (s32 i = 0; i < 4; ++i) {
strm.ReadBits<13>();
// Replicates low num_bits such that [(to_bit - 1):(to_bit - 1 - from_bit)]
// is the same as [(num_bits - 1):0] and repeats all the way down.
template <typename IntType>
static constexpr IntType Replicate(IntType val, u32 num_bits, u32 to_bit) {
if (num_bits == 0 || to_bit == 0) {
return 0;
}
// Decode the RGBA components and renormalize them to the range [0, 255]
u16 r = static_cast<u16>(strm.ReadBits<16>());
u16 g = static_cast<u16>(strm.ReadBits<16>());
u16 b = static_cast<u16>(strm.ReadBits<16>());
u16 a = static_cast<u16>(strm.ReadBits<16>());
u32 rgba = (r >> 8) | (g & 0xFF00) | (static_cast<u32>(b) & 0xFF00) << 8 |
(static_cast<u32>(a) & 0xFF00) << 16;
for (u32 j = 0; j < blockHeight; j++) {
for (u32 i = 0; i < blockWidth; i++) {
outBuf[j * blockWidth + i] = rgba;
const IntType v = val & static_cast<IntType>((1 << num_bits) - 1);
IntType res = v;
u32 reslen = num_bits;
while (reslen < to_bit) {
u32 comp = 0;
if (num_bits > to_bit - reslen) {
u32 newshift = to_bit - reslen;
comp = num_bits - newshift;
num_bits = newshift;
}
res = static_cast<IntType>(res << num_bits);
res = static_cast<IntType>(res | (v >> comp));
reslen += num_bits;
}
return res;
}
static void FillError(std::span<u32> outBuf, u32 blockWidth, u32 blockHeight) {
for (u32 j = 0; j < blockHeight; j++) {
for (u32 i = 0; i < blockWidth; i++) {
outBuf[j * blockWidth + i] = 0xFFFF00FF;
}
static constexpr std::size_t NumReplicateEntries(u32 num_bits) {
return std::size_t(1) << num_bits;
}
template <typename IntType, u32 num_bits, u32 to_bit>
static constexpr auto MakeReplicateTable() {
std::array<IntType, NumReplicateEntries(num_bits)> table{};
for (IntType value = 0; value < static_cast<IntType>(std::size(table)); ++value) {
table[value] = Replicate(value, num_bits, to_bit);
}
return table;
}
static constexpr auto REPLICATE_BYTE_TO_16_TABLE = MakeReplicateTable<u32, 8, 16>();
@@ -572,6 +648,9 @@ static constexpr auto REPLICATE_2_BIT_TO_8_TABLE = MakeReplicateTable<u32, 2, 8>
static constexpr auto REPLICATE_3_BIT_TO_8_TABLE = MakeReplicateTable<u32, 3, 8>();
static constexpr auto REPLICATE_4_BIT_TO_8_TABLE = MakeReplicateTable<u32, 4, 8>();
static constexpr auto REPLICATE_5_BIT_TO_8_TABLE = MakeReplicateTable<u32, 5, 8>();
static constexpr auto REPLICATE_6_BIT_TO_8_TABLE = MakeReplicateTable<u32, 6, 8>();
static constexpr auto REPLICATE_7_BIT_TO_8_TABLE = MakeReplicateTable<u32, 7, 8>();
static constexpr auto REPLICATE_8_BIT_TO_8_TABLE = MakeReplicateTable<u32, 8, 8>();
/// Use a precompiled table with the most common usages, if it's not in the expected range, fallback
/// to the runtime implementation
static constexpr u32 FastReplicateTo8(u32 value, u32 num_bits) {
@@ -1316,6 +1395,37 @@ static void ComputeEndpoints(Pixel& ep1, Pixel& ep2, const u32*& colorValues,
#undef READ_INT_VALUES
}
static void FillVoidExtentLDR(InputBitStream& strm, std::span<u32> outBuf, u32 blockWidth,
u32 blockHeight) {
// Don't actually care about the void extent, just read the bits...
for (s32 i = 0; i < 4; ++i) {
strm.ReadBits<13>();
}
// Decode the RGBA components and renormalize them to the range [0, 255]
u16 r = static_cast<u16>(strm.ReadBits<16>());
u16 g = static_cast<u16>(strm.ReadBits<16>());
u16 b = static_cast<u16>(strm.ReadBits<16>());
u16 a = static_cast<u16>(strm.ReadBits<16>());
u32 rgba = (r >> 8) | (g & 0xFF00) | (static_cast<u32>(b) & 0xFF00) << 8 |
(static_cast<u32>(a) & 0xFF00) << 16;
for (u32 j = 0; j < blockHeight; j++) {
for (u32 i = 0; i < blockWidth; i++) {
outBuf[j * blockWidth + i] = rgba;
}
}
}
static void FillError(std::span<u32> outBuf, u32 blockWidth, u32 blockHeight) {
for (u32 j = 0; j < blockHeight; j++) {
for (u32 i = 0; i < blockWidth; i++) {
outBuf[j * blockWidth + i] = 0xFFFF00FF;
}
}
}
static void DecompressBlock(std::span<const u8, 16> inBuf, const u32 blockWidth,
const u32 blockHeight, std::span<u32, 12 * 12> outBuf) {
InputBitStream strm(inBuf);

View File

@@ -9,117 +9,6 @@
namespace Tegra::Texture::ASTC {
enum class IntegerEncoding { JustBits, Quint, Trit };
struct IntegerEncodedValue {
constexpr IntegerEncodedValue() = default;
constexpr IntegerEncodedValue(IntegerEncoding encoding_, u32 num_bits_)
: encoding{encoding_}, num_bits{num_bits_} {}
constexpr bool MatchesEncoding(const IntegerEncodedValue& other) const {
return encoding == other.encoding && num_bits == other.num_bits;
}
// Returns the number of bits required to encode num_vals values.
u32 GetBitLength(u32 num_vals) const {
u32 total_bits = num_bits * num_vals;
if (encoding == IntegerEncoding::Trit) {
total_bits += (num_vals * 8 + 4) / 5;
} else if (encoding == IntegerEncoding::Quint) {
total_bits += (num_vals * 7 + 2) / 3;
}
return total_bits;
}
IntegerEncoding encoding{};
u32 num_bits = 0;
u32 bit_value = 0;
union {
u32 quint_value = 0;
u32 trit_value;
};
};
// Returns a new instance of this struct that corresponds to the
// can take no more than mav_value values
constexpr IntegerEncodedValue CreateEncoding(u32 mav_value) {
while (mav_value > 0) {
u32 check = mav_value + 1;
// Is mav_value a power of two?
if (!(check & (check - 1))) {
return IntegerEncodedValue(IntegerEncoding::JustBits, std::popcount(mav_value));
}
// Is mav_value of the type 3*2^n - 1?
if ((check % 3 == 0) && !((check / 3) & ((check / 3) - 1))) {
return IntegerEncodedValue(IntegerEncoding::Trit, std::popcount(check / 3 - 1));
}
// Is mav_value of the type 5*2^n - 1?
if ((check % 5 == 0) && !((check / 5) & ((check / 5) - 1))) {
return IntegerEncodedValue(IntegerEncoding::Quint, std::popcount(check / 5 - 1));
}
// Apparently it can't be represented with a bounded integer sequence...
// just iterate.
mav_value--;
}
return IntegerEncodedValue(IntegerEncoding::JustBits, 0);
}
constexpr std::array<IntegerEncodedValue, 256> MakeEncodedValues() {
std::array<IntegerEncodedValue, 256> encodings{};
for (std::size_t i = 0; i < encodings.size(); ++i) {
encodings[i] = CreateEncoding(static_cast<u32>(i));
}
return encodings;
}
constexpr std::array<IntegerEncodedValue, 256> ASTC_ENCODINGS_VALUES = MakeEncodedValues();
// Replicates low num_bits such that [(to_bit - 1):(to_bit - 1 - from_bit)]
// is the same as [(num_bits - 1):0] and repeats all the way down.
template <typename IntType>
constexpr IntType Replicate(IntType val, u32 num_bits, u32 to_bit) {
if (num_bits == 0 || to_bit == 0) {
return 0;
}
const IntType v = val & static_cast<IntType>((1 << num_bits) - 1);
IntType res = v;
u32 reslen = num_bits;
while (reslen < to_bit) {
u32 comp = 0;
if (num_bits > to_bit - reslen) {
u32 newshift = to_bit - reslen;
comp = num_bits - newshift;
num_bits = newshift;
}
res = static_cast<IntType>(res << num_bits);
res = static_cast<IntType>(res | (v >> comp));
reslen += num_bits;
}
return res;
}
constexpr std::size_t NumReplicateEntries(u32 num_bits) {
return std::size_t(1) << num_bits;
}
template <typename IntType, u32 num_bits, u32 to_bit>
constexpr auto MakeReplicateTable() {
std::array<IntType, NumReplicateEntries(num_bits)> table{};
for (IntType value = 0; value < static_cast<IntType>(std::size(table)); ++value) {
table[value] = Replicate(value, num_bits, to_bit);
}
return table;
}
constexpr auto REPLICATE_6_BIT_TO_8_TABLE = MakeReplicateTable<u32, 6, 8>();
constexpr auto REPLICATE_7_BIT_TO_8_TABLE = MakeReplicateTable<u32, 7, 8>();
constexpr auto REPLICATE_8_BIT_TO_8_TABLE = MakeReplicateTable<u32, 8, 8>();
void Decompress(std::span<const uint8_t> data, uint32_t width, uint32_t height, uint32_t depth,
uint32_t block_width, uint32_t block_height, std::span<uint8_t> output);

View File

@@ -228,7 +228,9 @@ void MemoryCommit::Release() {
MemoryAllocator::MemoryAllocator(const Device& device_, bool export_allocations_)
: device{device_}, properties{device_.GetPhysical().GetMemoryProperties()},
export_allocations{export_allocations_} {}
export_allocations{export_allocations_},
buffer_image_granularity{
device_.GetPhysical().GetProperties().limits.bufferImageGranularity} {}
MemoryAllocator::~MemoryAllocator() = default;
@@ -258,7 +260,9 @@ MemoryCommit MemoryAllocator::Commit(const vk::Buffer& buffer, MemoryUsage usage
}
MemoryCommit MemoryAllocator::Commit(const vk::Image& image, MemoryUsage usage) {
auto commit = Commit(device.GetLogical().GetImageMemoryRequirements(*image), usage);
VkMemoryRequirements requirements = device.GetLogical().GetImageMemoryRequirements(*image);
requirements.size = Common::AlignUp(requirements.size, buffer_image_granularity);
auto commit = Commit(requirements, usage);
image.BindMemory(commit.Memory(), commit.Offset());
return commit;
}

View File

@@ -123,6 +123,8 @@ private:
const VkPhysicalDeviceMemoryProperties properties; ///< Physical device properties.
const bool export_allocations; ///< True when memory allocations have to be exported.
std::vector<std::unique_ptr<MemoryAllocation>> allocations; ///< Current allocations.
VkDeviceSize buffer_image_granularity; // The granularity for adjacent offsets between buffers
// and optimal images
};
/// Returns true when a memory usage is guaranteed to be host visible.

View File

@@ -82,14 +82,17 @@
<string>Enables asynchronous shader compilation, which may reduce shader stutter. This feature is experimental.</string>
</property>
<property name="text">
<string>Use asynchronous shader building</string>
<string>Use asynchronous shader building (hack)</string>
</property>
</widget>
</item>
<item>
<widget class="QCheckBox" name="use_fast_gpu_time">
<property name="toolTip">
<string>Enables Fast GPU Time. This option will force most games to run at their highest native resolution.</string>
</property>
<property name="text">
<string>Use Fast GPU Time</string>
<string>Use Fast GPU Time (hack)</string>
</property>
</widget>
</item>

View File

@@ -309,11 +309,14 @@ ConfigureInputPlayer::ConfigureInputPlayer(QWidget* parent, std::size_t player_i
buttons_param[button_id].Clear();
button_map[button_id]->setText(tr("[not set]"));
});
context_menu.addAction(tr("Toggle button"), [&] {
const bool toggle_value = !buttons_param[button_id].Get("toggle", false);
buttons_param[button_id].Set("toggle", toggle_value);
button_map[button_id]->setText(ButtonToText(buttons_param[button_id]));
});
if (buttons_param[button_id].Has("toggle")) {
context_menu.addAction(tr("Toggle button"), [&] {
const bool toggle_value =
!buttons_param[button_id].Get("toggle", false);
buttons_param[button_id].Set("toggle", toggle_value);
button_map[button_id]->setText(ButtonToText(buttons_param[button_id]));
});
}
if (buttons_param[button_id].Has("threshold")) {
context_menu.addAction(tr("Set threshold"), [&] {
const int button_threshold = static_cast<int>(

View File

@@ -122,6 +122,7 @@ void PlayerControlPreview::UpdateColors() {
colors.slider_arrow = QColor(14, 15, 18);
colors.font2 = QColor(255, 255, 255);
colors.indicator = QColor(170, 238, 255);
colors.indicator2 = QColor(100, 255, 100);
colors.deadzone = QColor(204, 136, 136);
colors.slider_button = colors.button;
}
@@ -139,6 +140,7 @@ void PlayerControlPreview::UpdateColors() {
colors.slider_arrow = QColor(65, 68, 73);
colors.font2 = QColor(0, 0, 0);
colors.indicator = QColor(0, 0, 200);
colors.indicator2 = QColor(0, 150, 0);
colors.deadzone = QColor(170, 0, 0);
colors.slider_button = QColor(153, 149, 149);
}
@@ -317,8 +319,7 @@ void PlayerControlPreview::DrawLeftController(QPainter& p, const QPointF center)
using namespace Settings::NativeAnalog;
DrawJoystick(p, center + QPointF(9, -69) + (axis_values[LStick].value * 8), 1.8f,
button_values[Settings::NativeButton::LStick]);
DrawRawJoystick(p, center + QPointF(-140, 90), axis_values[LStick].raw_value,
axis_values[LStick].properties);
DrawRawJoystick(p, center + QPointF(-140, 90), QPointF(0, 0));
}
using namespace Settings::NativeButton;
@@ -432,8 +433,7 @@ void PlayerControlPreview::DrawRightController(QPainter& p, const QPointF center
using namespace Settings::NativeAnalog;
DrawJoystick(p, center + QPointF(-9, 11) + (axis_values[RStick].value * 8), 1.8f,
button_values[Settings::NativeButton::RStick]);
DrawRawJoystick(p, center + QPointF(140, 90), axis_values[RStick].raw_value,
axis_values[RStick].properties);
DrawRawJoystick(p, QPointF(0, 0), center + QPointF(140, 90));
}
using namespace Settings::NativeButton;
@@ -547,8 +547,7 @@ void PlayerControlPreview::DrawDualController(QPainter& p, const QPointF center)
DrawJoystick(p, center + QPointF(-65, -65) + (l_stick.value * 7), 1.62f, l_button);
DrawJoystick(p, center + QPointF(65, 12) + (r_stick.value * 7), 1.62f, r_button);
DrawRawJoystick(p, center + QPointF(-180, 90), l_stick.raw_value, l_stick.properties);
DrawRawJoystick(p, center + QPointF(180, 90), r_stick.raw_value, r_stick.properties);
DrawRawJoystick(p, center + QPointF(-180, 90), center + QPointF(180, 90));
}
using namespace Settings::NativeButton;
@@ -634,8 +633,7 @@ void PlayerControlPreview::DrawHandheldController(QPainter& p, const QPointF cen
DrawJoystick(p, center + QPointF(-171, -41) + (l_stick.value * 4), 1.0f, l_button);
DrawJoystick(p, center + QPointF(171, 8) + (r_stick.value * 4), 1.0f, r_button);
DrawRawJoystick(p, center + QPointF(-50, 0), l_stick.raw_value, l_stick.properties);
DrawRawJoystick(p, center + QPointF(50, 0), r_stick.raw_value, r_stick.properties);
DrawRawJoystick(p, center + QPointF(-50, 0), center + QPointF(50, 0));
}
using namespace Settings::NativeButton;
@@ -728,10 +726,7 @@ void PlayerControlPreview::DrawProController(QPainter& p, const QPointF center)
button_values[Settings::NativeButton::LStick]);
DrawProJoystick(p, center + QPointF(51, 0), axis_values[RStick].value, 11,
button_values[Settings::NativeButton::RStick]);
DrawRawJoystick(p, center + QPointF(-50, 105), axis_values[LStick].raw_value,
axis_values[LStick].properties);
DrawRawJoystick(p, center + QPointF(50, 105), axis_values[RStick].raw_value,
axis_values[RStick].properties);
DrawRawJoystick(p, center + QPointF(-50, 105), center + QPointF(50, 105));
}
using namespace Settings::NativeButton;
@@ -821,10 +816,7 @@ void PlayerControlPreview::DrawGCController(QPainter& p, const QPointF center) {
p.setBrush(colors.font);
DrawSymbol(p, center + QPointF(61, 37) + (axis_values[RStick].value * 9.5f), Symbol::C,
1.0f);
DrawRawJoystick(p, center + QPointF(-198, -125), axis_values[LStick].raw_value,
axis_values[LStick].properties);
DrawRawJoystick(p, center + QPointF(198, -125), axis_values[RStick].raw_value,
axis_values[RStick].properties);
DrawRawJoystick(p, center + QPointF(-198, -125), center + QPointF(198, -125));
}
using namespace Settings::NativeButton;
@@ -2358,8 +2350,33 @@ void PlayerControlPreview::DrawGCJoystick(QPainter& p, const QPointF center, boo
DrawCircle(p, center, 7.5f);
}
void PlayerControlPreview::DrawRawJoystick(QPainter& p, const QPointF center, const QPointF value,
const Input::AnalogProperties& properties) {
void PlayerControlPreview::DrawRawJoystick(QPainter& p, QPointF center_left, QPointF center_right) {
using namespace Settings::NativeAnalog;
if (controller_type != Settings::ControllerType::LeftJoycon) {
DrawJoystickProperties(p, center_right, axis_values[RStick].properties);
p.setPen(colors.indicator);
p.setBrush(colors.indicator);
DrawJoystickDot(p, center_right, axis_values[RStick].raw_value,
axis_values[RStick].properties);
p.setPen(colors.indicator2);
p.setBrush(colors.indicator2);
DrawJoystickDot(p, center_right, axis_values[RStick].value, axis_values[RStick].properties);
}
if (controller_type != Settings::ControllerType::RightJoycon) {
DrawJoystickProperties(p, center_left, axis_values[LStick].properties);
p.setPen(colors.indicator);
p.setBrush(colors.indicator);
DrawJoystickDot(p, center_left, axis_values[LStick].raw_value,
axis_values[LStick].properties);
p.setPen(colors.indicator2);
p.setBrush(colors.indicator2);
DrawJoystickDot(p, center_left, axis_values[LStick].value, axis_values[LStick].properties);
}
}
void PlayerControlPreview::DrawJoystickProperties(QPainter& p, const QPointF center,
const Input::AnalogProperties& properties) {
constexpr float size = 45.0f;
const float range = size * properties.range;
const float deadzone = size * properties.deadzone;
@@ -2376,10 +2393,14 @@ void PlayerControlPreview::DrawRawJoystick(QPainter& p, const QPointF center, co
pen.setColor(colors.deadzone);
p.setPen(pen);
DrawCircle(p, center, deadzone);
}
void PlayerControlPreview::DrawJoystickDot(QPainter& p, const QPointF center, const QPointF value,
const Input::AnalogProperties& properties) {
constexpr float size = 45.0f;
const float range = size * properties.range;
// Dot pointer
p.setPen(colors.indicator);
p.setBrush(colors.indicator);
DrawCircle(p, center + (value * range), 2);
}

View File

@@ -90,6 +90,7 @@ private:
QColor highlight2{};
QColor transparent{};
QColor indicator{};
QColor indicator2{};
QColor led_on{};
QColor led_off{};
QColor slider{};
@@ -139,7 +140,10 @@ private:
// Draw joystick functions
void DrawJoystick(QPainter& p, QPointF center, float size, bool pressed);
void DrawJoystickSideview(QPainter& p, QPointF center, float angle, float size, bool pressed);
void DrawRawJoystick(QPainter& p, QPointF center, QPointF value,
void DrawRawJoystick(QPainter& p, QPointF center_left, QPointF center_right);
void DrawJoystickProperties(QPainter& p, QPointF center,
const Input::AnalogProperties& properties);
void DrawJoystickDot(QPainter& p, QPointF center, QPointF value,
const Input::AnalogProperties& properties);
void DrawProJoystick(QPainter& p, QPointF center, QPointF offset, float scalar, bool pressed);
void DrawGCJoystick(QPainter& p, QPointF center, bool pressed);

View File

@@ -1,5 +1,6 @@
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${PROJECT_SOURCE_DIR}/CMakeModules)
# Credits to Samantas5855 and others for this function.
function(create_resource file output filename)
# Read hex data from file
file(READ ${file} filedata HEX)

View File

@@ -278,6 +278,9 @@ void Config::ReadValues() {
if (Settings::values.players.GetValue()[p].analogs[i].empty())
Settings::values.players.GetValue()[p].analogs[i] = default_param;
}
Settings::values.players.GetValue()[p].connected =
sdl2_config->GetBoolean(group, "connected", false);
}
ReadSetting("ControlsGeneral", Settings::values.mouse_enabled);

View File

@@ -122,6 +122,10 @@ void EmuWindow_SDL2::OnResize() {
UpdateCurrentFramebufferLayout(width, height);
}
void EmuWindow_SDL2::ShowCursor(bool show_cursor) {
SDL_ShowCursor(show_cursor ? SDL_ENABLE : SDL_DISABLE);
}
void EmuWindow_SDL2::Fullscreen() {
switch (Settings::values.fullscreen_mode.GetValue()) {
case Settings::FullscreenMode::Exclusive:
@@ -228,6 +232,7 @@ void EmuWindow_SDL2::WaitEvent() {
}
}
// Credits to Samantas5855 and others for this function.
void EmuWindow_SDL2::SetWindowIcon() {
SDL_RWops* const yuzu_icon_stream = SDL_RWFromConstMem((void*)yuzu_icon, yuzu_icon_size);
if (yuzu_icon_stream == nullptr) {

View File

@@ -67,6 +67,9 @@ protected:
/// Called by WaitEvent when any event that may cause the window to be resized occurs
void OnResize();
/// Called when users want to hide the mouse cursor
void ShowCursor(bool show_cursor);
/// Called when user passes the fullscreen parameter flag
void Fullscreen();

View File

@@ -111,6 +111,7 @@ EmuWindow_SDL2_GL::EmuWindow_SDL2_GL(InputCommon::InputSubsystem* input_subsyste
if (fullscreen) {
Fullscreen();
ShowCursor(false);
}
window_context = SDL_GL_CreateContext(render_window);

View File

@@ -45,6 +45,7 @@ EmuWindow_SDL2_VK::EmuWindow_SDL2_VK(InputCommon::InputSubsystem* input_subsyste
if (fullscreen) {
Fullscreen();
ShowCursor(false);
}
switch (wm.subsystem) {