[PATCH 0/28] AMD IOMMU interrupt remapping support

July 05th, 2012 - 08:50 am ET by Joerg Roedel | Report spam
Hi,

this patch-set adds support for IRQ remapping to the AMD IOMMU driver in
Linux. It works similar to the already present IRQ remapping code for
VT-d. The IOAPIC, HPET and MSI interrupts have a fixed setup to index an
interrupt remapping table entry and the real vector and destination is
only configured in the IRQ remapping table. This means that also the IRQ
affinity is only changed in the IOMMU. The code was heavily
stress-tested on a lot of machines and no known issues are left.

I also pushed this code into a branch of the IOMMU tree:

git://git.kernel.org/pub/scm/linux/.../iommu.git irq-remapping

Use this branch if you want to test the code yourself.

Any feedback appreciated :)

Thanks,

Joerg

Diffstat:

arch/x86/include/asm/hw_irq.h | 15 +-
arch/x86/include/asm/irq_remapping.h | 2 +
arch/x86/kernel/apic/io_apic.c | 6 +-
drivers/iommu/amd_iommu.c | 507 +++++++++++++++++++++-
drivers/iommu/amd_iommu_init.c | 772 ++++++++++++++++++++++++-
drivers/iommu/amd_iommu_proto.h | 8 +
drivers/iommu/amd_iommu_types.h | 58 ++-
drivers/iommu/intel_irq_remapping.c | 8 +-
drivers/iommu/irq_remapping.c | 6 +
drivers/iommu/irq_remapping.h | 5 +
10 files changed, 1151 insertions(+), 236 deletions(-)


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
email Follow the discussionReplies 12 repliesReplies Make a reply

Similar topics

Replies

#1 Joerg Roedel
July 05th, 2012 - 08:50 am ET | Report spam
This function will initialize everthing necessary so that
devices can do DMA. This includes dma_ops and iommu_ops.

Signed-off-by: Joerg Roedel

drivers/iommu/amd_iommu_init.c | 27 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index ab3bc19..0afd89c 100644
a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1633,8 +1633,6 @@ int amd_iommu_init_hardware(void)

enable_iommus();

- amd_iommu_init_notifier();
-
register_syscore_ops(&amd_iommu_syscore_ops);

return ret;
@@ -1675,6 +1673,25 @@ static bool detect_ivrs(void)
return true;
}

+static int amd_iommu_init_dma(void)
+{
+ int ret;
+
+ if (iommu_pass_through)
+ ret = amd_iommu_init_passthrough();
+ else
+ ret = amd_iommu_init_dma_ops();
+
+ if (ret)
+ return ret;
+
+ amd_iommu_init_api();
+
+ amd_iommu_init_notifier();
+
+ return 0;
+}
+
/*
* This is the core init function for AMD IOMMU hardware in the system.
* This function is called from the generic x86 DMA layer initialization
@@ -1696,11 +1713,7 @@ static int __init amd_iommu_init(void)
if (ret)
goto free;

- if (iommu_pass_through)
- ret = amd_iommu_init_passthrough();
- else
- ret = amd_iommu_init_dma_ops();
-
+ ret = amd_iommu_init_dma();
if (ret)
goto free;

1.7.9.5


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#2 Joerg Roedel
July 05th, 2012 - 08:50 am ET | Report spam
Add routines to setup interrupt remapping for MSI
interrupts.

Signed-off-by: Joerg Roedel

drivers/iommu/amd_iommu.c | 74 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index dc2e699..ee10f30 100644
a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4054,4 +4054,78 @@ static int free_irq(int irq)
return 0;
}

+static void compose_msi_msg(struct pci_dev *pdev,
+ unsigned int irq, unsigned int dest,
+ struct msi_msg *msg, u8 hpet_id)
+{
+ struct irq_2_irte *irte_info;
+ struct irq_cfg *cfg;
+ union irte irte;
+
+ cfg = irq_get_chip_data(irq);
+ if (!cfg)
+ return;
+
+ irte_info = &cfg->irq_remap_info.irq_2_irte;
+
+ irte.val = 0;
+ irte.fields.vector = cfg->vector;
+ irte.fields.int_type = apic->irq_delivery_mode;
+ irte.fields.destination = dest;
+ irte.fields.dm = apic->irq_dest_mode;
+ irte.fields.valid = 1;
+
+ modify_irte(irte_info->devid, irte_info->index, irte);
+
+ msg->address_hi = MSI_ADDR_BASE_HI;
+ msg->address_lo = MSI_ADDR_BASE_LO;
+ msg->data = irte_info->index;
+}
+
+static int msi_alloc_irq(struct pci_dev *pdev, int irq, int nvec)
+{
+ struct irq_cfg *cfg;
+ int index;
+ u16 devid;
+
+ if (!pdev)
+ return -EINVAL;
+
+ cfg = irq_get_chip_data(irq);
+ if (!cfg)
+ return -EINVAL;
+
+ devid = get_device_id(&pdev->dev);
+ index = alloc_irq_index(cfg, devid, nvec);
+
+ return index < 0 ? MAX_IRQS_PER_TABLE : index;
+}
+
+static int msi_setup_irq(struct pci_dev *pdev, unsigned int irq,
+ int index, int offset)
+{
+ struct irq_2_irte *irte_info;
+ struct irq_cfg *cfg;
+ u16 devid;
+
+ if (!pdev)
+ return -EINVAL;
+
+ cfg = irq_get_chip_data(irq);
+ if (!cfg)
+ return -EINVAL;
+
+ if (index >= MAX_IRQS_PER_TABLE)
+ return 0;
+
+ devid = get_device_id(&pdev->dev);
+ irte_info = &cfg->irq_remap_info.irq_2_irte;
+
+ irte_info->devid = devid;
+ irte_info->index = index + offset;
+ cfg->remapped = true;
+
+ return 0;
+}
+
#endif
1.7.9.5


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#3 Joerg Roedel
July 05th, 2012 - 08:50 am ET | Report spam
For interrupt remapping the relevant IOMMU initialization
needs to run earlier at boot when the PCI subsystem is not
yet initialized. To support that this patch splits the parts
of IOMMU initialization which need PCI accesses out of the
initial setup path so that this can be done later.

Signed-off-by: Joerg Roedel

drivers/iommu/amd_iommu_init.c | 224 +++++++++++++++++++++
drivers/iommu/amd_iommu_types.h | 5 +-
2 files changed, 121 insertions(+), 108 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index fb118bb..1354497 100644
a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -721,90 +721,6 @@ static void __init set_device_exclusion_range(u16 devid, struct ivmd_header *m)
}

/*
- * This function reads some important data from the IOMMU PCI space and
- * initializes the driver data structure with it. It reads the hardware
- * capabilities and the first/last device entries
- */
-static void __init init_iommu_from_pci(struct amd_iommu *iommu)
-{
- int cap_ptr = iommu->cap_ptr;
- u32 range, misc, low, high;
- int i, j;
-
- pci_read_config_dword(iommu->dev, cap_ptr + MMIO_CAP_HDR_OFFSET,
- &iommu->cap);
- pci_read_config_dword(iommu->dev, cap_ptr + MMIO_RANGE_OFFSET,
- &range);
- pci_read_config_dword(iommu->dev, cap_ptr + MMIO_MISC_OFFSET,
- &misc);
-
- iommu->first_device = calc_devid(MMIO_GET_BUS(range),
- MMIO_GET_FD(range));
- iommu->last_device = calc_devid(MMIO_GET_BUS(range),
- MMIO_GET_LD(range));
- iommu->evt_msi_num = MMIO_MSI_NUM(misc);
-
- if (!(iommu->cap & (1 << IOMMU_CAP_IOTLB)))
- amd_iommu_iotlb_sup = false;
-
- /* read extended feature bits */
- low = readl(iommu->mmio_base + MMIO_EXT_FEATURES);
- high = readl(iommu->mmio_base + MMIO_EXT_FEATURES + 4);
-
- iommu->features = ((u64)high << 32) | low;
-
- if (iommu_feature(iommu, FEATURE_GT)) {
- int glxval;
- u32 pasids;
- u64 shift;
-
- shift = iommu->features & FEATURE_PASID_MASK;
- shift >>= FEATURE_PASID_SHIFT;
- pasids = (1 << shift);
-
- amd_iommu_max_pasids = min(amd_iommu_max_pasids, pasids);
-
- glxval = iommu->features & FEATURE_GLXVAL_MASK;
- glxval >>= FEATURE_GLXVAL_SHIFT;
-
- if (amd_iommu_max_glx_val == -1)
- amd_iommu_max_glx_val = glxval;
- else
- amd_iommu_max_glx_val = min(amd_iommu_max_glx_val, glxval);
- }
-
- if (iommu_feature(iommu, FEATURE_GT) &&
- iommu_feature(iommu, FEATURE_PPR)) {
- iommu->is_iommu_v2 = true;
- amd_iommu_v2_present = true;
- }
-
- if (!is_rd890_iommu(iommu->dev))
- return;
-
- /*
- * Some rd890 systems may not be fully reconfigured by the BIOS, so
- * it's necessary for us to store this information so it can be
- * reprogrammed on resume
- */
-
- pci_read_config_dword(iommu->dev, iommu->cap_ptr + 4,
- &iommu->stored_addr_lo);
- pci_read_config_dword(iommu->dev, iommu->cap_ptr + 8,
- &iommu->stored_addr_hi);
-
- /* Low bit locks writes to configuration space */
- iommu->stored_addr_lo &= ~1;
-
- for (i = 0; i < 6; i++)
- for (j = 0; j < 0x12; j++)
- iommu->stored_l1[i][j] = iommu_read_l1(iommu, i, j);
-
- for (i = 0; i < 0x83; i++)
- iommu->stored_l2[i] = iommu_read_l2(iommu, i);
-}
-
-/*
* Takes a pointer to an AMD IOMMU entry in the ACPI table and
* initializes the hardware and our data structures with it.
*/
@@ -1020,13 +936,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
/*
* Copy data from ACPI table entry to the iommu struct
*/
- iommu->dev = pci_get_bus_and_slot(PCI_BUS(h->devid), h->devid & 0xff);
- if (!iommu->dev)
- return 1;
-
- iommu->root_pdev = pci_get_bus_and_slot(iommu->dev->bus->number,
- PCI_DEVFN(0, 0));
-
+ iommu->devid = h->devid;
iommu->cap_ptr = h->cap_ptr;
iommu->pci_seg = h->pci_seg;
iommu->mmio_phys = h->mmio_phys;
@@ -1044,20 +954,10 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)

iommu->int_enabled = false;

- init_iommu_from_pci(iommu);
init_iommu_from_acpi(iommu, h);
init_iommu_devices(iommu);

- if (iommu_feature(iommu, FEATURE_PPR)) {
- iommu->ppr_log = alloc_ppr_log(iommu);
- if (!iommu->ppr_log)
- return -ENOMEM;
- }
-
- if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
- amd_iommu_np_cache = true;
-
- return pci_enable_device(iommu->dev);
+ return 0;
}

/*
@@ -1106,6 +1006,121 @@ static int __init init_iommu_all(struct acpi_table_header *table)
return 0;
}

+static int iommu_init_pci(struct amd_iommu *iommu)
+{
+ int cap_ptr = iommu->cap_ptr;
+ u32 range, misc, low, high;
+
+ iommu->dev = pci_get_bus_and_slot(PCI_BUS(iommu->devid),
+ iommu->devid & 0xff);
+ if (!iommu->dev)
+ return -ENODEV;
+
+ pci_read_config_dword(iommu->dev, cap_ptr + MMIO_CAP_HDR_OFFSET,
+ &iommu->cap);
+ pci_read_config_dword(iommu->dev, cap_ptr + MMIO_RANGE_OFFSET,
+ &range);
+ pci_read_config_dword(iommu->dev, cap_ptr + MMIO_MISC_OFFSET,
+ &misc);
+
+ iommu->first_device = calc_devid(MMIO_GET_BUS(range),
+ MMIO_GET_FD(range));
+ iommu->last_device = calc_devid(MMIO_GET_BUS(range),
+ MMIO_GET_LD(range));
+
+ if (!(iommu->cap & (1 << IOMMU_CAP_IOTLB)))
+ amd_iommu_iotlb_sup = false;
+
+ /* read extended feature bits */
+ low = readl(iommu->mmio_base + MMIO_EXT_FEATURES);
+ high = readl(iommu->mmio_base + MMIO_EXT_FEATURES + 4);
+
+ iommu->features = ((u64)high << 32) | low;
+
+ if (iommu_feature(iommu, FEATURE_GT)) {
+ int glxval;
+ u32 pasids;
+ u64 shift;
+
+ shift = iommu->features & FEATURE_PASID_MASK;
+ shift >>= FEATURE_PASID_SHIFT;
+ pasids = (1 << shift);
+
+ amd_iommu_max_pasids = min(amd_iommu_max_pasids, pasids);
+
+ glxval = iommu->features & FEATURE_GLXVAL_MASK;
+ glxval >>= FEATURE_GLXVAL_SHIFT;
+
+ if (amd_iommu_max_glx_val == -1)
+ amd_iommu_max_glx_val = glxval;
+ else
+ amd_iommu_max_glx_val = min(amd_iommu_max_glx_val, glxval);
+ }
+
+ if (iommu_feature(iommu, FEATURE_GT) &&
+ iommu_feature(iommu, FEATURE_PPR)) {
+ iommu->is_iommu_v2 = true;
+ amd_iommu_v2_present = true;
+ }
+
+ if (iommu_feature(iommu, FEATURE_PPR)) {
+ iommu->ppr_log = alloc_ppr_log(iommu);
+ if (!iommu->ppr_log)
+ return -ENOMEM;
+ }
+
+ if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
+ amd_iommu_np_cache = true;
+
+ if (is_rd890_iommu(iommu->dev)) {
+ int i, j;
+
+ iommu->root_pdev = pci_get_bus_and_slot(iommu->dev->bus->number,
+ PCI_DEVFN(0, 0));
+
+ /*
+ * Some rd890 systems may not be fully reconfigured by the
+ * BIOS, so it's necessary for us to store this information so
+ * it can be reprogrammed on resume
+ */
+ pci_read_config_dword(iommu->dev, iommu->cap_ptr + 4,
+ &iommu->stored_addr_lo);
+ pci_read_config_dword(iommu->dev, iommu->cap_ptr + 8,
+ &iommu->stored_addr_hi);
+
+ /* Low bit locks writes to configuration space */
+ iommu->stored_addr_lo &= ~1;
+
+ for (i = 0; i < 6; i++)
+ for (j = 0; j < 0x12; j++)
+ iommu->stored_l1[i][j] = iommu_read_l1(iommu, i, j);
+
+ for (i = 0; i < 0x83; i++)
+ iommu->stored_l2[i] = iommu_read_l2(iommu, i);
+ }
+
+ return pci_enable_device(iommu->dev);
+}
+
+static int amd_iommu_init_pci(void)
+{
+ struct amd_iommu *iommu;
+ int ret = 0;
+
+ for_each_iommu(iommu) {
+ ret = iommu_init_pci(iommu);
+ if (ret)
+ break;
+ }
+
+ /* Make sure ACS will be enabled */
+ pci_request_acs();
+
+ ret = amd_iommu_init_devices();
+
+ return ret;
+}
+
/****************************************************************************
*
* The following functions initialize the MSI interrupts for all IOMMUs
@@ -1569,7 +1584,7 @@ int __init amd_iommu_init_hardware(void)
if (ret)
goto free;

- ret = amd_iommu_init_devices();
+ ret = amd_iommu_init_pci();
if (ret)
goto free;

@@ -1702,9 +1717,6 @@ int __init amd_iommu_detect(void)
iommu_detected = 1;
x86_init.iommu.iommu_init = amd_iommu_init;

- /* Make sure ACS will be enabled */
- pci_request_acs();
-
return 0;
}

diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 2435555..6f32d7b 100644
a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -501,6 +501,9 @@ struct amd_iommu {
/* IOMMUv2 */
bool is_iommu_v2;

+ /* PCI device id of the IOMMU device */
+ u16 devid;
+
/*
* Capability pointer. There could be more than one IOMMU per PCI
* device function if there are more than one AMD IOMMU capability
@@ -530,8 +533,6 @@ struct amd_iommu {
u32 evt_buf_size;
/* event buffer virtual address */
u8 *evt_buf;
- /* MSI number for event interrupt */
- u16 evt_msi_num;

/* Base of the PPR log, if present */
u8 *ppr_log;
1.7.9.5


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#4 Joerg Roedel
July 05th, 2012 - 08:50 am ET | Report spam
When the IOMMU is enabled very early (as with irq-remapping)
some devices are still in BIOS hand. When dma is blocked
early this can cause lots of IO_PAGE_FAULTs. So delay the
DMA initialization and do it right before the dma_ops are
initialized.

Signed-off-by: Joerg Roedel

drivers/iommu/amd_iommu_init.c | 22 ++++++++++++++++++-
1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index e663f1d..453f80a 100644
a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1383,19 +1383,27 @@ static int __init init_memory_definitions(struct acpi_table_header *table)
* Init the device table to not allow DMA access for devices and
* suppress all page faults
*/
-static void init_device_table(void)
+static void init_device_table_dma(void)
{
u32 devid;

for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
set_dev_entry_bit(devid, DEV_ENTRY_VALID);
set_dev_entry_bit(devid, DEV_ENTRY_TRANSLATION);
-
- if (amd_iommu_irq_remap)
- set_dev_entry_bit(devid, DEV_ENTRY_IRQ_TBL_EN);
}
}

+static void init_device_table(void)
+{
+ u32 devid;
+
+ if (!amd_iommu_irq_remap)
+ return;
+
+ for (devid = 0; devid <= amd_iommu_last_bdf; ++devid)
+ set_dev_entry_bit(devid, DEV_ENTRY_IRQ_TBL_EN);
+}
+
static void iommu_init_flags(struct amd_iommu *iommu)
{
iommu->acpi_flags & IVHD_FLAG_HT_TUN_EN_MASK ?
@@ -1783,8 +1791,14 @@ static bool detect_ivrs(void)

static int amd_iommu_init_dma(void)
{
+ struct amd_iommu *iommu;
int ret;

+ init_device_table_dma();
+
+ for_each_iommu(iommu)
+ iommu_flush_all_caches(iommu);
+
if (iommu_pass_through)
ret = amd_iommu_init_passthrough();
else
1.7.9.5


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#5 Joe Perches
July 06th, 2012 - 12:10 am ET | Report spam
On Thu, 2012-07-05 at 14:36 +0200, Joerg Roedel wrote:
This function will be called before the PCI subsystem is
initialized. Therefore dev_name doen't work and IOMMU
information can't be printed to the klog as before. Move the
code to print that information to a later point where PCI
initializtion has already happened.


[]
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c


[]
@@ -1102,6 +1085,28 @@ static int iommu_init_pci(struct amd_iommu *iommu)


[]
+static void print_iommu_info(void)
+{
+ static const char * const feat_str[] = {
+ "PreF", "PPR", "X2APIC", "NX", "GT", "[5]",
+ "IA", "GA", "HE", "PC", NULL
+ };
+ struct amd_iommu *iommu;
+
+ for_each_iommu(iommu) {
+ int i;
+ pr_info("AMD-Vi: Found IOMMU at %s cap 0x%hx",
+ dev_name(&iommu->dev->dev), iommu->cap_ptr);
+ if (iommu->cap & (1 << IOMMU_CAP_EFR)) {
+ pr_info("AMD-Vi: Extended features: ");
+ for (i = 0; feat_str[i]; ++i)
+ if (iommu_feature(iommu, (1ULL << i)))
+ pr_cont(" %s", feat_str[i]);



I think this should use {} around the for loop
and this would be better as:

static const char * const feat_str[] = {
"PreF", "PPR", "X2APIC", "NX", "GT", "[5]",
"IA", "GA", "HE", "PC"
};
[]
for (i = 0; ARRAY_SIZE(feat_str); i++) {
if (iommu_feature(iommu, (1ULL << i)))
pr_cont(" %s", feat_str[i]);
}

I don't see the utility of the separate function and
this could just be inlined in the calling function.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
Help Create a new topicNext page Replies Make a reply
Search Make your own search